Tuesday, February 3, 2015

More Calculated Field Analysis - Fields in Formulae

A previous post — Do you know what your Calculated Fields are? — presented a way to parse calculated fields' formulae from Tableau workbooks. It works well, but it was missing something: the ability to tease the individual fields out of the formulae, which would make it much easier to tell what's being calculated from what.

It's been a while, but the new and improved Calculated Fields Analysis fills the gap, identifying the formulae's field references and capturing them in a CSV file.

As before, the analysis is done via a Ruby script that scans workbooks, identifies the calculated fields, analyzes them, and creates CSV files of the results. The Ruby script, TTC_CalculatedFields.rb, is below along with instructions for running it.

The Tableau Public workbook below contains the results of conducting this analysis on a sample workbook.
(I've run it on a fair number of workbooks, but this example serves well, I think.)

The "Calculated Fields Analyses" dashboard shows the following views of the calculated fields:

  • Calculated Fields and Formulae
    Lists each of the calculated fields located in the TWBs, along with its formula.
    The fields are organized according to the TWB and Data Source they're in, and the directory the TWB is in (in case the same TWB name occurs in multiple scanned directories).
  • Fields Referenced in Calculation Formulae
    Similar to "Calculated Fields and Formulae", this lists the calculated fields located in the TWBs.
    The difference is that the individual fields used in each calculated field's formula, if any, are listed separately.
    This makes it easy to identify a formula's fields (with no formatting control over the formula text, they can be difficult to read).
  • Fields used in Calculation(s)
    Each of the fields used in a calculated field's formula is listed along with the calculated field or fields whose formula it appears in. (A hypothetical sample of the CSV records behind these views follows this list.)
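
To make that concrete, here are a couple of records as they'd appear in the generated TTC_FormulaFields.csv. The workbook, directory, data source, and field names are hypothetical, for illustration only; the column layout comes straight from the script below.

Rec #,Workbook, Workbook Dir,Data Source,Field - Calculated,Field - Formula
'1',"Sales Analysis.twb",C:\Workbooks,"Superstore","Profit Ratio","[Profit]"
'2',"Sales Analysis.twb",C:\Workbooks,"Superstore","Profit Ratio","[Sales]"

Read as: the calculated field Profit Ratio references the fields [Profit] and [Sales] in its formula.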

Introducing TTC_CalculatedFields.rb

Here's how it works in the simplest case: you copy/paste the code below into a file named, oh, say, "TTC_CalculatedFields.rb", and it's then a bona fide Ruby script. Then, from a command prompt, run it from the directory containing the TWBs you want the calculated fields from.

The usual way to run it is with

"{pathtoRuby}\Ruby" TTC_CalculatedFields.rb

It will then look in the current directory and its sub-directories for TWBs and analyze them, excepting "Tableau Calculated Fields Analyses.twb", which is the one published above.
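
For example, on a Windows machine where Ruby lives in C:\Ruby22\bin (a hypothetical location; substitute wherever Ruby is installed on your machine):

"C:\Ruby22\bin\ruby" TTC_CalculatedFields.rb

When it's done there will be two CSV files in the current directory: TTC_CalculatedFields.csv, with one record per calculated field, and TTC_FormulaFields.csv, with one record per field referenced in a calculated field's formula.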

The usual caveats apply.

  • You need to have Ruby installed, along with the "nokogiri" gem (gem install nokogiri), for it to work; the "open-uri" and "set" libraries it also uses come with Ruby's standard library. This is very simple and straightforward, and most organizations have someone who's already using Ruby who can help if you need it.
  • It only examines TWBs, not packaged workbooks (TWBXs). Adding TWBX parsing isn't mechanically difficult; it just didn't get included (see the sketch after this list).
  • It carries no warranty or guarantees that it'll work.
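
If you do need to reach into packaged workbooks, one approach is to pull the embedded TWB out of the TWBX (which is just a zip archive) and hand that to the existing parser. Here's a minimal sketch, assuming the rubyzip gem (gem install rubyzip) is installed; the extractTwbFromTwbx method is mine, for illustration, and isn't part of the script below:

require 'zip'

# Extract the embedded .twb from a packaged workbook (.twbx is a zip archive)
# and write it alongside the .twbx so the existing parser can pick it up.
def extractTwbFromTwbx twbxWithDir
  twbName = nil
  Zip::File.open(twbxWithDir) do |zipFile|
    entry = zipFile.glob('*.twb').first       # the TWB sits at the archive root
    return nil if entry.nil?
    twbName = File.join(File.dirname(twbxWithDir), entry.name)
    entry.extract(twbName) { true }           # the block says "overwrite if already present"
  end
  twbName
end

# e.g. Dir.glob("**/*.twbx") do |twbx|
#        twb = extractTwbFromTwbx(twbx)
#        parseCalculatedFields twb unless twb.nil?
#      end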

On the other hand, it should be pretty easy to address any shortcomings it has. Ruby is an elegant language that makes fixing and improving things generally simple and straightforward.


# TTC_CalculatedFields.rb - this Ruby script Copyright 2013, 2014, 2015 Christopher Gerrard

require 'nokogiri'
require 'open-uri'
require 'set'

$recNum = 0

$CSVHeader = <<CSVHEADER.gsub("\n",'')
Record Number, Workbook, Workbook Dir, Data Source, Data Source Caption, Data Source Name (tech), Field Name, Field Caption, Field Name (tech), Data Type, Role, Type, Class, Scope Isolation, Formula Code, Formula, Formula Comments
CSVHEADER

$formFieldsRecNum = 0

# Open the two output CSV files and write their header records.
def init
  $calculatedFields = File.open("TTC_CalculatedFields.csv",'w')
  $calculatedFields.puts $CSVHeader unless $calculatedFields.nil?
  $formulaFields = File.open("TTC_FormulaFields.csv",'w')
  $formulaFields.puts("Rec #,Workbook, Workbook Dir,Data Source,Field - Calculated,Field - Formula") unless $formulaFields.nil?
end

# Pull the individual [bracketed] field references out of a formula,
# returning them as a Set so each referenced field appears only once.
def parseFieldsFromFormula formula
  fieldSet = Set.new []
  if formula =~ /\[.+\]/ then
    stripFrt = formula.gsub( /^[^\[]*[\[]/ , '[' )
    stripBck = stripFrt.gsub( /\][^\]]+$/ , ']' )
    stripMid = stripBck.gsub( /\][^\]]{2,}\[/ , ']]..[[' )
    stripCom = stripMid.gsub( /\][ ]*,[ ]*\[/ , ']]..[[' )
    fields   = stripCom.split(']..[')
    fields.each { |field| fieldSet.add field }
  end
  return fieldSet
end

# Parse one workbook: find its calculated fields and emit their details.
def parseCalculatedFields twbWithDir
  twb = File.basename(twbWithDir)
  return if twb == "Tableau Calculated Fields Analyses.twb"
  twbDir = File.dirname(File.expand_path(twbWithDir))
  printf(" %-85s in %s \n", twb, twbDir)
  doc = Nokogiri::XML(open(twbWithDir))
  calculationNodes = doc.xpath("//workbook/datasources/datasource/column/calculation").to_a
  # puts " - has calculations " if !calculationNodes.nil?
  calculationNodes.each do |d|
    #-- Data Source --
    dsTechName = d.xpath('../../@name').text
    dsCaption  = d.xpath('../../@caption').text
    dsName     = if dsCaption == '' then dsTechName else dsCaption end
    #-- Field --
    caption  = d.xpath('../@caption').text
    techName = d.xpath('../@name').text
    name     = if caption == '' then techName.gsub(/^\[/,'').gsub(/\]$/,'') else caption end
    dataType = d.xpath('../@datatype').text
    role     = d.xpath('../@role').text
    type     = d.xpath('../@type').text
    #-- Formula --
    formulaCode  = d.xpath('./@formula').text.gsub(/\r\n/, ' ')
    formulaLines = d.xpath('./@formula').text.split(/\r\n/)
    formula      = getFormula( formulaLines )
    comments     = getComments( formulaLines )
    fClass         = d.xpath('./@class').text
    scopeIsolation = d.xpath('./@scope-isolation').text
    #-- Fields --
    formulaFields = parseFieldsFromFormula(formulaCode)
    emitFormulaFields(twb, twbDir, dsName, name, formulaFields)
    # puts "formula code:: #{formulaCode}"
    #-- emit CSV
    putCSV(twb, twbDir,
           dsName, dsCaption, dsTechName,
           name, caption, techName,
           dataType, role, type, fClass, scopeIsolation,
           formulaCode, formula, comments )
  end
end

# Reassemble the formula from its lines, skipping any line containing a // comment.
def getFormula lines
  formula = ''
  lines.each do |line|
    line.strip
    formula += ' ' + line unless line =~ /[ ]*\/\//
  end
  return formula.strip
end

# Collect the formula's // comment lines.
def getComments lines
  comments = ''
  lines.each do |line|
    line.strip
    comments += ' ' + line.strip if line.start_with?('//')
  end
  return comments.strip
end

# Write one calculated field record to TTC_CalculatedFields.csv.
def putCSV(twb, twbDir,
           dsName, dsCaption, dsTechName,
           name, caption, techName,
           dataType, role, type, fClass, scopeIsolation,
           formulaCode, formula, comments )
  $recNum += 1
  csvLine = <<CSV.gsub("\n",'')
"#{$recNum}",
"#{twb.gsub('"','""')}",
"#{twbDir.gsub('"','""')}",
"#{dsName.gsub('"','""')}",
"#{dsCaption.gsub('"','""')}",
"#{dsTechName.gsub('"','""')}",
"#{name.gsub('"','""')}",
"#{caption.gsub('"','""')}",
"#{techName.gsub('"','""')}",
"#{dataType.gsub('"','""')}",
"#{role.gsub('"','""')}",
"#{type.gsub('"','""')}",
"#{fClass.gsub('"','""')}",
"#{scopeIsolation.gsub('"','""')}",
"#{formulaCode.gsub('"','""')}",
"#{formula.gsub('"','""')}",
"#{comments.gsub('"','""')}"
CSV
  $calculatedFields.puts csvLine unless $calculatedFields.nil?
end

# Write one TTC_FormulaFields.csv record per field referenced in the calculated field's formula.
def emitFormulaFields(twb, twbDir, dataSource, dsField, formulaFields)
  return if formulaFields.nil?
  formulaFields.each do |field|
    $formulaFields.puts("'#{$formFieldsRecNum += 1}',\"#{twb}\",#{twbDir},\"#{dataSource}\",\"#{dsField}\",\"#{field}\"")
  end
end

init

Dir.glob("**/*.twb") { |twb| parseCalculatedFields twb }

$calculatedFields.close unless $calculatedFields.nil?

What's Next?

Other Calculation Types
There are other Tableau data elements that could be usefully analyzed similarly, e.g. bins, calculations in extracts, and quite likely others. By and large these aren't analyzed in this process for two main reasons: they're coded differently in the workbooks, so processing them would be complicated; and it's not clear how their semantics could be represented in the same way as ordinary calculated fields. It's entirely reasonable to contemplate including them in future processing and analysis.
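
As a first step toward that, a small exploratory sketch like the one below can show what kinds of calculations a workbook actually contains, by tallying the class attribute of every calculation node wherever it appears in the XML. The tallyCalculationClasses method is mine, for illustration:

require 'nokogiri'
require 'open-uri'

# Exploratory sketch: count every calculation node in a workbook by its
# 'class' attribute, wherever it occurs in the XML, to get a sense of what's
# in there beyond the ordinary calculated fields handled above.
def tallyCalculationClasses twbWithDir
  doc   = Nokogiri::XML(open(twbWithDir))
  tally = Hash.new(0)
  doc.xpath('//calculation').each { |c| tally[c.xpath('./@class').text] += 1 }
  tally.each { |calcClass, count| puts "#{File.basename(twbWithDir)} :: class '#{calcClass}' x #{count}" }
end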

Record vs Table Calculations
It might be useful to distinguish between record-level and table calculations. I'm not sure how far it makes sense to take this, but it feels like there would be value in it.
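
One rough-and-ready way to make the distinction, sketched below, is to flag formulas that mention Tableau's table calculation functions; the function list here is illustrative, not exhaustive, and the result could simply be emitted as one more column from putCSV:

# Some of Tableau's table calculation functions - an illustrative subset, not a complete list.
TABLE_CALC_FUNCTIONS = %w[
  WINDOW_SUM WINDOW_AVG WINDOW_MIN WINDOW_MAX WINDOW_COUNT WINDOW_MEDIAN
  RUNNING_SUM RUNNING_AVG RUNNING_MIN RUNNING_MAX RUNNING_COUNT
  INDEX FIRST LAST SIZE LOOKUP PREVIOUS_VALUE RANK TOTAL
]

# Crude classification: a formula counts as a table calculation if it calls
# any of the functions above; everything else is treated as record-level.
def tableCalculation? formulaCode
  TABLE_CALC_FUNCTIONS.any? { |f| formulaCode.upcase =~ /\b#{f}[ ]*\(/ }
end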

Diagramming Field Relationships
It would be really handy to auto-generate diagrams of the relationships between the calculated fields and the fields they operate upon. I've done some preliminary work on this and hope to have the time to get it working well before too long.
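
As a rough illustration of one possible approach: the sketch below turns the field/calculated-field pairs already captured in TTC_FormulaFields.csv into a Graphviz DOT file, one edge per "field feeds calculated field" pair, which any DOT renderer (e.g. dot -Tpng TTC_FieldRelationships.dot -o fields.png) can then lay out. The emitDotFile method and the output file name are mine, for illustration:

require 'csv'

# Sketch: emit a Graphviz DOT file describing which fields feed which
# calculated fields, using the CSV the main script already produces.
def emitDotFile(csvFile = 'TTC_FormulaFields.csv', dotFile = 'TTC_FieldRelationships.dot')
  File.open(dotFile, 'w') do |dot|
    dot.puts 'digraph CalculatedFields {'
    dot.puts '  rankdir=LR;'
    CSV.foreach(csvFile, headers: true) do |row|
      calc  = row['Field - Calculated']
      field = row['Field - Formula']
      dot.puts "  \"#{field}\" -> \"#{calc}\";" unless calc.nil? || field.nil?
    end
    dot.puts '}'
  end
end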

Anything else?
This area is ripe for experimentation and further development. I'm not sure where it'll take me, but figuring out what's in workbooks is always interesting and informative.


Suggestions and code welcome.

Is there anything you'd like to see? Did you find a calculated field that doesn't parse correctly? I expect this to happen; there's no spec for the workbooks outside Tableau, so it's always a matter of working with the examples on hand.

Whatever your experience, I'd like to hear about it, so please leave a comment.

Perhaps you'll even take the code in unexpected directions. If so, I hope that you'll make your code available for the Tableau community to use and benefit from, and leave a comment indicating where to find it.

Or just leave a comment.

10 comments:

  1. Chris, we've created a Workbook Audit tool that extracts the Calculated Field definitions and cleans them up with their "friendly" names (on both Desktop and Server), including published datasources. I'd recommend that you check it out: http://powertoolsfortableau.com/tools/workbook-tools-suite/

    1. Hello Interworks.

      I'm familiar with Interworks' tools, and think they're a fine piece of work. However, it makes more sense to me to use tools that I've created myself, including the Tableau Workbook Inventory System (TWIS), which does what Interworks' Workbook Audit tool does and more. TWIS also happens to be free, while Interworks' tools are pretty pricey.

      Sometimes a more targeted tool is appropriate for the job at hand, and for those times I like having small, focused tools on hand. I've created a fair number of them and will be publishing them as time permits. In the meantime the tools I've already published here are available for anyone to take and customize for their particular purposes.

      I'm contemplating starting an open source Tableau tools project to help people address their Tableau monitoring and management needs without needing to adopt closed, expensive proprietary tools. Just need some spare time to lay the groundwork.

    2. P.S. In case you didn't notice, the field names shown here are their "friendly" ones.

  2. Hi Chris,

    Thank you for writing this script. I tried to run it on OS X Yosemite, but I get the following error:

    $ Ruby TTC_CalculatedFields.rb
    TTC_CalculatedFields.rb:182: can't find string "CSVHEADER" anywhere before EOF
    TTC_CalculatedFields.rb:9: syntax error, unexpected end-of-input, expecting tSTRING_CONTENT or tSTRING_DBEG or tSTRING_DVAR or tSTRING_END
    $CSVHeader = <<CSVHEADER.gsub("\n",'')
    ^

    Do you know what these errors mean?

    By the way, I installed nokogiri gem, but could not find 'open-uri' one. I used 'rest-open-uri' instead. https://rubygems.org/search?utf8=✓&query=open-uri

    Also I could not find 'set' gem. Which gem should I use as a replacement?

    Thank you in advance,
    Ivan

  3. Hello Chris,

    You were talking about an implementation of TWISS in JAVA. Is it still available?

  4. Hello Chris,

    You were talking about an implementation of TWISS in JAVA. Is it still available ?

    Thank you !

  5. Hey Chriss,

    Is your java version of TWISS still available ? If yes, where can I find it ?

    Thank you in advance !

  6. Sorry for asking this silly question: how do you load an XSD schema in Tableau? Can you explain the step-by-step process?

  7. What are the steps for loading the XSD schema in Tableau?

    1. I'm not sure, but I think you're under the impression that there's an XSD for Tableau workbooks and that it's used somehow in their interpretation. As far as I know there isn't one. The tools' code 'knows' how to interpret the workbooks as XML because I've written it that way.
