Monday, September 8, 2014

Do you know what your Calculated Fields are?

or, Is my 'Profit' the same as your 'Profit'?

It's helpful to be able to identify what fields are being calculated in Tableau. From an enterprise governance perspective it's essential.

Unfortunately Tableau doesn't provide this information except through manual inspection of the fields' calculations in the calculation dialog. This is a serious deficiency, one that hinders Tableau's adoption in the Enterprise, where senior management is reluctant to let Tableau loose "in the wild" for fear of having dueling metrics without transparency.

Fortunately, there's an easy solution. This post presents a simple Ruby script I whipped up that examines Tableau workbooks and creates a CSV file containing the relevant information about the calculated fields. This CSV file is then usable with Tableau to provide the transparency necessary to tell whether or not two metrics are in fact the same.

Example: the Tableau Sample Workbooks' Calculated Fields

As an illustration of somewhat real world calculated fields, here are those found in the Tableau v8.2 sample workbooks. They range from simple to reasonably complex.

Introducing TTC_CalculatedFields.rb

Here's how it works in the simplest case: you copy/paste the code below into file named oh, say "TTC_CalculatedFields.rb" — it's then a bona fide Ruby script. Then from a command prompt run it in from the directory containing the TBWs you want the calculated fields from.

The usual way to run it is with

"{pathtoRuby}\Ruby" TTC_CalculatedFields.rb

It will then find all of the TWBs in the directory, and all the sub-directories, and record all of their calculated fields.

The usual caveats apply.

  • You need to have Ruby and the nokogiri and open-uri gems installed for it to work. This is very simple and straightforward, and most organizations have someone who's already using Ruby to help if you need it.
  • It only examines TWBs, not packaged workbooks. It's mechanically not difficult to add the functionality but it introduces complexities in handling name collisions and I wanted to keep it simple but useful.
  • There are technical properties of calculated fields that are meaningful to deeper analysis but don't mean much on the surface.
  • Groups are a specialized calculated field, but their internal structure is very different from the other types, making it difficult to add them to the properties/fields of the other classes. I decided not to force them so they show without formulae - this is a fairly common problem in analyzing Tableau workbooks and other hierarchical and network data structures, and Tableau isn't designed to support analysis of data like this.
  • It carries no warranty or guarantees that it'll work.

On the other hand, it should be pretty easy to address any shortcomings it has. Ruby is an elegant language that makes fixing and improving things generally simple and straightforward.


# TTC_CalculatedFields.rb - this Ruby script Copyright 2013, 2014 Christopher Gerrard require 'nokogiri' require 'open-uri' $recNum = 0 $CSVHeader = <<CSVHEADER.gsub("\n",'') Record Number, Workbook, Workbook Dir, Data Source Name, Data Source Caption, Data Source Name (tech), Name, Caption, Name (tech), Data Type, Role, Type, Class, Scope Isolation, Formula Code, Formula, Formula Comments CSVHEADER def init $f = File.open("TTC_CalculatedFields.csv",'w') $f.puts $CSVHeader unless $f.nil? end def parseCalculatedFields twbWithDir twb = File.basename(twbWithDir) twbDir = File.dirname(File.expand_path(twbWithDir)) puts "Found '#{twb}' in #{twbDir}" doc = Nokogiri::XML(open(twb)) calculationNodes = doc.xpath("//workbook/datasources/datasource/column/calculation").to_a calculationNodes.each do |d| #-- Data Source -- dsTechName = d.xpath('../../@name').text dsCaption = d.xpath('../../@caption').text dsName = if dsCaption == '' then dsTechName else dsCaption end #-- Field -- caption, = d.xpath('../@caption').text techName = d.xpath('../@name').text name = if caption == '' then getName(techName) else caption end dataType = d.xpath('../@datatype').text role = d.xpath('../@role').text type = d.xpath('../@type').text #-- Formula -- formulaCode = d.xpath('./@formula').text.gsub(/\r\n/, ' ') formulaLines = d.xpath('./@formula').text.split(/\r\n/) formula = getFormula( formulaLines ) comments = getComments( formulaLines ) fClass = d.xpath('./@class').text scopeIsolation = d.xpath('./@scope-isolation').text #-- emit CSV putCSV(twb, twbDir, dsName, dsCaption, dsTechName, name, caption, techName, dataType, role, type, fClass, scopeIsolation, formulaCode, formula, comments ) end end def getName techName techName.gsub(/^\[/,'').gsub(/\]$/,'') end def getFormula lines formula = '' lines.each do |line| line.strip formula += ' ' + line unless line =~ /[ ]*\/\// end return formula.strip end def getComments lines comments = '' lines.each do |line| line.strip comments += ' ' + line.strip if line.start_with?('//') end return comments.strip end def putCSV(twb, twbDir, dsName, dsCaption, dsTechName, name, caption, techName, dataType, role, type, fClass, scopeIsolation, formulaCode, formula, comments ) $recNum += 1 csvLine = <<CSV.gsub("\n",'') "#{$recNum}", "#{twb.gsub('"','""')}", "#{twbDir.gsub('"','""')}", "#{dsName.gsub('"','""')}", "#{dsCaption.gsub('"','""')}", "#{dsTechName.gsub('"','""')}", "#{name.gsub('"','""')}", "#{caption.gsub('"','""')}", "#{techName.gsub('"','""')}", "#{dataType.gsub('"','""')}", "#{role.gsub('"','""')}", "#{type.gsub('"','""')}", "#{fClass.gsub('"','""')}", "#{scopeIsolation.gsub('"','""')}", "#{formulaCode.gsub('"','""')}", "#{formula.gsub('"','""')}", "#{comments.gsub('"','""')}" CSV $f.puts csvLine unless $f.nil? end init Dir.glob("**/*.twb") {|twb| parseCalculatedFields twb } $f.close unless $f.nil?

An interesting exercise would be to round up the various workbooks that have been published about Table Calculations to see what that population looks like. Maybe something to do during the conference this week.


I hope you find this little utility useful. If so, please leave a comment. If it doesn't work, please leave a comment — I may or may not be able to help (time, other demands upon, etc.). If you extend it in some useful way, please leave a comment.

Or just leave a comment.

10 comments:

  1. I am keen on using this - unfortunately being on a corporate network, I'll need to first find and hop through various hoops to get the Ruby resources lined up to run it. Will advise once I've begun exploring with this handy-looking script.

    ReplyDelete
  2. Chris,

    A very good and useful utility.

    I once developed similar utilities for Informatica and Hyperion Brio. I did it by reading the repository databases and tables. I wanted to do the same thing with Tableau but I currently have no access to a Tableau Server repository database.

    I'm wondering whether there is another way of dissecting/extracting a .twb/.twbx. I did notice the variable "doc = Nokogiri::XML(open(twb))", which makes leads me to believe that a Tableau file is in fact an XML file. Can you conform this?

    Thanks,

    René

    ReplyDelete
    Replies
    1. René,

      You're correct. Tableau workbooks are XML files. Normal workbooks are plain XML and have the .twb file extension. Packaged workbooks are really just the workbook zipped together with auxiliary files - extracts, custom shapes and color palettes, images, etc.

      They are, however, really really complex and it can be difficult to understand what the relationships are between their component elements. I've been delving into them for 5 or 6 years and can't claim to have unraveled all of their mysteries.

      One thing that separates Tableau from enterprise BI platforms is that Tableau doesn't have a central database that contains the analytics' information. In Tableau everything's in the workbooks. It may be that when workbooks are published to Tableau Server they're decomposed and stored in Tableau Server's internal database, but I don't know whether this is the case - I've never needed to know, since I work with the workbooks locally, downloading them from Tableau Server if necessary. Further, decomposing workbooks and storing their information in tables the Tableau Server databse would be a horribly complicated process - there's a mismatch in the data structural characteristics that makes it very, very difficult. And it's unnecessary; Tableau already has the ability to consume the XML so there's no need to build the mechanism to deconstruct at storage and reconstruct for usage. Clearly, Tableau Server identifies some of the workbook elements, e. g. the data connection, but this is worlds different from extracting everything.

      Delete
    2. Chris,

      Thank you for the insight. I have installed Ruby, will download and install the other modules and use your utility. This utility is very helpful.

      Best Regards,


      René V Orong

      Delete
  3. Chris,

    This is amazing; it does exactly what I needed! I plan to share with the whole team and have no doubt it'll save us hours. Thanks so much!

    Jim

    ReplyDelete
    Replies
    1. Happy to hear it's useful Jim. One interesting extension is to parse out the individual fields from the calculations so that the trace can loop all the way back to the base data source.

      Delete
  4. Chris,

    Thanks for sharing this. It is helping me a lot. But I am facing issue in this, Name(tech) for calculation fields are printed as [Calculation_*****] and some are printed as [Calculation_fieldname(copy)]. Similarly, Formula Code using those calculation field prints the same. I am sure, it is because of replacing datasource, but I can see the calculations properly in Tableau, why not in csv?. Could you please throw some lights on it.

    Jeeva.

    ReplyDelete
  5. █████████████████████████████████████████████████████████████████
    can ny onee tell me that how we can type that bars in calculated field

    ReplyDelete
  6. █████████████████████████████████████████████████████████████████
    can anyone tell me that how we can type that bars in calculated field

    ReplyDelete
  7. Thanks for the awesome solution.

    ReplyDelete