Thursday, May 17, 2012

Garbled Data from HTML Tables

I was putting together my tides dashboards -here- and found a source of tidal data on a web page.

Lucky day, thought I: with Tableau's copy-and-paste data-grabbing ability it promised to be a simple matter of highlighting the table, ctrl-c, switch to Tableau, ctrl-v, and away we go.

Sadly, my hopes were dashed. Once pasted into Tableau, the data wasn't at all what the web page presented. In fact, it was pretty much unusable. I was a wee bit miffed: my plans to get back to the beach, with nice new tides tables showing up on my iPad, were ruined.

The trouble turned out to be the way Tableau recognizes data copied from HTML tables. In this specific instance, a <br> tag placed in one of the table headers to make it easier to read was interpreted by Tableau as an end-of-line signal, which scrambled the record processing partway through reading the first record, the one holding the field names.
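For anyone who ends up parsing such a table by hand, here is a minimal sketch of the behavior I was hoping for, using Python's standard html.parser module. It says nothing about how Tableau works internally. The TableParser class, the parse_table helper, and the sample table are all made up for illustration. The one idea it shows is that only </tr> ends a record, while a <br> inside a cell is just whitespace.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect table rows; a <br> inside a cell becomes a space, not a new record."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []
        elif tag == "br" and self._cell is not None:
            # Purely visual line break: keep the words apart, stay in the cell.
            self._cell.append(" ")

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Only the end of a table row closes a record.
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_table(html):
    """Return the table as a list of rows (hypothetical helper)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample: headers broken over two lines with <br> and <br/>.
SAMPLE = """
<table>
  <tr><th>Date</th><th>High<br>Tide</th><th>Low<br/>Tide</th></tr>
  <tr><td>2012-05-17</td><td>1.9</td><td>0.4</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in parse_table(SAMPLE):
        print(row)
```

With this approach the header row stays a single record of three fields, and the two-line headers come through as "High Tide" and "Low Tide".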

The workbook below, published to Tableau Public, details the situation.

Interestingly, when I contacted Tableau about this situation they told me that it's behaving precisely as intended, that <br> tags are explicitly interpreted as end of record characters because "those are the rules".


  1. How is it that Excel can properly tell the difference between a </tr> and a <br/> when pasting from a web browser?

    Maybe that is something Tableau developers can look into.

  2. I've talked to Tableau's developers and product people. Their position is that the current behavior is correct: that's the way it was programmed, the way it's supposed to work. Their idea is that tags that provide inter-cell structure are field delimiters and as such get preserved in the Tableau records.

    They didn't accept my argument that the normal use of <br> and similar tags in table cells is purely visual - like the "The HTML Tables" example in my workbook. As far as I know it's a closed book, leaving us poor shlubs to scrape and parse HTML tables for ourselves.

  3. I see. It seems they view that task as being outside of their scope.

    It saddens me that they seem to value their scope definition over user experience.