Saturday, September 29, 2012

Reconsidering previous IFNULL post

This post is an extension to my original post on IFNULL.

The comments it received (thanks to everyone) got me thinking a little differently and somewhat deeper (I think). The upshot is that I propose this as a workable improvement to the current IFNULL function:

new IFNULL

IFNULL(expr1, expr2, expr3)

Its structure matches the language:

IF expr1 IS NULL
THEN expr2
ELSE expr3

Benefits

  • The logic is clear and transparent.
  • It's more powerful and flexible, with the opportunity to provide an explicit non-NULL value when expr1 IS NULL.
  • It avoids backwards compatibility problems—a new function, it doesn't conflict with existing calculations.

Drawbacks

  • In the normal (maybe) case, expr3 will be the same as expr1; this will be a bit if a cognitive burden, but not all that much.

About the current IFNULL

new - Mapping IFNULL parsing

I'm adding this section because I'm pretty sure that my objection to the current IFNULL operation is clear. So, pictures being better than words in some circumstances, I mapped it out:

Even though these formulations are logically equivalent, I'm confident that the second one is clearer than the first, and it matches the current IFNULL documentation in the function editor.

The problem with the second reading is that the function's name is the reading's negation. It's this collision that I'm urging be fixed.

I understand that the current formulation is readable by programmers and technical people who are experienced in interpreting these things (I've been at it since 1974). But, and this really is the whole point of this blog, Tableau was invented to move data analysis and visualization from the technical-only realm and open it up to non-technical people who need to understand their data but shouldn't be burdened with the arcane technical aspects. Every single thing that's not as clear and transparent as it could be is an impediment to real human people. Forcing non-technical people to adapt to the technical conventions is contrary to the whole idea of eliminating the friction between between people and the information and insights in their data.

addendum to original post

Note: in my original post's comments, this section was unclear because I used angle brackets—< and >, and these were interpreted as invalid tags and not shown. Sorry about that.

In a nutshell, my objection to the current IFNULL(expr1,expr2) function is that from a human standpoint, the reader has to figure out:

  • what, exactly, is being evaluated to see if it's null, and depending upon the assessment:
    • what positive action is being taken when it is null, and
    • what positive action is being taken when it's not null.

If I have it right (it's been a long time since undergraduate logic class), the two semantic constructions for {FUNCTION}(expr1,expr2) are:

  1. If {IS NOT NULL -expr1-}, then return expr1 else return expr2
  2. If {IS NULL     -expr1-}, then return expr2 else return expr1

  1. parses strictly right to left and is easy to understand
  2. requires a forward leapfrog when -expr1- IS NULL, followed by a backreference when -expr1- IS NOT NULL, and is, at least for me, twisty and unclear

I believe that:

  1. 1. is better than 2., and
  2. the documentation describes 1., although it's not as clearly worded as it could be.

I understand the problems associated with 1., most notably that a negative expression is typically harder to parse and a negative–null coupling is doubly difficult to wrap one's head around, and changing the expression causes backward compatibility problems (and Tableau hasn't yet implemented self-analysis).

Footnote: I'm sure someone could do a better job on the logical formalisms.

5 comments:

  1. I disagree with your preference.

    2. makes perfect sense and aligns well with your "new IFNULL" at the beginning of this post, with expr3 being expr1.

    I read the current IFNULL(expr1,expr2) as:
    When expr1 is null return expr2 otherwise expr1

    I see this as the same as your "new IFNULL".

    and the current function aligns well with this, it is shorthand, and makes sense. I fail to see any leapfrogging or how this is unclear.

    If I wanted the extra logic to return something else when expr1 is not null, your expr3, then I would write out the full IF/ELSE statement like you did, and not use the shorthand IFNULL().

    I guess if someone has difficulty understanding this, then they are more than welcome to write the full logical statement out, IFNULL() is just a shortcut, shorthand for a frequently used logical setup. Just like ZN().

    Also IFNULL() is a SQL function, and Tableau is exposing it in the calculated field editor.

    Tableau documentation matches MySQL documentation: http://dev.mysql.com/doc/refman/5.0/en/control-flow-functions.html

    ReplyDelete
  2. Hi Chris,

    Are you familiar with IIF()? Your proposed expression could be written as:

    IIF(ISNULL(expr1),expr2,expr3)

    This is nearly as compact as your proposed form. It's also important to note that IFNULL() can be useful for avoiding a repeated subexpression. This may not be much of a cognitive burden, especially if you are referencing a calculated field, but the full subexpression will appear in multiple places in the SQL query that Tableau has to generate. For databases with poor query optimizers this may result in worse performance than the simple IFNULL form that Tableau already supports.

    -Robert

    ReplyDelete
  3. Hi Joe and Robert,

    My point here is that the current formulation of IFNULL is not as simple and clear as it should be, so that ordinary non-technical people have as low a barrier as possible in achieving the result they're after, which is to obtain one value for their calculated field when an expression is null, and another when it's NOT NULL.

    I know that it's an almost invisible difference to people who know this stuff, and that it's possible to achieve the desired functionality with other means.

    But that's not the point. Tableau was designed so that regular people would need absolutely minimal technical skills, either by having them or by getting help from others who do, to be able to analyze, visualize, and understand their data.

    I'm pretty much inoculated against arguments that use technical framings to rationalize Tableau's friction points.

    It's perfectly OK that MySQL's IFNULL function is the same as Tableau's... for MySQL and the people who use it, but they're not same audience as Tableau's.

    Whether or not the current formulation has back end SQL query advantages is a moot point. It's Tableau's job to shield the user from the back end technical aspects of data source access efficiencies as much as possible, not force the User to accommodate to the needs of the machinery.

    ReplyDelete
  4. I think you would like this:
    http://worrydream.com/LearnableProgramming/
    I believe the section on Flow applies to view you have raised here.

    What you are describing is a barrier to learning, and once the barrier is crossed it is no longer an issue. Getting past the barrier is difficult. Does this article address the need you have?

    ReplyDelete
  5. Thanks, Joe. I'd seen this by Bret Victor: https://vimeo.com/36579366 - it's Bret "Victor - Inventing on Principle"

    I very much like his ideas about surfacing the effects to make comprehension easier. It's very much, I think, in the spirit of Tableau's providing a way for people to visualize their data without needing to know anything about the arcane programmatic/technical aspects of the "how" of it's getting done. In essence, one says to Tableau "sort across by this, down by this, and show me the magnitude of this" and Tableau provides a high quality visualization of just that.

    I've brought this up before, and at the risk of being a Johnny one-note, here's some pseudo code that does (or should) the same thing:
    with Employee data
    sum Bonus
    by Employee {top-down}
    across Department {left-right across)
    Is it VizQL? I expect not, although I've never seen actual VizQL

    How about this:
    Table file Employee
    sum Bonus
    by Employee
    across Department
    END
    It's pretty much exactly the same thing, and has the great benefit that, if it's presented to FOCUS it will create exactly what it says it will. And it's been around for appx 40 years.

    Another though on Bret Victor's take on things. One of the great benefits of learning programming with mainframes, punch cards, FORTRAN, and COBOL in the way back is that, because machine time was so expensive, one had to learn how to hand-work one's algorithms before being allowed to submit the card deck for processing. In a very real sense it was related to Bret's illustrating the effects of code in operation as a way of assisting people who don't already understand it to gain that understanding.

    I learned programming this way in the mid-1970s at the University of Guelph - it was hard, mostly because it was tedious and I was impatient. But I did learn to think like a problem-solving, algorithmic, methodical computing machine.

    Although I've been a programmer for almost 40 years, and enjoy it, I'm more interested in the outcomes than the process. Anything that gets in the way of seeing the clear path to what I'm trying to achieve is frustrating, particularly when whatever's obstructing the view needn't be there. And it's doubly frustrating when the answer to something not being human-clear is "well that's the programming convention so it's right".

    ReplyDelete