While keeping a weather eye on the goings-on in the business world, I'm frequently fascinated by what I see presented as insights into, and valuable information about, the state of business data analysis (BDA).
Data analysis has been around ever since people started keeping records. In fact, there's no reason to keep records if they're not going to be used to inform someone in the future. You'd think we'd have gotten pretty good at it by now. But no...
Computer-assisted BDA (CaBDA, hmmm, may be on to something here) took a huge dive down a rabbit hole a couple of decades ago. Up until then we'd been making pretty good progress. When I started in the field we were still using punch cards. My first project, in the University of Guelph's Introduction to Data Processing course, was using COBOL to analyze bird banding data for the North American bird watching society. COBOL programming was the standard for CaBDA for a long time. Then some clever people invented Ramis, FOCUS, and other 4GLs that were a tremendous leap forward: the fundamental analytical operations were abstracted into English-based, high-level declarative languages that would analytically process data, transforming it and producing reports on line printers and terminals. Fast forward to today, and Tableau does, in many ways, the same things—there are only a handful of basic analytical operations—only much, much better, because of its invention of a user interface that represents data and directly supports the analytical things people do with it, incorporating visualization best practices into the gestalt.
I didn't forget about the rabbit hole. During the mid- to late '80s, and into the 1990s, things changed. Prior to then the movement had been to bring business data closer to the business—to make it easier to access and analyze. The tools and technologies were evolving from mainframe-terminal block-mode interfaces, to non-mainframe character-based interactive models, and finally to GUIs with fully integrated windowing systems.
Unfortunately, at the same time business data began to slip away from the business, sucked back behind the technology curtain, where only Oz, the Great and Powerful (i.e. IT), was capable of creating, holding, and safeguarding it—and occasionally doling some of it out from time to time. There are many reasons why this happened. Among them is that the near-simultaneous emergence of relational database theory and management systems, coupled with the success of table-based PC database products, meant that anyone with minimal skills could create applications to capture and store data. The huge problem here was that this data was impossible for non-technical people to understand, scattered across the organization, and inconsistent in every possible way.
Into this sorry state of affairs rode the solution, the savior who was going to bring order to the chaos. The emergence of data warehousing was touted as the means to make business data available for analysis, and analyzable. A great, grand theory. And it might have worked. But it didn't. It was, overall, a failure on a scale that should have shamed everyone involved. Even the most optimistic estimates and assessments showed that the failure rate of data mart- and warehouse-based enterprise BI projects was around half. Half. In what other human endeavor would a failure rate of half not have been enough to stop the continual, very large expenditure of time, energy, effort, money, and human resources?
Fortunately, things are changing. The emergence of new tools, Tableau first among them, has led to a sea change in the way data analysis is conducted, and this change is percolating across the landscape, even seeping deep down into the dark chambers where corporate BI hoards its mines of valuable data. But the old ways don't give up easily; too many people and organizations have too much invested in the way things have been done—billions and billions of dollars of BI revenue from the sale of Big BI Databases and Platforms, and the billings from armies of BI technical resources. And what does it say that people are called 'resources'? I'm serious: this single, simple word encapsulates most of what's wrong with the prevailing CaBDA paradigm.
I could go on. And likely will, soon enough (too soon for some). On to the reason I was prompted to write this post.
I read the following information-management.com article this evening: Paxata Gives Back Analysts' Valuable Time.
Fascinating reading. It makes the case that a new data analytical tool will make CaBDA faster, more efficient, and better—because its primary interface is the old familiar spreadsheet model.
Really. The future of CaBDA is bright because there's a new spreadsheet-like tool that will make back-room analysts' data preparation jobs easier.
I submitted the following comment (I wonder if it'll get through):
There are two major flaws in this article.
The first is neatly encapsulated in the statement: "In fact, our research shows that analysts consistently spend anywhere from 40 percent to 60 percent of their time in the data preparation phase that precedes actual analysis of the data." This reveals a failure to recognize that the concept of an "actual analysis" of data as somehow a separate realm is fundamentally wrong. There is no "actual analysis" of data that occurs as an end point of business data analysis. Rather, data analysis occurs everywhere there's data to be understood, all the way from the raw source data through to the enterprise-homogenized data that is, apparently, in this traditional framing, the be-all and end-all of business data analysis.
There are many reasons for the traditional state of affairs. But the present and future need not be locked into the same sad conditions.
Things have changed with the emergence of the modern generation of direct-access, immediate-results, human-oriented data analytical tools. Tableau, the most visible, and its cousins bring fast, highly effective data analysis to everyone, including the 'analysts' who need to understand the data within their horizons, not just the end-point business consumers. Using modern tools across the spectrum can eliminate the need to build elaborate data cathedrals in many cases, and in those circumstances where data marts and warehouses are still useful, they can be built better, faster, and cheaper when the new tools are brought to bear across the activity spectrum. There's no reason why a data warehouse project can't deliver valuable outputs right after initiation, and continue to deliver new value for its lifetime, at a mere fraction of the cost of the traditional ways. Vendors won't generally tell you this, because their revenues are based on selling highly expensive platforms, legions of consultants and their billable hours, or both.
The second flaw is the assumption that the spreadsheet is a good or effective mechanism for data analysis. It is not. It is, in fact, very poor at the job; there are much, much better tools available, any of them superior to spreadsheets. Spreadsheets' wide use for analysis is due to historical factors, not to their suitability. Over the past 40 years there have been a fair number of attempts to use a variety of table-based approaches for data analysis and reporting; none of them has succeeded in overcoming the basic fact that there's a fundamental cognitive mismatch between their structure and how people conceive of data, of the analytical relationships between data elements, and of the foundational analytical operations.
The landscape is changing—it already has changed more than the conservative traditional business environment and media recognize and acknowledge. It won't be business as usual; it will be business done better, because the essence of BI (helping everyone understand the data that matters to them) will be better served.
Ahhh. I feel refreshed. That was a nice break from pointing out how Tableau could be even better.