A Tableau Consultant’s View on Data Delivery
“Our ultimate goal is to create a ‘one version of the truth’ data source!” It’s a statement a modern data consultant gets used to hearing, alongside questions such as “how can Tableau Server be used as a centralised, one-version-of-the-truth repository?” or “how can we restrict what people can create with Alteryx so it doesn’t disrupt our one-version-of-the-truth philosophy?” Honestly, I do not like the phrase “one version of the truth”; upon its whispered utterance I immediately ask, “one version of whose truth?”
Seeking out everybody’s truth
So what’s wrong with everybody singing from the same hymn book? (And yes, if you’re going to throw IT catchphrases at me, I’m going to throw corny phrases back.) In theory, nothing. If you’re analysing your company’s profit and loss for an official report, especially one that’s going to Her Majesty’s Government, then you need to be certain that the underlying figures are as good as they can be. But that’s one data source, and a very specific one at that. What advocates of “one version of the truth” tend to promote is the idea that everybody’s question can be answered with a single data source.
Everybody’s question… from one data source…? This is a logical and even a physical impossibility. Like those who travelled to seek out Deep Thought in The Hitchhiker’s Guide to the Galaxy, only to hear that the answer to the ultimate question of life, the universe and everything was 42, those who seek to answer every analytical question that every person in a company could possibly have will inevitably be disappointed with the output. Either the dataset has been so over-simplified for ‘user friendliness’ that no real questions can be answered, or it has been made so large and complicated that nobody stands a chance of understanding how to use it.
The Data Team, The Challenge
So this brings us back to “whose version of the truth?” Let’s think about how one version of the truth comes into being in a company. Somebody in middle to upper management decides that the BI capability needs to be improved and believes that information should be democratised across the organisation. So they set up a meeting to bring together potential end users, line managers of critical departments and the heads of IT. They scope out the known major data systems within the organisation, argue about the complexity of matching these all together, discuss the known unknowns of each, try to ignore the possible existence of unknown unknowns, and many hours, jugs of coffee and Visio diagrams later they come out with a methodology to achieve success. But that’s just one part of the problem: how should they get the data to the users? Well, you can’t just present it as SQL tables and views; most users don’t understand those (big assumption there), and anyway imagine the anarchy of giving users raw, row-level data which they can join and blend at will! Insane!
So what to do? Well, you can either present multiple single views of the data through some kind of delivery mechanism (a locked-down datamart, a data exporter such as Business Objects) or you can try to mash everything together into a single cube. That second one sounds really clean: one source, a controlled logical machine which will present correct figures at different drill-down levels. Sweet! So the team go with that, and 9 to 12 months later (18 in reality, after discovering those unknown unknowns) their cube is ready to be presented to the company.
The outcome? Well, the designer couldn’t decide how many different date fields people would want, so they’ve added them all. Then there are the finance guys who want everything broken down by department, which is actually a cost centre, but not the same cost centre that everybody else uses on the internal CapEx & OpEx planning system. Then of course there are the holes in the data which were never resolved when all these systems were combined into one single source. So you can analyse headcount by calendar year, but not by financial year; that will generate an error.
And that’s just what can be seen in the data connection. What about all the logic written into the system to calculate the values? Some measures present percentages, while a quick switch of a dimension turns that same measure into an average, with a total-average line included… which can’t be hidden. The result? Confusion, blind assumptions, and request for change, after request for change, after request for change. But if people are building reports on existing measures, how can you change the underlying logic? So a new measure and dimensional hierarchy are added, and another, and another. Doesn’t sound very ‘one version of the truth’, does it? Of course they could have just denied every change request, but then nobody would end up using the system.
Data to the People
So where did it all go wrong? In my opinion it all happened back when the first assumption was made: that users couldn’t be allowed to connect directly to the data and combine datasets as they needed. This is the new age of data analysis, and it’s here. Tools such as Alteryx for data blending and Tableau for visual analysis mean end users no longer need to be SQL gurus to make use of row-level data. Yes, I said it… row-level data! And before you argue that you couldn’t possibly make row-level data available, send somebody on a fact-finding mission around the different departments. They’re already doing it! People are becoming experts in sending just the right query to your cube through Excel to get row-level data out. They’re picking the Business Objects report that exports the past 30 days of data into Access, their “new” datamart.
What could the team have done differently? Well, besides trusting that the people asking the questions the data can answer aren’t stupid enough to report the sales they’re responsible for as twice what they really are, they could have used the working group and all that meeting time to transform the IT department. With data blending and analysis moving out of IT and into the lines of business, IT needs to get back to what it’s good at: keeping systems up and performing as well as they possibly can. Companies that can transform their IT departments into the Amazon Web Services of the corporate world will find they can make full use of a workforce used to working with technology that doesn’t require a user manual or a week-long training course. A workforce that has the freedom to do what it needs to do to get the answers to the questions it has.
So yes, I’m not a great fan of one version of the truth. Instead I’d prefer a conversation which allows truths to be tested, refined and made into fact by scientific testing and organic peer review. To make it happen, accept that IT don’t need to report on data any more; they don’t have to understand the entire organisation. Instead they just need to understand query times, server clustering and IOPS.
This is fantastic, Craig. Beyond this, look at some major issues where audited, single versions of the truth turn out to be wrong.
1. Tesco’s £200 million black hole in its profits. That was in audited accounts.
2. Tibco’s black hole in its buy-out strategy. Tibco is a BI company, and even it couldn’t get the truth right.
Even in vetted, audited, reviewed numbers there are errors and disagreements. As you say, “truth” comes down to the involvement of humans and conversations between people. Playing with and diving into the data helps discover multiple truths.
+5 points for referencing Deep Thought
Adding to the data soup are all the external sources via the spooky-sounding “Shadow IT”. It’s data mixology all around.
IT orgs are under tremendous pressure to quantify the business value from their efforts. Many CIOs now report to the CFO, and just keeping the lights blinking green isn’t good enough.
Bonus: Did you just create a new “Truther” epithet?
Very good article. A dream platform comes true once you see and test Tableau!
Great post, Craig! I think companies should take your words on board and rethink their approach to analysing data and sharing results.
Interesting piece, and some food for thought. You may also like Data Warehousing and Sources of Truth: Rarely Pure, Never Simple, available here: http://goodstrat.com/2014/12/19/data-warehousing-and-sources-of-truth-rarely-pure-never-simple/
That catchphrase goes back a long way, 20–30 years, and it deals with a VERY important problem in ANY modern enterprise: having the same data tell the same story. It’s a bit of a classic, so I wonder if you know it. Anyway, here is the most famous use case: the CEO asks Sales, Finance and Logistics how much was sold. He gets back three different answers, and had he asked another department he would have gotten yet another different answer.
So, the “single version of the truth” is achieved more by having clearly defined data than by having an exclusive set of data. More or less like this: Finance reports only cash received (but not necessarily goods delivered), Sales counts cash from orders not yet paid, and Logistics deems “sold” whatever effectively got out of the door. Had all of them agreed on what “sales” meant, there would have been only one answer.
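To make that concrete, here is a minimal sketch of those three “truths”, assuming a hypothetical orders table with order, payment and shipment dates (the table and column names are illustrative, not anybody’s real schema). Same rows, three different date filters, three different answers to “how much was sold in 2014?”:

  -- Finance: a sale is cash actually received in the period
  SELECT SUM(order_value) AS finance_sales
  FROM orders
  WHERE payment_date BETWEEN '2014-01-01' AND '2014-12-31';

  -- Sales: a sale is an order booked in the period, paid or not
  SELECT SUM(order_value) AS sales_dept_sales
  FROM orders
  WHERE order_date BETWEEN '2014-01-01' AND '2014-12-31';

  -- Logistics: a sale is whatever left the warehouse in the period
  SELECT SUM(order_value) AS logistics_sales
  FROM orders
  WHERE shipment_date BETWEEN '2014-01-01' AND '2014-12-31';

Until everyone agrees which of those dates defines a “sale”, the CEO will keep getting three numbers.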
So this is why this “slogan” is not quite a fallacy, but a business case in a nutshell.
That said, the DW industry kept going for the last twenty to thirty years, and now we say “all versions of the truth” because technology is now cheap enough to efficiently acquire and store all the data your company (and parts of the world) generates. So “all versions of the truth” now means you can get the “right” version, the “dirty” version, the individual versions and so on. It is mostly enabled by Data Vault modelling, but columnar and plain data stores (like Hadoop) also play a sizeable part in it.
So, “(…) advocates of ‘one version of the truth’ tend to promote is the idea that everybody’s question can be answered with just a single data source (…)” is not really the case. Maybe some misled novice is saying that, but he is wrong, as it is really, really not the case.
Just think of the consequences of letting go of the clearly defined dataset: you are going to get called back to explain why two people are reporting different numbers on the same issue. It happens all the time in every enterprise I have come across (and it still happens).
Right? 😉
I believe I could go on and on replying to every statement in that text, but I think I’ve made my point here.
It’s good to see that phrase raising someone else’s hackles.
The pursuit of a single version stems from a number of factors, some of them technical, some of them conceptual, and they all lead to the same place: the creation of an answer-anything, all-encompassing analytical engine that spans an entire enterprise, suitable for providing the information needed by everyone.
This pursuit is behind the massive misguidance that’s taken place in business data analysis over the past 20+ years, wasting vast amounts of time, energy, effort, money, and other valuable resources trying to build enterprise-scale, industrial-strength data cathedrals and the analytical facades presented to the information seekers.
Questioning the status quo, the received wisdom of the purse-holders, and the prevailing paradigm is rarely a way to get ahead. Advocating the rejection of the patent silliness of “one version of the truth” hasn’t been fruitful during the heyday of Big BI, but we’re starting to see cracks in the dam. Tableau has been instrumental in this, providing the ability for normal human beings to access, explore, and understand the data that matters to them independently of whether it’s been integrated into the matrix of vetted and approved institutional truthiness.
A couple of years ago I was consulting with one of the quasi-governmental international finance and reconstruction organizations in Washington, D.C., helping them get started with Tableau. It was a local initiative within the organization, supporting economists and statisticians with their data analysis and sharing needs, and it went very well. Once Tableau reached the level of penetration and visibility where it became advantageous for the institution’s BI group to get involved, things changed. Their first response, “we need to get on top of this so we can control things” (a cynic would add ‘preserve our positions and high salaries’), was to embark on a multi-year program to canvass the entire organization, hold meetings, and co-ordinate among the various constituencies to consolidate their meaningful and valuable data, after which, and only after which, would it be made available for analysis with Tableau.
The simple truth, the one that Tableau so beautifully reveals, is that all data is true. All data tells stories that are worth knowing. There is no bad data, only some that’s misunderstood.
The idea that there is a single version of the truth is a bad idea, and it’s time for it to be put out to pasture.