A Tableau Consultant’s View on Data Delivery
“Our ultimate goal is to create a ‘one version of the truth’ data source!”…a statement a modern data consultant will get used to hearing. It comes with questions such as “how can Tableau Server be used as a centralised, one version of the truth repository?” or “how can we restrict what people can create with Alteryx to ensure it doesn’t disrupt our one version of the truth philosophy?” Honestly, I do not like the phrase “one version of the truth”. Upon its whispered utterance I immediately ask “one version of whose truth?”
Seeking out everybody’s truth
So what’s wrong with everybody singing from the same hymn book? (…and yes, if you’re going to throw IT catchphrases at me I’m going to throw corny phrases back). Well, in theory nothing. If you’re analysing your company’s profit and loss for an official report, especially if it’s going to Her Majesty’s Government, then you need to be certain that the underlying figures are as good as they can be. But that’s one data source, and a very specific one at that. What advocates of “one version of the truth” tend to promote is the idea that everybody’s question can be answered with just a single data source.
Everybody’s question…from one data source…? This is a logical and even a physical impossibility. Like those who travelled to seek out Deep Thought in The Hitchhiker’s Guide to the Galaxy, only to hear that the answer to the ultimate question of life, the universe and everything was 42, those who seek to answer every analytical question that every person in a company could possibly have will inevitably be disappointed with the output. Either the dataset has been so simplified for ‘user friendliness’ that no real questions can be answered, or it has been made so large and complicated that nobody has a chance of understanding how to use it.
The Data Team, The Challenge
So this brings us back to “whose version of the truth?” Let’s think about how one version of the truth comes into being in a company. Somebody in middle to upper management decides that the BI capability needs to be improved and believes that information should be democratised across the organisation. So they set up a meeting to bring together potential end users, line managers of critical departments and the heads of IT. They scope out the known major data systems within the organisation, argue about the complexity of matching these all together, discuss the known unknowns of each, try to ignore the possible existence of unknown unknowns, and many hours, jugs of coffee and Visio diagrams later they come out with a methodology to achieve success. But that’s just one part of the problem: how should they get the data to the users? Well, you can’t just present it as SQL tables and views. Most users don’t understand those (big assumption there), and anyway, imagine the anarchy of giving users raw, row level data which they can join and blend at will! Insane!
So what to do? Well you can either present multiple single views on the data through some kind of delivery mechanism (a locked down datamart, a data exporter such as Business Objects) or you try to mash everything together into a single cube. That second one sounds really clean. One source, a controlled logical machine which will present correct figures at different drill-down levels, sweet! So the team go with that and 9 to 12 months (18 in reality after discovering those unknown unknowns) later their cube is ready to be presented to the company.
The outcome? Well, the designer couldn’t decide how many different date fields people would want, so they’ve added them all. Then there are the finance guys who want everything broken down by department, which is actually a cost centre, but not the same cost centre that everybody else uses on the internal CapEx & OpEx planning system. Then of course there are the holes in the data which were never resolved when all these systems were combined into one single source. So you can analyse headcount by calendar year, but not by financial year; that will generate an error.
And that’s just what can be seen in the data connection. What about all the logic written into the system to calculate the values? Some measures present percentages, while a quick switch of a dimension will turn that same measure into an average with a total average line included…which can’t be hidden. The result? Confusion, blind assumptions, and request for change, after request for change, after request for change. But if people are building reports on existing measures, how can you change the underlying logic? So a new measure and dimensional hierarchy is added, and another and another. Doesn’t sound very ‘one version of the truth’ does it? Of course they could have just denied every change request, but then nobody would end up using the system.
Data to the People
So where did it all go wrong? In my opinion it all happened back when the first assumption was made: that users couldn’t connect directly to the data and combine datasets as they needed. The new age of data analysis is here. Tools such as Alteryx for data blending and Tableau for visual analysis mean end users no longer need to be SQL gurus to make use of row level data. Yes I said it…Row Level Data! And before you argue that you couldn’t possibly make row level data available, send somebody on a fact finding mission around different departments. They’re already doing it! People are becoming experts in sending just the right query to your cube through Excel to get row level data out. They’re picking the Business Objects report to export the past 30 days of data into Access, their “new” datamart.
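To make the headcount example from earlier concrete, here is a minimal sketch in plain Python of why row level data sidesteps the “calendar year but not financial year” problem. Everything in it is made up for illustration: the employee rows, the field names, the helper functions, and the assumption that the financial year starts in April. It counts starters per year rather than modelling a real HR system.

```python
from collections import Counter
from datetime import date

# Hypothetical row-level data: one row per employee, with a start date.
ROWS = [
    {"employee": "A", "start": date(2014, 2, 10)},
    {"employee": "B", "start": date(2014, 5, 1)},
    {"employee": "C", "start": date(2015, 1, 20)},
    {"employee": "D", "start": date(2015, 4, 6)},
]

# Assumption: the financial year runs from April to March.
FY_START_MONTH = 4


def calendar_year(d):
    """Label a date by its calendar year."""
    return d.year


def financial_year(d, start_month=FY_START_MONTH):
    """Label a date by the calendar year in which its financial year begins."""
    return d.year if d.month >= start_month else d.year - 1


def count_by(rows, key):
    """Count rows grouped by whatever year definition the analyst chooses."""
    return dict(Counter(key(row["start"]) for row in rows))


by_calendar = count_by(ROWS, calendar_year)    # {2014: 2, 2015: 2}
by_financial = count_by(ROWS, financial_year)  # {2013: 1, 2014: 2, 2015: 1}
```

The point is not the code itself but who owns the definition of “year”. With the rows to hand, the person asking the question picks the grouping, instead of raising a change request against the cube.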
What could the team have done differently? Well, besides trusting that the people with the questions aren’t stupid enough to report the sales they’re responsible for as twice what they really are, they could have used the working group and all that meeting time to transform the IT department. With data blending and analysis moving out of IT and into the line of business, IT needs to get back to what it’s good at…keeping systems up and performing as well as they possibly can. Companies that can transform their IT departments into the Amazon Web Services of the corporate world will find they can make full use of a workforce used to working with technology that doesn’t require a user manual or a week-long training course. A workforce that has the freedom to do what they need to do to get the answers to the questions they have.
So yes, I’m not a great fan of one version of the truth. Instead I’d prefer a conversation which allows truths to be tested, refined and made into fact by scientific testing and organic peer review. To make it happen, accept that IT don’t need to report on data any more, and that they don’t have to understand the entire organisation. Instead they just need to understand query times, server clustering and IOPS.