Health check your data using Alteryx and Tableau
I often get data and need to know how clean it is before I start looking at the data, either because I’m going to be using it directly in a Tableau visualisation, or because I have some work to do it in Alteryx. Sometimes I just want a quick way of going back to the supplier and saying “Yes, that’s fine” or “There’s a problem with column X”.
This is where the visualisation below comes in, I call it a “DNA profile” (Data nose-around), it shows at a glance any issues. We see each column has a profile bar and it’s rows are represented as lines. Each line is coloured and tells us if the data is a Null or Empty, whether it’s a Float, String or Integer. This can help with no end of things; e.g. checking joins (you want to ensure you have no NULLs so they’ll join!), checking data consistency (is all your Sales data populated for every State), etc.
Let’s look at the final dashboard, then look at how I produced it.
[tableau server=”public.tableausoftware.com” workbook=”Workbook_9″ view=”DNADashboard” tabs=”no” toolbar=”no” revert=”” refresh=”yes” linktarget=”” width=”600px” height=”800px”][/tableau]
Some things immediately jump out from this dataset (which is a download of railway station use in the UK). Firstly the NLC appears to be all Integer – let’s check by clicking on the Null and String highlights at the bottom of the screen (there aren’t enough pixels to show every row bar at once) – yes it’s all integer. Perfect…
..and take a look at all those following String Columns, no mismatched datatypes their either, perfect. But oh dear Station Group has a lot of NULLs not happy at that!
Even worse look at what happens for the Entries fields, there’s a whole block of blue String fields where the rest of the data shows Integer, these are meant to all show Integer. What is going on here? [I can tell you by looking at the data that someone has added comma’s into the data as a thousands separator – that might cause an issue in my import later – glad I caught it!]. The fact I pre-sorted the data on one of the region columns tells me this is limited to one region [in this case London] which is also very nice, in the original sort order this may not have stood out.
So you see how useful this can be to just see a quick helicopter view of the data – is it healthy? or is it full of Nulls and other rubbish?
Producing the Viz Data
I wrote a quick module in Alteryx to transform the data, which you can see below:
The main steps where:
1. Import the data
2. Identify the NULLs, Empty, Integer and Floats for each column [using the Multi-Field Formula tool to apply one formula to apply this to every column]
3. Assign a flag to say which Type the data cell was (I assumed it can only be one – though obviously Integers are a subset of Strings)
4. “Flip” the data – making it tall and thin for Tableau reporting, i.e. transpose so we have one Dimension called [Column Name], with a [Column Value] Measure, rather than having separate dimensions per column.
I then took my module and encapsulated into another, there was no technical reason that meant I had to do this as a macro, it just gave me a more aesthetic result:
Note the interface icons which I will come onto, but otherwise this module isn’t doing much, it is just importing the data, sorting it and then calling the transformation macro, before outputting to a Tableau workbook [note: this final tool can be obtained from the Visual Analytics toolkit from Alteryx – loads of great Tableau tableau specific resources inside that].
Running the module
If you have Alteryx and Tableau you can take my module, which I’ve shared below, simply change the input and run it in the designer, then click on the twb link in the log to open up the resulting workbook. However I’ve also leveraged the power of Alteryx, using those 3 or 4 interface tools, to create an app. You can also use this rather than going to the designer:
Now I can use this “app” to select a file and sort order and…using the clicks below:
Our new Viz opens directly in Tableau. How cool is that? No coding required for really nice, useful application.
Please try the download here and let me know how you get on, clearly this is the start of what could be a much grander data understanding visualisation, so let me know what else you’d like to see. Also remember you can try Alteryx for free for 14 days by downloading it via alteryx.com.
The above process could be improved, e.g. by calling tableau.exe so it opens automatically after the app runs but with different versions of Tableau out there I couldn’t distribute this version and guarantee it would work. I’d also like to publish a version up to the Alteryx Gallery so the app was on the web – but allowing the tde to open immediately wasn’t possible because of security limitations.