<< Back

Data Scaffolding in Tableau Prep

I previously wrote a blog explaining how to create a data scaffold using Alteryx. Please review that blog for a more detailed explanation of what a scaffold data set is and why you may need to use this technique.

This diagram from that blog shows one particular use case, where filtering the data may remove certain rows from the data that are required for a complete visualisation:

I previously explained how to create a scaffold data from superstore data using the Append Tool in Alteryx. A recent blog by Ben Moss explains how to replicate the function of an ‘Append Data Tool’ using Tableau Prep.

First, I’d like to re-introduce the problem that needed to be solved using a scaffold in my previous blog:

When looking at all of Superstore data there are 793 unique customers, however when you filter to just the ‘Technology’ Category there are only 687. If you wanted a visualisation that showed all customers, but have it being able to be filtered to a specific Category you would not be able to, as some Customers have not made purchases of ‘Technology’ products.

 

The steps required to create the scaffold for this particular problem are as follows:

  1. Create 3 separate ‘Aggregate’ steps from Superstore data;
    • Group By Customer Name
    • Group By Category
    • Group by Customer Name & Category
  2.  Create a calculated field with the dummy value of ‘1’ which will be used as the join clause in step 3.
  3. Join the Customer Name Aggregation to the Category Aggregation using the dummy field. This will result in every row from the Customers Agg. being joined to every row of the Category Agg.
  4. Add another join between the combined Customer & Category Agg. with this Cartesian join output. Join using Customer Name and Category as join clauses. Make sure you just take the LEFT output only as this will be the combination of Customer and Category that do not currently exist in the data. These rows are now what you use to create the data that was missing when filtering.
  5. Clean the columns up so you just have Customer Name and Category remaining
  6. Union the scaffold rows to the original data set
  7. Output to .hyper

You can keep adding aggregated outputs using this Cartesian join approach to add more functionality to your scaffold. For example, if you wanted to add a country filter, you would add an additional aggregation to group by ‘Country’, add the dummy field then an additional join.

Phillip Lowe

London, UK

Leave a Reply

Your email address will not be published. Required fields are marked *