## How to build a Sankey diagram in Tableau without any data prep beforehand

First of all I want to say this post is hugely indebted to Olivier Catherin and his post four years ago on building Sankeys in Tableau and Jeff Schaffer whose work Olivier built off and countless others I probably don’t even realise.

This post sets out how to build a Sankey Diagram without any data prep before Tableau. The viz below is built off the vanilla Superstore data packaged with Tableau, but I must warn you there’s a labyrinth of table calculations to get to this point.

There are 20 calculations to get through to get to this stage so what makes it worth it? All previous solutions I’ve seen for building Sankeys have needed to multiply the size of the data by at least a factor of two beforehand. Some data sets are so large that we don’t want to do that and sometimes we come across data sources such as published data sources on Tableau Server where we’re not the owners and this isn’t an option. So if we can work off the vanilla data set it’d be worth the effort.

So without further ado let’s get going.

### Step 1: Dimension and measure parameterisation

Set up 3 calculations, 2 for your dimensions and 1 for your measure. It’s useful down the line to reference these in case you want to change which dimensions you want in the view part way through building and you can edit the calcs rather than fiddling with “replace references”. In my example I used these calculations where [Select Dimension 1] and [Select Dimension 2] are parameters I made to fit the case statement:

#### Dimension 1

`CASE [Select Dimension 1]`

WHEN 1 THEN [Region]

WHEN 2 THEN [Category]

WHEN 3 THEN [Sub-Category]

WHEN 4 THEN [Segment]

WHEN 5 THEN [Ship Mode]

END

#### Dimension 2

`CASE [Select Dimension 1]`

WHEN 1 THEN [Region]

WHEN 2 THEN [Category]

WHEN 3 THEN [Sub-Category]

WHEN 4 THEN [Segment]

WHEN 5 THEN [Ship Mode]

END

#### Chosen Measure

`[Sales]`

### Step 2: Create a frame for data densification

Comparing our measure to the fixed min of the measure we can ensure two data points to hang our data densification from.

#### Path Frame

`IF [Chosen Measure] = {FIXED : MIN([Chosen Measure])} THEN 0 ELSE 97 END`

#### Path Frame (bin)

For this create bins of size 1 from Path Frame

### Step 3: Index

#### Path Index

`Index()`

This is computed along Path Frame (bin) and allows us to do calculations across the Path Frame range.

### Step 4: Sigmoid set up

First we have a variable, T, and then the sigmoid curve is calculated using it.

#### T

`IF [Path Index] < 50`

THEN (([Path Index]-1)%49)/4-6

ELSE 12 - (([Path Index]-1)%49)/4-6

END

#### Sigmoid

`1/(1+EXP(1)^-[T])`

### Step 5: Sankey arm sizing

This gives us the size of each Sankey arm as a percentage of the full data set.

#### Sankey Arm Size

`SUM([Chosen Measure])/TOTAL(SUM([Chosen Measure]))`

### Step 6: Top line calculations

This method requires separate lines for the top and bottoms of each Sankey arm and throughout these calculations “Position 1” will refer to Dimension 1 on the left hand side and “Position 2” to Dimension 2 on the right. Here are the calculations for the tops:

#### Max Position 1

`RUNNING_SUM([Sankey Arm Size])`

#### Max Position 1 Wrap

`WINDOW_SUM([Max Position 1])`

#### Max Position 2

`RUNNING_SUM([Sankey Arm Size])`

#### Max Position 2 Wrap

`WINDOW_SUM([Max Position 2])`

### Step 7: Bottom line calculations

These calculations help generate the bottom lines:

#### Max for Min Position 1

`RUNNING_SUM([Sankey Arm Size])`

#### Min Position 1

`RUNNING_SUM([Max for Min Position 1])-[Sankey Arm Size]`

#### Min Position 1 Wrap

`WINDOW_SUM([Min Position 1])`

#### Max for Min Position 2

`RUNNING_SUM([Sankey Arm Size])`

#### Min Position 2

`RUNNING_SUM([Max for Min Position 2])-[Sankey Arm Size]`

#### Min Position 2 Wrap

`WINDOW_SUM([Min Position 2])`

### Step 8: Sankey polygon calculation

This calculation brings together all of the above once set up correctly. The table calculation set up itself will come in later steps.

#### Sankey Polygons

`IF [Path Index] > 49`

THEN [Max Position 1 Wrap]+([Max Position 2 Wrap]-[Max Position 1 Wrap])*[Sigmoid]

ELSE [Min Position 1 Wrap]+([Min Position 2 Wrap]-[Min Position 1 Wrap])*[Sigmoid]

END

### Step 9: Prepare the sheet

Put Path Frame (bin) onto Rows, then right click it and select “show missing values” as below

This should give you something like below. This is the step that enables our data densification within Tableau.

Then move Path Frame (bin) to detail and add Dimension 1 and Dimension 2 to detail too. Next, put [T] on Columns and calculate it along Path Frame (bin). Finally change mark type to Polygon, and add [Path Index] to path and calculate it along Path Frame (bin). This should leave you with something like this:

### Step 10: Add Sankey polygons

Add [Sankey Polygons] to Rows. This is going to look a bit messy until it is calculated correctly across all nested calculations.

This first image shows all of the nested calculations involved and we need to go through each one setting them up correctly.

Below I’ve shown the settings for each of the 12 nested calculations to make this work. If it doesn’t look right at the end, make sure that not only the correct fields are ticked in each box but that they’re ordered correctly as well.

Now hopefully you have something that looks like this! (If not go back and check each table calculation step carefully)

### Step 11: Dashboarding

Now we’re ready to put this onto a dashboard with stacked bars either side for dimensions 1 and 2 to complete our Sankey.

This graph is filterable if you want it to be, just make sure they are on context so that they happen before the Fixed LOD calculation in Tableau’s order of operations, otherwise the filters might knock out the Path Frame calculation that all of this hinges upon.

I hope you’ve made it to the end and have been able to replicate this Sankey. Thanks for reading 🙂

Awsome! have been waiting for someone to pull this through 🙂

Brilliant and works fantastic.

Cheers

Daniel

This Is amazing! So worth the 20 calculations in my opinion, thank you!!

Where can I find the data set used for this sankey? Thank you

This is sample-superstore, it comes packaged with Tableau and is in the bottom left hand corner of the window when you first open up the program.

Fantastic – thanks very much for this!

Thank you for figuring this out and sharing; I’m looking forward to trying this out!

Hi Ian, I have a problem with my sankey, I would like that by selecting one of the two columns, the other one is highlighted.

I have applied the highlights but, only one of the dimensions is highlighted. I can get the result I want only by selecting the sankey curves, it is possible to have the same result by selecting the dimensions?

Thanks

Anna

Wow!

Well done. I can’t wait to try it

Thanks for sharing

This is great! Thanks so much, it works so well.

One thing that I spotted – if you have a row of data that only occurs once in that combination in the dataset and the chosen measure is the minimum value across the dataset, it upsets the arm size formula. For example, if sales from the South in Office Supplies were $1k, and this is the minimum value in the dataset, and this was the only row in the dataset where the South and Office Supplies occurred, this would cause a problem. A quick workaround is to open up the dataset in Excel and just add in additional rows with the chosen measure set to zero for any records where this is the case.

Thanks again 🙂

This is very helpful! Joe–I am running into this exact problem…but the data I am using is connected to a larger database used by many people (i.e. I cannot add rows of data). Are there any other workarounds or solutions? Thanks!

If I’m understanding your question, my work around was to use a different measure to create the path frame. In my case, I had one record for each client so the record number was 1 for everyone. But I had a measure of days of service so I used that to great the path frame. It worked just great.

G’day Ian,

I have wanted to implement this chart for about three years now, but the data prep made this impossible.

Until now….

Thanks so much for putting this together.

Cheers,

Ben

Really great work Ian, thank you for providing such detailed instructions.

Any thoughts as to how you would be able to space out the dimensions like Olivier did in his visual from 2015? I have been trying to modify the calculations to replicate the positioning and I can’t quite figure it out.

David,

I got the simple decision tree to work but can’t figure out the polygon loveliness. I’ve posted on tableau community here: https://community.tableau.com/message/775142#775142

my workbook with a request for help. Perhaps you can figure it out!

Nicole

This is amazing and I was able to create this with two different days sets. Thanks.

I’m trying to figure out how to change the calculations so as to have it on a decision tree format, similar to how

Oliver Catherine did it. I’m guessing it is in the nested table calculation, but can’t get it right. Any suggestions?