<< Back

How to build a Sankey diagram in Tableau without any data prep beforehand

First of all I want to say this post is hugely indebted to Olivier Catherin and his post four years ago on building Sankeys in Tableau and Jeff Schaffer whose work Olivier built off and countless others I probably don’t even realise.

This post sets out how to build a Sankey Diagram without any data prep before Tableau. The viz below is built off the vanilla Superstore data packaged with Tableau, but I must warn you there’s a labyrinth of table calculations to get to this point.

There are 20 calculations to get through to get to this stage so what makes it worth it? All previous solutions I’ve seen for building Sankeys have needed to multiply the size of the data by at least a factor of two beforehand. Some data sets are so large that we don’t want to do that and sometimes we come across data sources such as published data sources on Tableau Server where we’re not the owners and this isn’t an option. So if we can work off the vanilla data set it’d be worth the effort.

So without further ado let’s get going.

Step 1: Dimension and measure parameterisation

Set up 3 calculations, 2 for your dimensions and 1 for your measure. It’s useful down the line to reference these in case you want to change which dimensions you want in the view part way through building and you can edit the calcs rather than fiddling with “replace references”. In my example I used these calculations where [Select Dimension 1] and [Select Dimension 2] are parameters I made to fit the case statement:

Dimension 1

CASE [Select Dimension 1]
WHEN 1 THEN [Region]
WHEN 2 THEN [Category]
WHEN 3 THEN [Sub-Category]
WHEN 4 THEN [Segment]
WHEN 5 THEN [Ship Mode]
END

Dimension 2

CASE [Select Dimension 1]
WHEN 1 THEN [Region]
WHEN 2 THEN [Category]
WHEN 3 THEN [Sub-Category]
WHEN 4 THEN [Segment]
WHEN 5 THEN [Ship Mode]
END

Chosen Measure

[Sales]

Step 2: Create a frame for data densification

Comparing our measure to the fixed min of the measure we can ensure two data points to hang our data densification from.

Path Frame

IF [Chosen Measure] = {FIXED : MIN([Chosen Measure])} THEN 0 ELSE 97 END

Path Frame (bin)

For this create bins of size 1 from Path Frame

 

Step 3: Index

Path Index

Index()

This is computed along Path Frame (bin) and allows us to do calculations across the Path Frame range.

Step 4: Sigmoid set up

First we have a variable, T, and then the sigmoid curve is calculated using it.

T

IF [Path Index] < 50
THEN (([Path Index]-1)%49)/4-6
ELSE 12 - (([Path Index]-1)%49)/4-6
END

Sigmoid

1/(1+EXP(1)^-[T])

Step 5: Sankey arm sizing

This gives us the size of each Sankey arm as a percentage of the full data set.

Sankey Arm Size

SUM([Chosen Measure])/TOTAL(SUM([Chosen Measure]))

Step 6: Top line calculations

This method requires separate lines for the top and bottoms of each Sankey arm and throughout these calculations “Position 1” will refer to Dimension 1 on the left hand side and “Position 2” to Dimension 2 on the right. Here are the calculations for the tops:

Max Position 1

RUNNING_SUM([Sankey Arm Size])

Max Position 1 Wrap

WINDOW_SUM([Max Position 1])

Max Position 2

RUNNING_SUM([Sankey Arm Size])

Max Position 2 Wrap

WINDOW_SUM([Max Position 2])

Step  7: Bottom line calculations

These calculations help generate the bottom lines:

Max for Min Position 1

RUNNING_SUM([Sankey Arm Size])

Min Position 1

RUNNING_SUM([Max for Min Position 1])-[Sankey Arm Size]

Min Position 1 Wrap

WINDOW_SUM([Min Position 1])

Max for Min Position 2

RUNNING_SUM([Sankey Arm Size])

Min Position 2

RUNNING_SUM([Max for Min Position 2])-[Sankey Arm Size]

Min Position 2 Wrap

WINDOW_SUM([Min Position 2])

Step 8: Sankey polygon calculation

This calculation brings together all of the above once set up correctly. The table calculation set up itself will come in later steps.

Sankey Polygons

IF [Path Index] > 49
THEN [Max Position 1 Wrap]+([Max Position 2 Wrap]-[Max Position 1 Wrap])*[Sigmoid]
ELSE [Min Position 1 Wrap]+([Min Position 2 Wrap]-[Min Position 1 Wrap])*[Sigmoid]
END

Step 9: Prepare the sheet

Put Path Frame (bin) onto Rows, then right click it and select “show missing values” as below

 

This should give you something like below.  This is the step that enables our data densification within Tableau.

Then move Path Frame (bin) to detail and add Dimension 1 and Dimension 2 to detail too. Next, put [T] on Columns and calculate it along Path Frame (bin).  Finally change mark type to Polygon, and add [Path Index] to path and calculate it along Path Frame (bin). This should leave you with something like this:

Step 10: Add Sankey polygons

Add [Sankey Polygons] to Rows. This is going to look a bit messy until it is calculated correctly across all nested calculations.

This first image shows all of the nested calculations involved and we need to go through each one setting them up correctly.

Below I’ve shown the settings for each of the 12 nested calculations to make this work. If it doesn’t look right at the end, make sure that not only the correct fields are ticked in each box but that they’re ordered correctly as well.

 

Now hopefully you have something that looks like this! (If not go back and check each table calculation step carefully)

Step  11: Dashboarding

Now we’re ready to put this onto a dashboard with stacked bars either side for dimensions 1 and 2 to complete our Sankey.

This graph is filterable if you want it to be, just make sure they are on context so that they happen before the Fixed LOD calculation in Tableau’s order of operations, otherwise the filters might knock out the Path Frame calculation that all of this hinges upon.

 

I hope you’ve made it to the end and have been able to replicate this Sankey. Thanks for reading 🙂

Ian Baldwin

London, UK

81 thoughts on “How to build a Sankey diagram in Tableau without any data prep beforehand

  1. Awsome! have been waiting for someone to pull this through 🙂
    Brilliant and works fantastic.

    Cheers
    Daniel

  2. i did the same what you explain but i didn’t get viz as you shown i am getting t and sankey polygon axes only in my viz,what is the reason ?

  3. Hi Ian, I have a problem with my sankey, I would like that by selecting one of the two columns, the other one is highlighted.
    I have applied the highlights but, only one of the dimensions is highlighted. I can get the result I want only by selecting the sankey curves, it is possible to have the same result by selecting the dimensions?
    Thanks
    Anna

  4. Thanks, Ian.

    I’m unable to produce the chart following your guide. While checking step-by-step, I found that at Step 9 after I move “Path Frame (bin)” to Details. there’s only two values shown, i.e 0 and 97, rather than the 98 values. I don’t know why this is the case, and hope you can kindly advise.

  5. Thanks very much for this. I’ve been able to produce some great Sankey diagrams (and a useful tip – add a dashboard action so, if you hover over a left- or right-column dimension, it highlights the flows from or to that entry).

    I do have one problem however. I’ve added a choice of Chosen Measures, using a parameter as for the dimensions. When these measures are numerical fields such as cost, all works fine. What I want to do is a count-based, rather than a cost-based Sankey bar. I’ve tried using [Number of Records] but this gives me a blank chart, and I’ve tried adding a dummy field, either constant (1) or using a formula to produce an almost constant psuedo-random number (1.0001, etc).

    The only way I can make it work is to do a kludge like this:
    IF RIGHT(STR([Issue ID]),1) = ‘1’ THEN 0 ELSE 97 END
    instead of:
    IF [Chosen Measure] = {FIXED : MIN([Chosen Measure])} THEN 0 ELSE 97 END
    in Path Frame. This looks OK with larger data sizes but as the data size decreases, I see distortions in the Sankey arms and mismatches between the arms and the left/right-hand columns. I think this is because Path Frame is making unexpectedly large transitions midway across the chart.

    Do you have any suggestions for this?

  6. This is great. I have a dataset of over 1.3 million rows and it is very responsive.

    I am having two issues which are probably related. In my data, I am looking at counts of records rather than a measure like sales. I cannot see how to update the Path Frame calculation to support a distinct count of records. My data has a 24 character ID, which if I reference as a decimal measure, gives me a correct calculation for the Path Frame. But it would be great to be able to show the distinct count of records in the sankey tooltip. My second issue is around filtering. I have added filters to context and I can filter and remove small groups from the sankey, but if the group is too large the sankey disappears. Any help with either of these would be appreciated.

    Thanks again for sharing.

  7. This is great! Thanks so much, it works so well.

    One thing that I spotted – if you have a row of data that only occurs once in that combination in the dataset and the chosen measure is the minimum value across the dataset, it upsets the arm size formula. For example, if sales from the South in Office Supplies were $1k, and this is the minimum value in the dataset, and this was the only row in the dataset where the South and Office Supplies occurred, this would cause a problem. A quick workaround is to open up the dataset in Excel and just add in additional rows with the chosen measure set to zero for any records where this is the case.

    Thanks again 🙂

    1. This is very helpful! Joe–I am running into this exact problem…but the data I am using is connected to a larger database used by many people (i.e. I cannot add rows of data). Are there any other workarounds or solutions? Thanks!

      1. If I’m understanding your question, my work around was to use a different measure to create the path frame. In my case, I had one record for each client so the record number was 1 for everyone. But I had a measure of days of service so I used that to great the path frame. It worked just great.

        1. Hi Nicole,

          I’m facing the same issue. I want to use number of records (always 1) to show the flow between dimensions.

          Is there a way to create a new field from within Tableau which will be able to create the path frame? Maybe something like: {fixed p_key : avg(random()) } ? (btw this didn’t work)

  8. I have built it following the steps however I cannot see any data in the sankey sheet.
    Why do you think this could be?

  9. G’day Ian,

    I have wanted to implement this chart for about three years now, but the data prep made this impossible.

    Until now….

    Thanks so much for putting this together.

    Cheers,

    Ben

  10. This worked like a charm. I was able to use with data on Tableau server without reshapping or adding a zillion rows. As Joe notes, if your chose measure has a set value across all data – in my case it was number of cases which was the same as record number – it won’t work. I picked another measure – days of service – and used that in the path frame dimension.

    Today I was able to do a second Sankey for a co-worker in short order. Really appreciate the screen shots and your detail. This will come in so useful for me.

  11. Really great work Ian, thank you for providing such detailed instructions.

    Any thoughts as to how you would be able to space out the dimensions like Olivier did in his visual from 2015? I have been trying to modify the calculations to replicate the positioning and I can’t quite figure it out.

  12. This is amazing and I was able to create this with two different days sets. Thanks.

    I’m trying to figure out how to change the calculations so as to have it on a decision tree format, similar to how
    Oliver Catherine did it. I’m guessing it is in the nested table calculation, but can’t get it right. Any suggestions?

  13. Thanks for the post. Unfortunately, I’m unable to replicate the chart. Should it work with Tableau 10.2? I tried following the steps twice and double checked them but the result is an empty book. When I randomly change the “At the level” fields in Step 10 I get some shapes but, unfortunately, they are far from the ideal output. Do you know how to troubleshoot?

  14. Thanks for this amazing sankey tutorial !

    I’m trying to use an aggregated field for the chosen Measure (like number of record) but I have some difficulties with the calculation of the Path frame.

    Any suggestions ?

    Thanks in advance

  15. Ian – this has been an amazing resource for one of our charts, thank you!

    One thing we have noticed is that when filtered so that some of the Sankey arms are potentially zero, the calcs are not quite right and the bands overlap. My way around that was a tweak to the Sankey Arm calculation to add the ZN function to ZN(SUM([Chosen Measure]))/ZN(TOTAL(SUM([Chosen Measure])))

    I need to convince myself of how/why this tweak works on filtered versions with zero bands, but for now it seems to do the trick, useful to know if anyone else has similar problems.

  16. Hi Ian,

    Thank you for providing the steps to create this Sankey! It’s really amazing.

    I’m trying to replicate the above steps using a sample data that i have.
    I manage to display the Sankey diagram plus the 2 stacked bars at the side. However, the sankey diagram doesn’t align according to the specific category based on the stacked bar. Do you have any ideas on what might be the cause?

    Thanks!!

  17. Fantastic piece Ian! Really appreciate your effort 🙂
    I have a ‘Week’ field that i want to put in the filter. But cant seem to figure out how to do it? Can you please guide me. Thanks

  18. Hi , My measure is “Number of Records” but it is not working.
    How do I fix this issue? I do not want to change the measure to anything else.

  19. Hello Ian,

    What an incredible and beautiful chart you created! I have just applied it to my workbook and looks great!

    I just have a simple question: How can I add in the tooltip the number of records for each “branch”? When I drag it and include it in the tooltip text, it shows a blank always. Any ideas?

    And thank you again for everything!

  20. Hi Ian,

    Thank you for this great chart! I first applied the above on the sample set, and then altered it to apply it on a different dataset.

    I had to make a small change in the calculations for ‘Max for min positon 1’ (&2), as for the smallest value in the dataset, the polygon would be filled from the bottom up for both positions 1 and 2, instead of having a small arm. This is only an issue if position 1 is not sorted from smallest to largest.

    In case anyone else encounters this issue: I altered the calculation to the following (likewise for ‘Max for min position 2’):

    if zn(RUNNING_SUM([Max for min position 1])-[Sankey Arm Size]) = 0
    and zn([Sankey Arm Size]) = 0
    then RUNNING_SUM([Max for min position 1])
    else RUNNING_SUM([Max for min position 1])-[Sankey Arm Size] end

    Kind regards

    1. Hey Ian,
      This was fantastic, really enjoyed it. However I had same issue as Siebe when applying it so would recommend adding this step in!
      Many thanks,
      Marc

    2. Hi Siebe

      I have exactly this problem with my Sankey – as I want to sort dimension 1 by a different order. However, I don’t quite understand your suggestion, as it seems to contain a self-reference (i.e. the “Max for min position 1” field calculation cannot refer to itself). Which field should we change?

    3. This helped a lot. FYI — you need to change the min ‘position 1’ and min ‘position 2’ with the above calculation, not the ‘max for min position 1’ and ‘max for min position 2’.

  21. Even though this is based on the superstore data, a file with the calculations would have been helpful. I cannot get this to work, and I realize it must be something very obvious, but cannot figure out my problem. Thanks!

  22. What are the changes and time needed for more than two dimensions (say four dimensions) with your solution? Is this possible?

  23. This is an AMAZING resource, thanks and my client loves it! The only issue is they want to filter the dimensions on the right with Top 10 results and when I try implementing this, the left and right charts work but the sankey chart itself breaks. Any ideas on how to get this to work?

  24. The information of the ‘Dimension 2’ is wrong.

    was:
    Dimension 2
    CASE [Select Dimension 1]

    should be:
    Dimension 2
    CASE [Select Dimension 2]

  25. Hi all,

    At TCEurope2018 in London a Free Tableau Extension that was presented that was able to create a Sankey Diagram in just a few clicks, no calculations filters parameters or whatsoever. I found it here:

    http://bit.ly/showme-more

    Anyone experience with this? It looks really simple drag and drop.

  26. One thing to note – you’ve got a typo where you show the case when statements within Dimension 1 and Dimension 2. It reads “CASE [Dimension 1]” for both Dimension 1 and Dimension 2. I believe Dimension 2 should have “CASE [Dimension 2]”.

  27. Hi Ian,

    I’ve tried to do the Sankey Diagram using your method, but still, it didn’t work. I’ve re-check the table calculation and it is still the same. Nothing comes out right after step 10. This is my 3rd tries, I still get the same result. I’m using the default superstore data too.

    1. I was able to make it with steps mentioned. Check if your dimension is on the color card as shown in screenshot above. I figured this when it didn’t come up at first.

    2. I had to download and dig around in the completed workbook to figure out what I was missing.

      If you’re stuck in the same place I was..a table with a solid bar..make sure all the values in the Table Calculation are in the correct order.

      Some times Path Frame (bin) needs to be the first value.. or the Dimension 1.. so on and so forth.

  28. I had a problem. When editing the table calcs I only had 8 dimensions calcs to address vs 12 calcs in the example.
    Any thoughts..I checked my formulas?

  29. I spent a lot of time on this also, I finally realized that the Sorting Order for each of the table calculations was causing my issues. For example Max Position 2 needs to have Dimension 2 first then Dimension 1 and for Max Position 1 Dimension 1 needed to be first, its just like the screenshots above, but as I was working through it I didn’t see the differences until after a multitude of attempts.

  30. Hi Ian,

    A wonderful tutorial from a great teacher! I followed this tutorial and it was pretty straight forward, but for some reasons my Sankey worked until the point I select Polygon & add Path Index to the path mark. Everything disappears. I have cross-checked every calculation and can’t seem to find a way around. Can I please get help on this? I will appreciate. Thank you.

  31. I had to double check the check boxes are the right order (drag and drop) – it apparently is important if “Dimension 1” or “Dimension 2” are listed first. When I checked against the screen shots, I had a few updates and then it worked!

    If it doesn’t look right at the end, make sure that not only the correct fields are ticked in each box but that they’re ordered correctly as well.

  32. Hmm I think I’m missing something with the 2 Dimension calculations.
    I get an error on the [Select Dimension X] line with a reference to undefined field is the error. I get it with the SuperStore data and my own dataset. It’s just a created calculated field right? What am I missing?

  33. This was critical for me; I could not get the visual to work without making this adjustment (even with changing sort orders a few different ways).

    Thanks!!

  34. Hello Shahirah,

    I was having difficulty figuring it out as well. It seemed that I followed all steps, yet I just had a blank sheet with an axis. I downloaded Ian’s workbook at the top and was able to figure it out. I never knew this before, but it matters what order your fields are on your Marks card. For me, it was just a matter of putting the fields in this order: Dimension 1 (color), Dimension 2 (detail), Path Frame (bin) (detail), Path Frame (path).

    Then, I had to step through all the nested calculations in Step 10 and voila! I hope this helps!

  35. This is a great Viz! Thanks for sharing. My one question for anyone is how to add in an aggregated measure such as Count of Customers and still be able to create the Bins for the paths?

  36. Hi Ian,

    It will be helpful if you can add up the link to the dashboard that you have created above. I am stuck at a point where it is not giving me curves. It is giving me the rectangle for the any selected dimension for all the list of items inside the dimension.

    Trying to figure out what have I done wrong again. Its bit difficult task to do it without understanding.

    -Ashish C.

    1. @AJ Romero
      Update Min Position 1
      from:
      RUNNING_SUM([Max for Min Position 1])-[Sankey Arm Size]

      to:
      if zn(RUNNING_SUM([Max for Min Position 1])-[Sankey Arm Size]) = 0
      and zn([Sankey Arm Size]) = 0
      then RUNNING_SUM([Max for Min Position 1])
      else RUNNING_SUM([Max for Min Position 1])-[Sankey Arm Size] END

      AND

      Update Min Position 2

      from:
      RUNNING_SUM([Max for Min Position 2])-[Sankey Arm Size]

      to:
      if zn(RUNNING_SUM([Max for Min Position 2])-[Sankey Arm Size]) = 0
      and zn([Sankey Arm Size]) = 0
      then RUNNING_SUM([Max for Min Position 2])
      else RUNNING_SUM([Max for Min Position 2])-[Sankey Arm Size] END

  37. This is awesome.
    Question regarding the dashboarding step:

    Would anyone know whether it would be possible to configure an “action” that would highlight the relevant sankey polygons when selecting corresponding sections in the bar charts? I’ve been able to do so using other methods (i.e., by multiplying my dataset by two), but I haven’t been able to figure it out in this case.

  38. Thanks for these detailed and well written instructions!

    I too would like to get additional information on setting up the left- and right-side stacked bars with the dimension labels and ensuring that they properly ‘line up’ with the Sankey diagram.

  39. I’m having the hardest time figuring out how to make it size based upon [Number of Records]. Does anyone know the work around for this?

  40. Amazing work! It was very useful for me,thank you 🙂
    I am trying to figure out now how to prepare correctly the stacked bar charts showing the dimensions next to the Sankey 🙂

  41. Hello! Awesome work and very useful since I don’t want to deal with data-prep before building my viz!!

    Anyway the above is not working for me (I also tried several times starting from scratch, but at the end I always have the same problem): when I arrive at “Step 10: Add Sankey polygons” I drag the [Sankey Polygon] to Rows as suggested, but nothing happens. I only get an empty graph and I get the tiny pill on the bottom right saying “96 nulls”. I guess this points to the fact that some of the previous calculations are broken, but I checked and I can’t really figure out what is going wrong.

    I already fixed the [Dimension 2] definition making it point to [Select Dimension 2], but still not working 🙁

    Did anyone have the same problem?

    1. Am having the exact same issue, despite doing it twice from scratch.
      Wondering if the new version of Tableau require a tweak to the approach

      (big thanks to Ian for the otherwise super clear instructions!)

    2. Excellent post, thank you Ian!

      I have exact same problem as Francesco, even after redoing it from scratch. Wondering if new version of Tableau requires change to this method ?

    3. * ignore previous messages *
      I had the same issue – it had to do with the order of the pills in the ‘Marks’ tab. As per screenshot, it has to be Dimension 1, Dimension 2, then Path Frame (bin)

  42. Why do the people praise this broken tutorial? It’s broken.

    I created 1st calculated field as “Dimension 1″ and then I code in:

    Dimension 1

    CASE [Select Dimension 1]
    WHEN 1 THEN [Region]
    WHEN 2 THEN [Category]
    WHEN 3 THEN [Sub-Category]
    WHEN 4 THEN [Segment]
    WHEN 5 THEN [Ship Mode]
    END

    The tableau reported this as error. In fact, I can’t recreate calculated field based on this flaw coding. The Tabluea report ” The Calculation contains errors”!!!

    How the people manage to make their sankey diagram?!?!!?

  43. Can anyone explain Step 9 properly for me? It doesn’t make sense nor it works for me.

    “. Next, put [T] on Columns and calculate it along Path Frame (bin)”

    how do you “calculate it along path frame(bin)”?!?! Do Ian actually mean that as “computing using”??? Thanks for the clarification or rephrase.

  44. I think the tutorial works great. I can’t tell from your example, but you did first create a parameter for [select dimension 1]. right?

Leave a Reply

Your email address will not be published. Required fields are marked *