Histograms on aggregate measures
A fairly common type of question I hear asked from Tableau users is how can they produce a visualisation from their transactional type data showing the number of customers who have placed N number orders, or the number of customers who have ordered goods with a total value between 0 – 100, 100 – 200… etc. The question is seemingly simple but the solution involves a double aggregation, which takes some additional thought to work through.
It’s not limited to financial / transactional data, the problem exists in many different scenarios – show the number of employees who have completed X number training sessions from a dataset showing all the staff attendance to those sessions. Or show a histogram of the total number of weeks this year which achieved a certain interval of number of bookings, from a booking list dataset.
Really what’s being asked is “How do I aggregate aggregated data?” All of these queries are similar – we need to calculate the number of bookings per week, or training sessions per employee, or order value per customer first, and then count how many times each of these results occur. Once this is understood, a solution in Tableau can be constructed.
Lets start with that first calculation – how many orders per customer. I’m using the Sample – Superstore Subset data that comes installed with Tableau 8. I put Customer Name out on ‘Rows’ and Number of Records onto ‘Text’
So far so easy, right? Now comes the tricky bit. We want to now count the number of customers, per SUM(Number of Records), but SUM(Number of Records) still has to be based on Customer Name, so we much keep the dimension Customer Name in the viz, even if we don’t want to show individual names. Move Customer Name to ‘Detail’, and then make SUM(Number of Records) discrete (right click > discrete) and move it to ‘Rows’ to maintain our table shape.
Starting to take shape? If we count all those “Abc”s in each row we’d get our answer! Let’s count how many distinct customers in each row by dragging Customer Name onto ‘Text’ with the right mouse button and selecting CNTD(Customer Name). [If you don’t have this option, you may need to do an extract]. Now all the “Abc”s are 1s that we can sum up over the window using the table calculation WINDOW_SUM.
Createfield and write the formula:
WINDOW_SUM(COUNTD([Customer Name]))
Add it to ‘Text’ to replace the old CNTD(Customer Name) added earlier. As with all table calculations, we need to set the partitioning and addressing correctly – in this case we can select Compute using > Customer Name
Oof! 2,703? That’s not right! There’s a couple of steps left. Our table calculation is not being partitioned by the SUM(Number of Records) but this is required. Discrete aggregates will be ignored in table calcs by default – this can be changed by clicking on the pill SUM(Number of Records) and removing the tick from “Ignore in Table Calculations”
Nearly there! The numbers look right but they are repeated over and over – it’s showing the total for each customer. From here, if you’re happy with a bar chart you can move your calculated field (mine is called Total distinct customer) to ‘Columns’ and turn stacked marks off from the Analysis drop down menu, or if you’d prefer a table you can tell Tableau to only return the first number in each case by changing the formula to
IF FIRST()==0 THEN WINDOW_SUM(COUNTD([Customer Name])) END
From there you could create your own calculated bins using logical statements and add those to the view, instead of the SUM(Number of Records) field. Just remember to set the ‘Ignore in Table Calculations’ to off.
So, next time you’re asking yourself something like “how do I create bins at a rolled up level”, you’ll know the steps to achieve this.
If all those steps seem complicated then that’s because, well, there are a couple of quite advanced steps involved! Tableau is super easy to work with when you aggregate once, but when you want an aggregate of an aggregate it can be a little more complex. If you have access to the raw data you’re connecting to, then perhaps consider creating a new computed column that does the first step of aggregating for you, then this task is easy in Tableau. Alternatively, perform the first easy step in Tableau, and then export the results to something like Excel, and reconnect to that.
Want more information or want to chat about Tableau? Leave a comment below or get in touch at info@theinformationlab.co.uk
Nice example.
Minor point, but you’re showing the number of items ordered per customer, not the number of orders per customer, which would be COUNTD([Order ID]).
Very true. I forget that the SuperStore data has multiple items per order. Thanks, Jim.
let’s say you wanted to know value per customer rather than number of transactions (so it truly isn’t discrete). How could you also add a value bin by customer?
Rob, were you able to find a solution to this? I’m also finding it difficult to ‘count’ the number of Customers who might meet a condition based on an aggregated value, for example:
how many customers spent more than $X.xx by a time period.
Robin, if you’ve got any ideas here, let us know!
-EH
In this case, your aggregated field is SUM(Sales). To create some bins from this you’d want to use a formula to round down to the nearest $1,000 or similar:
int(sum([Sales])/1000) * 1000
See here for a working example:
Link
Dear Robin,
I bumped in to this website/article through a forwarded link from an acquaintance. Thanks for sharing such a nice feature in Tableau with step by step instructions. I learned a new trick from this blog. Appreciate your time and efforts writing this article.
regards,
Bachi
Thanks, this was very helpful! I’ve shared with all my team members because we often run into this issue.
One snag I’ve run into: I’ve created a detail table using logical bins that group people by amount donated during the FY. What I haven’t figured out is how to do a running total of amount donated across the bins. Tableau’s quick table calcs are not available and I can’t seem to write a formula that does what I need. I’ll take this to the Tableau forums if you can’t think of something off the top of your head.
Thanks again!
Jim
Jim,
Were you able to find out how to do running total through Tableau forum?
Really really really thank you for this article. It helps me a lot.
This was a huge help for me.
Thank you!
Great post, thanks a lot Mr. Kennedy, I was looking how to do this and found your post very helpful, however I am going to bother you with a minor thing:
I am having issues in creating the bins for the last step, could you share the formula?
Thanks in advance.
You’d be able to do this with a formula like
if sum([Number of Records]) >= 10 then “10 and over”
elseif sum([Number of Records]) >= 5 then “5 – 9 transactions”
else “4 or less”
end
Hope that helps, see here for a working example that you can download:
https://public.tableausoftware.com/views/Histogramsagain/Salessizebins?:embed=y&:display_count=no
Excellent, thanks a lot, I was not using [Number of Records] to do the bin, I was using Count[field].
Thanks again, and sorry for the late reply.
Thanks Robin, very helpful. Using the Stack Marks Flag and Ignore in Table Calculations was new to me. Expanding on this, I want to make bins that scale to the range of values. Ideally the bins are based on table calcs but that doesn’t seem possible (A). It would lead to a table calc trying to bracket another table calc, the count. I resorted to the giant if approach (B). Any advice for improving this?
A – Table Calc bin, doesn’t work
IF [ValueSum] < window_avg([ValueSum]) – 4*window_stdev([ValueSum]) then window_avg([ValueSum]) – 4*window_stdev([ValueSum])
ELSEIFIF [ValueSum] < window_avg([ValueSum]) – 3.5*window_stdev([ValueSum]) then window_avg([ValueSum]) – 3.5*window_stdev([ValueSum])
…
B – Massive IF. Works but it's messy and I'm concerned
IF [ValueSum] < 1*[Bin Size] then 1*[Bin Size]
elseIF [ValueSum] < 2*[Bin Size] then 2*[Bin Size]
elseIF [ValueSum] < 3*[Bin Size] then 3*[Bin Size]
…
elseIF [ValueSum] < 400*[Bin Size] then 400*[Bin Size]
else 401*[Bin Size]
END
Didn’t finish my thought in B
B – Massive IF. Works but it’s messy and I’m concerned about performance
Found a solution to the same problem without table calc
http://vizdiff.blogspot.com/2015/02/histograms-on-aggregated-measures.html
This solution was big help for me in a pinch—thanks!
This is fantastic. I wonder if this can be done any easier now that LoD calculations are out. I don’t think it works quite the same way.
regards,
Samuel E.
Everything’s going fine for me till it’s time to “right click drag” the dimension to columns and select countd. Seems like right click drag isn’t working for me, does anyone know why that would be?
It’s an editing error: just drag without any right click, then you have to right click to get to the CountD function (under Measure). I think it should read “AND with the right mouse button, selecting…”
Just brilliant, thanks so much!!! I consider myself kind of advanced user but would never have worked out the intricacies such as “Ignore in table calculations”. And thanks a lot for the step by step explanation with the screenshots which really allows following your instructions step by step.
Thanks, George. Tableau has moved on a bit since this post and added a new calculation type called a LOD calc, which would be able to achieve this much easier than all the steps here… On my list to write a new blog post!
thank you! worked perfectly!
Hi Mr. Kennedy,
This was a great help in solving a question I had with some sales data. As is usually the case, one questions leads to another. I can see the total for each purchase occasion, now I would like to be able to see what percentage each purchase occasion represents. So if single buys are 3000 of 10,000 total, how can I make that calc?
Thank you so much!
Matt
Extremely helpful !!
Thanks
Hi,
Thank u very much about the explanation, It was very helpful.
I have a question, How we can calculate the percentage form total for IF FIRST()==0 THEN WINDOW_SUM(COUNTD([Customer Name])) END ?
I was literally 4 hours trying to do this. THANKS!!!! Step by step you got me there…. I was trying to do bins with Price and everything I tried ended up summing the field.
Works!!
But crazy that Tableau wouldn’t make this process a little easier with a built in function.
Anyways, thanks!
Thank you! So many smart (and for me obscure) steps. I needed exactly what you describe and would never ever achieved it without your great step-by-step guide.