Weighting Survey Data in Alteryx
When looking at survey data, there are a number of quirks that can come up. I’m going to tackle one in this post. Imagine you’ve designed a survey with the aim of representing a population correctly, but when you first look at the demographics of your respondents you find that isn’t the case.
It’s not uncommon; in fact it happens in almost all surveys. There are a number of ways around this through what is known as ‘weighting’, which is essentially the process of giving respondents from an under-represented group a greater value than those from an over-represented group.
The process of weighting survey data is quite simple.
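- You work out the demographic breakdown of your survey respondents (the % of respondents in each group).
- You establish the demographic breakdown of your desired population (the % you want in each group).
- You then divide the desired % by the respondent % for each group. The result is that group’s weight.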
|Age|% Respondents|% Desired|Weight|
It is then simply a case of applying that weight to every member of the specific demographic.
If you have multiple demographics the process is exactly the same, except that you calculate your % of total for each combination of demographics (for example, each age and gender pairing) rather than for a single demographic.
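If it helps to see the arithmetic written out, here is a minimal Python sketch of the calculation described above. The age bands and percentages are made-up values purely for illustration; they are not figures from my survey.

```python
# Hypothetical shares (%): what the survey returned vs. what the population looks like.
respondent_pct = {"18-34": 50.0, "35-54": 30.0, "55+": 20.0}
desired_pct = {"18-34": 30.0, "35-54": 30.0, "55+": 40.0}

# Weight = desired share / respondent share, calculated per demographic group.
weights = {g: desired_pct[g] / respondent_pct[g] for g in respondent_pct}

# Applying the weight to every member of a group scales that group's share
# of the sample back to the desired share.
weighted_pct = {g: respondent_pct[g] * weights[g] for g in respondent_pct}

for g in respondent_pct:
    print(f"{g}: weight={weights[g]:.2f}, weighted share={weighted_pct[g]:.1f}%")
```

In this made-up case the over-represented 18-34 group gets a weight below 1 and the under-represented 55+ group gets a weight above 1.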
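As a rough sketch of the multiple-demographic case, the Python below counts respondents per combination of two demographics and weights each combination. The respondents, the two fields (age and gender) and the desired shares are all hypothetical.

```python
from collections import Counter

# Hypothetical respondents, each described by two demographics (age, gender).
respondents = [
    ("18-34", "F"), ("18-34", "F"), ("18-34", "M"),
    ("35+", "F"), ("35+", "M"), ("35+", "M"), ("35+", "M"), ("35+", "M"),
]

# Hypothetical desired share (%) for every age/gender combination.
desired_pct = {
    ("18-34", "F"): 25.0, ("18-34", "M"): 25.0,
    ("35+", "F"): 25.0, ("35+", "M"): 25.0,
}

# Count respondents per combination rather than per single demographic.
cell_counts = Counter(respondents)
total = len(respondents)

# Same formula as before, applied at the combination level.
cell_weights = {
    cell: desired_pct[cell] / (100.0 * n / total)
    for cell, n in cell_counts.items()
}
```

One thing to watch with this approach is that the more demographics you combine, the fewer respondents fall into each combination, so a combination with no respondents at all cannot be weighted this way.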
So that’s weighting. But how do I go about applying this in Alteryx?
Let’s start with the simple thing. Use an Input Data tool to bring your data into Alteryx.
Secondly, we need to calculate the % of total respondents by our demographics in our actual survey responses. To do this I will first summarise by my demographic fields, returning a count of respondents within each ‘demographic group’. I’m then going to use a second Summarise tool to return the total number of respondents.
Once you have brought the data together (I have used an Append Fields tool to do this), we can use a Formula tool to create our % of total respondents calculation.
Next we need to bring in the desired % of total from our target population. In my case I have brought this in as a second file which simply has a ‘target’ value for each of the demographic groups in my data; you will need something similar. It’s also important to ensure every demographic group is covered in your target data, and the target percentages should of course sum to 100%.
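For anyone who finds it easier to follow the logic in code, the three steps above (two summarise steps, an append, then a formula) look roughly like this in Python. The records and the field name `age_group` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical respondent records; "age_group" is an assumed field name.
responses = [
    {"id": 1, "age_group": "18-34"},
    {"id": 2, "age_group": "18-34"},
    {"id": 3, "age_group": "35-54"},
    {"id": 4, "age_group": "35-54"},
    {"id": 5, "age_group": "55+"},
]

# First summarise: count of respondents within each demographic group.
group_counts = Counter(r["age_group"] for r in responses)

# Second summarise: total number of respondents.
total_respondents = sum(group_counts.values())

# Append + formula: put the total on every group row, then compute % of total.
summary = [
    {
        "age_group": group,
        "count": n,
        "total": total_respondents,
        "pct_of_total": 100.0 * n / total_respondents,
    }
    for group, n in sorted(group_counts.items())
]

for row in summary:
    print(row)
```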
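The two checks mentioned above are easy to get wrong by eye, so here is a small Python sketch of them. The function name `check_targets` and the example values are hypothetical.

```python
import math


def check_targets(target_pct, survey_groups):
    """Return a list of problems found in a target table (empty if none)."""
    problems = []

    # Every demographic group seen in the survey needs a target value.
    missing = sorted(set(survey_groups) - set(target_pct))
    if missing:
        problems.append(f"groups with no target: {missing}")

    # Target percentages should sum to 100 (allowing for rounding noise).
    total = sum(target_pct.values())
    if not math.isclose(total, 100.0, abs_tol=1e-6):
        problems.append(f"targets sum to {total}, not 100")

    return problems


# Hypothetical target file contents and survey groups.
targets = {"18-34": 30.0, "35-54": 30.0, "55+": 40.0}
print(check_targets(targets, ["18-34", "35-54", "55+"]))
```

An empty list means both checks passed; anything else is worth fixing before you calculate weights.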
We then need to bring this target detail against our actual respondent detail, which I have done using a Join tool, before performing our final calculation to create our ‘weight’ value for each demographic group using a Formula tool.
The final step of the process is to join our new weighting value back to each respondent in our data. We can now sum this field, rather than use a count of responses, to get a weighted survey score which is representative of our population.
Below I have visualised how using the weighted value instead of the unweighted value (count of respondents) affects the results of my survey.
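To show the difference between counting responses and summing the weight, here is a minimal Python sketch. The respondents, their answers and the weights are all made up; the weights assume a sample that is 80% / 20% across two groups when the desired split is 50% / 50%.

```python
# Hypothetical weights per group (desired % / respondent %): 50/80 and 50/20.
weights = {"18-34": 0.625, "55+": 2.5}

# Hypothetical respondents with a yes/no answer to a single question.
respondents = [
    {"age_group": "18-34", "answer": "yes"},
    {"age_group": "18-34", "answer": "yes"},
    {"age_group": "18-34", "answer": "yes"},
    {"age_group": "18-34", "answer": "no"},
    {"age_group": "55+", "answer": "no"},
]

# Join: attach each group's weight to every respondent in that group.
for r in respondents:
    r["weight"] = weights[r["age_group"]]


def share_yes(rows, weighted=True):
    """Percentage answering 'yes', by summed weight or by plain count."""
    def value(r):
        return r["weight"] if weighted else 1.0

    yes = sum(value(r) for r in rows if r["answer"] == "yes")
    return 100.0 * yes / sum(value(r) for r in rows)


print("unweighted:", share_yes(respondents, weighted=False))
print("weighted:  ", share_yes(respondents, weighted=True))
```

Because the over-represented group is the one saying ‘yes’ most often in this made-up data, the weighted share of ‘yes’ comes out lower than the plain count suggests.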
The complete workflow used in this blog post can be found here.