## Creating a customer churn risk status in Alteryx

#### What type of customers is this for?

This particular method is useful for customers in a ‘one-time sale’ industry (think online shopping), as opposed to a subscription-based industry (think Netflix account).

With subscription-based business there is a very clear end-point to a customer’s engagement with you – them canceling their subscription. However, with a ‘one-time sale’ business, the hope is that a customer continues to spend with you but there may be long periods where they don’t. Either they don’t need your product for a period of time, or they have decided to shop somewhere else. It’s impossible to know which just from their sales history, but using this method should hopefully give you some insight into those customers using some very simple statistics.

#### How well does this method work?

I developed something similar to this for a client and we observed roughly 50-70% accuracy for both the progression from ‘at risk of churn’ classification to ‘churned’, as well as for ‘churned’ customers remaining churned.

This meant that a customer that was currently classified as churned had between a 20-50% chance of never ordering again unless some action was taken from the business.

#### What is the method being used?

Essentially, this method relies on the assumption that the distribution of ordering by individual customers follows the normal distribution – that most orders are made a consistent number of days apart and that a substantial deviation from that norm indicates behaviour worth flagging. I’ll add a caveat here: this does mean that it requires a fairly large transaction history per customer, and is less reliable for customers with a small number of orders.

In essence, the concept is to:

- count the days between orders for each customer
- calculate the mean and standard deviation of days between orders for each customer
- count the days since the most recent order per customer
- plot the days since the most recent order as a Z-score against their past behaviour

The image below shows a graph of the normal distribution, with the corresponding percentage likelihoods that a data point sits in that region. I’ve added the Z-score beneath.

Essentially the Z-score the value minus the standard deviation, divided by the mean to give values that can be compared between data sets.

For this model, we are assuming that the number of days between orders follows a distribution similar to this. Most orders are made a regular number of days apart, with some orders occurring in quick succession, and some spread far apart.

We want to compare the number of days since the most recent order to past behaviour, and find where it sits on this distribution. The further to the right, the more likely it is that our customer is on the road to churning.

In the example I’ve included here, the mean days between orders is 10, with a standard deviation of 2.5 days, and the most recent order was placed 18 days ago.

In this situation, the customer sits far to the right of the distribution, 18 days is more than 3 times the standard deviation above the mean – therefore this customer would be classified as ‘churned’.

#### How can Alteryx help to create this model?

It’s super easy to create this in Alteryx and I will walk through it using the Sample – Superstore data set that comes with Tableau, but any data set with customers and order dates will work fine for this.

#### Step 1 – count the days between orders

After inputting the data, the first step is to count the days between orders.

First, sort the data by order date; then use the Multi-Row Formula tool to count the days between orders.

Make sure to set the output to a numeric data type, to group by customer name or ID, and use the following formula.

#### Step 2 – calculate mean days between orders and standard deviation

The next step is to use the summarise tool to calculate the mean days between orders and the standard deviation. We can use this to get the most recent order date for calculating how long ago the most recent order was placed.

Make sure to group this data by customer name or ID.

#### Step 3 – calculate days since last order, z-score and churn risk

Now we can use the aggregated data per customer to estimate churn risk.

First, use the formula tool to calculate the days since the most recent order using the formula below. Make sure the new field is numeric.

In the same formula tool, we can calculate a Z-score per customer with the formula below. Make sure the output is numeric

A third calculation in this formula tool will give a classification of churn risk based on this Z-score.

#### Step 4 – join back to original sales data and output for analysis

Finally, we can join the classification back to the full data set we calculated this from and output for use in a report or BI tool.

## What next?

This data can be really useful in the short term for establishing which customers to prioritise with actions and interventions, for example, the customers that have delivered the most overall and might be slowing down their business to catch them before they shop elsewhere.

In the following blog I go over how to use a batch macro to retrospectively create historical snapshots of customer churn risk status to understand the flow between statuses per customer and come up with the measure of reliability of each of the churn statuses, as I mentioned at the beginning of the blog.

Originally posted on https://warumdatum.wordpress.com/