
Customer segmentation using RFM analysis
Customer segmentation is the first step of any successful marketing strategy. It ensures that the right product or solution is delivered to the right customer at the right time and through the right channel.
Although the “how” of segmentation differs depending on the type of business (B2B vs. B2C), the product sold, and many other factors, there are a few typical methods used for segmentation.
Also, segmentation in the consumer marketing space is seldom a “set and forget” process. Effective customer segmentation involves tracking dynamic changes and frequently updating customer data.
Typical types of segmentation include:
- Demographic: Based on customer demographic information such as gender, age group, income, and occupation.
- Geographic: Based on customer location-related details such as country, region, city, weather, and more.
- Psychographic: Based on customer psychological details such as goals, interests, and personality traits, to name a few key ones.
- Behavioral: Based on customer behavioral information such as website engagement, new/returning customers, brand engagement, and purchase intention.
One data-driven technique for behavioral segmentation is Recency, Frequency, Monetary (RFM) analysis. RFM focuses on a customer’s spending patterns. The remainder of this article describes the approach in more detail and shows how to implement it in practice.
What is RFM analysis?
RFM is an analysis tool used to identify a business’s best customers based on their spending habits. This analysis helps create segments of “important customers.” Those customers are the ones most likely to have the highest return on marketing investments.
Specifically, the analysis can help improve targeting, reduce costs, and increase return on advertising investments.
The three elements of RFM analysis are defined as follows:
- Recency: How many days have passed since the last purchase?
- Frequency: How many transactions has the customer made?
- Monetary value: How much money did the customer spend?
RFM segmentation use cases
RFM segmentation can help understand various aspects of a customer journey. Here are a few examples of questions that RFM analysis can help answer:
- Who are the best and most loyal customers? Who are the top n customers?
- Which customers are the most likely to churn?
- How are customers distributed across the country, companies, and brands?
- How can the likelihood of recurring purchases be increased (for new customers)?
- Who has the potential to become a highly valuable customer?
- What is the best marketing campaign to target a specific customer group?
- Which customers are the most likely to remain loyal?
RFM implementation details
Technology stack
- Data storage: Azure Storage Account
- Transformation logic implementation: Databricks
- RFM outcome visualization: Power BI
Solution overview
1. Set-up
- Import required Python libraries: The segmentation process requires numpy and pyspark.sql.functions.
- Input data source: Azure Data Lake – Raw CSV File
- Output data source: Azure Data Lake – Delta Table
- Input datasets: Retail transactional data.
from pyspark.sql.functions import *
from pyspark.sql import functions as F
import numpy as np
from pyspark.sql.types import *
from pyspark import SparkContext
from pyspark.sql import SparkSession
2. Import the raw data
In this example, we use retail data from different distribution channels, each containing customers’ transaction details.
We consider only regular sales records and exclude exchange and return records.
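
The filtering step can be illustrated with a small plain-Python sketch. The column names and the SALE, RETURN, and EXCHANGE labels below are assumptions made for illustration; the actual schema of the raw CSV files is not shown in this article.

```python
import csv
import io

# Hypothetical raw extract. The column names and record_type labels are
# assumptions for illustration, not the article's actual schema.
RAW_CSV = """invoice_no,customer_id,channel,record_type,amount
INV001,C1,store,SALE,120.00
INV002,C1,online,RETURN,-40.00
INV003,C2,online,SALE,75.50
INV004,C3,store,EXCHANGE,0.00
"""


def load_regular_sales(csv_text):
    """Parse CSV text and keep only the regular sales rows."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row for row in rows if row["record_type"] == "SALE"]


sales = load_regular_sales(RAW_CSV)
print([row["invoice_no"] for row in sales])
```

In Databricks, the same idea would typically be expressed as a filter on the DataFrame read from the data lake.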
3. Pre-process the raw data
First, the solution needs to pre-process the raw data to create a segmentation dataset that can be used for analysis.
This pre-processing includes:
- Detecting and removing outliers in numerical variables
- Dropping or filling in (i.e., interpolating) missing values
- Removing duplicates
- Applying Data Quality (DQ) rules on the key columns to ensure the values come in the expected format. In this example, we apply DQ rules on the mobile, email, and country fields.
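
As a rough illustration of the duplicate-removal and DQ steps, here is a minimal plain-Python sketch. The regular expressions, the country whitelist, and the field names are hypothetical; real DQ rules depend on the business and its markets. Outlier handling and missing-value treatment are not covered here.

```python
import re

# Assumed rule set for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MOBILE_RE = re.compile(r"^\+?\d{8,15}$")
KNOWN_COUNTRIES = {"AE", "SA", "IN"}  # hypothetical whitelist


def passes_dq(record):
    """Return True when mobile, email, and country are in the expected format."""
    return (
        bool(EMAIL_RE.match(record.get("email") or ""))
        and bool(MOBILE_RE.match(record.get("mobile") or ""))
        and (record.get("country") or "") in KNOWN_COUNTRIES
    )


def deduplicate(records, key="invoice_no"):
    """Keep the first record seen for each key value."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


records = [
    {"invoice_no": "INV001", "email": "a@example.com", "mobile": "+971500000001", "country": "AE"},
    {"invoice_no": "INV001", "email": "a@example.com", "mobile": "+971500000001", "country": "AE"},
    {"invoice_no": "INV002", "email": "not-an-email", "mobile": "+971500000002", "country": "AE"},
    {"invoice_no": "INV003", "email": "c@example.com", "mobile": "12345", "country": "IN"},
    {"invoice_no": "INV004", "email": "d@example.com", "mobile": "919800000004", "country": "XX"},
]

# Remove duplicates first, then keep only the records that pass every rule.
clean = [record for record in deduplicate(records) if passes_dq(record)]
print([record["invoice_no"] for record in clean])
```

Only the first record survives: the second is a duplicate, and the others fail the email, mobile, and country rules respectively.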
4. Select segmentation granularity
At this stage, the user selects the attributes on which to base the segmentation. Once they are selected, we can calculate the RFM values at that attribute level.
For instance, in this example, we will segment customers using calculations at the brand and country levels.
5. Calculate RFM values
We will first create an intermediate table to calculate RFM values at the selected segmentation granularity level. The table will contain the aggregated data and the RFM calculations. If needed, we can also apply filters at this stage.
To perform RFM segmentation, we need to first calculate recency, frequency, and monetary value metrics for each customer:
- Recency – Number of days since the customer last made a purchase.
- Frequency – Number of unique invoices.
- Monetary value – Total sales amount for this period.
It’s important to note that, when calculating the recency metric, we need to consider the date at which the dataset terminates. We will use that date as the “current” date for calculating the time since each customer’s last purchase.
For frequency, we count the number of separate purchase occasions (unique invoices), not the number of items bought. Several items on one invoice count once, whereas a customer who makes multiple separate purchases on a single day is counted as having multiple purchase events.
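
The three calculations can be sketched in plain Python as follows. The tuple layout, customer IDs, brands, and amounts are made up for illustration, and the grouping key (customer, brand, country) is one possible granularity; the production logic described in this article runs in Spark.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction lines:
# (customer_id, brand, country, invoice_no, invoice_date, amount)
transactions = [
    ("C1", "BrandA", "AE", "INV001", date(2023, 1, 5), 100.0),
    ("C1", "BrandA", "AE", "INV001", date(2023, 1, 5), 50.0),  # second line of the same invoice
    ("C1", "BrandA", "AE", "INV002", date(2023, 1, 5), 30.0),  # separate invoice, same day
    ("C1", "BrandA", "AE", "INV003", date(2023, 3, 1), 20.0),
    ("C2", "BrandA", "AE", "INV004", date(2023, 2, 10), 500.0),
    ("C2", "BrandB", "AE", "INV005", date(2023, 3, 31), 80.0),
]


def compute_rfm(rows):
    """Aggregate recency, frequency, and monetary value per (customer, brand, country)."""
    # The last date in the dataset acts as the "current" date for recency.
    snapshot = max(row[4] for row in rows)
    groups = defaultdict(lambda: {"last": None, "invoices": set(), "total": 0.0})
    for customer, brand, country, invoice, day, amount in rows:
        group = groups[(customer, brand, country)]
        group["last"] = day if group["last"] is None else max(group["last"], day)
        group["invoices"].add(invoice)  # a set keeps each invoice once
        group["total"] += amount
    return {
        key: {
            "recency": (snapshot - group["last"]).days,
            "frequency": len(group["invoices"]),
            "monetary_value": group["total"],
        }
        for key, group in groups.items()
    }


rfm = compute_rfm(transactions)
for key, values in rfm.items():
    print(key, values)
```

Customer C1 has four transaction lines but only three unique invoices, so the frequency is 3; two of those invoices fall on the same day and are still counted separately.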
6. Calculate quantile cutoffs
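The quantile calculation provides the cutoff values used to derive the recency, frequency, and monetary value scores. The calls below request five cutoffs per metric, at the 10th, 30th, 50th, 70th, and 90th percentiles, with a relative error of 0 (exact quantiles). Note that these cutoffs do not produce five equal-sized buckets; cutoffs at the 20th, 40th, 60th, and 80th percentiles would.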
r_quantile = rfm_retail.approxQuantile('recency', np.linspace(0.1, 0.9, num=5).tolist(), 0)
f_quantile = rfm_retail.approxQuantile('frequency', np.linspace(0.1, 0.9, num=5).tolist(), 0)
m_quantile = rfm_retail.approxQuantile('monetary_value', np.linspace(0.1, 0.9, num=5).tolist(), 0)
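
To make the cutoff positions concrete, here is a small plain-Python sketch. The helper functions linspace and quantile_cutoffs are hypothetical stand-ins written for this example: the first mimics np.linspace, and the second uses a simple nearest-rank rule, so the values Spark returns may differ slightly on real data.

```python
import math


def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive, like np.linspace."""
    step = (stop - start) / (num - 1)
    # Rounding hides floating-point noise such as 0.30000000000000004.
    return [round(start + i * step, 10) for i in range(num)]


def quantile_cutoffs(values, probabilities):
    """Nearest-rank quantiles: a simple stand-in for an exact quantile lookup."""
    ordered = sorted(values)
    n = len(ordered)
    cutoffs = []
    for p in probabilities:
        rank = max(1, math.ceil(round(p * n, 9)))
        cutoffs.append(ordered[rank - 1])
    return cutoffs


probabilities = linspace(0.1, 0.9, 5)

# Hypothetical recency values: 1 to 100 days, one customer each.
recency_values = list(range(1, 101))
r_cutoffs = quantile_cutoffs(recency_values, probabilities)

print(probabilities)
print(r_cutoffs)
```

With these inputs, the probabilities are 0.1, 0.3, 0.5, 0.7, and 0.9, and the cutoffs fall at 10, 30, 50, 70, and 90 days.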
7. Calculate the RFM score and corresponding segment
RFM score is calculated by comparing the RFM values with quantile cutoffs.
# Compare each metric with its quantile cutoffs to assign a score from 1 to 5.
scores = (
    rfm_retail
    # Recency is inverted: the more days since the last purchase, the lower the score.
    .withColumn(
        'recency_score',
        when(rfm_retail.recency >= r_quantile[4], 1)
        .when(rfm_retail.recency >= r_quantile[3], 2)
        .when(rfm_retail.recency >= r_quantile[2], 3)
        .when(rfm_retail.recency >= r_quantile[1], 4)
        .when(rfm_retail.recency >= r_quantile[0], 5)
        .otherwise(5)
    )
    # Frequency is direct: the more invoices, the higher the score.
    .withColumn(
        'frequency_score',
        when(rfm_retail.frequency > f_quantile[4], 5)
        .when(rfm_retail.frequency > f_quantile[3], 4)
        .when(rfm_retail.frequency > f_quantile[2], 3)
        .when(rfm_retail.frequency > f_quantile[1], 2)
        .when(rfm_retail.frequency > f_quantile[0], 1)
        .otherwise(1)
    )
    # Monetary value is direct: the higher the spend, the higher the score.
    .withColumn(
        'monetary_score',
        when(rfm_retail.monetary_value > m_quantile[4], 5)
        .when(rfm_retail.monetary_value > m_quantile[3], 4)
        .when(rfm_retail.monetary_value > m_quantile[2], 3)
        .when(rfm_retail.monetary_value > m_quantile[1], 2)
        .when(rfm_retail.monetary_value > m_quantile[0], 1)
        .otherwise(1)
    )
)
Now, the algorithm assigns customers scores ranging from 1 to 5 (5 being the highest). The relation between frequency and monetary value with the score is direct, i.e., the higher the value, the higher the score.
With recency, the range is inverted, i.e., the lower the recency, the higher the score. A low recency means the customer has recently purchased products.
The individual R, F, and M scores are then combined into an overall three-digit RFM score, such as 555.
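
The scoring rules and the combination step can be sketched in plain Python as follows. The function names and the cutoff values are hypothetical, and concatenating the digits in R, F, M order is an assumption that is consistent with the segment matrix used in this article.

```python
def recency_score(value, cutoffs):
    """Inverted scale: fewer days since the last purchase gives a higher score."""
    if value >= cutoffs[4]:
        return 1
    if value >= cutoffs[3]:
        return 2
    if value >= cutoffs[2]:
        return 3
    if value >= cutoffs[1]:
        return 4
    return 5


def direct_score(value, cutoffs):
    """Direct scale for frequency and monetary value: higher value, higher score."""
    if value > cutoffs[4]:
        return 5
    if value > cutoffs[3]:
        return 4
    if value > cutoffs[2]:
        return 3
    if value > cutoffs[1]:
        return 2
    return 1


def rfm_code(recency, frequency, monetary, r_cut, f_cut, m_cut):
    """Concatenate the three scores in R, F, M order, for example '555'."""
    r = recency_score(recency, r_cut)
    f = direct_score(frequency, f_cut)
    m = direct_score(monetary, m_cut)
    return f"{r}{f}{m}"


# Hypothetical cutoffs for illustration only.
r_cut = [10, 30, 50, 70, 90]
f_cut = [1, 2, 4, 6, 10]
m_cut = [50, 100, 200, 400, 800]

print(rfm_code(5, 12, 900, r_cut, f_cut, m_cut))   # recent, frequent, high spend
print(rfm_code(95, 1, 40, r_cut, f_cut, m_cut))    # long ago, rare, low spend
print(rfm_code(50, 4, 250, r_cut, f_cut, m_cut))   # mid-range customer
```

The comparisons mirror the Spark code above: recency uses greater-than-or-equal comparisons on an inverted scale, while frequency and monetary value use strict greater-than comparisons.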
The next step is to map the RFM score to the corresponding segment using, for example, the RFM matrix below.
Segments | Scores |
---|---|
Champions | 555, 554, 544, 545, 454, 455, 445 |
Loyal customers | 543, 444, 435, 355, 354, 345, 344, 335 |
Potential loyalist | 553, 551, 552, 541, 542, 533, 532, 531, 452, 451, 442, 441, 431, 453, 433, 432, 423, 353, 352, 351, 342, 341, 333, 323 |
Recent customers | 512, 511, 422, 421, 412, 411, 311 |
Promising | 525, 524, 523, 522, 521, 515, 514, 513, 425, 424, 413, 414, 415, 315, 314, 313 |
Customers needing attention | 535, 534, 443, 434, 343, 334, 325, 324 |
About to sleep | 331, 321, 312, 221, 213 |
At risk | 255, 254, 245, 244, 253, 252, 243, 242, 235, 234, 225, 224, 153, 152, 145, 143, 142, 135, 134, 133, 125, 124 |
Can’t lose them | 155, 154, 144, 214, 215, 115, 114, 113 |
Hibernating | 332, 322, 231, 241, 251, 233, 232, 223, 222, 132, 123, 122, 212, 211 |
Lost | 111, 112, 121, 131, 141, 151 |
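
One simple way to apply the matrix is a dictionary lookup. The sketch below copies the scores from the table above; the names RFM_MATRIX, build_lookup, and segment_for are illustrative and not part of the original solution.

```python
# RFM matrix copied from the table above: segment name -> space-separated scores.
RFM_MATRIX = {
    "Champions": "555 554 544 545 454 455 445",
    "Loyal customers": "543 444 435 355 354 345 344 335",
    "Potential loyalist": (
        "553 551 552 541 542 533 532 531 452 451 442 441 431 "
        "453 433 432 423 353 352 351 342 341 333 323"
    ),
    "Recent customers": "512 511 422 421 412 411 311",
    "Promising": "525 524 523 522 521 515 514 513 425 424 413 414 415 315 314 313",
    "Customers needing attention": "535 534 443 434 343 334 325 324",
    "About to sleep": "331 321 312 221 213",
    "At risk": (
        "255 254 245 244 253 252 243 242 235 234 225 224 "
        "153 152 145 143 142 135 134 133 125 124"
    ),
    "Can't lose them": "155 154 144 214 215 115 114 113",
    "Hibernating": "332 322 231 241 251 233 232 223 222 132 123 122 212 211",
    "Lost": "111 112 121 131 141 151",
}


def build_lookup(matrix):
    """Invert the matrix into a score -> segment dictionary, rejecting duplicates."""
    lookup = {}
    for segment, codes in matrix.items():
        for code in codes.split():
            if code in lookup:
                raise ValueError(f"score {code} is listed under two segments")
            lookup[code] = segment
    return lookup


SEGMENT_BY_SCORE = build_lookup(RFM_MATRIX)


def segment_for(score):
    """Return the segment for a three-digit RFM score, or 'Unknown' if unmapped."""
    return SEGMENT_BY_SCORE.get(score, "Unknown")


print(len(SEGMENT_BY_SCORE))
print(segment_for("555"), "|", segment_for("155"), "|", segment_for("111"))
```

The matrix assigns each of the 125 possible score combinations to exactly one segment, so the lookup never falls through to the default for valid scores.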
Step-by-step RFM segmentation algorithm
- Calculate Recency (R), Frequency (F), and Monetary (M) values at the customer, source, brand, and country levels.
- Calculate approximate quantile cutoff values for R, F, and M using the approxQuantile() function. The cutoffs split each metric into five score bands.
- Calculate the RFM score based on the quantile cutoff values. For example, customers who buy frequently, have bought recently, and usually spend a lot of money would get a score of 555: Recency (R) = 5, Frequency (F) = 5, Monetary (M) = 5.
- Get the RFM segment based on the RFM score (using the RFM matrix).
RFM segmentation architecture
Understanding different segments
Here is a detailed description of each customer segment.
Customer segment | Description |
---|---|
Champions | Recent purchase, frequent transactions, high spending |
Loyal Customers | Often spend good money buying your products. Responsive to promotions |
Potential Loyalist | Recent customers but spent a good amount and bought more than once |
Recent Customers | Bought most recently, but not often |
Promising | Recent shoppers but haven’t spent much |
Customers Needing Attention | Above-average recency, frequency and monetary values. They may not have bought very recently though |
About to Sleep | Below average recency, frequency, and monetary values. Will lose them if not reactivated |
At Risk | They spent big money and purchased often. But the last purchase was a long time ago |
Can’t Lose Them | Often made the biggest purchases but they haven’t returned for a long time |
Hibernating | The last purchase was long ago. Low spenders with a low number of orders |
Lost | Lowest recency, frequency, and monetary scores |
Data extracts of final segmentation output
Customer level segments:
Quantile cutoffs:
We calculate the RFM segment based on the RFM score and, similarly, the RFM score using quantile cutoffs.
Conclusion
RFM segmentation uses recency, frequency, and monetary value to create customer groups that identify the best customers and give insight into customer retention, engagement, and attrition.
One of the advantages of RFM analysis is that it’s both understandable for the end user (no hard-to-trust “black box” calculation) and easy to implement based on data readily available.
Of course, it’s not the only way to segment customers. But it’s still both an effective and efficient way to leverage data to create segments that will help marketers build effective and targeted campaigns.