Customer segmentation using RFM analysis

Customer segmentation is the first step of any successful marketing strategy. It ensures that the right product or solution is delivered to the right customer at the right time and through the right channel.  

Although the “how” of segmentation differs depending on the type of business (B2B vs. B2C), the product sold, and many other factors, there are a few typical methods used for segmentation. 

Also, segmentation in the consumer marketing space is seldom a “set and forget” process. Effective customer segmentation involves tracking dynamic changes and frequently updating customer data. 

Typical types of segmentation include: 

  • Demographic: Based on customer demographic information such as gender, age group, income, and occupation. 
  • Geographic: Based on customer location-related details such as country, region, city, weather, and more. 
  • Psychographic: Based on customer psychological details such as goals, interests, and personality traits, to name a few key ones. 
  • Behavioral: Based on customer behavioral information such as website engagement, new/returning customers, brand engagement, and purchase intention. 

One data-driven technique for behavioral segmentation is Recency, Frequency, Monetary (RFM) analysis. RFM focuses on a customer’s spending patterns. The remainder of this article explains the approach and how to implement it in practice.

What is RFM analysis? 

RFM is an analysis tool used to identify a business’s best customers based on their spending habits. This analysis helps create segments of “important customers”: the ones most likely to yield the highest return on marketing investments.

Specifically, the analysis can help improve targeting, reduce costs, and increase return on advertising investments. 

The three elements of RFM analysis are defined as follows: 

  • Recency: How many days have passed since the last purchase? 
  • Frequency: How many transactions did the customer make? 
  • Monetary value: How much money did the customer spend? 

RFM segmentation use cases 

RFM segmentation can help understand various aspects of a customer journey. Here are a few examples of questions that RFM analysis can help answer: 

  • Who are the best and most loyal customers? Who are my top n customers? 
  • Which customers are the most likely to churn? 
  • How are customers distributed across the country, companies, and brands? 
  • How to increase the likelihood of recurring purchases (for new customers)? 
  • Who has the potential to become a highly valuable customer? 
  • What is the best marketing campaign to target a specific customer group? 
  • Which customers are the most likely to remain loyal? 

RFM implementation details 

Technology stack 

  • Data storage: Azure Storage Account  
  • Transformation logic implementation: Databricks 
  • RFM outcome visualization: Power BI  

RFM analysis block diagram

Solution overview 

1. Set-up 

  • Import the required Python libraries: The segmentation process uses numpy and pyspark.sql.functions.
  • Input data source: Azure Data Lake – Raw CSV File 
  • Output data source: Azure Data Lake – Delta Table
  • Input datasets: Retail transactional data. 

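# Spark entry points
from pyspark import SparkContext
from pyspark.sql import SparkSession
# Column functions and types (the wildcard import shadows Python built-ins
# such as sum and max, so the F alias is the safer choice in new code)
from pyspark.sql import functions as F
from pyspark.sql.functions import *
from pyspark.sql.types import *
# Used to generate the quantile probabilities
import numpy as np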

2. Import the raw data 

In this example, we use retail data from different distribution channels, each containing customers’ transaction details.

Also, we only consider the regular sales records, and exclude the exchange/return records. 

RFM analysis example data

3. Pre-process the raw data 

First, the solution needs to pre-process the raw data to create a segmentation dataset that can be used for analysis.   

This pre-processing includes: 

    • Detecting and removing outliers in numerical variables 
    • Dropping or filling in (i.e., interpolating) missing values 
    • Removing duplicates 
    • Applying Data Quality (DQ) rules on the key columns to ensure the values come in the expected format. In this example, we apply DQ rules on the mobile, email, and country fields. 
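The pre-processing steps above can be sketched in plain Python on a handful of rows. This is only an illustration: the column names (`invoice_id`, `item_id`, `amount`, `email`, `mobile`, `country`), the regular expressions, and the country list are all assumptions rather than the rules used in the actual solution, and outlier handling is left out for brevity.

```python
import re

# Hypothetical DQ rules; real formats depend on the business and the market.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MOBILE_RE = re.compile(r"^\+?\d{8,15}$")
KNOWN_COUNTRIES = {"AE", "IN", "US"}  # illustrative list only


def passes_dq(row):
    """Return True when email, mobile, and country match the expected formats."""
    return bool(
        EMAIL_RE.match(row.get("email") or "")
        and MOBILE_RE.match(row.get("mobile") or "")
        and row.get("country") in KNOWN_COUNTRIES
    )


def preprocess(rows):
    """Remove duplicates, rows with missing amounts, and rows failing DQ rules."""
    seen = set()
    clean = []
    for row in rows:
        key = (row.get("invoice_id"), row.get("item_id"))
        if key in seen:
            continue  # duplicate invoice line
        seen.add(key)
        if row.get("amount") is None:
            continue  # missing value: dropped here, but it could be filled instead
        if passes_dq(row):
            clean.append(row)
    return clean
```

Whether a row that fails a DQ rule should be dropped, corrected, or only flagged is a business decision; the sketch drops it to keep the logic short.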

4. Select segmentation granularity 

At this stage, the user selects the attributes on which to base the segmentation. Once they are selected, we can calculate the RFM values at that attribute level.

For instance, in this example, we will segment customers using calculations at the brand and country levels.   

5. Calculate RFM values 

We will first create an intermediate table to calculate RFM values at the selected segmentation granularity level. The table will contain the aggregated data and the RFM calculations. If needed, we can also apply filters at this stage. 

To perform RFM segmentation, we need to first calculate recency, frequency, and monetary value metrics for each customer: 

    • Recency – Number of days since the customer last made a purchase. 
    • Frequency – Number of unique invoices. 
    • Monetary value – Total sales amount for this period. 

It’s important to note that, when calculating the recency metric, we need to consider the date at which the dataset terminates. We will use that date as the “current” date for calculating the time since each customer’s last purchase.  

For frequency, we count purchase events (unique invoices) rather than the individual items purchased. An invoice counts once no matter how many items it contains, so a customer who generates several invoices on a single day is counted as having made several purchases.
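As a small illustration of these definitions, the sketch below computes the three metrics in plain Python at the customer, brand, and country level. The field names and sample rows are hypothetical; the point is only to show the reference date taken from the dataset itself and frequency counted as unique invoices.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: one row per invoice line.
transactions = [
    {"customer_id": "C1", "brand": "A", "country": "AE", "invoice_id": "I1",
     "invoice_date": date(2023, 1, 5), "amount": 100.0},
    {"customer_id": "C1", "brand": "A", "country": "AE", "invoice_id": "I1",
     "invoice_date": date(2023, 1, 5), "amount": 50.0},
    {"customer_id": "C1", "brand": "A", "country": "AE", "invoice_id": "I2",
     "invoice_date": date(2023, 1, 5), "amount": 20.0},
    {"customer_id": "C2", "brand": "A", "country": "AE", "invoice_id": "I3",
     "invoice_date": date(2023, 1, 31), "amount": 70.0},
]


def compute_rfm(rows, keys=("customer_id", "brand", "country")):
    """Aggregate recency, frequency, and monetary value per group of keys."""
    # The "current" date is the last date present in the dataset.
    reference_date = max(r["invoice_date"] for r in rows)
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r)
    rfm = {}
    for group_key, items in groups.items():
        last_purchase = max(r["invoice_date"] for r in items)
        rfm[group_key] = {
            "recency": (reference_date - last_purchase).days,
            "frequency": len({r["invoice_id"] for r in items}),  # unique invoices
            "monetary_value": sum(r["amount"] for r in items),
        }
    return rfm


rfm = compute_rfm(transactions)
```

Customer C1 has two invoices on the same day, so the frequency is 2 even though there are three invoice lines. In Spark, the same aggregation would typically be a `groupBy` on the chosen keys followed by `max`, `countDistinct`, and `sum` aggregates.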

6. Calculate quantile cutoffs 

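Quantile calculation provides the cutoff values used to convert recency, frequency, and monetary value into scores from 1 to 5. The code requests five cutoffs per metric, at the 10th, 30th, 50th, 70th, and 90th percentiles. Because the scoring step gives the same score to values on either side of the lowest cutoff, the five score groups are not equal in size.

# Probabilities 0.1, 0.3, 0.5, 0.7, 0.9; a relative error of 0 requests exact quantiles
probabilities = np.linspace(0.1, 0.9, num=5).tolist()
r_quantile = rfm_retail.approxQuantile('recency', probabilities, 0)
f_quantile = rfm_retail.approxQuantile('frequency', probabilities, 0)
m_quantile = rfm_retail.approxQuantile('monetary_value', probabilities, 0)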
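To see which cutoffs these probabilities produce, here is a minimal sketch using only the Python standard library. The helper name `quantile_cutoffs` is hypothetical, and `statistics.quantiles` interpolates between data points, so its results may differ slightly from the values Spark returns on the same data.

```python
import statistics


def quantile_cutoffs(values):
    """Return cutoffs at the 10th, 30th, 50th, 70th, and 90th percentiles."""
    # n=10 yields the nine decile cut points (10th, 20th, ..., 90th).
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    # Keep every other decile to match probabilities 0.1, 0.3, 0.5, 0.7, 0.9.
    return [deciles[i] for i in (0, 2, 4, 6, 8)]


cutoffs = quantile_cutoffs(range(0, 101))
```

For the integers 0 to 100, this gives cutoffs of 10, 30, 50, 70, and 90, which makes the uneven spacing at the two ends easy to see.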

7. Calculate the RFM score and corresponding segment 

RFM score is calculated by comparing the RFM values with quantile cutoffs.  

scores = (
    rfm_retail
    # Recency is inverted: the longer since the last purchase, the lower the score
    .withColumn(
        'recency_score',
        F.when(F.col('recency') >= r_quantile[4], 1)
        .when(F.col('recency') >= r_quantile[3], 2)
        .when(F.col('recency') >= r_quantile[2], 3)
        .when(F.col('recency') >= r_quantile[1], 4)
        .when(F.col('recency') >= r_quantile[0], 5)
        .otherwise(5)  # values below the lowest cutoff also score 5
    )
    # Frequency and monetary value: the higher the value, the higher the score
    .withColumn(
        'frequency_score',
        F.when(F.col('frequency') > f_quantile[4], 5)
        .when(F.col('frequency') > f_quantile[3], 4)
        .when(F.col('frequency') > f_quantile[2], 3)
        .when(F.col('frequency') > f_quantile[1], 2)
        .when(F.col('frequency') > f_quantile[0], 1)
        .otherwise(1)  # values at or below the lowest cutoff also score 1
    )
    .withColumn(
        'monetary_score',
        F.when(F.col('monetary_value') > m_quantile[4], 5)
        .when(F.col('monetary_value') > m_quantile[3], 4)
        .when(F.col('monetary_value') > m_quantile[2], 3)
        .when(F.col('monetary_value') > m_quantile[1], 2)
        .when(F.col('monetary_value') > m_quantile[0], 1)
        .otherwise(1)
    )
)

The algorithm now assigns each customer scores ranging from 1 to 5 (5 being the highest). Frequency and monetary value relate directly to the score, i.e., the higher the value, the higher the score.

With recency, the range is inverted, i.e., the lower the recency, the higher the score. A low recency means the customer has recently purchased products.  

The individual R, F, and M scores are then combined to derive an overall RFM score.  
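The scoring and combination logic can be checked on single values with a plain-Python sketch. The function names are hypothetical, each cutoff list is assumed to hold the five quantile cutoffs in ascending order, and the overall score is assumed to be the three digits concatenated, as the three-digit codes in the RFM matrix suggest.

```python
def recency_score(value, cutoffs):
    """Lower recency is better, so large values get low scores (>= comparisons)."""
    for score, cutoff in zip((1, 2, 3, 4), reversed(cutoffs[1:])):
        if value >= cutoff:
            return score
    return 5  # anything below the second-lowest cutoff


def value_score(value, cutoffs):
    """Higher frequency or monetary value is better (> comparisons)."""
    for score, cutoff in zip((5, 4, 3, 2), reversed(cutoffs[1:])):
        if value > cutoff:
            return score
    return 1  # anything at or below the second-lowest cutoff


def rfm_score(recency, frequency, monetary, r_cut, f_cut, m_cut):
    """Concatenate the three scores into a code such as '555'."""
    return (
        f"{recency_score(recency, r_cut)}"
        f"{value_score(frequency, f_cut)}"
        f"{value_score(monetary, m_cut)}"
    )


cut = [10, 30, 50, 70, 90]
best = rfm_score(5, 95, 95, cut, cut, cut)
worst = rfm_score(95, 5, 5, cut, cut, cut)
```

Here `best` evaluates to `"555"` and `worst` to `"111"`. Keeping the score as a string preserves the position of each digit, which matters when looking it up in the matrix.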

The next step is to map the RFM score to the corresponding segment using, for example, the RFM matrix below. 

Segment | Scores
Champions | 555, 554, 544, 545, 454, 455, 445
Loyal customers | 543, 444, 435, 355, 354, 345, 344, 335
Potential loyalist | 553, 551, 552, 541, 542, 533, 532, 531, 452, 451, 442, 441, 431, 453, 433, 432, 423, 353, 352, 351, 342, 341, 333, 323
Recent customers | 512, 511, 422, 421, 412, 411, 311
Promising | 525, 524, 523, 522, 521, 515, 514, 513, 425, 424, 413, 414, 415, 315, 314, 313
Customers needing attention | 535, 534, 443, 434, 343, 334, 325, 324
About to sleep | 331, 321, 312, 221, 213
At risk | 255, 254, 245, 244, 253, 252, 243, 242, 235, 234, 225, 224, 153, 152, 145, 143, 142, 135, 134, 133, 125, 124
Can’t lose them | 155, 154, 144, 214, 215, 115, 114, 113
Hibernating | 332, 322, 231, 241, 251, 233, 232, 223, 222, 132, 123, 122, 212, 211
Lost | 111, 112, 121, 131, 141, 151
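One way to apply the matrix is to turn it into a lookup from score to segment. The sketch below copies the codes from the matrix above; the names `SEGMENT_CODES`, `SEGMENT_BY_SCORE`, and `segment_for` are hypothetical, and the "Unknown" fallback is an assumption for scores outside the matrix.

```python
# Segment -> space-separated RFM codes, copied from the RFM matrix.
SEGMENT_CODES = {
    "Champions": "555 554 544 545 454 455 445",
    "Loyal customers": "543 444 435 355 354 345 344 335",
    "Potential loyalist": (
        "553 551 552 541 542 533 532 531 452 451 442 441 431 453 433 432 "
        "423 353 352 351 342 341 333 323"
    ),
    "Recent customers": "512 511 422 421 412 411 311",
    "Promising": "525 524 523 522 521 515 514 513 425 424 413 414 415 315 314 313",
    "Customers needing attention": "535 534 443 434 343 334 325 324",
    "About to sleep": "331 321 312 221 213",
    "At risk": (
        "255 254 245 244 253 252 243 242 235 234 225 224 153 152 145 143 "
        "142 135 134 133 125 124"
    ),
    "Can't lose them": "155 154 144 214 215 115 114 113",
    "Hibernating": "332 322 231 241 251 233 232 223 222 132 123 122 212 211",
    "Lost": "111 112 121 131 141 151",
}

# Invert the matrix so each code points at its segment.
SEGMENT_BY_SCORE = {
    code: segment
    for segment, codes in SEGMENT_CODES.items()
    for code in codes.split()
}


def segment_for(score):
    """Return the segment name for a three-digit RFM score string."""
    return SEGMENT_BY_SCORE.get(score, "Unknown")
```

The matrix covers all 125 possible combinations of three scores from 1 to 5 exactly once, so the fallback should never be hit for valid scores. In Spark, the same mapping could be loaded as a small lookup table and joined to the scored data.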

Step by step RFM segmentation algorithm 

  1. Calculate Recency (R), Frequency (F), and Monetary (M) values at the customer, source, brand, and country levels. 
  2. Calculate quantile cutoff values for R, F, and M using the approxQuantile() function. The cutoffs are used to assign each metric a score from 1 to 5.  
  3. Calculate the RFM score based on the quantile cutoff values. For example, customers who buy frequently, have bought recently, and usually spend a lot of money would get a score of 555: Recency (R) = 5, Frequency (F) = 5, Monetary (M) = 5. 
  4. Get the RFM segment from the RFM score (using the RFM matrix). 

RFM segmentation architecture

RFM segmentation architecture diagram

Understanding different segments  

 

Here is the detailed description for each customer segment.

Customer segment | Description
Champions | Recent purchase, frequent transactions, high spending
Loyal Customers | Often spend good money buying your products. Responsive to promotions
Potential Loyalist | Recent customers who spent a good amount and bought more than once
Recent Customers | Bought most recently, but not often
Promising | Recent shoppers who haven’t spent much
Customers Needing Attention | Above-average recency, frequency, and monetary values. They may not have bought very recently though
About to Sleep | Below-average recency, frequency, and monetary values. Will lose them if not reactivated
At Risk | They spent big money and purchased often. But the last purchase was a long time ago
Can’t Lose Them | Often made the biggest purchases but they haven’t returned for a long time
Hibernating | The last purchase was long ago. Low spenders with a low number of orders
Lost | Lowest recency, frequency, and monetary scores

Data extracts of final segmentation output 

Customer level segments: 

Customer level segments data extraction

Quantile cutoffs: 

Quantile cutoffs data extraction

The RFM score is derived from the quantile cutoffs, and the RFM segment is then derived from the RFM score. 

Conclusion 

RFM segmentation uses recency, frequency, and monetary value to create customer groups that highlight the best customers and support decisions on customer retention, engagement, and attrition.

One of the advantages of RFM analysis is that it’s both understandable for the end user (no hard-to-trust “black box” calculation) and easy to implement based on data readily available.   

Of course, it’s not the only way to segment customers. But it’s still both an effective and efficient way to leverage data to create segments that will help marketers build effective and targeted campaigns. 
