NESUG 2010
Statistics and Analysis
Improving Retention by Predicting Both Who and Why
Ward Thomas, Euro RSCG Discovery, Richmond, VA
ABSTRACT
Customer attrition is a significant issue in many industries such as financial services, wireless, healthcare, and insurance. As
companies work to acquire greater market share, they must also focus on customer retention in order to sustain that growth.
To support customer retention efforts, organizations can apply data mining to behavior patterns of customers who remain
loyal and those who defect. Predictive models can be built to periodically and systematically score customers for their likelihood to leave. Armed with this insight, firms can target their retention marketing budget to those customers with the greatest
attrition risk. This is the traditional approach to answering the question of who is likely to leave.
However, when faced with the question of “Why?”, satisfaction studies and/or focus groups are commonly conducted in order
to learn why customers are dissatisfied. Then, these findings are universally applied to all customers and addressed in an
effort to mitigate attrition. This paper introduces a methodology for how an attrition model can be used to identify not only who
is likely to leave, but what the most significant risks are for each customer. Therefore, in a sense, a target audience selection
model can be used to answer two key questions for marketing decision support: “Who?” and “Why?”
INTRODUCTION
The purpose of this paper is to demonstrate how a predictive attrition model can be used to estimate the likelihood of customer defection, and isolate key attrition risk factors for each customer. The paper begins with an explanation of how to collect
data for and develop an attrition model. It then provides details on how the model can be dissected to derive unique attrition
risk factors for each customer. The paper concludes with an explanation of how the attrition model and its risk factor insights
can be translated into a retention marketing effort, and tested for effectiveness.
All procedures will be illustrated using Base SAS and SAS/STAT, and as such, the paper assumes a general knowledge of SAS
syntax in order to understand the content. This presentation will focus on using SAS PROC LOGISTIC to develop the attrition
model and subsequently scrutinize the results at an individual level to isolate the attrition drivers.
ATTRITION MODELING MOTIVATION
Customer attrition, also called “churn”, or “defection”, is a major business issue for organizations to address. It is crucial to
maintain and grow customer relationships in order to sustain profitable growth. As companies pursue new customers through
acquisition marketing efforts, existing customer attrition undermines that growth. Unchecked, customer attrition can create a
“leaky bucket” scenario, whereby new customers poured into the business are insufficient to offset the outflow of customers
from the business.
Exacerbating the problem of customer attrition are the financial implications of acquisition cost and change in customer value
over time. “It’s much cheaper to keep a customer than acquire a new one” is a marketing axiom: acquiring a new customer has
been estimated to cost 3 to 10 times as much as keeping an existing one, depending upon industry vertical. Additionally, customer
value often increases over time, as new customers progress from initial trial to more lucrative relationships. This progression
is catalyzed by up-sell and cross-sell marketing campaigns. Up-sell campaigns attempt to migrate customers into more valuable products or services. Cross-sell campaigns encourage customers to choose a broader array of products and services.
Losing these more valuable customers makes it that much more difficult to compensate with new customer acquisition.
Figure 1: Customer Segmentation for Attrition Marketing
If we could identify which customers were most likely to leave,
we could focus our efforts on retaining them proactively. In this
illustration, commonly used as a customer retention segmentation framework, the horizontal dimension scales customer value
from lowest to highest, while the vertical dimension scales customer attrition risk from lowest to highest. The resulting four
quadrants have unique marketing imperatives, from “Aggressively Retain” to “Divest”.
Aggressively Retain: Customers who are both valuable and
likely to leave should be given priority, with communications and
service that address their needs.
Maintain: Customers who are valuable with lower attrition risk
should also be managed well with effective customer service,
but warrant less aggressive retention messaging.
Grow: Customers who are less valuable but also less likely to attrite, should be actively up-sold and cross-sold to increase
their value to the organization.
Divest: Finally, customers who are less valuable and likely to attrite should receive a degree of communications designed to
engender value and loyalty, but do not warrant significant marketing investment.
DATA PREPARATION
In an attrition modeling project, it is critical to work with subject matter experts to gather their hypotheses for attrition behavioral relationships. It is then essential to work with data owners to ensure that all available variables are assembled and that
true leading indicators of attrition are identified. Variables that represent subject matter expert hypotheses should be created,
as well as any appropriate variable derivations such as ratios or trends. Looking back at historic loyal and disloyal customers
to re-create the past often presents problems such as overwritten data that may bias the results of the analysis.
It is necessary to operationally define attrition, and structure the data to support its prediction. Attrition differs by industry and
circumstance, and can be either active or passive in nature. Active attrition results when a customer does not renew a contract, cancels an account, or makes an observable, point-in-time choice. Passive attrition occurs when a customer stops
transacting, or lapses into inactivity. In either case, an historic timeline can be established to study the leading indicators of
attrition.
As shown in Figure 2 below, the historic timeline consists of three intervals. The first is the period over which customer behavior is observed, prior to the attrition event observation window. The second is a “dark” period, simulating the time during
which the attrition model will be applied to the customer base and pre-emptive retention marketing communications delivered
to stem attrition, prior to the attrition event observation window. The length of this dark period should be dictated by operational
realities including database update periodicity and marketing channel execution times. For example, an e-mail could be sent
or a customer service call made in a short time, whereas a direct mail package may take weeks to produce and deliver
through the postal system. Finally, the attrition event observation window is the time period in which customers in the study
are determined to attrite or remain loyal. Again, this could entail classifying customers based on whether or not they renew
contracts, cancel their accounts, or begin a period of inactivity that is deemed by subject matter experts to be indefinite, and
therefore synonymous with attrition.
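The three intervals can be laid out with date arithmetic, as in the following minimal sketch. The window lengths, the study end date, the cancels dataset, and all variable names are assumptions chosen for illustration; actual lengths should come from the operational realities discussed above.

```sas
/* Assumed window lengths, in months */
%let behavior_months = 12;  /* behavior observation window */
%let dark_months     = 1;   /* "dark" period for scoring and campaign execution */
%let event_months    = 3;   /* attrition event observation window */

data timeline;
  format behavior_start dark_start event_start event_end date9.;
  event_end      = '31DEC2009'd;  /* hypothetical end of the historic study */
  event_start    = intnx('month', event_end, -(&event_months. - 1), 'beginning');
  dark_start     = intnx('month', event_start, -&dark_months., 'beginning');
  behavior_start = intnx('month', dark_start, -&behavior_months., 'beginning');
run;

/* Hypothetical account cancellations (active attrition) */
data cancels;
  input customerID cancel_date :date9.;
  format cancel_date date9.;
  datalines;
101 15NOV2009
102 15MAR2010
103 20SEP2009
;
run;

data labeled;
  if _n_ = 1 then set timeline;  /* carry the window dates onto every row */
  set cancels;
  /* Customers gone before the event window could not have been treated: drop them */
  if cancel_date < event_start then delete;
  /* 1 = cancelled inside the event window, 0 = still a customer when it closed */
  attrition = (event_start <= cancel_date <= event_end);
run;
```

Here customer 103, who cancelled during the dark period, is excluded from the study, while customer 102, who cancelled after the event window closed, counts as loyal for modeling purposes.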
Figure 2: Attrition Model Development Data Timeline
Once attrition has been defined and a timeline established for historic analysis, the next step is to consider what data exists
for use in making the prediction. Best practice is to cast a wide net at this stage, considering all data being collected and already accessible within the organization. Dimensions of predictors include:
RFM: Recency-Frequency-Monetary Value metrics, derived via transaction dates and spending amounts. This fundamental
concept will likely be the backbone of any predictive customer model.
Transactional detail: beyond RFM, transactional details include item(s) purchased, payment method, and channel (e.g., in-store, online, mail order).
Customer service: interactions customers have had, including dates, call durations, and disposition codes relating to issues
presented and how they were resolved
Surveys: satisfaction surveys or other attitudinal information collected from customers through various channels may also be
considered, provided that responses have been collected on a sufficient proportion of the customer base to be actionable at scale.
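As an illustration of the first predictor dimension, RFM metrics can be derived from a transaction table along the following lines. This is a sketch only: the transactions dataset, its column names, and the as-of date are hypothetical.

```sas
/* Hypothetical transaction history within the behavior observation window */
data transactions;
  input customerID txn_date :date9. amount;
  format txn_date date9.;
  datalines;
101 05JAN2009 40.00
101 17JUN2009 25.50
102 28AUG2009 310.00
103 02FEB2009 15.00
103 09MAR2009 22.00
103 30JUL2009 18.75
;
run;

%let asof = '31AUG2009'd;  /* assumed last day of the behavior observation window */

proc sql;
  create table rfm as
  select customerID,
         &asof. - max(txn_date) as recency_days,  /* days since last transaction */
         count(*)               as frequency,     /* number of transactions */
         sum(amount)            as monetary       /* total spend */
  from transactions
  where txn_date <= &asof.
  group by customerID;
quit;
```

The resulting rfm dataset has one row per customer and can be merged with the other predictor dimensions before modeling.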
ATTRITION MODELING
Once the data has been thoroughly and accurately prepared, keeping the subject matter expert hypotheses in mind, exploratory data analysis and predictive modeling can ensue. Many different algorithms can be applied to study attrition behavior
including logistic regression, neural networks, decision trees, or survival analysis. This paper will use SAS PROC LOGISTIC
to develop the attrition model and subsequently scrutinize the results at an individual level to isolate attrition drivers.
Exploratory data analysis (EDA) methods such as random dataset partitioning, correcting for outliers, missing value imputation, non-linear transformations and interaction terms are best practices that should be followed prior to developing the final
predictive model. Explanation of how to apply these techniques is beyond the scope of this paper.
Once EDA is complete, candidate variables for attrition prediction can be submitted to SAS PROC LOGISTIC for model development. The basic SAS code framework for the procedure is as follows:
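/* Capture the coefficient table in the Parms dataset */
ods output ParameterEstimates=Parms;
/* DESCENDING models the probability that attrition=1 */
proc logistic data=libname.filename descending;
model attrition = var_1 var_2 var_n;
/* Save each customer's predicted probability along with the model inputs */
output out=pred (keep=customerID attrition pred var_1 var_2 var_n) pred=pred;
run;
ods output close;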
Features of this code that are of interest include the ODS OUTPUT statement, which captures the parameter estimates in the Parms dataset, and the OUTPUT statement, which captures
the predicted values and independent variables in the pred dataset. Once this code has been run, the pred dataset
can be manipulated to evaluate model performance. Typically, this is accomplished by sorting the probabilities from high to
low, dividing the customers into ten equally-sized groups called “deciles”, such that ten percent of the customer base is contained in each decile, and observing model performance in terms of attrition rate by decile.
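/* Split customers into 10 groups by predicted probability; DESCENDING puts the highest scores in group 0 */
proc rank data=pred groups=10 out=pred descending;
var pred; ranks decile;
run;
/* Renumber groups 0-9 as deciles 1-10 */
data pred;
set pred;
decile=decile+1;
run;
/* Customer count, attrition rate, and number of attriters by decile */
proc means data=pred n mean sum;
var attrition;
class decile;
run;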
Figure 3: Attrition Model Decile Analysis
The above graph illustrates typical attrition
model performance when studied through the
lens of a decile analysis. If the model is performing as expected, actual attrition rates will
decline from highest to lowest from decile 1
(most likely to attrite) to decile 10 (least likely
to attrite). Clearly this is true in the example in
Figure 3: with an overall attrition rate of 3.5%,
and a top decile attrition rate of 14%, customers in decile 1 are at 4 times the average
risk. The curve slopes consistently down to a
nearly 0% attrition rate in the 10th decile, where
the model’s prediction is lowest.
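The arithmetic behind the decile comparison can be written out as follows. The symbols are introduced here for illustration and are not the paper's notation: r_d is the actual attrition rate in decile d, and r-bar is the overall attrition rate.

```latex
\[
\text{lift}_d = \frac{r_d}{\bar{r}}, \qquad
\text{lift}_1 = \frac{14\%}{3.5\%} = 4
\]
\[
\text{share of all attriters found in decile 1}
  = \frac{0.10 \times 14\%}{3.5\%} = 40\%
\]
```

In other words, because each decile holds ten percent of the customer base, the figures quoted for Figure 3 imply that contacting only decile 1 would reach roughly 40% of the customers who go on to attrite.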
Note that best practice is to conduct the above
analysis on a customer sample that was held
out from model development. This is accomplished by building the model on a development sample, then scoring a held-out validation sample, sorting, slicing into deciles, and measuring actual attrition rate vis-à-vis
the model’s predictions. The purpose of the validation sample is to ensure that the model is robust, not over-fitting spurious relationships found only in the development sample.
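One way to carry out this development/validation split is sketched below on simulated data. The simulated coefficients, the 70/30 split, the seed, and the dataset names (customers, customers_split, valid_scored, valid_ranked) are all assumptions for illustration and are not taken from the paper.

```sas
/* Simulate a customer file with a low base attrition rate (illustrative only) */
data customers;
  call streaminit(2010);
  do customerID = 1 to 20000;
    var_1 = rand('normal');
    var_2 = rand('normal');
    p = 1 / (1 + exp(-(-3.5 + 0.8*var_1 - 0.5*var_2)));
    attrition = rand('bernoulli', p);
    output;
  end;
  drop p;
run;

/* Flag a 70% simple random sample for development; OUTALL keeps every row with Selected=1/0 */
proc surveyselect data=customers out=customers_split
                  method=srs samprate=0.7 seed=2010 outall noprint;
run;

/* Fit on the development sample, then score the held-out validation sample */
proc logistic data=customers_split(where=(Selected=1)) descending;
  model attrition = var_1 var_2;
  score data=customers_split(where=(Selected=0)) out=valid_scored;
run;

/* Decile analysis on the validation sample; P_1 is the scored probability of attrition=1 */
proc rank data=valid_scored groups=10 descending out=valid_ranked;
  var P_1;
  ranks decile;
run;

proc means data=valid_ranked n mean;
  class decile;
  var attrition;
run;
```

In this sketch decile 0 is the highest-risk group, since PROC RANK numbers its groups from zero. If the validation attrition rates fall steadily from the first group to the last, the ranking has held up on customers the model never saw.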
After the attrition model is built and validated, the business has a means of ranking customers by their likelihood to attrite at
an individual level. Along with a measure of customer value, this enables the business to segment customers into the four-quadrant framework illustrated in Figure 1.
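A minimal sketch of that four-quadrant assignment follows. The scored dataset, the value measure, and both cut-offs are hypothetical; in practice the thresholds would be set by the business.

```sas
/* Hypothetical scored customers: pred = attrition probability, value = customer value */
data scored;
  input customerID pred value;
  datalines;
1 0.14 1200
2 0.01 1500
3 0.02 150
4 0.12 90
;
run;

%let risk_cut  = 0.05;  /* assumed attrition-risk threshold */
%let value_cut = 500;   /* assumed customer-value threshold */

data quadrants;
  set scored;
  length segment $20;
  if      value >= &value_cut. and pred >= &risk_cut. then segment = 'Aggressively Retain';
  else if value >= &value_cut.                        then segment = 'Maintain';
  else if pred  <  &risk_cut.                         then segment = 'Grow';
  else                                                     segment = 'Divest';
run;

proc print data=quadrants noobs;
run;
```

With these assumed cut-offs, the four sample customers land in "Aggressively Retain", "Maintain", "Grow", and "Divest" respectively.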
IDENTIFYING ATTRITION RISK FACTORS
Typically, a predictive attrition model is applied to identify a target audience, which is a crucial factor in the success of any
retention marketing campaign. Normally, a single version of an anti-attrition message is extended to any customer who
scores above a certain threshold. However, using a predictive model to differentiate the types of customers who score above
that threshold to enable more relevant communications to each type is an approach which extends the model’s value and
application.
Often, model deciles are studied to yield insight into the characteristics of customers in each. In the table below, means for
each decile across hypothetical attrition model variables are presented:
Figure 4: Attrition Model Decile Profile
The table illustrates that customers who are most likely to attrite have the shortest tenure, have lapsed the longest, have
made the most customer service calls, purchased in the fewest product categories, and at the highest discount. They are also
the youngest customers with the highest percentage of single marital status. These central tendencies give the business a
notion of the type of customer who is at greatest attrition risk, so that it can plan marketing interventions accordingly in terms of
offers, creative/tone, and channel.
However, not all at-risk customers are the same. Despite this overall attrition profile, even customers in Decile 1 have different reasons for their high scores. After all, a decile is a collective group of individuals, whose similar scores have been derived in myriad ways based on their unique characteristics.
Figure 5: Individual at-risk customers
Customer A is at risk due to short tenure, high discounts, young age, and single marital status.
Customer B is at risk due to a long lapse and narrow product range.
Customer C is at risk due to frequent customer service calls.
In all of these examples, customers score in the top 10% of attrition risk, but the characteristics that led to their scores are
distinct, and warrant unique marketing intervention. If it were possible to systematically differentiate these high-risk customers
based on their particular risk factors, the business could address their needs individually, and therefore make retention communications more effective at stemming attrition.
Computationally, the following progression yields estimates of individual-level risk factors:
$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \sum_i \beta_i x_i)}}$$
This is the general form of the logistic regression model, with the probability of the event, y, as a function of independent variables, x.
$$p_{ij} = \frac{1}{1 + e^{-\beta_i x_{ij}}}$$
For individual j, this expression quantifies the contribution of the value for any given independent variable, x_i, to the probability of the event y.
$$\bar{p}_i = \frac{1}{N} \sum_{j=1}^{N} p_{ij}$$
This expression quantifies the average contribution of any given independent variable x_i to the probability of the event y, where N is the number of customers scored.
$$d_{ij} = p_{ij} - \bar{p}_i$$
Finally, differencing the prior two expression outcomes yields the relative risk posed to customer j by his/her value of independent variable x_i.
It is important to subtract the average contribution to the probability of attrition for each independent variable, because all variables make a positive contribution to attrition probability, even if they are below average. For example, there will be a positive coefficient for the variable, “days since last transaction”, such that the more days of inactivity have elapsed, the greater
the likelihood of attrition. However, if a customer has been inactive for 5 days, whereas the average number of days since last
transaction is 30, that customer is actually much less likely to attrite than average, and therefore, days since last transaction
should not be deemed a risk factor.
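The effect of subtracting the average can be seen in a three-customer sketch. The coefficient of 0.02 and the customer values are hypothetical, chosen only to mirror the days-since-last-transaction example above.

```sas
%let est_days = 0.02;  /* hypothetical coefficient for days since last transaction */

data toy;
  input customerID days_since_last;
  /* Single-variable contribution, in the same form as the prob_var_n terms in this paper */
  prob_days = 1 / (1 + exp(-&est_days. * days_since_last));
  datalines;
1 5
2 30
3 55
;
run;

/* Average contribution across the three customers */
proc sql noprint;
  select mean(prob_days) format=best32. into :mean_days from toy;
quit;

data toy_diff;
  set toy;
  diff_prob_days = prob_days - &mean_days.;
  is_risk_factor = (diff_prob_days > 0);  /* 1 only when the contribution is above average */
run;

proc print data=toy_diff noobs;
run;
```

Every prob_days value is positive, but the customer who has been inactive for only 5 days ends up with a negative diff_prob_days and is not flagged, while the customer inactive for 55 days is flagged.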
SAS DATA step programming for conducting these manipulations is as follows:
/* Store each coefficient from the Parms dataset in a macro variable */
proc sql noprint;
select Estimate into: est_var_1 from Parms where Variable='var_1';
select Estimate into: est_var_2 from Parms where Variable='var_2';
select Estimate into: est_var_n from Parms where Variable='var_n';
quit;
/* Contribution of each variable: coefficient times value, passed through the logistic function */
data pred;
set pred;
xbeta_var_1 = &est_var_1. * var_1;
xbeta_var_2 = &est_var_2. * var_2;
xbeta_var_n = &est_var_n. * var_n;
prob_var_1=1/(1+exp(-xbeta_var_1));
prob_var_2=1/(1+exp(-xbeta_var_2));
prob_var_n=1/(1+exp(-xbeta_var_n));
run;
/* Average contribution of each variable across all customers */
proc sql noprint;
select mean(prob_var_1), mean(prob_var_2), mean(prob_var_n)
into :mean_var_1, :mean_var_2, :mean_var_n
from pred;
quit;
/* Each customer's contribution relative to the average */
data pred1(keep=customerID diff_prob:);
set pred;
diff_prob_var_1 = prob_var_1 - &mean_var_1.;
diff_prob_var_2 = prob_var_2 - &mean_var_2.;
diff_prob_var_n = prob_var_n - &mean_var_n.;
run;
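/* Reshape to one row per customer and variable; COL1 holds the relative contribution */
proc sort data=pred1; by customerID; run;
proc transpose data=pred1 out=pred2; by customerID; run;
/* Rank each customer's variables from largest to smallest contribution (rank 1 = top risk factor) */
proc sort data=pred2; by customerID descending col1; run;
data pred2;
set pred2;
by customerID descending col1;
if first.customerID then rank=0;
rank+1;
/* TRANWRD substitutes a single blank for the prefix, so LEFT realigns the variable name */
_NAME_=left(tranwrd(_NAME_, "diff_prob_", ""));
run;
/* Back to one row per customer; the ID statement gives each variable its own rank column */
proc transpose data=pred2 out=pred2(drop=_NAME_); by customerID; id _NAME_; var rank;
run;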
Figure 6: Attrition Risk Factor Rankings
Armed with contributions and ranks for
each independent variable for each customer, several types of diagnostic analyses
can be performed to inform retention marketing decisions:
First, frequency distributions can be studied
to evaluate which variables are the most
prevalent in raising attrition risk. Just knowing which factors are most common can
enable an organization to adjust policies
and communicate those changes effectively
to the most vulnerable customers to reduce
attrition. In the graph above, transaction
lapse, short tenure, and high discount levels are the top three primary attrition risk
factors.
Second, ranked attrition factors can be
listed for each customer along with commensurate retention marketing tactics.
These business rules can be deployed in
different customer interaction settings such
as the call center, point-of-sale, or online.
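Both analyses can be sketched from a long-form table of ranked factors. Everything in this example is hypothetical: the ranked dataset, the factor names, and especially the factor-to-tactic mapping, which stands in for business rules that each organization would define for itself.

```sas
/* Hypothetical long-form ranks: one row per customer and variable, rank 1 = largest relative contribution */
data ranked;
  length customerID 8 factor $20;
  input customerID factor $ rank;
  datalines;
1 tenure 1
1 discount 2
1 lapse 3
2 lapse 1
2 tenure 2
2 discount 3
3 lapse 1
3 discount 2
3 tenure 3
;
run;

/* First analysis: how often is each variable a customer's primary risk factor? */
proc freq data=ranked(where=(rank=1)) order=freq;
  tables factor / nocum;
run;

/* Second analysis: assumed mapping from primary risk factor to retention tactic */
proc format;
  value $tactic
    'lapse'    = 'Reactivation offer'
    'tenure'   = 'Welcome series'
    'discount' = 'Full-price value messaging'
    other      = 'Standard retention message';
run;

data primary_tactic;
  set ranked(where=(rank=1));
  length tactic $40;
  tactic = put(factor, $tactic.);
run;
```

The primary_tactic dataset holds one row per customer and could be exposed to a call center or online channel as a simple lookup.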
Figure 7: Individual Customer Retention Strategies
TESTING AND DEPLOYING CUSTOMIZED ATTRITION MARKETING
Before a tailored attrition marketing system can be deployed within an organization, two key research questions must be
answered:
1. Does the attrition model accurately identify customers at risk of defection?
2. Does customized messaging reduce attrition more than a standard message?
TESTING THE ATTRITION MODEL
To partially answer the first question, a holdout sample was used for model validation to guard against overfitting. However,
additional, go-forward evaluation of the model’s predictive accuracy is necessary to determine if it is robust in the face of
evolving market dynamics.
By continuing to track attrition rates by decile, an organization can ensure the attrition model’s accuracy. If rates begin to differ from model predictions, model refinements or complete refreshes may be warranted. It is important to actively maintain a
model for it to perform at peak efficiency.
TESTING CUSTOMIZED ATTRITION MARKETING
Once the model is found to be accurate over time, the second question can be addressed with in-market testing. Experimental design methodology should be applied to assess the effectiveness of customized retention marketing efforts relative to
standard retention marketing efforts.
To be specific, based on the expected attrition rate and minimally acceptable rate reduction, customized and standard communication sample sizes can be estimated using SAS PROC POWER. These customer groups can then be followed through
a marketing campaign cycle to quantify the impact of customized retention marketing initiatives.
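proc power;
twosamplefreq
alpha = .05 /* 95% confidence */
sides = 1 /* one-sided test */
refproportion = .05 /* expected attrition rate */
proportiondiff = .005 /* incremental difference in attrition rate */
groupns = 250000 | 10000 to 250000 by 10000 /* one group fixed at 250,000; the other varied */
power = .; /* solve for power at each allocation */
run;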
In the above example, a two-independent-sample proportion test is being conducted. Confidence is set at the standard 95%
level, and the test is 1-sided since the hypothesis is that customized retention marketing will outperform standard messaging.
An expected attrition rate is input, as well as an incremental difference in attrition rate. Then, in an attempt to maximize the
number of customers receiving customized marketing treatment, the “groupns” option, together with “Power = .”, will return power levels for each
accompanying sample size allocated to the standard marketing treatment. Essentially, this search will determine how to allocate customers into the two marketing treatment groups in order to achieve statistical standards of confidence and power.
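For intuition about what such a search returns, the power of a one-sided two-proportion test can be approximated in a DATA step with the usual normal approximation. This is a rough sketch under that assumption, not a reproduction of PROC POWER's own computation, so its values may differ slightly from the procedure's output; the dataset and variable names are hypothetical.

```sas
data approx_power;
  alpha = 0.05;        /* one-sided significance level, as in the PROC POWER call */
  p1    = 0.05;        /* reference-group attrition rate (refproportion) */
  p2    = p1 + 0.005;  /* second-group rate (reference rate plus proportiondiff) */
  n1    = 250000;      /* fixed size of the first group */
  z_alpha = probit(1 - alpha);
  do n2 = 10000 to 250000 by 10000;
    p_bar = (n1*p1 + n2*p2) / (n1 + n2);             /* pooled proportion under the null */
    se0   = sqrt(p_bar*(1 - p_bar)*(1/n1 + 1/n2));   /* standard error under the null */
    se1   = sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2);   /* standard error under the alternative */
    power = probnorm((abs(p2 - p1) - z_alpha*se0) / se1);
    output;
  end;
  keep n1 n2 power;
run;

proc print data=approx_power noobs;
run;
```

Reading down the output, the smallest n2 whose power meets the organization's standard (commonly 80%) indicates how few customers need to be held in the smaller group, leaving the remainder for the other treatment.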
GENERALIZING THE APPROACH
Since logistic regression can be employed to predict any marketing event (e.g., response, cross-sell, etc.), the methodology
described in this paper can be applied to myriad marketing problems. For example, a response model can identify who is
likely to respond, say, decile 1, and then be dissected among these individuals to identify different reasons for the response
behavior that can be capitalized upon through marketing customization. The features and benefits of the product or service
being promoted can be positioned differently according to the response drivers unique to each individual. Presented with a
more relevant offer, consumers may be more likely to make a purchase accordingly. Incremental response rate due to offer
customization can then be quantified through the application of sound experimental design as illustrated in the attrition case.
CONCLUSION
In summary, the organization will be armed with a model that estimates the probability of attrition for each customer on the
database. Customer-level reasons for attrition risk will also then be available as a derivative of the predictive modeling effort.
These reasons can be tabulated across all customers to identify areas for the organization to address. They can be ranked for each customer along with appropriate retention messaging to form business rules for subsequent customer interactions. And, they can
be used in a multivariate setting to derive clusters of customers that warrant unique retention marketing communication strategies. Experimental design can then be leveraged to quantify the impact that the combination of target audience identification, reason detection, and commensurate retention marketing initiatives have on stemming attrition behavior. Understanding
which valuable customers are most likely to leave, and why they are at risk, yields a best practice decision support platform
for an organization to minimize customer attrition.
ACKNOWLEDGMENTS
The author would like to thank Euro RSCG Discovery colleagues Stephen Easter and Vishal Patel who contributed statistical and SAS syntax expertise to the content of this paper.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Ward Thomas
Euro RSCG Discovery
4490 Cox Rd
Glen Allen, VA 23060
Work Phone: 804 968 7570
Email: [email protected]
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in
the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.