Customer Selection for Targeted Promotion:
A Market Basket Analysis Approach
Yinghui (Catherine) Yang
Graduate School of Management
University of California, Davis
[email protected]
http://faculty.gsm.ucdavis.edu/~yiyang
Chunhui Hao & Ansheng Ge
Institute of Automation
Chinese Academy of Sciences
[email protected]; [email protected]
Research questions
Data mining processes can generate an overwhelming amount of information, which is
very difficult for managers to comprehend, let alone put into action. Directly incorporating
managers’ decision goals into the data mining process can potentially provide more value to
managers. In this research, we integrate the customer selection decision into market basket
analysis (i.e., association rule analysis). Market basket analysis is a useful method for discovering
customer purchasing patterns by extracting associations among products from purchase
transactions. For a retailer (online or offline), a very common decision is which
customers to offer promotions to so that the money spent on the promotion achieves the
maximum value. Given the potential for cross-selling among products, market basket analysis is
a viable tool for discovering such correlations among products. Integrating the results of
market basket analysis into the customer selection process can provide potentially significant
value for retailers.
Problem Formulation and Approach
We consider a market basket transaction database that contains purchase records
generated by a group of consumers. Let $P = \{p_1, p_2, \ldots, p_n\}$ be a set of products and
$C = \{c_1, c_2, \ldots, c_w\}$ be a set of consumers. Any consumer $c_k$ can purchase any product $p_i$ (for
$k \in [1 \ldots w]$, $i \in [1 \ldots n]$). A shopping transaction is a list of products purchased by a consumer at
a single check-out (it can also be extended to include purchases made within a certain period of
time, e.g., a week): $c_k : (p_{k_1}, p_{k_2}, \ldots, p_{k_l})$ with $k \in [1 \ldots w]$ and $k_j \in [1 \ldots n]$ for $j \in [1 \ldots l]$, where $p_{k_i} \in P$,
$c_k \in C$, and $p_{k_i} \neq p_{k_j}$ (for $i \neq j$ and $i, j \in [1 \ldots l]$).
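As a minimal sketch of the transaction database described above, the records can be grouped per consumer, with each transaction stored as a set so that no product appears twice in the same transaction. The function name `build_transactions` and the `(customer, products)` record layout are assumptions made for illustration; they do not come from the paper.

```python
def build_transactions(records):
    """Group check-out records by consumer.

    records: iterable of (customer_id, list_of_product_ids) pairs
             (hypothetical input layout).
    Returns a dict mapping customer_id -> list of frozensets, one per
    transaction. Using a set enforces p_ki != p_kj within a transaction.
    """
    by_customer = {}
    for customer, products in records:
        by_customer.setdefault(customer, []).append(frozenset(products))
    return by_customer


records = [
    ("c1", ["p1", "p2"]),
    ("c2", ["p3"]),
    ("c1", ["p2", "p2", "p4"]),  # duplicate p2 collapses inside the set
]
transactions = build_transactions(records)
```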
(1) Single item promotion
Assume a supermarket is considering offering discount coupons on a certain product,
and that the coupon for this item can be offered to only a limited number of customers due to budget
limitations. The decision the supermarket needs to make is which N customers the coupons
should be sent to in order to maximize the campaign profit, given that the purchases of products
are correlated. From the results of market basket analysis, we can estimate the
likelihood of other related purchases by a certain customer once the coupon is offered to him
or her. In a simpler version, we can calculate a “value” for each customer. Naturally, the
customers with higher value should be offered coupons first.
For each customer, we define the following value:
Definition 1 Value of a single item for a customer (denoted as function Value1())
$$\text{Value1}() = \sum_{\text{rules in } S_A} \left[ NT \times PR_{RHS} \times (\text{Confidence of the rule} - \text{Support of RHS}) \right]$$
$S_A$ is the set of rules with the item on promotion on the left-hand side and one single item on the
right-hand side (e.g., $A \to X$). Rules with multiple items on the right-hand side can simply be
considered separately. For example, $A \to \{B, C\}$ can be separated into $A \to B$ and $A \to C$.
RHS is short for the item on the right-hand side of the rule. $NT$ is the number of transactions
supporting the rule. $PR_{RHS}$ is the unit profit of the item on the right-hand side of the rule. The lift
of the rule is defined as
$$\text{lift} = \frac{\text{Confidence of the rule}}{\text{Support of RHS}},$$
and it can be used to measure whether the
purchase of the item on the left-hand side positively affects the purchase of the item on the
right-hand side. Because the lift value is always positive, we cannot directly use it in Definition 1:
a rule with lift smaller than 1 should not contribute positively to the
value of the item on the left-hand side. Therefore, we use a modified form of lift,
(Confidence of the rule − Support of RHS). This value is negative if the lift is smaller than
1 and positive if it is greater than 1, and it lies in the range [-100%, 100%].
The rules are discovered from the transactions of the individual customer.
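Definition 1 can be illustrated with a short sketch. It assumes that confidence and support are computed over the individual customer's own transactions, that NT is the number of that customer's transactions containing both the promoted item and the right-hand-side item, and that no minimum support or confidence threshold is applied. The function name `value1` and the input layout are hypothetical; the paper does not prescribe an implementation.

```python
from collections import Counter


def value1(transactions, promo_item, unit_profit):
    """Value of promo_item for one customer, following Definition 1.

    transactions: list of sets of product ids for a single customer.
    promo_item:   the item on promotion (left-hand side A).
    unit_profit:  dict mapping product id -> unit profit (PR_RHS).
    """
    total = len(transactions)
    with_promo = [t for t in transactions if promo_item in t]
    if total == 0 or not with_promo:
        return 0.0  # no rule A -> X can be formed for this customer

    # How many transactions contain each item (for Support of RHS).
    item_counts = Counter(item for t in transactions for item in t)
    # NT for each rule A -> X: transactions containing both A and X.
    pair_counts = Counter(
        item for t in with_promo for item in t if item != promo_item
    )

    value = 0.0
    for rhs, nt in pair_counts.items():
        confidence = nt / len(with_promo)
        support_rhs = item_counts[rhs] / total
        value += nt * unit_profit[rhs] * (confidence - support_rhs)
    return value


profits = {"A": 1.0, "B": 2.0, "C": 3.0}
# A -> B has confidence 1.0 and Support(B) = 0.75, so the rule adds value.
positive = value1([{"A", "B"}, {"A", "B"}, {"C"}, {"B", "C"}], "A", profits)
# A -> B has confidence 0.5 and Support(B) = 0.75, so the rule subtracts value.
negative = value1([{"A"}, {"A", "B"}, {"B"}, {"B"}], "A", profits)
```

In the first example the lift is above 1 and the contribution is positive; in the second the lift is below 1 and the contribution is negative, which is the behavior the modified form of lift is meant to produce.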
The problem with one item on promotion is fairly simple. For each customer, we
calculate the value of the item based on Definition 1. We can simply pick the top N customers to
send coupons to.
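Once a value is available for every customer, the single-item selection above reduces to a top-N query. The sketch below assumes the values have already been computed and stored in a dictionary; the name `top_n_customers` is hypothetical.

```python
import heapq


def top_n_customers(values, n):
    """Return the n customers with the highest value, best first.

    values: dict mapping customer id -> value of the promoted item
            for that customer (e.g., computed per Definition 1).
    """
    return heapq.nlargest(n, values, key=values.get)


values = {"c1": 1.0, "c2": -0.5, "c3": 3.0}
selected = top_n_customers(values, 2)
```

The text does not say whether customers with a negative value should be excluded when fewer than N customers have a positive value; this sketch simply returns the N highest.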
(2) Multiple item promotion
For simultaneous multiple item promotion, the retailer can decide to send the coupon for
a certain product (or products) to a certain customer. For simplicity, we assume that each item
has the same number of coupons N. A naïve approach is to use the method for single item
promotion for each item (i.e., pick the top N customers for each item to send coupons to).
However, there are issues with this approach. For example, after we send a coupon for Item A to a
customer, the additional benefit of Item B’s coupon may not be as high as when it is sent without Item
A’s coupon. One reason is that Item A might already encourage the purchase of Item B, so Item B’s coupon
may not be as useful for this customer even though this item has high value independently for
this customer. Item B’s coupon might generate higher benefit for someone else with lower value
for Item B alone. Below is a heuristic for this multiple item problem.
Step 1. For each customer, calculate Value1() for all the items on promotion.
Step 2. For each item, calculate the aggregated value of the top N customers for this item.
Step 3. Rank the items according to the aggregated values. Let R be the list of ranked items.
Step 4. Start from the top item i1 in R.
Step 5. Select the top N customers for i1.
Step 6. Move to the next item i2 in R.
Step 7. For each of the top N customers for i2, check whether this customer was selected for the
previous items. If not, select this customer. If so, calculate the additional value that i2 can add
on top of the previous items. Compare this added value with the value of the next available
customer outside the top N customers (using that customer's added value instead if he or she has been
selected for other items), and pick the one with the higher value. Continue down the list of the top
customers until the number of customers selected reaches N.
Step 8. Repeat Steps 6 and 7 until all items have been dealt with.
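The steps above can be sketched as follows. The heuristic does not give a formula for the additional value an item adds on top of coupons a customer has already been assigned, so the sketch takes it as a caller-supplied function `added_value(customer, item, prior_items)`, which is a hypothetical placeholder. The function name `assign_coupons`, the input layout, and the tie-breaking (keep the top-N customer unless the outside customer is strictly better) are likewise assumptions, and this is one reading of Step 7 rather than a definitive implementation.

```python
def assign_coupons(values, n, added_value):
    """Heuristic coupon assignment for multiple promoted items.

    values:      dict item -> dict customer -> Value1 (Step 1, precomputed).
    n:           number of coupons per item.
    added_value: callable (customer, item, prior_items) -> float giving the
                 extra value of `item` for a customer who already holds
                 coupons for `prior_items` (hypothetical, not in the paper).
    Returns a dict item -> list of selected customers.
    """
    # Customers ranked by value, best first, for every item.
    ranked = {
        item: sorted(cust_vals, key=cust_vals.get, reverse=True)
        for item, cust_vals in values.items()
    }

    # Steps 2-3: rank items by the aggregated value of their top n customers.
    def aggregate(item):
        return sum(values[item][c] for c in ranked[item][:n])

    item_order = sorted(values, key=aggregate, reverse=True)

    assigned = {}   # customer -> items already selected for this customer
    selection = {}

    def effective(customer, item):
        prior = assigned.get(customer, [])
        if not prior:
            return values[item][customer]
        return added_value(customer, item, prior)

    # Steps 4-8: the first item has no prior assignments, so its top n
    # customers are selected unchanged; later items may swap customers.
    for item in item_order:
        top, outside = ranked[item][:n], ranked[item][n:]
        next_out = 0
        chosen = []
        for customer in top:
            pick = customer
            if assigned.get(customer) and next_out < len(outside):
                alt = outside[next_out]  # next available outside customer
                if effective(alt, item) > effective(customer, item):
                    pick = alt
                    next_out += 1
            chosen.append(pick)
        for customer in chosen:
            assigned.setdefault(customer, []).append(item)
        selection[item] = chosen
    return selection


values = {
    "A": {"c1": 10.0, "c2": 8.0, "c3": 1.0},
    "B": {"c1": 9.0, "c2": 2.0, "c3": 5.0},
}
# Illustrative assumption only: a second coupon is worth half its Value1.
half = lambda customer, item, prior: 0.5 * values[item][customer]
result = assign_coupons(values, 1, half)
```

With these illustrative numbers, customer c1 receives the coupon for Item A, and the coupon for Item B goes to c3 rather than c1, because c1's discounted added value (4.5) is lower than c3's value (5.0).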
Expected contributions
First, we believe that the paper sets out to address a very real and important problem
faced by supermarkets, grocery stores, online retailers, and so on. The specific problem this paper
addresses belongs to the broader family of targeted marketing, which has very wide
application in practice. Second, this research incorporates profit maximization into the data
mining process and thus contributes to the actionability of data mining. Moreover, the methods
proposed in this paper combine optimization and data mining, and thus can contribute to
the existing literature on applying optimization in data mining and vice versa. In addition, the
framework set up in this paper can be easily extended to related problems such as item
recommendation.
Current status of the manuscript
Currently we are evaluating alternative heuristics to approach the problem. We have obtained
data from a supermarket containing information about customers, products (price etc.) and
purchase transactions. We are planning to implement several alternative heuristic-based
algorithms on both synthetic data and the real supermarket data to test the effectiveness of our
approach.