Consumer Behavior Prediction
using Parametric and
Nonparametric Methods
Elena Eneva
Carnegie Mellon University
25 November 2002
[email protected]
Recent Research Projects
• Dimensionality Reduction Methods and Fractal Dimension (with Christos Faloutsos)
• Learning to Change Taxonomies (with Valery Petrushin, Accenture Technology Labs)
• Text Re-Classification Using Existing Schemas (with Yiming Yang)
• Learning Within-Sentence Semantic Coherence (with Roni Rosenfeld)
• Automatic Document Summarization (with John Lafferty)
• Consumer Behavior Prediction (with Alan Montgomery [Business School] and Rich Caruana [SCS])
Outline
• Introduction & Motivation
• Dataset
• Baseline Models
• New Hybrid Models
• Results
• Summary & Work in Progress

How to increase profits?
• Without raising the overall price level?
• Without more advertising?
• Without attracting new customers?

A: Better Pricing Strategies
• Encourage the demand for the products which are most profitable for the store
• Recent trend: consolidating independent stores into chains
• Pricing doesn't take into account the variability of demand due to neighborhood differences

A: Micro-Marketing
• Pricing strategies should adapt to the neighborhood demand
• The basis: the difference in interbrand competition in different stores
• Stores can increase operating profit margins by 33% to 83% [Montgomery 1997]
Understanding Demand
• Need to understand the relationship between the prices of products in a category and the demand for these products
• Price Elasticity of Demand
Price Elasticity
The consumer's response to a price change:

$$E = \frac{\%\Delta Q}{\%\Delta P}$$

where Q is the quantity purchased and P is the price of the product. Demand is inelastic when |E| < 1 and elastic when |E| > 1.
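As a toy illustration of the formula (the numbers below are invented, not from the dataset):

```python
# Hypothetical example: a 10% price increase leads to a 25% drop in
# quantity purchased, so demand for this product is elastic (|E| > 1).
pct_change_q = -0.25   # percent change in quantity purchased
pct_change_p = 0.10    # percent change in price
elasticity = pct_change_q / pct_change_p
print(elasticity)      # -2.5
```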
Prices and Quantities
• The quantity demanded of a specific product is a function of the prices of all the products in that category
• This function is different for every store, for every category
The Function

$$q \sim f(p) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

[Diagram: for each category, a per-store predictor ("I know your customers") maps the prices of Products 1 through N to the quantities bought of Products 1 through N.]

Need to multiply this across many stores, many categories.
How to find this function?
• Traditionally: using parametric models (linear regression)
Data Example
[Figure: scatter plot of quantity against price.]
Data Example – Log Space
[Figure: scatter plot of ln(quant) against ln(price).]
The Function

$$\ln(q) \sim f(\ln(p)) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

[Diagram: the same per-store, per-category predictor, with prices converted to ln space on the way in and the predicted quantities converted back to the original space on the way out.]

Need to multiply this across many stores, many categories.
How to find this function?
• Traditionally: using parametric models (linear regression)
• Recently: using non-parametric models (neural networks)
Our Goal
• Advantage of LR: known functional form (linear in log space), extrapolation ability
• Advantage of NN: flexibility, accuracy

[Figure: accuracy vs. robustness plane; NN scores high on accuracy, LR on robustness, and the new models aim for both.]

• Take advantage: use the known functional form to bias the NN
• Build hybrid models from the baseline models
Evaluation Measure

$$\mathrm{RMS\;error} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(q_i - \hat{q}_i\right)^2}$$

Root Mean Squared Error (RMS): the average deviation between the true quantity and the predicted quantity.
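A minimal sketch of this measure in Python (NumPy assumed; the example quantities are placeholders):

```python
import numpy as np

def rms_error(q_true, q_pred):
    """Root mean squared error between true and predicted quantities."""
    q_true = np.asarray(q_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    return np.sqrt(np.mean((q_true - q_pred) ** 2))

print(rms_error([100, 80, 120], [90, 85, 118]))  # made-up quantities
```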
Error Measure – Unbiased Model

$$\ln(q) \sim f(\ln(p)) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

so that

$$\ln(q) \mid f(\ln(p)) \sim N(\mu, \sigma^2)$$

By computing the integral over the distribution,

$$E[e^{\ln(q)}] = e^{\mu + \frac{1}{2}\sigma^2}$$

but the naive back-transform gives $\hat{q} = e^{\widehat{\ln(q)}} = e^{\mu}$. So $e^{\widehat{\ln(q)}}$ is a biased estimator for q, and we correct the bias by using

$$\hat{q} = e^{\widehat{\ln(q)} + \frac{1}{2}\sigma^2},$$

which is an unbiased estimator for q.
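A sketch of the correction in Python, assuming sigma2 is the residual variance of the log-space model (estimated from the training residuals; the names and numbers are illustrative):

```python
import numpy as np

def unbias_prediction(ln_q_hat, sigma2):
    """Back-transform a log-space prediction to quantity space.

    exp(ln_q_hat) alone would estimate e^mu and systematically
    undershoot E[q] = e^(mu + sigma2/2); adding sigma2/2 in the
    exponent removes that bias."""
    return np.exp(ln_q_hat + 0.5 * sigma2)

# Example with made-up numbers: log-space prediction 4.0, sigma^2 = 0.04.
print(unbias_prediction(4.0, 0.04))   # e^4.02, slightly above e^4
```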
Dataset
• Store-level cash register data at the product level for 100 stores
• Store prices updated every week
• Two years of transactions
• Chilled Orange Juice category (12 products)
Models
• Baselines
– Linear Regression
– Neural Networks
• Hybrids
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
Baselines
• Linear Regression
• Neural Networks
Linear Regression

$$\ln(q) = a + \sum_{i=1}^{K} b_i \ln(p_i) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

• q is the quantity demanded
• $p_i$ is the price of the i-th product
• K products overall
• The coefficients a and $b_i$ are determined by the condition that the sum of the squared residuals is as small as possible.
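A minimal sketch of fitting this model with scikit-learn; the prices, coefficients, and noise level below are synthetic stand-ins, not the real data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
K, n_weeks = 12, 104                       # 12 products, two years of weeks
prices = rng.uniform(0.02, 0.06, size=(n_weeks, K))
b_true = rng.normal(-1.0, 0.5, size=K)     # made-up elasticity coefficients
ln_q = 4.0 + np.log(prices) @ b_true + rng.normal(0, 0.1, n_weeks)

# Least squares gives the intercept a and the coefficients b_i.
lr = LinearRegression().fit(np.log(prices), ln_q)
print(lr.intercept_, lr.coef_[:3])
```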
Linear Regression
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV; y-axis 0 to 12000.]
Neural Networks
• Generic nonlinear function approximators
• Collection of basic units (neurons), computing a (non)linear function of their input
• Random initialization
• Backpropagation
• Early stopping to prevent overfitting

Neural Networks
1 hidden layer, 100 units, sigmoid activation function
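A sketch of this baseline using scikit-learn's MLPRegressor, matching the stated architecture (the synthetic data mirrors the linear-regression sketch above):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.log(rng.uniform(0.02, 0.06, size=(104, 12)))   # ln(prices), synthetic
y = 4.0 + X @ rng.normal(-1.0, 0.5, 12) + rng.normal(0, 0.1, 104)

nn = MLPRegressor(hidden_layer_sizes=(100,),  # 1 hidden layer, 100 units
                  activation='logistic',      # sigmoid activation
                  early_stopping=True,        # hold out data, stop early
                  max_iter=2000, random_state=0)
nn.fit(X, y)
```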
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Hybrid Models
• Smart Prior
• MultiTask Learning
• Jumping Connections
• Frozen Jumping Connections
Smart Prior
Idea: initialize the NN with a "good" set of weights; help it start from a "smart" prior.
• Start the search in a state which already gives a linear approximation
• NN training in 2 stages (see the sketch below):
– First, on synthetic data (generated by the LR model)
– Second, on the real data

Smart Prior
[Diagram: the LR model's fit serves as the starting point for the NN.]
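A sketch of the two-stage procedure, here approximated with MLPRegressor's warm_start option so the second fit continues from the pretrained weights (synthetic data; the original implementation may differ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.log(rng.uniform(0.02, 0.06, size=(104, 12)))    # ln(prices), synthetic
y = 4.0 + X @ rng.normal(-1.0, 0.5, 12) + rng.normal(0, 0.1, 104)

lr = LinearRegression().fit(X, y)

# Stage 1: pretrain on synthetic data generated by the LR model, so the
# NN starts from a state that already gives a linear approximation.
X_syn = np.log(rng.uniform(0.02, 0.06, size=(2000, 12)))
nn = MLPRegressor(hidden_layer_sizes=(100,), activation='logistic',
                  warm_start=True, max_iter=500, random_state=0)
nn.fit(X_syn, lr.predict(X_syn))

# Stage 2: warm_start=True keeps the learned weights, so this fit
# continues training from the "smart" prior on the real data.
nn.fit(X, y)
```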
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Multitask Learning [Caruana 1997]
Idea: learn an additional related task in parallel, using a shared representation.
• Add the output of the LR model (built over the same inputs) as an extra output to the NN
• Make the NN share its hidden nodes between both tasks

MultiTask Learning
[Diagram: the two-output network with a shared hidden layer.]
• Custom halting function
• Custom RMS function
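A sketch of the shared-representation idea using a two-output MLPRegressor; this omits the custom halting and RMS functions mentioned above, and the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.log(rng.uniform(0.02, 0.06, size=(104, 12)))    # ln(prices), synthetic
y = 4.0 + X @ rng.normal(-1.0, 0.5, 12) + rng.normal(0, 0.1, 104)

# Extra task: the LR model's output over the same inputs.
y_lr = LinearRegression().fit(X, y).predict(X)

# Both outputs share the single hidden layer; the LR task biases the
# shared representation toward the known (log-)linear structure.
Y = np.column_stack([y, y_lr])
mtl = MLPRegressor(hidden_layer_sizes=(100,), activation='logistic',
                   max_iter=2000, random_state=0).fit(X, Y)
y_hat = mtl.predict(X)[:, 0]   # only the real task's output is used
```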
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Jumping Connections
Idea: fuse LR and NN.
• Modify the architecture of the NN
• Add connections which "jump" over the hidden layer
• Gives the effect of simulating a LR and a NN together

Jumping Connections
[Diagram: network with direct input-to-output connections that jump over the hidden layer; a sketch follows.]
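A sketch of such an architecture in PyTorch; the layer sizes follow the NN baseline, and the class and names are illustrative, not the original code:

```python
import torch
import torch.nn as nn

class JumpingConnections(nn.Module):
    """Sigmoid MLP plus a linear path that 'jumps' over the hidden
    layer, so the output is effectively an LR and an NN added together."""
    def __init__(self, n_inputs, n_hidden=100):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(n_inputs, n_hidden),
                                    nn.Sigmoid())
        self.out = nn.Linear(n_hidden, 1)
        self.jump = nn.Linear(n_inputs, 1)   # direct input-to-output path

    def forward(self, x):
        return self.out(self.hidden(x)) + self.jump(x)

model = JumpingConnections(n_inputs=12)      # 12 products in the category
print(model(torch.randn(4, 12)).shape)       # torch.Size([4, 1])
```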
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Frozen Jumping Connections
Idea: show the model what the "jump" is for.
• Same architecture as Jumping Connections, but two training stages
• Freeze the weights of the jumping layer, so the network can't "forget" about the linearity

Frozen Jumping Connections
[Diagram: the jumping layer is frozen after the first training stage; a sketch follows.]
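Continuing the PyTorch sketch, the freeze can be expressed by turning off gradients on the jump path (illustrative, assuming the JumpingConnections class from the previous sketch):

```python
import torch

# Stage 1 would fit the linear path (e.g. copy LR coefficients into
# model.jump). Stage 2 freezes the jumping layer so backpropagation
# can no longer change it, and the network can't "forget" the linearity.
model = JumpingConnections(n_inputs=12)
for p in model.jump.parameters():
    p.requires_grad = False

# The optimizer only sees the remaining trainable parameters.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```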
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Models
• Baselines:
– Linear Regression
– Neural Networks
• Hybrids:
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
• Combinations:
– Voting
– Weighted Average
Combining Models
Idea: ensemble learning. Use all the models and then combine their predictions.
• Committee Voting
• Weighted Average
Uses the 2 baseline and 3 hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections).
Committee Voting
• Average the predictions of the models
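A minimal sketch of the vote (the prediction values are placeholders):

```python
import numpy as np

# Rows: the 5 committee models; columns: predicted quantities for two
# test weeks (placeholder numbers).
preds = np.array([[ 98.0, 102.0],
                  [ 95.0, 110.0],
                  [ 97.0, 104.0],
                  [ 99.0, 106.0],
                  [ 96.0, 108.0]])
vote = preds.mean(axis=0)   # committee vote = unweighted average
print(vote)                 # [ 97. 106.]
```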
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Weighted Average – Model Regression
• Optimal weights determined by a linear regression model over the predictions
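A sketch of that step: fit a linear regression over the individual models' predictions (the data here is synthetic placeholder material):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
y_true = rng.uniform(80, 120, 50)                  # made-up true quantities
P = y_true[:, None] + rng.normal(0, 5, (50, 5))    # 5 models' noisy predictions

stack = LinearRegression().fit(P, y_true)          # learns the model weights
weighted = stack.predict(P)
print(stack.coef_)   # one weight per committee member
```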
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Normalized RMS Error
• Compare model performance across stores with different:
– sizes
– ages
– locations
• Need to normalize
• Compare to baselines
• Take the error of the LR benchmark as unit error
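A sketch of the normalization (the error values are placeholders):

```python
# One store's RMS errors by model (placeholder numbers); dividing by
# the LR benchmark makes results comparable across stores.
rms = {'LR': 8000.0, 'NN': 7400.0, 'FJC': 6600.0}
normalized = {m: e / rms['LR'] for m, e in rms.items()}
print(normalized)   # {'LR': 1.0, 'NN': 0.925, 'FJC': 0.825}
```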
Normalized RMS Error
[Figure: normalized RMS error (LR = 1.00) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV; y-axis from 0.75 to 1.10.]
Summary
[Diagram: the per-store, per-category predictor ("I know your customers") mapping the prices of Products 1 through N to the quantities bought of Products 1 through N.]
• Built new models for better pricing strategies for individual stores and categories
• Hybrid models are clearly superior to the baselines for customer choice prediction
• Incorporated domain knowledge (linearity) into Neural Networks
• The new models allow stores to:
– price products more strategically and optimize profits
– maintain better inventories
– understand product interaction
www.cs.cmu.edu/~eneva
References
• Montgomery, A. (1997). Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data.
• West, P., Brockett, P., and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice.
• Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data.
• Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.
Work In Progress
• Analyze the Weighted Average model
• Compare the extrapolation ability of the new models
• Other MTL tasks:
– shrinkage model: a "super" store model with data pooled across all stores
– store zones