Consumer Behavior Prediction
using Parametric and
Nonparametric Methods
Elena Eneva
Carnegie Mellon University
25 November 2002
[email protected]
Recent Research Projects
• Dimensionality Reduction Methods and Fractal Dimension (with Christos Faloutsos)
• Learning to Change Taxonomies (with Valery Petrushin, Accenture Technology Labs)
• Text Re-Classification Using Existing Schemas (with Yiming Yang)
• Learning Within-Sentence Semantic Coherence (with Roni Rosenfeld)
• Automatic Document Summarization (with John Lafferty)
• Consumer Behavior Prediction (with Alan Montgomery [Business School] and Rich Caruana [SCS])
Outline
• Introduction & Motivation
• Dataset
• Baseline Models
• New Hybrid Models
• Results
• Summary & Work in Progress

How to increase profits?
• Without raising the overall price level?
• Without more advertising?
• Without attracting new customers?

A: Better Pricing Strategies
• Encourage the demand for the products which are most profitable for the store
• Recent trend: consolidating independent stores into chains
• Pricing doesn't take into account the variability of demand due to neighborhood differences

A: Micro-Marketing
• Pricing strategies should adapt to the neighborhood demand
• The basis: the difference in interbrand competition in different stores
• Stores can increase operating profit margins by 33% to 83% [Montgomery 1997]
Understanding Demand
• Need to understand the relationship between the prices of products in a category and the demand for these products
• Price Elasticity of Demand
Price Elasticity
The consumer's response to a price change:

$$E = \frac{\%\Delta Q}{\%\Delta P}$$

where Q is the quantity purchased and P is the price of the product. Demand is inelastic when |E| < 1 and elastic when |E| > 1.
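As a toy illustration of the formula (the numbers below are invented, not from the dataset):

```python
# Hypothetical example: a 10% price increase leads to a 25% drop in
# quantity purchased, so demand for this product is elastic (|E| > 1).
pct_change_q = -0.25   # percent change in quantity purchased
pct_change_p = 0.10    # percent change in price
elasticity = pct_change_q / pct_change_p
print(elasticity)      # -2.5
```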
Prices and Quantities
• The quantity demanded of a specific product is a function of the prices of all the products in that category
• This function is different for every store, for every category
The Function

$$q \sim f(p) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

[Diagram: for each category, a per-store predictor ("I know your customers") maps the prices of Products 1 through N to the quantities bought of Products 1 through N.]

Need to multiply this across many stores, many categories.
How to find this function?
• Traditionally: using parametric models (linear regression)
Data Example
[Figure: scatter plot of quantity against price.]
Data Example – Log Space
[Figure: scatter plot of ln(quant) against ln(price).]
The Function

$$\ln(q) \sim f(\ln(p)) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

[Diagram: the same per-store, per-category predictor, with prices converted to ln space on the way in and the predicted quantities converted back to the original space on the way out.]

Need to multiply this across many stores, many categories.
How to find this function?
• Traditionally: using parametric models (linear regression)
• Recently: using non-parametric models (neural networks)
Our Goal
• Advantage of LR: known functional form (linear in log space), extrapolation ability
• Advantage of NN: flexibility, accuracy

[Figure: accuracy vs. robustness plane; NN scores high on accuracy, LR on robustness, and the new models aim for both.]

• Take advantage: use the known functional form to bias the NN
• Build hybrid models from the baseline models
Evaluation Measure

$$\mathrm{RMS\;error} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(q_i - \hat{q}_i\right)^2}$$

Root Mean Squared Error (RMS): the average deviation between the true quantity and the predicted quantity.
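A minimal sketch of this measure in Python (NumPy assumed; the example quantities are placeholders):

```python
import numpy as np

def rms_error(q_true, q_pred):
    """Root mean squared error between true and predicted quantities."""
    q_true = np.asarray(q_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    return np.sqrt(np.mean((q_true - q_pred) ** 2))

print(rms_error([100, 80, 120], [90, 85, 118]))  # made-up quantities
```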
Error Measure – Unbiased Model

$$\ln(q) \sim f(\ln(p)) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

so that

$$\ln(q) \mid f(\ln(p)) \sim N(\mu, \sigma^2)$$

By computing the integral over the distribution,

$$E[e^{\ln(q)}] = e^{\mu + \frac{1}{2}\sigma^2}$$

but the naive back-transform gives $\hat{q} = e^{\widehat{\ln(q)}} = e^{\mu}$. So $e^{\widehat{\ln(q)}}$ is a biased estimator for q, and we correct the bias by using

$$\hat{q} = e^{\widehat{\ln(q)} + \frac{1}{2}\sigma^2},$$

which is an unbiased estimator for q.
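A sketch of the correction in Python, assuming sigma2 is the residual variance of the log-space model (estimated from the training residuals; the names and numbers are illustrative):

```python
import numpy as np

def unbias_prediction(ln_q_hat, sigma2):
    """Back-transform a log-space prediction to quantity space.

    exp(ln_q_hat) alone would estimate e^mu and systematically
    undershoot E[q] = e^(mu + sigma2/2); adding sigma2/2 in the
    exponent removes that bias."""
    return np.exp(ln_q_hat + 0.5 * sigma2)

# Example with made-up numbers: log-space prediction 4.0, sigma^2 = 0.04.
print(unbias_prediction(4.0, 0.04))   # e^4.02, slightly above e^4
```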
Dataset
• Store-level cash register data at the product level for 100 stores
• Store prices updated every week
• Two years of transactions
• Chilled Orange Juice category (12 products)
Models
• Baselines
– Linear Regression
– Neural Networks
• Hybrids
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
Baselines
• Linear Regression
• Neural Networks
Linear Regression

$$\ln(q) = a + \sum_{i=1}^{K} b_i \ln(p_i) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

• q is the quantity demanded
• $p_i$ is the price of the i-th product
• K products overall
• The coefficients a and $b_i$ are determined by the condition that the sum of the squared residuals is as small as possible.
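A minimal sketch of fitting this model with scikit-learn; the prices, coefficients, and noise level below are synthetic stand-ins, not the real data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
K, n_weeks = 12, 104                       # 12 products, two years of weeks
prices = rng.uniform(0.02, 0.06, size=(n_weeks, K))
b_true = rng.normal(-1.0, 0.5, size=K)     # made-up elasticity coefficients
ln_q = 4.0 + np.log(prices) @ b_true + rng.normal(0, 0.1, n_weeks)

# Least squares gives the intercept a and the coefficients b_i.
lr = LinearRegression().fit(np.log(prices), ln_q)
print(lr.intercept_, lr.coef_[:3])
```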
Linear Regression
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV; y-axis 0 to 12000.]
Neural Networks
• Generic nonlinear function approximators
• Collection of basic units (neurons), computing a (non)linear function of their input
• Random initialization
• Backpropagation
• Early stopping to prevent overfitting

Neural Networks
1 hidden layer, 100 units, sigmoid activation function
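A sketch of this baseline using scikit-learn's MLPRegressor, matching the stated architecture (the synthetic data mirrors the linear-regression sketch above):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.log(rng.uniform(0.02, 0.06, size=(104, 12)))   # ln(prices), synthetic
y = 4.0 + X @ rng.normal(-1.0, 0.5, 12) + rng.normal(0, 0.1, 104)

nn = MLPRegressor(hidden_layer_sizes=(100,),  # 1 hidden layer, 100 units
                  activation='logistic',      # sigmoid activation
                  early_stopping=True,        # hold out data, stop early
                  max_iter=2000, random_state=0)
nn.fit(X, y)
```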
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Hybrid Models
• Smart Prior
• MultiTask Learning
• Jumping Connections
• Frozen Jumping Connections
Smart Prior
Idea: initialize the NN with a "good" set of weights; help it start from a "smart" prior.
• Start the search in a state which already gives a linear approximation
• NN training in 2 stages (see the sketch below):
– First, on synthetic data (generated by the LR model)
– Second, on the real data

Smart Prior
[Diagram: the LR model's fit serves as the starting point for the NN.]
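A sketch of the two-stage procedure, here approximated with MLPRegressor's warm_start option so the second fit continues from the pretrained weights (synthetic data; the original implementation may differ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.log(rng.uniform(0.02, 0.06, size=(104, 12)))    # ln(prices), synthetic
y = 4.0 + X @ rng.normal(-1.0, 0.5, 12) + rng.normal(0, 0.1, 104)

lr = LinearRegression().fit(X, y)

# Stage 1: pretrain on synthetic data generated by the LR model, so the
# NN starts from a state that already gives a linear approximation.
X_syn = np.log(rng.uniform(0.02, 0.06, size=(2000, 12)))
nn = MLPRegressor(hidden_layer_sizes=(100,), activation='logistic',
                  warm_start=True, max_iter=500, random_state=0)
nn.fit(X_syn, lr.predict(X_syn))

# Stage 2: warm_start=True keeps the learned weights, so this fit
# continues training from the "smart" prior on the real data.
nn.fit(X, y)
```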
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Multitask Learning [Caruana 1997]
Idea: learn an additional related task in parallel, using a shared representation.
• Add the output of the LR model (built over the same inputs) as an extra output to the NN
• Make the NN share its hidden nodes between both tasks

MultiTask Learning
[Diagram: the two-output network with a shared hidden layer.]
• Custom halting function
• Custom RMS function
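A sketch of the shared-representation idea using a two-output MLPRegressor; this omits the custom halting and RMS functions mentioned above, and the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.log(rng.uniform(0.02, 0.06, size=(104, 12)))    # ln(prices), synthetic
y = 4.0 + X @ rng.normal(-1.0, 0.5, 12) + rng.normal(0, 0.1, 104)

# Extra task: the LR model's output over the same inputs.
y_lr = LinearRegression().fit(X, y).predict(X)

# Both outputs share the single hidden layer; the LR task biases the
# shared representation toward the known (log-)linear structure.
Y = np.column_stack([y, y_lr])
mtl = MLPRegressor(hidden_layer_sizes=(100,), activation='logistic',
                   max_iter=2000, random_state=0).fit(X, Y)
y_hat = mtl.predict(X)[:, 0]   # only the real task's output is used
```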
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Jumping Connections
Idea: fuse LR and NN.
• Modify the architecture of the NN
• Add connections which "jump" over the hidden layer
• Gives the effect of simulating a LR and a NN together

Jumping Connections
[Diagram: network with direct input-to-output connections that jump over the hidden layer; a sketch follows.]
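A sketch of such an architecture in PyTorch; the layer sizes follow the NN baseline, and the class and names are illustrative, not the original code:

```python
import torch
import torch.nn as nn

class JumpingConnections(nn.Module):
    """Sigmoid MLP plus a linear path that 'jumps' over the hidden
    layer, so the output is effectively an LR and an NN added together."""
    def __init__(self, n_inputs, n_hidden=100):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(n_inputs, n_hidden),
                                    nn.Sigmoid())
        self.out = nn.Linear(n_hidden, 1)
        self.jump = nn.Linear(n_inputs, 1)   # direct input-to-output path

    def forward(self, x):
        return self.out(self.hidden(x)) + self.jump(x)

model = JumpingConnections(n_inputs=12)      # 12 products in the category
print(model(torch.randn(4, 12)).shape)       # torch.Size([4, 1])
```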
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Frozen Jumping Connections
Idea: show the model what the "jump" is for.
• Same architecture as Jumping Connections, but two training stages
• Freeze the weights of the jumping layer, so the network can't "forget" about the linearity

Frozen Jumping Connections
[Diagram: the jumping layer is frozen after the first training stage; a sketch follows.]
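Continuing the PyTorch sketch, the freeze can be expressed by turning off gradients on the jump path (illustrative, assuming the JumpingConnections class from the previous sketch):

```python
import torch

# Stage 1 would fit the linear path (e.g. copy LR coefficients into
# model.jump). Stage 2 freezes the jumping layer so backpropagation
# can no longer change it, and the network can't "forget" the linearity.
model = JumpingConnections(n_inputs=12)
for p in model.jump.parameters():
    p.requires_grad = False

# The optimizer only sees the remaining trainable parameters.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```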
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Models
• Baselines:
– Linear Regression
– Neural Networks
• Hybrids:
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
• Combinations:
– Voting
– Weighted Average
Combining Models
Idea: ensemble learning. Use all the models and then combine their predictions.
• Committee Voting
• Weighted Average
Uses the 2 baseline and 3 hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections).
Committee Voting
• Average the predictions of the models
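A minimal sketch of the vote (the prediction values are placeholders):

```python
import numpy as np

# Rows: the 5 committee models; columns: predicted quantities for two
# test weeks (placeholder numbers).
preds = np.array([[ 98.0, 102.0],
                  [ 95.0, 110.0],
                  [ 97.0, 104.0],
                  [ 99.0, 106.0],
                  [ 96.0, 108.0]])
vote = preds.mean(axis=0)   # committee vote = unweighted average
print(vote)                 # [ 97. 106.]
```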
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Weighted Average – Model Regression
• Optimal weights determined by a linear regression model over the predictions
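A sketch of that step: fit a linear regression over the individual models' predictions (the data here is synthetic placeholder material):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
y_true = rng.uniform(80, 120, 50)                  # made-up true quantities
P = y_true[:, None] + rng.normal(0, 5, (50, 5))    # 5 models' noisy predictions

stack = LinearRegression().fit(P, y_true)          # learns the model weights
weighted = stack.predict(P)
print(stack.coef_)   # one weight per committee member
```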
Results – RMS Error
[Figure: bar chart of RMS error for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Normalized RMS Error
• Compare model performance across stores with different:
– sizes
– ages
– locations
• Need to normalize
• Compare to baselines
• Take the error of the LR benchmark as unit error
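A sketch of the normalization (the error values are placeholders):

```python
# One store's RMS errors by model (placeholder numbers); dividing by
# the LR benchmark makes results comparable across stores.
rms = {'LR': 8000.0, 'NN': 7400.0, 'FJC': 6600.0}
normalized = {m: e / rms['LR'] for m, e in rms.items()}
print(normalized)   # {'LR': 1.0, 'NN': 0.925, 'FJC': 0.825}
```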
Normalized RMS Error
[Figure: normalized RMS error (LR = 1.00) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV; y-axis from 0.75 to 1.10.]
Summary
[Diagram: the per-store, per-category predictor ("I know your customers") mapping the prices of Products 1 through N to the quantities bought of Products 1 through N.]
• Built new models for better pricing strategies for individual stores and categories
• Hybrid models are clearly superior to the baselines for customer choice prediction
• Incorporated domain knowledge (linearity) into Neural Networks
• The new models allow stores to:
– price products more strategically and optimize profits
– maintain better inventories
– understand product interaction
www.cs.cmu.edu/~eneva
References
• Montgomery, A. (1997). Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data.
• West, P., Brockett, P., and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice.
• Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data.
• Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.
Work In Progress
• Analyze the Weighted Average model
• Compare the extrapolation ability of the new models
• Other MTL tasks:
– shrinkage model: a "super" store model with data pooled across all stores
– store zones