Consumer Behavior Prediction using Parametric and Nonparametric Methods
Elena Eneva, Carnegie Mellon University
25 November 2002
[email protected]

Recent Research Projects
– Dimensionality Reduction Methods and Fractal Dimension (with Christos Faloutsos)
– Learning to Change Taxonomies (with Valery Petrushin, Accenture Technology Labs)
– Text Re-Classification Using Existing Schemas (with Yiming Yang)
– Learning Within-Sentence Semantic Coherence (with Roni Rosenfeld)
– Automatic Document Summarization (with John Lafferty)
– Consumer Behavior Prediction (with Alan Montgomery [business school] and Rich Caruana [SCS])

Outline
– Introduction & Motivation
– Dataset
– Baseline Models
– New Hybrid Models
– Results
– Summary & Work in Progress

How to Increase Profits?
– Without raising the overall price level?
– Without more advertising?
– Without attracting new customers?
A: Better Pricing Strategies
– Encourage the demand for the products which are most profitable for the store
– Recent trend to consolidate independent stores into chains
– Pricing doesn't take into account the variability of demand due to neighborhood differences
A: Micro-Marketing
– Pricing strategies should adapt to the neighborhood demand
– The basis: the difference in interbrand competition in different stores
– Stores can increase operating profit margins by 33% to 83% [Montgomery 1997]

Understanding Demand
– Need to understand the relationship between the prices of the products in a category and the demand for these products
– Price Elasticity of Demand

Price Elasticity
– The consumer's response to a price change:
  $E = \frac{\%\Delta Q}{\%\Delta P}$
– Q is the quantity purchased, P is the price of the product
– $|E| < 1$: demand is inelastic; $|E| > 1$: demand is elastic

Prices and Quantities
– The quantity demanded of a specific product is a function of the prices of all the products in that category
– This function is different for every store and for every category

The Function
$q = f(p) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
[Diagram: the prices of products 1..N in a category feed a per-store predictor ("I know your customers"), which outputs the quantities bought of products 1..N.]
– Need to multiply this across many stores and many categories

How to find this function?
– Traditionally: using parametric models (linear regression)

Data Example
[Figure: scatter plot of quantity against price for one product.]

Data Example – Log Space
[Figure: the same data in log space, ln(quantity) against ln(price); the relationship is roughly linear.]

The Function – Log Space
$\ln(q) = f(\ln(p)) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
[Diagram: as before, with prices converted to ln space at the input and predictions converted back to the original space at the output.]
– Need to multiply this across many stores and many categories

How to find this function?
– Traditionally: using parametric models (linear regression)
– Recently: using non-parametric models (neural networks)

Our Goal
– Advantage of LR: known functional form (linear in log space), extrapolation ability
– Advantage of NN: flexibility, accuracy
[Figure: models placed on accuracy vs. robustness axes; NN is strong on accuracy, LR on robustness, and the new models aim for both.]
– Take advantage of both: use the known functional form to bias the NN
– Build hybrid models from the baseline models

Evaluation Measure
– Root mean squared error (RMS): the average deviation between the true quantity and the predicted quantity
  $\mathrm{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (q_i - \hat{q}_i)^2}$

Error Measure – Unbiased Model
– The model is $\ln(q) = f(\ln(p)) + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, so $\ln(q) \mid f(\ln(p)) \sim N(\mu, \sigma^2)$
– By computing the integral over the distribution, $E[e^{\ln(q)}] = e^{\mu + \sigma^2/2}$
– Therefore $\hat{q} = e^{\widehat{\ln(q)}}$ is a biased estimator for q, and we correct the bias by using
  $\hat{q} = e^{\widehat{\ln(q)} + \sigma^2/2}$
  which is an unbiased estimator for q
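A minimal numpy sketch of the evaluation measure and the bias correction above. Estimating $\sigma^2$ from the training residuals is an assumed implementation detail; the slides do not say how the variance is obtained.

```python
import numpy as np

def rms_error(q_true, q_pred):
    """Root mean squared error between true and predicted quantities."""
    q_true, q_pred = np.asarray(q_true), np.asarray(q_pred)
    return np.sqrt(np.mean((q_true - q_pred) ** 2))

def debias(lnq_pred, lnq_train_true, lnq_train_pred):
    """Convert log-space predictions back to quantities.

    exp(lnq_pred) is a biased estimator of q, because
    E[exp(ln q)] = exp(mu + sigma^2 / 2) when ln q ~ N(mu, sigma^2);
    adding sigma^2 / 2 inside the exponent removes the bias.
    sigma^2 is estimated here from training residuals (an assumption).
    """
    sigma2 = np.var(lnq_train_true - lnq_train_pred)
    return np.exp(lnq_pred + sigma2 / 2.0)
```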
Dataset
– Store-level cash register data at the product level for 100 stores
– Store prices updated every week
– Two years of transactions
– Chilled Orange Juice category (12 products)

Models
– Baselines: Linear Regression, Neural Networks
– Hybrids: Smart Prior, MultiTask Learning, Jumping Connections, Frozen Jumping Connections

Baselines
– Linear Regression
– Neural Networks

Linear Regression
$\ln(q) = a + \sum_{i=1}^{K} b_i \ln(p_i) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
– q is the quantity demanded, $p_i$ is the price of the i-th product, K products overall
– The coefficients a and $b_i$ are determined by the condition that the sum of the squared residuals is as small as possible

Linear Regression Results
[Figure: RMS error by model (LR, NN, SmPr, MTL, JC, FJC, Vote, WAV).]

Neural Networks
– Generic nonlinear function approximators
– A collection of basic units (neurons), each computing a (non)linear function of its input
– Random initialization, backpropagation
– Early stopping to prevent overfitting
– Here: 1 hidden layer, 100 units, sigmoid activation function

Results
[Figure: RMS error by model (LR, NN, SmPr, MTL, JC, FJC, Vote, WAV).]
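A sketch of the two baselines, using scikit-learn models as modern stand-ins for the original 2002 implementations. The store data here is synthetic and purely illustrative; only the choices stated on the slides (log-log regression; 100 sigmoid hidden units with early stopping) come from the deck, everything else is assumed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Illustrative stand-in for one store's data: 104 weekly observations,
# log prices of the 12 products (X), log quantity of one product (y).
X = np.log(rng.uniform(0.02, 0.06, size=(104, 12)))
y = 4.0 - 2.5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 0.2, size=104)

# Baseline 1: log-log linear regression, ln(q) = a + sum_i b_i ln(p_i).
lr = LinearRegression().fit(X, y)

# Baseline 2: one hidden layer of 100 sigmoid units, trained by
# backpropagation with early stopping to prevent overfitting.
nn = MLPRegressor(hidden_layer_sizes=(100,), activation="logistic",
                  early_stopping=True, max_iter=2000,
                  random_state=0).fit(X, y)

print("LR coefficients:", lr.coef_[:2])
print("NN R^2 on training data:", nn.score(X, y))
```

A side note on the log-log form: the fitted coefficients $b_i$ are exactly the price elasticities from the earlier slide, which is part of why this parametrization is the traditional choice.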
Hybrid Models
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections

Smart Prior
– Idea: initialize the NN with a "good" set of weights; help it start from a "smart" prior
– Start the search in a state which already gives a linear approximation
– NN training in 2 stages: first on synthetic data (generated by the LR model), then on the real data
– (A training sketch follows the combination results below.)
[Diagram: the LR model generates the synthetic data for the first training stage.]

Results
[Figure: RMS error by model (LR, NN, SmPr, MTL, JC, FJC, Vote, WAV).]

MultiTask Learning [Caruana 1997]
– Idea: learn an additional related task in parallel, using a shared representation
– Add the output of the LR model (built over the same inputs) as an extra output of the NN
– Make the NN share its hidden nodes between both tasks
– Custom halting function, custom RMS function
– (An architecture sketch follows the combination results below.)

Results
[Figure: RMS error by model (LR, NN, SmPr, MTL, JC, FJC, Vote, WAV).]

Jumping Connections
– Idea: fuse LR and NN by modifying the architecture of the NN
– Add connections which "jump" over the hidden layer
– Gives the effect of simulating a LR and a NN together

Results
[Figure: RMS error by model (LR, NN, SmPr, MTL, JC, FJC, Vote, WAV).]

Frozen Jumping Connections
– Idea: show the model what the "jump" is for
– Same architecture as Jumping Connections, but two training stages
– Freeze the weights of the jumping layer, so the network can't "forget" about the linearity
– (An architecture sketch follows the combination results below.)

Results
[Figure: RMS error by model (LR, NN, SmPr, MTL, JC, FJC, Vote, WAV).]

Models
– Baselines: Linear Regression, Neural Networks
– Hybrids: Smart Prior, MultiTask Learning, Jumping Connections, Frozen Jumping Connections
– Combinations: Voting, Weighted Average

Combining Models
– Idea: ensemble learning; use all models and then combine their predictions
– Committee Voting and Weighted Average
– Members: the 2 baseline models and 3 hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections)

Committee Voting
– Average the predictions of the models

Results
[Figure: RMS error by model (LR, NN, SmPr, MTL, JC, FJC, Vote, WAV).]

Weighted Average – Model Regression
– Optimal weights determined by a linear regression model over the predictions

Results
[Figure: RMS error by model (LR, NN, SmPr, MTL, JC, FJC, Vote, WAV).]
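A sketch of the Smart Prior's two training stages, again with scikit-learn stand-ins. The synthetic-sample count, the noise level, and the use of warm_start to carry weights from stage 1 into stage 2 are assumptions, not details from the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

def smart_prior_nn(X, y, n_synth=5000, noise=0.1, seed=0):
    """Stage 1: fit the NN to synthetic data generated by the LR model,
    so the search starts from a state that already gives a linear
    approximation. Stage 2: continue training on the real data
    (warm_start keeps the stage-1 weights)."""
    rng = np.random.default_rng(seed)
    lr = LinearRegression().fit(X, y)

    # Synthetic log-price vectors drawn over the observed price range,
    # labeled by the LR model (plus a little assumed noise).
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_synth = rng.uniform(lo, hi, size=(n_synth, X.shape[1]))
    y_synth = lr.predict(X_synth) + rng.normal(0, noise, size=n_synth)

    nn = MLPRegressor(hidden_layer_sizes=(100,), activation="logistic",
                      warm_start=True, max_iter=500, random_state=seed)
    nn.fit(X_synth, y_synth)   # stage 1: learn the linear prior
    nn.fit(X, y)               # stage 2: refine on the real data
    return nn
```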
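A sketch of the MultiTask Learning architecture described above, in PyTorch: one shared sigmoid hidden layer with two heads, where the LR model's output serves as the related parallel task. The optimizer, epoch count, and task weight are assumptions; the slides' custom halting and RMS functions are not reproduced here.

```python
import torch
import torch.nn as nn

class MTLNet(nn.Module):
    """Shared sigmoid hidden layer with two output heads: the real
    log quantity, and the LR model's prediction as an extra task."""
    def __init__(self, n_inputs, n_hidden=100):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_inputs, n_hidden),
                                    nn.Sigmoid())
        self.main_head = nn.Linear(n_hidden, 1)  # predicts ln(q)
        self.lr_head = nn.Linear(n_hidden, 1)    # predicts the LR output

    def forward(self, x):
        h = self.shared(x)
        return self.main_head(h), self.lr_head(h)

def train_mtl(model, x, y, y_lr, epochs=500, task_weight=0.5):
    """x: (n, K) log prices; y, y_lr: (n, 1) targets, where y_lr holds
    the LR model's predictions on the same inputs."""
    opt = torch.optim.Adam(model.parameters())
    mse = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        out_main, out_lr = model(x)
        # Both tasks share the hidden layer; the auxiliary LR task
        # biases the representation toward the known linear form.
        loss = mse(out_main, y) + task_weight * mse(out_lr, y_lr)
        loss.backward()
        opt.step()
    return model
```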
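A sketch of the Jumping Connections architecture, with the freezing step of the Frozen variant. Reading the two training stages as "copy the LR solution into the jumping layer, freeze it, then train the rest" is one plausible interpretation of the slides, not a confirmed detail.

```python
import torch
import torch.nn as nn

class JumpingNet(nn.Module):
    """Sigmoid hidden layer plus a linear connection that 'jumps' over
    it, so the network simulates a LR and a NN added together."""
    def __init__(self, n_inputs, n_hidden=100):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(n_inputs, n_hidden),
                                    nn.Sigmoid())
        self.out = nn.Linear(n_hidden, 1)
        self.jump = nn.Linear(n_inputs, 1)  # jumps over the hidden layer

    def forward(self, x):
        return self.out(self.hidden(x)) + self.jump(x)

def freeze_jump(model, lr_coef, lr_intercept):
    """Frozen variant: load the LR solution into the jumping layer and
    freeze it, so later training can't 'forget' the linearity."""
    with torch.no_grad():
        model.jump.weight.copy_(
            torch.as_tensor(lr_coef, dtype=torch.float32).reshape(1, -1))
        model.jump.bias.copy_(
            torch.as_tensor(lr_intercept, dtype=torch.float32).reshape(1))
    for p in model.jump.parameters():
        p.requires_grad = False
```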
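The two combination schemes are simple enough to state directly. A sketch, assuming the member predictions have already been collected into a matrix:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# preds_*: (n_samples, n_models) quantity predictions from the committee
# members (LR, NN, Smart Prior, MTL, Frozen Jumping Connections).

def committee_vote(preds):
    """Committee voting: average the members' predictions."""
    return preds.mean(axis=1)

def weighted_average(preds_train, q_train, preds_test):
    """Weighted average: the weights come from a linear regression
    fit over the members' predictions, as on the slide."""
    stacker = LinearRegression().fit(preds_train, q_train)
    return stacker.predict(preds_test)
```

The Weighted Average is what is now usually called stacking: the meta-model learns how much to trust each member rather than weighting them equally.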
Normalized RMS Error
– Compare model performance across stores with different sizes, ages, and locations
– Need to normalize to compare to the baselines: take the error of the LR benchmark as the unit error
[Figure: normalized RMS error by model (LR, NN, SmPr, MTL, JC, FJC, Vote, WAV), with the LR benchmark as unit error; axis range roughly 0.75 to 1.10.]

Summary
[Diagram: the prices of products 1..N in a category feed the predictor ("I know your customers"), which outputs the quantities bought of products 1..N.]
– Built new models for better pricing strategies for individual stores and categories
– Hybrid models are clearly superior to the baselines for customer choice prediction
– Incorporated domain knowledge (linearity) into neural networks
– The new models allow stores to price their products more strategically and optimize profits, maintain better inventories, and understand product interaction
www.cs.cmu.edu/~eneva

References
– Montgomery, A. (1997). Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data.
– West, P., Brockett, P., and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice.
– Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data.
– Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.

Work In Progress
– Analyze the Weighted Average model
– Compare the extrapolation ability of the new models
– Other MTL tasks: a shrinkage model, a "super" store model with data pooled across all stores, store zones