Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Database Marketing Dr. Ron Rymon Marketing Communications Program IDC, Herzliya Overview Goal: describe the framework, and touch on the current trends and buzzwords Outline: Uses of the marketing database The Data Implementation technologies Analysis techniques Modeling techniques Uses of the Marketing Database The Marketing Database • Comprehensive collection of interrelated data ... • Arranged around each customer ... • Allow timely and accurate retrieval ... • Support analytical, predictive, operational needs ... • Serving multiple applications … Active Database: An Integrated Business Resource Marketing Customer Service Finance Database Research Sales Distribution Information is Power: Active databases drive the business… • Identify your best customers – profitability analysis, clustering • Develop new customers and cross-sell – similar to current, identify competitors’ customers • Improve delivery of sales promotion – response modeling / targeting • Personalize message – based on purchase patterns, volume • Use as a research tool across the organization – customer, product, market research Key Building Blocks • Data – a database is only as powerful as its data • Implementation technologies – hardware, networking, warehouses • Analysis techniques – RFM, LTV, OLAP, Segmentation, Visualization • Modeling – Regressions, Artificial Intelligence, Data Mining The Data A database is only as powerful as the data it houses. Customer-centric Database • Every database is a collection of records • Each record is a collection of fields • Here, one record per customer and/or prospect – unique identifier – general customer/account information • including demographic, psychographic, socio-economic – our offers+communications to the customer – customer’s actions: response, purchase, payment What Data To Hold • Too often, data is collected based on availability, and not based on projected need • Should accumulate internally – data that can be used to support current and future strategies (mktg and otherwise, e.g., operations) – …. data that may be valuable to other organizations • Should source external data – unavailable internally – too expensive to maintain/update What’s in YOUR database? Data Requirements Marketing + other Applications Basic Demo Targeting Retention Usage upgrading Usage Payment Data Sources : Internal • Operations / Sales – past usage/purchase, e.g., amount, variability • Finance – payments, e.g., timeliness, amounts • Customer service – e.g., inquiries, complaints • Other data collection methods: – sales/orders, promotions/drawings, inquiries, surveys, warranty cards, research panels... Data Sources: Distribution Channels • Many companies use distributors, retailers • Problem: lack of direct communications with endcustomer, no “relationship” • Part-solution 1: keep tabs on channel + aggregate statistics on customers • More aggressive solution: special marketing programs to reach customers Data Sources : External Lists • 50% of U.S. DMers sell their lists • Use to enlarge universe : new names – can buy segments by specific features (model) • Enhance data : cross information enhance data – U.S. census data – Credit bureau – Various marketers of related products – List compilers / maintainers / sellers enlarge universe John Smith Eric Cohen Jack Marshal 60 35 20 10.2 1.3 0 Other Data Sources • Mass-advertised offers – TV call-ins, direct response • Joint offers with other merchants – take-one brochures in banks, restaurants – drawings • Trade shows, happenings, community activity • Referrals! Data Management • Many sources: – conversions, transformations, cleaning, merge-purge • Many “clients” – marketing, sales, product managers, operations • Temporal issues – updates, audits, archives/deltas • Quantities: huge databases – Storage, access, processing, communications • Resolution, Enhancement Merge-Purge : Example • Palmer, Robert and Mary, 123 Sun Avenue, Apt 7, Key West, FL 31250 • Dr. Robert C. Palmer, Custom Engineers LLC, 123 Sun Avenue, 7th Floor, Key West, FL 31250 • Rob Charles Palmer, CE Inc., 123 Sun Ave #7, Key West FL 31250 • Bob Palmer Jr., 123 Sun #7, Key West, FL 31252 • Maria Palmer, 123 Sun Avenue, Suite 7, Key West, FL 31250 Other Issues • Legal – – – – Privacy Act Anti-discrimination Advertising Code Telephone Consumer Protection Act • Consumer groups – Right to be omitted (just write to DMA) – Environmental issues • DMA invests in education: – Dmers: best practice – Customers: better image Implementation Technologies Computing Platforms • Issues / Needs: – – – – Information sources + integration Storage/access, maintenance, completeness, update Computation: process queries, algorithms Analyses and reports, feed to operations, customer/user interaction • Trends: – – – – – Traditionally, all DMers used mainframes Today, some migration to mid-range (UNIX) PC-based computers gaining power (NT) Client/Server architectures Everything networked Database Management • Database is a foundational software • Must support variety of applications: – transaction processing – analyses – on-line interaction • Trends: – Relational databases – Data warehouses – Data redundancy/multiplicity Applications Database O.S. Relational Database Tables ID Cust Name Address … 1234 John Brown 123 Main St …. ID Product Supplier ….. A703 Levis Jeans S7003 …. ID Date 98765 3.5.98 98766 4.5.98 98767 4.5.98 Cust 1234 1234 1235 Product Quant. Price A703 5 150.00 A707 2 240.00 A703 1 30.00 Reports (SQL Queries) Transactions for John Brown 3.5.98 5 Levis Jeans $150 4.5.98 2 CK Jacket $120 Purchases of Levis Jeans 3.5.98 John Brown 5 $150 4.5.98 Jane Doe 1 $30 Data Warehouse • Stores data for informational and analytical processing – Separate from operations Operational Data Warehouse – Subject-oriented – Integrated savings customer – Historical product loans credit card investments Example: Computer-by-Mail Inc. House Files operations Mktg Executive mail Analyst Data Warehouse Client --- Server Telemarketers Analysis Techniques Data Limitations • Important: The data is a limited encoding of reality • Many potholes: – Omission – Errors, noise – Representation – Sampling bias • Cannot be too careful ! Exploratory data analysis : Single-variable • Descriptive statistics – Mean, Median – Variance • Histograms – Shows distribution 40% 35% 30% 25% 20% 15% 10% 5% 0% 0-20 20-30 30-40 40+ Exploratory data analysis: Multi-variable • Examine relationship between two or more variables – – – – Cross tabs Correlation Scatter plots Clustered histograms 45% 40% Buyer NoBuy 35% 30% 25% 20% 15% 10% 5% 0% 0-20 20-30 30-40 40+ RFM Analysis • RFM score – Recency: how close is the last purchase – Frequency: number of recent purchases – Monetary: dollars spent recently • Example: – Recency: 10 pts if within 3 mos, 1 pt lower per additional month, to 1 – Frequency: 1 pt for each purchase within 12 months – Monetary: 1pt for each $100 in past year, to 10 – Score=R*F*M, the higher the better Life-Time Value of Customers • LTV Goal: – recognize each customer’s contribution • Method: – calculate the “expected” net revenue – discounted: • risk of attrition • probability of sales • rate of money • Typically computed per 1000, if possible by segment OLAP Tools • OLAP : On-Line Analytical Processing • Goal: A database-driven system that provides – Fast – Analysis • common business reports, statistics – of Shared • same information available to many users – Multi-dimensional • every piece of information is multiply categorized “The OLAP Report” – Information OLAP Tools • Data represented internally as multi-dimensional cube – e.g., customer’s attributes, purchases, payments, etc. • User chooses presenting two dimensions at a time – e.g., show $-sales, by geographic region and income • Heavy use of hierarchical variables, with drilling capabilities: – time: year, quarter, month, week, day, hour – product: hardware, printers, small printers, PX-1000 – dollars: by ranges 0-1000, 1000-5000, 5000-25000, etc. • Analyses, highlights of interesting cases, etc. OLAP Tools: Example Screen Data Visualization Tools • Many relationships are best communicated visually: – histograms, pie-charts, scatter plots, graphs – use color/texture, shapes – temporal animations • Visualization software allows – – – – single-variable over time one variable as a function of another interaction detection segmentation Modeling Techniques Modeling Behavior • Target variable – a.k.a. dependent/modeled/explained variable – typically, whether bought/responded or not • Goal: – Use other variables in a model to classify/predict – other variables: a.k.a. independent, observable, explaining – model: formula, algorithm • Success criterion: future performance Modeling and Validation Framework • Data flow: – – – – – – Historical data Modeling software Training Set Constructs model Tested on more historical data Repeat until satisfied Use model to predict Output Model Production or Test Set Critical Success Factors • Choice of data – scope: same/similar period, audience, offer, communication – explaining variables: available, useful, well-represented • Choice of modeling technique – appropriate for the goal – powerful: good fitting power • Careful and “pessimistic” testing and validation Validation • NEVER test on same data set – avoid “memorizing” the data, overfitting • Out-of-sample methods – separate training set and test set – cross-validation, a.k.a. jack-knifing – remember temporal aspect • Evaluate the model’s robustness – estimate chance probability, bootstrapping Classification v. Prediction Systems • Classification systems: – distinguish few types of customers, e.g., responded or not – technically, target variable is discrete/categorical – validation through “hit rate” • Prediction systems – predict probability of purchase, or purchase dollars – technically, target variable is continuous – validation through “closeness” measures Linear Scoring Systems • Use linear regression Score a Income b PastPurchase c Age d • • • • Coefficients evaluated using historical data Higher score interpreted as greater likelihood of responding Every coefficient measures “independent” contribution Classification variant: discriminant analysis – e.g., predict response if score is above 0.3 Logistic Regressions • Logistic regression (logit) ea Income b PastPurchasec Aged Probabilit y 1 ea Income b PastPurchasec Aged • Target variable – historical data: 0 or 1 – future application: used as probability • Independent variables: continuous or categorical • Probit: variation that relies on normal distribution Presenting and Evaluating Results • Lift table Top Scoring % Respond %Non-Respond 5% 26.8% 4.9% 10% 41.2% 9.8% 15% 52.4% 14.8% 20% 62.6% 19.8% 50% 75% 87.9% 96.5% 49.7% 74.8% Presenting and Evaluating Results • Lift curve (a.k.a. Receiver Operating Characteristic curve) 1 0.9 0.8 %Responders 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 % Non-Responders 0.7 0.8 0.9 1 Presenting and Evaluating Results • Confusion Matrix (given a specific threshold) Predicted Respond Predicted Non-Respond Actual Respond Pr,r Pn,r Actual Non-Respond Pr,n Pn,n • Accuracy=(Pr,r+Pn,n)/Total • Detection=Pr,r/(Pr,r+Pr,n) • Two error types: Pr,n and Pn,r Break-even Analysis • Issue: how much to mail? • Solution: find break-even point • e.g.: %mail %respond Income 5% 10% 15% 20% 25% 30% 35% 25% 21% 16% 11% 6% 3% 2% $1,250 $1,050 $800 $550 $300 $150 $100 • Caution: use held-out data! Cost of product $750 $630 $480 $330 $180 $90 $60 Cost of mail $50 $50 $50 $50 $50 $50 $50 Marginal Profit $450 $370 $270 $170 $70 $10 -$10 Non-Linear Systems • In regressions, a change in one independent variables always affects in same direction – e.g., if age affects positively, then the older the better, always • One solution: transformations – e.g., if U-shaped relation, use quadratic form • Or, use non-linear techniques: – Neural networks – Decision trees – Other: rule-based systems, genetic algorithms, Bayesian nets Neural Networks • Motivated by biological nervous system • Perceptron = a model of a neuron activation w1 w2 x1 x2 w3 w4 x3 W iXi e W iXi 1 e w5 x4 x5 Classical Neural Net • Multi-layer network of perceptrons • Proper weights are “discovered” from random – forward propagation of training set – compare output to actual target variable – back propagation of error to adapt weights Decision Trees AB 0 0 0 1 1 0 1 1 1 1 C D Resp. 1 0 Buy 1 1 Buy 1 0 No 0 0 Buy 1 1 No • Partition the data based on one attribute... 50 Induction of Decision Trees A=0 0 0 1 0 Buy 0 1 1 1 Buy Recursively, AB 0 0 0 1 1 0 1 1 1 1 C D Resp. 1 0 Buy 1 1 Buy 1 0 No 0 0 Buy 1 1 No A=1 1 0 1 0 No 1 1 0 0 Buy 1 1 1 1 No partition each of the nodes 51 Induction of Decision Trees A=0 0 0 1 0 Buy 0 1 1 1 Buy A B C D Resp. 0 0 1 0 Buy 0 1 1 1 Buy 1 0 1 0 No 1 1 0 0 Buy 1 1 1 1 No C=0 A=1 1 0 1 0 No 1 1 0 0 Buy 1 1 1 1 No 1 1 0 0 Buy …until the node is homogeneous C=1 1 0 1 0 No 1 1 1 1 No 52 Classification (A=1,B=0,C=0,D=1) A=0 A=1 Buy C=0 Buy C=1 No • Go down a matching path... 53 Classification (A=1,B=0,C=0,D=1) A=0 Buy A=1 C=0 Buy C=1 No Continue... 54 Classification (A=1,B=0,C=0,D=1) A=0 Buy …until reaching a leaf Use the leaf’s probability A=1 C=0 Buy C=1 No 55 Set of Rules, or Market Segments A=0 Buy A=1 C=0 Buy C=1 A=0 => Buy A=1 and C=0 => Buy A=1 and C=1 => No No • Each rule represents a market segment 56 Which Modeling Technique? • Decision trees (ChAID, CART, C4.5) – symbolic: model is interpretable as set of rules – essentially is a segmentation – useful when few “classes”, e.g., based on action (send/not, or few offer types) • Regressions, neural nets – numeric: allows fine-tuning, e.g., for prediction or ranking – model is hard to interpret and used as “black-box” – useful when target is continuous Data Mining • Knowledge Discovery in Databases (KDD) • KDD is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data • Includes all steps of data preparation + management • Data mining step uses statistical techniques, decision trees, neural nets, etc. Summary • Customer data can be leveraged to better understand and manage current customers, and target new ones • Data analysis and visualization – insights about our customers – business economics • Modeling – “mined” insights – classify/predict behavior