MKT 700 Business Intelligence and Decision Models
Week 8: Algorithms and Customer Profiling (1)

Classification and Prediction
  Classification: Unsupervised Learning
  Predicting: Supervised Learning

SPSS Direct Marketing
                          Classification                    Predictive
  Unsupervised Learning   RFM                               NA
                          Cluster analysis
                          Postal Code Responses
  Supervised Learning     Customer Profiling                Propensity to buy

SPSS Analysis
                          Classification                    Predictive
  Unsupervised Learning   Hierarchical Cluster              NA
                          Two-Step Cluster
                          K-Means Cluster
  Supervised Learning     Classification Trees              Linear Regression
                            - CHAID                         Logistic Regression
                            - CART                          Artificial Neural Nets

Major Algorithms
  Unsupervised Learning, Classification: Euclidean Distance; Log Likelihood
  Unsupervised Learning, Predictive: NA
  Supervised Learning, Classification and Predictive: Chi-square Statistics; Log Likelihood; GINI Impurity Index; F-Statistics (ANOVA); Log Likelihood; F-Statistics (ANOVA)
  Nominal: Chi-square, Log Likelihood
  Continuous: F-Statistics, Log Likelihood

Euclidean Distance

Euclidean Distance for Continuous Variables
  Pythagorean distance: $\sqrt{d^2} = \sqrt{a^2 + b^2}$
  Euclidean space: $\sqrt{d^2} = \sqrt{a^2 + b^2 + c^2}$
  Euclidean distance: $d = \left[\sum_i d_i^2\right]^{1/2}$

Pearson's Chi-Square

Contingency Table
          North   South   East   West   Tot.
  Yes        68      75     57     79    279
  No         32      45     33     31    141
  Tot.      100     120     90    110    420

Observed and Theoretical Frequencies
          Yes (obs.)   Yes (theor.)   No (obs.)   No (theor.)   Tot.
  North       68            66           32           34         100
  South       75            80           45           40         120
  East        57            60           33           30          90
  West        79            73           31           37         110
  Tot.       279 (66%)                  141 (34%)                420

Chi-Square
  $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

  Obs.   fo   fe   fo-fe   (fo-fe)^2   (fo-fe)^2/fe
  1,1    68   66      2        4          .0606
  1,2    75   80     -5       25          .3125
  1,3    57   60     -3        9          .1500
  1,4    79   73      6       36          .4932
  2,1    32   34     -2        4          .1176
  2,2    45   40      5       25          .6250
  2,3    33   30      3        9          .3000
  2,4    31   37     -6       36          .9730
                                   X2 =   3.032

Statistical Inference
  DF: (4 columns - 1) x (2 rows - 1) = 3
  Critical values for DF = 3: 6.251 at the .10 level; 7.815 at the .05 level
  The observed value of 3.032 is below 6.251, so the regional difference is not significant even at the .10 level.

Log Likelihood Chi-Square

Log Likelihood
  Based on probability distributions rather than contingency (frequency) tables.
  Applicable to both categorical and continuous variables, unlike chi-square, which requires continuous variables to be discretized first.
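The regional chi-square calculation earlier in this section can be checked with a few lines of Python. The following is a minimal sketch that uses only the standard library; the function and variable names are illustrative and not part of the course material. It computes the statistic twice: once with the rounded theoretical frequencies shown on the slide, and once with unrounded frequencies derived from the row and column totals, which is an added assumption about how one might refine the hand calculation.

```python
# Observed responses by region (North, South, East, West), from the slide.
observed = {
    "Yes": [68, 75, 57, 79],
    "No": [32, 45, 33, 31],
}

# Theoretical (expected) frequencies as rounded on the slide.
expected_slide = {
    "Yes": [66, 80, 60, 73],
    "No": [34, 40, 30, 37],
}


def expected_from_margins(obs):
    """Expected counts under independence: row total * column total / N."""
    rows = list(obs)
    n_cols = len(obs[rows[0]])
    col_totals = [sum(obs[r][j] for r in rows) for j in range(n_cols)]
    n = sum(col_totals)
    return {r: [sum(obs[r]) * c / n for c in col_totals] for r in rows}


def pearson_chi_square(obs, exp):
    """Sum of (fo - fe)^2 / fe over every cell of the table."""
    return sum(
        (fo - fe) ** 2 / fe
        for r in obs
        for fo, fe in zip(obs[r], exp[r])
    )


chi_rounded = pearson_chi_square(observed, expected_slide)
chi_exact = pearson_chi_square(observed, expected_from_margins(observed))

df = (4 - 1) * (2 - 1)   # (columns - 1) * (rows - 1)
CRITICAL_10 = 6.251      # critical value at the .10 level for df = 3 (from the slide)

print(f"chi-square with the slide's rounded fe: {chi_rounded:.3f}")
print(f"chi-square with unrounded fe:           {chi_exact:.3f}")
print(f"df = {df}; significant at .10? {chi_rounded > CRITICAL_10}")
```

Rounding the theoretical frequencies changes the statistic slightly, but both versions stay well below the .10 critical value, so the conclusion is the same.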
Contingency Table (Observed Frequencies)
          Cluster 1   Cluster 2   Total
  Male       10          30         40

Contingency Table (Expected Frequencies)
          Cluster 1       Cluster 2       Total
          Obs.   Exp.     Obs.   Exp.     Obs.   Exp.
  Male     10     20       30     20       40     40

Chi-Square
  $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

  Obs.   fo   fe   fo-fe   (fo-fe)^2   (fo-fe)^2/fe
  1,1    10   20    -10       100          5.00
  1,2    30   20     10       100          5.00
                                   X2 =   10.00
  p < 0.05; DF = 1; critical value = 3.84

Log Likelihood: Distance & Probability
                     O    E    O/E             Ln(O/E)   O * Ln(O/E)
  Male, Cluster 1    10   20   10/20 = .50     -.693     10 * -.693 = -6.93
  Male, Cluster 2    30   20   30/20 = 1.50     .405     30 * .405 = 12.164
  2∑O*Ln(O/E) = 2 * (-6.93 + 12.164) ≈ 10.47
  p < 0.05; critical value = 3.84

Variance, ANOVA, and F Statistics

F-Statistics
  For metric or continuous variables.
  Compares explained variance (in the model) with unexplained variance (errors).

Variance
  SS is the Sum of Squares (the sum of SQUARED differences from the mean)
  DF = N - 1
  VAR = SS / DF
  SD = √VAR

  VALUE   MEAN   SQUARED DIFFERENCE
    20    43.6        557
    34    43.6         92.16
    34    43.6         92.16
    38    43.6         31.36
    38    43.6         31.36
    40    43.6         12.96
    41    43.6          6.76
    41    43.6          6.76
    41    43.6          6.76
    42    43.6          2.56
    43    43.6          0.36
    47    43.6         11.56
    47    43.6         11.56
    48    43.6         19.36
    49    43.6         29.16
    49    43.6         29.16
    55    43.6        130
    55    43.6        130
    55    43.6        130
    55    43.6        130

  COUNT = 20; MEAN = 43.6
  SS = 1461; DF = 19; VAR = 76.88; SD = 8.768

ANOVA
  Two groups: t-test
  Three or more groups: are the errors (discrepancies between the observations and the overall mean) explained by group membership or by some other (random) effect?
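The two hand calculations earlier in this section, the log-likelihood statistic for the cluster table and the variance of the 20 values, can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function names are hypothetical helpers chosen for illustration, and the rule that cells with an observed count of zero contribute nothing is an added convention not stated on the slides.

```python
import math


def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def log_likelihood_ratio(observed, expected):
    """G = 2 * sum(O * ln(O / E)); cells with O = 0 are skipped (assumption)."""
    return 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


# Males in Cluster 1 and Cluster 2: observed versus expected counts.
observed = [10, 30]
expected = [20, 20]

chi2 = pearson_chi_square(observed, expected)
g = log_likelihood_ratio(observed, expected)
CRITICAL_05 = 3.84  # critical value at the .05 level for DF = 1 (from the slide)

print(f"Pearson chi-square: {chi2:.2f}")
print(f"Log-likelihood G:   {g:.3f}")
print(f"Both exceed {CRITICAL_05}? {chi2 > CRITICAL_05 and g > CRITICAL_05}")

# Variance of the 20 values from the variance slide.
values = [20, 34, 34, 38, 38, 40, 41, 41, 41, 42,
          43, 47, 47, 48, 49, 49, 55, 55, 55, 55]

mean = sum(values) / len(values)
ss = sum((v - mean) ** 2 for v in values)   # sum of squared differences
df = len(values) - 1                        # N - 1
var = ss / df
sd = math.sqrt(var)

print(f"mean = {mean:.1f}, SS = {ss:.1f}, DF = {df}, VAR = {var:.2f}, SD = {sd:.3f}")
```

The two test statistics are close for this table, and both lead to the same decision at the .05 level.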
Oneway ANOVA

Within-group errors: (X - group mean)^2
  Group 1              Group 2              Group 3
  X   (X-Mean)^2       X   (X-Mean)^2       X   (X-Mean)^2
  6     1.266          8     0.016          3     0.766
  5     0.016          9     0.766          2     0.016
  4     0.766          7     1.266          1     1.266
  5     0.016          8     0.016          3     0.766
  4     0.766          9     0.766          2     0.016
  6     1.266          7     1.266          1     1.266
  5     0.016          8     0.016          3     0.766
  4     0.766          9     0.766          2     0.016

  Group means:         4.875                8.125                2.125
  SS per group:        4.875                4.875                4.875
  SS Within = 14.625

Total errors: (X - grand mean)^2, grand mean = 5.042
  Group 1              Group 2              Group 3
  X   (X-Mean)^2       X   (X-Mean)^2       X   (X-Mean)^2
  6     0.918          8     8.752          3     4.168
  5     0.002          9    15.668          2     9.252
  4     1.085          7     3.835          1    16.335
  5     0.002          8     8.752          3     4.168
  4     1.085          9    15.668          2     9.252
  6     0.918          7     3.835          1    16.335
  5     0.002          8     8.752          3     4.168
  4     1.085          9    15.668          2     9.252

  Total SS = 158.958

F = MSS(Between) / MSS(Within)
                    SS         DF            Mean SS
  Between Groups    144.333    3 - 1 = 2     72.167
  Within Groups      14.625    24 - 3 = 21    0.696
  Total Errors      158.958    24 - 1 = 23    6.911

  Between-groups SS + within-groups SS = 144.333 + 14.625 = 158.958
  F = 72.167 / 0.696 = 103.624; p-value < .05

ONEWAY (Excel or SPSS)

Anova: Single Factor

SUMMARY
  Groups     Count   Sum   Average   Variance
  Group 1      8      39    4.875     0.696
  Group 2      8      65    8.125     0.696
  Group 3      8      17    2.125     0.696

ANOVA
  Source of Variation   SS        df   MS       F         P-value     F crit
  Between Groups        144.333    2   72.167   103.624   1.318E-11   3.467
  Within Groups          14.625   21    0.696
  Total                 158.958   23

Profiling

Customer Profiling: Documenting or Describing
  Who is likely to buy or not respond?
  Who is likely to buy what product or service?
  Who is in danger of lapsing?

Profiling/Decision Tree
  SPSS Direct Marketing
    Customer Profiling
    Postal Code responses
  SPSS Analysis
    Classification
    Decision Tree
      • CHAID (Chi-Square Automatic Interaction Detector)
      • CART (Classification and Regression Tree)
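The one-way ANOVA figures earlier in this section can be rebuilt from the raw group values. The sketch below, written with only the Python standard library, splits the total sum of squares into its between-groups and within-groups parts and forms the F ratio. The variable names are illustrative; the critical value is copied from the Excel/SPSS output rather than computed, since the standard library has no F distribution.

```python
# Raw observations for the three groups in the one-way ANOVA example.
groups = {
    "Group 1": [6, 5, 4, 5, 4, 6, 5, 4],
    "Group 2": [8, 9, 7, 8, 9, 7, 8, 9],
    "Group 3": [3, 2, 1, 3, 2, 1, 3, 2],
}


def mean(xs):
    return sum(xs) / len(xs)


all_values = [x for xs in groups.values() for x in xs]
grand_mean = mean(all_values)

# Within groups: squared distance of each value from its own group mean.
ss_within = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)

# Between groups: squared distance of each group mean from the grand mean,
# weighted by the group size.
ss_between = sum(
    len(xs) * (mean(xs) - grand_mean) ** 2 for xs in groups.values()
)

# Total: squared distance of each value from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

df_between = len(groups) - 1               # 3 - 1
df_within = len(all_values) - len(groups)  # 24 - 3

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

F_CRIT = 3.467  # F crit reported in the Excel/SPSS output for df = (2, 21)

print(f"grand mean = {grand_mean:.3f}")
print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}, "
      f"SS total = {ss_total:.3f}")
print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"F = {f_stat:.3f}; exceeds F crit? {f_stat > F_CRIT}")
```

Because the between-groups and within-groups sums of squares add up to the total, checking that identity is a quick way to catch arithmetic slips in a hand-built ANOVA table.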