
MKT 700
Business Intelligence and Decision Models
Week 8: Algorithms and Customer Profiling (1)
Classification and Prediction
Classification
Unsupervised Learning
Predicting
Supervised Learning
SPSS Direct Marketing

                        Classification           Predictive
Unsupervised Learning   RFM                      NA
                        Cluster analysis
                        Postal Code Responses
Supervised Learning     Customer Profiling       Propensity to buy
SPSS Analysis

                        Classification           Predictive
Unsupervised Learning   Hierarchical Cluster     NA
                        Two-Step Cluster
                        K-Means Cluster
Supervised Learning     Classification Trees     Linear Regression
                        - CHAID                  Logistic Regression
                        - CART                   Artificial Neural Nets
Major Algorithms

                        Classification           Predictive
Unsupervised Learning   Euclidean Distance       NA
                        Log Likelihood
Supervised Learning     Chi-square Statistics    F-Statistics (ANOVA)
                        Log Likelihood           Log Likelihood
                        GINI Impurity Index      F-Statistics (ANOVA)

Nominal: Chi-square, Log Likelihood
Continuous: F-Statistics, Log Likelihood
Euclidean Distance
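Euclidean Distance for Continuous Variables

Pythagorean distance (two dimensions): $d = \sqrt{a^2 + b^2}$

Euclidean space (three dimensions): $d = \sqrt{a^2 + b^2 + c^2}$

Euclidean distance (general case): $d = \left[\sum_i d_i^2\right]^{1/2}$, where $d_i$ is the difference on variable $i$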
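
The general formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `euclidean_distance` and the sample points are assumptions chosen only to show the arithmetic.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared differences on each variable, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Illustrative points (not from the slides)
print(euclidean_distance((0, 0), (3, 4)))        # 5.0  (two dimensions)
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # 5.0  (three dimensions)
```

In cluster analysis the variables are usually standardized first, so that a variable measured on a large scale does not dominate the distance.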
Pearson’s Chi-Square
Contingency Table
        North   South   East   West   Tot.
Yes     68      75      57     79     279
No      32      45      33     31     141
Tot.    100     120     90     110    420
Observed and Theoretical Frequencies

        North       South       East        West        Tot.
        fo    fe    fo    fe    fo    fe    fo    fe
Yes     68    66    75    80    57    60    79    73    279 (66%)
No      32    34    45    40    33    30    31    37    141 (34%)
Tot.    100         120         90          110         420
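
The theoretical (expected) frequencies follow from the margins: each cell is its row total times its column total, divided by the grand total. A minimal sketch, assuming the observed counts from the contingency table; the function name `expected_frequencies` is illustrative only.

```python
# Observed counts by region, in the order North, South, East, West
observed = {
    "Yes": [68, 75, 57, 79],
    "No":  [32, 45, 33, 31],
}

def expected_frequencies(table):
    """Expected count per cell = row total * column total / grand total."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    return {
        label: [sum(row) * c / grand_total for c in col_totals]
        for label, row in table.items()
    }

expected = expected_frequencies(observed)
for label, cells in expected.items():
    # Rounded to whole numbers, as on the slide
    print(label, [round(x) for x in cells])
```

Rounding the results to whole numbers gives the fe values shown in the table.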
Chi-Square: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

Obs.   fo    fe    fo-fe   (fo-fe)²   (fo-fe)²/fe
1,1    68    66     2       4         .0606
1,2    75    80    -5      25         .3125
1,3    57    60    -3       9         .1500
1,4    79    73     6      36         .4932
2,1    32    34    -2       4         .1176
2,2    45    40     5      25         .6250
2,3    33    30     3       9         .3000
2,4    31    37    -6      36         .9730

χ² = 3.032
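
The cell-by-cell calculation can be reproduced with a short Python sketch. It assumes the whole-number expected frequencies used on the slide; with unrounded expected frequencies the total would differ slightly. The function name `pearson_chi_square` is illustrative.

```python
# Observed (fo) and rounded expected (fe) frequencies, cells 1,1 through 2,4
fo = [68, 75, 57, 79, 32, 45, 33, 31]
fe = [66, 80, 60, 73, 34, 40, 30, 37]

def pearson_chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = pearson_chi_square(fo, fe)
print(round(chi2, 3))  # 3.032
```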
Statistical Inference

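DF: (4 columns - 1) × (2 rows - 1) = 3

Critical values of χ² for DF = 3: 6.251 at α = .10 and 7.815 at α = .05

The observed χ² of 3.032 is below both critical values, so the association between the row and column variables is not statistically significant at either level.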
Log Likelihood Chi-Square
Log Likelihood

Based on probability distributions rather than contingency (frequency) tables.

Applicable to both categorical and continuous variables, unlike chi-square, for which continuous variables must first be discretized.
Contingency Table (Observed Frequencies)

        Cluster 1   Cluster 2   Total
Male    10          30          40
Contingency Table (Expected Frequencies)

        Cluster 1     Cluster 2     Total
        fo    fe      fo    fe      fo    fe
Male    10    20      30    20      40    40
Chi-Square: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

Obs.   fo    fe    fo-fe   (fo-fe)²   (fo-fe)²/fe
1,1    10    20    -10     100        5.00
1,2    30    20     10     100        5.00

χ² = 10.00
p < 0.05; DF = 1; Critical value = 3.84
Log Likelihood Distance & Probability

Male            Cluster 1       Cluster 2
O               10              30
E               20              20
O/E             10/20 = .50     30/20 = 1.50
Ln (O/E)        -.693           .405
O * Ln (O/E)    10 * -.693      30 * .405
                = -6.93         = 12.164

2∑O*Ln(O/E) = 2 * (-6.93 + 12.164) = 10.46
p < 0.05; critical value = 3.84
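
The same statistic, 2∑O·ln(O/E), can be computed directly. A minimal sketch using the two Male cells above; the function name `log_likelihood_chi_square` is an assumption for illustration, and cells with an observed count of zero are skipped because they contribute nothing to the sum.

```python
import math

def log_likelihood_chi_square(observed, expected):
    """Log likelihood chi-square: 2 * sum(O * ln(O / E))."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

g = log_likelihood_chi_square([10, 30], [20, 20])
print(round(g, 2))  # 10.46
```

The result is close to the Pearson chi-square of 10.00 for the same table, and both exceed the critical value of 3.84.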
Variance, ANOVA, and
F Statistics
F-Statistics

For metric or continuous variables

Compares explained (in the model)
and unexplained variances (errors)
Variance

SS is Sum of Squares
DF = N-1
VAR = SS/DF
SD = √VAR

VALUE   MEAN   DIFFERENCE SQUARED
20      43.6   557
34      43.6   92.16
34      43.6   92.16
38      43.6   31.36
38      43.6   31.36
40      43.6   12.96
41      43.6   6.76
41      43.6   6.76
41      43.6   6.76
42      43.6   2.56
43      43.6   0.36
47      43.6   11.56
47      43.6   11.56
48      43.6   19.36
49      43.6   29.16
49      43.6   29.16
55      43.6   130
55      43.6   130
55      43.6   130
55      43.6   130

COUNT = 20
MEAN = 43.6
SS = 1461
DF = 19
VAR = 76.88
SD = 8.768
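
The steps SS, DF, VAR, and SD can be checked with a short Python sketch using the 20 values listed above. The variable names are illustrative.

```python
import math

values = [20, 34, 34, 38, 38, 40, 41, 41, 41, 42,
          43, 47, 47, 48, 49, 49, 55, 55, 55, 55]

n = len(values)                              # COUNT
mean = sum(values) / n                       # MEAN
ss = sum((x - mean) ** 2 for x in values)    # SS: sum of squared differences
df = n - 1                                   # DF = N - 1
var = ss / df                                # VAR = SS / DF
sd = math.sqrt(var)                          # SD = square root of VAR

print(n, round(mean, 1), round(ss), df, round(var, 2), round(sd, 3))
```

Python's standard library offers `statistics.variance` and `statistics.stdev`, which use the same N-1 denominator.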
ANOVA

Two groups: t-test

Three or more groups: are the errors (discrepancies between observations and the overall mean) explained by group membership or by some other (random) effect?
Oneway ANOVA

Observations

Obs.          Group 1   Group 2   Group 3
1             6         8         3
2             5         9         2
3             4         7         1
4             5         8         3
5             4         9         2
6             6         7         1
7             5         8         3
8             4         9         2
Group means   4.875     8.125     2.125
Grand mean    5.042

(X-Mean)², deviations from each group's own mean

Obs.          Group 1   Group 2   Group 3
1             1.266     0.016     0.766
2             0.016     0.766     0.016
3             0.766     1.266     1.266
4             0.016     0.016     0.766
5             0.766     0.766     0.016
6             1.266     1.266     1.266
7             0.016     0.016     0.766
8             0.766     0.766     0.016
Sum           4.875     4.875     4.875
SS Within = 14.625

(X-Mean)², deviations from the grand mean

Obs.          Group 1   Group 2   Group 3
1             0.918     8.752     4.168
2             0.002     15.668    9.252
3             1.085     3.835     16.335
4             0.002     8.752     4.168
5             1.085     15.668    9.252
6             0.918     3.835     16.335
7             0.002     8.752     4.168
8             1.085     15.668    9.252
Total SS = 158.958

F = MSS(Between)/MSS(Within)

                 SS        DF         Mean SS
Between Groups   144.333   3-1=2      72.167
Within Groups    14.625    24-3=21    0.696
Total Errors     158.958   24-1=23    6.911

Total Errors: 144.333 + 14.625 = 158.958
F = Between Groups Mean SS / Within Groups Mean SS = 72.167 / 0.696 = 103.624
p-value < .05
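
The decomposition into between-groups and within-groups sums of squares can be sketched in plain Python. This is a minimal illustration using the three groups above; the function name `one_way_anova` and the dictionary keys are assumptions, and the p-value is left out because it requires the F distribution (available, for example, in SciPy).

```python
groups = {
    "Group 1": [6, 5, 4, 5, 4, 6, 5, 4],
    "Group 2": [8, 9, 7, 8, 9, 7, 8, 9],
    "Group 3": [3, 2, 1, 3, 2, 1, 3, 2],
}

def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares, and F."""
    all_values = [x for g in groups.values() for x in g]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for g in groups.values():
        group_mean = sum(g) / len(g)
        # Between: distance of the group mean from the grand mean, once per observation
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Within: distance of each observation from its own group mean
        ss_within += sum((x - group_mean) ** 2 for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }

result = one_way_anova(groups)
for name, value in result.items():
    print(name, round(value, 3))
```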
ONEWAY (Excel or SPSS)

Anova: Single Factor

SUMMARY
Groups     Count   Sum   Average   Variance
Group 1    8       39    4.875     0.696
Group 2    8       65    8.125     0.696
Group 3    8       17    2.125     0.696

ANOVA
Source of Variation   SS        df   MS       F         P-value     F crit
Between Groups        144.333   2    72.167   103.624   1.318E-11   3.467
Within Groups         14.625    21   0.696
Total                 158.958   23
Profiling
Customer Profiling: Documenting or Describing

Who is likely to buy or not respond?
Who is likely to buy what product or service?
Who is in danger of lapsing?
Profiling/Decision Tree
SPSS Direct Marketing → Customer Profiling, Postal Code responses
SPSS Analysis → Classification → Decision Tree

• CHAID (Chi-squared Automatic Interaction Detection)
• CART (Classification and Regression Trees)