Discriminant Analysis
Discriminant Analysis With Two Groups
Comparison of 2-Group Discriminant Analysis With Logistic Regression
Discriminant Analysis With More Than Two Groups
Notes on Discriminant Analysis: Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University
KEY CONCEPTS
Discriminant Analysis
Discriminant function
A priori categories or groups
Homogeneity of variance/covariance matrices
Differences between discriminant analysis and logistic regression
Partitioning of sums of squares in discriminant analysis
TSS = BSS + WSS
Discriminant score
Discriminant weight or coefficient
Discriminant constant
Discriminant analysis assumptions
Steps in the discriminant analysis process
Box's M test and its null hypothesis
Wilks' lambda
Stepwise method in discriminant analysis
Pin and Pout criteria
F-test to determine the effect of adding or deleting a variable from the model
Unstandardized and standardized discriminant weights
Measures of goodness-of-fit
Eigenvalue
Canonical correlation
Model Wilks' lambda
Classification table and hit ratio
t-test for a hit ratio
Maximum chance criteria
Proportional chance criteria
Press's Q statistic
Histogram of discriminant scores
Casewise plot of the predictions
Calculation of the cutting score: equal and unequal groups
Prior probability
Conditional probability
Bayes' theorem and posterior probability
Structure coefficient or discriminant loading
Group centroid
Testing the collinearity of the predictor variables
Assumptions about multiple discriminant functions
Number: g-1 or k whichever is less
Functions may be collinear
Discriminant scores must be independent
KEY CONCEPTS (cont.)
Interpretation of multiple discriminant functions
Territorial map
Scatterplot of the discriminant scores across the discriminant functions
Lecture Outline
 What is discriminant analysis
 The concept of partitioning sums of squares
 Discriminant assumptions
 Stepwise discriminant analysis with Wilks' lambda
 Testing the goodness-of-fit of the model
 Determining the significance of the predictor
variables
 A 2-group discriminant problem
 A multi-group discriminant problem
Discriminant Analysis
Z = a + W1X1 + W2X2 + ... + WkXk
Dependency Technique
 Dependent variable is nonmetric
 Independent variables can be metric and/or
nonmetric
Used to predict or explain a nonmetric dependent
variable with two or more a priori categories
Assumptions
 Xk are multivariate normally distributed
 Homogeneity of variance-covariance
matrices of Xk across groups
 Xk are independent, non-collinear
 The relationship is linear
 Absence of outliers
Predicting a Nonmetric Variable
Two approaches
Logistic Regression … with a dummy-coded DV
Limited to a binary nonmetric dependent
variable
Makes relatively few restrictive assumptions
Discriminant Analysis … with a nonmetric
dependent variable with 2 or more groups
Not limited to a binary nonmetric
dependent variable
Makes several restrictive assumptions
Partitioning Sums of Squares (SS) in
Discriminant Analysis
In Linear Regression
The Total SS Σ(Y − Ȳ)² is partitioned into:
Regression SS Σ(Y′ − Ȳ)²
+ Residual SS Σ(Y − Y′)²
Σ(Y − Ȳ)² = Σ(Y′ − Ȳ)² + Σ(Y − Y′)²
Goal
Estimate parameters that minimize the Residual SS
Partitioning Sums of Squares (SS) in Discriminant Analysis (cont.)
In Discriminant Analysis
The Total SS Σ(Zi − Z̄)² is partitioned into:
Between Group SS Σ(Z̄j − Z̄)²
+ Within Groups SS Σ(Zij − Z̄j)²
Σ(Zi − Z̄)² = Σ(Z̄j − Z̄)² + Σ(Zij − Z̄j)²
i = an individual case, j = group j
Zi = individual discriminant score
Z̄ = grand mean of the discriminant scores
Z̄j = mean discriminant score for group j
Goal
Estimate parameters that minimize the Within Group SS
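The identity TSS = BSS + WSS can be checked numerically. The Python sketch below uses made-up discriminant scores and group labels (illustrative only); note that the between-group term is weighted by each group's size nj, which the slide notation leaves implicit.

```python
import numpy as np

# Hypothetical discriminant scores for two a priori groups (0 = probation, 1 = prison)
z = np.array([-1.2, -0.8, -0.5, 0.1, 0.4, 0.9, 1.1])
group = np.array([0, 0, 0, 0, 1, 1, 1])

grand_mean = z.mean()
tss = ((z - grand_mean) ** 2).sum()          # Total SS

bss = 0.0                                    # Between-group SS
wss = 0.0                                    # Within-group SS
for g in np.unique(group):
    zg = z[group == g]
    bss += len(zg) * (zg.mean() - grand_mean) ** 2
    wss += ((zg - zg.mean()) ** 2).sum()

print(round(tss, 6), round(bss + wss, 6))    # identical: TSS = BSS + WSS
```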
Elements of the Discriminant Model
Z = a + W1X1 + W2X2 + ... + WkXk
Z = discriminant score, a number used to predict
group membership of a case
a = discriminant constant
Wk = discriminant weight or coefficient, a measure of
the extent to which variable Xk discriminates among
the groups of the DV
Xk = an IV or predictor variable. Can be metric or
nonmetric.
Discriminant analysis uses OLS to estimate the
values of the parameters (a) and Wk that minimize
the Within Group SS
An Example of Discriminant Analysis with a
Binary Dependent Variable
Predicting whether a felony offender will receive a
probated or prison sentence as a function of various
background factors.
Dependent Variable
Type of sentence (type_sent)
(0 = probation, 1 = prison)
Independent Variables
 Degree of drug dependency (dr_score)
 Age at first arrest (age_firs)
 Level of work skill (skl_index)
 The seriousness of the crime (ser_indx)
Discriminant Analysis with Two Groups
[Figure: overlapping distributions f(Z) of discriminant scores for the Probation (0) and Prison (1) groups, with group means Z̄0 and Z̄1 and grand mean Z̄]

Between SS = (Z̄0 − Z̄)² + (Z̄1 − Z̄)² = Σ(Z̄j − Z̄)² = BSS
Within SS = Σ(Zi0 − Z̄0)² + Σ(Zi1 − Z̄1)² = Σ(Zij − Z̄j)² = WSS
Total SS = Σ(Zi − Z̄)² = TSS
Z̄0 and Z̄1 are called centroids, the mean discriminant score for each group
Discriminant Analysis Assumptions
 The predictor variables are multivariate normal,
ipso facto univariate normal
 The variance-covariance matrices of the predictor
variables across the various groups are the same
in the population, i.e. homogeneous
 The groups defined by the DV exist a priori
 The predictor variables are noncollinear
 The relationship is linear in its parameters
 Absence of leverage point outliers
 The sample is large enough, say 30 cases for
each predictor variable
Steps in Discriminant Analysis Process
 Specify the dependent & the predictor variables
 Test the model’s assumptions a priori
 Determine the method for selection and criteria for
entering the predictor variables into the model
 Estimate the parameters of the model
 Determine the goodness-of-fit of the model and
examine the residuals
 Determine the significance of the predictors
 Test the assumptions ex post facto
 Validate the results
The Sentence-Type Discriminant Model
Specification of the model (N = 70)
Z = a + W1(dr_score) + W2(age_firs) + W3(skl_indx) + W4(ser_indx)
Are the predictor variables multivariate normally
distributed?
[Histogram of DR_SCORE]
The Sentence-Type Discriminant Model (cont.)
[Histograms of SKL_INDX and AGE_FIRS]
The Sentence-Type Discriminant Model (cont.)
[Histogram of SER_INDX]
Variable     Skew       Kurtosis
dr_score    -0.5049     +0.7728
age_firs    -0.0266     +0.2197
skl_indx    -0.6946     -0.4250
ser_indx    -1.2321     -1.1727
Are the Variance/Covariance Matrices of
the Two Groups Homogeneous?
Covariance Matrices

TYPE_SEN = .00 (probation)
            DR_SCORE   AGE_FIRS   SKL_INDX   SER_INDX
DR_SCORE      7.590     -1.945      1.932      2.110
AGE_FIRS     -1.945      5.553      -.329     -1.225
SKL_INDX      1.932      -.329      8.632      -.857
SER_INDX      2.110     -1.225      -.857      3.441

TYPE_SEN = 1.00 (prison)
            DR_SCORE   AGE_FIRS   SKL_INDX   SER_INDX
DR_SCORE      6.466     -2.688      -.648      1.474
AGE_FIRS     -2.688      4.500       .469     -2.219
SKL_INDX      -.648       .469      7.922      -.507
SER_INDX      1.474     -2.219      -.507      3.405
The variances are on the diagonals, and the covariances are on the off-diagonals.
Q: Are the variance/covariance matrices of the two groups the same in the population, i.e. homogeneous?
Are the Variance/Covariance Matrices of the Two Groups Homogeneous?
(cont.)
Box's M test
H0: the variance/covariance matrices of the two
groups are the same in the population.
Log Determinants

TYPE_SEN                 Rank    Log Determinant
.00                        2          3.076
1.00                       2          2.988
Pooled within-groups       2          3.040

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results

Box's M         .361
F   Approx.     .116
    df1         3
    df2         1476249
    Sig.        .951

Tests the null hypothesis of equal population covariance matrices.

Box's M = 0.361, Approximate F = 0.116, p = 0.951
Conclusion: The null hypothesis with respect to the
homogeneity of variance/covariance matrices in the
population is accepted.
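Box's M itself is straightforward to compute from the group covariance matrices. The sketch below is a simplified illustration of the core statistic only (SPSS additionally converts M to the approximate F shown above); the function name and input format are my own.

```python
import numpy as np

def boxs_m(groups):
    """Box's M statistic for homogeneity of covariance matrices.

    `groups` is a list of (n_j x k) arrays, one array of predictor values per
    DV group. Returns the M statistic only."""
    ns = [g.shape[0] for g in groups]
    covs = [np.cov(g, rowvar=False) for g in groups]             # S_j, unbiased
    n_total, n_groups = sum(ns), len(groups)
    # pooled within-groups covariance matrix
    pooled = sum((n - 1) * s for n, s in zip(ns, covs)) / (n_total - n_groups)
    m = (n_total - n_groups) * np.log(np.linalg.det(pooled))
    m -= sum((n - 1) * np.log(np.linalg.det(s)) for n, s in zip(ns, covs))
    return m
```

A small M with a large p value, as in the output above (M = 0.361, p = 0.951), supports the homogeneity assumption.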
How Will the Predictor Variables Be
Entered into the Discriminant Model?
SPSS offers two methods for building a discriminant
model …
Entering all the variables simultaneously
Stepwise method
In this example, the variables will be entered in a
stepwise fashion using Wilks' lambda criterion
Q: What is Wilks' lambda (Λ)?
For a given predictor variable, Λ is the ratio of the WSS to the TSS (Λ = WSS / TSS)
It is derived from a one-way ANOVA with type_sent as the IV and the predictor variable as the DV
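As a rough illustration, the per-predictor lambda can be computed directly from that one-way ANOVA decomposition. The sketch below assumes the data sit in a pandas DataFrame with columns named as in this example; the function name is hypothetical.

```python
import pandas as pd

def wilks_lambda(df, group_col, x_col):
    """Wilks' lambda for one candidate predictor: WSS / TSS from a one-way
    ANOVA with the grouping variable as the factor."""
    x = df[x_col].astype(float)
    tss = ((x - x.mean()) ** 2).sum()
    wss = sum(((g - g.mean()) ** 2).sum() for _, g in df.groupby(group_col)[x_col])
    return wss / tss

# Hypothetical usage with the example's column names:
# lam = wilks_lambda(data, "type_sent", "ser_indx")   # smaller lambda = better discriminator
```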
How Will the Predictor Variables be Entered into the Discriminant Model? (cont.)
Λ = WSS / TSS
Λ = Σ(Zij − Z̄j)² / Σ(Zi − Z̄)²
Step 1: Compute four one-way ANOVAs with type_sent as the IV and each of the four predictor variables as the DVs

Variables in the Analysis

Step              Tolerance   Sig. of F to Remove   Wilks' Lambda
1   SER_INDX        1.000            .000
2   SER_INDX         .864            .000                 .983
    DR_SCORE         .864            .019                 .832

Step 2: Identify the predictor variable that has the lowest significant Wilks' lambda (Λ) and enter it into the discriminant model, i.e. ser_indx. (Pin default = 0.05)
Step 3: Estimate the parameters of the resulting discriminant model
How Will the Predictor Variables be Entered into the Discriminant Model? (cont.)
Step 4: Of the variables not in the model, select the predictor that has the lowest significant Λ and enter it into the model. Determine if the addition of the variable was significant. Now check if the predictor(s) previously entered are still significant. (Pout default = 0.10)
Step 5: Repeat Step 4 until all the predictor variables are entered into the model or until none of the variables outside the model have significant Λ's.
How Is the Significance of Change
Determined When a Variable is Entered
Into the Discriminant Function?
Use an F-ratio comparing the Wilks' lambda of the model with the greater number of predictors (k) to the one with the lesser number of predictors (k − 1):

F = [ (1 − Λk/Λk−1) / (Λk/Λk−1) ] × [ (N − g − 1) / (g − 1) ]

Λ = WSS / TSS of the function
N = total sample size
g = number of DV groups
df = (N − g − 1) and (g − 1)
Estimation of the Parameters of the Model
Z = a + W1(dr_score) + W2(age_firs) + W3(skl_indx) + W4(ser_indx)
What values of the constant (a) and the discriminant
coefficients Wk best predict whether a case will
receive a probated or a prison sentence?
After variable selection by a stepwise process using Wilks' Λ, the best equation was found to be …
Canonical Discriminant Function Coefficients

              Function 1
DR_SCORE        -.235
SER_INDX         .564
(Constant)      -.706

Unstandardized coefficients
Z = -0.706 - 0.235 (dr_score) + 0.564 (ser_indx)
Estimation of the Parameters of the Model (cont.)
Prediction of a case
Take a case with dr_score = 9, ser_indx = 1,
and an actual sentence = 0, (i.e. a probated
case)
Z = -0.706 - 0.235 (9) + 0.564 (1) = -2.25
Since −2.25 is much closer to the probation group's mean discriminant score than to the prison group's, the case would be predicted to be a probated case, i.e. code 0.
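The same prediction can be reproduced in a few lines of Python; the constant and weights are simply those of the fitted function above, and the centroid values quoted in the comment appear later in these notes.

```python
# Worked prediction using the fitted discriminant function from the slide above
def discriminant_score(dr_score, ser_indx):
    return -0.706 - 0.235 * dr_score + 0.564 * ser_indx

z = discriminant_score(dr_score=9, ser_indx=1)
print(round(z, 2))   # -2.25
# With group centroids of about -0.51 (probation) and +0.58 (prison), a score
# of -2.25 falls well on the probation side, so the case is predicted code 0.
```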
How Can the Goodness-of-Fit of the Model Be Measured?
Eigenvalues (λ)
The Canonical Correlation eta (η)
Wilks' Lambda (Λ)
Classification Table
Hit Ratio
t-test of the Hit Ratio
Maximum Chance Criteria
Proportional Chance Criteria
Press's Q Statistic
Casewise Plot of the Predictions
What is an Eigenvalue?
In matrix algebra, an eigenvalue is a constant, which if
subtracted from the diagonal elements of a matrix, results
in a new matrix whose determinant equals zero.
An example
Given the matrix:

    A = | 4  1 |
        | 2  5 |

Subtracting x from the diagonal elements and setting the determinant to zero:

    | (4 − x)     1     |
    |    2     (5 − x)  |   =   0.0

Calculating the determinant:
(4 − x)(5 − x) − (2)(1) = 0.0
20 − 4x − 5x + x² − 2 = 0.0
x² − 9x + 18 = 0.0
This quadratic equation has two solutions or eigenvalues:
+ 6 and + 3
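For comparison, a numerical library returns the same two roots directly; a one-line check with NumPy (illustrative only):

```python
import numpy as np

A = np.array([[4, 1],
              [2, 5]])
print(np.linalg.eigvals(A))   # the roots of x^2 - 9x + 18, i.e. 3 and 6 (order not guaranteed)
```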
What Is an Eigenvalue in Discriminant
Analysis?
In the present example, the DV is composed of two
groups; i.e. cases either sentenced to probation or
prison.
When there are two groups, one discriminant function can be extracted from the data and its associated eigenvalue is as follows …
λ = BSS / WSS = Σ(Z̄j − Z̄)² / Σ(Zij − Z̄j)²
Interpretation
If λ = 0.00, the model has no discriminatory power (BSS = 0.0)
The larger the value of λ, the greater the discriminatory power of the model
What is the Eigenvalue for the
Sentence-Type Example?
Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1            .305a          100.0           100.0              .483

a. First 1 canonical discriminant functions were used in the analysis.
The eigenvalue of the discriminant function = 0.305
The % of the variance explained by this discriminant function = 100%*
The cumulative percentage of the variance
explained by the 1st discriminant function = 100%*
* With two DV groups, only one discriminant function can be extracted, which will
therefore explain all the variance explained by the model. But with three groups, two
functions can be extracted, with g groups, (g - 1) functions can be extracted, or k
functions if k is less than g. Therefore, a different % of the total variance explained will
be explained by each of the successive functions extracted.
How Can You Tell if the Eigenvalue Is
Significant?
Two useful statistical indicators can be derived from
the eigenvalue …
The canonical correlation eta (η)
Wilks' lambda (Λ) for the model
The canonical correlation (η)
η = √[ λ / (1 + λ) ] = √( BSS / TSS )
η = the correlation of the predictor(s) with the discriminant scores produced by the model
η² = coefficient of determination
1 − η² = coefficient of non-determination
For the sentence-type example
η = √[ 0.3050 / (1 + 0.3050) ] = 0.483
How Can You Tell if the Eigenvalue is Significant? (cont.)
The Wilks' Λ for the discriminant model
Λ = (1 − η²) = 1 / (1 + λ) = WSS / TSS
Λ can be converted to a chi-square distributed statistic with df = (k − 1), k equal to the number of parameters estimated ***
For the sentence-type example
λ = 0.3050
Λ = 1 / (1 + 0.3050) = 0.766, which can be converted to a chi-square statistic ***
χ² = 17.837, df = 2, p = 0.0001
H0: In the population Z̄0 = Z̄1 = Z̄
Since the chi-square results are significant …
H0 is rejected and it is concluded the differences in the mean discriminant scores of the two groups are greater than could be attributed to sampling error.
*** χ² = −[ (n − 1) − 0.5(m + k + 1) ] ln Λ, df = (k − 1), m = number of discriminant functions extracted, k = number of predictor variables
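The chain eigenvalue → canonical correlation → Wilks' Λ → chi-square can be reproduced from the formulas above. A short Python sketch using the example's values (the df value is taken from the SPSS output for this model):

```python
import numpy as np
from scipy.stats import chi2

eigenvalue = 0.3050      # from the sentence-type example
n, m, k = 70, 1, 2       # cases, functions extracted, predictors in the model

eta = np.sqrt(eigenvalue / (1 + eigenvalue))              # canonical correlation ~ 0.483
wilks = 1 / (1 + eigenvalue)                              # Wilks' lambda ~ 0.766
chi_square = -((n - 1) - 0.5 * (m + k + 1)) * np.log(wilks)
df = 2                                                    # df reported by SPSS for this test
p = chi2.sf(chi_square, df)
print(round(eta, 3), round(wilks, 3), round(chi_square, 2), round(p, 4))
```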
How Can You Tell if the Eigenvalue is Significant? (cont.)
Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1            .305a          100.0           100.0              .483

a. First 1 canonical discriminant functions were used in the analysis.
(.305 is the eigenvalue λ; .483 is the canonical correlation η)

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                          .766          17.837     2   .000
How Well Does the Model Predict?
Classification Resultsa

                              Predicted Group Membership
           TYPE_SEN            .00        1.00       Total
Original   Count     .00        27          10          37
                    1.00        14          19          33
           %         .00       73.0        27.0       100.0
                    1.00       42.4        57.6       100.0

a. 65.7% of original grouped cases correctly classified.
Overall results
Overall hit ratio = 65.7%
Correctly classified probationers = 73.0%
Correctly classified prisoners = 57.6%
Does the Model Predict Any Better
Than Chance?
Maximum chance criterion (MCC) Predict that all
70 cases are in the group with the largest number of
cases.
MCC = (nL / NL) (100)
nL = number of subjects in the larger of the two groups
NL = total number of subjects in the two combined groups
For the sentence-type example
Probation group, n = 37
Prison group, n = 33
If all the cases were predicted to
receive probation …
MCC = (37 / 70) (100) = 52.86% correct by
chance
Does the Model Predict Any Better Than Chance? (cont.)
Proportional chance criterion (Cpro): randomly classify the cases proportionate to the number of cases in either group.
Cpro = p² + (1 − p)²
p = proportion of subjects in one group
(1 − p) = proportion of cases in the other group
Proportion of probationers = (37 / 70) = 0.5286
Proportion of prisoners = (33 / 70) = 0.4714
Cpro = 0.5286² + 0.4714² = 0.5016, or a hit ratio of 50.16%
Comparison of hit ratios
The model    65.71%
MCC          52.86%
Cpro         50.16%
Is the Hit Ratio of the Model Significantly
Better Than Chance?
By the maximum chance criterion (MCC), one could
guess 52.86% of the cases correctly. The model hit
65.71% correctly. Is this significantly better than
chance?
Two ways to test whether the model hit ratio is
significantly better than chance
t-test for groups of equal size
Press's Q statistic, groups can be
of unequal size
t-test for a model with equal size groups
H0: the model hit ratio is no better than chance
t = (P − 0.5) / √[ (0.5)(1 − 0.5) / N ]
P = the proportion the model predicted correctly
df = (N − 2)
Is the Hit Ratio of the Model Significantly Better Than Chance? (cont.)
Press's Q statistic
Q = [ N − (n)(g) ]² / [ N (g − 1) ]
N = total number of subjects
n = number of cases correctly classified
g = number of groups
Q is chi-square distributed for df = 1
For the sentence-type example
Q = [ 70 − (46)(2) ]² / [ 70 (2 − 1) ] = 6.91
p < 0.01
Decision
The null hypothesis that the model hit ratio is no better than chance is rejected
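The chance criteria and the two significance tests above are simple arithmetic. The sketch below uses the example's counts and the Q formula as written above (with N(g − 1) in the denominator); variable names are my own.

```python
import math
from scipy.stats import chi2

N, g = 70, 2                       # total cases and number of DV groups
n_larger, n_correct = 37, 46       # larger group size, correctly classified cases
hit_ratio = n_correct / N          # ~0.657

mcc = n_larger / N                               # maximum chance criterion ~ 0.529
p = n_larger / N
cpro = p ** 2 + (1 - p) ** 2                     # proportional chance criterion ~ 0.502

# t-test of the hit ratio against chance (the test assumes equal-size groups)
t = (hit_ratio - 0.5) / math.sqrt(0.5 * 0.5 / N)

# Press's Q, referred to a chi-square distribution with 1 df
q = (N - n_correct * g) ** 2 / (N * (g - 1))
p_value = chi2.sf(q, 1)
print(round(mcc, 3), round(cpro, 3), round(t, 2), round(q, 2), round(p_value, 4))
```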
How Can a Cutting Score Be Established
to Sort the Cases Into Either Group Based
on Their Discriminant Scores?
When n0 = n1
Zcutting = (Z̄0 + Z̄1) / 2
(Z̄j = mean discriminant score for group j)
When n0 ≠ n1
Zcutting = (n0 Z̄0 + n1 Z̄1) / 2
For the sentence-type study
Z̄0 = −0.5141 and Z̄1 = +0.5764
Zcutting = [ (37)(−0.5141) + (33)(+0.5764) ] / 2
Zcutting = −0.00025, or slightly less than 0.0
Can the Predictions of the Model Be Graphed? (cont.)
Box-Whisker plot of the distributions of discriminant
scores for probation and prison cases with the
cutting score set at -0.00025
What is the Best Way to See the
Predictions Made on Individual Cases
Casewise Statistics (first 10 cases)

                               Highest Group                                Second Highest Group            Discriminant
Case  Actual  Predicted   P(D>d|G=g)       P(G=g|D=d)   Sq. Mahalanobis    Group  P(G=g|D=d)  Sq. Mahalanobis   Score
No.   Group   Group         p       df                  dist. to centroid                     dist. to centroid  (Function 1)
 1      0        0         .081     1        .932            3.040           1      .068           8.031         -2.258
 2      0        0         .157     1        .905            2.000           1      .095           6.273         -1.928
 3      0        0         .048     1        .946            3.914           1      .054           9.418         -2.493
 4      0        0         .203     1        .891            1.621           1      .109           5.588         -1.787
 5      0        0         .238     1        .880            1.390           1      .120           5.151         -1.693
 6      0        0         .704     1        .755             .144           1      .245           2.162          -.894
 7      0        0         .081     1        .932            3.040           1      .068           8.031         -2.258
 8      0        0         .395     1        .837             .722           1      .163           3.765         -1.364
 9      0        0         .635     1        .773             .225           1      .227           2.448          -.988
10      1        0**       .238     1        .880            1.390           1      .120           5.151         -1.693

**. Misclassified case
Discriminant scores … the column on the extreme
right hand side of the table
For case 1, discriminant score = -2.2575
How Does SPSS Classify Cases?
In SPSS, case classification is accomplished by
calculating the probability of a case being in one
group or the other (i.e. probation or prison), rather
than by simply using a cutting score.
This is accomplished by calculating the posterior
probability of group membership using Bayes'
Theorem …
P(Gi | D) = [ P(D | Gi) P(Gi) ] / [ Σ P(D | Gi) P(Gi) ]
D = the discriminant score (i.e. Z)
P(Gi | D) = posterior probability that a case is in group i, given that it has a specific discriminant score D
P(D | Gi) = conditional probability that a case has a discriminant score of D, given that it is in group i
P(Gi) = prior probability that a case is in group i, which would be equal to (ni / N)
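A hedged sketch of this rule is shown below. It assumes, as the discriminant model does, that discriminant scores are normally distributed around each group centroid, here with unit pooled within-group variance; the centroids and priors are taken from this example. For case 1 it reproduces posteriors close to the 0.932 / 0.068 shown in the casewise table.

```python
from scipy.stats import norm

centroids = {"probation": -0.5141, "prison": 0.5764}    # group centroids from the example
priors = {"probation": 37 / 70, "prison": 33 / 70}       # prior probabilities n_i / N

def posterior(d):
    """Posterior P(G_i | D = d) via Bayes' theorem, assuming normal
    discriminant scores around each centroid with unit within-group variance."""
    likelihood = {g: norm.pdf(d, loc=c, scale=1.0) for g, c in centroids.items()}
    denom = sum(likelihood[g] * priors[g] for g in centroids)
    return {g: likelihood[g] * priors[g] / denom for g in centroids}

post = posterior(-2.2575)          # case 1's discriminant score
print(post)                        # roughly 0.93 probation vs 0.07 prison
print(max(post, key=post.get))     # classified into the group with the larger posterior
```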
How Does SPSS Classify Cases? (cont.)
The Bayesian probabilities associated with being in
either group are calculated, and the greater of the
two probabilities is used to classify the case.
Example: Case 1
Posterior probability of being in the probation group: P(Gprobation | D) = 0.932
Posterior probability of being in the prison group: P(Gprison | D) = 0.068
Since P(Gprobation | D) > P(Gprison | D), the case is classified as a probation case. (0.932 > 0.068)
The column labeled "Actual Group" shows the group
the case actually belongs to. If the Bayesian
probability misclassifies the case, the case is
marked with two asterisks (**).
These are the errors produced by the model,
which can also be seen in the classification table
in a previous exhibit.
Is Each Predictor Variable in the
Model Significant?
The significance of the individual predictor variables is assessed by conducting a one-way MANOVA …
With the grouping variable as the IV and
The discriminant predictors as the DVs.
The MANOVA sums of squares are then used to calculate Wilks' lambda (Λ) for each predictor:
Λ = WSS / TSS
The results of the final stepwise discriminant model
Variables in the Analysis

Step              Tolerance   Sig. of F to Remove   Wilks' Lambda
1   SER_INDX        1.000            .000
2   SER_INDX         .864            .000                 .983
    DR_SCORE         .864            .019                 .832
Is Each Predictor Variable in the Model Significant? (cont.)
For dr_score (F to remove: p = 0.0195)
Λ = WSS / TSS = (232.862) / (232.862 + 47.081) = 0.8318
For ser_indx (F to remove: p < 0.0001)
Λ = WSS / TSS = (480.152) / (480.152 + 8.433) = 0.9827
The Null Hypothesis
H0: the discriminant coefficients in the population are equal to zero
Λ = WSS / TSS = 1.0, i.e. the WSS = TSS, and BSS = 0
Decision
The null hypothesis is rejected since both Wilks' lambdas (Λ) associated with the two predictors are significant
This finding should not be surprising since
The stepwise process guarantees that only significant variables will be entered into the model, and
That all variables in the model are checked to assure that
they remain significant as new variables are added.
How are the Discriminant Coefficients
Interpreted
The Final Stepwise Model
Z = -0.706 - 0.235 (dr_score) + 0.564 (ser_indx)
For dr-score
As dr_score increases by one unit, the discriminant
score Z decreases by 0.235
Holding the seriousness of the offence (ser_indx)
constant, the more drug dependent the defendant, the
more likely he/she will be granted probation (code =
0)
For ser_indx
As ser_indx increases by one unit, the discriminant
score Z increases by 0.564
Holding the drug dependency (dr_score) constant, the
more serious the offence, the more likely the
defendant will be sent to prison (code = 1)
How Can the Relative Impact on the DV of
the Different Predictor Variables be
Compared?
Two ways
Compare the standardized discriminant weights,
i.e. coefficients
Compare the structure coefficients, also called
the discriminant loadings
Standardized discriminant coefficient (Ck)
The relative differences among the discriminant coefficients cannot be compared if the predictor variables are in different units of measurement.
The discriminant coefficients must first be
converted to standardized coefficients (Ck)
How Can the Relative Impact on the DV of the Different Predictor Variables be
Compared? (cont.)
Zz = C1ZX1 + C2ZX2 + … + CkZXk
Ck = Wk √[ Σ(Xk − X̄k)² / (N − g) ]
Wk = the unstandardized discriminant coefficient of variable k
Σ(Xk − X̄k)² = SS of the predictor variable
N = total sample size
g = number of DV groups
Examples: (dr_score) and (ser_indx)
Cdr_score = −0.235 × √[ 495.67 / (70 − 2) ] = −0.6345
Cser_indx = +0.5643 × √[ 232.857 / (70 − 2) ] = +1.044
Since +1.044 is greater in absolute value than −0.6345, ser_indx has greater discriminatory impact than dr_score.
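The conversion is a one-line scaling by the pooled within-group standard deviation of the predictor. A small sketch using the values above (the function name is my own):

```python
import math

def standardized_weight(w, pooled_ss, n, g):
    """Scale an unstandardized discriminant weight by the pooled
    within-group standard deviation of its predictor."""
    return w * math.sqrt(pooled_ss / (n - g))

# Values from the worked example above
print(round(standardized_weight(-0.235, 495.67, 70, 2), 4))    # about -0.63
print(round(standardized_weight(0.5643, 232.857, 70, 2), 4))   # about +1.04
```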
How Can the Relative Impact on the DV of the Different Predictor Variables be
Compared? (cont.)
Unstandardized discriminant coefficients (Wk)

Canonical Discriminant Function Coefficients

              Function 1
DR_SCORE        -.235
SER_INDX         .564
(Constant)      -.706

Unstandardized coefficients

Standardized discriminant coefficients (Ck)

Standardized Canonical Discriminant Function Coefficients

              Function 1
DR_SCORE        -.625
SER_INDX        1.044
Notice that there is no constant (a) in a standardized
discriminant function equation since the mean of a
standardized variable equals zero.
What is a Structure Coefficient?
A structure coefficient, or discriminant loading, is the
correlation between a predictor variable and the
discriminant scores produced by the discriminant
function.
The higher the absolute value of the coefficient, the
greater the discriminatory impact of the predictor
variable on the DV.
Structure coefficients of the predictors

Structure Matrix

              Function 1
SER_INDX         .814
DR_SCORE        -.240
SKL_INDXa       -.194
AGE_FIRSa       -.185

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
a. This variable not used in the analysis.
Ser_indx has the highest correlation with the discriminant scores, followed by dr_score, skl_indx and age_firs. The algebraic sign (±) indicates the direction of the relationship.
On Average, How Well Did the
Discriminant Function Divide the Two
Groups?
Group Centroids
One way to determine the degree of separation
between the two groups is to compute the mean
discriminant score for either group.
These means are called the group centroids
Probation (0) and prison (1) group centroids
Functions at Group Centroids

TYPE_SEN    Function 1
.00           -.514
1.00           .576

Unstandardized canonical discriminant functions evaluated at group means
Are the Predictor Variables Independent,
Noncollinear?
Discriminant analysis assumes that the predictor
variables are independent or noncollinear.
This problem is partially addressed by using a
stepwise procedure to enter the variables into the
equation, since …
The collinearity among the variables is considered in
the process and the resulting discriminant coefficients
are partial coefficients.
Z = -0.706 - 0.235 (dr_score) + 0.564 (ser_indx)
Q: Are dr_score and ser_indx significantly correlated?
Notwithstanding the stepwise process, the final two predictors are significantly correlated (r = 0.2791, p = 0.019).
Would the Same Results be Achieved
Using Logistic Regression?
Yes … virtually the same results.
The functions
Discriminant function
Z = -0.7065 - 0.2350 (dr_score) + 0.5643 (ser_indx)
Logistic regression equation
Prob(event) = 1 / (1 + e^−(−0.8011 − 0.272 dr_score + 0.611 ser_indx))
Hit ratios
Discriminant analysis     65.71%
Logistic regression       65.71%
Significance of predictor variables
                          dr_score      ser_indx
Discriminant function     p = .0001     p = .0004
Logistic regression       p < .0001     p < .0001
Results of the Logistic Analysis of Type of Sentence

---------------------- Variables in the Equation ----------------------

Variable       B        S.E.      Wald       df     Sig       R       Exp(B)
DR_SCORE    -.2720     .1183     5.2810       1    .0216    -.1841     .7619
SER_INDX     .6111     .1716    12.6776       1    .0004     .3321    1.8424
Constant    -.8011     .7311     1.2006       1    .2732

--------------- Variables not in the Equation ---------------

Residual Chi Square   1.360 with 2 df   Sig = .5067

Variable      Score     df     Sig       R
AGE_FIRS     1.0003      1    .3172    .0000
SKL_INDX      .3970      1    .5287    .0000

-2 Log Likelihood     78.474
Goodness of Fit       67.926

                      Chi-Square    df    Significance
Model Chi-Square        18.338       2       .0001
Improvement              5.931       1       .0149

Classification Table for TYPE_SEN

                        Predicted
                      .0        1.0     Percent Correct
Observed   .0         27         10         72.97%
          1.0         14         19         57.58%

Overall 65.71%
What if the Dependent Variable Has More
Than Two Groups?
Example
Dependent variable
Pre-disposition status (jail, bail, or ROR)
Independent variables
Age of first arrest (age_firs)
Age at time of arrest (age)
Degree of drug dependency (dr_score)
Number of prior arrests (pr_arrst)
Type of counsel (counsel 0 = court
appointed, 1 = retained)
One Versus Multiple Discriminant Functions
When the dependent variable has two groups …
One discriminant function can be extracted
from the data
 When there are three groups …
Two functions can be extracted from the
data
 When there are g-number of groups …
(g - 1) functions can be extracted from the data,
Or k-number of functions if the number of
predictor variables (k) is less than the number of
groups (g)
Geometry of Two Discriminant Functions
Imagine a problem with two predictor variables and
a DV with three groups. Now draw a scatterplot of
the cases in each group across the two predictor
variables.
[Scatterplot of the cases on X1 (horizontal axis) and X2 (vertical axis), showing three separate clusters: Group 1 = *, Group 2 = o, Group 3 = x]

Q: How best to describe these three groups?
Geometry of Two Discriminant Functions (cont.)
[The same scatterplot with two vectors, Z1 and Z2, fit through the data]

Two vectors are fit to the data:
Z1: a reasonably good fit for groups 1 and 3, but a bad fit to group 2 (1st discriminant function)
Z2: a reasonably good fit for group 2, but a bad fit for groups 1 & 3 (2nd discriminant function)
The two vectors taken together better explain the
three groups than either one by itself.
Statistics with Multiple Discriminant
Functions
The statistical output with multiple discriminant
functions is comparable to that with one function …
Except that multiple sets of statistics are derived for
each discriminant function, including:
Discriminant coefficients, or weights
Standardized coefficients, or weights
Centroids
Structured coefficients, or loadings
Eigenvalues
Canonical correlations
Wilks' lambdas
Assumptions About Multiple Discriminant
Functions
Q Must the various discriminant functions be
independent of each other, i.e. noncollinear?
No, they may be collinear or noncollinear, whatever best fits the data. Geometrically, the functions can be other than 90° apart.
Q Must the discriminant scores (Z) produced by
the various discriminant functions be independent of
each other, i.e. noncollinear?
Yes, the correlations among the discriminant
scores produced by the various functions must
all be equal to zero (0.0)
r(Z1, Z2) = 0.0
Discriminant Analysis of Pre-Disposition
Status
The analysis was conducted using a stepwise
selection process using the Wilks' lambda criterion
Q Are the variance/covariance matrices of the
three groups the same in the population?
Covariance Matrices

PRE_STAT = 1.00
            AGE_FIRS      AGE       DR_SCORE     PR_ARRST    COUNSEL
AGE_FIRS      3.415      -2.579        -.391       -1.639       .232
AGE          -2.579      19.080        4.389       -1.729      -.229
DR_SCORE      -.391       4.389        4.770       -1.510       .103
PR_ARRST     -1.639      -1.729       -1.510        6.814      -.138
COUNSEL        .232       -.229         .103        -.138       .136

PRE_STAT = 2.00
            AGE_FIRS      AGE       DR_SCORE     PR_ARRST    COUNSEL
AGE_FIRS      5.190       1.557       -2.326        -.152       .210
AGE           1.557       5.957         .457         .814      -.257
DR_SCORE     -2.326        .457        8.490         .631    -7.381E-02
PR_ARRST      -.152        .814         .631         .662      -.148
COUNSEL        .210       -.257      -7.381E-02      -.148       .190

PRE_STAT = 3.00
            AGE_FIRS      AGE       DR_SCORE     PR_ARRST    COUNSEL
AGE_FIRS      5.882      -1.305       -3.890        -.938       .298
AGE          -1.305       6.382        -.515        2.313      -.327
DR_SCORE     -3.890       -.515       10.154        1.125    -9.559E-02
PR_ARRST      -.938       2.313        1.125        3.000      -.500
COUNSEL        .298       -.327      -9.559E-02      -.500       .154
Discriminant Analysis of Pre-Disposition Status (cont.)
Box's M test for the homogeneity of variance/
covariance matrices
Analysis 1: Box's Test of Equality of Covariance Matrices

Log Determinants

PRE_STAT                 Rank    Log Determinant
1.00                       2           .934
2.00                       2           .066
3.00                       2          -.130
Pooled within-groups       2           .606

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results

Box's M        12.391
F   Approx.     1.968
    df1         6
    df2         38111.579
    Sig.        .066

Tests the null hypothesis of equal population covariance matrices.
Decision
(Box's M = 12.39, p = 0.0663)
The null hypothesis that the variance/covariance
matrices are equal in the population is accepted.
What Is the Final Model Estimated by the
Discriminant Analysis?
Two discriminant functions were extracted …
Canonical Discriminant Function Coefficients

              Function 1    Function 2
AGE             -.146          .253
COUNSEL         1.946         1.682
(Constant)      2.375        -6.655

Unstandardized coefficients
1st Function
Z1 = 2.375 - 0.146 (age) + 1.946 (counsel)
2nd Function
Z2 = -6.655 + 0.253 (age) + 1.682 (counsel)
Given a 22-year-old offender with retained counsel …
Z1 = 2.375 - 0.146 (22) + 1.946 (1) = 1.109
Z2 = -6.655 + 0.253 (22) + 1.682 (1) = 0.593
These two discriminant scores will be used to classify the
offender into one of the three pre-disposition groups.
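As a simple illustration (not the Bayesian rule SPSS actually applies, which was described earlier), the sketch below computes the two scores for this offender and assigns the case to the nearest group centroid, using the centroid values reported later in these notes.

```python
import numpy as np

def scores(age, counsel):
    """Discriminant scores on the two fitted functions from the slide above."""
    z1 = 2.375 - 0.146 * age + 1.946 * counsel
    z2 = -6.655 + 0.253 * age + 1.682 * counsel
    return np.array([z1, z2])

# Group centroids (1 = jail, 2 = bail, 3 = ROR) on the two functions,
# taken from the "Functions at Group Centroids" table later in the notes
centroids = {1: np.array([-1.001, -0.00164]),
             2: np.array([0.856, -0.160]),
             3: np.array([0.827, 0.201])}

z = scores(age=22, counsel=1)                      # about [1.109, 0.593]
dists = {g: np.linalg.norm(z - c) for g, c in centroids.items()}
print(min(dists, key=dists.get))                   # nearest centroid = predicted group
```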
Are the Two Discriminant Functions
Significant?
Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1            .882a          98.0            98.0              .685
2            .018a           2.0           100.0              .134

a. First 2 canonical discriminant functions were used in the analysis.

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1 through 2               .522          43.254      4   .000
2                         .982           1.206      1   .272
1st Function
Eigenvalue = 0.8819
Of the variance explained by the two functions, the 1st explains 97.97%
The canonical correlation (η) between the two predictor variables and the discriminant scores produced by the 1st function = 0.6846
The chi-square test of the Wilks' Λ is significant (χ² = 43.254, p < 0.0001). The null hypothesis that in the population BSS = 0 and η = 0 is rejected.
Are the Two Discriminant Functions Significant? (cont.)
2nd Function
Eigenvalue = 0.0183
Of the variance explained by the two functions, the 2nd explains 2.03%
The canonical correlation (η) between the two predictor variables and the discriminant scores produced by the 2nd function = 0.1341
The chi-square test of the Wilks' Λ is not significant (χ² = 1.206, p = 0.272). The null hypothesis that in the population BSS = 0 and η = 0 is accepted.
Decision
Since the second function is not significant, its associated statistics will not be used in the interpretation of the effect of age and counsel on pre-disposition status.
What Are the Standardized Canonical
Discriminant Functions?
Standardized Canonical Discriminant Function Coefficients

            Function 1    Function 2
AGE            -.508          .883
COUNSEL         .770          .666
Zz = C1ZX1 + C2ZX2 + … + CkZXk
1st Function
Zz1 = -0.508 (age) + 0.770 (counsel)
2nd Function
Zz2 = 0.883 (age) + 0.666 (counsel)
Nota Bene
Recall that the 2nd function was found not to be
significant. Of the two variables in the 1st
function, counsel has the greater impact.
What is the Correlation Between Each of
the Predictor Variables and the
Discriminant Scores Produced By the
Two Functions?
Structure coefficients, or loadings …

Structure Matrix

              Function 1    Function 2
COUNSEL          .867*         .499
AGE_FIRSa        .291*         .067
PR_ARRSTa       -.219*        -.190
AGE             -.654          .757*
DR_SCOREa       -.109          .195*

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function
a. This variable not used in the analysis.
The predictors counsel, age_firs, and pr_arrst load
highest on the 1st function, while age and dr_score
load highest on the 2nd function.
What is the Mean Discriminant Score for
Each Pre-Disposition Group on Each
Discriminant Function?
Recall that these mean discriminant scores are
called centroids and that the 2nd discriminant function is
not significant.
Functions at Group Centroids

PRE_STAT    Function 1    Function 2
1.00          -1.001      -1.64E-03
2.00            .856         -.160
3.00            .827          .201

Unstandardized canonical discriminant functions evaluated at group means
Notice how numerically similar the centroids of the 1st
function are for groups 2 and 3, i.e. bail and ROR.
This means that the 1st function, while significant, will
do a poor job discriminating between the bail and
ROR groups, and most of its discriminatory power will
be discriminating between the jail group versus the
other two groups.
What Would a Scatterplot of the Discriminant Scores of the Three Pre-Disposition Groups Reveal?
Reading across horizontally, notice how the 1st
discriminant function separates the centroid-pair of the jail
group (1) from that of the bail (2) and ROR (3) groups.
Reading vertically, however, notice that the 2nd discriminant function fails to separate the three centroid pairs of the three groups.
This is why the 2nd function was not found to be
significant.
How Were the Individual Cases
Classified?
Casewise plot of the cases
Casewise Statistics (first 20 cases)

                               Highest Group                                Second Highest Group              Discriminant Scores
Case  Actual  Predicted   P(D>d|G=g)       P(G=g|D=d)   Sq. Mahalanobis    Group  P(G=g|D=d)  Sq. Mahalanobis   Function 1   Function 2
No.   Group   Group         p       df                  dist. to centroid                     dist. to centroid
 1      2        2         .682     2        .578             .766           3      .391          1.127           1.694        -.411
 2      2        2         .787     2        .550             .480           3      .409           .649           1.548        -.158
 3      3        2**       .682     2        .578             .766           3      .391          1.127           1.694        -.411
 4      2        2         .833     2        .521             .364           3      .426           .342           1.403         .096
 5      2        2         .787     2        .550             .480           3      .409           .649           1.548        -.158
 6      3        2**       .811     2        .490             .420           3      .442           .206           1.257         .349
 7      2        2         .833     2        .521             .364           3      .426           .342           1.403         .096
 8      3        2**       .811     2        .490             .420           3      .442           .206           1.257         .349
 9      3        3         .800     2        .465             .447           2      .426          1.044            .965         .856
10      3        2**       .682     2        .578             .766           3      .391          1.127           1.694        -.411
11      2        1**       .154     2        .596            3.745           2      .283          4.394           -.398       -1.840
12      2        2         .682     2        .578             .766           3      .391          1.127           1.694        -.411
13      1        3**       .662     2        .470             .825           2      .392          1.613            .819        1.109
14      3        1**       .551     2        .773            1.190           2      .144          3.707           -.836       -1.080
15      2        1**       .392     2        .721            1.871           2      .184          3.765           -.690       -1.333
16      1        1         .154     2        .596            3.745           2      .283          4.394           -.398       -1.840
17      1        1         .914     2        .910             .179           2      .049          5.185          -1.419        -.067
18      3        1**       .711     2        .818             .681           2      .112          3.820           -.981        -.827
19      1        1         .842     2        .855             .343           2      .086          4.104          -1.127        -.573
20      2        1**       .085     2        .526            4.938           2      .341          4.965           -.252       -2.094

**. Misclassified case
What Was the Hit Ratio of the
Discriminant Model?
Classification Resultsa

                              Predicted Group Membership
           PRE_STAT          1.00      2.00      3.00      Total
Original   Count    1.00       27         1         4        32
                    2.00        5        14         2        21
                    3.00        3        11         3        17
           %        1.00      84.4       3.1      12.5     100.0
                    2.00      23.8      66.7       9.5     100.0
                    3.00      17.6      64.7      17.6     100.0

a. 62.9% of original grouped cases correctly classified.
Hit Ratio = (44 / 70) (100) = 62.9%
Errors = (26 / 70) (100) = 37.14%