Logistic Regression and Discriminant Function Analysis

Logistic Regression vs. Discriminant Function Analysis
• Similarities
  – Both predict group membership for each observation (classification)
  – Dichotomous DV
  – Both require an estimation sample and a validation sample to assess predictive accuracy
  – If the split between groups is not more extreme than 80/20, they yield similar results in practice

Logistic Regression vs. Discriminant Analysis: Differences
• Discriminant Analysis
  – Assumes multivariate (MV) normality
  – Assumes equality of variance–covariance (VCV) matrices
  – A large number of predictors can't be accommodated because it violates MV normality
  – Predictors must be continuous, interval level
  – More powerful when its assumptions are met
  – Many assumptions, rarely met in practice
  – Categorical IVs create problems
• Logistic Regression
  – No assumption of MV normality
  – No assumption of equality of VCV matrices
  – Can accommodate large numbers of predictors more easily
  – Categorical predictors OK (e.g., dummy codes)
  – Less powerful when discriminant analysis's assumptions are met
  – Few assumptions, typically met in practice
  – Categorical IVs can be dummy coded

Logistic Regression
• Outline:
  – Categorical Outcomes: Why not OLS Regression?
  – General Logistic Regression Model
  – Maximum Likelihood Estimation
  – Model Fit
  – Simple Logistic Regression

Categorical Outcomes: Why not OLS Regression?
• Dichotomous outcomes:
  – Passed / Failed
  – CHD / No CHD
  – Selected / Not Selected
  – Quit / Did Not Quit
  – Graduated / Did Not Graduate

Categorical Outcomes: Why not OLS Regression?
• Example: Relationship between performance (X) and turnover (Y)
• Line of best fit?!
• Errors (Y - Y') across values of performance (X)?
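The out-of-range prediction problem can be shown numerically. This is a toy sketch with made-up data (not the lecture's turnover data): fitting an ordinary least-squares line to a 0/1 outcome yields fitted values below 0 and above 1 at the extremes of the predictor.

```python
# Toy illustration (hypothetical data): OLS fit to a dichotomous outcome
# can produce "probabilities" outside [0, 1].
x = list(range(1, 11))     # predictor (e.g., performance)
y = [0] * 5 + [1] * 5      # dichotomous outcome (e.g., turnover)

n = len(x)
mx = sum(x) / n
my = sum(y) / n
# OLS slope and intercept from the usual closed-form solution
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

pred_low = a + b * x[0]    # fitted value at x = 1: falls below 0
pred_high = a + b * x[-1]  # fitted value at x = 10: exceeds 1
print(pred_low, pred_high)
```

With this symmetric toy data the line predicts about -0.18 at x = 1 and about 1.18 at x = 10, neither of which is an admissible probability.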
[Scatterplot: turnover (0/1) on the Y axis vs. performance (1.5 to 5.0) on the X axis, with an OLS line of best fit]

Problems with Dichotomous Outcomes/DVs
• The regression surface is intrinsically non-linear
• Errors assume one of two possible values, violating the assumption of normally distributed errors
• Violates the assumption of homoscedasticity
• Predicted values of Y greater than 1 and smaller than 0 can be obtained
• The true magnitude of the effects of the IVs may be greatly underestimated
• Solution: Model the data using Logistic Regression, NOT OLS Regression

Logistic Regression vs. Regression
• Logistic regression predicts the probability that an event will occur
  – Range of possible responses between 0 and 1
  – Must use an s-shaped curve to fit the data
• OLS regression assumes linear relationships and can't fit an s-shaped curve
  – Violates the normality assumption
  – Creates heteroscedasticity

Example: Relationship between Age and CHD (1 = Has CHD)

General Logistic Regression Model
• Y' (the outcome variable) is the probability of having one outcome or the other, based on a nonlinear function of the best linear combination of predictors:

  Y' = e^(a + b1X1) / (1 + e^(a + b1X1))

  Where:
• Y' = probability of an event
• The linear portion of the equation (a + b1X1) is used to predict the probability of the event (0, 1); it is not an end in itself

The logistic (logit) transformation
• The DV is dichotomous; the purpose is to estimate the probability of occurrence (0, 1)
  – Thus, the DV is transformed into a likelihood
• The logit/logistic transformation accomplishes this (the linear regression equation predicts the log of the odds):

  odds = P / (1 - P) = P(Y = 1) / (1 - P(Y = 1)) = P(Y = 1) / P(Y = 0)

  logit(P) = ln(odds) = ln(P / (1 - P)) = ln(Y' / (1 - Y')) = A + Σ BjXij

Probability Calculation

  P = Y' = e^(a + bX) / (1 + e^(a + bX))

  Where:
• The relation between logit(P) and X is intrinsically linear
• b = expected change in logit(P) given a one-unit change in X
• a = intercept
• e = the exponential constant

Ordinary Least Squares (OLS) Estimation
• The purpose is to obtain the estimates that best minimize the sum of squared errors, Σ(Y - Y')²
• The estimates chosen best describe the relationships among the observed variables (IVs and DV)
• The estimates chosen maximize the probability of obtaining the observed data (i.e., these are the population values most likely to have produced the data at hand)

Maximum Likelihood (ML) Estimation
• OLS can't be used in logistic regression because of the non-linear nature of the relationships
• In ML, the purpose is to obtain the parameter estimates most likely to have produced the data
  – ML estimators are those with the greatest joint likelihood of reproducing the data
• In logistic regression, each model yields an ML joint probability (likelihood) value
• Because this value tends to be very small (e.g., .00000015), its log is multiplied by -2
• The -2 log transformation also yields a statistic with a known distribution (the chi-square distribution)

Model Fit
• In logistic regression, R and R² don't make sense
• Evaluate model fit using the -2 log likelihood (-2LL) value obtained for each model (through ML estimation)
  – The -2LL value reflects the fit of the model and is used to compare the fit of nested models
  – The -2LL measures lack of fit: the extent to which the model fits the data poorly
  – When the model fits the data perfectly, -2LL = 0
• Ideally, the -2LL value for the null model (i.e., the model with no predictors, or "intercept-only" model) will be larger than that of the model with predictors

Comparing Model Fit
• The fit of the null model can be tested against the fit of the model with predictors using a chi-square test:

  χ² = (-2LL_Null) - (-2LL_Model)

  Where:
• χ² = chi-square for the improvement in model fit (with df = k_Model - k_Null, the difference in the number of estimated parameters)
• -2LL_Null = -2 log likelihood value for the null model (intercept-only model)
• -2LL_Model = -2 log likelihood value for the hypothesized model
• The same test can be used to compare a nested model with k predictors to a model with k+1 predictors, etc.
• Same logic as OLS regression, but the models are compared using a different fit index (-2LL)

Pseudo R²
• Assessment of overall model fit
• Calculation:

  R² = [(-2LL_Null) - (-2LL_Model)] / (-2LL_Null)

• Two primary pseudo R² statistics:
  – Nagelkerke: less conservative; preferred by some because its maximum = 1
  – Cox & Snell: more conservative
• Interpret like R² in OLS regression

Unique Prediction
• In OLS regression, the significance tests for the beta weights indicate whether an IV is a unique predictor
• In logistic regression, the Wald test is used for the same purpose

Similarities to Regression
• All of the following procedures from OLS regression can also be used in logistic regression:
  – Dummy coding for categorical IVs
  – Hierarchical entry of variables (compare changes in % classification; significance of the Wald test)
  – Stepwise entry (but don't use it; it's atheoretical)
  – Moderation tests

Simple Logistic Regression Example
• Data collected from 50 employees
• Y = success in training program (1 = pass; 0 = fail)
• X1 = Job aptitude score (5 = very high; 1 = very low)
• X2 = Work-related experience (months)

Syntax in SPSS

  LOGISTIC REGRESSION PASS
    /METHOD = ENTER APT EXPER
    /SAVE = PRED PGROUP
    /CLASSPLOT
    /PRINT = GOODFIT
    /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

• PASS is the DV; APT and EXPER are the IVs

Results
• Block 0: The null model results
  – Can't do any worse than this
• Block 1: Method = Enter
  – Tests of the model of interest
  – Interpret the data from here

Omnibus Tests of Model Coefficients (tests whether the model is significantly better than the null model; a significant chi-square means yes!)

            Chi-square   df   Sig.
  Step        10.169      2   .006
  Block       10.169      2   .006
  Model       10.169      2   .006

• Step, Block & Model yield the same results because all IVs were entered in the same block
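The model equations above, Y' = e^(a + bX) / (1 + e^(a + bX)) and logit(P) = ln(P / (1 - P)), can be sketched numerically. This is a minimal illustration with made-up coefficients (a = -3, b = 0.5), showing that the predicted probability is bounded in (0, 1) and that the logit transformation recovers the linear predictor a + bX.

```python
import math

def logistic(a, b, x):
    """P = e^(a + bx) / (1 + e^(a + bx)) -- the predicted probability."""
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

def logit(p):
    """logit(P) = ln(P / (1 - P)) -- the log of the odds."""
    return math.log(p / (1 - p))

# hypothetical coefficients, not from the lecture data
a, b = -3.0, 0.5
p = logistic(a, b, x=4.0)   # probability of the event at x = 4
assert 0 < p < 1            # bounded, unlike OLS predictions
# applying the logit to the probability recovers the linear predictor:
recovered = logit(p)        # equals a + b*4 = -1.0
```

This is why the relation between logit(P) and X is intrinsically linear even though the relation between P and X is s-shaped.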
Results Continued

Model Summary
  Step   -2 Log likelihood   Cox & Snell R²   Nagelkerke R²
  1          59.066a             .184             .245
  a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

• -2 Log Likelihood: an index of fit; a smaller number means better fit (perfect fit = 0)
• Pseudo R²: interpret like R² in regression. Nagelkerke is preferred by some because its maximum = 1; Cox & Snell is a uniformly more conservative estimate

Classification: Null Model vs. Model Tested

Classification Table (null model; constant only, cut value = .500)
                    Predicted
  Observed PASS    fail   pass   % Correct
  fail               0     24        .0
  pass               0     26     100.0
  Overall %                        52.0

• Null model: 52% correct classification

Classification Table (model tested; cut value = .500)
                    Predicted
  Observed PASS    fail   pass   % Correct
  fail              16      8      66.7
  pass               6     20      76.9
  Overall %                       72.0

• Model tested: 72% correct classification

Variables in the Equation
              B      S.E.    Wald    df   Sig.   Exp(B)
  APT        .549    .235   5.473    1   .019    1.731
  EXPER      .111    .052   4.577    1   .032    1.118
  Constant -3.050   1.146   7.086    1   .008     .047
  a. Variable(s) entered on step 1: APT, EXPER.

• B: the effect of a one-unit change in the IV on the log odds (hard to interpret)
• Odds Ratio (OR), Exp(B) in SPSS: more interpretable; a one-unit change in aptitude multiplies the odds of passing by about 1.7
• Wald: like a t test, but uses the chi-square distribution
• Sig.: determines whether the Wald test is significant

Histogram of Predicted Probabilities
[Classification plot of predicted probabilities omitted]

To Flag Misclassified Cases (SPSS syntax)

  COMPUTE PRED_ERR=0.
  IF LOW NE PGR_1 PRED_ERR=1.

• You can use this for additional analyses to explore the causes of misclassification

Results Continued

Hosmer and Lemeshow Test
  Step   Chi-square   df   Sig.
  1         6.608      8   .579

• An index of model fit. The chi-square compares the data (the observed events) with the model (the predicted events). The non-significant result means that the observed and expected values are similar; this is good!
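The fit statistics in the Model Summary and the Exp(B) column can be reproduced from the output above. One assumption in this check: -2LL for the null model is taken as 59.066 + 10.169, since the omnibus model chi-square is the improvement over the null.

```python
import math

n = 50                          # sample size from the example
ll2_model = 59.066              # -2LL for the fitted model (Model Summary)
chi_sq = 10.169                 # omnibus model chi-square
ll2_null = ll2_model + chi_sq   # -2LL for the null model (assumed relationship)

# Cox & Snell R^2 = 1 - exp(-chi^2 / n); cannot reach 1
cox_snell = 1 - math.exp(-chi_sq / n)
# Nagelkerke rescales Cox & Snell by its maximum so the ceiling is 1
nagelkerke = cox_snell / (1 - math.exp(-ll2_null / n))
print(round(cox_snell, 3), round(nagelkerke, 3))   # 0.184 0.245, as in the output

# Exp(B): the odds ratio for a one-unit change in each predictor
or_apt = math.exp(0.549)        # aptitude: about 1.731
or_exper = math.exp(0.111)      # experience: about 1.118
```

The computed odds ratios match the SPSS Exp(B) column up to the rounding of B in the printed table.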
Hierarchical Logistic Regression
• Question: Which of the following variables predict whether a woman is hired to be a Hooters girl?
  – Age
  – IQ
  – Weight

Simultaneous vs. Hierarchical: Omnibus Tests of Model Coefficients

Simultaneous (Block 1: IQ, Age, Weight)
            Chi-square   df   Sig.
  Step        48.462      3   .000
  Block       48.462      3   .000
  Model       48.462      3   .000
  Model Summary: -2 Log likelihood = 142.383; Cox & Snell R² = .296; Nagelkerke R² = .395
  (Estimation terminated at iteration 6 because parameter estimates changed by less than .001)

Hierarchical, Block 1 (IQ)
            Chi-square   df   Sig.
  Step          .289      1   .591
  Block         .289      1   .591
  Model         .289      1   .591
  Cox & Snell .002; Nagelkerke .003

Hierarchical, Block 2 (Age)
            Chi-square   df   Sig.
  Step        42.044      1   .000
  Block       42.044      1   .000
  Model       42.333      2   .000
  Cox & Snell .264; Nagelkerke .353

Hierarchical, Block 3 (Weight)
            Chi-square   df   Sig.
  Step         6.129      1   .013
  Block        6.129      1   .013
  Model       48.462      3   .000
  Cox & Snell .296; Nagelkerke .395

Simultaneous vs. Hierarchical: Classification Tables (cut value = .500)

Simultaneous (Block 1: IQ, Age, Weight)
                      Predicted
  Observed Hired    not hired   hired   % Correct
  not hired             53        12       81.5
  hired                 26        47       64.4
  Overall %                               72.5

Hierarchical, Block 1 (IQ)
  not hired              8        57       12.3
  hired                  6        67       91.8
  Overall %                               54.3

Hierarchical, Block 2 (Age)
  not hired             55        10       84.6
  hired                 28        45       61.6
  Overall %                               72.5

Hierarchical, Block 3 (Weight)
  not hired             53        12       81.5
  hired                 26        47       64.4
  Overall %                               72.5

Simultaneous vs. Hierarchical: Variables in the Equation

Simultaneous (Block 1: IQ, Age, Weight)
              B      S.E.    Wald     df   Sig.    Exp(B)
  IQ        -.009    .015     .372    1   .542      .991
  age       -.591    .125   22.224    1   .000      .554
  weight    -.277    .117    5.630    1   .018      .758
  Constant  8.264   1.821   20.602    1   .000   3881.775

Hierarchical, Block 1 (IQ)
              B      S.E.    Wald     df   Sig.    Exp(B)
  IQ         .006    .012     .289    1   .591     1.006
  Constant  -.185    .585     .100    1   .752      .831

Hierarchical, Block 2 (Age)
              B      S.E.    Wald     df   Sig.    Exp(B)
  IQ        -.003    .014     .032    1   .858      .997
  age       -.591    .120   24.220    1   .000      .554
  Constant  6.484   1.533   17.899    1   .000    654.298

Hierarchical, Block 3 (Weight)
              B      S.E.    Wald     df   Sig.    Exp(B)
  IQ        -.009    .015     .372    1   .542      .991
  age       -.591    .125   22.224    1   .000      .554
  weight    -.277    .117    5.630    1   .018      .758
  Constant  8.264   1.821   20.602    1   .000   3881.775

Multinomial Logistic Regression
• A form of logistic regression that allows prediction of the probability of membership in more than 2 groups
  – Based on a multinomial distribution
• Sometimes called polytomous logistic regression
• Conducts an omnibus test first for each predictor across 3+ groups (like ANOVA)
  – Then conducts pairwise comparisons (like post hoc tests in ANOVA)

Objectives of Discriminant Analysis
• Determining whether significant differences exist between the average scores on a set of variables for 2+ a priori defined groups
• Determining which IVs account for most of the differences in the average score profiles of the 2+ groups
• Establishing procedures for classifying objects into groups based on their scores on the set of IVs
• Establishing the number and composition of the dimensions of discrimination between the groups formed from the set of IVs

Discriminant Analysis
• Discriminant analysis develops a linear combination that can best separate groups.
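The hierarchical blocks above illustrate nested-model comparison: each block's Step chi-square equals the change in the cumulative Model chi-square from the previous block. A quick check against the output:

```python
# Cumulative Model chi-squares from the hierarchical run:
# block 1 = IQ; block 2 adds Age; block 3 adds Weight
model_chisq = {1: 0.289, 2: 42.333, 3: 48.462}

# Step chi-square for each block = improvement over the previous block
step_2 = model_chisq[2] - model_chisq[1]   # contribution of Age
step_3 = model_chisq[3] - model_chisq[2]   # contribution of Weight
print(round(step_2, 3), round(step_3, 3))  # 42.044 6.129, as in the output

# Each step adds one predictor, so it is tested on df = 1; the .05
# critical value for chi-square with df = 1 is 3.841, so both steps
# significantly improve fit.
critical_05_df1 = 3.841
assert step_2 > critical_05_df1 and step_3 > critical_05_df1
```

This is the -2LL difference test from the "Comparing Model Fit" section, applied block by block instead of model vs. null.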
• In a sense, the opposite of MANOVA
• In MANOVA, groups are usually constructed by the researcher and have clear structure (e.g., a 2 x 2 factorial design). Groups = IVs
• In discriminant analysis, the groups usually have no particular structure and their formation is not under experimental control. Groups = DVs

How Discrim Works
• Linear combinations (discriminant functions) are formed that maximize the ratio of between-groups variance to within-groups variance for a linear combination of predictors
• Total # of discriminant functions = (# of groups - 1) or # of predictors, whichever is smaller
• If more than one discriminant function is formed, each subsequent discriminant function is independent of the prior combinations and accounts for as much of the remaining group variation as possible

Assumptions in Discrim
• Multivariate normality of the IVs
  – Violation is more problematic if there is overlap between groups
• Homogeneity of VCV matrices
• Linear relationships
• IVs continuous (interval scale)
  – Can accommodate nominal IVs, but doing so violates MV normality
• Single categorical DV

Results influenced by:
• Outliers (classification may be wrong)
• Multicollinearity (interpretation of coefficients becomes difficult)

Sample Size Considerations
• Observations per predictor
  – Suggested: 20 observations per predictor
  – Minimum required: 5 observations per predictor
• Observations per group (in the DV)
  – Minimum: the smallest group's size exceeds the # of IVs
  – Practical guide: each group should have 20+ observations
  – Wide variation in group size affects the results (i.e., classification is incorrect)

Example
In this hypothetical example, data from 500 graduate students seeking jobs were examined. Available for each student were three predictors: GRE (V+Q), Years to Finish the Degree, and Number of Publications. The outcome measure was categorical: "Got a job" versus "Did not get a job." Half of the sample was used to determine the best linear combination for discriminating the job categories. The second half of the sample was used for cross-validation.
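How a discriminant function separates groups can be sketched on toy data. This is a hypothetical two-group illustration (not the lecture's dataset), using the standard result that the discriminant direction w is proportional to Sw⁻¹(m2 - m1), where Sw is the pooled within-groups scatter matrix; with 2 groups and 2 predictors, min(2 - 1, 2) = 1 function is formed.

```python
# Toy two-group, two-predictor example of a Fisher discriminant direction.
group_a = [(1, 2), (2, 1), (2, 3), (3, 2)]
group_b = [(5, 6), (6, 5), (6, 7), (7, 6)]

def mean(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

ma, mb = mean(group_a), mean(group_b)

# pooled within-groups scatter matrix (2x2), summed over both groups
s = [[0.0, 0.0], [0.0, 0.0]]
for pts, m in ((group_a, ma), (group_b, mb)):
    for p in pts:
        d = (p[0] - m[0], p[1] - m[1])
        s[0][0] += d[0] * d[0]; s[0][1] += d[0] * d[1]
        s[1][0] += d[1] * d[0]; s[1][1] += d[1] * d[1]

# w = Sw^-1 (m_b - m_a), using the closed-form 2x2 inverse
det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
diff = (mb[0] - ma[0], mb[1] - ma[1])
w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
     (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det)

# projecting every case onto w separates the groups completely here
proj_a = [w[0] * p[0] + w[1] * p[1] for p in group_a]
proj_b = [w[0] * p[0] + w[1] * p[1] for p in group_b]
assert max(proj_a) < min(proj_b)
```

The projection onto w is the single number per case that SPSS reports as the canonical discriminant function score.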
Syntax in SPSS

  DISCRIMINANT
    /GROUPS=job(1 2)
    /VARIABLES=gre pubs years
    /SELECT=sample(1)
    /ANALYSIS ALL
    /SAVE=CLASS SCORES PROBS
    /PRIORS SIZE
    /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW CORR COV GCOV TCOV TABLE CROSSVALID
    /PLOT=COMBINED SEPARATE MAP
    /PLOT=CASES
    /CLASSIFY=NONMISSING POOLED .

Interpreting Output
• Box's M
• Eigenvalues
• Wilks' Lambda
• Discriminant Weights
• Discriminant Loadings

Group Statistics
  JOB        Variable                   Mean    Std. Deviation   Valid N
  Oops!      GRE (V+Q)                1296.20      96.913          179
             Number of Publications      3.50       2.029          179
             Years to Finish Degree      6.47       2.094          179
  Got One!   GRE (V+Q)                1305.87     101.824           71
             Number of Publications      6.55       1.593           71
             Years to Finish Degree      4.85       1.179           71
  Total      GRE (V+Q)                1298.94      98.224          250
             Number of Publications      4.36       2.357          250
             Years to Finish Degree      6.01       2.016          250

Tests of Equality of Group Means
                           Wilks' Lambda      F      df1   df2   Sig.
  GRE (V+Q)                    .998          .492     1    248   .483
  Number of Publications       .658       129.009     1    248   .000
  Years to Finish Degree       .867        37.885     1    248   .000

Test Results
  Box's M = 49.679; F Approx. = 8.137; df1 = 6; df2 = 114277.8; Sig. = .000
  Tests the null hypothesis of equal population covariance matrices.

• This violates the assumption of homogeneity of VCV matrices. But this test is sensitive in general, and sensitive to violations of multivariate normality too. Tests of significance in discriminant analysis are robust to moderate violations of the homogeneity assumption.

Eigenvalues
  Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
  1            .693           100.0          100.0              .640
  a. First 1 canonical discriminant function was used in the analysis.

Wilks' Lambda
  Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
  1                         .590          129.854     3   .000

Discriminant Weights
Standardized Canonical Discriminant Function Coefficients
                           Function 1
  GRE (V+Q)                   -.308
  Number of Publications       .944
  Years to Finish Degree      -.423

Discriminant Loadings
Structure Matrix
                           Function 1
  Number of Publications       .866
  Years to Finish Degree      -.469
  GRE (V+Q)                    .054
  Pooled within-groups correlations between discriminating variables and the standardized canonical discriminant function; variables ordered by absolute size of correlation within function.

• The data from both of these outputs indicate that one of the predictors best discriminates who did/did not get a job. Which one is it?

Canonical Discriminant Function Coefficients
                           Function 1
  GRE (V+Q)                   -.003
  Number of Publications       .493
  Years to Finish Degree      -.225
  (Constant)                  3.268
  Unstandardized coefficients

• This is the raw canonical discriminant function.

Functions at Group Centroids
  JOB        Function 1
  Oops!        -.522
  Got One!     1.317
  Unstandardized canonical discriminant functions evaluated at group means

• The means of the groups on the raw canonical discriminant function can be used to establish cut-off points for classification.

Prior Probabilities for Groups
  JOB        Prior   Cases Used in Analysis (Unweighted)
  Oops!       .716           179
  Got One!    .284            71
  Total      1.000           250

• Classification can be based on distance from the group centroids, taking into account information about the prior probability of group membership.

Classification Results
  Cases Selected, Original:
                    Predicted: Oops!   Got One!    Total
    Oops!                170 (95.0%)    9 (5.0%)    179
    Got One!              23 (32.4%)   48 (67.6%)    71
  Cases Selected, Cross-validated:
    Oops!                169 (94.4%)   10 (5.6%)    179
    Got One!              24 (33.8%)   47 (66.2%)    71
  Cases Not Selected, Original:
    Oops!                175 (94.6%)   10 (5.4%)    185
    Got One!              17 (26.2%)   48 (73.8%)    65
  a. Cross-validation is done only for those cases in the analysis. In cross-validation, each case is classified by the functions derived from all cases other than that case.
  b. 87.2% of selected original grouped cases correctly classified.
  c. 89.2% of unselected original grouped cases correctly classified.
  d. 86.4% of selected cross-validated grouped cases correctly classified.

[Histogram: canonical discriminant function 1 scores for JOB = Oops! (Std. Dev = 1.10, Mean = -.55, N = 364); note the two modes]
[Histogram: canonical discriminant function 1 scores for JOB = Got One! (Std. Dev = .62, Mean = 1.30, N = 136)]

• Violation of the homogeneity assumption can affect the classification. To check, the analysis can be conducted using separate group covariance matrices.

Classification Results (separate-groups covariance matrices)
  Cases Selected, Original:
                    Predicted: Oops!   Got One!    Total
    Oops!                165 (92.2%)   14 (7.8%)    179
    Got One!              21 (29.6%)   50 (70.4%)    71
  Cases Not Selected, Original:
    Oops!                168 (90.8%)   17 (9.2%)    185
    Got One!              11 (16.9%)   54 (83.1%)    65
  a. 86.0% of selected original grouped cases correctly classified.
  b. 88.8% of unselected original grouped cases correctly classified.

• No noticeable change in the accuracy of classification.

Discriminant Analysis: Three Groups
The group that did not get a job was actually composed of two subgroups: those that got interviews but did not land a job, and those that were never interviewed. This accounts for the bimodality in the discriminant function scores. The discriminant analysis of the three groups allows the derivation of one more discriminant function, perhaps indicating the characteristics that separate those who get interviews from those who don't, or those who have successful interviews from those whose interviews do not produce a job offer.

Remember this?
[Histogram: canonical discriminant function 1 scores for JOB = Oops! (Std. Dev = 1.10, Mean = -.55, N = 364); two modes]

  DISCRIMINANT
    /GROUPS=group(1 3)
    /VARIABLES=gre pubs years
    /SELECT=sample(1)
    /ANALYSIS ALL
    /SAVE=CLASS SCORES PROBS
    /PRIORS SIZE
    /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW CORR COV GCOV TCOV TABLE CROSSVALID
    /PLOT=COMBINED SEPARATE MAP
    /PLOT=CASES
    /CLASSIFY=NONMISSING POOLED .

Group Statistics
  GROUP            Variable                   Mean    Std. Deviation   Valid N
  Unemployed       GRE (V+Q)                1307.54      85.491           54
                   Number of Publications      1.59       1.434           54
                   Years to Finish Degree      8.57       1.797           54
  Got a Job        GRE (V+Q)                1305.87     101.824           71
                   Number of Publications      6.55       1.593           71
                   Years to Finish Degree      4.85       1.179           71
  Interview Only   GRE (V+Q)                1291.30     101.382          125
                   Number of Publications      4.32       1.664          125
                   Years to Finish Degree      5.56       1.467          125
  Total            GRE (V+Q)                1298.94      98.224          250
                   Number of Publications      4.36       2.357          250
                   Years to Finish Degree      6.01       2.016          250

Tests of Equality of Group Means
                           Wilks' Lambda      F      df1   df2   Sig.
  GRE (V+Q)                    .994          .761     2    247   .468
  Number of Publications       .455       147.864     2    247   .000
  Years to Finish Degree       .529       109.977     2    247   .000

Test Results
  Box's M = 21.796; F Approx. = 1.780; df1 = 12; df2 = 137372.4; Sig. = .045
  Tests the null hypothesis of equal population covariance matrices.

• Separating the three groups produces better homogeneity of the VCV matrices. Still significant, but just barely. Not enough to worry about.

Eigenvalues
  Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
  1            5.353           99.1           99.1              .918
  2             .047             .9          100.0              .211
  a. First 2 canonical discriminant functions were used in the analysis.

Wilks' Lambda
  Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
  1 through 2               .150          466.074     6   .000
  2                         .955           11.246     2   .004

• Two significant linear combinations can be derived, but they are not of equal importance.
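The Eigenvalues and Wilks' Lambda tables are linked by two standard identities: each function's canonical correlation equals sqrt(λ / (1 + λ)), and Wilks' Λ for a set of functions equals the product of 1 / (1 + λ) over those functions. Checking this against the three-group output above:

```python
import math

# eigenvalues of functions 1 and 2 from the three-group run
eigs = [5.353, 0.047]

# canonical correlation for each function: sqrt(lambda / (1 + lambda))
can_corr = [math.sqrt(e / (1 + e)) for e in eigs]
# approximately .918 and .211, matching the output up to rounding

# Wilks' Lambda for "functions 1 through 2" and for function 2 alone
wilks_1_2 = 1.0
for e in eigs:
    wilks_1_2 *= 1 / (1 + e)     # product of 1/(1 + lambda)
wilks_2 = 1 / (1 + eigs[1])
print(round(wilks_1_2, 3), round(wilks_2, 3))   # 0.15 0.955
```

The same identities hold for the two-group run: sqrt(.693 / 1.693) ≈ .640 and 1 / 1.693 ≈ .590, matching that Eigenvalues table as well.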
Weights
Standardized Canonical Discriminant Function Coefficients
                           Function 1   Function 2
  GRE (V+Q)                    .734        .194
  Number of Publications     -1.246        .521
  Years to Finish Degree      1.032        .602

Loadings
Structure Matrix
                           Function 1   Function 2
  Number of Publications      -.466        .867*
  Years to Finish Degree       .401        .796*
  GRE (V+Q)                    .008        .354*
  Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions; variables ordered by absolute size of correlation within function.
  *. Largest absolute correlation between each variable and any discriminant function

• What do the linear combinations mean now?

Canonical Discriminant Function Coefficients
                           Function 1   Function 2
  GRE (V+Q)                    .007        .002
  Number of Publications      -.781        .326
  Years to Finish Degree       .701        .409
  (Constant)                -10.496      -6.445
  Unstandardized coefficients

Functions at Group Centroids
  GROUP            Function 1   Function 2
  Unemployed          4.026        .162
  Got a Job          -2.469        .251
  Interview Only      -.337       -.213
  Unstandardized canonical discriminant functions evaluated at group means

[Plot: group centroids for "got a job," "interview only," and "unemployed" in the space of DF1 (x axis, -4 to +4) and DF2 (y axis, -4 to +4)]

• This figure shows that discriminant function #1, which is made up of number of publications and years to finish, reliably differentiates between those who got jobs, had interviews only, and had no job or interview. Specifically, a high value on DF1 was associated with not getting a job, suggesting that having few publications (loading = -.466) and taking a long time to finish (loading = .401) was associated with not getting a job.
Prior Probabilities for Groups
  GROUP            Prior   Cases Used in Analysis (Unweighted)
  Unemployed        .216            54
  Got a Job         .284            71
  Interview Only    .500           125
  Total            1.000           250

Classification Function Coefficients (Fisher's linear discriminant functions)
                           Unemployed   Got a Job   Interview Only
  GRE (V+Q)                    .238        .190          .205
  Number of Publications    -10.539      -5.440        -7.256
  Years to Finish Degree     11.018       6.503         7.808
  (Constant)               -196.112    -123.212      -139.036

Territorial Map
[Territorial map of canonical discriminant function 1 (x axis) by function 2 (y axis), each from -6.0 to 6.0; symbols: 1 = Unemployed, 2 = Got a Job, 3 = Interview Only; * indicates a group centroid]

[Plot: canonical discriminant functions with group centroids for Unemployed, Got a Job, and Interview Only]

Classification
A classification function is derived for each group. The original data are used to estimate a classification score for each person, for each group. The person is then assigned to the group that produces the largest classification score.
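This assignment rule can be applied by hand using the Fisher classification function coefficients from the output. As a sketch, here we score a hypothetical case sitting exactly at the "Got a Job" group means (GRE = 1305.87, 6.55 publications, 4.85 years):

```python
# Fisher's linear classification functions from the SPSS output:
# score = b_gre*GRE + b_pubs*pubs + b_years*years + constant, one per group.
coefs = {
    "Unemployed":     (0.238, -10.539, 11.018, -196.112),
    "Got a Job":      (0.190,  -5.440,  6.503, -123.212),
    "Interview Only": (0.205,  -7.256,  7.808, -139.036),
}

def classify(gre, pubs, years):
    scores = {g: b1 * gre + b2 * pubs + b3 * years + c
              for g, (b1, b2, b3, c) in coefs.items()}
    # assign the case to the group with the largest classification score
    return max(scores, key=scores.get), scores

# hypothetical case at the "Got a Job" group means
group, scores = classify(gre=1305.87, pubs=6.55, years=4.85)
print(group)   # Got a Job
```

As expected, the "Got a Job" function produces the largest score for this case, though the "Interview Only" score is close, which is consistent with the overlap between those two groups in the territorial map.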
Classification Function Coefficients (Fisher's linear discriminant functions)
                           Unemployed   Got a Job   Interview Only
  GRE (V+Q)                    .238        .190          .205
  Number of Publications    -10.539      -5.440        -7.256
  Years to Finish Degree     11.018       6.503         7.808
  (Constant)               -196.112    -123.212      -139.036

Classification Results
  Cases Selected, Original:
                    Predicted: Unemployed   Got a Job   Interview Only   Total
    Unemployed              51 (94.4%)       0 (.0%)       3 (5.6%)        54
    Got a Job                0 (.0%)        51 (71.8%)    20 (28.2%)       71
    Interview Only           0 (.0%)        13 (10.4%)   112 (89.6%)      125
  Cases Selected, Cross-validated:
    Unemployed              51 (94.4%)       0 (.0%)       3 (5.6%)        54
    Got a Job                0 (.0%)        51 (71.8%)    20 (28.2%)       71
    Interview Only           0 (.0%)        13 (10.4%)   112 (89.6%)      125
  Cases Not Selected, Original:
    Unemployed              62 (93.9%)       0 (.0%)       4 (6.1%)        66
    Got a Job                0 (.0%)        47 (72.3%)    18 (27.7%)       65
    Interview Only           4 (3.4%)       11 (9.2%)    104 (87.4%)      119
  a. Cross-validation is done only for those cases in the analysis. In cross-validation, each case is classified by the functions derived from all cases other than that case.
  b. 85.6% of selected original grouped cases correctly classified.
  c. 85.2% of unselected original grouped cases correctly classified.
  d. 85.6% of selected cross-validated grouped cases correctly classified.

Is the classification better than would be expected by chance?
Observed values:
  Actual            Unemployed   Got a Job   Interview Only   All
  Unemployed            51           0             3            54
  Got a Job              0          51            20            71
  Interview Only         0          13           112           125
  All                   51          64           135           250

Expected classification by chance: E = (Row total x Column total) / Total N
  Actual            Unemployed        Got a Job         Interview Only    All
  Unemployed        (51x54)/250       (64x54)/250       (135x54)/250       54
  Got a Job         (51x71)/250       (64x71)/250       (135x71)/250       71
  Interview Only    (51x125)/250      (64x125)/250      (135x125)/250     125
  All                   51                64                135            250

Correct classification that would occur by chance:
  Actual            Unemployed   Got a Job   Interview Only   All
  Unemployed          11.016      13.824        29.16          54
  Got a Job           14.484      18.176        38.34          71
  Interview Only      25.5        32            67.5          125
  All                   51          64           135          250

The difference between the chance-expected and actual classification can be tested with a chi-square as well:

  χ² = Σ (f_observed - f_expected)² / f_expected
     = 145.13 + 13.82 + 23.47 + 14.48 + 59.25 + 8.77 + 25.5 + 11.28 + 29.34
     = 331.04

where the degrees of freedom = (# of groups - 1)² = 4.
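The chance-corrected classification check above can be reproduced directly from the observed three-group classification table:

```python
# Observed classification counts (rows = actual, columns = predicted),
# in the order Unemployed, Got a Job, Interview Only:
observed = [
    [51,  0,   3],   # actual Unemployed
    [ 0, 51,  20],   # actual Got a Job
    [ 0, 13, 112],   # actual Interview Only
]
n = sum(sum(row) for row in observed)                       # 250
row_tot = [sum(row) for row in observed]                    # actual totals
col_tot = [sum(observed[i][j] for i in range(3)) for j in range(3)]

# chance-expected count: E = (row total x column total) / N
chi_sq = 0.0
for i in range(3):
    for j in range(3):
        e = row_tot[i] * col_tot[j] / n
        chi_sq += (observed[i][j] - e) ** 2 / e

df = (3 - 1) ** 2          # (# of groups - 1)^2 = 4
print(round(chi_sq, 2), df)
```

This yields a chi-square of about 331.05 on df = 4 (the slide's 331.04 reflects rounding each term before summing), far beyond any conventional critical value, so the classification is much better than chance.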