
10.2 Logistic and Probit Regression Models

The logistic regression model is useful when you want to fit a linear regression model to a binary response variable. You have several levels of an independent, or predictor, variable X. Denote these levels $X_1, X_2, \ldots, X_m$. At the $i$th level of X, you have $N_i$ ($i = 1, 2, \ldots, m$) observations, each of which is an independent Bernoulli trial. Of the $N_i$ observations, $y_i$ are classified as “the outcome of interest,” or “success,” and the remaining $N_i - y_i$ have “the other” classification, e.g., “failure.” At the $i$th level of X, $y_i$ has a binomial distribution, or, more formally, $y_i \sim \mathrm{Binomial}(N_i, \pi_i)$, where $N_i$ is the number of trials and $\pi_i$ is the probability of a success on a given trial. The object of logistic regression is to estimate or test for changes in $\pi_i$ associated with changes in $X_i$, specifically by modeling these changes via regression.

10.2.1 Logistic Regression: Challenger Shuttle O-Ring Data Example

Here is an example. Following the 1986 Challenger space shuttle disaster, investigators focused on a suspected association between O-ring failure and low temperature at launch. Data documenting the presence or absence of primary O-ring thermal distress in the 23 shuttle launches preceding the Challenger mission appeared in Dalal et al. (1989) and were reproduced in Agresti (1996). Output 10.1 shows the raw data. Temperature at launch (TEMP) is the X variable. At each TEMP, TD denotes the number of launches in which thermal distress occurred and TOTAL gives the number of launches. TOTAL is the N variable, TD is the y variable, and the variable NO_TD is equal to N - y.
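The grouped layout just described could be entered with a DATA step along the following lines. This is only a sketch: the data set name o_ring is assumed from the WORK.O_RING label that GENMOD reports later in this section, the values are those listed in Output 10.1, and deriving NO_TD from TOTAL and TD is an illustrative choice rather than the authors' own program.

```sas
/* Sketch: grouped O-ring data, one row per launch temperature.       */
/* The data set name o_ring is an assumption (from WORK.O_RING).      */
data o_ring;
   input temp td total;
   no_td = total - td;    /* launches without thermal distress: N - y */
   datalines;
53 1 1
57 1 1
58 1 1
63 1 1
66 0 1
67 0 3
68 0 1
69 0 1
70 2 4
72 0 1
73 0 1
75 1 2
76 0 2
78 0 1
79 0 1
81 0 1
;
run;

/* List the data to confirm that it was read as intended */
proc print data=o_ring;
run;
```

Reading TD and TOTAL and computing NO_TD avoids typing a column that is fully determined by the other two.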
Output 10.1 Challenger O-Ring Thermal Distress Data

    Obs    temp    td    no_td    total
      1     53      1      0        1
      2     57      1      0        1
      3     58      1      0        1
      4     63      1      0        1
      5     66      0      1        1
      6     67      0      3        3
      7     68      0      1        1
      8     69      0      1        1
      9     70      2      2        4
     10     72      0      1        1
     11     73      0      1        1
     12     75      1      1        2
     13     76      0      2        2
     14     78      0      1        1
     15     79      0      1        1
     16     81      0      1        1

Inspection of the data in Output 10.1 reveals that the incidence of thermal distress, indicated by the frequency of TD versus NO_TD, appears to be greater at low temperatures. Therefore, it is of interest to fit a model for which $\pi$, the probability of thermal distress, decreases as temperature increases. However, fitting a model directly to $\pi$, such as $\hat{\pi}_i = \beta_0 + \beta_1 X_i$, where $X_i$ denotes the temperature at the $i$th launch, is not necessarily a reasonable approach. This is partly because, for theoretical reasons explained in Section 10.6, “Background Theory,” the binomial random variable is not linear with respect to $\pi$. It is also partly because fitted values of $\pi$ from this model are not bounded by 0 and 1, allowing the possibility of nonsense estimates of $\pi$.

A better approach is to fit the linear regression model to a function of $\pi$ that keeps the fitted values of $\pi$ between 0 and 1 and with which the binomial random variable at least theoretically has a linear relationship. Two such functions are the logit, defined as

$$\mathrm{logit}(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right),$$

and the probit, defined as

$$\mathrm{probit}(\pi_i) = \Phi^{-1}(\pi_i),$$

where $\Phi^{-1}(\cdot)$ is the inverse of the cumulative distribution function of the standard normal distribution, that is, the value on a standard normal table corresponding to a probability of $\pi_i$.

The logit and probit are both examples of link functions. The link function is a fundamental component of generalized linear models, because it specifies the relationship between the mean of the response variable and the linear model. Note that the mean of the sample proportion, $y_i/N_i$, is $\pi_i$. For reasons explained in more detail in Section 10.6, the logit is the most natural link function for binomial data.
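To make the two link functions concrete, the following sketch evaluates the logit and the probit for a few probabilities and then applies each inverse link to recover the original value. The data set name links, the variable names, and the chosen probabilities are hypothetical; PROBIT and PROBNORM are the SAS functions for the standard normal quantile and cumulative distribution function.

```sas
/* Sketch: evaluate both link functions and their inverses.          */
/* Data set and variable names are illustrative only.                */
data links;
   do p = 0.05, 0.25, 0.50, 0.75, 0.95;
      logit_p  = log(p / (1 - p));      /* logit link                 */
      probit_p = probit(p);             /* probit link: inverse of Phi */
      /* inverse links map the linear-predictor scale back to (0,1)   */
      p_from_logit  = exp(logit_p) / (1 + exp(logit_p));
      p_from_probit = probnorm(probit_p);
      output;
   end;
run;

proc print data=links;
run;
```

Both links map a probability of 0.5 to zero and are unbounded as the probability approaches 0 or 1, which is why a linear predictor on either scale cannot produce a fitted probability outside the unit interval.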
Models using the logit are called “logistic” models; in this case we are interested in a logistic regression model because we want to regress a binomial random variable on temperature. The simplest logistic regression model for these data is $\mathrm{logit}(\pi_i) = \beta_0 + \beta_1 X_i$. You can fit the logistic regression model using PROC GENMOD, using the following SAS program statements:

    proc genmod;
       model td/total=temp / link=logit dist=binomial type1;
    run;

From the SAS statements, you can see that GENMOD has a number of features in common with PROC GLM and MIXED, but a number of unique features as well. As with GLM and MIXED, the MODEL statement has the general form of 〈response variable〉=〈independent variable(s)〉. The independent variables can be direct regression variables, or they can be CLASS variables, which you use in GENMOD to create the generalized linear model analog of analysis of variance. As with GLM and MIXED, GENMOD treats independent variables as direct regression variables by default and as “ANOVA” variables only if they appear first in a CLASS statement. Examples that use the CLASS statement appear later in this chapter.

For binomial response variables the syntax differs from other SAS linear model procedures. You specify the response variable as the ratio of the number of outcomes of interest (the y variable, in this case TD) divided by the number of observations per level of X (the N variable, in this case TOTAL). The binomial is unique in this respect. For other distributions, shown in examples later in this chapter, the form of the response variable is the same as in other linear model procedures in SAS.

To complete the MODEL statement, you also specify the distribution of the response variable, the link function, and other options. The DIST option specifies the distribution.
If you do not specify a distribution, GENMOD uses either the binomial distribution (if the response variable is a ratio, as above) or the normal distribution (for all other response variables) as the default. Several distributions are available in GENMOD. Consult SAS Online Documentation for Version 8 (1999) for a complete list. Alternatively, you can provide your own distribution or quasi-likelihood if none of the distributions provided with GENMOD is suitable. Section 10.4.5 presents an example of a user-specified distribution.

The LINK option specifies the link function. If you do not specify a link function, GENMOD uses the canonical link, that is, the link that follows naturally from the probability distribution (see Section 10.6) that you select. In this example, the logit link is the default because the ratio response variable implies the binomial distribution and the logit is its canonical link. Thus, neither the DIST=BINOMIAL nor the LINK=LOGIT option is actually needed for this example. However, it is good practice to include the DIST and LINK options even when they are not strictly necessary, if only for the sake of clarity.

The TYPE1 option yields likelihood ratio test statistics for hypotheses based on Type I estimable functions, as described in Chapter X. You can also compute tests based on Type III estimable functions by using the option TYPE3. For Type III tests, you can use likelihood ratio statistics, the default, or you can use the WALD option to compute Wald statistics. Section 10.6 gives explanations of likelihood ratio and Wald test statistics. Several other options are also available. This chapter illustrates several of these options where appropriate.

Output 10.2 shows the output generated by PROC GENMOD.
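As a variation on the options just described, the following sketch requests Type III tests with Wald statistics in place of the Type I likelihood ratio tests. The DATA= value o_ring is an assumption based on the data set name that GENMOD reports; this variation only illustrates the TYPE3 and WALD options and is not the program that produced the output shown in this section.

```sas
/* Sketch: same logistic model, but with Type III Wald tests.        */
/* data=o_ring is assumed; the book's program relies on the most     */
/* recently created data set instead of naming one.                  */
proc genmod data=o_ring;
   model td/total = temp / dist=binomial   /* stated for clarity     */
                           link=logit      /* canonical link         */
                           type3 wald;     /* Wald instead of LR     */
run;
```

With a single regressor, the Type I and Type III hypotheses for TEMP coincide, so the practical difference in this small example is only the choice between the likelihood ratio and Wald statistics.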
Output 10.2 Basic GENMOD Output for Challenger O-Ring Logistic Regression

    The GENMOD Procedure

    Model Information

    Data Set                        WORK.O_RING
    Distribution                    Binomial
    Link Function                   Logit
    Response Variable (Events)      td
    Response Variable (Trials)      total
    Observations Used               16
    Number Of Events                7
    Number Of Trials                23

    Criteria For Assessing Goodness Of Fit

    Criterion               DF        Value      Value/DF
    Deviance                14      11.9974        0.8570
    Scaled Deviance         14      11.9974        0.8570
    Pearson Chi-Square      14      11.1303        0.7950
    Scaled Pearson X2       14      11.1303        0.7950
    Log Likelihood                 -10.1576

    Algorithm converged.

    Analysis Of Parameter Estimates

                                Standard     Wald 95% Confidence
    Parameter   DF   Estimate      Error           Limits          ChiSquare   Pr > ChiSq
    Intercept    1    15.0429     7.3786     0.5810    29.5048        4.16       0.0415
    temp         1    -0.2322     0.1082    -0.4443    -0.0200        4.60       0.0320
    Scale        0     1.0000     0.0000     1.0000     1.0000

    NOTE: The scale parameter was held fixed.

    LR Statistics For Type 1 Analysis

    Source       Deviance    DF    ChiSquare    Pr > ChiSq
    Intercept     28.2672
    temp          20.3152     1       7.95        0.0048

The beginning of the output contains some basic information about the data set. You can use this output to make sure that the data were read as intended, that the correct response variable was analyzed, that the right distribution and link were used, and so forth.

The first substantive output is the “Criteria for Assessing Goodness of Fit.” You can use the deviance, defined in Section 10.6, to check the fit of the model by comparing the computed deviance to a $\chi^2$ distribution with 14 d.f. In this case, the deviance is 11.9974 whereas the table value of $\chi^2_{(14)}$ at $\alpha = 0.25$ is 17.12, indicating no evidence of lack of fit. The Pearson Chi-Square provides an alternative way to check goodness of fit. Like the deviance, the Pearson $\chi^2$ also has an approximate $\chi^2_{(14)}$ distribution. Its computed value is 11.1303, similar to the deviance and also suggesting no evidence of lack of fit. The “Scaled Deviance” and “Scaled Pearson Chi-Square” are not of interest in this example.
They are relevant when there is evidence of lack of fit resulting from overdispersion. Section 10.4.3 presents an example.

The “Analysis of Parameter Estimates” gives the estimates of the regression parameters as well as their standard errors and confidence limits. Here, the estimated intercept is $\hat{\beta}_0 = 15.0429$ with a standard error of 7.3786. The estimated slope is $\hat{\beta}_1 = -0.2322$ with a standard error of 0.1082. The “Chi-Square” statistics and associated p-values (“Pr > ChiSq”) given in the “Analysis of Parameter Estimates” table are Wald statistics for testing null hypotheses of zero intercept and slope. For example, the Wald $\chi^2$ statistic to test $H_0\colon \beta_1 = 0$ is 4.60 and the p-value is 0.0320. You can also test the hypothesis of zero slope using the likelihood ratio statistic generated by the TYPE1 option and printed under “LR Statistics For Type 1 Analysis.” The likelihood ratio $\chi^2$ is 7.95 and its p-value is 0.0048. The fact that the likelihood ratio statistic is larger than the corresponding Wald statistic in this case is coincidental. In general, no pattern exists, and there is no compelling evidence in the literature to indicate that either statistic is preferable.

10.2.2 Using the Inverse Link to Get the Predicted Probability

From the output, you can see that 15.0429 - 0.2322*TEMP is the estimated regression equation. The regression equation allows you to compute the predicted logit for a desired temperature. For example, at 50°, the predicted logit is 15.0429 - 0.2322*50 = 3.4329. Typically, the logit is not of direct interest. On the other hand, the predicted probability is of interest, in this case, the probability of O-ring thermal distress occurring at a given temperature. You use the inverse link function to convert the logit to a probability. In this example, the logit link function is

$$\eta = \log\left(\frac{\pi}{1 - \pi}\right),$$

hence, the inverse link is

$$\pi = \frac{e^{\eta}}{1 + e^{\eta}}.$$

For 50°, using $\hat{\eta} = 3.4329$ as calculated above, the predicted probability is therefore

$$\hat{\pi} = \frac{e^{3.4329}}{1 + e^{3.4329}} = 0.9687.$$

That is, according to the logistic regression estimated from the data, the probability of observing primary O-ring thermal distress at 50° is 0.9687.

You can convert the standard error from the link function scale to the inverse link scale using the Delta Rule. Writing $h(\eta)$ for the inverse link, the general form of the Delta Rule for generalized linear models is

$$\mathrm{Var}[h(\hat{\eta})] \approx \left[\frac{\partial h(\eta)}{\partial \eta}\right]^2 \mathrm{Var}(\hat{\eta}).$$

For the logit link, differentiating $h(\eta) = e^{\eta}/(1 + e^{\eta})$ gives

$$\frac{\partial h(\eta)}{\partial \eta} = \frac{e^{\eta}}{(1 + e^{\eta})^2} = \pi(1 - \pi),$$

and hence the standard error of $\hat{\pi}$ is $\hat{\pi}(1 - \hat{\pi}) \times \mathrm{s.e.}(\hat{\eta})$.

You can use GENMOD to compute $\mathrm{s.e.}(\hat{\eta})$, as well as $\hat{\eta}$ and related statistics, using the ESTIMATE statement. The syntax and placement of the ESTIMATE statement are similar to GLM and MIXED. Here are the statements to compute $\hat{\eta}$ for several temperatures of interest. Output 10.3 shows the results.

    estimate 'logit at 50 deg'   intercept 1 temp 50;
    estimate 'logit at 60 deg'   intercept 1 temp 60;
    estimate 'logit at 64.7 deg' intercept 1 temp 64.7;
    estimate 'logit at 64.8 deg' intercept 1 temp 64.8;
    estimate 'logit at 70 deg'   intercept 1 temp 70;
    estimate 'logit at 80 deg'   intercept 1 temp 80;

Output 10.3 Estimated Logits for Various Temperatures of Interest

    Contrast Estimate Results

                                    Standard
    Label                Estimate      Error    Alpha    Confidence Limits    ChiSquare
    logit at 50 deg        3.4348     2.0232     0.05    -0.5307     7.4002      2.88
    logit at 60 deg        1.1131     1.0259     0.05    -0.8975     3.1238      1.18
    logit at 64.7 deg      0.0220     0.6576     0.05    -1.2669     1.3109      0.00
    logit at 64.8 deg     -0.0012     0.6518     0.05    -1.2788     1.2764      0.00
    logit at 70 deg       -1.2085     0.5953     0.05    -2.3752    -0.0418      4.12
    logit at 80 deg       -3.5301     1.4140     0.05    -6.3014    -0.7588      6.23

The column “Estimate” gives you the estimated logit. For “logit at 50 deg,” that is, $\hat{\eta}$ at 50°, the computed value is 3.4348, rather than the “hand-calculated” $\hat{\eta} = 3.4329$ given above. This reflects rounding error: the hand calculation used parameter estimates rounded to four decimal places, whereas SAS computations involve much greater precision. From the output, you can see that the standard error of $\hat{\eta}$ at 50° is 2.0232.
Using the Delta Rule, the standard error for $\hat{\pi}$ is

$$\hat{\pi}(1 - \hat{\pi}) \times \mathrm{s.e.}(\hat{\eta}) = 0.9687 \times (1 - 0.9687) \times 2.0232 = 0.0613.$$

In addition to $\hat{\eta}$ and $\mathrm{s.e.}(\hat{\eta})$, Output 10.3 also gives upper and lower 95% confidence limits for the predicted logit. You can use the inverse link to convert these to confidence limits for the predicted probability. For example, at 50°, the lower confidence limit for $\eta$ is -0.5307. Applying the inverse link, the lower confidence limit for $\pi$ is

$$\frac{e^{-0.5307}}{1 + e^{-0.5307}} = 0.3704.$$

A similar computation using the upper confidence limit for $\eta$, 7.4002, yields the upper confidence limit for $\pi$, 0.9994. It is better to use the upper and lower limits for $\eta$ and convert them using the inverse link than to use the standard error of $\hat{\pi}$ computed from the Delta Rule. The standard error results in a symmetric interval, i.e., $\hat{\pi} \pm t \times \mathrm{s.e.}(\hat{\pi})$, which is not, in general, a sensible confidence interval. The confidence interval should be asymmetric, reflecting the nonlinear nature of the link function.

You can compute $\hat{\pi}$, its standard error, and its confidence interval using the ODS OUTPUT statement in GENMOD followed by program statements to implement the inverse link and Delta Rule. First, you insert the following ODS statement after the ESTIMATE statements in the GENMOD procedure:

    ods output estimates=logit;

Then use the following statements:

    data prob_hat;
       set logit;
       /* inverse logit link applied to the estimate and its limits */
       phat=exp(estimate)/(1+exp(estimate));
       /* Delta Rule standard error on the probability scale */
       se_phat=phat*(1-phat)*stderr;
       prb_LcL=exp(LowerCL)/(1+exp(LowerCL));
       prb_UcL=exp(UpperCL)/(1+exp(UpperCL));
    run;
    proc print data=prob_hat;
    run;

The statements produce Output 10.4.
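As a cross-check on the hand calculations for 50° given earlier in this section, the same arithmetic can be carried out in a short DATA step. This sketch types in the rounded values quoted in the text (the parameter estimates, the standard error of the logit, and its confidence limits) rather than reading them from GENMOD; the data set name check_50 and the variable names are hypothetical.

```sas
/* Sketch: reproduce the 50-degree hand calculation from rounded      */
/* values quoted in the text. All names here are illustrative.        */
data check_50;
   b0 = 15.0429;                 /* estimated intercept                */
   b1 = -0.2322;                 /* estimated slope                    */
   se_eta  = 2.0232;             /* s.e. of the logit at 50 degrees    */
   lcl_eta = -0.5307;            /* lower 95% limit for the logit      */
   ucl_eta =  7.4002;            /* upper 95% limit for the logit      */

   eta     = b0 + b1*50;                     /* predicted logit        */
   phat    = exp(eta) / (1 + exp(eta));      /* inverse link           */
   se_phat = phat * (1 - phat) * se_eta;     /* Delta Rule             */
   prb_lcl = exp(lcl_eta) / (1 + exp(lcl_eta));
   prb_ucl = exp(ucl_eta) / (1 + exp(ucl_eta));
run;

proc print data=check_50;
run;
```

Because the inputs are rounded, the printed values should agree with the figures in the text only to about three or four decimal places.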
Output 10.4 PROC PRINT of data set containing $\hat{\pi}$, s.e.($\hat{\pi}$), and upper and lower confidence limits

                                                                                       Prob
    Obs   Label                Estimate   StdErr   Alpha   LowerCL   UpperCL   ChiSq   ChiSq
      1   logit at 50 deg        3.4348   2.0232    0.05   -0.5307    7.4002    2.88   0.0896
      2   logit at 60 deg        1.1131   1.0259    0.05   -0.8975    3.1238    1.18   0.2779
      3   logit at 64.7 deg      0.0220   0.6576    0.05   -1.2669    1.3109    0.00   0.9733
      4   logit at 64.8 deg     -0.0012   0.6518    0.05   -1.2788    1.2764    0.00   0.9985
      5   logit at 70 deg       -1.2085   0.5953    0.05   -2.3752   -0.0418    4.12   0.0423
      6   logit at 80 deg       -3.5301   1.4140    0.05   -6.3014   -0.7588    6.23   0.0125

    Obs      phat     se_phat    prb_LcL    prb_UcL
      1   0.96877     0.06121    0.37036    0.99939
      2   0.75271     0.19095    0.28956    0.95786
      3   0.50549     0.16439    0.21978    0.78766
      4   0.49969     0.16296    0.21775    0.78183
      5   0.22997     0.10541    0.08509    0.48955
      6   0.02847     0.03911    0.00183    0.31891

The variables “phat,” “se_phat,” “prb_LcL,” and “prb_UcL” give $\hat{\pi}$, its standard error, and the confidence limits.

Output 10.3 and 10.4 also give chi-square statistics. You can use these to test $H_0\colon \eta = 0$ for a given temperature. In categorical data analysis, $\pi/(1 - \pi)$ is defined as the odds, and hence $\hat{\eta}$ estimates the log of the odds for a given temperature. An odds of 1, and hence a log odds of 0, means that an event is equally likely to occur or not occur. In the above output, those temperatures whose $\chi^2$ and associated p-values (“ProbChiSq”) result in a failure to reject $H_0$ are temperatures for which there is insufficient evidence to contradict the hypothesis that there is a 50-50 chance of thermal distress occurring at that temperature. Whether this hypothesis is useful depends on the context.

In many cases, the confidence limits of $\hat{\pi}$ may be important. What is striking in the O-ring data is that the upper confidence limit for the likelihood of O-ring thermal distress is fairly high (considering the consequences of O-ring failure), even at 80°. When the Challenger was launched, it was 31°.

One final note regarding the odds.
The estimated slope $\hat{\beta}_1 = -0.2322$ is interpreted as the log odds ratio per one-unit change in X. Thus $e^{\hat{\beta}_1} = e^{-0.2322} = 0.793$ is the ratio defined as

$$\frac{\text{odds at a given temperature}}{\text{odds at (temperature} - 1)}.$$

An odds ratio < 1 indicates that the odds of thermal distress decrease as temperature increases.

10.2.3 Alternative Logistic Regression Analysis Using 0-1 Data

In the previous section, there was one row in the data set for each temperature level with a variable for N, the number of observations per level, and one for y, the number of outcomes with the characteristic of interest. You can also enter binomial data with one row per observation, with each observation classified by which of the two possible outcomes was observed. Output 10.5 shows the O-ring data entered in this way.

Output 10.5 O-ring data entered by observation rather than by temperature level

    Obs    launch    temp    td
      1       1       66      0
      2       2       70      1
      3       3       69      0
      4       4       68      0
      5       5       67      0
      6       6       72      0
      7       7       73      0
      8       8       70      0
      9       9       57      1
     10      10       63      1
     11      11       70      1
     12      12       78      0
     13      13       67      0
     14      14       53      1
     15      15       67      0
     16      16       75      0
     17      17       70      0
     18      18       81      0
     19      19       76      0
     20      20       79      0
     21      21       75      1
     22      22       76      0
     23      23       58      1

There are three variables for each observation: an identification for the shuttle launch (LAUNCH), the temperature at the time of launch (TEMP), and an indicator for whether or not there was thermal distress (TD=0 means no distress, TD=1 means there was distress). You can estimate the logistic regression model using the 0-1 data with the following GENMOD statements:

    proc genmod;
       model td=temp / dist=binomial link=logit type1;
    run;

These statements differ from the GENMOD program used in the previous section to obtain Output 10.2. First, the sample proportion y/N, used as the response variable to compute Output 10.2, is replaced here by TD, the 0-1 variable. Also, because TD is not a ratio response variable, you must specify DIST=BINOMIAL, or GENMOD will use the normal distribution.
As before, the LINK=LOGIT option is not necessary because the logit link is the default for the binomial distribution, but it is good form. The results appear in Output 10.6.

Output 10.6 Results of PROC GENMOD Analysis of 0-1 Form of O-Ring Data

    The GENMOD Procedure

    Model Information

    Data Set                 WORK.TBL_5_10
    Distribution             Binomial
    Link Function            Logit
    Dependent Variable       td
    Observations Used        23
    Probability Modeled      Pr( td = 1 )

    Response Profile

    Ordered    Ordered
      Level      Value    Count
          1          0       16
          2          1        7

    Criteria For Assessing Goodness Of Fit

    Criterion               DF        Value      Value/DF
    Deviance                21      20.3152        0.9674
    Scaled Deviance         21      20.3152        0.9674
    Pearson Chi-Square      21      23.1691        1.1033
    Scaled Pearson X2       21      23.1691        1.1033
    Log Likelihood                 -10.1576

    Algorithm converged.

    Analysis Of Parameter Estimates

                                Standard     Wald 95% Confidence
    Parameter   DF   Estimate      Error           Limits          ChiSquare   Pr > ChiSq
    Intercept    1    15.0429     7.3786     0.5810    29.5048        4.16       0.0415
    temp         1    -0.2322     0.1082    -0.4443    -0.0200        4.60       0.0320
    Scale        0     1.0000     0.0000     1.0000     1.0000

    NOTE: The scale parameter was held fixed.

    LR Statistics For Type 1 Analysis

    Source       Deviance    DF    ChiSquare    Pr > ChiSq
    Intercept     28.2672
    temp          20.3152     1       7.95        0.0048

Compared to Output 10.2, the “Model Information” is in somewhat different form, reflecting the difference between using the individual outcome of each Bernoulli response and using the sample proportion for each temperature level. The goodness-of-fit statistics, deviance and Pearson $\chi^2$, are also different because the response variable and hence the log-likelihood are not the same. Using the data in Output 10.1, there were N=16 observations, i.e., 16 sample proportions, one per temperature level, and hence the deviance had N - p = 16 - 2 = 14 d.f., where p corresponds to the 2 model degrees of freedom for $\beta_0$ and $\beta_1$. Using the data in Output 10.5, there are N=23 distinct observations, and hence N - p = 23 - 2 = 21 degrees of freedom for the lack-of-fit statistics.
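The two data layouts differ only by aggregation, which is why the number of observations changes from 23 to 16 while the fitted model stays the same. As a sketch, the 0-1 data could be collapsed to one row per temperature with PROC SQL. The input name tbl_5_10 is assumed from the WORK.TBL_5_10 label in Output 10.6, and the output name o_ring_grouped is hypothetical.

```sas
/* Sketch: collapse the 0-1 launch data to one row per temperature.   */
/* tbl_5_10 is assumed from Output 10.6; o_ring_grouped is made up.   */
proc sql;
   create table o_ring_grouped as
   select temp,
          sum(td)            as td,      /* launches with distress (y) */
          count(*) - sum(td) as no_td,   /* launches without distress  */
          count(*)           as total    /* launches at this temp (N)  */
   from tbl_5_10
   group by temp
   order by temp;
quit;

proc print data=o_ring_grouped;
run;
```

The resulting table should have the same columns as the grouped layout in Output 10.1 and can be passed to GENMOD with the TD/TOTAL response syntax.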
The deviance and Pearson $\chi^2$ are the only statistics affected by whether you use sample proportion data or 0-1 data. The “Analysis of Parameter Estimates” and likelihood ratio test statistics for the Type I test of $H_0\colon \beta_1 = 0$ are identical to those computed using the sample proportion data. You can also compute estimated logits for various temperatures using the same ESTIMATE statements shown previously in Section 10.2.2. The output is identical to that shown in Output 10.3. Therefore, when you apply the inverse link and Delta Rule, you use the same program statements and get the same results as those presented in Output 10.4.

10.2.4 An Alternative Link: Probit Regression

As mentioned above in Section 10.2.1, the probit link is another function suitable for fitting regression and ANOVA models to binomial data. The probit model assumes that the observed Bernoulli “success” or “failure” results from an underlying, but not directly observable, normally distributed random variable. Figure 10.1 illustrates the hypothesized model.

Figure 10.1 Illustration of Model Underlying Probit Link

Denote the underlying, unobservable random variable by Z and suppose that Z is associated with a predictor variable X according to the linear regression equation $Z = \beta_0 + \beta_1 X$. Remember, you cannot observe Z; all you can observe is the consequences of Z. If Z is below a certain level, you observe a success. Otherwise, you observe a failure. The regression of Z on X models how the failure-success boundary changes with X. Figure 10.1 depicts a case for which the boundary, denoted $Z_X$, for a given X is equal to -1.2. Thus, the area under the normal curve below $Z_X = -1.2$ is the probability of a success for the corresponding X. As X changes, the boundary value $Z_X$ changes, thereby altering the probability of a success.

Formally, the standard normal cumulative distribution function, i.e., the area under the curve less than z, is denoted

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx.$$

Thus, the probit linear regression model can be written $\pi = \Phi(\beta_0 + \beta_1 X)$. Note that this gives the model in the form of the inverse link. You can write the probit model in terms of the link function as $\mathrm{probit}(\pi) = \Phi^{-1}(\pi) = \beta_0 + \beta_1 X$, where $\Phi^{-1}(\pi)$ means the Z value such that the area under the curve less than Z is $\pi$.

You can fit the probit regression model to the O-ring data using the following SAS statements:

    proc genmod data=agr_135;
       model td/total=temp / link=probit type1;
    run;

Note the use of the LINK=PROBIT option but no DIST option. Because of the ratio response variable, the binomial distribution is assumed by default, but a LINK option is required because the probit link is not the default. The results appear in Output 10.7.

Output 10.7 GENMOD results fitting PROBIT link to O-ring Data

    Criteria For Assessing Goodness Of Fit

    Criterion               DF        Value      Value/DF
    Deviance                14      12.0600        0.8614
    Scaled Deviance         14      12.0600        0.8614
    Pearson Chi-Square      14      10.9763        0.7840
    Scaled Pearson X2       14      10.9763        0.7840

    Analysis Of Parameter Estimates

                                Standard     Wald 95% Confidence
    Parameter   DF   Estimate      Error           Limits          ChiSquare   Pr > ChiSq
    Intercept    1     8.7750     4.0286     0.8790    16.6709        4.74       0.0294
    temp         1    -0.1351     0.0584    -0.2495    -0.0206        5.35       0.0207

    LR Statistics For Type 1 Analysis

    Source       Deviance    DF    ChiSquare    Pr > ChiSq
    Intercept     19.9494
    temp          12.0600     1       7.89        0.0050

The results are not strikingly different from the results of the logistic regression. The deviance is 12.060 (vs. 11.997 for the logit link) and the p-value for the likelihood ratio test of $H_0\colon \beta_1 = 0$ is 0.0050 (vs. 0.0048 using the logit link). The estimate of $\beta_1$ is different, reflecting a different scale for the probit vs. the logit. However, the sign and the conclusion regarding the effect of temperature on thermal distress are the same.

You can use ESTIMATE statements like those that produced Output 10.3 to obtain predicted probits for various temperatures. You use the inverse link, $\Phi(\text{estimate})$, to convert predicted probits to predicted probabilities.
The SAS function to evaluate $\Phi(\text{estimate})$ is PROBNORM; you use the following SAS statements to obtain the probit model analog to Output 10.4:

    estimate 'probit at 50 deg'   intercept 1 temp 50;
    estimate 'probit at 60 deg'   intercept 1 temp 60;
    estimate 'probit at 64.7 deg' intercept 1 temp 64.7;
    estimate 'probit at 64.8 deg' intercept 1 temp 64.8;
    estimate 'probit at 70 deg'   intercept 1 temp 70;
    estimate 'probit at 80 deg'   intercept 1 temp 80;
    ods output estimates=probit;
    run;

    data prob_hat;
       set probit;
       /* inverse probit link: standard normal CDF */
       phat=probnorm(estimate);
       pi=3.14159;
       invsqrt=1/(sqrt(2*pi));
       /* Delta Rule: standard normal density times s.e. of the probit */
       se_phat=invsqrt*exp(-0.5*(estimate**2))*stderr;
       prb_LcL=probnorm(LowerCL);
       prb_UcL=probnorm(UpperCL);
    run;
    proc print data=prob_hat;
    run;

The results appear in Output 10.8. Note the form of the Delta Rule for the probit model to obtain the approximate standard error of $\hat{\pi}$. The approximate standard error of $\hat{\pi}$, using the Delta Rule, is

$$\frac{\partial \Phi(\eta)}{\partial \eta} \times \mathrm{s.e.}(\hat{\eta}).$$

This follows from the fact that the derivative is

$$\frac{\partial \Phi(\eta)}{\partial \eta} = \frac{\partial}{\partial \eta} \int_{-\infty}^{\eta} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \frac{1}{\sqrt{2\pi}}\, e^{-\eta^2/2}.$$
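The derivative above is simply the standard normal density, so the Delta Rule step can also be written with the SAS PDF and CDF functions instead of a hand-coded constant for $\pi$. The following sketch assumes the data set probit created by the ODS OUTPUT statement, with the variable names Estimate, StdErr, LowerCL, and UpperCL shown in this section; the data set name prob_hat2 is hypothetical.

```sas
/* Sketch: probit inverse link and Delta Rule using CDF and PDF.      */
/* Assumes data set probit from ODS OUTPUT; prob_hat2 is made up.     */
data prob_hat2;
   set probit;
   phat    = cdf('NORMAL', estimate);            /* Phi(eta-hat)       */
   se_phat = pdf('NORMAL', estimate) * stderr;   /* density * s.e.     */
   prb_LcL = cdf('NORMAL', LowerCL);             /* limits converted   */
   prb_UcL = cdf('NORMAL', UpperCL);             /* via inverse link   */
run;

proc print data=prob_hat2;
run;
```

With the default mean of 0 and standard deviation of 1, CDF('NORMAL', x) plays the same role as PROBNORM(x), so this version should agree with the hand-coded one apart from the extra digits carried for $\pi$.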
Output 10.8 Predicted Probits and Probabilities obtained from PROBNORM Inverse Link and Probit form of Delta Rule

                                                                                        Prob
    Obs   Label                 Estimate   StdErr   Alpha   LowerCL   UpperCL   ChiSq   ChiSq
      1   probit at 50 deg        2.0201   1.1413    0.05   -0.2167    4.2570    3.13   0.0767
      2   probit at 60 deg        0.6692   0.6024    0.05   -0.5115    1.8498    1.23   0.2666
      3   probit at 64.7 deg      0.0342   0.3960    0.05   -0.7420    0.8104    0.01   0.9312
      4   probit at 64.8 deg      0.0207   0.3925    0.05   -0.7487    0.7901    0.00   0.9579
      5   probit at 70 deg       -0.6818   0.3244    0.05   -1.3175   -0.0461    4.42   0.0356
      6   probit at 80 deg       -2.0328   0.7277    0.05   -3.4590   -0.6066    7.80   0.0052

    Obs      phat        pi     invsqrt    se_phat    prb_LcL    prb_UcL
      1   0.97832   3.14159     0.39894    0.05917    0.41421    0.99999
      2   0.74831   3.14159     0.39894    0.19211    0.30450    0.96783
      3   0.51365   3.14159     0.39894    0.15790    0.22905    0.79115
      4   0.50826   3.14159     0.39894    0.15657    0.22703    0.78526
      5   0.24768   3.14159     0.39894    0.10257    0.09383    0.48163
      6   0.02104   3.14159     0.39894    0.03678    0.00027    0.27207

Comparing Output 10.8 to the analogous output for the logistic model in Output 10.4, the estimated probabilities, approximate standard errors, and lower and upper confidence limits are similar, though not equal, for the two models. For example, for the logit model, at 50 degrees the predicted probability of thermal distress was 0.969 with an approximate standard error of 0.061, whereas for the probit model the predicted probability (PHAT) is 0.978 with an approximate standard error of 0.059. Other “discrepancies” are similarly small; you reach essentially the same conclusions about the O-ring data with either link function.

In general, logit and probit models produce similar results. In fact, the logit and probit are very similar functions of $\pi$, so the fact that they produce similar results is not surprising. There are no compelling statistical reasons to choose one over the other. In some studies, you use the logistic model because its interpretation in terms of odds ratios fits the subject matter.
In other disciplines, the probit model of the mean has a theoretical basis, so the probit is preferred.
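To see how closely the two fitted models agree across the range of launch temperatures, the predicted probabilities can be tabulated side by side. The sketch below plugs the rounded parameter estimates reported in this section for the logit and probit fits into the two inverse links; the data set name compare, the variable names, and the temperature grid are hypothetical.

```sas
/* Sketch: compare fitted logit and probit probabilities.            */
/* Coefficients are the rounded estimates quoted in this section.    */
data compare;
   do temp = 50 to 80 by 5;
      eta_logit  = 15.0429 - 0.2322*temp;   /* fitted logit          */
      eta_probit =  8.7750 - 0.1351*temp;   /* fitted probit         */
      p_logit  = exp(eta_logit) / (1 + exp(eta_logit));
      p_probit = probnorm(eta_probit);
      diff     = p_probit - p_logit;        /* disagreement          */
      output;
   end;
run;

proc print data=compare;
run;
```

Because the inputs are rounded, the values at 50, 60, 70, and 80 degrees should match the PHAT columns in the earlier outputs only approximately.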
