Probability Distribution of Random Error
EPI 809/Spring 2008

Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation

Linear Regression Assumptions
Assumptions on the errors $\varepsilon_1, \ldots, \varepsilon_n$ (Gauss-Markov condition):
1. Independent errors
2. Mean of probability distribution of errors is 0
3. Errors have constant variance $\sigma^2$, for which an estimator is $S^2$
4. Probability distribution of error is normal
5. Potential violation of G-M condition

Error Probability Distribution
[Figure: error density $f(\varepsilon)$ shown at $X_1$ and $X_2$; axes X and Y.]

Random Error Variation
1. Variation of Actual Y from Predicted Y
2. Measured by Standard Error of Regression Model
   - Sample Standard Deviation of $\varepsilon$, $\hat{s}$
3. Affects Several Factors
   - Parameter Significance
   - Prediction Accuracy

Evaluating the Model: Testing for Significance

Test of Slope Coefficient
1. Shows If There Is a Linear Relationship Between X & Y
2. Involves Population Slope $\beta_1$
3. Hypotheses
   - $H_0: \beta_1 = 0$ (No Linear Relationship)
   - $H_a: \beta_1 \neq 0$ (Linear Relationship)
4. Theoretical basis of the test statistic is the sampling distribution of the slope

Sampling Distribution of Sample Slopes
[Figure: Sample 1 line, Sample 2 line and the population line; axes X and Y.]
All Possible Sample Slopes:
  Sample 1: 2.5
  Sample 2: 1.6
  Sample 3: 1.8
  Sample 4: 2.1
  ... (a very large number of sample slopes)
[Figure: sampling distribution of $\hat{\beta}_1$, with $\beta_1$ and $S_{\hat{\beta}_1}$ marked.]

Slope Coefficient Test Statistic
$$t = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}}, \qquad \text{where } S_{\hat{\beta}_1} = \frac{S}{\sqrt{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}}$$
with
$$S = \sqrt{\frac{SSE}{n-2}} \qquad \text{and} \qquad SSE = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^2$$

Test of Slope Coefficient: Rejection Rule
Reject $H_0$ in favor of $H_a$ if t falls in either rejection region, that is, if $t < -t_{1-\alpha/2,\,(n-2)}$ or $t > t_{1-\alpha/2,\,(n-2)}$, where $T \sim t(n-2)$ and each tail has area $\alpha/2$.
Equivalently, reject $H_0$ for $H_a$ if P-value $= P(|T| > |t|) < \alpha$.
[Figure: t(n-2) density centered at 0 with the two rejection regions, each of area $\alpha/2$, shaded.]

Test of Slope Coefficient Example
Reconsider the Obstetrics example with the following data:

  Estriol (mg/24h)   B.w. (g/1000)
  1                  1
  2                  1
  3                  2
  4                  2
  5                  4

Is the linear relationship between Estriol & Birthweight significant at the .05 level?
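The test on these five observations can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: it assumes the usual least-squares formulas for $\hat{\beta}_0$ and $\hat{\beta}_1$, the variable names are made up for the example, and the critical value is the tabulated $t_{0.975}$ for 3 degrees of freedom, hard-coded rather than computed.

```python
import math

# Estriol (x) and birthweight (y) from the example data.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

# Sums of squares and cross-products about the means.
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Least-squares estimates (standard formulas, assumed here).
b1 = ss_xy / ss_xx
b0 = sum(y) / n - b1 * sum(x) / n

# SSE, standard error of the regression, standard error of the slope.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(ss_xx)

# Test statistic for H0: beta1 = 0.
t_stat = (b1 - 0) / se_b1

T_CRIT = 3.1824  # tabulated t value, 3 df, two-sided alpha = .05
reject_h0 = abs(t_stat) > T_CRIT

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"SSE = {sse:.4f}, S = {s:.5f}, S_b1 = {se_b1:.5f}")
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Hard-coding the critical value keeps the sketch free of third-party dependencies; a statistics library could supply it for other sample sizes.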
Solution Table for β's

  Xi    Yi    Xi^2   Yi^2   XiYi
  1     1     1      1      1
  2     1     4      1      2
  3     2     9      4      6
  4     2     16     4      8
  5     4     25     16     20
  ---------------------------------
  15    10    55     26     37     (column totals)

Solution Table for SSE

  Birth weight   Estriol   Predicted                                          (Obs - pred)^2
  y              x         $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$        $(y - \hat{y})^2$
  1              1         0.6                                                0.16
  1              2         1.3                                                0.09
  2              3         2                                                  0
  2              4         2.7                                                0.49
  4              5         3.4                                                0.36
  -------------------------------------------------------------------------------------------
  10             15        -                                                  SSE = 1.1

Test of Slope Parameter Solution
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
$\alpha = .05$
df = 5 - 2 = 3
Critical Value(s): -3.1824 and 3.1824
[Figure: t density with rejection regions of area .025 below -3.1824 and above 3.1824.]

Test Statistic Solution
$$t = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}} = \frac{0.70 - 0}{0.1915} = 3.656$$
where
$$S_{\hat{\beta}_1} = \frac{S}{\sqrt{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}} = \frac{0.60553}{\sqrt{55 - \frac{(15)^2}{5}}} = 0.1915$$
From the table, SSE = 1.1, with
$$S = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{1.1}{5-2}} = 0.60553$$

Test of Slope Parameter
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
$\alpha = .05$
df = 5 - 2 = 3
Critical Value(s): -3.1824 and 3.1824
Test Statistic: $t = \dfrac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}} = \dfrac{0.70 - 0}{0.1915} = 3.656$
Decision: Reject at $\alpha = .05$
Conclusion: There is evidence of a linear relationship

Test of Slope Parameter: Computer Output

  Parameter Estimates
  Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept   1    -0.10000             0.63509          -0.16     0.8849
  Estriol     1     0.70000             0.19149           3.66     0.0354

Slide annotations: Parameter Estimate is $\hat{\beta}_k$; Standard Error is $S_{\hat{\beta}_k}$; t Value is $t = \hat{\beta}_k / S_{\hat{\beta}_k}$; Pr > |t| is the P-value.

Measures of Variation in Regression
1. Total Sum of Squares (SSyy)
   - Measures Variation of Observed $Y_i$ Around the Mean $\bar{Y}$
2. Explained Variation (SSR)
   - Variation Due to Relationship Between X & Y
3. Unexplained Variation (SSE)
   - Variation Due to Other Factors

Variation Measures
[Figure: at a point $X_i$, the fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and the mean $\bar{Y}$, with three labeled quantities: total sum of squares $(Y_i - \bar{Y})^2$, unexplained sum of squares $(Y_i - \hat{Y}_i)^2$, explained sum of squares $(\hat{Y}_i - \bar{Y})^2$.]

Coefficient of Determination
Proportion of Variation 'Explained' by Relationship Between X & Y
$$0 \le r^2 \le 1$$
$$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2 - \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2}$$

Coefficient of Determination Examples
[Figure: four scatterplots of Y against X, with $r^2 = 1$, $r^2 = 1$, $r^2 = .8$ and $r^2 = 0$.]

Coefficient of Determination Example
Reconsider the Obstetrics example. Interpret a coefficient of determination of 0.8167.
Answer: About 82% of the total variation in birthweight is explained by the mother's Estriol level.

$r^2$ Computer Output

  Root MSE          0.60553    R-Square   0.8167
  Dependent Mean    2.00000    Adj R-Sq   0.7556
  Coeff Var        30.27650

Slide annotations: Root MSE is S; R-Square is $r^2$; Adj R-Sq is $r^2$ adjusted for the number of explanatory variables & sample size:
$$\text{Adj R-Sq} = 1 - \left(1 - \text{R-Square}\right)\frac{N - 1}{N - k - 1}$$

Using the Model for Prediction & Estimation

Prediction With Regression Models
What Is Predicted?
- Population Mean Response E(Y) for Given X
  • Point on Population Regression Line
- Individual Response ($Y_i$) for Given X

What Is Predicted?
[Figure: at $X = X_P$, an individual Y, the mean Y, E(Y), on the population line $E(Y) = \beta_0 + \beta_1 X$, and the prediction $\hat{Y}$ on the fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X$.]

Confidence Interval Estimate of Mean Y
$$\hat{Y} - t_{n-2,\,\alpha/2}\, S_{\hat{Y}} \le E(Y) \le \hat{Y} + t_{n-2,\,\alpha/2}\, S_{\hat{Y}}$$
where
$$S_{\hat{Y}} = S \sqrt{\frac{1}{n} + \frac{\left(X_P - \bar{X}\right)^2}{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}}$$

Factors Affecting Interval Width
1. Level of Confidence $(1 - \alpha)$
   - Width Increases as Confidence Increases
2. Data Dispersion (s)
   - Width Increases as Variation Increases
3. Sample Size
   - Width Decreases as Sample Size Increases
4. Distance of $X_P$ from Mean $\bar{X}$
   - Width Increases as Distance Increases

Why Distance from Mean?
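One way to see the effect of distance numerically is to evaluate $S_{\hat{Y}}$ at several values of $X_P$. The sketch below is illustrative only: it assumes the five estriol levels 1 to 5 from this lecture's example, the Root MSE of 0.60553 from the computer output, and the tabulated critical value 3.1824 for 3 degrees of freedom. The function name `se_mean` is hypothetical.

```python
import math

x = [1, 2, 3, 4, 5]   # estriol levels from the lecture example
n = len(x)
x_bar = sum(x) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

S = 0.60553           # Root MSE from the computer output (assumed given)
T_CRIT = 3.1824       # tabulated t value, 3 df, two-sided alpha = .05


def se_mean(x_p):
    """Standard error of the estimated mean response at x_p."""
    return S * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)


# Half-width of the confidence interval for E(Y) at each observed X.
half_widths = {x_p: T_CRIT * se_mean(x_p) for x_p in x}

for x_p, hw in half_widths.items():
    print(f"X_P = {x_p}: S_Yhat = {se_mean(x_p):.4f}, half-width = {hw:.4f}")
```

The standard error is smallest at $X_P = \bar{X}$ and grows symmetrically on either side, so the interval for E(Y) widens as $X_P$ moves away from the mean.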
[Figure: Sample 1 line and Sample 2 line plotted on X-Y axes, with $\bar{Y}$, $\bar{X}$, $X_1$ and $X_2$ marked; at $X_2$ the lines show greater dispersion than at $X_1$.]

Confidence Interval Estimate Example
Reconsider the Obstetrics example with the following data:

  Estriol (mg/24h)   B.w. (g/1000)
  1                  1
  2                  1
  3                  2
  4                  2
  5                  4

Estimate the mean BW and a subject's BW response when the Estriol level is 4, at the .05 level.

Solution Table

  Xi    Yi    Xi^2   Yi^2   XiYi
  1     1     1      1      1
  2     1     4      1      2
  3     2     9      4      6
  4     2     16     4      8
  5     4     25     16     20
  ---------------------------------
  15    10    55     26     37     (column totals)

Confidence Interval Estimate Solution - Mean BW
$$\hat{Y} - t_{n-2,\,\alpha/2}\, S_{\hat{Y}} \le E(Y) \le \hat{Y} + t_{n-2,\,\alpha/2}\, S_{\hat{Y}}$$
$$\hat{Y} = -0.1 + 0.7(4) = 2.7 \qquad (4 \text{ is the X to be predicted})$$
$$S_{\hat{Y}} = 0.60553 \sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}} = 0.3317$$
$$2.7 - 3.1824(0.3317) \le E(Y) \le 2.7 + 3.1824(0.3317)$$
$$1.6445 \le E(Y) \le 3.7555$$

Prediction Interval of Individual Response
$$\hat{Y} - t_{n-2,\,\alpha/2}\, S_{(Y - \hat{Y})} \le Y_P \le \hat{Y} + t_{n-2,\,\alpha/2}\, S_{(Y - \hat{Y})}$$
where
$$S_{(Y - \hat{Y})} = S \sqrt{1 + \frac{1}{n} + \frac{\left(X_P - \bar{X}\right)^2}{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}}$$
Note the extra 1 under the square root!

Why the Extra 'S'?
[Figure: at $X = X_P$, the Y we're trying to predict, the expected (mean) Y on the population line $E(Y) = \beta_0 + \beta_1 X$, and the prediction $\hat{Y}$ on the fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$.]

SAS codes for computing mean and prediction intervals

    Data BW; /* Reading data in SAS */
      input estriol birthw;
      cards;
    1 1
    2 1
    3 2
    4 2
    5 4
    ;
    run;

    PROC REG data=BW; /* Fitting a linear regression model */
      model birthw=estriol / CLI CLM alpha=.05;
    run;

Interval Estimate from SAS Output

  The REG Procedure
  Dependent Variable: y
  Output Statistics

         Dep Var   Predicted   Std Error
  Obs    y         Value       Mean Predict   95% CL Mean          95% CL Predict       Residual
  1      1.0000    0.6000      0.4690         -0.8927   2.0927     -1.8376   3.0376      0.4000
  2      1.0000    1.3000      0.3317          0.2445   2.3555     -0.8972   3.4972     -0.3000
  3      2.0000    2.0000      0.2708          1.1382   2.8618     -0.1110   4.1110      0
  4      2.0000    2.7000      0.3317          1.6445   3.7555      0.5028   4.8972     -0.7000
  5      4.0000    3.4000      0.4690          1.9073   4.8927      0.9624   5.8376      0.6000

Slide annotations: "Predicted Y when X = 3" (Predicted Value column); $S_{\hat{Y}}$ (Std Error Mean Predict column); Confidence Interval (95% CL Mean); Prediction Interval (95% CL Predict).

Hyperbolic Interval Bands
[Figure: fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ with hyperbolic interval bands; $\bar{X}$ and $X_P$ marked on the X axis.]

Correlation Models

Types of Probabilistic Models
[Diagram: Probabilistic Models branch into Regression Models, Correlation Models and Other Models.]

Correlation vs. regression
- Both variables are treated the same in correlation; in regression there is a predictor and a response
- In regression the x variable is assumed nonrandom or measured without error
- Correlation is used in looking for relationships, regression for prediction

Correlation Models
1. Answer 'How Strong Is the Linear Relationship Between 2 Variables?'
2. Coefficient of Correlation Used
   - Population Correlation Coefficient Denoted $\rho$ (Rho)
   - Values Range from -1 to +1
   - Measures Degree of Association
3. Used Mainly for Understanding

Sample Coefficient of Correlation
1. Pearson Product Moment Coefficient of Correlation between x and y:
$$r = \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2}} = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$$

Coefficient of Correlation Values
Scale: -1.0, -.5, 0, +.5, +1.0
  -1.0   Perfect Negative Correlation
   0     No Correlation
  +1.0   Perfect Positive Correlation
Increasing degree of negative correlation from 0 toward -1.0; increasing degree of positive correlation from 0 toward +1.0.

Coefficient of Correlation Examples
[Figure: four scatterplots of Y against X, with r = 1, r = -1, r = .89 and r = 0.]

Test of Coefficient of Correlation
1. Shows If There Is a Linear Relationship Between 2 Numerical Variables
2. Same Conclusion as Testing Population Slope $\beta_1$
3. Hypotheses
   - $H_0: \rho = 0$ (No Correlation)
   - $H_a: \rho \neq 0$ (Correlation)

1 Sample t-Test on Correlation Coefficient
Hypotheses
- $H_0: \rho = 0$ (No Correlation)
- $H_a: \rho \neq 0$ (Correlation)
Test statistic: under $H_0$,
$$t = \frac{r\,(n-2)^{1/2}}{\left(1 - r^2\right)^{1/2}} \sim t(n-2)$$
Reject $H_0$ if $|t| > t_{\alpha/2,\, n-2}$

1 Sample Z-Test on Correlation Coefficient (Fisher)
Hypotheses
- $H_0: \rho = \rho_0$
- $H_a: \rho \neq \rho_0$
Test statistic: under $H_0$,
$$z = \frac{1}{2} \ln\frac{1 + r}{1 - r} \sim N\left(\mu_0, \sigma^2\right), \qquad \mu_0 = \frac{1}{2} \ln\frac{1 + \rho_0}{1 - \rho_0}, \qquad \sigma^2 = \frac{1}{n - 3}$$
Reject $H_0$ if the standardized statistic satisfies $\left|z - \mu_0\right| \sqrt{n - 3} > z_{1-\alpha/2}$. When $\rho_0 = 0$, $\mu_0 = 0$ and this reduces to $|z| \sqrt{n-3} > z_{1-\alpha/2}$.

Conclusion
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Understand and check model assumptions
6. Predict Response Variable
7. Comments on SAS Output
8. Correlation Models
9. Test of coefficient of Correlation
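As a check on item 9, the two correlation tests from this lecture can be sketched in Python on the five estriol and birthweight observations. This is an illustration under stated assumptions, not part of the original course material: the variable names are made up, $\rho_0 = 0$ is chosen for the Fisher test, and with n = 5 the normal approximation behind the Fisher z statistic is rough at best.

```python
import math

# Estriol (x) and birthweight (y) from the lecture example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

# Pearson product moment correlation coefficient.
r = ss_xy / math.sqrt(ss_xx * ss_yy)

# t-test of H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Fisher z-test of H0: rho = rho_0 (rho_0 = 0 assumed for illustration).
rho_0 = 0.0
z = 0.5 * math.log((1 + r) / (1 - r))
mu_0 = 0.5 * math.log((1 + rho_0) / (1 - rho_0))
z_stat = (z - mu_0) * math.sqrt(n - 3)  # standardized statistic

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")
print(f"t = {t_stat:.3f} on {n - 2} df")
print(f"Fisher z = {z:.4f}, standardized = {z_stat:.3f}")
```

The t statistic computed from r matches the slope test statistic for the same data, which is the sense in which testing $\rho = 0$ gives the same conclusion as testing $\beta_1 = 0$.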