Statistics for Business and
Economics
Dr. TANG Yu
Department of Mathematics
Soochow University
May 28, 2007
Types of Correlation
Positive correlation: slope β1 is positive
Negative correlation: slope β1 is negative
No correlation: slope β1 is zero
Hypothesis Test
For the simple linear regression model
y = β0 + β1x + ε
If x and y are linearly related, we must have β1 ≠ 0.
We will use the sample data to test the following hypotheses about the parameter β1:
H0: β1 = 0    Ha: β1 ≠ 0
Sampling Distribution
Just as the sampling distribution of the sample mean, X-bar, depends on the mean, standard deviation and shape of the X population, the sampling distributions of the β0-hat and β1-hat least squares estimators depend on the properties of the {Yj} sub-populations (j = 1, …, n).
yj = β0 + β1xj + εj
Given xj, the properties of the {Yj} sub-population are determined by the εj error/random variable.
Model Assumption
As regards the probability distributions of εj (j = 1, …, n), it is assumed that:
i. Each εj is normally distributed;
ii. Each εj has zero mean;
iii. Each εj has the same variance, σε²;
iv. The errors are independent of each other;
v. The error does not depend on the independent variable(s).
Consequently:
Yj is also normal;
E(Yj) = β0 + β1xj;
Var(Yj) = σε² is also constant;
{Yi} and {Yj}, i ≠ j, are also independent;
The effects of X and ε on Y can be separated from each other.
Graph Show
[Figure: the regression line E(Y) = β0 + β1X, with normal Y distributions drawn at xi and xj: Yi : N(β0 + β1xi; σ), Yj : N(β0 + β1xj; σ). The Y distributions have the same shape at each x value.]
Sum of Squares
Sum of squares due to error (SSE):
SSE = ε̂1² + ε̂2² + ⋯ + ε̂n² = Σ(Yi − Ŷi)²
Sum of squares due to regression (SSR):
SSR = Σ(Ŷi − Ȳ)²
Total sum of squares (SST):
SST = SYY = Σ(Yi − Ȳ)² = SSE + SSR
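The decomposition SST = SSE + SSR can be checked numerically. A minimal Python sketch (not part of the lecture) with made-up toy data; the fit is an ordinary least-squares fit, which is what makes the cross term vanish:

```python
# Toy data (illustrative only) and an ordinary least-squares fit.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

SSE = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))   # error sum of squares
SSR = sum((fi - ybar) ** 2 for fi in yhat)             # regression sum of squares
SST = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares
# For a least-squares fit, SST = SSE + SSR (up to rounding).
```

With made-up fitted values instead of a least-squares fit, the identity generally fails; it relies on the residuals being orthogonal to the fitted values.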
ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square       | F
Regression          | SSR            | 1                  | MSR = SSR/1       | MSR/MSE
Error               | SSE            | n − 2              | MSE = SSE/(n − 2) |
Total               | SST            | n − 1              |                   |
Example
Score (y) | LSD Conc (x) | x − x̄  | y − ȳ   | (x − x̄)²      | (x − x̄)(y − ȳ)  | (y − ȳ)²
78.93     | 1.17         | -3.163 | 28.843  | 10.004569     | -91.230409       | 831.918649
58.20     | 2.97         | -1.363 | 8.113   | 1.857769      | -11.058019       | 65.820769
67.47     | 3.26         | -1.073 | 17.383  | 1.151329      | -18.651959       | 302.168689
37.47     | 4.69         | 0.357  | -12.617 | 0.127449      | -4.504269        | 159.188689
45.65     | 5.83         | 1.497  | -4.437  | 2.241009      | -6.642189        | 19.686969
32.92     | 6.00         | 1.667  | -17.167 | 2.778889      | -28.617389       | 294.705889
29.97     | 6.41         | 2.077  | -20.117 | 4.313929      | -41.783009       | 404.693689
Total: 350.61 | 30.33    | -0.001 | 0.001   | Sxx=22.474943 | Sxy=-202.487243  | Syy=2078.183343

ȳ = 350.61/7 = 50.087    x̄ = 30.33/7 = 4.333
β̂1 = Sxy/Sxx = −202.4872/22.4749 = −9.01
β̂0 = ȳ − β̂1x̄ = 50.09 − (−9.01)(4.33) = 89.10
ŷ = 89.10 − 9.01x

SSE
Yi:         78.93    58.20    62.34... (see table)
Yi          | Ŷi      | Yi − Ŷi | (Yi − Ŷi)²
78.93       | 78.5583 | 0.3717  | 0.138161
58.20       | 62.3403 | -4.1403 | 17.14208
67.47       | 59.7274 | 7.7426  | 59.94785
37.47       | 46.8431 | -9.3731 | 87.855
45.65       | 36.5717 | 9.0783  | 82.41553
32.92       | 35.04   | -2.12   | 4.4944
29.97       | 31.3459 | -1.3759 | 1.893101
SSE = Σ(Yi − Ŷi)² = 253.886
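The calculations above can be reproduced in a few lines of plain Python; this is a sketch, not part of the lecture, and the variable names are mine:

```python
# The LSD example data from the table above.
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
n = len(x)
xbar = sum(x) / n                                              # 4.333
ybar = sum(y) / n                                              # 50.087
Sxx = sum((xi - xbar) ** 2 for xi in x)                        # 22.4749
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # -202.4872
b1 = Sxy / Sxx                                                 # about -9.01
b0 = ybar - b1 * xbar                                          # about 89.1
SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # about 253.9
```

Note the slide's 89.10 comes from using the rounded values 50.09, −9.01 and 4.33; carrying full precision gives an intercept near 89.12 and the same fitted line for practical purposes.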
SST and SSR
S xx  22.475
S xy  202.487
yˆ  89.10  9.01x
SST  SYY  2078.183
S yy  2078.183
SSE  253.89
SSR  SST  SSE  1824.3
ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square  | F
Regression          | 1824.3         | 1                  | MSR = 1824.3 | 35.93
Error               | 253.9          | 5                  | MSE = 50.78  |
Total               | 2078.2         | 6                  |              |
Since F = 35.93 > 6.61, where 6.61 is the critical value of the F-distribution with 1 and 5 degrees of freedom (significance level .05), we reject H0 and conclude that the relationship between x and y is significant.
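The F statistic follows directly from the ANOVA table; a short stdlib-only sketch (`scipy.stats.f.ppf(0.95, 1, 5)` would reproduce the 6.61 critical value, but is not used here):

```python
# ANOVA arithmetic for the example above.
SSR, SSE = 1824.3, 253.9
df_reg, df_err = 1, 5        # 1 and n - 2 = 5 degrees of freedom
MSR = SSR / df_reg           # 1824.3
MSE = SSE / df_err           # 50.78
F = MSR / MSE                # about 35.9
# F exceeds the .05 critical value 6.61, so H0: beta1 = 0 is rejected.
```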
Hypothesis Test
For the simple linear regression model
y = β0 + β1x + ε
If x and y are linearly related, we must have β1 ≠ 0.
We will use the sample data to test the following hypotheses about the parameter β1:
H0: β1 = 0    Ha: β1 ≠ 0
Standard Errors
Standard error of estimate: the sample standard deviation of ε:
sε = √MSE = √(SSE/(n − 2))
Replacing σε with its estimate sε, the estimated standard error of β̂1 is
sβ̂1 = sε/√Sxx,  where Sxx = Σ(xi − x̄)²
t-test
Hypothesis:
H0: β1 = 0    Ha: β1 ≠ 0
Test statistic:
t = β̂1 / sβ̂1
where t follows a t-distribution with n − 2 degrees of freedom.
Reject Rule
Hypothesis:
H0: β1 = 0    Ha: β1 ≠ 0
This is a two-tailed test.
p-value approach: reject H0 if p-value ≤ α
Critical value approach: reject H0 if t ≤ −tα/2 or t ≥ tα/2
Example
(The same data table and calculations as before: ŷ = 89.10 − 9.01x, Sxx = 22.475, SSE = 253.886.)
Calculation
s  MSE 
s

S xx
sˆ 
1
t
ˆ1
sˆˆ

SSE
253.89

 7.1258
n2
72
s
2


x

x
 i

7.1258
 1.5031
22.475
 9.01
 5.9943  2.571
1.5031
1
where 2.571 is the critical value for t-distribution
with degree of freedom 5 (significant level
takes .025), so we reject H0, and conclude that
the relationship between x and y is significant
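The same arithmetic in a short Python sketch (a check of the numbers above, not part of the lecture):

```python
import math

# t statistic for H0: beta1 = 0, using the quantities computed above.
SSE, n, Sxx, b1 = 253.89, 7, 22.475, -9.01
s = math.sqrt(SSE / (n - 2))      # standard error of estimate, about 7.126
se_b1 = s / math.sqrt(Sxx)        # estimated standard error of b1, about 1.503
t = b1 / se_b1                    # about -5.99; |t| > 2.571, so reject H0
```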
Confidence Interval
β̂1 is an estimator of β1, and
t = (β̂1 − β1)/sβ̂1
follows a t-distribution with n − 2 degrees of freedom.
The estimated standard error of β̂1 is
sβ̂1 = sε/√Sxx,  where Sxx = Σ(xi − x̄)²
So the C% confidence interval estimator of β1 is
β̂1 ± tα/2, n−2 · sβ̂1
Example
The 95% confidence interval estimate of β1 in the previous example is
−9.01 ± 2.571 × 1.5031 = −9.01 ± 3.86
i.e., from −12.87 to −5.15, which does not contain 0.
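The interval endpoints can be verified directly; a minimal sketch using the values from the example:

```python
# 95% confidence interval for beta1 from the example.
b1, se_b1, t_crit = -9.01, 1.5031, 2.571   # t.025 with 5 degrees of freedom
half = t_crit * se_b1                      # about 3.86
lower, upper = b1 - half, b1 + half        # about -12.87 and -5.15
# 0 lies outside (lower, upper), consistent with rejecting H0.
```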
Regression Equation
It is believed that the longer one studies, the better one's grade is. The final mark (Y) regressed on study time (X) is supposed to follow the regression equation:
ŷ = β̂0 + β̂1x = 21.590 + 1.877x
If the fit of the sample regression equation is satisfactory, it can be used to estimate the mean value of the dependent variable or to predict an individual value of it.
Estimate and Predict
ŷ = β̂0 + β̂1x = 21.590 + 1.877x
Estimate: for the expected value of a Y sub-population.
E.g.: What is the mean final mark of all those students who spent 30 hours on studying? I.e., given x = 30, how large is E(y)?
Predict: for a particular element of a Y sub-population.
E.g.: What is the final mark of Tom, who spent 30 hours on studying? I.e., given x = 30, how large is y?
What Is the Same?
For a given X value, the point forecast (prediction) of Y and the point estimate of the mean of the {Y} sub-population are the same:
ŷ = β̂0 + β̂1x
Ex.1 Estimate the mean final mark of students who spent 30 hours on study.
Ex.2 Predict the final mark of Tom, when his study time is 30 hours.
ŷ = β̂0 + β̂1x = 21.590 + 1.877 × 30 = 77.9
What Is the Difference?
The interval prediction of Y and the interval estimation of the mean of the {Y} sub-population are different:
The prediction interval:
ŷ ± tα/2 · sε · √(1 + 1/n + (xg − x̄)²/Σ(xi − x̄)²)
The estimation (confidence) interval:
ŷ ± tα/2 · sε · √(1/n + (xg − x̄)²/Σ(xi − x̄)²)
The prediction interval is wider than the confidence interval.
Example
(The same data table and calculations as before: ŷ = 89.10 − 9.01x, x̄ = 4.333, Sxx = 22.475, SSE = 253.886.)
Estimation and Prediction
The point forecast (prediction) of Y and the point estimate of the mean of {Y} are the same:
ŷ = 89.10 − 9.01x
For xg = 5.0:  ŷ = 89.10 − 9.01 × 5.0 = 44.05
But for the interval estimation and prediction, the results differ.
Data Needed
For xg = 5.0:
sε = √MSE = √(SSE/(n − 2)) = √(253.89/(7 − 2)) = 7.1258
Sxx = Σ(xi − x̄)² = 22.475
t.025 = 2.571
The prediction interval:
ŷ ± tα/2 · sε · √(1 + 1/n + (xg − x̄)²/Σ(xi − x̄)²)
The estimation interval:
ŷ ± tα/2 · sε · √(1/n + (xg − x̄)²/Σ(xi − x̄)²)
Calculation
Estimation:
ŷ ± tα/2 · sε · √(1/n + (xg − x̄)²/Σ(xi − x̄)²)
= 44.05 ± 2.571 × 7.1258 × √(1/7 + (5.0 − 4.333)²/22.475)
= 44.05 ± 7.3887
Prediction:
ŷ ± tα/2 · sε · √(1 + 1/n + (xg − x̄)²/Σ(xi − x̄)²)
= 44.05 ± 2.571 × 7.1258 × √(1 + 1/7 + (5.0 − 4.333)²/22.475)
= 44.05 ± 19.7543
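Both half-widths can be computed from the same shared term under the square root; a sketch checking the numbers above:

```python
import math

# Confidence and prediction intervals at xg = 5.0 for the example above.
n, s, Sxx, t_crit = 7, 7.1258, 22.475, 2.571
xbar, xg = 4.333, 5.0
yhat = 89.10 - 9.01 * xg                       # 44.05
core = 1 / n + (xg - xbar) ** 2 / Sxx          # shared term under the root
half_est = t_crit * s * math.sqrt(core)        # about 7.39 (estimation)
half_pred = t_crit * s * math.sqrt(1 + core)   # about 19.75 (prediction)
# The prediction interval is wider than the confidence interval.
```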
Moving Rule
As xg moves away from x̄, the interval becomes longer; the shortest interval is found at xg = x̄.
The confidence interval when xg = x̄:
ŷ ± tα/2 · sε · √(1/n)
The confidence interval when xg = x̄ ± 1:
ŷ ± tα/2 · sε · √(1/n + 1²/Σ(xi − x̄)²)
The confidence interval when xg = x̄ ± 2:
ŷ ± tα/2 · sε · √(1/n + 2²/Σ(xi − x̄)²)
The same rule holds for prediction intervals, which add 1 under the square root:
ŷ ± tα/2 · sε · √(1 + 1/n + 1²/Σ(xi − x̄)²)   when xg = x̄ ± 1
ŷ ± tα/2 · sε · √(1 + 1/n + 2²/Σ(xi − x̄)²)   when xg = x̄ ± 2
[Figure: intervals drawn at x̄ − 2, x̄ − 1, x̄, x̄ + 1, x̄ + 2]
Interval Estimation
[Figure: prediction and estimation interval bands around the fitted line, narrowest at x̄ and widening toward x̄ ± 2; the prediction band lies outside the estimation band.]
Residual Analysis
Regression residual: the difference between an observed y value and its corresponding predicted value:
r = y − ŷ
Properties of regression residuals:
The mean of the residuals equals zero.
The standard deviation of the residuals is equal to the standard deviation of the fitted regression model.
Example
yˆ  89.10  9.01x
Score (y)
LSD Conc (x)
y-hat
residual(r)
78.93
1.17
78.558
0.3717
58.20
2.97
62.34
-4.1403
67.47
3.26
59.727
7.7426
37.47
4.69
46.843
-9.3731
45.65
5.83
36.572
9.0783
32.92
6.00
35.04
-2.12
29.97
6.41
31.346
-1.3759
Residual Plot Against x
[Figure: residuals r plotted against x]
Residual Plot Against y-hat
[Figure: residuals r plotted against ŷ]
Three Situations
Good pattern
Non-constant variance
Model form not adequate
Standardized Residual
Standard deviation of the ith residual:
s(yi − ŷi) = sε · √(1 − hi)
where
s(yi − ŷi) = the standard deviation of residual i
sε = the standard error of the estimate
hi = 1/n + (xi − x̄)²/Σ(xj − x̄)²
Standardized residual for observation i:
zi = (yi − ŷi)/s(yi − ŷi)
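Applying these formulas to the example's residuals takes only a few lines; a sketch (not part of the lecture), using the x values and residuals from the table above:

```python
import math

# Standardized residuals for the LSD example.
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
r = [0.3717, -4.1403, 7.7426, -9.3731, 9.0783, -2.12, -1.3759]
n = len(x)
xbar = sum(x) / n
Sxx = sum((xj - xbar) ** 2 for xj in x)
s = math.sqrt(sum(ri ** 2 for ri in r) / (n - 2))       # standard error, about 7.13
h = [1 / n + (xi - xbar) ** 2 / Sxx for xi in x]        # leverages h_i
z = [ri / (s * math.sqrt(1 - hi)) for ri, hi in zip(r, h)]
# All |z_i| < 2 here, consistent with the normality assumption.
```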
Standardized Residual Plot
[Figure: standardized residuals z plotted against x]
Standardized Residual
The standardized residual plot can provide insight into the assumption that the error term has a normal distribution.
If the assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.
It is expected that approximately 95% of the standardized residuals fall between −2 and +2.
Detecting Outlier
[Figures: one scatter plot showing an outlier; one showing an influential observation]
High Leverage Points
Leverage of observation i:
hi = 1/n + (xi − x̄)²/Σ(xj − x̄)²
For example, for the x values
10 10 15 20 20 25 70
x̄ = 24.2857, and for the point xi = 70:
hi = 1/7 + (70 − 24.2857)²/Σ(xj − 24.2857)² ≈ .94
so 70 is a high leverage point.
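The leverage calculation for this small example can be checked directly; a sketch:

```python
# Leverage of the extreme point x = 70 in the example above.
x = [10, 10, 15, 20, 20, 25, 70]
n = len(x)
xbar = sum(x) / n                          # 24.2857
Sxx = sum((xj - xbar) ** 2 for xj in x)
h70 = 1 / n + (70 - xbar) ** 2 / Sxx       # about .94: a high leverage point
```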
Contact Information
Tang Yu (唐煜)
[email protected]
http://math.suda.edu.cn/homepage/tangy