Statistics 846.3(02) / Statistics 349.3(02) Lecture Notes

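The Simple Linear Regression Model

Estimators in Simple Linear Regression

$$\hat\beta = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2} = \frac{S_{xy}}{S_{xx}}, \qquad \hat\alpha = \bar y - \hat\beta\,\bar x = \bar y - \frac{S_{xy}}{S_{xx}}\,\bar x$$

and

$$s^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat\alpha - \hat\beta x_i\right)^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$$

where $S_{xx} = \sum_{i=1}^{n}(x_i-\bar x)^2$, $S_{yy} = \sum_{i=1}^{n}(y_i-\bar y)^2$ and $S_{xy} = \sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)$.

Sampling distributions of the estimators

Recall that if $y_1, y_2, y_3, \dots, y_n$ are
1. independent, and
2. normally distributed with means $\mu_1, \mu_2, \mu_3, \dots, \mu_n$ and standard deviations $\sigma_1, \sigma_2, \sigma_3, \dots, \sigma_n$,

then $L = c_1y_1 + c_2y_2 + c_3y_3 + \dots + c_ny_n$ is normal with mean

$$\mu_L = c_1\mu_1 + c_2\mu_2 + \dots + c_n\mu_n$$

and standard deviation

$$\sigma_L = \sqrt{c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \dots + c_n^2\sigma_n^2}.$$

Sampling distribution of the slope $\hat\beta$

Note:

$$\hat\beta = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2}.$$

Also, since $\sum_{i=1}^{n}(x_i-\bar x) = 0$,

$$\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y) = \sum_{i=1}^{n}(x_i-\bar x)y_i - \bar y\sum_{i=1}^{n}(x_i-\bar x) = \sum_{i=1}^{n}(x_i-\bar x)y_i.$$

Thus

$$\hat\beta = \frac{\sum_{i=1}^{n}(x_i-\bar x)y_i}{S_{xx}} = \sum_{i=1}^{n}c_iy_i, \qquad\text{where } c_i = \frac{x_i-\bar x}{S_{xx}}.$$

Under the model, $\mu_i = \alpha + \beta x_i$ and $\sigma_i = \sigma$ for every $i$. Hence $\hat\beta$ is normal with mean $\mu_{\hat\beta} = c_1\mu_1 + c_2\mu_2 + \dots + c_n\mu_n$ and standard deviation $\sigma_{\hat\beta} = \sqrt{c_1^2\sigma^2 + c_2^2\sigma^2 + \dots + c_n^2\sigma^2}$. Thus

$$\mu_{\hat\beta} = \frac{x_1-\bar x}{S_{xx}}(\alpha+\beta x_1) + \dots + \frac{x_n-\bar x}{S_{xx}}(\alpha+\beta x_n) = \frac{\alpha}{S_{xx}}\sum_{i=1}^{n}(x_i-\bar x) + \frac{\beta}{S_{xx}}\sum_{i=1}^{n}x_i(x_i-\bar x) = \beta,$$

since

$$\sum_{i=1}^{n}(x_i-\bar x) = 0 \quad\text{and}\quad \sum_{i=1}^{n}x_i(x_i-\bar x) = \sum_{i=1}^{n}x_i(x_i-\bar x) - \bar x\sum_{i=1}^{n}(x_i-\bar x) = \sum_{i=1}^{n}(x_i-\bar x)^2 = S_{xx}.$$

Also

$$\sigma_{\hat\beta} = \sqrt{\left(\frac{x_1-\bar x}{S_{xx}}\right)^2\sigma^2 + \dots + \left(\frac{x_n-\bar x}{S_{xx}}\right)^2\sigma^2} = \sqrt{\frac{\sigma^2}{S_{xx}^2}\sum_{i=1}^{n}(x_i-\bar x)^2} = \frac{\sigma}{\sqrt{S_{xx}}}.$$

Hence $\hat\beta$ is normal with mean $\mu_{\hat\beta} = \beta$ and standard deviation

$$\sigma_{\hat\beta} = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2}}.$$

Sampling distribution of the intercept $\hat\alpha$

The intercept of the least squares line is

$$\hat\alpha = \bar y - \hat\beta\,\bar x = \bar y - \frac{S_{xy}}{S_{xx}}\,\bar x.$$

It can be shown that $\hat\alpha$ has a normal distribution with mean and standard deviation

$$\mu_{\hat\alpha} = \alpha \quad\text{and}\quad \sigma_{\hat\alpha} = \sigma\sqrt{\frac{1}{n} + \frac{\bar x^2}{\sum_{i=1}^{n}(x_i-\bar x)^2}}.$$

Proof:

$$\hat\alpha = \bar y - \hat\beta\,\bar x = \sum_{i=1}^{n}b_iy_i, \qquad\text{where } b_i = \frac{1}{n} - c_i\bar x = \frac{1}{n} - \frac{(x_i-\bar x)\,\bar x}{S_{xx}}.$$

Thus

$$\mu_{\hat\alpha} = b_1\mu_1 + \dots + b_n\mu_n = \left(\frac1n - c_1\bar x\right)(\alpha+\beta x_1) + \dots + \left(\frac1n - c_n\bar x\right)(\alpha+\beta x_n) = \alpha\left[1 - \bar x\sum_{i=1}^{n}c_i\right] + \beta\left[\frac1n\sum_{i=1}^{n}x_i - \bar x\sum_{i=1}^{n}c_ix_i\right].$$

Now $c_i = (x_i-\bar x)/S_{xx}$, hence

$$\sum_{i=1}^{n}c_i = 0 \quad\text{and}\quad \sum_{i=1}^{n}c_ix_i = \frac{\sum_{i=1}^{n}(x_i-\bar x)\,x_i}{S_{xx}} = 1.$$

Hence $\mu_{\hat\alpha} = \alpha + \beta(\bar x - \bar x) = \alpha$.

Also

$$\sigma_{\hat\alpha} = \sqrt{b_1^2\sigma^2 + b_2^2\sigma^2 + \dots + b_n^2\sigma^2} = \sigma\sqrt{b_1^2 + b_2^2 + \dots + b_n^2}.$$

Now

$$b_i^2 = \left(\frac1n - \frac{(x_i-\bar x)\,\bar x}{S_{xx}}\right)^2 = \frac{1}{n^2} - \frac{2(x_i-\bar x)\,\bar x}{n\,S_{xx}} + \frac{(x_i-\bar x)^2\,\bar x^2}{S_{xx}^2}.$$

Hence

$$\sum_{i=1}^{n}b_i^2 = \frac{n}{n^2} - \frac{2\bar x}{n\,S_{xx}}\sum_{i=1}^{n}(x_i-\bar x) + \frac{\bar x^2}{S_{xx}^2}\sum_{i=1}^{n}(x_i-\bar x)^2 = \frac1n + \frac{\bar x^2}{S_{xx}}$$

and

$$\sigma_{\hat\alpha} = \sigma\sqrt{\frac1n + \frac{\bar x^2}{S_{xx}}} = \sigma\sqrt{\frac1n + \frac{\bar x^2}{\sum_{i=1}^{n}(x_i-\bar x)^2}}.$$

Summary
1. $\hat\beta$ is normal with mean $\mu_{\hat\beta} = \beta$ and standard deviation $\sigma_{\hat\beta} = \dfrac{\sigma}{\sqrt{S_{xx}}} = \dfrac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2}}$.
2. $\hat\alpha$ is normal with mean $\mu_{\hat\alpha} = \alpha$ and standard deviation $\sigma_{\hat\alpha} = \sigma\sqrt{\dfrac1n + \dfrac{\bar x^2}{\sum_{i=1}^{n}(x_i-\bar x)^2}}$.

Sampling distribution of the estimate of variance $s^2$

The estimate of $\sigma^2$ is

$$s^2 = \frac{\sum_{i=1}^{n}(y_i-\hat y_i)^2}{n-2} = \frac{\sum_{i=1}^{n}(y_i - a - bx_i)^2}{n-2} = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right),$$

where $a = \hat\alpha$ and $b = \hat\beta$. This estimate of $\sigma^2$ is said to be based on $n-2$ degrees of freedom.

Recall that $y_1, y_2, \dots, y_n$ are independent and normal with mean $\alpha + \beta x_i$ and standard deviation $\sigma$. Let

$$z_i = \frac{y_i - \alpha - \beta x_i}{\sigma}.$$

Then $z_1, z_2, \dots, z_n$ are independent and normal with mean 0 and standard deviation 1, and

$$\sum_{i=1}^{n}z_i^2 = \sum_{i=1}^{n}\left(\frac{y_i - \alpha - \beta x_i}{\sigma}\right)^2 = \frac{\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2}{\sigma^2}$$

has a $\chi^2$ distribution with $n$ degrees of freedom.

If $\alpha$ and $\beta$ are replaced by their estimators $\hat\alpha$ and $\hat\beta$, then

$$U = \frac{\sum_{i=1}^{n}(y_i - \hat\alpha - \hat\beta x_i)^2}{\sigma^2} = \frac{(n-2)\,s^2}{\sigma^2}$$

has a $\chi^2$ distribution with $n-2$ degrees of freedom.

Note: a $\chi^2$ variable with $n-2$ degrees of freedom has mean $n-2$, so

$$E[U] = E\left[\frac{\sum_{i=1}^{n}(y_i - \hat\alpha - \hat\beta x_i)^2}{\sigma^2}\right] = n-2.$$

Thus

$$E\left[\sum_{i=1}^{n}(y_i - \hat\alpha - \hat\beta x_i)^2\right] = (n-2)\,\sigma^2 \quad\text{and}\quad E[s^2] = E\left[\frac{\sum_{i=1}^{n}(y_i - \hat\alpha - \hat\beta x_i)^2}{n-2}\right] = \sigma^2.$$

This verifies the statement made earlier that $s^2$ is an unbiased estimator of $\sigma^2$.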
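The two summary results and the unbiasedness of $s^2$ can be checked numerically. The Python sketch below (standard library only) repeatedly simulates data from the model $y_i = \alpha + \beta x_i + \varepsilon_i$ with normal errors, refits the least squares line each time, and compares the empirical mean and standard deviation of $\hat\beta$ and $\hat\alpha$, and the average of $s^2$, with the formulas above. The design points $x = 1, \dots, 11$, the true values $\alpha = 2$, $\beta = 0.5$, $\sigma = 3$, and the helper names `fit_line` and `simulate` are hypothetical choices for illustration, not part of the notes.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least squares fit using the notes' formulas; returns (alpha_hat, beta_hat, s2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = ybar - beta_hat * xbar
    s2 = (syy - sxy ** 2 / sxx) / (n - 2)  # based on n - 2 degrees of freedom
    return alpha_hat, beta_hat, s2


def simulate(xs, alpha, beta, sigma, reps=10_000, seed=1):
    """Draw y_i = alpha + beta*x_i + N(0, sigma^2) noise `reps` times and refit each sample."""
    rng = random.Random(seed)
    a_hats, b_hats, s2s = [], [], []
    for _ in range(reps):
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        a, b, s2 = fit_line(xs, ys)
        a_hats.append(a)
        b_hats.append(b)
        s2s.append(s2)
    return a_hats, b_hats, s2s


# Hypothetical design points and true parameters, chosen only for illustration.
xs = list(range(1, 12))  # n = 11
alpha, beta, sigma = 2.0, 0.5, 3.0

n = len(xs)
xbar = sum(xs) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sd_beta = sigma / math.sqrt(sxx)                       # sigma / sqrt(S_xx)
sd_alpha = sigma * math.sqrt(1 / n + xbar ** 2 / sxx)  # sigma * sqrt(1/n + xbar^2 / S_xx)

a_hats, b_hats, s2s = simulate(xs, alpha, beta, sigma)
print(f"beta_hat:  mean {statistics.mean(b_hats):.4f} vs {beta}, "
      f"sd {statistics.stdev(b_hats):.4f} vs {sd_beta:.4f}")
print(f"alpha_hat: mean {statistics.mean(a_hats):.4f} vs {alpha}, "
      f"sd {statistics.stdev(a_hats):.4f} vs {sd_alpha:.4f}")
print(f"s^2:       mean {statistics.mean(s2s):.4f} vs sigma^2 = {sigma ** 2}")
```

With 10,000 replications the empirical values should land close to the theoretical ones; the fixed seed only makes the run repeatable and has no statistical meaning.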
Recall that $\hat\beta$ is normal with mean $\mu_{\hat\beta} = \beta$ and standard deviation $\sigma_{\hat\beta} = \dfrac{\sigma}{\sqrt{S_{xx}}} = \dfrac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2}}$. Therefore

$$z = \frac{\hat\beta - \mu_{\hat\beta}}{\sigma_{\hat\beta}} = \frac{\hat\beta - \beta}{\sigma/\sqrt{S_{xx}}}$$

has a standard normal distribution, and, because $z$ is independent of $s^2$ and $(n-2)s^2/\sigma^2$ has a $\chi^2$ distribution with $n-2$ degrees of freedom,

$$t = z\,\frac{\sigma}{s} = \frac{\hat\beta - \beta}{s/\sqrt{S_{xx}}}$$

has a $t$ distribution with $n-2$ degrees of freedom.

(1 – α)100% Confidence Limits for slope β:

$$\hat\beta \pm t_{\alpha/2}\,s_{\hat\beta} = \hat\beta \pm t_{\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}$$

$t_{\alpha/2}$ = critical value for the $t$ distribution with $n-2$ degrees of freedom.

Also $\hat\alpha$ is normal with mean $\mu_{\hat\alpha} = \alpha$ and standard deviation $\sigma_{\hat\alpha} = \sigma\sqrt{\dfrac1n + \dfrac{\bar x^2}{S_{xx}}}$. Therefore

$$z = \frac{\hat\alpha - \mu_{\hat\alpha}}{\sigma_{\hat\alpha}} = \frac{\hat\alpha - \alpha}{\sigma\sqrt{\frac1n + \frac{\bar x^2}{S_{xx}}}}$$

has a standard normal distribution, and

$$t = z\,\frac{\sigma}{s} = \frac{\hat\alpha - \alpha}{s\sqrt{\frac1n + \frac{\bar x^2}{S_{xx}}}}$$

has a $t$ distribution with $n-2$ degrees of freedom.

(1 – α)100% Confidence Limits for intercept α:

$$\hat\alpha \pm t_{\alpha/2}\,s_{\hat\alpha} = \hat\alpha \pm t_{\alpha/2}\,s\sqrt{\frac1n + \frac{\bar x^2}{S_{xx}}}$$

$t_{\alpha/2}$ = critical value for the $t$ distribution with $n-2$ degrees of freedom.

The following data show the per capita consumption of cigarettes per month (X) in various countries in 1930, and the death rates from lung cancer for men in 1950.

TABLE: Per capita consumption of cigarettes per month ($X_i$) in $n = 11$ countries in 1930, and the death rates, $Y_i$ (per 100,000), from lung cancer for men in 1950.
Country (i)      Xi    Yi
Australia        48    18
Canada           50    15
Denmark          38    17
Finland         110    35
Great Britain   110    46
Holland          49    24
Iceland          23     6
Norway           25     9
Sweden           30    11
Switzerland      51    25
USA             130    20

[Figure: scatter plot of death rates from lung cancer (1950) against per capita consumption of cigarettes, with each point labelled by country]

Fitting the Least Squares Line

$$\sum_{i=1}^{n}x_i = 664, \quad \sum_{i=1}^{n}y_i = 226, \quad \sum_{i=1}^{n}x_i^2 = 54{,}404, \quad \sum_{i=1}^{n}y_i^2 = 6{,}018, \quad \sum_{i=1}^{n}x_iy_i = 16{,}914$$

First compute the following three quantities:

$$S_{xx} = 54404 - \frac{664^2}{11} = 14322.55$$

$$S_{yy} = 6018 - \frac{226^2}{11} = 1374.73$$

$$S_{xy} = 16914 - \frac{(664)(226)}{11} = 3271.82$$

Computing Estimate of Slope and Intercept

$$b = \frac{S_{xy}}{S_{xx}} = \frac{3271.82}{14322.55} = 0.228$$

$$a = \bar y - b\,\bar x = \frac{226}{11} - b\left(\frac{664}{11}\right) = 6.756$$

$$s = \sqrt{\frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)} = 8.35$$

95% Confidence Limits for slope β:

$$\hat\beta \pm t_{\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}: \qquad 0.228 \pm 2.262\,\frac{8.35}{\sqrt{14322.55}}$$

that is, 0.0706 to 0.3862.

$t_{.025} = 2.262$ = critical value for the $t$ distribution with 9 degrees of freedom.

95% Confidence Limits for intercept α:

$$\hat\alpha \pm t_{\alpha/2}\,s\sqrt{\frac1n + \frac{\bar x^2}{S_{xx}}}: \qquad 6.756 \pm 2.262\,(8.35)\sqrt{\frac{1}{11} + \frac{(664/11)^2}{14322.55}}$$

that is, -4.34 to 17.85.

$t_{.025} = 2.262$ = critical value for the $t$ distribution with 9 degrees of freedom.

(1 – α)100% Confidence Limits for a point on the regression line α + βx0

[Figure: the regression line $y = \alpha + \beta x$, with the point $\alpha + \beta x_0$ on the line above $x = x_0$]

Let $\hat y_0 = \hat\alpha + \hat\beta x_0$. Then

$$E(\hat y_0) = \alpha + \beta x_0 \quad\text{and}\quad \operatorname{Var}(\hat y_0) = \sigma^2\left(\frac1n + \frac{(x_0-\bar x)^2}{S_{xx}}\right).$$

Proof: Note

$$\hat\beta = \sum_{i=1}^{n}c_iy_i, \qquad\text{where } c_i = \frac{x_i-\bar x}{S_{xx}},$$

and

$$\hat\alpha = \bar y - \hat\beta\,\bar x = \sum_{i=1}^{n}b_iy_i, \qquad\text{where } b_i = \frac1n - c_i\bar x = \frac1n - \frac{(x_i-\bar x)\,\bar x}{S_{xx}}.$$

Thus

$$\hat y_0 = \hat\alpha + \hat\beta x_0 = \sum_{i=1}^{n}d_iy_i, \qquad\text{where } d_i = b_i + c_ix_0 = \frac1n + \frac{(x_i-\bar x)(x_0-\bar x)}{S_{xx}}.$$

Thus

$$E(\hat y_0) = d_1\mu_1 + \dots + d_n\mu_n = d_1(\alpha+\beta x_1) + \dots + d_n(\alpha+\beta x_n) = \alpha\sum_{i=1}^{n}d_i + \beta\sum_{i=1}^{n}d_ix_i,$$

where

$$\sum_{i=1}^{n}d_i = \sum_{i=1}^{n}\left[\frac1n + \frac{(x_i-\bar x)(x_0-\bar x)}{S_{xx}}\right] = 1 + \frac{x_0-\bar x}{S_{xx}}\sum_{i=1}^{n}(x_i-\bar x) = 1$$

and

$$\sum_{i=1}^{n}d_ix_i = \frac1n\sum_{i=1}^{n}x_i + \frac{x_0-\bar x}{S_{xx}}\sum_{i=1}^{n}(x_i-\bar x)\,x_i = \bar x + (x_0-\bar x) = x_0.$$

Thus $E(\hat y_0) = \alpha + \beta x_0$.

Also

$$\operatorname{Var}(\hat y_0) = d_1^2\sigma^2 + d_2^2\sigma^2 + \dots + d_n^2\sigma^2 = \sigma^2\left(d_1^2 + d_2^2 + \dots + d_n^2\right).$$

Now

$$d_i^2 = \left(\frac1n + \frac{(x_i-\bar x)(x_0-\bar x)}{S_{xx}}\right)^2 = \frac{1}{n^2} + \frac{2(x_i-\bar x)(x_0-\bar x)}{n\,S_{xx}} + \frac{(x_i-\bar x)^2(x_0-\bar x)^2}{S_{xx}^2}.$$

Hence

$$\sum_{i=1}^{n}d_i^2 = \frac1n + \frac{2(x_0-\bar x)}{n\,S_{xx}}\sum_{i=1}^{n}(x_i-\bar x) + \frac{(x_0-\bar x)^2}{S_{xx}^2}\sum_{i=1}^{n}(x_i-\bar x)^2 = \frac1n + \frac{(x_0-\bar x)^2}{S_{xx}}$$

and

$$\operatorname{Var}(\hat y_0) = \sigma^2\left(\frac1n + \frac{(x_0-\bar x)^2}{S_{xx}}\right).$$

(1 – α)100% Confidence Limits for a point on the regression line α + βx0:

$$\hat\alpha + \hat\beta x_0 \pm t_{\alpha/2}\,s\sqrt{\frac1n + \frac{(x_0-\bar x)^2}{S_{xx}}}$$

$t_{\alpha/2}$ = critical value for the $t$ distribution with $n-2$ degrees of freedom.

Prediction

In the linear regression model, the (1 – α)100% Prediction Limits for y when x = x0 are:

$$a + bx_0 \pm t_{\alpha/2}\,s\sqrt{1 + \frac1n + \frac{(x_0-\bar x)^2}{S_{xx}}}$$

$t_{\alpha/2}$ = critical value for the $t$ distribution with $n-2$ degrees of freedom. The extra 1 under the square root accounts for the variance $\sigma^2$ of the new observation itself, which is independent of $\hat y_0$.
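The cigarette example can be reproduced with a short Python sketch using only the standard library. It computes $S_{xx}$, $S_{yy}$, $S_{xy}$, $b$, $a$ and $s$ from the table, then applies the confidence-limit and prediction-limit formulas above. Assumptions: the critical value $t_{.025} = 2.262$ (9 degrees of freedom) is copied from the notes because the standard library has no $t$ quantile function; the helper names `limits`, `mean_response_ci` and `prediction_limits` and the value $x_0 = 100$ are hypothetical, chosen only for illustration.

```python
import math

# Data from the table: X = cigarettes per capita per month (1930),
# Y = lung-cancer deaths per 100,000 men (1950).
data = {
    "Australia": (48, 18), "Canada": (50, 15), "Denmark": (38, 17),
    "Finland": (110, 35), "Great Britain": (110, 46), "Holland": (49, 24),
    "Iceland": (23, 6), "Norway": (25, 9), "Sweden": (30, 11),
    "Switzerland": (51, 25), "USA": (130, 20),
}
xs = [x for x, _ in data.values()]
ys = [y for _, y in data.values()]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Corrected sums of squares and cross-products from the raw sums, as in the notes.
sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

b = sxy / sxx                                    # slope estimate
a = ybar - b * xbar                              # intercept estimate
s = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))  # residual standard deviation

T_025 = 2.262  # t critical value with n - 2 = 9 df, copied from the notes


def limits(center, half_width):
    """Return (lower, upper) = center -/+ half_width."""
    return center - half_width, center + half_width


slope_ci = limits(b, T_025 * s / math.sqrt(sxx))
intercept_ci = limits(a, T_025 * s * math.sqrt(1 / n + xbar ** 2 / sxx))


def mean_response_ci(x0):
    """95% limits for the point alpha + beta*x0 on the regression line."""
    return limits(a + b * x0, T_025 * s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx))


def prediction_limits(x0):
    """95% prediction limits for a new y observed at x = x0."""
    return limits(a + b * x0, T_025 * s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx))


print(f"Sxx = {sxx:.2f}, Syy = {syy:.2f}, Sxy = {sxy:.2f}")
print(f"b = {b:.3f}, a = {a:.3f}, s = {s:.2f}")
print("slope CI:     %.4f to %.4f" % slope_ci)
print("intercept CI: %.2f to %.2f" % intercept_ci)

x0 = 100  # hypothetical consumption level, for illustration only
print("mean response at x0: %.2f to %.2f" % mean_response_ci(x0))
print("prediction at x0:    %.2f to %.2f" % prediction_limits(x0))
```

Run as a script, it prints $S_{xx} = 14322.55$, $S_{yy} = 1374.73$, $S_{xy} = 3271.82$, $b = 0.228$, $a = 6.756$, $s = 8.35$, slope limits 0.0706 to 0.3862 and intercept limits -4.34 to 17.85, which agree with the limits computed in the notes. At any $x_0$ the prediction limits are wider than the limits for $\alpha + \beta x_0$ because of the extra 1 under the square root. If SciPy is available, `scipy.stats.t.ppf(0.975, n - 2)` returns the critical value instead of hardcoding 2.262.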
