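The Simple Linear Regression Model

Estimators in Simple Linear Regression

\[ \hat{\beta} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{S_{xy}}{S_{xx}} \]

\[ \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x} \]

and

\[ s^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta}x_i\right)^2 = \frac{1}{n-2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right] \]

Sampling distributions of the estimators

Recall that if $y_1, y_2, \dots, y_n$ are
1. Independent
2. Normally distributed with means $\mu_1, \mu_2, \dots, \mu_n$ and standard deviations $\sigma_1, \sigma_2, \dots, \sigma_n$

then $L = c_1y_1 + c_2y_2 + \dots + c_ny_n$ is normal with mean

\[ \mu_L = c_1\mu_1 + c_2\mu_2 + \dots + c_n\mu_n \]

and standard deviation

\[ \sigma_L = \sqrt{c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \dots + c_n^2\sigma_n^2} \]

Sampling distribution of the slope $\hat{\beta}$

Note:

\[ \hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \]

Also

\[ \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \sum_{i=1}^{n}(x_i-\bar{x})y_i - \bar{y}\sum_{i=1}^{n}(x_i-\bar{x}) = \sum_{i=1}^{n}(x_i-\bar{x})y_i \]

Thus

\[ \hat{\beta} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})y_i}{S_{xx}} = \sum_{i=1}^{n}c_iy_i \quad\text{where}\quad c_i = \frac{x_i-\bar{x}}{S_{xx}} \]

Hence $\hat{\beta}$ is normal with mean $\mu_{\hat{\beta}} = c_1\mu_1 + c_2\mu_2 + \dots + c_n\mu_n$ and standard deviation $\sigma_{\hat{\beta}} = \sqrt{c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \dots + c_n^2\sigma_n^2}$, where here $\mu_i = \alpha + \beta x_i$ and $\sigma_i = \sigma$.

Thus

\[ \mu_{\hat{\beta}} = \sum_{i=1}^{n}c_i(\alpha + \beta x_i) = \alpha\sum_{i=1}^{n}c_i + \beta\sum_{i=1}^{n}c_ix_i = \beta \]

since

\[ \sum_{i=1}^{n}c_i = \frac{\sum_{i=1}^{n}(x_i-\bar{x})}{S_{xx}} = 0 \quad\text{and}\quad \sum_{i=1}^{n}c_ix_i = \frac{\sum_{i=1}^{n}(x_i-\bar{x})x_i}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})}{S_{xx}} = 1 \]

Also

\[ \sigma_{\hat{\beta}} = \sqrt{\sum_{i=1}^{n}c_i^2\sigma^2} = \sigma\sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{S_{xx}^2}} = \frac{\sigma}{\sqrt{S_{xx}}} \]

Hence $\hat{\beta}$ is normal with mean $\mu_{\hat{\beta}} = \beta$ and standard deviation

\[ \sigma_{\hat{\beta}} = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}} \]

Sampling distribution of the intercept $\hat{\alpha}$

The intercept of the least squares line is

\[ \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x} \]

It can be shown that $\hat{\alpha}$ has a normal distribution with mean and standard deviation

\[ \mu_{\hat{\alpha}} = \alpha \quad\text{and}\quad \sigma_{\hat{\alpha}} = \sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}} \]

Proof:

\[ \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \sum_{i=1}^{n}b_iy_i \quad\text{where}\quad b_i = \frac{1}{n} - c_i\bar{x} = \frac{1}{n} - \frac{(x_i-\bar{x})\bar{x}}{S_{xx}} \]

Thus

\[ \mu_{\hat{\alpha}} = b_1\mu_1 + b_2\mu_2 + \dots + b_n\mu_n = \sum_{i=1}^{n}\left(\frac{1}{n} - c_i\bar{x}\right)(\alpha + \beta x_i) \]

\[ = \alpha\left(1 - \bar{x}\sum_{i=1}^{n}c_i\right) + \beta\left(\bar{x} - \bar{x}\sum_{i=1}^{n}c_ix_i\right) \]

Now $\sum_{i=1}^{n}c_i = 0$ and $\sum_{i=1}^{n}c_ix_i = 1$. Hence

\[ \mu_{\hat{\alpha}} = \alpha(1 - 0) + \beta(\bar{x} - \bar{x}) = \alpha \]

Also

\[ \sigma_{\hat{\alpha}} = \sqrt{b_1^2\sigma^2 + b_2^2\sigma^2 + \dots + b_n^2\sigma^2} = \sigma\sqrt{b_1^2 + b_2^2 + \dots + b_n^2} \]

Now

\[ b_i^2 = \left(\frac{1}{n} - \frac{(x_i-\bar{x})\bar{x}}{S_{xx}}\right)^2 = \frac{1}{n^2} - \frac{2\bar{x}(x_i-\bar{x})}{nS_{xx}} + \frac{\bar{x}^2(x_i-\bar{x})^2}{S_{xx}^2} \]

Hence

\[ \sum_{i=1}^{n}b_i^2 = \frac{n}{n^2} - \frac{2\bar{x}}{nS_{xx}}\sum_{i=1}^{n}(x_i-\bar{x}) + \frac{\bar{x}^2}{S_{xx}^2}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \]

and

\[ \sigma_{\hat{\alpha}} = \sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} \]

Summary

1. $\hat{\beta}$ is normal with mean $\mu_{\hat{\beta}} = \beta$ and standard deviation $\sigma_{\hat{\beta}} = \dfrac{\sigma}{\sqrt{S_{xx}}} = \dfrac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}}$
2. $\hat{\alpha}$ is normal with mean $\mu_{\hat{\alpha}} = \alpha$ and standard deviation $\sigma_{\hat{\alpha}} = \sigma\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}$

Sampling distribution of the estimate of variance $s^2$

\[ s^2 = \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n-2} = \frac{\sum_{i=1}^{n}(y_i - a - bx_i)^2}{n-2} = \frac{1}{n-2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right] \]

where $a = \hat{\alpha}$ and $b = \hat{\beta}$. This estimate of $\sigma^2$ is said to be based on $n - 2$ degrees of freedom.

Recall that $y_1, y_2, \dots, y_n$ are independent, normal with mean $\alpha + \beta x_i$ and standard deviation $\sigma$. Let

\[ z_i = \frac{y_i - \alpha - \beta x_i}{\sigma} \]

Then $z_1, z_2, \dots, z_n$ are independent, normal with mean 0 and standard deviation 1, and

\[ \sum_{i=1}^{n}z_i^2 = \frac{\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2}{\sigma^2} \]

has a $\chi^2$ distribution with $n$ degrees of freedom.

If $\alpha$ and $\beta$ are replaced by their estimators $\hat{\alpha}$ and $\hat{\beta}$, then

\[ U = \frac{\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta}x_i)^2}{\sigma^2} = \frac{(n-2)s^2}{\sigma^2} \]

has a $\chi^2$ distribution with $n - 2$ degrees of freedom.

Note:

\[ E[U] = \frac{E\left[\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta}x_i)^2\right]}{\sigma^2} = \frac{(n-2)E[s^2]}{\sigma^2} = n - 2 \]

Thus

\[ E\left[\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta}x_i)^2\right] = (n-2)\sigma^2 \quad\text{and}\quad E[s^2] = \sigma^2 \]

This verifies the statement made earlier that $s^2$ is an unbiased estimator of $\sigma^2$.

Confidence limits for the slope and intercept

Recall that $\hat{\beta}$ is normal with mean $\mu_{\hat{\beta}} = \beta$ and standard deviation $\sigma_{\hat{\beta}} = \sigma/\sqrt{S_{xx}}$. Therefore

\[ z = \frac{\hat{\beta} - \mu_{\hat{\beta}}}{\sigma_{\hat{\beta}}} = \frac{\hat{\beta} - \beta}{\sigma/\sqrt{S_{xx}}} \]

has a standard normal distribution, and

\[ t = \frac{z}{s/\sigma} = \frac{\hat{\beta} - \beta}{s/\sqrt{S_{xx}}} \]

has a t distribution with $n - 2$ degrees of freedom.

$(1 - \alpha)100\%$ Confidence Limits for the slope $\beta$:

\[ \hat{\beta} \pm t_{\alpha/2}\,s_{\hat{\beta}} = \hat{\beta} \pm t_{\alpha/2}\frac{s}{\sqrt{S_{xx}}} \]

$t_{\alpha/2}$ is the critical value for the t-distribution with $n - 2$ degrees of freedom. (In $t_{\alpha/2}$ and $(1 - \alpha)100\%$ the symbol $\alpha$ denotes the significance level, not the intercept.)

Also, $\hat{\alpha}$ is normal with mean $\mu_{\hat{\alpha}} = \alpha$ and standard deviation $\sigma_{\hat{\alpha}} = \sigma\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}$. Therefore

\[ z = \frac{\hat{\alpha} - \mu_{\hat{\alpha}}}{\sigma_{\hat{\alpha}}} = \frac{\hat{\alpha} - \alpha}{\sigma\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}} \]

has a standard normal distribution, and

\[ t = \frac{z}{s/\sigma} = \frac{\hat{\alpha} - \alpha}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}} \]

has a t distribution with $n - 2$ degrees of freedom.

$(1 - \alpha)100\%$ Confidence Limits for the intercept $\alpha$:

\[ \hat{\alpha} \pm t_{\alpha/2}\,s_{\hat{\alpha}} = \hat{\alpha} \pm t_{\alpha/2}\,s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} \]

$t_{\alpha/2}$ is the critical value for the t-distribution with $n - 2$ degrees of freedom.

Example

The following data show the per capita consumption of cigarettes per month (X) in various countries in 1930, and the death rates from lung cancer for men in 1950.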
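TABLE: Per capita consumption of cigarettes per month ($X_i$) in n = 11 countries in 1930, and the death rates, $Y_i$ (per 100,000), from lung cancer for men in 1950.

Country (i)      X_i    Y_i
Australia         48     18
Canada            50     15
Denmark           38     17
Finland          110     35
Great Britain    110     46
Holland           49     24
Iceland           23      6
Norway            25      9
Sweden            30     11
Switzerland       51     25
USA              130     20

[Figure: scatter plot of death rates from lung cancer (1950) against per capita consumption of cigarettes, with each point labelled by country.]

Fitting the Least Squares Line

\[ \sum_{i=1}^{n}x_i = 664, \quad \sum_{i=1}^{n}y_i = 226, \quad \sum_{i=1}^{n}x_i^2 = 54{,}404, \quad \sum_{i=1}^{n}y_i^2 = 6{,}018, \quad \sum_{i=1}^{n}x_iy_i = 16{,}914 \]

First compute the following three quantities:

\[ S_{xx} = 54404 - \frac{(664)^2}{11} = 14322.55 \]

\[ S_{yy} = 6018 - \frac{(226)^2}{11} = 1374.73 \]

\[ S_{xy} = 16914 - \frac{(664)(226)}{11} = 3271.82 \]

Computing Estimate of Slope and Intercept

\[ b = \frac{S_{xy}}{S_{xx}} = \frac{3271.82}{14322.55} = 0.228 \]

\[ a = \bar{y} - b\bar{x} = \frac{226}{11} - 0.228\left(\frac{664}{11}\right) = 6.756 \]

\[ s = \sqrt{\frac{1}{n-2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right]} = 8.35 \]

The values 6.756 and the limits below are computed from the unrounded slope $b = 0.2284$.

95% Confidence Limits for the slope $\beta$:

\[ \hat{\beta} \pm t_{\alpha/2}\frac{s}{\sqrt{S_{xx}}} = 0.228 \pm 2.262\,\frac{8.35}{\sqrt{14322.55}} \]

0.0706 to 0.3862

$t_{.025} = 2.262$ is the critical value for the t-distribution with 9 degrees of freedom.

95% Confidence Limits for the intercept $\alpha$:

\[ \hat{\alpha} \pm t_{\alpha/2}\,s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = 6.756 \pm 2.262\,(8.35)\sqrt{\frac{1}{11} + \frac{(664/11)^2}{14322.55}} \]

-4.34 to 17.85

$t_{.025} = 2.262$ is the critical value for the t-distribution with 9 degrees of freedom.

$(1 - \alpha)100\%$ Confidence Limits for a point on the regression line, $\alpha + \beta x_0$

[Figure: the regression line $y = \alpha + \beta x$, with the point $\alpha + \beta x_0$ marked above $x_0$ on the x-axis.]

Let $\hat{y}_0 = \hat{\alpha} + \hat{\beta}x_0$. Then

\[ E[\hat{y}_0] = \alpha + \beta x_0 \quad\text{and}\quad \mathrm{Var}[\hat{y}_0] = \sigma^2\left[\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}\right] \]

Proof: Note that

\[ \hat{\beta} = \sum_{i=1}^{n}c_iy_i \quad\text{where}\quad c_i = \frac{x_i-\bar{x}}{S_{xx}} \]

and

\[ \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \sum_{i=1}^{n}b_iy_i \quad\text{where}\quad b_i = \frac{1}{n} - c_i\bar{x} = \frac{1}{n} - \frac{(x_i-\bar{x})\bar{x}}{S_{xx}} \]

Thus

\[ \hat{y}_0 = \hat{\alpha} + \hat{\beta}x_0 = \sum_{i=1}^{n}d_iy_i \quad\text{where}\quad d_i = b_i + c_ix_0 = \frac{1}{n} + \frac{(x_i-\bar{x})(x_0-\bar{x})}{S_{xx}} \]

Thus

\[ E[\hat{y}_0] = d_1\mu_1 + d_2\mu_2 + \dots + d_n\mu_n = \sum_{i=1}^{n}d_i(\alpha + \beta x_i) = \alpha\sum_{i=1}^{n}d_i + \beta\sum_{i=1}^{n}d_ix_i \]

Now

\[ \sum_{i=1}^{n}d_i = \sum_{i=1}^{n}\frac{1}{n} + \frac{x_0-\bar{x}}{S_{xx}}\sum_{i=1}^{n}(x_i-\bar{x}) = 1 \]

\[ \sum_{i=1}^{n}d_ix_i = \frac{1}{n}\sum_{i=1}^{n}x_i + \frac{x_0-\bar{x}}{S_{xx}}\sum_{i=1}^{n}(x_i-\bar{x})x_i = \bar{x} + (x_0-\bar{x}) = x_0 \]

Thus $E[\hat{y}_0] = \alpha + \beta x_0$.

Also

\[ \mathrm{Var}[\hat{y}_0] = d_1^2\sigma^2 + d_2^2\sigma^2 + \dots + d_n^2\sigma^2 = \sigma^2\left(d_1^2 + d_2^2 + \dots + d_n^2\right) \]

Now

\[ d_i^2 = \left(\frac{1}{n} + \frac{(x_i-\bar{x})(x_0-\bar{x})}{S_{xx}}\right)^2 = \frac{1}{n^2} + \frac{2(x_i-\bar{x})(x_0-\bar{x})}{nS_{xx}} + \frac{(x_i-\bar{x})^2(x_0-\bar{x})^2}{S_{xx}^2} \]

Hence

\[ \sum_{i=1}^{n}d_i^2 = \frac{n}{n^2} + \frac{2(x_0-\bar{x})}{nS_{xx}}\sum_{i=1}^{n}(x_i-\bar{x}) + \frac{(x_0-\bar{x})^2}{S_{xx}^2}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}} \]

and

\[ \mathrm{Var}[\hat{y}_0] = \sigma^2\left[\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}\right] \]

$(1 - \alpha)100\%$ Confidence Limits for a point on the regression line, $\alpha + \beta x_0$:

\[ \hat{\alpha} + \hat{\beta}x_0 \pm t_{\alpha/2}\,s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}} \]

$t_{\alpha/2}$ is the critical value for the t-distribution with $n - 2$ degrees of freedom.

Prediction

In the linear regression model, the $(1 - \alpha)100\%$ Prediction Limits for y when $x = x_0$ are:

\[ a + bx_0 \pm t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}} \]

$t_{\alpha/2}$ is the critical value for the t-distribution with $n - 2$ degrees of freedom.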
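As a rough numerical check of the formulas above, the following Python sketch recomputes the least squares fit for the cigarette data and evaluates the four sets of limits (slope, intercept, point on the line, and prediction). The function names (`fit`, `slope_limits`, `intercept_limits`, `mean_limits`, `prediction_limits`) and the evaluation point `x0 = 60` are illustrative assumptions, not part of the original slides. The critical value 2.262 is taken from the text rather than computed.

```python
import math

# Data from the table: cigarette consumption (1930) and lung cancer death rates (1950).
x = [48, 50, 38, 110, 110, 49, 23, 25, 30, 51, 130]
y = [18, 15, 17, 35, 46, 24, 6, 9, 11, 25, 20]

T_CRIT = 2.262  # t_.025 with n - 2 = 9 degrees of freedom, as quoted in the text


def fit(x, y):
    """Return the least squares estimates and the quantities the limits need."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx                                     # slope estimate
    a = y_bar - b * x_bar                               # intercept estimate
    s = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))  # estimate of sigma
    return {"n": n, "x_bar": x_bar, "s_xx": s_xx, "a": a, "b": b, "s": s}


def _limits(center, half_width):
    return (center - half_width, center + half_width)


def slope_limits(f, t):
    return _limits(f["b"], t * f["s"] / math.sqrt(f["s_xx"]))


def intercept_limits(f, t):
    se = f["s"] * math.sqrt(1 / f["n"] + f["x_bar"] ** 2 / f["s_xx"])
    return _limits(f["a"], t * se)


def mean_limits(f, x0, t):
    """Confidence limits for the point alpha + beta * x0 on the regression line."""
    se = f["s"] * math.sqrt(1 / f["n"] + (x0 - f["x_bar"]) ** 2 / f["s_xx"])
    return _limits(f["a"] + f["b"] * x0, t * se)


def prediction_limits(f, x0, t):
    """Prediction limits for a new observation y at x = x0 (note the extra 1)."""
    se = f["s"] * math.sqrt(1 + 1 / f["n"] + (x0 - f["x_bar"]) ** 2 / f["s_xx"])
    return _limits(f["a"] + f["b"] * x0, t * se)


f = fit(x, y)
x0 = 60  # hypothetical consumption level, chosen only for illustration

print("slope, intercept, s:", f["b"], f["a"], f["s"])
print("95% limits for slope:", slope_limits(f, T_CRIT))
print("95% limits for intercept:", intercept_limits(f, T_CRIT))
print("95% limits for the line at x0:", mean_limits(f, x0, T_CRIT))
print("95% prediction limits at x0:", prediction_limits(f, x0, T_CRIT))
```

The prediction limits are always wider than the confidence limits at the same `x0`, because the extra 1 under the square root accounts for the variability of a single new observation around the line.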