Statistics 846.3(02) / Statistics 349.3(02) Lecture Notes

The Simple Linear
Regression Model
Estimators in Simple Linear Regression
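\[ \hat\beta = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \]
\[ \hat\alpha = \bar{y} - \hat\beta \bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}} \bar{x} \]
and
\[ s^2 = \frac{1}{n-2} \sum_{i=1}^{n} \left( y_i - \hat\alpha - \hat\beta x_i \right)^2 = \frac{1}{n-2} \left[ S_{yy} - \frac{S_{xy}^2}{S_{xx}} \right] \]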
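The three estimators can be computed directly from the sums of squares and cross-products. The following is a minimal sketch in plain Python; the function name `least_squares_fit` and the four data points are made up for illustration and do not come from the notes.

```python
import math

def least_squares_fit(x, y):
    """Return (slope, intercept, s) for the simple linear regression of y on x."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # s^2 = (S_yy - S_xy^2 / S_xx) / (n - 2); max() guards against tiny negative round-off
    s2 = max(s_yy - s_xy ** 2 / s_xx, 0.0) / (n - 2)
    return slope, intercept, math.sqrt(s2)

# Made-up data lying exactly on the line y = 1 + 2x, so the residual s is 0
slope, intercept, s = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])
```

Because the illustrative points lie exactly on a line, the fit recovers slope 2 and intercept 1 with no residual variation.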
Sampling distributions of the estimators
Recall that if $y_1, y_2, y_3, \dots, y_n$ are
1. Independent
2. Normally distributed with means $\mu_1, \mu_2, \mu_3, \dots, \mu_n$ and standard deviations $\sigma_1, \sigma_2, \sigma_3, \dots, \sigma_n$
then $L = c_1 y_1 + c_2 y_2 + c_3 y_3 + \dots + c_n y_n$ is normal with mean
\[ \mu_L = c_1 \mu_1 + c_2 \mu_2 + \dots + c_n \mu_n \]
and standard deviation
\[ \sigma_L = \sqrt{c_1^2 \sigma_1^2 + c_2^2 \sigma_2^2 + \dots + c_n^2 \sigma_n^2} \]
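As a small illustration of this rule, the sketch below computes the mean and standard deviation of a linear combination from its coefficients. The helper name `linear_combination_moments` and the numbers (four independent observations with mean 10 and standard deviation 2, averaged with weights 1/4) are assumptions chosen only for the example.

```python
import math

def linear_combination_moments(c, mu, sigma):
    """Mean and standard deviation of L = sum(c_i * y_i) for independent y_i."""
    mean = sum(ci * mi for ci, mi in zip(c, mu))
    # Variances add, each scaled by the squared coefficient
    sd = math.sqrt(sum((ci * si) ** 2 for ci, si in zip(c, sigma)))
    return mean, sd

# Illustrative case: the sample mean of 4 independent N(10, 2^2) observations
mean_L, sd_L = linear_combination_moments([0.25] * 4, [10.0] * 4, [2.0] * 4)
```

Here the result is the familiar one for a sample mean: the mean stays at 10 and the standard deviation shrinks to 2 / sqrt(4) = 1.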
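Sampling distribution of the slope $\hat\beta$
Note:
\[ \hat\beta = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Also
\[ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x}) y_i - \bar{y} \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x}) y_i \]
since $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
Thus
\[ \hat\beta = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{S_{xx}} = \sum_{i=1}^{n} c_i y_i \qquad \text{where} \qquad c_i = \frac{x_i - \bar{x}}{S_{xx}} \]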
Hence $\hat\beta$ is normal with mean
\[ \mu_{\hat\beta} = c_1 \mu_1 + c_2 \mu_2 + \dots + c_n \mu_n \]
and standard deviation
\[ \sigma_{\hat\beta} = \sqrt{c_1^2 \sigma_1^2 + c_2^2 \sigma_2^2 + \dots + c_n^2 \sigma_n^2} \]
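Thus, with $\mu_i = \alpha + \beta x_i$ and $c_i = (x_i - \bar{x}) / S_{xx}$,
\[ \mu_{\hat\beta} = c_1 \mu_1 + c_2 \mu_2 + \dots + c_n \mu_n = \frac{x_1 - \bar{x}}{S_{xx}} (\alpha + \beta x_1) + \dots + \frac{x_n - \bar{x}}{S_{xx}} (\alpha + \beta x_n) \]
\[ = \alpha \, \frac{(x_1 - \bar{x}) + \dots + (x_n - \bar{x})}{S_{xx}} + \beta \, \frac{x_1 (x_1 - \bar{x}) + \dots + x_n (x_n - \bar{x})}{S_{xx}} = \beta \]
since
\[ (x_1 - \bar{x}) + \dots + (x_n - \bar{x}) = 0 \]
and
\[ x_1 (x_1 - \bar{x}) + \dots + x_n (x_n - \bar{x}) = S_{xx} \]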
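Also, with $\sigma_i = \sigma$ for every $i$,
\[ \sigma_{\hat\beta} = \sqrt{c_1^2 \sigma_1^2 + c_2^2 \sigma_2^2 + \dots + c_n^2 \sigma_n^2} = \sqrt{\left( \frac{x_1 - \bar{x}}{S_{xx}} \right)^2 \sigma^2 + \dots + \left( \frac{x_n - \bar{x}}{S_{xx}} \right)^2 \sigma^2} \]
\[ = \frac{\sigma}{S_{xx}} \sqrt{(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2} = \frac{\sigma}{\sqrt{S_{xx}}} \]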
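Hence $\hat\beta$ is normal with mean
\[ \mu_{\hat\beta} = \beta \]
and standard deviation
\[ \sigma_{\hat\beta} = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \]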
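The three facts about the weights $c_i = (x_i - \bar{x})/S_{xx}$ that drive this result (they sum to 0, $\sum c_i x_i = 1$, and $\sum c_i^2 = 1/S_{xx}$) can be checked numerically. This is only a sketch: the design points, the value of sigma, and the helper name `slope_weights` are assumptions made up for the illustration.

```python
import math

def slope_weights(x):
    """Return the weights c_i = (x_i - x_bar) / S_xx and S_xx itself."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / s_xx for xi in x], s_xx

x = [1.0, 2.0, 4.0, 7.0]   # made-up design points
sigma = 3.0                # assumed error standard deviation
c, s_xx = slope_weights(x)

sum_c = sum(c)                                   # coefficient of alpha in the mean
sum_cx = sum(ci * xi for ci, xi in zip(c, x))    # coefficient of beta in the mean
sum_c2 = sum(ci ** 2 for ci in c)                # equals 1 / S_xx
sd_slope = sigma * math.sqrt(sum_c2)             # equals sigma / sqrt(S_xx)
```

With `sum_c` equal to 0 and `sum_cx` equal to 1, the mean of the slope estimator reduces to beta, and `sd_slope` matches sigma divided by the square root of `s_xx`.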
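Sampling distribution of the intercept $\hat\alpha$
The sampling distribution of the intercept of the least squares line:
\[ \hat\alpha = \bar{y} - \hat\beta \bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}} \bar{x} \]
It can be shown that $\hat\alpha$ has a normal distribution with mean and standard deviation
\[ \mu_{\hat\alpha} = \alpha \qquad \text{and} \qquad \sigma_{\hat\alpha} = \sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \]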
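Proof:
\[ \hat\alpha = \bar{y} - \hat\beta \bar{x} = \sum_{i=1}^{n} b_i y_i \qquad \text{where} \qquad b_i = \frac{1}{n} - c_i \bar{x} = \frac{1}{n} - \frac{(x_i - \bar{x}) \bar{x}}{S_{xx}} \]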
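Thus
\[ \mu_{\hat\alpha} = b_1 \mu_1 + b_2 \mu_2 + \dots + b_n \mu_n = \left( \frac{1}{n} - c_1 \bar{x} \right) (\alpha + \beta x_1) + \dots + \left( \frac{1}{n} - c_n \bar{x} \right) (\alpha + \beta x_n) \]
\[ = \left( \frac{\alpha}{n} - \alpha c_1 \bar{x} + \frac{\beta x_1}{n} - \beta c_1 \bar{x} x_1 \right) + \dots + \left( \frac{\alpha}{n} - \alpha c_n \bar{x} + \frac{\beta x_n}{n} - \beta c_n \bar{x} x_n \right) \]
\[ = \alpha - \alpha \bar{x} \sum_{i=1}^{n} c_i + \beta \, \frac{1}{n} \sum_{i=1}^{n} x_i - \beta \bar{x} \sum_{i=1}^{n} c_i x_i \]
Now
\[ c_i = \frac{x_i - \bar{x}}{S_{xx}}, \qquad \text{hence} \qquad \sum_{i=1}^{n} c_i = 0 \]
and
\[ \sum_{i=1}^{n} c_i x_i = \frac{\sum_{i=1}^{n} x_i (x_i - \bar{x})}{S_{xx}} = 1 \]
Hence
\[ \mu_{\hat\alpha} = \alpha - 0 + \beta \bar{x} - \beta \bar{x} = \alpha \]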
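Also
\[ \sigma_{\hat\alpha} = \sqrt{b_1^2 \sigma_1^2 + b_2^2 \sigma_2^2 + \dots + b_n^2 \sigma_n^2} = \sigma \sqrt{b_1^2 + b_2^2 + \dots + b_n^2} \]
Now
\[ b_i^2 = \left[ \frac{1}{n} - \frac{(x_i - \bar{x}) \bar{x}}{S_{xx}} \right]^2 = \frac{1}{n^2} - \frac{2}{n} \frac{(x_i - \bar{x}) \bar{x}}{S_{xx}} + \frac{\bar{x}^2 (x_i - \bar{x})^2}{S_{xx}^2} \]
Hence
\[ \sum_{i=1}^{n} b_i^2 = n \, \frac{1}{n^2} - \frac{2 \bar{x}}{n S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) + \frac{\bar{x}^2}{S_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \]
and
\[ \sigma_{\hat\alpha} = \sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = \sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \]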
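The same kind of numerical check works for the intercept weights $b_i = 1/n - (x_i - \bar{x})\bar{x}/S_{xx}$. The sketch below uses made-up design points and a hypothetical helper `intercept_weights`; it confirms that the weights sum to 1, that $\sum b_i x_i = 0$, and that $\sum b_i^2$ matches the closed form $1/n + \bar{x}^2/S_{xx}$.

```python
def intercept_weights(x):
    """Return the weights b_i = 1/n - (x_i - x_bar) * x_bar / S_xx, plus x_bar and S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = [1.0 / n - (xi - x_bar) * x_bar / s_xx for xi in x]
    return b, x_bar, s_xx

x = [1.0, 2.0, 4.0, 7.0]   # made-up design points
b, x_bar, s_xx = intercept_weights(x)

sum_b = sum(b)                                   # coefficient of alpha in the mean
sum_bx = sum(bi * xi for bi, xi in zip(b, x))    # coefficient of beta in the mean
sum_b2 = sum(bi ** 2 for bi in b)                # variance factor
closed_form = 1.0 / len(x) + x_bar ** 2 / s_xx   # 1/n + x_bar^2 / S_xx
```

Since `sum_b` is 1 and `sum_bx` is 0, the mean of the intercept estimator is alpha, and multiplying the square root of `sum_b2` by sigma gives its standard deviation.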
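Summary
1. $\hat\beta$ is normal with mean $\mu_{\hat\beta} = \beta$ and standard deviation
\[ \sigma_{\hat\beta} = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \]
2. $\hat\alpha$ is normal with mean and standard deviation
\[ \mu_{\hat\alpha} = \alpha \qquad \text{and} \qquad \sigma_{\hat\alpha} = \sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \]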
Sampling distribution of the estimate of variance $s^2$
The sampling distribution of $s^2$
\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2} = \frac{\sum_{i=1}^{n} (y_i - a - b x_i)^2}{n-2} = \frac{1}{n-2} \left[ S_{yy} - \frac{S_{xy}^2}{S_{xx}} \right] \]
This estimate of $\sigma^2$ is said to be based on n - 2 degrees of freedom.
The sampling distribution of $s^2$
Recall that $y_1, y_2, \dots, y_n$ are independent, normal with mean $\alpha + \beta x_i$ and standard deviation $\sigma$.
Let
\[ z_i = \frac{y_i - \alpha - \beta x_i}{\sigma} \]
Then $z_1, z_2, \dots, z_n$ are independent, normal with mean 0 and standard deviation 1, and
\[ \sum_{i=1}^{n} z_i^2 = \sum_{i=1}^{n} \left( \frac{y_i - \alpha - \beta x_i}{\sigma} \right)^2 = \frac{\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2}{\sigma^2} \]
has a $\chi^2$ distribution with n degrees of freedom.
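If $\alpha$ and $\beta$ are replaced by their estimators $\hat\alpha$ and $\hat\beta$, then
\[ U = \frac{\sum_{i=1}^{n} \left( y_i - \hat\alpha - \hat\beta x_i \right)^2}{\sigma^2} = \frac{(n-2) s^2}{\sigma^2} \]
has a $\chi^2$ distribution with n - 2 degrees of freedom.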
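Note: since the mean of a $\chi^2$ variable equals its degrees of freedom,
\[ E[U] = \frac{E\left[ \sum_{i=1}^{n} \left( y_i - \hat\alpha - \hat\beta x_i \right)^2 \right]}{\sigma^2} = \frac{(n-2) E[s^2]}{\sigma^2} = n - 2 \]
Thus
\[ E\left[ \sum_{i=1}^{n} \left( y_i - \hat\alpha - \hat\beta x_i \right)^2 \right] = (n-2) \sigma^2 \qquad \text{and} \qquad E[s^2] = \sigma^2 \]
This verifies the statement made earlier that $s^2$ is an unbiased estimator of $\sigma^2$.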
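Unbiasedness of $s^2$ can also be checked empirically by simulation. The sketch below is not part of the original notes: the design points, the parameter values (alpha = 2, beta = 0.5, sigma = 3), the seed, and the helper names are all assumptions chosen for illustration. It repeatedly generates data from the model, computes $s^2$ each time, and averages the results.

```python
import random

def residual_variance(x, y):
    """s^2 = (S_yy - S_xy^2 / S_xx) / (n - 2) for the regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return (s_yy - s_xy ** 2 / s_xx) / (n - 2)

def simulate_mean_s2(x, alpha, beta, sigma, reps, seed):
    """Average s^2 over `reps` simulated samples from y = alpha + beta*x + N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(reps):
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        total += residual_variance(x, y)
    return total / reps

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # made-up design points
mean_s2 = simulate_mean_s2(x, alpha=2.0, beta=0.5, sigma=3.0, reps=20000, seed=1)
```

With these assumed settings the average of the simulated $s^2$ values should land close to $\sigma^2 = 9$, up to Monte Carlo error.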
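Recall
$\hat\beta$ is normal with mean $\mu_{\hat\beta} = \beta$ and standard deviation
\[ \sigma_{\hat\beta} = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \]
Therefore
\[ z = \frac{\hat\beta - \mu_{\hat\beta}}{\sigma_{\hat\beta}} = \frac{\hat\beta - \beta}{\sigma / \sqrt{S_{xx}}} \]
has a standard normal distribution, and
\[ t = \frac{z}{s / \sigma} = \frac{\hat\beta - \beta}{s / \sqrt{S_{xx}}} \]
has a t distribution with n - 2 degrees of freedom.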
$(1 - \alpha)100\%$ Confidence Limits for slope $\beta$:
\[ \hat\beta \pm t_{\alpha/2} \, s_{\hat\beta} = \hat\beta \pm t_{\alpha/2} \frac{s}{\sqrt{S_{xx}}} \]
$t_{\alpha/2}$ is the critical value for the t-distribution with n - 2 degrees of freedom.
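A minimal sketch of this interval as a function follows. The name `slope_confidence_limits` is hypothetical, and the critical value is passed in rather than computed, since the Python standard library has no t-distribution quantile function. The numbers in the usage line are the summary values from the cigarette example later in these notes (slope carried to four decimals).

```python
import math

def slope_confidence_limits(slope_hat, s, s_xx, t_crit):
    """Limits slope_hat -/+ t_crit * s / sqrt(S_xx) for the slope."""
    half_width = t_crit * s / math.sqrt(s_xx)
    return slope_hat - half_width, slope_hat + half_width

# Usage with the summary values from the cigarette example (n = 11, 9 df)
lower, upper = slope_confidence_limits(0.2284, 8.35, 14322.55, 2.262)
```

For other confidence levels or sample sizes, the caller would look up the appropriate critical value in a t table.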
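Also
$\hat\alpha$ is normal with mean $\mu_{\hat\alpha} = \alpha$ and standard deviation
\[ \sigma_{\hat\alpha} = \sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} \]
Therefore
\[ z = \frac{\hat\alpha - \mu_{\hat\alpha}}{\sigma_{\hat\alpha}} = \frac{\hat\alpha - \alpha}{\sigma \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}} \]
has a standard normal distribution, and
\[ t = \frac{z}{s / \sigma} = \frac{\hat\alpha - \alpha}{s \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}} \]
has a t distribution with n - 2 degrees of freedom.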
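$(1 - \alpha)100\%$ Confidence Limits for intercept $\alpha$:
\[ \hat\alpha \pm t_{\alpha/2} \, s_{\hat\alpha} = \hat\alpha \pm t_{\alpha/2} \, s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} \]
$t_{\alpha/2}$ is the critical value for the t-distribution with n - 2 degrees of freedom.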
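The intercept interval can be sketched the same way. Again the function name `intercept_confidence_limits` is hypothetical and the t critical value is supplied by the caller; the usage line plugs in the summary values from the cigarette example later in these notes.

```python
import math

def intercept_confidence_limits(intercept_hat, s, n, x_bar, s_xx, t_crit):
    """Limits intercept_hat -/+ t_crit * s * sqrt(1/n + x_bar^2 / S_xx)."""
    half_width = t_crit * s * math.sqrt(1.0 / n + x_bar ** 2 / s_xx)
    return intercept_hat - half_width, intercept_hat + half_width

# Usage with the summary values from the cigarette example (n = 11, 9 df)
lower, upper = intercept_confidence_limits(6.756, 8.35, 11, 664 / 11, 14322.55, 2.262)
```

The interval is wide relative to the slope interval because x = 0 lies well outside the bulk of the observed x values, which inflates the second term under the square root.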
The following data show the per capita consumption of cigarettes per month (X) in various countries in 1930, and the death rates from lung cancer for men in 1950.
TABLE: Per capita consumption of cigarettes per month (Xi) in n = 11 countries in 1930, and the death rates, Yi (per 100,000), from lung cancer for men in 1950.
Country (i)      Xi    Yi
Australia        48    18
Canada           50    15
Denmark          38    17
Finland         110    35
Great Britain   110    46
Holland          49    24
Iceland          23     6
Norway           25     9
Sweden           30    11
Switzerland      51    25
USA             130    20
[Figure: scatter plot of death rates from lung cancer (1950) against per capita consumption of cigarettes, with each point labelled by country.]
Fitting the Least Squares Line
\[ \sum_{i=1}^{n} x_i = 664, \qquad \sum_{i=1}^{n} y_i = 226 \]
\[ \sum_{i=1}^{n} x_i^2 = 54{,}404, \qquad \sum_{i=1}^{n} y_i^2 = 6{,}018, \qquad \sum_{i=1}^{n} x_i y_i = 16{,}914 \]
Fitting the Least Squares Line
First compute the following three quantities:
\[ S_{xx} = 54404 - \frac{(664)^2}{11} = 14322.55 \]
\[ S_{yy} = 6018 - \frac{(226)^2}{11} = 1374.73 \]
\[ S_{xy} = 16914 - \frac{(664)(226)}{11} = 3271.82 \]
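Computing Estimate of Slope and Intercept
\[ b = \frac{S_{xy}}{S_{xx}} = \frac{3271.82}{14322.55} = 0.2284 \]
\[ a = \bar{y} - b \bar{x} = \frac{226}{11} - 0.2284 \left( \frac{664}{11} \right) = 6.756 \quad \text{(using the unrounded value of } b \text{)} \]
\[ s = \sqrt{\frac{1}{n-2} \left[ S_{yy} - \frac{S_{xy}^2}{S_{xx}} \right]} = 8.35 \]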
95% Confidence Limits for slope $\beta$:
\[ \hat\beta \pm t_{\alpha/2} \frac{s}{\sqrt{S_{xx}}} \]
\[ 0.2284 \pm 2.262 \, \frac{8.35}{\sqrt{14322.55}} \]
0.0706 to 0.3862
$t_{.025} = 2.262$ is the critical value for the t-distribution with 9 degrees of freedom.
95% Confidence Limits for intercept $\alpha$:
\[ \hat\alpha \pm t_{\alpha/2} \, s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} \]
\[ 6.756 \pm 2.262 \, (8.35) \sqrt{\frac{1}{11} + \frac{(664/11)^2}{14322.55}} \]
-4.34 to 17.85
$t_{.025} = 2.262$ is the critical value for the t-distribution with 9 degrees of freedom.
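The point estimates in this worked example can be recomputed from the raw table as a check on the arithmetic. The sketch below uses only the data printed in the table and the computing formulas from these notes; the variable names are my own.

```python
import math

# Data from the table: cigarette consumption (1930) and lung cancer death rate (1950)
x = [48, 50, 38, 110, 110, 49, 23, 25, 30, 51, 130]
y = [18, 15, 17, 35, 46, 24, 6, 9, 11, 25, 20]
n = len(x)

# Computing formulas: raw sums minus the correction term
s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n               # about 14322.55
s_yy = sum(yi * yi for yi in y) - sum(y) ** 2 / n               # about 1374.73
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # about 3271.82

b = s_xy / s_xx                                     # slope, about 0.228
a = sum(y) / n - b * sum(x) / n                     # intercept, about 6.756
s = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))  # residual sd, about 8.35

print(round(b, 4), round(a, 3), round(s, 2))
```

Carrying the slope at full precision is what gives the intercept 6.756; rounding the slope first shifts the intercept in the second decimal place.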
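$(1 - \alpha)100\%$ Confidence Limits for a point on the regression line $\alpha + \beta x_0$:
[Figure: the regression line $y = \alpha + \beta x$, with the point $\alpha + \beta x_0$ on the line marked above $x_0$ on the x axis.]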
Let
\[ \hat{y}_0 = \hat\alpha + \hat\beta x_0 \]
then
\[ E(\hat{y}_0) = \alpha + \beta x_0 \]
and
\[ \mathrm{Var}(\hat{y}_0) = \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right] \]
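Proof:
Note
\[ \hat\beta = \sum_{i=1}^{n} c_i y_i \qquad \text{where} \qquad c_i = \frac{x_i - \bar{x}}{S_{xx}} \]
and
\[ \hat\alpha = \bar{y} - \hat\beta \bar{x} = \sum_{i=1}^{n} b_i y_i \qquad \text{where} \qquad b_i = \frac{1}{n} - c_i \bar{x} = \frac{1}{n} - \frac{(x_i - \bar{x}) \bar{x}}{S_{xx}} \]
Thus
\[ \hat{y}_0 = \hat\alpha + \hat\beta x_0 = \sum_{i=1}^{n} d_i y_i \qquad \text{where} \qquad d_i = b_i + c_i x_0 = \frac{1}{n} + \frac{(x_i - \bar{x})(x_0 - \bar{x})}{S_{xx}} \]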
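Thus
\[ E(\hat{y}_0) = d_1 \mu_1 + d_2 \mu_2 + \dots + d_n \mu_n = d_1 (\alpha + \beta x_1) + \dots + d_n (\alpha + \beta x_n) = \alpha \sum_{i=1}^{n} d_i + \beta \sum_{i=1}^{n} d_i x_i \]
Now
\[ \sum_{i=1}^{n} d_i = \sum_{i=1}^{n} \left[ \frac{1}{n} + \frac{(x_i - \bar{x})(x_0 - \bar{x})}{S_{xx}} \right] = 1 + \frac{x_0 - \bar{x}}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) = 1 \]
and
\[ \sum_{i=1}^{n} d_i x_i = \sum_{i=1}^{n} x_i \left[ \frac{1}{n} + \frac{(x_i - \bar{x})(x_0 - \bar{x})}{S_{xx}} \right] = \frac{1}{n} \sum_{i=1}^{n} x_i + \frac{x_0 - \bar{x}}{S_{xx}} \sum_{i=1}^{n} x_i (x_i - \bar{x}) = \bar{x} + (x_0 - \bar{x}) = x_0 \]
Thus
\[ E(\hat{y}_0) = \alpha + \beta x_0 \]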
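Also
\[ \mathrm{Var}(\hat{y}_0) = d_1^2 \sigma_1^2 + d_2^2 \sigma_2^2 + \dots + d_n^2 \sigma_n^2 = \sigma^2 \left( d_1^2 + d_2^2 + \dots + d_n^2 \right) \]
Now
\[ d_i^2 = \left[ \frac{1}{n} + \frac{(x_i - \bar{x})(x_0 - \bar{x})}{S_{xx}} \right]^2 = \frac{1}{n^2} + \frac{2}{n} \frac{(x_i - \bar{x})(x_0 - \bar{x})}{S_{xx}} + \frac{(x_i - \bar{x})^2 (x_0 - \bar{x})^2}{S_{xx}^2} \]
Hence
\[ \sum_{i=1}^{n} d_i^2 = n \, \frac{1}{n^2} + \frac{2 (x_0 - \bar{x})}{n S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) + \frac{(x_0 - \bar{x})^2}{S_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \]
and
\[ \mathrm{Var}(\hat{y}_0) = \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right] \]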
$(1 - \alpha)100\%$ Confidence Limits for a point on the regression line $\alpha + \beta x_0$:
\[ \hat\alpha + \hat\beta x_0 \pm t_{\alpha/2} \, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \]
$t_{\alpha/2}$ is the critical value for the t-distribution with n - 2 degrees of freedom.
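A short sketch of these limits shows how the interval widens as $x_0$ moves away from $\bar{x}$. The function name `mean_response_limits` is hypothetical, and the two evaluation points (the sample mean of x and x0 = 130) are my own choices; the summary values are those of the cigarette example in these notes, with the t critical value supplied by the caller.

```python
import math

def mean_response_limits(a, b, x0, s, n, x_bar, s_xx, t_crit):
    """Confidence limits for the point alpha + beta * x0 on the regression line."""
    fitted = a + b * x0
    half_width = t_crit * s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / s_xx)
    return fitted - half_width, fitted + half_width

x_bar = 664 / 11
# At x0 = x_bar the second term under the root vanishes, giving the narrowest interval
near = mean_response_limits(6.756, 0.2284, x_bar, 8.35, 11, x_bar, 14322.55, 2.262)
# Far from x_bar the interval is wider
far = mean_response_limits(6.756, 0.2284, 130.0, 8.35, 11, x_bar, 14322.55, 2.262)

width_near = near[1] - near[0]
width_far = far[1] - far[0]
```

The comparison of `width_near` and `width_far` illustrates why estimates of the line are most precise near the centre of the observed x values.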
Prediction
In the linear regression model
$(1 - \alpha)100\%$ Prediction Limits for y when $x = x_0$:
\[ a + b x_0 \pm t_{\alpha/2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \]
$t_{\alpha/2}$ is the critical value for the t-distribution with n - 2 degrees of freedom.
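The prediction limits differ from the confidence limits for the line only by the extra 1 under the square root, which accounts for the variability of the new observation itself. The sketch below makes that concrete; the function name `prediction_limits` and the choice x0 = 60 are assumptions for illustration, and the other inputs are the summary values from the cigarette example in these notes.

```python
import math

def prediction_limits(a, b, x0, s, n, x_bar, s_xx, t_crit):
    """Prediction limits for a new observation y at x = x0."""
    fitted = a + b * x0
    # The leading 1.0 is the variance contribution of the new observation
    half_width = t_crit * s * math.sqrt(1.0 + 1.0 / n + (x0 - x_bar) ** 2 / s_xx)
    return fitted - half_width, fitted + half_width

# Hypothetical new country with x0 = 60 cigarettes per capita per month
lower, upper = prediction_limits(6.756, 0.2284, 60.0, 8.35, 11, 664 / 11, 14322.55, 2.262)
half = (upper - lower) / 2
```

However large n becomes, `half` can never fall below the critical value times s, whereas the confidence limits for the line itself shrink toward zero width.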