Statistics for Business and
Economics
Dr. TANG Yu
Department of Mathematics
Soochow University
May 28, 2007
Types of Correlation
Positive correlation: slope β1 is positive
Negative correlation: slope β1 is negative
No correlation: slope β1 is zero
Hypothesis Test
For the simple linear regression model
y = β0 + β1x + ε
If x and y are linearly related, we must have β1 ≠ 0.
We will use the sample data to test the following hypotheses about the parameter β1:
H0: β1 = 0    Ha: β1 ≠ 0
Sampling Distribution
Just as the sampling distribution of the sample mean, X-bar, depends on the mean, standard deviation and shape of the X population, the sampling distributions of the β0-hat and β1-hat least squares estimators depend on the properties of the {Yj} sub-populations (j = 1, …, n).
yj = β0 + β1xj + εj
Given xj, the properties of the {Yj} sub-population are determined by the εj error/random variable.
Model Assumption
As regards the probability distributions of εj (j = 1, …, n), it is assumed that:
i. Each εj is normally distributed;
ii. Each εj has zero mean;
iii. Each εj has the same variance, σε²;
iv. The errors are independent of each other;
v. The error does not depend on the independent variable(s).
Consequently:
Yj is also normal;
E(Yj) = β0 + β1xj;
Var(Yj) = σε² is also constant;
{Yi} and {Yj}, i ≠ j, are also independent;
The effects of X and ε on Y can be separated from each other.
Graph Show
[Figure: the regression line E(Y) = β0 + β1X, with normal Y distributions drawn at xi and xj: Yi : N(β0 + β1xi; σ), Yj : N(β0 + β1xj; σ). The Y distributions have the same shape at each x value.]
Sum of Squares
Sum of squares due to error (SSE):
SSE = ε̂1² + ε̂2² + ⋯ + ε̂n² = Σ(Yi − Ŷi)²
Sum of squares due to regression (SSR):
SSR = Σ(Ŷi − Ȳ)²
Total sum of squares (SST):
SST = SYY = Σ(Yi − Ȳ)² = SSE + SSR
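The decomposition SST = SSE + SSR can be checked numerically. A minimal Python sketch (not part of the lecture) with made-up toy data; the fit is an ordinary least-squares fit, which is what makes the cross term vanish:

```python
# Toy data (illustrative only) and an ordinary least-squares fit.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

SSE = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))   # error sum of squares
SSR = sum((fi - ybar) ** 2 for fi in yhat)             # regression sum of squares
SST = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares
# For a least-squares fit, SST = SSE + SSR (up to rounding).
```

With made-up fitted values instead of a least-squares fit, the identity generally fails; it relies on the residuals being orthogonal to the fitted values.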
ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square       | F
Regression          | SSR            | 1                  | MSR = SSR/1       | MSR/MSE
Error               | SSE            | n − 2              | MSE = SSE/(n − 2) |
Total               | SST            | n − 1              |                   |
Example
Score (y) | LSD Conc (x) | x − x̄  | y − ȳ   | (x − x̄)²      | (x − x̄)(y − ȳ)  | (y − ȳ)²
78.93     | 1.17         | -3.163 | 28.843  | 10.004569     | -91.230409       | 831.918649
58.20     | 2.97         | -1.363 | 8.113   | 1.857769      | -11.058019       | 65.820769
67.47     | 3.26         | -1.073 | 17.383  | 1.151329      | -18.651959       | 302.168689
37.47     | 4.69         | 0.357  | -12.617 | 0.127449      | -4.504269        | 159.188689
45.65     | 5.83         | 1.497  | -4.437  | 2.241009      | -6.642189        | 19.686969
32.92     | 6.00         | 1.667  | -17.167 | 2.778889      | -28.617389       | 294.705889
29.97     | 6.41         | 2.077  | -20.117 | 4.313929      | -41.783009       | 404.693689
Total: 350.61 | 30.33    | -0.001 | 0.001   | Sxx=22.474943 | Sxy=-202.487243  | Syy=2078.183343

ȳ = 350.61/7 = 50.087    x̄ = 30.33/7 = 4.333
β̂1 = Sxy/Sxx = −202.4872/22.4749 = −9.01
β̂0 = ȳ − β̂1x̄ = 50.09 − (−9.01)(4.33) = 89.10
ŷ = 89.10 − 9.01x

SSE
Yi:         78.93    58.20    62.34... (see table)
Yi          | Ŷi      | Yi − Ŷi | (Yi − Ŷi)²
78.93       | 78.5583 | 0.3717  | 0.138161
58.20       | 62.3403 | -4.1403 | 17.14208
67.47       | 59.7274 | 7.7426  | 59.94785
37.47       | 46.8431 | -9.3731 | 87.855
45.65       | 36.5717 | 9.0783  | 82.41553
32.92       | 35.04   | -2.12   | 4.4944
29.97       | 31.3459 | -1.3759 | 1.893101
SSE = Σ(Yi − Ŷi)² = 253.886
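The calculations above can be reproduced in a few lines of plain Python; this is a sketch, not part of the lecture, and the variable names are mine:

```python
# The LSD example data from the table above.
y = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
n = len(x)
xbar = sum(x) / n                                              # 4.333
ybar = sum(y) / n                                              # 50.087
Sxx = sum((xi - xbar) ** 2 for xi in x)                        # 22.4749
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # -202.4872
b1 = Sxy / Sxx                                                 # about -9.01
b0 = ybar - b1 * xbar                                          # about 89.1
SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # about 253.9
```

Note the slide's 89.10 comes from using the rounded values 50.09, −9.01 and 4.33; carrying full precision gives an intercept near 89.12 and the same fitted line for practical purposes.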
SST and SSR
S xx  22.475
S xy  202.487
yˆ  89.10  9.01x
SST  SYY  2078.183
S yy  2078.183
SSE  253.89
SSR  SST  SSE  1824.3
ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square  | F
Regression          | 1824.3         | 1                  | MSR = 1824.3 | 35.93
Error               | 253.9          | 5                  | MSE = 50.78  |
Total               | 2078.2         | 6                  |              |
Since F = 35.93 > 6.61, where 6.61 is the critical value of the F-distribution with 1 and 5 degrees of freedom (significance level .05), we reject H0 and conclude that the relationship between x and y is significant.
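The F statistic follows directly from the ANOVA table; a short stdlib-only sketch (`scipy.stats.f.ppf(0.95, 1, 5)` would reproduce the 6.61 critical value, but is not used here):

```python
# ANOVA arithmetic for the example above.
SSR, SSE = 1824.3, 253.9
df_reg, df_err = 1, 5        # 1 and n - 2 = 5 degrees of freedom
MSR = SSR / df_reg           # 1824.3
MSE = SSE / df_err           # 50.78
F = MSR / MSE                # about 35.9
# F exceeds the .05 critical value 6.61, so H0: beta1 = 0 is rejected.
```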
Hypothesis Test
For the simple linear regression model
y = β0 + β1x + ε
If x and y are linearly related, we must have β1 ≠ 0.
We will use the sample data to test the following hypotheses about the parameter β1:
H0: β1 = 0    Ha: β1 ≠ 0
Standard Errors
Standard error of estimate: the sample standard deviation of ε:
sε = √MSE = √(SSE/(n − 2))
Replacing σε with its estimate sε, the estimated standard error of β̂1 is
sβ̂1 = sε/√Sxx,  where Sxx = Σ(xi − x̄)²
t-test
Hypothesis:
H0: β1 = 0    Ha: β1 ≠ 0
Test statistic:
t = β̂1 / sβ̂1
where t follows a t-distribution with n − 2 degrees of freedom.
Reject Rule
Hypothesis:
H0: β1 = 0    Ha: β1 ≠ 0
This is a two-tailed test.
p-value approach: reject H0 if p-value ≤ α
Critical value approach: reject H0 if t ≤ −tα/2 or t ≥ tα/2
Example
(The same data table and calculations as before: ŷ = 89.10 − 9.01x, Sxx = 22.475, SSE = 253.886.)
Calculation
s  MSE 
s

S xx
sˆ 
1
t
ˆ1
sˆˆ

SSE
253.89

 7.1258
n2
72
s
2


x

x
 i

7.1258
 1.5031
22.475
 9.01
 5.9943  2.571
1.5031
1
where 2.571 is the critical value for t-distribution
with degree of freedom 5 (significant level
takes .025), so we reject H0, and conclude that
the relationship between x and y is significant
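The same arithmetic in a short Python sketch (a check of the numbers above, not part of the lecture):

```python
import math

# t statistic for H0: beta1 = 0, using the quantities computed above.
SSE, n, Sxx, b1 = 253.89, 7, 22.475, -9.01
s = math.sqrt(SSE / (n - 2))      # standard error of estimate, about 7.126
se_b1 = s / math.sqrt(Sxx)        # estimated standard error of b1, about 1.503
t = b1 / se_b1                    # about -5.99; |t| > 2.571, so reject H0
```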
Confidence Interval
β̂1 is an estimator of β1, and
t = (β̂1 − β1)/sβ̂1
follows a t-distribution with n − 2 degrees of freedom.
The estimated standard error of β̂1 is
sβ̂1 = sε/√Sxx,  where Sxx = Σ(xi − x̄)²
So the C% confidence interval estimator of β1 is
β̂1 ± tα/2, n−2 · sβ̂1
Example
The 95% confidence interval estimate of β1 in the previous example is
−9.01 ± 2.571 × 1.5031 = −9.01 ± 3.86
i.e., from −12.87 to −5.15, which does not contain 0.
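The interval endpoints can be verified directly; a minimal sketch using the values from the example:

```python
# 95% confidence interval for beta1 from the example.
b1, se_b1, t_crit = -9.01, 1.5031, 2.571   # t.025 with 5 degrees of freedom
half = t_crit * se_b1                      # about 3.86
lower, upper = b1 - half, b1 + half        # about -12.87 and -5.15
# 0 lies outside (lower, upper), consistent with rejecting H0.
```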
Regression Equation
It is believed that the longer one studies, the better one's grade is. The final mark (Y) regressed on study time (X) is supposed to follow the regression equation:
ŷ = β̂0 + β̂1x = 21.590 + 1.877x
If the fit of the sample regression equation is satisfactory, it can be used to estimate the mean value of the dependent variable or to predict an individual value of it.
Estimate and Predict
ŷ = β̂0 + β̂1x = 21.590 + 1.877x
Estimate: for the expected value of a Y sub-population.
E.g.: What is the mean final mark of all those students who spent 30 hours on studying? I.e., given x = 30, how large is E(y)?
Predict: for a particular element of a Y sub-population.
E.g.: What is the final mark of Tom, who spent 30 hours on studying? I.e., given x = 30, how large is y?
What Is the Same?
For a given X value, the point forecast (prediction) of Y and the point estimate of the mean of the {Y} sub-population are the same:
ŷ = β̂0 + β̂1x
Ex.1 Estimate the mean final mark of students who spent 30 hours on study.
Ex.2 Predict the final mark of Tom, when his study time is 30 hours.
ŷ = β̂0 + β̂1x = 21.590 + 1.877 × 30 = 77.9
What Is the Difference?
The interval prediction of Y and the interval estimation of the mean of the {Y} sub-population are different:
The prediction interval:
ŷ ± tα/2 · sε · √(1 + 1/n + (xg − x̄)²/Σ(xi − x̄)²)
The estimation (confidence) interval:
ŷ ± tα/2 · sε · √(1/n + (xg − x̄)²/Σ(xi − x̄)²)
The prediction interval is wider than the confidence interval.
Example
(The same data table and calculations as before: ŷ = 89.10 − 9.01x, x̄ = 4.333, Sxx = 22.475, SSE = 253.886.)
Estimation and Prediction
The point forecast (prediction) of Y and the point estimate of the mean of {Y} are the same:
ŷ = 89.10 − 9.01x
For xg = 5.0:  ŷ = 89.10 − 9.01 × 5.0 = 44.05
But for the interval estimation and prediction, the results differ.
Data Needed
For xg = 5.0:
sε = √MSE = √(SSE/(n − 2)) = √(253.89/(7 − 2)) = 7.1258
Sxx = Σ(xi − x̄)² = 22.475
t.025 = 2.571
The prediction interval:
ŷ ± tα/2 · sε · √(1 + 1/n + (xg − x̄)²/Σ(xi − x̄)²)
The estimation interval:
ŷ ± tα/2 · sε · √(1/n + (xg − x̄)²/Σ(xi − x̄)²)
Calculation
Estimation:
ŷ ± tα/2 · sε · √(1/n + (xg − x̄)²/Σ(xi − x̄)²)
= 44.05 ± 2.571 × 7.1258 × √(1/7 + (5.0 − 4.333)²/22.475)
= 44.05 ± 7.3887
Prediction:
ŷ ± tα/2 · sε · √(1 + 1/n + (xg − x̄)²/Σ(xi − x̄)²)
= 44.05 ± 2.571 × 7.1258 × √(1 + 1/7 + (5.0 − 4.333)²/22.475)
= 44.05 ± 19.7543
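Both half-widths can be computed from the same shared term under the square root; a sketch checking the numbers above:

```python
import math

# Confidence and prediction intervals at xg = 5.0 for the example above.
n, s, Sxx, t_crit = 7, 7.1258, 22.475, 2.571
xbar, xg = 4.333, 5.0
yhat = 89.10 - 9.01 * xg                       # 44.05
core = 1 / n + (xg - xbar) ** 2 / Sxx          # shared term under the root
half_est = t_crit * s * math.sqrt(core)        # about 7.39 (estimation)
half_pred = t_crit * s * math.sqrt(1 + core)   # about 19.75 (prediction)
# The prediction interval is wider than the confidence interval.
```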
Moving Rule
As xg moves away from x̄, the interval becomes longer; the shortest interval is found at xg = x̄.
The confidence interval when xg = x̄:
ŷ ± tα/2 · sε · √(1/n)
The confidence interval when xg = x̄ ± 1:
ŷ ± tα/2 · sε · √(1/n + 1²/Σ(xi − x̄)²)
The confidence interval when xg = x̄ ± 2:
ŷ ± tα/2 · sε · √(1/n + 2²/Σ(xi − x̄)²)
The same rule holds for prediction intervals, which add 1 under the square root:
ŷ ± tα/2 · sε · √(1 + 1/n + 1²/Σ(xi − x̄)²)   when xg = x̄ ± 1
ŷ ± tα/2 · sε · √(1 + 1/n + 2²/Σ(xi − x̄)²)   when xg = x̄ ± 2
[Figure: intervals drawn at x̄ − 2, x̄ − 1, x̄, x̄ + 1, x̄ + 2]
Interval Estimation
[Figure: prediction and estimation interval bands around the fitted line, narrowest at x̄ and widening toward x̄ ± 2; the prediction band lies outside the estimation band.]
Residual Analysis
Regression residual: the difference between an observed y value and its corresponding predicted value:
r = y − ŷ
Properties of regression residuals:
The mean of the residuals equals zero.
The standard deviation of the residuals is equal to the standard deviation of the fitted regression model.
Example
yˆ  89.10  9.01x
Score (y)
LSD Conc (x)
y-hat
residual(r)
78.93
1.17
78.558
0.3717
58.20
2.97
62.34
-4.1403
67.47
3.26
59.727
7.7426
37.47
4.69
46.843
-9.3731
45.65
5.83
36.572
9.0783
32.92
6.00
35.04
-2.12
29.97
6.41
31.346
-1.3759
Residual Plot Against x
[Figure: residuals r plotted against x]
Residual Plot Against y-hat
[Figure: residuals r plotted against ŷ]
Three Situations
Good pattern
Non-constant variance
Model form not adequate
Standardized Residual
Standard deviation of the ith residual:
s(yi − ŷi) = sε · √(1 − hi)
where
s(yi − ŷi) = the standard deviation of residual i
sε = the standard error of the estimate
hi = 1/n + (xi − x̄)²/Σ(xj − x̄)²
Standardized residual for observation i:
zi = (yi − ŷi)/s(yi − ŷi)
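Applying these formulas to the example's residuals takes only a few lines; a sketch (not part of the lecture), using the x values and residuals from the table above:

```python
import math

# Standardized residuals for the LSD example.
x = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
r = [0.3717, -4.1403, 7.7426, -9.3731, 9.0783, -2.12, -1.3759]
n = len(x)
xbar = sum(x) / n
Sxx = sum((xj - xbar) ** 2 for xj in x)
s = math.sqrt(sum(ri ** 2 for ri in r) / (n - 2))       # standard error, about 7.13
h = [1 / n + (xi - xbar) ** 2 / Sxx for xi in x]        # leverages h_i
z = [ri / (s * math.sqrt(1 - hi)) for ri, hi in zip(r, h)]
# All |z_i| < 2 here, consistent with the normality assumption.
```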
Standardized Residual Plot
[Figure: standardized residuals z plotted against x]
Standardized Residual
The standardized residual plot can provide insight into the assumption that the error term has a normal distribution.
If the assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.
It is expected that approximately 95% of the standardized residuals fall between −2 and +2.
Detecting Outlier
[Figures: one scatter plot showing an outlier; one showing an influential observation]
High Leverage Points
Leverage of observation i:
hi = 1/n + (xi − x̄)²/Σ(xj − x̄)²
For example, for the x values
10 10 15 20 20 25 70
x̄ = 24.2857, and for the point xi = 70:
hi = 1/7 + (70 − 24.2857)²/Σ(xj − 24.2857)² ≈ .94
so 70 is a high leverage point.
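The leverage calculation for this small example can be checked directly; a sketch:

```python
# Leverage of the extreme point x = 70 in the example above.
x = [10, 10, 15, 20, 20, 25, 70]
n = len(x)
xbar = sum(x) / n                          # 24.2857
Sxx = sum((xj - xbar) ** 2 for xj in x)
h70 = 1 / n + (70 - xbar) ** 2 / Sxx       # about .94: a high leverage point
```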
Contact Information
Tang Yu (唐煜)
[email protected]
http://math.suda.edu.cn/homepage/tangy