Chap. 11: Simple Linear Regression

Probability Distribution
of Random Error
EPI 809/Spring 2008
Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation
Linear Regression Assumptions
Assumptions of errors ε1, ..., εn
- Gauss-Markov condition
1. Independent errors
2. Mean of probability distribution of errors is 0
3. Errors have constant variance σ², for which an estimator is S²
4. Probability distribution of error is normal
5. Potential violation of G-M condition.
Error Probability Distribution
[Figure: probability distribution f(ε) of the error, drawn about the regression line at X = X1 and X = X2; axes X and Y]
Random Error Variation
1. Variation of Actual Y from Predicted Y
2. Measured by Standard Error of Regression Model
   - Sample Standard Deviation of ε, ŝ
3. Affects Several Factors
   - Parameter Significance
   - Prediction Accuracy
Evaluating the Model
Testing for Significance
Test of Slope Coefficient
1. Shows If There Is a Linear Relationship Between X & Y
2. Involves Population Slope β1
3. Hypotheses
   - H0: β1 = 0 (No Linear Relationship)
   - Ha: β1 ≠ 0 (Linear Relationship)
4. Theoretical basis of the test statistic is the sampling distribution of slope
Sampling Distribution of Sample Slopes
[Figure: Sample 1 Line and Sample 2 Line scattered about the Population Line; axes X and Y]
All Possible Sample Slopes
- Sample 1: 2.5
- Sample 2: 1.6
- Sample 3: 1.8
- Sample 4: 2.1
- ... (very large number of sample slopes)
[Figure: sampling distribution of the sample slope β̂1, centered at β1 with standard error S_β̂1]
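The idea of "all possible sample slopes" can be illustrated by simulation. The sketch below is only an illustration in Python (the course itself uses SAS): the population line, the error standard deviation, and the seed are hypothetical values chosen for the demonstration, not numbers from the lecture. It draws many samples from one population line, fits a slope to each, and compares the spread of the fitted slopes with σ / sqrt(SSxx).

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope: SSxy / SSxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    return ss_xy / ss_xx

# Hypothetical population line and error SD (assumptions for illustration only).
BETA0, BETA1, SIGMA = 0.0, 2.0, 1.0
XS = [1, 2, 3, 4, 5]

rng = random.Random(809)  # fixed seed so the run is repeatable
slopes = []
for _ in range(10_000):
    # One sample: normal errors with mean 0 and constant variance around the line.
    ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in XS]
    slopes.append(fit_slope(XS, ys))

mean_slope = sum(slopes) / len(slopes)
sd_slope = (sum((b - mean_slope) ** 2 for b in slopes) / (len(slopes) - 1)) ** 0.5
x_bar = sum(XS) / len(XS)
theory_sd = SIGMA / sum((x - x_bar) ** 2 for x in XS) ** 0.5  # sigma / sqrt(SSxx)

print(round(mean_slope, 3), round(sd_slope, 3), round(theory_sd, 3))
```

The simulated slopes should center near the population slope, and their standard deviation should be close to the theoretical value, which is what the slope test statistic relies on.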
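Slope Coefficient Test Statistic
\[
t = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}}
\quad \text{where} \quad
S_{\hat{\beta}_1} = \frac{S}{\sqrt{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}}
\]
\[
\text{with} \quad S = \sqrt{\frac{SSE}{n-2}}
\quad \text{and} \quad
SSE = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^2
\]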
Test of Slope Coefficient
Rejection Rule
- Reject H0 in favor of Ha if t falls in colored area
[Figure: density of T = t(n-2), centered at 0, with "Reject H0" regions of area α/2 in each tail, below -t1-α/2, (n-2) and above t1-α/2, (n-2)]
- Reject H0 for Ha if P-value = P(|T| > |t|) < α
Test of Slope Coefficient
Example
Reconsider the Obstetrics example with the following data:

Estriol (mg/24h)   B.w. (g/1000)
1                  1
2                  1
3                  2
4                  2
5                  4

Is the Linear Relationship between Estriol & Birthweight significant at .05 level?
Solution Table For β’s
Xi     Yi     Xi²    Yi²    XiYi
1      1      1      1      1
2      1      4      1      2
3      2      9      4      6
4      2      16     4      8
5      4      25     16     20
Sum:
15     10     55     26     37
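The column totals in this table are all that is needed for the least-squares estimates. The sketch below assumes the standard shortcut formulas SSxy = ΣXiYi - (ΣXi)(ΣYi)/n and SSxx = ΣXi² - (ΣXi)²/n; those formulas come from the estimation part of the chapter, which is not in this transcript. Python is used only for illustration.

```python
# Column totals copied from the solution table (n = 5 observations).
n = 5
sum_x, sum_y = 15, 10
sum_x2, sum_xy = 55, 37

ss_xy = sum_xy - sum_x * sum_y / n   # 37 - 30 = 7
ss_xx = sum_x2 - sum_x ** 2 / n      # 55 - 45 = 10

beta1_hat = ss_xy / ss_xx                       # slope estimate
beta0_hat = sum_y / n - beta1_hat * sum_x / n   # intercept = Ybar - slope * Xbar

print(round(beta1_hat, 4), round(beta0_hat, 4))
```

These agree with the estimates 0.70 and -0.10 used in the rest of the example.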
Solution Table for SSE
Birth weight (y)   Estriol (x)   Predicted (ŷ = β̂0 + β̂1x)   (Obs - pred)² = (y - ŷ)²
1                  1             0.6                         0.16
1                  2             1.3                         0.09
2                  3             2                           0
2                  4             2.7                         0.49
4                  5             3.4                         0.36
Sum:
10                 15            -                           SSE = 1.1
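The SSE table can be reproduced in a few lines. This is a minimal sketch in Python (not the course's SAS), using the data and the estimates β̂0 = -0.1 and β̂1 = 0.7 from the example; the variable names are illustrative.

```python
estriol = [1, 2, 3, 4, 5]          # x
birthweight = [1, 1, 2, 2, 4]      # y
beta0_hat, beta1_hat = -0.1, 0.7   # estimates used in the example

# Predicted value and squared error for each observation.
predicted = [beta0_hat + beta1_hat * x for x in estriol]
squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(birthweight, predicted)]

sse = sum(squared_errors)
s = (sse / (len(estriol) - 2)) ** 0.5   # S = sqrt(SSE / (n - 2))

print([round(p, 1) for p in predicted])
print([round(e, 2) for e in squared_errors])
print(round(sse, 2), round(s, 5))
```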
Test of Slope Parameter
Solution
- H0: β1 = 0
- Ha: β1 ≠ 0
- α = .05
- df = 5 - 2 = 3
- Critical Value(s): -3.1824 and 3.1824
[Figure: t distribution centered at 0 with "Reject" regions of area .025 in each tail, beyond -3.1824 and 3.1824]
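Test Statistic
Solution
\[
t = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}} = \frac{0.70 - 0}{0.1915} = 3.656
\]
\[
\text{where} \quad
S_{\hat{\beta}_1} = \frac{S}{\sqrt{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}}
= \frac{0.60553}{\sqrt{55 - \dfrac{(15)^2}{5}}} = 0.1915
\]
(ΣXi² = 55 and ΣXi = 15 from Table)
\[
\text{with} \quad S = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{1.1}{5-2}} = 0.60553
\]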
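The arithmetic on this slide can be checked directly. The sketch below is an illustration in Python, not part of the original lecture; the critical value 3.1824 is the one quoted in the slides for df = 3 and α = .05, and the other inputs are the table totals.

```python
from math import sqrt

n = 5
sum_x, sum_x2 = 15, 55              # totals from the solution table
sse = 1.1                           # from the SSE table
beta1_hat, beta1_null = 0.70, 0.0   # estimate and the value under H0
t_critical = 3.1824                 # quoted in the slides (df = 3, alpha = .05)

s = sqrt(sse / (n - 2))                        # standard error of the regression
s_beta1 = s / sqrt(sum_x2 - sum_x ** 2 / n)    # standard error of the slope
t = (beta1_hat - beta1_null) / s_beta1
reject_h0 = abs(t) > t_critical

print(round(s_beta1, 4), round(t, 3), reject_h0)
```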
Test of Slope Parameter
- H0: β1 = 0
- Ha: β1 ≠ 0
- α = .05
- df = 5 - 2 = 3
- Critical Value(s): -3.1824 and 3.1824
[Figure: t distribution centered at 0 with "Reject" regions of area .025 in each tail, beyond -3.1824 and 3.1824]
Test Statistic:
\[
t = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}} = \frac{0.70 - 0}{0.1915} = 3.656
\]
Decision:
Reject at α = .05
Conclusion:
There is evidence of a linear relationship
Test of Slope Parameter
Computer Output

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    -0.10000             0.63509          -0.16     0.8849
Estriol     1    0.70000              0.19149          3.66      0.0354

- Parameter Estimate: β̂k
- Standard Error: S_β̂k
- t Value: t = β̂k / S_β̂k
- Pr > |t|: P-Value
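The t Value and Pr > |t| columns can be reproduced from the first two columns. The sketch below is an illustration in Python rather than SAS. It relies on the closed-form CDF of Student's t with exactly 3 degrees of freedom, which applies here only because n - 2 = 3; for other degrees of freedom a statistics library would be needed. The function names are hypothetical.

```python
from math import atan, pi, sqrt

def t3_cdf(t):
    """CDF of Student's t with exactly 3 degrees of freedom (closed form)."""
    u = t / sqrt(3)
    return 0.5 + (u / (1 + u * u) + atan(u)) / pi

def two_sided_p(estimate, std_error):
    """Return (t, P(|T| > |t|)) for H0: parameter = 0, with T ~ t(3)."""
    t = estimate / std_error
    return t, 2 * (1 - t3_cdf(abs(t)))

# Estimates and standard errors copied from the Parameter Estimates table.
rows = [("Intercept", -0.10000, 0.63509), ("Estriol", 0.70000, 0.19149)]
for name, est, se in rows:
    t, p = two_sided_p(est, se)
    print(f"{name:<10} t = {t:6.2f}   Pr > |t| = {p:.4f}")
```

The slope's P-value is below .05, which matches the decision reached with the critical-value approach.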
Measures of Variation in Regression
1. Total Sum of Squares (SSyy)
   - Measures Variation of Observed Yi Around the Mean Ȳ
2. Explained Variation (SSR)
   - Variation Due to Relationship Between X & Y
3. Unexplained Variation (SSE)
   - Variation Due to Other Factors
Variation Measures
[Figure: at X = Xi, the fitted line Ŷi = β̂0 + β̂1Xi and the horizontal line Ȳ split the deviation of Yi into three pieces: total sum of squares (Yi - Ȳ)², unexplained sum of squares (Yi - Ŷi)², and explained sum of squares (Ŷi - Ȳ)²; axes X and Y]
Coefficient of Determination
- Proportion of Variation 'Explained' by Relationship Between X & Y
- 0 ≤ r² ≤ 1
\[
r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}}
= \frac{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2 - \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2}
\]
Coefficient of Determination Examples
[Figure: four scatterplots of Y against X, labeled r² = 1, r² = 1, r² = .8, and r² = 0]
Coefficient of Determination
Example
Reconsider the Obstetrics example. Interpret a coefficient of determination of 0.8167.

Answer: About 82% of the total variation of birthweight is explained by the mother's Estriol level.
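The value 0.8167 follows from the sums of squares already computed for this data set. A minimal Python sketch (illustrative only, with the predicted values taken from the SSE table):

```python
birthweight = [1, 1, 2, 2, 4]            # observed y
predicted = [0.6, 1.3, 2.0, 2.7, 3.4]    # fitted values from the SSE table

y_bar = sum(birthweight) / len(birthweight)
ss_yy = sum((y - y_bar) ** 2 for y in birthweight)   # total variation
sse = sum((y - y_hat) ** 2
          for y, y_hat in zip(birthweight, predicted))   # unexplained variation

r_squared = (ss_yy - sse) / ss_yy   # explained / total
print(round(ss_yy, 2), round(sse, 2), round(r_squared, 4))
```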
r² Computer Output

Root MSE          0.60553     R-Square    0.8167
Dependent Mean    2.00000     Adj R-Sq    0.7556
Coeff Var         30.27650

- Root MSE: S
- R-Square: r²
- Adj R-Sq: r² adjusted for number of explanatory variables & sample size
\[
\text{Adj R-Sq} = 1 - \left(1 - \text{Rsquare}\right) \frac{N - 1}{N - k - 1}
\]
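As a quick check of the adjustment formula, the sketch below (Python, for illustration; the function name is hypothetical) plugs in N = 5 observations and k = 1 explanatory variable, using the unrounded r² = (6 - 1.1) / 6 from this example.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adj R-Sq = 1 - (1 - R-square) * (N - 1) / (N - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

r_squared = (6 - 1.1) / 6   # (SSyy - SSE) / SSyy for the Obstetrics data
adj = adjusted_r_squared(r_squared, n=5, k=1)
print(round(r_squared, 4), round(adj, 4))
```

With so few observations the adjustment is noticeable: the adjusted value is several points below the unadjusted one.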
Using the Model for
Prediction & Estimation
Prediction With Regression Models
What Is Predicted?
- Population Mean Response E(Y) for Given X
  • Point on Population Regression Line
- Individual Response (Yi) for Given X
What Is Predicted?
[Figure: at X = XP, the individual Y, the mean E(Y) on the population line E(Y) = β0 + β1X, and the prediction Ŷ on the fitted line Ŷi = β̂0 + β̂1X; axes X and Y]
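Confidence Interval Estimate of Mean Y
\[
\hat{Y} - t_{n-2,\,\alpha/2} \cdot S_{\hat{Y}} \le E(Y) \le \hat{Y} + t_{n-2,\,\alpha/2} \cdot S_{\hat{Y}}
\]
\[
\text{where} \quad
S_{\hat{Y}} = S \sqrt{\frac{1}{n} + \frac{\left(X_p - \bar{X}\right)^2}{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}}
\]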
Factors Affecting Interval Width
1. Level of Confidence (1 - α)
   - Width Increases as Confidence Increases
2. Data Dispersion (s)
   - Width Increases as Variation Increases
3. Sample Size
   - Width Decreases as Sample Size Increases
4. Distance of Xp from Mean X̄
   - Width Increases as Distance Increases
Why Distance from Mean?
[Figure: Sample 1 Line and Sample 2 Line crossing near (X̄, Ȳ); the two lines show greater dispersion at X2, far from X̄, than at X1; axes X and Y]
Confidence Interval Estimate Example
Reconsider the Obstetrics example with the following data:

Estriol (mg/24h)   B.w. (g/1000)
1                  1
2                  1
3                  2
4                  2
5                  4

Estimate the mean BW and a subject's BW response when the Estriol level is 4 at .05 level.
Confidence Interval Estimate
Solution - Mean BW
\[
\hat{Y} - t_{n-2,\,\alpha/2} \cdot S_{\hat{Y}} \le E(Y) \le \hat{Y} + t_{n-2,\,\alpha/2} \cdot S_{\hat{Y}}
\]
\[
\hat{Y} = -0.1 + 0.7\,(4) = 2.7 \qquad (X \text{ to be predicted: } X_p = 4)
\]
\[
S_{\hat{Y}} = .60553 \sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}} = 0.3316
\]
\[
2.7 - 3.1824\,(0.3316) \le E(Y) \le 2.7 + 3.1824\,(0.3316)
\]
\[
1.6445 \le E(Y) \le 3.7553
\]
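The same interval can be computed without intermediate rounding. This is a sketch in Python for illustration (the course uses SAS); the critical value 3.1824 and the estimates are those given in the slides. Carried to full precision, the limits come out as about 1.6445 and 3.7555, so the fourth decimal of the hand calculation above reflects rounding of S_Ŷ.

```python
from math import sqrt

estriol = [1, 2, 3, 4, 5]
n = len(estriol)
x_bar = sum(estriol) / n
ss_xx = sum((x - x_bar) ** 2 for x in estriol)   # sum of (Xi - Xbar)^2 = 10

beta0_hat, beta1_hat = -0.1, 0.7   # estimates from the example
s = sqrt(1.1 / (n - 2))            # S = sqrt(SSE / (n - 2))
t_crit = 3.1824                    # quoted in the slides (df = 3, alpha = .05)
x_p = 4                            # X to be predicted

y_hat = beta0_hat + beta1_hat * x_p
s_y_hat = s * sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)   # std. error of mean response
lower = y_hat - t_crit * s_y_hat
upper = y_hat + t_crit * s_y_hat

print(round(y_hat, 1), round(s_y_hat, 4), round(lower, 4), round(upper, 4))
```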
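Prediction Interval of Individual Response
\[
\hat{Y} - t_{n-2,\,\alpha/2} \cdot S_{(Y - \hat{Y})} \le Y_P \le \hat{Y} + t_{n-2,\,\alpha/2} \cdot S_{(Y - \hat{Y})}
\]
\[
\text{where} \quad
S_{(Y - \hat{Y})} = S \sqrt{1 + \frac{1}{n} + \frac{\left(X_P - \bar{X}\right)^2}{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}}
\]
Note the extra 1 under the square root!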
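For the Obstetrics data at an Estriol level of 4, the prediction interval can be sketched as follows. The code is an illustration in Python, with the critical value 3.1824 and the estimates taken from the slides; it also computes the standard error of the mean response so the effect of the extra 1 under the square root is visible.

```python
from math import sqrt

estriol = [1, 2, 3, 4, 5]
n = len(estriol)
x_bar = sum(estriol) / n
ss_xx = sum((x - x_bar) ** 2 for x in estriol)

beta0_hat, beta1_hat = -0.1, 0.7   # estimates from the example
s = sqrt(1.1 / (n - 2))            # S = sqrt(SSE / (n - 2))
t_crit = 3.1824                    # quoted in the slides (df = 3, alpha = .05)
x_p = 4

y_hat = beta0_hat + beta1_hat * x_p
leverage = 1 / n + (x_p - x_bar) ** 2 / ss_xx
s_mean = s * sqrt(leverage)        # standard error for the mean response
s_pred = s * sqrt(1 + leverage)    # standard error for an individual response

pi_lower = y_hat - t_crit * s_pred
pi_upper = y_hat + t_crit * s_pred
print(round(s_mean, 4), round(s_pred, 4), round(pi_lower, 4), round(pi_upper, 4))
```

The prediction interval is wider than the confidence interval for the mean at the same X because an individual response carries its own error variance in addition to the uncertainty in the fitted line.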
Why the Extra 'S'?
[Figure: at X = XP, the Y we're trying to predict scatters about the expected (mean) Y on the population line E(Y) = β0 + β1X, which itself differs from the prediction Ŷ on the fitted line Ŷi = β̂0 + β̂1Xi; axes X and Y]
SAS codes for computing mean and prediction intervals

/* Reading data in SAS */
data BW;
  input estriol birthw;
  cards;
1 1
2 1
3 2
4 2
5 4
;
run;

/* Fitting a linear regression model; CLM and CLI request the mean and prediction limits */
proc reg data=BW;
  model birthw = estriol / cli clm alpha=.05;
run;
Interval Estimate from SAS Output
The REG Procedure
Dependent Variable: y
Output Statistics

Obs   Dep Var y   Predicted Value   Std Error Mean Predict   95% CL Mean          95% CL Predict       Residual
1     1.0000      0.6000            0.4690                   -0.8927   2.0927     -1.8376   3.0376     0.4000
2     1.0000      1.3000            0.3317                   0.2445    2.3555     -0.8972   3.4972     -0.3000
3     2.0000      2.0000            0.2708                   1.1382    2.8618     -0.1110   4.1110     0
4     2.0000      2.7000            0.3317                   1.6445    3.7555     0.5028    4.8972     -0.7000
5     4.0000      3.4000            0.4690                   1.9073    4.8927     0.9624    5.8376     0.6000

Annotations on the slide:
- Predicted Value column: Predicted Y when X = 3
- Std Error Mean Predict column: S_Ŷ
- 95% CL Mean columns: Confidence Interval
- 95% CL Predict columns: Prediction Interval
Hyperbolic Interval Bands
[Figure: fitted line Ŷi = β̂0 + β̂1Xi with hyperbolic interval bands around it, shown at X̄ and at XP; axes X and Y]
Correlation Models
Types of Probabilistic Models
Probabilistic Models
- Regression Models
- Correlation Models
- Other Models
Correlation vs. regression
- Both variables are treated the same in correlation; in regression there is a predictor and a response
- In regression the x variable is assumed nonrandom or measured without error
- Correlation is used in looking for relationships, regression for prediction
Correlation Models
1. Answer 'How Strong Is the Linear Relationship Between 2 Variables?'
2. Coefficient of Correlation Used
   - Population Correlation Coefficient Denoted ρ (Rho)
   - Values Range from -1 to +1
   - Measures Degree of Association
3. Used Mainly for Understanding
Sample Coefficient of Correlation
1. Pearson Product Moment Coefficient of Correlation between x and y:
\[
r = \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 \, \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2}}
= \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}
\]
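Applied to the Obstetrics data used earlier in the chapter, the formula can be sketched as below. Python is used only for illustration and the function name is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product moment correlation: SSxy / sqrt(SSxx * SSyy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)

estriol = [1, 2, 3, 4, 5]
birthweight = [1, 1, 2, 2, 4]

r = pearson_r(estriol, birthweight)
print(round(r, 4), round(r ** 2, 4))
```

Squaring r gives the coefficient of determination reported for the same data (0.8167), which holds in simple linear regression with one predictor.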
Coefficient of Correlation Values
[Figure: scale of r with ticks at -1.0, -.5, 0, +.5, +1.0]
- -1.0: Perfect Negative Correlation
- 0: No Correlation
- +1.0: Perfect Positive Correlation
- Increasing degree of negative correlation from 0 toward -1.0
- Increasing degree of positive correlation from 0 toward +1.0
Coefficient of Correlation Examples
[Figure: four scatterplots of Y against X, labeled r = 1, r = -1, r = .89, and r = 0]
Test of Coefficient of Correlation
1. Shows If There Is a Linear Relationship Between 2 Numerical Variables
2. Same Conclusion as Testing Population Slope β1
3. Hypotheses
   - H0: ρ = 0 (No Correlation)
   - Ha: ρ ≠ 0 (Correlation)
1 Sample t-Test on Correlation Coefficient
Hypotheses
- H0: ρ = 0 (No Correlation)
- Ha: ρ ≠ 0 (Correlation)
Test statistic under H0:
\[
t = \frac{r \sqrt{n-2}}{\sqrt{1 - r^2}} \sim t(n-2)
\]
Reject H0 if |t| > tα/2, n-2
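For the Obstetrics data this test statistic reproduces the slope test. The sketch below (Python, illustrative only) takes SSxy = 7, SSxx = 10, and SSyy = 6 from the earlier tables and compares the result with the critical value 3.1824 quoted in the slides for df = 3.

```python
from math import sqrt

n = 5
ss_xy, ss_xx, ss_yy = 7, 10, 6      # sums of squares for the Obstetrics data
r = ss_xy / sqrt(ss_xx * ss_yy)     # sample correlation coefficient

t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
t_critical = 3.1824                 # quoted in the slides (df = 3, alpha = .05)
reject_h0 = abs(t) > t_critical

print(round(r, 4), round(t, 3), reject_h0)
```

The value of t agrees with the 3.656 obtained for the slope, which is the sense in which the two tests reach the same conclusion.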
1 Sample Z-Test on Correlation Coefficient
Hypotheses (Fisher)
- H0: ρ = ρ0
- Ha: ρ ≠ ρ0
Test statistic under H0:
\[
z = \frac{1}{2} \ln\!\left(\frac{1 + r}{1 - r}\right) \sim N(\mu, \sigma^2)
\]
\[
\mu = \frac{1}{2} \ln\!\left(\frac{1 + \rho_0}{1 - \rho_0}\right), \qquad \sigma^2 = \frac{1}{n - 3}
\]
Reject H0 if |z - μ| / σ > z1-α/2
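A minimal sketch of the Fisher test in Python (illustrative, not from the lecture; the function name and defaults are assumptions). It standardizes z by μ and σ before comparing with the normal quantile, and uses `math.atanh`, which equals (1/2) ln((1 + r) / (1 - r)). The example reuses r from the Obstetrics data with ρ0 = 0; with n = 5 the normal approximation is rough, so the result is shown for the mechanics only.

```python
from math import atanh, sqrt
from statistics import NormalDist

def fisher_z_test(r, n, rho0=0.0, alpha=0.05):
    """Two-sided Fisher z-test of H0: rho = rho0."""
    z = atanh(r)              # (1/2) * ln((1 + r) / (1 - r))
    mu = atanh(rho0)          # mean of z under H0
    sigma = 1 / sqrt(n - 3)   # standard deviation of z
    z_std = (z - mu) / sigma  # standardized statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_std, z_crit, abs(z_std) > z_crit

r = 7 / sqrt(10 * 6)   # SSxy / sqrt(SSxx * SSyy) for the Obstetrics data
z_std, z_crit, reject_h0 = fisher_z_test(r, n=5)
print(round(z_std, 3), round(z_crit, 3), reject_h0)
```

The main practical use of the Fisher transformation is testing a nonzero ρ0, which the t-test on r does not cover; passing `rho0=0.5`, for example, shifts μ accordingly.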
Conclusion
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Understand and check model assumptions
6. Predict Response Variable
7. Comments of SAS Output
8. Correlation Models
9. Test of coefficient of Correlation