Lecture 16

Univariate Linear Regression
Problem
• Model: Y = β0 + β1X + ε
• Test: H0: β1 = 0.
• Alternative: H1: β1 > 0.
• The distribution of Y is normal under both null and alternative.
• Under null, var(Y) = σ0².
• Under alternative, β1 > 0, and var(Y) = σ1².
Step 1: Choose the test statistic
and specify its null distribution
• Use conditions of the null to find:
$$\hat{\beta}_1 \sim N\left(0,\ \frac{\sigma_0^2}{\sum_{i=1}^{n}(x_i-\bar{x}_n)^2}\right).$$
Bringing sample size into
regression design
• The sample size n is hidden in the regression results. That is, let:
$$\sum_{i=1}^{n}(x_i-\bar{x}_n)^2 = n\sigma_X^2.$$
Step 2: Define the critical value
• For the univariate linear regression test:
$$CV = 0 + |z_\alpha|\frac{\sigma_0}{\sqrt{n\sigma_X^2}} = 0 + |z_\alpha|\frac{(\sigma_0/\sigma_X)}{\sqrt{n}}.$$
Step 3: Define the Rejection Rule
• Each test is a right-sided test, so the rule is to reject H0 when the test statistic is greater than the critical value.
Step 4: Specify the Distribution
of Test Statistic under Alternative
• Use conditions of the alternative to find:
$$\hat{\beta}_1 \sim N\left(E_1,\ \frac{\sigma_1^2/\sigma_X^2}{n}\right).$$
Step 5: Define a Type II Error
• For the univariate linear regression test:
$$\hat{\beta}_1 < CV = 0 + |z_\alpha|\frac{(\sigma_0/\sigma_X)}{\sqrt{n}}.$$
Step 6: Find β
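• For a univariate linear regression test:
$$\beta = \Pr\nolimits_1\left\{\frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sigma(\hat{\beta}_1)} < \frac{0 + |z_\alpha|\dfrac{(\sigma_0/\sigma_X)}{\sqrt{n}} - E_1}{\dfrac{(\sigma_1/\sigma_X)}{\sqrt{n}}}\right\}.$$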
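The Step 6 probability can be evaluated numerically. The following is a minimal Python sketch for the right-sided test, using only the standard library. The function name and the input values in the usage lines are illustrative assumptions and do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def type2_error_right_sided(n, e1, sigma0, sigma1, sigma_x, alpha):
    """Pr1{accept H0} for H0: beta1 = 0 against H1: beta1 = e1 > 0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)          # |z_alpha|
    cv = 0 + z_alpha * (sigma0 / sigma_x) / sqrt(n)    # Step 2 critical value
    sd_alt = (sigma1 / sigma_x) / sqrt(n)              # sd of the slope estimate under H1
    # A Type II error occurs when the estimate falls below the critical value under H1.
    return NormalDist().cdf((cv - e1) / sd_alt)

# Hypothetical inputs, chosen only to exercise the function.
beta = type2_error_right_sided(n=100, e1=0.5, sigma0=2.0, sigma1=2.0,
                               sigma_x=1.0, alpha=0.05)
print(round(beta, 4))
```

Holding everything else fixed, a larger n shrinks both the critical value and the standard deviation under the alternative, so the returned probability falls as n grows.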
Basic Insight
• Notice that all three problems have the same
basic structure.
• That is, if you understand the solution of the
one sample test, then you can derive the
answer to the other problems.
Step 7: Phrase requirement on β
• For example, we seek to “choose n so that
β=0.01.”
• That is, “choose n so that Pr1{Accept H0}=β=0.01.”
Step 7: Phrase requirement on β
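• For example, we seek to choose n so that
$$\Pr\nolimits_1\left\{\frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sigma(\hat{\beta}_1)} < \frac{0 + |z_\alpha|\dfrac{(\sigma_0/\sigma_X)}{\sqrt{n}} - E_1}{\dfrac{(\sigma_1/\sigma_X)}{\sqrt{n}}}\right\} = \beta.$$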
Step 7: Phrase requirement on β
• Notice the parallel phrasing:
$$\Pr\{Z < -|z_\beta|\} = \beta.$$
Step 7: Phrase requirement on β
• That is, choose n so that (note that E0=0):
$$\frac{E_0 + |z_\alpha|\dfrac{(\sigma_0/\sigma_X)}{\sqrt{n}} - E_1}{\dfrac{(\sigma_1/\sigma_X)}{\sqrt{n}}} = -|z_\beta|.$$
Step 7: Phrase requirement on β
• That is, choose n so that (after algebraic clearing out):
$$(E_1 - E_0)\sqrt{n} = |z_\alpha|\frac{\sigma_0}{\sigma_X} + |z_\beta|\frac{\sigma_1}{\sigma_X}.$$
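The clearing-out step can be written explicitly. This is a sketch of the algebra, assuming E1 > E0 (the right-sided case) and starting from the requirement that the standardized critical value under the alternative equals −|z_β|:

```latex
\begin{align*}
\frac{E_0 + |z_\alpha|\,\dfrac{\sigma_0/\sigma_X}{\sqrt{n}} - E_1}{\dfrac{\sigma_1/\sigma_X}{\sqrt{n}}} &= -|z_\beta| \\
E_0 + |z_\alpha|\,\frac{\sigma_0/\sigma_X}{\sqrt{n}} - E_1 &= -|z_\beta|\,\frac{\sigma_1/\sigma_X}{\sqrt{n}} \\
|z_\alpha|\,\frac{\sigma_0}{\sigma_X} + |z_\beta|\,\frac{\sigma_1}{\sigma_X} &= (E_1 - E_0)\sqrt{n} \\
\sqrt{n} &= \frac{|z_\alpha|\,\sigma_0/\sigma_X + |z_\beta|\,\sigma_1/\sigma_X}{E_1 - E_0}
\end{align*}
```

Because n must be a whole number, the computed value is rounded up, which keeps the Type II error probability no larger than the target.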
Step 8: State the conclusion
• The result for a left-sided test has to be worked through but is similar. You must remember to keep all entries positive. This is reasonable if both α and β are constrained to be less than or equal to 0.5. The restriction is not a hardship in practice.
Univariate Linear Regression
• Note that the σ0 factor is changed to σ0/σX.
• There is a similar adjustment for the
alternative standard deviation.
Example Problem Group
• Two hundred values of an independent
variable xi are chosen so that Σ(xi-xbar)² is
equal to 400,000. For each setting of xi, the
random variable Yi=β0+β1xi+σZi is
observed. Here β0 and β1 are fixed but
unknown parameters, σ=400, and the Zi are
independent standard normal random
variables.
Example Problem Group
• The null hypothesis to be tested is H0: β1=0,
α=0.01, and the alternative is H1: β1<0. The
random variable B1 is the OLS estimate of
β1.
Example Question 1
• When H0 is true, what is the standard
deviation of B1, the OLS estimate of the
slope?
• Var(B1)=σ²/Σ(xi-xbar)²=400²/400,000=0.4.
• sd(B1)=0.632.
Example Question 2
• What is the probability of a Type II error in
the test specified in the common section
using B1, the OLS estimator of the slope, as
test statistic when β1=-4, α=0.01, σ=400,
and Σ(xi-xbar)² is equal to 400,000?
Solution to Question 2
• The critical value is 0-2.326(0.632)=-1.47
• A Type II error occurs when B1>-1.47.
• Under alternative B1 is normal with
expected value -4 and standard deviation
(error) 0.632.
• Pr{B1>-1.47}=Pr{Z>(-1.47-(-4))/0.632}=Pr{Z>4.00}=0.000032.
• The answer is 0.000032.
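The arithmetic in Questions 1 and 2 can be checked with a short Python sketch that uses only the standard library. The variable names are illustrative; the numbers are the ones given in the problem.

```python
from math import sqrt
from statistics import NormalDist

sigma = 400.0        # error standard deviation
sxx = 400_000.0      # sum of squared deviations of the x values
alpha = 0.01
beta1_alt = -4.0     # slope under the alternative

sd_b1 = sqrt(sigma**2 / sxx)                 # Question 1: sd of B1
z_alpha = NormalDist().inv_cdf(1 - alpha)    # |z_0.01|
cv = 0 - z_alpha * sd_b1                     # left-sided critical value

# Type II error: B1 > cv when B1 ~ N(beta1_alt, sd_b1^2).
beta = 1 - NormalDist(mu=beta1_alt, sigma=sd_b1).cdf(cv)
print(round(sd_b1, 3), round(cv, 2), round(beta, 6))
```

Since the test is left-sided, the critical value sits below zero and the Type II error region is the upper tail under the alternative.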
Example Question 3
• How many observations n are necessary so that the probability of a Type II error in the test specified in the common section is 0.01 when β1=-4, α=0.01, σ=400, and Σ(xi-xbar_n)² is equal to 2,000n?
Outline of Solution to Problem 3
• For the σ0 term (σ0/σX), use (400²/2000)^0.5=8.94.
• Use same value for σ1 term.
• Use |z0.01|=2.326.
• Use |E1-E0|=|-4-0|=4.
• Square root of sample size is 10.39.
• Sample size is 109 or more.
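The outline can be followed step by step in Python. This sketch assumes a Type II error target of 0.01, which is the value that reproduces the square root of about 10.39 in the outline; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma = 400.0
sigma_x_sq = 2000.0      # from sum of squares = 2,000 n
alpha = 0.01
target_beta = 0.01       # assumed target for the Type II error probability
effect = abs(-4.0 - 0.0) # |E1 - E0|

sd_ratio = sqrt(sigma**2 / sigma_x_sq)       # sigma0/sigma_X, reused for sigma1/sigma_X
z_a = NormalDist().inv_cdf(1 - alpha)        # |z_alpha|
z_b = NormalDist().inv_cdf(1 - target_beta)  # |z_beta|

root_n = (z_a * sd_ratio + z_b * sd_ratio) / effect
n = ceil(root_n**2)                          # round up to a whole observation
print(round(sd_ratio, 2), round(root_n, 2), n)
```

With unrounded intermediate values the square root comes out near 10.40 rather than 10.39; the rounded-up sample size is 109 either way.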
Chapter 21: Residual Analysis
• The assumptions in regression may be violated, so the model needs to be checked:
– Residuals are one way of checking the model:
Ri = Yi - fitted value at xi
Checking the Assumptions
– Check for normality (test of normality,
histogram, q-q plots)
– Check whether the variance is the same for all values of the independent variable (plot residuals against predicted values)
– Check independence (plot residuals against
sequence variable)
– Check for linearity (plot dependent variable
against independent variable)
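The residuals used in these checks can be computed directly from the least-squares fit. The sketch below uses plain Python with made-up data; the function names and the data are illustrative assumptions, and the printed pairs are what would be plotted in the checks listed above.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def residuals(x, y):
    """Ri = Yi - fitted value at xi."""
    b0, b1 = ols_fit(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
r = residuals(x, y)
for xi, ri in zip(x, r):
    print(xi, round(ri, 3))
```

Least-squares residuals always sum to zero and are uncorrelated with x, so a residual plot can reveal curvature or fanning but never an overall linear trend.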
Residual Plots
• Plot residuals against independent variable.
– Plot should be flat indicating the same variance.
– There should be no fanning out pattern.
– Check for influential observations.
• Plot residuals against predicted value.
– For univariate regression this is the same as the
above plot. There should be no pattern.
What to do if there is a problem?
• Can look for transformations of either independent or dependent variable or both.
• Using a computer this is easy: use the compute option from the menu bar.
Influential Points
• An easier way to look for points that have a
large impact on the slope is to plot the
change in slope against an arbitrary case
sequence number.
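One simple reading of "change in slope" is the difference between the slope fitted to all cases and the slope fitted with one case left out. The lecture does not define the quantity, so this leave-one-out version is an assumption, and statistical packages typically report a standardized variant. The data and function names below are made up for illustration.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

def slope_changes(x, y):
    """Change in slope when each case is deleted in turn, in case order."""
    full = slope(x, y)
    return [full - slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
            for i in range(len(x))]

# Made-up data: the last case lies far from the line through the others.
x = [1, 2, 3, 4, 5, 10]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
changes = slope_changes(x, y)
for case, change in enumerate(changes, start=1):
    print(case, round(change, 3))
```

Plotting these values against the case number makes an influential case stand out as a spike.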
Example
• Data set on the web page
• aim: predict final exam score from midterm
score
• dependent variable: final exam score
• independent variable: midterm score
• model, check assumptions, predict
[Figure: scatterplot of final examination score against score on first exam]
Output
• Model: Y = β0 + β1X + ε
• R² = 0.508
• F statistic=60.91, Significance=0.0
• b1=1.391117, t statistic=7.805, Significance=0.0
• b0=238.95, t statistic=8.329, Significance=0.0
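The reported figures are internally consistent, which a few lines of Python can confirm. In univariate regression the F statistic equals the square of the slope's t statistic, and R² equals F/(F + n − 2). The sketch takes n = 61 from the N reported with the residual histogram; the midterm score of 150 used for the prediction is a hypothetical input.

```python
n = 61               # N reported with the residual histogram
b0 = 238.95          # estimated intercept
b1 = 1.391117        # estimated slope
t_slope = 7.805
f_stat = 60.91
r_squared = 0.508

df_resid = n - 2                                # residual degrees of freedom
t_squared = t_slope**2                          # compare with the F statistic
r2_from_f = f_stat / (f_stat + df_resid)        # compare with R squared

midterm = 150                                   # hypothetical midterm score
predicted_final = b0 + b1 * midterm             # point prediction from the fitted line
print(round(t_squared, 2), round(r2_from_f, 3), round(predicted_final, 2))
```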
[Figure: residuals plotted against predicted values]
[Figure: histogram of residuals; Std. Dev = 66.68, Mean = 0.0, N = 61]
[Figure: Normal Q-Q plot of residuals; expected normal value against observed value]
Next Class
• Multiple Regression!
• Check web site for your data file