Lecture 16
Univariate Linear Regression
Problem
• Model: $Y = \beta_0 + \beta_1 X + e$
• Test: $H_0: \beta_1 = 0$.
• Alternative: $H_1: \beta_1 > 0$.
• The distribution of Y is normal under both null and alternative.
• Under null, $\mathrm{var}(Y) = \sigma_0^2$.
• Under alternative, $\beta_1 > 0$, and $\mathrm{var}(Y) = \sigma_1^2$.
Step 1: Choose the test statistic
and specify its null distribution
• Use conditions of the null to find:
$$\hat{b}_1 \sim N\left(0,\ \frac{\sigma_0^2}{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2}\right).$$
Bringing sample size into
regression design
• The sample size n is hidden in the
regression results. That is, let:
$$\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 = n\sigma_X^2.$$
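The identity above can be checked numerically. The following is a minimal sketch, assuming a small made-up design (x = 1, ..., 10 and σ = 2 are illustrative values, not from the lecture); the function names are hypothetical. It shows that the standard deviation of the slope estimate is the same whether it is written with the sum of squares or with n and σX.

```python
import math

def slope_sd_from_x(sigma, xs):
    """sd of the OLS slope written as sigma / sqrt(sum((x - xbar)^2))."""
    xbar = sum(xs) / len(xs)
    ss = sum((x - xbar) ** 2 for x in xs)
    return sigma / math.sqrt(ss)

def slope_sd_from_design(sigma, sigma_x, n):
    """The same quantity written as (sigma / sigma_x) / sqrt(n)."""
    return (sigma / sigma_x) / math.sqrt(n)

# Hypothetical design values, chosen only for illustration.
xs = list(range(1, 11))
sigma = 2.0
n = len(xs)
xbar = sum(xs) / n
# sigma_x uses divisor n (not n - 1), so that sum((x - xbar)^2) = n * sigma_x^2.
sigma_x = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)

print(slope_sd_from_x(sigma, xs))
print(slope_sd_from_design(sigma, sigma_x, n))
```

The two printed values agree, which is why the sample size n can be pulled out explicitly in the steps that follow.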
Step 2: Define the critical value
• For the univariate linear regression test:
$$CV = 0 + |z_\alpha|\,\frac{\sigma_0}{\sqrt{n\sigma_X^2}} = 0 + |z_\alpha|\,\frac{(\sigma_0/\sigma_X)}{\sqrt{n}}.$$
Step 3: Define the Rejection Rule
• Each test is a right-sided test, so the rule is to reject H0 when the test statistic is greater than the critical value.
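Steps 2 and 3 can be sketched in a few lines of Python. This is an illustration only: the function names and the numeric inputs (α = 0.05, σ0 = 10, σX = 5, n = 100) are assumptions made for the example, and the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def critical_value(alpha, sigma0, sigma_x, n):
    """Right-sided critical value: 0 + |z_alpha| * (sigma0 / sigma_x) / sqrt(n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # |z_alpha| for alpha <= 0.5
    return 0 + z_alpha * (sigma0 / sigma_x) / sqrt(n)

def reject_null(b1_hat, alpha, sigma0, sigma_x, n):
    """Rejection rule: reject H0 when the slope estimate exceeds the critical value."""
    return b1_hat > critical_value(alpha, sigma0, sigma_x, n)

# Hypothetical inputs, for illustration only.
cv = critical_value(alpha=0.05, sigma0=10.0, sigma_x=5.0, n=100)
print(cv)
print(reject_null(0.5, 0.05, 10.0, 5.0, 100))
print(reject_null(0.1, 0.05, 10.0, 5.0, 100))
```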
Step 4: Specify the Distribution
of Test Statistic under Alternative
• Use conditions of the alternative to find:
$$\hat{b}_1 \sim N\left(E_1,\ \frac{\sigma_1^2/\sigma_X^2}{n}\right).$$
Step 5: Define a Type II Error
• For the univariate linear regression test:
$$\hat{b}_1 \le CV = 0 + |z_\alpha|\,\frac{(\sigma_0/\sigma_X)}{\sqrt{n}}.$$
Step 6: Find β
• For a univariate linear regression test:
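$$\beta = \Pr\nolimits_1\left\{\frac{\hat{b}_1 - E(\hat{b}_1)}{\sigma(\hat{b}_1)} \le \frac{0 + |z_\alpha|\,\frac{(\sigma_0/\sigma_X)}{\sqrt{n}} - E_1}{\frac{(\sigma_1/\sigma_X)}{\sqrt{n}}}\right\}.$$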
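The probability in Step 6 is a standard normal probability, so it can be evaluated directly. The sketch below is one way to do that, assuming a right-sided test with E0 = 0; the function name `type2_error` and the numeric inputs are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type2_error(alpha, e1, sigma0, sigma1, sigma_x, n):
    """beta = Pr_1{ b1_hat <= CV } for the right-sided test with E0 = 0."""
    z = NormalDist()
    cv = 0 + z.inv_cdf(1 - alpha) * (sigma0 / sigma_x) / sqrt(n)  # Step 2
    sd_alt = (sigma1 / sigma_x) / sqrt(n)                         # Step 4
    return z.cdf((cv - e1) / sd_alt)                              # Step 6

# Hypothetical inputs, for illustration only.
beta = type2_error(alpha=0.05, e1=0.5, sigma0=10.0, sigma1=10.0, sigma_x=5.0, n=100)
print(beta)

# Sanity check: if the alternative mean equals the null mean, beta = 1 - alpha.
print(type2_error(alpha=0.05, e1=0.0, sigma0=10.0, sigma1=10.0, sigma_x=5.0, n=100))
```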
Basic Insight
• Notice that all three problems have the same
basic structure.
• That is, if you understand the solution of the
one sample test, then you can derive the
answer to the other problems.
Step 7: Phrase requirement on β
• For example, we seek to “choose n so that
β=0.01.”
• That is, “choose n so that Pr1{Accept H0}=β=0.01.”
Step 7: Phrase requirement on β
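• For example, we seek to choose n so that:
$$\Pr\nolimits_1\left\{\frac{\hat{b}_1 - E(\hat{b}_1)}{\sigma(\hat{b}_1)} \le \frac{0 + |z_\alpha|\,\frac{(\sigma_0/\sigma_X)}{\sqrt{n}} - E_1}{\frac{(\sigma_1/\sigma_X)}{\sqrt{n}}}\right\} = \beta.$$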
Step 7: Phrase requirement on β
• Notice the parallel phrasing:
$$\Pr\{Z \le -|z_\beta|\} = \beta.$$
Step 7: Phrase requirement on β
• That is, choose n so that (note that E0 = 0):
$$\frac{E_0 + |z_\alpha|\,\frac{(\sigma_0/\sigma_X)}{\sqrt{n}} - E_1}{\frac{(\sigma_1/\sigma_X)}{\sqrt{n}}} = -|z_\beta|.$$
Step 7: Phrase requirement on β
• That is, choose n so that (after algebraic
clearing out):
$$(E_1 - E_0)\sqrt{n} \ge |z_\alpha|\,\frac{\sigma_0}{\sigma_X} + |z_\beta|\,\frac{\sigma_1}{\sigma_X}.$$
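Solving the last inequality for n gives a sample size rule that is easy to compute. The following is a minimal sketch, assuming α and β are both at most 0.5 so that the quantiles are positive; the function name `required_n` and the example inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_n(alpha, beta, e0, e1, sigma0, sigma1, sigma_x):
    """Smallest n with |E1 - E0| * sqrt(n) >= |z_alpha|*sigma0/sigma_x + |z_beta|*sigma1/sigma_x."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # |z_alpha|, positive when alpha <= 0.5
    z_beta = z.inv_cdf(1 - beta)    # |z_beta|, positive when beta <= 0.5
    root_n = (z_alpha * sigma0 / sigma_x + z_beta * sigma1 / sigma_x) / abs(e1 - e0)
    return ceil(root_n ** 2)

# Hypothetical inputs, for illustration only.
n_needed = required_n(alpha=0.05, beta=0.10, e0=0.0, e1=0.5,
                      sigma0=10.0, sigma1=10.0, sigma_x=5.0)
print(n_needed)
```

Using |E1 − E0| keeps every entry positive, so the same function also covers the left-sided case.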
Step 8: State the conclusion
• The result for a left sided test has to be
worked through but is similar. You must
remember to keep all entries positive. This
is reasonable if both α and β are constrained
to be less than or equal to 0.5. The
restriction is not a hardship in practice.
Univariate Linear Regression
• Note that the σ0 factor is changed to σ0/σX.
• There is a similar adjustment for the
alternative standard deviation.
Example Problem Group
• Two hundred values of an independent variable $x_i$ are chosen so that $\sum (x_i - \bar{x})^2$ is equal to 400,000. For each setting of $x_i$, the random variable $Y_i = \beta_0 + \beta_1 x_i + \sigma Z_i$ is observed. Here $\beta_0$ and $\beta_1$ are fixed but unknown parameters, $\sigma = 400$, and the $Z_i$ are independent standard normal random variables.
Example Problem Group
• The null hypothesis to be tested is H0: β1=0,
α=0.01, and the alternative is H1: β1<0. The
random variable B1 is the OLS estimate of
β1.
Example Question 1
• When H0 is true, what is the standard
deviation of B1, the OLS estimate of the
slope?
• $\mathrm{Var}(B_1) = \sigma^2 / \sum (x_i - \bar{x})^2 = 400^2 / 400{,}000 = 0.4$.
• $\mathrm{sd}(B_1) = \sqrt{0.4} = 0.632$.
Example Question 2
• What is the probability of a Type II error in the test specified in the common section using $B_1$, the OLS estimator of the slope, as test statistic when $\beta_1 = -4$, $\alpha = 0.01$, $\sigma = 400$, and $\sum (x_i - \bar{x})^2$ is equal to 400,000?
Solution to Question 2
• The critical value is 0-2.326(0.632)=-1.47
• A Type II error occurs when B1>-1.47.
• Under alternative B1 is normal with
expected value -4 and standard deviation
(error) 0.632.
• Pr{B1>-1.47}=Pr{Z>(-1.47-(-4))/0.632}
=Pr{Z>4.00}=.000032
• The answer is 0.000032.
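The arithmetic in Questions 1 and 2 can be reproduced as follows. This is a sketch using the numbers stated in the problem; the variable names are hypothetical. Because the alternative is β1 < 0, the test is left-sided, so the critical value lies below 0 and a Type II error is the event B1 > CV.

```python
from math import sqrt
from statistics import NormalDist

sigma = 400.0       # error standard deviation from the problem statement
ssx = 400_000.0     # sum of (x_i - xbar)^2 from the problem statement
alpha = 0.01
beta1_alt = -4.0    # slope under the alternative

z = NormalDist()
sd_b1 = sqrt(sigma ** 2 / ssx)            # Question 1: sd of B1
cv = 0 - z.inv_cdf(1 - alpha) * sd_b1     # left-sided critical value
# Type II error: B1 > CV when B1 ~ N(-4, sd_b1^2).
type2 = 1 - z.cdf((cv - beta1_alt) / sd_b1)

print(round(sd_b1, 3))
print(round(cv, 2))
print(type2)
```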
Example Question 3
• How many observations n are necessary so that the probability of a Type II error in the test specified in the common section is at most 0.01 when $\beta_1 = -4$, $\alpha = 0.01$, $\sigma = 400$, and $\sum (x_i - \bar{x}_n)^2$ is equal to 2,000n?
Outline of Solution to Problem 3
• For the σ0 term, use σ0/σX = (400²/2000)^0.5 = 8.94.
• Use the same value for the σ1 term.
• Use |z0.01| = 2.326.
• Use |E1 − E0| = |−4 − 0| = 4.
• Square root of sample size is at least (2.326(8.94) + 2.326(8.94))/4 = 10.40.
• Sample size is 109 or more.
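The outline can be checked numerically. The sketch below assumes the target Type II error probability is 0.01, which is what the outline's single quantile |z0.01| = 2.326 and its final answer imply; the variable names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma = 400.0
sigma_x = sqrt(2000.0)   # since sum of (x_i - xbar_n)^2 = 2000 * n
alpha = 0.01
beta = 0.01              # assumed target, inferred from the outline
effect = abs(-4.0 - 0.0) # |E1 - E0|

z = NormalDist()
ratio = sigma / sigma_x  # the sigma0 / sigma_X term (same value for sigma1)
root_n = (z.inv_cdf(1 - alpha) * ratio + z.inv_cdf(1 - beta) * ratio) / effect
n_needed = ceil(root_n ** 2)

print(round(ratio, 2))
print(round(root_n, 2))
print(n_needed)
```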
Chapter 21: Residual Analysis
• The assumptions in regression may be violated, so they need to be checked:
– Residuals are one way of checking the model:
Ri = Yi − fitted value at xi
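As a concrete illustration of the residual definition, the following sketch fits a least-squares line and computes Ri = Yi − fitted value at xi. The data values and function names are made up for the example and are not from the lecture's data set.

```python
def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def residuals(xs, ys):
    """R_i = Y_i - (b0 + b1 * x_i)."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
res = residuals(xs, ys)
print(res)
print(sum(res))  # OLS residuals sum to zero up to rounding error
```

These residuals are the quantities that would be plotted or tested in the checks that follow.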
Checking the Assumptions
– Check for normality (test of normality,
histogram, q-q plots)
– Check that the variance is the same for all values of the independent variable (plot residuals against predicted values)
– Check independence (plot residuals against
sequence variable)
– Check for linearity (plot dependent variable
against independent variable)
Residual Plots
• Plot residuals against independent variable.
– Plot should be flat indicating the same variance.
– There should be no fanning out pattern.
– Check for influential observations.
• Plot residuals against predicted variable.
– For univariate regression this is the same as the
above plot. There should be no pattern.
What to do if there is a problem?
• Can look for transformations of either the independent or the dependent variable, or both.
• Using a computer this is easy: use the Compute option from the menu bar.
Influential Points
• An easier way to look for points that have a
large impact on the slope is to plot the
change in slope against an arbitrary case
sequence number.
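The slide does not say how the change in slope is computed. One common reading, assumed here, is leave-one-out deletion: refit the line without case i and record how far the slope moves. The sketch below uses made-up data with one extreme point; the function names are hypothetical.

```python
def slope(xs, ys):
    """OLS slope for one predictor."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sxx

def slope_changes(xs, ys):
    """Change in slope when each case is deleted in turn (assumed leave-one-out)."""
    full = slope(xs, ys)
    return [slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) - full
            for i in range(len(xs))]

# Hypothetical data: the last case is far from the rest.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
changes = slope_changes(xs, ys)
for case, change in enumerate(changes, start=1):
    print(case, round(change, 3))  # plot change against case number to spot spikes
```

A case whose deletion moves the slope much more than the others stands out as a spike in such a plot.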
Example
• Data set in the web page
• aim: predict final exam score from midterm
score
• dependent variable: final exam score
• independent variable: midterm score
• model, check assumptions, predict
[Figure: scatter plot of final examination score (vertical axis, 200 to 700) against score on first exam (horizontal axis, 0 to 300)]
Output
• Model: Y = b0 + b1 X + e
• R² = 0.508
• F statistic = 60.91, Significance = 0.0
• b1 = 1.391117, t statistic = 7.805, Significance = 0.0
• b0 = 238.95, t statistic = 8.329, Significance = 0.0
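The reported numbers can be cross-checked against each other, since in univariate regression the F statistic equals the square of the slope's t statistic and R² = F / (F + n − 2). The sketch below assumes n = 61, the count reported with the residual histogram in these slides; the midterm score of 200 used for the prediction is a hypothetical value.

```python
b0 = 238.95
b1 = 1.391117
t_b1 = 7.805
f_stat = 60.91
r2 = 0.508
n = 61  # assumed from the N reported with the residual histogram

t_squared = t_b1 ** 2                  # should be close to F
r2_from_f = f_stat / (f_stat + n - 2)  # should be close to R^2

def predict_final(midterm):
    """Predicted final exam score from the fitted line."""
    return b0 + b1 * midterm

print(round(t_squared, 2))
print(round(r2_from_f, 3))
print(round(predict_final(200), 2))  # hypothetical midterm score of 200
```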
[Figure: residuals (vertical axis, −200 to 200) plotted against predicted value (horizontal axis, 300 to 600)]
[Figure: histogram of residuals. Std. Dev = 66.68, Mean = 0.0, N = 61]
[Figure: Normal Q-Q Plot of Residual. Expected normal value (vertical axis, −3 to 3) against observed value (horizontal axis, −200 to 200)]
Next Class
• Multiple Regression!
• Check web site for your data file