5. Confidence Intervals
This chapter is primarily concerned with confidence intervals, but it also
introduces
two important topics related to the calculation of standard errors
and covariance matrices.
Section 5.2 introduces the basic idea of a confidence interval as a set of
parameter
values for which the null hypothesis that the parameter of interest is
equal to any value in the set would not be rejected by a test at a specified level.
It shows how to construct confidence intervals based on exact and asymptotic
t tests, and it justifies the usual interval that extends a certain number of
standard errors above and below the estimated parameter value. There is
also a brief discussion of P values for two-tailed tests based on asymmetric
distributions.
Section 5.3 deals with bootstrap confidence intervals. Although the treatment
is quite elementary, the ideas involved may be too advanced for some courses.
Section 5.4, which deals with confidence regions, may be considered a bit too
specialized, but we recommend that instructors at least discuss Figure 5.3 in
some detail. If students truly understand this figure, then they understand
the meaning of confidence intervals and confidence regions, even if they do
not necessarily know how to construct them in practice.
Section 5.5 introduces the concept of heteroskedasticity-consistent covariance
matrices, and more generally sandwich covariance matrices, which will reappear
in several places later in the book. Section 5.6 introduces the delta
method, which is a widely-used and very useful technique. It also returns
briefly to the subject of confidence intervals, showing how to construct
asymmetric
intervals via nonlinear transformations. The later parts of this section
could be omitted in less advanced courses.
Page 177
Hypothesis tests let us find out whether restrictions imposed by economic theory are compatible with the data, and whether various aspects of the model appear to be correct. Confidence intervals instead make inferences about the values of some of the parameters that appear in the model. It is usually more convenient to construct confidence intervals for the individual parameters of specific interest than to perform hypothesis tests.
Confidence regions are usually for two or more parameters jointly (hence the word "region" is needed).
If we wish to test the hypothesis that a scalar parameter θ in a regression model equals
0, we can use a t test. But we can also use a t test for the hypothesis that θ = θ0 for any
specified real number θ0. Thus, in this case, we have a family of t statistics indexed by θ0.
By definition, a confidence interval is an interval of the real line that contains all values θ0
for which the hypothesis that θ = θ0 is not rejected by the appropriate test in the family.
The confidence level is 1 − α.
Confidence intervals are themselves random.
The probability that the random interval includes, or covers, the true value of the
parameter is called the coverage probability, or just the coverage, of the interval.
Page 178
If the tests in the family reject their corresponding null hypotheses with probability
exactly equal to α when the hypothesis is true, then the coverage of the interval
constructed from this family of tests must be precisely 1 − α.
When the exact distribution of test statistics used to construct interval is known, the
coverage is equal to the confidence level, and the interval is exact.
Section 5.2 Exact and Asymptotic Confidence Intervals
If only the asymptotic distribution of the test statistic is known, we obtain an
asymptotic confidence interval, which may or may not be reasonably accurate in finite
samples. Whenever a test statistic based on asymptotic theory has poor finite-sample
properties, a confidence interval based on that statistic has poor coverage.
page 179
Denote the test statistic for the hypothesis that θ = θ0 by the random variable τ(y, θ0).
Now τ is just a deterministic function of its arguments y and θ0.
If we write the critical value as cα, then, for any θ0, eq (5.01) holds by the definition of cα.
The trick is to invert the test statistic, i.e., to find the limits θl and θu of the confidence interval!
For θ0 to belong to the confidence interval, it is necessary and sufficient that (5.02) holds.
The limits of the confidence interval can be found by solving the equation
(5.03) for θ. This equation normally has two solutions; for an example, see (5.05) on page 181.
One of these solutions is the upper limit θu of the confidence interval that we are trying
to construct, and the other is the lower limit θl.
A random function τ(y, θ) is said to be pivotal for model M if, when it is evaluated at the
true value θ0 corresponding to some DGP in model M, the result is a random
variable whose distribution does not depend on what that DGP is; that is, it does NOT
depend on the unknown parameters.
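As a rough illustration (not from the book), the usual t statistic is pivotal for the model of i.i.d. normal data: its distribution is t(n−1) whatever the unknown mean and variance are. A minimal R sketch, with made-up parameter values:

```r
# Sketch: the t statistic is pivotal for i.i.d. N(mu, sigma^2) data,
# so its distribution does not depend on the unknown (mu, sigma).
set.seed(123)
tstat <- function(mu, sigma, n = 10) {
  x <- rnorm(n, mu, sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))   # evaluated at the true mu
}
t1 <- replicate(5000, tstat(mu = 0, sigma = 1))
t2 <- replicate(5000, tstat(mu = 50, sigma = 9))
# Both simulated 0.95 quantiles should be close to qt(0.95, df = 9):
c(quantile(t1, 0.95), quantile(t2, 0.95), qt(0.95, df = 9))
```

Whatever (μ, σ) generated the data, the simulated quantiles agree with the t(9) quantile, which is exactly what pivotalness means.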
page 180
When we speak of critical values, we are implicitly making use of the concept of a quantile.
F denotes the cumulative distribution function, or CDF.
The α quantile qα of F, for 0 ≤ α ≤ 1, satisfies the equation
F(qα) = α.
See p. 181 for the quantile function of N(0,1). In R, it is simply qnorm.
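For example, the standard normal quantile function and CDF in R invert each other:

```r
# qnorm is the quantile function (inverse CDF) of N(0,1); pnorm is the CDF.
# The alpha quantile q_alpha solves F(q_alpha) = alpha.
qnorm(0.975)          # about 1.96
pnorm(qnorm(0.975))   # recovers alpha = 0.975
```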
Asymptotic Confidence Intervals
What are they? Suppose the random function is (5.04), τ(y, θ0) = [(θ^ − θ0)/sθ]², where τ is a test statistic.
Thus τ is the square of the t statistic for the null hypothesis that θ = θ0.
The asymptotic critical value cα would be the 1 − α quantile of the chi-squared
distribution with 1 df.
(5.03), τ(y, θ) = cα, becomes [(θ^ − θ)/sθ]² = cα. Now take the square root of both sides and
keep |θ^ − θ| on the left side, where we need the absolute value since the sign of the
square root is unknown.
There are two solutions to equation (5.05), because the square root has two signs (±); they are given in (5.06).
page 182:
the suitable quantile cα for the chi-squared distribution is 3.8415; a square root is involved, which gives the
familiar 1.96!
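This can be checked directly in R:

```r
c_alpha <- qchisq(0.95, df = 1)  # 0.95 quantile of chi-squared with 1 df
c_alpha        # 3.841459
sqrt(c_alpha)  # 1.959964, the familiar 1.96 = qnorm(0.975)
```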
Two-tailed test: for such a test, there are two critical values, one the negative of the
other. It is conventional to denote these quantiles of the standard normal
distribution by zα/2 and z1−α/2, respectively.
Equation (5.03), which has two solutions for a chi-squared test, is replaced by two equations,
each with just one solution, as in (5.07). Note the ± signs.
Asymmetric Confidence Intervals page 183
The interval (5.06), which is the same as the interval (5.08), is a symmetric one, because θl
is as far below θ^ as θu is above it.
There is probability mass of α/2 in each tail; this is called an equal-tailed confidence interval.
An interval is asymmetric in the sense that we need two separate quantiles. The lower quantile,
denoted cα−, is the α/2 quantile of the distribution, and the upper quantile,
cα+, is the 1 − α/2 quantile, obtained coming backward from the maximum
probability 1.
Any realized statistic τ^ leads to rejection at level α if τ^ is either too small or too big.
This can be the basis of an asymmetric confidence interval.
The one-sided null θ ≤ θ0 is not rejected if
τ(y, θ0) ≤ z1−α,
that is, if
(θ^ − θ0)/sθ ≤ z1−α.
The interval over which θ0 satisfies this inequality is just (5.09), or
[θ^ − sθ z1−α, +∞). We reject the null if the statistic is in the upper tail of the distribution,
based on the upper quantile z1−α.
Note that (5.09) is a one-sided interval, since its upper limit is infinity.
P Values
Page 184
If we denote by F the CDF used to calculate critical values or P values, the P value
associated with a statistic τ^ should be 2F(τ^) (we must double since the test is two-sided)
if τ^ is in the lower tail. On the other hand, if it happens to be in the upper tail, the answer
after similar doubling must be 2(1 − F(τ^)). Which is it?
The answer is (5.10): simply the minimum of the two answers! (At the median, where
F(τ^) = 0.5, both formulas give P = 1.)
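A minimal R sketch of (5.10); the function name two_sided_p is mine, and the default CDF is standard normal:

```r
# Two-sided P value: twice the smaller of the two tail areas under the CDF.
two_sided_p <- function(tau, cdf = pnorm) {
  2 * min(cdf(tau), 1 - cdf(tau))
}
two_sided_p(1.959964)    # about 0.05 (upper tail)
two_sided_p(-1.959964)   # same value, from the lower tail
two_sided_p(0)           # at the median both formulas give P = 1
```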
Exact Confidence Intervals for Regressions
The confidence interval we are seeking is (5.13).
This interval may look a bit odd, because the upper limit is obtained by subtracting
something from β^. But what is subtracted is negative, and therefore it is OK.
Page 185
It may seem strange that the lower and upper limits of (5.13) depend, respectively, on the
upper-tail and lower-tail quantiles of the Student's t distribution with n − k degrees of
freedom, t(n−k). This actually makes perfect sense, however, as can be seen by looking at
the infinite confidence interval (5.09) based on a one-tailed test.
Since the null is θ ≤ θ0, the confidence interval must be open out to +∞. The null is rejected
when the observed quantity is beyond the upper critical value, so it must be the upper-tail
quantile that determines the only finite limit of the confidence interval, namely, the lower limit.
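A sketch of the exact interval (5.13) in R, with made-up values of the estimate, its standard error, and the degrees of freedom:

```r
beta_hat <- 2.1; s <- 0.5; df <- 40; alpha <- 0.05  # hypothetical values
lower <- beta_hat - s * qt(1 - alpha/2, df)  # upper-tail t quantile
upper <- beta_hat - s * qt(alpha/2, df)      # lower-tail quantile, negative
c(lower, upper)
# Because qt(alpha/2, df) = -qt(1 - alpha/2, df), this is the usual
# beta_hat +/- s * t_{1-alpha/2}(n-k) interval.
```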
5.3 BOOTSTRAP INTERVALS
Page 186
When the distribution of the test statistic does not depend on precisely which null is being
tested, the same bootstrap distribution can be used for a whole family of tests defined in
terms of a pivotal random function τ(y, θ0).
Make sure that τ(·) is evaluated at the same value of θ0 as the one used to generate the
many (= B) bootstrap samples. Bootstrap samples y*j are created for j = 1 to B; then the
estimates θ*j are computed, and for each j we have θ*j and its standard error s*j (from the
OLS algorithm).
Compute the bootstrap "t statistic" by (5.15), where everything is known.
See also page 195, eq (5.29).
If τ(·) is an exact pivot, the change of null from θ0 to θ^ makes no difference.
P 187
ASYMMETRIC Bootstrap Confidence Intervals
The bootstrap P value is, from (5.10), given by (5.16): twice the smaller of the two tail areas
under the empirical CDF of the bootstrap distribution.
Look at the equation in the middle of page 187!
r(θ0) is the number of bootstrap t statistics that are less than or equal to t^(θ0).
For θ0 sufficiently large, r(θ0) falls below αB/2, so there exists a greatest value of θ0 for
which the bootstrap P value is at least α. This value must be the upper
limit of the 1 − α bootstrap confidence interval.
Sort the t*j from smallest to largest,
and look at the equation at the bottom of page 187.
The upper limit of the confidence interval is determined by the lower tail of the bootstrap
distribution.
Pages 188-189
B = 999 is convenient. With α = 0.05, the value of c*α/2 is then the value of the 25th bootstrap
t statistic when they are sorted in ascending order, and the 975th entry in the sorted list
gives c*1−α/2, which determines the other limit.
The interval (5.17) is almost never symmetric.
(5.17) is called a studentized bootstrap confidence interval: a quantity is studentized when
it is the ratio of a random variable to its standard error. It is also sometimes called a
percentile-t or bootstrap-t confidence interval.
These intervals have good theoretical properties.
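A self-contained R sketch of a studentized bootstrap interval for a slope coefficient, using a made-up DGP and a simple pairs (resample-the-rows) bootstrap rather than the book's exact resampling scheme:

```r
set.seed(1)
n <- 50; B <- 999; alpha <- 0.05
x <- rnorm(n); y <- 1 + 0.5 * x + rnorm(n)   # made-up DGP
fit <- lm(y ~ x)
beta_hat <- coef(fit)[2]
s_hat <- summary(fit)$coefficients[2, 2]     # OLS standard error
tstar <- numeric(B)
for (j in 1:B) {
  idx <- sample(n, replace = TRUE)           # pairs bootstrap resample
  fj <- lm(y[idx] ~ x[idx])
  tstar[j] <- (coef(fj)[2] - beta_hat) / summary(fj)$coefficients[2, 2]
}
q <- sort(tstar)                             # sorted bootstrap t statistics
c_lo <- q[(B + 1) * alpha / 2]               # 25th entry: c*_{alpha/2}
c_hi <- q[(B + 1) * (1 - alpha / 2)]         # 975th entry: c*_{1-alpha/2}
# As in (5.17): the upper limit uses the lower-tail bootstrap quantile.
ci <- c(beta_hat - s_hat * c_hi, beta_hat - s_hat * c_lo)
ci
```

Note how the 975th sorted statistic ends up in the lower limit and the 25th in the upper limit, exactly the inversion discussed above.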
Page 189
Studentized bootstrap confidence intervals generally work better.
However, their coverage can be quite unsatisfactory in finite samples if the quantity τ
is far from being pivotal, as can happen if the distributions of either θ^ or the standard
error sθ depend strongly on the true unknown values of θ or σ.
5.4 CONFIDENCE REGIONS
To construct a confidence region, we must invert joint tests for several parameters.
These are usually tests based on statistics that follow F or χ² distributions.
p. 190
The Wald statistic (5.18) can be used to test the joint null hypothesis that θ = θ0
in vector notation. Its asymptotic distribution is χ²(k).
ASYMPTOTIC NORMALITY AND ROOT-n CONSISTENCY
The vector v ≡ n^(-1/2) XTu, also defined earlier in (4.53), follows the normal distribution
asymptotically, with mean vector 0 and covariance matrix σ0² SXTX, where SXTX is the plim
of n^(-1) XTX as the sample size n tends to infinity.
Consider now the estimation error of the vector of OLS estimates,
eq (5.20).
β^ is consistent under fairly weak conditions; assume that it is consistent.
The limiting covariance matrix of β^ − β0 is then a degenerate zero matrix. Thus it would
appear that asymptotic theory has nothing to say about limiting variances for consistent
estimators, because a zero covariance matrix means some kind of degeneracy.
To avoid the degeneracy, we rewrite (5.20) as
(20b), scaled by n^(1/2). Since the terms involving SXTX are non-stochastic, the variance of
the scaled error simplifies as in (20c).
page 191
The estimation error is, asymptotically, just a deterministic linear combination of the
components of the multivariate normal random vector v. Hence we may conclude that
(5.21) holds: n^(1/2) (β^ − β0) is asymptotically normal with zero mean and a nonzero
covariance matrix.
The result (5.21) also gives us the rate of convergence of β^ to its probability limit β0.
Since n^(1/2) times the estimation error has a nondegenerate limiting distribution, the
estimation error β^ − β0 itself tends to zero at the same rate as n^(-1/2).
Another way of saying this is that the estimator β^ is root-n consistent.
The estimator Var^(β^) is said to be a consistent estimator of the covariance matrix of β^ if
(5.22) holds. Note the extra factors of n in (5.23): n twice in the numerator and once in the
denominator.
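A small Monte Carlo sketch of root-n consistency, under a made-up DGP: quadrupling the sample size should roughly halve the standard deviation of the OLS slope.

```r
set.seed(42)
sim_sd <- function(n, R = 1000) {
  sd(replicate(R, {
    x <- rnorm(n)
    y <- 0.5 * x + rnorm(n)       # made-up DGP with true slope 0.5
    coef(lm(y ~ x))[2]
  }))
}
s100 <- sim_sd(100)
s400 <- sim_sd(400)
s400 / s100    # should be close to sqrt(100/400) = 1/2
```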
page 192
Test 2 = some fixed constant 20 not necessarily zero. All we have to do is subtract the
X2 20 from the left side before the regression and then test for zero coeff for 2
If RHS of the null hyp is NON-zero, then we have to subtract X2 times that RHS from
the y and rewrite (24) as (25). Now test for 2 =0
The F statistic for this hypothesis takes the form (5.26).
To explain the numerator of (5.26): the middle matrix expression comes from the FWL
Theorem, as follows. (5.24) is, by the FWL Theorem, equivalent to the regression of M1y
on M1X2. Thus Var(β2^) is equal to an expression involving that middle term with M1.
Hence the statistic (5.26) follows the F distribution when the null hypothesis is true, and
therefore we can use it to construct an exact confidence region.
The confidence region is, for k2 = 2, the interior of an ellipse.
Information about one of the parameters also provides information about the other; only
the confidence region, based on the joint distribution, allows this to be taken into account.
Think of the relation between the confidence ellipse and the region: when jointness is
present, we have to use regions, not intervals, even though the latter are more convenient.
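A sketch of checking whether a point lies inside a Wald-type joint confidence region; the function name and numbers are illustrative, and with an estimated covariance matrix from a finite-sample regression one would use the F form rather than the asymptotic chi-squared form:

```r
# Is theta0 inside the asymptotic 1 - alpha Wald confidence region?
inside_region <- function(theta0, theta_hat, V, alpha = 0.05) {
  d <- theta_hat - theta0
  W <- drop(t(d) %*% solve(V) %*% d)       # Wald statistic (5.18)-style
  W <= qchisq(1 - alpha, df = length(d))   # chi-squared critical value
}
V <- matrix(c(0.04, 0.01, 0.01, 0.09), 2, 2)  # hypothetical covariance matrix
inside_region(c(1, 2), c(1.1, 2.2), V)        # TRUE: inside the ellipse
inside_region(c(0, 0), c(1.1, 2.2), V)        # FALSE: far outside
```

The set of all θ0 for which this returns TRUE is exactly the elliptical region discussed above.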
Asymptotic / Bootstrap Confidence Regions
The key difference between the two is in the right-hand side of (5.28) vs. (5.30): the
asymptotic distribution (say the F distribution) gives the right-hand side of (5.28), while
the bootstrap critical value carries a star because it is found by ordering the bootstrap
statistics τ*j from smallest to largest.
Exercises to Ch. 5
www.econ.queensu.ca/ETM has the earnings data. Download it, remove the bottom
descriptions, and save it in a subdirectory called dmck on your C: drive. Be sure not to
have extra lines at the bottom of the file. Then issue the following commands:
erng <- read.table("C:/dmck/earningsdat.asc")
lme <- lm(erng$V5 ~ erng$V2 + erng$V3 + erng$V4)
attach(erng)
lme <- lm(V5 ~ V2 + V3 + V4 - 1)   # refit without an intercept
summary(lme)
Call:
lm(formula = V5 ~ V2 + V3 + V4 - 1)
Residuals:
Min 1Q Median 3Q Max
-27404 -11466 -2766 8019 60930
Coefficients:
   Estimate Std. Error t value Pr(>|t|)
V2  22880.5      467.5   48.95   <2e-16 ***
V3  25080.2      380.0   66.00   <2e-16 ***
V4  27973.6      404.8   69.11   <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 15570 on 4263 degrees of freedom
Multiple R-Squared: 0.73, Adjusted R-squared: 0.7298
F-statistic: 3842 on 3 and 4263 DF, p-value: < 2.2e-16
confint(lme)              # prints the 95% confidence intervals
confint(lme, level=0.99)  # for any other level
library(simpleboot)
bootlme = lm.boot(lme, 999)
After a long time I got the following output:
BOOTSTRAP OF LINEAR MODEL (method = rows)
Original Model Fit
------------------
Call:
lm(formula = V5 ~ V2 + V3 + V4 - 1)
Coefficients:
V2 V3 V4
22880 25080 27974
Bootstrap SD's:
      V2       V3       V4
455.0526 389.5131 426.1600
library(sandwich)          # vcovHC lives in the sandwich package
vcovHC(lme, type="HC1")    # heteroskedasticity-consistent covariance matrix
         V2       V3       V4
V2 194168.0      0.0      0.0
V3      0.0 139440.6      0.0
V4      0.0      0.0 184261.6