5 - Hypothesis Testing in the Linear Model

A Logic of Prediction and Evaluation
Thursday, February 16, 2012
5:12 PM
One goal of science: determine whether current ways of thinking about the world are
adequate for predicting and understanding events
Until this point, we've been focused on determining the relationships between variables by fitting an appropriate model. Today, we'll focus on ways to determine whether our observations of the world (or, more accurately, our empirical model thereof) match the predictions generated by theory.
• Not all models have to predict all things…
• …but a model probably has to predict something to be useful for anything.
5 - Hypothesis Testing in the Linear Model Page 1
Comparing Predictions to Events
Thursday, February 16, 2012
5:21 PM
At first, comparing predictions to observational data seems straightforward...
• Theory generates comparative static predictions of the form ∂y/∂x > 0 (or < 0)
• We estimate a linear model on the data and determine the empirical derivative, β̂
• Maybe we can just compare the empirical derivative to the comparative
static prediction and look for a match!
Upon reflection, this is a bit harder than you might think. Suppose we find a β̂ that is positive but close to zero. Would we reject the hypothesis that β = 0? If not, two reasons: the estimated effect may be too small to matter substantively, and the estimate β̂ is itself a random quantity.
Classical Hypothesis Testing
Thursday, February 16, 2012
6:02 PM
Classical hypothesis testing focuses on the second reason, a result's statistical
significance.
The majority of work in statistics (and our topic today) is about establishing statistical significance; for our purposes it's only important to recognize that this is different from substantive significance.
The reasons why we need a test of statistical significance...
Why is the randomness of our estimate β̂ of β problematic?
Distribution of β̂
Thursday, February 16, 2012
6:05 PM
[Figure: sampling distribution of β̂]
The average value of β̂ is greater than zero, but there is some proportion of the
time that we will observe β̂ ≈ 0 or even a negative value of β̂… which could give
misleading results!
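The point can be shown with a quick simulation. This is a hypothetical setup, not from the lecture: true β = 0.5, n = 20, error standard deviation 2. Even though the true effect is positive, a noticeable share of OLS slope estimates land at or below zero:

```python
import random
import statistics

# Hypothetical setup (not from the lecture): true beta = 0.5, n = 20,
# error sd = 2. Re-estimate the OLS slope over many simulated samples.
random.seed(42)

def ols_slope(x, y):
    """OLS slope for a bivariate regression with an intercept."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

beta, n, draws = 0.5, 20, 2000
slopes = []
for _ in range(draws):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [beta * xi + random.gauss(0, 2) for xi in x]
    slopes.append(ols_slope(x, y))

share_nonpositive = sum(s <= 0 for s in slopes) / draws
print(f"mean estimate: {statistics.fmean(slopes):.3f}")
print(f"share of estimates <= 0: {share_nonpositive:.3f}")
```

The mean of the estimates sits near the true β, but a nontrivial fraction of individual samples would point the wrong way.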
Potential Problems
Thursday, February 16, 2012
6:47 PM
Presume that we have a theoretical prediction that β > 0. There are four
possibilities:

State of the world    Estimated effect    Outcome
β > 0                 significant         true positive
β > 0                 not significant     false negative
β ≤ 0                 significant         false positive
β ≤ 0                 not significant     true negative
Ranking of Problems
Thursday, February 16, 2012
6:49 PM
Statistical tests are generally designed to minimize false positives at the
expense of false negatives… why?
When might another set of tradeoffs apply?
We can show the tradeoff between power (ability to avoid false negatives) and size (ability to avoid false positives) in an explicit analysis, and will do so later in the lecture.
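A minimal sketch of that tradeoff, assuming a one-sided z-test where the estimator is normal with a known standard error (the numbers se = 0.25 and true β = 0.5 are invented for illustration): tightening α cuts false positives but mechanically lowers power:

```python
import random
from statistics import NormalDist

# Hypothetical one-sided z-test: beta_hat ~ Normal(beta, se), H0: beta <= 0.
# All numbers (se, beta_true, draw counts) are invented for illustration.
random.seed(1)
se = 0.25          # assumed (known) standard error of beta_hat
beta_true = 0.5    # assumed true effect under the alternative
draws = 20000

def rejection_rate(beta, alpha):
    """Share of simulated estimates that reject H0 at level alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = sum(
        random.gauss(beta, se) / se >= z_crit for _ in range(draws)
    )
    return rejections / draws

for alpha in (0.10, 0.05, 0.01):
    size = rejection_rate(0.0, alpha)         # false-positive rate
    power = rejection_rate(beta_true, alpha)  # 1 - false-negative rate
    print(f"alpha={alpha:.2f}  size={size:.3f}  power={power:.3f}")
```

The simulated size tracks α, while power falls as α shrinks: the tradeoff the lecture describes.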
The Logic of Hypothesis Testing
Thursday, February 16, 2012
7:53 PM
• Null: β ≤ 0, Alternative: β > 0
• Only accept the alternative if Pr(β̂ ≥ β̂_obs | β = 0) ≤ α, where α is some
critical value that gives the chance of a Type I error / false positive
[Figure: distribution of β̂ under the null β = 0, with the rejection region in the upper tail]
• Conventionally, α = 0.05
• The probability of a false negative is never assessed, and it can be
different from case to case even when α is constant!
○ We'll return to that shortly.
The distribution of β̂ under the null
Thursday, February 16, 2012
8:01 PM
One little problem: how did we draw Pr(β̂ | β) in the previous graph?
Answer: we don't necessarily know the distribution (at least, I haven't told you
anything about it yet), but we can write down a statistic for which we do know it.
The z-statistic is a standardization of the distance between the observed β̂ and the null prediction of β:

z = (β̂ − β⁰) / √Var(β̂) ~ N(0, 1)

We can prove that we know the distribution of the z-statistic, for any sample size.
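A minimal numeric sketch of the calculation, assuming Var(β̂) is known; the values β̂ = 0.9, β⁰ = 0, and se = 0.4 are invented for illustration:

```python
from statistics import NormalDist

# Minimal z-test sketch; beta_hat, beta0, and se are invented numbers,
# and the sampling variance of beta_hat is assumed known.
beta_hat = 0.9
beta0 = 0.0
se = 0.4  # sqrt of the known sampling variance of beta_hat

z = (beta_hat - beta0) / se              # standardized distance
p_one_sided = 1 - NormalDist().cdf(z)    # Pr(Z >= z | beta = beta0)
print(f"z = {z:.3f}, one-sided p = {p_one_sided:.4f}")
```

With these numbers the one-sided p-value falls just below the conventional 0.05 cutoff.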
CLNRM
Thursday, February 16, 2012
8:13 PM
When we assume that u ~ N(0, σ²I), we have the Classical Normal Linear
Regression Model
• Important note: up to this point, we have referred to the CLRM -- the CLRM
= the CLNRM without the normal error assumption
• The z-statistic and all tests that flow from similar ideas depend on the CLNRM
assumptions (at least in small samples); hence hypothesis tests can be
misleading if these assumptions do not all hold
The t-statistic: a product of the CLNRM
Thursday, February 16, 2012
8:32 PM
Problem with the z-statistic: it assumes a known σ², and we don't actually know σ². Recall the formula we derived for estimating σ̂².
It turns out that any quantity z / √(x/k), where z ~ N(0,1) and x ~ χ²(k), and where z
and x are statistically independent, takes the t(k) distribution
So we have:

t = (β̂ − β⁰) / √Var̂(β̂) ~ t(n − k)
• The red quantity is N(0,1), as we saw for the z formula
• The blue quantity is χ²-distributed because any quantity of the form v′v
(where v is k × 1) is χ²(k) if v ~ N(0,1)
As n → ∞, the t distribution approaches N(0,1)
Hypothesis testing: The bottom line
Thursday, February 16, 2012
8:58 PM
What this all boils down to is: to conduct a test of statistical significance for a
particular coefficient β̂ᵢ, if one is willing to accept the CLNRM assumptions in a
small sample, one can calculate:

t = (β̂ᵢ − βᵢ⁰) / √Var̂(β̂ᵢ)

• We can then use the t distribution to calculate
Pr(t(β̂ᵢ) ≥ t_obs | βᵢ⁰)
• This means we can effectively compare t(β̂ᵢ) to a critical value t_c
such that Pr(t ≥ t_c | βᵢ⁰) = α
• If α = 0.05, then t_c = 1.645 (one-tailed)
• If α = 0.025, then t_c = 1.96, which puts a total of 0.05 probability in
both tails (0.025 in each tail)
• If t(β̂ᵢ) > t_c, then β̂ᵢ is statistically significant
• Remember: Var̂(β̂ᵢ) comes out of the ith diagonal element of the VCV matrix,
σ̂²(X′X)⁻¹
• As n → ∞, t-testing is equivalent to z-testing, so Stata and R always
report t statistics
• R and Stata do most of this for you
○ R: summary command
○ Stata: regress command prints the relevant statistics
• Examples!
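As a sketch of what those commands compute under the hood for a bivariate regression, using made-up data and only the standard library; the large-sample critical value 1.96 stands in here for the exact small-sample t cutoff:

```python
import statistics

# Sketch of the t-test pipeline for y = a + b*x + u, with invented data;
# this mirrors what R's summary() or Stata's regress would report.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 2.3, 2.8, 4.5, 4.9, 6.2, 6.8, 8.1]

n = len(x)
xbar, ybar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
alpha_hat = ybar - beta_hat * xbar

residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)  # k = 2 parameters
se_beta = (sigma2_hat / sxx) ** 0.5  # from the diagonal of s2*(X'X)^-1

t_stat = (beta_hat - 0.0) / se_beta  # H0: beta = 0
print(f"beta_hat = {beta_hat:.3f}, se = {se_beta:.3f}, t = {t_stat:.2f}")
print("significant at 5% (two-sided, large-n approx):", abs(t_stat) > 1.96)
```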
Multiple coefficient tests
Thursday, February 16, 2012
9:06 PM
What if multiple β values (or all the coefficients simultaneously) are tested for
statistical significance?
For a model y = X₁β₁ + X₂β₂ + u, where X₁ and X₂ are n × k₁ and n × k₂ blocks
of variables:
• Null hypothesis: β₂ = 0
• Alternative hypothesis: at least one element of β₂ ≠ 0
The F-test compares the residuals from two regressions: the unrestricted model above and the restricted model y = X₁β₁ + v, which imposes the null.
The F-statistic is:

F = [(v̂′v̂ − û′û) / k₂] / [û′û / (n − k₁ − k₂)]

...this depends on the CLNRM assumptions because we need u and v to be normally
distributed in order for their sums of squares to be distributed χ².
Examples!