A Logic of Prediction and Evaluation
Thursday, February 16, 2012 5:12 PM

One goal of science: determine whether current ways of thinking about the world are adequate for predicting and understanding events.

Until this point, we've focused on determining the relationships between variables by fitting an appropriate model. Today, we'll focus on ways to determine whether our observations of the world (or, more accurately, our empirical model thereof) match the predictions generated by theory.

• Not all models have to predict all things…
• …but a model probably has to predict something to be useful for anything.

5 - Hypothesis Testing in the Linear Model Page 1

Comparing Predictions to Events
Thursday, February 16, 2012 5:21 PM

At first, comparing predictions to observational data seems straightforward...

• Theory generates comparative static predictions of the form $\partial y / \partial x > 0$.
• We estimate a linear model on data and determine the empirical derivative $\hat{\beta}$.
• Maybe we can just compare the empirical derivative to the comparative static prediction and look for a match!

Upon reflection, this is a bit harder than you might think. Suppose we find $\hat{\beta} > 0$. Would we reject the hypothesis that $\beta = 0$? If not, two reasons:

• The estimated effect may be too small to matter in practice (substantive significance).
• The estimate is a random quantity that can differ from the true $\beta$ by chance (statistical significance).

Classical Hypothesis Testing
Thursday, February 16, 2012 6:02 PM

Classical hypothesis testing focuses on the second reason, a result's statistical significance. The majority of work in statistics (and our topic today) is about establishing statistical significance; for our purposes it's only important to recognize that this is different from substantive significance.

The reasons why we need a test of statistical significance... Why is randomness of our estimate of $\beta$ problematic?
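This question is easy to explore by simulation. The sketch below is not from the notes: it invents a data-generating process with true $\beta = 0.5$ (the sample size and error scale are arbitrary assumptions) and shows that OLS estimates scatter around the truth, sometimes landing near zero or even below it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 5000           # small samples, many replications (arbitrary choices)
beta_true, sigma = 0.5, 2.0  # assumed DGP: y = 0.5 * x + noise

estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = beta_true * x + rng.normal(scale=sigma, size=n)
    # OLS slope = sample Cov(x, y) / sample Var(x)
    estimates[r] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print("mean of beta-hat:", round(estimates.mean(), 3))
print("share of runs with beta-hat < 0:", round((estimates < 0).mean(), 3))
```

The average estimate sits near the true 0.5, yet a nontrivial fraction of individual samples deliver a negative slope even though the true effect is positive.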
Distribution of $\hat{\beta}$
Thursday, February 16, 2012 6:05 PM

[Figure: the sampling distribution of $\hat{\beta}$.]

The average value of $\hat{\beta}$ is greater than zero, but there is some proportion of the time that we will observe $\hat{\beta} \approx 0$ or even a negative value of $\hat{\beta}$… chance could give misleading results!

Potential Problems
Thursday, February 16, 2012 6:47 PM

Presume that we have a theoretical prediction that $\beta > 0$. There are four possibilities:

State of the world   Estimated effect                        Outcome
$\beta > 0$          $\hat{\beta}$ significantly $> 0$       true positive (correct)
$\beta > 0$          $\hat{\beta}$ not significantly $> 0$   false negative (Type II error)
$\beta \le 0$        $\hat{\beta}$ significantly $> 0$       false positive (Type I error)
$\beta \le 0$        $\hat{\beta}$ not significantly $> 0$   true negative (correct)

Ranking of Problems
Thursday, February 16, 2012 6:49 PM

Statistical tests are generally designed to minimize false positives at the expense of false negatives… why? When might another set of tradeoffs apply?

We can show the tradeoff between power (the ability to avoid false negatives) and size (the probability of a false positive) in an explicit analysis, and will do so later in the lecture.

The Logic of Hypothesis Testing
Thursday, February 16, 2012 7:53 PM

• Null: $\beta \le 0$. Alternative: $\beta > 0$.
• Only accept the alternative if $\Pr(\hat{\beta} \ge \hat{\beta}_{obs} \mid \beta = 0) \le \alpha$, where $\alpha$ is the chance of a Type I error / false positive.

[Figure: the distribution of $\hat{\beta}$ under the null, with the rejection region in the upper tail.]

• Conventionally, $\alpha = 0.05$.
• The probability of a false negative is never assessed--and it can be different from case to case even when $\alpha$ is constant!
  ○ We'll return to that shortly.

The distribution of $\hat{\beta}$ under the null
Thursday, February 16, 2012 8:01 PM

One little problem: how did we draw $\Pr(\hat{\beta} \mid \beta)$ in the previous graph?
Answer: we don't necessarily know the distribution (at least, I haven't told you anything about it yet), but we can write down a statistic whose distribution we do know.

The z-statistic is a standardization of the distance between the observed $\hat{\beta}$ and the null prediction of $\beta$:

$$z = \frac{\hat{\beta}_j - \beta_j^0}{\sqrt{\operatorname{Var}(\hat{\beta}_j)}}$$

We can prove that we know the distribution of the z-statistic -- $N(0, 1)$ -- for any sample size.

CLNRM
Thursday, February 16, 2012 8:13 PM

When we assume that $\varepsilon \sim N(0, \sigma^2 I)$, we now have the Classical Normal Linear Regression Model.

• Important note: up to this point, we have referred to the CLRM -- the CLRM = the CLNRM without the normal error assumption.
• The z-statistic and all tests that flow from similar ideas depend on CLNRM assumptions (at least, in small samples); hence hypothesis tests can be misleading if (all) of these assumptions do not hold.

The t-statistic: a product of the CLNRM
Thursday, February 16, 2012 8:32 PM

Problem with the z-statistic: it assumes a known $\sigma^2$, and we don't actually know $\sigma^2$. Recall the formula we derived for estimating it: $\hat{\sigma}^2 = \hat{\varepsilon}'\hat{\varepsilon}/(n-k)$.

It turns out that any quantity $x \big/ \sqrt{y/m}$, where $x \sim N(0,1)$ and $y \sim \chi^2(m)$, and where $x$ and $y$ are statistically independent, takes the $t(m)$ distribution. So we have:

$$t = \frac{\hat{\beta}_j - \beta_j^0}{\sqrt{\hat{\sigma}^2\,[(X'X)^{-1}]_{jj}}} = \frac{(\hat{\beta}_j - \beta_j^0)\big/\sqrt{\sigma^2 [(X'X)^{-1}]_{jj}}}{\sqrt{\big(\hat{\varepsilon}'\hat{\varepsilon}/\sigma^2\big)\big/(n-k)}} \sim t(n-k)$$

• The numerator, $(\hat{\beta}_j - \beta_j^0)\big/\sqrt{\sigma^2 [(X'X)^{-1}]_{jj}}$, is $N(0,1)$, as we saw for the z formula.
• The quantity $\hat{\varepsilon}'\hat{\varepsilon}/\sigma^2$ in the denominator is distributed $\chi^2(n-k)$, because any quantity of the form $v'v$ (where $v$ is $m \times 1$) is $\chi^2(m)$ if $v \sim N(0, I)$.

As $n \to \infty$, the t distribution approaches $N(0,1)$.

Hypothesis testing: The bottom line
Thursday, February 16, 2012 8:58 PM

What this all boils down to: to conduct a test of statistical significance for a particular coefficient $\hat{\beta}_j$, if one is willing to accept the CLNRM assumptions in a small sample, one can calculate:

$$t(\hat{\beta}_j) = \frac{\hat{\beta}_j - \beta_j^0}{\widehat{se}(\hat{\beta}_j)}$$
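As a concreteness check (not part of the original notes; the data are simulated and all variable names are my own), the t-statistic can be computed directly from the OLS formulas, and scipy's t distribution shows its critical values approaching the normal ones as the degrees of freedom grow.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 50, 2                       # n observations, k = 2 columns (intercept + slope)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)   # assumed true slope of 0.5

# OLS estimates and the estimated VCV matrix, sigma^2-hat * (X'X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y
resid = y - X @ b_hat
sigma2_hat = resid @ resid / (n - k)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))

t_stat = b_hat[1] / se[1]                     # test H0: slope = 0
p_one_sided = stats.t.sf(t_stat, df=n - k)    # upper-tail probability

# As the degrees of freedom grow, the t critical value approaches the normal 1.645
print(stats.t.ppf(0.95, df=10))      # noticeably above 1.645
print(stats.t.ppf(0.95, df=10_000))  # essentially 1.645
```

The standard error comes from the diagonal of $\hat{\sigma}^2 (X'X)^{-1}$, exactly as in the formula above; `stats.t.sf` then converts the statistic into a one-sided p-value.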
• We can then use the t distribution to calculate $\Pr(t \ge t(\hat{\beta}_j) \mid \beta_j = \beta_j^0)$.
• This means we can effectively compare $t(\hat{\beta}_j)$ to a critical value $t^*$ such that $\Pr(t \ge t^* \mid H_0) = \alpha$.
• If $\alpha = 0.05$ then $t^* = 1.645$ (in large samples).
• If $\alpha = 0.025$ then $t^* = 1.96$, which puts a total of 0.05 probability in both tails (0.025 in each tail).
• If $t(\hat{\beta}_j) > t^*$, then $\hat{\beta}_j$ is statistically significant.
• Remember: $\widehat{var}(\hat{\beta}_j)$ comes out of the jth diagonal element of the VCV matrix, $\hat{\sigma}^2 (X'X)^{-1}$.
• As $n \to \infty$, t-testing is equivalent to z-testing, so Stata and R always report t statistics.
• R and Stata do most of this for you:
  ○ R: the summary command
  ○ Stata: the regress command prints the relevant statistics
• Examples!

Multiple coefficient tests
Thursday, February 16, 2012 9:06 PM

What if multiple coefficients (or all the coefficients simultaneously) are tested for statistical significance? For a model $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$, where $X_1$ and $X_2$ are $n \times k_1$ and $n \times k_2$ blocks of variables:

• Null hypothesis: $\beta_2 = 0$
• Alternative hypothesis: at least one element of $\beta_2 \ne 0$

The F-test compares the residuals from two regressions: a restricted regression of $y$ on $X_1$ alone, and an unrestricted regression of $y$ on both $X_1$ and $X_2$. The F-statistic is:

$$F = \frac{(e_r'e_r - e_u'e_u)/k_2}{e_u'e_u/(n - k_1 - k_2)} \sim F(k_2,\; n - k_1 - k_2)$$

...this depends on the CLNRM assumptions because we need the errors to be normally distributed in order for the sums of squared residuals to be distributed $\chi^2$.

Examples!
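As one such example, the F-statistic can be computed directly from the two residual sums of squares. This is an illustrative sketch, not from the notes: the data are simulated under the null ($\beta_2 = 0$), and the helper `rss` and all dimensions are my own choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k1, k2 = 100, 2, 3              # X1: intercept + 1 regressor; X2: 3 candidate regressors
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, k2))
y = X1 @ np.array([1.0, 0.5]) + rng.normal(size=n)   # true beta2 = 0, so H0 holds

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

rss_restricted = rss(X1, y)                 # restricted: y on X1 only
rss_full = rss(np.hstack([X1, X2]), y)      # unrestricted: y on X1 and X2

F = ((rss_restricted - rss_full) / k2) / (rss_full / (n - k1 - k2))
p_value = stats.f.sf(F, k2, n - k1 - k2)
print(F, p_value)
```

Adding regressors can only lower the residual sum of squares, so the numerator is nonnegative; the test asks whether the drop is larger than chance alone would produce.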