Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7th, at 5pm.

Political Science 15
Lecture 12: Hypothesis Testing

Sampling Distributions for Coin Flips
[Figure: sampling distributions for 8, 16, 32, and 64 coin flips]

Sampling Distributions
These distributions are known as sampling distributions. A sampling distribution is the distribution of a sample statistic under repeated sampling.
The Central Limit Theorem: The sample statistics from random samples of a population will be approximately normally distributed around the population parameter, with variance σ²/n.

The Normal Distribution
About 68% of the time our sample statistic will be within 1 standard deviation of the true population parameter.
About 95% of the time our sample statistic will be within 2 standard deviations of the true population parameter.

Sampling Distributions and Hypothesis Testing
We have seen that nearly all sampling distributions (the distributions of sample statistics we would estimate under repeated sampling) are normally distributed. How can we take advantage of this fact to test our hypotheses?

Hypotheses and Parameters
Our hypotheses are really statements about population parameters.
Example: “The mean level of education in the US is 14 years.” We are saying the true mean is equal to 14.
Example: “The relationship between IMF loans and political instability is positive.” We are saying a regression slope or correlation is positive.

Testing Hypotheses
Suppose we have a sample statistic, we know the sample size (n), and we have some estimate of the standard deviation in the population (σ). Our hypothesis provides a guess at the population parameter we care about. Using the normal distribution, we can then calculate the probability that we would have obtained our sample statistic if the hypothesis were correct.

Null and Alternative Hypotheses
We first set up a null hypothesis. This is the number we will actually test.
By null we mean there is no difference between our hypothesized value and the true population parameter. In the mean education example, H0 = 14. For more general hypotheses we set the null hypothesis to be 0. For the IMF/instability example, H0 = 0.
The alternative hypothesis is simply that the null hypothesis is incorrect. We designate this HA. For instance, HA ≠ 14 or HA ≠ 0.

Setting up a Hypothesis Test: Example #1
Begin with your research hypothesis. Example: “The mean level of education in the US is 14 years.” We are saying the mean level of education in the population is 14.
Determine the null and alternative hypotheses for your test. In this example, the null is that the mean = 14, and the alternative is that it is not.
Estimate your sample statistic. In this example, you would calculate a mean.
Based on the sample statistic, should we reject the null hypothesis? What does this mean for your research hypothesis?

Setting up a Hypothesis Test: Example #2
Begin with your research hypothesis. Example: “The relationship between IMF loans and political instability is positive.” We are saying there is a positive relationship in the population.
Determine the null and alternative hypotheses for your test. In this example, the null is that the regression slope = 0, and the alternative is that it is not (and is positive).
Estimate your sample statistic. In this example, you use a regression slope coefficient.
Based on the sample statistic, should we reject the null hypothesis? What does this mean for your research hypothesis?

Hypothesis Testing
If our null hypothesis is correct, there will be a normally distributed sampling distribution around that value.
We calculate our sample statistic. We probably won’t estimate H0 exactly even if our null hypothesis is correct.
Some sample statistics are more likely than others if H0 is correct.
We need to decide if the difference between our sample statistic and H0 is due to sampling variation, or due to H0 being a bad guess at the actual population parameter (H0 being wrong).
We pick critical values for our hypothesis test. Beyond the critical values, we conclude our null hypothesis is likely to be wrong and should be rejected. Within the critical values we fail to reject the null.

Errors in Hypothesis Testing

              H0 True             H0 False
Accept H0     Correct Decision    Type II Error
Reject H0     Type I Error        Correct Decision

A Type I error is when we reject a null hypothesis that is true.
A Type II error is when we fail to reject a null hypothesis that is false.

Hypothesis Testing
The standard approach in the social sciences is to pick critical values that cut off the last 5% of the distribution (the red area is 5% of the distribution). This seems to be a good compromise between the risk of Type I and Type II errors.

Significance Level
The amount of probability we cut off in the tails of the distribution around our null hypothesis is the significance level. This is the probability that we reject our null hypothesis if it is in fact true.
It is standard in the social sciences to set the significance level to 5%. That is, we usually cut off the last 5% in the tails of the distribution as too unlikely to think the null hypothesis is correct. This makes the probability of a Type I error 5%.
Two-tailed tests cut off some probability in each tail. Some hypothesis tests are one-tailed, and only cut off probability in one tail. Most tests are two-tailed.

Calculating a Test Statistic
How do we know if our sample statistic falls inside or outside the critical values for our hypothesis test? We must calculate a test statistic. In this case, that is the number of standard deviations our sample statistic is from the null hypothesis.
If we know the standard deviation of the sampling distribution, we can calculate a z-score:

z = (sample statistic - H0) / (σ/√n)

Hypothesis Test with a Normal: Example #1
We hypothesize the mean level of education in the US is 14 years. H0 = 14. HA ≠ 14.
We calculate the mean level of education in our sample. That mean comes out to 14.7. Say we know σ = 30. N = 400.
Our test statistic is a z-score. z = (14.7 - 14)/(30/√400) = 0.7/1.5 = 0.47.
With a level of significance = 5%, our critical values are ±1.96. Our test statistic falls within this range. Thus, we fail to reject the null hypothesis.

Hypothesis Test with a Normal: Example #2
We hypothesize the mean level of education in the US is 14 years. H0 = 14. HA ≠ 14.
We calculate the mean level of education in our sample. That mean comes out to 16. Say we know σ = 20. N = 400.
Our test statistic is a z-score. z = (16 - 14)/(20/√400) = 2/1 = 2.
With a level of significance = 5%, our critical values are ±1.96. Our test statistic falls outside this range. Thus, we reject the null hypothesis.
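
The two worked examples can be checked with a short Python sketch. It is only an illustration and not part of the lecture: the function name z_test and its parameter names are made up here, σ is assumed to be known, and the test is assumed to be two-tailed. The critical value comes from the standard library's statistics.NormalDist rather than from a table.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, null_value, sigma, n, alpha=0.05):
    """Two-tailed z-test for a mean, assuming sigma is known.

    Hypothetical helper for illustration. Returns (z, critical_value,
    reject), where reject is True when |z| exceeds the critical value.
    """
    # Standard deviation of the sampling distribution (the standard error).
    standard_error = sigma / math.sqrt(n)
    # Number of standard errors between the sample mean and H0.
    z = (sample_mean - null_value) / standard_error
    # Cut off alpha/2 in each tail of the standard normal distribution.
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical_value, abs(z) > critical_value


if __name__ == "__main__":
    # Example #1: sample mean 14.7, H0 = 14, sigma = 30, N = 400.
    z, crit, reject = z_test(14.7, 14, 30, 400)
    print(f"Example #1: z = {z:.2f}, critical = ±{crit:.2f}, reject H0: {reject}")

    # Example #2: sample mean 16, H0 = 14, sigma = 20, N = 400.
    z, crit, reject = z_test(16, 14, 20, 400)
    print(f"Example #2: z = {z:.2f}, critical = ±{crit:.2f}, reject H0: {reject}")
```

With the numbers from the slides, this gives z of about 0.47 for Example #1 (fail to reject) and z = 2 for Example #2 (reject), using critical values of about ±1.96 at the 5% level.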