Homework #3 is due
Friday by 5pm.
Homework #4 will be posted
to the class website later this
week. It will be due Friday,
March 7th, at 5pm.
Political Science 15
Lecture 12:
Hypothesis Testing
Sampling Distributions
for Coin Flips

[Figure: sampling distributions for 8, 16, 32, and 64 coin flips]
Sampling Distributions

These distributions are known as sampling
distributions. A sampling distribution is the
distribution of a sample statistic under repeated
sampling.

The Central Limit Theorem: In large random samples
from a population, sample statistics such as the mean
will be approximately normally distributed around the
population parameter with variance σ²/n, where σ² is
the population variance and n is the sample size.
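The coin-flip sampling distributions and the variance claim above can be checked with a small simulation. This is a minimal sketch, assuming a fair coin (so the population variance of a single flip is 0.25) and 10,000 repeated samples; the function name and defaults are illustrative and not part of the lecture.

```python
import random
import statistics

def simulate_sample_proportions(n_flips, n_samples=10_000, p_heads=0.5, seed=0):
    """Return the proportion of heads in each of n_samples samples of n_flips flips."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_samples):
        heads = sum(rng.random() < p_heads for _ in range(n_flips))
        proportions.append(heads / n_flips)
    return proportions

for n in (8, 16, 32, 64):
    props = simulate_sample_proportions(n)
    simulated_var = statistics.pvariance(props)
    # For a single fair coin flip, sigma^2 = p(1 - p) = 0.25.
    theoretical_var = 0.25 / n
    print(f"n={n:2d}  mean={statistics.fmean(props):.3f}  "
          f"variance={simulated_var:.5f}  sigma^2/n={theoretical_var:.5f}")
```

The simulated variance should track 0.25/n closely, shrinking as the number of flips per sample grows.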
The Normal Distribution


About 68% of the time our sample statistic will be within 1 standard deviation
of the true population parameter.
About 95% of the time our sample statistic will be within 2 standard
deviations of the true population parameter.
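The 68% and 95% figures can be recovered from the standard normal distribution. A short sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_within` is only illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(f"within 1 SD:    {prob_within(1):.4f}")
print(f"within 2 SD:    {prob_within(2):.4f}")
print(f"within 1.96 SD: {prob_within(1.96):.4f}")
```

The first two values come out near 0.683 and 0.954. The exact 95% cutoff is 1.96 standard deviations, which is the figure used for critical values later in the lecture.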
Sampling Distributions and
Hypothesis Testing

We have seen that nearly all sampling
distributions (the distribution of sample statistics
we would estimate under repeated sampling) are
approximately normally distributed.

How can we take advantage of this fact to test
our hypotheses?
Hypotheses and Parameters



Our hypotheses are really statements about
population parameters.
Example: “The mean level of education in the
US is 14 years.” We are saying the true mean is
equal to 14.
Example: “The relationship between IMF loans
and political instability is positive.” We are
saying a regression slope or correlation is
positive.
Testing Hypotheses



Suppose we have a sample statistic, and we
know the sample size (n), and we have some
estimate of the standard deviation in the population (σ).
Our hypothesis provides a guess at the
population parameter we care about.
Using the normal distribution, we can then
calculate the probability that we would have
obtained the sample statistic we have if the
hypothesis were correct.
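One common way to make this probability concrete is the two-sided tail probability: the chance of a sample statistic at least as far from the hypothesized value as the one observed. The sketch below assumes the sample statistic is a mean and that sigma is the population standard deviation. The function name and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_stat, hypothesized, sigma, n):
    """Probability of a sample statistic at least this far from the
    hypothesized value, if the hypothesis is correct."""
    standard_error = sigma / sqrt(n)  # SD of the sampling distribution
    z = (sample_stat - hypothesized) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers, for illustration only.
p = two_sided_p_value(sample_stat=14.3, hypothesized=14, sigma=3, n=100)
print(f"p = {p:.4f}")
```

A small probability suggests the sample statistic would be unlikely if the hypothesis were correct; a large one suggests the difference could easily be sampling variation.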
Null and Alternative Hypotheses




We first set up a null hypothesis. This is the number we
will actually test.
By null we mean there is no difference between our
hypothesized value and the true population parameter.
In the mean education example, H0 = 14.
For more general hypotheses we set the null hypothesis
to be 0. For the IMF/instability example, H0 = 0.
The alternative hypothesis is simply that the null
hypothesis is incorrect. We designate this HA. For
instance, HA ≠ 14 or HA ≠ 0.
Setting up a Hypothesis Test
Example #1




Begin with your research hypothesis. Example: “The
mean level of education in the US is 14 years.” We are
saying the mean level of education in the population is
14.
Determine the null and alternative hypotheses for your
test. In this example, the null is that the mean = 14,
and the alternative is that it is not.
Estimate your sample statistic. In this example, you
would calculate a mean.
Based on the sample statistic, should we reject the null
hypothesis? What does this mean for your research
hypothesis?
Setting up a Hypothesis Test
Example #2




Begin with your research hypothesis. Example: “The
relationship between IMF loans and political instability
is positive.” We are saying there is a positive
relationship in the population.
Determine the null and alternative hypotheses for your
test. In this example, the null is that the regression
slope = 0, and the alternative is that it is not (and is
positive).
Estimate your sample statistic. In this example, you use
a regression slope coefficient.
Based on the sample statistic, should we reject the null
hypothesis? What does this mean for your research
hypothesis?
Hypothesis Testing

If our null hypothesis is correct, there will be a
normally distributed sampling distribution
around that value.
Hypothesis Testing

We calculate our sample statistic. We probably
won’t estimate H0 exactly even if our null
hypothesis is correct.
Hypothesis Testing

Some sample statistics are more likely than others if H0
is correct. We need to decide if the difference between
our sample statistic and H0 is due to sampling variation,
or due to H0 being a bad guess at the actual population
parameter (H0 being wrong).
Hypothesis Testing

We pick critical values for our hypothesis test. Beyond
the critical values, we conclude our null hypothesis is
likely to be wrong and should be rejected. Within the
critical values we fail to reject the null.
Errors in Hypothesis Testing


             H0 True             H0 False
Accept H0    Correct Decision    Type II Error
Reject H0    Type I Error        Correct Decision
A Type I error is when we reject a null hypothesis that
is true.
A Type II error is when we fail to reject a null
hypothesis that is false.
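Both error types can be illustrated by simulation. This is a rough sketch under assumed values: a null mean of 14, a population standard deviation of 2, samples of size 100, a rejection rule of |z| > 1.96, and (for the Type II case) a true mean of 14.5. None of these numbers come from the lecture.

```python
import random
from math import sqrt

def rejection_rate(true_mean, null_mean=14.0, sigma=2.0, n=100,
                   critical_z=1.96, n_trials=2_000, seed=0):
    """Share of simulated samples in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(n_trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        z = (sample_mean - null_mean) / standard_error
        if abs(z) > critical_z:
            rejections += 1
    return rejections / n_trials

# H0 true: every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=14.0)
# H0 false (assumed true mean of 14.5): every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=14.5)
print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The Type II rate depends heavily on how far the true mean is from the null and on the sample size, so the second number is specific to the assumed values.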
Hypothesis Testing

The standard approach in the social sciences is to pick
critical values that cut off the last 5% of the distribution
(the red area is 5% of the distribution). This seems to
be a good compromise between the risk of Type I and
Type II errors.
Significance Level



The amount of probability we cut off in the tail of the
distribution around our null hypothesis is the significance
level. This is the probability we reject our null
hypothesis if it is in fact true.
It is standard in the social sciences to set the
significance level to 5%. That is, we usually cut off the
last 5% in the tails of the distribution as too unlikely to
think the null hypothesis is correct. This makes the
probability of a Type I error 5%.
Two-tailed tests cut off some probability in each tail.
Some hypothesis tests are one-tailed, and only cut off
probability in one tail. Most tests are two-tailed.
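The critical values that go with a given significance level can be looked up from the standard normal distribution. A minimal sketch, assuming a normally distributed test statistic and, for the one-tailed case, a test in the upper tail; the function name is illustrative.

```python
from statistics import NormalDist

def critical_values(alpha=0.05, two_tailed=True):
    """Return (lower, upper) critical z values; None marks a tail with no cutoff."""
    z = NormalDist()
    if two_tailed:
        # Split alpha evenly between the two tails.
        upper = z.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    # One-tailed (upper tail) test: all of alpha sits in the right tail.
    return (None, z.inv_cdf(1 - alpha))

print(critical_values(0.05, two_tailed=True))
print(critical_values(0.05, two_tailed=False))
```

At the 5% level the two-tailed cutoffs are about ±1.96, while the one-tailed cutoff is about 1.645, because all of the 5% is placed in a single tail.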
Calculating a Test Statistic



How do we know if our sample statistic falls
inside or outside the critical values for our
hypothesis test?
We must calculate a test statistic. In this case, the
number of standard deviations our sample
statistic is from the null hypothesis.
If we know the standard deviation of the
sampling distribution, we can calculate a z score:
z = (sample statistic – H0) / (σ/√n)
Hypothesis Test with a Normal
Example #1
We hypothesize the mean level of education in the US
is 14 years. H0 = 14. HA ≠ 14.
We calculate the mean level of education in our sample.
That mean comes out to 14.7.
Say we know σ = 30. N = 400.
Our test statistic is a z-score.
z = (14.7 – 14)/(30/√400) = 0.47.
With a level of significance = 5%, our critical values are
±1.96.
Our test statistic falls within this range. Thus, we fail to
reject the null hypothesis.

Hypothesis Test with a Normal
Example #2
We hypothesize the mean level of education in the US
is 14 years. H0 = 14. HA ≠ 14.
We calculate the mean level of education in our sample.
That mean comes out to 16.
Say we know σ = 20. N = 400.
Our test statistic is a z-score.
z = (16 – 14)/(20/√400) = 2.
With a level of significance = 5%, our critical values are
±1.96.
Our test statistic falls outside this range. Thus, we
reject the null hypothesis.
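The two worked examples can be reproduced in a few lines. This is a sketch of a two-tailed z test that uses the numbers from the examples; the function name `z_test` and its return format are illustrative choices, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Two-tailed z test; returns (z, critical value, reject?)."""
    # z = distance from the null in units of the sampling distribution's SD.
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical

for label, mean, sigma in (("Example #1", 14.7, 30), ("Example #2", 16, 20)):
    z, critical, reject = z_test(mean, null_mean=14, sigma=sigma, n=400)
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{label}: z = {z:.2f}, critical values = ±{critical:.2f} -> {decision}")
```

The output matches the examples: z is about 0.47 in the first case (fail to reject) and 2 in the second (reject).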
