Significance Tests
Sociology 601: Class 6: September 17, 2008
5.2-5.3: Confidence intervals
6.1: Elements of a significance test
6.2: Large-sample significance test for a mean
6.3: Large-sample significance test for a proportion
6.4: Types of error.
1
Where we stand so far
• We can use statistics from a sample to estimate parameters for
a population
– specifically, we can make a point estimate of a parameter
and an interval estimate / confidence interval of a parameter.
• As we have seen, however, a common-sense interpretation of
an interval estimate is elusive.
• Today, we will define an approach to produce interpretations
reasonable nonstatisticians can understand.
• “Is (some statement) true?”
2
6.1: Informal steps in a significance test
• 1.) Decide on a sample statistic that will tell us about a
population parameter (e.g., a sample mean).
• 2.) Make a prediction about some attribute of the
population: (= a hypothesis: mean age = 25)
• 3.) Measure the result for a sample
• 4.) See how much the result differs from the prediction
• 5.) If the result is pretty close to the prediction, you cannot
conclude much, but if the result is far away from the
prediction, you have weakened the hypothesis and might
reject it.
• (Part 5 is the potentially confusing part)
3
Formal steps in a significance test
1.) List assumptions
2.) State a hypothesis (or two)
3.) Calculate a test statistic
4.) Look up a p-value
5.) State a formal conclusion
• Note: this is how Agresti and Finlay do it; there are other
appropriate treatments of significance tests
4
1.) List Assumptions about the sample
The sample statistic we choose to calculate will only have a
predictable relationship with a population parameter when
certain assumptions about the sample and the variable are
true. We should state these assumptions explicitly.
Examples include…
– type of data
– the form of the population distribution (e.g. normal)
– method of sampling
– sample size
5
2.) Hypotheses about the parameter
• A significance test considers two hypotheses about the
value of the parameter.
– The null hypothesis, designated Ho, is usually a
statement that the parameter has a value corresponding
to, in some sense, no effect.
– The alternative hypothesis, designated Ha when it is
explicitly stated, is usually a statement that the parameter
has some other value.
• Note: the hypothesis is not a theory about why the
parameter has a certain value.
6
The special backward logic we use in developing
hypotheses
• Property of reality: If A is a polar bear, then A must
be white.
• Hypothesis: A is a polar bear
• Observed result #1: A is white.
• What can you conclude about the hypothesis?
• Observed result #2: A is not white.
• What can you conclude about the hypothesis?
7
A generalized application to statistical inference
Assumption about reality: If a population parameter has a
certain value, then the corresponding sample statistic should
be in a certain range.
Hypothesis: μ has a certain value
• Sample result #1: Ybar has a result consistent with the
predicted value of μ.
• Sample result #2: Ybar has a result inconsistent with the
predicted value of μ.
8
A typical approach to stating hypotheses
• Usually, we construct our significance test around a null
hypothesis:
– The mean value of some variable in the population is
zero, or
– The difference between two population means is zero, or
– The mean value for some group is the same as the
known value for the entire population.
• Then, a meaningful result tends to be one that forces us to
reject our null hypothesis.
• We often have our own hypothesis about what is actually
going on, but this will be only one of many interpretations of
any alternative hypothesis.
9
3.) Calculate a test statistic
• The test statistics we will look at today are for
population mean and proportion.
• We also calculate statistics useful for interval
estimation such as the standard deviation.
• Later in the semester we will look at chi-squared
statistics, statistics for slopes of regression lines,
and so on.
10
4.) Look up a p-value.
• The p-values for population mean and proportion
are based on standardized distances: in other words,
on the number of standard errors that separate the
observed sample statistic from the hypothesized
population parameter.
• The p-value is the probability, if Ho were true, that
the sample statistic would fall this far (i.e., that
many standard errors) from the hypothesized
population parameter, or farther.
• The smaller the p-value, the more strongly the data
contradict Ho.
11
5.) State a conclusion
• In the conclusion, we judge the evidence against Ho
and usually make a formal decision to reject Ho or
not.
• We also interpret the results in terms of the original
question motivating the test. What do we know that
we did not know before? We make reference to
some or all of the following numbers.
– p-values or z-scores
– standard errors
– differences between hypothesized and observed
means
12
6.2: Step-by-step example for a significance test
• Many political commentators have remarked
that US citizens have been politically
conservative in recent decades.
• A recent General Social Survey allows us to
collect data to test this assertion.
– respondents are asked to rate their political
ideology on an ordered 7-point scale
13
Survey data on political ideology
Score  Response                 Count
1      Extremely liberal           12
2      Liberal                     66
3      Slightly liberal           109
4      Moderate                   239
5      Slightly conservative      116
6      Conservative                74
7      Extremely conservative      11
Q: level of measurement?
14
Significance test for political ideology
Assumptions: we will be doing a large-sample test
for population means. To perform this test, we
must assume that…
– Sample size is at least 30 (some researchers insist on 50
or 100). This implies that the sampling distribution of the
sample mean is approximately normal for samples of this
size, so that inferences based on the normal curve are
appropriate.
– The sample is a random sample of some sort.
– The variable is a quantitative variable with interval scale.
15
Significance test for political ideology
• Hypothesis: let μ denote the population mean
ideology, based on this seven-point scale.
• One null hypothesis is that the population has
moderate political ideology.
– Ho: μ = 4.0
• The alternative hypothesis is then
– Ha: μ ≠ 4.0
• (To discuss: another null hypothesis is that the
population does not have a conservative ideology.)
– Ho: μ <= 4.0
16
Significance test for political ideology
• Test Statistic: For an n of 627 respondents, we calculate
the following statistics:
– Ybar = 4.032
– s = 1.258
– s.e. = s / SQRT(n) = .05024
– z = (Ybar - μo) / s.e.
    = (4.032 - 4.000) / .05024
    = 0.64
• The z-statistic is the statistic of interest in a large-sample
test of a population mean.
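The calculation above can be reproduced from the frequency table. This is a minimal sketch, not the course's own code: the function name `z_for_mean` and the `counts` dictionary are illustrative assumptions, and the standard deviation is assumed to use the usual n - 1 denominator. Because it works with unrounded intermediate values, the last digit can differ slightly from the rounded figures on the slide.

```python
import math

# Counts from the ideology table, keyed by score
# (1 = extremely liberal ... 7 = extremely conservative).
counts = {1: 12, 2: 66, 3: 109, 4: 239, 5: 116, 6: 74, 7: 11}

def z_for_mean(counts, mu0):
    """Large-sample z statistic for Ho: mu = mu0, from grouped counts."""
    n = sum(counts.values())
    ybar = sum(score * k for score, k in counts.items()) / n
    # Sample standard deviation (assumed n - 1 denominator).
    ss = sum(k * (score - ybar) ** 2 for score, k in counts.items())
    s = math.sqrt(ss / (n - 1))
    se = s / math.sqrt(n)          # standard error of the mean
    z = (ybar - mu0) / se          # standard errors between Ybar and mu0
    return n, ybar, s, se, z

n, ybar, s, se, z = z_for_mean(counts, mu0=4.0)
print(n, round(ybar, 3), round(s, 3), round(se, 4), round(z, 2))
```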
17
Significance test for political ideology
• P-value: When we look up the p-value for z = 0.64,
we get .2611. This means that if the true
population mean is really 4.0, then 26% of samples
of size 627 would have z-scores of 0.64 or greater
by chance alone.
• Two-tailed test: We have stated the hypothesis as
a two-tailed test. The p-value for a two-tailed test is
2*.2611 = .5222. This means that if the true
population mean is really 4.0, then 52% of samples
of size 627 would have z-scores this far or farther
from 0 by chance alone.
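Instead of a printed z-table, the same tail probabilities can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `p_values` is an assumption made for illustration.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-tailed, two-tailed) p-values for a z statistic."""
    # Area under the standard normal curve beyond |z| in one tail.
    one_tail = 1.0 - NormalDist().cdf(abs(z))
    # The two-tailed p-value counts both tails.
    return one_tail, 2.0 * one_tail

one_tailed, two_tailed = p_values(0.64)
print(round(one_tailed, 4), round(two_tailed, 4))
```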
18
Why do we usually use two-tailed p-values?
• Common practice: Other researchers expect to
read two-tailed p-values, and computer outputs
give two-tailed p-values.
• Conservative results: With a two-tailed test, you
face more stringent standards to reject the null
hypothesis and report a significant finding.
• Flexibility: What if you did this study and found a
statistically significant pattern of liberal political
ideology? Would you be duty-bound to ignore it?
• (Researchers still sometimes use 1-tailed p-values.)
19
Significance test for political ideology
• Conclusion: The p-value of .52 for a two-tailed test
indicates that it is quite possible, given a true
population mean of 4.000, to have a sample mean
as far from 4.000 as 4.032. Therefore, we do
not reject the hypothesis that the population mean
is 4.0.
• Furthermore, a sample mean ideology score of
4.032 indicates that our best guess of the
population’s political ideology score is essentially
neutral.
20
Never “Accept Ho”
When we “do not reject Ho” that doesn’t mean
that we accept that Ho is true.
– For example, given a sample mean of 4.03 and a null
hypothesis that μ=4.00, the true population mean could
be 4.00, but it could also be a value much higher than
4.00 with an “unlucky” (but still random) sample of low
scores within that population.
– Given no other information about the population, our best
guess of the population mean is 4.032, not 4.00. We
simply have not proven with any conviction that it isn’t
4.00.
21
6.3: Next example:
Significance test for a population proportion
• A question in the General Social Survey asked: “Do
you think it should or should not be the
government’s responsibility to reduce income
differences between the rich and the poor?”
• Imagine that we would like to find out if US adults
had some net opinion on this issue.
22
Survey data on attitudes toward
income inequality
“Do you think it should or should not be the
government’s responsibility to reduce income
differences between the rich and the poor?”
Score  Response        Number
1      should be          591
0      should not be      636
Total N = 1227
23
Survey data on attitudes toward
income inequality, Step 1
1: Assumptions: we will be doing a large-sample test
for population proportions. To perform this test, we
must assume that…
– Sample size is large enough that np(1-p) > 10
• A&F suggest using the standard: N > 30 when .3 <= p <=.7
– The sample is a random sample of some sort
– The variable is a discrete interval-scale variable, which is
automatically true for population proportions.
24
Survey data on attitudes toward
income inequality, Step 2
2: Hypothesis: let π denote the population proportion
who favor government intervention to alleviate
income inequality.
• Our null hypothesis is that the population, on average,
neither supports nor opposes government intervention.
– Ho: π = 0.5
• The alternative hypothesis is then
– Ha: π ≠ 0.5
25

Survey data on attitudes toward
income inequality, Step 3
3: Test Statistic: For an N of 1227 respondents, we calculate
the following statistics:
– π-hat = N(yes) / N(total) = 591/1227 = .4817
– σo = SQRT(πo(1 - πo)) = SQRT(.500 * .500) = .5
– s.e. = σo / SQRT(n) = .500 / SQRT(1227) = .01427
– z = (π-hat - πo) / s.e.
    = (.4817 - .500) / .01427
    = -1.282
• The z-statistic is the test statistic of interest in a large-sample test of a population proportion.
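A minimal sketch of the same calculation follows. The function name `z_for_proportion` is an illustrative assumption. Note that the standard error uses the hypothesized proportion (.5), not the sample proportion, as on the slide. Working with unrounded values gives a z that differs from the slide's -1.282 in the third decimal place.

```python
import math

def z_for_proportion(successes, n, pi0):
    """Large-sample z statistic for Ho: pi = pi0."""
    pi_hat = successes / n                      # sample proportion
    se0 = math.sqrt(pi0 * (1 - pi0) / n)        # standard error under Ho
    z = (pi_hat - pi0) / se0
    return pi_hat, se0, z

pi_hat, se0, z_prop = z_for_proportion(591, 1227, 0.5)
print(round(pi_hat, 4), round(se0, 5), round(z_prop, 2))
```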
26
Survey data on attitudes toward
income inequality, Step 4
4: P-value: When we look up the p-value for z =
-1.28, we get .1003. This means that if the true
population proportion is really .5, then 10% of
samples of size 1227 would have z-scores of -1.28
or less by chance alone.
• We have stated the hypothesis as a two-tailed test.
The p-value for a two-tailed test is 2 * .1003 =
0.2006.
27
Survey data on attitudes toward
income inequality, Step 5
5: Conclusion: The p-value of .20 for a two-tailed
test indicates that it is quite possible, given a true
population proportion of .5, to have a sample
proportion as far from .5 as .482. Therefore, we
do not reject the hypothesis that the population
proportion is .5.
• Furthermore, a sample proportion of .482
indicates that our best guess of the
population’s attitude toward government
intervention is essentially neutral.
28
Confidence interval or significance test?
• Significance tests are better when the chief issue is
to make a yes/no decision about whether a pattern
exists in a population.
• Confidence intervals are better when the chief
issue is to make a best guess of a population
parameter.
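To illustrate how the two approaches relate, the sketch below computes a confidence interval for the political ideology example (Ybar = 4.032, s.e. = .05024). The 95% level and the function name `confidence_interval` are assumptions chosen for illustration. The interval contains 4.0, which matches the decision not to reject Ho: μ = 4.0, while also showing the range of plausible values for the mean.

```python
from statistics import NormalDist

def confidence_interval(estimate, se, level=0.95):
    """Large-sample confidence interval: estimate +/- z * s.e."""
    # z value that leaves (1 - level) / 2 in each tail.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z_crit * se, estimate + z_crit * se

low, high = confidence_interval(4.032, 0.05024)
print(round(low, 3), round(high, 3))
```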
29
A practical concern: what should you do with categories
that are inconclusive?
“A joint USA Today/CNN/Gallup poll in July 1995
indicated that of 832 white adults, 53% thought
affirmative action had been good for the country
and 37% thought it had not been good; the
remaining 10% were undecided.”
• There is not always a universal correct answer in
such cases. Behave honestly, and make your
decisions transparent.
30
6.4 Decision rules in hypothesis tests
• In our significance tests so far, we have calculated the
p-value in step 4, then decided what to conclude about H0 in
step 5.
• Traditionally, social scientists often decided in step 1 what
p-value will constitute sufficient evidence to reject H0.
– This is called using a fixed decision rule.
• Why use fixed decision rules?
– They supposedly reduce the chance that we will succumb to
temptation on a borderline result, and choose a p-value that
will allow us to reject the null.
31
Formal definition of a decision rule
• A statistical decision rule specifies, for each possible sample
outcome, which hypothesis (Ho or Ha) should be selected.
• Before we calculate a test statistic, we pick the significance
level at which we will decide to reject the null hypothesis.
• The predetermined significance level is α (alpha), the
probability of rejecting the null hypothesis because of chance
sampling variation, if the null is actually true.
– we reject Ho if our observed p-value is less than α.
32
Types of errors in hypothesis tests
• α (alpha) is called the probability of a type I error.
– a type I error occurs when we reject the null hypothesis
for a population where the null hypothesis is true.
– type I error is like “crying wolf” when there is no wolf
• β (beta) is called the probability of a type II error.
– type II error is the error of not rejecting the null
hypothesis, when the null is in fact false.
– type II error is like not noticing a wolf that is really there
– NOTE: β is not equal to 1 - α, although β is often larger
than α.
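The meaning of alpha can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with Ho true (mean 4.0), and the standard deviation, sample size, number of trials, and seed are hypothetical values chosen for illustration. The share of samples that wrongly reject Ho should land close to alpha.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def type_i_error_rate(mu0=4.0, sigma=1.26, n=100, alpha=0.05,
                      trials=2000, seed=0):
    """Fraction of random samples that reject Ho when Ho is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where Ho really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1                        # a type I error
    return rejections / trials

rate = type_i_error_rate()
print(rate)
```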
33
Errors and correct conclusions
in a hypothesis test
Possible types of error depend on your sample
statistics and on the true state of reality
                    State of reality:
Your conclusion:    Ho true                              Ho not true
do not reject Ho    correct inference, negative result   type II error
reject Ho           type I error                         correct inference, positive result
34
Consequences of errors in hypothesis tests
• Consequences of type I (alpha) error:
– misleads other researchers
– social costs of erroneous information
– damages your reputation as a careful researcher
• Consequences of type II (beta) error:
– no publication for you (probably)
– no damage to your reputation as a careful researcher
– the truth stays hidden, with possible social consequences
• hopefully, the truth will come out later
35
Terms used in hypothesis tests with decision rules
• alpha level: the probability of a type I error, conditional on
the idea that Ho is really true.
– an alpha level is expressed as a probability
• rejection region: the collection of test-statistic values for
which the test rejects Ho.
– a rejection region is often expressed as a range of z-scores
• action limit: the value of a test statistic at which one will shift
to rejecting the null hypothesis.
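For a two-tailed z test, the action limit and rejection region follow directly from the alpha level. The sketch below is illustrative only; the function names are assumptions, and it covers just the two-tailed large-sample case discussed in these slides.

```python
from statistics import NormalDist

def two_tailed_action_limit(alpha):
    """|z| value at which a two-tailed test starts rejecting Ho."""
    # Put alpha / 2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)

def in_rejection_region(z, alpha):
    """True if z falls in the two-tailed rejection region."""
    return abs(z) >= two_tailed_action_limit(alpha)

print(round(two_tailed_action_limit(0.05), 3))   # limit for alpha = .05
print(round(two_tailed_action_limit(0.01), 3))   # limit for alpha = .01
print(in_rejection_region(0.64, 0.05))           # ideology example
```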
36
Example of a hypothesis test with a decision rule
The General Social Survey often asks:
– “I’m going to show you a seven-point scale on which the political
views that people hold are arranged from extremely liberal (point 1)
to extremely conservative (point 7). Where would you place yourself
on this scale?”
Use a fixed decision rule to test whether the
responses in 1994 were neutral or leaned toward
any political view. Use alpha = .01.
• N = 2879
• Y-bar = 4.170
• s = 1.39
37
Hypothesis test for GSS political views
• N = 2879
Y-bar = 4.170
s = 1.39
• assumptions: random sample, interval variable, large
sample size, alpha = .01
• hypothesis: Ho is that μ = 4.00
• test statistic: z = (Ybar - μo) / (s.e.)
= (4.17 – 4.00) / (1.39 / SQRT(2879)) = 6.6
• p-value: p < .01 (for z of 6.6)
• Conclusion: the p-value is below alpha (z is in the rejection
region), so reject Ho: the population does not have neutral
political views.
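The steps of this example can be sketched in a few lines. The numbers come from the slide; the variable names are illustrative assumptions, and the p-value is computed from the standard normal curve rather than looked up in a table.

```python
import math
from statistics import NormalDist

# Values reported for the 1994 GSS political views item.
n, ybar, s = 2879, 4.170, 1.39
mu0, alpha = 4.00, 0.01            # null value and fixed decision rule

se = s / math.sqrt(n)              # standard error of the mean
z = (ybar - mu0) / se              # test statistic
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
reject = p_two_tailed < alpha      # decision rule chosen in advance

print(round(z, 1), reject)
```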
38
Thought questions
• In the previous GSS example, what is the probability that we
have committed a type I error?
– answer: unknown! .01 is the probability that a random sample
would produce a type I error, conditional on H0 being true for
the population. We have no idea whether this condition is
true.
• another question: if Ho is really true in our case, what is
the probability that we have committed a type I error?
• yet another question: Based on our result, what is the
probability that we have committed a type II error?
39
What (prospective) factors affect the probability of error?
• The probability of a type I error depends on…
– appropriateness of assumptions
– alpha level
• The probability of a type II error depends on…
– appropriateness of assumptions
– sample size
– the efficiency of the estimation procedure
– the true value of the population parameter
– alpha level
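Several of these factors can be made concrete with a short calculation. The sketch below gives the probability of a type II error for a two-tailed large-sample z test, treating the population standard deviation as known. The function name and all input values (a true mean of 4.1 or 4.3, a standard deviation of 1.26) are hypothetical and chosen only to show the direction of each effect.

```python
import math
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """P(do not reject Ho: mu = mu0) when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How many standard errors the true mean sits from mu0.
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    # Chance that z lands between -z_crit and +z_crit anyway.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

base = type_ii_error(4.0, 4.1, 1.26, n=627, alpha=0.05)
bigger_sample = type_ii_error(4.0, 4.1, 1.26, n=2500, alpha=0.05)
stricter_alpha = type_ii_error(4.0, 4.1, 1.26, n=627, alpha=0.01)
farther_truth = type_ii_error(4.0, 4.3, 1.26, n=627, alpha=0.05)
print(base, bigger_sample, stricter_alpha, farther_truth)
```

In this sketch a larger sample or a true value farther from the null lowers the type II error probability, while a stricter alpha level raises it.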
40