Chapter 13
Comparing Two Population Parameters
AP Statistics
Hamilton and Mann
Lipitor or Pravachol
• Which drug is more effective at lowering “bad
cholesterol?”
• To figure this out, researchers designed a study they
called PROVE-IT.
• They used 4000 people with heart disease as
subjects. These people were randomly assigned to
one of two treatment groups: Lipitor or Pravachol.
• At the end of the study, researchers compared the
mean “bad cholesterol levels” for the two groups.
For Pravachol it was 95 mg/dl versus 62 mg/dl for
Lipitor. Is this difference statistically significant?
• This is a question about comparing two means.
Lipitor or Pravachol
• The researchers also compared the proportion of
subjects in each group who died, had a heart attack,
or suffered other serious consequences within two
years.
• For Pravachol, the proportion was 0.263 and for
Lipitor it was 0.224. Is this a statistically significant
difference?
• This is a question about comparing two
proportions.
Success vs. Failure in Business
• How do small businesses that fail differ from small
businesses that succeed?
• Business school researchers compared the asset
liability ratios of two samples of firms started in
2000, one sample of failed businesses and one of
firms that are still going after two years.
• This observational study compares two random
samples, one from each of two different
populations.
Two-Sample Problems
• Comparing two populations or two treatments is
one of the most common situations encountered in
statistical practice. We call such situations two-sample problems.
Two-Sample Problems
• A two-sample problem can arise from a randomized
comparative experiment that randomly divides
subjects into two groups and exposes each group to
a different treatment, like the PROVE-IT Study.
• Comparing random samples separately selected
from two populations, like the successful and failed
small businesses, is also a two-sample problem.
• Unlike the matched pairs designs studied earlier,
there is no matching of units in the two samples
and two samples can be of different sizes.
• Inference procedures for two-sample data differ
from those of matched pairs.
Comparing Means and Proportions
• Who is more likely to binge drink: male or female
college students?
• This is obviously a two-sample problem because we
are comparing the population of male college
students to female college students.
• To conduct this study, the Harvard School of Public
Health surveyed random samples of male and
female undergraduates at four-year colleges and
universities about their drinking behaviors.
• This observational study was designed to compare
the proportion of undergraduate males who binge
drink with the proportion of undergraduate females
who binge drink.
Comparing Means and Proportions
• A bank wants to know which of two incentive plans
will most increase the use of its credit cards.
• We are comparing the effect of two different
treatments here, so it is a two-sample problem.
• It offers each incentive to a random sample of credit
card customers and compares the amount charged
during the following six months.
• This is a randomized experiment designed to
compare the mean amount spent under each of the
two incentive “treatments.”
CHAPTER 13 SECTION 1
Comparing Two Means
HW: 13.1, 13.2, 13.4, 13.6, 13.8, 13.10, 13.11,
13.14, 13.16
Comparing Two Means
• We can examine two-sample data graphically by
comparing dotplots or stemplots (for small samples)
and boxplots or histograms (for large samples).
• Now we will apply the ideas of formal inference in
this setting.
• When both population distributions are symmetric,
and especially when they are approximately
Normal, a comparison of the mean responses in the
two populations is the most common goal of
inference.
Notation
Population | Variable | Parameters: Mean | Parameters: Standard Deviation | Statistics: Sample Size | Statistics: Mean | Statistics: Standard Deviation
1 | x1 | μ1 | σ1 | n1 | x̄1 | s1
2 | x2 | μ2 | σ2 | n2 | x̄2 | s2
• There are four unknown parameters, the two
means and the two standard deviations.
• We want to compare the two population means,
either by giving a confidence interval for their
difference µ1 - µ2 or by testing the hypothesis of no
difference, H0:µ1= µ2.
• We use the sample means and standard deviations
to estimate the unknown parameters.
Calcium and Blood Pressure
• Does increasing the amount of calcium in our diet
reduce blood pressure?
• An examination of a large number of people
revealed a relationship between calcium intake and
blood pressure. The relationship was strongest for
black men. As a result, researchers designed a
randomized comparative experiment.
• The subjects were 21 healthy black men. A
randomly chosen group of 10 of the men received
calcium supplements for 12 weeks. The other 11
men received a placebo pill that looked similar for
the 12 weeks.
Calcium and Blood Pressure
• The response variable is the decrease in systolic
blood pressure for a subject after 12 weeks. An
increase appears as a negative response.
• Group 1 will be the calcium group and Group 2 will
be the placebo group. Here are the data.
Group 1 – Calcium Group
7, -4, 18, 17, -3, -5, 1, 10, 11, -2
Group 2 – Placebo Group
-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3
• Here are the summary statistics.
Group | Treatment | n | x̄ | s
1 | Calcium | 10 | 5.000 | 8.743
2 | Placebo | 11 | -0.273 | 5.901
Calcium and Blood Pressure
• Notice that the calcium group experienced a drop in
blood pressure, x̄1 = 5.000, while the placebo group
shows a small increase, x̄2 = -0.273. Is this good
evidence that calcium decreases blood pressure in
the entire population of healthy black men more
than a placebo does?
• This example fits the two-sample setting because
we have a separate sample from each treatment
and we have not attempted to match them.
• Since we are testing a claim, we will conduct a
significance test and follow the Inference Toolbox.
Calcium and Blood Pressure
• Step 1: Hypotheses – We write the hypotheses in
terms of the mean decreases we would see in the
entire population: μ1 for black men taking calcium for
12 weeks and μ2 for black men taking the placebo
for 12 weeks. There are two equivalent ways to write
the hypotheses:
H0: μ1 = μ2, Ha: μ1 > μ2
or
H0: μ1 - μ2 = 0, Ha: μ1 - μ2 > 0
Calcium and Blood Pressure
• Step 2 – Conditions – We do not know the name of the
test, but we know the conditions we must check to
compare two means.
– SRS – The 21 subjects are not an SRS. Therefore, we may not
be able to generalize our findings to all healthy black men.
Since we randomly assigned treatments, however, any
differences can be attributed to the treatments themselves.
– Normality – Since we have small samples, we must look at a
boxplot and histogram for both samples. There are no
serious problems (outliers or serious departure from
Normality).
– Independence – Since we randomized the treatments, we
can safely assume that the calcium and placebo are two
independent samples.
Calcium and Blood Pressure
• The natural estimator of the difference µ1 - µ2 is the
difference between the sample means:
x̄1 - x̄2 = 5.000 - (-0.273) = 5.273
• This statistic measures the average advantage of
calcium over the placebo. In order to use this,
however, we need to know about its sampling
distribution. In other words, we need to know what
the mean and standard deviation would be for the
population of differences if we took repeated
samples many times.
The Two-Sample z Statistic
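• Here are the facts about the sampling distribution of the
difference x̄1 - x̄2 between the two sample means of
independent SRSs.
– The mean of x̄1 - x̄2 is µ1 - µ2.
– The variance of x̄1 - x̄2 is σ1²/n1 + σ2²/n2.
• Therefore, the standard deviation of x̄1 - x̄2 is
sqrt(σ1²/n1 + σ2²/n2).
• If both populations are Normal, then the distribution of
x̄1 - x̄2 is also Normal with this mean and standard deviation.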
Two-Sample z Statistic
• When the statistic x̄1 - x̄2 has a Normal distribution,
we can standardize it to obtain a standard Normal z
statistic:
z = [(x̄1 - x̄2) - (µ1 - µ2)] / sqrt(σ1²/n1 + σ2²/n2)
Two-Sample z Statistic
• In the very unlikely case that we know both
population standard deviations, the two-sample z
statistic is what we would use to conduct inference
about µ1 - µ2.
• Since we rarely know one, much less two,
population standard deviations, we are going to
move immediately to the more useful t procedures.
Two-Sample t Procedures
• Because we don’t know the population standard
deviations, we estimate them with the standard
deviations from our two samples.
• The result is the standard error, or estimated
standard deviation, of the difference in sample
means:
SE = sqrt(s1²/n1 + s2²/n2)
• We then standardize our estimate. The result is
the two-sample t statistic:
t = [(x̄1 - x̄2) - (µ1 - µ2)] / sqrt(s1²/n1 + s2²/n2)
Two-Sample t Procedures
• The statistic t has the same interpretation as any z
or t statistic: it says how far x̄1 - x̄2 is from its mean
in standard deviation units.
• The two-sample t statistic has approximately a t
distribution. It does not have exactly a t
distribution even if the populations are both exactly
Normal. The approximation is very close though.
• There is a catch: we must use a messy formula to
calculate the degrees of freedom. Often, the
degrees of freedom are not whole numbers.
Two-Sample t Procedures
• There are two practical options for using the two-sample t procedures:
1. With technology, use the statistic t with accurate
critical values from the approximating t distribution.
2. Without technology, use the statistic t with critical
values from the t distribution with degrees of freedom
equal to the smaller of n1 – 1 and n2 – 1. These
procedures are always conservative for any two
Normal populations.
• Technology will obviously use method 1.
• We are going to start by looking at how to do
method 2.
Two-Sample t Procedures
• These two-sample t procedures always err on the
safe side, reporting higher P-values and lower
confidence than may actually be true. The gap
between what is reported and the truth is actually
quite small unless the sample sizes are both small
and unequal.
• As the sample sizes increase, probability values
based on t with degrees of freedom equal to the
smaller of n1 – 1 and n2 – 1 become more accurate.
• Let's complete our calcium and blood pressure
problem from earlier.
Calcium and Blood Pressure
• Here are the summary statistics again.
Group | Treatment | n | x̄ | s
1 | Calcium | 10 | 5.000 | 8.743
2 | Placebo | 11 | -0.273 | 5.901
• Step 3 – Calculations
t = (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2) = (5.000 - (-0.273)) / sqrt(8.743²/10 + 5.901²/11) = 5.273 / 3.288 = 1.604
• Since it is a one-sided test, the P-value is the
probability that t is 1.604 or greater when we have 9
degrees of freedom. From the table, it is between
0.05 and 0.10.
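The Step 3 arithmetic can be checked with a short Python sketch. It uses only the summary statistics shown on the slide; the variable names are illustrative, not part of the original material.

```python
import math

# Summary statistics from the slide (decrease in systolic blood pressure).
n1, xbar1, s1 = 10, 5.000, 8.743    # Group 1: calcium
n2, xbar2, s2 = 11, -0.273, 5.901   # Group 2: placebo

# Standard error of the difference in sample means.
se = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Two-sample t statistic under H0: mu1 - mu2 = 0.
t = (xbar1 - xbar2) / se

# Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1.
df_conservative = min(n1 - 1, n2 - 1)

print(f"SE = {se:.3f}, t = {t:.3f}, conservative df = {df_conservative}")
```

The t statistic is then compared against the t table row for the conservative degrees of freedom, as the slide describes.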
Calcium and Blood Pressure
• Step 4 – Interpretation
– The experiment provides some evidence that calcium
reduces blood pressure, but the evidence falls short of
the traditional 5% and 1% levels of significance. We
would fail to reject H0 at both significance levels.
Creating a Confidence Interval
• We can estimate the difference in mean decreases
in blood pressure for the hypothetical calcium and
placebo populations using a two-sample t interval.
• We have already checked all of the conditions.
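• Recall the summary statistics:
Group | Treatment | n | x̄ | s
1 | Calcium | 10 | 5.000 | 8.743
2 | Placebo | 11 | -0.273 | 5.901
• With the conservative 9 degrees of freedom, t* = 1.833
for 90% confidence, so the interval is
(x̄1 - x̄2) ± t* sqrt(s1²/n1 + s2²/n2) = 5.273 ± 1.833(3.288) = 5.273 ± 6.027, or about -0.75 to 11.30
• Since the 90% confidence interval includes 0, we
cannot reject H0:μ1 – μ2 = 0 against the two-sided
alternative at the α = 0.10 level of significance.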
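As a rough check on the interval, here is a minimal Python sketch of a two-sample t interval. It assumes the conservative 9 degrees of freedom and the standard t-table critical value t* = 1.833 for 90% confidence; both are assumptions of this sketch, and software using the approximated df would give a slightly narrower interval.

```python
import math

# Summary statistics from the slide.
n1, xbar1, s1 = 10, 5.000, 8.743    # calcium
n2, xbar2, s2 = 11, -0.273, 5.901   # placebo

# Assumption: critical value from a standard t table, 90% confidence, df = 9.
t_star = 1.833

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of xbar1 - xbar2
diff = xbar1 - xbar2                      # point estimate of mu1 - mu2
margin = t_star * se                      # margin of error

lower, upper = diff - margin, diff + margin
print(f"90% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")

# An interval that contains 0 is consistent with "no difference".
contains_zero = lower < 0 < upper
print("Interval contains 0:", contains_zero)
```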
Sample Size Matters
• Sample sizes strongly influence the P-value of a test.
• A result that fails to be significant at a specified
level α in a small sample may be significant in a
larger sample.
• For instance, the difference of 5.273 in the mean
decrease in systolic blood pressure between our two
groups was not significant. In a larger study with more
subjects, researchers obtained a P-value of
0.008.
Robustness Again
• The two-sample t procedures are more robust than
the one-sample t procedures, particularly when the
distributions are not symmetric.
• When the sizes of the two samples are equal and
the two populations being compared have
distributions with similar shapes, probability values
from the t table are quite accurate for a broad range
of distributions for samples as small as 5. When the
populations have different shapes, larger samples
are needed.
Robustness Again
• As a guide to practice, adapt the guidelines on p.
655 for the use of one-sample t procedures to two-sample
t procedures by replacing “sample size” with
the “sum of the sample sizes” as long as both
samples are at least 5.
• These guidelines err on the side of safety, especially
when the two samples are of equal size.
• Whenever possible, try to make both samples the
same size. Two-sample procedures are most robust
against non-Normality when the sample sizes are
equal and the conservative P-values are most
accurate.
Software Approximations for the DF
• The t procedures remain exactly as before except
that we use the t distribution with df given by the
following formula to give critical values and
find P-values:
df = (s1²/n1 + s2²/n2)² / [ (1/(n1 - 1))(s1²/n1)² + (1/(n2 - 1))(s2²/n2)² ]
Calcium and Blood Pressure
• Here are the summary statistics again.
Group
Treatment
n
s
1
Calcium
10
5.000
8.743
2
Placebo
11
-0.273
5.901
• For improved accuracy, let's calculate the df given by
the formula on the prior slide. The result is df = 15.59,
with the same t = 1.604.
• Notice that the P-value here is 0.064 compared to
the 0.0716 we got from the conservative approach.
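The comparison between the two approaches can be sketched in Python. This assumes the software df is the usual Welch-Satterthwaite approximation, df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1)]. The helper names are hypothetical, and the t tail area is found by simple numerical integration rather than a statistics library.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the two-sample t statistic."""
    a, b = s1**2 / n1, s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

def t_pdf(x, df):
    """Density of the t distribution (df need not be a whole number)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Calcium example, from the summary statistics on the slide.
n1, xbar1, s1 = 10, 5.000, 8.743
n2, xbar2, s2 = 11, -0.273, 5.901

t = (xbar1 - xbar2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
df = welch_df(s1, n1, s2, n2)

p_software = t_upper_tail(t, df)                        # approximated df
p_conservative = t_upper_tail(t, min(n1 - 1, n2 - 1))   # df = 9

print(f"df = {df:.2f}")
print(f"P-value with approximated df: {p_software:.4f}")
print(f"P-value with conservative df: {p_conservative:.4f}")
```

The conservative P-value comes out larger, which is the sense in which the smaller-df method errs on the safe side.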
Degrees of Freedom
• The formula from the box will always give us df at
least as large as the smaller of n1 - 1 and n2 - 1 and
never bigger than n1 + n2 - 2.
• The number of degrees of freedom is generally not
a whole number. Since the table only has whole
numbers, we will need to use technology to do
these calculations easily.
• Let’s do the Calcium and Blood Pressure problem on
the calculator!
• We should use the calculator to do these
calculations from now on!
DDT Poisoning
• Poisoning by the pesticide DDT causes convulsions
in humans and other mammals. Researchers seek
to understand how the convulsions are caused. In a
randomized comparative experiment, they compared
6 white rats poisoned with DDT with a control group
of 6 unpoisoned rats. Electrical measurements of
nerve activity are the main clue to the nature of
DDT poisoning. When a nerve is stimulated, its
electrical response shows a sharp spike followed by
a much smaller second spike. The experiment
found that the second spike is larger in rats fed DDT
than in normal rats.
DDT Poisoning
• The researchers measured the height (or amplitude)
of the second spike as a percent of the first spike
when a nerve in the rat's leg was stimulated.
• For the poisoned rats the results were:
12.207, 16.869, 25.050, 22.429, 8.456, 20.589
• For the control group the results were:
11.074, 9.686, 12.064, 9.351, 8.182, 6.642
• Let’s conduct a significance test at the 0.05
significance level to determine if there is a
difference using the calculator.
DDT Poisoning
• Step 1 – Hypotheses
– We want to compare the mean height μ1 of the second-spike
electrical response in rats fed DDT with the mean
height μ2 of the second-spike electrical response in the
population of normal rats.
H0: μ1 = μ2, Ha: μ1 ≠ μ2
or
H0: μ1 - μ2 = 0, Ha: μ1 - μ2 ≠ 0
DDT Poisoning
• Step 2 – Conditions – Since both population
standard deviations are unknown we need to
conduct a 2-sample t test.
– SRS – By randomly assigning the rats to the treatments,
we can conclude that differences are a result of the
treatment. The researchers are willing to assume that
the two samples of rats represent an SRS.
– Normality – We don’t know if the populations are
Normal and do not have a large enough sample. We
must look at a boxplot and histogram. No outliers or
heavy skewness.
– Independence – Due to the random assignment, the
researchers can treat the two groups as independent.
DDT Poisoning
• Step 3 – Calculations
– The calculator gives t = 2.99. Since it is a two-sided
hypothesis, the P-value is the probability that t is less
than -2.99 or greater than 2.99.
– The degrees of freedom are df = 5.9 and the P-value from
the t(5.9) distribution is 0.0246.
• Step 4 – Conclusion
– Since 0.0246 is less than the significance level of 0.05, we
reject the null hypothesis. There is sufficient evidence
that the mean height of the second-spike electrical
response in rats fed DDT differs from that of normal rats.
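For readers without the calculator, the t statistic and degrees of freedom can be reproduced from the raw data with a short Python sketch. It assumes the unpooled two-sample t procedure with the usual software df approximation; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Second-spike height as a percent of the first spike, from the slide.
ddt = [12.207, 16.869, 25.050, 22.429, 8.456, 20.589]
control = [11.074, 9.686, 12.064, 9.351, 8.182, 6.642]

n1, n2 = len(ddt), len(control)
xbar1, xbar2 = mean(ddt), mean(control)
s1, s2 = stdev(ddt), stdev(control)   # sample standard deviations

# Squared standard error contributed by each group.
a, b = s1**2 / n1, s2**2 / n2

# Unpooled two-sample t statistic under H0: mu1 = mu2.
t = (xbar1 - xbar2) / math.sqrt(a + b)

# Approximate (generally non-integer) degrees of freedom.
df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

print(f"xbar1 = {xbar1:.3f}, s1 = {s1:.3f}")
print(f"xbar2 = {xbar2:.3f}, s2 = {s2:.3f}")
print(f"t = {t:.2f}, df = {df:.1f}")
```

Because the alternative is two-sided, the P-value doubles the tail area beyond |t|.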
Pooled Two-Sample t Procedures
• Do not use them.
• If a printout says pooled, do not use that. Instead
use the one that says unpooled.
• On the calculator, always do No for pooled.
• If you want more information you can read it on p.
800.
CHAPTER 13 SECTION 2
Comparing Two Proportions
HW: 13.26, 13.27, 13.28, 13.29, 13.30, 13.32, 13.33,
13.38
Prayer and In Vitro Pregnancy
• Some women want to have children but cannot for
medical reasons. One option for these women is in
vitro fertilization. About 28% of women who
undergo in vitro fertilization get pregnant. Can
praying for these women help increase the
pregnancy rate?
• Researchers developed an experiment to help
answer this question. (Why not just survey women
who have already gone through in vitro to find out
if a higher percentage of women who were prayed
for got pregnant?)
Prayer and In Vitro Pregnancy
• A large group of women who were about to
undergo in vitro fertilization served as the subjects.
Each subject was randomly assigned to the
treatment group (prayed for by people who did not
know them) or a control group (no prayer).
• The results: 44 of the 88 women (50%) got pregnant
in the treatment (prayer) group while only 21 out of
81 got pregnant in the control group.
• This seems like a large difference, but is it
statistically significant?
Two-Sample Proportions
• We will use notation that is similar to what we used
for two-sample means. We still want to compare two
groups, Population 1 and Population 2.
• Here is the notation:
Population | Population Proportion | Sample Size | Sample Proportion
1 | p1 | n1 | p̂1
2 | p2 | n2 | p̂2
• We compare the populations by doing inference
about the difference p1 - p2 between the population
proportions.
• The statistic that estimates this difference is p̂1 - p̂2.
Does Preschool Help?
• To study the long-term effects of preschool
programs for poor children, the High/Scope
Educational Research Foundation has followed two
groups of Michigan children since early childhood.
– Group 1: Control Group – 61 children from population 1,
poor children with no preschool
– Group 2: Treatment Group – 62 children from population
2, poor children with preschool as 3- and 4-year-olds.
• Both groups were from the same area and had
similar backgrounds.
• So our sample sizes are n1 = 61 and n2 = 62.
Does Preschool Help?
• One response variable of interest is the need for
social services as adults. In the past ten years, 49 of
the control sample and 38 of the preschool sample
had needed social services. So the sample
proportions are: p̂1 = 49/61 = 0.803 and p̂2 = 38/62 = 0.613
• To see if the study provides significant evidence that
preschool reduces the later need for social services,
we are going to create a 95% confidence interval.
Does Preschool Help?
• To estimate how large the reduction is, we give a
confidence interval for the difference.
• Both the test and the confidence interval start with
the difference in the sample proportions:
p̂1 - p̂2 = 0.803 - 0.613 = 0.190
• This means we need to know the sampling
distribution of p̂1 - p̂2.
• So let’s look at that now!
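Sampling Distribution of p̂1 - p̂2
• Both p̂1 and p̂2 are random variables because their
values would vary if we took repeated samples of
the same size.
• In Chapter 7, we learned that if X and Y are any two
random variables then the mean of X - Y is µX - µY,
and if X and Y are independent, the variance of X - Y
is σ²X + σ²Y.
• In Chapter 9, we learned that a sample proportion p̂
has mean p and standard deviation sqrt(p(1 - p)/n).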
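Sampling Distribution of p̂1 - p̂2
• Using all of this information, we can find the mean
and standard deviation of p̂1 - p̂2. The mean is p1 - p2.
• If the two sample proportions are independent,
the variance is p1(1 - p1)/n1 + p2(1 - p2)/n2.
• Thus the standard deviation is
sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2).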
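Sampling Distribution of p̂1 - p̂2
• As far as the shape, the distribution of p̂1 - p̂2 will be
approximately Normal when both of the distributions
of p̂1 and p̂2 are approximately Normal.
• In other words, n1p1, n1(1 - p1), n2p2, and n2(1 - p2)
must all be large enough.
• Actually, we are safe performing significance tests
about p1 - p2 as long as all of these values are
greater than 5.
• The distribution of p̂1 - p̂2 is on the next graph.
Sampling Distribution of p̂1 - p̂2
[Graph of the sampling distribution of p̂1 - p̂2]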
Sampling Distribution of p̂1 - p̂2
• The standard deviation of p̂1 - p̂2 involves the
unknown parameters p1 and p2.
• Just like in Chapter 12, we must replace these by
estimates in order to do inference.
• Just like in Chapter 12, we do this a bit differently
for confidence intervals and significance tests.
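Confidence Intervals for p1 - p2
• To obtain a confidence interval, replace p1 and p2 in
the expression for the standard deviation of p̂1 - p̂2
with the sample proportions.
• The result is the standard error of the statistic p̂1 - p̂2:
SE = sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2)
• The confidence interval again has the form
estimate ± z* SE, that is, (p̂1 - p̂2) ± z* SE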
Does Preschool Help?
• Here is a summary of the information from the
preschool problem we discussed earlier.
Population | Population Description | Sample Size | Sample Proportion
1 | Control | n1 = 61 | p̂1 = 49/61 = 0.803
2 | Preschool | n2 = 62 | p̂2 = 38/62 = 0.613
• We set up our hypotheses earlier, so we have
already done Step 1. Here are the hypotheses as a
reminder.
H0: p1 = p2, Ha: p1 > p2
or
H0: p1 - p2 = 0, Ha: p1 - p2 > 0
Does Preschool Help
• Step 2 – Conditions – We are going to construct a
two-proportion z interval.
– SRS – We were not told how the children were selected,
so we must be cautious when drawing conclusions.
– Normality – Since the counts of successes and failures
in both samples (49, 12, 38, and 24) are all at least 5, we
can assume Normality.
– Independence – We are fairly certain that there are at
least 610 poor children who did not attend preschool
and 620 poor children who did attend preschool in our
populations of interest.
Does Preschool Help
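• Step 3 – Calculations
(p̂1 - p̂2) ± z* sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2) = 0.190 ± 1.96 sqrt((0.803)(0.197)/61 + (0.613)(0.387)/62) = 0.190 ± 1.96(0.0801) = 0.190 ± 0.157, or 0.033 to 0.347
• Step 4 – Interpretation
– We are 95% confident that the percent needing social
services is between 3.3 and 34.7 percentage points lower
among those who attended preschool. The interval is wide
because of the small sample sizes. Also, our results may be
questionable due to the fact that the samples may not
have been SRSs.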
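The interval can be reproduced with a small Python sketch of a two-proportion z interval. The function name is hypothetical; the critical value z* comes from the standard library's `statistics.NormalDist` rather than a table, so the last digit may differ slightly from hand calculation.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """Confidence interval for p1 - p2 from success counts x1, x2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Standard error uses each sample proportion separately (no pooling).
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Critical value z* for the requested confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Preschool study: 49 of 61 controls and 38 of 62 preschoolers
# later needed social services.
lower, upper = two_prop_z_interval(49, 61, 38, 62)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```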
Significance Tests for p1 - p2
• An observed difference in sample proportions may
reflect a real difference in the populations, or it may
just be due to random sampling variation.
• Significance tests help us to determine if the
difference we see is really there or just chance
variation.
• The null hypothesis will always say that there is no
difference in the two populations. Hence H0: p1 = p2.
• The alternative hypothesis will always say what kind
of difference we expect.
Significance Tests for p1 - p2
• To conduct a significance test, we must standardize
p̂1 - p̂2 to get a z statistic.
• If H0 is true, all the observations in both samples
come from a single population.
• So, instead of estimating p1 and p2 separately, we
combine the two samples and use the overall
sample proportion to estimate the single
population parameter p.
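Significance Tests for p1 - p2
• We call this single proportion the combined sample
proportion. It is
p̂c = (count of successes in both samples combined) / (count of individuals in both samples combined) = (X1 + X2) / (n1 + n2)
where X1 and X2 are the counts of successes in the two samples.
• Now, we use p̂c in place of both p̂1 and p̂2 in the
expression for the standard error of p̂1 - p̂2.
• This yields a z statistic that has the standard Normal
distribution when H0 is true:
z = (p̂1 - p̂2) / sqrt(p̂c(1 - p̂c)(1/n1 + 1/n2))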
Cholesterol and Heart Attacks
• High levels of cholesterol in the blood are
associated with higher risk of heart attacks. Does
using a drug to lower blood cholesterol reduce
heart attacks?
• The Helsinki Heart Study looked at this question by
randomly assigning middle-aged men to one of two
treatments: 2051 men took the drug gemfibrozil to
reduce their cholesterol levels, and a control group
of 2030 men took a placebo.
• During the next 5 years, 56 men in the gemfibrozil
group and 84 men in the control group had heart
attacks.
Cholesterol and Heart Attacks
Population | Population Description | Sample Size | Sample Proportion
1 | Gemfibrozil | n1 = 2051 | p̂1 = 56/2051 = 0.0273
2 | Control | n2 = 2030 | p̂2 = 84/2030 = 0.0414
• Is the apparent benefit of gemfibrozil statistically
significant?
• To answer this question, we need to conduct a
significance test.
• To conduct a significance test we need the combined
sample proportion p̂c. So let's find it:
p̂c = (56 + 84)/(2051 + 2030) = 140/4081 = 0.0343
Cholesterol and Heart Attacks
• Step 1 – Hypotheses – We want to use this
comparative randomized experiment to draw
conclusions about p1, the proportion of middle-aged
men who would suffer heart attacks after
taking gemfibrozil, and p2, the proportion of middle-aged
men who would suffer heart attacks if they
only took a placebo. We hope to show that
gemfibrozil reduces heart attacks, so we have a
one-sided alternative.
H0: p1 = p2, Ha: p1 < p2
Cholesterol and Heart Attacks
• Step 2 – Conditions – We are going to conduct a two-proportion z test.
– SRS – Since the data come from a comparative randomized
experiment, we meet this condition. This will allow us to
conclude that the treatment caused the differences we
observe. Since the men in the experiment were not
randomly selected, we may not be able to generalize our
results to the population of all middle-aged men.
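– Normality – We must use the combined sample proportion p̂c
to check for Normality since we are assuming that both
proportions are the same. So n1p̂c = 2051(0.0343) = 70.35,
n1(1 - p̂c) = 1980.65, n2p̂c = 2030(0.0343) = 69.63, and
n2(1 - p̂c) = 1960.37. All are greater than 5.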
– Independence – Due to the random assignment of men, the
two groups of men can be viewed as independent samples.
Cholesterol and Heart Attacks
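• Step 3 – Calculations
z = (0.0273 - 0.0414) / sqrt((0.0343)(0.9657)(1/2051 + 1/2030)) = -0.0141/0.00570 = -2.47
• We believed gemfibrozil would decrease heart attacks,
so the P-value is the probability that z is less than or
equal to -2.47, which is 0.0068.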
Cholesterol and Heart Attacks
• Step 4 – Interpretation – Since our P-value (0.0068)
is less than 0.01, our results are significant at the α
= 0.01 significance level. So there is strong
evidence that gemfibrozil reduced the rate of
heart attacks.
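The test can be sketched in Python as follows. The function name is hypothetical, and the standard library's `statistics.NormalDist` stands in for the Normal table; the combined sample proportion is used in the standard error because H0 assumes one common proportion.

```python
import math
from statistics import NormalDist

def two_prop_z_statistic(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the combined sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Combined proportion: all successes over all individuals.
    p_c = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Helsinki Heart Study: 56 of 2051 (gemfibrozil) vs 84 of 2030 (placebo).
z = two_prop_z_statistic(56, 2051, 84, 2030)

# Ha: p1 < p2, so the P-value is the lower tail area P(Z <= z).
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}, one-sided P-value = {p_value:.4f}")
```

For an alternative in the other direction (p1 > p2) the upper tail `1 - NormalDist().cdf(z)` would be used instead.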
Don’t Drink the Water
• The movie A Civil Action tells the story of a legal battle
that took place in the small town of Woburn,
Massachusetts. A town well that supplied water to East
Woburn residents was contaminated by industrial
chemicals. During the period that residents drank the
water from this well, a sample of 414 births showed 16
birth defects. On the west side of Woburn, a sample of
228 babies born during the same time period revealed
3 with birth defects. The plaintiffs suing the companies
responsible for the contamination claimed that these
data show that the rate of birth defects was
significantly higher in East Woburn, where the
contaminated well water was in use. How strong is the
evidence supporting the claim? What decision should
the judge make?
Don’t Drink the Water
Population | Population Description | Sample Size | Sample Proportion
1 | East Woburn | n1 = 414 | p̂1 = 16/414 = 0.0386
2 | West Woburn | n2 = 228 | p̂2 = 3/228 = 0.0132
• To conduct a significance test we need the combined
sample proportion p̂c. So let's find it:
p̂c = (16 + 3)/(414 + 228) = 19/642 = 0.0296
• Step 1 – Hypotheses – We are interested in seeing if
the proportion of birth defects is higher in East
Woburn (p1) than in West Woburn (p2), as the plaintiffs claim.
H0: p1 = p2, Ha: p1 > p2
Don’t Drink the Water
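• Step 2 – Conditions – We are going to conduct a two-proportion z test.
– SRS – We don’t know that they are SRSs, but we will
treat them as SRSs.
– Normality – We must check our rules:
n1p̂c = 414(0.0296) = 12.25, n1(1 - p̂c) = 401.75,
n2p̂c = 228(0.0296) = 6.75, and n2(1 - p̂c) = 221.25.
Since each is larger than 5, the sampling distribution is
approximately Normal.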
– Independence – We must assume that both populations
are at least 10 times as large as the sample of babies.
Don’t Drink the Water
• Step 3 – Calculations
z = (0.0386 - 0.0132) / sqrt((0.0296)(0.9704)(1/414 + 1/228)) = 0.0254/0.01398 = 1.82
– The P-value is the probability that z is
1.82 or greater, which is 0.0344.
• Step 4 – Interpretation
– Since the P-value (0.0344) is smaller than the usual level
of significance of 0.05, we reject the null hypothesis and
conclude that there is reason to believe that the
proportion of birth defects was higher in East Woburn.
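The Normality check and the test statistic for the Woburn data can be verified with a brief Python sketch. Variable names are illustrative. Because the code keeps unrounded proportions, the P-value can differ from the slide's 0.0344 in the third or fourth decimal place.

```python
import math
from statistics import NormalDist

x1, n1 = 16, 414   # East Woburn: birth defects, births sampled
x2, n2 = 3, 228    # West Woburn: birth defects, births sampled

# Combined sample proportion under H0: p1 = p2.
p_c = (x1 + x2) / (n1 + n2)

# Normality condition: expected successes and failures in each sample
# should all exceed 5 when computed with the combined proportion.
expected_counts = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
conditions_ok = all(count > 5 for count in expected_counts)

# Two-proportion z statistic.
se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / se

# Ha: p1 > p2, so the P-value is the upper tail area P(Z >= z).
p_value = 1 - NormalDist().cdf(z)

print("Expected counts:", [round(c, 2) for c in expected_counts])
print(f"Conditions met: {conditions_ok}, z = {z:.2f}, P-value = {p_value:.4f}")
```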