Chapter 11
Comparing Two Populations or
Treatments
Suppose we have a population of adult men with a mean height of 71 inches
and a standard deviation of 2.5 inches. We also have a population of adult
women with a mean height of 65 inches and a standard deviation of 2.3 inches.
Assume heights are normally distributed.
Suppose we take a random sample of 30 men and a random sample of 25 women
from their respective populations and calculate the difference in their
mean heights (men's mean height – women's mean height).
If we did this many times, what would the distribution of differences be like?
On the next slide we will investigate this distribution.
Male Heights: μM = 71, σM = 2.5
Female Heights: μF = 65, σF = 2.3
Suppose we took repeated samples of size n = 30 from the population of male
heights and calculated the sample means. We would have the sampling
distribution of $\bar{x}_M$, centered at 71 with
$\sigma_{\bar{x}_M} = \dfrac{2.5}{\sqrt{30}}$
Suppose we took repeated samples of size n = 25 from the population of female
heights and calculated the sample means. We would have the sampling
distribution of $\bar{x}_F$, centered at 65 with
$\sigma_{\bar{x}_F} = \dfrac{2.3}{\sqrt{25}}$
Randomly take one of the sample means for the males and one of the sample
means for the females and find the difference in mean heights. Doing this
repeatedly, we will create the sampling distribution of
$(\bar{x}_M - \bar{x}_F)$, centered at 71 – 65 = 6 with
$\sigma_{\bar{x}_M - \bar{x}_F} = \sqrt{\dfrac{2.5^2}{30} + \dfrac{2.3^2}{25}}$
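The repeated-sampling idea described above can be tried out with a short simulation. This is only a sketch: the number of repetitions, the seed, and the function name `simulate_mean_differences` are illustrative choices, not part of the slides.

```python
import random
import statistics

def simulate_mean_differences(reps=5000, seed=11):
    """Simulate (male sample mean - female sample mean) many times.

    Uses the population values from the slides: men ~ N(71, 2.5) with n = 30,
    women ~ N(65, 2.3) with n = 25. reps and seed are arbitrary choices.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        men = [rng.gauss(71, 2.5) for _ in range(30)]
        women = [rng.gauss(65, 2.3) for _ in range(25)]
        diffs.append(statistics.fmean(men) - statistics.fmean(women))
    return diffs

diffs = simulate_mean_differences()
sim_center = statistics.fmean(diffs)   # should land near 71 - 65 = 6
sim_spread = statistics.stdev(diffs)   # should land near sqrt(2.5**2/30 + 2.3**2/25)
print(round(sim_center, 2), round(sim_spread, 2))
```

The simulated center and spread should fall close to the theoretical values, with small differences from run to run if the seed is changed.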
Heights Continued . . .
• Describe the sampling distribution of the difference in mean heights
  between men and women.
The sampling distribution is normally distributed with
$\mu_{\bar{x}_M - \bar{x}_F} = 71 - 65 = 6$
$\sigma_{\bar{x}_M - \bar{x}_F} = \sqrt{\dfrac{2.5^2}{30} + \dfrac{2.3^2}{25}} \approx 0.648$
• What is the probability that the difference in mean heights of a random
  sample of 30 men and a random sample of 25 women is less than 5 inches?
$P\big((\bar{x}_M - \bar{x}_F) < 5\big) = .0614$
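As a quick check of the probability above, the calculation can be sketched with Python's standard library. The variable names are illustrative; the numbers come from the heights example.

```python
from math import sqrt
from statistics import NormalDist

# Center and spread of the sampling distribution of (x-bar_M - x-bar_F)
mean_diff = 71 - 65
sd_diff = sqrt(2.5**2 / 30 + 2.3**2 / 25)

# Area under the normal curve to the left of 5
p_less_than_5 = NormalDist(mean_diff, sd_diff).cdf(5)
print(round(sd_diff, 3), round(p_less_than_5, 4))
```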
Properties of the Sampling
Distribution of $\bar{x}_1 - \bar{x}_2$
If the random samples on which $\bar{x}_1$ and $\bar{x}_2$ are based
are selected independently of one another, then
1. $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$
The sampling distribution of $\bar{x}_1 - \bar{x}_2$ is always centered at
the value of μ1 – μ2, so $\bar{x}_1 - \bar{x}_2$ is an unbiased statistic
for estimating μ1 – μ2.
2. $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
and
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
The variance of the difference is the sum of the variances.
3. If n1 and n2 are both large or the population distributions are (at
least approximately) normal, $\bar{x}_1$ and $\bar{x}_2$ each have (at
least approximately) normal distributions. This implies that the sampling
distribution of $\bar{x}_1 - \bar{x}_2$ is also (approximately) normal.
The properties of the sampling distribution of $\bar{x}_1 - \bar{x}_2$
imply that $\bar{x}_1 - \bar{x}_2$ can be standardized to obtain a variable
with a sampling distribution that is approximately the standard normal (z)
distribution.
When two random samples are independently selected and n1 and n2 are both
large or the population distributions are (at least approximately) normal,
the distribution of
$z = \dfrac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
is described (at least approximately) by the standard normal (z) distribution.
We must know σ1 and σ2 in order to use this procedure.
If σ1 and σ2 are unknown, we must use t distributions.
Two-Sample t Test for Comparing
Two Populations
Null Hypothesis: H0: μ1 – μ2 = hypothesized value
The hypothesized value is often 0, but there are times when we are
interested in testing for a difference that is not 0.
Test Statistic:
$t = \dfrac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
The appropriate df for the two-sample t test is
$df = \dfrac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}}$ where $V_1 = \dfrac{s_1^2}{n_1}$ and $V_2 = \dfrac{s_2^2}{n_2}$
The computed number of df should be truncated to an integer.
A conservative estimate of the P-value can be found by using the t curve
with the number of degrees of freedom equal to the smaller of (n1 – 1) or
(n2 – 1).
Two-Sample t Test for Comparing
Two Populations Continued . . .
Null Hypothesis: H0: μ1 – μ2 = hypothesized value
Alternative Hypothesis:              P-value:
Ha: μ1 – μ2 > hypothesized value     Area under the appropriate t curve to
                                     the right of the computed t
Ha: μ1 – μ2 < hypothesized value     Area under the appropriate t curve to
                                     the left of the computed t
Ha: μ1 – μ2 ≠ hypothesized value     2(area to right of computed t) if +t or
                                     2(area to left of computed t) if -t
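Another Way to Write
Hypothesis Statements:
When the hypothesized value is 0, we can rewrite these hypothesis statements:
H0: μ1 – μ2 = 0   can be written as   H0: μ1 = μ2
Ha: μ1 – μ2 < 0   can be written as   Ha: μ1 < μ2
Ha: μ1 – μ2 > 0   can be written as   Ha: μ1 > μ2
Ha: μ1 – μ2 ≠ 0   can be written as   Ha: μ1 ≠ μ2
Be sure to define BOTH μ1 and μ2!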
Two-Sample t Test for Comparing
Two Populations Continued . . .
Assumptions:
1) The two samples are independently selected
random samples from the populations of interest
2) The sample sizes are large (generally 30 or larger)
or the population distributions are (at least
approximately) normal.
When comparing two treatment groups, use the
following assumptions:
1) Individuals or objects are randomly assigned to
treatments (or vice versa)
2) The sample sizes are large (generally 30 or larger)
or the treatment response distributions are
approximately normal.
Are women still paid less than men for comparable work? A study was carried
out in which salary data were collected from a random sample of men and from
a random sample of women who worked as purchasing managers and who were
subscribers to Purchasing magazine. Annual salaries (in thousands of dollars)
appear below (the actual sample sizes were much larger). Use α = .05 to
determine if there is convincing evidence that the mean annual salary for
male purchasing managers is greater than the mean annual salary for female
purchasing managers.
Men    81 69 81 76 76 74 69 76 79 65
Women  78 60 67 61 62 73 71 58 68 48
State the hypotheses:
H0: μ1 – μ2 = 0
Ha: μ1 – μ2 > 0
Where μ1 = mean annual salary for male purchasing managers and μ2 = mean
annual salary for female purchasing managers
If we had defined μ1 as the mean salary for female purchasing managers and
μ2 as the mean salary for male purchasing managers, then the correct
alternative hypothesis would be the difference in the means is less than 0.
Salary War Continued . . .
Verify the assumptions.
Assumptions:
1) Given two independently selected random samples of male and female
purchasing managers.
Even though these are samples from subscribers of Purchasing magazine, the
authors of the study believed it was reasonable to view the samples as
representative of the populations of interest.
2) Since the sample sizes are small, we must determine if it is plausible
that the salary distributions for each of the two populations are
approximately normal. Since the boxplots are reasonably symmetrical with no
outliers, it is plausible that the population distributions are
approximately normal.
(Boxplots of salaries: Men, Women)
Salary War Continued . . .
Compute the test statistic and P-value.
Test Statistic:
$t = \dfrac{74.6 - 64.6 - 0}{\sqrt{\dfrac{5.4^2}{10} + \dfrac{8.62^2}{10}}} = 3.11$
To find the P-value, first find the appropriate df.
$df = \dfrac{(2.916 + 7.430)^2}{\dfrac{2.916^2}{9} + \dfrac{7.430^2}{9}} = 15.12 \rightarrow 15$
Truncate (round down) this value.
Now find the area to the right of t = 3.11 in the t curve with df = 15.
P-value = .004
α = .05
Since the P-value < α, we reject H0. There is convincing evidence that the
mean salary for male purchasing managers is higher than the mean salary for
female purchasing managers.
What potential type of error could we have made with this conclusion?
Type I
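The test statistic and df for the salary example can be reproduced from the raw data. The sketch below uses only the standard library; the function name `two_sample_t` is an illustrative choice, and the P-value itself is not computed here because it needs a t-distribution routine that the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

men = [81, 69, 81, 76, 76, 74, 69, 76, 79, 65]
women = [78, 60, 67, 61, 62, 73, 71, 58, 68, 48]

def two_sample_t(sample1, sample2, hypothesized=0.0):
    """Return the two-sample t statistic and the truncated df."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1   # V1 = s1^2 / n1
    v2 = variance(sample2) / n2   # V2 = s2^2 / n2
    t = (mean(sample1) - mean(sample2) - hypothesized) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, int(df)             # int() truncates the df

t_stat, df = two_sample_t(men, women)
print(round(t_stat, 2), df)
```

The area to the right of the resulting t under the t curve with this df would then be looked up in a table or computed with statistical software.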
The Two-Sample t Confidence Interval for the
Difference Between Two Population or
Treatment Means
The general formula for a confidence interval for μ1 – μ2 when
1) The two samples are independently selected random samples from
the populations of interest
2) The sample sizes are large (generally 30 or larger) or the population
distributions are (at least approximately) normal.
is
$\bar{x}_1 - \bar{x}_2 \pm (t \text{ critical value})\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
The t critical value is based on
$df = \dfrac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}}$ where $V_1 = \dfrac{s_1^2}{n_1}$ and $V_2 = \dfrac{s_2^2}{n_2}$
df should be truncated to an integer.
For a comparison of two treatments, use the following assumptions:
1) Individuals or objects are randomly assigned to treatments (or vice versa)
2) The sample sizes are large (generally 30 or larger) or the treatment
response distributions are approximately normal.
In a study on food intake after sleep deprivation,
men were randomly assigned to one of two
treatment groups. The experimental group were required
to sleep only 4 hours on each of two nights, while the
control group were required to sleep 8 hours on each of
two nights. The amount of food intake (Kcal) on the day
following the two nights of sleep was measured. Compute
a 95% confidence interval for the true difference in the
mean food intake for the two sleeping conditions.
4-hour sleep: 3585 4470 3068 5338 2221 4791 4435 3187
              3901 3868 3869 4878 3632 4518 3099
8-hour sleep: 4965 3918 1987 4993 5220 3653 3510 4100
              5792 4547 3319 3336 4304 4057 3338
Find the mean and standard deviation for each treatment.
$\bar{x}_4 = 3924$, $s_4 = 829.67$
$\bar{x}_8 = 4069.27$, $s_8 = 952.90$
Food Intake Study Continued . . .
Verify the assumptions.
Assumptions:
1) Men were randomly assigned to two treatment groups
2) The assumption of normal response distributions is plausible because
both boxplots are approximately symmetrical with no outliers.
(Boxplots of food intake: 4-hour, 8-hour)
Food Intake Study Continued . . .
Calculate the interval.
$(3924 - 4069.27) \pm 2.052\sqrt{\dfrac{829.67^2}{15} + \dfrac{952.90^2}{15}} \approx (-814.1, 523.6)$
Based upon this interval, is there a significant difference in the mean food
intake for the two sleeping conditions?
No, since 0 is in the confidence interval, there is not convincing evidence
that the mean food intakes for the two sleep conditions are different.
Interpret the interval in context.
We are 95% confident that the true difference in the mean food intake for
the two sleeping conditions is between -814.1 Kcal and 523.6 Kcal.
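The interval arithmetic can be sketched from the summary statistics. The function name `two_sample_t_interval` is illustrative, and the critical value 2.052 is taken from the slide rather than computed. Plugging in 2.052 gives endpoints within about one Kcal of the reported (-814.1, 523.6); the small gap is presumably due to rounding or to a software-computed critical value, which is an assumption here.

```python
from math import sqrt

def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_critical):
    """Two-sample t confidence interval for mu1 - mu2 from summary statistics."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    margin = t_critical * standard_error
    diff = mean1 - mean2
    return diff - margin, diff + margin

# 4-hour group first, 8-hour group second; t critical value as shown on the slide
lower, upper = two_sample_t_interval(3924, 829.67, 15, 4069.27, 952.90, 15,
                                     t_critical=2.052)
print(round(lower, 1), round(upper, 1))
print("0 inside interval:", lower < 0 < upper)
```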
Pooled t Test
• Used when the variances of the two populations are equal (σ1 = σ2)
• Combines information from both samples to create a “pooled” estimate of
  the common variance, which is used in place of the two sample standard
  deviations
• Is not widely used due to its sensitivity to any departure from the equal
  variance assumption
When the population variances are equal, the pooled t procedure is better at
detecting deviations from H0 than the two-sample t test.
P-values computed using the pooled t procedure can be far from the actual
P-value if the population variances are not equal.
Suppose that an investigator wants to determine
if regular aerobic exercise improves blood
pressure. A random sample of people who jog
regularly and a second random sample of people
who do not exercise regularly are selected
independently of one another.
Can we conclude that the difference in mean
blood pressure is attributed to jogging?
What about other factors like weight?
One way to avoid these difficulties
would be to pair subjects by weight
then assign one of the pair to jogging
and the other to no exercise.
Summary of the Paired t test for Comparing
Two Population or Treatment Means
Null Hypothesis: H0: μd = hypothesized value
Where μd is the mean of the differences in the paired observations.
The hypothesized value is usually 0, meaning that there is no difference.
Test Statistic:
$t = \dfrac{\bar{x}_d - \text{hypothesized value}}{\dfrac{s_d}{\sqrt{n}}}$
Where n is the number of sample differences and $\bar{x}_d$ and $s_d$ are
the mean and standard deviation of the sample differences. This test is
based on df = n – 1.
Alternative Hypothesis:         P-value:
Ha: μd > hypothesized value     Area to the right of calculated t
Ha: μd < hypothesized value     Area to the left of calculated t
Ha: μd ≠ hypothesized value     2(area to the right of t) if +t
                                or 2(area to the left of t) if -t
Summary of the Paired t test for Comparing
Two Population or Treatment Means
Continued . . .
Assumptions:
1. The samples are paired.
2. The n sample differences can be viewed as a
random sample from a population of
differences.
3. The number of sample differences is large
(generally at least 30) or the population
distribution of differences is (at least
approximately) normal.
Is this an example of paired samples?
An engineering association wants to see if
there is a difference in the mean annual
salary for electrical engineers and chemical
engineers. A random sample of electrical
engineers is surveyed about their annual
income. Another random sample of chemical
engineers is surveyed about their annual
income.
No, there is no pairing of
individuals, you have two
independent samples
Is this an example of paired samples?
A pharmaceutical company wants to test its
new weight-loss drug. Before giving the drug
to volunteers, company researchers weigh
each person. After a month of using the
drug, each person’s weight is measured again.
Yes, you have two observations on
each individual, resulting in paired
data.
Can playing chess improve your memory? In a study,
students who had not previously played chess participated
in a program in which they took chess lessons and played
chess daily for 9 months. Each student took a memory
test before starting the chess program and again at the
end of the 9-month period.
Student       1    2    3    4    5    6    7    8    9   10   11   12
Pre-test    510  610  640  675  600  550  610  625  450  720  575  675
Post-test   850  790  850  775  700  775  700  850  690  775  540  680
Difference -340 -180 -210 -100 -100 -225  -90 -225 -240  -55   35   -5
First, find the differences pre-test minus post-test.
State the hypotheses.
H0: μd = 0
Ha: μd < 0
Where μd is the mean memory score difference between students with no chess
training and students who have completed chess training
If we had subtracted Post-test minus Pre-test, then the alternative
hypothesis would be the mean difference is greater than 0.
Playing Chess Continued . . .
Verify assumptions.
Assumptions:
1) Although the sample of students is not a random sample, the investigator
believed that it was reasonable to view the 12 sample differences as
representative of all such differences.
2) A boxplot of the differences is approximately symmetrical with no
outliers so the assumption of normality is plausible.
Playing Chess Continued . . .
Compute the test statistic and P-value.
Test Statistic:
$t = \dfrac{-144.6 - 0}{\dfrac{109.74}{\sqrt{12}}} = -4.56$
P-value ≈ 0
df = 11
α = .05
State the conclusion in context.
Since the P-value < α, we reject H0. There is
convincing evidence to suggest that the mean
memory score after chess training is higher
than the mean memory score before training.
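The paired t statistic for the chess example can be reproduced from the pre-test and post-test scores. This is a minimal sketch with illustrative variable names; differences are taken as pre-test minus post-test, as in the example.

```python
from math import sqrt
from statistics import mean, stdev

pre = [510, 610, 640, 675, 600, 550, 610, 625, 450, 720, 575, 675]
post = [850, 790, 850, 775, 700, 775, 700, 850, 690, 775, 540, 680]

# Paired differences: pre-test minus post-test for each student
diffs = [a - b for a, b in zip(pre, post)]
n = len(diffs)
d_bar = mean(diffs)    # mean of the sample differences
s_d = stdev(diffs)     # standard deviation of the sample differences
t_stat = (d_bar - 0) / (s_d / sqrt(n))
print(round(d_bar, 1), round(s_d, 2), round(t_stat, 2), "df =", n - 1)
```

Because the alternative is Ha: μd < 0, the P-value is the area to the left of this t under the t curve with df = 11.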
Paired t Confidence Interval for μd
When
1. The samples are paired.
2. The n sample differences can be viewed as a random sample
   from a population of differences.
3. The number of sample differences is large (generally at least
   30) or the population distribution of differences is (at least
   approximately) normal.
the paired t interval for μd is
$\bar{x}_d \pm (t \text{ critical value})\left(\dfrac{s_d}{\sqrt{n}}\right)$
Where df = n - 1
Playing Chess Revisited . . .
Student       1    2    3    4    5    6    7    8    9   10   11   12
Pre-test    510  610  640  675  600  550  610  625  450  720  575  675
Post-test   850  790  850  775  700  775  700  850  690  775  540  680
Difference -340 -180 -210 -100 -100 -225  -90 -225 -240  -55   35   -5
Compute a 90% confidence interval for the mean difference in memory scores
before chess training and the memory scores after chess training.
$-144.6 \pm 1.796\left(\dfrac{109.74}{\sqrt{12}}\right) = (-201.5, -87.69)$
We are 90% confident that the true mean difference in memory scores before
chess training and the memory scores after chess training is between
-201.5 and -87.69.
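The 90% paired t interval can be checked from the twelve differences. In this sketch the critical value 1.796 (df = 11) is taken from the slide rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Pre-test minus post-test differences from the chess example
diffs = [-340, -180, -210, -100, -100, -225, -90, -225, -240, -55, 35, -5]
t_critical = 1.796  # value shown on the slide for 90% confidence, df = 11

standard_error = stdev(diffs) / sqrt(len(diffs))
lower = mean(diffs) - t_critical * standard_error
upper = mean(diffs) + t_critical * standard_error
print(round(lower, 1), round(upper, 2))
```

Both endpoints are negative, which agrees with the earlier test result that post-test scores tend to be higher than pre-test scores.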
Large-Sample Inferences
Concerning the Difference
Between Two Population or
Treatment Proportions
Some people seem to think that duct tape
can fix anything . . . even remove warts!
Investigators at Madigan Army Medical Center
tested using duct tape to remove warts versus
the more traditional freezing treatment.
Suppose that the duct tape treatment will
successfully remove 50% of warts and that the
traditional freezing treatment will successfully
remove 60% of warts.
Let’s investigate the sampling distribution of $\hat{p}_{freeze} - \hat{p}_{tape}$
pfreeze = the true proportion of warts that are successfully removed
by freezing; pfreeze = .6
ptape = the true proportion of warts that are successfully removed
by using duct tape; ptape = .5
Suppose we repeatedly treated 100 warts using the traditional freezing
treatment and calculated the proportion of warts that are successfully
removed. We would have the sampling distribution of $\hat{p}_{freeze}$,
centered at .6 with
$\sigma_{\hat{p}_{freeze}} = \sqrt{\dfrac{.6(.4)}{100}}$
Suppose we repeatedly treated 100 warts using the duct tape method and
calculated the proportion of warts that are successfully removed. We would
have the sampling distribution of $\hat{p}_{tape}$, centered at .5 with
$\sigma_{\hat{p}_{tape}} = \sqrt{\dfrac{.5(.5)}{100}}$
Randomly take one of the sample proportions for the freezing treatment and
one of the sample proportions for the duct tape treatment and find the
difference. Doing this repeatedly, we will create the sampling distribution
of $(\hat{p}_{freeze} - \hat{p}_{tape})$, centered at .6 – .5 = .1 with
$\sigma_{\hat{p}_{freeze} - \hat{p}_{tape}} = \sqrt{\dfrac{.6(.4)}{100} + \dfrac{.5(.5)}{100}}$
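The center and spread of this sampling distribution can be worked out directly. A small sketch, with the helper name `diff_in_proportions_sd` chosen for illustration:

```python
from math import sqrt

def diff_in_proportions_sd(p1, n1, p2, n2):
    """Standard deviation of (p-hat1 - p-hat2) for independent samples."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Assumed true proportions from the slide: freezing .6, duct tape .5, 100 warts each
center = 0.6 - 0.5
spread = diff_in_proportions_sd(0.6, 100, 0.5, 100)
print(round(center, 2), round(spread, 2))
```

With these values the variance is .0024 + .0025 = .0049, so the standard deviation of the difference works out to .07.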
Properties of the Sampling
Distribution of $\hat{p}_1 - \hat{p}_2$
If two random samples are selected independently of one another, the
following properties hold:
1. $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$
This says that the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is
centered at p1 – p2 so $\hat{p}_1 - \hat{p}_2$ is an unbiased statistic for
estimating p1 – p2.
2. $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$
3. If both n1 and n2 are large (that is, if n1p1 > 10,
n1(1 – p1) > 10, n2p2 > 10, and n2(1 – p2) > 10), then $\hat{p}_1$ and
$\hat{p}_2$ each have a sampling distribution that is approximately
normal, and their difference $\hat{p}_1 - \hat{p}_2$ also has a sampling
distribution that is approximately normal.
When performing a hypothesis test, we will use the null hypothesis that p1
and p2 are equal. We will not know the common value for p1 and p2.
Since p1 and p2 are unknown, we will combine $\hat{p}_1$ and $\hat{p}_2$ to
estimate the common value of p1 and p2.
Use:
$\hat{p}_c = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
Summary of Large-Sample z Test
for p1 – p2 = 0
Null Hypothesis: H0: p1 – p2 = 0
Test Statistic:
$z = \dfrac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}}$, with p1 – p2 = 0 under H0
Use:
$\hat{p}_c = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
Alternative Hypothesis:    P-value:
Ha: p1 – p2 > 0            area to the right of calculated z
Ha: p1 – p2 < 0            area to the left of calculated z
Ha: p1 – p2 ≠ 0            2(area to the right of z) if +z or
                           2(area to the left of z) if -z
Another Way to Write
Hypothesis statements:
H0: p1 – p2 = 0   can be written as   H0: p1 = p2
Ha: p1 – p2 > 0   can be written as   Ha: p1 > p2
Ha: p1 – p2 < 0   can be written as   Ha: p1 < p2
Ha: p1 – p2 ≠ 0   can be written as   Ha: p1 ≠ p2
Be sure to define both p1 & p2!
Summary of Large-Sample z Test
for p1 – p2 = 0 Continued . . .
Assumptions:
1) The samples are independently chosen random samples or treatments were
assigned at random to individuals or objects
2) Both sample sizes are large
$n_1\hat{p}_1 > 10$, $n_1(1 - \hat{p}_1) > 10$, $n_2\hat{p}_2 > 10$, $n_2(1 - \hat{p}_2) > 10$
Since p1 and p2 are unknown we must use $\hat{p}_1$ and $\hat{p}_2$ to
verify that the samples are large enough.
Investigators at Madigan Army Medical Center tested
using duct tape to remove warts. Patients with warts
were randomly assigned to either the duct tape
treatment or to the more traditional freezing treatment.
Those in the duct tape group wore duct tape over the
wart for 6 days, then removed the tape, soaked the area
in water, and used an emery board to scrape the area.
This process was repeated for a maximum of 2 months or
until the wart was gone. The data follows:
Treatment                   n     Number with wart successfully removed
Liquid nitrogen freezing    100   60
Duct tape                   104   88
Do these data suggest that freezing is less
successful than duct tape in removing warts?
Duct Tape Continued . . .
H0: p1 – p2 = 0
Ha: p1 – p2 < 0
Where p1 is the true proportion of warts that
would be successfully removed by freezing and p2
is the true proportion of warts that would be
successfully removed by duct tape
Assumptions:
1) Subjects were randomly assigned to the two treatments.
2) The sample sizes are large enough because:
$n_1\hat{p}_1$ = 100(.6) = 60 > 10
$n_1(1 - \hat{p}_1)$ = 100(.4) = 40 > 10
$n_2\hat{p}_2$ = 104(.85) ≈ 88 > 10
$n_2(1 - \hat{p}_2)$ = 104(.15) ≈ 16 > 10
Duct Tape Continued . . .
H0: p1 – p2 = 0
Ha: p1 – p2 < 0
$\hat{p}_c = \dfrac{60 + 88}{100 + 104} = .73$
$z = \dfrac{.6 - .85 - 0}{\sqrt{\dfrac{.73(.27)}{100} + \dfrac{.73(.27)}{104}}} \approx -4.03$
P-value ≈ 0
α = .01
Since the P-value < α, we reject H0. There is
convincing evidence to suggest the proportion
of warts successfully removed is lower for
freezing than for the duct tape treatment.
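The large-sample z test for the duct tape data can be sketched from the counts. The function name `two_proportion_z` is illustrative. Note that working with unrounded proportions gives z of about -3.94 rather than the -4.03 obtained from the rounded values .85 and .73; the conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(successes1, n1, successes2, n2):
    """Large-sample z statistic for H0: p1 - p2 = 0 using the combined estimate."""
    p1_hat = successes1 / n1
    p2_hat = successes2 / n2
    p_c = (successes1 + successes2) / (n1 + n2)   # combined estimate of the common p
    standard_error = sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    return (p1_hat - p2_hat) / standard_error

# Sample 1 = liquid nitrogen freezing, sample 2 = duct tape
z = two_proportion_z(60, 100, 88, 104)
p_value = NormalDist().cdf(z)   # Ha: p1 - p2 < 0, so use the area to the left of z
print(round(z, 2), p_value < 0.01)
```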
A Large-Sample Confidence
Interval for p1 – p2
When
1) The samples are independently chosen random samples or
treatments were assigned at random to individuals or
objects
2) Both sample sizes are large
$n_1\hat{p}_1 > 10$, $n_1(1 - \hat{p}_1) > 10$, $n_2\hat{p}_2 > 10$, $n_2(1 - \hat{p}_2) > 10$
a large-sample confidence interval for p1 – p2 is
$(\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value})\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
The article “Freedom of What?” (Associated Press,
February 1, 2005) described a study in which high school
students and high school teachers were asked whether
they agreed with the following statement: “Students
should be allowed to report controversial issues in their
student newspapers without the approval of school
authorities.” It was reported that 58% of students
surveyed and 39% of teachers surveyed agreed with the
statement. The two samples – 10,000 high school
students and 8000 high school teachers – were selected
from schools across the country.
Compute a 90% confidence interval for the
difference in proportion of students who
agreed with the statement and the proportion
of teachers who agreed with the statement.
Newspaper Problem Continued . . .
$\hat{p}_1 = .58$
$\hat{p}_2 = .39$
1) Assume that it is reasonable to regard these two samples as being
independently selected and representative of the populations of interest.
2) Both sample sizes are large enough
$n_1\hat{p}_1$ = 10000(.58) > 10, $n_1(1 - \hat{p}_1)$ = 10000(.42) > 10,
$n_2\hat{p}_2$ = 8000(.39) > 10, $n_2(1 - \hat{p}_2)$ = 8000(.61) > 10
$(.58 - .39) \pm 1.645\sqrt{\dfrac{.58(.42)}{10000} + \dfrac{.39(.61)}{8000}} = (.178, .202)$
Based on this confidence interval, does there appear to be a significant
difference in proportion of students who agreed with the statement and
the proportion of teachers who agreed with the statement? Explain.
We are 90% confident that the difference in
proportion of students who agreed with the
statement and the proportion of teachers who
agreed with the statement is between .178 and
.202.
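The interval for the newspaper problem can be reproduced with the standard library. In this sketch the function name `two_proportion_interval` is illustrative, and the z critical value is computed from the confidence level instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(p1_hat, n1, p2_hat, n2, confidence):
    """Large-sample confidence interval for p1 - p2."""
    z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_critical * standard_error, diff + z_critical * standard_error

# Students: 58% of 10,000 agreed; teachers: 39% of 8000 agreed
lower, upper = two_proportion_interval(0.58, 10000, 0.39, 8000, confidence=0.90)
print(round(lower, 3), round(upper, 3))
```

Since the whole interval lies above 0, it suggests that the proportion of students who agreed is higher than the proportion of teachers who agreed.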