Chapter 13
Asking and Answering
Questions About the
Difference Between Two
Population Means
Created by Kathy Fritz
Let's review notation:

               Population Mean   Population Standard Deviation
Population 1   μ1                σ1
Population 2   μ2                σ2

                           Sample Size   Sample Mean   Sample Standard Deviation
Sample from Population 1   n1            x̄1            s1
Sample from Population 2   n2            x̄2            s2
Testing Hypotheses About the
Difference Between Two Population
Means Using Independent Samples
Properties of the Sampling
Distribution of x̄1 − x̄2
Two-Sample t Test for the
Difference in Population Means
Properties of the Sampling
Distribution of x̄1 − x̄2
If x̄1 − x̄2 is the difference in sample means for independently
selected random samples, then the following rules hold:
Rule 1.
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$$
This rule says that the sampling distribution of x̄1 − x̄2 is centered
at the actual value of the difference in population means. This means
that differences in sample means tend to cluster around the value of
the actual difference in population means.
Rule 2.
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
This rule specifies the standard error of x̄1 − x̄2. The value of the
standard error describes how much the x̄1 − x̄2 values tend to vary from
the actual difference in population means.
Rule 3. If both n1 and n2 are large or if the population
distributions are approximately normal, then the
sampling distribution of x̄1 − x̄2 is approximately
normal.
The sample sizes can be considered large if n1 ≥ 30 and n2 ≥ 30.
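One way to see Rules 1 and 2 in action is a small simulation. The sketch below is only an illustration: the population values (means 10 and 7, standard deviations 2 and 3, sample sizes 30 and 40) are hypothetical and do not come from the slides. It draws many pairs of independent samples and compares the center and spread of the simulated x̄1 − x̄2 values with what the rules predict.

```python
import math
import random
import statistics


def simulate_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=5000, seed=1):
    """Simulate xbar1 - xbar2 for independent samples from two normal populations."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs


# Hypothetical population values, chosen only for illustration.
mu1, sigma1, n1 = 10.0, 2.0, 30
mu2, sigma2, n2 = 7.0, 3.0, 40

diffs = simulate_differences(mu1, sigma1, n1, mu2, sigma2, n2)
center = statistics.fmean(diffs)   # Rule 1 predicts a value near mu1 - mu2
spread = statistics.stdev(diffs)   # Rule 2 predicts a value near the standard error
standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

print(f"simulated center {center:.3f} vs mu1 - mu2 = {mu1 - mu2:.3f}")
print(f"simulated spread {spread:.3f} vs standard error {standard_error:.3f}")
```

A histogram of `diffs` would also look roughly bell-shaped, which is what Rule 3 describes.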
Two-Sample t Test for a
Difference in Population Means
Appropriate when the following conditions are met:
1. The samples are independently selected.
2. Each sample is a random sample from the population of
interest or the samples are selected in such a way that
results in samples that are representative of the
populations.
3. Both sample sizes are large (n1 ≥ 30 and n2 ≥ 30) or the
population distributions are approximately normal.
Two-Sample t Test for a
Difference in Population Means
When these conditions are met, the following test
statistic can be used:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
where μ1 – μ2 is the hypothesized value of the
difference in population means
from the null hypothesis (often
this will be 0).
When the conditions are met and the null hypothesis is
true, the t test statistic has a t distribution with
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}$$
The computed value of df should be
truncated to obtain an integer value.
Two-Sample t Test for a
Difference in Population Means
Form of the null hypothesis:
H0: μ1 – μ2 = hypothesized value

When the Alternative Hypothesis Is . . .   The P-value Is . . .
Ha: μ1 – μ2 > hypothesized value           Area under the t curve to the right of the calculated value of the test statistic
Ha: μ1 – μ2 < hypothesized value           Area under the t curve to the left of the calculated value of the test statistic
Ha: μ1 – μ2 ≠ hypothesized value           2·(area to the right of t) if t is positive, or 2·(area to the left of t) if t is negative
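The three P-value rules in the table can be sketched in code. This is a minimal sketch using only the Python standard library: the t curve area is approximated by numerical integration (Simpson's rule), and the function names and the labels "greater", "less", and "two-sided" are choices made for this illustration, not notation from the slides. In practice you would usually read the area from a t table or from statistical software.

```python
import math


def t_pdf(x, df):
    """Height of the t curve with df degrees of freedom at x."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t, df, steps=2000):
    """Area under the t curve to the left of t (Simpson's rule on [0, |t|])."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def p_value(t, df, alternative):
    """P-value for alternative 'greater', 'less', or 'two-sided'."""
    left = t_cdf(t, df)
    if alternative == "greater":
        return 1.0 - left                    # area to the right of t
    if alternative == "less":
        return left                          # area to the left of t
    if alternative == "two-sided":
        return 2.0 * min(left, 1.0 - left)   # twice the smaller tail area
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical test statistic and df, used only to exercise the three cases.
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(2.0, 10, alt), 4))
```

Taking twice the smaller tail area gives the same result as the two-sided rule in the table, because the smaller tail is the right tail when t is positive and the left tail when t is negative.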
Another Way to Write Hypothesis
Statements:
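When the hypothesized value is 0, we can rewrite
these hypothesis statements:
H0: μ1 – μ2 = 0   can be written as   H0: μ1 = μ2
Ha: μ1 – μ2 < 0   can be written as   Ha: μ1 < μ2
Ha: μ1 – μ2 > 0   can be written as   Ha: μ1 > μ2
Ha: μ1 – μ2 ≠ 0   can be written as   Ha: μ1 ≠ μ2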
Researchers have studied the ways in which college
students who use Facebook differ from college students
who do not use Facebook.
As part of the study, each person in a sample of 141 college
students who use Facebook was asked to report his or her
grade point average (GPA). College GPA was also reported
by each person in a sample of 68 students who do not use
Facebook.
Did the data from this study provide convincing evidence
that the mean college GPA for Facebook users was lower
than the mean college GPA for students who do not use Facebook?
The two samples were independently selected from a large,
public Midwestern university. Although the samples were
not selected at random, they were
selected to be representative of the
two populations.
Facebook and Grades Continued . . .
Data from these samples were used to compute these
summary statistics.

Population                         Sample Size   Sample Mean   Sample Standard Deviation
Students who use Facebook          n1 = 141      x̄1 = 3.06     s1 = 0.95
Students who do not use Facebook   n2 = 68       x̄2 = 3.82     s2 = 0.41
Step 1 (Hypotheses):
Population characteristics of interest:
μ1 = mean GPA for students who use Facebook
μ2 = mean GPA for students who do not use Facebook
Null hypothesis:
H0: μ1 – μ2 = 0
Alternative hypothesis:
Ha: μ1 – μ2 < 0
Facebook and Grades Continued . . .
Step 2 (Method):
Because the answers to the four key questions are 1)
hypothesis testing, 2) sample data, 3) one numerical
variable, and 4) two independently selected samples,
consider a two-sample t test for a difference in
population means.
Significance level: α = 0.05
You should choose a significance level based on a consideration
of the consequences of Type I and Type II errors. In this
situation, because neither type of error is much more serious
than the other, a value for α of 0.05 is a reasonable choice.
Step 3 (Check):
• The large-sample condition is met because the sample sizes
are both large: n1 = 141 > 30 and n2 = 68 > 30.
• From the study, you know that the samples were
independently selected.
• You also know that the samples were selected to be
representative of the two populations of interest.
Facebook and Grades Continued . . .
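Step 4 (Calculate):
Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(3.06 - 3.82) - 0}{\sqrt{\frac{0.95^2}{141} + \frac{0.41^2}{68}}} = -8.08$$
Degrees of freedom:
$$V_1 = \frac{s_1^2}{n_1} = 0.0064 \qquad V_2 = \frac{s_2^2}{n_2} = 0.0025$$
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} = \frac{(0.0064 + 0.0025)^2}{\frac{(0.0064)^2}{140} + \frac{(0.0025)^2}{67}} \approx 205.181$$
Truncate df to 205.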
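The Step 4 arithmetic can be checked with a short script. This is a sketch that works only from the rounded summary statistics in the table (the function name is made up for this illustration). With those rounded inputs the test statistic comes out near −8.07 rather than exactly −8.08, and the decimals of df differ slightly, presumably because the slide values were computed from less heavily rounded numbers. The truncated df is still 205.

```python
import math


def welch_t_and_df(xbar1, s1, n1, xbar2, s2, n2, hypothesized=0.0):
    """Two-sample t statistic and its df, from summary statistics."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    t = (xbar1 - xbar2 - hypothesized) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Summary statistics from the Facebook and Grades example.
t, df = welch_t_and_df(xbar1=3.06, s1=0.95, n1=141, xbar2=3.82, s2=0.41, n2=68)
print(f"t = {t:.2f}")
print(f"df = {df:.3f}, truncated to {math.floor(df)}")
```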
Facebook and Grades Continued . . .
Step 4 (Calculate) Continued:
Associated P-value:
P-value = area under t curve to the left of −8.08
= P(t < −8.08) ≈ 0
Step 5 (Communicate Results):
Decision: Because the P-value (≈ 0) is less than α = 0.05, reject H0.
Conclusion: Based on the sample data, there is convincing
evidence that the mean college GPA for students at the
university who use Facebook is lower
than the mean college GPA for
students at the university who do not
use Facebook.
More on Degrees of Freedom
The degrees of freedom formula for the two-sample t test
involves quite a bit of arithmetic.
An alternative approach is to compute a conservative
estimate of the P-value – one that is close to but larger
than the actual P-value.
A conservative estimate of the P-value for the two-sample
t test can be found by using the t curve with the degrees
of freedom equal to the smaller of (n1 – 1) and (n2 – 1).
If the null hypothesis is rejected using this conservative
estimate, then it will also be rejected if the actual P-value
is used.
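As a worked illustration, applying this shortcut to the sample sizes from the Facebook example (n1 = 141 and n2 = 68) gives:

```latex
df_{\text{conservative}} = \min(n_1 - 1,\; n_2 - 1) = \min(141 - 1,\; 68 - 1) = \min(140,\; 67) = 67
```

This is smaller than the df of 205 obtained from the full formula. A t curve with fewer degrees of freedom has heavier tails, so the tail area beyond the same test statistic is larger, which is why the resulting P-value is conservative.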
The Pooled t Test
If it is known that the variances of the two populations
are equal (σ1² = σ2²), an alternative procedure known as
the pooled t test can be used.
This test procedure combines information from both
samples to obtain a “pooled” estimate of the common
variance, and then uses this pooled estimate of the
variance in place of s1² and s2² in the test statistic.
This test procedure was widely used in the past,
but it has fallen into some disfavor because it is
quite sensitive to departures from the assumption
of equal population variances.
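The slides do not show the pooled formulas, so the sketch below uses the standard textbook form as an assumption: the pooled variance is the df-weighted average of the two sample variances, and the test statistic has n1 + n2 − 2 degrees of freedom. The function name and the summary statistics are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def pooled_t(xbar1, s1, n1, xbar2, s2, n2, hypothesized=0.0):
    """Return (t, df, pooled_variance) for the equal-variance (pooled) t test."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances (weights are n - 1).
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    # The pooled variance replaces both s1^2 and s2^2 in the standard error.
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (xbar1 - xbar2 - hypothesized) / standard_error
    return t, df, pooled_variance


# Hypothetical summary statistics with equal sample standard deviations.
t, df, sp2 = pooled_t(xbar1=5.0, s1=2.0, n1=10, xbar2=3.0, s2=2.0, n2=12)
print(f"pooled variance = {sp2:.2f}, t = {t:.3f}, df = {df}")
```

When the two sample standard deviations are very different, as in the Facebook example, the equal-variance assumption is doubtful and the two-sample t test described earlier is the safer choice.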
Testing Hypotheses About the
Difference Between Two Population
Means Using Paired Samples
Ultrasound is often used in the treatment of soft tissue
injuries. In a study to investigate the effect of ultrasound
therapy on knee extension, range of motion was measured
for people in a representative sample of physical therapy
patients both before and after ultrasound therapy.
Is there evidence that the ultrasound therapy increases
range of motion?
Let μ1 denote the mean range of motion for the population
of all physical therapy patients prior to ultrasound and μ2
denote the mean range of motion after ultrasound.
H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 < 0
If the mean range of motion after
ultrasound is greater than the mean range of
motion before ultrasound, then the
difference will be negative.
Range of Motion Problem Continued . . .
H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 < 0
Suppose we look at a sample of seven patients. The range
of motion before ultrasound therapy and after ultrasound
are plotted below.
Both the before ultrasound and after ultrasound range
of motion measurements vary from patient to patient.
It is this variability that may obscure any difference.
If you were to (incorrectly) use the two-sample t test for
independent samples with the given data, the resulting
test statistic would be
t = −0.61
This would lead to a decision not to reject
the null hypothesis.
Why is this NOT a correct decision?
Range of Motion Problem Continued . . .
H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 < 0
Now let's look at the sample as paired data.
However, the plot in which pairs are identified (above)
does suggest a difference. Notice that for six of the
seven pairs the after ultrasound observation is greater
than the before ultrasound observation.
This suggests that the methods for independent
samples are not adequate for dealing with paired data.
When observations are paired in some meaningful way,
inferences are based on the differences between the
two observations within a pair.
A Look at Hypotheses
To compare two population or treatment means when the
samples are paired, first translate the hypotheses of
interest about the value of μ1 – μ2 into equivalent
hypotheses involving μd, the mean of the difference
population.

Hypothesis                          Equivalent Hypothesis When Samples Are Paired
H0: μ1 – μ2 = hypothesized value    H0: μd = hypothesized value
Ha: μ1 – μ2 > hypothesized value    Ha: μd > hypothesized value
Ha: μ1 – μ2 < hypothesized value    Ha: μd < hypothesized value
Ha: μ1 – μ2 ≠ hypothesized value    Ha: μd ≠ hypothesized value
The Paired t Test
Appropriate when the following conditions are met:
1. The samples are paired.
2. The n sample differences can be viewed as a random
sample from a population of differences (or it is
reasonable to regard the sample of differences as
representative of the population of differences).
3. The number of sample differences is large (n ≥ 30) or
the population distribution of differences is
approximately normal.
Summary of the Paired t test for Comparing
Two Population Means Continued
When these conditions are met, the following test
statistic can be used:
$$t = \frac{\bar{x}_d - \mu_0}{s_d / \sqrt{n}}$$
where μ0 is the hypothesized value of the population mean
difference from the null hypothesis, n is the number of
sample differences, and x̄d and sd are the mean and
standard deviation of the sample differences.
Summary of the Paired t test for Comparing
Two Population Means Continued
Form of the null hypothesis:
H0: μd = μ0
When the conditions are met and the null hypothesis is true, this t test
statistic has a t distribution with df = n – 1.

When the Alternative Hypothesis Is . . .   The P-value Is . . .
Ha: μd > μ0                                Area under the t curve to the right of the calculated value of the test statistic
Ha: μd < μ0                                Area under the t curve to the left of the calculated value of the test statistic
Ha: μd ≠ μ0                                2·(area to the right of t) if t is positive, or 2·(area to the left of t) if t is negative
Is this an example of paired samples?
An engineering association wants to see if
there is a difference in the mean annual
salary for electrical engineers and chemical
engineers. A random sample of electrical
engineers is surveyed about their annual
income. Another random sample of chemical
engineers is surveyed about their annual
income.
No. There is no pairing of
individuals; you have two
independent samples.
Is this an example of paired samples?
A pharmaceutical company wants to test its
new weight-loss drug. Before giving the drug
to volunteers, company researchers weigh
each person. After a month of using the
drug, each person's weight is measured again.
Yes, you have two observations on
each individual, resulting in paired
data.
In a study to investigate the effect of ultrasound therapy
on knee extension, range of motion was measured for
people in a representative sample of physical therapy
patients both before and after ultrasound therapy.
Because the samples are paired, the first thing to do
is to compute the sample differences.

Range of Motion
Patient             1    2    3    4    5    6    7
Before Ultrasound   31   53   45   57   50   43   32
After Ultrasound    32   59   46   64   49   45   40
Differences         -1   -6   -1   -7    1   -2   -8

Is there evidence that the ultrasound therapy increases
range of motion?
The mean and standard deviation computed
from these sample differences are
x̄d = −3.43 and sd = 3.51
Range of Motion Continued . . .
Step 1 (Hypotheses):
The population characteristics of interest are
μ1 = mean range of motion for physical therapy patients before ultrasound
μ2 = mean range of motion for physical therapy patients after ultrasound
Because the samples are paired, you should also define μd:
μd = μ1 – μ2 = mean difference in range of motion (before – after)
Translating the question of interest into hypotheses gives:
H0: μd = 0 versus Ha: μd < 0
Step 2 (Method):
Because the answers to the four key questions are 1) hypothesis
testing, 2) sample data, 3) one numerical variable, and 4) two paired
samples, consider the paired t test as a potential method. When the
null hypothesis is true, the test statistic will have a t distribution with
df = 7 – 1 = 6. Significance level: α = 0.05
Range of Motion Continued . . .
Step 3 (Check):
• The sample was representative of physical therapy
patients.
• Because the sample size is small, the distribution of
range of motion differences should be approximately
normal. The following boxplot of the seven differences
is not too asymmetric and there are no outliers, so it is
reasonable to think that the population differences
could be approximately normal.
Range of Motion Continued . . .
Step 4 (Calculate):
Because all conditions are met, it is appropriate to use the paired-samples
t test.
Test Statistic:
$$t = \frac{\bar{x}_d - \mu_0}{s_d / \sqrt{n}} = \frac{-3.43 - 0}{3.51 / \sqrt{7}} = -2.59$$
P-value:
P-value = area to the left of −2.6 = area to the right of 2.6 = 0.02
Step 5 (Communicate Results):
Because the P-value (0.02) is less than α (0.05), you reject
H0. There is convincing evidence that mean knee range of
motion for physical therapy patients before ultrasound is
less than the mean range of motion after ultrasound.
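To see why pairing matters for these data, the sketch below computes the paired t statistic and, for contrast, the two-sample statistic you would get by incorrectly treating the before and after measurements as independent samples. It uses only the seven pairs of measurements from the range of motion table. The variable names are illustrative.

```python
import math
import statistics

before = [31, 53, 45, 57, 50, 43, 32]
after = [32, 59, 46, 64, 49, 45, 40]
n = len(before)

# Paired analysis: a one-sample t statistic on the within-patient differences.
differences = [b - a for b, a in zip(before, after)]
paired_t = statistics.mean(differences) / (statistics.stdev(differences) / math.sqrt(n))

# Incorrect analysis for these data: treat the columns as independent samples.
se_independent = math.sqrt(statistics.variance(before) / n
                           + statistics.variance(after) / n)
two_sample_t = (statistics.mean(before) - statistics.mean(after)) / se_independent

print(f"paired t = {paired_t:.2f}")
print(f"two-sample t (not appropriate here) = {two_sample_t:.2f}")
```

Both statistics have the same numerator. The paired version has a much smaller denominator because taking differences removes the patient-to-patient variability.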
Estimating the Difference
Between Two Population Means
The Two-Sample t Confidence Interval for the
Difference Between Two Population Means
Appropriate when the following conditions are
met:
1. The samples are independently selected.
2. The samples are random samples from the
populations of interest or the samples are selected in
such a way that results in samples that are
representative of the populations.
3. Both sample sizes are large (n1 ≥ 30 and n2 ≥ 30) or the
population distributions are approximately normal.
The Two-Sample t Confidence Interval for the
Difference Between Two Population Means
When these conditions are met, a confidence interval for a
difference in population means is
$$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
The t critical value is based on
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}$$
The computed value of df should be truncated to obtain
an integer value. The desired confidence level
determines which t critical value is used.
The Two-Sample t Confidence Interval for the
Difference Between Two Population Means
Interpretation of Confidence Interval
You can be confident that the actual value of the
difference in population means is included in the computed
interval. This statement should be worded in context.
Interpretation of Confidence Level
The confidence level specifies the long-run proportion of
the time that this method is successful in capturing the
actual difference in population means.
In 2010, the Nielsen Company released a report that stated
“Women Talk and Text More than Men Do” (State of the
Media 2010: U.S. Audiences & Devices, The Nielsen Company).
This statement was based on data collected by examining
random samples of telephone bills selected from two
populations – female cell phone users and male cell phone users.
The report indicated that the mean number of text
messages sent per month by women was 716, while the
mean number of text messages sent per month by men was
555. The report also indicated the sample sizes were
large, although it did not give the actual sample sizes.
For purposes of this example, suppose that the
summary statistics are as shown in the table.
Sample   Sample Size   Sample Mean   Sample Standard Deviation
Women    n1 = 1200     x̄1 = 716      s1 = 90
Men      n2 = 1000     x̄2 = 555      s2 = 75
Step 1 (Estimate):
The population characteristics of interest are:
μ1 = mean number of text messages sent by female cell phone users
μ2 = mean number of text messages sent by male cell phone users
μ1 – μ2 = difference in mean number of text messages sent
Step 2 (Method):
Because the answers to the four key questions are 1) estimation, 2)
sample data, 3) one numerical variable, and 4) two independently
selected samples, consider constructing a two-sample t confidence
interval. For this example, a confidence level of 90% was selected.
Step 3 (Check):
• Both samples are large.
• The samples were randomly selected from the two
populations of interest.
• The samples are independent.
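Step 4 (Calculate):
Degrees of freedom:
$$V_1 = \frac{s_1^2}{n_1} = \frac{90^2}{1200} = 6.750 \qquad V_2 = \frac{s_2^2}{n_2} = \frac{75^2}{1000} = 5.625$$
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} = \frac{(6.750 + 5.625)^2}{\frac{(6.750)^2}{1199} + \frac{(5.625)^2}{999}}$$
This value is just under 2198, so df truncates to 2197. With df this
large, the t critical value is essentially the same as the z critical value.
From Table 3 or technology:
t critical value = 1.645
$$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (716 - 555) \pm (1.645) \sqrt{\frac{90^2}{1200} + \frac{75^2}{1000}}$$
(155.213, 166.787)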
Step 5 (Communicate Results):
Confidence Interval:
You can be 90% confident that the actual difference in
mean number of text messages sent is between 155.213
and 166.787.
Both endpoints of this interval are positive, so you
estimate that the mean number of text messages sent by
female cell phone users is greater than the mean number
of text messages sent by male cell phone users by
somewhere between 155.213 and 166.787.
Confidence Level:
The method used to construct this interval estimate
is successful in capturing the actual difference in
population means about 90% of the time.
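The interval in this example can be reproduced from the summary statistics. The sketch below takes the t critical value (1.645) as given from the table rather than computing it, and the function name is made up for this illustration.

```python
import math


def two_sample_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_critical):
    """Confidence interval for mu1 - mu2 from summary statistics."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    margin = t_critical * standard_error
    difference = xbar1 - xbar2
    return difference - margin, difference + margin


# Summary statistics assumed in the texting example; 1.645 is the
# t critical value for 90% confidence with a very large df.
low, high = two_sample_t_interval(716, 90, 1200, 555, 75, 1000, t_critical=1.645)
print(f"({low:.3f}, {high:.3f})")
```

Because both endpoints are positive, the interval does not include 0, which matches the interpretation given in Step 5.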
The Paired-Samples t Confidence Interval
for a Difference in Population Means
Appropriate when the following conditions are
met:
1. The samples are paired.
2. The n sample differences can be viewed as a random
sample from a population of differences (or it is
reasonable to regard the sample of differences as
representative of the population of differences).
3. The number of sample differences is large (n ≥ 30) or
the population distribution of differences is
approximately normal.
The Paired-Samples t Confidence Interval
for a Difference in Population Means
When these conditions are met, a confidence interval for
the difference in population means is
$$\bar{x}_d \pm (t \text{ critical value}) \frac{s_d}{\sqrt{n}}$$
where n is the number of sample differences, and x̄d and sd are the
mean and standard deviation of the sample differences. The t critical
value is based on df = n – 1.
Interpretation of Confidence Interval
You can be confident that the actual value of the difference in
population means is included in the computed interval. This statement
should be worded in context.
Interpretation of Confidence Level
The confidence level specifies the long-run proportion of the time
that this method is successful in capturing the actual difference in
population means.
Benefits of Ultrasound Revisited . . .
Range of Motion
Patient             1    2    3    4    5    6    7
Before Ultrasound   31   53   45   57   50   43   32
After Ultrasound    32   59   46   64   49   45   40
Differences         -1   -6   -1   -7    1   -2   -8

The mean and standard deviation computed from these
sample differences are x̄d = −3.43 and sd = 3.51
Step 1 (Estimate):
You want to estimate
μd = μ1 – μ2 = mean difference in knee range of motion
where
μ1 = mean knee range of motion before ultrasound
and
μ2 = mean knee range of motion after ultrasound
Range of Motion Continued . . .
Step 2 (Method):
Because the answers to the four key questions are 1) estimation, 2)
sample data, 3) one numerical variable, and 4) two paired samples,
consider the paired t confidence interval as a potential method.
Step 3 (Check):
• The sample was representative of physical therapy patients.
• Because the sample size is small, the distribution of range of
motion differences should be approximately normal. The
following boxplot of the seven differences is not too
asymmetric and there are no outliers, so it is reasonable to
think that the population differences could be approximately
normal.
Range of Motion Continued . . .
Step 4 (Calculate):
The t critical value for df = 6 and 95% confidence level is 2.45.
$$\bar{x}_d \pm (t \text{ critical value}) \frac{s_d}{\sqrt{n}} = -3.43 \pm 2.45 \cdot \frac{3.51}{\sqrt{7}} = (-6.68, -0.18)$$
The method used to construct this interval is successful in
capturing the actual difference in population means about 95%
of the time.
Step 5 (Communicate Results):
Based on these samples, you can be 95% confident that the
actual difference in mean range of motion is somewhere
between −6.68 degrees and −0.18 degrees. Because both
endpoints are negative, you would estimate that the mean knee
range of motion after ultrasound is greater than the mean range
of motion before ultrasound by somewhere between 0.18
and 6.68 degrees.
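The paired interval can be checked the same way. This sketch starts from the rounded summary values (x̄d = −3.43, sd = 3.51, n = 7) and takes the t critical value 2.45 as given. The function name is illustrative. Software working from the unrounded data and a more precise critical value may report endpoints that differ slightly in the second decimal place.

```python
import math


def paired_t_interval(mean_d, sd_d, n, t_critical):
    """Confidence interval for mu_d from the summary of the sample differences."""
    margin = t_critical * sd_d / math.sqrt(n)
    return mean_d - margin, mean_d + margin


# Rounded summary statistics and critical value from the slides.
low, high = paired_t_interval(mean_d=-3.43, sd_d=3.51, n=7, t_critical=2.45)
print(f"({low:.2f}, {high:.2f})")

# Both endpoints negative: the after-ultrasound mean is estimated to be larger.
print(high < 0)
```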
Avoid These Common
Mistakes
Avoid These Common Mistakes
1. Remember that the results of a hypothesis
test can never show strong support for the null
hypothesis. In two-sample situations, this
means that you shouldn't be convinced that
there is no difference between two population
means based on the outcome of a hypothesis
test.
Avoid These Common Mistakes
2. If you have complete information (a census)
for both populations, there is no need to carry
out a hypothesis test or to construct a
confidence interval – in fact, it would be
inappropriate to do so.
Avoid These Common Mistakes
3. Don't confuse statistical significance with
practical significance. In the two-sample
setting, it is possible to be convinced that two
population means are not equal even in
situations where the actual difference
between them is small enough that it is of no
practical use.
After rejecting a null hypothesis of no difference,
it is useful to look at a confidence interval
estimate of the difference to get a sense of
practical significance.
Avoid These Common Mistakes
4. Correctly interpreting confidence intervals in
the two-sample case is more difficult than in
the one-sample case, so take particular care
when providing two-sample confidence
interval interpretations.
Because the two-sample confidence interval
estimates a difference (μ1 – μ2), the most
important thing to note is whether or not
the interval includes 0.