Download File

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Psychometrics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Misuse of statistics wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
HYPOTHESIS TESTS FOR THE DIFFERENCE
BETWEEN TWO MEANS: INDEPENDENT
SAMPLES
Section 11.1
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Objectives
1.
2.
Perform a hypothesis test for the difference between two means
using the P-value method
Perform a hypothesis test for the difference between two means
using the critical value method
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
OBJECTIVE 1
Perform a hypothesis test for the difference
between two means using the P-value method
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Independent Samples
Scores on the National Assessment of Educational Progress (NAEP)
mathematics test range from 0 to 500. In a recent year, the sample mean
score for students using a computer was 309, with a sample standard
deviation of 29. For students not using a computer, the sample mean was
303, with a sample standard deviation of 32. Assume there were 60
students in the computer sample, and 40 students in the sample that didn’t
use a computer. We can see that the sample mean scores differ by 6
points: 309 βˆ’ 303 = 6. Now, we are interested in the difference between
the population means, which will not be exactly the same as the difference
between the sample means.
Is it plausible that the difference between the population means could be
0? How strong is the evidence that the population mean scores are
different?
This is an example of a situation in which the data consist of two
independent samples. Two samples are independent if the observations
in one sample do not influence the observations in the other.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Notation
We use the following notation:
β€’ πœ‡1 and πœ‡2 are the population means.
β€’ π‘₯1 and π‘₯2 are the sample means.
β€’ 𝑠1 and 𝑠2 are the sample standard deviations.
β€’ 𝑛1 and 𝑛2 are the sample sizes.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Null and Alternate Hypotheses
In the scores from the NAEP, the issue is whether the mean scores
from both populations of students, those using computers and those
without computers, are equal.
In other words, does πœ‡1 = πœ‡2 ? Therefore, the null hypothesis says
that the population means are equal:
𝐻0 : πœ‡1 = πœ‡2
As an alternate to the null hypothesis above, there are three
possibilities.
𝐻1 : πœ‡1 < πœ‡2
𝐻1 : πœ‡1 > πœ‡2
𝐻1 : πœ‡1 β‰  πœ‡2
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Test Statistic
The test statistic is based on the difference between the two sample
means π‘₯1 βˆ’ π‘₯2 . The mean of π‘₯1 βˆ’ π‘₯2 is πœ‡1 βˆ’ πœ‡2 . We approximate the
standard deviation of π‘₯1 βˆ’ π‘₯2 with the standard error derived in the
previous chapter.
Standard error of π‘₯1 βˆ’ π‘₯2 =
𝑠12
𝑛1
+
𝑠22
𝑛2
The test statistic is
𝑑=
π‘₯1 βˆ’ π‘₯2 βˆ’ πœ‡1 βˆ’ πœ‡2
𝑠12
𝑠22
+
𝑛1 𝑛2
With degrees of freedom = smaller of 𝑛1 βˆ’ 1 and 𝑛2 βˆ’ 1
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Assumptions
The method just described requires the following assumptions:
Assumptions:
1. We have simple random samples from two populations.
2. The samples are independent of one another.
3. Each sample size is large (𝑛 > 30), or its population is
approximately normal.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Hypothesis Test for πœ‡1 βˆ’ πœ‡2 using the P-value
Method
Step 1:
Step 2:
State the null and alternate hypotheses.
If making a decision, choose a significance level 𝛼.
Step 3:
Compute the test statistic 𝑑 =
π‘₯ 1 βˆ’π‘₯ 2 βˆ’ πœ‡1 βˆ’πœ‡2
.
2
𝑠2
1 +𝑠2
𝑛1 𝑛 2
Step 4:
Compute the P-value
Step 5:
Interpret the P-value. If making a decision, reject 𝐻0 if the P-value is less
than or equal to the significance level 𝛼.
State a conclusion.
Step 6:
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Example
The National Assessment of Educational Progress (NAEP) tested a sample of students
who had used a computer in their mathematics classes, and another sample of students
who had not used a computer. The sample mean score for students using a computer
was 309, with a sample standard deviation of 29. For students not using a computer, the
sample mean was 303, with a sample standard deviation of 32. Assume there were 60
students in the computer sample, and 40 students in the sample that hadn’t used a
computer. Can you conclude that the population mean scores differ? Use the Ξ± = 0.05
level.
Solution:
We first check the assumptions. We have two independent random samples with sizes
larger than 30. The assumptions are satisfied. We summarize the relevant information:
With Computer
Without
Computer
π‘₯ 1 = 309
π‘₯ 2 = 303
Sample stand dev.
𝑠1 = 29
𝑠2 = 32
Sample size
𝑛1 = 60
𝑛2 = 40
πœ‡1 (unknown)
πœ‡2 (unknown)
Sample mean
Population Mean
The null and alternate
hypotheses are:
𝐻0 : πœ‡1 = πœ‡2
𝐻1 : πœ‡1 β‰  πœ‡2
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Example – Perform a Hypothesis Test
Solution (continued):
Under the assumption that 𝐻0 is true, the test statistic is
𝑑=
π‘₯ 1 βˆ’π‘₯ 2 βˆ’ πœ‡1 βˆ’πœ‡2
𝑠1 𝑠2
+
𝑛1 𝑛2
=
309 βˆ’303 βˆ’ 0
292 322
+
60
40
= 0.953
Remember
𝐻0 : πœ‡1 = πœ‡2
𝐻1 : πœ‡1 β‰  πœ‡2
π‘₯1 = 309
π‘₯2 = 303
𝑠1 = 29
𝑠2 = 32
𝑛1 = 60
𝑛2 = 40
This is a two-tailed test, so the P-value is the
sum of the areas to the right of 0.953 and to
the left of βˆ’0.953. Using technology, we get
a P-value of 0.343.
Since P > 0.05, we do not reject 𝐻0 at the 𝛼 = 0.05 level.
There is not enough evidence to conclude that the mean scores differ between those
students who use a computer and those who do not. The mean scores may be the same.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Hypothesis Testing on the TI-84 PLUS
The 2-SampTTest command will perform a
hypothesis test for the difference between two
means when the samples are independent. This
command is accessed by pressing STAT and
highlighting the TESTS menu.
If the summary statistics are given the Stats option
should be selected for the input option.
If the raw sample data are given, the Data option
should be selected.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Example (TI-84 PLUS)
The National Assessment of Educational Progress (NAEP) tested a sample of students
who had used a computer in their mathematics classes, and another sample of students
who had not used a computer. The sample mean score for students using a computer
was 309, with a sample standard deviation of 29. For students not using a computer, the
sample mean was 303, with a sample standard deviation of 32. Assume there were 60
students in the computer sample, and 40 students in the sample that hadn’t used a
computer. Can you conclude that the population mean scores differ? Use the Ξ± = 0.05
level.
Solution:
We first check the assumptions. We have two independent random samples with sizes
larger than 30. The assumptions are satisfied. We summarize the relevant information:
With Computer
Without
Computer
π‘₯ 1 = 309
π‘₯ 2 = 303
Sample stand dev.
𝑠1 = 29
𝑠2 = 32
Sample size
𝑛1 = 60
𝑛2 = 40
πœ‡1 (unknown)
πœ‡2 (unknown)
Sample mean
Population Mean
The null and alternate
hypotheses are:
𝐻0 : πœ‡1 = πœ‡2
𝐻1 : πœ‡1 β‰  πœ‡2
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Example (TI-84 PLUS)
The National Assessment of Educational Progress (NAEP) tested a sample of
students who had used a computer in their mathematics classes, and another
sample of students who had not used a computer. The sample mean score for
students using a computer was 309, with a sample standard deviation of 29.
For students not using a computer, the sample mean was 303, with a sample
standard deviation of 32. Assume there were 60 students in the computer
sample, and 40 students in the sample that hadn’t used a computer. Can you
conclude that the population mean scores differ? Use the Ξ± = 0.05 level.
Solution:
We press STAT and highlight the TESTS menu and
select 2-SampTTest.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Example (TI-84 PLUS)
We press STAT and highlight the TESTS menu and select
2-SampTTest.
Select Stats and enter the following:
With Computer
Without Computer
π‘₯ 1 = 309
π‘₯ 2 = 303
Sample stand dev.
𝑠1 = 29
𝑠2 = 32
Sample size
𝑛1 = 60
𝑛2 = 40
Sample mean
Since we have a two-tailed test, select the β‰  𝝁𝟎 option and No for
the pooled option.
Select Calculate.
The P-value > 0.05, so we do not reject 𝐻0 at the 𝛼 = 0.05 level.
There is not enough evidence to conclude that the mean scores
differ between those students who use a computer and those who
do not. The mean scores may be the same.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
OBJECTIVE 2
Perform a hypothesis test for the difference
between two means using the critical value
method
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Hypothesis Tests Using the Critical Value
Method
Step 1.
Step 2.
State the null and alternate hypotheses. The null hypothesis will have the
form 𝐻0 : πœ‡1 = πœ‡2 . The alternate hypothesis will be πœ‡1 < πœ‡2 , πœ‡1 < πœ‡2 , or πœ‡1 β‰  πœ‡2 .
Choose a significance level 𝛼, and find the critical value or values.
Step 3.
Compute the test statistic 𝑑 =
π‘₯ 1 βˆ’π‘₯ 2 βˆ’ πœ‡1 βˆ’πœ‡2
2
𝑠2
1 +𝑠2
𝑛1 𝑛 2
Step 4.
Determine whether to reject 𝐻0 , as follows:
Step 5. State a conclusion.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Example
Treatment of wastewater is important to reduce the concentration of undesirable pollutants.
One such substance is benzene, which is used as an industrial solvent. Two methods of
water treatment are being compared. Treatment 1 is applied to five specimens of
wastewater, and treatment 2 is applied to seven specimens. The benzene concentrations,
in units of milligrams per liter, for each specimen are as follows:
Treatment 1: 7.8 7.6 5.6 6.8 6.4
Treatment 2: 4.1 6.5 3.7 7.7 7.3 4.7 5.9
How strong is the evidence that the mean concentration is less for treatment 2 than for
treatment 1? We will test at the Ξ± = 0.05 significance level.
Solution:
We first check the assumptions. Because
the samples are small, we must check for
strong skewness and outliers. We construct
dotplots for each sample. There are no
outliers, and no evidence of strong skewness,
in either sample.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Solution
The null and alternate hypotheses are:
𝐻0 : πœ‡1 = πœ‡2
𝐻1 : πœ‡1 > πœ‡2
We will find the critical value in Table A.3. The sample sizes are 𝑛1 = 5 and 𝑛2 =
7. For the number of degrees of freedom, we use the smaller of 5 βˆ’ 1 = 4 and 7 βˆ’
1 = 6, which is 4. Because the alternate hypothesis, πœ‡1 βˆ’ πœ‡2 > 0, is right-tailed, the
critical value is the value with area 0.05 to its right.
We consult Table A.3 with 4 degrees of freedom and find that 𝑑𝛼 = 2.132.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Solution
To compute the test statistic, we first compute the sample means and standard
deviations. These are
π‘₯1 = 6.84
π‘₯2 = 5.70
𝑠1 = 0.8989
𝑠2 = 1.5706
The sample sizes are 𝑛1 = 5 and 𝑛2 = 7. Under the assumption that 𝐻0 is true,
πœ‡1 βˆ’ πœ‡2 = 0, the value of the test statistic is
𝑑=
π‘₯ 1 βˆ’π‘₯ 2 βˆ’ πœ‡1 βˆ’πœ‡2
2
𝑠2
1 +𝑠2
𝑛1 𝑛2
=
6.84 βˆ’5.70 βˆ’ 0
0.89892 1.57062
+ 7
5
= 1.590
This is a right-tailed test, so we reject 𝐻0 if 𝑑 β‰₯ 𝑑𝛼 . Because 𝑑 = 1.590 and 𝑑𝛼 =
2.132, we do not reject 𝐻0 .
There is not enough evidence to conclude that the mean benzene concentration
with treatment 1 is greater than that with treatment 2. The concentrations may be
the same.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Hypothesis Tests Using Pooled Standard Deviation
When the two population standard deviations, 𝜎1 and 𝜎2 , are known to be equal, there is
an alternate method for testing hypotheses about πœ‡1 βˆ’ πœ‡2 . This alternate method was
widely used in the past, and is still an option in many computer packages. We will
describe the method here, because it is still sometimes used. However, the method is
rarely appropriate, for the same reasons that the pooled method for constructing
confidence intervals is rarely appropriate.
Step 1:
Compute the pooled standard deviation, 𝑠𝑝 , as follows:
𝑠𝑝 =
Step 2:
𝑛1 βˆ’ 1 𝑠12 + 𝑛2 βˆ’ 1 𝑠22
𝑛1 + 𝑛2 βˆ’ 2
Compute the test statistic:
𝑑=
π‘₯1 βˆ’ π‘₯2 βˆ’ πœ‡1 βˆ’ πœ‡2
𝑠𝑝
Step 3:
1
1
+
𝑛1 𝑛2
Compute the degrees of freedom:
Degrees of freedom = 𝑛1 + 𝑛2 βˆ’ 2
Step 4:
Compute the P-value using a Student’s 𝑑 distribution with 𝑛1 + 𝑛2 βˆ’ 2 degrees
of freedom.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Hypothesis Tests When 𝜎1and 𝜎2 are Known
When the two population standard deviations, 𝜎1 and 𝜎2 , are known, we
can modify the test statistic presented here by replacing 𝑠1 and 𝑠2 with
𝜎1 and 𝜎2 , and using the standard normal distribution rather than the
Student’s 𝑑 distribution to find the P-value or critical values. In practice,
𝜎1 and 𝜎2 are rarely known, so this method is not often applicable.
𝑧=
π‘₯1 βˆ’ π‘₯2 βˆ’ πœ‡1 βˆ’ πœ‡2
Οƒ22
Οƒ12
+
𝑛1
𝑛2
The assumptions for this method are the same as for the method using
the Student’s 𝑑 distribution, with the additional assumption that the
population standard deviations are known.
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
You Should Know…
β€’
β€’
How to perform a hypothesis test for the difference between two
means using the P-value method
How to perform a hypothesis test for the difference between two
means using the critical value method
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.