Chapter 9: Inferences Based on Two Samples
Dr. Sharabati
Purdue University
April 10, 2014
Dr. Sharabati (Purdue University)
Inferences
Spring 2014
1 / 26
Tests for Two Different Population Distributions
The z Tests
Two-independent Sample t Test
Paired-Sample t Test
Motivation
We are often interested in comparing two populations (or groups)
based on a continuous measurement.
For instance, to evaluate the impact of light on the growth of plants, one
group of seedlings grows in dark conditions and a second group gets
the standard amount of light. The heights of the plants are compared
after a specified time period.
Each group has different individuals who may receive different
treatments.
Responses from each sample are independent of each other.
Goal: compare the population means of the two groups.
Notation:
                                  Group 1    Group 2
Population mean                   µ1         µ2
Population standard deviation     σ1         σ2
Sample size                       n1         n2
Sample mean                       x̄1         x̄2
Sample standard deviation         s1         s2
Assumptions:
Sample 1 is a random sample from a population with mean µ1 and
variance σ1².
Sample 2 is a random sample from a population with mean µ2 and
variance σ2².
Sample 1 and sample 2 are independent of one another.
z Test for Normal Populations with Known Variance
The z test concerns hypotheses about µ1 − µ2, the difference between two
population means. Both population distributions are normal, and the values
of σ1² and σ2² are known.
Hypotheses:
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 > ∆0
or
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 < ∆0
or
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 ≠ ∆0
The test statistic is:
$$ z = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$
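The z statistic and its p-value can be sketched in a few lines of Python. This is only an illustration: the function names and the summary statistics below are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 with known population SDs."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (xbar1 - xbar2 - delta0) / se


def z_p_value(z, alternative):
    """p-value from the standard normal: 'greater', 'less' or 'two-sided'."""
    phi = NormalDist().cdf
    if alternative == "greater":
        return 1 - phi(z)
    if alternative == "less":
        return phi(z)
    return 2 * (1 - phi(abs(z)))


# Hypothetical inputs, chosen only to exercise the formula.
z = two_sample_z(xbar1=52.0, xbar2=50.0, sigma1=3.0, sigma2=4.0, n1=36, n2=64)
p = z_p_value(z, "greater")
print(f"z = {z:.3f}, one-sided p-value = {p:.4f}")
```

The alternative hypothesis decides which tail of the standard normal distribution is used, matching the three pairs of hypotheses listed above.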
Large-Sample Tests
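This z test concerns hypotheses about µ1 − µ2, the difference between two
population means, when both sample sizes are large (n1 > 40 and n2 > 40).
The sample variances s1² and s2² take the place of the population variances.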
Hypotheses:
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 > ∆0
or
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 < ∆0
or
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 ≠ ∆0
The test statistic is:
$$ z = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
Confidence Interval/Bound When Sample Size is Large
Provided that n1 and n2 are both large, a confidence interval for
µ1 − µ2 with a confidence level of approximately 1 − α is:
$$ \bar{x}_1 - \bar{x}_2 \pm z^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, $$
where z* = zα/2; − gives the lower limit and + gives the upper limit
of the interval. An upper or lower confidence bound can also be
calculated by retaining the appropriate sign (+ or −) and using
z* = zα in place of zα/2.
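A minimal Python sketch of the large-sample interval follows. The function name and the input values are hypothetical; the critical value z* is taken from the standard normal quantile function in the standard library.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(xbar1, xbar2, s1, s2, n1, n2, conf_level=0.95):
    """Approximate two-sided CI for mu1 - mu2 when n1 and n2 are both large."""
    alpha = 1 - conf_level
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z_star * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Hypothetical summary statistics with n1 > 40 and n2 > 40.
lower, upper = large_sample_ci(52.0, 50.0, 3.0, 4.0, n1=45, n2=80)
print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

For a one-sided bound, the same margin would be computed with `inv_cdf(1 - alpha)` and only one sign kept.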
Two-independent Sample t Test
The two-sample t test concerns hypotheses about µ1 − µ2, the difference
between two population means. Both population distributions are normal,
and the values of σ1² and σ2² are unknown.
Hypotheses:
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 > ∆0
or
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 < ∆0
or
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 ≠ ∆0
Two-independent Sample t Test
The (unpooled) two-sample t test statistic is:
$$ t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
Assume that both samples were randomly selected from the
populations and both populations are normally distributed. The
test statistic has a t distribution with k degrees of freedom if H0 is
true, where
$$ k = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} $$
Round k down to the nearest integer.
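The unpooled statistic and its degrees of freedom can be computed directly from summary statistics. The sketch below assumes hypothetical inputs and a hypothetical function name; it only transcribes the two formulas above.

```python
from math import floor, sqrt


def unpooled_t(xbar1, xbar2, s1, s2, n1, n2, delta0=0.0):
    """Unpooled two-sample t statistic and degrees of freedom k (rounded down)."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    t = (xbar1 - xbar2 - delta0) / sqrt(v1 + v2)
    k = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, floor(k)


# Hypothetical summary statistics, not taken from the slides.
t, k = unpooled_t(xbar1=20.0, xbar2=18.0, s1=2.0, s2=3.0, n1=10, n2=15)
print(f"t = {t:.3f} with k = {k} degrees of freedom")
```

Rounding k down rather than to the nearest integer gives a slightly heavier-tailed reference distribution, which is the conservative choice.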
Two-independent Sample t Test
The (pooled) two-sample t test statistic is:
$$ t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{s_{pooled} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
where the pooled standard deviation is
$$ s_{pooled} = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} $$
Assume that both samples were randomly selected from the
populations and both populations are normally distributed with a
common population standard deviation. The test statistic has a t
distribution with k degrees of freedom if H0 is true, where
k = n1 + n2 − 2
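As an illustration of the pooled procedure, the following Python sketch computes the pooled standard deviation, the t statistic and the degrees of freedom. The function name and input values are hypothetical.

```python
from math import sqrt


def pooled_t(xbar1, xbar2, s1, s2, n1, n2, delta0=0.0):
    """Pooled two-sample t statistic, assuming a common population SD."""
    df = n1 + n2 - 2
    s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t = (xbar1 - xbar2 - delta0) / (s_pooled * sqrt(1 / n1 + 1 / n2))
    return t, df, s_pooled


# Hypothetical summary statistics, not taken from the slides.
t, df, s_pooled = pooled_t(xbar1=20.0, xbar2=18.0, s1=2.0, s2=3.0, n1=10, n2=15)
print(f"s_pooled = {s_pooled:.3f}, t = {t:.3f}, df = {df}")
```

The pooled standard deviation is a weighted combination of s1 and s2, so it always lies between the two sample standard deviations.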
Note: If the sample sizes n1 and n2 are both large (n1 > 40 and
n2 > 40), we no longer require that the samples came from a normal
distribution, because the CLT ensures that the sample means are
approximately normal. The test statistic can then be denoted as z,
and the rejection region and p-values can be computed using the
standard normal distribution. In this case, the unpooled procedure
should be used.
Like the pooled t confidence intervals, the pooled t test is not robust
to violations of the equal standard deviation assumption. We
therefore recommend the unpooled t test unless there is really
compelling evidence for doing otherwise.
Two-independent Sample t Confidence Intervals
The level 1 − α confidence interval for µ1 − µ2 is:
$$ \bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$
where t* = tα/2,k is the value for the t(k) density curve with area
1 − α between −t* and t*.
An upper confidence bound for µ1 − µ2 is:
$$ \bar{x}_1 - \bar{x}_2 + t_{\alpha,k} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$
A lower confidence bound for µ1 − µ2 is:
$$ \bar{x}_1 - \bar{x}_2 - t_{\alpha,k} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$
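The interval can be sketched in Python once a critical value is available. The Python standard library has no t quantile function, so this sketch assumes t* is read from a t table and passed in; the function name and the inputs are hypothetical.

```python
from math import sqrt


def two_sample_t_ci(xbar1, xbar2, s1, s2, n1, n2, t_star):
    """Two-sided CI for mu1 - mu2 using the unpooled standard error."""
    margin = t_star * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Hypothetical inputs. The degrees-of-freedom formula gives k = 22 for them,
# and the table value t_{0.025,22} = 2.074 is assumed for a 95% interval.
lower, upper = two_sample_t_ci(20.0, 18.0, 2.0, 3.0, 10, 15, t_star=2.074)
print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

For a one-sided bound, the table value tα,k replaces tα/2,k and only the relevant endpoint is reported.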
Two-independent Sample t Test when ∆0 = 0
1. Write the hypotheses in terms of the difference between means.
H0 : µ1 = µ2
Ha : µ1 > µ2
or
H0 : µ1 = µ2
Ha : µ1 < µ2
or
H0 : µ1 = µ2
Ha : µ1 ≠ µ2
2. Calculate the test statistic
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
3. Calculate the p-value
For Ha : µ1 < µ2, p-value = P(T < t),
For Ha : µ1 > µ2, p-value = P(T > t),
For Ha : µ1 ≠ µ2, p-value = 2P(T > |t|),
where T ∼ t(k).
4. State conclusions in terms of the problem: Choose a significance level
α and compare the p-value to the α level.
If p-value ≤ α, then reject H0 (significant results).
If p-value > α, then fail to reject H0 (nonsignificant results).
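Steps 3 and 4 can be sketched in Python. Because the standard library has no t distribution, this sketch approximates the tail area by integrating the t(k) density numerically with Simpson's rule; that numerical approach, the function names and the example values are all assumptions made for illustration, not part of the slides. In practice a statistics package or a t table would normally be used.

```python
from math import gamma, pi, sqrt


def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    c = gamma((k + 1) / 2) / (sqrt(k * pi) * gamma(k / 2))
    return c * (1 + x * x / k) ** (-(k + 1) / 2)


def t_upper_tail(t, k, steps=2000):
    """Approximate P(T > t) for T ~ t(k); steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    tail = 0.5 - total * h / 3  # P(T > |t|), using symmetry about 0
    return tail if t >= 0 else 1 - tail


def t_p_value(t, k, alternative):
    """p-value for 'greater', 'less' or 'two-sided' alternatives."""
    if alternative == "greater":
        return t_upper_tail(t, k)
    if alternative == "less":
        return 1 - t_upper_tail(t, k)
    return 2 * t_upper_tail(abs(t), k)


def decide(p_value, alpha=0.05):
    """Step 4: compare the p-value with the significance level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical test statistic and degrees of freedom.
p = t_p_value(2.228, 10, "two-sided")
print(f"p-value = {p:.4f}: {decide(p)}")
```

The call to `gamma` overflows for very large k; in that situation the standard normal distribution is the usual approximation.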
Exercise
A group of 15 college seniors are selected to participate in a manual
dexterity skill test against a group of 20 industrial workers. Skills are
assessed by scores obtained on a test taken by both groups. Conduct a
hypothesis test to determine whether the industrial workers had better
manual dexterity skills than the students at the 0.05 significance level.
Descriptive statistics are listed below. Also construct a 95% confidence
interval for this problem.
Group       n     x̄        s
Students    15    35.12    4.31
Workers     20    37.32    3.83
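One possible way to set up this exercise is sketched below in Python. It assumes the unpooled procedure, takes workers as group 1 and students as group 2, and uses the table values t0.05,28 = 1.701 and t0.025,28 = 2.048; these choices and the variable names are assumptions, not the instructor's worked solution.

```python
from math import floor, sqrt

# Summary statistics from the exercise table.
n_w, xbar_w, s_w = 20, 37.32, 3.83  # industrial workers
n_s, xbar_s, s_s = 15, 35.12, 4.31  # students

v_w, v_s = s_w**2 / n_w, s_s**2 / n_s
se = sqrt(v_w + v_s)

# H0: mu_w - mu_s = 0 versus Ha: mu_w - mu_s > 0 (workers score higher).
t = (xbar_w - xbar_s) / se
k = floor((v_w + v_s) ** 2 / (v_w**2 / (n_w - 1) + v_s**2 / (n_s - 1)))

# Critical values assumed from a t table for k = 28 degrees of freedom.
reject_h0 = t > 1.701
diff = xbar_w - xbar_s
ci_95 = (diff - 2.048 * se, diff + 2.048 * se)

print(f"t = {t:.3f}, k = {k}, reject H0 at 0.05: {reject_h0}")
print(f"95% CI for mu_w - mu_s: ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
```

Under these assumptions t is about 1.57, which is below the critical value 1.701, so H0 is not rejected at the 0.05 level.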
Matched Pairs t Test Procedures
Observations occur in pairs, such as:
identical twins
two observations on the same individual (two days, pre- and post-tests,
before and after measurements)
a matched pair design
Confidence intervals and hypothesis testing are based on the
difference obtained between the 2 measurements
Find the difference = post test - pre test (or before - after, etc.), in
the individual measurements.
Find the sample mean d̄ and sample standard deviation sD of these
differences.
Perform one-sample t procedures for these differences.
Confidence interval of the population mean difference:
$$ \bar{d} \pm t^* \frac{s_D}{\sqrt{n}} $$
Hypothesis testing (H0 : µdiff = ∆0; with ∆0 = 0, the null hypothesis
states that the population mean difference is zero):
$$ t = \frac{\bar{d} - \Delta_0}{s_D / \sqrt{n}} $$
Example
Researchers are interested in whether Vitamin C is lost when wheat soy
blend (CSB) is cooked as gruel. Samples of gruel were collected, and the
vitamin C content was measured (in mg per 100 grams of gruel) before
and after cooking. Here are the results:
Sample      Before    After    Before - After
1           73        20       53
2           79        27       52
3           86        29       57
4           88        36       52
5           78        17       61
Mean        80.8      25.8     55
St. Dev.    6.14      7.53     3.94
a. Set up an appropriate hypothesis test for the population mean
difference and carry it out for these data. State your conclusions in a
sentence.
b. Find a 90% confidence interval for the mean vitamin C content loss.
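The calculations for this example can be sketched in Python using the data in the table. The choice of ∆0 = 0 and the table value t0.05,4 = 2.132 for the 90% interval are assumptions; the conclusion sentence asked for in part a is left to the reader.

```python
from math import sqrt
from statistics import mean, stdev

before = [73, 79, 86, 88, 78]
after = [20, 27, 29, 36, 17]

# Differences are taken as before - after, matching the table.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (divides by n - 1)
se = s_d / sqrt(n)

# a. Test statistic for H0: mu_diff = 0, with n - 1 = 4 degrees of freedom.
t = (d_bar - 0) / se

# b. 90% confidence interval, assuming the table value t_{0.05,4} = 2.132.
t_star = 2.132
ci_90 = (d_bar - t_star * se, d_bar + t_star * se)

print(f"d_bar = {d_bar}, s_D = {s_d:.2f}, t = {t:.2f}")
print(f"90% CI for the mean loss: ({ci_90[0]:.2f}, {ci_90[1]:.2f})")
```

The computed mean and standard deviation of the differences agree with the last column of the table (55 and 3.94).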
Paired-Sample Test
The paired-sample t test concerns hypotheses about µd = µ1 − µ2 , the
mean difference between a pair of observations.
Hypotheses:
H0 : µd = ∆0
Ha : µd > ∆0
or
H0 : µd = ∆0
Ha : µd < ∆0
or
H0 : µd = ∆0
Ha : µd ≠ ∆0
Find the difference for each of the n pairs, that is
di = xi1 − xi2 (i = 1, 2, . . . , n).
Find the sample mean d̄ and sample standard deviation sD of these
differences.
Perform a one-sample t test for these differences. That is,
the paired-sample t test statistic is:
$$ t = \frac{\bar{d} - \Delta_0}{s_D / \sqrt{n}} $$
Assume that the differences d1 , d2 , . . . , dn were selected randomly from
a normal population. The test statistic has a t distribution with n − 1
degrees of freedom if H0 is true.
Note: If the sample size is large (n > 40), we no longer require that
the di's came from a normal distribution, because the CLT ensures
that d̄ is approximately normal. The test statistic can then be
denoted as z, and the rejection region and the p-value can be
computed using the standard normal distribution.
Exercise
In an effort to determine whether sensitivity training for nurses would
improve the quality of nursing provided at an area hospital, the following
study was conducted. Eight different nurses were selected and their nursing
skills were given a score from 1-10. After this initial screening, a training
program was administered, and then the same nurses were rated again.
Below is a table of their pre- and post-training scores, along with the
difference in the score. Conduct a test to determine whether the training
could on average improve the quality of nursing provided in the population.
Individual    Pre-training score    Post-training score
1             2.56                  4.54
2             3.22                  5.33
3             3.45                  4.32
4             5.55                  7.45
5             5.63                  7.00
6             7.89                  9.80
7             7.66                  5.33
8             6.20                  6.80
a. What are the hypotheses?
b. What is the test statistic?
c. What is the p-value or rejection region?
d. What is your conclusion in terms of the story?
e. What is the 95% confidence interval of the population mean difference
in nursing scores?
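A possible setup for parts a through e is sketched below in Python. It assumes differences are taken as post − pre, the alternative Ha: µd > 0 (training improves scores), a 0.05 significance level, and the table values t0.05,7 = 1.895 and t0.025,7 = 2.365. None of these choices is stated in the exercise, so treat the sketch as one reasonable reading rather than the intended answer key.

```python
from math import sqrt
from statistics import mean, stdev

pre = [2.56, 3.22, 3.45, 5.55, 5.63, 7.89, 7.66, 6.20]
post = [4.54, 5.33, 4.32, 7.45, 7.00, 9.80, 5.33, 6.80]

# Assumed direction: difference = post - pre, so improvement is positive.
diffs = [b - a for a, b in zip(pre, post)]
n = len(diffs)
df = n - 1
d_bar = mean(diffs)
se = stdev(diffs) / sqrt(n)

# b. Test statistic for H0: mu_d = 0 versus Ha: mu_d > 0.
t = d_bar / se

# c. Rejection region at alpha = 0.05 (assumed): t > t_{0.05,7} = 1.895.
reject_h0 = t > 1.895

# e. 95% two-sided interval, assuming the table value t_{0.025,7} = 2.365.
ci_95 = (d_bar - 2.365 * se, d_bar + 2.365 * se)

print(f"d_bar = {d_bar:.3f}, t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
print(f"95% CI for mu_d: ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
```

Note that the one-sided test and the two-sided interval use different critical values, so they need not lead to the same conclusion about µd = 0.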
Summary
Matched pairs vs. two-independent sample comparison of means?
Matched pairs if all units are measured twice and/or receive both
treatments over time. Before vs. after is the most common example.
Two-independent sample comparison of means if you have two separate
groups, but each unit is only measured once. Men vs. women is the
most common example.
Confidence Interval and Hypothesis Tests
Suppose that the 95% two-tailed confidence interval for a population
mean µ based on a particular sample is (20.5, 27.2). If the sample is
used to test H0 : µ = µ0 against Ha : µ 6= µ0 at the 0.05 significance
level, the test will fail to reject H0 if µ0 falls in the 95% confidence
interval and will reject H0 if µ0 is not in the 95% confidence interval.
In general, a 100(1-α)% confidence interval can be used
equivalently with a two-tailed hypothesis test at the α level.
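The equivalence between the interval and the two-tailed test can be expressed as a short Python sketch. The function name and the two values of µ0 are hypothetical; the interval (20.5, 27.2) is the one from the summary.

```python
def two_tailed_decision(ci, mu0):
    """Decision of the level-alpha two-tailed test implied by a 100(1 - alpha)% CI."""
    lower, upper = ci
    return "fail to reject H0" if lower <= mu0 <= upper else "reject H0"


ci_95 = (20.5, 27.2)

# Hypothetical null values: one inside the interval, one outside.
print(two_tailed_decision(ci_95, 25.0))
print(two_tailed_decision(ci_95, 30.0))
```

The correspondence holds only when the confidence level and the significance level match (95% with α = 0.05 here) and the test is two-tailed.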
Summary of t Tests for Two Different Population Distributions
Hypotheses:
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 > ∆0
or
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 < ∆0
or
H0 : µ1 − µ2 = ∆0
Ha : µ1 − µ2 ≠ ∆0
Two-independent sample t test
The (unpooled) two-sample t test statistic is:
$$ t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
The (pooled) two-sample t test statistic is:
$$ t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{s_{pooled} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
The paired-sample t test statistic is:
$$ t = \frac{\bar{d} - \Delta_0}{s_D / \sqrt{n}} $$