Chapter 10 – Two-Sample Inference
 Independent Samples and Dependent Samples
o Two samples are independent when the subjects selected for the first
sample do not determine the subjects in the second sample. Two samples
are dependent when the subjects in the first sample determine the subjects
in the second sample. The data from dependent samples are called matched-pair or paired samples.
 Confidence Interval for Population Mean Difference
(Dependent Samples)
o Suppose we have a set of matched-pair data obtained by taking dependent
random samples of two populations and finding the differences to produce
a random sample of the differences between the populations. A
100(1 – α)% confidence interval for μd, the population mean of the differences, is
given by
lower bound = $\bar{x}_d - t_{\alpha/2}\left(s_d/\sqrt{n}\right)$
upper bound = $\bar{x}_d + t_{\alpha/2}\left(s_d/\sqrt{n}\right)$
where $\bar{x}_d$ and $s_d$ represent the sample mean and sample standard deviation
of the differences, respectively, of the set of n paired differences, d1, d2,
d3,…, dn, and where $t_{\alpha/2}$ is based on n – 1 degrees of freedom. This t interval
applies whenever either of the following conditions is met:
 Case 1: the population of differences is normal, or
 Case 2: the sample size of differences is large (n ≥ 30).
The 100(1 – α)% confidence interval for μd may also be expressed in the form:
$$\bar{x}_d \pm t_{\alpha/2}\left(s_d/\sqrt{n}\right)$$
Example 10.1
Student     After (sample 1)    Before (sample 2)
Ashley      66                  50
Brittany    68                  55
Chris       74                  60
Dave        88                  70
Emily       89                  75
Fran        91                  80
Greg        100                 88
Q1. Construct a 95% confidence interval for the mean of the differences in the statistics quiz scores. Is there evidence that
the Math Center tutoring leads to a mean improvement in the quiz scores?
A1. We ignore the original raw data and concentrate only on the set of sample differences: {16, 13, 14, 18, 14, 11, 12}
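The sample mean of the differences is $\bar{x}_d = 98/7 = 14$, and the sample standard deviation is
$$s_d = \sqrt{\frac{\sum (x - \bar{x}_d)^2}{n - 1}} = \sqrt{\frac{(16-14)^2 + (13-14)^2 + (14-14)^2 + (18-14)^2 + (14-14)^2 + (11-14)^2 + (12-14)^2}{7 - 1}} \approx 2.3805$$
For 95% confidence with n – 1 = 6 degrees of freedom, $t_{\alpha/2} = 2.447$.
lower bound = $\bar{x}_d - t_{\alpha/2}\left(s_d/\sqrt{n}\right)$ = 14 – (2.447)(2.3805/√7) ≈ 11.7983
upper bound = $\bar{x}_d + t_{\alpha/2}\left(s_d/\sqrt{n}\right)$ = 14 + (2.447)(2.3805/√7) ≈ 16.2017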
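We are 95% confident that the population mean of the differences between quiz scores before and after visiting the
Math Center lies between 11.7983 points and 16.2017 points. Because the entire interval lies above 0, there is
evidence that the Math Center tutoring leads to a mean improvement in the quiz scores.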
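The interval in Example 10.1 can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `paired_t_interval` is an illustrative choice, and the critical value is passed in by hand because it comes from the t table rather than from the code.

```python
import math
import statistics


def paired_t_interval(after, before, t_crit):
    """Confidence interval for the mean of the paired differences (after - before).

    t_crit is the table value t_{alpha/2} for n - 1 degrees of freedom,
    supplied by the caller (an assumption of this sketch).
    """
    if len(after) != len(before):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation, divisor n - 1
    margin = t_crit * sd_d / math.sqrt(n)
    return mean_d - margin, mean_d + margin


# Quiz scores from Example 10.1
after = [66, 68, 74, 88, 89, 91, 100]
before = [50, 55, 60, 70, 75, 80, 88]
lower, upper = paired_t_interval(after, before, t_crit=2.447)
print(round(lower, 4), round(upper, 4))  # 11.7983 16.2017
```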
 Paired Sample t Test for the Population Mean of the Difference μd : p-Value
Method
o Suppose we have a set of matched-pair data obtained by taking dependent
random samples of two populations and finding the differences to produce a
random sample of the differences between the populations. We can use the t
test whenever either of the following conditions is met:
 Case 1: the population of differences is normal, or
 Case 2: the sample size of differences is large (n ≥ 30).
Step 1 State the hypotheses and the rejection rule. Use one of the hypothesis test
forms from Table 10.1.
Null hypothesis    Alternative hypothesis    Type of test
H0 : μd ≤ 0        Ha : μd > 0               Right-tailed test
H0 : μd ≥ 0        Ha : μd < 0               Left-tailed test
H0 : μd = 0        Ha : μd ≠ 0               Two-tailed test
Table 10.1 – Forms of the hypothesis test
Step 2 Find tdata.
$$t_{\text{data}} = \frac{\bar{x}_d}{s_d/\sqrt{n}}$$
Step 3 Find the p-value.
Type of hypothesis test                            p-value is the tail area associated with tdata
Right-tailed test: H0: μd ≤ 0 versus Ha: μd > 0    p-value = P(t > tdata), the area to the right of tdata
Left-tailed test: H0: μd ≥ 0 versus Ha: μd < 0     p-value = P(t < tdata), the area to the left of tdata
Two-tailed test: H0: μd = 0 versus Ha: μd ≠ 0      p-value = P(t > |tdata|) + P(t < –|tdata|) = 2 * P(t > |tdata|), the sum of the two tail areas
Step 4 State the conclusion and interpretation. Compare the p-value with α.
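Steps 2 and 3 can be sketched in code. The version below is an illustration using only the Python standard library: it integrates the t density numerically with Simpson's rule instead of reading a table. The function names and the `tail` labels are assumptions of this sketch, and the input reuses the differences from Example 10.1 purely for illustration (it is not the data of Example 10.2).

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t): Simpson's rule on [0, |t|] plus the symmetry of the density."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area between 0 and |t|
    tail = max(0.0, 0.5 - area)   # area beyond |t|
    return tail if t >= 0 else 1.0 - tail


def t_data(mean_d, sd_d, n):
    """Test statistic for H0 about mu_d = 0: mean_d / (sd_d / sqrt(n))."""
    return mean_d / (sd_d / math.sqrt(n))


def p_value(t, df, tail):
    """p-value for a 'right', 'left', or 'two' tailed test."""
    if tail == "right":
        return upper_tail(t, df)
    if tail == "left":
        return 1.0 - upper_tail(t, df)
    if tail == "two":
        return 2.0 * upper_tail(abs(t), df)
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Illustrative input: summary statistics of the Example 10.1 differences
t_stat = t_data(mean_d=14, sd_d=2.3805, n=7)
p = p_value(t_stat, df=6, tail="right")
print(round(t_stat, 2), p < 0.05)
```

A right-tailed test is used here because the question of interest is whether the mean difference is greater than 0; the p-value is then compared with the chosen α as in Step 4.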
Example 10.2
Q1. Paired-sample t test for μd: The p-value method
A1. pg.549
 Paired Sample t Test for the Population Mean of the Difference μd : Critical Value
Method
o Suppose we have a set of matched-pair data obtained by taking dependent
random samples of two populations and finding the differences to produce a
random sample of the differences between the populations. You can use the t
test whenever either of the following conditions is met:
 Case 1: the population of differences is normal, or
 Case 2: the sample size of differences is large (n ≥ 30).
Step 1 State the hypotheses. Use one of the hypothesis test forms from Table 10.1.
State clearly the meaning of μd.
Step 2 Find tcrit, and state the rejection rule. To find tcrit, use the t table and degrees
of freedom n – 1. To find the rejection rule, use Table 10.2.
Form of test                                 Rejection rules: “Reject H0 if…”
Right-tailed    H0: μd ≤ 0 vs. Ha: μd > 0    tdata > tcrit
Left-tailed     H0: μd ≥ 0 vs. Ha: μd < 0    tdata < –tcrit
Two-tailed      H0: μd = 0 vs. Ha: μd ≠ 0    tdata > tcrit or tdata < –tcrit
Table 10.2 – Rejection rules for the t test for μd
Step 3 Find tdata.
$$t_{\text{data}} = \frac{\bar{x}_d}{s_d/\sqrt{n}}$$
Step 4 State the conclusion and interpretation. Compare tdata with tcrit.
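The rejection rules of Table 10.2 translate directly into a small decision function. This is a sketch only: the function name and the `tail` labels are illustrative, and tcrit is assumed to be the positive value read from the t table for n – 1 degrees of freedom.

```python
def reject_h0(t_data, t_crit, tail):
    """Apply the Table 10.2 rejection rule; t_crit is the positive table value."""
    if t_crit <= 0:
        raise ValueError("t_crit must be positive")
    if tail == "right":
        return t_data > t_crit
    if tail == "left":
        return t_data < -t_crit
    if tail == "two":
        return t_data > t_crit or t_data < -t_crit
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Illustrative values: t_data from the Example 10.1 differences (about 15.56)
# and the one-tailed table value for alpha = 0.05 with 6 degrees of freedom.
print(reject_h0(15.56, 1.943, "right"))  # True
```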
Example 10.3
Q1. Paired-sample t test for μd :The critical value method
A1. pg.551
 Sampling Distribution of x̄1 – x̄2
o When random samples are drawn independently from two populations with
population means μ1 and μ2, and either
 Case 1: the two populations are normally distributed, or
 Case 2: the two sample sizes are large (at least 30),
then the quantity
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
approximately follows a t distribution with degrees of freedom equal to the
smaller of n1 – 1 and n2 – 1, where x̄1 and s1 represent the mean and
standard deviation of the sample taken from population 1, and x̄2 and s2
represent the mean and standard deviation of the sample taken from
population 2.
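As a rough illustration of this quantity, the sketch below computes t and the conservative degrees of freedom from summary statistics. The function name and the `hypothesized_diff` parameter (the value assumed for μ1 – μ2) are choices made for this sketch; the numbers are the body-temperature summary statistics of Example 10.4, with μ1 – μ2 = 0 taken as a hypothetical value.

```python
import math


def two_sample_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Return (t, df) for independent samples, with df = min(n1 - 1, n2 - 1)."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / standard_error
    df = min(n1 - 1, n2 - 1)
    return t, df


# Summary statistics from Example 10.4; hypothesized_diff = 0 is an assumption
t, df = two_sample_t(98.394, 0.743, 65, 98.105, 0.699, 65)
print(round(t, 2), df)
```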
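 Standard Error of x̄1 – x̄2
o The standard error $s_{\bar{x}_1 - \bar{x}_2}$ of the statistic x̄1 – x̄2 is
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
It measures the size of the typical error in using x̄1 – x̄2 to estimate μ1 – μ2.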
 Confidence Interval for μ1 – μ2
o For two independent random samples taken from two populations with
population means μ1 and μ2, a 100(1 – α)% confidence interval for μ1 – μ2
is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
The t interval applies whenever either of the following conditions is met:
 Case 1: both populations are normally distributed, or
 Case 2: both sample sizes are large.
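 Margin of Error E
o The margin of error for a 100(1 – α)% confidence interval for μ1 – μ2 is
given by
$$E = t_{\alpha/2} \cdot (\text{standard error}) = t_{\alpha/2} \cdot s_{\bar{x}_1 - \bar{x}_2} = t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$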
Example 10.4
Gender                Sample size    Sample mean body temperature    Sample standard deviation    Population mean body temperature
Females (sample 1)    n1 = 65        x̄1 = 98.394                     s1 = 0.743                   μ1 = ?
Males (sample 2)      n2 = 65        x̄2 = 98.105                     s2 = 0.699                   μ2 = ?
Summary statistics for female versus male body temperatures in °F
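Q1. Calculate the standard error $s_{\bar{x}_1 - \bar{x}_2}$ for estimating the difference in population mean body
temperature between women and men.
A1.
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{0.743^2}{65} + \frac{0.699^2}{65}} \approx 0.1265$$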
Q2. Find a 95% confidence interval for the difference in women’s and men’s population mean
body temperatures.
A2. Both sample sizes are large, so the sampling distribution of x̄1 – x̄2 follows a t distribution. We know the
standard error $s_{\bar{x}_1 - \bar{x}_2} \approx 0.1265$, but we need to find $t_{\alpha/2}$ to use the formula for E. The required degrees of
freedom is the smaller of n1 – 1 and n2 – 1, which are both equal to 65 – 1 = 64, so the degrees of freedom
for $t_{\alpha/2}$ is also 64. This df = 64 is not listed in the t table, so we choose the next lowest df listed, 60. For
95% confidence, then, $t_{\alpha/2}$ = 2.00, and the margin of error is E = $t_{\alpha/2} \cdot s_{\bar{x}_1 - \bar{x}_2}$ ≈ (2.00)(0.1265) = 0.253.
The 95% confidence interval is then
$$(\bar{x}_1 - \bar{x}_2) \pm E = (98.394 - 98.105) \pm 0.253 = 0.289 \pm 0.253 = (0.036, 0.542)$$
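The arithmetic of Example 10.4 can be checked with a few lines of Python. This is a sketch under the same assumptions as the worked answer: the critical value 2.00 is taken from the t table (df = 60) and passed in by hand, and the function name `two_mean_interval` is illustrative.

```python
import math


def two_mean_interval(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Return (standard error, margin of error, (lower, upper)) for mu1 - mu2."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    margin = t_crit * standard_error
    diff = mean1 - mean2
    return standard_error, margin, (diff - margin, diff + margin)


# Females are sample 1 and males are sample 2, as in Example 10.4
se, E, (lower, upper) = two_mean_interval(98.394, 0.743, 65, 98.105, 0.699, 65, t_crit=2.00)
print(round(se, 4), round(E, 3), round(lower, 3), round(upper, 3))
# 0.1265 0.253 0.036 0.542
```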
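 Sampling Distribution of p̂1 – p̂2
o When two random samples are drawn independently from two populations,
then the quantity
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{s_{\hat{p}_1 - \hat{p}_2}} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}}$$
has an approximately standard normal distribution when the following
conditions are satisfied:
x1 ≥ 5, (n1 – x1) ≥ 5, x2 ≥ 5, (n2 – x2) ≥ 5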
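The four conditions are easy to check mechanically before using the normal approximation. The sketch below does so and also evaluates the Z quantity for a caller-supplied value of p1 – p2; the function names and the `hypothesized_diff` parameter are assumptions of this illustration, and the counts are those of Example 10.5.

```python
import math


def normal_conditions_met(x1, n1, x2, n2, minimum=5):
    """True when x1, n1 - x1, x2, and n2 - x2 are all at least `minimum`."""
    return all(count >= minimum for count in (x1, n1 - x1, x2, n2 - x2))


def two_proportion_z(x1, n1, x2, n2, hypothesized_diff=0.0):
    """Z = ((p1_hat - p2_hat) - (p1 - p2)) / standard error, as defined above."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    standard_error = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    return ((p1_hat - p2_hat) - hypothesized_diff) / standard_error


# Counts from Example 10.5; p1 - p2 = 0 is a hypothetical value for illustration
print(normal_conditions_met(195, 487, 93, 487))  # True
print(round(two_proportion_z(195, 487, 93, 487), 1))
```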
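 Standard Error of p̂1 – p̂2
o The standard error $s_{\hat{p}_1 - \hat{p}_2}$ of the statistic p̂1 – p̂2 is
$$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
where $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$. The standard error $s_{\hat{p}_1 - \hat{p}_2}$ measures the
size of the typical error in using p̂1 – p̂2 to estimate p1 – p2.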
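 Confidence Interval for p1 – p2
o For two independent random samples taken from two populations with
population proportions p1 and p2, a 100(1 – α)% confidence interval for p1 – p2
is given by
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$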
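 Margin of Error E
o The margin of error for a 100(1 – α)% confidence interval for p1 – p2 is
given by
$$E = Z_{\alpha/2} \cdot (\text{standard error}) = Z_{\alpha/2} \cdot s_{\hat{p}_1 - \hat{p}_2} = Z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$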
Example 10.5
                           Boys                               Girls
Number responding “Yes”    x1 = 195                           x2 = 93
Sample size                n1 = 487                           n2 = 487
Sample proportion          p̂1 = x1/n1 = 195/487 ≈ 0.4004      p̂2 = x2/n2 = 93/487 ≈ 0.1910
Proportions of teenage boys and girls who post their last names in online profiles
Q1. Find the point estimate p̂1 – p̂2 for the difference in population proportions p1 – p2.
$$\hat{p}_1 - \hat{p}_2 \approx 0.4004 - 0.1910 = 0.2094$$
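Q2. Calculate the standard error $s_{\hat{p}_1 - \hat{p}_2}$.
$\hat{q}_1 = 1 - \hat{p}_1 = 1 - 0.4004 = 0.5996$ and $\hat{q}_2 = 1 - \hat{p}_2 = 1 - 0.1910 = 0.8090$.
$$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = \sqrt{\frac{(0.4004)(0.5996)}{487} + \frac{(0.1910)(0.8090)}{487}} \approx 0.0285$$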
Q3. For a 95% confidence level, calculate the margin of error.
$$E = Z_{\alpha/2} \cdot s_{\hat{p}_1 - \hat{p}_2} = 1.96(0.0285) \approx 0.0559$$
Q4. Construct and interpret a 95% confidence interval for the difference in population
proportions of girls and boys whose last name is posted to their online profile.
$$(\hat{p}_1 - \hat{p}_2) \pm E = (0.4004 - 0.1910) \pm 0.0559 = 0.2094 \pm 0.0559 = (0.1535, 0.2653)$$
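The four steps of Example 10.5 can be combined into one function. This is a minimal sketch using the standard library's `statistics.NormalDist` to obtain $Z_{\alpha/2}$; the function name and the `confidence` parameter are illustrative choices rather than anything defined in the notes.

```python
import math
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (point estimate, standard error, margin, (lower, upper)) for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    standard_error = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    margin = z * standard_error
    diff = p1_hat - p2_hat
    return diff, standard_error, margin, (diff - margin, diff + margin)


# Boys are sample 1 and girls are sample 2, as in Example 10.5
diff, se, E, (lower, upper) = two_proportion_interval(195, 487, 93, 487)
print(round(diff, 4), round(se, 4), round(E, 4))
print(round(lower, 3), round(upper, 3))
```

Because the code carries unrounded intermediate values, its endpoints can differ from the hand calculation in the fourth decimal place; the worked answer above rounds each step to four decimals before continuing.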