Chapter 5
Hypothesis Testing
5.1 - Introduction
Definition 5.1.1 Hypothesis testing is a formal
approach for determining if data from a sample
support a claim about a population.
1. State the null and alternative hypotheses
2. Calculate the test statistic
3. Find the critical value (or calculate the P-value)
4. State the technical conclusion
5. State the final conclusion
The Process
Claim: More than half the students at a
particular university are from out of state
Step 1: State the null and alternative hypotheses
– Define the parameter
p = The proportion of all students at the
university who are from out of state
– State the hypotheses
H0: p = 0.50 H1: p > 0.50
The Process
Step 2: Calculate the test statistic
– For a claim about a single proportion
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
– $\hat{p}$ = sample proportion
– $n$ = sample size
– $p_0$ = number in H0
Suppose $\hat{p} = 0.621$ and $n = 95$
$$z = \frac{0.621 - 0.5}{\sqrt{0.5(1 - 0.5)/95}} = 2.36$$
The Process
Step 3: Find the critical value
– At the 95% confidence level (α = 0.05), the critical value is
$z_\alpha = z_{0.05} = 1.645$, which gives the critical region [1.645, ∞)
The Process
Step 4: State the technical conclusion
– One of two statements:
Reject H0 or Do not reject H0
– Reject H0 if the test statistic falls into the critical
region and do not reject H0 otherwise
– z = 2.36, critical region: [1.645, ∞)
• Reject H0
The Process
Step 5: State the final conclusion
– The data support the claim
P-value Method
Criteria
1. Reject H0 if P-value ≤ α
2. Do not reject H0 if P-value > α
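The P-value criteria can be sketched in code for the out-of-state example (p̂ = 0.621, n = 95, p0 = 0.5). This is a minimal illustration using only the Python standard library; the helper name `right_tail_p_value` and the choice α = 0.05 are assumptions made for this sketch, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist

def right_tail_p_value(z: float) -> float:
    """Area to the right of z under the standard normal density curve."""
    return 1.0 - NormalDist().cdf(z)

# Values from the out-of-state example
p_hat, n, p0 = 0.621, 95, 0.5
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

alpha = 0.05  # assumed significance level (95% confidence)
p_value = right_tail_p_value(z)

# Criteria: reject H0 if P-value <= alpha, otherwise do not reject H0
decision = "Reject H0" if p_value <= alpha else "Do not reject H0"
print(f"z = {z:.2f}, P-value = {p_value:.4f}: {decision}")
```

Both methods should agree here: z falls in the critical region and the P-value is below α.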
5.2 – Testing Claims about a Proportion
1-Proportion Z-Test
Purpose: To test a claim about a single population
proportion where the null hypothesis is of the form
H0: p = p0. Let
– x be the number of “successes” in a sample of size
n and
– $\hat{p} = x/n$ be the sample proportion.
1-Proportion Z-Test
The test statistic is
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
The critical value is a z-score and the P-value is an area
under the standard normal density curve.
1-Proportion Z-Test
Requirements
1. The sample is random
2. The conditions for a binomial distribution must
be met (at least approximately)
3. The conditions 𝑛𝑝0 ≥ 5 and 𝑛(1 − 𝑝0 ) ≥ 5 are
both met
Example 5.2.1
A student claims that less than 25% of plain
M&M candies are red. A random sample of 195
candies contains 37 red candies. Use this data to
test the claim at the 0.05 significance level.
1. The parameter about which the claim is made is
p = The proportion of all M&M candies that are red
Example 5.2.1
2. The claim in mathematical notation is p < 0.25
H0: p = 0.25 H1: p < 0.25
3. The sample proportion is $\hat{p} = 37/195 \approx 0.190$, so the test statistic is
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.190 - 0.25}{\sqrt{0.25(1 - 0.25)/195}} = -1.93$$
4. The critical value is $-z_{0.05} = -1.645$
– The P-value is the area to the left of z = −1.93, which is 0.0268.
Example 5.2.1
5. Critical region: (−∞, −1.645]
– z lies in this region
– P-value < 𝛼
– Technical conclusion: Reject H0
6. Final conclusion: The data support the claim
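As a rough check of Example 5.2.1, the sketch below computes the 1-proportion z statistic and a left-tailed P-value. The function name `one_proportion_z` is hypothetical. The code uses the unrounded sample proportion 37/195, so it gives z ≈ −1.94 rather than the −1.93 obtained above from the rounded value 0.190; the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(x: int, n: int, p0: float) -> float:
    """Test statistic z for H0: p = p0, given x successes in a sample of size n."""
    # Requirement 3: n*p0 >= 5 and n*(1 - p0) >= 5
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("requirement n*p0 >= 5 and n*(1 - p0) >= 5 is not met")
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Example 5.2.1: 37 red candies out of 195, claim p < 0.25
z = one_proportion_z(x=37, n=195, p0=0.25)
p_value = NormalDist().cdf(z)   # left-tailed test: area to the left of z
reject_h0 = p_value <= 0.05     # significance level from the example
```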
Types of Errors
5.3 – Testing Claims about a Mean
T-Test for a Claim About a Single Population Mean
Purpose: To test a claim about the mean of a single
population where the null hypothesis is of the form
H0: $\mu = \mu_0$, and a sample of size $n$ has a mean of $\bar{x}$ and
standard deviation $s$. The test statistic is
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
The critical value is a t-score with n − 1 degrees of
freedom and the P-value is an area under the
corresponding Student-t density curve.
T-Test
Requirements
1. The sample is random
2. Either the population is normally distributed or
n > 30
Example 5.3.3
A manufacturer of cheese claims that the mean weight
of all its 12 oz packages of shredded cheddar is greater
than 12 oz. They collect a random sample of n = 36
packages, weigh each, and calculate a sample mean of
$\bar{x} = 12.05$ and a sample standard deviation of s = 0.15.
Use this data to test the claim at the 0.05 significance
level.
1. The parameter about which the claim is made is
μ = The mean weight of all 12 oz packages
Example 5.3.3
The claim in mathematical notation is µ > 12
H0: µ = 12 H1: µ > 12
2. The test statistic is
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{12.05 - 12}{0.15/\sqrt{36}} = 2$$
3. The critical value is 𝑡0.05 (35) = 1.690
– The P-value is the area to the right of t = 2
– By software: P-value = 0.027
Example 5.3.3
4. Critical region: [1.690, ∞)
– t lies in this region
– P-value < 𝛼
– Technical conclusion: Reject H0
5. Final conclusion: The data support the claim
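The arithmetic in Example 5.3.3 can be reproduced with a short sketch. The function name `one_sample_t` is hypothetical, and the critical value 1.690 is copied from the example rather than computed, since the standard library has no Student-t distribution.

```python
from math import sqrt

def one_sample_t(x_bar: float, mu0: float, s: float, n: int) -> float:
    """Test statistic t = (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n))

# Example 5.3.3: n = 36 packages, sample mean 12.05, s = 0.15, H0: mu = 12
t = one_sample_t(x_bar=12.05, mu0=12, s=0.15, n=36)

critical_value = 1.690          # t_0.05(35), taken from the example
reject_h0 = t >= critical_value  # right-tailed critical region [1.690, inf)
```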
5.4 – Comparing Two Proportions
2-Proportion Z-Test
Purpose: To test a claim comparing proportions from
two independent populations where the null hypothesis
is of the form H0: $p_1 = p_2$. The test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}$$
where
– $n_1, n_2$ = sample sizes
– $x_1, x_2$ = numbers of “successes”
– $\hat{p}_1, \hat{p}_2$ = sample proportions
– $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ = pooled sample proportion
2-Proportion Z-Test
Requirements
1. Both samples are random and independent
2. In both samples, the conditions for a binomial
distribution are satisfied
3. In both samples, there are at least 5 successes and
5 failures
Example 5.4.1
In a survey of voters in the 2008 Texas Democratic
primary, 54% of the 1167 females voted for
Hillary Clinton while 47% of the 881 males
voted for Clinton (data from
www.CNNPolitics.com, March 5, 2008). Use
this data to test the claim that of the voters in the
2008 Texas Democratic primary, the proportion of
females who voted for Clinton is higher than the
proportion of males who voted for Clinton.
Example 5.4.1
1. The parameters about which the claim is made are
𝑝1 = The proportion of all females who voted for Clinton
𝑝2 = The proportion of all males who voted for Clinton
The claim is 𝑝1 > 𝑝2
H0: 𝑝1 = 𝑝2 H1: 𝑝1 > 𝑝2
2. Test statistic:
$$x_1 = 0.54(1167) \approx 630, \quad x_2 = 0.47(881) \approx 414$$
$$\hat{p} = \frac{630 + 414}{1167 + 881} = 0.510$$
$$z = \frac{0.54 - 0.47}{\sqrt{0.510(1 - 0.510)(1/1167 + 1/881)}} = 3.14$$
3. Critical value: 𝑧0.05 = 1.645
– P-value = area to the right of 𝑧 = 3.14 which is
0.0008
4. Critical region: [1.645, ∞)
– Technical conclusion: Reject H0
5. Final conclusion: The data support the claim
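A sketch of the 2-proportion z-test applied to Example 5.4.1 follows. The function name `two_proportion_z` is hypothetical. Because the code recomputes the sample proportions from the rounded counts 630 and 414, it gives z ≈ 3.13 instead of the 3.14 obtained from the reported percentages; the difference is rounding only.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Test statistic z for H0: p1 = p2 using the pooled sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error

# Example 5.4.1: 630 of 1167 females and 414 of 881 males voted for Clinton
z = two_proportion_z(630, 1167, 414, 881)
p_value = 1 - NormalDist().cdf(z)  # right-tailed test
```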
5.5 – Comparing Two Variances
F-Test
Purpose: To test a claim comparing the variances of
two independent populations where the null hypothesis
is of the form H0: $\sigma_1^2 = \sigma_2^2$. The test statistic is
$$f = \frac{s_1^2}{s_2^2}$$
where
– $s_1^2$ is the larger of the two sample variances
– $n_1$ and $n_2$ are the sample sizes
F-Test
The critical value is an F-value with $n = n_1 - 1$ (numerator) and
$d = n_2 - 1$ (denominator) degrees of freedom
Requirements
1. Both samples are random and independent
2. Both populations are normally distributed (strict
requirement)
Example 5.5.1
Among the $n_1 = 10$ subjects who followed
diet A, the mean weight loss was $\bar{x}_1 = 4.5$ lb
with a standard deviation of $s_1 = 6.5$ lb. Among
the $n_2 = 10$ subjects who followed diet B, the
mean weight loss was $\bar{x}_2 = 3.2$ lb with a
standard deviation of $s_2 = 4.5$ lb. Test the claim
that the two populations have the same variance.
Example 5.5.1
1. The parameters about which the claim is made are
$\sigma_1^2$ = The variance of the weight loss for all those on diet A
$\sigma_2^2$ = The variance of the weight loss for all those on diet B
The claim is $\sigma_1^2 = \sigma_2^2$
H0: $\sigma_1^2 = \sigma_2^2$ H1: $\sigma_1^2 \ne \sigma_2^2$
2. Test statistic:
$$f = \frac{6.5^2}{4.5^2} = 2.09$$
Example 5.5.1
3. Critical value: $f_{0.05/2}(9, 9) = 4.03$
– P-value = twice the area to the right of f = 2.09 (two-tailed test)
– Using software: P-value = 0.288
4. Critical region: [4.03, ∞)
– Technical conclusion: Do not reject H0
5. Final conclusion: There is not sufficient
evidence to reject the claim
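The F statistic in Example 5.5.1 can be checked with the sketch below. The function name `f_statistic` is hypothetical; it puts the larger sample variance in the numerator, as the test requires. The critical value 4.03 is copied from the example, not computed.

```python
def f_statistic(s_a: float, s_b: float) -> float:
    """Ratio of sample variances with the larger variance in the numerator."""
    var_a, var_b = s_a ** 2, s_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)

# Example 5.5.1: s1 = 6.5 lb (diet A), s2 = 4.5 lb (diet B)
f = f_statistic(6.5, 4.5)

critical_value = 4.03            # f_{0.05/2}(9, 9), taken from the example
reject_h0 = f >= critical_value  # critical region [4.03, inf)
```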
5.6 – Comparing Two Means
2-Sample T-Tests
Purpose: To test a claim regarding the means of two
independent populations where the null hypothesis is of
the form H0: 𝜇1 − 𝜇2 = 𝑐.
– Select random and independent samples from each
population
– Calculate the mean and variance of each sample
Equal Population Variances
Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - c}{s_p\sqrt{1/n_1 + 1/n_2}}$$
where
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
– Critical value: t-score with 𝑛1 + 𝑛2 − 2 degrees of
freedom
– P-value is an area under the Student-t density
curve with this number of degrees of freedom.
Unequal Population Variances
Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - c}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
The critical value is a t-score with r degrees of freedom
where r is
$$r = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$
If r is not an integer, then round it down to the nearest
whole number
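The degrees-of-freedom formula for unequal variances is easy to get wrong by hand, so a small sketch may help. The function names `unequal_variance_t` and `unequal_variance_df` are hypothetical. The diet summary statistics (means 4.5 and 3.2, standard deviations 6.5 and 4.5, both samples of size 10) are used here only to illustrate the formulas.

```python
from math import floor, sqrt

def unequal_variance_t(x1: float, s1: float, n1: int,
                       x2: float, s2: float, n2: int, c: float = 0.0) -> float:
    """Test statistic t for H0: mu1 - mu2 = c without pooling the variances."""
    return ((x1 - x2) - c) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def unequal_variance_df(s1: float, n1: int, s2: float, n2: int) -> int:
    """Degrees of freedom r, rounded down to the nearest whole number."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    r = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return floor(r)

# Illustration with the diet summary statistics
t = unequal_variance_t(4.5, 6.5, 10, 3.2, 4.5, 10)
r = unequal_variance_df(6.5, 10, 4.5, 10)
```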
Requirements
1. Both samples are random
2. The samples are independent
3. Both populations are normally distributed or
both sample sizes are greater than 30
Example 5.6.1
Among the $n_1 = 10$ subjects who followed
diet A, the mean weight loss was $\bar{x}_1 = 4.5$ lb
with a standard deviation of $s_1 = 6.5$ lb. Among
the $n_2 = 10$ subjects who followed diet B, the
mean weight loss was $\bar{x}_2 = 3.2$ lb with a
standard deviation of $s_2 = 4.5$ lb. Test the claim
that the mean weight loss on diet A is higher than
that on diet B.
Example 5.6.1
1. The parameters about which the claim is made are
$\mu_1$ = The mean weight loss for all those on diet A
$\mu_2$ = The mean weight loss for all those on diet B
The claim is $\mu_1 > \mu_2$
H0: $\mu_1 = \mu_2$ H1: $\mu_1 > \mu_2$
2. Assume equal population variances. Test statistic:
$$s_p = \sqrt{\frac{(10 - 1)6.5^2 + (10 - 1)4.5^2}{10 + 10 - 2}} = 5.59$$
$$t = \frac{(4.5 - 3.2) - 0}{5.59\sqrt{1/10 + 1/10}} = 0.52$$
Example 5.6.1
3. Critical value: 𝑡0.05 (10 + 10 − 2) = 1.734
– P-value = area to the right of 𝑡 = 0.52
– Using software: P-value = 0.305
4. Critical region: [1.734, ∞)
– Technical conclusion: Do not reject H0
5. Final conclusion: The data do not support the
claim
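The pooled calculation in Example 5.6.1 can be reproduced with the sketch below. The function name `pooled_t` is hypothetical, and the critical value 1.734 is copied from the example rather than computed.

```python
from math import sqrt

def pooled_t(x1: float, s1: float, n1: int,
             x2: float, s2: float, n2: int, c: float = 0.0):
    """Return (s_p, t) for H0: mu1 - mu2 = c, assuming equal population variances."""
    s_p = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    t = ((x1 - x2) - c) / (s_p * sqrt(1 / n1 + 1 / n2))
    return s_p, t

# Example 5.6.1: diet A (4.5, 6.5, n = 10) versus diet B (3.2, 4.5, n = 10)
s_p, t = pooled_t(4.5, 6.5, 10, 3.2, 4.5, 10)

critical_value = 1.734           # t_0.05(18), taken from the example
reject_h0 = t >= critical_value  # right-tailed critical region
```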
Paired T-Test
Purpose: To test the claim that a set of paired data
$(x_i, y_i)$, $i = 1, \ldots, n$, comes from a population in which the
differences $(x - y)$ have a mean less than, greater than,
or equal to 0.
1. Let $\mu_d$ denote the population mean of the
differences.
2. The null hypothesis is H0: $\mu_d = 0$
3. Calculate the differences $x_i - y_i$, $i = 1, \ldots, n$
4. Use the differences from step 3 and the T-test
from Section 5.3 to test the claim.
Example 5.6.3
At a large university, freshman students are
required to take an introduction to writing class.
Students are given a survey on their attitudes
toward writing at the beginning and end of the
class. Each student receives a score between 0
and 100 (the higher the score, the more favorable
his or her attitude toward writing). Test the claim
that the scores improve from the beginning to the
end.
Example 5.6.3
These data come in “matched pairs”
– They are not independent
– We cannot use a 2-sample T-test
If scores improve, then the differences (End – Beginning)
would be positive
– We test the claim 𝜇𝑑 > 0
Example 5.6.3
1. State the hypotheses
– 𝜇𝑑 = population mean of (end − beginning)
H0: 𝜇𝑑 = 0 H1:𝜇𝑑 > 0
2. Test statistic: Mean and standard deviation of
sample differences: $\bar{x} = 3.256$, $s = 4.215$
$$t = \frac{3.256 - 0}{4.215/\sqrt{9}} = 2.32$$
3. Critical value: $t_{0.05}(9 - 1) = 1.860$
– P-value: Area to the right of 2.32, which is 0.025
Example 5.6.3
4. Critical region: [1.860, ∞)
– Reject H0
5. Final conclusion: The data support the claim
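The paired procedure (take differences, then apply the one-sample T-test to them) can be sketched as follows. The function name `paired_t` is hypothetical, and the three pairs of scores are made up purely for illustration; they are not the survey data from Example 5.6.3. The last line recomputes the example's statistic from its reported summary values.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second) -> float:
    """t statistic for H0: mu_d = 0, using the differences second - first."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))

# Made-up scores, for illustration only
beginning = [50, 60, 70]
end = [55, 63, 76]
t_illustration = paired_t(beginning, end)

# Summary statistics reported in Example 5.6.3 (9 differences)
t_example = (3.256 - 0) / (4.215 / sqrt(9))
```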
5.7 – Goodness-of-fit Test
Chi-square Goodness-of-fit Test
Purpose: To test the claim that a random variable X has
some particular distribution.
1. Divide the range of X into k categories
2. Observe values of X and record frequencies of the
categories, 𝑂𝑖
3. Calculate expected frequencies of the categories, 𝐸𝑖
4. Calculate the test statistic
$$c = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Chi-square Goodness-of-fit Test
5. Critical value: $\chi^2_\alpha(k - 1)$
– P-value = Area to the right of c under the $\chi^2(k - 1)$
density curve
6. If $c > \chi^2_\alpha(k - 1)$ or P-value < α, then reject the
claim that X has the claimed distribution
Requirements
1. The data have been randomly chosen
2. Each expected frequency is at least 5
Example 5.7.3
A student simulated dandelions in a lawn by
randomly placing 300 dots on a piece of paper
with an area of 100 in². He then randomly chose
75 different 1 in² sections of paper and counted
the number of dots in each section.
Example 5.7.3
Let X = number of dots in a 1 in² section
– Claim: X has a Poisson distribution with 𝜆 = 3
Let $p_i = P(X = i)$, $i = 0, \ldots, 5$, and $p_6 = P(X \ge 6)$
– If the claim were true, then, for instance
31 3
P( X  1)  e  0.149
1!
– Denote this number 𝜋1
– 𝐸1 = 75 0.149 = 11.20
Example 5.7.3
Hypotheses:
H0: $p_0 = 0.050,\ p_1 = 0.149,\ \ldots,\ p_6 = 0.084$
H1: At least one probability is not as claimed
Critical value:
– $\chi^2_{0.05}(7 - 1) = 12.59$
– Do not reject H0
Final conclusion:
It is reasonable to assume that X has a Poisson distribution with λ = 3
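The claimed probabilities and expected frequencies in Example 5.7.3 can be generated with a short sketch. It assumes the seventh category collects all counts of 6 or more, which is consistent with the value 0.084 in H0. The observed frequencies are not reproduced in this transcript, so `chi_square` (a hypothetical helper name) is defined but not applied to the example.

```python
from math import exp, factorial

lam = 3          # claimed Poisson parameter
n_sections = 75  # number of 1 in^2 sections sampled

# P(X = i) for i = 0..5, then the remaining probability for "6 or more"
# (assumption: the last category is X >= 6)
probs = [lam ** i * exp(-lam) / factorial(i) for i in range(6)]
probs.append(1 - sum(probs))

# Expected frequency of each category under the claim
expected = [n_sections * p for p in probs]

def chi_square(observed, expected_freqs) -> float:
    """Goodness-of-fit statistic c = sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected_freqs))
```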
5.8 – Test of Independence
Two students want to determine if their university men’s
basketball team benefits from home-court advantage.
They randomly select 205 games played by the team
and record if each one was played at home or away and
if the team won or lost (data collected by Emily
Hudgins and Courtney Santistevan, 2009).
“Contingency Table”
Chi-square Test of Independence
Purpose: To test if the row events of a contingency
table are independent of the column events. Let
– $n$ = total number of observations
– $a$ = number of rows
– $b$ = number of columns
– $R_i$ = sum of the i-th row
– $C_j$ = sum of the j-th column
– $O_{ij}$ = frequency in the i-th row and j-th column
H0: The rows are independent of the columns
Test of Independence
Test statistic:
$$c = \sum_{i=1}^{a}\sum_{j=1}^{b} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \quad \text{where} \quad E_{ij} = \frac{R_i \cdot C_j}{n}$$
Critical value: $\chi^2_\alpha[(a - 1)(b - 1)]$
– P-value = area to the right of 𝑐
– Reject H0 if 𝑐 > c.v.
Requirements
1. The data in the table represent frequency counts and are
randomly selected
2. All expected frequencies are at least 5
Example
Critical value: $\chi^2_{0.05}[(2 - 1)(2 - 1)] = 3.841$
– Reject H0
Final Conclusion
– The result is not independent of the location
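The expected frequencies and test statistic for a contingency table can be sketched as below. The contingency table for the basketball example is not reproduced in this transcript, so the 2 × 2 counts here are hypothetical and serve only to show the mechanics; the function names are also assumptions of this sketch.

```python
def expected_counts(table):
    """E_ij = R_i * C_j / n for every cell of the contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]

def chi_square_independence(table) -> float:
    """Test statistic c = sum over all cells of (O_ij - E_ij)^2 / E_ij."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))

# Hypothetical counts (rows: home/away, columns: won/lost), not the slide data
table = [[30, 10],
         [20, 40]]
c = chi_square_independence(table)
reject_h0 = c > 3.841  # critical value for (2 - 1)(2 - 1) = 1 degree of freedom
```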
5.9 – One-way ANOVA
A seed company plants four types of new corn seed on
several plots of land and records the yield (in
bushels/acre) of each plot as shown below. Test the
claim that the four types of seed produce the same mean
yield.
One-way ANOVA
Purpose: To test for equality of two or more
population means
– Null hypothesis: H0: 𝜇1 = ⋯ = 𝜇𝑘
– 𝑁 = total number of data values
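– Test statistic
$$F = \frac{\text{MS(treatment)}}{\text{MS(error)}}$$
where
$$\text{MS(treatment)} = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2}{k - 1} \quad \text{and} \quad \text{MS(error)} = \frac{\sum_{i=1}^{k} (n_i - 1)s_i^2}{N - k}$$
– $n_i$, $\bar{x}_i$, and $s_i^2$ are the size, mean, and variance of the ith sample, and $\bar{x}$ is the mean of all N data values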
One-way ANOVA
Critical region: $[f_\alpha(k - 1, N - k), \infty)$
– P-value: area to the right of F under the F-distribution
density curve with k − 1 and N − k degrees of freedom
Requirements (“loose” requirements)
1. The populations are normally distributed
2. The populations have the same variance
3. The samples are random and independent
Definition: Treatment (or factor)
– A characteristic that distinguishes the different populations
(or groups) from each other
Example 5.9.3
Let 𝜇1 = mean yield of Type A, etc
H0: 𝜇1 = 𝜇2 = 𝜇3 = 𝜇4
H1: At least one mean is different
$$\text{MS(treatment)} = \frac{3(64.75 - 66.07)^2 + \cdots + 4(58.47 - 66.07)^2}{4 - 1} = 206.93$$
$$\text{MS(error)} = \frac{(3 - 1)4.60^2 + \cdots + (4 - 1)2.42^2}{14 - 4} = 15.55$$
$$F = \frac{206.93}{15.55} = 13.31$$
Critical region: $[f_{0.05}(4 - 1, 14 - 4), \infty) = [3.71, \infty)$
– Reject H0
Final conclusion: The data do not
support the claim of equal means
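The one-way ANOVA statistic can be sketched from raw data as follows. The corn-yield data are not reproduced in this transcript, so the three small groups below are hypothetical and chosen only so the arithmetic is easy to follow; the function name `one_way_anova_f` is also an assumption. The last line simply re-divides the mean squares reported in Example 5.9.3.

```python
from statistics import mean, variance

def one_way_anova_f(groups) -> float:
    """F = MS(treatment) / MS(error) for a list of samples."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / total_n
    ms_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    ms_error = sum((len(g) - 1) * variance(g) for g in groups) / (total_n - k)
    return ms_treatment / ms_error

# Hypothetical data for illustration only (not the corn yields)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_illustration = one_way_anova_f(groups)

# Ratio of the mean squares reported in Example 5.9.3
f_example = 206.93 / 15.55
```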
5.10 – Two-way ANOVA
Randomized Block Design
A statistics professor is comparing four different delivery
methods for her introduction to statistics class: face-to-face, online, hybrid, and video (called the treatments).
– She divides the population of students into three groups
according to their overall GPA: high, middle, and low.
(called blocks)
– She randomly chooses four students from each block and
randomly assigns each one to a class using one of the
delivery methods
– At the end of the semester she records each student’s
overall grade
Two-way ANOVA
Two questions:
1. Do the four treatments have the same population
mean?
2. Do the blocks have any effect on the scores?
Two-way ANOVA
Parameters
– $\mu_i$ = ith treatment mean
– $\beta_j$ = jth “block effect” (a measure of the effect that
block j has on the score)
Null hypotheses
H0: $\mu_1 = \mu_2 = \cdots = \mu_k$ (the treatment means are all the same)
HB: $\beta_1 = \beta_2 = \cdots = \beta_b$ (the block effects are all the same)
ANOVA Table
P-value = 0.271 – Do not reject H0
– There is not a statistically significant difference
between the treatment means
P-value = 0.008 – Reject HB
– There is a statistically significant difference in the
block effects
Factorial Experiment
The statistics professor randomly chooses 16
students from each GPA level, randomly assigns
four to each delivery method, and records their
scores at the end of the semester.
Three questions:
1. Is there any difference between the population mean
scores of the delivery methods?
2. Does the GPA level affect the scores?
3. Does the interaction of the delivery method and GPA
level affect the scores?
Factorial Experiment
• 3 × 4 factorial experiment
with 4 replications per
treatment
• Factors
– GPA (factor A)
– Delivery Method (factor B)
• Levels
– 3 levels of GPA
– 4 levels of Delivery Method
Factorial Experiment
Parameters
– $\mu_j$ = population mean of the jth delivery method
– $\alpha_i$ = effect of the ith GPA level on the score
– $\gamma_{ij}$ = “interaction effect” of the ith GPA level on the
jth delivery method
Null hypotheses
HB: $\mu_1 = \mu_2 = \cdots = \mu_b$ (the delivery methods all have the same mean)
HA: $\alpha_1 = \alpha_2 = \cdots = \alpha_a$ (the effects of the GPA levels are all the same)
HAB: $\gamma_{11} = \gamma_{12} = \cdots = \gamma_{ab}$ (the interaction effects are all the same)
ANOVA Table
P-value = 0.176 – Do not reject HB
– There is not a statistically significant difference
between the means of the delivery methods
P-value = 0.000 – Reject HA
– There is a statistically significant difference in the
effects of the GPA levels
ANOVA Table
P-value = 0.004 – Reject HAB
– There is a statistically significant difference in the
interaction effects
Overall
– There is not a “best” method
– Consider matching certain delivery methods to certain GPA levels