Math 143: Introduction to Biostatistics
R Pruim
Spring 2011
Last Modified: April 4, 2011
Contents

8 Goodness of Fit Testing
  8.1 Testing Simple Hypotheses
  8.2 Testing Complex Hypotheses
  8.3 Reporting Results of Hypothesis Tests

9 Chi-squared for Contingency Tables

10 Normal Distributions
  10.1 The Family of Normal Distributions
  10.2 The Central Limit Theorem
  10.3 Confidence Intervals

11 Inference for the Mean of a Population
  11.1 Hypothesis Tests
  11.2 Confidence Intervals
  11.3 Overview of One-Sample Methods

12 Comparing Two Means
  12.1 The Distinction Between Paired T and 2-sample T
  12.2 Comparing Two Means in R
  12.3 Formulas for 2-Sample T
8 Goodness of Fit Testing
Goodness of fit tests measure how well observed data fit a hypothesized distribution.
8.1 Testing Simple Hypotheses
8.1.1 Golfballs in the Yard Example
Allan Rossman used to live along a golf course. One summer he collected 500 golf balls in his back yard and tallied the number on each golf ball to see whether the numbers 1, 2, 3, and 4 were equally likely.
Step 1: State the Null and Alternative Hypotheses.
• H0 : The four numbers are equally likely (in the population of golf balls driven 150–200 yards and sliced).
• Ha : The four numbers are not equally likely (in the population).
If we let pi be the proportion of golf balls with the number i on them, then we can restate the null hypothesis
as
• H0 : p1 = p2 = p3 = p4 = 0.25.
• Ha : The four numbers are not equally likely (in the population).
Step 2: Compute a test statistic. Here are Allan’s data:
> golfballs
  1   2   3   4
137 138 107 104
(Fourteen golf balls didn't have any of these numbers and were removed from the data, leaving 486 golf balls to test our null hypothesis.)
Although we made up several potential test statistics in class, the standard test statistic for this situation is
The Chi-squared test statistic:

    X² = Σ (observed − expected)² / expected
There is one term in this sum for each cell in our data table, and
• observed = the tally in that cell (a count from our raw data)
• expected = the number we would “expect” if the percentages followed our null hypothesis exactly.
(Note: the expected counts might not be whole numbers.)
In our particular example, we would expect 25% of the 486 (i.e., 121.5) golf balls in each category, so

    X² = (137 − 121.5)²/121.5 + (138 − 121.5)²/121.5 + (107 − 121.5)²/121.5 + (104 − 121.5)²/121.5 = 8.469
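We can check this arithmetic in R (a small sketch using the observed counts above):

> observed <- c(137, 138, 107, 104)
> expected <- rep(486/4, 4)    # 121.5 expected in each of the four cells
> sum( (observed - expected)^2 / expected )
[1] 8.469136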
Step 3: Compute a p-value. Our test statistic will be large when the observed counts and expected counts are
quite different. It will be small when the observed counts and expected counts are quite close. So we will reject
when the test statistic is large. To know how large is large enough, we need to know the sampling distribution.
If H0 is true and the sample is large enough, then the sampling distribution for the Chi-squared test
statistic will be approximately a Chi-squared distribution.
• The degrees of freedom for this type of goodness of fit test is one less than the number of cells.
• The approximation gets better and better as the sample size gets larger.
The mean of a Chi-squared distribution is equal to its degrees of freedom. This can help us get a rough idea
about whether our test statistic is unusually large or not.
How unusual is it to get a test statistic at least as large as ours (8.47)? We compare to a Chi-squared distribution
with 3 degrees of freedom. The mean value of such a statistic is 3, and our test statistic is almost 3 times as
big, so we anticipate that our value is rather unusual, but not extremely unusual. This is the case:
> 1 - pchisq( 8.469, df=3 )
[1] 0.03725096
Step 4: Interpret the p-value and Draw a Conclusion. Based on this smallish p-value, we can reject our null
hypothesis at the usual α = 0.05 level. We have sufficient evidence to cast doubt on the hypothesis that all the
numbers are equally common among the population of golfballs.
Automation. Of course, R can automate this whole process for us if we provide the data table and the null
hypothesis. The default null hypothesis is that all the probabilities are equal, so in this case it is enough to give
R the golf balls data.
> chisq.test( golfballs )
Chi-squared test for given probabilities
data: golfballs
X-squared = 8.4691, df = 3, p-value = 0.03725
8.1.2 Plant Genetics Example
A biologist is conducting a plant breeding experiment in which plants can have one of four phenotypes. If these
phenotypes are caused by a simple Mendelian model, the phenotypes should occur in a 9:3:3:1 ratio. She raises
41 plants with the following phenotypes.
phenotype    1    2    3    4
count       20   10    7    4
Should she worry that the simple genetic model doesn’t work for her phenotypes?
> plants <- c( 20, 10, 7, 4 )
> chisq.test( plants, p=c(9/16, 3/16, 3/16, 1/16) )
Chi-squared test for given probabilities
data: plants
X-squared = 1.9702, df = 3, p-value = 0.5786
Things to notice:
• plants <- c( 20, 10, 7, 4 ) is the way to enter this kind of data by hand.
• This time we need to tell R what the null hypothesis is. That’s what p=c(9/16, 3/16, 3/16, 1/16) is
for.
• The Chi-squared distribution is only an approximation to the sampling distribution of our test statistic,
and the approximation is not very good when the expected cell counts are too small.
Our rule of thumb will be this:
To use the chi-squared approximation
• All the expected counts should be at least 1.
• Most (at least 80%) of the expected counts should be at least 5.
Notice that this depends on expected counts, not observed counts.
Our expected count for phenotype 4 is only (1/16) · 41 = 2.562, which means that only 75% of our expected counts are at least 5, so our approximation will be a little crude. (The other expected counts are well in the safe range.)
When this happens, we can ask R to use the empirical method (just like we did with statTally) instead.
> chisq.test( plants, p= c(9/16, 3/16, 3/16, 1/16), simulate = T )
Chi-squared test for given probabilities with simulated p-value (based on 2000 replicates)
data: plants
X-squared = 1.9702, df = NA, p-value = 0.5932
This doesn’t change our overall conclusion in this case. These data are not inconsistent with the simple genetic
model. (The simulated p-value is not terribly accurate either if we use only 2000 replicates. We could increase
that for more accuracy. If you repeat the code above several times, you will see how the empirical p-value varies
from one computation to the next.)
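For example, simulate = T above was just matching the full argument name simulate.p.value, and the B argument sets the number of replicates (a sketch; the simulated p-value will vary a little from run to run):

> chisq.test( plants, p=c(9/16, 3/16, 3/16, 1/16), simulate.p.value=TRUE, B=10000 )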
If we had more data . . .
Interestingly, if we had 10 times as much data in the same proportions, the conclusion would be very different.
> plants <- c( 200, 100, 70, 40 )
> chisq.test( plants, p=c(9/16, 3/16, 3/16, 1/16) )
Chi-squared test for given probabilities
data: plants
X-squared = 19.7019, df = 3, p-value = 0.0001957
8.1.3 When there are only two categories
We can redo our toads example using a goodness of fit test instead of the binomial test.
> binom.test(14,18)
Exact binomial test
data: 14 and 18
number of successes = 14, number of trials = 18, p-value = 0.03088
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.5236272 0.9359080
sample estimates:
probability of success
0.7777778
> chisq.test(c(14,4))
Chi-squared test for given probabilities
data: c(14, 4)
X-squared = 5.5556, df = 1, p-value = 0.01842
Both of these test the same hypotheses:
• H0 : pL = pR = 0.5
• Ha : pL ≠ pR
where pL and pR are the proportions of left- and right-handed toads in the population.
Although both tests test the same hypotheses and give similar p-values (in this example), the binomial test is
generally used because
• The binomial test is exact for all sample sizes while the Chi-squared test is only approximate, and the
approximation is poor when sample sizes are small.
• Later we will learn about the confidence interval associated with the binomial test. This will be another advantage of the binomial test over the Chi-squared test.
8.2 Testing Complex Hypotheses
In the examples above, our null hypothesis told us exactly what percentage to expect in each category. Such
hypotheses are called simple hypotheses. But sometimes we want to test complex hypotheses – hypotheses
that do not completely specify all the proportions, but instead express the type of relationship that must exist
among the proportions. Let’s illustrate with an example.
Example 8.5: Number of Boys in 2-Child Families
Q. Does the number of boys in a 2-child family follow a binomial distribution (with some parameter p)?
Data: number of boys in 2444 2-child families.
> boys <- c( "0" = 530, "1" = 1332, "2" = 582 ); boys
   0    1    2
 530 1332  582
> n <- sum(boys); n
[1] 2444
> p.hat <- (1332 + 2*582) / (2 * (530 + 1332 + 582)); p.hat
[1] 0.5106383
> p0 <- dbinom(0:2, 2, p.hat); p0           # binomial proportions
[1] 0.2394749 0.4997737 0.2607515
> expected <- p0 * sum(boys); expected      # expected counts
[1] 585.2766 1221.4468 637.2766
> observed <- boys; observed
   0    1    2
 530 1332  582
This is a 1 degree of freedom test:

    3 cells − 1 − 1 parameter estimated from data = 1 degree of freedom
We reduce the degrees of freedom because observed counts will fit expected counts better if we use the data to
calculate the expected counts.
> X.squared <- sum( (boys - expected)^2 / expected ) ; X.squared
[1] 20.02141
> 1 - pchisq( X.squared, df = 1 )
[1] 7.657994e-06
That’s a small p-value, so we reject the null hypothesis: It doesn’t appear that a binomial distribution is a good
model. Why not? Let’s see how the expected and observed counts differ:
> cbind(expected, observed)
   expected observed
0  585.2766      530
1 1221.4468     1332
2  637.2766      582
Any theories? How might we check if your theory is correct?
Don’t make this mistake
> chisq.test( boys, p=p0 )    # THIS IS INCORRECT: wrong degrees of freedom
Chi-squared test for given probabilities
data: boys
X-squared = 20.0214, df = 2, p-value = 4.492e-05
This would be correct if you were testing that the distribution of boys is Binom(2, 0.5106), but that isn’t what
we are testing. We are testing whether the distribution of boys is Binom(2, p) for some p.
8.3 Reporting Results of Hypothesis Tests
Key ingredients
1. Null and alternative hypotheses
2. Test statistic
3. p-value
4. Sample size
Examples
Results of hypothesis tests are typically reported very succinctly in the literature. But if you know what to look
for, you should be able to find it.
Example 1: Reporting Significance but not p-value
Abstract. Ninety-six F2.F3 bulked sampled plots of Upland cotton, Gossypium hirsutum L., from the cross of HS46 × MARCABUCAG8US-1-88, were analyzed with 129 probe/enzyme combinations resulting in 138 RFLP loci. Of the 84 loci that segregated as codominant, 76 of these fit a normal 1 : 2 : 1 ratio (non-significant chi square at P = 0.05). Of the 54 loci that segregated as dominant genotypes, 50 of these fit a normal 3 : 1 ratio (non-significant chi square at P = 0.05). These 138 loci were analyzed with the MAPMAKER3 EXP program to determine linkage relationships among them. There were 120 loci arranged into 31 linkage groups. These covered 865 cM, or an estimated 18.6% of the cotton genome. The linkage groups ranged from two to ten loci each and ranged in size from 0.5 to 107 cM. Eighteen loci were not linked.
Notes:
• “non-significant chi square at P = 0.05” should say “at α = 0.05” (or perhaps P > 0.05).
• The p-values are not listed here because there would be 84 of them. What we know is that 76 were larger
than 0.05 and 8 were less than 0.05. Presumably they will discuss those 8 in the paper.
Source: Zachary W. Shappley, Johnie N. Jenkins, William R. Meredith, Jack C. McCarty, Jr. An RFLP linkage map of Upland cotton, Gossypium hirsutum L. Theor Appl Genet (1998) 97:756–761. http://ddr.nal.usda.gov/bitstream/10113/2830/1/IND21974273.pdf
Example 2: Details not in abstract but are in the paper
Abstract

Inheritance of two mutant foliage types, variegated and purple, was investigated for diploid, triploid, and tetraploid tutsan (Hypericum androsaemum). The fertility of progeny was evaluated by pollen viability tests and reciprocal crosses with diploids, triploids, and tetraploids and germinative capacity of seeds from successful crosses. Segregation ratios were determined for diploid crosses in reciprocal di-hybrid F1, F2, BCP1, and BCP2 families and selfed F2s with the parental phenotypes. F2 tetraploids were derived from induced autotetraploid F1s. Triploid segregation ratios were determined for crosses between tetraploid F2s and diploid F1s. Diploid di-hybrid crosses fit the expected 9 : 3 : 3 : 1 ratio for a single, simple recessive gene for both traits, with no evidence of linkage. A novel phenotype representing a combination of parental phenotypes was recovered. Data from backcrosses and selfing support the recessive model. Both traits behaved as expected at the triploid level; however, at the tetraploid level the number of variegated progeny increased, with segregation ratios falling between random chromosome and random chromatid assortment models. We propose the gene symbol var (variegated) and pl (purple leaf) for the variegated and purple genes, respectively. Triploid pollen stained moderately well (41%), but pollen germination was low (6%). Triploid plants were highly infertile, demonstrating extremely low male fertility and no measurable female fertility (no viable seed production). The present research demonstrates the feasibility of breeding simultaneously for ornamental traits and non-invasiveness.

There are many sentences of the following type in the paper: . . . And some of the results are summarized in a table.
Source: Richard T. Olsen, Thomas G. Ranney, Dennis J. Werner. Fertility and Inheritance of Variegated and Purple Foliage Across a Polyploid Series in Hypericum androsaemum L. J. Amer. Soc. Hort. Sci. 131(6):725–730. 2006. http://www.ces.ncsu.edu/fletcher/mcilab/publications/olsen-etal-2006c.pdf
9 Chi-squared for Contingency Tables
Example 9.3: Trematodes
> fish <- xtabs(~ eaten + infection.status, Trematodes); fish
     infection.status
eaten high light uninfected
  no     9    35         49
  yes   37    10          1
> chisq.test(fish)
Pearson's Chi-squared test
data: fish
X-squared = 69.7557, df = 2, p-value = 7.124e-16
Some details:
• H0 : The proportion of fish that are eaten is the same for each treatment group (high infection, low
infection, or no infection).
• Ha : The proportion of fish differs among the treatment groups.
• expected count = (row total)(column total) / (grand total)
> observed <- c( 9, 35, 49, 37, 10, 1)
> 9 + 35 + 49                  # row 1 total
[1] 93
> 37 + 10 + 1                  # row 2 total
[1] 48
> 9 + 37                       # col 1 total
[1] 46
> 35 + 10                      # col 2 total
[1] 45
> 49 + 1                       # col 3 total
[1] 50
> 9 + 35 + 49 + 37 + 10 + 1    # grand total
[1] 141
> expected <- c( 93*46/141, 93*45/141, 93*50/141, 48*46/141, 48*45/141, 48*50/141 ); expected
[1] 30.34043 29.68085 32.97872 15.65957 15.31915 17.02128
> X.squared <- sum ( (observed - expected)^2 / expected ) ; X.squared
[1] 69.7557
• expected proportion = (row total/grand total) · (column total/grand total) = (row proportion)(column proportion)
  ◦ So H0 can be stated in terms of independent rows and columns.
• This also explains the degrees of freedom. We need to estimate all but one row proportion (since they sum to 1, once we have all but one we know them all), and all but one column proportion. So

    degrees of freedom = (number of rows)(number of columns) − 1 − (number of rows − 1) − (number of columns − 1)
                       = (number of rows − 1)(number of columns − 1)
Example: Physicians' Health Study
> phs <- cbind( c(104,189), c(10933,10845) ); phs
     [,1]  [,2]
[1,]  104 10933
[2,]  189 10845
> rownames(phs) <- c("aspirin","placebo")               # add row names
> colnames(phs) <- c("heart attack","no heart attack")  # add column names
> phs
        heart attack no heart attack
aspirin          104           10933
placebo          189           10845
> chisq.test(phs)
Pearson's Chi-squared test with Yates' continuity correction
data: phs
X-squared = 24.4291, df = 1, p-value = 7.71e-07
Example: Fisher Twin Study
Here is an example that was treated by R. A. Fisher in an article [Fis62] and again later in a book [Fis70].
The data come from a study of same-sex twins where one twin had had a criminal conviction. Two pieces of information were recorded: whether the sibling had also had a criminal conviction and whether the twins were monozygotic (identical) or dizygotic (nonidentical). Dizygotic twins are no more genetically related than any other siblings, but monozygotic twins have essentially identical DNA. Twin studies like this have often been used to investigate the effects of "nature vs. nurture".
Here are the data from the study Fisher analyzed:
               Dizygotic   Monozygotic
Convicted              2            10
Not convicted         15             3
The question is whether this gives evidence for a genetic influence on criminal convictions. If genetics had
nothing to do with convictions, we would expect conviction rates to be the same for monozygotic and dizygotic
twins since other factors (parents, schooling, living conditions, etc.) should be very similar for twins of either
sort. So our null hypothesis is that the conviction rates are the same for monozygotic and dizygotic twins. That
is,
• H0 : pM = pD
• Ha : pM ≠ pD
> twins <- rbind( c(2,15), c(10,3) ); twins
     [,1] [,2]
[1,]    2   15
[2,]   10    3
> rownames(twins) <- c('dizygotic', 'monozygotic')
> colnames(twins) <- c('convicted', 'not convicted')
> twins
            convicted not convicted
dizygotic           2            15
monozygotic        10             3
> chisq.test(twins)
Pearson's Chi-squared test with Yates' continuity correction
data: twins
X-squared = 10.4581, df = 1, p-value = 0.001221
We can get R to compute the expected counts for us, too.
> chisq.test(twins)$expected
            convicted not convicted
dizygotic         6.8          10.2
monozygotic       5.2           7.8
This is an important check because the chi-squared approximation is not good when expected counts are too
small. Recall our general rule of thumb:
• All expected counts should be at least 1.
• 80% of expected counts should be at least 5.
For a 2 × 2 table, this means all four expected counts should be at least 5.
In our example, we are just on the good side of our rule of thumb. There is an exact test that works only in
the case of a 2 × 2 table (much like the binomial test can be used instead of a goodness of fit test if there are
only two categories). The test is called Fisher’s Exact Test.
> fisher.test(twins)
Fisher's Exact Test for Count Data
data: twins
p-value = 0.0005367
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.003325764 0.363182271
sample estimates:
odds ratio
0.04693661
In this case we see that the approximate p-value from the Chi-squared Test is nearly the same as the exact p-value from Fisher's Exact Test. It appears that the conviction rate is higher for monozygotic twins than for dizygotic twins.
Dealing with Low Expected Counts
What do we do if we have low expected counts?
1. If we have a 2 × 2 table, we can use Fisher’s Exact Test.
2. For goodness of fit testing with only two categories, we can use the binomial test.
3. For goodness of fit testing and larger 2-way tables, we can sometimes combine some of the cells to get
larger expected counts.
4. Low expected counts often arise when analysing small data sets. Sometimes the only good solution is to
design a larger study so there is more data to analyse.
Example 9.4: Cows and Bats
                cows in estrous   cows not in estrous   row totals
bitten                       15                     7           22
not bitten                    6                   322          328
column totals                21                   329          350
> cows <- cbind(c(15,6),c(7,322)); cows
     [,1] [,2]
[1,]   15    7
[2,]    6  322
> rownames(cows) <- c("bitten","not bitten")             # add row names
> colnames(cows) <- c("in estrous","not in estrous")     # add column names
> cows
           in estrous not in estrous
bitten             15              7
not bitten          6            322
> chisq.test(cows)
Pearson's Chi-squared test with Yates' continuity correction
data: cows
X-squared = 149.3906, df = 1, p-value < 2.2e-16
This time one of our expected counts is too low for our rule of thumb:
> chisq.test(cows)$expected
           in estrous not in estrous
bitten           1.32          20.68
not bitten      19.68         308.32
So let’s use Fisher’s Exact Test instead:
> fisher.test(cows)
Fisher's Exact Test for Count Data
data: cows
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
29.94742 457.26860
sample estimates:
odds ratio
108.3894
In this example the conclusion is the same: extremely strong evidence against the null hypothesis.
Odds and Odds Ratio
    odds = O = p/(1 − p) = pn/((1 − p)n) = (number of successes)/(number of failures)

    odds ratio = OR = O1/O2 = [p1/(1 − p1)] / [p2/(1 − p2)] = [p1/(1 − p1)] · [(1 − p2)/p2]

We can estimate these quantities using data. When we do, we put hats on everything:

    estimated odds = Ô = p̂/(1 − p̂)

    estimated odds ratio = ÔR = Ô1/Ô2 = [p̂1/(1 − p̂1)] / [p̂2/(1 − p̂2)] = [p̂1/(1 − p̂1)] · [(1 − p̂2)/p̂2]
The null hypothesis of Fisher’s Exact Test is that the odds ratio is 1, that is, that the two proportions are equal.
fisher.test() uses a somewhat more complicated method to estimate the odds ratio, so the formulas above won't exactly match the odds ratio displayed in the output of fisher.test(), but the value will be quite close.
> (2/15) / (10/3)    # simple estimate of odds ratio
[1] 0.04
> (10/3) / (2/15)    # simple estimate of odds ratio with roles reversed
[1] 25
Notice that when p1 and p2 are small, then 1 − p1 and 1 − p2 are both close to 1, so

    odds ratio ≈ p1/p2 = relative risk

Relative risk is easier to understand, but the odds ratio has nicer statistical and mathematical properties.
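A quick numerical illustration (the proportions here are hypothetical, chosen small to show the approximation):

> p1 <- 0.01; p2 <- 0.02
> (p1/(1-p1)) / (p2/(1-p2))    # odds ratio
[1] 0.4949495
> p1/p2                        # relative risk
[1] 0.5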
10 Normal Distributions
10.1 The Family of Normal Distributions
Normal distributions
• are symmetric, unimodal, and bell-shaped
• can have any combination of mean and standard deviation (as long as the standard deviation is positive)
• satisfy the 68–95–99.7 rule:
Approximately 68% of any normal distribution lies within 1 standard deviation of the mean.
Approximately 95% of any normal distribution lies within 2 standard deviations of the mean.
Approximately 99.7% of any normal distribution lies within 3 standard deviations of the mean.
[Figure: three standard normal curves shading the middle 0.683, 0.954, and 0.997 of the distribution (the areas within 1, 2, and 3 standard deviations of the mean).]
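We can verify these percentages with pnorm():

> pnorm(1) - pnorm(-1)    # within 1 standard deviation
[1] 0.6826895
> pnorm(2) - pnorm(-2)    # within 2 standard deviations
[1] 0.9544997
> pnorm(3) - pnorm(-3)    # within 3 standard deviations
[1] 0.9973002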
Many naturally occurring distributions are approximately normally distributed.
Normal distributions are also an important part of statistical inference.
Example: SAT scores
SAT scores are approximately normally distributed with a mean of 500 and a standard deviation of 100, so
• approximately 68% of scores are between 400 and 600
• approximately 95% of scores are between 300 and 700
• approximately 2.5% of scores are above 700
• approximately 81.5% (34% + 47.5%) of scores are between 400 and 700
We can obtain arbitrary probabilities using pnorm():
> pnorm(650, 500, 100)                       # proportion who score below 650 on SAT
[1] 0.9331928
> 1 - pnorm(650, 500, 100)                   # proportion who score above 650 on SAT
[1] 0.0668072
> pnorm(650,500,100) - pnorm(400,500,100)    # proportion who score between 400 and 650 on SAT
[1] 0.7745375
[Figure: three normal curves for SAT scores (mean 500, sd 100) shading the region below 650 (0.933), the region above 650 (0.067), and the region between 400 and 650 (0.775).]
Z-scores (standardized scores)
Because probabilities in a normal distribution depend only on the number of standard deviations above and
below the mean, it is useful to define Z-scores (also called standardized scores) as follows:
    Z-score = (value − mean) / (standard deviation)
If we know the population mean and standard deviation, we can plug those in. When we do not, we will use
the mean and standard deviation of a random sample instead.
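For example, using the SAT numbers above:

> (650 - 500) / 100    # Z-score for an SAT score of 650
[1] 1.5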
10.2 The Central Limit Theorem
The importance of the normal distributions comes from the following:

The Central Limit Theorem. The sampling distribution of a sample mean or sample sum of a large enough random sample is approximately normally distributed. In fact,
• Ȳ ≈ Norm(µ, σ/√n)
• ΣY ≈ Norm(nµ, σ√n)

The approximations are better
• when the population distribution is similar to a normal distribution,
• when the sample size is larger,
and are exact when the population distribution is normal.
The Central Limit Theorem is illustrated nicely in an applet available from the Rice Virtual Laboratory in
Statistics (http://onlinestatbook.com/stat_sim/sampling_dist/index.html).
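We can also illustrate it with a small simulation (a sketch; the skewed exponential population and the sample size 30 are arbitrary choices):

> means <- replicate(5000, mean(rexp(30)))   # 5000 sample means from a skewed population
> mean(means); sd(means)                     # should be near 1 and 1/sqrt(30) = 0.183
> hist(means)                                # looks approximately normal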
10.2.1 Normal Approximation to the Binomial
Since a binomial random variable is a sum of 0's and 1's from each trial, the Central Limit Theorem implies that

    Binom(n, p) ≈ Norm( np, √( np(1 − p) ) )

Our rule of thumb for the quality of this approximation is that it is good enough for our purposes when np ≥ 5 and n(1 − p) ≥ 5 – that is, when we would expect at least 5 successes and at least 5 failures. (Many books require at least 10 of each. The true story is that the approximation gets better and better as the smaller of np and n(1 − p) gets larger.)

Here is a comparison using Binom(50, 0.20) ≈ Norm(10, 2.828):
> pbinom(12,50,.20) - pbinom(7,50,.20)    # P( X between 8 and 12, inclusive )
[1] 0.6235332
> pnorm(12,10,sqrt(50 * 0.20 * 0.80)) - pnorm(8,10,sqrt(50 * 0.20 * 0.80))
[1] 0.5204999
That’s not too great. But wait, we can do better . . .
Improving the Approximation
We can make the approximation even better if we make the so-called continuity correction. Binomial values
are counts, so they are always integers. Normal values can be any decimal amount. If we think of a binomial
count x as if it were the normal range from x − 1/2 to x + 1/2, the approximations are much better.
> pbinom(12,50,.20) - pbinom(7,50,.20)    # P( X between 8 and 12, inclusive )
[1] 0.6235332
> pnorm(12.5,10,sqrt(50 * 0.20 * 0.80)) - pnorm(7.5,10,sqrt(50 * 0.20 * 0.80))    # with continuity correction

The continuity-corrected value is approximately 0.6232, much closer to the exact binomial value 0.6235.
10.2.2 The Proportion Test
Since the binomial distribution is approximately normal, we can derive an approximate version of the binomial
test using the normal approximation. This is described on pages 248–250.
Example 10.7: The only good bug is a dead bug
31 of 41 spiders that were presented a choice between a live cricket and a dead cricket chose the dead cricket. Is this significant evidence that these spiders prefer dead crickets?
1. H0 : p = 0.5; Ha : p ≠ 0.5
2. test statistic: x = 31 or p̂ = 31/41 = 0.756 (n = 41)
3. p-value
The p-value is 2 Pr(X ≥ 31) (doubling is exact in this case because the null hypothesis is that p = 0.5). If the null hypothesis is true, then

    X ∼ Binom(41, 0.50) ≈ Norm( 41 · 0.5, √( 41(0.5)(0.5) ) ) = Norm(20.5, 3.2) ,

so an approximate p-value is given by

    p-value = 2 * (1 - pnorm( 31, mean=20.5, sd=sqrt(41*.5*.5)) ) = 0.001039

We could do even better if we use the continuity correction:

    p-value = 2 * (1 - pnorm( 30.5, mean=20.5, sd=sqrt(41*.5*.5)) ) = 0.001787

Alternatively, we can use the fact that

    X/n = p̂ ≈ Norm( 0.50, √( (0.5)(0.5)/41 ) ) = Norm(0.50, 0.0781) ,

so

    p-value = 2 * (1 - pnorm( 31/41, mean=0.5, sd=sqrt(.5*.5/41)) ) = 0.001039

or with the continuity correction

    p-value = 2 * (1 - pnorm( 30.5/41, mean=0.5, sd=sqrt(.5*.5/41)) ) = 0.0018
4. Since the p-value is small we can reject the null hypothesis. It appears that these spiders prefer dead
crickets.
Note: Presumably this study was designed carefully enough to test preference of the spiders and not ease of
capture (dead crickets are easier to catch than ones that can move).
As usual, R will do all the work for us if we ask it to:
> prop.test(31,41)
1-sample proportions test with continuity correction
data: 31 out of 41, null probability 0.5
X-squared = 9.7561, df = 1, p-value = 0.001787
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5935581 0.8709221
sample estimates:
p
0.7560976
We can compare the approximate p-value to the exact p-value of the binomial test
> binom.test(31,41)
Exact binomial test
data: 31 and 41
number of successes = 31, number of trials = 41, p-value = 0.001450
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.5969538 0.8763675
sample estimates:
probability of success
0.7560976
You may wonder why we have an approximate method when there is an exact method. This is a good question.
• Since typing binom.test() is no harder than typing prop.test(), we could just use binom.test() all
the time.
• You will see the approximate version (often called the z-test) frequently in literature, largely for historical
reasons. (Before computers were readily available, binomial probabilities were harder to calculate.)
• The proportion test can be quickly estimated by hand using the z statistic (see the sketch after this list):

    z = (x − np0)/√( np0(1 − p0) ) = (x/n − p0)/√( p0(1 − p0)/n ) = (p̂ − p0)/√( p0(1 − p0)/n )

which will have a standard normal distribution Norm(0, 1). We can use our 68-95-99.7 rule to estimate the p-value. (We can easily tell if it is above or below 0.05, for example.)
• Some people like that the proportion test is naturally expressed in terms of proportions rather than counts.
(It can be done either way; binomial distributions only deal with counts.)
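For the spider data, for instance, the z statistic is quick to compute (a small sketch using the counts from Example 10.7):

> x <- 31; n <- 41; p0 <- 0.5
> (x - n*p0) / sqrt( n*p0*(1-p0) )    # more than 3 sds above 0, so two-sided p-value < 0.003
[1] 3.279649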
The Proportion Test vs. the Binomial Test
• The binomial test is exact, the proportion test is approximate, but the approximation is good if the sample
size is large enough.
10.3 Confidence Intervals
Note: Much of this information is from Section 7.3
The sampling distribution for a sample proportion (assuming samples of size n) is

    p̂ ≈ Norm( p, √( p(1 − p)/n ) )

The standard deviation of a sampling distribution is called the standard error (abbreviated SE), so in this case we have

    SE = √( p(1 − p)/n )

This means that for 95% of samples, our estimated proportion p̂ will be within (1.96)SE of p. This tells us how well p̂ approximates p. It says that usually p̂ is between p − 1.96 SE and p + 1.96 SE.

Key idea: If p̂ is close to p, then p is close to p̂.

That is, for most samples p is between p̂ − 1.96 SE and p̂ + 1.96 SE.

Our only remaining difficulty is that we don't know SE because it depends on p, which we don't know. We will estimate SE as follows:

    SE ≈ √( p̂(1 − p̂)/n )

We will call the interval between p̂ − 1.96 √( p̂(1 − p̂)/n ) and p̂ + 1.96 √( p̂(1 − p̂)/n ) a 95% confidence interval for p. Often we will write this very succinctly as

    p̂ ± 1.96 SE

(Note: we are being a little bit lazy and using SE for the estimated standard error. Some books only refer to these estimates as the standard error, further adding to the notational confusion. Fortunately, the idea is simpler than the notation.)
Example
Q. Suppose we flip a coin 100 times and obtain 42 heads. What is a 95% confidence interval for p?
A. SE = √(0.42 · 0.58/100) = 0.049, so our confidence interval is 0.42 ± 1.96 · 0.049, or 0.42 ± 0.097.
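In R, the hand calculation looks like this (a small sketch):

> p.hat <- 42/100
> SE <- sqrt( p.hat * (1 - p.hat) / 100 )    # approximately 0.049
> p.hat + c(-1, 1) * 1.96 * SE               # approximately 0.323 and 0.517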
R computes confidence intervals for us when we use prop.test() and binom.test().
> prop.test(42,100)
1-sample proportions test with continuity correction
data: 42 out of 100, null probability 0.5
X-squared = 2.25, df = 1, p-value = 0.1336
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.3233236 0.5228954
sample estimates:
p
0.42
> binom.test(42,100)
Exact binomial test
data: 42 and 100
number of successes = 42, number of trials = 100, p-value = 0.1332
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.3219855 0.5228808
sample estimates:
probability of success
0.42
The method from prop.test() is slightly better than our hand calculation because it uses the continuity correction to give a better normal approximation to the binomial. binom.test() uses the binomial distribution directly. So each of these methods will give slightly different results from our hand calculations. Our preferred computational method will be binom.test(), since it avoids the normal approximation to the binomial distribution altogether.
The book points out that our hand calculations can be improved (especially for small samples) if we add in two successes and two failures before computing the confidence interval. Doing so gives an interval that is very close to the one computed by binom.test().
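A sketch of that "plus 4" calculation for the coin data (add 2 successes and 2 failures before computing):

> p.tilde <- (42 + 2) / (100 + 4); p.tilde
[1] 0.4230769
> SE <- sqrt( p.tilde * (1 - p.tilde) / 104 )
> p.tilde + c(-1, 1) * 1.96 * SE             # approximately 0.328 and 0.518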
Other confidence levels
We can use any confidence level we like if we replace 1.96 with the appropriate critical value z∗ . Our formula
then becomes
p̂ ± z∗ SE
which follows a pattern we will see again in other contexts:
estimate from data ± (critical value)(standard error) .
The expression z∗ SE is called the margin of error of the confidence interval.
Example
Q. Suppose we flip a coin 100 times and obtain 42 heads. What is a 99% confidence interval for p?
A. The only thing that changes is our value of z∗ which must now be selected so that 99% of a normal distribution
is between z∗ standard deviations below the mean and z∗ standard deviations above the mean. R can calculate
this value for us:
> qnorm(.995)    # .995 because we need .99 + .005 BELOW z*
[1] 2.575829
Now we proceed as before: SE = √(0.42 · 0.58/100) = 0.049, so our confidence interval is 0.42 ± 2.576 · 0.049, or 0.42 ± 0.127.
R computes these confidence intervals for us when we use prop.test() and binom.test() with the extra
argument conf.level = 0.99.
> prop.test(42,100,conf.level=0.99)
1-sample proportions test with continuity correction
data: 42 out of 100, null probability 0.5
X-squared = 2.25, df = 1, p-value = 0.1336
alternative hypothesis: true p is not equal to 0.5
99 percent confidence interval:
0.2972701 0.5530641
sample estimates:
p
0.42
> binom.test(42,100,conf.level=0.99)
Exact binomial test
data: 42 and 100
number of successes = 42, number of trials = 100, p-value = 0.1332
alternative hypothesis: true probability of success is not equal to 0.5
99 percent confidence interval:
0.2943707 0.5534544
sample estimates:
probability of success
0.42
Example: Determining Sample Size
Q. You have been asked to conduct a public opinion survey to determine what percentage of the residents of a
city are in favor of the mayor’s new deficit reduction efforts. You need to have a margin of error of ±3%. How
large must your sample be? (Assume a 95% confidence interval.)
A. The margin of error will be 1.96 √( p̂(1 − p̂)/n ), so our method will be to make a reasonable guess about what p̂ will be, and then determine how large n must be to make the margin of error small enough.

Since SE is largest when p̂ = 0.50, one safe estimate for p̂ is 0.50, and that is what we will use unless we are quite sure that p is close to 0 or 1. (In those latter cases, we will make a best guess, erring on the side of being too close to 0.50 to avoid doing the work of getting a sample much larger than we need.)

We can solve 0.03 = 1.96 √( 0.5 · 0.5/n ) algebraically, or we can play a simple game of "higher and lower" until we get our value of n. (We could also graph 1.96 √( 0.5 · 0.5/n ) and use the graph to estimate n.) Here's the "higher/lower" guessing method.
> 1.96 * sqrt(.5 * .5 / 400)
[1] 0.049
> 1.96 * sqrt(.5 * .5 / 800)
[1] 0.03464823
> 1.96 * sqrt(.5 * .5 / 1200)
[1] 0.02829016
> 1.96 * sqrt(.5 * .5 / 1000)
[1] 0.03099032
> 1.96 * sqrt(.5 * .5 / 1100)
[1] 0.02954811
> 1.96 * sqrt(.5 * .5 / 1050)
[1] 0.03024346
So we need a sample size a bit larger than 1050, but not as large as 1100. We can continue this process to get
a tighter estimate if we like:
> 1.96 * sqrt(.5 * .5 / 1075)
[1] 0.02988972
> 1.96 * sqrt(.5 * .5 / 1065)
[1] 0.03002972
> 1.96 * sqrt(.5 * .5 / 1070)
[1] 0.02995947
> 1.96 * sqrt(.5 * .5 / 1068)
[1] 0.02998751
> 1.96 * sqrt(.5 * .5 / 1067)
[1] 0.03000156
We see that a sample of size 1068 is guaranteed to give us a margin of error of at most 3%.
It isn’t really important to get this down to the nearest whole number, however. Our goal is know roughly what
size sample we need (tens? hundreds? thousands? tens of thousands?). Knowing the answer to 2 significant
figures is usually sufficient for planning purposes.
Side note: The R function uniroot can automate this guessing for us, but it requires a bit of programming to
use it:
> # uniroot finds when a function is 0, so we need to build such a function
> f <- function(n) { 1.96 * sqrt(.5 * .5/ n) - 0.03 }
> # uniroot needs a function and a lower bound and upper bound to search between
> uniroot( f, c(1,50000) )$root
[1] 1067.111
Example
Q. How would things change in the previous problem if
1. We wanted a 98% confidence interval instead of a 95% confidence interval?
2. We wanted a 95% confidence interval with a margin of error at most 0.5%?
3. We wanted a 95% confidence interval with a margin of error at most 0.5% and we are pretty sure that
p < 10%?
A. We’ll use uniroot() here. You should use the “higher-lower” method or algebra and compare your results.
> f1 <- function(n) { qnorm(.99) * sqrt(.5 * .5/ n) - 0.03 }
> uniroot(f1, c(1,50000))$root
[1] 1503.304
> f2 <- function(n) { qnorm(.975) * sqrt(.5 * .5/ n) - 0.005 }
> uniroot(f2, c(1,50000))$root
[1] 38414.59
> f3 <- function(n) { qnorm(.975) * sqrt(.1 * .9/ n) - 0.005 }   # 95% CI, so qnorm(.975)
> uniroot(f3, c(1,50000))$root
[1] 13829.25
11 Inference for the Mean of a Population
The Central Limit Theorem tells us that if we compute the sample mean from a random sample taken from a
population with mean µ and standard deviation σ, then for large enough samples,
    Ȳ ≈ Norm( µ, σ/√n )
All the results in this section are based on this distribution.
11.1 Hypothesis Tests

11.1.1 T Distributions
The test statistic

    z = (ȳ − µ0)/SE ,  where SE = σ/√n ,

has a standard normal distribution, so if we knew σ we would be all set. Since we don't know σ, we will use the standard deviation of our sample instead:

    t = (ȳ − µ0)/SE ,  where SE = s/√n .

This does not have a normal distribution; it has a t-distribution with n − 1 degrees of freedom. The t-distributions look like shorter, fatter versions of normal distributions. They become more and more like normal distributions as the sample size gets larger.
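We can see this by comparing critical values (a quick sketch):

> qt(.975, df=4)     # t critical value for a small sample
[1] 2.776445
> qt(.975, df=24)    # closer to the normal value as df grows
[1] 2.063899
> qnorm(.975)        # normal critical value
[1] 1.959964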
11.1.2 Example: Is the Mean Human Body Temperature Really 98.6?

Let's answer this question by following our usual 4-step method, based on the following sample.
> HumanBodyTemp$temp
 [1]  98.4  98.6  97.8  98.8  97.9  99.0  98.2  98.8  98.8  98.6  98.0  99.2  99.5  99.4
[15]  98.4  99.1  98.4  97.6  97.4  97.5  97.5  98.8  99.0 100.0  98.4
1. State the hypotheses.
• H0 : µ = 98.6
• Ha : µ ≠ 98.6
2. Compute the test statistic.
First we compute the mean and standard deviation of our sample and the (estimated) standard error of
the mean:
> m <- mean(HumanBodyTemp$temp); m
[1] 98.524
> s <- sd(HumanBodyTemp$temp); s
[1] 0.6777905
> SE <- s / sqrt(25); SE
[1] 0.1355581
Now we can compute our test statistic:

    t = (ȳ − µ0)/SE = (98.52 − 98.6)/0.136 = −0.561

> t <- (m - 98.6) / SE; t
[1] -0.5606452
3. Compute the p-value.
We need to compare our value of t with a T(24)-distribution:

> pt(t, df=24)                   # tail distribution
[1] 0.2901181
> 2 * (pt(-abs(t), df=24))       # two-sided test p-value
[1] 0.5802363
> 2 * (1 - pt(abs(t), df=24))    # two-sided test p-value another way
[1] 0.5802363
4. Draw a conclusion.
These data are consistent with a population mean of 98.6 degrees F.
We can automate the entire calculation using t.test():
> t.test(HumanBodyTemp$temp, mu=98.6)
One Sample t-test
data: HumanBodyTemp$temp
t = -0.5606, df = 24, p-value = 0.5802
alternative hypothesis: true mean is not equal to 98.6
95 percent confidence interval:
98.24422 98.80378
sample estimates:
mean of x
98.524
11.2 Confidence Intervals
Confidence intervals can be computed using the formula

    ȳ ± t* SE ,  where SE = s/√n

Notice that this has the familiar form:

    estimate from data ± (critical value)(standard error) .
A confidence interval appears in the output of t.test() above, but we can also calculate it manually:
> t.star <- qt(.975, df=24); t.star
[1] 2.063899
> me <- t.star * SE; me    # margin of error
[1] 0.2797782
> c(m - me, m + me)        # interval
[1] 98.24422 98.80378

11.3 Overview of One-Sample Methods
Usually we will use the following notation (subscripts indicate multiple populations/samples):
• parameters (population)
  ◦ p, proportion (of a categorical variable)
  ◦ µ, mean (of a quantitative variable)
  ◦ σ, standard deviation
• statistics (sample)
  ◦ n, sample size
  ◦ X, count (of a categorical variable)
  ◦ p̂ = X/n, proportion (of a categorical variable)
  ◦ p̃ = (X + 2)/(n + 4), plus-4 proportion (of a categorical variable)
  ◦ x̄, mean (of a quantitative variable)
  ◦ s, standard deviation
• sampling distribution
  ◦ SE, standard error (the standard deviation of the sampling distribution, sometimes estimated)
  ◦ µp̂ and µx̄, the means of the sampling distributions of p̂ and x̄, respectively
The procedures involving the z (normal) and t distributions are all very similar.
• To do a hypothesis test, compute the test statistic as

    t or z = (data value − hypothesis value)/SE ,

and compare with the appropriate distribution (using tables or a computer).
• To compute a confidence interval, determine the critical value for the desired level of confidence (z* or t*); then the confidence interval is

    data value ± (critical value)(SE) .
Note: Each of these procedures is only valid when certain assumptions are met. In particular, remember
• The sample must be an SRS (or something very close to it).
• The sample sizes must be “large enough.”
• Procedures involving means are generally sensitive to outliers, because outliers can have a large effect on
the mean (and standard deviation).
• Procedures involving the t statistic generally assume a population that is normal or nearly normal. These procedures are sensitive to skewness (for small sample sizes) and to outliers (for any sample size).
• You can’t do good statistics from bad data. The margin of error in a confidence interval, for example,
only accounts for random sampling errors, not for errors in experimental design.
12 Comparing Two Means

12.1 The Distinction Between Paired T and 2-sample T
• A paired t-test or interval compares two quantitative variables measured on the same observational units.
◦ Data: two quantitative variables
◦ Example: each swimmer swims in both water and syrup and we compare the speeds. We record
speed in water (quantitative) and speed in syrup (quantitative) for each swimmer.
• A paired t-test is really just a 1-sample t-test after we take our two measurements and combine them into
one, typically by taking the difference (most common) or the ratio.
• A 2-sample t-test or interval looks at one quantitative variable in two populations.
◦ Data: one quantitative variable and one categorical variable (telling which group the subject is in).
◦ Example: Some swimmers swim in water, some swim in syrup. We record the speed (quantitative)
and what they swam in (categorical) for each swimmer.
An advantage of the paired design (when it is available) is the potential to reduce the amount of variability by
comparing individuals to themselves rather than comparing individuals to individuals in an independent sample.
Paired tests can also be used for “matched pairs designs” where the observational units are pairs of people,
animals, etc. For example, we could use a matched pairs design with married couples as observational units.
We would have one quantitative variable for the husband and one for the wife.
In a 2-sample situation, we need two independent samples (or as close as we can get to that).
12.2 Comparing Two Means in R

12.2.1 Using R for Paired T
> react <- read.csv('http://www.calvin.edu/~rpruim/data/calvin/react.csv')
> # Is there improvement from first try to second try with the dominant hand?
> t.test(react$dom1 - react$dom2)
One Sample t-test
data: react$dom1 - react$dom2
t = -1.6024, df = 19, p-value = 0.1256
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-2.3062108 0.3062108
sample estimates:
mean of x
-1
> # This gives slightly nicer labeling of the otherwise identical output
> t.test(react$dom1, react$dom2, paired=TRUE)
Paired t-test
data: react$dom1 and react$dom2
t = -1.6024, df = 19, p-value = 0.1256
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.3062108 0.3062108
sample estimates:
mean of the differences
-1
12.2.2 Using R for 2-Sample T
> # Do males and females differ on the first try with the dominant hand?
> t.test(dom1 ~ Sex.of.Subject, react)
Welch Two Sample t-test
data: dom1 by Sex.of.Subject
t = -0.5098, df = 17.998, p-value = 0.6164
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-4.096683 2.496683
sample estimates:
sample estimates:
mean in group female   mean in group male
                30.0                 30.8
> # Is the first try with the dominant hand better if you get to do the other hand first?
> t.test(dom1 ~ Hand.Pattern, react)
Welch Two Sample t-test
data: dom1 by Hand.Pattern
t = -1.0936, df = 16.521, p-value = 0.2898
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-4.642056 1.477221
sample estimates:
mean in group dominant first
mean in group other first
29.84615
31.42857
12.3 Formulas for 2-Sample T
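These are the usual Welch (unequal-variance) formulas behind the "Welch Two Sample t-test" output above; a sketch following the one-sample pattern from Section 11:

    SE = √( s1²/n1 + s2²/n2 )

    t = (ȳ1 − ȳ2)/SE

The degrees of freedom come from the Welch–Satterthwaite approximation,

    df = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ] ,

which is why the df values in the output above (17.998, 16.521) are not whole numbers.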
Bibliography

[Fis62] R. A. Fisher, Confidence limits for a cross-product ratio, Australian Journal of Statistics 4 (1962).

[Fis70] R. A. Fisher, Statistical methods for research workers, 14th ed., Oliver & Boyd, 1970.