STAT 270
Inference for a Single Sample
Richard Lockhart
Simon Fraser University
Spring 2015 — Surrey
Richard Lockhart (Simon Fraser University) STAT 270 Inference for a Single Sample
Spring 2015 — Surrey
1 / 51
Purposes of These Notes
Describe point estimation, interval estimation, and hypothesis testing.
Describe a random sample.
Define a confidence interval and its level.
Derive some confidence intervals in 1 sample problems: means and
proportions.
Discuss difference between Fisher and Neyman-Pearson.
Purposes Continued
Describe ingredients of Neyman-Pearson hypothesis testing.
Define null and alternative hypotheses.
Define a test statistic, rejection region, level.
Define Type I and Type II errors.
Differentiate between one-tailed and two-tailed problems.
Specific formulas for hypotheses about means and proportions.
Define a P-value.
Understand technical meaning of statistically significant.
List of Statistical Problems
Name most likely value of parameters: point estimation.
Name range of likely values: confidence interval.
Assess evidence against hypothesis about parameters: hypothesis
testing.
Make forecasts, do interpolation.
And more.
Point Estimation
Estimate: number which is our best guess for parameter value.
Estimator: rule for computing estimate from data.
An estimator is a random variable which is a function of the data.
Example. Newcomb & Michelson measured the speed of light in the 1880s.
Made 66 measurements of the time taken by light to travel 7.44373 km.
Measured values are X1, X2, . . . , Xn with n = 66.
Use lower case letters for observed values.
First measurement was 24.828 millionths of a second.
Convert each measurement to a speed of light:
x1 = 10⁹ · 7.44373/24.828 = 2.998119 × 10⁸ m/s.
x2 = 2.998361 × 10⁸ m/s.
Point estimate of speed of light is 2.998336 × 10⁸ m/s.
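As a sketch of this estimation rule in code (the two timing values below are hypothetical stand-ins, since the full list of 66 measurements is not reproduced in these notes):

```python
# Convert timing measurements (microseconds over 7.44373 km) to speeds,
# then average.  Only two hypothetical times are shown here; Newcomb's
# full 66 values are not listed in these notes.
distance_m = 7443.73                # 7.44373 km in metres
times_us = [24.828, 24.826]         # times in microseconds (hypothetical)

speeds = [distance_m / (t * 1e-6) for t in times_us]  # metres per second
point_estimate = sum(speeds) / len(speeds)            # the sample mean
```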
Estimators
We were using the rule: average the data.
So our estimator was
X̄ = (X1 + · · · + Xn)/n.
Model for measurement error.
Several parts: X1 , . . . , Xn independent and identically distributed.
Let µ = E(Xi ) be the population mean.
Long run average measurement.
Population SD is σ.
Speed of light is c — standard notation.
Relate µ to c:
µ = c + bias
Often assume bias is 0.
Newcomb data
[Figure: plot of Newcomb's 66 measurements, not reproduced here.]
Point Estimation
Have data and model for population.
Model describes population in terms of some parameters.
Binomial(n, α) model: α is a parameter.
Sample from a N(µ, σ²) model. Parameters are µ and σ.
Sample from the Gamma density
f(x; α, β) = (1/(βΓ(α))) (x/β)^(α−1) exp(−x/β),  x > 0.
Parameters are α and β.
Generic notation: θ.
Standard Errors
Estimates should always be accompanied by some assessment of their
likely accuracy.
For unbiased estimators with approximately normal sampling
distributions we use the Standard Error.
The SE of an estimator θ̂ of θ is
SE = √Var(θ̂).
That is: Standard Error of an estimator is another name for its SD.
The standard error of α̂ in the Binomial(n, α) problem is
√(α(1 − α)/n).
The SE of X̄ is
σ/√n.
Estimated Standard Errors
What accompanies our point estimate is a number, not a formula.
The SE is usually a formula with unknown parameters in it.
We estimate the SE by plugging in estimates of the parameters.
The SE for α̂ is √(α(1 − α)/n), so the Estimated SE is
√(α̂(1 − α̂)/n).
And you plug in data to get a number to put in your report.
We use Standard Errors in Confidence Intervals.
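A minimal sketch of that plug-in step for the Binomial proportion, with made-up data:

```python
import math

# Plug-in (estimated) standard error for a sample proportion.
x, n = 40, 100                    # hypothetical: 40 successes in 100 trials
alpha_hat = x / n                 # point estimate of the proportion
est_se = math.sqrt(alpha_hat * (1 - alpha_hat) / n)  # number for the report
```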
Confidence Interval Definition
A level β confidence interval for a parameter θ is the interval [L, U]
between two statistics L and U such that
P(L ≤ θ ≤ U) ≥ β
for all possible parameter values.
We prefer to replace ≥ by = or ≈.
We use CIs by:
◮ Deciding how to do data analysis before gathering data (decide on formulas for L and U before getting data).
◮ Getting the data; computing the observed values of L and U, say l and u.
◮ Saying 'I am 100β% confident that θ is in the interval [l, u]'.
1 − β is the error rate or non-coverage rate.
Populations and Samples
Meaning of a sample from a population.
Population is group we want to find out about.
Can be real: all Canadian adults of working age.
Can be ‘conceptual’: all possible outcomes of some experiment.
Populations often thought of as populations of numbers.
Conceptual populations often described by probability density or pmf.
Examples: heights of adults. Think of population as being normally
distributed with mean µ and sd σ.
Example: repeatedly measure speed of light in a vacuum. Each
measurement is 'truth' plus 'measurement error'. Population of errors
described by a density: N(0, σ²) perhaps.
Populations and Samples
Sample is part of the group for which data is obtained.
Use n for number of items sampled.
Call it a “single sample” problem if we measure 1 number for each
item sampled.
Call measurements X1 , . . . , Xn .
Random sampling: fixed number of members of group selected by
random mechanism playing no favourites.
With replacement: pick one at a time. On the i-th selection each member
of the population has the same chance of being drawn, even if that member
has been picked before.
Usual model for conceptual populations.
Without replacement: pick one at a time. On the i-th selection each
member of the population who has not been drawn yet has the same chance
of being drawn.
Common model (sampling method) for real populations.
Neither is usual selection method in real surveys.
Simplest Derivation of Confidence Interval
Mathematical model for a single sample: X1 , . . . , Xn are independent
and identically distributed. Write ‘iid’.
Simplest populations to describe – approximately normal, like heights.
Suppose X1 , . . . , Xn are independent N(µ, σ 2 ).
Suppose (quite unrealistically) that σ is known.
I now show you a 95% confidence interval for µ, based on the data.
Consider the random variable
Z = (X̄ − µ)/(σ/√n).
Then, regardless of what µ is, Z has a standard normal distribution.
So
P(a ≤ Z ≤ b)
does not depend on µ.
No matter what µ is,
P(−1.96 ≤ Z ≤ 1.96) = ∫_{−1.96}^{1.96} φ(z) dz = 0.95.
The Confidence Interval
The event −1.96 ≤ Z ≤ 1.96 can be rewritten in a number of ways.
It is the event
−1.96 ≤ (X̄ − µ)/(σ/√n) ≤ 1.96.
Multiply by σ/√n (which is positive):
−1.96 σ/√n ≤ X̄ − µ ≤ 1.96 σ/√n.
Notice this is still the same event.
Rearrange the second inequality:
L ≡ X̄ − 1.96 σ/√n ≤ µ.
Rearrange the first inequality:
µ ≤ X̄ + 1.96 σ/√n ≡ U.
So no matter what µ is:
P(L ≤ µ ≤ U) = 0.95.
An example with data
Simon Newcomb made 66 measurements of time taken by light to
travel 7.44373 km.
I round off a bit from real data.
Convert to list of 66 speeds.
Sample mean is 299,833,553 m/s.
Temporarily assume σ = 130, 000 m/s is known.
95% confidence interval is
299,833,553 − 1.96 × 130,000/√66 to 299,833,553 + 1.96 × 130,000/√66 m/s.
We say we are 95% confident that the speed of light is between
299,802,189 and 299,864,917 m/s.
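The interval arithmetic above can be sketched as:

```python
import math

# 95% CI for the mean with sigma temporarily assumed known.
xbar = 299_833_553       # sample mean speed, m/s
sigma = 130_000          # assumed-known population SD, m/s
n = 66

half_width = 1.96 * sigma / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
```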
Caveats and improvements
More digits than is wise, but the 6 leading digits are worth reporting.
The quantity
130,000/√66 m/s
is called the standard error of the sample mean.
Pretty well everything is an approximation so many data analysts
round 1.96 to 2.
We are only pretending we know σ.
Usually we have to use the data to tell us about σ as well as about µ.
Notation: define upper α critical point of normal by:
P(N(0, 1) > zα ) = α.
So z0.025 = 1.96.
The role of normality
We assumed initially that the population we are sampling is itself
normally distributed.
But our basic probability was:
P(−1.96 ≤ (X̄ − µ)/(σ/√n) ≤ 1.96) = ∫_{−1.96}^{1.96} φ(z) dz = 0.95.
Accuracy depends on sampling distribution of X̄ .
Central limit theorem says: if n large enough this is normal for
(nearly) any population distribution.
More skewness means larger n needed.
Heavy tails mean larger n needed.
We often use rule of thumb: n ≥ 30.
Message: use the same formula if n is large:
X̄ ± zα/2 σ/√n.
Unknown SD, lots of data
Actually Newcomb did not know σ at all.
He measured s, the SD of his 66 measurements.
In fact s = 130, 026 m/s.
When n is large s will be close to σ so
(X̄ − µ)/(σ/√n) ≈ (X̄ − µ)/(s/√n).
So just replace σ by s in confidence interval.
We are 90% confident that the speed of light is in the range
299,833,553 − 1.645 × 130,026/√66 to 299,833,553 + 1.645 × 130,026/√66.
The Estimated Standard Error is
130,026/√66.
It estimates the Standard Deviation of X̄.
Notice use of z0.05 = 1.645 not z0.025 = 1.96.
Small samples – Student’s t distribution
How good is the approximation?
We estimated the SD of X̄ using the same data from which we computed the
mean.
So we should use something a bit bigger than the normal critical point.
For 66 observations and a 95% interval, that "bit bigger" than 1.96 is 1.997.
Correct critical point comes from Student’s t distribution.
More probability – small samples
When sampling from a normally distributed population we have:
P((X̄ − µ)/(S/√n) ≤ x) = ∫_{−∞}^{x} fT,n−1(u) du
where fT,n−1 is Student's t-density "with n − 1 degrees of freedom".
To be precise (but this density is not part of this course):
fT,ν(u) = Γ((ν + 1)/2)/(√(πν) Γ(ν/2)) · (1 + u²/ν)^{−(ν+1)/2}.
As ν → ∞ this converges to the standard normal density.
The curve looks a lot like the normal but with heavier tails.
Specific scientific settings
Specific settings have specific formulas for Estimated SE.
Scenario 1: sample from normal population, σ (population SD)
known, CI for population mean, µ.
Interval (already done) is
X̄ ± zα/2 σ/√n.
Scenario 2: sample from general population, σ (population SD)
unknown, sample size n large, CI for population mean, µ.
Interval (already done) is
X̄ ± zα/2 s/√n.
Scenario 3: sample from normal population, σ (population SD)
unknown, sample size n anything, CI for population mean, µ.
Interval is
X̄ ± tα/2,n−1 s/√n.
Multipliers tα/2,n−1 from other table at back of text.
Statistical packages always do Scenario 3 arithmetic.
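A sketch of the Scenario 3 computation. Python's standard library has no Student-t quantile function, so the multiplier tα/2,n−1 is passed in by hand (here t0.025,4 = 2.776 from a t table); the sample is hypothetical.

```python
import math
import statistics

# Scenario 3 interval: X̄ ± t_{α/2, n−1} s/√n.  The t multiplier comes
# from a table since the standard library cannot compute t quantiles.
def t_interval(data, t_crit):
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)          # sample SD (divisor n − 1)
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half

# hypothetical sample of n = 5; t_{0.025,4} = 2.776
lo, hi = t_interval([9.8, 10.1, 10.0, 9.9, 10.2], t_crit=2.776)
```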
Confidence intervals for proportions
Common scientific framework
Sequence of Bernoulli trials.
Number n fixed, p is “Success Probability” on each trial.
X is the number of successes.
Goal is a confidence interval for proportions.
Based on Central Limit Theorem.
Using the CLT
Recall p̂ = X /n and X = X1 + · · · + Xn ; each Xi is Bernoulli.
So p̂ is a sample mean of the Xi .
Population mean is µ = E(Xi ) = p.
Population variance is σ² = Var(Xi) = p(1 − p).
So the SE of p̂ is σ/√n = √(p(1 − p)/n).
Estimated SE is usually taken to be √(p̂(1 − p̂)/n).
CLT says
(p̂ − p)/√(p(1 − p)/n) ⇒ N(0, 1).
Using the CLT 2
Law of large numbers says:
lim_{n→∞} p̂ = p.
So it is also true that
(p̂ − p)/√(p̂(1 − p̂)/n) ⇒ N(0, 1).
Result is:
lim_{n→∞} P(−zα/2 ≤ (p̂ − p)/√(p̂(1 − p̂)/n) ≤ zα/2) = ∫_{−zα/2}^{zα/2} φ(z) dz = 1 − α.
Leads to approximate level 1 − α confidence interval.
Solving inequalities to get limits
Temporary notation: A is the event
−zα/2 ≤ (p̂ − p)/√(p̂(1 − p̂)/n) ≤ zα/2.
Solve the inequalities in A to isolate p: multiply through by the SE:
A = {−zα/2 √(p̂(1 − p̂)/n) ≤ p̂ − p ≤ zα/2 √(p̂(1 − p̂)/n)}.
Rearrange each individual inequality: the right hand one gives
p̂ − zα/2 √(p̂(1 − p̂)/n) ≤ p.
Similarly the left inequality gives
p ≤ p̂ + zα/2 √(p̂(1 − p̂)/n).
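Putting the two limits together, a sketch of the resulting approximate interval (the data here are hypothetical):

```python
import math
from statistics import NormalDist

# Approximate level 1 − α CI for p from the rearranged inequalities.
def proportion_ci(x, n, level=0.95):
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)      # z_{α/2}
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = proportion_ci(x=55, n=100)    # hypothetical: 55 successes in 100
```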
General points
Most essential: the meaning of confidence:
If we analyze 100 data sets and compute 100 (exact) confidence
intervals at the 95% level we expect that some of the 100 intervals
will contain the truth and some won’t.
The expected number which contain the truth is 95.
The number which contain the truth is random.
Rule of thumb: if np > 10 and n(1 − p) > 10 then normal approx is
fine.
You don’t know p but you use p̂ in the rule of thumb.
Text uses 5 instead of 10. That is ok, too.
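The meaning of confidence can be checked by simulation: repeatedly draw samples from a known normal population, form the known-σ interval each time, and count coverage. A sketch (all numbers hypothetical):

```python
import math
import random

# Simulate coverage of the 95% known-sigma interval.  The true mean is
# known here, so we can count how many intervals contain it.
random.seed(2015)
mu, sigma, n, reps = 10.0, 2.0, 30, 2000
covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    half = 1.96 * sigma / math.sqrt(n)
    if xbar - half <= mu <= xbar + half:
        covered += 1
coverage = covered / reps   # close to 0.95, but itself random
```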
A catalogue of confidence intervals
Intervals for population proportions; done earlier.
Intervals for population means.
◮ Samples from Normal populations with known σ.
◮ Samples from Normal populations with unknown σ.
◮ Large samples from more general populations.
Confidence statements, normal populations
Normal sample, σ known:
P(−z ≤ (X̄ − µ)/(σ/√n) ≤ z) = Φ(z) − Φ(−z)
so if we find z so that Φ(z) − Φ(−z) = 1 − α then
X̄ − zσ/√n to X̄ + zσ/√n
is an exact level 1 − α confidence interval for µ.
Value of z is denoted zα/2 because
P(N(0, 1) > z) = α/2 = P(N(0, 1) < −z)
in this case. We call zγ the upper tail γ critical point.
Confidence statements, normal populations
Normal sample, σ unknown:
P(−t ≤ (X̄ − µ)/(S/√n) ≤ t) = ∫_{−t}^{t} fT,n−1(u) du
so if we find t so that
∫_{−t}^{t} fT,n−1(u) du = 1 − α then
X̄ − tS/√n to X̄ + tS/√n
is an exact level 1 − α confidence interval for µ.
Value of t is denoted tα/2,n−1 because
P(T > t) = α/2 = P(T < −t)
Again tγ,ν is notation for the upper γ critical point of a Student's
t-distribution on ν degrees of freedom.
Confidence statements, large samples, general populations
Sample from a population with mean µ and unknown SD σ:
P(−t ≤ (X̄ − µ)/(S/√n) ≤ t) ≈ ∫_{−t}^{t} fT,n−1(u) du ≈ Φ(t) − Φ(−t)
so
X̄ − tα/2,n−1 S/√n to X̄ + tα/2,n−1 S/√n
and
X̄ − zα/2 S/√n to X̄ + zα/2 S/√n
are both approximate large sample level 1 − α confidence intervals for µ.
Very rarely: σ is known so replace S by σ and use zα/2 .
Confidence statements, large samples, general populations
Books traditionally recommend z for n ≥ 30 or n ≥ 40 or some such
rule of thumb.
BUT I say just use t; software always does and the t approximation is
generally better.
Rule of thumb comes from DARK AGES before computers when
people used the tables in the book.
Those are for statistics exams, nothing else.
Typical hypothesis testing science questions
New drug for blood pressure. Get 200 patients. Pick 100 at random
to get new drug; others get old.
Choose between two possibilities: drug reduces BP or doesn’t.
Speed of light in vacuum is known. Measure speed of neutrinos. Is
speed equal to speed of light or not?
Are far away galaxies moving away from earth faster than nearby ones
or not?
Is speed of light the same in north-south and east-west directions?
Does some intervention program in prison reduce recidivism or not?
Common feature: choose between two scientific alternatives.
Methodology
Conduct experiment in which response (BP, speed of neutrinos, two
light speeds, recidivism) is measured.
Formulate statistical models: data are like a sample from a normal
population; number of patients surviving has binomial distribution;
north south speeds and east west speeds like samples from 2
populations.
Phrase the scientific alternatives as alternatives about the parameter
values in the model: mean north south speed equals mean east west
speed OR not; probability of re-offense in treatment group equals
probability of re-offense in control group OR not . . .
Develop a rule to make a choice between two alternatives.
Understand error rates.
Apply rule to data.
Details follow.
Example 1: Measurement bias
Newcomb makes n = 66 measurements of time for light to travel
7.44373 km.
Modern value for that time is 24.82961 microseconds.
Is Newcomb biased?
Model: each measurement is like draw from a population of possible
measurements. Data is X1 , . . . , Xn sample from population with mean
µ and SD σ.
No bias translates to µ = 24.82961 microseconds.
We say our null hypothesis, H0 , is µ = 24.82961.
Our alternative hypothesis, Ha, becomes µ ≠ 24.82961.
H0 is pronounced “H nought” (“H not”).
The test statistic
To make the decision we find a test statistic, T , which is function of
data.
It will depend on the number 24.82961 as well.
It should tend to be big if the alternative hypothesis is right.
It should NOT tend to be big if the null hypothesis is right.
We will calculate T and choose alternative if it is “too big”.
First obvious suggestion: T = |X̄ − 24.82961|.
How big is too big? Compare T to the variability of X̄ − 24.82961.
Estimate that variability using the Estimated Standard Error s/√n of X̄.
So change to
T = |X̄ − µ0|/(s/√n).
How big is too big?
Two big approaches – assess evidence versus make firm decision.
Fisher: summarize size of T by a P-value and interpret this P value
as strength of evidence against null hypothesis.
Formal decision making: select rejection region. If T lands in
rejection region we reject the null hypothesis and behave as if
alternative hypothesis is true.
Two approaches very closely connected.
Neyman-Pearson approach first — formal decision making.
Recognize two kinds of errors.
Type I error: Newcomb has no bias but we say he did. Null
hypothesis is true but we say it is false.
Type II error: Newcomb was biased but we miss that fact. Null
hypothesis is false but we decide it is true.
Language used in book: reject null hypothesis or fail to reject null
hypothesis.
Other places: “fail to reject” null hypothesis is called “accept null
hypothesis”. You behave as if null is true.
Making a decision
For Newcomb our rejection region is
T = |X̄ − µ0|/(s/√n) > c.
c is the critical point.
How do we select c?
Neyman-Pearson method:
Choose c to control the Type I error rate.
Select a pre-specified tolerable error rate, usually 5%. Call this rate α.
Find c so that
PH0(T > c) = α.
PH0 is notation to show that we compute this chance assuming that
the null hypothesis is true.
Specific scientific settings
Scenario 1: sample from normal population, σ (population SD)
known, hypothesis tests for population mean, µ.
Two sided alternative: H0: µ = µ0, Ha: µ ≠ µ0.
T = |X̄ − µ0|/(σ/√n) and c = zα/2.
One sided alternative: H0: µ = µ0, Ha: µ > µ0, or H0: µ ≤ µ0,
Ha: µ > µ0.
T = (X̄ − µ0)/(σ/√n) and c = zα.
I expect you to know what to do if inequalities reversed.
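A sketch of the Scenario 1 decision rule (numbers hypothetical):

```python
import math
from statistics import NormalDist

# One-sample z test with σ known: compute T and compare to the
# critical point z_{α/2} (two-sided) or z_α (one-sided).
def z_test(xbar, mu0, sigma, n, alpha=0.05, two_sided=True):
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    t_stat = abs(z) if two_sided else z
    tail = alpha / 2 if two_sided else alpha
    c = NormalDist().inv_cdf(1 - tail)
    return t_stat, c, t_stat > c        # statistic, critical point, reject?

t_stat, c, reject = z_test(xbar=10.5, mu0=10.0, sigma=2.0, n=64)
```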
Scenario 2, σ unknown
Scenario 2: sample from general population, σ (population SD)
unknown, sample size n large, hypothesis tests for population mean,
µ.
Two sided alternative: H0: µ = µ0, Ha: µ ≠ µ0.
T = |X̄ − µ0|/(s/√n) and c = tα/2,n−1.
One sided alternative: H0: µ = µ0, Ha: µ > µ0, or H0: µ ≤ µ0,
Ha: µ > µ0.
T = (X̄ − µ0)/(s/√n) and c = tα,n−1.
Small samples
Scenario 3: sample from normal population, σ (population SD)
unknown, sample size n anything, CI for population mean, µ.
Use same method as Scenario 2.
But now the method is exact.
Without the normal population assumption we are relying on the CLT
and LLN and Slutsky’s theorem.
Hypothesis tests for proportions
Common scientific framework
Sequence of Bernoulli trials.
Number n fixed, p is “Success Probability” on each trial.
X is the number of successes.
Goal is a hypothesis test for proportions.
Method based on application of the Central Limit Theorem.
Same list of null / alternative choices: H0 :p = p0 or H0 :p ≤ p0
H0 :p = p0 allows either 1 or 2 sided alternatives.
Using the CLT (repeat from CI notes!)
Recall p̂ = X /n and X = X1 + · · · + Xn ; each Xi is Bernoulli.
So p̂ is a sample mean of the Xi .
Population mean is µ = E(Xi ) = p.
Population variance is σ² = Var(Xi) = p(1 − p).
So the SE of p̂ is σ/√n = √(p(1 − p)/n).
CLT says: if p = p0 then
(p̂ − p0)/√(p0(1 − p0)/n) ⇒ N(0, 1).
Using the CLT 2
Our test statistic is either
T = (p̂ − p0)/√(p0(1 − p0)/n)
for Ha: p > p0, or
T = |p̂ − p0|/√(p0(1 − p0)/n)
for Ha: p ≠ p0.
Critical value c is zα/2 for the two-sided alternative or zα for the
one-sided alternative.
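A sketch of the proportion test statistic, with hypothetical data; note the SE uses the null value p0, not p̂:

```python
import math

# Test statistic for H0: p = p0, standardized with the null SE.
def prop_test_stat(x, n, p0, two_sided=True):
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)   # SE computed under the null
    z = (p_hat - p0) / se0
    return abs(z) if two_sided else z

T = prop_test_stat(x=60, n=100, p0=0.5)  # compare to z_{0.025} = 1.96
```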
Some scientific examples
Cadmium in a lake example.
n = 17 measurements of cadmium concentration. x̄ = 211, s = 15,
units are parts per million or some such. (Important, but these
numbers are made up.)
Scientific question: decide between two possibilities – concentration
below 200 vs above 200.
Typical one-sided situation.
Need to connect data to scientific question of interest.
Introduce notation: X1 , . . . , Xn are the 17 measurements.
Must assume that they are gathered and measured in such a way that
they are a sample of size 17 from a population whose mean µ is
“concentration of cadmium in the lake”
Defining that last phrase is a scientific problem.
Issues to consider: is the whole lake sampled? are the measurements
biased? are the measurement errors independent?
Assume issues dealt with.
Cadmium
For first pass I consider BOTH possible H0 s.
For H0: µ ≤ 200 use
T = (X̄ − 200)/(s/√n)
and reject if T > t0.05,n−1 = 1.75. (Notice the rejection region.)
Notice use of the borderline value, 200, in T.
Plug in values and find
T = (211 − 200)/(15/√17) = 3.02.
Since 3.02 > 1.75 we reject the hypothesis that µ ≤ 200.
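The arithmetic of the cadmium test, as a sketch:

```python
import math

# Cadmium test statistic: T = (x̄ − 200)/(s/√n) with the slide's numbers.
xbar, mu0, s, n = 211, 200, 15, 17
T = (xbar - mu0) / (s / math.sqrt(n))
reject = T > 1.75          # t_{0.05,16} from a t table
```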
P-values
BUT: in fact we can say a bit more. This number 3.02 is quite a bit
bigger than 1.75.
If we had used α = 0.01 instead of 0.05 our rejection region would be
T > t0.01,16 = 2.58
and we would still have rejected.
In fact we would reject for any α for which
tα,16 < 3.02.
The smallest possible α is when
tα,16 = 3.02.
Or
P(T16 ≤ 3.02) = 1 − α = 1 − P(T16 ≥ 3.02).
This α is Fisher's P-value.
Compute P by finding the area to the right of the observed statistic
under the null density of the statistic.
P-values
Reject H0 at level α if P < α.
If H0 is right then P has a Uniform[0,1] distribution.
Interpret P as measure of evidence strength – smaller P, stronger
evidence against H0 .
Call evidence statistically significant if P < 0.05.
Highly statistically significant and very highly statistically significant
are often used for smaller thresholds like 0.01 or 0.001.
Some statistics packages label P-values with 1 star for P < 0.05, 2
stars for P < 0.01 and 3 stars for P < 0.001.
These are all simply conventions.
For two tailed problems: P is twice the area in the small tail.
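A sketch of the P-value computation. The slides use the Student-t null distribution; the standard library has only the normal CDF, so this uses the large-sample normal approximation instead:

```python
from statistics import NormalDist

# P-value from an observed test statistic, normal approximation to the
# null distribution (the t-based value would be slightly larger).
def p_value(t_obs, two_sided=True):
    upper_tail = 1 - NormalDist().cdf(abs(t_obs))
    return 2 * upper_tail if two_sided else upper_tail

p = p_value(3.02, two_sided=False)   # one-tailed, normal approximation
```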
Example from Devore, Page 342 Q 65
Sample of n = 50 lens thicknesses. Given x̄ = 3.05 and s = 0.34 (all
in mm).
Desired mean thickness 3.20 mm.
Do “the data strongly suggest that the true average thickness of such
lenses is something other than what is desired”?
Clear two sided alternative. Null must be H0 :µ = 3.20.
Test statistic is
T = |3.05 − 3.2|/(0.34/√50) = 3.12.
P-value? Twice the area to the right of 3.12 under t on 49 df.
P = 0.003 which is very significant. (Table A.8 gives P in the range
0.002 to 0.004.)
So we see very strong evidence against the assertion that the true
average thickness is 3.2mm.
We would reject null at α = 0.05 or even α = 0.01.
Error rates and sample size calculations
Type I error: incorrectly reject H0 .
Type II error: incorrectly fail to reject H0 .
Type I error rate is α; determined in advance.
Type II error rate is β – depends on what true parameter value is.
Can sometimes compute β = P(don’t reject) as a function.
Answer will depend on n.
Can then sometimes choose n to give a suitable Type II error rate.
But often n depends on unknown parameters like σ.
So we design for some hoped for value of σ.
Sample size, Z test, 1 sided
Imagine testing µ ≤ µ0 against µ > µ0 .
Assume that σ is known.
Fix some α like 0.05.
So reject if
Z = (X̄ − µ0)/(σ/√n) > zα.
Compute β:
β = P((X̄ − µ0)/(σ/√n) < zα).
For µ > µ0 we make a Type II error if Z < zα.
Centre on the correct µ:
β = P((X̄ − µ)/(σ/√n) + (µ − µ0)/(σ/√n) < zα).
This is the area to the left of zα − (µ − µ0)/(σ/√n):
β = Φ(zα − (µ − µ0)/(σ/√n)).
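The final formula can be sketched directly (values hypothetical):

```python
import math
from statistics import NormalDist

# β(µ) = Φ(z_α − (µ − µ0)/(σ/√n)) for the one-sided z test.
def type_two_rate(mu, mu0, sigma, n, alpha=0.05):
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    shift = (mu - mu0) / (sigma / math.sqrt(n))
    return NormalDist().cdf(z_alpha - shift)

beta = type_two_rate(mu=10.5, mu0=10.0, sigma=2.0, n=64)  # hypothetical
```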