Distribution of Sample Means
CI for a Single Mean
Hypothesis Test for a Single Mean
STAT 113
Normal-Based Intervals and Tests for a Sample Mean
Colin Reimer Dawson
Oberlin College
November 8, 2016
Cases to Address
We will need standard errors to do CIs and tests for the following
parameters:
1. Single Proportion (last time)
2. Single Mean (today)
3. Difference of Proportions (tomorrow)
4. Difference of Means (tomorrow/Friday)
5. Mean of Differences (new! next week)
Distribution of p̂
When the population proportion is p and the samples are of size n,
the sampling distribution of p̂ has mean p and standard deviation
(standard error)
\[ SE_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \]
It is also approximately normal, when samples are large enough, and
p isn’t too extreme. Rough rule:
np ≥ 10 AND n(1 − p) ≥ 10
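As a minimal sketch of the standard error formula and the rough rule above, using base R only. The values p = 0.3 and n = 100 are illustrative assumptions, not course data.

```r
# Hypothetical values chosen only for illustration
p <- 0.3   # assumed population proportion
n <- 100   # assumed sample size

# Standard error of p-hat: sqrt(p(1 - p) / n)
se.phat <- sqrt(p * (1 - p) / n)

# Rough rule for approximate normality: np >= 10 AND n(1 - p) >= 10
normal.ok <- (n * p >= 10) & (n * (1 - p) >= 10)

se.phat
normal.ok
```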
CI Summary: Single Proportion
To compute a confidence interval for a proportion when the
bootstrap distribution for p̂ is approximately Normal (i.e., np̂ and
n(1 − p̂) ≥ 10), use
\[ \hat{p} \pm Z^* \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
where \(Z^*\) is the Z-score of the endpoint appropriate for the
confidence level, computed from a standard normal (N(0, 1)).
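The interval above can be sketched in base R as follows. The counts (52 successes out of 120) are hypothetical, picked only so that the np̂ and n(1 − p̂) checks pass.

```r
# Hypothetical data: 52 "successes" in a sample of 120 (illustrative only)
n <- 120
phat <- 52 / n

# The rough normality check from the summary above
stopifnot(n * phat >= 10, n * (1 - phat) >= 10)

conf.level <- 0.95
zstar <- qnorm(1 - (1 - conf.level) / 2)   # z-score of the upper endpoint

se.phat <- sqrt(phat * (1 - phat) / n)     # estimated standard error
CI.95.prop <- c(phat - zstar * se.phat, phat + zstar * se.phat)
CI.95.prop
```

Changing `conf.level` to 0.99 gives the endpoints used in the 99% examples, `qnorm(0.005)` and `qnorm(0.995)`.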
P-values for a sample proportion from a Standard Normal
Computing P-values when the null sampling distribution is
approximately Normal (i.e., np0 and n(1 − p0) ≥ 10) is the
reverse process:
1. Convert p̂ to a z-score within the theoretical distribution.
\[ Z_{\text{observed}} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} \]
2. Find the relevant area beyond Z_observed using a Standard
Normal.
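The two steps above might look like this in base R. The data (58 successes in n = 100), the null value p0 = 0.5, and the two-tailed alternative are all assumptions made for illustration.

```r
# Hypothetical data: 58 "successes" in n = 100, testing H0: p = 0.5
n <- 100
phat <- 58 / n
p0 <- 0.5

# Rough normality rule, checked under the null value p0
stopifnot(n * p0 >= 10, n * (1 - p0) >= 10)

# Step 1: z-score of p-hat within the null sampling distribution
se.null <- sqrt(p0 * (1 - p0) / n)
z.observed <- (phat - p0) / se.null

# Step 2: area beyond z.observed (two-tailed alternative assumed here)
p.value <- 2 * pnorm(-abs(z.observed))
c(z = z.observed, p.value = p.value)
```

For a one-tailed alternative, only the area on the relevant side of `z.observed` would be used.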
Distribution of Sample Means
• Central Limit Theorem: Sampling Distribution of x̄ is
approximately Normal, for “sufficiently large” samples, or when
the population distribution is Normal.
• As the sample size n goes up, the standard error goes ______.
• Pairs: What effect do you expect the population standard
deviation to have on the standard error of the distribution of
sample means? Why?
Distribution of x̄
When the population mean is µ, the population standard deviation
is σ, and the samples are of size n, the sampling distribution of x̄
has mean µ and standard deviation (standard error)
\[ SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
It is also approximately Normal, when samples are large enough,
OR if the population distribution is approximately Normal. The
farther the population is from Normal, the bigger the sample needs
to be, but we can roughly use n ≥ 30.
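A small simulation can make the formula concrete. This is only a sketch: the Exponential population (rate 1/2, so µ = 2 and σ = 2), the sample size n = 40, and the seed are arbitrary choices, not part of the course materials.

```r
set.seed(113)                      # arbitrary seed, for reproducibility

n <- 40                            # sample size
sigma <- 2                         # SD of an Exponential population with rate 1/2

# Draw 10,000 samples of size n and record each sample mean
xbars <- replicate(10000, mean(rexp(n, rate = 1 / 2)))

se.simulated <- sd(xbars)          # SD of the simulated sample means
se.theory <- sigma / sqrt(n)       # the formula: sigma / sqrt(n)
c(simulated = se.simulated, theory = se.theory)
```

Even though the population is strongly skewed, `hist(xbars)` should look roughly bell-shaped at this sample size.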
CI Summary: Single Mean
To compute a confidence interval for a mean when the sampling
distribution for x̄ is approximately Normal (i.e., Normal population,
or “large” n), use
\[ \bar{x} \pm Z^* \cdot \frac{\sigma}{\sqrt{n}} \]
where \(Z^*\) is the Z-score of the endpoint appropriate for the
confidence level, computed from a standard normal (N(0, 1)).
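As a sketch of this formula when σ is treated as known, in base R. The numbers (x̄ = 50, σ = 10, n = 64) are made up for illustration.

```r
# Hypothetical summary values, with sigma treated as known
xbar <- 50
sigma <- 10
n <- 64

zstar <- qnorm(0.995)              # endpoint z-score for a 99% interval
se <- sigma / sqrt(n)              # theoretical standard error

CI.99.z <- xbar + c(-1, 1) * zstar * se
CI.99.z
```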
Example: Mean Atlanta Commute Time, “Pure” Bootstrap CI
library("mosaic"); library("Lock5Data")
data(CommuteAtlanta)
# resample the data 10,000 times, recording the mean of each resample
Bootstrap.means <- do(10000) *
  mean(~Time, data = resample(CommuteAtlanta))
# the middle 99% of the bootstrap means gives the interval
CI.99.boot <-
  quantile(~mean, data = Bootstrap.means, prob = c(0.005, 0.995))
CI.99.boot
##     0.5%    99.5%
## 26.78194 31.63203
Commute Time: Pure Bootstrap CI
dotPlot(~mean, data = Bootstrap.means, width = 0.1, cex = 20,
groups = mean >= CI.99.boot[1] & mean <= CI.99.boot[2])
[Figure: dot plot of the 10,000 bootstrap means (x-axis: mean, about 26 to 32; y-axis: Count, 0 to 500), with dots grouped by whether they fall inside the 99% bootstrap CI.]
Example: Mean Atlanta Commute Time CI, Normal w/ Bootstrap SE
zstar.99.lower <- qnorm(0.005) # get z-scores of the endpoints
zstar.99.upper <- qnorm(0.995) # (qnorm rather than xqnorm: no extra output)
xbar <- mean(~Time, data = CommuteAtlanta) # sample mean
se.boot <- sd(~mean, data = Bootstrap.means) # bootstrap se
CI.99.normal.boot.se <-
  c(xbar + zstar.99.lower * se.boot, xbar + zstar.99.upper * se.boot)
CI.99.normal.boot.se
## [1] 26.72032 31.49968
Commute Time CI, Normal w/ Bootstrap SE
[Figure: Normal density curve for the sample mean (x-axis: Sample mean, 26 to 32; y-axis: Normal Density, 0.0 to 0.4).]
Example: Mean Atlanta Commute Time CI, Normal w/ Theoretical SE
n <- nrow(CommuteAtlanta) # get the sample size
xbar <- mean(~Time, data = CommuteAtlanta) # get the sample mean
zstar.99.lower <- qnorm(0.005) # get the z-scores for the endpoints
zstar.99.upper <- qnorm(0.995) # (qnorm rather than xqnorm: no extra output)
se.theory <- sigma / sqrt(n) # calculate the SE using the formula
CI.99.normal.theory.se <-
  c(xbar + zstar.99.lower * se.theory, xbar + zstar.99.upper * se.theory)
CI.99.normal.theory.se
Wait, where do we get σ?
Using s instead of σ
• We only have s, the sample standard deviation; not σ, the
population standard deviation.
• We can approximate SE with s/√n, but need to account for the
fact that s itself is an estimate (and differs between samples).
• “95% of sample means are within 2SE of µ” no longer
accurate: the percentage is less than this.
• How much less depends on how good an estimate s is of σ
(i.e., depends on n).
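The loss of coverage described in the bullets above can be checked with a short simulation. This is a sketch under assumed settings: a standard Normal population, small samples of size n = 5, and an arbitrary seed.

```r
set.seed(113)                       # arbitrary seed, for reproducibility

mu <- 0; sigma <- 1; n <- 5         # small samples from a Normal population
reps <- 20000

# For each simulated sample, record whether x-bar lands within 2 SE of mu,
# once using the true sigma and once using the sample SD s
covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  err <- abs(mean(x) - mu)
  c(known.sigma = err <= 2 * sigma / sqrt(n),
    estimated.s = err <= 2 * sd(x) / sqrt(n))
})

coverage <- rowMeans(covered)       # proportion within 2 SE, by method
coverage
```

Re-running with a larger `n` should bring the two proportions closer together, since s becomes a better estimate of σ.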
Degrees of Freedom
Recall
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
n − 1 is the “degrees of freedom”, or the number of “pieces of
information” we have about variability.
Bigger df → more accurate reflection of σ.
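To connect the formula to R's built-in `sd()`, here is a minimal check on a small made-up data vector (the five values are hypothetical).

```r
x <- c(12, 15, 9, 20, 14)           # hypothetical data, for illustration only
n <- length(x)
xbar <- mean(x)

# Sum of squared deviations divided by the degrees of freedom, n - 1
s.by.hand <- sqrt(sum((x - xbar)^2) / (n - 1))

c(by.hand = s.by.hand, built.in = sd(x), df = n - 1)
```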
The t family of distributions
When we know σ, we have
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1) \]
i.e., z-scores calculated from sample means have a Standard
Normal distribution. However, if we use s as an estimate of σ, this
introduces extra possibility for error, and so the z-scores have a
distribution with “fatter tails” (i.e., a larger share of “extreme
values”): a “t-distribution”. How “fat” depends on n.
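One way to see the fatter tails is to simulate the statistic directly. The sketch below assumes a standard Normal population and very small samples (n = 4, so df = 3); the seed and number of repetitions are arbitrary.

```r
set.seed(113)                       # arbitrary seed, for reproducibility

n <- 4; mu <- 0; reps <- 50000

# Standardize each sample mean using s in place of sigma
t.stats <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = 1)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

tail.simulated <- mean(abs(t.stats) > 2)   # share of statistics beyond +/- 2
tail.t <- 2 * pt(-2, df = n - 1)           # same area under t with n - 1 df
tail.normal <- 2 * pnorm(-2)               # same area under N(0, 1)
c(simulated = tail.simulated, t = tail.t, normal = tail.normal)
```

The simulated share should sit close to the t value and well above the Normal value.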
A family of t distributions
[Figure: t densities for df = 1, df = 5, and df = 30 overlaid on the Standard Normal density (x-axis: (x̄ − µ)/(s/√n), −4 to 4; y-axis: t density, 0.0 to 0.4).]
Tail Probabilities in t distributions
xpt(c(-2, 2), df = 1)
[1] 0.1475836 0.8524164
[Figure: t density with df = 1 (x-axis: −300 to 300), split at −2 and 2; areas 0.148 (lower tail), 0.705 (middle), 0.148 (upper tail).]
Tail Probabilities in t distributions
xpt(c(-2, 2), df = 3)
[1] 0.06966298 0.93033702
[Figure: t density with df = 3 (x-axis: −5 to 5), split at −2 and 2; areas 0.070 (lower tail), 0.861 (middle), 0.070 (upper tail).]
Tail Probabilities in t distributions
xpt(c(-2, 2), df = 5)
[1] 0.05096974 0.94903026
[Figure: t density with df = 5 (x-axis: −4 to 4), split at −2 and 2; areas 0.051 (lower tail), 0.898 (middle), 0.051 (upper tail).]
Tail Probabilities in t distributions
xpt(c(-2, 2), df = 30)
[1] 0.02731252 0.97268748
[Figure: t density with df = 30 (x-axis: −3 to 3), split at −2 and 2; areas 0.027 (lower tail), 0.945 (middle), 0.027 (upper tail).]
Tail Probabilities in Standard Normal distribution
xpnorm(c(-2, 2))
If X ~ N(0, 1), then
P(X <= -2) = P(Z <= -2) = 0.02275013
P(X <= 2) = P(Z <= 2) = 0.97724987
P(X > -2) = P(Z > -2) = 0.97724987
P(X > 2) = P(Z > 2) = 0.02275013
[Figure: Standard Normal density, split at −2 and 2; areas 0.023 (lower tail), 0.954 (middle), 0.023 (upper tail).]
Quantiles of t distributions
xqt(c(0.025, 0.975), df = 1)
[1] -12.7062 12.7062
[Figure: t density with df = 1 (x-axis: −300 to 300), split at the two quantiles; areas 0.025 (lower tail), 0.95 (middle), 0.025 (upper tail).]
Quantiles of t distributions
xqt(c(0.025, 0.975), df = 3)
[1] -3.182446 3.182446
[Figure: t density with df = 3 (x-axis: −5 to 5), split at the two quantiles; areas 0.025 (lower tail), 0.95 (middle), 0.025 (upper tail).]
Quantiles of t distributions
xqt(c(0.025, 0.975), df = 5)
[1] -2.570582 2.570582
[Figure: t density with df = 5 (x-axis: −4 to 4), split at the two quantiles; areas 0.025 (lower tail), 0.95 (middle), 0.025 (upper tail).]
Quantiles of t distributions
xqt(c(0.025, 0.975), df = 30)
[1] -2.042272 2.042272
[Figure: t density with df = 30 (x-axis: −3 to 3), split at the two quantiles; areas 0.025 (lower tail), 0.95 (middle), 0.025 (upper tail).]
Quantiles of Standard Normal distribution
xqnorm(c(0.025, 0.975))
P(X <= -1.95996398454005) = 0.025
P(X <= 1.95996398454005) = 0.975
P(X > -1.95996398454005) = 0.975
P(X > 1.95996398454005) = 0.025
[1] -1.959964 1.959964
[Figure: Standard Normal density, split at the two quantiles; areas 0.025 (lower tail), 0.95 (middle), 0.025 (upper tail).]
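The quantiles from the preceding slides can be collected in one call, which makes the trend across degrees of freedom easier to see. This sketch uses base R's `qt()` and `qnorm()` rather than the plotting versions `xqt()` and `xqnorm()`.

```r
dfs <- c(1, 3, 5, 30)                  # the degrees of freedom shown above

# Upper 97.5% quantile (the t* for a 95% interval) at each df
tstar <- qt(0.975, df = dfs)
names(tstar) <- paste0("df=", dfs)

# Append the Standard Normal quantile for comparison
c(tstar, normal = qnorm(0.975))
```

Every t quantile is larger than the Normal one, and the gap shrinks as df grows.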
CI Summary: Single Mean
To compute a confidence interval for a mean when the sampling
distribution for x̄ is approximately Normal (i.e., Normal population,
or “large” n) and σ is unknown (which is almost always), use
\[ \bar{x} \pm t^*_{n-1} \cdot \frac{s}{\sqrt{n}} \]
where \(t^*_{n-1}\) is the quantile appropriate for the confidence level,
computed from a t-distribution with n − 1 degrees of freedom.
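A minimal base R sketch of this interval follows. The twelve data values are hypothetical, and with a sample this small the interval relies on the assumption that the population is roughly Normal.

```r
# Hypothetical sample (e.g., commute times in minutes); not real data
x <- c(28, 35, 22, 41, 30, 26, 33, 37, 24, 31, 29, 36)

n <- length(x)
xbar <- mean(x)
s <- sd(x)

tstar <- qt(0.975, df = n - 1)         # quantile for a 95% interval
CI.95.t <- xbar + c(-1, 1) * tstar * s / sqrt(n)
CI.95.t
```

The built-in `t.test(x, conf.level = 0.95)$conf.int` computes the same interval and can serve as a check.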
Example: Atlanta Commute Time
Demo
P-values for a sample mean
Computing P-values when the null sampling distribution is
approximately Normal (i.e., the population is Normal OR the sample size
is “large”) and σ is unknown (which is almost always) is the reverse
process:
1. Convert x̄ to a t-statistic within the theoretical distribution.
\[ T_{\text{observed}} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
2. Find the relevant area beyond T_observed using a t distribution
with n − 1 degrees of freedom.
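The two steps above can be sketched in base R as follows. The ten temperature readings, the null value µ0 = 98.6, and the two-tailed alternative are assumptions for illustration, not the data used in the class demo.

```r
# Hypothetical body temperatures (degrees F); not the course data set
x <- c(98.2, 97.9, 98.6, 98.4, 97.6, 98.8, 98.0, 98.3, 97.8, 98.5)
mu0 <- 98.6                            # null hypothesis value (assumed)

n <- length(x)
xbar <- mean(x)
s <- sd(x)

# Step 1: t-statistic of x-bar within the null sampling distribution
t.observed <- (xbar - mu0) / (s / sqrt(n))

# Step 2: area beyond t.observed under t with n - 1 df (two-tailed assumed)
p.value <- 2 * pt(-abs(t.observed), df = n - 1)
c(t = t.observed, df = n - 1, p.value = p.value)
```

The same statistic and P-value are reported by `t.test(x, mu = mu0)`.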
Example: Mean Body Temperature
Demo