STAT 113: Normal-Based Intervals and Tests for a Sample Mean
Colin Reimer Dawson, Oberlin College
November 8, 2016

Outline: Distribution of Sample Means; CI for a Single Mean; Hypothesis Test for a Single Mean

Cases to Address
We will need standard errors to do CIs and tests for the following parameters:
1. Single Proportion (last time)
2. Single Mean (today)
3. Difference of Proportions (tomorrow)
4. Difference of Means (tomorrow/Friday)
5. Mean of Differences (new! next week)

Distribution of p̂
When the population proportion is p and the samples are of size n, the sampling distribution of p̂ has mean p and standard deviation (standard error)
$SE_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
It is also approximately Normal when samples are large enough and p isn’t too extreme. Rough rule: np ≥ 10 AND n(1 − p) ≥ 10.

CI Summary: Single Proportion
To compute a confidence interval for a proportion when the bootstrap distribution for p̂ is approximately Normal (i.e., np̂ ≥ 10 and n(1 − p̂) ≥ 10), use
$\hat{p} \pm Z^* \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
where $Z^*$ is the Z-score of the endpoint appropriate for the confidence level, computed from a standard normal, N(0, 1).

P-values for a sample proportion from a Standard Normal
Computing P-values when the null sampling distribution is approximately Normal (i.e., np₀ ≥ 10 and n(1 − p₀) ≥ 10) is the reverse process:
1. Convert p̂ to a z-score within the theoretical distribution:
$Z_{observed} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$
2. Find the relevant area beyond $Z_{observed}$ using a Standard Normal.

Distribution of Sample Means
• Central Limit Theorem: the sampling distribution of x̄ is approximately Normal for “sufficiently large” samples, or when the population distribution is Normal.
• As the sample size n goes up, the standard error goes ______.
• Pairs: What effect do you expect the population standard deviation to have on the standard error of the distribution of sample means? Why?

Distribution of x̄
When the population mean is µ, the population standard deviation is σ, and the samples are of size n, the sampling distribution of x̄ has mean µ and standard deviation (standard error)
$SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
It is also approximately Normal when samples are large enough, OR if the population distribution is approximately Normal. The farther the population is from Normal, the bigger the sample needs to be, but we can roughly use n ≥ 30.

CI Summary: Single Mean
To compute a confidence interval for a mean when the sampling distribution for x̄ is approximately Normal (i.e., Normal population, or “large” n), use
$\bar{x} \pm Z^* \cdot \frac{\sigma}{\sqrt{n}}$
where $Z^*$ is the Z-score of the endpoint appropriate for the confidence level, computed from a standard normal, N(0, 1).
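Aside (not from the original slides): the standard error formula for x̄ can be checked by simulation. The sketch below uses base R only and assumes an Exponential(rate = 1) population, whose standard deviation is exactly 1; the seed, sample size, and number of replications are illustrative choices. It also computes one known-σ interval of the form x̄ ± Z* · σ/√n.

```r
# Minimal simulation sketch (base R only). The population, seed, and sizes
# below are illustrative assumptions, not part of the lecture.
set.seed(113)

n     <- 30       # sample size (assumed)
reps  <- 10000    # number of simulated samples (assumed)
sigma <- 1        # the sd of an Exponential(rate = 1) population is exactly 1

# Draw `reps` samples of size n and record each sample mean
sample.means <- replicate(reps, mean(rexp(n, rate = 1)))

se.sim    <- sd(sample.means)   # standard error estimated by simulation
se.theory <- sigma / sqrt(n)    # standard error from the formula

c(simulated = se.sim, theoretical = se.theory)

# Known-sigma 95% interval for one new sample: xbar +/- Z* times sigma/sqrt(n)
x       <- rexp(n, rate = 1)
zstar   <- qnorm(0.975)
CI.95.z <- mean(x) + c(-1, 1) * zstar * se.theory
CI.95.z
```

The two standard errors should agree closely even though the population is skewed. The interval step is only usable when σ is actually known.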
Example: Mean Atlanta Commute Time, “Pure” Bootstrap CI

    library("mosaic"); library("Lock5Data")
    data(CommuteAtlanta)
    Bootstrap.means <- do(10000) * mean(~Time, data = resample(CommuteAtlanta))
    CI.99.boot <- quantile(~mean, data = Bootstrap.means, prob = c(0.005, 0.995))
    CI.99.boot
    ##     0.5%    99.5%
    ## 26.78194 31.63203

Commute Time: Pure Bootstrap CI

    dotPlot(~mean, data = Bootstrap.means, width = 0.1, cex = 20,
            groups = mean >= CI.99.boot[1] & mean <= CI.99.boot[2])

[Figure: dot plot of the 10,000 bootstrap means (x-axis: mean, about 26 to 32; y-axis: Count, 0 to 500), with points grouped by whether they fall inside the 99% bootstrap interval.]

Example: Mean Atlanta Commute Time CI, Normal w/ Bootstrap SE

    zstar.99.lower <- qnorm(0.005)  # get z-scores of the endpoints
    zstar.99.upper <- qnorm(0.995)  # (without the 'x', no extra output)
    xbar <- mean(~Time, data = CommuteAtlanta)    # sample mean
    se.boot <- sd(~mean, data = Bootstrap.means)  # bootstrap se
    CI.99.normal.boot.se <-
        c(xbar + zstar.99.lower * se.boot, xbar + zstar.99.upper * se.boot)
    CI.99.normal.boot.se
    ## [1] 26.72032 31.49968

Commute Time CI, Normal w/ Bootstrap SE
[Figure: Normal density curve (x-axis: Sample mean, 26 to 32; y-axis: Normal Density, 0.0 to 0.4).]

Example: Mean Atlanta Commute Time CI, Normal w/ Theoretical SE

    n <- nrow(CommuteAtlanta)                   # get the sample size
    xbar <- mean(~Time, data = CommuteAtlanta)  # get the sample mean
    zstar.99.lower <- qnorm(0.005)  # get the z-scores for the endpoints
    zstar.99.upper <- qnorm(0.995)  # (without the 'x', no extra output)
    se.theory <- sigma / sqrt(n)    # calculate the SE using the formula
    CI.99.normal.theory.se <-
        c(xbar + zstar.99.lower * se.theory, xbar + zstar.99.upper * se.theory)
    CI.99.normal.theory.se

Wait, where do we get σ?

Using s instead of σ
• We only have s, the sample standard deviation; not σ, the population standard deviation.
• We can approximate SE with $s/\sqrt{n}$, but need to account for the fact that s itself is an estimate (and differs between samples).
• “95% of sample means are within 2SE of µ” is no longer accurate: the percentage is less than this.
• How much less depends on how good an estimate s is of σ (i.e., depends on n).

Degrees of Freedom
Recall
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
n − 1 is the “degrees of freedom”, or the number of “pieces of information” we have about variability. Bigger df → more accurate reflection of σ.

The t family of distributions
When we know σ, we have
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
i.e., z-scores calculated from sample means have a Standard Normal distribution.
However, if we use s as an estimate of σ, this introduces extra possibility for error, and so the z-scores have a distribution with “fatter tails” (i.e., a larger share of “extreme values”): a “t-distribution”. How “fat” depends on n.

A family of t distributions
[Figure: t density curves for df = 1, df = 5, and df = 30, plus the Standard Normal (x-axis: $(\bar{x} - \mu)/(s/\sqrt{n})$, −4 to 4; y-axis: t density, 0.0 to 0.4).]

Tail Probabilities in t distributions

    xpt(c(-2, 2), df = 1)
    [1] 0.1475836 0.8524164

[Figure: t density, df = 1; areas about 0.148 below −2, 0.705 between, 0.148 above 2.]

    xpt(c(-2, 2), df = 3)
    [1] 0.06966298 0.93033702

[Figure: t density, df = 3; areas about 0.070, 0.861, 0.070.]

    xpt(c(-2, 2), df = 5)
    [1] 0.05096974 0.94903026

[Figure: t density, df = 5; areas about 0.051, 0.898, 0.051.]

    xpt(c(-2, 2), df = 30)
    [1] 0.02731252 0.97268748

[Figure: t density, df = 30; areas about 0.027, 0.945, 0.027.]

Tail Probabilities in Standard Normal distribution

    xpnorm(c(-2, 2))

    If X ~ N(0, 1), then
        P(X <= -2) = P(Z <= -2) = 0.02275013
        P(X <=  2) = P(Z <=  2) = 0.97724987
        P(X >  -2) = P(Z >  -2) = 0.97724987
        P(X >   2) = P(Z >   2) = 0.02275013

[Figure: Standard Normal density; areas about 0.023, 0.954, 0.023.]

Quantiles of t distributions

    xqt(c(0.025, 0.975), df = 1)
    [1] -12.7062  12.7062

    xqt(c(0.025, 0.975), df = 3)
    [1] -3.182446  3.182446

    xqt(c(0.025, 0.975), df = 5)
    [1] -2.570582  2.570582

    xqt(c(0.025, 0.975), df = 30)
    [1] -2.042272  2.042272

[Figures: t densities for df = 1, 3, 5, 30, each with the central 0.95 area and 0.025 in each tail marked.]

Quantiles of Standard Normal distribution

    xqnorm(c(0.025, 0.975))

        P(X <= -1.95996398454005) = 0.025
        P(X <=  1.95996398454005) = 0.975
        P(X >  -1.95996398454005) = 0.975
        P(X >   1.95996398454005) = 0.025
    [1] -1.959964  1.959964

[Figure: Standard Normal density with the central 0.95 area and 0.025 in each tail marked.]

CI Summary: Single Mean
To compute a confidence interval for a mean when the sampling distribution for x̄ is approximately Normal (i.e., Normal population, or “large” n) and σ is unknown (which is almost always), use
$\bar{x} \pm t^*_{n-1} \cdot \frac{s}{\sqrt{n}}$
where $t^*_{n-1}$ is the quantile appropriate for the confidence level, computed from a t-distribution with n − 1 degrees of freedom.

Example: Atlanta Commute Time
Demo

P-values for a sample mean
Computing P-values when the null sampling distribution is approximately Normal (i.e., the population is Normal OR the sample size is “large”) and σ is unknown (which is almost always) is the reverse process:
1. Convert x̄ to a t-statistic within the theoretical distribution:
$T_{observed} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
2. Find the relevant area beyond $T_{observed}$ using a t distribution with n − 1 degrees of freedom.

Example: Mean Body Temperature
Demo
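
Aside (not from the original slides, and not a record of the in-class demos): the t-based interval and P-value recipes can be sketched by hand in base R. The data here are a stand-in, the built-in `mtcars$mpg` column, and the null value `mu0 = 20` is hypothetical. A two-sided alternative is assumed, so the “relevant area” is both tails.

```r
# Sketch of the t-based interval and test using base R only.
# `mtcars$mpg` and mu0 = 20 are stand-ins chosen purely for illustration.
x   <- mtcars$mpg
mu0 <- 20                      # hypothetical null value

n    <- length(x)              # sample size
xbar <- mean(x)                # sample mean
s    <- sd(x)                  # sample standard deviation (uses n - 1)
se   <- s / sqrt(n)            # estimated standard error
df   <- n - 1                  # degrees of freedom

# 95% CI: xbar +/- t* times s / sqrt(n)
tstar   <- qt(0.975, df = df)
CI.95.t <- xbar + c(-1, 1) * tstar * se

# Step 1: convert xbar to a t-statistic under the null
t.observed <- (xbar - mu0) / se
# Step 2: area beyond t.observed in both tails (two-sided, assumed)
p.value <- 2 * pt(-abs(t.observed), df = df)

# Cross-check against R's built-in one-sample t procedure
check <- t.test(x, mu = mu0, conf.level = 0.95)
list(CI = CI.95.t, t = t.observed, p = p.value)
```

For a one-sided alternative, step 2 would use a single tail instead, e.g. `pt(t.observed, df = df, lower.tail = FALSE)` for “greater than”.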