Lecture 7: Small Sample Confidence Intervals Based on a Normal
Population Distribution
Readings: Sections 7.4-7.5
1 Small Sample CI for a Population Mean µ
• The large-sample CI $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$ was constructed based on the Central Limit Theorem (CLT).
• When the sample size is small, the CLT does not apply.
• We will assume instead that the population distribution is normal with mean µ and
standard deviation σ.
Sampling Distribution of X̄
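• If the population distribution is normal, or the sample size is large, then $\bar{X} \sim N(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}} = \sigma/\sqrt{n})$ (exactly in the first case, approximately in the second), i.e.,
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$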
• Since σ is often unknown, we use the sample standard deviation s as the estimate
of σ.
– When the sample size is large, s serves as a good estimate of σ and the sampling distribution of $(\bar{X} - \mu)/(S/\sqrt{n})$ is approximately N(0, 1).
– However, when the sample size is small, $(\bar{X} - \mu)/(S/\sqrt{n})$ no longer has a standard normal distribution.
– The multiplier zα/2 is no longer appropriate.
Properties of t Distributions
• $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a t distribution with n − 1 degrees of freedom.
• The t distribution is symmetric about zero and bell-shaped.
• The t distribution has more variability than the standard normal distribution.
• As the degrees of freedom increase, the t distribution approaches the standard normal
distribution (because as n increases, s → σ).
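The last two properties can be checked numerically. The following is a minimal sketch, not part of the original lecture: the data set name t_vs_z, the variable names, and the particular list of degrees of freedom are chosen only for illustration. It prints the upper 2.5% critical value of the t distribution next to the standard normal value.

```sas
/* Sketch: compare t critical values with the standard normal value.   */
/* Data set and variable names here are illustrative assumptions.      */
data t_vs_z;
   z_975 = probit(0.975);          /* standard normal critical value   */
   do df = 2, 5, 10, 30, 100, 1000;
      t_975 = tinv(0.975, df);     /* t critical value with df d.f.    */
      excess = t_975 - z_975;      /* how much larger t is than z      */
      output;
   end;
run;
proc print data=t_vs_z;
run;
```

The column excess should be positive for every df (more variability than the standard normal) and should shrink toward zero as df grows.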
The One-Sample t Confidence Interval for µ
• Let x̄ and s be the sample mean and sample standard deviation of a random sample of
size n from a normal population with mean µ. Then a 100(1 − α)% confidence interval
for µ is
$$\bar{x} \pm t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}},$$
where tα/2,n−1 is the value such that P(T > tα/2,n−1) = P(T < −tα/2,n−1) = α/2 for
T ∼ t(n − 1).
– If the sample size is large, the critical value can be taken from the standard normal
table.
– The t CIs are robust to small or even moderate deviations from normality unless n
is quite small.
Example 1: How accurate are radon detectors of a type sold to homeowners? To answer
this question, university researchers placed 12 detectors in a chamber that exposed them
to 105 picocuries per liter of radon. The detector readings were as follows:
91.9    97.8    111.4   122.3   105.4   95.0
103.8   99.6    119.3   104.8   101.7   96.6
The sample mean is x̄ = 104.13 and the sample standard deviation is s = 9.40. Find the
90% confidence interval for the population mean.
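One way to carry out this calculation in SAS is sketched below, working directly from the summary statistics quoted above. The data set name radon_ci and the variable names are assumptions made for this illustration; only the TINV and SQRT functions are standard SAS.

```sas
/* Sketch: 90% one-sample t CI from summary statistics (Example 1).    */
/* Names such as radon_ci, t_crit and margin are illustrative only.    */
data radon_ci;
   n     = 12;                          /* number of detectors         */
   xbar  = 104.13;                      /* sample mean                 */
   s     = 9.40;                        /* sample standard deviation   */
   alpha = 0.10;                        /* 90% confidence level        */
   t_crit = tinv(1 - alpha/2, n - 1);   /* t_{0.05, 11}                */
   margin = t_crit * s / sqrt(n);       /* half-width of the interval  */
   lower  = xbar - margin;
   upper  = xbar + margin;
run;
proc print data=radon_ci;
run;
```

With t0.05,11 ≈ 1.796, the half-width comes out to roughly 4.87, so the interval is approximately 104.13 ± 4.87.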
Assessing Normality Using Normal Quantile Plots
• Basic idea of normal quantile plots: if your data come from a normal distribution, then
the ith smallest observation should roughly correspond to the (i/n) × 100th percentile of
a normal distribution.
• Here is how it works:
1. Order the data from the smallest to the largest. Let x(i) denote the ith smallest
value.
2. Take x(i) to be the ((i − 0.5)/n) × 100th percentile.
3. Determine the corresponding percentiles for standard normal distributions, i.e., calculate the ((i − 0.5)/n) × 100th percentile zi of Z.
4. Plot the data values x(i) against zi .
• A plot for which the points fall close to some straight line suggests that the assumption
of a normal population is plausible.
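The four steps above can also be carried out by hand in SAS. This is only a sketch of the manual construction, using the radon readings from Example 1; the data set names radon_q and radon_sorted and the variables p and z are hypothetical names introduced here.

```sas
/* Sketch: build a normal quantile plot step by step.                  */
/* radon_q, radon_sorted, p and z are illustrative names.              */
data radon_q;
   input reading @@;
   datalines;
91.9 97.8 111.4 122.3 105.4
95.0 103.8 99.6 119.3 104.8
101.7 96.6
;
run;
/* Step 1: order the data from smallest to largest */
proc sort data=radon_q out=radon_sorted;
   by reading;
run;
data radon_sorted;
   set radon_sorted nobs=n;    /* n = number of observations           */
   p = (_n_ - 0.5) / n;        /* Step 2: x(i) is the 100p-th percentile */
   z = probit(p);              /* Step 3: matching standard normal percentile */
run;
/* Step 4: plot the data values against the normal quantiles */
proc sgplot data=radon_sorted;
   scatter x=z y=reading;
run;
```

The QQPLOT statement of PROC UNIVARIATE produces this kind of plot directly; the manual version is shown only to make the construction explicit.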
Example 1 (cont’d): Check the normality of the data.
[Figure: normal quantile plot of the radon data. Vertical axis: Radon Detector Readings; horizontal axis: Normal Quantiles.]
data radon;
input reading @@;
datalines;
91.9 97.8 111.4 122.3 105.4
95.0 103.8 99.6 119.3 104.8
101.7 96.6
;
run;
proc univariate data=radon;
var reading;
QQplot / Normal(mu=est sigma=est);
run;
Prediction Interval for a Single Future Value
• In many applications, we are interested in predicting a single value of a variable to be
observed at some future time, rather than estimating the mean value of that variable.
Example 2: Consider the following sample of fat content (in percentage) of n = 10 randomly selected hot dogs:
25.2   21.3   22.8   17.0   29.8
21.0   25.5   16.0   20.9   19.5
– Assuming that these were selected from a normal population distribution, a 95% CI for
the population mean fat content is $\bar{x} \pm t_{0.025,\,9}\, s/\sqrt{10}$.
– Suppose, however, you are going to eat a single hot dog of this type and want a
prediction of the resulting fat content.
Prediction Interval for a Single Value
• Let x̄ and s be the sample mean and sample standard deviation of a random sample of size
n from a normal population. Then the prediction interval (PI) for a single observation
to be selected from the normal population distribution is
$$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot s \sqrt{1 + \frac{1}{n}}.$$
The prediction level is 100(1 − α)%.
– If the sample size is large, the critical value can be taken from the standard normal
table.
– The validity of a prediction interval is closely tied to the normality assumption. The
interval shouldn’t be used in the absence of compelling evidence for normality.
– The interpretation of a prediction interval is similar to that of a confidence interval. If
the prediction interval is calculated for a large number of samples, in the long run,
100(1 − α)% of these intervals will include the corresponding future value.
Example 2 (cont’d):
– Find the 95% prediction interval for the fat content of a single hot dog.
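A possible SAS sketch for Example 2 is given below. It computes both the 95% CI for the mean and the 95% PI for a single hot dog so the two can be compared. The data set names hotdog, hotdog_stats and hotdog_int and the variable names are assumptions made for this illustration.

```sas
/* Sketch: 95% CI for the mean and 95% PI for one future value.        */
/* Data set and variable names are illustrative assumptions.           */
data hotdog;
   input fat @@;
   datalines;
25.2 21.3 22.8 17.0 29.8
21.0 25.5 16.0 20.9 19.5
;
run;
/* Summary statistics: sample size, mean and standard deviation */
proc means data=hotdog noprint;
   var fat;
   output out=hotdog_stats n=n mean=xbar std=s;
run;
data hotdog_int;
   set hotdog_stats;
   t_crit   = tinv(0.975, n - 1);                 /* t_{0.025, 9}      */
   ci_lower = xbar - t_crit * s / sqrt(n);        /* CI for the mean   */
   ci_upper = xbar + t_crit * s / sqrt(n);
   pi_lower = xbar - t_crit * s * sqrt(1 + 1/n);  /* PI for one value  */
   pi_upper = xbar + t_crit * s * sqrt(1 + 1/n);
run;
proc print data=hotdog_int;
   var n xbar s t_crit ci_lower ci_upper pi_lower pi_upper;
run;
```

For these data x̄ = 21.90 and s ≈ 4.13, which gives a CI of about (18.94, 24.86) and a PI of about (12.09, 31.71). The PI is much wider because it must also account for the variability of a single new observation.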
2 Small Sample CI for Difference of Two Population Means µ1 − µ2
The (unpooled) Two-Sample t Confidence Interval for µ1 − µ2
• If samples of size n1 and n2 are taken from two normal populations with means µ1 and
µ2, then the two-sided 100(1 − α)% confidence interval for µ1 − µ2 is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,k} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$
where the value of the degrees of freedom is
$$k = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$
– If the sample sizes are large, the critical value can be taken from the standard normal
table.
– The degrees of freedom k is usually not an integer. SAS can calculate the critical value
for non-integer df’s. For example, the critical values for the 90%, 95%, and 99% CIs
with 10.45 degrees of freedom are calculated as follows:
data critical_value;
df = 10.45;
/*critical value for 90% CI*/
t_90 = tinv(0.95, df);
/*critical value for 95% CI*/
t_95 = tinv(0.975, df);
/*critical value for 99% CI*/
t_99 = tinv(0.995, df);
run;
proc print data=critical_value;
run;
SAS OUTPUT:
Obs    df       t_90       t_95       t_99
1      10.45    1.80457    2.21520    3.13894
– If SAS is not handy, round this df down to the nearest integer and use the t table. This gives a slightly larger critical value, so the interval is conservative (t0.05,10 = 1.812, t0.025,10 = 2.228, t0.005,10 = 3.169).
The (pooled) Two-Sample t Confidence Interval for µ1 − µ2
• If samples of size n1 and n2 are taken from two normal populations with means µ1 and
µ2 and a common standard deviation, then the two-sided 100(1 − α)% confidence interval
for µ1 − µ2 is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,k}\, s_{\text{pooled}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
where the value of the degrees of freedom is k = n1 + n2 − 2 and the pooled standard
deviation is
$$s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}.$$
– The pooled t confidence interval is not robust to violations of the equal standard
deviation assumption.
– We therefore recommend the unpooled t confidence intervals unless there is really
compelling evidence for doing otherwise.
Example 3: Seedlings were germinated under two different lighting conditions. Their
lengths (in cm) were measured after a specified time period. The data are as follows:
         n     x̄      s
Dark     22    1.76   0.586
Light    21    2.46   0.802
– Calculate a 95% C.I. for the difference in the mean length under different lighting conditions, assuming that the lengths under different lighting conditions follow
normal distributions with different standard deviations.
– Calculate a 95% C.I. for the difference in the mean length under different lighting conditions, assuming that the lengths under different lighting conditions follow
normal distributions with a common standard deviation.
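Both parts of Example 3 can be sketched in a single SAS data step that works from the summary statistics in the table. The data set name seedlings and all variable names below are assumptions introduced for this illustration; the difference is taken as Dark minus Light.

```sas
/* Sketch: unpooled and pooled 95% CIs for mu1 - mu2 (Example 3).      */
/* All names are illustrative; group 1 = Dark, group 2 = Light.        */
data seedlings;
   n1 = 22; xbar1 = 1.76; s1 = 0.586;
   n2 = 21; xbar2 = 2.46; s2 = 0.802;
   diff = xbar1 - xbar2;
   v1 = s1**2 / n1;                        /* s1^2 / n1                */
   v2 = s2**2 / n2;                        /* s2^2 / n2                */

   /* Unpooled interval with the non-integer degrees of freedom k */
   k          = (v1 + v2)**2 / (v1**2/(n1 - 1) + v2**2/(n2 - 1));
   t_unpooled = tinv(0.975, k);
   lower_unpooled = diff - t_unpooled * sqrt(v1 + v2);
   upper_unpooled = diff + t_unpooled * sqrt(v1 + v2);

   /* Pooled interval with n1 + n2 - 2 degrees of freedom */
   df_pooled = n1 + n2 - 2;
   s_pooled  = sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / df_pooled);
   t_pooled  = tinv(0.975, df_pooled);
   lower_pooled = diff - t_pooled * s_pooled * sqrt(1/n1 + 1/n2);
   upper_pooled = diff + t_pooled * s_pooled * sqrt(1/n1 + 1/n2);
run;
proc print data=seedlings;
   var diff k lower_unpooled upper_unpooled
       df_pooled s_pooled lower_pooled upper_pooled;
run;
```

For these data k works out to about 36.5, compared with 41 degrees of freedom for the pooled interval. When the raw observations are available, PROC TTEST reports both the pooled and the Satterthwaite (unpooled) results.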
3 CI for µ1 − µ2 from Paired Data
• Sometimes we have observations in pairs, such as:
– identical twins
– two observations on the same individual (two days, pre- and post-tests, before and
after measurements)
• Confidence intervals for paired data are based on the difference obtained between the 2
measurements
– Find the difference for each of the n pairs, that is di = xi1 − xi2 .
– Find the sample mean d̄ and sample standard deviation sd of these differences.
– Perform one-sample procedures for these differences. That is,
∗ The 100(1 − α)% CI for µd = µ1 − µ2 is given by
$$\bar{d} \pm t_{\alpha/2,\,n-1} \frac{s_d}{\sqrt{n}}$$
∗ If the sample size n is large, the critical value can be taken from the standard
normal table: zα/2 .
Example 4: Researchers are interested in whether Vitamin C is lost when wheat soy blend
(CSB) is cooked as gruel. Samples of gruel were collected, and the vitamin C content was
measured (in mg per 100 grams of gruel) before and after cooking. Here are the results:
Sample   Before   After   Before − After
1        73       20      53
2        79       27      52
3        86       29      57
4        88       36      52
5        78       17      61
x̄        80.8     25.8    55
s        6.14     7.53    3.94
Find a 90% confidence interval for the mean vitamin C content loss.
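A short SAS sketch for this paired analysis follows. It forms the differences and then applies the one-sample t interval to them through the CLM option of PROC MEANS. The data set name gruel and the variable names are assumptions made for this illustration.

```sas
/* Sketch: 90% paired t CI for the mean vitamin C loss (Example 4).    */
/* Data set and variable names are illustrative assumptions.           */
data gruel;
   input sample before after;
   d = before - after;          /* difference for each pair            */
   datalines;
1 73 20
2 79 27
3 86 29
4 88 36
5 78 17
;
run;
/* One-sample t interval on the differences; alpha=0.10 gives 90% */
proc means data=gruel n mean std clm alpha=0.10;
   var d;
run;
```

With d̄ = 55, sd ≈ 3.94, n = 5 and t0.05,4 ≈ 2.132, the interval is roughly 55 ± 3.75 mg per 100 grams.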