Lecture 7: Small Sample Confidence Intervals Based on a Normal Population Distribution

Readings: Sections 7.4-7.5

1 Small Sample CI for a Population Mean µ

• The large sample CI $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$ was constructed based on the Central Limit Theorem (CLT).
• When the sample size is small, the CLT does not apply.
• We will assume instead that the population distribution is normal with mean µ and standard deviation σ.

Sampling Distribution of X̄

• If the population distribution is normal, or the sample size is large, then (exactly in the first case, approximately in the second) $\bar{X} \sim N(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}} = \sigma/\sqrt{n})$, i.e.,

  $$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

• Since σ is often unknown, we use the sample standard deviation s as the estimate of σ.
  – When the sample size is large, s serves as a good estimate of σ and the sampling distribution of $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is approximately N(0, 1).
  – However, when the sample size is small, $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ no longer has a standard normal distribution.
  – The multiplier $z_{\alpha/2}$ is no longer appropriate.

Properties of t Distributions

• $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a t distribution with n − 1 degrees of freedom.
• The t distribution is symmetric about zero and bell-shaped.
• The t distribution has more variability than the standard normal distribution.
• As the degrees of freedom increase, the t distribution approaches the standard normal distribution (because as n increases, s → σ).

The One-Sample t Confidence Interval for µ

• Let x̄ and s be the sample mean and sample standard deviation of a random sample of size n from a normal population with mean µ. Then a 100(1 − α)% confidence interval for µ is

  $$\bar{x} \pm t_{\alpha/2,\,n-1}\, \frac{s}{\sqrt{n}},$$

  where $t_{\alpha/2,\,n-1}$ is the value such that $P(T > t_{\alpha/2,\,n-1}) = P(T < -t_{\alpha/2,\,n-1}) = \alpha/2$, where $T \sim t(n-1)$.
  – If the sample size is large, the critical value can be taken from the standard normal table.
  – The t CIs are robust to small or even moderate deviations from normality unless n is quite small.

Example 1: How accurate are radon detectors of a type sold to homeowners?
To answer this question, university researchers placed 12 detectors in a chamber that exposed them to 105 picocuries per liter of radon. The detector readings were as follows:

     91.9  103.8   97.8   99.6  111.4  119.3
    122.3  104.8  105.4  101.7   95.0   96.6

The sample mean is x̄ = 104.13 and the sample standard deviation is s = 9.40. Find the 90% confidence interval for the population mean.

Assessing Normality Using Normal Quantile Plots

• Basic idea of normal quantile plots: if your data come from a normal distribution, then the ith smallest observation should roughly correspond to the (i/n) × 100th percentile of a normal distribution.
• Here is how it works:
  1. Order the data from the smallest to the largest. Let $x_{(i)}$ denote the ith smallest value.
  2. Take $x_{(i)}$ to be the ((i − 0.5)/n) × 100th percentile of the data.
  3. Determine the corresponding percentiles of the standard normal distribution, i.e., calculate the ((i − 0.5)/n) × 100th percentile $z_i$ of Z.
  4. Plot the data values $x_{(i)}$ against $z_i$.
• A plot for which the points fall close to some straight line suggests that the assumption of a normal population is plausible.

Example 1 (cont’d): Check the normality of data.

[Figure: normal quantile plot of the radon data. Horizontal axis: Normal Quantiles; vertical axis: Radon Detector Readings.]

    /* Read the 12 radon detector readings */
    data radon;
      input reading @@;
      datalines;
    91.9 97.8 111.4 122.3 105.4 95.0
    103.8 99.6 119.3 104.8 101.7 96.6
    ;
    run;

    /* Normal quantile plot; reference line uses the estimated mean and sd */
    proc univariate data=radon;
      var reading;
      qqplot / normal(mu=est sigma=est);
    run;

Prediction Interval for a Single Future Value

• In many applications, we are interested in predicting a single value of a variable to be observed at some future time, rather than estimating the mean value of that variable.
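Returning to Example 1: the 90% interval and the normal scores used in the quantile plot can also be checked outside SAS. The following is a minimal Python sketch (standard library only), not part of the original notes. The helper names `t_interval` and `normal_scores` are hypothetical, and the critical value $t_{0.05,11} = 1.796$ is an assumption read from a standard t table rather than computed.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Radon detector readings from Example 1
readings = [91.9, 103.8, 97.8, 99.6, 111.4, 119.3,
            122.3, 104.8, 105.4, 101.7, 95.0, 96.6]


def t_interval(data, t_crit):
    """One-sample t interval: x_bar +/- t_crit * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


def normal_scores(n):
    """z_i = the ((i - 0.5)/n) * 100th percentile of N(0, 1), for i = 1..n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


# Assumption: t_{0.05, 11} = 1.796, taken from a standard t table.
T_CRIT_90_DF11 = 1.796

ci_low, ci_high = t_interval(readings, T_CRIT_90_DF11)

# Pair each ordered observation x_(i) with its normal score z_i.
qq_points = list(zip(normal_scores(len(readings)), sorted(readings)))

print(f"90% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
for z, x in qq_points:
    print(f"{z:7.3f}  {x:6.1f}")
```

With these inputs the interval comes out at roughly (99.3, 109.0). Plotting the second column of `qq_points` against the first gives the same kind of display as the SAS `qqplot` statement.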
Example 2: Consider the following sample of fat content (in percentage) of n = 10 randomly selected hot dogs:

    25.2  21.0  21.3  25.5  22.8  16.0  17.0  20.9  29.8  19.5

  – Assuming that these were selected from a normal population distribution, a 95% confidence interval for the population mean fat content is ______
  – Suppose, however, you are going to eat a single hot dog of this type and want a prediction of the resulting fat content.

Prediction Interval for a Single Value

• Let x̄ and s be the sample mean and sample standard deviation of a random sample of size n from a normal population. Then the prediction interval (PI) for a single observation to be selected from the normal population distribution is

  $$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot s \sqrt{1 + \frac{1}{n}}.$$

  The prediction level is 100(1 − α)%.
  – If the sample size is large, the critical value can be taken from the standard normal table.
  – The validity of a prediction interval is closely tied to the normality assumption. The interval shouldn’t be used in the absence of compelling evidence for normality.
  – The interpretation of a prediction interval is similar to that of a confidence interval. If the prediction interval is calculated for a large number of samples, then in the long run 100(1 − α)% of these intervals will include the corresponding future value.

Example 2 (cont’d):
  – Find the 95% prediction interval for the fat content of a single hot dog.

2 Small Sample CI for Difference of Two Population Means µ1 − µ2

The (unpooled) Two-Sample t Confidence Interval for µ1 − µ2

• If samples of size n1 and n2 are taken from two normal populations with means µ1 and µ2, then the two-sided 100(1 − α)% confidence interval for µ1 − µ2 is

  $$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,k} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

  where the value of the degrees of freedom is

  $$k = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$

  – If the sample sizes are large, the critical value can be taken from the standard normal table.
  – The degrees of freedom k is usually not an integer. SAS can calculate the critical value for non-integer df’s.
    For example, the critical values for the 90%, 95%, and 99% CIs with 10.45 degrees of freedom are calculated as follows:

    data critical_value;
      df = 10.45;
      /* critical value for 90% CI */
      t_90 = tinv(0.95, df);
      /* critical value for 95% CI */
      t_95 = tinv(0.975, df);
      /* critical value for 99% CI */
      t_99 = tinv(0.995, df);
    run;

    proc print data=critical_value;
    run;

    SAS OUTPUT:

    Obs     df       t_90       t_95       t_99
      1    10.45    1.80457    2.21520    3.13894

  – If SAS is not handy, round this df down to the nearest integer ($t_{0.05,10} = 1.812$, $t_{0.025,10} = 2.228$, $t_{0.005,10} = 3.169$).

The (pooled) Two-Sample t Confidence Interval for µ1 − µ2

• If samples of size n1 and n2 are taken from two normal populations with means µ1 and µ2 and a common standard deviation, then the two-sided 100(1 − α)% confidence interval for µ1 − µ2 is

  $$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,k}\; s_{pooled} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

  where the value of the degrees of freedom is $k = n_1 + n_2 - 2$ and the pooled standard deviation is

  $$s_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}.$$

  – The pooled t confidence interval is not robust to violations of the equal standard deviation assumption.
  – We therefore recommend the unpooled t confidence intervals unless there is really compelling evidence for doing otherwise.

Example 3: Seedlings were germinated under two different lighting conditions. Their lengths (in cm) were measured after a specified time period. The data are as follows:

             n     x̄       s
    Dark    22    1.76    0.586
    Light   21    2.46    0.802

  – Calculate a 95% C.I. for the difference in the mean length under different lighting conditions, assuming that the lengths under different lighting conditions follow normal distributions with different standard deviations.
  – Calculate a 95% C.I. for the difference in the mean length under different lighting conditions, assuming that the lengths under different lighting conditions follow normal distributions with a common standard deviation.
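The two calculations requested in Example 3 can be sketched from the summary statistics alone. This is an illustrative Python sketch (standard library only), not the notes' own method, which uses SAS. The function names are hypothetical. The critical values $t_{0.025,36} = 2.028$ and $t_{0.025,41} = 2.020$ are assumptions read from a standard t table, with the unpooled df rounded down to an integer as the notes suggest when software for non-integer df is not at hand.

```python
from math import sqrt


def unpooled_df(n1, s1, n2, s2):
    """Degrees of freedom k for the unpooled two-sample t interval."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def unpooled_interval(n1, mean1, s1, n2, mean2, s2, t_crit):
    """(x1_bar - x2_bar) +/- t_crit * sqrt(s1^2/n1 + s2^2/n2)."""
    margin = t_crit * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation with n1 + n2 - 2 degrees of freedom."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_interval(n1, mean1, s1, n2, mean2, s2, t_crit):
    """(x1_bar - x2_bar) +/- t_crit * s_pooled * sqrt(1/n1 + 1/n2)."""
    margin = t_crit * pooled_sd(n1, s1, n2, s2) * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin


# Summary statistics from Example 3: (n, x_bar, s)
dark = (22, 1.76, 0.586)
light = (21, 2.46, 0.802)

k = unpooled_df(dark[0], dark[2], light[0], light[2])

# Assumptions: table values t_{0.025, 36} and t_{0.025, 41}.
T_CRIT_UNPOOLED = 2.028  # df k rounded down to 36
T_CRIT_POOLED = 2.020    # df = n1 + n2 - 2 = 41

ci_unpooled = unpooled_interval(*dark, *light, T_CRIT_UNPOOLED)
ci_pooled = pooled_interval(*dark, *light, T_CRIT_POOLED)

print(f"unpooled df k = {k:.2f}")
print(f"unpooled 95% CI: ({ci_unpooled[0]:.3f}, {ci_unpooled[1]:.3f})")
print(f"pooled   95% CI: ({ci_pooled[0]:.3f}, {ci_pooled[1]:.3f})")
```

Here k is about 36.5, and both intervals for µ_Dark − µ_Light lie entirely below zero. With sample standard deviations this close, the pooled and unpooled answers differ only in the third decimal place.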
3 CI for µ1 − µ2 from Paired Data

• Sometimes we have observations in pairs, such as:
  – identical twins
  – two observations on the same individual (two days, pre- and post-tests, before and after measurements)
• Confidence intervals for paired data are based on the differences between the two measurements.
  – Find the difference for each of the n pairs, that is, $d_i = x_{i1} - x_{i2}$.
  – Find the sample mean $\bar{d}$ and sample standard deviation $s_d$ of these differences.
  – Perform one-sample procedures for these differences. That is,
    ∗ The 100(1 − α)% CI for $\mu_d = \mu_1 - \mu_2$ is given by

      $$\bar{d} \pm t_{\alpha/2,\,n-1}\, \frac{s_d}{\sqrt{n}}$$

    ∗ If the sample size n is large, the critical value can be taken from the standard normal table: $z_{\alpha/2}$.

Example 4: Researchers are interested in whether Vitamin C is lost when wheat soy blend (CSB) is cooked as gruel. Samples of gruel were collected, and the vitamin C content was measured (in mg per 100 grams of gruel) before and after cooking. Here are the results:

    Sample    Before    After    Before − After
    1         73        20       53
    2         79        27       52
    3         86        29       57
    4         88        36       52
    5         78        17       61
    x̄         80.8      25.8     55
    s         6.14      7.53     3.94

Find a 90% confidence interval for the mean vitamin C content loss.
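The paired procedure in Example 4 reduces to a one-sample t interval on the differences. A minimal Python sketch (standard library only) follows. It is an illustration rather than the notes' own SAS workflow, and the critical value $t_{0.05,4} = 2.132$ is an assumption read from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev

# Vitamin C content (mg per 100 g) from Example 4
before = [73, 79, 86, 88, 78]
after = [20, 27, 29, 36, 17]

# Step 1: difference for each pair, d_i = x_i1 - x_i2
diffs = [b - a for b, a in zip(before, after)]

# Step 2: sample mean and sample standard deviation of the differences
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)

# Step 3: one-sample t interval on the differences.
# Assumption: t_{0.05, 4} = 2.132, taken from a standard t table.
T_CRIT_90_DF4 = 2.132
margin = T_CRIT_90_DF4 * s_d / sqrt(n)
ci_low, ci_high = d_bar - margin, d_bar + margin

print(f"d_bar = {d_bar}, s_d = {s_d:.2f}")
print(f"90% CI for the mean loss: ({ci_low:.2f}, {ci_high:.2f})")
```

The computed $\bar{d} = 55$ and $s_d \approx 3.94$ match the last column of the table, and the resulting interval is roughly (51.2, 58.8) mg per 100 g.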