
Interval Estimation (Means and Proportions)

Recall: The observed value of an estimator of a population parameter is called a point estimate. For example, the sample mean $\bar{x} = 35$ is a point estimate of the population mean $\mu$.

Question: How precise is the estimate? We could use the standard error of the estimate as a measure of the error: it measures approximately the error that we will observe on average. We now refine this measure of error into a confidence interval, which will permit us to measure the precision of the estimate.

Before constructing a confidence interval for a population mean $\mu$, we need to define the concept of an upper quantile of the standard normal distribution.

Definition: Let $Z \sim N(0, 1)$, that is, a standard normal random variable. Its upper quantile of order $A$ is the number $z_A$ such that

$$A = P(Z > z_A) = 1 - \Phi(z_A),$$

that is, the area under the density of $Z$ to the right of the value $z_A$ is $A$.

Remark: Since $T(\infty) = N(0, 1)$, we have $z_A = t_{A,\infty}$. Thus, in the row $\nu = \infty$ of Table 17.4 in the textbook, we find some quantiles of the standard $N(0, 1)$ distribution.

The following graph illustrates an upper quantile of order 5% for the standard normal distribution, that is, $z_{0.05} = 1.645$.

[Figure: upper quantile of order 5% of the standard normal distribution]

Estimating the population mean $\mu$ ($\sigma$ known)

Conditions:
• normal population or a large sample size ($n \ge 30$);
• population variance $\sigma^2$ is known.

Under these conditions,

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

follows a $N(0, 1)$ distribution. The result is approximate if $n$ is large but the population is not normal. Hence

$$1 - \alpha = P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right).$$

Hence, if the population is normal or if the sample size is large ($n \ge 30$), then a $100(1 - \alpha)\%$ confidence interval for $\mu$ is

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right].$$

Remarks:
• $1 - \alpha$ is known as the confidence level. Often we set the level to 95%, but any large value would be reasonable, for example 90%, 98% or 99%.
• The length of the interval will be used as a measure of precision.
The shorter the interval, the more precise the estimate; the longer the interval, the less precise the estimate.

Further remarks: Usually in practice, we do not know $\sigma$. We then use the sample standard deviation $s$ instead of $\sigma$. Recall that

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1) \quad \text{approximately}$$

when $n$ is large ($n \ge 40$). Thus, we obtain the following confidence interval.

Confidence interval for $\mu$ (large sample case, that is, $n \ge 40$)

If $n$ is large ($n \ge 40$), then a $100(1 - \alpha)\%$ confidence interval for $\mu$ is

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}.$$

Example 1: Consider the following summary statistics for the length of the unsuccessful songs of crickets.
(a) Construct a 95% confidence interval for the true mean length of an unsuccessful song.
(b) Give a 98% interval for the population mean length of an unsuccessful song.
(c) Compare the lengths of the intervals from parts (a) and (b).

Remark: Frequentist interpretation of the confidence level. Say $1 - \alpha = 95\%$ and that the population mean is $\mu = 4$. Suppose we collect $n$ observations and compute a 95% confidence interval for $\mu$, say $[3.5, 4.2]$. Then either the value of the mean is in the interval or it is not. In this case, it is in the interval, so with probability 1, $\mu = 4$ falls in the interval $[3.5, 4.2]$. So what does the 95% represent? As we repeatedly collect other samples, the intervals will vary. Sometimes the value of the mean falls in the interval and sometimes it does not. However, if we repeat this process a large number of times, about 95% of the constructed intervals will contain the true value of the mean. So we say that we are 95% confident that $\mu$ is in our observed confidence interval, since the interval belongs to a class of intervals such that 95% of them contain the true mean.

Precision: We will use the length of the interval as a measure of the precision:

$$\left(\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) - \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 2\, z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

Remarks:
• A short interval is interpreted as a precise estimate.
• The precision is a function of the confidence level, the standard deviation $\sigma$ and the sample size $n$.
• The more dispersed the population, the less precise the estimate. Note: We cannot manipulate $\sigma$.
• If we increase the level of confidence, then the estimate is less precise.
• If we increase the sample size, then the estimate is more precise.
• We want both a precise estimate and a high level of confidence. To achieve both, we fix the level of confidence at a high level, say 95%, and then choose the appropriate sample size to control the precision.

Sample Size

If $\bar{x}$ is used as a point estimate for $\mu$, then we are $100(1 - \alpha)\%$ confident that the error $|\bar{x} - \mu|$ will not be greater than $E$ if the sample size satisfies

$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2.$$

Remarks:
• If the bound is not an integer, then we round up to the next integer.
• If $\sigma$ is unknown, then in practice we can use information from past experiments, or perform a preliminary study, and use the corresponding sample standard deviation $s$ instead of $\sigma$ in the formula to calculate $n$.

Example 2: Suppose that the standard deviation of the length of an unsuccessful cricket song is $\sigma = 4$ minutes. How large should the sample size be in order to be 95% confident that the error of the estimate of the mean will not exceed 1 minute?

Estimating the mean from a normal population ($\sigma$ unknown)

For the particular case of a normal population, it is possible to construct a confidence interval for the mean $\mu$ even when $\sigma$ is unknown, regardless of the sample size.

Recall: Consider a random sample $X_1, \ldots, X_n$ from a normal population with mean $\mu$ and standard deviation $\sigma$. If we standardize the sample mean but use the sample standard deviation $S$ instead of the population standard deviation $\sigma$, then

$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$

does not follow a normal distribution. In fact, it follows a distribution known as Student's t with $\nu = n - 1$ degrees of freedom.

Thus, if the population is normal, a $100(1 - \alpha)\%$ confidence interval for $\mu$ is

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}.$$

Example 3: Consider the growth (in mm) of radish after three days in darkness:

15 20 22 20 29 37 11 35 15 30 33 8 10 25

Below we find a quantile-quantile plot.

[Figure: normal quantile-quantile plot of the radish growth data]

(a) Does it appear reasonable to model the distribution of the radish growth with a normal distribution?
(b) Using the following summary data, produce a 95% confidence interval for the mean growth.
(c) Use the following commands in Minitab to produce a 95% confidence interval for the mean growth. We assume that the observations are in column C1.

    MTB > onet c1;
    SUBC> Confidence 95.

Estimating a population proportion $p$

Consider a sample proportion $\hat{P} = X/n$, where $X$ is the number of successes among $n$ independent trials, so $X \sim B(n, p)$. But $X/n$ is an average of 0s and 1s, so by the central limit theorem

$$\hat{P} \sim N\left(\mu_{\hat{P}}, \sigma^2_{\hat{P}}\right) = N\left(p, \frac{p(1 - p)}{n}\right) \quad \text{approximately}.$$

So

$$1 - \alpha \approx P\left(-z_{\alpha/2} \le \frac{\hat{P} - p}{\sqrt{p(1 - p)/n}} \le z_{\alpha/2}\right) = P\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} \le p \le \hat{P} + z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}\right).$$

Thus, for large $n$, a $100(1 - \alpha)\%$ confidence interval for the population proportion $p$ is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}.$$

Problem: In the above formula, the confidence interval uses the unknown parameter $p$. In practice, we use its point estimate $\hat{p}$ instead to get an approximate confidence interval.

For large $n$, judged by the rule of thumb $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$, a $100(1 - \alpha)\%$ confidence interval for $p$ is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}.$$

Example 4: Consider the following table concerning the location of nests of sparrows.

    location    frequency
    vines       56
    building    60
    low tree    46
    cavities    49

Compute a 95% confidence interval for the population proportion of sparrows that build nests in vines.

Sample size: If we use $\hat{p}$ as a point estimate for $p$, we are $100(1 - \alpha)\%$ confident that the error $|\hat{p} - p|$ will not exceed $E$ when the sample size satisfies

$$n \ge \left(\frac{z_{\alpha/2}}{E}\right)^2 p(1 - p).$$

Problem: The formula involves the unknown quantity $p$.

Solution: Consider $p(1 - p) = p - p^2$.
It is a quadratic function that opens downward, with zeros at $p = 0$ and $p = 1$, so the axis of symmetry of the parabola is $p = 1/2$. Thus, the maximum value of $p(1 - p)$ is $\frac{1}{2}\left(1 - \frac{1}{2}\right) = 0.25$. If we use the value of $n$ computed at $p = 1/2$, then we obtain a value that is at least as large as the sample size required for the true proportion $p$.

Thus, if $\hat{p}$ is an estimate for $p$, we are at least $100(1 - \alpha)\%$ confident that the error $|\hat{p} - p|$ will not be greater than $E$ with a sample size

$$n \ge \left(\frac{z_{\alpha/2}}{E}\right)^2 (0.25).$$

Example 5: Suppose that we would like to be 95% confident that the error in the estimation of a population proportion $p$ is at most 0.025. Compute the required sample size.
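The large-sample formulas in these notes can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names are hypothetical, and the normal quantile is taken from the standard library's `statistics.NormalDist` rather than from Table 17.4. It covers only the z-based interval and sample-size formulas; the Student t interval is left out because the standard library has no t quantile function.

```python
from math import ceil, sqrt
from statistics import NormalDist


def upper_quantile(a):
    """Upper quantile z_A of N(0, 1): the value with P(Z > z_A) = a."""
    return NormalDist().inv_cdf(1.0 - a)


def mean_interval(xbar, sd, n, level=0.95):
    """z-based interval xbar +/- z_{alpha/2} * sd / sqrt(n).

    sd is sigma if it is known, or s in the large-sample case.
    """
    z = upper_quantile((1.0 - level) / 2.0)
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin


def proportion_interval(successes, n, level=0.95):
    """Approximate interval p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = upper_quantile((1.0 - level) / 2.0)
    margin = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_mean(sigma, max_error, level=0.95):
    """Smallest integer n with n >= (z_{alpha/2} * sigma / E)^2."""
    z = upper_quantile((1.0 - level) / 2.0)
    return ceil((z * sigma / max_error) ** 2)


def sample_size_proportion(max_error, level=0.95, p=0.5):
    """Smallest integer n with n >= (z_{alpha/2} / E)^2 * p (1 - p).

    The default p = 0.5 gives the conservative bound with p (1 - p) = 0.25.
    """
    z = upper_quantile((1.0 - level) / 2.0)
    return ceil((z / max_error) ** 2 * p * (1.0 - p))


# Settings taken from Example 2 (sigma = 4, E = 1) and Example 5 (E = 0.025).
print(sample_size_mean(4, 1))         # 62
print(sample_size_proportion(0.025))  # 1537
# Setting of Example 4: 56 nests in vines out of 56 + 60 + 46 + 49 = 211.
print(proportion_interval(56, 211))
```

Because the quantile is computed rather than read from a table, results can differ slightly from hand calculations that use rounded values such as 1.645 or 1.96.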
