Chapter 7 Inferences Based on a Single Sample: Estimation with Confidence Intervals

7.1 Identifying and Estimating the Target Parameter

Definition 7.1 Target Parameter: The unknown population parameter (mean or proportion) that we are interested in estimating is called the target parameter.

Parameter   Phrases                             Data Type
μ           Mean (average)                      quantitative
p           Proportion (fraction, percentage)   qualitative

Definition 7.2 Point Estimator: A point estimator of a population parameter is a rule that tells you how to use the sample data to calculate a single number that can be used as an estimate of the population parameter. For example, the sample mean $\bar{x}$ is a point estimator for the population mean $\mu$.

Definition 7.3 Interval Estimator: An interval estimator of a population parameter is a rule that tells you how to calculate two numbers, an upper and a lower limit, based on the sample data, forming an interval within which the parameter is expected to lie. This pair of numbers is called an interval estimate or confidence interval. The larger number, located at the upper end of the interval, is called the upper confidence limit (UCL), and the smaller number, located at the lower end of the interval, is called the lower confidence limit (LCL).

Confidence width: The difference between UCL and LCL is called the confidence width. That is,

Confidence width = UCL − LCL

7.2 Confidence Interval for a Population Mean: Normal (z) Statistic

Definition 7.4 Confidence Coefficient: The confidence coefficient is the probability that a randomly selected confidence interval will enclose the parameter. It measures the proportion of samples that produce a confidence interval containing the population parameter.

A good confidence interval is as narrow as possible and has a confidence coefficient close to 1. The narrower the interval, the more exactly we have located the parameter.
The larger the confidence coefficient, the more confidence we have that a particular interval encloses the parameter.

Large Sample Confidence Interval for μ

The confidence interval for any population mean or proportion has the form

Point Estimator ± Bound (Margin of error)
Point Estimator ± Table Value × SE(Estimator)

For the population mean this becomes

\[ \bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}. \tag{7.1} \]

Using equation (7.1), the $(1-\alpha)100\%$ confidence interval (CI) for $\mu$ is obtained as

\[ \bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} = \left[\, \bar{x} - z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}},\ \ \bar{x} + z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \,\right], \tag{7.2} \]

where $z_{\alpha/2}$ is a value from the normal table such that $P(z > z_{\alpha/2}) = \alpha/2$. For example, if $\alpha = 0.05$, then $z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$ (from Normal Table IV, Appendix B). If $\sigma$ is unknown, replace it with $s$, the sample standard deviation.

Assumptions:
(1) The n observations in the sample were randomly selected from a population.
(2) The sample size is large (n ≥ 30).

Some common confidence coefficients and the corresponding z values are given in Table 7.1.

Table 7.1:
Confidence coefficient (1 − α)   α      α/2     z_{α/2}
0.80                             0.20   0.10    1.28
0.90                             0.10   0.05    1.645
0.95                             0.05   0.025   1.96
0.98                             0.02   0.01    2.33
0.99                             0.01   0.005   2.58

What do we mean by a 95% CI?

A 95% confidence interval is constructed by a method such that 95% of all the confidence intervals it produces contain the true value of the population parameter and 5% do not. For example, suppose you wish to estimate the average height of the students of this class by an interval estimate. If you draw 100 random samples from this class and construct 100 confidence intervals, you would expect about 95 of those intervals to contain the mean height (true mean or population mean) of all students of this class. Remember that in real life we draw only one sample. Being 95% confident therefore does not guarantee that your particular interval contains (captures) the population mean (true mean).
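The repeated-sampling interpretation above can be checked with a small simulation. The following is a minimal Python sketch, not part of the original notes: the population values (mean 170, standard deviation 10), the sample size 36, and the function names `z_interval` and `coverage` are illustrative assumptions. It builds many intervals from equation (7.1) with a known $\sigma$ and counts how often they capture the true mean.

```python
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, conf=0.95):
    """Large-sample CI for the mean: x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    n = len(sample)
    alpha = 1 - conf
    # z_{alpha/2} is the point with upper-tail area alpha/2 under N(0, 1).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    x_bar = mean(sample)
    bound = z * sigma / n ** 0.5  # margin of error
    return x_bar - bound, x_bar + bound


def coverage(mu=170.0, sigma=10.0, n=36, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean mu.

    mu, sigma and n are hypothetical values chosen only for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lcl, ucl = z_interval(sample, sigma, conf)
        if lcl <= mu <= ucl:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(round(NormalDist().inv_cdf(0.975), 2))  # table value for 95%
    print(coverage())  # should be close to 0.95
```

The reported fraction should be close to, but usually not exactly, 0.95, which is the point of the interpretation: the confidence coefficient describes the method over many samples, not any single interval.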
Example 7.3, page 281: The estimate of the mean number of unoccupied seats is $\bar{x} = 11.6$. The margin of error is

\[ 1.645 \times \frac{\sigma}{\sqrt{n}} = 1.645 \times \frac{4.1}{\sqrt{225}} = 0.45. \]

Conclusion: We are 90% confident that our estimate of 11.6 is within 0.45 of the true mean number of unoccupied seats.

The 90% confidence limits for the true mean number of unoccupied seats are

\[ \bar{x} \pm 1.645 \times \frac{\sigma}{\sqrt{n}} = 11.6 \pm 1.645 \times \frac{4.1}{\sqrt{225}} = 11.6 \pm 0.45 = [11.15,\ 12.05]. \]

Conclusion: We are 90% confident that the true mean number of unoccupied seats ($\mu$) lies between 11.15 and 12.05.

Exercise 7.4, page 283. Exercise 7.10, page 284.

7.3 Confidence Interval for a Population Mean: Student's t-Statistic

Basic Idea: If x is normally distributed with mean $\mu$ and standard deviation $\sigma$, then

\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]

is distributed as standard normal ($z \sim N(0, 1)$), and the percentile points of this distribution are presented in Table IV of Appendix B. However, if $\sigma$ is unknown, we estimate it by the sample standard deviation $s$, and we have the new statistic

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}}. \]

Then t is distributed as Student's t with $(n-1)$ degrees of freedom. See Figure 7.7, page 287, for both the normal (z) and Student's t (with 4 degrees of freedom) distribution functions. The percentile points of the t distribution are presented in Table V (Appendix B, page 467) for various degrees of freedom.

Assumptions:
1. The n observations were randomly selected from a population.
2. Data are from a normal population.

The $(1-\alpha)100\%$ confidence interval (CI) for $\mu$ is $\bar{x} \pm t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}}$, that is,

\[ \bar{x} - t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}} \ \le\ \mu \ \le\ \bar{x} + t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}}, \tag{7.3} \]

where $t_{\alpha/2,\,n-1}$ is the percentage point of the t distribution with $(n-1)$ degrees of freedom such that $P(t_{n-1} \ge t_{\alpha/2,\,n-1}) = \alpha/2$.

Example 7.4, page 288. Example 7.5, page 290. Exercise 7.26, page 293. Exercise 7.30, page 294.

7.4 Large Sample Confidence Interval for a Population Proportion

Assumptions:
(1) The n observations (measurements) in the sample were randomly selected from a binomial population.
(2) Sample size is large, i.e., n ≥ 30.

Point Estimation of p:

Point Estimator:

\[ \hat{p} = \frac{\text{total number of successes among } n \text{ trials}}{n} = \frac{x}{n} \]

Example 7.6, page 296.

Sampling Distribution of $\hat{p}$

1. The mean of the sampling distribution of $\hat{p}$ is $p$. That is, $E(\hat{p}) = p$.
2. The standard deviation of $\hat{p}$ is $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$.
3. By the central limit theorem, $\hat{p} \approx N\!\left(p, \frac{pq}{n}\right)$, that is, $\hat{p}$ is approximately normal with mean $p$ and variance $pq/n$.

The $100(1-\alpha)\%$ confidence interval for the population proportion $p$ is $\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}\hat{q}}{n}}$, that is,

\[ \hat{p} - z_{\alpha/2} \times \sqrt{\frac{\hat{p}\hat{q}}{n}} \ \le\ p \ \le\ \hat{p} + z_{\alpha/2} \times \sqrt{\frac{\hat{p}\hat{q}}{n}}, \tag{7.4} \]

where $\hat{q} = 1 - \hat{p}$.

Example 7.7, page 298.

Extra Example 1: A statistician wants to estimate the proportion of female students at FIU. He randomly sampled 200 students from a total of 32,000 students and found that 88 are female (n = 200 and N = 32,000).
(a) Define the population of interest in the survey.
(b) Construct a 95% confidence interval for p, the population proportion of female students at FIU.
(c) The administrator states that 45% of the students at FIU are female. Does the confidence interval constructed in (b) support the administrator's claim? Why or why not?

Exercise 7.38, page 301. Exercise 7.46, page 302.
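The interval in equation (7.4) can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original notes: the function name `proportion_interval` is hypothetical, and the counts x = 88 and n = 200 are taken from Extra Example 1. The table value $z_{\alpha/2}$ is computed from the standard normal distribution instead of being read from Table IV.

```python
from statistics import NormalDist


def proportion_interval(x, n, conf=0.95):
    """Large-sample CI for a proportion: p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n          # point estimator of p
    q_hat = 1 - p_hat
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}
    bound = z * (p_hat * q_hat / n) ** 0.5        # margin of error
    return p_hat - bound, p_hat + bound


if __name__ == "__main__":
    # Counts from Extra Example 1: 88 females among 200 sampled students.
    lcl, ucl = proportion_interval(88, 200, conf=0.95)
    print(f"95% CI for p: [{lcl:.3f}, {ucl:.3f}]")
    # Check whether a claimed value such as 0.45 falls inside the interval.
    print(lcl <= 0.45 <= ucl)
```

With these counts the interval runs from roughly 0.371 to 0.509, so a claimed value of 0.45 lies inside it and the sample does not contradict that claim at the 95% level.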