Instruction: Confidence Interval Estimates

In this lecture, we begin making statistical inferences, that is, we begin forming conclusions about the population based on a sample. In particular, we will use \(\bar{X}\) to estimate the population mean, \(\mu\). If a single sample statistic is used to estimate a population parameter, it is called a point estimate. Of course, we would not expect a single sample's mean to equal the population mean exactly. Statistical methods have been developed that build an interval of values around a point estimate in such a way that it is highly likely that the population parameter falls within the interval. We call these intervals confidence interval estimates.

A point estimate is a single sample statistic used to estimate a population parameter.

A confidence interval estimate is an interval of values for which the probability that a population parameter is within that interval is significant.

Note the modifier "significant" in the definition of confidence interval estimate. What counts as significant is left to the statistician performing the estimation. Thus, the level of significance or confidence is an arbitrary probability value. Usually, the level of "significance" (called the confidence level) is some constant \(c\) close to one, such as 0.90, 0.95, or 0.99. In each case, \(Z_c\) is the number such that the area under the standard normal curve between \(-Z_c\) and \(Z_c\) is equal to \(c\). Since area under the standard normal curve corresponds to probability, we have the desired probability for the chosen confidence level.

If \(c\) is the confidence level and \(x\) is the point estimate, with Z-score \(Z_x\), then

\[ P(-Z_c \le Z_x \le Z_c) = c \]

The value \(Z_c\) is called the critical value for the confidence level of \(c\).

Consider the probability associated with the confidence level \(c\).

\[ P(-Z_c \le Z_x \le Z_c) = c \]

This probability is associated with the Z-score of a single sample statistic. Assume that the single sample statistic is a mean.
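As a quick numerical check of the definition of the critical value, the sketch below computes \(Z_c\) for the three common confidence levels using Python's standard library. The function name `critical_value` is hypothetical, chosen only for this illustration. It relies on the fact that a central area of \(c\) leaves \((1 - c)/2\) in each tail, so \(Z_c\) sits at cumulative area \((c + 1)/2\).

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_value(c: float) -> float:
    """Return Z_c such that the area between -Z_c and Z_c equals c.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < c < 1.0:
        raise ValueError("confidence level must be strictly between 0 and 1")
    # A central area of c leaves (1 - c) / 2 in each tail, so Z_c is the
    # point whose cumulative area is c + (1 - c) / 2 = (c + 1) / 2.
    return STANDARD_NORMAL.inv_cdf((c + 1) / 2)


for c in (0.90, 0.95, 0.99):
    z = critical_value(c)
    # Verify the defining property: the area between -z and z is c.
    area = STANDARD_NORMAL.cdf(z) - STANDARD_NORMAL.cdf(-z)
    print(f"c = {c:.2f}   Z_c = {z:.3f}   area between -Z_c and Z_c = {area:.4f}")
```

For \(c = 0.95\) this gives a critical value of about 1.96, the value commonly read from a standard normal table.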
By the Central Limit Theorem, the sample mean \(\bar{X}\) has a distribution that is approximately normal with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\), where \(n\) is the sample size. Thus a single sample mean can be converted to a standard Z-score as below.

\[ Z_{\bar{X}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]

Substituting \(Z_{\bar{X}}\) for \(Z_x\) in the probability for \(c\) gives

\[ P\left( -Z_c \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le Z_c \right) = c. \]

Multiplying all parts of the inequality above by \(-\sigma/\sqrt{n}\), which reverses the direction of the inequalities, gives

\[ P\left( -Z_c \cdot \frac{\sigma}{\sqrt{n}} \le \mu - \bar{X} \le Z_c \cdot \frac{\sigma}{\sqrt{n}} \right) = c. \]

Adding \(\bar{X}\) to all sides of the inequality gives

\[ P\left( \bar{X} - Z_c \cdot \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_c \cdot \frac{\sigma}{\sqrt{n}} \right) = c. \]

The inequality itself is called the confidence interval for a population mean with known standard deviation.

The confidence interval for \(\mu\) with known \(\sigma\) is

\[ \bar{X} - Z_c \cdot \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_c \cdot \frac{\sigma}{\sqrt{n}} \]

where \(Z_c\), the critical value for the confidence level of \(c\), is the value corresponding to the cumulative area of \(\frac{c + 1}{2}\) from the standardized normal distribution.

Let's calculate a confidence interval estimate for the population mean with known population standard deviation. Consider the case of a large national investment firm whose board wants to know the average amount invested by a client during the previous five years. Statisticians working for the firm take a random sample of 400 client files for the five-year period. The mean amount invested for this sample of 400 clients equals $5,250 and the known standard deviation for all the investments during the five years is $800.00. With this given information, we can construct a 0.95 confidence interval for \(\mu\). Our arbitrary confidence level, \(c\), is 0.95. Thus, the critical value, \(Z_c\), is the value corresponding to the cumulative area of \(\frac{0.95 + 1}{2} = 0.975\), which is 1.96 as shown on a table like the table below.

Z             1.8      1.9      2.0
A for 0.00    0.9641   0.9713   0.9772
A for 0.01    0.9649   0.9719   0.9778
A for 0.02    0.9656   0.9726   0.9783
A for 0.03    0.9664   0.9732   0.9788
A for 0.04    0.9671   0.9738   0.9793
A for 0.05    0.9678   0.9744   0.9798
A for 0.06    0.9686   0.9750   0.9803

Each row adds the stated hundredths digit to Z, so the cumulative area 0.9750 for Z = 1.96 is found under 1.9 in the row for 0.06.

Now, the confidence interval can be calculated as below.
\[ \$5{,}250 - 1.96 \cdot \frac{\$800}{\sqrt{400}} \le \mu \le \$5{,}250 + 1.96 \cdot \frac{\$800}{\sqrt{400}} \]

\[ \$5{,}250 - \$78.40 \le \mu \le \$5{,}250 + \$78.40 \]

\[ \$5{,}171.60 \le \mu \le \$5{,}328.40 \]

The interval from $5,171.60 to $5,328.40 is the 0.95 confidence interval for \(\mu\). The firm's board can be 95% confident that the mean amount invested by clients during the previous five years lies between $5,171.60 and $5,328.40.

The astute reader will note that the above example is not very practical, because it is rare for the population standard deviation to be known when the population mean is unknown. When \(\sigma\) is unknown, we construct confidence intervals using the sample standard deviation, \(S\), and a distribution that approximates the standardized normal distribution, called the Student's t distribution. This distribution changes slightly for varying degrees of freedom, a value related to the sample size.* For any number of degrees of freedom the Student's t distribution approximates the standardized normal distribution (more closely as the degrees of freedom increase), and we can construct a confidence interval for a population mean with unknown standard deviation.

The confidence interval for \(\mu\) with unknown \(\sigma\) is

\[ \bar{X} - t_{n-1} \cdot \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1} \cdot \frac{S}{\sqrt{n}} \]

where \(t_{n-1}\), the critical value for the confidence level of \(c\), is the value corresponding to the upper-tail area of \(\frac{1 - c}{2}\) from the t distribution with \(n - 1\) degrees of freedom.

Let's calculate a confidence interval estimate for the population mean with unknown population standard deviation. Consider the case of Atlas International, a company that manufactures forklifts. The company has a new assembly line and the managers are interested in knowing the mean lift capacity of the forklifts it produces. For obvious expense reasons, the number of forklifts available for study is limited. Statisticians working for the company select twelve forklifts from a trial run of the new assembly line.
The mean lift capacity for these twelve forklifts equals three tons and the sample standard deviation is 0.25 tons. With this given information, we can construct a 0.95 confidence interval for \(\mu\). Our arbitrary confidence level, \(c\), is 0.95. The degrees of freedom are \(12 - 1 = 11\). Thus, the critical value, \(t_{11}\), is the value corresponding to the upper-tail area of \(\frac{1 - 0.95}{2} = 0.025\), which is 2.201 as shown on a table like the table below.

degrees of freedom                10       11       12
t for upper-tail area of 0.25     0.6998   0.6974   0.6955
t for upper-tail area of 0.10     1.3722   1.3634   1.3562
t for upper-tail area of 0.05     1.8125   1.7959   1.7823
t for upper-tail area of 0.025    2.2281   2.2010   2.1788
t for upper-tail area of 0.01     2.7638   2.7181   2.6810
t for upper-tail area of 0.005    3.1693   3.1058   3.0545

Now, the confidence interval can be calculated as below.

\[ 3 - 2.201 \cdot \frac{0.25}{\sqrt{12}} \le \mu \le 3 + 2.201 \cdot \frac{0.25}{\sqrt{12}} \]

\[ 3 - 0.159 \le \mu \le 3 + 0.159 \]

\[ 2.841 \le \mu \le 3.159 \]

The interval from 2.841 tons to 3.159 tons is the 0.95 confidence interval for \(\mu\). The assembly line managers can be 95% confident that the mean lift capacity of the forklifts produced by the new assembly line lies between 2.841 tons and 3.159 tons.

* The degrees of freedom are the number of values in the sample that are free to vary once the sample mean is fixed. When \(n\) values must sum to \(n\) times the sample mean, the first \(n - 1\) values can be chosen freely and the last one is then determined.

The previous two examples find confidence intervals for \(\mu\), once with \(\sigma\) known, once with \(\sigma\) unknown. Naturally, confidence intervals can be constructed for other parameters as well. In particular, we can construct confidence intervals for the population proportion, \(\pi\), using the interval detailed below.

The confidence interval for \(\pi\) is

\[ p - Z_c \cdot \sqrt{\frac{p(1 - p)}{n}} \le \pi \le p + Z_c \cdot \sqrt{\frac{p(1 - p)}{n}} \]

where \(p\) is the sample proportion, \(n\) is the sample size, and \(Z_c\), the critical value for the confidence level of \(c\), is the value corresponding to the cumulative area of \(\frac{c + 1}{2}\) from the standardized normal distribution.
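The three intervals in this lecture can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names (`z_critical`, `mean_interval`, `proportion_interval`) are hypothetical, the t critical value is passed in by hand from the table because Python's standard library provides a normal distribution but not a Student's t distribution, and the proportion figures (p = 0.40, n = 150) are made up purely for demonstration.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(c):
    """Z_c: the standard normal value at cumulative area (c + 1) / 2."""
    return NormalDist().inv_cdf((c + 1) / 2)


def mean_interval(xbar, sd, n, crit):
    """Interval xbar -/+ crit * sd / sqrt(n).

    Pass sd = sigma with crit = Z_c when sigma is known, or
    sd = S with crit = t_(n-1) (read from a t table) when it is not.
    """
    margin = crit * sd / sqrt(n)
    return xbar - margin, xbar + margin


def proportion_interval(p, n, c):
    """Interval p -/+ Z_c * sqrt(p * (1 - p) / n) for a proportion."""
    margin = z_critical(c) * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Investment firm example: sigma known, so use Z_c for c = 0.95.
low, high = mean_interval(5250, 800, 400, z_critical(0.95))
print(f"known sigma:   {low:.2f} to {high:.2f} dollars")

# Forklift example: sigma unknown, t_11 = 2.201 taken from the table.
low, high = mean_interval(3, 0.25, 12, 2.201)
print(f"unknown sigma: {low:.3f} to {high:.3f} tons")

# Proportion: hypothetical data, NOT from the lecture.
low, high = proportion_interval(0.40, 150, 0.95)
print(f"proportion:    {low:.3f} to {high:.3f}")
```

The first two calls should reproduce the intervals worked out by hand above, up to rounding of the critical value. A single `mean_interval` function serves both cases because the known-σ and unknown-σ formulas differ only in which standard deviation and which critical value are supplied.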
Assignment 7

Problems

For the following problems, assume that \(S \approx \sigma\) as long as the sample size is at least thirty.

#1 A random sample of 40 cups of coffee dispensed from an automatic vending machine showed that the amount of coffee the machine gave was \(\bar{X} = 7.1\) ounces with standard deviation \(S = 0.3\) ounces. Find a 90% confidence interval for the population mean of the amount of coffee dispensed by the machine.

#2 A hospital's chief inspector wants to estimate the average number of days a patient stays in the mental health ward. A random sample of 100 patients shows the average stay to be 5.2 days with standard deviation of 1.9 days. Find a 0.90 confidence interval for the mean number of days a patient stays in the ward.

#3 In wine making, acidity of the grape is a crucial factor. A pH range of 3.1 to 3.6 is considered very acceptable. A random sample of twelve bunches of ripe grapes was taken from a particular vineyard. For each bunch of grapes the acidity as measured by pH level was found to be:

3.2 3.5 3.7 3.3 3.4 3.6 3.6 3.1 3.5 3.2 3.1 3.4

Find a 99% confidence interval for the mean acidity of the entire harvest of grapes from the vineyard of interest.

#4 A random sample of 100 felony trials in San Diego shows the mean waiting time between arrest and trial is 173 days with standard deviation 28 days. Find a 0.99 confidence interval for the mean number of days between arrest and trial.

#5 An anthropologist is studying a large pre-historic communal dwelling in northern Arizona. A random sample of 127 individual family dwellings showed signs that nineteen belonged to the Sun Clan. Let \(\pi\) be the probability that a dwelling selected at random was a dwelling of a Sun Clan member. Find a point estimate for \(\pi\) and a confidence interval estimate for \(\pi\).