Chapter 7: Estimates, Confidence Intervals, and Sample Sizes
Inferences Based on a Single Sample: Estimation with Confidence Intervals

Learning Objectives
• State what is estimated
• Distinguish point and interval estimates
• Explain interval estimates
• Estimate a population mean when σ is known
• Estimate a population mean when σ is not known
• Estimate a population proportion
• Compute sample size

Statistical Methods
Statistical methods divide into descriptive statistics and inferential statistics. Inferential statistics in turn comprises estimation and hypothesis testing.

Estimation Process
The population mean μ is unknown. We draw a random sample and compute its mean, X̄ = 79, and conclude: "I am 95% confident that μ is between 75 and 83."

Unknown Population Parameters Are Estimated
Each population parameter is estimated with a sample statistic:
• Mean: μ is estimated by X̄
• Proportion: p is estimated by p̂
• Variance: σ² is estimated by s²

National Unemployment Rates (%), 2008-2015
Source: Bureau of Labor Statistics

Year   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2015   5.7   5.5   5.5   5.4   5.5   5.3   5.3   5.1   5.1
2014   6.6   6.7   6.7   6.3   6.3   6.1   6.2   6.1   5.9   5.8   5.8   5.6
2013   7.9   7.7   7.5   7.5   7.5   7.5   7.3   7.2   7.2   7.2   7.0   6.7
2012   8.3   8.3   8.2   8.1   8.2   8.2   8.2   8.1   7.8   7.9   7.8   7.8
2011   9.0   8.9   8.8   9.0   9.1   9.2   9.1   9.1   9.1   9.0   8.6   8.5
2010   9.7   9.7   9.7   9.9   9.7   9.5   9.5   9.6   9.6   9.6   9.8   9.4
2009   7.6   8.1   8.5   8.9   9.4   9.5   9.4   9.7   9.8  10.2  10.0  10.0
2008   4.9   4.8   5.1   5.0   5.5   5.6   5.8   6.2   6.2   6.6   6.8   7.2

Point Estimation
Estimation takes two forms: point estimation and interval estimation.
A point estimate of a parameter (μ, p, or σ²) is the value of a statistic (X̄, p̂, or s²) used to estimate that parameter. A point estimate:
1. provides a single value;
2. is based on observations from one sample;
3. gives no information about how close the value is to the unknown population parameter.
Example: the sample mean X̄ = 79 is a point estimate of the unknown population mean μ.

Interval Estimation
The sample mean rarely equals the population mean; that is, sampling error is to be expected.
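As a rough illustration of why sampling error is expected, the Python sketch below draws many random samples from a hypothetical normal population and records the point estimate X̄ of each one. The population values (μ = 70, σ = 10) and the sample size n = 25 are assumptions chosen for illustration, not values from the slides. If the reasoning above holds, no sample mean should equal μ exactly, the sample means should center near μ, and their spread should be close to σ/√n = 2.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` random samples of size n from a Normal(mu, sigma) population
    and return the sample mean (the point estimate of mu) of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]

# Hypothetical population (assumed for illustration): mu = 70, sigma = 10, n = 25.
means = simulate_sample_means(mu=70, sigma=10, n=25, reps=5000)

exact_hits = sum(m == 70 for m in means)  # samples whose mean equals mu exactly
center = statistics.fmean(means)          # should be close to mu = 70
spread = statistics.stdev(means)          # should be close to sigma / sqrt(n) = 2
print(f"exact hits: {exact_hits}, center: {center:.2f}, spread: {spread:.2f}")
```

The individual estimates scatter around μ, which is exactly the uncertainty an interval estimate is meant to report.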
Therefore, in addition to the point estimate for μ, we need to provide some information that indicates the accuracy of the point estimate. We do so by giving a confidence-interval estimate for μ.

Interval Estimation
A confidence interval, or interval estimate, is a range of values used to estimate the true value of a population parameter. The interval is obtained from a point estimate of the parameter together with a percentage that specifies the probability that the interval-construction procedure produces an interval containing the population parameter. This percentage is called the confidence level of the interval.

Confidence Level (CL)
The confidence level is the probability (written as a decimal) that the confidence interval contains the unknown population parameter. The CL is denoted 1 − α; that is, α is the probability that the parameter is not within the confidence interval. Typical confidence levels are 99%, 95%, and 90%, so the corresponding values of α are 0.01, 0.05, and 0.10.

Key Elements of the CI
• The center of the interval is the point estimate X̄.
• E, called the margin of error, is half the length of the confidence interval.
• E is determined by α, σ, and the sample size n.
Over repeated samples, the interval contains the parameter μ (1 − α)100% of the time:
$$P(\bar{X} - E \le \mu \le \bar{X} + E) = 1 - \alpha$$

Margin of Error and CI
An interval estimate:
1. provides a range of values;
2. is based on observations from one sample;
3. gives information about how close the estimate is to the unknown population parameter;
4. is stated in terms of probability.
Example: the unknown population mean lies between 66 and 70 with 95% confidence.
Sampling-distribution simulation: http://onlinestatbook.com/stat_sim/sampling_dist/index.html

Margin of Error or Interval Width
Factors Affecting Interval Width
1. As the data dispersion, measured by σ, increases, the margin of error E (and hence the width) increases.
2. As the sample size n increases, E decreases.
3. As the confidence level (1 − α)100% increases, the width increases, because $z_{\alpha/2}$ increases.

Confidence Interval Estimates
• Mean: σ known or σ unknown
• Proportion
• Variance

CI for the Mean (σ Known)
1. Assumptions:
• The population standard deviation σ is known.
• The population is normally distributed, or the sampling distribution can be approximated by a normal distribution (n ≥ 30).
2. Confidence interval estimate:
$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Example 1
The mean of a random sample of n = 25 is X̄ = 50. Set up a 95% confidence interval estimate for μ if σ = 10.
$$\bar{X} - z_{0.05/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{0.05/2}\frac{\sigma}{\sqrt{n}}$$
$$50 - 1.96\frac{10}{\sqrt{25}} \le \mu \le 50 + 1.96\frac{10}{\sqrt{25}}$$
$$46.08 \le \mu \le 53.92$$

Example 2
You are a quality control inspector for Norton. The σ for 2-liter bottles is 0.05 liters. A random sample of 100 bottles showed X̄ = 1.99 liters. What is the 90% confidence interval estimate of the true mean amount in 2-liter bottles?
Solution
$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$1.99 - 1.645\frac{0.05}{\sqrt{100}} \le \mu \le 1.99 + 1.645\frac{0.05}{\sqrt{100}}$$
$$1.982 \le \mu \le 1.998$$

CI for the Mean (σ Unknown)
If X is a normally distributed variable with mean μ and standard deviation σ, then, for samples of size n, the variable X̄ is also normally distributed, with mean μ and standard deviation σ/√n. Equivalently, the standardized version of X̄,
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}},$$
has the standard normal distribution.

In practice σ is unknown, so we cannot base our CI procedure on the standardized version of X̄. The best we can do is estimate σ by the sample standard deviation s, replace σ with s in the equation for z, and base our CI procedure on the new variable
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}.$$

t-Distributions and t-Curves
The t-distribution depends on n: for each sample size n there is a different t-curve, with n − 1 degrees of freedom. Notice that to find a t-value you need to compute the sample mean and the sample standard deviation, which usually requires a calculator.
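The two interval formulas for the mean can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the textbook's procedure: $z_{\alpha/2}$ comes from the standard library's `statistics.NormalDist`, while the t-value, which the slides read from Table IV, is computed here by integrating the t density numerically (Simpson's rule) and bisecting. That numerical approach is a substitute chosen only to avoid third-party libraries. All function names are hypothetical, and `z_coverage` uses an assumed population (μ = 70, σ = 10, n = 25) to check that roughly 95% of 95% intervals contain μ.

```python
import math
import random
import statistics
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """CI for mu with sigma known: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}: area alpha/2 to its right
    margin = z * sigma / math.sqrt(n)        # margin of error E
    return xbar - margin, xbar + margin

def t_right_area(t, df, steps=2000):
    """Area to the right of t >= 0 under the t-curve with df degrees of freedom:
    0.5 minus a Simpson's-rule integral of the t density over [0, t]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps                            # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def t_value(area_right, df):
    """t-value having the given area (0 < area_right < 0.5) to its right, by bisection."""
    lo, hi = 0.0, 1.0
    while t_right_area(hi, df) > area_right:  # widen until the answer is bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_right_area(mid, df) > area_right:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, confidence):
    """CI for mu with sigma unknown: xbar +/- t_{alpha/2, n-1} * s / sqrt(n)."""
    alpha = 1 - confidence
    margin = t_value(alpha / 2, n - 1) * s / math.sqrt(n)
    return xbar - margin, xbar + margin

def z_coverage(mu, sigma, n, confidence, reps=2000, seed=1):
    """Fraction of simulated z-intervals that contain mu (should be near the CL)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        lo, hi = z_interval(xbar, sigma, n, confidence)
        hits += lo <= mu <= hi
    return hits / reps

print(z_interval(50, 10, 25, 0.95))       # Example 1, sigma known: about (46.08, 53.92)
print(z_interval(1.99, 0.05, 100, 0.90))  # Example 2, sigma known: about (1.982, 1.998)
print(round(t_value(0.05, 13), 3))        # 13 df, area 0.05 to the right: 1.771
print(t_interval(50, 8, 25, 0.95))        # n = 25, s = 8: about (46.70, 53.30)
times = [3.6, 4.2, 4.0, 3.5, 3.8, 3.1]    # task times (minutes) from the time-study example
print(t_interval(statistics.fmean(times), statistics.stdev(times), len(times), 0.90))
print(z_coverage(mu=70, sigma=10, n=25, confidence=0.95))  # close to 0.95
```

Because exact critical values replace table values such as 1.96 or 2.015, the last printed digit can occasionally differ from a hand calculation.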
Finding the t-Value Having a Specified Area to Its Right
For a t-curve with 13 degrees of freedom, find $t_{0.05}$; that is, find the t-value having area 0.05 to its right, as shown in the figure.
To find the t-value in question, we use Table IV. For ease of reference, a portion of Table IV is repeated in the next slide. Notice that the table gives the t-score only when you know the area to its right. Unlike the table for z-scores, the t-table cannot be used to find probabilities when you know the t-score; for that, you need the t-distribution functions of a calculator. (From the table, $t_{0.05} = 1.771$ for 13 degrees of freedom.)

Example 1
A random sample of n = 25 has X̄ = 50 and s = 8. Set up a 95% confidence interval estimate for μ.
$$\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
$$50 - 2.0639\frac{8}{\sqrt{25}} \le \mu \le 50 + 2.0639\frac{8}{\sqrt{25}}$$
$$46.70 \le \mu \le 53.30$$

Example 2
You are a time-study analyst in manufacturing. You have recorded the following task times (in minutes): 3.6, 4.2, 4.0, 3.5, 3.8, 3.1. What is the 90% confidence interval estimate of the population mean task time?
Solution
X̄ = 3.7, s = 0.38987, n = 6, df = 6 − 1 = 5
s/√n = 0.38987/√6 = 0.1592, $t_{0.05} = 2.015$
$$3.7 - 2.015(0.1592) \le \mu \le 3.7 + 2.015(0.1592)$$
$$3.379 \le \mu \le 4.021$$

Finding Sample Sizes
Estimating the Sample Size
To estimate μ within a margin of error E at confidence level (1 − α)100%, the required sample size is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2,$$
rounded up to the next whole number, since a smaller n would not meet the specification.

Example 1
What sample size is needed to be 90% confident of being correct within ±5? A pilot study suggested that the standard deviation is 45.
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.645 \times 45}{5}\right)^2 = 219.2 \approx 220$$

Example 2
You work in Human Resources at Merrill Lynch. You plan to survey employees to find their average medical expenses. You want to be 95% confident that the sample mean is within ±$50. A pilot study showed that σ was about $400. What sample size do you use?
Solution
$$n = \left(\frac{z_{0.025}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 400}{50}\right)^2 = 245.86 \approx 246$$

Confidence Interval Estimates
Confidence Intervals for One Population Proportion

Proportion: Notation and Terminology
Many statistical studies are concerned with obtaining the proportion (percentage) of a population that has a specified attribute. For example, we might be interested in
• the percentage of U.S. adults who have health insurance,
• the percentage of cars in the U.S. that are imports,
• the percentage of U.S. adults who favor stricter clean-air health standards, or
• the percentage of Canadian women in the labor force.

Notice that in each of these examples, a given individual in the population either has the specified attribute or does not. This means that we are interested in an experiment with only two possible outcomes. For instance,
• a U.S. adult has health insurance or does not;
• a car in the U.S. is an import or is not;
• and so on.

We now introduce some notation and terminology used when we make inferences about a population proportion. The population proportion is denoted p, and the sample proportion is p̂ = x/n, where x is the number of members in a sample of size n that have the specified attribute.

Sometimes we refer to x as the number of successes and to n − x (the number of members in the sample that do not have the specified attribute) as the number of failures.

Notice that, for a sample of size n, the quotient p̂ = x/n is the mean number of successes per trial. That is, p̂ is the sample mean of a variable that equals 1 when a sample member has the attribute and 0 when it does not.

The Sampling Distribution of the Sample Proportion
To make inferences about a population proportion p, we need to know the sampling distribution of the sample proportion, that is, the distribution of the variable p̂. Because a proportion can always be regarded as a mean, we can use our knowledge of the sampling distribution of the sample mean to derive the sampling distribution of the sample proportion: for large n, p̂ is approximately normally distributed with mean p and standard deviation √(p(1 − p)/n).

The accuracy of the normal approximation depends on n and p. If p is close to 0.5, the approximation is quite accurate, even for moderate n. The farther p is from 0.5, the larger n must be for the approximation to be accurate.

As a rule of thumb, we use the normal approximation when np and n(1 − p) are both 5 or greater. In this section, when we say that n is large, we mean that np and n(1 − p) are both 5 or greater.

Since in practice we do not know the value of p, we replace the conditions np ≥ 5 and n(1 − p) ≥ 5 with the conditions np̂ ≥ 5 and n(1 − p̂) ≥ 5. This is the same as requiring that the number of successes x and the number of failures n − x both be 5 or greater.

CI for the Proportion p
Margin of Error or Interval Width
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
The margin of error is therefore $E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.

Estimating the Sample Size
The margin of error E and the confidence level (1 − α)100% of a CI are often specified in advance. We must then determine the sample size required to meet those specifications. Solving the formula for E for n gives
$$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$$
This formula cannot be used as it stands to obtain the required sample size, because the sample proportion p̂ is not known before sampling.

The way around this problem is to observe that the largest value p̂(1 − p̂) can take is 0.25, which occurs when p̂ = 0.5. Because 0.25 is the largest possible value of p̂(1 − p̂), the most conservative approach to determining sample size is to use that value in the equation above:
$$n = 0.25\left(\frac{z_{\alpha/2}}{E}\right)^2$$
The sample size obtained will then generally be larger than necessary, and the margin of error smaller than required. Nonetheless, this approach guarantees that the specifications will be met or bettered.
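Both sample-size formulas, $n = (z_{\alpha/2}\sigma/E)^2$ for a mean and $n = \hat{p}(1-\hat{p})(z_{\alpha/2}/E)^2$ for a proportion, can be sketched as below. Rounding up with `math.ceil` reflects that n must be a whole number at least as large as the formula's value. The function names are hypothetical, $z_{\alpha/2}$ is computed exactly rather than read from a table, and the proportion calls with E = 0.03 are illustrative assumptions, not examples from the slides.

```python
import math
from statistics import NormalDist

def z_crit(confidence):
    """z_{alpha/2} for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_mean(sigma, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin: (z*sigma/E)^2 rounded up."""
    return math.ceil((z_crit(confidence) * sigma / margin) ** 2)

def sample_size_proportion(margin, confidence, p_hat=0.5):
    """p_hat(1 - p_hat)(z/E)^2 rounded up; the default p_hat = 0.5 is the
    conservative choice, because p_hat(1 - p_hat) is largest (0.25) there."""
    return math.ceil(p_hat * (1 - p_hat) * (z_crit(confidence) / margin) ** 2)

print(sample_size_mean(45, 5, 0.90))     # pilot sigma = 45, within +/-5, 90%: 220
print(sample_size_mean(400, 50, 0.95))   # Merrill Lynch example: 246
print(sample_size_proportion(0.03, 0.95))              # hypothetical E = 0.03, conservative
print(sample_size_proportion(0.03, 0.95, p_hat=0.08))  # smaller n if p_hat is known to be near 0.08
```

The last two calls show the trade-off described above: the conservative p̂ = 0.5 never under-sizes the study, but it can require many more observations than a good prior guess of p̂ would.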
CI for the Proportion p: Examples

Example 1
A random sample of 400 graduates showed that 32 went to graduate school. Set up a 95% confidence interval estimate for the proportion p of students who go to graduate school.
p̂ = 32/400 = 0.08
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$0.08 - 1.96\sqrt{\frac{0.08(1-0.08)}{400}} \le p \le 0.08 + 1.96\sqrt{\frac{0.08(1-0.08)}{400}}$$
$$0.053 \le p \le 0.107$$

Example 2
You are a production manager for a newspaper. You want to find the percentage of defective newspapers. Of 200 newspapers, 35 had defects. What is the 90% confidence interval estimate of the population proportion of defective newspapers?
Solution
p̂ = 35/200 = 0.175
$$0.175 - 1.645\sqrt{\frac{0.175(0.825)}{200}} \le p \le 0.175 + 1.645\sqrt{\frac{0.175(0.825)}{200}}$$
$$0.1308 \le p \le 0.2192$$
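A minimal Python sketch of the large-sample interval for p, reproducing Examples 1 and 2. The function name `proportion_interval` is hypothetical. It refuses to compute an interval when the rule of thumb x ≥ 5 and n − x ≥ 5 fails, since the normal approximation is then unreliable, and it computes $z_{\alpha/2}$ exactly instead of using the table values 1.96 and 1.645.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, confidence):
    """Large-sample CI for p: p_hat +/- z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n).
    Raises ValueError unless x >= 5 and n - x >= 5 (the rule of thumb above)."""
    if x < 5 or n - x < 5:
        raise ValueError("normal approximation needs x >= 5 and n - x >= 5")
    p_hat = x / n                                         # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)      # margin of error E
    return p_hat - margin, p_hat + margin

print(proportion_interval(32, 400, 0.95))  # Example 1: about (0.053, 0.107)
print(proportion_interval(35, 200, 0.90))  # Example 2: about (0.1308, 0.2192)
```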