Chapter 12
Inference About One Population

Introduction

We shall develop techniques to estimate and test three population parameters:
• Population mean $\mu$
• Population variance $\sigma^2$
• Population proportion $p$

Inference About a Population Mean When the Population Standard Deviation Is Unknown

Recall that when $\sigma$ is known we use the following statistic to estimate and test a population mean:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

When $\sigma$ is unknown, we use its point estimator $s$, and the z-statistic is replaced by the t-statistic.

The t-Statistic

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

The t distribution is mound-shaped and symmetrical around zero. The "degrees of freedom" (a function of the sample size) determine how spread out the distribution is compared to the normal distribution.

[Figure: t distributions with d.f. = $v_1$ and d.f. = $v_2$, where $v_1 < v_2$]

How to calculate the sample variance

From the data we have $\sum x_i$ and $\sum x_i^2$, thus

$$s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1}$$

Testing $\mu$ when $\sigma$ is unknown

Example 1
In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied. It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring. Can we conclude that this belief is correct, based on productivity observations of 50 trainees?

Example 1: Solution
The problem objective is to describe the population of the number of packages processed in one hour. The data are interval.

$H_0: \mu = 450$
$H_1: \mu > 450$

The t statistic is

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad \text{d.f.} = n - 1 = 49$$

Solution continued (solving by hand)
The rejection region is

$$t > t_{\alpha, n-1} = t_{.05, 49} \approx t_{.05, 50} = 1.676$$

From the data we have $\sum x_i = 23{,}019$ and $\sum x_i^2 = 10{,}671{,}357$, thus

$$\bar{x} = \frac{23{,}019}{50} = 460.38$$

$$s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = 1507.55, \qquad s = \sqrt{1507.55} = 38.83$$

The test statistic is

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{460.38 - 450}{38.83 / \sqrt{50}} = 1.89$$

[Figure: rejection region $t > 1.676$, with the test statistic 1.89 inside it]

Since 1.89 > 1.676 we reject the null hypothesis in favor of the alternative.
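The hand calculation for Example 1 can be retraced with a short script. This is a minimal sketch that uses only the sums quoted above and takes the critical value 1.676 from the t table rather than computing it; the variable names are illustrative and not part of the original slides.

```python
import math

# Summary figures quoted in Example 1
n = 50                # number of trainees sampled
sum_x = 23_019        # sum of the observations
sum_x2 = 10_671_357   # sum of the squared observations
mu0 = 450             # value of the mean under H0
t_crit = 1.676        # table value t_{.05,50}, used as an approximation to t_{.05,49}

# Sample mean and the shortcut formula for the sample variance
x_bar = sum_x / n
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)
s = math.sqrt(s2)

# t statistic and the one-sided (right-tail) decision rule
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
reject_h0 = t_stat > t_crit

print(f"x_bar = {x_bar:.2f}, s^2 = {s2:.2f}, s = {s:.2f}")
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The printed values can be compared line by line with the figures in the hand solution.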
There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages at the .05 significance level.

Estimating $\mu$ when $\sigma$ is unknown

Confidence interval estimator of $\mu$ when $\sigma$ is unknown:

$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1$$

Example 2
An investor is trying to estimate the return on investment in companies that won quality awards last year. A random sample of 83 such companies is selected, and the return on investment is calculated had he invested in them. Construct a 95% confidence interval for the mean return.

Solution (solving by hand)
The problem objective is to describe the population of annual returns from buying shares of quality award-winners. The data are interval.

From the data we determine

$$\bar{x} = 15.02, \qquad s^2 = 68.98, \qquad s = \sqrt{68.98} = 8.31$$

With $t_{.025, 82} \approx t_{.025, 80} = 1.990$,

$$\bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} \approx 15.02 \pm 1.990 \, \frac{8.31}{\sqrt{83}} = [13.19, 16.85]$$

Checking the required conditions
We need to check that the population is normally distributed, or at least not extremely nonnormal. There are statistical methods to test for normality (one to be introduced later in the book). From the sample histograms we see…

[Figure: A Histogram for Example 1 (horizontal axis: Packages)]
[Figure: A Histogram for Example 2 (horizontal axis: Returns)]

Summary of Test Statistics to be Used in a Hypothesis Test about a Population Mean

Is n > 30?
  Yes: is $\sigma$ known?
    Yes: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
    No: use $s$ to estimate $\sigma$, $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
  No: is the population approximately normal?
    Yes: is $\sigma$ known?
      Yes: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
      No: use $s$ to estimate $\sigma$, $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
    No: increase n to > 30

[Slides: Example 1 Solution, Example 2 Solution, Example 3 Solution]

Inference About a Population Variance

Sometimes we are interested in making inference about the variability of processes. Examples:
• The consistency of a production process for quality control purposes.
• Investors use variance as a measure of risk.
To draw inference about variability, the parameter of interest is $\sigma^2$.
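The interval estimate in Example 2 can be retraced from its summary statistics. The sketch below assumes only the figures quoted in that example ($n = 83$, $\bar{x} = 15.02$, $s^2 = 68.98$) and the table value 1.990; because it starts from rounded summaries, its endpoints may differ from the quoted interval in the second decimal place. The variable names are illustrative.

```python
import math

# Summary figures quoted in Example 2
n = 83           # number of companies sampled
x_bar = 15.02    # sample mean return
s2 = 68.98       # sample variance
t_crit = 1.990   # table value t_{.025,80}, used as an approximation to t_{.025,82}

# Half-width of the 95% interval: t_{alpha/2} * s / sqrt(n)
s = math.sqrt(s2)
half_width = t_crit * s / math.sqrt(n)

lower = x_bar - half_width
upper = x_bar + half_width

print(f"s = {s:.2f}, half-width = {half_width:.2f}")
print(f"95% CI for the mean return: [{lower:.2f}, {upper:.2f}]")
```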
The sample variance $s^2$ is an unbiased, consistent and efficient point estimator for $\sigma^2$. The statistic $\dfrac{(n-1)s^2}{\sigma^2}$ has a distribution called chi-squared, if the population is normally distributed:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad \text{d.f.} = n - 1$$

[Figure: chi-squared distributions with d.f. = 5 and d.f. = 10]

Testing the Population Variance

Example 3 (operations management application)
A container-filling machine is believed to fill 1-liter containers so consistently that the variance of the fills will be less than 1 cc (.001 liter). To test this belief a random sample of 25 1-liter fills was taken, and the results recorded. Do these data support the belief that the variance is less than 1 cc at the 5% significance level?

Solution
The problem objective is to describe the population of 1-liter fills from a filling machine. The data are interval, and we are interested in the variability of the fills. The complete test is:

$H_0: \sigma^2 = 1$
$H_1: \sigma^2 < 1$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$.
The rejection region is $\chi^2 < \chi^2_{1-\alpha, n-1}$.

Solving by hand
– Note that $(n-1)s^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \left(\sum x_i\right)^2 / n$.
– From the sample, we can calculate $\sum x_i = 24{,}996.4$ and $\sum x_i^2 = 24{,}992{,}821.3$.
– Then $(n-1)s^2 = 24{,}992{,}821.3 - (24{,}996.4)^2 / 25 = 20.78$.

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{20.78}{1^2} = 20.78, \qquad \chi^2_{1-\alpha, n-1} = \chi^2_{.95, 25-1} = 13.8484$$

Since 13.8484 < 20.78, do not reject the null hypothesis. There is insufficient evidence to infer that the variance is less than 1.

[Figure: chi-squared distribution with $\alpha = .05$ and $1 - \alpha = .95$; rejection region $\chi^2 < 13.8484$; the test statistic 20.8 lies outside it, so do not reject the null hypothesis]

Testing and Estimating a Population Variance

From the probability statement

$$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$$

we have, by substituting $\chi^2 = (n-1)s^2 / \sigma^2$,

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

[Slide: Example 4 Solution]

Example 5
During annual checkups physicians routinely send their patients to medical laboratories to have various tests performed.
One such test determines the cholesterol level in patients' blood. However, not all tests are conducted in the same way. To acquire more information, a man was sent to 10 laboratories and in each had his cholesterol level measured. The results are listed here. Estimate with 95% confidence the variance of these measurements.

4.70  4.83  4.65  4.60  4.75  4.88  4.68  4.75  4.80  4.90

[Slide: Solution]

Inference About a Population Proportion

When the population consists of nominal data, the only inference we can make is about the proportion of occurrence of a certain value. The parameter $p$ was used before to calculate these probabilities under the binomial distribution.

Statistic and sampling distribution
The statistic used when making inference about $p$ is

$$\hat{p} = \frac{x}{n}$$

where $x$ is the number of successes and $n$ is the sample size.

– Under certain conditions, [$np > 5$ and $n(1-p) > 5$], $\hat{p}$ is approximately normally distributed, with mean $\mu = p$ and variance $\sigma^2 = p(1-p)/n$.

Testing and Estimating the Proportion

Test statistic for $p$:

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}, \qquad \text{where } np \ge 5 \text{ and } n(1-p) \ge 5$$

Interval estimator for $p$ ($1 - \alpha$ confidence level):

$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}, \qquad \text{provided } n\hat{p} \ge 5 \text{ and } n(1-\hat{p}) \ge 5$$

[Slide: Example 6 Solution]

Selecting the Sample Size to Estimate the Proportion

Recall that the confidence interval for the proportion is

$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$$

Thus, to estimate the proportion to within $W$, we can write

$$W = z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$$

The required sample size is

$$n = \left( \frac{z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})}}{W} \right)^2$$

Selecting the Sample Size

Two methods: in each case we choose a value for $\hat{p}$, then solve the equation for $n$.

Method 1: no knowledge of even a rough value of $\hat{p}$. This is a 'worst case scenario', so we substitute $\hat{p} = .50$.

Method 2: we have some idea about the value of $\hat{p}$. This is a better scenario, and we substitute in our estimated value.
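The two methods can be compared numerically. The following is a minimal sketch of the sample-size formula above; the function name, the margin $W = .03$, the 95% confidence level, and the rough estimate $\hat{p} = .20$ are all hypothetical values chosen for illustration and do not come from the slides. Rounding up to the next whole observation is a common convention, assumed here.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(w, confidence=0.95, p_hat=0.5):
    """Smallest n that estimates a proportion to within w.

    Implements n = (z_{alpha/2} * sqrt(p_hat * (1 - p_hat)) / w)^2,
    rounded up to a whole observation. p_hat = 0.5 is the worst case
    (Method 1); pass a rough estimate for Method 2.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile z_{alpha/2}
    return math.ceil((z * math.sqrt(p_hat * (1 - p_hat)) / w) ** 2)


# Method 1: no idea about p_hat, so use the worst case .50 (hypothetical W = .03)
n_worst = sample_size_for_proportion(0.03)

# Method 2: a rough prior estimate of p_hat (hypothetical value .20)
n_informed = sample_size_for_proportion(0.03, p_hat=0.2)

print(f"Method 1 (p_hat = .50): n = {n_worst}")
print(f"Method 2 (p_hat = .20): n = {n_informed}")
```

Because $\hat{p}(1-\hat{p})$ is largest at $\hat{p} = .50$, Method 1 never asks for a smaller sample than Method 2 at the same $W$ and confidence level.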