Chapter 12 Inference About One Population

12.1 Introduction
• In this chapter we use the approach developed earlier to describe a population.
– Identify the parameter to be estimated or tested.
– Specify the parameter's estimator and its sampling distribution.
– Construct a confidence interval estimator or perform a hypothesis test.
• We shall develop techniques to estimate and test three population parameters.
– Population mean $\mu$
– Population variance $\sigma^2$
– Population proportion $p$

12.2 Inference About a Population Mean When the Population Standard Deviation Is Unknown
Recall that when $\sigma$ is known we use the following statistic to estimate and test a population mean:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
When $\sigma$ is unknown, we use its point estimator $s$, and the z-statistic is replaced by the t-statistic:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

The t-Statistic
When the sampled population is normally distributed, the t statistic is Student t distributed.

Using the t-table
• The t distribution is mound-shaped and symmetrical around zero.
• The "degrees of freedom" (a function of the sample size) determine how spread out the distribution is compared to the normal distribution.
[Figure: two t distributions centered at 0, one with d.f. = v1 and one with d.f. = v2, where v1 < v2]

Testing $\mu$ when $\sigma$ is unknown
• Example 12.1 - Productivity of newly hired trainees
– In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied.
– It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring.
– Can we conclude that this belief is correct, based on productivity observations of 50 trainees (see file Xm12-01)?
• Example 12.1 – Solution
– The problem objective is to describe the population of the number of packages processed in one hour.
– The data are interval.
– The hypotheses are
$$H_0: \mu = 450 \qquad H_1: \mu > 450$$
– The t statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad \text{d.f.} = n - 1 = 49$$
• Solution continued (solving by hand)
– The rejection region is $t > t_{\alpha, n-1}$, where $t_{\alpha, n-1} = t_{.05, 49} \approx t_{.05, 50} = 1.676$.
– From the data we have $\sum x_i = 23{,}019$ and $\sum x_i^2 = 10{,}671{,}357$, thus
$$\bar{x} = \frac{23{,}019}{50} = 460.38, \qquad s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = 1507.55, \qquad s = \sqrt{1507.55} = 38.83$$
• The test statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{460.38 - 450}{38.83 / \sqrt{50}} = 1.89$$
[Figure: t distribution with the rejection region t > 1.676; the test statistic 1.89 falls inside it]
• Since 1.89 > 1.676 we reject the null hypothesis in favor of the alternative.
• There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages at the .05 significance level.

t-Test: Mean
                          Packages
  Mean                    460.38
  Standard Deviation      38.83
  Hypothesized Mean       450
  df                      49
  t Stat                  1.89
  P(T<=t) one-tail        0.0323
  t Critical one-tail     1.6766
  P(T<=t) two-tail        0.0646
  t Critical two-tail     2.0096

• Since the one-tail p-value .0323 < .05, we reject the null hypothesis in favor of the alternative, which is the same conclusion reached by hand.

Estimating $\mu$ when $\sigma$ is unknown
• Confidence interval estimator of $\mu$ when $\sigma$ is unknown:
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1$$
• Example 12.2
– An investor is trying to estimate the return on investment in companies that won quality awards last year.
– A random sample of 83 such companies is selected, and the return the investor would have earned by investing in them is calculated.
– Construct a 95% confidence interval for the mean return.
• Solution (solving by hand)
– The problem objective is to describe the population of annual returns from buying shares of quality award-winners.
– The data are interval.
– From Xm12-02 we determine
$$\bar{x} = 15.02, \qquad s^2 = 68.98, \qquad s = \sqrt{68.98} = 8.31$$
– Solving by hand, with $t_{.025, 82} \approx t_{.025, 80} = 1.990$:
$$\bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} \approx 15.02 \pm 1.990 \frac{8.31}{\sqrt{83}} = [13.19, 16.85]$$

t-Estimate: Mean
                          Returns
  Mean                    15.02
  Standard Deviation      8.31
  LCL                     13.20
  UCL                     16.83

Checking the required conditions
• We need to check that the population is normally distributed, or at least not extremely nonnormal.
• There are statistical methods to test for normality (one to be introduced later in the book).
• From the sample histograms we see…
[Figure: histogram for Xm12-01, horizontal axis: Packages]
[Figure: histogram for Xm12-02, horizontal axis: Returns]

12.3 Inference About a Population Variance
• Sometimes we are interested in making inferences about the variability of processes.
• Examples:
– The consistency of a production process for quality control purposes.
– Investors use variance as a measure of risk.
• To draw inferences about variability, the parameter of interest is $\sigma^2$.
• The sample variance $s^2$ is an unbiased, consistent and efficient point estimator for $\sigma^2$.
• The statistic $(n - 1)s^2 / \sigma^2$ has a distribution called chi-squared if the population is normally distributed:
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}, \qquad \text{d.f.} = n - 1$$
[Figure: chi-squared distributions with d.f. = 5 and d.f. = 10]

Testing and Estimating a Population Variance
• From the probability statement
$$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$$
we have, by substituting $\chi^2 = (n - 1)s^2 / \sigma^2$,
$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$$

Testing the Population Variance
• Example 12.3 (operations management application)
– A container-filling machine is believed to fill 1-liter containers so consistently that the variance of the fills will be less than 1 cc (.001 liter).
– To test this belief, a random sample of 25 1-liter fills was taken and the results recorded (Xm12-03).
– Do these data support the belief that the variance is less than 1 cc at the 5% significance level?
• Solution
– The problem objective is to describe the population of 1-liter fills from a filling machine.
– The data are interval, and we are interested in the variability of the fills.
– The complete test is:
$$H_0: \sigma^2 = 1 \qquad H_1: \sigma^2 < 1$$
– The test statistic is $\chi^2 = \dfrac{(n - 1)s^2}{\sigma^2}$.
– The rejection region is $\chi^2 < \chi^2_{1-\alpha, n-1}$.
• Solving by hand
– Note that $(n - 1)s^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \left(\sum x_i\right)^2 / n$.
– From the sample (Xm12-03) we can calculate $\sum x_i = 24{,}996.4$ and $\sum x_i^2 = 24{,}992{,}821.3$.
– Then $(n - 1)s^2 = 24{,}992{,}821.3 - (24{,}996.4)^2 / 25 = 20.78$, so
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{20.78}{1^2} = 20.78, \qquad \chi^2_{1-\alpha, n-1} = \chi^2_{.95, 25-1} = 13.8484$$
– Since 20.78 is not less than 13.8484, do not reject the null hypothesis. There is insufficient evidence to infer that the variance is less than 1.
[Figure: chi-squared distribution with α = .05 and 1 - α = .95; the rejection region is χ² < 13.8484, and the test statistic 20.78 falls outside it]

Estimating the Population Variance
• Example 12.4
– Estimate the variance of fills in Example 12.3 with 99% confidence.
• Solution
– We have $(n - 1)s^2 = 20.78$. From the chi-squared table we have
$$\chi^2_{\alpha/2, n-1} = \chi^2_{.005, 24} = 45.5585, \qquad \chi^2_{1-\alpha/2, n-1} = \chi^2_{.995, 24} = 9.88623$$
• The confidence interval estimate is
$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$$
$$\frac{20.78}{45.5585} < \sigma^2 < \frac{20.78}{9.88623}$$
$$.46 < \sigma^2 < 2.10$$

12.4 Inference About a Population Proportion
• When the population consists of nominal data, the only inference we can make is about the proportion of occurrence of a certain value.
• The parameter $p$ was used before to calculate these probabilities under the binomial distribution.
• Statistic and sampling distribution
– The statistic used when making inferences about $p$ is
$$\hat{p} = \frac{x}{n}, \qquad \text{where } x = \text{the number of successes and } n = \text{the sample size.}$$
– Under certain conditions [$np > 5$ and $n(1 - p) > 5$], $\hat{p}$ is approximately normally distributed, with mean $\mu = p$ and variance $\sigma^2 = p(1 - p)/n$.
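Before moving on to proportions, the by-hand results for the mean and variance (Examples 12.1, 12.3 and 12.4) can be re-derived from the sums quoted in the text. The following is a minimal sketch, not the course's own code: the function names are hypothetical, it uses only the Python standard library, and the critical values are copied from the tables as quoted above rather than computed.

```python
import math


def sample_variance_from_sums(sum_x, sum_x2, n):
    """Shortcut formula: s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def t_statistic(mean, mu0, s, n):
    """t = (x-bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (mean - mu0) / (s / math.sqrt(n))


def chi2_statistic(n, s2, sigma2_0):
    """chi^2 = (n - 1) s^2 / sigma0^2, with n - 1 degrees of freedom."""
    return (n - 1) * s2 / sigma2_0


def variance_interval(n, s2, chi2_upper, chi2_lower):
    """Interval (n-1)s^2 / chi2_{a/2} < sigma^2 < (n-1)s^2 / chi2_{1-a/2}."""
    ss = (n - 1) * s2
    return ss / chi2_upper, ss / chi2_lower


# Example 12.1: sums and table value t_{.05,50} = 1.676 as quoted in the text.
n1 = 50
mean1 = 23019 / n1
s2_1 = sample_variance_from_sums(23019, 10671357, n1)
t_stat = t_statistic(mean1, 450, math.sqrt(s2_1), n1)
reject_mean = t_stat > 1.676  # right-tail rejection region

# Example 12.3: sums and table value chi2_{.95,24} = 13.8484 as quoted.
n3 = 25
s2_3 = sample_variance_from_sums(24996.4, 24992821.3, n3)
chi2_stat = chi2_statistic(n3, s2_3, 1.0)
reject_var = chi2_stat < 13.8484  # left-tail rejection region

# Example 12.4: table values chi2_{.005,24} and chi2_{.995,24} as quoted.
lcl, ucl = variance_interval(n3, s2_3, 45.5585, 9.88623)

print(f"t = {t_stat:.2f}, reject H0: {reject_mean}")
print(f"chi2 = {chi2_stat:.2f}, reject H0: {reject_var}")
print(f"99% interval for sigma^2: ({lcl:.2f}, {ucl:.2f})")
```

Working from the sums rather than the raw files keeps the sketch self-contained; with the data files at hand, the same functions could be fed the sample mean and variance computed directly.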
Testing and Estimating the Proportion
• Test statistic for $p$:
$$Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}, \qquad \text{where } np > 5 \text{ and } n(1 - p) > 5$$
• Interval estimator for $p$ ($1 - \alpha$ confidence level):
$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}, \qquad \text{provided } n\hat{p} > 5 \text{ and } n(1 - \hat{p}) > 5$$

Testing the Proportion (additional example)
• Example 12.5 (Predicting the winner on election day)
– Voters are asked by a certain network to participate in an exit poll in order to predict the winner on election day.
– Based on the data presented in Xm12-05 (where 1 = Democrat and 2 = Republican), can the network conclude that the Republican candidate will win the state college vote?
• Solution
– The problem objective is to describe the population of votes in the state.
– The data are nominal.
– The parameter to be tested is $p$.
– Success is defined as "Vote Republican".
– The hypotheses are:
$$H_0: p = .5 \qquad H_1: p > .5 \quad \text{(more than 50% vote Republican)}$$
• Solving by hand
– The rejection region is $z > z_{\alpha} = z_{.05} = 1.645$.
– From the file we count 407 successes. The number of voters participating is 765.
– The sample proportion is $\hat{p} = 407/765 = .532$.
– The value of the test statistic is
$$Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} = \frac{.532 - .5}{\sqrt{.5(1 - .5)/765}} = 1.77$$
– The p-value is $P(Z > 1.77) = .0382$.

z-Test: Proportion
  Sample Proportion          0.532
  Observations               765
  Hypothesized Proportion    0.5
  z Stat                     1.77
  P(Z<=z) one-tail           0.0382
  z Critical one-tail        1.6449
  P(Z<=z) two-tail           0.0764
  z Critical two-tail        1.96

There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 5% significance level we can conclude that more than 50% voted Republican.

Estimating the Proportion
• Nielsen Ratings
– In a survey of 2000 TV viewers at 11:40 p.m. on a certain night, 226 indicated they watched "The Tonight Show".
– Estimate the number of TVs tuned to the Tonight Show on a typical night, if there are 100 million potential television sets. Use a 95% confidence level.
– Solution
$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n} = .113 \pm 1.96 \sqrt{.113(1 - .113)/2000} = .113 \pm .014$$

z-Estimate: Proportion
                          Viewers
  Sample Proportion       0.113
  Observations            2000
  LCL                     0.099
  UCL                     0.127

A confidence interval estimate of the number of viewers who watched the Tonight Show:
LCL = .099(100 million) = 9.9 million
UCL = .127(100 million) = 12.7 million

Selecting the Sample Size to Estimate the Proportion
• Recall: the confidence interval for the proportion is
$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$$
• Thus, to estimate the proportion to within $W$, we can write
$$W = z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$$
• The required sample size is
$$n = \left( \frac{z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})}}{W} \right)^2$$
• Example
– Suppose we want to estimate the proportion of customers who prefer our company's brand to within .03 with 95% confidence.
– Find the sample size.
– Solution: $W = .03$; $1 - \alpha = .95$, therefore $\alpha/2 = .025$, so $z_{.025} = 1.96$ and
$$n = \left( \frac{1.96 \sqrt{\hat{p}(1 - \hat{p})}}{.03} \right)^2$$
Since the sample has not yet been taken, the sample proportion is still unknown. We proceed using either one of the following two methods:
• Method 1: There is no knowledge about the value of $\hat{p}$.
– Let $\hat{p} = .5$. This results in the largest possible $n$ needed for a $1 - \alpha$ confidence interval of the form $\hat{p} \pm .03$.
– If the sample proportion does not equal .5, the actual $W$ will be narrower than .03 with the $n$ obtained by the formula below.
$$n = \left( \frac{1.96 \sqrt{.5(1 - .5)}}{.03} \right)^2 = 1{,}068$$
• Method 2: There is some idea about the value of $\hat{p}$.
– Use the value of $\hat{p}$ to calculate the sample size. For example, with $\hat{p} = .2$:
$$n = \left( \frac{1.96 \sqrt{.2(1 - .2)}}{.03} \right)^2 = 683$$
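The proportion calculations above (the test in Example 12.5, the Nielsen interval, and the two sample-size methods) can be checked with a short script. This is a sketch under stated assumptions: the function names are hypothetical, and the normal quantiles come from Python's standard-library `statistics.NormalDist` instead of the table value 1.96, so intermediate figures may differ slightly in later decimals.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def proportion_z_test(successes, n, p0):
    """Right-tail z-test of H0: p = p0 against H1: p > p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - STD_NORMAL.cdf(z)
    return z, p_value


def proportion_interval(successes, n, confidence):
    """Interval p-hat +/- z_{a/2} * sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = successes / n
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def sample_size(p_guess, width, confidence):
    """n = (z_{a/2} * sqrt(p(1 - p)) / W)^2, rounded up to a whole unit."""
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * math.sqrt(p_guess * (1 - p_guess)) / width) ** 2)


# Example 12.5: 407 Republican votes among 765 exit-poll participants.
z_stat, p_value = proportion_z_test(407, 765, 0.5)

# Nielsen example: 226 of 2000 viewers, scaled to 100 million sets.
lcl, ucl = proportion_interval(226, 2000, 0.95)
sets_low, sets_high = lcl * 100e6, ucl * 100e6

# Sample size to estimate p to within .03 with 95% confidence.
n_method1 = sample_size(0.5, 0.03, 0.95)  # no prior knowledge of p-hat
n_method2 = sample_size(0.2, 0.03, 0.95)  # prior guess p-hat = .2

print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
print(f"95% interval: ({lcl:.3f}, {ucl:.3f})")
print(f"n (method 1) = {n_method1}, n (method 2) = {n_method2}")
```

Rounding the sample size up rather than to the nearest integer is the conservative choice: it guarantees the interval is no wider than the target $W$.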