Economics 173
Business Statistics
Lectures 5 & 6
Summer, 2001
Professor J. Petry

Chapter 12
Inference about the Comparison of Two Populations

12.1 Introduction
• A variety of techniques are presented whose objective is to compare two populations.
• We are interested in:
  – The difference between two means.
  – The ratio of two variances.
  – The difference between two proportions.

12.2 Inference about the Difference between Two Means: Independent Samples
• Two random samples are drawn from the two populations of interest.
• Because we are interested in the difference between the two means, we compute the statistic $\bar{x}$ for each sample (and support the analysis with the statistic $s^2$ as well).

The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
• $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.
• $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal, but the sample sizes are large.
• The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
• The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.
• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal we can write:

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

• Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.
• In practice, the Z statistic is rarely used, because the population variances are not known.
• Instead, we construct a t statistic, replacing $\sigma_1^2$ and $\sigma_2^2$ with the sample variances $s_1^2$ and $s_2^2$.
• Two cases are considered when producing the t statistic:
  – The two unknown population variances are equal.
  – The two unknown population variances are not equal.

Case I: The two variances are equal
• Calculate the pooled variance estimate by:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Example: $s_1^2 = 25$; $s_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then,

$$s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04347$$

• Construct the t statistic as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$

• Perform a hypothesis test
  H0: $\mu_1 - \mu_2 = 0$
  H1: $\mu_1 - \mu_2 > 0$; or $< 0$; or $\neq 0$
• Or build an interval estimate

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $1 - \alpha$ is the confidence level.

Case II: The two variances are unequal

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad \text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

Run a hypothesis test as needed, or build an interval estimate:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $1 - \alpha$ is the confidence level.

• Example 12.1
  – Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
  – A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
  – For each person the number of calories consumed at lunch was recorded.

Calories consumed at lunch
Consumers:      568 498 589 681 540 646 636 739 539 596 607 529 637 617 633 555 . . .
Non-consumers:  705 819 706 509 613 582 601 608 787 573 428 754 741 628 537 748 . . .

Solution:
• The data are quantitative.
• The parameter to be tested is the difference between two means.
• The claim to be tested is that the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).
• Identifying the technique
  – The hypotheses are:
    H0: $\mu_1 - \mu_2 = 0$
    H1: $\mu_1 - \mu_2 < 0$ (that is, $\mu_1 < \mu_2$)
  – To check the relationship between the variances, we use computer output to find the sample standard deviations. We have $s_1 = 64.05$ and $s_2 = 103.29$. It appears that the variances are unequal.
  – We run the t-test for unequal variances.
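The unequal-variances statistic and its degrees of freedom can be checked numerically. The sketch below is only an illustration and is not part of the lecture: the function name `welch_t` is hypothetical, and the inputs are the sample means, variances, and sizes reported in the Excel printout for Example 12.1.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (t, d.f.) for the two-sample t-test with unequal variances."""
    v1 = var1 / n1  # s1^2 / n1
    v2 = var2 / n2  # s2^2 / n2
    t = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(v1 + v2)
    # Degrees of freedom from the Case II formula (generally not an integer).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics for consumers (sample 1) and non-consumers (sample 2).
t_stat, df = welch_t(604.023, 4102.98, 43, 633.234, 10669.8, 107)
print(f"t = {t_stat:.4f}, d.f. = {df:.1f}")
```

The computed t agrees with the printout's t Stat to about three decimals, and the fractional degrees of freedom (about 122.6) round to the 123 that Excel reports.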
t-Test: Two-Sample Assuming Unequal Variances
                               Consumers   Nonconsumers
Mean                           604.023     633.234
Variance                       4102.98     10669.8
Observations                   43          107
Hypothesized Mean Difference   0
df                             123
t Stat                         -2.09107
P(T<=t) one-tail               0.01929
t Critical one-tail            1.65734
P(T<=t) two-tail               0.03858
t Critical two-tail            1.97944

• At the 5% significance level there is sufficient evidence to reject the null hypothesis (one-tail p-value 0.01929 < 0.05).

• Solving by hand
  – The interval estimator for the difference between two means is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.23) \pm 1.9796\sqrt{\frac{64.05^2}{43} + \frac{103.29^2}{107}} = -29.21 \pm 27.65$$

• Example 12.2
  – Does job design (referring to worker movements) affect workers' productivity?
  – Two job designs are being considered for the production of a new computer desk.
  – Two samples are randomly and independently selected.
    • A sample of 25 workers assembled a desk using design A.
    • A sample of 25 workers assembled the desk using design B.
    • The assembly times were recorded.
  – Do the assembly times of the two designs differ?

Assembly times in minutes
Design-A:  6.8 5.0 7.9 5.2 7.6 5.0 5.9 5.2 6.5 . . .
Design-B:  5.2 6.7 5.7 6.6 8.5 6.5 5.9 6.7 6.6 . . .

Solution
• The data are quantitative.
• The parameter of interest is the difference between two population means.
• The claim to be tested is whether a difference between the two designs exists.

• Solving by hand
  – The hypotheses are:
    H0: $\mu_1 - \mu_2 = 0$
    H1: $\mu_1 - \mu_2 \neq 0$
  – To check the relationship between the two variances, calculate the values of $s_1$ and $s_2$. We have $s_1 = 0.92$ and $s_2 = 1.14$. It is reasonable to treat the two variances as equal.
  – To calculate the t-statistic we have $\bar{x}_1 = 6.288$, $\bar{x}_2 = 6.016$, $s_1^2 = 0.8481$, $s_2^2 = 1.2996$, so

$$s_p^2 = \frac{(25 - 1)(0.8481) + (25 - 1)(1.2996)}{25 + 25 - 2} = 1.075$$

$$t = \frac{(6.288 - 6.016) - 0}{\sqrt{1.075\left(\dfrac{1}{25} + \dfrac{1}{25}\right)}} = 0.93, \qquad \text{d.f.} = 25 + 25 - 2 = 48$$

• Let us determine the rejection region. For $\alpha = 0.05$ the rejection region is

$$|t| > t_{\alpha/2,\,\text{d.f.}} = t_{0.025,48} = 2.009$$

  Notice the absolute value $|t|$: the test is two-tailed.
• The test: Since t = 0.93 < 2.009, there is insufficient evidence to reject the null hypothesis.

[Figure: t distribution with an upper-tail rejection region of area .025 beyond 2.009; the observed t = 0.93 lies outside the rejection region.]

• Conclusion: From this experiment, there is insufficient evidence at the 5% significance level to conclude that the two job designs differ in terms of workers' productivity.

The Excel printout

t-Test: Two-Sample Assuming Equal Variances
                               Design-A      Design-B
Mean                           6.288         6.016
Variance                       0.847766667   1.3030667
Observations                   25            25
Pooled Variance                1.075416667
Hypothesized Mean Difference   0
df                             48
t Stat                         0.927332603
P(T<=t) one-tail               0.179196744
t Critical one-tail            1.677224191
P(T<=t) two-tail               0.358393488
t Critical two-tail            2.01063358

In this printout, the Variance row gives $s_1^2$ and $s_2^2$, Pooled Variance is $s_p^2$, df is the degrees of freedom, t Stat is the t-statistic, and the two P(T<=t) rows are the p-values of the one-tail and two-tail tests.

A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.5896 = [-0.3176,\ 0.8616]$$

Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$. Notice that zero is included in the interval.

Checking the required conditions for the equal variances case (Example 12.2)

[Figure: histograms of assembly times for Design A and Design B.]

The histograms are not perfectly bell shaped, but the distributions seem to be approximately normal. Since the technique is robust, we can be confident about the results.

12.4 Matched Pairs Experiment
• What is a matched pairs experiment?
• Why are matched pairs experiments needed?
• How do we deal with data produced in this way?

The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.
Example 12.3
• To determine whether a new steel-belted radial tire lasts longer than a current model, the manufacturer designs the following experiment.
  – A pair of newly designed tires is installed on the rear wheels of 20 randomly selected cars.
  – A pair of currently used tires is installed on the rear wheels of another 20 cars.
  – Drivers drive in their usual way until the tires wear out.
  – The number of miles driven by each driver was recorded. See the data below.

New-Design:  70 83 78 46 74 56 74 52 99 57 77 84 72 98 81 63 88 69 54 97
Exstng-Dsn:  47 65 59 61 75 65 73 85 97 84 72 39 72 91 64 63 79 74 76 43

Solution
• Compare two populations of quantitative data.
• The parameter is $\mu_1 - \mu_2$, where
  – $\mu_1$ is the mean distance driven before wear-out occurs for the new design tires;
  – $\mu_2$ is the mean distance driven before wear-out occurs for the existing design tires.
• The hypotheses are
  H0: $\mu_1 - \mu_2 = 0$
  H1: $\mu_1 - \mu_2 > 0$
• The test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

We run the t-test and obtain the following Excel results.

t-Test: Two-Sample Assuming Equal Variances
                               New Dsgn      Exstng dsgn
Mean                           73.6          69.2
Variance                       243.4105263   226.8
Observations                   20            20
Pooled Variance                235.1052632
Hypothesized Mean Difference   0
df                             38
t Stat                         0.907447484
P(T<=t) one-tail               0.184944575
t Critical one-tail            1.685953066
P(T<=t) two-tail               0.36988915
t Critical two-tail            2.024394234

We conclude that there is insufficient evidence to reject H0 in favor of H1.

[Figure: histograms of the New design and Existing design samples; horizontal axis bins 45, 60, 75, 90, 105, More.]

While the sample mean of the new design is larger than the sample mean of the existing design, the variability within each sample is large enough for the sample distributions to overlap and cover about the same range. It is therefore difficult to argue that one expected value is different from the other.
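As a cross-check on the Example 12.3 printout, the pooled-variance calculation of Case I can be reproduced from the summary statistics alone. This is a minimal sketch, not the lecture's own code; the function name `pooled_t` is hypothetical, and the inputs are the means, variances, and sample sizes shown in the printout.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (t, pooled variance, d.f.) for the equal-variances t-test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return t, sp2, df


# New design (sample 1) versus existing design (sample 2).
t_stat, sp2, df = pooled_t(73.6, 243.4105263, 20, 69.2, 226.8, 20)
print(f"pooled variance = {sp2:.4f}, t = {t_stat:.4f}, d.f. = {df}")
```

The pooled variance and t value should match the Pooled Variance and t Stat rows of the printout to the displayed precision.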
• Example 12.4
  – To eliminate variability among observations within each sample, the experiment was redone.
  – One tire of each type was installed on the rear wheels of 20 randomly selected cars (each car was sampled twice, thus creating a pair of observations).
  – The number of miles until wear-out was recorded.

Car   New-Dsn   Exst-Dsn   Difference
1     57        48         9
2     64        50         14
3     102       89         13
4     62        56         6
5     81        78         3
6     87        75         12
7     61        50         11
8     62        49         13
9     74        70         4
10    62        66         -4
11    100       98         2
12    90        86         4
13    83        78         5
14    84        90         -6
15    86        98         -12
16    62        58         4
17    67        58         9
18    40        41         -1
19    71        61         10
20    77        82         -5
Average difference = 4.55; standard deviation of the differences = 7.2218636

t-Test: Paired Two Sample for Means
                               New-Dsn    Exst-Dsn
Mean                           73.6       69.05
Variance                       242.779    316.366
Observations                   20         20
Pearson Correlation            0.91468
Hypothesized Mean Difference   0
df                             19
t Stat                         2.81759
P(T<=t) one-tail               0.0055
t Critical one-tail            1.72913
P(T<=t) two-tail               0.01099
t Critical two-tail            2.09302

So what really happened here? The values within each sample might vary markedly (the range of observations in sample A and in sample B is wide), but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences.

Observe the statistic t shown below and notice how a small variability of the differences (small $s_D$) helps in rejecting the null hypothesis.

• Solving by hand
  – Calculate the difference for each pair of observations.
  – Calculate the average of the differences, $\bar{x}_D$, and the standard deviation of the differences, $s_D$.
  – Build the statistic as follows:

$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}$$

  – Run the hypothesis test using the t distribution with $n_D - 1$ degrees of freedom.
  – The hypotheses for this problem are
    H0: $\mu_D = 0$
    H1: $\mu_D > 0$
  – The statistic is

$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}} = \frac{4.55 - 0}{7.22186/\sqrt{20}} = 2.817$$

  – The rejection region is $t > t_{\alpha,\,n_D - 1}$ with d.f. = 20 − 1 = 19. If $\alpha = .05$, $t_{.05,19} = 1.729$.
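The hand calculation for Example 12.4 can be reproduced directly from the paired data. The following is a small sketch for illustration only; the variable names are hypothetical, and the two lists are the New-Dsn and Exst-Dsn columns of the example.

```python
import math
import statistics

new_design = [57, 64, 102, 62, 81, 87, 61, 62, 74, 62,
              100, 90, 83, 84, 86, 62, 67, 40, 71, 77]
existing_design = [48, 50, 89, 56, 78, 75, 50, 49, 70, 66,
                   98, 86, 78, 90, 98, 58, 58, 41, 61, 82]

# One difference per car: new design minus existing design.
differences = [new - old for new, old in zip(new_design, existing_design)]

n_d = len(differences)
mean_d = statistics.mean(differences)
sd_d = statistics.stdev(differences)  # sample standard deviation (divides by n - 1)

# Matched-pairs t statistic under H0: mu_D = 0, with n_d - 1 degrees of freedom.
t_stat = (mean_d - 0) / (sd_d / math.sqrt(n_d))
print(f"mean = {mean_d:.2f}, s_D = {sd_d:.4f}, t = {t_stat:.3f}, d.f. = {n_d - 1}")
```

Because the statistic depends only on the differences, the large car-to-car variation in mileage drops out, which is why this t value is so much larger than the one from the independent-samples design.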
Since 2.817 > 1.729, there is sufficient evidence in the data to reject the null hypothesis in favor of the alternative hypothesis.

Conclusion: At the 5% significance level, the new type of tire lasts longer than the current type.

Estimating the mean difference
Interval estimator of $\mu_D$:

$$\bar{x}_D \pm t_{\alpha/2,\,n_D - 1}\frac{s_D}{\sqrt{n_D}}$$

The 95% confidence interval of the mean difference in Example 12.4 is

$$4.55 \pm 2.093\,\frac{7.22}{\sqrt{20}} = 4.55 \pm 3.38$$

Checking the required conditions for the paired observations case
• The validity of the results depends on the normality of the differences.

[Figure: histogram of the differences; horizontal axis bins -12, -6, 0, 6, 12, More.]

12.5 Inferences about the ratio of two variances
• In this section we discuss how to compare the variability of two populations.
• In particular, we draw inferences about the ratio of two population variances.
• This question is interesting because:
  – Variances can be used to evaluate the consistency of processes.
  – The relationship between variances determines the technique used to test relationships between mean values.

• Point estimator of $\sigma_1^2/\sigma_2^2$
  – Recall that $s^2$ is an unbiased estimator of $\sigma^2$.
  – Therefore, it is not surprising that we estimate $\sigma_1^2/\sigma_2^2$ by $s_1^2/s_2^2$.
• Sampling distribution of $s_1^2/s_2^2$
  – The statistic $[s_1^2/\sigma_1^2] / [s_2^2/\sigma_2^2]$ follows the F distribution.
  – The test statistic for $\sigma_1^2/\sigma_2^2$ is derived from this statistic.
• Testing $\sigma_1^2/\sigma_2^2$
  – Our null hypothesis is always H0: $\sigma_1^2/\sigma_2^2 = 1$.
  – Under this null hypothesis the F statistic

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \quad \text{becomes} \quad F = \frac{s_1^2}{s_2^2}$$

Example 12.5 (see Example 12.1)
In order to perform a test regarding the average consumption of calories at lunch in relation to the inclusion of high-fiber cereal in breakfast, the variance ratio of the two samples has to be tested first.

The hypotheses are:
  H0: $\sigma_1^2/\sigma_2^2 = 1$
  H1: $\sigma_1^2/\sigma_2^2 \neq 1$

F-Test Two-Sample for Variances
                      Consumers      Nonconsumers
Mean                  604.0232558    633.2336449
Variance              4102.975637    10669.76565
Observations          43             107
df                    42             106
F                     0.384542245
P(F<=f) one-tail      0.000368433
F Critical one-tail   0.637072617

• Solving by hand
  – The rejection region is $F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < 1/F_{\alpha/2,\nu_2,\nu_1}$, which becomes (for $\alpha = 0.05$):
    $F > F_{.025,42,106} \approx F_{.025,40,120} = 1.61$, or
    $F < 1/F_{.025,106,42} \approx 1/F_{.025,120,40} = 1/1.72 \approx .58$
  – The F statistic value is $F = s_1^2/s_2^2 = .3845$.
  – Conclusion: Because .3845 < .58 we can reject the null hypothesis in favor of the alternative hypothesis.
  – There is sufficient evidence in the data to argue at the 5% significance level that the variances of the two groups differ.

Estimating the Ratio of Two Population Variances
• From the statistic $F = [s_1^2/\sigma_1^2] / [s_2^2/\sigma_2^2]$ we can isolate $\sigma_1^2/\sigma_2^2$ and build the following interval estimator:

$$\left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} \;\le\; \frac{\sigma_1^2}{\sigma_2^2} \;\le\; \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.

• Example 12.6
  – Determine the 95% confidence interval estimate of the ratio of the two population variances in Example 12.1.
  – Solution
    • We find
      $F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} = 1.61$ (approximately)
      $F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} = 1.72$ (approximately)
    • LCL = $(s_1^2/s_2^2)[1/F_{\alpha/2,\nu_1,\nu_2}]$ = (4102.98/10,669.770)[1/1.61] = .2388
    • UCL = $(s_1^2/s_2^2)[F_{\alpha/2,\nu_2,\nu_1}]$ = (4102.98/10,669.770)[1.72] = .6614

12.6 Inference about the difference between two population proportions
• In this section we deal with two populations whose data are qualitative.
• When data are qualitative we can (only) ask questions regarding the proportions of occurrence of certain outcomes.
• Thus, we hypothesize on the difference $p_1 - p_2$, and draw an inference from the hypothesis test.

• Sampling Distribution of the Difference Between Two Sample Proportions, $\hat{p}_1 - \hat{p}_2$
  – Two random samples are drawn from two populations.
  – The number of successes in each sample is recorded.
  – The sample proportions are computed.
    Sample 1: sample size $n_1$, number of successes $x_1$, sample proportion $\hat{p}_1 = x_1/n_1$
    Sample 2: sample size $n_2$, number of successes $x_2$, sample proportion $\hat{p}_2 = x_2/n_2$

  – The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed if $n_1p_1$, $n_1(1 - p_1)$, $n_2p_2$, $n_2(1 - p_2)$ are all equal to or greater than 5.
  – Because $p_1$ and $p_2$ are unknown, we use their estimates instead. Thus, $n_1\hat{p}_1$, $n_1\hat{q}_1$, $n_2\hat{p}_2$, $n_2\hat{q}_2$ (with $\hat{q} = 1 - \hat{p}$) must all be equal to or greater than 5.
  – The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
  – The variance of $\hat{p}_1 - \hat{p}_2$ is $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$.
  – The statistic

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$

    is approximately normally distributed.

• Testing the Difference between Two Population Proportions
  – We hypothesize on the difference between the two proportions, $p_1 - p_2$.
  – There are two cases to consider:

  Case 1: H0: $p_1 - p_2 = 0$
  Calculate the pooled proportion

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

  Then

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

  Case 2: H0: $p_1 - p_2 = D$ (D is not equal to 0)
  Do not pool the data. With $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$,

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$

• Example 12.7
  – A research project employing 22,000 American physicians was conducted to discover whether aspirin can prevent heart attacks.
  – Half of the participants in the research took aspirin, and half took a placebo.
  – Over a three-year period, 104 of those who took aspirin and 189 of those who took the placebo had heart attacks.
  – Is aspirin effective in preventing heart attacks?

• Solution
  – Identifying the technique
    • The problem objective is to compare the population of those who take aspirin with those who do not.
    • The data are qualitative (take/do not take aspirin).
    • The hypotheses are
      H0: $p_1 - p_2 = 0$
      H1: $p_1 - p_2 < 0$
      where population 1 is the aspirin takers and population 2 is the placebo takers.
    • We identify Case 1 here, so

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

  – Solving by hand
    • For a 5% significance level the rejection region is $z < -z_{\alpha} = -z_{.05} = -1.645$.
    • The z statistic is −5.02 < −1.645, so we reject the null hypothesis.
    • The sample proportions are $\hat{p}_1 = 104/11{,}000 = .009455$ and $\hat{p}_2 = 189/11{,}000 = .01718$.
    • The pooled proportion is $\hat{p} = (x_1 + x_2)/(n_1 + n_2) = (104 + 189)/(11{,}000 + 11{,}000) = .01332$.
    • The z statistic becomes

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{.009455 - .01718}{\sqrt{.01332(.98668)\left(\dfrac{1}{11{,}000} + \dfrac{1}{11{,}000}\right)}} = -5.02$$

• Example 12.8 (Marketing application)
  – Management needs to decide which of two new packaging designs to adopt, to help improve sales of a soap.
  – A study is performed in two communities:
    • Design A is distributed in Community 1.
    • Design B is distributed in Community 2.
    • The old package design is still offered in both communities.
  – For design A to be financially viable it has to outsell design B by at least 3%.
  – Summary of the experiment results
    • Community 1: 580 packages with new design A sold; 324 packages with the old design sold.
    • Community 2: 604 packages with new design B sold; 442 packages with the old design sold.
  – Use a 1% significance level and perform a test to find which type of packaging to use.

• Solution
  – Identifying the technique
    • The problem objective is to compare two populations, consisting of the values “purchase of the new design” and “purchase of the old design”.
    • The data are qualitative. We need to test $p_1 - p_2$.
    • The hypotheses to test are
      H0: $p_1 - p_2 = .03$
      H1: $p_1 - p_2 > .03$
    • We have to perform Case 2 of the test for the difference in proportions (the hypothesized difference is not equal to zero).
  – Solving by hand

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} = \frac{\left(\dfrac{580}{580 + 324} - \dfrac{604}{604 + 442}\right) - .03}{\sqrt{\dfrac{.642(1 - .642)}{904} + \dfrac{.577(1 - .577)}{1046}}} = 1.58$$

    The rejection region is $z > z_{\alpha} = z_{.01} = 2.33$.
    Conclusion: Do not reject the null hypothesis. There is insufficient evidence to infer that packaging with design A will outsell design B by 3% or more.

• Estimating the Difference Between Two Population Proportions

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

• Example 12.9
  Estimate with 95% confidence the proportion of men who would avoid a heart attack if they took aspirin regularly.
$$(.009455 - .01718) \pm 1.96\sqrt{\frac{.009455(.990545)}{11{,}000} + \frac{.01718(.98282)}{11{,}000}} = [-.010753,\ -.004697]$$

12.7 Market Segmentation (Optional)
• Market segmentation is a statistical analysis aimed at determining the differences that exist between buyers and non-buyers of a company's product.
• Statistics plays a major role in market segmentation.
  – Surveys are used to gather the relevant data.
  – Statistical tests are used to differentiate among segments.
  – Sales and profit estimates are derived.

• Example 12.10
  – A new company in the market offers no-wait service for car oil and filter changes.
  – The company wants to make decisions about where to advertise, and about the nature of the advertisement.
  – A sample of 1000 car owners was selected. The drivers were asked to report whether or not they used a no-wait station, as well as several characteristics of their lives (including age).
  – The research should reveal whether differences in age exist between customers of no-wait service and customers of other types of facilities (see file XM12-10).

• Solution
  – Identifying the technique
    • The problem objective is to compare the population of ages of no-wait customers to the population of ages of other facility users.
    • The data are quantitative.
    • The samples are independent.
    • The parameter to be tested is $\mu_1 - \mu_2$ ($\mu$ represents mean age).
  – The hypotheses are
    H0: $\mu_1 - \mu_2 = 0$
    H1: $\mu_1 - \mu_2 \neq 0$
  – When testing for the relationship between the two variances we get the following results:

F-Test Two-Sample for Variances
                      No-Wait     Other
Mean                  47.78331    44.03448
Variance              77.17323    60.09721
Observations          623         377
df                    622         376
F                     1.28414
P(F<=f) one-tail      0.003822
F Critical one-tail   1.166224

  – Since F = 1.28414 exceeds the critical value 1.166224 (one-tail p-value 0.003822), the variances appear to differ, so we run the test for $\mu_1 - \mu_2$ with unequal variances.

Chapter 13
Statistical Inferences: A Review of Chapters 11 and 12

13.1 Introduction
In this chapter we try to build a framework that helps decide which technique (or techniques) should be used in solving a problem.
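The decision framework that the flow chart for Chapters 11 and 12 lays out can be sketched as a small lookup function. This is an illustrative sketch only: the function name `choose_technique`, its argument names, and the string labels are hypothetical and simply mirror the branches of the chart.

```python
def choose_technique(objective, data_type, measurement=None,
                     design=None, equal_variances=None):
    """Map the flow-chart questions to the name of a technique."""
    if objective == "describe a single population":
        if data_type == "qualitative":
            return "z-test & estimator of p"
        if measurement == "central location":
            return "t-test & estimator of mu"
        if measurement == "variability":
            return "chi-squared test & estimator of sigma^2"
    elif objective == "compare two populations":
        if data_type == "qualitative":
            return "z-test & estimator of p1 - p2"
        if measurement == "variability":
            return "F-test & estimator of sigma1^2 / sigma2^2"
        if measurement == "central location":
            if design == "matched pairs":
                return "t-test & estimator of mu_D"
            if design == "independent samples":
                if equal_variances is True:
                    return "t-test & estimator of mu1 - mu2 (equal variances)"
                if equal_variances is False:
                    return "t-test & estimator of mu1 - mu2 (unequal variances)"
    # Any combination not covered by the chart is rejected explicitly.
    raise ValueError("combination not covered by the flow chart")


# Example 12.4: two populations, quantitative data, means, matched pairs.
print(choose_technique("compare two populations", "quantitative",
                       measurement="central location", design="matched pairs"))
```

Each argument corresponds to one question in the chart (problem objective, data type, type of descriptive measurement, experimental design, population variances), asked in that order.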
Flow chart of techniques for Chapters 11 and 12

• Problem objective: Describing a single population
  – Data type: Qualitative: Z test & estimator of $p$
  – Data type: Quantitative. Type of descriptive measurement?
    • Central location: t-test & estimator of $\mu$
    • Variability: $\chi^2$-test & estimator of $\sigma^2$
• Problem objective: Compare two populations
  – Data type: Qualitative: Z test & estimator of $p_1 - p_2$
  – Data type: Quantitative. Type of descriptive measurement?
    • Variability: F-test & estimator of $\sigma_1^2/\sigma_2^2$
    • Central location. Experimental design?
      – Matched pairs: t-test & estimator of $\mu_D$
      – Independent samples. Population variances?
        » Equal: t-test & estimator of $\mu_1 - \mu_2$ (equal variances)
        » Unequal: t-test & estimator of $\mu_1 - \mu_2$ (unequal variances)

Summary of statistical inferences: Chapters 11 and 12

• Problem objective: Describe a single population.
  – Data type: Quantitative
    • Descriptive measurement: Central location
      – Parameter: $\mu$
      – Test statistic: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
      – Interval estimator: $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$
      – Required condition: Normal population
    • Descriptive measurement: Variability
      – Parameter: $\sigma^2$
      – Test statistic: $\chi^2 = \dfrac{(n - 1)s^2}{\sigma^2}$
      – Interval estimator: $\text{LCL} = \dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}}$, $\text{UCL} = \dfrac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}$
      – Required condition: Normal population
  – Data type: Qualitative
    – Parameter: $p$
    – Test statistic: $z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$
    – Interval estimator: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$
    – Required condition: $np \ge 5$ and $n(1 - p) \ge 5$ (for the test); $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$ (for the estimate)

• Problem objective: Compare two populations.
  – Data type: Quantitative
    • Descriptive measurement: Central location
      – Experimental design: Independent samples
        » Population variances: $\sigma_1^2 = \sigma_2^2$
        » Parameter: $\mu_1 - \mu_2$
        » Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, d.f. = $n_1 + n_2 - 2$
        » Interval estimator: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
        » Required condition: Normal populations
      – Experimental design: Independent samples
        » Population variances: $\sigma_1^2 \neq \sigma_2^2$
        » Parameter: $\mu_1 - \mu_2$
        » Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, d.f. = $\dfrac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
        » Interval estimator: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
        » Required condition: Normal populations
      – Experimental design: Matched pairs
        » Parameter: $\mu_D$
        » Test statistic: $t = \dfrac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}$, d.f. = $n_D - 1$
        » Interval estimator: $\bar{x}_D \pm t_{\alpha/2,\,n_D - 1}\dfrac{s_D}{\sqrt{n_D}}$
        » Required condition: Normal differences
    • Descriptive measurement: Variability
      – Parameter: $\sigma_1^2/\sigma_2^2$
      – Test statistic: $F = \dfrac{s_1^2}{s_2^2}$
      – Interval estimator: $\text{LCL} = \left(\dfrac{s_1^2}{s_2^2}\right)\dfrac{1}{F_{\alpha/2,\nu_1,\nu_2}}$, $\text{UCL} = \left(\dfrac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$, where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$
      – Required condition: Normal populations
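  – Data type: Qualitative
    – Parameter: $p_1 - p_2$
    – Test statistic:
      Case 1: H0: $p_1 - p_2 = 0$

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

      Case 2: H0: $p_1 - p_2 = D$

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$

    – Interval estimator:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

    – Required condition: $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1 - \hat{p}_2) \ge 5$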
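The two cases of the test for $p_1 - p_2$ differ only in how the standard error is estimated. The sketch below applies both to the counts from Examples 12.7 and 12.8; it is an illustration under stated assumptions, not the lecture's own code, and the function names `z_pooled` and `z_unpooled` are hypothetical. Because it works at full precision, its results may differ slightly from the values quoted on the slides.

```python
import math


def z_pooled(x1, n1, x2, n2):
    """Case 1 (H0: p1 - p2 = 0): standard error from the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


def z_unpooled(x1, n1, x2, n2, d):
    """Case 2 (H0: p1 - p2 = D, D != 0): separate sample proportions."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1_hat * (1 - p1_hat) / n1
                               + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d) / standard_error


# Example 12.7: 104 of 11,000 aspirin takers vs 189 of 11,000 placebo takers.
z_aspirin = z_pooled(104, 11_000, 189, 11_000)

# Example 12.8: design A sold 580 of 904 packages, design B sold 604 of 1046; D = .03.
z_packaging = z_unpooled(580, 580 + 324, 604, 604 + 442, 0.03)

print(f"Example 12.7: z = {z_aspirin:.2f}")
print(f"Example 12.8: z = {z_packaging:.2f}")
```

Both results lead to the same decisions as in the examples: the first falls well below −1.645, and the second stays below 2.33.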