Chapter 13: Inference about Comparing Two Populations

13.1 Introduction
• In previous chapters we presented methods for making inferences about the characteristics of a single population: we estimated, for example, a population mean, or tested a hypothesis about the value of a standard deviation.
• In the real world, however, we often need to study the relationship between two populations. For example:
– We want to compare the effects of a new drug on blood pressure, in which case we can test the relationship between the mean blood pressure of two groups of individuals: those who take the drug and those who do not.
– We are interested in the effect a certain ad has on voters' preferences as part of an election campaign. In this case we can estimate the difference in the proportion of voters who prefer one candidate before and after the ad is televised.

13.1 Introduction
• A variety of techniques are presented whose objective is to compare two populations.
• These techniques are designed to compare:
– two population means,
– two population variances,
– two population proportions.

13.2 Inference about the Difference between Two Means: Independent Samples
• We look at the relationship between the two population means by analyzing the value of μ1 - μ2.
• The reason we look at the difference between the two means is that the statistic x̄1 - x̄2 is strongly related to a normal distribution whose mean is μ1 - μ2 (see below for details).
• Two random samples are therefore drawn from the two populations of interest and their means x̄1 and x̄2 are calculated.

The Sampling Distribution of x̄1 - x̄2
• x̄1 - x̄2 is normally distributed if the (original) population distributions are normal.
• x̄1 - x̄2 is approximately normally distributed if the (original) populations are not normal, but the sample sizes are sufficiently large (each greater than 30).
• The mean of x̄1 - x̄2 is μ1 - μ2.
• The variance of x̄1 - x̄2 is σ1²/n1 + σ2²/n2.

Making an inference about μ1 - μ2
• The z-score of x̄1 - x̄2 is
Z = [(x̄1 - x̄2) - (μ1 - μ2)] / sqrt(σ1²/n1 + σ2²/n2)
• If x̄1 - x̄2 is normal or approximately normal, then Z is standard normal, so Z can be used to build a confidence interval or test a hypothesis about μ1 - μ2.

Making an inference about μ1 - μ2
• In practice, the Z statistic is hardly ever used, because the population variances are not known.
• Instead, we construct a t statistic using the sample variances s1² and s2²:
t = [(x̄1 - x̄2) - (μ1 - μ2)] / sqrt(s1²/n1 + s2²/n2)

Making an inference about μ1 - μ2
• Two cases are considered when producing the t statistic:
– the two unknown population variances are equal;
– the two unknown population variances are not equal.

Inference about μ1 - μ2: Equal variances
• If the two variances σ1² and σ2² are equal to one another, then s1² and s2² estimate the same value.
• Therefore, we can pool the two sample variances and provide a better estimate of the common population variance, based on a larger amount of information.
• This is done by forming the pooled variance estimate:
sp² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)
• To get some intuition about this pooled estimate, note that it can be rewritten as
sp² = [(n1 - 1)/(n1 + n2 - 2)]·s1² + [(n2 - 1)/(n1 + n2 - 2)]·s2²,
which has the form of a weighted average of the two sample variances, with the weights being the relative sample sizes. A larger sample carries a larger weight and thus influences the pooled estimate more. (It may be easier to see the structure if the "-1" and "-2" terms are ignored.)
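As a small illustration of the pooled estimate, here is a minimal Python sketch (not part of the original slides; the function names are my own) that computes sp² both directly and as the weighted average described above, confirming that the two forms agree.

```python
def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled estimate of the common population variance."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_variance_weighted(s1_sq, s2_sq, n1, n2):
    """Same estimate written as a weighted average of the two sample variances."""
    w1 = (n1 - 1) / (n1 + n2 - 2)
    w2 = (n2 - 1) / (n1 + n2 - 2)
    return w1 * s1_sq + w2 * s2_sq

# Illustrative values (these match the worked example on the next slide)
print(pooled_variance(25, 30, 10, 15))            # approx. 28.04
print(pooled_variance_weighted(25, 30, 10, 15))   # same value
```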
Inference about μ1 - μ2: Equal variances
• Example: if s1² = 25, s2² = 30, n1 = 10, n2 = 15, then
sp² = [(10 - 1)(25) + (15 - 1)(30)] / (10 + 15 - 2) = 28.04

Inference about μ1 - μ2: Equal variances
• Construct the t statistic as follows:
t = [(x̄1 - x̄2) - (μ1 - μ2)] / sqrt(sp²(1/n1 + 1/n2)),   d.f. = n1 + n2 - 2
• Note how sp² replaces both s1² and s2²; equivalently, the denominator can be written sqrt(sp²/n1 + sp²/n2).

Inference about μ1 - μ2: Unequal variances
• Since σ1² and σ2² are unequal, we cannot produce a single estimate for both variances.
• Thus we use the two sample variances in the t formula:
t = [(x̄1 - x̄2) - (μ1 - μ2)] / sqrt(s1²/n1 + s2²/n2)
with degrees of freedom
d.f. = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1) ]

Which case to use: equal variances or unequal variances?
• Whenever there is insufficient evidence that the variances are unequal, it is preferable to run the equal-variances t test.
• This is so because, for any two given samples, the number of degrees of freedom for the equal-variances case is at least as large as the number of degrees of freedom for the unequal-variances case.

Example: Making an inference about μ1 - μ2
• Example 1
– Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
– For each person the number of calories consumed at lunch was recorded.

Example: Making an inference about μ1 - μ2
(Data: calories consumed at lunch by consumers and non-consumers of high-fiber cereal; see Xm13-1.xlsx.)
Solution:
• The data are quantitative.
• The parameter to be tested is the difference between two means.
• The claim to be tested is that the mean caloric intake of consumers (μ1) is less than that of non-consumers (μ2).

Example: Making an inference about μ1 - μ2
• The hypotheses are:
H0: μ1 - μ2 = 0
H1: μ1 - μ2 < 0
where μ1 = mean caloric intake of fiber consumers and μ2 = mean caloric intake of fiber non-consumers.
– To check the relationship between the variances, we use the computer output (Xm13-1.xlsx) to find the sample variances: s1² = 4,103 and s2² = 10,670.
– It appears that the variances are unequal.

Example: Making an inference about μ1 - μ2
• Solving by hand
– From the data we have: x̄1 = 604.2, x̄2 = 633.23, s1² = 4,103, s2² = 10,670, n1 = 43, n2 = 107.
– The degrees of freedom are
d.f. = (4103/43 + 10670/107)² / [ (4103/43)²/(43 - 1) + (10670/107)²/(107 - 1) ] = 122.6 ≈ 123

Example: Making an inference about μ1 - μ2
• Solving by hand
– Since H1: μ1 - μ2 < 0, the rejection region is t < -t.05,123 = -1.658.
– The test statistic is
t = [(604.2 - 633.23) - 0] / sqrt(4103/43 + 10670/107) = -2.09

Example: Making an inference about μ1 - μ2
Excel output (t-Test: Two-Sample Assuming Unequal Variances):
                               Consumers   Non-consumers
Mean                             604.023         633.234
Variance                         4102.98         10669.8
Observations                          43             107
Hypothesized Mean Difference           0
df                                   123
t Stat                          -2.09107
P(T<=t) one-tail                 0.01929
t Critical one-tail              1.65734
P(T<=t) two-tail                 0.03858
t Critical two-tail              1.97944

Conclusion: At the 5% significance level there is sufficient evidence to reject the null hypothesis and argue that μ1 < μ2.
• The p-value approach: .01929 < .05.
• The rejection-region approach: -2.09107 < -1.65734.
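The "solving by hand" calculation for Example 1 can be reproduced numerically. This is a minimal sketch, assuming only the summary statistics reported above and that scipy is available; with the raw data from Xm13-1.xlsx one would normally call scipy.stats.ttest_ind(..., equal_var=False) instead.

```python
import math
from scipy import stats

# Summary statistics for Example 1 (Xm13-1.xlsx)
xbar1, s1_sq, n1 = 604.2, 4103.0, 43      # high-fiber consumers
xbar2, s2_sq, n2 = 633.23, 10670.0, 107   # non-consumers

v1, v2 = s1_sq / n1, s2_sq / n2
t_stat = (xbar1 - xbar2) / math.sqrt(v1 + v2)                   # approx. -2.09
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))     # approx. 122.6
p_one_tail = stats.t.cdf(t_stat, df)                            # approx. 0.019 for H1: mu1 - mu2 < 0

print(t_stat, df, p_one_tail)
print("reject H0 at the 5% level:", p_one_tail < 0.05)
```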
Example: Making an inference about μ1 - μ2
• Solving by hand
– The confidence interval estimator for the difference between two means when the variances are unequal is
(x̄1 - x̄2) ± t_{α/2, d.f.} · sqrt(s1²/n1 + s2²/n2)
– Here:
(604.02 - 633.23) ± 1.9796 · sqrt(4103/43 + 10670/107) = -29.21 ± 27.65 = [-56.86, -1.56]

Example: Making an inference about μ1 - μ2
• Note that the confidence interval for the difference between the two means falls entirely in the negative region: [-56.86, -1.56]. Even at best the difference between the two means is μ1 - μ2 = -1.56, so we can be 95% confident that μ1 is smaller than μ2. This conclusion agrees with the result of the test performed before.

Example: Making an inference about μ1 - μ2
• Example 2
– An ergonomic chair can be assembled using two different sets of operations (Method A and Method B).
– The operations manager would like to know whether the assembly times under the two methods differ.
– Two samples are randomly and independently selected:
• A sample of 25 workers assembled the chair using design A.
• A sample of 25 workers assembled the chair using design B.
• The assembly times were recorded.
– Do the assembly times of the two methods differ?

Example: Making an inference about μ1 - μ2
(Data: assembly times in minutes for Design A and Design B; see Xm13-02.)
Solution:
• The data are quantitative.
• The parameter of interest is the difference between two population means.
• The claim to be tested is whether a difference between the two designs exists.

Example: Making an inference about μ1 - μ2
• Solving by hand
– The hypotheses are:
H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0
Since we ask whether or not the assembly times are the same on average, the alternative hypothesis is two-sided (μ1 ≠ μ2).
– To check the relationship between the two variances we run the F test (Xm13-02). From the data, s1² = 0.8478 and s2² = 1.3031, and the p-value of the F test is 2(.1496) = .299.
– Conclusion: σ1² and σ2² appear to be equal.

Example: Making an inference about μ1 - μ2
• Solving by hand
– To calculate the t statistic we have: x̄1 = 6.288, x̄2 = 6.016, s1² = 0.8478, s2² = 1.3031.
sp² = [(25 - 1)(0.848) + (25 - 1)(1.303)] / (25 + 25 - 2) = 1.076
t = [(6.288 - 6.016) - 0] / sqrt(1.076(1/25 + 1/25)) = 0.93,   d.f. = 25 + 25 - 2 = 48

Example: Making an inference about μ1 - μ2
• For α = 0.05, the two-tail rejection region is t < -t.025,48 = -2.009 or t > t.025,48 = 2.009.
• The test: since -2.009 < t = 0.93 < 2.009, there is insufficient evidence to reject the null hypothesis.
(Figure: the t distribution with the two rejection regions beyond ±2.009; the test statistic t = 0.93 falls between them.)
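A minimal Python sketch of the pooled-variance test for Example 2, again working from the summary statistics shown above; with the raw Xm13-02 data one would normally call scipy.stats.ttest_ind(a, b, equal_var=True) instead.

```python
import math
from scipy import stats

# Summary statistics for Example 2 (Xm13-02)
xbar1, s1_sq, n1 = 6.288, 0.8478, 25   # Design A
xbar2, s2_sq, n2 = 6.016, 1.3031, 25   # Design B

sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)    # pooled variance, approx. 1.075
df = n1 + n2 - 2                                                 # 48
t_stat = (xbar1 - xbar2) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))  # approx. 0.93

t_crit = stats.t.ppf(0.975, df)                                  # approx. 2.01 (alpha = 0.05, two-sided)
p_two_tail = 2 * stats.t.sf(abs(t_stat), df)                     # approx. 0.36

print(t_stat, t_crit, p_two_tail)
print("reject H0 at the 5% level:", abs(t_stat) > t_crit)
```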
Example: Making an inference about μ1 - μ2
Excel output (t-Test: Two-Sample Assuming Equal Variances):
                                Design-A      Design-B
Mean                               6.288         6.016
Variance                     0.847766667     1.3030667
Observations                          25            25
Pooled Variance              1.075416667
Hypothesized Mean Difference           0
df                                    48
t Stat                       0.927332603
P(T<=t) one-tail             0.179196744
t Critical one-tail          1.677224191
P(T<=t) two-tail             0.358393488
t Critical two-tail           2.01063358

Conclusion: From this experiment it is unclear, at the 5% significance level, whether the two assembly methods differ in assembly time:
-2.0106 < .9273 < +2.0106 and .35839 > .05.

Example: Making an inference about μ1 - μ2
• A 95% confidence interval for μ1 - μ2 when the two variances are equal is calculated as follows:
(x̄1 - x̄2) ± t_{α/2, d.f.} · sqrt(sp²(1/n1 + 1/n2))
= (6.288 - 6.016) ± 2.0106 · sqrt(1.075(1/25 + 1/25))
= 0.272 ± 0.5896 = [-0.3176, 0.8616]
• Thus, at the 95% confidence level, -0.3176 < μ1 - μ2 < 0.8616.
• Notice: zero is included in the confidence interval, so the two mean values could be equal.

Checking the required conditions for the equal-variances case (Example 13.2)
(Figure: histograms of the Design A and Design B assembly times.) The data appear to be approximately normal.
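The equal-variances confidence interval just computed can be reproduced from the same summary statistics. A minimal sketch, assuming scipy is available (not part of the original slides):

```python
import math
from scipy import stats

# Summary statistics for Example 2 (Xm13-02)
xbar1, s1_sq, n1 = 6.288, 0.8478, 25   # Design A
xbar2, s2_sq, n2 = 6.016, 1.3031, 25   # Design B
alpha = 0.05

sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
df = n1 + n2 - 2
t_crit = stats.t.ppf(1 - alpha / 2, df)                      # approx. 2.0106
half_width = t_crit * math.sqrt(sp_sq * (1 / n1 + 1 / n2))   # approx. 0.59

diff = xbar1 - xbar2
print((diff - half_width, diff + half_width))                # approx. (-0.318, 0.862)
```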
13.4 Matched Pairs Experiment: Dependent Samples
• What is a matched pairs experiment?
• A matched pairs experiment is a sampling design in which every two observations share some characteristic. For example, suppose we are interested in increasing workers' productivity. We establish a compensation program and want to study its efficiency. We could select two groups of workers, measure productivity before and after the program is established, and run a test as we did before.
• But if we believe workers' age is a factor that may affect changes in productivity, we can divide the workers into different age groups, select a worker from each age group, and measure his or her productivity twice: once before and once after the program is established. Each two observations constitute a matched pair, and because they belong to the same age group they are not independent.

13.4 Matched Pairs Experiment: Dependent Samples
• Why are matched pairs experiments needed? The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.

13.4 Matched Pairs Experiment
• Example 3
– To investigate the job offers obtained by MBA graduates, a study focusing on salaries was conducted.
– In particular, the salaries offered to finance majors were compared to those offered to marketing majors.
– Two random samples of 25 graduates, one sample from each discipline, were selected, and the highest salary offer was recorded for each graduate.
– From the data, can we infer that finance majors obtain higher salary offers than marketing majors among MBAs?

13.4 Matched Pairs Experiment
• Solution
– Compare two populations of quantitative data.
– The parameter tested is μ1 - μ2, where
μ1 = the mean highest salary offered to Finance MBAs,
μ2 = the mean highest salary offered to Marketing MBAs.
– H0: μ1 - μ2 = 0
H1: μ1 - μ2 > 0
(Partial data: Finance 61,228; 51,836; 20,620; 73,356; 84,186; ... Marketing 73,361; 36,956; 63,627; 71,069; 40,203; ...)

13.4 Matched Pairs Experiment
• Solution - continued
– From Xm13-3.xls we have: x̄1 = 65,624, x̄2 = 60,423, s1² = 360,433,294, s2² = 262,228,559.
– Let us assume equal variances.
Excel output (t-Test: Two-Sample Assuming Equal Variances):
                                 Finance     Marketing
Mean                               65624         60423
Variance                       360433294     262228559
Observations                          25            25
Pooled Variance                311330926
Hypothesized Mean Difference           0
df                                    48
t Stat                        1.04215119
P(T<=t) one-tail              0.15128114
t Critical one-tail           1.67722419
P(T<=t) two-tail              0.30256227
t Critical two-tail           2.01063358

There is insufficient evidence to conclude that Finance MBAs are offered higher salaries than Marketing MBAs.

The effect of large sample variability
• Question
– The difference between the sample means is 65,624 - 60,423 = 5,201.
– So why could we not reject H0 in favor of H1?

The effect of large sample variability
• Answer:
– sp² is large (because the sample variances are large): sp² = 311,330,926.
– A large variance reduces the value of the t statistic, and this is why t does not fall in the rejection region:
t = [(x̄1 - x̄2) - (μ1 - μ2)] / sqrt(sp²(1/n1 + 1/n2))
Recall that rejection of H0 in this problem occurs when t is sufficiently large (t > tα). A large sp² reduces t, and therefore t does not fall in the rejection region.

The matched pairs experiment
• We are looking for a formulation of the hypotheses in which the variability of the two samples is reduced.
• By taking matched-pair observations and testing the differences per pair we achieve two goals:
– We still test μ1 - μ2 (see the explanation next).
– The variability used to calculate the t statistic is usually smaller (see the explanation next).

The matched pairs experiment: Are we still testing μ1 - μ2?
• Yes. Note that the difference between the two means is equal to the mean of the differences of the pairs of observations.
• A short example:
Group 1: 10, 15 (Mean1 = 12.5)
Group 2: 12, 11 (Mean2 = 11.5)
Differences: -2, +4
Mean1 - Mean2 = 1, and the mean of the differences = 1.

The matched pairs experiment: Reducing the variability
(Figure: the ranges of the observations in sample A and in sample B.) The observations themselves might differ markedly...

The matched pairs experiment: Reducing the variability
(Figure: the range of the pair differences, centered near 0.) ...but the differences between pairs of observations might have much smaller variability.

The matched pairs experiment
• Example 4 (Example 3, part II)
– It was suspected that salary offers were affected by students' GPA. Since GPAs differed, so did the salaries (which caused s1² and s2² to increase).
– To reduce this variability, the following procedure was used:
• 25 ranges of GPAs were predetermined.
• Students from each major were randomly selected, one from each GPA range.
• The highest salary offer for each student was recorded.
– From the data presented, can we conclude that Finance majors are offered higher salaries?
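The two ideas behind the matched pairs design, that the mean of the pair differences equals the difference of the group means and that pairing can sharply reduce variability, can be illustrated with a tiny simulation. This sketch is illustrative only and not from the original slides; the per-pair effect, treatment effect, and noise levels are made-up values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each pair shares a strong common component (e.g., a GPA effect on salary),
# plus a small treatment effect for group 1 and independent noise per observation.
n_pairs = 25
pair_effect = rng.normal(0, 10.0, n_pairs)            # large pair-to-pair variation
group1 = 5.0 + pair_effect + rng.normal(0, 1.0, n_pairs)
group2 = 4.0 + pair_effect + rng.normal(0, 1.0, n_pairs)
diffs = group1 - group2

# Mean of the differences equals the difference of the means (up to floating point)
print(diffs.mean(), group1.mean() - group2.mean())

# The differences are far less variable than either group,
# because the common pair effect cancels out in the subtraction
print(group1.std(ddof=1), group2.std(ddof=1), diffs.std(ddof=1))
```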
The matched pairs hypothesis test
• Solution (by hand)
– The parameter tested is μD (= μ1 - μ2), the mean of the Finance minus Marketing differences.
– The hypotheses:
H0: μD = 0
H1: μD > 0
– The rejection region is t > t.05,25-1 = 1.711.
– The t statistic is
t = (x̄D - μD) / (sD/sqrt(nD)),   degrees of freedom = nD - 1

The matched pairs hypothesis test
• Solution (by hand) - continued
– From the data (Xm13-4.xls) we calculate the pair differences:
GPA Group    Finance    Marketing   Difference
1             95,171       89,329        5,842
2             88,009       92,705       -4,696
3             98,089       99,205       -1,116
4            106,322       99,003        7,319
5             74,566       74,825         -259
6             87,089       77,038       10,051
7             88,664       78,272       10,392
8             71,200       59,462       11,738
9             69,367       51,555       17,812
10            82,618       81,591        1,027
...
– Using Descriptive Statistics in Excel for the Difference column we get: Mean 5064.52, Standard Error 1329.38, Median 3285, Mode #N/A, Standard Deviation 6646.90, Sample Variance 44,181,217, Kurtosis -0.66, Skewness 0.36, Range 23,533, Minimum -5,721, Maximum 17,812, Sum 126,613, Count 25.

The matched pairs hypothesis test
• Solution (by hand) - continued
– So x̄D = 5,065 and sD = 6,647.
– Calculate t:
t = (x̄D - μD) / (sD/sqrt(n)) = (5065 - 0) / (6647/sqrt(25)) = 3.81
(See the conclusion below.)

The matched pairs hypothesis test
Using Data Analysis in Excel (t-Test: Paired Two Sample for Means):
                                 Finance    Marketing
Mean                             65438.2     60373.68
Variance                        4.45E+08     4.69E+08
Observations                          25           25
Pearson Correlation             0.952025
Hypothesized Mean Difference           0
df                                    24
t Stat                          3.809688
P(T<=t) one-tail                0.000426
t Critical one-tail             1.710882
P(T<=t) two-tail                0.000851
t Critical two-tail             2.063898

Conclusion: There is sufficient evidence to infer at the 5% significance level that the Finance MBAs' highest salary offer is, on average, higher than that of the Marketing MBAs.
Recall: the rejection region is t > tα. Indeed, 3.809 > 1.7108 and .000426 < .05.

The matched pairs mean difference estimation
• Confidence interval estimator of μD:
x̄D ± t_{α/2, n-1} · sD/sqrt(n)
• Example 13.5: the 95% confidence interval of the mean difference in Example 13.4 is
5065 ± 2.064 · 6647/sqrt(25) = 5065 ± 2,744

The matched pairs mean difference estimation
Using Data Analysis Plus (t-Estimate: Mean, applied to the Difference column shown above):
Mean 5065, Standard Deviation 6647, LCL 2321, UCL 7808.
First calculate the differences for each pair, then run the confidence interval procedure in Data Analysis Plus.

Checking the required conditions for the paired observations case
• The validity of the results depends on the normality of the differences.
(Figure: histogram of the differences.)
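A minimal Python sketch of the matched-pairs test and interval, using only the summary statistics of the differences reported above (mean 5,065, standard deviation 6,647, n = 25); with the raw pair differences from Xm13-4.xls one would normally call scipy.stats.ttest_rel instead.

```python
import math
from scipy import stats

# Summary statistics of the Finance - Marketing differences (Xm13-4.xls)
d_bar, s_d, n = 5065.0, 6647.0, 25
df = n - 1

t_stat = d_bar / (s_d / math.sqrt(n))            # approx. 3.81
p_one_tail = stats.t.sf(t_stat, df)              # approx. 0.0004 for H1: mu_D > 0
t_crit = stats.t.ppf(0.95, df)                   # approx. 1.711

# 95% confidence interval for the mean difference
half_width = stats.t.ppf(0.975, df) * s_d / math.sqrt(n)    # approx. 2,744
print(t_stat, p_one_tail, t_crit)
print((d_bar - half_width, d_bar + half_width))             # approx. (2321, 7809)
```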
13.5 Inferences about the Ratio of Two Variances
• In this section we draw inferences about the relationship between two population variances.
• This question is interesting because:
– Variances can be used to evaluate the consistency of processes.
– The relationship between the variances determines the technique used to test the relationship between the mean values.

Parameter tested and statistic
• The parameter tested is σ1²/σ2².
• The statistic used is F = s1²/s2².
• The sampling distribution:
– The statistic [s1²/σ1²] / [s2²/σ2²] follows the F distribution with numerator d.f. = n1 - 1 and denominator d.f. = n2 - 1.

Parameter tested and statistic
– Our null hypothesis is always H0: σ1²/σ2² = 1.
– Under this null hypothesis the F statistic [s1²/σ1²] / [s2²/σ2²] becomes F = s1²/s2².

Testing the ratio of two population variances
• Example 6 (revisiting Example 1)
– In order to test whether having a rich-in-fiber breakfast reduces the amount of caloric intake at lunch, we need to decide whether the variances are equal or not (calorie data in Xm13-1.xlsx).
– The hypotheses are:
H0: σ1²/σ2² = 1
H1: σ1²/σ2² ≠ 1

Testing the ratio of two population variances
• Solving by hand
– The rejection region is F > F_{α/2, n1-1, n2-1} or F < 1/F_{α/2, n2-1, n1-1}.
(Note: in the lower critical value the numerator and denominator degrees of freedom replace one another.)
F.025,42,106 ≈ F.025,40,120 = 1.61 and 1/F.025,106,42 ≈ 1/F.025,120,40 = .63
– The F statistic value is F = s1²/s2² = .3845.
– Conclusion: because .3845 < .63 we can reject the null hypothesis in favor of the alternative hypothesis, and conclude that there is sufficient evidence in the data to argue, at the 5% significance level, that the variances of the two groups differ.

Testing the ratio of two population variances
• Example 6 (revisiting Example 1), from Data Analysis (F-Test Two-Sample for Variances, Xm13-1):
                        Consumers   Non-consumers
Mean                  604.0232558     633.2336449
Variance              4102.975637     10669.76565
Observations                   43             107
df                             42             106
F                     0.384542245
P(F<=f) one-tail      0.000368433
F Critical one-tail   0.637072617

Estimating the Ratio of Two Population Variances
• From the statistic F = [s1²/σ1²] / [s2²/σ2²] we can isolate σ1²/σ2² and build the following confidence interval:
(s1²/s2²) · 1/F_{α/2, ν1, ν2}  ≤  σ1²/σ2²  ≤  (s1²/s2²) · F_{α/2, ν2, ν1}
where ν1 = n1 - 1 and ν2 = n2 - 1.

Estimating the Ratio of Two Population Variances
• Example 7
– Determine the 95% confidence interval estimate of the ratio of the two population variances in Example 1.
– Solution
• We find F_{α/2, ν1, ν2} ≈ F.025,40,120 = 1.61 (approximately) and F_{α/2, ν2, ν1} ≈ F.025,120,40 = 1.72 (approximately).
• LCL = (s1²/s2²)·[1/F_{α/2, ν1, ν2}] = (4102.98/10669.77)·(1/1.61) = .2388
• UCL = (s1²/s2²)·F_{α/2, ν2, ν1} = (4102.98/10669.77)·(1.72) = .6614
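The F test and the confidence interval for σ1²/σ2² can be reproduced numerically. A minimal sketch from the summary statistics of Example 1 (scipy assumed; not from the original slides); note that scipy computes exact critical values for the actual degrees of freedom (42, 106), whereas the slides use rounded table values.

```python
from scipy import stats

# Sample variances and sizes from Example 1 (Xm13-1.xlsx)
s1_sq, n1 = 4102.98, 43     # consumers
s2_sq, n2 = 10669.77, 107   # non-consumers
alpha = 0.05

F = s1_sq / s2_sq                                  # approx. 0.3845
nu1, nu2 = n1 - 1, n2 - 1

upper = stats.f.ppf(1 - alpha / 2, nu1, nu2)       # upper critical value
lower = 1 / stats.f.ppf(1 - alpha / 2, nu2, nu1)   # lower critical value
print(F, lower, upper)
print("reject H0 (equal variances):", F < lower or F > upper)

# 95% confidence interval for sigma1^2 / sigma2^2
lcl = F / stats.f.ppf(1 - alpha / 2, nu1, nu2)
ucl = F * stats.f.ppf(1 - alpha / 2, nu2, nu1)
print((lcl, ucl))   # the slides report (.2388, .6614) using rounded table values
```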
13.6 Inference about the Difference between Two Population Proportions
• In this section we deal with two populations whose data are nominal.
• For nominal data we compare the population proportions of the occurrence of a certain event.
• Examples:
– Comparing the effectiveness of a new drug vs. an old one.
– Comparing market share before and after an advertising campaign.
– Comparing defective rates between two machines.

Parameter tested and statistic
• Parameter
– When the data are nominal, we can only count the occurrences of a certain event in the two populations and calculate proportions.
– The parameter tested is therefore p1 - p2.
• Statistic
– An unbiased estimator of p1 - p2 is p̂1 - p̂2 (the difference between the sample proportions).

Sampling distribution of p̂1 - p̂2
• Two random samples are drawn from the two populations.
• The number of successes in each sample is recorded.
• The sample proportions are computed:
Sample 1: sample size n1, number of successes x1, sample proportion p̂1 = x1/n1.
Sample 2: sample size n2, number of successes x2, sample proportion p̂2 = x2/n2.

Sampling distribution of p̂1 - p̂2
• The statistic p̂1 - p̂2 is approximately normally distributed if n1p1, n1(1 - p1), n2p2, n2(1 - p2) are all greater than or equal to 5.
• The mean of p̂1 - p̂2 is p1 - p2.
• The standard deviation of p̂1 - p̂2 is sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2).

The z-statistic
Z = [(p̂1 - p̂2) - (p1 - p2)] / sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)
• Because p1 and p2 are unknown, we use their estimates instead. Thus n1p̂1, n1(1 - p̂1), n2p̂2, n2(1 - p̂2) should all be greater than or equal to 5.

Testing p1 - p2
• There are two cases to consider:
Case 1: H0: p1 - p2 = 0. Pool the samples and determine the pooled proportion
p̂ = (x1 + x2)/(n1 + n2)
Then
Z = (p̂1 - p̂2) / sqrt(p̂(1 - p̂)(1/n1 + 1/n2))
Case 2: H0: p1 - p2 = D (D not equal to 0). Keep the sample proportions separate, p̂1 = x1/n1 and p̂2 = x2/n2. Then
Z = [(p̂1 - p̂2) - D] / sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2)

Testing p1 - p2 (Case I)
• Example 8
– Management needs to decide which of two new packaging designs to adopt, to help improve sales of a certain soap.
– A study is performed in two communities:
• Design A is distributed in Community 1.
• Design B is distributed in Community 2.
• The old design package is still offered in both communities.
– Design A is more expensive; therefore, to be financially viable it has to outsell design B.

Testing p1 - p2 (Case I)
• Summary of the experiment results
– Community 1: 580 packages with new design A sold; 324 packages with the old design sold.
– Community 2: 604 packages with new design B sold; 442 packages with the old design sold.
– Use a 5% significance level and perform a test to find which type of packaging to use.

Testing p1 - p2 (Case I)
• Solution
– The problem objective is to compare the populations of sales of the two packaging designs.
– The data are qualitative (yes/no for the purchase of the new design per customer).
Population 1: purchases of Design A. Population 2: purchases of Design B.
– The hypotheses are:
H0: p1 - p2 = 0
H1: p1 - p2 > 0
– We identify here Case 1.

Testing p1 - p2 (Case I)
• Solving by hand
– For a 5% significance level the rejection region is z > zα = z.05 = 1.645.
– From Xm13-08.xls, the sample proportions are p̂1 = 580/904 = .6416 and p̂2 = 604/1046 = .5774.
– The pooled proportion is p̂ = (x1 + x2)/(n1 + n2) = (580 + 604)/(904 + 1046) = .6072.
– The z statistic becomes
Z = (p̂1 - p̂2) / sqrt(p̂(1 - p̂)(1/n1 + 1/n2)) = (.6416 - .5774) / sqrt(.6072(1 - .6072)(1/904 + 1/1046)) = 2.89

Testing p1 - p2 (Case I)
• Conclusion: At the 5% significance level there is sufficient evidence to infer that the proportion of sales with design A is greater than the proportion of sales with design B (since 2.89 > 1.645).

Testing p1 - p2 (Case I)
• Excel (Data Analysis Plus), z-Test: Two Proportions:
                          Community 1   Community 2
Sample proportions             0.6416        0.5774
Observations                      904          1046
Hypothesized Difference             0
z Stat                           2.89
P(Z<=z) one tail               0.0019
z Critical one-tail            1.6449
P(Z<=z) two-tail               0.0038
z Critical two-tail              1.96
• Conclusion: Since 2.89 > 1.645, there is sufficient evidence in the data to conclude, at the 5% significance level, that design A will outsell design B.
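A minimal sketch of the Case 1 (pooled) z test for Example 8; the function name is my own and only math/scipy are assumed.

```python
import math
from scipy import stats

def two_prop_z_pooled(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    return z, stats.norm.sf(z)      # one-tailed p-value for H1: p1 - p2 > 0

# Example 8: design A (Community 1) vs. design B (Community 2)
z, p = two_prop_z_pooled(580, 904, 604, 1046)
print(z, p)                                            # z approx. 2.89, p approx. 0.002
print("reject H0 at 5%:", z > stats.norm.ppf(0.95))    # critical value 1.645
```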
Testing p1 - p2 (Case II)
• Example 9 (modifying Example 8)
– Management needs to decide which of two new packaging designs to adopt, to help improve sales of a certain soap.
– A study is performed in two communities:
• Design A is distributed in Community 1.
• Design B is distributed in Community 2.
• The old design package is still offered in both communities.
– For design A to be financially viable it has to outsell design B by at least 3%.

Testing p1 - p2 (Case II)
• Summary of the experiment results
– Community 1: 580 packages with new design A sold; 324 packages with the old design sold.
– Community 2: 604 packages with new design B sold; 442 packages with the old design sold.
– Use a 5% significance level and perform a test to find which type of packaging to use.

Testing p1 - p2 (Case II)
• Solution
– The hypotheses to test are:
H0: p1 - p2 = .03
H1: p1 - p2 > .03
– We identify Case 2 of the test for a difference in proportions (the hypothesized difference is not equal to zero).

Testing p1 - p2 (Case II)
• Solving by hand
Z = [(p̂1 - p̂2) - D] / sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2)
= [(580/904 - 604/1046) - .03] / sqrt(.642(1 - .642)/904 + .577(1 - .577)/1046) = 1.58
– The rejection region is z > z.05 = 1.645.
– Conclusion: Since 1.58 < 1.645, do not reject the null hypothesis. There is insufficient evidence to infer that packaging with Design A will outsell that with Design B by 3% or more.

Testing p1 - p2 (Case II)
• Using Excel (Data Analysis Plus), z-Test: Two Proportions:
                          Community 1   Community 2
Sample proportion              0.6416        0.5774
Observations                      904          1046
Hypothesized Difference          0.03
z Stat                         1.5467
P(Z<=z) one-tail                0.061
z Critical one-tail            1.6449
P(Z<=z) two-tail                0.122
z Critical two-tail              1.96

Estimating p1 - p2
• Example (estimating the cost per life saved)
– Two drugs are used to treat heart attack victims:
• Streptokinase (available since 1959; costs $460).
• t-PA (genetically engineered; costs $2,900).
– The maker of t-PA claims that its drug outperforms Streptokinase.
– An experiment was conducted in 15 countries:
• 20,500 patients were given t-PA.
• 20,500 patients were given Streptokinase.
• The number of deaths by heart attack was recorded.

Estimating p1 - p2
• Experiment results
– A total of 1,497 patients treated with Streptokinase died.
– A total of 1,292 patients treated with t-PA died.
• Estimate the cost per life saved by using t-PA instead of Streptokinase.

Estimating p1 - p2
• Solution
– The problem objective: compare the outcomes of two treatments.
– The data are nominal (a patient lived/died).
– The parameter estimated is p1 - p2, where
p1 = the death rate with Streptokinase,
p2 = the death rate with t-PA.

Estimating p1 - p2
• Solving by hand
– Sample proportions: p̂1 = 1497/20500 = .0730, p̂2 = 1292/20500 = .0630.
– The 95% confidence interval is
(p̂1 - p̂2) ± 1.96 · sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2)
= (.0730 - .0630) ± 1.96 · sqrt(.0730(1 - .0730)/20500 + .0630(1 - .0630)/20500)
= .0100 ± .0049
so LCL = .0051 and UCL = .0149.

Estimating p1 - p2
• Interpretation
– We estimate that between .51% and 1.49% more heart attack victims will survive because of the use of t-PA.
– The difference in cost per patient is 2,900 - 460 = $2,440.
– The cost per life saved by switching to t-PA is therefore estimated to be between 2440/.0149 = $163,758 and 2440/.0051 = $478,431.
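To close, a minimal sketch of the Case 2 (unpooled) z test for Example 9 and of the confidence interval for p1 - p2 from the t-PA example; again only math/scipy are assumed, and the exact sample proportions are used rather than the rounded values on the slides.

```python
import math
from scipy import stats

def two_prop_z_unpooled(x1, n1, x2, n2, D):
    """z statistic for H0: p1 - p2 = D (D != 0), keeping the proportions separate."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = (p1_hat - p2_hat - D) / se
    return z, stats.norm.sf(z)        # one-tailed p-value for H1: p1 - p2 > D

# Example 9: does design A outsell design B by more than 3%?
print(two_prop_z_unpooled(580, 904, 604, 1046, D=0.03))   # z approx. 1.55, p approx. 0.06

# t-PA example: 95% confidence interval for p1 - p2 (death rates)
p1_hat, p2_hat, n = 1497 / 20500, 1292 / 20500, 20500
se = math.sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / n)
half = stats.norm.ppf(0.975) * se
print((p1_hat - p2_hat - half, p1_hat - p2_hat + half))    # approx. (0.0051, 0.0149)
```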