Chapter 13 Inference about Two Populations

13.1 Introduction
• A variety of techniques are presented whose objective is to compare two populations.
• We are interested in:
– The difference between two means.
– The ratio of two variances.
– The difference between two proportions.

13.2 Inference about the Difference between Two Means: Independent Samples
• Two random samples are drawn from the two populations of interest.
• Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.

The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
1. $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.
2. $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal, but the sample sizes are sufficiently large (greater than 30).
3. The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
4. The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

Making an inference about $\mu_1 - \mu_2$
• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal, we can write:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
• Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.
• In practice, the Z statistic is hardly ever used, because the population variances are not known.
• Instead, we construct a t statistic by replacing $\sigma_1^2$ and $\sigma_2^2$ with the sample variances $s_1^2$ and $s_2^2$.
• Two cases are considered when producing the t-statistic:
– The two unknown population variances are equal.
– The two unknown population variances are not equal.

Inference about $\mu_1 - \mu_2$: Equal variances
• Calculate the pooled variance estimate by:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
The pooled variance estimator combines the two sample variances, weighting each by its degrees of freedom.
Example: $s_1^2 = 25$; $s_2^2 = 30$; $n_1 = 10$; $n_2 = 15$.
Then,
$$s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04347$$

Inference about $\mu_1 - \mu_2$: Equal variances
• Construct the t-statistic as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$
• Perform a hypothesis test:
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 > 0$, or $< 0$, or $\neq 0$
• Or build a confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $1 - \alpha$ is the confidence level.

Inference about $\mu_1 - \mu_2$: Unequal variances
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad \text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
Conduct a hypothesis test as needed, or build a confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $1 - \alpha$ is the confidence level.

Which case to use: Equal variance or unequal variance?
• Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal variances t-test.
• This is so because, for any two given samples, the number of degrees of freedom for the equal variances case ≥ the number of degrees of freedom for the unequal variances case.

Example: Making an inference about $\mu_1 - \mu_2$
• Example 13.1
– Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
– For each person the number of calories consumed at lunch was recorded.

Calories consumed at lunch (excerpt, shown on the slide under the headings Consumers and Non-consumers):
568 498 589 681 540 646 636 739 539 596 607 529 637 617 633 555 ...
705 819 706 509 613 582 601 608 787 573 428 754 741 628 537 748 ...

Solution:
• The data are interval.
• The parameter to be tested is the difference between two means.
• The claim to be tested is: The mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).
• The hypotheses are:
H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) < 0$
– To check whether the population variances are equal, we use computer output (Xm13-01) to find the sample variances. We have $s_1^2 = 4{,}103$ and $s_2^2 = 10{,}670$.
– It appears that the variances are unequal.
• Compute: Manually
– From the data we have: $\bar{x}_1 = 604.02$, $\bar{x}_2 = 633.23$, $s_1^2 = 4{,}103$, $s_2^2 = 10{,}670$
$$\nu = \frac{\left(4103/43 + 10670/107\right)^2}{\dfrac{(4103/43)^2}{43 - 1} + \dfrac{(10670/107)^2}{107 - 1}} = 122.6 \approx 123$$
– The rejection region is $t < -t_{\alpha,\nu} = -t_{.05,123} \approx -1.658$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(604.02 - 633.23) - 0}{\sqrt{\dfrac{4103}{43} + \dfrac{10670}{107}}} = -2.09$$

Excel output (Xm13-01)
t-Test: Two-Sample Assuming Unequal Variances
                                Consumers    Nonconsumers
Mean                            604.02       633.23
Variance                        4102.98      10669.77
Observations                    43           107
Hypothesized Mean Difference    0
df                              123
t Stat                          -2.09
P(T<=t) one-tail                0.0193
t Critical one-tail             1.6573
P(T<=t) two-tail                0.0386
t Critical two-tail             1.9794

At the 5% significance level there is sufficient evidence to reject the null hypothesis: .0193 < .05 and -2.09 < -1.6573.

• Compute: Manually
The confidence interval estimator for the difference between two means is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.23) \pm 1.9794\sqrt{\frac{4103}{43} + \frac{10670}{107}} = -29.21 \pm 27.65$$
so LCL = -56.86 and UCL = -1.56.

Example: Making an inference about $\mu_1 - \mu_2$
• Example 13.2
– An ergonomic chair can be assembled using two different sets of operations (Method A and Method B).
– The operations manager would like to know whether the assembly times under the two methods differ.
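As an aside before Example 13.2 continues, the unequal-variances calculation of Example 13.1 can be sketched in Python from the summary statistics alone. This is a minimal illustration, not the textbook's or Excel's procedure; the function name `welch_t` and its parameter names are hypothetical, and the critical value is taken from the slide rather than computed.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (t statistic, approximate d.f.) for the unequal-variances t-test.

    Hypothetical helper: inputs are sample means, sample variances and sizes.
    """
    v1 = var1 / n1  # s1^2 / n1
    v2 = var2 / n2  # s2^2 / n2
    t = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics quoted in Example 13.1
t_stat, df = welch_t(604.02, 4103, 43, 633.23, 10670, 107)

T_CRIT = 1.658  # t_{.05,123} as quoted in the text (assumed, not computed here)
reject_h0 = t_stat < -T_CRIT  # left-tail test, H1: mu1 - mu2 < 0

print(round(t_stat, 2), round(df, 1), reject_h0)
```

The degrees of freedom come out non-integer and are rounded before a table lookup, which is why the slide reports 122.6 ≈ 123.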
Example: Making an inference about $\mu_1 - \mu_2$
• Example 13.2
– Two samples are randomly and independently selected:
• A sample of 25 workers assembled the chair using Method A.
• A sample of 25 workers assembled the chair using Method B.
• The assembly times were recorded.
– Do the assembly times of the two methods differ?

Assembly times in minutes (excerpt)
Method A    Method B
6.8         5.2
5.0         6.7
7.9         5.7
5.2         6.6
7.6         8.5
5.0         6.5
5.9         5.9
5.2         6.7
6.5         6.6
...         ...

Solution
• The data are interval.
• The parameter of interest is the difference between two population means.
• The claim to be tested is whether a difference between the two methods exists.
• Compute: Manually
– The hypotheses are:
H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) \neq 0$
– To check whether the two unknown population variances are equal we calculate $s_1^2$ and $s_2^2$ (Xm13-02).
– We have $s_1^2 = 0.8478$ and $s_2^2 = 1.3031$.
– The two population variances appear to be equal.
– To calculate the t-statistic we have: $\bar{x}_1 = 6.288$, $\bar{x}_2 = 6.016$, $s_1^2 = 0.8478$, $s_2^2 = 1.3031$
$$s_p^2 = \frac{(25 - 1)(0.848) + (25 - 1)(1.303)}{25 + 25 - 2} = 1.076$$
$$t = \frac{(6.288 - 6.016) - 0}{\sqrt{1.076\left(\dfrac{1}{25} + \dfrac{1}{25}\right)}} = 0.93, \qquad \text{d.f.} = 25 + 25 - 2 = 48$$
• For $\alpha = 0.05$ the rejection region is $t < -t_{\alpha/2,\nu} = -t_{.025,48} = -2.009$ or $t > t_{\alpha/2,\nu} = t_{.025,48} = 2.009$.
• The test: Since $-2.009 < t = 0.93 < 2.009$, there is insufficient evidence to reject the null hypothesis.
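The pooled-variance test for Example 13.2 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `pooled_t` is a hypothetical helper name, the inputs are the summary statistics quoted above, and the critical value 2.009 is copied from the text instead of being computed.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (pooled variance, t statistic, d.f.) for the equal-variances t-test."""
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t, df

# Summary statistics quoted in Example 13.2
sp2, t_stat, df = pooled_t(6.288, 0.8478, 25, 6.016, 1.3031, 25)

T_CRIT = 2.009  # t_{.025,48} as quoted in the text (assumed, not computed here)
reject_h0 = abs(t_stat) > T_CRIT  # two-tail test, H1: mu1 - mu2 != 0

print(round(sp2, 3), round(t_stat, 2), df, reject_h0)
```

Because the two samples have equal sizes here, the pooled variance is simply the average of the two sample variances.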
[Figure: t distribution with rejection regions below -2.009 and above 2.009; the test statistic 0.93 lies between them.]

Example: Making an inference about $\mu_1 - \mu_2$
Excel output (Xm13-02)
t-Test: Two-Sample Assuming Equal Variances
                                Method A    Method B
Mean                            6.29        6.02
Variance                        0.8478      1.3031
Observations                    25          25
Pooled Variance                 1.08
Hypothesized Mean Difference    0
df                              48
t Stat                          0.93
P(T<=t) one-tail                0.1792
t Critical one-tail             1.6772
P(T<=t) two-tail                0.3584
t Critical two-tail             2.0106

-2.0106 < 0.93 < +2.0106 and .3584 > .05

• Conclusion: There is no evidence to infer at the 5% significance level that the two assembly methods differ in terms of assembly time.

A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.5896 = [-0.3176,\ 0.8616]$$
Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$.
Notice: zero is included in the confidence interval.

Checking the required conditions for the equal variances case (Example 13.2)
[Figure: histograms of assembly times for Design A and Design B.]
The data appear to be approximately normal.

13.4 Matched Pairs Experiment
• What is a matched pairs experiment?
• Why are matched pairs experiments needed?
• How do we deal with data produced in this way?
The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.

Example 13.3
– To investigate the job offers obtained by MBA graduates, a study focusing on salaries was conducted.
– In particular, the salaries offered to finance majors were compared to those offered to marketing majors.
– Random samples of 25 graduates in each discipline were selected, and the highest salary offer was recorded for each one. The data are stored in file Xm13-03.
– Can we infer that finance majors obtain higher salary offers than do marketing majors among MBAs?

• Solution
– Compare two populations of interval data.
– The parameter tested is $\mu_1 - \mu_2$, where $\mu_1$ is the mean of the highest salary offered to Finance MBAs and $\mu_2$ is the mean of the highest salary offered to Marketing MBAs.
– H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) > 0$

Highest salary offers (excerpt)
Finance     Marketing
61,228      73,361
51,836      36,956
20,620      63,627
73,356      71,069
84,186      40,203
...         ...

• Solution – continued
From the data we have: $\bar{x}_1 = 65{,}624$, $\bar{x}_2 = 60{,}423$, $s_1^2 = 360{,}433{,}294$, $s_2^2 = 262{,}228{,}559$
• Let us assume equal variances.

t-Test: Two-Sample Assuming Equal Variances
                                Finance      Marketing
Mean                            65624        60423
Variance                        360433294    262228559
Observations                    25           25
Pooled Variance                 311330926
Hypothesized Mean Difference    0
df                              48
t Stat                          1.04
P(T<=t) one-tail                0.1513
t Critical one-tail             1.6772
P(T<=t) two-tail                0.3026
t Critical two-tail             2.0106

There is insufficient evidence to conclude that Finance MBAs are offered higher salaries than Marketing MBAs.

The effect of a large sample variability
• Question
– The difference between the sample means is 65,624 – 60,423 = 5,201.
– So why could we not reject H0 in favor of H1 ($\mu_1 - \mu_2 > 0$)?
• Answer:
– $s_p^2$ is large (because the sample variances are large): $s_p^2 = 311{,}330{,}926$.
– A large variance reduces the value of the t statistic, and it becomes more difficult to reject H0.
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Reducing the variability
[Figure: the range of observations in sample A and the range of observations in sample B.]
The values each sample consists of might markedly vary...
[Figure: the range of the differences.]
...but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences.
The matched pairs experiment
• Since the difference of the means is equal to the mean of the differences, we can rewrite the hypotheses in terms of $\mu_D$ (the mean of the differences) rather than in terms of $\mu_1 - \mu_2$.
• This formulation has the benefit of a smaller variability.

Group 1    Group 2    Difference
10         12         -2
15         11         +4
Mean1 = 12.5    Mean2 = 11.5
Mean1 – Mean2 = 1
Mean of the differences = 1

The matched pairs experiment
• Example 13.4
– It was suspected that salary offers were affected by students' GPA (which caused $s_1^2$ and $s_2^2$ to increase).
– To reduce this variability, the following procedure was used:
• 25 ranges of GPAs were predetermined.
• Students from each major were randomly selected, one from each GPA range.
• The highest salary offer for each student was recorded.
– From the data presented, can we conclude that Finance majors are offered higher salaries?

The matched pairs hypothesis test
• Solution (by hand)
– The parameter tested is $\mu_D$ ($= \mu_1 - \mu_2$), where population 1 is Finance and population 2 is Marketing.
– The hypotheses:
H0: $\mu_D = 0$
H1: $\mu_D > 0$
– The rejection region is $t > t_{.05,25-1} = 1.711$.
– The t statistic:
$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}, \qquad \text{degrees of freedom} = n_D - 1$$

The matched pairs hypothesis test
• Solution
– From the data (Xm13-04) calculate:

GPA Group    Finance    Marketing    Difference
1            95171      89329        5842
2            88009      92705        -4696
3            98089      99205        -1116
4            106322     99003        7319
5            74566      74825        -259
6            87089      77038        10051
7            88664      78272        10392
8            71200      59462        11738
9            69367      51555        17812
10           82618      81591        1027
...          ...        ...          ...
Descriptive statistics of the differences
Mean                  5065
Standard Error        1329
Median                3285
Mode                  #N/A
Standard Deviation    6647
Sample Variance       44181217
Kurtosis              -0.6594
Skewness              0.3597
Range                 23533
Minimum               -5721
Maximum               17812
Sum                   126613
Count                 25

The matched pairs hypothesis test
• Solution
$\bar{x}_D = 5{,}065$, $s_D = 6{,}647$
– Calculate t:
$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}} = \frac{5065 - 0}{6647/\sqrt{25}} = 3.81$$

The matched pairs hypothesis test
Excel output (Xm13-04)
t-Test: Paired Two Sample for Means
                                Finance      Marketing
Mean                            65438        60374
Variance                        444981810    469441785
Observations                    25           25
Pearson Correlation             0.9520
Hypothesized Mean Difference    0
df                              24
t Stat                          3.81
P(T<=t) one-tail                0.0004
t Critical one-tail             1.7109
P(T<=t) two-tail                0.0009
t Critical two-tail             2.0639

3.81 > 1.7109 and .0004 < .05

The matched pairs hypothesis test
Conclusion: There is sufficient evidence to infer at the 5% significance level that the Finance MBAs' highest salary offer is, on average, higher than that of the Marketing MBAs.

The matched pairs mean difference estimation
Confidence interval estimator of $\mu_D$:
$$\bar{x}_D \pm t_{\alpha/2,\,n_D-1}\frac{s_D}{\sqrt{n_D}}$$
Example 13.5
The 95% confidence interval of the mean difference in Example 13.4 is
$$5065 \pm 2.064\,\frac{6647}{\sqrt{25}} = 5{,}065 \pm 2{,}744$$

The matched pairs mean difference estimation
Using Data Analysis Plus (Xm13-04): first calculate the differences from the Finance and Marketing columns, then run the confidence interval procedure in Data Analysis Plus.

t-Estimate: Mean
                      Difference
Mean                  5065
Standard Deviation    6647
LCL                   2321
UCL                   7808

Checking the required conditions for the paired observations case
• The validity of the results depends on the normality of the differences.
[Figure: frequency histogram of the differences.]

13.5 Inference about the ratio of two variances
• In this section we draw inference about the ratio of two population variances.
• This question is interesting because:
– Variances can be used to evaluate the consistency of processes.
– The relationship between population variances determines which of the equal-variances or unequal-variances t-test and estimator of the difference between means should be applied.

Parameter and Statistic
• The parameter to be tested is $\sigma_1^2/\sigma_2^2$.
• The statistic used is
$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$
• Sampling distribution of $s_1^2/s_2^2$
– The statistic $[s_1^2/\sigma_1^2] / [s_2^2/\sigma_2^2]$ follows the F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
– Our null hypothesis is always H0: $\sigma_1^2/\sigma_2^2 = 1$.
– Under this null hypothesis the F statistic becomes
$$F = \frac{s_1^2}{s_2^2}$$

Testing the ratio of two population variances
Example 13.6 (revisiting Example 13.1) (see Xm13-01)
In order to perform a test regarding average consumption of calories at people's lunch in relation to the inclusion of high-fiber cereal in their breakfast, the ratio of the two population variances has to be tested first. The calorie intake data are those of Example 13.1.
The hypotheses are:
H0: $\sigma_1^2/\sigma_2^2 = 1$
H1: $\sigma_1^2/\sigma_2^2 \neq 1$

Testing the ratio of two population variances
• Solving by hand
– The rejection region is $F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < 1/F_{\alpha/2,\nu_2,\nu_1}$:
$$F > F_{\alpha/2,\nu_1,\nu_2} = F_{.025,42,106} \approx F_{.025,40,120} = 1.61$$
$$F < \frac{1}{F_{\alpha/2,\nu_2,\nu_1}} = \frac{1}{F_{.025,106,42}} \approx \frac{1}{F_{.025,120,40}} = \frac{1}{1.72} = .58$$
– The F statistic value is $F = s_1^2/s_2^2 = .3845$.
– Conclusion: Because .3845 < .58 we reject the null hypothesis in favor of the alternative hypothesis, and conclude that there is sufficient evidence at the 5% significance level that the population variances differ.
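The hand calculation for Example 13.6 can be sketched in Python as below. It is only an illustration: `variance_ratio_test` is a hypothetical helper, and the two upper-tail critical values (1.61 and 1.72) are the approximate table values quoted above, passed in as arguments rather than computed.

```python
def variance_ratio_test(var1, var2, f_upper, f_upper_swapped):
    """Two-tail F-test of H0: sigma1^2 / sigma2^2 = 1.

    f_upper         -- F_{alpha/2, nu1, nu2}, upper critical value
    f_upper_swapped -- F_{alpha/2, nu2, nu1}; its reciprocal is the lower critical value
    """
    f_stat = var1 / var2
    lower_crit = 1 / f_upper_swapped
    reject = f_stat > f_upper or f_stat < lower_crit
    return f_stat, lower_crit, reject

# Sample variances from Example 13.1; critical values as quoted in the text
f_stat, lower_crit, reject_h0 = variance_ratio_test(4103, 10670, 1.61, 1.72)

print(round(f_stat, 4), round(lower_crit, 2), reject_h0)
```

Taking the reciprocal of the critical value with swapped degrees of freedom is what lets a table of upper-tail values serve for the lower tail as well.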
Testing the ratio of two population variances
Example 13.6 (revisiting Example 13.1) (see Xm13-01)
Excel output:
F-Test Two-Sample for Variances
                       Consumers    Nonconsumers
Mean                   604          633
Variance               4103         10670
Observations           43           107
df                     42           106
F                      0.3845
P(F<=f) one-tail       0.0004
F Critical one-tail    0.6371

Estimating the Ratio of Two Population Variances
• From the statistic $F = [s_1^2/\sigma_1^2] / [s_2^2/\sigma_2^2]$ we can isolate $\sigma_1^2/\sigma_2^2$ and build the following confidence interval:
$$\left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$$
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.

Estimating the Ratio of Two Population Variances
• Example 13.7
– Determine the 95% confidence interval estimate of the ratio of the two population variances in Example 13.1.
– Solution
• We find $F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} = 1.61$ (approximately) and $F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} = 1.72$ (approximately).
• LCL = $(s_1^2/s_2^2)[1/F_{\alpha/2,\nu_1,\nu_2}]$ = (4102.98/10,669.77)[1/1.61] = .2388
• UCL = $(s_1^2/s_2^2)[F_{\alpha/2,\nu_2,\nu_1}]$ = (4102.98/10,669.77)[1.72] = .6614

13.6 Inference about the difference between two population proportions
• In this section we deal with two populations whose data are nominal.
• For nominal data we compare the population proportions of the occurrence of a certain event.
• Examples
– Comparing the effectiveness of a new drug versus an older one
– Comparing market share before and after an advertising campaign
– Comparing defective rates between two machines

Parameter and Statistic
• Parameter
– When the data are nominal, we can only count the occurrences of a certain event in the two populations, and calculate proportions.
– The parameter is therefore $p_1 - p_2$.
• Statistic
– An unbiased estimator of $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$ (the difference between the sample proportions).
Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
• Two random samples are drawn from two populations.
• The number of successes in each sample is recorded.
• The sample proportions are computed.

                        Sample 1                     Sample 2
Sample size             $n_1$                        $n_2$
Number of successes     $x_1$                        $x_2$
Sample proportion       $\hat{p}_1 = x_1/n_1$        $\hat{p}_2 = x_2/n_2$

Sampling distribution of $\hat{p}_1 - \hat{p}_2$
• The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed if $n_1p_1$, $n_1(1 - p_1)$, $n_2p_2$, $n_2(1 - p_2)$ are all greater than or equal to 5.
• The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
• The variance of $\hat{p}_1 - \hat{p}_2$ is $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$.

The z-statistic
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$
Because $p_1$ and $p_2$ are unknown, the standard error must be estimated using the sample proportions. The method depends on the null hypothesis.

Testing $p_1 - p_2$
• There are two cases to consider:
Case 1: H0: $p_1 - p_2 = 0$
Calculate the pooled proportion
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Case 2: H0: $p_1 - p_2 = D$ (D is not equal to 0)
Do not pool the data:
$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$
Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$

Testing $p_1 - p_2$ (Case 1)
• Example 13.8
– The marketing manager needs to decide which of two new packaging designs to adopt, to help improve sales of his company's soap.
– A study is performed in two supermarkets:
• Brightly-colored packaging is distributed in supermarket 1.
• Simple packaging is distributed in supermarket 2.
– The first design is more expensive; therefore, to be financially viable it has to outsell the second design.

Testing $p_1 - p_2$ (Case 1)
• Summary of the experiment results
– Supermarket 1: 180 purchasers of Johnson Brothers soap out of a total of 904
– Supermarket 2: 155 purchasers of Johnson Brothers soap out of a total of 1,038
– Use a 5% significance level and perform a test to find which type of packaging to use.
Testing $p_1 - p_2$ (Case 1)
• Solution
– The problem objective is to compare the populations of sales of the two packaging designs.
– The data are nominal (Johnson Brothers or other soap).
Population 1: purchases at supermarket 1
Population 2: purchases at supermarket 2
– The hypotheses are
H0: $p_1 - p_2 = 0$
H1: $p_1 - p_2 > 0$
– We identify this application as Case 1.

Testing $p_1 - p_2$ (Case 1)
• Compute: Manually
– For a 5% significance level the rejection region is $z > z_\alpha = z_{.05} = 1.645$.
The sample proportions are
$$\hat{p}_1 = 180/904 = .1991 \quad \text{and} \quad \hat{p}_2 = 155/1{,}038 = .1493$$
The pooled proportion is
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{180 + 155}{904 + 1{,}038} = .1725$$
The z statistic becomes
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{.1991 - .1493}{\sqrt{.1725(1 - .1725)\left(\dfrac{1}{904} + \dfrac{1}{1{,}038}\right)}} = 2.90$$

Testing $p_1 - p_2$ (Case 1)
• Excel (Data Analysis Plus), Xm13-08
z-Test: Two Proportions
                           Supermarket 1    Supermarket 2
Sample Proportions         0.1991           0.1493
Observations               904              1038
Hypothesized Difference    0
z Stat                     2.90
P(Z<=z) one tail           0.0019
z Critical one-tail        1.6449
P(Z<=z) two-tail           0.0038
z Critical two-tail        1.96

Conclusion: There is sufficient evidence to conclude at the 5% significance level that the brightly-colored design will outsell the simple design.

Testing $p_1 - p_2$ (Case 2)
• Example 13.9 (Revisit Example 13.8)
– Management needs to decide which of two new packaging designs to adopt, to help improve sales of a certain soap.
– A study is performed in two supermarkets.
– For the brightly-colored design to be financially viable it has to outsell the simple design by at least 3%.

Testing $p_1 - p_2$ (Case 2)
• Summary of the experiment results
– Supermarket 1: 180 purchasers of Johnson Brothers' soap out of a total of 904
– Supermarket 2: 155 purchasers of Johnson Brothers' soap out of a total of 1,038
– Use a 5% significance level and perform a test to find which type of packaging to use.
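Before the Case 2 solution, the Case 1 computation of Example 13.8 can be sketched in Python using only the standard library. The helper name `two_proportion_z` and its `hypothesized_diff` parameter are assumptions made for this illustration; pooling is applied only when the hypothesized difference is zero, following the two cases described earlier.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2, hypothesized_diff=0.0):
    """Return (z statistic, upper-tail p-value) for H1: p1 - p2 > hypothesized_diff."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    if hypothesized_diff == 0:
        # Case 1: pool the samples to estimate the common proportion
        pooled = (x1 + x2) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    else:
        # Case 2: do not pool; use each sample proportion separately
        se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = (p1_hat - p2_hat - hypothesized_diff) / se
    p_upper = 1 - NormalDist().cdf(z)
    return z, p_upper

# Example 13.8: 180 of 904 purchasers vs. 155 of 1,038 purchasers
z_stat, p_value = two_proportion_z(180, 904, 155, 1038)
z_crit = NormalDist().inv_cdf(0.95)  # upper 5% point of the standard normal

print(round(z_stat, 2), round(z_crit, 4), z_stat > z_crit)
```

Passing a nonzero `hypothesized_diff` switches the same function to the unpooled standard error used when H0 specifies a difference D other than zero.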
Testing $p_1 - p_2$ (Case 2)
• Solution
– The hypotheses to test are
H0: $p_1 - p_2 = .03$
H1: $p_1 - p_2 > .03$
– We identify this application as Case 2 (the hypothesized difference is not equal to zero).

Testing $p_1 - p_2$ (Case 2)
• Compute: Manually
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} = \frac{\left(\dfrac{180}{904} - \dfrac{155}{1{,}038}\right) - .03}{\sqrt{\dfrac{.1991(1 - .1991)}{904} + \dfrac{.1493(1 - .1493)}{1{,}038}}} = 1.15$$
The rejection region is $z > z_\alpha = z_{.05} = 1.645$.
Conclusion: Since 1.15 < 1.645, do not reject the null hypothesis. There is insufficient evidence to infer that the brightly-colored design will outsell the simple design by 3% or more.

Testing $p_1 - p_2$ (Case 2)
• Using Excel (Data Analysis Plus), Xm13-08
z-Test: Two Proportions
                           Supermarket 1    Supermarket 2
Sample Proportions         0.1991           0.1493
Observations               904              1038
Hypothesized Difference    0.03
z Stat                     1.14
P(Z<=z) one tail           0.1261
z Critical one-tail        1.6449
P(Z<=z) two-tail           0.2522
z Critical two-tail        1.96

Estimating $p_1 - p_2$
• Estimating the cost of life saved
– Two drugs are used to treat heart attack victims:
• Streptokinase (available since 1959, costs $460)
• t-PA (genetically engineered, costs $2900).
– The maker of t-PA claims that its drug outperforms Streptokinase.
– An experiment was conducted in 15 countries.
• 20,500 patients were given t-PA.
• 20,500 patients were given Streptokinase.
• The number of deaths from heart attacks was recorded.

Estimating $p_1 - p_2$
• Experiment results
– A total of 1497 patients treated with Streptokinase died.
– A total of 1292 patients treated with t-PA died.
• Estimate the cost per life saved by using t-PA instead of Streptokinase.

Estimating $p_1 - p_2$
• Solution
– The problem objective: Compare the outcomes of two treatments.
– The data are nominal (a patient lived or died).
– The parameter to be estimated is $p_1 - p_2$.
• $p_1$ = death rate with Streptokinase
• $p_2$ = death rate with t-PA

Estimating $p_1 - p_2$
• Compute: Manually
– Sample proportions:
$$\hat{p}_1 = \frac{1497}{20500} = .0730, \qquad \hat{p}_2 = \frac{1292}{20500} = .0630$$
– The 95% confidence interval estimate is
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (.0730 - .0630) \pm 1.96\sqrt{\frac{.0730(1 - .0730)}{20500} + \frac{.0630(1 - .0630)}{20500}} = .0100 \pm .0049$$
LCL = .0051, UCL = .0149

Estimating $p_1 - p_2$
• Interpretation
– We estimate that between .51% and 1.49% more heart attack victims will survive because of the use of t-PA.
– The difference in cost per treatment is 2900 – 460 = $2440.
– The cost per life saved by switching to t-PA is estimated to be between 2440/.0149 = $163,758 and 2440/.0051 = $478,431.
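The interval and the cost-per-life-saved range can be checked with a short Python sketch. It assumes, as in the computation above, that sample 1 is the Streptokinase group (1497 deaths) and sample 2 the t-PA group (1292 deaths); the function name `proportion_diff_ci` is hypothetical. Because the code does not round the limits to four decimals first, its dollar figures differ slightly from the slide's.

```python
import math
from statistics import NormalDist

def proportion_diff_ci(x1, n1, x2, n2, confidence=0.95):
    """Return (LCL, UCL) of the unpooled confidence interval for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin

# Deaths among 20,500 patients per drug (Streptokinase first, then t-PA)
lcl, ucl = proportion_diff_ci(1497, 20500, 1292, 20500)

extra_cost = 2900 - 460  # additional cost per patient treated with t-PA, in dollars
# Each life saved requires treating 1 / (p1 - p2) patients with the costlier drug
cost_low, cost_high = extra_cost / ucl, extra_cost / lcl

print(round(lcl, 4), round(ucl, 4))
print(round(cost_low), round(cost_high))
```

Dividing the extra cost by the difference in death rates is what converts a per-patient price difference into a cost per life saved, so the larger limit of the interval yields the lower cost bound.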