Comparing Populations
Proportions and means

Most studies will have more than one population.

Example: The Salk vaccine trial, 1954
A large study to determine whether the Salk vaccine was effective in reducing the incidence of polio.
Two populations:
1. Individuals vaccinated with the Salk vaccine
2. Individuals vaccinated with a placebo
This was a double-blind study: neither the individuals vaccinated nor the MDs treating the cases knew who received the vaccine and who received the placebo.

When there is more than one population, one will be interested in making comparisons. Comparisons are sometimes made through differences, sometimes through ratios.

An important fact: the sampling distribution of differences of Normal random variables
If $X$ and $Y$ denote two independent normal random variables, then $D = X - Y$ is normal with
mean $\mu_D = \mu_X - \mu_Y$
standard deviation $\sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}$
This fact allows us to determine the sampling distribution of differences.

Comparing proportions
Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of "success" in population 1.
• Let $p_2$ denote the probability (proportion) of "success" in population 2.
• The objective is to compare the two population proportions.

Consider the statistic:
$$D = \hat p_1 - \hat p_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$
For large $n_1$ and $n_2$ this statistic has approximately a normal distribution with (using the important fact)
$$\mu_D = \mu_{\hat p_1} - \mu_{\hat p_2} = p_1 - p_2$$
$$\sigma_D = \sqrt{\sigma_{\hat p_1}^2 + \sigma_{\hat p_2}^2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \approx \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$
Thus
$$z = \frac{D - \mu_D}{\sigma_D} = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}} \approx \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat p_1(1-\hat p_1)}{n_1} + \dfrac{\hat p_2(1-\hat p_2)}{n_2}}}$$
has (approximately) a standard normal distribution.

We want to test either:
1. $H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$, or
2. $H_0: p_1 \le p_2$ vs $H_A: p_1 > p_2$, or
3. $H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$

If $p_1 = p_2$ ($= p$, say) then $p_1 - p_2 = 0$ and the test statistic
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{p(1-p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \approx \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
has a standard normal distribution, where
$$\hat p = \frac{x_1 + x_2}{n_1 + n_2}$$
is an estimate of the common value of $p_1$ and $p_2$.
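The pooled statistic above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `two_proportion_z` is hypothetical (it is not part of the slides), and the counts passed to it are the ones quoted in the pipe-smoking example of this section.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (hypothetical helper name)."""
    p1_hat = x1 / n1                      # sample proportion, population 1
    p2_hat = x2 / n2                      # sample proportion, population 2
    p_pool = (x1 + x2) / (n1 + n2)        # estimate of the common p under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Counts from the pipe-smoking example: 117 of 1067 nonsmokers died,
# 54 of 402 pipe smokers died.
z = two_proportion_z(117, 1067, 54, 402)

# Lower-tail p-value, appropriate for the one-sided alternative HA: p1 < p2.
p_lower = NormalDist().cdf(z)
print(round(z, 3), round(p_lower, 4))
```

Comparing `z` with $-z_\alpha$ (or `p_lower` with $\alpha$) gives the one-sided test; for a two-sided alternative one would double the smaller tail probability instead.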
Thus, for comparing two binomial probabilities $p_1$ and $p_2$:

The test statistic
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $\hat p_1 = \dfrac{x_1}{n_1}$, $\hat p_2 = \dfrac{x_2}{n_2}$ and $\hat p = \dfrac{x_1 + x_2}{n_1 + n_2}$.

The Critical Region
Alternative hypothesis $H_A$      Critical region
$H_A: p_1 \ne p_2$                $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: p_1 > p_2$                  $z > z_\alpha$
$H_A: p_1 < p_2$                  $z < -z_\alpha$

Example
• In a national study to determine whether there was an increase in mortality due to pipe smoking, a random sample of $n_1 = 1067$ male nonsmoking pensioners was observed for a five-year period.
• In addition, a sample of $n_2 = 402$ male pensioners who had smoked a pipe for more than six years was observed for the same five-year period.
• At the end of the five-year period, $x_1 = 117$ of the nonsmoking pensioners had died, while $x_2 = 54$ of the pipe-smoking pensioners had died.
• Is the mortality rate for pipe smokers higher than that for nonsmokers?

We want to test: $H_0: p_1 \ge p_2$ vs $H_A: p_1 < p_2$

Note:
$\hat p_1 = \dfrac{x_1}{n_1} = \dfrac{117}{1067} = 0.1097$ (nonsmokers)
$\hat p_2 = \dfrac{x_2}{n_2} = \dfrac{54}{402} = 0.1343$ (pipe smokers)
$\hat p = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{117 + 54}{1067 + 402} = \dfrac{171}{1469} = 0.1164$ (combined)

The test statistic:
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.1097 - 0.1343}{\sqrt{0.1164(1 - 0.1164)\left(\dfrac{1}{1067} + \dfrac{1}{402}\right)}} = -1.315$$
We reject $H_0$ if $z < -z_\alpha = -z_{0.05} = -1.645$. This is not true, hence we accept $H_0$.
Conclusion: There is not a significant ($\alpha = 0.05$) increase in the mortality rate due to pipe smoking.

Estimating a difference in proportions using confidence intervals
Situation
• We have two populations (1 and 2).
• Let $p_1$ denote the probability (proportion) of "success" in population 1.
• Let $p_2$ denote the probability (proportion) of "success" in population 2.
• The objective is to estimate the difference in the two population proportions, $d = p_1 - p_2$.
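Confidence Interval for $d = p_1 - p_2$
The $100P\% = 100(1-\alpha)\%$ confidence interval is
$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\,\sigma_{\hat p_1 - \hat p_2} \approx \hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$

Example
• Estimating the increase in the mortality rate for pipe smokers over that for nonsmokers, $d = p_2 - p_1$:
$$\hat p_2 - \hat p_1 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}} = 0.1343 - 0.1097 \pm 1.960\sqrt{\frac{0.1097(1-0.1097)}{1067} + \frac{0.1343(1-0.1343)}{402}} = 0.0247 \pm 0.0382$$
that is, $-0.0136$ to $0.0629$, or $-1.36\%$ to $6.29\%$. The interval contains 0, in agreement with the non-significant test result.

Comparing Proportions: Summary
The test for a difference in proportions (the test statistic):
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Estimating the difference in proportions by a confidence interval:
$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$

Comparing Means
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.

We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$, or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$, or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$

Consider the test statistic:
$$z = \frac{\bar x - \bar y}{\sigma_{\bar x - \bar y}} = \frac{\bar x - \bar y}{\sqrt{\sigma_{\bar x}^2 + \sigma_{\bar y}^2}} = \frac{\bar x - \bar y}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
If $H_0: \mu_1 = \mu_2$ is true:
• $z$ will have a standard Normal distribution.
• This will also be true for the approximation (obtained by replacing $\sigma_1$ by $s_x$ and $\sigma_2$ by $s_y$) if the sample sizes $n$ and $m$ are large (greater than 30).

Note:
$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar x)^2}{n-1}}, \qquad \bar y = \frac{1}{m}\sum_{i=1}^{m} y_i, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m}(y_i - \bar y)^2}{m-1}}$$

Alternative hypothesis $H_A$      Critical region
$H_A: \mu_1 \ne \mu_2$            $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: \mu_1 > \mu_2$              $z > z_\alpha$
$H_A: \mu_1 < \mu_2$              $z < -z_\alpha$

Example
• A study was interested in determining whether an exercise program had some effect on the reduction of blood pressure in subjects with abnormally high blood pressure.
• For this purpose a sample of $n = 500$ patients with abnormally high blood pressure were required to adhere to the exercise regime.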
• A second sample of $m = 400$ patients with abnormally high blood pressure were not required to adhere to the exercise regime.
• After a period of one year the reduction in blood pressure was measured for each patient in the study.

We want to test:
$H_0: \mu_1 \le \mu_2$ (the exercise group did not have a higher average reduction in blood pressure)
vs
$H_A: \mu_1 > \mu_2$ (the exercise group did have a higher average reduction in blood pressure)

The test statistic:
$$z = \frac{\bar x - \bar y}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
Suppose the data have been collected and:
$\bar x = 10.67$, $s_x = 3.895$ (exercise group, $n = 500$)
$\bar y = 7.83$, $s_y = 4.224$ (non-exercise group, $m = 400$)

The test statistic:
$$z = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}} = \frac{10.67 - 7.83}{\sqrt{\dfrac{3.895^2}{500} + \dfrac{4.224^2}{400}}} = \frac{2.84}{0.273765} = 10.4$$
We reject $H_0$ if $z > z_\alpha = z_{0.05} = 1.645$. This is true, hence we reject $H_0$.
Conclusion: There is a significant ($\alpha = 0.05$) effect due to the exercise regime on the reduction in blood pressure.

Estimating a difference in means using confidence intervals
Situation
• We have two populations (1 and 2).
• Let $\mu_1$ denote the mean of population 1.
• Let $\mu_2$ denote the mean of population 2.
• The objective is to estimate the difference in the two population means, $d = \mu_1 - \mu_2$.

Confidence Interval for $d = \mu_1 - \mu_2$
$$\hat\mu_1 - \hat\mu_2 \pm z_{\alpha/2}\,\sigma_{\hat\mu_1 - \hat\mu_2} \approx \bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$

Example
• Estimating the increase in the average reduction in blood pressure due to the exercise regime, $d = \mu_1 - \mu_2$:
$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}} = 10.67 - 7.83 \pm 1.960\sqrt{\frac{3.895^2}{500} + \frac{4.224^2}{400}} = 2.84 \pm 1.96(0.273765) = 2.84 \pm 0.537$$
that is, 2.303 to 3.377.

Comparing Means: small samples
The t test

Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from normal population 1.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a sample from normal population 2.
• The objective is to compare the two population means.

We want to test either:
1. $H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \ne \mu_2$, or
2. $H_0: \mu_1 \le \mu_2$ vs $H_A: \mu_1 > \mu_2$, or
3. $H_0: \mu_1 \ge \mu_2$ vs $H_A: \mu_1 < \mu_2$

Consider the test statistic:
$$z = \frac{\bar x - \bar y}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
If the sample sizes ($m$ and $n$) are large, the statistic
$$t = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
will have approximately a standard normal distribution. This will not be the case if the sample sizes ($m$ and $n$) are small.

The t test for comparing means: small samples (equal variances)
Situation
• We have two normal populations (1 and 2).
• Let $\mu_1$ and $\sigma$ denote the mean and standard deviation of population 1.
• Let $\mu_2$ and $\sigma$ denote the mean and standard deviation of population 2.
• Note: we assume that the standard deviation for each population is the same, $\sigma_1 = \sigma_2 = \sigma$.

Let
$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar x)^2}{n-1}}, \qquad \bar y = \frac{1}{m}\sum_{i=1}^{m} y_i, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{m}(y_i - \bar y)^2}{m-1}}$$

The pooled estimate of $\sigma$
Note: both $s_x$ and $s_y$ are estimators of $\sigma$. These can be combined to form a single estimator of $\sigma$, $s_{Pooled}$:
$$s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}}$$
The test statistic:
$$t = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_{Pooled}^2}{n} + \dfrac{s_{Pooled}^2}{m}}} = \frac{\bar x - \bar y}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$
If $\mu_1 = \mu_2$ this statistic has a t distribution with $n + m - 2$ degrees of freedom.

Alternative hypothesis $H_A$      Critical region
$H_A: \mu_1 \ne \mu_2$            $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu_1 > \mu_2$              $t > t_\alpha$
$H_A: \mu_1 < \mu_2$              $t < -t_\alpha$
$t_{\alpha/2}$ and $t_\alpha$ are critical points under the t distribution with $n + m - 2$ degrees of freedom.

Example
• A study was interested in determining whether administration of a drug reduces cancerous tumour size.
• For this purpose $n + m = 9$ test animals are implanted with a cancerous tumour.
• $n = 3$ are selected at random and administered the drug.
• The remaining $m = 6$ are left untreated.
• Final tumour sizes are measured at the end of the test period.

We want to test:
$H_0: \mu_1 \ge \mu_2$ (the treated group did not have a lower average final tumour size)
vs
$H_A: \mu_1 < \mu_2$ (the treated group did have a lower average final tumour size)
The test statistic:
$$t = \frac{\bar x - \bar y}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$
Suppose the data have been collected and:

Drug treated    Untreated
1.89            2.08
1.79            1.28
1.29            1.75
                1.90
                2.32
                2.16

$\bar x = \dfrac{1}{n}\sum x_i = 1.657$, $s_x = \sqrt{\dfrac{\sum (x_i - \bar x)^2}{n-1}} = 0.3215$
$\bar y = \dfrac{1}{m}\sum y_i = 1.915$, $s_y = \sqrt{\dfrac{\sum (y_i - \bar y)^2}{m-1}} = 0.3693$

The test statistic:
$$s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}} = \sqrt{\frac{2(0.3215)^2 + 5(0.3693)^2}{7}} = 0.3563$$
$$t = \frac{1.657 - 1.915}{0.3563\sqrt{\dfrac{1}{3} + \dfrac{1}{6}}} = \frac{-0.258}{0.252} = -1.025$$
We reject $H_0$ if $t < -t_\alpha = -t_{0.05} = -1.895$, with d.f. $= n + m - 2 = 7$. This is not true, hence we accept $H_0$.
Conclusion: The drug treatment does not result in a significantly ($\alpha = 0.05$) smaller final tumour size.

Confidence intervals for the difference in two means of normal populations (small sample sizes, equal variances)
$(1-\alpha)100\%$ confidence limits for $\mu_1 - \mu_2$:
$$\bar x - \bar y \pm t_{\alpha/2}\, s_{Pooled}\sqrt{\frac{1}{n} + \frac{1}{m}}$$
where
$$s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}} \quad\text{and}\quad df = n + m - 2$$

Tests and confidence intervals for the difference in two means of normal populations (small sample sizes, unequal variances)
Consider the statistic
$$t = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
For large sample sizes this statistic has a standard normal distribution. For small sample sizes this statistic has been shown to have approximately a t distribution with
$$df = \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{s_x^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{s_y^2}{m}\right)^2}$$

The approximate test for comparing two means of Normal populations (unequal variances)
Test statistic: $t = \dfrac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$, with $df$ as given above.
Null hypothesis: $H_0: \mu_1 = \mu_2$

Alternative hypothesis            Critical region
$H_A: \mu_1 \ne \mu_2$            $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu_1 > \mu_2$              $t > t_\alpha$
$H_A: \mu_1 < \mu_2$              $t < -t_\alpha$

Confidence intervals for the difference in two means of normal populations (small samples, unequal variances)
$(1-\alpha)100\%$ confidence limits for $\mu_1 - \mu_2$:
$$\bar x - \bar y \pm t_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}} \quad\text{with}\quad df = \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{s_x^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{s_y^2}{m}\right)^2}$$

Testing for the equality of variances
The F test
Situation:
• Let $x_1, x_2, x_3, \dots, x_n$ denote a sample from a Normal distribution with mean $\mu_x$ and standard deviation $\sigma_x$.
• Let $y_1, y_2, y_3, \dots, y_m$ denote a second independent sample from a Normal distribution with mean $\mu_y$ and standard deviation $\sigma_y$.
• We want to test for the equality of the two variances $\sigma_x^2$ and $\sigma_y^2$, i.e.:
Test $H_0: \sigma_x^2 = \sigma_y^2$ against $H_A: \sigma_x^2 \ne \sigma_y^2$ (two-sided alternative), or
Test $H_0: \sigma_x^2 \le \sigma_y^2$ against $H_A: \sigma_x^2 > \sigma_y^2$ (one-sided alternative), or
Test $H_0: \sigma_x^2 \ge \sigma_y^2$ against $H_A: \sigma_x^2 < \sigma_y^2$ (one-sided alternative)

The test statistic (F)
$$F = \frac{s_x^2}{s_y^2} \quad\text{or}\quad \frac{1}{F} = \frac{s_y^2}{s_x^2}$$

The sampling distribution of the test statistic
If the null hypothesis ($H_0$) is true then the sampling distribution of $F$ is called the F distribution with $\nu_1 = n - 1$ degrees of freedom in the numerator and $\nu_2 = m - 1$ degrees of freedom in the denominator.

[Figure: density curve of the F distribution, $F(\nu_1, \nu_2)$, with $\nu_1 = n - 1$ degrees of freedom in the numerator and $\nu_2 = m - 1$ in the denominator]

Note: If $F = \dfrac{s_x^2}{s_y^2}$ has an F distribution with $\nu_1 = n - 1$ degrees of freedom in the numerator and $\nu_2 = m - 1$ in the denominator, then $\dfrac{1}{F} = \dfrac{s_y^2}{s_x^2}$ has an F distribution with $\nu_1 = m - 1$ degrees of freedom in the numerator and $\nu_2 = n - 1$ in the denominator.

Critical region for the test $H_0: \sigma_x^2 = \sigma_y^2$ against $H_A: \sigma_x^2 \ne \sigma_y^2$ (two-sided alternative):
Reject $H_0$ if
$$F = \frac{s_x^2}{s_y^2} > F_{\alpha/2}(n-1, m-1) \quad\text{or}\quad \frac{1}{F} = \frac{s_y^2}{s_x^2} > F_{\alpha/2}(m-1, n-1)$$
Critical region for the test $H_0: \sigma_x^2 \le \sigma_y^2$ against $H_A: \sigma_x^2 > \sigma_y^2$ (one-sided alternative):
Reject $H_0$ if
$$F = \frac{s_x^2}{s_y^2} > F_{\alpha}(n-1, m-1)$$

Example
• A study was interested in determining whether administration of a drug reduces cancerous tumour size.
• For this purpose $n + m = 9$ test animals are implanted with a cancerous tumour.
• $n = 3$ are selected at random and administered the drug.
• The remaining $m = 6$ are left untreated.
• Final tumour sizes are measured at the end of the test period.

Suppose the data have been collected and:

Drug treated    Untreated
1.89            2.08
1.79            1.28
1.29            1.75
                1.90
                2.32
                2.16

$\bar x = 1.657$, $s_x = 0.3215$; $\bar y = 1.915$, $s_y = 0.3693$

We want to test: $H_0: \sigma_x^2 = \sigma_y^2$ against $H_A: \sigma_x^2 \ne \sigma_y^2$
($H_0$ is assumed for the t test for comparing the means.)

Using $\alpha = 0.05$ we will reject $H_0$ if
$$F = \frac{s_x^2}{s_y^2} > F_{0.025}(2, 5) = 8.43 \quad\text{or}\quad \frac{1}{F} = \frac{s_y^2}{s_x^2} > F_{0.025}(5, 2) = 39.30$$
(The values 5.79 and 19.30 are the upper 5% points $F_{0.05}(2, 5)$ and $F_{0.05}(5, 2)$; using them would correspond to a two-sided test at $\alpha = 0.10$. The conclusion is the same either way.)

Test statistic:
$$F = \frac{0.3215^2}{0.3693^2} = \frac{0.1033}{0.1364} = 0.76 \quad\text{and}\quad \frac{1}{F} = \frac{0.3693^2}{0.3215^2} = \frac{0.1364}{0.1033} = 1.32$$
Neither value exceeds its critical value. Therefore we accept $H_0: \sigma_x^2 = \sigma_y^2$.

The paired t-test
An example of improved experimental design
• Often we are interested in comparing the effect of two (or more) treatments on some variable.
Examples:
1. The effect of two diets on weight loss.
2. The effect of two drugs on the drop in cholesterol levels.
3. The effect of two methods of teaching on math proficiency.
• One possible design is to randomly divide the available subjects into two groups.
• The first group will receive treatment 1.
• The second group will receive treatment 2.
We then collect data on the two groups:
1. Let $x_1, x_2, x_3, \dots, x_n$ denote the data for treatment 1.
2. Let $y_1, y_2, y_3, \dots, y_m$ denote the data for treatment 2.
This design is called the independent sample design. To test for the equality of treatment means we use the two-sample t test.

The test statistic:
$$t = \frac{\bar x - \bar y}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$
Alternative hypothesis $H_A$      Critical region
$H_A: \mu_1 \ne \mu_2$            $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu_1 > \mu_2$              $t > t_\alpha$
$H_A: \mu_1 < \mu_2$              $t < -t_\alpha$
d.f. $= n + m - 2$

The matched pair experimental design (the paired sample experiment)
Prior to assigning the treatments, the subjects are grouped into pairs of similar subjects.
Suppose that there are $n$ such pairs (a total of $2n = n + n$ subjects or cases). The two treatments are then randomly assigned within each pair: one member of a pair will receive treatment 1, while the other receives treatment 2.

The data collected are as follows:
$(x_1, y_1), (x_2, y_2), (x_3, y_3), \dots, (x_n, y_n)$
$x_i$ = the measurement of the response for the subject in pair $i$ that received treatment 1.
$y_i$ = the measurement of the response for the subject in pair $i$ that received treatment 2.

Let $d_i = y_i - x_i$. Then $d_1, d_2, d_3, \dots, d_n$ is a sample from a normal distribution with mean $\mu_d = \mu_2 - \mu_1$ and standard deviation
$$\sigma_d = \sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho_{xy}\sigma_x\sigma_y}$$
Note: if the $x$ and $y$ measurements are positively correlated (this will be true if the cases in each pair are matched effectively) then $\sigma_d$ will be small.

Testing $H_0: \mu_1 = \mu_2$ is equivalent to testing $H_0: \mu_d = 0$ (we have converted the two-sample problem into a single-sample problem).

The test statistic is the single-sample t statistic computed on the differences $d_1, d_2, d_3, \dots, d_n$, namely
$$t_d = \frac{\bar d - 0}{s_d / \sqrt{n}}, \qquad df = n - 1$$
where $\bar d$ is the mean of the $d_i$'s and $s_d$ is the standard deviation of the $d_i$'s.

Example
We are interested in comparing the effectiveness of two methods for reducing high cholesterol.
The methods:
1. Use of a drug.
2. Control of diet.
The $2n = 8$ subjects were paired into 4 matched pairs. In each matched pair one subject was given the drug treatment and the other subject was given the diet control treatment. Assignment of treatments was random.
The data: reduction in cholesterol after a 6-month period, with differences $d_i$ = drug treatment minus diet control

Pair    Drug treatment    Diet control treatment    $d_i$
1       30.3              25.7                       4.6
2       10.2               9.4                       0.8
3       22.3              24.6                      -2.3
4       15.0               8.9                       6.1

$\bar d = 2.3$, $s_d = 3.792$
$$t_d = \frac{\bar d - 0}{s_d/\sqrt{n}} = \frac{2.3}{3.792/\sqrt{4}} = 1.213$$
$t_{0.025} = 3.182$ for $df = n - 1 = 3$. Hence we accept $H_0$.

Example 2
In this example the researcher is interested in the effect of an antidepressant in reducing depression. Subjects were given a psychological test measuring depression (on a scale of 0 to 100) at the beginning of the study (Pre-score) and after a period of one month on the antidepressant (Post-score). Did the drug have any effect on reducing depression?

Table: Pre-score ($x_i$), Post-score ($y_i$), difference ($d_i$ = Pre minus Post)

Subject    Pre     Post    $d_i$
1          73.7    63.9      9.8
2          61.1    60.7      0.4
3          76.5    72.7      3.8
4          64.5    50.7     13.8
5          76.9    67.2      9.7
6          82.4    66.9     15.5
7          71.1    62.0      9.1
8          61.1    44.1     17.0
9          89.5    90.5     -1.0
10         59.6    56.0      3.6
11         58.6    69.4    -10.8
12         89.3    70.8     18.5

$$\bar d = \frac{\sum d_i}{n} = 7.450, \qquad s_d = \sqrt{\frac{\sum (d_i - \bar d)^2}{n-1}} = 8.603, \qquad t = \frac{\bar d}{s_d/\sqrt{n}} = 3.00$$
$t_{0.05} = 1.796$ for $df = 11$, thus $H_0$ is rejected.

Comments
• This last example is a matched pair experiment that occurs frequently.
• You have two observations on the same subject.
• One observation is taken under one condition or treatment (the Pre-score), the other under a second condition (the Post-score, after treatment).
• The subject is his or her own matched twin.
• This design is sometimes called a Repeated Measures design.

Example 3
• In this example, one is interested in determining whether a new method of mathematics instruction is an improvement over the current method.
• To determine this, 20 grade 4 students were selected.
• They were divided into $n = 10$ matched pairs.
• The students were matched relative to ability.
• One member of each matched pair was instructed using the new method, the other member using the current method.
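• All students were tested at the end of the instruction period.

The data

Pair    New ($x_i$)    Current ($y_i$)    $d_i = x_i - y_i$
1       90             84                   6
2       75             67                   8
3       90             90                   0
4       88             95                  -7
5       55             40                  15
6       67             68                  -1
7       94             85                   9
8       75             67                   8
9       88             86                   2
10      87             81                   6

$$\bar d = 4.60, \qquad s_d = 6.2218, \qquad t = \frac{\bar d - 0}{s_d/\sqrt{n}} = 2.338$$
$t_{0.05} = 1.833$ and $t_{0.01} = 2.821$ for d.f. $= n - 1 = 9$. Since $1.833 < 2.338 < 2.821$, the improvement is significant at $\alpha = 0.05$ but not at $\alpha = 0.01$.

Summary of Tests

One Sample Tests

Situation: Sample from the Normal distribution with unknown mean and known variance (testing $\mu$)
Test statistic: $z = \dfrac{\sqrt{n}\,(\bar x - \mu_0)}{\sigma}$
$H_0: \mu = \mu_0$
$H_A: \mu \ne \mu_0$      $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: \mu > \mu_0$        $z > z_\alpha$
$H_A: \mu < \mu_0$        $z < -z_\alpha$

Situation: Sample from the Normal distribution with unknown mean and unknown variance (testing $\mu$)
Test statistic: $t = \dfrac{\sqrt{n}\,(\bar x - \mu_0)}{s}$
$H_0: \mu = \mu_0$
$H_A: \mu \ne \mu_0$      $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
$H_A: \mu > \mu_0$        $t > t_\alpha$
$H_A: \mu < \mu_0$        $t < -t_\alpha$

Situation: Testing of a binomial probability
Test statistic: $z = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}$
$H_0: p = p_0$
$H_A: p \ne p_0$          $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: p > p_0$            $z > z_\alpha$
$H_A: p < p_0$            $z < -z_\alpha$

Situation: Sample from the Normal distribution with unknown mean and unknown variance (testing $\sigma$)
Test statistic: $U = \dfrac{(n-1)s^2}{\sigma_0^2}$
$H_0: \sigma = \sigma_0$
$H_A: \sigma \ne \sigma_0$    $U < \chi^2_{1-\alpha/2}(n-1)$ or $U > \chi^2_{\alpha/2}(n-1)$
$H_A: \sigma > \sigma_0$      $U > \chi^2_{\alpha}(n-1)$
$H_A: \sigma < \sigma_0$      $U < \chi^2_{1-\alpha}(n-1)$

Two Sample Tests

Situation: Two independent samples from the Normal distribution with unknown means and known variances (testing $\mu_1 - \mu_2$)
Test statistic: $z = \dfrac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 \ne \mu_2$    $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: \mu_1 > \mu_2$      $z > z_\alpha$
$H_A: \mu_1 < \mu_2$      $z < -z_\alpha$

Situation: Two independent samples from the Normal distribution with unknown means and unknown but equal variances (testing $\mu_1 - \mu_2$)
Test statistic: $t = \dfrac{\bar x_1 - \bar x_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s_p = \sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 \ne \mu_2$    $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, $df = n_1 + n_2 - 2$
$H_A: \mu_1 > \mu_2$      $t > t_\alpha$, $df = n_1 + n_2 - 2$
$H_A: \mu_1 < \mu_2$      $t < -t_\alpha$, $df = n_1 + n_2 - 2$

Situation: Testing the difference between two binomial probabilities, $p_1 - p_2$
Test statistic: $z = \dfrac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $\hat p = \dfrac{x_1 + x_2}{n_1 + n_2}$
$H_0: p_1 = p_2$
$H_A: p_1 \ne p_2$        $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
$H_A: p_1 > p_2$          $z > z_\alpha$
$H_A: p_1 < p_2$          $z < -z_\alpha$

Two Sample Tests (continued)

Situation: Two independent Normal samples with unknown means and variances (unequal), testing the means
Test statistic: $t = \dfrac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, with $df^* = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1-1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2-1}\left(\dfrac{s_2^2}{n_2}\right)^2}$
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 \ne \mu_2$    $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, $df = df^*$
$H_A: \mu_1 > \mu_2$      $t > t_\alpha$, $df = df^*$
$H_A: \mu_1 < \mu_2$      $t < -t_\alpha$, $df = df^*$

Situation: Two independent Normal samples with unknown means and variances (unequal), testing the variances
Test statistic: $F = \dfrac{s_1^2}{s_2^2}$ or $\dfrac{1}{F} = \dfrac{s_2^2}{s_1^2}$
$H_0: \sigma_1 = \sigma_2$
$H_A: \sigma_1 \ne \sigma_2$    $F > F_{\alpha/2}(n_1-1, n_2-1)$ or $1/F > F_{\alpha/2}(n_2-1, n_1-1)$
$H_A: \sigma_1 > \sigma_2$      $F > F_{\alpha}(n_1-1, n_2-1)$
$H_A: \sigma_1 < \sigma_2$      $1/F > F_{\alpha}(n_2-1, n_1-1)$

The paired t test

Situation: $n$ matched pairs of subjects are treated with two treatments. $d_i = x_i - y_i$ has mean $\mu_d = \mu_1 - \mu_2$.
Test statistic: $t = \dfrac{\bar d}{s_d/\sqrt{n}}$
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 \ne \mu_2$    $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, $df = n - 1$
$H_A: \mu_1 > \mu_2$      $t > t_\alpha$, $df = n - 1$
$H_A: \mu_1 < \mu_2$      $t < -t_\alpha$, $df = n - 1$

[Figure: Independent samples (Treat 1 and Treat 2 groups, possibly equal numbers) compared with Matched Pairs (Pair 1, Pair 2, Pair 3, …, Pair n, each pair receiving Treat 1 and Treat 2)]

Sample size determination
When comparing two or more populations

Estimating a difference in proportions using confidence intervals
Confidence Interval for $d = p_1 - p_2$:
$$\hat p_1 - \hat p_2 \pm B \quad\text{where}\quad B = z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Again we want to choose $n_1$ and $n_2$ to set $B$ at some predetermined level with a fixed level of confidence $1 - \alpha$.

There are many solutions for $n_1$ and $n_2$ that will achieve a specified error bound $B$ with level of confidence $1 - \alpha$. You can make $B$ small by increasing $n_1$, or $n_2$, or a combination of both. Some useful practical solutions satisfy:
1. Equal sample size: $n_1 = n_2$. This would be an appropriate choice if one researcher was to collect data from population 1, another was to collect data from population 2, and you wanted to equalize the workload.
2. Minimize total sample size: choose $n_1$ and $n_2$ so that the required error bound $B$ is achieved and the total sample size, $n_1 + n_2$, is minimized. This would be an appropriate choice if a single researcher was to collect data from both population 1 and population 2 and you wanted to minimize that researcher's workload.
3. Minimize total cost of the sample: suppose that the study has a fixed cost of $C_0$ dollars and that the cost of a single observation from populations 1 and 2 is $c_1$ and $c_2$ dollars respectively. Then the total cost of the study is $C_0 + n_1 c_1 + n_2 c_2$. This approach chooses $n_1$ and $n_2$ so that the required error bound $B$ is achieved and the total cost, $C_0 + n_1 c_1 + n_2 c_2$, is minimized.

Special solutions, case 1: $n_1 = n_2 = n$. Then
$$n_1 = n_2 = n = \frac{z_{\alpha/2}^2}{B^2}\left[p_1(1-p_1) + p_2(1-p_2)\right]$$

Special solutions, case 2: choose $n_1$ and $n_2$ to minimize $N = n_1 + n_2$ = total sample size. Then
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\left[p_1(1-p_1) + \sqrt{p_1(1-p_1)\,p_2(1-p_2)}\right]$$
$$n_2 = \frac{z_{\alpha/2}^2}{B^2}\left[p_2(1-p_2) + \sqrt{p_1(1-p_1)\,p_2(1-p_2)}\right]$$

Special solutions, case 3: choose $n_1$ and $n_2$ to minimize $C = C_0 + c_1 n_1 + c_2 n_2$ = total cost of the study.
Note: $C_0$ = fixed (set-up) costs, $c_1$ = cost per unit in population 1, $c_2$ = cost per unit in population 2. Then
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\left[p_1(1-p_1) + \sqrt{\frac{c_2}{c_1}}\sqrt{p_1(1-p_1)\,p_2(1-p_2)}\right]$$
$$n_2 = \frac{z_{\alpha/2}^2}{B^2}\left[p_2(1-p_2) + \sqrt{\frac{c_1}{c_2}}\sqrt{p_1(1-p_1)\,p_2(1-p_2)}\right]$$

Determination of sample size (means)
When the objective is to compare the means of two Normal populations

Estimating a difference in means using confidence intervals
Confidence Interval for $d = \mu_1 - \mu_2$:
$$\bar x_1 - \bar x_2 \pm B \quad\text{where}\quad B = z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Again we want to choose $n_1$ and $n_2$ to set $B$ at some predetermined level with a fixed level of confidence $1 - \alpha$.

The sample sizes required, $n_1$ and $n_2$, to estimate $\mu_1 - \mu_2$ within an error bound $B$ with level of confidence $1 - \alpha$ are:

Equal sample sizes
$$n = n_1 = n_2 = \frac{z_{\alpha/2}^2\left(\sigma_1^2 + \sigma_2^2\right)}{B^2}$$
Minimizing the total sample size $N = n_1 + n_2$
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\,\sigma_1(\sigma_1 + \sigma_2), \qquad n_2 = \frac{z_{\alpha/2}^2}{B^2}\,\sigma_2(\sigma_1 + \sigma_2)$$
Minimizing the total cost $C = C_0 + c_1 n_1 + c_2 n_2$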
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\,\sigma_1\left(\sigma_1 + \sqrt{\frac{c_2}{c_1}}\,\sigma_2\right), \qquad n_2 = \frac{z_{\alpha/2}^2}{B^2}\,\sigma_2\left(\sigma_2 + \sqrt{\frac{c_1}{c_2}}\,\sigma_1\right)$$

Some general comments
• If a population is more variable ($\sigma^2$ larger), more observations should be assigned to the sample from that population.
• If it is less costly to take observations in a population, more observations should be assigned to the sample from that population.

Next Topic: Comparing k populations
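As a rough sketch of the sample-size formulas for comparing two means given in this section, the following Python computes the cost-minimizing allocation and checks that it reproduces the requested error bound. The function name `sample_sizes_for_means` and the numeric inputs ($\sigma_1 = 4$, $\sigma_2 = 2$, $B = 0.5$) are hypothetical choices made for illustration only; they do not come from the slides. With equal unit costs the result reduces to the allocation that minimizes the total sample size.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_sizes_for_means(sigma1, sigma2, bound, conf=0.95, c1=1.0, c2=1.0):
    """Unrounded (n1, n2) minimising c1*n1 + c2*n2 subject to
    z_{alpha/2} * sqrt(sigma1^2/n1 + sigma2^2/n2) = bound.
    (Hypothetical helper; with c1 == c2 it minimises n1 + n2.)"""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    k = (z / bound) ** 2                           # z^2 / B^2
    n1 = k * sigma1 * (sigma1 + sqrt(c2 / c1) * sigma2)
    n2 = k * sigma2 * (sigma2 + sqrt(c1 / c2) * sigma1)
    return n1, n2


# Illustrative (made-up) inputs: population 1 is twice as variable.
n1, n2 = sample_sizes_for_means(sigma1=4.0, sigma2=2.0, bound=0.5)

# Check: plugging n1, n2 back into B should recover the requested bound.
z = NormalDist().inv_cdf(0.975)
achieved = z * sqrt(4.0 ** 2 / n1 + 2.0 ** 2 / n2)

# In practice the sample sizes are rounded up to whole observations.
print(ceil(n1), ceil(n2), round(achieved, 4))
```

Note how the more variable population receives the larger share of the observations, in line with the general comments above.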