SAMPLE SIZE DETERMINATION

Reasons for Sampling
• Samples can be studied more quickly than populations.
• A study of a sample is less expensive than a study of an entire population, because a smaller number of items or subjects is examined. This consideration is especially important in the design of large studies that require a lengthy follow-up.
• A study of an entire population (census) is impossible in most situations. Sometimes the process of the study destroys or depletes the item being studied.
• Sample results are often more accurate than results based on a population.
• If samples are properly selected, probability methods can be used to estimate the error in the resulting statistics. It is this aspect of sampling that permits investigators to make probability statements about observations in a study.

The primary purpose of sampling is to estimate certain population parameters, such as means, totals or proportions, or to test hypotheses about such parameters. When estimating a population parameter or testing a hypothesis about it, the question immediately arises: how large a sample do I need?

The size of the sample depends on:
i. Type of data
   Categorical: percentages or proportions
   Numerical: means
ii. Variation
iii. Desired precision
iv. Confidence level
v. Size of the population

SAMPLE SIZE DETERMINATION FOR ESTIMATING A POPULATION PROPORTION

When the population size (N) is unknown:
$$ n = \frac{z_{1-\alpha/2}^{2}\, P(1-P)}{d^{2}} $$

When the population size (N) is known:
$$ n = \frac{N\, z_{1-\alpha/2}^{2}\, P(1-P)}{(N-1)\, d^{2} + P(1-P)\, z_{1-\alpha/2}^{2}} $$

Here P is the anticipated population proportion, d is the desired precision (the half-width of the confidence interval) and $z_{1-\alpha/2}$ is the standard normal quantile for the chosen confidence level (1.96 for 95%).

Example: A district medical officer wishes to estimate the proportion of children in the district who received all childhood vaccinations. How many children must be studied if the resulting estimate is to fall within 10 percentage points of the true proportion with 95% confidence?

N is unknown, so
$$ n = \frac{z_{1-\alpha/2}^{2}\, P(1-P)}{d^{2}} $$

No estimate of P is available, so the most conservative value, P = 0.5, is used:
$$ n = \frac{1.96^{2} \times 0.5 \times 0.5}{0.10^{2}} = 96.04 $$
which is rounded up to 97.

With P = 0.25:
$$ n = \frac{1.96^{2} \times 0.25 \times 0.75}{0.10^{2}} = 72.03 $$
which is rounded up to 73.
The calculation with P = 0.25 assumes that P can never exceed 0.25.

Example: A district medical officer wishes to estimate the proportion of children in the district who received all childhood vaccinations. If 500 children reside in the district, how many children must be studied if the resulting estimate is to fall within 10 percentage points of the true proportion with 95% confidence?

With no estimate of P available (P = 0.5):
$$ n = \frac{N\, z_{1-\alpha/2}^{2}\, P(1-P)}{(N-1)\, d^{2} + P(1-P)\, z_{1-\alpha/2}^{2}} = \frac{500 \times 1.96^{2} \times 0.5 \times 0.5}{499 \times 0.1^{2} + 1.96^{2} \times 0.5 \times 0.5} = 80.7 $$
which is rounded up to 81.

Assuming P ≤ 0.25:
$$ n = \frac{500 \times 1.96^{2} \times 0.25 \times 0.75}{499 \times 0.1^{2} + 1.96^{2} \times 0.25 \times 0.75} = 63.1 $$
which is rounded up to 64.

SAMPLE SIZE DETERMINATION WHEN TESTING A HYPOTHESIS

There are four possible outcomes, depending on whether the null hypothesis is true or false and whether the decision is “fail to reject” or “reject”.

                                 Null hypothesis
Decision                         True                       False
Accept (fail to reject) H0       Correct decision (1 - α)   Type II error (β)
Reject H0                        Type I error (α)           Correct decision (1 - β)

SAMPLE SIZE DETERMINATION FOR TESTING A HYPOTHESIS FOR A SINGLE POPULATION PROPORTION

$$ n = \frac{\left[ z_{1-\alpha} \sqrt{P_0 (1-P_0)} + z_{1-\beta} \sqrt{P_a (1-P_a)} \right]^{2}}{(P_a - P_0)^{2}} $$

Here $P_0$ is the proportion under the null hypothesis, $P_a$ is the proportion under the alternative, α is the significance level and 1 - β is the power.

Example: Previous surveys have demonstrated that the rate of dental caries among school children in a particular community is about 25%. How many children should be studied in a new survey if it is desired to be 90% sure of detecting a rate of 20% or less at the 5% significance level?

With $z_{1-\alpha} = 1.65$ and $z_{1-\beta} = 1.28$:
$$ n = \frac{\left[ 1.65 \sqrt{0.25 \times 0.75} + 1.28 \sqrt{0.20 \times 0.80} \right]^{2}}{(0.25 - 0.20)^{2}} = \frac{(0.7145 + 0.5120)^{2}}{0.0025} = 601.7 $$
which is rounded up to 602.

SAMPLE SIZE DETERMINATION FOR HYPOTHESIS TESTING FOR TWO POPULATION PROPORTIONS

$$ n = \frac{\left[ z_{1-\alpha} \sqrt{2 \bar{P} (1-\bar{P})} + z_{1-\beta} \sqrt{P_1 (1-P_1) + P_2 (1-P_2)} \right]^{2}}{(P_1 - P_2)^{2}}, \qquad \bar{P} = \frac{P_1 + P_2}{2} $$

Example: Suppose we wish to conduct a clinical trial to compare the effectiveness of a new treatment and the standard treatment. The standard treatment is known to have a success rate of 0.6. Researchers want a 90% chance of concluding that the new treatment is more effective if its success rate is at least 15 percentage points higher than that of the standard one.
How many subjects must be studied in each of the two groups if the hypothesis is tested at the 5% significance level?

With $P_1 = 0.60$, $P_2 = 0.75$ and $\bar{P} = (0.60 + 0.75)/2 = 0.675$:
$$ n = \frac{\left[ 1.65 \sqrt{2 (0.675)(0.325)} + 1.28 \sqrt{(0.60)(0.40) + (0.75)(0.25)} \right]^{2}}{0.15^{2}} = 165.5 $$
which is rounded up to 166.

166 subjects should be taken in each of the two groups, making a total of 332 subjects.

SAMPLE SIZE DETERMINATION FOR ESTIMATING A POPULATION MEAN

i) When the population size, N, is unknown:
$$ n = \frac{z_{1-\alpha/2}^{2}\, \sigma^{2}}{d^{2}} $$

ii) When the population size, N, is known:
$$ n = \frac{N\, z_{1-\alpha/2}^{2}\, \sigma^{2}}{d^{2} (N-1) + z_{1-\alpha/2}^{2}\, \sigma^{2}} $$

Example: If we wish, with 95% confidence, to estimate the average birth weight of infants to within 250 g of the unknown population mean, how large a sample should we select? (Assume σ = 700 g.)

$$ n = \frac{(1.96)^{2} (700)^{2}}{(250)^{2}} = 30.1 $$
which is rounded up to 31.

When N = 500:
$$ n = \frac{500\, (1.96)^{2} (700)^{2}}{(250)^{2} (499) + (1.96)^{2} (700)^{2}} = 28.5 $$
which is rounded up to 29.

When d = 400 g (with N = 500), the required sample size is n = 11.5, rounded up to 12.

TESTING A HYPOTHESIS FOR A SINGLE POPULATION MEAN

A survey had indicated that the average cholesterol level of men with newly diagnosed heart disease was 260. However, it is suspected that the average cholesterol level of such men is now somewhat lower. How large a sample would be necessary to test, at the 5% level of significance and with a power of 90%, whether the average cholesterol level is unchanged versus the alternative that it has decreased from 260 to 230, with an estimated standard deviation of cholesterol levels of 75 units?

$$ n = \frac{\sigma^{2} \left( z_{1-\alpha} + z_{1-\beta} \right)^{2}}{(\mu_0 - \mu_a)^{2}} = \frac{75^{2}\, (1.65 + 1.28)^{2}}{(260 - 230)^{2}} = 53.7 $$
which is rounded up to 54.

In the previous example the alternative hypothesis was one-sided. A similar approach is followed when the alternative is two-sided, that is,

$$ H_0 : \mu = \mu_0 \qquad H_1 : \mu \neq \mu_0 $$

In that case $z_{1-\alpha}$ is replaced by $z_{1-\alpha/2}$. A two-sided test of the previous example could be designed to test the hypothesis that the average cholesterol level has not changed versus the alternative that it has changed, where a difference of 30 units would be considered significant.
$$ n = \frac{75^{2}\, (1.96 + 1.28)^{2}}{30^{2}} = 65.6 $$
which is rounded up to 66.

SAMPLE SIZE DETERMINATION FOR TESTING A HYPOTHESIS BETWEEN TWO POPULATION MEANS

Suppose we would like to know how many observations to take in order to be 100(1-β)% confident of rejecting H0 when in fact the true difference between the population means is (μ1-μ2)=δ.

$$ n = \frac{2 \sigma^{2} \left( z_{1-\alpha/2} + z_{1-\beta} \right)^{2}}{(\mu_1 - \mu_2)^{2}} $$

Example: Suppose a study is being designed to measure the effect, on systolic blood pressure, of lowering sodium in the diet. From a pilot study, it is observed that the standard deviation of systolic blood pressure in a community with a high sodium diet is 12 mmHg, while that in a group with a low sodium diet is 10.3 mmHg. If α=0.05 and β=0.10, how large a sample from each community should be selected if we want to be able to detect a 2 mmHg difference in blood pressure between the two communities?

The pooled variance is
$$ s_p^{2} = \frac{s_1^{2} + s_2^{2}}{2} = \frac{144 + 106.1}{2} = 125.05 $$

$$ n = \frac{2 s_p^{2} \left( z_{1-\alpha/2} + z_{1-\beta} \right)^{2}}{(\mu_1 - \mu_2)^{2}} = \frac{2 \times 125.05 \times (1.96 + 1.282)^{2}}{2^{2}} = 657.17 $$
which is rounded up to 658.

A sample of 658 subjects from each community would be needed.

Example: A study is being planned to test whether a dietary supplement for pregnant women will increase the birth weight of babies. One group of women will receive the new supplement and the other group will receive the usual nutrition consultation. From a pilot study, the standard deviation in birth weight is estimated as 500 g and is assumed to be the same in both groups. The hypothesis of no difference is to be tested at the 5% level of significance. It is desired to have 80% power of detecting an increase of 100 g.

Because the alternative is one-sided (an increase), $z_{1-\alpha}$ is used:
$$ n = \frac{2 s_p^{2} \left( z_{1-\alpha} + z_{1-\beta} \right)^{2}}{(\mu_1 - \mu_2)^{2}} = \frac{2 \times 500^{2} \times (1.645 + 0.842)^{2}}{100^{2}} = 309.26 $$
which is rounded up to 310.

A sample of 310 subjects should be studied in each of the two groups.
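
The formulas in this handout can be checked numerically. The following is a minimal Python sketch, not part of the original material: the function names and argument names are hypothetical, and the z values are passed in by hand as rounded table values (1.96, 1.65, 1.28 and so on), which is an assumption about how the worked examples were computed. Each function returns the unrounded n, and the caller rounds up.

```python
import math


def n_estimate_proportion(p, d, z, N=None):
    """Sample size to estimate a proportion p to within +/- d.
    N=None means the population size is unknown."""
    v = z ** 2 * p * (1 - p)
    if N is None:
        return v / d ** 2
    return N * v / ((N - 1) * d ** 2 + v)


def n_estimate_mean(sigma, d, z, N=None):
    """Sample size to estimate a mean to within +/- d, given SD sigma."""
    v = z ** 2 * sigma ** 2
    if N is None:
        return v / d ** 2
    return N * v / ((N - 1) * d ** 2 + v)


def n_test_one_proportion(p0, pa, z_alpha, z_beta):
    """Sample size to test H0: P = p0 against the alternative P = pa."""
    num = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(pa * (1 - pa))
    return num ** 2 / (pa - p0) ** 2


def n_test_two_proportions(p1, p2, z_alpha, z_beta):
    """Per-group sample size to compare two proportions p1 and p2."""
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return num ** 2 / (p1 - p2) ** 2


def n_test_one_mean(sigma, mu0, mua, z_alpha, z_beta):
    """Sample size to test H0: mu = mu0 against the alternative mu = mua."""
    return sigma ** 2 * (z_alpha + z_beta) ** 2 / (mu0 - mua) ** 2


def n_test_two_means(var_pooled, delta, z_alpha, z_beta):
    """Per-group sample size to detect a difference delta between two means."""
    return 2 * var_pooled * (z_alpha + z_beta) ** 2 / delta ** 2


if __name__ == "__main__":
    # Vaccination example: P = 0.5, d = 0.10, 95% confidence
    print(math.ceil(n_estimate_proportion(0.5, 0.10, 1.96)))          # 97
    print(math.ceil(n_estimate_proportion(0.5, 0.10, 1.96, N=500)))   # 81
    # Birth weight example: sigma = 700 g, d = 250 g
    print(math.ceil(n_estimate_mean(700, 250, 1.96)))                 # 31
    # Cholesterol example, one-sided test
    print(math.ceil(n_test_one_mean(75, 260, 230, 1.65, 1.28)))       # 54
    # Clinical trial example: success rates 0.60 vs 0.75
    print(math.ceil(n_test_two_proportions(0.60, 0.75, 1.65, 1.28)))  # 166
```

Passing the z values explicitly keeps the sketch free of external dependencies and makes the effect of rounding visible: using 1.645 instead of 1.65, for instance, can change the result by one or two subjects.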