Sample Size and CI's for the Population Mean (μ) and the Population Proportion (p)

Sample Size and CI's for μ

Suppose we wish to estimate a population mean μ using a 95% CI and have a margin of error no larger than E units. What sample size do we need to use?

Recall the "large" sample CI for μ is given by:

$$\bar{X} \pm (z\text{-value})\,\frac{\sigma}{\sqrt{n}}$$

The term after the ± sign is the MARGIN OF ERROR (E). Standard normal values: 90% z = 1.645, 95% z = 1.96, 99% z = 2.576.

Note: The z-value should actually be a t-distribution value, but for sample size planning purposes we will use a standard normal value.

For a 95% CI with margin of error E we have

$$E = 1.96\,\frac{\sigma}{\sqrt{n}}$$

After some wonderful algebraic manipulation

$$n = \left(\frac{1.96\,\sigma}{E}\right)^2$$

Oh, oh! We don't know σ!!
1. "Guesstimate"
2. Use sample SD from pilot or prior study.
3. Use the fact that 95% of observations generally lie within 2 SD's of the mean, thus σ ≈ Range/4, where Range represents the expected (maximum - minimum) we would see in the sample. Could also use the fact that over 99% lie within 3 SD's and use 6 instead of 4 in our crude approximation.

Example: Estimating Mean Cholesterol Level of Females 30 – 40 yrs. of age

Q: What sample size would be necessary to estimate the mean cholesterol level for the population of females between the ages of 30 – 40 with a 95% confidence interval that has a margin of error no larger than E = 3 mg/dl?

Suppose from a pilot study we find s = 19.8 mg/dl. We can use this estimate in place of σ to find the sample size that will give E = 3 mg/dl:

$$n = \left(\frac{1.96\,s}{E}\right)^2 = \left(\frac{1.96 \times 19.8}{3}\right)^2 = 167.34 \;\Rightarrow\; n = 168$$

The result is always rounded up to the next whole subject.

Suppose we do not have any information about the standard deviation of the cholesterol levels of individuals in this population. We could use the Range/4 or Range/6 as crude approximations to the standard deviation. What is the smallest serum cholesterol level we would expect to see? 100 mg/dl (my guess) What is the largest?
300 mg/dl (my guess again)

SD approximation = 200/4 = 50 mg/dl or SD approximation = 200/6 = 33.33 mg/dl

Using this crude estimate for the standard deviation we find the following sample size requirements:

$$n = \left(\frac{1.96 \times 50}{3}\right)^2 = 1067.11 \;\Rightarrow\; n = 1068$$

or

$$n = \left(\frac{1.96 \times 33.33}{3}\right)^2 = 474.18 \;\Rightarrow\; n = 475$$

Sample Size and CI's for p

Suppose we wish to estimate p using a 95% CI and have a margin of error of 3%. What sample size do we need to use?

Recall the CI for p is given by:

$$\hat{p} \pm (z\text{-value})\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

The term after the ± sign is the MARGIN OF ERROR (E). Here for a 95% CI we want E = .03 or 3%:

$$E = 1.96\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .03$$

After some wonderful algebraic manipulation

$$n = \frac{1.96^2\,\hat{p}(1-\hat{p})}{E^2}$$

Oh, oh! We don't know p-hat!!
1. "Guesstimate"
2. Use p-hat from pilot or prior study.
3. Largest n we would ever need comes when p-hat = .50.

This gives two approaches:

1. Informed approach, with p-hat from prior knowledge:

$$n = \frac{1.96^2\,\hat{p}(1-\hat{p})}{E^2}$$

2. Conservative approach (i.e. worst case scenario), which uses p-hat = .50:

$$n = \frac{1.96^2}{4E^2}$$

Standard normal values: 90% = 1.645, 95% = 1.960, 99% = 2.576

Original Question: Suppose we wish to estimate p using a 95% CI and have a margin of error of 3%. What sample size do we need to use? Assume that we are estimating the 5 yr. survival rate for a new kidney cancer therapy, and we know historically that this survival rate is around 20%.

Using the informed approach:

$$n = \frac{1.96^2\,\hat{p}(1-\hat{p})}{E^2} = \frac{1.96^2\,(.20)(.80)}{.03^2} = 682.95 \;\Rightarrow\; n = 683 \text{ subjects}$$

Using the conservative approach for the same question:

$$n = \frac{1.96^2}{4E^2} = \frac{1.96^2}{4(.03^2)} = 1067.1 \;\Rightarrow\; n = 1068 \text{ subjects}$$

This is why media polls usually report a sampling error of ±3% and state that the poll was based on a sample of about n = 1000 individuals.
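The worked examples for the mean and the proportion can be reproduced with a short Python sketch. The function names `sample_size_mean` and `sample_size_proportion` are illustrative choices, not part of the slides, and the code assumes the same planning conventions used in the examples: a standard normal z-value rather than a t-value, and rounding up to the next whole subject.

```python
import math


def sample_size_mean(sd, margin, z=1.96):
    """Sample size for estimating a mean: n = (z * sd / E)^2, rounded up.

    sd is a planning value for the standard deviation (pilot estimate,
    Range/4, or Range/6); margin is the desired margin of error E.
    """
    return math.ceil((z * sd / margin) ** 2)


def sample_size_proportion(margin, p_hat=0.5, z=1.96):
    """Sample size for estimating a proportion: n = z^2 * p(1-p) / E^2, rounded up.

    The default p_hat = 0.5 gives the conservative (worst case) sample size,
    which equals z^2 / (4 * E^2).
    """
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)


# Cholesterol example, E = 3 mg/dl
print(sample_size_mean(19.8, 3))     # pilot SD          -> 168
print(sample_size_mean(200 / 4, 3))  # Range/4 = 50      -> 1068
print(sample_size_mean(33.33, 3))    # Range/6 ~ 33.33   -> 475

# Kidney cancer survival example, E = .03
print(sample_size_proportion(0.03, p_hat=0.20))  # informed     -> 683
print(sample_size_proportion(0.03))              # conservative -> 1068
```

Passing a different `z` (1.645 for 90%, 2.576 for 99%) gives the requirement at another confidence level.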