SAMPLE SIZE DETERMINATION
Reasons for Sampling
•Samples can be studied more quickly than populations.
•A study of a sample is less expensive than a study of an
entire population, because a smaller number of items or
subjects is examined. This consideration is especially
important in the design of large studies that require a
lengthy follow-up.
•A study of an entire population (census) is impossible in
most situations. Sometimes, the process of the study
destroys or depletes the item being studied.
•Sample results are often more accurate than results
based on a population.
•If samples are properly selected, probability methods
can be used to estimate the error in the resulting
statistics. It is this aspect of sampling that permits
investigators to make probability statements about
observations in a study.
The primary purpose of sampling is to estimate
certain population parameters, such as means, totals or
proportions, or to test hypotheses about such parameters.
When estimating a population parameter or testing
hypotheses about parameters, the question immediately
arises:
How large a sample do I need?
The size of the sample depends on:
i. Type of data
   Categorical: percentages or proportions
   Numerical: means
ii. Variation
iii. Desired precision
iv. Confidence level
v. Size of the population
SAMPLE SIZE DETERMINATION FOR
ESTIMATING A POPULATION
PROPORTION
When the population size (N) is unknown:

$$ n = \frac{z_{1-\alpha/2}^{2}\, P(1-P)}{d^{2}} $$

When the population size (N) is known:

$$ n = \frac{N z_{1-\alpha/2}^{2}\, P(1-P)}{(N-1)d^{2} + P(1-P)\, z_{1-\alpha/2}^{2}} $$
Example: A district medical officer wishes to
estimate the proportion of children in the
district who received all childhood vaccinations.
How many children must be studied if the
resulting estimate is to fall within 10
percentage points of the true proportion with
95% confidence?
z12 P(1  P)
n
d2
N is unknown
1.96 2 x0.5 x0.5
n
 96.04  97
2
0.10
1.96 2 x0.25 x0.75
n
 72.03  73
2
0.10
No estimate of P is
available.
Assuming that P can never
exceed 0,25
Example: A district medical officer wishes to
estimate the proportion of children in the
district who received all childhood vaccinations.
If 500 children reside in the district, how
many children must be studied if the resulting
estimate is to fall within 10 percentage points
of the true proportion with 95% confidence?
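No estimate of P is available (P = 0.5 is used):

$$ n = \frac{N z_{1-\alpha/2}^{2}\, P(1-P)}{(N-1)d^{2} + P(1-P)\, z_{1-\alpha/2}^{2}} = \frac{500 \times 1.96^{2} \times 0.5 \times 0.5}{499 \times 0.1^{2} + 1.96^{2} \times 0.5 \times 0.5} = 80.7 \approx 81 $$

Assuming that P can never exceed 0.25:

$$ n = \frac{N z_{1-\alpha/2}^{2}\, P(1-P)}{(N-1)d^{2} + P(1-P)\, z_{1-\alpha/2}^{2}} = \frac{500 \times 1.96^{2} \times 0.25 \times 0.75}{499 \times 0.1^{2} + 1.96^{2} \times 0.25 \times 0.75} = 63.1 \approx 64 $$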
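The four results above can be checked with a short script. This is a minimal sketch, not part of the original slides: the function name `n_proportion` and its parameter names are illustrative, and the table value z = 1.96 is assumed for 95% confidence.

```python
import math

def n_proportion(p, d, z=1.96, N=None):
    """Sample size needed to estimate a proportion to within +/- d.

    p : anticipated proportion (0.5 is the most conservative choice)
    d : desired absolute precision (0.10 = 10 percentage points)
    z : standard normal quantile for the confidence level (assumed 1.96 for 95%)
    N : population size, or None when it is unknown
    """
    if N is None:
        # Population size unknown
        return z**2 * p * (1 - p) / d**2
    # Population size known
    return N * z**2 * p * (1 - p) / ((N - 1) * d**2 + p * (1 - p) * z**2)

for p in (0.5, 0.25):
    for N in (None, 500):
        n = n_proportion(p, 0.10, N=N)
        # The raw value is always rounded up to the next whole subject
        print(f"P={p}, N={N}: n={n:.2f}, rounded up to {math.ceil(n)}")
```

Rounding up rather than to the nearest integer keeps the achieved precision at least as good as the one requested.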
SAMPLE SIZE DETERMINATION WHEN TESTING A
HYPOTHESIS
There are four possible outcomes that could be
reached as a result of the null hypothesis being either
true or false and the decision being either “fail to
reject” or “reject”.
                 Null hypothesis
Decision         True                        False
Accept H0        Correct decision (1 - α)    Type II error (β)
Reject H0        Type I error (α)            Correct decision (1 - β)
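The sample size formulas for hypothesis tests turn α and β into standard normal quantiles. As a rough illustration, assuming Python 3.8 or later, these can be obtained from the standard library instead of a printed table. The helper name `z_quantile` is hypothetical.

```python
from statistics import NormalDist

def z_quantile(prob):
    """Standard normal quantile: the z value with P(Z <= z) = prob."""
    return NormalDist().inv_cdf(prob)

alpha, beta = 0.05, 0.10

# One-sided test at level alpha uses z_(1 - alpha)
print(round(z_quantile(1 - alpha), 3))
# Two-sided test at level alpha uses z_(1 - alpha/2)
print(round(z_quantile(1 - alpha / 2), 3))
# Power 1 - beta uses z_(1 - beta)
print(round(z_quantile(1 - beta), 3))
```

Table values rounded to two decimals (1.65, 1.96, 1.28) differ slightly from these quantiles, which can shift a computed sample size by one or two subjects.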
SAMPLE SIZE DETERMINATION FOR
TESTING HYPOTHESIS FOR A SINGLE
POPULATION PROPORTION

$$ n = \frac{\left[ z_{1-\alpha}\sqrt{P_0(1-P_0)} + z_{1-\beta}\sqrt{P_a(1-P_a)} \right]^{2}}{(P_a-P_0)^{2}} $$
Example: Previous surveys have demonstrated that the rate of dental
caries among school children in a particular community is about
25%. How many children should be studied in a new survey if it is
desired to be 90% sure of detecting a rate of 20% or less at 5%
significance level?

$$ n = \frac{\left[ 1.65\sqrt{0.25 \times 0.75} + 1.28\sqrt{0.20 \times 0.80} \right]^{2}}{(0.25-0.20)^{2}} = 601.7 \approx 602 $$
SAMPLE SIZE DETERMINATION FOR HYPOTHESIS
TESTING FOR TWO POPULATION PROPORTIONS

$$ n = \frac{\left[ z_{1-\alpha}\sqrt{2\bar{P}(1-\bar{P})} + z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^{2}}{(P_1-P_2)^{2}}, \qquad \bar{P} = \frac{P_1+P_2}{2} $$
Example: Suppose we wish to conduct a clinical trial to
compare the effectiveness of a new treatment and the
standard treatment. The standard treatment is known to
have a success rate of 0.6. The researchers want a 90%
chance (power) of concluding that the new treatment is
more effective if its success rate is at least 15 percentage
points higher than that of the standard one. How many subjects
must be studied in each of the two groups if the hypothesis is
tested at the 5% significance level?

$$ n = \frac{\left[ 1.65\sqrt{2(0.675)(0.325)} + 1.28\sqrt{(0.60)(0.40) + (0.75)(0.25)} \right]^{2}}{0.15^{2}} = 165.5 \approx 166 $$
166 subjects should be taken in each of the two groups,
making a total of 332 subjects.
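The same calculation as a small script, offered as a sketch under the assumption that the table values 1.65 and 1.28 are used and that the alternative success rate is 0.60 + 0.15 = 0.75. The function name is illustrative.

```python
import math

def n_two_proportions_test(p1, p2, z_alpha, z_beta):
    """Per-group sample size for comparing two proportions p1 and p2."""
    p_bar = (p1 + p2) / 2  # average of the two anticipated proportions
    bracket = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
               + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return bracket**2 / (p1 - p2)**2

n = n_two_proportions_test(0.60, 0.75, z_alpha=1.65, z_beta=1.28)
per_group = math.ceil(n)
print(f"n = {n:.1f} per group, rounded up to {per_group}; total {2 * per_group}")
```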
SAMPLE SIZE DETERMINATION FOR
ESTIMATING A POPULATION MEAN
i) When population size, N, is unknown
$$ n = \frac{z_{1-\alpha/2}^{2}\, \sigma^{2}}{d^{2}} $$

ii) When population size, N, is known

$$ n = \frac{N z_{1-\alpha/2}^{2}\, \sigma^{2}}{d^{2}(N-1) + z_{1-\alpha/2}^{2}\, \sigma^{2}} $$
Example
If we wish, with 95% confidence, to estimate the
average birth weight of infants to within 250 g of
the unknown population mean, how large a sample
should we select? (Assume σ = 700 g.)

$$ n = \frac{(1.96)^{2}(700)^{2}}{(250)^{2}} = 30.1 \approx 31 $$

When N = 500:

$$ n = \frac{500(1.96)^{2}(700)^{2}}{(250)^{2}(499) + (1.96)^{2}(700)^{2}} = 28.5 \approx 29 $$

When d = 400 g (still with N = 500), the required sample size is n = 11.5 ≈ 12.
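A brief numerical check of the birth weight example, written as a sketch: the function name `n_mean` is illustrative, σ = 700 g is the value assumed in the example, and z = 1.96 is the table value for 95% confidence.

```python
import math

def n_mean(sigma, d, z=1.96, N=None):
    """Sample size needed to estimate a mean to within +/- d.

    sigma : assumed population standard deviation
    d     : desired precision, in the same units as sigma
    z     : standard normal quantile for the confidence level (assumed 1.96)
    N     : population size, or None when it is unknown
    """
    if N is None:
        return z**2 * sigma**2 / d**2
    return N * z**2 * sigma**2 / (d**2 * (N - 1) + z**2 * sigma**2)

print(math.ceil(n_mean(700, 250)))         # N unknown, d = 250 g
print(math.ceil(n_mean(700, 250, N=500)))  # N = 500,   d = 250 g
print(math.ceil(n_mean(700, 400, N=500)))  # N = 500,   d = 400 g
```

Relaxing the precision from 250 g to 400 g cuts the required sample by more than half, because n is inversely proportional to the square of d when N is large.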
TESTING A HYPOTHESIS FOR A SINGLE POPULATION
MEAN
A survey had indicated that the average cholesterol level of men
with newly diagnosed heart disease was 260. However, it is
suspected that the average cholesterol level of such men is now
somewhat lower. How large a sample would be necessary to
test, at the 5% level of significance and with a power of 90%, whether
the average cholesterol level is unchanged versus the alternative
that it has decreased from 260 to 230, with an estimated
standard deviation of cholesterol levels of 75 units?

$$ n = \frac{\sigma^{2}\left(z_{1-\alpha} + z_{1-\beta}\right)^{2}}{(\mu_0-\mu_a)^{2}} = \frac{75^{2}(1.65+1.28)^{2}}{(260-230)^{2}} = 53.7 \approx 54 $$
In the previous example the alternative hypothesis was
one-sided. A similar approach is followed when the alternative is
two-sided, with z_{1-α/2} in place of z_{1-α}. That is,

$$ H_0 : \mu = \mu_0 \qquad H_1 : \mu \neq \mu_0 $$
A two sided test of previous example could be designed to test
the hypothesis that the average cholesterol has not changed
versus the alternative that the average cholesterol has changed
and that a difference of 30 units would be considered
significant.
$$ n = \frac{75^{2}(1.96+1.28)^{2}}{30^{2}} = 65.6 \approx 66 $$
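Both cholesterol calculations follow the same formula and differ only in the quantile used for α. The sketch below assumes the table values 1.65 (one-sided), 1.96 (two-sided) and 1.28 (90% power). The function name is illustrative.

```python
import math

def n_one_mean_test(sigma, mu0, mua, z_alpha, z_beta):
    """Sample size for testing H0: mu = mu0 against the alternative mu = mua."""
    return sigma**2 * (z_alpha + z_beta)**2 / (mu0 - mua)**2

# One-sided alternative: z_(1 - alpha) = 1.65
one_sided = n_one_mean_test(75, 260, 230, z_alpha=1.65, z_beta=1.28)
# Two-sided alternative: z_(1 - alpha/2) = 1.96
two_sided = n_one_mean_test(75, 260, 230, z_alpha=1.96, z_beta=1.28)

print(f"one-sided: {one_sided:.1f}, rounded up to {math.ceil(one_sided)}")
print(f"two-sided: {two_sided:.1f}, rounded up to {math.ceil(two_sided)}")
```

The two-sided design needs more subjects because the 5% significance level is split between both tails.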
SAMPLE SIZE DETERMINATION FOR TESTING
HYPOTHESIS BETWEEN TWO POPULATION MEANS
Suppose we would like to know how many observations to take
from each population in order to have probability 100(1-β)% of
rejecting H0, at significance level α, when in fact
the true difference between the population means is (μ1-μ2)=δ.

$$ n = \frac{2\sigma^{2}\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{(\mu_1-\mu_2)^{2}} $$
Example: Suppose a study is being designed to measure the effect,
on systolic blood pressure, of lowering sodium in the diet. From a
pilot study, it is observed that the standard deviation of the systolic
blood pressure in a community with a high sodium diet is 12
mmHg, while that in a group with a low sodium diet is 10.3 mmHg.
If α=0.05 and β=0.10 how large a sample from each community
should be selected if we want to be able to detect a 2 mmHg
difference in blood pressure between the two communities?
$$ S_p^{2} = \frac{S_1^{2} + S_2^{2}}{2} = \frac{144 + 106.1}{2} = 125.05 $$

$$ n = \frac{2S_p^{2}\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{(\mu_1-\mu_2)^{2}} = \frac{2(125.05)(1.96+1.282)^{2}}{2^{2}} = 657.17 \approx 658 $$
A sample of 658 subjects from each community would be
needed.
Example: A study is being planned to test whether a dietary
supplement for pregnant women will increase the birthweight of
babies. One group of women will receive the new supplement and
the other group will receive the usual nutrition consultation. From
a pilot study the standard deviation in birthweight is estimated as
500 g and is assumed to be the same in both groups. The
hypothesis of no difference is to be tested at the 5% level of
significance. It is desired to have 80% power of detecting an
increase of 100 g.
$$ n = \frac{2S_p^{2}\left(z_{1-\alpha} + z_{1-\beta}\right)^{2}}{(\mu_1-\mu_2)^{2}} = \frac{2(500)^{2}(1.645+0.842)^{2}}{100^{2}} = 309.26 \approx 310 $$
A sample of 310 subjects should be studied in each of the two
groups.
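The two-group examples (blood pressure and birthweight) can be reproduced with one function. This is a sketch under stated assumptions: the three-decimal table quantiles 1.282, 1.645 and 0.842 are used because they reproduce the reported values 657.17 and 309.26, and the function name is illustrative.

```python
import math

def n_two_means_test(s1, s2, delta, z_alpha, z_beta):
    """Per-group sample size for detecting a difference delta between two means.

    s1, s2 : standard deviations in the two groups (pooled by averaging variances)
    delta  : difference in means to be detected
    """
    sp2 = (s1**2 + s2**2) / 2  # pooled variance
    return 2 * sp2 * (z_alpha + z_beta)**2 / delta**2

# Blood pressure: two-sided test, alpha = 0.05, beta = 0.10
bp = n_two_means_test(12, 10.3, delta=2, z_alpha=1.96, z_beta=1.282)
# Birthweight: one-sided test, alpha = 0.05, power = 80%
bw = n_two_means_test(500, 500, delta=100, z_alpha=1.645, z_beta=0.842)

print(f"blood pressure: {bp:.1f}, rounded up to {math.ceil(bp)} per group")
print(f"birthweight:    {bw:.1f}, rounded up to {math.ceil(bw)} per group")
```

The results are sensitive to how the quantiles are rounded: with 1.28 in place of 1.282, the blood pressure calculation gives about 656.4, which rounds up to 657 rather than 658.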