Sample size determination
The Normal Probability Distribution
• What is a distribution? A collection of scores or values arranged to indicate how common the various values are.
Mean (population, sample)
Standard deviation (population, sample)
Median
Mode
Scores in our class
CHARACTERISTICS OF A NORMAL DISTRIBUTION
• The normal curve is symmetrical: its two halves are identical.
• The mean, median, and mode are equal.
• Theoretically, the left tail of the curve extends to -infinity.
• Theoretically, the right tail of the curve extends to +infinity.
AREAS UNDER THE NORMAL CURVE
• About 68 percent of the area under the normal curve is within plus one and minus one standard deviation of the mean. This can be written as μ ± 1σ.
• About 95 percent of the area under the normal curve is within plus and minus two standard deviations of the mean, written μ ± 2σ.
• Practically all (99.74 percent) of the area under the normal curve is within three standard deviations of the mean, written μ ± 3σ.
Between:
μ ± 1σ: 68.26%
μ ± 2σ: 95.44%
μ ± 3σ: 99.74%
[Figure: normal curve with the horizontal axis marked at μ-3σ, μ-2σ, μ-1σ, μ, μ+1σ, μ+2σ, μ+3σ]
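As a quick check on these percentages, the areas can be computed directly from the normal curve. The following is a minimal Python sketch; the function name `area_within` is an illustrative choice and not part of the slides. The exact areas come out at about 68.27%, 95.45%, and 99.73%, which differ in the last digit from figures read off a rounded z-table.

```python
import math

def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    # For any normal distribution, P(|X - mean| <= k * sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k) * 100:.2f}%")
```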
Normal Distributions with Equal Means but Different Standard Deviations
[Figure: three normal curves with the same mean, μ = 20, and standard deviations σ = 3.1, σ = 3.9, and σ = 5.0]
Normal Probability Distributions with Different Means and Standard Deviations
[Figure: three normal curves with μ = 5, σ = 3; μ = 9, σ = 6; and μ = 14, σ = 10]
What is this good for?
• It describes the data and how they cluster around a mean.
• It allows us to make statistical inferences.
CHARACTERISTICS OF A NORMAL PROBABILITY DISTRIBUTION
• A normal distribution with a mean of 0 and a standard deviation of 1 is called the standard normal distribution.
• z value: The distance between a selected value, designated X, and the population mean μ, divided by the population standard deviation, σ.
\[ z = \frac{X - \mu}{\sigma} \]
Also known as: z-score, normal score, standardized score
What is it good for?
• It indicates how many standard deviations an observation is above or below the mean.
• It allows us to compare observations from different normal distributions.
• Is a 3.00 GPA UNLV student as good as a 3.00 GPA UCF student?
EXAMPLE 1
• The monthly incomes of recent high school graduates in a large corporation are normally distributed with a mean of $2,000 and a standard deviation of $200. What is the z value for an income X of $2,200? $1,700?
• For X = $2,200, since z = (X - μ) / σ, z = (2,200 - 2,000) / 200 = +1.0.
EXAMPLE 1 (continued)
• For X = $1,700, since z = (X - μ) / σ, z = (1,700 - 2,000) / 200 = -1.5.
• A z value of +1.0 indicates that the value of $2,200 is ___ standard deviation ______ the mean of $2,000.
• A z value of -1.5 indicates that the value of $1,700 is ____ standard deviation ______ the mean of $2,000.
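The z values in Example 1 can be reproduced with a few lines of Python. This is only a sketch: the function name `z_value` and the variable names are illustrative, while the mean and standard deviation are the ones given in the example.

```python
def z_value(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

mean_income = 2000.0  # population mean from Example 1
std_income = 200.0    # population standard deviation from Example 1

for income in (2200.0, 1700.0):
    z = z_value(income, mean_income, std_income)
    print(f"X = ${income:,.0f}: z = {z:+.1f}")
```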
EXAMPLE 2
• The daily water usage per person in Toledo, Ohio is normally distributed with a mean of 20 gallons and a standard deviation of 5 gallons.
• About 68% of the daily water usage per person in Toledo lies between what two values?
• μ ± 1σ = _____________
• That is, about 68% of the daily usage per person will lie between __________________ gallons.
• Similarly for 95% and 99%, the intervals will be __________________________________________ .
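One way to work through Example 2 is to compute the mean plus and minus one, two, and three standard deviations. The sketch below assumes those three multiples correspond to the "about 68%", "about 95%", and "practically all" areas described earlier; the helper name `interval` is illustrative.

```python
def interval(mean: float, std_dev: float, k: float) -> tuple[float, float]:
    """Return the pair (mean - k * std_dev, mean + k * std_dev)."""
    return (mean - k * std_dev, mean + k * std_dev)

mean_usage = 20.0  # gallons per person per day, from Example 2
std_usage = 5.0    # gallons, from Example 2

for k, share in ((1, "about 68%"), (2, "about 95%"), (3, "practically all")):
    low, high = interval(mean_usage, std_usage, k)
    print(f"{share} of daily usage lies between {low:g} and {high:g} gallons")
```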
POINT ESTIMATES
• Point estimate: one number (called a point) that is used to estimate a population parameter.
• Examples of point estimates are the sample mean, the sample standard deviation, the sample variance, and the sample proportion.

EXAMPLE: The number of defective items produced by a machine was recorded for five randomly selected hours during a 40-hour work week. The observed numbers of defectives were 12, 4, 7, 14, and 10. So the sample mean is ____ . Thus a point estimate for the mean number of defectives per hour during the week is 9.4.
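The arithmetic in this example can be checked with Python's standard `statistics` module. The sketch below uses the five observations from the example; the variable names are illustrative. Note that `stdev` and `variance` use the sample (n - 1) denominator.

```python
import statistics

defectives = [12, 4, 7, 14, 10]  # defectives in five randomly selected hours

sample_mean = statistics.mean(defectives)          # point estimate of the mean
sample_variance = statistics.variance(defectives)  # sample variance (n - 1)
sample_std = statistics.stdev(defectives)          # sample standard deviation

print(f"sample mean     = {sample_mean}")
print(f"sample variance = {sample_variance:.2f}")
print(f"sample std dev  = {sample_std:.2f}")
```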
INTERVAL ESTIMATES
• Interval estimate: states the range within which a population parameter probably lies.
• The interval within which a population parameter is expected to occur is called a confidence interval.
• The two confidence intervals that are used extensively are the 95% and the 99%.
• A 95% confidence interval means that about 95% of the similarly constructed intervals will contain the parameter being estimated.
INTERVAL ESTIMATES (continued)
• Another interpretation of the 95% confidence interval is that 95% of the sample means for a specified sample size will lie within 1.96 standard errors (standard deviations of the sampling distribution) of the hypothesized population mean.
• For the 99% confidence interval, 99% of the sample means for a specified sample size will lie within 2.58 standard errors of the hypothesized population mean.
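The multipliers 1.96 and 2.58 are the two-sided critical values of the standard normal distribution. A small sketch, assuming Python 3.8 or later for `statistics.NormalDist`, shows where they come from; the function name `critical_z` is illustrative.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-sided critical value z with P(-z <= Z <= z) = confidence."""
    # Half of the excluded area sits in each tail of the standard normal curve.
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for confidence in (0.95, 0.99):
    print(f"{confidence:.0%} confidence: z = {critical_z(confidence):.2f}")
```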
Determining Sample Size for Probability Samples
• Financial, Statistical, and Managerial Issues
  • The larger the sample, the smaller the sampling error, but larger samples cost more.
• Budget Available
• Rules of Thumb
Typical Sample Sizes

Number of            Consumer research                          Business research*
subgroup analyses    National population   Special population   National population   Special population
None/few             200-500               100-500              20-100                 20-50
Average              500-1000              200-1000             50-200                 50-100
Many                 1000-2000             500-1000             200-500                100-250
Sample Size Determination
• Sample size depends on:
  • Allowable error / level of precision / sampling error (E)
  • Acceptable confidence in standard errors (Z)
  • Population standard deviation (σ)
Sample size determination
• Problem involving means:
\[ n = \frac{Z^2 \sigma^2}{E^2} \]
where:
n = sample size
Z = level of confidence expressed in standard errors
σ = population standard deviation
E = acceptable amount of sampling error
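To make the formula for means concrete, here is a minimal Python sketch. The input values (Z = 1.96 for 95% confidence, a standard deviation of 200, and an allowable error of 25) are hypothetical and chosen only for illustration, as is the function name. The result is rounded up because a sample cannot contain a fraction of a respondent.

```python
import math

def sample_size_for_mean(z: float, sigma: float, error: float) -> int:
    """n = Z^2 * sigma^2 / E^2, rounded up to a whole number."""
    return math.ceil((z ** 2) * (sigma ** 2) / (error ** 2))

# Hypothetical inputs: 95% confidence, sigma = 200, allowable error E = 25.
n = sample_size_for_mean(z=1.96, sigma=200.0, error=25.0)
print(f"required sample size: {n}")
```

Halving the allowable error in this sketch roughly quadruples the required sample size, since E enters the formula squared.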
Sample size determination
• Problem involving proportions:
\[ n = \frac{Z^2 \, P(1 - P)}{E^2} \]
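The proportion formula can be sketched the same way. The inputs below are hypothetical: Z = 1.96, an estimated proportion P = 0.5, and an allowable error of 5 percentage points. P = 0.5 is a common conservative choice because it maximizes P(1 - P) and therefore the required sample size.

```python
import math

def sample_size_for_proportion(z: float, p: float, error: float) -> int:
    """n = Z^2 * P * (1 - P) / E^2, rounded up to a whole number."""
    return math.ceil((z ** 2) * p * (1 - p) / (error ** 2))

# Hypothetical inputs: 95% confidence, P = 0.5, allowable error E = 0.05.
n = sample_size_for_proportion(z=1.96, p=0.5, error=0.05)
print(f"required sample size: {n}")
```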
Sampling Exercise
• Let us assume we have a population of 5 people whose names and ages are given below:

Name     Age
Abe      24
Bob      30
Cara     36
Don      42
Emily    48
Average of all samples of size = 1
• Abe 24
• Bob 30
• Cara 36
• Don 42
• Emily 48
• Average of all possible “size = 1” samples = 36
Average of all samples of size = 2
• Abe, Bob: (24+30)/2 = 27
• Abe, Cara: 30
• Abe, Don: 33
• Bob, Cara: 33
• Abe, Emily: 36
• Bob, Don: 36
• Bob, Emily: 39
• Cara, Don: 39
• Cara, Emily: 42
• Don, Emily: 45
• Average of all possible “size = 2” samples = 36
Average of all samples of size = 3
• Abe, Bob, Cara: 30
• Abe, Bob, Don: 32
• Abe, Bob, Emily: 34
• Abe, Cara, Don: 34
• Abe, Cara, Emily: 36
• Bob, Cara, Don: 36
• Bob, Cara, Emily: 38
• Abe, Don, Emily: 38
• Bob, Don, Emily: 40
• Cara, Don, Emily: 42
• Average of all possible “size = 3” samples = 36
Average of all samples of size = 4
• Abe, Bob, Cara, Don: 33
• Abe, Bob, Cara, Emily: 34.5
• Abe, Bob, Don, Emily: 36
• Abe, Cara, Don, Emily: 37.5
• Bob, Cara, Don, Emily: 39
• Average of all possible “size = 4” samples = 36
Average of all samples of size = 5
• Abe, Bob, Cara, Don, Emily: 36
• Average of all possible “size = 5” samples = 36
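The whole exercise can be reproduced by enumerating every possible sample. The Python sketch below uses the five ages from the exercise, with Emily's age taken as 48 to match the sample averages listed above; the function and variable names are illustrative.

```python
from itertools import combinations
from statistics import mean

ages = {"Abe": 24, "Bob": 30, "Cara": 36, "Don": 42, "Emily": 48}

def sample_means(size: int) -> list:
    """Mean age of every possible sample of the given size (no repeats)."""
    return [mean(sample) for sample in combinations(ages.values(), size)]

for size in range(1, len(ages) + 1):
    means = sample_means(size)
    print(
        f"size {size}: {len(means):2d} samples, "
        f"means range from {min(means)} to {max(means)}, "
        f"average of the means = {mean(means)}"
    )
```

For every sample size the average of the sample means equals the population mean, while the range of the sample means narrows as the sample size grows.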
What can be learned?
• What is the average of the sample averages for a given sample size?
• Is the mean of any individual sample equal to the population mean?
• What is the range of values for each sample size category?
Sampling Distribution
• Population distribution: A frequency distribution of all the elements of a population.
• Sample distribution: A frequency distribution of all the elements of an individual sample.
• Sampling distribution: A frequency distribution of the means of many samples.
Normal Distribution
• Central Limit Theorem: the distribution of a large number of sample means or sample proportions will approximate a normal distribution, regardless of the distribution of the population from which they were drawn, provided each sample is reasonably large.
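A small simulation can illustrate the theorem. The sketch below is built on assumptions that are not in the slides: a hypothetical, strongly skewed population (exponential with mean 1), samples of size 50, and 2,000 repeated samples. If the sample means are roughly normal, about 68% of them should fall within one standard deviation of their own mean, and their spread should be close to the population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical, strongly skewed population (exponential with mean 1).
population = [random.expovariate(1.0) for _ in range(100_000)]

def draw_sample_means(sample_size: int, num_samples: int) -> list[float]:
    """Mean of each of num_samples random samples drawn without replacement."""
    return [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(num_samples)
    ]

means_of_samples = draw_sample_means(sample_size=50, num_samples=2_000)
center = statistics.fmean(means_of_samples)
spread = statistics.stdev(means_of_samples)

# Share of sample means within one standard deviation of their own mean.
share_within_one = sum(
    abs(m - center) <= spread for m in means_of_samples
) / len(means_of_samples)

print(f"population mean        = {statistics.fmean(population):.3f}")
print(f"mean of sample means   = {center:.3f}")
print(f"sigma / sqrt(n)        = {statistics.pstdev(population) / math.sqrt(50):.3f}")
print(f"std dev of sample means = {spread:.3f}")
print(f"share within one std dev = {share_within_one:.1%}")
```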
The Standard Error of the Mean
Applies to the standard deviation of a distribution of sample means.
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
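A short sketch of the standard error of the mean follows. The standard deviation of 200 and the sample sizes are hypothetical values chosen for illustration, and the function name is not from the slides. It shows that quadrupling the sample size halves the standard error.

```python
import math

def standard_error_of_mean(sigma: float, n: int) -> float:
    """Standard deviation of the distribution of sample means: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation of 200.
for n in (100, 400):
    print(f"n = {n}: standard error = {standard_error_of_mean(200.0, n):.1f}")
```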
Sampling Distribution of the Proportion
The Standard Error of the Distribution of Proportions
Applies to the standard deviation of a distribution of sample proportions.
\[ S_p = \sqrt{\frac{P(1 - P)}{n}} \]
where:
S_p = standard error of the sampling distribution of the proportion
P = estimate of population proportion
n = sample size
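The same kind of sketch works for proportions. The inputs (P = 0.5 and n = 400) are hypothetical, and the function name is illustrative.

```python
import math

def standard_error_of_proportion(p: float, n: int) -> float:
    """Standard deviation of the distribution of sample proportions."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical inputs: estimated proportion 0.5, sample size 400.
s_p = standard_error_of_proportion(p=0.5, n=400)
print(f"standard error of the proportion = {s_p:.3f}")
```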
Sample size determination – adjusting for population size
• Make an adjustment in the sample size if the sample size is more than 5 percent of the size of the total population. This is called the Finite Population Correction (FPC).
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \]
where N is the population size and n is the sample size.
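The correction can be checked against the five-person sampling exercise, where every sample is a large share of the population. The sketch below compares the corrected formula with the actual standard deviation of all possible sample means; it takes the ages as 24, 30, 36, 42, and 48, and the function name is illustrative.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

ages = [24, 30, 36, 42, 48]  # the five-person population from the exercise

def standard_error_with_fpc(sigma: float, n: int, population_size: int) -> float:
    """sigma / sqrt(n), multiplied by the finite population correction."""
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return sigma / math.sqrt(n) * correction

sigma = pstdev(ages)  # population standard deviation
for n in range(1, len(ages) + 1):
    from_formula = standard_error_with_fpc(sigma, n, len(ages))
    # Standard deviation of the means of every possible sample of size n.
    observed = pstdev([mean(sample) for sample in combinations(ages, n)])
    print(f"n = {n}: formula = {from_formula:.3f}, observed = {observed:.3f}")
```

In this small population the corrected formula matches the observed spread of the sample means for every sample size, and it drops to zero when the sample is the whole population.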