Is there a familiar pattern
to the variability of $\bar{x}$?
• As the sample size becomes
larger, the distribution of the
sample mean becomes closer
to a normal distribution,
regardless of the population
from which the sample is
drawn.
• The central limit theorem by
Polya (1920s) is a very
important theorem which
states that the distribution of
the sample mean is approximately
normal when the sample size is large.
9 - 65
Central Limit Theorem
If a sufficiently large random sample
(i.e. n > 30) is drawn from a population
with mean $\mu$ and variance $\sigma^2$, the
distribution of the sample mean will
have the following characteristics:
1. an approximately normal
distribution regardless of the
distribution of the underlying
population.
2. $\mu_{\bar{X}} = E(\bar{X}) = \mu$
3. $\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$
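The characteristics above can be checked empirically. The following is a minimal simulation sketch and is not part of the original slides: the exponential population, the sample size of 40, the number of trials, and the name `simulate_sample_means` are all illustrative assumptions.

```python
import random
import statistics

def simulate_sample_means(n, trials, rate=1.0, seed=0):
    """Draw `trials` samples of size n from an exponential population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(trials)
    ]

# An exponential population with rate 1 has mean 1 and variance 1,
# and is strongly skewed (far from normal).
means = simulate_sample_means(n=40, trials=5000)

mean_of_means = statistics.fmean(means)      # should be close to the population mean, 1
var_of_means = statistics.pvariance(means)   # should be close to variance / n = 1/40 = 0.025

print(mean_of_means, var_of_means)
```

Plotting a histogram of `means` would show the roughly bell-shaped curve described in characteristic 1, even though the population itself is skewed.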
9 - 66
Example 7
Suppose the random variable X has a
mean of 50 and a standard deviation of 10.
Calculate the mean and the standard
deviation of the sample mean (standard
error) for each of the following sample sizes.
(Assume the population is infinite.)
a. n = 40
b. n = 55
c. n = 100
d. What happens to the size of the
standard deviation of the sample mean
(standard error) as the sample size
increases?
9 - 67
Example 7 - Solution
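We are given that X has $\mu = 50$
and $\sigma = 10$ and the population is
infinite. $SE = \sigma / \sqrt{n}$
a. $\mu_{\bar{X}} = \mu = 50$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{40}} = 1.5811$
b. $\mu_{\bar{X}} = \mu = 50$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{55}} = 1.3484$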
9 - 68
Example 7 - Solution
c. $\mu_{\bar{X}} = \mu = 50$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1$
d. It decreases–reflecting the
additional information
provided by a larger sample
size.
Summary
n = 40: $\sigma_{\bar{X}} = 1.5811$
n = 55: $\sigma_{\bar{X}} = 1.3484$
n = 100: $\sigma_{\bar{X}} = 1$
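The standard errors in Example 7 can be reproduced with a few lines of Python. This is a small sketch; the helper name `standard_error` is an assumption introduced for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean for an infinite population."""
    return sigma / math.sqrt(n)

sigma = 10
for n in (40, 55, 100):
    print(n, round(standard_error(sigma, n), 4))
# 40 1.5811
# 55 1.3484
# 100 1.0
```

Each increase in n shrinks the standard error, but only in proportion to the square root of n, so quadrupling the sample size halves the standard error.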
9 - 69
Importance of the
Central Limit Theorem
• The most important feature of
this theorem is that it can be
applied to almost any population.
• Because the theorem makes no
assumption about the shape of the
population distribution (beyond a
finite mean and variance), it is widely
applicable and is one of the
cornerstones of statistical
inference.
9 - 70
Central Limit Theorem
and Sample Size
• The only restrictive feature of
the theorem is that the sample
size must be sufficiently large
for the theorem to be
applicable.
• Even if the distribution of the
population deviates
substantially from the normal
distribution, a sample size of
30 will usually be sufficiently
large to produce a sampling
distribution for $\bar{x}$ that is
approximately normal.
9 - 71
Distribution Shapes
[Figure: population distributions (a bimodal population and an exponential population), each shown beside the distribution of the sample mean for large samples.]
9 - 72
Distribution Shapes
[Figure: population distributions (a normal population and a uniform population), each shown beside the distribution of the sample mean for large samples.]
9 - 73
Example 8
Suppose a sample of size 40 is
drawn from a population that has a
mean of 276 and a variance of 81.
What is the probability that the
mean of the sample will be less
than 273?
9 - 74
Example 8 - Solution
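We are given that a sample of
size n = 40 is drawn from a
population that has $\mu = 276$ and
$\sigma = \sqrt{81} = 9$.
By the CLT, $\bar{X}$ has an approximately
normal distribution with
$\mu_{\bar{X}} = \mu = 276$,
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{9}{\sqrt{40}} = 1.423$.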
9 - 75
Example 8 - Solution
$P(\bar{X} < 273) = P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{273 - 276}{1.423}\right)$
$= P(z < -2.11) = .5 - P(-2.11 < z < 0)$
$= .5 - .4826 = .0174$
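The table lookup in Example 8 can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are assumptions for illustration. Because it does not round z to two decimals, the result differs slightly from the table-based .0174.

```python
import math
from statistics import NormalDist

mu, variance, n = 276, 81, 40
sigma = math.sqrt(variance)                    # population standard deviation, 9

# Approximate sampling distribution of the sample mean under the CLT
x_bar_dist = NormalDist(mu, sigma / math.sqrt(n))

prob_below_273 = x_bar_dist.cdf(273)           # P(X-bar < 273), about 0.0175
print(round(prob_below_273, 4))
```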
9 - 77
Example 9
Suppose there is a normally
distributed population with a mean of
100 and a standard deviation of 10.
If $\bar{X}$ is the average of a sample of
50, find the following probabilities.
a. $P(\bar{X} \le 103)$
b. $P(\bar{X} \ge 96)$
c. $P(95 \le \bar{X} \le 103)$
9 - 78
Example 9 - Solution
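We are given that X has a
normal distribution with $\mu = 100$
and $\sigma = 10$, and that n = 50.
Because the population is normal,
$\bar{X}$ also has a normal
distribution with
$\mu_{\bar{X}} = \mu = 100$,
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{50}} = 1.4142$.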
9 - 79
Example 9 - Solution
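a.
$P(\bar{X} \le 103) = P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \le \frac{103 - 100}{1.4142}\right)$
$= P(z \le 2.12) = .5 + P(0 < z < 2.12)$
$= .5 + .4830 = .9830$
b.
$P(\bar{X} \ge 96) = P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \ge \frac{96 - 100}{1.4142}\right)$
$= P(z \ge -2.83) = .5 + P(-2.83 < z < 0)$
$= .5 + .4977 = .9977$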
9 - 80
Example 9 - Solution
c.
$P(95 \le \bar{X} \le 103) = P\left(\frac{95 - 100}{1.4142} \le \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \le \frac{103 - 100}{1.4142}\right)$
$= P(-3.54 \le z \le 2.12)$
$= P(-3.54 < z < 0) + P(0 < z < 2.12)$
$\approx .5 + .4830 = .9830$
9 - 81
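The three probabilities in Example 9 can be checked without a z table. This is a minimal sketch using the standard-library `statistics.NormalDist`; the names `p_a`, `p_b`, and `p_c` are assumptions for illustration, and the results differ from the slide values only by table rounding.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 10, 50
x_bar_dist = NormalDist(mu, sigma / math.sqrt(n))   # standard error is about 1.4142

p_a = x_bar_dist.cdf(103)                           # P(X-bar <= 103)
p_b = 1 - x_bar_dist.cdf(96)                        # P(X-bar >= 96)
p_c = x_bar_dist.cdf(103) - x_bar_dist.cdf(95)      # P(95 <= X-bar <= 103)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```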
Example 10
A travel agency conducted a
survey of the prices charged by
ocean cruise ship lines and
determined they were
approximately normally
distributed with a mean of $110
per day and a standard
deviation of $20 per day.
9 - 82
Example 10 - Questions
1. If an ocean cruise ship
line is chosen at random,
what is the probability that
it will charge less than
$99 per day?
2. What is the probability
that the average charge
for a randomly selected
sample of 35 ocean cruise
ship lines will be less than
$99 per day?
9 - 83
Example 10 - Solution
1.
$P(X < 99) = P\left(\frac{X - \mu}{\sigma} < \frac{99 - 110}{20}\right)$
$= P(z < -.55)$
$= .5 - P(-.55 < z < 0)$
$= .5 - .2088 = .2912$
9 - 84
Example 10 - Solution
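2.
By the CLT, $\bar{X}$ has a normal
distribution with
$\mu_{\bar{X}} = \mu = 110$,
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{35}} = 3.381$.
$P(\bar{X} < 99) = P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{99 - 110}{3.381}\right)$
$= P(z < -3.25) = .5 - P(-3.25 < z < 0)$
$= .5 - .4994 = .0006$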
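Example 10 contrasts a single observation with a sample mean. The sketch below computes both probabilities side by side with the standard-library `statistics.NormalDist`; the variable names are assumptions for illustration.

```python
import math
from statistics import NormalDist

mu, sigma, n = 110, 20, 35

# Question 1: one cruise line chosen at random
p_single = NormalDist(mu, sigma).cdf(99)

# Question 2: the mean of a sample of 35 cruise lines
p_sample_mean = NormalDist(mu, sigma / math.sqrt(n)).cdf(99)

print(round(p_single, 4), round(p_sample_mean, 4))
```

The same cutoff of $99 is fairly ordinary for one cruise line but very unusual for an average of 35, because the standard error (about 3.38) is much smaller than the population standard deviation (20).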
9 - 85
The Distribution of the
Sample Proportion
9 - 86
Proportions
• There are many instances in
which the variable of interest is
a proportion.
• Examples:
– A marketing researcher may be
interested in what proportion of
persons on a mailing list will buy
their product.
– A college is concerned with the
fraction of freshmen that will be
in academic difficulty after the
first year.
9 - 87
Population Proportions
and Sample Proportions
• Population proportions must
be estimated just like
population means.
• The sample proportion is a
reasonable estimate of the
population proportion.
• Sample proportions vary
depending on the selected
samples.
9 - 88
Symbols
The symbols used to represent
the population and sample
proportions are
$p$ - population proportion,
$\hat{p}$ - sample proportion.
9 - 89
How do you determine
a sample proportion?
When calculating a proportion,
the number in the sample that
possesses the characteristic of
interest goes in the numerator,
and the size of the sample is
placed in the denominator.
p = x
n
where x is the number in
the sample possessing the
characteristic of interest
9 - 90
What is the central
value of $\hat{p}$?
• The expected value (mean) of
the sample proportion is the
population proportion.
$E(\hat{p}) = p$
• Since the expected value of
the estimator $\hat{p}$ is equal to p,
$\hat{p}$ is an unbiased
estimator of p.
9 - 91
What is the variance of p ?
• The variance of p is given by
p (1 p )
s 
.
n
2
p
• If the population proportion is
unknown (which is usually the
case), p can be estimated by p ,
and the variance of the sample
proportion is estimated as
p (1 p )
s 
.
n
2
p
9 - 92
Is there a familiar pattern
to the variability of $\hat{p}$?
• The sampling distribution of $\hat{p}$
approaches normality as n
becomes sufficiently large.
• The sample size is generally
considered “sufficiently large”
if $np \ge 5$ and $n(1-p) \ge 5$.
[Figure: sampling distribution of $\hat{p}$]
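The "sufficiently large" rule of thumb is easy to encode. The helper below is a sketch; the function name `normal_approximation_ok` and the `threshold` parameter are assumptions introduced for illustration.

```python
def normal_approximation_ok(n, p, threshold=5):
    """Return True when both n*p and n*(1-p) reach the rule-of-thumb threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approximation_ok(30, 0.30))   # True:  np = 9, n(1-p) = 21
print(normal_approximation_ok(10, 0.30))   # False: np = 3
```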
9 - 93
Sampling distribution of
the Sample Proportion
If the population is infinite and
the sample is sufficiently large,
the distribution of $\hat{p}$ has the
following characteristics:
1. an approximately normal
distribution.
2. $\mu_{\hat{p}} = E(\hat{p}) = p$.
3. $\sigma^2_{\hat{p}} = \frac{p(1-p)}{n} \approx \frac{\hat{p}(1-\hat{p})}{n}$.
9 - 94
Sampling Distribution of
the Sample Proportion
If the population is finite and the sample
is sufficiently large, the distribution of $\hat{p}$
has the following characteristics:
1. an approximately normal
distribution.
2. $\mu_{\hat{p}} = E(\hat{p}) = p$.
3. $\sigma^2_{\hat{p}} = \frac{N-n}{N-1} \cdot \frac{p(1-p)}{n} \approx \frac{N-n}{N-1} \cdot \frac{\hat{p}(1-\hat{p})}{n}$,
where N is the size of the population.
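The infinite-population and finite-population formulas differ only by the factor (N - n)/(N - 1). The sketch below applies that factor when a population size is supplied; the function name `proportion_std_error` and the population size of 1000 are assumptions chosen for illustration.

```python
import math

def proportion_std_error(p, n, N=None):
    """Standard deviation of the sample proportion.

    If N (population size) is given, the finite population
    correction (N - n) / (N - 1) is applied to the variance.
    """
    variance = p * (1 - p) / n
    if N is not None:
        variance *= (N - n) / (N - 1)
    return math.sqrt(variance)

infinite_se = proportion_std_error(0.35, 75)           # no correction
finite_se = proportion_std_error(0.35, 75, N=1000)     # hypothetical population of 1000
print(round(infinite_se, 4), round(finite_se, 4))
```

The correction always reduces the standard error, and it reaches zero when the sample is the whole population (n = N).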
9 - 95
Since p is a good
estimator of p ...
Can limits be established for
the error in estimation?
Since the sampling distribution
of p is known, determining
probabilities for various errors
of estimation can be
determined.
9 - 96
Example 11
A random sample of 100
employees of a large steel
company has 30 females
and 70 males.
1. Find the sample
proportion of
female employees.
2. Find the sample
proportion of
male employees.
9 - 97
Example 11 - Solution
1. $\hat{p} = \frac{30}{100} = .30$
2. $\hat{p} = 1 - \frac{30}{100} = .70$
9 - 98
Example 12
Suppose that the true proportion of
Americans over 25 years old that
have a 4 year college degree is .35.
Find the mean and the standard
deviation of the sample proportion for
samples of the following sizes.
a. n = 38
b. n = 52
c. n = 75
d. What happens to the size of the
standard deviation of the sample
proportion as the sample size
increases?
9 - 99
Example 12 - Solution
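a.
$\mu_{\hat{p}} = p = .35$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.35(1-.35)}{38}} = .0774$
b.
$\mu_{\hat{p}} = p = .35$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.35(1-.35)}{52}} = .0661$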
9 - 100
Example 12 - Solution
c.
$\mu_{\hat{p}} = p = .35$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.35(1-.35)}{75}} = .0551$
d.
It decreases–reflecting the
additional information provided
by the larger sample size.
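The three standard deviations in Example 12 can be recomputed directly from the formula. This is a short sketch; the dictionary name `std_devs` is an assumption for illustration.

```python
import math

p = 0.35
std_devs = {n: math.sqrt(p * (1 - p) / n) for n in (38, 52, 75)}

for n, sd in std_devs.items():
    print(n, round(sd, 4))
# 38 0.0774
# 52 0.0661
# 75 0.0551
```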
9 - 101
Example 13
Suppose that the true population
proportion, p = .30.
What is the probability that the
sample proportion of a sample of
size 30 will be less than .20?
9 - 102
Example 13 - Solution
$\mu_{\hat{p}} = p = .30$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.3(1-.3)}{30}} = .08367$
$\hat{p}$ has an approximately normal
distribution because
np = (30)(.3) = 9, and
n(1 - p) = (30)(.7) = 21
are both greater than or equal to 5.
9 - 103
Example 13 - Solution
• $z = \frac{0.2 - 0.3}{0.08367} = -1.195172$, rounded to $-1.20$
• The area from 0 to 1.20 in Table A is 0.3849
• Tail area = 0.5 - 0.3849 = 0.1151; this is the area in the
left tail
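As a cross-check on Example 13, the sketch below computes the normal-approximation probability without rounding z, and compares it with the exact binomial probability. The exact calculation is an addition for comparison and is not part of the original slides; the variable names are assumptions.

```python
import math
from statistics import NormalDist

p, n, cutoff = 0.30, 30, 0.20

# Normal approximation, as on the slide (but with z left unrounded)
se = math.sqrt(p * (1 - p) / n)
normal_approx = NormalDist(p, se).cdf(cutoff)          # about 0.116

# Exact binomial: a sample proportion below .20 with n = 30
# means at most 5 of the 30 sampled units have the characteristic.
max_count = 5
exact = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(max_count + 1)
)                                                      # about 0.077

print(round(normal_approx, 4), round(exact, 4))
```

The gap between the two values is a reminder that with n = 30 the normal curve is only an approximation to the discrete distribution of the sample proportion.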
9 - 104
Example 14
• The property manager of a large
office building would like to make
the building smoke free; however,
he does not want to upset too many
of his customers.
• He decides to randomly select 50 of
the workers in the building and ask
them whether or not they smoke.
• If the sample proportion of workers
who smoke is less than .30, the
property manager will make the
building smoke free.
9 - 105
Example 14
1. Find the probability
that the property
manager will make
the building smoke
free when the true
proportion of
smokers is .50.
2. Find the probability that the
property manager will not
make the building smoke free
when the true proportion of
smokers is .20.
9 - 106
Example 14 - Solution
1.
Because
np = (50)(.50) = 25 and
n(1-p) = (50)(.50) = 25
are both greater than or equal to
5, we can assume that $\hat{p}$ has an
approximately normal distribution
with
$\mu_{\hat{p}} = p = .50$,
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.5(1-.5)}{50}} = .0707$.
9 - 107
Example 14 - Solution
1.
The property manager will make
the building smoke free if $\hat{p}$ is
less than .30.
$P(\hat{p} < .30) = P\left(\frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} < \frac{.3 - .5}{.0707}\right)$
$= P(z < -2.83)$
$= .5 - P(-2.83 < z < 0)$
$= .5 - .4977 = .0023$
9 - 108
Example 14 - Solution
2.
Because
np = (50)(.20) = 10 and
n(1-p) = (50)(.80) = 40
are both greater than or equal to
5, we can assume that $\hat{p}$ has an
approximately normal distribution
with
$\mu_{\hat{p}} = p = .20$,
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.2(1-.2)}{50}} = .0566$.
9 - 109
Example 14 - Solution
2.
The property manager will not
make the building smoke free if $\hat{p}$
is .30 or greater.
$P(\hat{p} \ge .30) = P\left(\frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} \ge \frac{.3 - .2}{.0566}\right)$
$= P(z \ge 1.77)$
$= .5 - P(0 < z < 1.77)$
$= .5 - .4616 = .0384$
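Both parts of Example 14 use the same decision rule under different true proportions. The sketch below wraps the normal approximation in a helper; the function name `prob_smoke_free` is an assumption for illustration, and the results differ from the slides only by z-table rounding.

```python
import math
from statistics import NormalDist

def prob_smoke_free(true_p, n=50, cutoff=0.30):
    """Approximate probability that the sample proportion falls below the cutoff."""
    se = math.sqrt(true_p * (1 - true_p) / n)
    return NormalDist(true_p, se).cdf(cutoff)

part_1 = prob_smoke_free(0.50)        # building goes smoke free although half smoke
part_2 = 1 - prob_smoke_free(0.20)    # building stays as is although only 20% smoke

print(round(part_1, 4), round(part_2, 4))
```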
9 - 110
Other Forms of
Sampling
9 - 111
Probability Samples
• Probability samples enable
an analyst to determine the
probable errors that an
estimator might generate.
• They allow the analyst a
known degree of confidence in
their estimation.
• All statistical inference relies
on probability sampling.
9 - 112
Types of Probability
Samples
• Cluster sampling involves dividing
the population into clusters, and
randomly selecting a sample of
clusters to represent the population.
• In stratified sampling, the
population is divided into strata,
which are sub-populations.
• A stratum can be defined by any identifiable
characteristic that can be used to
classify the population.
• If the population consisted of
people, then strata could be defined by sex,
income, political party, religion,
education, race, or location.
9 - 113
Pros and Cons of
Cluster Sampling
• Cluster sampling can be as
effective as simple random
sampling if the clusters are as
heterogeneous as the
population; however, clusters are
almost never as diverse as the
population.
• Smaller cluster sizes will result in
more representative samples.
• Cluster sampling simplifies the
task of constructing the sampling
frame, since the initial frame is
composed only of clusters.
9 - 114
Stratified Sampling
Stratified sampling can provide
greater accuracy if the
population is heterogeneous,
and sub-populations of the
population can be identified that
are relatively homogeneous.
9 - 115
Non-probability
Samples
• Non-probability samples are
a convenient means of
obtaining sample data.
• If data from a non-probability
sample are used to estimate a
population parameter, there is
no statistical theory that helps
define the potential error of the
estimate and hence no
statement about an estimate’s
reliability can be made.
9 - 116
Types of
Non-probability
Samples
• A judgment sample is a
sample in which sample
values are selected by an
expert in the field.
• A convenience sample is a
convenient group of
observations.
• One of the worst forms of non-probability sample is the
voluntary or self-selected
sample.
9 - 117
Almost Random
Samples
• The systematic sample does
not clearly belong to either
probability or non-probability
samples.
• In a systematic sample, every
kth member of the population is
included in the sample.
• Note: If there is some pattern
in the sampling frame that
corresponds to the sampling
pattern, an unrepresentative
sample may result.
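Systematic sampling can be sketched in a few lines. The random starting position used below is a common convention but is an assumption here, since the slide only says that every kth member is included; the function name and the toy frame are also illustrative.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select every k-th member of the sampling frame,
    starting from a random position among the first k members."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return frame[start::k]

frame = list(range(1, 101))                 # hypothetical frame of 100 numbered members
sample = systematic_sample(frame, k=10, seed=3)
print(sample)
```

If the frame were ordered with a cycle of length 10 (for example, every tenth entry being a corner house), this sample would pick up the same kind of member every time, which is the risk noted above.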
9 - 118
Example 15 (a - c)
A social researcher in Florida
wants to determine the average
number of children per family in
the state.
a. What is the population of
interest?
b. What variable will be
measured?
c. What level of measurement is
the variable of interest?
9 - 119
Example 15 (a - c)
Solution
a. Population - families in the
state of Florida
b. Variable measured - number
of children per family
c. Level of measurement - ratio
9 - 120
Example 15 (d)
d. What are the steps that
would be necessary for each
of the following sampling
methods:
1. Simple random sampling
2. Cluster sampling
3. Stratified sampling
9 - 121
Example 15 (d)
Solution
1. Simple Random Sample
– List all families in the state of
Florida (perhaps from a census,
phone books, tax returns, etc.).
– Assign sequential numbers to all
of the families (1 to N).
– Select n random numbers
between 1 and N from a random
number table (or generate
these).
– Select the families corresponding
to the random numbers.
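The simple random sampling steps above can be sketched as follows. The frame size of 10,000 families and the function name are hypothetical; `random.Random.sample` from the standard library draws without replacement.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Pick n distinct family numbers between 1 and N, each equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))

# Hypothetical frame: families numbered 1 to 10,000
selected = simple_random_sample(N=10_000, n=25, seed=1)
print(selected)
```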
9 - 122
Example 15 (d)
Solution
2. Cluster Sampling
– e.g. Take a map and divide the
state of Florida into 1000 regions.
– Number the regions from 1 to 1000.
– Select n random numbers between
1 and 1000.
– Select the n regions corresponding
to the random numbers.
– Survey every family in the regions
indicated by the random numbers.
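The cluster sampling steps above can be sketched as follows. The region contents are made up for illustration (three placeholder families per region), and the function name `cluster_sample` is an assumption.

```python
import random

def cluster_sample(regions, n_clusters, seed=None):
    """regions maps a region number to the list of families living there.
    Randomly choose n_clusters regions and survey every family in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(regions), n_clusters)
    return [family for region in chosen for family in regions[region]]

# Hypothetical frame: 1000 regions, each with 3 placeholder families
regions = {r: [f"family-{r}-{i}" for i in range(3)] for r in range(1, 1001)}
surveyed = cluster_sample(regions, n_clusters=5, seed=42)
print(len(surveyed))   # 15
```

Only the list of regions is needed up front; families are enumerated only inside the chosen regions, which is why the sampling frame is simpler to build.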
9 - 123
Example 15 (d)
Solution
3. Stratified Sampling
– e.g. Separate all families in the
state by income level.
– Number each family within each
income level.
– Select e.g. 100 random numbers
for each income level.
– Select the 100 families for each
income level indicated by the
random numbers.
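The stratified sampling steps above can be sketched as follows. The three income levels, their sizes, and the function name `stratified_sample` are assumptions made up for illustration.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """strata maps an income level to the list of families in it.
    Draw a simple random sample of per_stratum families from each level."""
    rng = random.Random(seed)
    return {
        level: rng.sample(families, min(per_stratum, len(families)))
        for level, families in strata.items()
    }

# Hypothetical strata with placeholder family identifiers
strata = {
    "low": [f"low-{i}" for i in range(500)],
    "middle": [f"middle-{i}" for i in range(800)],
    "high": [f"high-{i}" for i in range(300)],
}
sample = stratified_sample(strata, per_stratum=100, seed=42)
print({level: len(chosen) for level, chosen in sample.items()})
```

Unlike cluster sampling, every stratum is represented in the sample, which is what gives the method its accuracy when strata are internally homogeneous.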
9 - 124
Example 15 (e)
What sampling method do
you believe would be most
cost effective?
9 - 125
Example 15 (e)
Solution
The most cost effective
method would be cluster
sampling.
9 - 126
Example 16
• A biology professor is
interested in the proportion of
students at his college who are
pre-med. majors.
• In his next class
he asks the
students who are
pre-med. majors
to raise their hands.
• Fifty percent of the students
raise their hands.
9 - 127
Example 16
1. What type of sampling
technique was used for this
survey?
2. What type of biases may be
present in the responses?
3. Is 50% a reasonable point
estimate of the proportion of
students at the college who are
pre-med. majors? Explain.
9 - 128
Example 16 - Solution
1. Convenience
2. If the Biology course is a
required course for all
majors, then there may be a
larger proportion of freshmen
and sophomores in the class
than in the college population
as a whole.
9 - 129
Example 16 - Solution
2. If the Biology course is not a
required course for all
majors, then there may be a
larger proportion of students
in the class who are in
majors which require the
course, than in the college
population as a whole.
3. No. For the reasons cited in
part 2.
9 - 130