Dr. Ka-fu Wong
ECON1003
Analysis of Economic Data
Ka-fu Wong © 2003
Chap 8- 1
Chapter Eight
Sampling Methods and the Central Limit Theorem
GOALS
• Explain why a sample is the only feasible way to learn about a population.
• Describe methods to select a sample.
• Define and construct a sampling distribution of the sample mean.
• Explain the central limit theorem.
• Use the Central Limit Theorem to find probabilities of selecting possible sample means from a specified population.
Why Sample the Population?
• The physical impossibility of checking all items in the population.
• The cost of studying all the items in a population.
• The sample results are usually adequate.
• Contacting the whole population would often be time-consuming.
• The destructive nature of certain tests.
Probability Sampling
• A probability sample is a sample selected such that each item or person in the population being studied has a known likelihood of being included in the sample.
Methods of Probability Sampling
• Simple Random Sample: A sample formulated so that each item or person in the population has the same chance of being included.
• Systematic Random Sampling: The items or individuals of the population are arranged in some order. A random starting point is selected and then every kth member of the population is selected for the sample.
Methods of Probability Sampling
• Stratified Random Sampling: A population is first divided into subgroups, called strata, and a sample is selected from each stratum.
• Cluster Sampling: A population is first divided into primary units, and then samples are selected from the primary units.
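The four methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 labelled items, the sample sizes, the two strata, and the clusters of 20 are all hypothetical choices that do not come from the slides.

```python
import random

rng = random.Random(8)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population: items labelled 1..100

# Simple random sample: every item has the same chance of being included.
simple = rng.sample(population, 10)

# Systematic random sample: random starting point, then every k-th item.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified random sample: split into strata, then sample within each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = {name: rng.sample(items, 5) for name, items in strata.items()}

# Cluster sample: split into primary units, pick some units at random,
# then sample only within the chosen units.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
chosen = rng.sample(clusters, 2)
cluster_sample = [x for unit in chosen for x in rng.sample(unit, 5)]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```

Note the difference between the last two schemes: the stratified sample draws from every subgroup, while the cluster sample draws only from the primary units that happened to be selected.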
Potential problems with the sampling method of “Sampling Straws”
• The choice of sampling method is important.
• As illustrated in the tutorial “Sampling Straws” experiments, some sampling methods can produce a biased estimate of the population parameters.
• The bag contains a total of 12 straws: 4 are 4 inches long, 4 are 2 inches long, and 4 are 1 inch long.
• The population mean length is 2.33 inches (= 4*(1+2+4)/12).
• Randomly draw 4 straws one by one with replacement.
• Compute the sample mean.
• The average of the sample means across the experiments is 2.37.
• The corresponding standard deviation is 0.61.
“Sampling Straws” experiments (continued)
• The sampling scheme is biased because the longer straws have a higher chance of being drawn, even if the draw is otherwise random (say, you pull out the first straw you touch).
• The draw may also not be random because we can feel the length of a straw before we pull it out.
“Sampling Straws” experiments (continued)
• Alternative sampling scheme:
• Label the straws 1 to 12.
• Label 12 identical balls 1 to 12.
• Draw four balls with replacement.
• Measure the corresponding straws and compute the sample mean.
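As a rough check on the ball-drawing scheme, the sketch below enumerates every equally likely draw of four balls with replacement rather than simulating it. The assignment of labels to straw lengths is an assumption made for illustration. Under this scheme the mean of the sample means comes out at about 2.33, the population mean, with a standard deviation of about 0.62.

```python
from itertools import product
from statistics import fmean, pstdev

# Straw lengths in inches, indexed by ball label 1..12
# (which label goes with which straw is an assumption).
lengths = [4] * 4 + [2] * 4 + [1] * 4

population_mean = fmean(lengths)

# Every ordered draw of four balls with replacement is equally likely:
# 12**4 = 20736 outcomes in total.
sample_means = [fmean(draw) for draw in product(lengths, repeat=4)]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

print(round(population_mean, 2), round(mean_of_means, 2), round(sd_of_means, 2))
```

Because each ball is equally likely to be drawn regardless of the length of its straw, this scheme has no built-in tendency to favor long straws.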
Methods of Probability Sampling
• In a nonprobability sample, inclusion in the sample is based on the judgment of the person selecting the sample.
• The sampling error is the difference between a sample statistic and its corresponding population parameter.
• Sampling error is almost always nonzero.
Sampling Distribution of the Sample Means
• The sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size selected from a population.
EXAMPLE 1
• The law firm of Hoya and Associates has five partners. At their weekly partners' meeting, each reported the number of hours they billed clients for their services last week.

   Partner        Hours
1. Dunn           22
2. Hardy          26
3. Kiers          30
4. Malinowski     26
5. Tillman        22

• The population mean is 25.2 hours:
\[ \mu = \frac{22 + 26 + 30 + 26 + 22}{5} = 25.2 \]
Example 1
• If two partners are selected randomly, how many different samples are possible?
This is the number of combinations of 5 objects taken 2 at a time. That is:
\[ {}_{5}C_{2} = \frac{5!}{2!\,(5-2)!} = 10 \]
There are a total of 10 different samples.
Example 1 continued

Partners   Total   Mean
1,2        48      24
1,3        52      26
1,4        48      24
1,5        44      22
2,3        56      28
2,4        52      26
2,5        48      24
3,4        56      28
3,5        52      26
4,5        48      24
EXAMPLE 1 continued
• Organize the sample means into a sampling distribution.

Sample Mean   Frequency   Relative Frequency (probability)
22            1           1/10
24            4           4/10
26            3           3/10
28            2           2/10

• The mean of the sample means is 25.2 hours:
\[ \mu_{\bar{X}} = \frac{22(1) + 24(4) + 26(3) + 28(2)}{10} = 25.2 \]
The mean of the sample means is exactly equal to the population mean.
Example 1
• Population variance:
\[ \sigma^2 = \frac{(22-25.2)^2 + (26-25.2)^2 + \dots + (22-25.2)^2}{5} = 8.96 \]
• Variance of the sample means:
\[ \sigma_{\bar{X}}^2 = \frac{(1)(22-25.2)^2 + (4)(24-25.2)^2 + (3)(26-25.2)^2 + (2)(28-25.2)^2}{1+4+3+2} = 3.36 \]
• The variance of the sample means is smaller than the population variance: 3.36/8.96 = 0.375 < 1.
Note that this is sampling without replacement.
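The numbers in Example 1 can be reproduced by enumerating all ten samples. The sketch below is one possible way to do it in Python. The final comparison with the finite population correction factor (N - n)/(N - 1) is an addition for context and is not part of the original slides.

```python
from itertools import combinations
from statistics import fmean, pvariance

hours = {"Dunn": 22, "Hardy": 26, "Kiers": 30, "Malinowski": 26, "Tillman": 22}
values = list(hours.values())
N, n = len(values), 2

# All samples of two partners, drawn without replacement (order ignored).
samples = list(combinations(values, n))
sample_means = [fmean(s) for s in samples]

pop_mean = fmean(values)
pop_var = pvariance(values)
mean_of_means = fmean(sample_means)
var_of_means = pvariance(sample_means)

# Not in the slides: without replacement, the variance of the sample mean
# equals (population variance / n) times (N - n) / (N - 1).
corrected = pop_var / n * (N - n) / (N - 1)

print(len(samples), round(pop_mean, 2), round(mean_of_means, 2))
print(round(pop_var, 2), round(var_of_means, 2), round(corrected, 2))
```

The ratio 0.375 is therefore (1/2)(3/4): the factor 1/n from averaging two observations, times the correction for drawing from a population of only five without replacement.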
Example
• Suppose we had a uniformly distributed population containing equal proportions (hence equally probable instances) of (0,1,2,3,4). If you were to draw a very large number of random samples from this population, each of size n=2, the possible combinations of drawn values and their sums are:

Sums   Combinations
0      0,0
1      0,1 1,0
2      1,1 2,0 0,2
3      1,2 2,1 3,0 0,3
4      1,3 3,1 2,2 4,0 0,4
5      1,4 4,1 3,2 2,3
6      3,3 4,2 2,4
7      3,4 4,3
8      4,4

Note that this is sampling with replacement.
Example
• Population mean = mean of sample means
• Population mean:
\[ \mu = \frac{0+1+2+3+4}{5} = 2 \]
• Mean of sample means:
\[ \mu_{\bar{X}} = \frac{(1)(0) + (2)(0.5) + \dots + (1)(4)}{25} = 2 \]
• Variance of sample means = population variance / sample size
• Population variance:
\[ \sigma^2 = \frac{(0-2)^2 + (1-2)^2 + \dots + (4-2)^2}{5} = 2 \]
• Variance of sample means:
\[ \sigma_{\bar{X}}^2 = \frac{(1)(0-2)^2 + (2)(0.5-2)^2 + \dots + (1)(4-2)^2}{25} = 1 \]

Means   Combinations
0.0     0,0
0.5     0,1 1,0
1.0     1,1 2,0 0,2
1.5     1,2 2,1 3,0 0,3
2.0     1,3 3,1 2,2 4,0 0,4
2.5     1,4 4,1 3,2 2,3
3.0     3,3 4,2 2,4
3.5     3,4 4,3
4.0     4,4
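A short enumeration can confirm both identities for this example. The sketch below lists all 25 equally likely ordered samples of size 2 drawn with replacement. The variable names are illustrative only.

```python
from itertools import product
from statistics import fmean, pvariance

population = [0, 1, 2, 3, 4]
n = 2

# All 5**2 = 25 equally likely ordered samples drawn with replacement.
sample_means = [fmean(s) for s in product(population, repeat=n)]

pop_mean = fmean(population)
pop_var = pvariance(population)
mean_of_means = fmean(sample_means)
var_of_means = pvariance(sample_means)

print(pop_mean, mean_of_means)   # both equal 2
print(pop_var, var_of_means)     # 2 and 1, so var_of_means equals pop_var / n
```

Unlike the law-firm example, no correction factor is needed here, because sampling with replacement makes the two draws independent.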
Probability Histograms
• In a probability histogram, the area of each bar represents the chance of a value occurring as a result of the random (chance) process.
• Empirical histograms (from observed data) for a process converge to the probability histogram as the number of observations grows.
Examples of empirical histograms
• Roll a fair die: 20, 50, 200 times
[Figure: three empirical histograms of the outcomes of a fair die, in panels labelled 20 times, 50 times, and 200 times; horizontal axis DIE (1 to 6), vertical axis Percent.]
The empirical histogram will approach the probability histogram as the number of draws increases.
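The convergence of the empirical histogram can be tried out with a small simulation. This is a sketch only: the seed and the extra run of 20000 rolls are arbitrary choices, and the exact percentages depend on the random draws.

```python
import random
from collections import Counter

rng = random.Random(2003)  # arbitrary seed, chosen only for repeatability

def empirical_percent(rolls):
    """Return the percentage of rolls landing on each face 1..6."""
    counts = Counter(rng.randint(1, 6) for _ in range(rolls))
    return {face: 100 * counts[face] / rolls for face in range(1, 7)}

for rolls in (20, 50, 200, 20000):
    percents = empirical_percent(rolls)
    # Largest distance from the probability histogram, where every bar is 100/6.
    worst_gap = max(abs(p - 100 / 6) for p in percents.values())
    print(rolls, {f: round(p, 1) for f, p in percents.items()}, round(worst_gap, 1))
```

With only 20 rolls the bars are typically far from 16.7 percent each, while for very large numbers of rolls the largest gap tends to shrink toward zero.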
Empirical histogram of sum
• Roll a fair die 20 times and sum the rolls
[Figure: empirical histogram of the sum of 20 rolls; horizontal axis DIE_SUM (about 45 to 95), vertical axis Percent (0 to 6).]
Empirical histogram of average
• Roll a fair die 20 times and average the rolls
[Figure: empirical histogram of the average of 20 rolls; horizontal axis from about 2.25 to 4.75 (labelled DIE_SUM in the original), vertical axis Percent (0 to 6).]
Distribution of sample means for different sample sizes and different population distributions
• http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
• http://www.kuleuven.ac.be/ucs/java/index.htm (choose “basic” and then “distribution of mean”)
• http://faculty.vassar.edu/lowry/central.html
Central Limit Theorem
• For a population with a mean $\mu$ and a variance $\sigma^2$, the sampling distribution of the means of all possible samples of size n generated from the population will be approximately normally distributed when n is sufficiently large.
• The mean of the sampling distribution is equal to $\mu$ and the variance is equal to $\sigma^2/n$.
The population distribution:
\[ X \sim N(\mu, \sigma^2) \]
The sample mean of n observations:
\[ \bar{X}_n \sim N(\mu, \sigma^2/n) \]
Central Limit Theorem: Sums
• For a large number of random draws, with replacement, the distribution of the sum approximately follows the normal distribution.
• The mean of the normal distribution is n × (expected value for one repetition).
• The SD for the sum (SE) is $\sqrt{n}\,\sigma$.
• This holds even if the underlying population is not normally distributed.
Central Limit Theorem: Averages
• For a large number of random draws, with replacement, the distribution of the average = (sum)/n approximately follows the normal distribution.
• The mean for this normal distribution is the expected value for one repetition.
• The SD for the average (SE) is $\sigma/\sqrt{n}$.
• This holds even if the underlying population is not normally distributed.
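The statements about sums and averages can be checked on the fair-die example used earlier. The sketch below builds the exact distribution of the sum of n = 20 rolls by repeated convolution, so no randomness is involved. The choice of a die and of n = 20 simply mirrors the histogram slides, and the "within one SD" comparison is an informal illustration rather than a formal test of normality.

```python
from math import sqrt

faces = range(1, 7)
n = 20

# Expected value and SD for one roll of a fair die.
mu = sum(faces) / 6
sigma = sqrt(sum((x - mu) ** 2 for x in faces) / 6)

# Exact distribution of the sum of n rolls, built by repeated convolution.
dist = {0: 1.0}
for _ in range(n):
    new = {}
    for total, p in dist.items():
        for x in faces:
            new[total + x] = new.get(total + x, 0.0) + p / 6
    dist = new

mean_sum = sum(s * p for s, p in dist.items())
sd_sum = sqrt(sum((s - mean_sum) ** 2 * p for s, p in dist.items()))

# The average is the sum divided by n, so its mean and SD scale by 1/n.
mean_avg, sd_avg = mean_sum / n, sd_sum / n

# Share of outcomes within one SD of the mean; about 0.68 under a normal curve.
within_one_sd = sum(p for s, p in dist.items() if abs(s - mean_sum) <= sd_sum)

print(mean_sum, n * mu)              # mean of the sum vs n * expected value
print(sd_sum, sqrt(n) * sigma)       # SD of the sum vs sqrt(n) * sigma
print(sd_avg, sigma / sqrt(n))       # SD of the average vs sigma / sqrt(n)
print(within_one_sd)
```

The mean and SD relations hold exactly for any n. Only the normal shape is an approximation, and it is already close for 20 rolls even though a single roll is uniformly, not normally, distributed.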
Law of large numbers
• The sample mean converges to the population mean as n gets large.
• For a large number of random draws from any population, with replacement, the distribution of the average = (sum)/n approximately follows the normal distribution.
• The mean for this normal distribution is the expected value for one repetition.
• The SD for the average (SE) is $\sigma/\sqrt{n}$.
• The SD for the average tends to zero as n increases.
• This holds even if the underlying population is not normally distributed.
Point Estimates
• Examples of point estimates are the sample mean, the sample standard deviation, the sample variance, and the sample proportion.
• A point estimate is one value (a single point) that is used to estimate a population parameter.
Point Estimates
• If a population follows the normal distribution, the sampling distribution of the sample mean will also follow the normal distribution.
• To determine the probability that a sample mean falls within a particular region, use:
\[ z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
Point Estimates
• If the population does not follow the normal distribution, but the sample has at least 30 observations, the sample means will approximately follow the normal distribution.
• To determine the probability that a sample mean falls within a particular region, use:
\[ z = \frac{\bar{X} - \mu}{s/\sqrt{n}} \]
Example 2
• Suppose the mean selling price of a gallon of gasoline in the United States is $1.30. Further, assume the distribution is positively skewed, with a standard deviation of $0.28. What is the probability of selecting a sample of 35 gasoline stations and finding the sample mean within $0.08 of the population mean?
Example 2 continued
• The first step is to find the z-values corresponding to $1.22 and $1.38. These are the two points within $0.08 of the population mean.
\[ z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\$1.38 - \$1.30}{\$0.28/\sqrt{35}} = 1.69 \]
\[ z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\$1.22 - \$1.30}{\$0.28/\sqrt{35}} = -1.69 \]
Example 2 continued
• Next we determine the probability of a z-value between -1.69 and 1.69. It is:
\[ P(-1.69 \le z \le 1.69) = 2(0.4545) = 0.9090 \]
• We would expect about 91 percent of the sample means to be within $0.08 of the population mean.
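The calculation in Example 2 can be repeated without a z-table. The sketch below uses the error function from Python's standard library to evaluate the standard normal cumulative distribution. The helper name normal_cdf is an illustrative choice, not something from the slides. It should give z of about 1.69 and a probability of about 0.909.

```python
from math import erf, sqrt

mu, s, n = 1.30, 0.28, 35   # values from Example 2
margin = 0.08               # "within $0.08" of the population mean

standard_error = s / sqrt(n)
z = margin / standard_error

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Probability that the sample mean lies between mu - margin and mu + margin.
prob = normal_cdf(z) - normal_cdf(-z)

print(round(z, 2), round(prob, 4))
```

The normal approximation is used here even though the population is skewed, because the sample of 35 stations exceeds the 30-observation guideline stated earlier.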
Chapter Eight
Sampling Methods and the Central Limit Theorem
- END -