Chapter 7: Sample Variability
[Figure: Empirical Distribution of Sample Means. Histogram of frequency (0 to 9) against sample mean (6.8 to 11.2).]
Chapter Goals
• Investigate the variability in sample
statistics from sample to sample.
• Find measures of central tendency for
sample statistics.
• Find measures of dispersion for sample
statistics.
• Find the pattern of variability for sample
statistics.
7.1: Sampling Distributions
• To make inferences about a population, we
need to understand sampling.
• The sample mean varies from sample to
sample.
• The sample mean has a distribution; we
need to understand how the sample mean
varies and the pattern (if any) in the
distribution.
Sampling Distribution of a Sample Statistic: The
distribution of values for a sample statistic obtained from
repeated samples, all of the same size and all drawn from the
same population.
Example: Consider the set {1, 2, 3, 4}.
1. Make a list of all samples of size 2 that can be drawn from
this set. (Sample with replacement.)
2. Construct the sampling distribution for the sample mean
for samples of size 2.
3. Construct the sampling distribution for the minimum for
samples of size 2.
This table lists all possible samples of size 2, the mean for
each sample, the minimum for each sample, and the probability
of each sample occurring (all equally likely).

Sample   Mean $\bar{x}$   Minimum   Probability
{1, 1}   1.0              1         1/16
{1, 2}   1.5              1         1/16
{1, 3}   2.0              1         1/16
{1, 4}   2.5              1         1/16
{2, 1}   1.5              1         1/16
{2, 2}   2.0              2         1/16
{2, 3}   2.5              2         1/16
{2, 4}   3.0              2         1/16
{3, 1}   2.0              1         1/16
{3, 2}   2.5              2         1/16
{3, 3}   3.0              3         1/16
{3, 4}   3.5              3         1/16
{4, 1}   2.5              1         1/16
{4, 2}   3.0              2         1/16
{4, 3}   3.5              3         1/16
{4, 4}   4.0              4         1/16
Summarize the information in the previous table to obtain the
sampling distribution of the sample mean and the sample
minimum.
Sampling Distribution of the Sample Mean

$\bar{x}$   $P(\bar{x})$
1.0         1/16
1.5         2/16
2.0         3/16
2.5         4/16
3.0         3/16
3.5         2/16
4.0         1/16
[Figure: Histogram: Sampling Distribution of the Sample Mean. $P(\bar{x})$ (0.00 to 0.25) against $\bar{x}$ (1.0 to 4.0).]
Sampling Distribution of the Sample Minimum

m   P(m)
1   7/16
2   5/16
3   3/16
4   1/16
[Figure: Histogram: Sampling Distribution of the Sample Minimum. P(m) (0.0 to 0.5) against m (1 to 4).]
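The two sampling distributions in this example can also be built by brute-force enumeration. The following is a minimal Python sketch, not part of the original slides; the names `samples`, `sampling_distribution`, `mean_dist`, and `min_dist` are illustrative choices. It lists all 16 ordered samples of size 2 drawn with replacement from {1, 2, 3, 4} and tallies the probability of each value of the statistic.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2  # sample size

# All ordered samples of size n drawn with replacement (16 of them here).
samples = list(product(population, repeat=n))
p_each = Fraction(1, len(samples))  # every sample is equally likely


def sampling_distribution(statistic):
    """Map each possible value of the statistic to its probability."""
    counts = Counter(statistic(s) for s in samples)
    return {value: count * p_each for value, count in sorted(counts.items())}


mean_dist = sampling_distribution(lambda s: Fraction(sum(s), len(s)))
min_dist = sampling_distribution(min)

for value, prob in mean_dist.items():
    print(f"mean = {float(value):.1f}   P = {prob}")
for value, prob in min_dist.items():
    print(f"min  = {value}     P = {prob}")
```

`Fraction` prints probabilities in lowest terms, so 2/16 appears as 1/8 and 4/16 as 1/4; the values agree with the tables in the example.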
Example: Consider the population consisting of six equally
likely integers: 1, 2, 3, 4, 5, and 6. Empirically investigate
the sampling distribution of the sample mean. Select 50
samples of size 5, find the mean for each sample, and
construct the empirical distribution of the sample mean.
The Population: Theoretical Probability Distribution
[Figure: P(x) (axis from 0.00 to 0.18) plotted against x = 1, 2, 3, 4, 5, 6; annotated $\mu = 3.5$, $\sigma = 1.7078$.]
Empirical Distribution of the Sample Mean
Samples of Size 5
$\bar{\bar{x}} = 3.352$, $s_{\bar{x}} = 0.714$
[Figure: histogram of frequency (0 to 14) against sample mean (1.8 to 5.3).]
Note:
1. $\bar{\bar{x}}$: the mean of the sample means.
2. $s_{\bar{x}}$: the standard deviation of the sample means.
3. The theory involved with sampling distributions described
in the remainder of this chapter requires random sampling.
Random Sample: A sample obtained in such a way that each
possible sample of a fixed size n has an equal probability of
being selected.
(Every possible handful of size n has the same probability of
being selected.)
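The empirical investigation in the example above (50 samples of size 5 from the equally likely integers 1 to 6) can be repeated by simulation. This is a minimal sketch, not the procedure used to produce the slides: the seed is an arbitrary assumption, and sampling with replacement via `random.choices` is assumed because the six values are described as equally likely on every draw. A new run gives sample means that differ from the 50 summarized in the example, which is itself an instance of sample-to-sample variability.

```python
import random
import statistics

random.seed(7)  # arbitrary seed, used only so the run is repeatable

population = [1, 2, 3, 4, 5, 6]
num_samples = 50
sample_size = 5

# Draw 50 random samples of size 5 (with replacement) and record each mean.
sample_means = [
    statistics.mean(random.choices(population, k=sample_size))
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)  # mean of the sample means
sd_of_means = statistics.stdev(sample_means)   # std. deviation of the sample means

# Population parameters, for comparison.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

print(f"population:   mu = {mu:.3f}, sigma = {sigma:.4f}")
print(f"sample means: mean = {mean_of_means:.3f}, sd = {sd_of_means:.3f}")
```

The mean of the sample means should land near the population mean of 3.5, while the standard deviation of the sample means should be noticeably smaller than the population standard deviation.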
7.2: The Central Limit Theorem
• The most important idea in all of statistics.
• Describes the sampling distribution of the
sample mean.
• Examples suggest: the sample mean (and
sample total) tend to be normally
distributed.
Sampling Distribution of Sample Means
If all possible random samples, each of size n, are taken from
any population with a mean $\mu$ and a standard deviation $\sigma$, the
sampling distribution of sample means will:
1. have a mean $\mu_{\bar{x}}$ equal to $\mu$.
2. have a standard deviation $\sigma_{\bar{x}}$ equal to $\sigma/\sqrt{n}$.
Further, if the sampled population has a normal distribution,
then the sampling distribution of $\bar{x}$ will also be normal for
samples of all sizes.
Central Limit Theorem
The sampling distribution of sample means will more closely
approximate a normal distribution as the sample size increases.
Summary:
1. The mean of the sampling distribution of $\bar{x}$ is equal to the
mean of the original population: $\mu_{\bar{x}} = \mu$.
2. The standard deviation of the sampling distribution of $\bar{x}$
(also called the standard error of the mean) is equal to the
standard deviation of the original population divided by the
square root of the sample size: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
Note:
a. The distribution of $\bar{x}$ becomes more compact as n
increases. (Why?)
b. The variance of $\bar{x}$: $\sigma_{\bar{x}}^{2} = \sigma^{2}/n$
3. The distribution of $\bar{x}$ is (exactly) normal when the original
population is normal.
4. The CLT says: the distribution of $\bar{x}$ is approximately
normal regardless of the shape of the original distribution,
when the sample size is large enough!
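One way to answer the "Why?" in note (a) is to derive the variance in note (b). The sketch below is not part of the original slides. It assumes the n observations $x_1, \dots, x_n$ are independent draws from the population, each with variance $\sigma^{2}$ (as when sampling with replacement).

```latex
\begin{aligned}
\sigma_{\bar{x}}^{2}
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
   = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(x_i)
   = \frac{n\,\sigma^{2}}{n^{2}}
   = \frac{\sigma^{2}}{n}, \\
\sigma_{\bar{x}}
  &= \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```

Because n appears in the denominator, the spread of $\bar{x}$ shrinks as the sample size grows, which is why its distribution becomes more compact.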
Standard Error of the Mean
The standard deviation of the sampling distribution of sample
means: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Note:
1. The n in the formula for the standard error of the mean is
the size of the sample.
2. The proof of the Central Limit Theorem is beyond the
scope of this course.
3. The following example illustrates the results of the Central
Limit Theorem.
Graphical Illustration of the Central Limit Theorem:
[Figure: four panels: Original Population; Distribution of $\bar{x}$: n = 2; Distribution of $\bar{x}$: n = 10; Distribution of $\bar{x}$: n = 30.]
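The behavior the Central Limit Theorem describes can also be checked numerically for the same sample sizes, n = 2, 10, and 30. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 10 (a deliberately skewed choice, not the population from the slides), and the seed and the number of repeated samples are arbitrary. For each n it compares the standard deviation of the simulated sample means with $\sigma/\sqrt{n}$ and reports the skewness, which is 0 for a symmetric distribution such as the normal.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable

POP_MEAN = 10.0       # assumed population: exponential with mean 10
POP_SD = 10.0         # for an exponential population, sd equals the mean
NUM_SAMPLES = 10_000  # number of repeated samples per sample size


def skewness(values):
    """Average cubed z-score; 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


results = {}
for n in (2, 10, 30):
    # Each entry is the mean of one random sample of size n.
    means = [
        statistics.fmean([random.expovariate(1 / POP_MEAN) for _ in range(n)])
        for _ in range(NUM_SAMPLES)
    ]
    results[n] = {
        "mean": statistics.fmean(means),
        "sd": statistics.pstdev(means),
        "theory_sd": POP_SD / math.sqrt(n),
        "skew": skewness(means),
    }

for n, r in results.items():
    print(f"n = {n:2d}: mean = {r['mean']:.2f}, sd = {r['sd']:.2f} "
          f"(sigma/sqrt(n) = {r['theory_sd']:.2f}), skewness = {r['skew']:.2f}")
```

As n grows, the simulated standard deviation should track $\sigma/\sqrt{n}$ and the skewness should move toward 0, even though the assumed population is far from normal.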
7.3: Applications of the Central
Limit Theorem
• When the sampling distribution of the
sample mean is (exactly) normally
distributed, or approximately normally
distributed (by the CLT), we can answer
probability questions using the standard
normal distribution, Table 3, Appendix B.
Example: Consider a normal population with $\mu = 50$ and $\sigma = 15$.
Suppose a sample of size 9 is selected at random. Find:
1. $P(45 \le \bar{x} \le 60)$
2. $P(\bar{x} \le 47.5)$
Solution:
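Since the original population is normal, the distribution of the
sample mean is also (exactly) normal.
$\mu_{\bar{x}} = \mu = 50$
$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 15/\sqrt{9} = 15/3 = 5$

1. $P(45 \le \bar{x} \le 60) = P\left(\frac{45 - 50}{5} \le \frac{\bar{x} - 50}{5} \le \frac{60 - 50}{5}\right)$
$= P(-1 \le z \le 2)$
$= 0.3413 + 0.4772 = 0.8185$
[Figure: normal curve with $\bar{x}$ = 45, 50, 60 marked against z = -1, 0, 2; shaded areas 0.3413 and 0.4772.]

2. $P(\bar{x} \le 47.5) = P\left(\frac{\bar{x} - 50}{5} \le \frac{47.5 - 50}{5}\right)$
$= P(z \le -0.5)$
$= 0.5000 - 0.1915$
$= 0.3085$
[Figure: normal curve with $\bar{x}$ = 47.5, 50 marked against z = -0.5, 0; areas 0.3085 and 0.1915.]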
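The two probabilities in this example can be cross-checked without a table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Because it evaluates the normal distribution directly instead of rounding through Table 3, the results may differ from the table-based answers in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 15, 9

# Standard error of the mean: sigma / sqrt(n) = 15 / 3 = 5
standard_error = sigma / sqrt(n)

# Sampling distribution of the sample mean (exactly normal here,
# because the population itself is normal).
x_bar = NormalDist(mu=mu, sigma=standard_error)

p_between = x_bar.cdf(60) - x_bar.cdf(45)  # P(45 <= x-bar <= 60)
p_below = x_bar.cdf(47.5)                  # P(x-bar <= 47.5)

print(f"standard error        = {standard_error:.2f}")
print(f"P(45 <= x-bar <= 60)  = {p_between:.4f}")
print(f"P(x-bar <= 47.5)      = {p_below:.4f}")
```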
Example: A recent report stated that the day-care cost per
week in Boston is $109. Suppose this figure is taken as the
mean cost per week and that the standard deviation is known
to be $20.
1. Find the probability that a sample of 50 day-care centers
would show a mean cost of $105 or less per week.
2. Suppose the actual sample mean cost for the sample of 50
day-care centers is $120. Is there any evidence to refute
the claim of $109 presented in the report?
Solution:
The shape of the original distribution is unknown, but the
sample size, n, is large. The CLT applies.
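The distribution of $\bar{x}$ is approximately normal.
$\mu_{\bar{x}} = \mu = 109$
$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 20/\sqrt{50} \approx 2.83$

$P(\bar{x} \le 105) = P\left(\frac{\bar{x} - 109}{2.83} \le \frac{105 - 109}{2.83}\right)$
$= P(z \le -1.41)$
$= 0.5000 - 0.4207 = 0.0793$
[Figure: normal curve with $\bar{x}$ = 105, 109 marked against z = -1.41, 0; areas 0.0793 and 0.4207.]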
• To investigate the claim, we need to examine how likely a
sample mean of $120 is.
• Consider how far out in the tail of the distribution of the
sample mean $120 lies.
• Compute the tail probability.
$P(\bar{x} \ge 120) = P\left(\frac{\bar{x} - 109}{2.83} \ge \frac{120 - 109}{2.83}\right)$
$= P(z \ge 3.89)$
$\approx 0.0001$
• Since the tail probability is so small, this suggests the
observation of $120 is very rare (if the mean cost is really
$109).
• There is evidence to suggest the claim of $\mu = \$109$ is
wrong.
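Both parts of the day-care example can be checked the same way. The following is a minimal sketch with `statistics.NormalDist`, assuming the CLT approximation holds for n = 50 as the solution states; the variable names are illustrative. It uses the unrounded standard error (about 2.828, not 2.83), so the first probability comes out slightly below the table-based 0.0793.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 109, 20, 50

# Standard error of the mean: 20 / sqrt(50), roughly 2.83
standard_error = sigma / sqrt(n)

# Approximate sampling distribution of the sample mean (by the CLT).
x_bar = NormalDist(mu=mu, sigma=standard_error)

p_105_or_less = x_bar.cdf(105)      # part 1: P(x-bar <= 105)
p_120_or_more = 1 - x_bar.cdf(120)  # part 2: upper-tail probability

print(f"standard error   = {standard_error:.3f}")
print(f"P(x-bar <= 105)  = {p_105_or_less:.4f}")
print(f"P(x-bar >= 120)  = {p_120_or_more:.6f}")
```

The upper-tail probability is tiny, which matches the conclusion that a sample mean of $120 would be very rare if the mean cost were really $109.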