Chapter 7
Sampling and sampling distributions
One of the reasons for taking a sample is to try to understand how the population is
distributed using a sample rather than a census. It probably appeals to most people that
you could gain a better understanding with a large sample as opposed to a small sample
(more information is better than less information). So, you might get a better
understanding with n=10000 as opposed to n=100 and n=100 might be better than n=1.
In this chapter we will study sampling from a known population. Nobody would do this in
practice (why sample when you already know what the population looks like?). However,
with a known population we can compare the results of the sample with something that we
already know. This will allow us to determine how much better off we would be if we
increased the sample size. It will also allow us to develop some rules to use when we
consider sampling from an unknown population.
What we want to do in this chapter is to
• determine whether large samples are better than small samples,
• determine how much better large samples are than small samples, and
• construct some rules that we can use when sampling from an unknown population.
Sampling from a known normal distribution
Suppose we took a sample of size n from a normal distribution with
$\mu = 100$ and $\sigma = 10$ and then computed the sample mean. Now repeat this a large
number of times and plot the histogram of the sample means. Figures 1, 2, and 3 show
histograms of sample means for sample sizes n=1, n=10, and n=100.
Figure 1. The histogram of the means of 1000 samples of size n=1 taken from a normal
distribution
Here, even with samples of size n=1, the distribution of sample means appears to be
normally distributed. The bins represent values of the sample means $\bar{X}$.
Figure 2. The histogram of 1000 samples of size n=10 taken from a normal distribution
Figure 2 also suggests that the distribution of sample means for samples of size n=10 is
normally distributed. However, the range of values of $\bar{X}$ is much smaller than it was
for samples of size n=1.
Figure 3. The histogram of 1000 samples of size n=100 taken from a normal distribution
The results for Figure 3 also indicate a normal distribution. The range of values of
$\bar{X}$ is still smaller than for samples of size n=10. Note that the spread (standard
deviation) of the distribution of sample means decreases as n gets larger.
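The experiment behind Figures 1 to 3 can be imitated with a short simulation. The sketch below is only an illustration: the function name `simulate_sample_means`, the seed, and the use of Python's standard library are assumptions, not the method used to produce the original figures. It draws 1000 samples of each size and reports the mean and standard deviation of the resulting sample means, so the shrinking spread can be seen numerically.

```python
import random
import statistics


def simulate_sample_means(n, mu=100.0, sigma=10.0, repeats=1000, seed=0):
    """Return the means of `repeats` samples of size n drawn from Normal(mu, sigma)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(repeats)
    ]


for n in (1, 10, 100):
    means = simulate_sample_means(n)
    print(
        f"n={n:>3}  mean of sample means={statistics.fmean(means):7.2f}  "
        f"std dev of sample means={statistics.stdev(means):5.2f}"
    )
```

The printed standard deviations should come out roughly a factor of $\sqrt{10}$ apart from one sample size to the next, although the exact values depend on the random draws.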
The sampling distribution when sampling from a normal distribution
• The distribution of sample means, $\bar{X}$, will be normally distributed.
• This distribution will have a mean $\mu_{\bar{X}} = \mu$.
• This distribution will have a standard deviation $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
Suppose that you have a normal distribution with $\mu = 100$ and $\sigma = 10$. Find the
probability that a single observation taken from this distribution will be between 99 and
101. That is, find P(99<X<101).
$$Z = \frac{X - \mu}{\sigma} = \frac{99 - 100}{10} = \frac{-1}{10} = -0.1$$
$$Z = \frac{X - \mu}{\sigma} = \frac{101 - 100}{10} = \frac{1}{10} = 0.1$$
$$P(99 < X < 101) = P(-0.1 < Z < 0.1) = 0.5398 - 0.4602 = 0.0796$$
Suppose now that we take a sample of size n=100 from this distribution. Find the
probability that the sample mean will be in [99,101]. That is, find $P(99 \le \bar{X} \le 101)$.
$$\mu_{\bar{X}} = \mu = 100$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1$$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{99 - 100}{1} = \frac{-1}{1} = -1$$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{101 - 100}{1} = \frac{1}{1} = 1$$
$$P(99 \le \bar{X} \le 101) = P(-1 \le Z \le 1) = 0.8413 - 0.1587 = 0.6826$$
Suppose now that we take a sample of size n=1000 from this distribution. Find the
probability that the sample mean will be in [99,101]. That is, find $P(99 \le \bar{X} \le 101)$.
$$\mu_{\bar{X}} = \mu = 100$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{1000}} = 0.316$$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{99 - 100}{0.316} = \frac{-1}{0.316} = -3.16$$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{101 - 100}{0.316} = \frac{1}{0.316} = 3.16$$
$$P(99 \le \bar{X} \le 101) = P(-3.16 \le Z \le 3.16) = 0.9992 - 0.0008 = 0.9984$$
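The three hand calculations above (n=1, n=100, and n=1000) can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later so that `statistics.NormalDist` is available; the helper name `prob_mean_between` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < sample mean < hi) for samples of size n from Normal(mu, sigma)."""
    # The sample mean is Normal(mu, sigma / sqrt(n)).
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(hi) - sampling_dist.cdf(lo)


for n in (1, 100, 1000):
    print(f"n={n:>4}  P(99 < mean < 101) = {prob_mean_between(99, 101, 100, 10, n):.4f}")
```

The results agree with the table-based answers to about three decimal places. Small differences are expected because the hand calculations round Z to two decimals before using the normal table.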
Of course we can also use Excel
Example 7.1 Suppose that an auditing team examines accounts receivable for a certain
firm. Unknown to the auditors, the mean and standard deviation of these accounts are
$\mu = 1332.52$ and $\sigma = 237.55$ (these are population values). The auditing team takes
a sample of n=36 accounts. Find the probability that the resulting sample mean will exceed
$1350. Find the probability that the sample mean will be less than $1300. Find the
probability that the sample mean will be between $1310 and $1360. Assume that
accounts receivable can be described by a normal distribution.
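We have
$$\mu_{\bar{X}} = \mu = 1332.52$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{237.55}{\sqrt{36}} = 39.592$$
a) Find $P(\bar{X} > 1350)$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1350 - 1332.52}{39.592} = 0.44$$
$$P(\bar{X} > 1350) = P(Z > 0.44) = 1 - P(Z \le 0.44) = 1 - 0.6700 = 0.3300$$
b) Find $P(\bar{X} < 1300)$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1300 - 1332.52}{39.592} = -0.82$$
$$P(\bar{X} < 1300) = P(Z < -0.82) = 0.2061$$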
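c) Find $P(1310 < \bar{X} < 1360)$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1310 - 1332.52}{39.592} = -0.57$$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1360 - 1332.52}{39.592} = 0.69$$
$$P(1310 < \bar{X} < 1360) = P(-0.57 < Z < 0.69) = 0.7549 - 0.2843 = 0.4706$$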
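Example 7.1 can also be worked without a normal table. The sketch below is one possible way to do it in Python (3.8 or later, using `statistics.NormalDist`); the variable names are chosen for this illustration only.

```python
from math import sqrt
from statistics import NormalDist

# Population values from Example 7.1
mu, sigma, n = 1332.52, 237.55, 36

# Sampling distribution of the sample mean: Normal(mu, sigma / sqrt(n))
xbar = NormalDist(mu, sigma / sqrt(n))

p_above_1350 = 1 - xbar.cdf(1350)                 # part a
p_below_1300 = xbar.cdf(1300)                     # part b
p_1310_to_1360 = xbar.cdf(1360) - xbar.cdf(1310)  # part c

print(f"standard deviation of the sample mean = {xbar.stdev:.3f}")
print(f"a) P(mean > 1350)        = {p_above_1350:.4f}")
print(f"b) P(mean < 1300)        = {p_below_1300:.4f}")
print(f"c) P(1310 < mean < 1360) = {p_1310_to_1360:.4f}")
```

Because the code does not round Z to two decimals, its answers can differ from the table-based values in the third decimal place.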
In Excel
Example 7.2 Suppose that the time it takes to fabricate a central processor chip
for a computer can be described by a normal distribution with a mean of 35 minutes and a
standard deviation of 5 minutes. A time management team is studying the process in
hopes of improving it. The management team does not know what the mean fabrication
time is, so they take a sample of n=100 time histories to try to get an estimate of the true,
but unknown, mean. Find the probability that the sample mean time is less than 34
minutes, the probability that it is greater than 36.3 minutes, and the probability that it will
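be between 34 and 35.7 minutes.
$$\mu_{\bar{X}} = \mu = 35$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5$$
a) Find $P(\bar{X} < 34)$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{34 - 35}{0.5} = -2.0$$
$$P(\bar{X} < 34) = P(Z < -2.0) = 0.0228$$
b) Find $P(\bar{X} > 36.3)$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{36.3 - 35}{0.5} = 2.6$$
$$P(\bar{X} > 36.3) = P(Z > 2.6) = 1 - P(Z \le 2.6) = 1 - 0.9953 = 0.0047$$
c) Find $P(34 < \bar{X} < 35.7)$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{34 - 35}{0.5} = -2.0$$
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{35.7 - 35}{0.5} = 1.4$$
$$P(34 < \bar{X} < 35.7) = P(-2.0 < Z < 1.4) = 0.9192 - 0.0228 = 0.8964$$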
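The Z-score steps of Example 7.2 can be mirrored in code. This sketch follows the hand calculation rather than taking a shortcut: it converts each sample-mean value to a Z score and then looks it up in the standard normal distribution. The helper `z_score` and the variable names are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_score(xbar, mu, sigma, n):
    """Z score of a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / sqrt(n))


mu, sigma, n = 35, 5, 100  # values from Example 7.2

z_34 = z_score(34, mu, sigma, n)
z_363 = z_score(36.3, mu, sigma, n)
z_357 = z_score(35.7, mu, sigma, n)

p_a = STANDARD_NORMAL.cdf(z_34)                               # P(mean < 34)
p_b = 1 - STANDARD_NORMAL.cdf(z_363)                          # P(mean > 36.3)
p_c = STANDARD_NORMAL.cdf(z_357) - STANDARD_NORMAL.cdf(z_34)  # P(34 < mean < 35.7)

print(f"a) Z = {z_34:.2f}, probability = {p_a:.4f}")
print(f"b) Z = {z_363:.2f}, probability = {p_b:.4f}")
print(f"c) Z from {z_34:.2f} to {z_357:.2f}, probability = {p_c:.4f}")
```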
And using Excel
The central limit theorem
Figure 4. The normal distribution between 90 and 110
It might not be too surprising that the sample means taken from a normal distribution
would be normal, but let’s consider sampling from a uniform distribution where the
samples must be in the range [90,110]. The results of such samples are shown in Figures
5, 6, and 7.
Figure 5. The histogram of 1000 sample means of size n=1 taken from a uniform
distribution between [90,110]
Note that for a sample of size n=1, we are just sampling the distribution, and so the
distribution of sample means just reproduces the distribution from which the sample was
taken. The results here are not normal, but are those of the uniform distribution.
Figure 6. The histogram of 1000 sample means of size n=10 taken from a uniform distribution between
[90,110]
Figure 7. The histogram of 1000 sample means of size n=100 taken from a uniform distribution between
[90,110]
If the sample size is increased to n=10, the distribution of sample means is starting to
look like a normal distribution. See the graph in Fig. 6. Fig. 7 shows the histogram of
1000 sample means of size n=100. The histogram of these 1000 means looks quite
normal.
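The behavior seen in Figures 5 to 7 can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions: it uses the standard result that a uniform distribution on [a, b] has standard deviation $(b-a)/\sqrt{12}$ (not derived in this chapter), and the function names are invented here. As a rough check on shape, it reports the share of sample means that fall within one theoretical standard deviation of 100. For a normal distribution that share is about 68%, while for the uniform distribution itself it is about 58%.

```python
import random
import statistics
from math import sqrt

LOW, HIGH = 90.0, 110.0
POP_MEAN = (LOW + HIGH) / 2        # 100
POP_SD = (HIGH - LOW) / sqrt(12)   # standard deviation of a uniform distribution


def uniform_sample_means(n, repeats=1000, seed=0):
    """Return the means of `repeats` samples of size n drawn uniformly from [LOW, HIGH]."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.uniform(LOW, HIGH) for _ in range(n)])
        for _ in range(repeats)
    ]


def share_within_one_sd(means, n):
    """Fraction of sample means within POP_SD / sqrt(n) of the population mean."""
    se = POP_SD / sqrt(n)
    return sum(abs(m - POP_MEAN) <= se for m in means) / len(means)


for n in (1, 10, 100):
    means = uniform_sample_means(n)
    print(
        f"n={n:>3}  observed sd={statistics.stdev(means):.3f}  "
        f"sigma/sqrt(n)={POP_SD / sqrt(n):.3f}  "
        f"share within one sd={share_within_one_sd(means, n):.3f}"
    )
```

The exact numbers depend on the random draws, but the share should move from near 0.58 toward 0.68 as n grows.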
It appears here that even when the sample size is as small as n=10 the resulting sampling
distribution is close to normal. In fact, if we make the sample size large enough, the
distribution of sample means will be approximately normally distributed. As a rule of
thumb, if the sample size is on the order of n=30 then the sampling distribution can be
treated as normally distributed. These results are among the most important in statistics
and are called the Central Limit Theorem (CLT).
The Central Limit Theorem (CLT)
Regardless of the nature of the distribution from which a sample is taken, if the sample
size is large enough (rule of thumb, n=30 is large enough), then
• The distribution of sample means, $\bar{X}$, will be approximately normally distributed.
• This distribution will have a mean $\mu_{\bar{X}} = \mu$.
• This distribution will have a standard deviation $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
Note: if the sample comes from a population that is normally distributed, these results
hold for a sample of size n=1 or larger. In most practical applications where the
population is unknown, people will say the CLT holds for samples of size n=30 or larger.
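As a worked illustration of how the CLT is applied to a population that is not normal, consider samples of size n=100 from the uniform distribution on [90,110] used in Figures 5 to 7. This sketch assumes the standard result that a uniform distribution on [a, b] has standard deviation $(b-a)/\sqrt{12}$, which is not derived in this chapter.

```latex
\begin{align*}
\mu &= \frac{90 + 110}{2} = 100, \qquad
\sigma = \frac{110 - 90}{\sqrt{12}} \approx 5.77, \\
\sigma_{\bar{X}} &= \frac{\sigma}{\sqrt{n}} = \frac{5.77}{\sqrt{100}} \approx 0.577, \\
P(99 \le \bar{X} \le 101) &\approx P(-1.73 \le Z \le 1.73) = 0.9582 - 0.0418 = 0.9164.
\end{align*}
```

The answer is larger than the 0.6826 found earlier for the normal population with $\sigma = 10$ because this uniform population has a smaller standard deviation.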
The sampling distribution of the binomial distribution ( p̂ ).
Here we will vary a little bit from our rule of thumb of n=30 being large enough for the
CLT to hold. We know a good deal more about the binomial; it is not just an unknown
distribution. The CLT will hold for the binomial when
$$np \ge 5$$
$$nq \ge 5$$
where q = 1 - p is the probability of a failure in any trial. So the CLT will hold for the
binomial when the normal approximation to the binomial distribution is good. We do make a
change here. We will find it convenient to look at binomial problems in terms of the
proportion of successes out of n trials rather than the number of successes.
For the binomial distribution,
$$\mu = E(X) = np$$
where E(X) indicates the “expected value” of the distribution. It is another term for the
mean of a distribution. The expected value is the value you would expect to get for the
average result of performing an experiment a large number of times. Suppose that you
flipped a coin ten times where p=0.5. Call getting a head a Success and record X, the
number of S’s. Repeat this a large number of times and average the values of X. You
would expect this average to be five. So for n=10, p=0.5
$$E(X) = np = 5$$
Now define the proportion of successes in n trials to be
$$\hat{p} = \frac{X}{n}$$
so that the mean or expected value of the proportion of successes in n trials for the
binomial is
$$E(\hat{p}) = \frac{E(X)}{n} = \frac{np}{n} = p.$$
The mean is just p, the probability of a success in any trial.
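The coin-flipping thought experiment above is easy to try out. The following sketch is only an illustration; the number of repetitions (10,000) and the seed are arbitrary choices made here. It repeats the ten-flip experiment many times and averages both the number of heads X and the proportion of heads.

```python
import random
import statistics

n, p = 10, 0.5           # ten flips of a fair coin
repeats = 10_000         # how many times the experiment is repeated (arbitrary choice)
rng = random.Random(42)  # fixed seed so the run is repeatable

# X = number of heads (successes) in each experiment of n flips
counts = [sum(rng.random() < p for _ in range(n)) for _ in range(repeats)]
proportions = [x / n for x in counts]

mean_count = statistics.fmean(counts)
mean_proportion = statistics.fmean(proportions)

print(f"average number of heads     = {mean_count:.3f}  (np = {n * p})")
print(f"average proportion of heads = {mean_proportion:.3f}  (p = {p})")
```

The averages should land close to np = 5 and p = 0.5, with small deviations due to chance.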
The standard deviation of the distribution in terms of $\hat{p}$ is
$$\sigma_{\hat{p}} = \sqrt{pq/n}$$
For the binomial distribution, the sampling distribution of $\hat{p}$ will be normally
distributed with mean p and standard deviation
$$\sigma_{\hat{p}} = \sqrt{pq/n}$$
if $np \ge 5$ and $nq \ge 5$, and the Z score is
$$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$$
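The standard deviation of $\hat{p}$ is stated above without derivation. A short justification, assuming the standard result that a binomial count X has variance npq (a fact not derived in this chapter), runs as follows:

```latex
\begin{align*}
\operatorname{Var}(\hat{p})
  &= \operatorname{Var}\!\left(\frac{X}{n}\right)
   = \frac{\operatorname{Var}(X)}{n^{2}}
   = \frac{npq}{n^{2}}
   = \frac{pq}{n}, \\
\sigma_{\hat{p}} &= \sqrt{\frac{pq}{n}}.
\end{align*}
```

For the ten coin flips with p=0.5, this gives $\sigma_{\hat{p}} = \sqrt{(0.5)(0.5)/10} \approx 0.158$.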
Example 7.3 Polls are almost always reported in terms of proportions (the percentage of
respondents that favor something) rather than in terms of X (the number of respondents
that favor it). Suppose that a poll has been commissioned in an election contest between
A and B. Consider a response for A to be a success. Suppose that 55% of all voters
actually favor A. The size of the poll is n=1200 voters. What is the probability that the
proportion of responses for A will be in the range [52%,58%]?
In this problem p=0.55 and n=1200. This gives
$$\sigma_{\hat{p}} = \sqrt{pq/n} = \sqrt{(0.55)(0.45)/1200} = 0.0144$$
$$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.52 - 0.55}{0.0144} = -2.08$$
$$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.58 - 0.55}{0.0144} = 2.08$$
$$P(0.52 < \hat{p} < 0.58) = P(-2.08 < Z < 2.08) = 0.9812 - 0.0188 = 0.9624$$
In Excel
Quantity              Value    Formula
n=                    1200     1200
p=                    0.55     0.55
q=                    0.45     0.45
sigma-p               0.0144   =SQRT(0.55*0.45/1200)
P(0.52<p-hat<0.58)    0.9628   =NORMDIST(0.58,0.55,0.0144,TRUE)-NORMDIST(0.52,0.55,0.0144,TRUE)
Example 7.4 A market survey is taken of n = 1000 potential buyers to see how they
like a test product. Suppose that 10% of the population likes the product. What is the
probability that 12% or more of the test group will indicate that they like the product?
Does the CLT hold for the problem?
$$np = 1000 \times 0.1 = 100$$
$$nq = 1000 \times 0.9 = 900$$
so the CLT holds for this problem. The standard deviation of the sampling distribution is
$$\sigma_{\hat{p}} = \sqrt{pq/n} = \sqrt{(0.1)(0.9)/1000} = 0.0095$$
So
$$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.12 - 0.10}{0.0095} = 2.10$$
and
$$P(\hat{p} \ge 0.12) = P(Z \ge 2.10) = 1 - P(Z < 2.10) = 1 - 0.9821 = 0.0179$$
So there is less than a 2% chance of getting a sample proportion of 12% or more if the
true population proportion is 10%.
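Both proportion examples (7.3 and 7.4) follow the same recipe: check the conditions on np and nq, compute $\sqrt{pq/n}$, and read a probability from the normal distribution. The sketch below wraps that recipe in a helper. The function name `proportion_sampling_dist` is hypothetical, and the check assumes the rule of thumb is read as "np and nq both at least 5".

```python
from math import sqrt
from statistics import NormalDist


def proportion_sampling_dist(p, n):
    """Normal approximation to the sampling distribution of the sample proportion.

    Raises ValueError when the rule of thumb (np >= 5 and nq >= 5) is not met.
    """
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("np and nq must both be at least 5 for the normal approximation")
    return NormalDist(p, sqrt(p * q / n))


# Example 7.3: p = 0.55, n = 1200, P(0.52 < p-hat < 0.58)
poll = proportion_sampling_dist(0.55, 1200)
p_poll = poll.cdf(0.58) - poll.cdf(0.52)

# Example 7.4: p = 0.10, n = 1000, P(p-hat >= 0.12)
survey = proportion_sampling_dist(0.10, 1000)
p_survey = 1 - survey.cdf(0.12)

print(f"Example 7.3: sigma = {poll.stdev:.4f}, probability = {p_poll:.4f}")
print(f"Example 7.4: sigma = {survey.stdev:.4f}, probability = {p_survey:.4f}")
```

Since the code keeps full precision for the standard deviation and for Z, its answers can differ slightly in the third or fourth decimal from the table-based values.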
Problems
7.1 Suppose a sample of size n=10 is taken from a normal distribution with $\mu = 150$ and
$\sigma = 12$. Find
a. $P(\bar{X} > 153)$
b. $P(148 < \bar{X} < 151)$
c. $P(\bar{X} < 148)$
7.2 Repeat problem 7.1 using a sample size of n=100.
7.3 A sample of size n=10 is taken from a population which is not normal, but has
$\mu = 100$ and $\sigma = 10$. Does the Central Limit Theorem hold? Can you find $P(\bar{x} \quad 101)$
using the Central Limit Theorem?
7.4 Suppose household incomes in Flagstaff are normally distributed with $\mu = 22,000$
and $\sigma = 2,000$. A sample of n=10 households is taken. Find
a. $P(\bar{x} > 21,000)$
b. $P(21,599 < \bar{x} < 22,500)$
c. $P(\bar{x} < 21,500)$
7.5 Suppose a machine is producing defective items at a 10% rate. One thousand items
of the machine's output are inspected. What is the probability that between 9% and
11% of the inspected items will be defective?
7.6 Repeat problem 7.5 to find the probability that between 8% and 12% of the inspected
items will be defective.
7.7 Suppose that 52% of the registered voters are in favor of a certain proposition placed
on an upcoming Arizona election ballot. A sample of n=1200 voters is selected at
random. What is the probability that
a. between 49% and 53% of the sampled voters will favor the proposition?
b. a majority of the voters in the sample will favor the proposition?
c. more than 55% of the voters in the sample will favor the proposition?
7.8 A machine is producing ball bearings with an average diameter of 101 cm and with a
standard deviation of 8 cm. A sample of n=49 ball bearings is taken. What is the
probability that the sample mean will be between 98 cm and 100 cm?
Answers
7.1 a) 0.2148, b) 0.3045, c) 0.2981
7.2 a) 0.0062, b) 0.7492, c) 0.0475
7.3
7.4 a) 0.9429, b) 0.5209 , c) 0.2148
7.5 $P(0.09 < \hat{p} < 0.11) = P(-1.05 < Z < 1.05) = 0.7062$
7.6 0.9652
7.7 a) 0.7361, b) 0.9177, c) 0.0188
7.8 0.1878