Lecture 6
Instruction: Sampling Distributions
A population can be thought of as a set of measurements, either existing or conceptual.
Recall that a sample is a subset of measurements from the population. Recall also that a
population parameter is a numerical descriptive measure of the population. For example, the
population mean µ is a parameter. A statistic, on the other hand, is a numerical descriptive
measure of a sample such as the sample mean X̄.
Often we do not have access to all the measurements of an entire population, so we must
use samples instead. In such cases, we will use a statistic to make inferences about
corresponding population parameters. In order to evaluate the reliability of our inferences, we
will need to know the probability distribution for the statistic we are using. This probability
distribution is called a sampling distribution. Note that a sampling distribution is specific to a
particular statistic. For example, the sampling distribution of the mean is the distribution of all
the possible sample means for samples of a given size n. While the X̄-distribution is not the only
sampling distribution, it is a very important one.
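To make the definition concrete, the short Python sketch below enumerates every possible sample of size n = 2 from a tiny population and tabulates the resulting sample means. The population {2, 4, 6, 8} and sampling with replacement are assumptions chosen only for illustration. The output also shows that the mean of all the possible sample means equals the population mean µ.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical population of four measurements (an assumption for illustration).
population = [2, 4, 6, 8]
n = 2  # sample size

# Assumption: samples are drawn with replacement, so every ordered sample is equally likely.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of the mean: each possible value of X-bar with its probability.
counts = Counter(means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mu = Fraction(sum(population), len(population))
mean_of_means = sum(m * p for m, p in sampling_dist.items())

for m, p in sampling_dist.items():
    print(f"x-bar = {m}: probability {p}")
print("population mean:", mu, " mean of the sample means:", mean_of_means)
```

Exact fractions are used so that the equality of the two means is not blurred by floating-point rounding.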
Instruction: Sampling Distribution of the Mean
Consider 100 random samples, each consisting of ten measurements. Suppose the table below
summarizes the means of the 100 samples.
Classes of X̄ for the 100 Samples     f     f/100 = Relative Frequency
6 < X̄ < 7                            3     0.03
7 < X̄ < 8                            5     0.05
8 < X̄ < 9                            9     0.09
9 < X̄ < 10                          12     0.12
10 < X̄ < 11                         14     0.14
11 < X̄ < 12                         17     0.17
12 < X̄ < 13                         16     0.16
13 < X̄ < 14                         14     0.14
14 < X̄ < 15                          6     0.06
15 < X̄ < 16                          4     0.04
Total                              100     1.00
Since the relative frequencies may be thought of as probabilities, the table effectively represents
a probability distribution. Because X̄ represents the mean of each sample, we can estimate the
probability that X̄ falls into each class by the relative frequency of that class. Accordingly, the
grouped relative frequency distribution given in Figure 1 represents an estimated probability
distribution of the X̄ values.
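As a rough check on the table, the sketch below converts the frequencies to relative frequencies and estimates the mean of the X̄ values and one interval probability. It assumes, as an approximation not stated in the lecture, that every sample mean in a class sits at the class midpoint.

```python
# Class lower bounds and frequencies copied from the table of 100 sample means.
lower_bounds = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
frequencies = [3, 5, 9, 12, 14, 17, 16, 14, 6, 4]

total = sum(frequencies)                       # 100 sample means
rel_freq = [f / total for f in frequencies]    # estimated probabilities
midpoints = [lb + 0.5 for lb in lower_bounds]  # assumed representative value per class

# Estimated mean of the sampling distribution, using class midpoints.
est_mean = sum(m * p for m, p in zip(midpoints, rel_freq))

# Estimated P(10 < X-bar < 13): add the three classes covering that range.
p_10_13 = sum(p for lb, p in zip(lower_bounds, rel_freq) if 10 <= lb < 13)

print(f"sum of relative frequencies: {sum(rel_freq):.2f}")
print(f"estimated mean of X-bar: {est_mean:.2f}")
print(f"estimated P(10 < X-bar < 13): {p_10_13:.2f}")
```

The interval probability is simply 0.14 + 0.17 + 0.16 = 0.47, read directly from the relative frequency column.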
[Figure 1: Relative frequency histogram of the 100 sample means. Horizontal axis: Measurements (classes 6.0-7.0 through 15.0-16.0); vertical axis: Relative Frequency, f/100 (0 to 0.2).]
Each bar in Figure 1 represents the estimated probability, based on the table, that X̄ falls in the
corresponding class; thus, the graph represents an estimated sampling distribution of the sample
mean for random samples of ten measurements.
We can see that the distribution is mound-shaped and almost bell-shaped. The irregularities
are due to the small number of samples used (only 100 sample means) and the rather small
sample size (ten measurements per sample). These irregularities would become less pronounced
if the number of samples and the number of classes increased, and if the number of measurements
per sample increased. In fact, the histogram would approach a bell-shaped (normal) curve. This
property of the sampling distribution of the sample mean is the main conclusion of the two
theorems discussed in the next portion of this lecture.
Instruction: Central Limit Theorem
The sample mean is said to be an unbiased estimator of the population mean because the mean
of all the possible sample means for samples of a given size n equals the population mean µ.
Indeed, we can rely on the Sample Mean Theorem stated below.
Let X be a random variable with a normal distribution whose mean is µ and whose
standard deviation is σ. Let X̄ be the sample mean corresponding to random samples
of size n taken from the X-distribution. Then the Sample Mean Theorem asserts that
the following three statements are true:
a) The X̄-distribution is a normal distribution.
b) The mean of the X̄-distribution is µ.
c) The standard deviation of the X̄-distribution, called the standard error of the
mean, is $\sigma/\sqrt{n}$.
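The following simulation is a minimal sketch of the Sample Mean Theorem, assuming a hypothetical normal population with µ = 50 and σ = 12 and samples of size n = 16 (none of these values come from the lecture). The mean and standard deviation of the simulated sample means should come out close to µ and σ/√n = 3.

```python
import random
import statistics

# Hypothetical normal population and sample size (assumed values, not from the lecture).
MU, SIGMA, N = 50.0, 12.0, 16
NUM_SAMPLES = 20_000

rng = random.Random(1)  # fixed seed so the run is reproducible

# Draw many random samples of size N and record each sample mean.
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(NUM_SAMPLES)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.pstdev(sample_means)
theoretical_se = SIGMA / N ** 0.5   # sigma / sqrt(n) = 12 / 4

print(f"mean of sample means: {observed_mean:.3f} (theory: {MU})")
print(f"sd of sample means:   {observed_sd:.3f} (theory: {theoretical_se})")
```

Because the simulation uses a finite number of samples, the observed values only approximate the theoretical ones; increasing `NUM_SAMPLES` tightens the agreement.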
Using the Sample Mean Theorem, we conclude that the X̄-distribution is normal whenever the
X-distribution is normal, regardless of the sample size. Furthermore, we can convert the
X̄-distribution to the standard normal Z-distribution using the formula below.
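The Z-score for the sampling distribution of the mean is given by

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma}.$$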
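The Z-score formula can be sketched in Python as a helper that returns P(a < X̄ < b). The function name `prob_sample_mean_between` and the example values µ = 100, σ = 15, n = 25 are hypothetical; the standard normal cumulative probabilities come from Python's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(a, b, mu, sigma, n):
    """P(a < X-bar < b) when X-bar is normal with mean mu and sd sigma/sqrt(n)."""
    se = sigma / sqrt(n)          # standard error of the mean
    z_a = (a - mu) / se           # Z-scores of the two endpoints
    z_b = (b - mu) / se
    std_normal = NormalDist()     # standard normal Z-distribution
    return std_normal.cdf(z_b) - std_normal.cdf(z_a)

# Hypothetical example: mu = 100, sigma = 15, n = 25, so the standard error is 3.
p = prob_sample_mean_between(97, 103, mu=100, sigma=15, n=25)
print(round(p, 4))
```

Here the endpoints 97 and 103 are one standard error from µ, so the result is the familiar P(−1 < Z < 1), about 0.6827.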
The Sample Mean Theorem gives complete information about the X̄-distribution provided the
original X-distribution is normal. It turns out, however, that the same conclusions hold
approximately as long as the sample size is "large enough", regardless of whether the
X-distribution is normal. This is the conclusion of the Central Limit Theorem stated below.
The Central Limit Theorem states that if X possesses any distribution with mean µ and
standard deviation σ, then the sample mean X̄ based on a random sample of size n will
have a distribution that approaches the distribution of a normal random variable with
mean µ and standard deviation $\sigma/\sqrt{n}$ as n increases without limit.
According to the Central Limit Theorem, the X̄-distribution approaches a normal distribution as
the sample size n increases without limit. Most statisticians agree that once n is at least thirty,
the X̄-distribution approximates a normal distribution closely enough to be treated as
essentially normal. This is a rule of thumb; for strongly skewed populations a larger n may be needed.
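A small simulation can illustrate the Central Limit Theorem for a population that is clearly not normal. The sketch assumes an exponential population with µ = σ = 1 (a right-skewed distribution chosen for illustration, not taken from the lecture) and compares sample sizes n = 1, 5, and 30. The standard deviation of the sample means should track σ/√n, and their skewness (2/√n in theory for this population) should shrink toward 0, the value for a normal distribution.

```python
import random
import statistics

def simulate_means(n, num_samples=20_000, seed=2):
    """Sample means of size-n samples from an exponential population with mean 1
    (an assumed, strongly right-skewed population with mu = sigma = 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(num_samples)]

def skewness(values):
    """Moment coefficient of skewness; 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

results = {}
for n in (1, 5, 30):
    means = simulate_means(n)
    results[n] = (statistics.fmean(means), statistics.pstdev(means), skewness(means))
    print(f"n={n:>2}: mean={results[n][0]:.3f}  sd={results[n][1]:.3f} "
          f"(sigma/sqrt(n)={1 / n ** 0.5:.3f})  skewness={results[n][2]:.2f}")
```

With n = 30 the skewness is already small, which is consistent with the rule of thumb above, although it is not exactly zero.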
Instruction: Normal Approximation to the Binomial Distribution
Sometimes the random variable of an experiment is categorical and can take only two
"values" (that is, two categories). For instance, consider a vaccine that protects 95% of
adults from a disease. From a population of vaccinated adults, a sample of adults could be
chosen and each adult observed to be protected or not protected. The categorical random
variable takes on the so-called values protected and not protected, and the number X of
protected adults in the sample is a binomial random variable. In cases like this, the probability
that 480 adults from a sample of 500 are protected can be calculated using the binomial
distribution, but doing so would require tedious calculations. Luckily, the normal distribution
can be used to approximate the binomial distribution under the conditions stated below.
Let p be the probability of success and let 1 − p be the probability of failure in a single
binomial trial. Let n be the number of trials in the binomial experiment. If n, p, and
1 − p are such that both np > 5 and n(1 − p) > 5, then the normal probability
distribution with $\mu = np$ and $\sigma = \sqrt{np(1-p)}$ will be a good approximation to the
binomial distribution, and the approximation improves as n gets larger.
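The rule of thumb above can be written as a small check. The helper `normal_approx_params` is a hypothetical name; it returns the mean and standard deviation of the approximating normal distribution when np > 5 and n(1 − p) > 5, and `None` otherwise. The first call uses the vaccine example (n = 500, p = 0.95); the second uses an assumed small sample for which the rule fails.

```python
from math import sqrt

def normal_approx_params(n, p):
    """Return (mu, sigma) for the normal approximation to Binomial(n, p),
    or None if the rule of thumb np > 5 and n(1 - p) > 5 fails."""
    if n * p > 5 and n * (1 - p) > 5:
        return n * p, sqrt(n * p * (1 - p))
    return None

# Vaccine example from the text: n = 500 adults, p = 0.95 chance of protection.
print(normal_approx_params(500, 0.95))   # np = 475 and n(1 - p) = 25, so the rule holds
print(normal_approx_params(20, 0.95))    # n(1 - p) = 1, so the rule fails -> None
```

For the vaccine example this gives µ = 475 and σ = √23.75, about 4.87.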
When the conditions above hold, the normal distribution approximates the binomial distribution.
In practice, one must keep in mind that the binomial distribution is discrete while the normal
distribution is continuous, so a continuity correction is applied. Assume that the normal
distribution is being used to approximate the binomial distribution of the number of successes X,
and let x denote the corresponding normal variable. If a value of X is the left endpoint of an
interval, we subtract 0.5 to get the corresponding value of x. If a value of X is the right endpoint
of an interval, we add 0.5 to get the corresponding value of x. For instance, if we are interested in
P(X ≤ 6), where X is a binomial variable, we approximate it with the normal variable x by
calculating P(x ≤ 6.5). Similarly, if we are interested in P(X ≥ 9), we approximate it by
calculating P(x ≥ 8.5).
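To see the continuity correction at work, the sketch below compares the exact binomial probability P(X ≥ 480) with its normal approximation P(x ≥ 479.5) for the vaccine example (n = 500, p = 0.95). Reading "480 protected" as "at least 480 protected" is an assumption made for illustration; since 480 is a left endpoint, 0.5 is subtracted. The exact value is computed by direct summation with `math.comb`.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_at_least(k, n, p):
    """Exact P(X >= k) by summing the binomial probabilities."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

def approx_at_least(k, n, p):
    """Normal approximation to P(X >= k): k is a left endpoint, so subtract 0.5."""
    normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return 1 - normal.cdf(k - 0.5)

# Illustrative case (assumed question): at least 480 of 500 vaccinated adults protected.
n, p, k = 500, 0.95, 480
print(f"exact:  {exact_at_least(k, n, p):.4f}")
print(f"normal: {approx_at_least(k, n, p):.4f}")
```

The two printed values should agree to about two decimal places, even though p = 0.95 makes the binomial distribution noticeably skewed.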
Instruction: Sampling Distribution of the Proportion
Assume that 60% of a population has a particular characteristic (while 40% of the
population does not). The parameter π denotes the proportion of the population with the
characteristic, while the statistic p denotes the proportion of members of a sample with the
characteristic. If the sample proportion were calculated for every possible sample of size n, the
distribution of the calculated values of p would be the sampling distribution of the proportion.
Since p equals the binomial number of successes divided by n, this distribution has the shape of
a binomial distribution and, as we have seen, can be approximated by a normal distribution under
certain conditions.
The mean of the sampling distribution of the proportion equals π. The standard
deviation of the sampling distribution of the proportion is called the standard error of the
proportion and is given by the equation below.
The standard error of the proportion, denoted $\sigma_p$, is given by

$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}.$$
Using π for the population proportion, p for the sample proportion, and $\sigma_p$ for the
standard error of the proportion, we obtain the Z-score for the sampling distribution of the
proportion given below.
The Z-score for the sampling distribution of the proportion is given by

$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}.$$
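The two formulas above can be sketched as Python functions. The population proportion π = 0.60 comes from the example in the text, while the sample size n = 100 and the observed sample proportion 0.66 are hypothetical (nπ = 60 and n(1 − π) = 40, so the normal approximation is reasonable). The code writes the sample proportion as `p_hat` to keep it distinct from π, and the tail probability is computed without a continuity correction.

```python
from math import sqrt
from statistics import NormalDist

def standard_error_proportion(pi, n):
    """sigma_p = sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)

def z_score_proportion(p_hat, pi, n):
    """Z = (p - pi) / sigma_p for an observed sample proportion p_hat."""
    return (p_hat - pi) / standard_error_proportion(pi, n)

# pi = 0.60 as in the text; the sample size and sample proportion are hypothetical.
pi, n, p_hat = 0.60, 100, 0.66
se = standard_error_proportion(pi, n)
z = z_score_proportion(p_hat, pi, n)
print(f"sigma_p = {se:.4f}, Z = {z:.3f}, P(p > {p_hat}) ~ {1 - NormalDist().cdf(z):.4f}")
```

With these values σ_p is about 0.049 and Z is about 1.22.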
Assignment 6
Problems
#1
Suppose a team of marine biologists studying trout has determined that the length X of a trout
has a normal distribution with µ = 10.2 inches and standard deviation σ = 1.4 inches.
What is the probability that the mean length of five trout taken at random is between 8 and 12
inches?
#2
A certain strain of bacteria occurs in all raw milk. Let X be the bacteria count per milliliter of
milk. The health department has found that if the milk is not contaminated, then X has a
distribution that is more or less mound-shaped but not symmetric. The mean of the X-distribution
is µ = 2,500 and the standard deviation is σ = 300. In a large commercial dairy
the health inspector takes 42 random samples of the milk produced each day. At the end of the
day the bacteria counts of the 42 samples are averaged to obtain the sample mean bacteria count
X̄.
A) Assuming the milk is not contaminated, what are the mean and standard deviation of the
distribution of X̄?
B) Assuming the milk is not contaminated, what is the probability that the average bacteria
count X̄ for one day is less than 2,650 bacteria per milliliter?
C) At the end of each day, the health inspector should write a report to accept or reject the milk
produced that day. What should the health inspector do if X̄ for the day is greater than
2,650?
#3
The owner of a new apartment building must install twenty-five water heaters. From past
experience in other apartment buildings the owner knows that Sun-Temp is a good brand;
indeed, Consumer Reports has determined that the probability that a Sun-Temp water heater
will last ten years or more is 0.25. What is the probability that eight or more of the owner's
twenty-five new Sun-Temp water heaters will last at least ten years?
#4
According to Horseman's Quarterly, two out of five qualifying racehorses win money. If fifty
qualifying racehorses are selected at random, what is the probability that less than two-thirds of
the fifty horses will win money?