Lecture notes, Lang Wu, UBC
Chapter 5. Sampling Distributions
5.1. Introduction
In statistical inference, we attempt to estimate an unknown population characteristic,
such as the population mean, µ, using data in a sample, such as the sample mean,
x̄. That is, we might use the sample mean, x̄, to estimate the population mean, µ.
The accuracy of this estimation depends on the sample size, n, and the variability of
the data. We can better understand the uncertainty in the estimation, as well as the
basic idea behind statistical inference, by introducing an important concept called the
sampling distribution. A sample statistic (e.g., x̄) can be conceptually viewed as a
random variable, because before we collect the data, we do not know what value the
statistic will take. The statistic might take on any number in a range of values. Thus,
it has a probability distribution, with the probability of certain values higher than the
probability of others. The mean and variance of this distribution can be used to estimate
the accuracy of using this statistic to estimate a population parameter.
The probability distribution of a statistic is called the sampling distribution
of this statistic. In other words, the sampling distribution of a statistic may be viewed
as the distribution of all possible values of this statistic. For example, the sampling
distribution of the sample mean, x̄, is the distribution of all possible values of x̄. So if
we take many samples from the same population and calculate x̄ for each of them, the
values we get will all fall somewhere along the distribution. By examining the sampling
distribution of x̄, we can get an idea of the variability and range of x̄, which are used to
determine the accuracy of using x̄ to estimate µ.
For example, suppose we wish to estimate the average sleep time of all students in
a university. Here, the population is “all students in the university,” and the population
parameter of interest is “average sleep time,” denoted by µ. We can randomly select
10 students from this university and record the average sleep time of these 10 students,
which is the sample mean x̄ with sample size n = 10. Suppose that x̄ = 6.5 (hours) for
these 10 students. If we randomly select another sample of 10 students, we may obtain a
different value of x̄, say, x̄ = 7 (hours). Repeating this procedure many times, we obtain
many values of x̄, such as 6.5, 7, · · · . The probability distribution of all possible values
of x̄ is called the sampling distribution of x̄. This procedure is used as an illustration
Lecture notes, Lang Wu, UBC
2
since it may not be feasible in practice. For some populations, such as a normally
distributed population, we can obtain the sampling distribution of the sample
mean, x̄, via theoretical derivations. We examine this more later.
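For readers who like to experiment, this repeated-sampling idea is easy to simulate. The Python sketch below uses an artificial population of 20,000 sleep times; the normal shape and the values mean = 7 and standard deviation = 1.5 hours are illustrative assumptions, not values from these notes:

```python
import random
import statistics

random.seed(1)

# Hypothetical population: sleep times (hours) of 20,000 students.
# The normal shape and the parameters (mean 7, sd 1.5) are assumptions.
population = [random.gauss(7, 1.5) for _ in range(20_000)]
mu = statistics.mean(population)

# Repeatedly draw samples of size n = 10 and record each sample mean.
n = 10
xbars = [statistics.mean(random.sample(population, n)) for _ in range(5_000)]

# The collected x̄ values approximate the sampling distribution of x̄:
# their average is close to µ, and their spread is much smaller than
# the spread of individual sleep times.
print(round(statistics.mean(xbars), 1))
print(statistics.stdev(xbars) < statistics.stdev(population))
```

The 5,000 simulated values of x̄ form an empirical version of the sampling distribution of x̄, which is exactly the object studied theoretically in the rest of this chapter.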
Note that the population distribution and the sampling distribution are two different concepts. The population distribution refers to the distribution of a characteristic
in the population, while the sampling distribution refers to the distribution of a particular statistic for repeated samples taken from the same population. Note also that if we
randomly choose an individual from the population, the value of their characteristic can
be seen as a random variable, X, whose probability distribution follows the population
distribution.
There are many different sample statistics, so there are many different sampling
distributions. Here, we focus on the sampling distributions of the two most important
statistics:
• the sampling distribution of the sample proportion
• the sampling distribution of the sample mean
We focus on the above two sampling distributions because they are crucial to two respective population distributions: the binomial distribution (for discrete data) and the
normal distribution (for continuous data). The sample proportion is the most important statistic for a population with a binomial distribution, and the sample mean is the
most important statistic for a population with a normal distribution. Moreover, the
sampling distributions of these two statistics can be derived theoretically. For sampling
distributions of other statistics, such as the sample variance, readers are referred to more
advanced textbooks.
5.2. Sampling Distribution of the Sample Proportion
5.2.1. The Binomial Distribution
Before we discuss the sampling distribution of a sample proportion, we first introduce an
important distribution for a discrete binary population: the Bernoulli distribution. In
practice, many random variables take on only two possible values, often denoted by the
Lecture notes, Lang Wu, UBC
3
binary numbers 0 and 1 (or thought of as “success” and “failure”). Random variables
of this nature are said to follow a Bernoulli distribution. For example, a student taking
a course can either pass (1) or fail (0). If you toss a coin, you will get either heads (1)
or tails (0). In an election, a randomly selected person can either vote for candidate A
(1) or vote against candidate A (0). We can view these examples as experiments with
only two possible outcomes, often called Bernoulli trials.
Going back to the example regarding taking a course, let’s say we randomly select
10 students from a large class. Each student can pass or fail the course. We can view this
as an experiment consisting of 10 “trials,” with each trial having two possible outcomes
(pass or fail), and our interest lying in the number of students who pass the course.
Moreover, the 10 students may be viewed as independent and identically distributed.
Independent because they are randomly selected; identically distributed because we do
not know who will be selected, so the probability of passing the course is the same for
all students in the class (e.g., each student has a passing probability of 0.8 and failing
probability of 0.2). The other examples above may be viewed in a similar way.
Example 1. In an election, a recent poll shows that 40% of people will vote for candidate
A. Suppose that three people are randomly selected. (1) What is the probability that
exactly two people vote for candidate A? (2) What is the probability that at least one
person votes for candidate A?
Solution: Here, each person has two options: vote for candidate A or vote for someone
else, so we can view each person as a random variable that follows a Bernoulli distribution. We can assume the three people are independent. Let Xi = 1 if person i votes for
candidate A and Xi = 0 otherwise, i = 1, 2, 3. Let X be the total number of people,
among the three who were selected, who vote for candidate A. Then, X = X1 + X2 + X3 ,
with X = 3 meaning all three people vote for candidate A.
(1) The probability that exactly two people vote for candidate A is given by

P(X = 2) = C(3, 2) × 0.4² × (1 − 0.4) = 0.288,

where the term C(3, 2) = 3 is the number of possible ways to have 2 out of 3 people vote for candidate A, and the term 0.4² × (1 − 0.4) is the probability that 2 particular people vote for candidate A and the other one does not, assuming the 3 people are independent (so we can use the multiplication rule and multiply the probabilities).
(2) The probability that at least one person votes for candidate A is
P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.6³ = 0.784.
Alternatively, we can use P (X ≥ 1) = P (X = 1) + P (X = 2) + P (X = 3) to get the
same answer, but the computation is more tedious.
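These binomial calculations are easy to check by computer. The following Python sketch (assuming Python 3.8+, where `math.comb` is available) reproduces both answers of Example 1:

```python
from math import comb  # Python 3.8+

# Example 1: X ~ B(3, 0.4) is the number of votes for candidate A.
def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(2, 3, 0.4), 3))      # P(X = 2) = 0.288
print(round(1 - binom_pmf(0, 3, 0.4), 3))  # P(X >= 1) = 0.784
```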
In general, we can consider n independent and identically distributed Bernoulli
trials at once, where each trial has only two possible outcomes (“success” or “failure”).
We are often interested in the probability of a certain number of “successes.” Let p be
the probability of success for each trial, and let X be the total number of successes.
Then, the probability distribution of X is given by

P(X = k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ,    k = 0, 1, 2, · · · , n,

where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient (the number of ways to choose k trials out of n).
The above distribution is called the binomial distribution, denoted by X ∼ B(n, p),
or X ∼ Bin(n, p). Thus, a Binomial distribution is determined by two numbers: the
number of trials, n, and the probability of success, p, with p being the only unknown
parameter (since n is usually known). This is different from the normal distribution
N (µ, σ), which is determined by two unknown parameters: the mean, µ, and the standard deviation, σ.
Remarks: 1). In practice, a binomial random variable X arises in the following settings: i) there are n i.i.d. Bernoulli trials, with n known and fixed; ii) the probability of
“success” p is the same for each trial; iii) X is the number of “successes” out of the n
trials.
2). The above n trials may be viewed as a sample of size n. The number of successes,
X, is the sample count. The proportion X/n, denoted by p̂, is the sample proportion,
and it indicates the proportion of the sample trials that were successful (i.e., the number
of successes divided by the total number of trials). The probability of success, p, is
the population proportion, and it represents the (usually unknown) true proportion of
success in the population. Since X is a count from a sample, the distribution of X may
be viewed as the sampling distribution of a count. Remember that X follows a binomial
distribution.
Theorem 1. If X ∼ B(n, p), then
E(X) = np,    Var(X) = np(1 − p).
Thus, for a binomial random variable, X, we can immediately obtain its mean and
variance using the above formulas.
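Theorem 1 can be verified numerically for any particular (n, p) by computing the mean and variance directly from the probability distribution of X. The Python sketch below does this for the illustrative choice n = 10, p = 0.3 (these numbers are not from the notes):

```python
from math import comb  # Python 3.8+

# Numerical check of Theorem 1: compute E(X) and Var(X) directly from
# the binomial pmf and compare with the formulas np and np(1 - p).
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))

print(abs(mean - n * p) < 1e-9)           # E(X) = np
print(abs(var - n * p * (1 - p)) < 1e-9)  # Var(X) = np(1 - p)
```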
Example 2. Suppose the probability of getting a certain disease is 0.001, and suppose
50 people are randomly selected.
(1) What is the probability of exactly one person having the disease?
(2) What is the probability of at least one person having the disease?
(3) How many people should be selected so there is a 90% chance of at least one of them
having the disease?
(4) Find the mean and standard deviation of the number of people who have the disease
among the 50 people.
Solution: Each randomly selected person either has the disease or does not have the
disease. Let X be the number of people who have the disease among n randomly selected
people. We are working with a binomial distribution where n = 50 and p = 0.001.
(1) The probability that exactly one person has the disease is given by

P(X = 1) = C(50, 1) × 0.001 × 0.999⁴⁹ = 0.0476.
(2) The probability that at least one person has the disease is given by

P(X ≥ 1) = 1 − P(X = 0) = 1 − C(50, 0) × 0.001⁰ × 0.999⁵⁰ = 0.0488.
(3) In this case, n is unknown and needs to be determined. We need to find the value
of n so that P(X ≥ 1) = 0.9, i.e., 1 − P(X = 0) = 0.9 or P(X = 0) = 0.1. Thus

C(n, 0) × 0.001⁰ × 0.999ⁿ = 0.999ⁿ = 0.1,
i.e.,
n log(0.999) = log(0.1).
Solving the above equation, we have

n = log(0.1)/log(0.999) ≈ 2301.4.

That is, we must select at least 2302 people to ensure there is a 90% chance of at least one of them having the disease.
(4) When n = 50 and p = 0.001, the mean and standard deviation of X are given by
E(X) = np = 50 × 0.001 = 0.05,
σX = √(Var(X)) = √(np(1 − p)) = √(50 × 0.001 × 0.999) ≈ 0.22.
For n = 50, we have a mean of 0.05 people having the disease and a standard deviation
of 0.22 people.
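All four parts of Example 2 can be reproduced with a few lines of Python (`math.comb` requires Python 3.8+). Note that the exact value of log(0.1)/log(0.999) is about 2301.4, so the smallest sufficient sample size in part (3) is 2302:

```python
from math import comb, log, ceil  # Python 3.8+

p, n = 0.001, 50

# (1) P(X = 1) and (2) P(X >= 1) for X ~ B(50, 0.001)
p_one = comb(n, 1) * p * (1 - p) ** 49
p_atleast = 1 - (1 - p) ** n
print(round(p_one, 4))      # 0.0476
print(round(p_atleast, 4))  # 0.0488

# (3) smallest n with P(X >= 1) >= 0.9, i.e. with 0.999**n <= 0.1
n_needed = ceil(log(0.1) / log(1 - p))
print(n_needed)             # 2302

# (4) mean and standard deviation of X when n = 50
print(round(n * p, 2), round((n * p * (1 - p)) ** 0.5, 2))  # 0.05 0.22
```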
Example 3. The probability of a battery life exceeding 4 hours is 0.135. There are
three batteries in use. (1) Find the probability that at most 2 batteries last for 4 or more
hours; (2) Find the mean and standard deviation of the number of batteries lasting 4 or
more hours.
Solution: A battery’s life will either exceed 4 hours or not exceed 4 hours. Let X be
the number of batteries lasting 4 or more hours. Here we have n = 3 and p = 0.135.
Thus,
(1) P(X ≤ 2) = 1 − P(X = 3) = 1 − 0.135³ × (1 − 0.135)⁰ = 0.997.
(2)
E(X) = np = 3 × 0.135 = 0.405,
σX = √(np(1 − p)) = √(3 × 0.135 × 0.865) ≈ 0.59.
For n = 3, the mean number of batteries lasting 4 or more hours is 0.405, and the standard deviation is 0.59 batteries.
5.2.2. Sampling Distribution of the Sample Proportion
A major goal in statistics is to make inferences about unknown population parameters.
We do this by using sample statistics to estimate corresponding population parameters.
For example, we might use sample proportions to estimate population proportions or use
sample means to estimate population means. There is uncertainty in these estimations
because the value of a statistic will vary from one sample to the next. To measure the
uncertainty of each estimation, we look at the variability of the statistic (i.e., how much
its value might vary from one sample to the next). To do this, we need to find the
distribution of the sample statistic that is used to estimate the population parameter.
This distribution is called the sampling distribution of the corresponding statistic.
In this section, we consider a discrete population that follows a Bernoulli distribution (i.e., a population that is split into two groups, or a binary population), as
described in the previous section. For a population that follows a Bernoulli distribution,
the parameter of interest is the population proportion, p, which is the proportion of
“success” in the population (or the proportion of the population with the attribute of
interest). Recall the difference between a proportion and a percentage: a percentage is
a proportion multiplied by 100. A proportion is a number between 0 and 1, while a
percentage is a number between 0 and 100. Examples of population proportions include
the proportion of people who are literate, the proportion of people who smoke, the proportion of people with cancer, etc. Recall also the difference between a parameter and
a statistic: a parameter is a population characteristic, while a statistic is a
function or measure of data in a sample. The difference between the population proportion, p, and the sample proportion, p̂, is the difference between
a parameter and a statistic.
Let p be an (unknown) population proportion of success. We select a sample of
size n and think of it as n independent Bernoulli trials, with x being the number of
successes. Using the information in the sample, we can calculate the sample proportion
(denoted by p̂):
p̂ = x/n = (number of “successes” in the sample)/(sample size).

We can then use p̂ as an estimate of p. For example, if the unknown parameter p is the proportion of people who smoke in Canada, then perhaps p̂ is the proportion of people who smoke in a randomly selected sample of n individuals in Canada. Here, p̂ is a
number we can calculate and it gives us an estimate of p. From the previous section, we
know that before we collect the data, the number of successes, X, is a random variable
that follows a Binomial distribution. Once we have the data, we are interested in the
distribution of the sample proportion p̂ (i.e., the sampling distribution of p̂), which is
unknown. Remember that the sampling distribution of p̂ is the distribution of all
possible values of p̂ if p̂ is calculated for an infinite number of samples of equal size taken
from the same population. This distribution will allow us to be fairly confident that the
actual value of p lies within a certain interval.
The distribution of p̂ is difficult to find, so we often approximate it with the normal
distribution, as described below in Theorem 3. In addition, the mean and standard
deviation of the distribution of p̂ can be easily found, as shown in Theorem 2 below.
Note that the normal distribution is completely determined by its mean and standard
deviation, but this property does not hold for all distributions.
Theorem 2. The mean and variance of the sampling distribution of the sample proportion p̂ are respectively given by
E(p̂) = p,    Var(p̂) = p(1 − p)/n,
where p is the population proportion.
Note that Theorem 2 only gives the mean and variance (or standard deviation) of
the distribution of p̂. We still do not know what the exact distribution of p̂ is. However,
Theorem 3 below shows that the distribution of p̂ can be approximated by a normal
distribution.
Theorem 3. If the sample size n is sufficiently large such that
np ≥ 10
and
n(1 − p) ≥ 10,
then
(i) the sampling distribution of the sample proportion, p̂, can be approximated by the normal distribution N(p, √(p(1 − p)/n));
(ii) the distribution of the number of “successes,” X, can be approximated by the normal distribution N(np, √(np(1 − p))).
Theorem 3 shows that both p̂ and X may be approximated by normal distributions when the sample size, n, is large. Here, “large” means np ≥ 10 and n(1 − p) ≥ 10.
Some books use the condition np ≥ 5 and n(1 − p) ≥ 5. Readers should not worry about
the specific numbers 5 or 10. The key thing is, in order for the normal approximations to be accurate, n should be large and p should not be too close to
0 or 1. The larger the sample size, n, the more accurate the normal approximations.
Theorem 3 (ii) can be used to quickly calculate binomial probabilities. We know that X
follows B(n, p). However, computation of probabilities such as P (X < k) can be quite
tedious if k is not small. For example, P (X < 10) requires computation of 10 binomial
probabilities that are then added together. If we instead use the normal approximation
in Theorem 3 (ii), the normal distribution will quickly give us an approximate answer
to P (X < 10), as shown in the examples below.
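As a concrete comparison (not from the notes; Python 3.8+ assumed), the sketch below computes P(X < 10) for X ∼ B(100, 0.2) both exactly and via the normal approximation of Theorem 3(ii). The two values differ noticeably because this probability sits far out in the tail, where the approximation is roughest:

```python
from math import comb, erf, sqrt  # comb requires Python 3.8+

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p), summing k + 1 binomial terms."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_cdf(x: float, mu: float, sd: float) -> float:
    """Normal cdf Φ((x - mu)/sd), computed via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sd * sqrt(2))))

n, p = 100, 0.2
# P(X < 10) = P(X <= 9): exact sum of 10 terms
exact = binom_cdf(9, n, p)
# Theorem 3(ii): X is approximately N(np, sqrt(np(1 - p))) = N(20, 4)
approx = normal_cdf(10, n * p, sqrt(n * p * (1 - p)))
print(round(exact, 4), round(approx, 4))
```

A continuity correction (evaluating the normal cdf at 9.5 rather than 10) would bring the two values closer; many texts cover this refinement.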
We will explore the idea of inferring population parameters from sample statistics
in more detail in the next chapter. For now, we focus on familiarizing ourselves with the
relationships between parameters and sampling distributions (Theorem 2), as well as how
information can be gathered from an approximated sampling distribution (Theorem 3).
In the following examples, the population proportion, p, is already known, so we explore
how knowing this proportion can allow us to approximate the sampling distribution of
p̂ and then gather information from it.
Example 4. Suppose 20% of people in a certain city smoke. A sample of 100 people is randomly selected from this city. Find the probability that more than 30% of people in this sample smoke.
Solution: Here the population proportion is known to be p = 0.2, and the sample size
is n = 100. The sample proportion p̂ can be viewed as a random variable before we
observe the data in the sample. Since np = 20 > 10 and n(1 − p) = 80 > 10, we can use
a normal approximation to find the probability P (p̂ > 0.3). By Theorem 3, we have,
approximately,

p̂ ∼ N(0.2, √(0.2 × 0.8/100)) = N(0.2, 0.04).
Thus, an approximation of P(p̂ > 0.3) is given by

P(p̂ > 0.3) = P((p̂ − 0.2)/0.04 > (0.3 − 0.2)/0.04)    (standardization)
           ≈ P(Z > 2.5) = P(Z ≤ −2.5) = 0.0062,
where we first use standardization (i.e., subtract the mean and divide by the standard
deviation) to get to the standard normal distribution, and then look up the probability
for the specific z value.
Note that for this problem, we can also do exact computation using the binomial distribution (where n = 100 and we are finding the probability that more than 30 people
smoke):
P(p̂ > 0.3) = P(X > 0.3 × 100) = P(X > 30)
           = P(X = 31) + P(X = 32) + · · · + P(X = 100)
           = C(100, 31) 0.2³¹ × 0.8⁶⁹ + C(100, 32) 0.2³² × 0.8⁶⁸ + · · · + C(100, 100) 0.2¹⁰⁰ × 0.8⁰,
which is very tedious to compute!
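For a computer, the "tedious" exact sum is a one-liner, which also gives a useful check on the approximation (sketch assumes Python 3.8+):

```python
from math import comb  # Python 3.8+

# Exact P(p̂ > 0.3) = P(X > 30) for X ~ B(100, 0.2): sum the 70 terms
# that the text abbreviates with "· · ·".
exact = sum(comb(100, k) * 0.2**k * 0.8**(100 - k) for k in range(31, 101))
print(round(exact, 4))  # close to the normal-approximation value 0.0062
```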
Example 5. A fair coin is tossed 60 times. Find the probability that less than 1/3 of
the results are heads.
Solution: Let X be the number of heads. Here we have n = 60, p = 0.5, and np = n(1 − p) = 30 > 10, so we can use a normal approximation:

p̂ ∼ N(0.5, √(0.5 × 0.5/60)) = N(0.5, 0.0645).
Thus

P(p̂ < 1/3) = P((p̂ − 0.5)/0.0645 < (1/3 − 0.5)/0.0645) = P(Z < −2.58) = 0.0049.
This problem can also be solved exactly using binomial distributions, but the computation is again very tedious. The general method for these types of problems is to
approximate the binomial distribution with the normal distribution (after checking all
requirements are met), convert this distribution to a standard normal distribution using standardization, and then look up the probability for the resulting z value using a
standard normal table or statistical software.
5.3. The Sampling Distribution of a Sample Mean
In the previous section, we considered (discrete) binary populations that follow Bernoulli
distributions, as well as the sampling distribution of the sample proportion. In this section, we consider a population distribution that is continuous and has mean µ and
standard deviation σ. The population is not necessarily normally distributed. (Remember that a normal distribution is completely determined by µ and σ but a general
continuous distribution may not be completely determined by µ and σ.) The parameters µ and σ are unknown. We will use the sample mean, x̄, as an estimate of the
population mean, µ. To measure the accuracy of this estimation, we need to find the
sampling distribution of the sample mean x̄, i.e., the distribution of all possible values
of the sample mean, x̄, if infinitely many samples of equal size are taken from the same
population and the mean is calculated for each of them. (Note: when we talk about
the sampling distribution of x̄, we are viewing x̄ as a random variable because we are
considering all possible samples. If we instead focus on a specific sample with observed
data, then x̄ is a number.)
When the population distribution is unknown (except that it is continuous), the
exact sampling distribution of the sample mean, x̄, cannot be known either. However,
if we know the population parameters, we can still obtain the mean and standard deviation of the sampling distribution of the sample mean, x̄, as shown in the theorem
below. Moreover, when the sample size is large, we can use a normal distribution to
approximate the sampling distribution of the sample mean, x̄.
Theorem 4. Consider a continuous population with mean µ and standard deviation σ.
When the population distribution is unknown, we have
(i) the mean of all possible values of x̄ (i.e., the mean of the sampling distribution of x̄,
or the mean of the sample mean) is equal to the population mean:
E(x̄) = µ;
(ii) the standard deviation of all possible values of x̄ (i.e., the standard deviation of the sampling distribution of x̄, or the standard deviation of the sample mean) is √n times smaller than the population standard deviation:

σx̄ = σ/√n,    or    Var(x̄) = σ²/n.
As you can see, the formulas for the mean and standard deviation of the sample
mean distribution depend on the population µ and σ. This shows the relationship
between the parameters and the sampling distribution of the sample mean. In practice,
however, the parameters µ and σ are usually unknown, so we must estimate them using
the statistics we have from a sample. We use the sample mean, x̄, to estimate the
population mean, µ. Plugging x̄ instead of µ into Theorem 4(i), we get an estimate of
the mean of the sampling distribution of the sample mean. Similarly, we use the sample
standard deviation, σ̂ = s, to estimate the population standard deviation, σ. Plugging
σ̂ instead of σ into Theorem 4(ii), we get an estimate of the standard deviation of the
sampling distribution of the sample mean. We call this estimate the standard error of
the sample mean x̄, given by
σ̂x̄ = σ̂/√n.

In other words, the standard error is an estimate of the standard deviation of the sample mean distribution. Since σx̄ = σ/√n,
the larger the sample size, n, the smaller the standard
error of the distribution of x̄ is (i.e., less variability in x̄), and so the more accurate x̄ is
as an estimate of µ. As an example, suppose that you wish to get an accurate measure
of your blood pressure. One way to increase your accuracy is to measure your blood
pressure as many times as possible and then take an average of the measurements.
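The 1/√n shrinkage of σx̄ is easy to see by simulation. In the sketch below, the blood-pressure population is an illustrative assumption (normal with mean 120 and standard deviation 10); for each sample size, the empirical standard deviation of many simulated sample means comes out close to σ/√n:

```python
import random
import statistics

random.seed(2)

# Illustrate σ_x̄ = σ/√n with simulated blood-pressure readings.
# The population parameters (mean 120, sd 10) are assumptions.
sigma = 10.0
spreads = {}
for n in [4, 16, 64]:
    xbars = [statistics.mean(random.gauss(120, sigma) for _ in range(n))
             for _ in range(4_000)]
    spreads[n] = statistics.stdev(xbars)
    print(n, round(spreads[n], 2))  # roughly 10/sqrt(n): 5.0, 2.5, 1.25
```

Quadrupling the sample size halves the standard deviation of x̄, which is exactly the pattern Theorem 4(ii) predicts.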
In Theorem 4, we give the mean and standard deviation of the distribution of the
sample mean, x̄. We still do not know the exact distribution of x̄ since the population
distribution is unknown and mean and standard deviation cannot completely determine a
continuous distribution (unless it is a normal distribution). However, if the population
distribution is known to be normal or if the sample size, n, is large, the
distribution of x̄ is either exactly or approximately normal, as shown in the
theorem below.
Theorem 5. (i) If the population follows a normal distribution, N (µ, σ), then the
sample mean x̄ also follows a normal distribution exactly:
x̄ ∼ N(µ, σ/√n).
(ii) If the population distribution is unknown but the sample size, n, is large (say, n ≥
25), then the sample mean, x̄, approximately follows the following normal distribution
x̄ ∼ N(µ, σ/√n),

which is the same distribution as the one in (i).
Based on Theorem 5, when the sample size, n, is reasonably large, the distribution of the sample mean, x̄, will approximately follow the distribution N(µ, σ/√n). Some
books use n ≥ 25 as a condition and some books use n ≥ 30 or another number as a
condition. Readers should not worry too much about the specific number, since it just
sets a benchmark of accuracy for the normal approximation. The larger the value of
n, the more accurately the normal distribution will approximate the distribution of x̄.
Generally, if n < 10, the normal approximation may be poor.
Example 6. Suppose the weights of all adults in a large city form a distribution with
mean µ = 140 (pounds) and standard deviation σ = 20 (pounds). A sample of 25 adults
in the city is randomly selected. Find the probability that the mean weight of the adults
in the sample is at least 144 pounds.
Solution: Here, we know the value of the parameters µ and σ, so we can calculate the mean and standard deviation of the distribution of the sample mean, x̄: E(x̄) = µ = 140 and σx̄ = σ/√n = 20/√25 = 4. Since n = 25, we can approximate the sample mean
distribution by a normal distribution:
x̄ ∼ N (140, 4).
Now that we have approximated the sample mean distribution, we can calculate probabilities of certain values. We have
P(x̄ ≥ 144) = P(Z ≥ (144 − 140)/4) = P(Z ≥ 1) = 0.1587.
Example 7. The weights of large eggs follow a normal distribution with a mean of 1
oz and a standard deviation of 0.1 oz. What is the probability that a dozen (12) eggs
weigh more than 13 oz?
Solution: We are given the population mean, standard deviation, and distribution, so
we can directly use the above theorems. Since the population follows N (1, 0.1), the
sample mean x̄ follows N(1, 0.1/√12), or N(1, 0.029). Let Xi be the weight of egg i, i = 1, 2, · · · , 12. Then, the total weight of the 12 eggs is X1 + X2 + · · · + X12, and the mean weight is (X1 + X2 + · · · + X12)/12 = x̄. Thus,

P(X1 + X2 + · · · + X12 > 13) = P(x̄ > 13/12) = P(x̄ > 1.083)
                             = P(Z > (1.083 − 1)/0.029)
                             = P(Z > 2.86) = 0.0021.
In this example, the sample size n = 12 is not large, but we know the population
distribution, so we have the exact sampling distribution for the sample mean, x̄.
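Both Example 6 and Example 7 reduce to evaluating a normal upper-tail probability, which can be done without a table using the error function (via the standard identity Φ(z) = (1 + erf(z/√2))/2). A Python sketch:

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sd: float) -> float:
    """Normal cdf Φ((x - mu)/sd), computed via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sd * sqrt(2))))

# Example 6: x̄ ~ N(140, 4), so P(x̄ >= 144) is an upper-tail probability.
p6 = 1 - normal_cdf(144, 140, 4)
print(round(p6, 4))  # 0.1587

# Example 7: x̄ ~ N(1, 0.1/sqrt(12)); P(total > 13) = P(x̄ > 13/12).
p7 = 1 - normal_cdf(13 / 12, 1, 0.1 / sqrt(12))
print(round(p7, 4))  # near 0.002; small differences from the worked
                     # solution come from intermediate rounding there
```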
5.4. The Central Limit Theorem
In the previous sections, we have seen that regardless of whether the population is
discrete or continuous, the distributions of the sample proportion and sample mean
can be approximated by normal distributions when the sample sizes are large. There
is a reason for this – the normal approximations are justified by the so-called central
limit theorem (CLT). The central limit theorem is one of the most important theorems
in Statistics. Basically, the CLT says that, no matter what the population
distribution may be, when the sample size is sufficiently large, the mean of
i.i.d. random variables will be approximately normally distributed.
Note that both the sample proportion, p̂, and the sample mean, x̄, can be written
as means of independent and identically distributed (i.i.d.) random variables. This is
obvious for the sample mean, x̄. We can see that the sample proportion p̂ can also be
written as a mean:
p̂ = (x1 + x2 + · · · + xn)/n,

where each xi only takes on a value of 0 or 1. Note also that a simple random sample (SRS) {x1, x2, · · · , xn} can be viewed as having i.i.d. random variables,
as noted earlier.
The Central Limit Theorem (CLT). The CLT can be stated as follows:
(i) If a continuous population has mean µ and standard deviation σ, when the sample size n in an SRS is large, the sample mean approximately follows the normal distribution

x̄ ∼ N(µ, σ/√n).
(ii) If a binary (or Bernoulli) population has proportion of “success” p, when the sample size n in an SRS is large, the sample proportion approximately follows the normal distribution

p̂ ∼ N(p, √(p(1 − p)/n)).
Remark: In the CLT above, the sample size, n, needs to be large in order for the
normal approximations to be accurate. For a continuous population, we usually need
n ≥ 25, while for a binary population, we need np ≥ 10 and n(1 − p) ≥ 10. These
are rough guidelines. The larger n is, the more accurate the normal distributions are
as approximations. An SRS ensures i.i.d. random variables because each individual is
randomly selected. Note that, for a continuous population, we do not need to know
the population distribution when applying the CLT. The CLT not only holds for
binary and continuous populations, but also holds for other populations, such
as counts. The key here is that the data in the sample must be i.i.d. (e.g.,
in an SRS,) and the statistic must be a sum or a mean.
The CLT can be used to provide an approximate distribution for a
statistic if the statistic can be written as a mean (or a sum) of i.i.d. random
variables. Since many statistics may be expressed (or approximated) as sums or means
of i.i.d. random variables, many statistics may be assumed to approximately follow
normal distributions. This explains why the normal distribution is the most common
distribution in statistics. However, some statistics, such as the median or the sample
standard deviation, cannot be written as a sum or mean of i.i.d. random variables. When
this is the case, the CLT cannot be used, so these statistics will not approximately follow
normal distributions even when the sample size is large.
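A small simulation makes the CLT concrete. The population below is deliberately non-normal (an exponential distribution, chosen here as an illustration; it has mean 1 and standard deviation 1), yet the means of samples of size 40 behave very much like a normal distribution:

```python
import random
import statistics

random.seed(3)

# CLT sketch: a strongly skewed population (exponential, mean 1, sd 1).
# Means of n = 40 i.i.d. draws should be roughly N(1, 1/sqrt(40)).
n = 40
xbars = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(3_000)]

m, s = statistics.mean(xbars), statistics.stdev(xbars)
# Empirical check of the 68-95-99.7 rule on the simulated x̄ values:
within2 = sum(abs(x - m) < 2 * s for x in xbars) / len(xbars)
print(round(m, 2), round(within2, 2))  # mean near 1; about 95% within 2 SDs
```

Even though each individual observation is far from normal, the histogram of the simulated x̄ values would look close to a bell curve, which is exactly what the CLT promises.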
The sampling distributions of the sample proportion and the sample mean are
examples of applications of the CLT. We give one more example below.
Example 8. Suppose the scores in a standard test have an average of 500 and a standard deviation of 60. A group of 49 students take the test. (1) What is the
probability that the average score of the group will fall between 480 and 520? (2) Find a
range of scores such that the group average will fall within this range with a probability
of 0.95.
Solution: In this example, we do not know the exact population distribution, but we
know it is continuous and has mean µ = 500 and standard deviation σ = 60. We can
assume the 49 students are an SRS. Since the sample size is 49, which is large, we may
apply the central limit theorem and approximate the distribution of the sample mean
by a normal distribution.
Let x̄ be the group mean. Then, the distribution of x̄ can be approximated by
x̄ ∼ N(500, 60/√49), i.e., N(500, 60/7).
(1) We have

P(480 < x̄ < 520) = P((480 − 500)/(60/7) < Z < (520 − 500)/(60/7))
                 = P(−2.33 < Z < 2.33)
                 = 2P(0 < Z < 2.33) = 2(P(Z < 2.33) − 0.5) = 0.9802.
(2) From (1), x̄ ∼ N(500, 60/7) approximately. By the 68-95-99.7 rule for a normal distribution, we have

P(µ − 2σx̄ < x̄ < µ + 2σx̄) ≈ 0.95.

So

2σx̄ = 2 × 60/7 = 17.14,
and
500 − 17.14 = 482.86,
500 + 17.14 = 517.14.
Thus, with probability 0.95, x̄ will fall between 482.86 and 517.14. In this example, we do not have to use the 68-95-99.7 rule. If we use a standard normal table, then we should replace 2 with 1.96 in the above calculations.
Note that a continuous population can be converted into a binary population. For
example, in the above example, if we are only interested in the proportion of students
who scored over 600, then we have a binary population. The corresponding sample can
also be converted into binary data: each student’s score is either above 600 or below
600. When we convert continuous data into binary data, we will lose some information.
However, sometimes we are only interested in certain pieces of information, such as if
a student’s score is above 600 or not. In this sense, we do not actually lose any crucial
information.
5.5. Chapter Summary
In this chapter, we examined the sampling distributions of the sample proportion, p̂,
and the sample mean, x̄. These sampling distributions are important when making
statistical inferences for the unknown population proportion, p, or population mean, µ,
as will be shown in the next few chapters. When the sample size is large, the sampling
distributions of p̂ and x̄ can be approximated by normal distributions, which can then
be used in statistical inference. When the sample size is small, we must know the
population distributions in order to know the sampling distributions. The CLT can be
used to approximate the sampling distribution of a statistic if the statistic can be written
as a sum or mean of i.i.d. random variables.
5.6. Review Questions
1. What is a sampling distribution? Why do we need to consider sampling distributions?
2. Can you think of a sample that does not have i.i.d. random variables?
3. Can we use the CLT to find the sampling distribution of a sample correlation r?
Why?
4. I have a box containing a number of tickets numbered between -10 and +10. The
mean of the numbers is 0 and the standard deviation is 5. I am going to make
a number of draws, with replacement, from the box. If the mean of the numbers
that I draw falls between -1 and +1, I win and you will give me $10. Otherwise,
you win and I will give you $10. Which of the following number of draws will give
you the best chance of winning?
A. 10
B. 20
C. 100
D. There is insufficient information to tell
5. Suppose the daily precipitation in a city in December is uniformly distributed
between 0mm and 15mm. For the month of December (with 31 days), what is the
probability that the daily precipitation is less than 10 mm on at least 20 days?
Assume the daily precipitation for the different days are independent. Choose the
most appropriate answer.
A. Less than 0.16
B. Between 0.16 and 0.5
C. Between 0.5 and 0.84
D. Between 0.84 and 0.975
E. Greater than 0.975
6. True or false: For a continuous population, the sampling distribution of the sample
mean has the same mean as the population mean but has a smaller standard
deviation as long as the sample size is larger than 1.
7. True or false: The sample mean always under-estimates the population mean
because of sampling variation.
8. True or false: If the population is uniformly distributed on an interval, the sample
mean of a sample taken from this population will still be approximately normally
distributed if the sample size is large (say larger than 30).
9. The waiting time for a bus follows a uniform distribution with a mean of 5 hours
and a standard deviation of 1 hour. A student takes the bus 100 times in a
semester. There is a 95% chance that the average waiting time for this student
during that semester is approximately within which of the following numbers of hours of 5 hours?
(a) 0.1
(b) 0.2
(c) 2
(d) 1