Sample Distribution of the Mean and the
Proportion
Daniel Royer
Geneva Business School
April 2016
1 Sampling Distribution of the Mean
The sample mean is unbiased because the mean of all the possible sample means (of
a given sample size, n) is equal to the population mean, µ.
The value of the standard deviation of all possible sample means, called the standard
error of the mean, expresses how the sample means vary from sample to sample.
The following equation defines the standard error of the mean when sampling with
replacement or sampling without replacement from large or infinite populations.
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
where σ is the population standard deviation and n is the sample size.
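Both properties can be checked exactly on a population small enough to list every possible sample. The sketch below is only an illustration: the population values and the sample size are made up, and sampling is with replacement so that the formula above applies exactly.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

# Hypothetical tiny population, chosen only so every sample can be listed.
population = [1, 2, 3, 4]
n = 2  # sample size

mu = fmean(population)      # population mean
sigma = pstdev(population)  # population standard deviation

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mean_of_means = fmean(sample_means)  # should equal mu (unbiasedness)
std_error = pstdev(sample_means)     # should equal sigma / sqrt(n)

print(mean_of_means, mu)
print(std_error, sigma / sqrt(n))
```

The first printed pair shows that the mean of all sample means equals the population mean; the second shows that their standard deviation equals σ/√n.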
• If the population’s distribution is Normal, with mean µ and standard deviation
σ, then regardless of the sample size n the sampling distribution of the mean
is normally distributed with mean $\mu_{\bar{X}} = \mu$ and standard error of the mean
$\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
• However, in many instances either you know that the population is not normally
distributed or it is unrealistic to assume that the population is normally distributed.
• An important theorem in statistics, the Central Limit Theorem, deals with this
situation.
Example
Oxford Cereals fills thousands of boxes of cereal during an eight-hour shift. As the
plant operations manager, you are responsible for monitoring the amount of cereal
placed in each box. To be consistent with package labeling, boxes should contain a
mean of 368 grams of cereal. Because of the speed of the process, the cereal weight
varies from box to box, causing some boxes to be underfilled and others overfilled.
If the process is not working properly, the mean weight in the boxes could vary too
much from the label weight of 368 grams to be acceptable.
Because weighing every single box is too time-consuming, costly, and inefficient, you
must take a sample of boxes. For each sample you select, you plan to weigh the
individual boxes and calculate a sample mean. You need to determine the probability that such a sample mean could have been randomly selected from a population
whose mean is 368 grams. Based on your analysis, you will have to decide whether to
maintain, alter, or shut down the cereal-filling process.
If you randomly select a sample of 25 boxes without replacement from the thousands
of boxes filled during a shift, the sample contains far less than 5% of the population.
Given that the standard deviation of the cereal-filling process is 15 grams, the standard
error of the mean is equal to
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = \frac{15}{5} = 3 \]
How can you determine the probability that the sample of 25 boxes will have a mean below
365 grams ?
To find the area below 365 grams, you compute
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{365 - 368}{15/\sqrt{25}} = \frac{-3}{3} = -1.00 \]
The area corresponding to Z = -1.00 is 0.1587. Therefore, 15.87% of all possible
samples of 25 boxes have a sample mean below 365 grams.
If you select a sample of 100 boxes, what is the probability that the sample mean is
below 365 grams ?
\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{365 - 368}{15/\sqrt{100}} = \frac{-3}{1.5} = -2.00 \]
The area less than Z = -2.00 is 0.0228. Therefore, 2.28% of the samples of 100 boxes
have means below 365 grams, as compared with 15.87% for samples of 25 boxes.
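The two probabilities above can be reproduced without a table. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_mean_below` is ours, not part of the original notes, and it assumes the sampling distribution of the mean is normal.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_below(x_bar, mu, sigma, n):
    """P(sample mean < x_bar), assuming a normal sampling distribution."""
    std_error = sigma / sqrt(n)  # standard error of the mean
    return NormalDist(mu, std_error).cdf(x_bar)

# Oxford Cereals values from the example: mu = 368 g, sigma = 15 g.
p25 = prob_mean_below(365, mu=368, sigma=15, n=25)
p100 = prob_mean_below(365, mu=368, sigma=15, n=100)

print(round(p25, 4))   # area below Z = -1.00
print(round(p100, 4))  # area below Z = -2.00
```

Quadrupling the sample size halves the standard error, which is why the same 3-gram shortfall corresponds to Z = -2.00 instead of Z = -1.00.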
Sometimes you need to find the interval that contains a fixed proportion of the sample
means. You need to determine a distance below and above the population mean
containing a specific area of the normal curve.
\[ \bar{X} = \mu + Z \frac{\sigma}{\sqrt{n}} \]
In our example, find an interval symmetrically distributed around the population
mean that will include 95% of the sample means, based on samples of 25 boxes.
If 95% of the sample means are in the interval, then 5% are outside the interval. Divide
the 5% into two equal parts of 2.5%. The value of Z corresponding to an area of 0.0250
in the lower tail of the normal curve is -1.96, and the value of Z corresponding to
a cumulative area of 0.9750 (i.e., 0.0250 in the upper tail of the normal curve) is
+1.96. The lower value of X̄ (called $\bar{X}_L$) and the upper value of X̄ (called $\bar{X}_U$) are
\[ \bar{X}_L = 368 + (-1.96)\frac{15}{\sqrt{25}} = 362.12 \]
\[ \bar{X}_U = 368 + (1.96)\frac{15}{\sqrt{25}} = 373.88 \]
Therefore, 95% of all sample means, based on samples of 25 boxes, are between 362.12
and 373.88 grams.
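The same interval can be computed by looking up Z with an inverse normal function instead of a table. This is a sketch under the same normality assumption; `mean_interval` is a hypothetical helper name introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval(mu, sigma, n, coverage):
    """Symmetric interval around mu containing `coverage` of the sample means."""
    std_error = sigma / sqrt(n)
    # Z with cumulative area 0.5 + coverage/2 (0.975 when coverage is 0.95).
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return mu - z * std_error, mu + z * std_error

lower, upper = mean_interval(mu=368, sigma=15, n=25, coverage=0.95)
print(round(lower, 2), round(upper, 2))
```

Because `inv_cdf` returns Z to more decimal places than the tabulated 1.96, the endpoints can differ from hand calculations in the later decimals.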
1.1 Sampling from Non-Normally Distributed Populations : The Central Limit Theorem
The Central Limit Theorem states that as the sample size (i.e. the number of values
in each sample) gets large enough, the sampling distribution of the mean is approximately normally distributed. This is true regardless of the shape of the distribution
of the individual values in the population.
Figure 1 : Sampling distribution of the mean for different populations for samples
of n = 2, 5, and 30
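The effect of the theorem can be seen in a small simulation. The sketch below is an illustration only: it assumes an exponential population (strongly right-skewed, chosen by us, not taken from the figure) and uses the skewness of the simulated sample means as a rough measure of non-normality, since a normal distribution has skewness 0. The seed and the number of draws are arbitrary.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Population skewness: mean cubed z-score (0 for a symmetric distribution)."""
    m = fmean(values)
    s = pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

def simulate_sample_means(n, draws, rng):
    """Means of `draws` samples of size n from an exponential population."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(draws)]

rng = random.Random(2016)  # fixed seed so the run is repeatable
skew_by_n = {
    n: skewness(simulate_sample_means(n, 20_000, rng)) for n in (2, 5, 30)
}

for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

The skewness should shrink toward 0 as n grows from 2 to 30, which matches the claim that the sampling distribution of the mean becomes approximately normal even though the population is not.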
2 Sampling Distribution of the Proportion
Consider a categorical variable that has only two categories, such as the customer
prefers your brand or the customer prefers the competitor’s brand. You are interested
in the proportion of items belonging to one of the categories; for example, the proportion of customers that prefer your brand. The population proportion, represented
by π, is the proportion of items in the entire population with the characteristic of
interest. The sample proportion, represented by p, is the proportion of items in the
sample with the characteristic of interest. The sample proportion, a statistic, is used
to estimate the population proportion, a parameter. To calculate the sample proportion,
you assign one of two possible values, 1 or 0, to represent the presence or absence
of the characteristic. You then sum all the 1 and 0 values and divide by n, the sample
size. For example, if, in a sample of five customers, three preferred your brand and
two did not, you have three 1s and two 0s. Summing the three 1s and two 0s and
dividing by the sample size of 5 results in a sample proportion of 0.60.
2.1 Sample Proportion
\[ p = \frac{X}{n} = \frac{\text{Number of items having the characteristic of interest}}{\text{Sample size}} \]
The sample proportion, p, takes on values between 0 and 1. If all items have the
characteristic, you assign each a score of 1, and p is equal to 1. If half the items have
the characteristic, you assign half a score of 1 and assign the other half a score of 0,
and p is equal to 0.5. If none of the items have the characteristic, you assign each a
score of 0, and p is equal to 0.
The statistic p is an unbiased estimator of the population proportion, π.
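As a small illustration of the 1/0 coding, the five-customer example from above can be written out directly. The variable names are ours.

```python
# 1 = customer prefers your brand, 0 = customer does not (five-customer example).
responses = [1, 1, 1, 0, 0]

x = sum(responses)  # number of items with the characteristic of interest
n = len(responses)  # sample size
p = x / n           # sample proportion

print(p)
```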
2.2 Standard Error Of The Proportion
\[ \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \]
In most cases in which inferences are made about the proportion, the sample size
is substantial enough to meet the conditions for using the normal approximation.
Therefore, in many instances, you can use the normal distribution to estimate the
sampling distribution of the proportion.
2.3 Finding Z For The Sampling Distribution Of The Proportion
Substituting p for X̄, π for µ, and $\sqrt{\pi(1-\pi)/n}$ for $\sigma/\sqrt{n}$, we get
\[ Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}} \tag{1} \]
2.4 Example
To illustrate the sampling distribution of the proportion, suppose that the manager
of the local branch of a bank determines that 40% of all depositors have multiple
accounts at the bank. If you select a random sample of 200 depositors, the sample
size is large enough to assume that the sampling distribution of the proportion is
approximately normally distributed. Then, you can calculate the probability that the
sample proportion of depositors with multiple accounts is less than 0.30 by using
equation (1) :
\[ Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}} = \frac{0.30 - 0.40}{\sqrt{\dfrac{(0.40)(0.60)}{200}}} = \frac{-0.10}{\sqrt{\dfrac{0.24}{200}}} = \frac{-0.10}{0.0346} = -2.89 \]
Using normdist or a table we find that the area under the normal curve less than
-2.89 is 0.0019. Therefore, if the true proportion of items of interest in the population
is 0.40, then only 0.19% of the samples of size n = 200 would be expected to have
sample proportions less than 0.30.
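The bank example can be checked numerically in the same way as the cereal example. This is a sketch assuming the normal approximation holds; `prob_proportion_below` is a hypothetical helper name, and `statistics.NormalDist` stands in for the table lookup.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_below(p, pi, n):
    """Return (Z, P(sample proportion < p)) under the normal approximation."""
    std_error = sqrt(pi * (1 - pi) / n)  # standard error of the proportion
    z = (p - pi) / std_error
    return z, NormalDist().cdf(z)

# Bank example: pi = 0.40, n = 200, threshold p = 0.30.
z, prob = prob_proportion_below(p=0.30, pi=0.40, n=200)
print(round(z, 2), round(prob, 4))
```

The unrounded Z is slightly smaller in magnitude than -2.89, so the computed area can differ from the table value in the fifth decimal place.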
2.5 Summary of the Key Equations
Figure 2 : Key Equations