Chapter 7: Introduction to Sampling Distributions
Section 1: Sampling Distributions
Section 2: The Central Limit Theorem
Section 3: Sampling Distributions for Proportions
§7.1 Sampling Distributions
Recall from our studies in Chapter 1 that a sample produces statistics by which we
will estimate parameters of the population.
Your book discusses this in terms of the types of inferences that we will be making about
populations in the coming chapters. Brase and Brase further note that sample statistics
are used to make inferences about population parameters because of constraints on time,
money, and effort. In reality, it is a nearly impossible task to summarize an entire
population, and therefore to find true population parameters. Here is a list and
description of the types of inferences we will be making in the coming chapters:
3 Types of Inferences
1) Estimation, which estimates the value of a parameter using sample statistics.
2) Testing, which formulates a decision about a population parameter; the decision is based on sample statistics.
3) Regression, which predicts/forecasts the value of a statistical variable.
Evaluating the reliability of the inferences being made relies on the distribution
of the statistic being used. The distribution of a statistic is called a sampling
distribution. This section introduces us to the idea that statistics have distributions of
their own!
A sampling distribution is a probability distribution of a sample statistic computed
from all possible simple random samples of size n drawn from the same population.
Note: The simple random samples must all be of the same size, n!
The sampling distributions that are typically studied in introductory statistics classes are
the distributions of:
x-bar, which is approximately normally distributed (we'll study this in §7.2)
p-hat, which is also approximately normally distributed (we'll study this in §7.3)
s², which follows a distribution called Chi-Squared
s1²/s2², which follows an F-distribution
To see how a sample statistic takes on a distribution of its own, let's take a look at some
samples of waiting times, to the nearest whole minute, of people standing in line at a
supermarket checkout.
Example:
These random samples were created using EXCEL, and the values
are from a distribution called the Poisson. The Poisson is a
distribution used to model waiting phenomena, which is why I used
it to create this random data. (The lambda that I used
was 2. If you would like to know more about the Poisson
distribution, you are welcome to research it on your own.) What
you'll notice from a frequency histogram is that the distribution of the average
waiting time across these samples of individual waiting times is
approximately bell-shaped.
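One quick way to see this bell shape emerge is to simulate it. The sketch below uses Python rather than the EXCEL sheet described above; the lambda of 2 matches the notes, while the sample size of 30 and the 1000 repetitions are my own choices for illustration.

import math
import random
from collections import Counter

random.seed(1)      # reproducible run

LAM = 2             # lambda used in the notes
SAMPLE_SIZE = 30    # waiting times per sample (my choice, not from the notes)
NUM_SAMPLES = 1000  # number of samples to draw (my choice, not from the notes)

def poisson(lam):
    """One Poisson(lam) draw via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# Each sample is SAMPLE_SIZE simulated waiting times; record each sample's mean.
means = [sum(poisson(LAM) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
         for _ in range(NUM_SAMPLES)]

# Crude text histogram of the sample means, binned to one decimal place.
bins = Counter(round(m, 1) for m in means)
for value in sorted(bins):
    print(f"{value:4.1f} | {'*' * (bins[value] // 5)}")

The printed histogram piles up around 2 (the mean of the original Poisson) in a roughly symmetric, bell-shaped mound, which is exactly the behavior the next section explains.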
§7.2 Central Limit Theorem
The CLT (Central Limit Theorem) says that as the sample size increases, the distribution
of the sample mean from any population approaches a normal distribution with mean
equal to the mean of the original population and standard deviation equal to the original
standard deviation divided by √n.
Central Limit Theorem
X is a random variable from any distribution with mean μ and
standard deviation σ.
Samples of size n are drawn.
The distribution of the sample means (x-bars) will be approximately N(μx-bar, σx-bar), where
μx-bar = μ
σx-bar = σ/√n
Note: Your book indicates that n ≥ 30 is needed for this to be true; however, the more
symmetric the original distribution, the smaller the sample size needs to be, and if the
original distribution is normally distributed, then the distribution of sample means does
not need a sample size of 30 to be approximately normally distributed. We do like to
have a sample size of at least 30, if at all possible.
*The difference between this section and section 6.3 is the difference between a single sample and a
bunch of samples. In section 6.3 we were dealing with a single sample with a mean of x-bar and a
std. dev. of s. In this section we are dealing with many samples and the distribution of their
means.
Example 1:
A soft drink vending machine is set so that the amount of drink
dispensed is a random variable with a mean of 200 mL and a
standard deviation of 15 mL. What is the probability that
a) a single serving has at least 204 mL (assume a normally distributed r.v.)?
b) the average amount dispensed in a random sample of 36 servings is at least 204 mL?
*Note: Part a) is as it was in section 6.3 and part b) is using the CLT.
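Here is one way to carry out both parts with Python's standard library, using μ = 200 mL, σ = 15 mL, and n = 36 from the example; NormalDist does the z-table lookups.

from math import sqrt
from statistics import NormalDist

mu, sigma, n = 200, 15, 36

# a) P(X >= 204) for a single serving, X ~ N(200, 15)
p_single = 1 - NormalDist(mu, sigma).cdf(204)

# b) P(x-bar >= 204) for the mean of 36 servings; by the CLT,
#    x-bar ~ N(mu, sigma / sqrt(n)) = N(200, 2.5)
p_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(204)

print(f"a) P(X >= 204)     = {p_single:.4f}")   # about 0.39
print(f"b) P(x-bar >= 204) = {p_mean:.4f}")     # about 0.05

Part b) comes out much smaller than part a) because the standard deviation of the sample mean is only 15/√36 = 2.5 mL, so 204 mL is much farther out in z-score terms.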
Example 2:
A random sample of size 64 is taken from a normal population
with mean 51.5 and std. dev. 6.8. What is the probability that
the sample mean will
a) exceed 52.9?
b) fall between 50.5 and 52.3?
c) be less than 50.6?
d) What is the probability that a single observation would be below 50.6?
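A sketch of the four calculations, using the same NormalDist approach: by the CLT the sample mean is N(51.5, 6.8/√64) = N(51.5, 0.85), while part d) uses the original N(51.5, 6.8).

from math import sqrt
from statistics import NormalDist

mu, sigma, n = 51.5, 6.8, 64
xbar_dist = NormalDist(mu, sigma / sqrt(n))   # N(51.5, 0.85) for the sample mean
x_dist = NormalDist(mu, sigma)                # N(51.5, 6.8) for a single observation

print("a) P(x-bar > 52.9)        =", round(1 - xbar_dist.cdf(52.9), 4))                   # ~0.050
print("b) P(50.5 < x-bar < 52.3) =", round(xbar_dist.cdf(52.3) - xbar_dist.cdf(50.5), 4)) # ~0.707
print("c) P(x-bar < 50.6)        =", round(xbar_dist.cdf(50.6), 4))                       # ~0.145
print("d) P(X < 50.6)            =", round(x_dist.cdf(50.6), 4))                          # ~0.447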
Example 3: We're going to do #18 on p. 309 of your book too.
Dean Witter European Growth is a mutual fund that specializes in stocks
from the British Isles, continental Europe, and Scandinavia. The fund has over 100
stocks. Let x be a random variable that represents the monthly percentage return from
this fund. Based on information from the Morningstar Guide to Mutual Funds, x has a mean
of 1.4% and a standard deviation of 0.8%.
a) Is the average monthly return on the Dean Witter European
Growth fund approximately normally distributed? Explain.
b) After 9 months, what is the probability that the average monthly
percentage return (an average of the 9 months' x's) will be between 1%
and 2%?
c) After 18 months, what is the probability that the average monthly
percentage return is between 1% and 2%?
d) Compare the probabilities in b) and c). Which is larger? Does this
make sense? (Think back to our soda example: as more sodas are averaged,
the variation of the average narrows, so the probability of the average landing
near a particular value is higher for a sample of 5 than for a single soda, higher
still for a sample of 30, and higher again for a sample of 100.) Why
would this happen?
e) At the end of 18 months, the average monthly percentage return is
more than 2%. Would that shake your confidence that the
population mean is 1.4%? If after 18 months the average was
more than 2%, would that mean that the European markets were
“heating up”? Explain your answer in terms of probability.
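One way to check parts b) and c) numerically; μ = 1.4% and σ = 0.8% come from the problem, and the standard deviation of the average of n monthly returns is σ/√n by the CLT.

from math import sqrt
from statistics import NormalDist

mu, sigma = 1.4, 0.8   # mean and std. dev. of the monthly % return

def prob_between(n, low=1.0, high=2.0):
    """P(low < x-bar < high) when x-bar is the average of n monthly returns."""
    dist = NormalDist(mu, sigma / sqrt(n))
    return dist.cdf(high) - dist.cdf(low)

print("b) n = 9 :", round(prob_between(9), 4))    # ~0.921
print("c) n = 18:", round(prob_between(18), 4))   # ~0.982
# d) The n = 18 probability is larger: sigma/sqrt(n) shrinks as n grows,
#    so more of the distribution sits between 1% and 2%.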
Something that also needs to be discussed is a concept that I mentioned in a discussion in
Chapter 1: a sample statistic being an unbiased estimator. This means that the mean of
the sample statistic's distribution is equal to the value of the population parameter
being estimated. This is true for several of the sample statistics that we will be studying
and have studied. Two examples are:
x-bar and p-hat
The second discussion is one about the variability of the sampling distribution.
Sampling method and sample size both affect variability. As the sample size increases,
variability decreases: the variance of the sampling distribution is the variance of
the population divided by the sample size, and the std. dev. is the square root of that. We
saw this in part d) of Example 3 above. Because the variability about the original mean
decreases as the number of months increases, the probability of falling between 1% and 2%
increases, since the standard deviation is decreasing. Draw a
picture to show this and you will see.
[Diagram: two sketches of the probability shaded about the mean; a deviation that corresponds to ±z for a single observation corresponds to ±(√n)·z for the sample mean, so the shaded probability grows as n increases.]
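The diagram's point can also be checked numerically: for a fixed distance c from the mean, the z-score for the sample mean is √n·c/σ, so the probability of the sample mean landing within ±c of μ grows with n. The σ = 2 and c = 0.5 below are arbitrary illustration values, not from the notes.

from math import sqrt
from statistics import NormalDist

sigma, c = 2.0, 0.5   # arbitrary illustration values

for n in (1, 5, 30, 100):
    z = sqrt(n) * c / sigma                 # z-score of a deviation of size c for the sample mean
    prob = 2 * NormalDist().cdf(z) - 1      # P(|x-bar - mu| < c)
    print(f"n = {n:3d}: z = {z:5.2f}, P(within ±{c}) = {prob:.3f}")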
Let's look at another example that will highlight the change in variability as sample size
increases. As we do this example, let's keep in mind that as sample size increases, the
distribution becomes narrower, so there is more probability of being close to the mean.
Think of it in terms of z-scores and it will make more sense, as was shown in the diagram
above.
Example 4: Exercise #10 on p. 307
Suppose x's distribution has a mean of 5. Consider two
corresponding distributions of x-bar: the first based on samples of
size n = 49 and the second based on samples of size n = 81.
a) What is the value of the mean of each of the two distributions of x-bar?
b) For which x-bar distribution is P(x-bar > 6) smaller? Explain.
c) For which x-bar distribution is P(4 < x-bar < 6) greater? Explain.
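The answers to a) through c) do not require numbers, but a numerical check can make them concrete. The exercise gives no standard deviation, so the σ = 7 below is a purely hypothetical value chosen for illustration; the comparison between n = 49 and n = 81 does not depend on it.

from math import sqrt
from statistics import NormalDist

mu = 5
sigma = 7   # hypothetical: the exercise does not state sigma

for n in (49, 81):
    d = NormalDist(mu, sigma / sqrt(n))
    print(f"n = {n}: mean of x-bar = {d.mean}, "
          f"P(x-bar > 6) = {1 - d.cdf(6):.3f}, "
          f"P(4 < x-bar < 6) = {d.cdf(6) - d.cdf(4):.3f}")
# Both means equal 5; the n = 81 distribution has the smaller P(x-bar > 6)
# and the larger P(4 < x-bar < 6), since its standard deviation is smaller.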
§7.3 Sampling Distribution for Population Proportion
Recall that the normal approximation to the binomial is used appropriately when
the following conditions are met:
np ≥ 5 and nq ≥ 5, or n > 30
If X ~ Binomial(n, p), then X is approximately N(μ, σ), where
μ = np
σ = √(npq)
Thus, the distribution of p-hat, the sample proportion x/n, can be given
using the CLT as well:
p-hat ~ N(p, √(pq/n))
Note: Because p-hat = x/n is a linear function of x, the mean and standard
deviation of p-hat's sampling distribution are linear functions of those of x, which is
approximately normally distributed with mean np and standard deviation √(npq). Thus, the mean of the sampling
dist. for p-hat is np/n, or simply p. Also, the std. dev. of the sampling dist. for p-hat is (√(npq))/n,
which is to say √(npq/n²), which simplifies to √(pq/n).
Part and parcel of the normal approximation to the binomial is the continuity correction!
There is no exception here. Recall our discussion in §6.4 of the continuity correction. See
pages 16 and 17 of my Chapter 6 notes for this review. Note that when doing the continuity
correction for proportions, this distills down to 0.5/n being added to or subtracted from the
proportion at hand.
Example:
Let's take #9 on p. 318 for an example. A mechanical press is used to mold shapes
for plastic toys. When the machine is adjusted and working well, it still produces about
6% defective toys. The toys are manufactured in lots of n = 100.
Let x be a random variable representing the number of defective
toys in a lot. Then p-hat is x/n.
a) Why can p-hat be approximated by a normal random variable?
Note: Think in terms of the normal approximation to the binomial!
b) What are the mean and standard deviation of the sampling
distribution of p-hat? (This is μp-hat and σp-hat.)
c) Suppose a lot of 100 toys had a 7% proportion of defective toys.
What is the probability of this event, or worse, occurring? (That is,
that 7% of the toys, or more, could be defective?)
Note: This is P(p-hat ≥ 0.07). Don't forget the continuity correction!!!
d) Suppose a lot of 100 toys had an 11% proportion of defective toys.
What is the probability that the situation could be that bad, or
worse? Do you think that the machine should maybe be adjusted?
Note: This is P(p-hat ≥ 0.11). Don't forget about the continuity correction. Compute the z-score for
this p-hat. What is that value? Recall that your book says that anything beyond 2.5 std. dev. from the mean
is an outlier. Also recall that the usual range is within 2 std. dev.
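A sketch of parts c) and d), using p = 0.06 and n = 100 from the problem and applying the 0.5/n continuity correction described above:

from math import sqrt
from statistics import NormalDist

p, n = 0.06, 100
q = 1 - p
phat_dist = NormalDist(p, sqrt(p * q / n))   # sampling distribution of p-hat
cc = 0.5 / n                                 # continuity correction for proportions

# For 0.07 the probability comes out around 0.42; for 0.11 it is around 0.03.
for observed in (0.07, 0.11):
    # P(p-hat >= observed), corrected by moving the boundary down by 0.5/n
    prob = 1 - phat_dist.cdf(observed - cc)
    z = (observed - cc - p) / phat_dist.stdev
    print(f"P(p-hat >= {observed}) ≈ {prob:.4f}  (z ≈ {z:.2f})")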
Example 2:
According to problem #10 on page 279 of Brase, about 64% of
people who are murdered are murdered by someone they knew.
Suppose that in a random sample of 63 murder cases it is found
that 48 of the victims knew their murderer. What is the probability
that in samples of 63, at most 48 victims will have known their
murderer? Is this random sample an unusual one, compared to
other random samples of size 63?
Note: The difference between this example and the prior one is that we know the x and the n, so we need to
use these to compute p-hat before computing a probability.
Note 2: This is different from the probability that was computed in b) of problem #10 on page 279, because
that probability was for a single sample, not for a sample proportion. This is exactly what we discussed in the
last section for the normal. In fact, if you compute b)'s probability for problem #10 on page 279, you will
find it is 0.9841, which is slightly larger than our computed probability. This agrees with what we found in
our Example 4b) from the last section (the probability for p. 279 is based upon a sample of size 1; here our
sample size has increased, and therefore our probability has decreased).
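A sketch of the calculation, using p = 0.64, n = 63, and x = 48 from the problem; it shows the probability both with and without the 0.5/n continuity correction so the result can be compared against the p. 279 value mentioned in Note 2.

from math import sqrt
from statistics import NormalDist

p, n, x = 0.64, 63, 48
phat_dist = NormalDist(p, sqrt(p * (1 - p) / n))   # sampling distribution of p-hat

phat = x / n
prob_cc = phat_dist.cdf((x + 0.5) / n)   # "at most 48" with the 0.5/n continuity correction
prob_raw = phat_dist.cdf(phat)           # without the correction

print(f"p-hat = {phat:.3f}")
print(f"P(p-hat <= {phat:.3f}) with continuity correction:    {prob_cc:.4f}")
print(f"P(p-hat <= {phat:.3f}) without continuity correction: {prob_raw:.4f}")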
The last topic for discussion comes back to control charts like those that we saw in §6.1.
See my notes, pp. 5 & 6, for an explanation. When considering the sampling distribution for
the sample proportion we will also look at control charts, which we call P-Charts.
There is one new piece of notation that we need for P-Charts, a notation that we will see
again in relation to the binomial.
Pooled p:
p-bar = (sum of the observed x's) / (total number of observations)
The control limits are set according to p-bar and its complement, q-bar. These
correspond to the limits set in a control chart at 2 and 3 standard deviations from the
mean.
The mean is set as: p-bar
First set of control limits: p-bar ± 2·√(p-bar·q-bar/n)
Second set of control limits: p-bar ± 3·√(p-bar·q-bar/n)
Example:
See p. 320 #13 for the following example’s description.
Day  x   p-hat
1    60  0.80
2    53  0.71
3    61  0.81
4    66  0.88
5    67  0.89
6    55  0.73
7    53  0.71
8    58  0.77
9    60  0.80
10   52  0.69
11   46  0.61
12   52  0.69
13   61  0.81
14   70  0.93
15   58  0.77
a) Calculate p-bar = (sum of the x's) ÷ [15(75)]
b) Calculate the first set of control limits: p-bar ± 2·√(p-bar·q-bar/n)
c) Calculate the second set of control limits: p-bar ± 3·√(p-bar·q-bar/n)
d) Graph sample number (day) against p-hat.
[P-Chart: p-hat plotted against day for days 1 through 15 (vertical axis roughly 0.6 to 1.0), with horizontal lines marking the 1st (2σ) and 2nd (3σ) upper and lower control limits.]
Note: I created this P-Chart just the same way that I created the control chart in
Chapter 6.1. You should see if you can recreate it on your TI calculator.
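A sketch of parts a) through c) in Python (rather than the TI calculator mentioned in the note); the x values come from the table above, and n = 75 observations per day over 15 days is taken from the "15(75)" in part a). Instead of drawing the chart, the sketch flags any day whose p-hat falls outside the second (3σ) set of limits.

from math import sqrt

n_per_day = 75
x = [60, 53, 61, 66, 67, 55, 53, 58, 60, 52, 46, 52, 61, 70, 58]  # days 1-15
phat = [xi / n_per_day for xi in x]

# a) pooled p-bar = (sum of the x's) / (15 * 75)
pbar = sum(x) / (len(x) * n_per_day)
qbar = 1 - pbar
se = sqrt(pbar * qbar / n_per_day)

# b) and c) control limits at 2 and 3 standard deviations from p-bar
limits_2 = (pbar - 2 * se, pbar + 2 * se)
limits_3 = (pbar - 3 * se, pbar + 3 * se)
print(f"p-bar = {pbar:.3f}")
print(f"2-sigma limits: {limits_2[0]:.3f} to {limits_2[1]:.3f}")
print(f"3-sigma limits: {limits_3[0]:.3f} to {limits_3[1]:.3f}")

# d) instead of the chart, flag any day outside the 3-sigma limits
for day, ph in enumerate(phat, start=1):
    if not (limits_3[0] <= ph <= limits_3[1]):
        print(f"Day {day}: p-hat = {ph:.2f} falls outside the 3-sigma limits")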