Notes – Chapter 18
Sampling Distributions
A parameter is a number that describes the population. In statistical practice, the value
of a parameter is unknown. (µ, σ, and now p or π)
A statistic is a number that can be computed from the sample data without making use
of any unknown parameters. We often use a statistic to estimate an unknown parameter.
($\bar{x}$, $s_x$, and now $\hat{p}$)
No longer is a proportion something we just compute from a set of data. We now see it as
a random quantity that has a distribution. We call that distribution the sampling
distribution model for the proportion.
Sampling variability is the concept that in repeated random sampling, the value of the
statistic will vary. This makes sense; the proportions vary from sample to sample because
the samples are composed of different values.
To describe sampling distributions, use the same descriptions as for other distributions:
overall shape, outliers, center, and spread.
The term bias has been used to suggest that a sampling technique favors a certain outcome.
When we use the term bias in relation to a sampling distribution, it is the idea that the
center of the sampling distribution is not that of the population.
A statistic used to estimate a parameter is unbiased if the mean of its sampling
distribution is equal to the true value of the parameter being estimated.
The variability of a statistic is described by the spread of its sampling distribution. The
spread is determined by the sampling design and the sample size. Larger samples give less
variability.
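A small simulation can make sampling variability, unbiasedness, and the effect of sample size
concrete. The sketch below is only an illustration: the population proportion 0.3, the sample
sizes, and the function name simulate_sample_proportions are assumptions chosen for the
example, not values from these notes.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each individual has the characteristic with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

p = 0.3  # hypothetical population proportion
for n in (25, 100, 400):
    p_hats = simulate_sample_proportions(p, n, num_samples=2000)
    # Center (mean) and spread (standard deviation) of the simulated sampling distribution.
    print(n, round(statistics.mean(p_hats), 3), round(statistics.stdev(p_hats), 3))
```

If the statistic is unbiased, the average of the simulated proportions should sit near 0.3 for
every n, while their standard deviation should shrink as n grows.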
Sampling Distribution of a Sample Proportion – Categorical Data
Choose an SRS of size n from a large population with population proportion p having
some characteristic of interest. Let $\hat{p}$ be the proportion of the sample having that
characteristic. Then the sampling distribution of $\hat{p}$ is approximately normal as long as
the conditions listed below are met.
So, provided we meet the conditions, this sampling distribution will be
$N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$
Conditions:
1) Randomization. The sample should be a simple random sample (SRS) of the
population. (This is often difficult to achieve in reality. At the very least, we need to be very
confident that the sampling method was not biased and that the sample is representative
of the population.)
2) 10% Rule. In order to ensure independence, we cannot take a sample that is too large
without replacement. As long as our sample is no more than 10% of our population size,
we protect independence.
3) Success/Failure. To ensure that the sample size is large enough to approximate
normal, we must expect at least 10 successes and at least 10 failures.
$np \ge 10$ and $n(1 - p) \ge 10$
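The two numeric conditions above, together with the normal model for the sample proportion,
can be sketched as a small helper. This is a minimal sketch, not a standard library routine:
the function name proportion_sampling_model and the values p = 0.5, n = 100 are hypothetical,
and the Randomization condition cannot be checked by code, so it is assumed to hold.

```python
import math

def proportion_sampling_model(p, n, population_size=None):
    """Return (mean, sd) of the normal model for p-hat after checking the conditions."""
    # 10% Rule: only checkable when the population size is known.
    if population_size is not None and n > 0.10 * population_size:
        raise ValueError("10% Rule violated: sample exceeds 10% of the population")
    # Success/Failure: expect at least 10 successes and 10 failures.
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("Success/Failure violated: need np >= 10 and n(1-p) >= 10")
    return p, math.sqrt(p * (1 - p) / n)

mean, sd = proportion_sampling_model(p=0.5, n=100)
print(mean, round(sd, 3))
```

Raising an error when a condition fails is one possible design choice; it keeps a normal
model from being used silently when the approximation is doubtful.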
Examples
1) For the years 2000 – 2002, the proportion of mothers in the state of Texas under the
age of 18 that gave birth to children less than 2500 grams was 9.6%.
A) Draw the sampling distribution of p-hat based on a random sample of 200.
B) What is the probability that more than 12% of the sample of mothers gave birth to
children less than 2500 grams?
C) What is the probability that less than 5% of the sample of mothers gave birth to
children less than 2500 grams?
2) Through the census bureau we know that approximately 64% of all US households
have children under the age of 16. We take a random sample of 100 households in the
GCISD attendance area and find that 68% of households have children under the age of
16. If the GCISD attendance area follows the national model, what is the probability that
we will get a proportion as large as 68%?
3) A manufacturer of computer printers purchases plastic ink cartridges from a vendor.
When a large shipment is received, a random sample of 200 cartridges is selected, and
each is inspected. If the sample proportion of defectives is more than .02, the entire
shipment will be returned to the vendor.
A) What is the approximate probability that the shipment will be returned if the true
proportion of defectives in the shipment is .05? Be sure to check the conditions necessary
for accurate probabilities using proportions.
B) What is the approximate probability that the shipment will not be returned when the
true proportion of defectives in the shipment is .10?
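As an illustration of how these problems can be set up, Example 2 above might be computed as
follows. This is a sketch under the assumptions that the Randomization condition and the 10%
Rule hold for the sample of households, and that "as large as 68%" means 68% or more. It uses
NormalDist from Python's standard statistics module; the variable names are chosen for the
example.

```python
import math
from statistics import NormalDist

p, n = 0.64, 100                           # national proportion and sample size from Example 2
assert n * p >= 10 and n * (1 - p) >= 10   # Success/Failure condition

sd = math.sqrt(p * (1 - p) / n)            # standard deviation of p-hat
model = NormalDist(mu=p, sigma=sd)         # normal model N(p, sd) for p-hat

# Probability that the sample proportion is 0.68 or larger.
prob_at_least_68 = 1 - model.cdf(0.68)
print(round(sd, 3), round(prob_at_least_68, 2))
```

Here the standard deviation works out to 0.048, so 0.68 lies a little less than one standard
deviation above the center of the model, and the probability comes out at about 0.20.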
Sampling Distribution of a Sample Mean – Quantitative Data
The sampling distribution of a sample mean is the distribution created from the means of many
samples of the same size. We work with sample means because:
*Averages are less variable than individual observations
*Averages are more normal than individual observations
The mean and standard deviation of a population are $\mu$ and $\sigma$ respectively. These are
parameters.
The mean and standard deviation calculated from sample data are statistics. We write
the sample mean as $\bar{x}$ and the sample standard deviation as $s_x$.
Suppose that $\bar{x}$ is the mean of an SRS of size n drawn from a large population with mean
$\mu$ and standard deviation $\sigma$. Then the mean of the sampling distribution of $\bar{x}$ is
$\mu$ and its standard deviation is $\sigma/\sqrt{n}$.
***The values of $\bar{x}$ are less spread out for larger samples. Their standard deviation
decreases in proportion to $1/\sqrt{n}$, so you must take a sample 4 times as large to cut the
standard deviation of $\bar{x}$ in half.
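The "4 times as large" claim follows directly from the formula for the standard deviation of
the sample mean. As a short worked step, writing $\bar{x}_{4n}$ (a label introduced here only
for illustration) for the mean of a sample of size 4n:

```latex
\[
SD(\bar{x}_{4n}) = \frac{\sigma}{\sqrt{4n}}
                 = \frac{\sigma}{2\sqrt{n}}
                 = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}}
                 = \frac{1}{2}\,SD(\bar{x})
\]
```

More generally, cutting the standard deviation to 1/k of its current value takes a sample
$k^2$ times as large.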
It makes sense that the shape of the distribution of $\bar{x}$ depends on the shape of the
population distribution.
** If the population distribution is normal, then so is the distribution of the sample mean
regardless of sample size.
Even for skewed or odd shaped distributions, if the sample size is large enough, the
sampling distribution will still be approximately normal. This idea leads us to…
The Central Limit Theorem (CLT)
The mean of a random sample has a sampling distribution whose shape can be
approximated by a normal model. The larger the sample, the better the approximation will
be.
The sampling distribution of the sample mean $\bar{x}$ is close to the normal distribution
$N(\mu, \sigma/\sqrt{n})$.
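A simulation can suggest how the CLT behaves for a skewed population. The sketch below is an
illustration only: the exponential population (mean 1, standard deviation 1), the sample size
of 40, and the number of repetitions are assumptions made for the example.

```python
import random
import statistics

rng = random.Random(1)
n = 40               # sample size (assumed for illustration)
num_samples = 3000   # number of repeated samples

# Exponential population with rate 1: strongly right-skewed, mean 1, SD 1.
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = statistics.mean(sample_means)    # compare with mu = 1
spread = statistics.stdev(sample_means)   # compare with sigma / sqrt(n), about 0.158
# In a normal model, roughly 68% of values fall within one SD of the center.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / num_samples
print(round(center, 3), round(spread, 3), round(within_one_sd, 3))
```

Even though individual observations are far from normal here, the simulated sample means
should be centered near the population mean with a spread near $\sigma/\sqrt{n}$.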
The Law of Large Numbers
Draw observations at random from any population with finite mean $\mu$. As the number of
observations drawn increases, the mean $\bar{x}$ of the observed values gets closer and closer
to $\mu$.
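The Law of Large Numbers can be watched in action by tracking a running mean. This sketch
assumes a fair six-sided die as the population (mean 3.5); the checkpoints and variable names
are chosen only for illustration.

```python
import random

rng = random.Random(2)
checkpoints = (10, 100, 1000, 100000)
total = 0
running_means = {}

for i in range(1, 100001):
    total += rng.randint(1, 6)          # one roll of a fair die, population mean 3.5
    if i in checkpoints:
        running_means[i] = total / i    # mean of the first i observations

for count, mean in running_means.items():
    print(count, round(mean, 3))
```

The early means can wander noticeably, but the mean after 100,000 rolls should be very close
to 3.5.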
The Central Limit Theorem (CLT) allows us to use normal probability calculations to
answer questions about sample means as long as we meet the following conditions.
Conditions:
1) Randomization. The sample should be a simple random sample (SRS) of the
population. (This is often difficult to achieve in reality. At the very least, we need to be very
confident that the sampling method was not biased and that the sample is representative
of the population.)
2) 10% Rule. In order to ensure independence, we cannot take a sample that is too large
without replacement. As long as our sample is no more than 10% of our population size,
we protect independence.
3) Large Enough Sample. The truth is, it depends. There is no “for sure” way to tell. A
common rule of thumb is that when n ≥ 30, it is safe to assume the sampling distribution
is approximately normal.
We said at the beginning that in most real life cases, we will not know the population
parameters (µ, σ, p or π) so we will have to use the sample statistics as estimates of those.
Our terminology changes just a little…
If we don’t know
µ – we estimate it with $\bar{x}$
σ – we estimate it with $s_x$
p (π) – we estimate it with $\hat{p}$
If we use these estimates to calculate the variability for a sampling distribution, we now
call that the standard error (instead of the standard deviation).
So… If SD($\hat{p}$) = $\sqrt{\frac{p(1-p)}{n}}$
Then the SE($\hat{p}$) = $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
And… If SD($\bar{x}$) = $\frac{\sigma}{\sqrt{n}}$
Then the SE($\bar{x}$) = $\frac{s_x}{\sqrt{n}}$
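The standard error formulas above can be computed directly from sample data. The data in this
sketch are hypothetical (18 successes in a sample of 50, and five made-up measurements); they
only show where each sample statistic enters the calculation.

```python
import math
import statistics

# Hypothetical categorical sample: 18 "successes" out of 50.
successes, n_cat = 18, 50
p_hat = successes / n_cat
se_p_hat = math.sqrt(p_hat * (1 - p_hat) / n_cat)   # SE(p-hat) uses p-hat in place of p

# Hypothetical quantitative sample.
data = [12.1, 11.9, 12.0, 12.2, 11.8]
x_bar = statistics.mean(data)
s_x = statistics.stdev(data)                        # sample SD (divides by n - 1)
se_x_bar = s_x / math.sqrt(len(data))               # SE(x-bar) uses s_x in place of sigma

print(round(p_hat, 2), round(se_p_hat, 4))
print(round(x_bar, 2), round(se_x_bar, 4))
```

Note that statistics.stdev computes the sample standard deviation, which is the $s_x$ of these
notes; statistics.pstdev would treat the data as an entire population instead.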
Examples:
4) A soft-drink bottler claims that can volume is normally distributed with a mean of 12 oz
of soda and a standard deviation of .16 oz. A random sample of sixteen cans is selected
and the soda volume determined for each one. Because the population distribution is
normal, the sampling distribution of $\bar{x}$ is also normal.
A) Give the mean and standard deviation of the sampling distribution.
B) Find the probability that the sample mean soda volume falls between 11.94 and 12.06
ounces.
C) In this sample, the sample average soda volume is found to be 11.9 ounces. How likely
is this to happen if the true population mean is 12 ounces?
5) The College Student Journal (Dec. 1992) investigated differences in traditional and
nontraditional students, where nontraditional students are generally defined as those 25
years old or older. Based on the study results, we can assume that the population mean
and standard deviation for the GPA of all nontraditional students are 3.5 and 0.5
respectively. Suppose that a random sample of 100 nontraditional students is selected
from the population of all nontraditional students.
A) Find the mean and standard deviation of the sampling distribution.
B) What is the approximate probability that the nontraditional student sample has a mean
GPA between 3.40 and 3.65?
C) What is the approximate probability that the sample of 100 nontraditional students has
a mean GPA that exceeds 3.62?
6) As a graduate student I randomly sampled 25 CHHS students and recorded their IQ
scores. I got a mean of 136 with a standard deviation of 2.4. My professor tells me that for
the purposes of my research, that standard deviation is too high. I need to reduce it to no
more than 0.6.
A) How can I reduce the standard deviation?
B) What sample size would I need?
7) The counselors are working with SAT scores and take a random sample of 60 CHHS
students. Their data show a math SAT mean of 580 with a standard deviation of 12.8. If
they need to reduce the variability to 1/8 of that, what sample size would they need?
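As one possible way to set up a sample-mean problem, parts A and B of Example 4 above might
be computed as follows. This is a sketch that takes the bottler's claimed mean and standard
deviation at face value. Because the population is stated to be normal, no large-sample
condition is needed. NormalDist comes from Python's standard statistics module; the variable
names are chosen for the example.

```python
import math
from statistics import NormalDist

mu, sigma, n = 12.0, 0.16, 16            # values stated in Example 4

# Part A: the sampling distribution of x-bar has mean mu and SD sigma / sqrt(n).
sd_x_bar = sigma / math.sqrt(n)
model = NormalDist(mu=mu, sigma=sd_x_bar)

# Part B: probability that the sample mean falls between 11.94 and 12.06 ounces.
prob_between = model.cdf(12.06) - model.cdf(11.94)
print(round(sd_x_bar, 2), round(prob_between, 4))
```

The standard deviation of the sampling distribution is 0.04 oz, so the interval in part B
reaches 1.5 standard deviations on either side of the mean, which gives a probability of
about 0.866.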