Chapter 7: Sampling Distributions
Name:
M&M Activity: We want to figure out what proportion of M&M’s are green. We can’t possibly count every green
M&M in the world so we are going to take samples instead and see what happens.
• Open pack of M&M’s. Do not eat yet.
• Take a sample of size 20.
• Find the proportion of green M&M’s and write it here.
• Put the M&M’s back and repeat. Do three trials total.
Graph and describe the sampling distribution of proportions:
Notation:
p = true population proportion of green M&Ms
p̂ = sample proportion of green M&Ms
Other Definitions: Sampling Error/Sampling Variability
The natural variability we can expect from one sample to another.
What’s the point of all this?
Different random samples give different values for a statistic. The model of a sampling distribution shows the
behavior of the statistic over all the possible samples for the same size n. We can use certain assumptions and
conditions that, if met, can help us describe the shape, center, and spread of certain sampling distributions.
Provided that the sampled values are independent and the sample size is large enough, the sampling distribution for
p̂ will follow a normal model with mean $\mu(\hat{p}) = p$ and standard deviation $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$, where q = 1 - p.
In other words, “Sampling models are what makes statistics work. They inform us about the amount of variation
we should expect when we sample.” Stats: Modeling the World, page 414.
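The claim about center and spread can be checked with a small simulation. The sketch below is not part of the original activity: it assumes a hypothetical true proportion of green M&M's (p = 0.16), treats every draw as independent, and repeats the sample-of-20 experiment many times. With only 20 candies per sample the shape may still look skewed, so the sketch compares only the mean and standard deviation of the simulated p̂ values with p and sqrt(pq/n).

```python
import random
import statistics
from math import sqrt


def simulate_sample_proportions(p, n, trials, seed=0):
    """Draw `trials` independent samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]


p = 0.16  # ASSUMPTION: hypothetical true proportion of green M&M's
n = 20    # sample size used in the activity

p_hats = simulate_sample_proportions(p, n, trials=10_000)
sim_mean = statistics.fmean(p_hats)
sim_sd = statistics.pstdev(p_hats)
model_sd = sqrt(p * (1 - p) / n)

print(f"mean of simulated p-hats: {sim_mean:.4f} (model says {p})")
print(f"SD of simulated p-hats:   {sim_sd:.4f} (model says {model_sd:.4f})")
```

Three trials, as in the activity, are far too few to see this pattern; pooling the whole class's results gets closer to what the simulation shows.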
Assumptions/Conditions to check before using a Sampling Distribution Model for a Proportion:
2 Assumptions, each checked through the conditions listed under it (3 Conditions in total):
Assumption 1. The sampled values must be independent of each other.
   Condition 1 (Randomization). The sample should be a simple random sample of the population. Sometimes it is
   difficult or impossible to get an SRS. At the very least, we need to be confident that the sampling method
   was not biased and that the sample is representative of the population.
   Condition 2 (10%). If sampling has not been made with replacement, then the sample size, n, must be no larger
   than 10% of the population. There are other ways in which samples can fail to be independent, but the only
   good protection from such failures is to think carefully about possible reasons for the data to fail to be
   independent. There are no simple conditions that guarantee independence.
Assumption 2. The sample size, n, must be large enough.
   Condition 3 (Success/Failure). The sample size must be big enough for both np and nq to be at least 10. In
   other words, we must expect at least 10 successes and at least 10 failures to have enough data to make
   conclusions.
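The two conditions that come down to arithmetic can be written as a short check. This is only a sketch: the function name and the population sizes are made up for illustration, and randomization and independence still have to be argued from how the data were collected.

```python
def check_proportion_conditions(n, p, population_size):
    """Return the 10% and success/failure checks for a sample of size n."""
    q = 1 - p
    return {
        "ten_percent": n <= 0.10 * population_size,
        "success_failure": n * p >= 10 and n * q >= 10,
    }


# ASSUMPTION: 100 students sampled from a hypothetical university of 20,000, p = 0.30.
large_sample = check_proportion_conditions(n=100, p=0.30, population_size=20_000)

# ASSUMPTION: the activity's sample of 20 with a hypothetical p = 0.16,
# which expects only 20 * 0.16 = 3.2 green candies.
small_sample = check_proportion_conditions(n=20, p=0.16, population_size=1_000_000)

print(large_sample)
print(small_sample)
```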
Interesting To Think About: How large does a sample need to be as the proportion changes?
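One way to explore this question is to compute the smallest n that satisfies the success/failure condition for several values of p. The sketch below uses a helper name chosen for illustration and exact fractions to avoid rounding surprises; it addresses only the success/failure condition, not independence.

```python
from fractions import Fraction
from math import ceil


def min_sample_size(p):
    """Smallest n with n*p >= 10 and n*(1 - p) >= 10, for 0 < p < 1."""
    p = Fraction(p)
    rarer_outcome = min(p, 1 - p)  # the less likely outcome is the binding one
    return ceil(Fraction(10) / rarer_outcome)


# Proportions are passed as strings so that Fraction reads them exactly.
for p in ("0.5", "0.3", "0.1", "0.01"):
    print(f"p = {p}: need n >= {min_sample_size(p)}")
```

The further p is from 0.5, the larger the sample has to be before at least 10 of the rarer outcome can be expected.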
Example 1: Assume that 30% of students at a large university wear contact lenses. We randomly pick 100
students. Find the probability that more than 1/3 of this sample wear contacts.
Check assumptions and conditions!
Independence:
1. Randomization Condition: It was stated that the sample was chosen randomly.
2. 10% Condition: 100 students is most likely less than 10% of the population of a large university, since a
large university will most likely have more than 1000 students. Also, it is reasonable to believe that
whether or not a student wears contacts is independent of whether other students wear contacts.
Sample Size:
3. Success/Failure:
np ≥ 10: 100(.3) = 30 ≥ 10
nq ≥ 10: 100(1 - .3) = 70 ≥ 10
The sample is large enough.
Since all of the conditions are met we may assume that this sampling distribution follows a normal model
with a mean of .3 and a standard deviation of $\sqrt{\dfrac{(.3)(.7)}{100}} = .0458$.
$P(\hat{p} > \tfrac{1}{3})$ = normalcdf(1/3, 99999, .3, .0458) = 23.35%
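The same calculation can be reproduced without a calculator. The sketch below uses Python's standard-library `statistics.NormalDist` in place of normalcdf; the variable names are illustrative. Because it does not round the standard deviation to .0458, the result may differ slightly from the figure above in the last digit.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 100                  # values given in Example 1
sd_p_hat = sqrt(p * (1 - p) / n)  # SD(p-hat) = sqrt(pq/n)

# Normal model for the sample proportion.
model = NormalDist(mu=p, sigma=sd_p_hat)

# P(p-hat > 1/3) is the area to the right of 1/3.
prob_more_than_third = 1 - model.cdf(1 / 3)

print(f"SD(p-hat)      = {sd_p_hat:.4f}")
print(f"P(p-hat > 1/3) = {prob_more_than_third:.4f}")
```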
Example 2: A restaurant anticipates serving about 180 people on a Friday evening, and believes that about 20% of
the patrons will order the special. How many of those meals should the restaurant plan on serving in order to be pretty sure of
having enough ingredients on hand to meet customer demand?
Check assumptions and conditions!
Independence:
1. Randomization Condition: It is reasonable to believe that the 180 customers that come to the restaurant
that night are a random sample of all patrons of the restaurant.
2. 10% Condition: 180 customers is most likely less than 10% of the population of all of the people who have
ever gone to this restaurant, since a restaurant will most likely serve more than 1800 patrons. Also, it is
reasonable to believe that whether or not a customer will order the special is independent of whether
other customers order the special.
Sample Size:
3. Success/Failure:
np ≥ 10: 180(.2) = 36 ≥ 10
nq ≥ 10: 180(1 - .2) = 144 ≥ 10
The sample is large enough.
Conclusion:
Since all of the conditions are met we may assume that this sampling distribution follows a normal model with a
mean of .2 and a standard deviation of $\sqrt{\dfrac{(.2)(.8)}{180}} = .0298$.
We want the cutoff "?" for which $P(\hat{p} > ?) \le .05$:
z = invnorm(.95) = 1.6449
$1.6449 \le \dfrac{\hat{p} - .2}{.0298} \Rightarrow \hat{p} \ge .2490$
So, to be pretty sure of having enough, they should plan for as many as 24.90% of the 180 patrons ordering the
special. Since .2490 × 180 ≈ 44.8, they'll need enough ingredients to serve at least 45 orders of the special.
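A sketch of the same steps in Python, using `NormalDist.inv_cdf` from the standard library in place of invnorm. The 95% level is the handout's reading of "pretty sure"; a different level would give a different number of meals. Variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

p, n = 0.20, 180                  # values given in Example 2
sd_p_hat = sqrt(p * (1 - p) / n)  # SD(p-hat) = sqrt(pq/n)

# z-score with 95% of the standard normal curve below it.
z = NormalDist().inv_cdf(0.95)

# Proportion that is exceeded only about 5% of the time.
cutoff = p + z * sd_p_hat

# Round up, because a fraction of a meal cannot be served.
meals = ceil(cutoff * n)

print(f"z      = {z:.4f}")
print(f"cutoff = {cutoff:.4f}")
print(f"meals  = {meals}")
```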
What’s the point?
As the sample size grows, the sampling distribution of means becomes more and more symmetric and unimodal.
More specifically:
The Central Limit Theorem:
The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model.
The larger the sample, the better the approximation will be.
Why we need it:
To approximate a sampling distribution of means of populations that are not normally distributed.
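A short simulation can make the theorem concrete. The sketch below is an illustration, not part of the original notes: it assumes a strongly right-skewed population (an exponential distribution, chosen only as an example) and measures how skewed the sample means are for samples of size 1, 5, and 30. Skewness near 0 indicates a roughly symmetric distribution.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the average cubed z-score."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate_sample_means(n, trials, rng):
    """Means of `trials` independent samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


rng = random.Random(7)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(simulate_sample_means(n, 4000, rng)) for n in (1, 5, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {skew:.2f}")
```

The skewness should shrink toward 0 as n grows, which is the "more and more symmetric" behavior described above.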
Assumptions/Conditions to check before using a Sampling Distribution Model for a Mean:
Assumption 1. The sampled values must be independent of each other.
   Condition 1 (Randomization). The data values must be sampled randomly or the concept of a sampling
   distribution makes no sense.
   Condition 2 (10%). If sampling has not been made with replacement, then the sample size, n, must be no larger
   than 10% of the population. There are other ways in which samples can fail to be independent, but the only
   good protection from such failures is to think carefully about possible reasons for the data to fail to be
   independent. There are no simple conditions that guarantee independence.
Assumption 2. The sample size, n, must be large enough.
   Condition 3 (Large Enough Sample). There is no one-size-fits-all rule for the large enough condition. If the
   population is unimodal and symmetric, even a relatively small sample is okay, but for a strongly skewed
   population a larger sample size is needed. We will not worry about exact numbers at this time.
The Sampling Distribution Model for the Mean:
If the above conditions are met then the sampling distribution model for a mean will follow a normal model with a
mean $\mu$, equal to the population mean, and a standard deviation $\sigma(\bar{y}) = SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$, where $\sigma$ is the standard
deviation of the population.
Example 3: The weight of potato chips in a medium-size bag is stated to be 10 ounces. The amount that the
packaging machine puts in these bags is believed to have a Normal distribution with mean 10.2 ounces and standard
deviation 0.12 ounces. What is the probability that the mean weight in a 12-bag case is below 10 ounces?
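One possible setup for Example 3, sketched in Python with the standard-library `NormalDist`. It assumes the 12 bags in a case can be treated as independent draws from the stated Normal model; because the population itself is Normal, the model for the mean applies even with n = 12. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 10.2, 0.12, 12  # values given in Example 3

# SD of the mean of 12 bags: sigma / sqrt(n).
sd_mean = sigma / sqrt(n)

# P(mean weight of the case < 10 ounces) under the Normal model for the mean.
prob_below_10 = NormalDist(mu, sd_mean).cdf(10.0)

print(f"SD(y-bar)       = {sd_mean:.4f}")
print(f"P(y-bar < 10)   = {prob_below_10:.2e}")
```

A mean of 10 ounces sits more than five standard deviations below 10.2, so the probability is extremely small.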
Example 4: Grocery store receipts show that customer purchases have a skewed distribution with a mean of $32
and a standard deviation of $20.
1. Explain why you cannot determine the probability that the next customer will spend at least $40.
2. Can you estimate the probability that the next 10 customers will spend an average of at least $40?
3. Is it likely that the next 50 customers will spend an average of at least $40?
4. Suppose the store had 312 customers today. Estimate the probability that the store’s revenues were at
least $10,000.
5. If in a typical day, the store serves 312 customers, how much does the store take in on the worst 10% of
such days?
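Parts 4 and 5 ask about a day's total rather than an average. One way to approach them, sketched here under the assumption that 312 customers is a large enough sample for the Central Limit Theorem to apply despite the skew, is to convert the total into a mean per customer, work with the normal model for the mean, and convert back. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32.0, 20.0, 312  # values given in Example 4

# Normal model for the mean purchase of 312 customers.
mean_model = NormalDist(mu, sigma / sqrt(n))

# Part 4: total revenue >= $10,000 means an average of at least 10000 / 312.
prob_revenue_10k = 1 - mean_model.cdf(10_000 / n)

# Part 5: the 10th percentile of the mean, scaled back up to a daily total.
worst_decile_total = n * mean_model.inv_cdf(0.10)

print(f"P(revenue >= $10,000)          = {prob_revenue_10k:.3f}")
print(f"10th percentile of daily total = ${worst_decile_total:,.2f}")
```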
Example 5: Although most of us buy milk by the quart or gallon, farmers measure daily production in pounds.
Ayrshire cows average 47 pounds of milk a day, with a standard deviation of 6 pounds. For Jersey cows, the mean
daily production is 43 pounds, with a standard deviation of 5 pounds. Assume that Normal models describe milk
production for these breeds.
1. We select an Ayrshire at random. What’s the probability that she averages more than 50 pounds of milk a
day?
2. What’s the probability that a randomly selected Ayrshire gives more milk than a randomly selected Jersey?
3. A farmer has 20 Jerseys. What’s the probability that the average production for this small herd exceeds
45 pounds of milk a day?
4. A neighboring farmer has 10 Ayrshires. What’s the probability that his herd average is at least 5 pounds
higher than the average for the Jersey herd?
Example 6: A champion archer can generally hit the bull’s-eye 80% of the time. Suppose she shoots 200 arrows
during competition. What’s the probability she gets at least 85% bull’s-eyes?
Example 7: The ISA Babcock Company supplies poultry farmers with hens, advertising that a mature B300 Layer
produces eggs with a mean weight of 60.7 grams. Suppose that egg weights follow a Normal model with standard
deviation 3.1 grams. What’s the probability that a dozen randomly selected eggs average more than 62 grams?