Statistical Inference:
Statistic & Parameter
The government's Current Population Survey
contacted a sample of 113,146 households in March
2005. Their mean income was $60,528. Describe the
statistic and parameter of interest, µ and x̄.
Statistic & Parameter
The Gallup Poll asked a random sample of 515 US
adults whether they believe in ghosts. Of the
respondents, 160 said “Yes”. Identify the statistic and
parameter, p and p̂.
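As a quick illustration of the notation, the sample proportion p̂ for the Gallup example can be computed directly from the counts given above. The variable names are illustrative only; the parameter p (the proportion of all US adults who believe in ghosts) stays unknown and cannot be computed from the sample.

```python
# Gallup example: 160 of the 515 sampled adults said "Yes".
yes_count = 160
sample_size = 515

# The statistic p-hat is the proportion of "Yes" answers in the sample.
p_hat = yes_count / sample_size
print(f"p-hat = {p_hat:.3f}")
```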
For each boldface number, state whether it is a
statistic or a parameter.
1) A department store reports that 84% of all customers who use the store’s
credit plan pay their bills on time.
2) A sample of 100 students at a large university had a mean age of 24.1 years.
3) The Department of Motor vehicles reports that 22% of all vehicles registered in
a particular state are imports.
4) A hospital reports that based on the ten most recent cases, the mean length of
stay for surgical patients is 6.4 days.
5) A consumer group, after testing 100 batteries of a certain brand, reported an
average life of 63 hours of use.
QUESTION TIME!
1. Following a dramatic drop of 500 points in the Dow
Jones Industrial Average in September 1998, a poll
conducted for the Associated Press found that 92% of those
polled said that a year from now their family financial
situation will be as good as it is today or better. The number
92% is a:
(a) Statistic
(b) Sample
(c) Parameter
(d) Population
 Sampling Variability:
Do you have a summer birthday (June – August)?
Sample #1: Kevin, Lauren, Ernesto, Freddy
Sample #2: Erik, Stephanie, Bradley, Emma
Sample #3: Matthew, Neggin, Jason, Clay
Sampling Variability: If we were to take multiple
samples of size n from a population, our statistic (in this
case ____) would vary from sample to sample.
 Sampling Distributions
There are ________ possible samples of 4 students that I
could choose.
The difficult part….I don’t know which of these samples I
chose! Some are good representations of the
population…some are not so great.
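The count of possible samples is a combination, "N choose 4", where N is the class size. Here is a minimal sketch of that count. The class size of 30 is a hypothetical value used only for illustration (the slides do not state it), and `count_samples` is an illustrative helper name.

```python
import math

def count_samples(population_size: int, sample_size: int) -> int:
    """Number of distinct samples drawn without replacement, ignoring order."""
    return math.comb(population_size, sample_size)

# Hypothetical class size; replace with the real number of students.
class_size = 30
print(count_samples(class_size, 4))
```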
Sampling Distribution
- Samples of 4 are a little small, so I ran a simulation taking 100 samples of size 50
from a population. I did a little research and found that (based on 2010 data) about
26% of births occur in June, July, and August.
- If I took EVERY POSSIBLE sample of size 50 from the population and made a
histogram of the sample proportions, I would get a sampling distribution.
Here is an approximate sampling distribution with 16,000 samples with n=25.
It’s still not a complete sampling distribution, but it’s closer!
If we were to take all possible samples of the same size from the population,
compute the sample proportion p̂ of each sample, and then create a distribution of those
values, it would be called the sampling distribution of p̂.
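A simulation like the one described can be sketched in a few lines. This is not the software used for the slides. It assumes each sampled person independently has a summer birthday with probability 0.26, which treats the population as effectively infinite, and `simulate_sample_proportions` is an illustrative name.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's p-hat.

    Assumption: every draw is an independent success with probability p.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# 100 samples of size 50 with p = 0.26, as in the slide.
phats = simulate_sample_proportions(p=0.26, n=50, num_samples=100)
print(min(phats), max(phats), statistics.mean(phats))
```

A histogram of `phats` approximates the sampling distribution; raising `num_samples` brings it closer to the real thing.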
Question Time!
The sampling distribution of a statistic is
(a) the probability that we obtain the statistic in repeated
random samples.
(b) the mechanism that determines whether
randomization was effective.
(c) the distribution of values taken by a statistic in all
possible samples of the same sample size from the
same population.
(d) the extent to which the sample results differ
systematically from the truth.
The following properties generally describe a sampling distribution of p̂ created from samples
with a large size (usually n ≥ 30):
(1) If the sample size is large enough (or if we are told that the population is
normally distributed), the overall shape of the distribution is symmetric and
approximately normal. The larger the sample size, the closer the shape is to a normal
distribution.
A rule of thumb used to determine whether a normal curve can be used to approximate
the sampling distribution of sample proportions is:
a) np > 10 and
b) n(1-p) > 10
(2) The mean (center) of the distribution is equal to the true population parameter, p.
(3) The variability (spread) of the sampling distribution depends on the sample size.
The larger the sample size, the smaller the variability of the sampling distribution.
(4) If the population is at least ten times larger than the sample (N ≥ 10n),
the standard deviation of the sampling distribution is $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
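The properties above can be collected into one small helper. This is a sketch under the slide's own rules of thumb (np > 10, n(1-p) > 10, N ≥ 10n); the function name and the returned dictionary keys are illustrative, not from any library.

```python
import math

def phat_sampling_distribution(p, n, population_size=None):
    """Summarize the sampling distribution of p-hat for an SRS of size n."""
    return {
        "mean": p,                                    # property (2)
        "sd": math.sqrt(p * (1 - p) / n),             # property (4)
        "normal_ok": n * p > 10 and n * (1 - p) > 10,  # property (1)
        # None means the population size was not supplied.
        "independent_ok": (
            None if population_size is None else population_size >= 10 * n
        ),
    }

# Summer-birthday example: p = 0.26, samples of size 50.
summary = phat_sampling_distribution(p=0.26, n=50)
print(summary)
```

With n = 25 the same call reports `normal_ok` as False, because np = 6.5 falls below the threshold.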
(2) The mean (center) of the distribution is equal to the true population parameter, p.
[Figure: simulated sampling distribution from 20,000 samples of n = 50; mean of p̂ = 0.259504]
 (3) The variability (spread) of the sampling distribution depends on the sample
size. The larger the sample size, the smaller the variability of the sampling
distribution: $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
[Figure: simulated sampling distributions for n = 50 and n = 25]
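To see how the spread shrinks with sample size, the standard deviation formula can be evaluated for a few values of n. This sketch uses p = 0.26 from the birthday example; n = 100 is an extra value added for illustration, and `phat_sd` is an illustrative helper name.

```python
import math

def phat_sd(p, n):
    """Standard deviation of p-hat: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.26
for n in (25, 50, 100):
    print(f"n = {n:3d}  sd = {phat_sd(p, n):.4f}")
```

Because n sits under a square root, cutting the spread in half takes four times the sample size.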
Was our sample proportion a good estimate of the population parameter?
The spread of the sampling distribution depends on
the sample size, not the size of the population!
An SRS of size 1500 from the entire population of the
United States (about 300 million) and an SRS of 1500
from San Francisco (~750,000) would be equally
precise/trustworthy!!!!
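One way to check this claim is the finite population correction, the factor sqrt((N - n)/(N - 1)) that adjusts the standard deviation when sampling without replacement. The sketch below compares the two populations mentioned above. The value p = 0.5 is a hypothetical proportion chosen for illustration, and `phat_sd_finite` is an illustrative name.

```python
import math

def phat_sd_finite(p, n, population_size):
    """SD of p-hat for an SRS without replacement from a finite population."""
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * fpc

p = 0.5   # hypothetical population proportion
n = 1500

sd_us = phat_sd_finite(p, n, 300_000_000)  # United States
sd_sf = phat_sd_finite(p, n, 750_000)      # San Francisco
print(f"US: {sd_us:.5f}  SF: {sd_sf:.5f}")
```

The two results differ by roughly a tenth of a percent, which is why the population size can be ignored once N ≥ 10n.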
Properly chosen statistics computed from random samples of sufficient size
will have low bias and low variability.
http://statweb.calpoly.edu/chance/applets/Reeses/ReesesPieces.html
Question Time!
A simple random sample of 1000 Americans found that 61% were satisfied with
the service provided by the dealer from which they bought their car. A simple
random sample of 1000 Canadians found that 58% were satisfied with the service
provided by the dealer from which they bought their car. The sampling variability
associated with these statistics is
a) exactly the same
b) smaller for the sample of Canadians because the population of Canada is smaller than
that of the United States, hence the sample is a larger proportion of the population.
c) smaller for the sample of Canadians because the percent satisfied was smaller than that
for the Americans.
d) larger for the Canadians because Canadian citizens are more widely dispersed
throughout the country than in the United States, hence they have more variable views.
e) about the same.
Question Time!
If a statistic used to estimate a parameter is
such that the mean of its sampling distribution is
equal to the true value of the parameter being
estimated, the statistic is said to be
(a) random
(b) biased
(c) a proportion
(d) unbiased
So….
 Sampling distributions + Normal calculations will
allow us to quantify how confident we can be in our
sample statistics!
 We will be able to say that there is a ___% chance that
our sample proportion varies from the true
population proportion by more than ___ percentage points.
Example:
An SRS of 1500 high school seniors in CA was asked whether they applied to college early. Let’s
assume that there are 100,000 high school seniors in the state of California, and that in fact 35% of
them apply to college early. What is the probability that your sample of 1500 seniors will give a result
within 2 percentage points of the true value of 35%?
We have an SRS with n = 1500 drawn from a population in which the proportion p = .35 apply to
college early.
1) Find the mean: the sampling distribution of p̂ has mean $\mu_{\hat{p}}$ = ____
2) Find the standard deviation (don’t forget to check the “rule of thumb” for independence)
3) Normal? Check the rule of thumb for Normality:
n = 1500
$\mu_{\hat{p}} = 0.35$
$\sigma_{\hat{p}} = \sqrt{(0.35)(0.65)/1500} \approx 0.0123$
4) Perform a Normal Calculation:
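The Normal calculation for this example can be sketched with Python's standard-library `statistics.NormalDist`. It relies on the Normal approximation justified by the rule-of-thumb checks, so the result is approximate and may differ slightly from a table-based answer that rounds z.

```python
import math
from statistics import NormalDist

n = 1500
p = 0.35
population_size = 100_000

# Rule-of-thumb checks from the earlier steps.
independence_ok = population_size >= 10 * n
normality_ok = n * p > 10 and n * (1 - p) > 10

# Sampling distribution of p-hat: mean p, sd sqrt(p(1 - p) / n).
sigma = math.sqrt(p * (1 - p) / n)
sampling_dist = NormalDist(mu=p, sigma=sigma)

# P(0.33 <= p-hat <= 0.37): within 2 percentage points of 35%.
prob_within_2_points = sampling_dist.cdf(0.37) - sampling_dist.cdf(0.33)
print(f"sigma = {sigma:.4f}")
print(f"P(0.33 <= p-hat <= 0.37) = {prob_within_2_points:.3f}")
```

The probability comes out close to 0.90, so about nine in ten samples of this size land within 2 percentage points of the truth.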
You practice…
 Survey undercoverage—One way of checking the undercoverage,
nonresponse, and other sources of error in a sample survey is to compare the
sample with known facts about the population. About 11% of American adults
are black. The proportion p̂ of black adults in an SRS of 1500 adults should
therefore be close to 0.11. It is unlikely to be exactly 0.11 because of sampling
variability.
If a national sample contains only 9.2% black adults, should we suspect the
sampling procedure is somehow underrepresenting black adults? We will find
the probability that a sample contains no more than 9.2% black adults when
the population is 11% black.
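For checking your work, here is a minimal sketch of the same steps applied to this problem, again using `statistics.NormalDist` and the Normal approximation. It assumes the US adult population is far more than 10 times the sample size, so the independence rule of thumb holds.

```python
import math
from statistics import NormalDist

n = 1500
p = 0.11  # proportion of black adults in the population

# Normality rule of thumb: np and n(1 - p) both exceed 10.
normality_ok = n * p > 10 and n * (1 - p) > 10

sigma = math.sqrt(p * (1 - p) / n)
sampling_dist = NormalDist(mu=p, sigma=sigma)

# P(p-hat <= 0.092): a sample with no more than 9.2% black adults.
prob_at_most_9_2 = sampling_dist.cdf(0.092)
print(f"sigma = {sigma:.4f}")
print(f"P(p-hat <= 0.092) = {prob_at_most_9_2:.4f}")
```

A probability this small (a little over 1%) suggests that a sample with only 9.2% black adults would be unusual if the sampling procedure were working properly.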
9.1/9.2 Exercises
 9.7 (a-d), 9.8, 9.11-9.13, 9.20, 9.21, 9.25 (hint: for e, use
the formula for standard deviation and your algebra
skills), 9.27