Sampling Distributions
How likely are the possible values
of a statistic?
The BIG Question
Did you prepare for today?
If so, mark yes and estimate the time you
spent preparing on your frequency log.
Part 1: the Sampling Distribution
of the Sample Mean
Briefly: What have we covered?
1. We use statistical analysis to make inferences about a population.
2. Sample statistics can be used to make such inferences.
3. We also learned that probability distributions can be used to
construct models of a population
Question
Who recalls what a sample statistic is?
In practice, sample statistics are numerical summaries of
sample data such as mean, variance, standard deviation,
and binomial proportion which are used to estimate
population parameters.
What was the definition of a population parameter?
It is a numerical summary of a population
which is almost always unknown.
Where are we headed?
Briefly:
1. We want to develop the notion that a sample statistic is a
random variable with a probability distribution.
2. Define a sampling distribution for a sample statistic.
3. Link the sampling distribution of the sample statistic to the
normal probability distribution.
I remember that:
Question
Before we proceed,
does anyone know what a sampling distribution is, or its definition?
The concept of a sampling distribution is a little difficult for
some students to understand.
Basically, we have a population from which we could draw many
different samples.
[Diagram: a population with Samples 1, 2, 3, and 4 drawn from it.]
Conjecture
What is the result of being able to choose
different samples from which to compute a sample
statistic?
The sample statistic itself is a
random variable.
Thus, the sampling distribution of a sample statistic calculated
from a sample of n measurements is the probability distribution
of the statistic, that is, it is the probability distribution that
specifies probabilities for the possible values the statistic can
take.
Moreover, sampling distributions
describe the variability that occurs from
study to study using statistics to
estimate population parameters.
Sampling Distribution of the
Sample Mean, $\bar{x}$
is the probability distribution of all possible values of the
random variable $\bar{x}$ computed from a sample of size n
from a population with mean $\mu$ and standard deviation $\sigma$.
IMPORTANT: Even though we depend on sampling
distribution models, we never actually get to see them.
We never actually take repeated samples from the same
population and make a histogram. We only imagine or
simulate them.
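Simulating them is easy to try for yourself. The following is a minimal Python sketch, not the applet used in these slides: the population (the integers 1 to 100), the helper name `simulate_sample_means`, and the seed are all assumptions made for illustration. It draws many samples of size 5 and of size 30 (with replacement) and compares how spread out the sample means are.

```python
import random
import statistics

def simulate_sample_means(population, n, trials, seed=0):
    """Draw `trials` random samples of size n (with replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]

# Hypothetical population for illustration only (not from the slides).
population = list(range(1, 101))

means_n5 = simulate_sample_means(population, n=5, trials=10_000)
means_n30 = simulate_sample_means(population, n=30, trials=10_000)

# The spread of the simulated sample means estimates the standard error.
print("spread of sample means, n=5: ", round(statistics.pstdev(means_n5), 2))
print("spread of sample means, n=30:", round(statistics.pstdev(means_n30), 2))
```

A histogram of `means_n5` or `means_n30` would be a simulated picture of the sampling distribution of the sample mean.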
Are you confused YET?
Can we look at
a simulation
Wilber?
You will find screen
shots for the
simulation on the next
few slides.
Screen shots 1
[Screenshots: one trial of drawing a sample of size 5 from the population, showing the five values drawn and their mean; then ten trials of drawing a sample of size 5, showing the means of the ten trials.]
Screen shots 2
[Screenshots: one trial, then ten trials, of drawing a sample of size 30 from the population.]
Notice how the means are more
clustered for the trials that contained
30 subjects in each trial versus the
ten trials in which the sample size
was 5.
Screen shots 3
[Screenshots: 10000 trials of drawing a sample of size 5, and 10000 trials of drawing a sample of size 30, from the population.]
Notice that the sampling distribution is
narrower for the sample size of
30 versus 5.
Thoughts
What can you conclude when we take larger sample sizes?
As we take larger sample sizes, the larger values are offset by
smaller values giving us less spread in the sample means.
In fact, the larger the sample size n, the closer to
normal the shape of the sampling distribution of the sample mean becomes.
Why is it important for us to have a normal distribution?
To be able to use previous results we have studied such as z-scores
and the standard normal distribution.
Deviation in the Sampling
Distribution
Does anyone know what the standard deviation is called for a sampling
distribution?
The sampling distribution of $\bar{x}$ has a standard deviation called the
standard error (in this case, the standard error of the sample mean),
which gives us a mechanism to understand how much variability to
expect by chance in sample statistics.
The standard error of the sample mean is given by:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the population standard
deviation and n is the sample size.
This holds for any size sample.
Now do you understand why the size of n matters?
As the size of n increases, so does the denominator which makes the
standard error decrease! Moreover, the sample mean is more likely to
fall closer to the population mean with a larger n.
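To see the effect of n in numbers, here is a small Python sketch of the standard error of the sample mean, σ/√n. The function name `standard_error_of_mean` and the value σ = 10 are made up for illustration; they do not come from the slides.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10  # hypothetical population standard deviation
for n in (5, 30, 100):
    # The standard error shrinks as the sample size grows.
    print(n, round(standard_error_of_mean(sigma, n), 3))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.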
Mean and Shape of the Sampling
Distribution
What about the sampling distribution mean?
The sampling distribution of the sample mean will have mean:
$\mu_{\bar{x}} = \mu$, where $\mu$ is the population mean.
What about a population that is not normally distributed? How
will that affect the sampling distribution of the mean?
This is when the Central Limit Theorem comes in.
Central Limit Theorem
The Central Limit Theorem says that for random sampling with a large sample size n,
the sampling distribution of the sample mean is approximately normal. This
result holds no matter what the shape of the distribution the samples were taken
from. HOWEVER:
The sampling distribution of the sample mean
becomes more bell-shaped as the random
sample size n increases. [Recall the example
from earlier when n was 5 then 30.]
The more skewed the population
distribution, the larger n must be for the
shape of the sampling distribution to be close to
normal.
In practice, the shape of the sampling
distribution is usually close to normal when
the sample size is at least 30.
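One way to watch this happen is to sample from a clearly skewed population and measure how lopsided the sample means are. The Python sketch below is an illustration under assumptions: the exponential population, the helper names `skewness` and `sample_mean_skewness`, and the seed are not from the slides. A skewness near 0 indicates a symmetric, bell-like shape.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_mean_skewness(n, trials=10_000, seed=1):
    """Skewness of the means of `trials` samples of size n
    drawn from a right-skewed (exponential) population."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return skewness(means)

skew_n5 = sample_mean_skewness(5)
skew_n30 = sample_mean_skewness(30)
print("skewness of sample means, n=5: ", round(skew_n5, 2))
print("skewness of sample means, n=30:", round(skew_n30, 2))
```

The sample means for n = 30 should come out noticeably less skewed than those for n = 5, even though the population itself is far from normal.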
Pause and Think
Why is it important for us to be able to have a normal
distribution for the sampling distribution when the population is
not normally distributed?
This enables us to make inferences about population means
regardless of the shape of the population distribution.
Let’s revisit the applet:
Does the distribution to
the right match any from
the previous table?
Example 1
Suppose existing houses for sale average 2200
square feet in size, with a standard deviation of
250 square feet. What is the probability that a
randomly selected house will have at least 2300
square feet?
Strategy:
Connect: Do you recall anything we have done that can help
you set up this problem?
We have used the standard normal distribution to find the
probability that a value falls in a given range. So we must
standardize the value of 2300 square feet.
Calculate
Here we have the value of x being greater than or equal to 2300 square
feet. So we need to standardize this in order to use the standard normal
distribution. We know the population mean and standard deviation, so
we can find the z-value for x = 2300 square feet as we have done
previously:
$$z = \frac{x - \mu}{\sigma}$$
Which here is:
$$z = \frac{2300 - 2200}{250} = .40$$
HOWEVER, this question is asking us to find the probability that x ≥ 2300 ft², or:
$$P(x \ge 2300)$$
Question
What is the relevance of finding the z-value for the given value of 2300 ft²?
Recall that the z-values tell us how many standard deviations away a
value is from the mean.
Here we are asking for the probability that 2300 ft² is the lower
bound for the size of a house randomly selected from a population
whose mean size is 2200 ft² with a standard deviation of 250 ft².
Thus we need to find:
$$P(x \ge 2300) = P\left(z \ge \frac{2300 - 2200}{250}\right) = P(z \ge 0.40)$$
Visualize
What are we trying to calculate?
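[Figure: standard normal curve with the area to the right of z = .40 marked.]
The marked area is the area I want to find. This is the
probability that z ≥ .40.
Recall that the area to the right of the mean is .5.
By table 4, the area between the mean and z = .40 is .1554.
So to find the area we want, we must
subtract the table 4 area for the
z-value from .5.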
Calculate and Summarize
Calculate
Thus, by using table 4, we have that
$$P(z \ge .40) = .5 - .1554 = .3446$$
Summarize:
If a house is chosen at random from a group in which the
average square footage is 2200 square feet with a standard
deviation of 250 square feet, the probability that the house has
at least 2300 square feet is .3446 or 34.46%.
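If you do not have table 4 handy, the same calculation can be sketched in Python with the standard library's `statistics.NormalDist`. This assumes, as the example implicitly does, that house sizes are normally distributed; the variable names are made up for illustration.

```python
from statistics import NormalDist

# Assumption: house sizes follow a normal distribution
# with the mean and standard deviation given in Example 1.
house_sizes = NormalDist(mu=2200, sigma=250)

z = (2300 - 2200) / 250                      # standardized value
p_at_least_2300 = 1 - house_sizes.cdf(2300)  # area to the right of 2300

print("z =", z)
print("P(x >= 2300) =", round(p_at_least_2300, 4))
```

The `cdf` method gives the area to the left of a value, so subtracting it from 1 plays the same role as subtracting the table area from .5.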
Key
Think of $\bar{x}$ as an x value like we have dealt with
previously. Then, as the sample size increases, by the
Central Limit Theorem the sampling distribution that
$\bar{x}$ comes from becomes approximately normal. Thus we can
use the z-value and normal distribution values (table 4) to
find the probability that $\bar{x}$ does….
Example 2
What is the probability that a
randomly selected sample of
16 houses will average at least
2300 square feet?
Strategy:
Connect: How do we connect this
problem to the previous problem?
This is a similar problem BUT in this case we are asking the probability for a
randomly selected sample of houses not just one house.
Question: What do we know that can help to solve this problem?
We know how to find the z-value of a given x, but here we are asked about the mean
of one randomly selected sample of 16 houses that were chosen from the population of
houses. Thus $\bar{x}$ is a value that will fall within the sampling distribution of the sample
mean. Thus by the Central Limit Theorem, I can find the z-value for $\bar{x}$.
Caution
What is the one difference between calculating the z-value for $\bar{x}$ and the
z-value for a specific x?
The difference is that instead of dividing by the population standard deviation,
you have to divide by the standard error of $\bar{x}$, which is the standard deviation
divided by the square root of n. That is:
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Think: What are we trying to find?
$$P(\bar{x} \ge 2300)$$
Calculate and Summarize
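Thus for $\bar{x} = 2300$ with $\mu = 2200$ and
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{250}{\sqrt{16}} = 62.5$$
we have:
$$z = \frac{2300 - 2200}{62.5} = 1.60$$
Again by table 4 we must subtract the area of .4452 associated with the z-value of 1.60 from .5
to get:
$$P(\bar{x} \ge 2300) = P(z \ge 1.60) = .5 - .4452 = .0548$$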
Summarize: The probability that a randomly selected sample of 16 houses will average
a size greater than 2300 square feet given that the population of houses average 2200
square feet with a standard deviation of 250 square feet is .0548 or 5.48%.
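As a check on Example 2, here is a short Python sketch using the standard library's `statistics.NormalDist`. The only change from the single-house calculation is the divisor: the standard error σ/√n instead of σ. The variable names are assumptions for illustration.

```python
import math
from statistics import NormalDist

mu, sigma, n = 2200, 250, 16

standard_error = sigma / math.sqrt(n)   # 250 / 4
z = (2300 - mu) / standard_error        # z-value for the sample mean

# Area to the right of z under the standard normal curve.
p_mean_at_least_2300 = 1 - NormalDist().cdf(z)

print("standard error =", standard_error)
print("z =", z)
print("P(x-bar >= 2300) =", round(p_mean_at_least_2300, 4))
```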
You Try
Water taxis have a safe capacity of 3500
lbs. Given that the population of men has
normally distributed weights with a mean
of 172 lb and a standard deviation of 29
lb,
a) If one man is randomly selected, find the probability that his weight is greater
than 175 lb.
Solution:
Connect: How do the previous examples connect to this example?
Question: What do I know that will help me?
Think: Visualize what area or probability I am trying to find.
Calculate and Summarize
The appropriate z-value, divide by the right quantity, i.e.
$$z = \frac{175 - 172}{29} \approx .10$$
Find the area for this z
from table 4 which is
.0398, so $P(x > 175) = .5 - .0398 = .4602$.
Summarize: For a man chosen at random from the population of men with mean
weight 172 lbs. and a standard deviation of 29 lbs., the probability that the
randomly chosen man weighs more than 175 lbs. is .4602 or 46.02%.
Second Part
b) If 20 different men are randomly selected, find the probability that their
mean weight is greater than 175 lb (so that their total weight exceeds the
safe capacity for the water taxi of 3500 pounds).
Strategy:
Connect: How do the previous examples connect to this example?
Question: What do I know that will help me?
Think: Visualize what area or probability I am trying to find.
Calculate and Summarize
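The appropriate z-value, make sure you divide by the right quantity, i.e.
$$z = \frac{175 - 172}{29 / \sqrt{20}} \approx .46$$
Area for z = .46
from table 4 is .1772, so $P(\bar{x} > 175) = .5 - .1772 = .3228$.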
Summarize: Given that the safe capacity of the water taxi is 3500 pounds, there is a
fairly good chance (with probability 0.3228) that it will be overloaded with 20
randomly selected men. Also notice that it is much easier for an individual to deviate
from the mean than it is for a group of 20 to deviate from the mean.
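Both parts of the water taxi problem can be checked side by side with a short Python sketch using `statistics.NormalDist` from the standard library (variable names are made up for illustration). Because the code does not round z to two decimals the way a table lookup does, expect its answers to differ slightly from the table-based values in the third decimal place.

```python
import math
from statistics import NormalDist

mu, sigma = 172, 29  # from the problem statement

# (a) One man: use the distribution of individual weights.
p_one_man = 1 - NormalDist(mu, sigma).cdf(175)

# (b) Mean of 20 men: use the sampling distribution of the sample mean,
#     whose standard deviation is the standard error sigma / sqrt(n).
n = 20
p_mean_of_20 = 1 - NormalDist(mu, sigma / math.sqrt(n)).cdf(175)

print("P(one man > 175 lb)    =", round(p_one_man, 4))
print("P(mean of 20 > 175 lb) =", round(p_mean_of_20, 4))
```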
You Try
IQ scores are normally distributed with a mean of 100 and a standard
deviation of 15. What is the probability a random sample of 20 people
have a mean IQ score greater than 110?
Mozart and Einstein were hypothesized to have IQs of about 4
standard deviations above the mean of 100.
Strategy
• Connect: How do the previous examples
connect to this example?
• Question: What do I know that will help
me?
• Think: Visualize what area or probability I
am trying to find.
• Calculate: The appropriate z-value,
divide by the right quantity, i.e.
$$z = \frac{110 - 100}{15 / \sqrt{20}} \approx 2.98$$
• Summarize:
• Answer: .0014
Part 2: Sampling Distribution for the
Sample Proportion
What is the sampling distribution for the sample proportion?
Like the sampling distribution for the sample mean, it
is the probability distribution of the sample proportion. The sample
proportion is found by recording whether each individual has or does
not have a specific characteristic; this is a binomial variable.
How is the sample proportion found?
We find a variable “p-hat”, $\hat{p}$,
which is the number of individuals
in the sample with a specific characteristic we are interested in, x, divided
by the number of individuals in the sample, n:
$$\hat{p} = \frac{x}{n}$$
The sample
proportion estimates the population proportion p.
Simulation
AGAIN: This statistic will vary depending upon the sample taken from the
population. Thus, this statistic is a random variable as well: the count x is a
binomial random variable, and $\hat{p} = x/n$. Each
sample will vary with the number of individuals having the characteristic.
Using the simulation we would have:
This proportion is set so that 50% of the population has
the characteristic of interest and 50% does not.
We selected 5 individuals
randomly at a time
I ran 1 trial.
I ran another trial of 5 randomly selected
individuals and only 1 had the characteristic.
You can see the sampling distribution on the
bottom now has 2 entries.
In this random sample 2
individuals have the
characteristic.
Screen shots 2
Notice what is happening as we take larger sample sizes and more trials.
Screen Shots 3
I set the population proportion
so that 70%
has the characteristic
in this case.
IN ALL CASES, no matter what the population proportion p
is, as long as the sample size is large and enough
trials are done, the sampling distribution of the sample proportion becomes approximately
normal!!!
Summary:
As the size of the sample, n, increases, the shape of the sampling
distribution of the sample proportion becomes approximately
normal.
The mean of the sampling distribution of the sample proportion equals
the population proportion, p. That is, the mean of the sample proportions
is the population proportion. The expected value of the sample
proportion is equal to the population proportion.
The standard deviation (standard error) of the sampling distribution of the
sample proportion decreases as the sample size, n, increases.
Why is it important to be normal?!
So we can use the z-values and normal distribution values (table 4).
Standard Error and Mean
For the standard
error:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$
For the mean:
$$\mu_{\hat{p}} = p$$
Sampling distribution of
the sample proportion will
be approximately normal if
np(1 - p) ≥ 10.
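The mean and standard error of the sample proportion can also be checked by simulation. The Python sketch below is an illustration, not the applet from the slides: the helper name `simulate_sample_proportions`, the sample size n = 100, and the seed are assumptions. It compares the simulated mean and spread of p-hat with p and with √(p(1 − p)/n).

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, trials, seed=0):
    """Return p-hat = x/n for `trials` random samples of size n,
    where each individual has the characteristic with probability p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

p, n = 0.7, 100  # 70% have the characteristic; n = 100 is a made-up sample size
p_hats = simulate_sample_proportions(p, n, trials=5_000)

print("np(1 - p) =", round(n * p * (1 - p), 2))  # should be at least 10
print("mean of p-hats:", round(statistics.fmean(p_hats), 3))
print("sd of p-hats:  ", round(statistics.pstdev(p_hats), 4))
print("sqrt(p(1-p)/n):", round(math.sqrt(p * (1 - p) / n), 4))
```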
Example
In a 2008 study :
• 85% of college students with cell phones use text messaging.
• 1136 college students surveyed; 84% reported that they text on their cell phone.
• Assume the value 0.85 given in the study is the proportion p of college students
that text message; that is 0.85 is the population proportion p
• Compute the probability that in a sample of 1136 students, 84% or less use text
messaging.
Solution
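$$P(\hat{p} \le .84) = P\left(z \le \frac{.84 - .85}{.0106}\right) = P(z \le -.94) = .1736$$
where $.0106 \approx \sqrt{.85(1 - .85)/1136}$ is the standard error of $\hat{p}$.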
By table 4, z = -.94 has area .3264. Thus .5 - .3264 = .1736
Hence there is a 17.36% probability that 84% or less of a sample of
1136 college students use text messaging.
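Here is a short Python sketch of the same calculation using `statistics.NormalDist` from the standard library; the variable names are assumptions for illustration. It also checks the np(1 − p) ≥ 10 condition before using the normal approximation. Since the code does not round z to two decimals, its probability can differ slightly from the table-based answer.

```python
import math
from statistics import NormalDist

p, n = 0.85, 1136   # assumed population proportion and sample size

# Condition for the normal approximation to the sampling distribution.
condition = n * p * (1 - p)

standard_error = math.sqrt(p * (1 - p) / n)
z = (0.84 - p) / standard_error

# Area to the left of z under the standard normal curve.
p_at_most_84 = NormalDist().cdf(z)

print("np(1 - p) =", round(condition, 2))
print("standard error =", round(standard_error, 4))
print("z =", round(z, 2))
print("P(p-hat <= .84) =", round(p_at_most_84, 4))
```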
Summary of Sampling Distributions
• A sampling distribution is the probability distribution of a sample statistic.
• With random sampling, the sampling distribution provides probabilities for all the
possible values of the statistic.
• The sampling distribution provides the key for telling us how close a sample statistic
falls to the corresponding unknown parameter.
• Its standard deviation is called the standard error.
• For random sampling with a large sample size n, by The Central Limit Theorem the
sampling distribution of the sample mean is approximately a normal distribution.
• This result applies no matter what the shape of the probability distribution from
which the samples are taken.
• In practice, the sampling distribution is usually close to normal when the sample
size n is at least about 30, and for sample proportions when np(1-p)≥10.
• If the population distribution is approximately normal, then the sampling
distribution is approximately normal for all sample sizes.
Thank you for
your attention
and participation!