Chapter 11
Sampling Distributions
Parameters and Statistics
Typically, a number is computed from the sample and used to make inference
about some unknown value that describes a characteristic of the population.
A numerical characteristic of a population is called a ______________
- an unknown but fixed value
A numerical characteristic of a sample is called a ______________
- a known value based on the sample, but ______________
Ex: A marketing research company wanted to estimate the percentage of Americans who
have an unfavorable opinion about an airline. They conducted telephone interviews
with a randomly selected national sample of 1,009. They report that 74% of the
people in this sample have an unfavorable opinion about that airline.
Population:
Parameter:
Sample:
Statistic:
Ex: A carload of ball bearings has mean diameter 2.5003 cm. This is within the
specifications for acceptance of the shipment by the purchaser. An inspector
chooses a random sample of 100 bearings from this carload and finds their mean
to be 2.5009 cm. This is outside the specified limits so the shipment is mistakenly
rejected.
Indicate whether 2.5003 is a parameter or statistic. Do the same for 2.5009.
Statistical Estimation and the Law of Large Numbers
Unbiased Estimate
• One of the desirable properties of a statistic is _____________.
• A statistic used to estimate a parameter is _____________ if the mean of its
sampling distribution is equal to the true value of the parameter being
estimated.
• To get an idea of this property, let’s look at the following dartboard players:
[Figure: dartboards of players A and B]
Which player is unbiased?
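The definition above can be checked by simulation. The following is a minimal sketch, assuming a hypothetical Normal population with mean 50 and standard deviation 10 (these numbers are not from the notes). It compares two candidate statistics for estimating μ: the sample mean and the sample maximum. A statistic whose simulated sampling distribution is centered at μ behaves like an unbiased one.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population (an assumption, not from the notes):
# Normal with mean MU = 50 and standard deviation SIGMA = 10.
MU, SIGMA = 50.0, 10.0
N, REPS = 10, 20_000   # sample size and number of repeated samples

means, maxima = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))  # statistic 1: sample mean
    maxima.append(max(sample))              # statistic 2: sample maximum

# Center (mean) of each simulated sampling distribution.
center_of_means = statistics.fmean(means)
center_of_maxima = statistics.fmean(maxima)

print(f"parameter mu              = {MU}")
print(f"mean of the sample means  = {center_of_means:.2f}")
print(f"mean of the sample maxima = {center_of_maxima:.2f}")
```

The sample means should center very near μ, while the sample maxima center well above it, so the maximum would be a biased choice for estimating μ.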
Estimation of population mean
Many times we are interested in estimating the mean of some characteristic of a
population.
For example:
• mean financial aid that an SMU student receives
• mean amount of CO2 emitted by Beetle cars
• mean number of people riding the DART train per weekend
We represent the mean value of a population by the symbol μ. So μ is a
parameter / statistic.
Our goal is to estimate the value of μ. To do this we take a sample and use the
information obtained from it.
What statistic would be useful for estimating the parameter μ?
We will be using the ______________ to draw inference about μ. In this section we
will learn some of the properties of ______________.
Since the value of this statistic varies from sample to sample it can be viewed as a
______________.
Then why is it considered a good estimator of μ? One of the reasons is … …
Law of Large Numbers
Draw observations at random from any population with finite mean μ. As the
number of observations being drawn increases, the sample mean of the observed
values, x̄, ______________________________.
Example: Consider a roulette wheel in a casino. It has 38 slots, 18 are black, 18
are red and 2 are green. When the wheel is spun, the ball is equally likely to come
to rest in any of the slots. A bet of $1 on red returns $2 if the ball lands in a red
slot. Otherwise the player loses his/her dollar.
• Suppose you bet on red. What is your probability of winning?
The mean amount of money one gets from each bet on red is
______________. And the mean amount one spends on each bet is $1. So
the mean amount of money one loses is ________________. This is the
population mean.
• According to the law of large numbers, the more you bet on red, the closer
the mean amount you lose per bet gets to the above population mean.
So, for example, if you bet 100,000 times, you would expect to lose ______________.
• Do you want to gamble!?
• But keep in mind, this doesn’t hold for a few plays, in which case your
winnings (or losses) are quite unpredictable!
• For the casino it is very profitable because it plays tens of thousands of times,
and hence its profit is predictable by the law of large numbers, which is ______________.
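The roulette arithmetic and the law of large numbers can be sketched in a few lines. This is an illustration only: the slot numbering (0 to 17 standing in for red) and the function name average_loss are assumptions made for the sketch. It computes the population mean loss per $1 bet from the 18-out-of-38 odds, then simulates bets to show the average loss settling toward that value as the number of bets grows.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

P_RED = 18 / 38               # 18 red slots out of 38
MEAN_RETURN = 2 * P_RED       # $2 comes back with probability 18/38, else $0
MEAN_LOSS = 1 - MEAN_RETURN   # each bet costs $1, so this is the mean loss per bet

def average_loss(n_bets):
    """Average loss per bet over n_bets simulated $1 bets on red."""
    total_loss = 0
    for _ in range(n_bets):
        slot = random.randrange(38)            # slots 0..37, equally likely
        total_loss += -1 if slot < 18 else 1   # slots 0..17 play the role of red
    return total_loss / n_bets

print(f"population mean loss per bet: ${MEAN_LOSS:.4f}")
for n in (10, 1_000, 100_000):
    print(f"{n:>7} bets: average loss per bet = ${average_loss(n):.4f}")
print(f"expected total loss over 100,000 bets: ${100_000 * MEAN_LOSS:,.2f}")
```

With only 10 bets the average can land almost anywhere (it can even be a gain), which is the "few plays" caveat above; with 100,000 bets it should sit close to 2/38 of a dollar per bet.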
Issue: Although the law of large numbers guarantees that with a very, very large
sample the sample mean x̄ will be close to the population mean μ, in real life we
cannot always afford to take extremely large samples. What can we say about x̄
computed from a sample that is not very large, say of size 10?
For this, we ask the question “What would happen if we took many, many samples
of size 10 each from the same population?”
The sample mean x̄ will ______________________.
So x̄ is a _______________________ and has its own distribution.
Recall the definition of distribution of a random variable: it gives the possible
values that the variable can take and how often it takes those values.
The sampling distribution of a statistic is the distribution of values taken by the
statistic in all possible samples of the same size from the same population.
To get an idea about this distribution:
1. Take a sample of size 10, say, from the population of interest and compute its
sample mean x̄.
2. Repeat step 1 many, many times, taking a sample of the same size each time.
3. So now there are many, many different values of the sample mean. Make a
histogram of these values of x̄. This shows the distribution of x̄.
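The three steps above can be sketched as a small simulation. The population here is hypothetical (Normal with mean 50 and standard deviation 10, an assumption made only so there is something to sample from), and the histogram is a crude text one with bins 2 units wide.

```python
import random
import statistics
from collections import Counter

random.seed(2)  # fixed seed so the sketch is repeatable

def draw_population_value():
    # Hypothetical population (an assumption): Normal(mean=50, sd=10).
    return random.gauss(50, 10)

def sample_mean(n):
    """Step 1: take one sample of size n and return its sample mean."""
    return statistics.fmean(draw_population_value() for _ in range(n))

# Step 2: repeat step 1 many times with the same sample size.
xbars = [sample_mean(10) for _ in range(5000)]

# Step 3: text histogram of the sample means (each '#' is about 25 samples).
bins = Counter(2 * int(x // 2) for x in xbars)
for left in sorted(bins):
    print(f"{left:3d} to {left + 2:3d} | " + "#" * (bins[left] // 25))
```

Changing the 10 in step 2 to a larger sample size and rerunning shows how the histogram changes with sample size.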
Now let us look at the sampling distributions of x̄ based on sample sizes 10 and
1000 given on the previous page.
Q: In each case, where does the center (mean) of the distribution seem to be?
Q: Does x̄ seem to be unbiased or biased?
Q: What happens to the variability (spread) of the sampling distribution of x̄ when
the sample size increases from 10 to 1000?
It makes sense because ______________________.
What is more desirable: smaller variability or larger variability?
[Figure: panels A and C]
Mean and Standard Deviation of a Sample Mean, x̄
Suppose that x̄ is the mean of an SRS of size n drawn from any large population
that has mean μ and standard deviation σ. The mean of the sampling distribution
of x̄ is _____ and its standard deviation is ______.
Implications:
• Mean of the sampling distribution of x̄ is equal to ______________
• Standard deviation of the sampling distribution of x̄ is equal to ______________
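The standard result for an SRS is that x̄ has mean μ and standard deviation σ/√n. A minimal sketch to check this numerically, assuming a hypothetical Normal population with μ = 50 and σ = 10 (not from the notes) and the two sample sizes discussed earlier, 10 and 1000:

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical population mean and standard deviation

def simulated_xbars(n, reps=400):
    """Sample means from `reps` simulated samples, each of size n."""
    return [statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))
            for _ in range(reps)]

results = {}
for n in (10, 1000):
    xbars = simulated_xbars(n)
    results[n] = (statistics.fmean(xbars), statistics.stdev(xbars))
    center, spread = results[n]
    print(f"n = {n:4d}: mean of x-bar = {center:.2f} (mu = {MU}), "
          f"SD of x-bar = {spread:.3f} (sigma/sqrt(n) = {SIGMA / math.sqrt(n):.3f})")
```

The center stays at μ for both sample sizes, while the spread shrinks by a factor of 10 when n goes from 10 to 1000, since √(1000/10) = 10.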
Example: Response to brake light - Consider the time we take to react to the
brake lights on a decelerating vehicle. This time is a random variable and is critical
in avoiding rear-end collisions. A study has shown that response time to
a brake signal from standard brake lights is normally distributed with mean 1.25
sec and standard deviation 0.46 sec.
(a) Suppose we take an SRS of 50 drivers and find the value of their mean
response time x̄. We keep repeating this process many times, each
time getting a value of x̄. Plotting a histogram of all these values of x̄
gives us the sampling distribution of x̄. What are the mean and standard
deviation of this distribution of x̄?
(b) Which of the following varies more - the response time of individual drivers or
the mean response time of 50 drivers?
(c) Which of the following will vary more - the mean response time of 50 drivers
or the mean response time of 100 drivers?
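For parts (a) to (c), the only calculation needed is σ/√n with the example's values (mean 1.25 sec, standard deviation 0.46 sec). A short sketch, where the helper name sd_of_sample_mean is made up for illustration and n = 1 stands for a single driver:

```python
import math

MU, SIGMA = 1.25, 0.46   # response-time mean and SD from the example, in seconds

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean of an SRS of size n."""
    return sigma / math.sqrt(n)

for n in (1, 50, 100):   # n = 1 is a single driver
    print(f"n = {n:3d}: mean = {MU} sec, SD = {sd_of_sample_mean(SIGMA, n):.4f} sec")
```

The mean stays at 1.25 sec in every case; the standard deviation drops from 0.46 sec for one driver to about 0.065 sec for 50 drivers and 0.046 sec for 100 drivers.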
Sampling Distribution of a Sample Mean
• If the population we are sampling from is Normal(μ, σ), then the sample mean x̄
based on n independent observations has a Normal(______, ______) distribution.
Ex: Response to brake light (cont’d)
(a) What is the probability that a randomly chosen driver takes more than
1.45 sec to react to the brake lights?
(b) What is the distribution of the mean response time of 50 randomly
chosen drivers?
(c) What is the probability that a random sample of 50 drivers will have
mean response time x̄ of 1.45 sec or higher? Will this probability be
larger or smaller than the one calculated in (a)?
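Parts (a) and (c) are both upper-tail Normal probabilities; only the standard deviation differs (σ for one driver, σ/√n for the mean of n drivers). A minimal sketch using only the standard library, where the helper name normal_upper_tail is an assumption made for illustration and the tail area comes from the complementary error function rather than a table:

```python
import math

def normal_upper_tail(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd), via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

MU, SIGMA, N = 1.25, 0.46, 50   # values from the brake-light example

p_single = normal_upper_tail(1.45, MU, SIGMA)                # (a) one driver
p_mean = normal_upper_tail(1.45, MU, SIGMA / math.sqrt(N))   # (c) mean of 50 drivers

print(f"(a) P(X > 1.45)      = {p_single:.4f}")
print(f"(c) P(x-bar >= 1.45) = {p_mean:.4f}")
```

The first probability is roughly 0.33 and the second roughly 0.001: a single driver often takes longer than 1.45 sec, but an average over 50 drivers almost never does, because x̄ varies much less than an individual observation.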
Fact: If the population we sample from is not normal, then the distribution of x̄ is
not normal.
Problem: In that case, how do we compute various probabilities regarding x̄ like
the ones we did in the previous example?
We are rescued by a very famous and elegant result of probability theory: the CENTRAL LIMIT THEOREM!
The Central Limit Theorem (CLT) states:
Draw an SRS of size n from any population with mean μ and standard deviation σ.
When n is large, the sampling distribution of the sample mean x̄ is approximately
_____________.
Good news: We don’t need to learn any new distributions even if the population
we are dealing with is non-normal, as long as we are interested in ______________.
Q: How large is large enough?
A: It depends on the shape of the distribution we are sampling from. If the
population distribution is close to normal, then ______________.
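One way to see the effect of sample size is to sample from a clearly non-normal population and watch the shape of the x̄ values change. The sketch below assumes a hypothetical right-skewed population (exponential with mean 1, not from the notes) and uses sample skewness as a rough shape check: it is about 0 for a symmetric, Normal-looking histogram.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Sample skewness: about 0 for a symmetric, Normal-looking histogram."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def simulated_xbars(n, reps=5000):
    """Sample means of `reps` samples of size n from a right-skewed population."""
    # Hypothetical population (an assumption): exponential with mean 1.
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

skew_by_n = {n: skewness(simulated_xbars(n)) for n in (1, 5, 50)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of the simulated x-bar values = {skew:.2f}")
```

The skewness shrinks toward 0 as n grows, which is the Central Limit Theorem at work; a more strongly skewed population would need a larger n to reach the same degree of symmetry.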
Rule of Thumb: CLT is usually applicable for ______________.
Example: Suppose the number of four-can packs of beer sold every day at a beer
store varies with mean 105.7 and standard deviation 52.5. The manager records
the number of such beer packs sold on 40 randomly chosen days and calculates
the mean number of beer packs sold each day.
(a) Can the population distribution of the number of beer packs sold be
normal?
(b) What is the approximate distribution of the mean number of beer packs sold
(x̄)? Why?
(c) The manager is interested in finding the approximate probability that the
mean number of beer packs sold is higher than 120. Calculate it.
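A sketch of the calculations behind parts (a) and (c), using the example's values and the Central Limit Theorem approximation for x̄. The helper name normal_upper_tail is an assumption made for illustration; it uses the standard library's complementary error function in place of a Normal table.

```python
import math

def normal_upper_tail(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd), via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

MU, SIGMA, N = 105.7, 52.5, 40   # values from the beer-pack example

# (a) If daily sales were exactly Normal(MU, SIGMA), this fraction of days
#     would have a negative number of packs sold, which is impossible.
p_negative = 1 - normal_upper_tail(0, MU, SIGMA)

# (b), (c) By the CLT, x-bar is approximately Normal(MU, SIGMA / sqrt(N)).
se = SIGMA / math.sqrt(N)
p_above_120 = normal_upper_tail(120, MU, se)

print(f"(a) P(X < 0) under a Normal model = {p_negative:.4f}")
print(f"(b) SD of x-bar                   = {se:.3f}")
print(f"(c) P(x-bar > 120), approximately = {p_above_120:.4f}")
```

The standard deviation of x̄ comes out near 8.3, and the probability in (c) is a little over 4%. The nonzero probability of negative sales in (a) is one sign that the population itself cannot be exactly normal.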