Lecture 15 - Rice Statistics

Section 6.4

How Likely Are the Possible
Values of a Statistic?

The Sampling Distribution
Agresti/Franklin Statistics, 1e, 1 of 139
Statistic

Recall: A statistic is a numerical
summary of sample data, such as a
sample proportion or a sample mean.
Parameter

Recall: A parameter is a numerical
summary of a population, such as a
population proportion or a population
mean.
Statistics and Parameters



In practice, we seldom know the values
of parameters.
Parameters are estimated using
sample data.
We use statistics to estimate
parameters.
Example: 2003 California Recall
Election

Prior to counting the votes, the
proportion in favor of recalling
Governor Gray Davis was an
unknown parameter.

An exit poll of 3160 voters reported
that the sample proportion in favor of
a recall was 0.54.
Example: 2003 California Recall
Election

If a different random sample of about
3000 voters were selected, a different
sample proportion would occur.
Example: 2003 California Recall
Election


Imagine all the distinct samples of
3000 voters you could possibly get.
Each such sample has a value for the
sample proportion.
Statistics and Parameters


How do we know that a sample
statistic is a good estimate of a
population parameter?
To answer this, we need to look at a
probability distribution called the
sampling distribution.
Sampling Distribution

The sampling distribution of a
statistic is the probability distribution
that specifies probabilities for the
possible values the statistic can take.
The Sampling Distribution of the
Sample Proportion




Look at each possible sample.
Find the sample proportion for each
sample.
Construct the frequency distribution of
the sample proportion values.
This frequency distribution is the
sampling distribution of the sample
proportion.
Example: Sampling Distribution

Which Brand of Pizza Do You Prefer?
• Two choices: A or D.
• Assume that half of the population prefers Brand A and half prefers Brand D.
• Take a random sample of n = 3 tasters.
Example: Sampling Distribution
Sample      No. Prefer Pizza A    Proportion
(A,A,A)     3                     1
(A,A,D)     2                     2/3
(A,D,A)     2                     2/3
(D,A,A)     2                     2/3
(A,D,D)     1                     1/3
(D,A,D)     1                     1/3
(D,D,A)     1                     1/3
(D,D,D)     0                     0
Example: Sampling Distribution
Sample Proportion    Probability
0                    1/8
1/3                  3/8
2/3                  3/8
1                    1/8
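
The table above can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming each taster independently prefers A or D with probability 1/2, so that all eight ordered samples are equally likely; the function name `sampling_distribution` is hypothetical and not from the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sampling_distribution(n=3, choices=("A", "D")):
    """Exact sampling distribution of the proportion preferring A.

    Assumes every ordered sample of n tasters is equally likely,
    which holds when half the population prefers each brand.
    """
    samples = list(product(choices, repeat=n))  # all 2**n ordered samples
    # Tally how many samples give each value of the sample proportion.
    counts = Counter(Fraction(s.count("A"), n) for s in samples)
    total = len(samples)
    return {prop: Fraction(c, total) for prop, c in sorted(counts.items())}


dist = sampling_distribution()
for prop, prob in dist.items():
    print(f"sample proportion {prop}: probability {prob}")
```

Exact fractions are used instead of floats so the probabilities can be compared directly with the 1/8 and 3/8 entries in the table.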
Mean and Standard Deviation of the
Sampling Distribution of a Proportion

For a binomial random variable with n trials and
probability p of success for each, the sampling
distribution of the proportion of successes has:
\[ \text{Mean} = p \quad \text{and} \quad \text{standard deviation} = \sqrt{\frac{p(1-p)}{n}} \]
To obtain these values, take the mean np and
standard deviation \( \sqrt{np(1-p)} \) for the binomial
distribution of the number of successes and divide
by n.
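
The "divide by n" step can be checked numerically. This is a small sketch with hypothetical helper names and illustrative values of n and p that are not from the slides; it only confirms that dividing the binomial mean and standard deviation by n gives p and sqrt(p(1 - p)/n).

```python
import math


def binomial_count_summary(n, p):
    """Mean and standard deviation of the number of successes."""
    return n * p, math.sqrt(n * p * (1 - p))


def proportion_summary(n, p):
    """Mean and standard deviation of the proportion of successes,
    obtained by dividing the count summaries by n."""
    mean_count, sd_count = binomial_count_summary(n, p)
    return mean_count / n, sd_count / n


n, p = 100, 0.3  # illustrative values only
mean_prop, sd_prop = proportion_summary(n, p)
direct_sd = math.sqrt(p * (1 - p) / n)  # the formula stated on the slide
print(mean_prop, sd_prop, direct_sd)
```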
Example: 2003 California Recall
Election

Sample: Exit poll of 3160 voters.

Suppose that exactly 50% of the
population of all voters voted in favor
of the recall.
Example: 2003 California Recall
Election

Describe the mean and standard deviation of
the sampling distribution of the number in the
sample who voted in favor of the recall.
• \( \mu = np = 3160(0.50) = 1580 \)
• \( \sigma = \sqrt{np(1-p)} = \sqrt{3160(0.50)(0.50)} = 28.1 \)
Example: 2003 California Recall
Election

Describe the mean and standard deviation of the
sampling distribution of the proportion in the
sample who voted in favor of the recall.
BE VERY CAREFUL
\[ \text{Mean} = p = 0.50 \]
\[ \text{Standard deviation} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.50)(0.50)}{3160}} = \sqrt{0.000079} = 0.0089 \]
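
The warning above is about scale: the count of voters and the proportion of voters have different means and standard deviations. A short sketch of both calculations for the exit poll, using n = 3160 and the supposed p = 0.50 from the slides (variable names are my own):

```python
import math

n, p = 3160, 0.50  # exit-poll size and the supposed population proportion

# Scale of the NUMBER in the sample voting for recall (binomial count).
mean_count = n * p
sd_count = math.sqrt(n * p * (1 - p))

# Scale of the PROPORTION in the sample voting for recall.
mean_prop = p
se_prop = math.sqrt(p * (1 - p) / n)

print(f"count:      mean = {mean_count:.0f}, sd = {sd_count:.1f}")
print(f"proportion: mean = {mean_prop:.2f}, sd = {se_prop:.4f}")
```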
The Standard Error

To distinguish the standard deviation
of a sampling distribution from the
standard deviation of an ordinary
probability distribution, we refer to it
as a standard error.
Example: 2003 California Recall
Election


If the population proportion supporting
recall was 0.50, would it have been
unlikely to observe the exit-poll sample
proportion of 0.54?
Based on your answer, would you be
willing to predict that Davis would be
recalled from office?
Example: 2003 California Recall
Election

Fact: The sampling distribution of the
sample proportion has a bell shape with a
mean µ = 0.50 and a standard deviation
σ = 0.0089.
Example: 2003 California Recall
Election

Convert the sample proportion value of
0.54 to a z-score:
\[ z = \frac{0.54 - 0.50}{0.0089} = 4.5 \]
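
To see how unusual such a z-score is, one can also compute the upper-tail probability of a standard normal distribution. The sketch below is an illustration rather than part of the lecture; the helper names are hypothetical, and the tail probability uses the standard identity P(Z > z) = 0.5 * erfc(z / sqrt(2)).

```python
import math


def z_score(observed, mean, standard_error):
    """Number of standard errors the observed value lies from the mean."""
    return (observed - mean) / standard_error


def upper_tail_probability(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = z_score(0.54, 0.50, 0.0089)
tail = upper_tail_probability(z)
print(f"z = {z:.1f}, P(Z > z) = {tail:.2e}")
```

Under the bell-shaped sampling distribution assumed on the slide, the tail probability comes out far below 0.0001, which is what "very unlikely" means here.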
Example: 2003 California Recall
Election


The sample proportion of 0.54 is more
than four standard errors from the
expected value of 0.50.
The sample proportion of 0.54 voting
for recall would be very unlikely if the
population support were p = 0.50.
Example: 2003 California Recall
Election



A sample proportion of 0.54 would be
even more unlikely if the population
support were less than 0.50.
We therefore have strong evidence that the
population support was larger than 0.50.
The exit poll gives strong evidence that
Governor Davis would be recalled.
Summary of the Sampling
Distribution of a Proportion

For a random sample of size n from a population
with proportion p, the sampling distribution of the
sample proportion has
\[ \text{Mean} = p \quad \text{and} \quad \text{standard error} = \sqrt{\frac{p(1-p)}{n}} \]

If n is sufficiently large such that the expected
numbers of outcomes of the two types, np and n(1 - p),
are both at least 15, then this sampling
distribution has a bell shape.
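
The sample-size condition in this summary is easy to encode. A minimal sketch, with a hypothetical function name and the threshold of 15 taken from the slide:

```python
def is_approximately_bell_shaped(n, p, threshold=15):
    """Check the rule of thumb from the summary: both expected counts,
    n*p and n*(1 - p), must be at least `threshold` (15 on the slide)."""
    return n * p >= threshold and n * (1 - p) >= threshold


# Exit poll: expected counts are 1580 and 1580, so the rule is satisfied.
print(is_approximately_bell_shaped(3160, 0.50))
# Pizza example: expected counts are 1.5 and 1.5, so it is not.
print(is_approximately_bell_shaped(3, 0.50))
```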
Section 6.5
How Close Are Sample Means to
Population Means?
The Sampling Distribution of the
Sample Mean



The sample mean, \( \bar{x} \), is a random
variable.
The sample mean varies from sample
to sample.
By contrast, the population mean, µ,
is a single fixed number.
Mean and Standard Error of the
Sampling Distribution of the Sample
Mean

For a random sample of size n from a population
having mean µ and standard deviation σ, the
sampling distribution of the sample mean has:
• Center described by the mean µ (the same as the
  mean of the population).
• Spread described by the standard error, which
  equals the population standard deviation divided by
  the square root of the sample size: \( \sigma / \sqrt{n} \)
Example: How Much Do Mean
Sales Vary From Week to Week?

Daily sales at a pizza restaurant vary
from day to day.

The sales figures fluctuate around a
mean µ = $900 with a standard
deviation σ = $300.
Example: How Much Do Mean
Sales Vary From Week to Week?



The mean sales for the seven days in a
week are computed each week.
The weekly means are plotted over time.
These weekly means form a sampling
distribution.
Example: How Much Do Mean
Sales Vary From Week to Week?

What are the center and spread of the
sampling distribution?
Center: mean µ = $900
Spread: standard error \( \sigma / \sqrt{n} = 300 / \sqrt{7} \approx 113 \) (dollars)
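
The standard error for the weekly means can be compared with a simulation. The sketch below makes assumptions that the slides do not state: daily sales are drawn independently from a normal distribution with mean 900 and standard deviation 300, and 20,000 simulated weeks are used. It is an illustration of the formula, not a model of real restaurant sales.

```python
import math
import random
import statistics

MU, SIGMA, DAYS = 900.0, 300.0, 7  # values from the slides
WEEKS = 20_000                      # number of simulated weeks (assumption)

theoretical_se = SIGMA / math.sqrt(DAYS)

rng = random.Random(0)  # fixed seed so the run is repeatable
weekly_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(DAYS)])
    for _ in range(WEEKS)
]

simulated_center = statistics.fmean(weekly_means)
simulated_se = statistics.stdev(weekly_means)

print(f"theoretical standard error: {theoretical_se:.1f}")
print(f"simulated center: {simulated_center:.1f}, simulated spread: {simulated_se:.1f}")
```

The simulated center and spread should land near 900 and 113, matching the formula for the sampling distribution of the mean.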
Sampling Distribution vs.
Population Distribution
Standard Error

Knowing how to find a standard error
gives us a mechanism for
understanding how much variability
to expect in sample statistics “just by
chance.”
Standard Error

The standard error of the sample mean:
\[ \frac{\sigma}{\sqrt{n}} \]
As the sample size n increases, the denominator
increases, so the standard error decreases.
With larger samples, the sample mean is more
likely to fall close to the population mean.
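
A quick way to see the effect of sample size is to tabulate the standard error for several values of n. This sketch reuses σ = 300 from the pizza sales example; the sample sizes and the function name are illustrative choices.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 300.0
for n in (1, 4, 7, 30, 100):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.1f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.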
Central Limit Theorem

Question: How does the sampling
distribution of the sample mean relate
with respect to shape, center, and
spread to the probability distribution
from which the samples were taken?
Central Limit Theorem


For random sampling with a large
sample size n, the sampling
distribution of the sample mean is
approximately a normal distribution.
This result applies no matter what the
shape of the probability distribution
from which the samples are taken.
Central Limit Theorem:
How Large a Sample?

The sampling distribution of the sample
mean takes more of a bell shape as the
random sample size n increases. The more
skewed the population distribution, the
larger n must be before the shape of the
sampling distribution is close to normal. In
practice, the sampling distribution is
usually close to normal when the sample
size n is at least about 30.
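
The effect of sample size on shape can be illustrated by simulation. The sketch below assumes a strongly right-skewed population (an exponential distribution with rate 1, chosen only for illustration) and measures the skewness of the simulated sample means for n = 2 and n = 30; a value near 0 indicates a roughly symmetric, bell-like shape. All names and settings are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an exponential(1) population
    and return the mean of each sample."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


rng = random.Random(1)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(simulate_sample_means(n, 20_000, rng)) for n in (2, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {skew:.2f}")
```

The skewness for n = 30 should be much closer to 0 than for n = 2, consistent with the rule of thumb on the slide.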
A Normal Population Distribution
and the Sampling Distribution

If the population distribution is
approximately normal, then the
sampling distribution is
approximately normal for all sample
sizes.
How Does the Central Limit Theorem
Help Us Make Inferences?


For large n, the sampling distribution
is approximately normal even if the
population distribution is not.
This enables us to make inferences
about population means regardless of
the shape of the population
distribution.