Ch 9 notes - msmatthewsschs

9.1 – Sampling Distributions
Many investigations and research projects try to draw
conclusions about how the values of some variable x
are distributed in a population. Often, attention is
focused on a single characteristic of that distribution.
Examples include:
1. x = fat content (in grams) of a quarter-pound
hamburger, with interest centered on the mean fat
content μ of all such hamburgers
2. x = fuel efficiency (in miles per gallon) for a 2003
Honda Accord, with interest focused on the variability
in fuel efficiency as described by σ, the standard
deviation for the fuel efficiency population
distribution
3. x = time to first recurrence of skin cancer for a patient
treated using a particular therapy, with attention
focused on p, the proportion of such individuals whose
first recurrence is within 5 years of the treatment.
Parameter: A number that describes the population. This number is typically unknown.
Statistic: A number that describes a sample. We use this number to estimate the parameter.
Population (parameter) vs. sample (statistic) notation:
Mean: μ vs. x̄
Standard deviation: σ vs. s_x
Proportion: p vs. p̂
Standard deviation of the proportion: σ_p vs. σ_p̂
Sampling Distribution:
The distribution of all values taken by the statistic
in all possible samples of the same size from the
same population
Ex: Take 100 samples of size n = 20.
Sampling Variability:
The variation in the value of a statistic from one sample to another, among samples of the same size.
If I compare many different samples and the statistic
is very similar in each one, then the sampling
variability is low. If I compare many different samples
and the statistic is very different in each one, then the
sampling variability is high.

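Unbiased:
A statistic is unbiased when the mean of its sampling distribution is equal to the true value of the parameter.
Unbiased Estimator:
A statistic that is unbiased for the parameter it estimates.
Ex: If μ = 20 and the sampling distribution of x̄ has mean μ_x̄ = 20, then x̄ is an unbiased estimator of μ.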
How to build a sampling distribution:
1. Take a large number of samples of the same size from the same population.
2. Calculate the sample mean or sample proportion for each sample.
3. Make a histogram of the values of the statistic.
4. Examine the distribution.
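The four steps above can be sketched as a short Python simulation. This is an illustrative sketch, not part of the original notes: the population proportion p = 0.6, the sample size n = 20, the 100 samples, and the fixed random seed are all assumed values, each person is drawn independently (as if the population were very large), and a text histogram stands in for a drawn one.

```python
import random
from collections import Counter

def simulate_p_hats(p, n, num_samples, seed=0):
    """Steps 1-2: draw num_samples samples of size n from a population whose
    success proportion is p, and return the sample proportion of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        # Each person independently "succeeds" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

def text_histogram(values):
    """Step 3: one row per distinct value, one '*' per sample."""
    counts = Counter(values)
    for value in sorted(counts):
        print(f"{value:.2f} | {'*' * counts[value]}")

# Step 4: examine the distribution (assumed values: p = 0.6, n = 20, 100 samples).
p_hats = simulate_p_hats(p=0.6, n=20, num_samples=100)
text_histogram(p_hats)
print("mean of the p-hats:", round(sum(p_hats) / len(p_hats), 3))
```

Because p̂ is an unbiased estimator of p, the histogram should pile up around 0.6.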
Facts about Samples:
• If the population mean (μ) and the population standard deviation (σ) are unknown, we can use x̄ to estimate μ and use s_x to estimate σ. These estimates may or may not be reliable.
• A different sample would still represent the same population, but a different sample almost always produces different values of the statistics.
• A statistic can be unbiased and still have high variability. To reduce the variability, increase the sample size: larger samples give a smaller spread.
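A quick Python sketch can illustrate the last point: x̄ is an unbiased estimator of μ, and its spread shrinks as n grows. The population here is hypothetical (10,000 values drawn uniformly between 0 and 100), and the sample sizes 5 and 50, the 1,000 repetitions, and the seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 values spread uniformly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
mu = statistics.fmean(population)

def sample_means(n, num_samples=1000):
    """Mean of each of num_samples simple random samples (without replacement) of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]

results = {}
for n in (5, 50):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    center, spread = results[n]
    print(f"n = {n:>2}: center = {center:6.2f}, spread (SD) = {spread:5.2f}")
print(f"population mean mu = {mu:6.2f}")
```

Both centers should land near μ, while the spread for n = 50 should be much smaller than for n = 5 (roughly √10 ≈ 3.2 times smaller).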
Example #1: Classify each underlined number as a
parameter or statistic. Give the appropriate notation
for each.
a. Forty-two percent of today’s 15-year-old girls will
get pregnant in their teens.
Parameter
p = 0.42
b. The National Center for Health Statistics reports
that the mean systolic blood pressure for males 35 to
44 years of age is 128 and the standard deviation is
15. The medical director of a large company looks at
the medical records of 72 executives in this age group
and finds that the mean systolic blood pressure for
these executives is 126.07.
128 and 15 are parameters: μ = 128, σ = 15
126.07 is a statistic: x̄ = 126.07
Example #2: Suppose you have a population in which
60% of the people approve of gambling.
a. Is 60% a parameter or a statistic? Give appropriate
notation for this value.
Parameter,
p = 0.60
You want to take many samples of size 10 from this population to observe how the sample proportion who approve of gambling varies in repeated samples.
b. Describe the design of a simulation using the partial random digits table below to estimate the sample proportion who approve of gambling. Label how you will conduct the simulation. Then carry out five trials of your simulation. What is the average of the five sample proportions? How close is it to 60%?
Assign: digits 0–5 = approves of gambling; digits 6–9 = doesn't approve
Stop: after choosing 10 digits
Count: # of people who approve of gambling
Repeaters: OK to have repeated digits; each one represents a new person
A D AA D
ADADA
3 6 0 0 9
3 9 6 3 8
1 9 3 6 5
8 5 4 5 3
1 5 4 1 2
4 6 8 1 6
3 8 4 4 8
2 4 6 9 7
4 8 7 8 9
3 9 3 6 4
1 8 3 3 8
4 2 0 0 6
8 2 7 3 9
4 7 5 1 1
5 7 8 9 0
8 1 6 7 6
2 0 8 0 7
5 5 3 0 0
6 0 9 4 0
2 4 9 4 3
7 2 0 2 4
6 1 7 9 0
1 7 8 6 8
9 0 6 5 6
6 8 4 1 7
7 2 7 6 5
3 5 0 1 3
8 5 0 8 9
1 5 5 2 9
5 7 0 6 7
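Trial 1: 6/10 = 60%
Trial 2: 4/10 = 40%
Trial 3: 4/10 = 40%
Trial 4: 7/10 = 70%
Trial 5: 7/10 = 70%
Average of the five p̂ values:
$$\frac{0.6 + 0.4 + 0.4 + 0.7 + 0.7}{5} = \frac{2.8}{5} = 0.56$$
This is 0.04 below the true 60%.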
c. The sampling distribution of p̂ is the distribution of p̂ from all possible SRSs of size 10 from this population. What would be the mean of this distribution if this process were repeated 100 times?
μ_p̂ = p = 0.60
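The Assign/Stop/Count design from part b can also be run by computer. In this Python sketch a seeded pseudo-random generator stands in for the printed random digits table (an assumption, so its five trials will not match the five above), and 10,000 extra trials show where the average of p̂ settles.

```python
import random

def run_trial(rng, sample_size=10):
    """One trial of the design: read sample_size random digits.
    Assign: 0-5 = approves, 6-9 = doesn't. Repeated digits are fine (each is a new person).
    Returns the sample proportion p-hat."""
    digits = [rng.randint(0, 9) for _ in range(sample_size)]
    approvals = sum(1 for d in digits if d <= 5)
    return approvals / sample_size

rng = random.Random(2024)  # arbitrary seed; these digits will not match the printed table
five_trials = [run_trial(rng) for _ in range(5)]
print("five trials:", five_trials, "average:", round(sum(five_trials) / 5, 2))

many_trials = [run_trial(rng) for _ in range(10_000)]
print("average of 10,000 trials:", round(sum(many_trials) / len(many_trials), 3))
```

With many trials the average should sit very close to 0.60, while five trials alone can easily miss by several percentage points, as the 0.56 above did.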
d. If you used samples of size 20 instead of size 10, which sampling distribution would give you a better estimate of the true proportion of people who approve of gambling? Explain your answer.
Samples of size 20: a larger sample size means less variability, so p̂ tends to land closer to the true proportion.
e. Make a histogram of the sampling distribution. Describe the graph.
Center: 60%
Unusual features: none
Shape: approximately symmetric
Spread: range = 10 − 1 = 9
9.2 – Sample Proportions
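Using proportions:
$$\hat{p} = \frac{\text{count of "successes" in sample}}{\text{size of sample}} = \frac{X}{n}$$
Remember Ch 8? For the count X of successes:
$$\mu_X = np \qquad \sigma_X = \sqrt{np(1-p)}$$
Use these when you know "p".
What if you are working with the proportion of successes in a sample instead of the count?
Sampling Distribution of a Sample Proportion:
$$\mu_{\hat{p}} = p \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$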
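These two formulas translate directly into a small Python helper. The function name `p_hat_distribution` is hypothetical; the example reuses p = 0.60 from the gambling example in 9.1 with samples of size 10 and 20.

```python
import math

def p_hat_distribution(p, n):
    """Center and spread of the sampling distribution of p-hat:
    mu_p_hat = p and sigma_p_hat = sqrt(p(1 - p) / n).
    The spread formula assumes the population is at least 10 times the sample size."""
    if not 0 <= p <= 1 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return p, math.sqrt(p * (1 - p) / n)

# Gambling example from 9.1: p = 0.60 with samples of size 10 and 20.
for n in (10, 20):
    mean, sd = p_hat_distribution(0.60, n)
    print(f"n = {n}: mean = {mean:.2f}, sd = {sd:.4f}")
```

For n = 10 this gives σ_p̂ ≈ 0.1549 and for n = 20 about 0.1095, which is why the larger samples in part d give a better estimate.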
Rule of Thumb #1:
Only use the formula $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$ when the population is at least 10 times the sample size, $N \ge 10n$. A census should be impractical!
Rule of Thumb #2:
Only use the Normal approximation of the sampling distribution of p̂ when:
$np \ge 10$ and $n(1-p) \ge 10$
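Both rules of thumb are simple yes/no checks, sketched here in Python. The function names are hypothetical; the numbers come from examples in these notes. In borderline cases a floating-point product such as n * p can land a hair above or below 10, so values right at the cutoff may deserve a closer look.

```python
def sd_formula_ok(population_size, n):
    """Rule of Thumb #1: the population is at least 10 times the sample size (N >= 10n)."""
    return population_size >= 10 * n

def normal_approx_ok(p, n):
    """Rule of Thumb #2: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

# Gambling example from 9.1: p = 0.60 with n = 10 gives np = 6, too small.
print(normal_approx_ok(0.60, 10))   # False
# Poverty example in this section: np = 66 and n(1 - p) = 234.
print(normal_approx_ok(0.22, 300))  # True
# For n = 300 the smallest allowed population is N = 3000.
print(sd_formula_ok(3000, 300), sd_formula_ok(2999, 300))  # True False
```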
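Conclusion:
If p is the population proportion, then the count of successes X is approximately
$$N\left(np,\ \sqrt{np(1-p)}\right)$$
If p̂ is the sample proportion, then p̂ is approximately
$$N\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right)$$
ONLY if $np \ge 10$ and $n(1-p) \ge 10$.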
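So, to calculate a z-score for p̂:
$$\text{Standardized test statistic: } Z = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$$
$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \qquad \text{or} \qquad Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}}$$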
Example #1
Suppose you are going to roll a fair six-sided die 60 times and record p̂, the proportion of times that a 1 or a 2 is showing.
a. Where should the distribution of the p̂ values be centered?
$$\mu_{\hat{p}} = p = \frac{2}{6} = \frac{1}{3}$$
b. What is the standard deviation of the sampling distribution of p̂, the proportion of the 60 rolls that show a 1 or a 2?
Rule of Thumb #1: the population (all possible rolls of the die) is unlimited, so it is more than 10 times the sample size.
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{\frac{1}{3}\left(1 - \frac{1}{3}\right)}{60}} \approx 0.0609$$
c. Describe the shape of the sampling distribution of p̂. Justify your answer.
Rule of Thumb #2: $np \ge 10$ and $n(1-p) \ge 10$
$$60\left(\tfrac{1}{3}\right) = 20 \ge 10 \qquad 60\left(1 - \tfrac{1}{3}\right) = 40 \ge 10$$
Approximately Normal: $N\left(\tfrac{1}{3},\ 0.0609\right)$
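The arithmetic for parts a through c, plus the z-score formula from earlier in this section, can be checked with a short Python sketch. `Fraction` keeps p = 1/3 exact instead of rounding it to .33; the observed value p̂ = 0.40 used for the z-score is hypothetical, not part of the example.

```python
import math
from fractions import Fraction

n = 60
p = Fraction(2, 6)  # a 1 or a 2: 2 of the 6 equally likely faces, i.e. 1/3

center = p                                # part a: mu_p_hat = p
sd = math.sqrt(p * (1 - p) / n)           # part b: sqrt(p(1 - p) / n), about 0.0609
successes, failures = n * p, n * (1 - p)  # part c: Rule of Thumb #2 counts, 20 and 40

def z_score(p_hat):
    """(statistic - parameter) / standard deviation of statistic."""
    return (p_hat - float(p)) / sd

print("center:", center)
print("standard deviation:", round(sd, 4))
print("np =", successes, "and n(1 - p) =", failures)
print("approximately Normal:", successes >= 10 and failures >= 10)
# Hypothetical observed result: 24 of the 60 rolls (p-hat = 0.40) show a 1 or a 2.
print("z for p-hat = 0.40:", round(z_score(0.40), 2))
```

A z-score of about 1.1 would mean that hypothetical result sits about 1.1 standard deviations above the center of the sampling distribution.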
Example #2
According to government data, 22% of American children
under the age of 6 live in households with incomes less than the
official poverty level. A study of learning in early childhood
chooses an SRS of 300 children. What is the probability that
more than 20% of the sample are from poverty households?
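Rule of Thumb #1: $N \ge 10n = 10(300) = 3000$
There are far more than 3,000 American children under the age of 6, so the population is at least 10 times the sample size and it is OK to use the formula for the standard deviation.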
p = 0.22
Rule of Thumb #2: $np \ge 10$ and $n(1-p) \ge 10$
$$300(0.22) = 66 \ge 10 \qquad 300(1 - 0.22) = 234 \ge 10$$
Approximately Normal: $N\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right)$ with p = 0.22 and
$$\sigma_{\hat{p}} = \sqrt{\frac{0.22(1 - 0.22)}{300}} \approx 0.0239$$
$N(0.22,\ 0.0239)$
$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.20 - 0.22}{0.0239} \approx -0.836$$
$$P(Z > -0.836) = 1 - P(Z < -0.836) \approx 1 - 0.2005 = 0.7995$$
(The table value 0.2005 is for z rounded to −0.84.)
Or: normalcdf(0.20, 1000000, 0.22, 0.0239) ≈ 0.799
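The table answer and the calculator answer differ slightly only because of rounding. This Python sketch computes the probability three ways; `normalcdf` here is a hand-written stand-in for the calculator command built on the standard library's `math.erfc`, not a library function.

```python
import math

def normalcdf(lower, upper, mean, sd):
    """Hand-written stand-in for the calculator's normalcdf(lower, upper, mean, sd):
    P(lower < X < upper) for X ~ N(mean, sd), using the complementary error function."""
    def cdf(x):
        return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2)))
    return cdf(upper) - cdf(lower)

p, n = 0.22, 300
sd_exact = math.sqrt(p * (1 - p) / n)  # about 0.0239 before rounding

answers = {
    "unrounded sd": normalcdf(0.20, math.inf, p, sd_exact),
    "sd rounded to 0.0239": normalcdf(0.20, math.inf, p, 0.0239),
    "table, z rounded to -0.84": 1 - normalcdf(-math.inf, -0.84, 0, 1),
}
for method, prob in answers.items():
    print(f"{method:>26}: {prob:.4f}")
```

All three round to about 0.80; the small differences come only from rounding z or σ_p̂ along the way.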
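b. How large a sample would be needed to guarantee that the standard deviation of p̂ is no more than 0.01? Explain.
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \le 0.01 \quad\Rightarrow\quad \sqrt{\frac{0.22(1 - 0.22)}{n}} \le 0.01$$
Square both sides:
$$\frac{0.1716}{n} \le 0.0001 \quad\Rightarrow\quad 0.1716 \le 0.0001n \quad\Rightarrow\quad n \ge 1716$$
A sample of at least 1,716 children is needed.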
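Solving for n can be wrapped in a small Python function; `min_sample_size` is a hypothetical helper name. Passing the numbers as strings lets `Fraction` store 0.22 and 0.01 exactly, which guards against floating-point round-off nudging the boundary value up by one when rounding up with `math.ceil`.

```python
import math
from fractions import Fraction

def min_sample_size(p, max_sd):
    """Smallest whole n with sqrt(p(1 - p) / n) <= max_sd.
    Squaring both sides gives n >= p(1 - p) / max_sd**2, then round up."""
    p, max_sd = Fraction(p), Fraction(max_sd)  # exact decimal values, no float round-off
    return math.ceil(p * (1 - p) / max_sd ** 2)

print(min_sample_size("0.22", "0.01"))  # 1716
```

Rounding up matters when the division does not come out even: any smaller whole number would leave σ_p̂ slightly above the target.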
9.3 – Sample Means
Sample Means Distribution:
$$\mu_{\bar{x}} = \mu \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
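The same two formulas for x̄ as a small Python helper; `x_bar_distribution` is a hypothetical name, and the calls reuse the numbers from the soda and birth-weight examples in this section.

```python
import math

def x_bar_distribution(mu, sigma, n):
    """Mean and standard deviation of the sampling distribution of x-bar:
    mu_x_bar = mu and sigma_x_bar = sigma / sqrt(n)."""
    if sigma < 0 or n < 1:
        raise ValueError("need sigma >= 0 and n >= 1")
    return mu, sigma / math.sqrt(n)

print(x_bar_distribution(12, 0.16, 16))   # soda cans: sigma_x_bar is about 0.04
print(x_bar_distribution(7.5, 1.25, 40))  # birth weights: sigma_x_bar is about 0.1976
```

Note that the center never changes with n; only the spread shrinks, by a factor of √n.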
How do you determine normality?
• If samples are drawn from a Normal population, the sampling distribution of x̄ is Normal, no matter what the sample size n is.
• If samples are drawn from a skewed population, the sampling distribution of x̄ is also skewed when n is small.
Central Limit Theorem (CLT):
• No matter what the population distribution looks like, if n is large (rule of thumb: n ≥ 30), then the sampling distribution of x̄ is approximately Normal.
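A Python simulation can make the CLT concrete. The population is an assumption chosen to be strongly right-skewed (exponential with mean 1 and standard deviation 1); the skewness measure, the sample sizes 2 and 30, the 5,000 repetitions, and the seed are all illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

def skewness(values):
    """Average cubed z-score: about 0 for a symmetric distribution, positive when skewed right."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, num_samples=5000):
    """x-bar for num_samples samples of size n from an assumed exponential (mean 1) population."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(num_samples)]

summary = {}
for n in (2, 30):
    means = sample_means(n)
    summary[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    center, spread, skew = summary[n]
    print(f"n = {n:>2}: center = {center:.3f}, SD = {spread:.3f} "
          f"(sigma/sqrt(n) = {1 / math.sqrt(n):.3f}), skewness = {skew:.2f}")
```

For an exponential population the skewness of x̄ is 2/√n, so it drops from about 1.4 at n = 2 to about 0.37 at n = 30 but never reaches exactly 0; n ≥ 30 is a rule of thumb, not a guarantee.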
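To calculate z-scores:
$$\text{Standardized test statistic: } Z = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$$
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \qquad \text{or} \qquad Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$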
Example #1
A soft-drink bottler claims that, on average, cans contain 12 oz of
soda. Let x denote the actual volume of soda in a randomly
selected can. Suppose that x is normally distributed with σ = 0.16
oz. Sixteen cans are to be selected, and the soda volume will be
determined for each one.
a. Describe the shape of the sampling distribution of x̄.
Because the population is Normal, the sampling distribution of x̄ is also Normal, even though n = 16 is small.
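b. Calculate the mean and standard deviation of the sampling distribution of x̄.
$$\mu_{\bar{x}} = \mu = 12 \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.16}{\sqrt{16}} = 0.04$$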
c. Determine the probability that the sample mean soda volume is between 11.9 oz and 12.1 oz, that is, within 0.1 oz of the company's claim.
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}: \qquad \frac{11.9 - 12}{0.04} = -2.5 \qquad \frac{12.1 - 12}{0.04} = 2.5$$
$$P(-2.5 < Z < 2.5) = P(Z < 2.5) - P(Z < -2.5) = 0.9938 - 0.0062 = 0.9876$$
Or: normalcdf(11.9, 12.1, 12, 0.04) = 0.9876
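The calculator step can be reproduced in Python. The `normalcdf` function below is a hand-written stand-in for the calculator command, built on the standard library's `math.erfc`; it is not a library function.

```python
import math

def normalcdf(lower, upper, mean, sd):
    """Hand-written stand-in for the calculator's normalcdf(lower, upper, mean, sd):
    P(lower < X < upper) for X ~ N(mean, sd)."""
    def cdf(x):
        return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2)))
    return cdf(upper) - cdf(lower)

mu, sigma, n = 12, 0.16, 16
sd_x_bar = sigma / math.sqrt(n)  # sigma / sqrt(n) = 0.04
prob = normalcdf(11.9, 12.1, mu, sd_x_bar)
print(round(prob, 4))  # 0.9876
```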
Example #2
The weights of newborn children in the United States vary
according to the normal distribution with mean 7.5 pounds and
standard deviation 1.25 pounds. The government classifies a
newborn as having low birth weight if the weight is less than 5.5
pounds.
a. What is the probability that a baby chosen at random weighs less than 5.5 pounds at birth?
$$Z = \frac{x - \mu}{\sigma} = \frac{5.5 - 7.5}{1.25} = -1.6$$
P(Z < −1.6) = 0.0548
Or: normalcdf(-1000000, 5.5, 7.5, 1.25) = 0.0548
b. You choose forty babies at random and compute their mean weight. What are the mean and standard deviation of the mean weight of the forty babies?
The sampling distribution of x̄ is Normal because the population is Normal (also, n = 40 ≥ 30).
$$\mu_{\bar{x}} = \mu = 7.5 \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.25}{\sqrt{40}} \approx 0.1976$$
c. What is the probability that the forty babies' average birth weight is less than 5.5 pounds?
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{5.5 - 7.5}{0.1976} \approx -10.12$$
P(Z < −10.12) ≈ 0
Or: normalcdf(-1000000, 5.5, 7.5, 0.1976) ≈ 0
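Parts a and c differ only in which standard deviation goes into the Normal model, which a short Python sketch makes explicit. As before, `normalcdf` is a hand-written stand-in for the calculator command, not a library function; using `math.erfc` keeps the tiny lower-tail probability in part c from being rounded to exactly 0.

```python
import math

def normalcdf(lower, upper, mean, sd):
    """Hand-written stand-in for the calculator's normalcdf(lower, upper, mean, sd).
    erfc keeps very small lower-tail probabilities from rounding to exactly 0."""
    def cdf(x):
        return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2)))
    return cdf(upper) - cdf(lower)

mu, sigma, n = 7.5, 1.25, 40
one_baby = normalcdf(-math.inf, 5.5, mu, sigma)                     # part a: sd = sigma
forty_babies = normalcdf(-math.inf, 5.5, mu, sigma / math.sqrt(n))  # part c: sd = sigma / sqrt(n)
print(f"part a: {one_baby:.4f}")     # 0.0548
print(f"part c: {forty_babies:.1e}")
```

Part c prints a number on the order of 10^-24, which is why the notes treat it as 0.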
d. Would your answers to a, b, or c be affected if the
distribution of birth weights in the population were distinctly
nonnormal?
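Part a: yes. You couldn't use the Normal distribution to find the probability for a single baby.
Parts b and c: no. The formulas μ_x̄ = μ and σ_x̄ = σ/√n hold for any population shape, and because n = 40 ≥ 30, the CLT says the sampling distribution of x̄ is approximately Normal.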