Chapter 7 – Sampling Distributions & Central Limit Theorem
DAY / DATE / CLASSWORK / HOMEWORK
Day 1: Classwork: Section 7.1 - Population vs. Sample & Parameter vs. Statistic + pg. 428 (#1 - 8). Homework: Video #1: Sampling Distribution + SOCS; Video #2: Biased vs. Unbiased Estimator + High/Low Bias/Variability; Video #3: Sampling Distributions for Sample Proportions
Day 2: Classwork: Sampling Distribution Activity
Day 3: Classwork: Quiz 7.1 Worksheets (p. 8 – 9)
Day 4: Classwork: pg. 439-441 (#30, 34, 36, 38). Homework: Nothing!!!
Day 5: Classwork: Quiz 7.2 Worksheets (p. 11 – 14)
Day 6: Classwork: pg. 454-455 (#49, 51, 54, 56). Homework: Video #4: Sampling Distributions for Sample Means (non-CLT); Video #5: Sampling Distributions for Sample Means (CLT)
Day 7: Classwork: pg. 455-456 (#59, 61, 63). Homework: Finish problems from class if not completed
Day 8: Classwork: Quiz 7.3 Worksheets (p. 19 – 22). Homework: 5 problems on p. 23 – 24 in note packet
Day 9: Classwork: pg. 431 (#21 – 24) + pg. 441 (#43 – 46) + pg. 456 (#65 – 68). Homework: FRAPPY Day tomorrow!!!
Day 10: Classwork: FRAPPY DAY. Homework: Try out the Practice Worksheets (p. 25 – 27)
Day 11: Classwork: Chapter Review. Homework: Study for Test!!!
Day 12: Classwork: Chapter 7 Test. Homework: Watch & take notes over Video #1 of Chapter 8!!!
Chapter 7 kicks off second semester with a recurring theme that we will experience for the rest of the
school year. The ideas contained in this chapter will be re-used directly and indirectly, so it is VERY
IMPORTANT that you try your best to follow along and ask questions to ensure your complete
understanding of sampling distributions.
1
Chapter 7 Topics (Page Number)
7.1 – Sampling Distributions [VIDEO #1] (p. 4)
7.1 – Estimators + Bias & Variability [VIDEO #2] (p. 6)
7.2 – Sampling Distributions for Sample Proportions [VIDEO #3] (p. 10)
7.3 – Sampling Distributions for Sample Means (w/o CLT) [VIDEO #4] (p. 15)
7.3 – Sampling Distributions for Sample Means (w/ CLT) [VIDEO #5] (p. 16)
2
7.1 – Population vs. Sample & Parameter vs. Statistic – [IN-CLASS]
Ex #1: You want to know the mean income of the subscribers to a particular magazine. You draw a random
sample of 100 subscribers and determine that their mean income is $27,500.

What is the population? _____________________________________________________________

What is the population parameter of interest? ___________________________________________

What is the sample? ________________________________________________________________

What is the sample statistic? _________________________________________________________
Ex #2: You want to know how many students at CHS consume alcohol. You survey a random sample of 200
CHS students and conclude that 65% do not consume alcohol.

What is the population? _____________________________________________________________

What is the population parameter of interest? ___________________________________________

What is the sample? ________________________________________________________________

What is the sample statistic? _________________________________________________________
3
7.1 – Sampling Distribution Introduction [VIDEO #1]

Inference: _____________________________________________________________________
When you can’t access the whole population, you should take a _______________ ( _____ ) from the population.
Since we don’t know much of anything about our population (its SOCS), then we need a distribution that we can rely
on…introducing the…_____________________ _____________________________.
Sampling Distribution: ____________________________________________________________________________
________________________________________________________________________________________________
________________________________________________________________________________________________
We do not need “many, many” samples thankfully. The MAIN thing we need is just _______ , _______ sample.
Its sampling distribution gives valuable information ( _________ ) that we can use to determine whether a claim about the
population is plausible or not plausible…
There may be only _______ population distribution, but there are ___________ different sampling distributions!!!
Let’s explore sampling distributions and some awesome properties!!!
“S”hape: ________________________________
“C”enter: ________________________________
“S”pread: ________________________________
******************************************************************************************************************************
“S”hape: ______________________________________
“C”enter: ______________________________________
“S”pread: ______________________________________
(two more sampling distributions on the next page!)
4
“S”hape: ______________________________________
“C”enter: ______________________________________
“S”pread: ______________________________________
******************************************************************************************************************************
“S”hape: ______________________________________
“C”enter: ______________________________________
“S”pread: ______________________________________
Summarize what you see with the population distribution vs. the various sampling distributions.
Shape:
A sampling distribution will act more and more like an __________________ _______________ distribution
as the sample size _____________, even when the population itself is not _________________ distributed.
Center:
A sampling distribution has essentially the _____________ mean as the population from which it is drawn.
Spread:
As you increase the sample size, the sampling distribution tends to be much less _____________________
or wild than the _______________________ from which it was drawn.
Outliers:
Since large sample sizes tend to average out ____________ observations, sampling distributions typically
do not have any outliers.
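The Shape / Center / Spread summary above can be tried out with a quick simulation. This is only a minimal sketch: the right-skewed population (exponential with mean 50), the sample sizes, and the helper name simulate_sample_means are all made up for illustration and are not from the notes or the videos.

```python
import random
import statistics

def simulate_sample_means(population, n, num_samples, seed=0):
    """Take num_samples random samples of size n; return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]

# ASSUMPTION: a made-up, strongly right-skewed population (not from the notes).
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1 / 50) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# results[n] = (center, spread) of the simulated sampling distribution of x-bar
results = {}
for n in (2, 10, 40):
    means = simulate_sample_means(population, n, num_samples=2_000)
    results[n] = (statistics.fmean(means), statistics.stdev(means))

print(f"population: mean = {pop_mean:.1f}, sd = {pop_sd:.1f}")
for n, (center, spread) in results.items():
    print(f"n = {n:>2}: center = {center:.1f}, spread = {spread:.1f}")
```

With these made-up numbers, the center should stay near the population mean for every n, while the spread should shrink as n grows. A histogram of each list of means would show the shape getting closer to normal.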
5
7.1 – Bias and Variability [VIDEO #2]
Can ANY sample statistic be used to estimate its population parameter? ______________________________
Some statistics produce too much ___________ or error to closely estimate the parameter of interest.
This type of statistic is called a __________________ _____________________________.
But if a statistic ______________ estimates a parameter with very little bias or error, then that statistic
is called an _____________________ ________________________________.
A statistic is ___________________ if the center of its sampling distribution is approximately the same as the
population parameter.
The center of this population distribution:
THE mean: _____________________
The center of this sampling distribution:
the mean of all the 𝑥̅ ’s: _____________________
Is the mean an unbiased estimator???
Conclusion…________
The center of this population distribution:
THE range: _____________________
The center of this sampling distribution:
the mean of all the sample ranges: ____________________________
Is the range an unbiased estimator???
Conclusion…________
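The mean vs. range comparison above can also be explored by simulation. The sketch below assumes a made-up population (the whole numbers 1 to 100) and samples of size 5; neither comes from the video. It compares the center of each simulated sampling distribution with the matching population value.

```python
import random
import statistics

rng = random.Random(42)

# ASSUMPTION: hypothetical population, the whole numbers 1 through 100.
population = list(range(1, 101))
pop_mean = statistics.fmean(population)          # population mean
pop_range = max(population) - min(population)    # population range

sample_means = []
sample_ranges = []
for _ in range(5_000):
    sample = rng.sample(population, 5)           # one random sample of size 5
    sample_means.append(statistics.fmean(sample))
    sample_ranges.append(max(sample) - min(sample))

mean_of_means = statistics.fmean(sample_means)    # center of the x-bar distribution
mean_of_ranges = statistics.fmean(sample_ranges)  # center of the sample-range distribution

print(f"population mean  = {pop_mean:.1f}, mean of sample means  = {mean_of_means:.1f}")
print(f"population range = {pop_range}, mean of sample ranges = {mean_of_ranges:.1f}")
```

In this setup the sample means should center on the population mean, while the sample ranges should center well below the population range, since a small sample rarely catches both extremes.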
6
There are four situations regarding BIAS and VARIABILITY!
High Bias, Low Variability
Low Bias, High Variability
High Bias, High Variability
Low Bias, Low Variability
Match each histogram (1. – 4.) with one of the descriptions above.
[Histograms 1 – 4]
Just because you are precise (low variability) does NOT mean you are accurate (low bias), too!
Answer the last problems of the video here…
7
Quiz 7.1 Worksheet #1
1. For each description below, identify each underlined number as a parameter or statistic. THEN, use
the appropriate symbol to describe each number, like 𝑝̂ = 96% or 𝑥̅ = 2.4 𝑜𝑧.
(a) A 1993 survey conducted by the Richmond Times-Dispatch one week before election day asked
voters which candidate for the state’s attorney general they would vote for. 37% of the respondents
said they would vote for the Democratic candidate. On election day, 41% actually voted for the
Democratic candidate.
37% is a _______________________.
________ = 37%
41% is a _______________________.
________ = 41%
(b) The National Center for Health Statistics reports that the mean systolic blood pressure for males 35
to 44 years of age is 128 and the standard deviation is 15. The medical director of a large company
looks at the medical records of 72 executives in this age group and finds that the mean systolic blood
pressure for these executives is 126.07.
128 is a ______________________.
__________ = 128
15 is a _______________________.
__________ = 15
126.07 is a ____________________.
_________ = 126.07
2. Suppose that in a certain community, 40% of the residents would answer “Yes” to the question “Do you
know the names of at least five other people who live on your block?” Suppose you plan to take a random
sample of 100 people from this community and calculate the proportion of people in your sample whose
response to this question is “Yes”.
(a) The proportion of residents in your sample of 100 people who would say “Yes” is the statistic.
Describe the parameter of interest in this situation.
(b) The statistic in this case is an unbiased estimator of the parameter. What does that mean?
(c) Suppose that in a much larger community, 40% of the residents would also answer “Yes”. If you took a
sample of 100 people from this much larger community, would the sampling distribution of the statistic
be different? In what way?
(d) If you took a sample of 50 people instead of 100 from the original community, would the sampling
distribution of the statistic change? In what way?
8
3. The Fathom screen shot below shows the results of
taking 500 SRSs of 10 temperature readings from a
population distribution that’s N(50, 3) and recording
the sample minimum each time.
(a) Is the dotplot to the right the true sampling
distribution of sample minimums? Explain.
(b) Describe the approximate sampling distribution.
(c) Suppose that the minimum of an actual sample is 40°F. What would you conclude about the
thermostat manufacturer’s claim? Explain.
4. During World War II, 12,000 able-bodied male undergraduates at the University of Illinois participated
in required physical training. Each student ran a timed mile. Their times followed the Normal distribution
with mean 7.11 minutes and standard deviation 0.74 minute. An SRS of 100 of these students has mean
time 𝑥̅ = 7.15 minutes. A second SRS of size 100 has mean 𝑥̅ = 6.97 minutes. After many SRSs, the
values of the sample mean 𝑋̅ follow the Normal distribution with mean 7.11 minutes and standard
deviation 0.074 minute.
(a) Describe the population distribution X of all 12,000 able-bodied male undergraduates at U of I.
(b) Describe the sampling distribution of 𝑋̅. How is it different from the population distribution?
9
7.2 – Sampling Distributions of Sample Proportions [VIDEO #3]
Characteristics of the Sampling Distribution of Sample Proportions
1. “S”hape – The sampling distribution for sample proportions ( p̂ ) will be ______________ _____________
if the following condition is met: ________________________ and ____________________________.

The larger the sample size, n, the closer the shape is to approx. normal.
2. “C”enter – The mean of all possible sample proportions ( ____ ) is equal to the population proportion, ___.
______________ = ______
3. “S”pread – The standard deviation of all possible sample proportions ( _______ ) is
______________________________ IF the following condition is met!!!
Is the population at least _______________ as large as the sample???
This is referred to as the “independent condition” or the “10% condition”.
Ex: One way of calculating the effect of undercoverage, nonresponse, and other sources of error in a
sample survey is to compare the sample with known facts about the population. About 11% of Americans are
teens. The proportion p̂ of teens in a SRS of 1500 Americans should therefore be close to 11%. It is unlikely
to be exactly 11% because of sampling variability. If a national sample contains only 9.2% teens, should we
suspect that the sampling procedure is somehow underrepresenting this group? Find the probability that a
sample contains no more than 9.2% teens when the population actually consists of 11% teenagers.
1. Calculate the mean and standard deviation of the proportion p̂ of the sample that are teens.
2. Calculate the probability that a sample contains no more than 9.2% teens when the population actually
consists of 11% teenagers.
3. Interpret your results
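One way the three steps of the teen example above could be worked is sketched below in Python. It uses the standard formulas μ_p̂ = p and σ_p̂ = √(p(1 − p)/n) and assumes the usual conditions hold (the U.S. population is far more than 10 × 1500, and both np and n(1 − p) are at least 10). The variable names are just for illustration, and NormalDist(...).cdf plays the role of the calculator's normalcdf.

```python
from math import sqrt
from statistics import NormalDist

p = 0.11    # population proportion of teens (from the example)
n = 1500    # sample size (from the example)

# Step 1: mean and standard deviation of the sample proportion p-hat
mean_p_hat = p
sd_p_hat = sqrt(p * (1 - p) / n)

# Shape check (large counts), so a normal approximation is reasonable
large_counts_ok = n * p >= 10 and n * (1 - p) >= 10

# Step 2: P(p-hat <= 0.092), like normalcdf(-infinity, 0.092, mean, sd)
prob = NormalDist(mean_p_hat, sd_p_hat).cdf(0.092)

print(f"mean = {mean_p_hat}, sd = {sd_p_hat:.5f}, large counts ok: {large_counts_ok}")
print(f"P(p-hat <= 0.092) = {prob:.4f}")
```

For step 3, a small probability here would mean that a sample with only 9.2% teens is unusual if the true proportion really is 11%, which is a reason to suspect the sampling procedure.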
10
Quiz 7.2 Worksheet #1
11
12
Quiz 7.2 Worksheet #2
13
14
7.3 – Sampling Distributions of Sample Means [VIDEO #4]
Characteristics of the Sampling Distribution of Sample Means
1. “S”hape – The sampling distribution for sample means ( ____ ) will be ______________ _____________
for ANY sample size, n, IF the population distribution is also _________________ _______________.
What if the population is NOT approximately normal?!?!? We will discuss that soon!
2. “C”enter – The mean of all possible sample means ( ____ ) is equal to the population mean, ____.
______________ = ______
3. “S”pread – The standard deviation of all possible sample means ( _______ ) uses the formula
______________________________ IF the following condition is met!!!
Is the population at least _______________ as large as the sample???
This is referred to as the “independent condition” or the “10% condition”.
Ex: A bottling company uses a filling machine to fill plastic bottles with soda. The bottles are supposed to
contain 300 mL. In fact, the contents vary according to a normal distribution with mean, μ = 298 mL, and
standard deviation, σ = 3 mL.
a) What is the probability that the mean contents of six randomly selected bottles is less than 295 mL?
b) Would the probability that the mean contents of ten randomly selected bottles being less than 295 mL
be less than or greater than your answer to part a)?
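The bottling example above can be checked with a few lines of Python. This is a sketch, assuming the 10% condition holds (the company fills far more than 60 or 100 bottles) so that σ_𝑥̅ = σ/√n applies. The helper name prob_mean_below is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu = 298     # population mean in mL (from the example)
sigma = 3    # population standard deviation in mL (from the example)

def prob_mean_below(cutoff, n):
    """P(x-bar < cutoff) for samples of size n from a normal population."""
    sd_x_bar = sigma / sqrt(n)           # spread of the sampling distribution
    return NormalDist(mu, sd_x_bar).cdf(cutoff)

prob_n6 = prob_mean_below(295, 6)      # part a)
prob_n10 = prob_mean_below(295, 10)    # part b)

print(f"n = 6:  P(x-bar < 295) = {prob_n6:.4f}")
print(f"n = 10: P(x-bar < 295) = {prob_n10:.5f}")
```

Because the population itself is normal, the sampling distribution of 𝑥̅ is normal for any n, so no large-sample rule is needed here. The larger sample has a smaller σ_𝑥̅, which pushes 295 mL further out in the tail.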
15
7.3 – The Central Limit Theorem (CLT) [VIDEO #5]
From Video #4…
Situation #1: When the Population Distribution is (Approximately) Normally Distributed

…then we can assume the sampling distribution of the sample means is also approx. normal.
……………………………………………………………………………………………………
Now for Video #5…
Situation #2: When the Population is Not Normally Distributed or not known altogether
 …then the sampling distribution CAN BE approximately normal IF we have a large ______________
_____________ ... all thanks to the __________________ _______________ _________________ !!!

How large a sample size n is needed for sampling distribution of sample means to be close to
approximately normal depends on the shape of the _____________________ _____________________.

More observations are required if the shape of the population distribution is far from ________________,
but we can safely call the sampling distribution approx. normal when we reach a sample size of _____’ish.

WARNING!!! The CLT is used with sampling distributions for sample ______________ ONLY!!!!
Ex: The number of lightning strikes on a square kilometer of open ground in a year has a mean of 6 and
standard deviation of 2.4. The National Lightning Detection Network (NLDN) uses automatic sensors to
watch for lightning in a random sample of 10 one-square-kilometer plots of land.
a) What are the mean and standard deviation of the sampling distribution of the sample mean
number of strikes per square kilometer?
b) Explain why you cannot safely calculate the probability that the mean number of lightning
strikes per square kilometer ( 𝑥̅ ) is less than 5 based on a sample size of 10.
c) Suppose the NLDN takes a random sample of 50 square kilometers instead. Calculate the
probability that the mean number of lightning strikes per square kilometer ( 𝑥̅ ) is less
than 5.
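A possible way to work through the lightning example in Python is sketched below. The cutoff CLT_MIN_N = 30 is an assumption: it is a common rule of thumb, but the notes leave that blank for you to fill in from the video. The shape of the strike-count population is not given, so the normal calculation is only attempted for the larger sample.

```python
from math import sqrt
from statistics import NormalDist

mu = 6         # mean strikes per square km (from the example)
sigma = 2.4    # standard deviation (from the example)
CLT_MIN_N = 30 # ASSUMPTION: common rule of thumb, not stated in the notes

def sd_of_sample_mean(sigma, n):
    """Standard deviation of x-bar for samples of size n."""
    return sigma / sqrt(n)

# Part a): center and spread for n = 10 (no shape needed for these)
mean_x_bar = mu
sd_10 = sd_of_sample_mean(sigma, 10)

# Part b): population shape unknown and n = 10 is small, so no normal calculation
normal_ok_10 = 10 >= CLT_MIN_N

# Part c): n = 50 is large enough under the assumed rule of thumb
sd_50 = sd_of_sample_mean(sigma, 50)
normal_ok_50 = 50 >= CLT_MIN_N
prob_50 = NormalDist(mean_x_bar, sd_50).cdf(5)

print(f"n = 10: mean = {mean_x_bar}, sd = {sd_10:.4f}, normal ok: {normal_ok_10}")
print(f"n = 50: sd = {sd_50:.4f}, normal ok: {normal_ok_50}, P(x-bar < 5) = {prob_50:.4f}")
```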
16
The Central Limit Theorem (CLT) Summarized!
If the distribution of the Population is Normal… (Example: Heights of Women)
Population Distribution (n = 1)
Shape: Normal
Center: μ = 64.5 in.
Spread: σ = 2.5 in.
Picture: [axis 57, 59.5, 62, 64.5, 67, 69.5, 72: Height of Women (Selected one at a time)]
Sampling Distribution (n ≥ 1)
Shape: Also Normal!
Center: Also 64.5! μ_𝑥̅ = 64.5 in.
Spread: Much Less! σ_𝑥̅ = 2.5/√n. If n = 100, σ_𝑥̅ = 0.25
Picture: [axis 57, 59.5, 62, 64.5, 67, 69.5, 72: Average Height of 100 Women]
Conclusions
Shape: Stays Normal!!!
Center: Same!!!
Spread: Smaller!!!
OR…
If the distribution of the Population is NOT Normal… (Ex: Rolling a Single Die)
Population Distribution (n = 1)
Shape: Uniform
Center: μ = 3.5
Spread: σ = 1.71
Picture: [axis 1, 2, 3, 4, 5, 6: Outcome of a Single Die Roll]
Sampling Distribution (n ≥ 1)
Shape: Becomes Normal!
Center: Also 3.5! μ_𝑥̅ = 3.5
Spread: Much Less! σ_𝑥̅ = 1.71/√n
Picture: [Average of 10 Die Rolls]
Conclusions
Shape: Becoming Normal!!!
Center: Same!!!
Spread: Smaller!!!
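The numbers in the summary above can be reproduced with a short Python sketch. The formula σ_𝑥̅ = σ/√n comes from the notes; the simulation of 10,000 sets of 10 die rolls and the seed are made up for illustration.

```python
import random
import statistics
from math import sqrt

# Heights of women: sigma = 2.5 in., n = 100 (from the summary)
se_heights = 2.5 / sqrt(100)

# Single die: compute mu and sigma directly from the six faces
die_faces = [1, 2, 3, 4, 5, 6]
die_mu = statistics.fmean(die_faces)
die_sigma = statistics.pstdev(die_faces)
se_die = die_sigma / sqrt(10)          # spread of the average of 10 rolls

# ASSUMPTION: simulate 10,000 averages of 10 rolls to compare with the formulas
rng = random.Random(7)
averages = [statistics.fmean(rng.choices(die_faces, k=10)) for _ in range(10_000)]
sim_center = statistics.fmean(averages)
sim_spread = statistics.stdev(averages)

print(f"heights: sigma_xbar = {se_heights}")
print(f"die: mu = {die_mu}, sigma = {die_sigma:.2f}, sigma_xbar (n = 10) = {se_die:.3f}")
print(f"simulated: center = {sim_center:.2f}, spread = {sim_spread:.3f}")
```

The simulated center and spread should land close to 3.5 and σ/√10, matching the "Center: Same" and "Spread: Smaller" conclusions.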
17
CLT & SOCS
The central limit theorem tells us that the sampling distribution of the sample mean will look more and more
like a normal distribution as the sample size is increased, even when the population itself is not normally
distributed! Additionally, for any sample size larger than 1, the sampling distribution has less wildness or
variability, as measured by standard deviation, than the population it’s drawn from, because σ_𝑥̅ = σ/√n.
Because of this, a mean or x-bar based on a reasonably large randomly chosen sample is very likely to be
close to the true mean of the population. If we need more certainty we need only increase the sample size.
As the Sample Size “n” Increases:
Shape: becomes more and more approx. normal’ish
Center: stays the same as the population!
Spread: becomes less and less variable or spread out!
18
Quiz 7.3 Worksheet #1
19
20
Quiz 7.3 Worksheet #2
21
22
Chapter 7: Sampling Distribution Practice Problems
1. Suppose that 35% of all business executives are willing to switch companies if offered a
higher salary. If a headhunter randomly contacts an SRS of 100 executives, what is the
probability that over 40% will be willing to switch companies if offered a higher salary?
2. The average outstanding bill for delinquent customer accounts for a national department
store chain is $187.50 with a standard deviation of $54.50. If a delinquent account were
randomly chosen, what is the probability that it has an outstanding bill of over $200?
23
3. The average outstanding bill for delinquent customer accounts for a national department
store chain is $187.50 with a standard deviation of $54.50. In an SRS of 50 delinquent
accounts, what is the probability that the mean outstanding bill is over $200?
4. The average number of daily emergency room admissions at a hospital is 85 with a
standard deviation of 37. In an SRS of 30 days, what is the probability that the mean
number of daily emergency admissions is between 75 and 95?
5. Given that 58% of all gold dealers believe next year will be a good one to speculate in
South African gold coins, in an SRS of 150 dealers, what is the probability that between
55% and 60% believe that it will be a good year to speculate?
24
Practice Worksheet 7.1
.
25
Practice Worksheet 7.2
I flip a fair coin ten times and record the proportion of heads I obtain. I then repeat this process of
flipping the coin ten times and recording the proportion of heads obtained many, many times. When
done, I make a histogram of my results.
1. About where will the center of my histogram be? Use appropriate notation to describe this fact.
2. What is the standard deviation of the sampling distribution of the proportion p̂ of heads obtained?
3. Describe the shape of the sampling distribution of p̂. Justify your answer.
The Harvard College Alcohol Study finds that 67% of college students support efforts to “crack down
on underage drinking.” The study took a sample of almost 15,000 students, so the population
proportion who support a crackdown is very close to p = 0.67. The administration of a large college
surveys an SRS of 100 students and finds that 62 support a crackdown on underage drinking.
4. What is the sample proportion who supports a crackdown on underage drinking?
5. If in fact the proportion of all students on your campus who support a crackdown is the same as
the national 67%, what is the probability that the proportion in an SRS of 100 students is as small or
smaller than the result of the administration’s sample? Be sure to check that any necessary rules of thumb
are met.
26
Practice Worksheet 7.3
The weights of newborn children in the United States vary according to the normal distribution with
mean 7.5 pounds and standard deviation 1.25 pounds. The government classifies a newborn as having
low birth weight if the weight is less than 5.5 pounds.
1. What is the probability that a baby chosen at random weighs less than 5.5 pounds at birth?
You choose three babies at random and compute their mean weight, 𝑥̅.
2. What are the mean and standard deviation of the mean weight 𝑥̅ of the three babies?
3. What is the probability that their average birth weight is less than 5.5 pounds?
4. Would your answers to 1, 2, or 3 be affected if the distribution of birth weights in the population were
distinctly nonnormal?
27
Practice Worksheet 7.1 Answers:
1.
2. (a) No, it’s just an approximation of a sampling distribution generated by simulating 200 sample
means. The actual sampling distribution includes the means from ALL POSSIBLE SAMPLES
of size 12 from the population – many more than the 200 values here.
(b) Only 8 out of 200 (or 4%) of the sample means in our simulation are as far or farther above
150 pounds as our sample was. If the population mean is really 150 pounds, then our sample
is unusual, and we should be suspicious about the manufacturer’s claim.
Practice Worksheet 7.2 Answers:
1. The center of the sampling distribution will be μ_p̂ = 0.5
2. The standard deviation of the sampling distribution will be σ_p̂ = √(p(1 − p)/n) = √(0.5(0.5)/10) = 0.1581
3. The shape will be symmetric because p = 0.5, but because n is small it may not be normally distributed.
4. p̂ = 0.62
5. The population of all students at that college is most likely greater than 10 times the sample size (10 x 100
= 1000), so we can calculate the standard deviation.
np ≥ 10 because (100)(0.67) = 67
n(1 − p) ≥ 10 because (100)(0.33) = 33, so we can use the normal approximation
P(p̂ ≤ 0.62) = Normalcdf(-∞, 0.62, 0.67, 0.047) = 0.1438
Note: σ_p̂ = √((0.67)(0.33)/100) = 0.047
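The Worksheet 7.2 answers above can be double-checked in Python. In this sketch, NormalDist(...).cdf stands in for the calculator's Normalcdf with a lower bound of -∞; the variable names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Answer 2: ten coin flips, p = 0.5
sd_coin = sqrt(0.5 * 0.5 / 10)

# Answer 5: p = 0.67, n = 100, observed p-hat = 62/100
p, n = 0.67, 100
p_hat = 62 / 100
sd_p_hat = sqrt(p * (1 - p) / n)
large_counts_ok = n * p >= 10 and n * (1 - p) >= 10
prob = NormalDist(p, sd_p_hat).cdf(p_hat)

print(f"coin: sd = {sd_coin:.4f}")
print(f"crackdown: sd = {sd_p_hat:.3f}, P(p-hat <= 0.62) = {prob:.4f}")
```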
Practice Worksheet 7.3 Answers:
1. P(x < 5.5) = Normalcdf(-∞, 5.5, 7.5, 1.25) = 0.0548
2. μ_𝑥̅ = 7.5 and σ_𝑥̅ = σ/√n = 1.25/√3 = 0.7217
3. P(𝑥̅ < 5.5) = Normalcdf(-∞, 5.5, 7.5, 0.7217) = 0.0028
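4. The answers to numbers 1 and 3 would be affected if the population were distinctly non-normal. Number 1
is about a single baby, so it depends directly on the population being normal. Number 3 uses a sample of
only three babies, and the CLT only assures normality for large sample sizes. Since we are doing “normal”
calculations, we need a larger sample size to assure the sampling distribution becomes normal. Number 2
would not change, because the formulas for the mean and standard deviation of 𝑥̅ do not depend on shape.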
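Answers 1 to 3 for Worksheet 7.3 can be verified with the same kind of Python sketch, assuming (as the problem states) that birth weights are normally distributed. The variable names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 7.5, 1.25   # birth weight mean and sd in pounds (from the worksheet)

# 1. One baby chosen at random
p_one_baby = NormalDist(mu, sigma).cdf(5.5)

# 2. Mean weight of three babies
mean_x_bar = mu
sd_x_bar = sigma / sqrt(3)

# 3. Probability that the mean weight of three babies is under 5.5 pounds
p_three_babies = NormalDist(mean_x_bar, sd_x_bar).cdf(5.5)

print(f"1. P(x < 5.5) = {p_one_baby:.4f}")
print(f"2. mean = {mean_x_bar}, sd = {sd_x_bar:.4f}")
print(f"3. P(x-bar < 5.5) = {p_three_babies:.4f}")
```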
28
Preparing for Your Chapter 7 Test

How to identify a parameter and a statistic from the context of the situation.

How to find the mean of a sampling distribution (as long as you have an SRS, the mean of the sampling
distribution should equal that of the population) - Know the proper notation.

How to calculate the standard deviation of a sample mean and sample proportion (know the formulas
and the proper notation)

The exact definition of the important terms in the chapter such as: Sampling Distribution of a Statistic,
Unbiased Estimator, Variability of a Statistic, etc.

That the size of the sample is what impacts the spread (sampling variability) of the distribution. The
population size does NOT affect spread (as long as the population is at least 10x the sample size).

How to use and apply the Rule of Thumb #1 to sampling distributions

How to use and apply the Rule of Thumb #2 to sampling distributions

How to calculate probabilities based on the normal approximations using either Table A or the
calculator commands (normalcdf) – Look over HW problems. Use proper notation.

How to describe a sampling distribution. Address the following: shape, center, and spread. For
example, “The distribution is normal with a mean of ____ and a standard deviation of ____”.

The law of large numbers assures us that as the number of observations drawn increases, the mean
( 𝑥̅ ) of the observed values eventually approaches the mean μ of the population as closely as you
specify and stays that close.

The significance and use for the central limit theorem.
o If the population is normally distributed, then the sampling distribution of the sample mean will also
be normal regardless of the sample size.
o If the population is NOT normally distributed, then the sampling distribution of the sample mean becomes
more and more normal as the sample size increases. The larger the sample size, the closer to normal we
can assume the sampling distribution (not the data themselves) to be.

How to identify high/low bias and variability.
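The law of large numbers item in the list above can be seen in a small simulation. This is only a sketch with a made-up setup (100,000 rolls of a fair die with a fixed seed); it tracks the running mean 𝑥̅ at a few checkpoints and compares it with μ = 3.5.

```python
import random
import statistics

# ASSUMPTION: hypothetical experiment, 100,000 rolls of a fair six-sided die
rng = random.Random(2024)
rolls = [rng.randint(1, 6) for _ in range(100_000)]

# Mean of the first k rolls, for increasing k
checkpoints = (10, 100, 1_000, 10_000, 100_000)
running_means = {k: statistics.fmean(rolls[:k]) for k in checkpoints}

for k, x_bar in running_means.items():
    print(f"after {k:>7} rolls: x-bar = {x_bar:.3f} (mu = 3.5)")
```

The early running means can wander, but the later ones should settle in close to 3.5.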
29