Chapter 9
Sampling distributions
Parameter vs Statistic
• In each of the following, determine whether each boldface number is a parameter or a statistic.
During the first quarter, Mr. Pines was planning on giving out on average about 20 baseball cards per student. After surveying 25 students at random, he found that about 32 cards per student were actually given out.
A parameter is a number that describes the population; it is the value we expect from prior studies or given information.
A statistic is a number computed from a sample.
Parameter vs Statistic
• In each of the following, determine whether each boldface number is a parameter or a statistic.
On a certain airline flight last week, 7% of the 195 passengers were selected for random security screening. Typically this same flight randomly selects 11% of its passengers.
A parameter is a number that describes the population; it is the value we expect from prior studies or given information.
A statistic is a number computed from a sample.
Ch2 vs Ch9
• The Chapter 2 test scores had a mean of 73 and a standard deviation of 9.
Ch2 Question: What is the probability that a randomly chosen test score was greater than 88?
Ch9 Question: What is the probability that 5 randomly chosen test scores have a mean greater than 88?
Do you see the difference?
Symbols
The German Tank Problem
The German Tank Problem
As a group, you will need to come up with a “statistic” for estimating the number of tanks.
You will be given 5 sample tank numbers.
Using only these numbers, you will need to estimate the total number of tanks.
The German Tank Problem
Your main goal is to create a “formula.”
Your formula should give you a number greater than or equal to the maximum tank number in the sample.
5 students will choose a tank number from the bag.
The group closest to the actual number of tanks will earn 5 cards each, 2nd place will earn 3 cards each, and 3rd place will earn 1 card each.
Suppose 5 different tanks were captured each time. How well does your “statistic” do for other samples? Use your statistic to calculate an estimate for each sample below. Plot the estimates NEATLY on a dotplot. Same rewards as before for the closest.
80, 120, 50, 49, 20, 51, 45, 94, 34, 71, 29, 10, 62, 20, 18, 69, 93, 52, 29, 52, 20, 101, 52, 76, 55, 101, 117, 47, 65, 62, 31, 126, 55, 23, 64, 41, 124, 30, 123, 14
The German Tank Problem
There were 127 Tanks
The German Tank Problem
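Estimated number of tanks = (Max - 1)(n + 1) / n, where Max is the largest tank number in the sample and n is the sample size.
This is the formula the British mathematicians used.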
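One way to see how this formula behaves is to simulate it. The sketch below is only an illustration: it assumes tanks are numbered 1 through 127 and that each sample of 9 is drawn without replacement. The function names `tank_estimate` and `simulate`, and the seed, are made up for this example.

```python
import random

def tank_estimate(sample):
    """Estimate the total number of tanks with (Max - 1)(n + 1) / n."""
    n = len(sample)
    return (max(sample) - 1) * (n + 1) / n

def simulate(total_tanks=127, sample_size=9, num_samples=30, seed=1):
    """Draw repeated samples of tank numbers without replacement
    and return one estimate per sample."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs repeat
    serials = range(1, total_tanks + 1)
    return [tank_estimate(rng.sample(serials, sample_size))
            for _ in range(num_samples)]

estimates = simulate()
print("lowest estimate:", min(estimates))
print("highest estimate:", max(estimates))
print("mean of the estimates:", sum(estimates) / len(estimates))
```

Individual estimates can land well above or below 127, but raising `num_samples` should pull the mean of the estimates toward the true count.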
The German Tank Problem
[Figure: estimates from (Max - 1)(n + 1) / n compared with the true value of 127, shown after 10, 20, and 30 samples of 9 tanks.]
Vocabulary
• μ: the mean of a population; it is rarely known.
• x̄: the mean of a sample; the goal of x̄ is to be an estimator of μ.
• σ: the population’s standard deviation.
• s: the standard deviation of a sample; it is used to estimate σ.
Means
• The mean of the sample means should be close to μ.
• The standard deviation of the distribution of sample means is σ/√n.
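Central Limit Theorem (CLT)
• The CLT states that as the sample size gets larger, the sampling distribution of the sample mean gets closer to a Normal distribution, even if the population is not Normal.
• In this class, if the sample size is 30 or more, we can assume the sampling distribution of x̄ is approximately Normal by the CLT.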
Let’s practice
• There are 200 boys in the freshman class, with a mean height of 64 inches and a standard deviation of 3 inches. Assume that the heights are Normally distributed. What is the probability that a randomly chosen boy will be 67 inches tall or more?
• Same scenario as above. What is the probability that 10 randomly chosen boys will average 67 inches tall or more?
• What are the mean and standard deviation of the sampling distribution of x̄ for samples of 20 boys? Of 50 boys?
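These practice questions can be checked with a short script. This is a sketch, not an official solution: it uses Python's `statistics.NormalDist` in place of the calculator, the helper name `prob_mean_at_least` is made up here, and it ignores the fact that a sample of 50 is more than 10% of the 200 boys.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 64, 3  # population mean and standard deviation, in inches

def prob_mean_at_least(cutoff, n):
    """P(mean height of n randomly chosen boys >= cutoff), with sd = sigma / sqrt(n)."""
    sampling_dist = NormalDist(MU, SIGMA / sqrt(n))
    return 1 - sampling_dist.cdf(cutoff)

p_one = prob_mean_at_least(67, 1)   # a single boy: z = (67 - 64) / 3 = 1
p_ten = prob_mean_at_least(67, 10)  # the mean of 10 boys

print(f"one boy 67 inches or more: {p_one:.4f}")
print(f"10 boys averaging 67 inches or more: {p_ten:.5f}")
for n in (20, 50):
    print(f"n = {n}: mean = {MU}, sd = {SIGMA / sqrt(n):.4f}")
```

The single-boy probability is about 16%, while the same cutoff for the mean of 10 boys is far less likely because the standard deviation shrinks from 3 to 3/√10.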
Practice!!
• Which sample from the previous example would be most affected if the data are non-Normal? Why?
a) A sample of 10 freshmen
b) A sample of 20 freshmen
c) A sample of 50 freshmen
9.2 Sample proportions
• Let’s sample for a proportion: roll 2 dice and tell me the proportion of times you get a 7.
• Chart the distribution of p̂.
• Is the mean of this distribution close to P? What is P in this situation?
• Some facts about the distribution of p̂:
  • Its distribution is close to Normal in shape if the sample size is large enough.
  • Its mean is close to P.
  • Its standard deviation gets smaller as the sample size gets larger.
9.2 Sample proportions
• The mean of the distribution of p̂ is P (the often unknown parameter).
• The standard deviation is √(P(1 - P)/n).
• There are two rules that you need to know involving proportions. They are sometimes called rule of thumb 1 and rule of thumb 2.
1. N ≥ 10n: the overall population needs to be at least 10 times the sample size. This ensures that sampling without replacement does not noticeably change the probabilities, so that the standard deviation formula will work.
2. np ≥ 10 and n(1 - p) ≥ 10: this ensures that the sample size is big enough for the distribution to be approximately Normal.
Means vs Proportions
Practice 9.2
• Suppose a large candy machine has 15% orange candies. Imagine taking an SRS of 25 candies from the machine and observing the sample proportion p-hat of orange candies.
• (a) What are the mean and standard deviation of the sampling distribution of p-hat?
• (b) Check the 10% condition and check to see if the Normal condition is met.
• (c) If the sample size were 75 instead of 25, how would this change the sampling distribution of p-hat?
Practice 9.2 continued
(d) Taking an SRS of 75 candies, what is the probability of obtaining a sample with more than 21% orange candies?
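The candy machine questions can be worked through in a few lines. This is a sketch under two assumptions: the “large” machine holds at least 10 times as many candies as the sample (so the 10% condition is taken as met), and part (d) uses the Normal approximation. The helper name `p_hat_summary` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def p_hat_summary(p, n):
    """Mean, standard deviation, and Normal-condition check for p-hat."""
    sd = sqrt(p * (1 - p) / n)
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    return p, sd, normal_ok

# Parts (a) through (c): compare n = 25 with n = 75.
for n in (25, 75):
    mean, sd, normal_ok = p_hat_summary(0.15, n)
    print(f"n = {n}: mean = {mean}, sd = {sd:.4f}, Normal condition met: {normal_ok}")

# Part (d): P(p-hat > 0.21) for n = 75.
mean, sd, _ = p_hat_summary(0.15, 75)
prob_d = 1 - NormalDist(mean, sd).cdf(0.21)
print(f"P(p-hat > 0.21) = {prob_d:.4f}")
```

With n = 25, np is only 3.75, so the Normal condition fails; with n = 75, both np and n(1 - p) are at least 10 and the standard deviation is smaller.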
Practice
• A polling organization asks an SRS of 1500 first-year college students whether they applied for admission to any other college. In fact, 35% of all first-year students applied to colleges besides the one they are attending. What is the probability that the random sample of 1500 students will give a result within 2 percentage points of this true value?
Different Wording in this problem.
This is what a typical “AP Test” FRQ may look like.
The answer means that if the professor took many samples on different days, he would get a sample with 39.3% binge drinkers or lower about 7% of the time. (In statistics, 7% is not considered too low a percentage.)
What sample percent of binge drinking would have the same chance of
happening on the other end of the normal curve?
Similar idea as previous slide
• Rene has a mean score of 71 on his tests this year and a StDev of 6. (We will assume his scores follow a Normal distribution.)
• Rene scores an 80% on his next test. Should the teacher be surprised? Should the teacher accuse him of cheating? Explain.
To speed things up, let’s use a calculator.
normalcdf(80,1000,71,6) = .0668
This means that we could expect Rene to score an 80 or higher on a test about 7% of the time.
The teacher should not accuse him of cheating or be surprised.
p = .0668 is about 7%, which is about how often you can expect Rene to score 80 or higher if his mean is in fact 71 with a StDev of 6.
MECHANICS
Just Kidding, I drew this one
MECHANICS
CLEAN IT UP!!!!
Poker Chip Activity
There were 200 total poker chips in the bag.
83 students each drew 20 poker chips randomly and found the % of RED chips in their sample.
Can you make an estimate of P, the true % of RED poker chips in the bag?
Poker Chip Activity
There were exactly 100 RED poker chips, so P = .50.
Some samples were far from P.
But if you take enough samples, you will start to see the distribution center itself on P.
Poker Chip Activity
Some samples were far from P.
This is why in this chapter we use p-hat as an unbiased estimator for P.
This is called a sampling distribution. It will have a symmetric shape, a mean of P, and a StDev based on the sample size n.
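The activity can be imitated on a computer. The sketch below is a simulation, not the class data: it assumes 100 red chips in a bag of 200 and that each of the 83 students draws 20 chips without replacement. The function name `draw_p_hats` and the seed are made up for this example.

```python
import random

def draw_p_hats(num_students=83, draw_size=20, red=100, total=200, seed=2):
    """Each student draws chips without replacement; return each student's p-hat."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs repeat
    bag = ["red"] * red + ["other"] * (total - red)
    return [rng.sample(bag, draw_size).count("red") / draw_size
            for _ in range(num_students)]

p_hats = draw_p_hats()
mean_p_hat = sum(p_hats) / len(p_hats)
print(f"lowest p-hat: {min(p_hats):.2f}, highest p-hat: {max(p_hats):.2f}")
print(f"mean of the p-hats: {mean_p_hat:.3f}")
```

Single samples can miss P = .50 by a wide margin, yet the mean of the 83 p-hats should sit close to it.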
#1 MC on HW Explained
#1 MC on HW Explained
Samples of size 2 taken from the population of previous slide
Sampling with Quarters
There are 405 quarters, which have a mean year of 1994 and a standard deviation of 11.53 years.
The distribution has a strong skew to the left.
Sampling Distribution
• Using your calculator, we will simulate taking samples of 3 quarters.
• In order to get different results, we need to seed our calculators: your# STO> rand
• Press the “MATH” button, go to “PRB”, then go to 5:randInt(
• Type randInt(1,405,3)
Sampling Distribution
• Find the 3 quarters that the calculator gave to you.
• Find the mean of these 3 quarters.
• Round to the nearest year.
• We will construct a dotplot.
Population of Quarters
Mean = 1994
StDev = 11.53
Dot Plot of n = 3
Mean = 1993.6
StDev = 6.43343
Sampling Distribution of 67 samples of size 3
Sampling Distribution
• A bigger sample should give sample means that are closer to the population mean. That means less variability, which should reduce the StDev.
• Let’s sample 9 quarters.
• Find the mean of your 9 quarters.
• Round to the nearest year.
• We will construct a dotplot.
Dot Plot of n = 9
Mean = 1994
StDev = 3.27263
Sampling Distribution of 67 samples of size 9
POPULATION: Mean = 1994, StDev = 11.53
Sampling Distribution of 67 samples of size 3: Mean = 1993.6, StDev = 6.43343
Sampling Distribution of 67 samples of size 9: Mean = 1994, StDev = 3.27263
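The class results can be compared with what σ/√n predicts. This sketch uses only the numbers reported on the slides; the variable names are made up for this example.

```python
from math import sqrt

POP_SD = 11.53  # population standard deviation from the slide

# StDev of the 67 sample means observed in class, keyed by sample size.
observed = {3: 6.43343, 9: 3.27263}

# Predicted StDev of the sample means: sigma / sqrt(n).
predicted = {n: POP_SD / sqrt(n) for n in observed}

for n in observed:
    print(f"n = {n}: predicted {predicted[n]:.2f}, observed {observed[n]:.2f}")
```

The observed values come from only 67 samples, so they will not match the prediction exactly, but both shrink as n grows.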
Sampling Distribution
μ = .975
σ/√n = 1.15081/√2 = .81375
The sampling distribution for size 2 would be approximately Normal. The shape
would be symmetric and centered around .975. The Standard Deviation would
decrease to about .81375
Vocabulary
• Parameter: the “parameter of interest” is a number that describes the population at hand. Example populations:
  • Men over 40
  • Rancho boys in the freshman class
  • People who shop at Wal-Mart
• Statistic: a number computed from a sample of the population that is used to estimate the parameter. Example samples:
  • A sample of 20 boys from the freshman class, randomly selected
  • A sample of people as they leave the checkout at Wal-Mart
Vocabulary continued
• Bias: how far the center of your statistic’s sampling distribution is from the parameter. (Could have high or low bias.)
• Variability: if you took multiple samples, how dispersed the resulting statistics would be. See the next slides for how it may look.
Vocabulary continued
• P: a parameter; it is the true population proportion.
• p̂: a statistic; it is the proportion observed in a sample.
Vocabulary. Know these terms!
• Parameter
• Statistic
• Sampling Distribution
• Unbiased Estimator vs Biased Estimator
• Variability & Bias
Activity for 9.1
• Flip a coin 10 times and report your proportion of heads. Chart the distribution of p-hat in this situation. Is p-hat close to the known P of .5?
• It is assumed that 20% of all Americans will get cancer. Decide how you will represent 20% on a random digit table, then simulate 20 Americans. Report what proportion of your 20 Americans will get cancer.
The point for 9.1!!
• Here’s the point: if you sample well (randomly), the mean of your sampling distribution will match the parameter of the population (either P or μ).
• Take a small population of numbers like 2, 4, 6, 8 and list all possible combinations of size 2. Chart the work. See if the resulting sampling distribution has a mean of 5, which is the mean of those 4 numbers. TRY IT!!
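Here is one way to check the 2, 4, 6, 8 exercise by computer. It is a sketch that treats “combinations” as unordered pairs without repeats, which is an assumption about what the slide intends.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8]

# Every possible sample of size 2 (order does not matter) and its mean.
samples = list(combinations(population, 2))
sample_means = [mean(pair) for pair in samples]

for pair, pair_mean in zip(samples, sample_means):
    print(pair, pair_mean)

print("mean of the sample means:", mean(sample_means))
print("population mean:", mean(population))
```

The six sample means are 3, 4, 5, 5, 6, and 7, and their mean equals the population mean of 5.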
Relating sample proportions with sample means and binomials
• What is the probability of rolling two dice 500 times and getting more than 100 sevens?
  • Do this with a binomial calculation. Remember to cover the requirements (assumptions).
  • Do this with a proportion calculation. Remember to cover the assumptions or requirements.
  • Do this with a means calculation. Remember to cover the requirements.
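The three approaches can be set side by side in code. This is a sketch, not the expected written solution: it assumes fair dice and independent rolls, and for the means version it scores each roll as 1 for a seven and 0 otherwise, which is one possible reading of the slide. The variable names are made up for this example.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 500
p = 6 / 36  # 6 of the 36 equally likely outcomes for two dice sum to 7

# Requirements for the Normal approximations: np >= 10 and n(1 - p) >= 10.
normal_ok = n * p >= 10 and n * (1 - p) >= 10

# Binomial: P(X > 100) = P(X >= 101), summed exactly.
by_binomial = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                  for k in range(101, n + 1))

# Proportion: P(p-hat > 100/500) with sd = sqrt(p(1 - p)/n).
by_proportion = 1 - NormalDist(p, sqrt(p * (1 - p) / n)).cdf(100 / n)

# Means: each roll is 1 (seven) or 0 (not), so mu = p and sigma = sqrt(p(1 - p));
# the sample mean of 500 rolls has sd = sigma / sqrt(n).
sigma = sqrt(p * (1 - p))
by_means = 1 - NormalDist(p, sigma / sqrt(n)).cdf(100 / n)

print(f"Normal condition met: {normal_ok}")
print(f"binomial: {by_binomial:.4f}")
print(f"proportion: {by_proportion:.4f}")
print(f"means: {by_means:.4f}")
```

The proportion and means versions give the same answer because both use a z-score of 2; the exact binomial sum differs slightly since it involves no Normal approximation.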
Sketch the Normal Curve
Means Problem
StDev = .21457
Z > -1.23
Calculate p
MAKE IT NEAT!!!!
CH 9 Vocabulary
The sampling distribution of p-hat has a mean equal to the population proportion p.
The sampling distribution of p-hat is considered close to Normal provided that np ≥ 10 and n(1 - p) ≥ 10.
Sample means and sample proportions are called unbiased estimators for their corresponding population parameters.
Smaller samples have more variability than larger samples.
CH 9 Vocabulary
Sampling error is the natural variation between samples. It is always present; it can be reduced but not eliminated.
Sampling error is smaller when the sample size is larger.
Provided that the population is much larger than the sample (at least 10 times the sample size), the spread of a sampling distribution is about the same no matter the population size.
When n is large, the sampling distribution of x-bar is approximately Normal even if the population is not Normal.