Chapter 18: Sampling Distribution Models
Key Vocabulary:
• parameter
• statistic
• proportion
• sampling distribution model
• Central Limit Theorem
• standard error
Suppose I randomly select 100 seniors in Howard County and record each one’s GPA.
1.95  1.98  1.86  2.04  2.75  2.72  2.06  3.36  2.09  2.06
2.33  2.56  2.17  1.67  2.75  3.95  2.23  4.53  1.31  3.79
1.29  3.00  1.89  2.36  2.76  3.29  1.51  1.09  2.75  2.68
2.28  3.13  2.62  2.85  2.41  3.16  3.39  3.18  4.05  3.26
1.95  3.23  2.53  3.70  2.90  2.79  3.08  2.79  3.26  2.29
2.59  1.36  2.38  2.03  3.31  2.05  1.58  3.12  3.33  2.04
2.81  3.94  0.82  3.14  2.63  1.51  2.24  2.22  1.85  1.96
2.05  2.62  3.27  1.94  2.01  1.68  2.01  3.15  3.44  4.00
2.33  3.01  3.15  2.25  3.34  2.22  3.29  3.90  2.96  2.61
3.01  2.86  1.70  1.55  1.63  2.37  2.84  1.67  2.92  3.29
These 100 seniors make up one possible _______________. All seniors in Howard County make up the
_________________________.
The sample mean (_____) is 2.5470 and the sample standard deviation (_____) is 0.7150. The
population mean (_____) and the population standard deviation (_____) are unknown.
We can use _____ to estimate _____ and we can use _____ to estimate _____. These estimates may or
may not be reliable.
A number that describes the population is called a _______________. Hence, μ and σ are both
_______________. A parameter is usually represented by _____.
A number that is computed from a sample is called a _______________. Therefore, x̄ and s are both
_______________. A statistic is usually represented by _____.
If I had chosen a different 100 seniors, then I would have a different sample, but it would still represent
the same population. A different sample almost always produces different statistics.
Practice:
State bias and variability
(a)__________________ (b)_________________ (c)_________________
(d)_________________
Example: Let _____ represent the proportion of seniors in a sample of 100 seniors whose GPA is 2.0 or
higher.
p̂1 = 0.78   p̂2 = 0.72   p̂3 = 0.81   p̂4 = 0.70   p̂5 = 0.68
p̂6 = 0.75   p̂7 = 0.79   p̂8 = 0.72   p̂9 = 0.83   p̂10 = 0.76
If I compare many different samples and the statistic is very similar in each one, then the
______________________________ is low. If I compare many different samples and the statistic is very
different in each one, then the ______________________________ is high.
The ______________________________ of a statistic is a model of the values of the statistic from all
possible samples of the same size from the same population.
Example: Suppose the sampling model consists of the samples p̂1, p̂2, ..., p̂9, p̂10. (Note: There are
actually many more than ten possible samples.) This sampling model has mean 0.754 and standard
deviation 0.049.
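As a quick check of these two summary figures, the ten sample proportions listed earlier can be averaged directly. This is a minimal sketch: the variable names are illustrative, and it assumes the standard deviation is computed with the usual sample formula (dividing by n − 1).

```python
from statistics import mean, stdev

# The ten sample proportions from the example (p-hat 1 through p-hat 10).
p_hats = [0.78, 0.72, 0.81, 0.70, 0.68, 0.75, 0.79, 0.72, 0.83, 0.76]

center = mean(p_hats)    # mean of the ten proportions
spread = stdev(p_hats)   # sample standard deviation (n - 1 divisor, an assumption)

print(f"mean = {center:.3f}, standard deviation = {spread:.3f}")
```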
[Figure: sampling distribution]
The statistic used to estimate a parameter is _______________ if the mean of its _______________
_______________ is equal to the true value of the parameter being estimated.
Example:
Since the mean of the sampling model is 0.754, then _____ is an unbiased estimator of _____ if the true
value of _____ (the proportion of all seniors in Howard County with a GPA of 2.0 or higher) equals 0.754.
A statistic can be _______________ and still have high _______________. To avoid this, increase the
size of the sample. Larger samples give smaller spread.
Sample Proportions:
The parameter _____ is the population proportion. In practice, this value is always unknown. (If we
know the population proportion, then there is no need for a sample.)
The statistic _____ is the sample proportion. We use _____ to estimate the value of _____. The value
of the statistic _____ changes as the sample changes.
How can we describe the sampling model for _____?
1. shape? _____________________________________
2. center? _____________________________________
3. spread? _____________________________________
If our sample is an SRS of size n, then the following statements describe the sampling model for _____:
ASSUMPTION: Independence
CONDITION: Randomization: The sample should be an SRS of the population.
1. The shape is ______________________________.
ASSUMPTION: Sample size is sufficiently large.
CONDITION: Success/Failure: np ≥ 10 and nq ≥ 10
2. The _______________ is p.
ASSUMPTION: Sample size is sufficiently large.
CONDITION: 10%: The population is at least 10 times as large as the sample.
3. The ______________________________ is √(pq/n).
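The conditions and formulas above can be collected into a small helper. This is only a sketch: the function name `proportion_model` and the example values (p = 0.75, n = 100, a population of 5,000) are hypothetical and chosen purely for illustration.

```python
from math import sqrt

def proportion_model(p, n, population_size):
    """Check the conditions and return the mean and SD of the sampling model for p-hat."""
    q = 1 - p
    return {
        "success_failure_ok": n * p >= 10 and n * q >= 10,   # np >= 10 and nq >= 10
        "ten_percent_ok": population_size >= 10 * n,         # population at least 10x the sample
        "mean": p,                                           # center of the sampling model
        "sd": sqrt(p * q / n),                               # spread of the sampling model
    }

# Hypothetical values, not taken from the notes.
model = proportion_model(p=0.75, n=100, population_size=5000)
print(model)
```

If either condition comes back False, the normal model with this mean and standard deviation should not be relied on.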
If we have _____________________ data, then we must use ______________________________ to
construct a sampling model.
Example:
Suppose we want to know how many seniors in Maryland plan to attend college. We want to know how
many seniors would answer, “YES” to the question, “Do you plan to attend college?” These responses
are _____________________.
So _____ (our parameter) is the proportion of all seniors in Maryland who plan to attend college. Let
_____ (our statistic) be the proportion of Maryland students in an SRS of size 100 who plan to attend
college. To calculate the value of _____, we divide the number of “Yes” responses in our sample by the
total number of students in the sample.
If I graph the values of _____ for all possible samples of size 100, then I have constructed a
___________________________________. What will the sampling model look like? It will be
_____________________ normal. In fact, the larger my sample size, the closer it will be to a normal
model. It can never be perfectly normal, because our data is discrete, and normal distributions are
continuous.
So how large is large enough to ensure that the sampling model is close to normal? Both _____ and
_____ should be at least 10 in order for normal approximations to be useful. Furthermore…
The mean of the sampling model will equal the true population proportion, _____. And…
The standard deviation (if the population is at least 10 times as large as the sample) will be
__________.
Your Turn:
The Census Bureau reports that 40% of the 50,000 families in a particular region have more than
one color TV in their household. What is the probability that a simple random sample of size 100 will
indicate 45% or more households with more than one color TV?
Solution:
• State what we want to know:
• Check the Thumb Rules (verify the conditions):
• State the parameters and the sampling distribution model:
• Make a picture:
• State the problem in terms of p̂:
• Convert to a z-score:
• Find the resulting probability:
• Discuss the probability in the context of the question:
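One way to check the arithmetic for this problem is sketched below. It assumes the normal model is appropriate once the conditions hold, and it uses Python's standard-library `NormalDist` in place of a z-table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p = 0.40      # population proportion reported by the Census Bureau
n = 100       # sample size
N = 50_000    # number of families in the region
q = 1 - p

# Success/Failure condition and 10% condition.
conditions_ok = (n * p >= 10) and (n * q >= 10) and (N >= 10 * n)

sd = sqrt(p * q / n)                              # standard deviation of p-hat
z = (0.45 - p) / sd                               # z-score for a sample proportion of 45%
prob = 1 - NormalDist(mu=p, sigma=sd).cdf(0.45)   # P(p-hat >= 0.45)

print(f"conditions met: {conditions_ok}")
print(f"SD = {sd:.4f}, z = {z:.2f}, P(p-hat >= 0.45) = {prob:.4f}")
```

The z-score comes out near 1.02, so the probability is roughly 0.15: a sample showing 45% or more would not be unusual.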
Review:
Sample Means:
If, on the other hand, we have _____________________ data, then we can use ___________________
________________ to construct a sampling model.
Example:
Suppose I randomly select 100 seniors in Maryland and record each one’s GPA. I am interested in
knowing the average GPA of a senior in Maryland.
1.95  1.98  1.86  2.04  2.75  2.72  2.06  3.36  2.09  2.06
2.33  2.56  2.17  1.67  2.75  3.95  2.23  4.53  1.31  3.79
1.29  3.00  1.89  2.36  2.76  3.29  1.51  1.09  2.75  2.68
2.28  3.13  2.62  2.85  2.41  3.16  3.39  3.18  4.05  3.26
1.95  3.23  2.53  3.70  2.90  2.79  3.08  2.79  3.26  2.29
2.59  1.36  2.38  2.03  3.31  2.05  1.58  3.12  3.33  2.04
2.81  3.94  0.82  3.14  2.63  1.51  2.24  2.22  1.85  1.96
2.05  2.62  3.27  1.94  2.01  1.68  2.01  3.15  3.44  4.00
2.33  3.01  3.15  2.25  3.34  2.22  3.29  3.90  2.96  2.61
3.01  2.86  1.70  1.55  1.63  2.37  2.84  1.67  2.92  3.29
These 100 seniors make up one possible _______________. The sample mean (_____) is 2.5470 and the
sample standard deviation (_____) is 0.7150.
So _____ (our parameter) is the true mean GPA of a senior in Maryland.
And _____ (our statistic) is the mean GPA of a senior in Maryland in an SRS of size 100.
To calculate the value of _____, we find the mean of our sample (_____).
If we pick different samples, then the value of our statistic _____ changes:
x̄1 = 2.5470   x̄2 = 2.4943   x̄3 = 2.6223   x̄4 = 2.5289   x̄5 = 2.4037
x̄6 = 2.3962   x̄7 = 2.5019   x̄8 = 2.5621   x̄9 = 2.6083   x̄10 = 2.5667
If I graph the values of _____ for all possible samples of size 100, then I have constructed a
______________________________ of sample means. What will the sampling model look like?
Remember that each _____ value is a mean. Means are _______________________ than individual
observations because if we are looking only at means, then we don’t see any extreme values, only
average values. We won’t see GPAs that are very low or very high, only average GPAs.
The larger the sample size, the less variation we will see in the values of _____. So the standard
deviation decreases as the sample size increases.
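To see how much less the means vary than the individual GPAs, the ten sample means listed above can be compared with the spread of the first sample. This is a rough sketch: it assumes the sample standard deviation s = 0.7150 can stand in for the unknown population standard deviation, and the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

# The ten sample means listed above, each from a sample of 100 seniors.
sample_means = [2.5470, 2.4943, 2.6223, 2.5289, 2.4037,
                2.3962, 2.5019, 2.5621, 2.6083, 2.5667]

s = 0.7150   # standard deviation of the individual GPAs in the first sample
n = 100      # size of each sample

spread_of_means = stdev(sample_means)   # how much the ten means vary
rough_prediction = s / sqrt(n)          # s used as a stand-in for sigma (an assumption)

print(f"SD of the ten sample means: {spread_of_means:.4f}")
print(f"s / sqrt(n):                {rough_prediction:.4f}")
```

Both numbers are well under 0.1, far smaller than the 0.7150 spread of individual GPAs.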
So what will the sampling model look like?
ASSUMPTION: Independence
CONDITION: Randomization: The sample should be an SRS of the population.
If the sample size is large, it will be _________________________.
It can never be perfectly normal, because our data is discrete, and normal distributions are continuous.
ASSUMPTION: Sample size is sufficiently large.
CONDITION: Large enough sample in accordance with the Central Limit Theorem (n ≥ 30)
Furthermore…
The mean of the sampling model will equal the true population mean _____.
ASSUMPTION: Sample size is sufficiently large.
CONDITION: 10%: The population is at least 10 times as large as the sample.
And…
The standard deviation will be __________ (if the population is at least 10 times as large as the sample).
Central Limit Theorem (CLT)
Draw an SRS of size n from any population whatsoever with mean μ and standard deviation σ.
When n is large, the sampling model of the sample means x̄ is close to the normal model N(μ, σ/√n),
with mean μ and standard deviation σ/√n.
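The Central Limit Theorem can be illustrated with a short simulation. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed shape chosen only for illustration), and the sample size and number of repetitions are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)   # fixed seed so the run is repeatable

n = 50           # size of each sample
reps = 2000      # number of samples drawn

# Assumed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

print(f"mean of sample means: {mean(sample_means):.3f} (population mean is 1)")
print(f"SD of sample means:   {stdev(sample_means):.3f} (sigma / sqrt(n) is {1 / sqrt(n):.3f})")
```

Even though individual draws are skewed, the simulated means cluster around the population mean with a spread close to σ/√n.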
Law of Large Numbers
Draw observations at random from any population with mean μ. As the number of observations
increases, the sample mean x̄ gets closer and closer to μ.
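A brief simulation shows the Law of Large Numbers at work. The population here is an assumption made for illustration: rolls of a fair six-sided die, whose mean is 3.5.

```python
import random

random.seed(2)   # fixed seed so the run is repeatable

checkpoints = (10, 100, 1_000, 10_000, 100_000)
total = 0
running_means = {}

# Assumed population: rolls of a fair six-sided die (mean 3.5).
for i in range(1, 100_001):
    total += random.randint(1, 6)
    if i in checkpoints:
        running_means[i] = total / i   # sample mean after i observations

for size, value in running_means.items():
    print(f"n = {size:>6}: sample mean = {value:.3f}")
```

The sample mean wanders for small n and settles near 3.5 as the number of observations grows.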
Your Turn:
The Census Bureau has established that the mean income of heads of household in a particular
city is $41,500 with a standard deviation of $18,700. A simple random sample of 100 heads
of household in that city has a mean income of $45,510 with a standard deviation of $23,156.
Find the probability that a sample of this size will have a mean of $45,510 or more.
• State what we want to know:
• Check the conditions (thumb rules):
• State the parameters and the sampling distribution model:
• Make a picture:
• Write the problem in terms of x̄:
• Convert to a z-score:
• Find the resulting probability:
• Discuss the probability in the context of the problem:
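The arithmetic for this problem can be checked with the sketch below. It assumes the normal model applies (n = 100 is at least 30) and uses the population standard deviation of $18,700; the sample standard deviation of $23,156 is not needed because σ is known. `NormalDist` from the standard library replaces a z-table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 41_500      # population mean income
sigma = 18_700   # population standard deviation
n = 100          # sample size
x_bar = 45_510   # sample mean in question

sd_of_mean = sigma / sqrt(n)                                # standard deviation of x-bar
z = (x_bar - mu) / sd_of_mean                               # z-score of the sample mean
prob = 1 - NormalDist(mu=mu, sigma=sd_of_mean).cdf(x_bar)   # P(x-bar >= 45,510)

print(f"SD of x-bar = {sd_of_mean:.0f}, z = {z:.2f}, P = {prob:.4f}")
```

The z-score is about 2.14, so the probability is roughly 0.016: a sample mean this high would be fairly unusual.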
Review:
Steps for Generating a Sampling Distribution Model
1. Random Sample – take a random sample of a fixed size n from the population.
2. Summary Statistic – compute a summary statistic.
3. Repetition – repeat steps 1 and 2 many times.
4. Distribution – display the distribution of the summary statistics.
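The four steps above can be sketched as a simulation. Everything specific here is an assumption for illustration: the true proportion p = 0.40, the sample size of 100, the 5,000 repetitions, and the choice to model each individual as an independent draw (which amounts to sampling from a very large population).

```python
import random
from collections import Counter
from statistics import mean, stdev

random.seed(3)   # fixed seed so the run is repeatable

p = 0.40      # assumed true population proportion (hypothetical)
n = 100       # fixed sample size
reps = 5000   # number of repetitions

def sample_proportion():
    # Steps 1 and 2: take one random sample of size n and compute p-hat.
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

# Step 3: repeat steps 1 and 2 many times.
p_hats = [sample_proportion() for _ in range(reps)]

# Step 4: display the distribution as a rough text histogram.
counts = Counter(p_hats)
for value in sorted(counts):
    print(f"{value:.2f} {'*' * (counts[value] // 10)}")

print(f"mean = {mean(p_hats):.3f}, SD = {stdev(p_hats):.3f}")
```

The histogram should look roughly bell-shaped and centered near 0.40, with a spread close to √(pq/n).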
Questions:
1. Explain the difference between a parameter and a statistic.
2. What is meant by sampling variability?
3. How is the size of a sample related to the spread of the sampling distribution?
4. In an SRS of size n, what is the mean of the sampling distribution of p̂ ?
5. In an SRS of size n, what is the standard deviation of the sampling distribution of p̂ ?
6. What happens to the standard deviation of p̂ as the sample size n increases?
7. When does the formula √(pq/n) apply to the standard deviation of p̂ ?
8. When the sample size n is large, the sampling distribution of p̂ is approximately normal.
What test can you use to determine if the sample is large enough to assume that the sampling
distribution is approximately normal?
9. What is the mean of the sampling distribution of x̄, if x̄ is the mean of an SRS of size n
drawn from a large population with mean μ and standard deviation σ?
10. What is the standard deviation of the sampling distribution of x̄, if x̄ is the mean of an SRS
of size n drawn from a large population with mean μ and standard deviation σ?
11. To cut the standard deviation of x̄ in half, you must take a sample _____ times as large.
12. When should you use σ/√n to calculate the standard deviation of x̄?
13. What does the central limit theorem say about the shape of the sampling distribution of x̄?
14. What does the law of large numbers state?