Download Chapter 3: Describing Relationships (first spread)

Ex. 7.1-1: Heights and Cell Phones
Identify the population, the parameter, the sample, and the statistic in each of the following settings.
(a) A pediatrician wants to know the 75th percentile for the distribution of heights of 10-year-old boys so
she takes a sample of 50 patients and calculates Q3 = 56 inches.
(b) A Pew Research Center poll asked 1102 12- to 17-year-olds in the United States if they have a cell
phone. Of the respondents, 71% said yes. http://www.pewinternet.org/Reports/2009/14--Teens-andMobile-Phones-Data-Memo.aspx
Ex. 7.1-2: Choosing Cards Activity
Materials: A deck of cards with aces and face cards removed so that only the cards 2 through 10 remain.
Shuffle the deck, randomly select 5 cards, and note the median value of the cards. For example, if the
selected cards were 2, 2, 4, 5, and 9, the median would be 4.
Record the value of the sample median on a dotplot going from 2 to 10. Collect the outcomes from other
students in class and add these medians to your dotplot.
Describe what you see: shape, center, spread, and any unusual values.
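If a physical deck is not at hand, the activity can be approximated in software. The following is a minimal Python sketch, assuming a standard deck with the aces and face cards removed (four suits of the values 2 through 10, so 36 cards). The names `sample_median`, `simulate_medians`, and `text_dotplot`, the seed, and the choice of 100 repetitions are illustrative and not part of the original activity.

```python
import random
import statistics
from collections import Counter

# Assumed deck: four suits of the values 2 through 10 (36 cards in all).
DECK = [value for value in range(2, 11) for _ in range(4)]

def sample_median(rng, n=5):
    """Draw n cards without replacement and return the median value."""
    return statistics.median(rng.sample(DECK, n))

def simulate_medians(reps=100, n=5, seed=1):
    """Repeat the draw reps times, as if reps students each drew one sample."""
    rng = random.Random(seed)
    return [sample_median(rng, n) for _ in range(reps)]

def text_dotplot(values):
    """Print a crude dotplot on the scale 2 to 10, one dot per sample median."""
    counts = Counter(values)
    for v in range(2, 11):
        print(f"{v:>2} | {'.' * counts.get(v, 0)}")

medians = simulate_medians()
text_dotplot(medians)
```

Each row of dots stands in for one column of the class dotplot, so shape, center, and spread can be described the same way as with the hand-collected results.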
Ex. 7.1-3: Sampling Heights
Suppose that the heights of adult males are approximately Normally distributed with a mean of 70 inches
and a standard deviation of 3 inches. To see why sample size matters, we took 1000 SRSs of size 100 and
calculated the sample mean height and then took 1000 SRSs of size 1500 and calculated the sample mean
height. Here are the results, graphed on the same scale for easy comparisons:
As you can see, the spreads of the two approximate sampling distributions are very different. When the sample
size was larger, the distribution of the sample mean was much less variable. In other words, when the
sample size is larger, the sample mean tends to be closer to the true mean.
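The comparison can be reproduced with a short simulation. This is a sketch only: it assumes each height is an independent draw from a Normal distribution with mean 70 and standard deviation 3 (that is, an effectively infinite population), and the function name `sample_means` and the seed are illustrative choices.

```python
import random
import statistics

MU, SIGMA = 70, 3  # population mean and SD from the example, in inches

def sample_means(n, reps=1000, seed=0):
    """Return reps sample means, each from n simulated heights."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(reps)
    ]

results = {n: sample_means(n) for n in (100, 1500)}
for n, means in results.items():
    observed_sd = statistics.stdev(means)
    theoretical_sd = SIGMA / n ** 0.5  # sigma / sqrt(n)
    print(f"n = {n:>4}: center = {statistics.fmean(means):.3f}, "
          f"SD of sample means = {observed_sd:.3f} (theory {theoretical_sd:.3f})")
```

Both sets of sample means should be centered near 70, while the standard deviation of the sample means for n = 1500 should be much smaller than for n = 100.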
Ex. 7.1-4: More Tanks
Here are 5 methods for estimating the total number of tanks: (1) Partition = max × (5/4), (2) Max = max, (3)
MeanMedian = mean + median, (4) SumQuartiles = Q1 + Q3, (5) TwiceIQR = 2 × IQR. The graph below
shows the approximate sampling distribution for each of these statistics when taking samples of size 4 from
a population of 342 tanks.
[Figure: dotplots of Partition, Max, MeanMedian, SumQuartiles, and TwiceIQR ("Measures from Sample of Collection 1") on a common horizontal axis from 0 to 700, with the value 342 marked.]
(a) Which of these statistics appear to be biased estimators? Explain.
(b) Of the unbiased estimators, which is best? Explain.
(c) Explain why a biased estimator might be preferred to an unbiased estimator.
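One way to explore these questions is to simulate the five statistics directly. The sketch below rests on several assumptions that are not in the original: the tanks are numbered 1 through 342, samples are drawn without replacement, and for a sample of size 4 the quartiles are taken as the medians of the lower and upper halves (Q1 is the average of the two smallest values, Q3 the average of the two largest). The software that produced the graph may have used a different quartile rule, so the quartile-based results in particular are only indicative.

```python
import random
import statistics

N_TANKS = 342
POPULATION = range(1, N_TANKS + 1)  # assumed serial numbers 1..342

def estimators(sample):
    """Compute the five statistics for one sample of size 4."""
    s = sorted(sample)
    q1 = (s[0] + s[1]) / 2  # assumed rule: median of the lower half
    q3 = (s[2] + s[3]) / 2  # assumed rule: median of the upper half
    return {
        "Partition": max(s) * 5 / 4,
        "Max": max(s),
        "MeanMedian": statistics.fmean(s) + statistics.median(s),
        "SumQuartiles": q1 + q3,
        "TwiceIQR": 2 * (q3 - q1),
    }

def simulate(reps=5000, n=4, seed=7):
    """Return, for each statistic, its values over reps simulated samples."""
    rng = random.Random(seed)
    draws = [estimators(rng.sample(POPULATION, n)) for _ in range(reps)]
    return {name: [d[name] for d in draws] for name in draws[0]}

results = simulate()
for name, values in results.items():
    print(f"{name:<13} mean = {statistics.fmean(values):6.1f}  "
          f"SD = {statistics.stdev(values):6.1f}")
```

Comparing each simulated mean with 342 speaks to bias, and comparing the standard deviations speaks to variability.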
Ex. 7.2-1: Penny for your thoughts…on Proportions - Activity
1. Take a sample of size 5 from a large collection of pennies.
2. Calculate p̂ = the proportion of pennies in the sample that were minted in the 2000s.
3. Record your value on a dotplot. For the dotplot, use increments of 0.05 starting at 0 and ending at 1.
4. Collect results from students in class and add these to your dotplot.
5. Repeat the entire process for samples of size 10.
6. Repeat the entire process for samples of size 20.
7. Describe the changes in the distribution of p̂ as the sample size increases.
Ex. 7.2-2: Penny for your thoughts…on Proportions
Here are the results of 500 SRSs of size 5, 500 SRSs of size 10, and 500 SRSs of size 20 when sampling
from a population of 2341 pennies where the true proportion of pennies minted in the 2000s is p = 0.293.
[Figure: three dotplots ("Measures from Sample of Pennies") of SampleProportion5, SampleProportion10, and SampleProportion20, each on a horizontal axis from 0.0 to 1.2.]
Notice that all three distributions have a mean of about 0.293, the value of the true proportion of pennies
minted in the 2000s in the population. The spread, however, gets smaller as the sample size increases and
the shape becomes more symmetric and less skewed to the right as the sample size increases.
Now, suppose we wanted to estimate the proportion of pennies minted before 1976. In the same population,
the true proportion of pennies minted before 1976 is p = 0.092. Here are the results of 500 SRSs of sizes 5,
10, and 20:
[Figure: three dotplots ("Measures from Sample of Pennies") of SampleProportion5, SampleProportion10, and SampleProportion20, each on a horizontal axis from 0.0 to 1.0.]
Notice that the means of the distributions are about the same and approximately equal to the true proportion,
p = 0.092. Also, the spread of the distributions gets smaller as the sample size increases, and the shape
becomes more symmetric and less skewed to the right, although it is still clearly right-skewed for n = 20
in this case.
For all three sample sizes, the distributions were more skewed when p = 0.092 than when p = 0.293. In
general, the closer p is to 0 or 1, the more skewed the distribution of p̂ will be for samples of a given size.
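The pattern in center, spread, and skewness can be checked numerically. The sketch below is an approximation under a stated assumption: each penny in a sample is treated as an independent draw with success probability p, which is reasonable because the samples are tiny compared with the 2341 pennies. The function names, the seed, and the moment-based skewness measure are illustrative choices, not part of the original example.

```python
import random
import statistics

def sample_proportions(p, n, reps=500, seed=0):
    """Simulate reps values of p-hat from samples of size n.

    Assumption: independent draws with success probability p.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

def skewness(values):
    """Moment-based skewness; positive values indicate right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

summary = {}
for p in (0.293, 0.092):
    for n in (5, 10, 20):
        phats = sample_proportions(p, n)
        summary[(p, n)] = (
            statistics.fmean(phats),
            statistics.pstdev(phats),
            skewness(phats),
        )
        mean, sd, skew = summary[(p, n)]
        print(f"p = {p}, n = {n:>2}: mean = {mean:.3f}, "
              f"SD = {sd:.3f}, skewness = {skew:.2f}")
```

For each p, the mean should stay near p while the standard deviation shrinks as n grows, and the skewness should be larger for p = 0.092 than for p = 0.293 at the same n.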
Ex. 7.2-3: Planning for College
The superintendent of a large school district wants to know what proportion of middle school students in her
district are planning to attend a four-year college or university. Suppose that 80% of all middle school
students in her district are planning to attend a four-year college or university.
What is the probability that a SRS of size 125 will give a result within 7 percentage points of the true value?
State: We want to find the probability that the proportion of middle school students who plan to attend a
four-year college or university falls between 73% and 87%.
That is, P(0.73 ≤ p̂ ≤ 0.87).
Plan: μ_p̂ = p = 0.80.
Because the school district is large, we can assume that there are more than 10(125) = 1250 middle school
students, so σ_p̂ = √(0.80(0.20)/125) ≈ 0.036.
We can consider the distribution of p̂ to be approximately Normal since
np = 125(0.80) = 100 ≥ 10 and n(1 – p) = 125(0.20) = 25 ≥ 10.
Do: P(0.73 ≤ p̂ ≤ 0.87) ≈ normalcdf(0.73, 0.87, 0.80, 0.036) = 0.948.
(Note: To get full credit when using normalcdf on an AP exam question, students must explicitly state the
mean and standard deviation of the distribution as in the Plan step above.)
Conclude: About 95% of all SRSs of size 125 will give a sample proportion within 7 percentage points of
the true proportion of middle school students who are planning to attend a four-year college or university.
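For readers without a graphing calculator, the same Normal calculation can be sketched in Python with the standard library's `statistics.NormalDist`. The variable names are illustrative. The block computes the probability twice: once with the standard deviation rounded to 0.036, as in the Do step, and once with the unrounded value, which gives a slightly different third decimal.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.80, 125

# Standard deviation of p-hat: sqrt(p(1 - p) / n)
sigma = sqrt(p * (1 - p) / n)

# Probability using the unrounded standard deviation
exact = NormalDist(mu=p, sigma=sigma)
prob = exact.cdf(0.87) - exact.cdf(0.73)

# Probability using the rounded value 0.036, as in the worked solution
rounded = NormalDist(mu=p, sigma=0.036)
prob_rounded = rounded.cdf(0.87) - rounded.cdf(0.73)

print(f"sigma = {sigma:.4f}")
print(f"P(0.73 <= p-hat <= 0.87) = {prob:.3f} (unrounded sigma)")
print(f"P(0.73 <= p-hat <= 0.87) = {prob_rounded:.3f} (sigma = 0.036)")
```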
Ex. 7.3-1: Sampling from a skewed population using a graphing calculator - Activity
In this Activity, we will be taking SRSs of various sizes from a population whose distribution is very
skewed to see how the sample size affects the sampling distribution of the sample mean. The density curve
for this distribution is shown below. (Note: This distribution is a chi-square distribution with 1 degree of
freedom. The mean of this distribution is 1 and the standard deviation is √2 ≈ 1.41.)
To simulate an SRS of size 5 from this distribution on your graphing calculator, use the command
(randNorm(0,1,5))² and store these values to L1.
Find the mean of these values and mark the value on a dotplot.
Describe the shape, center, and spread of the distribution of x̄.
Repeat Step 2 for larger sample sizes. For example, to simulate an SRS of size 10 from this population, use
the command (randNorm(0,1,10))².
How does the sampling distribution of x̄ change as the sample size increases?
Here is what the distributions of x̄ look like for 100 SRSs of size 5, 10, and 25:
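The calculator steps translate directly to software. The sketch below squares standard Normal draws to obtain chi-square values with 1 degree of freedom, mirroring the randNorm command. The function names and seed are illustrative, and the default of 1000 repetitions (rather than the 100 used in the activity) is an assumption chosen so that the summary numbers are more stable; pass `reps=100` to match the activity.

```python
import random
import statistics

def chi_square_sample_mean(rng, n):
    """Mean of n squared standard Normal draws (chi-square, 1 df)."""
    return statistics.fmean([rng.gauss(0, 1) ** 2 for _ in range(n)])

def simulate(n, reps=1000, seed=3):
    """Return reps simulated sample means for samples of size n."""
    rng = random.Random(seed)
    return [chi_square_sample_mean(rng, n) for _ in range(reps)]

results = {n: simulate(n) for n in (5, 10, 25)}
for n, xbars in results.items():
    theory_sd = 2 ** 0.5 / n ** 0.5  # sqrt(2) / sqrt(n)
    print(f"n = {n:>2}: mean of x-bar = {statistics.fmean(xbars):.3f}, "
          f"SD of x-bar = {statistics.stdev(xbars):.3f} (theory {theory_sd:.3f})")
```

The center should stay near 1 for every sample size, while the spread shrinks roughly like √2/√n.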
Ex. 7.3-2: Stealing Bases
The histogram below shows the distribution of stolen bases (SB) for the 1341 Major League Baseball
players who had at least 1 plate appearance in the 2009 season. The right tail of the histogram actually
extends to 70, but we cut off the scale at 20 to be able to focus on the majority of the observations, which
are near 0.
Here is a histogram showing the distribution of the sample mean number of stolen bases for 100 SRSs of
size n = 45. It is graphed on the same scale to make it easier to see the difference in variability.
In addition to being much less spread out, the distribution of x̄ is also much more symmetric than the
population distribution. However, the center of each distribution is about the same.
Ex. 7.3-3: Movie going students
Suppose that the number of movies viewed in the last year by high school students has an average of 19.3
with a standard deviation of 15.8. Suppose we take an SRS of 100 high school students and calculate the
mean number of movies viewed by the members of the sample.
(a) What is the mean of the sampling distribution of x̄?
(b) What is the standard deviation of the sampling distribution of x̄? Check whether the 10% condition
is satisfied.
Ex. 7.3-4: Sampling by hand from a Normal population
There are advantages to starting with actual physical simulations before moving to the calculator/computer,
which beginning students often regard as a “magic box.” Here’s a simulation Activity. Prepare a population
of 100 identical index cards. Write numbers on the cards as follows:
Write each of these numbers . . .    . . . on this many cards
50                                   10
49, 51                               9
48, 52                               9
47, 53                               8
46, 54                               6
45, 55                               5
44, 56                               3
43, 57                               2
42, 58                               1
41, 59                               1
40, 60                               1
The 100 cards form a population. The distribution of measurements (the numbers on the cards) in this
population is roughly Normal with mean µ = 50 and standard deviation σ = 4. Make a dotplot of the 100
population values and find their mean (it is exactly µ = 50 because of the symmetry). This is a population
distribution, and its mean is a parameter.
Now put the cards in a box and have a student take a random sample of size 4 by drawing 4 cards blindly
and recording the numbers on them. Calculate the mean x̄ of the 4 observations. This is a statistic. Return
these cards to the box and shuffle the cards in the box thoroughly. Draw another random sample of size 4,
record the numbers, and find x̄. Repeat this as many times as is convenient, preferably about 100 times.
Make a dotplot of the x̄-values and find their mean and standard deviation. This is an approximation to the
sampling distribution of x̄. Repeat this simulation for samples of different sizes as well. Students should
see that the center of the x̄ distribution is always around 50, that the shape is always approximately
Normal, and that the spread of the x̄ distribution decreases as the sample size increases.
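After the hands-on version, the same card population can be checked in software. The sketch below builds the 100 cards from the table, reading each count as applying to each number in a pair (which is what makes the total 100), and draws samples without replacement as in the activity. The sample sizes 4, 9, and 16, the 1000 repetitions, the seed, and the function name are illustrative assumptions.

```python
import random
import statistics

# Distance from 50 -> number of cards for EACH of 50 - d and 50 + d.
PAIR_COUNTS = {1: 9, 2: 9, 3: 8, 4: 6, 5: 5, 6: 3, 7: 2, 8: 1, 9: 1, 10: 1}

population = [50] * 10
for d, count in PAIR_COUNTS.items():
    population += [50 - d] * count + [50 + d] * count

def sample_means(n, reps=1000, seed=11):
    """Draw reps samples of n cards (without replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

print(f"population: size = {len(population)}, mean = {statistics.fmean(population)}, "
      f"SD = {statistics.pstdev(population):.2f}")

results = {n: sample_means(n) for n in (4, 9, 16)}
for n, xbars in results.items():
    print(f"n = {n:>2}: mean of x-bar = {statistics.fmean(xbars):.2f}, "
          f"SD of x-bar = {statistics.stdev(xbars):.2f}")
```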
Ex. 7.3-5: Buy Me Some Peanuts and Sample Means
At the P. Nutty Peanut Company, dry roasted, shelled peanuts are placed in jars by a machine. The
distribution of weights in the jars is approximately Normal, with a mean of 16.1 ounces and a standard
deviation of 0.15 ounces.
(a) Without doing any calculations, explain which outcome is more likely, randomly selecting a single
jar and finding the contents to weigh less than 16 ounces or randomly selecting 10 jars and finding
the average contents to weigh less than 16 ounces.
(b) Find the probability of each event described above.
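One way to check an answer to part (b) is with Python's `statistics.NormalDist`. This is a sketch under the example's own assumptions (Normal weights with mean 16.1 and standard deviation 0.15 ounces, and independent jars); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 16.1, 0.15  # ounces

# A single jar: X is N(16.1, 0.15)
p_single = NormalDist(mu, sigma).cdf(16)

# The mean of 10 jars: x-bar is N(16.1, 0.15 / sqrt(10))
p_mean10 = NormalDist(mu, sigma / sqrt(10)).cdf(16)

print(f"P(single jar < 16 oz)     = {p_single:.4f}")
print(f"P(mean of 10 jars < 16 oz) = {p_mean10:.4f}")
```

Because the sample mean varies less than a single observation, the second probability should come out much smaller than the first.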
Ex. 7.3-6: Sampling by hand from a skewed population
Follow the directions for Example 7.3-4 but modify the population so it is skewed. For example, use 100
index cards with the following numbers.
Write each of these numbers . . .    . . . on this many cards
50                                   30
51                                   20
52                                   15
53                                   10
54                                   8
55                                   7
56                                   4
57                                   3
58                                   1
59                                   1
60                                   1
Make sure to do this for several different sample sizes, so students clearly see the change in shape as well as
the change in spread.
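The change in shape can also be quantified. The sketch below builds the skewed 100-card population from the table and tracks a moment-based skewness of the sample means as the sample size grows. The sample sizes 2, 5, and 25, the 2000 repetitions, the seed, and the function names are illustrative assumptions; samples are drawn without replacement, as with physical cards.

```python
import random
import statistics

# Card value -> number of cards, from the table above.
CARD_COUNTS = {50: 30, 51: 20, 52: 15, 53: 10, 54: 8, 55: 7,
               56: 4, 57: 3, 58: 1, 59: 1, 60: 1}
population = [value for value, count in CARD_COUNTS.items() for _ in range(count)]

def skewness(values):
    """Moment-based skewness; positive values indicate right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, reps=2000, seed=5):
    """Draw reps samples of n cards (without replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

print(f"population: mean = {statistics.fmean(population):.2f}, "
      f"skewness = {skewness(population):.2f}")

results = {n: sample_means(n) for n in (2, 5, 25)}
for n, xbars in results.items():
    print(f"n = {n:>2}: SD of x-bar = {statistics.stdev(xbars):.2f}, "
          f"skewness of x-bar = {skewness(xbars):.2f}")
```

Both the spread and the skewness of the sample means should shrink as n increases, which is the change in shape the activity is meant to reveal.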
Ex. 7.3-7: Another Strange Population
Here is another population distribution with a strange shape:
What do you think the sampling distribution of x̄ will look like for samples of size 2?
What about samples of size 5?
Size 25?
Here are the results of 10,000 SRSs of each size.
The first graph has three peaks because a sample of size 2 has only 4 basic outcomes: two small values, which
give a small mean; two large values, which give a large mean; or one of each (in either order), which gives a
mean in the middle. Since there are two ways to get one of each, the middle pile is roughly twice as big.
Ex. 7.3-8: Mean Texts
Suppose that the number of texts sent during a typical day by a randomly selected high school student
follows a right-skewed distribution with a mean of 15 and a standard deviation of 35.
Assuming that students at your school are typical texters, how likely is it that a random sample of 50
students will have sent more than a total of 1000 texts in the last 24 hours?
State: What is the probability that the total number of texts in the last 24 hours is greater than 1000 for a
random sample of 50 high school students?
Plan: A total of 1000 texts among 50 students is the same as an average number of texts of 1000/50 = 20.
We want to find P(x̄ > 20), where x̄ = sample mean number of texts.
Since n is large (50 > 30), x̄ is approximately N(15, 35/√50).
Do: P(x̄ > 20) ≈ normalcdf(20, 9999, 15, 35/√50) = 0.1562.
Conclude: There is about a 16% chance that a random sample of 50 high school students will send more
than 1000 texts in a day.
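The normalcdf step can be reproduced with Python's `statistics.NormalDist`. As in the solution above, this sketch relies on the central limit theorem to treat the sample mean as approximately Normal; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 15, 35, 50
total = 1000

threshold = total / n            # a total of 1000 texts means x-bar = 20
se = sigma / sqrt(n)             # standard deviation of x-bar: 35 / sqrt(50)

# Upper-tail probability P(x-bar > 20) under the Normal approximation
prob = 1 - NormalDist(mu, se).cdf(threshold)

print(f"SD of x-bar = {se:.3f}")
print(f"P(x-bar > {threshold:.0f}) = {prob:.4f}")
```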