Kerns central limit /
library(TeachingDemos)
example(clt.examp)
Chapter 11 (Sampling Distributions) exercises: 11.2, 11.7, 11.11, 11.13, 11.15, 11.19, 11.26, 11.33
11.2 Florida voters. Florida played a key role in the 2000 and 2004 presidential elections. Voter
registration records in August 2010 show that 41% of Florida voters are registered as Democrats
and 36% as Republicans. (Most of the others did not choose a party.) To test a random digit
dialing device that you plan to use to poll voters for the 2010 Senate elections, you use it to call
250 randomly chosen residential telephones in Florida. Of the registered voters contacted, 34%
are registered Democrats. Is each of the boldface numbers a parameter or a statistic?
Answer
41% of registered voters are Democrats: parameter
36% of registered voters are Republicans: parameter
34% of voters contacted are Democrats: statistic
11.7 Generating a sampling distribution. Let’s illustrate the idea of a sampling distribution in
the case of a very small sample from a very small population. The population is the scores of 10
students on an exam:
The parameter of interest is the mean score μ in this population. The sample is an SRS of size n =
4 drawn from the population. Because the students are labeled 0 to 9, a single random digit from
Table B chooses one student for the sample.
(a) Find the mean of the 10 scores in the population. This is the population mean μ.
(b) Use the first digits in row 116 of Table B to draw an SRS of size 4 from this population.
What are the four scores in your sample? What is their mean x̄? This statistic is an estimate of
μ.
(c) Repeat this process 9 more times, using the first digits in rows 117 to 125 of Table B. Make a
histogram of the 10 values of x̄. You are constructing the sampling distribution of x̄. Is the
center of your histogram close to μ?
Answer
(a) μ = 694/10 = 69.4.
(b) The table below shows the results for line 116. Note that we need to choose 5 digits because
the digit 4 appears twice.
(c) The results for the other lines are in the table; the histogram is shown after the table.
11.11 What does the central limit theorem say? Asked what the central limit theorem says, a
student replies, “As you take larger and larger samples from a population, the histogram of the
sample values looks more and more Normal.” Is the student right? Explain your answer.
Answer
No: the histogram of the sample values will look like the population distribution, whatever it
might happen to be. The central limit theorem says that the histogram of sample means (from
many large samples) will look more and more Normal.
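The distinction in this answer can be seen in a small simulation. The sketch below is only an illustration: the exponential population, the sample sizes, the seed, and the helper `skew` are assumptions chosen for the demonstration, not part of the exercise.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# One large sample: its histogram resembles the right-skewed population.
one_sample <- rexp(5000, rate = 1)

# Many samples of size 50: the histogram of their means is close to Normal.
sample_means <- replicate(2000, mean(rexp(50, rate = 1)))

par(mfrow = c(2, 1))
hist(one_sample, prob = TRUE, main = "Values in one sample of size 5000")
hist(sample_means, prob = TRUE, main = "Means of 2000 samples of size 50")

# Crude skewness measure (hypothetical helper): 0 for a symmetric distribution.
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
skew(one_sample)    # stays large: the sample looks like the population
skew(sample_means)  # much closer to 0: the means look roughly Normal
```

Taking a larger single sample would not reduce the first skewness value; only averaging within samples pulls the distribution toward the Normal shape.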
11.13 More on insurance. An insurance company knows that in the entire population of
millions of apartment owners, the mean annual loss from damage is μ = $75 and the standard
deviation of the loss is σ = $300. The distribution of losses is strongly right–skewed: most
policies have $0 loss, but a few have large losses. If the company sells 10,000 policies, can it
safely base its rates on the assumption that its average loss will be no greater than $85? Follow
the four-step process as illustrated in Example 11.8.
Answer
This is simply the problem, as stated: We want to find the probability that the average loss for
10,000 policies will be greater than $85 when the long-run average loss is $75.
The central limit theorem says that, in spite of the skewness of the population distribution, the
average loss among 10,000 policies will be approximately N($75, $3), because the standard
deviation of the sample mean is sigma/sqrt(n) = $300/sqrt(10000) = $3.
Use the central limit theorem to approximate this probability; this is justified because for
n = 10,000 the sampling distribution of sample means will be very close to normal.
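With Z = (xbar − mu)/(sigma/sqrt(n)):
P(xbar > $85) = P(Z > (85 − 75)/(300/sqrt(10000))) = P(Z > 3.33) = 1 − 0.9996 = 0.0004 (from Table A).
An average loss above $85 is therefore very unlikely, so the company can safely base its rates on
this assumption.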
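The Table A lookup can be checked with pnorm. This is a minimal sketch using the values given in the exercise; the variable names are chosen here for illustration.

```r
mu    <- 75      # population mean loss, in dollars
sigma <- 300     # population standard deviation, in dollars
n     <- 10000   # number of policies

se <- sigma / sqrt(n)   # standard deviation of the sample mean: 3
z  <- (85 - mu) / se    # standardized value, about 3.33

# Upper-tail probability P(xbar > 85) under the Normal approximation
p_exceed <- pnorm(85, mean = mu, sd = se, lower.tail = FALSE)
p_exceed   # about 0.0004, in line with the Table A result
```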
11.15 A study of voting chose 663 registered Canadian voters at random shortly after the 2008
elections. Of these, 72% said they had voted in the election. Election records show that only
58.8% of registered voters voted in the election (a record low). The boldface number is a
(a) sampling distribution.
(b) statistic.
(c) parameter.
Answer
(c) 58.8% is a proportion of all registered voters (the population).
11.19 A newborn baby has extremely low birth weight (ELBW) if it weighs less than 1000
grams. A study of the health of such children in later years examined a random sample of 219
children. Their mean weight at birth was x̄ grams. This sample mean is an unbiased
estimator of the mean weight μ in the population of all ELBW babies. This means that
(a) in many samples from this population, the mean of the many values of x̄ will be equal to μ.
(b) as we take larger and larger samples from this population, x̄ will get closer and closer to μ.
(c) in many samples from this population, the many values of x̄ will have a distribution that is
close to Normal.
Answer
(a) “Unbiased” means that the estimator is right “on the average.”
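A simulation can make "right on the average" concrete. The sketch below assumes a made-up skewed population (a gamma distribution with arbitrary parameters) and a small sample size of 20; none of these values come from the ELBW study.

```r
set.seed(2)  # arbitrary seed, only for reproducibility

# Hypothetical finite population; its mean plays the role of mu.
population <- rgamma(100000, shape = 2, scale = 50)
mu <- mean(population)

# Draw many SRSs of size 20 and record each sample mean.
xbars <- replicate(5000, mean(sample(population, size = 20)))

mean(xbars) - mu   # close to 0: the sample mean is unbiased
sd(xbars)          # individual sample means still vary around mu
```

Individual values of the sample mean can miss mu by a wide margin at this sample size; unbiasedness only says that the misses balance out over many samples.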
11.26 The Medical College Admission Test. Almost all medical schools in the United States
require students to take the Medical College Admission Test (MCAT). To estimate the mean
score μ of those who took the MCAT on your campus, you will obtain the scores of an SRS of
students. The scores follow a Normal distribution, and from published information you know
that the standard deviation is 6.4. Suppose that (unknown to you) the mean score of those taking
the MCAT on your campus is 25.0.
(a) If you choose one student at random, what is the probability that the student’s score is
between 20 and 30?
(b) You sample 25 students. What is the sampling distribution of their average score x̄?
(c) What is the probability that the mean score of your sample is between 20 and 30?
Answer
(a) Z= (x – mu) / sigma
Z1 = (20 – 25)/6.4 = -0.78
Z2 = (30 - 25)/6.4 = 0.78
P( -0.78 < Z < 0.78) = 0.7823−0.2177 = 0.5646 (from Table A)
(b) mean of xbar = mu = 25
SD(xbar) = sigma / sqrt(n) = 6.4 / sqrt(25) = 1.28
xbar is N(25, 1.28)
(c)
Z1 = (20 − 25)/1.28 = −3.91
Z2 = (30 − 25)/1.28 = 3.91
P(20 < xbar < 30) = P(−3.91 < Z < 3.91) = 0.9999077
(Using R: pnorm(3.91) - pnorm(-3.91) # 0.9999077)
(The text's Table A doesn’t go beyond Z = ±3.49.)
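Parts (a) and (c) can also be computed without rounding Z to two decimals. This is a sketch using the values from the exercise; the variable names are illustrative, and the results differ slightly from the Table A values because Z is not rounded.

```r
mu    <- 25    # mean MCAT score on campus
sigma <- 6.4   # standard deviation of individual scores
n     <- 25    # sample size in parts (b) and (c)

# (a) One student: P(20 < X < 30) with X ~ N(25, 6.4)
p_single <- pnorm(30, mu, sigma) - pnorm(20, mu, sigma)

# (b) Standard deviation of the sample mean
se <- sigma / sqrt(n)   # 1.28

# (c) Mean of 25 students: P(20 < xbar < 30) with xbar ~ N(25, 1.28)
p_mean <- pnorm(30, mu, se) - pnorm(20, mu, se)

p_single   # roughly 0.565
p_mean     # very close to 1
```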
11.33 Returns on stocks. Andrew plans to retire in 40 years. He plans to invest part of his
retirement funds in stocks, so he seeks out information on past returns. He learns that from 1960
to 2009, the annual returns on U.S. common stocks had mean 10.8% and standard deviation
17.1%. The distribution of annual returns on common stocks is roughly symmetric, so the mean
return over even a moderate number of years is close to Normal. What is the probability
(assuming that the past pattern of variation continues) that the mean annual return on common
stocks over the next 40 years will exceed 10%? What is the probability that the mean return will
be less than 5%? Follow the four-step process as illustrated in Example 11.8.
Answer
The central limit theorem says that over 40 years, x̄ (the mean return) is approximately Normal
with mean μ = 10.8% and standard deviation 17.1%/√40 = 2.704%.
Therefore, P(x̄ > 10%) = P(Z > (10 − 10.8)/(17.1/√40)) = P(Z > −0.30) = 0.6179, and
P(x̄ < 5%) = P(Z < (5 − 10.8)/(17.1/√40)) = P(Z < −2.14) = 0.0162 (from Table A).
Note: We have to assume that returns in separate years are independent.
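The same two probabilities can be obtained with pnorm. This is a sketch based on the numbers in the exercise; the variable names are illustrative, and the results differ in the third decimal from the Table A values because Z is not rounded.

```r
mu    <- 10.8   # mean annual return, percent
sigma <- 17.1   # standard deviation of annual returns, percent
n     <- 40     # number of years

se <- sigma / sqrt(n)   # about 2.704

# P(mean return > 10%) and P(mean return < 5%) under the Normal approximation
p_above_10 <- pnorm(10, mean = mu, sd = se, lower.tail = FALSE)
p_below_5  <- pnorm(5,  mean = mu, sd = se)

p_above_10   # roughly 0.62
p_below_5    # roughly 0.016
```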
Computer exercise
(a) Select 1000 samples of n = 10 observations each from the normal population N(100, 15)
and find the mean of the means and plot their histograms.
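par(mfrow=c(2,1))                                  # two plots, stacked vertically
hist(rnorm(1000,100,15), prob=TRUE, col="red")     # 1000 values from the population
average <- numeric(1000)                           # storage for 1000 sample means
for(i in 1:1000) average[i] <- mean(rnorm(10,100,15))
hist(average, prob=TRUE, col="green")              # sampling distribution of the mean
mean(average) # theoretical value: mu = 100
sd(average) # theoretical value: 15/sqrt(10) = 4.743416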
[Figure: two stacked histograms on a density scale. Top: Histogram of rnorm(1000, 100, 15). Bottom: Histogram of average.]
(b) Redo the simulation in (a), but for 1000 samples each of size n = 50. As before, construct a
histogram for the sampling distribution of the sample means, and summarize this distribution.
Compare the obtained mean and standard deviation of the sample means with the theoretical
values. Compare and contrast the sampling distribution for the means based on samples of size
50 with the distribution based on samples of size 10 [from part (a)]. Be careful to take into
account the different horizontal scales of the two histograms.
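par(mfrow=c(2,1))                                  # two plots, stacked vertically
hist(rnorm(1000,100,15), prob=TRUE, col="red")     # 1000 values from the population
average <- numeric(1000)                           # storage for 1000 sample means
for(i in 1:1000) average[i] <- mean(rnorm(50,100,15))
hist(average, prob=TRUE, col="green")              # sampling distribution of the mean
mean(average) # theoretical value: mu = 100
sd(average) # theoretical value: 15/sqrt(50) = 2.12132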
[Figure: two stacked histograms on a density scale. Top: Histogram of rnorm(1000, 100, 15). Bottom: Histogram of average.]
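Because part (b) warns about the different horizontal scales, it can help to draw both sampling distributions on a shared x-axis. The sketch below is one way to do that; the seed, the use of replicate() in place of the for loop, and the names means_10, means_50, and summary_table are choices made here, not part of the original solution.

```r
set.seed(3)  # arbitrary seed, only for reproducibility
mu <- 100
sigma <- 15

# 1000 sample means for each sample size
means_10 <- replicate(1000, mean(rnorm(10, mu, sigma)))
means_50 <- replicate(1000, mean(rnorm(50, mu, sigma)))

# Shared horizontal range so the two spreads are directly comparable
x_range <- range(means_10, means_50)
par(mfrow = c(2, 1))
hist(means_10, prob = TRUE, xlim = x_range, col = "green",
     main = "Sample means, n = 10")
hist(means_50, prob = TRUE, xlim = x_range, col = "green",
     main = "Sample means, n = 50")

# Simulated versus theoretical centre and spread
summary_table <- data.frame(
  n              = c(10, 50),
  simulated_mean = c(mean(means_10), mean(means_50)),
  simulated_sd   = c(sd(means_10), sd(means_50)),
  theoretical_sd = sigma / sqrt(c(10, 50))
)
print(summary_table)
```

On the common scale both histograms are centred near 100, and the n = 50 histogram is visibly narrower, as the factor 1/sqrt(n) predicts.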
(c) Sampling from the exponential distribution: select 1000 samples each of size n = 5 and of size
n = 25 from the exponential distribution with rate parameter 1. For each of the two sets of samples,
examine the distribution of the sample means. In each case, compare the mean and standard
deviation of the sample means with the theoretical values. What happens to the shape of the
sampling distribution as the sample size grows? What about the centre and spread of the
sampling distribution? How does this problem illustrate the central limit theorem?
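# exponential distribution with rate lambda = 1, samples of size n = 5
par(mfrow=c(2,1))                                  # two plots, stacked vertically
hist(rexp(1000,1), breaks="Sturges", prob=TRUE, col="red")   # 1000 values from the population
average <- numeric(1000)                           # storage for 1000 sample means
for(i in 1:1000) average[i] <- mean(rexp(5,1))
hist(average, breaks="Sturges", prob=TRUE, col="green")      # sampling distribution of the mean
mean(average) # theoretical value: mu = 1
sd(average) # theoretical value: 1/sqrt(5) = 0.4472136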
[Figure: two stacked histograms on a density scale. Top: Histogram of rexp(1000, 1). Bottom: Histogram of average (n = 5).]
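# exponential distribution with rate lambda = 1, samples of size n = 25
par(mfrow=c(2,1))                                  # two plots, stacked vertically
hist(rexp(1000,1), breaks="Sturges", prob=TRUE, col="red")   # 1000 values from the population
average <- numeric(1000)                           # storage for 1000 sample means
for(i in 1:1000) average[i] <- mean(rexp(25,1))
hist(average, breaks="Sturges", prob=TRUE, col="green")      # sampling distribution of the mean
mean(average) # theoretical value: mu = 1
sd(average) # theoretical value: 1/sqrt(25) = 0.2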
[Figure: two stacked histograms on a density scale. Top: Histogram of rexp(1000, 1). Bottom: Histogram of average (n = 25).]
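The questions in part (c) about centre, spread, and shape can be answered from one table. The sketch below wraps the simulation in a hypothetical helper, summarize_means (not part of the original solution), and adds a crude skewness measure as a numerical stand-in for "shape"; the seed is arbitrary.

```r
set.seed(4)  # arbitrary seed, only for reproducibility

# Hypothetical helper: simulate `reps` sample means of size n from Exp(rate)
# and compare them with the theoretical mean 1/rate and SD (1/rate)/sqrt(n).
summarize_means <- function(n, reps = 1000, rate = 1) {
  xbars <- replicate(reps, mean(rexp(n, rate)))
  c(n                = n,
    simulated_mean   = mean(xbars),
    theoretical_mean = 1 / rate,
    simulated_sd     = sd(xbars),
    theoretical_sd   = (1 / rate) / sqrt(n),
    skewness         = mean((xbars - mean(xbars))^3) / sd(xbars)^3)
}

# One row per sample size
results <- t(sapply(c(5, 25), summarize_means))
print(round(results, 3))
```

The centre stays near 1 for both sample sizes and the spread shrinks like 1/sqrt(n). The skewness of the sample means is expected to be smaller for n = 25 than for n = 5, which is the central limit theorem at work on a strongly right-skewed population.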