Download ch2 freq dist and histogram # R code

Kerns confidence intervals / ch14 image
Ch14 (confidence intervals) exercises 14.7, 14.8, 14.9, 14.22, 14.23, 14.24
14.7 IQ test scores. Here are the IQ test scores of 31 seventh-grade girls in a Midwest school
district:
(a) These 31 girls are an SRS of all seventh-grade girls in the school district. Suppose that the
standard deviation of IQ scores in this population is known to be σ = 15. We expect the
distribution of IQ scores to be close to Normal. Make a stemplot of the distribution of these
31 scores (split the stems) to verify that there are no major departures from Normality. You
have now checked the “simple conditions” to the extent possible.
(b) Estimate the mean IQ score for all seventh-grade girls in the school district, using a 99%
confidence interval. Follow the four-step process as illustrated in Example 14.3.
Answer
(a) A stemplot is provided. The two low scores (72 and 74) are both possible outliers, but
there are no other apparent deviations from Normality.
(b) The problem states that these girls are an SRS of the population, which is very large, so
conditions for inference are met. In part (a), we saw that the scores are consistent with
having come from a Normal population. Our 99% confidence interval for μ is given by
x̄ ± z* σ/√n = 105.84 ± 2.576 (15/√31) = 105.84 ± 6.94
= 98.90 to 112.78. We are 99% confident that the mean IQ of
seventh-grade girls in this district is between 98.90 and 112.78 points.
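The arithmetic in part (b) can be checked in R. This is a minimal sketch that uses only the summary numbers quoted in the answer (sample mean 105.84, σ = 15, n = 31), because the raw scores are not reproduced here. The variable names are illustrative.

```r
# Summary values quoted in the answer to Exercise 14.7
xbar  <- 105.84   # sample mean IQ
sigma <- 15       # known population standard deviation
n     <- 31       # sample size

# Critical value for 99% confidence (about 2.576)
z_star <- qnorm(1 - 0.01 / 2)

# Margin of error and the resulting interval
margin <- z_star * sigma / sqrt(n)
ci_iq  <- c(lower = xbar - margin, upper = xbar + margin)
round(ci_iq, 2)
```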
14.8 Confidence level and margin of error. Example 14.1 described NHANES survey data on
the body mass index (BMI) of 654 young women. The mean BMI in the sample was x̄ = 26.8.
We treated these data as an SRS from a Normally distributed population with standard deviation
σ = 7.5.
(a) Give three confidence intervals for the mean BMI μ in this population, using 90%, 95%, and
99% confidence.
(b) What are the margins of error for 90%, 95%, and 99% confidence? How does increasing the
confidence level change the margin of error of a confidence interval when the sample size
and population standard deviation remain the same?
Answer
(a) The three confidence intervals are given in the table below. Each has the form
x̄ ± z* σ/√n. In all three cases, x̄ = 26.8 and σ/√n = 7.5/√654 = 0.2933, so the
confidence interval is computed as 26.8 ± z*(0.2933), where z* changes with the
confidence level.
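The intervals and margins of error for the three confidence levels can be computed in R. This is a sketch based on the summary values stated in the exercise. The data frame and its column names are illustrative and are not the layout of the original table.

```r
# Summary values from Exercise 14.8 (NHANES BMI data)
xbar  <- 26.8
sigma <- 7.5
n     <- 654

conf_level <- c(0.90, 0.95, 0.99)
z_star     <- qnorm(1 - (1 - conf_level) / 2)   # about 1.645, 1.960, 2.576
se         <- sigma / sqrt(n)                   # about 0.2933
me         <- z_star * se                       # margin of error at each level

bmi_table <- data.frame(
  conf_level = conf_level,
  z_star     = round(z_star, 3),
  margin     = round(me, 4),
  lower      = round(xbar - me, 4),
  upper      = round(xbar + me, 4)
)
print(bmi_table)
```

Because only z* changes from row to row, the margin of error grows with the confidence level.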
(b) The margins of error, given in the “m.e.” column of the table, increase as confidence level
increases.
14.9 Sample size and margin of error. Example 14.1 described NHANES survey data on the
body mass index (BMI) of 654 young women. The mean BMI in the sample was x̄ = 26.8. We
treated these data as an SRS from a Normally distributed population with standard deviation σ =
7.5.
(a) Suppose that we had an SRS of just 100 young women. What would be the margin of error
for 95% confidence?
(b) Find the margins of error for 95% confidence based on SRSs of 400 young women and 1600
young women.
(c) Compare the three margins of error. How does increasing the sample size change the margin
of error of a confidence interval when the confidence level and population standard deviation
remain the same?
Answer
With z* = 1.96 and σ = 7.5, the margin of error is
z* σ/√n = (1.96)(7.5)/√n = 14.7/√n. (a) and (b) The margins of error
are given in the table. (c) Margin of error decreases as n increases. (Specifically, every time the
sample size n is quadrupled, the margin of error is halved.)
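The three margins of error follow directly from the formula z* σ/√n. A short R sketch, using z* = 1.96 as in the answer (the variable names are illustrative):

```r
z_star <- 1.96
sigma  <- 7.5
n      <- c(100, 400, 1600)

# Margin of error for each sample size
me <- z_star * sigma / sqrt(n)
data.frame(n = n, margin = me)

# Ratio of each margin to the previous one: quadrupling n halves the margin
ratio <- me[-1] / me[-length(me)]
ratio
```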
14.22 Explaining confidence. A student reads that a 95% confidence interval for the mean ideal
weight given by adult American women is 140 ± 1.4 pounds. Asked to explain the meaning of
this interval, the student says, “95% of all adult American women would say that their ideal
weight is between 138.6 and 141.4 pounds.” Is the student right? Explain your answer.
Answer
The student is wrong. A 95% confidence interval does not contain 95% of population values.
Instead, all we can say is that if we repeatedly sampled the same number of women, each
determining a 95% confidence interval for their average perceived ideal weight, then in the long
run 95% of these confidence intervals would capture the true, unknown average ideal weight as
perceived by all American women.
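The long-run interpretation can be illustrated with a small simulation. The population values below (true mean 140, standard deviation 20, samples of size 100) are assumptions chosen only for illustration; they are not given in the exercise.

```r
set.seed(1)
mu    <- 140     # hypothetical true mean ideal weight
sigma <- 20      # hypothetical population standard deviation
n     <- 100     # hypothetical sample size
reps  <- 10000   # number of repeated samples

margin <- qnorm(0.975) * sigma / sqrt(n)

# Sample mean from each repeated sample
xbars <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

# Proportion of the 95% intervals that capture the true mean
coverage <- mean(xbars - margin <= mu & mu <= xbars + margin)
coverage

# Share of individual population values lying within mu +/- margin
share_pop <- diff(pnorm(mu + c(-1, 1) * margin, mean = mu, sd = sigma))
share_pop
```

Under these assumptions the coverage should come out close to 0.95, while the share of individual women whose ideal weight lies in an interval of that width is far smaller. That is the distinction the student missed.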
14.23 Explaining confidence. You ask another student to explain the confidence interval for
mean ideal weight described in the previous exercise. The student answers, “We can be 95%
confident that future samples of adult American women will say that their mean ideal weight is
between 138.6 and 141.4 pounds.” Is this explanation correct? Explain your answer.
Answer
This student is also confused. If we repeated the sample over and over, 95% of all future sample
means would be within 1.96 standard deviations of μ (that is, within 1.96σ/√n of the true,
unknown value of μ). Future samples will have no memory of our sample.
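A short calculation shows why the interval from one sample says little about where future sample means will fall. In this sketch the true mean of 140 and the position of the first sample mean (one standard error above μ) are hypothetical; only the margin of 1.4 comes from the exercise.

```r
mu     <- 140            # hypothetical true mean
margin <- 1.4            # margin of error quoted in the exercise
se     <- margin / 1.96  # implied standard deviation of the sample mean

# Probability that a future sample mean lands within `margin` of mu
p_near_mu <- pnorm(mu + margin, mu, se) - pnorm(mu - margin, mu, se)

# Probability that a future sample mean lands inside an interval centered
# on a first-sample mean that happens to sit one standard error above mu
xbar0 <- mu + se
p_in_first_interval <- pnorm(xbar0 + margin, mu, se) -
  pnorm(xbar0 - margin, mu, se)

c(near_mu = p_near_mu, in_first_interval = p_in_first_interval)
```

The first probability is 0.95 by construction. The second depends on where the first sample mean happened to land and is smaller whenever that mean is off-center.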
14.24 Explaining confidence. Here is an explanation from the Associated Press concerning one
of its opinion polls. Explain briefly but clearly in what way this explanation is incorrect.
For a poll of 1,600 adults, the variation due to sampling error is no more than three percentage
points either way. The error margin is said to be valid at the 95 percent confidence level. This
means that, if the same questions were repeated in 20 polls, the results of at least 19 surveys
would be within three percentage points of the results of this survey.
Answer
The mistake is in saying that 95% of other polls would have results close to the results of this
poll. Other surveys should be close to the truth — not necessarily close to the results of this
survey. (Additionally, there is the suggestion that 95% means “exactly 19 out of 20,” when really
95% refers to repeating the survey infinitely often.)
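The parenthetical point about "19 out of 20" can be made concrete. If each of 20 polls independently produced an interval that captures the truth with probability 0.95 (independence is an assumption made here for illustration), the number of successful polls is binomial:

```r
n_polls  <- 20
coverage <- 0.95

# Probability that at least 19 of the 20 intervals capture the truth
p_at_least_19 <- pbinom(18, size = n_polls, prob = coverage, lower.tail = FALSE)

# Probability that exactly 19 of the 20 do
p_exactly_19 <- dbinom(19, size = n_polls, prob = coverage)

c(at_least_19 = p_at_least_19, exactly_19 = p_exactly_19)
```

Neither outcome is guaranteed in any particular batch of 20 polls. The 95% figure describes the long-run proportion.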
R confidence intervals
set.seed(12345)
# First, in R, install.packages("TeachingDemos")
library(TeachingDemos)
# Draw 25 observations from a normal distribution
x <- rnorm(25, mean = 100, sd = 5)
## Compute a Z-test of the hypothesis mu = 120
z.test(x, mu = 120, stdev = 5, conf.level = 0.95)
One Sample z-test
data: x
z = -20.0059, n = 25, Std. Dev. = 5, Std. Dev. of the sample mean = 1, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 120
95 percent confidence interval:
98.03415 101.95408
sample estimates:
mean of x
99.99411
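The interval reported by z.test() can also be computed in base R without the TeachingDemos package. This is a sketch that should give the same interval as the output above, assuming the same seed and a known σ of 5:

```r
set.seed(12345)
x <- rnorm(25, mean = 100, sd = 5)

sigma  <- 5               # known population standard deviation
z_star <- qnorm(0.975)    # critical value for 95% confidence
margin <- z_star * sigma / sqrt(length(x))

# z interval: sample mean plus or minus the margin of error
z_ci <- c(mean(x) - margin, mean(x) + margin)
z_ci
```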
set.seed(12345)
x <- rnorm(25, mean = 100, sd = 5)
## Compute a t-test of the hypothesis mu = 120
t.test(x, mu = 120, conf.level = 0.95)
One Sample t-test
data: x
t = -21.1677, df = 24, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 120
95 percent confidence interval:
98.04349 101.94473
sample estimates:
mean of x
99.99411
# plots of Z and t
x <- seq(-4,4,length=100)
plot(x,dnorm(x),type="l",ylab="Density",xlab="Z, t")
lines(x,dt(x,df=10),lty=2,col=2)
legend(-4,max(dnorm(x)),c("Z","t (df=10)"),lty=c(1,2),col=c(1,2),cex=.5)
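The heavier tails of the t density in the plot translate into larger critical values, which is why the t interval is slightly wider than the z interval for the same data. A brief sketch comparing 95% critical values (the choice of degrees of freedom is illustrative):

```r
df_values <- c(5, 10, 24, 100)

t_crit <- qt(0.975, df = df_values)   # t critical values
z_crit <- qnorm(0.975)                # standard Normal critical value

data.frame(df = df_values,
           t_crit = round(t_crit, 3),
           z_crit = round(z_crit, 3))
```

The t critical value shrinks toward the Normal value as the degrees of freedom increase.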
set.seed(12345)
x <- rnorm(100, mean = 10)
# Use the t.test() function to compute a confidence interval
# for mu.x when the variance is unknown
t.test(x, conf.level = 0.95)$conf.int
# Of course, you could do it manually
mean(x)-qt(0.975,df=length(x)-1)*sqrt(var(x)/length(x))
mean(x)+qt(0.975,df=length(x)-1)*sqrt(var(x)/length(x))
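As a check, the manual calculation and t.test() can be compared directly. This sketch repeats the setup above so that it runs on its own:

```r
set.seed(12345)
x <- rnorm(100, mean = 10)

# Manual t interval: mean plus or minus t* times the standard error
se        <- sd(x) / sqrt(length(x))
manual_ci <- mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * se

# Interval from t.test(), with the conf.level attribute dropped
builtin_ci <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

all.equal(manual_ci, builtin_ci)
```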
set.seed(12345)
x <- rnorm(100, mean = 10)
y <- rnorm(100, mean = 5)
# Use the t.test() function to compute a confidence interval
# for mu.x - mu.y when the variances are unknown and unequal
t.test(x, y, conf.level = 0.95, var.equal = FALSE)$conf.int
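The two-sample interval can be reproduced by hand as well. With var.equal = FALSE, t.test() uses the Welch approximation for the degrees of freedom. The sketch below computes that approximation explicitly and compares the result with the built-in interval:

```r
set.seed(12345)
x <- rnorm(100, mean = 10)
y <- rnorm(100, mean = 5)

# Squared standard errors of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)
se <- sqrt(vx + vy)

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Interval for mu.x - mu.y
welch_ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_welch) * se

builtin_ci <- as.numeric(
  t.test(x, y, conf.level = 0.95, var.equal = FALSE)$conf.int
)
all.equal(welch_ci, builtin_ci)
```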