Confidence Interval Review – Solutions
1) Julia enjoys jogging. She has been jogging over a period of several years, during which
time her physical condition has remained relatively constant. Usually, she jogs 2 miles per
day. The standard deviation of all of her times is σ = 1.80 minutes. During the past year
Julia has recorded her times required to run 2 miles. She has a random sample of 90 of
these times. For these 90 times the mean is x̄ = 15.60 minutes. Let µ be the mean jogging
time for the entire distribution of Julia’s 2-mile running times. Find a 95% confidence
interval for µ. Follow the inference toolbox.
(1) Choose procedure and check conditions: Use a one-sample z confidence interval for a
population mean (because we know the population standard deviation).
(a) SRS: We are told the sample is a random sample of her times.
(b) Normality: We can assume the sampling distribution of her sample mean times (x̄)
is at least approximately normal (by the CLT with n = 90 ≥ 30 ).
(c) Independence: We are willing to assume the 90 times make up less than 10% of her
jogging times, since she has been jogging over the past several years (jogging every
day for three years would produce over 900 times).
(2) Do the math: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 15.60 \pm 1.960 \cdot \frac{1.80}{\sqrt{90}} = (15.228, 15.972)$
(3) Conclusion: We are 95% confident that the mean time for Julia to jog 2 miles is between
15.228 and 15.972 minutes.
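The arithmetic in step (2) can be checked with a short script. This is only a sketch: the function name `z_interval_mean` is hypothetical, and it takes z* from Python's standard-library normal distribution rather than from a table, so z* is 1.95996... instead of the rounded 1.960.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_mean(xbar, sigma, n, confidence):
    """One-sample z interval for a mean when sigma is known (hypothetical helper)."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Julia's jogging times: x-bar = 15.60, sigma = 1.80, n = 90, 95% confidence.
low, high = z_interval_mean(xbar=15.60, sigma=1.80, n=90, confidence=0.95)
print(f"({low:.3f}, {high:.3f})")  # prints "(15.228, 15.972)"
```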
In the above problem, what sample size is required to have a margin of error of at most
± 0.2 minutes?
$z^* \frac{\sigma}{\sqrt{n}} \le m$
$1.960 \cdot \frac{1.80}{\sqrt{n}} \le 0.2$
$\frac{3.528}{\sqrt{n}} \le 0.2$
$\frac{3.528}{0.2} = 17.64 \le \sqrt{n}$
$311.1696 \le n$, so $n \ge 312$.
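The same sample-size calculation can be sketched in code by solving the margin-of-error inequality for n and rounding up. The helper name `sample_size_for_mean` is an assumption made for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z* * sigma / sqrt(n) <= margin (hypothetical helper)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve z* * sigma / sqrt(n) <= margin for n, then round up to a whole number.
    return ceil((z_star * sigma / margin) ** 2)

n_needed = sample_size_for_mean(sigma=1.80, margin=0.2, confidence=0.95)
print(n_needed)  # prints 312
```

Rounding up rather than to the nearest integer matters here: n = 311 would leave the margin of error slightly above 0.2 minutes.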
2) How much does a sleeping bag cost? Let’s say you want a sleeping bag that should keep
you warm in temperatures from 20 to 45°F. A random sample of prices ($) for sleeping
bags in this temperature range was taken from Backpacker Magazine: Gear Guide (Vol. 25,
Issue 157, No. 2):
80 90 100 120 75 37 30 23 100 110
105 95 105 60 110 120 95 90 60 70
Find a 90% confidence interval for the mean price µ of all sleeping bags used for this
temperature range. Follow the steps of the inference toolbox.
(1) Choose procedure and check conditions: Use a one-sample t confidence interval for the
population mean (we do not know the population standard deviation and must estimate it
with the sample standard deviation s).
(a) SRS: We are told that this sample is a random sample from the population of sleeping
bags in this temperature range.
(b) Normality:
[Figures: Boxplot of Price; Normal probability plot of Price (Percent vs. Price)]
The normal probability plot of the prices looks fairly linear with no apparent outliers
or severe skewness; the boxplot shows no outliers but is not quite symmetric.
However, the sample size is 20, so we can still use the t-procedures (in the absence of
strong skewness or influential outliers) because the sampling distribution of sample
mean prices (x̄) should be at least approximately normal.
(c) Independence: We assume there are more than 200 different prices of sleeping bags
in this temperature range, so the sample makes up less than 10% of the total
population, and the assumption of independence of observations is justified.
(2) Do the math: $\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 83.75 \pm 1.729 \cdot \frac{28.97}{\sqrt{20}} = (72.55, 94.95)$
(3) Conclusion: We are 90% confident that the mean price for sleeping bags useful for this
temperature range is between 72.55 and 94.95 dollars.
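As a check on the summary statistics and the interval, the following sketch recomputes x̄ and s from the raw prices. The function name `t_interval_mean` is hypothetical, and the critical value t* = 1.729 (df = 19) is passed in from the t table because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Sleeping bag prices ($) from the problem statement.
prices = [80, 90, 100, 120, 75, 37, 30, 23, 100, 110,
          105, 95, 105, 60, 110, 120, 95, 90, 60, 70]

def t_interval_mean(data, t_star):
    """One-sample t interval; t_star must be looked up for df = len(data) - 1."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin

low, high = t_interval_mean(prices, t_star=1.729)
print(f"mean = {mean(prices):.2f}, s = {stdev(prices):.2f}")
print(f"({low:.2f}, {high:.2f})")  # prints "(72.55, 94.95)"
```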
3) Suppose that 800 students were selected at random from a student body of 20,000 college
students and given shots to prevent a certain type of flu. All 800 students were exposed to
the flu, and 600 of them did not get the flu. Let p represent the probability that the shot
will be successful for any single student selected at random from the entire population of
20,000. Find a 99% confidence interval for p. Follow the steps of the inference toolbox.
(1) Choose procedure and check conditions: Use a one-sample z confidence interval for a
population proportion.
(a) SRS: We are told the students were selected at random from this particular population
of college students.
(b) Normality: 800(.75) = 600 > 10 and 800(.25) = 200 > 10, so we are willing to
assume the sampling distribution of sample proportions of students avoiding the flu
( p̂ ) is approximately normal.
(c) Independence: The sample of 800 makes up less than 10% of the total student body
of 20,000, so we are willing to consider the individual observations independent of
one another.
(2) Do the math: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .75 \pm 2.576 \sqrt{\frac{(.75)(.25)}{800}} = (.71057, .78943)$
(3) Conclusion: We are 99% confident that the proportion of students who do not get sick
after receiving the flu shot is between .71057 and .78943.
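A minimal sketch of the one-proportion procedure follows, including the success/failure count check from step (1b). The function name `proportion_interval` and the choice to raise an error when a count falls below 10 are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """One-sample z interval for a proportion (hypothetical helper)."""
    failures = n - successes
    # Normality condition used in the worksheet: both counts at least 10.
    if successes < 10 or failures < 10:
        raise ValueError("normal approximation condition not met")
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 600 of 800 vaccinated students avoided the flu; 99% confidence.
low, high = proportion_interval(successes=600, n=800, confidence=0.99)
print(f"({low:.4f}, {high:.4f})")
```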
4) For large U.S. companies, what percent of their total income comes from foreign sales?
A random sample of technology companies (IBM, Hewlett-Packard, Intel, and others) gave
the following data:
62.8 55.7 47.0 59.6 55.3 41.0 65.1 51.1
53.4 50.8 48.5 44.6 49.4 61.2 39.3 41.8
Another independent random sample of basic consumer product companies (Goodyear,
Sara Lee, H.J. Heinz, and others) gave the following data:
28.0 30.5 34.2 50.3 11.11 28.8 40.0 44.9 40.7
60.1 23.1 21.3 42.8 18.0 36.9 28.0 32.5
Find each sample mean, sample standard deviation, and sample size.
Technology: n1 = 16, x̄1 = 51.66, s1 = 7.93
Consumer: n2 = 17, x̄2 = 33.60, s2 = 12.26
Find a 90% confidence interval for µ1-µ2 when µ1 represents the technology companies and
µ2 represents the consumer companies. Follow the inference toolbox.
(1) Choose procedure and check conditions: Use a two-sample t confidence interval for
the difference in two means.
(a) SRS: Both samples are said to be random samples from their respective populations
and are independent of one another.
(b) Normality: For the technology sample, the normal probability plot of the sample
data is roughly linear, so we assume the sampling distribution of the sample mean
percentage (x̄1) is approximately normal. The normal probability plot for the
consumer companies is also roughly linear, so we can again assume the sampling
distribution of the sample mean percentage (x̄2) is approximately normal.
[Figure: Normal probability plot of Tech and Consumer (Percent vs. Data)]
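(2) Do the math:
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (51.66 - 33.60) \pm 1.753 \sqrt{\frac{7.93^2}{16} + \frac{12.26^2}{17}} = (11.981, 24.143)$
(The endpoints shown appear to come from technology, which uses about 27.6 degrees of
freedom and t* ≈ 1.702; the conservative table value t* = 1.753 with df = 15 gives
approximately (11.80, 24.33).)
(3) Conclusion: We are 90% confident that the difference in the mean percentage of
income that comes from foreign sales for technology companies and the mean
percentage of income that comes from foreign sales for consumer product companies is
between 11.981 and 24.143 percentage points.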
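Two-sample t intervals depend on which degrees of freedom are used. The sketch below computes the standard error and the Welch-Satterthwaite degrees of freedom from the summary statistics above. That approximation is a standard formula but is not stated in this worksheet, so its use here is an assumption, and the helper name `welch_se_and_df` is hypothetical.

```python
from math import sqrt

def welch_se_and_df(s1, n1, s2, n2):
    """Standard error of x1-bar minus x2-bar and Welch-Satterthwaite df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df

# Technology (sample 1) and consumer (sample 2) summary statistics.
se, df = welch_se_and_df(s1=7.93, n1=16, s2=12.26, n2=17)
conservative_df = min(16, 17) - 1  # the by-hand choice: smaller n minus 1

print(f"SE = {se:.4f}, Welch df = {df:.2f}, conservative df = {conservative_df}")
# Margin of error with the table value t* = 1.753 for df = 15:
print(f"margin with t* = 1.753: {1.753 * se:.3f}")
```

The larger Welch degrees of freedom give a smaller t* and therefore a somewhat narrower interval than the conservative by-hand approach.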
5) The U.S. Department of Commerce Environmental Data Service gave the following
information about average temperature (°F) in January in Phoenix, Arizona, for the past
39 years. Assume σ = 3.04 °F.
52.8 52.3 50.4 52.2 51.6 50.7 52.7 43.7 54.0 53.8
49.7 52.4 54.5 49.9 51.5 48.5 52.6 48.4 53.3 51.2
46.7 52.4 50.7 53.0 51.7 54.2 51.4 43.2 48.7 49.6
51.4 56.0 54.6 42.8 54.0 52.3 48.5 51.9 54.9
Find a 99% confidence interval for the January mean temperature in Phoenix. Follow the
inference toolbox.
(1) Choose procedure and check conditions: Use a one-sample z confidence interval for a
population mean (because we are given the population standard deviation).
(a) SRS: We are not told that this is an SRS. However, it is probably safe to assume
this sample is representative of the January temperatures in Phoenix.
(b) Normality: We are not told that the distribution of temperatures is normal; however,
because our sample size is relatively large (n = 39), the Central Limit Theorem
assures us that the sampling distribution of sample mean temperatures (x̄) is at
least approximately normal.
(c) Independence: We are willing to assume that the observations are independent of
one another (temperatures from year to year probably are, for the most part).
(2) Do the math: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 51.13 \pm 2.576 \cdot \frac{3.04}{\sqrt{39}} = (49.879, 52.387)$
(3) Conclusion: We are 99% confident that the true mean temperature in Phoenix, Arizona
in January is between 49.879 and 52.387 (°F).
6) Suppose an archaeologist discovers only 7 fossil skeletons from a previously unknown
species of miniature horse. Reconstructions of the skeletons of the 7 miniature horses show
the shoulder heights (in cm) to be:
45.3 47.1 44.2 46.8 46.5 45.5 47.6
Even though one condition is violated, find a 99% confidence interval for µ, the mean
shoulder height of the entire population of such a horse. Follow the inference toolbox. In
your work mention the condition that is violated and state why you can still carry on with
the procedure.
(1) Choose a procedure and check conditions: Use a one-sample t confidence interval for
a population mean.
(a) SRS: This is not quite an SRS – it is made up of all of the 7 skeletons that are
known to exist; however, we will carry out the t-procedure, keeping in mind that
generalizing to the entire extinct population might be somewhat of a stretch.
(b) Normality: Although we only have 7 data points, the normal probability plot does
not show any obvious deviations from normality, so we are willing to conclude that
the sampling distribution of sample mean shoulder heights (x̄) is approximately
normal. We should proceed with caution because our sample size is so small (i.e.,
our results might be somewhat inaccurate).
[Figure: Normal probability plot of Shoulder Height (Percent vs. Shoulder Height)]
(c) Independence: We are willing to assume that the entire extinct population
numbered over 70 (10 times the size of the sample), and hence will assume that these
observations are independent of one another. (However, all of these skeletons were
found in the same place, and it is conceivable that there is a reason for this that might
be related to their shoulder heights.)
(2) Do the math: $\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 46.142 \pm 3.707 \cdot \frac{1.19}{\sqrt{7}} = (44.475, 47.81)$
(3) Conclusion: We are 99% confident that the mean shoulder height µ for this species of
miniature horse is between 44.475 and 47.81 cm. (But keep in mind the warnings from step 1!)
7) David E. Brown is an expert in wildlife conservation. In his book The Wolf in the Southwest:
The Making of an Endangered Species (University of Arizona Press), he records the
following weights of adult gray wolves from two regions in Old Mexico.
Chihuahua region (in pounds) – sample 1:
86 75 91 70 79 80 68 71 74 64
Durango region (in pounds) – sample 2:
68 72 79 68 77 89 68 59 63 66 58 54 62 71 55 59 68 67
Find a 90% confidence interval for the difference in the mean weights between the wolves
from the two regions. Follow the inference toolbox.
Chihuahua: n1 = 10, x̄1 = 75.8, s1 = 8.324
Durango: n2 = 18, x̄2 = 66.83, s2 = 8.867
(1) Choose a procedure and check conditions: Use a two-sample t confidence interval for
the difference in two means.
(a) SRS: Although it is not stated, we will assume that both samples are random samples
from their respective populations (regions), and that they are independent samples.
However, if these sample data were not collected via SRSs, we will not be able to
generalize our results to the two populations of wolves.
(b) Normality: Both normal probability plots look roughly linear, so we are willing to
assume that the sampling distributions of sample mean Chihuahua wolf weights (x̄1)
and Durango wolf weights (x̄2) are approximately normal.
[Figure: Normal probability plot of Chihuahua and Durango (Percent vs. Data)]
(c) Independence: We will assume that both populations are at least 10 times the size of
their corresponding sample, and hence will consider the observations in each sample
independent of one another.
(2) Do the math:
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (75.8 - 66.83) \pm 1.83 \sqrt{\frac{8.32^2}{10} + \frac{8.86^2}{18}} = (2.812, 15.127)$
(3) Conclusion: We are 90% confident that the true difference in the mean weight of wolves
from the Chihuahua region and the mean weight of those from the Durango region of Old
Mexico is between 2.812 and 15.127 pounds.
8) What price do farmers get for their watermelons? In the third week of July, a random
sample of 40 farming regions gave a sample mean of $6.88 per 100 pounds of watermelon.
Assume that σ is known to be $1.92 per 100 pounds. ( Reference: Agricultural Statistics,
U.S. Department of Agriculture). Find a 90% confidence interval for the population mean
price (per 100 pounds) that farmers in this region get for their watermelon crop. Follow
the inference toolbox.
(1) Choose a procedure and check conditions: Use a one-sample z confidence interval
for a population mean (because we know the population standard deviation).
(a) SRS: We are told that the sample was a random sample from a population of
farming regions.
(b) Normality: We are not told that the distribution of prices is normal. By the
Central Limit Theorem (CLT), we can conclude that the sampling distribution of
sample mean watermelon prices per 100 lbs. (x̄) is at least approximately normal.
(c) Independence: We will assume the observations are independent of one another
(i.e., we assume that there are more than 400 farming regions).
(2) Do the math: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 6.88 \pm 1.645 \cdot \frac{1.92}{\sqrt{40}} = (6.3807, 7.3793)$
(3) Conclusion: We are 90% confident that the true mean price for 100 pounds of watermelon
is between 6.38 and 7.38 dollars.
Assume a confidence interval for the watermelon data was calculated and found to be
(6.285 to 7.475). What is the confidence level for this interval?
Since the mean is 6.88, the margin of error is 7.475 − 6.88 = .595.
Then $.595 = z^* \frac{1.92}{\sqrt{40}}$ and $.595 = z^* (.30357)$.
Therefore, z* ≈ 1.96, the critical value that corresponds to the 95% confidence level.
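Working backward from an interval to its confidence level can also be sketched in code: recover z* from the margin of error, then convert z* to a central area under the standard normal curve. The function name `implied_confidence` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def implied_confidence(low, high, sigma, n):
    """Confidence level of a z interval for a mean, given its endpoints."""
    margin = (high - low) / 2          # half the interval width
    z_star = margin / (sigma / sqrt(n))
    # Central area between -z* and +z* under the standard normal curve.
    return 2 * NormalDist().cdf(z_star) - 1

level = implied_confidence(6.285, 7.475, sigma=1.92, n=40)
print(f"{level:.3f}")  # prints "0.950"
```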
9) Most married couples have two or three personality preferences in common. Myers
used a random sample of 375 married couples and found that 132 had three preferences in
common. Another sample of 571 couples found that 217 had two personality preferences in
common. Let p1 be the population proportion of all married couples with three personality
preferences in common and let p2 be the population proportion of all married couples with
two personality preferences in common. Find a 90% confidence interval for the difference
in these two proportions. Follow the steps of the inference toolbox.
(1) Choose procedure and check conditions: Use a two-sample z confidence interval for
the difference in two population proportions, with p̂1 = 132/375 = .352 and
p̂2 = 217/571 = .38.
(a) SRS: We are told that both samples are random samples from their respective
populations, and we assume that the two samples are independent of one another.
(b) Normality: Since 132 ≥ 5 and 243 ≥ 5, and 217 ≥ 5 and 354 ≥ 5, all counts of
successes and failures are at least 5, so we can assume the sampling distributions of
the sample proportions p̂1 and p̂2 are approximately normal.
(c) Independence: 10(375) = 3750 and 10(571) = 5710 ; we assume there are more than
5710 married couples. We can therefore assume independence of individual
observations.
(2) Do the math:
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (.352 - .38) \pm 1.645 \sqrt{\frac{.352(1-.352)}{375} + \frac{.38(1-.38)}{571}} = (-.0806, .02452)$
(3) Conclusion: We are 90% confident that the true difference in the proportion of couples
that have 3 personality preferences in common and the proportion of couples that have 2
personality preferences in common is between -0.0806 and 0.02452.
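The two-proportion calculation can be sketched as below. The function name `two_proportion_interval` is hypothetical, and the code keeps the unrounded sample proportions (for example p̂2 = 217/571 ≈ .38004), so its endpoints can differ in the last digit from work done with rounded values.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence):
    """Two-sample z interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# 132 of 375 couples (three preferences) vs. 217 of 571 couples (two preferences).
low, high = two_proportion_interval(132, 375, 217, 571, confidence=0.90)
print(f"({low:.4f}, {high:.4f})")
```

Because the interval contains 0, these data do not rule out the possibility that the two proportions are equal.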
10) The home run percentage is the number of home runs per 100 times at bat. A random
sample of 43 professional baseball players gave the following statistics for home run
percentages. (Reference: The Baseball Encyclopedia, Macmillan). For these data, x̄ = 2.29
and s = 1.40. Compute a 95% confidence interval for the population mean µ of home run
percentages for all professional baseball players. Follow the steps of the inference toolbox.
(1) Choose procedure and check conditions: Use a one-sample t confidence interval for a
population mean (even though we’re given the standard deviation, note that it is the
sample standard deviation, s, hence we use t instead of z).
(a) SRS: It is stated that this is a random sample from the population of (presumably)
all professional baseball players.
(b) Normality: We do not know if the home run percentages are normally distributed.
However, because the sample size is 43, the CLT assures us that the sampling
distribution of sample mean home run percentages (x̄) is at least approximately
normal.
(c) Independence: Mathematically, we know that the sample size is less than 10% of
the population. However, logically, there might be a dependent relationship
between one player’s home run average and another player’s average, so we might
want to be careful about the interpretation of the results of our calculations.
(2) Do the math: $\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 2.29 \pm 2.021 \cdot \frac{1.40}{\sqrt{43}} = (1.8591, 2.7209)$
(3) Conclusions: We are 95% confident that the mean home run percentage in 100 at bats
for professional baseball players is between 1.8591 and 2.7209.
11) The manager of the dairy section of a large supermarket took a random sample of 250
egg cartons and found that 40 cartons had at least one broken egg. Find a 90% confidence
interval for p. Follow the steps of the inference toolbox.
(1) Choose procedure and check conditions: Use a one-sample z confidence interval for a
population proportion, with p̂ = 40/250 = .16.
(a) SRS: We are told the sample is a random sample of egg cartons from the population
of all egg cartons at this large supermarket.
(b) Normality: npˆ = 40 and n (1 − pˆ ) = 210 , so we can assume the sampling distribution
of sample proportions ( p̂ ) is at least approximately normal.
(c) Independence: We are willing to assume the cartons are independent of one another,
and we assume that the population consists of more than 2500 egg cartons.
(2) Do the math: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .16 \pm 1.645 \sqrt{\frac{.16(1-.16)}{250}} = (.12186, .19814)$
(3) Conclusion: We are 90% confident that the proportion p of egg cartons in the dairy
section of this particular large supermarket that have at least one broken egg is between
.12186 and .19814.
Assume the manager repeats this same process in a different week and finds the same sample
proportion of cartons with broken eggs among 250 egg cartons. Since the manager knows some
statistics, he calculates a confidence interval for the proportion and finds it to be
(.11456, .20544). A couple of days later he looks at his work and realizes he forgot to write
down the confidence level for his interval. Find the manager’s confidence level for this interval.
The margin of error is m = .20544 − .16 = .04544.
$z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = m$
$z^* \sqrt{\frac{.16(1-.16)}{250}} = .04544$
$z^* (.023186) = .04544$
z* = 1.9597, which is close to the z-score of 1.96 that corresponds to a confidence level of 95%.
12) Independent random samples of professional football and basketball players’ heights
gave the following information (Reference: Sports Encyclopedia of Pro Football and Official
NBA Basketball Encyclopedia).
Football: n1 = 45, x̄1 = 6.179 feet and s1 = 0.366 feet
Basketball: n2 = 40, x̄2 = 6.453 feet and s2 = 0.314 feet
Follow the steps of the inference toolbox to construct a 95% confidence interval for the
difference between the means of football and basketball players’ heights.
(1) Choose procedure and check conditions: Use a two-sample t confidence interval for the
difference in two population means.
(a) SRS: We are told the samples are independent random samples from their
corresponding populations.
(b) Normality: We are not told that the players’ heights vary normally. However, this
is typically true of heights. Even so, by the CLT, we can conclude that the sampling
distributions of x̄1 and x̄2 do vary at least approximately normally.
(c) Independence: We are willing to treat the observations as independent observations
because each sample makes up less than 10% of its corresponding population.
(2) Do the math:
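$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (6.179 - 6.453) \pm 2.042 \sqrt{\frac{.366^2}{45} + \frac{.314^2}{40}} = (-.4207, -.1273)$
(The endpoints shown appear to come from technology, which uses about 83 degrees of
freedom and t* ≈ 1.989; the table value t* = 2.042 with df = 30 gives approximately
(−.4246, −.1234).)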
(3) Conclusion: We are 95% confident that the difference between the mean height of
professional football players and the mean height of professional basketball players
(football minus basketball) is between −0.4207 and −0.1273 feet.
13) At a community hospital, the burn center is experimenting with a new plasma compress
treatment. A random sample of n1 = 316 patients with minor burns received the plasma
compress treatment. Of these patients, it was found that 259 had no visible scars after
treatment. Another random sample of n2 = 419 patients with minor burns received no
plasma compress treatment. For this group, it was found that 94 had no visible scars after
treatment. Let p1 be the population proportion of all patients with minor burns receiving
the plasma compress treatment that have no visible scars. Let p2 be the population
proportion of all patients with minor burns not receiving the plasma compress treatment
that have no visible scars.
Find a 95% confidence interval for p1 – p2. Follow the steps of the inference toolbox.
(1) Choose procedure and check conditions: Use a two-sample z confidence interval for
the difference in two population proportions, with p̂1 = 259/316 = .82 and
p̂2 = 94/419 = .22.
(a) SRS: We are told the samples are random samples from their corresponding
populations, and we will assume that these samples are independent of one
another.
(b) Normality: Since 316(.82) and 316(1 − .82) are both greater than 5, and 419(.22) and
419(1 − .22) are both greater than 5, we can assume the sampling distribution of
p̂1 − p̂2 is approximately normal.
(c) Independence: We can assume there are more than 316(10) and 419(10) burn
patients, hence we are willing to consider each patient independent from each of
the other patients within the same sample.
(2) Do the math:
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (.82 - .22) \pm 1.96 \sqrt{\frac{.82(1-.82)}{316} + \frac{.22(1-.22)}{419}} = (.53703, .65352)$
(The endpoints are computed with the unrounded proportions p̂1 = .8196 and p̂2 = .2243.)
(3) Conclusion: We are 95% confident that the true difference p1 – p2 in the proportion of
patients that have no visible scars after plasma compress treatment (p1) and the
proportion of patients that have no scars without the plasma compress treatment (p2) is
between 0.53703 and 0.65352.
14) Attending sporting events is a popular source of entertainment. When 1000 people
were surveyed, 590 said that getting together with friends was an important reason for
attending a sporting event (USA Today). Find a 99% confidence interval for p. Follow the
steps of the inference toolbox.
(1) Population and Parameter: We wish to estimate p, the proportion of people who would
say that getting together with friends is an important reason for attending a sporting
event.
(2) Choose procedure and check conditions: Use a one-sample z confidence interval for a
population proportion, with p̂ = 590/1000 = .59.
(a) SRS: We are not told whether this is a random sample. If it is not, we need to be
careful about generalizing our findings to the entire population.
(b) Normality: np̂ = 590 and n(1 – p̂) = 410. Both are greater than 10, so we can
assume that the distribution of p̂ is at least approximately normal.
(c) Independence: We are not told that the sample observations are independent of one
another, so we must be careful about interpreting the results of our calculations.
(3) Do the math: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .59 \pm 2.576 \sqrt{\frac{.59(1-.59)}{1000}} = (.54994, .63006)$
(4) Conclusion: We are 99% confident that the proportion of people who would say that
getting together with friends is an important reason for attending a sporting event is
between .54994 and .63006.
Assume we have the same sample proportion as stated above but an unknown sample size.
If we want to be 99% confident and want the margin of error to be ± .06, how large of a
sample is needed?
$2.576 \sqrt{\frac{.59(1-.59)}{n}} \le .06$
$\frac{2.576 \sqrt{.59(.41)}}{.06} \le \sqrt{n}$
$21.116 \le \sqrt{n}$
$21.116^2 \le n$
$n \ge 445.887$
So we need a sample size of at least n = 446.
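The sample-size calculation for a proportion follows the same pattern as for a mean. In this sketch the function name `sample_size_for_proportion` and the parameter name `p_guess` (the proportion assumed in advance, here the earlier sample proportion .59) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p_guess, margin, confidence):
    """Smallest n with z* * sqrt(p(1 - p) / n) <= margin (hypothetical helper)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve the margin-of-error inequality for n and round up.
    return ceil(z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2)

n_needed = sample_size_for_proportion(p_guess=0.59, margin=0.06, confidence=0.99)
print(n_needed)  # prints 446
```

When no prior estimate of p is available, a common conservative choice is `p_guess=0.5`, which maximizes p(1 − p) and therefore the required n.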
15) From public records, individuals were identified as having been charged with drunken
driving not less than 6 months or more than 12 months from the starting date of a given
study. Two random samples from this group were studied. In the first sample of 30
individuals, the respondents were asked in face-to-face interviews if they had been charged
with drunken driving in the last 12 months. Of these 30 people, 16 answered the question
accurately. The second random sample consisted of 46 people who had been charged with
drunken driving. During a telephone interview, 25 of these responded accurately. Assume
the samples are representative of all the people recently charged with drunken driving. Let
p1 be the population proportion of those interviewed face-to-face that answered correctly
and let p2 be the population proportion of those interviewed over the phone that answered
correctly. Find a 98% confidence interval for p1 – p2. Follow the steps of the inference
toolbox.
(1) Choose procedure and check conditions: Use a two-sample z confidence interval for
the difference in two population proportions, with p̂1 = 16/30 = .533 and
p̂2 = 25/46 = .5434.
(a) SRS: We are told both samples are random samples, and we are willing to assume
that the samples are independent.
(b) Normality: n1p̂1 = 16 and n1(1 – p̂1) = 14, while n2p̂2 = 25 and n2(1 – p̂2) = 21. All
are greater than 5, so we can assume that the distributions of both sample proportions
are at least approximately normal.
(c) Independence: We are not told that the sample observations are independent of one
another, but this seems like a reasonable assumption in this case.
(2) Do the math:
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (.533 - .5434) \pm 2.326 \sqrt{\frac{.533(1-.533)}{30} + \frac{.5434(1-.5434)}{46}} = (-.2823, .26205)$
(3) Conclusion: We are 98% confident that the difference p1 – p2 between the proportion of
individuals charged with drunken driving who answered accurately when interviewed
face-to-face (p1) and the proportion who answered accurately when interviewed via phone
(p2) is between -.2823 and .26205.