Download Sampling Distribution

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Resampling (statistics) wikipedia , lookup

German tank problem wikipedia , lookup

Student's t-test wikipedia , lookup

Misuse of statistics wikipedia , lookup

Transcript
Name _______________________________
Period _____________
Test 10 Confidence Intervals Homework
(Chpt 10.1, 11.1, 12.1)
For 1 & 2, determine the point estimator you would use and calculate its value.
1.
How many pairs of shoes, on average, do female teens have? To find out, an AP Statistics class conducted
a survey. They selected an SRS of 20 female students from their school. Then they recorded the number
of pairs of shoes that each student reported having. Here are the data:
50
26
26
31
57
19
24
22
23
38
13
50
13
34
23
30
49
13
15
51
607
Point estimator would be the sample mean, π‘₯,
οΏ½ π‘₯Μ… =
= 30.35
20
2. Tonya wants to estimate what proportion of the seniors in her school plan to attend the prom. She
interviews an SRS of 50 of the 750 seniors in her school and finds that 36 plan to go to the prom.
36
Point estimator would be the sample proportion, 𝑝̂ , 𝑝̂ = = .72
50
3. Young people have a better chance of full-time employment and good wages if they are good with
numbers. One source of data is the National Assessment of Educational Progress (NAEP) Young Adult
Literacy Assessment Survey, which is based on a nationwide probability sample of households. The NAEP
survey includes a short test of quantitative skills and scores on the test range from 0 to 500. Suppose
that you give the NAEP test to an SRS of 840 people from a large population in which the scores have
mean 280 and standard deviation 𝜎𝜎 = 60. The mean π‘₯Μ… of the 840 scores will vary if you take repeated
samples.
a) Describe the shape, center, and spread of the sampling distribution of π‘₯Μ… .
We can’t verify that the population distribution is Normal, but since n = 840 the CLT says the
sampling distribution will be approximately Normal.
10% condition: 10(840) = 8400; there are at least 8400 people in the population
πœ‡πœ‡π‘₯Μ… = πœ‡πœ‡ = 280;
𝜎𝜎π‘₯Μ… =
60
√840
= 2.0702
N(280, 2.0702)
b) Sketch the sampling distribution of π‘₯Μ… . Mark its mean and the values one, two, and three standard
deviations on either side of the mean.
274 276 278 280 282 284 286
c) According to the 68-95-99.7 rule, about 95% of all values of π‘₯Μ… lie within a distance m of the mean of
the sampling distribution. What is m? Shade the region on the axis of your sketch that is within m of
the mean.
2 standard deviations, π‘š = 2(2.0702) = 4.1704
d) Whenever π‘₯Μ… falls in the region you shaded, the population πœ‡πœ‡ lies in the confidence interval π‘₯Μ… ± π‘š. For
what percent of all possible samples does this interval capture πœ‡πœ‡?
95% of all samples will capture πœ‡πœ‡.
e) Below your sketch, choose one value of π‘₯Μ… inside the shaded region and draw its corresponding
confidence interval. Do the same for one value of π‘₯Μ… outside the shaded region. What is the most
important difference between these intervals?
One captures πœ‡πœ‡, the other does not.
4. The figure to the right shows the result of taking 25 SRSs
from a Normal population and constructing a confidence
interval for each sample. Which confidence level – 80%, 90%,
95%, or 99% - do you think was used? Explain.
5. A New York Times/CBS News Poll asked the question, β€œDo you favor an amendment to the Constitution
that would permit organized prayer in public schools?” Sixty-six percent of the sample answered β€œYes”.
The article describing the poll says that it β€œis based on telephone interviews conducted from Sept. 13 to
Sept. 18 with 1,664 adults around the United States, excluding Alaska and Hawaii…The telephone
numbers were formed by random digits, thus permitting access to both listed and unlisted residential
numbers.” The article gives the margin of error for a 95% confidence level as 3 percentage points.
a) Explain what the margin of error means to someone who knows little statistics.
If we were to repeat the sampling procedure many times, the sample proportion would be within 3
percentage points of the true proportion in 95% of the intervals.
b) State and interpret the 95% confidence interval.
(.63, .69) We are 95% confident that the interval from .63 to .69 captures the true proportion of
those who favor an amendment to the Constitution that would permit organized prayer in school.
c) Interpret the confidence level.
The method that was used to construct the interval will capture the true proportion of those who
favor an amendment to the Constitution that would permit organized prayer in school about 95%
of the time in repeated sampling.
6. The news article in problem 5 goes on to say: β€œThe theoretical errors do not take into account a margin of
additional error resulting from the various practical difficulties in taking an survey of public opinion.” List
some of the β€œpractical difficulties” that may cause errors in addition to the ±3 percentage point margin
of error. Nonresponse and undercoverage of those who do not have telephones or have only cell phones.
7. A 95% confidence interval for the mean body mass index (BMI) of young American women in 26.8 ± 0.6.
Discuss whether each of the following explanations is correct.
a) We are confident that 95% of all young women have BMI between 26.2 and 27.4.
INCORRECT – the interval is for the mean, not individuals
b) We are 95% confident that future samples of young women will have mean BMI between 26.2 and
27.4.
INCORRECT – we are not predicting for samples
c) Any value from 26.2 to 27.4 is believable as the true mean BMI of young American women.
CORRECT – provided an interval of plausible values for πœ‡πœ‡
d) In 95% of all possible samples, the population mean BMI will be between 26.2 and 27.4.
INCORRECT – πœ‡πœ‡ is constant, it’s either in the interval or it’s not
e) The mean BMI of young American women cannot be 28.
INCORRECT – our interval may be one that missed πœ‡πœ‡
8. A researcher plans to use a random sample of n = 500 families to estimate the mean monthly family
income for a large population. A 99% confidence interval based on the sample would be ________ than a
90% confidence interval.
a) narrower and would involve a larger risk of being incorrect
b) wider and would involve a smaller risk of being incorrect
c) narrower and would involve a smaller risk of being incorrect
d) wider and would involve a larger risk of being incorrect
e) wider, but it cannot be determined whether the risk of being incorrect would be larger or smaller
9. You have measured the systolic blood pressure of an SRS of 25 company employees. A 95% confidence
interval for the mean systolic blood pressure for the employees of this company is (122, 138). Which of
the following statements gives a valid interpretation of this interval?
a) 95% of the sample of employees have a systolic blood pressure between 122 and 138.
b) 95% of the population of employees have a systolic blood pressure between 122 and 138.
c) If the procedure were repeated many times, 95% of the resulting confidence intervals would
contain the population mean systolic blood pressure.
d) The probability that the population mean blood pressure is between 122 and 138 is 0.95.
e) If the procedure were repeated many time, 95% of the sample means would be between 122 and
138.
For 10 & 11, check whether each of the conditions is met for calculating a confidence interval for the
population proportion p.
10. Latoya wants to estimate what proportion of the seniors at her boarding high school like the cafeteria
food. She interviews an SRS of 50 of the 175 seniors living in the dormitory. She finds that 14 think the
cafeteria food is good.
RANDOM – stated SRS
NORMAL – 14 yes, 36 no; both > 10
INDEPENDENT – No, 10(50) = 500 but the population is only 175
11. In the National AIDS Behavioral Surveys sample of 2673 adult heterosexuals, 0.2% had both received a
blood transfusion and had a sexual partner from a group at high risk of AIDS. We want to estimate the
proportion p in the population who share these two risk factors.
RANDOM – ?? not sure…it doesn’t state how the sample was obtained
NORMAL – no. .002(2673) = 5.346 which is < 10
INDEPENDENT – 10(2673) = 26730; there are more than 26730 adult heterosexuals
12. Find z* for a 98% confidence interval using Table A or your calculator. Show your method.
Using a table, draw a curve and mark the middle 98%. There will be 1% on each side. Look up .0100 on
Table A and the corresponding z-score is 2.33 so z* = 2.33. Using Table B z* = 2.326. On the calculator:
invNorm(.01, 0, 1) = 2.32635
13. Tonya wants to estimate what proportion of her school’s seniors plan to attend the prom. She interviews
an SRS of 50 of the 750 seniors in her school and finds that 36 plan to go to the prom.
a) Identify the population and parameter of interest.
Population: seniors at Tonya’s high school
Parameter: true proportion who plan to attend prom
b) Check conditions for constructing a confidence interval for the parameter.
RANDOM – stated SRS
NORMAL – 36 yes, 14 no; both > 10
INDEPENDENT – Yes, 10(50) = 500 and there are 750 seniors at her school
c) Construct a 90% confidence interval for p. Show your method.
90% confidence z* = 1.645 one-sample z-interval for p; conditions have been met
1propzint(36, 50, .9)
OR
𝑝̂ =
36
50
= .72
. 72 ± 1.645οΏ½
.72(.28)
(. 616, .824)
50
d) Interpret the interval in context.
We are 90% confident that the interval from .616 to .824 captures the true proportion of seniors
who plan to attend the prom.
14. In a Harvard School of Public Health survey, 2105 of 10,904 randomly selected U.S. college students
were classified a abstainers (nondrinkers).
a) Construct and interpret a 99% confidence interval for p. Follow the process.
Parameter: true proportion of U.S. college students who abstain from drinking
99% confidence z* = 2.576
one-sample z interval for p if conditions are met
RANDOM – stated randomly selected
NORMAL – 2105 yes, 8799 no; both > 10
INDEPENDENT – Yes, 10(10904) = 109040 and there are more than 109040 U.S. college students
1propzint(2105, 10904, .99) OR
𝑝̂ =
2105
10904
= .193
. 193 ± 2.576οΏ½
.193(.807)
10904
(. 183, .203)
We are 99% confident that the interval from .183 to .203 captures the true proportion of U.S.
college students who are classified as abstainers.
b) A newspaper article claims that 25% of U.S. college students are nondrinkers. Use your result
from (a) to comment on this claim.
The value of 25% (.25) does not appear in our 99% confidence interval so it would be very
surprising if the claim were correct. It could be in the 1%.
c) Describe a possible source of error that is not included in the margin of error for the 99%
confidence interval.
Response bias: whether students told the truth or not
15. A random sample of students who took the SAT college entrance examination twice found that 427 of
the respondents had paid for coaching courses and that the remaining 2733 had not. Construct and
interpret a 99% confidence interval for the proportion of coaching among students who retake the SAT.
Follow the process.
Parameter: true proportion of students who got coaching to retake the SAT
99% confidence z* = 2.576
one-sample z interval for p if conditions are met
RANDOM – stated random sample
NORMAL – 427 successes, 2733 failures; both > 10
INDEPENDENT – Yes, 10(3160) = 31600 and there are more than 31600 students who take the SAT
twice
1propzint(427, 3160, .99) OR
𝑝̂ =
427
3160
= .135
. 135 ± 2.576οΏ½
.135(.865)
3160
(. 119, .151)
We are 99% confident that the interval from .119 to .151 captures the true proportion of students taking
the SAT twice who receive coaching.
16. A television news program conducts a call-in poll about a proposed city ban on handgun ownership. Of the
2372 calls, 1921 oppose the ban. The station, following recommended practice, makes a confidence
statement: β€œ81% of the Channel 13 Pulse Poll sample opposed the ban. We can be 95% confident that the
true proportion of citizens opposing a handgun ban is within 1.6% of the sample result.”
a) Is the station’s quoted 1.6% margin of error correct? Explain.
Mathematically it is correct (1propzint(1921, 2372, .95) = (.79407, .82566); .03159/2 = .015795),
but the RANDOM condition was not met so they should not calculate a confidence interval.
b) Is the station’s conclusion justified? Explain.
NO. This was a voluntary response sample so no inference should be made.
17. PTC is a substance that has a strong bitter taste for some people and is tasteless for others. The ability
to taste PTC is inherited. About 75% of Italians can taste PTC, for example. You want to estimate the
proportion of Americans who have at least one Italian grandparent and who can taste PTC.
a) How large a sample must you test to estimate the proportion of PTC tasters within 0.04 with 90%
confidence? Answer this question using the 75% estimate as the guessed value for 𝑝̂ .
𝑝�(1βˆ’π‘οΏ½)
𝑀𝐸 β‰₯ 𝑧 βˆ— οΏ½
.75(.25)
. 04 β‰₯ 1.645οΏ½
𝑛
𝑛
𝑛 β‰₯ 317.42
A sample size of 318 is needed
b) Answer the question in part (a) again, but this time use the conservative guess 𝑝̂ = 0.5. By how
much do the two sample sizes differ?
𝑝�(1βˆ’π‘οΏ½)
𝑀𝐸 β‰₯ 𝑧 βˆ— οΏ½
𝑛
OR 𝑛 β‰₯ οΏ½
π‘§βˆ—
2𝑀𝐸
οΏ½
2
𝑛β‰₯οΏ½
1.645 2
οΏ½
2(.04)
423 βˆ’ 318 = 105, the samples differ by 105,
𝑛 β‰₯ 422.816
A sample size of 423 is needed
18. Gloria Chavez and Ronald Flynn are the candidates for mayor in a large city. We want to estimate the
proportion p of all registered voters in the city who plan to vote for Chavez with 95% confidence and a
margin of error no greater than 0.03. How large a random sample do we need? Show your work.
𝑝�(1βˆ’π‘οΏ½)
𝑀𝐸 β‰₯ 𝑧 βˆ— οΏ½
𝑛
OR 𝑛 β‰₯ οΏ½
π‘§βˆ—
2𝑀𝐸
οΏ½
2
𝑛β‰₯οΏ½
1.96
οΏ½
2(.03)
2
𝑛 β‰₯ 1067.11
A sample size of 1068 is needed
19. The body mass index (BMI) of all American young women is believed to follow a Normal distribution with a
standard deviation of about 7.5. How large a sample would be needed to estimate the mean BMI πœ‡πœ‡ in this
population to within ±1 with 99% confidence? Show your work.
𝑀𝐸 β‰₯ 𝑧 βˆ—
𝜎
βˆšπ‘›
OR 𝑛 β‰₯ οΏ½
𝑧 βˆ—πœŽ 2
𝑀𝐸
οΏ½
𝑛β‰₯οΏ½
2.576(7.5) 2
οΏ½
1
𝑛 β‰₯ 373.262
A sample size of 374 is needed
𝑛 β‰₯ 173.186
A sample size of 174 is needed
20. Biologists studying the healing of skin wounds measured the rate at which new cells closed a razor cut
made in the skin of an anesthetized newt. Suppose you know that the standard deviation of healing rates
for this species of newt is 8 micrometers per hour. How large a sample would enable you to estimate the
mean healing rate of skin wounds in newts within a margin of error of 1 micrometer per hour with 90%
confidence?
𝑀𝐸 β‰₯ 𝑧 βˆ—
𝜎
βˆšπ‘›
OR 𝑛 β‰₯ οΏ½
𝑧 βˆ—πœŽ 2
𝑀𝐸
οΏ½
𝑛β‰₯οΏ½
1.645(8) 2
1
οΏ½
21. What critical value t* from Table B would you use for a confidence interval for the population mean in
each of the following situations?
a) A 95% confidence interval based on n = 10 observations.
95% 𝑛 = 10 𝑑𝑓 = 9 𝑑 βˆ— = 2.262
b) A 99% confidence interval from an SRS of 20 observations.
99% 𝑛 = 20 𝑑𝑓 = 19 𝑑 βˆ— = 2.861
22. A medical study finds that π‘₯Μ… = 114.9 and sx = 9.3 for the seated systolic blood pressure of the 27
members of one treatment group. What is the standard error of the mean?
𝑠π‘₯
βˆšπ‘›
9.3
√27
1.7898
In repeated sampling, the average distance between sample means and the population
mean will be about 1.7898 units.
23. Computers in some vehicles calculate various quantities related to performance. One of these is fuel
efficiency, or gas mileage, usually expressed as miles per gallon (mpg). For one vehicle equipped in this
way, the miles per gallon were recorded each time the gas tank was filled and the computer was then
reset. Here are the mpg values for a random sample of 20 of these records:
15.8
13.6
15.6
19.1
22.4
15.6
22.5
17.2
19.4
22.6
19.4
18.0
14.6
18.7
21.0
14.8
22.6
21.5
14.3
20.9
Construct and interpret a 95% confidence interval for the mean fuel efficiency πœ‡πœ‡ for this vehicle.
Parameter: true mean fuel efficiency for this vehicle
95% confidence one-sample t interval for πœ‡πœ‡ if conditions are met
RANDOM – stated random sample
NORMAL – boxplot does not show strong skewness or outliers
INDEPENDENT – Yes, 10(20) = 200 and there are most likely more than 200 fill-ups
π‘₯Μ… = 18.48 𝑠π‘₯ = 3.116 95% 𝑛 = 20 𝑑𝑓 = 19 𝑑 βˆ— = 2.093
3.116
𝑠
(17.022, 19.938)
tint(18.48, 3.116, 20, .95) OR
π‘₯Μ… ± 𝑑 βˆ— π‘₯ 18.48 ± 2.093
βˆšπ‘›
√20
We are 95% confident that the interval from 17.022 to 19.938 captures the true mean mpg for this
vehicle.
24. What critical value t* from Table B would you use for a 99% confidence interval for the population mean
based on an SRS of size 58? If possible, use technology to find a more accurate value of t*. What
advantage does the more accurate df provide?
π‘‘π‘’π‘β„Žπ‘›π‘œπ‘™π‘œπ‘”π‘¦: 𝑖𝑛𝑣𝑑(. 005, 57) = 2.66487
99% 𝑛 = 58 𝑑𝑓 = 57 𝑒𝑠𝑒 𝑑𝑓 = 50 𝑑 βˆ— = 2.678
Using exact df gives a smaller t* and therefore a shorter interval with the same level of confidence.
25. In each of the following situations, discuss whether it would be appropriate to construct a one-sample t
interval to estimate the population mean.
a) We collect data from a random sample of adult residents in a state. Our goal is to estimate the
overall percent of adults in the state who are college graduates.
NO, the goal is to estimate a population proportion, not a population mean.
b) The coach of a college men’s basketball team records the resting heart rates of the 15 team
members. We use these data to construct a confidence interval for the mean resting heart rate
of all male students at this college.
NO, the sample was team members, not all male students at the college.
c) We want to estimate the average age at which U.S. presidents have died. So we obtain a list of all
U.S. presidents who have died and their ages at death.
NO, we don’t need to estimate, we have the data to calculate it.
d) How much time do students spend on the Internet? We collect data from the 32 members of our
AP Statistics class and calculate the mean amount of time that each student spent on the
Internet yesterday.
NO, the sample was AP Statistics students, not all students.
26. One reason for using a t distribution instead of the standard Normal curve to find critical values when
calculating a level C confidence interval for a population mean is that
a) z can be used only for large samples.
b) z requires that you know the population standard deviation 𝜎𝜎.
c) z requires that you can regard your data as an SRS from the population
d) the standard Normal table doesn’t include confidence levels at the bottom.
e) a z critical value will lead to a wider interval than a t critical value.
27. A quality control inspector will measure the salt content (in milligrams) in a random sample of bags of
potato chips from an hour of production. Which of the following would result in the smallest margin of
error in estimating the mean salt content πœ‡πœ‡?
a) 90% confidence; n = 25
b) 90% confidence; n = 50
c) 95% confidence; n = 25
d) 95% confidence; n = 50
e) n = 100 at any confidence level
For 28-30 go through the entire process for constructing a confidence interval:
1. Identify the parameter and confidence level
2. Identify appropriate inference method. Check conditions.
3. If the conditions are met, perform calculations
4. Interpret the interval in the context of the problem
28. The scores of a certain population on the Wechsler Intelligence Scale for Children (WISC) are thought
to be Normally distributed with mean πœ‡πœ‡ and standard deviation 𝜎𝜎 = 10. A simple random sample of 25
children from this population is taken and each is given the WISC. The mean of the 25 scores is π‘₯Μ… =
104.32. Based on this data, construct a 95% confidence interval for πœ‡πœ‡.
Parameter and Confidence Level: true population mean score on WISC @ 95% confidence
Method and Conditions: 𝜎𝜎 is known so one-sample z interval for πœ‡πœ‡ if conditions are met
RANDOM – stated SRS of n = 25
NORMAL – population is Normally distributed so sampling distribution will be Normal
INDEPENDENT – Yes, 10(25) = 250 and there are more than 250 children
10
𝜎
Calculations: zint(10, 104.32, 25, .95) OR π‘₯Μ… ± 𝑧 βˆ—
104.32 ± 1.96
104.32 ± 3.92 (100.4, 108.24)
√25
βˆšπ‘›
Interpret: We are 95% confident that the interval from 100.4 to 108.24 captures the true mean score on
the WISC.
29. Scores on the Math SAT (SAT-M) are believed to be normally distributed with mean πœ‡πœ‡. The scores of a
random sample of three students who recently took the exam are 550, 620, and 480. Construct a 95%
confidence interval for πœ‡πœ‡ based on this data.
Parameter and Confidence Level: true population mean score on SAT-M @ 95% confidence
Method and Conditions: 𝜎𝜎 is unknown so one-sample t interval for πœ‡πœ‡ if conditions are met
RANDOM – stated SRS of n = 3
NORMAL – population is Normally distributed so sampling distribution will be Normal
INDEPENDENT – Yes, 10(3) = 30 and there are more than 30 students taking the SAT-M
70
𝑠
Calculations: tint(550, 70, 3, .95) OR π‘₯Μ… ± 𝑑 βˆ— π‘₯
550 ± 4.303
550 ± 173.904 (376.096, 723.904)
βˆšπ‘›
√3
Interpret: We are 95% confident that the interval from 376.096 to 723.904 captures the true mean
score on the SAT-M.
30. In the fiscal year 1996, the U.S. Agency for International Development provided 238,300 metric tons of
corn soy blend (CSB) for development programs and emergency relief in countries throughout the world.
CSB is a highly nutritious, low-cost fortified food that is partially precooked and can be incorporated into
different food preparations by the recipients. As part of a study to evaluate appropriate vitamin C levels
in this commodity, measurements were taken on samples of CSB produced in a factory. The following data
are the amounts of vitamin C, measured in milligrams per 100 grams (mg/100g) of blend (dry basis), for a
random sample of size 8 from a production run:
26
31
23
22
11
22
14
31
Construct a 99% confidence interval for πœ‡πœ‡.
Parameter and Confidence Level: true population mean amount of Vitamin C (in mg/100g) @ 99%
confidence
Method and Conditions: 𝜎𝜎 is unknown so one-sample t interval for πœ‡πœ‡ if conditions are met
RANDOM – stated random sample of n = 8
NORMAL – boxplot shows no outliers and no strong skewness
INDEPENDENT – Yes, 10(8) = 80 and it is reasonable to assume there are more than 80 samples of
CSB produced in a production run
Calculations:
π‘₯Μ… = 22.5 𝑠π‘₯ = 7.1926 99% 𝑛 = 8 𝑑𝑓 = 7 𝑑 βˆ— = 3.499
7.1926
𝑠
tint(22.5, 7.1926, 8, .99) OR π‘₯Μ… ± 𝑑 βˆ— π‘₯
22.5 ± 3.499
22.5 ± 8.897 (16.603, 31.397)
βˆšπ‘›
√8
Interpret: We are 99% confident that the interval from 16.603 mg/100mg to 31.397 mg/100mg captures
the true mean amount of Vitamin C (in mg/100mg) of CSB.