AP Statistics - Somerset Independent Schools

AP Statistics – Unit 4 Concepts (Inference)
(Chapter 8, 9, 10, 11, 12)






I can identify a point estimator to help estimate an unknown parameter.
I can correctly interpret the meaning of the margin of error in context.
I can understand that a confidence interval gives a range of plausible values for the parameter.
I can interpret a confidence level and interval in context.
I can understand why each of the three inference conditions—Random, Normal, and
Independent—is important.
I can understand how confidence level or sample size will affect the margin of error.
I can construct and interpret a confidence interval for a population proportion or mean.
I can determine critical values for calculating a confidence interval using a table or a calculator.
I can explain how practical issues like nonresponse, undercoverage, and response bias can affect
the interpretation of a confidence interval.
I can carry out the steps in constructing a confidence interval for a population proportion: define
the parameter; check conditions; perform calculations; interpret results in context.
I can determine the sample size required to obtain a level C confidence interval for a population
proportion or mean with a specified margin of error.
I can understand how the margin of error of a confidence interval changes with the sample size
and the level of confidence C.
I can carry out the steps in constructing a confidence interval for a population mean: define the
parameter; check conditions; perform calculations; interpret results in context.
I can determine sample statistics from a confidence interval.
I can correctly identify the parameter of interest for a hypothesis test.
I can state correct hypotheses for a significance test about a population proportion or mean.
I can interpret P-values in context.
I can interpret a Type I error and a Type II error in context, and give the consequences of each.
I can understand the relationship between the significance level of a test, P(Type II error), and
power.
I can check conditions for carrying out a test about a population proportion or mean.
I can recognize when the conditions are met and, if they are, conduct a significance test about a population
proportion or mean.
I can use a confidence interval to draw a conclusion for a two-sided test about a population
proportion.
I can use a confidence interval to draw a conclusion for a two-sided test about a population mean.
I can recognize when a confidence interval is not needed for estimations.
I can recognize paired data and use one-sample t procedures to perform significance tests for such
data.
I can describe the characteristics of the sampling distribution of $\hat{p}_1 - \hat{p}_2$.
I can calculate probabilities using the sampling distribution of $\hat{p}_1 - \hat{p}_2$.
I can determine whether the conditions for performing inference are met.
I can construct and interpret a confidence interval to compare two proportions.
I can perform a significance test to compare two proportions.
I can interpret the results of inference procedures in a randomized experiment.
I can describe the characteristics of the sampling distribution of $\bar{x}_1 - \bar{x}_2$.
I can calculate probabilities using the sampling distribution of $\bar{x}_1 - \bar{x}_2$.
I can determine whether the conditions for performing inference are met.
I can use two-sample t procedures to compare two means based on summary statistics.
I can use two-sample t procedures to compare two means from raw data.
I can interpret standard computer output for two-sample t procedures.
I can perform a significance test to compare two means.
I can check conditions for using two-sample t procedures in a randomized experiment.
I can interpret the results of inference procedures in a randomized experiment.
I can compute expected counts, conditional distributions, and contributions to the
chi-square statistic.
I can check the Random, Large sample size, and Independent conditions before performing a chi-square test.
I can use a chi-square goodness-of-fit test to determine whether sample data are consistent with a
specified distribution of a categorical variable.
I can examine individual components of the chi-square statistic as part of a follow-up analysis.
I can check the Random, Large sample size, and Independent conditions before performing a chi-square test.
I can use a chi-square test for homogeneity to determine whether the distribution of a categorical
variable differs for several populations or treatments.
I can interpret computer output for a chi-square test based on a two-way table.
I can examine individual components of the chi-square statistic as part of a follow-up analysis.
I can show that the two-sample z test for comparing two proportions and the chi-square test for a
2-by-2 two-way table give equivalent results.
I can check the Random, Large sample size, and Independent conditions before performing a chi-square test.
I can use a chi-square test of association/independence to determine whether there is convincing
evidence of an association between two categorical variables.
I can interpret computer output for a chi-square test based on a two-way table.
I can examine individual components of the chi-square statistic as part of a follow-up analysis.
I can distinguish between the three types of chi-square tests.
I can check conditions for performing inference about the slope 𝜷 of the population regression
line.
I can interpret computer output from a least-squares regression analysis.
I can construct and interpret a confidence interval for the slope 𝜷 of the population regression
line.
I can perform a significance test about the slope 𝜷 of a population regression line.
I can use transformations involving powers and roots to achieve linearity for a relationship
between two variables.
I can make predictions from a least-squares regression line involving transformed data.
I can use transformations involving logarithms to achieve linearity for a relationship between two
variables.
I can make predictions from a least-squares regression line involving transformed data.
I can determine which of several transformations does a better job of producing a linear
relationship.
I can determine the proper inference procedure to use in a given setting.

CLIFF NOTES: AP Statistics – Exam Review (Inference Review)
Overview of Chapters 8, 9, 10, 11, 12
Some Key Vocabulary:
• Understand the difference between a statistic and a parameter:
o STATISTICS: $\bar{x}$, $s_x$, $\hat{p}$
o PARAMETERS: $\mu$, $\sigma$, $p$
• Null hypothesis versus alternative hypothesis; one-sided versus two-sided alternative
Major Concepts to be mastered:
#1. You need to know the difference between a population parameter, a sample statistic, and the
sampling distribution of a statistic
a. For proportions, what symbol represents the population parameter?
ANSWER: p, the population proportion. The sample proportion p-hat ($\hat{p}$) is the statistic that estimates it.
b. Be familiar with the formulas for the mean and standard deviation of the sampling distribution of
$\hat{p}$ (p-hat):
$\mu_{\hat{p}} = p$; $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$; NOTE: $1 - p = q$
c. Know the formula for the standard error (estimated standard deviation) of the sampling distribution of a sample mean:
$s_{\bar{x}} = \dfrac{s_x}{\sqrt{n}}$
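As a quick check on the formulas in #1, here is a minimal Python sketch using only the standard library. The function names and input values are hypothetical, chosen so the answers are easy to verify by hand. The mean of each sampling distribution needs no computation: it equals the parameter (p for $\hat{p}$, μ for $\bar{x}$).

```python
import math


def sd_phat(p: float, n: int) -> float:
    """Standard deviation of the sampling distribution of p-hat: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def se_xbar(s_x: float, n: int) -> float:
    """Standard error of the sample mean: s_x / sqrt(n)."""
    return s_x / math.sqrt(n)


# Hypothetical inputs: p = 0.5 with n = 100, and s_x = 12 with n = 36.
print(round(sd_phat(0.5, 100), 4))   # 0.05
print(se_xbar(12.0, 36))             # 2.0
```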
#2. How can you tell whether a sample is large enough to assume that the sampling distribution is
approximately Normal?
FOR MEANS: If the population is Normal, the sampling distribution of $\bar{x}$ is Normal for any n. Otherwise, if the
sample size is at least 30 (n ≥ 30), we can assume the sampling distribution is approximately Normal thanks to the Central Limit Theorem.
FOR PROPORTIONS: check the Large Counts condition:
$np \ge 10$ and $nq \ge 10$, where $q = 1 - p$
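The two rules of thumb in #2 can be written as simple checks. This is a sketch: the function names are hypothetical, and the n ≥ 30 rule only matters when the population is not known to be Normal.

```python
def large_counts_ok(n: int, p: float) -> bool:
    """Large Counts condition for proportions: n*p >= 10 and n*q >= 10, with q = 1 - p."""
    q = 1 - p
    return n * p >= 10 and n * q >= 10


def clt_ok_for_means(n: int, population_is_normal: bool = False) -> bool:
    """Normal condition for means: a Normal population, or n >= 30 so the CLT applies."""
    return population_is_normal or n >= 30


print(large_counts_ok(50, 0.1))    # False: 50 * 0.1 = 5 is less than 10
print(large_counts_ok(200, 0.1))   # True: 20 and 180 are both at least 10
print(clt_ok_for_means(25))        # False: small sample, population shape unknown
```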
Inference Review
You must be able to decide which statistical inference procedure is appropriate in a given setting.
Working lots of review problems will help you.
#3. On any hypothesis testing problem:
The textbook calls this the FOUR-STEP PROCESS; we use the acronym PHANTOMS:
1. P/H: State the parameter of interest in the context of the problem. State hypotheses in words
and symbols.
2. A/N: Identify the correct inference procedure and verify assumptions/conditions for using it.
3. T/O: Calculate the test statistic and the P-value (or rejection region).
4. M/S: Draw a conclusion in the context of the problem that is directly linked to your P-value or
rejection region.
TIPS:
• State your hypotheses in terms of population parameters, NOT sample statistics!
• Use standard notation in your hypotheses: μ for a population mean, p for a population
proportion, or β for the slope of a population regression line.
• Don't reverse the NULL and ALTERNATIVE hypotheses. Remember, the null hypothesis is
basically a statement of 'no effect' or 'no difference.' If you hope to show that there is a
difference between TWO POPULATION MEANS, then the null hypothesis should be that the
population MEANS ARE EQUAL!
• It is not good enough to state the conditions/assumptions for the chosen inference procedure.
You must show that the conditions/assumptions are indeed satisfied.
#4. On any confidence interval problem:
STEPS (P.A.N.I.C.):
1. P: Identify the population of interest and the parameter you want to draw conclusions about.
2. A/N: Choose the appropriate inference procedure and verify assumptions/conditions for its
use.
3. I: Carry out the inference procedure.
4. C: Interpret your results in the context of the problem.
#5. You need to know the specific conditions required for the validity of each statistical inference
procedure -- confidence intervals and significance tests.
They are:
• RANDOM; INDEPENDENT; NORMAL
Introduction to Inference
#6. Be sure to have a clear understanding as to what a confidence interval tells us.
“We are ________% confident that the interval (_____, _____) captures
the true mean/proportion/difference of ___________________.”
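The interpretation in #6 rests on what the confidence level means: in repeated random sampling, about C% of the intervals built this way capture the true parameter. The simulation below is a sketch of that idea, assuming a hypothetical true proportion p = 0.4, samples of n = 200, and the one-proportion z interval; the seed only makes the run repeatable.

```python
import math
import random
from statistics import NormalDist


def one_prop_interval(successes, n, conf):
    """One-proportion z interval: p-hat +/- z* sqrt(p-hat(1 - p-hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(p_true, n, conf, reps, seed=1):
    """Fraction of simulated intervals that capture p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p_true for _ in range(n))  # one random sample
        low, high = one_prop_interval(successes, n, conf)
        if low <= p_true <= high:
            hits += 1
    return hits / reps


# Should be close to 0.95, but will not be exactly 0.95.
print(coverage(p_true=0.4, n=200, conf=0.95, reps=2000))
```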
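#7. What is z*?
ANSWER: z* is the upper critical value of the standard Normal distribution for confidence level C: the value with area (1 − C)/2 to its right, so that the middle C of the distribution lies between −z* and z*. For example, z* = 1.96 for 95% confidence.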
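Critical values like z* can be computed instead of read from a table. This sketch uses Python's `statistics.NormalDist` (standard library, Python 3.8+); the function name `z_star` is hypothetical.

```python
from statistics import NormalDist


def z_star(conf: float) -> float:
    """Upper critical value: area (1 - conf) / 2 lies to its right under N(0, 1)."""
    return NormalDist().inv_cdf((1 + conf) / 2)


for c in (0.90, 0.95, 0.99):
    print(c, round(z_star(c), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```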
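#8. Understand what the margin of error tells us.
ANSWER: It tells us how far the point estimate could plausibly be from the true parameter value, at the stated confidence level. It accounts only for random sampling variability, not for bias such as undercoverage, nonresponse, or response bias.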
#9. Be able to explain what happens to the margin of error (m.e.) when:
• z* (OR t*) decreases → m.e. decreases
• σ decreases → m.e. decreases
• n decreases → m.e. increases
By what factor must the sample size n increase in order to cut the margin of error in half?
Multiply n by 4 (the m.e. is proportional to $1/\sqrt{n}$, and $\sqrt{4} = 2$).
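The effects listed in #9 are easy to confirm numerically. This sketch computes the margin of error for a one-sample proportion and the smallest n that keeps it at or below a target. The values are hypothetical, and `p_guess = 0.5` is the conservative choice when no prior estimate of p is available, because it makes p(1 − p) as large as possible.

```python
import math
from statistics import NormalDist


def margin_of_error_prop(p_hat, n, conf):
    """m.e. = z* sqrt(p-hat(1 - p-hat) / n)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def min_sample_size(me_target, conf, p_guess=0.5):
    """Smallest n with z* sqrt(p(1 - p) / n) <= me_target (always round UP)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)
    return math.ceil((z / me_target) ** 2 * p_guess * (1 - p_guess))


me_100 = margin_of_error_prop(0.5, 100, 0.95)
me_400 = margin_of_error_prop(0.5, 400, 0.95)
print(round(me_100 / me_400, 6))    # 2.0: quadrupling n halves the m.e.
print(min_sample_size(0.03, 0.95))  # 1068
```

Lowering the confidence level (a smaller z*) also shrinks the m.e., for example `margin_of_error_prop(0.5, 100, 0.90)` is smaller than `me_100`.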
#10. Know what happens to the null hypothesis when the p-value is large and when it is small (compared with α).
LARGE (p-value ≥ α): Fail to reject Ho
SMALL (p-value < α): Reject Ho
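#11. What does a test statistic measure?
ANSWER: It measures how far the sample statistic is from the parameter value stated in the null hypothesis, in standard-deviation (standard-error) units:
test statistic = (statistic − null parameter value) / (standard deviation of the statistic).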
#12. Be able to explain the difference between a Type I and a Type II error, and be able to calculate the
probability of a Type I error.
TYPE I: Rejecting the null hypothesis when it is actually true.
P(Type I) = α (alpha), the significance level
TYPE II: Failing to reject the null hypothesis when it is actually false.
P(Type II) = β (beta)
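To see why P(Type I) = α, the sketch below generates many samples with Ho actually true and counts how often an upper-tailed one-proportion z test rejects at α = 0.05. The values p0 = 0.5, n = 400, and the seed are hypothetical; because counts are discrete, the rejection rate lands near α rather than exactly on it.

```python
import math
import random
from statistics import NormalDist


def z_test_rejects(successes, n, p0, alpha):
    """Upper-tailed one-proportion z test of Ho: p = p0 vs Ha: p > p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return p_value < alpha


def type1_rate(p0, n, alpha, reps, seed=7):
    """Fraction of samples (generated with Ho TRUE) in which Ho is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        successes = sum(rng.random() < p0 for _ in range(n))  # data come from p = p0
        rejections += z_test_rejects(successes, n, p0, alpha)
    return rejections / reps


print(type1_rate(p0=0.5, n=400, alpha=0.05, reps=2000))  # roughly 0.05
```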
#13. Know how to calculate the power of a test.
Power of a test:
• Probability of correctly rejecting a false null hypothesis
Power = 1 − P(Type II error).
• You can increase the power of a test by increasing the sample size or increasing the
significance level α (the probability of a Type I error).
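The power of an upper-tailed one-proportion z test can be sketched with the Normal approximation: find the cutoff for $\hat{p}$ that leads to rejection when Ho is true, then find the probability of landing above that cutoff when a specific alternative is true. The values p0 = 0.5 and p = 0.55 are hypothetical. Comparing the three results shows power rising with a larger n or a larger α.

```python
import math
from statistics import NormalDist


def power_one_prop_upper(p0, p_alt, n, alpha):
    """Power of the z test of Ho: p = p0 vs Ha: p > p0 when the true p is p_alt."""
    std = NormalDist()
    # Reject Ho when p-hat exceeds this cutoff (computed assuming Ho is true).
    cutoff = p0 + std.inv_cdf(1 - alpha) * math.sqrt(p0 * (1 - p0) / n)
    # Probability that p-hat lands above the cutoff when p really equals p_alt.
    sd_alt = math.sqrt(p_alt * (1 - p_alt) / n)
    return 1 - std.cdf((cutoff - p_alt) / sd_alt)


base = power_one_prop_upper(0.5, 0.55, n=400, alpha=0.05)
bigger_n = power_one_prop_upper(0.5, 0.55, n=800, alpha=0.05)
bigger_alpha = power_one_prop_upper(0.5, 0.55, n=400, alpha=0.10)
print(round(base, 3), round(bigger_n, 3), round(bigger_alpha, 3))
print("P(Type II) at baseline:", round(1 - base, 3))
```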
t-distributions
#14. When do we use 's' as an estimate of σ?
Virtually always. We do this whenever we do not know the true population standard deviation σ,
which is essentially always.
#15. What is the standard error of the sample mean (x-bar)? $s_{\bar{x}} = \dfrac{s_x}{\sqrt{n}}$
#16. What are some differences between a standard Normal distribution and a t distribution?
ANSWER: Both are symmetric, single-peaked, and centered at 0. The standard Normal (z) distribution has
standard deviation 1; a t distribution has more spread and heavier tails (more area far from 0), because using s in place of σ adds extra variability.
#17. What happens to the t distribution as the degrees of freedom increase?
ANSWER:
df = n − 1
As df increases, the t distribution gets closer and closer to the standard Normal distribution, because
s becomes a more accurate estimate of σ as the sample size grows.
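One way to see the heavier tails in #16 and the convergence in #17 is to compare the t density with the standard Normal density far out in the tail. This sketch codes the t density from its standard formula (the function name `t_pdf` is hypothetical); the printed ratio stays above 1 but shrinks toward 1 as df grows.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


normal_height = NormalDist().pdf(3.0)   # standard Normal density at x = 3
for df in (2, 10, 30, 1000):
    ratio = t_pdf(3.0, df) / normal_height
    print(f"df = {df:4d}: t density at 3 is {ratio:.2f} times the z density")
```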
#18. In a matched-pairs t procedure, what is the parameter of interest?
ANSWER: The parameter of interest is $\mu_D$, the population MEAN of the DIFFERENCES (the 'D' stands for
difference). Compute the difference for each pair, then do one-sample t inference on those differences.
$H_0: \mu_D = 0$ means that, on average, there is NO DIFFERENCE between the two measurements.
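Because matched-pairs inference is one-sample t inference on the differences, the calculation is short. The before/after numbers below are hypothetical, and the function name is illustrative; decide the direction of the difference (here after − before) before computing.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """One-sample t on the differences d = after - before, for Ho: mu_D = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = mean(diffs)                  # sample mean difference
    se = stdev(diffs) / math.sqrt(n)     # standard error of d-bar
    return d_bar, d_bar / se, n - 1      # (d-bar, t statistic, df)


# Hypothetical scores for 5 subjects measured twice.
before = [10, 12, 9, 14, 11]
after = [12, 13, 11, 15, 14]
d_bar, t, df = paired_t(before, after)
print(d_bar, round(t, 3), df)
```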
#19. Know the assumptions for a t-distribution.
RANDOM, INDEPENDENCE, NORMAL
#20. How are two-sample problems different from one-sample problems?
• You must ask whether the two samples are INDEPENDENT of each other. If they are NOT independent,
then you are probably dealing with a matched-pairs scenario.
• ALSO: When checking the Normal (Large Counts) condition for two proportions, you must check it for BOTH samples.
Comparing Two Means
#21. Know the assumptions for comparing two means.
RANDOM, INDEPENDENCE, NORMAL
#22. Do the two sample sizes have to be the same for comparing two means?
NO!!!, but they can be.
For MATCHED PAIRS, they will be the same.
Just because the sample sizes are the same, don't automatically assume it's matched pairs.
#23. What is the null hypothesis when comparing two means?
MEANS: $H_0: \mu_1 = \mu_2$;
PROPORTIONS: $H_0: p_1 = p_2$
#24. How do you standardize for comparing two means?
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
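The formula in #24 can be sketched as below; the data and function name are hypothetical. Two common choices for degrees of freedom are shown: the conservative df = min(n1, n2) − 1 often used with a t table, and the Welch-Satterthwaite approximation that calculators and software report, which is always at least as large as the conservative value.

```python
import math
from statistics import mean, stdev


def two_sample_t(x1, x2, null_diff=0.0):
    """Unpooled two-sample t statistic for mu1 - mu2, with two df choices."""
    n1, n2 = len(x1), len(x2)
    v1 = stdev(x1) ** 2 / n1             # s1^2 / n1
    v2 = stdev(x2) ** 2 / n2             # s2^2 / n2
    t = ((mean(x1) - mean(x2)) - null_diff) / math.sqrt(v1 + v2)
    df_conservative = min(n1, n2) - 1    # by-hand (table) choice
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # software choice
    return t, df_conservative, df_welch


# Hypothetical samples of unequal size (equal sizes are not required).
group1 = [5, 7, 6, 8, 9]
group2 = [4, 5, 3, 6, 5, 4]
print(two_sample_t(group1, group2))
```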
Inference for Proportions
#25. Know how to calculate the mean and standard deviation of a sample proportion.
Formulas are both featured in the formula booklet provided on all tests and on the AP EXAM.
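#26. Know how to calculate the standard error of $\hat{p}$: $SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$, which is the formula in #1b with $\hat{p}$ in place of p.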
#27. Know the assumptions that must be met in order to use z-procedures for inference about a
proportion.
RANDOM, INDEPENDENCE, NORMAL
#28. Know how to calculate the test statistic and margin of error for a one sample proportion.
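$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$, where $p_0$ is the value of p stated in Ho;
Margin of error $= z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$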
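A minimal sketch of both formulas in #28. Note that the test statistic uses the null value p0 in the standard deviation, while the interval uses $\hat{p}$ in the standard error. The counts (60 successes in 100 trials, p0 = 0.5) and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """z statistic and P-value for Ho: p = p0 (SD uses the null value p0)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std = NormalDist()
    if alternative == "greater":
        p_value = 1 - std.cdf(z)
    elif alternative == "less":
        p_value = std.cdf(z)
    else:
        p_value = 2 * (1 - std.cdf(abs(z)))
    return z, p_value


def one_prop_interval(successes, n, conf):
    """p-hat +/- z* sqrt(p-hat(1 - p-hat) / n) (SE uses the sample proportion)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 60 successes in 100 trials, testing Ho: p = 0.5.
z, p_value = one_prop_z_test(60, 100, 0.5)
low, high = one_prop_interval(60, 100, 0.95)
print(round(z, 2), round(p_value, 4))   # 2.0 0.0455
print(round(low, 3), round(high, 3))    # 0.504 0.696
```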
Chi-Square Distributions
#29. What is a chi-square distribution? What is the shape of a chi-square distribution?
ANSWER: We use chi-square tests when we are dealing with CATEGORICAL data/variables.
A chi-square distribution is ALWAYS skewed to the right. It is NOT a NORMAL distribution.
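#30. Know how to state the null and alternative hypotheses for a chi-square test.
ANSWER: The form depends on which test you are doing:
• CHI-SQUARE test for Goodness of Fit: Ho: the stated distribution of the categorical variable is correct; Ha: it is not.
• CHI-SQUARE test for Independence: Ho: there is no association between the two categorical variables; Ha: there is an association.
• CHI-SQUARE test for Homogeneity: Ho: the distribution of the categorical variable is the same for all populations/treatments; Ha: it is not the same.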
#31. Know the assumptions that must be met in order to use a goodness of fit test.
RANDOM, INDEPENDENCE, LARGE SAMPLE SIZE
The NORMAL condition is replaced by the 'LARGE SAMPLE SIZE' (Large Counts) condition, which asks whether all the
expected counts are at least 5.
#32. Know how to calculate the expected count in any cell of a two-way table when the null hypothesis
is true.
$\text{Expected count} = \dfrac{\text{row total} \times \text{column total}}{\text{table total}}$
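The expected-count formula in #32, the chi-square statistic, df = (r − 1)(c − 1), and the "all expected counts at least 5" check fit in a few lines. The 2 × 2 table and function names are hypothetical. For df = 1 only, the sketch gets a P-value from the standard Normal, since a chi-square variable with 1 df is the square of a standard Normal variable; this is also why the two-sample z test for two proportions and the chi-square test on a 2 × 2 table agree. For other df, use a table or calculator.

```python
from statistics import NormalDist


def expected_counts(table):
    """Expected count for each cell: (row total x column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_two_way(table):
    """Chi-square statistic, df = (r - 1)(c - 1), and the Large Counts check."""
    expected = expected_counts(table)
    stat = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    large_counts = all(exp >= 5 for exp_row in expected for exp in exp_row)
    return stat, df, large_counts


def p_value_df1(stat):
    """Only for df = 1: P(chi-square(1) > stat) = 2 * P(Z > sqrt(stat))."""
    return 2 * (1 - NormalDist().cdf(stat ** 0.5))


# Hypothetical 2 x 2 table of observed counts.
observed = [[30, 20],
            [20, 30]]
stat, df, ok = chi_square_two_way(observed)
print(stat, df, ok, round(p_value_df1(stat), 4))   # 4.0 1 True 0.0455
```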
#33. Know how to calculate the degrees of freedom for a chi-square test.
Two-way table (independence or homogeneity): df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns.
Goodness of fit: df = (number of categories) − 1, NOT the number of individuals in the sample minus 1.
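For a goodness-of-fit test, the expected counts come from the proportions stated in Ho rather than from row and column totals, and df = (number of categories) − 1. This sketch (hypothetical counts, equal hypothesized proportions, illustrative function name) also returns each category's contribution to the statistic, which is what you examine in a follow-up analysis.

```python
def chi_square_gof(observed, null_proportions):
    """Goodness-of-fit statistic, df = (number of categories) - 1, and per-category contributions."""
    n = sum(observed)
    expected = [n * p for p in null_proportions]
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    df = len(observed) - 1
    return sum(contributions), df, contributions


# Hypothetical counts in 5 categories; Ho says each category has proportion 0.2.
counts = [18, 22, 20, 25, 15]
stat, df, parts = chi_square_gof(counts, [0.2] * 5)
print(round(stat, 2), df)   # 2.9 4
```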
Inference for the Regression Setting
#34. When dealing with inference in the regression setting, what are the unknown parameters?
SLOPE: β is the parameter. The sample statistic that estimates the slope is b.
#35. How do you calculate the degrees of freedom in the regression setting?
df = n − 2, because we estimate TWO parameters from the data (the slope and the y-intercept) instead of just one.
#36. What are the assumptions for inference in the regression setting?
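L.I.N.E.R.:
• LINEAR: the actual relationship between x and y is linear
• INDEPENDENCE of the observations
• NORMALITY: for any fixed x, the response y varies Normally about the population regression line
• EQUAL VARIANCE: the standard deviation of y is the same for all values of x
• RANDOM: the data come from a random sample or a randomized experiment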
#37. How do you calculate the test statistic in the regression setting?
$t = \dfrac{b}{SE_b}$ (for Ho: β = 0), with df = n − 2
#38. How do you calculate a confidence interval in the regression setting?
$b \pm t^* SE_b$
#39. What is the null hypothesis going to be in the regression setting?
$H_0: \beta = 0$ (there is NO linear association/relationship between the explanatory and response variables).
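In practice b and $SE_b$ are read from computer output, but the sketch below shows where they come from, assuming hypothetical data with n = 5 and an illustrative function name. The Python standard library has no t quantile function, so t* must be supplied; 3.182 is the t-table value for 95% confidence with df = n − 2 = 3.

```python
import math
from statistics import mean


def slope_inference(x, y, t_star):
    """Least-squares slope b, SE_b, t = b / SE_b (for Ho: beta = 0), and b +/- t* SE_b."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                                   # sample slope
    a = y_bar - b * x_bar                           # sample y-intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                    # residual standard deviation, df = n - 2
    se_b = s / math.sqrt(sxx)                       # same as s / (s_x * sqrt(n - 1))
    t = b / se_b
    return b, se_b, t, (b - t_star * se_b, b + t_star * se_b)


# Hypothetical data with n = 5, so df = 3; t* = 3.182 is the 95% table value for df = 3.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, se_b, t, ci = slope_inference(x, y, t_star=3.182)
print(round(b, 3), round(se_b, 3), round(t, 3), [round(v, 3) for v in ci])
```

Here the interval contains 0, which matches |t| being smaller than t*: at the 5% level these hypothetical data would not give convincing evidence that β ≠ 0.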
TEN BASELINE QUESTIONS:
#1. Concept:


I can understand how confidence level or sample size will affect the margin of error.
I can interpret a confidence level and interval in context.
I can correctly interpret the meaning of the margin of error in context.

Based on a survey of a random sample of 500 adults in the United States, a statistician reports
that 55 percent of adults in the United States are in favor of increasing the minimum hourly
wage.
• If the reported percent has a margin of error of 5.7 percentage points, which confidence level is
closest to the one the statistician used?
#2. Concept:
• I can identify a point estimator to help estimate an unknown parameter.
A large sample 95 percent confidence interval for the proportion of airline tickets that are
canceled on the intended arrival day is (0.028, 0.106). What is the point estimate for the
proportion of airline tickets that are canceled from which this interval was constructed?
#3. Concept:
• I can carry out the steps in constructing a confidence interval for a population mean: define the
parameter; check conditions; perform calculations; interpret results in context.
When using a one-sample t procedure to construct a confidence interval for the mean of a
finite population, there is a condition known as the 10% condition. Explain this condition and
what it is designed to ensure.
#4. Concept:
• I can construct and interpret a confidence interval for a population proportion or mean.
A random sample of 150 students at a large high school resulted in an 80 percent confidence
interval for the mean number of hours of studying per day of (0.75, 2.8). What best
summarizes the meaning of this confidence interval?
“We are ________ confident that the interval : _______________________
captures the _____________ _________________________________________.”
“About _______ of all _______________________ of size ________________
from this population would result in a(n) __________ confidence interval that
would cover the
______________________________________________________________. “
#5. Concept:
• I can determine the sample size required to obtain a level C confidence interval for a population
proportion or mean with a specified margin of error.
In 2006 a survey of Internet usage found that 68 percent of adults age 18 years and older in the
United States use the Internet. A broadband company believes that the percent is greater now
than it was in 2006 and will conduct a survey.
The company plans to construct a 90 percent confidence interval to estimate the current
percent and wants the margin of error to be no more than 4.5 percentage points. Assuming
that at least 68 percent of adults use the Internet, what inequality should be used to find the
sample size (n) needed?
#6. Concept:

I can recognize paired data and use one-sample t procedures to perform significance tests for such
data.
I can determine the proper inference procedure to use in a given setting.

Suppose you want to do a hypothesis test for two sets of data that are independent from each
other, what type of test would you do?
#7. Concept:
• I can interpret P-values in context.
A university researcher conducted a two-tailed hypothesis test on a set of data and obtained a
p-value of 0.32. If the experimenter had conducted a one-tailed test on the same set of data,
what would be some possible p-value(s) that the researcher could have obtained?
#8. Concept:


I can recognize paired data and use one-sample t procedures to perform significance tests for such
data.
I can determine the proper inference procedure to use in a given setting.
What type of inference procedure would be used to determine if there is a relationship
between the type of car a person drives and the driver’s gender?
What type of inference procedure would be done where a group of randomly selected subjects
take a sleeping pill one week and then a week later take another sleeping pill and afterwards
the results of the two sleeping pills are compared?
#9. Concept:
• I can construct and interpret a confidence interval for the slope 𝜷 of the population regression
line.
#10. Concept: (from the 2011 AP Stats Exam)
• I can interpret P-values in context.
High cholesterol levels in people can be reduced by exercise, diet, and medication. Twenty
middle-aged males with cholesterol readings between 220 and 240 milligrams per deciliter
(mg/dL) of blood were randomly selected from the population of such male patients at a large
hospital. Ten of the 20 males were randomly assigned to group A, advised on appropriate
exercise and diet, and also received a placebo. The other 10 males were assigned to Group B,
received the same advice on appropriate exercise and diet, but received a drug intended to
reduce cholesterol instead of a placebo. After three months, post treatment cholesterol
readings were taken for all 20 males and compared to pretreatment cholesterol readings. The
tables below give the reduction in cholesterol level (pretreatment reading minus post
treatment reading) for each male in the study.
a. Do the data provide convincing evidence at the α = 0.01 level, that the cholesterol drug is
effective in producing a reduction in mean cholesterol level beyond that produced by
exercise and diet?
b. Interpret what this p-value measures in the context of this study.
c. Based on this p-value and study design, what conclusion should be drawn in the context of this
study? Use a significance level of 𝜶 = 0.01.
d. Based on your conclusion in part (c), which type of error, Type I or Type II, could have been
made? What is ONE potential consequence of this error?
#11. BASELINE: CONCEPT:
I can use a chi-square test to determine whether sample data are consistent with a specified distribution
of a categorical variable.
A few weeks before the senatorial election between Senator Smirk and his challenger, former Governor
Graft, the senator’s polling organization wants to know where he should concentrate his campaigning.
They take simple random samples of potential voters in the southern and northern portions of the state,
and ask them if they have decided who to vote for or are still undecided. Here are the results:
Region    Decided on a candidate    Still Undecided    TOTALS
NORTH     116                       60                 176
SOUTH     148                       52                 200
TOTALS    264                       112                376
a. Do these data provide convincing evidence that there is a difference in the distribution of voters
who have decided or are still undecided in the two regions?
b. The pollsters are concerned that while all 200 people in the ‘south’ sample responded, 24
people (out of the original SRS of 200) in the ‘north’ sample did not respond. Is it possible that
the opinions of these people would change the pollsters’ conclusions? What type of error might
have been made?
FREE RESPONSE #1: “Women & Time”
A New York Times poll on women’s issues interviewed 1025 women randomly selected from the United
States, excluding Alaska and Hawaii. The poll found that 47% of the women said they do not get enough
time for themselves.
a. Construct and interpret a 90% confidence interval that estimates the proportion of women in
the United States who do not feel that they get enough time for themselves.
b. Suppose this poll was conducted by telephone calls made from 9 AM to 5 PM. Explain how
using this method might result in biased results, and speculate about the direction of the bias
on whether the 47% was an overestimate or underestimate of the true proportion.
c. Which type of error was made in this sampling procedure? Sampling error or non-sampling
error?
FREE RESPONSE #2: “Fuel Efficiency”
National Fuelsaver Corporation manufactures the Platinum Gasver, a device they claim “may increase
gas mileage by 30%.” Here are the percent changes in gas mileage for 15 identical, randomly selected
vehicles, as presented in one of the company’s advertisements:
-2.4   6.9   10.4   10.8   24.8   28.7   33.7   34.6   38.5   28.7   40.2   44.6   46.8   46.9   48.3
a. The sample mean is $\bar{x} = 29.43$ and the sample standard deviation is $s = 16.23$. Calculate and
interpret the standard error of the mean for these data.
b. Construct and interpret a 90% confidence interval to estimate the mean change (in percent) in
gas mileage. Does this data support the company’s claim? Explain.
FREE RESPONSE #3: “Big Box Electronics”
Big Box Electronics, a large national chain store, has one store in the city of Kingston. One factor in
deciding whether to build a second store in the city is whether the current store is serving all residents
equally well, or whether unequal proportions of residents from different parts of town are using the
store because it's located on one side of town. The national managers of Big Box divide Kingston into
four geographical regions and determine the percentage of residents who live in each region.
Here’s what they find:
Region                          NORTH   SOUTH   EAST   WEST
Percentage of the population    40%     24%     22%    14%
Then the managers take a simple random sample of shoppers at the store and determine which
part of town they come from by asking for their zip code when they are checking out:
Region                NORTH   SOUTH   EAST   WEST
Number of shoppers    120     48      62     20
a. Is Kingston’s only Big Box store used by a higher proportion of the residents in some parts of
the town than others? Support your answer with an appropriate statistical test.
b. Considering the decision that you made in part (a) based on the p-value that you obtained,
what type of error might you have made? What would be a possible consequence of this
error?
FREE RESPONSE #4: “Distracted Driving” (from 2007 AP Stats Exam Question #5)
Researchers want to determine whether drivers are significantly more distracted while driving when
using a cell phone than when talking to a passenger in the car. In a study involving 48 people, 24 people
were randomly assigned to drive in a driving simulator while using a cell phone. The remaining 24 were
assigned to drive in the driving simulator while talking to a passenger in the simulator. Part of the
driving simulation for both groups involved asking drivers to exit the freeway at a particular exit. In the
study, 7 of the 24 cell phone users missed the exit, while 2 of the 24 talking to a passenger missed the
exit.
a. Would this study be classified as an experiment or an observational study? Provide an
explanation to support your answer.
b. State the null and alternative hypotheses of interest to the researchers.
c. One test of significance that you might consider using to answer the researchers’ question is a
two-sample z-test for proportions. State the conditions required for this test to be
appropriate. Then comment on whether each condition is met.
d. Using an advanced statistical method for small samples to test the hypotheses in part (b), the
researchers report a p-value of 0.0683. Interpret, in everyday language, what this p-value
measures in the context of this study and state what conclusion should be made based on this
p-value.
FREE RESPONSE #5: “Estimators” (from 2008 AP Stats Exam Question #2)
Four different statistics have been proposed as estimators of a population parameter. To investigate
the behavior of these estimators, 500 random samples are selected from a known population and each
statistic is calculated for each sample. The true value of the population parameter is 75. The graphs
below show the distribution of values for each statistic.
a. Which of the statistics appear to be unbiased estimators of the population parameter? How
can you tell?
b. Which of the statistics A or B would be a better estimator of the population parameter?
Explain your choice.
c. Which of the statistics C or D would be a better estimator of the population parameter?
Explain your choice.
FREE RESPONSE #6: “Name Brands Versus Generic Brands” (from 2001 AP Stats Exam Question #5)
A growing number of employers are trying to hold down the costs that they pay for medical insurance
for their employees. As part of this effort, many medical insurance companies are now requiring clients
to use generic brand medicines when filling prescriptions. An independent consumer advocacy group
wanted to determine if there was a difference, in milligrams, in the amount of active ingredient
between a certain “name” brand drug and its generic counterpart. Pharmacies may store drugs under
different conditions. Therefore, the consumer group randomly selected ten different pharmacies in a
large city and filled two prescriptions at each of these pharmacies, one for the “name” brand and the
other for the generic brand of the drug. The consumer group’s laboratory then tested a randomly
selected pill from each prescription to determine the amount of active ingredient in the pill. The results
are given in the following table:
ACTIVE INGREDIENT (in milligrams)
Pharmacy         1     2     3     4     5     6     7     8     9     10
Name Brand       245   244   240   250   243   246   246   246   247   250
Generic Brand    246   240   235   237   243   239   241   238   238   234
a. Based on these results, what should the consumer group’s laboratory report about the
difference in the active ingredient in the two brands of pills? Give appropriate statistical
evidence to support your response.
b. Consider the decision that you made in part (a). What type of error might have been made?
What might be a possible consequence of that error?
FREE RESPONSE #7: “Hospitals” (from 2004 AP Stats Exam Question #5)
A rural county hospital offers several health services. The hospital administrators conducted a poll to
determine whether the residents’ satisfaction with the available services depends on their gender. A
random sample of 1,000 adult county residents was selected. The gender of each respondent was
recorded and each was asked whether he or she was satisfied with the services offered by the hospital.
The resulting data are shown in the table below:
                 Male    Female    Total
Satisfied        384     416       800
Not Satisfied    80      120       200
Total            464     536       1,000
a. Using a significance level of 0.05, conduct an appropriate test to determine if, for adult
residents of this county, there is an association between gender and whether or not they were
satisfied with services offered by the hospital.
b. Is $\frac{800}{1000}$ a reasonable estimate for the proportion of all adult county residents who are satisfied
with the services offered by this hospital? Explain why or why not.
FREE RESPONSE #8: “Foot Length”
Can we predict the heights of school-aged children from foot length? Below is computer output from a
regression analysis of this relationship for 15 randomly-selected Canadian children from 8 to 15 years
old, along with a residual plot. The explanatory variable is each child’s foot length (in centimeters), and
the response variable is the child’s height (in centimeters).
a. What is the equation of the least-squares regression line based on these data? Define any
parameters used. Interpret the slope of the regression line.
b. Assuming all conditions have been met, construct and interpret a 99% confidence interval for
the slope of the least squares regression of height on foot length.
c. If you were to perform a test of the hypotheses $H_0: \beta = 0$ versus $H_a: \beta \ne 0$ at the α = 0.01
level, what would you conclude? Justify your answer by using your result in part (b).
FREE RESPONSE #9: “Flu Vaccine” (from 2011 AP Stats Exam Question #5)
During a flu vaccine shortage in the United States, it was believed that 45 percent of vaccine-eligible
people received flu vaccine. The results of a survey given to a random sample of 2,350 vaccine-eligible
people indicated that 978 of the 2,350 people had received flu vaccine.
a. Construct and interpret a 99 percent confidence interval for the proportion of vaccine-eligible
people who had received flu vaccine. Use your confidence interval to comment on the belief
that 45 percent of the vaccine-eligible people had received flu vaccine.
b. Suppose a similar survey will be given to vaccine-eligible people in Canada by Canadian health
officials. A 99 percent confidence interval for the proportion of people who will have received
the flu vaccine is to be constructed. What is the smallest sample size that can be used to
guarantee that the margin of error will be less than or equal to 0.02?
FREE RESPONSE #10: “Customer Satisfaction” (from 2010 AP Stats Exam Question #6)
An automobile company wants to learn about customer satisfaction among the owners of five specific
car models. Large sales volumes have been recorded for three of the models, but the other two models
were recently introduced so their sales volumes are smaller. The number of new cars sold in the last six
months for each of the models is shown in the table below:
Car Model                                         A         B        C        D       E       TOTAL
Number of new cars sold in the last six months    112,338   96,174   83,241   3,278   2,323   297,354
The company can obtain a list of all individuals who purchased new cars in the last six months for each
of the five models shown in the table. The company wants to sample 2,000 of these owners.
a. For simple random samples of 2,000 new car owners, what is the expected number of owners of
model E and the standard deviation of the number of owners of model E?
b. When selecting a simple random sample of 2,000 new car owners, how likely is it that fewer
than 12 owners of model E would be included in the sample? Justify your answers.
c. The company is concerned that a simple random sample of 2,000 owners would include fewer
than 12 owners of model D or fewer than 12 owners of model E. Briefly describe a sampling
method for randomly selecting 2,000 owners that will ensure at least 12 owners will be
selected for each of the 5 car models.
FREE RESPONSE #11: “Sewing Machines” (from 2012 AP Stats Exam Question #1)
The scatterplot below displays the price in dollars and quality rating for 14 different sewing machines.
a. Describe the nature of the association between price and quality rating for the sewing
machines.
b. One of the 14 sewing machines substantially affects the appropriateness of using a linear
regression model to predict quality rating based on price. Report the approximate price and
quality rating of that machine and explain your choice.
c. Chris is interested in buying one of the 14 sewing machines. He will consider buying only those
machines for which there is no other machine that has both higher quality and lower price. On
the scatterplot reproduced below, circle all data points corresponding to machines that Chris
will consider buying.