Ch 1-11 Review

AP Statistics Topics by Chapter
Chapter 1: Exploring Data
• Graphs for categorical and quantitative variables
• pie charts, bar graphs, stemplots, histograms, ogive
• patterns in distributions
• shape of distribution including roughly symmetric, skewed, or neither
• Discuss measures of center, unusual points, shape and spread
• Outliers
• relative and cumulative frequency plots
• time plots
• Measures of center
• mean vs. median
• Clusters and gaps
• quartiles, Q1, Q3
• boxplot
• Range
• 5-number summary
• IQR
• Outliers
• standard deviation
• choosing which center and spread measure to use
• knowing properties of standard deviation
• The effect of changing units on summary measures including adding and multiplying
• Comparing distributions using side-by-side bar graphs, dotplots, back-to-back stemplots, parallel boxplots, or using number summaries from computer output or calculators.
• Comparing center and spread within the group or between groups
• Comparing clusters, gaps, outliers, or any other unusual features
1. Which of the following are true statements?
I. Pie charts are useful for both categorical and quantitative data
II. Histograms are useful for small and large data
III. Histograms show the overall shape, center, and spread of the distribution of data
(A) I only
(B) II only
(C) III only
(D) I and III only
(E) II and III only
2. Which of the following is inappropriate for displaying quantitative data?
(A) Stem and leaf plot
(B) Dot plot
(C) Bar graph
(D) Box plot
(E) histogram
3. The height of Mrs. Clark’s tomato plants is what type of data?
(A) Categorical
(B) Quantitative and continuous
(C) Quantitative and discrete
(D) Categorical and Quantitative
(E) Categorical and continuous
4. The mean assessed value of homes in Southern County is $158,000 with a standard deviation of
$32,000. If the county supervisors decided to increase everyone’s assessment by $5,000, the new
mean and standard deviation would be
(A) $158,000 and $32,000
(B) $163,000 and $37,000
(C) $163,000 and $32,000
(D) $158,000 and $37,000
(E) Cannot be determined
5. The mean exam score for the second-period physics class, which had 25 students, was 87.3. The
mean exam score for the third-period physics class, which had 19 students, was 92.4. What was
the average of both classes?
(A) 89.85
(B) 89.50
(C) 90.18
(D) 91.91
(E) Cannot be determined
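The arithmetic behind combining two class means can be sketched as below. This is a minimal illustration, not part of the original worksheet; the helper name `combined_mean` is hypothetical. It weights each class mean by its class size and contrasts that with a plain average of the two means.

```python
# Combined mean of two groups: weight each group mean by its group size.
# `combined_mean` is an illustrative helper name, not from the worksheet.
def combined_mean(means, sizes):
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)

overall = combined_mean([87.3, 92.4], [25, 19])
simple_average = (87.3 + 92.4) / 2  # ignores the different class sizes

print(round(overall, 2), round(simple_average, 2))
```

The two results differ because the larger class pulls the combined mean toward its own mean.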
6. A distribution is skewed right if
(A) mean = median
(B) mean < median
(C) mean > median
(D) IQR > difference between the mean and median
(E) Cannot be determined
7. Consider the following back-to-back stem and leaf plots comparing weight gain in kilograms for
male and female horses.
   Male |   | Female
        | 1 | 7
      7 | 2 | 23488
     32 | 3 | 14567899
   8754 | 4 | 34668
 965330 | 5 | 6
6543331 | 6 |
Which of the following are true statements?
I. The distributions have the same number of observations
II. The ranges for the two distributions are the same
III. The means for the two distributions are the same
IV. The medians of the two distributions are the same
V. The variances for the two distributions are the same
(A) I and II
(B) I and IV
(C) II and V
(D) III and V
(E) I, II, and III
8. The following stem and leaf plot displays the ages of the presidents of the United States at the
time of their inaugurations.
4 23
4 677899
5 0111112244444
5 555566677778
6 0111244
6 589
(a) Determine if there are any outliers
(b) Make a boxplot for the data.
(c) Describe the shape of the distribution.
9. The following data represents the hours of continuous use for two brands of batteries.
Brand A: 65, 67, 69, 71, 63, 62, 70, 72, 66
Brand B: 65, 67, 67, 68, 70, 64, 64, 65, 65
Using plots and summary statistics investigate and report on the comparison of these two
batteries. Include a complete analysis of the distributions.
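One way to get the summary statistics for the battery comparison is sketched below using Python's standard `statistics` module. The `summarize` helper is a hypothetical name. The quartiles use the module's default "exclusive" method, which may differ slightly from the convention in a given textbook or calculator, so treat the quartile values as an assumption to check.

```python
import statistics

brand_a = [65, 67, 69, 71, 63, 62, 70, 72, 66]
brand_b = [65, 67, 67, 68, 70, 64, 64, 65, 65]

def summarize(data):
    """Five-number summary, IQR, mean, and sample SD (illustrative helper)."""
    # statistics.quantiles with n=4 returns the three quartile cut points;
    # the default method is "exclusive".
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "min": min(data), "q1": q1, "median": median, "q3": q3,
        "max": max(data), "iqr": q3 - q1,
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),  # sample standard deviation
    }

summary_a = summarize(brand_a)
summary_b = summarize(brand_b)
for name, s in (("Brand A", summary_a), ("Brand B", summary_b)):
    print(name, {k: round(v, 2) for k, v in s.items()})
```

A complete answer would still pair these numbers with plots (for example parallel boxplots) and a written comparison of shape, center, and spread.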
Chapter 2: Describing Location in a Distribution
• Percentiles
• Z-scores and Z Table
• Properties of the normal distribution
• Model for measurements
• Chebyshev’s Inequality
• Density curves
• Normal distributions
• 68-95-99.7 rule
• Standard normal curve
• Nonstandard normal curves and calculations
• Assessing Normality
• Normal Probability Plots
1. The grading at Central High gives a B for grades between 86 and 93. On the English
final for seniors, what proportion of the class would get a B if the grades were normally
distributed with a mean grade of 86.34 and standard deviation of 14.23?
(A) 0.07
(B) 0.4905
(C) 0.6801
(D) 0.1896
(E) 0.0280
2. The mean GPA for Central High is 2.9, with the standard deviation of 0.5. Assuming the
GPA’s are normally distributed, what GPA score will place a student in the top 5% of the
class?
(A) 3.72
(B) 3.43
(C) 2.08
(D) 2.90
(E) 3.38
3. On Sarah’s last two biology exams, she scored an 87. The class mean on the first exam
was 75, with a standard deviation of 8.9. The class average on the second exam was 73,
with a standard deviation of 9.7. Assuming the scores on the exam were approximately
normally distributed, on which exam did Sarah score better relative to the rest of her
class?
(A) She scored better on the first exam
(B) She scored better on the second exam
(C) She scored equally on both exams
(D) It is impossible to determine because the class sizes are unknown
(E) It is impossible to determine because the correlation between the two sets of
exam scores is not provided.
4. A researcher notes that two populations of lab mice – one consisting of mice with white
fur, and one of mice with grey fur – have the same mean weight, and both have
approximately normal distributions. However, the population of white mice has a larger
standard deviation than the population of grey mice. If the weights for both of these
populations were plotted, how would the curves compare to each other?
(A) The curves would be identical
(B) The curve for the grey mice would be taller because it has a smaller standard
deviation
(C) The curve for the white mice would be taller because it has a larger standard
deviation
(D) The curve for the white mice would be taller because the population size of the
white mice is larger
(E) The curve for the grey mice would be taller because its variance is larger
5. Which of the following statements is NOT true for normally distributed data?
(A) The mean and median are equal
(B) The area under the curve is dependent upon the mean and standard deviation
(C) Almost all of the data lie within three standard deviations of the mean
(D) Approximately 68% of all of the data lies within one standard deviation of the
median
(E) When the data are normalized, the distribution has a mean μ = 0 and a standard
deviation σ = 1.
6. Webb is a baseball fanatic. He keeps his own statistics on the major league teams and
individual players. For the 350 regular starters, Webb has found their mean batting
average is 0.229, with a standard deviation of 0.024. His sister is appalled that baseball
players get paid the salaries they do and get a hit less than 25% of their attempts at bat.
To further her argument, she asks for the following information:
(a) What proportion of players hit more than 25% of the times they are at bat?
(b) Since the players with the top ten batting averages get cash bonuses, what is the
lowest batting average that will receive a bonus?
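A rough sketch of the two normal calculations in the batting-average question follows. It assumes the batting averages are approximately normally distributed, which the question does not state explicitly, and it uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Assumption: batting averages are roughly normal with the stated mean and SD.
batting = NormalDist(mu=0.229, sigma=0.024)

# (a) Proportion of players with a batting average above 0.250.
prop_above_250 = 1 - batting.cdf(0.250)

# (b) The top 10 of 350 starters is the top 10/350 of the distribution,
# so the cutoff is the (1 - 10/350) quantile.
bonus_cutoff = batting.inv_cdf(1 - 10 / 350)

print(round(prop_above_250, 4), round(bonus_cutoff, 3))
```

The same pattern (`cdf` for a proportion, `inv_cdf` for a cutoff) applies to the other normal-distribution questions in this chapter.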
Chapter 3: Analyzing bivariate data
• Scatterplots – construct and interpret, analyze patterns
• Direction, shape, strength, outliers, influential points
• Correlation – calculation and properties
• Linear Regression – calculation, principles, and properties
• Least-squares regression line
• Interpret slope and y-intercept in context
• Prediction vs. extrapolation
• Residuals
• Coefficient of determination – r²
• Residual plots – constructing and interpreting
• Cautions about correlation and regression
• Lurking variables
1. A perfect positive correlation means
(A) The points in the scatter diagram lie on an upward sloping line
(B) The points in a scatter diagram lie on a downward sloping line
(C) r is equal to –1
(D) r is equal to zero
(E) there is a direct cause and effect relationship between the variables
2. In a regression model, the slope represents
(A) The point where the y-axis intersects the x-axis
(B) The point where the regression line intersects the y-axis
(C) The point where the regression line intersects the x-axis
(D) The change in the response variable due to a one-unit change in the independent
variable
(E) The change in the independent variable due to a one-unit change in the response
variable
3. Are jet skis dangerous? Propelled by a stream of pressurized water, jet skis and other so-called wet bikes carry from one to three people, retail for an average price of $5700, and
have become one of the most popular types of recreational vehicle sold today. But critics
say that they’re noisy, dangerous, and damaging to the environment. An article in the
August 1997 issue of the Journal of the American Medical Association reported on a
survey that tracked emergency room visits at randomly selected hospitals nationwide.
The study recorded data on the number of jet skis in use and the number of accidents for
the years 1987–1996. Computer output and a residual plot from a linear regression
analysis of the data are shown below.
Predictor    Coef         SE Coef      T        P
Constant     -0.8         109.9        -0.01    0.994
Jetskis      0.0048308    0.0002292    21.08    0.000

S = 188.3    R-Sq = 98.2%    R-Sq(adj) = 98.0%
[Residual plot: Residuals Versus Jetskis (response is No. of a). Vertical axis: Residual, from -400 to 400. Horizontal axis: Jetskis, from 0 to 1000000.]
a. What is the equation of the least-squares line? Be sure to define any variables you use.
b. Interpret the value of r² in the context of this problem.
c. Is a line an appropriate model for these data? Justify your answer.
d. Interpret the value of s in the context of this problem.
Chapter 4: More about Relationships between two Variables
• Transforming to achieve linearity
• Powers and logs
• Exponential models
• Power models
• Relationships between categorical variables
• Marginal distributions
• Marginal and joint frequencies for two-way tables
• Frequency tables and bar charts
• Conditional relative frequencies and association
• Comparing distributions using bar charts
• Conditional distributions
• Simpson’s paradox
• Establishing causation
• Lurking variables
• Causation
• Common response
• Confounding
1. One thousand adults were asked whether Republicans or Democrats have better domestic
economic policies. The answers were placed in the table below:
          Republican   Democrat   No Opinion   Totals
Male         220          340         40         600
Female       170          200         30         400
Totals       390          540         70        1000
What is the probability of choosing a Male, given he is a Republican?
(A) 22%
(B) 37%
(C) 56%
(D) 77%
(E) 58%
2. Using the information above, what is the probability of choosing a Democrat?
(A) 34%
(B) 37%
(C) 50%
(D) 54%
(E) 57%
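The difference between a conditional and a marginal probability in the two-way table can be sketched as follows. The dictionary layout and the `column_total` helper are illustrative choices, not part of the worksheet.

```python
# Counts from the two-way table of gender by opinion.
counts = {
    "Male":   {"Republican": 220, "Democrat": 340, "No Opinion": 40},
    "Female": {"Republican": 170, "Democrat": 200, "No Opinion": 30},
}

grand_total = sum(sum(row.values()) for row in counts.values())

def column_total(opinion):
    """Total count for one opinion column (illustrative helper)."""
    return sum(row[opinion] for row in counts.values())

# Conditional: restrict to the Republican column, then take the male share.
p_male_given_rep = counts["Male"]["Republican"] / column_total("Republican")

# Marginal: the Democrat column total over everyone surveyed.
p_democrat = column_total("Democrat") / grand_total

print(round(p_male_given_rep, 3), p_democrat)
```

The conditional probability divides by a column total, while the marginal probability divides by the grand total.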
3. Which of the following scatterplots would indicate that Y is growing exponentially over
time?
[Scatterplots (a) through (d) are not reproduced here.]
(e) none of these
4. According to the 1990 census, those states with an above-average number X of people who
fail to complete high school tend to have an above average number Y of infant deaths. In
other words, there is a positive association between X and Y. The most plausible explanation
for this is
(A) X causes Y. Programs to keep teens in school will help reduce the number of infant
deaths.
(B) Y causes X. Programs that reduce infant deaths will ultimately reduce high school
dropouts.
(C) Lurking variables are probably present. For example, states with large populations
will have both larger numbers of people who don’t complete high school and more
infant deaths.
(D) Both of these variables are directly affected by the higher incidence of cancer in
certain states.
(E) The association between X and Y is purely coincidental.
5. An experiment was conducted to determine the effect of practice time (in seconds) on the percent
of unfamiliar words recalled. Here is a Fathom scatterplot of the results with a least-squares regression
line superimposed.
(a) Sketch a residual plot below.
(b) Does a linear model fit the data well? Justify your answer.
We used Fathom to transform the original data in hopes of achieving linearity. The screen shots below
show the results of two different transformations.
(c) Would an exponential model or a power model fit the original data better? Justify your
answer.
(d) Use the model you chose in (c) to predict word recall for 25 seconds of practice. Show your
method.
Chapter 5: Producing Data
• Sampling: good and bad methods
• Census, Sample Survey
• Experiment, Observational study
• Voluntary response
• Convenience samples
• Simple random sample (SRS)
• Stratified random sample
• Cluster sampling, Systematic sampling, Multi-stage sampling
• Designing polls and surveys
• Undercoverage, Nonresponse, Question wording
• Potential bias
• Random number table
• Basics of experimental design – well designed and well conducted
• Subjects
• Experimental units
• Factors
• Treatments
• Explanatory and response variables
• Completely randomized design
• Control groups
• Random assignment
• Replication
• Placebo effect
• Blinding and double blinding
• Confounding
• Multi-factor experiments
• Block designs
• Matched pairs
• Population vs. sample
• Generalizability of results and types of conclusions that can be drawn from observational studies, experiments, and surveys.
1. Ben conducts a study in which 100 subjects, randomly chosen from the population of all
students at a school, guess when 60 seconds have elapsed. He records the actual number
of seconds that have elapsed when the subjects think it has been 60 seconds. The
subjects make their guesses while listening to music. Fifty-five of Ben’s subjects choose
fast music and 45 choose slow music. What kind of study is this?
(A) Experiment, because the subjects are responding to treatments
(B) Experiment, because there is a response variable
(C) Experiment, because the subjects are randomly chosen from the population
(D) Observational study, because the participants select their own treatments
(E) Observational study, because the treatment groups are different sizes
2. Which of the following statements about observational studies is true?
I. A census is always preferable to a sample survey since it includes the entire population
II. A neutral designer of a survey who has no predisposition towards any particular conclusion
can still produce biased data
III. Statistical inference is not necessary when a census is conducted properly
(A) I only
(B) II only
(C) I and II
(D) I and III
(E) I, II, and III
3. A U.S. government researcher wants to select a sample of tax returns that will include
returns from a variety of different income levels. He divides the set of all the different
incomes shown on the forms into 10 nonoverlapping ranges, then he randomly selects
100 tax returns from each. Which of the following best describes the sampling scheme
used in this example?
(A) Stratified random sampling
(B) Simple random sampling
(C) Convenience sampling
(D) Two-stage sampling
(E) Cluster sampling
4. Which of the following is NOT a property of a large table of random digits?
(A) The table will contain, somewhere, the sequence of digits 1 2 3 4.
(B) Consecutive rows do not start with the same digit
(C) Each digit 0 through 9 occurs with equal frequency
(D) Each three-digit number 000 through 999 occurs with equal frequency
(E) The contents of one section of the table are independent of other sections of the table
5. The owner of a factory that employs half the citizens in a small town is trying to decide
whether to take a public stand on a controversial issue. He realizes that he would benefit
from knowing how the townspeople feel. He randomly selects 50 of the townspeople
from a list of all the town’s population. He personally contacts all 50 and asks them their
opinion on the issue. Most give him an answer, but 12 townspeople decline to
participate. He decides to summarize his results on the 38 responses. Which of the
following lists the most significant sources of bias in this survey?
(A) Voluntary response bias and undercoverage
(B) Response bias and undercoverage
(C) Nonresponse bias and undercoverage
(D) Response bias and nonresponse
(E) Voluntary response bias and nonresponse
6. Which of the following is the least important way in which the designer of an experiment
can guard against confounding?
(A) Matching
(B) Randomization
(C) Replication
(D) Control
(E) Blocking
7. David knows that dancers are trained to spin many times without losing their ability to move
in a straight line after spinning. He wonders whether this ability is dependent on the number
of spins. He wants to design an experiment that will compare the ability of experienced
female dancers to walk a fixed distance in a straight line after 5 spins with their ability after
10 spins. Which of the following is the most appropriate design for this experiment?
(A) Completely randomized design
(B) Stratified design
(C) Randomized block design
(D) Cluster design
(E) Matched pairs design
8. Aspirin may enhance impairment by alcohol
Aspirin, a longtime antidote for the side effects of drinking, may actually enhance alcohol’s
effect, researchers at the Bronx Veteran’s Affairs Medical Center say.
In a report on a study published in the Journal of the American Medical Association, the
researchers said they found that aspirin significantly lowered the body’s ability to break down alcohol
in the stomach.
As a result, five volunteers who had a standard breakfast and two extra-strength aspirin tablets an
hour before drinking had blood alcohol levels 30 percent higher than when they drank alcohol alone.
Each volunteer consumed the equivalent of a glass and a half of wine.
That 30 percent could make the difference between sobriety and impairment, said Dr. Charles S.
Lieber, medical director of the Alcohol Research and Treatment Center at the Bronx center, who was
co-author of the report with Dr. Risto Roine.
a. Does this article describe an experiment? Explain.
b. Did this study involve a simple random sample (SRS)? Explain.
c. Did this study use a particular design that we have studied? If so, identify the design. Then
comment on the validity of the study.
9. You are participating in the design of a medical experiment to investigate whether or not a
calcium supplement in the diet will reduce the blood pressure of middle-aged men.
Preliminary research suggests that the supplement may have a greater effect on black men
than on white men.
a. What sort of experimental design would you choose, and why?
b. Assume that the experimental population consists of 600 white men and 500 black men.
Outline in a diagram the design of the experiment. (Be sure to indicate how many
subjects are assigned to the various treatment groups.)
Chapter 6: Probability and Simulation
• Simulations – basic process and examples
• Interpreting Probability
• Probability as long-run relative frequency
• Law of Large numbers
• Randomness
• Legitimate probability models
• Sample spaces
• Outcomes
• Events
• Addition rule for disjoint events
• Complement rule
• Venn diagrams – union and intersection
• Independence
• Multiplication rule
• General addition rule
• Conditional probability
• Tree diagrams
• Proving independence
1. According to a recent national survey of college students, 55% admitted to having cheated at
some time during the last year. What is the probability that for two randomly selected college
students, one or the other would have cheated during the past year?
(A) 0.5500
(B) 0.7975
(C) 0.3025
(D) 0.2475
(E) 0.2025
2. Given two events, A and B, if P(A) = 0.37, P(B) = 0.41, and P(A or B) = 0.75, then the two
events are
(A) independent but not mutually exclusive
(B) mutually exclusive but not independent
(C) mutually exclusive and independent
(D) neither mutually exclusive nor independent
(E) Cannot be determined
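The checks for "mutually exclusive" and "independent" can be sketched with the general addition rule. This is a minimal illustration; the tolerance value is an arbitrary choice to absorb floating-point rounding.

```python
p_a, p_b, p_a_or_b = 0.37, 0.41, 0.75

# General addition rule rearranged: P(A and B) = P(A) + P(B) - P(A or B)
p_a_and_b = p_a + p_b - p_a_or_b

tol = 1e-9  # tolerance for floating-point comparison (arbitrary choice)

# Mutually exclusive events cannot occur together: P(A and B) = 0.
mutually_exclusive = abs(p_a_and_b) < tol

# Independent events satisfy P(A and B) = P(A) * P(B).
independent = abs(p_a_and_b - p_a * p_b) < tol

print(round(p_a_and_b, 4), round(p_a * p_b, 4), mutually_exclusive, independent)
```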
3. Security procedures at the U.S. Capitol require that all bags – meaning briefcases, backpacks,
shopping bags, any carrying bags, and purses – must be screened. Currently, it is reported that
95% of all bags that contain illegal items trigger the alarm. 12% of the bags that do not contain
illegal items trigger the alarm. If 3 out of every 1,000 bags entering the Capitol contain an illegal
item, what is the probability that a bag that triggers the alarm will contain an illegal item?
(A) 0.0233
(B) 0.0029
(C) 0.9500
(D) 0.1140
(E) 0.1225
4. Suppose your teacher’s stash of calculators contains 3 defective calculators and 17 good
calculators. You select two calculators from the box for you and your friend to use on the AP
Statistics exam. What calculations would you use to determine the probability that one of the
calculators drawn will be defective?
(A) 17/20 [operator lost] 3/19
(B) (17/20)(3/20) [operator lost] (3/20)(2/20)
(C) (17/20)(3/19)
(D) (17/20)(3/19) + (3/20)(17/19)
(E) (17/20)(11/19)(3/18)(2/17)
5. Heart disease is the #1 killer today. Suppose that 8% of the patients in a small town are
known to have heart disease. And suppose that a test is available that is positive in 96%
of the patients with heart disease, but is also positive in 7% of patients who do not have
heart disease. If a person is selected at random and given the test and it comes out
positive, what is the probability that the person actually has heart disease?
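Both the bag-screening question and the heart-disease question follow the same two-branch tree diagram, so one small Bayes' rule helper covers them. The function name `posterior` and its parameter names are hypothetical, chosen only for this sketch.

```python
def posterior(prior, p_pos_given_true, p_pos_given_false):
    """P(condition | positive result) for a two-branch tree (Bayes' rule)."""
    true_pos = prior * p_pos_given_true           # has condition AND positive
    false_pos = (1 - prior) * p_pos_given_false   # no condition AND positive
    return true_pos / (true_pos + false_pos)

# Bag screening: 3 in 1,000 bags contain an illegal item.
bag_alarm = posterior(prior=3 / 1000, p_pos_given_true=0.95, p_pos_given_false=0.12)

# Heart disease test: 8% of patients have the disease.
heart_test = posterior(prior=0.08, p_pos_given_true=0.96, p_pos_given_false=0.07)

print(round(bag_alarm, 4), round(heart_test, 4))
```

The denominator is the total probability of a positive result, which is why a rare condition can give a small posterior even with a sensitive test.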
Chapter 7: Random Variables
• Discrete vs. continuous
• Probability distributions
• Notation
• Mean, standard deviation, and variance of a random variable
• Law of large numbers
• Rules for mean and variance
• Linear transformations
• Linear combinations of random variables
• Mean and standard deviation for sums and differences of independent random variables
• Independence
• Combining normal random variables
1. Robin owns a bookstore. She is working on a presentation to convince her partner to spend $500
on a catchy window display. Robin has data to support the fact that if people come in to browse,
62% will make a purchase. Given that the average purchase is $12.38, what is the expected
amount of sales from the next 20 customers who enter the store?
(A) $7.68
(B) $153.51
(C) $247.60
(D) $94.60
(E) $58.34
2. A radio station is running a lottery to raise money for a local charity. The prizes are $10, $50,
and $100, and a grand prize of $1000. The chances of winning these amounts are 0.25, 0.15,
0.09, and 0.01 respectively. What are your total expected winnings (minus costs) if you pay $1
for a ticket?
(A) $29
(B) $10
(C) $90
(D) $290
(E) $28
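The expected value of a discrete random variable, as used in the lottery question, can be sketched as a probability-weighted sum. This assumes the remaining probability corresponds to winning nothing, which the question implies but does not state.

```python
prizes = [10, 50, 100, 1000]
probs = [0.25, 0.15, 0.09, 0.01]
ticket_cost = 1

# Assumption: the leftover probability is the chance of winning nothing.
p_no_prize = 1 - sum(probs)

# Expected value: sum of each outcome times its probability.
expected_prize = sum(x * p for x, p in zip(prizes, probs))
expected_net = expected_prize - ticket_cost

print(round(p_no_prize, 2), round(expected_prize, 2), round(expected_net, 2))
```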
3. The scores for the top three golfers on a high school golf team are used to determine which high
schools advance to the regional level. The Central High team’s top three players have mean
scores and standard deviations of:
        Player 1   Player 2   Player 3
μ_x       89.5       94.4       97.2
σ_x        2.3        4.5        3.9
What are the mean score and standard deviation for the Central High team?
(A) μ_x = 281.1, σ_x = 6.38
(B) μ_x = 93.7, σ_x = 6.38
(C) μ_x = 93.7, σ_x = 3.57
(D) μ_x = 281.1, σ_x = 3.57
(E) μ_x = 281.1, σ_x = 10.7
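The rules for the mean and standard deviation of a sum of random variables can be sketched as below. The sketch assumes the three players' scores are independent, which the question does not state; without independence the variances would not simply add.

```python
import math

means = [89.5, 94.4, 97.2]
sds = [2.3, 4.5, 3.9]

# Means of a sum always add.
team_mean = sum(means)

# Assumption: scores are independent, so variances (not SDs) add.
team_sd = math.sqrt(sum(s ** 2 for s in sds))

# Common mistake shown for contrast: adding the standard deviations directly.
naive_sd = sum(sds)

print(round(team_mean, 1), round(team_sd, 2), round(naive_sd, 1))
```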
Chapter 8: The Binomial and Geometric Distributions
• Binomial settings
• BINS
• Binomial distribution
• Mean and variance
• Normal approximation to the binomial distribution
• Geometric distributions
1. Based on his past performance, the probability that Ben will make a free throw is 0.6.
What is the probability that he will make 3 out of his next 5 free throws?
(A) 0.6630
(B) 0.0960
(C) 0.3456
(D) 0.9360
(E) 0.01536
2. Based on his past performance, the probability that Ben will make a free throw is 0.6.
What is the probability that he will miss his first three free throws, and then make his
fourth one?
(A) 0.9744
(B) 0.1536
(C) 0.8704
(D) 0.096
(E) 0.0384
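The two free-throw questions contrast a binomial setting (a fixed number of shots) with a geometric setting (shooting until the first make). A minimal sketch of both probability formulas follows; the function names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(first success occurs on trial k)."""
    return (1 - p) ** (k - 1) * p

# Exactly 3 makes in the next 5 free throws.
p_three_of_five = binomial_pmf(3, 5, 0.6)

# Three misses followed by a make on the fourth attempt.
p_first_make_on_fourth = geometric_pmf(4, 0.6)

print(round(p_three_of_five, 4), round(p_first_make_on_fourth, 4))
```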
3. A manufacturer of batteries for hearing aids claims that only 4% of their batteries are
defective. A consumer watch group is doubtful of the claim and wants to check it. They
have a shipment of 500 batteries.
(A) What are the mean and standard deviation of the distribution?
(B) The consumer group has reason to believe that the rate of defective batteries is at
least 5%. Based on your findings in (A), what is the probability that more than
5% of this shipment would be defective?
Chapter 9: Sampling Distributions
• What is a sampling distribution?
• Simulation of a sampling distribution
• Inference
• Bias
• Variability
• Sampling distributions of proportions and means
• Mean and standard deviation of sampling distributions
• Normal approximations
• Sampling distributions of a difference between two independent sample proportions or sample means
• Rules of thumb
• Sampling distributions of proportions, calculations and conditions
• Central Limit theorem (CLT)
• Calculations using x-bar
• Normal population distribution vs. CLT
1. Two samples of corn were taken from a field to test the percent of corn plants infested
with worms. The USDA states that approximately 28% of all corn plants are infested.
One sample contains 100 ears of corn and the second sample 500 ears. Which sample
has the larger standard deviation?
(A) The sample of 500 will have the larger standard deviation
(B) Both samples will have the same standard deviation
(C) The sample of 100 ears will have a smaller standard deviation
(D) The sample of 100 ears will have the larger standard deviation
(E) It is impossible to determine
2. Which of the following statements best describes a sampling distribution of a sample
mean?
(A) It is x
(B) It is the distribution of all possible values of a population parameter
(C) It is the distribution of all possible values of a statistic taken from all possible
samples of a specific size
(D) It is an unbiased estimator 
(E) It is the normal distribution with x = 0 and s = 1
3. The conditions that np > 10 and n(1 – p) > 10 are necessary to guard against
(A) a skewed distribution
(B) a small population size
(C) a small sample size
(D) a large standard deviation
(E) non-randomly selected sample
4. A sample of 5,000 female adults was randomly drawn from the United States. It is
known that the diastolic blood pressure for adult women in the United States is N(80, 12).
What are the mean and standard deviation of the distribution of the sample means?
(A) x = 0.16, s = 0.1697
(B) x = 80, s = 12
(C) x = 80, s = 0.1697
(D) x = 3.58, s = 0.024
(E) Cannot be determined
Chapter 10: Confidence Intervals
• Connect sampling distributions with confidence intervals
• Estimating population parameters
• Properties of point estimators
• Confidence interval for mu with sigma known
• Assumptions needed to be met
• Changing confidence level
• Interpret CI vs. interpreting confidence level
• Determine sample size
• CI for mu when sigma is unknown
• T-distributions
• One-sample t-interval
• Paired t procedures
• Robustness
• CI’s for proportions
• Determine sample size for proportions
• Standard error
• Margin of error
• Properties of CI’s
1. Which of the following statements about the t-distribution is true?
(A) The t-distribution has a mean of 0 and a standard deviation of 1
(B) The t-distribution has a larger variance than the standard normal distribution
(C) The smaller the degrees of freedom, the smaller the variance for the t-distribution
(D) The t-distribution is a skewed distribution
(E) The normal distribution is flatter and more spread out than the t-distribution
2. Campaign managers conduct regular polls to estimate the proportion of people who will
vote for their candidate in an upcoming election. Shortly before the actual election, the
campaign manager doubles the sample size of the poll. What effect does this have on the
estimate?
(A) It increases the reliability of the estimate
(B) It decreases the standard deviation of the sampling distribution of the sample
proportion
(C) It decreases the variability in the population
(D) It will reduce the effect of confounding variables
(E) It reduces the bias that comes from interviewer effect
3. An ecologist would like to estimate the mean carbon monoxide level of the air in a
particular city. The carbon monoxide levels are measured on 14 days during a month and
recorded. A histogram of the 14 readings is roughly symmetrical, with no outlying
values. The mean and standard deviation of these values are 5.4 and 2.2, respectively.
Assume the 14 days can be considered a simple random sample of all days. Which of the
following is a correct statement?
(A) A 95% confidence interval for μ is 5.4 ± 2.145(2.2/√14)
(B) A 95% confidence interval for μ is 5.4 ± 2.145(2.2/√13)
(C) A 95% confidence interval for μ is 5.4 ± 2.160(2.2/√14)
(D) A 95% confidence interval for μ is 5.4 ± 2.160(2.2/√13)
(E) The sample is too small to trust the results
4. To estimate the proportion of TV viewers watching a certain special, how large of a
random sample is required so that the margin of error is 0.04 with 99.6% confidence?
(A) 18
(B) 36
(C) 96
(D) 1296
(E) 1492
5. A quality control engineer at a steel mill must estimate the mean tensile strength of a new
machine using a random sample of 12 beams. The actual population distribution for this
machine is unknown, but graphical displays of the sample indicate that the assumption of
normality is reasonable. Since there are no historical data for this prototype machine, the
variability of the process is completely unknown. The engineer uses a t-distribution rather than a z-distribution because
(A) He has a small sample, making the z-distribution inappropriate
(B) He is using data rather than theoretical methods to determine the mean
(C) The data comes from only one machine
(D) The variability of the machine is unknown
(E) The t-distribution results in a narrower confidence interval
6. A company wants to estimate the mean net weight of all 32-ounce packages of its
Yummy Taste cookies at 95% confidence. It is known that the standard deviation of net
weights is 0.1 ounce. The sample size that will yield the margin of error within 0.02
ounces of the population mean is
(A) 9
(B) 10
(C) 96
(D) 97
(E) More information is needed
7. A random sample of 25 tourists who visited Hawaii this summer spent an average of
$1420 on this trip with a standard deviation of $285. The 95% confidence interval for the
mean money spent by all tourists who visit Hawaii is
(A) ($1302, $1538)
(B) ($1308, $1531)
(C) ($1397, $1443)
(D) ($1363, $1477)
(E) ($1385, $1465)
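A one-sample t-interval for the tourist spending question can be sketched as below. The Python standard library has no t-distribution, so the critical value is hard-coded from a standard t table (df = 24, 95% confidence); that constant is an assumption of this sketch rather than something computed here.

```python
import math

n, xbar, s = 25, 1420, 285

# Assumption: critical value t* read from a t table for df = n - 1 = 24, 95% confidence.
t_star = 2.064

standard_error = s / math.sqrt(n)
margin = t_star * standard_error
interval = (xbar - margin, xbar + margin)

print(round(interval[0], 1), round(interval[1], 1))
```

The t-distribution is used because the population standard deviation is unknown and is estimated by the sample standard deviation.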
8. A sample of 1000 adults showed that 31% of them are smokers. To estimate the
proportion of people in the entire population who smoke, what additional information
would you need?
(A) The size of the population
(B) The amount of confidence you desire in your estimate
(C) The standard deviation for the number of smokers
(D) The length of time the people smoked
(E) All the information you need is contained in the problem
9. When comparing a 95% confidence interval with a 99% confidence interval created from
the same data, how will the intervals differ?
(A) The sample size must be known to determine the difference
(B) The mean of the sample must be known to determine the difference
(C) The use of the t-distribution or the z-distribution will determine how the two
intervals differ
(D) The 95% interval will be wider than the 99% interval
(E) The 95% interval will be narrower than the 99% interval
10. Increasing the sample size by a factor of 4 will have what effect on the margin of error?
(A) It will increase the margin of error by a factor of 4
(B) It will decrease the margin of error by a factor of 4
(C) It will increase the margin of error by a factor of 2
(D) It will decrease the margin of error by a factor of 2
(E) It will decrease the margin of error by a factor of 16
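The margin of error has the form (critical value)·(standard deviation)/√n, so its dependence on n can be checked numerically. The numbers in this sketch are hypothetical and only the ratio of the two margins matters.

```python
from math import sqrt

def margin_of_error(critical_value, sd, n):
    """Margin of error for a mean: critical value * sd / sqrt(n)."""
    return critical_value * sd / sqrt(n)

# Hypothetical inputs: same critical value and sd, sample size multiplied by 4
before = margin_of_error(1.96, 10, 25)
after = margin_of_error(1.96, 10, 100)
ratio = after / before
print(ratio)
```

Since √(4n) = 2√n, the ratio is one half for any choice of critical value and standard deviation.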
11. The principal of Southside High School, a large urban school of 4,252 students, took a
simple random sample of 250 Southside students and found that 43% of them were
involved in extracurricular activities. The 90% confidence interval for the proportion of
students involved in extracurricular activities at Southside High School is
(A) (0.3899, 0.4701)
(B) (0.3780, 0.4820)
(C) (0.3778, 0.4822)
(D) (0.3785, 0.4815)
(E) (0.1327, 0.2112)
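A one-proportion z interval uses p̂ ± z*·√(p̂(1 − p̂)/n). The sketch below assumes the usual large-sample conditions hold (they appear to: the population of 4,252 is more than 10 times the sample of 250, and the counts of successes and failures both exceed 10). The function name `proportion_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence):
    """Large-sample z interval for a population proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(p_hat=0.43, n=250, confidence=0.90)
print(round(low, 4), round(high, 4))
```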
12. A local politician wants to estimate the percentage of voters who plan to support a
referendum to curb development in the county. How large of a sample will be needed to
ensure a margin of error of no more than 3%, with 95% confidence?
(A) 896
(B) 752
(C) 632
(D) 1068
(E) More information is needed
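When no prior estimate of the proportion is available, a common conservative approach is to plan with p = 0.5, which maximizes p(1 − p). The sketch below makes that assumption explicit through a default argument. The name `sample_size_for_proportion` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin; p_guess = 0.5 is the conservative choice."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

n_voters = sample_size_for_proportion(margin=0.03, confidence=0.95)
print(n_voters)
```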
13. A national news magazine surveyed 1,500 adults in the United States, and found that
37% disapproved of the Administration’s handling of domestic issues. The magazine
reported the results as 37% ± 3%. What degree of confidence is reported in these results?
(A) 97%
(B) 56%
(C) 94%
(D) 99%
(E) There is not enough information
14. Professor Graham wants to reduce the width of the confidence interval around his
estimate of the proportion of adults who are carriers of a certain bacteria. What can he do
to accomplish this?
(A) Decrease his sample size
(B) Increase the confidence interval
(C) Change his estimate of p̂
(D) Increase his sample size
(E) None of these will result in a smaller confidence interval
15. A random sample of adult male physicians at Memorial Hospital was taken, and the mean
cholesterol level was found to be 183 mg/dL. A 95% confidence interval for the
corresponding population mean is 183 ± 17 mg/dL. Which of the following statements
must be true?
(A) 95% of the population measurements fall between 166 and 200
(B) 95% of the sample measurements fall between 166 and 200
(C) If 100 samples were taken, 95% of the sample means would fall between 166
and 200
(D) P(166 ≤ x̄ ≤ 200) = 0.95
(E) If μ = 160, this x̄ of 183 mg/dL would be unlikely to happen
16. A recent survey of 500 people reported that 67% of American adults believe that high
gasoline prices are caused by the greed of oil companies. The margin of error was
reported as 3%. What does the margin of error mean?
(A) No more than 70% of the population believes that high gasoline prices are
caused by the greed of oil companies
(B) The actual parameter is between 64% and 70%
(C) It is unlikely that the reported statistic would be 67%, unless the true value was
between 64% and 70%
(D) Three percent of the people were not surveyed
(E) Three percent of the time, the value obtained would be different from 67%
17. A random sample of five snack foods available in the vending machines in the school
cafeteria contained the following amounts of sodium (in mg): 310, 350, 320, 280, and
340. What is the 90% confidence interval for the amount of sodium in mg per snack
food?
(A) 320 ± 26.1
(B) 320 ± 20.2
(C) 320 ± 18.78
(D) 320 ± 24.7
(E) 320 ± 21.78
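With raw data and a small sample, the interval is x̄ ± t*·s/√n with df = n − 1 = 4. The sketch below takes the fourth reading as 280 mg, which is the value that makes the sample mean equal the 320 shown in every answer choice. The critical value t* = 2.132 is read from a t table and supplied as an input.

```python
from math import sqrt
from statistics import mean, stdev

# Fourth value taken as 280 mg (assumption: it is what makes the sample mean 320)
sodium = [310, 350, 320, 280, 340]
t_star = 2.132  # table value for df = 4 at 90% confidence

x_bar = mean(sodium)
s = stdev(sodium)  # sample standard deviation (divides by n - 1)
margin = t_star * s / sqrt(len(sodium))
print(x_bar, round(margin, 1))
```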
18. An economist for the government needs to estimate the mean income for households not
covered by health insurance in the city of Albany. He collects a random sample of 1,500
families, and finds the mean income for the sampled households is $18,870 with a
standard deviation of $7,240. Calculate a reasonable confidence interval for the true
mean income for households not covered by health insurance.
19. A random sample of 150 seniors at SDSU were asked if they had cheated on an exam or
major paper at any time during their college career. A total of 93 seniors reported that
they had cheated.
(A) Calculate a 95% confidence interval for the proportion of all seniors who had
cheated during their college careers.
(B) How many students should be surveyed to obtain a 95% confidence interval that
is within 1% of the correct percent of seniors who cheated?
(C) How would the length of the confidence interval be affected if the confidence
level were changed to 80%? Justify your answer.
Chapter 11: Testing a Claim
• Intro to significance testing
• Stating null and alternative hypotheses
• Components of a significance test
• Conditions
• Calculations
• Interpretation
• One-sided vs. two-sided
• Statistical significance and P-value
• alpha
• Duality with CI
• Uses and abuses of tests
• Statistical significance vs. practical importance
• Type I and II errors in context
• Connections between power and Type II error
1. A bottling company claims there are 2 liters of soda in a large bottle. The Bureau of
Weights and Measures believes that the company is cheating the consumer by putting less
than 2 liters in a bottle. The bureau decides to conduct an experiment to determine if the
consumer is being cheated. Which of the following hypotheses would be appropriate?
(A) H 0 :   2, H a :   2
(B) H 0 :   2, H a :   2
(C) H 0 :   2, H a :   2
(D) H 0 :   2, H a :   2
(E) H 0 :   2, H a :   2
2. The z-test may not be used when
I. the sample is too small
II. the standard deviation of the population is unknown
III. the population is not normally distributed
IV. the sample is not normally distributed
(A) I only
(B) II only
(C) III only
(D) II and IV
(E) I and IV
3. The probability of finding a true difference in a hypothesis test can be increased when
which of the following is true?
(A) n is increased and  is increased
(B) n is increased and  is decreased
(C) n is decreased and  is increased
(D) n is decreased and  is decreased
(E) None of the above
4. The analysis of a sample of 250 shoppers at a mall in a large metropolitan area produced a
99% confidence interval of ($124, $154) for the mean amount spent that day. Suppose
you wish to test the null hypothesis H0: μ = $160 at the α = 0.01 level of significance.
Can you use the data provided to draw a conclusion?
(A) Yes; it can be concluded that the mean amount spent is significantly different from
$160, since this value is not in the 99% confidence interval
(B) Yes; it can be concluded that the mean amount spent is not significantly different
from $160, since this value is not in the 99% confidence interval
(C) No; the distribution of the population must be known before a conclusion can be
drawn
(D) No; the data is needed to properly conduct a hypothesis test
(E) No; hypotheses cannot be tested based on a confidence interval
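The "duality with CI" idea from the chapter outline can be stated as a rule: a two-sided test at level α rejects H0: μ = μ0 exactly when μ0 falls outside the (1 − α) confidence interval built from the same data. The sketch below encodes only that rule and assumes a two-sided alternative. The function name is illustrative.

```python
def two_sided_test_rejects(ci_low, ci_high, mu0):
    """True when mu0 lies outside the interval, i.e. the matching two-sided test rejects H0."""
    return not (ci_low <= mu0 <= ci_high)

# 99% interval paired with a test at alpha = 0.01
print(two_sided_test_rejects(124, 154, 160))
```

The confidence level and the significance level must match (99% with α = 0.01) for this shortcut to apply.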
5. A researcher conducted an experiment regarding the effectiveness of a new drug.
Following the statistical analysis, the results were reported with a p-value of 0.12. Based on
this p-value, which of the following conclusions should the researcher reach?
(A) Reject the null hypothesis, since p-value of 0.12 is greater than the significance level
of 0.05.
(B) Reject the null hypothesis, since 1 – p-value is 0.88, which is greater than the
significance level of 0.05.
(C) Fail to reject the null hypothesis, since there is a 12% chance that you could obtain
these results when H0 is true, which is higher than the significance level of 0.05.
(D) Fail to reject the null hypothesis, since there is an 88% chance that you could obtain
these results when H0 is true, which is higher than the significance level of 0.05.
(E) Accept the null hypothesis, since the p-value is too large
6. If a null hypothesis is rejected when it is actually true, then
(A) A Type I error occurs
(B) A Type II error occurs
(C) A β error occurs
(D) A random error occurs
(E) A power occurs
7. The power of a significance test against a particular alternative is 91%. Which of the
following is true?
(A) The probability of a Type I error is 91%
(B) The probability of a Type II error is 91%
(C) The probability of a Type I error is 9%
(D) The probability of a Type II error is 9%
(E) The probability of an alpha error is 9%
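Power and the Type II error probability are complements: power = 1 − P(Type II error). The sketch below illustrates this for a one-sided z test with known σ, and shows how power responds to sample size and significance level. All numbers (μ0 = 100, alternative mean 105, σ = 15) are hypothetical, and `power_upper_tail` is a name invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu0, mu_alt, sigma, n, alpha):
    """Power of the z test of H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu_alt."""
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Probability of exceeding the cutoff when the true mean is mu_alt
    return 1 - NormalDist(mu_alt, se).cdf(cutoff)

base = power_upper_tail(100, 105, 15, n=25, alpha=0.05)
larger_n = power_upper_tail(100, 105, 15, n=100, alpha=0.05)
larger_alpha = power_upper_tail(100, 105, 15, n=25, alpha=0.10)
print(base < larger_n, base < larger_alpha)

# Type II error probability for a test with power 0.91
beta = round(1 - 0.91, 2)
print(beta)
```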
Chapter 12: Significance Tests in Practice
• Testing a claim about mu
• One-sample t-test
• Paired t-test
• Testing a claim about p
• Significance tests
• What if the conditions aren’t met?
Chapter 13: Comparing Two Population Parameters
• Matched pairs data vs. independent samples
• CI for difference between two means (paired and unpaired)
• Estimating μ1 − μ2
• Two-sample t-tests and assorted df possibilities
• CI for difference between two proportions
• Estimating p1 − p2
• Significance test for comparing two population proportions (paired and unpaired)
Chapter 14: Inference about Distributions of Population Proportions
• Chi-square goodness of fit test
• Chi-square test of homogeneity
• Chi-square test of association/independence
• Expected counts
• Chi-square distribution
Chapter 15: Inference about Linear Regression
• The linear regression model
• Population vs. sample regression lines
• Significance tests about the slope of a least-squares regression line
• Computer output
• CI for slope of a least-squares regression line