AP STATISTICS
Chapter 2-6 Review
1. Explain the concept of resistance. Include an example comparing measures of center and an example
comparing measures of spread.
2. An experiment was conducted using 100 volunteers to investigate two different weight loss programs
(program A and program B). Researchers recorded each patient’s initial weight and gender and then randomly
assigned each subject to program A or program B. At the end of the study each subject was weighed again and
the change in weight was recorded (final – initial).
a) Describe the W’s. For each variable, record whether it is categorical or quantitative.
b) The boxplots below show the change in weights for the subjects in each treatment. Compare these
distributions.
[Figure: boxplots of Change in Weight (Final – Initial) for Program A and Program B; axis from -40 to 100]
c) There is an outlier in Program A’s distribution. Explain how this outlier was identified.
d) If a person had a negative change, it means that he or she actually gained weight. Which program had a
higher proportion of subjects who gained weight?
3. In November 2008, CDO had a mock election between John McCain and Barack Obama. The results shown
in the table below are categorized by grade level and presidential preference.
Grade    Obama   McCain   Total
9         233     155      388
10        211     188      399
11        119     134      253
12        128     118      246
Total     691     595     1286
a) Make a graph to display the relationship between grade level and presidential preference.
b) Based on your graph, are grade level and presidential preference independent for the students who
participated in the mock election?
4. Explain how you would decide when to use a histogram and when to use a bar chart.
5. A random sample of CDO students was asked how many hours of sleep they got the previous night. Here
are the results: 6, 6.5, 6.75, 7.25, 7.5, 7.5, 7.5, 8, 9, 10.5
a) Calculate the mean and standard deviation
b) Interpret the standard deviation
c) In the context of this problem, explain the difference between x̄ and μ.
d) Calculate and interpret the z-score for the student with 6 hours of sleep.
e) If the times were converted to minutes, how would this student’s z-score change?
6. The following histogram shows the time it took to complete an exam for a class of 30 students.
a) Describe the shape of the distribution.
b) Make an ogive (cumulative relative frequency plot) for this data.
c) Explain how the characteristics of the ogive correspond to the shape of the histogram.
d) Use your ogive to estimate the interquartile range of this data.
7. The following data shows the time (in minutes) it took for students to complete a Sudoku puzzle.
6, 6, 6, 7, 7, 8, 8, 10, 10, 10, 11, 11, 13, 15, 19, 25, 30
a) Make a histogram of this data.
b) Without calculating, which is higher, mean or median? Explain.
c) In what circumstances would you want to make a relative frequency histogram?
8. The following summary statistics describe the distribution of test scores on a recent test.
n     x̄       s      Min   Q1   Med    Q3     Max
56    13.44   3.67   6     11   13.5   16.5   20
a) What are the range and interquartile range of the scores?
b) To scale the scores, the teacher multiplies each score by 3 and adds 10. Find the new values of the
mean, standard deviation, median, and interquartile range.
c) If Sally’s raw score was in the 39th percentile, explain to her what this means.
d) After the test scores have been scaled, what percentile will Sally be in?
e) Suppose the teacher wanted the mean of the scores to be 80 with a standard deviation of 15. What
transformations should he apply?
9. The following stemplot shows the average number of text messages sent each day by a sample of 20 students
during the last month.
Average Number of Text Messages Sent
0 | 11257
1 | 369
2 | 39
3 | 2446799
4 | 39
5 | 0
Key: 1|3 = 13 messages/day
a) Make a boxplot of this data.
b) Discuss the advantages and disadvantages of using stemplots vs. boxplots.
Answers:
1. Resistant measures are not influenced by outliers or skewness. For example, the median is a resistant
measure of center while the mean is not. Likewise, the interquartile range is a resistant measure of spread but
both the range and standard deviation are heavily influenced by outliers and skewness.
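To make the idea of resistance concrete, here is a minimal Python sketch. The scores are hypothetical (they do not come from this review), and the `iqr` helper assumes the median-of-halves convention for quartiles. Replacing one score with an extreme value leaves the median and IQR unchanged while the mean, range, and standard deviation all shift.

```python
from statistics import mean, median, stdev

def iqr(values):
    """Interquartile range using the median-of-halves convention (an assumption)."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]
    return median(upper) - median(lower)

# Hypothetical test scores, then the same scores with the top value made extreme.
scores = [62, 65, 68, 70, 72, 75, 78, 80, 85]
with_outlier = [62, 65, 68, 70, 72, 75, 78, 80, 185]

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(
        label,
        "mean:", round(mean(data), 2),
        "median:", median(data),
        "range:", max(data) - min(data),
        "IQR:", iqr(data),
        "SD:", round(stdev(data), 2),
    )
```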
2a. Who: the 100 volunteers; What: gender and treatment are categorical, initial weight, final weight, and
change in weight are quantitative; How: a randomized experiment; When and Where not specified; Why: to see
which weight loss program works better.
2b. Shape & Unusual Values: Both distributions are approximately symmetric, but A has an outlier on the low
end (note: we cannot tell anything about the number of peaks). Center: The median of distribution B is higher.
Spread: Both the range and IQR are larger for distribution B.
2c. To identify outliers on the low end, calculate Q1 – 1.5(IQR). Any value lower than that is an outlier.
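The 1.5(IQR) rule can be sketched in a few lines of Python. The quartiles and data values below are hypothetical, since the boxplot's exact quartiles cannot be read from this review; the function names are also illustrative.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Hypothetical quartiles and weight changes, for illustration only.
q1, q3 = 10, 30
print(outlier_fences(q1, q3))                        # lower and upper fences
print(find_outliers([-35, -5, 12, 25, 48], q1, q3))  # values flagged as outliers
```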
2d. We cannot tell. The lowest 25% for both groups includes both negative and positive values, but we don’t
know how many are negative and how many are positive. For example, it is possible that only 2 values are
negative in the A distribution and 12 values in the B distribution. Likewise, there may only be 1 negative value
in the B distribution while A can have up to 12 (in a set of 50 values, there will be 12 values below Q1).
3a. Note: This is just one possibility. You could make 4 pie charts or a comparative bar chart.
Note: It is better to focus on each grade separately in 4 segmented bar charts than looking at two charts (Obama
and McCain) split by grade level.
Note: Since the group sizes are so different, you must make your graph in terms of percents (relative
frequencies).
[Figure: CDO Mock Election; segmented bar chart showing the percent of each grade level (9, 10, 11, 12) preferring Obama and McCain, vertical axis from 0% to 100%]
3b. Grade level and presidential preference are not independent in this sample. If they were, then the same
percentage of each grade would prefer Obama. However, more than half of 9th, 10th, and 12th graders preferred
Obama while less than half of 11th graders preferred Obama. So, knowing a student’s grade level would help
you predict whom that student would vote for.
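The percentages behind this answer can be checked with a short Python sketch that uses the counts from the mock election table. The variable and function names are illustrative.

```python
# (Obama, McCain) counts by grade level, taken from the mock election table.
counts = {9: (233, 155), 10: (211, 188), 11: (119, 134), 12: (128, 118)}

def obama_share(obama, mccain):
    """Proportion of a grade's voters who preferred Obama."""
    return obama / (obama + mccain)

shares = {grade: obama_share(*pair) for grade, pair in counts.items()}

total_obama = sum(obama for obama, _ in counts.values())
total_votes = sum(obama + mccain for obama, mccain in counts.values())
overall = total_obama / total_votes

for grade, share in shares.items():
    print(f"Grade {grade}: {share:.1%} preferred Obama")
print(f"Overall: {overall:.1%} preferred Obama")
```

If grade level and preference were independent, each grade's percentage would match the overall percentage.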
4. Use a bar chart when the data is categorical and a histogram when the data is quantitative. In a bar chart, the
bars shouldn’t touch and can be in any order. In a histogram, the bars will touch unless there is an empty
class (bin).
5a. x̄ = 7.65, s = 1.30
5b. On average, the amount of sleep a student in the sample got is about 1.30 hours away from the mean.
5c. μ is the population mean (the average we would get if we surveyed the entire population of CDO
students). x̄ is the sample mean (7.65), which is an estimate of μ based on a sample of 10 CDO students. It
probably isn’t equal to μ, but hopefully it is close.
5d. z = (6 – 7.65) / 1.30 ≈ –1.27. The student who got only 6 hours of sleep is 1.27 standard deviations below the
mean of the sample.
5e. Since the z-score is a standardized score, it does not depend on the units so it would be the same.
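The calculations in 5a, 5d, and 5e can be reproduced with Python's standard library. This sketch assumes the sample consists of ten values; the value 6.5 is inferred from the answer key (n = 10 and x̄ = 7.65) rather than confirmed by the problem statement.

```python
from statistics import mean, stdev

# Assumption: ten sleep times in hours; 6.5 is inferred from the answer key.
hours = [6, 6.5, 6.75, 7.25, 7.5, 7.5, 7.5, 8, 9, 10.5]

x_bar = mean(hours)   # sample mean
s = stdev(hours)      # sample standard deviation (divides by n - 1)

def z_score(value, center, spread):
    """Number of standard deviations that value lies from center."""
    return (value - center) / spread

z_hours = z_score(6, x_bar, s)

# Convert to minutes: every value, the mean, and the SD are all multiplied by 60.
minutes = [h * 60 for h in hours]
z_minutes = z_score(6 * 60, mean(minutes), stdev(minutes))

print(round(x_bar, 2), round(s, 2))
print(round(z_hours, 2), round(z_minutes, 2))
```

Because the numerator and denominator of the z-score are both scaled by 60, the two z-scores agree.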
6a. The shape is single peaked and approximately symmetric.
6b.
Bin       Freq   Rel. Freq.   Cum. Rel. Freq.
10-<15    1      0.0333       0.0333
15-<20    1      0.0333       0.0667
20-<25    1      0.0333       0.1000
25-<30    3      0.1000       0.2000
30-<35    7      0.2333       0.4333
35-<40    7      0.2333       0.6667
40-<45    4      0.1333       0.8000
45-<50    3      0.1000       0.9000
50-<55    2      0.0667       0.9667
55-<60    0      0.0000       0.9667
60-<65    0      0.0000       0.9667
65-<70    1      0.0333       1.0000
[Figure: Ogive of Times for Test; Cum. Rel. Freq. (0 to 1) plotted against Time (0 to 80)]
6c. In the histogram, most of the data is between 25 and 55, so this is where the ogive is most steep (where the
most data is being added to the cumulative relative frequency). There is less data below 25 and above 55, so the
ogive is relatively flat here. It is perfectly flat from 55 to 65 where there is no data at all.
6d. Tracing over from .75, we estimate Q3 = 43 and tracing over from .25, we estimate Q1 to be 32. So, the
IQR is approximately 11.
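Tracing across an ogive amounts to linear interpolation between its plotted points. The sketch below builds the ogive from the bin frequencies in 6b and reads off the quartiles; the function names are illustrative. Exact interpolation gives values close to, but not identical to, estimates read by eye from a graph.

```python
# Bin edges and frequencies from the table in 6b.
edges = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
freqs = [1, 1, 1, 3, 7, 7, 4, 3, 2, 0, 0, 1]

def ogive_points(edges, freqs):
    """Pair each bin edge with the cumulative relative frequency up to that edge."""
    total = sum(freqs)
    cumulative = [0.0]
    running = 0
    for f in freqs:
        running += f
        cumulative.append(running / total)
    return list(zip(edges, cumulative))

def read_ogive(points, p):
    """Find the time at which the ogive reaches proportion p (linear interpolation)."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= p <= c1:
            return x0 + (p - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("p must be between 0 and 1")

points = ogive_points(edges, freqs)
q1 = read_ogive(points, 0.25)
q3 = read_ogive(points, 0.75)
print(round(q1, 1), round(q3, 1), round(q3 - q1, 1))
```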
7a. [Figure: histogram of the Sudoku completion times]
7b. Since the data is strongly skewed to the right, I would expect the mean to be greater than the median since
the mean is pulled in the direction of the skew.
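Although the question asks for reasoning without calculation, the prediction can be confirmed afterward with a quick check on the Sudoku times from question 7.

```python
from statistics import mean, median

# Sudoku completion times in minutes, from question 7.
times = [6, 6, 6, 7, 7, 8, 8, 10, 10, 10, 11, 11, 13, 15, 19, 25, 30]

print("mean:", round(mean(times), 2))
print("median:", median(times))
print("mean exceeds median:", mean(times) > median(times))
```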
7c. I would use a relative frequency histogram anytime I wanted to know the percent in a category and
especially when I am comparing two distributions with different sample sizes.
8a. range = 20 – 6 = 14, IQR = 16.5 – 11 = 5.5
8b. mean = 13.44(3) + 10 = 50.32
s = 3.67(3) = 11.01
median = 13.5(3) + 10 = 50.5
IQR = 5.5(3) = 16.5
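The rules used in 8b can be summarized in a small sketch: under new = a(old) + b, measures of center are multiplied by a and shifted by b, while measures of spread are only multiplied by |a|. The function name is illustrative.

```python
def transform_summary(mean, sd, median, iqr, a, b):
    """Apply new = a * old + b to summary statistics.

    Center (mean, median) is scaled and shifted; spread (sd, iqr) is only scaled.
    """
    return {
        "mean": a * mean + b,
        "sd": abs(a) * sd,
        "median": a * median + b,
        "iqr": abs(a) * iqr,
    }

# Summary statistics from question 8, scaled by 3 and shifted by 10.
scaled = transform_summary(mean=13.44, sd=3.67, median=13.5, iqr=5.5, a=3, b=10)
print({name: round(value, 2) for name, value in scaled.items()})
```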
8c. Sally’s score was the same as or better than the scores of 39% of the test takers.
8d. Still in the 39th percentile.
8e. To make the SD = 15, multiply each value by 15/3.67 = 4.09. Multiplying everything by 4.09 will make
the mean = 4.09(13.44) = 54.97 so add an additional 25.03 to each score.
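The same reasoning can be written as a general recipe: choose the multiplier from the ratio of standard deviations, then choose the shift so the mean lands on the target. The sketch below uses an illustrative function name and keeps the multiplier unrounded, so the shift comes out near 25.07 rather than the 25.03 obtained after rounding the multiplier to 4.09.

```python
def rescale_coefficients(mean, sd, target_mean, target_sd):
    """Return (a, b) so that a * x + b has the target mean and standard deviation."""
    a = target_sd / sd           # scaling sets the spread
    b = target_mean - a * mean   # shifting then sets the center
    return a, b

a, b = rescale_coefficients(mean=13.44, sd=3.67, target_mean=80, target_sd=15)
print(round(a, 2), round(b, 2))

# Check: the transformed mean and SD hit the targets.
print(round(a * 13.44 + b, 2), round(a * 3.67, 2))
```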
9a. min = 1, Q1 = 10, med = 30.5, Q3 = 38, max = 50
[Figure: boxplot of texts; axis from 0 to 60]
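The five-number summary can be recomputed directly from the stemplot in question 9. This sketch assumes the median-of-halves convention for quartiles; with it, the 20 values give Q1 = 10. The dictionary layout and function name are illustrative.

```python
from statistics import median

# Stems (tens digit) and leaves (ones digits) from the stemplot in question 9.
stemplot = {0: "11257", 1: "369", 2: "39", 3: "2446799", 4: "39", 5: "0"}

texts = sorted(
    10 * stem + int(leaf)
    for stem, leaves in stemplot.items()
    for leaf in leaves
)

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    return (data[0], median(data[:half]), median(data), median(data[-half:]), data[-1])

print(len(texts))
print(five_number_summary(texts))
```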
9b. With the stemplot you were able to see the double-peaked shape, which is not evident in the boxplot.
However, in the boxplot it is easy to see where the median is as well as measure the interquartile range (length
of the box).