Math 227 Review for Exam 1 Chapter 1 to 4 KEY
Make sure you write your answers in complete sentences and in context when applicable.
1. We collect these data from 50 male students. Which variable is qualitative (categorical) and which is
quantitative?
a. eye color: qualitative
b. head circumference: quantitative
c. marital status: qualitative
d. number of cigarettes smoked daily: quantitative
e. number of TV sets at home: quantitative
f. temperatures in Southern California for the past year: quantitative
g. weather conditions in Southern California in past year: qualitative
2. Identify the following research studies as observational or a controlled experiment. Explain why.
a. Data from the Motorcycle Industry Council stated that “Motorcycle owners are getting older and richer.”
Data were collected on the ages and incomes of motorcycle owners for the years 1980 and 1998 and then
compared. The findings showed considerable differences in the ages and incomes of motorcycle owners for
the two years. Observational, no treatment is assigned
b. A study conducted at Virginia Polytechnic Institute and presented by Psychology Today divided female
athletes into two groups and had the students perform as many sit-ups as possible in 90 seconds. The first
group was told only to ‘do your best,’ while the second group was told to try to increase the number of sit-ups they did each day by 10%. After 4 days, the first group averaged 43 sit-ups while the second group
averaged 56 sit-ups. The conclusion was that athletes who were given specific goals performed better than
those who were not given specific goals. Controlled experiment, treatment assigned (verbal instructions)
c. A recent study showed that eating garlic can lower blood pressure. Researchers prescribed garlic pills to
high blood pressure patients and monitored their results over a 6 month period. These results were then
compared to high blood pressure patients who had received a placebo. The doctors administering the pills
were not aware of which patients had received the treatment. Controlled experiment, treatment assigned
(garlic pills)
3. The dot plot below shows the ages for about 108 people in three community college math classes.
a. Any age 26 and over is considered unusually high for this sample. How many student ages are considered
unusually high for this sample? 21 students had ages that were unusual for this group
b. What percent of the sample was this? 21/108=19.4% of students’ ages are unusual
4. Answer the following questions given the distribution of following exam scores.
[Figure: Histogram of Chapter 3 Exam; x-axis: Chapter 3 Exam score (ticks from 40 to 110), y-axis: Frequency (ticks from 0 to 16)]
a) How many students took the chapter 3 exam? 37
b) What is the shape of the distribution of exam scores? Slightly skewed right or roughly symmetric
c) What was a typical score for this class (center)? About 75 points
d) What was the typical spread for this class? 70 to 90 –or– 70 to 80 –or– 65 to 85
e) How many students got at least an 80 on the exam? 11 (8 + 2 + 1)
f) What percentage of students got at least an 80 on the exam? 11/37 = 29.7% of students got at least an
80 on the exam.
g) How many students scored less than an 80 on the exam? 26 students (37-11 or 7+4+15)
h) What percentage of students scored less than an 80 on the exam? 26/37= 70.3% of students received
less than 80 on the exam.
i) What percentage of students scored below 70 on the exam? 7 + 4 = 11; 11/37 = 29.7% of students
scored below 70 on the exam.
j) Approximately what percentage of students scored from 70 to 90 on the exam? 15 + 8 = 23; 23/37 =
62.2% scored from 70 to 90 on the exam.
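The counts and percentages in question 4 can be checked with a short sketch. The grouping labels below are an assumption made for illustration; the counts themselves (7 + 4, 15, and 8 + 2 + 1) are the ones quoted in the answers above.

```python
# Counts quoted in the answers above, grouped the way the answers group them.
# The group labels are illustrative; the exact bin edges are not given in the transcript.
counts = {"below 70": 7 + 4, "70 to under 80": 15, "80 and above": 8 + 2 + 1}

total = sum(counts.values())  # number of students who took the exam


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


at_least_80 = percent(counts["80 and above"], total)
below_80 = percent(total - counts["80 and above"], total)
below_70 = percent(counts["below 70"], total)

print(total, at_least_80, below_80, below_70)
```

Subtracting from the total (37 - 11) and adding the lower groups (7 + 4 + 15) give the same count of 26, which is why part g lists both routes.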
5. Which is true of the data whose distribution is shown?
I. The distribution is skewed to the right. True
II. The mean is smaller than the median. False (right tail pulls mean to the right of the median)
III. We should summarize with mean and standard deviation. False, mean and standard deviation are used
for symmetric graphs.
6. Answer the following questions given the distribution of salaries of a random company.
[Figure: Relative frequency histogram of Salary; x-axis: Salary (In U.S. Dollars), ticks from 40000 to 100000; y-axis: Relative Frequency (%), ticks from 0 to 40]
a) What percentage of employees made a salary of less than $35,000? 25%
b) What percentage of employees made a salary of more than $80,000? 5%
c) 60% of employees made a salary of less than ____________? $45,000
d) How many employees made a salary of less than $35,000? Cannot be determined. The number of
employees is not given.
7. All students in the physical education class completed a basketball free-throw shooting event and the
highest number of shots made was 32. The next day, the PE teacher realized that he had made a mistake.
The best student had actually made 38 shots (not 32). Indicate whether changing the student’s score made
each of these summary statistics increase, decrease, or stay about the same:
a) Mean: class average increases
b) Median: center stays about the same
c) Range: overall range increases
d) IQR: spread of the middle 50% of the class stays about the same
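A small sketch can illustrate why only the mean and the range react when the top score changes. The class scores below are hypothetical (only the 32 and the corrected 38 come from the question), and the quartiles are taken as the medians of the lower and upper halves, which is one common convention and an assumption here.

```python
import statistics


def summary(values):
    """Return mean, median, range, and IQR (quartiles as medians of the two halves)."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])   # median of the lower half
    q3 = statistics.median(data[-half:])  # median of the upper half
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


# Hypothetical free-throw counts; only the top score (32, corrected to 38) is from the question.
recorded = [10, 12, 15, 15, 18, 20, 22, 25, 32]
corrected = recorded[:-1] + [38]

before = summary(recorded)
after = summary(corrected)

for name in before:
    print(name, before[name], after[name])
```

In this made-up class the mean and range go up, while the median and IQR do not move, because neither depends on how large the largest value is.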
8. The mean and median scores of a recent math 075 exam were close to 68%. The instructor decided not
to count one score of zero that was from an absent student to get a better representation of the class
average and then recalculated the new mean and median.
a) Will the new mean increase, decrease or remain about the same? Explain. Since a very low score
was dropped, the new mean will now be higher (class average will go up). The mean is sensitive to
outliers.
b) Will the new median increase, decrease or remain about the same? Explain. The new median will
be roughly the same since the person with the middle score is roughly in the same position. (The only
time it would increase would be if the two middle people had scores that were far apart from each
other.) The median is not sensitive to outliers.
c) True or false: The overall range increased. False, since the minimum changes from zero to the next
lowest score in the class, the overall range will get smaller. The variability will decrease.
d) True or false: The IQR remained about the same. True, the scores at the 25th and 75th percentile are
roughly in the same position. Therefore, the IQR will be close to the same amount.
9. The following boxplots compare the ages of all the Oscar Winners from 1970 to 2001. Use this to answer
the following questions.
Consider the distributions of ages for Oscar winning
actors and actresses.
a. 50% of winners were below what age? Actor: 42.5; Actress: 35
b. 75% of winners were below what age? Actor: 50.25; Actress: 41.5
c. 75% of winners were above what age? Actor: 37.25; Actress: 32
d. 25% of winners were above what age? Actor: 50.25; Actress: 41.5
Actor 5 Number Summary: 31, 37.25, 42.5, 50.25, 76
Actress 5 Number Summary: 21, 32, 35, 41.5, 80
e. How many outliers are there for each gender and what are they?
Actor: 1 outlier at 76 years old
Actress: 3 outliers at around 60, 74, and 80 years old
f. What are the shapes of the distributions?
Actor: right skewed
Actress: right skewed
g. Did a typical actor or actress win at a younger age? Explain. An actress typically wins an Oscar at a
younger age at 35 years of age compared to an actor who typically wins at 42.5 years.
h. What are the IQRs for actors and actresses? Interpret these IQRs. IQR for actors: 13 years; IQR for actresses:
9.5 years. The typical spread for the middle 50% of actors’ ages is 13 years, whereas the typical spread for the
middle 50% of actresses’ ages is 9.5 years. (There is more variability in typical ages for men compared to
women.)
i. Based on the IQRs, did actors or actresses win at a younger age? Explain. Based on these typical spreads
actors typically won at an age of 37.25 to 50.25 years of age and actresses typically won at an age of 32 to
41.5 years of age. Thus, actresses typically won at a younger age.
j. Which data set is more consistent and why? The data set for actresses is more consistent since the
typical spread is less dispersed (i.e., the IQR is smaller). This means that it is easier to predict a typical age for
the female group.
k. Did actors or actresses win at a younger age? Utilize percentages from the Boxplot of the distributions
above to support your answer. In conclusion, actresses tend to win at a younger age. 75% of actresses won
an Oscar at an age of 41.5 years or less. 75% of actors won an Oscar at an age of 50.25 years or less, which
is about a 9 year difference. (You could also use the 50% ages to support your conclusion.)
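The IQRs and outliers in question 9 can be checked from the five-number summaries. The sketch below assumes the usual boxplot convention that a value is an outlier when it lies more than 1.5 × IQR beyond a quartile; the answer key does not state which rule the software used.

```python
def iqr_and_fences(q1, q3):
    """Return the IQR and the lower and upper outlier fences (1.5 * IQR rule, assumed)."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Quartiles taken from the five-number summaries above.
actor_iqr, actor_low, actor_high = iqr_and_fences(37.25, 50.25)
actress_iqr, actress_low, actress_high = iqr_and_fences(32, 41.5)

print("Actor:", actor_iqr, actor_low, actor_high)
print("Actress:", actress_iqr, actress_low, actress_high)

# Ages named as outliers in part e, compared against the upper fences.
print(76 > actor_high)
print(all(age > actress_high for age in (60, 74, 80)))
```

Under that assumption both minimum ages (31 and 21) sit above the lower fences, which agrees with the key listing only high outliers.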
10. The following data represent the annual chocolate sales (rounded to nearest billions of dollars) for a
sample of seven countries in the world. Round answers to nearest tenths.
2, 5, 7, 2, 5, 3, 18
a. Find the mean for the data. Write the answer in a complete sentence in context. The mean annual
chocolate sales for these seven countries were 6 billion dollars.
b. Calculate the standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$. Write the answer in a complete sentence in context.
The mean chocolate sales were six billion dollars with a standard deviation of 5.6 billion
dollars. (or) Typical chocolate sales are 6 billion dollars ± 5.6 billion dollars.
c. Using this standard deviation, one could then expect typical annual chocolate sales to be between which
two values? 6 - 5.6 = 0.4 and 6 + 5.6 = 11.6.
One can expect typical annual chocolate sales for these countries to be between 0.4 and 11.6 billion
dollars, or between $400,000,000 and $11,600,000,000.
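The arithmetic for question 10 can be reproduced with Python's standard `statistics` module, as a quick check rather than a required method. `statistics.stdev` uses the sample formula with n - 1 in the denominator, the same formula given in part b.

```python
import statistics

# Annual chocolate sales, in billions of dollars, from question 10.
sales = [2, 5, 7, 2, 5, 3, 18]

mean = statistics.mean(sales)  # sum of the values divided by n
s = statistics.stdev(sales)    # sample standard deviation (divides by n - 1)

# One standard deviation below and above the mean.
low, high = mean - s, mean + s

print(mean, round(s, 1), round(low, 1), round(high, 1))
```

The squared deviations from the mean add up to 188, so s is the square root of 188/6, which rounds to 5.6.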
11. Answer the following questions with a letter I, II, III, or IV. Explain your choice in complete
sentences for each question. Histograms can be used more than once and some answers might have more than one
answer.
[Figure: histograms I, II, III, and IV, not reproduced. Text recovered from the figure: Checkpoint Topic 2.1, 12/24/11, Question 5: "Which of the histograms could represent a distribution of weights of babies for a large random sample of male newborns at a local hospital?" Choices: A. I, B. II, C. III]
A. Which graph would represent a distribution of the ages of math 075 students where there is a high
percentage of students who recently graduated high school and very few students who are over 50? Explain.
II - Most of the data will be clustered on the lower end (left) and very little data will be on the higher end
(right).
B. Name all graphs where the mean would be chosen as the best measure of center. Explain.
III- Mean is the best measure of center for symmetric graphs only.
C. Name all graphs where the IQR (interquartile range) would be chosen as the best measure of spread.
Explain. I, II, IV- The IQR is the best measure of spread for non-symmetric graphs since it is not
sensitive to outliers.
D. Which graph would represent a distribution for the heights of koala bears? Explain.
III- Measurements of species (humans, plants, and animals) are roughly symmetric. Most fall within a
typical range with fewer high and low values.
12. The ten top grossing Pixar Animated movies for the US box office up to June 2010 are shown below, in
millions of dollars.
a. Find the median. The median is 245 million dollars, so a typical Pixar movie made about 245 million dollars.
b. Find the interquartile range (IQR) and interpret the meaning of the IQR in context. 261-206 = 55 million
dollars. Examples: The typical spread of revenue for Pixar movies was 55 million dollars. This means that
the spread between the middle 50% of the revenues was 55 million dollars.
c. Between which two values does a typical movie gross? Pixar typically made from 206 to 261 million
dollars. (Q1 and Q3)
Movie             $Millions
Toy Story         192
A Bug’s Life      163
Toy Story 2       246
Monsters, Inc.    256
Finding Nemo      340
The Incredibles   261
Cars              244
Ratatouille       206
WALL-E            224
Up                293
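The median and quartiles for question 12 can be verified from the table. This sketch assumes the quartiles are found as the medians of the lower and upper halves of the sorted data, which reproduces the key's values of 206 and 261; other software conventions can give slightly different quartiles.

```python
import statistics

# US box office gross in millions of dollars, from the table in question 12.
gross = {
    "Toy Story": 192, "A Bug's Life": 163, "Toy Story 2": 246,
    "Monsters, Inc.": 256, "Finding Nemo": 340, "The Incredibles": 261,
    "Cars": 244, "Ratatouille": 206, "WALL-E": 224, "Up": 293,
}

values = sorted(gross.values())
half = len(values) // 2

median = statistics.median(values)     # average of the two middle values
q1 = statistics.median(values[:half])  # median of the lower five values
q3 = statistics.median(values[-half:]) # median of the upper five values
iqr = q3 - q1

print(median, q1, q3, iqr)
```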
13. The following graphs show the distributions of the ages in years of a pre statistics class for students in
the Fall of 2014.
[Figure: Histogram of Ages; x-axis: Ages (ticks from 15 to 90), y-axis: Frequency (ticks from 0 to 350)]
[Figure: Dotplot of Ages; x-axis: Ages (ticks from 24 to 96). Each symbol represents up to 5 observations.]
[Figure: Boxplot of Ages; y-axis: Ages (ticks from 10 to 100)]
Note: Age 26 is the first outlier.
Descriptive Statistics: Ages
Variable  Mean    StDev  Minimum  Q1    Median  Q3    Maximum  IQR
Ages      21.128  6.812  15.0     18.0  19.0    21.0  98.0     3.0
a. Was this a categorical or quantitative study? Quantitative (ages)
b. What is the variable (variables)? ages
c. What is the shape of the distribution in the ages? Right skewed
d. Which measure of typical center is best to use? Mean or Median? Explain. The median would be a
better representation of the typical center since the graph is skewed.
e. Which measure of typical spread is best to use? Standard Deviation or IQR? Explain. The IQR would be
a better representation of the typical spread since the graph is skewed.
f. What is the typical center? Complete sentence in context. Using the provided descriptive statistics, the
median is 19.0. This means that a typical age for a student was 19 years old.
g. What is the typical spread? Complete sentence in context. The IQR is given as 3. This means that the
spread between the middle 50% of the ages was only 3 years.
h. What ages are considered unusual for this group? Were there any students that were unusually
younger or older for this sample? It was given that 26 was the first outlier so any student 26 and older is
considered unusual for this group. There were many outliers in this group (too many to count). The
oldest being close to 100.
14. According to the data above for the ages, the mean was 21.1 years with a standard deviation of 6.8
years. The following question is to practice standard deviation. In reality, since the graph was skewed,
these values are not a good representation of what was typical for this group. The median and IQR
would be used instead.
But for practice:
a. What is the range of ages from one standard deviation below the mean to one standard deviation
above the mean? 14.3 to 27.9 years old. (Typical ages)
b. What is the range of ages from two standard deviations below the mean to two standard deviations
above the mean? 7.5 to 34.7 years old. (Anyone over 34.7 years old is unusual for this group)
c. What is the range of ages from three standard deviations below the mean to three standard
deviations above the mean? 0.7 to 41.5 years old. (Anyone over 41.5 years old was extremely unusual
for this group)
d. Is the age of 25 years more than one standard deviation above the mean? Show by converting to a z-score
using the formula $z = \dfrac{x - \bar{x}}{s}$. $z = \dfrac{25 - 21.1}{6.8} \approx 0.6$. No, it is not more than one standard deviation
above the mean. The z-score is 0.6. This means a 25-year-old was typical for this group.
e. There was a 98-year-old student, which is unusual. How unusual is she: highly unusual (z-score above 2) or
extremely unusual (z-score above 3)? $z = \dfrac{98 - 21.1}{6.8} \approx 11.3$. She was extremely unusual, with a z-score
of 11.3! This is much higher than a z-score of 3, meaning this is extremely rare and highly unlikely to happen
again.
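The ranges and z-scores in question 14 can be checked with a few lines. The function names `band` and `z_score` are made up for this sketch; the mean of 21.1 and standard deviation of 6.8 are the rounded values given in the question.

```python
MEAN = 21.1  # mean age from question 14
SD = 6.8     # standard deviation from question 14


def band(k):
    """Return the ages k standard deviations below and above the mean, rounded to 0.1."""
    return round(MEAN - k * SD, 1), round(MEAN + k * SD, 1)


def z_score(x):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - MEAN) / SD


for k in (1, 2, 3):
    print(k, band(k))

print(round(z_score(25), 1))  # the 25-year-old from part d
print(round(z_score(98), 1))  # the 98-year-old from part e
```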
15. A dietitian is interested in comparing the sodium content of real cheese with the sodium content of a
cheese substitute in milligrams and asks you (the statistician) to provide data that supports her belief that
cheese substitutes typically contain more sodium. You collect the sodium content of several real cheeses
and cheese substitutes. Using computer technology, you provide the following box plots and sample
statistics.
Using the following statistics and graphs, decide whether the dietitian’s belief is correct. Support your
decision with the statistics provided. (Include discussion of the shapes, any outliers and the best
measures of center and spread to support your decision.) Answers will vary but the conclusion should
be that: The typical sodium content for real cheese was between 56.3 and 292.5 mg. (Half of the samples
fell within this range.) The typical sodium content for cheese substitute was between 197.5 and 305 mg.
(Half of the samples fell within this range.) Although there was more variability in the real cheese, a
typical sample had lower sodium in general. Note, however, that the maximum typical value of sodium
was about the same for both. (Even though real cheese is the better choice, we can note that the upper
25% of the samples were higher in sodium content compared to the substitute cheese due to real
cheese having more variability.)
                   N  Mean      SD        Minimum  Q1        Median  Q3        Maximum
real cheese        8  193.1 mg  133.2 mg  40 mg    56.3 mg   200 mg  292.5 mg  420 mg
cheese substitute  8  253.8 mg  68.6 mg   130 mg   197.5 mg  265 mg  305 mg    340 mg
[Figure: Boxplot of real cheese and cheese substitute; y-axis: Data (ticks from 0 to 400)]
16. In the real cheese/cheese substitute boxplots, which type had more variability?
(Using the descriptive statistics) The typical spread for real cheese was (Q3 – Q1) 236.2 mg. The typical
spread for cheese substitute was 107.5 mg. Thus the real cheese had more variability. The cheese
substitute was more consistent.
17. Which would have a larger standard deviation? The mile times of the male high school track teams in
the U.S. or the mile times of the male participants in the last Olympics? High school teams since their
times would be more spread out or dispersed (more variability).
18. In 2007, the mean property crime (per 100,000 people) for the 26 states east of the Mississippi River
was 409 with a standard deviation of 193. Assume the distribution was roughly symmetric and unimodal.
a. Between which two values would you expect to find about 68% of the rates? About 68% of the rates fall
within one standard deviation of the mean, so between 216 and 602 crimes per 100,000 people.
b. Between which two values would you expect to find about 95% of the rates? About 95% of the rates fall
within two standard deviations of the mean, so between 23 and 795 crimes per 100,000 people.
c. If an eastern state had a violent crime rate of 503 crimes per 100,000 people, would you consider this
unusual? Explain. No, 503 crimes falls within one standard deviation of the mean. 503 crimes is within the typical
values.
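The intervals in question 18 come from the mean plus or minus one and two standard deviations. Written out as a worked calculation (the z-score for 503 is added here only as a cross-check and is not part of the original key):

```latex
\begin{align*}
\bar{x} \pm s  &= 409 \pm 193    &&\Rightarrow\ 216 \text{ to } 602 \quad (\text{about } 68\%)\\
\bar{x} \pm 2s &= 409 \pm 2(193) &&\Rightarrow\ 23 \text{ to } 795 \quad (\text{about } 95\%)\\
z &= \frac{503 - 409}{193} \approx 0.49 &&\Rightarrow\ \text{less than one standard deviation above the mean}
\end{align*}
```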
19. When would you choose the median as the best measure of center? Median is appropriate for nonsymmetrical graphs. (Mean would be appropriate for symmetrical graphs.)
20. When would you choose the standard deviation as the best measure of spread? Standard deviation is
appropriate for symmetrical graphs. (IQR would be appropriate for non-symmetrical graphs.)
21. Data was collected and a scatterplot was constructed that compared a person’s cholesterol reading and
the number of servings of vegetables consumed in a week.
a. Would you expect a positive or negative association? Negative since the more vegetables one consumes,
the expected cholesterol reading would decrease. (As x increases, y decreases)
b. Identify the explanatory and response variable. Explanatory: number of vegetable servings. Response:
cholesterol reading.
c. The correlation coefficient, r, is found to be -0.782. Can it be concluded that increasing the number of
servings of vegetables will cause lower cholesterol readings? Explain. Although the r value shows a strong
negative correlation, we cannot conclude causation. The conclusion can only be stated as “may cause”
lower cholesterol readings. We can only infer that the two are associated.
d. The researcher found that there was an outlier for a person who ate 500 servings of vegetables in a
week. It was discovered that it was a typo and this value was changed to the correct number of 50 servings
a week. How will the r value be affected?
22. Match each description of a set of measurements to a scatterplot. r = -0.689, r = 0.728, r = 0.562
[Figure: three scatterplots, not reproduced. Answers as listed under the plots:] r = 0.728, r = 0.562, r = -0.689
23. a. What is extrapolation? Making predictions beyond the scope of the data.
b. Why is it important to not extrapolate? We are not guaranteed that the linear trend will continue
beyond the scope of the data provided which can then lead to incorrect predictions.
24. The following scatterplot compares women’s heights with their weights.
[Figure: Scatterplot of women’s height and weight; x-axis: height in inches (ticks from 61 to 68), y-axis: weight in pounds (ticks from 100 to 170)]
a. Describe the association between height and weight. As a woman’s height increases, her weight
tends to increase as well.
b. Assume a linear trend exists. Draw a best fit line and use the line to roughly predict the weight of a
woman who is 65 inches tall (5 feet, 5 inches). You should draw your line roughly through the center of the
data.
25. The regression equation for predicting a woman’s weight based on height (from above) is given by:
Predicted weight = -443 + 9.03(height)
a. Find the predicted weight of a woman who is 65 inches tall, if applicable. The predicted weight of a
65-inch-tall woman is 143.95 pounds, or about 144 pounds.
b. Find the predicted weight of a woman who is 59 inches tall, if applicable. Not applicable: 59 inches is
outside the range of heights in the data, and we should not make predictions beyond the scope of the data.
c. Interpret the slope in context. For each additional inch in a woman’s height, her predicted weight increases by
9.03 pounds. (Note: the ‘x’ is the height and the ‘y’ is the weight.)
d. Interpret the y-intercept in context. If a woman measured zero inches in height, her predicted weight would be
-443 pounds, which in this case has no meaning.
e. Computer software computes the r value to be r = 0.881. Interpret the value of the correlation
coefficient, r. (Sorry I left out the first sentence in the original document.)
There is a strong positive association between a woman’s height and weight.
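The predictions in question 25 can be sketched as a small function that refuses to extrapolate. The intercept and slope come from the regression equation above; the height range of 61 to 68 inches is an assumption read from the scatterplot's axis ticks, and `predict_weight` is a name made up for this illustration.

```python
INTERCEPT = -443.0       # y-intercept of the regression equation
SLOPE = 9.03             # pounds of predicted weight per inch of height
HEIGHT_RANGE = (61, 68)  # assumed scope of the data, read from the axis ticks


def predict_weight(height_in):
    """Return the predicted weight in pounds, or None if the height is outside the data."""
    low, high = HEIGHT_RANGE
    if not low <= height_in <= high:
        return None  # extrapolation: the linear trend may not hold out here
    return INTERCEPT + SLOPE * height_in


print(round(predict_weight(65), 2))  # inside the range of the data
print(predict_weight(59))            # outside the range of the data
```

Returning `None` for 59 inches mirrors the key's answer to part b: the equation would produce a number, but that number is not trustworthy beyond the scope of the data.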
26. The math department at a particular college wants to investigate the use of the newly developed math
tutorial program. They decide to sample students to find out about their participation. Several plans for
choosing the sample are proposed.
Plan a) Students are divided into groups according to their math level (below average, average, and above
average). Then twenty students are selected from each group and interviewed to determine whether they
participated in the school's tutorial program.
Plan b) Every hundredth student who registers is asked whether they participated in the school's tutorial
program.
Plan c) Students are divided into groups according to their math level (below average, average, and above
average). Then all students in the average and above average groups are chosen and interviewed to
determine whether they participated in the school's tutorial program.
Plan d) Students are selected to be interviewed to determine whether they participated in the school's
tutorial program. The researcher goes to the tutoring room and interviews students as they come in.
Plan e) 100 students are chosen according to student ID numbers generated by a computer program.
Plan f) Students are mailed a questionnaire to determine whether they participated in the school’s tutorial
program.
A. Which of the above would be a good method to randomly select students? Why? Plans A, B, & E.
These three methods are not biased in the way participants are selected.
B. Which of the above methods might result in being biased? Why? Plans C, D, & F.
Plan C, Students in the below average group have been excluded and do not have the opportunity to be
selected.
Plan D, Only students who are already going to tutoring are being interviewed. The students who don’t
go to tutoring are being excluded.
Plan F, Students have to volunteer to be in the study. Thus, students who do not mail the questionnaire
are excluded.
C. Name the type of sampling used for each of the above.
Plan A: Stratified, Plan B: Systematic, Plan C: Cluster, Plan D: Convenience, Plan E: Random, Plan F:
Voluntary
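The three unbiased plans (A, B, and E) can be sketched in code to show how the selection rules differ. The roster below is hypothetical: 3000 student IDs with math levels assigned in rotation, purely for illustration. The sample sizes (100 students, every hundredth student, 20 per group) follow the plan descriptions.

```python
import random

rng = random.Random(227)  # fixed seed so the sketch gives the same samples each run

# Hypothetical roster: (student ID, math level) pairs, in registration order.
levels = ["below average", "average", "above average"]
roster = [(student_id, levels[student_id % 3]) for student_id in range(1, 3001)]

# Plan E, simple random sample: 100 students chosen by computer, each equally likely.
simple_random = rng.sample(roster, 100)

# Plan B, systematic sample: every hundredth student who registers.
systematic = roster[99::100]

# Plan A, stratified sample: 20 students chosen at random from each math level.
stratified = []
for level in levels:
    group = [student for student in roster if student[1] == level]
    stratified.extend(rng.sample(group, 20))

print(len(simple_random), len(systematic), len(stratified))
```

In all three, who ends up in the sample is decided by a rule or by chance, not by who is convenient to reach (Plan D) or who chooses to respond (Plan F).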
27. Researchers reported that a newly discovered herb helps lower cholesterol in people with high
cholesterol in the United States. To test this claim, a study is conducted among 1000 high cholesterol
patients in the United States. The mean decrease in cholesterol readings is found to be 15 mg/dL.
Identify the following:
i. the population: all people in the United States with high cholesterol
ii. the sample: 1000 people with high cholesterol
iii. the parameter (if known, state it): mean decrease in cholesterol readings in all people in the United States with
high cholesterol (not known)
iv. the statistic (if known, state it): mean decrease in cholesterol readings for the sample of 1000 people
with high cholesterol, which was a decrease of 15 mg/dL.
28. What is the difference between a parameter and a statistic? A statistic is a numerical summary computed
from a sample, and a parameter is a numerical summary of the entire population.