MDM4U1 Data Management and Statistics
Statistics of Two Variables Test 2
Multiple Choice [12]
Identify the choice that best completes the statement or answers the question.
1. The numbers of monitors sold by a computer store on 11 consecutive business
days are: 4, 10, 10, 11, 10, 10, 13, 30, 12, 20, and 24. What are the mean, median,
and mode for this set of data?
a. mean 14, median 11, mode 10
b. mean 10, median 11, mode 14
c. mean 17, median 14, mode 10
d. mean 11, median 14, mode 10
Answer: a.
2. The numbers of parking violations on a main street during each hour from 9:00
to 17:00 are: 8, 28, 18, 12, 28, 21, 28, and 25. What are the mean, median, and
mode for this set of data?
a. mean 28, median 23, mode 21
b. mean 18, median 21, mode 28
c. mean 23, median 21, mode 28
d. mean 21, median 23, mode 28
Answer: d.
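The answers to questions 1 and 2 can be checked with a short Python sketch using the standard-library `statistics` module. The data are exactly as listed in the two questions; the variable names are only for illustration.

```python
from statistics import mean, median, mode

# Data exactly as given in questions 1 and 2
monitors = [4, 10, 10, 11, 10, 10, 13, 30, 12, 20, 24]  # 11 business days
violations = [8, 28, 18, 12, 28, 21, 28, 25]            # 8 hourly counts, 9:00 to 17:00

for label, data in (("Q1 monitors", monitors), ("Q2 violations", violations)):
    # median() sorts a copy and, for an even count, averages the two middle values
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

For question 2 the count is even, so the median is the average of the 4th and 5th ordered values, (21 + 25) ÷ 2 = 23.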
3. Which of the following is not a characteristic of the mean?
a. It is the most familiar and widely used measure of central tendency.
b. It is affected by the value of every piece of data.
c. It can be computed when data are grouped, even if the last group is open-ended.
d. Its value can be influenced greatly by outliers.
Answer: c.
4. A box-and-whisker plot does not show the
a. mean
b. first quartile
c. third quartile
d. median
Answer: a.
5. Which set of data would probably show a strong positive linear correlation?
a. Marks on a history test and the heights of the students
b. The number of defective light bulbs produced and the time of the day when they
were manufactured
c. The colour of cars sold and the annual income of the car buyers
d. The height of corn in a field and the amount of precipitation during the growing
season.
Answer: d.
6. Which of the following statements is false?
a. The correlation coefficient is the covariance divided by the product of the
standard deviations for X and Y.
b. The correlation coefficient is also called the Pearson product-moment coefficient of
correlation.
c. The value of the correlation coefficient always lies in the range 0 ≤ r ≤ 1.
d. The correlation coefficient is a quantitative measure of the relationship between
two variables.
Answer: c.
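Statement a in question 6 gives the formula for r. A minimal Python sketch of that formula follows; the function name `pearson_r` and the sample data are hypothetical, chosen to show that r can be negative, which is why statement c is false.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient: covariance of x and y divided by s_x * s_y.

    Population formulas (divide by n) are used throughout; with the sample
    formulas (divide by n - 1) the factors cancel and r is unchanged.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov_xy / (s_x * s_y)

# Hypothetical data: a perfect positive trend, a perfect negative trend, and a noisy one
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # perfect positive
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # perfect negative
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # about 0.8
```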
7. A set of data with a correlation coefficient of –0.55 has a
a. strong negative linear correlation
b. moderate negative linear correlation
c. weak negative linear correlation
d. little or no linear correlation
Answer: b.
8. Using a linear-regression equation to predict values between actual data points is
an example of
a. extrapolation
b. residuals
c. least-squares fit
d. interpolation
Answer: d.
9. Using a linear-regression equation to predict values outside the range of the data
is an example of
a. extrapolation
b. residuals
c. least-squares fit
d. interpolation
Answer: a.
10. For the line of best fit in the least-squares method,
a. the sum of the squares of the residuals has the greatest possible value
b. the sum of the squares of the residuals has the least possible value
c. the sum of the residuals is equal to one
d. both b) and c)
Answer: b.
11. The coefficient of determination, r², indicates
a. the linear relationship between two variables
b. the slope of the line of best fit
c. how closely the data fit a defined curve
d. the sum of the residuals from each data point
d. the sum of the residuals from each data point
Answer: c.
12. Which of the following statements is false?
a. The coefficient of determination can have values from –1 to 1.
b. The coefficient of determination can be applied to any regression curve.
c. The coefficient of determination is the variation in y explained by variation in x,
divided by the total variation in y.
d. Manually calculating the coefficient of determination is not practical for large
sets of data.
Answer: a.
Completion [6]
Complete each statement.
13. In a set of data, the sum of the values of a variable divided by the total number
of values is the mean.
14. Values that are distant from the majority of values in a set of data are outliers.
15. Quantities that indicate how closely a set of data clusters around its center are
measures of spread, such as the variance, interquartile range, and standard deviation.
16. The difference between an individual value and the mean in a set of data is
called the deviation.
17. The square root of the mean of the squares of all the deviations in a set of data
is the standard deviation.
18. The mean of the squares of all of the deviations in a set of data is the variance.
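Questions 13 to 18 define the mean, deviation, variance, and standard deviation. The definitions translate directly into a few lines of Python. The data set is hypothetical, and the results are compared with the standard-library functions `statistics.pvariance` and `statistics.pstdev`, which use the same divide-by-n (population) definitions.

```python
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

n = len(data)
mean = sum(data) / n                            # sum of the values / number of values
deviations = [x - mean for x in data]           # each value minus the mean
variance = sum(d ** 2 for d in deviations) / n  # mean of the squared deviations
std_dev = variance ** 0.5                       # square root of the variance

print(mean, deviations, variance, std_dev)
print(pvariance(data), pstdev(data))            # library versions agree
```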
Problems
19. The numbers of service calls a heating company made during the first 11 days
of October are 6, 28, 28, 11, 30, 21, 17, 25, 28, 28, and 20. Find the mean, median,
and mode for this set of data.
Solution Arrange the data in increasing order:
6, 11, 17, 20, 21, 25, 28, 28, 28, 28, 30.
The middle (6th) value, 25, is the median. The value 28 occurs most often (four times), so it is the mode.
Mean = sum ÷ 11 = 242 ÷ 11 = 22.
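The steps for question 19 (sort, take the middle value, count frequencies, divide the sum by the count) can be sketched by hand in Python without any library. For these data the sketch gives a median of 25, a mode of 28, and a mean of 22; the variable names are illustrative only.

```python
calls = [6, 28, 28, 11, 30, 21, 17, 25, 28, 28, 20]  # data from question 19

ordered = sorted(calls)
n = len(ordered)                    # 11 values, an odd count
median = ordered[n // 2]            # index 5 is the 6th value, the middle one
mean = sum(ordered) / n             # 242 / 11
counts = {v: ordered.count(v) for v in set(ordered)}
mode = max(counts, key=counts.get)  # value with the highest frequency

print(ordered)
print("mean:", mean, "median:", median, "mode:", mode)
```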
20. The ages, in years, of a group of friends are:
29, 33, 36, 48, 50, 51, 53, 53.
a) Find the mean, median, and mode of the ages.
b) Explain what each of these measures tells you about this group of friends.
c) What do the relative values of the mean and median tell you about the group?
Solution
a) Median = (48 + 50) ÷ 2 = 49,
Mode = 53,
Mean = sum ÷ 8 = 353 ÷ 8 = 44.125.
b) The mean shows the average age of the 8 friends. The median shows the middle of
the ages when they are listed in increasing order. The mode shows the most
common age.
c) The mean (44.125) is lower than the median (49), so most of the friends (5 of 8) are
older than the mean. The few younger friends (29, 33, and 36) are young enough to
pull the mean down below the median.
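Part (c) of question 20 can be made concrete by listing which ages fall above and below the mean. A short sketch using the standard-library `statistics` module; `multimode` returns every most-common value, which guards against data with more than one mode.

```python
from statistics import mean, median, multimode

ages = [29, 33, 36, 48, 50, 51, 53, 53]  # data from question 20

avg = mean(ages)         # 353 / 8 = 44.125
mid = median(ages)       # (48 + 50) / 2 = 49
modes = multimode(ages)  # every most-common value: [53]

older = [a for a in ages if a > avg]
younger = [a for a in ages if a < avg]
print(avg, mid, modes)
print(len(older), "friends are older than the mean:", older)
print("younger friends pulling the mean down:", younger)
```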
21. Each child in a study of infantile autism was given a behavioral test and graded
on a scale from 0 (no symptoms) to 116 (maximum severity). The scores of the 21
children in the study were:
27, 25, 65, 67, 47, 46, 63, 44, 34, 51, 17, 40, 41, 60, 24, 48, 29, 73, 60, 41, 47
Calculate the mean score, the standard deviation, and the variance.
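Solution Use a calculator!
Mean = sum ÷ 21 = 949 ÷ 21 ≈ 45.19,
Standard deviation (sample, s) ≈ 15.62,
Variance (s²) ≈ 243.96.
(Using the population formulas instead gives σ ≈ 15.24 and σ² ≈ 232.34.)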
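The calculator work for question 21 can be reproduced with Python's `statistics` module. Computed from the 21 listed scores, the sum is 949 and the mean is about 45.19. The standard deviation of 15.62 corresponds to the sample formula (dividing by n − 1), while the definitions in questions 17 and 18 describe the population formulas (dividing by n), so the sketch prints both.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Scores of the 21 children, as listed in question 21
scores = [27, 25, 65, 67, 47, 46, 63, 44, 34, 51, 17,
          40, 41, 60, 24, 48, 29, 73, 60, 41, 47]

print("n =", len(scores), "sum =", sum(scores))
print("mean:", round(mean(scores), 2))
# Sample statistics divide the sum of squared deviations by n - 1
print("sample s:", round(stdev(scores), 2), "s^2:", round(variance(scores), 2))
# Population statistics divide by n, matching the definitions in questions 17 and 18
print("population sigma:", round(pstdev(scores), 2), "sigma^2:", round(pvariance(scores), 2))
```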
22. A consumer magazine evaluated 39 models of bathroom scales. The table
below lists the price of each model (rounded to the nearest dollar).
Scale Model               Price ($)
EconoHealth A10           50
EconoHealth A12           50
EconoHealth B10           50
EconoHealth E10           28
EconoHealth Digital-10    65
EconoHealth E-20          40
EconoHealth E-30          50
HealthSkale 190           22
HealthSkale 210           32
HealthSkale 211           30
HealthSkale 290 Deluxe    79
HealthSkale 310           50
HealthSkale 1000          23
HealthSkale 1002          20
HealthXact 12573          35
HealthXact 12756          24
HealthXact 12856          25
Prowt P10A                120
Prowt Value               35
Prowt Value 2             35
Superskale 6400           65
Superskale 7200           20
Superskale 8000           14
Superskale 8280           25
SvelteCheck 12300         24
SvelteCheck 12300D        48
SvelteCheck 12509         15
SvelteCheck 12510         10
SvelteCheck Fashion       17
SvelteCheck Pro           50
SvelteCheck Xtra          25
Weighbeter 550            22
Weighbeter 801D           60
Weighbeter 830            30
Weighbeter 835            30
Weighbeter 950            10
Weighbeter 2000           12
Weighbeter 2100           20
Weighbeter Basic          12
What is the z-score for the price of
a) the Weighbeter 801D scale?
b) the Weighbeter 830 scale?
Solution Mean = 1372 ÷ 39 ≈ 35.1795, Standard deviation (population, σ) ≈ 21.74
a) z-score = (60 − 35.1795) ÷ 21.74 ≈ 1.14
b) z-score = (30 − 35.1795) ÷ 21.74 ≈ −0.238
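The z-score calculation for question 22 can be sketched in Python. The standard deviation of 21.74 used in the solution matches the population formula (dividing by n = 39); the sample formula would give about 22.0 instead. `z_score` is a hypothetical helper name.

```python
from statistics import mean, pstdev

# Prices ($) of the 39 scale models, in the order listed in the table for question 22
prices = [50, 50, 50, 28, 65, 40, 50, 22, 32, 30, 79, 50, 23, 20, 35, 24, 25, 120, 35, 35,
          65, 20, 14, 25, 24, 48, 15, 10, 17, 50, 25, 22, 60, 30, 30, 10, 12, 20, 12]

mu = mean(prices)       # 1372 / 39
sigma = pstdev(prices)  # population standard deviation (divides by n)

def z_score(x):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(round(mu, 4), round(sigma, 2))
print("Weighbeter 801D:", round(z_score(60), 2))
print("Weighbeter 830:", round(z_score(30), 3))
```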
23. Classify the type of linear correlation you might expect with the following pairs
of variables.
a) Hours of lacrosse practice, goals scored in a lacrosse game
b) Students’ average marks, the numbers of siblings in their families
c) Distances from students’ homes to their schools, the time they spend on the
school bus each day
d) Amounts of television watched per day, scores on an aerobic fitness test
e) Years without an accident, driving insurance rates
Answers:
a) Moderate positive correlation
b) No correlation
c) Strong positive correlation
d) Moderate negative correlation
e) Strong negative correlation
24. Does the slope of the line of best fit tell you anything about the strength of a
linear correlation? Explain why or why not.
Answer: No. The slope of the line of best fit shows how steeply y changes as x changes.
However, it does not take into account how close or far the data points are from the
line, which is what the strength of a correlation describes.
25. When should outliers be excluded from a regression analysis?
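Answer: Outliers should be excluded from a regression analysis only when there is reason
to believe they are not representative of the data, for example when they result from a
measurement or recording error. A point lying far from the rest (such as at the beginning
or end of the range) should be checked, but not removed simply because it is far away.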
26. Find a regression equation with a coefficient of determination greater than 0.98
for the following set of data.
x: 0, 1, 2, 3, 4, 5
y: 2.5, 7.5, 22.5, 67.5, 202.5, 607.5
Solution y = 2.5 × 3ˣ, r² = 1 (each y-value is 3 times the one before it, starting from 2.5 at x = 0).
27. Find a regression equation with a coefficient of determination greater than 0.98
for the following set of data.
x: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6
y: 0.51, 0.54, 0.59, 0.66, 0.75, 0.86
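Solution Use a calculator! y = x² + (2.7 × 10⁻¹³)x + 0.5, r² = 1. The tiny x-coefficient is
only calculator rounding error, so the equation is effectively y = x² + 0.5, which fits every
data point exactly.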
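A quadratic regression minimizes the sum of squared residuals for y = ax² + bx + c, which leads to three linear "normal equations" in a, b, and c. The sketch below builds and solves them with plain Gaussian elimination; `quadratic_fit` is a hypothetical helper, not a library function, and a graphing calculator may use a different numerical method.

```python
def quadratic_fit(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                    # s[k] = sum of x^k
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of x^k * y
    # Augmented matrix for the unknowns (a, b, c)
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    n = 3
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    # Back substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = (m[r][n] - known) / m[r][r]
    return coeffs  # [a, b, c]

xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
ys = [0.51, 0.54, 0.59, 0.66, 0.75, 0.86]
a, b, c = quadratic_fit(xs, ys)
print(a, b, c)  # close to 1, 0 and 0.5; b is tiny but usually not exactly 0
```

As with the calculator result, floating-point rounding typically leaves a tiny nonzero x-coefficient rather than an exact 0.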
28. A student has a theory that left-handed people understand mathematical logic
more readily than right-handed people do. This student finds that the average mark
for the three left-handed students in his mathematics class is 3.2% higher than the
class average. Does his statistic prove the student’s theory?
Answer: No. Three data points are not enough to make a generalization about the
population of left-handed people, which numbers in the millions. In addition, class
averages are not a good indication of mathematical logic ability, because a variety
of other factors influence marks.