Download Unit 1 - RichmireMDM4U

Unit 1: Introduction to Data Management
1. A bar graph is most appropriate for describing
a. the frequency distribution of discrete data
b. the frequency distribution of continuous data
c. changes of data over time
d. the median of a set of data
2. A graph in which data are organized into equal class intervals along the horizontal axis is a
a. pictograph
b. broken-line graph
c. bar graph
d. histogram
3. In a stem-and-leaf plot, the leaves represent the
a. frequency of a data category
b. final digits of the values in a data category
c. initial digits of a data category
d. average of the data values
4. The following pictograph depicts the population of selected provinces.
[Pictograph not reproduced; the only surviving label is Nova Scotia.]
Which of the following statements can be concluded?
a. the population of Quebec is 700 000
b. Nova Scotia is the least populated province in Canada
c. the population of Ontario is approximately double that of B.C.
d. the population of Quebec is increasing
5. Which of the following statements is not true? The media can influence a person's opinion by
a. using graphs and changing the vertical scale
b. using data taken over a short period of time
c. altering the data to make a better point
d. using data taken from a small sample size
6. Two variables have a coefficient of determination of 0.64. The correlation coefficient could be
b. 0.41
d. 0.36
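As a quick check on question 6 (this note is an addition, not part of the original question set): the coefficient of determination is the square of the correlation coefficient, so the relationship can be worked backwards. The sign of r is not determined by r^2 alone.

```latex
r^2 = 0.64 \quad\Longrightarrow\quad r = \pm\sqrt{0.64} = \pm 0.8
```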
7. A set of data having small residual values means that
a. the correlation coefficient is close to 0
b. there is a positive correlation
c. there is a negative correlation
d. there is a strong correlation
8. Explain the difference between the coefficient of correlation and the coefficient of determination.
9. The class scores from a mathematics quiz are as follows:
8, 6, 10, 5, 8, 6, 9, 7, 8, 10, 4, 5, 7, 8, 9, 8, 10, 5, 8, 7, 3
Organize the data in a frequency table. Calculate the mean, median and mode.
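One way to check a hand-built frequency table for question 9 is a short script. This is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the original question.

```python
from collections import Counter
from statistics import mean, median, multimode

# Quiz scores exactly as listed in question 9
scores = [8, 6, 10, 5, 8, 6, 9, 7, 8, 10, 4, 5, 7, 8, 9, 8, 10, 5, 8, 7, 3]

# Frequency table: score -> number of students with that score
frequency = Counter(scores)

print("Score  Frequency")
for score in sorted(frequency):
    print(f"{score:>5}  {frequency[score]:>9}")

print("Mean:  ", round(mean(scores), 2))
print("Median:", median(scores))
print("Mode:  ", multimode(scores))  # returned as a list in case of ties
```

The frequencies should add up to the number of scores listed, which is a useful sanity check before computing the mean.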
10. A marine biologist is studying a specific species of fish in a lake. She records the masses of 15 fish, in
315, 282, 400, 220, 336, 414, 278, 212, 510, 326, 407, 296, 390, 483, 356
Using class intervals of 50, construct a histogram. Be sure to include all the necessary features of a
histogram (e.g. axis labels).
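The class counts behind the histogram in question 10 can be tallied as follows. This is a sketch under two assumptions that the question does not state: the first class starts at 200, and each class includes its lower boundary but not its upper one.

```python
# Fish masses exactly as listed in question 10
masses = [315, 282, 400, 220, 336, 414, 278, 212, 510, 326, 407, 296, 390, 483, 356]

WIDTH = 50   # class interval given in the question
START = 200  # assumed lower boundary of the first class

def class_counts(values, start, width):
    """Count values per class [lower, lower + width), keyed by lower boundary."""
    counts = {}
    for v in values:
        lower = start + ((v - start) // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

counts = class_counts(masses, START, WIDTH)

# Text rendering of the histogram: one '#' per fish in each class
for lower, n in counts.items():
    print(f"{lower}-{lower + WIDTH}: {'#' * n} ({n})")
```

Under this convention a mass of exactly 400 is counted in the 400-450 class. A hand-drawn histogram still needs a title, labelled axes, and bars that touch.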
11. Explain why sample size is important when trying to draw conclusions from data. You may use an
example to support your answer.
[10] 12. Students were asked how many hours of homework they do each week. The scatter plot below shows the results
plotted against the grade level of the students. The positive correlation has been modeled using both
linear and quadratic regressions; the equations of the curves of best fit and their associated coefficients
of determination are shown.
a) Which model is better? Justify your answer!
b) Using the given equation, calculate the residuals for each point.
c) Construct a residual plot based on whichever model you determined to be better in part (a).
Unit 2: Causation
1. Is there a cause and effect relationship for each of the following situations? Explain.
a. The rate of a chemical reaction increases with temperature
b. Leadership ability has a positive correlation with academic achievement
c. The prices of butter and motorcycles have a strong positive correlation over many years
d. Sales of cell phones had a strong negative correlation with ozone levels in the atmosphere over
the last decade
e. Traffic congestion has a strong correlation with the number of urban expressways
2. Identify the most likely type of causal relationship between each of the following pairs of variables.
Assume that a strong positive correlation has been observed with the first variable as the
independent variable.
a. alcohol consumption, incidence of automobile accidents
b. score on physics examination, score on calculus examination
c. increase in pay, job performance
d. population of rabbits, consumer price index
e. number of scholarships received by a student, number of job offers upon graduation
f. coffee consumption, insomnia
g. funding for athletic programs, number of medals won at the Olympic games
3. Over 10 months, the prices of cheese and butter sold are positively correlated. Provide a common
cause factor for this association.
4. Define and provide an example for each of the following terms:
a. Reverse cause and effect association
b. Accidental relationship
c. Attribute Name
Unit 3: Statistics – One Variable Data
1) A researcher uses systematic random sampling to select 10 of the 700 houses in his small town for a
survey about people’s favourite foods. At all 10 houses, someone answers the door and is
willing to participate in the survey. He started at 9 am and finished the survey by 12 pm.
a. What is the sampling interval in the researcher’s study?
b. Identify a possible source of sampling bias that may affect his survey results. Explain what
steps the researcher could have taken to reduce or minimize this bias.
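The sampling interval in a systematic random sample is the population size divided by the sample size. The sketch below illustrates the idea for the 700-house town; the function name, the house numbering from 1 to 700, and the fixed seed are assumptions made for the example.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return (interval, selected positions) for a systematic random sample."""
    interval = population_size // sample_size
    rng = random.Random(seed)
    start = rng.randint(1, interval)  # random starting point within the first interval
    selected = [start + i * interval for i in range(sample_size)]
    return interval, selected

# Houses are assumed to be numbered 1..700
interval, houses = systematic_sample(700, 10, seed=1)
print("Sampling interval:", interval)
print("Houses selected:  ", houses)
```

The same function can be used to check part (a) of question 2 by calling `systematic_sample(255, 51)`.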
2) A researcher is surveying people about their attitudes towards animal rights legislation. He surveys
people as they eat lunch in the ‘Vegetarian Fast Food Restaurant’ on Phillip Street in Waterloo. His
survey results will be used to determine the attitudes of Waterloo residents towards new federal
animal rights laws that have just been passed.
a. If the researcher uses systematic random sampling to create a sample consisting of 51 of the
255 customers in the restaurant, what is the sampling interval?
b. If the researcher instead constructs his sample by surveying one person at each table, what
kind of bias could result? Explain!
c. Identify a possible source of sampling bias in this scenario and explain what steps the
researcher could take to minimize it.
3) In a factory with 32 employees, the number of sick days each employee took in 2008 has the
following frequency distribution table:
[Frequency table not reproduced; surviving column header: Sick Days Taken]
a. Calculate the mean, median and mode for this data.
b. Calculate the variance and standard deviation.
c. How many employees are within one standard deviation of the mean?
4) In MDM4UI, the final grade of a student is calculated as follows:
[Weighting table not reproduced; surviving row labels: Culminating Project, Final Exam]
Tau scored 60 on his final exam, 71 on his Assignments/Quizzes/Tests component, and ended up
with 73 in the course. What was his score on the culminating project?
5) The total daily rainfall (in mm) recorded at the University of Waterloo weather station for 5 days is as follows:
10.8 11.2 13.4 14.5 15.1
a. Calculate the mean for this data.
b. Calculate the standard deviation for this data.
c. How many of the 5 days’ rainfall data fall within one standard deviation of the mean?
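Part (c) depends on both the mean and a standard deviation, and the question does not say whether the population or the sample formula is intended. The sketch below computes both so the two conventions can be compared; the helper name `within_one_sd` is illustrative.

```python
from statistics import mean, pstdev, stdev

# Daily rainfall in mm, as listed in question 5
rainfall = [10.8, 11.2, 13.4, 14.5, 15.1]

def within_one_sd(values, sd):
    """Return the values lying within one standard deviation of the mean."""
    m = mean(values)
    return [v for v in values if m - sd <= v <= m + sd]

m = mean(rainfall)
sigma = pstdev(rainfall)  # population standard deviation (divide by n)
s = stdev(rainfall)       # sample standard deviation (divide by n - 1)

print("Mean:", round(m, 2))
print("Population SD:", round(sigma, 3), "->", within_one_sd(rainfall, sigma))
print("Sample SD:    ", round(s, 3), "->", within_one_sd(rainfall, s))
```

For this data set the two conventions give different counts in part (c), so it is worth stating which formula was used.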
Unit 4: Statistics – Normal Distributions
1) Use z-scores to determine who has the more impressive quiz mark: Xi, who has a mark of 12.5
where his class’ mean is 12.5 and standard deviation is 2.1 or Eta who has a mark of 9.0 where her
class’ mean is 8.1 and standard deviation is 0.7.
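The comparison in question 1 comes down to computing z = (x - mean) / standard deviation for each student. A minimal sketch, with an illustrative function name:

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

xi = z_score(12.5, 12.5, 2.1)   # Xi: mark 12.5, class mean 12.5, SD 2.1
eta = z_score(9.0, 8.1, 0.7)    # Eta: mark 9.0, class mean 8.1, SD 0.7

print("Xi z-score: ", round(xi, 2))
print("Eta z-score:", round(eta, 2))
print("More impressive:", "Eta" if eta > xi else "Xi")
```

The higher z-score marks the result that stands out more relative to its own class.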
2) Nu’s graduating average puts him at the 96th percentile. If 1350 students graduated with him, how
many had a graduating average equal to or higher than Nu’s?
3) Miss Omnicron is 168cm tall. In her village, heights are normally distributed with a mean of 194cm
and a standard deviation of 6cm.
a. What percentile does Miss Omnicron fall in?
b. Using the percentile calculated in (a), what is the probability that Miss Omnicron’s next
door neighbour Mrs. Upsilon is taller than her?
c. What percentage of the village’s population is shorter than 170cm?
d. If the town has 157 residents, how many are taller than 170cm?
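The parts of question 3 can be checked against the normal distribution directly instead of a z-table. This sketch uses `statistics.NormalDist` from Python's standard library with the mean and standard deviation as printed in the question; the variable names are illustrative.

```python
from statistics import NormalDist

# Village heights: mean 194 cm, standard deviation 6 cm (as printed)
heights = NormalDist(mu=194, sigma=6)

z = (168 - 194) / 6                   # z-score for a height of 168 cm
percentile = heights.cdf(168) * 100   # percentage of villagers shorter than 168 cm

p_shorter_170 = heights.cdf(170)      # proportion shorter than 170 cm
taller_170 = 157 * (1 - p_shorter_170)  # expected residents taller than 170 cm

print("z-score:", round(z, 2))
print("Percentile:", percentile)
print("Proportion shorter than 170 cm:", p_shorter_170)
print("Residents taller than 170 cm:", round(taller_170))
```

With the values as printed, both 168 cm and 170 cm lie about four standard deviations below the mean, which is beyond the range of many printed z-tables.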
4) An MDM4UI student is surveying Canadians for a class project about ear wax accumulations. He
wishes to have a confidence level of 95%.
a. What is the approximate margin of error if he surveys 150 people?
b. How many people would he need to survey to have a margin of error of 0.5%?
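A common way to approach question 4 is the margin-of-error formula for a proportion, z * sqrt(p(1 - p) / n). The sketch below assumes z = 1.96 for 95% confidence and the worst-case proportion p = 0.5, neither of which is stated in the question; a course that uses the 1/sqrt(n) rule of thumb will get slightly larger figures.

```python
import math

Z_95 = 1.96  # assumed critical value for a 95% confidence level

def margin_of_error(n, p=0.5, z=Z_95):
    """Approximate margin of error for a sample proportion (p = 0.5 is worst case)."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(margin, p=0.5, z=Z_95):
    """Smallest whole sample size whose margin of error does not exceed `margin`."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print("Margin of error for n = 150:", round(margin_of_error(150) * 100, 1), "%")
print("Sample size for a 0.5% margin:", required_sample_size(0.005))
```

Because the margin shrinks with the square root of n, cutting the margin of error by a factor of 16 requires roughly 256 times as many respondents.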
5) The following table shows the cost of menu items at 3 local restaurants:
[Menu price table not reproduced; surviving row labels: Adult Entrée, Kid’s Entrée]
a) Create an index that calculates the cost of 2 appetizers, 2 adult entrées, 2 kids’ entrées, 4
drinks and 4 desserts.
b) Use your index to rank the restaurants from least to most expensive.
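One possible form for the index in part (a) is a weighted sum of the menu prices, evaluated once per restaurant. The symbols below are assumptions introduced for illustration: A, E, K, D and S stand for that restaurant's price of one appetizer, adult entrée, kid's entrée, drink and dessert.

```latex
I = 2A + 2E + 2K + 4D + 4S
```

Ranking the restaurants then means sorting them by their value of I from smallest to largest.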