Measures of central tendency
Unit 5: Statistical Reasoning
Outcome FM20.6
Demonstrate an understanding of normal distribution, including standard deviation and z-scores.
Chapter 5 Definitions:
______________________: A value in a data set that is very different from other values in the set.
______________________: A graph that records each data value in a data set as a point above a
number line.
______________________: The difference between the maximum value and the minimum value in a
data set.
______________________: A measure of central tendency determined by dividing the sum of all
the values in a data set by the number of values in the set.
______________________: A measure of central tendency represented by the middle value of an
ordered data set. For example, the median of 3, 3, 4, 5, 6 is 4. If the data set has an even number of
values, there are two middle values, and the median is the mean of the two. For example, the median of
3, 3, 5, 6 is also 4, because (3 + 5) / 2 = 4.
______________________: A measure of central tendency represented by the value that occurs
most often in a data set. For example, the mode of 2, 2, 4, 5, 6 is 2.
______________________: A measure that varies by the spread among the data in a set; dispersion
has a value of zero if all the data in a set is identical, and it increases in value as the data becomes
more spread out.
______________________________________: A set of intervals (table or graph), usually of equal
width, into which raw data is organized; each interval is associated with a frequency that indicates the
number of measurements in this interval.
______________________: The graph of a frequency distribution, in which equal intervals of values
are marked on a horizontal axis and the frequencies associated with these intervals are indicated by
the areas of the rectangles drawn for these intervals.
______________________________________________: The graph of a frequency distribution,
produced by joining the midpoints of the intervals using straight lines.
______________________: The difference between a data value and the mean for the same set of
data.
________________________________________________: A measure of the dispersion or
scatter of data values in relation to the mean; a low standard deviation indicates that most data values
are close to the mean, and a high standard deviation indicates that most data values are scattered
farther from the mean.
_______________________________________: A symmetrical curve that represents the normal
distribution; also called a bell curve.
_______________________________________________: Data that, when graphed as a
histogram or a frequency polygon, results in a unimodal symmetric distribution about the mean.
______________________: A standardized value that indicates the number of standard deviations
of a data value above or below the mean.
______________________________________________: A normal distribution that has a mean
of zero and a standard deviation of one.
__________________________________________: A table that displays the fraction of data
with a z-score that is less than any given data value in a standard normal distribution. (There is a
z-score table on pages 592 to 593.)
___________________________________________: The possible difference between the
estimate of the value you’re trying to determine, as determined from a random sample, and the true
value for the population; the margin of error is generally expressed as a plus or minus percent, such as
±5%.
___________________________________________: The interval in which the true value you’re
trying to determine is estimated to lie, with a stated degree of probability; the confidence interval
may be expressed using ± notation, such as 54.0% ± 3.5%, or as a range, from 50.5% to 57.5%.
____________________________________________: The likelihood that the result for the
“true” population lies within the range of the confidence interval; surveys and other studies usually use
a confidence level of 95%, although 90% or 99% is sometimes used.
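The definitions above can be checked with a short Python sketch. It reuses the small data sets quoted in the definitions (3, 3, 4, 5, 6; 3, 3, 5, 6; and 2, 2, 4, 5, 6) and the 54.0% ± 3.5% example. The helper names (`data_range`, `deviations`, `z_score`, `confidence_interval`) are made up for illustration, and the use of the population standard deviation (`pstdev`) is an assumption, since the definitions do not say whether a population or a sample formula is intended.

```python
from statistics import mean, median, mode, pstdev

# Data sets quoted in the definitions above.
odd_set = [3, 3, 4, 5, 6]
even_set = [3, 3, 5, 6]
mode_set = [2, 2, 4, 5, 6]


def data_range(values):
    """Range: maximum value minus minimum value."""
    return max(values) - min(values)


def deviations(values):
    """Deviation of each value: the value minus the mean of the same set."""
    centre = mean(values)
    return [value - centre for value in values]


def z_score(value, centre, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean.

    Not defined when std_dev is zero (all data identical).
    """
    return (value - centre) / std_dev


def confidence_interval(estimate, margin_of_error):
    """Turn 'estimate ± margin of error' into (lower bound, upper bound)."""
    return (estimate - margin_of_error, estimate + margin_of_error)


print("median (odd count):", median(odd_set))
print("median (even count):", median(even_set))
print("mode:", mode(mode_set))
print("range:", data_range(odd_set))
# Dispersion is zero when all the data in a set is identical.
print("standard deviation of identical data:", pstdev([4, 4, 4]))
print("confidence interval:", confidence_interval(54.0, 3.5))
```

The deviations of any data set sum to zero (up to rounding), which is one reason the standard deviation is built from squared deviations rather than from the deviations themselves.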
5.1: Exploring Data
Example 1
The payrolls for three small companies are shown in the
table. Figures include year-end bonuses. Each company
has 15 employees.
What is the best indicator of an “average” salary
for each company?
A. What is the range of salaries for each company?
Media Focus Advertising:
Computer Rescue:
Auto Value Sales:
B. Examine the data. Which companies have data that would be considered outliers? Tell
how you know.
C. Determine the measures of central tendency (mean, median and mode) for the
salaries for each company.
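Measure                 Media Focus Ad.    Computer Rescue    Auto Value Sales
Mean* (average)         _______________    _______________    ________________
Median (middle)         _______________    _______________    ________________
Mode (most frequent)    _______________    _______________    ________________
* Sometimes you can exclude outliers when calculating the mean.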
D. Which measure of central tendency is most affected by outliers?
E. Create a line plot for each of the three companies. Look for outliers, measures of
central tendency, and the range on your line plots. Which of these features are easily
visible?
Media Focus Advertising:
Computer Rescue:
Auto Value Sales:
F. Which measure of central tendency best illustrates the “average” salary for each
company? Why?
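One way to explore parts B to F is to compute the measures with and without a suspected outlier and to draw a rough line plot. The sketch below uses a hypothetical payroll of 15 salaries (in thousands of dollars); it is not the payroll table from Example 1, and the function name `line_plot` is made up for illustration. The text plot lists only the values that occur, so unlike a true line plot it does not show the gaps on the number line.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical payroll for 15 employees, in thousands of dollars.
# These numbers are invented for illustration; they are NOT the Example 1 table.
salaries = [30, 30, 30, 32, 32, 35, 35, 35, 35, 38, 40, 40, 42, 45, 200]

# Treat the single very large value as the suspected outlier and drop it.
without_outlier = [s for s in salaries if s != 200]


def line_plot(values):
    """Return a text line plot: one row per distinct value, one 'x' per occurrence."""
    counts = Counter(values)
    return "\n".join(f"{value:>4} | {'x' * counts[value]}" for value in sorted(counts))


for label, data in (("all salaries", salaries), ("outlier removed", without_outlier)):
    print(label)
    print("  mean:  ", round(mean(data), 1))
    print("  median:", median(data))
    print("  mode:  ", mode(data))
    print("  range: ", max(data) - min(data))

print(line_plot(salaries))
```

With this made-up data, removing the one large salary changes the mean and the range a great deal, while the median and mode stay at 35.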
Example 2:
Paulo needs a new battery for his car. He is trying to decide between two different brands.
Both brands are the same price. He obtains data for the lifespan, in years, of 30 batteries of
each brand, as shown below:
How can you compare the data to help Paulo decide which brand of battery to buy?
A. Use the range and measures of central tendency to describe how the data in each set is
distributed. Organize each data set using a line plot to help you visually compare the
dispersions of the data sets. Describe any similarities and differences between the two
sets of data.
B. Explain why the mean and median do not fully describe the difference between these
two brands of batteries.
Key Ideas
• Measures of central tendency (_____________________________________________) are
not always sufficient to represent or compare sets of data.
• You can draw inferences from numerical data by examining how the data is
______________________________ around the mean or the median.
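The key ideas can be illustrated with two small data sets that share the same mean and median but are spread out very differently. The lifespans below are hypothetical (they are not Paulo's battery data), the function name `summarize` is made up, and the population standard deviation (`pstdev`) is an assumed choice of dispersion measure.

```python
from statistics import mean, median, pstdev

# Hypothetical battery lifespans in years; NOT the data from Example 2.
brand_x = [4, 5, 5, 5, 6]
brand_y = [1, 3, 5, 7, 9]


def summarize(values):
    """Collect measures of central tendency and dispersion for one data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "std_dev": pstdev(values),
    }


for name, data in (("Brand X", brand_x), ("Brand Y", brand_y)):
    print(name, summarize(data))
```

Both made-up brands have a mean and median of 5 years, yet Brand Y's lifespans range over 8 years while Brand X's range over only 2, so the centre alone would hide the difference in reliability.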
5.2: Frequency Tables, Histograms, and Frequency Polygons
Flooding is a regular occurrence in the Red River basin. During the second half of the 20th century,
there were nine notable floods, four of which were severe, occurring in 1950, 1979, 1996,
and 1997. The flood that occurred in 1997 is known as the “flood of the century” in Manitoba and
North Dakota.
The following data represents the flow rates of the Red River from 1950 to 1999, as recorded at the
Redwood Bridge in Winnipeg, Manitoba:
How can you approximate the water flow rate associated with serious flooding in Winnipeg?
Determine the water flow rate that is associated with serious flooding by creating a frequency
distribution. A frequency distribution can take three forms:
1. Frequency Distribution Table
Step 1 – Determine the range of the data so that you can choose a suitable interval. Keep in
mind that most frequency distribution tables have between 5 and 12 intervals.
Step 2 – Create a table with column headings like the one below.
Flow Rate (m³/s)    Tally    Frequency
Step 3 – Use your table to answer the question.
Determine the water flow rate that is associated with serious flooding: _________________
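Steps 1 and 2 can be sketched in Python as follows. The flow rates are hypothetical stand-ins (the Red River data is not reproduced here), the function name `frequency_table` is made up, and the convention that each interval includes its lower boundary but not its upper boundary is an assumption.

```python
import math

# Hypothetical flow rates in m³/s; NOT the Red River data.
flow_rates = [450, 800, 1200, 1500, 1750, 2100, 2300, 2900, 3100, 4600]


def frequency_table(values, width, start=None):
    """Count the values in each interval [low, low + width).

    Returns a list of (low, high, frequency) rows. By default the first
    interval starts at the multiple of `width` at or below the minimum value.
    """
    if start is None:
        start = math.floor(min(values) / width) * width
    n_intervals = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_intervals
    for value in values:
        counts[int((value - start) // width)] += 1
    return [(start + i * width, start + (i + 1) * width, count)
            for i, count in enumerate(counts)]


# Step 1: use the range to pick a width that gives between 5 and 12 intervals.
data_range = max(flow_rates) - min(flow_rates)
width = 500
table = frequency_table(flow_rates, width)
print("range:", data_range, "intervals:", len(table))

# Step 2: print the table with a tally column.
print("Flow Rate (m³/s)    Tally    Frequency")
for low, high, count in table:
    print(f"{low:>5} to {high:<8} {'|' * count:<8} {count}")
```

If the chosen width produces fewer than 5 or more than 12 intervals, try a different width and rebuild the table.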
2. Histogram
Step 1 – Create a frequency table.
Step 2 – Create a graph where the intervals are labelled
on the horizontal axis and the frequency is labelled on the
vertical axis. Give your graph an appropriate title. A
histogram looks a lot like a bar graph, but the bars are all
touching. All bars must be the same width.
Step 3 – Answer the question.
Determine the water flow rate that is associated with serious flooding: _________________
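As a rough check before drawing the graph on paper, a frequency table can be turned into a text histogram. The sketch below draws the bars sideways (intervals down the page, frequency across), uses a hypothetical frequency table rather than the Red River data, and the function names `equal_widths` and `text_histogram` are made up for illustration.

```python
# Hypothetical frequency table rows: (lower boundary, upper boundary, frequency).
table = [(0, 500, 1), (500, 1000, 3), (1000, 1500, 5), (1500, 2000, 2), (2000, 2500, 1)]


def equal_widths(rows):
    """Check that every interval has the same width, as a histogram requires here."""
    first_width = rows[0][1] - rows[0][0]
    return all(high - low == first_width for low, high, _ in rows)


def text_histogram(rows, symbol="#"):
    """Return one line per interval; the bar length is the frequency."""
    return [f"{low:>5}-{high:<5} {symbol * count}" for low, high, count in rows]


if equal_widths(table):
    print("Flow rate (m³/s), hypothetical data")
    for line in text_histogram(table):
        print(line)
```

Because consecutive intervals share a boundary, the bars sit on adjacent lines with no gaps, which mirrors the touching bars of a histogram.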
3. Frequency Polygon
Step 1 – Create a frequency table. For each
interval, determine the midpoint by adding the
boundaries of each interval and dividing by
two.
Step 2 – Create a graph where the midpoints
are labelled on the horizontal axis and the
frequency is labelled on the vertical axis. Give
your graph an appropriate title. A frequency
polygon looks a lot like a line graph. Once you
plot the points on the grid, join the points with
straight lines. Be sure to close the polygon.
Step 3 – Answer the question.
Most common water flow interval:
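The midpoint calculation in Step 1 and the closing of the polygon in Step 2 can be sketched as follows. The frequency table is hypothetical, the function names are made up, and closing the polygon by adding a zero-frequency point one interval width before the first midpoint and after the last midpoint is an assumed convention.

```python
# Hypothetical frequency table rows: (lower boundary, upper boundary, frequency).
table = [(0, 500, 1), (500, 1000, 3), (1000, 1500, 2)]


def polygon_points(rows):
    """Return (midpoint, frequency) points, closed with a zero at each end."""
    width = rows[0][1] - rows[0][0]
    # Midpoint: add the boundaries of the interval and divide by two.
    points = [((low + high) / 2, count) for low, high, count in rows]
    first_mid = points[0][0]
    last_mid = points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


def most_common_interval(rows):
    """Return the (low, high) boundaries of the interval with the highest frequency."""
    low, high, _ = max(rows, key=lambda row: row[2])
    return (low, high)


for midpoint, frequency in polygon_points(table):
    print(midpoint, frequency)
print("most common interval:", most_common_interval(table))
```

Joining the printed points in order with straight lines gives the frequency polygon; the highest point sits above the midpoint of the most common interval.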
Example 2 – The magnitude of an earthquake is measured using the Richter scale. Examine
the histograms for the frequency of earthquake magnitudes in Canada from 2005 to 2009.
Which of these years could have had the most damage from earthquakes? Give reasons for
your answer.