14 Descriptive Statistics
14.1 Graphical Descriptions of Data
14.2 Variables
14.3 Numerical Summaries
14.4 Measures of Spread
Copyright © 2010 Pearson Education, Inc.
Excursions in Modern Mathematics, 7e: 14.4 - 2
The Range
An obvious approach to describing the
spread of a data set is to take the difference
between the highest and lowest values of
the data. This difference is called the range
of the data set and usually denoted by R.
Thus, R = Max – Min. The range of a data
set is a useful piece of information when
there are no outliers in the data. In the
presence of outliers the range tells a
distorted story.
For example, the range of the test scores in
the Stat 101 exam is 24 – 1 = 23 points, an
indication of a big spread within the scores
(i.e., a very heterogeneous group of
students). True enough, but if we discount
the two outliers, the remaining 73 test
scores would have a much smaller range of
16 – 6 = 10 points.
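The effect of outliers on the range can be sketched in a few lines of Python. The Stat 101 scores themselves are not reproduced here, so the list below is a hypothetical stand-in, chosen only so that its extremes match the values quoted above (Min = 1, Max = 24, and 6 to 16 once the two outliers are dropped). The name data_range is likewise just an illustrative choice.

```python
def data_range(values):
    """Return R = Max - Min for a non-empty collection of numbers."""
    return max(values) - min(values)

# Hypothetical scores (NOT the actual Stat 101 data): most values sit
# between 6 and 16, with one low outlier (1) and one high outlier (24).
scores = [1, 6, 8, 9, 10, 10, 11, 12, 13, 14, 16, 24]

# Discard the two outliers and measure the range again.
trimmed = [s for s in scores if s not in (1, 24)]

print(data_range(scores))   # 23
print(data_range(trimmed))  # 10
```

Two data values out of the whole list account for more than half of the range, which is the distortion described above.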
The Interquartile Range
To eliminate the possible distortion caused
by outliers, a common practice when
measuring the spread of a data set is to use
the interquartile range, denoted by the
acronym IQR. The interquartile range is the
difference between the third quartile and the
first quartile (IQR = Q3 – Q1), and it tells us
how spread out the middle 50% of the data
values are. For many types of real-world
data, the interquartile range is a useful
measure of spread.
Example 14.18 2007 SAT Math Scores: Part 3
The five-number summary for the 2007 SAT
math scores was Min = 200 (yes, there were
a few jokers who missed every question!), Q1
= 430, M = 510, Q3 = 590, Max = 800 (there
are still a few geniuses around!). It follows
that the 2007 SAT math scores had a range of
600 points (800 – 200 = 600) and an
interquartile range of 160 points
(IQR = Q3 – Q1 = 590 – 430 = 160).
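As a quick check of the arithmetic, both measures of spread can be read off the five-number summary. This is a minimal sketch; the function name spread_from_summary is an illustrative choice, and 590 is taken to be the third quartile, as the IQR computation in the example implies.

```python
def spread_from_summary(minimum, q1, q3, maximum):
    """Return (range, IQR) from entries of a five-number summary.

    The median is not needed for either measure, so it is not a parameter.
    """
    return maximum - minimum, q3 - q1

# Values quoted in the example for the 2007 SAT math scores.
sat_range, sat_iqr = spread_from_summary(minimum=200, q1=430, q3=590, maximum=800)

print(sat_range)  # 600
print(sat_iqr)    # 160
```

The middle 50% of the scores span 160 points, barely a quarter of the full 600-point range.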
Standard Deviation
The most important and most commonly
used measure of spread for a data set is the
standard deviation. The key concept for
understanding the standard deviation is the
concept of deviation from the mean. If A is
the average of the data set and x is an
arbitrary data value, the difference x – A is
x’s deviation from the mean. The deviations
from the mean tell us how “far” the data
values are from the average value of the
data. The idea is to use this information to
figure out how spread out the data is.
The deviations from the mean are
themselves a data set, which we would like
to summarize. One way would be to
average them, but if we do that, the
negative deviations and the positive
deviations will always cancel each other out
so that we end up with an average of 0.
This, of course, makes the average useless
in this case. The cancellation of positive and
negative deviations can be avoided by
squaring each of the deviations.
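The cancellation is easy to see on a tiny data set. The three values below are hypothetical, picked only so that the mean comes out to a whole number.

```python
# Hypothetical data set with mean A = 5.
data = [2, 4, 9]
mean = sum(data) / len(data)

# Deviations from the mean: some negative, some positive.
deviations = [x - mean for x in data]    # [-3.0, -1.0, 4.0]

# Squaring removes the signs, so nothing can cancel.
squared = [d ** 2 for d in deviations]   # [9.0, 1.0, 16.0]

print(sum(deviations))  # 0.0
print(sum(squared))     # 26.0
```

The deviations add up to 0 no matter what the data are, since the sum of the values equals the number of values times A. The squared deviations add up to 0 only when every value equals the mean.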
The squared deviations are never negative,
and if we average them out, we get an
important measure of spread called the
variance, denoted by V. Finally, we take the
square root of the variance and get the
standard deviation, denoted by the Greek
letter σ (and sometimes by the acronym
SD).
The following is an outline of the definition
of the standard deviation of a data set.
THE STANDARD DEVIATION
OF A DATA SET
■
Let A denote the mean of the data set.
For each number x in the data set,
compute its deviation from the mean
(x – A) and square each of these
numbers. These numbers are called
the squared deviations.
■
Find the average of the squared
deviations. This number is called the
variance V.
■
The standard deviation is the square
root of the variance: σ = √V.
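The outline translates almost line for line into code. The sketch below is one possible rendering in Python; the function name standard_deviation and the sample values are illustrative assumptions, not part of the text.

```python
import math
import statistics

def standard_deviation(data):
    """Standard deviation as outlined above: average the squared
    deviations from the mean, then take the square root."""
    mean = sum(data) / len(data)                           # A
    squared_deviations = [(x - mean) ** 2 for x in data]   # (x - A)^2
    variance = sum(squared_deviations) / len(data)         # V
    return math.sqrt(variance)                             # square root of V

# Hypothetical data set: mean 5, squared deviations sum to 32, so V = 4.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print(standard_deviation(sample))  # 2.0
print(statistics.pstdev(sample))   # standard-library counterpart
```

Because the variance here divides by the number of data values, the matching standard-library function is statistics.pstdev. The similarly named statistics.stdev divides by one less than the number of values and gives a slightly larger result.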
Example 14.19 Calculation of a SD
Over the course of the semester, Angela
turned in all of her homework assignments.
Her grades in the 10 assignments (sorted
from lowest to highest) were 85, 86, 87, 88,
89, 91, 92, 93, 94, and 95. Our goal in this
example is to calculate the standard deviation
of this data set the old-fashioned way (i.e.,
doing our own grunt work).
The first step is to find the mean A of the data
set. It’s not hard to see that A = 90.
The second step is to
calculate the deviations from
the mean and then the
squared deviations. When we
average the squared
deviations, we get 11. This
means that the variance is
V = 11 and thus the standard
deviation (rounded to one
decimal place) is
σ = √11 ≈ 3.3 points.
Interpreting the Standard Deviation
It is clear from just a casual look at Angela’s
homework scores that she was pretty
consistent in her homework, never straying
too much above or below her average score
of 90 points. The standard deviation is, in
effect, a way to measure this degree of
consistency (or lack thereof). A small
standard deviation tells us that the data are
consistent and the spread of the data is
small, as is the case with Angela’s homework
scores.
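To put numbers on "never straying too much," Angela's deviations from the mean can be tabulated directly. This is a minimal sketch using the ten scores from Example 14.19; the variable names are illustrative.

```python
# Angela's homework scores from Example 14.19.
angela = [85, 86, 87, 88, 89, 91, 92, 93, 94, 95]

mean = sum(angela) / len(angela)              # A = 90.0
deviations = [x - mean for x in angela]       # x - A
squared = [d ** 2 for d in deviations]        # (x - A)^2
variance = sum(squared) / len(angela)         # V = 11.0
sd = variance ** 0.5                          # square root of V

# One row per score: value, deviation, squared deviation.
for x, d, s in zip(angela, deviations, squared):
    print(f"{x:>3} {d:>+6.1f} {s:>6.1f}")

print(max(abs(d) for d in deviations))  # 5.0
print(round(sd, 1))                     # 3.3
```

No score is more than 5 points from the mean, and the standard deviation of about 3.3 points summarizes that tight clustering in a single number.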
The ultimate in consistency within a data set
is when all the data values are the same (like
Angela’s friend Chloe, who got a 20 in every
homework assignment). When this happens
the standard deviation is 0.
On the other hand, when there is a lot of
inconsistency within the data set, we are
going to get a large standard deviation. This
is illustrated by Angela’s other friend, Tiki,
whose homework scores were 5, 15, 25, 35,
45, 55, 65, 75, 85, and 95. We would expect
the standard deviation of this data set to be
quite large–in fact, it is almost 29 points.
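The three students can be compared side by side. The sketch below uses Python's statistics.pstdev, which averages the squared deviations over all the data values, as in the definition used here. It assumes Chloe turned in ten assignments like the other two; the text does not state her count, and the result is 0 for any count.

```python
import statistics

homework = {
    "Angela": [85, 86, 87, 88, 89, 91, 92, 93, 94, 95],
    "Chloe": [20] * 10,  # assumed ten assignments, all scored 20
    "Tiki": [5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
}

# Standard deviation of each student's scores.
spread = {name: statistics.pstdev(scores) for name, scores in homework.items()}

for name, sd in spread.items():
    print(f"{name}: {sd:.1f}")
# Angela: 3.3
# Chloe: 0.0
# Tiki: 28.7
```

Tiki's variance is 825, so her standard deviation is the square root of 825, about 28.7 points, which is the "almost 29" quoted above.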
Summary of the Standard Deviation
The standard deviation is arguably the most
important and frequently used measure of
data spread. Yet it is not a particularly intuitive
concept. Here are a few basic guidelines that
recap our preceding discussion:
■
The standard deviation of a data set is
measured in the same units as the original
data. For example, if the data are points on
a test, then the standard deviation is also
given in points. Conversely, if the standard
deviation is given in dollars, then we can
conclude that the original data must have
been money–some prices, salaries, or
something like that. For sure, the data
couldn’t have been test scores on an
exam.
■
It is pointless to compare standard
deviations of data sets that are given in
different units. Even for data sets that are
given in the same units–say, for example,
test scores–the underlying scale should be
the same. We should not try to compare
standard deviations for SAT scores
measured on a scale of 200–800 points
with standard deviations of a set of
homework assignments measured on a
scale of 0–100 points.
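One way to see why the underlying scale matters: re-express the same scores on a different scale and the standard deviation changes with it. The sketch below takes Angela's homework scores from Example 14.19 and multiplies each by 10, a purely hypothetical 0–1000 scale introduced only for illustration.

```python
import math
import statistics

# Angela's scores on the original 0-100 scale.
angela = [85, 86, 87, 88, 89, 91, 92, 93, 94, 95]

# The same performance on a hypothetical 0-1000 scale.
rescaled = [10 * x for x in angela]

sd_original = statistics.pstdev(angela)
sd_rescaled = statistics.pstdev(rescaled)

print(round(sd_original, 1))  # 3.3
print(round(sd_rescaled, 1))  # 33.2
```

The second standard deviation is ten times the first, yet Angela is exactly as consistent as before. The difference comes entirely from the scale, which is why standard deviations on different scales should not be compared directly.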
■
For data sets that are based on the same
underlying scale, a comparison of standard
deviations can tell us something about the
spread of the data. If the standard
deviation is small, we can conclude that
the data points are all bunched together–
there is very little spread. As the standard
deviation increases, we can conclude that
the data points are beginning to spread
out.
The more spread out they are, the larger
the standard deviation becomes. A
standard deviation of 0 means that all
data values are the same.
As a measure of spread, the standard
deviation is particularly useful for analyzing
real-life data.