Measures of Center
Measures of Spread
Distributions
MATH 112
Section 7.2: Measuring Distribution, Center, and Spread
Prof. Jonathan Duncan
Walla Walla College
Fall Quarter, 2006
Conclusion
Outline
1 Measures of Center
    The Arithmetic Mean
    The Geometric Mean
    The Median
    The Mode
2 Measures of Spread
3 Distributions
4 Conclusion
Analyzing Data
In the last section we focused on presenting an overall picture of
data using tables or graphs. Now we will examine ways to analyze
certain characteristics of data such as the data set’s center, spread,
and distribution.
Example
In your first year as a teacher, you and another teacher both give the
same test to your classes of 25 children. The two classes have the
following scores:
Your Class
93 92 92 90 87
85 84 80 79 78
77 76 75 74 71
71 71 71 71 70
68 66 62 59 53

Other Class
98 98 97 95 94
93 89 88 87 84
82 77 76 72 71
70 65 64 63 61
61 60 58 58 47
The Arithmetic Mean
Defining the Arithmetic Mean
If you wanted one number to capture your classes’ performance,
what would it be?
Arithmetic Mean
The arithmetic mean is found by adding all numbers in the data
set and dividing by the number of values in the data set.
Example
Your Class: $\frac{53+59+\cdots+92+93}{25} \approx 75.8$
Other Class: $\frac{47+58+\cdots+98+98}{25} \approx 76.3$
Based on these numbers, which class did better? Is the arithmetic
mean an accurate summary of the exam scores?
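The two means can be checked with a few lines of Python. This is a minimal sketch: the names `your_class`, `other_class`, and `arithmetic_mean` are illustrative, and the scores are transcribed from the example table. The Other Class scores total 1908, so the computed mean is 76.32.

```python
# Scores transcribed from the example table (variable names are illustrative).
your_class = [93, 92, 92, 90, 87, 85, 84, 80, 79, 78, 77, 76, 75,
              74, 71, 71, 71, 71, 71, 70, 68, 66, 62, 59, 53]
other_class = [98, 98, 97, 95, 94, 93, 89, 88, 87, 84, 82, 77, 76,
               72, 71, 70, 65, 64, 63, 61, 61, 60, 58, 58, 47]


def arithmetic_mean(values):
    """Add all the values and divide by how many there are."""
    return sum(values) / len(values)


your_mean = arithmetic_mean(your_class)
other_mean = arithmetic_mean(other_class)
print(f"Your class:  {your_mean:.2f}")
print(f"Other class: {other_mean:.2f}")
```

The two results differ by about half a point, which is worth keeping in mind when deciding whether one class really "did better".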
The Arithmetic Mean and Outliers
There are some issues to keep in mind when using the arithmetic
mean. One of these issues is the effect of outliers: data points
that are much smaller or larger than the typical value.
Example
A new student transfers into your class and scores 100% on the
test. How does this affect the mean score?
The arithmetic mean changes from 75.8 to 76.7.
Example
One of the students in your class who scored a 71 on the exam is
found to have cheated and their score is changed to a zero. How
does this affect your mean score?
The arithmetic mean changes from 75.8 to 73.0.
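Both outlier scenarios can be reproduced directly. The sketch below assumes the Your Class scores from the earlier example; the variable names are ours, not part of the lecture.

```python
# Your Class scores from the earlier example (names are illustrative).
your_class = [93, 92, 92, 90, 87, 85, 84, 80, 79, 78, 77, 76, 75,
              74, 71, 71, 71, 71, 71, 70, 68, 66, 62, 59, 53]


def arithmetic_mean(values):
    return sum(values) / len(values)


original = arithmetic_mean(your_class)

# Scenario 1: a transfer student adds a score of 100.
with_transfer = arithmetic_mean(your_class + [100])

# Scenario 2: one of the scores of 71 is changed to 0.
changed = your_class.copy()
changed[changed.index(71)] = 0
with_zero = arithmetic_mean(changed)

print(f"Original:         {original:.1f}")
print(f"With a 100 added: {with_transfer:.1f}")
print(f"With a 71 -> 0:   {with_zero:.1f}")
```

A single changed score out of 25 moves the mean by almost three points in the second scenario, which is the sensitivity to outliers described above.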
Cautions about the Arithmetic Mean
Using the arithmetic mean to measure the center of a data set has
both advantages and disadvantages.
Advantages of the Arithmetic Mean
The following are advantages of the arithmetic mean.
The arithmetic mean is the standard measure of center. It is
usually called the “average”.
The arithmetic mean detects and is affected by outliers.
Disadvantages of the Arithmetic Mean
The following are disadvantages of using the arithmetic mean.
The arithmetic mean is strongly affected by outliers and can
be skewed because of outliers.
The arithmetic mean may not accurately represent the
“typical” value of a data point.
The Geometric Mean
Defining the Geometric Mean
The geometric mean is found using a process similar to the
arithmetic mean. The difference is that instead of using addition
and division, we use multiplication and roots.
The Geometric Mean
To find the geometric mean, multiply all numbers in the data set
together and then take the nth root of this product where n is the
number of values in the data set.
Example
Your Class: $\sqrt[25]{53 \times 59 \times \cdots \times 92 \times 93} \approx 75.1$
Other Class: $\sqrt[25]{47 \times 58 \times \cdots \times 98 \times 98} \approx 74.8$
Notice that the geometric mean and arithmetic mean are usually
quite close. For your class, the arithmetic mean was 75.8 compared
to a geometric mean of 75.1.
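The geometric mean can be computed as sketched below. Rather than multiplying 25 scores and taking a 25th root, this version averages the logarithms and exponentiates, which is mathematically equivalent; that choice, and the name `geometric_mean`, are ours. The scores are assumed to be those from the earlier example, and all values must be positive.

```python
import math

# Scores from the earlier example (names are illustrative).
your_class = [93, 92, 92, 90, 87, 85, 84, 80, 79, 78, 77, 76, 75,
              74, 71, 71, 71, 71, 71, 70, 68, 66, 62, 59, 53]
other_class = [98, 98, 97, 95, 94, 93, 89, 88, 87, 84, 82, 77, 76,
               72, 71, 70, 65, 64, 63, 61, 61, 60, 58, 58, 47]


def geometric_mean(values):
    """nth root of the product, computed as exp(mean of logs).

    Every value must be positive, because log(0) is undefined.
    """
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm_your = geometric_mean(your_class)
gm_other = geometric_mean(other_class)
print(f"Your class:  {gm_your:.1f}")
print(f"Other class: {gm_other:.1f}")
```

In both classes the geometric mean comes out slightly below the arithmetic mean, and the class with the more spread-out scores shows the larger gap.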
Cautions about the Geometric Mean
The geometric mean also has advantages and disadvantages.
Advantages of the Geometric Mean
The following are advantages of the geometric mean.
Like the arithmetic mean, the geometric mean will detect
outliers.
Each value in the data set contributes to the geometric mean.
Disadvantages of the Geometric Mean
The following are disadvantages of the geometric mean.
In a data set containing a zero, the geometric mean will not
work: the product, and therefore the mean, is zero.
The geometric mean is not a well known measure of center.
The Median
Defining the Median
While the arithmetic and geometric means are often used to measure
the center of a set of data, they are both sensitive to outliers and
therefore not appropriate for data sets in which there are outlying values.
The Median
To find the median of a set of data, arrange the values in order
from least to greatest. The middle number (or the arithmetic
mean of the two middle numbers if the number of values is even)
is the median.
Example
Your Class: 75 is the middle number
Other Class: 76 is the middle number
In our example, the mean and median are very close. The mean for
your class was 75.8 and for the other class it was 76.3.
The Median and Outliers
The median does not make use of every number in the data set
and because of this, it is not as sensitive to outliers. Consider the
following smaller set of data.
Example
Find the mean and median of 1, 70, 71, 72.
The mean is $\frac{1+70+71+72}{4} = 53.5$
The median is the mean of 70 and 71, so 70.5
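The definition of the median translates into a short function. The sketch below (the function name `median` is our own) applies it to the four-value example and, as a second check, to the Your Class scores from the earlier example.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


small = [1, 70, 71, 72]
mean_small = sum(small) / len(small)
median_small = median(small)
print(f"mean = {mean_small}, median = {median_small}")

# Your Class scores from the earlier example (25 values, so one middle value).
your_class = [93, 92, 92, 90, 87, 85, 84, 80, 79, 78, 77, 76, 75,
              74, 71, 71, 71, 71, 71, 70, 68, 66, 62, 59, 53]
median_your = median(your_class)
print(f"Your class median = {median_your}")
```

The single low value of 1 pulls the mean down to 53.5, while the median stays at 70.5, close to the other three values.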
Cautions about the Median
The median is a useful tool for measuring centers in data sets in
which there are many or extreme outliers. However, there are still
some cautions to keep in mind when using the median.
Advantages of the Median
The following are some advantages of using the Median to
measure center.
The median is not sensitive to outliers.
One half of the data is below the median and one half above.
Disadvantages of the Median
The following are some disadvantages of using the Median.
Not every number is included in the computation of the
median.
The median is not as well known as the mean.
The Mode
Defining the Mode
The last measure of center which we will examine is the mode.
The mode is the only way to measure the center of a set of
non-numeric data.
The Mode
The mode of a set of data is the value which appears most often.
If no number appears more than once, there is no mode. If several
numbers appear the same maximal number of times, the data is
multi-modal.
Example
Your Class: The mode is 71
Other Class: The data has three modes: 58, 61, and 98
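Counting occurrences is a mechanical way to find the mode. The sketch below uses `collections.Counter` from the Python standard library; the function name `modes` is ours, and the pizza toppings are made-up illustrative data, not from the lecture. Note that in the Other Class table the score 98 appears twice, just as 58 and 61 do, so a count reports all three.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value appears more than once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)


# Scores from the earlier example (names are illustrative).
your_class = [93, 92, 92, 90, 87, 85, 84, 80, 79, 78, 77, 76, 75,
              74, 71, 71, 71, 71, 71, 70, 68, 66, 62, 59, 53]
other_class = [98, 98, 97, 95, 94, 93, 89, 88, 87, 84, 82, 77, 76,
               72, 71, 70, 65, 64, 63, 61, 61, 60, 58, 58, 47]

print(modes(your_class))
print(modes(other_class))

# Hypothetical non-numeric data, invented only to show that the mode still works.
toppings = ["pepperoni", "cheese", "pepperoni", "mushroom"]
print(modes(toppings))
```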
More on the Mode
The mode is a useful tool when measuring the center of a set of
data which is not numeric.
Example
Find the mode of the class's favorite pizza topping.
Advantages and Disadvantages of the Mode
Advantages: works well with non-numeric data, usually easy
to find, is the most “typical” data.
Disadvantages: may not exist, may not be related to the
“real” center of the data.
Visualizing the Center
Graphs can be used to visualize the center of a set of data.
Frequency Distribution
A frequency distribution is a table in which data values are
arranged into bins. It can also be a bar graph of the frequency
distribution table.
Example
Create a frequency distribution table and graph for your test scores.
Stem-and-Leaf Plots
A stem-and-leaf plot also gives us a picture of the data, but it
preserves the individual values.
Example
Construct a stem-and-leaf plot for your test scores.
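Both displays can be produced as plain text. The sketch below assumes the Your Class scores from the earlier example, and the bin width of 10 points is our choice, not something the lecture fixes. The tens digit serves as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

# Your Class scores from the earlier example (names are illustrative).
your_class = [93, 92, 92, 90, 87, 85, 84, 80, 79, 78, 77, 76, 75,
              74, 71, 71, 71, 71, 71, 70, 68, 66, 62, 59, 53]

# Frequency distribution: count the scores falling in each 10-point bin.
bins = defaultdict(int)
for score in your_class:
    bins[score // 10 * 10] += 1

for low in sorted(bins):
    print(f"{low}-{low + 9}: {'#' * bins[low]} ({bins[low]})")

# Stem-and-leaf plot: stem = tens digit, leaves = ones digits in order.
stems = defaultdict(list)
for score in sorted(your_class):
    stems[score // 10].append(score % 10)

for stem in sorted(stems):
    print(f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}")
```

The row of `#` marks acts as a sideways bar graph, while the stem-and-leaf rows have the same shape and still let you read off every individual score.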
Measuring Spread
One of the things you may have noticed about the graphs we just
created is that they give a good picture of how spread out the data
is. There are a couple of ways that we can measure spread.
Measuring the Spread of Data
There are two ways in which we will look at the spread of data,
one associated with the median and one with the mean.
When using the Median for the center, we often use a
5-number summary to measure the spread of the data.
When using the Mean for the center, we often use the
standard deviation to measure the spread of the data.
The 5-Number Summary
The 5-number summary consists of five numbers which give us a
picture of how spread out a given data set is.
5-Number Summary
The 5-Number Summary for a set of data consists of:
The Minimum
The 1st Quartile (Median of the bottom half)
The Median
The 3rd Quartile (Median of the top half)
The Maximum
Example
Find the 5-number summary for your class and the other class and
display them graphically as a “box plot”.
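The five numbers can be computed as sketched below. One assumption needs flagging: with an odd number of values, this version leaves the median itself out of both halves before taking their medians. That is a common textbook convention, but other quartile conventions exist and can give slightly different answers. The function names are ours, and the scores are those from the earlier example.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(minimum, 1st quartile, median, 3rd quartile, maximum).

    Assumption: for an odd count, the median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


your_class = [93, 92, 92, 90, 87, 85, 84, 80, 79, 78, 77, 76, 75,
              74, 71, 71, 71, 71, 71, 70, 68, 66, 62, 59, 53]
other_class = [98, 98, 97, 95, 94, 93, 89, 88, 87, 84, 82, 77, 76,
               72, 71, 70, 65, 64, 63, 61, 61, 60, 58, 58, 47]

print("Your class: ", five_number_summary(your_class))
print("Other class:", five_number_summary(other_class))
```

The box in a box plot runs from the 1st to the 3rd quartile, so under this convention the other class has a box about twice as wide as yours (62 to 91 versus 70.5 to 84.5).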
The Standard Deviation
The standard deviation of a set of data is a measure of the average
distance of each data point from the mean of the data set.
Standard Deviation
The standard deviation of a set of data $\{x_1, x_2, \ldots, x_n\}$ with mean $\bar{x}$ is
$\sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}}$
Example
Find the standard deviation of 64, 75, 75, 82, 90, 90, and 95.
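The example can be worked step by step in code. This sketch follows the definition given here, dividing by n; some textbooks and software divide by n - 1 instead when the data is a sample, which gives a slightly larger value. The function name is ours.

```python
import math


def standard_deviation(values):
    """Square root of the average squared distance from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    squared_distances = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_distances) / n)


data = [64, 75, 75, 82, 90, 90, 95]
mean = sum(data) / len(data)
sd = standard_deviation(data)
print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

For this data the mean is about 81.57 and the standard deviation is about 10.13.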
Relating Center and Spread to a Picture
The measures of center and spread which we have studied thus far
relate to the larger picture of the data called the distribution.
Types of Distributions
There are several different types of distributions. Consider each of
the following.
Uniform Distribution
Normal Distribution
Skewed Distribution
Example
Which distribution does the distribution of your exam scores most
closely match?
The Normal Distribution
The normal distribution can be of particular importance because it
is a very well understood distribution.
Properties of the Normal distribution
The normal distribution has the following properties.
The center of the distribution is the mean of the data set
The distribution is symmetric about the mean
68% of the data is within one standard deviation of the mean
95% of the data is within two standard deviations of the mean
99.7% of the data is within three standard deviations of the mean
Example
A standardized test has a mean score of 72 and a standard deviation of 9. Suppose that 5000 students took the
test and their scores were normally distributed. How many would we expect to score a) above 72 b) above 81 c)
below 36?
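One way to check answers to this example is with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch using the exact normal curve rather than the rounded percentages, so part b) comes out near 793 instead of the 800 obtained from the 68% rule (16% of 5000). The variable names are ours.

```python
from statistics import NormalDist

# Test scores modeled as normal with mean 72 and standard deviation 9.
scores = NormalDist(mu=72, sigma=9)
students = 5000

# cdf(x) is the fraction of scores at or below x.
above_72 = students * (1 - scores.cdf(72))   # a) above the mean
above_81 = students * (1 - scores.cdf(81))   # b) more than one SD above
below_36 = students * scores.cdf(36)         # c) more than four SDs below

print(f"a) above 72: about {above_72:.0f}")
print(f"b) above 81: about {above_81:.0f}")
print(f"c) below 36: about {below_36:.2f}")
```

Since 36 is four standard deviations below the mean, the expected count for part c) is well under one student.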
Important Concepts
Things to Remember from Section 7.2
1 Finding measures of center
    1 Arithmetic mean
    2 Median
    3 Mode
2 Creating frequency distributions and stem-and-leaf plots
3 Finding measures of spread
    1 5-number summary
    2 Standard deviation
4 Identifying distributions
5 Working with the normal distribution