Chapter 3:
Descriptive Measures
STP 226: Elements of Statistics
Jenifer Boshes
Arizona State University
3.1: Measures of Center
Mean
The mean of a data set is the sum of the
observations divided by the number of
observations. (average)
Example 1:
The following data set consists of homework grades. Find the
mean homework grade.
93 87 90 90 82 85
88 90 93 83 90
Interpret:
Example 2:
The following data set consists of the lengths of a
rare orchid (in inches). Find the mean orchid length.
13 18 14.5
14 15 14
Interpret:
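The definition of the mean can be sketched in a few lines of Python. This is only an illustration using the data from Examples 1 and 2; the function name `mean` and the variable names are chosen here and are not part of the slides.

```python
# Data copied from Examples 1 and 2.
homework_grades = [93, 87, 90, 90, 82, 85, 88, 90, 93, 83, 90]
orchid_lengths = [13, 18, 14.5, 14, 15, 14]  # inches


def mean(observations):
    """Sum of the observations divided by the number of observations."""
    return sum(observations) / len(observations)


homework_mean = mean(homework_grades)  # 971 / 11
orchid_mean = mean(orchid_lengths)     # 88.5 / 6

print(f"Mean homework grade: {homework_mean:.2f}")      # 88.27
print(f"Mean orchid length: {orchid_mean:.2f} inches")  # 14.75
```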
Median
To find the median of a data set:
Arrange the data in increasing order.
• If the number of observations is odd, then the median is the observation
exactly in the middle.
• If the number of observations is even, the median is the mean of the two
middle observations in the ordered list.
Example 3:
Find the median homework score.
93 87 90 90 82 85
88 90 93 83 90
Interpret:
Example 4:
Find the median orchid length.
13 18 14.5
14 15 14
Interpret:
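The median procedure above (order the data, then take the middle observation or the mean of the two middle observations) might be sketched as follows. The function name `median` is an illustrative choice; the data come from Examples 3 and 4.

```python
homework_grades = [93, 87, 90, 90, 82, 85, 88, 90, 93, 83, 90]
orchid_lengths = [13, 18, 14.5, 14, 15, 14]  # inches


def median(observations):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(observations)          # arrange in increasing order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                          # odd: the observation exactly in the middle
        return ordered[middle]
    # even: mean of the two middle observations
    return (ordered[middle - 1] + ordered[middle]) / 2


homework_median = median(homework_grades)  # 11 observations, 6th in order
orchid_median = median(orchid_lengths)     # 6 observations, mean of 3rd and 4th

print(homework_median)  # 90
print(orchid_median)    # 14.25
```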
Mode
The mode of a data set is the value that occurs with the greatest frequency.
First, find the frequency of each value in the data set.
• If no value occurs more than once, there is no mode.
• Otherwise, any value that occurs with greatest frequency is a mode.
Example 5:
Find the mode homework score.
93 87 90 90 82 85
88 90 93 83 90
Interpret:
Example 6:
Find the mode orchid length.
13 18 14.5
14 15 14
Interpret:
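The mode rule above can be sketched with a frequency count. The function name `modes` is an illustrative choice, and it returns a list because a data set may have more than one mode (or none); it assumes a non-empty data set.

```python
from collections import Counter

homework_grades = [93, 87, 90, 90, 82, 85, 88, 90, 93, 83, 90]
orchid_lengths = [13, 18, 14.5, 14, 15, 14]  # inches


def modes(observations):
    """Return every value with the greatest frequency, or [] if no value repeats."""
    counts = Counter(observations)           # frequency of each value
    highest = max(counts.values())
    if highest == 1:                         # no value occurs more than once
        return []
    return sorted(value for value, count in counts.items() if count == highest)


homework_modes = modes(homework_grades)  # 90 occurs four times
orchid_modes = modes(orchid_lengths)     # 14 occurs twice

print(homework_modes)  # [90]
print(orchid_modes)    # [14]
```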
Example 7:
Find the mean, median, and mode of each of the data sets.
Data Set I
Data Set II
45 90 78 88
61 99 82
68 16 86 49
72 80 96
88 86 82 76
78 77 66
Skewed vs. Symmetric
(a) Right skewed: The mean is to the right
of the median.
(b) Symmetric: The mean is equal to the
median.
(c) Left skewed: The mean is to the left of
the median.
When to use each…
Median: Use the median when your data set
has very extreme values.
A resistant (or robust) measure is not
sensitive to the influence of a few extreme
observations.
Mode: Use the mode when you have qualitative
data.
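To see why the median is called resistant while the mean is not, here is a small sketch. The two data sets are hypothetical and made up for illustration only; they differ in a single extreme value.

```python
from statistics import mean, median

# Hypothetical data: identical except for the last observation.
typical = [70, 72, 75, 78, 80]
with_extreme = [70, 72, 75, 78, 800]

mean_typical = mean(typical)
mean_extreme = mean(with_extreme)
median_typical = median(typical)
median_extreme = median(with_extreme)

# The mean is pulled toward the extreme value; the median does not move.
print(mean_typical, mean_extreme)      # 75 219
print(median_typical, median_extreme)  # 75 75
```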
Sample Mean
Example 8:
The exam scores for a student are: 61,
97, 78, 86, and 73.
(a) Use mathematical notation to
represent the individual exam scores.
(b) Use summation notation to express
the sum of the five exam scores.
(c) Find x̄ (the sample mean) for the exam data.
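One way to work through Example 8 is sketched below, assuming the usual notation: the scores are x1, ..., x5, their sum is written Σxi, and the sample mean is Σxi divided by n. The variable names are illustrative.

```python
# (a) Individual exam scores: x[0] plays the role of x1, x[1] of x2, and so on.
x = [61, 97, 78, 86, 73]
n = len(x)

# (b) The sum of the five scores, written Σxi in summation notation.
total = sum(x)

# (c) The sample mean: the sum of the observations divided by n.
x_bar = total / n

print(total)  # 395
print(x_bar)  # 79.0
```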
3.2: Measures of Variation
Example 1:
The exam scores for student A are: 100,
100, 90, 90, and 70. The exam scores
for student B are: 90, 88, 88, 93, and
91. Compare the means and
medians.
Who is the better student? Who is more consistent?
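A quick check of the comparison asked for in Example 1, sketched with Python's standard `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median

student_a = [100, 100, 90, 90, 70]
student_b = [90, 88, 88, 93, 91]

mean_a, median_a = mean(student_a), median(student_a)
mean_b, median_b = mean(student_b), median(student_b)

# Both measures of center agree for the two students, so the center alone
# cannot tell them apart; a measure of variation is needed.
print(mean_a, median_a)  # 90 90
print(mean_b, median_b)  # 90 90
```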
Range
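Taking the range to be the largest observation minus the smallest (the usual definition, assumed here), a minimal sketch for the two students in Example 1:

```python
student_a = [100, 100, 90, 90, 70]
student_b = [90, 88, 88, 93, 91]


def data_range(observations):
    """Largest observation minus smallest observation (assumed definition)."""
    return max(observations) - min(observations)


range_a = data_range(student_a)
range_b = data_range(student_b)

print(range_a)  # 30
print(range_b)  # 5
```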
Standard Deviation
The standard deviation measures
variation by indicating, on average,
how far the observations are from the
mean.
Sample Standard Deviation
1. For each observation, calculate the deviation
from the mean.
2. Square this value.
3. Add up the squares.
4. Divide by n – 1.
5. Take the square root.
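The five steps above can be sketched directly in code. The function name `sample_std` is an illustrative choice; the data are the exam scores for students A and B from Example 1.

```python
import math

student_a = [100, 100, 90, 90, 70]
student_b = [90, 88, 88, 93, 91]


def sample_std(observations):
    """Sample standard deviation, following the five steps one at a time."""
    n = len(observations)
    x_bar = sum(observations) / n
    deviations = [x - x_bar for x in observations]  # 1. deviation from the mean
    squares = [d ** 2 for d in deviations]          # 2. square each deviation
    sum_of_squares = sum(squares)                   # 3. add up the squares
    variance = sum_of_squares / (n - 1)             # 4. divide by n - 1
    return math.sqrt(variance)                      # 5. take the square root


s_a = sample_std(student_a)  # square root of 600 / 4
s_b = sample_std(student_b)  # square root of 18 / 4

print(f"{s_a:.2f}")  # 12.25
print(f"{s_b:.2f}")  # 2.12
```

The standard library function `statistics.stdev` computes the same quantity and can be used to check the result.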
Example 2:
Find the standard deviation for student A:
100, 100, 90, 90, and 70.
1. For each observation, calculate the deviation from the mean.
2. Square this value.
3. Add up the squares.
4. Divide by n – 1.
5. Take the square root.
Example 3:
Find the standard deviation for student B:
90, 88, 88, 93, and 91.
1. For each observation, calculate the deviation from the mean.
2. Square this value.
3. Add up the squares.
4. Divide by n – 1.
5. Take the square root.
What can we say about the relative performance between
students A and B?
Comments on Standard Deviation
s² is called the sample variance.
The units of s² are the square of the original units.
The units of s are the same as the original units.
s is ALWAYS ≥ 0. Why?
s is a measure of how far, on average, the observations deviate
from the mean.
Do not perform any rounding until the computation is
complete; otherwise, substantial roundoff error can result.
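Almost all the observations in any data set lie within three
standard deviations to either side of the mean. This follows from
Chebyshev’s Rule: for any k > 1, at least 1 − 1/k² of the
observations lie within k standard deviations of the mean
(at least 8/9, about 89%, for k = 3).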
Example 3:
• How many observations for student B are within one standard deviation of the
mean?
• How many observations for student B are within two standard deviations of the
mean?
• How many observations for student B are within three standard deviations of the
mean?
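The three questions above can be checked with a short sketch. The helper name `count_within` is an illustrative choice; the data are student B's exam scores, and `statistics.stdev` is the standard library's sample standard deviation (it divides by n – 1).

```python
from statistics import mean, stdev

student_b = [90, 88, 88, 93, 91]
x_bar = mean(student_b)
s = stdev(student_b)


def count_within(observations, center, spread, k):
    """Count observations within k standard deviations of the mean."""
    lower = center - k * spread
    upper = center + k * spread
    return sum(1 for x in observations if lower <= x <= upper)


counts = {k: count_within(student_b, x_bar, s, k) for k in (1, 2, 3)}
print(counts)  # {1: 4, 2: 5, 3: 5}
```

Only the score of 93 falls more than one standard deviation from the mean of 90; every score lies within two standard deviations.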
3.3: The Five-Number Summary; Boxplots
Robustness
Recall: What does it mean for a statistic to
be robust?
Name a statistic that is not robust.
Name a statistic that is robust.
Quartiles
Quartiles divide a data set
into quarters. Q1, Q2, and Q3
are the three quartiles.
The second quartile (Q2) is
the median of the entire data
set.
The first quartile (Q1) is the
median of the portion of the
data set that lies at or below
Q2.
The third quartile (Q3) is the
median of the portion of the
data set that lies at or above
Q2.
Example 1:
Fifteen people were asked how
many baseball games they
had attended the previous
season.
Find the quartiles.
1. Order the data.
2. Find the median of the data set. This is Q2.
3. Find the median of the data that lies at or below the
median of the entire data set. This is Q1.
4. Find the median of the data that lies at or above the
median of the entire data set. This is Q3.
12 25 8 6 1 0 42 19
17 0 63 14 22 31 34
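The four steps for finding quartiles might be sketched as follows. The function names are illustrative, and the sketch assumes that "at or below" and "at or above" the median are decided by position in the ordered list, so the middle observation belongs to both halves when n is odd. Library routines such as `statistics.quantiles` use other conventions and may give slightly different quartiles.

```python
games = [12, 25, 8, 6, 1, 0, 42, 19, 17, 0, 63, 14, 22, 31, 34]


def median(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(observations):
    """Return (Q1, Q2, Q3) using the at-or-below / at-or-above method."""
    ordered = sorted(observations)      # 1. order the data
    n = len(ordered)
    q2 = median(ordered)                # 2. median of the whole data set
    lower = ordered[: (n + 1) // 2]     # 3. data at or below the median
    upper = ordered[n // 2 :]           # 4. data at or above the median
    return median(lower), q2, median(upper)


q1, q2, q3 = quartiles(games)
print(q1, q2, q3)  # 7.0 17 28.0
```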
Interquartile Range (IQR)
The IQR is the difference between the first
and third quartiles; that is, IQR = Q3 – Q1.
It is the preferred measure of variation
when the median is used as the measure
of center. Like the median, the IQR is a
resistant or robust measure.
Example 2:
What is the IQR for the baseball data?
Interpret:
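A sketch of the IQR computation for the baseball data, assuming the quartiles are found by splitting the ordered data by position at the median (the middle observation is included in both halves when n is odd). The variable names are illustrative.

```python
from statistics import median

games = sorted([12, 25, 8, 6, 1, 0, 42, 19, 17, 0, 63, 14, 22, 31, 34])
n = len(games)

q1 = median(games[: (n + 1) // 2])  # median of the data at or below Q2
q3 = median(games[n // 2 :])        # median of the data at or above Q2
iqr = q3 - q1                       # IQR = Q3 - Q1

print(iqr)  # 21.0
```

Under these assumptions, the middle half of the people surveyed differ by at most 21 games attended.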
Five-Number Summary
Min
Q1
Q2
Q3
Max
Example 3:
Find the five-number summary for the
baseball data.
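The five-number summary for the baseball data can be sketched as below, again assuming the quartiles are found by splitting the ordered data by position at the median. The tuple name `five_number_summary` is an illustrative choice.

```python
from statistics import median

games = sorted([12, 25, 8, 6, 1, 0, 42, 19, 17, 0, 63, 14, 22, 31, 34])
n = len(games)

five_number_summary = (
    games[0],                       # Min
    median(games[: (n + 1) // 2]),  # Q1
    median(games),                  # Q2
    median(games[n // 2 :]),        # Q3
    games[-1],                      # Max
)

print(five_number_summary)  # (0, 7.0, 17, 28.0, 63)
```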
Outliers
Outliers are observations that fall well
outside the overall pattern of the data.
They may result from a recording error,
obtaining an observation from a different
population, or an unusual extreme value.
Lower and Upper Limits
Lower limit: Q1 – 1.5 · IQR
Upper limit: Q3 + 1.5 · IQR
Observations that lie outside the upper
and lower limits – either below the lower
limit or above the upper limit – are
potential outliers.
Example 4:
For the baseball data:
(a) Obtain the lower and upper limits.
(b) Determine the potential outliers, if any.
(c) Construct a modified boxplot.
12 25 8 6 1 0 42 19
17 0 63 14 22 31 34
• Adjacent values of a set are the most
extreme observations that are not potential
outliers.
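Parts (a) and (b) of Example 4, together with the adjacent values, might be computed as in this sketch. It assumes the quartiles are found by splitting the ordered data by position at the median; the variable names are illustrative, and no plot is drawn.

```python
from statistics import median

games = sorted([12, 25, 8, 6, 1, 0, 42, 19, 17, 0, 63, 14, 22, 31, 34])
n = len(games)

q1 = median(games[: (n + 1) // 2])
q3 = median(games[n // 2 :])
iqr = q3 - q1

# (a) Lower and upper limits.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# (b) Potential outliers lie below the lower limit or above the upper limit.
potential_outliers = [x for x in games if x < lower_limit or x > upper_limit]

# Adjacent values: the most extreme observations that are not potential outliers.
inside = [x for x in games if lower_limit <= x <= upper_limit]
adjacent_values = (min(inside), max(inside))

print(lower_limit, upper_limit)  # -24.5 59.5
print(potential_outliers)        # [63]
print(adjacent_values)           # (0, 42)
```

For part (c), a modified boxplot would draw its whiskers out to the adjacent values and mark 63 individually as a potential outlier.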
Steps for Constructing a Modified Boxplot
Steps for Constructing a Boxplot
Boxplots
Boxplots are useful for comparing two or
more data sets.
Notice how box width and whisker length
relate to skewness and symmetry.
Bibliography
Some of the textbook images embedded in
the slides were taken from:
Elementary Statistics, Sixth Edition; by
Weiss; Addison Wesley Publishing
Company
Copyright © 2005, Pearson Education, Inc.