Describing Distributions of
Quantitative Data
Center and Spread
Shape, Center, Spread
After spending some time in previous units describing the
shape of quantitative data, in this unit we will describe the
center and spread of quantitative data.
Objective:
1. Students will be able to calculate measures of center
including mean, median and midrange. Students will
also be able to calculate measures of spread including
IQR and standard deviation.
2. Students will know which measures of center and
spread are appropriate for the data being
described.
Measures of Center
Midrange: A simple measure of center taking the
average of the maximum and minimum value.
$\text{Midrange} = \dfrac{\text{Max} + \text{Min}}{2}$
Example: Find the midrange of the following data.
6, 2, 5, 8, 10, 15, 20, 3, 4, 8
First, put the data in order: 2, 3, 4, 5, 6, 8, 8, 10, 15, 20
Max = 20
Min = 2
$\text{Midrange} = \dfrac{20 + 2}{2} = \dfrac{22}{2} = 11$
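The midrange calculation can be checked with a few lines of Python. This is only a sketch; the function name `midrange` is chosen here for illustration and does not come from the slides.

```python
def midrange(values):
    """Return the average of the largest and smallest value."""
    return (max(values) + min(values)) / 2


data = [6, 2, 5, 8, 10, 15, 20, 3, 4, 8]
print(midrange(data))  # 11.0
```

Sorting is not needed in code, because `max` and `min` find the extremes directly.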
Measures of Center
Mean:
Commonly referred to as the “average” of a
set of data, the mean takes the sum of the
data and divides by the number of data
entries.
$\bar{y} = \dfrac{\text{Sum of entries}}{\text{Number of entries}} = \dfrac{\sum y}{n}$
Example: Find the mean of the following data.
6, 2, 5, 8, 10, 15, 20, 3, 4, 8
Add the 10 numbers and divide by 10:
$\bar{y} = \dfrac{6+2+5+8+10+15+20+3+4+8}{10} = \dfrac{81}{10} = 8.1$
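The same mean can be reproduced in Python, either by hand or with the standard library's `statistics.mean`. The variable name `by_hand` is only illustrative.

```python
from statistics import mean

data = [6, 2, 5, 8, 10, 15, 20, 3, 4, 8]

# Sum of entries divided by the number of entries.
by_hand = sum(data) / len(data)

print(by_hand)     # 8.1
print(mean(data))  # 8.1
```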
Measures of Center
Median:
The middle value of an ordered set of data.
If there is an odd number of data entries,
the median is the middle value. If there is
an even number of entries, the median is
the mean of the two middle values.
Example: Find the median of the following sets of data.
a) 5, 4, 9, 20, 15
In order: 4, 5, 9, 15, 20
Median = 9
b) 6, 2, 5, 8, 10, 15, 20, 3, 4, 8
In order: 2, 3, 4, 5, 6, 8, 8, 10, 15, 20
$\text{Median} = \dfrac{6 + 8}{2} = 7$
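The odd and even cases of the median rule can be sketched as a short Python function. The function below is written for illustration; Python's `statistics.median` follows the same rule.

```python
def median(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of entries: a single middle value exists.
        return ordered[mid]
    # Even number of entries: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 4, 9, 20, 15]))                   # 9
print(median([6, 2, 5, 8, 10, 15, 20, 3, 4, 8]))   # 7.0
```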
Measures of Center
Which Measure of CENTER?
Midrange:
Very sensitive to small changes in data.
Not a very good measurement to describe
a whole set of data.
Mean:
Good for describing symmetric data.
Median:
Good for describing skewed data or
data with outliers.
(If data is symmetric, the median and mean will be very similar
numbers. If the median and mean are very different, the data is
skewed or has outliers.)
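A small experiment illustrates why the median suits data with outliers. The second data set below is hypothetical: it is the earlier example with the value 20 changed to 200, purely to create an outlier.

```python
from statistics import mean, median

data = [6, 2, 5, 8, 10, 15, 20, 3, 4, 8]

# Hypothetical change for illustration: the largest value 20 becomes 200.
with_outlier = [200 if y == 20 else y for y in data]

print(mean(data), median(data))                  # 8.1 7.0
print(mean(with_outlier), median(with_outlier))  # 26.1 7.0
```

The mean moves from 8.1 to 26.1 while the median stays at 7, so a large gap between the two is a sign of skew or outliers.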
Average?
Can you calculate:
a) Your average test grade?
b) The average heart rate?
c) The average family?
d) The average song title?
"Average" is a term used to mean "typical". With
numeric data we need to be more specific. (Is your
typical test grade the mean or median of your test
grades?) If your data is not numeric it does not make
sense to try to calculate a mean or median to describe
an average.
Measures of Spread
Range:
The maximum value minus the minimum
value of a set of data. A simple measure of
spread good for determining a scale for a
graph.
$\text{Range} = \text{max} - \text{min}$
Example: Find the range of the following data.
6, 2, 5, 8, 10, 15, 20, 3, 4, 8
First, put the data in order: 2, 3, 4, 5, 6, 8, 8, 10, 15, 20
Max = 20
Min = 2
$\text{Range} = 20 - 2 = 18$
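In Python the range is a one-line calculation. The variable name `data_range` is used here only to avoid shadowing Python's built-in `range`.

```python
data = [6, 2, 5, 8, 10, 15, 20, 3, 4, 8]

# Maximum value minus minimum value.
data_range = max(data) - min(data)
print(data_range)  # 18
```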
Measures of Spread
IQR:
The interquartile range is the range of the middle 50%
of your data: the third quartile minus the first quartile.
Best used to describe the spread of skewed data.
$\text{IQR} = Q_3 - Q_1$
Example: Find the IQR of the following data.
2, 3, 4, 5, 6, 8, 8, 10, 15, 20
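Min = 2
Q1 = 4 (median of the lower half: 2, 3, 4, 5, 6)
Med = 7
Q3 = 10 (median of the upper half: 8, 8, 10, 15, 20)
Max = 20
$\text{IQR} = Q_3 - Q_1 = 10 - 4 = 6$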
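The quartiles in this example are consistent with the median-of-halves method, which the sketch below assumes. The function name `five_number_summary` is illustrative, and for an odd number of entries this version leaves the middle value out of both halves. Other quartile conventions exist (for example in `statistics.quantiles`) and can give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    if len(values) < 2:
        raise ValueError("need at least two data values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the middle
    upper = ordered[-half:]   # values above the middle
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


data = [2, 3, 4, 5, 6, 8, 8, 10, 15, 20]
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1

print(low, q1, med, q3, high)  # 2 4 7.0 10 20
print(iqr)                     # 6
```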
Measures of Spread
Standard
Deviation: Roughly the average distance of the data values
from the mean. Best used to
describe the spread of symmetric data.
$s = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$
* This formula is difficult to understand at first glance. It will
be explained in subsequent slides.
Standard Deviation
Variance and Standard Deviation
Notation: For a set of data, $\{y_1, y_2, y_3, y_4, \dots, y_n\}$
$n$: the number of data entries
$\bar{y}$: the mean, $\bar{y} = \dfrac{\text{sum of data entries}}{n} = \dfrac{\sum y}{n}$
$s^2$: variance
$s$: standard deviation
Variance, $s^2$
Another measure of spread (best used for symmetric data),
variance finds the "almost average" squared distance of each data
point from the mean.
The symbol used for variance is $s^2$ because it is the square of the
standard deviation. (Standard deviation is the square root of variance.)
$s^2 = \dfrac{\sum (y - \bar{y})^2}{n - 1}$
In this formula:
$y - \bar{y}$ is the distance of a data value from the mean.
The exponent 2 means each distance is squared.
$\sum$ means the squared distances are summed.
$n - 1$ is one less than the total number of data entries: the “almost average”.
Variance, $s^2$
Ex) Find the variance of the data. These are the y-values.
6, 8, 10, 14, 17
First calculate the mean ($\bar{y}$).
$\bar{y} = \dfrac{6 + 8 + 10 + 14 + 17}{5} = \dfrac{55}{5} = 11$
Then find the distance of each data point from the mean, $(y - \bar{y})$.
$(6 - 11)$, $(8 - 11)$, $(10 - 11)$, $(14 - 11)$, $(17 - 11)$
$(-5)$, $(-3)$, $(-1)$, $(3)$, $(6)$
Square each distance, then add ($\sum$) them together.
$(-5)^2 + (-3)^2 + (-1)^2 + (3)^2 + (6)^2 = 25 + 9 + 1 + 9 + 36 = 80$
Finally, divide this number by $(n - 1)$. ($n$ is the number of data entries; $n = 5$ in this example, so $n - 1 = 4$.)
$\dfrac{80}{4} = 20$
The variance of this data is $s^2 = 20$.
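The arithmetic in this example can be replayed in Python as a check. The variable names are illustrative. The last line compares the result with `statistics.variance`, which also divides by n - 1.

```python
from statistics import variance

data = [6, 8, 10, 14, 17]
n = len(data)

y_bar = sum(data) / n                        # mean: 11.0
squared = [(y - y_bar) ** 2 for y in data]   # [25.0, 9.0, 1.0, 9.0, 36.0]
s_squared = sum(squared) / (n - 1)           # 80.0 / 4

print(s_squared)  # 20.0

# The standard library's sample variance should agree with the hand calculation.
assert variance(data) == s_squared
```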
Problem with Variance
The problem with variance is that it always yields
square units. We don’t usually want to compare
$(\text{units})^2$,
e.g. square meters ($\text{m}^2$), $\text{mpg}^2$, $(\text{test grade})^2$.
We want to describe our spread in terms of the same units
as our original data. If the original data is in meters, we
want to know the spread in terms of meters. If the original
data are test grades, we want to know the spread in terms
of test grades.
Problem with Variance
To fix this problem, we use the standard deviation
($s$) as our measure of spread. Standard deviation
is the square root of variance ($s^2$), so the units for
standard deviation will be the same as the units in
the original data.
$\text{standard deviation} = \sqrt{\text{variance}}$, that is, $s = \sqrt{s^2}$
Standard Deviation
variance, $s^2 = \dfrac{\sum (y - \bar{y})^2}{n - 1}$
standard deviation, $s = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$
Standard Deviation
Recall the data from the previous example:
6, 8, 10, 14, 17
We found that the variance ($s^2$) for this data is 20.
Therefore, the standard deviation is $s = \sqrt{20} \approx 4.47$.
This means that the typical distance of a data point
from the mean is roughly 4.47.
Standard Deviation
Recall the mean of this data is 11.
6, 8, 10, 14, 17
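[Figure: number line marking the data values 6, 8, 10, 14 and 17 and the mean at 11, with each value's distance from the mean labeled: 5, 3, 1, 3, 6.]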
Does it seem that the distances of the values from the
mean have an approximate average of 4.47?
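One way to explore this question is to compute the plain average of the distances and compare it with the standard deviation. The sketch below does that for the same data; the variable names are illustrative.

```python
from math import sqrt

data = [6, 8, 10, 14, 17]
n = len(data)
y_bar = sum(data) / n  # 11.0

# Distance of each value from the mean, ignoring sign.
distances = [abs(y - y_bar) for y in data]   # [5.0, 3.0, 1.0, 3.0, 6.0]
mean_distance = sum(distances) / n

# Standard deviation: square, sum, divide by n - 1, then take the root.
std_dev = sqrt(sum(d ** 2 for d in distances) / (n - 1))

print(mean_distance)      # 3.6
print(round(std_dev, 2))  # 4.47
```

The two numbers are close but not equal. Squaring gives the larger distances more weight, and dividing by n - 1 instead of n raises the result a little further, so the standard deviation is best read as an approximate typical distance.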
Standard Deviation
Finding the Variance and Standard Deviation using a table:
1. Find the mean of the data.
2. Set up three columns: $y$, $(y - \bar{y})$ and $(y - \bar{y})^2$.
3. Find the sum of the squared deviations
4. Divide the sum by (n-1). This is the Variance.
5. Take the square root of the variance. This is the
Standard Deviation.
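The five steps above can be sketched in Python. The helper `deviation_table` is a hypothetical name introduced here, and the data set is the one from the earlier variance example (6, 8, 10, 14, 17), so the results can be compared with the values already found.

```python
from math import sqrt


def deviation_table(values):
    """Return the mean and one (y, y - mean, (y - mean)^2) row per value."""
    y_bar = sum(values) / len(values)
    rows = [(y, y - y_bar, (y - y_bar) ** 2) for y in values]
    return y_bar, rows


data = [6, 8, 10, 14, 17]
y_bar, rows = deviation_table(data)        # steps 1 and 2

print(f"{'y':>4} {'y - mean':>10} {'squared':>10}")
for y, dev, sq in rows:
    print(f"{y:>4} {dev:>10.1f} {sq:>10.1f}")

sum_sq = sum(sq for _, _, sq in rows)      # step 3
variance = sum_sq / (len(data) - 1)        # step 4
std_dev = sqrt(variance)                   # step 5

print(sum_sq, variance, round(std_dev, 2))  # 80.0 20.0 4.47
```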
Standard Deviation
Example:
Find the variance and standard deviation of the data.
30, 32, 32, 40, 42, 46
$\bar{y} = \dfrac{222}{6} = 37$
y     (y - ȳ)      (y - ȳ)²
30    (30 - 37)    (-7)² = 49
32    (32 - 37)    (-5)² = 25
32    (32 - 37)    (-5)² = 25
40    (40 - 37)    (3)² = 9
42    (42 - 37)    (5)² = 25
46    (46 - 37)    (9)² = 81
Add these: 49 + 25 + 25 + 9 + 25 + 81 = 214
$\text{Variance} = \dfrac{214}{n - 1} = \dfrac{214}{5} = 42.8$
$\text{Std. Dev.} = \sqrt{42.8} \approx 6.5$
Standard Deviation
Try this:
Find the mean and median of the data.
5, 7, 9, 9, 10, 11, 12
Given the histogram of the above data, which is the
appropriate measure of center (mean or median)?
Explain.
Hint:
Is the data symmetric
or skewed?
Standard Deviation
Try this:
Find the IQR and the standard deviation
5, 7, 9, 9, 10, 11, 12
Given the histogram of the above data, which is the
appropriate measure of spread (IQR or standard
deviation)? Explain.
Hint:
Is the data symmetric
or skewed?
Review
Review: Create a box plot to describe the following data
(be sure to identify any outliers)
5, 5, 6, 7, 8, 8, 9, 10, 12, 20, 23