Download Ch 3 - csusm
Other Numerical Measures
• Median
• Mode
• Range
• Percentiles
• Quartiles, Interquartile range
BUS304 – Data Characterization
Median
The middle value
-- The value that divides the data in half, with equal numbers of values above and below
Steps:
1. Put your data in an ordered array (sort)
2. If n (or N) is odd, the median is the middle number
(i.e. the (n+1)/2 th number)
3. If n (or N) is even, the median is the average of the two middle numbers
(i.e. the average of the (n/2) th and the (n/2 + 1) th numbers)
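The steps above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name median_by_steps is made up here, the odd-sized sample is the nine-value sample from the quartile example later in this transcript, and the even-sized sample is illustrative.

```python
def median_by_steps(values):
    """Median following the slide's steps: sort, then pick or average the middle."""
    ordered = sorted(values)  # Step 1: ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Step 2: odd n, take the (n+1)/2 th value (1-based position)
        return ordered[(n + 1) // 2 - 1]
    # Step 3: even n, average the (n/2) th and (n/2 + 1) th values
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd_sample = [22, 12, 14, 16, 17, 16, 13, 20, 18]  # n = 9, from the quartile example
even_sample = [4, 1, 3, 2]                         # n = 4, illustrative values

print(median_by_steps(odd_sample))   # 16
print(median_by_steps(even_sample))  # 2.5
```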
Sensitivity to outliers
[Figure: three data sets plotted on a 0 to 10 axis, with Median = 3, Median = 2.5, and Median = 3]
The median is not affected by extreme values
Mode
The value that occurs most often
Steps:
1. Put your data in an ordered array (sort)
2. Find the data value(s) that repeat most frequently
The mode is not affected by extreme values either.
[Figure: four examples. Values on a 0 to 14 axis: No Mode! Values on a 0 to 6 axis: Mode = 5. Values on a 0 to 14 axis: Mode = 5 and 9. Categories Boston, Austin, San Diego, Los Angeles: Mode = San Diego]
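A small Python sketch of the mode-finding steps. The function name modes and all three data lists are illustrative (the plotted values on the slide are not recoverable from this transcript). It assumes the slide's convention that data in which no value repeats has no mode.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Assumed convention: no value repeats, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 3, 7, 12]))         # []
print(modes([2, 5, 5, 9, 9, 13]))   # [5, 9]
print(modes(["Boston", "San Diego", "Austin", "San Diego", "Los Angeles"]))  # ['San Diego']
```

The last call shows that the mode also works for categorical data, where a mean or median is not defined.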
Find Mode and Median from Frequency Table
Below is a frequency table showing the number of days the teams take to finish their projects.
Find the mean, median and mode.
Create a histogram, and locate the mean, median and mode.
Describe the shape of the histogram, and find the relationship between mean, median and mode.

Days to Complete    Frequency    Relative Frequency
5                   4            ?
6                   12           ?
7                   8            ?
8                   6            ?
9                   4            ?
10                  2            ?
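One possible way to work this exercise in Python, using the days and frequencies from the table. The variable names are illustrative, and expanding the table into a list of individual observations is just one convenient way to get the median.

```python
import statistics

days = [5, 6, 7, 8, 9, 10]
frequency = [4, 12, 8, 6, 4, 2]

total = sum(frequency)                                 # number of teams
relative_frequency = [f / total for f in frequency]    # fills the "?" column

# Weighted mean: each day count weighted by how many teams took that long
mean = sum(d * f for d, f in zip(days, frequency)) / total

# Expand the table into one value per team, already in sorted order
expanded = [d for d, f in zip(days, frequency) for _ in range(f)]
median = statistics.median(expanded)

# Mode: the day count with the largest frequency
mode = days[frequency.index(max(frequency))]

for d, f, rf in zip(days, frequency, relative_frequency):
    print(f"{d:>4} {f:>9} {rf:>18.3f}")
print("mean:", mean, "median:", median, "mode:", mode)  # mean: 7.0 median: 7.0 mode: 6
```

Here the mode (6) sits below the median and mean (both 7), which is the pattern of a distribution whose longer tail extends to the right.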
Shape of a distribution
Symmetric: Mean = Median = Mode
Left-Skewed: Mean < Median < Mode
(Longer tail extends to left)
Right-Skewed: Mode < Median < Mean
(Longer tail extends to right)
Note that the mean is affected by extreme values the most, so the mean is pulled further toward the tail than the other two measures.
Measures of center location

• Mean
• Median
• Mode
The mean is generally used, unless extreme values (outliers) exist;
the median is the next most common, since it is not sensitive to extreme values;
the mode is sometimes used when one value occurs with a very high frequency.
Think of the example of house prices.
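To make the house-price example concrete, here is a short Python sketch. The prices are hypothetical numbers chosen for illustration (in thousands of dollars) and do not come from the slides.

```python
import statistics

# Hypothetical sale prices in thousands of dollars; the last one is an outlier
prices = [300, 320, 350, 380, 400, 5000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

print("mean:", mean_price)
print("median:", median_price)
```

With these numbers the mean is 1125 while the median is 365, so the single expensive house pulls the mean far above what a typical house costs, while the median barely moves.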
Range
Simplest measure of variation
Describes how widely the data are spread
Formula
Range = Maximum Value – Minimum Value
Example:
[Figure: data plotted on a 0 to 14 axis, with minimum 1 and maximum 14]
Range = 14 - 1 = 13
Disadvantage of Range
• Ignores the way in which data are distributed
[Figure: two data sets plotted on a 7 to 12 axis with different distributions; for both, Range = 12 - 7 = 5]
• Sensitive to outliers
1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
The range is affected the most by outliers.
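The outlier example can be checked with a few lines of Python. The two lists are the data sets from the slide; the function name data_range is illustrative.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


without_outlier = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
with_outlier = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 120]

print(data_range(without_outlier))  # 4
print(data_range(with_outlier))     # 119
```

Changing a single value from 5 to 120 moves the range from 4 to 119, even though the other nineteen values are identical.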
Break
Other measures
1. Percentiles:
Measure the percentage of data below a given value.
e.g. if the 60th percentile is 1240 (SAT score), 60% of students scored less than 1240.
Correspondingly, 40% of students scored 1240 or higher.
How to find a percentile? The pth percentile in an ordered array of n values is the value in the ith position, where
i = (p/100)(n + 1)
Example
• Find the 80th percentile from the annual income data
• Steps:
1. Sort the data
2. Find the location for the 80th percentile:
i = (p/100)(n + 1) = (80/100)(100 + 1) = 80.8 ≈ 81
3. Find the 81st person's income
• Think: what does this income mean?
• Exercise 1: find the value where 30% of people have that income or higher.
• Exercise 2: find the value where 30% of people have an income less than it.
• Exercise 3: find the value where 50% of people have an income less than it. What is this measure also called?
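The location step of this example can be sketched in Python. The function names are illustrative, and rounding the position to the nearest whole number is an assumption based on the slide's own 80.8 ≈ 81; the income data itself is not reproduced in this transcript, so only the position is computed.

```python
def percentile_position(p, n):
    """Position i = (p / 100) * (n + 1) in an ordered array of n values."""
    return p * (n + 1) / 100


def value_at_percentile(sorted_values, p):
    """Value at the rounded, 1-based percentile position (assumed rounding rule)."""
    i = round(percentile_position(p, len(sorted_values)))
    i = min(max(i, 1), len(sorted_values))  # keep the position inside the array
    return sorted_values[i - 1]


position = percentile_position(80, 100)
print(position)         # 80.8
print(round(position))  # 81
```

Given the sorted income list, value_at_percentile(incomes, 80) would then return the 81st person's income.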
Quartiles
• The 25th, 50th, and 75th percentiles
• Called the first, second, and third quartiles, respectively.
• Written as Q1, Q2, Q3, respectively.
• The quartiles split the ranked data into 4 equal groups.
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
Example:
Example: Find the first quartile in the data sample:
22 12 14 16 17 16 13 20 18
Median = the 50th percentile = the second quartile
Interquartile Range
Recall:
• Range? Disadvantage of range?
Interquartile Range:
Interquartile Range = Q3 – Q1
Example:
12 13 14 16 16 17 18 20 22
Q1=13.5
Q3=19
Interquartile range = Q3 – Q1 = 19 – 13.5 = 5.5
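These quartiles can be reproduced with Python's standard library. This is a sketch, not the course's prescribed method: statistics.quantiles with method="exclusive" places the quartiles at positions based on (n + 1) and interpolates between neighbouring values, which agrees with the Q1 and Q3 shown above for this sample.

```python
import statistics

sample = [22, 12, 14, 16, 17, 16, 13, 20, 18]

# n=4 asks for the three cut points that split the data into four groups
q1, q2, q3 = statistics.quantiles(sample, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 13.5 16.0 19.0
print(iqr)         # 5.5
```

Q1 falls at position 2.5, halfway between the 2nd and 3rd ordered values (13 and 14), and Q3 at position 7.5, halfway between 18 and 20.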
Summary
• Understand and compute the following two sets of data measures:
  • Measures of central tendency
    • Mean, Median, and Mode
  • Measures of variation
    • Range, Variance, and Standard deviation
• Other ways to describe data:
  • Percentiles, Quartiles, Interquartile range