Download 3-5

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Misuse of statistics wikipedia , lookup

Time series wikipedia , lookup

Transcript
Baby Leo’s 4-month “Healthy
Baby” check-up reported the
following:
1) He is in the 90th percentile for
weight
2) He is in the 95th percentile for
head circumference
3) He is in the 100th percentile
for height.
Interpret these scores. What do
they mean?
Slide
1
Section 3-5
Exploratory Data Analysis
(EDA)
Slide
2
Key Concept
This section discusses outliers, then
introduces a new statistical graph called
a boxplot, which is helpful for visualizing
the distribution of data.
Slide
3
Definition
 Exploratory Data Analysis (EDA)
the process of using statistical tools
(such as graphs, measures of center,
and measures of variation) to investigate
data sets in order to understand their
important characteristics
Slide
4
Definition
 An outlier is a value that is located very
far
away from almost all of the other values.
 An extreme value that falls outside general pattern
of data.
 Not all outliers are errors.
 An outlier can have a dramatic effect on the mean.
 An outlier can have a dramatic effect on the
standard deviation.
 An outlier can have a dramatic effect on the scale
of the histogram so that the true nature of the
distribution is totally obscured.
Slide
5
Definitions
 For a set of data, the 5-number summary consists
of the minimum value; the first quartile Q1; the
median (or second quartile Q2); the third quartile,
Q3; and the maximum value.
 A boxplot ( or box-and-whisker-diagram) is a
graph of a data set that consists of a line
extending from the minimum value to the
maximum value, and a box with lines drawn at the
first quartile, Q1; the median; and the third
quartile, Q3.
(Calculator: with/without potential outlier,
boxplot/modified boxplot)
Slide
6
Boxplots (useful for revealing:
center, spread, distribution,
outliers)
Slide
7
Modified Boxplots
Some statistical packages provide modified
boxplots which represent outliers as special points.
A data value is an outlier if it is …
above Q3 by an amount greater than 1.5 X IQR
or
below Q1 by an amount greater than 1.5 X IQR
Use the outlier criterion on our dataset
1
2
3
4
8
165
To identify any outliers.
Slide
8
Modified Boxplot Construction
A modified boxplot is constructed with
these specifications:
A special symbol (such as an asterisk) is
used to identify outliers.
The solid horizontal line extends only as
far as the minimum data value that is
not an outlier and the maximum data
value that is not an outlier.
Slide
9
Modified Boxplots - Example
Slide
10
Do male doctors perform more C-sections than female
doctors? A study in Switzerland examined the number
of C-sections performed in one year by a sample of
male/female doctors.
Male Dr. Data:
20 25 25 27 28 31 33 34 36 37 44 50 59 85 86
Min
Q1
M
Q3
Max
Female dr. data:
5 7 10 14
Min
Q1
18
19 25
M = 18.5
29
Q3
31
33
Max
• Are there any outliers?
• What do the modified boxplots tell you?
• Give some overall observations.
Slide
11
You do!
Here are measured reaction times (in seconds) in a test
of driving skills:
2.4 2.5 2.8 2.0 2.4 2.9 3.2 3.5 2.7 2.7 2.8 2.7
1) Find the five-number summary
2) Draw the modified boxplot. Are there any outliers?
Slide
12
Boxplots - cont
Slide
13
Boxplots - cont
Slide
14
Statdisk Pulse rates
Compare male and female
1) Center
2) Variation
3) 5 # summary
4) Are there any outliers?
Slide
15
Recap
In this section we have looked at:
 Exploratory Data Analysis
 Effects of outliers
 5-number summary
 Boxplots and modified boxplots
Slide
16