```
Chapter 3.4
Exploratory Data Analysis

Traditional Statistics

Data are organized by using a frequency distribution.

The distribution is used to create various graphs:
histogram, frequency polygon, and ogive.

The mean and standard deviation are computed to summarize
the data.

The purpose is to confirm various conjectures about the
nature of the data.
Exploratory Data Analysis (EDA)

The purpose is to examine data to find out what information
can be discovered about the data, such as the center and
the spread.

Data are organized using a stem and leaf plot.

The measure of central tendency is the median, and the
measure of variation is the interquartile range.

Represented graphically using a boxplot
Quartiles

Quartiles divide the distribution into four groups,
separated by Q1, Q2, Q3

Q1 is the same as the 25th percentile

Q2 is the same as the 50th percentile (median)

Q3 is the same as the 75th percentile
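For example, for the data 5, 6, 12, 13, 15, 18, 22, 50:
Q2 = (13 + 15) / 2 = 14, Q1 = (6 + 12) / 2 = 9 (median of the
lower half), and Q3 = (18 + 22) / 2 = 20 (median of the upper half).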
The five-number summary
1. The lowest value of the data set (minimum)
2. Q1
3. The median
4. Q3
5. The highest value of the data set (maximum)
Boxplot

A boxplot is a graph of a data set obtained by drawing a
horizontal line from the minimum data value to Q1,
drawing a horizontal line from Q3 to the maximum data
value, and drawing a box whose vertical sides pass
through Q1 and Q3, with a vertical line inside the box
passing through the median (Q2).
Procedure for constructing a boxplot
1. Find the five-number summary for the data values.
2. Draw a horizontal axis with a scale such that it includes
   the maximum and the minimum data values.
3. Draw a box whose vertical sides go through Q1 and Q3,
   and draw a vertical line through the median.
4. Draw a line from the minimum data value to the left
   side of the box and a line from the maximum data value
   to the right side of the box.
Number of Meteorites Found

The number of meteorites found in 10 states of the U.S.
is 89, 47, 164, 296, 30, 215, 138, 78, 48, 39. Construct a
boxplot for the data.
Information obtained from a boxplot

Position of the median within the box:

If the median is near the center of the box, the
distribution is approximately symmetric.
If the median falls to the left of the center of the
box, the distribution is positively (right) skewed.
If the median falls to the right of the center, the
distribution is negatively (left) skewed.

Length of the lines on either side of the box:

If the lines are about the same length, the
distribution is approximately symmetric.
If the right line is longer than the left line, the
distribution is positively (right) skewed.
If the left line is longer than the right line, the
distribution is negatively (left) skewed.
Sodium Content of Cheese

A dietitian is interested in comparing the sodium content of
real cheese with the sodium content of a cheese
substitute. Compare the distributions using boxplots.

Real Cheese:        310, 4520, 45, 40, 220, 240, 180, 90
Cheese Substitute:  270, 180, 250, 290, 130, 260, 340, 310
Resistant Statistic

A resistant statistic is relatively less affected by outliers
than a nonresistant statistic.

The mean and standard deviation are nonresistant
statistics

Sometimes, when a distribution is skewed or contains
outliers, the median and interquartile range may more
accurately summarize the data than the mean and
standard deviation
Traditional               Exploratory data analysis
Frequency distribution    Stem and leaf plot
Histogram                 Boxplot
Mean                      Median
Standard deviation        Interquartile range
Try it!

Applying the concepts 3-4

Pg. 174
```
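The boxplot procedure in the slides starts from the five-number summary, so a small sketch of that step may help. The Python below is an illustration, not part of the original slides. It assumes the "median of each half" rule for Q1 and Q3, which reproduces the quartiles in the slide example; other tools (for instance, interpolating percentile functions) can return slightly different quartiles. The function name `five_number_summary` is hypothetical, and the list needs at least two values.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data, excluding the middle value when the
    number of values is odd. Requires at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # values below the median position
    upper = data[half + (n % 2):]   # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


# Meteorite counts from the slide example
meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
low, q1, med, q3, high = five_number_summary(meteorites)
print(low, q1, med, q3, high)       # 30 47 83.5 164 296

# Quantities used to read the boxplot
iqr = q3 - q1                       # interquartile range (width of the box)
left_line = q1 - low                # line from the minimum to the box
right_line = high - q3              # line from the box to the maximum
median_left_of_center = (med - q1) < (q3 - med)
print(iqr, left_line, right_line, median_left_of_center)
```

For the meteorite data, the median sits left of the center of the box and the right line is much longer than the left one. By the rules in the slides, both point to a positively (right) skewed distribution.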