PROB AND STATS
Measuring/Describing the Spread
Quartiles, Box Plots, Five-Number Summary, Standard Deviation and Variance
READ

Page 42, the first two paragraphs, to understand where we are going with quartiles and why we use them to improve our description of spread.
QUARTILES - Q1 & Q3

Find the median M of your ordered data.

Q1 is the median of the left half of the data (the observations to the left of M).

Q3 is the median of the right half of the data (the observations to the right of M).

Let’s look at 3 sets of data.
P42
EX 1.12 FINDING QUARTILES

P43
Barry Bonds’s home run counts (arranged in order) are
There is an even number of observations, so the median lies midway between the middle pair.
The first quartile is the median of the 8 observations to the left of M.
The third quartile is the median of the 8 observations to the right of M.

WHAT ARE M, Q1 AND Q3?


Quartiles are resistant: the quartiles would be the same whether Bonds’s record season had 73 or 703 home runs.
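To see resistance with numbers, here is a minimal sketch using the 17-value data set that appears later in these notes. The change of the largest value from 54 to 540 is hypothetical, in the spirit of the 73-versus-703 remark, and `quartiles` is an illustrative helper that follows the median-of-halves rule from these notes.

```python
from statistics import median

def quartiles(data):
    """(Q1, M, Q3) by the median-of-halves rule; M is left out of both halves when n is odd."""
    xs = sorted(data)
    return median(xs[: len(xs) // 2]), median(xs), median(xs[(len(xs) + 1) // 2 :])

data = [8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54]
changed = data[:-1] + [540]  # hypothetical: largest value made ten times bigger

print(quartiles(data))     # (18.5, 22, 27.5)
print(quartiles(changed))  # (18.5, 22, 27.5)
```

Only the position of the largest value matters to Q3, not its size, so the quartiles do not move.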
EX 1.12 (CONTINUED)

Hank Aaron’s data in increasing order

What is M? Q1? Q3?

Be aware that computer programs and calculators may calculate quartiles slightly differently.
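As one illustration, Python’s standard-library `statistics.quantiles` offers two methods, `"exclusive"` (the default) and `"inclusive"`. This sketch compares them with the median-of-halves rule on the 17-value data set that comes next in these notes; `textbook_quartiles` is an illustrative helper, not a library function.

```python
from statistics import median, quantiles

data = [8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54]

def textbook_quartiles(values):
    """Median-of-halves rule used in these notes (M excluded from the halves when n is odd)."""
    xs = sorted(values)
    return [median(xs[: len(xs) // 2]), median(xs), median(xs[(len(xs) + 1) // 2 :])]

print(textbook_quartiles(data))                  # [18.5, 22, 27.5]
print(quantiles(data, n=4, method="exclusive"))  # [18.5, 22.0, 27.5]
print(quantiles(data, n=4, method="inclusive"))  # [19.0, 22.0, 27.0]
```

Here the default method happens to agree with the hand method, but the inclusive method gives Q1 = 19 and Q3 = 27, and for other sample sizes the methods can all disagree.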
CONSIDER THIS DATA SET
 8,
17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26,
27, 28, 29, 50, 54
 Find the Median and Quartiles
 M = 22
 Q1 = 18.5
 Q3 = 27.5
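The steps above can be checked in Python. This is a minimal sketch assuming the rule used in these notes: with n = 17 (odd), M is the 9th value and is left out of both halves.

```python
from statistics import median

data = [8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54]
xs = sorted(data)
n = len(xs)                      # 17 observations (odd)

m = median(xs)                   # middle (9th) value
lower = xs[: n // 2]             # the 8 values left of M
upper = xs[(n + 1) // 2 :]       # the 8 values right of M
q1, q3 = median(lower), median(upper)

print(lower)      # [8, 17, 17, 18, 19, 20, 21, 22]
print(upper)      # [22, 25, 26, 27, 28, 29, 50, 54]
print(m, q1, q3)  # 22 18.5 27.5
```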
INTERQUARTILE RANGE (IQR)
P43
IQR = Q3 - Q1
It is the distance between the quartiles, that is, the range covered by the middle half of the data.
DETERMINING OUTLIERS WITH THE INTERQUARTILE RANGE (IQR)
There is a simple rule that uses the IQR to decide whether a piece of data is an outlier.
If an observation falls more than 1.5 x IQR above Q3 or below Q1, we call it an outlier.
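The 1.5 x IQR rule can be sketched in a few lines. This applies it to the 17-value data set from the previous slides; `outliers_15_iqr` is a hypothetical helper name, and the quartiles follow the median-of-halves rule used in these notes.

```python
from statistics import median

def outliers_15_iqr(data):
    """Flag observations more than 1.5 * IQR below Q1 or above Q3."""
    xs = sorted(data)
    q1 = median(xs[: len(xs) // 2])
    q3 = median(xs[(len(xs) + 1) // 2 :])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # lower and upper cutoffs
    flagged = [x for x in xs if x < low or x > high]
    return iqr, (low, high), flagged

data = [8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54]
print(outliers_15_iqr(data))  # (9.0, (5.0, 41.0), [50, 54])
```

With IQR = 9, the cutoffs are 5 and 41, so 50 and 54 are outliers while the low value 8 is not.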
P44
EX 1.13 DETERMINING OUTLIERS

P44
Do you think that Barry Bonds’s 73-home-run season is an outlier? Let’s check.
IQR = Q3 – Q1 = 41 – 25 = 16
 Q3 + 1.5 x IQR = 41 + (1.5 x 16) = 65 (upper cutoff)
 Q1 – 1.5 x IQR = 25 – (1.5 x 16) = 1 (lower cutoff)


Since 73 is above the upper cutoff, it is an outlier.
FIVE-NUMBER SUMMARY
P44
Minimum - smallest piece of data
Q1 - first quartile
M - median
Q3 - third quartile
Maximum - largest piece of data
Written as: Min Q1 M Q3 Max
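A minimal sketch of the five-number summary, assuming the median-of-halves quartiles from these notes; `five_number_summary` is an illustrative name, shown on the 17-value data set from earlier.

```python
from statistics import median

def five_number_summary(data):
    """Return (Min, Q1, M, Q3, Max) using the median-of-halves quartiles."""
    xs = sorted(data)
    q1 = median(xs[: len(xs) // 2])
    q3 = median(xs[(len(xs) + 1) // 2 :])
    return xs[0], q1, median(xs), q3, xs[-1]

data = [8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54]
print(*five_number_summary(data))  # 8 18.5 22 27.5 54
```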
GIVE THE FIVE NUMBER SUMMARY FOR
LET’S DRAW A BOX PLOT FROM THE FIVE-NUMBER SUMMARY
BOXPLOTS
Box plots show less detail than histograms or stemplots, so they are best used for side-by-side comparison of more than one distribution.
They can be drawn vertically or horizontally.

[Diagram: a box plot labeled, from top to bottom, Max, Q3, Median, Q1, Min]
LET’S READ THE BOX PLOT
Find the median first.
Look at the spread.
Next, look at the quartiles.

Looking at the box plot, do you feel that Bonds and Aaron were about equally consistent?
MODIFIED BOXPLOT
Same as a regular box plot, but outliers are plotted individually.
The “whiskers” extend to the smallest and largest observations that are not outliers.
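The pieces of a modified box plot can be computed before drawing anything. This sketch assumes the 1.5 x IQR rule and median-of-halves quartiles from these notes, uses the 17-value data set from earlier, and `modified_boxplot_parts` is a hypothetical helper name.

```python
from statistics import median

def modified_boxplot_parts(data):
    """Return the box, whisker ends, and individually plotted outliers."""
    xs = sorted(data)
    q1 = median(xs[: len(xs) // 2])
    q3 = median(xs[(len(xs) + 1) // 2 :])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low <= x <= high]   # observations that are not outliers
    return {
        "box": (q1, median(xs), q3),
        "whiskers": (inside[0], inside[-1]),        # smallest and largest non-outliers
        "outliers": [x for x in xs if x < low or x > high],
    }

data = [8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54]
print(modified_boxplot_parts(data))
# {'box': (18.5, 22, 27.5), 'whiskers': (8, 29), 'outliers': [50, 54]}
```

The upper whisker stops at 29, the largest value under the cutoff of 41, and 50 and 54 would be drawn as separate points.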

MODIFIED BOXPLOT

Five-number summary (Min Q1 M Q3 Max)
Bonds 16 25 34 41 73
Aaron 13 28 38 44 47

Outliers
Bonds:
IQR = Q3 – Q1 = 41 – 25 = 16
Q3 + 1.5 x IQR = 41 + (1.5 x 16) = 65 (upper cutoff)
Q1 – 1.5 x IQR = 25 – (1.5 x 16) = 1 (lower cutoff)
Aaron:
None
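The outlier check above needs only the five-number summary, as long as we check the extremes. This sketch applies the 1.5 x IQR rule to Min and Max for both players using the summaries in the table; `check_extremes` is an illustrative helper. If the maximum is flagged, other large values might be outliers too, and deciding that needs the full data.

```python
def check_extremes(name, summary):
    """Apply the 1.5 * IQR rule to Min and Max, given (Min, Q1, M, Q3, Max)."""
    mn, q1, _m, q3, mx = summary
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = [x for x in (mn, mx) if x < low or x > high]
    print(f"{name}: IQR = {iqr}, cutoffs = ({low}, {high}), outliers = {flagged}")
    return flagged

check_extremes("Bonds", (16, 25, 34, 41, 73))  # Bonds: IQR = 16, cutoffs = (1.0, 65.0), outliers = [73]
check_extremes("Aaron", (13, 28, 38, 44, 47))  # Aaron: IQR = 16, cutoffs = (4.0, 68.0), outliers = []
```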
USE THE CALCULATOR TO DRAW A BOXPLOT

Follow along in your book on page 47.
STANDARD DEVIATION
s - one of the most important measures in statistics
It measures spread by looking at how far the observations are from their mean.
VARIANCE

P49
s² - The variance of a set of observations is the average of the squares of the deviations of the observations from their mean, where the sum of the squared deviations is divided by n – 1 rather than n.
FORMULA FOR STANDARD DEVIATION
s = √( Σ(xᵢ – x̄)² / (n – 1) )
The standard deviation s is the square root of the variance s².
DEGREES OF FREEDOM
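Degrees of freedom describe the number of values in the final calculation of a statistic that are free to vary. The deviations xᵢ – x̄ always sum to zero, so once n – 1 of them are known, the last one is determined.
D.O.F. = n – 1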
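Why only n – 1 deviations are free can be seen directly. This minimal sketch uses the 17-value data set from earlier in these notes, whose mean is 425 / 17 = 25.

```python
data = [8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54]
n = len(data)
mean = sum(data) / n                   # 425 / 17 = 25.0
deviations = [x - mean for x in data]

print(sum(deviations))                 # 0.0: deviations from the mean always sum to zero

# Knowing any n - 1 deviations fixes the last one.
last = -sum(deviations[:-1])
print(last, deviations[-1])            # 29.0 29.0
```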

PROPERTIES OF THE STANDARD DEVIATION
s measures spread about the mean and should be used only when the mean is chosen as the measure of center.
s = 0 only when there is no spread, that is, when all observations have the same value.
s, like the mean, is not a resistant measure. Why would this be? Strong skewness or a few outliers can make s very large.
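A minimal sketch of non-resistance, using the 17-value data set from earlier and a hypothetical change of its largest value from 54 to 540. Python’s `statistics.stdev` divides by n – 1, matching the formula for s.

```python
from statistics import mean, stdev

data = [8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54]
changed = data[:-1] + [540]  # hypothetical: largest value made ten times bigger

for label, xs in (("original", data), ("changed", changed)):
    # stdev() uses the n - 1 divisor, like s
    print(f"{label}: mean = {mean(xs):.2f}, s = {stdev(xs):.2f}")
```

One changed value moves the mean from 25 to about 53.6 and s from about 11.4 to about 126, while the median stays at 22.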
LET’S WORK THIS PROBLEM BY HAND.
Follow the table on page 50.
A study in Switzerland examined the number of hysterectomies (uterus removals) performed by doctors in one year. The data for 15 doctors are given below.

27 50 33 25 25 31 37 44 20 36 57 34 28

Observations xᵢ
Deviations xᵢ – x̄
Squared deviations (xᵢ – x̄)²
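The by-hand table can also be built in Python. This sketch fills in the same three columns for the 17-value data set used earlier in these notes; to check your hand work, replace `data` with the doctors’ counts from page 50.

```python
from math import sqrt

data = [8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54]
n = len(data)
x_bar = sum(data) / n

print(f"{'x':>4} {'x - mean':>9} {'(x - mean)^2':>13}")
total = 0.0
for x in data:
    dev = x - x_bar            # deviation from the mean
    total += dev ** 2          # running sum of squared deviations
    print(f"{x:>4} {dev:>9.1f} {dev ** 2:>13.1f}")

variance = total / (n - 1)     # divide by the degrees of freedom, n - 1
s = sqrt(variance)
print(f"sum of squares = {total}, s^2 = {variance}, s = {s:.2f}")
# sum of squares = 2066.0, s^2 = 129.125, s = 11.36
```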
CHOOSING BETWEEN THE FIVE-NUMBER SUMMARY AND THE MEAN AND S.D.
We need some way to describe the center and spread of a distribution.
The five-number summary is better for skewed distributions or distributions with strong outliers.
x̄ and s are better for distributions that are reasonably symmetric and free of outliers.
ASSIGNMENT

Work on problems 1.36-1.43