Chapter Four: Quantitative Data
• Distribution: gives the possible values of a
variable and the frequency of each value.
– Organized into bins
– 3 types of displays:
• histogram, stem & leaf, dotplot
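To make the idea of bins concrete, here is a minimal Python sketch that counts how many values fall in each bin, which is what a histogram draws. The function name `bin_counts`, the sample values, and the bin width of 5 are illustrative assumptions, not part of the slides.

```python
from collections import Counter

def bin_counts(values, bin_width, start):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)   # which bin the value lands in
        left = start + index * bin_width        # left edge of that bin
        counts[left] += 1
    return dict(sorted(counts.items()))

# Hypothetical sample values, chosen only for illustration.
sample = [42, 46, 47, 49, 51, 51, 52, 54, 55, 56, 57, 61, 64, 69]
histogram = bin_counts(sample, bin_width=5, start=40)

# Print a rough text histogram: one star per value in the bin.
for left, count in histogram.items():
    print(f"[{left}, {left + 5}): {'*' * count}")
```

Changing `bin_width` changes how the same data look, so it is worth trying more than one width before describing the shape.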
Leaving on a jet plane???
KEY
5|5 = 5.0
Stem-and-Leaf Example
• Compare the histogram and stem-and-leaf display for
the pulse rates of 24 women at a health clinic. Which
graphical display do you prefer?
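A stem-and-leaf display can be built by splitting each value into a stem (here the tens digit) and a leaf (the ones digit). The sketch below assumes whole-number data and uses made-up pulse rates, since the 24 values from the clinic are not reproduced in this transcript; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} with tens digits as stems and ones digits as leaves."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    # Include empty stems so any gaps in the data remain visible.
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}

pulses = [56, 60, 64, 64, 68, 72, 72, 76, 80, 88]  # hypothetical pulse rates
display = stem_and_leaf(pulses)

for stem, leaves in display.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Unlike a histogram, this display keeps every individual value readable, which is one reason to prefer it for small data sets.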
Back-to-Back Stem-and-Leaf Plots
Dotplots
• A dotplot is a simple
display. It just places a
dot along an axis for
each case in the data.
• The dotplot to the right
shows Kentucky Derby
winning times, plotting
each race as its own
dot.
• You might see a dotplot
displayed horizontally
or vertically.
Center, Shape & Spread
• Shape – how many modes (humps)
– Unimodal – 1 peak
– Bimodal – 2 peaks
– Multimodal – 3 or more peaks
– Uniform – all bins approximately the same
height
• Symmetry versus skew
– Symmetric
– Skewed right
– Skewed left
• Look for unusual features
– Outliers: values that are extreme and don’t
appear to belong with the rest of the data.
They could be unusual values that need attention
or mistakes, so look into why they occur. Outliers
influence statistical analysis.
– Gaps: warn that the data may not be homogeneous.
– If the distribution is skewed, you can re-express the data
with a method such as taking logarithms (skewed right)
or squaring (skewed left).
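As a rough illustration of re-expression, the sketch below takes base-10 logarithms of a right-skewed data set and compares the gap between mean and median before and after. The values are hypothetical, and using the mean-minus-median gap as a skew indicator is a simplifying assumption rather than a method from the slides.

```python
import math
import statistics

# Hypothetical right-skewed values: one very large value pulls the mean up.
raw = [1, 2, 2, 3, 4, 5, 8, 10, 100]
logged = [math.log10(x) for x in raw]

def mean_minus_median(data):
    """Crude skew indicator: positive when the mean sits above the median."""
    return statistics.mean(data) - statistics.median(data)

raw_gap = mean_minus_median(raw)      # large gap on the original scale
log_gap = mean_minus_median(logged)   # much smaller gap after taking logs

print(f"gap before: {raw_gap:.2f}, gap after log10: {log_gap:.2f}")
```

Logarithms only work for positive values, so a data set containing zeros or negatives would need a different re-expression.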
Cookie Data

Keebler       Nabisco
 8    8       15   16
10   10       16   16
11   12       16   17
12   12       17   18
13   13       18   18
13   14       18   18
15   15       18   19
15   16       20   21
16   16       22   23
17   17       24   27
19            33
President      Age    President      Age    President      Age
Washington     57     Buchanan       65     Harding        55
J. Adams       61     Lincoln        52     Coolidge       51
Jefferson      57     A. Johnson     56     Hoover         54
Madison        57     Grant          46     FD Roosevelt   51
Monroe         58     Hayes          54     Truman         60
JQ Adams       57     Garfield       49     Eisenhower     61
Jackson        61     Arthur         51     Kennedy        43
Van Buren      54     Cleveland      47     LB Johnson     55
WH Harrison    68     B. Harrison    55     Nixon          56
Tyler          51     Cleveland      55     Ford           61
Polk           49     McKinley       54     Carter         52
Taylor         64     T. Roosevelt   42     Reagan         69
Fillmore       50     Taft           51     G. Bush/GW     64/54
Pierce         48     Wilson         56     Clinton        46
                                            Obama          47
The following data shows the percentage of water quality tests that failed to meet water
quality standards at 82 swimming beaches in California. The data is divided into those
beaches inside and outside of Los Angeles County.
Mean or Median?
• Because the median considers only the order of values,
it is resistant to values that are extraordinarily large or
small; it simply notes that they are one of the “big ones”
or “small ones” and ignores their distance from center.
• To choose between the mean and median, start by
looking at the data. If the histogram is symmetric and
there are no outliers, use the mean.
• However, if the histogram is skewed or has outliers, you
are better off with the median.
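The resistance of the median can be seen in a small experiment: replace one value with an extreme one and compare how the mean and median respond. The values below are hypothetical and the variable names are illustrative.

```python
import statistics

values = [46, 47, 51, 54, 55]            # hypothetical data
with_outlier = values[:-1] + [550]       # same data, largest value mistyped as 550

mean_before = statistics.mean(values)
median_before = statistics.median(values)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

The mean jumps by almost 100 while the median does not move, because the median only records that 550 is one of the "big ones".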
What About Spread? The
Standard Deviation
• A more powerful measure of spread than
the IQR is the standard deviation, which
takes into account how far each data value
is from the mean.
• A deviation is the distance that a data
value is from the mean.
– Since adding all deviations together would
total zero, we square each deviation and find
an average of sorts for the deviations.
What About Spread? The
Standard Deviation (cont.)
• The variance, notated by $s^2$, is found by
summing the squared deviations and
(almost) averaging them:
$$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$
• The variance will play a role later in our
study, but it is problematic as a measure
of spread—it is measured in squared
units!
What About Spread? The
Standard Deviation (cont.)
• The standard deviation, s, is just the
square root of the variance and is
measured in the same units as the original
data.
$$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$
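The variance and standard deviation can be worked step by step in a few lines of Python. This sketch uses the first five ages from the presidents table (Washington, J. Adams, Jefferson, Madison, Monroe) purely as a small worked example; the variable names are illustrative.

```python
import math
import statistics

ages = [57, 61, 57, 57, 58]  # Washington, J. Adams, Jefferson, Madison, Monroe
n = len(ages)

mean = sum(ages) / n                        # 58.0
deviations = [y - mean for y in ages]       # -1, 3, -1, -1, 0 (these sum to zero)
variance = sum(d ** 2 for d in deviations) / (n - 1)   # 12 / 4 = 3.0, in squared years
std_dev = math.sqrt(variance)               # back in years

print(f"mean = {mean}, variance = {variance}, standard deviation = {std_dev:.3f}")

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(std_dev, statistics.stdev(ages))
```

Dividing by n - 1 rather than n is the "(almost) averaging" step; the deviations themselves always total zero, which is why they are squared first.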
Tell -- Shape, Center, and
Spread
• Next, always report the shape of its
distribution, along with a center and a
spread.
– If the shape is skewed, report the median and
IQR.
– If the shape is symmetric, report the mean
and standard deviation and possibly the
median and IQR as well.
GRAPHING STATIONS
• Find the following for your data set
– Mean & Standard Deviation
– 5 Number Summary
– IQR
– Range
• Produce the missing display (histogram or
stem & leaf)
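The summaries asked for at the graphing stations could be computed along these lines. This is a sketch under one assumption: quartiles are taken as the medians of the lower and upper halves of the sorted data (excluding the middle value when n is odd). Calculators and software may use slightly different quartile rules, and the data set here is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of each half.

    Assumes at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Skip the middle value when the number of values is odd.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

scores = [8, 10, 11, 12, 13, 13, 15, 15, 16, 17, 19]  # hypothetical data set
low, q1, med, q3, high = five_number_summary(scores)
iqr = q3 - q1            # spread of the middle half of the data
data_range = high - low  # spread of all the data

print(f"5-number summary: {low}, {q1}, {med}, {q3}, {high}")
print(f"IQR = {iqr}, range = {data_range}")
print(f"mean = {statistics.mean(scores):.2f}, SD = {statistics.stdev(scores):.2f}")
```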