CHAPTER 1 STATISTICS

CHAPTER 4 Displaying and
Summarizing Quantitative Data



Slice up the entire span of values in
piles called bins (or classes)
Then count the number of values that
fall in each bin
The bins and the counts in each bin
give the distribution of the quantitative
variable
Histogram




Display the counts in each bin in a
histogram.
Like a bar chart, a histogram plots the bin
counts as the heights of bars.
No spaces between bins. (different from a
bar chart)
Relative frequency histogram displays
percentage of cases in each bin instead of
the count.
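
The binning and counting steps above can be sketched in a few lines of Python. This is only an illustration: the ages are made up, and the function names `bin_counts` and `relative_frequencies`, the starting value 20, and the bin width 10 are assumptions chosen for the example, not values from the textbook.

```python
def bin_counts(values, start, width):
    """Count how many values fall in each bin [left, left + width).

    Assumes every value is >= start; bins are numbered from 0.
    """
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts


def relative_frequencies(counts):
    """Convert bin counts to percentages of all cases."""
    total = sum(counts)
    return [100 * c / total for c in counts]


# Made-up ages, for illustration only.
ages = [27, 31, 39, 42, 48, 55, 55, 61, 68, 70, 82]
counts = bin_counts(ages, start=20, width=10)
percents = relative_frequencies(counts)

# A text histogram: one row per bin, bar length equal to the count.
for i, (c, p) in enumerate(zip(counts, percents)):
    left = 20 + 10 * i
    print(f"{left}-{left + 10}: {'#' * c} ({p:.1f}%)")
```

The counts and the percentages give the same picture of the shape; only the scale of the bar heights differs.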
Stem and Leaf Display


Shows the distribution as well as the
individual values.
Very Convenient: easy to make by
hand.
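
A stem-and-leaf display can also be built with a short program. The sketch below assumes non-negative whole numbers, uses the tens as the stem and the ones digit as the leaf, and works on made-up values; the function name `stem_and_leaf` is hypothetical.

```python
def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display.

    Assumes non-negative integers; stem = tens, leaf = ones digit.
    """
    leaves = {}
    for v in sorted(values):
        leaves.setdefault(v // 10, []).append(v % 10)

    rows = []
    # Walk every stem from smallest to largest so empty stems (gaps) stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {digits}".rstrip())
    return rows


# Made-up values, for illustration only.
for row in stem_and_leaf([27, 31, 39, 55, 55, 61]):
    print(row)
```

Every individual value can be read back from the display, which is what distinguishes it from a histogram.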

Make a Stem and Leaf Display of the data
set of exercise 40 (page 82)
Shape, Center, and Spread

How many Modes (“humps”)?






Histograms with
One peak: Unimodal
Two peaks: Bimodal
Three or more peaks: Multimodal
A histogram that doesn’t appear to have any
mode and in which all the bars are approximately
the same height is called Uniform
Exercise 7 Page 78
Symmetry

A distribution is symmetric if the two halves
on either side of the center look
approximately like mirror images of each
other.
Skewed Distributions


Tails: The thinner ends of a distribution are called
tails. If one tail stretches out farther than the
other the histogram is said to be skewed to the
side of the longer tail
Skew to the left
Skew to the right
Outliers


Outliers are values that stand apart
from the body of the distribution
Gaps in the distribution warn us that
the data may not be homogeneous.
They may come from different sources
or contain more than one group.

(Example on page 52)
Center of the Distribution

For unimodal and symmetric
distributions:


The center is in the middle
For skewed distributions, or distributions with
more than one mode, the center is harder to find

(split in groups)
How Spread Out Is the
Distribution?

Just Checking page 56

Comparing Distributions

Do men and women tend to get heart
attacks at different ages?
Summarizing Distributions

Center


Midrange = (Max + Min) / 2
Median: The middle value that divides the
histogram into two equal areas



Order the values first
If n is odd the median is the middle value. Position
(n+1)/2
If n is even then take the average of the two middle
values, that is the average of positions n/2 and n/2+1
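
The midrange and the median rule above can be written out as follows. This is a minimal sketch on made-up numbers; the function names are chosen for the example. Python lists are indexed from 0, so the slide's 1-based positions are shifted down by one.

```python
def midrange(values):
    """Midrange = (Max + Min) / 2."""
    return (max(values) + min(values)) / 2


def median(values):
    """Middle value of the ordered data."""
    ordered = sorted(values)          # order the values first
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the value at position (n + 1) / 2, counting from 1
        return ordered[(n + 1) // 2 - 1]
    # n even: average of the values at positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([5, 1, 3]))       # odd n
print(median([4, 1, 3, 2]))    # even n
print(midrange([4, 1, 3, 2]))
```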
Summarizing Distributions
(cont.)

Spread

Range = Max – Min

Quartiles


Find the median, then find the median of each
half. (Note: If n is odd, include the median of
the complete set in each half when calculating
the median of that half.)
These are called the Lower quartile and Upper
quartile and are denoted by Q1 and Q3
respectively.
The Interquartile Range


IQR = Q3 – Q1
The lower and upper quartiles are also
called the 25th and 75th percentiles



Q1 = 25th percentile
Median = 50th percentile
Q3 = 75th Percentile
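
The quartile rule from the slides (median of each half, with the overall median included in both halves when n is odd) can be sketched as below. The data are made up and the function names are assumptions for the example. Other textbooks and software may use slightly different quartile conventions, so results can differ a little from this sketch.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    # For odd n, (n + 1) // 2 makes both halves include the overall median.
    half = (n + 1) // 2
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median(lower), median(upper)


q1, q3 = quartiles([1, 2, 3, 4, 5, 6, 7])   # made-up data, n is odd
iqr = q3 - q1                               # IQR = Q3 - Q1
print(q1, q3, iqr)
```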
Summarizing Distributions
(cont.)

Summarizing Symmetric Distributions



If the shape of the distribution is symmetric,
the mean (average) is a good alternative for
summarizing the center of the distribution
Remember: Symmetric and no outliers
Mean: $\bar{y} = \frac{\sum_i y_i}{n}$
Mean or Median



The mean is the point at which the
histogram would balance.
Outliers will pull the mean in that
direction.
For skewed data it’s better to report the
median than the mean as a measure of
center
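
A small made-up example shows how a single outlier pulls the mean but barely moves the median. It uses the `mean` and `median` functions from Python's standard `statistics` module; the numbers are invented for illustration.

```python
from statistics import mean, median

# Made-up values; the second list adds one extreme value on the high side.
typical = [30, 32, 35, 36, 38]
with_outlier = typical + [200]

print("mean:  ", mean(typical), "->", mean(with_outlier))
print("median:", median(typical), "->", median(with_outlier))
```

The mean shifts toward the outlier by a large amount, while the median moves only from one middle value to the average of two neighboring ones.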
What About Spread?
The Standard Deviation

Standard Deviation:



It takes into account how far each value is from the
mean
Appropriate only for symmetric data
Deviation: Distance from each data value to the
mean: $y_i - \bar{y}$

Variance: $s^2 = \frac{\sum_i (y_i - \bar{y})^2}{n - 1}$
Standard Deviation: $s = \sqrt{\frac{\sum_i (y_i - \bar{y})^2}{n - 1}}$
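
The variance and standard deviation, with the n − 1 divisor used on this slide, can be computed step by step as in the sketch below. The data are made up and the function names are chosen for the example.

```python
import math


def variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values with mean 5
print(variance(data), standard_deviation(data))
```

Squaring keeps positive and negative deviations from canceling, and taking the square root returns the result to the original units of the data.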
Shape, Center and Spread

Always report center and spread




Which measure for center and which measure for
spread?
Skewed : Median and IQR
Symmetric: Mean and Standard Deviation
If there are outliers, report the mean and
standard deviation with and without the
outliers. Median and IQR are not likely to be
affected.
Chapter 5 Understanding and
Comparing Distributions
Five Number Summary
Max     82
Q3      68
Median  55
Q1      39
Min     27

After you have the five number summary
you can create a display called a box plot
Box Plots


Place the median and quartiles over a line
spanning the range of the data. (as shown
on the board)
Locate the Upper and lower fences




Upper Fence = Q3 + 1.5 IQR
Lower Fence = Q1 – 1.5 IQR
Then draw the whiskers (out to the most extreme
data values found within the fences)
Display Outliers
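
The fence and whisker steps can be checked numerically. The sketch below uses Q1 = 39 and Q3 = 68 from the five number summary in these slides; the list of values is only a stand-in (the five summary numbers plus one invented extreme value, 120), since the full data set is not shown here. The function names are assumptions for the example.

```python
def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def whiskers_and_outliers(values, q1, q3):
    """Whisker ends are the most extreme values inside the fences;
    anything outside the fences is displayed as an outlier."""
    lower, upper = fences(q1, q3)
    inside = [v for v in values if lower <= v <= upper]
    outliers = sorted(v for v in values if v < lower or v > upper)
    return min(inside), max(inside), outliers


q1, q3 = 39, 68                        # from the five number summary
print(fences(q1, q3))                  # IQR = 29, so 1.5 IQR = 43.5

# Stand-in values: the summary numbers plus one invented value (120).
values = [27, 39, 55, 68, 82, 120]
print(whiskers_and_outliers(values, q1, q3))
```

With these quartiles, the minimum 27 and maximum 82 both lie inside the fences, so the whiskers would reach them and neither would be flagged as an outlier.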
Exercise

Comparing Groups (Page 93)
Time Plot

Displays data that
changes over time

(What is wrong
with the time plot
on page 104?)