Daniel S. Yates
The Practice of Statistics
Third Edition
Chapter 1:
Exploring Data
1.2 Describing Distributions with
Numbers
Copyright © 2008 by W. H. Freeman & Company
Objectives for 1.2
• Given a data set, how do you compute the mean,
median, quartiles, and the five-number
summary?
• How do you construct a boxplot using the five-number summary?
• How do you compute the interquartile range?
• How do you identify an outlier using the interquartile range rule?
• How do you compute the standard deviation and
variance?
Measures of the Center of a
Distribution
The Mean of a Data Set
• So far, we know several measures of
central tendency of a set of numbers:
mean, median, and mode.
• The mean is the arithmetic average of
the data set.
The Mean of a Data Set
“Average Value”
• Σ (sigma) means add up all the data values
to get a total.
• Divide the total by the number of data values.
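The two steps above can be written as a single formula. This is the standard notation, with x_i standing for the i-th data value and n for the number of values; the symbols are assumed here rather than taken from the slide.

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
```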
Example - Mean
• Joey’s first 14 quiz grades in a marking
period were
86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90,
98, 93
• Find the mean.
• Answer: 85.
• Calculator: STAT, EDIT, enter the data in L1;
then 2nd STAT, MATH, mean(L1), ENTER
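As a check on the calculator result, the same arithmetic can be sketched in a few lines of Python. The variable names are illustrative only.

```python
# Joey's 14 quiz grades, copied from the example above.
grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

total = sum(grades)   # the sigma step: add up all the data values
n = len(grades)       # the number of data values
mean = total / n      # divide the total by the number of values

print(total, n, mean)  # 1190 14 85.0
```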
The Median of the Data Set
• The median is the center of the data set.
• Half of the data set is above the median and half is
below it. It is the 50th percentile.
• The median may or may not be in the data
set.
Calculation for Median
“Middle Value”
Example - Median
• Joey’s first 14 quiz grades in a marking
period were
86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90,
98, 93
• Find the median.
• Answer: 85.
• Calculator: STAT, EDIT, enter the data in L1;
then 2nd STAT, MATH, median(L1), ENTER
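The "middle value" rule can be sketched by hand as follows, assuming the usual convention: sort the data, take the middle value when the count is odd, and average the two middle values when the count is even. The function name middle_value is hypothetical.

```python
def middle_value(values):
    """Median by the sort-and-pick-the-middle rule."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values

grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]
print(sorted(grades))        # the 7th and 8th values are 84 and 86
print(middle_value(grades))  # 85.0
```

With 14 grades the median falls between 84 and 86, so 85 is a case where the median is not itself in the data set.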
Terminology
“A measure is resistant”
• A measure that does not respond strongly
to the influence of outliers (extreme
observations).
• Furthermore, a measure that is resistant
does not respond strongly to changes in a
few observations.
Are mean and median resistant?
Mean and Median Applet
Mean vs Median
• Mean is not a resistant measure.
– It is sensitive to the influence of a few extreme
observations (outliers).
– It is sensitive to skewed distributions. The mean is
pulled towards the tail.
• Median is resistant.
– It is resistant to extreme values and skewed
distributions.
• For skewed distributions, the median is the better
measure of center.
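A small experiment makes the difference concrete. The sketch below assumes a hypothetical data-entry error, with the top grade 98 typed as 980; that error is invented for illustration and is not part of the example data.

```python
from statistics import mean, median

grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

# Hypothetical typo: the top score 98 is entered as 980.
with_typo = [980 if x == 98 else x for x in grades]

mean_before, mean_after = mean(grades), mean(with_typo)
median_before, median_after = median(grades), median(with_typo)

# The mean jumps from 85 to 148; the median stays at 85.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

One extreme value moves the mean by 63 points, while the median does not move at all.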
Measures of Spread
Range
Quartiles
Five Number Summary
The Standard Deviation
Range
• The difference between the largest value
and the smallest value.
• Gives the full spread of the data.
• But it depends only on the two most extreme values, so it is strongly affected by outliers.
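For Joey's quiz grades, used throughout these slides, the range can be sketched as:

```python
grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

largest = max(grades)
smallest = min(grades)
data_range = largest - smallest   # largest value minus smallest value

print(largest, smallest, data_range)  # 98 74 24
```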
Quartiles
• We can describe the spread (variability) of a distribution by giving
several percentiles. The pth percentile is the value with p percent of the observations at or below it.
• Typically we use the 25th, 50th, and 75th percentiles.
• These are Q1 (the first quartile), the median, and Q3 (the third quartile).
Example
• Joey’s first 14 quiz grades in a marking
period were
86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90,
98, 93
• Find Q1, median, and Q3.
• Answer: Q1 = 78, Median = 85, Q3 = 91
• Using the calculator
– STAT, CALC, 1-Var Stats L1, ENTER
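One way to reproduce these values by hand is the "median of each half" convention: Q1 is the median of the observations below the median's position, and Q3 is the median of those above it. The sketch below assumes that convention. Other software may use a different quartile rule and report slightly different values.

```python
from statistics import median

grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]
ordered = sorted(grades)
n = len(ordered)
half = n // 2

lower_half = ordered[:half]          # observations below the median's position
upper_half = ordered[half + n % 2:]  # observations above it (middle value skipped when n is odd)

q1 = median(lower_half)
med = median(ordered)
q3 = median(upper_half)

print(lower_half)   # [74, 75, 76, 78, 80, 82, 84]
print(upper_half)   # [86, 87, 90, 91, 93, 96, 98]
print(q1, med, q3)  # 78 85.0 91
```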
Five Number Summary
• The five-number summary is the minimum, Q1, the median, Q3, and the maximum, written in order.
• Using the calculator, we again use 1-Var Stats.
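Outside the calculator, the five-number summary can be sketched as a small function. The name five_number_summary is hypothetical, and the quartiles are assumed to follow the "median of each half" convention.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])           # median of the lower half
    q3 = median(ordered[half + n % 2:])   # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]
summary = five_number_summary(grades)
print(summary)  # (74, 78, 85.0, 91, 98)
```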
Five Number Summary Computer
Software Output
Graphical Display of 5 Number
Summary
Example - Boxplot
• Joey’s first 14 quiz grades in a marking
period were
86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90,
98, 93
• Answer: a boxplot with minimum = 74, Q1 = 78, median = 85, Q3 = 91, maximum = 98
• Calculator
STAT PLOT, make appropriate selections on the
menu, ZOOM, 9:Zoom Stat
Interquartile Range
Identifying Outliers
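As a sketch, assume the interquartile range rule meant here is the usual 1.5 × IQR rule: IQR = Q3 − Q1, and an observation is a suspected outlier if it lies more than 1.5 × IQR below Q1 or above Q3. The quartiles are taken from Joey's quiz grades in the earlier example; the variable names are illustrative.

```python
grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

q1, q3 = 78, 91                  # quartiles from the quiz-grade example
iqr = q3 - q1                    # interquartile range: 13

lower_fence = q1 - 1.5 * iqr     # 58.5
upper_fence = q3 + 1.5 * iqr     # 110.5

# Flag any grade that falls outside the fences.
outliers = [x for x in grades if x < lower_fence or x > upper_fence]

print(iqr, lower_fence, upper_fence, outliers)  # 13 58.5 110.5 []
```

Every grade lies between 58.5 and 110.5, so under this rule the data set has no outliers.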
Variance and Standard
Deviation
Example – Variance and Standard Deviation
Joey’s first 14 quiz grades in a marking period were
86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
Calculate the variance and standard deviation.

x_i      x_i - x̄             (x_i - x̄)^2
86       86 - 85 = 1            1
84       84 - 85 = -1           1
91       91 - 85 = 6           36
75       75 - 85 = -10        100
78       78 - 85 = -7          49
80       80 - 85 = -5          25
74       74 - 85 = -11        121
87       87 - 85 = 2            4
76       76 - 85 = -9          81
96       96 - 85 = 11         121
82       82 - 85 = -3           9
90       90 - 85 = 5           25
98       98 - 85 = 13         169
93       93 - 85 = 8           64
Total    1190                 806

Mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{1190}{14} = 85$
Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{806}{13} = 62$
Standard Deviation
$s = \sqrt{62} \approx 7.874$
Calculator – STAT EDIT, enter data in list 1, QUIT
STAT CALC 1-Var Stats
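The hand calculation can be retraced step by step in Python. This is a sketch with illustrative variable names; the last lines cross-check against the standard library's sample variance and sample standard deviation, which also divide by n − 1.

```python
import math
import statistics

grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

n = len(grades)
mean = sum(grades) / n                         # 1190 / 14 = 85
deviations = [x - mean for x in grades]        # x_i minus the mean
squared = [d ** 2 for d in deviations]         # squared deviations
sum_of_squares = sum(squared)                  # 806

variance = sum_of_squares / (n - 1)            # 806 / 13 = 62
std_dev = math.sqrt(variance)                  # about 7.874

print(sum_of_squares, variance, round(std_dev, 3))

# Cross-check with the library's sample statistics.
print(statistics.variance(grades), round(statistics.stdev(grades), 3))
```

The deviations always sum to zero, which is why they are squared before being added.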
Standard Deviation
• The standard deviation is zero when there
is no spread.
• The standard deviation gets larger as the
spread increases.
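Both properties can be seen on three small data sets. The sets below are hypothetical, chosen only so that all three share the same mean of 85.

```python
from statistics import stdev

# Hypothetical data sets with the same mean (85) and increasing spread.
no_spread = [85, 85, 85]
small_spread = [84, 85, 86]
large_spread = [80, 85, 90]

s_none = stdev(no_spread)
s_small = stdev(small_spread)
s_large = stdev(large_spread)

print(s_none, s_small, s_large)  # 0.0 1.0 5.0
```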
Impact of adding a constant to all
data in the set?
• Joey’s first 14 quiz grades in a marking period
were
– 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Add 32 points to each score, then store in L2.
• Compute 1-Var Stats. What has changed?
• Each value in the five-number summary (and the mean) has increased by 32, but the
standard deviation has not changed.
• The measures of spread remain the same.
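The same experiment can be sketched off the calculator. The list name shifted plays the role of L2 and is illustrative only.

```python
import math
from statistics import mean, median, stdev

grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]
shifted = [x + 32 for x in grades]    # add 32 points to each score

# Measures of center and position all move up by 32.
center_shift = mean(shifted) - mean(grades)
median_shift = median(shifted) - median(grades)
min_shift = min(shifted) - min(grades)

# Measures of spread do not change.
range_before = max(grades) - min(grades)
range_after = max(shifted) - min(shifted)
same_sd = math.isclose(stdev(shifted), stdev(grades))

print(center_shift, median_shift, min_shift)  # each is 32
print(range_before, range_after, same_sd)     # 24 24 True
```

Adding a constant slides every value the same distance, so the deviations from the mean are unchanged.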
The impact of multiplying each data value
in the set by a constant?
• Using the data set in L1, multiply each value by 2.
• Compute 1-Var Stats.
• What has changed?
• Each value in the five-number summary is 2 times as large, and the
standard deviation is 2 times as large.
• The measure of spread has increased.
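A matching sketch for multiplication, using Joey's quiz grades from the earlier examples. The list name doubled is illustrative only.

```python
import math
from statistics import mean, median, stdev

grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]
doubled = [2 * x for x in grades]     # multiply each value by 2

mean_ratio = mean(doubled) / mean(grades)
median_ratio = median(doubled) / median(grades)
range_ratio = (max(doubled) - min(doubled)) / (max(grades) - min(grades))
sd_ratio = stdev(doubled) / stdev(grades)

# Center and spread are both multiplied by 2.
print(mean_ratio, median_ratio, range_ratio, round(sd_ratio, 6))

# The variance is in squared units, so it is multiplied by 2 squared.
variance_ratio = stdev(doubled) ** 2 / stdev(grades) ** 2
print(round(variance_ratio, 6))
```

Unlike adding a constant, multiplying stretches the distances between values, so measures of spread change along with measures of center.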