Measures of Dispersion
How far the data is spread out
Range
• Difference between the largest and smallest values in a data set.
3, 5, 8, 1, -7, 2, 5
• Smallest value = -7
• Largest value = 8
• Range = 8 - (-7) = 15
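As a quick check, the range takes only a couple of lines of Python. This is a minimal sketch. The function name `data_range` is our own, chosen so it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Return the range: largest value minus smallest value."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)


data = [3, 5, 8, 1, -7, 2, 5]
print(data_range(data))  # 8 - (-7) = 15
```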
Variance
• Includes all data values
• Measures the average squared distance of the values from the mean
• Population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
• Sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
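The two formulas differ only in which mean is used (μ or x̄) and in the divisor (N or n − 1). Below is a minimal Python sketch of each, assuming the data are held in a plain list. The function names are hypothetical, and the results are cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import statistics


def population_variance(values):
    """Average squared distance from the population mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [4, 5, 6, 6, 9]
print(population_variance(data), statistics.pvariance(data))  # both 2.8
print(sample_variance(data), statistics.variance(data))       # both 3.5
```

Because it divides by n − 1 instead of N, the sample formula gives a slightly larger value than the population formula for the same numbers.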
Calculating the Variance
• Example: Consider the following data set
4, 5, 6, 6, 9
The sample mean = (4 + 5 + 6 + 6 + 9)/5 = 30/5 = 6

x    x̄    (x − x̄)    (x − x̄)²
4    6    -2         4
5    6    -1         1
6    6    0          0
6    6    0          0
9    6    3          9
Calculating the Variance
(x − x̄)²: 4, 1, 0, 0, 9
Here's the 4th column from the previous slide.
• The sum of the squared residuals (deviations from the mean) = 4 + 1 + 0 + 0 + 9 = 14
• Divide the sum of the squared residuals by the sample size minus one (n − 1 = 5 − 1 = 4).
• s² = 14/4 = 3.5
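The worked example can be reproduced step by step. This sketch follows the slide's procedure for the example data set: compute each deviation, square it, add up the squares, and divide by n − 1.

```python
data = [4, 5, 6, 6, 9]
n = len(data)
x_bar = sum(data) / n  # sample mean: 30 / 5 = 6.0

print("x  x_bar  (x - x_bar)  (x - x_bar)^2")
sum_sq = 0.0
for x in data:
    dev = x - x_bar   # residual (deviation from the mean)
    sq = dev ** 2     # squared residual
    sum_sq += sq
    print(f"{x}  {x_bar:g}  {dev:g}  {sq:g}")

s2 = sum_sq / (n - 1)  # divide by n - 1 for a sample
print(f"sum of squared residuals = {sum_sq:g}")  # 14
print(f"sample variance = {s2:g}")               # 3.5
```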
Standard Deviation
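• An important value in the study of statistics
• Measures the typical distance of the values from the mean; it is the square root of the variance, so it has the same units as the data
• Population: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
• Sample: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$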
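For the variance example, taking the square root of the sample variance gives s = √3.5 ≈ 1.87. Here is a minimal sketch; the helper name is hypothetical. It is compared against `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics


def sample_std_dev(values):
    """Square root of the sample variance (n - 1 in the denominator)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


data = [4, 5, 6, 6, 9]
s = sample_std_dev(data)
print(round(s, 4))                               # sqrt(3.5) ≈ 1.8708
print(math.isclose(s, statistics.stdev(data)))   # True
```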
Quantiles
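• Dividing ordered data into groups that contain equal numbers of values
▫ Percentiles (100 groups)
▫ Quartiles (4 groups)
  - First quartile – Q1 = 25th percentile
  - Second quartile – Q2 = 50th percentile (the median)
  - Third quartile – Q3 = 75th percentile
• Interquartile range (IQR)
▫ Q3 – Q1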
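Textbooks and software use several slightly different rules for locating quartiles, so the results can differ on small data sets. This sketch assumes the "median of each half" rule, where the overall median is left out of both halves when n is odd. That may or may not be the rule used in this course, and the function names are hypothetical.

```python
def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Q1, Q2, Q3 using the median-of-halves rule.

    The median itself is left out of both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return median(lower), median(data), median(upper)


q1, q2, q3 = quartiles([4, 5, 6, 6, 9])
print(q1, q2, q3)  # 4.5 6 7.5
print(q3 - q1)     # IQR = 3.0
```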
Percentiles
• Commonly used measure of relative position.
• Remember the median? It is the 50th percentile.
• For any data set, the pth percentile is a value x such that p percent of the data are less than x and (100 − p) percent of the data are greater than x.
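The percentile definition above can be turned into a calculation in more than one way. The sketch below assumes the nearest-rank convention, which is one common choice among several. Under that convention, the pth percentile is the smallest data value with at least p percent of the data at or below it. The function name is hypothetical.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method.

    The result is the smallest data value such that at least p percent
    of the data are less than or equal to it.
    """
    if not values:
        raise ValueError("empty data set")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    data = sorted(values)
    rank = math.ceil(p * len(data) / 100)  # 1-based position in sorted data
    return data[rank - 1]


data = [4, 5, 6, 6, 9]
print(percentile_nearest_rank(data, 50))  # 6 (the median)
print(percentile_nearest_rank(data, 90))  # 9
```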
Boxplots
(Box and whisker plots)
• Visual display of data
• 5-number summary
▫ Minimum
▫ Q1
▫ Q2 (median)
▫ Q3
▫ Maximum
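• IQR: the length of the box, Q3 − Q1
• Outliers: values far outside the box, commonly those more than 1.5 × IQR below Q1 or above Q3, plotted as individual points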
[Figure: example boxplot drawn on an axis from 0 to 14]
Boxplots
[Figure: example boxplots]
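The 5-number summary and an outlier check can be computed together. This sketch makes two assumptions: quartiles follow the median-of-halves rule, and outliers are flagged with the common 1.5 × IQR fences. Both are conventions, and plotting software may use different ones. The sketch reuses the data from the Range slide, where -7 falls below the lower fence. The function names are hypothetical.

```python
def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """(minimum, Q1, Q2, Q3, maximum), quartiles by the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2 :])
    return data[0], q1, median(data), q3, data[-1]


def outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


data = [3, 5, 8, 1, -7, 2, 5]
print(five_number_summary(data))  # (-7, 1, 3, 5, 8)
print(outliers(data))             # [-7]
```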
z-scores
• The z-score tells us how far a data value is from the mean in terms of the number of standard deviations: $z = \frac{x - \bar{x}}{s}$ for a sample, or $z = \frac{x - \mu}{\sigma}$ for a population
• This is another fundamental value in statistics that we will use again later.
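Applying the z-score formula to each value in the variance example shows how far each value lies from the mean, in standard-deviation units. Here is a minimal sketch using the sample mean and sample standard deviation; the function name is hypothetical.

```python
import statistics


def z_scores(values):
    """z = (x - mean) / s for each value, using the sample standard deviation."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]


data = [4, 5, 6, 6, 9]
for x, z in zip(data, z_scores(data)):
    print(x, round(z, 2))  # 9 is about 1.6 standard deviations above the mean
```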
Summary
• Mean, median, and mode
▫ What information does each convey?
▫ Which is the most resistant to outliers?
Summary
• Range
• Variance/standard deviation
• Z-scores (more later)
• Quantiles
▫ IQR
▫ Boxplots
Which to use?
• Data is symmetric and unimodal
▫ Use the mean and s.d.
• Data is skewed
▫ Use the median and 5-number summary
• Mode?
Practice Exercise
Consider the following data:
4, -8, 4, -12, 8, 4, -14
What is the range?
Practice Exercise
Consider the following data:
4, -8, 4, -12, 8, 4, -14
What is the variance and standard
deviation?
Practice Exercise
Consider the following data:
4, -8, 4, -12, 8, 4, -14
What are the values of Q1, Q2, and Q3?
What is the value of the IQR?