MEASURES OF SPREAD – VARIABILITY – DIVERSITY – VARIATION – DISPERSION
Variability in Nature, life, and various processes we
investigate is fundamental to the theory of Statistics.
[Figure: histograms of two data sets, A and B.]
Measures of variability
Range, Inter-Quartile Range (IQR), Standard Deviation
RANGE is the difference between the largest and the smallest observations.
Range = maximum value – minimum value
The more variability or spread there is in the data, the larger the difference between
the minimum and the maximum, and so the larger the range.
Example. For the two data sets summarized in the histograms:
A: min = 6.96, max = 13, range = 13 - 6.96 = 6.04
B: min = -22.74, max = 40.93, range = 40.93 - (-22.74) = 63.67
INTER-QUARTILE RANGE (IQR)
Percentiles: divide the data into 100ths.
E.g. a GRE score at the 85th percentile means that 85 percent of the students taking the
test scored lower than this and 15 percent scored higher.
QUARTILES: special percentiles, the 25th, 50th and 75th percentiles.
50th percentile = median.
25th percentile = the first quartile, Q1 = median of the lower half of the data.
75th percentile = the third quartile, Q3 = median of the upper half of the data.
Example. The data are the number of km to school for a sample of 18 kids.
2 5 3 4 7 7 8 5 4 3 7 8 9 11 2 3 3 1
Sorted data: 1 2 2 3 3 3 3 4 4 5 5 7 7 7 8 8 9 11
Median = (4+5)/2 = 4.5
Data below the median: 1 2 2 3 3 3 3 4 4; median of this data: 3 = Q1.
Data above the median: 5 5 7 7 7 8 8 9 11; median of this data: 7 = Q3.
Inter-Quartile Range, IQR = Q3 - Q1 = 7 - 3 = 4.
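The quartile calculation above can be sketched in Python. This is only a minimal sketch of the median-of-halves rule used in the example; the function name `quartiles` is illustrative, and for an odd number of observations it assumes the median itself is left out of both halves, a case the example does not cover. Statistical software often uses other quartile rules and may give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]               # observations below the median position
    upper = xs[len(xs) - half:]     # observations above the median position
    # Assumption: for odd n the middle observation is in neither half.
    return median(lower), median(xs), median(upper)

# Distance to school (km) for the sample of 18 kids
km = [2, 5, 3, 4, 7, 7, 8, 5, 4, 3, 7, 8, 9, 11, 2, 3, 3, 1]

q1, med, q3 = quartiles(km)
iqr = q3 - q1
data_range = max(km) - min(km)
print(q1, med, q3, iqr, data_range)  # 3 4.5 7 4 10
```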
FIVE POINT SUMMARY: (MIN, Q1, MEDIAN, Q3, MAX)
Example. Five point summary of the distance to school data.
Min.   Q1   Median   Q3   Max.
1      3    4.5      7    11
BOXPLOTS – Graphical display of the five point summary
Box: between Q1 and Q3.
Line inside the box: median.
Whiskers: drawn to max/min if no outliers are present; if high/low outliers are present,
drawn to 1.5 x IQR above/below Q3/Q1.
Outlier(s) indicated as an open circle or a line above/below the whiskers.
Example. Boxplot of the distance to school data.
[Figure: boxplot of the distance to school data.]
Boxplot with outliers
Example. Boxplot of the distance to school data with an outlier (the observation 1 replaced by 30).
2 5 3 4 7 7 8 5 4 3 7 8 9 11 2 3 3 30
[Figure: boxplot of the distance to school data with an outlier.]
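The five point summary and the 1.5 x IQR rule can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names are made up for this example, the quartiles use the median-of-halves rule from the earlier example, and the data set with the outlier is assumed to be the original data with the observation 1 replaced by 30.

```python
from statistics import median

def five_point_summary(data):
    """Return (min, Q1, median, Q3, max) with median-of-halves quartiles."""
    xs = sorted(data)
    half = len(xs) // 2
    lower, upper = xs[:half], xs[len(xs) - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outliers(data, k=1.5):
    """Observations more than k*IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_point_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]

km = [2, 5, 3, 4, 7, 7, 8, 5, 4, 3, 7, 8, 9, 11, 2, 3, 3, 1]
# Assumed outlier data set: the observation 1 replaced by 30
km_outlier = [2, 5, 3, 4, 7, 7, 8, 5, 4, 3, 7, 8, 9, 11, 2, 3, 3, 30]

print(five_point_summary(km))   # (1, 3, 4.5, 7, 11)
print(outliers(km))             # []
print(outliers(km_outlier))     # [30]
```

For the original data the fences are 3 - 6 = -3 and 7 + 6 = 13, so the maximum 11 is not flagged. With the outlier, Q3 = 8 and IQR = 5, so the upper fence is 15.5 and only 30 lies beyond it.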
STANDARD DEVIATION
STANDARD DEVIATION – measures average deviation of the data from the mean.
\( x_1 - \bar{x} \) = deviation of the first observation from the sample mean
\( x_i - \bar{x} \) = deviation of the ith observation from the sample mean
Could try \( \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| \) as the average deviation from the mean.
Problem: Difficult mathematically to deal with because of the absolute value.
Solution. Use squared deviations.
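A short numerical check may help here. The sketch below uses the distance to school data from the earlier example; the variable names are illustrative. It shows that the raw deviations from the mean always cancel out to zero (up to rounding), which is why they have to be made nonnegative, either with absolute values or with squares, before averaging.

```python
# Distance to school (km) for the sample of 18 kids
km = [2, 5, 3, 4, 7, 7, 8, 5, 4, 3, 7, 8, 9, 11, 2, 3, 3, 1]

n = len(km)
mean = sum(km) / n
deviations = [x - mean for x in km]

sum_dev = sum(deviations)                           # cancels to ~0
mean_abs_dev = sum(abs(d) for d in deviations) / n  # average absolute deviation
sum_sq_dev = sum(d * d for d in deviations)         # sum of squared deviations

print(round(sum_dev, 10), round(mean_abs_dev, 3), round(sum_sq_dev, 2))
```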
SAMPLE VARIANCE
\[
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
    = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}{n-1}
    = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n-1)}
\]
SAMPLE VARIANCE AND STANDARD DEVIATION
Notes on sample variance
Sample variance is always nonnegative, \( s^2 \ge 0 \).
Sample variance = 0 only if all deviations from the mean are zero, that is, all
observations are the same.
SAMPLE STANDARD DEVIATION s = POSITIVE SQUARE ROOT OF SAMPLE VARIANCE
\[ s = \sqrt{\text{variance}} = \sqrt{s^2} \]
Example. Standard deviations of the distance to school data with and without an outlier.
Original data set: variance s^2 = 7.87, standard deviation s = 2.81.
Data set with an outlier: variance s^2 = 40.57, standard deviation s = 6.37.
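The figures in this example can be reproduced with a short Python sketch. It implements both the definition of the sample variance and the computational shortcut with sums of x and x squared; the function names are illustrative, and the data set with the outlier is assumed to be the original data with the observation 1 replaced by 30, which reproduces the variance quoted above.

```python
from math import sqrt

def sample_variance(xs):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Shortcut: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (n * total_sq - total ** 2) / (n * (n - 1))

km = [2, 5, 3, 4, 7, 7, 8, 5, 4, 3, 7, 8, 9, 11, 2, 3, 3, 1]
# Assumed outlier data set: the observation 1 replaced by 30
km_outlier = [2, 5, 3, 4, 7, 7, 8, 5, 4, 3, 7, 8, 9, 11, 2, 3, 3, 30]

for data in (km, km_outlier):
    var = sample_variance(data)
    print(round(var, 2), round(sqrt(var), 2))
# 7.87 2.81
# 40.57 6.37
```

A single extreme observation raises the variance roughly fivefold here, which illustrates how sensitive the standard deviation is to outliers.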
UNITS: All measures of center are in the same units as the observations.
Variance is in the squared units of the observations.
Standard deviation is in the same units as the observations.
MEASURES OF CENTER AND SYMMETRY OF THE DATA
[Figure: three histograms.]
Skewed to the right: Median ?, Mean ?
Symmetric histogram: Median 9.9992, Mean 9.9924
Skewed to the left: Median 977.107, Mean 949.489
MEASURES OF CENTER AND SYMMETRY OF THE DATA - SUMMARY
HISTOGRAM SKEWED to the RIGHT: MEAN > MEDIAN
HISTOGRAM SYMMETRIC: MEAN = MEDIAN
HISTOGRAM SKEWED to the LEFT: MEAN < MEDIAN
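The summary above can be turned into a rough check in Python. This is a sketch of a rule of thumb, not a formal test of skewness: the function name `skew_direction` and the tolerance parameter `tol` are assumptions introduced for this example, since in real data the mean and median are almost never exactly equal.

```python
from statistics import fmean, median

def skew_direction(mean_value, median_value, tol=0.0):
    """Compare mean and median; tol is a hypothetical 'close enough' margin."""
    diff = mean_value - median_value
    if abs(diff) <= tol:
        return "symmetric"
    return "right" if diff > 0 else "left"

# Distance to school data: mean is about 5.11, median is 4.5
km = [2, 5, 3, 4, 7, 7, 8, 5, 4, 3, 7, 8, 9, 11, 2, 3, 3, 1]
print(skew_direction(fmean(km), median(km)))      # right

# Mean and median values quoted with the histograms
print(skew_direction(949.489, 977.107))           # left
print(skew_direction(9.9924, 9.9992, tol=0.05))   # symmetric
```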