Measures of Spread
Working with One-Variable Data: Spread
Joaquin’s tests: 76, 45, 83, 68, 64
Taran’s tests: 67, 70, 70, 62, 62
What can you infer, justify, and conclude about Joaquin’s and Taran’s test scores? (Hint: Calculate the mean, median, and mode for each. What do they tell you?)
Joaquin’s mean = ____   median = ____   mode = none
Taran’s mean = ____   median = ____   mode = ____
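As a quick check on the hint, here is a minimal Python sketch (Python 3.8 or later, for `statistics.multimode`) that computes the mean, median, and mode of each student’s scores. The helper name `summarize` is hypothetical, and reporting no mode when every value occurs equally often is an assumed convention that matches the slide’s “mode = none” for Joaquin.

```python
from statistics import mean, median, multimode

def summarize(scores):
    """Return (mean, median, modes) for a list of scores (hypothetical helper).

    multimode() lists every most-common value; if every distinct value occurs
    equally often (e.g. all values are different), we report no mode at all.
    """
    modes = multimode(scores)
    if len(modes) == len(set(scores)):
        modes = []
    return mean(scores), median(scores), modes

joaquin = [76, 45, 83, 68, 64]
taran = [67, 70, 70, 62, 62]

for name, scores in [("Joaquin", joaquin), ("Taran", taran)]:
    m, med, modes = summarize(scores)
    print(f"{name}: mean = {m}, median = {med}, mode = {modes or 'none'}")
```

The centres come out close (means of 67.2 and 66.2), which is why the next slides turn to how spread out each set is.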
Spread
• Mean, median, and mode are all good ways to find the centre of your data.
• This information is most useful when the sets of data being compared are similar.
• It is also important to find out how spread out your data are. This gives much more insight into data sets that differ from each other.
Consider the following two data sets, which have identical mean and median values (mean = 5, median = 5). Why is this information misleading?
Set A: 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9
Set B: 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7
[Figure: frequency bar graphs of Set B and Set A, showing how often each value from 1 to 9 occurs]
This information is misleading because Set A is bell-shaped and spread from 1 to 9, while Set B is uniform and spread only from 3 to 7. The calculations make the two sets appear similar, when really A and B are spread out quite differently.
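The sketch below, an illustration in Python using only the standard `statistics` module, confirms that Set A and Set B share a mean and median of 5 while differing in spread. It uses the range and the population standard deviation (`pstdev`), both defined later in these notes; choosing those two measures for the comparison is our own assumption.

```python
from statistics import mean, median, pstdev

set_a = [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9]
set_b = [3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7]

for name, data in [("Set A", set_a), ("Set B", set_b)]:
    spread = max(data) - min(data)          # range: greatest minus least value
    print(f"{name}: mean = {mean(data)}, median = {median(data)}, "
          f"range = {spread}, population SD = {pstdev(data):.2f}")
```

Same centre, but Set A has twice the range of Set B and a noticeably larger standard deviation.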
Measures of Spread
• In analysing data, it is often important to know whether the data are spread out or clustered around the mean.
• Measures of spread are used to quantify the spread of the data.
• The measures of spread, or dispersion, are:
  – Range
  – Quartiles
  – Variance
  – Standard deviation
Range
• The simplest measure of dispersion.
• Calculated by finding the difference between the greatest and the least values of the data.
• Useful since it is the easiest to understand.
• Strongly affected by extreme values (outliers), since it depends only on the two most extreme data.
• The range of the values 1, 2, 4, 6, 9, 11, 15, 25 is 25 – 1 = 24.
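A minimal sketch of the range calculation, using the values from the example above. The function name `data_range` is hypothetical. Removing the single extreme value shows how strongly the range depends on outliers.

```python
def data_range(values):
    """Range = greatest value minus least value (hypothetical helper)."""
    return max(values) - min(values)

values = [1, 2, 4, 6, 9, 11, 15, 25]
print(data_range(values))        # 25 - 1
# Dropping the one extreme value (25) shrinks the range sharply
print(data_range(values[:-1]))   # 15 - 1
```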
Quartiles and Interquartile Ranges
• Quartiles divide a set of ordered data into four groups with equal numbers of values.
[Diagram: ordered data marked Lowest Datum, First Quartile (Q1), Median (Q2), Third Quartile (Q3), Highest Datum]
The three “dividing points” are the first quartile (Q1), the median (sometimes called the second quartile, Q2), and the third quartile (Q3).
The interquartile range is Q3 – Q1, which is the range of the middle half (50%) of the data.
The semi-interquartile range is one half of the interquartile range.
Both these ranges indicate how closely the data are clustered around the median.
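One common way to find quartiles, and the one used in the Rachelle example later in these notes, is to take the median of the lower and upper halves of the ordered data. The sketch below assumes that method; for an odd number of values it leaves the median out of both halves, which is one convention among several (spreadsheets and other software may give slightly different quartiles). The helper `quartiles` is hypothetical and needs at least two values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) by the median-of-halves method (hypothetical helper).

    With an odd count, the middle value is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]               # values below the median
    upper = data[half + n % 2:]       # values above the median
    return median(lower), median(data), median(upper)

values = [1, 2, 4, 6, 9, 11, 15, 25]
q1, q2, q3 = quartiles(values)
iqr = q3 - q1                         # interquartile range = Q3 - Q1
print(q1, q2, q3, iqr, iqr / 2)       # last value: semi-interquartile range
```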
Box and Whisker Plot
• Illustrates the quartiles.
• The box shows the interquartile range (from Q1 to Q3).
• The whiskers extend to the lowest and highest values.
A modified box and whisker plot shows outliers as separate points beyond the whiskers.
See Page 141 for illustrations.
Standard Deviation
• A deviation is the difference between an individual value in a set of data and the mean for the data.
• The standard deviation is the square root of the average of the squared deviations from the mean.
• The smaller the standard deviation, the more compact the data set.
Standard Deviation – Population
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
σ = standard deviation (population)
∑ = sum
x = each data value
μ = mean
N = number of data in the population
Standard Deviation – Sample
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
s = standard deviation (sample)
∑ = sum
x̄ = mean of the sample
n = number of data in the sample
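The two formulas translate directly into code. This is a minimal Python sketch; the data list is an illustrative assumption (not from the notes) and the function names are hypothetical. The standard library’s `statistics.pstdev` (population) and `statistics.stdev` (sample) serve as a cross-check.

```python
import math
from statistics import pstdev, stdev

def population_sd(values):
    """sigma = sqrt( sum((x - mu)^2) / N )"""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """s = sqrt( sum((x - xbar)^2) / (n - 1) )"""
    xbar = sum(values) / len(values)
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values, not from the notes
print(population_sd(data), sample_sd(data))
# The standard library agrees: pstdev uses N, stdev uses n - 1
print(pstdev(data), stdev(data))
```

For the same data, dividing by n − 1 instead of N always gives a slightly larger value.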
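Variance
• The variance can be found by calculating the average squared difference (or deviation) of each value from the mean. For a sample, the sum of the squared deviations is divided by n − 1 rather than n.
Population
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Sample
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
• Or square the standard deviation.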
Standard Deviation – Grouped Data
If you are working with grouped data, you can estimate the standard deviation using the following formulas:
Population
$$\sigma = \sqrt{\frac{\sum f_i (m_i - \mu)^2}{N}}$$
Sample
$$s = \sqrt{\frac{\sum f_i (m_i - \bar{x})^2}{n - 1}}$$
fi = the frequency for a given interval
mi = the midpoint of the interval
Find the Measures of Spread
Rachelle works part-time at a gas station. Her gross earnings for the past eight weeks are shown.
$55 $68 $83 $59 $68 $95 $75 $65
Calculate the range, variance, standard deviation, interquartile range, and semi-interquartile range for her weekly earnings.
Range:
The range of Rachelle’s earnings is ____ dollars.
Variance:
$$\mu = \frac{\sum x}{N} = \frac{568}{8} = 71$$
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Gross earnings (x) | x − μ | (x − μ)²
55 | |
68 | |
83 | |
59 | |
68 | |
95 | |
75 | |
65 | |
Total: 568 | |
The variance of Rachelle’s earnings is ____ (in dollars squared).
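The variance table can be filled in with a short Python sketch. It follows the slide’s population formula (dividing by N = 8); the variable names are our own. Each printed row gives x, x − μ, and (x − μ)², and the resulting variance is in squared dollars.

```python
# Rachelle's gross weekly earnings, in dollars
earnings = [55, 68, 83, 59, 68, 95, 75, 65]

N = len(earnings)
mu = sum(earnings) / N            # population mean, 568 / 8

total_sq = 0.0
print(f"{'x':>4} {'x - mu':>7} {'(x - mu)^2':>11}")
for x in earnings:
    dev = x - mu                  # deviation from the mean
    total_sq += dev ** 2          # running sum of squared deviations
    print(f"{x:>4} {dev:>7} {dev ** 2:>11}")

variance = total_sq / N           # population variance: divide by N
print(f"mean = {mu}, sum of squares = {total_sq}, variance = {variance}")
```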
Standard Deviation:
$$\sigma = \sqrt{\text{Variance}}$$
The standard deviation of Rachelle’s earnings is ____ dollars.
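A minimal sketch of the standard deviation step: take the square root of the population variance, then cross-check against `statistics.pstdev`. The last line, using the sample formula `statistics.stdev`, is included only for comparison and is not what the slide asks for.

```python
import math
from statistics import pvariance, pstdev, stdev

earnings = [55, 68, 83, 59, 68, 95, 75, 65]

# Standard deviation = square root of the (population) variance
sd = math.sqrt(pvariance(earnings))
print(round(sd, 2))
# Same result directly from the population standard deviation function
print(round(pstdev(earnings), 2))
# For comparison only: the sample formula (divide by n - 1) gives a larger value
print(round(stdev(earnings), 2))
```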
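Interquartile range:
First, put the data into numerical order:
55, 59, 65, 68, 68, 75, 83, 95
Q1 is the median of the lower half (55, 59, 65, 68), and Q3 is the median of the upper half (68, 75, 83, 95).
$$Q_1 = \frac{59 + 65}{2} = 62$$
$$Q_3 = \frac{75 + 83}{2} = 79$$
Interquartile range = Q3 − Q1
= 79 – 62
= 17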
Semi-Interquartile range:
Semi-interquartile range = 17/2
= 8.5
Therefore the interquartile range is 17 and the semi-interquartile range is 8.5.
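The quartile steps above can be sketched in Python as follows. This assumes the same median-of-halves method and, as written, only handles an even number of values (here, 8 weeks). Other tools may use different quartile conventions; NumPy’s default percentile interpolation, for instance, would give a different Q1 and Q3 for this data.

```python
from statistics import median

earnings = [55, 68, 83, 59, 68, 95, 75, 65]
data = sorted(earnings)          # [55, 59, 65, 68, 68, 75, 83, 95]
half = len(data) // 2            # 8 values -> two halves of 4 (even count assumed)

q1 = median(data[:half])         # median of the lower half: (59 + 65) / 2
q3 = median(data[half:])         # median of the upper half: (75 + 83) / 2
iqr = q3 - q1                    # interquartile range
print(q1, q3, iqr, iqr / 2)      # last value: semi-interquartile range
```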
Standard Deviation Grouped Data Example
The following table represents the number of hours per day spent watching TV in a sample of 500 people.
Number of hours | Frequency
0-1 | 64
2-3 | 92
4-5 | 141
6-7 | 86
8-9 | 71
10-11 | 35
12-13 | 11
Mean: μ ≈ 5.1
Interval | Midpoint (mi) | Frequency (fi) | (mi − μ)² | fi(mi − μ)²
0-1 | 0.5 | 64 | (0.5 − 5.1)² = 21.16 | 64 × 21.16 = 1354.24
2-3 | 2.5 | 92 | 6.76 | 92 × 6.76 = 621.92
4-5 | 4.5 | 141 | 0.36 | 141 × 0.36 = 50.76
6-7 | 6.5 | 86 | 1.96 | 86 × 1.96 = 168.56
8-9 | 8.5 | 71 | 11.56 | 71 × 11.56 = 820.76
10-11 | 10.5 | 35 | 29.16 | 35 × 29.16 = 1020.6
12-13 | 12.5 | 11 | 54.76 | 11 × 54.76 = 602.36
Total | | 500 | | 4639.2
$$\sigma = \sqrt{\frac{\sum f_i (m_i - \mu)^2}{N}} = \sqrt{\frac{4639.2}{500}} \approx 3.05$$
Therefore the standard deviation is approximately 3.05 hours.
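The grouped-data calculation can be checked with a short Python sketch. It assumes the interval midpoints from the table and the population formula used on the slide (dividing by N = 500). It computes the mean exactly (5.128) rather than rounding to 5.1, and separately recomputes the column total with the rounded mean; both routes give a standard deviation of about 3.05 hours.

```python
import math

# (midpoint m_i, frequency f_i) for each interval of the TV-hours table
groups = [(0.5, 64), (2.5, 92), (4.5, 141), (6.5, 86),
          (8.5, 71), (10.5, 35), (12.5, 11)]

N = sum(f for _, f in groups)                      # total frequency
mu = sum(f * m for m, f in groups) / N             # grouped mean
ss = sum(f * (m - mu) ** 2 for m, f in groups)     # sum of f_i (m_i - mu)^2
sigma = math.sqrt(ss / N)                          # population formula

# Same column total using the mean rounded to 5.1, as on the slide
ss_rounded = sum(f * (m - 5.1) ** 2 for m, f in groups)

print(N, mu, round(sigma, 2))
print(round(ss_rounded, 2), round(math.sqrt(ss_rounded / N), 2))
```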
Z-Scores
• A z-score is the number of standard deviations a data point lies away from the mean.
– For example, if the standard deviation is 8, the z-score of a data point (such as 13) tells you how many 8s that point is away from the average, or centre.
– It is found by dividing the deviation by the standard deviation: z = (x − μ) / σ.
If a value is below the mean, its z-score is negative.
Similarly, if a value is above the mean, its z-score is positive.
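A minimal sketch of the z-score formula, applied for illustration to Rachelle’s weekly earnings from earlier in these notes, using their population standard deviation. The helper `z_score` is hypothetical.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean: (x - mu) / sigma."""
    return (x - mu) / sigma

earnings = [55, 68, 83, 59, 68, 95, 75, 65]
mu = mean(earnings)          # 71
sigma = pstdev(earnings)     # population standard deviation

# Values below the mean get negative z-scores, values above get positive ones
for x in sorted(earnings):
    print(x, round(z_score(x, mu, sigma), 2))
```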
Percentiles
• Similar to quartiles.
• Percentiles divide the data into 100 intervals that have equal numbers of values.
• k percent of the data are less than or equal to the kth percentile, Pk.
• In other words, you are finding what percent of the data is at or below the specific value in question.
• Often used for standardized tests.
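Percentile rank can be sketched in a few lines. This follows the “less than or equal to” wording above; some textbooks instead count only values strictly below, or split ties, so results can differ slightly. The helper `percentile_rank` and the reuse of Rachelle’s earnings are assumptions for illustration.

```python
def percentile_rank(values, x):
    """Percent of the data that is less than or equal to x (hypothetical helper)."""
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

earnings = [55, 68, 83, 59, 68, 95, 75, 65]
print(percentile_rank(earnings, 75))   # 6 of the 8 values are <= 75
```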
Homework
Pg 148 #1-6, 14
I LOVE HOMEWORK