LBSRE1021 Data Interpretation
Lecture 3
Location and Dispersion
OBJECTIVES
State some basic definitions.
Prepare a frequency distribution using given data.
Plot a frequency distribution as a bar chart.
Explain the terms Location and Dispersion.
Calculate the arithmetic mean, standard deviation, coefficient of variation, median, quartiles and mode for a given set of data.
Definition
Statistics: The organisation of data to enable meaningful analysis.
Definitions
Population: Every member of the group in which you are interested, e.g. the consumers of a certain product or the employees of a company.
Sample: A subset of the population on which measurements may be made that approximate those of the population.
More definitions
Variable: Something which can be measured or counted, e.g. age, weight, salary, number of children.
Frequency: The number of times a particular value of a variable occurs.
Frequency Distribution
e.g. Age of students on AE1021

Age (the variable)    No. Students (the frequency)
18                    102
19                    116
20                    63
21                    41
22 and over           30
TOTAL                 352
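The frequency distribution above can be checked with a few lines of Python. This is a minimal sketch: the name `age_frequencies` is an illustrative choice, and the relative-frequency step (each frequency as a percentage of the total) is an addition that does not appear on the slide.

```python
# Frequency distribution of student ages, copied from the table above.
age_frequencies = {
    "18": 102,
    "19": 116,
    "20": 63,
    "21": 41,
    "22 and over": 30,
}

# The total is the sum of the frequencies; it should match the TOTAL row.
total = sum(age_frequencies.values())

# Relative frequency: each frequency as a percentage of the total.
relative = {age: 100 * f / total for age, f in age_frequencies.items()}

print("Total:", total)
for age, pct in relative.items():
    print(f"{age:>12}: {pct:5.1f}%")
```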
Grouped Frequency Distribution
Lengths of 100 copper pipes

Length (cm)          Frequency
10 but under 20      3
20 but under 30      7
30 but under 40      10
40 but under 50      16
50 but under 60      34
60 but under 70      13
70 but under 80      7
80 but under 90      6
90 but under 100     4
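A minimal sketch of grouping, assuming each length is a number in centimetres. The "but under" wording means a class includes its lower limit and excludes its upper one, so a pipe of exactly 20 cm belongs to the 20 but under 30 class. The helper `class_of` and the running (cumulative) total are illustrative additions, not part of the lecture.

```python
# Grouped frequency distribution of the 100 copper pipe lengths (cm), copied
# from the table above. Each key is a lower class limit; each class runs from
# that limit up to, but not including, the limit plus CLASS_WIDTH.
CLASS_WIDTH = 10
grouped = {10: 3, 20: 7, 30: 10, 40: 16, 50: 34, 60: 13, 70: 7, 80: 6, 90: 4}


def class_of(length_cm):
    """Return the lower limit of the 'x but under x + 10' class for a length."""
    lower = int(length_cm // CLASS_WIDTH) * CLASS_WIDTH
    if lower not in grouped:
        raise ValueError(f"{length_cm} cm is outside the table's classes")
    return lower


# Running (cumulative) total of the frequencies, class by class.
cumulative = []
running = 0
for lower in sorted(grouped):
    running += grouped[lower]
    cumulative.append((lower, running))

for lower, cum in cumulative:
    print(f"{lower} but under {lower + CLASS_WIDTH}: {grouped[lower]:3d} (cumulative {cum})")
```

The last cumulative figure should equal the number of pipes measured, which is a quick check that no class has been dropped.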
Bar Chart
[Figure: Bar chart for lengths of pipes, showing No. Pipes (0 to 40) against Length (cm) for the classes 10 to 20 through 90 to 100]
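When no plotting tool is to hand, a crude text bar chart conveys the same shape. This is a sketch only: the function name `text_bar_chart` and its `width`/`symbol` parameters are hypothetical. A plotting library such as matplotlib (`matplotlib.pyplot.bar`) would draw a proper chart from the same data.

```python
# Frequencies of the grouped pipe lengths, copied from the table above.
grouped = {10: 3, 20: 7, 30: 10, 40: 16, 50: 34, 60: 13, 70: 7, 80: 6, 90: 4}


def text_bar_chart(freqs, width=10, symbol="#"):
    """Return one line per class: the class label, then one symbol per pipe."""
    lines = []
    for lower in sorted(freqs):
        label = f"{lower} to {lower + width}"
        lines.append(f"{label:>9} | {symbol * freqs[lower]} {freqs[lower]}")
    return lines


for line in text_bar_chart(grouped):
    print(line)
```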
Measures of Location
Averaging of Data
Summarise data as a single statistic.
Allows comparison of, e.g., average incomes, average rainfall and average expenditure.
Measures of Location: Mean
Arithmetic Mean
e.g. for 5, 7, 9, 10:
$$\bar{x} = \frac{5 + 7 + 9 + 10}{4} = \frac{31}{4} = 7.75$$
In general:
$$\bar{x} = \frac{\sum x}{n}$$
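The general formula translates directly into code. A minimal sketch; the function name `arithmetic_mean` is an illustrative choice (Python's `statistics.mean` does the same job).

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of empty data is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([5, 7, 9, 10]))  # 7.75
```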
Measures of Location: Median
Median
Arrange the numbers in ascending order.
The item in the centre is the median (with an even number of items, take the mean of the two central items).
It is unaffected by extreme values.
e.g. for the data 11, 14, 14, 21, 25, 27, 30 the median is 21.
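A minimal sketch of the median rule. The even-count case (mean of the two central items) is the usual convention and is an assumption here, since the slide only shows an odd-sized example; `statistics.median` follows the same convention.

```python
def median(values):
    """Middle item of the sorted data; mean of the two middle items if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([11, 14, 14, 21, 25, 27, 30]))   # 21
print(median([11, 14, 14, 21, 25, 27, 300]))  # 21
```

The second call shows the "unaffected by extremes" point: replacing 30 with 300 leaves the median unchanged, whereas the mean would rise sharply.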
Measures of Location: Mode
Mode
The value which occurs most often.
For the data on the previous slide
the mode is 14.
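The mode can be found by counting how often each value occurs. This sketch returns a list because a data set can have more than one most-frequent value; that design choice, and the name `modes`, are assumptions rather than anything on the slide. Python's `statistics.multimode` behaves similarly.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in ascending order."""
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode of empty data is undefined")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([11, 14, 14, 21, 25, 27, 30]))  # [14]
```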
Dispersion
The mean gives us the location; we also need a measure of the dispersion, or spread, of the data.
Range
e.g.

Month                J     F     M     A     M     J
Average Price (p)    155   143   144   139   140   141

Range = 155 - 139 = 16p
Problem: concerned only with the extremes.
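The range is simply the largest value minus the smallest. A minimal sketch using the monthly prices from the table:

```python
# Average monthly prices (p) for January to June, copied from the table above.
prices = [155, 143, 144, 139, 140, 141]

# Range: largest value minus smallest value.
price_range = max(prices) - min(prices)
print(f"Range = {max(prices)} - {min(prices)} = {price_range}p")  # Range = 155 - 139 = 16p
```

Because only `max` and `min` enter the calculation, changing any of the four middle months leaves the range unchanged, which is the weakness noted above.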
Dispersion: Quartiles
[Figure: Histogram of bolt lengths, showing frequency and cumulative percentage against length (cm)]
QUARTILES
The quartiles are four values in a data set.
First Quartile (Q1) is a value such that 25% of
items in the data set have this value or less.
Second Quartile (median) is a value such that
50% of items in the data set have this value or
less.
Third Quartile (Q3) is a value such that 75% of
items in the data set have this value or less.
Fourth Quartile is a value such that 100% of
items in the data set have this value or less.
Quartiles: Example
Switchboard data refers to the number of telephone calls to a switchboard each hour for two days:
– Day 1: 26 25 30 31 27 27 26 29
– Day 2: 30 32 27 28 26 27 28 29
To find the first quartile and the third quartile, arrange the 16 data items in ascending order as follows:
– 25 26 26 26 27 27 27 27 28 28 29 29 30 30 31 32
Quartiles: Example
The first quartile will be the value of the (n + 1) × 1/4 item.
The third quartile will be the value of the (n + 1) × 3/4 item.
In this case Q1 will be the value of the 4.25th item, i.e. between 26 and 27 (the 4th and 5th items). By interpolation, the first quartile will be 26 + 0.25 × (27 - 26) = 26.25.
Q3 will be the value of the 12.75th item, i.e. between 29 and 30 (the 12th and 13th items). So the third quartile will be 29 + 0.75 × (30 - 29) = 29.75.
(Q2 is the value of the (n + 1)/2 item, i.e. the median.)
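The (n + 1) position rule with interpolation can be sketched as below. The function name `quartile` and the clamping of positions for very small data sets are assumptions of this sketch. The standard library's `statistics.quantiles` (Python 3.8+) with `method="exclusive"` uses the same (n + 1) positions, so it serves as a cross-check.

```python
import statistics


def quartile(values, k):
    """Return the k-th quartile (k = 1, 2 or 3) using the (n + 1) * k/4
    position rule, interpolating between neighbouring items when the
    position is not a whole number."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of empty data are undefined")
    pos = (n + 1) * k / 4        # 1-based position, e.g. 4.25 for Q1 when n = 16
    pos = min(max(pos, 1), n)    # keep the position inside the data for very small n
    lower = int(pos)             # whole part: the item at or just below the position
    frac = pos - lower           # fractional part: how far towards the next item
    if lower == n:
        return float(ordered[-1])
    below, above = ordered[lower - 1], ordered[lower]
    return below + frac * (above - below)


# Switchboard calls per hour, copied from the example above.
day1 = [26, 25, 30, 31, 27, 27, 26, 29]
day2 = [30, 32, 27, 28, 26, 27, 28, 29]
calls = day1 + day2

q1, q2, q3 = (quartile(calls, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 26.25 27.5 29.75

# Cross-check with the standard library's "exclusive" method.
print(statistics.quantiles(calls, n=4, method="exclusive"))  # [26.25, 27.5, 29.75]
```

Other quartile conventions exist (spreadsheets often use an n - 1 based rule), so results can differ slightly from other tools on the same data.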
Deviation
In a set of data, each item deviates from the mean.
e.g. for the monthly price data above, the mean ($\bar{x}$) is 143.67, and each item deviates from the mean by $(x - \bar{x})$:

Data    Mean      $(x - \bar{x})$
155     143.67    11.33
143     143.67    -0.67
144     143.67    0.33
139     143.67    -4.67
140     143.67    -3.67
141     143.67    -2.67

$\sum (x - \bar{x}) = 0$
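A minimal sketch reproducing the deviation table and confirming that the deviations cancel out. The printed sum may show as 0 or -0 to many decimal places because of floating-point rounding.

```python
# Average monthly prices (p), copied from the range example.
prices = [155, 143, 144, 139, 140, 141]
mean = sum(prices) / len(prices)  # 862 / 6 = 143.666...

deviations = [x - mean for x in prices]
for x, d in zip(prices, deviations):
    print(f"{x}  {mean:.2f}  {d:+.2f}")

# Positive and negative deviations cancel, so their sum is zero apart from
# floating-point rounding; raw deviations therefore cannot measure spread.
print(f"Sum of deviations: {sum(deviations):.10f}")
```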
Dispersion: Variance
Deviations from the mean $(x - \bar{x})$:
:- are both +ve and -ve
:- sum to zero
$(x - \bar{x})^2$ is never -ve.
The average of the squared deviations is the variance:
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$
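The variance formula (dividing by n, as above) can be sketched as follows; `statistics.pvariance` uses the same n divisor. The function name `variance` is illustrative.

```python
def variance(values):
    """Mean of the squared deviations from the mean (divisor n)."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of empty data is undefined")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


# Average monthly prices (p), copied from the range example.
prices = [155, 143, 144, 139, 140, 141]
print(round(variance(prices), 2))  # 28.56
```

Squaring makes every term non-negative, so, unlike the raw deviations, the terms cannot cancel out.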
Standard Deviation
The square root of the variance is the standard deviation:
$$\text{s.d.} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$
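S.D. Example

Price ($x$)    $x^2$
155            24025
143            20449
144            20736
139            19321
140            19600
141            19881
$\sum x = 862$    $\sum x^2 = 124012$

$$\text{s.d.} = \sqrt{\frac{124012}{6} - \left(\frac{862}{6}\right)^2} = \sqrt{20668.67 - 20640.11} = \sqrt{28.56} = 5.34$$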
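A minimal sketch checking that the shortcut form used in the example and the definition form give the same standard deviation, with `statistics.pstdev` as an independent check.

```python
import math
import statistics

# Average monthly prices (p), copied from the table above.
prices = [155, 143, 144, 139, 140, 141]
n = len(prices)
sum_x = sum(prices)                    # 862
sum_x2 = sum(x * x for x in prices)    # 124012

# Shortcut form: sqrt(sum(x^2)/n - (sum(x)/n)^2)
sd_shortcut = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)

# Definition form: square root of the mean squared deviation.
mean = sum_x / n
sd_definition = math.sqrt(sum((x - mean) ** 2 for x in prices) / n)

print(round(sd_shortcut, 2), round(sd_definition, 2))  # 5.34 5.34
print(round(statistics.pstdev(prices), 2))             # 5.34
```

The shortcut form is convenient by hand, but when the values are large relative to their spread, subtracting two nearly equal numbers can lose precision, so the definition form is the safer choice in code.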
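Central Tendency
The mean and standard deviation describe the central tendency and the spread of the data.
[Figure: a symmetrical distribution, marking the mean (the mode and median are the same as the mean if the distribution is symmetrical) and the spread]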
Coefficient of Variation
The standard deviation divided by the arithmetic mean, expressed as a %:
$$\text{CV} = \frac{\text{s.d.}}{\bar{x}} \times 100\%$$
This shows the dispersion relative to the mean.
e.g. Comparing two data sets with the means and s.d. given below, what conclusions can you draw?

Data set    Mean    s.d.    Coeff of Var
1           100     20      20%
2           50      15      30%
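The coefficient of variation is a one-line calculation; this sketch reproduces the last column of the table. The function name is illustrative, and the guard against a zero mean is an added assumption (the ratio is undefined there).

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the arithmetic mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is zero")
    return 100 * sd / mean


print(coefficient_of_variation(20, 100))  # 20.0
print(coefficient_of_variation(15, 50))   # 30.0
```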