CS1512
Foundations of Computing Science 2
Lecture 20
Probability and statistics (2)
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
© J R W Hunter, 2006
Ordinal data
• X is an ordinal variable with values: $a_1, a_2, a_3, \dots, a_k, \dots, a_K$
• ‘ordinal’ means that:
  $a_1 \le a_2 \le a_3 \le \dots \le a_k \le \dots \le a_K$
• cumulative frequency at level k:
  $c_k$ = sum of frequencies of values less than or equal to $a_k$
  $c_k = f_1 + f_2 + f_3 + \dots + f_k$
  $\phantom{c_k} = (f_1 + f_2 + f_3 + \dots + f_{k-1}) + f_k$
  $\phantom{c_k} = c_{k-1} + f_k$
• also (%) cumulative relative frequency
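The recurrence above (each cumulative frequency is the previous one plus the current frequency) can be sketched in Python. The function name `cumulative_frequencies` is only illustrative; the frequencies are taken from the enzyme concentration table in this lecture.

```python
def cumulative_frequencies(freqs):
    """Return [c_1, ..., c_K], built with c_k = c_(k-1) + f_k."""
    cumulative = []
    running = 0  # plays the role of c_0 = 0: nothing counted yet
    for f in freqs:
        running += f          # c_k = c_(k-1) + f_k
        cumulative.append(running)
    return cumulative


# Frequencies from the enzyme concentration table in this lecture.
enzyme_freqs = [1, 2, 7, 7, 7, 3, 2, 0, 0, 1]
enzyme_cum = cumulative_frequencies(enzyme_freqs)
print(enzyme_cum)  # [1, 3, 10, 17, 24, 27, 29, 29, 29, 30]
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.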
CAS marks
[Figure: two charts of CAS marks (horizontal axis: CAS, 0 to 20). Left: % relative frequencies. Right: % cumulative relative frequencies.]
Enzyme concentrations
Concentration          Freq.   Rel. Freq.   % Cum. Rel. Freq.
19.5 ≤ c < 39.5          1       0.033            3.3%
39.5 ≤ c < 59.5          2       0.067           10.0%
59.5 ≤ c < 79.5          7       0.233           33.3%
79.5 ≤ c < 99.5          7       0.233           56.6%
99.5 ≤ c < 119.5         7       0.233           79.9%
119.5 ≤ c < 139.5        3       0.100           89.9%
139.5 ≤ c < 159.5        2       0.067           96.6%
159.5 ≤ c < 179.5        0       0.000           96.6%
179.5 ≤ c < 199.5        0       0.000           96.6%
199.5 ≤ c < 219.5        1       0.033          100.0%
Totals                  30       1.000
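As a rough sketch, the relative and % cumulative relative frequency columns can be derived from the Freq. column alone. The variable names are illustrative. Note that this version accumulates the raw counts and divides once, so a few entries can differ in the last digit from a column built by adding up already-rounded relative frequencies (for example 17/30 gives 56.7%).

```python
freqs = [1, 2, 7, 7, 7, 3, 2, 0, 0, 1]  # Freq. column of the enzyme table
n = sum(freqs)                           # total number of observations

# Relative frequency of each class: f_k / n
rel_freqs = [f / n for f in freqs]

# % cumulative relative frequency: 100 * c_k / n, with c_k the running count
cum_pct = []
running = 0
for f in freqs:
    running += f
    cum_pct.append(100 * running / n)

for f, r, c in zip(freqs, rel_freqs, cum_pct):
    print(f"{f:2d}   {r:.3f}   {c:5.1f}%")
```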
Cumulative histogram
Discrete two variable data
[Figure: plot of CS1012 CAS (vertical axis, 0 to 25) against CS1512 Assessment 1 CAS (horizontal axis, 0 to 25).]
Continuous two variable data
X        Y
4.37     24.19
8.10     39.57
11.45    55.53
10.40    51.16
3.89     20.66
11.30    51.04
11.00    49.89
6.74     35.50
5.41     31.53
13.97    65.51
Time Series
• Time and space are fundamental (especially time)
• Time series: variation of a particular variable with time
Summarising data by numerical means
Further summarisation (beyond frequencies)
Measures of location (Where is the middle?)
• Mean
• Median
• Mode
Mean
Sample mean: $\bar{X} = \dfrac{\text{sum of observed values of } X}{\text{number of observed values}} = \dfrac{\sum x}{n}$
Use only for quantitative data.
Sigma
Sum of n observations:
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_i + \dots + x_{n-1} + x_n$
If it is clear that the sum is from 1 to n then:
$\sum x = x_1 + x_2 + \dots + x_i + \dots + x_{n-1} + x_n$
Sum of squares:
$\sum x^2 = x_1^2 + x_2^2 + \dots + x_i^2 + \dots + x_{n-1}^2 + x_n^2$
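A small sketch of the two sums in Python, using the four values from the "n even" median example in this lecture. The variable names are illustrative. It also shows that the sum of squares is not the same thing as the square of the sum, a distinction that matters once both appear in the same formula.

```python
xs = [165, 173, 180, 164]  # values from the "n even" median example

total = sum(xs)                    # sum of x: add the values
total_sq = sum(x * x for x in xs)  # sum of squares: square each value, then add
square_of_sum = total ** 2         # square of the sum: add first, then square

print(total)          # 682
print(total_sq)       # 116450
print(square_of_sum)  # 465124
```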
$\sum x$ from frequencies
If X is a categorical variable with values: $a_1, a_2, a_3, \dots, a_k, \dots, a_K$
$\sum x = x_1 + x_2 + \dots + x_i + \dots + x_{n-1} + x_n$
(order of summation isn’t important)
(e.g. piglets: 5 + 11 + 12 + 7 + 8 + 14 + 7 + ... + 14 + ...)
Group together those x’s which have value $a_1$, those with value $a_2$, ...
$\sum x = (x.. + x.. + \dots) + (x.. + x.. + \dots) + \dots + (x.. + x..)$
  first group: x’s which have value $a_1$ - there are $f_1$ of them
  second group: x’s which have value $a_2$ - there are $f_2$ of them
  last group: x’s which have value $a_K$ - there are $f_K$ of them
$\phantom{\sum x} = f_1 \cdot a_1 + f_2 \cdot a_2 + \dots + f_k \cdot a_k + \dots + f_K \cdot a_K$
$\phantom{\sum x} = \sum_{k=1}^{K} f_k \cdot a_k$
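The grouping argument can be checked on a small sample. This is only an illustration: it uses the five values from the "n odd" median example in this lecture and Python's `collections.Counter` to build the frequencies.

```python
from collections import Counter

xs = [183, 163, 152, 157, 157]  # values from the "n odd" median example

# Counter maps each distinct value a_k to its frequency f_k.
freq = Counter(xs)

direct_sum = sum(xs)                                # x_1 + x_2 + ... + x_n
grouped_sum = sum(f * a for a, f in freq.items())   # sum of f_k * a_k

print(freq[157])                # 2
print(direct_sum, grouped_sum)  # 812 812
```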
Mean
Litter size ($a_k$)   Frequency ($f_k$)   Cum. Freq
5                     1                   1
6                     0                   1
7                     2                   3
8                     3                   6
9                     3                   9
10                    9                   18
11                    8                   26
12                    5                   31
13                    3                   34
14                    2                   36
Total                 36

$\sum x = \sum_{k=1}^{K} f_k \cdot a_k$
$= 1 \cdot 5 + 0 \cdot 6 + 2 \cdot 7 + 3 \cdot 8 + 3 \cdot 9 + 9 \cdot 10 + 8 \cdot 11 + 5 \cdot 12 + 3 \cdot 13 + 2 \cdot 14$
$= 375$
$\bar{X} = 375 / 36 = 10.42$
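The same calculation as a short Python sketch, working directly from the litter size frequency table. The list names are illustrative.

```python
litter_sizes = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]  # values a_k
frequencies = [1, 0, 2, 3, 3, 9, 8, 5, 3, 2]        # frequencies f_k

n = sum(frequencies)                                        # number of litters
sum_x = sum(f * a for a, f in zip(litter_sizes, frequencies))  # sum of f_k * a_k
mean = sum_x / n

print(n, sum_x, round(mean, 2))  # 36 375 10.42
```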
Median
Sample median of X = middle value when n sample observations are ranked in increasing order
= the ((n + 1)/2)th value
n odd:
  values: 183, 163, 152, 157 and 157
  rank order: 152, 157, 157, 163, 183
  median: 157
n even:
  values: 165, 173, 180, 164
  rank order: 164, 165, 173, 180
  median: (165 + 173)/2 = 169
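A minimal sketch of the rule in Python, applied to the two examples above. The function name `sample_median` is illustrative; for even n it averages the two middle values, which is how the ((n + 1)/2)th value is interpreted when that rank falls halfway between two observations.

```python
def sample_median(values):
    """Median of a non-empty list: middle ranked value, or mean of the two middle ones."""
    if not values:
        raise ValueError("median of an empty sample is undefined")
    ranked = sorted(values)   # rank in increasing order
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]    # the ((n + 1)/2)th value (1-based rank)
    return (ranked[mid - 1] + ranked[mid]) / 2


print(sample_median([183, 163, 152, 157, 157]))  # 157
print(sample_median([165, 173, 180, 164]))       # 169.0
```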
Median
Litter size   Frequency   Cum. Freq
5             1           1
6             0           1
7             2           3
8             3           6
9             3           9
10            9           18
11            8           26
12            5           31
13            3           34
14            2           36
Total         36

Median = 10.5
(n = 36 is even, so the median is the mean of the 18th and 19th ranked values: (10 + 11)/2)
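One way to read the median off the cumulative frequency column, sketched in Python. The helper `value_at_rank` is a hypothetical name: it returns the first value whose cumulative frequency reaches a given rank.

```python
def value_at_rank(values, cum_freqs, rank):
    """Return the rank-th smallest observation (rank counts from 1)."""
    for a, c in zip(values, cum_freqs):
        if rank <= c:   # first level whose cumulative frequency covers this rank
            return a
    raise ValueError("rank exceeds the number of observations")


litter_sizes = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
cum_freqs = [1, 1, 3, 6, 9, 18, 26, 31, 34, 36]

n = cum_freqs[-1]  # the last cumulative frequency is the sample size
if n % 2 == 1:
    median = value_at_rank(litter_sizes, cum_freqs, (n + 1) // 2)
else:
    lower = value_at_rank(litter_sizes, cum_freqs, n // 2)      # 18th value
    upper = value_at_rank(litter_sizes, cum_freqs, n // 2 + 1)  # 19th value
    median = (lower + upper) / 2

print(median)  # 10.5
```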
Median from cumulative distribution
cumulative % frequency polygon
Mode
Sample mode = value with highest frequency (may not be unique)
Litter size   Frequency   Cum. Freq
5             1           1
6             0           1
7             2           3
8             3           6
9             3           9
10            9           18
11            8           26
12            5           31
13            3           34
14            2           36

Mode = 10
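Because the mode may not be unique, a sketch in Python is safest returning every value that attains the highest frequency. The function name `sample_modes` is illustrative.

```python
def sample_modes(values, freqs):
    """Return all values whose frequency equals the highest frequency."""
    highest = max(freqs)
    return [a for a, f in zip(values, freqs) if f == highest]


litter_sizes = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
frequencies = [1, 0, 2, 3, 3, 9, 8, 5, 3, 2]

print(sample_modes(litter_sizes, frequencies))  # [10]
```

For the litter size data there is a single mode; a table with two equally tall peaks would give a list of two values.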
Skew
left skewed      symmetric        right skewed
mean < mode      mean ≈ mode      mean > mode
Variance
Measure of spread: variance
[Figure: two charts side by side, each with a vertical axis from 0 to 45 and a horizontal axis running up to 18.]
Variance
sample variance = $s^2 = \dfrac{\sum x^2 - (\sum x)^2 / n}{n - 1}$
sample standard deviation = $s = \sqrt{\text{variance}}$
Variance and standard deviation
Litter size ($a_k$)   Frequency ($f_k$)   Cum. Freq
5                     1                   1
6                     0                   1
7                     2                   3
8                     3                   6
9                     3                   9
10                    9                   18
11                    8                   26
12                    5                   31
13                    3                   34
14                    2                   36
Total                 36

$\sum x^2 = \sum_{k=1}^{K} f_k \cdot a_k^2$
$= 1 \cdot 25 + 0 \cdot 36 + 2 \cdot 49 + 3 \cdot 64 + 3 \cdot 81 + 9 \cdot 100 + 8 \cdot 121 + 5 \cdot 144 + 3 \cdot 169 + 2 \cdot 196$
$= 4045$
$\sum x = 375$
$(\sum x)^2 / n = 375 \cdot 375 / 36 = 3906.25$
$s^2 = (4045 - 3906.25) / (36 - 1) = 3.96$
$s = 1.99$
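The calculation can be checked with a short Python sketch that works from the frequency table and uses the form s² = (Σx² − (Σx)²/n) / (n − 1). The variable names are illustrative. Evaluated directly from the frequency column, the sum of squares comes to 4045, which gives the variance and standard deviation printed below.

```python
import math

litter_sizes = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]  # values a_k
frequencies = [1, 0, 2, 3, 3, 9, 8, 5, 3, 2]        # frequencies f_k

n = sum(frequencies)
sum_x = sum(f * a for a, f in zip(litter_sizes, frequencies))       # sum of x
sum_x2 = sum(f * a * a for a, f in zip(litter_sizes, frequencies))  # sum of x squared

variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # sample variance s^2
std_dev = math.sqrt(variance)                   # sample standard deviation s

print(n, sum_x, sum_x2)                       # 36 375 4045
print(round(variance, 2), round(std_dev, 2))  # 3.96 1.99
```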
Piglets
Mean = 10.42
Median = 10.5
Mode = 10
Std. devn. = 1.99
Quartiles and Range
Lower quartile: value such that 25% of observations are below it (Q1).
Median: value such that 50% of observations are below (above) it (Q2).
Upper quartile: value such that 25% of observations are above it (Q3).
Range: the minimum (m) and maximum (M) observations.
Box and Whisker plot:
[Figure: box and whisker plot with m, Q1, Q2, Q3 and M marked in order along the axis.]
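The five numbers behind a box and whisker plot can be sketched with Python's standard library. This is an assumption-laden illustration: `statistics.quantiles` (Python 3.8+) is used with its default interpolation method, which is only one of several conventions for quartiles, so Q1 and Q3 may differ slightly from values estimated by hand. The data are the piglet litter sizes from this lecture, expanded from the frequency table.

```python
import statistics

litter_sizes = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
frequencies = [1, 0, 2, 3, 3, 9, 8, 5, 3, 2]

# Expand the frequency table back into the 36 individual observations.
observations = [a for a, f in zip(litter_sizes, frequencies) for _ in range(f)]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(observations, n=4)
m, M = min(observations), max(observations)  # the range: minimum and maximum

print("m =", m, " Q1 =", q1, " Q2 =", q2, " Q3 =", q3, " M =", M)
```

Whatever convention is used, Q2 should agree with the median.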
Estimating quartiles
Linear Regression
Calculate m and c so that
$\sum (\text{distance of point from line})^2$ is minimised
[Figure: points plotted on x and y axes with the fitted line y = mx + c.]
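A minimal sketch of the fit in Python, assuming "distance" means the vertical distance from each point to the line (ordinary least squares); the slide does not say which distance is intended. The function name `fit_line` is illustrative, and the data are the X and Y columns from the continuous two variable slide in this lecture.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c (vertical distances)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, c


xs = [4.37, 8.10, 11.45, 10.40, 3.89, 11.30, 11.00, 6.74, 5.41, 13.97]
ys = [24.19, 39.57, 55.53, 51.16, 20.66, 51.04, 49.89, 35.50, 31.53, 65.51]

m, c = fit_line(xs, ys)
print(f"y = {m:.2f}x + {c:.2f}")
```

A useful sanity check is that the residuals (observed y minus fitted y) of a least-squares line with an intercept sum to zero, up to rounding.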
Time Series - Moving Average
Time   Y    3 point MA
0      24   *
1      18   23.0000
2      27   22.3333
3      22   25.6667
4      28   28.0000
5      34   31.0000
6      31   36.6667
7      45   38.0000
8      38   39.3333
9      35   *
• smoothing function
• can compute median, max, min, std. devn, etc. in window
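The 3 point moving average can be sketched as a centred window in Python. The function name `moving_average` is illustrative, and the sketch assumes an odd window size; positions where the window would run off either end are left as `None`, matching the `*` entries in the table.

```python
def moving_average(ys, window=3):
    """Centred moving average; None where the window does not fit (odd window assumed)."""
    half = window // 2
    result = []
    for i in range(len(ys)):
        if i < half or i >= len(ys) - half:
            result.append(None)  # not enough neighbours on one side
        else:
            segment = ys[i - half : i + half + 1]  # the values in the window
            result.append(sum(segment) / window)
    return result


ys = [24, 18, 27, 22, 28, 34, 31, 45, 38, 35]  # Y column of the table
ma = moving_average(ys)

for t, (y, avg) in enumerate(zip(ys, ma)):
    shown = "*" if avg is None else f"{avg:.4f}"
    print(t, y, shown)
```

Replacing `sum(segment) / window` with, say, `max(segment)` or `statistics.median(segment)` gives the other windowed summaries mentioned above.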