CS1512 Foundations of Computing Science 2
Lecture 20: Probability and statistics (2)
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/ © J R W Hunter, 2006

Ordinal data
• X is an ordinal variable with values: $a_1, a_2, a_3, \dots, a_k, \dots, a_K$
• 'ordinal' means that: $a_1 \le a_2 \le a_3 \le \dots \le a_k \le \dots \le a_K$
• cumulative frequency at level k: $c_k$ = sum of frequencies of values less than or equal to $a_k$
  $c_k = f_1 + f_2 + f_3 + \dots + f_k = (f_1 + f_2 + f_3 + \dots + f_{k-1}) + f_k = c_{k-1} + f_k$
• also (%) cumulative relative frequency

CAS marks
[Figure: two charts of CAS marks (0 to 20): % relative frequencies and % cumulative relative frequencies]

Enzyme concentrations

Concentration        Freq.   Rel. Freq.   % Cum. Rel. Freq.
19.5 ≤ c < 39.5        1       0.033          3.3%
39.5 ≤ c < 59.5        2       0.067         10.0%
59.5 ≤ c < 79.5        7       0.233         33.3%
79.5 ≤ c < 99.5        7       0.233         56.6%
99.5 ≤ c < 119.5       7       0.233         79.9%
119.5 ≤ c < 139.5      3       0.100         89.9%
139.5 ≤ c < 159.5      2       0.067         96.6%
159.5 ≤ c < 179.5      0       0.000         96.6%
179.5 ≤ c < 199.5      0       0.000         96.6%
199.5 ≤ c < 219.5      1       0.033        100.0%
Totals                30       1.000

Cumulative histogram
[Figure: cumulative histogram]

Discrete two variable data
[Figure: CS1012 CAS (vertical axis, 0 to 25) plotted against CS1512 Assessment 1 CAS (horizontal axis, 0 to 25)]

Continuous two variable data

    X        Y
  4.37    24.19
  8.10    39.57
 11.45    55.53
 10.40    51.16
  3.89    20.66
 11.30    51.04
 11.00    49.89
  6.74    35.50
  5.41    31.53
 13.97    65.51

Time Series
• Time and space are fundamental (especially time)
• Time series: variation of a particular variable with time

Summarising data by numerical means
Further summarisation (beyond frequencies)
Measures of location (Where is the middle?)
• Mean
• Median
• Mode

Mean
Sample mean: $\bar{x} = \dfrac{\text{sum of observed values of } X}{\text{number of observed values}} = \dfrac{\sum x}{n}$
Use only for quantitative data.

Sigma
Sum of n observations:
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_i + \dots + x_{n-1} + x_n$
If it is clear that the sum is from 1 to n then:
$\sum x = x_1 + x_2 + \dots + x_i + \dots + x_{n-1} + x_n$
Sum of squares:
$\sum x^2 = x_1^2 + x_2^2 + \dots + x_i^2 + \dots + x_{n-1}^2 + x_n^2$

Σx from frequencies
If X is a categorical variable with values: $a_1, a_2, a_3, \dots, a_k, \dots, a_K$
$\sum x = x_1 + x_2 + \dots + x_i + \dots + x_{n-1} + x_n$ (order of summation isn't important)
(e.g. piglets: 5 + 11 + 12 + 7 + 8 + 14 + 7 + ... + 14 + ...)
Group together those x's which have value $a_1$ (there are $f_1$ of them), those with value $a_2$ (there are $f_2$ of them), ..., those with value $a_K$ (there are $f_K$ of them):
$\sum x = f_1 a_1 + f_2 a_2 + \dots + f_k a_k + \dots + f_K a_K = \sum_{k=1}^{K} f_k a_k$

Mean

Litter size a_k   Frequency f_k   Cum. Freq.
       5                1              1
       6                0              1
       7                2              3
       8                3              6
       9                3              9
      10                9             18
      11                8             26
      12                5             31
      13                3             34
      14                2             36
Total                  36

$\sum x = \sum_{k=1}^{K} f_k a_k = 1 \times 5 + 0 \times 6 + 2 \times 7 + 3 \times 8 + 3 \times 9 + 9 \times 10 + 8 \times 11 + 5 \times 12 + 3 \times 13 + 2 \times 14 = 375$
$\bar{x} = 375 / 36 = 10.42$

Median
Sample median of X = middle value when the n sample observations are ranked in increasing order
= the ((n + 1)/2)th value
n odd:
  values: 183, 163, 152, 157 and 157
  rank order: 152, 157, 157, 163, 183
  median: 157
n even:
  values: 165, 173, 180, 164
  rank order: 164, 165, 173, 180
  median: (165 + 173)/2 = 169

Median (litter sizes)
From the litter-size table, n = 36, so the median is the 18.5th value. The cumulative frequencies show that the 18th value is 10 and the 19th value is 11.
Median = 10.5

Median from cumulative distribution
[Figure: cumulative % frequency polygon]

Mode
Sample mode = value with highest frequency (may not be unique)
From the litter-size table, the highest frequency is 9, at litter size 10.
Mode = 10

Skew
• left skewed: mean < mode
• symmetric: mean ≈ mode
• right skewed: mean > mode

Variance
Measure of spread: variance
[Figure: two charts over the values 1 to 18 (vertical axis 0 to 45), illustrating spread]

Variance
sample variance = $s^2$
sample standard deviation = $s = \sqrt{\text{variance}}$

Variance and standard deviation
Using the litter-size table (n = 36):
$\sum x^2 = \sum_{k=1}^{K} f_k a_k^2 = 1 \times 25 + 0 \times 36 + 2 \times 49 + 3 \times 64 + 3 \times 81 + 9 \times 100 + 8 \times 121 + 5 \times 144 + 3 \times 169 + 2 \times 196 = 4045$
$\sum x = 375$
$(\sum x)^2 / n = 375 \times 375 / 36 = 3906.25$
$s^2 = (4045 - 3906.25) / (36 - 1) = 3.96$
$s = 1.99$

Piglets
Mean = 10.42
Median = 10.5
Mode = 10
Std. devn. = 1.99

Quartiles and Range
Lower quartile (Q1): value such that 25% of observations are below it.
Median (Q2): value such that 50% of observations are below (above) it.
Upper quartile (Q3): value such that 25% of observations are above it.
Range: the minimum (m) and maximum (M) observations.
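The measures of location and spread defined so far can all be computed directly from a frequency table. The following is a minimal Python sketch using the litter-size frequencies from the slides; the function name `summarise` and the dictionary layout are illustrative choices and do not come from the lecture. The variance uses the computational form $s^2 = (\sum x^2 - (\sum x)^2 / n) / (n - 1)$.

```python
from math import sqrt

# Litter-size frequency table from the slides: value a_k -> frequency f_k.
LITTER_FREQ = {5: 1, 6: 0, 7: 2, 8: 3, 9: 3, 10: 9, 11: 8, 12: 5, 13: 3, 14: 2}


def summarise(freq):
    """Summary statistics from a frequency table {value: frequency} (needs n >= 2)."""
    n = sum(freq.values())
    sum_x = sum(f * a for a, f in freq.items())        # sum of f_k * a_k
    sum_x2 = sum(f * a * a for a, f in freq.items())   # sum of f_k * a_k^2
    mean = sum_x / n

    # Expand the table into ranked observations to find the middle value(s).
    ranked = [a for a in sorted(freq) for _ in range(freq[a])]
    mid = n // 2
    median = ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2

    # The mode may not be unique, so return every value with the top frequency.
    top = max(freq.values())
    modes = [a for a in sorted(freq) if freq[a] == top]

    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return {
        "n": n,
        "sum_x": sum_x,
        "sum_x2": sum_x2,
        "mean": mean,
        "median": median,
        "modes": modes,
        "variance": variance,
        "std_dev": sqrt(variance),
    }


stats = summarise(LITTER_FREQ)
for name, value in stats.items():
    print(name, value)
```

For the litter-size table this gives n = 36, Σx = 375, Σx² = 4045, a mean of about 10.42, a median of 10.5, a single mode of 10, a variance of about 3.96 and a standard deviation of about 1.99.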
Box and Whisker plot: m, Q1, Q2, Q3, M

Estimating quartiles

Linear Regression
Calculate m and c so that $\sum (\text{distance of point from line})^2$ is minimised, where the line is $y = mx + c$.
[Figure: the line y = mx + c drawn on x and y axes]

Time Series - Moving Average

Time    Y    3 point MA
  0    24        *
  1    18     23.0000
  2    27     22.3333
  3    22     25.6667
  4    28     28.0000
  5    34     31.0000
  6    31     36.6667
  7    45     38.0000
  8    38     39.3333
  9    35        *

• smoothing function
• can compute median, max, min, std. devn, etc. in window
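The last two slides can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names `fit_line` and `moving_average` are assumptions. `fit_line` assumes the usual least-squares reading of "distance" as the vertical distance from each point to the line, and `moving_average` assumes a centred window of odd width, with `None` standing in for the `*` entries where the window does not fit.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for the line y = m*x + c.

    Minimises the sum of squared vertical distances from the points to the line.
    Assumes at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def moving_average(ys, window=3):
    """Centred moving average; None where the full window is not available.

    Assumes `window` is odd so that it is centred on a single time point.
    """
    half = window // 2
    smoothed = []
    for i in range(len(ys)):
        if i < half or i + half >= len(ys):
            smoothed.append(None)
        else:
            chunk = ys[i - half:i + half + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Time-series values from the moving-average slide.
Y = [24, 18, 27, 22, 28, 34, 31, 45, 38, 35]
print(moving_average(Y))

# Points lying exactly on y = 2x + 1 (made-up check data), so m = 2 and c = 1.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

Applied to the Y column, `moving_average` reproduces the 3 point MA column of the table (23.0000, 22.3333, ..., 39.3333). Replacing `sum(chunk) / window` with a median, max or min of `chunk` gives the other windowed summaries mentioned on the slide.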