LBSRE1021 Data Interpretation
Lecture 3: Location and Dispersion

OBJECTIVES
• State some basic definitions.
• Prepare a frequency distribution using given data.
• Plot a frequency distribution as a bar chart.
• Explain the terms Location and Dispersion.

OBJECTIVES
Calculate the:
• arithmetic mean
• standard deviation
• coefficient of variation
• median
• quartiles
• mode
for a given set of data.

Definition
Statistics: The organisation of data to enable meaningful analysis.

Definitions
Population: Every member of the group in which you are interested, e.g. the consumers of a certain product, the employees of a company.
Sample: A sub-set of the population on which measurements may be made that approximate to those of the population.

More definitions
Variable: Something which can be measured or counted, e.g. age, weight, salary, no. children.
Frequency: The number of times a particular value of a variable occurs.

Frequency Distribution
e.g. Age of students on AE1021

Age (the variable)    No. Students (the frequency)
18                    102
19                    116
20                     63
21                     41
22 and over            30
TOTAL                 352

Grouped Frequency Distribution
Lengths of 100 copper pipes

Length (cm)            Frequency
10 but under 20         3
20 but under 30         7
30 but under 40        10
40 but under 50        16
50 but under 60        34
60 but under 70        13
70 but under 80         7
80 but under 90         6
90 but under 100        4

Bar Chart
[Figure: Bar Chart for Lengths of Pipes. Vertical axis: No. Pipes (0 to 40); horizontal axis: Length (cm), in classes from 10 to 20 up to 90 to 100.]

Measures of Location
Averaging of Data
• Summarise data to a single statistic.
• Allows comparison of e.g.
  - average incomes
  - average rainfall
  - average expenditure

Measures of Location: Mean
Arithmetic Mean
e.g. for 5, 7, 9, 10: $\frac{5 + 7 + 9 + 10}{4} = 7.75$
In general: $\bar{x} = \frac{\sum x}{n}$

Measures of Location: Median
Median
• Arrange numbers in ascending order.
• Item in centre is the median.
• Unaffected by extremes.
e.g. for these data 11, 14, 14, 21, 25, 27, 30 the median is 21.

Measures of Location: Mode
Mode: The value which occurs most often.
For the data on the previous slide the mode is 14.

Dispersion
• The mean gives us the location.
• We also need a measure of the dispersion, or spread, of the data.
Range, e.g.

Month              J    F    M    A    M    J
Average Price (p)  155  143  144  139  140  141

Range = 155 - 139 = 16p
Problem: the range is concerned only with the extremes.

Dispersion: Quartiles
[Figure: Histogram of bolt lengths. Axes: Length (cm) against Frequency, with a second percentage scale.]

Dispersion: Quartiles
QUARTILES
• The quartiles are four values in a data set.
• First Quartile (Q1) is a value such that 25% of items in the data set have this value or less.
• Second Quartile (median) is a value such that 50% of items in the data set have this value or less.
• Third Quartile (Q3) is a value such that 75% of items in the data set have this value or less.
• Fourth Quartile is a value such that 100% of items in the data set have this value or less.

Quartiles: Example
• Switchboard data refers to the number of telephone calls to a switchboard each hour for two days:
  - Day 1: 26 25 30 31 27 27 26 29
  - Day 2: 30 32 27 28 26 27 28 29
• To find the first quartile and the third quartile, arrange the 16 data items in ascending order as follows:
  - 25 26 26 26 27 27 27 27 28 28 29 29 30 30 31 32

Quartiles: Example
• The first quartile will be the value of the (n + 1) * 1/4 item.
• The third quartile will be the value of the (n + 1) * 3/4 item.
• In this case Q1 will be the value of the 4.25th item, which lies between 26 and 27.
  - By interpolation, the first quartile will be 26.25.
• Q3 will be the value of the 12.75th item, which lies between 29 and 30.
  - So the third quartile will be 29.75.
• (Q2 is the value of the (n + 1)/2 item, i.e. the Median.)

Deviation
In a set of data each item deviates from the mean.
e.g. for the price data above the mean ($\bar{x}$) is 143.67, and each item deviates from the mean by $(x - \bar{x})$:

data (x)    mean      (x - mean)
155         143.67     11.33
143         143.67     -0.67
144         143.67      0.33
139         143.67     -4.67
140         143.67     -3.67
141         143.67     -2.67

$\sum (x - \bar{x}) = 0$

Dispersion: Variance
• Deviations from the mean $(x - \bar{x})$ are:
  - +ve and -ve
  - sum to zero
• $(x - \bar{x})^2$ is always +ve.
• The average of the squares of the deviations is the Variance:
  Variance = $\frac{\sum (x - \bar{x})^2}{n}$

Standard Deviation
The square root of the Variance is the Standard Deviation:
$\text{s.d.} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$

S.D. Example

Price (x)    x^2
155           24025
143           20449
144           20736
139           19321
140           19600
141           19881
Total 862    124012

$\text{s.d.} = \sqrt{\frac{124012}{6} - \left(\frac{862}{6}\right)^2} = 5.34$

Central Tendency
The mean and Standard Deviation describe the area of central tendency and the spread of the data.
[Figure: distribution curve. Labels: Mean (Mode and Median are the same if the distribution is symmetrical); Spread.]

Coefficient of Variation
• Standard Deviation / Arithmetic Mean, expressed as a %.
• Shows the dispersion relative to the mean.
e.g. Comparing two data sets with the means and s.d. given below, what conclusions can you draw?

Data set    Mean    s.d.    Coeff of Var
1           100     20      20%
2           50      15      30%
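
Worked check in Python

The calculations in this lecture can be reproduced with a short script. This is a minimal sketch and not part of the original slides: it assumes Python 3 and its standard `statistics` module, and the helper names `quartile`, `population_sd` and `coefficient_of_variation` are illustrative. `quartile` follows the (n + 1) * k/4 rule with linear interpolation from the switchboard example; other software may use a different quartile convention and return slightly different values.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, mode, pstdev


def quartile(data, k):
    """Return the k-th quartile (k = 1, 2 or 3) as the value of the
    (n + 1) * k/4 item of the sorted data, interpolating linearly."""
    ordered = sorted(data)
    position = (len(ordered) + 1) * k / 4  # 1-based, possibly fractional
    whole = int(position)                  # item just below the position
    fraction = position - whole
    if whole < 1:                          # position falls before the first item
        return ordered[0]
    if whole >= len(ordered):              # position falls on or after the last item
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


def population_sd(data):
    """Standard deviation with divisor n, using the shortcut form
    sqrt(sum(x^2)/n - (sum(x)/n)^2)."""
    n = len(data)
    return sqrt(sum(x * x for x in data) / n - (sum(data) / n) ** 2)


def coefficient_of_variation(sd, mean_value):
    """Standard deviation as a percentage of the arithmetic mean."""
    return 100 * sd / mean_value


# Measures of location
print(mean([5, 7, 9, 10]))                    # 7.75
ages = [11, 14, 14, 21, 25, 27, 30]
print(median(ages), mode(ages))               # 21 14

# Range and standard deviation of the monthly prices (pence)
prices = [155, 143, 144, 139, 140, 141]
print(max(prices) - min(prices))              # 16
print(round(population_sd(prices), 2))        # 5.34
print(round(pstdev(prices), 2))               # library cross-check, also 5.34

# Frequency distribution and quartiles of the switchboard data
calls = [26, 25, 30, 31, 27, 27, 26, 29,      # Day 1
         30, 32, 27, 28, 26, 27, 28, 29]      # Day 2
frequency = Counter(calls)                    # value -> number of occurrences
print(sorted(frequency.items()))
print(quartile(calls, 1), quartile(calls, 3)) # 26.25 29.75

# Coefficient of variation for the two data sets in the final table
print(coefficient_of_variation(20, 100))      # 20.0
print(coefficient_of_variation(15, 50))       # 30.0
```

In the final comparison, data set 2 has the smaller standard deviation but the larger coefficient of variation, so its spread is greater relative to its mean.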