Advanced Higher Geography
Descriptive Statistics

Descriptive statistics include:
Types of data
Measures of central tendency
Measures of dispersion

Types of data (1)
Nominal data: data sorted into named categories, eg: rock types (sedimentary, igneous or metamorphic).
Ordinal data: data that can be placed in ascending or descending order, eg: settlement type (city, town, village & hamlet).

Types of data (2)
Interval data: data measured on a scale with no true zero. Very uncommon so don't worry about it.
Ratio data: numerical data with a true zero. Most numerical data is of this type.

Central Tendency
When you calculate the central tendency of a data set you calculate its average. The measures of central tendency are the mean, the median and the mode.

The Mean
The mean is one of the most commonly used statistics in geography. It is found by totalling the values for all observations (∑x) and dividing by the total number of observations (n). The formula for finding the mean is:
Mean = ∑x / n

The Median
The median is the middle value when all of the data is placed in ascending or descending order. Where there are two middle values we take the average of these.

The Mode
The mode is the value that occurs most often. Sometimes there are two (or more) modes. Where there are two modes the data is said to be bi-modal.

Exercise (5 mins)
Find the mean, median and mode of the following data. The weekly pocket money for 9 first year pupils was found to be:
3, 12, 4, 6, 1, 4, 2, 5, 8
Mean: 5 (45 ÷ 9)
Median: 4 (the 5th value of 1, 2, 3, 4, 4, 5, 6, 8, 12)
Mode: 4

Groups of data
Sometimes the data we collect are in group form.

Slope Angle (°)   Midpoint (x)   Frequency (f)   Midpoint × frequency (fx)
0-4               2              6               12
5-9               7              12              84
10-14             12             7               84
15-19             17             5               85
20-24             22             0               0
Total                            n = 30          ∑(fx) = 265

Finding the mean of grouped data is slightly more difficult. We take the midpoint of each group (x), multiply it by the frequency of that group (f), add up the results and divide by the total number of observations (n).
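The averages above can be checked with a short script. This is a minimal Python sketch and not part of the original notes: the function names (mean, median, modes, grouped_mean) are illustrative choices, and each slope group is written as (lower limit, upper limit, frequency) so that the midpoint can be worked out from the limits.

```python
from collections import Counter


def mean(values):
    """Total of all observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values sharing the highest count (two values means bi-modal data)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def grouped_mean(groups):
    """Mean of grouped data: sum of (midpoint x frequency) divided by n."""
    n = sum(f for _, _, f in groups)
    total_fx = sum(((low + high) / 2) * f for low, high, f in groups)
    return total_fx / n


# Data taken from the pocket money exercise and the slope angle table.
pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]
slope_groups = [(0, 4, 6), (5, 9, 12), (10, 14, 7), (15, 19, 5), (20, 24, 0)]

print(mean(pocket_money))                    # 5.0
print(median(pocket_money))                  # 4
print(modes(pocket_money))                   # [4]
print(round(grouped_mean(slope_groups), 1))  # 8.8
```

Returning a list from modes keeps the bi-modal case visible: a data set with two equally common values gives back both of them.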
The mean is:
∑(fx) / n = 265 / 30 = 8.8
which is in the 5-9 group.

We cannot find the mode for grouped data but we can find the modal group. The modal group is the group with the highest frequency (ie: the 5-9 group, which has a frequency of 12).

Your turn
Read pages 25-29 of 'Geographical Measurements and Techniques: Statistical Awareness' (LT Scotland, June 2000). Answer questions 1 & 2 from Task 4 in this book.

The Interquartile Range
The interquartile range consists of the middle 50% of the values in a distribution: 25% each side of the median (middle value). This calculation is useful because it shows how closely the values are grouped around the median.

The benefits
It is easy to calculate.
It is unaffected by extreme values.
It is a useful way of comparing sets of similar data.

Interquartile range
We know that the median divides the data into two halves. We also know that for a set of n ordered numbers the median is the (n + 1) ÷ 2 th value. Similarly, the lower quartile divides the bottom half of the data into two halves, and the upper quartile divides the upper half of the data into two halves.
Lower quartile is the (n + 1) ÷ 4 th value
Upper quartile is the 3 (n + 1) ÷ 4 th value
The interquartile range is the upper quartile minus the lower quartile.

[Question slide: content not available]

Box and whisker diagrams
A box and whisker plot is used to display information about the range, the median and the quartiles. It is usually drawn alongside a number line.
[Figure: box and whisker plot drawn alongside a number line]

Drawbacks of the interquartile range
It can be a laborious process to calculate the location of the quartiles, especially when there is a large number of values in the data set.
It does not give any indication of how the entire data set is distributed, just the limits of the middle 50% of the data.
Not all values are considered, so a false impression may be given of the data set being analysed.

Standard Deviation
You could have 2 sets of data that produce the same mean, but the data may have a very different range of values within them. Standard deviation (SD) is a tool that produces a figure indicating the extent to which the data is clustered around the mean.

The Normal distribution curve
The normal curve assumes that the data in your sample is distributed symmetrically around the mean.
The standard deviation gives important information as it indicates the shape of the normal curve.
If the SD is large, it suggests a wide spread of data around the mean and a flatter, wider normal distribution curve.
If the SD is small, it suggests a narrow spread of data around the mean and a steep, narrow normal distribution curve.
A smaller SD suggests a more reliable mean, as there are likely to be few extreme values.
The SD is also useful for comparing two sets of data that may have similar means but quite different ranges of data within each set.
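The two measures of dispersion described above can be tried out on the pocket money data from earlier. This is a minimal Python sketch resting on two assumptions that the notes do not state. First, when a quartile position such as 2.5 falls between two values, the sketch interpolates between them. Second, the standard deviation is taken as the square root of the mean squared difference from the mean (dividing by n). The function names and the second data set (tight_set) are made up for illustration.

```python
import math


def value_at_position(ordered, position):
    """Value at a 1-based position in ordered data.

    Assumption: a fractional position (eg 2.5) is interpolated
    between the two neighbouring values.
    """
    lower_index = math.floor(position)
    fraction = position - lower_index
    lower_value = ordered[lower_index - 1]
    if fraction == 0:
        return lower_value
    upper_value = ordered[lower_index]
    return lower_value + fraction * (upper_value - lower_value)


def quartiles(values):
    """Lower and upper quartiles using the (n + 1) / 4 and 3(n + 1) / 4 positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 values to locate the quartiles")
    lower = value_at_position(ordered, (n + 1) / 4)
    upper = value_at_position(ordered, 3 * (n + 1) / 4)
    return lower, upper


def standard_deviation(values):
    """Assumed form: square root of the mean squared difference from the mean."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / n)


pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]  # from the earlier exercise
tight_set = [4, 5, 5, 5, 6]                  # hypothetical data, also with a mean of 5

lower, upper = quartiles(pocket_money)
print(lower, upper, upper - lower)                    # 2.5 7.0 4.5
print(round(standard_deviation(pocket_money), 2))     # 3.16
print(standard_deviation(tight_set) < standard_deviation(pocket_money))  # True
```

Both data sets have a mean of 5, but the hypothetical tight_set has a much smaller standard deviation, which is the situation the notes describe: similar means with quite different spreads.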