4. Interpreting sets of data
Cambridge University Press, G K Powers 2013

Grouped frequency tables
1. Classes or groups are listed in the first column in ascending order.
2. The tally column shows the number of times a score occurs in a class.
3. The frequency column shows the total count of the scores in each class.
HSC Hint – The class centre is the middle of the class. It is calculated by adding the two extremes of the class and dividing by 2.

Cumulative frequency
Cumulative frequency is the frequency of the score plus the frequency of all the scores less than that score. It is the progressive total of the frequencies.

Score   Frequency   Cumulative frequency
18      1           1
19      5           6
20      3           9
21      7           16

HSC Hint – The last number in the cumulative frequency column equals the total number of scores.

Cumulative frequency graphs
[Figures: cumulative frequency histogram; cumulative frequency polygon]
HSC Hint – The cumulative frequency polygon joins the top right corners of the rectangles in a cumulative frequency histogram.

Mean
The mean is a measure of the centre. It is calculated by summing all the scores and dividing by the number of scores.

Mean = Sum of scores / Number of scores

$\bar{x} = \frac{\sum x}{n}$ or, from a frequency table, $\bar{x} = \frac{\sum fx}{\sum f}$

$\sum$ – 'Sum of' (Greek capital letter sigma)
$x$ – A score or data value
$\bar{x}$ – Mean of a set of scores
$n$ – Total number of scores
$f$ – Frequency

HSC Hint – Make sure all data has been cleared before using the calculator for statistics.

Mode
The mode is the score that occurs the most number of times, that is, the score with the highest frequency.
To find the mode, determine the number of times each score occurs and choose the score that occurs most often. If two or more scores share the highest frequency, each of them is regarded as a mode.
HSC Hint – Data is called bimodal if it contains two modes.
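The cumulative frequency, mean and mode calculations in this section can be checked with a short Python sketch. It uses the frequency table from the cumulative frequency example (scores 18 to 21). The function names `cumulative_frequencies`, `mean_from_table` and `modes` are hypothetical helpers chosen for illustration, and storing the table as a dictionary of score to frequency is an assumption.

```python
def cumulative_frequencies(freq_table):
    """Return (score, frequency, cumulative frequency) rows in ascending score order."""
    rows = []
    running_total = 0
    for score in sorted(freq_table):
        running_total += freq_table[score]  # progressive total of the frequencies
        rows.append((score, freq_table[score], running_total))
    return rows


def mean_from_table(freq_table):
    """Mean from a frequency table: sum of f*x divided by sum of f."""
    total_fx = sum(score * f for score, f in freq_table.items())
    total_f = sum(freq_table.values())
    return total_fx / total_f


def modes(freq_table):
    """Return every score that shares the highest frequency."""
    highest = max(freq_table.values())
    return sorted(score for score, f in freq_table.items() if f == highest)


# Frequency table from the cumulative frequency example: score -> frequency
table = {18: 1, 19: 5, 20: 3, 21: 7}

for score, f, cf in cumulative_frequencies(table):
    print(score, f, cf)

print("Mean:", mean_from_table(table))
print("Mode(s):", modes(table))
```

The last cumulative frequency printed is 16, which matches the total number of scores, as the hint above states. Returning a list from `modes` is a design choice so that bimodal data reports both modes.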

Median
The median is the middle score or value. A cumulative frequency polygon is used to estimate the median.
HSC Hint – The total number of scores is the value of the cumulative frequency for the last score or class.

Range and interquartile range
Range = Highest score – Lowest score
The interquartile range is the difference between the third quartile and the first quartile: $\text{IQR} = Q_3 - Q_1$.
To calculate the interquartile range (IQR):
1. Arrange the data in increasing order.
2. Divide the data into two equal-sized groups. If $n$ is odd, omit the median.
3. Find $Q_1$, the median of the first group.
4. Find $Q_3$, the median of the second group.
5. Calculate the interquartile range: $\text{IQR} = Q_3 - Q_1$.
HSC Hint – Unlike the range, the interquartile range does not depend on the extreme values.

Standard deviation
The standard deviation is a measure of the spread of data about the mean. Two calculations are used for standard deviation.
Population standard deviation ($\sigma_n$) is the better measure when we have all of the data or the entire population.
Sample standard deviation ($\sigma_{n-1}$) is the better measure when a sample is taken from a large population.
HSC Hint – Population standard deviation or sample standard deviation can be used if it is not specified.

Investigating sets of data
An outlier is a score that is separated from the majority of the data. Outliers have little effect on the mean, median and mode for large sets of data. However, in small data sets, the presence of an outlier will have a large effect on the mean, a smaller effect on the median and usually no effect on the mode.
The shape of the graph is described in terms of smoothness, symmetry and the number of modes.
HSC Hint – An outlier is a score that is not close to any other scores. It is not typical.
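The interquartile range steps and the two standard deviation calculations in this section can be sketched in Python as follows. The data set `scores` is made up for illustration, and `quartiles` and `interquartile_range` are hypothetical helper names. The sketch assumes that `statistics.pstdev` and `statistics.stdev` from the Python standard library correspond to the population and sample standard deviations respectively.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q3) using the split-the-data method described above."""
    values = sorted(data)              # Step 1: arrange in increasing order
    n = len(values)
    half = n // 2
    lower = values[:half]              # Step 2: first group
    # If n is odd, the median sits at index `half` and is omitted.
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    q1 = statistics.median(lower)      # Step 3: median of the first group
    q3 = statistics.median(upper)      # Step 4: median of the second group
    return q1, q3


def interquartile_range(data):
    """Step 5: IQR = Q3 - Q1."""
    q1, q3 = quartiles(data)
    return q3 - q1


# Hypothetical data set, not taken from the source
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print("Range:", max(scores) - min(scores))
print("Quartiles:", quartiles(scores))
print("IQR:", interquartile_range(scores))
print("Population standard deviation:", statistics.pstdev(scores))
print("Sample standard deviation:", statistics.stdev(scores))
```

The sample standard deviation divides by n − 1 rather than n, so it is always slightly larger than the population value for the same data. The sketch needs at least two scores, since each group must contain a value.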

Symmetry and skewness
No skew (symmetric): Data is symmetrical and balanced about a vertical line.
Positively skewed: Data is more on the left side. The long tail is on the right side.
Negatively skewed: Data is more on the right side. The long tail is on the left side.
HSC Hint – Mean, mode and median are equal when the data is symmetrical.

Number of modes
Unimodal: Data has only 1 mode or peak.
Bimodal: Data has 2 modes or peaks.
Multimodal: Data has many modes or peaks.
HSC Hint – List all the modes if the data is multimodal.

Double stem-and-leaf plots
A stem-and-leaf plot has the tens digit of the data (the 'stem') written in numerical order down the page. The units digits become the 'leaves' and are written in numerical order across the page.
HSC Hint – The numbers in the 'leaves' of a stem-and-leaf plot must be written in increasing order.

Double box-and-whisker plots
A box-and-whisker plot is a graph that uses the five-number summary: lower extreme, lower quartile, median, upper quartile and upper extreme. A double box-and-whisker plot shows two sets of data.
HSC Hint – To draw a box plot, arrange the data in order before calculating the five-number summary.

Radar charts
A radar chart looks like a spider web and is used to compare the performance of one or more entities.
HSC Hint – Line segments in a radar chart must be constructed accurately to ensure the information is valid.

Area chart
An area chart is a graph consisting of different 'areas', each representing a data set over a period of time. The thickness of the area indicates the size of the data.
HSC Hint – To read data from an area chart, draw a vertical line and estimate the difference between the heights.
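The stem-and-leaf construction described in this section (tens digit as the stem, units digits as leaves in increasing order) can be sketched in Python. The two data sets and the names `stem_and_leaf` and `double_stem_and_leaf` are hypothetical, and the sketch assumes non-negative whole-number scores. The left-hand leaves are printed outward from the stem, which is one common layout for a double plot rather than the only one.

```python
def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in increasing order."""
    plot = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot.setdefault(stem, []).append(leaf)
    return plot


def double_stem_and_leaf(left, right):
    """Return the lines of a back-to-back plot sharing one column of stems."""
    left_plot = stem_and_leaf(left)
    right_plot = stem_and_leaf(right)
    all_stems = set(left_plot) | set(right_plot)
    lines = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(all_stems), max(all_stems) + 1):
        # Left leaves read outward from the stem, so they are reversed.
        left_leaves = " ".join(str(leaf) for leaf in reversed(left_plot.get(stem, [])))
        right_leaves = " ".join(str(leaf) for leaf in right_plot.get(stem, []))
        lines.append(f"{left_leaves:>12} | {stem} | {right_leaves}")
    return lines


# Hypothetical scores for two classes, not taken from the source
class_a = [12, 15, 21, 23, 23, 30]
class_b = [14, 18, 19, 25, 31, 36]

for line in double_stem_and_leaf(class_a, class_b):
    print(line)
```

Sorting the values before splitting them into stems and leaves guarantees the leaves appear in increasing order, as the hint requires.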

Comparison – Measures of location

Mean
Advantages: Easy to understand and calculate. Depends on every score. Varies least from sample to sample.
Disadvantages: Distorted by outliers. Not suitable for categorical data.

Median
Advantages: Easy to understand. Not affected by outliers.
Disadvantages: May not be central. Varies more than the mean in a sample.

Mode
Advantages: Easy to determine. Not affected by outliers. Suitable for categorical data.
Disadvantages: May be no mode or more than one mode. May not be central.

Comparison – Measures of spread

Range
Advantages: Easy to understand. Easy to calculate.
Disadvantages: Dependent on the smallest and largest values. May be distorted by outliers.

Interquartile range
Advantages: Easy to determine for small data sets. Easy to understand. Not affected by outliers.
Disadvantages: Difficult to calculate for large data sets. Dependent on lower and upper quartiles. Data needs to be sorted.

Standard deviation
Advantages: Depends on every score.
Disadvantages: Affected by outliers. Difficult to determine without a calculator. Difficult to understand.

Two-way tables
A two-way table presents data using rows and columns. Data in a cell is interpreted by reading the headings for the row and the column.
HSC Hint – Calculate the totals across each row and down each column. Then add the row totals and add the column totals. The two results should be equal.
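The totals check in the two-way table hint can be sketched in Python. The table below (year group by travel method) is invented for illustration, and `two_way_totals` and `totals_agree` are hypothetical helper names. The check is most useful when the row and column totals have been calculated by hand and copied in, so `totals_agree` takes them as separate arguments.

```python
def two_way_totals(cells):
    """Return (row totals, column totals) for a table given as a list of rows."""
    row_totals = [sum(row) for row in cells]
    column_totals = [sum(column) for column in zip(*cells)]
    return row_totals, column_totals


def totals_agree(row_totals, column_totals):
    """The sum of the row totals should equal the sum of the column totals."""
    return sum(row_totals) == sum(column_totals)


# Hypothetical two-way table, not taken from the source
#              Walk  Bus
cells = [
    [12, 18],  # Year 11
    [9, 21],   # Year 12
]

row_totals, column_totals = two_way_totals(cells)
print("Row totals:", row_totals)
print("Column totals:", column_totals)
print("Grand total:", sum(row_totals))
print("Totals agree:", totals_agree(row_totals, column_totals))

# A hand-copied column total with a slip (38 instead of 39) fails the check.
print("Totals agree with an error:", totals_agree(row_totals, [21, 38]))
```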