Exercises 1C Solutions

Question 1C.1
Consider the following data set:

78 41 100 47 71 51 22 60 1 41 24 45 50 76 42 23 21 10 46

(i) Is this data set left-skewed, right-skewed or symmetric?

To answer this, we sort the data into ascending order and informally cluster the values. There appears to be a central cluster in the range 40-60, a smaller cluster on either side of this (the 20s and the 70s), and some outliers. The data appear to be reasonably symmetric. The sorted data, one cluster per row:

1 10
21 22 23 24
41 41 42 45 46 47 50 51 60
71 76 78
100

(ii) Construct a 4-bar histogram for the data set. Would you describe the data as left-skewed, right-skewed or symmetric?

The data values lie in the range 1-100, so dividing this into four gives the bins 1-25, 26-50, 51-75 and 76-100. The corresponding frequencies are 6, 7, 3, 3. The histogram is right-skewed.

Bin       Frequency
1-25      6
26-50     7
51-75     3
76-100    3

[Figure: 4-bar histogram of the frequencies above]

(iii) Construct a 5-bar histogram for the data set. Would you describe the data as left-skewed, right-skewed or symmetric?

The data values lie in the range 1-100, so dividing this into five gives the bins 1-20, 21-40, 41-60, 61-80 and 81-100. The corresponding frequencies are 2, 4, 9, 3, 1. The histogram is symmetric.

Bin       Frequency
1-20      2
21-40     4
41-60     9
61-80     3
81-100    1

[Figure: 5-bar histogram of the frequencies above]

Question 1C.2
Which of the following situations give a distribution that is skewed-to-the-right, skewed-to-the-left, unimodal symmetric, or bimodal symmetric?

a) Distances completed by people competing in a marathon. It is reasonable to assume that most people entering will finish the course, but some will have overestimated their abilities or fallen ill on the day. It is also reasonable to assume that more will get close to the end than will give up early. Most distances therefore pile up at the full course length, with a tail towards shorter distances. This gives a distribution that is left-skewed.
b) Scores on an easy quiz given the day before spring break when only half the class shows. Half the students in the class are absent and get zeros. Because the quiz is easy, those that show are likely to do well, so we expect a bimodal distribution.
c) Money taken at the box office by movies released in a given year. Most movies are flops; only a few are very successful. The few big earners form a long tail towards high values. This gives a right-skewed distribution.
d) Heights of 8 year old girls. These are likely to be symmetric about the average height, with just as many really tall girls as really short girls. This gives a unimodal symmetric distribution.

Question 1C.3
Find the mean and median of the data presented in Question 1C.1. That is, of the data set:

78 41 100 47 71 51 22 60 1 41 24 45 50 76 42 23 21 10 46

(i) For the mean we sum these values to get 849. There are 19 data values and so the mean is 849/19 = 44.68.

(ii) The data was ordered in the solution to Question 1C.1:

1 10 21 22 23 24 41 41 42 45 46 47 50 51 60 71 76 78 100

The middle (10th) number, 45, is the median.

Question 1C.4
Of the two averages computed in Question 1C.3, which average is better?

The mean of 44.68 and the median of 45 are very similar. Neither choice is better.

Question 1C.5
Carry out a 5-number summary for the data provided in Question 1C.1. Use the information to construct a boxplot.

The data set was ordered in the solution to Question 1C.1. The median and the two quartiles are marked in brackets:

1 10 21 22 [23] 24 41 41 42 [45] 46 47 50 51 [60] 71 76 78 100

Since there are 19 data values, 9 lie below the median and 9 above the median. To find the two quartiles we need the middle value of each of these two sets, namely 23 and 60. The 5-number summary is therefore given by:

Low = 1   LQ = 23   Median = 45   UQ = 60   High = 100

This information looks quite symmetric as a boxplot.

[Figure: boxplot of the 5-number summary]

Question 1C.6
Find the mean and standard deviation for the following sets of data:

a) 0, 1, 3, 5, 7, 7, 9, 11, 13, 14
The mean is 70/10 = 7.
For the standard deviation we set up the table:

Value   Deviation from Mean   Squared Deviation
0       -7                    49
1       -6                    36
3       -4                    16
5       -2                    4
7       0                     0
7       0                     0
9       2                     4
11      4                     16
13      6                     36
14      7                     49

The sum of the squared deviations is 210. Divide this by 9 (one less than the number of values) to get a variance of 210/9, and take the square root to get a standard deviation of 4.83.

b) 0, 0, 0, 0, 7, 7, 14, 14, 14, 14
The mean is 70/10 = 7.
For the standard deviation we set up the table:

Value   Deviation from Mean   Squared Deviation
0       -7                    49
0       -7                    49
0       -7                    49
0       -7                    49
7       0                     0
7       0                     0
14      7                     49
14      7                     49
14      7                     49
14      7                     49

The sum of the squared deviations is 392. Divide this by 9 to get a variance of 392/9, and take the square root to get a standard deviation of 6.60.

Question 1C.7
A data set is such that its mean is 20, and all of its values lie between 0 and 10 or between 30 and 40. Which is correct? The standard deviation is
a) less than 0
b) between 0 and 5
c) between 5 and 10
d) greater than 10

Since the mean is 20 and every data point lies between 10 and 20 units from it, every squared deviation is at least 100. The average of the squared deviations is therefore at least 100, so the standard deviation cannot be smaller than 10. Of the options given, only d) is possible, so d) is correct.
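
The arithmetic in Questions 1C.1, 1C.3, 1C.5 and 1C.6 can be checked with a short script. The following is a minimal sketch in Python using only the standard library. The helper names (bin_counts, five_number_summary, sum_squared_deviations, sample_std) are illustrative and not part of the original exercises. The quartile rule is assumed to be the one used in the solution to Question 1C.5 (the middle value of each half, leaving out the median when the number of values is odd); other software may use a different quartile convention. The final data set, used to illustrate Question 1C.7, is made up for the purpose.

```python
import math
import statistics

# Data set from Question 1C.1.
data = [78, 41, 100, 47, 71, 51, 22, 60, 1, 41,
        24, 45, 50, 76, 42, 23, 21, 10, 46]


def bin_counts(values, bins):
    """Count how many values fall in each inclusive (low, high) bin."""
    return [sum(low <= v <= high for v in values) for low, high in bins]


def five_number_summary(values):
    """Return (low, LQ, median, UQ, high).

    Assumed rule: the quartiles are the medians of the lower and upper
    halves, with the overall median left out when the count is odd.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])


def sum_squared_deviations(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sample_std(values):
    """Standard deviation with the n - 1 divisor used in Question 1C.6."""
    return math.sqrt(sum_squared_deviations(values) / (len(values) - 1))


four_bins = [(1, 25), (26, 50), (51, 75), (76, 100)]
five_bins = [(1, 20), (21, 40), (41, 60), (61, 80), (81, 100)]
set_a = [0, 1, 3, 5, 7, 7, 9, 11, 13, 14]
set_b = [0, 0, 0, 0, 7, 7, 14, 14, 14, 14]
# Hypothetical data for Question 1C.7: mean 20, values in 0-10 or 30-40.
set_c = [5, 5, 35, 35]

print("4-bar frequencies:", bin_counts(data, four_bins))
print("5-bar frequencies:", bin_counts(data, five_bins))
print("mean:", round(sum(data) / len(data), 2))
print("5-number summary:", five_number_summary(data))
print("std a:", f"{sample_std(set_a):.2f}")
print("std b:", f"{sample_std(set_b):.2f}")
print("std c:", f"{sample_std(set_c):.2f}")
```

Writing out sum_squared_deviations separately mirrors the table method used in Question 1C.6: the intermediate totals (210 and 392) can be compared directly with the tables before the square root is taken.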