Homework #2: Due Friday Sept. 10, 2004

1. A clinician administered the Liebowitz Social Anxiety Scale to a sample of 12 individuals. People with “moderate” social anxiety score between 55-65, people with “marked” social anxiety score between 65-80, people with “severe” social anxiety score between 80-95, and people with “very severe” social anxiety score above 95. The data are shown below:

54, 58, 68, 70, 64, 82, 80, 65, 60, 71, 64, 116

a). What is the mean, median, and mode of these data? (3 points)

Mean:
$$\bar{X} = \frac{\sum X}{n} = \frac{54 + 58 + 68 + 70 + 64 + 82 + 80 + 65 + 60 + 71 + 64 + 116}{12} = \frac{852}{12} = 71$$

Median location = (n + 1) / 2 = (12 + 1) / 2 = 13/2 = 6.5
Values in order: 54, 58, 60, 64, 64, 65, 68, 70, 71, 80, 82, 116
Median = the value 6.5 positions up from the bottom = average of 65 and 68 = 66.5

Mode = most frequently occurring score = 64

b). What is the range, interquartile range, variance, and standard deviation of these data? (4 points)

Range = largest score - smallest score = 116 - 54 = 62

Interquartile range:
Median location = (n + 1) / 2 = (12 + 1) / 2 = 6.5
Drop the fraction from the median location first (6.5 becomes 6), then:
Quartile location = (median location + 1) / 2 = (6 + 1) / 2 = 3.5
Values in order: 54, 58, 60, 64, 64, 65, 68, 70, 71, 80, 82, 116
Q1 = 3.5 positions up from the bottom = average of 60 and 64 = 62
Q3 = 3.5 positions down from the top = average of 71 and 80 = 75.5
IQR = Q3 - Q1 = 75.5 - 62 = 13.5

Sample variance using the computational formula:
$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$$
$$\sum X^2 = 54^2 + 58^2 + 68^2 + 70^2 + 64^2 + 82^2 + 80^2 + 65^2 + 60^2 + 71^2 + 64^2 + 116^2$$
$$= 2916 + 3364 + 4624 + 4900 + 4096 + 6724 + 6400 + 4225 + 3600 + 5041 + 4096 + 13456 = 63442$$
$$\sum X = 54 + 58 + 68 + 70 + 64 + 82 + 80 + 65 + 60 + 71 + 64 + 116 = 852$$
$$s^2 = \frac{63442 - \frac{852^2}{12}}{12 - 1} = \frac{63442 - 60492}{11} = \frac{2950}{11} = 268.18$$

Using the definitional formula, we get the same answer:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
$$= \frac{(54-71)^2 + (58-71)^2 + (68-71)^2 + (70-71)^2 + (64-71)^2 + (82-71)^2 + (80-71)^2 + (65-71)^2 + (60-71)^2 + (71-71)^2 + (64-71)^2 + (116-71)^2}{12 - 1}$$
$$= \frac{289 + 169 + 9 + 1 + 49 + 121 + 81 + 36 + 121 + 0 + 49 + 2025}{11} = \frac{2950}{11} = 268.18$$

Sample standard deviation:
$$s = \sqrt{s^2} = \sqrt{268.18} = 16.38$$

2.
Construct a boxplot of the data reported in question 1. (1 point)

[Boxplot: vertical axis from 50 to 115. Labeled parts: outlier (*) at 116; end of upper whisker at 82; Q3 at 75.5; median at 66.5; Q1 at 62; end of lower whisker at 54.]

a). Describe/identify what each part of the plot signifies. (1 point)

See the plot above. Note that the upper whisker ends at the largest value that is no greater than Q3 + (1.5)(IQR):
75.5 + (1.5)(13.5) = 95.75
The largest value at or below 95.75 is 82, so the upper whisker ends at 82. The score of 116 lies beyond this limit and is plotted as an outlier.

The lower whisker ends at the smallest value that is no less than Q1 - (1.5)(IQR):
62 - (1.5)(13.5) = 41.75
The smallest value at or above 41.75 is 54, so the lower whisker ends at 54.

b). Is the distribution of anxiety scores symmetric, positively, or negatively skewed, and how can you tell? (1 point)

The distribution is positively skewed. I can tell that the distribution is not symmetric because the black line inside the box (the median) is not centered. If the distribution were symmetric, the 1st and 3rd quartiles would be about equally far from the median, and the median line would sit in the middle of the box. Here, the median (66.5) is closer to the bottom of the box (the 1st quartile, 62) than to the top of the box (the 3rd quartile, 75.5).

I can tell that the skew is positive because the outlier is an extremely large value. The tail of the distribution therefore extends toward this large value, or, if the data were plotted as a histogram, to the right.

3. Remove the outlier from the dataset.

a). Calculate the mean, median and mode based on this new dataset. (3 points)

Mean:
$$\bar{X} = \frac{\sum X}{n} = \frac{54 + 58 + 68 + 70 + 64 + 82 + 80 + 65 + 60 + 71 + 64}{11} = \frac{736}{11} = 66.91$$

Median location = (n + 1) / 2 = (11 + 1) / 2 = 12/2 = 6
Values in order: 54, 58, 60, 64, 64, 65, 68, 70, 71, 80, 82
Median = the value 6 positions up from the bottom = 65

Mode = most frequently occurring score = 64

b). Compared to the answers you obtained for question 1, which values changed substantially, which changed very little, and which stayed the same and why?
(2 points)

The mode is exactly the same as before (64). The median changed slightly, by 1.5 units (from 66.5 to 65). The mean changed more, by 4.09 units (from 71 to 66.91).

The median and mode are fairly resistant to outliers. A single outlying value that occurs only once does not change the mode, because the mode is just the most frequently occurring score. The median is the middle location in the distribution, and it ignores how far values are from it. So even though an outlying value is quite far from the median, the median is not affected by the magnitude of that distance. Thus, outliers have little influence on medians.

The mean is not very resistant to outliers. The mean is the mathematical center of the distribution: it is the value around which the deviations sum to zero, which makes it the “balancing point” of the distribution. Because of this, the mean is sensitive to how far values are from it, and an outlying value will pull the mean toward it.

c). Calculate the range, interquartile range, and variance based on this new dataset (3 points)

Range = largest score - smallest score = 82 - 54 = 28

Median location = (n + 1) / 2 = (11 + 1) / 2 = 6
Quartile location = (median location + 1) / 2 = (6 + 1) / 2 = 3.5
Values in order: 54, 58, 60, 64, 64, 65, 68, 70, 71, 80, 82
Q1 = 3.5 positions up from the bottom = average of 60 and 64 = 62
Q3 = 3.5 positions down from the top = average of 70 and 71 = 70.5
IQR = Q3 - Q1 = 70.5 - 62 = 8.5

Above, I illustrated that the definitional and computational methods yield the same answer. So this time, I will only use the computational formula (though using either formula is correct).
$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$$
$$\sum X^2 = 54^2 + 58^2 + 68^2 + 70^2 + 64^2 + 82^2 + 80^2 + 65^2 + 60^2 + 71^2 + 64^2$$
$$= 2916 + 3364 + 4624 + 4900 + 4096 + 6724 + 6400 + 4225 + 3600 + 5041 + 4096 = 49986$$
$$\sum X = 54 + 58 + 68 + 70 + 64 + 82 + 80 + 65 + 60 + 71 + 64 = 736$$
$$s^2 = \frac{49986 - \frac{736^2}{11}}{11 - 1} = \frac{49986 - 49245.09}{10} = 74.091$$

d).
Compared to the answers you obtained for question 1, which values changed substantially and which changed very little and why? (2 points)

The range changed substantially (from 62 in problem 1 to 28 now). By definition, the range is the difference between the two most extreme scores in the distribution. When there is a single, extreme outlier, its removal has an obvious and direct impact on the range.

The variance also changed substantially (from 268.18 in problem 1 to 74.091 now). The variance is essentially the average squared distance between the data points and the mean (the sum of squared deviations divided by n - 1). An outlier, by definition, lies far from the mean. Its squared distance from the mean is therefore quite large and inflates the variance estimate, so removing an outlier reduces the variance.

The IQR changed somewhat (from 13.5 in problem 1 to 8.5 now), but not nearly as much as the variance or range did. The IQR is fairly resistant to outliers because it is the distance between the 3rd and 1st quartiles. In other words, the IQR spans the middle 50% of the scores in the distribution and ignores how extreme the scores outside that middle 50% are. Therefore, the existence of an outlier has little impact on the calculation of the IQR.
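
The comparisons in questions 1 and 3 can be checked with a short script. The following is a minimal sketch in Python, not part of the original assignment. The helper names `value_at`, `quartiles`, and `summarize` are hypothetical and introduced only for illustration. The quartiles are computed with the location rule used in this answer key (drop the fraction from the median location, then take (median location + 1) / 2), which can differ from the default quartile method in statistics libraries.

```python
import math
from statistics import mean, median, mode, variance

# The 12 anxiety scores from question 1.
scores = [54, 58, 68, 70, 64, 82, 80, 65, 60, 71, 64, 116]

def value_at(sorted_vals, location):
    """Value at a 1-based location; a .5 location averages the two neighbours."""
    lower = sorted_vals[math.floor(location) - 1]
    upper = sorted_vals[math.ceil(location) - 1]
    return (lower + upper) / 2

def quartiles(values):
    """Q1 and Q3 using the location rule from the worked answers (an assumption
    of this sketch, not a library default)."""
    ordered = sorted(values)
    median_location = (len(ordered) + 1) / 2
    # Drop the fraction from the median location before computing the quartile location.
    quartile_location = (math.floor(median_location) + 1) / 2
    q1 = value_at(ordered, quartile_location)        # count up from the bottom
    q3 = value_at(ordered[::-1], quartile_location)  # count down from the top
    return q1, q3

def summarize(values):
    """Collect the statistics compared in questions 1 and 3."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": variance(values),  # sample variance, divides by n - 1
    }

with_outlier = summarize(scores)
without_outlier = summarize([x for x in scores if x != 116])

# Print each statistic with the outlier, then without it.
for name in with_outlier:
    print(f"{name:>8}: {with_outlier[name]:8.2f} -> {without_outlier[name]:8.2f}")
```

Printing the two summaries side by side makes the pattern in the answers visible at a glance: the mode stays put, the median and IQR move a little, and the mean, range, and variance move the most.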