SOLUTIONS TO HOMEWORK 01

SECTION 1.1
(NOT means it’s an extra problem that I included, but it’s NOT to be graded.)

NOT 1.15
[Stemplot: stems 10 through 15, leaves in tenths of a percent, 50 values.]
a.) 10.9% and 11.0% are the smallest values in the dataset.
b.) The shape is rather symmetric if you ignore the 2 smallest values. The center of the distribution is about 13.9%. The spread is 1 or 2% from the center.

1.18
The shape is skewed to the right. This means that there are more short words (3-4 letters), but there are a few quite long words (>10 letters). We would expect the distribution for other authors to be similar, because short words are common.
Notice: we could define center as:
1) halfway between 0 and 12;
2) the 50th percentile = median = 4 (you can stack the bars: 5+17+23 = only 45, so 4, the next bar, contains the median);
3) the mode = the tallest bar = 4; or
4) the mean = x̄, which is too complicated to calculate from the graph, but it would be 4 or more since the data is skewed to the right.
Whichever we choose, because the center is fairly close to zero (somewhere around 4), word lengths can be only a little less than the center but can be much greater than the center, hence the tail on the right.

1.25
These are modified stemplots from SPSS to show the ‘extremes’ in a different way and to make them easier to compare by ‘lining up’ the stems.

Stem-and-Leaf Plot for GENDER = F

 Frequency    Stem &  Leaf
     0.00        0 .
     2.00        0 .  69
     8.00        1 .  12222222
    15.00        1 .  555578888888888
     4.00        2 .  0444
     0.00        2 .
     0.00        3 .
     1.00        3 .  6

 Stem width:  100
 Each leaf:   1 case(s)

Stem-and-Leaf Plot for GENDER = M

 Frequency    Stem &  Leaf
     6.00        0 .  033334
     8.00        0 .  66679999
     7.00        1 .  2222222
     3.00        1 .  558
     5.00        2 .  00344
     0.00        2 .
     1.00        3 .  0
     0.00        3 .

 Stem width:  100
 Each leaf:   1 case(s)

It is now easy to see that the center for women is larger and more concentrated than that for men, plus the outlier in women looks more extreme.
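The split-stem layout of the two 1.25 plots can be rebuilt with a few lines of code. This is only a sketch: the lists WOMEN and MEN are a transcription of the leaves in the plots (stem = hundreds, leaf = tens), and the function name split_stemplot and its parameters are made up for this illustration. It is not the SPSS procedure itself.

```python
from collections import defaultdict

# Study times in minutes, transcribed from the 1.25 stemplots
# (stem = hundreds digit, leaf = tens digit). The transcription is an
# assumption of this sketch.
WOMEN = ([60, 90, 110] + [120] * 7 + [150] * 4 + [170]
         + [180] * 10 + [200] + [240] * 3 + [360])
MEN = ([0] + [30] * 4 + [40] + [60] * 3 + [70] + [90] * 4 + [120] * 7
       + [150] * 2 + [180] + [200] * 2 + [230] + [240] * 2 + [300])


def split_stemplot(values, stem_unit=100, leaf_unit=10, max_stem=3):
    """Return stemplot rows, each stem split into leaves 0-4 and 5-9."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, rest = divmod(v, stem_unit)
        leaf = rest // leaf_unit
        # Key on (stem, upper half?) so every stem gets two rows.
        rows[(stem, leaf >= 5)].append(str(leaf))
    lines = []
    for stem in range(max_stem + 1):
        for upper in (False, True):
            leaves = "".join(rows[(stem, upper)])
            # Frequency, stem, then the leaves for that half of the stem.
            lines.append(f"{len(leaves):>2}  {stem} . {leaves}".rstrip())
    return lines


if __name__ == "__main__":
    for label, data in (("F", WOMEN), ("M", MEN)):
        print(f"GENDER = {label}")
        print("\n".join(split_stemplot(data)))
```

Printing both plots with the same stems, including the empty rows, is what makes the two groups easy to compare side by side.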
Without the outlier, women’s times look fairly normal. Men’s times are skewed to the right.
a.) The times are in multiples of 10 minutes, probably because it is difficult to estimate to an exact minute. One woman claimed to study 360 minutes (= 6 hours) a night.
b.) Mid for men = 120, mid for women = 180. (The true median is shown with the box.)

1.27
The stemplot and histogram show a low outlier (4.88) and otherwise a mound-shaped distribution from 5.07 to 5.85, with the center between 5.44 and 5.46. The histogram seems to show two low areas (actually three measurements), then a larger peak at 5.3, with irregular peaks extending up to 5.8. Using the center as the estimate for the density: from the histogram, I would estimate the density to be between 5.38 and 5.5; from the stemplot, I would estimate 5.46, the median.

Stem-and-Leaf Plot

 Frequency    Stem &  Leaf
     1.00       48 .  8
      .00       49 .
     1.00       50 .  7
     1.00       51 .  0
     4.00       52 .  6799
     5.00       53 .  04469
     4.00       54 .  2467
     5.00       55 .  03578
     5.00       56 .  12358
     2.00       57 .  59
     1.00       58 .  5

 Stem width:  .10
 Each leaf:   1 case(s)

[Histogram: Std. Dev = .22, Mean = 5.45, N = 29; bars centered at 4.88, 5.00, 5.13, 5.25, 5.38, 5.50, 5.63, 5.75, 5.88.]

This is the data in each bar. It doesn’t exactly match the stemplot, but it’s close.

 Bar     Data
 4.88    4.88
 5.00    (none)
 5.13    5.07 5.10
 5.25    5.26 5.27 5.29 5.29 5.30
 5.38    5.34 5.34 5.36 5.39 5.42
 5.50    5.44 5.46 5.47 5.50 5.53 5.55
 5.63    5.57 5.58 5.61 5.62 5.63 5.65 5.68
 5.75    5.75 5.79
 5.88    5.85

NOT 1.32
Both the stemplot and the histogram suggest that the midpoint is somewhat above 100 and is more like 109.

Modified Stem-and-Leaf Plot

 Frequency    Stem &  Leaf
     4.00        7 .  2479
     2.00        8 .  69
     8.00        9 .  01336778
    22.00       10 .  0022333344555666777789
    27.00       11 .  000011112222333444455688999
    12.00       12 .  003344677888
     3.00       13 .  026

 Stem width:  10
 Each leaf:   1 case(s)

Shape: slightly skewed to the left
Center: median = 110, mean x̄ = 108.9
Spread: 15 to 20

[Histogram: Std. Dev = 13.17, Mean = 108.9, N = 78; horizontal axis from 70.0 to 135.0.]

SECTION 1.2

1.46
[Stemplot: the same plot as in 1.15, stems 10 through 15.]
Median = middle number. If there is an even number of points, average the 2 middle numbers; if odd, it’s THE middle number.
a.) The 25th and 26th largest values are 13.9 (rounded), so the median is 13.9. Q1 is the median of the lower half = 13.0. Q3 is the median of the upper half = 14.4.
b.) IQR = Q3 - Q1 = 14.4 - 13.0 = 1.4. To find out if there are any outliers, you use the 1.5*IQR Rule:
Q3 + 1.5*1.4 = 14.4 + 2.1 = 16.5, so there are no outliers above.
Q1 - 1.5*1.4 = 13.0 - 2.1 = 10.9, so there are no outliers below (barely). Had there been a 10.8, it would have been an outlier.
c.) Omitting the 2 smallest points, Montana and Wyoming, the median will move up one place in the list, but that is still 13.9.

1.47 (refer to the graph in 1.18)
The method for finding the median from a histogram is explained in 1.18. The min = 1, Q1 = 2, median = 3 or 4, Q3 = 5, max (given) = 12. You can find Q1 by stacking the bars until you reach 25% and Q3 by stacking until you reach 75%.

1.53
Descriptives (with DC)

                                      Statistic   Std. Error
 Mean                                  244.1961      12.6862
 95% Confidence Interval for Mean
     Lower Bound                       218.7152
     Upper Bound                       269.6770
 5% Trimmed Mean                       232.7516
 Median                                225.0000
 Variance                              8207.881
 Std. Deviation                         90.5974
 Minimum                                    154
 Maximum                                    737
 Range                                      583
 Interquartile Range                    47.0000
 Skewness                                 3.570         .333
 Kurtosis                                17.243         .656

[Histogram (with DC): Std. Dev = 90.60, Mean = 244.2, N = 51; horizontal axis from 150.0 to 750.0.]

Descriptives (without DC)

                                      Statistic   Std. Error
 Mean                                  234.3400       8.1487
 95% Confidence Interval for Mean
     Lower Bound                       217.9647
     Upper Bound                       250.7153
 5% Trimmed Mean                       229.5667
 Median                                224.5000
 Variance                              3320.025
 Std. Deviation                         57.6197
 Minimum                                    154
 Maximum                                    412
 Range                                      258
 Interquartile Range                    46.7500
 Skewness                                 1.414         .337
 Kurtosis                                 1.967         .662

[Histogram (without DC): Std. Dev = 57.62, Mean = 234.3, N = 50; horizontal axis from 160.0 to 420.0.]

Percentiles (without DC)

                                     5         10         25         50         75         90         95
 Weighted Average (Definition 1)  165.2000   171.2000   199.5000   224.5000   246.2500   334.7000   379.8500
 Tukey's Hinges                                         200.0000   224.5000   246.0000

a.) Mean, x̄ = 234.34 (down from 244.2 since DC is an outlier above), StDev, s = 57.6 (down from 90.6, again because of DC), Min = 154 (still the same since DC was on the other end), Q1 = 199.5 (almost the same as 200 since we only dropped 1 point), median = 224.5 (down only slightly from 225, as opposed to how much the mean changed), Q3 = 246.25 (again, down only slightly from 247), Max = 412 (down considerably since 737 was DC, the outlier that we dropped). Even without the outlier, the distribution is still skewed, so the median and IQR are the preferred summary numbers.
b.) Mean and StDev do not reveal skewness, so the 5-number summary is a better representation. You can ‘see’ the skewness by noticing that the minimum is closer to Q1 (199.5 - 154 = 45.5) than the maximum is to Q3 (412 - 246.25 = 165.75), but it still doesn’t show the gaps.

1.62
The total of the observations is 11200. Divided by 7, this gives a mean of x̄ = Σx/7 = 1600. Subtracting this from each observation gives the following table:

           data     Diff    Squares
           1792      192      36864
           1666       66       4356
           1362     -238      56644
           1614       14        196
           1460     -140      19600
           1867      267      71289
           1439     -161      25921
 Totals   11200        0     214870

Dividing the total of Squares by 6 gives s^2 = Σ(x - x̄)^2/(n - 1) = 214870/6 = 35811.67. Taking the square root yields s = 189.24.

Descriptives

                                      Statistic   Std. Error
 Mean                                 1600.0000      71.5259
 95% Confidence Interval for Mean
     Lower Bound                      1424.9825
     Upper Bound                      1775.0175
 5% Trimmed Mean                      1598.3889
 Median                               1614.0000
 Variance                             35811.667
 Std. Deviation                        189.2397
 Minimum                                   1362
 Maximum                                   1867
 Range                                      505
 Interquartile Range                   353.0000
 Skewness                                  .207         .794
 Kurtosis                                -1.498        1.587

NOT 1.65
a.) Choose all four to be the same, such as {1,1,1,1}. To have a standard deviation of 0 means that you have NO spread at all. It doesn’t matter what the value is; it will be the mean, and the difference between the mean and any other value will be 0 (since they will all be equal to the mean).
b.) We want the greatest spread; {0,0,10,10} is the answer. Using other numbers (not 0 or 10) will actually decrease the standard deviation, since on average the distance to the mean will be smaller.
c.) There are many answers to a.) but only one to b.)

1.68
Descriptive Statistics

                        N    Minimum   Maximum     Mean   Std. Deviation
                       83     -26.00     19.20   1.9072           7.4853
 Valid N (listwise)    83

a.) Mean, x̄ = 1.9072%. SD, s = 7.4853.
New value = investment + average rate of return*investment = $100 + 1.91%($100) = $101.91
b.) New value = $100 + (-26.6%)($100) = $100 - $26.60 = $73.40.

[Histogram: Std. Dev = 7.49, Mean = 1.9, N = 83; horizontal axis from -25.0 to 20.0.]

Descriptive Statistics

                        N    Minimum   Maximum      Sum     Mean   Std. Deviation
                       82     -14.00     19.20   184.30   2.2476           6.8548
 Valid N (listwise)    82

Mean, x̄ = 2.2476. SD, s = 6.8548. Omitting one point will not change the median or quartiles by much. (Look at the graph.)

NOT 1.72
a.) Since this is just a scale change, the shift, a = 0. Since there are 0.62 miles per km, we need to multiply the miles by the scale change, b = 1/0.62. So, kilometers = (1/0.62)*65 miles = 104.84.
b.) 746 watts = 1 horsepower, so watts = 0 + 746*hp, with a = 0 and b = 746. So, watts = 746*140 hp = 104,440; that is, 140 hp = 104,440 watts.

1.74
Descriptives

                                      Statistic   Std. Error
 Mean                                    5.4479    4.103E-02
 95% Confidence Interval for Mean
     Lower Bound                         5.3639
     Upper Bound                         5.5320
 5% Trimmed Mean                         5.4549
 Median                                  5.4600
 Variance                             4.882E-02
 Std. Deviation                           .2209
 Minimum                                   4.88
 Maximum                                   5.85
 Range                                      .97
 Interquartile Range                      .3200
 Skewness                                 -.468         .434
 Kurtosis                                  .354         .845

a.) Mean, x̄ = 5.4479, SD, s = 0.2209
b.) Cavendish found the density of the earth to be 5.5 times the density of water. The density of water is 62.43 lb/cu-ft, so his value of 5.5 is 5.5*62.43 = 343.365 lb/cu-ft. The mean is 340.11 lb/cu-ft (5.4479*62.43) and the standard deviation is 13.79 lb/cu-ft (0.2209*62.43). Remember, scale changes affect BOTH locations and spreads.

SECTION 1.3

1.79
a.) In order for the total area to be 1, a width of 2 means the height must be ½.
b.) Half the area is to the left of 1, so half the outcomes are less than 1.
c.) The area is (1.3-0.5) * 0.5 = 0.4.

1.81
a.) Mean is C, median is B. Skewed right means mean > median.
b.) Mean and median are both A since the distribution is symmetric.
c.) Mean is A, median is B since the distribution is skewed to the left, so mean < median.

1.83
a.) 99.7% falls within 3 sd’s of the mean: μ ± 3*σ = 336 ± 3*3 = 327 to 345 days.
b.) 336 + 1*3 = 339, so 339 days is 1 standard deviation above the mean. It’s really asking what percent of the distribution is above 339. The z-score = (339 - 336)/3 = 1. We know that 68% of the distribution is within ±1, so 32% falls outside. The percent of the distribution more than 1 standard deviation above the mean = the percent more than 1 sd below the mean, so half of 32% = 16% falls above 339.

1.87
Cobb’s batting avg = 0.420, z = (0.420-0.266)/0.0371 = 4.15
Williams’s = 0.406, z = (0.406-0.267)/0.0326 = 4.26
Brett’s = 0.390, z = (0.390-0.261)/0.0317 = 4.07
All are over 4 standard deviations above the mean. The three stand close together, an astounding four standard deviations above the typical hitter.
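The three z-scores in 1.87 can be checked with a short script. This is a sketch only: the names HITTERS and z_score are made up for the illustration, and Cobb's standard deviation is entered as 0.0371, the value that is consistent with the reported z = 4.15.

```python
# (batting average, league mean, league standard deviation) as used in 1.87.
# Assumption: Cobb's standard deviation is 0.0371, the value that
# reproduces the reported z = 4.15.
HITTERS = {
    "Cobb": (0.420, 0.266, 0.0371),
    "Williams": (0.406, 0.267, 0.0326),
    "Brett": (0.390, 0.261, 0.0317),
}


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd


z = {name: round(z_score(*stats), 2) for name, stats in HITTERS.items()}

if __name__ == "__main__":
    for name, value in z.items():
        print(f"{name}: z = {value}")
```

Standardizing each average against its own era is what makes the three hitters comparable, since the league mean and spread differ from decade to decade.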
(Williams has a slight edge, but perhaps not large enough to declare him “the best.”) Notice that although Cobb’s average is higher than Williams’s, it’s not relatively higher. Williams actually did better vs. his peers than Cobb did. Also, even though Brett’s average is 0.03 (almost 10%) lower than Cobb’s, it is only 0.08 of a standard deviation less (closer to the mean).

1.88
Draw a curve, locate the point on the line, then shade in the direction of the sign: < means shade the area to the left, > means shade the area to the right. See the handout on the web for more help.
a.) 0.9978
b.) 0.0022
Note: if we add the last 2 together we’d get 1, since it would cover the entire curve.
c.) 0.9515
d.) 0.9515 - 0.0022 = 0.9493

NOT 1.89
Since Z is continuous, it doesn’t matter whether we include the ‘line’ (=) or not. We are looking at areas under the curve, and adding the width of a line (at the exact point, e.g., 2.25) doesn’t add anything to the area.
a.) 0.0122
b.) 0.9878
c.) 0.0384
d.) 0.9878 - 0.0384 = 0.9494

1.99
a.) z = -1.625. The area to the left of -1.625 (probability of less than) = 0.0521.
b.) z = 0.25 for 270 days. The area between z = -1.625 and z = 0.25 is 0.5466.
c.) The longest 20% are 0.84 standard deviations above the mean, or 279.4 days. The area to the right of 279.4 (= 266 + 0.84*16) is 20%. Look up 0.20 in the body of the Z table and read off the z-score = -0.84; since we want the area to the right (longest), we take the negative of that, or +0.84.
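The table lookups in 1.99 can be double-checked in software. The sketch below uses NormalDist from Python's standard statistics module in place of the printed Z table; the variable names are made up for the illustration. Because the software uses the exact 80th-percentile z (about 0.8416) instead of the table's rounded 0.84, the cutoff in part c.) comes out a few hundredths of a day higher than 279.4.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# a.) area to the left of z = -1.625
p_below = Z.cdf(-1.625)

# b.) area between z = -1.625 and z = 0.25
p_between = Z.cdf(0.25) - Z.cdf(-1.625)

# c.) z-score with 20% of the area to its right (80% to its left),
#     converted back to days with mean 266 and standard deviation 16
z_top20 = Z.inv_cdf(0.80)
days_top20 = 266 + z_top20 * 16

if __name__ == "__main__":
    print(f"a.) {p_below:.4f}")
    print(f"b.) {p_between:.4f}")
    print(f"c.) z = {z_top20:.2f}, cutoff = {days_top20:.1f} days")
```

Asking for the 80th percentile directly avoids the sign flip needed when reading a left-tail table for a right-tail area.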