2.1
A.
Category   Percentage   Frequency
A          26%          13/50
B          56%          28/50
C          18%          9/50
B. C. D.

2.15
A.
91 94 97 100 102 102 103 108 111 112 115 115 116 116 117 117 117 122 122 123 124 128 129 130 132
B.
The stem-and-leaf display gives more information, since it makes it easier to see where the bulk of the values lie and which values occur most often. The stem-and-leaf display is a good choice for this problem only because the values are concentrated; the more dispersed the values, the less effective it becomes.
C.
117 gallons is most likely to be purchased, since it is the mode. The value 117 occurs three times, while the next most frequent values occur only twice.
D.
Yes, there is a concentration in the middle to upper part of the 11* stem. It is interesting to note, however, that there are no values for 118, 119, 120, or 121 before another cluster of values in the range 122-124.

2.19
A.
35 85 110 120 170 180 240 260 300 380 380 460
B.
 3 | 5     16 |       29 |       42 |
 4 |       17 | 0     30 | 0     43 |
 5 |       18 | 0     31 |       44 |
 6 |       19 |       32 |       45 |
 7 |       20 |       33 |       46 | 0
 8 | 5     21 |       34 |
 9 |       22 |       35 |
10 |       23 |       36 |
11 | 0     24 | 0     37 |
12 | 0     25 |       38 | 00
13 |       26 | 0     39 |
14 |       27 |       40 |
15 |       28 |       41 |
C.
In this situation the ordered array is more useful to the reader, since the values show no concentration. Neither display is really a good fit for this data, though. A box-and-whisker plot, or some other display that better represents sparse data with extreme outliers, would be more appropriate.
D.
According to the data, battery life occurs twice at 380 and at most once at every other value. This is misleading, though: if we cut the data set at stem 26, 8 of the 12 values fall at or below 260, while only 4 fall above it.

2.25
A.
Class width: (1870-1522)/6 = 58

Insulator Force Frequency Distribution
Force Applied              Frequency   Percentage
1580 but less than 1638    5           16.67%
1638 but less than 1696    6           20.00%
1696 but less than 1754    7           23.33%
1754 but less than 1812    8           26.67%
1812 but less than 1870    4           13.33%
B.
D.
The insulators are of significant strength, since not a single value was below the required breaking point. All but three of the insulators had breaking points in excess of 1600 lbs.

3.1
n = 5, sample data set {7,4,9,8,2}
A.
Mean: the sample mean is the sum of the values divided by the number of values (PG 97).
(7+4+9+8+2)/5 = 6
Median: the middle value in the ordered data set. With an odd number of values, the median is the middle value; with an even number of values, it is the average of the two middle values. Median = (n+1)/2 ranked value (PG 99).
2,4,7,8,9
(5+1)/2 = 3, so the median is the third ranked value, which is 7.
Mode: the value that appears most often in a data set; often there is no mode (PG 100).
{7,4,9,8,2} There is no mode for this data set.
B.
Range: the largest value minus the smallest value (PG 105).
9-2 = 7
First Quartile (PG 101): Q1 = (n+1)/4 ranked value
Third Quartile: Q3 = 3(n+1)/4 ranked value
Rules for Quartiles (PG 102)
Rule 1: if the result is a whole number, then the quartile is equal to that ranked value.
Rule 2: if the result is a fractional half, then the quartile is the average of the two closest ranked values.
Rule 3: if the result is a fraction other than a half, round to the closest ranked value.
Interquartile Range: also called the midspread, it is the difference between the third and first quartiles.
(PG 106)
{7,4,9,8,2}
Q1 = 6/4 = 1.5 ranked value, the average of 2 and 4 = 3
Q3 = 18/4 = 4.5 ranked value, the average of 8 and 9 = 8.5
The interquartile range for this data set is 8.5-3 = 5.5
Sample Variance: the sum of the squared differences around the mean divided by the sample size minus 1 (PG 107).
Data set {7,4,9,8,2}
In Excel, the formula VAR gives an answer of 8.5.
By hand: mean = 6, sample size = 5
= [(7-6)^2 + (4-6)^2 + (9-6)^2 + (8-6)^2 + (2-6)^2] / (5-1)
= [(1)^2 + (-2)^2 + (3)^2 + (2)^2 + (-4)^2] / 4
= (1+4+9+4+16) / 4
= 34/4
= 8.5 as the variance
Standard Deviation: the sample standard deviation is the square root of the sum of the squared differences around the mean divided by the sample size minus one (PG 107).
NOTE: to calculate, take the square root of the variance.
(8.5)^(1/2) = 2.92
To calculate using Excel, use the formula STDEV.
Coefficient of Variation: the standard deviation divided by the mean, multiplied by 100% (PG 110).
STD = 2.92
= 2.92/6*100% = 48.67%
C.
Z Scores: an extreme value, or outlier, is a value located far away from the mean. A value is an outlier if its Z score is less than -3.0 or greater than +3.0.
Mean = 6
Value   Formula           Z Score     Outlier
7       (7-6)/2.915476    0.342997    N
4       (4-6)/2.915476    -0.68599    N
9       (9-6)/2.915476    1.028991    N
8       (8-6)/2.915476    0.685994    N
2       (2-6)/2.915476    -1.37199    N
D.
Shape of data set (PG 112, 113)
Symmetrical (bell-shaped curve): mean = median
Negative or left skewed: mean < median
Positive or right skewed: mean > median
In our data set of {7,4,9,8,2}, the mean is 6 and the median is 7, so our curve will be negative or left skewed.

3.3
n = 7, {12, 7, 4, 9, 0, 7, 3}
A.
Mean = 42/7 = 6
Median = (7+1)/2 = 4th ranked value of {0, 3, 4, 7, 7, 9, 12} = 7
Mode = 7
B.
Range = 12-0 = 12
Q1 = (7+1)/4 = 2nd ranked value, which is 3
Q3 = 3(7+1)/4 = 6th ranked value, which is 9
Interquartile Range = 9-3 = 6
Variance
= [(0-6)^2 + (3-6)^2 + (4-6)^2 + (7-6)^2 + (7-6)^2 + (9-6)^2 + (12-6)^2] / (7-1)
= (36+9+4+1+1+9+36) / 6
= 96/6
= 16
Standard Deviation = 16^(1/2) = 4
Coefficient of Variation = 4/6*100% = 66.67%
C.
Z Scores
Value   Formula     Z Score   Outlier
0       (0-6)/4     -1.5      N
3       (3-6)/4     -0.75     N
4       (4-6)/4     -0.5      N
7       (7-6)/4     0.25      N
7       (7-6)/4     0.25      N
9       (9-6)/4     0.75      N
12      (12-6)/4    1.5       N
D.
Shape of data set
Mean = 6, median = 7. Our curve will be negative or left skewed.

3.7
A.
Mean: 31+33.75+35.05+36.15+40.25+43 = 219.20, 219.20/6 = $36.53
Median: (6+1)/2 = 3.5 ranked value, (35.05+36.15)/2 = 71.2/2 = $35.60
Q1: (6+1)/4 = 1.75 ranked value = 33.75 (rounding rule)
Q3: 3(6+1)/4 = 5.25 ranked value = 40.25 (rounding rule)
B.
Variance
= [(31-36.53)^2 + (33.75-36.53)^2 + (35.05-36.53)^2 + (36.15-36.53)^2 + (40.25-36.53)^2 + (43-36.53)^2] / (6-1)
= [(-5.53)^2 + (-2.78)^2 + (-1.48)^2 + (-.38)^2 + (3.72)^2 + (6.47)^2] / 5
= (30.581 + 7.728 + 2.190 + .144 + 13.838 + 41.861) / 5
= 96.343/5
= 19.269
Excel formula answer is 19.26867
STD = 19.269^.5 = 4.390
Excel formula answer is 4.389609
Range: 43.00 - 31.00 = 12
Interquartile Range: 40.25 - 33.75 = 6.5
Coefficient of Variation: STD/mean*100%
4.390/36.53*100% = 12.02%
C.
Is the data skewed? If so, how?
Median = 35.60, mean = 36.53
Since the mean is greater than the median, the data is positively skewed or right skewed.
D.
Conclusions about going to movies based on information from part (a), and then conclusions based on information from part (b).
The information in Part A deals largely with the central tendency of the data: the mean, median, first quartile, and third quartile. This information is relevant to what I might expect on average if I were to go to randomly chosen movie theaters over a long enough time. The information in Part B deals with variation: it describes the spread of values I may encounter and their overall tendencies.

3.13
A.
Money Market Accounts {4.55, 4.50, 4.40, 4.38, 4.38}
Mean: 22.21/5 = 4.442
Variance
= [(4.55-4.442)^2 + (4.50-4.442)^2 + (4.40-4.442)^2 + (4.38-4.442)^2 + (4.38-4.442)^2] / (5-1)
= [(.108)^2 + (.058)^2 + (-.042)^2 + (-.062)^2 + (-.062)^2] / 4
= (.011664 + .003364 + .001764 + .003844 + .003844) / 4
= .02448/4
= .00612
Excel answer = .00612
Standard Deviation = .00612^.5 = .07823
Q1 = (n+1)/4 ranked value = 6/4 = 1.5 ranked value, which is (4.38+4.38)/2 = 4.38
Q3 = 3(n+1)/4 ranked value = 18/4 = 4.5 ranked value, which is (4.55+4.50)/2 = 4.525
Interquartile Range = 4.525-4.38 = 0.145
Coefficient of Variation: STD/mean*100%
.07823/4.442*100 = 1.761144

One Year CDs {4.94, 4.90, 4.85, 4.85, 4.85}
Mean = 24.39/5 = 4.878
Variance
= [(4.94-4.878)^2 + (4.90-4.878)^2 + (4.85-4.878)^2 + (4.85-4.878)^2 + (4.85-4.878)^2] / (5-1)
= [(0.062)^2 + (0.022)^2 + (-0.028)^2 + (-0.028)^2 + (-0.028)^2] / 4
= (.003844 + .000484 + .000784 + .000784 + .000784) / 4
= .00668/4
= .00167
Excel answer = .00167
Standard Deviation = .00167^.5 = 0.040866
Excel answer = 0.04086563
The difference in answers is due to rounding.
Interquartile Range
Q1 = (n+1)/4 ranked value = 6/4 = 1.5 ranked value, which is (4.85+4.85)/2 = 4.85
Q3 = 3(n+1)/4 ranked value = 18/4 = 4.5 ranked value, which is (4.94+4.90)/2 = 4.92
Interquartile Range = 4.92-4.85 = .07
Coefficient of Variation: STD/mean*100%
= 0.040866/4.878*100 = 0.837761
B.
Money market accounts have more variation than one-year CDs. This can be seen in the variance, the interquartile range, the standard deviation, and the coefficient of variation. For each of these measures of dispersion, the smaller the number, the more tightly clustered the data set. The money market figures for all of these measures are larger than those for the one-year CDs.

3.21
The following is a data set for a population with N = 10: {7,5,11,8,3,6,2,1,9,8}
A.
Compute the population mean.
The population mean is the sum of all the values in the population divided by the population size N (PG 118).
The sum of the values is 60. The population mean is 60/10 = 6.
B.
Compute the population standard deviation.
The population variance is the sum of the squared differences around the population mean divided by the population size.
Differences between population variance and sample variance:
- The population formula divides by N; the sample formula divides by n-1.
- The population formula uses the population size (N) and population mean (µ); the sample formula uses the sample size (n) and sample mean (X̄).
Population Variance
= [(1-6)^2 + (2-6)^2 + (3-6)^2 + (5-6)^2 + (6-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2 + (11-6)^2] / 10
= [(-5)^2 + (-4)^2 + (-3)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2 + (2)^2 + (3)^2 + (5)^2] / 10
= (25+16+9+1+0+1+4+4+9+25) / 10
= 94/10
= 9.4
Population STD = 9.4^.5 = 3.065941943
Excel answer = 3.065941943

3.23
Solved using Excel
A.
Mean = 514/50 = 10.28 (thousands)
Variance = VARP(A2:A51) = 4.0984
Standard Deviation = STDEVP(A2:A51) = 2.024451
B.
Deviations from mean   Value range             Occurrences
+2 to +3               14.2489 to 16.27335     2
+1 to +2               12.22445 to 14.2489     8
-1 to +1               8.255549 to 12.22445    32
-2 to -1               6.231099 to 8.255549    7
-3 to -2               4.206648 to 6.231099    1

Deviations (plus/minus)   Separate   Inclusive   %
Within 1                  32         32          0.64
Within 2                  15         47          0.94
Within 3                  3          50          1
C.
The empirical rule (PG 120):
68% of results should be within plus/minus 1 deviation
95% of results should be within plus/minus 2 deviations
99.7% of results should be within plus/minus 3 deviations
These results are close to the empirical rule, as outlined in the chart below:
Empirical Rule %   Actual %
.68                0.64
.95                0.94
.997               1

3.31
The five-number summary consists of the smallest value, first quartile, median, third quartile, and largest value of a series (PG 123). A chart comparing the relationships of these numbers to the distribution of the data is also found on PG 123.
A box-and-whisker plot is a graphical summary of the five-number summary (PG 124). An important chart for applying the box-and-whisker plot to distributions is figure 3.5 on PG 125.
Data set ordered:
4 5 7 8 16 19 19 20 20 23 24 25 29 29 30 30 30 30 40 56
Smallest value: 4
Largest value: 56
Median: (23+24)/2 = 23.5
Q1 = (n+1)/4 = 21/4 = 5.25 ranked value = 16
Q3 = 3(n+1)/4 = 63/4 = 15.75 ranked value = 30

Computer Assignments sections.
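As a cross-check on the quartile rules (PG 102) and the five-number summary worked in 3.31, the ranked-value procedure can be sketched in Python. This is only an illustration under stated assumptions: the function names ranked_value and five_number_summary are made up here (they are not from the textbook or Excel), and the sketch assumes the data set has at least three values.

```python
def ranked_value(sorted_vals, pos):
    """Return the value at 1-based ranked position `pos` using the three quartile rules."""
    whole = int(pos)
    frac = pos - whole
    if frac == 0:
        # Rule 1: whole number, take that ranked value
        return sorted_vals[whole - 1]
    if frac == 0.5:
        # Rule 2: fractional half, average the two closest ranked values
        return (sorted_vals[whole - 1] + sorted_vals[whole]) / 2
    # Rule 3: any other fraction, round to the closest ranked value
    nearest = whole + 1 if frac > 0.5 else whole
    return sorted_vals[nearest - 1]


def five_number_summary(values):
    """Smallest value, Q1, median, Q3, largest value (hypothetical helper)."""
    vals = sorted(values)
    n = len(vals)
    return (
        vals[0],
        ranked_value(vals, (n + 1) / 4),
        ranked_value(vals, (n + 1) / 2),
        ranked_value(vals, 3 * (n + 1) / 4),
        vals[-1],
    )


# Data set from problem 3.31
data_3_31 = [4, 5, 7, 8, 16, 19, 19, 20, 20, 23,
             24, 25, 29, 29, 30, 30, 30, 30, 40, 56]
print(five_number_summary(data_3_31))  # (4, 16, 23.5, 30, 56)
```

Note that Excel's built-in QUARTILE function interpolates between ranked values, so it will not always agree with these textbook rounding rules.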
Throughout the homework I have checked and verified my answers using Microsoft Excel. I have found that my answers always match as long as I perform the due diligence of carrying my work out to enough decimal places. As noted earlier in 3.1, 3.3, and 3.21, it takes a little over a page of simple arithmetic to come up with the variance and standard deviation by hand, but with Excel it takes only one simple line and a whole lot less time to check the work. In that page of computation, every calculation is a chance to make a simple mistake and come up with a wrong answer. Doing the work by hand is good for understanding the process and the expected results; however, it greatly increases the chance of an incorrect answer and takes an enormous amount of time, which has an opportunity cost attached to it.
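The same kind of one-line check can be done outside Excel. The sketch below uses Python's standard statistics module as a stand-in for VAR, STDEV, and STDEVP; the variable names are chosen only for illustration, and the data sets are the ones from problems 3.1 and 3.21.

```python
import statistics

# Problem 3.1: sample data set
sample = [7, 4, 9, 8, 2]
mean = statistics.mean(sample)
median = statistics.median(sample)
variance = statistics.variance(sample)  # divides by n-1, like Excel VAR
std_dev = statistics.stdev(sample)      # like Excel STDEV
cv = std_dev / mean * 100               # coefficient of variation, in percent
z_scores = [(x - mean) / std_dev for x in sample]

# Problem 3.21: population data set
population = [7, 5, 11, 8, 3, 6, 2, 1, 9, 8]
pop_variance = statistics.pvariance(population)  # divides by N
pop_std = statistics.pstdev(population)          # like Excel STDEVP

print("mean:", mean, "median:", median)
print("sample variance:", variance)
print("sample std dev:", round(std_dev, 6))
print("coefficient of variation:", round(cv, 2))
print("z scores:", [round(z, 6) for z in z_scores])
print("population variance:", pop_variance)
print("population std dev:", round(pop_std, 9))
```

The coefficient of variation comes out near 48.59 here rather than the 48.67 found by hand in 3.1, because the hand calculation rounds the standard deviation to 2.92 before dividing.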