Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
EXERCISES of CHAPTER (1)- (Descriptive Statistics) 1. The value of till 50 decimal places is given below: 3.14159265358979323846264338327950288419716939937510 a. Make a frequency table of the digits from 0 to 9 after the decimal point. Answer: The frequency table of the digits from 0 to 9 after the decimal point is as follow: Digits 0 1 2 3 4 5 6 7 8 9 Total Frequency 2 5 5 8 4 5 4 4 5 8 50 b. What are the most and the least frequently occurring digits? Answer: The most frequently occurring digits is 9, and the least frequently occurring digits is 0. 2. Consider the data shown in the following frequency table: Number of subjects in which student failed Frequency 3 1 2 0 Total 2 18 12 8 40 a. Represent them graphically using pie chart and bar chart. Answers: The graph of pie chart for the given data is: -1- The graph of bar chart for the given data is: b. Compute the range of this data. Answer: To calculate the range of the given data we must arrange the values of represent data, so we become the range of the given data equal to: R xm x1 3 0 3 c. Compute the mean of this data. Answer: The mean of the given data is: x m 1 f i fx i 1 i i 46 1.15 40 3. The following data give the results of a sample survey. The letters A, B and C represent three categories: A C C B B C A B C C B C C B C C C C B C A B C C B C B A C C a. Prepare a frequency table of this data. Answer: The frequency table of the given data is as follow: Digits A B C Total Frequency 4 9 17 30 b. Calculate the relative frequencies and percentages for all symbols. Answer: The relative frequencies and percentages for all symbols by the given data is as follow: Digits A B C Total Frequency 4 9 17 30 Relative Frequency 0.133 0.300 0.567 1 -2- Percentage 13.3 % 30.0 % 56.7 % 100% c. What percentage of the elements belongs to category B? Answer: The percentage of the elements belongs to category B is 30% d. Draw a bar chart and the pie chart for the frequency table. Answers: The graph of bar chart for the given data is: The graph of pie chart for the given data is: 4. A company manufactures car batteries of a particular type. The lives (in years) of 40 such batteries were recorded as follows: 2.6 3.5 2.5 3.5 3.0 2.3 4.4 3.2 3.7 3.2 3.4 3.9 3.2 3.4 3.3 3.2 2.05 3.8 2.9 3.2 4.1 3.5 3.2 4.05 3.0 4.3 3.1 3.7 4.4 3.7 2.8 3.4 4.45 2.9 3.5 3.2 3.8 3.6 4.2 2.6 Construct a frequency distribution table for this data, using classboundary intervals of size 0.5 starting from the classboundary 2 2.5 (the measure unite is 0.1) . Answer: The frequency distribution table for the given data is as follow: N. of C. 1 2 3 4 5 Total Class Limit 2.05 − 2.55 − 3.05 − 3.55 − 4.05 − 2.45 2.95 3.45 3.95 4.45 Class Boundaries Midpoint Frequency Relative Frequency Percentage 2 → 2.5 2.5 → 3.0 3.0 → 3.5 3.5 → 4.0 4.0 → 4.5 ------------ 2.25 2.75 3.25 3.75 4.25 ----- 2 6 14 11 7 40 0.05 0.15 0.35 0.275 0.175 1 5% 015% 35% 27.5% 17.5% 100 -3- A.C.F. 2 8 22 33 40 ------- 5. The distance (in km) of 40 engineers from their residence to their place of work were found as follows: 5 19 7 12 3 10 9 14 10 12 7 0.5 20 17 8 9 25 18 3 6 11 11 5 15 13 32 12 15 7 17 15 7 12 16 18 6 31 2 3 12 a. Construct a frequency distribution table with 5 classes for the data given above. Answer: For the construction of the frequency distribution table with 5 classes for the data given above we must calculate the range R of data. R x xs 32 0.5 31.5 Therefore, the length of each classboundary equals to: C= Class Limit 0.5 − 6 7 − 12.5 13.5 − 19 20 − 25.5 26.5 − 32 Total R 1 32.5 6.5 k 5 Class Boundaries Midpoint Frequency 0 → 6.5 6.5 → 13 13 → 19.5 19.5 → 26 26 → 32.5 ------------ 3.25 9.75 16.25 22.75 29.25 ------- 9 16 11 2 2 40 Relative Frequency 0.225 0.400 0.275 0.050 0.050 1 Percentage A.C.F. D.C.F. 22.5% 40.0% 27.5% 5.0% 5.0% 100 9 25 36 38 40 ----- 40 31 15 4 2 b. Draw the histogram for the data of above frequency distribution table. Answer: The histogram for the data of above frequency distribution table is as follow: c. Draw the polygon for the data of frequency distribution table. Answer: The polygon for the data of above frequency distribution table is as follow: -4- d. Draw the less and greater than ogive for the data of frequency distribution table. Answer: The less than ogive for the data of above frequency distribution table is as follow: The less than ogive for the data of above frequency distribution table is as follow: e. How many engineers have residence at distance less than 20 km from their workplace? Answer: The number of engineers, which have residence at distance less than 20 km from their workplace are nearly 36. -5- f. How many engineers have residence at distance more than 15 km from their workplace? Answer: The number of engineers, which have residence at distance more than 15 km from their workplace, are nearly 40-28 = 12. 6. Forty children were asked about the number of hours they watched TV programs in the previous week. The results were found as follows: 8 10 1 3 10 3 6 2 12 4 2 8 14 12 3 5 12 2 5 9 10 8 12 6 8 15 5 8 6 1 8 7 4 17 4 14 2 6 8 12 a. Construct a frequency distribution table for this data. Answer: The rang of given data equals to R 17 1 16 , and the number of classes is: k 3. 322 log n 5.322 5 So the classboundary length equals to C = 3.4. We will take C 3.5 . Therefore, the length of class limit equal to C 1 3.5 1 2.5 . -6- Class Limit Class Boundaries Class Midpoint Frequency Relative Frequency ACF DCF 1 − 3.5 4.5 − 7 8 − 10.5 11.5 − 14 15 − 17.5 Total 0.5 → 4 4 → 7.5 7.5 → 11 11 → 14.5 14.5 → 18 --------- 2.25 5.75 9.25 12.75 16.25 ------- 9 11 11 7 2 40 0.225 0.275 0.275 0.175 0.050 1 9 20 31 38 40 ------ 40 31 20 13 2 ------ b. Calculate the mean, median and mode for the given data and the frequency distribution table. What do you note? Answers: The mean of the given data is calculated as follow: n 1 n x x i 1 i 292 7.3 40 The mean of the frequency distribution table is calculated as follow: m 1 x f f x i i 1 i i 307 7.675 40 To calculate the median of the given data we must arrange the given data, so we get: x : x n x n 1 2 2 2 x 20 x 21 2 78 7.5 2 The median of the data in the frequency distribution table is calculated as follow: We have 1 fi 20. Therefore, the median class is 4 7.5 . So, the median for the data in 2 above frequency distribution table is: x 1 f F i L 2 f f C 4 3.5 7.5 20 20 11 11 The mode of the given data is x̂ 8 . To calculate the mode of the data in the frequency distribution table, we note that a unique mode exist, and the modal class is 4 7.5 or 7.5 11 . So, we have: a) for the modal class 4 7.5 : xˆ Lˆ d1 d1 d2 C4 2 3.5 7.5 20 Where are Lˆ 4, d1 0, d2 2 and C 3.5. b) for the modal class is 7.5 11 : xˆ Lˆ d1 d1 d2 C 7.5 0 3.5 7.5 04 Where we have Lˆ 7.5, d1 0, d2 4 and C 3.5. We note that: The value of the mean for raw data is deferent from the mean for the frequency distribution table. -7- The value of the median for raw data is not deferent from the median for the frequency distribution table. The value of the mode for raw data is not deferent from the mode for the frequency distribution table. c) Calculate the Pearson coefficient of skewness for the given data. Answer: To calculate the Pearson coefficient of skewness (SK ) for the given data we must calculate the standard deviation S. where we have S 4.189 . Therefore, we have: SK x xˆ S So, the distribution of data is left skewed. 7.3 8 0.167 0 4.189 7. A sample of 100 children was asked how many times they play computer games for a period of one week. The following frequency table gives their answers. Times of playing (by hours) Less than 2 2→5 5 → 10 10 → 18 More than 18 Number of children 23 40 28 6 3 a. Prepare the relative frequency and percentage columns. Answer: The relative frequency and percentage columns as follows: Times of playing (by hours) Number of children Relative Frequency Percentage Less than 2 2→5 5 → 10 10 → 18 More than 18 Total 23 40 28 6 3 ------------ 0.23 0.40 0.23 0.06 0.03 ------- 23 % 40 % 23 % 6% 3% 40 % b. What percentage of these children plays 10 hours or more weekly? Answer: The percentage of these children plays 10 or more hours weekly is (6+3) % = 9% c. Draw the bar chart of the given data. Answer: The bar chart of the given data has the following figure. -8- 8. Consider the following frequency distribution table, representing the weights of 36 students of a class: Weights (in kg) 40 → 50 50 → 60 60 → 70 70 → 80 80 → 90 Total Number of students 6 Relative Frequency Percentage A.C.F. 0.20 36% 44 6 50 a. Complete the above given frequency distribution table. Answer: The above given frequency distribution table will have the following form: Weights (in kg) Number of students Relative Frequency Percentage A.C.F. D.C.F. 40 → 50 50 → 60 60 → 70 70 → 80 80 → 90 Total 6 10 18 10 6 50 0.12 0.20 0.36 0.20 0.12 1 12% 20% 36% 20% 12% 100% 6 16 34 44 50 ------- 50 44 34 16 6 -------- b. Calculate mean, median and mode for the frequency distribution table. Answers: We have: The mean of the frequency distribution table is: x m 1 f i f x i 1 i i 3250 65 50 The median of the data in the frequency distribution table is calculated as follow: We have 1 fi 25. Therefore, the median class is 60 70 . So, the median for the data in 2 above frequency distribution table is: x 1 f F i 2 L f f C 60 -9- 25 36 18 18 10 63.89 To calculate the mode of the data in the frequency distribution table, we note that a unique mode exist, and the modal class is 60 70 . So, we have: xˆ Lˆ d1 d1 d2 C 60 8 10 65 88 c. Draw ACFP and DCFP for the frequency distribution table. Answers: The less than ogive (ACFP) for the frequency distribution table is as follow: The greater than ogive (DCFP) for the frequency distribution table is as follow: c. How many students have weights less than 70 Kg? Answer: The students, whose have weights less than 70 Kg equals to 34. e. Calculate the Pearson coefficient of skewness for the given data. Answer: To calculate the Pearson coefficient of skewness (SK ) for the given data we must calculate the standard deviation S, where we have S 11.78 . Therefore, we have: SK x xˆ S 65 65 0 11.78 So, the distribution of data is symmetric. 9. The following table gives the life times of 400 neon lamps: - 10 - N.o.C. 1 2 3 4 5 6 7 8 Lifetime (in hours) 200 → 300 300 → 400 400 → 500 500 → 600 600 → 700 700 → 800 800 → 900 900 → 1000 Number of lamps 14 56 60 76 64 52 40 38 a. Represent the given information (or data) by a histogram. Answer: The histogram for the data of above frequency distribution table is as follow: b. How many lamps have a life time of more than 700 hours? Answer: The number of lamps, which have a life time of more than 700 hours are: 52+40+38 =130 10. The following table gives the distribution of students of two sections according to the grades obtained by them: Grade F D C B A Section A Frequency 3 9 17 12 9 Section B Frequency 5 19 15 10 1 Represent the grades of the students of both the sections on the same graph by multiple bar charts. From the multiple bar charts compare the performance of the two sections. Answer: The by two frequency polygons for the data of above frequency table is as follow: - 11 - 11. The following frequency distribution represents marks of an examination of 50 students of a class: Class Limit Class Class Relative Frequency Boundaries Midpoint Frequency 2− 6 7 − 11 12 − 16 17 − 21 22 − 26 Total Ascending Cumulative Frequency (ACF) 6 0.24 36 0.12 8 50 Then: a. Complete the above frequency distribution table. Answer: The given frequency distribution table have the following form: Class Limit Class Boundaries Class Midpoint Frequency Relative Frequency A.C.F. 2− 6 7 − 11 12 − 16 17 − 21 22 − 26 Total 1.5 → 6.5 6.5 → 11.5 11.5 → 16.5 16.5 → 21.5 21.5 → 26.5 -------------- 4 9 14 19 24 ---------- 6 12 18 6 8 50 0.12 0.24 0.36 0.12 0.16 1 6 18 36 42 50 ----- b. Calculate the variance and standard deviation for the above frequency distribution table. Answers: Fist we will calculate the mean x of the given frequency distribution table. x m 1 f i f x i 1 i i 690 13.8 50 The variance of the given frequency distribution table given by the following relation. S2 m 1 fi 1 i 1 m f x i 1 i i x 2 Therefore, we become that the standard deviation S is: - 12 - 1848 37.71 49 S S 2 37.71 6.14 12. The points scored by a team in a series of matches are as follows: 17 10 2 24 7 48 27 10 15 8 5 7 14 18 8 28 Then: a. Calculate the mean and standard deviation of the given data. Answers: The mean of the given data is: x n 1 n x i 1 248 15.5 16 i To calculate the standard deviation, we must calculate the variance at first. We calculate the variance by the following relation. 1 n 1 S2 m x i 1 i x 2 2038 135.87 15 Therefore, we become that: S S 2 135.87 11.656 b. Calculate the standard score of the value (7) in the given data. Answer: The standard score of the value (7) in the given data is: xi x 7 15.5 zx S i z7 0.729 11.656 c. Calculate the coefficient of variation for the given data. Answer: The coefficient of variation for the given data is: CV S 11.656 100 100 75.2 % x 15.5 d. Calculate Q1 , Q2 , Q3 , LF and HF for the given data. Answers: To calculate the quartiles we must order the given data. So we have the order of data as follow: x2 5 x1 2 7 7 8 8 10 10 14 15 17 18 24 27 28 x16 48 Now, to calculate a quartile Qr we must determine q r the rank of this quartile, where we have: qr r (n 1) 4 k s ; k , s [0 , 1) So the rank q1 of the first quartile Q1 equals to: q1 (16 1) 4 4.25 Therefore, the first quartile Q1 equals to: Q1 7 0.25 (8 7) 7.25 The rank q 2 of the second quartile Q2 equals to: q2 2 (16 1) 4 - 13 - 8.5 Therefore, the second quartile Q2 equals to: Q2 10 0.5 (14 10) 12 The rank q 3 of the third quartile Q3 equals to: q3 3 (16 1) 4 12.75 Therefore, the third quartile Q3 equal to: Q3 18 0.75 (24 18) 22.5 For the lower fence LF we have: LF Q1 1.5(Q3 Q1 ) 7.25 1.5 (22.5 7.25) 15.625 For the higher fence HF we have: HF Q3 1.5(Q3 Q1 ) 22.5 1.5 (22.5 7.25) 45.375 d. Draw the Boxplot for the given data. Answer: The Boxplot for the given data is as follow: 13. Consider the following frequency distribution table: Class Class No. Boundaries 1 2→ 6 2 3 4 5 18 → 22 Total Class Relative Percentage Frequency Midpoint Frequency % 4 0.25 20 0.25 8 40 A.C.F 22 32 a. Complete the above frequency distribution table. Answer: The above frequency distribution table has the following form: Class No. Class Boundaries Class Midpoint 1 2 3 4 5 Total 2→ 6 6 → 10 10 → 14 14 → 18 18 → 22 --------- 4 8 12 16 20 ---------- Frequency 4 10 8 10 8 40 Relative Frequency 0.10 0.25 0.20 0.25 0.20 1 Percentage % 10 25 20 25 20 100 b. Draw the histogram, polygon and ACFP for the above frequency distribution table. - 14 - A.C.F 4 14 22 32 40 ------ Answers: The histogram for the above frequency distribution table has the following figure. The polygon for the above frequency distribution table has the following figure. The ACFP for the above frequency distribution table has the following figure. c. Calculate the mode(s) for the above frequency distribution table. Answer: We note that the given frequency distribution table have two modes. The first mode on the class boundary 6 10 . That means, the class boundary 6 10 is a modal class. Therefore, the first mode is: xˆ1 Lˆ1 d11 d11 d21 C1 6 Where we have C 1 4 and: L1 6 is the lower bound of the first modal class, - 15 - 6 4 9 62 d11 10 4 6 is the difference between the first modal class and previous class, d21 10 8 2 is the difference between the first modal class and subsequent class. The second mode on the class boundary 14 18 . That means, the class boundary 14 18 is a modal class. Therefore, the second mode is: xˆ2 Lˆ2 d21 d21 d22 C 2 14 2 4 16 22 Where we have C 2 4 and: L2 14 d21 is the lower bound of the second modal class, 10 8 2 is the difference between the second modal class and previous class, d22 10 8 2 is the difference between the second modal class and subsequent class. 14. Consider the marks obtained (out of 100 marks) by 50 students of class X of a school: 10 92 92 92 92 20 88 40 88 40 36 80 50 80 50 92 70 50 70 50 95 72 56 72 56 40 70 60 70 60 50 36 70 36 70 56 40 60 40 60 60 36 60 36 60 70 40 88 40 88 Then: a. Calculate P10 , P50 and P93 . Answers: To calculate a percentile Pr first we have to arrange the data. So we have the following ordered data: x1 10 36 40 50 56 60 70 70 88 92 x 2 20 36 40 50 56 60 70 72 88 92 36 40 40 50 60 60 70 72 88 92 36 40 40 50 60 60 70 80 88 x 49 92 36 40 50 56 60 70 70 80 92 x 50 95 Now, to calculate a percentile Pr we must determine pr the rank of this percentile, where we have: pr r (n 1) 4 k s ; k , s [0 , 1) Then we have Pr xk s (x k 1 x k ) for any r 1, 2, ..., 99 . So the rank p10 for the percentile P10 equals to: p10 10 (50 1) 100 5.1 Therefore, the percentile P10 equals to: P10 x 5 0.1(x 6 x 5 ) 36 0.1(36 36) 36 - 16 - To calculate the percentile P50 we must determine the rank p50 of this percentile, where we have: 50 (50 1) p50 100 25.5 Therefore, the percentile P50 equals to: P50 x 25 0.5 (x 26 x 25 ) 60 0.5 (60 60) 60 Finally, the rank p93 for the percentile P93 equals to: p93 93 (50 1) 100 47.43 Therefore, the percentile P93 equals to: P93 x 47 0.43 (x 48 x 47 ) 92 0.43 (92 92) 92 b. Calculate D3 , D5 and D8 . Answers: To calculate a decile Dr first we have to arrange the data. Second we must determine dr the rank of this decile Dr , where we have: dr r (n 1) 10 k s ; k , s [0 , 1) Then we have Dr xk s (xk 1 xk ) for any r 1, 2, ..., 10 . So we have: d3 3 (50 1) 10 15.3 Therefore, the decile D3 equals to: D3 x15 0.3 (x16 x15 ) 50 0.3 (50 50) 50 The rank d 5 of the decile D5 equals to: d5 5 (50 1) 10 25.5 Therefore, the decile D5 equals to: D5 x 25 0.5 (x 26 x 25 ) 60 0.5 (60 60) 60 Finally, the rank of the decile D8 equals to: d8 8 (50 1) 10 40.8 Therefore, the decile D8 equals to: D8 x 40 0.8 (x 41 x 40 ) 80 0.8 (88 80) 86.4 c. Calculate Q1 , Q2 and Q3 . Answers: To calculate a quartile Qr first we have to arrange the data. Second we must determine a quartile Qr the rank of this decile Qr , where we have: - 17 - qr r (n 1) k s ; k 4 , s [0 , 1) Then we have Qr xk s (xk 1 xk ) for any r 1, 2, 3 . So the rank q1 of the first quartile Q1 equals to: q1 (50 1) 4 12.75 Therefore, the first quartile Q1 equals to: Q1 x12 0.75 (x13 x12 ) 40 0.75 (40 40) 40 The rank q 2 of the second quartile Q2 equals to: 2 (50 1) q2 25.5 4 Therefore, the second quartile Q2 equals to: Q2 x 25 0.5 (x 26 x 25 ) 60 0.5 (60 60) 60 Finally, the rank q 3 of the third quartile Q3 equals to: q3 3 (50 1) 4 38.25 Therefore, the third quartile Q3 equals to: Q3 x 38 0.75 (x 39 x 38 ) 72 0.25 (80 72) 74 d. chick if the given data have extreme values or not? And draw the Boxplot of them. Answers: To chick if the given data have extreme values or not we must calculate the lower fence (LF) and the higher fence (HF). For the lower fence LF we have: LF Q1 1.5(Q3 Q1 ) 40 1.5 (74 40) 11 This means that there is no small extreme value. For the higher fence HF we have: HF Q3 1.5(Q3 Q1 ) 74 1.5 (74 40) 125 This means that there is no large extreme value. So, we have not extreme values. The Boxplot for the given data is as follow: 15. The daily sale of sugar (in Kg) in a certain grocery shop is given below: - 18 - Monday Tuesday Wednesday Thursday Friday Saturday 75 120 12 50 70.5 140.5 a. Calculate the average daily sale of sugar. Answer: The average daily sale of sugar equals to the mean of the given data. Therefore, the average daily sale of sugar is: x n 1 n x i 1 i 468 78 6 . b. Calculate the variance and the standard deviation of the above data. Answer: Since the value of mean is known, we will calculate the variance by the relation S2 1 n 1 m x i 1 i x 2 . So we have: S2 1 n 1 m x i 1 i x 2 10875.5 2175.1 5 Therefore, the standard deviation of the above data equals to: S S 2 2175.1 46.64 c. Determine the coefficient of variation. Answer: The coefficient of variation for the given data equals to: CV S 46.64 100 x 78 100 59.79 % 16. Let the following data be marks obtained (out of 100) by 10 students in a test: 45 45 63 Then: a. Calculate Q1 , Q2 and Q3 . 76 67 84 75 48 62 65 Answers: To calculate a quartile Qr first we have to arrange the data. So we have the following ordered data: x1 45 45 48 62 63 65 67 75 76 x10 84 Now, to calculate a quartile Qr we must determine q r the rank of this quartile, where we have: qr r (n 1) 4 k s ; k , s [0 , 1) Then we have Qr xk s (xk 1 xk ) for any r 1, 2, 3 . So the rank q1 of the first quartile Q1 equals to: q1 (10 1) 4 2.75 Therefore, the first quartile Q1 equals to: Q1 x 2 0.75 (x 3 x 2 ) 45 0.75 (48 45) 47.25 The rank q 2 of the second quartile Q2 equals to: - 19 - 2 (10 1) q2 4 5.5 Therefore, the second quartile Q2 equals to: Q2 x 5 0.5 (x 6 x 5 ) 63 0.5 (65 63) 64 Finally, the rank q 3 of the third quartile Q3 equals to: q3 3 (10 1) 4 8.25 Therefore, the third quartile Q3 equals to: Q3 x 8 0.25 (x 9 x 8 ) 75 0.25 (76 75) 75.25 b. Calculate the IQR. Answer: The interquartile range IQR Q3 Q1 equals to: IQR 75.25 46.5 28.75 c. Have the given data got extreme values? Answers: In order to check for extreme values in the given data we must calculate the lower fence (LF) and the higher fence (HF). For the lower fence LF we have: LF Q1 1.5 (Q3 Q1 ) 46.5 1.5 28.75 3.375 This means that there is no small extreme value. For the higher fence HF we have: HF Q3 1.5 (Q3 Q1 ) 75.25 1.5 28.75 118.375 This means that there is no large extreme value. Therefore, the given data have not extreme values. d. Construct the box plot for the given data. Answer: The box plot for the given data is as follow: 17. Consider the following data: 65 70 70 10 20 40 60 65 85 90 90 150 75 75 75 80 Then: a. Calculate Q1 , Q2 and Q3 . - 20 - Answers: To calculate a quartile Qr first we have to arrange the data. So we have the following ordered data: x1 10 75 x 2 20 75 40 75 60 80 85 65 90 65 90 70 70 x16 150 Now, to calculate a quartile Qr we must determine q r the rank of this quartile, where we have: qr r (n 1) k s ; k 4 , s [0 , 1) Then we have Qr xk s (xk 1 xk ) for any r 1, 2, 3 . So the rank q1 of the first quartile Q1 equals to: (16 1) q1 4 4.25 Therefore, the first quartile Q1 equals to: Q1 x 4 0.25 (x 5 x 4 ) 60 0.25 (65 60) 61.25 The rank q 2 of the second quartile Q2 equals to: 2 (16 1) q2 4 8.5 Therefore, first quartile Q2 equals to: Q2 x 8 0.5 (x 9 x 8 ) 70 0.5 (75 70) 72.5 Finally, the rank q 3 of the third quartile Q3 equals to: q3 3 (16 1) 4 12.75 Therefore, first quartile Q3 equals to: Q3 x12 0.75 (x13 x12 ) 80 0.75 (85 80) 83.75 b. Calculate the IQR. Answer: The interquartile range IQR equal to: IQR Q3 Q1 83.75 61.25 22.5 c. Have the given data extreme values? Answers: In order to check for extreme values in the given data we must calculate the lower fence (LF) and the higher fence (HF). For the lower fence LF we have: LF Q1 1.5 (Q3 Q1 ) 61.25 1.5 22.5 27.5 This means that the values xs 10 and x 20 are extreme values. As well we have for the higher fence HF: HF Q3 1.5 (Q3 Q1 ) 83.75 1.5 22.5 117.5 This means that the value x 150 is an extreme value. - 21 - d. Construct the box plot for the given data. Answer: The boxplot for the given data is as follow: e. Comment the skewness of these distribution data. Answer: We note that the distribution of the data is rounded to the right by a small amount due to the extreme value, which is far from the data. 18. Consider the following ordered data: 40 45 55 65 ? ? 75 75 78 183 Then use the suitable measure to calculate the central tendency and dispersion for the given data. Answers: The measures of the central tendency are the mean, median and mode. Because of the loss of two values in the middle we cannot use the mean and median, therefore, we try by using the mode. We note that a one mode only exist, it is x̂ 75 . Because of the loss of two values in the middle we cannot use the standard deviation as measure for the dispersion. Therefore, we can use the range or the interquartile range for the dispersion. But we note that the value x10 183 is an extreme value, so we will use the interquartile range as measure of dispersion. The interquartile range is IQR Q3 Q1 . Therefore, to calculate the interquartile range, we must calculate the first and third quartiles. Now, to calculate a quartile Qr we must determine q r the rank of this quartile, where we have: qr r (n 1) 4 k s ; k , s [0 , 1) Then we have Qr xk s (xk 1 xk ) for any r 1, 2, 3 . So the rank q1 of the first quartile Q1 equals to: q1 (10 1) 4 2.75 Therefore, the first quartile Q1 equals to: Q1 x 2 0.75 (x 3 x 2 ) 45 0.75 (55 45) 52.5 The rank of Q3 is: q3 3 (10 1) 4 8.25 Therefore, the third quartile Q3 equals to: Q3 x 8 0.25 (x 9 x 8 ) 75 0.25 (78 75) 75.75 So we get: - 22 - IQR Q3 Q1 75.75 52.5 23.25 19. Consider the following ordered data: -15 20 40 50 65 65 70 73 75 137 a. Have the given data got extreme values? Answer: In order to check the extreme values in the data we have to calculate the lower fence LF and the higher fence HF: HF Q3 1.5 (Q3 Q1 ) and LF Q1 1.5 (Q3 Q1 ) Therefore, we must calculate the first and third quartiles for the given data. Now, to calculate a quartile Qr we must determine q r the rank of this quartile, where we have: qr r (n 1) 4 k s ; k , s [0 , 1) Then we have Qr xk s (xk 1 xk ) for any r 1, 2, 3 . So the rank q1 of the first quartile Q1 equals to: q1 (10 1) 4 2.75 Therefore, the first quartile Q1 equals to: Q1 x 2 0.75 (x 3 x 2 ) 20 0.75 (40 20) 35 The rank q 3 of the first quartile Q3 equals to: q3 3 (10 1) 8.25 4 Therefore, the third quartile Q3 equals to: Q3 x 8 0.25 (x 9 x 8 ) 73 0.25 (75 73) 73.5 We become Q3 Q1 73.5 35 38.5 . So we get: The higher fence HF equal to: HF Q3 1.5 (Q3 Q1 ) 73.5 1.5 38.5 131.25 This means that the value x 137 is an extreme value. The lower fence HF equals to: LF Q1 1.5 (Q3 Q1 ) 35 1.5 38.5 22.75 This means that there is no small extreme value. b. Use the suitable measure to calculate the central tendency and dispersion for the given data. Answers: Although 137 is extreme value, but the suitable measure for the central tendency of the given data is the mean. Where we have: x 1 n n x i 1 i When we calculate the median, we find: - 23 - 580 58 10 x x5 x6 2 65 65 65 2 Here we note the mean is better than the median for the central tendency measure for these data. The suitable measure of dispersion for the given data is the interquartile range, because we have an extreme value. Where we have: IQR Q3 Q1 73.5 35 38.5 20. The mean age of six persons is 49 years. The ages of five of these six persons are 55, 39, 44, 51, and 45 years respectively. What is the age of the sixth person? Answer: We spouse that the age of the sixth person is x, then we have: 49 = x = Therefore, we get: 55 + 39 + 44 + 51+ 45 + x 6 x =294 234=60 Also, the age of the sixth person is 60 years. 21. The following observations have been arranged in ascending order. 29, 32, 48, 50, x, x + 2, 72, 78, 84, 95 Now, if the median of the data is 63, then calculate the value of x. Answer: Since the number of data is even, then the median value is given in the following relation: 63 x x5 x6 2 So we get x 62 . x (x 2) x 1 2 22. Consider the following two data sets (we will assume that they are degrees of students in the 60-degree test). Data set X: 8 12 25 37 40 Data set Y: 23 27 40 52 55 Notice that each value of the second data set is obtained by adding 15 to the corresponding value of the first data set. Then: a. Calculate the mean for each of these two data sets. Comment on the relationship between the two means. Answers: The mean of data set X is: x 1 n n x i 1 i 122 24.4 5 The mean of data set Y is: y 1 n n y i 1 i 197 39.4.4 5 So we find that y x 15 also. This means that if we perform a constant withdrawal of all data, the average value will change by this constant. - 24 - b. Calculate the standard deviation for each of these two data sets. Comment on the relationship between the two standard deviations. Answers: The variance for data set X is: 1 n 1 S X2 m x i 1 i x 2 825.2 206.3 4 Therefore, we become that: S X S X2 206.3 14.36315 The variance for data set Y is: 1 n 1 SY2 m y i 1 i y 2 825.2 206.3 4 Therefore, we become that: SY SY2 206.3 14.36315 So we find that S X SY . Important result: This means that if we perform a constant withdrawal of all data, the average value will change by this constant but the standard deviation value will not change. Green points are the means c. Calculate the standard score of the value (40) in data sets X and Y. Answer: The z-score for data set X is: x x z 40,X SX 40 24.4 1.78234 14.36315 The z-score for data set Y is: z 40,Y y y 40 39.4 0.04177 SY 14.36315 So we find that z 40,Y z 40,X . This means that if we perform a constant withdrawal of all data, the zscore value will change but not with this constant. On the other hand, this result means that the student which has 40 in set X best level than the student which has 40 in set Y. d. Calculate the coefficient of variation for each of these two data sets, then compare them. Answers: The coefficient of variation for data set X is: CVX SX 100 14.36315 100 58.87 % 24.4 100 14.36315 100 36.455 % 39.4 x The coefficient of variation for data set Y is: CVY SY y So we find that CVY CVX and CVY CVX 15 . This means that if we perform a constant withdrawal of all data, the coefficient of variation value will change but not with this constant. - 25 - 23. Consider the following three data sets. Data set X: 5, 10, 15, 20, 25 Data set Y: 10, 20, 30, 40, 50 Data set Z: 20, 40, 60, 80, 100 a. Calculate the mean for each of these data sets. Comment on the relationship between the three means. Answers: The mean of data set X is: x 1 n n x i 1 i 75 15 5 The mean of data set Y is: y 1 n n i 150 30 5 i 300 60 5 y i 1 The mean of data set Z is: z 1 n n z i 1 So we find that z 2y 4 x also. This means that if we rotate the data at a fixed base, the mean will be rotate with the same fixed base. b. Calculate the standard deviation for each of these data sets. Comment on the relationship between the two standard deviations. Answers: The variance for data set X is: S X2 1 n 1 m x i 1 i x 2 251.8 62.95 4 Therefore, we become that: S X S X2 62.95 7.905694 The variance for data set Y is: SY2 1 n 1 m y i 1 i y 2 1000 250 4 Therefore, we become that: SY SY2 250 15.81139 The variance for data set Z is: SZ2 1 n 1 m z i 1 i z 2 8500 2125 4 Therefore, we become that: SZ SZ2 2125 46.09772 So we find that SY2 4 S X2 and therefore, we get SY 2 S X . Important result: This means that if we rotate data at a fixed base, the standard deviation will be rotate with the same fixed base. c. Calculate the standard score of the value (20) in data set Y. - 26 - Answer: The z-score for data set Y is: y y z 20,Y SY 20 30 0.63256 15.81139 d. Calculate the coefficient of variation for each of these data sets, then compare them. Answers: The coefficient of variation for data set X is: CVX SX 100 7.905694 100 52.70 % 15 100 15.81139 100 52.70 % 30 x The coefficient of variation for data set Y is: CVY SY y The coefficient of variation for data set Z is: SZ 31.62278 100 52.70 % z 60 Therefore, we find that CVY CVX and CVY 2CVX . This means that if we rotate the data at a fixed CVZ 100 base, the coefficient of variation value will not change. So we find that CVZ CVY CVX and CVZ 2 CVY 4 CVX . 24. Consider the data set 15, 15, 15, 15, 15, 15. Then: a. Calculate the standard deviation. Answer: To calculate the variance for data set X we must first calculate the mean of data set X, where we have: x 1 n n x i 1 i 65 5 6 The variance for data set X is: S2 1 n 1 m x i 1 i x 2 0 0 5 Therefore, we become that: S S2 0 0 b. Is its value of the standard deviation equal to zero? If yes, why? Answers: Yes, the standard deviation equal to zero, because all values are equal to 5. Therefore, there are no values scatter around this value. 25. Consider the following polygon of grouped data, representing the degree of an examination of 60 students: - 27 - Then: a. Prepare the frequency distribution table of this data. Answer: The frequency distribution table of this data is as follow: Class Limit 3-5 Class Boundaries 2.5→5.5 Class Midpoint 4 Frequency Percent Frequency 8.33 % ACF DCF 5 Relative Frequency 0.0833 5 60 6-8 5.5→8.5 7 10 0.16667 16.67% 15 55 9 - 11 8.5→11.5 10 15 0.25000 25.00% 30 45 12 - 14 11.5→14.5 13 20 0.33333 33.33% 50 30 15 - 17 14.5→17.5 16 10 0.16667 16.67% 60 10 Total ---------- ------ 60 1 100 % ----- ---- b. Draw the histogram, ascending cumulative frequency polygon (less than ogive) and descending cumulative frequency polygon (greater than ogive) for this table. Answers: The histogram for this table is as follow: The ogive for this table is as follow: - 28 - The ascending cumulative frequency polygon The descending cumulative frequency polygon c. Calculate the mean, median and mod for this table. Answers: The mean of the given frequency distribution table is: x m 1 f f x i i 1 i i 660 11 60 For the median: The median class is 8.5 11.5 , and we have 1 fi 30. So, the median for the data in above 2 table is: x 1 f F i 2 : L f f C 8.5 30 30 15 15 3 11.5 For the mode: The given frequency distribution table has a unique mode, and the modal class is 11.5 14.5 . So we have: xˆ Lˆ d1 d1 d2 C 11.5 5 3 12.5 5 10 26. Consider the following histogram of grouped data, representing the temperatures in 50 cities in Europe: - 29 - Then: a. Prepare the frequency distribution table of this data. The frequency distribution table of this data is as follow: Class Limit 1-5 Class Boundaries 0.5→5.5 Class Midpoint 3 Frequency Percent Frequency 14 % ACF DCF 7 Relative Frequency 0.14 7 50 6 - 10 5.5→10.5 8 9 0.18 18 % 16 43 11- 15 10.5→15.5 13 14 0.28 28 % 30 34 16 - 20 15.5→20.5 18 12 0.24 24 % 42 20 21 - 25 20.5→25.5 23 8 0.16 16 % 50 8 Total ---------- -------- 50 1 100 % ----- ----- b. Draw the polygon and ACFP for this table. Answer: The polygon for this table has the following figure: The ascending cumulative frequency polygon (ACFP) for this table has the following figure: - 30 - c. Calculate the standard deviation for this table. Answer: To calculate the standard deviation, we must calculate the variance for the given data, Therefore, first we will calculate the mean for the given data, where we have: x m 1 f x f i i 1 i i 675 13.5 50 So, the variance for the given data is: S2 1 fi 1 i 1 m m f x i 1 i i x 2 251.25 5.128 49 Therefore, the standard deviation equals of the given data equals to: S S 2 5.128 2.2644 27. Consider the following ACFP of grouped data, representing the weight to 30 fruit boxes: a. Prepare the frequency distribution table of this data. Answer: The frequency distribution table of this data is as follow: - 31 - Class Limit Class Midpoint 7.5 Frequency 5.5 – 9.5 Class Boundaries 5 → 10 Percent Frequency 12.5% ACF DCF 5 Relative Frequency 0.125 5 40 10.5 – 14.5 10 → 15 12.5 15 0.375 37.5% 20 35 15.5 – 19.5 15 → 20 17.5 5 0.125 12.5% 25 20 20.5 – 24.5 20 → 25 22.5 10 0.250 25% 35 15 25.5 – 29.5 25 → 30 27.5 5 0.125 12.5% 40 5 Total ---------- ------ 40 1 100 % ----- ---- b. Draw the histogram and polygon for this table. Answers: The histogram for this table is as follow: The polygon for this table is as follow: - 32 -