Download Statistics and Probability Chapter 1 Questions

EXERCISES of CHAPTER (1)- (Descriptive Statistics)
1. The value of π to 50 decimal places is given below:
3.14159265358979323846264338327950288419716939937510
a. Make a frequency table of the digits from 0 to 9 after the decimal point.
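Answer: The frequency table of the digits from 0 to 9 after the decimal point is as follows:

| Digit     | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Total |
|-----------|---|---|---|---|---|---|---|---|---|---|-------|
| Frequency | 2 | 5 | 5 | 8 | 4 | 5 | 4 | 4 | 5 | 8 | 50    |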
b. What are the most and the least frequently occurring digits?
Answer: The most frequently occurring digits are 3 and 9 (8 times each), and the least frequently occurring digit is 0 (2 times).
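The count in the table can be checked with a short script. This is only a minimal sketch; the names `PI_DECIMALS` and `digit_frequencies` are illustrative and not part of the exercise.

```python
from collections import Counter

# The 50 digits of pi after the decimal point, copied from the exercise.
PI_DECIMALS = "14159265358979323846264338327950288419716939937510"


def digit_frequencies(digits):
    """Return a dict mapping each digit 0..9 to its number of occurrences."""
    counts = Counter(digits)
    return {d: counts.get(str(d), 0) for d in range(10)}


freq = digit_frequencies(PI_DECIMALS)
# Digits that reach the largest and the smallest frequency (ties are kept).
most = [d for d, f in freq.items() if f == max(freq.values())]
least = [d for d, f in freq.items() if f == min(freq.values())]
print(freq, most, least)
```

Keeping ties in `most` and `least` matters here, because two digits share the highest frequency.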
2. Consider the data shown in the following frequency table:
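| Number of subjects in which student failed | 3 | 1  | 2  | 0 | Total |
|--------------------------------------------|---|----|----|---|-------|
| Frequency                                  | 2 | 18 | 12 | 8 | 40    |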
a. Represent them graphically using pie chart and bar chart.
Answers: The pie chart for the given data is:
The bar chart for the given data is:
b. Compute the range of this data.
Answer: To calculate the range of the given data we must arrange the values in ascending order. The range of the given data is then:
$R = x_m - x_1 = 3 - 0 = 3$
c. Compute the mean of this data.
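Answer: The mean of the given data is:
$\bar{x} = \frac{1}{\sum_{i=1}^{m} f_i} \sum_{i=1}^{m} f_i x_i = \frac{3 \cdot 2 + 1 \cdot 18 + 2 \cdot 12 + 0 \cdot 8}{40} = \frac{48}{40} = 1.2$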
3. The following data give the results of a sample survey. The letters A, B and C represent three
categories:
A, C, C, B, B, C, A, B, C, C,
B, C, C, B, C, C, C, C, B, C,
A, B, C, C, B, C, B, A, C, C
a. Prepare a frequency table of this data.
Answer: The frequency table of the given data is as follows:

| Category  | A | B | C  | Total |
|-----------|---|---|----|-------|
| Frequency | 4 | 9 | 17 | 30    |
b. Calculate the relative frequencies and percentages for all symbols.
Answer: The relative frequencies and percentages for all symbols in the given data are as follows:

| Category | Frequency | Relative frequency | Percentage |
|----------|-----------|--------------------|------------|
| A        | 4         | 0.133              | 13.3%      |
| B        | 9         | 0.300              | 30.0%      |
| C        | 17        | 0.567              | 56.7%      |
| Total    | 30        | 1                  | 100%       |
c. What percentage of the elements belongs to category B?
Answer: The percentage of the elements that belong to category B is 30%.
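The frequency, relative frequency and percentage columns can be reproduced with a few lines of code. This is a minimal sketch; `responses` and `relative_frequencies` are illustrative names, and the data are the 30 letters listed in the exercise.

```python
from collections import Counter

# Survey responses copied from exercise 3 (30 values).
responses = (
    "A C C B B C A B C C "
    "B C C B C C C C B C "
    "A B C C B C B A C C"
).split()


def relative_frequencies(values):
    """Map each category to (frequency, relative frequency, percentage)."""
    counts = Counter(values)
    n = len(values)
    return {k: (counts[k], counts[k] / n, 100 * counts[k] / n)
            for k in sorted(counts)}


table = relative_frequencies(responses)
for category, (f, rel, pct) in table.items():
    print(f"{category}: f={f}, relative={rel:.3f}, percentage={pct:.1f}%")
```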
d. Draw a bar chart and a pie chart for the frequency table.
Answers: The bar chart for the given data is:
The pie chart for the given data is:
4. A company manufactures car batteries of a particular type. The lives (in years) of 40 such
batteries were recorded as follows:
2.6, 3.5, 2.5, 3.5, 3.0, 2.3, 4.4, 3.2, 3.7, 3.2,
3.4, 3.9, 3.2, 3.4, 3.3, 3.2, 2.05, 3.8, 2.9, 3.2,
4.1, 3.5, 3.2, 4.05, 3.0, 4.3, 3.1, 3.7, 4.4, 3.7,
2.8, 3.4, 4.45, 2.9, 3.5, 3.2, 3.8, 3.6, 4.2, 2.6
Construct a frequency distribution table for this data, using class-boundary intervals of width 0.5, starting from the class boundary 2 → 2.5 (the unit of measurement is 0.1).
Answer: The frequency distribution table for the given data is as follows:
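| No. of class | Class limits | Class boundaries | Midpoint | Frequency | Relative frequency | Percentage | A.C.F. |
|--------------|--------------|------------------|----------|-----------|--------------------|------------|--------|
| 1            | 2.05 − 2.45  | 2 → 2.5          | 2.25     | 2         | 0.05               | 5%         | 2      |
| 2            | 2.55 − 2.95  | 2.5 → 3.0        | 2.75     | 6         | 0.15               | 15%        | 8      |
| 3            | 3.05 − 3.45  | 3.0 → 3.5        | 3.25     | 14        | 0.35               | 35%        | 22     |
| 4            | 3.55 − 3.95  | 3.5 → 4.0        | 3.75     | 11        | 0.275              | 27.5%      | 33     |
| 5            | 4.05 − 4.45  | 4.0 → 4.5        | 4.25     | 7         | 0.175              | 17.5%      | 40     |
| Total        |              |                  |          | 40        | 1                  | 100%       |        |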
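The class frequencies above can be recomputed by assigning each battery life to a class. The sketch below assumes that each class contains its lower boundary but not its upper one (so 2.5 falls in 2.5 → 3.0), which agrees with the frequencies in the table; `class_frequencies` is an illustrative helper name.

```python
from bisect import bisect_right

# Battery lives (in years) copied from exercise 4.
lives = [
    2.6, 3.5, 2.5, 3.5, 3.0, 2.3, 4.4, 3.2, 3.7, 3.2,
    3.4, 3.9, 3.2, 3.4, 3.3, 3.2, 2.05, 3.8, 2.9, 3.2,
    4.1, 3.5, 3.2, 4.05, 3.0, 4.3, 3.1, 3.7, 4.4, 3.7,
    2.8, 3.4, 4.45, 2.9, 3.5, 3.2, 3.8, 3.6, 4.2, 2.6,
]
boundaries = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]


def class_frequencies(data, bounds):
    """Count values per class; class i is the interval [bounds[i], bounds[i+1])."""
    counts = [0] * (len(bounds) - 1)
    for x in data:
        if not bounds[0] <= x < bounds[-1]:
            raise ValueError(f"value {x} lies outside the class boundaries")
        # bisect_right finds the first boundary strictly greater than x.
        counts[bisect_right(bounds, x) - 1] += 1
    return counts


frequencies = class_frequencies(lives, boundaries)
print(frequencies)
```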
5. The distance (in km) of 40 engineers from their residence to their place of work were found as
follows:
5, 19, 7, 12, 3, 10, 9, 14, 10, 12,
7, 0.5, 20, 17, 8, 9, 25, 18, 3, 6,
11, 11, 5, 15, 13, 32, 12, 15, 7, 17,
15, 7, 12, 16, 18, 6, 31, 2, 3, 12
a. Construct a frequency distribution table with 5 classes for the data given above.
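Answer: To construct the frequency distribution table with 5 classes for the data given above, we must first calculate the range R of the data, using the largest value $x_l$ and the smallest value $x_s$:
$R = x_l - x_s = 32 - 0.5 = 31.5$
Therefore, the width of each class (between class boundaries) equals:
$C = \frac{R + 1}{k} = \frac{32.5}{5} = 6.5$

| Class limits | Class boundaries | Midpoint | Frequency | Relative frequency | Percentage | A.C.F. | D.C.F. |
|--------------|------------------|----------|-----------|--------------------|------------|--------|--------|
| 0.5 − 6      | 0 → 6.5          | 3.25     | 9         | 0.225              | 22.5%      | 9      | 40     |
| 7 − 12.5     | 6.5 → 13         | 9.75     | 16        | 0.400              | 40.0%      | 25     | 31     |
| 13.5 − 19    | 13 → 19.5        | 16.25    | 11        | 0.275              | 27.5%      | 36     | 15     |
| 20 − 25.5    | 19.5 → 26        | 22.75    | 2         | 0.050              | 5.0%       | 38     | 4      |
| 26.5 − 32    | 26 → 32.5        | 29.25    | 2         | 0.050              | 5.0%       | 40     | 2      |
| Total        |                  |          | 40        | 1                  | 100%       |        |        |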
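The ascending (A.C.F.) and descending (D.C.F.) cumulative frequency columns follow directly from the frequency column. A minimal sketch, with illustrative function names:

```python
from itertools import accumulate

# Class frequencies of the five distance classes in exercise 5.
frequencies = [9, 16, 11, 2, 2]


def ascending_cumulative(freqs):
    """Number of observations below the upper boundary of each class."""
    return list(accumulate(freqs))


def descending_cumulative(freqs):
    """Number of observations at or above the lower boundary of each class."""
    return list(accumulate(reversed(freqs)))[::-1]


acf = ascending_cumulative(frequencies)
dcf = descending_cumulative(frequencies)
print(acf, dcf)
```

The two lists are the heights of the less-than and greater-than ogives at the class boundaries.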
b. Draw the histogram for the data of above frequency distribution table.
Answer: The histogram for the data of the above frequency distribution table is as follows:
c. Draw the polygon for the data of the frequency distribution table.
Answer: The polygon for the data of the above frequency distribution table is as follows:
d. Draw the less-than and greater-than ogives for the data of the frequency distribution table.
Answer: The less-than ogive for the data of the above frequency distribution table is as follows:
The greater-than ogive for the data of the above frequency distribution table is as follows:
e. How many engineers have residence at distance less than 20 km from their workplace?
Answer: The number of engineers who live less than 20 km from their workplace is approximately 36.
f. How many engineers have residence at distance more than 15 km from their workplace?
Answer: The number of engineers who live more than 15 km from their workplace is approximately 40 − 28 = 12.
6. Forty children were asked about the number of hours they watched TV programs in the previous
week. The results were found as follows:
8, 10, 1, 3, 10, 3, 6, 2, 12, 4,
2, 8, 14, 12, 3, 5, 12, 2, 5, 9,
10, 8, 12, 6, 8, 15, 5, 8, 6, 1,
8, 7, 4, 17, 4, 14, 2, 6, 8, 12
a. Construct a frequency distribution table for this data.
Answer: The range of the given data equals $R = 17 - 1 = 16$, and the number of classes is:
$k = \lfloor 3.322 \log_{10} n \rfloor = \lfloor 5.322 \rfloor = 5$
So the class width equals $C = \frac{R + 1}{k} = \frac{17}{5} = 3.4$. We will take $C = 3.5$. Therefore, the length of each class-limit interval equals $C - 1 = 3.5 - 1 = 2.5$.
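| Class limits | Class boundaries | Class midpoint | Frequency | Relative frequency | ACF | DCF |
|--------------|------------------|----------------|-----------|--------------------|-----|-----|
| 1 − 3.5      | 0.5 → 4          | 2.25           | 9         | 0.225              | 9   | 40  |
| 4.5 − 7      | 4 → 7.5          | 5.75           | 11        | 0.275              | 20  | 31  |
| 8 − 10.5     | 7.5 → 11         | 9.25           | 11        | 0.275              | 31  | 20  |
| 11.5 − 14    | 11 → 14.5        | 12.75          | 7         | 0.175              | 38  | 9   |
| 15 − 17.5    | 14.5 → 18        | 16.25          | 2         | 0.050              | 40  | 2   |
| Total        |                  |                | 40        | 1                  |     |     |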
b. Calculate the mean, median and mode for the given data and the frequency distribution
table. What do you note?
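Answers: The mean of the given data is calculated as follows:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{292}{40} = 7.3$
The mean of the frequency distribution table is calculated as follows:
$\bar{x} = \frac{1}{\sum_{i=1}^{m} f_i} \sum_{i=1}^{m} f_i x_i = \frac{307}{40} = 7.675$
To calculate the median of the given data we must arrange the data in ascending order, so we get:
$\tilde{x} = \frac{x_{n/2} + x_{n/2+1}}{2} = \frac{x_{20} + x_{21}}{2} = \frac{7 + 8}{2} = 7.5$
The median of the data in the frequency distribution table is calculated as follows:
We have $\frac{1}{2} \sum f_i = 20$. Therefore, the median class is $4 \to 7.5$. So, the median for the data in the above frequency distribution table is:
$\tilde{x} = L + \frac{\frac{1}{2} \sum f_i - F}{f} \cdot C = 4 + \frac{20 - (20 - 11)}{11} \cdot 3.5 = 7.5$
where $L$ is the lower boundary of the median class, $F$ is the cumulative frequency before the median class, and $f$ is the frequency of the median class.
The mode of the given data is $\hat{x} = 8$.
To calculate the mode of the data in the frequency distribution table, we note that a unique mode exists, although the modal class can be taken as $4 \to 7.5$ or $7.5 \to 11$ (both have frequency 11). So, we have:
a) for the modal class $4 \to 7.5$:
$\hat{x} = \hat{L} + \frac{d_1}{d_1 + d_2} \cdot C = 4 + \frac{2}{2 + 0} \cdot 3.5 = 7.5$
where $\hat{L} = 4$, $d_1 = 11 - 9 = 2$, $d_2 = 11 - 11 = 0$ and $C = 3.5$.
b) for the modal class $7.5 \to 11$:
$\hat{x} = \hat{L} + \frac{d_1}{d_1 + d_2} \cdot C = 7.5 + \frac{0}{0 + 4} \cdot 3.5 = 7.5$
where $\hat{L} = 7.5$, $d_1 = 11 - 11 = 0$, $d_2 = 11 - 7 = 4$ and $C = 3.5$.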
We note that:
The value of the mean for the raw data (7.3) is different from the mean for the frequency distribution table (7.675).
The value of the median for the raw data is not different from the median for the frequency distribution table (both equal 7.5).
The value of the mode for the raw data (8) is different from the mode for the frequency distribution table (7.5).
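The grouped mean and grouped median used above can be sketched in code as follows. The function names are illustrative, and the median routine assumes equal class widths and takes the first class whose cumulative frequency reaches half of the total, which is the convention the solution uses.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)


def grouped_median(lower_bounds, width, freqs):
    """Median of grouped data: L + (n/2 - F) / f * C."""
    half = sum(freqs) / 2
    if half == 0:
        raise ValueError("empty frequency table")
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


# Frequency distribution table of exercise 6.
lower_bounds = [0.5, 4, 7.5, 11, 14.5]
midpoints = [2.25, 5.75, 9.25, 12.75, 16.25]
freqs = [9, 11, 11, 7, 2]

mean = grouped_mean(midpoints, freqs)
median = grouped_median(lower_bounds, 3.5, freqs)
print(mean, median)
```

Applied to the weights table of exercise 8 (lower boundaries 40, 50, 60, 70, 80, width 10, frequencies 6, 10, 18, 10, 6), the same median function returns 65.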
c) Calculate the Pearson coefficient of skewness for the given data.
Answer: To calculate the Pearson coefficient of skewness (SK) for the given data we must calculate the standard deviation S, where we have $S = 4.189$. Therefore, we have:
$SK = \frac{\bar{x} - \hat{x}}{S} = \frac{7.3 - 8}{4.189} = -0.167 < 0$
So, the distribution of the data is left skewed.
7. A sample of 100 children was asked how many times they play computer games for a period of
one week. The following frequency table gives their answers.
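| Times of playing (in hours) | Less than 2 | 2 → 5 | 5 → 10 | 10 → 18 | More than 18 |
|-----------------------------|-------------|-------|--------|---------|--------------|
| Number of children          | 23          | 40    | 28     | 6       | 3            |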
a. Prepare the relative frequency and percentage columns.
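Answer: The relative frequency and percentage columns are as follows:

| Times of playing (in hours) | Number of children | Relative frequency | Percentage |
|-----------------------------|--------------------|--------------------|------------|
| Less than 2                 | 23                 | 0.23               | 23%        |
| 2 → 5                       | 40                 | 0.40               | 40%        |
| 5 → 10                      | 28                 | 0.28               | 28%        |
| 10 → 18                     | 6                  | 0.06               | 6%         |
| More than 18                | 3                  | 0.03               | 3%         |
| Total                       | 100                | 1                  | 100%       |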
b. What percentage of these children plays 10 hours or more weekly?
Answer: The percentage of these children who play 10 or more hours weekly is (6 + 3)% = 9%.
c. Draw the bar chart of the given data.
Answer: The bar chart of the given data is as follows:
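8. Consider the following frequency distribution table, representing the weights of 50 students of a class:

| Weights (in kg) | Number of students | Relative frequency | Percentage | A.C.F. |
|-----------------|--------------------|--------------------|------------|--------|
| 40 → 50         | 6                  |                    |            |        |
| 50 → 60         |                    | 0.20               |            |        |
| 60 → 70         |                    |                    | 36%        |        |
| 70 → 80         |                    |                    |            | 44     |
| 80 → 90         | 6                  |                    |            |        |
| Total           | 50                 |                    |            |        |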
a. Complete the above given frequency distribution table.
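Answer: The completed frequency distribution table has the following form:

| Weights (in kg) | Number of students | Relative frequency | Percentage | A.C.F. | D.C.F. |
|-----------------|--------------------|--------------------|------------|--------|--------|
| 40 → 50         | 6                  | 0.12               | 12%        | 6      | 50     |
| 50 → 60         | 10                 | 0.20               | 20%        | 16     | 44     |
| 60 → 70         | 18                 | 0.36               | 36%        | 34     | 34     |
| 70 → 80         | 10                 | 0.20               | 20%        | 44     | 16     |
| 80 → 90         | 6                  | 0.12               | 12%        | 50     | 6      |
| Total           | 50                 | 1                  | 100%       |        |        |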
b. Calculate mean, median and mode for the frequency distribution table.
Answers: We have:
The mean of the frequency distribution table is:
$\bar{x} = \frac{1}{\sum_{i=1}^{m} f_i} \sum_{i=1}^{m} f_i x_i = \frac{3250}{50} = 65$
The median of the data in the frequency distribution table is calculated as follows:
We have $\frac{1}{2} \sum f_i = 25$. Therefore, the median class is $60 \to 70$. So, the median for the data in the above frequency distribution table is:
$\tilde{x} = L + \frac{\frac{1}{2} \sum f_i - F}{f} \cdot C = 60 + \frac{25 - (34 - 18)}{18} \cdot 10 = 60 + \frac{9}{18} \cdot 10 = 65$
To calculate the mode of the data in the frequency distribution table, we note that a unique mode exists, and the modal class is $60 \to 70$. So, we have:
$\hat{x} = \hat{L} + \frac{d_1}{d_1 + d_2} \cdot C = 60 + \frac{8}{8 + 8} \cdot 10 = 65$
c. Draw ACFP and DCFP for the frequency distribution table.
Answers: The less-than ogive (ACFP) for the frequency distribution table is as follows:
The greater-than ogive (DCFP) for the frequency distribution table is as follows:
d. How many students have weights less than 70 kg?
Answer: The number of students who weigh less than 70 kg is 34.
e. Calculate the Pearson coefficient of skewness for the given data.
Answer: To calculate the Pearson coefficient of skewness (SK) for the given data we must calculate the standard deviation S, where we have $S = 11.78$. Therefore, we have:
$SK = \frac{\bar{x} - \hat{x}}{S} = \frac{65 - 65}{11.78} = 0$
So, the distribution of the data is symmetric.
9. The following table gives the life times of 400 neon lamps:
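| No. of class | Lifetime (in hours) | Number of lamps |
|--------------|---------------------|-----------------|
| 1            | 200 → 300           | 14              |
| 2            | 300 → 400           | 56              |
| 3            | 400 → 500           | 60              |
| 4            | 500 → 600           | 76              |
| 5            | 600 → 700           | 64              |
| 6            | 700 → 800           | 52              |
| 7            | 800 → 900           | 40              |
| 8            | 900 → 1000          | 38              |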
a. Represent the given information (or data) by a histogram.
Answer: The histogram for the data of the above frequency distribution table is as follows:
b. How many lamps have a lifetime of more than 700 hours?
Answer: The number of lamps that have a lifetime of more than 700 hours is:
$52 + 40 + 38 = 130$
10. The following table gives the distribution of students of two sections according to the grades
obtained by them:
| Grade               | F | D  | C  | B  | A |
|---------------------|---|----|----|----|---|
| Section A frequency | 3 | 9  | 17 | 12 | 9 |
| Section B frequency | 5 | 19 | 15 | 10 | 1 |
Represent the grades of the students of both the sections on the same graph by multiple bar charts.
From the multiple bar charts compare the performance of the two sections.
Answer: The two frequency polygons for the data of the above frequency table are as follows:
11. The following frequency distribution represents marks of an examination of 50 students of a
class:
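| Class limits | Class boundaries | Class midpoint | Frequency | Relative frequency | Ascending cumulative frequency (ACF) |
|--------------|------------------|----------------|-----------|--------------------|--------------------------------------|
| 2 − 6        |                  |                | 6         |                    |                                      |
| 7 − 11       |                  |                |           | 0.24               |                                      |
| 12 − 16      |                  |                |           |                    | 36                                   |
| 17 − 21      |                  |                |           | 0.12               |                                      |
| 22 − 26      |                  |                | 8         |                    |                                      |
| Total        |                  |                | 50        |                    |                                      |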
Then:
a. Complete the above frequency distribution table.
Answer: The completed frequency distribution table has the following form:

| Class limits | Class boundaries | Class midpoint | Frequency | Relative frequency | A.C.F. |
|--------------|------------------|----------------|-----------|--------------------|--------|
| 2 − 6        | 1.5 → 6.5        | 4              | 6         | 0.12               | 6      |
| 7 − 11       | 6.5 → 11.5       | 9              | 12        | 0.24               | 18     |
| 12 − 16      | 11.5 → 16.5      | 14             | 18        | 0.36               | 36     |
| 17 − 21      | 16.5 → 21.5      | 19             | 6         | 0.12               | 42     |
| 22 − 26      | 21.5 → 26.5      | 24             | 8         | 0.16               | 50     |
| Total        |                  |                | 50        | 1                  |        |
b. Calculate the variance and standard deviation for the above frequency distribution table.
Answers: First we calculate the mean $\bar{x}$ of the given frequency distribution table:
$\bar{x} = \frac{1}{\sum_{i=1}^{m} f_i} \sum_{i=1}^{m} f_i x_i = \frac{690}{50} = 13.8$
The variance of the given frequency distribution table is given by the following relation:
$S^2 = \frac{1}{\left( \sum_{i=1}^{m} f_i \right) - 1} \sum_{i=1}^{m} f_i (x_i - \bar{x})^2 = \frac{1848}{49} = 37.71$
Therefore, the standard deviation S is:
$S = \sqrt{S^2} = \sqrt{37.71} = 6.14$
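The grouped variance can be verified numerically from the class midpoints and frequencies. This is a minimal sketch with an illustrative function name; it uses the same divisor (total frequency minus one) as the relation above.

```python
from math import sqrt


def grouped_variance(midpoints, freqs):
    """Sample variance of grouped data: sum(f_i * (x_i - mean)^2) / (n - 1)."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    squared_deviations = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
    return squared_deviations / (n - 1)


# Class midpoints and frequencies of exercise 11.
midpoints = [4, 9, 14, 19, 24]
freqs = [6, 12, 18, 6, 8]

variance = grouped_variance(midpoints, freqs)
std_dev = sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```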
12. The points scored by a team in a series of matches are as follows:
17, 10, 2, 24, 7, 48, 27, 10, 15, 8, 5, 7, 14, 18, 8, 28
Then:
a. Calculate the mean and standard deviation of the given data.
Answers: The mean of the given data is:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{248}{16} = 15.5$
To calculate the standard deviation, we must first calculate the variance. We calculate the variance by the following relation:
$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{2038}{15} = 135.87$
Therefore, we get:
$S = \sqrt{S^2} = \sqrt{135.87} = 11.656$
b. Calculate the standard score of the value (7) in the given data.
Answer: The standard score of the value (7) in the given data is:
$z_{x_i} = \frac{x_i - \bar{x}}{S} \Rightarrow z_7 = \frac{7 - 15.5}{11.656} = -0.729$
c. Calculate the coefficient of variation for the given data.
Answer: The coefficient of variation for the given data is:
$CV = \frac{S}{\bar{x}} \times 100 = \frac{11.656}{15.5} \times 100 = 75.2\%$
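The mean, sample standard deviation, standard score and coefficient of variation can be checked with Python's standard `statistics` module, whose `stdev` function uses the same n − 1 divisor as the variance relation above. The helper names `z_score` and `coefficient_of_variation` are illustrative.

```python
import statistics

# Points scored by the team, copied from exercise 12.
points = [17, 10, 2, 24, 7, 48, 27, 10, 15, 8, 5, 7, 14, 18, 8, 28]

mean = statistics.mean(points)
std_dev = statistics.stdev(points)  # sample standard deviation (divisor n - 1)


def z_score(x, mean, std_dev):
    """Standard score of x: distance from the mean in standard deviations."""
    return (x - mean) / std_dev


def coefficient_of_variation(mean, std_dev):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


z7 = z_score(7, mean, std_dev)
cv = coefficient_of_variation(mean, std_dev)
print(mean, round(std_dev, 3), round(z7, 3), round(cv, 1))
```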
d. Calculate Q1 , Q2 , Q3 , LF and HF for the given data.
Answers: To calculate the quartiles we must order the given data. So we have the ordered data as follows:
$x_1 = 2$, $x_2 = 5$, 7, 7, 8, 8, 10, 10, 14, 15, 17, 18, 24, 27, 28, $x_{16} = 48$
Now, to calculate a quartile $Q_r$ we must determine $q_r$, the rank of this quartile, where we have:
$q_r = \frac{r (n + 1)}{4} = k + s; \quad k \in \mathbb{N},\ s \in [0, 1)$
Then we have $Q_r = x_k + s (x_{k+1} - x_k)$ for any $r = 1, 2, 3$. So the rank $q_1$ of the first quartile $Q_1$ equals:
$q_1 = \frac{16 + 1}{4} = 4.25$
Therefore, the first quartile $Q_1$ equals:
$Q_1 = 7 + 0.25 (8 - 7) = 7.25$
The rank $q_2$ of the second quartile $Q_2$ equals:
$q_2 = \frac{2 (16 + 1)}{4} = 8.5$
Therefore, the second quartile $Q_2$ equals:
$Q_2 = 10 + 0.5 (14 - 10) = 12$
The rank $q_3$ of the third quartile $Q_3$ equals:
$q_3 = \frac{3 (16 + 1)}{4} = 12.75$
Therefore, the third quartile $Q_3$ equals:
$Q_3 = 18 + 0.75 (24 - 18) = 22.5$
For the lower fence LF we have:
$LF = Q_1 - 1.5 (Q_3 - Q_1) = 7.25 - 1.5 (22.5 - 7.25) = -15.625$
For the higher fence HF we have:
$HF = Q_3 + 1.5 (Q_3 - Q_1) = 22.5 + 1.5 (22.5 - 7.25) = 45.375$
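The rank method used above, with rank $r(n+1)/4$ and linear interpolation between neighbouring ordered values, can be sketched as a small function. The names `quantile` and `fences` are illustrative, and clamping to the smallest or largest value when the rank falls outside 1..n is an assumption added here, not part of the exercise.

```python
def quantile(data, r, parts=4):
    """r-th quantile by the rank r * (n + 1) / parts with linear interpolation.

    parts=4 gives quartiles, parts=10 deciles, parts=100 percentiles.
    """
    xs = sorted(data)
    n = len(xs)
    rank = r * (n + 1) / parts
    k = int(rank)   # integer part of the rank
    s = rank - k    # fractional part of the rank
    if k < 1:       # assumption: clamp ranks below the first observation
        return xs[0]
    if k >= n:      # assumption: clamp ranks above the last observation
        return xs[-1]
    return xs[k - 1] + s * (xs[k] - xs[k - 1])


def fences(data):
    """Lower and higher fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, q3 = quantile(data, 1), quantile(data, 3)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


points = [17, 10, 2, 24, 7, 48, 27, 10, 15, 8, 5, 7, 14, 18, 8, 28]
q1, q2, q3 = (quantile(points, r) for r in (1, 2, 3))
lf, hf = fences(points)
outliers = [x for x in points if x < lf or x > hf]
print(q1, q2, q3, lf, hf, outliers)
```

With these fences, any value below LF or above HF is flagged as an extreme value; the same function with `parts=10` or `parts=100` follows the decile and percentile rank formulas.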
e. Draw the box plot for the given data.
Answer: The box plot for the given data is as follows:
13. Consider the following frequency distribution table:
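| Class No. | Class boundaries | Class midpoint | Frequency | Relative frequency | Percentage % | A.C.F. |
|-----------|------------------|----------------|-----------|--------------------|--------------|--------|
| 1         | 2 → 6            | 4              |           |                    |              |        |
| 2         |                  |                |           | 0.25               |              |        |
| 3         |                  |                |           |                    | 20           | 22     |
| 4         |                  |                |           | 0.25               |              | 32     |
| 5         | 18 → 22          |                | 8         |                    |              |        |
| Total     |                  |                | 40        |                    |              |        |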
a. Complete the above frequency distribution table.
Answer: The completed frequency distribution table has the following form:

| Class No. | Class boundaries | Class midpoint | Frequency | Relative frequency | Percentage % | A.C.F. |
|-----------|------------------|----------------|-----------|--------------------|--------------|--------|
| 1         | 2 → 6            | 4              | 4         | 0.10               | 10           | 4      |
| 2         | 6 → 10           | 8              | 10        | 0.25               | 25           | 14     |
| 3         | 10 → 14          | 12             | 8         | 0.20               | 20           | 22     |
| 4         | 14 → 18          | 16             | 10        | 0.25               | 25           | 32     |
| 5         | 18 → 22          | 20             | 8         | 0.20               | 20           | 40     |
| Total     |                  |                | 40        | 1                  | 100          |        |

b. Draw the histogram, polygon and ACFP for the above frequency distribution table.
Answers:
The histogram for the above frequency distribution table has the following figure.
The polygon for the above frequency distribution table has the following figure.
The ACFP for the above frequency distribution table has the following figure.
c. Calculate the mode(s) for the above frequency distribution table.
Answer: We note that the given frequency distribution table has two modes.
The first mode lies in the class $6 \to 10$; that is, the class $6 \to 10$ is a modal class. Therefore, the first mode is:
$\hat{x}_1 = \hat{L}_1 + \frac{d_{11}}{d_{11} + d_{21}} \cdot C_1 = 6 + \frac{6}{6 + 2} \cdot 4 = 9$
where we have $C_1 = 4$ and:
$\hat{L}_1 = 6$ is the lower bound of the first modal class,
$d_{11} = 10 - 4 = 6$ is the difference between the frequencies of the first modal class and the previous class,
$d_{21} = 10 - 8 = 2$ is the difference between the frequencies of the first modal class and the subsequent class.
The second mode lies in the class $14 \to 18$; that is, the class $14 \to 18$ is a modal class. Therefore, the second mode is:
$\hat{x}_2 = \hat{L}_2 + \frac{d_{12}}{d_{12} + d_{22}} \cdot C_2 = 14 + \frac{2}{2 + 2} \cdot 4 = 16$
where we have $C_2 = 4$ and:
$\hat{L}_2 = 14$ is the lower bound of the second modal class,
$d_{12} = 10 - 8 = 2$ is the difference between the frequencies of the second modal class and the previous class,
$d_{22} = 10 - 8 = 2$ is the difference between the frequencies of the second modal class and the subsequent class.
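The mode formula can be applied to every class that reaches the highest frequency, which handles the two modal classes of this table in one pass. This is a minimal sketch assuming equal class widths; `grouped_modes` is an illustrative name, a missing neighbouring class is treated as having frequency 0, and returning the class midpoint when both differences are zero is an assumption added for completeness.

```python
def grouped_modes(lower_bounds, width, freqs):
    """Mode L + d1 / (d1 + d2) * C for every class with the highest frequency."""
    top = max(freqs)
    modes = []
    for i, f in enumerate(freqs):
        if f != top:
            continue
        prev_f = freqs[i - 1] if i > 0 else 0
        next_f = freqs[i + 1] if i < len(freqs) - 1 else 0
        d1, d2 = f - prev_f, f - next_f
        if d1 + d2 == 0:
            # Assumption: with equal neighbours on both sides, use the midpoint.
            modes.append(lower_bounds[i] + width / 2)
        else:
            modes.append(lower_bounds[i] + d1 / (d1 + d2) * width)
    return modes


# Frequency distribution table of exercise 13.
lower_bounds = [2, 6, 10, 14, 18]
freqs = [4, 10, 8, 10, 8]

modes = grouped_modes(lower_bounds, 4, freqs)
print(modes)
```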
14. Consider the marks obtained (out of 100 marks) by 50 students of class X of a school:
10, 92, 92, 92, 92, 20, 88, 40, 88, 40,
36, 80, 50, 80, 50, 92, 70, 50, 70, 50,
95, 72, 56, 72, 56, 40, 70, 60, 70, 60,
50, 36, 70, 36, 70, 56, 40, 60, 40, 60,
60, 36, 60, 36, 60, 70, 40, 88, 40, 88
Then:
a. Calculate P10 , P50 and P93 .
Answers: To calculate a percentile $P_r$ we must first arrange the data. So we have the following ordered data:
$x_1 = 10$, $x_2 = 20$, 36, 36, 36, 36, 36, 40, 40, 40,
40, 40, 40, 40, 50, 50, 50, 50, 50, 56,
56, 56, 60, 60, 60, 60, 60, 60, 60, 70,
70, 70, 70, 70, 70, 70, 72, 72, 80, 80,
88, 88, 88, 88, 92, 92, 92, 92, $x_{49} = 92$, $x_{50} = 95$
Now, to calculate a percentile $P_r$ we must determine $p_r$, the rank of this percentile, where we have:
$p_r = \frac{r (n + 1)}{100} = k + s; \quad k \in \mathbb{N},\ s \in [0, 1)$
Then we have $P_r = x_k + s (x_{k+1} - x_k)$ for any $r = 1, 2, ..., 99$. So the rank $p_{10}$ of the percentile $P_{10}$ equals:
$p_{10} = \frac{10 (50 + 1)}{100} = 5.1$
Therefore, the percentile $P_{10}$ equals:
$P_{10} = x_5 + 0.1 (x_6 - x_5) = 36 + 0.1 (36 - 36) = 36$
To calculate the percentile $P_{50}$ we must determine the rank $p_{50}$ of this percentile, where we have:
$p_{50} = \frac{50 (50 + 1)}{100} = 25.5$
Therefore, the percentile $P_{50}$ equals:
$P_{50} = x_{25} + 0.5 (x_{26} - x_{25}) = 60 + 0.5 (60 - 60) = 60$
Finally, the rank $p_{93}$ of the percentile $P_{93}$ equals:
$p_{93} = \frac{93 (50 + 1)}{100} = 47.43$
Therefore, the percentile $P_{93}$ equals:
$P_{93} = x_{47} + 0.43 (x_{48} - x_{47}) = 92 + 0.43 (92 - 92) = 92$
b. Calculate D3 , D5 and D8 .
Answers: To calculate a decile $D_r$ we must first arrange the data. Second, we must determine $d_r$, the rank of this decile, where we have:
$d_r = \frac{r (n + 1)}{10} = k + s; \quad k \in \mathbb{N},\ s \in [0, 1)$
Then we have $D_r = x_k + s (x_{k+1} - x_k)$ for any $r = 1, 2, ..., 9$. So the rank $d_3$ of the decile $D_3$ equals:
$d_3 = \frac{3 (50 + 1)}{10} = 15.3$
Therefore, the decile $D_3$ equals:
$D_3 = x_{15} + 0.3 (x_{16} - x_{15}) = 50 + 0.3 (50 - 50) = 50$
The rank $d_5$ of the decile $D_5$ equals:
$d_5 = \frac{5 (50 + 1)}{10} = 25.5$
Therefore, the decile $D_5$ equals:
$D_5 = x_{25} + 0.5 (x_{26} - x_{25}) = 60 + 0.5 (60 - 60) = 60$
Finally, the rank $d_8$ of the decile $D_8$ equals:
$d_8 = \frac{8 (50 + 1)}{10} = 40.8$
Therefore, the decile $D_8$ equals:
$D_8 = x_{40} + 0.8 (x_{41} - x_{40}) = 80 + 0.8 (88 - 80) = 86.4$
c. Calculate Q1 , Q2 and Q3 .
Answers: To calculate a quartile $Q_r$ we must first arrange the data. Second, we must determine $q_r$, the rank of this quartile, where we have:
$q_r = \frac{r (n + 1)}{4} = k + s; \quad k \in \mathbb{N},\ s \in [0, 1)$
Then we have $Q_r = x_k + s (x_{k+1} - x_k)$ for any $r = 1, 2, 3$. So the rank $q_1$ of the first quartile $Q_1$ equals:
$q_1 = \frac{50 + 1}{4} = 12.75$
Therefore, the first quartile $Q_1$ equals:
$Q_1 = x_{12} + 0.75 (x_{13} - x_{12}) = 40 + 0.75 (40 - 40) = 40$
The rank $q_2$ of the second quartile $Q_2$ equals:
$q_2 = \frac{2 (50 + 1)}{4} = 25.5$
Therefore, the second quartile $Q_2$ equals:
$Q_2 = x_{25} + 0.5 (x_{26} - x_{25}) = 60 + 0.5 (60 - 60) = 60$
Finally, the rank $q_3$ of the third quartile $Q_3$ equals:
$q_3 = \frac{3 (50 + 1)}{4} = 38.25$
Therefore, the third quartile $Q_3$ equals:
$Q_3 = x_{38} + 0.25 (x_{39} - x_{38}) = 72 + 0.25 (80 - 72) = 74$
d. Check whether the given data have extreme values, and draw the box plot.
Answers: To check whether the given data have extreme values we must calculate the lower fence (LF) and the higher fence (HF).
For the lower fence LF we have:
$LF = Q_1 - 1.5 (Q_3 - Q_1) = 40 - 1.5 (74 - 40) = -11$
This means that there is no small extreme value, since the smallest value (10) is greater than LF.
For the higher fence HF we have:
$HF = Q_3 + 1.5 (Q_3 - Q_1) = 74 + 1.5 (74 - 40) = 125$
This means that there is no large extreme value, since the largest value (95) is less than HF. So the data have no extreme values.
The box plot for the given data is as follows:
15. The daily sale of sugar (in Kg) in a certain grocery shop is given below:
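| Day       | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
|-----------|--------|---------|-----------|----------|--------|----------|
| Sale (kg) | 75     | 120     | 12        | 50       | 70.5   | 140.5    |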
a. Calculate the average daily sale of sugar.
Answer: The average daily sale of sugar equals the mean of the given data. Therefore, the average daily sale of sugar is:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{468}{6} = 78$
b. Calculate the variance and the standard deviation of the above data.
Answer: Since the value of the mean is known, we calculate the variance by the relation:
$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{10875.5}{5} = 2175.1$
Therefore, the standard deviation of the above data equals:
$S = \sqrt{S^2} = \sqrt{2175.1} = 46.64$
c. Determine the coefficient of variation.
Answer: The coefficient of variation for the given data equals:
$CV = \frac{S}{\bar{x}} \times 100 = \frac{46.64}{78} \times 100 = 59.79\%$
16. Let the following data be marks obtained (out of 100) by 10 students in a test:
45, 45, 63, 76, 67, 84, 75, 48, 62, 65
Then:
a. Calculate $Q_1$, $Q_2$ and $Q_3$.
Answers: To calculate a quartile $Q_r$ we must first arrange the data. So we have the following ordered data:
$x_1 = 45$, 45, 48, 62, 63, 65, 67, 75, 76, $x_{10} = 84$
Now, to calculate a quartile $Q_r$ we must determine $q_r$, the rank of this quartile, where we have:
$q_r = \frac{r (n + 1)}{4} = k + s; \quad k \in \mathbb{N},\ s \in [0, 1)$
Then we have $Q_r = x_k + s (x_{k+1} - x_k)$ for any $r = 1, 2, 3$. So the rank $q_1$ of the first quartile $Q_1$ equals:
$q_1 = \frac{10 + 1}{4} = 2.75$
Therefore, the first quartile $Q_1$ equals:
$Q_1 = x_2 + 0.75 (x_3 - x_2) = 45 + 0.75 (48 - 45) = 47.25$
The rank $q_2$ of the second quartile $Q_2$ equals:
$q_2 = \frac{2 (10 + 1)}{4} = 5.5$
Therefore, the second quartile $Q_2$ equals:
$Q_2 = x_5 + 0.5 (x_6 - x_5) = 63 + 0.5 (65 - 63) = 64$
Finally, the rank $q_3$ of the third quartile $Q_3$ equals:
$q_3 = \frac{3 (10 + 1)}{4} = 8.25$
Therefore, the third quartile $Q_3$ equals:
$Q_3 = x_8 + 0.25 (x_9 - x_8) = 75 + 0.25 (76 - 75) = 75.25$
b. Calculate the IQR.
Answer: The interquartile range $IQR = Q_3 - Q_1$ equals:
$IQR = 75.25 - 47.25 = 28$
c. Do the given data have extreme values?
Answers: In order to check for extreme values in the given data we must calculate the lower fence (LF) and the higher fence (HF).
For the lower fence LF we have:
$LF = Q_1 - 1.5 (Q_3 - Q_1) = 47.25 - 1.5 \times 28 = 5.25$
This means that there is no small extreme value.
For the higher fence HF we have:
$HF = Q_3 + 1.5 (Q_3 - Q_1) = 75.25 + 1.5 \times 28 = 117.25$
This means that there is no large extreme value. Therefore, the given data have no extreme values.
d. Construct the box plot for the given data.
Answer: The box plot for the given data is as follows:
17. Consider the following data:
65, 70, 70, 10, 20, 40, 60, 65, 85, 90, 90, 150, 75, 75, 75, 80
Then:
a. Calculate $Q_1$, $Q_2$ and $Q_3$.
Answers: To calculate a quartile $Q_r$ we must first arrange the data. So we have the following ordered data:
$x_1 = 10$, $x_2 = 20$, 40, 60, 65, 65, 70, 70, 75, 75, 75, 80, 85, 90, 90, $x_{16} = 150$
Now, to calculate a quartile $Q_r$ we must determine $q_r$, the rank of this quartile, where we have:
$q_r = \frac{r (n + 1)}{4} = k + s; \quad k \in \mathbb{N},\ s \in [0, 1)$
Then we have $Q_r = x_k + s (x_{k+1} - x_k)$ for any $r = 1, 2, 3$. So the rank $q_1$ of the first quartile $Q_1$ equals:
$q_1 = \frac{16 + 1}{4} = 4.25$
Therefore, the first quartile $Q_1$ equals:
$Q_1 = x_4 + 0.25 (x_5 - x_4) = 60 + 0.25 (65 - 60) = 61.25$
The rank $q_2$ of the second quartile $Q_2$ equals:
$q_2 = \frac{2 (16 + 1)}{4} = 8.5$
Therefore, the second quartile $Q_2$ equals:
$Q_2 = x_8 + 0.5 (x_9 - x_8) = 70 + 0.5 (75 - 70) = 72.5$
Finally, the rank $q_3$ of the third quartile $Q_3$ equals:
$q_3 = \frac{3 (16 + 1)}{4} = 12.75$
Therefore, the third quartile $Q_3$ equals:
$Q_3 = x_{12} + 0.75 (x_{13} - x_{12}) = 80 + 0.75 (85 - 80) = 83.75$
b. Calculate the IQR.
Answer: The interquartile range IQR equals:
$IQR = Q_3 - Q_1 = 83.75 - 61.25 = 22.5$
c. Do the given data have extreme values?
Answers: In order to check for extreme values in the given data we must calculate the lower fence (LF) and the higher fence (HF).
For the lower fence LF we have:
$LF = Q_1 - 1.5 (Q_3 - Q_1) = 61.25 - 1.5 \times 22.5 = 27.5$
This means that the values $x_1 = 10$ and $x_2 = 20$ are extreme values.
Likewise, for the higher fence HF we have:
$HF = Q_3 + 1.5 (Q_3 - Q_1) = 83.75 + 1.5 \times 22.5 = 117.5$
This means that the value $x_{16} = 150$ is an extreme value.
d. Construct the box plot for the given data.
Answer: The box plot for the given data is as follows:
e. Comment on the skewness of the distribution of these data.
Answer: We note that the distribution of the data is skewed to the right by a small amount because of the extreme value 150, which lies far from the rest of the data.
18. Consider the following ordered data:
40, 45, 55, 65, ?, ?, 75, 75, 78, 183
Then use a suitable measure to calculate the central tendency and the dispersion of the given data.
Answers: The measures of central tendency are the mean, the median and the mode. Because two values in the middle are missing, we cannot use the mean or the median; therefore, we use the mode. We note that only one mode exists, namely $\hat{x} = 75$.
Because of the two missing values in the middle, we cannot use the standard deviation as a measure of dispersion. Therefore, we can use the range or the interquartile range. But we note that the value $x_{10} = 183$ is an extreme value, so we will use the interquartile range as the measure of dispersion.
The interquartile range is $IQR = Q_3 - Q_1$. Therefore, to calculate the interquartile range, we must calculate the first and third quartiles.
Now, to calculate a quartile $Q_r$ we must determine $q_r$, the rank of this quartile, where we have:
$q_r = \frac{r (n + 1)}{4} = k + s; \quad k \in \mathbb{N},\ s \in [0, 1)$
Then we have $Q_r = x_k + s (x_{k+1} - x_k)$ for any $r = 1, 2, 3$. So the rank $q_1$ of the first quartile $Q_1$ equals:
$q_1 = \frac{10 + 1}{4} = 2.75$
Therefore, the first quartile $Q_1$ equals:
$Q_1 = x_2 + 0.75 (x_3 - x_2) = 45 + 0.75 (55 - 45) = 52.5$
The rank of $Q_3$ is:
$q_3 = \frac{3 (10 + 1)}{4} = 8.25$
Therefore, the third quartile $Q_3$ equals:
$Q_3 = x_8 + 0.25 (x_9 - x_8) = 75 + 0.25 (78 - 75) = 75.75$
So we get:
$IQR = Q_3 - Q_1 = 75.75 - 52.5 = 23.25$
19. Consider the following ordered data:
−15, 20, 40, 50, 65, 65, 70, 73, 75, 137
a. Have the given data got extreme values?
Answer: In order to check for extreme values in the data we have to calculate the lower fence LF and the higher fence HF:
$HF = Q_3 + 1.5 (Q_3 - Q_1)$ and $LF = Q_1 - 1.5 (Q_3 - Q_1)$
Therefore, we must calculate the first and third quartiles for the given data.
Now, to calculate a quartile $Q_r$ we must determine $q_r$, the rank of this quartile, where we have:
$q_r = \frac{r (n + 1)}{4} = k + s; \quad k \in \mathbb{N},\ s \in [0, 1)$
Then we have $Q_r = x_k + s (x_{k+1} - x_k)$ for any $r = 1, 2, 3$. So the rank $q_1$ of the first quartile $Q_1$ equals:
$q_1 = \frac{10 + 1}{4} = 2.75$
Therefore, the first quartile $Q_1$ equals:
$Q_1 = x_2 + 0.75 (x_3 - x_2) = 20 + 0.75 (40 - 20) = 35$
The rank $q_3$ of the third quartile $Q_3$ equals:
$q_3 = \frac{3 (10 + 1)}{4} = 8.25$
Therefore, the third quartile $Q_3$ equals:
$Q_3 = x_8 + 0.25 (x_9 - x_8) = 73 + 0.25 (75 - 73) = 73.5$
We get $Q_3 - Q_1 = 73.5 - 35 = 38.5$. So the higher fence HF equals:
$HF = Q_3 + 1.5 (Q_3 - Q_1) = 73.5 + 1.5 \times 38.5 = 131.25$
This means that the value $x_{10} = 137$ is an extreme value.
The lower fence LF equals:
$LF = Q_1 - 1.5 (Q_3 - Q_1) = 35 - 1.5 \times 38.5 = -22.75$
This means that there is no small extreme value.
b. Use the suitable measure to calculate the central tendency and dispersion for the given
data.
Answers: Although 137 is an extreme value, the suitable measure of central tendency for the given data is the mean, where we have:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{580}{10} = 58$
When we calculate the median, we find:
$\tilde{x} = \frac{x_5 + x_6}{2} = \frac{65 + 65}{2} = 65$
Here we note that the mean is a better measure of central tendency than the median for these data.
The suitable measure of dispersion for the given data is the interquartile range, because we have an extreme value, where we have:
$IQR = Q_3 - Q_1 = 73.5 - 35 = 38.5$
20. The mean age of six persons is 49 years. The ages of five of these six persons are 55, 39, 44,
51, and 45 years respectively. What is the age of the sixth person?
Answer: Suppose that the age of the sixth person is x. Then we have:
$49 = \bar{x} = \frac{55 + 39 + 44 + 51 + 45 + x}{6}$
Therefore, we get:
$x = 294 - 234 = 60$
So the age of the sixth person is 60 years.
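The same rearrangement, total required by the mean minus the sum of the known values, can be written as a one-line helper. The function name `missing_value` is illustrative.

```python
def missing_value(known_values, mean, n):
    """Value that makes the mean of n observations equal to `mean`."""
    return n * mean - sum(known_values)


# Ages of the five known persons and the stated mean of all six.
age = missing_value([55, 39, 44, 51, 45], mean=49, n=6)
print(age)
```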
21. The following observations have been arranged in ascending order.
29, 32, 48, 50, x, x + 2, 72, 78, 84, 95
Now, if the median of the data is 63, then calculate the value of x.
Answer: Since the number of observations is even, the median is given by the following relation:
$63 = \tilde{x} = \frac{x_5 + x_6}{2} = \frac{x + (x + 2)}{2} = x + 1$
So we get $x = 62$.
22. Consider the following two data sets (we will assume that they are degrees of students in the
60-degree test).
Data set X: 8, 12, 25, 37, 40
Data set Y: 23, 27, 40, 52, 55
Notice that each value of the second data set is obtained by adding 15 to the corresponding value
of the first data set. Then:
a. Calculate the mean for each of these two data sets. Comment on the relationship between
the two means.
Answers: The mean of data set X is:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{122}{5} = 24.4$
The mean of data set Y is:
$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{197}{5} = 39.4$
So we find that $\bar{y} = \bar{x} + 15$. This means that if we add a constant to all data values, the mean changes by this constant.
b. Calculate the standard deviation for each of these two data sets. Comment on the relationship
between the two standard deviations.
Answers: The variance for data set X is:
$S_X^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{825.2}{4} = 206.3$
Therefore, we get:
$S_X = \sqrt{S_X^2} = \sqrt{206.3} = 14.36315$
The variance for data set Y is:
$S_Y^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{825.2}{4} = 206.3$
Therefore, we get:
$S_Y = \sqrt{S_Y^2} = \sqrt{206.3} = 14.36315$
So we find that $S_X = S_Y$.
Important result:
This means that if we add a constant to all data values, the mean changes by this constant but the standard deviation does not change.
[Figure: the green points are the means.]
c. Calculate the standard score of the value (40) in data sets X and Y.
Answer: The z-score for data set X is:
$z_{40,X} = \frac{x - \bar{x}}{S_X} = \frac{40 - 24.4}{14.36315} = 1.08611$
The z-score for data set Y is:
$z_{40,Y} = \frac{y - \bar{y}}{S_Y} = \frac{40 - 39.4}{14.36315} = 0.04177$
So we find that $z_{40,Y} \neq z_{40,X}$. This means that if we add a constant to all data values, the z-score of a fixed value changes, but not by this constant. On the other hand, this result means that the student who scored 40 in set X performed better, relative to the group, than the student who scored 40 in set Y.
d. Calculate the coefficient of variation for each of these two data sets, then compare them.
Answers: The coefficient of variation for data set X is:
$CV_X = \frac{S_X}{\bar{x}} \times 100 = \frac{14.36315}{24.4} \times 100 = 58.87\%$
The coefficient of variation for data set Y is:
$CV_Y = \frac{S_Y}{\bar{y}} \times 100 = \frac{14.36315}{39.4} \times 100 = 36.455\%$
So we find that $CV_Y \neq CV_X$ and $CV_Y \neq CV_X + 15$. This means that if we add a constant to all data values, the coefficient of variation changes, but not by this constant.
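The effect of adding a constant to every value can be checked numerically. The sketch below uses the standard `statistics` module; `summary` is an illustrative helper, and data set Y is generated by adding 15 to X as described in the exercise.

```python
import statistics

X = [8, 12, 25, 37, 40]
Y = [x + 15 for x in X]  # every value shifted by the same constant


def summary(data):
    """Return (mean, sample standard deviation, coefficient of variation in %)."""
    mean = statistics.mean(data)
    std_dev = statistics.stdev(data)
    return mean, std_dev, std_dev / mean * 100


mean_x, std_x, cv_x = summary(X)
mean_y, std_y, cv_y = summary(Y)

# Standard score of the value 40 in each data set.
z_x = (40 - mean_x) / std_x
z_y = (40 - mean_y) / std_y

print(mean_x, mean_y, round(std_x, 5), round(std_y, 5))
print(round(z_x, 5), round(z_y, 5), round(cv_x, 2), round(cv_y, 3))
```

The shift moves the mean by 15 and leaves the standard deviation unchanged, so the z-score of a fixed value and the coefficient of variation both change.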
23. Consider the following three data sets.
Data set X: 5, 10, 15, 20, 25
Data set Y: 10, 20, 30, 40, 50
Data set Z: 20, 40, 60, 80, 100
a. Calculate the mean for each of these data sets. Comment on the relationship between the three
means.
Answers: The mean of data set X is:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{75}{5} = 15$
The mean of data set Y is:
$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{150}{5} = 30$
The mean of data set Z is:
$\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i = \frac{300}{5} = 60$
So we find that $\bar{z} = 2 \bar{y} = 4 \bar{x}$. This means that if we multiply the data by a constant factor, the mean is multiplied by the same factor.
b. Calculate the standard deviation for each of these data sets. Comment on the relationship
between the two standard deviations.
Answers: The variance for data set X is:
$$S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{250}{4} = 62.5$$
Therefore:
$$S_X = \sqrt{S_X^2} = \sqrt{62.5} = 7.905694$$
The variance for data set Y is:
$$S_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1000}{4} = 250$$
Therefore:
$$S_Y = \sqrt{S_Y^2} = \sqrt{250} = 15.81139$$
The variance for data set Z is:
$$S_Z^2 = \frac{1}{n-1}\sum_{i=1}^{n} (z_i - \bar{z})^2 = \frac{4000}{4} = 1000$$
Therefore:
$$S_Z = \sqrt{S_Z^2} = \sqrt{1000} = 31.62278$$
So we find that $S_Y^2 = 4 S_X^2$ and therefore $S_Y = 2 S_X$. In the same way, $S_Z^2 = 4 S_Y^2$ and therefore $S_Z = 2 S_Y = 4 S_X$.
Important result:
This means that if we multiply every value by a positive constant factor, the standard deviation is
multiplied by the same factor.
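The scaling behaviour of the mean and the standard deviation can be checked with a short sketch. It uses the three data sets of this exercise and Python's standard `statistics` module; the helper name `mean_and_sd` is an illustrative choice, not part of the exercise.

```python
import statistics

# Data sets from exercise 23: Y and Z are X multiplied by 2 and by 4.
x = [5, 10, 15, 20, 25]
y = [2 * v for v in x]
z = [4 * v for v in x]

def mean_and_sd(data):
    """Return the mean and the sample standard deviation (divisor n - 1)."""
    return statistics.mean(data), statistics.stdev(data)

mean_x, sd_x = mean_and_sd(x)
mean_y, sd_y = mean_and_sd(y)
mean_z, sd_z = mean_and_sd(z)

print("means:", mean_x, mean_y, mean_z)
print("SDs:", round(sd_x, 5), round(sd_y, 5), round(sd_z, 5))
print("SD ratios to X:", sd_y / sd_x, sd_z / sd_x)
```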
c. Calculate the standard score of the value (20) in data set Y.
Answer: The z-score for data set Y is:
$$z_{20,Y} = \frac{y - \bar{y}}{S_Y} = \frac{20 - 30}{15.81139} = -0.63246$$
d. Calculate the coefficient of variation for each of these data sets, then compare them.
Answers: The coefficient of variation for data set X is:
$$CV_X = \frac{S_X}{\bar{x}} \times 100 = \frac{7.905694}{15} \times 100 = 52.70\ \%$$
The coefficient of variation for data set Y is:
$$CV_Y = \frac{S_Y}{\bar{y}} \times 100 = \frac{15.81139}{30} \times 100 = 52.70\ \%$$
The coefficient of variation for data set Z is:
$$CV_Z = \frac{S_Z}{\bar{z}} \times 100 = \frac{31.62278}{60} \times 100 = 52.70\ \%$$
So we find that $CV_Z = CV_Y = CV_X$, while $CV_Y \neq 2\,CV_X$ and $CV_Z \neq 4\,CV_X$. This means that if we multiply every value by a
constant factor, the coefficient of variation does not change.
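A quick numerical check of parts c and d, as a minimal sketch with Python's standard `statistics` module. The function names `coefficient_of_variation` and `z_score` are illustrative helpers, not names from the exercise.

```python
import statistics

def coefficient_of_variation(data):
    """Sample SD expressed as a percentage of the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100

def z_score(value, data):
    """Standard score of `value` relative to `data` (sample SD)."""
    return (value - statistics.mean(data)) / statistics.stdev(data)

# Data sets from exercise 23.
x = [5, 10, 15, 20, 25]
y = [2 * v for v in x]
z = [4 * v for v in x]

cv_x, cv_y, cv_z = (coefficient_of_variation(d) for d in (x, y, z))
print("CVs:", round(cv_x, 2), round(cv_y, 2), round(cv_z, 2))

# z-score of 20 in Y (part c) next to the z-score of the matching value 10 in X.
print("z-scores:", round(z_score(20, y), 5), round(z_score(10, x), 5))
```

Because both the mean and the standard deviation are multiplied by the same factor, their ratio is unchanged. For the same reason, a value scaled together with its data set keeps its z-score.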
24. Consider the data set 15, 15, 15, 15, 15, 15. Then:
a. Calculate the standard deviation.
Answer: To calculate the variance we must first calculate the mean of the data set,
where we have:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{6 \times 15}{6} = 15$$
The variance of the data set is:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{0}{5} = 0$$
Therefore:
$$S = \sqrt{S^2} = \sqrt{0} = 0$$
b. Is the value of the standard deviation equal to zero? If yes, why?
Answer: Yes, the standard deviation is equal to zero, because all values are equal to 15. Therefore,
no value is scattered around the mean.
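The reason the result is zero can be seen by writing the computation out by hand. This is a minimal sketch; `sample_sd` is an illustrative helper name.

```python
import math

def sample_sd(data):
    """Sample standard deviation with divisor n - 1."""
    n = len(data)
    mean = sum(data) / n
    # Every deviation from the mean is zero when all values are identical.
    squared_deviations = [(v - mean) ** 2 for v in data]
    return math.sqrt(sum(squared_deviations) / (n - 1))

constant_data = [15] * 6   # the data set of exercise 24
print(sample_sd(constant_data))
```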
25. Consider the following frequency polygon of grouped data, representing the examination grades of
60 students:
Then:
a. Prepare the frequency distribution table of this data.
Answer: The frequency distribution table of this data is as follows:
Class Limit | Class Boundaries | Class Midpoint | Frequency | Relative Frequency | Percent Frequency | ACF | DCF
3 - 5       | 2.5→5.5          | 4              | 5         | 0.0833             | 8.33 %            | 5   | 60
6 - 8       | 5.5→8.5          | 7              | 10        | 0.16667            | 16.67 %           | 15  | 55
9 - 11      | 8.5→11.5         | 10             | 15        | 0.25000            | 25.00 %           | 30  | 45
12 - 14     | 11.5→14.5        | 13             | 20        | 0.33333            | 33.33 %           | 50  | 30
15 - 17     | 14.5→17.5        | 16             | 10        | 0.16667            | 16.67 %           | 60  | 10
Total       | ----------       | ------         | 60        | 1                  | 100 %             | ----- | ----
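The derived columns of such a table follow mechanically from the class limits and the frequencies. The sketch below assumes equally spaced classes, and the function name `frequency_table` and the dictionary keys are illustrative choices. The input values are the class limits and frequencies of this exercise.

```python
def frequency_table(limits, freqs):
    """Build the derived columns from class limits [(lower, upper), ...] and frequencies."""
    total = sum(freqs)
    gap = limits[1][0] - limits[0][1]       # distance between adjacent class limits
    rows, acf = [], 0
    for (lower, upper), f in zip(limits, freqs):
        dcf = total - acf                    # observations in this class and above
        acf += f                             # observations in this class and below
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - gap / 2, upper + gap / 2),
            "midpoint": (lower + upper) / 2,
            "frequency": f,
            "relative": f / total,
            "percent": 100 * f / total,
            "acf": acf,
            "dcf": dcf,
        })
    return rows

# Class limits and frequencies of exercise 25.
table = frequency_table([(3, 5), (6, 8), (9, 11), (12, 14), (15, 17)],
                        [5, 10, 15, 20, 10])
for row in table:
    print(row)
```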
b. Draw the histogram, ascending cumulative frequency polygon (less than ogive) and descending
cumulative frequency polygon (greater than ogive) for this table.
Answers: The histogram for this table is as follows:
The ogives for this table are as follows:
The ascending cumulative frequency polygon
The descending cumulative frequency polygon
c. Calculate the mean, median and mode for this table.
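Answers: The mean of the given frequency distribution table is:
$$\bar{x} = \frac{1}{\sum f_i}\sum_{i=1}^{m} f_i x_i = \frac{660}{60} = 11$$
For the median:
The median class is 8.5→11.5, and we have $\frac{1}{2}\sum f_i = 30$. So, the median for the data in the above
table is:
$$\tilde{x} = L + \frac{\frac{1}{2}\sum f_i - F}{f} \times C = 8.5 + \frac{30 - 15}{15} \times 3 = 11.5$$
where $L$ is the lower boundary of the median class, $F$ is the cumulative frequency before the median class, $f$ is the frequency of the median class and $C$ is the class width.
For the mode:
The given frequency distribution table has a unique mode, and the modal class is 11.5→14.5. So
we have:
$$\hat{x} = \hat{L} + \frac{d_1}{d_1 + d_2} \times C = 11.5 + \frac{5}{5 + 10} \times 3 = 12.5$$
where $d_1 = 20 - 15 = 5$ and $d_2 = 20 - 10 = 10$ are the differences between the frequency of the modal class and the frequencies of the classes before and after it.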
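The three grouped-data formulas can be sketched in a few lines of Python. The function names are illustrative, the code assumes equal class widths and a single modal class, and the inputs are the boundaries, midpoints and frequencies of this exercise.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: frequency-weighted average of the class midpoints."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

def grouped_median(lower_bounds, freqs, width):
    """Median by interpolation inside the first class whose ACF reaches half the total."""
    half = sum(freqs) / 2
    cumulative = 0                      # cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f

def grouped_mode(lower_bounds, freqs, width):
    """Mode from the modal class and the frequency differences to its neighbours."""
    k = max(range(len(freqs)), key=freqs.__getitem__)
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1, d2 = freqs[k] - before, freqs[k] - after
    return lower_bounds[k] + d1 / (d1 + d2) * width

# Values from the table of exercise 25.
lower_bounds = [2.5, 5.5, 8.5, 11.5, 14.5]
midpoints = [4, 7, 10, 13, 16]
freqs = [5, 10, 15, 20, 10]

print(grouped_mean(midpoints, freqs))
print(grouped_median(lower_bounds, freqs, 3))
print(grouped_mode(lower_bounds, freqs, 3))
```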
26. Consider the following histogram of grouped data, representing the temperatures in 50 cities in
Europe:
Then:
a. Prepare the frequency distribution table of this data.
Answer: The frequency distribution table of this data is as follows:
Class Limit | Class Boundaries | Class Midpoint | Frequency | Relative Frequency | Percent Frequency | ACF | DCF
1 - 5       | 0.5→5.5          | 3              | 7         | 0.14               | 14 %              | 7   | 50
6 - 10      | 5.5→10.5         | 8              | 9         | 0.18               | 18 %              | 16  | 43
11 - 15     | 10.5→15.5        | 13             | 14        | 0.28               | 28 %              | 30  | 34
16 - 20     | 15.5→20.5        | 18             | 12        | 0.24               | 24 %              | 42  | 20
21 - 25     | 20.5→25.5        | 23             | 8         | 0.16               | 16 %              | 50  | 8
Total       | ----------       | --------       | 50        | 1                  | 100 %             | ----- | -----
b. Draw the polygon and ACFP for this table.
Answer: The polygon for this table has the following figure:
The ascending cumulative frequency polygon (ACFP) for this table has the following figure:
c. Calculate the standard deviation for this table.
Answer: To calculate the standard deviation, we must calculate the variance of the given data.
Therefore, we first calculate the mean of the given data, where we have:
$$\bar{x} = \frac{1}{\sum f_i}\sum_{i=1}^{m} f_i x_i = \frac{675}{50} = 13.5$$
So, the variance of the given data is:
$$S^2 = \frac{1}{\left(\sum_{i=1}^{m} f_i\right) - 1}\sum_{i=1}^{m} f_i (x_i - \bar{x})^2 = \frac{2012.5}{49} = 41.0714$$
where each squared deviation is weighted by its class frequency: $7(110.25) + 9(30.25) + 14(0.25) + 12(20.25) + 8(90.25) = 2012.5$.
Therefore, the standard deviation of the given data is:
$$S = \sqrt{S^2} = \sqrt{41.0714} = 6.4087$$
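The grouped variance formula can be checked with a short sketch. The function name `grouped_sd` is an illustrative choice, and the midpoints and frequencies are taken from the frequency distribution table of this exercise. Note that each squared deviation is multiplied by its class frequency before summing.

```python
import math

def grouped_sd(midpoints, freqs):
    """Return mean, sample variance and sample SD of grouped data."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    # Weight each squared deviation by the frequency of its class.
    weighted_ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
    variance = weighted_ss / (n - 1)
    return mean, variance, math.sqrt(variance)

# Midpoints and frequencies from the table of exercise 26.
midpoints = [3, 8, 13, 18, 23]
freqs = [7, 9, 14, 12, 8]

mean, variance, sd = grouped_sd(midpoints, freqs)
print(mean, round(variance, 4), round(sd, 4))
```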
27. Consider the following ACFP of grouped data, representing the weights of 40 fruit boxes:
a. Prepare the frequency distribution table of this data.
Answer: The frequency distribution table of this data is as follows:
Class Limit | Class Boundaries | Class Midpoint | Frequency | Relative Frequency | Percent Frequency | ACF | DCF
5.5 – 9.5   | 5 → 10           | 7.5            | 5         | 0.125              | 12.5 %            | 5   | 40
10.5 – 14.5 | 10 → 15          | 12.5           | 15        | 0.375              | 37.5 %            | 20  | 35
15.5 – 19.5 | 15 → 20          | 17.5           | 5         | 0.125              | 12.5 %            | 25  | 20
20.5 – 24.5 | 20 → 25          | 22.5           | 10        | 0.250              | 25 %              | 35  | 15
25.5 – 29.5 | 25 → 30          | 27.5           | 5         | 0.125              | 12.5 %            | 40  | 5
Total       | ----------       | ------         | 40        | 1                  | 100 %             | ----- | ----
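When the starting point is an ascending cumulative frequency polygon, the class frequencies are recovered by differencing consecutive cumulative values. The sketch below illustrates this with the ACF column of the table above; the function names are illustrative choices.

```python
def frequencies_from_acf(acf):
    """Recover class frequencies by differencing ascending cumulative frequencies."""
    previous = [0] + list(acf[:-1])
    return [current - before for before, current in zip(previous, acf)]

def dcf_from_frequencies(freqs):
    """Descending cumulative frequency: observations in a class and all higher classes."""
    total = sum(freqs)
    dcf, seen = [], 0
    for f in freqs:
        dcf.append(total - seen)
        seen += f
    return dcf

acf = [5, 20, 25, 35, 40]            # ACF column of the table of exercise 27
freqs = frequencies_from_acf(acf)
print(freqs)
print(dcf_from_frequencies(freqs))
```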
b. Draw the histogram and polygon for this table.
Answers: The histogram for this table is as follows:
The polygon for this table is as follows: