Exercises of Statistics
Exercise 0.1 For each of the following two sets of data:
A: 5 7 2 4 3
B : 15 11 9 8 10 12
calculate the measures of central tendency, variability and shape.
Exercise 0.2 Let X be the discrete variable that counts the number of children per family
in the entire population of a certain geographical area, and suppose that a given sample
yields the following data: 1, 0, 2, 2, 4, 1, 1, 3. Compute (rounding the results to hundredths) the
absolute frequency distribution of the data, the measures of central tendency, the measures of
variation and the measures of shape. Draw the bar chart and the pie chart of the absolute
frequency distribution.
Solution: The sample size is n = 8 (the number of families considered) and the distinct
observations are x1 = 0, x2 = 1, x3 = 2, x4 = 3, x5 = 4. In the table, fi is the absolute frequency,
fiC the cumulative absolute frequency, pi the relative frequency, piC the cumulative relative
frequency, x̄ the mean and m the median.

xi      fi    fiC    pi      piC     xi·fi   xi²·fi   |xi − x̄|   |xi − x̄|·fi   |xi − m|·fi
0       1     1      0.125   0.125   0       0        1.75       1.75          1.5
1       3     4      0.375   0.500   3       3        0.75       2.25          1.5
2       2     6      0.250   0.750   4       8        0.25       0.50          1.0
3       1     7      0.125   0.875   3       9        1.25       1.25          1.5
4       1     8      0.125   1.000   4       16       2.25       2.25          2.5
total   8            1.000           14      36                  8.00          8.0
Measures of central tendency: the mean is
$$E(X) = \bar{x} = \frac{0\cdot 1 + 1\cdot 3 + 2\cdot 2 + 3\cdot 1 + 4\cdot 1}{8} = \frac{14}{8} = 1.75;$$
the quadratic mean is
$$E_Q(X) = \sqrt{\frac{0\cdot 1 + 1\cdot 3 + 4\cdot 2 + 9\cdot 1 + 16\cdot 1}{8}} = \sqrt{\frac{36}{8}} = 2.12;$$
the median is 1.50, since the ordered data are 0, 1, 1, 1, 2, 2, 3, 4; the mode is 1; the
mid-range is (0 + 4)/2 = 2.
Measures of variability: the range is 4 − 0 = 4; the mean absolute deviation is
$$\frac{|0-1.75|\cdot 1 + |1-1.75|\cdot 3 + |2-1.75|\cdot 2 + |3-1.75|\cdot 1 + |4-1.75|\cdot 1}{8} = \frac{8}{8} = 1;$$
the median absolute deviation is
$$\frac{|0-1.5|\cdot 1 + |1-1.5|\cdot 3 + |2-1.5|\cdot 2 + |3-1.5|\cdot 1 + |4-1.5|\cdot 1}{8} = \frac{8}{8} = 1;$$
the variance is
$$\sigma^2 = \frac{1.75^2\cdot 1 + 0.75^2\cdot 3 + 0.25^2\cdot 2 + 1.25^2\cdot 1 + 2.25^2\cdot 1}{8} = \frac{11.5}{8} = 1.44,$$
or equivalently
$$\sigma^2 = \frac{1}{n}\sum_{i} x_i^2 f_i - \bar{x}^2 = E_Q(X)^2 - \bar{x}^2 = \frac{36}{8} - 1.75^2 = 1.44;$$
the standard deviation is $\sigma = \sqrt{11.5/8} = 1.20$; the sample variance is $s^2 = 11.5/7 = 1.64$;
the sample standard deviation is $s = \sqrt{11.5/7} = 1.28$.
Measures of shape: for the symmetry we have
$$\gamma_1 = \frac{m_3}{\sigma^3} = \frac{\frac{1}{8}\left(-1.75^3\cdot 1 - 0.75^3\cdot 3 + 0.25^3\cdot 2 + 1.25^3\cdot 1 + 2.25^3\cdot 1\right)}{1.20^3} = \frac{0.84}{1.73} = 0.49$$
and so the distribution has positive asymmetry.
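The measures above can be cross-checked numerically. The following is a minimal sketch, assuming the frequency table of Exercise 0.2; the variable names are illustrative and not part of the original text.

```python
from math import sqrt

# Frequency table of Exercise 0.2: value -> absolute frequency
freq = {0: 1, 1: 3, 2: 2, 3: 1, 4: 1}
n = sum(freq.values())

mean = sum(x * f for x, f in freq.items()) / n
quadratic_mean = sqrt(sum(x**2 * f for x, f in freq.items()) / n)

# Population variance (divide by n) and sample variance (divide by n - 1)
variance = sum((x - mean) ** 2 * f for x, f in freq.items()) / n
sample_variance = variance * n / (n - 1)

# Skewness: third central moment divided by sigma cubed
m3 = sum((x - mean) ** 3 * f for x, f in freq.items()) / n
skewness = m3 / variance**1.5

print(mean, quadratic_mean, variance, sample_variance, skewness)
```

Working from the frequency table rather than the raw list mirrors the hand computation column by column.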
[Figures: bar chart (absolute frequency fi against number of children) and pie chart of the absolute frequency distribution of the number of children.]
Exercise 0.3 Let X be the random variable: height (cm) of a given species of plants. Suppose
that in a sample of 40 plants we have the following values of X, obtained by rounding to units:
98 111 119 130 170 143 156 126 113 127 107 83 100 128 143 127 117 125 64 119 130 120 108
95 192 124 129 143 198 131 163 152 104 119 161 178 135 146 158 176
Compute the frequency distribution of the data, draw the corresponding histogram and frequency
polygon. Moreover compute the measures of central tendency, the measures of variation and
the measures of shape, rounded to tenths.
Solution:
In the table, xi is the central value of the class, fi the absolute frequency, pi the relative
frequency, piC the cumulative relative frequency and Fi the absolute frequency density
(fi divided by the class width 20). The total of the last column is the product of its entries.

class     actual endpoints   xi      fi    pi      piC     Fi      xi·fi    xi^(fi/n)
60-79     59.5-79.5          69.5    1     0.025   0.025   0.050   69.5     1.112
80-99     79.5-99.5          89.5    3     0.075   0.100   0.150   268.5    1.401
100-119   99.5-119.5         109.5   10    0.250   0.350   0.500   1095     3.235
120-139   119.5-139.5        129.5   12    0.300   0.650   0.600   1554     4.302
140-159   139.5-159.5        149.5   7     0.175   0.825   0.350   1046.5   2.402
160-179   159.5-179.5        169.5   5     0.125   0.950   0.250   847.5    1.900
180-199   179.5-199.5        189.5   2     0.050   1.000   0.100   379      1.300
total                                40    1.000                   5260     128.5

Figure 1: Histogram of the height of 40 plants (frequency density Fi against height of plants).
Auxiliary sums for the same classes:

xi      fi    xi²·fi       fi/xi    |xi − x̄|·fi
69.5    1     4830.25      0.0144   62
89.5    3     24030.75     0.0335   126
109.5   10    119902.5     0.0913   220
129.5   12    201243       0.0927   24
149.5   7     156451.75    0.0468   126
169.5   5     143651.25    0.0295   190
189.5   2     71820.5      0.0106   116
total   40    721930       0.3188   864

Figure 2: Frequency polygon of the height of 40 plants.

Measures of central tendency:
$$E(X) = \bar{x} = \frac{5260}{40} = 131.5; \qquad E_G(X) = 128.5;$$
$$E_Q(X) = \sqrt{\frac{721930}{40}} = 134.3; \qquad E_H(X) = \frac{40}{0.3188} = 125.5;$$
the median is the 0.5-quantile: we have q = 0.5, α = 0.35, β = 0.65, a = 119.5, b = 139.5 and so
$$\frac{0.5 - 0.35}{0.65 - 0.35} = \frac{x - 119.5}{139.5 - 119.5} \;\Rightarrow\; x = 119.5 + 20\cdot\frac{0.15}{0.3} = 129.5;$$
the mode is 129.5; the mid-range is (64 + 198)/2 = 131.
Measures of variability: the range is 198 − 64 = 134; the mean absolute deviation is
864/40 = 21.6; the median absolute deviation, with deviations taken from the median 129.5,
is 840/40 = 21.0; the variance is
$$\sigma^2 = E_Q(X)^2 - \bar{x}^2 = \frac{721930}{40} - 131.5^2 = 756;$$
the standard deviation is $\sigma = \sqrt{756} = 27.5$; the sample variance is $s^2 = 775.4$; the sample
standard deviation is $s = \sqrt{775.4} = 27.8$.
Measures of shape: for the symmetry we have
$$\gamma_1 = \frac{m_3}{\sigma^3} = \frac{3456}{27.5^3} = 0.2$$
and so the distribution has positive asymmetry.
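The four means used in this solution (arithmetic, geometric, quadratic, harmonic) can be recomputed from the central values and frequencies. This is a minimal sketch under that assumption; the names `centers` and `freqs` are illustrative.

```python
from math import exp, log, sqrt

# Central values and absolute frequencies of the seven classes
centers = [69.5, 89.5, 109.5, 129.5, 149.5, 169.5, 189.5]
freqs = [1, 3, 10, 12, 7, 5, 2]
n = sum(freqs)
pairs = list(zip(centers, freqs))

arithmetic = sum(x * f for x, f in pairs) / n
# Geometric mean computed through logarithms to avoid a huge product
geometric = exp(sum(f * log(x) for x, f in pairs) / n)
quadratic = sqrt(sum(x * x * f for x, f in pairs) / n)
harmonic = n / sum(f / x for x, f in pairs)

print(arithmetic, geometric, quadratic, harmonic)
```

For positive data the four values are always ordered harmonic ≤ geometric ≤ arithmetic ≤ quadratic, which gives a quick sanity check on a hand computation.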
We note that if the measurements are obtained by truncation to units (and not by rounding, as
specified in the text), then the actual endpoints of the classes, and hence the central
values, change with respect to the above distribution. In particular we have:

class     actual endpoints   xi    fi    pi      piC
60-79     60-80              70    1     0.025   0.025
80-99     80-100             90    3     0.075   0.100
100-119   100-120            110   10    0.250   0.350
120-139   120-140            130   12    0.300   0.650
140-159   140-160            150   7     0.175   0.825
160-179   160-180            170   5     0.125   0.950
180-199   180-200            190   2     0.050   1.000
total                              40    1.000
Exercise 0.4 (5 p.) Determine the frequency distribution of the following measurements (obtained by truncation to tenths) of the level of serum uric acid (in mg per 100 ml) of 50 adult
males:
5.1 6.1 3.9 4.1 5.9 4.8 5.2 5.9 4.6 5.5 6.1 6.3 4.9 5.5 4.3 6.2 5.5 5.7 4.8 4.5 5.6 4.7 5.5 5.4 4.4
5.5 4.3 5.3 6.2 5.1 5.8 5.7 5.9 5.6 5.7 5.3 5.6 5.5 5.6 5.4 4.9 6.0 4.1 4.9 5.1 4.7 6.3 5.0 4.9 5.9
Moreover, determine the actual endpoints and the central value of each class, and draw the histogram. Calculate the mean, the variance and the 0.4-quantile, rounding each
value to hundredths.
Solution:
Minimum value = 3.9
Maximum value = 6.3
Amplitude = (6.3 − 3.9)/5 = 2.4/5 = 0.48, hence we take the amplitude of each class equal to A = 0.5.
In the table, Fi = fi/A is the absolute frequency density.

class     actual endpoints   xi     fi    Fi    pi     piC    xi·fi    xi²·fi
3.5-3.9   3.5-4.0            3.75   1     2     0.02   0.02   3.75     14.0625
4.0-4.4   4.0-4.5            4.25   5     10    0.10   0.12   21.25    90.3125
4.5-4.9   4.5-5.0            4.75   10    20    0.20   0.32   47.50    225.625
5.0-5.4   5.0-5.5            5.25   9     18    0.18   0.50   47.25    248.0625
5.5-5.9   5.5-6.0            5.75   18    36    0.36   0.86   103.50   595.125
6.0-6.4   6.0-6.5            6.25   7     14    0.14   1.00   43.75    273.4375
total                               50          1.00          267.0    1446.625

The mean is $E(X) = \bar{x} = \frac{267}{50} = 5.34$. The variance is $\sigma^2 = \frac{1446.625}{50} - 5.34^2 = 0.42$.
The 0.4-quantile is 5.22; in fact we have q = 0.4, α = 0.32, β = 0.50 (note that α < q ≤ β),
a = 5.0, b = 5.5 and so
$$\frac{0.4 - 0.32}{0.50 - 0.32} = \frac{x - 5.0}{5.5 - 5.0} \;\Rightarrow\; x = 5.0 + 0.5\cdot\frac{0.08}{0.18} = 5.22;$$
note that a < 0.4-quantile ≤ b.
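The interpolation rule used for the median in Exercise 0.3 and for the 0.4-quantile here can be written once as a small function. This is a sketch, not the author's own procedure: the function name `grouped_quantile` and its argument layout are assumptions made for illustration.

```python
def grouped_quantile(q, endpoints, cum_rel_freq):
    """Quantile of grouped data by linear interpolation.

    endpoints: list of (a, b) actual class endpoints.
    cum_rel_freq: cumulative relative frequency at the end of each class.
    The class used is the one with alpha < q <= beta.
    """
    alpha = 0.0
    for (a, b), beta in zip(endpoints, cum_rel_freq):
        if alpha < q <= beta:
            return a + (b - a) * (q - alpha) / (beta - alpha)
        alpha = beta
    raise ValueError("q must lie in (0, 1]")


# Uric acid classes (Exercise 0.4)
uric_classes = [(3.5, 4.0), (4.0, 4.5), (4.5, 5.0), (5.0, 5.5), (5.5, 6.0), (6.0, 6.5)]
uric_cum = [0.02, 0.12, 0.32, 0.50, 0.86, 1.00]
q40 = grouped_quantile(0.4, uric_classes, uric_cum)

# Plant height classes (Exercise 0.3)
plant_classes = [(59.5 + 20 * i, 79.5 + 20 * i) for i in range(7)]
plant_cum = [0.025, 0.100, 0.350, 0.650, 0.825, 0.950, 1.000]
plant_median = grouped_quantile(0.5, plant_classes, plant_cum)

print(round(q40, 2), round(plant_median, 1))
```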
Exercise 0.5 Concentrations (mg/l, obtained by rounding) of sodium and chloride in 36 Apennine lakes:
• give the graphical representation of the data and their frequency distribution.
• calculate the measures of central tendency, variability and shape.

Lake   Sodium (mg/l)   Chloride (mg/l)
1      1.78            1.60
2      1.63            1.80
3      1.85            2.90
4      2.10            2.90
5      1.35            2.90
6      1.40            2.90
7      1.82            2.00
8      1.35            2.00
9      2.06            2.00
10     1.85            2.20
11     1.51            2.30
12     2.00            2.30
13     2.02            2.80
14     1.90            2.80
15     1.60            2.80
16     2.18            2.50
17     1.82            2.50
18     1.90            2.50
19     1.75            2.60
20     2.11            2.60
21     2.30            2.60
22     1.95            2.70
23     2.60            2.90
24     2.44            2.90
25     2.18            3.00
26     2.51            3.10
27     2.37            3.10
28     2.54            3.30
29     2.06            3.30
30     2.77            3.40
31     2.31            3.40
32     2.81            3.60
33     2.33            3.70
34     1.45            3.80
35     1.78            3.80
36     2.09            3.90
Exercise 0.6 (5 p.) Give the frequency distribution of the following measures (obtained by
rounding to the units) of the cholesterol level (mg/dl) of 40 individuals:
150, 152, 175, 148, 161, 155, 144, 158, 178, 197, 203, 147, 145,
165, 184, 115, 165, 169, 160, 150, 185, 195, 205, 217, 177, 184,
153, 138, 162, 192, 125, 114, 174, 182, 148, 194, 218, 175, 180, 200
compute the actual endpoints and the central value of each class of the distribution and draw
the corresponding histogram. Compute the mean, the variance and the 0.8-quantile of such
frequency distribution.
Exercise 0.7 (5 pt) In a lake there was a die-off of 50% of the fish of a certain species.
Based on knowledge of the industries located near the lake, it is supposed that the die-off
can be mainly attributed to pollution from a substance S1 or from a substance S2.
The probability of pollution from S1 is 0.05 and that of pollution from S2 is 0.1.
The probability of observing a mortality of 50% of the fish of the above species is 0.9
given pollution from S1, 0.15 given pollution from S2, and 0.005 given other causes.
Calculate the probability of pollution due to substance S1, given that
the above observation was made.
Solution: We use Bayes' formula, since we want to determine the cause of a given
observation:
B = observation = "in a lake there was a die-off of 50% of the fish of a certain species".
Possible causes:
A1 = pollution from substance S1
A2 = pollution from substance S2
A3 = other causes
P(A1) = 0.05, probability of pollution from S1
P(A2) = 0.1, probability of pollution from S2
P(A3) = 1 − 0.05 − 0.1 = 0.85, probability of other causes
P(B|A1) = 0.9 is the probability of observing a mortality of 50% of the fish of the above
species given pollution from S1.
P(B|A2) = 0.15 is the probability of observing a mortality of 50% of the fish of the above
species given pollution from S2.
P(B|A3) = 0.005 is the probability of observing a mortality of 50% of the fish of the above
species given other causes.
The probability of pollution due to substance S1, given the above observation, is
$$P(A_1|B) = \frac{P(A_1)\,P(B|A_1)}{P(A_1)\,P(B|A_1) + P(A_2)\,P(B|A_2) + P(A_3)\,P(B|A_3)},$$
$$P(A_1|B) = \frac{0.05\cdot 0.9}{0.05\cdot 0.9 + 0.1\cdot 0.15 + 0.85\cdot 0.005} = \frac{0.045}{0.06425} = 0.70$$
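Bayes' formula as applied here can be sketched in a few lines. The dictionary keys below are labels chosen for illustration; the numbers are the priors and likelihoods given in the exercise.

```python
# Prior probability of each cause and likelihood of the observed die-off given it
priors = {"S1": 0.05, "S2": 0.10, "other": 0.85}
likelihoods = {"S1": 0.9, "S2": 0.15, "other": 0.005}

# Total probability of the observation B (denominator of Bayes' formula)
evidence = sum(priors[c] * likelihoods[c] for c in priors)

# Posterior probability of each cause given B
posterior = {c: priors[c] * likelihoods[c] / evidence for c in priors}

for cause, p in posterior.items():
    print(cause, round(p, 4))
```

Computing all three posteriors at once also confirms that they sum to 1, which is a useful check when the priors are entered by hand.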
Exercise 0.8 (5 pt) A patient has a body temperature greater than 39°C. In the population
the following data were recorded: an individual has probability 0.1 of having an influenza
virus and probability 0.005 of suffering from meningitis; the probability of observing a
body temperature greater than 39°C in an individual suffering from an influenza virus is
0.05, in an individual suffering from meningitis it is 0.7, and in an individual with any other
condition it is 0.001. Calculate the probability that the patient has
meningitis given the above observation, that is, given a body temperature greater
than 39°C.
Exercise 0.9 (2 pt) In an experiment on soil fertility, you want to evaluate all the pairs
among: Ca, Mg, Na, N, P, K.
a. How many pairs of elements have to be taken into consideration?
b. To evaluate all the triples, how many different groups will have to be formed?
Answers:
a. $C_{6,2} = \dfrac{6!}{(6-2)!\,2!} = 15.$
b. $C_{6,3} = \dfrac{6!}{(6-3)!\,3!} = 20.$
Exercise 0.10 In a race among 10 competitors:
1. (1 p.) How many different orders of arrival are possible?
2. What is the probability of guessing the first three
• (1 p.) specifying their order?
• (1 p.) without specifying their order?
3. Is it advantageous to bet 10 euro to win 500 euro if you have to guess the first two
• (2 p.) specifying their order?
• (2 p.) without specifying their order?
Answer:
1. (1 p.) In a race with 10 competitors, the possible arrival orders are the permutations of 10
elements: P10 = 10! = 1 · 2 · ... · 10 = 3,628,800.
2. The number of possible groups of the first three out of 10 competitors
• taking into account the order of arrival, is the number of 3-permutations of 10, that
is
D10,3 = 10 · 9 · 8 = 720;
so the probability of guessing the first three together with their order of arrival is
1/720 = 0.001389;
• regardless of the order of arrival, is the number of 3-combinations of 10, that is
$$C_{10,3} = \frac{D_{10,3}}{3!} = \frac{720}{6} = 120;$$
so the probability of guessing the first three without specifying the order is 1/120 =
0.00833.
3. The probability of guessing the first 2 of 10, including who will be first and who second,
is given by the 2-permutations of 10, and it is 1/D10,2 = 1/(10 · 9) = 1/90.
This probability is less favorable than the ratio 10/500 = 1/50
set by the bet, so the bet is not advantageous. The probability of guessing the first 2 of
10 without specifying the order is given by the 2-combinations of 10, and it is
equal to
$$\frac{1}{C_{10,2}} = \frac{2!}{D_{10,2}} = \frac{2}{90} = \frac{1}{45}.$$
This probability, 1/45, is more favorable than the ratio 1/50 set by the bet, so the
bet is advantageous.
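The counts in this answer map directly onto standard-library functions. The sketch below uses `math.perm` and `math.comb` (Python 3.8 or later) and exact fractions; the variable names, and the reading of "advantageous" as "winning probability at least the stake-to-prize ratio", follow the answer above.

```python
from fractions import Fraction
from math import comb, factorial, perm

orders = factorial(10)                         # P10: all arrival orders
p_first3_ordered = Fraction(1, perm(10, 3))    # 1 / D10,3
p_first3_unordered = Fraction(1, comb(10, 3))  # 1 / C10,3

# Stake 10 euro to win 500 euro: compare with the ratio 10/500
break_even = Fraction(10, 500)
bet_ordered_ok = Fraction(1, perm(10, 2)) >= break_even
bet_unordered_ok = Fraction(1, comb(10, 2)) >= break_even

print(orders, p_first3_ordered, p_first3_unordered)
print(bet_ordered_ok, bet_unordered_ok)
```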
Exercise 0.11 (pt. 3) The number of all possible anagrams of the word WORLD (regardless of
their meaning) is
720     120     24     9     6
Exercise 0.12 (5 p.) In humans, more males than females are born, with a sex ratio of 1.07
(that is, 107 males for every 100 females). Compute the probability distribution of the number of
male children in families with 4 children. Draw its bar chart and compute its mean and its
variance.
Answer: A posteriori, on the basis of the data collected, we can say that the frequentist
probability of a male birth (result A) is φ = 107/(107 + 100) = 0.52 and that of a female birth
(result B) is 1 − φ = 0.48. By using the binomial distribution we can calculate the probabilities
of having k = 0, 1, 2, 3, 4 male children in families with 4 children (n = 4), and we
have
• 0 sons: $P_0 = C_{4,0}\,\varphi^0 (1-\varphi)^4 = 1\cdot 1\cdot 0.48^4 = 0.05$;
• 1 son: $P_1 = C_{4,1}\,\varphi^1 (1-\varphi)^3 = 4\cdot 0.52\cdot 0.48^3 = 0.23$;
• 2 sons: $P_2 = C_{4,2}\,\varphi^2 (1-\varphi)^2 = 6\cdot 0.52^2\cdot 0.48^2 = 0.37$;
• 3 sons: $P_3 = C_{4,3}\,\varphi^3 (1-\varphi)^1 = 4\cdot 0.52^3\cdot 0.48 = 0.27$;
• 4 sons: $P_4 = C_{4,4}\,\varphi^4 (1-\varphi)^0 = 1\cdot 0.52^4\cdot 1 = 0.07$;
and so the probability distribution is:

xi      pi     piC    xi·pi   xi²·pi
0       0.05   0.05   0       0
1       0.23   0.28   0.23    0.23
2       0.37   0.65   0.74    1.48
3       0.27   0.92   0.81    2.43
4       0.07   0.99   0.28    1.12
total   0.99          2.06    5.26

The mean is µ = nφ = 4 · 0.52 = 2.08 ≈ 2.06; the variance is σ² = nφ(1 − φ) = 4 · 0.52 · 0.48 =
0.9984 ≈ 5.26 − 2.06² = 1.0164. The small discrepancies come from rounding the pi to hundredths.
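The binomial distribution of this exercise can be tabulated without intermediate rounding. This is a minimal sketch; `binomial_pmf` is an illustrative helper name, and φ is kept as the unrounded ratio 107/207 rather than 0.52.

```python
from math import comb


def binomial_pmf(k, n, phi):
    """P(k successes in n independent trials with success probability phi)."""
    return comb(n, k) * phi**k * (1 - phi) ** (n - k)


phi = 107 / 207  # probability of a male birth from the sex ratio 1.07
n = 4
pmf = [binomial_pmf(k, n, phi) for k in range(n + 1)]

mean = sum(k * p for k, p in enumerate(pmf))
variance = sum(k * k * p for k, p in enumerate(pmf)) - mean**2

for k, p in enumerate(pmf):
    print(k, round(p, 4))
print(round(mean, 4), round(variance, 4))
```

With unrounded probabilities the tabulated mean and variance agree exactly with nφ and nφ(1 − φ), which isolates rounding as the source of the gap seen in the hand computation.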
Exercise 0.13
• What is the probability of getting the number 1 three times when throwing
a die five times?
• What is the probability that 9 laboratory experiments are positive and one negative, if
the experiments are usually positive in 20% of cases?
Solution:
• Experiment: "throw a die";
Success: A = "get the number 1";
Probability of success = φ = P(A) = 1/6;
Probability of failure = 1 − φ = 5/6;
Number of independent experiments = n = 5;
Number of successes = k = 3.
The probability of getting the number 1 three times when throwing a die five times is
$$P_3 = C_{5,3}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^{5-3}.$$
• Experiment: "laboratory experiment";
Success: A = "the result of the laboratory experiment is positive";
Probability of success = φ = P(A) = 20/100 = 0.2;
Probability of failure = 1 − φ = 0.8;
Number of independent experiments = n = 10;
Number of successes = k = 9.
The probability that 9 laboratory experiments are positive and one negative is
$$P_9 = C_{10,9}\,(0.2)^9\,(0.8)^{10-9}.$$
Exercise 0.14 In the population of a certain geographic region the blood groups are distributed as follows: 10% group AB, 20% group B, 30% group A, 40% group 0.
1. By extracting 10 individuals, what is the probability that 2 have group AB, 3 group
B, 2 group A and 3 group 0?
2. By extracting 8 individuals, what is the probability that 4 have group 0 and 4 group
A?
Solution:
A1 = "individual has group AB", φ1 = 10/100 = 0.10
A2 = "individual has group B", φ2 = 20/100 = 0.20
A3 = "individual has group A", φ3 = 30/100 = 0.30
A4 = "individual has group 0", φ4 = 40/100 = 0.40
The events are pairwise mutually exclusive and φ1 + φ2 + φ3 + φ4 = 1, so we can use the
multinomial distribution:
1. the probability that, by extracting n = 10 individuals, we have k1 = 2 with group AB,
k2 = 3 with group B, k3 = 2 with group A and k4 = 3 with group 0 is
$$P_{2,3,2,3} = \frac{10!}{2!\cdot 3!\cdot 2!\cdot 3!}\cdot 0.10^2\cdot 0.20^3\cdot 0.30^2\cdot 0.40^3 = 0.012$$
2. the probability that, by extracting n = 8 individuals, we have k4 = 4
with group 0 and k3 = 4 with group A is
$$P_{0,0,4,4} = \frac{8!}{0!\cdot 0!\cdot 4!\cdot 4!}\cdot 0.10^0\cdot 0.20^0\cdot 0.30^4\cdot 0.40^4 = 0.015$$
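The multinomial formula used in both parts can be evaluated with a short helper. This is a sketch; `multinomial_pmf` is an illustrative name, and the group order (AB, B, A, 0) follows the solution above.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing the given counts in sum(counts) independent draws."""
    coefficient = factorial(sum(counts))
    for k in counts:
        coefficient //= factorial(k)  # exact integer division at every step
    return coefficient * prod(p**k for p, k in zip(probs, counts))


probs = [0.10, 0.20, 0.30, 0.40]  # groups AB, B, A, 0

p_part1 = multinomial_pmf([2, 3, 2, 3], probs)
p_part2 = multinomial_pmf([0, 0, 4, 4], probs)
print(round(p_part1, 4), round(p_part2, 4))
```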
Exercise 0.15 (pt. 4) Consider a die with three red faces, two green faces and one blue face.
Compute the probability of obtaining the red face once, the green face 3 times and the
blue face 4 times when throwing the die 8 times.
Exercise 0.16 Let us consider a deck of 52 cards with four suits: hearts, diamonds, clubs,
spades (each with the cards 1, 2, ..., 10, J, Q, K). By extracting 5 cards from the deck without replacement, what
is the probability of
1. obtaining 4 aces;
2. obtaining 4 aces or 5 cards of hearts;
3. obtaining 5 cards of the same suit;
4. not obtaining 5 cards of the same suit.
Solution:
1. $P(\text{obtaining 4 aces}) = \dfrac{52-4}{C_{52,5}} = \dfrac{48\cdot(52-5)!\cdot 5!}{52!} = \dfrac{48!\cdot 5\cdot 4\cdot 3\cdot 2}{52\cdot 51\cdot 50\cdot 49\cdot 48!} = 0.18\cdot 10^{-4}.$
2. As the events A = "obtaining 4 aces" and B = "obtaining 5 cards of hearts" are mutually
exclusive,
$$P(\text{obtaining 4 aces or 5 cards of hearts}) = P(A\cup B) = P(A) + P(B) = 0.18\cdot 10^{-4} + \frac{C_{13,5}}{C_{52,5}}.$$
3. As the events AH = "obtaining 5 cards of hearts", AD = "obtaining 5 cards of diamonds",
AC = "obtaining 5 cards of clubs", AS = "obtaining 5 cards of spades" are pairwise mutually exclusive, and
$$P(A_H) = \frac{C_{13,5}}{C_{52,5}} = P(A_D) = P(A_C) = P(A_S),$$
we have
$$P(\text{obtaining 5 cards of the same suit}) = P(A_H\cup A_D\cup A_C\cup A_S) = P(A_H) + P(A_D) + P(A_C) + P(A_S) = 4\,\frac{C_{13,5}}{C_{52,5}} = \ldots$$
4. P(not obtaining 5 cards of the same suit) = 1 − P(obtaining 5 cards of the same
suit) = ...
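The expressions left unevaluated in this solution can be computed with `math.comb`. The sketch below only restates the formulas of points 1 to 4; the variable names are illustrative.

```python
from math import comb

hands = comb(52, 5)  # number of 5-card hands, C52,5

# 4 aces plus any one of the other 48 cards
p_four_aces = comb(4, 4) * comb(48, 1) / hands
p_five_hearts = comb(13, 5) / hands

# Points 2-4: mutually exclusive events add; the complement gives point 4
p_aces_or_hearts = p_four_aces + p_five_hearts
p_same_suit = 4 * p_five_hearts
p_not_same_suit = 1 - p_same_suit

print(p_four_aces, p_aces_or_hearts, p_same_suit, p_not_same_suit)
```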
Exercise 0.17 In a planktonic community, the population of Eudiaptomus vulgaris accounts
for 2% of the individuals:
• By sampling 200 individuals, what is the probability of not finding Eudiaptomus?
• By sampling 100 individuals, what is the probability of finding 4 specimens of Eudiaptomus?
• With a presence of 5% of Eudiaptomus vulgaris, how would the previous probabilities change?
Answers
By sampling 200 individuals: the mean with a 2% presence is µ = 200 · 0.02 = 4; so the
probability of not finding individuals (k = 0) is
$$P_0 = \frac{4^0}{0!}\,e^{-4} = 0.0183$$
By sampling 200 individuals: the mean with a 5% presence is µ = 200 · 0.05 = 10; so the
probability of not finding individuals (k = 0) is
$$P_0 = \frac{10^0}{0!}\,e^{-10} = 0.0000454$$
By sampling 100 individuals: the mean with a 2% presence is µ = 100 · 0.02 = 2; so the
probability of finding 4 individuals (k = 4) is
$$P_4 = \frac{2^4}{4!}\,e^{-2} = 0.0902$$
By sampling 100 individuals: the mean with a 5% presence is µ = 100 · 0.05 = 5; so the
probability of finding 4 individuals (k = 4) is
$$P_4 = \frac{5^4}{4!}\,e^{-5} = 0.1755$$
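The four Poisson probabilities above share one formula, sketched here as a helper. The name `poisson_pmf` is illustrative; the means are those computed in the answers.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    return mu**k / factorial(k) * exp(-mu)


# (sample size, presence, k) for the four cases of the exercise
cases = [(200, 0.02, 0), (200, 0.05, 0), (100, 0.02, 4), (100, 0.05, 4)]
results = [poisson_pmf(k, size * presence) for size, presence, k in cases]

for case, p in zip(cases, results):
    print(case, p)
```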
Exercise 0.18 In a lake there are 12 fish belonging to different species, 50% of which are
trout; if we catch 4 fish at random, calculate the probability that none of them is a trout.
Solution: The solution can be obtained by using the hypergeometric distribution with the
following parameters: N = 12, m = 0.5 · 12 = 6, n = 4, k = 0; then the probability that "none
is a trout" is equal to
$$P_{0/4} = \frac{C_{6,0}\cdot C_{6,4}}{C_{12,4}} = \frac{6!\cdot 4!\cdot 8!}{2!\cdot 4!\cdot 12!} = \frac{6\cdot 5\cdot 4\cdot 3\cdot 2}{2\cdot 12\cdot 11\cdot 10\cdot 9} = \frac{1}{33} = 0.030303$$
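The hypergeometric probability can be computed exactly with binomial coefficients. This is a minimal sketch using the parameter names N, m, n, k of the solution; the function name `hypergeom_pmf` is an assumption for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, N, m, n):
    """P(k marked items in a sample of n drawn without replacement
    from N items, m of which are marked)."""
    return Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))


p_no_trout = hypergeom_pmf(0, N=12, m=6, n=4)
print(p_no_trout, float(p_no_trout))
```

Returning a `Fraction` keeps the result in the same exact form (1/33) as the hand computation.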
Exercise 0.19 In a small natural reserve there are 9 boars: 3 females and 6 males. To reduce
their number a hunt is organized, in which 5 animals will be captured regardless of gender.
Calculate the probability that:
1. all three females are captured;
2. 2 females are captured;
3. one female is captured;
4. no female is captured.
Solution: The solution can be obtained by using the hypergeometric distribution with the
following parameters: N = 9, animals; n = 5, captured animals; m = 3, females present; k, captured females.
Exercise 0.20 In a population the lengths of the fish have µ = 35 cm and σ = 6 cm.
Calculate the probability of catching a fish of length l, without taking into account the
approximation used for the measurements, when:
1. l ≥ 42 cm;
2. l ≤ 42 cm;
3. l ≤ 23 cm;
4. 42 cm ≤ l ≤ 50 cm;
5. 29 cm ≤ l ≤ 42 cm;
6. l = 33 cm, taking into account that the measurement is obtained by rounding to the nearest
cm.
Answers
1. The probability of catching a fish of length l ≥ 42 cm is equal to the area under the
standard normal curve to the right of z = (42 − 35)/6 = 1.17, and so it is 0.121 (12.1%);
2. The probability of catching a fish of length l ≤ 42 cm is equal to the area under the
standard normal curve to the left of z = 1.17, and so it is 1 − 0.121 = 0.879 (87.9%);
3. The probability of catching a fish of length l ≤ 23 cm is equal to the area under the
standard normal curve to the left of z = (23 − 35)/6 = −2, and so it is 0.0228 (2.28%);
4. The probability of catching a fish of length 42 cm ≤ l ≤ 50 cm is equal to the area
under the standard normal curve between z = 1.17 and z = 2.5, so it is 0.121 − 0.0062 =
0.1148 (11.48%);
5. The probability of catching a fish of length 29 cm ≤ l ≤ 42 cm is equal to the area
between z = −1 and z = 1.17, so it is 1 − 0.1587 − 0.121 = 0.7203 (72.03%);
6. When the length of the fish is measured by rounding to the nearest cm, we record a length
l = 33 cm for all the fish with a length between 32.5 cm and 33.5 cm. Hence,
the probability of catching a fish of length 33 cm is equal to the area between z = −0.42
and z = −0.25, so it is 0.4013 − 0.3372 = 0.0641 (6.41%).
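The table lookups in these answers can be cross-checked with `statistics.NormalDist` from the standard library. This is a sketch that assumes, as the answers do, that the lengths are normally distributed with µ = 35 and σ = 6; because it does not round z to two decimals, the results differ slightly from the table-based values.

```python
from statistics import NormalDist

length = NormalDist(mu=35, sigma=6)  # fish length in cm

p_ge_42 = 1 - length.cdf(42)
p_le_42 = length.cdf(42)
p_le_23 = length.cdf(23)
p_42_50 = length.cdf(50) - length.cdf(42)
p_29_42 = length.cdf(42) - length.cdf(29)
# A recorded length of 33 cm covers true lengths from 32.5 to 33.5 cm
p_33_rounded = length.cdf(33.5) - length.cdf(32.5)

for p in (p_ge_42, p_le_42, p_le_23, p_42_50, p_29_42, p_33_rounded):
    print(round(p, 4))
```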
Exercise 0.21 In a species of adult rodents, males and females are distinguishable by length:
females: µ = 37.5 cm, σ = 3.8 cm
males: µ = 34.5 cm, σ = 3.2 cm
Without taking into account the approximation used for the measurements, answer the following questions.
1. Relative to the mean of their sex, which are rarer: males of length ≥ 40 cm or females
of length ≥ 41 cm?
2. Consider the group formed by the 5% of females of greatest length. What is the minimum length
in this group of rodents?
3. Consider the group formed by the 5% of males of shortest length. What is the maximum length in
this group of rodents?
4. Let l be the minimum length in the group formed by the 30% of females of greatest length. What proportion of
males have length greater than l?
5. Let l be the maximum length in the group formed by the 20% of females of shortest length. What proportion of
males have length shorter than l?
Exercise 0.22 Suppose that, from the literature, it is known that in a lakeside zooplankton
population, individuals of Eudiaptomus vulgaris are 10% of the total. In a random
sample of 120 individuals, what is the probability of finding:
a. exactly 15 individuals of Eudiaptomus;
b. at least 15 individuals of Eudiaptomus;
c. fewer than 15 individuals of Eudiaptomus.
Answers First of all we note that n = 120, φ = 0.1, x = 15, so that
µ = nφ = 120 · 0.1 = 12;
σ² = nφ(1 − φ) = 120 · 0.1 · 0.9 = 10.8,
σ = 3.29.
We can answer the above questions by using either the binomial or the normal distribution. In particular:
a. BINOMIAL: the probability of having exactly 15 individuals of Eudiaptomus is
$$P_{15} = C_{120,15}\,\varphi^{15}(1-\varphi)^{120-15} = \frac{120!}{105!\cdot 15!}\cdot 0.1^{15}\cdot 0.9^{105} = 0.074$$
a. NORMAL: the probability of having exactly 15 individuals of Eudiaptomus is the area
under the probability density function of N(12, 10.8) between x − 0.5 = 14.5 and x + 0.5 =
15.5, which is equal to the area under the probability density function of Z = N(0, 1) (the
standard normal distribution) between z = 0.76 and z = 1.06, that is 0.2236 − 0.1446 = 0.079.
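The exact binomial value and its normal approximation with continuity correction can be compared directly. This is a sketch using only the standard library; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, phi, x = 120, 0.1, 15

# Exact binomial probability of exactly x individuals
exact = comb(n, x) * phi**x * (1 - phi) ** (n - x)

# Normal approximation N(n*phi, n*phi*(1-phi)) with continuity correction
approx_dist = NormalDist(mu=n * phi, sigma=sqrt(n * phi * (1 - phi)))
approx = approx_dist.cdf(x + 0.5) - approx_dist.cdf(x - 0.5)

print(round(exact, 4), round(approx, 4))
```

The same two objects also cover points b and c, by summing the binomial terms from 15 upward or by taking the normal area to the right or left of 14.5.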
Exercise 0.23 (3 p.) Let X be a discrete random variable that has Poisson distribution with
variance 6, complete the following
the mean E(X) =
P (X ≤ 2) =
P (X > 3) =
Exercise 0.24 (pt. 3) Let X be a random variable that has Student’s t-distribution with 15
degrees of freedom, then
P (X < −1.753) =
P (X < 2.131) =
P (1.753 < X < 2.131) =
Exercise 0.25 (pt. 3) Let X be a random variable that has χ2 -distribution with 12 degrees of
freedom, then
P (X > −1.75) =
P (3.57 < X < 23.34) =
P (X < 5.23) =
P (X < ____ ) = 0.05
Exercise 0.26 Let X be a random variable with outcomes in the interval [1, e] and probability
density function $f(x) = \dfrac{3\log^2 x}{x}$; compute its mean E(X) and its variance Var(X).
Exercise 0.27 To determine the average reaction time of drivers in case of danger, 8
measurements were made (in seconds): 0.84, 0.75, 1.02, 0.99, 1.05, 1.10, 0.68, 0.82. Assuming that the
reaction times are normally distributed, estimate the average reaction time for all
drivers using confidence levels of 0.95 and 0.99, in the following two cases:
a. the variance of the reaction time is equal to 0.025.
b. the variance of the reaction time is not known.
Answers
a. We have a population with normal distribution and known variance, in particular
$\sigma = \sqrt{0.025} = 0.158$.
– With confidence level 0.95 (= 1 − α), α = 0.05, we have zc = 1.96 (from the
tables of the standard normal distribution); moreover x̄ = (0.84 + 0.75 + 1.02 + 0.99 +
1.05 + 1.10 + 0.68 + 0.82)/8 = 0.906, and so the confidence interval with confidence
level 0.95 is:
$$\bar{x} - z_c\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_c\frac{\sigma}{\sqrt{n}} \tag{1}$$
$$0.906 - 1.96\,\frac{0.158}{\sqrt{8}} \le \mu \le 0.906 + 1.96\,\frac{0.158}{\sqrt{8}} \tag{2}$$
$$0.796 \le \mu \le 1.015 \tag{3}$$
– With confidence level 0.99 (= 1 − α), α = 0.01, we have zc = 2.58 (from
the tables of the standard normal distribution), and so the confidence interval with
confidence level 0.99 is:
$$0.906 - 2.58\,\frac{0.158}{\sqrt{8}} \le \mu \le 0.906 + 2.58\,\frac{0.158}{\sqrt{8}} \tag{4}$$
$$0.762 \le \mu \le 1.05 \tag{5}$$
b. We have a population with normal distribution and unknown variance.
– With confidence level 0.95 (= 1 − α), α = 0.05, we have tc = 2.365 (from
the tables of Student's t-distribution with 7 degrees of freedom); we can also
calculate the sample standard deviation (dividing by n − 1 = 7), s = 0.154, and so the confidence interval
with confidence level 0.95 is:
$$\bar{x} - t_c\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_c\frac{s}{\sqrt{n}} \tag{6}$$
$$0.906 - 2.365\,\frac{0.154}{\sqrt{8}} \le \mu \le 0.906 + 2.365\,\frac{0.154}{\sqrt{8}} \tag{7}$$
$$0.777 \le \mu \le 1.035 \tag{8}$$
– With confidence level 0.99 (= 1 − α), α = 0.01, we have tc = 3.499 (from the
tables of Student's t-distribution with 7 degrees of freedom), and so the confidence
interval with confidence level 0.99 is:
$$0.906 - 3.499\,\frac{0.154}{\sqrt{8}} \le \mu \le 0.906 + 3.499\,\frac{0.154}{\sqrt{8}} \tag{9}$$
$$0.715 \le \mu \le 1.097 \tag{10}$$
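Case a (known variance) can be reproduced with the standard library, which provides the normal quantile through `NormalDist.inv_cdf`. This is a sketch; the helper name `z_interval` is an assumption, and the critical value is computed rather than read from a table, so the endpoints may differ from the text in the third decimal.

```python
from math import sqrt
from statistics import NormalDist, mean

times = [0.84, 0.75, 1.02, 0.99, 1.05, 1.10, 0.68, 0.82]
sigma = sqrt(0.025)  # known population standard deviation


def z_interval(data, sigma, level):
    """Confidence interval for the mean of a normal population with known sigma."""
    z_c = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_c * sigma / sqrt(len(data))
    centre = mean(data)
    return centre - half_width, centre + half_width


low95, high95 = z_interval(times, sigma, 0.95)
low99, high99 = z_interval(times, sigma, 0.99)
print(round(low95, 3), round(high95, 3))
print(round(low99, 3), round(high99, 3))
```

Case b would need a Student's t quantile, which the standard library does not provide; there the tabulated tc can simply replace `z_c` and the sample standard deviation replace `sigma`.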
Exercise 0.28 Consider the following sample of measurements (obtained by rounding to the units) of the cholesterol level (mg/dl) of 40 individuals:
150, 152, 175, 148, 161, 155, 144, 158, 178, 197, 203, 147, 145, 165, 184, 115, 165, 169, 160, 150,
185, 195, 205, 217, 177, 184, 153, 138, 162, 192, 125, 114, 174, 182, 148, 194, 218, 175, 180, 200.
Consider the event A that an individual has a cholesterol level above 180 mg/dl. Complete the
following:
• (1 p.) the relative frequency (rounded to hundredths) of A in the sample above is p = ____
• (2 p.) the confidence interval of the proportion of A in the population with confidence
level 0.99 is [ ____ , ____ ].
Exercise 0.29 In a clinical experiment two different types of analgesics, A and B, were
given to 100 patients. At the end of the treatment 65 patients preferred A and 35 preferred B.
Question: calculate the confidence interval of the proportion of preferences for analgesic A with
a confidence level equal to 0.95.
Answer: with confidence level 0.95 (= 1 − α), α = 0.05, from the tables of the
standard normal distribution we have zc = 1.96; moreover p = 65/100 = 0.65. Then, by using
the second estimate, the confidence interval with confidence level 0.95 is:
$$0.65 - 1.96\sqrt{\frac{0.65\cdot 0.35}{100}} \le \varphi \le 0.65 + 1.96\sqrt{\frac{0.65\cdot 0.35}{100}} \tag{11}$$
$$0.557 \le \varphi \le 0.743. \tag{12}$$
Complete the exercise by using the first estimate.
Exercise 0.30 We want to estimate the mileage of the tires of two companies. For 80 tires
produced by the first company, the mean distance traveled before deterioration is
47000 km with a standard deviation of 3500 km; for 50 tires produced by the second
company, the mean distance traveled before deterioration is 35000 km with
a standard deviation of 2500 km. Question: estimate the confidence interval of the difference
between the mean distances traveled by the tires of the two companies, with confidence level
0.95.
Answer: we have independent and large samples; with confidence level 0.95 (= 1 − α), α =
0.05, from the tables of the standard normal distribution we have zc = 1.96; moreover x̄1 =
47000, σ1 ≈ s1 = 3500, x̄2 = 35000 and σ2 ≈ s2 = 2500; then the confidence interval with
confidence level 0.95 is:
$$(\bar{x}_1 - \bar{x}_2) - z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$$(47000 - 35000) - 1.96\sqrt{\frac{3500^2}{80} + \frac{2500^2}{50}} \le \mu_1 - \mu_2 \le (47000 - 35000) + 1.96\sqrt{\frac{3500^2}{80} + \frac{2500^2}{50}}$$
$$10966 \le \mu_1 - \mu_2 \le 13034$$
Exercise 0.31 In an experiment for the evaluation of a new treatment NT in relation to an old
treatment OT, patients are divided into two groups. 257 patients were treated with the method
NT, and 41 of them did not have any benefit. 244 patients were treated with the method OT, and 64 of them did
not have any benefit.
Question: estimate the difference between the proportions of ineffectiveness of the two treatments with a
confidence level equal to 0.99.
Answer: we have independent and large samples; the confidence level is 1 − α = 0.99, α =
0.01. The first population is the patients treated with NT and the second population is the
patients treated with OT. The parameter that we want to estimate is φ1 − φ2, where φ1 is
the probability that the treatment has no effect in the first population (where NT is used) and
φ2 is the probability that the treatment has no effect in the second population (where OT is
used), so that the event A is "the treatment has no effect". The estimator is p1 − p2, where
p1 = 41/257 = 0.1595 and p2 = 64/244 = 0.2623. The confidence interval is
$$\left[\,p_1 - p_2 - z_c\sqrt{\frac{\varphi_1(1-\varphi_1)}{n_1} + \frac{\varphi_2(1-\varphi_2)}{n_2}}\,,\;\; p_1 - p_2 + z_c\sqrt{\frac{\varphi_1(1-\varphi_1)}{n_1} + \frac{\varphi_2(1-\varphi_2)}{n_2}}\,\right] \tag{13}$$
By using φ1 ≈ p1, φ2 ≈ p2, n1 = 257 and n2 = 244 in the above formula, and by using the
table of the standard normal distribution, from which we have zc = 2.575 ≈ 2.58, we obtain
the interval
(−0.1962, −0.0091).
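The interval for the difference of two proportions can be sketched as a small function. The name `diff_proportion_interval` is illustrative, and the critical value is computed with `NormalDist.inv_cdf` rather than read from a table, so the endpoints can differ from the text in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_interval(x1, n1, x2, n2, level):
    """Large-sample confidence interval for phi1 - phi2 (independent samples),
    with phi1 and phi2 replaced by the sample proportions under the root."""
    p1, p2 = x1 / n1, x2 / n2
    z_c = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_c * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - half_width, p1 - p2 + half_width


# NT: 41 of 257 without benefit; OT: 64 of 244 without benefit
low, high = diff_proportion_interval(41, 257, 64, 244, 0.99)
print(round(low, 4), round(high, 4))
```

Since the whole interval lies below zero, the data suggest a lower proportion of ineffectiveness for NT at this confidence level.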
Exercise 0.32 In an experiment for the evaluation of a new treatment N T in relation to an old
treatment OT , the patients are paired according to a specific criterion. 142 couples of patients
were treated in the following way: the first patient was treated with the method N T and the
second patient was treated with the method OT . The following results have been obtained: 3
pairs where both patients did not have any benefit; 17 pairs where the patients treated with N T
did not have any benefit; 25 pairs where the patients treated with OT did not have any benefit;
Question: estimate the difference of proportions of ineffectiveness of the treatement with a
confidence level equal to 0.99
Answer: we have matched (paired) and large samples; the confidence level is 1 − α = 0.99, α = 0.01.
The first population is the patients treated with NT and the second population is the patients
treated with OT. The parameter that we want to estimate is δ = φ1 − φ2, where φ1 is the
probability that the treatment has no effect in the first population (where NT is used) and
φ2 is the probability that the treatment has no effect in the second population (where OT is
used), so that the event A is "the treatment has no effect" and the event B is "the treatment has
an effect". With n = 142, u = 17 and v = 25, the estimator is d = (u − v)/n = (17 − 25)/142 = −0.0563; moreover
$$\sigma_d^2 = \frac{u + v - (u - v)^2/n}{n^2} = \frac{17 + 25 - (17 - 25)^2/142}{142^2} = 0.0021$$
and the confidence interval is (d − zc σd, d + zc σd). By using the table of the standard normal
distribution we have zc = 2.58; moreover σd = √0.0021 = 0.0458, so that the interval is
(−0.1745, 0.0619).
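The paired-sample computation can be reproduced with a short Python sketch. The helper name `paired_diff_ci` is hypothetical; the formulas are the ones used in the answer, d = (u − v)/n and σd² = (u + v − (u − v)²/n)/n². The sketch keeps σd² unrounded, so the endpoints can differ from the hand computation in the third decimal place.

```python
from math import sqrt

def paired_diff_ci(u, v, n, z_c):
    """Interval (d - z_c*sigma_d, d + z_c*sigma_d) for paired proportions."""
    d = (u - v) / n
    var_d = (u + v - (u - v) ** 2 / n) / n ** 2
    half_width = z_c * sqrt(var_d)
    return d, var_d, (d - half_width, d + half_width)

# u = 17, v = 25, n = 142 pairs, z_c = 2.58 as in the answer.
d, var_d, (low, high) = paired_diff_ci(17, 25, 142, 2.58)
print(f"d = {d:.4f}, var_d = {var_d:.4f}, interval = ({low:.4f}, {high:.4f})")
```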
Exercise 0.33 The nicotine content of cigarettes of a certain type is normally distributed with
a standard deviation of 4 mg. In order to minimize the risk of lung cancer, the average
nicotine content of a cigarette must not exceed 26 mg. In a sample of 10 cigarettes the
following values of nicotine (in mg) were obtained:
33 27 20 36 25 24 27 24 34 29
Can we say, with a significance level of 0.05, that consumers of that type of cigarettes are at
minimal risk of lung cancer?
Answer: we have H0 : µ = 26, H1 : µ > 26 (one-tailed test on the right), where µ is the average
nicotine content of cigarettes of the given type. The significance level is α = 0.05, µ0 = 26,
σ = 4, n = 10, x̄ = (33 + 27 + 20 + 36 + 25 + 24 + 27 + 24 + 34 + 29)/10 = 27.9 and
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{27.9 - 26}{4/\sqrt{10}} = 1.502.$$
The critical value is zc = 1.645 (from the tables of the standard normal distribution), and so the value of Z computed above falls
in the acceptance region. The sample is not statistically significant and we accept H0.
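The same right-tailed test can be sketched in Python. The variable names are illustrative, and the critical value is computed with `statistics.NormalDist` instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist, mean

nicotine = [33, 27, 20, 36, 25, 24, 27, 24, 34, 29]  # sample values in mg
mu0, sigma, alpha = 26, 4, 0.05

# Test statistic with known population standard deviation.
z = (mean(nicotine) - mu0) / (sigma / sqrt(len(nicotine)))
# Right-tail critical value for significance level alpha.
z_c = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > z_c

print(f"Z = {z:.3f}, z_c = {z_c:.3f}, reject H0: {reject_h0}")
```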
Exercise 0.34 Based on experience from previous years, the marks on a written examination obtained by the students of a certain degree course are approximately normally distributed with a mean
of 23/30. If a group of 50 students of this year shows a mean of 25/30 with a standard deviation
of 4/30, can we accept the hypothesis that these students do not differ from those of previous
years at the significance level of 0.02?
Answer: we have H0 : µ = 23, H1 : µ ≠ 23 (two-tailed test), where µ is the mean of the
students' marks. The significance level is α = 0.02, µ0 = 23, and from the sample data we have
n = 50, x̄ = 25 and s = 4, so that
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{25 - 23}{4/\sqrt{50}} = 3.536$$
has ν = n − 1 = 49 ≈ 50 degrees of freedom, so that tc = 2.403, and t falls in the critical region.
The sample is statistically significant and we reject the null hypothesis H0.
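A minimal Python sketch of this computation follows, using the sample values of the answer (n = 50, mean 25, s = 4). The function name `one_sample_t` is hypothetical, and the critical value 2.403 is simply the table value quoted above, since the standard library has no Student's t quantile function.

```python
from math import sqrt

def one_sample_t(x_bar, mu0, s, n):
    """t statistic for a single sample with estimated standard deviation s."""
    return (x_bar - mu0) / (s / sqrt(n))

t = one_sample_t(x_bar=25, mu0=23, s=4, n=50)
t_c = 2.403  # table value quoted in the answer (alpha = 0.02, two-tailed)
reject_h0 = abs(t) > t_c

print(f"t = {t:.3f}, reject H0: {reject_h0}")
```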
Exercise 0.35 A newspaper says that only 25% of students in the region read newspapers. A
random sample of 400 students shows that 90 of them read newspapers. Verify the claim of the
newspaper with a significance level equal to 0.05.
Answer: we have H0 : φ = 0.25, H1 : φ ≠ 0.25 (two-tailed test), where φ is the probability
that a student reads newspapers. The significance level is α = 0.05 and φ0 = 0.25.
I Method (two-tailed test): From the sample data, n = 400, p = 90/400 = 0.225; moreover,
with significance level α = 0.05 we have zc = 1.96, so that
$$Z = \frac{p - \varphi_0}{\sqrt{\varphi_0(1-\varphi_0)/n}} = \frac{0.225 - 0.25}{\sqrt{0.25 \cdot 0.75/400}} = -1.155$$
falls in the acceptance region [−1.96, 1.96]. The sample is not statistically significant and we accept
the null hypothesis H0.
II Method, by using the χ2-test. From the sample, p = 90/400 = 0.225, fo = np = 90, n − fo =
310; moreover fe = nφ0 = 100, n − fe = 300, so that
$$\chi^2 = \frac{(f_o - f_e)^2}{f_e} + \frac{\left(n - f_o - (n - f_e)\right)^2}{n - f_e} = \frac{(90 - 100)^2}{100} + \frac{(310 - 300)^2}{300} = 1.333$$
falls in the acceptance region, in fact ν = 1 and χ2c = 3.84. The sample is not statistically
significant and we accept the null hypothesis H0.
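The two methods are two views of the same test: for a single proportion the χ2 statistic with one degree of freedom equals Z squared (and 1.96² ≈ 3.84). A small Python sketch, with illustrative variable names, makes this visible on the data of the exercise.

```python
from math import sqrt

n, readers, phi0 = 400, 90, 0.25

# I Method: z statistic for a proportion.
p = readers / n
z = (p - phi0) / sqrt(phi0 * (1 - phi0) / n)

# II Method: chi-squared statistic on the two classes (readers, non-readers).
f_o, f_e = readers, n * phi0
chi2 = (f_o - f_e) ** 2 / f_e + ((n - f_o) - (n - f_e)) ** 2 / (n - f_e)

print(f"Z = {z:.3f}, Z^2 = {z * z:.3f}, chi2 = {chi2:.3f}")
```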
Exercise 0.36 A sample of 40 capsules of analgesic was manufactured by a machine A, the
mean weight is 330mg, the standard deviation is 7mg; a machine B has produced 50 capsules
with mean weight 320mg and standard deviation 6.5mg. Test the hypothesis that the two machines produce capsules of the same weight with a significance level of α =0.05.
Answer: We have H0 : µ1 = µ2, H1 : µ1 ≠ µ2 (two-tailed test); the samples are large and
independent.
• I Method by using the standard normal distribution: With significance level α = 0.05 we
have zc = 1.96 (from the tables of the standard normal distribution); moreover, from the
sample data we have n1 = 40, x̄1 = 330, σ1 ≈ 7, n2 = 50, x̄2 = 320, σ2 ≈ 6.5, and
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{330 - 320}{\sqrt{\dfrac{7^2}{40} + \dfrac{6.5^2}{50}}} = 6.95$$
falls in the critical region R \ [−1.96, 1.96]. The sample is statistically significant and we
reject H0.
• II Method by using the Student's t-distribution: From the sample data we have n1 = 40,
x̄1 = 330, s1 = 7, n2 = 50, x̄2 = 320, s2 = 6.5; moreover, with significance level α = 0.05
and ν = n1 + n2 − 2 = 88 we have tc = 1.987 (from the tables of the Student's t-distribution), σ² = ((n1 − 1)s1² + (n2 − 1)s2²)/ν = (39 · 49 + 49 · 42.25)/88 = 45.241 and
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}} = \frac{330 - 320}{\sqrt{\dfrac{45.241}{40} + \dfrac{45.241}{50}}} = 7.01$$
falls in the critical region R \ [−1.987, 1.987]. The sample is statistically significant and
we reject H0.
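Both statistics can be evaluated directly from the formulas with a minimal Python sketch. The variable names are illustrative; the pooled variance is computed exactly as defined in the II Method, with ν = n1 + n2 − 2 in the denominator.

```python
from math import sqrt

n1, mean1, s1 = 40, 330.0, 7.0   # machine A
n2, mean2, s2 = 50, 320.0, 6.5   # machine B

# I Method: z statistic with the sample standard deviations in place of sigma1, sigma2.
z = (mean1 - mean2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# II Method: pooled variance and t statistic with nu = n1 + n2 - 2 degrees of freedom.
nu = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / nu
t = (mean1 - mean2) / sqrt(pooled_var / n1 + pooled_var / n2)

print(f"Z = {z:.2f}, pooled variance = {pooled_var:.3f}, t = {t:.2f}")
```

Both values lie far beyond the critical values 1.96 and 1.987.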
Exercise 0.37 In a clinical trial designed to evaluate the effectiveness of a new tranquilizer in
psychoneurotic patients, a sample of 10 patients was considered and each patient was treated
for a week with the drug and for a week with a placebo. At the end of each week of care every
patient was given a questionnaire to determine his or her level of anxiety, measured by a
score from 0 to 30. The differences between the anxiety scores of the two treatments for each
patient had mean −1.3 and variance 20.68. Can we affirm with a significance level α = 0.05
that the two treatments were equally effective?
Answer: We have H0 : δ = 0, H1 : δ ≠ 0 (two-tailed test); the samples are matched (paired). With
significance level α = 0.05 and ν = n − 1 = 9 degrees of freedom we have tc = 2.262 (from
the tables of the Student's t-distribution with 9 degrees of freedom); moreover, from the sample
data we have d̄ = −1.3 and s = √20.68 = 4.55, so that
$$t = \frac{\bar{d}}{s/\sqrt{n}} = \frac{-1.3}{4.55/\sqrt{10}} = -0.90$$
falls in the acceptance region [−2.262, 2.262]. The sample is not statistically significant and we
accept H0.
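The paired t statistic depends only on the mean and variance of the differences, as the following Python sketch shows. The names are illustrative, and the critical value 2.262 is the table value quoted in the answer.

```python
from math import sqrt

n, mean_diff, var_diff = 10, -1.3, 20.68

s = sqrt(var_diff)                 # standard deviation of the differences
t = mean_diff / (s / sqrt(n))      # paired t statistic with n - 1 = 9 degrees of freedom
t_c = 2.262                        # table value quoted in the answer
accept_h0 = -t_c <= t <= t_c

print(f"s = {s:.2f}, t = {t:.2f}, accept H0: {accept_h0}")
```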
Exercise 0.38 In an experiment for the evaluation of a new treatment N T in relation to an
old treatment OT , the patients have been divided into two groups, one group has been treated
with N T and the other one with OT . Of 257 patients treated with the method N T 41 did not
have any benefit; of 244 patients treated with the method OT 64 did not have any benefit. Test
the hypothesis that the two treatments were equally effective with a significance level of 0.05.
Answer: We have H0 : φ1 = φ2, H1 : φ1 ≠ φ2 (two-tailed test); the samples are large. The first
population is the patients treated with NT and the second population is the patients treated
with OT. The parameter that we want to estimate is φ1 − φ2, where φ1 is the probability that
the treatment has no effect in the first population (where NT is used) and φ2 is the probability
that the treatment has no effect in the second population (where OT is used), so that the event
A is "the treatment has no effect". The estimator is p1 − p2, where p1 = 41/257 = 0.1596,
p2 = 64/244 = 0.2623. Under H0 the common proportion is estimated by
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{41 + 64}{257 + 244} = 0.2096,$$
so that
$$\sigma^2_{p_1 - p_2} = p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = 0.2096\,(1 - 0.2096)\left(\frac{1}{257} + \frac{1}{244}\right) = 0.001324$$
and
$$Z = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}} = \frac{0.1596 - 0.2623}{\sqrt{0.001324}} = -2.823$$
falls in the critical region, in fact with significance level α = 0.05 we have zc = 1.96. The
sample is statistically significant and we reject H0.
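The pooled two-proportion test can be checked with a short Python sketch. The variable names are illustrative; because the proportions are not rounded to four decimals, Z may differ from the hand value in the third decimal place.

```python
from math import sqrt

x1, n1 = 41, 257   # NT: patients with no benefit, patients treated
x2, n2 = 64, 244   # OT: patients with no benefit, patients treated

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)                      # common proportion under H0
var_diff = p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2)
z = (p1 - p2) / sqrt(var_diff)
reject_h0 = abs(z) > 1.96                             # z_c for alpha = 0.05, two-tailed

print(f"p = {p_pooled:.4f}, variance = {var_diff:.6f}, Z = {z:.3f}")
```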
Exercise 0.39 In an experiment for the evaluation of a new treatment N T in relation to an
old treatment OT , the patients were paired according to a specific criterion. 142 couples of
patients were treated in the following way: the first patient was treated with the method N T and
the second patient was treated with the method OT . The following results have been obtained: 3
pairs where both patients did not have any benefit; 17 pairs where the patients treated with N T
did not have any benefit; 25 pairs where the patients treated with OT did not have any benefit.
Can we say that the new treatment is better than the old one with a significance level 0.01?
Answer: We have matched (paired) and large samples and H0 : δ = φ1 − φ2 = 0, H1 : δ > 0 (one-tailed
test on the right). The first population is the patients treated with NT and the second population
is the patients treated with OT. The parameter that we want to estimate is δ = φ1 − φ2, where
φ1 is the probability that the treatment has an effect in the first population (where NT is used) and
φ2 is the probability that the treatment has an effect in the second population (where OT is used),
so that the event A is "the treatment has an effect" and the event B is "the treatment has no effect";
the new treatment is better than the old one when δ > 0.
From the data: n = 142; the estimator is d, where d = (u − v)/n = (25 − 17)/142 = 0.0563;
moreover
$$\sigma_d^2 = \frac{u + v - (u - v)^2/n}{n^2} = \frac{25 + 17 - (25 - 17)^2/142}{142^2} = 0.0021$$
and
$$Z = \frac{d}{\sigma_d} = \frac{0.0563}{\sqrt{0.0021}} = 1.23$$
falls in the acceptance region, in fact with significance level α = 0.01, by using the table of the
standard normal distribution we have zc = 2.33, and the acceptance region is (−∞, 2.33).
The sample is not statistically significant and we accept H0.
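A minimal Python sketch of the paired test follows, with illustrative names. The magnitude of the statistic, which is what decides the test here, does not depend on the order in which u and v are subtracted; the sketch keeps σd² unrounded, so |Z| can differ slightly from the hand computation.

```python
from math import sqrt
from statistics import NormalDist

n = 142
u, v = 25, 17                      # discordant pairs, in the order used in the answer
d = (u - v) / n
var_d = (u + v - (u - v) ** 2 / n) / n ** 2
z = d / sqrt(var_d)

# One-tailed critical value for alpha = 0.01 (the table value is rounded to 2.33).
z_c = NormalDist().inv_cdf(1 - 0.01)
significant = abs(z) > z_c

print(f"d = {d:.4f}, var_d = {var_d:.4f}, |Z| = {abs(z):.3f}, z_c = {z_c:.2f}")
```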
Exercise 0.40 A medical journal says that, in a fixed geographic region, the blood groups of
individuals are distributed as follows: 9% group AB, 21% group B, 29% group A, 41% group 0. In a
sample of 400 individuals drawn from this geographic region, we had the following result: 20
group AB, 120 group B, 110 group A and 150 group 0. Check the claim of the medical
journal at a significance level of 0.05.
Answer: we have r = 4 classes and
$$H_0: \begin{cases} \varphi_1 = 0.09 & (AB) \\ \varphi_2 = 0.21 & (B) \\ \varphi_3 = 0.29 & (A) \\ \varphi_4 = 0.41 & (0) \end{cases} \qquad H_1:\ \varphi_1 \neq 0.09\ (AB)\ \text{or}\ \varphi_2 \neq 0.21\ (B)\ \text{or}\ \varphi_3 \neq 0.29\ (A)\ \text{or}\ \varphi_4 \neq 0.41\ (0) \qquad (14)$$
From the significance level (α = 0.05) we have χ2c = 7.81 (from the tables for the chi-squared
distribution with 3 degrees of freedom). From the sample we have: n = 400,
fo,1 = 20, fo,2 = 120, fo,3 = 110, fo,4 = 150.
From the journal we have:
fe,1 = 400 · 9/100 = 36, fe,2 = 400 · 21/100 = 84, fe,3 = 400 · 29/100 = 116, fe,4 = 400 · 41/100 = 164.
$$\chi^2 = \sum_{i=1}^{r} \frac{(f_{o,i} - f_{e,i})^2}{f_{e,i}} = \frac{(20 - 36)^2}{36} + \frac{(120 - 84)^2}{84} + \frac{(110 - 116)^2}{116} + \frac{(150 - 164)^2}{164} = 7.111 + 15.429 + 0.310 + 1.195 = 24.045$$
falls in the critical region, since 24.045 > χ2c = 7.81; the sample is statistically significant and we reject H0.
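The goodness-of-fit sum can be recomputed term by term with a minimal Python sketch, so that the total can be compared directly with the tabulated χ2c. The dictionary names are illustrative, and 7.81 is the table value quoted in the answer.

```python
observed = {"AB": 20, "B": 120, "A": 110, "0": 150}
claimed = {"AB": 0.09, "B": 0.21, "A": 0.29, "0": 0.41}   # shares stated by the journal

n = sum(observed.values())
expected = {group: n * share for group, share in claimed.items()}

# Contribution (f_o - f_e)^2 / f_e of each blood group.
terms = {g: (observed[g] - expected[g]) ** 2 / expected[g] for g in observed}
chi2 = sum(terms.values())

chi2_c = 7.81   # table value: 3 degrees of freedom, alpha = 0.05
in_critical_region = chi2 > chi2_c

for group, term in terms.items():
    print(f"{group}: expected {expected[group]:.0f}, contribution {term:.3f}")
print(f"chi2 = {chi2:.3f}, in critical region: {in_critical_region}")
```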