Exercises of Statistics

Exercise 0.1 For each of the following two sets of data:

A: 5 7 2 4 3
B: 15 11 9 8 10 12

calculate the measures of central tendency, variability and shape.

Exercise 0.2 Let X be the discrete variable that counts the number of children per family in the entire population of a certain geographical area, and suppose that a given sample yields the following data: 1, 0, 2, 2, 4, 1, 1, 3. Compute (rounding the results to hundredths) the absolute frequency distribution of the data, the measures of central tendency, the measures of variation and the measures of shape. Draw the bar chart and the pie chart of the absolute frequency distribution.

Solution: The sample size is n = 8 (the number of families considered), and the distinct observations are $x_1 = 0$, $x_2 = 1$, $x_3 = 2$, $x_4 = 3$, $x_5 = 4$. In the table, $f_i$ is the absolute frequency, $f_i^C$ the cumulative absolute frequency, $p_i$ the relative frequency, $p_i^C$ the cumulative relative frequency, $\bar{x}$ the mean and $m$ the median.

| $x_i$ | $f_i$ | $f_i^C$ | $p_i$ | $p_i^C$ | $x_i f_i$ | $x_i^2 f_i$ | $\lvert x_i - \bar{x}\rvert$ | $\lvert x_i - \bar{x}\rvert f_i$ | $\lvert x_i - m\rvert f_i$ |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0.125 | 0.125 | 0 | 0 | 1.75 | 1.75 | 1.5 |
| 1 | 3 | 4 | 0.375 | 0.500 | 3 | 3 | 0.75 | 2.25 | 1.5 |
| 2 | 2 | 6 | 0.250 | 0.750 | 4 | 8 | 0.25 | 0.50 | 1.0 |
| 3 | 1 | 7 | 0.125 | 0.875 | 3 | 9 | 1.25 | 1.25 | 1.5 |
| 4 | 1 | 8 | 0.125 | 1.000 | 4 | 16 | 2.25 | 2.25 | 2.5 |
| Total | 8 | | 1.000 | | 14 | 36 | | 8.00 | 8.0 |

Measures of central tendency: the mean is

$$E(X) = \bar{x} = \frac{0\cdot 1 + 1\cdot 3 + 2\cdot 2 + 3\cdot 1 + 4\cdot 1}{8} = \frac{14}{8} = 1.75;$$

the quadratic mean is

$$E_Q(X) = \sqrt{\frac{0\cdot 1 + 1\cdot 3 + 4\cdot 2 + 9\cdot 1 + 16\cdot 1}{8}} = \sqrt{\frac{36}{8}} = 2.12;$$

the median is 1.50, since ordering the data gives 0, 1, 1, 1, 2, 2, 3, 4; the mode is 1; the mid-range is (0 + 4)/2 = 2.
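The central tendency values of Exercise 0.2 can be checked with a minimal Python sketch that uses only the standard library. The variable names are illustrative and are not part of the exercise.

```python
import math
import statistics

# Sample of Exercise 0.2: number of children in 8 families.
data = [1, 0, 2, 2, 4, 1, 1, 3]

mean = statistics.mean(data)
# Quadratic mean E_Q(X): square root of the mean of the squares.
quadratic_mean = math.sqrt(sum(x * x for x in data) / len(data))
median = statistics.median(data)
mode = statistics.mode(data)
mid_range = (min(data) + max(data)) / 2

print(mean, round(quadratic_mean, 2), median, mode, mid_range)
```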
Measures of variability: the range is 4 − 0 = 4; the mean absolute deviation is

$$\frac{\lvert 0 - 1.75\rvert\cdot 1 + \lvert 1 - 1.75\rvert\cdot 3 + \lvert 2 - 1.75\rvert\cdot 2 + \lvert 3 - 1.75\rvert\cdot 1 + \lvert 4 - 1.75\rvert\cdot 1}{8} = \frac{8}{8} = 1;$$

the median absolute deviation is

$$\frac{\lvert 0 - 1.5\rvert\cdot 1 + \lvert 1 - 1.5\rvert\cdot 3 + \lvert 2 - 1.5\rvert\cdot 2 + \lvert 3 - 1.5\rvert\cdot 1 + \lvert 4 - 1.5\rvert\cdot 1}{8} = \frac{8}{8} = 1;$$

the variance is

$$\sigma^2 = \frac{1.75^2\cdot 1 + 0.75^2\cdot 3 + 0.25^2\cdot 2 + 1.25^2\cdot 1 + 2.25^2\cdot 1}{8} = \frac{11.5}{8} = 1.44,$$

or equivalently

$$\sigma^2 = \frac{1}{n}\sum_i x_i^2 f_i - \bar{x}^2 = E_Q(X)^2 - \bar{x}^2 = \frac{36}{8} - 1.75^2 = 1.44;$$

the standard deviation is $\sigma = \sqrt{11.5/8} = 1.20$; the sample variance is $s^2 = 11.5/7 = 1.64$; the sample standard deviation is $s = \sqrt{11.5/7} = 1.28$.

Measures of shape: for the symmetry we have

$$\gamma_1 = \frac{m_3}{\sigma^3} = \frac{1}{1.20^3}\cdot\frac{-1.75^3\cdot 1 - 0.75^3\cdot 3 + 0.25^3\cdot 2 + 1.25^3\cdot 1 + 2.25^3\cdot 1}{8} = \frac{6.75/8}{1.20^3} = 0.49$$

and so the distribution has positive asymmetry.

[Figures: bar chart and pie chart of the absolute frequency distribution of the number of children.]

Exercise 0.3 Let X be the random variable: height (cm) of a given species of plants. Suppose that in a sample of 40 plants we have the following values of X, obtained by rounding to units:

98 111 119 130 170 143 156 126 113 127
107 83 100 128 143 127 117 125 64 119
130 120 108 95 192 124 129 143 198 131
163 152 104 119 161 178 135 146 158 176

Compute the frequency distribution of the data, and draw the corresponding histogram and frequency polygon. Moreover, compute the measures of central tendency, the measures of variation and the measures of shape, rounded to tenths.

Solution: In the table, $F_i$ is the absolute frequency density (the absolute frequency divided by the class width 20).

| class | actual endpoints | central value $x_i$ | abs. freq. $f_i$ | rel. freq. $p_i$ | cum. freq. $p_i^C$ | abs. freq. density $F_i$ | $x_i f_i$ | $x_i^{f_i/n}$ |
|---|---|---|---|---|---|---|---|---|
| 60-79 | 59.5-79.5 | 69.5 | 1 | 0.025 | 0.025 | 0.050 | 69.5 | 1.112 |
| 80-99 | 79.5-99.5 | 89.5 | 3 | 0.075 | 0.100 | 0.150 | 268.5 | 1.401 |
| 100-119 | 99.5-119.5 | 109.5 | 10 | 0.250 | 0.350 | 0.500 | 1095 | 3.235 |
| 120-139 | 119.5-139.5 | 129.5 | 12 | 0.300 | 0.650 | 0.600 | 1554 | 4.302 |
| 140-159 | 139.5-159.5 | 149.5 | 7 | 0.175 | 0.825 | 0.350 | 1046.5 | 2.402 |
| 160-179 | 159.5-179.5 | 169.5 | 5 | 0.125 | 0.950 | 0.250 | 847.5 | 1.900 |
| 180-199 | 179.5-199.5 | 189.5 | 2 | 0.050 | 1.000 | 0.100 | 379 | 1.300 |
| Total | | | 40 | 1.000 | | | 5260 | 128.5 (product) |

| $x_i$ | $f_i$ | $x_i^2 f_i$ | $f_i / x_i$ | $\lvert x_i - \bar{x}\rvert f_i$ |
|---|---|---|---|---|
| 69.5 | 1 | 4830.25 | 0.0144 | 62 |
| 89.5 | 3 | 24030.75 | 0.0335 | 126 |
| 109.5 | 10 | 119902.5 | 0.0913 | 220 |
| 129.5 | 12 | 201243 | 0.0927 | 24 |
| 149.5 | 7 | 156451.75 | 0.0468 | 126 |
| 169.5 | 5 | 143651.25 | 0.0295 | 190 |
| 189.5 | 2 | 71820.5 | 0.0106 | 116 |
| Total | 40 | 721930 | 0.3188 | 864 |

Figure 1: Histogram of the height of 40 plants (absolute frequency density $F_i$ against height of plants).

Figure 2: Frequency polygon of the height of 40 plants.

Measures of central tendency: the mean is $E(X) = \bar{x} = 5260/40 = 131.5$; the geometric mean is $E_G(X) = 128.5$; the quadratic mean is $E_Q(X) = \sqrt{721930/40} = 134.3$; the harmonic mean is $E_H(X) = 40/0.3188 = 125.5$; the median is the 0.5-quantile: we have q = 0.5, α = 0.35, β = 0.65, a = 119.5, b = 139.5 and so

$$\frac{0.5 - 0.35}{0.65 - 0.35} = \frac{x - 119.5}{139.5 - 119.5} \Rightarrow x = 119.5 + 20\cdot\frac{0.15}{0.3} = 129.5;$$

the mode is 129.5; the mid-range is (64 + 198)/2 = 131.

Measures of variability: the range is 198 − 64 = 134; the mean absolute deviation is 864/40 = 21.6; the median absolute deviation is $\frac{1}{n}\sum_i \lvert x_i - 129.5\rvert f_i = 840/40 = 21.0$; the variance is

$$\sigma^2 = E_Q(X)^2 - \bar{x}^2 = \frac{721930}{40} - 131.5^2 = 756;$$

the standard deviation is $\sigma = \sqrt{756} = 27.5$; the sample variance is $s^2 = 775.4$; the sample standard deviation is $s = \sqrt{775.4} = 27.8$.

Measures of shape: for the symmetry we have

$$\gamma_1 = \frac{m_3}{\sigma^3} = \frac{3456}{27.5^3} = 0.2$$

and so the distribution has positive asymmetry.
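The grouped-data computations of Exercise 0.3 can be reproduced with a short Python sketch. It assumes the class endpoints and frequencies of the frequency table; `grouped_quantile` is a hypothetical helper name that applies the same linear interpolation used for the median.

```python
import math

# Class data from the frequency table of Exercise 0.3 (rounded measurements).
lower_endpoints = [59.5, 79.5, 99.5, 119.5, 139.5, 159.5, 179.5]
width = 20
freqs = [1, 3, 10, 12, 7, 5, 2]
midpoints = [lo + width / 2 for lo in lower_endpoints]
n = sum(freqs)

mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
variance = sum(x * x * f for x, f in zip(midpoints, freqs)) / n - mean ** 2
std_dev = math.sqrt(variance)
sample_variance = variance * n / (n - 1)

# Third central moment and skewness coefficient gamma_1 = m3 / sigma^3.
m3 = sum((x - mean) ** 3 * f for x, f in zip(midpoints, freqs)) / n
skewness = m3 / std_dev ** 3


def grouped_quantile(q):
    """q-quantile by linear interpolation inside the class that contains it."""
    target = q * n  # position expressed as a cumulative count
    cumulative = 0
    for lo, f in zip(lower_endpoints, freqs):
        if cumulative < target <= cumulative + f:
            return lo + width * (target - cumulative) / f
        cumulative += f
    raise ValueError("q must satisfy 0 < q <= 1")


median = grouped_quantile(0.5)
print(mean, variance, round(std_dev, 1), median)
print(round(sample_variance, 1), round(skewness, 1))
```

Working with cumulative counts rather than cumulative relative frequencies keeps the class-membership test free of floating-point rounding.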
We note that if the measurements are obtained by truncation to units (and not by rounding, as specified in the text), then the actual endpoints of the classes, and hence the central values, change with respect to the above distribution. In particular we have:

| class | actual endpoints | central value $x_i$ | abs. freq. $f_i$ | rel. freq. $p_i$ | cum. freq. $p_i^C$ |
|---|---|---|---|---|---|
| 60-79 | 60-80 | 70 | 1 | 0.025 | 0.025 |
| 80-99 | 80-100 | 90 | 3 | 0.075 | 0.100 |
| 100-119 | 100-120 | 110 | 10 | 0.250 | 0.350 |
| 120-139 | 120-140 | 130 | 12 | 0.300 | 0.650 |
| 140-159 | 140-160 | 150 | 7 | 0.175 | 0.825 |
| 160-179 | 160-180 | 170 | 5 | 0.125 | 0.950 |
| 180-199 | 180-200 | 190 | 2 | 0.050 | 1.000 |
| Total | | | 40 | 1.000 | |

Exercise 0.4 (5 p.) Determine the frequency distribution of the following measurements (obtained by truncation to tenths) of the level of serum uric acid (in mg per 100 ml) of 50 adult males:

5.1 6.1 3.9 4.1 5.9 4.8 5.2 5.9 4.6 5.5
6.1 6.3 4.9 5.5 4.3 6.2 5.5 5.7 4.8 4.5
5.6 4.7 5.5 5.4 4.4 5.5 4.3 5.3 6.2 5.1
5.8 5.7 5.9 5.6 5.7 5.3 5.6 5.5 5.6 5.4
4.9 6.0 4.1 4.9 5.1 4.7 6.3 5.0 4.9 5.9

Moreover, determine the actual endpoints and the central value of each class, and draw the histogram. Calculate the mean, the variance and the 0.4-quantile, rounding the values to hundredths.

Solution: Minimum value = 3.9, maximum value = 6.3. Since (6.3 − 3.9)/5 = 2.4/5 = 0.48, we take the amplitude of the classes equal to A = 0.5.

| class | actual endpoints | central value $x_i$ | abs. freq. $f_i$ | abs. freq. density $F_i$ | $x_i f_i$ | $x_i^2 f_i$ | rel. freq. $p_i$ | cum. freq. $p_i^C$ |
|---|---|---|---|---|---|---|---|---|
| 3.5-3.9 | 3.5-4.0 | 3.75 | 1 | 2 | 3.75 | 14.0625 | 0.02 | 0.02 |
| 4.0-4.4 | 4.0-4.5 | 4.25 | 5 | 10 | 21.25 | 90.3125 | 0.10 | 0.12 |
| 4.5-4.9 | 4.5-5.0 | 4.75 | 10 | 20 | 47.50 | 225.625 | 0.20 | 0.32 |
| 5.0-5.4 | 5.0-5.5 | 5.25 | 9 | 18 | 47.25 | 248.0625 | 0.18 | 0.50 |
| 5.5-5.9 | 5.5-6.0 | 5.75 | 18 | 36 | 103.50 | 595.125 | 0.36 | 0.86 |
| 6.0-6.4 | 6.0-6.5 | 6.25 | 7 | 14 | 43.75 | 273.4375 | 0.14 | 1.00 |
| Total | | | 50 | | 267.0 | 1446.625 | 1.00 | |

The mean is $E(X) = \bar{x} = 267/50 = 5.34$. The variance is $\sigma^2 = 1446.625/50 - 5.34^2 = 0.42$. The 0.4-quantile is 5.22; in fact we have q = 0.4, α = 0.32, β = 0.50 (note that α < q ≤ β), a = 5.0, b = 5.5 and so

$$\frac{0.4 - 0.32}{0.50 - 0.32} = \frac{x - 5.0}{5.5 - 5.0} \Rightarrow x = 5.0 + 0.5\cdot\frac{0.08}{0.18} = 5.22;$$

note that a < 0.4-quantile ≤ b.

Exercise 0.5 Concentrations (mg/l, obtained by rounding) of sodium and chloride in 36 Apennine lakes:

• give the graphical representation of the data and their frequency distribution;
• calculate the measures of central tendency, variability and shape.

| Lake | Sodium (mg/l) | Chloride (mg/l) |
|---|---|---|
| 1 | 1.78 | 1.60 |
| 2 | 1.63 | 1.80 |
| 3 | 1.85 | 2.90 |
| 4 | 2.10 | 2.90 |
| 5 | 1.35 | 2.90 |
| 6 | 1.40 | 2.90 |
| 7 | 1.82 | 2.00 |
| 8 | 1.35 | 2.00 |
| 9 | 2.06 | 2.00 |
| 10 | 1.85 | 2.20 |
| 11 | 1.51 | 2.30 |
| 12 | 2.00 | 2.30 |
| 13 | 2.02 | 2.80 |
| 14 | 1.90 | 2.80 |
| 15 | 1.60 | 2.80 |
| 16 | 2.18 | 2.50 |
| 17 | 1.82 | 2.50 |
| 18 | 1.90 | 2.50 |
| 19 | 1.75 | 2.60 |
| 20 | 2.11 | 2.60 |
| 21 | 2.30 | 2.60 |
| 22 | 1.95 | 2.70 |
| 23 | 2.60 | 2.90 |
| 24 | 2.44 | 2.90 |
| 25 | 2.18 | 3.00 |
| 26 | 2.51 | 3.10 |
| 27 | 2.37 | 3.10 |
| 28 | 2.54 | 3.30 |
| 29 | 2.06 | 3.30 |
| 30 | 2.77 | 3.40 |
| 31 | 2.31 | 3.40 |
| 32 | 2.81 | 3.60 |
| 33 | 2.33 | 3.70 |
| 34 | 1.45 | 3.80 |
| 35 | 1.78 | 3.80 |
| 36 | 2.09 | 3.90 |

Exercise 0.6 (5 p.) Give the frequency distribution of the following measurements (obtained by rounding to units) of the cholesterol level (mg/dl) of 40 individuals: 150, 152, 175, 148, 161, 155, 144, 158, 178, 197, 203, 147, 145, 165, 184, 115, 165, 169, 160, 150, 185, 195, 205, 217, 177, 184, 153, 138, 162, 192, 125, 114, 174, 182, 148, 194, 218, 175, 180, 200. Compute the actual endpoints and the central value of each class of the distribution and draw the corresponding histogram. Compute the mean, the variance and the 0.8-quantile of this frequency distribution.

Exercise 0.7 (5 pt) In a lake there was a die-off of 50% of the fish of a certain species. Based on knowledge of the industries located near the lake, it is supposed that the die-off can mainly be attributed to pollution from a substance S1 or from a substance S2.
The probability of pollution from S1 is 0.05, and that of pollution from S2 is 0.1. The probability of observing a 50% mortality of the fish of that species is 0.9 given pollution from S1, 0.15 given pollution from S2, and 0.005 given other causes. Calculate the probability that the pollution is due to substance S1, given the above observation.

Solution: We use Bayes' formula, since we want to determine the cause of a given observation.

B = observation = "in a lake there was a die-off of 50% of the fish of a certain species".

Possible causes:
A1 = pollution from substance S1
A2 = pollution from substance S2
A3 = other causes

P(A1) = 0.05 is the probability of pollution from S1.
P(A2) = 0.1 is the probability of pollution from S2.
P(A3) = 1 − 0.05 − 0.1 = 0.85 is the probability of other causes.
P(B|A1) = 0.9 is the probability of observing a 50% mortality of the fish given pollution from S1.
P(B|A2) = 0.15 is the probability of observing a 50% mortality of the fish given pollution from S2.
P(B|A3) = 0.005 is the probability of observing a 50% mortality of the fish given other causes.

The probability of pollution due to substance S1, given the above observation, is

$$P(A_1|B) = \frac{P(A_1)\,P(B|A_1)}{P(A_1)\,P(B|A_1) + P(A_2)\,P(B|A_2) + P(A_3)\,P(B|A_3)} = \frac{0.05\cdot 0.9}{0.05\cdot 0.9 + 0.1\cdot 0.15 + 0.85\cdot 0.005} = \frac{0.045}{0.06425} = 0.70.$$

Exercise 0.8 (5 pt) A patient has a body temperature greater than 39 °C.
The following data were recorded in the population: an individual has probability 0.1 of having an influenza virus and probability 0.005 of suffering from meningitis; the probability of observing a body temperature greater than 39 °C is 0.05 in an individual with influenza, 0.7 in an individual with meningitis, and 0.001 in all other cases. Calculate the probability that the patient has meningitis given the above observation, that is, a body temperature greater than 39 °C.

Exercise 0.9 (2 pt) In an experiment on soil fertility, you want to evaluate all the pairs among: Ca, Mg, Na, N, P, K.
a. How many pairs of elements have to be taken into consideration?
b. To evaluate all the triples, how many different groups have to be formed?

Answers:
a. $C_{6,2} = \frac{6!}{(6-2)!\,2!} = 15$.
b. $C_{6,3} = \frac{6!}{(6-3)!\,3!} = 20$.

Exercise 0.10 In a race among 10 competitors:
1. (1 p.) How many different orders of arrival are possible?
2. What is the probability of guessing the first three
• (1 p.) including their order?
• (1 p.) regardless of their order?
3. Is it worthwhile to bet 10 euros to win 500 euros if you guess the first two
• (2 p.) including their order?
• (2 p.) regardless of their order?

Answer:
1. In a race with 10 competitors, the possible orders of arrival are the permutations of 10 elements: $P_{10} = 10! = 1\cdot 2\cdots 10 = 3\,628\,800$.
2. The number of possible groups of the first three of 10 competitors
• taking into account the order of arrival, is the number of 3-permutations of 10, that is $D_{10,3} = 10\cdot 9\cdot 8 = 720$; so the probability of guessing the first three together with their order of arrival is 1/720 = 0.001389;
• regardless of the order of arrival, is the number of 3-combinations of 10, that is $C_{10,3} = \frac{D_{10,3}}{3!} = \frac{720}{6} = 120$; so the probability of guessing the first three regardless of order is 1/120 = 0.00833.
3. The probability of guessing the first 2 of 10, including who will be first and who second, is given by the 2-permutations of 10: it is $1/D_{10,2} = 1/(10\cdot 9) = 1/90$. This is less favorable than the ratio 10/500 = 1/50 set in the bet, so the bet is not worthwhile. The probability of guessing the first 2 of 10 regardless of order is given by the 2-combinations of 10: it is $1/C_{10,2} = 2!/D_{10,2} = 2/90 = 1/45$. This is more favorable than the ratio 1/50 set in the bet, so the bet is worthwhile.

Exercise 0.11 (pt. 3) The number of all possible anagrams of the word WORLD (regardless of their meaning) is:
720   120   24   9   6

Exercise 0.12 (5 p.) In humans, more males than females are born, with a sex ratio of 1.07 (that is, 107 males for every 100 females). Compute the probability distribution of the number of male children in families with 4 children. Draw its bar chart and compute its mean and its variance.

Answer: A posteriori, on the basis of the data collected, we can say that the frequentist probability of a male birth (result A) is $\varphi = 107/(107 + 100) = 0.52$ and that of a female birth (result B) is $1 - \varphi = 0.48$.
By using the binomial distribution we can calculate the probabilities of having k = 0, 1, 2, 3, 4 male children in families with 4 children (n = 4):

• 0 sons: $P_0 = C_{4,0}\,\varphi^0 (1-\varphi)^4 = 1\cdot 1\cdot 0.48^4 = 0.05$;
• 1 son: $P_1 = C_{4,1}\,\varphi^1 (1-\varphi)^3 = 4\cdot 0.52\cdot 0.48^3 = 0.23$;
• 2 sons: $P_2 = C_{4,2}\,\varphi^2 (1-\varphi)^2 = 6\cdot 0.52^2\cdot 0.48^2 = 0.37$;
• 3 sons: $P_3 = C_{4,3}\,\varphi^3 (1-\varphi)^1 = 4\cdot 0.52^3\cdot 0.48 = 0.27$;
• 4 sons: $P_4 = C_{4,4}\,\varphi^4 (1-\varphi)^0 = 1\cdot 0.52^4\cdot 1 = 0.07$;

and so the probability distribution is:

| $x_i$ | $p_i$ | $p_i^C$ | $x_i p_i$ | $x_i^2 p_i$ |
|---|---|---|---|---|
| 0 | 0.05 | 0.05 | 0 | 0 |
| 1 | 0.23 | 0.28 | 0.23 | 0.23 |
| 2 | 0.37 | 0.65 | 0.74 | 1.48 |
| 3 | 0.27 | 0.92 | 0.81 | 2.43 |
| 4 | 0.07 | 0.99 | 0.28 | 1.12 |
| Total | 0.99 | | 2.06 | 5.26 |

The mean is $\mu = n\varphi = 4\cdot 0.52 = 2.08 \approx 2.06$; the variance is $\sigma^2 = n\varphi(1-\varphi) = 4\cdot 0.52\cdot 0.48 = 0.9984 \approx 5.26 - 2.06^2 = 1.0164$.

Exercise 0.13
• What is the probability of getting the number 1 three times when throwing a die five times?
• What is the probability that 9 laboratory experiments are positive and one negative, if experiments are usually positive in 20% of cases?

Solution:
• Experiment: "throw a die"; success: A = "get the number 1"; probability of success: $\varphi = P(A) = \frac{1}{6}$; probability of failure: $1 - \varphi = \frac{5}{6}$; number of independent experiments: n = 5; number of successes: k = 3. The probability of getting the number 1 three times when throwing a die five times is $P_3 = C_{5,3}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^{5-3}$.
• Experiment: "laboratory experiment"; success: A = "the result of the laboratory experiment is positive"; probability of success: $\varphi = P(A) = 20/100 = 0.2$; probability of failure: $1 - \varphi = 0.8$; number of independent experiments: n = 10; number of successes: k = 9. The probability that 9 laboratory experiments are positive and one negative is $P_9 = C_{10,9}\,(0.2)^9\,(0.8)^{10-9}$.

Exercise 0.14 In the population of a certain geographic region the blood groups are distributed as follows: 10% group AB, 20% group B, 30% group A, 40% group 0.
1. Drawing 10 individuals, what is the probability that 2 have group AB, 3 group B, 2 group A and 3 group 0?
2. Drawing 8 individuals, what is the probability that 4 have group 0 and 4 group A?

Solution:
A1 = "the individual has group AB", $\varphi_1 = 10/100 = 0.10$
A2 = "the individual has group B", $\varphi_2 = 20/100 = 0.20$
A3 = "the individual has group A", $\varphi_3 = 30/100 = 0.30$
A4 = "the individual has group 0", $\varphi_4 = 40/100 = 0.40$
The events are pairwise mutually exclusive and $\varphi_1 + \varphi_2 + \varphi_3 + \varphi_4 = 1$, so we can use the multinomial distribution:
1. the probability that, drawing n = 10 individuals, we have $k_1 = 2$ with group AB, $k_2 = 3$ with group B, $k_3 = 2$ with group A and $k_4 = 3$ with group 0 is
$$P_{2,3,2,3} = \frac{10!}{2!\cdot 3!\cdot 2!\cdot 3!}\cdot 0.10^2\cdot 0.20^3\cdot 0.30^2\cdot 0.40^3 = 0.012;$$
2. the probability that, drawing n = 8 individuals, $k_4 = 4$ have group 0 and $k_3 = 4$ have group A is
$$P_{0,0,4,4} = \frac{8!}{0!\cdot 0!\cdot 4!\cdot 4!}\cdot 0.10^0\cdot 0.20^0\cdot 0.30^4\cdot 0.40^4 = 0.015.$$

Exercise 0.15 (pt. 4) Consider a die with three red faces, two green faces and one blue face. Compute the probability of getting the red face once, the green face 3 times and the blue face 4 times when throwing the die 8 times.

Exercise 0.16 Consider a deck of 52 cards with four suits: hearts, diamonds, clubs, spades (1, 2, ..., 10, J, Q, K). Drawing 5 cards from the deck without replacement, what is the probability of
1. obtaining 4 aces;
2. obtaining 4 aces or 5 cards of hearts;
3. obtaining 5 cards of the same suit;
4. not obtaining 5 cards of the same suit.

Solution:
1. $P(\text{obtaining 4 aces}) = \frac{52-4}{C_{52,5}} = \frac{48\cdot(52-5)!\cdot 5!}{52!} = \frac{48\cdot 5\cdot 4\cdot 3\cdot 2}{52\cdot 51\cdot 50\cdot 49\cdot 48} = 0.18\cdot 10^{-4}$.
2. As the events A = "obtaining 4 aces" and B = "obtaining 5 cards of hearts" are mutually exclusive, $P(\text{obtaining 4 aces or 5 cards of hearts}) = P(A\cup B) = P(A) + P(B) = 0.18\cdot 10^{-4} + \frac{C_{13,5}}{C_{52,5}} = \dots$
3. As the events $A_H$ = "obtaining 5 cards of hearts", $A_D$ = "obtaining 5 cards of diamonds", $A_C$ = "obtaining 5 cards of clubs", $A_S$ = "obtaining 5 cards of spades" are pairwise mutually exclusive, and $P(A_H) = \frac{C_{13,5}}{C_{52,5}} = P(A_D) = P(A_C) = P(A_S)$, we have $P(\text{obtaining 5 cards of the same suit}) = P(A_H\cup A_D\cup A_C\cup A_S) = P(A_H) + P(A_D) + P(A_C) + P(A_S) = 4\,\frac{C_{13,5}}{C_{52,5}} = \dots$
4. $P(\text{not obtaining 5 cards of the same suit}) = 1 - P(\text{obtaining 5 cards of the same suit}) = \dots$

Exercise 0.17 In a planktonic community, the population of Eudiaptomus vulgaris accounts for 2% of the individuals:
• Sampling 200 individuals, what is the probability of not finding Eudiaptomus?
• Sampling 100 individuals, what is the probability of finding 4 specimens of Eudiaptomus?
• With a presence of 5% of Eudiaptomus vulgaris, how would the previous probabilities change?

Answers:
Sampling 200 individuals, the mean with 2% presence is μ = 200 · 0.02 = 4, so the probability of finding no individuals (k = 0) is $P_0 = \frac{4^0}{0!}e^{-4} = 0.0183$.
Sampling 200 individuals, the mean with 5% presence is μ = 200 · 0.05 = 10, so the probability of finding no individuals (k = 0) is $P_0 = \frac{10^0}{0!}e^{-10} = 0.0000454$.
Sampling 100 individuals, the mean with 2% presence is μ = 100 · 0.02 = 2, so the probability of finding 4 individuals (k = 4) is $P_4 = \frac{2^4}{4!}e^{-2} = 0.0902$.
Sampling 100 individuals, the mean with 5% presence is μ = 100 · 0.05 = 5, so the probability of finding 4 individuals (k = 4) is $P_4 = \frac{5^4}{4!}e^{-5} = 0.1755$.

Exercise 0.18 In a lake there are 12 fish belonging to different species, 50% of which are trout; if we catch 4 fish at random, calculate the probability that none is a trout.

Solution: The solution can be obtained by using the hypergeometric distribution with the following parameters: N = 12, m = 0.5 · 12 = 6, n = 4, k = 0; then the probability that "none is a trout" is
$$P_{0/4} = \frac{C_{6,0}\cdot C_{6,4}}{C_{12,4}} = \frac{6!\cdot 4!\cdot 8!}{2!\cdot 4!\cdot 12!} = \frac{6\cdot 5\cdot 4\cdot 3\cdot 2}{2\cdot 12\cdot 11\cdot 10\cdot 9} = \frac{1}{33} = 0.030303.$$

Exercise 0.19 In a small natural reserve there are 9 boars: 3 females and 6 males. To reduce their number a hunt is organized, in which 5 boars will be captured regardless of gender. Calculate the probability that:
1. all three females are captured;
2. 2 females are captured;
3. one female is captured;
4. no female is captured.

Solution: The solution can be obtained by using the hypergeometric distribution with the following parameters: N = 9 animals; n = 5 captured animals; m = 3 females present; k captured females.

Exercise 0.20 In a population the lengths of fish have μ = 35 cm and σ = 6 cm. Calculate the probability of catching a fish of length l, ignoring the approximation used for the measurements, when:
1. l ≥ 42 cm;
2. l ≤ 42 cm;
3. l ≤ 23 cm;
4. 42 cm ≤ l ≤ 50 cm;
5. 29 cm ≤ l ≤ 42 cm;
6. l = 33 cm, taking into account that the measurement is obtained by rounding to the nearest cm.

Answers:
1. The probability of catching a fish of length l ≥ 42 cm is equal to the area under the standard normal curve to the right of z = (42 − 35)/6 = 1.17, so it is 0.121 (12.1%);
2. The probability of catching a fish of length l ≤ 42 cm is equal to the area under the standard normal curve to the left of z = 1.17, so it is 1 − 0.121 = 0.879 (87.9%);
3. The probability of catching a fish of length l ≤ 23 cm is equal to the area under the standard normal curve to the left of z = (23 − 35)/6 = −2, so it is 0.0228 (2.28%);
4. The probability of catching a fish of length 42 cm ≤ l ≤ 50 cm is equal to the area under the standard normal curve between z = 1.17 and z = 2.5, so it is 0.121 − 0.0062 = 0.1148 (11.48%);
5. The probability of catching a fish of length 29 cm ≤ l ≤ 42 cm is equal to the area between z = −1 and z = 1.17, so it is 1 − 0.1587 − 0.121 = 0.7203 (72.03%);
6. When the length of fish is measured by rounding to the nearest cm, we record a length l = 33 cm for all the fish having a length between 32.5 cm and 33.5 cm. Hence, the probability of catching a fish of length 33 cm is equal to the area between z = −0.42 and z = −0.25, so it is 0.4013 − 0.3372 = 0.0641 (6.41%).

Exercise 0.21 In a species of adult rodents, males and females are distinguishable by length:
females: μ = 37.5 cm, σ = 3.8 cm
males: μ = 34.5 cm, σ = 3.2 cm
Ignoring the approximation used for the measurements, answer the following questions.
1. Relative to the mean of their sex, which are rarer: males of length ≥ 40 cm or females of length ≥ 41 cm?
2. Consider the group of the 5% of females of greatest length: what is the minimum length in this group of rodents?
3. Consider the group of the 5% of males of shortest length: what is the maximum length in this group of rodents?
4. Let l be the minimum length in the group of the 30% of females of greatest length: how many males have length greater than l?
5. Let l be the maximum length in the group of the 20% of females of smallest length: how many males have length shorter than l?

Exercise 0.22 Suppose that, from literature data, it is known that in a lakeside zooplankton population, individuals of Eudiaptomus vulgaris are 10% of the total. In a random sample of 120 individuals, what is the probability of finding:
a. exactly 15 individuals of Eudiaptomus;
b. at least 15 individuals of Eudiaptomus;
c. fewer than 15 individuals of Eudiaptomus.

Answers: First of all we note that n = 120, φ = 0.1, x = 15, so that $\mu = n\varphi = 120\cdot 0.1 = 12$; $\sigma^2 = n\varphi(1-\varphi) = 120\cdot 0.1\cdot 0.9 = 10.8$, $\sigma = 3.29$. We can answer the above questions by using both the binomial and the normal distribution; in particular:
a. BINOMIAL: the probability of having exactly 15 individuals of Eudiaptomus is
$$P_{15} = C_{120,15}\,\varphi^{15}(1-\varphi)^{120-15} = \frac{120!}{105!\cdot 15!}\cdot 0.1^{15}\cdot 0.9^{105} = 0.074.$$
a. NORMAL: the probability of having exactly 15 individuals of Eudiaptomus is the area under the probability density function of N(12, 10.8) between x − 0.5 = 14.5 and x + 0.5 = 15.5, which is equal to the area under the probability density function of Z = N(0, 1) (the standard normal distribution) between z = 0.76 and z = 1.06, that is 0.2236 − 0.1446 = 0.079.

Exercise 0.23 (3 p.) Let X be a discrete random variable that has a Poisson distribution with variance 6. Complete the following:
the mean E(X) = ____
P(X ≤ 2) = ____
P(X > 3) = ____

Exercise 0.24 (pt. 3) Let X be a random variable that has Student's t-distribution with 15 degrees of freedom; then
P(X < −1.753) = ____
P(X < 2.131) = ____
P(1.753 < X < 2.131) = ____

Exercise 0.25 (pt. 3) Let X be a random variable that has a $\chi^2$-distribution with 12 degrees of freedom; then
P(X > −1.75) = ____
P(3.57 < X < 23.34) = ____
P(X < 5.23) = ____
P(X < ____) = 0.05

Exercise 0.26 Let X be a random variable that has outcomes in the interval [1, e] and probability density function $f(x) = \frac{3\log^2 x}{x}$; compute its mean E(X) and its variance Var(X).

Exercise 0.27 To determine the average reaction time of drivers in case of danger, 8 measurements (in seconds) were made: 0.84, 0.75, 1.02, 0.99, 1.05, 1.10, 0.68, 0.82. Assuming that the reaction times are normally distributed, estimate the average reaction time for all drivers using confidence levels 0.95 and 0.99, in the following two cases:
a. the variance of the reaction time is equal to 0.025;
b. the variance of the reaction time is not known.

Answers:
a. We have a population with normal distribution and known variance, in particular $\sigma = \sqrt{0.025} = 0.158$.
– With confidence level 0.95 (= 1 − α), α = 0.05, we have $z_c = 1.96$ (from the tables of the standard normal distribution); moreover $\bar{x} = (0.84 + 0.75 + 1.02 + 0.99 + 1.05 + 1.10 + 0.68 + 0.82)/8 = 0.906$, and so the confidence interval with confidence level 0.95 is:
$$\bar{x} - z_c\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_c\frac{\sigma}{\sqrt{n}}$$
$$0.906 - 1.96\,\frac{0.158}{\sqrt{8}} \le \mu \le 0.906 + 1.96\,\frac{0.158}{\sqrt{8}}$$
$$0.796 \le \mu \le 1.015$$
– With confidence level 0.99 (= 1 − α), α = 0.01, we have $z_c = 2.58$ (from the tables of the standard normal distribution), and so the confidence interval with confidence level 0.99 is:
$$0.906 - 2.58\,\frac{0.158}{\sqrt{8}} \le \mu \le 0.906 + 2.58\,\frac{0.158}{\sqrt{8}}$$
$$0.762 \le \mu \le 1.05$$

b. We have a population with normal distribution and unknown variance.
– With confidence level 0.95 (= 1 − α), α = 0.05, we have $t_c = 2.365$ (from the tables of Student's t-distribution with 7 degrees of freedom); we can also calculate the sample standard deviation (with divisor n − 1 = 7) s = 0.154, and so the confidence interval with confidence level 0.95 is:
$$\bar{x} - t_c\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_c\frac{s}{\sqrt{n}}$$
$$0.906 - 2.365\,\frac{0.154}{\sqrt{8}} \le \mu \le 0.906 + 2.365\,\frac{0.154}{\sqrt{8}}$$
$$0.777 \le \mu \le 1.035$$
– With confidence level 0.99 (= 1 − α), α = 0.01, we have $t_c = 3.499$ (from the tables of Student's t-distribution with 7 degrees of freedom), and so the confidence interval with confidence level 0.99 is:
$$0.906 - 3.499\,\frac{0.154}{\sqrt{8}} \le \mu \le 0.906 + 3.499\,\frac{0.154}{\sqrt{8}}$$
$$0.715 \le \mu \le 1.097$$

Exercise 0.28 Consider the following sample of measurements (obtained by rounding to units) of the cholesterol level (mg/dl) of 40 individuals: 150, 152, 175, 148, 161, 155, 144, 158, 178, 197, 203, 147, 145, 165, 184, 115, 165, 169, 160, 150, 185, 195, 205, 217, 177, 184, 153, 138, 162, 192, 125, 114, 174, 182, 148, 194, 218, 175, 180, 200. Consider the event A that an individual has a cholesterol level above 180 mg/dl. Complete the following:
• (1 p.) the relative frequency (rounded to hundredths) of A in the sample above is p = ____
• (2 p.) the confidence interval of the proportion of A in the population with confidence level 0.99 is [ ____ , ____ ].

Exercise 0.29 In a clinical experiment two different types of analgesics, A and B, were given to 100 patients. At the end of the treatment 65 patients preferred A and 35 preferred B.

Question: calculate the confidence interval of the proportion of preferences for analgesic A with a confidence level equal to 0.95.

Answer: with confidence level 0.95 (= 1 − α), α = 0.05, from the tables of the standard normal distribution we have $z_c = 1.96$; moreover p = 65/100 = 0.65. Then, by using the second estimate, the confidence interval with confidence level 0.95 is:
$$0.65 - 1.96\sqrt{\frac{0.65\cdot 0.35}{100}} \le \varphi \le 0.65 + 1.96\sqrt{\frac{0.65\cdot 0.35}{100}}$$
$$0.557 \le \varphi \le 0.743.$$
Complete the exercise by using the first estimate.

Exercise 0.30 We want to estimate the mileage of the tires of two companies. 80 tires produced by the first company have a mean distance traveled before deterioration of 47000 km, with a standard deviation of 3500 km; 50 tires produced by the second company have a mean distance traveled before deterioration of 35000 km, with a standard deviation of 2500 km.

Question: estimate the confidence interval of the difference of the mean distances traveled by the tires of the two companies, with confidence level 0.95.
Answer: we have independent and large samples; with confidence level 0.95 (= 1 − α), α = 0.05, from the tables of the standard normal distribution we have $z_c = 1.96$; moreover $\bar{x}_1 = 47000$, $\sigma_1 \approx s_1 = 3500$, $\bar{x}_2 = 35000$ and $\sigma_2 \approx s_2 = 2500$. Then the confidence interval with confidence level 0.95 is:
$$(\bar{x}_1 - \bar{x}_2) - z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$$(47000 - 35000) - 1.96\sqrt{\frac{3500^2}{80} + \frac{2500^2}{50}} \le \mu_1 - \mu_2 \le (47000 - 35000) + 1.96\sqrt{\frac{3500^2}{80} + \frac{2500^2}{50}}$$
$$10966 \le \mu_1 - \mu_2 \le 13034$$

Exercise 0.31 In an experiment for the evaluation of a new treatment NT in relation to an old treatment OT, patients are divided into two groups. 257 patients were treated with the method NT, but 41 did not have any benefit. 244 patients were treated with the method OT, but 64 did not have any benefit.

Question: estimate the difference of the proportions of ineffectiveness of the treatments with a confidence level equal to 0.99.

Answer: we have independent and large samples; the confidence level is 1 − α = 0.99, α = 0.01. The first population is the patients treated with NT and the second population is the patients treated with OT. The parameter that we want to estimate is $\varphi_1 - \varphi_2$, where $\varphi_1$ is the probability that the treatment has no effect in the first population (where NT is used) and $\varphi_2$ is the probability that the treatment has no effect in the second population (where OT is used), so that the event A is "the treatment has no effect". The estimator is $p_1 - p_2$, where $p_1 = 41/257 = 0.1595$ and $p_2 = 64/244 = 0.2623$, so that the confidence interval is
$$\left(p_1 - p_2 - z_c\sqrt{\frac{\varphi_1(1-\varphi_1)}{n_1} + \frac{\varphi_2(1-\varphi_2)}{n_2}},\; p_1 - p_2 + z_c\sqrt{\frac{\varphi_1(1-\varphi_1)}{n_1} + \frac{\varphi_2(1-\varphi_2)}{n_2}}\right).$$
By using $\varphi_1 \approx p_1$, $\varphi_2 \approx p_2$, $n_1 = 257$ and $n_2 = 244$ in the above formula, and by using the table of the standard normal distribution, from which we have $z_c = 2.575 \approx 2.58$, we find that the interval is (−0.1962, −0.0091).
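The interval of Exercise 0.31 can be reproduced with a short Python sketch. The function name `diff_proportions_ci` is hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` instead of a printed table, so the last decimal digit may differ slightly from a hand computation with $z_c = 2.58$.

```python
import math
from statistics import NormalDist


def diff_proportions_ci(x1, n1, x2, n2, confidence):
    """Large-sample confidence interval for phi1 - phi2 (independent samples)."""
    p1, p2 = x1 / n1, x2 / n2
    # Two-sided critical value z_c of the standard normal distribution.
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # The unknown phi1, phi2 are replaced by the sample proportions p1, p2.
    half_width = z_c * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - half_width, p1 - p2 + half_width


# 41 of 257 patients without benefit under NT, 64 of 244 under OT.
low, high = diff_proportions_ci(41, 257, 64, 244, 0.99)
print(f"({low:.4f}, {high:.4f})")
```

Since the whole interval lies below zero, the sketch agrees with the conclusion that the proportion of ineffectiveness is lower with NT at this confidence level.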
Exercise 0.32 In an experiment for the evaluation of a new treatment NT in relation to an old treatment OT, the patients are paired according to a specific criterion. 142 pairs of patients were treated in the following way: the first patient was treated with the method NT and the second patient was treated with the method OT. The following results were obtained:
3 pairs where both patients did not have any benefit;
17 pairs where only the patient treated with NT did not have any benefit;
25 pairs where only the patient treated with OT did not have any benefit.

Question: estimate the difference of the proportions of ineffectiveness of the treatments with a confidence level equal to 0.99.

Answer: we have matched and large samples; the confidence level is 1 − α = 0.99, α = 0.01. The first population is the patients treated with NT and the second population is the patients treated with OT. The parameter that we want to estimate is $\delta = \varphi_1 - \varphi_2$, where $\varphi_1$ is the probability that the treatment has no effect in the first population (where NT is used) and $\varphi_2$ is the probability that the treatment has no effect in the second population (where OT is used), so that the event A is "the treatment has no effect" and the event B is "the treatment has effect". The estimator is d, where d = (u − v)/n = (17 − 25)/142 = −0.0563; moreover
$$\sigma_d^2 = \frac{u + v - (u - v)^2/n}{n^2} = \frac{17 + 25 - (17 - 25)^2/142}{142^2} = 0.0021$$
and the confidence interval is $(d - z_c\sigma_d,\; d + z_c\sigma_d)$. By using the table of the standard normal distribution we have $z_c = 2.58$; with n = 142, the interval is (−0.1745, 0.0619).

Exercise 0.33 The nicotine content of cigarettes of a certain type is normally distributed with a standard deviation of 4 mg.
If, in order to minimize the risk of lung cancer, the average nicotine content of the cigarettes must not exceed 26 mg, and in a sample of 10 cigarettes the following values of nicotine (in mg) were obtained:

33 27 20 36 25 24 27 24 34 29

can we say, with a significance level of 0.05, that consumers of that type of cigarettes are at minimal risk of lung cancer?

Answer: we have $H_0: \mu = 26$, $H_1: \mu > 26$ (one-tailed test on the right), where $\mu$ is the average nicotine content of cigarettes of the given type. The significance level is $\alpha = 0.05$, $\mu_0 = 26$, $\sigma = 4$, $n = 10$, $\bar{x} = (33 + 27 + 20 + 36 + 25 + 24 + 27 + 24 + 34 + 29)/10 = 27.9$ and

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{27.9 - 26}{4/\sqrt{10}} = 1.502$$

From the tables of the standard normal distribution $z_c = 1.645$, and so the value of $Z$ computed above falls in the acceptance region, since $1.502 < 1.645$. The sample is not statistically significant and we accept $H_0$.

Exercise 0.34 Based on experience from previous years, the marks obtained in a written examination by the students of a certain degree course are approximately normally distributed with a mean of 23/30. If a group of 50 students of this year shows a mean of 25/30 with a standard deviation of 4/30, can we accept the hypothesis that these students do not differ from those of previous years with a significance level of 0.02?

Answer: we have $H_0: \mu = 23$, $H_1: \mu \ne 23$ (two-tailed test), where $\mu$ is the mean of the students' marks. The significance level is $\alpha = 0.02$, $\mu_0 = 23$, and from the sample data we have $n = 50$, $\bar{x} = 25$ and $s = 4$, so that

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{25 - 23}{4/\sqrt{50}} = 3.536$$

has $\nu = n - 1 = 49 \approx 50$ degrees of freedom, so that $t_c = 2.403$, and $t$ falls in the critical region. The sample is statistically significant and we reject the null hypothesis $H_0$.

Exercise 0.35 A newspaper says that only 25% of students in the region read newspapers. A random sample of 400 students shows that 90 of them read newspapers. Verify the claim of the newspaper with a significance level equal to 0.05.
Answer: we have $H_0: \varphi = 0.25$, $H_1: \varphi \ne 0.25$ (two-tailed test), where $\varphi$ is the probability that a student reads newspapers. The significance level is $\alpha = 0.05$ and $\varphi_0 = 0.25$.

I Method (two-tailed test): from the sample data, $n = 400$ and $p = 90/400 = 0.225$; moreover with significance level $\alpha = 0.05$ we have $z_c = 1.96$, so that

$$Z = \frac{p - \varphi_0}{\sqrt{\dfrac{\varphi_0 (1 - \varphi_0)}{n}}} = \frac{0.225 - 0.25}{\sqrt{\dfrac{0.25 \cdot 0.75}{400}}} = -1.155$$

falls in the acceptance region. The sample is not statistically significant and we accept the null hypothesis $H_0$.

II Method, by using the $\chi^2$-test: from the sample, $p = 90/400 = 0.225$, $f_o = np = 90$ and $n - f_o = 310$; moreover $f_e = n\varphi_0 = 100$ and $n - f_e = 300$, so that

$$\chi^2 = \frac{(f_o - f_e)^2}{f_e} + \frac{\big((n - f_o) - (n - f_e)\big)^2}{n - f_e} = \frac{(90 - 100)^2}{100} + \frac{(310 - 300)^2}{300} = 1.333$$

falls in the acceptance region; in fact $\nu = 1$ and $\chi^2_c = 3.84$. The sample is not statistically significant and we accept the null hypothesis $H_0$.

Exercise 0.36 A sample of 40 capsules of analgesic was manufactured by a machine A; the mean weight is 330 mg and the standard deviation is 7 mg. A machine B has produced 50 capsules with mean weight 320 mg and standard deviation 6.5 mg. Test the hypothesis that the two machines produce capsules of the same weight with a significance level of $\alpha = 0.05$.

Answer: we have $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ (two-tailed test); the samples are large and independent.

• I Method by using the standard normal distribution: with significance level $\alpha = 0.05$ we have $z_c = 1.96$ (from the tables of the standard normal distribution); moreover from the sample data we have $n_1 = 40$, $\bar{x}_1 = 330$, $\sigma_1 \approx s_1 = 7$, $n_2 = 50$, $\bar{x}_2 = 320$, $\sigma_2 \approx s_2 = 6.5$, and

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{330 - 320}{\sqrt{\dfrac{7^2}{40} + \dfrac{6.5^2}{50}}} = 6.95$$

falls in the critical region $\mathbb{R} \setminus [-1.96, 1.96]$. The sample is statistically significant and we reject $H_0$.
• II Method by using the Student's t-distribution: from the sample data we have $n_1 = 40$, $\bar{x}_1 = 330$, $s_1 = 7$, $n_2 = 50$, $\bar{x}_2 = 320$, $s_2 = 6.5$; moreover with significance level $\alpha = 0.05$ and $\nu = n_1 + n_2 - 2 = 88$ we have $t_c = 1.987$ (from the tables of the Student's t-distribution). The pooled variance is

$$\sigma^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{\nu} = \frac{39 \cdot 49 + 49 \cdot 42.25}{88} = 45.241$$

and

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}} = \frac{330 - 320}{\sqrt{\dfrac{45.241}{40} + \dfrac{45.241}{50}}} = 7.01$$

falls in the critical region $\mathbb{R} \setminus [-1.987, 1.987]$. The sample is statistically significant and we reject $H_0$.

Exercise 0.37 In a clinical trial designed to evaluate the effectiveness of a new tranquilizer in psychoneurotic patients, a sample of 10 patients was considered and each patient was treated for a week with the drug and for a week with a placebo. At the end of each week of care every patient was given a questionnaire to determine his or her level of anxiety, measured by a score from 0 to 30. The differences between the anxiety scores of the two treatments for each patient had mean $-1.3$ and variance 20.68. Can we affirm with a significance level $\alpha = 0.05$ that the two treatments were equally effective?

Answer: we have $H_0: \delta = 0$, $H_1: \delta \ne 0$ (two-tailed test); the samples are matched. With significance level $\alpha = 0.05$ and $\nu = n - 1 = 9$ degrees of freedom we have $t_c = 2.262$ (from the tables of the Student's t-distribution with 9 degrees of freedom); moreover from the sample data we have $\bar{d} = -1.3$ and $s = \sqrt{20.68} = 4.55$, so that

$$t = \frac{\bar{d}}{s/\sqrt{n}} = \frac{-1.3}{4.55/\sqrt{10}} = -0.90$$

falls in the acceptance region $[-2.262, 2.262]$. The sample is not statistically significant and we accept $H_0$.

Exercise 0.38 In an experiment for the evaluation of a new treatment NT in relation to an old treatment OT, the patients have been divided into two groups: one group has been treated with NT and the other one with OT. Of 257 patients treated with the method NT, 41 did not have any benefit; of 244 patients treated with the method OT, 64 did not have any benefit.
Test the hypothesis that the two treatments were equally effective with a significance level of 0.05.

Answer: we have $H_0: \varphi_1 = \varphi_2$, $H_1: \varphi_1 \ne \varphi_2$ (two-tailed test); the samples are large. The first population is the patients treated with NT and the second population is the patients treated with OT. The parameter of interest is $\varphi_1 - \varphi_2$, where $\varphi_1$ is the probability that the treatment has no effect in the first population (where NT is used) and $\varphi_2$ is the probability that the treatment has no effect in the second population (where OT is used), so that the event A is "the treatment has no effect". The estimator is $p_1 - p_2$, where $p_1 = 41/257 = 0.1596$ and $p_2 = 64/244 = 0.2623$. The pooled proportion is

$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{41 + 64}{257 + 244} = 0.2096$$

so that

$$\sigma^2_{p_1 - p_2} = p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = 0.2096\,(1 - 0.2096)\left(\frac{1}{257} + \frac{1}{244}\right) = 0.001324$$

$$Z = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}} = \frac{0.1596 - 0.2623}{\sqrt{0.001324}} = -2.823$$

falls in the critical region; in fact with significance level $\alpha = 0.05$ we have $z_c = 1.96$. The sample is statistically significant and we reject $H_0$.

Exercise 0.39 In an experiment for the evaluation of a new treatment NT in relation to an old treatment OT, the patients were paired according to a specific criterion. 142 pairs of patients were treated in the following way: the first patient was treated with the method NT and the second patient was treated with the method OT. The following results have been obtained: 3 pairs where both patients did not have any benefit; 17 pairs where the patient treated with NT did not have any benefit; 25 pairs where the patient treated with OT did not have any benefit. Can we say that the new treatment is better than the old one with a significance level of 0.01?

Answer: we have matched and large samples and $H_0: \delta = \varphi_1 - \varphi_2 = 0$, $H_1: \delta < 0$ (one-tailed test on the left). The first population is the patients treated with NT and the second population is the patients treated with OT.
The parameter of interest is $\delta = \varphi_1 - \varphi_2$, where $\varphi_1$ is the probability that the treatment has no effect in the first population (where NT is used) and $\varphi_2$ is the probability that the treatment has no effect in the second population (where OT is used), so that the event A is "the treatment has no effect" and the event B is "the treatment has effect". From the data: $n = 142$, $u = 17$, $v = 25$; the estimator is $d = (u - v)/n = (17 - 25)/142 = -0.0563$; moreover

$$\sigma_d^2 = \frac{u + v - (u - v)^2/n}{n^2} = \frac{17 + 25 - (17 - 25)^2/142}{142^2} = 0.0021$$

and

$$Z = \frac{d}{\sigma_d} = \frac{-0.0563}{\sqrt{0.0021}} = -1.228$$

falls in the acceptance region; in fact with significance level $\alpha = 0.01$ the table of the standard normal distribution gives $z_c = 2.33$, and the acceptance region is $(-2.33, +\infty)$. The sample is not statistically significant and we accept $H_0$.

Exercise 0.40 A medical journal says that, in a fixed geographic region, the blood group of individuals is distributed as follows: 9% AB group, 21% B group, 29% A group, 41% 0 group. On a sample of 400 individuals extracted from this geographic region, we had the following result: 20 AB group, 120 B group, 110 A group and 150 0 group. Check the claim of the medical journal at a significance level of 0.05.

Answer: we have $r = 4$ classes and

$$H_0: \varphi_1 = 0.09 \text{ (AB)},\; \varphi_2 = 0.21 \text{ (B)},\; \varphi_3 = 0.29 \text{ (A)},\; \varphi_4 = 0.41 \text{ (0)}$$

$$H_1: \varphi_1 \ne 0.09 \text{ (AB)} \text{ or } \varphi_2 \ne 0.21 \text{ (B)} \text{ or } \varphi_3 \ne 0.29 \text{ (A)} \text{ or } \varphi_4 \ne 0.41 \text{ (0)} \qquad (14)$$

With significance level $\alpha = 0.05$ we have $\chi^2_c = 7.81$ (from the tables of the chi-squared distribution with $r - 1 = 3$ degrees of freedom). From the sample we have $n = 400$, $f_{o,1} = 20$, $f_{o,2} = 120$, $f_{o,3} = 110$, $f_{o,4} = 150$. From the journal we have

$$f_{e,1} = 400 \cdot \frac{9}{100} = 36, \quad f_{e,2} = 400 \cdot \frac{21}{100} = 84, \quad f_{e,3} = 400 \cdot \frac{29}{100} = 116, \quad f_{e,4} = 400 \cdot \frac{41}{100} = 164.$$

Then

$$\chi^2 = \sum_{i=1}^{r} \frac{(f_{o,i} - f_{e,i})^2}{f_{e,i}} = \frac{(20 - 36)^2}{36} + \frac{(120 - 84)^2}{84} + \frac{(110 - 116)^2}{116} + \frac{(150 - 164)^2}{164} = 7.111 + 15.429 + 0.310 + 1.195 = 24.05$$

falls in the critical region, since $24.05 > 7.81$. The sample is statistically significant and we reject $H_0$.
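The goodness-of-fit statistic of Exercise 0.40 can be recomputed with a few lines of code. The following is a minimal sketch in Python with no external libraries; the function and variable names are illustrative, and the critical value 7.81 is simply the tabulated value quoted in the exercise, not something computed by the code.

```python
def chi_squared_statistic(observed, probabilities):
    """Return the chi-squared statistic and the expected frequencies.

    observed      -- observed absolute frequencies f_o,i
    probabilities -- class probabilities phi_i stated by H0
    """
    n = sum(observed)
    expected = [n * p for p in probabilities]
    statistic = sum(
        (fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)
    )
    return statistic, expected


# Data of Exercise 0.40, in the order AB, B, A, 0
observed = [20, 120, 110, 150]
probabilities = [0.09, 0.21, 0.29, 0.41]

chi2, expected = chi_squared_statistic(observed, probabilities)

# Tabulated critical value for alpha = 0.05 and 3 degrees of freedom
CHI2_CRITICAL = 7.81
reject_h0 = chi2 > CHI2_CRITICAL

print("expected frequencies:", expected)
print("chi-squared statistic:", round(chi2, 3))
print("reject H0:", reject_h0)
```

The same function also covers the two-class case of the II Method in Exercise 0.35, by passing the observed frequencies `[90, 310]` and the probabilities `[0.25, 0.75]`.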