2.3 Continuous Random Variables

A continuous random variable is one which can take any value in an interval (the values that can be taken by such a variable cannot be listed). Such variables are normally measured according to a scale. Examples of continuous random variables: age, height, weight, time, air pressure. Such variables are normally only measured to a given accuracy (e.g. the age of a person is normally given to the nearest year).

2.3.1 The notion of a density function

Suppose X is a continuous random variable. Consider

f_\delta(x) = \frac{P(x < X < x + \delta)}{\delta}.

This is the probability that X lies in an interval of length δ, divided by the length of the interval, i.e. it can be thought of as the average probability density on the interval (x, x + δ).

Now let

f_X(x) = \lim_{\delta \to 0} \frac{P(x < X < x + \delta)}{\delta}.

Then f_X(x) is the probability density function of the random variable X. If it is clear which variable we are talking about, the subscript may be left out. "Likely" values of X correspond to areas where the density function is large; "unlikely" values of X correspond to areas where the density function is small.

2.3.2 Properties of a density function

A density function f(x) of a random variable X satisfies two conditions:

1) f(x) ≥ 0 for all x.
2) \int_{-\infty}^{\infty} f(x)\,dx = 1.

The second condition simply states that the total area under the density curve is 1.

The support of a continuous random variable

The support, S_X, of a continuous random variable X is the set of values for which f(x) > 0. We have

\int_{S_X} f(x)\,dx = 1.

In general, we only have to integrate over intervals where the density function is positive.

Density curves and probability

The probability that X lies between a and b is the area under the density curve between x = a and x = b. Hence,

P(a < X < b) = \int_a^b f(x)\,dx.

In particular,

1. P(X > a) = \int_a^{\infty} f(x)\,dx,
2. P(X < b) = \int_{-\infty}^{b} f(x)\,dx.

Note that for any constant a, P(X = a) = 0.

2.3.3 Expected value of a continuous random variable

The expected value of a random variable X with density function f(x) is

E(X) = \mu_X = \int_{S_X} x f(x)\,dx,

i.e. we integrate over the interval(s) where the density function is positive. The expected value of a function g(X) of a random variable is

E[g(X)] = \int_{S_X} g(x) f(x)\,dx.

If a distribution is symmetric about x = x_0, then E(X) = x_0.

Variance of a continuous random variable

The variance of X is given by

\sigma_X^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = \int_{S_X} (x - \mu)^2 f(x)\,dx = E(X^2) - E(X)^2.

σ_X is the standard deviation of the random variable X. Note that these formulas are analogous to the definitions for discrete random variables; the only change is that the summations become integrals.

2.3.4 The Cumulative Distribution Function and Quantiles of a Distribution

The (cumulative) distribution function of a continuous random variable X is denoted F_X. By definition,

F_X(t) = P(X \le t) = \int_{-\infty}^{t} f_X(x)\,dx,

where f_X is the density function. Differentiating this equation, we obtain F_X'(x) = f_X(x). Suppose S_X = [a, b], where a and b are finite. For x ≤ a, F_X(x) = 0. Also, for x ≥ b, F_X(x) = 1.
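To make the definition concrete, here is a minimal Python sketch (not part of the original slides; it assumes scipy is available) that builds a CDF numerically by integrating a density. The density f(x) = 3x² on [0, 1] is an arbitrary choice for illustration.

```python
from scipy.integrate import quad

def f(x):
    """A density chosen for illustration: f(x) = 3x^2 on [0, 1], zero elsewhere."""
    return 3 * x**2 if 0 <= x <= 1 else 0.0

def F(t):
    """CDF F(t) = P(X <= t), obtained by integrating the density over the support."""
    upper = min(max(t, 0.0), 1.0)   # integrate only where the density is positive
    value, _ = quad(f, 0.0, upper)
    return value

print(F(0.5))   # exact value: 0.5^3 = 0.125
print(F(2.0))   # 1.0, since the whole support lies below 2
```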
It should be noted that some textbooks define the cumulative distribution function as F_X(x) = P(X < x). In the case of continuous distributions, this definition is equivalent to the one given above, since P(X = x) = 0. However, these definitions are not equivalent in the case of discrete random variables.

The quantiles of a distribution

For 0 < p < 1, the p-quantile of a continuous random variable X, q_p, satisfies

F_X(q_p) = p.

q_{0.5} is the median of X. q_{0.25} and q_{0.75} are called the lower and upper quartiles of X, respectively. If the support S_X is an interval, then all quantiles are uniquely defined.

Relation between the mean and the median for a continuous distribution

If a continuous random variable X has a distribution which is symmetric around x_0, then E[X] = q_{0.5} = x_0. Many continuous distributions are right skewed, i.e. have a long right-hand tail (e.g. the distribution of wages, the exponential distribution). For such distributions, the mean is greater than the median, i.e. in everyday language, the "average" (median) person earns less than the "average" (mean) wage. For left-skewed distributions, the median is greater than the mean.

Example 2.3.1

Suppose the random variable X has density function f(x) = cx on the interval [0, 5] and f(x) = 0 outside this interval.
1. Calculate the value of the constant c.
2. Calculate the probability that (X − 2)² ≥ 1.
3. Calculate E(X) and σ_X.
4. Derive the cumulative distribution function of X.
5. Calculate the median, lower quartile and upper quartile of this distribution.

2.3.5 Standard continuous distributions

1. The uniform distribution

The uniform distribution on the interval [a, b]. We write X ~ U[a, b].

[Figure: the density of U[a, b] is constant at height 1/(b − a) on [a, b] and zero elsewhere.]

The area under the density function (a rectangle) is 1. The width of this rectangle is (b − a) and its height is f(x), the density function. Hence, for x ∈ [a, b],

(b - a) f(x) = 1 \;\Rightarrow\; f(x) = \frac{1}{b - a}.

Otherwise, f(x) = 0.

By symmetry, E(X) is the mid-point of the interval, i.e. E(X) = (a + b)/2. For example, suppose a calculator calculates to k decimal places. The rounding error involved in a calculation may be assumed to be uniform on the interval [−0.5 × 10^{−k}, 0.5 × 10^{−k}].

Example 2.3.2

Suppose the length of the side of a square is chosen from the uniform distribution on [0, 3]. Calculate
1. the probability that the length of the side is between 2 and 4,
2. the expected area of this square.

2. The exponential distribution

The density function of an exponential random variable with parameter λ is given by f(x) = λe^{−λx} for x ≥ 0 and f(x) = 0 for x < 0. We write X ~ Exp(λ).

This distribution may be used to model the time between the arrival of telephone calls. λ is the rate at which calls arrive (i.e. the expected length of time between calls is 1/λ).

The exponential distribution and the Poisson distribution

From the interpretations of the exponential distribution and the Poisson distribution, we can see that there is a connection between them. If the time between observations, X, has an Exp(λ) distribution (e.g. the time between two calls, when calls come in at rate λ), then the number of observations in time t has a Poisson(λt) distribution. Note that λt is the expected number of calls to arrive in time t.
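This connection can be checked by simulation. The sketch below (assuming numpy and scipy; the parameter values, a rate of 3 per minute and a 2-minute window, are illustrative) draws exponential inter-arrival times and compares the distribution of the number of arrivals with the Poisson(λt) probabilities.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, t, reps = 3.0, 2.0, 100_000   # rate: 3 calls/minute; window: 2 minutes

# Each replication: draw exponential gaps between calls and count
# how many calls arrive before time t.
gaps = rng.exponential(scale=1/lam, size=(reps, 40))   # 40 gaps is ample for t = 2
counts = (np.cumsum(gaps, axis=1) < t).sum(axis=1)

# The simulated counts should follow a Poisson(lam * t) distribution.
for k in range(4):
    print(k, round((counts == k).mean(), 4), round(stats.poisson.pmf(k, lam * t), 4))
```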
Example 2.3.3

The average number of calls coming into a call centre is 3 per minute. Calculate
1) the probability that the time between two calls is greater than k minutes,
2) t, where t is the time such that the length of time between two calls is less than t with probability 0.8,
3) the probability that the time between calls is greater than c + k, given that the time between calls is at least c (c, k > 0).

The lack of memory property

Note that this result states that it does not matter how long we have waited between calls: the distribution of the extra time we wait until the next call is simply the distribution of the time between calls. This property is called the "lack of memory" property. Another distribution which has this property is the geometric distribution. For example, suppose I throw a die until I get a six. It does not matter how often I have already thrown the die; I expect to throw it on average another 6 times before I obtain a six (as long as the die is "fair").

3. The Pareto distribution

X has a Pareto distribution with parameters x_m and α when the density function is given by

f(x) = \begin{cases} \dfrac{\alpha x_m^{\alpha}}{x^{\alpha + 1}}, & x \ge x_m, \\ 0, & x < x_m. \end{cases}

We write X ~ Pareto(x_m, α). Note that α > 1, x_m > 0.

The density function of the Pareto distribution looks similar to that of the exponential distribution shifted x_m units to the right (it has a heavier tail, though). The Pareto distribution is often used to model the distribution of wages when there is a minimum wage. x_m represents the minimum wage. α represents the degree of "concentration" of the wage distribution, i.e. the smaller α, the larger the degree of wage inequality.

Standard results for the exponential and Pareto distributions

Suppose X ~ Exp(λ). Then for k ≥ 0,

P(X > k) = e^{-\lambda k}.

Suppose X ~ Pareto(x_m, α). Then for k ≥ x_m,

P(X > k) = \left( \frac{x_m}{k} \right)^{\alpha}.

For α > 1, the expected value (mean) of the Pareto distribution is given by

E(X) = \frac{\alpha x_m}{\alpha - 1}.

Note that these results follow directly from calculating the appropriate definite integrals.

In order to calculate any probability related to the exponential or Pareto distribution, we use the facts above together with the following two facts, which hold for any continuous distribution:
1. P(X < k) = 1 − P(X > k),
2. P(a < X < b) = P(X > a) − P(X > b).

The second fact follows from the fact that the probability of X lying between a and b is the area under the density curve between x = a and x = b.

Example 2.3.4

The distribution of monthly salaries in Poland can be modelled using the Pareto distribution. The minimum salary is 2 000 PLN and the concentration factor is 2. Calculate the probability that an individual earns
i) less than 4 000 PLN,
ii) greater than 8 000 PLN.
iii) Calculate the expected wage.
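The worked solution slides are not reproduced here, but the answers can be checked numerically. A minimal sketch, assuming scipy (whose pareto distribution uses b for α and scale for x_m):

```python
from scipy import stats

# Pareto(x_m = 2000, alpha = 2); in scipy's parametrisation b = alpha, scale = x_m.
wages = stats.pareto(b=2, scale=2000)

print(wages.cdf(4000))   # i)   P(X < 4000) = 1 - (2000/4000)^2 = 0.75
print(wages.sf(8000))    # ii)  P(X > 8000) = (2000/8000)^2 = 0.0625
print(wages.mean())      # iii) E(X) = alpha * x_m / (alpha - 1) = 4000
```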
4. The normal (Gaussian) distribution

X has a normal distribution with expected value (mean) µ and variance σ² when the density function is given by

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).

We write X ~ N(µ, σ²). This is the very commonly met bell-shaped distribution. Much of the theory of statistics is based upon the properties of this distribution. The normal distribution is the subject of the next section.

[Figure: the bell-shaped density curve of the normal (Gaussian) distribution.]

Expected value and variance of standard continuous distributions

Distribution      Expected value    Variance
N(µ, σ²)          µ                 σ²
Exp(λ)            1/λ               1/λ²
U[a, b]           (a + b)/2         (b − a)²/12
Pareto(x_m, α)    αx_m/(α − 1)      αx_m²/[(α − 1)²(α − 2)]

Note: the expected value of the Pareto distribution exists only when α > 1; the variance exists only when α > 2.

2.4 The Normal Distribution and the Central Limit Theorem

The importance of the normal distribution results from the central limit theorem, which explains why this bell-shaped distribution is so often observed in nature.

2.4.1 The standard normal distribution

The density function of the normal distribution cannot be integrated algebraically. In order to calculate probabilities associated with the normal distribution, we use standardization. A standard normal random variable has expected value 0 and standard deviation equal to 1. Such a random variable is denoted by Z, i.e. Z ~ N(0, 1).

Using tables for the standard normal distribution

The NORMSDIST(k) function in Excel gives the value of P(Z < k). This function has been used to create the table used in this course, which contains probabilities of the form P(Z > k) = 1 − P(Z < k) for k ≥ 0. Often, however, we have to calculate probabilities of events which take a different form. To do this, we use the following three rules, which follow from the interpretation of the probability of an event as the appropriate area under the density curve.

1. The law of complementarity

P(Z < k) = 1 − P(Z > k).

It should be noted that P(Z = k) = 0. The area under the density curve is 1, hence P(Z < k) + P(Z > k) = 1, i.e. P(Z < k) = 1 − P(Z > k). This is a general rule for continuous distributions.

2. The law of symmetry

Since the standard normal distribution is symmetric about 0,

P(Z < −k) = P(Z > k).

This is used to calculate probabilities in the "left-hand tail" of the distribution (i.e. when the constant is negative). This law is specific to distributions which are symmetric around 0.

3. The interval rule

P(a < Z < b) = P(Z > a) − P(Z > b).

This rule is general for continuous distributions.

Reading the table for the standard normal distribution

In order to read P(Z > k), where k is given to 2 decimal places, we find the row corresponding to the digits on either side of the decimal point and the column corresponding to the second place after the decimal point. The following fragment of the table illustrates this.

k      0.00     0.01     0.02     0.03
1.1    0.1357   0.1335   0.1314   0.1292
1.2    0.1151   0.1131   0.1112   0.1093

For example, P(Z > 1.22) = 0.1112. Since P(Z > k) is decreasing in k and must be non-negative, we can assume that for k > 4, P(Z > k) ≈ 0.

Example 2.4.1

Calculate i) P(Z > 1.76), ii) P(Z > −0.83), iii) P(Z < −0.18), iv) P(−0.43 < Z < 1.36).

Sometimes it is necessary to find the number k for which P(Z > k) = p, where p ≤ 0.5. In this case, we find the value closest to p in the body of the table, and the value of k is read from the corresponding row and column. The laws of complementarity and symmetry may be needed to bring the problem into the form P(Z > k) = p, where p ≤ 0.5.
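For readers working without the printed table or Excel, the same values can be obtained in Python. The sketch below, assuming scipy, reproduces the quantities asked for in Example 2.4.1, plus one reverse reading of the table.

```python
from scipy import stats

Z = stats.norm()   # the standard normal distribution N(0, 1)

print(Z.sf(1.76))                  # i)   P(Z > 1.76)  ~ 0.0392
print(Z.sf(-0.83))                 # ii)  P(Z > -0.83) = P(Z < 0.83) ~ 0.7967
print(Z.cdf(-0.18))                # iii) P(Z < -0.18) = P(Z > 0.18) ~ 0.4286
print(Z.cdf(1.36) - Z.cdf(-0.43))  # iv)  P(-0.43 < Z < 1.36) ~ 0.5795
print(Z.ppf(0.17))                 # reverse reading: k with P(Z < k) = 0.17, ~ -0.95
```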
Example 2.4.2

Find the value of k satisfying P(Z < k) = 0.17.

The NORMSINV function in Excel

The function NORMSINV gives the value of k for which P(Z < k) = p for a given p, 0 < p < 1. This is the inverse of the distribution function of the standard normal distribution. In the previous example (find the value of k satisfying P(Z < k) = 0.17),

k = NORMSINV(0.17) = −0.95.

2.4.2 Standardisation of a normal random variable

Clearly, the technique used in the previous subsection only works for a standard normal random variable. How do we calculate appropriate probabilities for a general normal distribution, i.e. X ~ N(µ, σ²)? The first step is to standardise the variable. If X ~ N(µ, σ²), then

Z = \frac{X - \mu}{\sigma} \sim N(0, 1).

Subtracting the expected value first centres the distribution around 0, and then division by the standard deviation "shrinks" the dispersion of the distribution to that of the standard normal distribution.

Transformations of normal random variables

In general, if X ~ N(µ, σ²), then Y = aX + b also has a normal distribution. In particular, Y ~ N(aµ + b, a²σ²). The sum of independent normal random variables is also normally distributed; the expected value and variance of such a sum are the sums of the individual expected values and variances, respectively. After standardisation, we can calculate the appropriate probabilities as before.

It should be noted that this standardisation procedure is specific to the normal distribution. Other distributions may have their own standard forms and standardisation procedures. For example, the standard exponential distribution is Exp(1), where f(x) = e^{−x}. Note that if Y ~ Exp(λ), then λY ~ Exp(1).

Example 2.4.3

The height of male students is normally distributed with a mean of 175 cm and a variance of 144 cm².
a) What is the probability that a randomly picked male student is i) taller than 190 cm, ii) between 163 and 181 cm?
b) 10% of male students are shorter than what height?

2.4.3 The central limit theorem

Suppose I throw a coin once. The distribution of the number of heads, X, is P(X = 0) = 0.5, P(X = 1) = 0.5, i.e. nothing like a bell-shaped distribution. However, suppose I throw the coin a large number of times, say n times. I am reasonably likely to get around n/2 heads, but the probability of getting either a large or a small number of heads (relative to n/2) is very small. The distribution of the number of heads thrown, X, has a bell-like shape (i.e. similar to the normal distribution).

This is a particular case of the central limit theorem. Note that X can be written as X = X_1 + X_2 + ... + X_n, where X_i = 1 if the i-th toss results in heads and X_i = 0 if the i-th toss results in tails.

The central limit theorem (CLT)

Suppose X = X_1 + X_2 + ... + X_n, where n is large and the X_i are independent random variables. Then X is approximately normally distributed, i.e. X ~approx N(µ, σ²), where

\mu = E(X) = \sum_{i=1}^{n} E(X_i), \qquad \sigma^2 = \mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i).

This approximation is good if n ≥ 30, the variances of the X_i are comparable and the distributions of the X_i are reasonably symmetrical. If the distributions of the X_i are clearly asymmetric, then this approximation will be less accurate.
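A quick simulation illustrates the theorem for a clearly asymmetric summand. The sketch below, assuming numpy and scipy, compares the simulated distribution of a sum of 30 Exp(1) variables with its CLT approximation; the sample sizes and evaluation points are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 30, 200_000

# X = X_1 + ... + X_n with X_i ~ Exp(1), so E(X) = n and Var(X) = n.
sums = rng.exponential(1.0, size=(reps, n)).sum(axis=1)

# Compare the simulated distribution of X with the CLT approximation N(n, n).
for x in (25, 30, 35):
    print(x, round((sums <= x).mean(), 4),
          round(stats.norm.cdf(x, loc=n, scale=np.sqrt(n)), 4))
```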
Example 2.4.4

n independent observations are taken from the exponential distribution with expected value 1. Using an appropriate approximation, estimate the probability that the mean of these observations (the sample mean X̄) is between 0.9 and 1.1 when i) n = 30, ii) n = 100.

The relation between the central limit theorem and sampling

Note 1: As the sample size grows, the probability of the sample mean being close to the theoretical (population) mean increases. In the case of the exponential distribution, the estimation of the theoretical mean using the sample mean is not highly accurate, due to the high coefficient of variation (CV = 1).

Note 2: In this case the exact probabilities can be calculated, since the sum of i.i.d. exponential random variables has a Gamma distribution (not considered in this course). In the first case, the exact probability (to 4 d.p.) is 0.4162 (compared to the estimate 0.4176). In the second case, the exact probability (to 4 d.p.) is 0.6835 (compared to the estimate 0.6826). Hence, as the number of observations increases, the approximation given by the CLT becomes more accurate. Since the exponential distribution is clearly asymmetrical, the approximation given by the CLT is relatively poor here.

Proportion of observations from a normal distribution within one standard deviation of the mean

Note 3: After standardisation, the constants indicate the number of standard deviations from the mean (a negative sign indicates deviations below the mean). Here, P(−1 < Z < 1) = 0.6826 shows that if X comes from any normal distribution, the probability of it being within one standard deviation of the mean is just over 2/3. Similarly, P(−2 < Z < 2) = 0.9545. Thus, an observation from any normal distribution will be less than 2 standard deviations from the mean with probability just over 0.95.

2.4.4 The normal approximation to the binomial distribution

Suppose n is large and X ~ Bin(n, p). Then X ~approx N(µ, σ²), where µ = np and σ² = np(1 − p). This approximation is used when n ≥ 30 and 0.1 ≤ p ≤ 0.9. For values of p outside this range, the Poisson approximation tends to work better.

The continuity correction for the normal approximation to the binomial distribution

It should be noted that X has a discrete distribution, but we are using a continuous distribution in the approximation. For example, suppose we wanted to estimate the probability of obtaining exactly k heads when we throw a coin n times. This probability will in general be positive. However, if we use the normal approximation without an appropriate "correction", we cannot sensibly estimate P(X = k), since for continuous distributions P(X = k) = 0.

Suppose the random variable X takes only integer values and has an approximately normal distribution. In order to estimate P(X = k), we use the continuity correction, which uses the fact that when k is an integer,

P(X = k) = P(k − 0.5 < X < k + 0.5).

Example 2.4.5

Suppose a coin is tossed 36 times. Using the CLT, estimate the probability that exactly 20 heads are thrown.
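The worked solution is not reproduced here; the following sketch, assuming scipy, applies the continuity correction and compares the result with the exact binomial probability.

```python
from scipy import stats

n, p = 36, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5   # mu = 18, sigma = 3

# Continuity correction: P(X = 20) is approximated by P(19.5 < Y < 20.5),
# where Y ~ N(mu, sigma^2).
approx = stats.norm.cdf(20.5, mu, sigma) - stats.norm.cdf(19.5, mu, sigma)
exact = stats.binom.pmf(20, n, p)   # the exact binomial probability
print(round(approx, 4), round(exact, 4))   # ~0.1062 vs 0.1063
```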
Using the BINOMDIST function in Excel, we can calculate this probability from the exact distribution. To four decimal places, this probability is 0.1063.

The continuity correction can be adapted to problems in which we have to estimate the probability that the number of successes lies in a given interval, e.g.

P(15 ≤ X < 21) = P(X = 15) + P(X = 16) + ... + P(X = 20)
               = P(14.5 < X < 15.5) + ... + P(19.5 < X < 20.5)
               = P(14.5 < X < 20.5).

Note that when applying the continuity correction, if the end point of an interval is given by a non-strict inequality, then we stretch the interval at that end by 0.5. If the end point of an interval is given by a strict inequality, then we shrink the interval at that end by 0.5.

Example 2.4.6

A die is thrown 180 times. Estimate the probability that
1) at least 35 sixes are thrown,
2) between 27 and 33 sixes are thrown (inclusive).

The normal approximation to the binomial

The probabilities calculated from the exact distribution, to four decimal places, are 0.1828 and 0.5160. It should be noted that the normal approximation to the binomial is most accurate when n is large and p is close to 0.5. This is due to the fact that X = X_1 + X_2 + ... + X_n, where the X_i are 0-1 (Bernoulli) variables with parameter p, and the distribution of X_i is symmetric when p = 0.5.
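As a numerical check (a sketch assuming scipy, not part of the original slides), the normal approximation with continuity correction for Example 2.4.6 can be compared against the exact values quoted above.

```python
from scipy import stats

n, p = 180, 1/6
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5   # mu = 30, sigma = 5
Y = stats.norm(mu, sigma)

# 1) P(X >= 35): non-strict endpoint, so stretch the interval to 34.5.
print(round(Y.sf(34.5), 4), round(stats.binom.sf(34, n, p), 4))   # ~0.1841 vs 0.1828

# 2) P(27 <= X <= 33): stretch both ends by 0.5.
print(round(Y.cdf(33.5) - Y.cdf(26.5), 4),
      round(stats.binom.cdf(33, n, p) - stats.binom.cdf(26, n, p), 4))  # ~0.5161 vs 0.5160
```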