Proposition 4.4. Let $X$ be a discrete random variable. Then its probability mass function satisfies: (i) $p_X(x) \ge 0$ for all $x \in \mathbb{R}$; (ii) $\{x \in \mathbb{R} : p_X(x) \ne 0\}$ is at most countable; (iii) $\sum_{x \in \mathbb{R}} p_X(x) = 1$. Conversely, given a map $p \colon \mathbb{R} \to \mathbb{R}$ that satisfies properties (i)-(iii), $p$ is the probability mass function of some discrete random variable.

Proof. (i) This follows from the definition of a PMF, namely $p_X(x) = P(X = x)$, and the fact that probability measures are non-negative. (ii) Since $X$ is a discrete random variable, there exists a countable subset $K \subset \mathbb{R}$ such that $P(X \in K) = 1$. This means that the PMF vanishes outside $K$, that is, $p_X(x) = 0$ for all $x \in \mathbb{R} \setminus K$. Thus the points to which the PMF assigns a strictly positive value can only be among those contained in $K$, hence there are at most countably many such points. (iii) Since a random variable takes values in $\mathbb{R}$, we can rewrite the sample space as
$$\Omega = \bigcup_{x \in \mathbb{R}} \{X = x\} = \bigcup_{x \in X(\Omega)} \{X = x\},$$
a union over all possible values of $X$ (the image of $\Omega$ under $X$), of which there are at most countably many. Moreover, the events in the union are mutually exclusive. So, by the additivity axiom of probability measures, we have
$$1 = P(\Omega) = P\Bigl(\bigcup_{x \in X(\Omega)} \{X = x\}\Bigr) = \sum_{x \in X(\Omega)} P(X = x).$$

Example 4.5. In the experiment consisting of three independent tosses of a coin, let us denote by $X$ the number of heads obtained. If the coin is balanced, we have an equal-likelihood model, so each elementary event has probability $1/8 = 0.125$. So, if we want to compute the probability that in the three tosses we get exactly one head, we have
$$p_X(1) = P(X = 1) = P(\{(HTT), (THT), (TTH)\}) = P(\{(HTT)\}) + P(\{(THT)\}) + P(\{(TTH)\}) = 3 \cdot 0.125 = 0.375.$$

Another function associated to a discrete random variable and determined by its PMF is the cumulative distribution function.

Definition 4.6.
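The equal-likelihood computation in Example 4.5 can be checked by brute-force enumeration of the eight outcomes; a minimal Python sketch (the function name is illustrative, not from the text):

```python
from itertools import product

# Enumerate all 2^3 equally likely outcomes of three coin tosses.
outcomes = list(product("HT", repeat=3))
assert len(outcomes) == 8  # each outcome has probability 1/8

# PMF of X = number of heads, obtained by counting favourable outcomes.
def pmf_heads(x):
    favourable = [w for w in outcomes if w.count("H") == x]
    return len(favourable) / len(outcomes)

print(pmf_heads(1))  # 0.375, as computed in Example 4.5
```

The same enumeration also confirms property (iii) of Proposition 4.4: the four values $p_X(0), \dots, p_X(3)$ sum to one.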
Let $X$ be a discrete random variable. The map $F_X \colon \mathbb{R} \to \mathbb{R}$ defined by
$$F_X(a) = P(X \le a) = \sum_{x \le a} p_X(x), \quad a \in \mathbb{R},$$
is called the cumulative distribution function of $X$.

4.1 Classes of discrete random variables

Discrete random variables can be classified, based on their PMF, into different classes that are often related to specific situations.

Bernoulli

In many contexts we are interested in the occurrence of a certain event, called a success, or the non-occurrence thereof, called a failure, in any of a sequence of independent repetitions of a random experiment, where the probability of success is always the same. If we denote by $E$ the event of interest, the success probability is the parameter $p = P(E)$, which is constant over all independent repetitions of the experiment, called Bernoulli trials. Then $1 - p$ is called the failure probability.

Definition 4.7. A Bernoulli random variable $X$ with success probability $p$ is a random variable defined by
$$X(\omega) = \begin{cases} 1, & \omega \in E, \\ 0, & \omega \in E^c, \end{cases} \quad \forall \omega \in \Omega,$$
for some event $E \in \mathcal{F}$ with probability $P(E) = p$.

Note that the probability mass function of a Bernoulli random variable $X$ with success probability $p$ is given by
$$p_X(x) = \begin{cases} 1 - p, & \text{if } x = 0, \\ p, & \text{if } x = 1, \\ 0, & \text{otherwise,} \end{cases} \quad \forall x \in \mathbb{R}.$$

Examples of Bernoulli trials are independent tosses of a coin, where we observe whether the outcome is a head (success) or not (failure); repeated bets on "red" at roulette, where we observe whether we win (success) or not (failure); or random sampling with replacement from a population, where we observe whether the selected member possesses a specific attribute (success) or not (failure).

Binomial

Binomial random variables arise when we are interested not just in the success/failure relative to some event $E \in \mathcal{F}$ in a single trial, but rather in the number of successes in a finite sequence of Bernoulli trials.

Definition 4.8.
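Definition 4.6 makes $F_X$ a step function that jumps by $p_X(x)$ at each support point and reaches one at the right end. A small sketch computing a CDF from a PMF stored as a dictionary, here for a Bernoulli variable (all names are illustrative):

```python
# CDF of a discrete random variable from its PMF (Definition 4.6):
# F_X(a) is the sum of p_X(x) over the support points x <= a.

def bernoulli_pmf(p):
    # Support and probabilities from Definition 4.7.
    return {0: 1 - p, 1: p}

def cdf(pmf, a):
    return sum(prob for x, prob in pmf.items() if x <= a)

pmf = bernoulli_pmf(0.25)
print(cdf(pmf, -0.5))  # 0.0  (below the support)
print(cdf(pmf, 0))     # 0.75 (a jump of size p_X(0) at 0)
print(cdf(pmf, 5))     # 1.0  (total mass one, property (iii))
```

The dictionary representation works for any discrete variable with finite support; only the `bernoulli_pmf` helper is specific to this class.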
A binomial random variable $X$ with parameters $n, p$ is a random variable that counts the number of successes, each with probability $p$, in a finite sequence of $n$ Bernoulli trials. We denote $X \sim \mathrm{Bin}(n, p)$.

Proposition 4.9. The probability mass function of a binomial random variable $X$ with parameters $n, p$ is
$$p_X(x) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x}, & \text{if } x = 0, 1, \dots, n, \\ 0, & \text{otherwise,} \end{cases} \quad \forall x \in \mathbb{R}. \tag{4.1}$$

Proof. The sample space for a sequence of $n$ Bernoulli trials is $\Omega = \{(x_1, \dots, x_n) : x_i \in \{0, 1\},\ i = 1, \dots, n\}$, where each element $x_i$ in any $n$-tuple representing a possible outcome has value 1 in case of a success and value 0 in case of a failure. Take now a non-negative integer $k$, $0 \le k \le n$, and consider the event of getting exactly $k$ successes in $n$ trials. This is written as
$$\{X = k\} = \{(x_1, \dots, x_n) \in \Omega : |\{i : x_i = 1\}| = k\}.$$
Since the trials are independent, any single outcome in $\{X = k\}$ has probability $p^k (1-p)^{n-k}$. Since the cardinality of $\{X = k\}$ is the number of combinations of $k$ elements from a collection of $n$, and the single outcomes are mutually exclusive events, we get
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$$
For all other values of $k$, the probability that $X$ equals $k$ is null.

Example 4.10. Insurance companies compute their premiums based on many factors, among which are the mortality tables, containing the probabilities that people of a certain age will live another specified number of years. Assume that the probability that a person aged 20 will be alive at age 65 is 80%, and suppose that three people of age 20 are randomly selected. Compute the probability that exactly two, at most one, or at least one, respectively, of the three people will be alive at age 65. We assume that the life of each of the three people is independent of the others, which means we have three Bernoulli trials. The event of interest is $E$ = "the selected person is still alive at age 65" and the success probability is $p = 0.8$.
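Formula (4.1) translates directly into code via Python's `math.comb`; a minimal sketch (the function name is illustrative), using the setup of Example 4.10 with $n = 3$, $p = 0.8$ as arithmetic to check:

```python
from math import comb

# PMF of Bin(n, p), a direct transcription of formula (4.1).
def binom_pmf(n, p, x):
    if x in range(n + 1):
        return comb(n, x) * p**x * (1 - p)**(n - x)
    return 0.0

n, p = 3, 0.8
print(round(binom_pmf(n, p, 2), 3))                          # 0.384
print(round(sum(binom_pmf(n, p, x) for x in range(2)), 3))   # P(X <= 1) = 0.104
print(round(1 - binom_pmf(n, p, 0), 3))                      # P(X >= 1) = 0.992
# Property (iii) of Proposition 4.4: the total mass is one.
print(round(sum(binom_pmf(n, p, x) for x in range(n + 1)), 10))  # 1.0
```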
Then, using formula (4.1), the PMF of the variable $X$ counting the number of people among the three that are still alive at 65 is
$$p_X(x) = \binom{3}{x} (0.8)^x (0.2)^{3-x}, \quad x = 0, 1, 2, 3,$$
and 0 otherwise. We get
$$P(X = 2) = p_X(2) = 38.4\%,$$
$$P(X \le 1) = p_X(0) + p_X(1) = 10.4\%,$$
$$P(X \ge 1) = p_X(1) + p_X(2) + p_X(3) = 1 - p_X(0) = 99.2\%.$$

Hypergeometric

In statistical estimation of population proportions and in quality control, the key role is played by hypergeometric random variables. Unlike binomial random variables, these are not related to a sequence of Bernoulli trials.

Definition 4.11. A hypergeometric random variable $X$ with parameters $N, n, p$ is a random variable that counts the number of elements, in a random sample of size $n$ taken without replacement from a population of size $N$, having a specified attribute, where $p$ is the proportion of members of the population possessing the attribute. We denote $X \sim H(N, n, p)$.

Proposition 4.12. The probability mass function of a hypergeometric random variable $X$ with parameters $N, n, p$ is given by
$$p_X(x) = \frac{\binom{Np}{x} \binom{N(1-p)}{n-x}}{\binom{N}{n}}, \quad \max\{0, n - N(1-p)\} \le x \le \min\{n, Np\}, \tag{4.2}$$
and $p_X(x) = 0$ otherwise.

Proof. Since the process of sampling without replacement does not consist of independent selections, we do not have Bernoulli trials. However, since each member of the population is equally likely to be selected, we have an equal-likelihood model, where the probability of any event $E$ can be computed as the ratio of the number of outcomes in the event over the number of all possible outcomes. The cardinality of the sample space, i.e. the number of all possible samples of size $n$ without replacement, is $|\Omega| = \binom{N}{n}$. In order to compute the PMF of $X$, we only have to consider events of the kind $\{X = k\}$ where $\max\{0, n - N(1-p)\} \le k \le \min\{n, Np\}$, since for any other value of $k$ we have $\{X = k\} = \emptyset$.
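Formula (4.2), together with its support bounds, can be sketched as follows. Since $Np$ and $N(1-p)$ must be integers for the binomial coefficients to make sense, the sketch takes the number of marked members $m = Np$ directly as an argument (a design choice of this illustration, not of the text):

```python
from math import comb

# PMF of H(N, n, p) as in formula (4.2), with m = N*p marked members.
def hypergeom_pmf(N, n, m, x):
    lo, hi = max(0, n - (N - m)), min(n, m)  # support bounds from (4.2)
    if lo <= x <= hi:
        return comb(m, x) * comb(N - m, n - x) / comb(N, n)
    return 0.0

# Tiny sanity check: 2 marked members in a population of 5, sample of 2.
print(round(hypergeom_pmf(5, 2, 2, 1), 3))  # 0.6
```

Summing the PMF over its support returns one, which is the Vandermonde identity $\sum_k \binom{m}{k}\binom{N-m}{n-k} = \binom{N}{n}$ in disguise.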
Indeed, the number of selected members having the specified attribute cannot be less than the difference of the sample size and the number of members of the whole population that do not have the attribute, and it cannot be bigger than the number of members of the whole population that have the attribute. Then, in order to compute $p_X(k) = P(X = k)$, we have to compute the cardinality of $\{X = k\}$. To this aim, we can divide the population into two groups, one made up of the members possessing the attribute and one made up of the remaining ones. Outcomes in $\{X = k\}$ are samples where $k$ elements are selected from the first group, for which we have $\binom{Np}{k}$ possibilities, and $n - k$ are selected from the second group, for which we have $\binom{N(1-p)}{n-k}$ possibilities. Thus, by the basic counting rule,
$$|\{X = k\}| = \binom{Np}{k} \binom{N(1-p)}{n-k}.$$
This implies that
$$P(X = k) = \frac{|\{X = k\}|}{|\Omega|} = \frac{\binom{Np}{k} \binom{N(1-p)}{n-k}}{\binom{N}{n}},$$
which ends the proof.

Example 4.13 (Statistical quality control). Consider a quality assurance engineer who has to inspect the finished products of a company manufacturing TVs. In particular, he has to select at random 5 TVs from each lot of 100, inspect them thoroughly and report the number of defective ones. Let $X$ denote the number of defective TVs in a lot of size 100. Assume that 6 items are actually defective in that lot and compute the probability that the selected sample contains $k$ defective items, for $k = 0, 1, 2, 3, 4, 5$. The variable $X$ defined here is a hypergeometric random variable with parameters $100, 5, p$, where $p = 6/100 = 0.06$. Therefore, we have
$$P(X = k) = \frac{\binom{100 \cdot 0.06}{k} \binom{100 \cdot 0.94}{5-k}}{\binom{100}{5}} = \frac{\binom{6}{k} \binom{94}{5-k}}{\binom{100}{5}}.$$

Poisson

Poisson random variables recur in many applications and, together with binomial random variables, are the most important ones in the discrete framework.

Definition 4.14. A discrete random variable $X$ is a Poisson random variable with parameter $\lambda > 0$ if its probability mass function is given by
$$p_X(x) = \begin{cases} e^{-\lambda} \dfrac{\lambda^x}{x!}, & \text{if } x = 0, 1, \dots, \\ 0, & \text{otherwise,} \end{cases} \quad \forall x \in \mathbb{R}. \tag{4.3}$$

Remark 4.15. The function defined in (4.3) is indeed a probability mass function. This is important in order for Definition 4.14 to make sense. To prove it, it is enough to check that properties (i)-(iii) in Proposition 4.4 are satisfied. Properties (i) and (ii) are trivially satisfied by the definition in (4.3). Let us check that (iii) holds true. Note that for any $t \in \mathbb{R}$, the exponential of $t$ can be rewritten as
$$e^t = \sum_{k=0}^{\infty} \frac{t^k}{k!}.$$
We have
$$\sum_{x \in \mathbb{R}} p_X(x) = \sum_{x=0}^{\infty} p_X(x) = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1.$$
So all properties in Proposition 4.4 are satisfied and we have a PMF.

Poisson random variables often arise in modeling the frequency of occurrence of a certain event during a specified period of time.

Geometric

Geometric random variables are also related to a sequence of Bernoulli trials, but instead of counting the number of successes in a finite sequence as in the binomial case, they count the number of trials performed up to the first success.

Definition 4.16. A geometric random variable $X$ with parameter $p$ is a random variable that counts the number of Bernoulli trials with success probability $p$ up to, and including, the first success. We denote $X \sim G(p)$.

Proposition 4.17. The probability mass function of a geometric random variable $X$ with parameter $p$ is given by
$$p_X(x) = \begin{cases} p(1-p)^{x-1}, & \text{if } x = 1, 2, \dots, \\ 0, & \text{otherwise,} \end{cases} \quad \forall x \in \mathbb{R}. \tag{4.4}$$

Proof. Let us consider a positive integer $k$. The event $\{X = k\}$ = "the first success is at the $k$-th trial" occurs if and only if the first $k - 1$ trials result in failures and the $k$-th trial results in a success. Thus, denoting by $E_i$ the event that the $i$-th trial results in a success, for all $i \in \mathbb{N}$, we can rewrite
$$\{X = k\} = \Bigl(\bigcap_{i=1}^{k-1} E_i^c\Bigr) \cap E_k.$$
Then, since the trials are all independent, and so are all the events in the intersection, we get
$$p_X(k) = P(\{X = k\}) = \prod_{i=1}^{k-1} P(E_i^c) \cdot P(E_k) = \prod_{i=1}^{k-1} (1-p) \cdot p = (1-p)^{k-1} p.$$
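The series argument of Remark 4.15 can be checked numerically by truncating the exponential series; a small sketch (the parameter value is illustrative):

```python
from math import exp, factorial

# Poisson PMF as in (4.3).
def poisson_pmf(lam, x):
    return exp(-lam) * lam**x / factorial(x)

# Remark 4.15: e^{-lam} * sum_x lam^x / x! = 1. The tail beyond 50 terms
# is negligible for lam = 2.5, so the truncated sum is 1 up to rounding.
lam = 2.5
partial = sum(poisson_pmf(lam, x) for x in range(50))
print(abs(partial - 1.0) < 1e-12)  # True
```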
Geometric random variables have an interesting property which is due to the independence of Bernoulli trials. If we choose any trial in a sequence of Bernoulli trials and consider the subsequent sequence of trials, this behaves in the same way as the sequence starting from the beginning. Thus, if no successes occur by the $n$-th trial, the number of trials up to the first success starting from the $n$-th trial has the same distribution as the number of trials from the first one up to the first success. Furthermore, we can prove that geometric random variables are the only positive-integer-valued random variables possessing such a property.

Proposition 4.18. A discrete random variable $X$ taking values in $\mathbb{N} \cup \{0\}$ has the lack-of-memory property, i.e.
$$P(X = n + k \mid X > n) = P(X = k), \quad \forall n, k \in \mathbb{N}, \tag{4.5}$$
if and only if it is a geometric random variable.

Proof. ($\Leftarrow$) If $X \sim G(p)$, then
$$P(X = n + k \mid X > n) = \frac{P(X = n + k,\ X > n)}{P(X > n)} = \frac{P(X = n + k)}{P(X > n)} = \frac{p(1-p)^{n+k-1}}{\sum_{i=n+1}^{\infty} p(1-p)^{i-1}} = \frac{p(1-p)^{n+k-1}}{p \sum_{i=n}^{\infty} (1-p)^i} = \frac{p(1-p)^{n+k-1}}{p \frac{(1-p)^n}{1-(1-p)}} = p(1-p)^{k-1} = P(X = k),$$
so the lack-of-memory property holds true.

($\Rightarrow$) If $X$ is a discrete random variable such that $P(X \in \mathbb{N}) = 1$ and having the lack-of-memory property, then we want to prove that it is geometrically distributed. Let us denote $p = P(X = 1)$, i.e. the probability of having a success on the first trial, hence on each trial, and $q_n = P(X > n)$. Applying the lack-of-memory property with $k = 1$, we get
$$P(X = n + 1 \mid X > n) = P(X = 1) = p,$$
where the left-hand side can be rewritten as
$$P(X = n + 1 \mid X > n) = \frac{P(X = n + 1)}{P(X > n)} = \frac{q_n - q_{n+1}}{q_n} = 1 - \frac{q_{n+1}}{q_n}.$$
Hence the equation $\frac{q_{n+1}}{q_n} = 1 - p$ holds true for all $n \in \mathbb{N}$, but it is also true for $n = 0$, since $q_0 = P(X > 0) = 1$ and $q_1 = P(X > 1) = 1 - P(X = 1) = 1 - p$. Then, taking the product for $n = 0$ to $n = k - 1$, we get
$$q_k = \prod_{n=0}^{k-1} \frac{q_{n+1}}{q_n} = (1-p)^k,$$
and consequently
$$p_X(k) = q_{k-1} - q_k = (1-p)^{k-1} - (1-p)^k = p(1-p)^{k-1}.$$
Therefore, $X \sim G(p)$.
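The forward direction of Proposition 4.18 is easy to check numerically: for $X \sim G(p)$ one has $P(X > n) = (1-p)^n$, so the conditional probability in (4.5) can be computed directly. A small sketch (function names and the parameter values are illustrative):

```python
# Geometric PMF (4.4) and a numerical check of the lack-of-memory
# property (4.5): P(X = n+k | X > n) = P(X = k).
def geom_pmf(p, k):
    return p * (1 - p) ** (k - 1) if k >= 1 else 0.0

def survival(p, n):
    # P(X > n) = (1-p)^n: the first n trials are all failures.
    return (1 - p) ** n

p, n, k = 0.3, 4, 2
lhs = geom_pmf(p, n + k) / survival(p, n)  # P(X = n+k | X > n)
rhs = geom_pmf(p, k)                       # P(X = k)
print(abs(lhs - rhs) < 1e-12)  # True
```

Both sides equal $p(1-p)^{k-1}$, since the factor $(1-p)^n$ cancels, exactly as in the proof.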