Lecture 02: Discrete Random Variables

Ba Chu
E-mail: ba [email protected]   Web: http://www.carleton.ca/~bchu

(Note: these are lecture notes; please refer to the textbooks suggested in the course outline for details. Examples will be given and explained in class.)

1 Objectives

The introduction of random variables in Lecture 1 was completely general: the principles that we discussed apply to all kinds of random variables. In this lecture, we study an important special class, the discrete random variables, including the Bernoulli, binomial, geometric, and Poisson random variables. I will illustrate the usefulness of these random variables with examples.

2 Basic Concepts

We begin with a formal definition.

Definition 1. A random variable X is discrete if the set of possible values of X, X(S), is countable.

Definition 2. Let X be a discrete random variable. The probability density function (pdf) of X is the function f : R → R defined by f(x) = P(X = x), which satisfies the following properties:

1. f(x) ≥ 0 for every x ∈ R.
2. If x ∉ X(S), then f(x) = 0.
3. By the definition of X(S),
   Σ_{x ∈ X(S)} f(x) = Σ_{x ∈ X(S)} P(X = x) = P(∪_{x ∈ X(S)} {x}) = P(X ∈ X(S)) = 1.

Example 1. A fair coin is tossed and the outcome is Heads (H) or Tails (T). Define a random variable X by X(H) = 1 and X(T) = 0. The pdf of X is the function f(x) defined by f(0) = 0.5, f(1) = 0.5, and f(x) = 0 for all x ∉ X(S) = {0, 1}.

Example 2. A fair die is tossed and the number of dots on the upper face is observed. The sample space is S = {1, 2, 3, 4, 5, 6}. Define a random variable X by X(s) = 1 if s is a prime number and X(s) = 0 otherwise. The primes in S are 2, 3, and 5, so the pdf of X is f(0) = 1/2, f(1) = 1/2, and f(x) = 0 for all x ∉ X(S) = {0, 1}.

Definition 3. A random variable X is a Bernoulli trial if X(S) = {0, 1}. Traditionally, we call X = 1 a 'success' and X = 0 a 'failure'.
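The pdf in Example 2 can be tabulated by brute force, which also verifies the properties in Definition 2. A minimal sketch in Python (the names and the use of exact fractions are my own choices; the primes in S are taken to be 2, 3, and 5):

```python
from collections import defaultdict
from fractions import Fraction

# A fair die: six equally likely outcomes, each with probability 1/6.
S = [1, 2, 3, 4, 5, 6]
primes = {2, 3, 5}

# The random variable of Example 2: X(s) = 1 if s is prime, else 0.
def X(s):
    return 1 if s in primes else 0

# Tabulate f(x) = P(X = x) by summing P({s}) = 1/6 over outcomes with X(s) = x.
f = defaultdict(Fraction)
for s in S:
    f[X(s)] += Fraction(1, 6)

print(f[0], f[1])             # 1/2 1/2
print(sum(f.values()) == 1)   # True: the pdf sums to 1 (property 3)
```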
The pdf of a Bernoulli trial with success probability p is defined by f(0) = 1 − p, f(1) = p, and f(x) = 0 for all x ∉ X(S).

Definition 4. Consider a series of n independent Bernoulli trials, each with success probability p, and let Y denote the number of successes. We call Y a binomial random variable. The pdf of Y is given by
f(k) = P(Y = k) = C(n, k) p^k (1 − p)^{n−k}.

Definition 5. Let X_1, X_2, . . . denote independent Bernoulli trials, each with success probability p, and let Y denote the number of failures that precede the first success. We call Y a geometric random variable. The pdf of Y is given by
f(j) = P(Y = j) = P(X_1 = 0, . . . , X_j = 0, X_{j+1} = 1) = (1 − p)^j p.
Thus, we can obtain the cdf of Y, i.e.,
F(k) = P(Y ≤ k) = 1 − P(Y > k) = 1 − P(Y ≥ k + 1) = 1 − (1 − p)^{k+1}.

Example 3. John is a fourth-year undergraduate student who is determined to have a formal date. He believes that each woman he asks is twice as likely to decline his invitation as to accept it, but he resolves to extend invitations until one is accepted. However, each of his first ten invitations is declined. Assuming that John's assumptions about his own desirability are correct, what is the probability that he would encounter such a run of failures?

John evidently believes that he can model his invitations as a sequence of independent Bernoulli trials, each with success probability p = 1/3. If so, the number of unsuccessful invitations that he extends is a geometric random variable Y, and
P(Y ≥ 10) = 1 − P(Y ≤ 9) = 1 − F(9) = (1 − p)^{10} = (2/3)^{10} ≈ 0.0173.

Imagine an urn that contains m red balls and n black balls. Our experiment involves selecting k balls from the urn in such a way that each of the C(m + n, k) possible outcomes is equally likely. Let X denote the number of red balls selected in this manner. If we observe X = x, then x red balls were selected from the m red balls and k − x black balls were selected from the n black balls.
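This equally-likely-outcomes model can be checked by direct enumeration. The sketch below (with small illustrative values of m, n, and k chosen by me) counts the selections containing exactly x red balls and compares the relative frequency with the ratio C(m, x) C(n, k − x) / C(m + n, k):

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# A small urn: m = 4 red balls and n = 3 black balls; select k = 3.
m, n, k = 4, 3, 3
balls = ['R'] * m + ['B'] * n

# All C(m + n, k) equally likely selections of k balls.
outcomes = list(combinations(range(m + n), k))

for x in range(k + 1):
    # Count selections with exactly x red balls.
    count = sum(1 for sel in outcomes
                if sum(balls[i] == 'R' for i in sel) == x)
    p_count = Fraction(count, len(outcomes))
    p_formula = Fraction(comb(m, x) * comb(n, k - x), comb(m + n, k))
    print(x, p_count, p_count == p_formula)   # each comparison is True
```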
We call X a hypergeometric random variable, and its pdf is given by
f(x) = P(X = x) = #(X = x) / #(S) = C(m, x) C(n, k − x) / C(m + n, k).

Example 4. Suppose that an ECON 4002 class has 30 female students and 20 male students, and that 20 of these 50 students are randomly selected to receive grade A, the rest receiving grade B. What is the probability that exactly 15 female students and 5 male students receive an A?

Answer: C(30, 15) C(20, 5) / C(50, 20).

3 Expectation

Since decision-makers often face many different prospects, it is plausible to use decision criteria in which each prospect is weighted by the chance of realizing that prospect. This motivates the notion of expectation.

Definition 6. The expected value of a discrete random variable X, which we will denote E(X), is the probability-weighted average of the possible values of X, i.e.,
E(X) = Σ_{x ∈ X(S)} x P(X = x) = Σ_{x ∈ X(S)} x f(x).
The expected value of X, E(X), is often called the population mean and denoted µ.

Example 5. If X ∼ Bernoulli(p), then µ = Σ_{x ∈ {0,1}} x P(X = x) = p.

Example 6. Suppose that you own a slot machine that pays a jackpot of $1000 with probability p = 0.0005 and $0 with probability 1 − p. How much should you charge a customer to play this machine? Let X denote the payoff; the expected payoff per play is
E[X] = 1000 × 0.0005 + 0 × 0.9995 = 0.5.
Thus, you should charge more than $0.50.

Example 7. You are offered the choice between receiving a certain $5 and playing the following game: a fair coin is tossed and you receive $10 or $0 according to whether Heads or Tails is observed. Will you play the game?

Example 8. Consider a lottery game in which 6 numbers are drawn (without replacement) from the set {1, 2, . . . , 39, 40}. For $2, you can purchase a ticket that specifies 6 such numbers. If the numbers on your ticket match the numbers selected by the state, then you win $1 million; otherwise, you win nothing. Is buying a lottery ticket 'completely irrational'?
The probability of winning is p = 1 / C(40, 6) ≈ 2.6053 × 10^{−7}, so the expected value of a ticket is approximately $0.26, which is considerably less than its $2 cost. Evidently, it is completely irrational to buy tickets for this lottery as an investment strategy. In most state lotteries, the fair value of the game is less than the cost of a ticket. This is natural: lotteries exist because they generate revenue for the states.

Definition 7. Suppose that X : S → R is a discrete random variable and φ : R → R is a function. Let Y = φ(X). Then Y : S → R is a random variable and
E[φ(X)] = Σ_{x ∈ X(S)} φ(x) f(x).

Example 9. Consider a game in which the jackpot starts at $1 and doubles each time that Tails is observed when a fair coin is tossed. The game terminates when Heads is observed for the first time. What payoff would you expect to receive from this game? This is a very curious game: the payoff will most probably be rather small, but there is a small chance of a very large payoff. Let X denote the number of Tails observed before the game terminates. Then X(S) = {0, 1, 2, . . . } and X is a geometric random variable with p = 1/2, so its pdf is f(x) = 0.5^{x+1}. The payoff from this game is Y = 2^X; thus, the expected payoff is
E[2^X] = Σ_{x=0}^{∞} 2^x × 0.5^{x+1} = Σ_{x=0}^{∞} 0.5 = ∞.
This is quite amazing! This example is famous in probability theory; it is known as the St. Petersburg Paradox.

3.1 Properties of expectation

• E[c φ(X)] = c E[φ(X)].
• E[φ_1(X) + φ_2(X)] = E[φ_1(X)] + E[φ_2(X)].

Definition 8. The variance of a discrete random variable X, which we denote Var(X), is the probability-weighted average of the squared deviations of X from E(X) = µ, i.e.,
Var(X) = E[(X − µ)^2] = Σ_{x ∈ X(S)} (x − µ)^2 f(x).
The variance of X is often called the population variance and denoted by σ².

Example 10. If X is a Bernoulli random variable with success probability p, then Var(X) = E[(X − µ)^2] = p(1 − p).

Theorem 1. If X is a discrete random variable, then Var(X) = E[X²] − (E[X])².

Example 11.
Suppose that X is a random variable whose possible values are X(S) = {2, 3, 5, 10}, and suppose that the probability of each of these values is given by the formula f(x) = P(X = x) = x/20. Then the expected value is µ = E[X] = 6.9, the variance is σ² = Var(X) = 10.39, and the standard deviation is σ ≈ 3.2234.

Theorem 2. Let X denote a discrete random variable and suppose that c ∈ R is a constant. Then Var(cX) = c² Var(X). If the discrete random variables X_1 and X_2 are independent, then Var(X_1 + X_2) = Var(X_1) + Var(X_2).

3.2 Binomial distribution

Suppose that a fair coin is tossed twice and the number of Heads (H) is counted. Let Y denote the total number of Hs. Because the sample space has 4 equally likely outcomes, S = {HH, HT, TH, TT}, the pdf of Y is given by f(0) = P(Y = 0) = 0.25, f(1) = P(Y = 1) = 0.5, f(2) = P(Y = 2) = 0.25, and f(y) = 0 if y ∉ Y(S) = {0, 1, 2}. Now we state the formal definition of the binomial random variable.

Definition 9. Let X_1, . . . , X_n be mutually independent Bernoulli trials, each with success probability p. Then
Y = Σ_{i=1}^{n} X_i
is a binomial random variable, denoted Y ∼ Bin(n, p). The mean of Y is np, and the variance of Y is np(1 − p).

The general formula for the binomial pdf is
f(j) = P(Y = j) = C(n, j) p^j (1 − p)^{n−j},   (3.1)
and the binomial cdf is
F(k) = P(Y ≤ k) = Σ_{j=0}^{k} C(n, j) p^j (1 − p)^{n−j}.

Example 12. In 12 trials with success probability 0.5, what is the probability that no more than 4 successes will be observed? That is, P(Y ≤ 4) = F(4). In 15 trials with success probability 0.8, what is the probability that at least 4 but no more than 10 successes will be observed? That is, P(4 ≤ Y ≤ 10) = F(10) − F(3).

Theorem 3. If, in (3.1), we let n → ∞ and p → 0 in such a way that λ = np remains constant, then
lim P(Y = j) = lim C(n, j) p^j (1 − p)^{n−j} = e^{−λ} λ^j / j!.
In this case, we call Y a Poisson random variable, with E[Y] = λ and Var(Y) = λ.

4 Exercises

1.
Suppose that a weighted die is tossed. Let X denote the number of dots that appear on the upper face of the die, and suppose that P(X = x) = (7 − x)/20 for x = 1, 2, 3, 4, 5 and P(X = 6) = 0. Determine each of the following:
(a) the probability density function (pdf) of X;
(b) the cumulative distribution function (cdf) of X;
(c) the expected value (mean) of X;
(d) the variance of X.

2. Paul is planning a dinner party at which he will be able to accommodate 7 guests. From past experience, he knows that each person invited to the party will accept his invitation with probability 0.5. He also knows that each person who accepts will actually attend with probability 0.8. Suppose that Paul invites 12 people. Assuming that they behave independently of each other, what is the probability that he will end up with more guests than he can accommodate?

3. If X ∼ Bin(n, p), find the pdf of Y = n − X.

4. John teaches 2 sections of Statistics for a total of 1500 students. Each of his students spins a penny 89 times and counts the number of Heads. Assuming that each of these 1500 pennies has P(H) = 0.3 for a single spin, what is the probability that John will encounter at least 1 student who observes no more than 2 Heads?

5. Problems 3.3.6, 3.3.10, 3.3.14, and 3.3.15 in LM06.
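The binomial and Poisson computations of Section 3.2 (for instance, F(4) in Example 12, and the limit in Theorem 3) can be checked numerically. A short sketch using only the Python standard library; the function names are my own:

```python
from math import comb, exp, factorial

def binom_pdf(j, n, p):
    """f(j) = C(n, j) p^j (1 - p)^(n - j), as in (3.1)."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

def binom_cdf(k, n, p):
    """F(k) = sum of f(j) for j = 0, ..., k."""
    return sum(binom_pdf(j, n, p) for j in range(k + 1))

def poisson_pdf(j, lam):
    """e^(-lambda) lambda^j / j!, the limit in Theorem 3."""
    return exp(-lam) * lam**j / factorial(j)

# Example 12: 12 trials, p = 0.5, P(Y <= 4) = F(4).
print(round(binom_cdf(4, 12, 0.5), 4))   # 0.1938

# Theorem 3: Bin(n, lambda/n) probabilities approach Poisson(lambda) as n grows.
lam = 2.0
for n in (10, 100, 1000):
    print(n, round(binom_pdf(3, n, lam / n), 5), round(poisson_pdf(3, lam), 5))
```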