Lecture 2. Discrete random variables
Mathematical Statistics and Discrete Mathematics
November 4th, 2015

Random variables

The setting that we start with is a sample space S and a probability measure P on the events contained in S.

A random variable is an assignment of a real number to each outcome in S. In other words, X is a function defined on S and taking values in R (we denote this by X : S → R). The value taken by X for the outcome ω is denoted by X(ω).

Let S be the combined sample space for three tosses of a die, and let ω = (ω1, ω2, ω3) ∈ S be a configuration of three outcomes. Then the following are random variables:
• X0 = 2, a constant random variable.
• X1 = sum of all outcomes; X1(ω) = ω1 + ω2 + ω3, X1(2, 5, 3) = 10.
• X2 = maximal outcome; X2(ω) = max{ω1, ω2, ω3}, X2(2, 5, 3) = 5.
• X3 = third outcome; X3(ω) = ω3, X3(2, 5, 3) = 3.
• X4 = index of the smallest outcome; X4(2, 5, 3) = 1, since the first outcome is the smallest.

Discrete random variables

A set K ⊂ R is countably infinite if one can list all elements of K in an infinite sequence, that is, one can write K = {k1, k2, k3, . . .}. Infinite sets for which this is not possible are called uncountable.
• The sets of natural numbers and of integers are countably infinite.
• Any interval (a, b) with a < b is uncountable.

A random variable X is called discrete if it takes values in a finite or countably infinite subset of R.
• Toss a coin until the first heads, and let X1 be the total number of tosses. The set of possible values of X1 is {1, 2, 3, . . .}, and therefore X1 is discrete.
• Take a sample of sea water, and let X2 be the amount of salt per unit of weight. The set of possible values of X2 is [0, 1), and therefore X2 is not discrete.

Discrete probability density function

We will use capital letters X, Y, Z, etc. to denote random variables, and small letters x, y, z, etc. for their observed values.

Knowing all possible values is not enough to describe a random variable. We also need to know how probable it is that the variable attains each of these values.

The discrete probability density function (PDF) fX : R → [0, 1] of a discrete random variable X is defined by
fX(x) = P(X = x) = P({ω : X(ω) = x}).

We say that x is a possible value for a discrete random variable X if fX(x) = P(X = x) > 0.

The probability density function fX contains all information about the random variable X. We say that two variables have the same distribution if they have the same PDF.

Example: toss a biased coin with P(heads) = 1/3 three times, and let X be the total number of heads. The only possible values are 0, 1, 2, 3. We have
• fX(0) = P(X = 0) = P(ttt) = P(t)³ = 8/27,
• fX(1) = P(X = 1) = P({htt, tht, tth}) = 3 P(t)² P(h) = 3 · (2/3)² · 1/3 = 4/9,
• fX(2) = P(X = 2) = P({hht, hth, thh}) = 3 P(h)² P(t) = 3 · (1/3)² · 2/3 = 2/9,
• fX(3) = P(X = 3) = P(hhh) = P(h)³ = 1/27.

[Figure: bar chart of the PDF fX(x) at x = 0, 1, 2, 3.]

Properties of discrete PDF

A nonnegative function f defined on the real numbers is a PDF of some discrete random variable if and only if the following two conditions hold:
• the set K = {x : f(x) > 0} is finite or countably infinite,
• Σ_{x ∈ K} f(x) = 1.

Moreover, if X is a discrete random variable whose PDF is f, then for any set T ⊂ R,
P(X ∈ T) = Σ f(x), where the sum runs over all possible values x of X in T.

Example: toss a biased coin with P(heads) = 1/3 three times, and let X be the total number of heads. We have
P(X ≤ 2.5) = P(X ∈ [0, 2.5]) = fX(0) + fX(1) + fX(2) = 26/27.
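As a quick sanity check of this example, here is a minimal Python sketch that enumerates all eight outcomes of the three tosses under the assumption P(h) = 1/3, P(t) = 2/3 (the names f, outcome, p_h, p_t are just illustrative choices). It reproduces the values fX(0), . . . , fX(3), confirms that the PDF sums to 1, and recovers P(X ≤ 2.5) = 26/27.

```python
from fractions import Fraction
from itertools import product

# Biased coin: P(heads) = 1/3, P(tails) = 2/3, tossed 3 times.
p_h, p_t = Fraction(1, 3), Fraction(2, 3)

# Enumerate all 8 outcomes and accumulate fX(x) = P(X = x),
# where X is the total number of heads.
f = {x: Fraction(0) for x in range(4)}
for outcome in product("ht", repeat=3):
    prob = Fraction(1)
    for toss in outcome:
        prob *= p_h if toss == "h" else p_t
    f[outcome.count("h")] += prob

print(f)                    # fX(0) = 8/27, fX(1) = 4/9, fX(2) = 2/9, fX(3) = 1/27
print(sum(f.values()))      # 1, the PDF sums to one
print(f[0] + f[1] + f[2])   # 26/27, i.e. P(X <= 2.5)
```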
Cumulative distribution function

The cumulative distribution function (CDF) FX : R → [0, 1] of a random variable X is defined by
FX(x) = P(X ≤ x).

Example: toss a biased coin with P(heads) = 1/3 three times, and let X be the total number of heads.

[Figure: plot of the CDF for the biased-coin example.]

Expectation

Main motivating question: what is the average value that a random variable assumes?
• Let us conduct a simple experiment: toss a die 6 million times, and let x̄ be the arithmetic average of all outcomes.
• We want to know x̄.
• Since each outcome is equally probable and we toss the die an enormous number of times, there should be no bias towards any of the possible outcomes.
• This means that each outcome appears roughly 1 million times, and hence
  x̄ ≈ (10⁶ · 1 + 10⁶ · 2 + 10⁶ · 3 + 10⁶ · 4 + 10⁶ · 5 + 10⁶ · 6) / (6 · 10⁶) = 3.5.
• More generally, for non-uniform distributions, the number of appearances of each outcome should be proportional to its probability.

The expectation, expected value, mean value, or simply the mean E[X] of a discrete random variable X is defined by
E[X] = μX = Σ x · P(X = x) = Σ x · fX(x), where the sums run over all possible values x of X.
• Toss a die and let Y be the outcome. Then E[Y] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.
• Toss a biased coin with P(heads) = 1/3 three times, and let X be the total number of heads. Then
  E[X] = 0 · fX(0) + 1 · fX(1) + 2 · fX(2) + 3 · fX(3) = 4/9 + 2 · 2/9 + 3 · 1/27 = 1.

Properties of expectation:
• If X and Y have the same distribution, then E[X] = E[Y]. Expectation is a characteristic of the distribution.
The following is known as linearity of expectation:
• E[c] = c for a constant c.
• E[aX] = a E[X] for any random variable X and a constant a.
• E[X + Y] = E[X] + E[Y] for any pair of random variables X and Y.
• More generally, E[X1 + X2 + . . . + Xn] = E[X1] + E[X2] + . . . + E[Xn] for any choice of random variables X1, X2, . . . , Xn.
Exercise: check the facts above using the definition of expectation.

Example: toss a biased coin with P(heads) = 1/3 three times, and let X be the total number of heads. For i = 1, 2, 3, let Xi = 1 if the outcome of the ith toss is heads, and Xi = 0 otherwise. We have X = X1 + X2 + X3. Each Xi, i = 1, 2, 3, describes the same kind of experiment, and hence X1, X2, and X3 have the same distribution. We have
E[X1] = E[X2] = E[X3] = 0 · 2/3 + 1 · 1/3 = 1/3.
Using linearity of expectation, we get E[X] = E[X1] + E[X2] + E[X3] = 1. There is no need to count all possible configurations as before!

Expectations of functions of random variables

Let X be a discrete random variable and let H : R → R be a function. The expectation, expected value, mean value, or simply the mean E[H(X)] of H(X) is defined by
E[H(X)] = Σ H(x) · P(X = x) = Σ H(x) · fX(x), where the sums run over all possible values x of X.

Example: toss a die and let Y be the outcome. Let H(x) = x². We have
E[H(Y)] = E[Y²] = (1² + 2² + 3² + 4² + 5² + 6²)/6 ≈ 15.17.

Variance

Main motivating question: how far from its expectation is a random variable on average?

The variance of a random variable X is defined to be
Var[X] = σX² = E[(X − E[X])²] = E[(X − μX)²].

It is usually easier to use the following computational formula:
Var[X] = E[X²] − (E[X])².

Example: toss a die and let Y be the outcome. We have
Var[Y] = E[Y²] − (E[Y])² ≈ 15.17 − (3.5)² ≈ 2.92.

The standard deviation of a random variable X is defined to be
σX = √(σX²) = √Var[X].
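To make these formulas concrete, here is a small Python sketch for the fair-die example (exact arithmetic via the standard fractions module; the variable names mean, mean_sq, var, std are just illustrative). It evaluates E[Y], E[Y²], the variance via the computational formula, and the standard deviation.

```python
from fractions import Fraction

# Fair die: fY(y) = 1/6 for y = 1, ..., 6.
f = {y: Fraction(1, 6) for y in range(1, 7)}

# E[Y] and E[Y^2] from the definition E[H(Y)] = sum of H(y) * fY(y).
mean    = sum(y * p for y, p in f.items())      # 7/2 = 3.5
mean_sq = sum(y**2 * p for y, p in f.items())   # 91/6, about 15.17

# Computational formula Var[Y] = E[Y^2] - (E[Y])^2, then the standard deviation.
var = mean_sq - mean**2                         # 35/12, about 2.92
std = float(var) ** 0.5                         # about 1.71

print(mean, mean_sq, var, std)
```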
We say that two random variables X and Y are independent if knowledge of the values assumed by X gives no information about the values of Y, and vice versa.

Properties of variance:
• If X and Y have the same distribution, then Var[X] = Var[Y]. Variance is a characteristic of the distribution.
• Var[X] ≥ 0 for any random variable X.
• Var[c] = 0 for a constant c.
• Var[aX] = a² Var[X] for a constant a and a random variable X.
• Var[X + Y] = Var[X] + Var[Y] for any pair of independent random variables X and Y.
• In particular, Var[X + c] = Var[X] for any random variable X and a constant c.

Important distributions based on a coin toss

A trial is an experiment with two possible outcomes: success or failure. From a probabilistic point of view, a trial is nothing more than a toss of a (possibly biased) coin.
• Bernoulli trial: a single trial.
• Geometric distribution: the time until the first success in a sequence of independent Bernoulli trials.
• Binomial distribution: the number of successes in a sequence of independent Bernoulli trials.

Bernoulli trial (Bernoulli(p))

A random variable X has the Bernoulli distribution with parameter p ∈ [0, 1] if
P(X = 1) = p = 1 − P(X = 0).
We denote this by writing X ~ Bernoulli(p). We interpret the event {X = 1} as {success} and {X = 0} as {failure}.

Properties:
• fX(1) = p, fX(0) = 1 − p,
• E[X] = p,
• Var[X] = p(1 − p).

Geometric distribution (Geom(p))

A random variable X has the geometric distribution with parameter p ∈ (0, 1] if
fX(x) = (1 − p)^(x−1) p, for x = 1, 2, 3, . . . .
We denote this by writing X ~ Geom(p).

Properties:
• E[X] = 1/p,
• Var[X] = (1 − p)/p²,
• FX(x) = 1 − (1 − p)^⌊x⌋ for x ≥ 0, where ⌊x⌋ is the floor of x, that is, x rounded down to an integer.

One should think of X ~ Geom(p) as the time of the first success in an infinite sequence of independent Bernoulli(p) trials.

Example: suppose that you play a 6-out-of-49 lottery every week, and let X be the first time you win since you started playing. How long do you wait on average? We have X ~ Geom(1/C(49, 6)), where C(49, 6) is the binomial coefficient "49 choose 6". Hence
E[X] = C(49, 6) weeks ≈ 14 million weeks ≈ 270 thousand years.

Binomial distribution (Binom(n, p))

A random variable X has the binomial distribution with parameters p ∈ [0, 1] and n = 1, 2, 3, . . . if
fX(x) = C(n, x) p^x (1 − p)^(n−x), for x = 0, 1, . . . , n.
We denote this by writing X ~ Binom(n, p).

Properties:
• E[X] = np,
• Var[X] = np(1 − p),
• FX(x) = Σ_{i=0}^{⌊x⌋} C(n, i) p^i (1 − p)^(n−i), for x ≥ 0.

One should think of X ~ Binom(n, p) as the total number of successes in a sequence of n independent Bernoulli(p) trials.

Example: toss a die 12 times and let X be the total number of 5's. Then X ~ Binom(12, 1/6), and, for example, E[X] = 12/6 = 2.
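The closed-form properties above can be checked numerically. The Python sketch below (the names p_win, mean_wait, and pdf are illustrative choices) computes the lottery waiting time E[X] = C(49, 6) weeks for the geometric example, and verifies for X ~ Binom(12, 1/6) that the PDF sums to 1 and that E[X] = np = 2.

```python
from fractions import Fraction
from math import comb

# Geometric example: 6-out-of-49 lottery, success probability p = 1 / C(49, 6).
p_win = Fraction(1, comb(49, 6))
mean_wait = 1 / p_win                       # E[X] = 1/p = C(49, 6) = 13983816 weeks
print(mean_wait, float(mean_wait) / 52)     # roughly 270 thousand years

# Binomial example: X ~ Binom(12, 1/6), the number of 5's in 12 die tosses.
n, p = 12, Fraction(1, 6)
pdf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

print(sum(pdf.values()))                    # 1, the PDF sums to one
print(sum(x * q for x, q in pdf.items()))   # 2, which equals n * p
```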