Lecture 10: Discrete Random Variables

1. Random Variables

Definition: Let (Ω, F, P) be a probability space. A real-valued random variable is a function X : Ω → R which satisfies the property that for every x ∈ R,

    X⁻¹((−∞, x]) = {ω ∈ Ω : X(ω) ≤ x} ∈ F.    (1)

Remarks:
• X is said to be a measurable mapping from Ω to R if it satisfies (1).
• For the purposes of this course, we will simply assume measurability.
• Formally, the random variable X(ω) depends on a particular outcome ω ∈ Ω, but usually we will suppress the dependence on ω and simply write X.
• Random variables can also take values in more general spaces, e.g., we can define random variables that are Rⁿ-valued, complex-valued, matrix-valued, function-valued, and even probability-measure-valued!

Intuition: We perform an experiment with an outcome ω in a sample space Ω, but we only record some function of the outcome specified by X = X(ω).

Example: Suppose that we toss a coin ten times and record the number of tosses that come up heads. Then our sample space is Ω = {(x₁, …, x₁₀) : xᵢ ∈ {H, T}}, and we can represent the number of heads as a random variable

    X = X((x₁, …, x₁₀)) = #{i : xᵢ = H}.

Definition: The distribution (or law) of a real-valued random variable X is the probability distribution Q on R defined by Q(E) = P{X ∈ E} for any (measurable) subset E ⊂ R. It can be shown that Q is a probability measure on R.

Definition: The cumulative distribution function (or CDF) of a real-valued random variable X is the function F : R → [0, 1] defined by F(x) = P{X ≤ x}. The most important fact about the CDF is that it uniquely determines the distribution of the random variable.

2. Discrete Random Variables

Definition: A real-valued random variable is said to be discrete if it can take on only countably many possible values.
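The coin-tossing example and the definitions above can be made concrete with a short computation. The following is an illustrative sketch (not part of the lecture): it enumerates the finite sample space Ω directly, builds the law of X, and evaluates the CDF; the names `omega`, `law`, and `F` are our own.

```python
from itertools import product
from fractions import Fraction

# Sketch of the coin-toss example: Ω is the set of all 2^10 H/T-sequences,
# P is the uniform measure, and X(ω) counts the heads in the outcome ω.
omega = list(product("HT", repeat=10))

def X(w):
    return sum(1 for x in w if x == "H")   # X(ω) = #{i : x_i = H}

# Distribution (law) of X: law[k] = Q({k}) = P{X = k}
p_outcome = Fraction(1, len(omega))        # each outcome ω has probability 1/1024
law = {}
for w in omega:
    law[X(w)] = law.get(X(w), 0) + p_outcome

# CDF: F(x) = P{X ≤ x}, a step function whose jump at k equals law[k]
def F(x):
    return sum(q for k, q in law.items() if k <= x)
```

For instance, `law[5]` comes out to 252/1024 (there are C(10, 5) = 252 sequences with five heads), and `F(10)` equals 1, confirming that the probability mass sums to one.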
A discrete random variable is uniquely determined by its probability mass function p(a) = P{X = a}, which can be positive for at most a countably infinite set of values and is zero elsewhere. If X can only take on the values x₁, x₂, …, then

    p(xᵢ) ≥ 0 for i = 1, 2, …,
    p(x) = 0 for all other values of x,
    Σᵢ₌₁^∞ p(xᵢ) = 1.

Example (Ross, 2a): Find C so that the function p(n) defined below is the probability mass function of a random variable taking values in the non-negative integers:

    p(n) = C λⁿ/n!.

Answer: We need the probabilities to sum to 1, so

    1 = Σₙ≥₀ p(n) = C Σₙ≥₀ λⁿ/n! = C e^λ,

which shows that

    C = e^{−λ}.    (2)

C is said to be a normalizing constant, and the specific distribution defined by this probability mass function is called the Poisson distribution with parameter λ (of which much more later in the course).

Remark: The cumulative distribution function is related to the probability mass function via the following formula:

    F(x) = P{X ≤ x} = Σ_{y ≤ x} p(y).

Notice that F is a step function, and that the step size at a point x is equal to the probability mass at that point.

3. Expectation

Definition: Let X be a discrete random variable with probability mass function p(x). The expected value (or expectation or mean) of X is defined to be

    E[X] = Σₓ p(x)·x.

Later in the course we will show that if we have an infinite sequence of independent random variables, say (Xₙ; n ≥ 1), all with the same distribution as X, then the partial sums converge to the expected value of X:

    (1/N) Σᵢ₌₁^N Xᵢ → E[X].

Example (Ross, 3a): Let X be the number obtained by rolling a fair die. Then

    E[X] = Σᵢ₌₁⁶ (1/6)·i = 7/2.

Thus the expected value of a random variable X need not belong to the set of values that can be assumed by X.

Example (Ross, 3b): Recall that the indicator function of an event A is the function I_A : Ω → {0, 1} defined by

    I_A(ω) = 1 if ω ∈ A,
    I_A(ω) = 0 if ω ∉ A.
Then I_A is a discrete random variable with expected value equal to

    E[I_A] = P(A)·1 + (1 − P(A))·0 = P(A).

4. Expectation of a Function of a Random Variable

Remark: One way to obtain new random variables is to compose a real-valued function with an existing random variable. Let X be a discrete random variable that takes on values xᵢ, i ≥ 1 with probabilities p(xᵢ), and let g be a real-valued function. Then Y = g(X) is a discrete random variable with values in some countable set {yᵢ; i ≥ 1} and probability mass function

    p(yᵢ) = Σ_{xⱼ : g(xⱼ) = yᵢ} p(xⱼ).

Notice that more than one value xⱼ may contribute to the probability of each point yᵢ if g is not one-to-one. The next result is sometimes known as the Law of the Unconscious Statistician.

Proposition 4.1: Let X be a discrete random variable that takes on values xᵢ, i ≥ 1 with probabilities p(xᵢ). Then, for any real-valued function g,

    E[g(X)] = Σᵢ p(xᵢ)g(xᵢ).

Proof: Let g(X) take on the values yⱼ, j ≥ 1 with probabilities P(g(X) = yⱼ). Then

    E[g(X)] = Σⱼ yⱼ P(g(X) = yⱼ)
            = Σⱼ yⱼ Σ_{i : g(xᵢ) = yⱼ} p(xᵢ)
            = Σⱼ Σ_{i : g(xᵢ) = yⱼ} yⱼ p(xᵢ)
            = Σⱼ Σ_{i : g(xᵢ) = yⱼ} g(xᵢ)p(xᵢ)
            = Σᵢ p(xᵢ)g(xᵢ).

A simple, but very useful, consequence of this proposition is:

Corollary 4.1: If a and b are constants, then E[aX + b] = aE[X] + b.

Caveat: For non-linear functions g, it is generally true that

    E[g(X)] ≠ g(E[X]),

i.e., we cannot interchange the composition of functions with the taking of expectations.

Example: Let X be a random variable with P(X = 1) = P(X = −1) = 1/2. Then

    E[Xⁿ] = 1 if n is even,
    E[Xⁿ] = 0 if n is odd.

However, E[X]ⁿ = 0 for all n ≥ 1.

5. Variance

It is often useful to have a measure of the dispersion or spread of a random variable about its expected value. One such measure is provided by the variance.

Definition: Let X be a random variable with mean µ = E[X]. Then the variance of X is the quantity

    Var(X) = E[(X − µ)²].
In other words, the variance of X measures the mean square deviation of X from its expectation. This is often most easily calculated using the following formula:

    E[(X − µ)²] = E[X² − 2µX + µ²]
                = E[X²] − 2µE[X] + µ²
                = E[X²] − (E[X])².

Example (Ross, 5a): Let X be the outcome when a fair die is rolled. Then

    Var(X) = E[X²] − (7/2)²
           = (1/6) Σᵢ₌₁⁶ i² − 49/4
           = 91/6 − 49/4
           = 35/12.

A useful identity related to Corollary 4.1 is that

    Var(aX + b) = a² Var(X)

whenever a and b are constants. In other words, a change of scale but not of location changes the variance of a random variable.
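As a quick check on the die example and the scaling identity, the variance of a finite discrete random variable can be computed directly from its probability mass function. The sketch below uses exact rational arithmetic; the helper names `expectation`, `variance`, and `affine` are our own, not from the lecture.

```python
from fractions import Fraction

# A pmf is represented as a dict {value: probability}.

def expectation(pmf):
    # E[X] = Σ_x p(x)·x
    return sum(p * x for x, p in pmf.items())

def variance(pmf):
    # Var(X) = E[(X − µ)²], with µ = E[X]
    mu = expectation(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())

def affine(pmf, a, b):
    # pmf of aX + b: push each value x to a·x + b, merging collisions
    out = {}
    for x, p in pmf.items():
        out[a * x + b] = out.get(a * x + b, 0) + p
    return out

die = {i: Fraction(1, 6) for i in range(1, 7)}   # fair die
```

Here `expectation(die)` gives 7/2 and `variance(die)` gives 35/12, matching Example (Ross, 5a), and `variance(affine(die, 2, 1))` equals 4·Var(X), illustrating Var(aX + b) = a² Var(X).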