
Random Variables & Expectation

Random Variable

A random variable (r.v.) is a well-defined rule for assigning a numerical value to every possible outcome of an experiment.

Example:
experiment: taking a course
outcomes: grades A, B, C, D, F
sample space S: discrete & finite
random variable:
Y = 4 if grade is A
Y = 3 if grade is B
Y = 2 if grade is C
Y = 1 if grade is D
Y = 0 if grade is F

Experiment: throw 2 dice

What are the possible outcomes?

1,1  2,1  3,1  4,1  5,1  6,1
1,2  2,2  3,2  4,2  5,2  6,2
1,3  2,3  3,3  4,3  5,3  6,3
1,4  2,4  3,4  4,4  5,4  6,4
1,5  2,5  3,5  4,5  5,5  6,5
1,6  2,6  3,6  4,6  5,6  6,6

Define the random variable X to be the sum of the dots on the 2 dice.

For which outcomes does X = 9? The four outcomes 6,3  5,4  4,5  3,6.

What is Pr(X=9)? Since there are 36 equally likely outcomes, each has a probability of 1/36.
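The enumeration of the two-dice sample space can also be done mechanically. The following is a minimal sketch, assuming fair dice; the names `sample_space`, `X`, and `nine` are illustrative choices and not part of the original slides.

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice: all ordered pairs (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# Assumption: the dice are fair, so every outcome is equally likely.
prob_each = Fraction(1, len(sample_space))

def X(outcome):
    """The random variable X: the sum of the dots on the two dice."""
    return outcome[0] + outcome[1]

# Outcomes for which X = 9, and the probability of that event.
nine = [outcome for outcome in sample_space if X(outcome) == 9]
pr_nine = len(nine) * prob_each

print(len(sample_space), prob_each)
print(nine, pr_nine)
```

Using `Fraction` keeps the probabilities exact, so they can be compared directly with the fractions written on the slides.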
So, since 4 outcomes yield X = 9, Pr(X=9) = 4/36 = 1/9.

Let's calculate the probabilities of all the possible values x of the random variable X.

| x | Pr(X=x) |
|---|---|
| 2 | 1/36 |
| 3 | 2/36 |
| 4 | 3/36 |
| 5 | 4/36 |
| 6 | 5/36 |
| 7 | 6/36 |
| 8 | 5/36 |
| 9 | 4/36 |
| 10 | 3/36 |
| 11 | 2/36 |
| 12 | 1/36 |

Let's graph the probability distribution of X.

[Graph: bar chart of Pr(X=x) against x for x = 2, ..., 12, peaking at 6/36 when x = 7.]

Pr(X=x) = f(x) = p(x), as described in this table or graph, is called the probability distribution or probability mass function (p.m.f.).

Properties of Probability Distributions

1. 0 ≤ Pr(X=x) ≤ 1 for all x
2. $\sum_x p(x) = 1$

Cumulative Mass Function

$F(x_0) = \Pr(X \le x_0) = \sum_{x \le x_0} p(x)$

Cumulative Mass Function (2 dice problem)

| x | Pr(X=x) | Pr(X≤x) |
|---|---|---|
| 2 | 1/36 | 1/36 |
| 3 | 2/36 | 3/36 |
| 4 | 3/36 | 6/36 |
| 5 | 4/36 | 10/36 |
| 6 | 5/36 | 15/36 |
| 7 | 6/36 | 21/36 |
| 8 | 5/36 | 26/36 |
| 9 | 4/36 | 30/36 |
| 10 | 3/36 | 33/36 |
| 11 | 2/36 | 35/36 |
| 12 | 1/36 | 1 |

[Graph: step plot of F(x) against x, rising from 0 to 1 over x = 2, ..., 12.]

Expectation, Expected Value, or Mean of a Random Variable

$\mu = E(X) = \sum_x x\,p(x)$

Notice the similarity of the definitions of the mean of a random variable & the mean of a frequency distribution for a population.

random variable: $\mu = E(X) = \sum_x x\,p(x)$

pop. freq. distrib.: $\mu = (1/N)\sum_{i=1}^{c} x_i f_i = \sum_{i=1}^{c} x_i \left(\frac{f_i}{N}\right)$

Recall that probability [p(x)] is the relative frequency [f/N] with which something occurs over the long run. So these definitions are saying the same thing.

Example: Suppose that a stock broker wants to estimate the price of a certain stock one year from now. If the probability mass function of the price in a year is as given, determine the expected price.
| x = price in one year | p(x) | xp(x) |
|---|---|---|
| 94 | 0.25 | 23.5 |
| 98 | 0.25 | 24.5 |
| 102 | 0.25 | 25.5 |
| 106 | 0.25 | 26.5 |
| Total | 1.00 | 100.0 |

The expected price is E(X) = 100.0.

Notice that you do NOT divide by the number of observations when you're done adding. Also, the probabilities do not have to be equal; they just have to add up to one.

Theorem: Suppose that g(X) is a function of a random variable X, & the probability mass function of X is $p_X(x)$. Then the expected value of g(X) is

$E[g(X)] = \sum_x g(x)\,p_X(x)$

Example: Suppose Y = g(X) = X² & the distribution of X is as given below. Determine the mean of Y by using 1. the definition of expected value, & 2. the previous theorem.

| x | -2 | -1 | 1 | 2 |
|---|---|---|---|---|
| p(x) | 0.1 | 0.2 | 0.3 | 0.4 |

1. Using the definition of expected value. First find the distribution of Y: Y = 1 when X is -1 or 1 (probability 0.2 + 0.3 = 0.5), and Y = 4 when X is -2 or 2 (probability 0.1 + 0.4 = 0.5).

| y | p(y) | yp(y) |
|---|---|---|
| 1 | 0.5 | 0.5 |
| 4 | 0.5 | 2.0 |

E(Y) = 2.5

2. Using the theorem. Work directly with the distribution of X.

| x | p(x) | y = x² | y p_X(x) |
|---|---|---|---|
| -2 | 0.1 | 4 | 0.4 |
| -1 | 0.2 | 1 | 0.2 |
| 1 | 0.3 | 1 | 0.3 |
| 2 | 0.4 | 4 | 1.6 |

E(Y) = 2.5

Definition: Variance of a random variable X

$\sigma^2 = V(X) = E[(X-\mu)^2] = \sum_x (x-\mu)^2\,p(x)$

Theorem: The variance of X can also be calculated as follows:

$\sigma^2 = V(X) = E(X^2) - [E(X)]^2$

Standard Deviation of a random variable X

$\sigma = \sqrt{V(X)} = \sqrt{\sigma^2}$

Example: Suppose sales at a donut shop are distributed as below. Calculate (a) the mean number of donuts sold, (b) the variance (using both the definition of the variance & the theorem), & (c) the standard deviation.

| x | p(x) |
|---|---|
| 1 | 0.08 |
| 2 | 0.27 |
| 4 | 0.10 |
| 6 | 0.33 |
| 12 | 0.22 |

First, the mean, then the variance using the definition, $\sigma^2 = V(X) = E[(X-\mu)^2] = \sum_x (x-\mu)^2 p(x)$, and then the variance using the theorem, $V(X) = E(X^2) - [E(X)]^2$. All of the columns fit in one table:

| x | p(x) | xp(x) | x-μ | (x-μ)² | (x-μ)²p(x) | x² | x²p(x) |
|---|---|---|---|---|---|---|---|
| 1 | 0.08 | 0.08 | -4.64 | 21.53 | 1.72 | 1 | 0.08 |
| 2 | 0.27 | 0.54 | -3.64 | 13.25 | 3.58 | 4 | 1.08 |
| 4 | 0.10 | 0.40 | -1.64 | 2.69 | 0.27 | 16 | 1.60 |
| 6 | 0.33 | 1.98 | 0.36 | 0.13 | 0.04 | 36 | 11.88 |
| 12 | 0.22 | 2.64 | 6.36 | 40.45 | 8.90 | 144 | 31.68 |
| Total | | μ = 5.64 | | | σ² = 14.51 | | E(X²) = 46.32 |

By the definition, σ² = 14.51.

By the theorem, σ² = V(X) = E(X²) - [E(X)]² = 46.32 - (5.64)² = 14.51.

And lastly, the standard deviation, by taking the square root of the variance: σ = 3.81.

Important Theorem

If X has mean μ and variance σ², then (X-μ)/σ has mean 0 and variance 1.

Example: (G-μ)/σ

Suppose your course grades have a mean of 2.7 and a standard deviation of 1.2. Suppose you took your grades, subtracted 2.7 from each one, then divided those results by 1.2. The new set of numbers would have a mean of 0 and a standard deviation of 1.

Expectation Rules

Let k, a, & b be constants.

1. E(k) = k. The mean of a constant is the constant.
2. V(k) = 0. The variance of a constant is zero.
3. E(a + bX) = a + b E(X)
4. V(a + bX) = b² V(X)

Example: If X has a mean of 3 and a variance of 2/3, what are the mean and variance of Y = 5 + 2X?

First find the mean E(Y) = E(5+2X). The rule is E(a + bX) = a + b E(X). Let a = 5 & b = 2. Then just plug into the formula. So, E(Y) = E(5+2X) = 5 + 2 E(X) = 5 + 2(3) = 11.

Next find the variance V(Y) = V(5+2X). The rule is V(a + bX) = b² V(X). Again let a = 5 and b = 2 and just plug into the formula. V(Y) = V(5+2X) = 2² V(X) = 4 V(X) = 4(2/3) = 8/3.
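The variance definition, the variance theorem, the standardization theorem, and rules 3 and 4 can all be checked numerically on the donut-shop distribution. This is a minimal sketch rather than part of the original slides; the helper `expectation` and the variable names are illustrative, and the choice a = 5, b = 2 simply mirrors the Y = 5 + 2X example.

```python
from math import sqrt

# Donut-shop distribution from the example: value -> probability.
pmf = {1: 0.08, 2: 0.27, 4: 0.10, 6: 0.33, 12: 0.22}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)
var_by_definition = expectation(pmf, lambda x: (x - mean) ** 2)
var_by_theorem = expectation(pmf, lambda x: x ** 2) - mean ** 2
std = sqrt(var_by_theorem)

# Rules 3 and 4 with a = 5, b = 2, applied to this X (illustrative choice).
a, b = 5, 2
mean_y = expectation(pmf, lambda x: a + b * x)
var_y = expectation(pmf, lambda x: (a + b * x - mean_y) ** 2)

# Standardized variable Z = (X - mean) / std.
mean_z = expectation(pmf, lambda x: (x - mean) / std)
var_z = expectation(pmf, lambda x: ((x - mean) / std - mean_z) ** 2)

print(round(mean, 2), round(var_by_definition, 2), round(std, 2))
print(round(mean_y, 2), round(var_y, 2), round(var_z, 2))
```

The first printed line should agree with the table values for μ, σ², and σ up to rounding; the second shows the shifted mean, the variance scaled by b², and the unit variance of the standardized variable.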
Notice that the constant term shifts the mean but has no effect on the spread of the distribution.

Joint Probability Distribution for 2 Discrete Random Variables X & Y

p(x,y) = Pr(X=x and Y=y)

Properties of Joint Probability Distributions

1. 0 ≤ p(x,y) ≤ 1 for all x and y
2. $\sum_x \sum_y p(x,y) = 1$

Example: Consider the following joint distribution of the number of jobs & the number of promotions of college graduates in their 1st 5 years out of college.

| Number of jobs (x) | y = 1 | y = 2 | y = 3 | y = 4 |
|---|---|---|---|---|
| 1 | 0.10 | 0.15 | 0.12 | 0.06 |
| 2 | 0.05 | 0.07 | 0.10 | 0.05 |
| 3 | 0.04 | 0.02 | 0.14 | 0.10 |

(Columns: number of promotions, y.)

For example, the probability of 3 jobs & 2 promotions is 0.02.

We can determine the marginal distribution of the 2 random variables X & Y just as we did before for 2 events. Just add across the row or down the column. For the probability of 1 job, add across the first row: 0.43. Similarly for the probabilities of 2 or 3 jobs. For the probability of 1 promotion, add down the first column: 0.19. Do the same for the probabilities of 2, 3, or 4 promotions.

| Number of jobs (x) | y = 1 | y = 2 | y = 3 | y = 4 | pX(x): marginal prob. of x |
|---|---|---|---|---|---|
| 1 | 0.10 | 0.15 | 0.12 | 0.06 | 0.43 |
| 2 | 0.05 | 0.07 | 0.10 | 0.05 | 0.27 |
| 3 | 0.04 | 0.02 | 0.14 | 0.10 | 0.30 |
| pY(y): marginal prob. of y | 0.19 | 0.24 | 0.36 | 0.21 | 1.00 |

Notice again that you must get a total of one when you total the marginal probabilities for x and for y.

Conditional Probabilities for Random Variables

Example

The probability that X is 2 given that Y is 3:
pX|Y(2|3) = Pr(X=2|Y=3) = Pr(X=2 & Y=3)/Pr(Y=3).

The probability that Y is 2 given that X is 3:
pY|X(2|3) = Pr(Y=2|X=3) = Pr(Y=2 & X=3)/Pr(X=3).

Let's do the calculations using our previous example.

pX|Y(2|3) = Pr(X=2|Y=3) = Pr(X=2 & Y=3)/Pr(Y=3) = 0.10/0.36 = 0.278.

pY|X(2|3) = Pr(Y=2|X=3) = Pr(Y=2 & X=3)/Pr(X=3) = 0.02/0.30 = 0.067.

Cumulative Joint Mass Function for 2 Discrete Random Variables X & Y

F(x,y) = Pr(X ≤ x and Y ≤ y)

Job/Promotion Example: Find the probability that a person had 2 or fewer jobs & 3 or fewer promotions.

F(2,3) = p(1,1) + p(1,2) + p(1,3) + p(2,1) + p(2,2) + p(2,3)
= 0.10 + 0.15 + 0.12 + 0.05 + 0.07 + 0.10
= 0.59

Independence

Recall that 2 events A & B were independent if Pr(A∩B) = Pr(A) Pr(B).

Similarly, 2 random variables are independent if p(x,y) = pX(x) pY(y) for all values of x & y.

In our previous example, are the number of jobs & number of promotions independent?
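The marginal, conditional, and cumulative calculations for the jobs/promotions table follow one pattern: sum the right cells of the joint table. Here is a minimal sketch of that bookkeeping; the dictionary layout and the helper names `marginal` and `joint_cdf` are assumptions made for illustration, not something taken from the slides.

```python
# Joint p.m.f. from the jobs/promotions table: joint[(x, y)] = p(x, y),
# where x = number of jobs and y = number of promotions.
joint = {
    (1, 1): 0.10, (1, 2): 0.15, (1, 3): 0.12, (1, 4): 0.06,
    (2, 1): 0.05, (2, 2): 0.07, (2, 3): 0.10, (2, 4): 0.05,
    (3, 1): 0.04, (3, 2): 0.02, (3, 3): 0.14, (3, 4): 0.10,
}

def marginal(joint, axis):
    """Sum the joint p.m.f. over the other variable (axis 0 -> X, axis 1 -> Y)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

def joint_cdf(joint, x0, y0):
    """F(x0, y0) = Pr(X <= x0 and Y <= y0)."""
    return sum(p for (x, y), p in joint.items() if x <= x0 and y <= y0)

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)

# Conditional probabilities: joint probability divided by the marginal
# probability of the conditioning value.
p_x2_given_y3 = joint[(2, 3)] / p_y[3]
p_y2_given_x3 = joint[(3, 2)] / p_x[3]

F_2_3 = joint_cdf(joint, 2, 3)

print({x: round(p, 2) for x, p in p_x.items()})
print({y: round(p, 2) for y, p in p_y.items()})
print(round(p_x2_given_y3, 3), round(p_y2_given_x3, 3), round(F_2_3, 2))
```

Rounding is applied only when printing, since floating-point sums such as 0.04 + 0.02 + 0.14 + 0.10 may differ from 0.30 in the last few digits.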
We must have p(x,y) = pX(x) pY(y) for all values of x & y.

To start, does p(1,1) equal pX(1) pY(1)?

p(1,1) = 0.10
pX(1) pY(1) = 0.43 • 0.19 = 0.0817 ≠ 0.10

So X & Y are not independent.

If that case had been equal, we wouldn't be done yet. We'd have to verify that equality held for all the cells.

Theorem: mean of a function of 2 random variables X & Y

$E[g(X,Y)] = \sum_x \sum_y g(x,y)\,p(x,y)$

Suppose that based on the joint distribution of the length X & width Y of lumber sold by a lumberyard, we would like to determine the mean length, mean width, & mean area of the lumber. So we want to calculate E(X), E(Y), and E(XY).

Given the joint distribution below, calculate E(X), E(Y), & E(XY).

| X | Y = 2 | Y = 4 | Y = 6 |
|---|---|---|---|
| 4 | 0.05 | 0.05 | 0.10 |
| 8 | 0.10 | 0.50 | 0.20 |

First, determine the marginal distributions of X and of Y, and check that the marginal distribution probabilities sum to 1.

| X | Y = 2 | Y = 4 | Y = 6 | pX(x) |
|---|---|---|---|---|
| 4 | 0.05 | 0.05 | 0.10 | 0.20 |
| 8 | 0.10 | 0.50 | 0.20 | 0.80 |
| pY(y) | 0.15 | 0.55 | 0.30 | 1.00 |

Next we calculate the mean length & mean width.

For E(X), remember we need to multiply the values by their probabilities and add up. We get the values of X and their probabilities, multiply, and add up.

| x | p(x) | xp(x) |
|---|---|---|
| 4 | 0.20 | 0.80 |
| 8 | 0.80 | 6.40 |
| | | E(X) = 7.20 |

We now have our E(X).

For E(Y), we do the same thing. Get the values of Y and their probabilities, multiply, and add up.

| y | p(y) | yp(y) |
|---|---|---|
| 2 | 0.15 | 0.30 |
| 4 | 0.55 | 2.20 |
| 6 | 0.30 | 1.80 |
| | | E(Y) = 4.30 |

There's our E(Y).

To calculate the mean area E(XY), we use the theorem

$E[g(X,Y)] = \sum_x \sum_y g(x,y)\,p(x,y)$

For the mean area, E(XY), the theorem translates to

$E[XY] = \sum_x \sum_y xy\,p(x,y)$

To keep track of the xy terms, we are going to put them in our table.
Each cell shows p(x,y) followed by the product xy in parentheses.

| X | Y = 2 | Y = 4 | Y = 6 | pX(x) |
|---|---|---|---|---|
| 4 | 0.05 (8) | 0.05 (16) | 0.10 (24) | 0.20 |
| 8 | 0.10 (16) | 0.50 (32) | 0.20 (48) | 0.80 |
| pY(y) | 0.15 | 0.55 | 0.30 | 1.00 |

Next, we need to multiply the xy terms by the corresponding probabilities, and then add it all up.

So we have 0.05 (8) + 0.05 (16) + 0.10 (24) + 0.10 (16) + 0.50 (32) + 0.20 (48) = 30.8 for the mean area.

You might wonder if we could get E(XY) by just multiplying E(X) by E(Y). The answer is: generally not.

In our example, we had E(X) = 7.2, E(Y) = 4.3, & E(XY) = 30.8.
E(X) E(Y) = 30.96, not 30.80. Close in this case, but not the same.

If X and Y are independent, then it is true that E(XY) = E(X) E(Y). It may also hold occasionally in other cases. But generally, it doesn't work.

Definition: Covariance of X & Y

$C(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = \sum_x \sum_y (x-\mu_X)(y-\mu_Y)\,p(x,y)$

What does this mean?

Suppose that two variables tend to move in the same direction, like study time and grades. Then, when x is large, so that it is larger than its mean, x-μX > 0. When x is large, y tends to be large as well, so that y-μY > 0 also. Remember that the p(x,y) values are probabilities and therefore cannot be negative. So in those terms of the formula, (x-μX) is positive, (y-μY) is positive, and p(x,y) is positive. These products are positive.

Similarly, since x and y tend to be small together, we have x-μX < 0 with y-μY < 0 too. In those terms, (x-μX) is negative, (y-μY) is negative, and p(x,y) is positive. These products are positive too.

So we're adding up a lot of positive numbers. What all that means is that when 2 variables tend to move in the same direction, the covariance will be positive.

When 2 variables tend to move in opposite directions, their covariance C(X,Y) < 0, perhaps like party time and grades.

If variables don't tend to move either in the same or opposite directions, their covariance C(X,Y) = 0. This case includes independent variables.
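The theorem for E[g(X,Y)] and the covariance definition above can both be applied to the lumber table with one small helper. The sketch below is illustrative only; the function name `expect` and the dictionary layout are assumptions, while the probabilities are the ones in the lumber example.

```python
# Joint p.m.f. for the lumber example: joint[(x, y)] = p(x, y),
# where x = length and y = width.
joint = {
    (4, 2): 0.05, (4, 4): 0.05, (4, 6): 0.10,
    (8, 2): 0.10, (8, 4): 0.50, (8, 6): 0.20,
}

def expect(joint, g):
    """E[g(X, Y)] = sum over x and y of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mean_x = expect(joint, lambda x, y: x)         # mean length
mean_y = expect(joint, lambda x, y: y)         # mean width
mean_area = expect(joint, lambda x, y: x * y)  # mean area E(XY)

# E(XY) is generally not equal to E(X) * E(Y).
product_of_means = mean_x * mean_y

# Covariance straight from the definition E[(X - mu_X)(Y - mu_Y)].
cov = expect(joint, lambda x, y: (x - mean_x) * (y - mean_y))

print(round(mean_x, 2), round(mean_y, 2), round(mean_area, 2))
print(round(product_of_means, 2), round(cov, 2))
```

For this table the covariance comes out slightly negative, which is consistent with E(XY) = 30.8 falling just below E(X) E(Y) = 30.96.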
It is usually easier to calculate covariances using this theorem. Theorem: C(X,Y) = E(XY) – E(X) E(Y) Returning to the lumber example Remember we had E(X) = 7.2, E(Y) = 4.3, & E(XY) = 30.8 Then the covariance would be C(X,Y) = E(XY) – E(X) E(Y) = (30.8) – (7.2)(4.3) = - 0.16 Difficulty The value of the covariance changes when you change units. That is, you get different answers if you use feet, inches, or meters. So it’s difficult to tell if a particular answer means a strong relationship or not. Fortunately, we have a solution to this problem … Correlation Coefficient The correlation coefficient is similar to the covariance, but it doesn’t vary with the units used. Correlation Coefficient  ( X ,Y )  C ( X ,Y )  X Y The correlation coefficient is denoted by the Greek letter rho, . It’s computed by dividing the covariance of X & Y by the standard deviations of X & of Y. The correlation coefficient is always between -1 and 1. -1 ≤  ≤ 1. Correlation Coefficient -1 ≤  ≤ 1 So, if your correlation coefficient  is close to 1, you have a strong positive relationship. If it is close to -1, you have a strong negative relationship. If it is close to zero, there is no strong linear relationship at all. Back to the lumber example again  ( X ,Y )  C ( X ,Y )  X Y We had C(X,Y) = -0.16. We need the standard deviations of X and Y, which we have not calculated yet. This is what we had for X so far. x p(x) xp(x) 4 0.20 0.80 8 0.80 6.40 E(X) = 7.20 Recall we said previously that we can calculate V(X) as V(X) = E(X2) – [E(X)]2. x p(x) xp(x) We have E(X) but we need E(X2). 4 0.20 0.80 8 0.80 6.40 The theorem E[g(X)] = Sg(x)p(x) gives us E(X) = 7.20 E(X2) = Sx2p(x) E(X2) = Sx2p(x) x p(x) xp(x) x2 4 0.20 0.80 16 8 0.80 6.40 64 E(X) = 7.20 x2p(x) E(X2) = Sx2p(x) x p(x) xp(x) x2 x2p(x) 4 0.20 0.80 16 3.2 8 0.80 6.40 64 51.2 E(X) = 7.20 E(X2) = Sx2p(x) x p(x) xp(x) x2 x2p(x) 4 0.20 0.80 16 3.2 8 0.80 6.40 64 51.2 E(X) = 7.20 E(X2) = 54.4 Now we need to subtract to get V(X). 
V(X) = E(X²) - [E(X)]² = 54.4 - (7.2)² = 54.4 - 51.84 = 2.56

Take the square root to get the standard deviation: σX = 1.60.

We do the same thing with Y: get y², multiply by p(y), add to get E(Y²), and subtract to get V(Y).

y     p(y)    y p(y)    y²    y² p(y)
2     0.15    0.30      4     0.60
4     0.55    2.20      16    8.80
6     0.30    1.80      36    10.80
              E(Y) = 4.30     E(Y²) = 20.20

V(Y) = E(Y²) - [E(Y)]² = 20.20 - (4.3)² = 20.20 - 18.49 = 1.71

Take the square root to get the standard deviation: σY = 1.31.

Now we have everything we need to compute the correlation coefficient for the lumber problem.

$\rho(X,Y) = \dfrac{C(X,Y)}{\sigma_X \sigma_Y} = \dfrac{-0.16}{(1.60)(1.31)} \approx -0.076$

This number is much closer to 0 than it is to -1. So the negative relation between the length & width of the lumber is very weak.

Theorem
1. E(aX + bY) = aE(X) + bE(Y)
2. V(aX + bY) = a²V(X) + b²V(Y) + 2ab[C(X,Y)]

Example: The mean & variance of X are 1 & 5 respectively. The mean & variance of Y are 2 & 6 respectively. The covariance of X & Y is 7. Determine the mean & variance of 4X + 3Y. (The numbers are chosen to illustrate the arithmetic only: with these variances, a covariance cannot actually exceed σX σY = √30 ≈ 5.48 in absolute value.)
Recall:
E(aX + bY) = aE(X) + bE(Y)
V(aX + bY) = a²V(X) + b²V(Y) + 2ab[C(X,Y)]

To solve this problem, what should “a” & “b” be? a is 4 & b is 3.

E(aX + bY) = aE(X) + bE(Y) = 4(1) + 3(2) = 4 + 6 = 10

V(aX + bY) = a²V(X) + b²V(Y) + 2ab[C(X,Y)] = 4²V(X) + 3²V(Y) + 2(4)(3)C(X,Y) = 16(5) + 9(6) + 24(7) = 80 + 54 + 168 = 302

Consider the following joint distribution of X & Y.

          y = 2    y = 4
x = 1     0.20     0.25
x = 3     0.15     0.20
x = 5     0.15     0.05

Determine the following:
a. The mean & variance of X
b. The mean & variance of Y
c. The covariance & correlation coefficient of X & Y
d. The mean & variance of X+Y

First, determine the marginal distribution of X (the row totals) and the marginal distribution of Y (the column totals), and verify that each sums to 1.

          y = 2    y = 4    pX(x)
x = 1     0.20     0.25     0.45
x = 3     0.15     0.20     0.35
x = 5     0.15     0.05     0.20
pY(y)     0.50     0.50     1

Set up a table to compute the mean & variance of X. Fill in the values of X and their probabilities, multiply x by p(x), and add to get the mean of X. To calculate the variance, first compute E(X²) = Σ x² p(x).

x     p(x)    x p(x)    x² p(x)
1     0.45    0.45      0.45
3     0.35    1.05      3.15
5     0.20    1.00      5.00
              E(X) = 2.50    E(X²) = 8.60

Calculate the variance as V(X) = E(X²) - [E(X)]² = 8.6 - (2.5)² = 8.6 - 6.25 = 2.35.

Set up a table to compute the mean & variance of Y.
Fill in the values of Y and their probabilities, multiply y by p(y), and add to get E(Y). To calculate the variance, first compute E(Y²) = Σ y² p(y).

y     p(y)    y p(y)    y² p(y)
2     0.5     1         2
4     0.5     2         8
              E(Y) = 3       E(Y²) = 10

Calculate the variance as V(Y) = E(Y²) - [E(Y)]² = 10 - (3)² = 1.

To determine C(X,Y) = E(XY) - E(X) E(Y), we need

$E(XY) = \sum_x \sum_y xy\, p(x,y)$

As before, we’ll put the xy values in the table next to the probability values.

          y = 2        y = 4        pX(x)
x = 1     0.20 (2)     0.25 (4)     0.45
x = 3     0.15 (6)     0.20 (12)    0.35
x = 5     0.15 (10)    0.05 (20)    0.20
pY(y)     0.50         0.50         1.00

Then we multiply and add.

E(XY) = (0.20)(2) + (0.25)(4) + (0.15)(6) + (0.20)(12) + (0.15)(10) + (0.05)(20) = 0.40 + 1.00 + 0.90 + 2.40 + 1.50 + 1.00 = 7.20

Since E(XY) = 7.2, E(X) = 2.5, & E(Y) = 3.0,

C(X,Y) = E(XY) - E(X) E(Y) = 7.2 - (2.5)(3) = 7.2 - 7.5 = -0.3

Next, the correlation coefficient. Since C(X,Y) = -0.3, V(X) = 2.35, & V(Y) = 1,

$\rho(X,Y) = \dfrac{C(X,Y)}{\sigma_X \sigma_Y} = \dfrac{-0.3}{\sqrt{2.35}\,\sqrt{1}} \approx -0.196$

The next part of the problem asked for E(X+Y). We know that E(X) = 2.5 and E(Y) = 3.0, and that E(aX+bY) = a E(X) + b E(Y). What should “a” & “b” be? 1 & 1. So

E(X+Y) = 1 E(X) + 1 E(Y) = E(X) + E(Y) = 2.5 + 3.0 = 5.5

Lastly: V(X+Y). We know V(X) = 2.35, V(Y) = 1, & C(X,Y) = -0.3, and that V(aX+bY) = a² V(X) + b² V(Y) + 2ab [C(X,Y)]. What are “a” & “b”?
1 & 1.

V(aX+bY) = a² V(X) + b² V(Y) + 2ab [C(X,Y)] = 1² V(X) + 1² V(Y) + 2(1)(1)[C(X,Y)] = V(X) + V(Y) + 2[C(X,Y)] = 2.35 + 1 + 2(-0.3) = 2.75

Specific Discrete Distributions
1. Uniform
2. Binomial
3. Hypergeometric
4. Multinomial
5. Poisson

Uniform Distribution

The uniform distribution assigns all the possible values equal probabilities. Example: a fair die has possible values 1, 2, 3, 4, 5, and 6, each with probability 1/6.

[Graph of Uniform Distribution, Example: Fair Die. Horizontal axis: value on die (1 to 6). Vertical axis: probability, equal to 1/6 for every value.]

Binomial Distribution

Example: What is the probability of getting 3 heads on 5 tosses of an unfair (lopsided) coin whose probability of getting a head on any toss is 1/3?

What is the probability of getting specifically HTHHT?

(1/3)(2/3)(1/3)(1/3)(2/3) = (1/3)³ (2/3)²

What is the probability of any other specific outcome with 3 heads on 5 tosses? The same. So we just have to figure out how many different ways you can get 3 heads on 5 tosses, and multiply that by the probability of each individual outcome. That will give us the probability of getting 3 heads on 5 tosses.

How many ways can you get 3 heads on 5 tosses? It’s the number of combinations of 5 objects taken 3 at a time.

$_5C_3 = \dfrac{5!}{3!\,(5-3)!} = \dfrac{5!}{3!\,2!} = \dfrac{120}{(6)(2)} = 10$

So the probability of getting 3 heads on 5 tosses is

$_5C_3 \left(\dfrac{1}{3}\right)^3 \left(\dfrac{2}{3}\right)^2 = (10)\left(\dfrac{1}{27}\right)\left(\dfrac{4}{9}\right) = \dfrac{40}{243} \approx 0.1646$

In general, the probability of getting x successes on n trials in which the probability of success on any given trial is p is

$(_nC_x)\, p^x (1-p)^{n-x}$

This is the binomial distribution.

Notes
1. 0! = 1
2. Each trial that can result in either success or failure is called a Bernoulli trial.

Example: If the probability that any person passes this course is 0.95, what is the probability that in a class of 30 people, exactly 28 people pass?

$(_nC_x)\, p^x (1-p)^{n-x} = (_{30}C_{28})(0.95)^{28}(0.05)^2 \approx 0.259$

where

$_{30}C_{28} = \dfrac{30!}{28!\,2!} = \dfrac{30 \cdot 29 \cdot 28!}{28!\,2!} = \dfrac{30 \cdot 29}{2} = 15 \cdot 29 = 435$
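The binomial formula can be sketched as a small Python function and applied to both examples above. The function name `binomial_pmf` is an illustrative choice; `math.comb` is the standard-library routine for the number of combinations.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes on n independent trials,
    each with success probability p: (nCx) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

three_heads = binomial_pmf(3, 5, 1/3)    # 3 heads on 5 tosses of the lopsided coin
pass_28 = binomial_pmf(28, 30, 0.95)     # exactly 28 of 30 students pass

print(round(three_heads, 4), round(pass_28, 3))
```

The two results should match the hand calculations (40/243 for the coin and roughly 0.259 for the class) up to rounding.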
Let’s go back to the example in which we flipped a coin 5 times & the probability of heads on each toss was 1/3. For 3 heads, the probability was 0.1646. Using the binomial formula, we can determine the probabilities of the other possibilities.

x       0         1         2         3         4         5         total
p(x)    0.1317    0.3292    0.3292    0.1646    0.0412    0.0041    1

[Graph of this distribution: bars of height p(x) for x = 0 to 5. Horizontal axis: number of heads. Vertical axis: probability, from 0 to 0.35.]

Notice that there is a bump on the left and a tail on the right. Such a distribution is said to be skewed to the right. The skew is where the tail is.

Binomial Distribution

The binomial distribution graph we just did was for p = 1/3 and the skew was to the right. A binomial distribution with p < ½ will always have a skew to the right.

What do you think the distribution will look like if p > ½? It will be skewed to the left. (The tail will be on the left & the bump will be on the right.)

What do you think the distribution will look like if p = ½? It will be symmetric. The left and right sides will be mirror images of each other.

If the number of trials n (tosses in our example) is large, the graph will be roughly symmetric even if p ≠ ½. How large does n have to be for the graph to be roughly symmetric? That depends on how far p is from ½. There are two sets of rules that are sometimes used to determine if the graph is roughly symmetric. One rule requires that np ≥ 5 and n(1-p) ≥ 5. The other rule requires that np(1-p) ≥ 3. These rules are not exactly equivalent, but they both work reasonably well.

Mean & Variance of the Binomial Distribution

Mean: μ = np
Variance: σ² = np(1-p)

Example: What are the mean, variance, & standard deviation for our binomial distribution example in which n = 5 & p = 1/3?
Mean: μ = np = (5)(1/3) = 5/3
Variance: σ² = np(1-p) = (5)(1/3)(2/3) = 10/9
Standard Deviation: $\sigma = \sqrt{10/9} \approx 1.054$

Using Excel to calculate Binomial Probabilities

On an Excel spreadsheet, you can get the binomial distribution as follows:
1. Click insert, and then click function.
2. Select statistical as the category of function.
3. Scroll down to the binomdist function, and click on it.
4. Fill in the information in the dialog box.

Suppose that you wanted to calculate a messy binomial, such as the probability of between 60 and 70 successes inclusive, on 100 trials with success probability on each trial of 0.64. This would be a lot of work with just a calculator. You would have to calculate 11 separate binomial probabilities (the probabilities for 60, 61, 62, … 70) and then add them up. It’s much easier with Excel.

You can calculate the (cumulative) probability of 70 or fewer successes. Then calculate the cumulative probability of 59 or fewer successes. Then take the difference.

To get the probability of 70 or fewer successes, specify the following:
# of successes: 70
# of trials: 100
prob. of success on any trial: 0.64
cumulative: True (because you want 70 or fewer, not just 70)

To get the probability of 59 or fewer successes, specify the following:
# of successes: 59
# of trials: 100
prob. of success on any trial: 0.64
cumulative: True

Then just subtract the two cumulative function values you calculated. If you do this, you get 0.91368 - 0.17394 = 0.7397.

We can also study binomial problems using proportions. For example, we might want to know the probability of getting 60% heads on 5 tosses of a coin with probability of heads on each toss of 1/3. (This is the same as getting 3 heads.) In general, if X is the number of successes on n trials, the proportion of successes is X/n.
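Returning briefly to the 60-to-70 successes calculation: the same cumulative-difference idea used with Excel can be sketched in Python. The helper `binomial_cdf` is a hypothetical name that plays the role of the cumulative setting in the spreadsheet function; it simply adds up the individual binomial probabilities.

```python
from math import comb

def binomial_cdf(k, n, p):
    """Pr(X <= k) for a binomial with n trials and success probability p."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

n, p = 100, 0.64

# Pr(60 <= X <= 70) = Pr(X <= 70) - Pr(X <= 59)
prob_60_to_70 = binomial_cdf(70, n, p) - binomial_cdf(59, n, p)

print(round(prob_60_to_70, 4))
```

Subtracting the cumulative value at 59 (not 60) keeps the outcome X = 60 inside the range, which is why the spreadsheet steps ask for 59 or fewer successes.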
We can easily determine the mean & variance of this binomial proportion variable X/n. If p again is the probability of success on any given trial,

E(X/n) = p
V(X/n) = p(1-p)/n

When can we use the binomial distribution?
1. We have exactly two possibilities on each trial (success or failure, heads or tails, male or female, yes or no, etc.)
2. The probability of success is the same on each trial.
3. The trials are independent. (What happens on one trial has no effect on what happens on the next trial.)

Sampling with & without Replacement

Suppose we have a bowl with 6 red and 4 green marbles. We select 3 marbles at random without replacement. We want to know the probability of selecting exactly 2 red marbles.

What’s the probability of getting a red marble on the 1st draw? 6/10.

What’s the probability of getting a red marble on the 2nd draw? It depends on what we got on the first draw. If we got a red one, then the probability is 5/9. If we got a green one, then the probability is 6/9. Since the probability varies from trial to trial, we cannot use the binomial distribution. We will discuss very shortly what we use instead.

What if we selected the marbles with replacement? Then the probability of a red marble would be the same on each draw, regardless of what you pulled out previously. Then we could use the binomial distribution.

Suppose that instead of having 6 red marbles and 4 green marbles, we had 6000 red ones and 4000 green ones. The probability of red on the 1st draw would be 6,000/10,000 = 0.6. If we got red on the 1st draw, the probability of red on the 2nd draw would be 5999/9999 = 0.59996. If we got green on the 1st draw, the probability of red on the 2nd would be 6000/9999 = 0.60006. These three numbers are very close. So you could use the binomial distribution to get a very good approximation of the probability.

So if we have two options on each trial, when can we use the binomial distribution?
1. If we sample with replacement, or
2. If we sample without replacement, but the sample is small relative to the population. A rule that is often used is that the sample is less than 5% of the population (n < 0.05 N).

If our sample is more than 5% of our population, then we will use the hypergeometric distribution.

Let’s return to our marble problem. Suppose we have a bowl with 6 red and 4 green marbles. We select 3 marbles at random without replacement. We want to know the probability of selecting exactly 2 red marbles.

Remember that the number of ways of selecting x objects from n is $_nC_x$. So there are $_6C_2$ ways of selecting 2 red marbles from 6. There are $_4C_1$ ways of selecting 1 green marble from 4. There are $_{10}C_3$ ways of selecting 3 marbles from 10.

So the probability of getting exactly 2 red marbles on 3 draws will be

(# of ways of getting the 2 red marbles out of 6) × (# of ways of getting the 1 green marble out of 4) / (# of ways of getting 3 marbles out of 10)

and our probability is

$\dfrac{(_6C_2)(_4C_1)}{_{10}C_3} = \dfrac{\left(\dfrac{6!}{2!\,4!}\right)\left(\dfrac{4!}{1!\,3!}\right)}{\left(\dfrac{10!}{3!\,7!}\right)} = \dfrac{(15)(4)}{120} = \dfrac{60}{120} = 0.5$

The hypergeometric distribution can also be used if you have more than 2 categories. If you had 3 categories, for example, you would have 3 combinations in the numerator instead of two.

What do you do if the probabilities are constant from trial to trial but you have more than 2 categories? You use the multinomial distribution, which is a generalization of the binomial.

Recall that the formula for the binomial is

$(_nC_x)\, p^x (1-p)^{n-x}$

where p is the probability of success and 1-p is the probability of failure. Remember that this is equal to

$\dfrac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$

Suppose we have k outcomes for each trial instead of 2, and their probabilities are p1, p2, p3, … pk.
Then on n trials, the probability of x1 outcomes of type 1, x2 outcomes of type 2, x3 outcomes of type 3, … and xk outcomes of type k would be

$\text{prob.} = \dfrac{n!}{x_1!\, x_2!\, x_3! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} p_3^{x_3} \cdots p_k^{x_k}$

where x1 + x2 + x3 + … + xk = n and p1 + p2 + p3 + … + pk = 1.

Example: Suppose that at a fair, children pay money to reach into a container, which holds a large number of toys. 50% are of type 1, 30% are of type 2, & 20% are of type 3. Sally pays for 3 toys, and reaches into the box and grabs 3 at random. What is the probability that she gets one of each type?

$\text{prob.} = \dfrac{3!}{1!\,1!\,1!}\,(0.50)^1 (0.30)^1 (0.20)^1 = \dfrac{6}{(1)(1)(1)}\,(0.50)(0.30)(0.20) = 6(0.03) = 0.18$

Our fifth discrete probability distribution is the Poisson distribution.

The Poisson distribution has outcome possibilities 0, 1, 2, 3, … that describe the number of occurrences per unit of time or per unit of space. It applies in problems involving requests for service such as at expressway tollbooths, supermarket checkout counters, bank teller windows, airport runways, and repair shops.

Poisson Distribution Formula

$p(x) = \dfrac{e^{-\mu} \mu^x}{x!}$

where x is the number of occurrences and μ is the mean rate of occurrence. Remember that e is a constant that is approximately equal to 2.71828.

Example: If a bank serves on average 1 customer per minute,
(a) what is the probability that exactly 2 customers will enter the bank in the same particular minute?

The mean rate of occurrence is μ = 1.

$\Pr(X = 2) = \dfrac{e^{-\mu} \mu^x}{x!} = \dfrac{e^{-1} (1)^2}{2!} = \dfrac{e^{-1}}{2} = \dfrac{0.368}{2} = 0.184$

(b) What is the probability that 2 or more customers will enter in the same minute?

We want Pr(X ≥ 2) = Pr(X=2) + Pr(X=3) + Pr(X=4) + … Even though the terms get smaller and smaller, you would have to do a lot of calculations to get a good approximation. There’s a much easier way to do this problem. Use the complement.
The complement (or opposite) of “2 or more customers” is “1 or fewer customers.” So Pr(X ≥ 2) = 1 - Pr(X ≤ 1). Let’s do the problem that way. The mean rate of occurrence μ is still 1.

$\Pr(X \ge 2) = 1 - \Pr(X \le 1) = 1 - [\Pr(X = 0) + \Pr(X = 1)] = 1 - \left[\dfrac{e^{-1} (1)^0}{0!} + \dfrac{e^{-1} (1)^1}{1!}\right] = 1 - [e^{-1} + e^{-1}] = 1 - [0.368 + 0.368] = 0.264$

Mean & Variance of a Poisson Distributed Random Variable

Not surprisingly, the mean is μ, since we’ve been referring to that Poisson parameter as the mean rate of occurrence. It turns out that the variance is also μ.
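The bank example, and the claim that the Poisson mean and variance are equal, can be checked numerically with a short Python sketch. The name `poisson_pmf` and the cutoff of 60 terms used to approximate the infinite sums are assumptions made for illustration; for a mean rate of 1 the terms beyond that point are negligibly small.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Poisson probability of exactly x occurrences when the mean rate is mu."""
    return exp(-mu) * mu**x / factorial(x)

mu = 1   # on average 1 customer per minute

exactly_two = poisson_pmf(2, mu)
# Complement rule: Pr(X >= 2) = 1 - [Pr(X = 0) + Pr(X = 1)]
two_or_more = 1 - (poisson_pmf(0, mu) + poisson_pmf(1, mu))

# Approximate E(X) and V(X) by truncating the infinite sums at 60 terms.
mean = sum(x * poisson_pmf(x, mu) for x in range(60))
variance = sum(x**2 * poisson_pmf(x, mu) for x in range(60)) - mean**2

print(round(exactly_two, 3), round(two_or_more, 3), round(mean, 3), round(variance, 3))
```

Using the complement needs only two terms, whereas summing Pr(X=2) + Pr(X=3) + … directly would need a cutoff like the one used here for the mean and variance.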
