PHP 2510
Expectation, variance, covariance, correlation

Expectation
• Discrete RV - weighted average
• Continuous RV - use integral to take the weighted average

Variance
• Variance is the average of \( (X - \mu)^2 \)
• Standard deviation

Covariance and correlation
• Covariance is the average of \( (X - \mu_X)(Y - \mu_Y) \)
• Correlation is a scaled version of covariance

Lots of examples

PHP 2510 – Oct 8, 2008

Expected value

Synonyms for expected value: 'average', 'mean'

The expectation or expected value of a random variable X is a weighted average of its possible outcomes.

For a discrete random variable, each outcome is weighted by its probability of occurrence, using the mass function:

\[ E(X) = \sum_i x_i \, P(X = x_i) = \sum_i x_i \, p(x_i) \]

For a continuous random variable, each outcome is weighted by the relative frequency of its occurrence, using the density function:

\[ E(X) = \int x \, f(x) \, dx \]

Examples: Discrete random variables

Example 1. Let X denote the number of boys in a family with three children. Assume the probability of having a boy is .5.

Step 1: Compute the mass function

  k | p(k)
 ---+------
  0 | .125
  1 | .375
  2 | .375
  3 | .125

Step 2: Compute weighted average

\[ E(X) = \sum_{k=0}^{3} k \, p(k) = (0)(.125) + (1)(.375) + (2)(.375) + (3)(.125) = 1.5 \]

Example 2: Roulette. In roulette, a ball is tossed on a spinning wheel, and it lands on one of 38 numbers (each of 1 to 36, plus 0 and 00). If you bet $1 on a particular number, the payoff for winning is $36.

Suppose you bet $1 on the number 12. Define the random variable X to be your winnings on one play of the roulette wheel. Then

\[ X = \begin{cases} 36 & \text{if the number is 12} \\ -1 & \text{if the number is not 12} \end{cases} \]

Find E(X), or your expected winnings.

Step 1: Compute mass function

  k  | p(k)
 ----+-------
  36 | 1/38
  -1 | 37/38

Step 2: Compute E(X) as weighted average of outcomes

\[ E(X) = \sum_{k \in \{-1,\, 36\}} k \, p(k) = (-1)\left(\frac{37}{38}\right) + (36)\left(\frac{1}{38}\right) = -\frac{1}{38} \approx -0.026 \]

Question: What is the expected return in 100 plays of roulette?
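The two discrete examples above can be checked numerically. The following is a minimal sketch rather than part of the original slides: the helper name `expected_value` and the dictionary layout are illustrative choices, and exact fractions are used so that no rounding enters before the final print. The 100-play figure assumes that the expected value of a sum equals the sum of the expected values, a property these notes state in the section on properties of expected values.

```python
from fractions import Fraction


def expected_value(pmf):
    """Weighted average of outcomes; pmf maps each outcome to its probability."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(outcome * prob for outcome, prob in pmf.items())


# Example 1: number of boys among three children, P(boy) = 1/2
boys_pmf = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

# Example 2: winnings from a $1 bet on a single roulette number,
# using the $36 payoff stated in the slides
roulette_pmf = {36: Fraction(1, 38), -1: Fraction(37, 38)}

e_boys = expected_value(boys_pmf)          # exactly 3/2
e_roulette = expected_value(roulette_pmf)  # exactly -1/38

# Assumption: expected total over 100 plays = 100 * E(X) (sum rule)
e_100_plays = 100 * e_roulette

print(float(e_boys))                   # 1.5
print(round(float(e_roulette), 3))     # -0.026
print(round(float(e_100_plays), 2))    # -2.63
```

Under these assumptions, the expected return over 100 plays is a loss of roughly $2.63.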
Expected value for common discrete RV's

Binomial. If X has the binomial distribution with parameters n and π, then E(X) = nπ.

Example: Toss a coin 50 times, and let X denote the number of heads. Then

E(X) = nπ = 50 × .5 = 25

Example: The proportion of individuals with coronary artery disease (CAD) is .3. In a sample of 45 individuals, what is the expected number of cases of CAD?

E(X) = nπ = 45 × .3 = 13.5

Suppose one person is selected from the population. Define a random variable Y such that Y = 1 if the person has CAD and Y = 0 if not. Then

E(Y) = nπ = 1 × .3 = .3

Poisson. If X has the Poisson distribution with rate parameter λ, then E(X) = λ. This is because

\[ E(X) = \sum_{k=0}^{\infty} k \left( \frac{e^{-\lambda} \lambda^k}{k!} \right) = \lambda \]

The mean of a Poisson RV is the number of events you expect to observe.

Geometric. If X has the Geometric distribution with success probability π, then E(X) = 1/π. This is because

\[ E(X) = \sum_{k=1}^{\infty} k \left\{ (1 - \pi)^{k-1} \pi \right\} = \frac{1}{\pi} \]

The mean of a geometric RV is the number of trials you expect to need in order to observe the first success (counting the successful trial). Hence if the success probability π is low, E(X) will be high; and vice-versa.

Example. If you roll two dice, the probability of rolling a 3 is 2/36, or about 0.056. Let X denote the number of rolls until a 3 comes up. What is E(X)? (Ans: 18)

Expected value for continuous RV

Let X be a continuous random variable defined on an interval A. Then the expected value is a weighted average of outcomes, weighted by the relative frequency of each outcome. The weighted average is computed using an integral,

\[ E(X) = \int_A x \, f(x) \, dx \]

Example. Suppose X is a uniform random variable on the interval [1, 4]. Find E(X).

Step 1: Recall that \( f(x) = \frac{1}{4 - 1} = \frac{1}{3} \), and that the interval A is [1, 4].
So the appropriate integral is

\[ \int_1^4 x \, f(x) \, dx = \int_1^4 \frac{1}{3} \, x \, dx \]

Step 2: Evaluate the integral

\[ \int_1^4 \frac{1}{3} \, x \, dx = \left. \frac{1}{3} \cdot \frac{x^2}{2} \right|_1^4 = \frac{1}{3} \left( \frac{16}{2} - \frac{1}{2} \right) = 2.5 \]

Expected values for common continuous RV's

Normal. If X has a normal distribution with parameters µ and σ, then E(X) = µ.

Exponential. If X has the exponential distribution with parameter θ, then E(X) = θ. In this case, θ is the expected waiting time until an event occurs, and 1/θ is called the event rate.

Some properties of expected values

1. Linear combinations. If a and b are constants, then

\[ E(aX + b) = a E(X) + b \]

2. Sums of random variables. The expected value of a sum of random variables is the sum of expected values.

\[ E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n) \]

Example. Suppose X is a Poisson random variable denoting the number of lottery winners per week. Its expected value is E(X) = 2. What is the expected number of winners over 4 weeks? Writing \( X_1, \ldots, X_4 \) for the four weekly counts, each with mean 2,

\[ E(X_1 + X_2 + X_3 + X_4) = 4 \times E(X) = 4 \times 2 = 8 \]

Example. Let X denote the daily low temperature for each day in September, and let E(X) denote its average. Suppose E(X) = 65, measured in degrees Fahrenheit. What is the mean temperature in degrees Celsius?

To convert X from F to C, define a new random variable

\[ Y = \frac{5}{9} X - \frac{160}{9} \]

Then using the rule about linear combinations,

\[ E(Y) = \frac{5}{9} E(X) - \frac{160}{9} \approx 18.3 \]

Computing means from a sample of data

Loosely speaking, for a sample of observed data \( x_1, x_2, \ldots, x_n \), each of the individual \( x_i \) can be thought of as having associated probability mass \( p(x_i) = 1/n \). So the sample mean is

\[ \bar{x} = \sum_{i=1}^{n} x_i \, p(x_i) = \sum_{i=1}^{n} x_i \, (1/n) = \frac{1}{n} \sum_{i=1}^{n} x_i \]

Simply put, take the sum of the observations and divide by n.

Sample means are not expected values! They are random variables. We will discuss sample means later on.

Variance of a random variable

Variance measures dispersion of a random variable's distribution.
It is just an average. It is the average squared deviation of a random variable from its mean.

To make notation simple, let µ = E(X). Then

\[ \mathrm{var}(X) = E\{ (X - \mu)^2 \} \]

In other words, it is the average value of \( (X - \mu)^2 \).

For a discrete random variable,

\[ \mathrm{var}(X) = \sum_i (x_i - \mu)^2 \, p(x_i) \]

For a continuous random variable,

\[ \mathrm{var}(X) = \int (x - \mu)^2 f(x) \, dx \]

Example 1 (consumers of alcohol). In a certain population, the proportion of those consuming alcohol is .65. Select a person at random, with X = 1 if consumer of alcohol and X = 0 if not.

In this example, E(X) = µ = 0.65.

\[ \mathrm{var}(X) = E\{ (X - 0.65)^2 \} = \sum_i (x_i - 0.65)^2 \, p(x_i) = (1 - 0.65)^2 (0.65) + (0 - 0.65)^2 (0.35) = .228 \]

Example 2. Suppose instead the probability was 0.1. What then is var(X)? Ans = 0.09.

Pattern: For a Binomial random variable X with n = 1 and success probability π,

\[ \mathrm{var}(X) = \pi (1 - \pi) \]

Properties of variance

• If a and b are constants, then

\[ \mathrm{var}(aX + b) = a^2 \, \mathrm{var}(X) \]

(Why is b not included?)

• If \( X_1, X_2, \ldots, X_n \) are independent random variables, then

\[ \mathrm{var}(X_1 + X_2 + \cdots + X_n) = \mathrm{var}(X_1) + \mathrm{var}(X_2) + \cdots + \mathrm{var}(X_n) \]

Computing variances from a sample of data

As with the sample mean, for a sample of observed data \( x_1, x_2, \ldots, x_n \), each of the individual \( x_i \) can be thought of as having associated probability mass \( p(x_i) = 1/n \).

To calculate the sample variance, we take an average of \( (x_i - \bar{x})^2 \). The sample variance is

\[ S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 \, p(x_i) = \sum_{i=1}^{n} (x_i - \bar{x})^2 \, (1/n) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

It is more common to use \( \frac{1}{n-1} \) instead of \( \frac{1}{n} \). We will discuss reasons for this later.

For now, you should think of variance as an average.

Standard deviation

The standard deviation measures, roughly, the average distance of a random variable X from its mean. By definition, \( \mathrm{SD}(X) = \sqrt{\mathrm{var}(X)} \).

The logic goes like this:

1. because var(X) measures average squared deviation between X and its mean; and
2. because \( \mathrm{SD}(X) = \sqrt{\mathrm{var}(X)} \); then
3. SD(X) is approximately equal to the average absolute deviation between X and its mean

Example. In September in Providence, noon time temperature has mean 65 and variance 100.

• What is the SD of the temperatures?
• Select a day at random. What does SD tell us about the temperature on that day, relative to the average temperature?
• Suppose noon time temps are normally distributed. Should a noon time temperature of 85 be considered unusual? Why or why not?

Mean and variance for some common RV's

 Random variable  | Mass or Density Function                                          | E(X) | var(X)
 -----------------+-------------------------------------------------------------------+------+------------------
 Binomial(n, π)   | \( \binom{n}{x} \pi^x (1 - \pi)^{n-x} \)                          | nπ   | nπ(1 − π)
 Poisson(λ)       | \( e^{-\lambda} \lambda^x / x! \)                                 | λ    | λ
 Geometric(π)     | \( (1 - \pi)^{x-1} \pi \)                                         | 1/π  | \( (1 - \pi)/\pi^2 \)
 Normal(µ, σ²)    | \( \frac{1}{\sigma \sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)} \) | µ    | σ²
 Exponential(θ)   | \( (1/\theta) e^{-x/\theta} \)                                    | θ    | θ²

(In the normal density, π is the constant 3.14159..., not a success probability. The exponential row uses the same parameterization as above, in which θ is the expected waiting time.)

Correlation and Covariance

Correlation and covariance are ways to measure association between two random variables that are observed at the same time on the same unit.

Example: Height and weight measured on the same person
Example: years of education and income
Example: two successive measures of weight, taken on the same person but one year apart.

Covariance

Covariance measures the degree to which two variables deviate from their means together. It is an average:

\[ \mathrm{cov}(X, Y) = E\{ (X - \mu_X)(Y - \mu_Y) \} \]

cov(X, Y) > 0 means that X and Y tend to vary in the same direction relative to their means (both higher or both lower). They have a positive association.
• Example: height and weight

cov(X, Y) < 0 means that X and Y tend to vary in opposite directions relative to their means (when one is higher, the other is lower). They have a negative association.
• Example: weight and minutes of exercise per day

cov(X, Y) = 0 generally means that X and Y are not linearly associated.

Example: mean arterial pressure and body mass index during pregnancy

SUMMARY STATISTICS
  Variable |  Obs       Mean   Std. Dev.
-----------+--------------------------------
     map24 |  326   76.55951   7.351673
       bmi |  326   25.10736   6.217994

Give an interpretation for SD here.

[Figure: scatter plot of map24 (vertical axis) against bmi (horizontal axis)]

Computing covariance

For individual i, let \( m_i \) denote MAP and let \( b_i \) denote BMI.

In this table, prod represents \( (m_i - \bar{m}) \times (b_i - \bar{b}) \).

Recall \( \bar{m} = 76.6 \) and \( \bar{b} = 25.1 \).

To compute covariance, we take the average (sample mean) of the products.

DATA EXCERPT
       map24 (m_i)   bmi (b_i)        prod
 -------------------------------------------
   1.      72.7         15.9       35.53593
   2.      69.3         16.3       63.9371
   3.      81           16.3      -39.10899
   4.      63.7         16.3      113.2583
   5.      74           16.6       21.77467
   6.      73.3         16.6       27.7298
   7.      69.3         16.9       59.58139
   8.      74.7         16.9       15.26169
   9.      82.7         17        -49.78313
  10.      73           17.2       28.14632
  11.      66.3         17.2       81.12561
  12.      74           17.8       18.70326
  13.      73           17.8       26.01062
  14.      84.3         17.9      -55.78852
  15.      68.3         17.9       59.52924
  16.      70.3         18         44.48857

SUMMARY STATISTICS
  Variable |  Obs       Mean
 ----------+-----------------------
      prod |  326    13.2753

Computing covariance from a sample

Like mean and variance, covariance is an average.

In a sample of pairs \( (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \), we can assume each pair is observed with probability \( p(x_i, y_i) = 1/n \).

Then the sample covariance is a weighted average of \( (x_i - \bar{x})(y_i - \bar{y}) \):

\[ \widehat{\mathrm{cov}}(X, Y) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \, p(x_i, y_i) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]

Correlation is a standardized covariance

\[ \mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\mathrm{SD}(X) \times \mathrm{SD}(Y)} \]

Always between -1 and 1

Measures degree of linear relationship
(If relationship not linear, correlation not an appropriate measure of association)

Pearson's sample correlation plugs in sample estimates for the quantities in the formula above

\[ \widehat{\mathrm{corr}}(X, Y) = \frac{(1/n) \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{S_x \times S_y} \]

SUMMARY STATISTICS
  Variable |  Obs       Mean   Std. Dev.        Min        Max
 ----------+-----------------------------------------------------
      prod |  326    13.2753   53.69735  -131.3067   391.1627
     map24 |  326   76.55951   7.351673         55      101.3
       bmi |  326   25.10736   6.217994       15.9       57.2

CORRELATION COEFFICIENT
(obs=326)
           |      bmi
 ----------+------------------
     map24 |   0.2913

Using the numbers on the table above, how would you obtain the correlation coefficient?
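One possible way to work through the closing question is sketched below. It is an illustration under stated assumptions, not the instructor's solution. The mean of prod is the sample covariance computed with the 1/n divisor. The sketch assumes the reported standard deviations use the n − 1 divisor, which is the usual default in statistical software but is not stated in the slides. Under that assumption, dividing the covariance by the product of the SDs gives a value slightly below the reported coefficient, and rescaling by n/(n − 1) puts both pieces on the same divisor. Variable names are hypothetical.

```python
# Values copied from the summary statistics table
n = 326
mean_prod = 13.2753   # sample mean of prod = covariance with divisor n
sd_map = 7.351673     # Std. Dev. of map24 (assumed to use divisor n - 1)
sd_bmi = 6.217994     # Std. Dev. of bmi   (assumed to use divisor n - 1)

# Direct plug-in: covariance divided by the product of the SDs
r_plain = mean_prod / (sd_map * sd_bmi)

# Rescale the covariance to the n - 1 divisor so that numerator and
# denominator are computed consistently (an assumption, see lead-in)
r_adjusted = r_plain * n / (n - 1)

print(round(r_plain, 4))     # about 0.2904
print(round(r_adjusted, 4))  # about 0.2913
```

If the assumption about the divisor holds, the adjusted value agrees with the reported coefficient of 0.2913 to four decimal places.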