Probability Theory and Random Variables:
Mean, Variance, Covariance, and Correlation
Dan Saunders
Introduction
Suppose X is a random variable. What does that mean? The simplest notion is that we do not
know the value that X will take on with certainty. However, this does not imply that we are clueless
about X. The short answer is that uncertainty, as modeled by probability theory, is an exhaustive
characterization of all of the possible outcomes, and that random variables make that uncertainty
amenable to mathematical analysis.
Probability Theory
Probability theory requires three main ingredients. The first is the sample space, Ω, which is the set
of possible outcomes. Second is the event space, F, which is the set of all subsets of Ω. Essentially,
F represents the set of all possible events to which a probability can be assigned. Finally, we need
the probability measure, P , which assigns those probabilities to every event. There is a lot more
that could be said about these three mathematical objects, but it will be easier to demonstrate
with an example. Consider the familiar case of a fair coin toss
Ω = {H, T }
F = {∅, {H}, {T}, {H, T}}
P(∅) = 0, P({H}) = P({T}) = 1/2, P({H, T}) = 1
As can be seen from the example, these three objects {Ω, F, P } are a complete description of the
uncertain nature of the coin toss. For clarity, let’s consider one more example: tossing a die. In this
case, the set of outcomes is naturally represented by the numbers Ω = {1, 2, 3, 4, 5, 6}. Each event in F is a subset of Ω. For example, we may ask for the probability that we roll at most three, in which case we are choosing the event {1, 2, 3}, which is a subset of Ω. Finally, we would need to assign a probability to each event, such as P({1}) = 1/6 or P({1, 2, 3}) = 1/2. All of this
logical construction occurs before any mention of random variables. So what is a random variable?
Well, it’s actually quite a misnomer, because a random variable is more than just a variable.
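Before turning to random variables, here is a minimal Python sketch of the die example above; the names Omega, P_outcome, and P(event) are illustrative choices of this write-up, not part of the original notes.

```python
# A fair die: the sample space and a probability measure on it.
Omega = {1, 2, 3, 4, 5, 6}

# Each individual outcome gets probability 1/6.
P_outcome = {w: 1 / 6 for w in Omega}

def P(event):
    """Probability of an event (any subset of Omega): sum its outcome probabilities."""
    return sum(P_outcome[w] for w in event)

print(P({1}))        # about 0.1667
print(P({1, 2, 3}))  # 0.5, the probability of rolling at most three
print(P(Omega))      # 1.0
```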
Random Variables
A random variable is actually a function. Specifically, it’s a function which assigns a real number
to every element in the sample space, in accordance with its respective probability measure
X : Ω → ℝ
Again, the fair coin toss can help the discussion. Consider the random variable X, which maps the
outcome “heads” to the number 1 and the outcome “tails” to the number 0
X = 1 with probability 1/2
    0 with probability 1/2
In fact, this type of random variable should seem familiar. Recall the definition of a Bernoulli
random variable, which takes the form
X = 1 with probability p
    0 with probability 1 − p
A Bernoulli random variable is a function that can be used to represent any uncertain situation with exactly two outcomes. In fact, the use of 0 and 1 is not special. We could use any two real numbers to achieve the same goal.
Why do we need random variables? Simply put, we cannot mathematically analyze the uncertainty described by {Ω, F, P }. We must first map the set of possible outcomes to numbers, before
we can define the mean or variance. After all, what’s the average of heads and tails? On the other
hand, we can say what the average of 0 and 1 is. If they are equally likely, then the average is
1 · 1/2 + 0 · 1/2 = 1/2.
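As a rough sketch (not part of the original notes), the coin-toss mapping can be written as a Python function from outcomes to numbers, after which the weighted average is a routine computation; the names P and X follow the text's notation, but the code itself is an assumption of this write-up.

```python
# Outcomes and their probabilities for a fair coin.
P = {"H": 0.5, "T": 0.5}

def X(outcome):
    """A random variable: map 'heads' to 1 and 'tails' to 0."""
    return 1 if outcome == "H" else 0

# Once outcomes are mapped to numbers, the probability-weighted average is well defined.
average = sum(X(w) * p for w, p in P.items())
print(average)  # 0.5
```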
Expected Value
More generally, we define the expected value of any discrete random variable X as
E(X) = Σ_{ω ∈ Ω} X(ω) P(ω)
It is important to keep in mind that this is probability theory, and E(X) represents the theoretically
true mean, sometimes called the population mean and denoted by µ. This is entirely separate from
statistics, where we do not know the mean or the distribution. Notice that, for any Bernoulli
random variable we have
E(X) = 1 · p + 0 · (1 − p) = p
In the case where the distribution is unknown, we collect n independent and identically distributed data points, in order to estimate the expected value using the sample mean estimator
x̄ = (1/n) Σ_{i=1}^{n} x_i
This is also referred to as the arithmetic mean. This estimator is a function of random variables. Therefore, it is also a random variable, whose properties will be of great interest when making statistical inferences.
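To illustrate the sample mean as an estimator, the following sketch simulates n independent Bernoulli(p) draws and compares x̄ with the true mean p; the particular values p = 0.3 and n = 10 000 are assumptions made for this example, not taken from the text.

```python
import random

random.seed(0)                     # for a reproducible illustration
p, n = 0.3, 10_000                 # assumed true parameter and sample size

# n independent, identically distributed Bernoulli(p) observations.
xs = [1 if random.random() < p else 0 for _ in range(n)]

# The sample mean estimator: x_bar = (1/n) * sum of the x_i.
x_bar = sum(xs) / n
print(x_bar)                       # close to p for large n
```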
Variance and Standard Deviation
While the mean is a measure of the central tendency, the variance is a measure of the dispersion.
Intuitively, we are measuring the average squared distance from the mean.
Var(X) = E[(X − E(X))²]
Why squared? Minimizing the variance of an estimator increases precision. However, in order to
use calculus, we must have a smooth function without kinks. Therefore, minimization will be easier
with squared terms than with absolute values, although the absolute value function would have a more intuitive interpretation.
The variance is often denoted σ². An alternative definition exists, which may be more useful in solving problems. First, note that the mean could be written as E(X¹). The exponent of 1 emphasizes why the mean is sometimes called “the first moment”. In general, the nth moment of a random variable is E(Xⁿ). As it turns out, the variance is closely related to the second moment by the following relation
Var(X) = E(X²) − [E(X)]²
It is quite common for people to refer to the second moment when talking about the variance. In
particular, if E(X) = 0, then the second moment is exactly equal to the variance.
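As a quick numerical illustration of the two formulas, the sketch below computes the variance of a fair die both from the definition and from the second-moment shortcut; the choice of the die distribution here is an assumption made for the example.

```python
# Fair die: compute Var(X) two ways and confirm they agree.
values = [1, 2, 3, 4, 5, 6]
prob = 1 / 6

EX = sum(x * prob for x in values)        # first moment, E(X) = 3.5
EX2 = sum(x ** 2 * prob for x in values)  # second moment, E(X^2)

var_definition = sum((x - EX) ** 2 * prob for x in values)  # E[(X - E(X))^2]
var_shortcut = EX2 - EX ** 2                                # E(X^2) - E(X)^2

print(var_definition, var_shortcut)  # both about 2.9167
```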
The standard deviation, defined as the square root of the variance and denoted σ, has a simple,
important interpretation. Recall the distance formula from Algebra
d = √((x₁ − x₂)² + (y₁ − y₂)²)
So distance is naturally measured as the square root of the sum of squared differences. The variance,
by definition, is a probability-weighted sum of squared differences from the mean, and the standard deviation is the square
root of that sum. Therefore, the standard deviation can arguably be said to measure the average
distance from the mean.
σ = [ Σ_{ω ∈ Ω} (X(ω) − µ)² · P(ω) ]^{1/2}
The standard deviation also shares the same units as the underlying random variable, unlike
the variance. For example, if X is measured in meters, then so are µ and σ, whereas σ 2 is measured
in meters-squared.
In the instance where we don’t know the underlying distribution, as with the mean, we must
use a sample analog by collecting n independent observations. Specifically, for the variance we have
the following estimator
s² = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)²
We divide by n − 1 because we lose one degree of freedom by using the estimate of the mean, x̄,
in the estimation of the variance. As a matter of convention, the estimated standard deviation of an estimator, such as s/√n for the sample mean, is referred to as its “standard error”.
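Here is a minimal sketch of the s² estimator with the n − 1 divisor, run on simulated data; the normal distribution and its parameters are assumptions of this illustration. Python's statistics.variance uses the same divisor, so it serves as a cross-check.

```python
import random
import statistics

random.seed(0)
data = [random.gauss(0, 2) for _ in range(1_000)]   # assumed example data, sigma = 2

n = len(data)
x_bar = sum(data) / n
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)  # divide by n - 1, not n

# statistics.variance also divides by n - 1, so the values agree up to rounding.
print(s2, statistics.variance(data))
print(s2 ** 0.5)  # the estimated standard deviation s, roughly 2
```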
Covariance and Correlation
The covariance is a generalization of the variance. This is obvious from the definition
Cov(X, Y) = E[(X − E(X)) · (Y − E(Y))]
Notice that Cov(X, X) = Var(X). All we can hope to interpret about the covariance is its sign (positive or negative). If, for example, the covariance is positive, then we can say the following: “On average, if X is above its mean, then Y is also above its mean, and vice versa.” As with the
simplification for the variance, we have the following formula
Cov(X, Y) = E[XY] − E[X] · E[Y]
Unfortunately, the covariance is quite sensitive to the units of measurement. For example, suppose
X and Y were both measured in meters. Now suppose we measure X in centimeters, i.e., we create
a new random variable Z = 100X. Then Cov(Z, Y ) = 100 · Cov(X, Y ). So the covariance can be
scaled up by any arbitrarily large number, without any change in the underlying relationship. To
solve this problem, we calculate the correlation, sometimes called the coefficient of correlation and
denoted ρ. The definition is as follows
ρ_{X,Y} = Cov(X, Y) / (σ_X · σ_Y)
As it turns out, −1 ≤ ρ ≤ 1 for any two random variables, and it is invariant to scale. Values of ρ close to 1 or −1 imply a strong (positive or negative) linear relationship, while values close to 0 indicate a weak linear relationship.
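The scale sensitivity of the covariance and the scale invariance of the correlation can be checked numerically. The sketch below uses simulated data and the meters-to-centimeters factor of 100 from the example above; the helper functions mean, cov, and corr are defined here for illustration and are not part of the notes.

```python
import random

random.seed(0)
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]     # y is positively related to x

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def corr(u, v):
    return cov(u, v) / (cov(u, u) ** 0.5 * cov(v, v) ** 0.5)

z = [100 * xi for xi in x]                    # change of units: Z = 100X
print(cov(x, y), cov(z, y))                   # the covariance scales by 100
print(corr(x, y), corr(z, y))                 # the correlation is unchanged
```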
Properties
For any two random variables X, Y and any two constants a, b we have
1. The expectation operator is linear
E[aX + bY] = aE[X] + bE[Y]
2. The covariance, defined as an expected value, has the following property
Cov(aX, bY) = ab · Cov(X, Y)
This further implies that the variance is a non-linear operator
Var(aX) = Cov(aX, aX) = a² Var(X)
3. Finally, we use all of these properties together to find the variance of a sum of random
variables as
Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab · Cov(X, Y)
The properties listed above generalize for any number of random variables, and they give us the
tools to calculate the mean and variance of a collection of random variables.
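As a closing illustration (an assumption of this write-up, not part of the original notes), the variance-of-a-sum formula can be checked numerically on simulated data; the constants a and b and the joint distribution of X and Y are arbitrary choices.

```python
import random

random.seed(0)
a, b, n = 2.0, -3.0, 100_000
x = [random.gauss(1, 1) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]   # y is correlated with x

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((s - mu) * (t - mv) for s, t in zip(u, v)) / (len(u) - 1)

combo = [a * xi + b * yi for xi, yi in zip(x, y)]
lhs = cov(combo, combo)                                           # Var(aX + bY)
rhs = a**2 * cov(x, x) + b**2 * cov(y, y) + 2 * a * b * cov(x, y)
print(lhs, rhs)  # equal up to floating-point rounding
```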