Now we will cover discrete random variables. A random variable is a function that maps the sample space $\Omega$ to the set of real numbers. Say I have a sample space of people, and say I am interested in their height in inches. Let $H$ be the function from the sample space to the set of real numbers that assigns each person their height. In other words, $H$ is a random variable. Here is the picture: [Figure: each person in $\Omega$ is mapped by $H$ to a point on the real line.]

We also wish to define $\hat{H} = 2.5H$ (i.e., the height in centimeters). $\hat{H}$ is a function of $H$. $\hat{H}$ is also a random variable, because everything in the sample space still gets mapped to something. It is a random variable which is a function of another random variable. Thus, a function of a random variable is also a random variable.

Notation:
- Random variable $X$ (capital letter), i.e., $X : \Omega \to \mathbb{R}$.
- Numerical value $x$ (lowercase letter), i.e., $x \in \mathbb{R}$.

The probability mass function (PMF) tells us how likely it is that we get an outcome that leads to a particular numerical value. We calculate the overall probability of the outcomes that lead to each particular numerical value, and we generate a bar graph. We give this bar graph a name, $p_X(x)$, which is the probability mass function. The picture is as follows: [Figure: bar graph of $p_X(x)$ versus $x$.]

Probability mass function definition, notation, and observations:
$$p_X(x) = P(X = x) = P(\{\omega \in \Omega \text{ s.t. } X(\omega) = x\})$$
Note that $p_X(x) \geq 0$ and $\sum_x p_X(x) = 1$, which means that probabilities are nonnegative and the probabilities of all possible values add to 1.

Example

Let $X$ be the number of heads in $n$ independent coin tosses ($n$ fixed), with $P(H) = p$. Let $n = 4$. Then
$$p_X(2) = P(HHTT) + P(HTHT) + P(HTTH) + P(THHT) + P(THTH) + P(TTHH) = 6p^2(1-p)^2 = \binom{4}{2}p^2(1-p)^2$$
In general, we have
$$p_X(k) = \binom{n}{k}p^k(1-p)^{n-k}$$

Expected Value

Consider the following example. [Figure: a PMF with $p_X(1) = 1/6$, $p_X(2) = 1/2$, $p_X(4) = 1/3$.] Consider the following computation:
$$1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{2} + 4 \cdot \tfrac{1}{3} = \tfrac{5}{2}$$
We think of this value as an average of sorts. What we did was take each value, multiply it by its probability of occurring, and take the sum over all $x$. In general, what we did was
$$\sum_x x \cdot p_X(x)$$
We have a name for this computation, and it is called the expected value. It is denoted by $E[X]$. Therefore, we have the following definition:
$$E[X] = \sum_x x \cdot p_X(x)$$
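To make these formulas concrete, here is a minimal Python sketch checking the two examples above; the helper names `binomial_pmf` and `expected_value` are my own, not from the notes. It computes the coin-toss PMF $\binom{n}{k}p^k(1-p)^{n-k}$ and the expected value $\sum_x x \cdot p_X(x)$.

```python
from math import comb, isclose

def binomial_pmf(k: int, n: int, p: float) -> float:
    """p_X(k) = C(n, k) * p^k * (1 - p)^(n - k): probability of k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def expected_value(pmf: dict) -> float:
    """E[X] = sum over x of x * p_X(x), with the PMF given as {value: probability}."""
    return sum(x * px for x, px in pmf.items())

# Coin-toss example: n = 4 tosses, k = 2 heads (p = 0.5 is an assumed value for the check).
p = 0.5
assert isclose(binomial_pmf(2, 4, p), 6 * p**2 * (1 - p)**2)

# Expected-value example: p_X(1) = 1/6, p_X(2) = 1/2, p_X(4) = 1/3.
pmf = {1: 1/6, 2: 1/2, 4: 1/3}
print(expected_value(pmf))  # ~2.5, i.e. 5/2
```

Representing a PMF as a `{value: probability}` dictionary keeps each "sum over all $x$" formula a one-line loop.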
Properties of Expectations

Let $X$ be a random variable and let $Y = g(X)$. We have the following picture: [Figure: $X$ maps outcomes in $\Omega$ to the "$x$ line," and $g$ maps the $x$ line to the "$y$ line."] Suppose we wish to compute $E[Y]$. We could do this by using the definition,
$$E[Y] = \sum_y y \cdot p_Y(y)$$
This means that for every particular value of $y$, we would collect all the outcomes that lead to that value of $y$ and compute their probability. It does the addition over the "$y$ line." An alternative way is to do the addition over the "$x$ line": instead of using the probabilities of $y$, we use the probabilities of $x$, multiply by $g(x)$ instead of $x$, and sum over all $x$:
$$E[Y] = \sum_x g(x) \cdot p_X(x)$$
The picture is as follows: [Figure: grouping the terms by $y$ value gives the same total as grouping them by $x$ value.] We can see that we do the same arithmetic both ways. A proof will not be included here.

Properties: Let $\alpha$ and $\beta$ be real numbers. What is $E[\alpha]$? Recall the definition
$$E[X] = \sum_x x \cdot p_X(x)$$
Here, the only value of $x$ is $\alpha$, and $p_X(\alpha) = 1$ (since everything in the sample space maps to $\alpha$). Here is the picture: [Figure: every outcome in $\Omega$ maps to the single value $\alpha$.] So we have
$$E[\alpha] = \sum_x x \cdot p_X(x) = \sum_x \alpha \cdot p_X(x) = \alpha \sum_x p_X(x) = \alpha \cdot 1 = \alpha$$

Now let's think about $E[\alpha X]$. We recall our definition and the rule for $Y = g(X)$:
$$E[X] = \sum_x x \cdot p_X(x) \qquad \text{and} \qquad E[Y] = \sum_x g(x) \cdot p_X(x)$$
Here, $g(x) = \alpha x$, so
$$E[\alpha X] = \sum_x g(x) \cdot p_X(x) = \sum_x \alpha x \cdot p_X(x) = \alpha \sum_x x \cdot p_X(x) = \alpha E[X]$$
Similarly,
$$E[\alpha X + \beta] = \sum_x (\alpha x + \beta) p_X(x) = \alpha \sum_x x \cdot p_X(x) + \beta \sum_x p_X(x) = \alpha E[X] + \beta \cdot 1 = \alpha E[X] + \beta$$

Note: $E[X^2] = \sum_x x^2 \, p_X(x)$.

Variance

If we think of $E[X]$ as the average, then we define the variance to be the average squared deviation from the mean. It is denoted $\mathrm{var}(X)$. Visually, what we do is take the deviation from the mean and square it. This gives more emphasis to big distances, such as outliers. We have the following picture: [Figure: the squared distance from the mean, marked for each point of the distribution.] We square the distance to the mean (and do so for all points), then take the average of that. If the variance is large, then the points are distributed far away from the mean. If the variance is small, then the likely values are around the mean.

We now derive a formula to compute the variance:
$$\mathrm{var}(X) = E[|X - E[X]|^2] = E[(X - E[X])^2]$$
$$= \sum_x (x - E[X])^2 \, p_X(x)$$
$$= \sum_x \left(x^2 + (E[X])^2 - 2E[X]\,x\right) p_X(x)$$
$$= \sum_x x^2 \, p_X(x) + (E[X])^2 \sum_x p_X(x) - 2E[X] \sum_x x \cdot p_X(x)$$
$$= E[X^2] + (E[X])^2 \cdot 1 - 2E[X] \cdot E[X]$$
$$= E[X^2] - (E[X])^2$$
So, if we wish to compute the variance, we have the formula
$$\mathrm{var}(X) = E[X^2] - (E[X])^2$$

You may be wondering why we gave such a complicated definition for the variance if it is meant to tell us about the average distance from the mean. The answer is because we had to. Consider the following quantity, which seems like it would do the job:
$$E[X - E[X]]$$
This is the average deviation from the mean. Recall that $E[\alpha X + \beta] = \alpha E[X] + \beta$; taking $\alpha = 1$ and $\beta = -E[X]$ gives
$$E[X - E[X]] = E[X] - E[X] = 0$$
This tells us that the average deviation from the mean is always 0, so it is not a useful quantity for us. One solution is to add absolute values to get $|X - E[X]|$, but it turns out to be more useful to use the quantity
$$E[|X - E[X]|^2]$$
We call this quantity the variance:
$$\mathrm{var}(X) = E[|X - E[X]|^2] = E[X^2] - (E[X])^2$$

A problem with the variance is that it gets the units wrong if you want to talk about the spread of a distribution. For example, if we start with units in meters, the variance will have units of meters squared. To remedy this problem, we define the standard deviation $\sigma_X$ to be
$$\sigma_X = \sqrt{\mathrm{var}(X)}$$
It gives us a quantity which is in the same units as the random variable we are dealing with.

Example

Consider the distribution with $p_X(1) = 1/4$, $p_X(3) = 1/2$, and $p_X(5) = 1/4$. [Figure: bar graph of this PMF.] We wish to find $E[X]$, $\mathrm{var}(X)$, and $\sigma_X$.
$$E[X] = \sum_x x \cdot p_X(x) = 1 \cdot \tfrac{1}{4} + 3 \cdot \tfrac{1}{2} + 5 \cdot \tfrac{1}{4} = 3$$
Note:
$$E[X^2] = \sum_x x^2 \cdot p_X(x) = 1^2 \cdot \tfrac{1}{4} + 3^2 \cdot \tfrac{1}{2} + 5^2 \cdot \tfrac{1}{4} = 11$$
Therefore,
$$\mathrm{var}(X) = E[X^2] - (E[X])^2 = 11 - 3^2 = 2$$
Thus, $\sigma_X = \sqrt{2}$.
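As a numerical sanity check on this example, here is a short Python sketch in the same style as the earlier one (helper names again my own). It uses the "addition over the $x$ line" rule $E[g(X)] = \sum_x g(x)\,p_X(x)$ with $g(x) = x^2$ to get $E[X^2]$, then applies $\mathrm{var}(X) = E[X^2] - (E[X])^2$ and $\sigma_X = \sqrt{\mathrm{var}(X)}$.

```python
from math import sqrt

def expected_value(pmf: dict, g=lambda x: x) -> float:
    """E[g(X)] = sum over x of g(x) * p_X(x) -- the "addition over the x line" formula."""
    return sum(g(x) * px for x, px in pmf.items())

def variance(pmf: dict) -> float:
    """var(X) = E[X^2] - (E[X])^2."""
    return expected_value(pmf, lambda x: x**2) - expected_value(pmf)**2

# The worked example: p_X(1) = 1/4, p_X(3) = 1/2, p_X(5) = 1/4.
pmf = {1: 1/4, 3: 1/2, 5: 1/4}
print(expected_value(pmf))   # 3.0
print(variance(pmf))         # 11 - 9 = 2.0
print(sqrt(variance(pmf)))   # sigma_X = sqrt(2) ~ 1.414
```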