Discrete Random Variables
STA 281 Fall 2011

1 Introduction

When we introduced probability, we said we had a sample space S containing every possible outcome of a random experiment. For a coin flip, the sample space might be S = {Heads, Tails}. It is extremely common for the sample space to consist solely of numbers. For example, if you roll a die, the sample space is S = {1,2,3,4,5,6}. Often, to allow the use of various mathematical operations, we convert non-numerical sample spaces into numerical ones. Instead of using S = {Heads, Tails} as the sample space, we could assign a number to heads and a number to tails, for example S = {Heads = 1, Tails = 0}. This results in a numerical sample space. Any time you have a numerical sample space, the experiment is said to be a random variable. The purpose of this handout is to define some common terminology associated with random variables. In addition, since the sample space is restricted to numbers, we can perform mathematical operations on the outcomes. Expectations and variances are two common mathematical operations that have meaning for random variables. We will also discuss methods for combining multiple random variables through linear combinations.

Random variables, meaning experiments that result in numbers, are typically denoted by capital letters near the end of the alphabet, like X and Y. The outcomes associated with the random variables are typically written in lower case letters, such as x and y. We also tend to write probability statements slightly differently. When we roll a die, the sample space is S = {1,2,3,4,5,6}. We had referred to probability statements such as P({1,2}), meaning the probability of the set {1,2}. When we are using random variables, we usually write statements such as P(X=1), P(X<3), P(4<X≤5), or P(X ∈ {1,4}). The actual probabilities are computed exactly as before. This is a notational difference, not a change in the underlying definition of probability. Since we are not changing the definition of probability, all of the theorems we previously derived still apply. We can still say P(X>4) = 1 − P(X≤4) by the complement rule, or that P(X≥3) = P(3≤X≤5) + P(X>5) by axiom 3 (for disjoint events, the probability of a union is the sum of the probabilities).

Probability tables can be turned into random variables by associating a number with each one of the cells. Suppose customers at a store can buy either a TV or a DVD player (they might buy both or neither) according to the probabilities

          TV      TVᶜ
DVD      0.10    0.20    0.30
DVDᶜ     0.50    0.20    0.70
         0.60    0.40    1.00

Suppose we are given the information that a TV costs $250 and a DVD player costs $100. We may then construct a table showing the cost for each outcome in the probability table.

          TV      TVᶜ
DVD       350     100
DVDᶜ      250       0

Calling this random variable C, we find P(C=350)=0.1, P(C=100)=0.2, P(C>150)=0.6, P(50<C<300)=0.7, and so on. Just refer back to the original sample space to find the probabilities.

2 Expected Values

One fundamental aspect of a random variable is its expected value. Suppose we are selling insurance. On any particular day, we might sell 0, 1, or 2 policies. The probabilities associated with these events are P(X=0)=0.4, P(X=1)=0.5, and P(X=2)=0.1. Suppose we want to know, on average, how many policies we sell each day. Of course, the number of policies sold each day is random, so we can't say for sure how many policies we will sell in any particular time period (day, week, year, etc.). However, we certainly know what we expect to happen.
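Before working this out by hand, it helps to see the long-run interpretation concretely. Below is a minimal Python sketch (an illustrative simulation, with an arbitrary seed) of many days of policy sales under the probabilities above.

```python
import random

# Daily policy sales: P(X=0)=0.4, P(X=1)=0.5, P(X=2)=0.1
outcomes = [0, 1, 2]
probs = [0.4, 0.5, 0.1]

random.seed(281)  # arbitrary seed, for reproducibility
days = 100_000
sales = random.choices(outcomes, weights=probs, k=days)

for x in outcomes:
    print(f"fraction of days with {x} sales:", sales.count(x) / days)
print("average sales per day:", sum(sales) / days)
# The fractions settle near 0.4, 0.5, and 0.1, and the average settles
# near the expected value computed below.
```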
Probabilities indicate the long-term frequency of an event. In the long term, we expect 40% of the days to result in 0 policies sold, 50% of the days to have 1 policy sold, and 10% of the days to have 2 policies sold. Over the next 100 days, we expect to have 40 days with no sales, 50 days with 1 sale, and 10 days with 2 sales. These are derived directly from the probabilities, with 40 being 40% of 100 days and so on. Let Xi be the number of policies sold on day i. Each day is random, so each day is a random variable. To find the expected sales per day over the hundred days, we just take the arithmetic average

(X1 + X2 + ⋯ + X100)/100

We expect 40 of the Xi to equal 0, 50 to equal 1, and 10 to equal 2. Therefore, we expect the average to be

(40(0) + 50(1) + 10(2))/100 = 70/100 = 0.7

So we expect to sell 0.7 policies per day. That of course does not mean we expect to ever sell a fraction of a policy. The average number of policies per day is not an integer, but this is not a problem. We are calling this the average we expect to see over time. Thus, the 0.7 should be interpreted as: we expect to sell 7 policies every 10 days, or 70 policies every 100 days, or 700 policies every 1000 days, and so on.

2.1 Formal Definition of Expectation

Observe how the calculation of the expected average worked. We computed

(40(0) + 50(1) + 10(2))/100 = (40/100)(0) + (50/100)(1) + (10/100)(2) = (0.4)(0) + (0.5)(1) + (0.1)(2) = 0.7

Simplified in this way, we find that each outcome is multiplied by its corresponding probability, and then the sum is taken. This should make sense, because we got the 40 in the first place by multiplying 0.4 (the probability of making 0 sales) by 100 (the number of days). Dividing by 100 when taking the average cancels out the multiplication. This expected long-run average is called the expected value. We define the expected value of X (written E[X] or µX) to be

E[X] = ∑ x P(X=x)

where the sum is over all possible outcomes of X. As another example, if we are rolling a die, the possible values are x=1 through x=6, each with probability 1/6. The expected value of X is

E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 3.5

Expected values are extremely valuable. They are fundamental to any financial institution that faces uncertainty, such as banks issuing loans, insurance companies, casinos, governments that run lotteries, etc. We motivated the expected value as the value we expect the sample average to take. It turns out that (perhaps intuitively), for large samples, the sample average is very likely to be very close to the expected value. The sample average is still random, but as the sample becomes larger, the sample average becomes less and less random, instead converging to the expected value. Thus casinos can set the price of bets just above the expected payoff to them and be virtually assured of making a profit. Insurance companies can price policies based on the expected value and be almost certain their revenue will exceed their payouts, even though they cannot predict in any single case whether the policy will have to be paid out. We will investigate this property further when we discuss the Central Limit Theorem. For the moment, we will focus on calculating the expected value for a variety of situations.
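The definition translates directly into a computation: multiply each outcome by its probability and sum. Here is a minimal Python sketch of that calculation, storing a distribution as a dictionary (an illustrative helper, not a fixed piece of course code).

```python
def expected_value(dist):
    """E[X] for a discrete distribution given as {outcome: probability}."""
    return sum(x * p for x, p in dist.items())

# Insurance example: P(X=0)=0.4, P(X=1)=0.5, P(X=2)=0.1
print(expected_value({0: 0.4, 1: 0.5, 2: 0.1}))  # ≈ 0.7

# Fair die: each face 1..6 has probability 1/6
print(expected_value({x: 1/6 for x in range(1, 7)}))  # ≈ 3.5
```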
2.2 Expectations for Functions of Random Variables

Suppose we are interested in the expectation of a transformation of a random variable. For example, return to the insurance example. Suppose that we get commissions in addition to our salary. Suppose the commission for making 0 sales is $0, for 1 sale is $20, and for 2 sales is $50. This is really all that is meant by a transformation of a random variable. If I tell you the number of sales in a given day, you can determine how much the commission is. Let's call the commission Y. If X=0 then Y=0, if X=1 then Y=20, and if X=2 then Y=50. For every value of X we can find the value of Y. In addition, we can also identify the possible values of Y and the probability associated with each one. To identify the possible values of Y, just look at each of the possible values of X and see where they map to. In our example, the possible values of Y are 0, 20, and 50. To find their corresponding probabilities, again look to the X variable. Because Y=20 only if X=1, and P(X=1)=0.5, the probability that Y=20 is also 0.5. Similarly, P(Y=0)=0.4 and P(Y=50)=0.1. The expected value of Y is

E[Y] = 0(0.4) + 20(0.5) + 50(0.1) = 15

The probabilities in this expectation are just the probabilities associated with the original variable. In general,

E[h(X)] = ∑ h(x) P(X=x)

where Y = h(X) is the transformation of the random variable and the sum is over all possible outcomes of X. In our example h(0)=0, h(1)=20, and h(2)=50. Basically, this just says that for any function h(X), you can place it in the sum.

The function h(X) does not need to be one to one. For example, construct a new random variable W which is 0 if there are no sales and 1 if there are 1 or 2 sales. All W is measuring is whether or not there were any sales at all, with 0 being no and 1 being yes. For this example, h(0)=0, h(1)=1, and h(2)=1. The function is not one to one because two different values of X (1 and 2) both map to W=1. This does not make it any more difficult to calculate the probabilities associated with W or the expectation of W. By the reasoning from the previous example, P(W=0)=0.4. Since both X=1 and X=2 map to W=1, we combine their probabilities to find P(W=1)=0.5+0.1=0.6. The expectation of W is

E[W] = 0(0.4) + 1(0.6) = 0.6

This could also be calculated using the formula E[h(X)] = ∑ h(x) P(X=x), where h(0)=0, h(1)=1, and h(2)=1:

E[W] = h(0)(0.4) + h(1)(0.5) + h(2)(0.1) = 0(0.4) + 1(0.5) + 1(0.1) = 0.6

2.3 Variance

The expected value provides a summary of the central location of the random variable, specifying where, on average, the random variable tends to place mass. We might also be interested in the spread of the distribution. Suppose a football team averages 3 yards a play. If the team makes exactly 3 yards every play, the team is invincible. Every 4 plays they make a first down, and they score on every drive. However, if they make 0 yards on 90% of their plays and 30 yards on 10% of their plays (the expected value is still 3), they are far from guaranteed to make a first down. So we also want to know how variable our variable is. To measure this, we look at something called the variance of X (written V[X] or σX²), which is defined as

V[X] = E[(X − µX)²] = ∑ (x − µX)² P(X=x)

The square root of this quantity is called the standard deviation of X, and is typically written σX. The quantity (X − µX)² measures how far the random variable X is from where we expect it to be. If this quantity is large on average, then X is likely to be far from its expected value, and hence is highly variable. If it is small on average, then X tends to stay close to its expected value, and hence is not that variable. Notice that the only way (X − µX)² = 0 is if X = µX. If the variance is 0, that means X is always equal to µX.

Here are some facts about the variance of X:

1. V[X] ≥ 0. This can be seen from just looking at the sum: (x − µX)² is always nonnegative, as is P(X=x), so the sum must be nonnegative.

2. V[X] = E[X²] − (E[X])². Proof:

V[X] = ∑ (x − µX)² P(X=x) = ∑ x² P(X=x) − 2µX ∑ x P(X=x) + µX² ∑ P(X=x) = E[X²] − 2µX² + µX² = E[X²] − (E[X])²

The second fact is usually used to compute V[X], as it involves fewer operations.
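Both variance formulas are easy to check numerically. Below is a minimal Python sketch that computes the variance of a die roll both ways (an illustration; note that E[X²] is itself the expectation of the transformation h(X)=X² from the previous section).

```python
def mean(dist):
    """E[X] for a distribution given as {outcome: probability}."""
    return sum(x * p for x, p in dist.items())

die = {x: 1/6 for x in range(1, 7)}
m = mean(die)

# Definition: V[X] = sum of (x - mu)^2 * P(X=x)
v_definition = sum((x - m) ** 2 * p for x, p in die.items())

# Shortcut: V[X] = E[X^2] - (E[X])^2, using h(x) = x^2
v_shortcut = sum(x ** 2 * p for x, p in die.items()) - m ** 2

print(v_definition, v_shortcut)  # both ≈ 35/12 ≈ 2.9167
```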
In our insurance policy example, we found E[X]=0.7. To find the variance, we may either use

V[X] = ∑ (x − µX)² P(X=x) = (0 − 0.7)²(0.4) + (1 − 0.7)²(0.5) + (2 − 0.7)²(0.1) = 0.196 + 0.045 + 0.169 = 0.41

or

V[X] = E[X²] − (E[X])² = (0²(0.4) + 1²(0.5) + 2²(0.1)) − (0.7)² = 0.9 − 0.49 = 0.41

The second calculation is slightly easier, and it is the recommended way to actually compute variances for a random variable. The first formula has some useful mathematical properties, and hence is used for the definition.

As with the expected value, the variance is essential to understanding the long-run behavior of the sample average. We stated in a previous section that, for large samples, the sample average tends to be close to the expected value. The variance determines how close is "close". As with expected values, we will return to their practical uses when we discuss the Central Limit Theorem. For the moment, you should be familiar with how to use the distribution of a random variable (the possible values of the random variable combined with their respective probabilities) to compute the expectation and variance of X.

3 Linear Transformations and Combinations

Some functions of random variables have a particularly simple structure. For example, if X is a random variable measured in feet, then the random variable Y=12X measures the same quantity in inches. Only slightly more complicated is changing temperature scales from Fahrenheit to Celsius. If F is a temperature measured in Fahrenheit, then C=(5/9)(F−32) is the same measurement in Celsius. We've already discussed computing means and variances for transformations. In particular, recall the formula

E[h(X)] = ∑ h(x) P(X=x)

For many transformations (i.e., functions h(X)) we have to go through this general formula. However, for a particular set of transformations, called linear transformations, the general formula simplifies considerably, and thus we can obtain simpler formulas for E[h(X)] and V[h(X)].

3.1 Linear Transformations

Let X be a random variable. A linear transformation of X is the quantity Y = aX + b for some constants a and b. In problems, you will have to determine these constants from the given situation. For example, suppose you are running a carnival. You charge $5 for admission and then charge $2 for tickets that may be used for rides. Suppose further the number of tickets a particular person buys, X, is a random variable. Suppose we want an expression for the amount of money spent by that particular person. Letting Y denote the amount of money spent, we can find a function relating Y to X. If a person buys X tickets, they spend 2X dollars on tickets. In addition, they spend $5 to enter the carnival. The total amount spent is Y = 2X + 5. Since Y has the correct form (aX + b, with a=2 and b=5), we say Y is a linear transformation of X.

If we know the mean and variance of X, then there are simple formulas to compute the mean and variance of a linear transformation Y. We know that E[X] = ∑ x P(X=x) and E[h(X)] = ∑ h(x) P(X=x). A linear transformation is a particular form of h(X), so

E[aX + b] = ∑ (ax + b) P(X=x) = a ∑ x P(X=x) + b ∑ P(X=x) = aE[X] + b

V[aX + b] = ∑ (ax + b − (aµX + b))² P(X=x) = a² ∑ (x − µX)² P(X=x) = a²V[X]

The expectation formula is simple to remember: the expectation of a linear transformation is the same linear transformation of the expectation. The variance requires a little more explanation. Variance is a measure of spread. Adding a constant b doesn't change the spread, so it can be ignored in computing the variance. When we multiply by a, we have to remember the variance is in squared units, so the constant a is squared in the formula.

Suppose for our carnival example any particular person buys an average of 5 tickets with a variance of 9 tickets. What is the mean and variance of the amount of money spent by a particular person? Using the formulas, E[Y] = 2E[X] + 5 = 2(5) + 5 = 15 and V[Y] = 2²V[X] = 4(9) = 36.
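To check the linear transformation rules numerically, here is a minimal Python sketch. The ticket distribution below is hypothetical, chosen only so that its mean is 5 and its variance is 9 as in the carnival example.

```python
def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

# Hypothetical ticket counts with mean 5 and variance 9
tickets = {2: 0.5, 8: 0.5}

# Money spent: Y = 2X + 5, applied outcome by outcome
money = {2 * x + 5: p for x, p in tickets.items()}

print(mean(money), 2 * mean(tickets) + 5)      # both 15.0
print(variance(money), 4 * variance(tickets))  # both 36.0
```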
3.2 Linear Combinations

Often we deal with several random variables at once. A linear combination of two random variables X and Y has the form aX + bY + c, where a, b, and c are fixed constants found from the problem. Linear combinations can involve many random variables. A linear combination Y of n random variables X1, …, Xn is

Y = a1X1 + a2X2 + ⋯ + anXn + b

where the ai and b coefficients are fixed values which, again, are found in the problem. We will not go through the proofs of how to compute the mean and variance of a linear combination, but they are similar to those for a linear transformation of a single random variable. The mean of a linear combination is

E[Y] = a1E[X1] + a2E[X2] + ⋯ + anE[Xn] + b

so the expectation of a linear combination is the same linear combination of the expectations. If all the random variables in the linear combination are independent (the following is not true without this assumption), then the variance of a linear combination is

V[Y] = a1²V[X1] + a2²V[X2] + ⋯ + an²V[Xn]

Two simple linear combinations of two independent random variables are Z1 = X + Y and Z2 = X − Y, where X and Y are independent. These may be written Z1 = 1X + 1Y + 0 and Z2 = 1X + (−1)Y + 0. Using the formulas, we may derive E[Z1] = E[X] + E[Y], E[Z2] = E[X] − E[Y], V[Z1] = V[X] + V[Y], and V[Z2] = V[X] + V[Y]. Note that while the variance of a sum is the sum of the variances, the variance of a difference is also the sum of the variances. If we add independent sources of noise to a problem, we increase the overall noise of the system.

3.3 Examples

Tables and Chairs

At a school, each room contains only tables and chairs. Suppose that, on average, each room contains 50 chairs with a standard deviation of 20 chairs. Suppose further that, on average, each room contains 5 tables with a standard deviation of 2 tables. Each chair weighs 10 pounds and each table weighs 30 pounds. You select a random room and place all the items in the room into a 10000 pound truck.

a) Write a formula for Z, the total weight of the truck and its contents after the truck has been loaded with the contents of the room.

b) What are the mean and variance of Z? (A computational sketch for this exercise appears after the next example.)

Christmas Donations

Each year at Christmas, charity donations are accepted at a local grocery chain. Suppose the chain donates an initial $100 and then customers donate in either Lexington or Nicholasville. In Nicholasville, customers donate an average of $2100 with a standard deviation of $100, while in Lexington customers donate an average of $5500 with a variance of 90000. Suppose further that the local grocery chain matches the donations. For every dollar donated in Nicholasville, the local grocery chain gives an additional $0.25, and for every dollar donated in Lexington the local grocery chain gives an additional $0.50. What are the mean, variance, and standard deviation of the total amount of donations to the charity in a given year?
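As an illustration of the linear combination formulas, here is a minimal Python sketch applied to the Tables and Chairs exercise. It assumes the chair and table counts in a room are independent, which the variance formula requires.

```python
def linear_combination_stats(terms, constant=0.0):
    """Mean and variance of sum(a_i * X_i) + constant for independent X_i.

    terms: list of (a_i, mean_i, var_i) tuples.
    """
    m = constant + sum(a * mu for a, mu, _ in terms)
    v = sum(a ** 2 * var for a, _, var in terms)  # requires independence
    return m, v

# Z = 10 * chairs + 30 * tables + 10000 (truck weight, in pounds)
# chairs: mean 50, sd 20 (var 400); tables: mean 5, sd 2 (var 4)
print(linear_combination_stats([(10, 50, 400), (30, 5, 4)], constant=10000))
# (10650.0, 43600)
```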
TV, DVD player Example

Recall our example with a customer buying either a TV or a DVD player. Letting C be the total cost, we may compute the mean and variance of C directly from the definitions, since we know the costs and probabilities associated with the four outcomes. We find

E[C] = 350(0.1) + 100(0.2) + 250(0.5) + 0(0.2) = 180

E[C²] = 350²(0.1) + 100²(0.2) + 250²(0.5) + 0²(0.2) = 45500

V[C] = E[C²] − (E[C])² = 45500 − 180² = 13100

Notice that the cost is really just the sum of the amount spent on TVs, Ct, and the amount spent on DVD players, Cd. The random variable Ct has two possible values: 0 if they did not buy a TV, which occurs with probability 0.4, and 250 if they did buy a TV, which occurs with probability 0.6. Similarly, Cd has two values: 0 with probability 0.7 and 100 with probability 0.3. We may find that

E[Ct] = 0(0.4) + 250(0.6) = 150

E[Cd] = 0(0.7) + 100(0.3) = 30

We may also find V[Ct] = 15000 and V[Cd] = 2100. We said the total cost is the sum of Ct and Cd, so we may write

C = Ct + Cd

Using the formula for expectation, we find E[C] = E[Ct] + E[Cd] = 150 + 30 = 180, which agrees with our previous calculation. However, the variances do not sum to V[C], since V[Ct] + V[Cd] = 15000 + 2100 = 17100, not the 13100 we derived previously. Why? The random variables Ct and Cd are not independent. The events "buy a TV" and "buy a DVD player" are not independent because P(TV ∩ DVD) = 0.10, while P(TV)P(DVD) = (0.60)(0.30) = 0.18. The variance formula only applies to independent random variables.
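These calculations can be reproduced from the joint table directly. Here is a minimal Python sketch that does so and makes the gap caused by the dependence visible.

```python
# Joint distribution over (bought TV, bought DVD) from the table in Section 1
joint = {(True, True): 0.10, (False, True): 0.20,
         (True, False): 0.50, (False, False): 0.20}
TV_PRICE, DVD_PRICE = 250, 100

def stats(value):
    """Mean and variance of value(outcome) under the joint distribution."""
    m = sum(value(o) * p for o, p in joint.items())
    v = sum(value(o) ** 2 * p for o, p in joint.items()) - m ** 2
    return m, v

print(stats(lambda o: TV_PRICE * o[0] + DVD_PRICE * o[1]))  # C:  ≈ (180, 13100)
print(stats(lambda o: TV_PRICE * o[0]))                     # Ct: ≈ (150, 15000)
print(stats(lambda o: DVD_PRICE * o[1]))                    # Cd: ≈ (30, 2100)
# V[Ct] + V[Cd] = 17100, not V[C] = 13100, because Ct and Cd are dependent:
# P(TV and DVD) = 0.10, while P(TV) * P(DVD) = 0.6 * 0.3 = 0.18.
```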