Discrete Random Variables
STA 281 Fall 2011
1 Introduction
When we introduced probability, we said we had a sample space S which contained every possible
outcome of a random experiment. For a coin flip, the sample space might be S = {Heads, Tails}. It is
extremely common for the sample space to consist solely of numbers. For example, if you roll a die, the
sample space is S = {1,2,3,4,5,6}. Often, to allow use of various mathematical operations, we convert
non-numerical sample spaces into numerical sample spaces. Instead of using S = {Heads, Tails} as the
sample space, we could just assign a number to heads and a number to tails, for example S = {Heads = 1,
Tails = 0}. This would result in a numerical sample space.
Anytime you have a numerical sample space, the experiment is said to be a random variable. The
purpose of this handout is to define some common terminology associated with random variables. In
addition, since the sample space is restricted to numbers, we can perform mathematical operations on
the outcomes. Expectations and variances are two common mathematical operations that have meaning
for random variables. We will also discuss methods for combining multiple random variables through
linear combinations.
Random variables, meaning experiments that result in numbers, are typically denoted by capital
letters near the end of the alphabet, like X and Y. The outcomes associated with the random variables
are typically written in lower case letters, such as x and y. We also tend to write probability statements
slightly differently. When we roll a die, the sample space is S = {1,2,3,4,5,6}. We had referred to
probability statements such as P({1,2}) meaning the probability of the set {1,2}. When we are using
random variables, we usually write statements such as P(X=1), P(X<3), P(4<X≤5), or P(X ∈ {1,4}).
The actual probabilities are computed exactly as before. This is a notational difference, not any change
in the underlying definition of probability. Since we are not changing the definition of probability, all of
the theorems we previously derived still apply. We can still say P(X>4) = 1-P(X≤4) by the complement
rule, or that P(X≥3) = P(3≤X≤5)+P(X>5) by using axiom 3 (for disjoint events, the probability of a
union is the sum of the probabilities).
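For example, with the die roll we can check the complement rule directly: P(X>4) = P(X ∈ {5,6}) = 2/6, while 1−P(X≤4) = 1−4/6 = 2/6, so the two calculations agree.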
Probability tables can be turned into random variables by associating a number with each one of the
cells. Suppose customers to a store can buy either a TV or a DVD player (they might buy both or
neither) according to the probabilities
          TV      TVᶜ
DVD      0.10    0.20    0.30
DVDᶜ     0.50    0.20    0.70
         0.60    0.40    1.00
Suppose we are given the information that a TV costs $250 and a DVD costs $100. We may also
construct a table showing the costs for each outcome in the probability table.
          TV      TVᶜ
DVD      350     100
DVDᶜ     250       0
Calling this random variable C, we find P(C=350)=0.1, P(C=100)=0.2, P(C>150)=0.6,
P(50<C<300)=0.7, and so on. Just refer back to the original sample space to find the probabilities.
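This bookkeeping is easy to automate. Below is a minimal Python sketch (the names joint and cost_dist and the layout are only illustrative) that builds the distribution of C from the tables above and recovers the probabilities listed:

```python
# Joint probabilities from the table: (bought TV, bought DVD) -> probability
joint = {
    (True, True): 0.10, (False, True): 0.20,
    (True, False): 0.50, (False, False): 0.20,
}
TV_COST, DVD_COST = 250, 100

# Distribution of the total cost C: cost value -> probability
cost_dist = {}
for (tv, dvd), p in joint.items():
    c = TV_COST * tv + DVD_COST * dvd
    cost_dist[c] = cost_dist.get(c, 0) + p

print(cost_dist)  # {350: 0.1, 100: 0.2, 250: 0.5, 0: 0.2}
print(round(sum(p for c, p in cost_dist.items() if c > 150), 2))       # P(C > 150) = 0.6
print(round(sum(p for c, p in cost_dist.items() if 50 < c < 300), 2))  # P(50 < C < 300) = 0.7
```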
2 Expected Values
One fundamental aspect of a random variable is its expected value. Suppose we are selling insurance.
On any particular day, we might sell 0, 1, or 2 policies. The probabilities associated with these events
are P(X=0)=0.4, P(X=1)=0.5, and P(X=2)=0.1. Suppose we want to know, on average, how many
policies we sell each day.
Of course, the number of policies sold each day is random, so we can’t say for sure how many
policies we will sell in any particular time period (day, week, year, etc.). However, we certainly know
what we expect to happen. Probabilities indicate the long term frequency of an event. In the long term,
we expect 40% of the days will result in 0 policies sold, 50% of the days will have 1 policy sold, and 10%
of the days will have 2 policies sold. Over the next 100 days, we expect to have 40 days with no sales,
50 days with 1 sale, and 10 days with 2 sales. These are derived directly from the probabilities, with 40
being 40% of 100 days and so on. Let Xi be the number of policies sold on day i. The number sold each
day is random, so each Xi is a random variable.
To find the expected sales per day over the hundred days, we just take the arithmetic average
\[ \bar{X} = \frac{X_1 + X_2 + \dots + X_{100}}{100} = \frac{1}{100}\sum_{i=1}^{100} X_i \]
We expect 40 of the Xi to equal 0, 50 to equal 1, and 10 to equal 2. Therefore, we expect the average to
be
\[ \frac{40(0) + 50(1) + 10(2)}{100} = \frac{70}{100} = 0.7 \]
So we expect to sell 0.7 policies per day. That of course does not mean we expect to ever sell a fraction
of a policy. The average number of policies per day is not an integer, but this is not a problem. We are
calling this the average we expect to see over time. Thus, the 0.7 should be interpreted as we expect to
sell 7 policies every 10 days, or 70 policies every 100 days, or 700 policies every 1000 days, and so on.
2.1 Formal Definition of Expectation
Observe how the calculation of expected average worked. We computed
\[ \frac{40(0) + 50(1) + 10(2)}{100} = \frac{40}{100}(0) + \frac{50}{100}(1) + \frac{10}{100}(2) = (0.4)(0) + (0.5)(1) + (0.1)(2) = 0.7 \]
Simplified in this way, we find that each outcome is multiplied by its corresponding probability, and
then the sum is taken. This should make sense because we got the 40 in the first place by multiplying
0.4 (the probability of making 0 sales) by 100 (the number of days). Dividing by 100 in taking the
average cancels out the multiplication.
This expected long run average is called the expected value. We define the expected value of X
(written E[X] or µX) to be
\[ E[X] = \sum_x x \, P(X = x) \]
where the sum is over all possible outcomes of X. As another example, if we are rolling a die, the
possible values are x=1 through x=6, each with probability 1/6. The expected value of X is
\[ E[X] = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = \tfrac{21}{6} = 3.5 \]
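In code, this definition is just a sum over the distribution. A minimal Python sketch (the function name and dictionaries are only illustrative):

```python
def expected_value(dist):
    """E[X] for a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

die = {x: 1/6 for x in range(1, 7)}
policies = {0: 0.4, 1: 0.5, 2: 0.1}

print(round(expected_value(die), 6))       # 3.5
print(round(expected_value(policies), 6))  # 0.7
```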
Expected values are extremely valuable. They are fundamental to any financial institution that
faces uncertainty, such as banks issuing loans, insurance companies, casinos, governments who run
lotteries, etc. We motivated the expected value as our expected value of the sample average. It turns
out that (perhaps intuitively), for large samples, the sample average is very likely to be very close to the
expected value. The sample average is still random, but as the sample becomes larger, the sample
average becomes less and less random, instead converging to the expected value. Thus casinos can set
the price of bets just above the expected payoff to them and be virtually assured of making a profit.
Insurance companies can price policies based on the expected value and be almost certain their revenue will
exceed their payouts, even though they cannot predict in any single case whether the policy will have to
be paid out. We will investigate this property further when we discuss the Central Limit Theorem.
For the moment, we will focus on calculating the expected value for a variety of situations.
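The claim that the sample average settles down near the expected value can be seen in a small simulation. This Python sketch (purely illustrative) simulates many days of policy sales and averages them:

```python
import random

values, probs = [0, 1, 2], [0.4, 0.5, 0.1]

# Simulate a large number of days and average the number of policies sold per day.
n_days = 100_000
sales = random.choices(values, weights=probs, k=n_days)
print(sum(sales) / n_days)  # close to the expected value 0.7
```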
2.2 Expectations for Functions of Random Variables
Suppose we are interested in the expectation of a transformation of a random variable. For example,
return to the insurance example. Suppose that we get commissions in addition to our salary. Suppose
the commission for making 0 sales is $0, for 1 sale is $20, and for 2 sales is $50. This is really all that is
meant by a transformation of a random variable. If I tell you the number of sales in a given day, you can
determine how much the commission is. Let’s call the commission Y. If X=0, then Y=0, if X=1 then
Y=20, and if X=2 then Y=50. For every value of X we can find the value of Y.
In addition, we can also identify the possible values of Y and the probability associated with each
one. To identify the possible values of Y, just look at each of the possible values of X and see where they
map to. In our example, the possible values of Y are 0, 20, and 50. To find their corresponding
probabilities, again look to the X variable. Because Y=20 only if X=1 and P(X=1)=0.5, the probability
that Y=20 is also 0.5. Similarly, P(Y=0)=0.4 and P(Y=50)=0.1.
The expected value of Y is
\[ E[Y] = 0(0.4) + 20(0.5) + 50(0.1) = 15 \]
The probabilities in this expectation are just the probabilities associated with the original variable. In
general
\[ E[h(X)] = \sum_x h(x) \, P(X = x) \]
where h(X)=Y is the transformation of the random variable. In our example h(0)=0, h(1)=20, and
h(2)=50. Basically, this just says that for any function h(X), you can just place it in the sum, so
\[ E[Y] = \sum_y y \, P(Y = y) = \sum_x h(x) \, P(X = x) = E[h(X)] \]
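The same computation in Python (a small sketch; the dictionary names are made up for illustration):

```python
# Distribution of daily sales X and the commission function h(x).
sales_dist = {0: 0.4, 1: 0.5, 2: 0.1}
commission = {0: 0, 1: 20, 2: 50}

# E[h(X)] = sum of h(x) * P(X = x) over all x
expected_commission = sum(commission[x] * p for x, p in sales_dist.items())
print(expected_commission)  # 15.0
```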
The function h(X)=Y does not need to be one to one. For example, construct a new random
variable W which is 0 if there are no sales and 1 if there are 1 or 2 sales. All W is measuring is whether
or not there were any sales at all, with 0 being no and 1 being yes. For this example, h(0)=0, h(1)=1,
and h(2)=1. The function is not one to one because two different values of X (1 and 2) both map to
W=1. This does not make it any more difficult to calculate the probability associated with W or the
expectation of W. By the reasoning from the previous example, P(W=0)=0.4. Since both X=1 and X=2
map to W=1, we combine their probabilities to find P(W=1)=0.5+0.1=0.6. The expectation of W is
\[ E[W] = 0(0.4) + 1(0.6) = 0.6 \]
This could also be calculated using the formula \( E[h(X)] = \sum_x h(x)\,P(X=x) \), where h(0)=0, h(1)=1, and h(2)=1:
\[ E[W] = 0(0.4) + 1(0.5) + 1(0.1) = 0.6 \]
2.3 Variance
The expected value provides a summary of the central location of the random variable, specifying where,
on average, the random variable tends to place mass. We might also be interested in the spread of the
distribution. Suppose a football team averages 3 yards a play. If the team makes exactly 3 yards every
play, the team is invincible. Every 4 plays they make a first down, and score every drive. However, if
they make 0 yards in 90% of their plays and 30 yards on 10% of their plays (expected value is still 3),
they are far from guaranteed to make a first down. So we also want to know how variable our variable
is.
To measure this, we look at something called the variance of X (written V[X] or σX²), which is defined as
\[ V[X] = E\bigl[(X - \mu_X)^2\bigr] = \sum_x (x - \mu_X)^2 \, P(X = x) \]
The square root of this quantity is called the standard deviation of X, and is typically written σX. The
quantity (X − µX)² measures how far the random variable X is from where we expect it to be. If this
quantity is large, then X is likely to be far from its expected value, and hence is highly variable. If
(X − µX)² is small, then X tends to stay close to its expected value, and hence is not that variable.
Notice the only way (X − µX)² can be 0 is if X = µX. If the variance is 0, that means that X is always equal to
µX.
Here are some facts about the variance of X:
• V[X] ≥ 0. This can be seen from just looking at the sum. (x − µX)² is always nonnegative, as is
P(X=x). Therefore the sum must be nonnegative.
• V[X] = E[X²] − (E[X])². Proof:
\[ V[X] = \sum_x (x - \mu_X)^2 P(X=x) = \sum_x x^2 P(X=x) - 2\mu_X \sum_x x\,P(X=x) + \mu_X^2 \sum_x P(X=x) = E[X^2] - 2\mu_X^2 + \mu_X^2 = E[X^2] - (E[X])^2 \]
The second item is usually used to compute V[X], as it involves fewer operations. In our insurance
policy example, we found E[X]=0.7. To find the variance, we may either use
\[ V[X] = \sum_x (x - \mu_X)^2 \, P(X=x) = (0-0.7)^2(0.4) + (1-0.7)^2(0.5) + (2-0.7)^2(0.1) = 0.196 + 0.045 + 0.169 = 0.41 \]
or
\[ V[X] = E[X^2] - (E[X])^2 = \bigl(0^2(0.4) + 1^2(0.5) + 2^2(0.1)\bigr) - (0.7)^2 = 0.9 - 0.49 = 0.41 \]
The second calculation is slightly easier, and is the recommended way to actually compute variances for a
random variable. The first formula has some useful mathematical properties, and hence is used for the
definition.
As with expected value, the variance also is essential to understanding the long run behavior of the
sample average. We stated in a previous section that, for large samples, the sample average tends to be
close to the expected value. The variance determines how close is “close”. As with expected values, we
will return to their practical uses when we discuss the Central Limit Theorem. For the
moment, you should be familiar with how to use the distribution of a random variable (the possible
values of the random variable combined with their respective probabilities) to compute the expectation
and variance of X.
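Both formulas are straightforward to compute in code. A minimal Python sketch for the insurance example (the function names are illustrative):

```python
def expected_value(dist):
    """E[X] for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """V[X] via the shortcut formula E[X^2] - (E[X])^2."""
    mean = expected_value(dist)
    return sum(x**2 * p for x, p in dist.items()) - mean**2

policies = {0: 0.4, 1: 0.5, 2: 0.1}
print(expected_value(policies))  # 0.7
print(variance(policies))        # 0.41 (up to a tiny floating-point error)
```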
3 Linear Transformations and Combinations
Some functions of random variables have a particularly simple structure. For example, if X is a random
variable measured in feet, then the random variable Y=12X would measure the same quantity in inches.
Only slightly more complicated is changing temperature scales from Fahrenheit to Celsius. If F is a
temperature measure in Fahrenheit, then C=(5/9)(F-32) is the same measurement in Celsius. We’ve
already discussed computing means and variances for transformations. In particular, recall the formula
\[ E[h(X)] = \sum_x h(x) \, P(X = x) \]
For many transformations (i.e. functions h(X)) we have to go through this general formula. However,
for a particular set of transformations, called linear transformations, the general formula simplifies
considerably and thus we can acquire simpler formulas for E[h(X)] and V[h(X)].
3.1 Linear Transformations
Let X be a random variable. A linear transformation of X is the quantity Y=aX+b for some constants a
and b. In problems, you will have to determine these constants from the given situation.
For example, suppose you are running a carnival. You charge $5 for admission and then charge $2
for tickets that may be used for rides. Suppose further the number of tickets a particular person buys, X,
is a random variable. Suppose we want an expression for the amount of money spent by that particular
person. Letting Y denote the amount of money spent, we can find a function relating Y to X. If a
person buys X tickets, then they spend 2X dollars on tickets. In addition, they spend $5 to enter the
carnival. The total amount spent is Y=2X+5. Since Y has the correct form (aX+b, with a=2 and b=5),
we say Y is a linear transformation of X.
If we know the mean and variance of X, then there are simple formulas to compute the mean and
variance of a linear transformation Y. We know that \( E[X] = \sum_x x\,P(X=x) \) and
\( E[h(X)] = \sum_x h(x)\,P(X=x) \). A linear transformation is a particular form of h(X), so
\[ E[aX + b] = aE[X] + b \]
Proof: \( E[aX+b] = \sum_x (ax+b)\,P(X=x) = a\sum_x x\,P(X=x) + b\sum_x P(X=x) = aE[X] + b(1) = aE[X] + b. \)
\[ V[aX + b] = a^2 V[X] \]
Proof: since \( \mu_{aX+b} = aE[X] + b = a\mu_X + b \), we have
\( V[aX+b] = \sum_x \bigl(ax + b - (a\mu_X + b)\bigr)^2 P(X=x) = a^2 \sum_x (x - \mu_X)^2 P(X=x) = a^2 V[X]. \)
The expectation formula is simple to remember: the expectation of a linear transformation is the
same linear transformation of the expectation. The variance requires a little more explanation.
Variance is a measure of spread. Adding a constant b doesn’t change the spread, so it can be ignored in
computing the variance. When we multiply by a, we have to remember the variance is in squared units,
so the constant a is squared in the formula.
Suppose for our carnival example any particular person buys an average of 5 tickets with a variance
of 9 tickets. What are the mean and variance of the amount of money spent by a particular person?
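Applying the formulas above to Y = 2X + 5 gives a quick worked answer:
\[ E[Y] = 2E[X] + 5 = 2(5) + 5 = 15, \qquad V[Y] = 2^2 V[X] = 4(9) = 36. \]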
3.2 Linear Combinations
Often we deal with several random variables at once. A linear combination of two random variables X
and Y has the form aX+bY+c, where a, b, and c are fixed constants found from the problem. Linear
combinations can involve many random variables. A linear combination Y of n random variables X1,…,
Xn is
\[ Y = a_1 X_1 + a_2 X_2 + \dots + a_n X_n + b \]
where the ai and b coefficients are fixed values which, again, are found in the problem.
We will not go through the proofs on how to compute the mean and variance of a linear
combination, but they are similar to the formulas for a linear transformation of a single random variable.
The mean of a linear combination is
\[ E[a_1 X_1 + a_2 X_2 + \dots + a_n X_n + b] = a_1 E[X_1] + a_2 E[X_2] + \dots + a_n E[X_n] + b \]
so the expectation of a linear combination is the same linear combination of the expectations.
If all the random variables in the linear combination are independent (the following is not
true without this assumption), then the variance of a linear combination is
\[ V[a_1 X_1 + a_2 X_2 + \dots + a_n X_n + b] = a_1^2 V[X_1] + a_2^2 V[X_2] + \dots + a_n^2 V[X_n] \]
Two simple linear combinations of two independent random variables are Z1=X+Y and Z2=X-Y, where
X and Y are independent. These may be written Z1=1X+1Y+0 and Z2=1X+(-1)Y+0. Using the
formulas, we may derive E[Z1]=E[X]+E[Y], E[Z2]=E[X]-E[Y], V[Z1]=V[X]+V[Y], and
V[Z2]=V[X]+V[Y]. Note that while the variance of a sum is the sum of the variances, the variance of a
difference is also the sum of the variances. If we add independent sources of noise into a problem, we
increase the overall noise of the system.
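The fact that variances add even for a difference is easy to check by simulation. A short Python sketch (illustrative only; the normal distributions below are arbitrary choices, not from this handout):

```python
import random
import statistics

# Two independent random variables, simulated many times.
n = 200_000
x = [random.gauss(10, 2) for _ in range(n)]  # V[X] = 4
y = [random.gauss(3, 1) for _ in range(n)]   # V[Y] = 1

diff = [xi - yi for xi, yi in zip(x, y)]
print(statistics.variance(diff))  # close to V[X] + V[Y] = 5, not V[X] - V[Y] = 3
```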
3.3 Examples
Tables and Chairs
At a school, each room contains only tables and chairs. Suppose that, on average, each room
contains 50 chairs with a standard deviation of 20 chairs. Suppose further that, on average, each room
contains 5 tables with a standard deviation of 2 tables. Each chair weighs 10 pounds and each table
weighs 30 pounds. You select a random room and place all the items in the room into a 10000 pound
truck.
a) Write a formula for Z, the total weight of the truck and its contents after the truck has been
loaded with the contents of the room.
b) What are the mean and variance of Z?
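One way to set this up (a sketch of a solution; it assumes the numbers of chairs and tables in a room are independent, which the variance formula requires): letting C be the number of chairs and T the number of tables,
\[ Z = 10000 + 10C + 30T \]
\[ E[Z] = 10000 + 10E[C] + 30E[T] = 10000 + 10(50) + 30(5) = 10650 \]
\[ V[Z] = 10^2 V[C] + 30^2 V[T] = 100(20^2) + 900(2^2) = 40000 + 3600 = 43600 \]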
Christmas Donations
Each year at Christmas, charity donations are accepted at a local grocery chain. Suppose the chain
donates an initial $100 and then customers donate in either Lexington or Nicholasville. In
Nicholasville, customers donate an average of $2100 with a standard deviation of $100, while in
Lexington customers donate an average of $5500 with a variance of 90000. Suppose further that the
local grocery chain matches the donations. For every dollar donated in Nicholasville, the local grocery
chain gives an additional $0.25 and for every dollar donated in Lexington the local grocery chain gives
an additional $0.50. What are the mean, variance, and standard deviation of the total amount of
donations to the charity in a given year?
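A sketch of one way to work this example (it assumes the donation totals in the two towns are independent, and reads the matching as $0.25 and $0.50 added per dollar donated): letting N and L be the customer donations in Nicholasville and Lexington, the total is D = 100 + 1.25N + 1.5L, so
\[ E[D] = 100 + 1.25(2100) + 1.5(5500) = 10975 \]
\[ V[D] = 1.25^2(100^2) + 1.5^2(90000) = 15625 + 202500 = 218125, \qquad \sigma_D = \sqrt{218125} \approx 467.04 \]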
TV, DVD player Example
Recall our example with a customer buying either a TV or a DVD player. Letting C be the total
cost, we may compute the mean and variance of C directly from the definitions, since we know the costs
and probabilities associated with the four outcomes. We find
\[ E[C] = 350(0.1) + 100(0.2) + 250(0.5) + 0(0.2) = 180 \]
\[ V[C] = E[C^2] - (E[C])^2 = 350^2(0.1) + 100^2(0.2) + 250^2(0.5) + 0^2(0.2) - (180)^2 = 45500 - 32400 = 13100 \]
Notice that the cost is really just the sum of the amount spent on TVs, Ct, and the amount spent on
DVD players, Cd. The random variable Ct has two possible values, 0 if they did not buy a TV, which
occurs with probability 0.4, and 250 if they did buy a TV, which occurs with probability 0.6. Similarly,
Cd has two values. We may find that
\[ E[C_t] = 0(0.4) + 250(0.6) = 150 \qquad E[C_d] = 0(0.7) + 100(0.3) = 30 \]
We may also find V[Ct]=15000 and V[Cd]=2100.
We said the total cost is the sum of Ct and Cd, so we may write C = Ct + Cd.
Using the formula for expectation, we find E[C]=E[Ct]+E[Cd]=150+30=180, which agrees with our
previous calculation.
However, the variances do not sum to V[C], since
V[Ct]+V[Cd]=15000+2100=17100, not the 13100 we derived previously. Why? The random
variables Ct and Cd are not independent. The events "buy a TV" and "buy a DVD player" are not
independent because
\[ P(TV \cap DVD) = 0.10 \neq 0.18 = (0.60)(0.30) = P(TV)\,P(DVD). \]
The variance formula only applies to independent random variables.
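A final Python sketch (illustrative; it reuses the joint table from Section 1 and a helper written just for this check) confirms the direct calculation and shows that the variances of Ct and Cd do not add up to V[C]:

```python
# Joint probabilities: (bought TV, bought DVD) -> probability
joint = {(True, True): 0.10, (False, True): 0.20,
         (True, False): 0.50, (False, False): 0.20}

def mean_var(pairs):
    """Return (E[X], V[X]) for a list of (value, probability) pairs."""
    mean = sum(v * p for v, p in pairs)
    var = sum(v**2 * p for v, p in pairs) - mean**2
    return mean, var

total    = [(250 * tv + 100 * dvd, p) for (tv, dvd), p in joint.items()]
tv_part  = [(250 * tv, p) for (tv, dvd), p in joint.items()]
dvd_part = [(100 * dvd, p) for (tv, dvd), p in joint.items()]

print(mean_var(total))     # E[C] = 180, V[C] = 13100 (up to rounding)
print(mean_var(tv_part))   # E[Ct] = 150, V[Ct] = 15000
print(mean_var(dvd_part))  # E[Cd] = 30,  V[Cd] = 2100
# 15000 + 2100 = 17100, not 13100, because Ct and Cd are not independent.
```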