ST 371 (IV): Discrete Random Variables
1 Random Variables
A random variable (rv) is a function that is defined on the sample space of the experiment and that assigns a numerical value to each possible outcome of the experiment. We denote random variables by uppercase letters, often X, Y or Z.
Examples of random variables (rv):
• Toss a coin. The sample space S = {H, T }. Define an rv X such that
X({H}) = 1 and X({T }) = 0.
X is called a Bernoulli random variable.
• Toss a coin until a head is observed. The sample space
S = {H, T H, T T H, · · · }.
Define X = number of tosses needed until a head is observed. Then
X({T H}) = 2, X({T T T T H}) = 5.
• Roll a pair of dice. Define
– X= sum of the numbers on the dice
– Y =the difference between the two numbers on the dice
– Z=the maximum of the two numbers on the dice
Consider outcome ω = (2, 3). Then
X(ω) = 5, Y (ω) = −1, Z(ω) = 3.
• Define Y = the height above sea level (in feet) at a randomly selected location in the US.
The largest possible value of Y is 14,494 and the smallest value of Y is −282. The sample space is S = {y : −282 ≤ y ≤ 14,494}.
Discrete and continuous random variables. A random variable that
can take on a finite or at most countably infinite number of values is said
to be discrete (countably infinite means that the members in a set can be
listed in an infinite sequence in which there is a first element, second element
and so on). Examples include:
• the gender of a randomly selected student in class
• the total number of coin tosses required for observing two heads
• the number of students who are absent on the first day of class or the
number of people arriving for treatment at an emergency room.
A random variable that can take on values in an interval of real numbers
is said to be continuous. Examples include:
• the depth at randomly chosen locations of a lake
• the amount of gas needed to drive to work on a given day
• the survival time of a cancer patient
We will focus on discrete random variables in Chapter 3 and consider
continuous random variables in Chapter 4.
2 Probability Mass Function
Associated with each discrete random variable X is a probability mass function (pmf) that gives the probability that X equals x:
p(x) = P ({X = x}) = P ({all s ∈ S : X(s) = x}).
Example 1 Consider whether the next customer buying a laptop at a university bookstore buys a Mac or a PC model. Let X = 1 if the customer purchases a Mac and X = 0 if the customer purchases a PC.
If 20% of all customers during that week select a Mac, what is the pmf of
the rv X?
Example 2 Suppose two fair dice are tossed.
Let X be the random variable that is the sum of the two upturned faces.
X is a discrete random variable since it has finitely many possible values
(the 11 integers 2, 3, ..., 12).
The probability mass function of X is

x      2     3     4     5     6     7     8     9     10    11    12
p(x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
It is often instructive to present the probability mass function graphically, plotting p(x_i) on the y-axis against x_i on the x-axis.
[Figure: plot of the probability mass function p(x) of X against x = 2, . . . , 12.]
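This pmf is easy to verify computationally. Below is a minimal Python sketch (not part of the original notes; the names are ours) that enumerates the 36 equally likely outcomes and tabulates the probability of each sum:

    from collections import Counter

    # Count how many of the 36 equally likely (i, j) pairs give each sum.
    counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))

    for x in sorted(counts):
        print(x, f"{counts[x]}/36")    # 2 1/36, 3 2/36, ..., 7 6/36, ..., 12 1/36

    # The counts cover all 36 outcomes, so the probabilities sum to 1.
    assert sum(counts.values()) == 36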
Remarks: So far, we have been defining probability functions in terms of
the elementary outcomes making up an experiment’s sample space. Thus,
if two fair dice were tossed, a probability was assigned to each of the 36
possible pairs of upturned faces.
We have seen that in certain situations some attribute of an outcome may
hold more interest for the experimenter than the outcome itself. A craps
player, for example, may be concerned only that he throws a 7, not whether
the 7 was the result of a 5 and a 2, a 4 and a 3, or a 6 and a 1. That being the case, it makes sense to replace the 36-member sample space
S = {(i, j) : i = 1, · · · , 6; j = 1, · · · , 6}
with the more relevant (and simpler) 11-member sample space of all possible
two-dice sums,
S′ = {i + j : i, j = 1, · · · , 6} = {2, 3, · · · , 12}.
This redefinition of the sample space not only changes the number of outcomes in the space (from 36 to 11) but also changes the probability structure. In the original sample space, all 36 outcomes are equally likely. In the
revised sample space, the 11 outcomes are not equally likely.
Example 3 Three balls are to be randomly selected without replacement
from an urn containing balls numbered 1 through 20. Let X denote the
largest number selected. X is a random variable taking on values 3, 4, ...,
20. Since we select the balls randomly, each of the C_{3,20} combinations of the balls is equally likely to be chosen.
The probability mass function of X is

P({X = i}) = C_{2,i−1} / C_{3,20},   i = 3, · · · , 20.
This equation follows because the number of selections that result in the event {X = i} is just the number of selections that result in the ball numbered i and two of the balls numbered 1 through i − 1 being chosen.

[Figure: plot of the probability mass function P({X = i}) against i = 3, . . . , 20; the probabilities increase with i, reaching 0.15 at i = 20.]
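The combinatorial formula can be checked against a brute-force enumeration; here is a minimal Python sketch (our own code, not from the notes):

    from itertools import combinations
    from math import comb

    # All C(20, 3) = 1140 equally likely selections of 3 balls from 1..20.
    selections = list(combinations(range(1, 21), 3))

    for i in range(3, 21):
        brute = sum(1 for s in selections if max(s) == i) / len(selections)
        formula = comb(i - 1, 2) / comb(20, 3)   # C_{2,i-1} / C_{3,20}
        assert abs(brute - formula) < 1e-12      # the two agree for every i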
Suppose the random variable X can take on values {x_1, x_2, · · · }. Since the probability mass function is a probability function on the redefined sample space that considers values of X, we have that Σ_{i=1}^∞ P(X = x_i) = 1.
This follows from

1 = P(S) = P(∪_{i=1}^∞ {X = x_i}) = Σ_{i=1}^∞ P(X = x_i).
Example 4 Independent trials, consisting of the flipping of a coin having
probability p of coming up heads, are continually performed until a head
occurs. Let X be the random variable that denotes the number of times the
coin is flipped. The probability mass function for X is
P{X = 1} = P{H} = p
P{X = 2} = P{(T, H)} = (1 − p)p
P{X = 3} = P{(T, T, H)} = (1 − p)²p
· · ·
P{X = n − 1} = P{(T, T, . . . , T, H)} = (1 − p)^(n−2)p   (n − 2 tails, then a head)
P{X = n} = P{(T, T, . . . , T, H)} = (1 − p)^(n−1)p   (n − 1 tails, then a head)
· · ·
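As a sanity check that these probabilities form a pmf, they should sum to 1 over n = 1, 2, . . . (a geometric series with ratio 1 − p). A short Python sketch, with p = 0.3 chosen purely for illustration:

    p = 0.3                          # illustrative value; any 0 < p < 1 works

    # Partial sum of P(X = n) = (1 - p)**(n - 1) * p over the first 200 terms.
    total = sum((1 - p) ** (n - 1) * p for n in range(1, 201))
    print(total)                     # very close to 1; the tail is (1 - p)**200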
3 Cumulative Distribution Function
The cumulative distribution function (CDF) of a random variable X is the
function
F(x) = P(X ≤ x) = Σ_{y: y ≤ x} p(y).
Example 5 The pmf of a random variable X is given by
x     1    2    3    4    5
p(x)  0.3  0.3  0.2  0.1  c
• What is c?
• What is the cdf of X?
• Calculate P (2 ≤ X ≤ 4).
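A minimal sketch of the three answers in Python (our own code, using exact fractions):

    from fractions import Fraction as Fr

    pmf = {1: Fr(3, 10), 2: Fr(3, 10), 3: Fr(2, 10), 4: Fr(1, 10)}
    pmf[5] = 1 - sum(pmf.values())   # probabilities must sum to 1, so c = 1/10

    def F(x):
        # cdf: F(x) = sum of p(y) over all y <= x
        return sum(p for y, p in pmf.items() if y <= x)

    print(pmf[5])                    # 1/10
    for x in range(1, 6):
        print(x, F(x))               # cdf values: 3/10, 3/5, 4/5, 9/10, 1
    print(F(4) - F(1))               # P(2 <= X <= 4) = F(4) - F(2 - 1) = 3/5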
All probability questions about X can be answered in terms of the cdf
F . Specifically, for a discrete random variable taking integer values,
P(a < X ≤ b) = F(b) − F(a)
P(a ≤ X ≤ b) = F(b) − F(a − 1)
for all integers a < b. The first identity can be seen by writing the event {X ≤ b} as the union
of the mutually exclusive events {X ≤ a} and {a < X ≤ b}. That is,
{X ≤ b} = {X ≤ a} ∪ {a < X ≤ b}. Therefore, we have P {X ≤ b} =
P {X ≤ a} + P {a < X ≤ b} and the result follows.
Example 6 Consider selecting at random a student who is among the
15,000 registered for the current semester at NCSU. Let X=the number
of courses for which the selected student is registered, and suppose that X
has the following pmf:
x     1    2    3    4    5    6    7
p(x)  .01  .03  .13  .25  .39  .17  .02
What is the probability that the selected student is registered for three or more courses?
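The question can be answered with the cdf: P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − F(2). A minimal Python sketch (ours, not part of the notes):

    pmf = {1: .01, 2: .03, 3: .13, 4: .25, 5: .39, 6: .17, 7: .02}

    # P(X >= 3) = 1 - F(2), where F(2) = p(1) + p(2) = 0.04
    print(1 - (pmf[1] + pmf[2]))     # 0.96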
4 Expected Value
Probability mass functions provide a global overview of a random variable’s behavior. Detail that explicit, though, is not always necessary, or even helpful. Often we want to condense the information contained in the pmf by summarizing certain of its features with single numbers.
The first feature of a pmf that we will examine is central tendency, a term
referring to the “average” value of a random variable. The most frequently
used measure for describing central tendency is the expected value.
Generally, for a discrete random variable, the expected value of a random
variable X is a weighted average of the possible values X can take on, each
value being weighted by the probability that X assumes it:
E(X) = Σ_{x: p(x)>0} x·p(x).
A simple fact:
E(X + Y ) = E(X) + E(Y ).
Example 7 Consider the experiment of rolling a die. Let X be the number
on the face.
• Compute E(X).
• Consider rolling a pair of dice. Let Y be the sum of the numbers.
Compute E(Y ).
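Both expectations can be checked in a few lines of Python (a sketch with our own names, using exact fractions):

    from fractions import Fraction as Fr

    # One fair die: each face 1..6 has probability 1/6.
    EX = sum(Fr(1, 6) * x for x in range(1, 7))
    print(EX)                        # 7/2, i.e. E(X) = 3.5

    # Sum of two dice: average i + j over the 36 equally likely pairs.
    EY = sum(Fr(1, 36) * (i + j) for i in range(1, 7) for j in range(1, 7))
    print(EY)                        # 7, consistent with E(Y) = E(X) + E(X)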
Example 8 Consider Example 6. What is the average number of courses
per student at NCSU?
5 Expectation of a Function of a Random Variable
Suppose we are given a discrete random variable X along with its pmf and
that we want to compute the expected value of some function of X, say
g(X).
One approach is to directly determine the pmf of g(X).
Example 9 Let X denote a random variable that takes on the values
−1, 0, 1 with respective probabilities
P (X = −1) = .2, P (X = 0) = .5, P (X = 1) = .3
Compute E(X²).
Although the procedure we used in the previous example will always
enable us to compute the expected value of g(X) from knowledge of the pmf
of X, there is another way of thinking about E[g(X)]. Noting that g(X) will equal g(x) whenever X is equal to x, it seems reasonable that E[g(X)] should just be a weighted average of the values g(x), with g(x) weighted by the probability that X is equal to x.
Proposition 1 If X is a discrete random variable that takes on one of the values x_i, i ≥ 1, with respective probabilities p(x_i), then for any real-valued function g,

E[g(X)] = Σ_i g(x_i)p(x_i).
Applying the proposition to Example 9,

E(X²) = (−1)²(.2) + 0²(.5) + 1²(.3) = .5.
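The two routes, the direct pmf of g(X) from Example 9 versus the weighted average of Proposition 1, can be compared in a short Python sketch (our own illustration):

    pmf = {-1: 0.2, 0: 0.5, 1: 0.3}

    # Route 1: build the pmf of Y = X**2, then compute E(Y) from it.
    pmf_Y = {}
    for x, p in pmf.items():
        pmf_Y[x ** 2] = pmf_Y.get(x ** 2, 0) + p    # P(Y = 1) = .2 + .3 = .5
    E1 = sum(y * p for y, p in pmf_Y.items())

    # Route 2: Proposition 1, the weighted average of g(x) = x**2.
    E2 = sum(x ** 2 * p for x, p in pmf.items())

    print(E1, E2)                    # 0.5 0.5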
Proof of Proposition 1. Let {y_j} denote the distinct values of g(x_i), and group the terms of the sum according to the value of g:

Σ_i g(x_i)p(x_i) = Σ_j Σ_{i: g(x_i)=y_j} g(x_i)p(x_i)
= Σ_j y_j Σ_{i: g(x_i)=y_j} p(x_i)
= Σ_j y_j P{g(X) = y_j}
= E[g(X)].
Corollary 1 (The Rule of expected value.) If a and b are constants,
then E(aX + b) = aE(X) + b.
Proof of Corollary 1:

E(aX + b) = Σ_x (ax + b)·p(x)
= a Σ_x x·p(x) + b Σ_x p(x)
= aE(X) + b,

since Σ_x x·p(x) = E(X) and Σ_x p(x) = 1.
Special cases of Corollary 1:
• E(aX) = aE(X).
• E(X + b) = E(X) + b.
Example 10 A computer store has purchased three computers of a certain
type at $500 apiece. It will sell them for $1000 apiece. The manufacturer
has agreed to repurchase any computers still unsold after a certain period at
$200 apiece. Let X denote the number of computers sold, and suppose that
P (X = 0) = 0.1, P (X = 1) = 0.2, P (X = 2) = 0.3 and P (X = 3) = 0.4.
Let h(X) denote the profit associated with selling X units. What is the
expected profit?
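From the numbers given, selling x computers brings in 1000x from sales and 200(3 − x) from the repurchase, against the 3 × 500 purchase cost, so the profit simplifies to h(x) = 800x − 900 (the same profit function appears in Example 11 below). A minimal Python sketch of the expected profit:

    pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

    def h(x):
        # profit = sales revenue + salvage for unsold units - purchase cost
        return 1000 * x + 200 * (3 - x) - 3 * 500    # simplifies to 800x - 900

    # E[h(X)] by Proposition 1: weight h(x) by p(x).
    print(sum(h(x) * p for x, p in pmf.items()))     # 700.0, since E(X) = 2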
6 Variance
Another useful summary of a random variable’s pmf besides its central tendency is its “spread”. This is a very important concept in real life. For example, in quality control of hard-disk lifetimes, we not only want the lifetime of a hard disk to be long, but also want the lifetimes not to be too variable. Another example is in finance, where investors not only want investments with good returns (i.e., a high expected value) but also want investments that are not too risky (i.e., have a low spread).
A commonly used measure of spread is the variance of a random variable, which is the expected squared deviation of the random variable from its expected value. Specifically, let X have pmf p(x) and expected value µ; then the variance of X, denoted by V(X), or just σ_X², is

V(X) = E[(X − µ)²] = Σ_{x∈D} (x − µ)²·p(x),

where D is the set of possible values of X. The second equality holds by applying Proposition 1.
Explanations and intuitions for variance:
• (X − µ)2 is the squared deviation of X from its mean
• The variance is the weighted average of squared deviations, where the
weights are probabilities from the distribution.
• If most values of x are close to µ, then σ² will be relatively small.
• If most values of x are far away from µ, then σ² will be relatively large.
Definition: the standard deviation (SD) of X is

σ_X = √V(X).
Consider the following situations:
• The following three random variables have expected value 0 but very
different spreads:
– X = 0 with probability 1
– Y = −1 with probability 0.5 and 1 with probability 0.5.
– Z = −100 with probability 0.5 and 100 with probability 0.5.
Compare V (X), V (Y ) and V (Z).
• Suppose that the rate of return on stock A takes on the values of 30%,
10% and −10% with respective probabilities 0.25, 0.50 and 0.25 and on
stock B the values of 50%, 10% and −30% with the same probabilities
0.25, 0.50 and 0.25. Each stock then has an expected rate of return of
10%. Obviously stock A has less spread in its rate of return. Compare
V (A) and V (B).
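Both comparisons can be checked with a short computation. Here is a minimal Python sketch (our own code) applying the definition V(X) = Σ (x − µ)²·p(x):

    def mean_var(pmf):
        # pmf is a dict {value: probability}
        mu = sum(x * p for x, p in pmf.items())
        return mu, sum((x - mu) ** 2 * p for x, p in pmf.items())

    # Three zero-mean variables with very different spreads.
    for pmf in ({0: 1.0}, {-1: 0.5, 1: 0.5}, {-100: 0.5, 100: 0.5}):
        print(mean_var(pmf))         # variances 0, 1, and 10000

    # Two stocks with the same 10% expected return but different risk.
    A = {0.30: 0.25, 0.10: 0.50, -0.10: 0.25}
    B = {0.50: 0.25, 0.10: 0.50, -0.30: 0.25}
    print(mean_var(A))               # mean 0.10, V(A) = 0.02 (up to rounding)
    print(mean_var(B))               # mean 0.10, V(B) = 0.08 (up to rounding)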
An alternative formula for variance: V(X) = E(X²) − [E(X)]².
Proof. Let E(X) = µ. Then

V(X) = E[(X − µ)²]
= Σ_x (x − µ)²p(x)
= Σ_x (x² − 2µx + µ²)p(x)
= Σ_x x²p(x) − 2µ Σ_x x·p(x) + µ² Σ_x p(x)
= E(X²) − 2µ² + µ²
= E(X²) − µ²
= E(X²) − [E(X)]².
The variance of a linear function. Let a, b be two constants; then

V(aX + b) = a²·V(X).

Proof. Note that from Corollary 1, we have E(aX + b) = aE(X) + b. Let E(X) = µ. Then

V(aX + b) = E[{(aX + b) − E(aX + b)}²]
= E[(aX + b − aµ − b)²]
= E[a²(X − µ)²]
= a²E[(X − µ)²]
= a²V(X).
Example 11 Let X denote the number of computers sold, and suppose
that the pmf of X is
P (X = 0) = 0.1, P (X = 1) = 0.2, P (X = 2) = 0.3, P (X = 3) = 0.4.
The profit is a function of the number of computers sold:
h(X) = 800X − 900.
What are the variance and SD of the profit h(X)?
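A sketch of the computation in Python (ours), checking the direct calculation against the rule V(aX + b) = a²V(X):

    pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

    EX = sum(x * p for x, p in pmf.items())               # E(X) = 2.0
    VX = sum((x - EX) ** 2 * p for x, p in pmf.items())   # V(X) = 1.0

    # Direct: variance of h(X) = 800X - 900 via Proposition 1.
    Eh = sum((800 * x - 900) * p for x, p in pmf.items())
    Vh = sum((800 * x - 900 - Eh) ** 2 * p for x, p in pmf.items())

    print(Vh)                        # 640000.0
    print(800 ** 2 * VX)             # 640000.0, by the linear-function rule
    print(Vh ** 0.5)                 # SD of the profit: 800.0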