Lecture 10: Discrete Random Variables
1. Random Variables
Definition: Let (Ω, F, P) be a probability space. A real-valued random variable is a function
X : Ω → R which satisfies the property that for every x ∈ R,
X⁻¹((−∞, x]) = {ω ∈ Ω : X(ω) ≤ x} ∈ F.    (1)
Remarks:
• X is said to be a measurable mapping from Ω to R if it satisfies (1).
• For the purposes of this course, we will simply assume measurability.
• Formally, the random variable X(ω) depends on a particular outcome ω ∈ Ω, but usually
we will suppress the dependence on ω and simply write X.
• Random variables can also take values in more general spaces, e.g., we can define random
variables that are Rⁿ-valued, complex-valued, matrix-valued, function-valued, and even
probability-measure-valued!
Intuition: We perform an experiment with an outcome ω in a sample space Ω, but we only
record some function of the outcome specified by X = X(ω).
Example: Suppose that we toss a coin ten times and record the number of tosses that come
up heads. Then our sample space is Ω = {(x1 , · · · , x10 ) : xi ∈ {H, T }} and we can represent the
number of heads as a random variable
X = X((x1 , · · · , x10 )) = #{i : xi = H}.
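As a concrete illustration, this random variable is just an ordinary function on outcomes. The following is a minimal Python sketch (the encoding of outcomes as tuples of "H"/"T" characters is one convenient choice, not part of the original example):

import itertools

# Sample space: all 2^10 sequences of ten coin tosses.
omega = list(itertools.product("HT", repeat=10))

def X(outcome):
    # The random variable X: the number of tosses that came up heads.
    return sum(1 for toss in outcome if toss == "H")

print(X(("H",) * 3 + ("T",) * 7))   # an outcome with 3 heads maps to 3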
Definition: The distribution (or law) of a real-valued random variable X is the probability
distribution Q on R defined by
Q(E) = P{X ∈ E}
for any (measurable) subset E ⊂ R. It can be shown that Q is a probability measure on R.
Definition: The cumulative distribution function (or CDF) of a real-valued random variable X is the function F : R → [0, 1] defined by
F (x) = P{X ≤ x}.
The most important fact about the CDF is that it uniquely determines the distribution of the
random variable.
2. Discrete Random Variables
Definition: A real-valued random variable is said to be discrete if it can take on only countably many possible values.
A discrete random variable is uniquely determined by its probability mass function
p(a) = P{X = a},
which can be positive for at most a countably infinite set of values and is zero elsewhere. If X
can only take on one of the values x1 , x2 , · · · , then
p(xi) ≥ 0 for i = 1, 2, · · · ,
p(x) = 0 for all other values of x, and
∑_{i=1}^∞ p(xi) = 1.
Example (Ross, 2a): Find C so that the function p(n) defined below is the probability mass
function of a random variable taking values in the non-negative integers:
p(n) = C λⁿ/n!.
Answer: We need the probabilities to sum to 1, so
1 = ∑_{n≥0} p(n) = C ∑_{n≥0} λⁿ/n! = C e^λ,
which shows that
C = e^(−λ).    (2)
C is said to be a normalizing constant, and the specific distribution defined by this probability
mass function is called the Poisson distribution with parameter λ (of which much more later in
the course).
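A quick numerical check of (2) (a sketch in Python; the value λ = 2 and the truncation point are arbitrary choices) confirms that the resulting masses sum to 1:

import math

lam = 2.0                     # arbitrary choice of the parameter λ
C = math.exp(-lam)            # the normalizing constant from (2)

# Truncate the infinite sum; the tail beyond n = 100 is negligible for λ = 2.
total = sum(C * lam**n / math.factorial(n) for n in range(100))
print(total)                  # ≈ 1.0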
Remark: The cumulative distribution function is related to the probability mass function via
the following formula:
F(x) = P{X ≤ x} = ∑_{y≤x} p(y).
Notice that F is a step function, and that the step size at a point x is equal to the probability
mass at that point.
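For instance, the short sketch below (the three-point distribution is made up purely for illustration) evaluates F by summing the mass at all points y ≤ x, and the printed values show a jump of size p(x) at each atom:

# A made-up PMF on the points 1, 2, 4.
pmf = {1: 0.2, 2: 0.5, 4: 0.3}

def F(x):
    # CDF: sum the probability mass at all points y <= x.
    return sum(p for y, p in pmf.items() if y <= x)

print(F(1.9), F(2.0))   # 0.2  0.7 -- a jump of size p(2) = 0.5 at x = 2
print(F(3.5), F(4.0))   # 0.7  1.0 -- a jump of size p(4) = 0.3 at x = 4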
3. Expectation
Definition: Let X be a discrete random variable with probability mass function p(x). The
expected value (or expectation or mean) of X is defined to be
E[X] = ∑_x p(x) · x.
Later in the course we will show that if we have an infinite sequence of independent random
variables, say (Xn ; n ≥ 1), all with the same distribution as X, then the sample averages
converge to the expected value of X:
(1/N) ∑_{i=1}^N Xi → E[X].
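This convergence is easy to see empirically. The simulation below is a minimal sketch (the sample size and the choice of a fair coin indicator, with E[X] = 1/2, are arbitrary):

import random

random.seed(0)
N = 100_000

# Independent copies X_1, ..., X_N of a fair coin indicator (1 = heads).
samples = [random.randint(0, 1) for _ in range(N)]

# The sample average approaches E[X] = 1/2 as N grows.
print(sum(samples) / N)   # ≈ 0.5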
Example (Ross, 3a): Let X be the number obtained by rolling a fair die. Then
E[X] = ∑_{i=1}^6 (1/6) · i = 7/2.
Thus the expected value of a random variable X need not belong to the set of values that can
be assumed by X.
Example (Ross, 3b): Recall that the indicator function of an event A is the function
IA : Ω → {0, 1} defined by
IA(ω) = 1 if ω ∈ A, and IA(ω) = 0 if ω ∉ A.
Then IA is a discrete random variable with expected value equal to
E[IA ] = P(A) · 1 + (1 − P(A)) · 0 = P(A).
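A tiny Monte Carlo check of E[IA] = P(A) (a sketch; the event A = {a fair die shows at most 2}, with P(A) = 1/3, is an arbitrary choice):

import random

random.seed(1)
N = 100_000

# Indicator of the event A = {die shows 1 or 2} on each of N rolls.
indicators = [1 if random.randint(1, 6) <= 2 else 0 for _ in range(N)]

# The average of the indicators estimates E[I_A] = P(A) = 1/3.
print(sum(indicators) / N)   # ≈ 0.333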
4. Expectation of a function of a random variable
Remark: One way to obtain new random variables is to compose a real-valued function with
an existing random variable. Let X be a discrete random variable that takes on values xi , i ≥ 1
with probabilities p(xi ), and let g be a real-valued function. Then Y = g(X) is a discrete random
variable with values in some countable set {yi ; i ≥ 1} and probability mass function
p(yi) = ∑_{xj : g(xj)=yi} p(xj).
Notice that more than one value xj may contribute to the probability of each point yi if g is
not one-to-one.
The next result is sometimes known as the Law of the Unconscious Statistician.
Proposition 4.1: Let X be a discrete random variable that takes on values xi , i ≥ 1 with
probabilities p(xi ). Then, for any real-valued function g,
E[g(X)] = ∑_i p(xi) g(xi).
Proof:
Let g(X) take on the values yj, j ≥ 1 with probabilities P(g(X) = yj). Then,
E[g(X)] = ∑_j yj P(g(X) = yj)
        = ∑_j yj ∑_{i : g(xi)=yj} p(xi)
        = ∑_j ∑_{i : g(xi)=yj} yj p(xi)
        = ∑_j ∑_{i : g(xi)=yj} g(xi) p(xi)
        = ∑_i p(xi) g(xi).
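The proposition is easy to verify numerically. The sketch below (the PMF of X and the function g(x) = x² are made-up choices) computes E[g(X)] both ways: from the probability mass function of Y = g(X), and directly from the probability mass function of X:

from collections import defaultdict

# A made-up PMF for X on the points -1, 0, 1, 2.
p = {-1: 0.1, 0: 0.4, 1: 0.3, 2: 0.2}

def g(x):
    return x ** 2

# Route 1: build the PMF of Y = g(X), then use the definition of expectation.
p_y = defaultdict(float)
for x, px in p.items():
    p_y[g(x)] += px
e_via_y = sum(y * py for y, py in p_y.items())

# Route 2: the Law of the Unconscious Statistician, summing p(x) g(x) directly.
e_via_x = sum(px * g(x) for x, px in p.items())

print(e_via_y, e_via_x)   # both equal 1.2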
A simple, but very useful consequence of this proposition is:
Corollary 4.1: If a and b are constants, then
E[aX + b] = aE[X] + b.
Caveat: For non-linear functions g, it is generally true that
E[g(X)] ≠ g(E[X]),
i.e., we cannot interchange the composition of functions with the taking of expectations.
Example: Let X be a random variable with P(X = 1) = P(X = −1) = 1/2. Then
E[Xⁿ] = 1 if n is even, and E[Xⁿ] = 0 if n is odd.
However, (E[X])ⁿ = 0 for all n ≥ 1.
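The sketch below verifies the discrepancy for this two-point distribution:

# X takes the values -1 and 1, each with probability 1/2.
p = {-1: 0.5, 1: 0.5}

for n in range(1, 5):
    e_xn = sum(px * x**n for x, px in p.items())          # E[X^n]
    e_x_pow_n = sum(px * x for x, px in p.items()) ** n   # (E[X])^n
    print(n, e_xn, e_x_pow_n)   # E[X^n] is 1 for even n, 0 for odd n; (E[X])^n is always 0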
5. Variance
It is often useful to have a measure of the dispersion or spread of a random variable about its
expected value. One such measure is provided by the variance.
Definition: Let X be a random variable with mean µ = E[X]. Then the variance of X is the
quantity
Var(X) = E[(X − µ)2 ].
In other words, the variance of X measures the mean square deviation of X from its expectation.
This is often most easily calculated using the following formula:
E[(X − µ)²] = E[X² − 2µX + µ²]
            = E[X²] − 2µE[X] + µ²
            = E[X²] − 2µ² + µ²
            = E[X²] − (E[X])².
Example (Ross, 5a): Let X be the outcome when a fair die is rolled. Then
Var(X) = E[X²] − (7/2)²
       = (1/6) ∑_{i=1}^6 i² − 49/4
       = 91/6 − 49/4
       = 35/12.
A useful identity related to Corollary 4.1 is that
Var(aX + b) = a2 Var(X),
whenever a and b are constants. In other words, a change of scale changes the variance of a
random variable, but a change of location does not.
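Both the die computation in Example 5a and this scaling identity can be checked directly; the sketch below uses exact fractions (the constants a = 3 and b = 7 are arbitrary):

from fractions import Fraction

# PMF of a fair die: each face 1..6 has probability 1/6.
p = {i: Fraction(1, 6) for i in range(1, 7)}

def mean(pmf):
    return sum(x * px for x, px in pmf.items())

def var(pmf):
    mu = mean(pmf)
    return sum(px * (x - mu) ** 2 for x, px in pmf.items())

print(var(p))                        # 35/12, as in Example 5a

# Var(aX + b) = a^2 Var(X): transform the PMF by x -> a*x + b.
a, b = 3, 7
p_scaled = {a * x + b: px for x, px in p.items()}
print(var(p_scaled), a**2 * var(p))  # both 105/4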