2. Random Variables, Distribution Functions, Expectation, Moment generating Functions
Aim of this section:
• Mathematical definition of the concepts
random variable
(cumulative) distribution function
(probability) density function
expectation and moments
moment generating function
8
Preliminaries:
• Repetition of the notions
random experiment
outcome (sample point) and sample space
event
probability
(see Wilfling (2011), Chapter 2)
9
2.1 Basic Terminology
Definition 2.1: (Random experiment)
A random experiment is an experiment
(a) for which we know in advance all conceivable outcomes that
it can take on, but
(b) for which we do not know in advance the actual outcome
that it eventually takes on.
Random experiments are performed in controllable trials.
10
Examples of random experiments:
• Drawing of lottery numbers
• Roulette, tossing a coin, rolling a die
• ’Technical experiments’
(testing the hardness of lots from steel production etc.)
In economics:
• Random experiments (according to Def. 2.1) are rare
(historical data, trials are not controllable)
• Modern discipline: Experimental Economics
11
Definition 2.2: (Sample point, sample space)
Each conceivable outcome ω of a random experiment is called a
sample point. The totality of conceivable outcomes (or sample
points) is defined as the sample space and is denoted by Ω.
Examples:
• Random experiment of rolling a single die:
Ω = {1, 2, 3, 4, 5, 6}
• Random experiment of tossing a coin until HEAD shows up:
Ω = {H, TH, TTH, TTTH, TTTTH, . . .}
• Random experiment of measuring tomorrow’s exchange rate
between the euro and the US-$:
Ω = [0, ∞)
12
Obviously:
• The number of elements in Ω can be (1) finite, (2) infinite
but countable, or (3) infinite and uncountable
Now:
• Definition of the notion Event based on mathematical sets
Definition 2.3: (Event)
An event of a random experiment is a subset of the sample space
Ω. We say ’the event A occurs’ if the random experiment has
an outcome ω ∈ A.
13
Remarks:
• Events are typically denoted by A, B, C, . . . or A1, A2, . . .
• A = Ω is called the sure event
(since for every sample point ω we have ω ∈ A)
• A = ∅ (empty set) is called the impossible event
(since for every ω we have ω ∉ A)
• If the event A is a subset of the event B (A ⊂ B) we say that
’the occurrence of A implies the occurrence of B’
(since for every ω ∈ A we also have ω ∈ B)
Obviously:
• Events are represented by mathematical sets
−→ application of set operations to events
14
Combining events (set operations):
• Intersection:
A1 ∩ A2 ∩ . . . ∩ An occurs, if all Ai occur
• Union:
A1 ∪ A2 ∪ . . . ∪ An occurs, if at least one Ai occurs
• Set difference:
C = A\B occurs, if A occurs and B does not occur
• Complement:
C = Ω\A ≡ Ā occurs, if A does not occur
• The events A and B are called disjoint, if A ∩ B = ∅
(both events cannot occur simultaneously)
15
Now:
• For any arbitrary event A we are looking for a number P (A)
which represents the probability that A occurs
• Formally:
P : A −→ P (A)
(P (·) is a set function)
Question:
• Which properties should the probability function (set function) P (·) have?
16
Definition 2.4: (Kolmogorov-axioms)
The following axioms for P (·) are called Kolmogorov-axioms:
• Nonnegativity: P (A) ≥ 0 for every A
• Standardization: P (Ω) = 1
• Additivity: For two disjoint events A and B (i.e. for A∩B = ∅)
P (·) satisfies
P (A ∪ B) = P (A) + P (B)
17
Easy to check:
• The three axioms imply several additional properties and rules
when computing with probabilities
Theorem 2.5: (General properties)
The Kolmogorov-axioms imply the following properties:
• Probability of the complementary event:
P (Ā) = 1 − P (A)
• Probability of the impossible event:
P (∅) = 0
• Range of probabilities:
0 ≤ P (A) ≤ 1
18
Next:
• General rules when computing with probabilities
Theorem 2.6: (Calculation rules)
The Kolmogorov-axioms imply the following calculation rules
(A, B, C are arbitrary events):
• Addition rule (I):
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
(probability that A or B occurs)
19
• Addition rule (II):
P (A ∪ B ∪ C) = P (A) + P (B) + P (C)
−P (A ∩ B) − P (B ∩ C)
−P (A ∩ C) + P (A ∩ B ∩ C)
(probability that A or B or C occurs)
• Probability of the ’difference event’:
P (A\B) = P (A ∩ B̄)
= P (A) − P (A ∩ B)
20
Notice:
• If B implies A (i.e. if B ⊂ A) it follows that
P (A\B) = P (A) − P (B)
21
2.2 Random Variable, Cumulative Distribution
Function, Density Function
Frequently:
• Instead of being interested in a concrete sample point ω ∈ Ω
itself, we are rather interested in a number depending on ω
Examples:
• Profit in euro when playing roulette
• Profit earned when selling a stock
• Monthly salary of a randomly selected person
Intuitive meaning of a random variable:
• Rule translating the abstract ω into a number
22
Definition 2.7: (Random variable [rv])
A random variable, denoted by X or X(·), is a mathematical
function of the form
X : Ω −→ R
ω −→ X(ω).
Remarks:
• A random variable relates each sample point ω ∈ Ω to a real
number
• Intuitively:
A random variable X characterizes a number that is a priori
unknown
23
• When the random experiment is carried out, the random
variable X takes on the value x
• x is called realization or value of the random variable X after
the random experiment has been carried out
• Random variables are denoted by capital letters, realizations
are denoted by small letters
• The rv X describes the situation ex ante, i.e. before carrying
out the random experiment
• The realization x describes the situation ex post, i.e. after
having carried out the random experiment
24
Example 1:
• Consider the experiment of tossing a single coin (H=Head,
T =Tail). Let the rv X represent the ’Number of Heads’
• We have
Ω = {H, T }
The random variable X can take on two values:
X(T ) = 0,
X(H) = 1
25
Example 2:
• Consider the experiment of tossing a coin three times. Let
X represent the ’Number of Heads’
• We have
Ω = {(H, H, H ), (H, H, T ), . . . , (T, T, T )}
with ω1 = (H, H, H ), ω2 = (H, H, T ), . . . , ω8 = (T, T, T )
The rv X is defined by
X(ω) = number of H in ω
• Obviously:
X relates distinct ω’s to the same number, e.g.
X((H, H, T )) = X((H, T, H)) = X((T, H, H)) = 2
26
Example 3:
• Consider the experiment of randomly selecting 1 person from
a group of people. Let X represent the person’s status of
employment
• We have
Ω = {’employed’, ’unemployed’}
with ω1 = ’employed’ and ω2 = ’unemployed’
• X can be defined as
X(ω1) = 1,
X(ω2) = 0
27
Example 4:
• Consider the experiment of measuring tomorrow’s price of a
specific stock. Let X denote the stock price
• We have Ω = [0, ∞), i.e. X is defined by
X(ω) = ω
Conclusion:
• The random variable X can take on distinct values with specific probabilities
28
Question:
• How can we determine these specific probabilities and how
can we calculate with them?
Simplifying notation: (a, b, x ∈ R)
• P (X = a) ≡ P ({ω|X(ω) = a})
• P (a < X < b) ≡ P ({ω|a < X(ω) < b})
• P (X ≤ x) ≡ P ({ω|X(ω) ≤ x})
Solution:
• We can compute these probabilities via the so-called cumulative distribution function of X
29
Intuitively:
• The cumulative distribution function of the random variable
X characterizes the probabilities according to which the possible values x are distributed along the real line
(the so-called distribution of X)
Definition 2.8: (Cumulative distribution function [cdf])
The cumulative distribution function of a random variable X,
denoted by FX , is defined to be the function
FX : R −→ [0, 1]
x −→ FX (x) = P ({ω|X(ω) ≤ x}) = P (X ≤ x).
30
Example:
• Consider the experiment of tossing a coin three times. Let
X represent the ’Number of Heads’
• We have
Ω = {(H, H, H ), (H, H, T ), . . . , (T, T, T )}
with ω1 = (H, H, H ), ω2 = (H, H, T ), . . . , ω8 = (T, T, T )
• For the probabilities of X we find
P (X = 0) = P ({(T, T, T )}) = 1/8
P (X = 1) = P ({(T, T, H), (T, H, T ), (H, T, T )}) = 3/8
P (X = 2) = P ({(T, H, H), (H, T, H), (H, H, T )}) = 3/8
P (X = 3) = P ({(H, H, H)}) = 1/8
31
• Thus, the cdf is given by
FX (x) =
  0.000   for x < 0
  0.125   for 0 ≤ x < 1
  0.5     for 1 ≤ x < 2
  0.875   for 2 ≤ x < 3
  1       for x ≥ 3
Remarks:
• In practice, it will be sufficient to only know the cdf FX of X
• In many situations, it will appear impossible to exactly specify
the sample space Ω or the explicit function X : Ω −→ R.
However, often we may derive the cdf FX from other factual
considerations
32
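The distribution above can be reproduced by brute-force enumeration. A minimal Python sketch (not from the slides; it assumes a fair coin, so all 8 sample points are equally likely):

```python
from fractions import Fraction
from itertools import product

# Enumerate the sample space of three coin tosses and count heads
outcomes = list(product("HT", repeat=3))           # 8 equally likely sample points
density = {}                                       # fX(x) = P(X = x)
for omega in outcomes:
    x = omega.count("H")
    density[x] = density.get(x, Fraction(0)) + Fraction(1, len(outcomes))

def F(x):
    """cdf FX(x) = P(X <= x), built from the discrete density."""
    return sum((p for v, p in density.items() if v <= x), Fraction(0))

assert density[1] == Fraction(3, 8)
assert F(1.5) == Fraction(1, 2)     # 0.5 for 1 <= x < 2
assert F(-1) == 0 and F(3) == 1
```

Exact `Fraction` arithmetic avoids floating-point rounding in the probability sums.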
General properties of FX :
• FX (x) is a monotone, nondecreasing function
• We have
lim_{x→−∞} FX (x) = 0
and
lim_{x→+∞} FX (x) = 1
• FX is continuous from the right; that is,
lim_{z→x, z>x} FX (z) = FX (x)
33
Summary:
• Via the cdf FX (x) we can answer the following question:
’What is the probability that the random variable X takes
on a value that does not exceed x?’
Now:
• Consider the question:
’What is the value which X does not exceed with a
prespecified probability p ∈ (0, 1)?’
−→ quantile function of X
34
Definition 2.9: (Quantile function)
Consider the rv X with cdf FX . For every p ∈ (0, 1) the quantile
function of X, denoted by QX (p), is defined as
QX : (0, 1) −→ R
p −→ QX (p) = min{x|FX (x) ≥ p}.
The value of the quantile function xp = QX (p) is called the pth
quantile of X.
Remarks:
• The pth quantile xp of X is defined as the smallest number
x satisfying FX (x) ≥ p
• In other words: The pth quantile xp is the smallest value x
for which P (X ≤ x) ≥ p
35
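For a step cdf, the definition QX(p) = min{x | FX(x) ≥ p} translates directly into a linear scan over the jump points. A sketch for the three-coin example (cdf values taken from slide 32):

```python
from fractions import Fraction

# Step cdf of X = "number of heads in three tosses" (slide 32):
# (x, FX(x)) at the jump points, in increasing order of x
steps = [(0, Fraction(1, 8)), (1, Fraction(4, 8)),
         (2, Fraction(7, 8)), (3, Fraction(1))]

def Q(p):
    """Quantile function: smallest x with FX(x) >= p, for p in (0, 1)."""
    assert 0 < p < 1
    for x, F in steps:
        if F >= p:
            return x

assert Q(Fraction(1, 2)) == 1      # median: FX(1) = 0.5 >= 0.5
assert Q(Fraction(3, 4)) == 2      # 0.75-quantile
assert Q(Fraction(1, 10)) == 0
```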
Special quantiles:
• Median: p = 0.5
• Quartiles: p = 0.25, 0.5, 0.75
• Quintiles: p = 0.2, 0.4, 0.6, 0.8
• Deciles: p = 0.1, 0.2, . . . , 0.9
Now:
• Consideration of two distinct classes of random variables
(discrete vs. continuous rv’s)
36
Reason:
• Each class requires a specific mathematical treatment
Mathematical tools for analyzing discrete rv’s:
• Finite and infinite sums
Mathematical tools for analyzing continuous rv’s:
• Differential- and integral calculus
Remarks:
• Some rv’s are partly discrete and partly continuous
• Such rv’s are not treated in this course
37
Definition 2.10: (Discrete random variable)
A random variable X will be defined to be discrete if it can take
on either
(a) only a finite number of values x1, x2, . . . , xJ or
(b) an infinite, but countable number of values x1, x2, . . .
each with strictly positive probability; that is, if for all j =
1, . . . , J (case (a)) or j = 1, 2, . . . (case (b)) we have
P (X = xj ) > 0
and
Σ_j P (X = xj ) = 1.
38
Examples of discrete variables:
• Countable variables (’X = Number of . . .’)
• Encoded qualitative variables
Further definitions:
Definition 2.11: (Support of a discrete random variable)
The support of a discrete rv X, denoted by supp(X), is defined
to be the totality of all values that X can take on with a strictly
positive probability:
supp(X) = {x1, . . . , xJ }
or
supp(X) = {x1, x2, . . .}.
39
Definition 2.12: (Discrete density function)
For a discrete random variable X the function
fX (x) = P (X = x)
is defined to be the discrete density function of X.
Remarks:
• The discrete density function fX (·) takes on strictly positive
values only for elements of the support of X. For realizations
of X that do not belong to the support of X, i.e. for x ∉ supp(X),
we have fX (x) = 0:
fX (x) =
  P (X = xj ) > 0   for x = xj ∈ supp(X)
  0                 for x ∉ supp(X)
40
• The discrete density function fX (·) has the following properties:
fX (x) ≥ 0 for all x
Σ_{xj ∈ supp(X)} fX (xj ) = 1
• For any arbitrary set A ⊂ R the probability of the event
{ω|X(ω) ∈ A} = {X ∈ A} is given by
P (X ∈ A) = Σ_{xj ∈ A} fX (xj )
41
Example:
• Consider the experiment of tossing a coin three times and
let X = ’Number of Heads’
(see slide 31)
• Obviously: X is discrete and has the support
supp(X) = {0, 1, 2, 3}
• The discrete density function of X is given by
fX (x) =
  P (X = 0) = 0.125   for x = 0
  P (X = 1) = 0.375   for x = 1
  P (X = 2) = 0.375   for x = 2
  P (X = 3) = 0.125   for x = 3
  0                   for x ∉ supp(X)
42
• The cdf of X is given by (see slide 32)
FX (x) =
  0.000   for x < 0
  0.125   for 0 ≤ x < 1
  0.5     for 1 ≤ x < 2
  0.875   for 2 ≤ x < 3
  1       for x ≥ 3
Obviously:
• The cdf FX (·) can be obtained from fX (·):
FX (x) = P (X ≤ x) = Σ_{xj ∈ supp(X)| xj ≤ x} fX (xj )
(each summand fX (xj ) being P (X = xj ))
43
Conclusion:
• The cdf of a discrete random variable X is a step function
with steps at the points xj ∈ supp(X). The height of the
step at xj is given by
FX (xj ) − lim_{x→xj , x<xj} FX (x) = P (X = xj ) = fX (xj ),
i.e. the step height is equal to the value of the discrete density
function at xj
(relationship between cdf and discrete density function)
44
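The step-height relationship can be checked numerically by building FX from fX for the three-coin example and comparing each jump with the density value. A small sketch (not from the slides):

```python
from fractions import Fraction

# Discrete density fX from slide 42
density = {0: Fraction(1, 8), 1: Fraction(3, 8),
           2: Fraction(3, 8), 3: Fraction(1, 8)}

def F(x):
    """cdf FX(x) = sum of fX(xj) over support points xj <= x."""
    return sum((p for v, p in density.items() if v <= x), Fraction(0))

# At each support point the jump height FX(xj) − FX(xj−) equals fX(xj)
for xj, fxj in density.items():
    left_limit = F(xj - Fraction(1, 10**9))   # approach xj from below
    assert F(xj) - left_limit == fxj
```

Since the support consists of integers, evaluating F just below each xj recovers the left-hand limit exactly.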
Now:
• Definition of continuous random variables
Intuitively:
• In contrast to discrete random variables, continuous random
variables can take on an uncountable number of values
(e.g. every real number on a given interval)
In fact:
• Definition of a continuous random variable is quite technical
45
Definition 2.13: (Continuous rv, probability density function)
A random variable X is called continuous if there exists a function
fX : R −→ [0, ∞) such that the cdf of X can be written as
FX (x) = ∫_{−∞}^{x} fX (t) dt   for all x ∈ R.
The function fX (x) is called the probability density function (pdf)
of X.
Remarks:
• The cdf FX (·) of a continuous random variable X is a primitive function of the pdf fX (·)
• FX (x) = P (X ≤ x) is equal to the area under the pdf fX (·)
between the limits −∞ and x
46
[Figure: cdf FX (·) and pdf fX (·) — the shaded area under fX (t) up to t = x equals P (X ≤ x) = FX (x)]
47
Properties of the pdf fX (·):
1. A pdf fX (·) cannot take on negative values, i.e.
fX (x) ≥ 0   for all x ∈ R
2. The area under a pdf is equal to one, i.e.
∫_{−∞}^{+∞} fX (x) dx = 1
3. If the cdf FX (x) is differentiable we have
fX (x) = F′X (x) ≡ dFX (x)/dx
48
Example: (Uniform distribution over [0, 10])
• Consider the random variable X with pdf
fX (x) =
  0.1   , for x ∈ [0, 10]
  0     , for x ∉ [0, 10]
• Derivation of the cdf FX :
For x < 0 we have
FX (x) = ∫_{−∞}^{x} fX (t) dt = ∫_{−∞}^{x} 0 dt = 0
49
For x ∈ [0, 10] we have
FX (x) = ∫_{−∞}^{x} fX (t) dt
       = ∫_{−∞}^{0} 0 dt + ∫_{0}^{x} 0.1 dt
       = 0 + [0.1 · t]_{0}^{x}
       = 0.1 · x − 0.1 · 0
       = 0.1 · x
50
For x > 10 we have
FX (x) = ∫_{−∞}^{x} fX (t) dt
       = ∫_{−∞}^{0} 0 dt + ∫_{0}^{10} 0.1 dt + ∫_{10}^{x} 0 dt
       = 0 + 1 + 0
       = 1
51
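The derivation above can be cross-checked by numerical integration. The sketch below (my own, not from the slides) approximates FX(x) = ∫ fX(t) dt with a left Riemann sum and compares it with the closed-form cdf 0.1·x:

```python
# Numerically integrate the U[0, 10] pdf and compare with the closed-form cdf
def f(x):
    """pdf of the uniform distribution over [0, 10]."""
    return 0.1 if 0 <= x <= 10 else 0.0

def F_numeric(x, n=100_000):
    """Left Riemann-sum approximation of the integral of f from 0 to x
    (f vanishes below 0, so the lower limit can start at 0)."""
    if x <= 0:
        return 0.0
    h = x / n
    return sum(f(i * h) * h for i in range(n))

assert abs(F_numeric(4) - 0.4) < 1e-3      # FX(x) = 0.1·x on [0, 10]
assert abs(F_numeric(12) - 1.0) < 1e-3     # FX(x) = 1 for x > 10
```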
Now:
• Interval probabilities, i.e. (for a, b ∈ R, a < b)
P (X ∈ (a, b]) = P (a < X ≤ b)
• We have
P (a < X ≤ b) = P ({ω|a < X(ω) ≤ b})
= P ({ω|X(ω) > a} ∩ {ω|X(ω) ≤ b})
= 1 − P (complement of {ω|X(ω) > a} ∩ {ω|X(ω) ≤ b})
= 1 − P ({ω|X(ω) ≤ a} ∪ {ω|X(ω) > b})   (De Morgan’s law)
52
= 1 − [P (X ≤ a) + P (X > b)]   (disjoint events)
= 1 − [FX (a) + (1 − P (X ≤ b))]
= 1 − [FX (a) + 1 − FX (b)]
= FX (b) − FX (a)
= ∫_{−∞}^{b} fX (t) dt − ∫_{−∞}^{a} fX (t) dt
= ∫_{a}^{b} fX (t) dt
53
[Figure: interval probability between the limits a and b — the area under fX (x) between a and b equals P (a < X ≤ b)]
54
Important result for a continuous rv X:
P (X = a) = 0
for all a ∈ R
Proof:
P (X = a) = lim_{b→a} P (a < X ≤ b) = lim_{b→a} ∫_{a}^{b} fX (x) dx
= ∫_{a}^{a} fX (x) dx = 0
Conclusion:
• The probability that a continuous random variable X takes
on a single explicit value is always zero
55
[Figure: probability of a single value — as b1, b2, b3 approach a, the area under fX (x) between a and the bi shrinks to zero]
56
Notice:
• This does not imply that the event {X = a} cannot occur
Consequence:
• Since for continuous random variables we always have P (X =
a) = 0 for all a ∈ R, it follows that
P (a < X < b) = P (a ≤ X < b) = P (a ≤ X ≤ b)
= P (a < X ≤ b) = FX (b) − FX (a)
(when computing interval probabilities for continuous rv’s, it
does not matter if the interval is open or closed)
57
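Using the closed-form cdf of the U[0, 10] example from slides 49-51, interval probabilities reduce to a difference of two cdf values; a minimal sketch:

```python
def F(x):
    """Closed-form cdf of the uniform distribution over [0, 10]."""
    if x < 0:
        return 0.0
    if x <= 10:
        return 0.1 * x
    return 1.0

# Open vs closed endpoints do not matter for a continuous rv:
# P(2 < X <= 5) = P(2 <= X <= 5) = FX(5) − FX(2)
assert abs((F(5) - F(2)) - 0.3) < 1e-12

# P(X = a) = FX(a) − FX(a) = 0 for any single point a
assert F(7) - F(7) == 0.0
```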
2.3 Expectation, Moments and Moment Generating Functions
Repetition:
• Expectation of an arbitrary random variable X
Definition 2.14: (Expectation)
The expectation of the random variable X, denoted by E(X), is
defined by
E(X) =
  Σ_{xj ∈ supp(X)} xj · P (X = xj )   , if X is discrete
  ∫_{−∞}^{+∞} x · fX (x) dx           , if X is continuous.
58
Remarks:
• Intuitively, the expectation of the random variable X is the
sum of all possible realizations, each weighted by the probability
of its occurrence
• Instead of E(X) we often write µX
• There exist random variables that do not have an expectation
(see class)
59
Example 1: (Discrete random variable)
• Consider the experiment of rolling two dice. Let X represent the absolute difference of the two dice. What is the
expectation of X?
• The support of X is given by
supp(X) = {0, 1, 2, 3, 4, 5}
60
• The discrete density function of X is given by
fX (x) =
  P (X = 0) = 6/36    for x = 0
  P (X = 1) = 10/36   for x = 1
  P (X = 2) = 8/36    for x = 2
  P (X = 3) = 6/36    for x = 3
  P (X = 4) = 4/36    for x = 4
  P (X = 5) = 2/36    for x = 5
  0                   for x ∉ supp(X)
• This gives
E(X) = 0 · 6/36 + 1 · 10/36 + 2 · 8/36 + 3 · 6/36 + 4 · 4/36 + 5 · 2/36
     = 70/36 = 1.9444
61
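The density table and the expectation can be reproduced by enumerating all 36 outcomes; exact fractions avoid rounding issues (a sketch of my own, not from the slides):

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two dice
density = {}
for a, b in product(range(1, 7), repeat=2):
    x = abs(a - b)                 # X = absolute difference of the two dice
    density[x] = density.get(x, Fraction(0)) + Fraction(1, 36)

assert density[1] == Fraction(10, 36)      # matches the table above

E = sum(x * p for x, p in density.items())
assert E == Fraction(70, 36)               # E(X) = 70/36 ≈ 1.9444
```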
Example 2: (Continuous random variable)
• Consider the continuous random variable X with pdf
fX (x) =
  x/4   , for 1 ≤ x ≤ 3
  0     , elsewise
• To calculate the expectation we split up the integral:
E(X) = ∫_{−∞}^{+∞} x · fX (x) dx
     = ∫_{−∞}^{1} x · 0 dx + ∫_{1}^{3} x · (x/4) dx + ∫_{3}^{+∞} x · 0 dx
62
     = ∫_{1}^{3} x²/4 dx = [(1/4) · (1/3) · x³]_{1}^{3}
     = (1/4) · (27/3 − 1/3)
     = 26/12 = 2.1667
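As a numerical cross-check of the integral above, a midpoint rule over the support [1, 3] reproduces E(X) = 26/12:

```python
# Approximate E(X) = ∫ x·fX(x) dx for fX(x) = x/4 on [1, 3] (midpoint rule)
def f(x):
    return x / 4 if 1 <= x <= 3 else 0.0

n = 100_000
h = 2 / n                         # step width over the support [1, 3]
E = 0.0
for i in range(n):
    x = 1 + (i + 0.5) * h         # midpoint of the i-th subinterval
    E += x * f(x) * h

assert abs(E - 26 / 12) < 1e-6    # closed form: 26/12 ≈ 2.1667
```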
Frequently:
• Random variable X plus discrete density or pdf fX is known
• We have to find the expectation of the transformed random
variable
Y = g(X)
63
Theorem 2.15: (Expectation of a transformed rv)
Let X be a random variable with discrete density or pdf fX (·).
For any Baire-function g : R −→ R the expectation of the transformed random variable Y = g(X) is given by
E(Y ) = E[g(X)]
=
  Σ_{xj ∈ supp(X)} g(xj ) · P (X = xj )   , if X is discrete
  ∫_{−∞}^{+∞} g(x) · fX (x) dx            , if X is continuous.
64
Remarks:
• All functions considered in this course are Baire-functions
• For the special case g(x) = x (the identity function) Theorem
2.15 coincides with Definition 2.14
Next:
• Some important rules for calculating expected values
65
Theorem 2.16: (Properties of expectations)
Let X be an arbitrary random variable (discrete or continuous),
c, c1, c2 ∈ R constants and g, g1, g2 : R −→ R functions. Then:
1. E(c) = c.
2. E[c · g(X)] = c · E[g(X)].
3. E[c1 · g1(X) + c2 · g2(X)] = c1 · E[g1(X)] + c2 · E[g2(X)].
4. If g1(x) ≤ g2(x) for all x ∈ R then
E[g1(X)] ≤ E[g2(X)].
Proof: Class
66
Now:
• Consider the random variable X (discrete or continuous) and
the explicit function g(x) = [x − E(X)]²
−→ variance and standard deviation of X
Definition 2.17: (Variance, standard deviation)
For any random variable X the variance, denoted by Var(X), is
defined as the expected quadratic distance between X and its
expectation E(X); that is
Var(X) = E[(X − E(X))²].
The standard deviation of X, denoted by SD(X), is defined to
be the (positive) square root of the variance:
SD(X) = +√Var(X).
67
Remark:
• Setting g(X) = [X − E(X)]² in Theorem 2.15 (on slide 64)
yields the following explicit formulas for discrete and continuous random variables:
Var(X) = E[g(X)]
=
  Σ_{xj ∈ supp(X)} [xj − E(X)]² · P (X = xj )   , if X is discrete
  ∫_{−∞}^{+∞} [x − E(X)]² · fX (x) dx           , if X is continuous
68
Example: (Discrete random variable)
• Consider again the experiment of rolling two dice with X
representing the absolute difference of the two dice (see Example 1 on slide 60). The variance is given by
Var(X) = (0 − 70/36)2 · 6/36 + (1 − 70/36)2 · 10/36
+ (2 − 70/36)2 · 8/36 + (3 − 70/36)2 · 6/36
+ (4 − 70/36)2 · 4/36 + (5 − 70/36)2 · 2/36
= 2.05247
Notice:
• The variance is an expectation per definitionem
−→ rules for expectations are applicable
69
Theorem 2.18: (Rules for variances)
Let X be an arbitrary random variable (discrete or continuous)
and a, b ∈ R real constants; then
1. Var(X) = E(X²) − [E(X)]².
2. Var(a + b · X) = b² · Var(X).
Proof: Class
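Rule 1 of Theorem 2.18 can be verified against Definition 2.17 for the two-dice example from slide 69; both routes give the same value ≈ 2.05247. A sketch:

```python
from fractions import Fraction
from itertools import product

# Density of X = |difference of two dice| (slide 61)
density = {}
for a, b in product(range(1, 7), repeat=2):
    x = abs(a - b)
    density[x] = density.get(x, Fraction(0)) + Fraction(1, 36)

E = sum(x * p for x, p in density.items())                     # 70/36
var_def = sum((x - E) ** 2 * p for x, p in density.items())    # Definition 2.17
EX2 = sum(x ** 2 * p for x, p in density.items())
var_rule = EX2 - E ** 2                                        # Theorem 2.18, rule 1

assert var_def == var_rule
assert abs(float(var_def) - 2.05247) < 1e-4
```

Because both sides are exact fractions, the equality check is exact, not just up to rounding.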
Next:
• Two important inequalities dealing with expectations and
transformed random variables
70
Theorem 2.19: (Chebyshev inequality)
Let X be an arbitrary random variable and g : R −→ R+ a nonnegative function. Then, for every k > 0 we have
P [g(X) ≥ k] ≤ E[g(X)] / k.
Special case:
• Consider  (r > 0)
g(x) = [x − E(X)]²
and
k = r² · Var(X)
• Theorem 2.19 implies
P {[X − E(X)]² ≥ r² · Var(X)} ≤ Var(X) / (r² · Var(X)) = 1/r²
71
• Now:
P {[X − E(X)]² ≥ r² · Var(X)} = P {|X − E(X)| ≥ r · SD(X)}
= 1 − P {|X − E(X)| < r · SD(X)}
• It follows that
P {|X − E(X)| < r · SD(X)} ≥ 1 − 1/r²
(specific Chebyshev inequality)
72
Remarks:
• The specific Chebyshev inequality provides a minimal probability of the event that any arbitrary random variable X takes
on a value from the following interval:
[E(X) − r · SD(X), E(X) + r · SD(X)]
• For example, for r = 3 we have
P {|X − E(X)| < 3 · SD(X)} ≥ 1 − 1/3² = 8/9
which is equivalent to
P {E(X) − 3 · SD(X) < X < E(X) + 3 · SD(X)} ≥ 0.8889
or
P {X ∈ (E(X) − 3 · SD(X), E(X) + 3 · SD(X))} ≥ 0.8889
73
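The specific Chebyshev inequality holds for any distribution with finite variance. The Monte Carlo sketch below checks the bound for r = 2; the uniform distribution, sample size and seed are my own illustrative choices:

```python
import random

# Empirical check of P{|X − E(X)| < r·SD(X)} >= 1 − 1/r²
random.seed(42)
n, r = 100_000, 2
sample = [random.random() for _ in range(n)]      # X ~ uniform on [0, 1]

mean = sum(sample) / n
sd = (sum((x - mean) ** 2 for x in sample) / n) ** 0.5

inside = sum(1 for x in sample if abs(x - mean) < r * sd) / n
assert inside >= 1 - 1 / r ** 2
```

For most distributions the empirical frequency is far above the bound; Chebyshev only guarantees a worst-case minimum.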
Theorem 2.20: (Jensen inequality)
Let X be a random variable with mean E(X) and let g : R −→ R
be a convex function, i.e. for all x we have g″(x) ≥ 0; then
E [g(X)] ≥ g(E[X]).
Remarks:
• If the function g is concave (i.e. if g″(x) ≤ 0 for all x) then
Jensen’s inequality states that E [g(X)] ≤ g(E[X])
• Notice that in general we have
E [g(X)] ≠ g(E[X])
74
Example:
• Consider the random variable X and the function g(x) = x2
• We have g″(x) = 2 ≥ 0 for all x, i.e. g is convex
• It follows from Jensen’s inequality that
E[g(X)] = E(X²) ≥ [E(X)]² = g(E[X])
i.e.
E(X²) − [E(X)]² ≥ 0
• This implies
Var(X) = E(X²) − [E(X)]² ≥ 0
(the variance of an arbitrary rv cannot be negative)
75
Now:
• Consider the random variable X with expectation E(X) = µX ,
the integer number n ∈ N and the functions
g1(x) = xⁿ
g2(x) = [x − µX ]ⁿ
Definition 2.21: (Moments, central moments)
(a) The n-th moment of X, denoted by µ′n, is defined as
µ′n ≡ E[g1(X)] = E(Xⁿ).
(b) The n-th central moment of X about µX , denoted by µn, is
defined as
µn ≡ E[g2(X)] = E[(X − µX )ⁿ].
76
Relations:
• µ′1 = E(X) = µX
(the 1st moment coincides with E(X))
• µ1 = E[X − µX ] = E(X) − µX = 0
(the 1st central moment is always equal to 0)
• µ2 = E[(X − µX )²] = Var(X)
(the 2nd central moment coincides with Var(X))
77
Remarks:
• The first four moments of a random variable X are important
measures of the probability distribution
(expectation, variance, skewness, kurtosis)
• The moments of a random variable X play an important role
in theoretical and applied statistics
• In some cases, when all moments are known, the cdf of a
random variable X can be determined
78
Question:
• Can we find a function that gives us a representation of all
moments of a random variable X?
Definition 2.22: (Moment generating function)
Let X be a random variable with discrete density or pdf fX (·).
The expected value of e^{t·X} is defined to be the moment generating function of X if the expected value exists for every value
of t in some interval −h < t < h, h > 0. That is, the moment
generating function of X, denoted by mX (t), is defined as
mX (t) = E[e^{t·X}].
79
Remarks:
• The moment generating function mX (t) is a function in t
• There are rv’s X for which mX (t) does not exist
• If mX (t) exists it can be calculated as
mX (t) = E[e^{t·X}]
=
  Σ_{xj ∈ supp(X)} e^{t·xj} · P (X = xj )   , if X is discrete
  ∫_{−∞}^{+∞} e^{t·x} · fX (x) dx           , if X is continuous
80
Question:
• Why is mX (t) called the moment generating function?
Answer:
• Consider the nth derivative of mX (t) with respect to t:
dⁿ mX (t) / dtⁿ =
  Σ_{xj ∈ supp(X)} (xj )ⁿ · e^{t·xj} · P (X = xj )   for discrete X
  ∫_{−∞}^{+∞} xⁿ · e^{t·x} · fX (x) dx               for continuous X
81
• Now, evaluate the nth derivative at t = 0:
dⁿ mX (0) / dtⁿ =
  Σ_{xj ∈ supp(X)} (xj )ⁿ · P (X = xj )   for discrete X
  ∫_{−∞}^{+∞} xⁿ · fX (x) dx              for continuous X
= E(Xⁿ) = µ′n
(see Definition 2.21(a) on slide 76)
82
Example:
• Let X be a continuous random variable with pdf
fX (x) =
  0              , for x < 0
  λ · e^{−λ·x}   , for x ≥ 0
(exponential distribution with parameter λ > 0)
• We have, for t < λ,
mX (t) = E[e^{t·X}] = ∫_{−∞}^{+∞} e^{t·x} · fX (x) dx
       = ∫_{0}^{+∞} λ · e^{(t−λ)·x} dx = λ/(λ − t)
83
• It follows that
m′X (t) = λ/(λ − t)²   and   m″X (t) = 2λ/(λ − t)³
and thus
m′X (0) = E(X) = 1/λ   and   m″X (0) = E(X²) = 2/λ²
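The closed form mX(t) = λ/(λ − t) can be checked by simulation: estimate E[e^{t·X}] from exponential draws and compare (λ, t, sample size and seed below are arbitrary illustrative choices):

```python
import math
import random

# Monte Carlo estimate of mX(t) = E[e^{tX}] for X ~ Exp(λ), vs λ/(λ − t)
random.seed(0)
lam, t, n = 2.0, 0.5, 200_000
sample = [random.expovariate(lam) for _ in range(n)]

m_hat = sum(math.exp(t * x) for x in sample) / n
assert abs(m_hat - lam / (lam - t)) < 0.01     # λ/(λ − t) = 2/1.5 ≈ 1.3333

# First moment from the sample vs m'X(0) = 1/λ
assert abs(sum(sample) / n - 1 / lam) < 0.01
```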
Now:
• Important result on moment generating functions
84
Theorem 2.23: (Identification property)
Let X and Y be two random variables with densities fX (·) and
fY (·), respectively. Suppose that mX (t) and mY (t) both exist
and that mX (t) = mY (t) for all t in the interval −h < t < h for
some h > 0. Then the two cdf’s FX (·) and FY (·) are equal; that
is FX (x) = FY (x) for all x.
Remarks:
• Theorem 2.23 states that there is a unique cdf FX (x) for a
given moment generating function mX (t)
−→ if we can find mX (t) for X then, at least theoretically, we
can find the distribution of X
• We will make use of this property in Section 4
85
Example:
• Suppose that a random variable X has the moment generating function
mX (t) = 1/(1 − t)   for −1 < t < 1
• Then the pdf of X is given by
fX (x) =
  0        , for x < 0
  e^{−x}   , for x ≥ 0
(exponential distribution with parameter λ = 1)
86
2.4 Special Parametric Families of Univariate Distributions
Up to now:
• General mathematical properties of arbitrary distributions
• Discrimination: discrete vs continuous distributions
• Consideration of
the cdf FX (x)
the discrete density or the pdf fX (x)
expectations of the form E[g(X)]
the moment generating function mX (t)
87
Central result:
• The distribution of a random variable X is (essentially) determined by fX (x) or FX (x)
• FX (x) can be determined by fX (x)
(cf. slide 46)
• fX (x) can be determined by FX (x)
(cf. slide 48)
Question:
• How many different distributions are known to exist?
88
Answer:
• Infinitely many
But:
• In practice, there are some important parametric families of
distributions that provide ’good’ models for representing real-world random phenomena
• These families of distributions are described in detail in all
textbooks on mathematical statistics
(see e.g. Mosler & Schmid (2008), Mood et al. (1974))
89
• Important families of discrete distributions
Bernoulli distribution
Binomial distribution
Geometric distribution
Poisson distribution
• Important families of continuous distributions
Uniform or rectangular distribution
Exponential distribution
Normal distribution
90
Remark:
• The most important family of distributions is the normal distribution
Definition 2.24: (Normal distribution)
A continuous random variable X is defined to be normally distributed with parameters µ ∈ R and σ² > 0, denoted by X ∼
N (µ, σ²), if its pdf is given by
fX (x) = 1/(√(2π) · σ) · e^{−(1/2)·((x−µ)/σ)²},   x ∈ R.
91
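The pdf from Definition 2.24 can be coded directly; a midpoint-rule integration over a wide interval confirms numerically that the area under the standard normal pdf is one:

```python
import math

def phi(x, mu=0.0, sigma=1.0):
    """Normal pdf from Definition 2.24."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

# Midpoint-rule integration of the standard normal pdf over [-10, 10];
# the mass outside that interval is negligibly small
n, a, b = 200_000, -10.0, 10.0
h = (b - a) / n
area = sum(phi(a + (i + 0.5) * h) for i in range(n)) * h

assert abs(area - 1.0) < 1e-6
assert abs(phi(0) - 1 / math.sqrt(2 * math.pi)) < 1e-12
```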
[Figure: pdfs of the normal distributions N(0,1), N(5,1), N(5,3) and N(5,5)]
92
Remarks:
• The special normal distribution N (0, 1) is called the standard normal distribution, the pdf of which is denoted by ϕ(x)
• The properties as well as calculation rules for normally distributed random variables are important prerequisites for
this course
(see Wilfling (2011), Section 3.4)
93