Appendix II – Probability Theory Refresher
Based on Leonard Kleinrock, Queueing Systems, Vol. I: Theory
Nelson Fonseca, State University of Campinas, Brazil
• Random events exhibit statistical regularity
• Example: If one were to toss a fair coin four times, one expects on average two heads and two tails. There is one chance in sixteen that no heads will occur. If we tossed the coin a million times, the odds are better than 10 to 1 that at least 490,000 heads will occur.
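A quick numerical check of these two claims (a minimal Python sketch; the normal approximation to the binomial for the million-toss case is our own shortcut, not part of the original text):

```python
import math

# Exact probability of zero heads in four fair tosses: (1/2)^4 = 1/16
print(0.5 ** 4)                               # 0.0625

# P[at least 490,000 heads in 1,000,000 tosses] via the normal
# approximation to Bin(n, 1/2): mean n/2, standard deviation sqrt(n)/2.
n, k = 1_000_000, 490_000
z = (k - n / 2) / (math.sqrt(n) / 2)          # z = -20
print(0.5 * math.erfc(z / math.sqrt(2)))      # ~1.0: overwhelming odds
```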
II.1 Rules of the game
• Real-world experiments involve:
  – A set of possible experimental outcomes
  – A grouping of these outcomes into classes called results
  – The relative frequency of these classes in many independent trials of the experiment
• Frequency = the number of times the experimental outcome falls into a class, divided by the number of times the experiment is performed
• Mathematical model: three quantities of interest that are in one-to-one correspondence with the three quantities of the experimental world:
1. A sample space S is a collection of objects that corresponds to the set of mutually exclusive, exhaustive outcomes of the model of an experiment. Each object $\omega$ in the set S is referred to as a sample point.
2. A family of events, denoted {A, B, C, …}, in which each event is a set of sample points.
3. A probability measure P, which is an assignment (mapping) of the events defined on S into the set of real numbers. The notation is P[A], and the mapping has these properties:
a) For any event A, $0 \le P[A] \le 1$   (II.1)
b) $P[S] = 1$   (II.2)
c) If A and B are “mutually exclusive” events, then $P[A \cup B] = P[A] + P[B]$   (II.3)
• Notation
  – $A^c$: the complement of A (the sample points not in A)
  – $S^c = \emptyset$: the null event (it contains no sample point, since S contains all the points)
  – If $A \cap B = \emptyset$, then A and B are said to be mutually exclusive (or disjoint)
• Exhaustive set of events: a set of events whose union forms the sample space S
• A set of mutually exclusive, exhaustive events $A_1, A_2, \ldots, A_n$ has the properties
$$A_i \cap A_j = \emptyset \quad \text{for all } i \ne j$$
$$A_1 \cup A_2 \cup \cdots \cup A_n = S$$
• The triplet $(S, \mathcal{E}, P)$ (sample space, family of events, probability measure), along with Axioms (II.1)–(II.3), forms a probability system
• Conditional probability
$$P[A \mid B] = \frac{P[AB]}{P[B]}, \qquad P[B] \ne 0$$
• The event B forces us to restrict attention from the
original sample space S to a new sample space
defined by the event B, since B must now have a
total probability of unity. We magnify the
probabilities associated with conditional events by
dividing by the term P[B]
• Two events A and B are said to be statistically independent if and only if
$$P[AB] = P[A]\,P[B]$$
• If A and B are independent, then
$$P[A \mid B] = P[A]$$
• Theorem of total probability
$$P[B] = \sum_{i=1}^{n} P[A_i B]$$
If the event B is to occur, it must occur in conjunction with exactly one of the mutually exclusive, exhaustive events $A_i$.
• The second important form of the theorem of total probability:
$$P[B] = \sum_{i=1}^{n} P[B \mid A_i]\,P[A_i]$$
• Instead of calculating the probability of a complex event B directly, we calculate the probability of its occurrence jointly with each of the mutually exclusive events $A_i$:
$$P[B] = \sum_{i=1}^{n} P[B \mid A_i]\,P[A_i] = \sum_{i=1}^{n} P[A_i B]$$
• Bayes’ theorem
$$P[A_i \mid B] = \frac{P[B \mid A_i]\,P[A_i]}{\sum_{j=1}^{n} P[B \mid A_j]\,P[A_j]}$$
where $\{A_i\}$ is a set of mutually exclusive, exhaustive events.
• Example: You have just entered a casino and gamble with one of two twin brothers; one is honest and the other is not. You know that you lose with probability 1/2 if you play with the honest brother, and with probability p if you play with the cheating brother. (The question this sets up is answered with Bayes’ theorem at the end of Section II.3.)
II.2 Random variables
• A random variable is a variable whose value depends upon the outcome of a random experiment
• To each outcome we associate a real number, which is the value the random variable takes on that outcome
• A random variable is thus a mapping from the points of the sample space into the (real) line
• Example: a game in which we win $5 with probability 3/8 (outcome W), win −$5 (i.e., lose) with probability 3/8 (outcome L), and win $0 on a draw with probability 1/4 (outcome D). The random variable X maps the sample space S = {W, D, L} into the real line:
$$X(W) = 5, \qquad X(D) = 0, \qquad X(L) = -5$$
Notation: $[X = x] \triangleq \{\omega : X(\omega) = x\}$, and $P[X = x]$ is the probability that $X(\omega)$ equals $x$:
$$P[X = -5] = 3/8, \qquad P[X = 0] = 1/4, \qquad P[X = 5] = 3/8$$
• Probability distribution function (PDF), also known as the cumulative distribution function:
$$[X \le x] \triangleq \{\omega : X(\omega) \le x\}$$
$$\text{PDF:}\quad F_X(x) = P[X \le x]$$
Properties:
$$F_X(x) \ge 0$$
$$F_X(\infty) = 1$$
$$F_X(-\infty) = 0$$
$$F_X(b) - F_X(a) = P[a < X \le b] \quad \text{for } a < b$$
$$F_X(b) \ge F_X(a) \quad \text{for } a \le b \ \text{(nondecreasing)}$$
[Figure: the staircase PDF $F_X(x)$ of the game example, rising from 0 to 1 with jumps of 3/8 at $x = -5$, 1/4 at $x = 0$, and 3/8 at $x = +5$.]
$$P[-2 < X \le 6] = 5/8, \qquad P[1 < X \le 4] = 0$$
At points of discontinuity the PDF takes on the upper value.
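The staircase PDF of a discrete random variable can be evaluated directly from its probability mass function; a minimal sketch for the game example above (all names are our own):

```python
# PMF of the game example: X takes -5, 0, +5 with probabilities 3/8, 1/4, 3/8
pmf = {-5: 3/8, 0: 1/4, 5: 3/8}

def F(x):
    """Staircase PDF F_X(x) = P[X <= x]; takes the upper value at jumps."""
    return sum(p for point, p in pmf.items() if point <= x)

print(F(6) - F(-2))   # P[-2 < X <= 6] = 0.625 = 5/8
print(F(4) - F(1))    # P[ 1 < X <= 4] = 0.0
```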
• Probability density function (pdf)
$$f_X(x) = \frac{dF_X(x)}{dx}$$
$$F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy$$
We have $f_X(x) \ge 0$, and since $F_X(\infty) = 1$,
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$
• The pdf integrated over an interval gives the probability that the random variable X lies in that interval:
$$P[a < X \le b] = \int_{a}^{b} f_X(x)\,dx$$
• Example: an exponentially distributed random variable
$$\text{PDF:}\quad F_X(x) = \begin{cases} 1 - e^{-\lambda x} & 0 \le x \\ 0 & x < 0 \end{cases}$$
$$\text{pdf:}\quad f_X(x) = \begin{cases} \lambda e^{-\lambda x} & 0 \le x \\ 0 & x < 0 \end{cases}$$
$$P[a < X \le b] = F_X(b) - F_X(a) = e^{-\lambda a} - e^{-\lambda b}$$
$$P[a < X \le b] = \int_a^b f_X(x)\,dx = e^{-\lambda a} - e^{-\lambda b}$$
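As a sanity check, the closed form $e^{-\lambda a} - e^{-\lambda b}$ and a numerical integral of the pdf should coincide; a small sketch with arbitrary parameter values:

```python
import math

lam, a, b = 2.0, 0.5, 1.5                     # arbitrary rate and interval
exact = math.exp(-lam * a) - math.exp(-lam * b)

# Trapezoidal integration of f(x) = lam*exp(-lam*x) over [a, b]
n = 100_000
h = (b - a) / n
f = lambda x: lam * math.exp(-lam * x)
approx = h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)
print(exact, approx)                          # agree to many decimal places
```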
[Figure: the pdf $f_X(x)$ of the game example: impulses of area 3/8 at $x = -5$, 1/4 at $x = 0$, and 3/8 at $x = +5$.]
• The pdf of a discrete random variable consists of impulse functions (the PDF is discontinuous)
• Functions of more than one variable:
$$F_{XY}(x, y) = P[X \le x,\, Y \le y]$$
$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\,\partial y}$$
– “Marginal” density function:
$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$$
– Two random variables X and Y are said to be independent if and only if
$$f_{XY}(x, y) = f_X(x)\,f_Y(y)$$
and, for n mutually independent variables,
$$f_{X_1 X_2 \cdots X_n}(x_1, x_2, \ldots, x_n) = f_{X_1}(x_1)\,f_{X_2}(x_2) \cdots f_{X_n}(x_n)$$
• We can define conditional distributions and densities:
$$f_{X \mid Y}(x \mid y) = \frac{d}{dx}\,P[X \le x \mid Y = y] = \frac{f_{XY}(x, y)}{f_Y(y)}$$
• Function of one random variable:
$$Y = g(X), \qquad Y(\omega) = g(X(\omega))$$
• Given the random variable X and its PDF, one should be able to calculate the PDF of Y:
$$F_Y(y) = P[Y \le y] = P[\{\omega : g(X(\omega)) \le y\}]$$
• In the general case we consider sums of random variables:
$$Y = \sum_{i=1}^{n} X_i$$
For the case $n = 2$, $y = x_1 + x_2$:
$$F_Y(y) = P[Y \le y] = P[X_1 + X_2 \le y] = \iint_{x_1 + x_2 \le y} f_{X_1 X_2}(x_1, x_2)\,dx_1\,dx_2$$
(the integration region is the half-plane below the line $x_1 + x_2 = y$).
Due to the independence of $X_1$ and $X_2$ we then obtain the PDF for Y as
$$F_Y(y) = \int_{-\infty}^{\infty} \left[\int_{-\infty}^{y - x_2} f_{X_1}(x_1)\,dx_1\right] f_{X_2}(x_2)\,dx_2 = \int_{-\infty}^{\infty} F_{X_1}(y - x_2)\,f_{X_2}(x_2)\,dx_2$$
and, differentiating, the pdf of Y is the convolution of the two pdfs:
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X_1}(y - x_2)\,f_{X_2}(x_2)\,dx_2$$
$$f_Y(y) = f_{X_1}(y) \circledast f_{X_2}(y)$$
and, for n mutually independent variables,
$$f_Y(y) = f_{X_1}(y) \circledast f_{X_2}(y) \circledast \cdots \circledast f_{X_n}(y)$$
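A numeric sketch of the convolution formula, checking the sum of two independent Exp($\lambda$) variables against the known Erlang-2 density $\lambda^2 y e^{-\lambda y}$ (grid parameters are arbitrary):

```python
import math

lam, h, n = 1.0, 0.001, 10_000               # rate, grid step, grid size
f = [lam * math.exp(-lam * i * h) for i in range(n)]

def f_Y(i):
    """(f * f)(y_i) ~ h * sum_k f(x_k) f(y_i - x_k) on the discrete grid."""
    return h * sum(f[k] * f[i - k] for k in range(i + 1))

for y in (0.5, 1.0, 2.0):
    i = int(y / h)
    print(f_Y(i), lam**2 * y * math.exp(-lam * y))   # columns agree closely
```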
II.3 Expectation
• Stieltjes integrals deal with discontinuities and impulses.
Let $F(x)$ be a nondecreasing function and $\phi(x)$ a continuous function, and let $\{t_k\}$ and $\{\xi_k\}$ be two sets of points such that $t_{k-1} \le \xi_k \le t_k$, with $\max_k |t_k - t_{k-1}| \to 0$ in the limit. Then
$$\lim \sum_k \phi(\xi_k)\,[F(t_k) - F(t_{k-1})] = \int \phi(x)\,dF(x)$$
With PDF $F(x)$ and pdf $f(x) = dF(x)/dx$, we have $dF(x) = f(x)\,dx$.
• The Stieltjes integral always exists, and therefore it avoids the issue of impulses
• Without impulses the pdf may not exist
• When impulses are permitted we have
$$\int \phi(x)\,dF(x) = \int \phi(x)\,f(x)\,dx$$
• The expectation of a real random variable is
$$E[X] = \bar{X} = \int_{-\infty}^{\infty} x\,dF_X(x) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$$
• The mean or average value of X can also be written as
$$E[X] = \int_{0}^{\infty} [1 - F_X(x)]\,dx - \int_{-\infty}^{0} F_X(x)\,dx$$
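The tail-integral form of the mean is often convenient numerically, since it needs only the PDF; a minimal sketch for $X \sim \mathrm{Exp}(\lambda)$, where the answer should be $1/\lambda$ (rate and grid chosen arbitrarily):

```python
import math

lam, h, n = 0.5, 0.001, 100_000     # integrate 1 - F(x) = exp(-lam*x) to x=100
tail_integral = h * sum(math.exp(-lam * i * h) for i in range(n))
print(tail_integral, 1 / lam)       # both ~2.0
```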
• Expected value of a function of a random variable: with $Y = g(X)$,
$$E_Y[Y] = \int_{-\infty}^{\infty} y\,f_Y(y)\,dy$$
$$E_Y[Y] = E_X[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$$
• Expectation of the sum of two random variables:
$$E[X + Y] = \iint (x + y)\,f_{XY}(x, y)\,dx\,dy = \iint x\,f_{XY}(x, y)\,dx\,dy + \iint y\,f_{XY}(x, y)\,dx\,dy$$
$$= \int x\,f_X(x)\,dx + \int y\,f_Y(y)\,dy = E[X] + E[Y]$$
$$E[X + Y] = E[X] + E[Y], \qquad \overline{X + Y} = \bar{X} + \bar{Y}$$
• The expectation of the sum of two random
variables is always equal to the sum of the
expectations of each variable
• This is true even if the variables are dependent
• The expectation operator is a linear operator
$$E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]$$
• Returning to the casino example: what is the probability that you are playing with the cheating brother, given that you lost? With $D_C$ ($D_H$) denoting play against the cheating (honest) brother, $L$ denoting a loss, and $P[D_C] = P[D_H] = 1/2$, Bayes’ theorem gives
$$P[D_C \mid L] = \frac{P[L \mid D_C]\,P[D_C]}{P[L \mid D_C]\,P[D_C] + P[L \mid D_H]\,P[D_H]} = \frac{p \cdot \tfrac{1}{2}}{p \cdot \tfrac{1}{2} + \tfrac{1}{2} \cdot \tfrac{1}{2}} = \frac{2p}{2p + 1}$$
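A Monte Carlo check of this posterior (p is an arbitrary example value; all names are our own):

```python
import random

p, trials = 0.8, 1_000_000
lost_with_cheater = lost_total = 0
for _ in range(trials):
    cheater = random.random() < 0.5                  # each brother equally likely
    if random.random() < (p if cheater else 0.5):    # did we lose this game?
        lost_total += 1
        lost_with_cheater += cheater
print(lost_with_cheater / lost_total, 2 * p / (2 * p + 1))   # both ~0.615
```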
• A counting aside: the number of permutations of N things taken K at a time is
$$\frac{N!}{(N - K)!} = N (N - 1) \cdots (N - K + 1)$$
The number of combinations of N things taken K at a time is denoted by
$$\binom{N}{K} = \frac{N!}{K!\,(N - K)!}$$
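Both counts are available directly in Python's standard library (example values are arbitrary):

```python
import math

N, K = 10, 3
print(math.factorial(N) // math.factorial(N - K))   # permutations: 720
print(math.comb(N, K))                              # combinations: 120
```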
• Expectation of a product:
$$E[XY] = \iint xy\,f_{XY}(x, y)\,dx\,dy$$
and, if X and Y are independent,
$$E[XY] = \iint xy\,f_X(x)\,f_Y(y)\,dx\,dy = E[X]\,E[Y], \qquad \overline{XY} = \bar{X}\,\bar{Y}$$
– The expected value of the product of variables is equal to the product of the expected values if the variables are independent
– Expected value of the product of functions of independent random variables:
$$E[g(X)\,h(Y)] = E[g(X)]\,E[h(Y)]$$
– nth moment:
$$E[X^n] = \overline{X^n} = \int_{-\infty}^{\infty} x^n f_X(x)\,dx$$
– nth central moment:
$$\overline{(X - \bar{X})^n} = \int_{-\infty}^{\infty} (x - \bar{X})^n f_X(x)\,dx$$
– The nth central moment can be expressed as a function of the first n moments via the binomial expansion:
$$\overline{(X - \bar{X})^n} = \sum_{k=0}^{n} \binom{n}{k}\,\overline{X^k}\,(-\bar{X})^{\,n-k}$$
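A small sketch of this expansion, computing central moments from raw moments (we use the raw moments $E[X^k] = k!$ of the Exp(1) distribution as test data):

```python
from math import comb

def central_moment(raw, n):
    """nth central moment from raw moments raw[k] = E[X^k] (raw[0] = 1)."""
    mean = raw[1]
    return sum(comb(n, k) * raw[k] * (-mean) ** (n - k) for k in range(n + 1))

raw = [1, 1, 2, 6, 24]            # E[X^k] = k! for X ~ Exp(1)
print(central_moment(raw, 1))     # 0: the first central moment is always 0
print(central_moment(raw, 2))     # 1: the variance of Exp(1)
```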
– First central moment = 0:
$$\overline{(X - \bar{X})} = \bar{X} - \bar{X} = 0$$
– Second central moment => variance:
$$\sigma_X^2 = \overline{(X - \bar{X})^2} = \overline{X^2} - (\bar{X})^2$$
– Standard deviation (square root of the second central moment):
$$\sigma_X = \sqrt{\sigma_X^2}$$
– Coefficient of variation:
$$C_X = \frac{\sigma_X}{\bar{X}}$$
• Covariance and correlation of two random variables $X_1$ and $X_2$:
$$\mathrm{Cov}(X_1, X_2) = E[(X_1 - E[X_1])(X_2 - E[X_2])]$$
$$\mathrm{var}(X_1 + X_2) = \mathrm{var}(X_1) + \mathrm{var}(X_2) + 2\,\mathrm{Cov}(X_1, X_2)$$
$$\mathrm{Corr}(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sigma_1\,\sigma_2}$$
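A simulation sketch of the variance identity for (deliberately) correlated variables; building X2 from X1 plus noise is an arbitrary way to induce positive covariance:

```python
import random
from statistics import mean, variance

random.seed(1)
x1 = [random.random() for _ in range(200_000)]
x2 = [a + random.random() for a in x1]       # positively correlated with x1

cov = mean(a * b for a, b in zip(x1, x2)) - mean(x1) * mean(x2)
lhs = variance([a + b for a, b in zip(x1, x2)])
rhs = variance(x1) + variance(x2) + 2 * cov
print(lhs, rhs)                              # equal up to sampling noise
```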
Normal
• Notation: $X \sim \mathrm{Nor}(\mu, \sigma^2)$
• Range: $-\infty < X < \infty$
• Parameters: location $\mu$, $-\infty < \mu < \infty$; scale $\sigma > 0$
• Probability density function:
$$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(X - \mu)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2\right]$$
[Figures: normal pdfs for parameter pairs $(\mu, \sigma)$ = (10, 2), (10, 1), (0, 2), (0, 1).]
• Expected value: $E(X) = \mu$
• Variance: $V(X) = \sigma^2$
Chebyshev Inequality
$$P\left[\,|X - \bar{X}| \ge x\,\right] \le \frac{\sigma_X^2}{x^2}$$
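Comparing the Chebyshev bound with the actual tail probability of a standard normal sample shows how loose (but safe) the bound is; a minimal sketch:

```python
import random
from statistics import mean

random.seed(2)
sample = [random.gauss(0, 1) for _ in range(100_000)]
x = 2.0
print(mean(abs(v) >= x for v in sample))   # actual tail: ~0.0455
print(1.0 / x**2)                          # Chebyshev bound: 0.25
```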
Strong Law of Large Numbers
$$W_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
$$\overline{W_n} = \bar{X}, \qquad \sigma_{W_n}^2 = \frac{\sigma_X^2}{n}$$
$$\lim_{n \to \infty} W_n = \bar{X} \quad \text{(with probability 1)}$$
Central Limit Theorem
$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\bar{X}}{\sigma_X \sqrt{n}}$$
$$\lim_{n \to \infty} P[Z_n \le x] = \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$$
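An empirical sketch of the CLT using Exp(1) summands ($\bar{X} = 1$, $\sigma_X = 1$); the estimate of $P[Z_n \le 1]$ should approach $\Phi(1) \approx 0.8413$:

```python
import math
import random

random.seed(3)
n, reps, x = 50, 20_000, 1.0
count = 0
for _ in range(reps):
    s = sum(random.expovariate(1.0) for _ in range(n))
    count += (s - n) / math.sqrt(n) <= x   # Z_n = (sum - n*mean)/(sigma*sqrt(n))
phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))
print(count / reps, phi)                   # close to each other
```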
Exponential
• Probability density function:
$$f(X) = \lambda e^{-\lambda X}$$
• Distribution function:
$$F(X) = 1 - e^{-\lambda X}$$
• Typical uses (a sampling sketch follows below):
  – Interarrival time of phone calls
  – Interarrival time of web sessions
  – Duration of on and off periods in voice models
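Exponential interarrival times are commonly generated by inverse-transform sampling: if $U \sim$ Uniform(0, 1), then $X = -\ln(1 - U)/\lambda$ has the distribution above. A minimal sketch (the rate value is arbitrary):

```python
import math
import random
from statistics import mean

random.seed(4)
lam = 0.1                                    # e.g. 0.1 arrivals per second
xs = [-math.log(1 - random.random()) / lam for _ in range(100_000)]
print(mean(xs), 1 / lam)                     # sample mean ~ 10 = 1/lam
```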
Heavy-tailed distributions
$$P[Z > x] \sim c\,x^{-\alpha}, \qquad 0 < \alpha < 2, \quad x \to \infty$$
• Hyperbolic (power-law) decay of the tail
• Infinite variance for $0 < \alpha < 2$
• Unbounded mean for $0 < \alpha \le 1$
• In the network context, typically $1 < \alpha < 2$
Pareto
• Notation: $X \sim \mathrm{Par}(\alpha, \beta)$
• Range: $X \ge \beta$
• Parameters: scale $\beta > 0$; shape $\alpha > 0$
• Distribution function:
$$F(x) = 1 - \left(\frac{\beta}{x}\right)^{\alpha}$$
• Probability density function:
$$f(X) = \alpha\,\beta^{\alpha}\,X^{-\alpha - 1} = \frac{\alpha}{\beta}\left(\frac{\beta}{X}\right)^{\alpha + 1}$$
[Figures: Pareto pdfs for several shape/scale parameter pairs.]
• Expected value:
$$E(X) = \frac{\alpha\beta}{\alpha - 1}, \qquad \alpha > 1$$
• Uncentered moments:
$$\mu'_j = \frac{\alpha\,\beta^{\,j}}{\alpha - j}, \qquad \alpha > j$$
• Typical uses (a sampling sketch follows below):
  – Distribution of file sizes in Unix systems
  – Duration of on and off periods in data models (Ethernet, individual users)
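Pareto variates can be drawn by inverting $F$: $X = \beta / U^{1/\alpha}$ for $U \sim$ Uniform(0, 1). The sketch below (arbitrary parameters with $1 < \alpha < 2$) shows the slow, noisy convergence of the sample mean that is typical of heavy tails:

```python
import random
from statistics import mean

random.seed(5)
alpha, beta = 1.5, 1.0              # finite mean, infinite variance
xs = [beta / random.random() ** (1 / alpha) for _ in range(200_000)]
print(mean(xs), alpha * beta / (alpha - 1))   # ~3, but noisy run to run
print(max(xs))                                # occasional huge values: the tail
```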
Weibull
• Notation: $X \sim \mathrm{Wei}(b, c)$
• Range: $0 \le X < \infty$
• Parameters: scale $b > 0$; shape $c > 0$
• Probability density function:
$$f(X) = c\,b^{-c}\,X^{c-1}\,e^{-(X/b)^c} = \frac{c\,X^{c-1}}{b^c}\exp\!\left[-\left(\frac{X}{b}\right)^{c}\right]$$
• Distribution function:
$$F(x) = 1 - e^{-(X/b)^c} = 1 - \exp\!\left[-\left(\frac{X}{b}\right)^{c}\right]$$
[Figures: Weibull pdfs for parameter pairs $(b, c)$ = (1, 1), (2, 1), (1, 2), (2, 2), (10, 5), (5, 10), (25, 10).]
• Uncentered moments:
$$\mu'_j = b^{\,j}\,\Gamma\!\left(\frac{c + j}{c}\right) = \frac{j\,b^{\,j}}{c}\,\Gamma\!\left(\frac{j}{c}\right)$$
• Expected value:
$$E(X) = b\,\Gamma\!\left(\frac{c + 1}{c}\right) = \frac{b}{c}\,\Gamma\!\left(\frac{1}{c}\right)$$
• Variance:
$$V(X) = b^2\left[\Gamma\!\left(\frac{c + 2}{c}\right) - \Gamma^2\!\left(\frac{c + 1}{c}\right)\right] = \frac{b^2}{c}\left[2\,\Gamma\!\left(\frac{2}{c}\right) - \frac{1}{c}\,\Gamma^2\!\left(\frac{1}{c}\right)\right]$$
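These moment formulas can be checked against a simulated sample; `random.weibullvariate(b, c)` in the Python standard library uses the same scale/shape parameterization (parameter values are arbitrary):

```python
import math
import random
from statistics import mean, variance

random.seed(6)
b, c = 2.0, 1.5
xs = [random.weibullvariate(b, c) for _ in range(200_000)]
print(mean(xs), b * math.gamma(1 + 1 / c))        # ~1.805
print(variance(xs),
      b**2 * (math.gamma(1 + 2 / c) - math.gamma(1 + 1 / c) ** 2))  # ~1.50
```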
Lognormal
• Notation: $X \sim \mathrm{Logn}(\mu, \sigma^2)$
• Range: $0 < X < \infty$
• Parameters: scale $\mu$, $-\infty < \mu < \infty$ (or $m = e^{\mu} > 0$); shape $\sigma > 0$ (or $w = e^{\sigma^2} > 0$)
• Probability density function:
$$f(X) = \frac{1}{\sigma X \sqrt{2\pi}}\exp\!\left[-\frac{(\ln X - \mu)^2}{2\sigma^2}\right]$$
• Expected value:
$$E(X) = e^{\mu + \frac{1}{2}\sigma^2} = \exp\!\left[\mu + \frac{\sigma^2}{2}\right] \quad \text{or} \quad E(X) = m\sqrt{w}$$
• Variance:
$$V(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = \exp[2\mu]\exp[\sigma^2]\left(\exp[\sigma^2] - 1\right) \quad \text{or} \quad V(X) = m^2\,w\,(w - 1)$$
[Figures: lognormal pdfs for parameter pairs $(\mu, \sigma)$ = (0, 0.5), (0, 0.7), (1, 0.5), (1, 0.7), (0, 0.1), (1, 0.1), (0, 1), (1, 1).]
• Typical use: quantities arising from a multiplicative effect, as sketched below
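A sketch of where the multiplicative effect comes from: the product of many independent positive factors is approximately lognormal, because the log of the product is a sum of logs and the central limit theorem applies (the factor distribution below is an arbitrary choice):

```python
import math
import random
from statistics import mean, stdev

random.seed(7)
def product(n=50):
    p = 1.0
    for _ in range(n):
        p *= random.uniform(0.9, 1.1)   # arbitrary positive factors
    return p

logs = [math.log(product()) for _ in range(50_000)]
print(mean(logs), stdev(logs))   # log of the product behaves like a normal
```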
II.4 Transforms, generating functions and characteristic functions
• The characteristic function $\phi_X(u)$ of a random variable X is given by:
$$\phi_X(u) = E[e^{juX}] = \int_{-\infty}^{\infty} e^{jux} f_X(x)\,dx, \qquad j = \sqrt{-1}$$
– u is a real variable
– Since $|e^{jux}| = 1$,
$$|\phi_X(u)| \le \int_{-\infty}^{\infty} |e^{jux}|\,f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1$$
so $|\phi_X(u)| \le 1$.
– Expanding $e^{jux}$ and integrating:
$$\phi_X(u) = \int_{-\infty}^{\infty} f_X(x)\left[1 + jux + \frac{(jux)^2}{2!} + \cdots\right]dx = 1 + ju\bar{X} + \frac{(ju)^2\,\overline{X^2}}{2!} + \cdots$$
$$\phi_X(0) = 1, \qquad \left.\frac{d^n \phi_X(u)}{du^n}\right|_{u=0} = j^n\,\overline{X^n}$$
– Notation: $g^{(n)}(x_0) \triangleq \left.\dfrac{d^n g(x)}{dx^n}\right|_{x = x_0}$, so
$$\phi_X^{(n)}(0) = j^n\,\overline{X^n}$$
– Moment generating function:
$$M_X(v) = E[e^{vX}] = \int_{-\infty}^{\infty} e^{vx} f_X(x)\,dx, \qquad M_X^{(n)}(0) = \overline{X^n}$$
• Laplace transform of the pdf
– Notation: $A(x) = P[X \le x]$ is the PDF, $a(x)$ the pdf, and $A^*(s)$ the transform:
$$A^*(s) = E[e^{-sX}] = \int_{-\infty}^{\infty} e^{-sx} a(x)\,dx, \qquad A^{*(n)}(0) = (-1)^n\,\overline{X^n}$$
– The three transforms are related by
$$\phi_X(js) = M_X(-s) = A^*(s)$$
and the moments follow from any of them:
$$\overline{X^n} = j^{-n}\,\phi_X^{(n)}(0) = M_X^{(n)}(0) = (-1)^n\,A^{*(n)}(0)$$
– Example: for the exponential pdf
$$f_X(x) = a(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
$$\phi_X(u) = \frac{\lambda}{\lambda - ju}, \qquad M_X(v) = \frac{\lambda}{\lambda - v}, \qquad A^*(s) = \frac{\lambda}{\lambda + s}$$
$$\phi_X(0) = M_X(0) = A^*(0) = 1$$
$$\bar{X} = \frac{1}{\lambda}, \qquad \overline{X^2} = \frac{2}{\lambda^2}$$
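These moment relations can be verified by numerically differentiating $M_X(v) = \lambda/(\lambda - v)$ at $v = 0$ with finite differences (step size and rate are arbitrary):

```python
lam, h = 2.0, 1e-4
M = lambda v: lam / (lam - v)

first = (M(h) - M(-h)) / (2 * h)            # M'(0)  = E[X]   = 1/lam
second = (M(h) - 2 * M(0) + M(-h)) / h**2   # M''(0) = E[X^2] = 2/lam^2
print(first, 1 / lam)                       # ~0.5
print(second, 2 / lam**2)                   # ~0.5
```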
– Probability generating function (z-transform) for a discrete variable:
$$G(z) = E[z^X] = \sum_k z^k g_k, \qquad g_k = P[X = k]$$
$$G^{(1)}(1) = \bar{X}, \qquad G^{(2)}(1) = \overline{X^2} - \bar{X}, \qquad G(1) = 1$$
– Sum of n independent variables $X_i$, with $Y = \sum_{i=1}^{n} X_i$:
$$\phi_Y(u) = E[e^{juY}] = E\!\left[e^{ju\sum_{i=1}^{n} X_i}\right] = E[e^{juX_1} e^{juX_2} \cdots e^{juX_n}]$$
By independence,
$$\phi_Y(u) = E[e^{juX_1}]\,E[e^{juX_2}] \cdots E[e^{juX_n}] = \phi_{X_1}(u)\,\phi_{X_2}(u) \cdots \phi_{X_n}(u)$$
– If the $X_i$ are also identically distributed:
$$\phi_Y(u) = [\phi_X(u)]^n$$
– Sum of independent variables: $Y = X_1 + X_2 + \cdots + X_n$. For $n = 2$:
$$Y^2 = (X_1 + X_2)^2 = X_1^2 + 2X_1X_2 + X_2^2$$
$$(\bar{Y})^2 = (\bar{X}_1 + \bar{X}_2)^2 = (\bar{X}_1)^2 + 2\bar{X}_1\bar{X}_2 + (\bar{X}_2)^2$$
$$\sigma_Y^2 = \overline{Y^2} - (\bar{Y})^2 = \left[\overline{X_1^2} - (\bar{X}_1)^2\right] + \left[\overline{X_2^2} - (\bar{X}_2)^2\right] + 2\left(\overline{X_1X_2} - \bar{X}_1\bar{X}_2\right) = \sigma_{X_1}^2 + \sigma_{X_2}^2 + 2\left(\overline{X_1X_2} - \bar{X}_1\bar{X}_2\right)$$
– If $X_1$ and $X_2$ are independent, $\overline{X_1X_2} = \bar{X}_1\bar{X}_2$, so
$$\sigma_Y^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2$$
– The variance of the sum of independent random variables is equal to the sum of the variances:
$$\sigma_Y^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + \cdots + \sigma_{X_n}^2$$
– Sum of a random number of independent variables:
$$Y = \sum_{i=1}^{N} X_i$$
where N is a random variable with mean $\bar{N}$ and variance $\sigma_N^2$, the $\{X_i\}$ are independent and identically distributed, and N is independent of the $\{X_i\}$.
– $F_Y(y)$ is called a compound distribution. Conditioning on N:
$$Y^*(s) = E\!\left[e^{-s\sum_{i=1}^{N} X_i}\right] = \sum_{n=0}^{\infty} E\!\left[e^{-s\sum_{i=1}^{n} X_i}\right] P[N = n] = \sum_{n=0}^{\infty} E[e^{-sX_1}] \cdots E[e^{-sX_n}]\,P[N = n]$$
– Since the $X_i$ are identically distributed:
$$Y^*(s) = \sum_{n=0}^{\infty} [X^*(s)]^n\,P[N = n] = N(X^*(s))$$
where $N(z)$ is the z-transform of N. The first two moments follow (see the sketch below):
$$\bar{Y} = \bar{N}\,\bar{X}, \qquad \sigma_Y^2 = \bar{N}\,\sigma_X^2 + (\bar{X})^2\,\sigma_N^2$$
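A simulation sketch of these two compound-sum moments, with N uniform on {0, …, 4} (mean 2, variance 2) and $X_i \sim$ Exp(1) (both choices are arbitrary):

```python
import random
from statistics import mean, variance

random.seed(8)
def compound_sum():
    n = random.randint(0, 4)                     # N: mean 2, variance 2
    return sum(random.expovariate(1.0) for _ in range(n))

ys = [compound_sum() for _ in range(200_000)]
n_bar, var_n, x_bar, var_x = 2.0, 2.0, 1.0, 1.0
print(mean(ys), n_bar * x_bar)                         # ~2.0
print(variance(ys), n_bar * var_x + x_bar**2 * var_n)  # ~4.0
```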
II.6 Stochastic processes
– To each point $\omega$ of the sample space S a time function $x(t, \omega)$ is associated => a stochastic process is the resulting family of time functions
– PDF:
$$F_X(x; t) = P[X(t) \le x]$$
and, for n time instants,
$$F_{\vec{X}}(\vec{x}; \vec{t}\,) = F_{X_1 X_2 \cdots X_n}(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n) = P[X(t_1) \le x_1, X(t_2) \le x_2, \ldots, X(t_n) \le x_n]$$
– A process is (strictly) stationary when $F_{\vec{X}}(\vec{x}; \vec{t} + \tau) = F_{\vec{X}}(\vec{x}; \vec{t}\,)$ for every shift $\tau$
– pdf:
$$f_X(x; t) = \frac{\partial F_X(x; t)}{\partial x}$$
– Mean:
$$\bar{X}(t) = E[X(t)] = \int_{-\infty}^{\infty} x\,f_X(x; t)\,dx$$
– Autocorrelation:
$$R_{XX}(t_1, t_2) = E[X(t_1)X(t_2)] = \iint x_1 x_2\,f_{X_1X_2}(x_1, x_2; t_1, t_2)\,dx_1\,dx_2$$
– Wide-sense stationary process:
$$\bar{X}(t) = \bar{X} \ (\text{constant}), \qquad R_{XX}(t_1, t_2) = R_{XX}(t_2 - t_1)$$
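A classic wide-sense stationary example is the random-phase cosine $X(t) = \cos(t + \Theta)$ with $\Theta \sim$ Uniform(0, 2π), for which $R_{XX}(\tau) = \cos(\tau)/2$; a sketch estimating it by ensemble averaging:

```python
import math
import random

random.seed(9)
def R_hat(t1, t2, reps=100_000):
    """Estimate R_XX(t1, t2) by averaging X(t1)*X(t2) over random phases."""
    total = 0.0
    for _ in range(reps):
        theta = random.uniform(0, 2 * math.pi)
        total += math.cos(t1 + theta) * math.cos(t2 + theta)
    return total / reps

tau = 1.0
print(R_hat(0.0, tau), R_hat(5.0, 5.0 + tau), math.cos(tau) / 2)
# both estimates ~0.27: R_XX depends only on t2 - t1
```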