Appendix II – Probability Theory Refresher
Leonard Kleinrock, Queueing Systems, Vol. I: Theory
Nelson Fonseca, State University of Campinas, Brazil

• Random events exhibit statistical regularity.
• Example: if one were to toss a fair coin four times, one expects on the average two heads and two tails. There is one chance in sixteen that no heads will occur. If we tossed the coin a million times, the odds are better than $10^{88}$ to 1 that at least 490,000 heads will occur.
• Counting note: the number of permutations of $N$ things taken $K$ at a time is $\frac{N!}{(N-K)!} = N(N-1)\cdots(N-K+1)$; the number of combinations of $N$ things taken $K$ at a time is $\binom{N}{K} = \frac{N!}{K!(N-K)!}$. (The one-chance-in-sixteen figure above is $\binom{4}{0}(1/2)^4$.)

II.1 Rules of the game

• Real-world experiments involve:
– a set of possible experimental outcomes;
– a grouping of these outcomes into classes called results;
– the relative frequency of these classes in many independent trials of the experiment, where frequency = (number of times the experimental outcome falls into a class) / (number of times the experiment is performed).

• Mathematical model: three quantities of interest that are in one-to-one relation with the three quantities of the experimental world:
1. A sample space $S$: a collection of objects corresponding to the set of mutually exclusive, exhaustive outcomes of the model of the experiment. Each object $\omega$ in the set $S$ is referred to as a sample point.
2. A family of events $\{A, B, C, \ldots\}$, in which each event is a set of sample points.
3. A probability measure $P$: an assignment (mapping) of the events defined on $S$ into the set of real numbers, written $P[A]$, with these properties:
a) for any event $A$, $0 \le P[A] \le 1$ (II.1)
b) $P[S] = 1$ (II.2)
c) if $A$ and $B$ are mutually exclusive events, then $P[A \cup B] = P[A] + P[B]$ (II.3)

• Notation:
– $A^c$: the complement of $A$ (the sample points not in $A$)
– $\varphi = S^c$: the null event (it contains no sample point, since $S$ contains all the points)
– if $A \cap B = \varphi$, then $A$ and $B$ are said to be mutually exclusive (or disjoint)

• An exhaustive set of events is a set of events whose union forms the sample space $S$.
• A set of mutually exclusive, exhaustive events $A_1, A_2, \ldots, A_n$ has the properties
$A_i \cap A_j = \varphi$ for all $i \ne j$, and $A_1 \cup A_2 \cup \cdots \cup A_n = S$.
• The triplet $(S, \mathcal{E}, P)$, where $\mathcal{E}$ denotes the family of events, along with Axioms (II.1)-(II.3), forms a probability system.

• Conditional probability:
$P[A \mid B] = \frac{P[A \cap B]}{P[B]}, \qquad P[B] > 0$
• The event $B$ forces us to restrict attention from the original sample space $S$ to a new sample space defined by the event $B$, since $B$ must now have a total probability of unity. We magnify the probabilities associated with conditional events by dividing by the term $P[B]$.

• Two events $A$ and $B$ are said to be statistically independent if and only if $P[A \cap B] = P[A]\,P[B]$.
• If $A$ and $B$ are independent, then $P[A \mid B] = P[A]$.

• Theorem of total probability:
$P[B] = \sum_{i=1}^{n} P[A_i \cap B]$
If the event $B$ is to occur, it must occur in conjunction with exactly one of the mutually exclusive, exhaustive events $A_i$.
• The second important form of the theorem of total probability:
$P[B] = \sum_{i=1}^{n} P[B \mid A_i]\, P[A_i]$
• Instead of calculating the probability of some complex event $B$ directly, we calculate the probability of its occurrence jointly with each of the mutually exclusive events $A_i$.

• Bayes' theorem:
$P[A_i \mid B] = \frac{P[B \mid A_i]\, P[A_i]}{\sum_{j=1}^{n} P[B \mid A_j]\, P[A_j]}$
where $\{A_j\}$ is a set of mutually exclusive, exhaustive events. (A numerical sketch and a worked example follow.)
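A minimal Python sketch checking the total-probability and Bayes formulas on a three-event partition; the probabilities used are illustrative, not from the text:

```python
# Sketch: total probability and Bayes' theorem over a partition {A1, A2, A3}.
# The numbers are illustrative; any P[A_i] summing to 1 works.
p_A = [0.5, 0.3, 0.2]          # P[A_i], mutually exclusive and exhaustive
p_B_given_A = [0.9, 0.4, 0.1]  # P[B | A_i]

# Theorem of total probability: P[B] = sum_i P[B|A_i] P[A_i]
p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))

# Bayes' theorem: P[A_i | B] = P[B|A_i] P[A_i] / P[B]
p_A_given_B = [pb * pa / p_B for pb, pa in zip(p_B_given_A, p_A)]

print(f"P[B] = {p_B:.3f}")
print("P[A_i | B] =", [f"{p:.3f}" for p in p_A_given_B])
assert abs(sum(p_A_given_B) - 1.0) < 1e-12  # posteriors form a distribution
```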
• Example: you have just entered a casino and gamble with one of two twin brothers; one is honest and the other is not, and each is equally likely to be your opponent. You know that you lose with probability 1/2 if you play with the honest brother (H) and with probability $p$ if you play with the cheating brother (C). The question is: given that you lost (L), what is the probability that you were playing with the cheating brother? By Bayes' theorem,
$P[C \mid L] = \frac{P[L \mid C]\, P[C]}{P[L \mid C]\, P[C] + P[L \mid H]\, P[H]} = \frac{p \cdot \frac{1}{2}}{p \cdot \frac{1}{2} + \frac{1}{2} \cdot \frac{1}{2}} = \frac{2p}{2p+1}$

II.2 Random variables

• A random variable is a variable whose value depends upon the outcome of a random experiment.
• To each outcome $\omega$ we associate a real number, which is in fact the value the random variable takes on that outcome.
• A random variable is thus a mapping $X(\omega)$ from the points of the sample space into the (real) line.

• Example: if we win the game we win 5 dollars, if we lose we "win" -5 dollars, and if we draw we win 0 dollars, with $P[W] = 3/8$, $P[L] = 3/8$, $P[D] = 1/4$.
Notation: $[X = x] = \{\omega : X(\omega) = x\}$, and $P[X = x]$ is the probability that $X(\omega)$ is equal to $x$. Here
$P[X = -5] = 3/8, \qquad P[X = 0] = 1/4, \qquad P[X = 5] = 3/8$

• Probability distribution function (PDF), also known as the cumulative distribution function:
$F_X(x) = P[X \le x]$, where $[X \le x] = \{\omega : X(\omega) \le x\}$
Properties:
$F_X(x) \ge 0$
$F_X(\infty) = 1$ and $F_X(-\infty) = 0$
$F_X(b) - F_X(a) = P[a < X \le b]$ for $a \le b$, hence $F_X(b) \ge F_X(a)$ for $a \le b$ (nondecreasing)
[Figure: staircase PDF for the example, with $F_X(x) = 0$ for $x < -5$, $3/8$ for $-5 \le x < 0$, $5/8$ for $0 \le x < 5$, and $1$ for $x \ge 5$.]
For instance, $P[-2 < X \le 6] = 5/8$ and $P[1 < X \le 4] = 0$.
At points of discontinuity the PDF takes on the upper value.

• Probability density function (pdf):
$f_X(x) = \frac{dF_X(x)}{dx}, \qquad F_X(x) = \int_{-\infty}^{x} f_X(y)\, dy$
We have $f_X(x) \ge 0$, and since $F_X(\infty) = 1$, $\int_{-\infty}^{\infty} f_X(x)\, dx = 1$.
• The pdf integrated over an interval gives the probability that the random variable $X$ lies in that interval:
$P[a < X \le b] = \int_a^b f_X(x)\, dx$

• Exponentially distributed random variable:
PDF: $F_X(x) = 1 - e^{-\lambda x}$ for $x \ge 0$ (and $0$ for $x < 0$)
pdf: $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (and $0$ for $x < 0$)
$P[a < X \le b] = F_X(b) - F_X(a) = e^{-\lambda a} - e^{-\lambda b} = \int_a^b f_X(x)\, dx$
(A numerical check appears at the end of this subsection.)

• Impulse functions let the pdf represent discontinuous PDFs. For the gambling example above,
$f_X(x) = \tfrac{3}{8}\,\delta(x+5) + \tfrac{1}{4}\,\delta(x) + \tfrac{3}{8}\,\delta(x-5)$

– Functions of more than one variable (joint distribution and density):
$F_{XY}(x, y) = P[X \le x,\, Y \le y], \qquad f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y}$
– "Marginal" density function:
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy$
– Two random variables $X$ and $Y$ are said to be independent if and only if $f_{XY}(x, y) = f_X(x)\, f_Y(y)$; for $n$ independent variables,
$f_{X_1 X_2 \cdots X_n}(x_1, x_2, \ldots, x_n) = f_{X_1}(x_1)\, f_{X_2}(x_2) \cdots f_{X_n}(x_n)$
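A small Monte Carlo sketch of the interval-probability identity for the exponential variable above; the rate and interval endpoints are illustrative choices:

```python
import numpy as np

# Sketch: check P[a < X <= b] = exp(-lam*a) - exp(-lam*b) for an
# exponential random variable by Monte Carlo (lam, a, b are illustrative).
rng = np.random.default_rng(0)
lam, a, b = 2.0, 0.5, 1.5

samples = rng.exponential(scale=1.0 / lam, size=1_000_000)
empirical = np.mean((samples > a) & (samples <= b))
exact = np.exp(-lam * a) - np.exp(-lam * b)

print(f"empirical = {empirical:.4f}, exact = {exact:.4f}")
```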
• We can also define conditional distributions and densities:
$f_{X \mid Y}(x \mid y) = \frac{d}{dx} P[X \le x \mid Y = y] = \frac{f_{XY}(x, y)}{f_Y(y)}$

• Function of one random variable: $Y = g(X)$, meaning $Y(\omega) = g(X(\omega))$.
• Given the random variable $X$ and its PDF, one should be able to calculate the PDF of the variable $Y$:
$F_Y(y) = P[Y \le y] = P[\{\omega : g(X(\omega)) \le y\}]$
• An important general case is the sum of independent random variables, $Y = \sum_{i=1}^{n} X_i$. For $n = 2$, $Y = X_1 + X_2$:
$F_Y(y) = P[Y \le y] = P[X_1 + X_2 \le y] = \int_{-\infty}^{\infty} \int_{-\infty}^{y - x_2} f_{X_1 X_2}(x_1, x_2)\, dx_1\, dx_2$
Due to the independence of $X_1$ and $X_2$ we then obtain the PDF for $Y$ as
$F_Y(y) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{y - x_2} f_{X_1}(x_1)\, dx_1 \right] f_{X_2}(x_2)\, dx_2 = \int_{-\infty}^{\infty} F_{X_1}(y - x_2)\, f_{X_2}(x_2)\, dx_2$
and, differentiating, the pdf
$f_Y(y) = \int_{-\infty}^{\infty} f_{X_1}(y - x_2)\, f_{X_2}(x_2)\, dx_2$
that is, the convolution $f_Y(y) = f_{X_1}(y) \ast f_{X_2}(y)$; in general,
$f_Y(y) = f_{X_1}(y) \ast f_{X_2}(y) \ast \cdots \ast f_{X_n}(y)$
(A numerical sketch of this convolution appears at the end of this section.)

II.3 Expectation

• Stieltjes integrals deal with discontinuities and impulses. Let $F(x)$ be a nondecreasing function and $\phi(x)$ a continuous function, and let $\{t_k\}$ and $\{\tau_k\}$ be two sets of points such that $t_{k-1} \le \tau_k \le t_k$. If the limit exists as $\max_k |t_k - t_{k-1}| \to 0$, the Stieltjes integral is
$\int_{-\infty}^{\infty} \phi(x)\, dF(x) = \lim \sum_k \phi(\tau_k)\, [F(t_k) - F(t_{k-1})]$
With PDF $F(x)$ and pdf $f(x) = \frac{dF(x)}{dx}$, we have $dF(x) = f(x)\, dx$.
• The Stieltjes integral will always exist, and therefore it avoids the issue of impulses; if impulses are not permitted, the pdf may not exist.
• When impulses are permitted we have
$\int_{-\infty}^{\infty} \phi(x)\, dF(x) = \int_{-\infty}^{\infty} \phi(x)\, f(x)\, dx$

• The expectation of a real random variable $X$ is
$E[X] = \bar{X} = \int_{-\infty}^{\infty} x\, dF_X(x) = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$
• The mean or average value of $X$ can also be written
$E[X] = \int_0^{\infty} [1 - F_X(x)]\, dx - \int_{-\infty}^{0} F_X(x)\, dx$
• Expected value of a random function $Y = g(X)$:
$E_Y[Y] = \int_{-\infty}^{\infty} y\, f_Y(y)\, dy = E_X[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx$

• Expectation of the sum of two random variables:
$E[X+Y] = \iint (x+y)\, f_{XY}(x,y)\, dx\, dy = \iint x\, f_{XY}(x,y)\, dx\, dy + \iint y\, f_{XY}(x,y)\, dx\, dy = \int x\, f_X(x)\, dx + \int y\, f_Y(y)\, dy = E[X] + E[Y]$
• The expectation of the sum of two random variables is always equal to the sum of the expectations of each variable. This is true even if the variables are dependent.
• The expectation operator is a linear operator:
$E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]$
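The convolution rule for the sum of independent variables can be checked numerically. A sketch, assuming two independent exponential variables with the same (illustrative) rate, whose sum has the known Erlang-2 density $\lambda^2 y\, e^{-\lambda y}$:

```python
import numpy as np

# Sketch: the density of Y = X1 + X2 for independent X1, X2 is the
# convolution f_X1 * f_X2. Here both are exponential(lam); the exact
# answer is the Erlang-2 density lam^2 * y * exp(-lam*y).
lam, dx = 1.5, 0.01
x = np.arange(0.0, 15.0, dx)
f = lam * np.exp(-lam * x)               # exponential pdf on a grid

f_Y = np.convolve(f, f)[: len(x)] * dx   # numerical convolution
exact = lam**2 * x * np.exp(-lam * x)    # Erlang-2 pdf

# Discrepancy is only the O(dx) discretization error of the grid.
print("max abs error:", np.max(np.abs(f_Y - exact)))
```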
• Expectation of a product: if $X$ and $Y$ are independent,
$E[XY] = \iint xy\, f_{XY}(x,y)\, dx\, dy = \iint xy\, f_X(x)\, f_Y(y)\, dx\, dy = E[X]\, E[Y]$
– The expected value of the product of random variables is equal to the product of the expected values if the variables are independent.
– The same holds for the expected value of a product of functions of independent random variables:
$E[g(X)\, h(Y)] = E[g(X)]\, E[h(Y)]$

– $n$th moment:
$E[X^n] = \overline{X^n} = \int_{-\infty}^{\infty} x^n\, f_X(x)\, dx$
– $n$th central moment:
$\overline{(X - \bar{X})^n} = \int_{-\infty}^{\infty} (x - \bar{X})^n\, f_X(x)\, dx$
– The $n$th central moment can be expressed as a function of the first $n$ moments, by the binomial expansion:
$\overline{(X - \bar{X})^n} = \sum_{k=0}^{n} \binom{n}{k}\, \overline{X^k}\, (-\bar{X})^{n-k}$
– First central moment: $\overline{(X - \bar{X})} = \bar{X} - \bar{X} = 0$
– Second central moment, the variance:
$\sigma_X^2 = \overline{(X - \bar{X})^2} = \overline{X^2} - (\bar{X})^2$
– Standard deviation (square root of the second central moment): $\sigma_X = \sqrt{\sigma_X^2}$
– Coefficient of variation: $C_X = \frac{\sigma_X}{\bar{X}}$

• Covariance of two random variables $X_1$ and $X_2$:
$\mathrm{Cov}(X_1, X_2) = E[(X_1 - E[X_1])(X_2 - E[X_2])]$
$\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 2\,\mathrm{Cov}(X_1, X_2)$
$\mathrm{Corr}(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sigma_1\, \sigma_2}$

Normal distribution
• Notation: $X \sim \mathrm{Nor}(\mu, \sigma^2)$; range $-\infty < X < \infty$; scale parameter $\sigma > 0$; shape parameter $\mu$
• Probability density function:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$
[Figure: normal pdfs for $(\mu, \sigma) = (10, 2), (10, 1), (0, 2), (0, 1)$.]
• Expected value: $E(X) = \mu$; variance: $V(X) = \sigma^2$

• Chebyshev inequality:
$P[\,|X - \bar{X}| \ge x\,] \le \frac{\sigma_X^2}{x^2}$
• Strong law of large numbers: for $W_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ we have $\bar{W}_n = \bar{X}$ and $\sigma_{W_n}^2 = \frac{\sigma_X^2}{n}$, and
$\lim_{n \to \infty} W_n = \bar{X}$ (with probability 1)
• Central limit theorem: for $Z_n = \frac{\sum_{i=1}^{n} X_i - n\bar{X}}{\sigma_X \sqrt{n}}$,
$\lim_{n \to \infty} P[Z_n \le x] = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$

Exponential distribution
• Probability density function: $f(x) = \lambda e^{-\lambda x}$; distribution function: $F(x) = 1 - e^{-\lambda x}$
• Models: interarrival times of phone calls, interarrival times of web sessions, durations of the on and off periods in voice models

Heavy-tailed distributions
• $P[Z > x] \sim c\, x^{-\alpha}$ as $x \to \infty$, with $0 < \alpha < 2$
• Hyperbolic decay; infinite variance for $0 < \alpha < 2$; unbounded mean for $0 < \alpha \le 1$; in the networking context typically $1 < \alpha < 2$

Pareto distribution
• Notation: $X \sim \mathrm{Par}(\alpha, k)$; range $X \ge k$; scale parameter $k > 0$; shape parameter $\alpha > 0$
• Distribution function: $F(x) = 1 - \left(\frac{k}{x}\right)^{\alpha}$
• Probability density function: $f(x) = \alpha\, k^{\alpha}\, x^{-(\alpha + 1)}$
[Figure: Pareto pdfs for several scale and shape values.]
• Expected value: $E(X) = \frac{\alpha k}{\alpha - 1}$ for $\alpha > 1$
• Uncentered moments: $\mu'_j = E[X^j] = \frac{\alpha\, k^j}{\alpha - j}$ for $\alpha > j$
• Models: distribution of file sizes in Unix systems, durations of the on and off periods in data models (individual Ethernet users)

Weibull distribution
• Notation: $X \sim \mathrm{Wei}(b, c)$; range $X \ge 0$; scale parameter $b > 0$; shape parameter $c > 0$
• Probability density function:
$f(x) = \frac{c\, x^{c-1}}{b^c}\, e^{-(x/b)^c}$
• Distribution function:
$F(x) = 1 - e^{-(x/b)^c}$
[Figure: Weibull pdfs for several $(b, c)$ pairs.]
• Uncentered moments (checked numerically in the sketch after this subsection):
$\mu'_j = E[X^j] = b^j\, \Gamma\!\left(\frac{c + j}{c}\right) = b^j\, \Gamma\!\left(1 + \frac{j}{c}\right)$
• Expected value:
$E(X) = b\, \Gamma\!\left(1 + \frac{1}{c}\right) = \frac{b}{c}\, \Gamma\!\left(\frac{1}{c}\right)$
• Variance:
$V(X) = b^2 \left[\Gamma\!\left(1 + \frac{2}{c}\right) - \Gamma^2\!\left(1 + \frac{1}{c}\right)\right]$

Lognormal distribution
• Notation: $X \sim \mathrm{Logn}(\mu, \sigma^2)$; range $X > 0$; scale parameter $\mu$ (equivalently $m = e^{\mu} > 0$); shape parameter $\sigma > 0$ (equivalently $w = e^{\sigma^2} > 0$)
• Probability density function:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right]$
• Expected value:
$E(X) = e^{\mu + \sigma^2/2} = m\sqrt{w}$
• Variance:
$V(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = m^2\, w\, (w - 1)$
[Figure: lognormal pdfs for $\mu \in \{0, 1\}$ and $\sigma \in \{0.1, 0.5, 0.7, 1\}$.]
• Arises from multiplicative effects
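A Monte Carlo sketch of the Weibull moment formula $\mu'_j = b^j\, \Gamma(1 + j/c)$; the parameter values are illustrative:

```python
import math
import numpy as np

# Sketch: check the Weibull moment formula E[X^j] = b^j * Gamma(1 + j/c)
# by Monte Carlo (b, c, j are illustrative).
b, c, j = 2.0, 1.5, 2
rng = np.random.default_rng(1)

# numpy's weibull draws the unit-scale, shape-c form; multiply by b.
samples = b * rng.weibull(c, size=1_000_000)
empirical = np.mean(samples**j)
exact = b**j * math.gamma(1 + j / c)

print(f"empirical E[X^{j}] = {empirical:.4f}, exact = {exact:.4f}")
```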
II.4 Transforms, generating functions, and characteristic functions

• The characteristic function of a random variable $X$ is
$\phi_X(u) = E[e^{juX}] = \int_{-\infty}^{\infty} e^{jux}\, f_X(x)\, dx$
where $j = \sqrt{-1}$ and $u$ is a real variable.
– Since $|e^{jux}| = 1$,
$|\phi_X(u)| \le \int_{-\infty}^{\infty} |e^{jux}|\, f_X(x)\, dx = \int_{-\infty}^{\infty} f_X(x)\, dx = 1$
so $|\phi_X(u)| \le 1$.
– Expanding $e^{jux}$ and integrating term by term:
$\phi_X(u) = \int_{-\infty}^{\infty} f_X(x) \left[1 + jux + \frac{(jux)^2}{2!} + \cdots \right] dx = 1 + ju\bar{X} + \frac{(ju)^2}{2!}\,\overline{X^2} + \cdots$
$\phi_X(0) = 1, \qquad \left. \frac{d^n \phi_X(u)}{du^n} \right|_{u=0} = j^n\, \overline{X^n}$
– Notation: $g^{(n)}(x_0) = \left. \frac{d^n g(x)}{dx^n} \right|_{x = x_0}$, so the moment relation reads $\phi_X^{(n)}(0) = j^n\, \overline{X^n}$.

– Moment generating function:
$M_X(v) = E[e^{vX}] = \int_{-\infty}^{\infty} e^{vx}\, f_X(x)\, dx, \qquad M_X^{(n)}(0) = \overline{X^n}$

• Laplace transform of the pdf
– Notation: $A(x) = P[X \le x]$ is the PDF, $a(x)$ the pdf, and $A^*(s)$ the transform:
$A^*(s) = E[e^{-sX}] = \int_{0}^{\infty} e^{-sx}\, a(x)\, dx, \qquad A^{*(n)}(0) = (-1)^n\, \overline{X^n}$
– The three transforms are related by $\phi_X(js) = M_X(-s) = A^*(s)$, and the moments follow from any of them:
$\overline{X^n} = j^{-n}\, \phi_X^{(n)}(0) = M_X^{(n)}(0) = (-1)^n\, A^{*(n)}(0)$

– Example: for the exponential pdf
$a(x) = f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (and $0$ for $x < 0$),
$\phi_X(u) = \frac{\lambda}{\lambda - ju}, \qquad M_X(v) = \frac{\lambda}{\lambda - v}, \qquad A^*(s) = \frac{\lambda}{\lambda + s}$
$\phi_X(0) = M_X(0) = A^*(0) = 1, \qquad \bar{X} = \frac{1}{\lambda}, \qquad \overline{X^2} = \frac{2}{\lambda^2}$

– Probability generating function (discrete variable):
$G(z) = E[z^X] = \sum_k z^k g_k$, where $g_k = P[X = k]$
$G(1) = 1, \qquad G^{(1)}(1) = \bar{X}, \qquad G^{(2)}(1) = \overline{X^2} - \bar{X}$

– Sum of $n$ independent variables $Y = \sum_{i=1}^{n} X_i$:
$\phi_Y(u) = E[e^{juY}] = E\left[e^{ju \sum_{i=1}^{n} X_i}\right] = E[e^{juX_1} e^{juX_2} \cdots e^{juX_n}] = E[e^{juX_1}]\, E[e^{juX_2}] \cdots E[e^{juX_n}]$
$\phi_Y(u) = \phi_{X_1}(u)\, \phi_{X_2}(u) \cdots \phi_{X_n}(u)$
– If the $X_i$ are also identically distributed:
$\phi_Y(u) = [\phi_X(u)]^n$

– Variance of a sum of independent variables, $Y = X_1 + X_2$ (the case $n = 2$):
$\overline{Y^2} = \overline{(X_1 + X_2)^2} = \overline{X_1^2} + 2\,\overline{X_1 X_2} + \overline{X_2^2}$
$(\bar{Y})^2 = (\bar{X}_1 + \bar{X}_2)^2 = (\bar{X}_1)^2 + 2\,\bar{X}_1 \bar{X}_2 + (\bar{X}_2)^2$
$\sigma_Y^2 = \overline{Y^2} - (\bar{Y})^2 = \overline{X_1^2} - (\bar{X}_1)^2 + \overline{X_2^2} - (\bar{X}_2)^2 + 2\left(\overline{X_1 X_2} - \bar{X}_1 \bar{X}_2\right) = \sigma_{X_1}^2 + \sigma_{X_2}^2 + 2\left(\overline{X_1 X_2} - \bar{X}_1 \bar{X}_2\right)$
– With $X_1$ and $X_2$ independent, $\overline{X_1 X_2} = \bar{X}_1 \bar{X}_2$, so
$\sigma_Y^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2$
– The variance of the sum of independent random variables is equal to the sum of the variances:
$\sigma_Y^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + \cdots + \sigma_{X_n}^2$

– Random sum of independent variables (the number of summands is itself a random variable): $Y = \sum_{i=1}^{N} X_i$, where
– $N$ is a random variable with mean $\bar{N}$ and variance $\sigma_N^2$;
– the $X_i$ are independent and identically distributed;
– $N$ and the $X_i$ are independent.
– $F_Y(y)$ is then a compound distribution. Conditioning on $N = n$:
$Y^*(s) = E\left[e^{-s \sum_{i=1}^{N} X_i}\right] = \sum_{n=0}^{\infty} E\left[e^{-s \sum_{i=1}^{n} X_i}\right] P[N = n] = \sum_{n=0}^{\infty} E[e^{-sX_1}] \cdots E[e^{-sX_n}]\, P[N = n]$
– Since the $X_i$ are identically distributed:
$Y^*(s) = \sum_{n=0}^{\infty} [X^*(s)]^n\, P[N = n]$
– In terms of the z-transform (probability generating function) $N(z)$ of $N$:
$Y^*(s) = N(X^*(s))$
– The first two moments follow (checked numerically in the sketch at the end of this appendix):
$\bar{Y} = \bar{N}\, \bar{X}, \qquad \sigma_Y^2 = \bar{N}\, \sigma_X^2 + (\bar{X})^2\, \sigma_N^2$

II.6 Stochastic processes

• To each point $\omega$ of the sample space $S$ a time function $x(t, \omega)$ is associated; this family of time functions is a stochastic process.
• PDF:
$F_X(x; t) = P[X(t) \le x]$
and, for $n$ time instants,
$F_{X_1 \cdots X_n}(x_1, \ldots, x_n; t_1, \ldots, t_n) = P[X(t_1) \le x_1, \ldots, X(t_n) \le x_n]$
• pdf:
$f_X(x; t) = \frac{\partial F_X(x; t)}{\partial x}$
• Mean:
$\bar{X}(t) = E[X(t)] = \int_{-\infty}^{\infty} x\, f_X(x; t)\, dx$
– Autocorrelation:
$R_{XX}(t_1, t_2) = E[X(t_1)\, X(t_2)] = \iint x_1 x_2\, f_{X_1 X_2}(x_1, x_2; t_1, t_2)\, dx_1\, dx_2$
– Wide-sense stationary process: $\bar{X}(t) = \bar{X}$ (a constant) and $R_{XX}(t_1, t_2) = R_{XX}(t_2 - t_1)$.
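Finally, a Monte Carlo sketch of the compound-sum moment formulas $\bar{Y} = \bar{N}\bar{X}$ and $\sigma_Y^2 = \bar{N}\sigma_X^2 + (\bar{X})^2 \sigma_N^2$, assuming (for illustration only) Poisson $N$ and exponential $X_i$:

```python
import numpy as np

# Sketch: Monte Carlo check of the compound-sum moments
#   E[Y] = E[N] E[X],  Var[Y] = E[N] Var[X] + (E[X])^2 Var[N]
# with N ~ Poisson(mu) and X_i ~ exponential(lam); values illustrative.
rng = np.random.default_rng(2)
mu, lam, trials = 4.0, 1.5, 100_000

n = rng.poisson(mu, size=trials)
y = np.array([rng.exponential(1.0 / lam, size=k).sum() for k in n])

mean_exact = mu * (1.0 / lam)                  # E[N] E[X]
var_exact = mu / lam**2 + (1.0 / lam)**2 * mu  # Poisson: Var[N] = E[N]

print(f"mean: {y.mean():.3f} vs {mean_exact:.3f}")
print(f"var : {y.var():.3f} vs {var_exact:.3f}")
```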