ELEMENTS OF PROBABILITY … A. N. Kolmogorov (1933) … the probability space $(\Omega, \mathcal{B}, P)$
Carlos Mana, Benasque 2014, Astrofísica y Física de Partículas, CIEMAT

1) Sample Space ($\Omega$)

Event: object of the questions we ask about the result of the experiment, such that the possible answers are: "it occurs" or "it does not occur". Elementary events = those that cannot be decomposed into others of lesser entity.

Sample Space $\Omega$ = {set of all the possible elementary results of a random experiment}. The elementary events have to be:
exclusive: if one happens, no other occurs ($e_k \cap e_j = \emptyset$ for $k \neq j$);
exhaustive: any possible elementary result has to be included in $\Omega$;
so $\{e_k\}$ is a partition of $\Omega$.
Sure event: to get any result contained in $\Omega$. Impossible event: to get a result that is not contained in $\Omega$. Random event: any event that is neither impossible nor sure.

$\dim(\Omega)$ can be:
finite: drawing a die, $\Omega = \{e_1, e_2, e_3, e_4, e_5, e_6\}$;
denumerable: throw a coin and stop when we get heads, $\Omega = \{c, xc, xxc, xxxc, \ldots\}$;
non-denumerable: decay time of a particle, $t \in \mathbb{R}^+$.

2) Measurable Space $(\Omega, \mathcal{B})$: Sample Space + algebra (Boole) $\mathcal{B}$

Why an algebra? We are interested in a class of events that:
1) contains all possible results of the experiment in which we are interested;
2) is closed under union and complementation:
$A_1, A_2 \in \mathcal{B} \;\Rightarrow\; \bar{A}_1 \in \mathcal{B},\; A_1 \cup A_2 \in \mathcal{B}$
and therefore also $\Omega \in \mathcal{B}$, $\emptyset \in \mathcal{B}$, $A_1 \cap A_2 \in \mathcal{B}$, $A_1 \setminus A_2 \in \mathcal{B}$, …
So now: 1) $\mathcal{B}$ has all the elementary events (to which we shall assign probabilities); 2) $\mathcal{B}$ has all the events we are interested in (whose probabilities we deduce).

EXAMPLE: $\Omega = \{e_1, e_2, e_3, e_4, e_5, e_6\}$. Several possible algebras:
Minimal: $\mathcal{B}_{\min} = \{\Omega, \emptyset\}$.
Interest in evenness: A = {get an even number}, Ā = {get an odd number}: $\mathcal{B} = \{\Omega, \emptyset, A, \bar{A}\}$.
Maximal: $\mathcal{B}_{\max}$ = all possible subsets of $\Omega$; with $\dim(\Omega) = n$ there are $\binom{n}{k}$ subsets with k elements, hence $\sum_{k=0}^{n}\binom{n}{k} = 2^n$ elements in all.

For $\dim(\Omega)$ finite, $\mathcal{B}$ has the structure of a Boole algebra. For $\dim(\Omega)$ denumerable, generalize the Boole algebra such that union and complementation can be performed a denumerable number of times, resulting in events of the same class (closed):
$\{A_i\}_{i \ge 1} \subset \mathcal{B} \;\Rightarrow\; \bigcup_{i \ge 1} A_i \in \mathcal{B}$, and $A \in \mathcal{B} \Rightarrow A^c \in \mathcal{B}$.
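As a concrete check of the closure requirements, the "interest in evenness" class of the example can be verified to be an algebra with a few lines of Python (an illustrative sketch, not part of the original material):

```python
# Sketch: verify that B = {empty, Omega, A, A-bar} from the
# "interest in evenness" example is closed under complementation
# and (pairwise) union, i.e. that it is an algebra of events.
from itertools import combinations

omega = frozenset({1, 2, 3, 4, 5, 6})   # die faces e1..e6
A = frozenset({2, 4, 6})                # A = "get an even number"
B = {frozenset(), omega, A, omega - A}  # candidate algebra

closed_under_complement = all(omega - s in B for s in B)
closed_under_union = all(s | t in B for s, t in combinations(B, 2))
print(closed_under_complement and closed_under_union)  # True
```

Replacing the fourth element by an unrelated set, e.g. {1, 2}, makes the check fail already at complementation, since {3, 4, 5, 6} is missing from the class.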
Thus $\mathcal{B}$ has the structure of a σ-algebra. Remember that: 1) all σ-algebras are Boole algebras; 2) not all Boole algebras are σ-algebras.

$\dim(\Omega)$ non-denumerable: in general, in non-denumerable topological spaces there are subsets that cannot be considered as events. Which are the "elementary" events? We are mainly interested in $\mathbb{R}^n$. $\mathbb{R}$ is a linear set of points; among its possible subsets are the intervals, and the points (degenerate intervals): $\{a\} = [a, a]$, $a \in \mathbb{R}$. Any interval, denumerable or not, is a subset of $\mathbb{R}$ …but… a finite or denumerable intersection of intervals is an interval, while a finite or denumerable union of intervals is not, in general, an interval.

Generate a σ-algebra, for instance, from the intervals half-open on the right:
1) Initial set: all half-open intervals $[a, b)$;
2) Form the class A by adding their denumerable unions and complements; then
$(a, b) = \bigcup_{n \ge 1}\left[a + \tfrac{1}{n},\, b\right)$, $\quad [a, b] = \bigcap_{n \ge 1}\left[a,\, b + \tfrac{1}{n}\right)$, $\quad \{a\} = [a, a] = \bigcap_{n \ge 1}\left[a,\, a + \tfrac{1}{n}\right)$:
A has all intervals and points (degenerate intervals).
3.1) There is at least one σ-algebra containing A (add sets to close under denumerable union and complementation);
3.2) The intersection of any number of σ-algebras is a σ-algebra;
⟹ there exists a smallest σ-algebra containing A.

Borel σ-algebra ($\mathcal{B}_{\mathbb{R}}$): the minimum σ-algebra of subsets of $\mathbb{R}$ generated by the intervals $[a, b)$ (may do with $(a, b]$, $(a, b)$, $[a, b]$ as well). Its elements are the Borel sets (borelians); in particular $\mathbb{N}, \mathbb{Z}, \mathbb{Q} \in \mathcal{B}_{\mathbb{R}}$.

3) Measure Space $(\Omega, \mathcal{B}, \mu)$
3.1) Measure of the sets of $\mathcal{B}$: a set function $\mu: A \in \mathcal{B} \to \mu(A) \in \mathbb{R}$ that is
i) σ-additive: for any succession of disjoint sets $A_1, A_2, \ldots$ ($A_i \cap A_j = \emptyset$, $i \neq j$), $\mu\!\left(\bigcup_i A_i\right) = \sum_i \mu(A_i)$;
ii) non-negative: $\mu(A) \ge 0$ for all $A \in \mathcal{B}$.
► Measure Space.
3.2) Probability Measure (notation P):
$P: A \in \mathcal{B} \to P(A) \in [0, 1] \subset \mathbb{R}$ (univocal), with $P(\Omega) = 1$ (certainty); hence $P(A^c) = 1 - P(A)$. ► Probability Space $(\Omega, \mathcal{B}, P)$.

Remember that: any bounded measure can be converted into a probability measure; all Borel sets of $\mathbb{R}^n$ are Lebesgue measurable; there are non-denumerable subsets of $\mathbb{R}$ with zero Lebesgue measure (Cantor ternary set); with the axiom of choice, not all subsets of $\mathbb{R}$ are measurable (Solovay 1970), and not all Lebesgue measurable sets are Borel (e.g. Vitali's set).

RANDOM VARIABLES
The results of the experiment are not necessarily numeric. Associate to each elementary event of the sample space Ω one, and only one, real number through a function (unfortunately called a "random variable"):
$X(w): w \in \Omega \to X(w) \in \mathbb{R}$
Induced space: $X: (\Omega, \mathcal{B}) \to (\Omega_I, \mathcal{B}_I)$ and $(\Omega, \mathcal{B}, P) \to (\Omega_I, \mathcal{B}_I, P_I)$, with $X^{-1}(A) \in \mathcal{B}$ for all $A \in \mathcal{B}_I$. Usually $(\Omega_I, \mathcal{B}_I) = (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, and we ask for $P(X = k)$ or $P(X \in (a, b))$.
To keep the structure of the σ-algebra it is necessary that X(w) be Lebesgue (… Borel) measurable: $X^{-1}(A) \in \mathcal{B}$ for all $A \in \mathcal{B}_{\mathbb{R}}$ (a function $f(w): \Omega \to \Delta$ is Borel measurable if it is measurable with respect to the σ-algebra associated to Ω).
A "random variable" is neither variable nor random: what is random is the outcome of the experiment before it is done, our knowledge of the result before observation.

EXAMPLE: $\Omega = \{e_1, \ldots, e_6\}$, interest in evenness: A = {get an even number}, Ā = {get an odd number}, $\mathcal{B} = \{\Omega, \emptyset, A, \bar A\}$. Consider $X(w): w \to X(w) \in \mathbb{R}$:
1) $X(e_1) = X(e_2) = X(e_3) = -1$, $X(e_4) = X(e_5) = X(e_6) = +1$: then $X^{-1}((-\infty, a])$ is $\emptyset \in \mathcal{B}$ for $a < -1$, $\{e_1, e_2, e_3\} \notin \mathcal{B}$ for $-1 \le a < 1$, and $\Omega \in \mathcal{B}$ for $a \ge 1$: X is not measurable with respect to $\mathcal{B}$.
2) $X(e_2) = X(e_4) = X(e_6) = -1$, $X(e_1) = X(e_3) = X(e_5) = +1$: then $X^{-1}((-\infty, a])$ is $\emptyset \in \mathcal{B}$, $\{e_2, e_4, e_6\} = A \in \mathcal{B}$, or $\Omega \in \mathcal{B}$: X is measurable.

TYPES OF RANDOM QUANTITIES, according to the range of X(w)
$X(w): (\Omega, \mathcal{B}, Q) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P)$; range a finite or denumerable set: Discrete; range a non-denumerable set: Continuous.
► simple random quantity: $\{A_k\}_{k=1}^{n}$ a finite partition of Ω, simple function $X(\omega) = \sum_{k=1}^{n} x_k 1_{A_k}(\omega)$, $x_k \in \mathbb{R}$, with $P(X = x_k) = P(A_k) = \int_{\Omega} 1_{A_k}(\omega)\,dQ(\omega)$.
► elementary random quantity: $\{A_k\}_{k \ge 1}$ a countable partition of Ω, elementary function $X(\omega) = \sum_{k \ge 1} x_k 1_{A_k}(\omega)$.
► $X(\omega)$ absolutely continuous: range a non-denumerable set, $P(X \in A) = \int_{\mathbb{R}} 1_A(\omega)\,dP(\omega) = \int_A dP(\omega) = \int_A f(\omega)\,dw$ (Radon-Nikodym Theorem).
► singular random quantities complete the picture (see the Lebesgue decomposition below).

Radon-Nikodym Theorem (1913; 1930): if $\nu, \mu$ are two σ-finite measures over the measurable space $(\Omega, \mathcal{B})$ and $\nu$ is absolutely continuous with respect to $\mu$ (that is, $\mu(A) = 0 \Rightarrow \nu(A) = 0$), then there exists a measurable function $f(w)$ over $\mathcal{B}$ with range in $[0, \infty)$ (non-negative), unique up to a set of $\mu$-measure zero (if $g(w)$ has the same properties as $f(w)$, then $\mu(\{x \mid f(x) \neq g(x)\}) = 0$), such that
$\nu(A) = \int_A f(w)\,d\mu(w) \quad \forall A \in \mathcal{B}$
If $X(w): (\Omega, \mathcal{B}, Q) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P)$ and $\mu$ is the Lebesgue measure, f is the probability density function: $P(X \in A) = \int_A p(x)\,dx$.

Remember that: the set of points of $\mathbb{R}$ with finite probabilities is denumerable. Let $W = \{x \in \mathbb{R} \mid P(x) > 0\}$ and partition it as $W_1 = \{x \mid P(x) > \tfrac12\}$, $W_2 = \{x \mid \tfrac13 < P(x) \le \tfrac12\}$, …, $W_k = \{x \mid \tfrac{1}{k+1} < P(x) \le \tfrac1k\}$, … If $x \in W$, then $x$ belongs to some $W_k$; if $x \in W_k$, then $x \in W$. Each set $W_k$ has, at most, k points, for otherwise k+1 points with $P(x) > \tfrac{1}{k+1}$ would give $\sum_i P(x_i) > 1$. Hence $W = \bigcup_k W_k$ is a denumerable union of finite sets, and therefore denumerable. So if Ω is infinite denumerable it is not possible that all the points have the same probability; and if X is absolutely continuous, $\mu(\{a\}) = 0 \Rightarrow P(X = a) = 0$, but $\{X = a\}$ is not an impossible result.

DISTRIBUTION FUNCTION
General definition: a one-dimensional Distribution Function is a function $F: x \in \mathbb{R} \to F(x) \in \mathbb{R}$ such that:
1) continuous on the right: $\lim_{\epsilon \to 0^+} F(x + \epsilon) = F(x)$ for all $x \in \mathbb{R}$;
2) monotonous non-decreasing: if $x_1, x_2 \in \mathbb{R}$ and $x_1 < x_2$, then $F(x_1) \le F(x_2)$;
3) limits: $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$ and $F(+\infty) = \lim_{x \to +\infty} F(x) = 1$.
Def.- The DF associated to the Random Quantity X is the function
$F(x) = P(X \le x) = P(X \in (-\infty, x]), \quad x \in \mathbb{R}$
The Distribution Function of a Random Quantity has all the information needed to describe the properties of the random process.
Properties (some):
$F(x) = P(X \le x)$; $\quad P(X < x) = F(x^-)$; $\quad P(X > x) = 1 - P(X \le x) = 1 - F(x)$; $\quad P(x_1 < X \le x_2) = P(X \in (x_1, x_2]) = F(x_2) - F(x_1)$.
If X takes values in $[a, b] \subset \mathbb{R}$, the DF is defined for all $x \in \mathbb{R}$: $F(x) = 0$ if $x < a$ and $F(x) = 1$ if $x \ge b$.

The set of points of discontinuity of the DF, $D = \{x \in \mathbb{R} \mid F(x^-) < F(x^+)\}$, is finite or denumerable:
1) F is monotonous non-decreasing, so $F(x^-) \le F(x^+)$;
2) for each $x \in D$ there exists a rational $r(x) \in \mathbb{Q}$ such that $F(x^-) < r(x) < F(x^+)$;
3) for $x_1, x_2 \in D$ with $x_1 < x_2$: $r(x_1) < F(x_1^+) \le F(x_2^-) < r(x_2)$, so the rational $r(x)$ associated to each discontinuity is different: $x \in D \to r(x) \in \mathbb{Q}$ is a one-to-one relation, and D is at most denumerable.
At each point of discontinuity, $P(x^- < X \le x) = F(x) - \lim_{\epsilon \to 0^+} F(x - \epsilon) = P(X = x)$: F(x) has a jump of amplitude P(X = x).

Probability Measure ↔ Distribution Function: for each DF there exists a unique probability measure defined over the Borel sets that assigns the probability $F(x_2) - F(x_1)$ to each half-open interval $(x_1, x_2] \subset \mathbb{R}$. Reciprocally, to each probability measure defined on the measurable space $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ corresponds a DF.

Random Quantity discrete / continuous ↔ Distribution Function step-wise / singular or absolutely continuous.

Discrete Random Quantity: $X(w): (\Omega, \mathcal{B}, Q) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P)$ with range $X \in \{x_1, x_2, \ldots\}$, a finite or denumerable set, taking values with probabilities $p_i = P(X = x_i)$: real, non-negative, and $\sum_k p_k = 1$. DF:
$F(x) = P(X \le x) = \sum_k p_k\, 1_{(-\infty, x]}(x_k)$, with $F(-\infty) = 0$, $F(+\infty) = 1$:
1) step-wise and monotonous non-decreasing;
2) constant everywhere but on the points of discontinuity, where it has a jump $F(x_k) - F(x_k^-) = P(X = x_k) = p_k$.
EXAMPLE: $X \in \{1, 2, \ldots\}$ with $p_k = P(X = k) = 2^{-k}$, so $\sum_i p_i = 1$; the DF is the step function $F(x) = \sum_k p_k 1_{(-\infty, x]}(x_k)$ with jumps $F(x_k) - F(x_k^-) = p_k$ and $F(-\infty) = 0$, $F(+\infty) = 1$.
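As an illustrative sketch (not from the original), the step DF of a discrete example of this kind, taking $p_k = P(X = k) = 2^{-k}$, can be evaluated directly from its definition $F(x) = \sum_k p_k 1_{(-\infty, x]}(x_k)$:

```python
# Sketch: step distribution function of the discrete example
# X in {1, 2, ...} with p_k = P(X = k) = 2**-k (so sum p_k = 1).
def F(x, kmax=60):
    # F(x) = sum of p_k over the points x_k = k with x_k <= x
    return sum(2.0 ** -k for k in range(1, kmax + 1) if k <= x)

print(F(0.5))   # 0.0 : constant below the first jump
print(F(1.0))   # 0.5 : jump of amplitude p_1 = 1/2 at x = 1
print(F(2.5))   # 0.75: p_1 + p_2, constant on [2, 3)
```

Evaluating just below and at each integer shows the right-continuity and the jump amplitudes $p_k$ explicitly.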
Continuous Random Quantity: $X(w): (\Omega, \mathcal{B}, Q) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P)$ with range a non-denumerable set. F(x) is continuous everywhere in $\mathbb{R}$: $F(x^+) = F(x^-) = F(x)$, so $P(X = x) = F(x) - F(x^-) = 0$. If X is absolutely continuous (Radon-Nikodym with respect to the Lebesgue measure), $P(A) = \int_A dP = \int_A p(w)\,dw$, with p(x) the Probability Density Function:
$F(x) = P(X \le x) = \int_{-\infty}^{x} p(u)\,du$
1) $p(x) \ge 0$ for all $x \in \mathbb{R}$;
2) p is bounded in every bounded interval of $\mathbb{R}$, defined uniquely a.e. and Riemann integrable on it;
3) $p(x) = \dfrac{dF(x)}{dx}$ and $\int p(x)\,dx = 1$.
EXAMPLE: $X \sim Cs(0, 1)$: $p(x) = \dfrac{1}{\pi}\dfrac{1}{1 + x^2}$, so $F(x) = \int_{-\infty}^{x} p(u)\,du = \dfrac12 + \dfrac{1}{\pi}\arctan(x)$.

General Distribution Function (Lebesgue Decomposition):
$F(x) = \sum_{i=1}^{N_D} a_i F_i^{D}(x) + \sum_{j=1}^{N_{AC}} b_j F_j^{AC}(x) + \sum_{k=1}^{N_S} c_k F_k^{S}(x)$
Discrete: step function (simple or elementary) with a denumerable number of jumps $P(X = x_n)$, $\sum_n P(X = x_n) = 1$ (Poisson, Binomial, …). Absolutely continuous: $F(x) = \int_{-\infty}^{x} p(u)\,du$ with pdf $p(x) = F'(x)$, $\int p(x)\,dx = 1$ (Normal, Gamma, …). Singular: F(x) continuous with $F'(x) = 0$ almost everywhere (Dirac delta, Cantor, …).

CONDITIONAL PROBABILITY and BAYES THEOREM
Given a probability space $(\Omega, \mathcal{B}, P)$: ● the probability assigned to an event depends on the information we have. Two consecutive extractions without replacement: what is the probability to get a red ball in the second extraction? 1) If I do not know the outcome of the first: P(r) = 1/2. 2) If it was black: P(r) = 2/3. All probabilities are conditional.

Conditional Probability, Statistical Independence: consider $(\Omega, \mathcal{B}, P)$ and two non-disjoint sets $A, B \in \mathcal{B}$, $A \cap B \neq \emptyset$: $P(A) = P(A \cap \Omega) = P(A \cap B) + P(A \cap \bar B) = P(A, B) + P(A, \bar B)$, the probabilities for A to happen together with B and with not-B. What is the probability for A to happen if we know that B has already occurred?
$P(A \mid B) = \dfrac{P(A, B)}{P(B)}, \quad P(B) > 0$
(the constant $C = 1/P(B)$ normalises: $P(B \mid B) = C\,P(B \cap B) = 1$). Notation: $P(A \cap B \cap C \cap \cdots) \equiv P(A, B, C, \ldots)$.

Generalization (chain rule):
$P(A_1, A_2, \ldots, A_n) = P(A_1 \mid A_2, \ldots, A_n)\,P(A_2 \mid A_3, \ldots, A_n)\cdots P(A_n)$
with n! possible arrangements, e.g. also $P(A_1, \ldots, A_n) = P(A_2 \mid A_1, A_3, \ldots, A_n)\,P(A_1 \mid A_3, \ldots, A_n)\cdots P(A_n)$.

Statistical Independence: $P(A \mid B) = P(A)$: A does not depend on B; that B has already happened does not change the probability of occurrence of A. Correlation: positive if $P(A \mid B) > P(A)$; negative if $P(A \mid B) < P(A)$.

Caution! A finite collection of n events $\{A_1, \ldots, A_n\} \subset \mathcal{B}$ is independent iff for EACH subset $\{A_p, \ldots, A_m\}$:
$P(A_p, \ldots, A_m) = P(A_p)\cdots P(A_m)$
that is, $P(A_i, A_j) = P(A_i)P(A_j)$ for all pairs $i \neq j$, $P(A_i, A_j, A_k) = P(A_i)P(A_j)P(A_k)$ for all triplets $i \neq j \neq k$, and so on.

Conditional dependence: $P(A \mid B) = P(A)$ says "A independent of B" … one should say "unconditionally" independent. It may happen that A depends on B through C: $P(A, B) = P(A)P(B)$ but $P(A, B \mid C) \neq P(A \mid C)\,P(B \mid C)$.

Theorem of Total Probability: for a partition $\{B_k\}_{k=1}^{n}$ of the sample space ($\bigcup_{j=1}^{n} B_j = \Omega$, $B_i \cap B_j = \emptyset$ for $i \neq j$):
$P(A) = P(A \cap \Omega) = \sum_{k=1}^{n} P(A \cap B_k) = \sum_{k=1}^{n} P(A, B_k) = \sum_{k=1}^{n} P(A \mid B_k)\,P(B_k)$
Theorem of Total Probability with conditional probabilities: from $P(A, B, C) = P(A \mid B, C)\,P(C \mid B)\,P(B)$ and summing over the partition,
$P(A \mid B) = \sum_{C} P(A \mid C, B)\,P(C \mid B)$

Bayes Theorem. The Reverend Thomas Bayes, F.R.S. (1701?-1761). "LII. An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F. R. S." Dear Sir, Read Dec. 23, 1763. I now send you an essay which I have found among the papers of our deceased friend Mr. Bayes, and which, in my opinion, has great merit, and well deserves to be preserved.
Experimental philosophy, you will find, is nearly interested in the subject of it; and on this account there seems to be particular reason for thinking that a communication of it to the Royal Society cannot be improper. … to find out a method by which we might judge concerning the probability that an event has to happen, in given circumstances, upon supposition that we know nothing concerning it but that, under the same circumstances, it has happened a certain number of times, and failed a certain other number of times. …some rule could be found, according to which we ought to estimate the chance that the probability for the happening of an event perfectly unknown, should lie between any two named degrees of probability, antecedently to any experiments made about it; … Common sense is indeed sufficient to shew us that, from the observation of what has in former instances been the consequence of a certain cause or action, one may make a judgement what is likely to be the consequence of it another time.
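Returning to the "Caution!" note above (joint independence requires the factorisation for every subset, not only for pairs), a minimal sketch with a standard illustration, not taken from the source: two fair coin tosses, with A = {first is heads}, B = {second is heads}, C = {both tosses equal}, are pairwise independent but not jointly independent.

```python
# Sketch: pairwise independent events that are NOT jointly independent.
from itertools import product
from fractions import Fraction

space = list(product("HT", repeat=2))  # four equiprobable outcomes

def P(event):
    return Fraction(sum(1 for w in space if event(w)), len(space))

A = lambda w: w[0] == "H"   # first toss heads
B = lambda w: w[1] == "H"   # second toss heads
C = lambda w: w[0] == w[1]  # both tosses equal

pair_ok = (P(lambda w: A(w) and B(w)) == P(A) * P(B)
           and P(lambda w: A(w) and C(w)) == P(A) * P(C)
           and P(lambda w: B(w) and C(w)) == P(B) * P(C))
triple_ok = P(lambda w: A(w) and B(w) and C(w)) == P(A) * P(B) * P(C)
print(pair_ok, triple_ok)  # True False
```

Each pair factorises (all three probabilities are 1/2 and all three pairwise intersections have probability 1/4), but $P(A, B, C) = 1/4 \neq 1/8$.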
$P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \;\Longrightarrow\; P(B \mid A) = \dfrac{P(A \mid B)\,P(B)}{P(A)}$
For a partition of the sample space $\{H_k\}_{k=1}^{n}$ (causes, hypotheses):
$P(H_i \mid A) = \dfrac{P(A \mid H_i)\,P(H_i)}{P(A)}, \quad i = 1, \ldots, n$
with P(A) the normalisation. $P(A \mid H_i)$: probability of occurrence of event A having occurred the cause (hypothesis) $H_i$. $P(H_i)$: probability of occurrence of the cause $H_i$ "a priori", before we know whether event A has happened. $P(H_i \mid A)$: probability "a posteriori" for $H_i$ to happen having observed the occurrence of the event (effect) A; the probability that $H_i$ be the cause (hypothesis) of the observed effect A.

Other forms of Bayes Theorem: with the Total Probability Theorem,
$P(H_i \mid A) = \dfrac{P(A \mid H_i)\,P(H_i)}{\sum_{k=1}^{n} P(A \mid H_k)\,P(H_k)}$
and adding a general hypothesis $H_0$ (all probabilities are conditional):
$P(H_i \mid A, H_0) = \dfrac{P(A \mid H_i, H_0)\,P(H_i \mid H_0)}{\sum_{k=1}^{n} P(A \mid H_k, H_0)\,P(H_k \mid H_0)}$
… and many more to come …

Marginal and Conditional Densities (Stieltjes-Lebesgue integral):
$F(x_1, x_2) = P(X_1 \le x_1, X_2 \le x_2) = \int_{-\infty}^{x_1} dw_1 \int_{-\infty}^{x_2} p(w_1, w_2)\,dw_2$
Marginals: $p(x_1) = \int p(x_1, x_2)\,dx_2$ and $p(x_2) = \int p(x_1, x_2)\,dx_1$.
Conditionals (Def., for $p(x) > 0$): $p(x_1, x_2) = p(x_2 \mid x_1)\,p(x_1) = p(x_1 \mid x_2)\,p(x_2)$; so $p(x_2 \mid x_1) = p(x_2)$ iff $p(x_1 \mid x_2) = p(x_1)$.

EXAMPLE: CAUSE-EFFECT. A certain disease occurs in 1 out of 1000 individuals, and there is a diagnostic test such that: if a person is sick, it gives positive in 99% of the cases; if a person is healthy, it gives positive in 2% of the cases. A person has given positive in the test: what are the chances that he is sick?
Hypotheses (causes) to analyse, exclusive and exhaustive: $H_1$: is sick; $H_2 = \bar H_1$: is healthy. "A priori" probabilities from the incidence in the population: $P(H_1) = \tfrac{1}{1000}$, $P(H_2) = 1 - P(H_1) = \tfrac{999}{1000}$. Data: with T = {gives positive in the test}, $P(T \mid H_1) = \tfrac{99}{100}$ and $P(T \mid \bar H_1) = \tfrac{2}{100}$. We want $P(H_1 \mid T)$.
Bayes Theorem: $P(H_1 \mid T) = \dfrac{P(T \mid H_1)\,P(H_1)}{P(T)}$, with the Total Probability Theorem giving
$P(T) = \sum_{k=1}^{2} P(T \mid H_k)\,P(H_k) = \dfrac{99}{100}\,\dfrac{1}{1000} + \dfrac{2}{100}\,\dfrac{999}{1000}$
so $P(H_1 \mid T) \simeq 0.047$, even though $P(T \mid H_1) = 0.99$ !!
The test is costly, aggressive, … If a person gives positive, what are the chances that he is healthy? $P(\bar H_1 \mid T) = 1 - P(H_1 \mid T) \simeq 0.953$. The disease is serious: what are the chances to be sick giving negative in the test?
$P(H_1 \mid \bar T) = \dfrac{P(\bar T \mid H_1)\,P(H_1)}{1 - P(T)} = \dfrac{(1 - P(T \mid H_1))\,P(H_1)}{1 - P(T)} \simeq 10^{-5}$

Probabilities of interest as a function of the known data and the incidence of the disease in the population, $x = P(H_1)$:
probability to be sick giving positive: $P(H_1 \mid T) = \dfrac{P(T \mid H_1)\,x}{P(T \mid H_1)\,x + P(T \mid \bar H_1)(1 - x)} = \dfrac{99x}{2 + 97x}$;
probability to be healthy giving positive: $P(\bar H_1 \mid T) = \dfrac{P(T \mid \bar H_1)(1 - x)}{P(T \mid H_1)\,x + P(T \mid \bar H_1)(1 - x)} = \dfrac{2(1 - x)}{2 + 97x}$;
probability to be sick giving negative: $P(H_1 \mid \bar T) = \dfrac{(1 - P(T \mid H_1))\,x}{1 - P(T)} = \dfrac{x}{98 - 97x}$.
From the given data, the interest is to minimise $P(\bar H_1 \mid T)$ (healthy people giving positive, who undergo a costly and aggressive treatment) and $P(H_1 \mid \bar T)$ (sick people undetected), and to maximise $P(H_1 \mid T)$ (correctly detected sick): optimum efficiency.

Receiver Operating Characteristic (ROC curve): for each cut value c of the test statistic, plot the true positives $P(+ \mid H, c) = F_1(c) = \int f_1(u)\,du$ against the false positives $P(+ \mid \bar H, c) = F_2(c) = \int f_2(u)\,du$, where $f_1, f_2$ are the densities of the test statistic under H and $\bar H$. A random selection gives the diagonal $(x, x)$ from (0,0) to (1,1), with area A = 0.5; the curve is $(F_2(c), F_1(c))$, with area $A = \int_0^1 F_1(F_2^{-1}(x))\,dx$, and the optimum working point is the one closest to (0,1), e.g. maximising the distance d between the curve and the diagonal. Reversing the selection criteria reflects the curve through the diagonal.
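The numbers of the cause-effect example above can be reproduced directly from Bayes' theorem and the total probability theorem (a sketch; the variable names are ours):

```python
# Sketch: the diagnostic-test example worked numerically.
p_sick = 1 / 1000        # P(H1), incidence of the disease
p_pos_sick = 99 / 100    # P(T | H1)
p_pos_healthy = 2 / 100  # P(T | H1-bar)

# Total probability theorem: P(T)
p_pos = p_pos_sick * p_sick + p_pos_healthy * (1 - p_sick)
# Bayes theorem: P(H1 | T) and P(H1 | T-bar)
p_sick_given_pos = p_pos_sick * p_sick / p_pos
p_sick_given_neg = (1 - p_pos_sick) * p_sick / (1 - p_pos)

print(round(p_sick_given_pos, 3))      # 0.047
print(round(1 - p_sick_given_pos, 3))  # 0.953
print(p_sick_given_neg)                # about 1e-05
```

The small posterior despite the 99% sensitivity comes from the low prior: the healthy false positives (2% of 999/1000) dominate P(T).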
STOCHASTIC CHARACTERISTICS
Mathematical Expectation. As usual, $X(\omega): (\Omega, \mathcal{B}, Q) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P)$.
Discrete: $X(\omega) = \sum_k x_k 1_{A_k}(\omega)$, with $P(X = x_k) = P(A_k) = \int_{\Omega} 1_{A_k}(\omega)\,dQ(\omega)$.
Absolutely continuous: $P(X \in A) = \int 1_A(\omega)\,dP(\omega) = \int_A dP(\omega) = \int_A f(\omega)\,dw$.
Def.: Expectation of $Y = g(X)$:
$E[Y] = E[g(X)] = \int_{\Omega} g[X(\omega)]\,dP(\omega) = \int_{\mathbb{R}} g(x)\,dF(x)$ (Stieltjes-Lebesgue integral), which reads $\sum_k g(x_k)\,P(X = x_k)$ in the discrete case and $\int_{\mathbb{R}} g(x)\,f(x)\,dx$ in the absolutely continuous case.

Moments (wrt the origin): $m_n = E[X^n] = \int_{\mathbb{R}} x^n p(x)\,dx$, provided $x^n p(x) \in L_1(\mathbb{R})$; $m_0 = 1$.
► Mean: $\mu = m_1 = E[X] = \int_{\mathbb{R}} x\,p(x)\,dx$.
► E is a linear operator: $X = c_0 + \sum_i c_i X_i$, $c_i \in \mathbb{R}$ ⟹ $E[X] = c_0 + \sum_i c_i E[X_i]$; and if $\{X_i\}_{i=1}^{n}$ are independent, $E[\prod_i X_i] = \prod_i E[X_i]$.
Moments wrt a point $c \in \mathbb{R}$: $E[(X - c)^n] = \int (x - c)^n p(x)\,dx$; if $\sigma^2$ exists, $\min_{c \in \mathbb{R}} E[(X - c)^2]$ is attained at $c = \mu$ … hence the Moments wrt the Mean:
$\mu_n = E[(X - \mu)^n] = \int_{\mathbb{R}} (x - \mu)^n p(x)\,dx$
► Variance: $\sigma^2 = V[X] = E[(X - \mu)^2] = \int (x - \mu)^2 p(x)\,dx \; (\ge 0)$. NOT linear: $Y = c_0 + c_1 X$ ⟹ $V[Y] = \sigma_Y^2 = c_1^2\,\sigma_X^2$.
► Skewness: $\gamma_1 = \mu_3/\sigma^3$. Kurtosis: $\gamma_2 = \mu_4/\sigma^4 - 3$.
Watch!! The condition $x^n p(x) \in L_1(\mathbb{R})$ matters:
Poisson, $P(X = k) = e^{-\mu}\mu^k/k!$, $k = 0, 1, 2, \ldots$: the series $\sum_k k^n P(X = k)$ is absolutely convergent ($\lim_k a_{k+1}/a_k = \lim_k (1 + 1/k)^n \mu/(k + 1) = 0$), so $E[X^n]$ exists for all n;
$P(X = n) = \dfrac{6}{\pi^2 n^2}$, $n = 1, 2, \ldots$: a perfectly valid discrete distribution with no moments $E[X^k]$, $k \ge 1$;
Cauchy, $p(x) = \dfrac{1}{\pi(1 + x^2)}$: $\int |x^n p(x)|\,dx = \infty$ for $n \ge 1$ (CPV exists for n = 1): no moments (no mean, no variance, …).

Global Picture. Position: Mean $\mu = E[X]$; Mode $x_0 = \sup_x p(x)$; Median $x_m$ with $F(x_m) = P(X \le x_m) = 1/2$; $\alpha$-quantile $x_\alpha$ with $F(x_\alpha) = P(X \le x_\alpha) = \alpha$. Dispersion: Variance. Asymmetry: Skewness $\gamma_1$ ($> 0$ right, usually Mode < Median < Mean; $= 0$ symmetric; $< 0$ left, usually Mode > Median > Mean). Peakedness: Kurtosis $\gamma_2$ ($= 0$ for the Normal).

Covariance (and "Linear Correlation"): $V[X_1, X_2] = E[(X_1 - \mu_1)(X_2 - \mu_2)] = E[X_1 X_2] - \mu_1\mu_2$. $\{X_1, X_2\}$ independent ⟹ $V[X_1, X_2] = 0$. Correlation coefficient: $\rho_{12} = \dfrac{V[X_1, X_2]}{\sigma_1\sigma_2}$, with $-1 \le \rho_{12} \le 1$ (Hölder inequality: $|\rho_{12}| \le 1$). Comments: a linear relation $X_2 = aX_1 + b$ gives $\rho_{12} = \pm 1$; a quadratic one, $X_2 = a + cX_1^2$, gives $\rho_{12} = 0$ if the distribution of $X_1$ is symmetric, even though the quantities are fully dependent.

Useful expression: Taylor Expansion for the Variance of $Y = g(X_1, X_2, \ldots)$ around $\mu_i = E[X_i]$:
$Y = g(X_1, X_2) \simeq g(\mu_1, \mu_2) + \dfrac{\partial g}{\partial x_1}\bigg|_{(\mu_1, \mu_2)}(x_1 - \mu_1) + \dfrac{\partial g}{\partial x_2}\bigg|_{(\mu_1, \mu_2)}(x_2 - \mu_2) + O(D_{ij}^2)$
so $E[Y] = E[g(X_1, X_2)] \simeq g(\mu_1, \mu_2) + O(D_{ij}^2)$, and from $Y - E[Y]$:
$V[Y] = E\left[(Y - E[Y])^2\right] = \sigma_Y^2 \simeq \left(\dfrac{\partial g}{\partial x_1}\right)^2 V[X_1] + \left(\dfrac{\partial g}{\partial x_2}\right)^2 V[X_2] + 2\,\dfrac{\partial g}{\partial x_1}\,\dfrac{\partial g}{\partial x_2}\,V[X_1, X_2]$
$= \left(\dfrac{\partial g}{\partial x_1}\right)^2 \sigma_1^2 + \left(\dfrac{\partial g}{\partial x_2}\right)^2 \sigma_2^2 + 2\,\dfrac{\partial g}{\partial x_1}\,\dfrac{\partial g}{\partial x_2}\,\rho_{12}\,\sigma_1\sigma_2$, all derivatives evaluated at $(\mu_1, \mu_2)$
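The first-order variance formula above is what is usually implemented as "error propagation". A sketch for the illustrative choice $Y = X_1 X_2$ (so the partial derivatives at the means are $\mu_2$ and $\mu_1$; the numerical values are made up for the example):

```python
# Sketch: first-order (Taylor) propagation of variances,
# sigma_Y^2 ~ g1^2 s1^2 + g2^2 s2^2 + 2 g1 g2 rho s1 s2,
# with g_i the partial derivatives of g evaluated at the means.
import math

def propagated_variance(g1, g2, s1, s2, rho=0.0):
    return g1**2 * s1**2 + g2**2 * s2**2 + 2 * g1 * g2 * rho * s1 * s2

# Y = X1 * X2 at mu1 = 3, mu2 = 5: dg/dx1 = mu2 = 5, dg/dx2 = mu1 = 3
mu1, mu2, s1, s2 = 3.0, 5.0, 0.1, 0.2
var_y = propagated_variance(mu2, mu1, s1, s2)
# For a product of independent quantities, relative errors add in quadrature:
rel = math.sqrt((s1 / mu1)**2 + (s2 / mu2)**2)
print(math.isclose(math.sqrt(var_y) / (mu1 * mu2), rel))  # True
```

With a non-zero $\rho_{12}$ the cross term can either inflate or cancel part of the propagated variance, which is why neglecting correlations is the most common failure mode of this formula.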
(mind the remainder…)

INTEGRAL TRANSFORMS: Fourier (… Laplace) Transform; Mellin Transform.

Fourier Transform (Characteristic Function): for $f: \mathbb{R} \to \mathbb{C}$, $f \in L_1(\mathbb{R})$:
$\Phi(t) = \int e^{ixt} f(x)\,dx, \quad \Phi: t \in \mathbb{R} \to \mathbb{C}$
with the Inversion Theorem (Lévy, 1925). For a probability density: $\Phi(t) = E[e^{iXt}]$.
Properties (all necessary but not sufficient): exists for all $X(\omega)$; ► $\Phi(0) = 1$; ► bounded, $|\Phi(t)| \le 1$; ► Schwarz symmetry, $\Phi(-t) = \Phi^*(t)$; ► uniformly continuous in $\mathbb{R}$.
Inversion: $p(x) = \frac{1}{2\pi}\int e^{-itx}\,\Phi(t)\,dt$; discrete: $p(X = x_k) = \lim_{T \to \infty}\frac{1}{2T}\int_{-T}^{T} e^{-itx_k}\,\Phi(t)\,dt$; reticular ($x_k = a + bn$; $a, b \in \mathbb{R}$, $b > 0$, $n \in \mathbb{Z}$): $p(X = x_k) = \frac{b}{2\pi}\int_{-\pi/b}^{\pi/b} e^{-itx_k}\,\Phi(t)\,dt$.
► One-to-one correspondence between DF and CF. ► Two DF with the same CF are the same a.e.
Useful Relations:
► $Y = g(X)$: $\Phi_Y(t) = E[e^{iYt}] = E[e^{ig(X)t}]$;
► $Y = a + bX$, $a, b \in \mathbb{R}$: $\Phi_Y(t) = e^{iat}\,\Phi_X(bt)$;
► if $\{X_i \sim p_i(x_i)\}_{i=1}^{n}$ are n independent random quantities and $X = X_1 + \cdots + X_n$: $\Phi_X(t) = E[e^{it(X_1 + \cdots + X_n)}] = \Phi_1(t)\cdots\Phi_n(t)$; for $X = X_1 - X_2$: $\Phi_X(t) = \Phi_1(t)\,\Phi_2(-t) = \Phi_1(t)\,\Phi_2^*(t)$;
► if the distribution of X is symmetric, then $\Phi_X(t)$ is a real function, $\Phi_X(t) = \Phi_X^*(t)$.

Example: $X_1 \sim Po(n_1 \mid \mu_1)$, $X_2 \sim Po(n_2 \mid \mu_2)$, $X = X_1 - X_2$: with $\Phi_i(t) = e^{\mu_i(e^{it} - 1)}$,
$\Phi_X(t) = \Phi_1(t)\,\Phi_2^*(t) = e^{-(\mu_1 + \mu_2)}\,e^{\mu_1 e^{it} + \mu_2 e^{-it}}$
X is discrete reticular (a = 0, b = 1); writing the inversion integral as a contour integral over the unit circle $C: z = e^{i\theta}$, $\theta \in (-\pi, \pi]$, the integrand has a pole of order n + 1 at z = 0, and the sum of residues gives the modified Bessel function:
$P(X = n) = e^{-(\mu_1 + \mu_2)}\left(\dfrac{\mu_1}{\mu_2}\right)^{n/2} I_n(2\sqrt{\mu_1\mu_2})$

Some Useful Cases ($X = X_1 + \cdots + X_n$, independent):
$X_k \sim Po(x_k \mid \mu_k)$ ⟹ $X \sim Po(x \mid \mu_S)$, $\mu_S = \mu_1 + \cdots + \mu_n$;
$X_k \sim N(x_k \mid \mu_k, \sigma_k)$ ⟹ $X \sim N(x \mid \mu_S, \sigma_S)$, $\mu_S = \mu_1 + \cdots + \mu_n$, $\sigma_S^2 = \sigma_1^2 + \cdots + \sigma_n^2$;
$X_k \sim Ca(x_k \mid a_k, b_k)$ ⟹ $X \sim Ca(x \mid a_S, b_S)$, $a_S = a_1 + \cdots + a_n$, $b_S = b_1 + \cdots + b_n$;
$X_k \sim Ga(x_k \mid a, b_k)$ ⟹ $X \sim Ga(x \mid a, b_S)$, $b_S = b_1 + \cdots + b_n$.

Moments of a Distribution (the Fourier-Laplace transform is usually called the "moment generating function"):
$E[X^k] = (-i)^k \dfrac{\partial^k}{\partial t^k}\Phi(t)\bigg|_{t=0}$
and, for $\Phi(t_1, \ldots, t_n) = E[e^{i(X_1 t_1 + \cdots + X_n t_n)}]$:
$E[X_i^{k_i} X_j^{k_j}] = (-i)^{k_i + k_j}\,\dfrac{\partial^{k_i}}{\partial t_i^{k_i}}\,\dfrac{\partial^{k_j}}{\partial t_j^{k_j}}\,\Phi(t_1, \ldots, t_n)\bigg|_{t_1, \ldots, t_n = 0}$
Example: supp$[X] = \{0, 2\}$ with $P(X = 0) = P(X = 2) = \tfrac12$: $\Phi(t) = E[e^{iXt}] = \tfrac12\left(1 + e^{2it}\right)$, so $E[X] = -i\,\Phi'(0) = 1$ and $E[X^2] = -\Phi''(0) = 2$.
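One of the "useful cases" above (the sum of independent Poisson quantities) is easy to verify numerically: the product of the two characteristic functions, $e^{\mu_1(e^{it}-1)}\,e^{\mu_2(e^{it}-1)} = e^{(\mu_1+\mu_2)(e^{it}-1)}$, says the sum is Poisson with $\mu_S = \mu_1 + \mu_2$, and a direct convolution confirms it (a sketch, with arbitrary illustrative values of $\mu$):

```python
# Sketch: P(X1 + X2 = n) by direct convolution equals Po(n | mu1 + mu2),
# as the product of the Poisson characteristic functions implies.
from math import exp, factorial

def poisson(n, mu):
    return exp(-mu) * mu**n / factorial(n)

mu1, mu2, n = 1.3, 2.1, 4  # illustrative values
conv = sum(poisson(k, mu1) * poisson(n - k, mu2) for k in range(n + 1))
print(abs(conv - poisson(n, mu1 + mu2)) < 1e-12)  # True
```

The same identity holds algebraically for every n via the binomial theorem applied to $(\mu_1 + \mu_2)^n$.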
Mellin Transform: for $f: \mathbb{R}^+ \to \mathbb{C}$,
$M(f; s) = \int_0^\infty f(x)\,x^{s-1}\,dx, \quad s \in \mathbb{C}$
which exists in the strip of holomorphy $\langle \alpha, \beta \rangle$, $\alpha < \mathrm{Re}(s) < \beta$, determined by the behaviours $f(x) = O(x^{-\alpha})$ as $x \to 0^+$ and $f(x) = O(x^{-\beta})$ as $x \to \infty$. Inversion:
$f(x) = \dfrac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} M(f; s)\,x^{-s}\,ds$
Obviously, if it exists, for probability densities $M(f; s) = E[X^{s-1}]$. Example: $f(x) = e^{-x}$ gives $M(f; s) = \Gamma(s)$, with strip $\langle 0, \infty \rangle$.

Useful Relations (for $\{X_i \sim p_i(x_i)\}$ independent and non-negative, $x_i \in [0, \infty)$):
► $Y = aX^b$ ($a, b \in \mathbb{R}$, $a > 0$): $M_Y(s) = a^{s-1}\,M_X(bs - b + 1)$;
► $Y = X^{-1}$ (a = 1, b = -1): $M_Y(s) = M_X(2 - s)$;
► product $X = X_1 X_2 \cdots X_n$: $M_X(s) = M_1(s)\cdots M_n(s)$;
► ratio $X = X_1 X_2^{-1}$: $M_X(s) = M_1(s)\,M_2(2 - s)$.

Example: $X_1 \sim Ex(x_1 \mid a_1)$, $X_2 \sim Ex(x_2 \mid a_2)$, $X = X_1 X_2$: $M_i(s) = \Gamma(s)\,a_i^{1-s}$, so $M_X(s) = (a_1 a_2)^{1-s}\,\Gamma(s)^2$ and
$p(x) = \dfrac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} (a_1 a_2)^{1-s}\,\Gamma(s)^2\,x^{-s}\,ds, \quad c > 0 \text{ real}$
The double poles of $\Gamma(s)^2$ at $s = -n$, $n = 0, 1, 2, \ldots$, give the residues $(a_1 a_2 x)^n\,[2\psi(n + 1) - \ln(a_1 a_2 x)]/(n!)^2$; summing the (Neumann) series:
$p(x) = 2\,a_1 a_2\,K_0(2\sqrt{a_1 a_2 x})$

Some Useful Cases:
$X_1 \sim Ex(x_1 \mid a_1)$, $X_2 \sim Ex(x_2 \mid a_2)$: $\quad X_1 X_2 \sim 2\,a_1 a_2\,K_0(2\sqrt{a_1 a_2 x})$; $\quad X_1 X_2^{-1} \sim \dfrac{a_1 a_2}{(a_2 + a_1 x)^2}$;
$X_1 \sim Ga(x_1 \mid a_1, b_1)$, $X_2 \sim Ga(x_2 \mid a_2, b_2)$:
$X_1 X_2 \sim \dfrac{2\,(a_1 a_2)^{(b_1 + b_2)/2}}{\Gamma(b_1)\,\Gamma(b_2)}\,x^{(b_1 + b_2)/2 - 1}\,K_\nu(2\sqrt{a_1 a_2 x})$, $\nu = b_2 - b_1$;
$X_1 X_2^{-1} \sim \dfrac{\Gamma(b_1 + b_2)}{\Gamma(b_1)\,\Gamma(b_2)}\,\dfrac{a_1^{b_1} a_2^{b_2}\,x^{b_1 - 1}}{(a_2 + a_1 x)^{b_1 + b_2}}$.
Obviously, the same results follow from the direct convolutions $p(x) = \int_0^\infty p_1(w)\,p_2(x/w)\,\frac{1}{w}\,dw$ (product) and $p(x) = \int_0^\infty p_1(xw)\,p_2(w)\,w\,dw$ (ratio).
…Densities with support in $\mathbb{R}$: $X_1 \sim N(x_1 \mid 0, \sigma_1)$, $X_2 \sim N(x_2 \mid 0, \sigma_2)$: $X = X_1 X_2 \sim \dfrac{a}{\pi}\,K_0(a|x|)$, $a = \dfrac{1}{\sigma_1\sigma_2}$.

DISTRIBUTIONS AND GENERALIZED FUNCTIONS
Distribution: a functional $T: \phi(x) \in \mathcal{D} \to \langle T, \phi \rangle \in \mathbb{C}$, with $\mathcal{D} = C_c^\infty$ the test functions;
Linear functional: $\langle T, \alpha\phi_1 + \beta\phi_2 \rangle = \alpha\langle T, \phi_1 \rangle + \beta\langle T, \phi_2 \rangle$;
Continuous functional: $\{\phi_n\} \to \phi \Rightarrow \langle T, \phi_n \rangle \to \langle T, \phi \rangle$.
Any $f: \mathbb{R} \to \mathbb{R}$ locally Lebesgue integrable (LLI) defines a distribution, $\langle T_f, \phi \rangle = \int f(x)\,\phi(x)\,dx$: the "regular" distributions ("singular" the rest).
Some Basic Properties (see text for more): equality, $T = G$ iff $\langle T, \phi \rangle = \langle G, \phi \rangle$ for all $\phi \in \mathcal{D}$ (for regular ones, f = g a.e.); support, supp$T \cap$ supp$\phi = \emptyset \Rightarrow \langle T, \phi \rangle = 0$; sum, $\langle T + G, \phi \rangle = \langle T, \phi \rangle + \langle G, \phi \rangle$; derivative, $\langle T', \phi \rangle = -\langle T, \phi' \rangle$ and $\langle D^p T, \phi \rangle = (-1)^p\langle T, D^p\phi \rangle$; convergence, $\{T_n\} \to T$ iff $\langle T_n, \phi \rangle \to \langle T, \phi \rangle$; Fourier Transform, $\langle \tilde T, \phi \rangle = \langle T, \tilde\phi \rangle$; shift, $\langle S_a T, \phi \rangle = \langle T(x - a), \phi \rangle = \langle T, \phi(x + a) \rangle$; scaling, $\langle P_a T, \phi \rangle = \langle T(ax), \phi \rangle = \frac{1}{|a|}\langle T, \phi(x/a) \rangle$.
Two examples. The Dirac delta: $\langle \delta, \phi \rangle = \phi(0)$; it is linear, $\langle \delta, \alpha\phi_1 + \beta\phi_2 \rangle = \alpha\phi_1(0) + \beta\phi_2(0) = \alpha\langle \delta, \phi_1 \rangle + \beta\langle \delta, \phi_2 \rangle$, and continuous, $\lim_n |\langle \delta, \phi_n \rangle - \langle \delta, \phi \rangle| = \lim_n |\phi_n(0) - \phi(0)| = 0$; shifted, $\langle S_a\delta, \phi \rangle = \langle \delta, \phi(x + a) \rangle = \phi(a)$;
scaled, $\langle P_a\delta, \phi \rangle = \langle \delta(ax), \phi \rangle = \frac{1}{|a|}\langle \delta, \phi(x/a) \rangle = \frac{1}{|a|}\phi(0)$.
The delta as a limit: $f_n = \frac{n}{2}\,1_{[-1/n, 1/n]}(x)$ is LLI and defines a distribution, with $\langle T_n, \phi_n \rangle \to \phi(0) = \langle \delta, \phi \rangle$.
The Heaviside step $H(x) = 1_{[0, \infty)}(x)$ is LLI: $\langle T_H, \phi \rangle = \int 1_{[0, \infty)}(x)\,\phi(x)\,dx = \int_0^\infty \phi(x)\,dx$, and $\langle H', \phi \rangle = -\langle H, \phi' \rangle = -\int_0^\infty \phi'(x)\,dx = \phi(0) = \langle \delta, \phi \rangle$: $H' = \delta$.

Tempered Distributions: take as test functions the "rapidly decreasing" ones (the Schwartz space):
$S = \{\phi: \mathbb{R} \to \mathbb{C} \mid \phi \in C^\infty \text{ and } \lim_{|x| \to \infty} |x^n D^m \phi(x)| = 0 \;\; \forall n, m \in \mathbb{N}_0\}$
EXAMPLES: $\phi(x) = e^{-x^2}$ is admissible; $\phi(x) = e^{-x}\,1_{(0, \infty)}(x)$ is not admissible (not $C^\infty$); any $f \in C^0$ with $|f(x)| \le \dfrac{C}{(a + x^2)^m}$ for some $m \in \mathbb{N}_0$ and $f \in L_1(\mathbb{R})$ defines a Tempered Distribution $T_f$. Convergence to zero in S: $\{\phi_n\}_{n \ge 1} \subset S$ with $\max_{x \in \mathbb{R}} |x^k D^m \phi_n(x)| \to 0$ for all $k, m \in \mathbb{N}_0$; then T is tempered iff $\langle T, \phi_n \rangle \to 0$.

Probability Distributions: for $X(\omega): (\Omega, \mathcal{B}, Q) \to \mathbb{R}$, the DF $F(x) = P(X \le x) = Q(X^{-1}((-\infty, x]))$ is LLI and defines a distribution $T_F$. In general, T is a Probability Distribution if:
$\langle T', \phi \rangle \ge 0$ for all $\phi \ge 0$, and $\langle T', 1 \rangle = 1$ (with $1(x) = 1$, which is $\notin \mathcal{D}, S$; the condition is understood through the appropriate limit).
T does not have to be generated by an LLI function … but any LLI function $f(x) \ge 0$ with $\langle T_f, 1 \rangle = \int f(x)\,dx = 1$ defines a Probability Density "Distribution".

The Delta Distribution unifies discrete and AC random quantities:
Discrete, range$(X) = \{x_1, x_2, \ldots\}$ with $p_k = P(X = x_k)$: $T_D = \sum_k p_k\,\delta(x - x_k)$;
Continuous, range$(X) = \mathbb{R}$ with density p(x): from the CF,
$\Phi(t) = \int e^{itx}\,p(x)\,dx = \sum_{n \ge 0} \frac{(it)^n}{n!}\,m_n$
$p(x) = \frac{1}{2\pi}\int e^{-itx}\,\Phi(t)\,dt = \sum_{n \ge 0} \frac{m_n}{n!}\,\frac{1}{2\pi}\int e^{-itx}\,(it)^n\,dt = \sum_{n \ge 0} \frac{(-1)^n m_n}{n!}\,\delta^{(n)}(x)$
that is, $\langle T_p, \phi \rangle = \sum_{n \ge 0} \frac{m_n}{n!}\,\phi^{(n)}(0)$. In general, $T_{PROB} = \alpha\,T_D + (1 - \alpha)\,T_p$.

Example: a DF with continuous pieces and jumps, $0 < a < b$, built from $F_1(x)$ on $[0, a)$, a constant plateau $F_2(a)$ on $[a, b)$, and $F_2(x)$ beyond b, using Heaviside steps:
$F(x) = F_1(x)\,[H(x) - H(x - a)] + F_2(a)\,[H(x - a) - H(x - b)] + F_2(x)\,H(x - b)$
Its distributional derivative, with $f_i = F_i'$, is
$Df = F_1(0)\,\delta(x) + [F_2(a) - F_1(a)]\,\delta(x - a) + f_1(x)\,1_{[0, a)}(x) + f_2(x)\,1_{[b, \infty)}(x)$
and the moments follow from $E[X^m] = \langle Df, x^m \rangle = F_1(0)\,0^m + [F_2(a) - F_1(a)]\,a^m + \int_0^a f_1(x)\,x^m\,dx + \int_b^\infty f_2(x)\,x^m\,dx$; in particular, m = 0 gives the normalisation $1 = F_1(0) + F_2(a) - F_1(a) + \int_0^a f_1(x)\,dx + \int_b^\infty f_2(x)\,dx$.

LIMIT THEOREMS and CONVERGENCE
General Problem: find the limit behaviour of a sequence of random quantities $X_1, X_2, \ldots, X_n, \ldots$ Example: $Z_1 = X_1$, $Z_2 = \frac{X_1 + X_2}{2}$, …, $Z_n = \frac{1}{n}\sum_{k=1}^{n} X_k$, … How is $Z_n$ distributed when $n \to \infty$?
Convergence requires a "distance", and there are several convergence criteria: 1) more or less strong; 2) a sequence may converge under some criteria and not under others. Convergence in:
Distribution → Central Limit Theorem;
Probability → Weak Law of Large Numbers, Glivenko-Cantelli Theorem;
Almost Sure → Strong Law of Large Numbers;
$L_p(\mathbb{R})$ Norm → Convergence in Quadratic Mean;
Uniform → Glivenko-Cantelli Theorem;
Logarithmic.

Chebyshev Theorem: let $X \sim F(x)$, $g(X) \ge 0$ with finite $E[g(X)]$, and $k > 0$; splitting the integration domain into $\{g(X) \ge k\}$ and $\{g(X) < k\}$:
$E[g(X)] = \int g(x)\,dF(x) \ge \int_{g(x) \ge k} g(x)\,dF(x) \ge k\int_{g(x) \ge k} dF(x) = k\,P(g(X) \ge k)$
so $P(g(X) \ge k) \le \dfrac{E[g(X)]}{k}$.
Bienaymé-Chebyshev Inequality: for X with finite mean and variance $\mu, \sigma^2$, take $g(X) = (X - \mu)^2$:
$P(|X - \mu| \ge k\sigma) \le \dfrac{1}{k^2}$

Convergence in Probability. Let $X_1, X_2, \ldots, X_n, \ldots$ with DFs $F_1(x_1), F_2(x_2), \ldots, F_n(x_n), \ldots$ Def.: $X_n$ converges in probability to X if, and only if, $\lim_n P(|X_n - X| \ge \epsilon) = 0$ for all $\epsilon > 0$; or, equivalently (lim(Prob)), $\lim_n P(|X_n - X| < \epsilon) = 1$ for all $\epsilon > 0$.
Weak Law of Large Numbers (J. Bernoulli, …): let $X_1, X_2, \ldots$ be random quantities with the same distribution and finite mean $\mu$ and variance $\sigma^2$. The sequence $Z_1 = X_1$, $Z_2 = \frac{X_1 + X_2}{2}$, …, $Z_n = \frac{1}{n}\sum_{k=1}^{n} X_k$, … converges in Probability to $\mu$:
$\lim_n P(|Z_n - \mu| \ge \epsilon) = 0 \quad \forall\,\epsilon > 0$
LLN in practice: WLLN: when n is very large, the probability that $Z_n$ differs from $\mu$ by a small amount is very small; $Z_n$ gets more and more concentrated around the real number $\mu$. But "very small" is not zero: it may happen that for some $k > n$, $Z_k$ differs from $\mu$ by more than $\epsilon$ …

Almost Sure Convergence. Def.: $X_n$ converges "almost surely" to X if, and only if, $P(\lim_n |X_n - X| \ge \epsilon) = 0$; or, equivalently (Prob(lim)), $P(\lim_n X_n = X) = 1$.
Strong Law of Large Numbers (E. Borel, A. N. Kolmogorov, …): let $X_1, X_2, \ldots$ be random quantities with the same distribution and finite mean $\mu$; the sequence $Z_n = \frac{1}{n}\sum_{k=1}^{n} X_k$ converges Almost Surely to $\mu$.
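The Bienaymé-Chebyshev bound above can be checked exactly, with no sampling, for a case where the tail probability is computable in closed form, e.g. $X \sim$ Uniform(0, 1), with $\mu = 1/2$ and $\sigma = 1/\sqrt{12}$ (a sketch):

```python
# Sketch: exact check of P(|X - mu| >= k*sigma) <= 1/k**2
# for X ~ Uniform(0, 1): mu = 1/2, sigma = 1/sqrt(12).
from math import sqrt

sigma = 1 / sqrt(12)
checks = []
for k in (1.2, 1.5, 2.0, 3.0):
    # Exact tail: mass of [0, 1] outside (1/2 - k*sigma, 1/2 + k*sigma)
    exact_tail = max(0.0, 1 - 2 * k * sigma)
    checks.append(exact_tail <= 1 / k**2)
print(all(checks))  # True
```

The bound holds for every k, and is far from tight here: Chebyshev uses only $\mu$ and $\sigma^2$, not the shape of the distribution.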
Almost-sure convergence of $Z_n$ to $\mu$ means
$P(\lim_n |Z_n - \mu| \ge \epsilon) = 0 \quad \forall\,\epsilon > 0$
SLLN in practice: as n grows, the probability that some later $Z_k$ ($k > n$) differs from $\mu$ by more than $\epsilon$ tends to zero.

Convergence in Distribution. Let $X_1, X_2, \ldots, X_n, \ldots$ and their corresponding DFs $F_1(x_1), F_2(x_2), \ldots, F_n(x_n), \ldots$ Def.: $X_n$ tends to be distributed as $X \sim F(x)$ if, and only if,
$\lim_n F_n(x) = F(x)$, i.e. $\lim_n P(X_n \le x) = P(X \le x)$, at every point of continuity $x \in C(F)$;
or, equivalently, $\lim_n \Phi_n(t) = \Phi(t)$ for all $t \in \mathbb{R}$.
Central Limit Theorem (Lindeberg-Lévy, …): for a sequence of independent random quantities $X_1, X_2, \ldots$ with the same distribution and finite mean and variance $\mu, \sigma^2$, form the sequence $Z_1 = X_1$, $Z_2 = \frac{X_1 + X_2}{2}$, …, $Z_n = \frac{1}{n}\sum_{k=1}^{n} X_k$, … How is $Z_n$ distributed when $n \to \infty$? (Easy demonstration from the CF.)
$Z_n \sim N\!\left(z \,\Big|\, \mu, \dfrac{\sigma}{\sqrt{n}}\right)$, or, standardized, $\tilde Z_n = \dfrac{Z_n - \mu}{\sigma/\sqrt{n}} \sim N(z \mid 0, 1)$
(Figures: the DF of $Z_n$ for Uniform and Parabolic parent DFs at n = 1, 2, 5, 10, 20, 50, approaching the Normal; for the Cauchy parent DF, which has no mean, $Z_n$ does not converge to a Normal.)

Uniform Convergence. For $f_n, f: S \to \mathbb{R}$, Def.: the sequence $\{f_n(x)\}_{n \ge 1}$ converges uniformly to f(x) if, and only if,
$\lim_n \sup_{x \in S} |f_n(x) - f(x)| = 0$
Glivenko-Cantelli Theorem (an example of application of the Chebyshev Theorem): repeat the experiment, $e^{(1)}, \ldots, e^{(n)}$, each giving one observation, to get $x_1, x_2, \ldots, x_n$: independent, identically distributed observations of X. Empiric Distribution Function:
$F_n(x) = \dfrac{1}{n}\sum_{k=1}^{n} I_{(-\infty, x]}(x_k) = \dfrac{\text{number of values } x_i \text{ lower than or equal to } x}{n}$
If the observations are iid:
$P\!\left(\lim_n \sup_x |F_n(x) - F(x)| = 0\right) = 1$
The Empiric Distribution Function converges uniformly to the Distribution Function F(x) of the r.q. X.
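The Glivenko-Cantelli statement can be watched at work with a small simulation (a sketch with a fixed seed; for Uniform(0, 1) the true DF is F(x) = x, and the supremum distance is attained at the sample points):

```python
# Sketch: uniform distance sup_x |F_n(x) - F(x)| between the empiric DF
# and the true DF F(x) = x of Uniform(0, 1), shrinking as n grows.
import random

random.seed(0)  # reproducible illustrative sample

def ks_distance(sample):
    s = sorted(sample)
    n = len(s)
    # At the i-th order statistic, F_n jumps from i/n to (i+1)/n:
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(s))

dists = [ks_distance([random.random() for _ in range(n)])
         for n in (10, 100, 10000)]
print(dists[0] > dists[-1], dists[-1] < 0.05)
```

The distance shrinks roughly like $1/\sqrt{n}$, which is the content of the quantitative bound derived in the demonstration below via the Bienaymé-Chebyshev inequality.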
Demonstration of the convergence in probability (at fixed x), with $F_n(x) = \frac{1}{n}\sum_{k=1}^{n} I_{(-\infty, x]}(x_k)$ the Empiric Distribution Function:
1) With $A = (-\infty, x_0]$, $Y = I_A(X)$ is a Bernoulli random quantity with $P(Y = 1) = P(X \in A) = F(x_0)$ and $P(Y = 0) = P(X \notin A) = 1 - F(x_0)$, hence with Characteristic Function $\Phi_Y(t) = e^{it}\,F(x_0) + 1 - F(x_0)$.
2) The sum of iid Bernoulli random quantities $W = Z_n = \sum_{k=1}^{n} I_{(-\infty, x]}(x_k) = n\,F_n(x)$ is a Binomial random quantity:
$P(W = k) = \binom{n}{k}\,F^k\,(1 - F)^{n - k}, \quad E[W] = nF, \quad V[W] = nF(1 - F)$
so $E[F_n] = F$ and $V[F_n] = \dfrac{F(1 - F)}{n}$.
3) The Bienaymé-Chebyshev inequality $P(|X - \mu| \ge k\sigma) \le 1/k^2$ then gives
$P(|F_n(x) - F(x)| \ge \epsilon) \le \dfrac{F(1 - F)}{n\,\epsilon^2} \;\xrightarrow{\;n \to \infty\;}\; 0$

Convergence in $L_p(\mathbb{R})$ Norm. Def.: $\{X_n(w)\}_{n \ge 1}$ converges in $L_p(\mathbb{R})$ norm to X(w) if, and only if, $X_n(w), X(w) \in L_p(\mathbb{R})$ and
$\lim_n E\left[|X_n(w) - X(w)|^p\right] = 0$
that is, for all $\epsilon > 0$ there is $n_0(\epsilon)$ such that $E[|X_n(w) - X(w)|^p] < \epsilon$ for $n > n_0(\epsilon)$. For p = 2: convergence in quadratic mean.

Logarithmic Convergence. Kullback-Leibler "Discrepancy" (see the Lecture on Information): the logarithmic divergence of a pdf $\tilde p(x)$, $x \in X$, from its true pdf $p(x)$ is
$D_{KL}[\tilde p \mid p] = \int_X p(x)\,\log\dfrac{p(x)}{\tilde p(x)}\,dx$
A sequence $\{p_i(x)\}_{i \ge 1}$ of pdfs "Converges Logarithmically" to a density p(x) iff $\lim_k D_{KL}[p \mid p_k] = 0$.