ELEMENTS OF PROBABILITY
… A. N. Kolmogorov (1933) …
(Ω, B, P)
Carlos Mana
Benasque-2014
Astrofísica y Física de Partículas
CIEMAT
1) Sample Space (Ω)

Event: object of the questions we ask about the result of the experiment,
such that the possible answers are: "it occurs" or "it does not occur".
Elementary events: those that cannot be decomposed into others of lesser entity.

Sample Space Ω: the set of all the possible elementary results of a
random experiment.

The elementary events have to be:
  exclusive: if one happens, no other occurs → eₖ ∩ eⱼ = ∅ ; ∀k, j, k ≠ j
  exhaustive: any possible elementary result has to be included in Ω → Ω = ∪ₖ eₖ
so {eₖ} is a partition of Ω.

  sure event: to get any result contained in Ω
  impossible event: to get a result that is not contained in Ω
  random event: any event that is neither impossible nor sure

dim(Ω):
  finite — drawing a die: Ω = {e₁, e₂, e₃, e₄, e₅, e₆}
  denumerable — throw a coin and stop when we get heads: Ω = {h, th, tth, ttth, …}
  non-denumerable — decay time of a particle: Ω = {t ∈ ℝ⁺}
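The three kinds of sample space can be sampled directly; a minimal Python sketch (the mean lifetime `tau` is an illustrative assumption, not from the slides):

```python
import random

random.seed(1)

# Finite sample space: drawing a die
die = {1, 2, 3, 4, 5, 6}

# Denumerable sample space: throw a coin until the first head;
# the elementary result is the number of throws needed (1, 2, 3, ...)
def throws_until_head():
    n = 1
    while random.random() < 0.5:  # tail with probability 1/2
        n += 1
    return n

# Non-denumerable sample space: decay time of a particle, t in (0, inf)
def decay_time(tau=1.0):
    return random.expovariate(1.0 / tau)

outcomes = [random.choice(sorted(die)) for _ in range(1000)]
assert set(outcomes) <= die                       # only elementary events occur
assert all(throws_until_head() >= 1 for _ in range(100))
assert all(decay_time() > 0.0 for _ in range(100))
```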


2) Measurable Space (Ω, B)

Ω: Sample Space ; B: σ-algebra (Boole) — a class closed under
complementation and denumerable union.

Why an algebra B? We are interested in a class of events that:
1) contains all possible results of the experiment in which we are interested;
2) is closed under union and complementation:
   A₁, A₂ ∈ B ⟹ A₁ᶜ ∈ B ; A₁ ∪ A₂ ∈ B
and therefore
   Ω ∈ B ; ∅ ∈ B ; A₁ ∩ A₂ ∈ B ; A₁ \ A₂ ∈ B ; A₁ △ A₂ ∈ B ; …

So now B:
1) has all the elementary events (to which we shall assign probabilities);
2) has all the events we are interested in (whose probabilities we deduce).

EXAMPLE: Ω = {e₁, e₂, e₃, e₄, e₅, e₆}
Several possible algebras:
  Minimal: B_min = {Ω, ∅}
  Interest in evenness: A = {get an even number}, Ā = {get an odd number}:
    B = {Ω, ∅, A, Ā}
  Maximal: B_max = {Ω, ∅, all possible subsets of Ω}
With dim(Ω) = n, there are C(n, k) subsets with k elements, C(n, 0) = 1 (∅)
and C(n, n) = 1 (Ω), so B_max has Σₖ₌₀ⁿ C(n, k) = 2ⁿ elements.

dim(Ω) finite or denumerable: B has the structure of a Boole algebra.
For the denumerable case, generalize the Boole algebra so that ∪ and ∩ can
be performed a denumerable number of times, the result being an event of
the same class (closed):
   {Aᵢ}ᵢ₌₁^∞ ∈ B ⟹ ∪ᵢ₌₁^∞ Aᵢ ∈ B , ∩ᵢ₌₁^∞ Aᵢ ∈ B , …
   A ∈ B ⟹ Aᶜ ∈ B
Then B has the structure of a σ-algebra. Remember that:
1) All σ-algebras are Boole algebras
2) Not all Boole algebras are σ-algebras
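The finite-algebra bookkeeping above can be checked mechanically; a minimal Python sketch (the sample space and event names are illustrative):

```python
from itertools import chain, combinations

omega = frozenset(range(1, 5))  # a small finite sample space, n = 4

def powerset(s):
    s = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

def is_algebra(B, omega):
    """Closure under complementation and (finite) union, with Omega and the empty set."""
    if omega not in B or frozenset() not in B:
        return False
    if any(omega - A not in B for A in B):
        return False
    return all(A1 | A2 in B for A1 in B for A2 in B)

B_min = {frozenset(), omega}                    # minimal algebra
A = frozenset({2, 4})                           # "even number" event
B_even = {frozenset(), omega, A, omega - A}     # algebra of evenness
B_max = powerset(omega)                         # maximal: all subsets

assert is_algebra(B_min, omega)
assert is_algebra(B_even, omega)
assert is_algebra(B_max, omega)
assert len(B_max) == 2 ** len(omega)            # 2^4 = 16 elements
```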
dim(Ω) non-denumerable: in general, in non-denumerable topological spaces
there are subsets that cannot be considered as events. Which are the
"elementary" events? We are mainly interested in ℝⁿ.

ℝ: a linear set of points. Among its possible subsets are the intervals,
with points as degenerate intervals:
   {a} = [a, a] ; ℝ = (−∞, ∞) ; (a, a) = (a, a] = [a, a) = ∅
Any interval, denumerable or not, is a subset of ℝ … but … a finite or
denumerable intersection of intervals is an interval, while a finite or
denumerable union of intervals is not, in general, an interval.

Generate a σ-algebra, for instance, from half-open intervals on the right:
1) Initial set: contains all half-open intervals on the right, [a, b).
2) Form the class A by adding their denumerable unions, intersections and
   complements, e.g.
   (a, b) = ∪ₙ₌₁^∞ [a + 1/n, b) ; [a, b] = ∩ₙ₌₁^∞ [a, b + 1/n) ; {a} = [a, a]
   It has all intervals and points (degenerate intervals).
3.1) There is at least one σ-algebra containing A (add sets to close under
   denumerable union and complementation).
3.2) The intersection of any number of σ-algebras is a σ-algebra, so there
   exists a smallest σ-algebra containing A:

Borel σ-algebra (B_ℝ): the minimal σ-algebra of subsets of ℝ generated by
{[a, b)}. (One may do with (a, b], (a, b), [a, b] as well.) Its elements are
the Borel sets (borelians). Since it is generated by intervals, P will be
assigned to intervals; note that ℕ, ℤ, ℚ ∈ B_ℝ.
3) Measure Space (Ω, B, μ)

3.1) Measure: a set function μ : A ∈ B → μ(A) ∈ ℝ over the sets of B that is
i) σ-additive: for any sequence of disjoint sets A₁, A₂, … with Aᵢ ∩ Aⱼ = ∅
   for i ≠ j,
   μ(∪ᵢ₌₁^∞ Aᵢ) = Σᵢ₌₁^∞ μ(Aᵢ)
ii) non-negative: μ : A ∈ B → μ(A) = λ ≥ 0
► (Ω, B, μ) is a Measure Space.

3.2) Probability Measure (notation P): μ : A ∈ B → μ(A) ∈ [0, 1] ⊂ ℝ
(univocal), with μ(Ω) = 1 (certainty), so e.g. μ(Aᶜ) = 1 − μ(A), …
► (Ω, B, P) is a Probability Space.

Remember that:
- Any bounded measure can be converted into a probability measure.
- All Borel sets of ℝⁿ are Lebesgue measurable.
- There are non-denumerable subsets of ℝ with zero Lebesgue measure
  (the Cantor ternary set).
- With the axiom of choice (Solovay 1970), not all subsets of ℝ are
  measurable, and hence are not Borel sets (e.g. Vitali's set).
Random Variables

The results of the experiment are not necessarily numeric, so we associate
to each elementary event of the sample space Ω one, and only one, real
number through a function (unfortunately called a "random variable"):
   X(w) : w ∈ Ω → X(w) ∈ ℝ

Induced space: X : (Ω, B) → (Ω_I, B_I), so (Ω, B, P) → (Ω_I, B_I, P_I),
with X⁻¹(A) ∈ B for all A ∈ B_I. Usually (Ω_I, B_I) = (ℝ, B_ℝ), and we ask
for P(X = k) or P(X ∈ (a, b)).

To keep the structure of the σ-algebra it is necessary that X(w) be
Lebesgue (… Borel) measurable:
   X⁻¹(A) ∈ B ; ∀A ∈ B_ℝ
(f(w) : Ω → Δ Borel measurable with respect to the σ-algebra associated to Ω.)

A "random variable" is neither variable nor random: what is random is the
outcome of the experiment before it is done — our knowledge of the result
before observation, …

EXAMPLE: Interest in evenness: Ω = {e₁, e₂, e₃, e₄, e₅, e₆},
A = {get an even number}, Ā = {get an odd number}, B = {Ω, ∅, A, Ā}.
Consider two functions X(w) : w ∈ Ω → X(w) ∈ ℝ:
1) X(e₁) = X(e₂) = X(e₃) = −1 ; X(e₄) = X(e₅) = X(e₆) = +1
2) X(e₂) = X(e₄) = X(e₆) = −1 ; X(e₁) = X(e₃) = X(e₅) = +1
Check the preimages X⁻¹((−∞, a]):
1) a < −1 → ∅ ∈ B ; −1 ≤ a < 1 → {e₁, e₂, e₃} ∉ B ; 1 ≤ a → Ω ∈ B
   (not measurable with respect to B)
2) a < −1 → ∅ ∈ B ; −1 ≤ a < 1 → {e₂, e₄, e₆} = A ∈ B ; 1 ≤ a → Ω ∈ B
   (measurable with respect to B)
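The preimage check above is easy to automate; a minimal Python sketch of the same two candidate functions (the threshold list is an illustrative choice — the preimage only changes at the jump values ±1):

```python
omega = {1, 2, 3, 4, 5, 6}
A = frozenset({2, 4, 6})                                      # even numbers
B = [frozenset(), frozenset(omega), A, frozenset(omega) - A]  # {Omega, empty, A, A-bar}

def preimage(X, a):
    """X^{-1}((-inf, a]) as a subset of omega."""
    return frozenset(w for w in omega if X[w] <= a)

# Case 1: splits {1,2,3} vs {4,5,6} -- not adapted to B
X1 = {1: -1, 2: -1, 3: -1, 4: 1, 5: 1, 6: 1}
# Case 2: splits even vs odd -- adapted to B
X2 = {2: -1, 4: -1, 6: -1, 1: 1, 3: 1, 5: 1}

def measurable(X):
    # enough to probe around the jump thresholds of X
    return all(preimage(X, a) in B for a in (-2, -1, 0, 1, 2))

assert not measurable(X1)   # {e1,e2,e3} is not an element of B
assert measurable(X2)       # preimages are only emptyset, A, Omega
```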
Types of Random Quantities

According to the range of X(w): X(w) : w ∈ Ω → ℝ maps (Ω, B, Q) to
(ℝ, B_ℝ, P). If the range Ω_X is a finite or denumerable set, X is
Discrete; if it is a non-denumerable set, X is Continuous.

► Simple random quantity: {Aₖ ; k = 1, …, n} a finite partition of Ω,
  Ω_X = {xₖ ∈ ℝ ; k = 1, …, n} ⊂ ℝ, and X a simple function:
     X(w) = Σₖ₌₁ⁿ xₖ 1_{Aₖ}(w)
► Elementary random quantity: {Aₖ ; k = 1, …} a countable partition of Ω,
  Ω_X = {xₖ ∈ ℝ ; k = 1, …} ⊂ ℝ, and X an elementary function:
     X(w) = Σₖ₌₁^∞ xₖ 1_{Aₖ}(w)
In both cases
     P(X = xₖ) = P(Aₖ) = ∫_Ω 1_{Aₖ}(w) dQ(w)
     P(X ∈ A) = ∫_ℝ 1_A(w) dP(w)
► X(w) absolutely continuous: Ω_X ⊂ ℝ a non-denumerable set and
     P(X ∈ A) = ∫_ℝ 1_A(w) dP(w) = ∫_A dP(w) = ∫_A f(w) dw
  (Radon-Nikodym Theorem)
► Singular (see the Lebesgue decomposition)

Radon-Nikodym Theorem (1913; 1930)

If ν, μ are two σ-finite measures over the measurable space (Ω, B), with ν
absolutely continuous with respect to μ (μ(A) = 0 ⟹ ν(A) = 0), then there
exists a measurable function f(w) over B with range in [0, ∞)
(non-negative), unique up to a set of zero measure (if g(w) has the same
properties as f(w), then μ({x | f(x) ≠ g(x)}) = 0), such that ∀A ∈ B
   ν(A) = ∫_A dν(w) = ∫_A f(w) dμ(w) ,   f(w) = (dν/dμ)(w)

For X(w) : w ∈ Ω → ℝ mapping (Ω, B, Q) to (ℝ, B_ℝ, P), take μ = λ the
Lebesgue measure; if λ(A) = 0 ⟹ P(A) = 0 for A ∈ B_ℝ (with X⁻¹(A) ∈ B),
the Radon-Nikodym derivative is the probability density function:
   P(X ∈ A) = ∫_A p(x) dx
Remember that: the set of points of ℝ with finite probability is denumerable.

Let W = {x ∈ ℝ | P(x) > 0} and consider the partition W = ∪_{k=1}^∞ Wₖ with
   W₁ = {x | 1/2 < P(x) ≤ 1}
   W₂ = {x | 1/3 < P(x) ≤ 1/2}
   …
   Wₖ = {x | 1/(k+1) < P(x) ≤ 1/k}
If x ∈ W, then x belongs to some Wₖ; if x ∈ Wₖ, then x ∈ W. Each set Wₖ
has, at most, k points, for otherwise Σᵢ P(xᵢ) > (k+1) · 1/(k+1) = 1.
Therefore W, a denumerable union of finite sets, is a denumerable set — so
if Ω_X is ∞-denumerable, it is not possible that all the points have the
same probability.

If X is absolutely continuous, λ({a}) = 0 so P(X = a) = 0, but {X = a} is
not an impossible result.
DISTRIBUTION FUNCTION

Gen. Def. (one-dimensional DF): F : x ∈ Ω_X ⊆ ℝ → F(x) ∈ ℝ such that
1) Continuous on the right: lim_{ε→0⁺} F(x + ε) = F(x) ∀x ∈ ℝ,
   i.e. F(x + 0⁺) = F(x)
2) Monotonous non-decreasing: x₁, x₂ ∈ ℝ and x₁ < x₂ ⟹ F(x₁) ≤ F(x₂)
3) Limits: lim_{x→−∞} F(x) = 0 ; lim_{x→+∞} F(x) = 1,
   i.e. F(−∞) = 0 ; F(+∞) = 1

Def.- The DF associated to the random quantity X is the function
   F(x) = P(X ≤ x) = P(X ∈ (−∞, x]) ; x ∈ ℝ
The Distribution Function of a random quantity has all the information
needed to describe the properties of the random process.

Properties:
   F(x) = P(X ≤ x) = P(X ∈ (−∞, x])
   P(X < x) = F(x − ε) , ε → 0⁺
   P(X > x) = 1 − P(X ≤ x) = 1 − F(x)
   P(x₁ < X ≤ x₂) = P(X ∈ (x₁, x₂]) = F(x₂) − F(x₁)

Properties of the DF (some):
● The DF is defined for all x ∈ ℝ; if X takes values in [a, b] ⊂ ℝ, then
  F(x) = 0 for x < a and F(x) = 1 for x ≥ b.
● The set of points of discontinuity of the DF is finite or denumerable.
  Let D = {x ∈ ℝ | F(x − ε) ≠ F(x + ε)}:
  1) F is monotonous non-decreasing, so F(x − ε) ≤ F(x + ε);
  2) F is increasing at x ∈ D, so there exists a rational r(x) ∈ ℚ with
     F(x − ε) < r(x) < F(x + ε);
  3) for x₁, x₂ ∈ D with x₁ < x₂,
     r(x₁) < F(x₁ + ε) ≤ F(x₂ − ε) < r(x₂),
     so the r(x) associated to each x ∈ D is different: x ∈ D → r(x) ∈ ℚ
     is a one-to-one relation, and D is at most denumerable.
● At each point of discontinuity,
     P(x − ε < X ≤ x) = P(X ∈ (x − ε, x]) = F(x) − F(x − ε)
  so lim_{ε→0⁺} [F(x) − F(x − ε)] = P(X = x): F(x) has a jump of amplitude
  P(X = x).

Probability Measure ↔ Distribution Function:
For each DF there exists a unique probability measure defined over the
Borel sets that assigns the probability F(x₂) − F(x₁) to each half-open
interval (x₁, x₂] ⊂ ℝ. Reciprocally, to each probability measure defined
on the measurable space (ℝ, B_ℝ) corresponds a DF.

Random quantity discrete / continuous ↔ Distribution Function step-wise /
singular or absolutely continuous.
Discrete Random Quantity

X : (Ω, B, Q) → (ℝ, B_ℝ, P), X(w) : Ω → ℝ with range
Ω_X = {x₁, x₂, …} ⊂ ℝ a finite or denumerable set. X takes the values xᵢ
with probabilities P(X = xᵢ) = pᵢ: real, non-negative, and Σₖ pₖ = 1.

DF:
   F(x) = P(X ≤ x) = Σₖ pₖ 1_{(−∞, x]}(xₖ) ;  F(−∞) = 0 ; F(+∞) = 1
1) Step-wise and monotonous non-decreasing;
2) Constant everywhere but at the points of discontinuity, where it has a
   jump F(xₖ) − F(xₖ − ε) = P(X = xₖ) = pₖ.

EXAMPLE: Ω_X = {1, 2, …} with pₖ = P(X = k) = 1/2ᵏ, so Σᵢ pᵢ = 1:
   F(x) = P(X ≤ x) = Σₖ pₖ 1_{(−∞, x]}(xₖ)
with jumps F(xₖ) − F(xₖ − ε) = pₖ and F(−∞) = 0 ; F(+∞) = 1.
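The example DF can be evaluated directly; a minimal Python sketch (the series is truncated at an assumed `kmax`, which is harmless since the tail is 2⁻ᵏᵐᵃˣ):

```python
def F(x, kmax=60):
    """DF of P(X = k) = 2**-k, k = 1, 2, ..., evaluated at real x."""
    return sum(2.0 ** -k for k in range(1, kmax + 1) if k <= x)

assert F(0.5) == 0.0                      # below the first jump
assert abs(F(1) - 0.5) < 1e-12            # jump p_1 = 1/2 at x = 1
assert abs(F(2.7) - 0.75) < 1e-12         # 1/2 + 1/4, constant between jumps
assert abs(F(100) - 1.0) < 1e-12          # F(+inf) -> 1
assert F(3) - F(3 - 1e-9) > 0.12          # jump of amplitude p_3 = 1/8
```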
Continuous Random Quantity

X : (Ω, B, Q) → (ℝ, B_ℝ, P), X(w) : Ω → ℝ with range Ω_X ⊂ ℝ a
non-denumerable set. F(x) is continuous everywhere in ℝ:
   F(x − ε) = F(x) ⟹ P(X = x) = F(x) − F(x − ε) = 0

If X is absolutely continuous (Radon-Nikodym, with respect to the Lebesgue
measure λ):
   P(A) = ∫_A dP = ∫_A (dP/dλ) dλ = ∫_A p(w) dw

Probability Density Function p(x):
   P(X ≤ x) = F(x) = ∫_{−∞}^{x} p(u) du ,   p(x) = dF(x)/dx
1) p(x) ≥ 0 ∀x ∈ ℝ (determined uniquely a.e.);
2) bounded in every bounded interval of ℝ and Riemann integrable on it;
3) ∫_ℝ p(x) dx = 1.

EXAMPLE: X ~ Cs(0,1) (Cauchy):
   p(x) = (1/π) · 1/(1 + x²)
   F(x) = ∫_{−∞}^{x} p(u) du = 1/2 + (1/π) arctan(x)
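The Cauchy pair (p, F) above can be cross-checked numerically; a minimal Python sketch (the integration cutoff and grid are illustrative choices):

```python
import math

def p(x):
    """Cauchy Cs(0,1) density."""
    return 1.0 / (math.pi * (1.0 + x * x))

def F(x):
    """Closed-form DF: 1/2 + arctan(x)/pi."""
    return 0.5 + math.atan(x) / math.pi

def F_numeric(x, lo=-1e4, n=200000):
    """Trapezoidal integration of p from lo to x (tail below lo ~ 3e-5)."""
    h = (x - lo) / n
    s = 0.5 * (p(lo) + p(x)) + sum(p(lo + i * h) for i in range(1, n))
    return s * h

for x in (-2.0, 0.0, 1.0, 3.0):
    assert abs(F_numeric(x) - F(x)) < 1e-3

assert abs(F(0.0) - 0.5) < 1e-12     # symmetry: median at 0
```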
EXAMPLE (discrete, for contrast): a quantity Xₙ with supp[Xₙ] ⊂ [0, 2] and
P(Xₙ = 0) = P(Xₙ = 2) = 1/2: its DF F(x) = P(X ≤ x) is a step function and
cannot be written as F(x) = ∫_{−∞}^{x} p(u) du for an ordinary density p.
General Distribution Function (Lebesgue Decomposition)

   F(x) = Σᵢ₌₁^{N_D} aᵢ Fᵢᴰ(x) + Σⱼ₌₁^{N_C} bⱼ Fⱼᴬᶜ(x) + Σₖ₌₁^{N_S} cₖ Fₖˢ(x)

Discrete part Fᴰ: a step function (simple or elementary) with a
denumerable number of jumps P(X = xₙ), Σₙ P(X = xₙ) = 1
(Poisson, Binomial, …).

Absolutely continuous part Fᴬᶜ: F(x) = ∫_{−∞}^{x} p(u) du with
p(x) = F′(x) almost everywhere; pdf p(x) with ∫_{−∞}^{∞} p(x) dx = 1
(Normal, Gamma, …).

Singular part Fˢ: F(x) continuous with F′(x) = 0 almost everywhere
(Dirac Delta, Cantor, …).
CONDITIONAL PROBABILITY and BAYES THEOREM

Given a probability space (Ω, B, P):
● the probability assigned to an event A ∈ B depends on the information we
have.

Two consecutive extractions without replacement: what is the probability
to get a red ball in the second extraction?
1) I do not know the outcome of the first: P(r) = 1/2
2) It was black: P(r) = 2/3
All probabilities are conditional.

Conditional Probability — Statistical Independence

Consider (Ω, B, P) and two non-disjoint sets A, B ∈ B, A ∩ B ≠ ∅.
Since Ω = B ∪ B̄,
   P(A) = P(A ∩ Ω) = P(A ∩ B) + P(A ∩ B̄) = P(A, B) + P(A, B̄)
(probability for A and B to happen, plus for A and not B).
Notation: P(A ∩ B ∩ C ∩ …) ≡ P(A, B, C, …).

What is the probability for A to happen if we know that B has already
occurred? Write P(A|B) = C · P(A ∩ B). Since P(B|B) = 1 = C · P(B ∩ B) =
C · P(B), we get C⁻¹ = P(B) and
   P(A|B) = P(A, B) / P(B) ,  P(B) > 0

Generalization (any of the n! possible arrangements):
   P(A₁, A₂, …, Aₙ) = P(A₁ | A₂, …, Aₙ) P(A₂, …, Aₙ)
                    = P(A₁ | A₂, …, Aₙ) P(A₂ | A₃, …, Aₙ) ⋯ P(Aₙ)
and likewise starting from any other index, e.g.
   P(A₁, A₂, …, Aₙ) = P(A₂ | A₁, A₃, …, Aₙ) P(A₁ | A₃, …, Aₙ) ⋯ P(Aₙ)
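The urn numbers quoted above (1/2 and 2/3) are consistent with an urn of 2 red and 2 black balls — an assumption on my part, since the slide does not state the composition. A minimal Python simulation:

```python
import random

random.seed(7)

N = 200000
second_red = 0
first_black = 0
second_red_and_first_black = 0

for _ in range(N):
    urn = ['r', 'r', 'b', 'b']      # assumed composition: 2 red, 2 black
    random.shuffle(urn)
    first, second = urn[0], urn[1]  # two extractions without replacement
    if second == 'r':
        second_red += 1
    if first == 'b':
        first_black += 1
        if second == 'r':
            second_red_and_first_black += 1

p_r2 = second_red / N                                  # unconditional ~ 1/2
p_r2_b1 = second_red_and_first_black / first_black     # conditional ~ 2/3

assert abs(p_r2 - 0.5) < 0.01
assert abs(p_r2_b1 - 2 / 3) < 0.01
```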
Statistical Independence

P(A|B) = P(A): A does not depend on B — that B has already happened does
not change the probability of occurrence of A.

Correlation: P(A|B) > P(A) (positive) ; P(A|B) < P(A) (negative).

Caution! A finite collection of n events A = {A₁, A₂, …, Aₙ} ⊂ B is
independent iff for each subset {A_p, …, A_m} ⊆ A
   P(A_p, …, A_m) = P(A_p) ⋯ P(A_m)
that is
   P(Aᵢ, Aⱼ) = P(Aᵢ) P(Aⱼ) ;  i, j = 1, …, n ; i ≠ j
   P(Aᵢ, Aⱼ, Aₖ) = P(Aᵢ) P(Aⱼ) P(Aₖ) ;  i, j, k = 1, …, n ; i ≠ j ≠ k
   …

Conditional dependence: P(A|B) = P(A) says A is independent of B … one
should say "unconditionally" independent. It may happen that A depends on
B through a third event C:
   P(A, B) = P(A) P(B)  but  P(A, B | C) ≠ P(A | C) P(B | C)
Theorem of Total Probability

Let {Bₖ ; k = 1, …, n} be a partition of the sample space:
   Ω = ∪ⱼ₌₁ⁿ Bⱼ ,  Bᵢ ∩ Bⱼ = ∅ for i ≠ j
Then
   P(A) = P(A ∩ Ω) = P(A ∩ (∪ₖ₌₁ⁿ Bₖ)) = P(∪ₖ₌₁ⁿ (A ∩ Bₖ))
        = Σₖ₌₁ⁿ P(A ∩ Bₖ) = Σₖ₌₁ⁿ P(A | Bₖ) P(Bₖ)
i.e.
   P(A) = Σₖ₌₁ⁿ P(A, Bₖ) = Σₖ₌₁ⁿ P(A | Bₖ) P(Bₖ)

Theorem of Total Probability with Conditional Probabilities:
   P(A, B, C) = P(A | B, C) P(B, C) = P(A | B, C) P(C | B) P(B)
   P(A, B) = Σ_C P(A, B, C)
   P(A | B) = Σ_C P(A | C, B) P(C | B)
Bayes Theorem
The Reverend Thomas Bayes, F.R.S.
(1701?-1761)
LII. An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr.
Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F. R. S.
Dear Sir,
Read Dec. 23, 1763. I now send you an essay which I have found among the papers of our deceased
friend Mr. Bayes, and which, in my opinion, has great merit, and well deserves to be preserved.
Experimental philosophy, you will find, is nearly interested in the subject of it; and on this account
there seems to be particular reason for thinking that a communication of it to the Royal Society cannot
be improper.
… to find out a method by which we might judge concerning the probability that an event has to
happen, in given circumstances, upon supposition that we know nothing concerning it but that, under
the same circumstances, it has happened a certain number of times, and failed a certain other
number of times.
…some rule could be found, according to which
we ought to estimate the chance that the
probability for the happening of an event perfectly
unknown, should lie between any two named
degrees of probability, antecedently to any
experiments made about it; …
Common sense is indeed sufficient to shew us that, from the observation of
what has in former instances been the consequence of a certain cause or
action, one may make a judgement what is likely to be the consequence of it
another time.
P(A, B) = P(A | B) P(B) = P(B | A) P(A)
   ⟹ P(B | A) = P(A | B) P(B) / P(A)

Let {Hₖ ; k = 1, …, n} be a partition of the sample space. Then
   P(Hᵢ | A) = P(A | Hᵢ) P(Hᵢ) / P(A) ,  i = 1, …, n
where:
- P(A | Hᵢ): probability of occurrence of event A, (cause, hypothesis) Hᵢ
  having occurred;
- P(Hᵢ): probability of occurrence of the event (cause, hypothesis) Hᵢ
  "a priori", before we know whether event A has happened;
- P(A): normalisation;
- P(Hᵢ | A): probability ("a posteriori") for event Hᵢ to happen having
  observed the occurrence of event (effect) A — the probability that Hᵢ
  be the cause (hypothesis) of the observed effect A.

Other forms of Bayes Theorem. With {Hₖ ; k = 1, …, n} a partition of the
sample space, BT + Total Probability Theorem gives
   P(Hᵢ | A) = P(A | Hᵢ) P(Hᵢ) / Σₖ₌₁ⁿ P(A | Hₖ) P(Hₖ)
Adding a general hypothesis H₀ (all probabilities are conditional):
   P(Hᵢ | A, H₀) = P(A | Hᵢ, H₀) P(Hᵢ | H₀) / Σₖ₌₁ⁿ P(A | Hₖ, H₀) P(Hₖ | H₀)
… and many more to come …
Marginal and Conditional Densities

(Stieltjes-Lebesgue integral)
   F(x₁, x₂) = P(X₁ ≤ x₁, X₂ ≤ x₂) = ∫_{−∞}^{x₁} dw₁ ∫_{−∞}^{x₂} p(w₁, w₂) dw₂

Marginal densities from the joint density p(x₁, x₂):
   p(x₁) = ∫_{−∞}^{∞} p(x₁, x₂) dx₂ ,  p(x₂) = ∫_{−∞}^{∞} p(x₁, x₂) dx₁

Conditional densities (Def., for p(x) > 0):
   p(x₂ | x₁) = p(x₁, x₂) / p(x₁) ,  p(x₁ | x₂) = p(x₁, x₂) / p(x₂)
so p(x₁, x₂) = p(x₂ | x₁) p(x₁) = p(x₁ | x₂) p(x₂); independence means
   p(x₂ | x₁) = p(x₂) ,  p(x₁ | x₂) = p(x₁)
EXAMPLE: CAUSE-EFFECT

A certain disease occurs in 1 out of 1000 individuals. There is a
diagnostic test such that:
- if a person is sick, it gives positive in 99% of the cases;
- if a person is healthy, it gives positive in 2% of the cases.
A person has given positive in the test. What are the chances that he is
sick?

Hypotheses or causes to analyze (exclusive and exhaustive), with T the
event T = {give positive in the test}:
   H₁: is sick ;  H₂ = H̄₁: is healthy
Data: incidence in the population → "a priori" probabilities:
   P(H₁) = 1/1000 ,  P(H₂) = 1 − P(H₁) = 999/1000
   P(T | H₁) = 99/100 ,  P(T | H̄₁) = 2/100

Bayes Theorem:
   P(H₁ | T) = P(T | H₁) P(H₁) / P(T)
Total Probability Theorem:
   P(T) = Σₖ₌₁² P(T | Hₖ) P(Hₖ) = (99/100)(1/1000) + (2/100)(999/1000)
so
   P(H₁ | T) = (99/100)(1/1000) / P(T) ≈ 0.047
even though P(T | H₁) = 0.99 !!
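The computation above reduces to a few lines; a minimal Python sketch:

```python
p_h1 = 1 / 1000           # prior: incidence of the disease
p_h2 = 1 - p_h1
p_t_h1 = 99 / 100         # P(positive | sick)
p_t_h2 = 2 / 100          # P(positive | healthy)

# Total probability
p_t = p_t_h1 * p_h1 + p_t_h2 * p_h2

# Bayes theorem
p_h1_t = p_t_h1 * p_h1 / p_t                    # P(sick | positive)
p_h2_t = 1 - p_h1_t                             # P(healthy | positive)
p_h1_not_t = (1 - p_t_h1) * p_h1 / (1 - p_t)    # P(sick | negative)

assert abs(p_h1_t - 0.047) < 0.001
assert abs(p_h2_t - 0.953) < 0.001
assert p_h1_not_t < 2e-5
```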
The test is costly, aggressive, … If a person gives positive, what are the
chances that he is healthy?
   P(H̄₁ | T) = 1 − P(H₁ | T) ≈ 0.953
The disease is serious… What are the chances of being sick having given
negative in the test?
   P(H₁ | T̄) = P(T̄ | H₁) P(H₁) / (1 − P(T))
             = (1 − P(T | H₁)) P(H₁) / (1 − P(T)) ≈ 10⁻⁵
Probabilities of interest as a function of the known data and the incidence
of the disease in the population, x = P(H₁):

Probability to be sick giving positive (sick correctly detected):
   P(H₁ | T) = P(T|H₁) x / [P(T|H₁) x + P(T|H̄₁)(1 − x)] = 99x / (2 + 97x)
Probability to be healthy giving positive (healthy undergoing a costly and
aggressive treatment):
   P(H̄₁ | T) = P(T|H̄₁)(1 − x) / [P(T|H₁) x + P(T|H̄₁)(1 − x)]
             = 2(1 − x) / (2 + 97x)
Probability to be sick giving negative (sick undetected):
   P(H₁ | T̄) = (1 − P(T|H₁)) x / (1 − P(T)) = x / (98 − 97x)

From the given data, the interest is to minimise P(H̄₁ | T) and P(H₁ | T̄)
and to maximize P(H₁ | T) → optimum efficiency.
Receiver Operating Characteristic (ROC curve)

For each cut value c of the selection, with density f₁(x) under H and
f₂(x) under H̄:
   P(+|H, c) = F₁(c) = ∫ f₁(u) du   (true positives)
   P(+|H̄, c) = F₂(c) = ∫ f₂(u) du  (false positives)
(the integrals run over the region accepted by the cut c). Plotting the
true-positive rate against the false-positive rate, the ROC curve is the
set of points (x, f(x)) with x = F₂(c) and f(x) = F₁(F₂⁻¹(x)), running
from (0, 0) to (1, 1). A random classifier gives the diagonal (x, x), with
area A = 0.5; in general the area under the curve is
   A = ∫₀¹ f(x) dx = ∫₀¹ F₁(F₂⁻¹(x)) dx = ∫ f₂(u) du ∫^u f₁(x) dx
The optimum working point is the one closest to the corner (0, 1), e.g.
maximizing the distance to the diagonal, d² = (f(x) − x)² → max(d); if the
curve falls below the diagonal, reverse the selection criteria.
STOCHASTIC CHARACTERISTICS

Mathematical Expectation. As usual, X(w) : Ω → ℝ maps (Ω, B, Q) to
(ℝ, B_ℝ, P).

X(w) discrete:
   X(w) = Σₖ xₖ 1_{Aₖ}(w) ,  P(X = xₖ) = P(Aₖ) = ∫_Ω 1_{Aₖ}(w) dQ(w)
X(w) absolutely continuous:
   P(X ∈ A) = ∫_ℝ 1_A(w) dP(w) = ∫_A dP(w) = ∫_A f(w) dw

Def.: Expectation of Y = g[X(w)] (Stieltjes-Lebesgue integral):
   E[Y] = E[g(X)] = ∫_ℝ g[X(w)] dP(w) = ∫_ℝ g(x) dF(x)
        = Σₖ g(xₖ) P(X = xₖ)   (discrete)
        = ∫_ℝ g(x) f(x) dx     (absolutely continuous)
Moments (wrt origin), provided xⁿ p(x) ∈ L₁(ℝ):
   αₙ = E[Xⁿ] = ∫_ℝ xⁿ p(x) dx   (also denoted mₙ) ;  α₀ = 1
► Mean: μ = E[X] = ∫_ℝ x p(x) dx
E is a linear operator: X = c₀ + Σᵢ cᵢ Xᵢ , cᵢ ∈ ℝ ⟹
   E[X] = c₀ + Σᵢ cᵢ E[Xᵢ]
► If {Xᵢ}ᵢ₌₁ⁿ are independent and X = Πᵢ Xᵢ, then E[X] = Πᵢ E[Xᵢ]

Moments wrt a point c ∈ ℝ:
   E[(X − c)ⁿ] = ∫_ℝ (x − c)ⁿ p(x) dx
If α₂ ≠ 0, min_{c∈ℝ} E[(X − c)²] is attained at c = μ
…Moments wrt Mean

Moments wrt Mean (provided xⁿ p(x) ∈ L₁(ℝ)):
   μₙ = E[(X − μ)ⁿ] = ∫_ℝ (x − μ)ⁿ p(x) dx
► Variance:
   σ² = V[X] = E[(X − μ)²] = ∫_ℝ (x − μ)² p(x) dx  (≥ 0)
   Watch!! V is NOT linear: Y = c₀ + c₁X ⟹ V[Y] = σ_Y² = c₁² σ_X²
► Skewness: γ₁ = μ₃ / σ³
► Kurtosis: γ₂ = μ₄ / σ⁴ − 3
Poisson: P(X = k) = e^{−λ} λᵏ / k! , k = 0, 1, 2, …
   αₙ = Σₖ₌₀^∞ kⁿ P(X = k) = e^{−λ} Σₖ₌₀^∞ kⁿ λᵏ / k!
The series converges absolutely for every n:
   lim_{k→∞} |aₖ₊₁ / aₖ| = lim_{k→∞} (λ / (k+1)) (1 + 1/k)ⁿ = 0
so all the moments exist.

In contrast, the discrete distribution P(X = n) = 6 / (π² n²), n = 1, 2, …
has no moments αₖ = E[Xᵏ] for k ≥ 1.

Cauchy: p(x) = 1 / [π (1 + x²)]
   ∫_{−∞}^{∞} |xⁿ p(x)| dx = ∞ for n ≥ 1  (the CPV exists for n = 1)
No moments (no mean, no variance, …).
Global Picture

Position — Mean E[X]; Mode x₀ = sup_x p(x); Median x_m with
F(x_m) = P(X ≤ x_m) = 1/2; q-quantile: F(x_q) = P(X ≤ x_q) = q ; …
Dispersion — Variance σ²
Asymmetry — Skewness γ₁ = μ₃/σ³:
   γ₁ > 0 right-tailed (typically Mode < Median < Mean)
   γ₁ = 0 symmetric
   γ₁ < 0 left-tailed (typically Mode > Median > Mean)
Peakedness — Kurtosis γ₂ = μ₄/σ⁴ − 3:
   γ₂ > 0 more peaked than Normal ; γ₂ = 0 Normal ; γ₂ < 0 flatter
Covariance (and "Linear Correlation")

   V[X₁, X₂] = E[(X₁ − μ₁)(X₂ − μ₂)] = E[X₁X₂] − μ₁μ₂
{X₁, X₂} independent ⟹ V[X₁, X₂] = 0 (the converse does not hold).
Correlation coefficient:
   ρ₁₂ = V[X₁, X₂] / (σ₁ σ₂) ,  −1 ≤ ρ₁₂ ≤ 1
(|ρ₁₂| ≤ 1 by the Hölder inequality).

Comments:
- Linear relation X₂ = aX₁ + b ⟹ ρ₁₂ = ±1
- Quadratic relation X₂ = a + cX₁²: if the distribution of X₁ is symmetric,
  then ρ₁₂ = 0 — zero linear correlation does not imply independence.
Useful expression: Taylor expansion for the variance of Y = g(X₁, X₂, …),
with E(Xᵢ) = μᵢ:
   Y = g(X₁, X₂) ≈ g(μ₁, μ₂) + (∂g/∂x₁)|_{(μ₁,μ₂)} (x₁ − μ₁)
                 + (∂g/∂x₂)|_{(μ₁,μ₂)} (x₂ − μ₂) + O(D²ᵢⱼ)
   E[Y] = E[g(X₁, X₂)] ≈ g(μ₁, μ₂) + O(D²ᵢⱼ)
   Y − E[Y] ≈ (∂g/∂x₁)|_{(μ₁,μ₂)} (x₁ − μ₁) + (∂g/∂x₂)|_{(μ₁,μ₂)} (x₂ − μ₂) + …
   V[Y] = E[(Y − E[Y])²] = σ_Y²
        ≈ (∂g/∂x₁)²|_{(μ₁,μ₂)} V[X₁] + (∂g/∂x₂)²|_{(μ₁,μ₂)} V[X₂]
          + 2 (∂g/∂x₁ · ∂g/∂x₂)|_{(μ₁,μ₂)} V[X₁, X₂]
        = (∂g/∂x₁)² σ₁² + (∂g/∂x₂)² σ₂² + 2 (∂g/∂x₁)(∂g/∂x₂) σ₁σ₂ρ₁₂
(mind the re-maind-er…)
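This is the usual error-propagation formula; a minimal Python sketch for the ratio g(x₁, x₂) = x₁/x₂ with uncorrelated Gaussian inputs (the means and widths are illustrative assumptions), checked against Monte Carlo:

```python
import random

random.seed(3)

mu1, sig1 = 10.0, 0.2
mu2, sig2 = 5.0, 0.1
rho = 0.0                        # uncorrelated inputs

g = lambda x1, x2: x1 / x2
# First-order Taylor propagation
dg1 = 1.0 / mu2                  # dg/dx1 at (mu1, mu2)
dg2 = -mu1 / mu2 ** 2            # dg/dx2 at (mu1, mu2)
var_taylor = (dg1**2 * sig1**2 + dg2**2 * sig2**2
              + 2 * dg1 * dg2 * rho * sig1 * sig2)

# Monte Carlo check
ys = [g(random.gauss(mu1, sig1), random.gauss(mu2, sig2))
      for _ in range(200000)]
m = sum(ys) / len(ys)
var_mc = sum((y - m) ** 2 for y in ys) / len(ys)

assert abs(var_mc - var_taylor) / var_taylor < 0.05   # remainder is O(sig^4)
```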
INTEGRAL TRANSFORMS
Fourier (… Laplace) Transform ; Mellin Transform

Fourier Transform (Characteristic Function)
For f : ℝ → ℂ, f ∈ L₁(ℝ), t ∈ ℝ:
   Φ(t) = ∫_{−∞}^{∞} e^{ixt} f(x) dx ,  Φ : t ∈ ℝ → ℂ
For a probability density: Φ(t) = E[e^{iXt}].

Properties:
► exists for all X(w)
► Φ(0) = 1
► bounded: |Φ(t)| ≤ 1
► Schwarz symmetry: Φ(−t) = Φ*(t)
► uniformly continuous in ℝ: ∀ε > 0 ∃δ | |Φ(t + δ) − Φ(t)| < ε
(all necessary but not sufficient conditions)

Inversion Theorem (Lévy, 1925):
   p(x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} Φ(t) dt
Discrete: P(X = xₖ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−itxₖ} Φ(t) dt
Reticular (xₖ = a + bn ; a, b ∈ ℝ, b > 0, n ∈ ℤ):
   P(X = xₖ) = (b/2π) ∫_{−π/b}^{π/b} e^{−itxₖ} Φ(t) dt

► One-to-one correspondence between DF and CF
► Two DF with the same CF are the same a.e.
Useful Relations:
► Y = g(X) ⟹ Φ_Y(t) = E[e^{iYt}] = E[e^{ig(X)t}]
► Y = a + bX , a, b ∈ ℝ ⟹ Φ_Y(t) = e^{iat} Φ_X(bt)
► If {Xᵢ ~ pᵢ(xᵢ)}ᵢ₌₁ⁿ are n independent random quantities and
  X = X₁ + ⋯ + Xₙ, then
     Φ_X(t) = E[e^{it(X₁ + ⋯ + Xₙ)}] = Φ₁(t) ⋯ Φₙ(t)
  In particular X = X₁ − X₂ ⟹ Φ_X(t) = Φ₁(t) Φ₂(−t) = Φ₁(t) Φ₂*(t)
► If the distribution of X is symmetric, then Φ_X(t) is a real function:
     Φ_X(t) = Φ_X(−t) = Φ_X*(t)
Example: X₁ ~ Po(n₁|λ₁), X₂ ~ Po(n₂|λ₂), X = X₁ − X₂:
   Φᵢ(t) = e^{−λᵢ(1 − e^{it})}
   Φ_X(t) = Φ₁(t) Φ₂(−t) = e^{−(λ₁+λ₂)} e^{λ₁e^{it} + λ₂e^{−it}}
X is discrete reticular with a = 0, b = 1:
   P(X = n) = (1/2π) ∫_{−π}^{π} e^{−itn} Φ_X(t) dt
The integral can be evaluated on the circuit C : z = √(λ₁/λ₂) e^{iθ},
θ ∈ (−π, π], where the integrand e^{√(λ₁λ₂)(z + 1/z)} z^{−(n+1)} has a pole
of order n + 1 at z = 0; summing the residues,
   Res{f(z), z = 0} = Σ_{p≥0} (λ₁λ₂)^{p + n/2} / [Γ(p + n + 1) Γ(p + 1)]
one recognises the modified Bessel function:
   P(X = n) = (λ₁/λ₂)^{n/2} e^{−(λ₁+λ₂)} Iₙ(2√(λ₁λ₂))
Some Useful Cases (sums X = X₁ + ⋯ + Xₙ of independent quantities):
   Xₖ ~ Po(xₖ|λₖ)     ⟹ X ~ Po(x|λ_S) ,     λ_S = λ₁ + ⋯ + λₙ
   Xₖ ~ N(xₖ|μₖ, σₖ)  ⟹ X ~ N(x|μ_S, σ_S) , μ_S = μ₁ + ⋯ + μₙ ,
                                             σ_S² = σ₁² + ⋯ + σₙ²
   Xₖ ~ Ca(xₖ|aₖ, bₖ) ⟹ X ~ Ca(x|a_S, b_S) , a_S = a₁ + ⋯ + aₙ ,
                                             b_S = b₁ + ⋯ + bₙ
   Xₖ ~ Ga(xₖ|a, bₖ)  ⟹ X ~ Ga(x|a, b_S) ,  b_S = b₁ + ⋯ + bₙ
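The first closure rule (sum of independent Poissons is Poisson with summed means) can be checked by simulation; a minimal Python sketch using Knuth's sampler so only the standard library is needed:

```python
import math
import random

random.seed(5)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def sample_poisson(lam):
    """Knuth's multiplicative algorithm."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

lam1, lam2 = 2.0, 3.0
n = 100000
xs = [sample_poisson(lam1) + sample_poisson(lam2) for _ in range(n)]

# Empirical frequencies of the sum should match Po(lam1 + lam2)
for k in range(10):
    emp = xs.count(k) / n
    assert abs(emp - poisson_pmf(k, lam1 + lam2)) < 0.01
```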
Moments of a Distribution
(the F-LT is usually called the "moment generating function"):
   Φ(t) = E[e^{iXt}] ⟹ E[Xᵏ] = (−i)ᵏ [∂ᵏΦ(t)/∂tᵏ]_{t=0}
and, for several quantities, with Φ(t₁, …, tₙ) = E[e^{i(X₁t₁ + ⋯ + Xₙtₙ)}]:
   E[Xᵢ^{kᵢ} Xⱼ^{kⱼ}] = (−i)^{kᵢ+kⱼ} [∂^{kᵢ+kⱼ} Φ(t₁, …, tₙ) / ∂tᵢ^{kᵢ} ∂tⱼ^{kⱼ}]_{t₁,…,tₙ=0}

Example: X ~ Cs(0,1) has Φ(t) = e^{−|t|}, not differentiable at t = 0,
consistent with the absence of moments.

Example: Xₙ with supp[Xₙ] = {0, 2} and P(Xₙ = 0) = P(Xₙ = 2) = 1/2:
   Φ(t) = E[e^{iXt}] = (1 + e^{2it}) / 2
   Φ⁽¹⁾(0) = i   ⟹ E[X] = (−i)(i) = 1
   Φ⁽²⁾(0) = −2  ⟹ E[X²] = (−i)²(−2) = 2 ⟹ σ² = E[X²] − E[X]² = 1
Mellin Transform
For f : ℝ⁺ → ℂ, f ∈ L₁(ℝ⁺), s ∈ ℂ:
   M(f; s) = ∫₀^∞ f(x) x^{s−1} dx
defined on the strip of holomorphy Re(s) ∈ ⟨−α, −β⟩, where
   f(x) = O(x^α) as x → 0⁺ ,  f(x) = O(x^β) as x → ∞
Inversion:
   f(x) = (1/2πi) ∫_{c−i∞}^{c+i∞} M(f, s) x^{−s} ds
Obviously, if it exists, for probability densities M(f; s) = E[X^{s−1}].

Examples: f₁(x) = e^{−x} has M(f₁; s) = Γ(s) with strip ⟨0, ∞⟩;
f₂(x) = e^{−x} − 1 has M(f₂; s) = Γ(s) with strip ⟨−1, 0⟩.
Useful Relations:
► Y = aX^b , a, b ∈ ℝ, a > 0:
   M_Y(s) = a^{s−1} M_X(bs − b + 1)
   In particular Y = X⁻¹ (a = 1, b = −1): M_Y(s) = M_X(2 − s)
► For {Xᵢ ~ pᵢ(xᵢ)}ᵢ₌₁ⁿ independent and non-negative (xᵢ ∈ [0, ∞)):
   X = X₁ X₂ ⋯ Xₙ ⟹ M_X(s) = M₁(s) ⋯ Mₙ(s)
   X = X₁ X₂⁻¹    ⟹ M_X(s) = M₁(s) M₂(2 − s)
Example: X₁ ~ Ex(x₁|a₁), X₂ ~ Ex(x₂|a₂), X = X₁X₂:
   Mᵢ(s) = Γ(s) / aᵢ^{s−1} ⟹ M_X(s) = Γ(s)² / (a₁a₂)^{s−1}
Inverting along a line Re(z) = c ∈ ⟨0, ∞⟩,
   p(x) = (a₁a₂ / 2πi) ∫_{c−i∞}^{c+i∞} Γ(z)² (a₁a₂x)^{−z} dz
Γ(z)² has double poles at zₙ = −n, n = 0, 1, 2, …, with residues
   Res{f(z), zₙ = −n} = (a₁a₂x)ⁿ [2ψ(n+1) − ln(a₁a₂x)] / Γ(n+1)²
and the resulting Neumann series sums to
   p(x) = 2a₁a₂ K₀(2√(a₁a₂x))

Some Useful Cases (X₁, X₂ independent, non-negative):
   X₁ ~ Ex(x₁|a₁), X₂ ~ Ex(x₂|a₂):
      X = X₁X₂   ~ 2a₁a₂ K₀(2√(a₁a₂x))
      X = X₁X₂⁻¹ ~ a₁a₂ (a₂ + a₁x)⁻²
   X₁ ~ Ga(x₁|a₁,b₁), X₂ ~ Ga(x₂|a₂,b₂):
      X = X₁X₂   ~ [2 (a₁a₂)^{(b₁+b₂)/2} / (Γ(b₁)Γ(b₂))] x^{(b₁+b₂)/2 − 1}
                   K_ν(2√(a₁a₂x)) ,  ν = b₂ − b₁ ≥ 0
      X = X₁X₂⁻¹ ~ [Γ(b₁+b₂) / (Γ(b₁)Γ(b₂))] a₁^{b₁} a₂^{b₂} x^{b₁−1}
                   (a₂ + a₁x)^{−(b₁+b₂)}
Obviously, the same results follow from the convolutions
   p(x) = ∫₀^∞ p₁(w) p₂(x/w) (1/w) dw   (product X = X₁X₂)
   p(x) = ∫₀^∞ p₁(xw) p₂(w) w dw        (ratio X = X₁X₂⁻¹)

…Densities with support in ℝ:
   X₁ ~ N(x₁|0, σ₁), X₂ ~ N(x₂|0, σ₂):
      X = X₁X₂ ~ (a/π) K₀(a|x|) ,  a = (σ₁σ₂)⁻¹
DISTRIBUTIONS AND GENERALIZED FUNCTIONS

Distribution: a functional T : φ(x) ∈ D → ⟨T, φ⟩ ∈ ℂ on the test space
D = C_c^∞ that is
- linear: ⟨T, αφ₁ + βφ₂⟩ = α⟨T, φ₁⟩ + β⟨T, φ₂⟩ ,  α, β ∈ ℂ
- continuous: φₙ → φ in D ⟹ ⟨T, φₙ⟩ → ⟨T, φ⟩ as n → ∞
Then T ∈ D′ is a Distribution.

Any locally Lebesgue-integrable (LLI) function f : Ω ⊆ ℝ → ℝ defines a
distribution
   ⟨T_f, φ⟩ = ∫_{−∞}^{∞} f(x) φ(x) dx
("regular" distributions; "singular" the rest).

Some Basic Properties (see text for more); T, G ∈ D′, α, β ∈ ℂ:
   αT + βG ∈ D′ :  ⟨αT + βG, φ⟩ = α⟨T, φ⟩ + β⟨G, φ⟩
   supp T ∩ supp φ = ∅ ⟹ ⟨T, φ⟩ = 0
   ψ ∈ C^∞ : ⟨ψT, φ⟩ = ⟨T, ψφ⟩
   derivative: ⟨DᵖT, φ⟩ = (−1)ᵖ ⟨T, Dᵖφ⟩
   convergence: {Tₙ} → T iff ⟨Tₙ, φ⟩ → ⟨T, φ⟩ (n → ∞) for all φ
   Fourier transform: ⟨T̃, φ⟩ = ⟨T, φ̃⟩
   translation: ⟨S_aT, φ⟩ = ⟨T(x − a), φ⟩ = ⟨T, φ(x + a)⟩
   scaling: ⟨P_aT, φ⟩ = ⟨T(ax), φ⟩ = (1/|a|) ⟨T, φ(x/a)⟩

Two examples:
1) Delta: ⟨δ, φ⟩ = φ(0)
   linear: ⟨δ, αφ₁ + βφ₂⟩ = αφ₁(0) + βφ₂(0) = α⟨δ, φ₁⟩ + β⟨δ, φ₂⟩
   continuous: limₙ |⟨δ, φₙ⟩ − ⟨δ, φ⟩| = limₙ |φₙ(0) − φ(0)| = 0
   ⟨S_aδ, φ⟩ = ⟨δ_a, φ⟩ = ⟨δ, φ(x + a)⟩ = φ(a)
   ⟨P_aδ, φ⟩ = (1/|a|) ⟨δ, φ(x/a)⟩ = (1/|a|) φ(0)
   ⟨δ′, φ⟩ = −⟨δ, φ′⟩ = −φ′(0)
   δ is the limit of fₙ = (n/2) 1_{[−1/n, 1/n]}(x): ⟨T_{fₙ}, φ⟩ → φ(0) = ⟨δ, φ⟩
2) Heaviside: H(x) = 1_{[0,∞)}(x) is LLI and defines a distribution
   ⟨T_H, φ⟩ = ∫_{−∞}^{∞} 1_{[0,∞)}(x) φ(x) dx = ∫₀^∞ φ(x) dx
   ⟨H′, φ⟩ = −⟨H, φ′⟩ = −∫₀^∞ φ′(x) dx = φ(0) = ⟨δ, φ⟩ ,  i.e. H′ = δ
Tempered Distributions
Test functions "rapidly decreasing" (the Schwartz space):
   S = {φ : ℝ → ℂ | φ ∈ C^∞ and lim_{|x|→∞} |xⁿ Dᵐφ| = 0 ∀ n, m ∈ ℕ₀}

EXAMPLES: f(x) = e^{−x²} ∈ S; f(x) = e^{−x} ∉ S (it blows up as x → −∞),
while f(x) = e^{−x} 1_{(0,∞)}(x), though not C^∞, defines a tempered
distribution; any f ∈ L₁(ℝ), and any polynomially bounded f with
|f(x)| ≤ C (a + x²)ᵐ for some m ∈ ℕ₀, defines a tempered distribution T_f.

Convergence to zero in S: for {φₙ(x)} ⊂ S,
   φₙ → 0 iff max_{x∈ℝ} |xᵏ Dᵐφₙ(x)| → 0 (n → ∞) ∀ k, m ∈ ℕ₀
and T is a tempered distribution iff ⟨T, φₙ⟩ → 0 for every such sequence.
Probability Distributions
For X(w) : Ω → ℝ on (Ω, B, Q), F(x) = P(X ≤ x) = Q(X⁻¹((−∞, x])) is LLI
and defines a distribution ⟨T_F, φ⟩.

In general, T is a Probability Distribution if:
   ⟨T, φ⟩ ≥ 0 ∀ φ ≥ 0 ;  ⟨T, 1⟩ = 1   (φ(x) = 1 ∈ D, S, …)
T does not have to be generated by a LLI function … but any LLI function f
defines a probability distribution if
   ⟨T_f, φ⟩ = ∫_{−∞}^{∞} f(x) φ(x) dx ≥ 0 ∀ φ ≥ 0 ;  ⟨T_f, 1⟩ = ∫ f(x) dx = 1
Probability Density "Distribution": ⟨T_p, φ⟩ = ⟨T_F′, φ⟩ = −⟨T_F, φ′⟩

The Delta distribution unifies discrete and AC random quantities X(w):
   Discrete: rec(X) = {x₁, x₂, …}, pₖ = P(X = xₖ): T_D = Σₖ pₖ δ(x − xₖ)
   Continuous: rec(X) ⊆ ℝ: p(x) 1_{A⊆ℝ}(x)

CF:
   Φ(t) = Σₙ₌₀^∞ (it)ⁿ αₙ / n! ,  Φ(t) = ∫ e^{itx} p(x) dx
so formally
   p(x) = (1/2π) ∫ e^{−itx} Φ(t) dt = Σₙ₌₀^∞ (αₙ/n!) (1/2π) ∫ e^{−itx} (it)ⁿ dt
With the delta distribution δ(x, 0) = (1/2π) ∫ e^{−itx} dt,
   T_p = Σₙ₌₀^∞ (−1)ⁿ (αₙ/n!) δ⁽ⁿ⁾(x, 0)
   ⟨T_p, φ⟩ = Σₙ₌₀^∞ (−1)ⁿ (αₙ/n!) ⟨δ⁽ⁿ⁾(x, 0), φ⟩ = Σₙ₌₀^∞ (αₙ/n!) φ⁽ⁿ⁾(0)
and in general
   T_PROB = ε T_D + (1 − ε) T_p
Example: a DF with both discrete and continuous parts (sketch: F₁(x) rises
from F₁(0) on [0, a), F is flat at the level F₂(a) on [a, b), and F₂(x)
continues from x = b):
   F(x) = F₁(x)[H(x) − H(x − a)] + F₂(a)[H(x − a) − H(x − b)] + F₂(x)H(x − b)
The density, as a distribution, is the derivative ⟨T_F′, φ⟩ = ⟨Df, φ⟩ =
−⟨T_F, φ′⟩:
   Df = F₁(0) δ(0) + [F₂(a) − F₁(a)] δ(a) + f₁(x) 1_{[0,a)}(x) + f₂(x) 1_{[b,∞)}(x)
so
   E[Xᵐ] = ⟨Df, Xᵐ⟩ = F₁(0) δ_{m0} + [F₂(a) − F₁(a)] aᵐ
         + ∫₀^a f₁(x) xᵐ dx + ∫_b^∞ f₂(x) xᵐ dx
   1 = F₁(0) + [F₂(a) − F₁(a)] + ∫₀^a f₁(x) dx + ∫_b^∞ f₂(x) dx
LIMIT THEOREMS and CONVERGENCE

General Problem: find the limit behaviour of a sequence of random
quantities X₁, X₂, …, Xₙ, … — for example,
   Z₁ = X₁ , Z₂ = (X₁ + X₂)/2 , … , Zₙ = (1/n) Σₖ₌₁ⁿ Xₖ , …
How is Zₙ distributed when n → ∞?

Convergence requires a ~"distance", i.e. a convergence criterion:
1) criteria are more or less strong;
2) a sequence may converge for some criteria and not for others.

Convergence in:        → associated result:
  Distribution            Central Limit Theorem
  Probability             Weak Law of Large Numbers
  Almost Sure             Strong Law of Large Numbers
  L^p(ℝ) Norm             Convergence in Quadratic Mean (p = 2)
  Uniform                 Glivenko-Cantelli Theorem
  Logarithmic

Chebyshev Theorem
Let X ~ F(x) and Y = g(X) ≥ 0 with E[g(X)] finite. For k > 0, split
Ω_X = Δ₁ ∪ Δ₂ with Δ₁ = {x | g(x) ≥ k} and Δ₂ = {x | g(x) < k}:
   E[Y] = ∫_{Δ₁} g(x) dF(x) + ∫_{Δ₂} g(x) dF(x)
        ≥ ∫_{Δ₁} g(x) dF(x) ≥ k ∫_{Δ₁} dF(x) = k P(X ∈ Δ₁) = k P(g(X) ≥ k)
so
   P(g(X) ≥ k) ≤ E[g(X)] / k

Bienaymé-Chebyshev Inequality
For X with finite mean and variance (μ, σ²), take g(X) = (X − μ)²/σ² and
k → k²:
   P(|X − μ| ≥ kσ) ≤ 1/k²
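The bound can be checked empirically; a minimal Python sketch with a Gaussian sample (the parent distribution is an illustrative choice — the bound holds for any finite-variance parent):

```python
import random

random.seed(11)

mu, sigma = 0.0, 1.0
n = 100000
xs = [random.gauss(mu, sigma) for _ in range(n)]

for k in (1.5, 2.0, 3.0):
    frac = sum(1 for x in xs if abs(x - mu) >= k * sigma) / n
    assert frac <= 1.0 / k**2          # Bienaymé-Chebyshev bound holds

# The bound is loose: for a Normal, P(|X - mu| >= 2 sigma) ~ 0.0455 << 0.25
frac2 = sum(1 for x in xs if abs(x - mu) >= 2 * sigma) / n
assert abs(frac2 - 0.0455) < 0.01
```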
Convergence in Probability
Let X₁, X₂, …, Xₙ, … with DFs F₁(x₁), F₂(x₂), …, Fₙ(xₙ), …
Def.: Xₙ converges in probability to X if, and only if
   lim_{n→∞} P(|Xₙ − X| > ε) = 0 ; ∀ε > 0
or, equivalently ("lim(Prob)"),
   lim_{n→∞} P(|Xₙ − X| ≤ ε) = 1 ; ∀ε > 0

Weak Law of Large Numbers (J. Bernoulli, …)
Let X₁, X₂, …, Xₙ, … be r.q. with the same distribution and finite mean μ.
The sequence Z₁ = X₁, Z₂ = (X₁ + X₂)/2, …, Zₙ = (1/n) Σₖ₌₁ⁿ Xₖ, …
converges in Probability to μ:
   lim_{n→∞} P(|Zₙ − μ| > ε) = 0 ; ∀ε > 0

LLN in practice:
WLLN: when n is very large, the probability that Zₙ differs from μ by a
small amount is very small → Zₙ gets more and more concentrated around the
real number μ. But "very small" is not zero: it may happen that for some
k > n, Zₖ differs from μ by more than ε …
Almost Sure Convergence
Def.: Xₙ converges "almost surely" to X if, and only if
   P(lim_{n→∞} |Xₙ − X| > ε) = 0 ; ∀ε > 0
or, equivalently ("Prob(lim)"),
   P(lim_{n→∞} |Xₙ − X| ≤ ε) = 1 ; ∀ε > 0

Strong Law of Large Numbers (E. Borel, A. N. Kolmogorov, …)
Let X₁, X₂, …, Xₙ, … be r.q. with the same distribution and finite mean μ.
The sequence Z₁ = X₁, Z₂ = (X₁ + X₂)/2, …, Zₙ = (1/n) Σₖ₌₁ⁿ Xₖ, …
converges Almost Surely to μ:
   P(lim_{n→∞} |Zₙ − μ| > ε) = 0 ; ∀ε > 0

LLN in practice:
WLLN: when n is very large, the probability that Zₙ differs from μ by a
small amount is very small → Zₙ gets more and more concentrated around μ.
But "very small" is not zero: it may happen that for some k > n, Zₖ
differs from μ by more than ε … SLLN: as n grows, the probability for this
to happen tends to zero.
Convergence in Distribution
Let X₁, X₂, …, Xₙ, … and their corresponding F₁(x₁), F₂(x₂), …, Fₙ(xₙ), …
Def.: Xₙ tends to be distributed as X ~ F(x) if, and only if
   lim_{n→∞} Fₙ(x) = F(x) , i.e. lim_{n→∞} P(Xₙ ≤ x) = P(X ≤ x) ; ∀x ∈ C(F)
(the points of continuity of F) or, equivalently, in terms of the CFs,
   lim_{n→∞} Φₙ(t) = Φ(t) ; ∀t ∈ ℝ

Central Limit Theorem (Lindeberg-Lévy, …)
Let X₁, X₂, …, Xₙ, … be a sequence of independent r.q. with the same
distribution and finite mean and variance (μ, σ²). Form the sequence
   Z₁ = X₁ , Z₂ = (X₁ + X₂)/2 , … , Zₙ = (1/n) Σₖ₌₁ⁿ Xₖ , …
How is Zₙ distributed when n → ∞? (easy demonstration from the CF)
   Zₙ = (1/n) Σₖ₌₁ⁿ Xₖ ~ N(z | μ, σ/√n)
or, standardized,
   Z̃ₙ = (Zₙ − μ) / (σ/√n) ~ N(z | 0, 1)

(Figures: the DF of Zₙ for a Uniform, a Parabolic and a Cauchy parent
distribution, for n = 1, 2, 5, 10, 20, 50; the Uniform and Parabolic cases
approach the Normal, while the Cauchy — which has no mean or variance, and
is stable under addition — does not.)
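The behaviour shown in those figures can be reproduced by simulation; a minimal Python sketch (sample sizes are illustrative):

```python
import math
import random

random.seed(13)

def sample_mean(sampler, n):
    return sum(sampler() for _ in range(n)) / n

# Uniform parent: mean 1/2, variance 1/12  ->  Z_n ~ N(1/2, 1/sqrt(12 n))
n, reps = 50, 20000
zs = [sample_mean(random.random, n) for _ in range(reps)]
m = sum(zs) / reps
s2 = sum((z - m) ** 2 for z in zs) / reps
assert abs(m - 0.5) < 0.005
assert abs(s2 - 1.0 / (12 * n)) < 2e-4

# Fraction within one sigma approaches the Normal value 0.683
sig = math.sqrt(1.0 / (12 * n))
frac = sum(1 for z in zs if abs(z - 0.5) <= sig) / reps
assert abs(frac - 0.683) < 0.02

# Cauchy parent (no mean, no variance): the sample mean is again Cauchy,
# so it does NOT concentrate as n grows
cs = [sample_mean(lambda: math.tan(math.pi * (random.random() - 0.5)), n)
      for _ in range(2000)]
assert sum(1 for c in cs if abs(c) > 3) / len(cs) > 0.1
```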
Uniform Convergence
Def.: a sequence {fₙ(x)}ₙ₌₁^∞, fₙ, f : S → ℝ, converges uniformly to f(x)
if, and only if
   lim_{n→∞} sup_{x∈S} |fₙ(x) − f(x)| = 0

Glivenko-Cantelli Theorem (an example of application of the Chebyshev Th.)
Repeat the experiment: e(1), …, e(n), each giving one observation of X:
x₁, x₂, …, xₙ, …, independent and identically distributed.
Empirical Distribution Function:
   Fₙ(x) = (1/n) Σₖ₌₁ⁿ 1_{(−∞, x]}(xₖ)
(the number of observed values xᵢ lower than or equal to x, over n).
If the observations are iid:
   P(lim_{n→∞} sup_x |Fₙ(x) − F(x)| = 0) = 1
The Empirical Distribution Function converges uniformly (almost surely) to
the Distribution Function F(x) of the r.q. X.

Demonstration of Convergence in Probability:
1) For A = (−∞, x₀], Y = 1_A(X) is a Bernoulli random quantity:
   P(Y = 1) = P(X ∈ A) = F(x₀) ,  P(Y = 0) = P(X ∉ A) = 1 − F(x₀)
with Characteristic Function Φ_Y(t) = e^{it} F(x₀) + 1 − F(x₀).
2) The sum of iid Bernoulli quantities Zₙ = Σₖ₌₁ⁿ 1_{(−∞, x]}(xₖ) = n Fₙ(x)
is a Binomial r.q. W = n Fₙ(x):
   P(W = k) = C(n, k) Fᵏ (1 − F)^{n−k} ,  E[W] = nF ,  V[W] = nF(1 − F)
so E[Fₙ] = F and V[Fₙ] = F(1 − F)/n.
3) Bienaymé-Chebyshev Inequality (P(|X − μ| ≥ kσ) ≤ 1/k²):
   P(|Fₙ(x) − F(x)| ≥ ε) ≤ F(1 − F) / (n ε²) → 0
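The uniform shrinking of sup|Fₙ − F| can be observed directly; a minimal Python sketch with an exponential parent (an illustrative choice) — the bound `2/sqrt(n)` used below is a generous envelope consistent with the Kolmogorov 1/√n scaling:

```python
import math
import random

random.seed(17)

def F(x):
    """True DF of the standard exponential."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def sup_dist(xs):
    """sup_x |F_n(x) - F(x)|, attained at the order statistics."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # F_n jumps from i/n to (i+1)/n at x
        d = max(d, abs((i + 1) / n - F(x)), abs(i / n - F(x)))
    return d

for n in (100, 2500, 10000):
    xs = [random.expovariate(1.0) for _ in range(n)]
    assert sup_dist(xs) < 2.0 / math.sqrt(n)
```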
Convergence in L^p(ℝ) Norm
Def.: {Xₙ(w)}ₙ₌₁^∞ converges in L^p(ℝ) norm to X(w) if, and only if,
Xₙ(w) ∈ L^p(ℝ) ∀n, X(w) ∈ L^p(ℝ), and
   lim_{n→∞} E[|Xₙ(w) − X(w)|^p] = 0
that is,
   ∀ε > 0 ∃n₀(ε) | ∀n > n₀(ε) : E[|Xₙ(w) − X(w)|^p] < ε
For p = 2: convergence in quadratic mean.

Logarithmic Convergence
Kullback-Leibler "Discrepancy" (see Lect. on Information): the logarithmic
divergence of a pdf p̃(x), x ∈ Ω_X, from its true pdf p(x) is
   D_KL[p̃ | p] = ∫_{Ω_X} p(x) log [p(x) / p̃(x)] dx
A sequence {pᵢ(x)}ᵢ₌₁^∞ of pdfs "converges logarithmically" to a density
p(x) iff
   lim_{k→∞} D_KL[pₖ | p] = 0
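A minimal Python sketch of logarithmic convergence for discrete densities (the perturbed sequence qₖ → p is an illustrative construction):

```python
import math

def dkl(p, q):
    """Discrete Kullback-Leibler divergence of q from the true p:
    sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]

# A sequence q_k -> p: the divergence tends to zero (logarithmic convergence)
ds = []
for k in (1, 10, 100, 1000):
    qk = [(pi + 1.0 / (3 * k)) / (1.0 + 1.0 / k) for pi in p]  # normalized
    ds.append(dkl(p, qk))

assert all(d >= 0.0 for d in ds)                # D_KL is non-negative
assert all(a > b for a, b in zip(ds, ds[1:]))   # decreasing along the sequence
assert ds[-1] < 1e-4                            # -> 0 as k grows
```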