Multiple Random Variables and Joint Distributions
• The conditional dependence between random variables serves as a foundation for time series analysis.
• When multiple random variables are related, they are described by their joint distribution and density functions.
$$F(x, y) = P(X \le x,\; Y \le y)$$
$$f(x, y) = \frac{\partial^2 F}{\partial x\,\partial y}$$
$$P((x, y) \in A) = \iint_A f(x, y)\,dx\,dy$$
[Figure: surface plot of the joint density f(x,y) over the x-y plane]
Conditional and Joint Probability
Definition
$$P(D \cap E) = P(D \mid E)\,P(E) = P(E \mid D)\,P(D)$$
Bayes Rule
$$P(D \mid E) = \frac{P(D \cap E)}{P(E)} = \frac{P(E \mid D)\,P(D)}{P(E)}$$
If D and E are independent
$$P(D \mid E) = P(D) \quad\Leftrightarrow\quad P(D \cap E) = P(D)\,P(E)$$
Partition of the domain into non-overlapping sets D1, D2, D3:
$$P(E) = P(D_1 \cap E) + P(D_2 \cap E) + P(D_3 \cap E)$$
$$P(E) = P(E \mid D_1)\,P(D_1) + P(E \mid D_2)\,P(D_2) + P(E \mid D_3)\,P(D_3)$$
Expanded (partitioned) form of Bayes Rule
$$P(D_i \mid E) = \frac{P(E \mid D_i)\,P(D_i)}{P(E \mid D_1)\,P(D_1) + P(E \mid D_2)\,P(D_2) + P(E \mid D_3)\,P(D_3)}$$
[Figure: Venn diagram of E overlapping the partition D1, D2, D3]
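A small worked example (with assumed, purely illustrative probabilities) shows how the partitioned form operates:
$$P(D_1)=0.5,\; P(D_2)=0.3,\; P(D_3)=0.2,\qquad P(E \mid D_1)=0.1,\; P(E \mid D_2)=0.2,\; P(E \mid D_3)=0.4$$
$$P(E) = (0.1)(0.5) + (0.2)(0.3) + (0.4)(0.2) = 0.05 + 0.06 + 0.08 = 0.19$$
$$P(D_3 \mid E) = \frac{(0.4)(0.2)}{0.19} = \frac{0.08}{0.19} \approx 0.42$$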
Conditional and joint density functions
$$P(D \mid E) = \frac{P(D \cap E)}{P(E)}$$
Conditional density function
$$f(y \mid x) = \frac{f(x, y)}{f(x)}$$
Marginal density function (the continuous analogue of $P(E) = P(D_1 \cap E) + P(D_2 \cap E) + P(D_3 \cap E)$)
$$f(x) = \int_{y=-\infty}^{\infty} f(x, y)\,dy$$
If X and Y are independent (the analogue of $P(D \cap E) = P(D)\,P(E)$)
$$f(x, y) = f(x)\,f(y)$$
Marginal Distribution
$$f(y) = \int_{x=-\infty}^{\infty} f(x, y)\,dx$$
[Figure: joint density surface f(x,y) with the marginal density along one axis]
Conditional Distribution
$$f(y \mid x) = \frac{f(x, y)}{f(x)}$$
[Figure: joint density surface f(x,y) with a conditional slice at a fixed x]
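As a minimal numerical sketch of these relationships (the bivariate normal joint density, the correlation 0.6, and the conditioning value x0 = 1.0 are illustrative assumptions, not from the lecture), the marginal is obtained by integrating the joint density and the conditional by dividing by it:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative joint density: bivariate normal with correlation 0.6 (an assumption)
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.6], [0.6, 1.0]])

y = np.linspace(-6, 6, 2001)           # grid over y for numerical integration
x0 = 1.0                               # fixed x at which to condition

# Marginal f(x0) = integral over y of f(x0, y)
f_joint_x0 = joint.pdf(np.column_stack([np.full_like(y, x0), y]))
f_x0 = np.trapz(f_joint_x0, y)

# Conditional density f(y | x0) = f(x0, y) / f(x0)
f_y_given_x0 = f_joint_x0 / f_x0

print(f_x0)                            # matches the standard normal density at x0
print(np.trapz(f_y_given_x0, y))       # the conditional density integrates to ~1
```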
Expectation and moments of multivariate random variables

Population:
• Mean: $\mu_x = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,f(x, y)\,dx\,dy$
• Expectation operator: $E(g(X, Y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$
• Covariance: $\mathrm{Cov}(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_x)(y - \mu_y)\,f(x, y)\,dx\,dy = E([X - E(X)][Y - E(Y)]) = E(XY) - E(X)\,E(Y)$
• Correlation: $\rho = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_x\,\sigma_y}$

Sample:
• Mean: $\bar{X} = \dfrac{1}{N}\sum_{i=1}^{N} X_i$
• Expectation: $\hat{E}(g(X, Y)) = \dfrac{1}{N}\sum_{i=1}^{N} g(X_i, Y_i)$
• Covariance: $S_{XY} = \dfrac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})$
• Correlation: $\hat{\rho} = \dfrac{S_{XY}}{S_X\,S_Y}$
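A minimal numpy sketch of the sample statistics above (the generated data and variable names are illustrative):

```python
import numpy as np

def sample_moments(x, y):
    """Sample means, covariance, and correlation for paired observations."""
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()                     # sample means
    s_xy = np.sum((x - x_bar) * (y - y_bar)) / (n - 1)    # sample covariance
    s_x, s_y = x.std(ddof=1), y.std(ddof=1)               # sample standard deviations
    rho_hat = s_xy / (s_x * s_y)                          # sample correlation
    return x_bar, y_bar, s_xy, rho_hat

# Example with correlated synthetic data
rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = 0.7 * x + 0.5 * rng.standard_normal(200)
print(sample_moments(x, y))
print(np.corrcoef(x, y)[0, 1])   # cross-check against numpy's built-in estimate
```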
Covariance and Correlation are measures of Linear Dependence
[Figure: scatterplot of Log(Alafia Flow (cfs)) versus MD-11 DP Water Level, R = 0.67]
Mutual Information
[Figure: scatterplot of two variables (axis labels s and sc) showing a clear nonlinear relationship; R = 0.007]
• Is there a relationship between these two variables plotted?
• Correlation, the linear measure of dependence, is 0.
• How to quantify that a relationship exists?
Entropy
• Entropy is a measure of randomness. The more random a variable is, the more entropy it will have.
$$H(X) = -E[\log(f(x))] = -\int_{-\infty}^{\infty} f(x)\,\log(f(x))\,dx$$
[Figure: example probability density functions f(x)]
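A minimal numerical sketch of the entropy integral (assuming a normal density as the example), compared with the known closed form $\tfrac{1}{2}\log(2\pi e\sigma^2)$ for a normal distribution:

```python
import numpy as np
from scipy.stats import norm

sigma = 2.0
x = np.linspace(-10 * sigma, 10 * sigma, 20001)    # integration grid
f = norm.pdf(x, scale=sigma)                       # density f(x)

# H(X) = -integral of f(x) log f(x) dx, evaluated numerically on the grid
H_numeric = -np.trapz(f * np.log(f), x)
H_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(H_numeric, H_exact)                          # the two agree closely
```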
Mutual Information
• Mutual information is a general information-theoretic measure of the dependence between two variables.
• It is the amount of information gained about X when Y is learned, and vice versa.
• I(X, Y) = 0 if and only if X and Y are independent.
$$I(X, Y) = H(X) + H(Y) - H(X, Y)$$
$$= -E[\log(f(x))] - E[\log(f(y))] + E[\log(f(x, y))]$$
$$= E\!\left[\log\frac{f(x, y)}{f(x)\,f(y)}\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,\log\!\left(\frac{f(x, y)}{f(x)\,f(y)}\right) dx\,dy$$
Mutual Information Sample Statistic
$$\hat{I}(X, Y) = \frac{1}{N}\sum_{i=1}^{N} \log\!\left(\frac{\hat{f}(x_i, y_i)}{\hat{f}(x_i)\,\hat{f}(y_i)}\right)$$
• Requires a Monte Carlo procedure to determine significance. (See later.)
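A minimal sketch of this plug-in estimator, assuming Gaussian kernel density estimates from scipy for $\hat f$ (the circular test data mimic the nonlinear scatterplot above and are purely illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

def mutual_information_estimate(x, y):
    """Plug-in estimate of I(X, Y) using Gaussian kernel density estimates."""
    xy = np.vstack([x, y])                    # joint sample, shape (2, N)
    f_xy = gaussian_kde(xy)(xy)               # joint density at each sample point
    f_x = gaussian_kde(x)(x)                  # marginal density of X at each x_i
    f_y = gaussian_kde(y)(y)                  # marginal density of Y at each y_i
    return np.mean(np.log(f_xy / (f_x * f_y)))

# Nonlinear (circular) relationship: correlation near 0 but mutual information > 0
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)
s = np.cos(theta) + 0.05 * rng.standard_normal(500)
sc = np.sin(theta) + 0.05 * rng.standard_normal(500)
print(np.corrcoef(s, sc)[0, 1])               # linear correlation, close to zero
print(mutual_information_estimate(s, sc))     # clearly positive
```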
The theoretical basis for time series
models
• A random process is a sequence of random
variables indexed in time
$$X(t_1), X(t_2), X(t_3), X(t_4), \ldots$$
$$X_1, X_2, X_3, X_4, \ldots$$
• A random process is fully described by defining the (infinite) joint probability distribution of the random process at all times
$$F(X(t_1), X(t_2), X(t_3), \ldots) = P(X(t_1) \le x_1,\; X(t_2) \le x_2,\; X(t_3) \le x_3, \ldots)$$
$$f(x_1, x_2, x_3, \ldots) = \frac{\partial^3}{\partial x_1\,\partial x_2\,\partial x_3}\cdots\, F(x_1, x_2, x_3, \ldots)$$
Random Processes
• A sequence of random variables indexed in time
• Infinite joint probability distribution
$$X_1, X_2, X_3, X_4, \ldots$$
$$f(x_1, x_2, x_3, x_4, \ldots)$$
$$f(x_t \mid x_{t-1}, \ldots, x_{t-d}) = \frac{f(x_t, x_{t-1}, \ldots, x_{t-d})}{\int f(x_t, x_{t-1}, \ldots, x_{t-d})\,dx_t}$$
$$x_{t+1} = g(x_t, x_{t-1}, \ldots) + \text{random innovation (errors or unknown random inputs)}$$
Classification of Random Quantities
A time series constitutes a possible realization of a random
process completely described by the full (infinite) joint
probability distribution
Bras, R. L. and I. Rodriguez-Iturbe (1985), Random Functions and Hydrology, Addison-Wesley, Reading, MA, 559 p.
The infinite set of all possible realizations is called the Ensemble.
Random process properties are formally
defined with respect to the ensemble.
First order marginal density function
f(x(t))
from which the mean and variance can be
evaluated

$$m(t) = \int_{-\infty}^{\infty} x(t)\,f(x(t))\,dx(t)$$
$$\sigma^2(t) = \int_{-\infty}^{\infty} [x(t) - m(t)]^2\,f(x(t))\,dx(t)$$
Stationarity
A strictly stationary stochastic process $\{x_{t_1}, x_{t_2}, x_{t_3}, \ldots\}$ has the same joint distribution as the shifted series $\{x_{t_1+h}, x_{t_2+h}, x_{t_3+h}, \ldots\}$ for any given value of h.
$$f(X(t_1), X(t_2), \ldots, X(t_N)) \stackrel{d}{=} f(X(t_1 + h), X(t_2 + h), \ldots, X(t_N + h))$$
This applies for all values of N, i.e. all orders of the joint distribution function.
Stationarity of a specific order
• 1st Order. A random process is classified as first-order stationary if its first-order probability density function is unchanged by any shift of the time origin:
$$f(x(t_1)) \stackrel{d}{=} f(x(t_1 + h)) \quad \text{for any value of } h$$
• 2nd Order. A random process is classified as second-order stationary if its second-order probability density function is unchanged when the same time shift is applied to both times:
$$f(x(t_1), x(t_2)) \stackrel{d}{=} f(x(t_1 + h), x(t_2 + h)) \quad \text{for any value of } h$$
This means that the joint distribution is not a function of the absolute values of $t_1$ and $t_2$ but only a function of the lag $\tau = t_2 - t_1$.
First order stationarity
$$f(x(t_1)) \stackrel{d}{=} f(x(t_2)) \quad \forall\, t_1, t_2$$
Stationarity of moments
$$m(t) = m, \qquad \sigma^2(t) = \sigma^2$$
Second order density function
$$f(x(t_1), x(t_2))$$
Second order moments
$$\mathrm{Cov}(X(t_1), X(t_2)) = \mathrm{Cov}(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x(t_1) - m(t_1))(x(t_2) - m(t_2))\,f(x(t_1), x(t_2))\,dx(t_1)\,dx(t_2)$$
Correlation
$$\rho(t_1, t_2) = \frac{\mathrm{Cov}(t_1, t_2)}{\sigma(t_1)\,\sigma(t_2)}$$
Second order stationarity
$f(x(t_1), x(t_2))$ is not a function of the absolute values of $t_1$ and $t_2$ but only a function of the lag $\tau = t_2 - t_1$.
Second moment stationarity
$$m(t) = m, \qquad \sigma^2(t) = \sigma^2$$
$$\mathrm{Cov}(X(t_1), X(t_2)) = \mathrm{Cov}(t_2 - t_1) = \mathrm{Cov}(\tau)$$
$$\rho(t_1, t_2) = \rho(t_2 - t_1) = \rho(\tau)$$
Stationarity of the moments (weak or
wide sense stationarity)
2nd Moment. A random process is classified as 2nd moment stationary if its first and second moments are not a function of the specific time:
mean: $\mu(t) = \mu$
variance: $\sigma^2(t) = \sigma^2$
and:
covariance: $\mathrm{Cov}(X(t_1), X(t_2)) = \mathrm{Cov}(X(t_1 + h), X(t_2 + h))$
This means that the covariance is not a function of the absolute values of $t_1$ and $t_2$ but only a function of the lag $\tau = t_2 - t_1$.
– A subset of 2nd order stationarity
– For a Gaussian process, equivalent to 2nd order stationarity
Periodic Stationarity
In hydrology it is common to work with data subject to a seasonal cycle, i.e. data that are formally non-stationary but become stationary once the period is recognized.
Periodic variable $X_{y,m}$, where y = year, m = month
Periodic first order stationarity
$$f(x_{y_1, m}) \stackrel{d}{=} f(x_{y_2, m}) \quad \forall\, y_1, y_2 \text{ for each } m$$
Periodic second moment stationarity
$$\mathrm{Cov}(X_{y, m_1}, X_{y+\tau, m_2}) = \mathrm{Cov}(m_1, m_2, \tau)$$
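A minimal sketch of estimating periodic (month-specific) statistics from a year-by-month data matrix (the synthetic data and the array layout are illustrative assumptions):

```python
import numpy as np

# Synthetic monthly data: rows = years, columns = months (illustrative only)
rng = np.random.default_rng(2)
n_years = 50
seasonal_mean = 10 + 5 * np.sin(2 * np.pi * np.arange(12) / 12)
X = seasonal_mean + rng.standard_normal((n_years, 12))

# Periodic first and second moments: one estimate per calendar month
monthly_mean = X.mean(axis=0)
monthly_var = X.var(axis=0, ddof=1)

# Periodic lag-1 covariance Cov(X_{y,m}, X_{y,m+1}) for m = Jan..Nov
lag1_cov = [np.cov(X[:, m], X[:, m + 1])[0, 1] for m in range(11)]
print(monthly_mean.round(2))
print(np.round(lag1_cov, 2))
```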
Ergodicity
• Definitions given are with respect to the ensemble.
• It is often possible to observe only one realization.
• How can statistics be estimated from one realization?
• The ergodicity assumption for stationary processes asserts that averaging over the ensemble is equivalent to averaging over a realization.

$$m = \int_{-\infty}^{\infty} x\,f(x)\,dx = \lim_{T \to \infty} \frac{1}{T}\int_{0}^{T} x(t)\,dt$$
$$\sigma^2 = \int_{-\infty}^{\infty} [x - m]^2\,f(x)\,dx = \lim_{T \to \infty} \frac{1}{T}\int_{0}^{T} [x(t) - m]^2\,dt$$
$$\mathrm{Cov}(\tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 - m)(x_2 - m)\,f(x_1, x_2, \tau)\,dx_1\,dx_2 = \lim_{T \to \infty} \frac{1}{T}\int_{0}^{T} (x(t + \tau) - m)(x(t) - m)\,dt$$
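A minimal sketch of the ergodic idea, replacing ensemble averages with time averages over a single simulated realization (the AR(1)-style generating process and its parameters are illustrative assumptions):

```python
import numpy as np

# One long realization of a stationary process (illustrative AR(1) with mean 5)
rng = np.random.default_rng(3)
n, phi, mu = 100_000, 0.8, 5.0
x = np.empty(n)
x[0] = mu
for t in range(1, n):
    x[t] = mu + phi * (x[t - 1] - mu) + rng.standard_normal()

# Time averages standing in for ensemble averages
m_hat = x.mean()
var_hat = x.var()
tau = 1
cov_tau = np.mean((x[tau:] - m_hat) * (x[:-tau] - m_hat))   # lag-tau covariance

print(m_hat)                         # close to mu = 5
print(var_hat, 1 / (1 - phi**2))     # AR(1) stationary variance = 1 / (1 - phi^2)
print(cov_tau, phi / (1 - phi**2))   # lag-1 covariance = phi * variance
```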
Discrete representation
• A continuous random process can only be
observed at discrete intervals over a finite
domain
$$Z(t) \to Z_t, \quad t = 1, 2, 3, \ldots$$
• Zt may be averages from t-1 to t (Rainfall)
or instantaneous measurements at t
(Streamflow)
Markov Property
• The infinite joint PDF construct is not practical.
• A process is Markov order d if the joint PDF
characterizing the dependence structure is of dimension
no more than d+1.
Joint Distribution
$$f(X_t, X_{t-1}, \ldots, X_{t-d})$$
Conditional Distribution
$$f(X_t \mid X_{t-1}, \ldots, X_{t-d}) = \frac{f(X_t, X_{t-1}, \ldots, X_{t-d})}{\int f(X_t, X_{t-1}, \ldots, X_{t-d})\,dX_t}$$
Assumption of the Markov property is the basis for
simulation of time series as sequences of later values
conditioned on earlier values
Linear approach to time series
modeling
e.g. AR(1): $X_t = \phi\,X_{t-1} + W_t$
• Model structure and parameters identified to match second moment properties (see the sketch below)
• Skewness accommodated using
  – Skewed residuals
  – Normalizing transformation (e.g. log, Box-Cox)
• Seasonality through seasonally varying parameters
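A minimal sketch (in plain numpy, not any particular hydrology package) of identifying AR(1) parameters from second moment properties and then simulating a synthetic series; the function names and test values are illustrative:

```python
import numpy as np

def fit_ar1(x):
    """Estimate AR(1) parameters (mean, phi, innovation std) from second moments."""
    mu = x.mean()
    phi = np.corrcoef(x[1:], x[:-1])[0, 1]               # lag-1 autocorrelation
    sigma_w = np.std(x, ddof=1) * np.sqrt(1 - phi**2)    # innovation standard deviation
    return mu, phi, sigma_w

def simulate_ar1(mu, phi, sigma_w, n, rng):
    """Generate a synthetic AR(1) series with the given parameters."""
    x = np.empty(n)
    x[0] = mu
    for t in range(1, n):
        x[t] = mu + phi * (x[t - 1] - mu) + sigma_w * rng.standard_normal()
    return x

rng = np.random.default_rng(4)
observed = simulate_ar1(10.0, 0.6, 2.0, 5000, rng)   # stand-in for an observed record
params = fit_ar1(observed)
synthetic = simulate_ar1(*params, 5000, rng)
print(params)                                        # close to (10.0, 0.6, 2.0)
```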
Nonparametric/Nonlinear approach
to time series modeling
e.g. NP(1): multivariate nonparametric $\hat{f}(X_t, X_{t-1})$ estimated directly from data, then used to obtain $\hat{f}(X_t \mid X_{t-1})$
• 2nd Moments and Skewness inherited by distribution
• Seasonality through separate distribution for each season
Other variants
$\hat{f}(X_t \mid X_{t-1})$ estimated directly using the nearest neighbor method (KNN)
$\hat{f}(X_t \mid X_{t-1})$ represented as $X_t = LP(X_{t-1}) + V_t$: a local polynomial trend function plus residual
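A minimal sketch of the nearest-neighbor idea for sampling from the conditional distribution: given the current value, resample the successor of one of its k nearest historical neighbors. The value of k, the rank-based resampling weights, and the synthetic record are illustrative assumptions rather than the specific KNN scheme from the lecture:

```python
import numpy as np

def knn_conditional_sample(history, x_current, k, rng):
    """Sample X_t given X_{t-1} = x_current by resampling successors of nearest neighbors."""
    predecessors = history[:-1]
    successors = history[1:]
    # Indices of the k historical values closest to the current state
    idx = np.argsort(np.abs(predecessors - x_current))[:k]
    # Weight closer neighbors more heavily (a common choice: weights ~ 1/rank)
    weights = 1.0 / np.arange(1, k + 1)
    weights /= weights.sum()
    chosen = rng.choice(idx, p=weights)
    return successors[chosen]

rng = np.random.default_rng(5)
history = np.cumsum(rng.standard_normal(1000)) * 0.1 + 10   # stand-in historical record
x, simulated = history[-1], []
for _ in range(100):
    x = knn_conditional_sample(history, x, k=10, rng=rng)
    simulated.append(x)
print(simulated[:5])
```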