Modeling phylogenetic comparative method with
(Non)normal Distributed Stochastic Processes
Tony Jhwueng
5th Annual Graduate Student Probability Conference
April 30, 2011.
Outline:
1. Modeling phylogenetic comparative method with normal stochastic processes (BM and OU).
2. Modeling phylogenetic comparative method with a non-normal stochastic process (CIR).
A phylogenetic tree of 3 species
[Figure: phylogenetic tree with tips Chimpanzee, Human, and Gorilla; divergence times labelled 4.6 million years ago and 7.2 million years ago.]
(Takahata et al. 1995. J. of Theo. Bio.)
Comparative Data
Examples:
Body mass (adult male)(kg)
Brain mass (adult male)(gram)
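\[
X_{\text{body}} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 56.7 \\ 55.5 \\ 172.4 \end{pmatrix}, \qquad
Y_{\text{brain}} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 440 \\ 1361 \\ 570 \end{pmatrix}.
\]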
(Jerison. 1973. Evolution of the Brain and Intelligence.)
Phylogenetic tree and comparative data
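[Figure: the phylogenetic tree drawn against a trait value axis, with time marks t = 0, t = s, and t = T; the observed values y1, y2, y3 sit at the tips for species 1, 2, 3.]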
Trait evolution under Brownian motion assumption
\[ dy_t = \sigma\, dB_t \]
[Figure: two panels of simulated Brownian motion trait paths, titled "BM: σ=3" and "BM: σ=1"; horizontal axis Time t (0 to 8000), vertical axis Trait Value y(t) (−400 to 400).]
Covariation between pair of species
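\[ dy_t = \sigma\, dB_t, \qquad 0 \le t \le T. \]
\[ y_t - y_{t-1} = \sigma (B_t - B_{t-1}). \]
\[ y_t = y_{t-1} + \sigma (B_t - B_{t-1}) = \cdots = y_0 + \sigma (B_t - B_0). \]
For two species (variables) $y_{1,t}$ and $y_{2,t}$,
\[
\begin{aligned}
\operatorname{cov}(y_{1,t}, y_{2,t}) &= \operatorname{cov}\left[ y_{1,0} + \sigma_i (B_{1,t} - B_{1,0}),\; y_{2,0} + \sigma_j (B_{2,t} - B_{2,0}) \right] \\
&= \sigma_i \sigma_j \operatorname{cov}[B_{1,t}, B_{2,t}] \\
&= \sigma_i \sigma_j s
\end{aligned}
\]
where $B_{1,t} = B_{2,t}$ for $0 \le t \le s$, and the increments of $B_{1,t}$ and $B_{2,t}$ are independent for $s < t \le T$.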
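The covariance identity above can be checked numerically. The following is a minimal sketch, assuming a single rate σ for both lineages (σ_i = σ_j = σ), a root value of 0, and illustrative values for s and T; the function name `simulate_shared_bm` is hypothetical and not part of the talk.

```python
import numpy as np

def simulate_shared_bm(sigma, s, T, n_rep, rng):
    """Tip values of two lineages that share one BM path on [0, s] and then
    evolve independently on (s, T]. The root value y0 is assumed to be 0."""
    # Shared increment over [0, s]: the same draw feeds both lineages.
    shared = sigma * np.sqrt(s) * rng.standard_normal(n_rep)
    # Independent increments over (s, T]: one fresh draw per lineage.
    y1 = shared + sigma * np.sqrt(T - s) * rng.standard_normal(n_rep)
    y2 = shared + sigma * np.sqrt(T - s) * rng.standard_normal(n_rep)
    return y1, y2

rng = np.random.default_rng(0)
sigma, s, T = 1.0, 0.6, 1.0          # illustrative values, not from the slides
y1, y2 = simulate_shared_bm(sigma, s, T, 200_000, rng)

empirical_cov = np.cov(y1, y2)[0, 1]  # Monte Carlo estimate
theoretical_cov = sigma**2 * s        # sigma_i * sigma_j * s with equal rates
print(empirical_cov, theoretical_cov)
```

Only the shared segment contributes to the covariance, which is why the estimate tracks σ²s rather than σ²T.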
The distribution function for data analysis
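Let $Y_t = (y_{1,t}, y_{2,t}, \dots, y_{n,t})^{\mathsf T}$, $0 < t \le T$. The distribution of $Y_t$ constrained by (1) the Brownian motion process and (2) tree dependency is
\[ Y_t \sim \mathrm{MVN}(\mu \mathbf{1}, C_t) \]
where
\[ C_t[i,j] = \operatorname{cov}[y_{i,t}, y_{j,t}] = \sigma_i \sigma_j s \]
and $y_{i,t} = y_{j,t}$ for $0 \le t \le s$, that is, $s$ is the time species $i$ and $j$ share before they diverge.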
An example
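[Figure: tree in which Human ($y_1$) and Chimp. ($y_2$) share a branch of length 0.6 and then diverge along branches of length 0.4 each; Gorilla ($y_3$) evolves along a separate branch of length 1.]
\[
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \mu \\ \mu \\ \mu \end{pmatrix},\; \sigma^2 \begin{pmatrix} 1 & 0.6 & 0 \\ 0.6 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right)
\]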
Evolutionary correlation between traits can also be studied through
either maximum likelihood analysis or regression analysis.
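As an illustration of the maximum likelihood route, here is a minimal sketch that fits the common mean µ and rate σ² of the three-species BM model by generalized least squares. It assumes the tree from the example (shared time 0.6 for Human and Chimp., total height 1) and reuses the brain masses from the Comparative Data slide in the order Human, Chimp., Gorilla; that pairing of data with the unit-height tree is an assumption made only for illustration, and the function names are hypothetical.

```python
import numpy as np

def gls_mean(y, C):
    """ML estimate of the common mean mu under Y ~ MVN(mu*1, sigma^2 * C)."""
    ones = np.ones(len(y))
    return (ones @ np.linalg.solve(C, y)) / (ones @ np.linalg.solve(C, ones))

def mvn_loglik(y, mu, cov):
    """Log density of MVN(mu*1, cov) evaluated at y."""
    r = y - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = r @ np.linalg.solve(cov, r)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)

# Shared-time matrix of the example tree (entries are the shared times s).
C = np.array([[1.0, 0.6, 0.0],
              [0.6, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
# Brain mass in grams, assumed order: Human, Chimp., Gorilla.
y = np.array([1361.0, 440.0, 570.0])

mu_hat = gls_mean(y, C)
resid = y - mu_hat
sigma2_hat = resid @ np.linalg.solve(C, resid) / len(y)   # ML estimate of sigma^2
loglik = mvn_loglik(y, mu_hat, sigma2_hat * C)
print(mu_hat, sigma2_hat, loglik)
```

Because Human and Chimp. are correlated, each receives less weight than Gorilla in the estimate of µ, which is the practical effect of the tree dependency.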
Trait Evolution under OU process
Let yt be an Ornstein-Uhlenbeck process, that is
dyt = α(µ − yt )dt + σdBt
where Bt is a Brownian motion and α, σ > 0, µ ∈ R. Note: α is
called a constraining force that pulls extreme values back towards
the optimal state µ.
The covariation between a pair of variables under the constraint of (1) the OU process and (2) tree dependency is
\[ Y_t \sim \mathrm{MVN}(\mu \mathbf{1}, V_t) \]
where
\[ V_t[i,j] = \operatorname{cov}[y_{i,t}, y_{j,t}] = \sigma_i \sigma_j \exp(-2\alpha (T - s))\, \frac{1 - \exp(-2\alpha s)}{2\alpha} \]
and $y_{i,t} = y_{j,t}$ for $0 \le t \le s$.
Sketch of proof: similar to BM case, take care of
non-independent increment property.
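The OU covariance formula can be evaluated directly from the shared times. The sketch below assumes a single rate (σ_i = σ_j = σ, passed as `sigma2`) and applies the formula to the three-species tree from the BM example; the function name `ou_covariance` is hypothetical.

```python
import numpy as np

def ou_covariance(shared_time, T, alpha, sigma2=1.0):
    """V[i, j] = sigma2 * exp(-2*alpha*(T - s)) * (1 - exp(-2*alpha*s)) / (2*alpha),
    where s = shared_time[i, j] and T is the total tree height."""
    s = np.asarray(shared_time, dtype=float)
    return sigma2 * np.exp(-2 * alpha * (T - s)) * (1 - np.exp(-2 * alpha * s)) / (2 * alpha)

# Shared times for the three-species tree (Human, Chimp., Gorilla).
shared = np.array([[1.0, 0.6, 0.0],
                   [0.6, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

V = ou_covariance(shared, T=1.0, alpha=0.5)
# As alpha -> 0 the constraining force vanishes and the BM matrix is recovered.
V_weak = ou_covariance(shared, T=1.0, alpha=1e-6)
print(np.round(V, 3))
print(np.round(V_weak, 3))
```

Compared with BM, the factor exp(−2α(T − s)) discounts covariance for species that diverged long ago, so a larger α erases shared history faster.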
Trait evolution under OU process
with randomly evolving environment
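\[ dy = -\alpha_1 (y - \mu)\, dt + \sigma_1\, dB_t \tag{1} \]
\[ d\mu = -\alpha_2 \mu\, dt + \sigma_2\, dB_t \tag{2} \]
Special case: $\alpha_2 = 0$ (the optimum $\mu$ evolves according to BM)
\[ dy = -\alpha (y - \mu)\, dt + \sigma_1\, dB_t \tag{3} \]
\[ d\mu = \sigma_2\, dB_t \tag{4} \]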
Special case: µ: BM
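The covariation between a pair of species is
\[
\operatorname{cov}(y_i, y_j) = \frac{\sigma^2}{\alpha} \left(1 - \exp(-2\alpha t_a)\right) \exp(-\alpha t_{ij})
+ \sigma^2 t_a \left( 1 - \frac{2 \exp(-\alpha t_{ij}/2) \left(1 - \exp(-\alpha t_a)\right)}{\alpha t_a} \right),
\]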
where $t_a$ is the time from the base of the phylogeny to the most recent common ancestor, $t_i$ is the time from the base of the phylogeny to species $i$, and $t_{ij}$ is the time separating the two species.
Short description of proof: Differential equations (1) and (2) for the first and second moments can be derived by using Itô's formula to obtain differentials for the power terms and then taking expectations. Solving the resulting system of ODEs yields the result.
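A Monte Carlo check of the covariance for the BM-optimum special case can be sketched as follows. This is only an illustration under stated assumptions: the two Brownian motions driving y and µ are taken to be independent, σ1 = σ2 = σ, both processes start at 0 at the base of the tree, and the first term of the closed form is read as (σ²/α)(1 − exp(−2αt_a)) exp(−αt_ij). The Euler-Maruyama scheme, the parameter values, and the function names are choices made here, not part of the talk.

```python
import numpy as np

def closed_form_cov(alpha, sigma, t_a, t_ij):
    """Covariance of two tips when the optimum mu follows BM (sigma1 = sigma2)."""
    first = sigma**2 / alpha * (1 - np.exp(-2 * alpha * t_a)) * np.exp(-alpha * t_ij)
    second = sigma**2 * t_a * (
        1 - 2 * np.exp(-alpha * t_ij / 2) * (1 - np.exp(-alpha * t_a)) / (alpha * t_a)
    )
    return first + second

def euler_segment(y, mu, alpha, sigma, duration, dt, rng):
    """Euler-Maruyama steps of dy = -alpha*(y - mu)dt + sigma dB1, dmu = sigma dB2,
    with B1 and B2 assumed independent. Returns new arrays; inputs are not modified."""
    for _ in range(int(round(duration / dt))):
        dB_y = np.sqrt(dt) * rng.standard_normal(y.shape)
        dB_mu = np.sqrt(dt) * rng.standard_normal(y.shape)
        y = y - alpha * (y - mu) * dt + sigma * dB_y   # uses mu before its update
        mu = mu + sigma * dB_mu
    return y, mu

rng = np.random.default_rng(1)
alpha, sigma, t_a, t_ij = 1.0, 1.0, 1.0, 1.0   # illustrative values
n_rep, dt = 100_000, 0.01

zeros = np.zeros(n_rep)
# Shared history from the base of the tree to the common ancestor.
y_a, mu_a = euler_segment(zeros, zeros, alpha, sigma, t_a, dt, rng)
# Two independent lineages, each evolving for t_ij / 2 after the split.
y_i, _ = euler_segment(y_a, mu_a, alpha, sigma, t_ij / 2, dt, rng)
y_j, _ = euler_segment(y_a, mu_a, alpha, sigma, t_ij / 2, dt, rng)

simulated = np.cov(y_i, y_j)[0, 1]
expected = closed_form_cov(alpha, sigma, t_a, t_ij)
print(simulated, expected)
```

The simulated value carries both Monte Carlo noise and a small discretization bias of order dt, so agreement is expected only to about two decimal places.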
µ: OU
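Let $\alpha_1 = \alpha_2 = \alpha$ and $\sigma_1 = \sigma_2 = \sigma$.
\[ dy = -\alpha (y - \mu)\, dt + \sigma\, dB_t \tag{5} \]
\[ d\mu = -\alpha \mu\, dt + \sigma\, dB_t \tag{6} \]
Then we have
\[
\begin{aligned}
\operatorname{Cov}[y_i, y_j] &= \operatorname{Cov}\left[ E[y_i \mid y_a],\; E[y_j \mid y_a] \right] \\
&= \left( \frac{\alpha t_{ij}}{2} \right)^2 \frac{1 - \exp(-2\alpha t_a)}{2\alpha} \exp(-\alpha t_{ij}) \\
&\quad + \exp(-\alpha t_{ij}) \left\{ \frac{3 \left(1 - \exp(-2\alpha t_a)\right)}{4\alpha} - \frac{t_a (1 + \alpha t_a) \exp(-2\alpha t_a)}{2} \right\} \\
&\quad + \alpha t_{ij} \exp(-\alpha t_{ij}) \left\{ \frac{1 - \exp(-2\alpha t_a)}{4\alpha} - \frac{t_a \exp(-2\alpha t_a)}{2} \right\}
\end{aligned}
\]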
BM and OU models are normally distributed. However, real data are rarely normally distributed (even after log transformation), which calls for:
Non-normal stochastic processes with tree-type dependency.
Examples
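[Figure: four-species tree rooted at $y_{anc}$; $y_1$ and $y_2$ share a branch of length 0.8 and then diverge along tip branches of length 0.2; $y_3$ and $y_4$ share a branch of length 0.4 and then diverge along tip branches of length 0.6.]
BM
\[
\begin{pmatrix} 1 & 0.8 & 0 & 0 \\ 0.8 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0.4 \\ 0 & 0 & 0.4 & 1 \end{pmatrix}
\]
OU, α = 0.5
\[
\begin{pmatrix} 0.63 & 0.45 & 0 & 0 \\ 0.45 & 0.63 & 0 & 0 \\ 0 & 0 & 0.63 & 0.18 \\ 0 & 0 & 0.18 & 0.63 \end{pmatrix}
\]
Non-normal
\[
\begin{pmatrix} ? & ? & 0 & 0 \\ ? & ? & 0 & 0 \\ 0 & 0 & ? & ? \\ 0 & 0 & ? & ? \end{pmatrix}
\]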
A possible approach: CIR process
Cox-Ingersoll-Ross (CIR) model
\[ dy_t = \alpha (\mu - y_t)\, dt + \sigma \sqrt{y_t}\, dB_t \]
Goal: determine the distribution (i.e. construct the covariation between a pair of variables).
CIR
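The probability density at time $t$, conditional on its value at the current time $s$, is given by:
\[ f(y(t), t; y(s), s) = c \exp[-u - v] \left( \frac{v}{u} \right)^{q/2} I_q\left( 2 (uv)^{1/2} \right), \]
where
\[ c = \frac{2\alpha}{\sigma^2 \left(1 - \exp[-\alpha (t - s)]\right)}, \quad u = c\, y(s) \exp[-\alpha (t - s)], \quad v = c\, y(t), \quad q = \frac{2\alpha\theta}{\sigma^2} - 1, \]
and $I_q(\cdot)$ is the modified Bessel function of the first kind of order $q$. Here $\theta$ denotes the long-run level, written $\mu$ in the SDE.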
CIR
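The distribution function is the noncentral chi-square, $\chi^2[2c\, y(t);\; 2q + 2,\; 2u]$, with $2q + 2$ degrees of freedom and noncentrality parameter $2u$.
Straightforward calculations give the expected value and variance of $y(t)$ as:
\[ E[y(t) \mid y(s)] = y(s) \exp[-\alpha (t - s)] + \theta \left(1 - \exp[-\alpha (t - s)]\right) \]
and
\[ \operatorname{var}(y(t) \mid y(s)) = y(s)\, \frac{\sigma^2}{\alpha} \left( \exp[-\alpha (t - s)] - \exp[-2\alpha (t - s)] \right) + \frac{\theta \sigma^2}{2\alpha} \left(1 - \exp[-\alpha (t - s)]\right)^2 \]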
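The noncentral chi-square characterization gives a way to draw exact samples of y(t) given y(s), which in turn lets the moment formulas be checked numerically. The sketch below assumes the elapsed time t − s is passed as `dt`, uses NumPy's `Generator.noncentral_chisquare`, and picks illustrative parameter values; the function names are hypothetical.

```python
import numpy as np

def cir_sample(y_s, alpha, theta, sigma, dt, size, rng):
    """Exact draws of y(t) given y(s) = y_s, with dt = t - s.
    Uses the fact that 2*c*y(t) is noncentral chi-square with
    2q + 2 degrees of freedom and noncentrality parameter 2u."""
    c = 2 * alpha / (sigma**2 * (1 - np.exp(-alpha * dt)))
    u = c * y_s * np.exp(-alpha * dt)
    q = 2 * alpha * theta / sigma**2 - 1
    draws = rng.noncentral_chisquare(df=2 * q + 2, nonc=2 * u, size=size)
    return draws / (2 * c)

def cir_mean(y_s, alpha, theta, dt):
    return y_s * np.exp(-alpha * dt) + theta * (1 - np.exp(-alpha * dt))

def cir_var(y_s, alpha, theta, sigma, dt):
    e1 = np.exp(-alpha * dt)
    return y_s * sigma**2 / alpha * (e1 - e1**2) + theta * sigma**2 / (2 * alpha) * (1 - e1)**2

rng = np.random.default_rng(2)
y_s, alpha, theta, sigma, dt = 1.0, 0.5, 2.0, 0.8, 1.0   # illustrative values
samples = cir_sample(y_s, alpha, theta, sigma, dt, 200_000, rng)

print(samples.mean(), cir_mean(y_s, alpha, theta, dt))
print(samples.var(), cir_var(y_s, alpha, theta, sigma, dt))
```

All draws are nonnegative and right-skewed, which is the non-normal behavior that motivates the CIR process here.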
Some ongoing work
We first consider a star-shaped phylogeny, which means that species evolve independently. Fitting the CIR process to a tree of this type, the joint distribution of $Y_t = (y_{1,t}, y_{2,t}, \dots, y_{n,t})^{\mathsf T}$ is
\[ \prod_{i=1}^{n} f(y_i(t), y_i(0)) \]
1. How can the distribution be determined when two variables have tree-type dependency in general?
2. Does the non-normal process supply adequate information for the comparative data?
3. How does the fit to the data compare between the non-normal models and the normal models?