Download From Gene Trees with HGT to Species Trees

Document related concepts

Artificial gene synthesis wikipedia , lookup

Designer baby wikipedia , lookup

Quantitative comparative linguistics wikipedia , lookup

Maximum parsimony (phylogenetics) wikipedia , lookup

Gene expression programming wikipedia , lookup

Koinophilia wikipedia , lookup

Microevolution wikipedia , lookup

Computational phylogenetics wikipedia , lookup

Transcript
From Gene Trees with HGT to Species Trees
Marc Hellmuth
Dptm. Mathematics and Computer Science
University of Greifswald, Germany
+ Nikolai, Manuela, Nic, Sarah, Paul, John, Peter
TBI-Winterseminar 2017
Intro
From Gene Tree to Species Tree
Phylogenomics . . .
. . . aims at finding plausible hypotheses of the evolutionary history of
genes or species based on genomic sequence information
(1837)
“I think” by Charles Darwin
(2014)
http://www.greennature.ca
1/8
Intro
From Gene Tree to Species Tree
Central Objects: Rooted Trees and Triples
Rooted tree T:
a
b
x
y
z
connected,acyclic graph
2/8
Intro
From Gene Tree to Species Tree
Central Objects: Rooted Trees and Triples
Rooted tree T:
Triples:
T displays a triple (ab|z) if the path from a to b
does not intersect the path from z to the root.
a
b
x
y
z
2/8
Intro
From Gene Tree to Species Tree
Central Objects: Rooted Trees and Triples
Rooted tree T:
Triples:
T displays a triple (ab|z) if the path from a to b
does not intersect the path from z to the root.
a
b
x
y
z
2/8
Intro
From Gene Tree to Species Tree
Central Objects: Rooted Trees and Triples
Rooted tree T:
Triples:
T displays a triple (ab|z) if the path from a to b
does not intersect the path from z to the root.
R(T ) = {(ab|x), (ab|y), (ab|z), (xy|a) . . . }
a
b
x
y
z
2/8
Intro
From Gene Tree to Species Tree
Central Objects: Rooted Trees and Triples
Rooted tree T:
Triples:
T displays a triple (ab|z) if the path from a to b
does not intersect the path from z to the root.
R(T ) = {(ab|x), (ab|y), (ab|z), (xy|a) . . . }
An arbitrary set of triples R is consistent,
if there is a tree T with R ⊆ R(T )
a
b
x
y
z
2/8
Intro
From Gene Tree to Species Tree
Central Objects: Rooted Trees and Triples
Rooted tree T:
Triples:
T displays a triple (ab|z) if the path from a to b
does not intersect the path from z to the root.
R(T ) = {(ab|x), (ab|y), (ab|z), (xy|a) . . . }
An arbitrary set of triples R is consistent,
if there is a tree T with R ⊆ R(T )
a
b
x
y
z
Consistency-check via BUILD in polynomial time.
BUILD returns also a tree T with R ⊆ R(T ) if one exists.
2/8
Intro
From Gene Tree to Species Tree
The “true” evolutionary History
G
F
E
A
B
C
D
3/8
Intro
From Gene Tree to Species Tree
The “true” evolutionary History
speciation
duplication
HGT
• species are characterized by its genome:
a “bag of genes”
x
x
• “Genes” evolve along a rooted tree with
unique event labeling
t : V 0 → M = {•, , N}
x
a
A
b1 b2 b3
B
c1 c2 c3
C
d
D
3/8
Intro
From Gene Tree to Species Tree
The “true” evolutionary History
speciation
duplication
HGT
• species are characterized by its genome:
a “bag of genes”
x
x
• “Genes” evolve along a rooted tree with
unique event labeling
t : V 0 → M = {•, , N}
x
a
Gene duplication : an offspring has two
copies of a single gene of its ancestor
A
b1 b2 b3
B
c1 c2 c3
C
d
D
• Speciation : two offspring species inherit
the entire genome of their common
ancestor
HGT
N HGT : transfer of genes between
organisms in a manner other than
traditional reproduction and across
different species
3/8
Intro
From Gene Tree to Species Tree
The “true” evolutionary History
speciation
duplication
HGT
• species are characterized by its genome:
a “bag of genes”
x
x
• “Genes” evolve along a rooted tree with
unique event labeling
t : V 0 → M = {•, , N}
x
a
A
b1 b2 b3
B
c1 c2 c3
C
d
D
In practice, it is possible to compute (observable part of) event-labeled gene
trees directly from sequence data (using orthologs, xenologs, paralogs)
without the need to know the species tree!
3/8
Intro
From Gene Tree to Species Tree
speciation
duplication
HGT
x
x
x
x
a
A
x
x
b1 b2 b3
B
c1 c2 c3
C
d
D
a
A
b1 b2 b3
B
c1 c2 c3
C
d
D
In practice, it is possible to compute (observable part of) event-labeled gene
trees directly from sequence data (using orthologs, xenologs, paralogs)
without the need to know the species tree!
4/8
Intro
From Gene Tree to Species Tree
speciation
duplication
HGT
x
x
x
a
A
b1 b2 b3
B
c1 c2 c3
C
d
D
a
A
b1 b2 b3
B
c1 c2 c3
C
d
D
In practice, it is possible to compute (observable part of) event-labeled gene
trees directly from sequence data (using orthologs, xenologs, paralogs)
without the need to know the species tree!
4/8
Intro
From Gene Tree to Species Tree
speciation
duplication
HGT
x
x
x
a
b1 b2 b3
A
Questions:
B
c1 c2 c3
C
d
D
a
A
b1 b2 b3
B
c1 c2 c3
C
d
D
Are the gene trees biologically feasible? ⇐⇒
Are there species trees for the gene trees?
4/8
Intro
From Gene Tree to Species Tree
speciation
duplication
HGT
x
x
x
a
b1 b2 b3
A
Questions:
B
c1 c2 c3
C
d
D
a
A
b1 b2 b3
B
c1 c2 c3
C
d
D
Are the gene trees biologically feasible? ⇐⇒
Are there species trees for the gene trees?
If so, how to construct species tree? ⇐⇒
4/8
Intro
From Gene Tree to Species Tree
speciation
duplication
HGT
x
x
x
a
b1 b2 b3
A
Questions:
B
c1 c2 c3
C
d
D
a
b1 b2 b3
A
B
c1 c2 c3
C
d
D
Are the gene trees biologically feasible? ⇐⇒
Are there species trees for the gene trees?
If so, how to construct species tree? ⇐⇒
How much information about a (putative) species
tree is contained in the gene tree?
4/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
a
A
Question:
b1 b2
B
c1 c2 c3
C
d
D
A
B
C
D
When does there exist a species tree for a given gene tree
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
a
A
Question:
b1 b2
B
c1 c2 c3
C
d
D
A
B
C
D
When does there exist a species tree for a given gene tree
and a reconciliation map µ between them?
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
a
A
Question:
b1 b2
B
c1 c2 c3
C
d
D
A
B
C
D
When does there exist a species tree for a given gene tree
and a reconciliation map µ between them?
µ : V (T ) → V (S ) ∪ E (S ) such that µ preserves T .
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
For three leaves a, b, c in T we write (ab|c) if
the path from a to b does not intersect the path
from c to the root.
a
A
Question:
b1 b2
B
c1 c2 c3
C
d
D
When does there exist a species tree for a given gene tree
and a reconciliation map µ between them?
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
For three leaves a, b, c in T we write (ab|c) if
the path from a to b does not intersect the path
from c to the root.
a
A
b1 b2
B
c1 c2 c3
C
d
D
R(T ) = {(ab1 |x) with x = b2 , c1 , c2 , c3 , d ;
(ab2 |x) with x = c1 , c2 , c3 , d ;
(b1 b2 |x) with x = c1 , c2 , c3 , d ;
...}
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
For three leaves a, b, c in T we write (ab|c) if
the path from a to b does not intersect the path
from c to the root.
a
A
b1 b2
B
c1 c2 c3
C
d
D
R(T ) = {(ab1 |x) with x = b2 , c1 , c2 , c3 , d ;
(ab2 |x) with x = c1 , c2 , c3 , d ;
(b1 b2 |x) with x = c1 , c2 , c3 , d ;
...}
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
For three leaves a, b, c in T we write (ab|c) if
the path from a to b does not intersect the path
from c to the root.
We write (ab|c)• if (ab|c) ∈ R(T )
lca(a, b, c ) = • = “speciation00
a
A
b1 b2
B
c1 c2 c3
C
d
D
R(T ) = {(ab1 |x) with x = b2 , c1 , c2 , c3 , d ;
(ab2 |x) with x = c1 , c2 , c3 , d ;
(b1 b2 |x) with x = c1 , c2 , c3 , d ;
...}
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
For three leaves a, b, c in T we write (ab|c) if
the path from a to b does not intersect the path
from c to the root.
We write (ab|c)• if (ab|c) ∈ R(T )
lca(a, b, c ) = • = “speciation00
a
A
b1 b2
B
c1 c2 c3
C
d
D
Examples: (ab1 |c2 )• , (ab1 |d)• , (b2 c3 |d)• (ac2 |d)• , . . .
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
For three leaves a, b, c in T we write (ab|c) if
the path from a to b does not intersect the path
from c to the root.
We write (ab|c)• if (ab|c) ∈ R(T )
lca(a, b, c ) = • = “speciation00
We know the assignment of genes to the
species in which they occur. This gives us the
triple set:
a
A
b1 b2
B
c1 c2 c3
C
d
D
S = {(AB|C) : ∃ (ab|c)• with a ∈ A, b ∈ B , c ∈ C }
Examples: (ab1 |c2 )• , (ab1 |d)• , (b2 c3 |d)• (ac2 |d)• , . . .
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
For three leaves a, b, c in T we write (ab|c) if
the path from a to b does not intersect the path
from c to the root.
We write (ab|c)• if (ab|c) ∈ R(T )
lca(a, b, c ) = • = “speciation00
We know the assignment of genes to the
species in which they occur. This gives us the
triple set:
a
A
b1 b2
B
c1 c2 c3
C
d
D
S = {(AB|C) : ∃ (ab|c)• with a ∈ A, b ∈ B , c ∈ C }
Examples: (ab1 |c2 )• , (ab1 |d)• , (b2 c3 |d)• (ac2 |d)• , . . .
S = {(AB|C), (AB|D), (BC|D), (AC|D)}
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
For three leaves a, b, c in T we write (ab|c) if
the path from a to b does not intersect the path
from c to the root.
We write (ab|c)• if (ab|c) ∈ R(T )
lca(a, b, c ) = • = “speciation00
We know the assignment of genes to the
species in which they occur. This gives us the
triple set:
a
A
b1 b2
B
c1 c2 c3
C
d
D
S = {(AB|C) : ∃ (ab|c)• with a ∈ A, b ∈ B , c ∈ C }
S = {(AB|C), (AB|D), (BC|D), (AC|D)}
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
For three leaves a, b, c in T we write (ab|c) if
the path from a to b does not intersect the path
from c to the root.
We write (ab|c)• if (ab|c) ∈ R(T )
lca(a, b, c ) = • = “speciation00
We know the assignment of genes to the
species in which they occur. This gives us the
triple set:
a
A
b1 b2
B
c1 c2 c3
C
d
D
S = {(AB|C) : ∃ (ab|c)• with a ∈ A, b ∈ B , c ∈ C }
Theorem (2012)
There is a species tree S for the gene tree T ⇐⇒ the triple set S is consistent
(can be tested efficiently).
A reconciliation map µ from T to S can be constructed in polynomial time.
From Event-Labeled Gene Trees to Species Trees., H.-Rosales M et al. BMC Bioinformatics, 2012
5/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
w
v
v
xx x
w
x
a
A
b1 b2
B
Observation 1:
c1
C
c2
d
D
a
b1 c1 c2
b2
d
Partial Order in Gene tree T and Species tree S: w ≺T v but µ(w ) 6S µ(v )
6/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
w
v
v
xx x
w
x
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
d
Observation 1:
Partial Order in Gene tree T and Species tree S: w ≺T v but µ(w ) 6S µ(v )
Observation 2:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
6/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
xx x
x
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
d
Observation 1:
Partial Order in Gene tree T and Species tree S: w ≺T v but µ(w ) 6S µ(v )
Observation 2:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
6/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
A
B
C
D
a
b1 c1 c2
b2
d
Observation 1:
Partial Order in Gene tree T and Species tree S: w ≺T v but µ(w ) 6S µ(v )
Observation 2:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
Define µ on T ? and T ? (µ preserves order on conn.components of T ? )
If (v , w ) is transfer-edges, then µ(v ) and µ(w ) should be incomparable
6/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
A
B
C
D
a
b1 c1 c2
b2
d
Observation 1:
Partial Order in Gene tree T and Species tree S: w ≺T v but µ(w ) 6S µ(v )
Observation 2:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
Define µ on T ? and T ? (µ preserves order on conn.components of T ? )
If (v , w ) is transfer-edges, then µ(v ) and µ(w ) should be incomparable
→ Consistent with standard “DTL-scenarios”
Simultaneous Identification of Duplications and Lateral Gene Transfers., Tofigh et al. IEEE/ACM TCBB, 2011
6/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
xx x
x
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
Observation:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
d
Use only (xy|z)• of conn.comp. of T ? → (ab1 |d)• → species triple (AB|D)
6/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
xx x
x
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
Observation:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
d
Use only (xy|z)• of conn.comp. of T ? → (ab1 |d)• → species triple (AB|D) (not sufficient)
6/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
v
xx x
w
x
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
Observation:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
d
Use only (xy|z)• of conn.comp. of T ? → (ab1 |d)• → species triple (AB|D) (not sufficient)
For each transfer edge (v , w ) add all triples (xy|z)N with
x , y descend. of w, z descend. of v in T ? (or vice versa):
(c2 b2 |d)N → (CB|D)
6/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
xx x
v
x
a
A
b1 b2
B
c1
C
c2
d
D
a
w
b1 c1 c2
b2
Observation:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
d
Use only (xy|z)• of conn.comp. of T ? → (ab1 |d)• → species triple (AB|D) (not sufficient)
For each transfer edge (v , w ) add all triples (xy|z)N with
x , y descend. of w, z descend. of v in T ? (or vice versa):
(c2 b2 |d)N → (CB|D) and (ab1 |c1 )N → (AB|C)
6/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
d
S = {(AB|D), (CB|D), (AB|C)}
6/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
d
S = {(AB|D), (CB|D), (AB|C)}
Theorem (2017)
There is a species tree S for the gene tree T with HGT ⇐⇒
the triple set S is consistent (can be tested efficiently).
A reconciliation map µ from T to S can be constructed in polynomial time.
From Event-Labeled Gene Trees with HGT to Species Trees., Hellmuth M. submitted J. Math. Biol., 2017
6/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
An issue that is not covered, so-far:
µ(u ) ≺S µ(u 0 ) and µ(v ) S µ(v 0 )
If we associate a “time” τ to vertices of T :
τu 0 = τv 0 , but τu 0 < τu = τv < τv 0 E
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
Each species tree S must display S.
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
Each species tree S must display S.
Here, S is the only tree that displays S and
µ is the only reconciliation map from T to S
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
Each species tree S must display S.
Here, S is the only tree that displays S and
µ is the only reconciliation map from T to S
⇒ There is no TIME - CONSISTENT (TC) scenario for T !
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
Each species tree S must display S.
Here, S is the only tree that displays S and
µ is the only reconciliation map from T to S
⇒ There is no TIME - CONSISTENT (TC) scenario for T !
⇒ Consistency of S is still necessary for TC scenario, but not sufficient!
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
There is no sufficient TC-condition provided in the literature.
Hence, we need to establish one!
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
1
2.
.
.
time
x
u
v
y
a
A
b1 b2
B
c1 c2
C
d1 d2
D
e1 e2
E
A reconciliation map µ is TC if there is a gene-tree time map τT : V (T ) → R and species tree
time map τT : V (T ) → R such that
C0 u ≺T v =⇒ τT (u ) > τT (v )
C1 u ≺S v =⇒ τS (u ) > τS (v )
C2 If u speciation, then τT (u ) = τS (µ(u ))
C3 If u is no speciation, then τS (x ) < τT (u ) < τS (y ) where µ(u ) = (x , y ) ∈ E (S )
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
1
2.
.
.
time
x
u
v
y
a
A
b1 b2
B
c1 c2
C
d1 d2
D
e1 e2
E
C0-C3 are satisfied ⇐⇒ there is a time map τ : V (T ) → R such that for any x , y ∈ V (T ):
T0 If y ≺T x then τ(y ) > τ(x ).
T1 If x and y are speciation vertices and µ(x ) = µ(y ), then τ(x ) = τ(y ).
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
1
2.
.
.
time
x
u
v
y
a
A
b1 b2
B
c1 c2
C
d1 d2
D
e1 e2
E
C0-C3 are satisfied ⇐⇒ there is a time map τ : V (T ) → R such that for any x , y ∈ V (T ):
T0 If y ≺T x then τ(y ) > τ(x ).
T1 If x and y are speciation vertices and µ(x ) = µ(y ), then τ(x ) = τ(y ).
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
1
2.
.
.
time
x
u
v
y
a
A
b1 b2
B
c1 c2
C
d1 d2
D
e1 e2
E
C0-C3 are satisfied ⇐⇒ there is a time map τ : V (T ) → R such that for any x , y ∈ V (T ):
T0 If y ≺T x then τ(y ) > τ(x ).
T1 If x and y are speciation vertices and µ(x ) = µ(y ), then τ(x ) = τ(y ).
For given T and S one can determine in poly-time whether there is a TC-µ from T to S and
construct it. (proof by Nikolai Nøjgaard in Bled 2017)
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes - Open Problems
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
7/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes - Open Problems
0
v
u'
now
time
v'
u
a
A
b1
b2
c1
B
Showing that for given S there is no
c2
C
TC
d
D
e1 e2
E
µ 6⇒ NO species tree exists for T .
Maybe, there is another species tree that displays all triples in S with valid µ ?!
7/8
Intro
From Gene Tree to Species Tree
The general problem statement
A gene tree T is “biol. meaningful” IFF S is consistent.
Let S be the set of species tree that display S
A biol. meaningful gene tree T is “biologically feasible” ⇐⇒
there is a species tree S ∈ S with TC-µ from T to S.
Observation:
|S| “small” =⇒ only few restrictions on the species tree S =⇒
number of possible species trees |S| “high”
Example: If S = 0/ and n species, then |S| = (2n − 3)!!
Tasks:
Check if T is “biol. meaningful” → DONE
Check if for T exists TC-µ w.r.t. a given S → DONE
Check if T is “biologically feasible” → Poly-time? NP-hard?
Great Task for a PhD-student (Manuela, Sarah, Nikolai)!
8/8
Intro
From Gene Tree to Species Tree
The general problem statement
A gene tree T is “biol. meaningful” IFF S is consistent.
Let S be the set of species tree that display S
A biol. meaningful gene tree T is “biologically feasible” ⇐⇒
there is a species tree S ∈ S with TC-µ from T to S.
Observation:
|S| “small” =⇒ only few restrictions on the species tree S =⇒
number of possible species trees |S| “high”
Example: If S = 0/ and n species, then |S| = (2n − 3)!!
Tasks:
Check if T is “biol. meaningful” → DONE
Check if for T exists TC-µ w.r.t. a given S → DONE
Check if T is “biologically feasible” → Poly-time? NP-hard?
Great Task for a PhD-student (Manuela, Sarah, Nikolai)!
8/8
Intro
From Gene Tree to Species Tree
The general problem statement
A gene tree T is “biol. meaningful” IFF S is consistent.
Let S be the set of species tree that display S
A biol. meaningful gene tree T is “biologically feasible” ⇐⇒
there is a species tree S ∈ S with TC-µ from T to S.
Observation:
|S| “small” =⇒ only few restrictions on the species tree S =⇒
number of possible species trees |S| “high”
Example: If S = 0/ and n species, then |S| = (2n − 3)!!
Tasks:
Check if T is “biol. meaningful” → DONE
Check if for T exists TC-µ w.r.t. a given S → DONE
Check if T is “biologically feasible” → Poly-time? NP-hard?
Great Task for a PhD-student (Manuela, Sarah, Nikolai)!
8/8
Intro
From Gene Tree to Species Tree
The general problem statement
A gene tree T is “biol. meaningful” IFF S is consistent.
Let S be the set of species tree that display S
A biol. meaningful gene tree T is “biologically feasible” ⇐⇒
there is a species tree S ∈ S with TC-µ from T to S.
Observation:
|S| “small” =⇒ only few restrictions on the species tree S =⇒
number of possible species trees |S| “high”
Example: If S = 0/ and n species, then |S| = (2n − 3)!!
Tasks:
Check if T is “biol. meaningful” → DONE
Check if for T exists TC-µ w.r.t. a given S → DONE
Check if T is “biologically feasible” → Poly-time? NP-hard?
Great Task for a PhD-student (Manuela, Sarah, Nikolai)!
8/8
Intro
From Gene Tree to Species Tree
The general problem statement
A gene tree T is “biol. meaningful” IFF S is consistent.
Let S be the set of species tree that display S
A biol. meaningful gene tree T is “biologically feasible” ⇐⇒
there is a species tree S ∈ S with TC-µ from T to S.
Observation:
|S| “small” =⇒ only few restrictions on the species tree S =⇒
number of possible species trees |S| “high”
Example: If S = 0/ and n species, then |S| = (2n − 3)!!
Tasks:
Check if T is “biol. meaningful” → DONE
Check if for T exists TC-µ w.r.t. a given S → DONE
Check if T is “biologically feasible” → Poly-time? NP-hard?
Great Task for a PhD-student (Manuela, Sarah, Nikolai)!
8/8
Intro
From Gene Tree to Species Tree
The general problem statement
A gene tree T is “biol. meaningful” IFF S is consistent.
Let S be the set of species tree that display S
A biol. meaningful gene tree T is “biologically feasible” ⇐⇒
there is a species tree S ∈ S with TC-µ from T to S.
Observation:
|S| “small” =⇒ only few restrictions on the species tree S =⇒
number of possible species trees |S| “high”
Example: If S = 0/ and n species, then |S| = (2n − 3)!!
Tasks:
Check if T is “biol. meaningful” → DONE
Check if for T exists TC-µ w.r.t. a given S → DONE
Check if T is “biologically feasible” → Poly-time? NP-hard?
Great Task for a PhD-student (Manuela, Sarah, Nikolai)!
8/8
Intro
From Gene Tree to Species Tree
The general problem statement
A gene tree T is “biol. meaningful” IFF S is consistent.
Let S be the set of species tree that display S
A biol. meaningful gene tree T is “biologically feasible” ⇐⇒
there is a species tree S ∈ S with TC-µ from T to S.
Observation:
|S| “small” =⇒ only few restrictions on the species tree S =⇒
number of possible species trees |S| “high”
Example: If S = 0/ and n species, then |S| = (2n − 3)!!
Tasks:
Check if T is “biol. meaningful” → DONE
Check if for T exists TC-µ w.r.t. a given S → DONE
Check if T is “biologically feasible” → Poly-time? NP-hard?
Great Task for a PhD-student (Manuela, Sarah, Nikolai)!
8/8
Intro
From Gene Tree to Species Tree
The general problem statement
A gene tree T is “biol. meaningful” IFF S is consistent.
Let S be the set of species tree that display S
A biol. meaningful gene tree T is “biologically feasible” ⇐⇒
there is a species tree S ∈ S with TC-µ from T to S.
Observation:
|S| “small” =⇒ only few restrictions on the species tree S =⇒
number of possible species trees |S| “high”
Example: If S = 0/ and n species, then |S| = (2n − 3)!!
Tasks:
Check if T is “biol. meaningful” → DONE
Check if for T exists TC-µ w.r.t. a given S → DONE
Check if T is “biologically feasible” → Poly-time? NP-hard?
Great Task for a PhD-student (Manuela, Sarah, Nikolai)!
8/8
Intro
From Gene Tree to Species Tree
The general problem statement
A gene tree T is “biol. meaningful” IFF S is consistent.
Let S be the set of species tree that display S
A biol. meaningful gene tree T is “biologically feasible” ⇐⇒
there is a species tree S ∈ S with TC-µ from T to S.
Observation:
|S| “small” =⇒ only few restrictions on the species tree S =⇒
number of possible species trees |S| “high”
Example: If S = 0/ and n species, then |S| = (2n − 3)!!
Tasks:
Check if T is “biol. meaningful” → DONE
Check if for T exists TC-µ w.r.t. a given S → DONE
Check if T is “biologically feasible” → Poly-time? NP-hard?
Great Task for a PhD-student (Manuela, Sarah, Nikolai)!
8/8
Intro
From Gene Tree to Species Tree
THANKS TO
Nikolai, Nic, Peter, John, Paul, Sarah, Manuela ..
8/8
Intro
From Gene Tree to Species Tree
THANKS TO
Nikolai, Nic, Peter, John, Paul, Sarah, Manuela ..
.. and YOU!
– It’s time for an isotonic sports drink! –
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
a
A
Question:
b1 b2
B
c1 c2 c3
C
d
D
When does there exist a species tree for a given gene tree
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
a
A
Question:
b1 b2
B
c1 c2 c3
C
d
D
When does there exist a species tree for a given gene tree
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
a
A
Question:
b1 b2
B
c1 c2 c3
C
d
D
A
B
C
D
When does there exist a species tree for a given gene tree
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
a
A
Question:
b1 b2
B
c1 c2 c3
C
d
D
A
B
C
D
When does there exist a species tree for a given gene tree
and a reconciliation map µ between them?
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
Given:
Gene tree T on G
Species tree S on S
Gene-Species map
σ :G→S
a
A
b1 b2
B
c1 c2 c3
C
d
D
A
B
C
D
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
Given:
Gene tree T on G
Species tree S on S
Gene-Species map
σ :G→S
a
A
b1 b2
B
c1 c2 c3
C
d
D
A
B
C
D
Map µ : V (T ) → V (S ) ∪ E (S ) is a reconciliation map if for all x ∈ V (T ):
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
Given:
Gene tree T on G
Species tree S on S
Gene-Species map
σ :G→S
a
A
b1 b2
B
c1 c2 c3
C
d
D
A
B
C
D
Map µ : V (T ) → V (S ) ∪ E (S ) is a reconciliation map if for all x ∈ V (T ):
Leaf Constraint. If x ∈ G, then µ(x ) = σ (x ).
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
Given:
Gene tree T on G
Species tree S on S
Gene-Species map
σ :G→S
a
A
b1 b2
c1 c2 c3
B
C
d
D
A
B
C
D
Map µ : V (T ) → V (S ) ∪ E (S ) is a reconciliation map if for all x ∈ V (T ):
Leaf Constraint. If x ∈ G, then µ(x ) = σ (x ).
Event Constraint.
(i) If t (x ) = •, then
µ(x ) = lcaS (σ (LT (x ))).
(ii) If t (x ) ∈ {}, then µ(x ) ∈ F .
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -without HGT - to Species Tree
Given:
Gene tree T on G
Species tree S on S
Gene-Species map
σ :G→S
a
A
b1 b2
c1 c2 c3
B
C
d
D
A
B
C
D
Map µ : V (T ) → V (S ) ∪ E (S ) is a reconciliation map if for all x ∈ V (T ):
Leaf Constraint. If x ∈ G, then µ(x ) = σ (x ).
Event Constraint.
(i) If t (x ) = •, then
µ(x ) = lcaS (σ (LT (x ))).
(ii) If t (x ) ∈ {}, then µ(x ) ∈ F .
Ancestor Constraint.
Let x , y ∈ V with x ≺T y .
(i) If t (x ), t (y ) ∈ {}, then µ(x ) S µ(y ),
(ii) otherwise, i.e., at least one of t (x ) and
t (y ) is a speciation •, µ(x ) ≺S µ(y ).
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
w
v
v
xx x
w
x
a
A
b1 b2
B
Observation 1:
c1
C
c2
d
D
a
b1 c1 c2
b2
d
Partial Order in Gene tree T and Species tree S: w ≺T v but µ(w ) 6S µ(v )
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
w
v
v
xx x
w
x
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
d
Observation 1:
Partial Order in Gene tree T and Species tree S: w ≺T v but µ(w ) 6S µ(v )
Observation 2:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
xx x
x
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
d
Observation 1:
Partial Order in Gene tree T and Species tree S: w ≺T v but µ(w ) 6S µ(v )
Observation 2:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
A
B
C
D
a
b1 c1 c2
b2
d
Observation 1:
Partial Order in Gene tree T and Species tree S: w ≺T v but µ(w ) 6S µ(v )
Observation 2:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
Define µ on T ? and T ? (µ preserves order on conn.components of T ? )
If (v , w ) is transfer-edges, then µ(v ) and µ(w ) should be incomparable
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
A
B
C
D
a
b1 c1 c2
b2
d
Observation 1:
Partial Order in Gene tree T and Species tree S: w ≺T v but µ(w ) 6S µ(v )
Observation 2:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
Define µ on T ? and T ? (µ preserves order on conn.components of T ? )
If (v , w ) is transfer-edges, then µ(v ) and µ(w ) should be incomparable
→ Consistent with standard “DTL-scenarios”
Simultaneous Identification of Duplications and Lateral Gene Transfers., Tofigh et al. IEEE/ACM TCBB, 2011
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
A
B
C
D
Leaf Constraint. If x ∈ G, then µ(x ) = σ (x ).
Event Constraint.
(i) If t (x ) = •, then
µ(x ) = lcaS (σ (LT ? (x ))).
(ii) If t (x ) ∈ {, 4}, then µ(x ) ∈ F .
(iii) If t (x ) = 4 and (x , y ) ∈ E, then µ(x )
and µ(y ) are incomparable in S.
a
b1 c1 c2
b2
d
Ancestor Constraint.
Let x , y ∈ V with x ≺T ? y .
(i) If t (x ), t (y ) ∈ {, 4}, then
µ(x ) S µ(y ),
(ii) otherwise, i.e., at least one of t (x ) and
t (y ) is a speciation •, µ(x ) ≺S µ(y ).
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
xx x
x
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
Observation:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
d
Use only (xy|z)• of conn.comp. of T ? → (ab1 |d)• → species triple (AB|D)
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
xx x
x
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
Observation:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
d
Use only (xy|z)• of conn.comp. of T ? → (ab1 |d)• → species triple (AB|D) (not sufficient)
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
v
xx x
w
x
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
Observation:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
d
Use only (xy|z)• of conn.comp. of T ? → (ab1 |d)• → species triple (AB|D) (not sufficient)
For each transfer edge (v , w ) add all triples (xy|z)N with
x , y descend. of w, z descend. of v in T ? (or vice versa):
(c2 b2 |d)N → (CB|D)
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
xx x
v
x
a
A
b1 b2
B
c1
C
c2
d
D
a
w
b1 c1 c2
b2
Observation:
We see (ac1 |d)• and (c2 d|a)• =⇒ (AC|D) and (CD|A) E
Idea:
Remove Transfer edges from T , we get T ?
d
Use only (xy|z)• of conn.comp. of T ? → (ab1 |d)• → species triple (AB|D) (not sufficient)
For each transfer edge (v , w ) add all triples (xy|z)N with
x , y descend. of w, z descend. of v in T ? (or vice versa):
(c2 b2 |d)N → (CB|D) and (ab1 |c1 )N → (AB|C)
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
d
S = {(AB|D), (CB|D), (AB|C)}
8/8
Intro
From Gene Tree to Species Tree
From Gene Tree -with HGT - to Species Tree
a
A
b1 b2
B
c1
C
c2
d
D
a
b1 c1 c2
b2
d
S = {(AB|D), (CB|D), (AB|C)}
Theorem (2017)
There is a species tree S for the gene tree T with HGT ⇐⇒
the triple set S is consistent (can be tested efficiently).
A reconciliation map µ from T to S can be constructed in polynomial time.
From Event-Labeled Gene Trees with HGT to Species Trees., Hellmuth M. submitted J. Math. Biol., 2017
8/8
Intro
From Gene Tree to Species Tree
0
1
2.
.
.
time
x
u
v
y
a
A
b1 b2
B
c1 c2
C
d1 d2
D
e1 e2
E
A reconciliation map µ is TC if there is a gene-tree time map τT : V (T ) → R and
species tree time map τT : V (T ) → R such that
C0 u ≺T v =⇒ τT (u ) > τT (v )
C1 u ≺S v =⇒ τS (u ) > τS (v )
C2 If u speciation, then τT (u ) = τS (µ(u ))
C3 If u is no speciation, then τS (x ) < τT (u ) < τS (y ) where
µ(u ) = (x , y ) ∈ E (S )
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
An issue that is not covered, so-far:
µ(u ) ≺S µ(u 0 ) and µ(v ) S µ(v 0 )
If we associate a “time” τ to vertices of T :
τu 0 = τv 0 , but τu 0 < τu = τv < τv 0 E
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
Each species tree S must display S.
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
Each species tree S must display S.
Here, S is the only tree that displays S and
µ is the only reconciliation map from T to S
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
Each species tree S must display S.
Here, S is the only tree that displays S and
µ is the only reconciliation map from T to S
⇒ There is no TIME - CONSISTENT (TC) scenario for T !
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
Each species tree S must display S.
Here, S is the only tree that displays S and
µ is the only reconciliation map from T to S
⇒ There is no TIME - CONSISTENT (TC) scenario for T !
⇒ Consistency of S is still necessary for TC scenario, but not sufficient!
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
v
u'
now
time
v'
u
a
A
b1
b2
B
c1
c2
C
d
D
e1 e2
E
There is no sufficient TC-condition provided in the literature.
Hence, we need to establish one!
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
1
2.
.
.
time
x
u
v
y
a
A
b1 b2
B
c1 c2
C
d1 d2
D
e1 e2
E
A reconciliation map µ is TC if there is a gene-tree time map τT : V (T ) → R and species tree
time map τT : V (T ) → R such that
C0 u ≺T v =⇒ τT (u ) > τT (v )
C1 u ≺S v =⇒ τS (u ) > τS (v )
C2 If u speciation, then τT (u ) = τS (µ(u ))
C3 If u is no speciation, then τS (x ) < τT (u ) < τS (y ) where µ(u ) = (x , y ) ∈ E (S )
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
1
2.
.
.
time
x
u
v
y
a
A
b1 b2
B
c1 c2
C
d1 d2
D
e1 e2
E
C0-C3 are satisfied ⇐⇒ there is a time map τ : V (T ) → R such that for any x , y ∈ V (T ):
T0 If y ≺T x then τ(y ) > τ(x ).
T1 If x and y are speciation vertices and µ(x ) = µ(y ), then τ(x ) = τ(y ).
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
1
2.
.
.
time
x
u
v
y
a
A
b1 b2
B
c1 c2
C
d1 d2
D
e1 e2
E
C0-C3 are satisfied ⇐⇒ there is a time map τ : V (T ) → R such that for any x , y ∈ V (T ):
T0 If y ≺T x then τ(y ) > τ(x ).
T1 If x and y are speciation vertices and µ(x ) = µ(y ), then τ(x ) = τ(y ).
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes
0
1
2.
.
.
time
x
u
v
y
a
A
b1 b2
B
c1 c2
C
d1 d2
D
e1 e2
E
C0-C3 are satisfied ⇐⇒ there is a time map τ : V (T ) → R such that for any x , y ∈ V (T ):
T0 If y ≺T x then τ(y ) > τ(x ).
T1 If x and y are speciation vertices and µ(x ) = µ(y ), then τ(x ) = τ(y ).
For given T and S one can determine in poly-time whether there is a TC-µ from T to S and
construct it. (proof by Nikolai Nøjgaard in Bled 2017)
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes - Open Problems
0
u
v
v'
u'
now
time
a
A
b1 b2
B
c1 c2
C
d
D
e1 e2
E
In general, neither µ nor species tree S must be unique!
If µ is not
TC ,
then there might be another µ 0 that is
TC .
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes - Open Problems
0
v
v'
u'
u
now
time
a
A
b1 b2
B
c1 c2
C
d
D
e1 e2
E
In general, neither µ nor species tree S must be unique!
If µ is not
TC ,
then there might be another µ 0 that is
TC .
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes - Open Problems
0
u
v
v'
u'
now
time
a
A
b1 b2
c1 c2
B
C
d
D
e1 e2
E
In general, neither µ nor species tree S must be unique!
If µ is not
TC ,
then there might be another µ 0 that is
Showing that for given S there is no
TC
TC .
µ 6⇒ NO species tree exists for T .
8/8
Intro
From Gene Tree to Species Tree
Time Travel of Genes - Open Problems
0
u
v
v'
u'
now
time
a
A
b1 b2
c1 c2
B
C
d
D
e1 e2
E
In general, neither µ nor species tree S must be unique!
If µ is not
TC ,
then there might be another µ 0 that is
Showing that for given S there is no
TC
TC .
µ 6⇒ NO species tree exists for T .
Maybe, there is another species tree that displays all triples in S with valid µ ?!
8/8