Common Intervals in Sequences,
Trees, and Graphs
Steffen Heber and Jiangtian Li
Genome Comparison of Bacteria
[Kim et al., Nat. Biotechnol., 2004]
Gene Order & Function in Bacteria
• Gene order in bacteria is weakly conserved.
  [Gene order is not conserved in bacterial evolution. Mushegian, Koonin; Trends Genet. 1996]
• Some genes cluster together even in unrelated species.
• Genes inside a cluster are functionally associated.
  [Conserved clusters of functionally related genes in two bacterial genomes. Tamames et al.; J Mol Evol. 1997]
Formalization of Gene Clusters
Genomes: permutations π1, π2, …, πk
Genes: numbers 1, …, n
[Figure: four example genomes π1, π2, π3, π4, each a permutation of the genes 1, …, 8]
Intervals
• For a permutation π of [n] = {1, 2, …, n}, an interval (= gene cluster) is a set {π(i), π(i+1), …, π(j)} for 1 ≤ i < j ≤ n.
• Any permutation of [n] has n(n-1)/2 intervals.
[Figure: example permutation (1 3 5 4 2 6 7)]
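The definition can be tried out directly. The following is a minimal Python sketch, assuming a permutation is given as a list; the function name `intervals` is illustrative only.

```python
from itertools import combinations

def intervals(perm):
    """Return all intervals {perm[i], ..., perm[j]} with i < j as frozensets."""
    n = len(perm)
    # combinations(range(n), 2) yields every index pair (i, j) with i < j
    return [frozenset(perm[i:j + 1]) for i, j in combinations(range(n), 2)]

pi = [1, 3, 5, 4, 2, 6, 7]
ivs = intervals(pi)
print(len(ivs))  # n(n-1)/2 = 21 for n = 7
```

Each index pair i < j gives exactly one interval, which is where the count n(n-1)/2 comes from.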
Common Intervals
• For a family F = (π0, π1, …, π_{k-1}) of permutations, a subset S ⊆ [n] is a common interval of F (= conserved gene cluster) iff S is an interval in every π_i.
• We write S ∈ C_F.
[Figure: two example permutations π0, π1 with rows (1 3 5 4 2 6 7) and (2 4 5 1 3 7 6)]
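As an illustration, here is a simple quadratic scan in Python that lists the common intervals of k permutations. It is only a sketch for small inputs, not an optimized algorithm; the name `common_intervals` and the list-based input format are assumptions. It uses the fact that a set of j - i + 1 genes is contiguous in a permutation exactly when its positions span j - i.

```python
def common_intervals(perms):
    """All common intervals (size >= 2) of permutations over the same gene set."""
    base, others = perms[0], perms[1:]
    # pos[p][g] = index of gene g in the p-th of the other permutations
    pos = [{g: idx for idx, g in enumerate(p)} for p in others]
    n = len(base)
    found = []
    for i in range(n - 1):
        lo = [p[base[i]] for p in pos]   # smallest position seen so far
        hi = list(lo)                    # largest position seen so far
        for j in range(i + 1, n):
            g = base[j]
            lo = [min(l, p[g]) for l, p in zip(lo, pos)]
            hi = [max(h, p[g]) for h, p in zip(hi, pos)]
            # base[i..j] is contiguous in p iff its positions span exactly j - i
            if all(h - l == j - i for l, h in zip(lo, hi)):
                found.append(frozenset(base[i:j + 1]))
    return found

example = [[1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]]
result = common_intervals(example)
print(len(result))  # 9
```

For the two example rows, the scan reports {1,3}, {1,3,5}, {1,3,4,5}, {1,2,3,4,5}, {4,5}, {2,4}, {2,4,5}, {6,7} and the full set {1,…,7}.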
Lemma
Let F = (π0, π1, …, π_{k-1}) and c, d ∈ C_F.
• If c ∩ d ≠ ∅ then c ∪ d ∈ C_F.
• We call c ∪ d reducible.
[Figure: the example permutations (1 3 5 4 2 6 7) and (2 4 5 1 3 7 6), with irreducible and reducible intervals marked]
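The lemma suggests a direct way to separate reducible from irreducible intervals. The sketch below is a brute-force Python illustration, under the assumption that "reducible" means: the union of two intersecting common intervals, neither of which contains the other. The helper names are hypothetical.

```python
from itertools import combinations

def common_intervals(perms):
    """Brute force: intersect the interval sets of all permutations."""
    def ivs(p):
        return {frozenset(p[i:j + 1])
                for i in range(len(p)) for j in range(i + 1, len(p))}
    return set.intersection(*[ivs(p) for p in perms])

def irreducible_intervals(common):
    """Common intervals that are not the union of two overlapping smaller ones."""
    reducible = set()
    for c, d in combinations(common, 2):
        u = c | d
        # assumption: c and d must intersect and both be proper subsets of u
        if (c & d) and u != c and u != d and u in common:
            reducible.add(u)
    return common - reducible

cf = common_intervals([[1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]])
irr = irreducible_intervals(cf)
print(len(cf), len(irr))  # 9 6
```

In this example {2,4,5} = {2,4} ∪ {4,5} is reducible, while {1,3,5} is irreducible because {1,3} is its only smaller common interval.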
Analysis
• We have K ≤ n(n-1)/2 common intervals and I < n irreducible intervals.
• Find all K common intervals of k ≥ 2 permutations of [n]: O(kn + K) time & O(n) space
Common Intervals of Trees
Let T, T1, …, Tk be trees with vertex set [n].
Definition:
• S ⊆ [n] is an interval of T iff T[S] is connected and |S| > 1.
• S ⊆ [n] is a common interval of T1, …, Tk iff S is an interval in all trees.
• Tree intervals generalize intervals of permutations.
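A small Python sketch of this definition, assuming a tree is given as a list of edges (the function names are illustrative). A permutation corresponds to a path, so the path 1-3-5-4-2-6-7 has the same intervals as the permutation (1 3 5 4 2 6 7).

```python
def is_tree_interval(edges, s):
    """True iff |s| > 1 and the subgraph of the tree induced by s is connected."""
    s = set(s)
    if len(s) < 2:
        return False
    # adjacency restricted to the vertices of s
    adj = {v: set() for v in s}
    for u, v in edges:
        if u in s and v in s:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == s

def is_common_tree_interval(trees, s):
    """True iff s is an interval in every tree of the family."""
    return all(is_tree_interval(t, s) for t in trees)

path = [(1, 3), (3, 5), (5, 4), (4, 2), (2, 6), (6, 7)]
print(is_tree_interval(path, {5, 4, 2}))  # True
print(is_tree_interval(path, {1, 5}))     # False
```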
Miscellaneous
Example:
[Figure: two trees T1 and T2 on the vertex set {1, …, 5}]
Common intervals of T1, T2: { [2], [3], [4], [5] }
• (Common) intervals in trees are induced subtrees.
Structure of Tree Intervals
• Tree intervals have the Helly property, i.e. for any family of tree intervals (T_i), i ∈ I, the assumption T_p ∩ T_q ≠ ∅ for every p, q ∈ I implies ∩_{i ∈ I} T_i ≠ ∅.
Extreme Cases
n-vertex stars S_{n-1}
# non-trivial induced subtrees: 2^(n-1) - 1
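The count can be checked by brute force for small n. The Python sketch below assumes the star has center 1 and leaves 2, …, n (the labelling is an assumption) and, for contrast, also counts the intervals of a path, which behaves like a permutation.

```python
from itertools import combinations

def induces_connected(edges, s):
    """True iff the vertex set s induces a connected subgraph."""
    s = set(s)
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for a, b in edges:
            if v in (a, b):
                w = b if v == a else a
                if w in s and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return seen == s

def count_intervals(n, edges):
    """Number of vertex sets of size >= 2 that induce a subtree."""
    return sum(induces_connected(edges, s)
               for size in range(2, n + 1)
               for s in combinations(range(1, n + 1), size))

def star(n):
    return [(1, leaf) for leaf in range(2, n + 1)]

def path(n):
    return [(v, v + 1) for v in range(1, n)]

print(count_intervals(5, star(5)))  # 2^(5-1) - 1 = 15
print(count_intervals(5, path(5)))  # 5*4/2 = 10
```

In a star, a set with at least two vertices is connected exactly when it contains the center, which leaves a free choice of a non-empty subset of the n-1 leaves.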
The Common Interval Graph
• Given T = (T1, …, Tk) and the corresponding common intervals C_T, the common interval graph G_T = (V, E) is the graph with
  V = C_T
  E = {(c, d) | c, d ∈ C_T, c ≠ d, c ∩ d ≠ ∅}
Example
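• V = [n], T = (P_n, S_{n-1})
[Figure: the path P_4 and the star S_3 on the vertices 1, …, 4]
• We have
  C_T = { [2], [3], …, [n] },
  G_T = K(C_T).
[Figure: G_T, the complete graph on the vertices [2], [3], [4], …, [n]]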
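The example can be reproduced with a few lines of Python. This sketch reads the edge condition of the common interval graph as "c ≠ d and c ∩ d ≠ ∅" (an assumption about the garbled slide formula) and takes the common intervals [2], …, [n] as given input; the function name is illustrative.

```python
from itertools import combinations

def common_interval_graph(common):
    """Nodes are the common intervals; two distinct intervals are joined iff they intersect."""
    nodes = [frozenset(c) for c in common]
    edges = {(c, d) for c, d in combinations(nodes, 2) if c & d}
    return nodes, edges

n = 5
ct = [set(range(1, j + 1)) for j in range(2, n + 1)]  # [2], [3], ..., [n]
nodes, edges = common_interval_graph(ct)
print(len(nodes), len(edges))  # 4 6
```

All intervals [j] contain the vertices 1 and 2, so every pair intersects and the graph is complete.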
Common Interval Graphs cont’d
A graph is called chordal if it does not contain an induced cycle C_n on n > 3 vertices.
Proposition: Common interval graphs of trees are chordal graphs.
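Chordality is easy to test on small graphs. The sketch below uses the standard characterization that a graph is chordal iff its vertices can be removed one at a time, always picking a simplicial vertex (one whose neighbours are pairwise adjacent). It is a plain illustration in Python with an assumed adjacency-dict input, not a linear-time method.

```python
def is_chordal(adj):
    """adj maps each vertex to the set of its neighbours (undirected, no loops)."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    while adj:
        for v, ns in adj.items():
            # v is simplicial if all of its neighbours are pairwise adjacent
            if all(b in adj[a] for a in ns for b in ns if a != b):
                break
        else:
            return False  # no simplicial vertex left: an induced cycle of length > 3 exists
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return True

c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
c4_with_chord = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(is_chordal(c4), is_chordal(c4_with_chord))  # False True
```

Complete graphs, such as G_T for T = (P_n, S_{n-1}), pass this test trivially.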
Irreducible Common Intervals
For a common interval c ∈ C_T and a subset V ⊆ C_T we say that V generates c iff
i. for each d ∈ V, d ⊊ c,
ii. c = ∪_{d ∈ V} d,
iii. G_T[V] is connected.
If there is no such V then c is irreducible.
The irreducible intervals generate all common intervals.
[Figure: example on the vertices 1, …, 7]
Finding Irreducible Intervals
• We have K < 2^(n-1) common intervals and I < n irreducible intervals.
• Find all irreducible common intervals of k trees on n vertices: O(kn^2) time & O(kn) space
Finding Irreducible Intervals
• Irreducible intervals are minimal common
intervals containing an adjacent vertex
pair.
[Figure: example trees on the vertices x, y, z, l, m]
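One simple way to obtain the minimal common interval containing a vertex pair is a closure computation: repeatedly add, in every tree, the vertices on the tree paths between vertices already chosen. The Python sketch below illustrates this idea only; it is not the O(kn^2) method mentioned above, and the names `span` and `common_closure` as well as the edge-list input are assumptions.

```python
def span(edges, s):
    """Vertices of the smallest subtree of the tree `edges` that contains s."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # repeatedly prune leaves that do not belong to s
    leaves = [v for v in adj if len(adj[v]) == 1 and v not in s]
    while leaves:
        v = leaves.pop()
        (u,) = adj.pop(v)
        adj[u].discard(v)
        if len(adj[u]) == 1 and u not in s:
            leaves.append(u)
    return set(adj)

def common_closure(trees, s):
    """Smallest vertex set containing s that induces a subtree in every tree."""
    s = set(s)
    while True:
        grown = set(s)
        for t in trees:
            grown = span(t, grown)
        if grown == s:
            return s
        s = grown

p4 = [(1, 2), (2, 3), (3, 4)]      # path P_4
s3 = [(1, 2), (1, 3), (1, 4)]      # star with center 1
print(common_closure([p4, s3], {3, 4}))  # {1, 2, 3, 4}
```

For the pair T = (P_4, S_3), the adjacent pairs {1,2}, {2,3}, {3,4} of the path close up to [2], [3], [4].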
Graph Intervals
G = (V, E): undirected, connected graph, V = [n]
S ⊆ V is an interval (convex) iff the induced subgraph G[S] is connected and includes every shortest path with end-vertices in S.
[Figure: two examples on the vertices 1, …, 4, one convex and one not convex]
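Convexity can be tested with breadth-first search: a vertex w lies on a shortest u-v path exactly when d(u,w) + d(w,v) = d(u,v). The following Python sketch assumes a connected graph given as an adjacency dict; the 4-cycle used here is a hypothetical example, not necessarily the one in the figure.

```python
from collections import deque

def bfs_dist(adj, src):
    """Shortest-path lengths from src to every vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def is_convex(adj, s):
    """True iff s contains every vertex on every shortest path between its members."""
    s = set(s)
    dist = {v: bfs_dist(adj, v) for v in s}
    for u in s:
        for v in s:
            for w in adj:
                # w outside s on a shortest u-v path violates convexity
                if w not in s and dist[u][w] + dist[v][w] == dist[u][v]:
                    return False
    return True

c4 = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
print(is_convex(c4, {1, 2}))     # True
print(is_convex(c4, {1, 2, 3}))  # False: 1-4-3 is also a shortest path
```

Containing all shortest paths already forces G[S] to be connected when G is connected, so no separate connectivity test is needed here.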
Common Intervals of Graphs
Let G = (G1, …, Gk) be a family of connected undirected graphs with vertex set [n].
Definition: S ⊆ [n] is a common interval of G iff S is an interval in all graphs.
• Graph intervals generalize tree intervals.
[Figure: two example graphs G0 and G1 on the vertices 1, …, 4]
Some Differences
• The union of convex sets is NOT always convex.
• The common convex hull of an adjacent vertex pair is NOT always irreducible.
[Figure: two example graphs G1 and G2 on the vertices 1, 2, 3]
Finding Irreducible Graph Intervals
Sketch: Given G = (G0, G1, …, G_{k-1})
For each edge (i, j) ∈ E_{i*} do
    S(i,j) := {i, j}
    For each (k, l) ∈ S(i,j)
        Add vertices 'between' k and l to S(i,j)
Remove reducible intervals
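The hull-growing step of the sketch can be written out as follows. This is a minimal Python illustration under stated assumptions: graphs are connected and given as adjacency dicts, "between" is read as "on some shortest path", and the final removal of reducible intervals is not included. The names `all_dists` and `common_hull` are hypothetical.

```python
from collections import deque
from itertools import combinations

def all_dists(adj):
    """All-pairs shortest-path lengths by BFS from every vertex."""
    dist = {}
    for src in adj:
        d = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in d:
                    d[w] = d[v] + 1
                    queue.append(w)
        dist[src] = d
    return dist

def common_hull(dists, seed):
    """Smallest vertex set containing `seed` that is convex in every graph."""
    s = set(seed)
    changed = True
    while changed:
        changed = False
        for d in dists:
            for u, v in combinations(sorted(s), 2):
                # vertices lying on some shortest u-v path in this graph
                between = {w for w in d if d[u][w] + d[w][v] == d[u][v]}
                if not between <= s:
                    s |= between
                    changed = True
    return s

g0 = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}   # 4-cycle
g1 = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}         # path 1-2-3-4
dists = [all_dists(g0), all_dists(g1)]
print(common_hull(dists, (1, 4)))  # {1, 2, 3, 4}
print(common_hull(dists, (1, 2)))  # {1, 2}
```

Here the edge (1, 4) of the cycle forces the whole path 1-2-3-4 into the hull, while the edge (1, 2) is already convex in both graphs.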
Extreme Cases
Permutations (identical permutations):
• C ≤ n(n-1)/2, I < n
Trees (identical star-trees):
• C < 2^(n-1), I < n
Graphs (complete graphs):
• C < 2^n, I ≤ n(n-1)/2
Example: InterDom
Database of protein domain interactions.
• Gene fusions
• Protein-protein interactions (DIP & BIND)
• Protein complexes (PDB)
Comparing Two Networks
[Figure: bar chart of the number of irreducible intervals for pairs of the networks Gene Fusion, PDB, BIND and DIP; values shown: 616, 814, 836, 134, 100, 636]
Comparing Three Networks
[Figure: bar chart of the number of irreducible intervals for the network triples GPB, GPD, GBD, PBD]
G: Gene fusion
P: PDB
B: BIND
D: DIP
Irreducible Intervals
[Figure; axis label: size of irreducible interval]
Biologically Meaningful?
regulator of chromosome condensation
RAS family domain
protein kinase
ankyrin repeat
PH domain
THANK YOU!!!