Computational Genomics 5a
Distance-Based Tree Reconstruction (cont.)
Modified by Benny Chor, from slides by Shlomo Moran and Ydo Wexler (IIT)
Phylogenetic Trees - Methods
There are several methods for constructing trees and for estimating how well a tree describes the data (and thus the evolutionary process):
• Distance-based methods
• Parsimony (character-based)
• Likelihood (character-based)
• Whole genome/proteome methods
Additive Distances
We say that a distance metric D on L objects is additive if there is an unrooted binary tree T on L leaves, with positive edge weights, that realizes D.
Namely, for all i, j: $D(i,j) = D_T(i,j)$, where $D_T(i,j)$ is the length (sum of edge weights) of the path between leaves i and j in T.
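To make $D_T$ concrete, here is a minimal Python sketch (not part of the original slides) that computes $D_T(i,j)$ by walking an edge-weighted tree and checks whether the tree realizes a matrix D. It assumes the tree is given as (u, v, weight) triples and D as a dict of dicts; the function names are hypothetical. The example uses the five-leaf matrix from these slides and its tree, with internal nodes labelled X (as on the slides) and the hypothetical labels Y and Z.

```python
from collections import defaultdict


def tree_distances(edges, leaves):
    """D_T(i, j): total edge weight on the unique path between leaves i and j."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {}
    for s in leaves:
        # Walk the tree from s; in a tree every node is reached along exactly one path.
        seen = {s: 0}
        stack = [s]
        while stack:
            x = stack.pop()
            for y, w in adj[x]:
                if y not in seen:
                    seen[y] = seen[x] + w
                    stack.append(y)
        for t in leaves:
            dist[s, t] = seen[t]
    return dist


def realizes(D, edges, leaves):
    """True if D(i, j) = D_T(i, j) for every pair of leaves."""
    DT = tree_distances(edges, leaves)
    return all(D[i][j] == DT[i, j] for i in leaves for j in leaves)


names = "ABCDE"
rows = [[0, 2, 7, 4, 7], [2, 0, 7, 4, 7], [7, 7, 0, 7, 6],
        [4, 4, 7, 0, 7], [7, 7, 6, 7, 0]]
D = {a: {b: rows[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
# Internal nodes: X (from the slides), Y and Z (hypothetical labels).
edges = [("A", "X", 1), ("B", "X", 1), ("X", "Y", 1), ("Y", "D", 2),
         ("Y", "Z", 2), ("Z", "C", 3), ("Z", "E", 3)]
print(realizes(D, edges, names))  # True
```

Changing any single edge weight, for example A-X to 2, breaks at least one equality, so `realizes` returns False.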
Characterizing Additive Distances
A metric is additive if and only if it satisfies the four-point condition: any 4 points can be renamed such that
$d(x,y) + d(u,v) \le d(x,u) + d(y,v) = d(x,v) + d(y,u)$
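A direct check of the four-point condition, as a Python sketch: for every set of four distinct points, compute the three pair sums; the condition holds exactly when the two largest sums are equal. It assumes D is already a metric stored as a dict of dicts (function and variable names are hypothetical). The two test matrices are the five-leaf and four-leaf examples from these slides.

```python
from itertools import combinations


def four_point(D, names, tol=1e-9):
    """Check the four-point condition on every set of 4 distinct points.
    For {x, y, u, v}, the condition holds iff the two largest of the three
    pair sums are equal."""
    for x, y, u, v in combinations(names, 4):
        s = sorted([D[x][y] + D[u][v], D[x][u] + D[y][v], D[x][v] + D[y][u]])
        if s[2] - s[1] > tol:
            return False
    return True


def as_dict(names, rows):
    """Turn a square list of rows into a dict of dicts keyed by names."""
    return {a: {b: rows[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}


five = as_dict("ABCDE", [[0, 2, 7, 4, 7], [2, 0, 7, 4, 7], [7, 7, 0, 7, 6],
                         [4, 4, 7, 0, 7], [7, 7, 6, 7, 0]])
four = as_dict("ABCD", [[0, 2, 3, 6], [2, 0, 3, 5], [3, 3, 0, 6], [6, 5, 6, 0]])
print(four_point(five, "ABCDE"))  # True
print(four_point(four, "ABCD"))   # False: the pair sums are 8, 8, 9
```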
Trees from Additive Distances: Algorithm
• Verify that the distance matrix is an additive metric.
• Choose a pair of objects; the path between them is the first path in the tree.
• Choose a third object and set up the linear equations that determine where it branches off this path.
• For each further object, choose a pair of leaves in the tree constructed so far and compute the point at which the new object is inserted.
1. If the new path branches off at an existing branching point (so it continues into an existing branch of the tree), repeat the insertion step, replacing one of the two original leaves by another leaf along that branch.
2. Once the new path branches off in the middle of an edge of the tree, this insertion is finished.
Example distance matrix:
  A B C D E
A 0 2 7 4 7
B 2 0 7 4 7
C 7 7 0 7 6
D 4 4 7 0 7
E 7 7 6 7 0
[Figure: the first path, from A to C, of length 7.]
[Figure: B branches off the path from A to C at a new internal node X, with d(A,X) = 1, d(X,B) = 1 and d(X,C) = 6.]
To find the point X where B branches off the path from A to C, solve the three linear equations:
d(A,B) = d(A,X) + d(X,B)
d(B,C) = d(B,X) + d(X,C)
d(A,C) = d(A,X) + d(X,C)
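The three equations have a closed-form solution: adding the first and third and subtracting the second gives d(A,X) = [d(A,B) + d(A,C) - d(B,C)]/2, and symmetrically for B and C. A small Python sketch (the function name is hypothetical):

```python
def branch_point(d_ab, d_ac, d_bc):
    """Distances from A, B, C to the point X where their paths meet.
    Solves d(A,B) = d(A,X) + d(X,B), d(B,C) = d(B,X) + d(X,C),
    d(A,C) = d(A,X) + d(X,C)."""
    d_ax = (d_ab + d_ac - d_bc) / 2
    d_bx = (d_ab + d_bc - d_ac) / 2
    d_cx = (d_ac + d_bc - d_ab) / 2
    return d_ax, d_bx, d_cx


print(branch_point(2, 7, 7))  # (1.0, 1.0, 6.0): B joins the A-C path 1 unit from A
```

The same computation places D: taking A, C, D as the three leaves, `branch_point(7, 4, 7)` gives (2.0, 5.0, 2.0), i.e. D branches off 2 units from A with an edge of length 2.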
[Figure: D is added. Its path branches off the path from X to C at a new node 1 unit from X, via an edge of length 2; the rest of the path to C has length 5.]
[Figure: a tentative placement of E, marked "NO!": E attached by an edge of length 5 at the node where D branches off. E's path actually continues into an existing branch (toward C), so by rule 1 the insertion step must be repeated with a different pair of leaves.]
[Figure: the final tree. A and B attach to X by edges of length 1; X connects by an edge of length 1 to the node where D branches off (edge length 2); an edge of length 2 then leads to the node where C and E branch off (edge length 3 each).]
Is the first step, verifying that the distance matrix is an additive metric, really necessary?
Reconstructing a Tree from an Additive Distance
The algorithm shows that, given a distance matrix that is an additive metric, the topology of the corresponding additive tree is unique.
Q.: Given an additive metric on n leaves, what is the running time of the algorithm?
A.: The number of phases is n and the work per phase is O(n), so the total is O(n^2).
Approximating Additive Metrics
In practice, the distance matrix between molecular sequences is typically not additive. In such a case we want to find a tree T whose distance matrix is "close" to the given one.
The methods for exact tree reconstruction provide a basis for heuristics that construct trees by approximating additive metrics.
These heuristics give exact results on additive metrics, but their performance on non-additive metrics is less clear.
Neighbor Finding
How can we find, from distances alone, a pair of sisters (neighboring leaves)?
The closest pair of leaves is not necessarily a pair of neighbors.
[Figure: a tree on leaves A, B, C, D illustrating this.]
Next, we show a way to find neighbors from distances.
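A small numeric illustration, using a hypothetical four-leaf tree that is not from the slides: A and C are sisters with edges of length 1 and 5, B and D are sisters with edges of length 1 and 5, and the internal edge has length 1. The closest pair of leaves is A, B, which are not sisters.

```python
# Hypothetical 4-leaf tree: A and C are sisters (edges 1 and 5), B and D are
# sisters (edges 1 and 5), and the internal edge has length 1.
# Path lengths, e.g. d(A, B) = 1 + 1 + 1 = 3 and d(C, D) = 5 + 1 + 5 = 11.
D = {("A", "B"): 3, ("A", "C"): 6, ("A", "D"): 7,
     ("B", "C"): 7, ("B", "D"): 6, ("C", "D"): 11}
sisters = [{"A", "C"}, {"B", "D"}]

closest = min(D, key=D.get)     # pair with the smallest distance
print(closest, D[closest])      # ('A', 'B') 3
print(set(closest) in sisters)  # False
```

The long pendant edges of C and D push them away from everything, so the two short edges make A and B look closest even though they sit on opposite sides of the internal edge.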
Neighbour Joining Algorithm: Outline
• Identify a pair of leaves u, v as neighbors.
• Combine u, v into a new node w.
• Update the distance matrix: compute w's distance from any other node x of the tree using
$d(w,x) = [d(v,x) + d(u,x) - d(u,v)]/2$
Notice that all 3 quantities on the right-hand side are known.
• When only 3 nodes are left, compute the 3 distances and finish.
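The update step can be sketched in Python as follows: a minimal version, assuming the distances are stored as a dict of dicts (the function name `join` is hypothetical). Applied to the four-leaf example matrix used later in these slides, joining A and B into U gives d(U,C) = 2 and d(U,D) = 4.5.

```python
def join(D, u, v, w):
    """Return a new distance dict in which u and v are replaced by the node w:
    d(w, x) = [d(v, x) + d(u, x) - d(u, v)] / 2 for every other node x."""
    others = [x for x in D if x not in (u, v)]
    # Keep the distances among the untouched nodes.
    new = {x: {y: D[x][y] for y in others} for x in others}
    new[w] = {w: 0}
    for x in others:
        new[w][x] = new[x][w] = (D[v][x] + D[u][x] - D[u][v]) / 2
    return new


names = "ABCD"
rows = [[0, 2, 3, 6], [2, 0, 3, 5], [3, 3, 0, 6], [6, 5, 6, 0]]
D = {a: {b: rows[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
new = join(D, "A", "B", "U")
print(new["U"]["C"], new["U"]["D"], new["C"]["D"])  # 2.0 4.5 6
```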
Neighbour Joining Algorithm
• Identify a pair of neighbors i, j among n leaves.
• Combine i, j into a new node u.
• Update the distance matrix.
• When only 3 nodes are left, finish.
[Figure: a tree on leaves i, j, k, l, m with edge lengths 0.1 and 0.4.]
Let r_i be the sum of distances from i to every other node:
$r_i = \sum_{j=1}^{n} D(i,j)$
The measure between i and j we use in the algorithm is
$XD(i,j) = D(i,j) - \frac{r_i + r_j}{n-2}$
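Computing r_i and XD(i,j) for all pairs takes one pass over the matrix. A Python sketch under the same dict-of-dicts assumption (names hypothetical), applied to the four-leaf example matrix from these slides:

```python
def divergences(names, D):
    """r_i = sum_j D(i, j) and XD(i, j) = D(i, j) - (r_i + r_j) / (n - 2)."""
    n = len(names)
    r = {i: sum(D[i][j] for j in names) for i in names}
    xd = {}
    for a in range(n):
        for b in range(a + 1, n):
            i, j = names[a], names[b]
            xd[i, j] = D[i][j] - (r[i] + r[j]) / (n - 2)
    return r, xd


names = "ABCD"
rows = [[0, 2, 3, 6], [2, 0, 3, 5], [3, 3, 0, 6], [6, 5, 6, 0]]
D = {a: {b: rows[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
r, xd = divergences(names, D)
print(r)             # {'A': 11, 'B': 10, 'C': 12, 'D': 17}
print(xd["A", "B"])  # -8.5
print(xd["A", "D"])  # -8.0
```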
Neighbor Finding: Saitou & Nei Method
For a leaf i, let $r_i = \sum_{m \text{ a leaf}} D(i,m)$.
Definition: Let i, j be two leaves (out of L leaves in T). Their divergence is
$XD(i,j) = D(i,j) - \frac{r_i + r_j}{L-2}$
Theorem (Saitou & Nei): Assume D is additive and all tree edge weights are positive. If XD(i,j) is minimal among all pairs of leaves, then i and j are sister taxa in the tree.
The proof is rather involved and will be skipped (no tears, please).
[Figure: a tree with subtrees T1 and T2 and leaves i, j, k, l, m.]
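The theorem can be checked numerically on a small additive example. The sketch below uses a hypothetical four-leaf tree (A and C sisters with edges 1 and 5; B and D sisters with edges 1 and 5; internal edge 1), in which the closest pair A, B is not a sister pair; the pairs minimizing XD are exactly the two sister pairs.

```python
from itertools import combinations

# Hypothetical additive distances from a 4-leaf tree: A, C are sisters
# (edges 1 and 5), B, D are sisters (edges 1 and 5), internal edge 1.
pairs = {("A", "B"): 3, ("A", "C"): 6, ("A", "D"): 7,
         ("B", "C"): 7, ("B", "D"): 6, ("C", "D"): 11}
leaves = ["A", "B", "C", "D"]


def dist(i, j):
    """Symmetric lookup into the pair table, with dist(i, i) = 0."""
    if i == j:
        return 0
    return pairs[(i, j)] if (i, j) in pairs else pairs[(j, i)]


L = len(leaves)
r = {i: sum(dist(i, m) for m in leaves) for i in leaves}
xd = {(i, j): dist(i, j) - (r[i] + r[j]) / (L - 2)
      for i, j in combinations(leaves, 2)}
best = min(xd.values())
print(sorted(p for p, v in xd.items() if v == best))  # [('A', 'C'), ('B', 'D')]
```

Here XD(A,B) = 3 - (16 + 16)/2 = -13, while XD(A,C) = XD(B,D) = -14, so the divergence correctly prefers the sister pairs over the closest pair.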
Complexity of Neighbor Joining Algorithm
Naive implementation:
Initialization: Θ(L^2) to compute the XD(i,j)'s.
Each iteration:
• O(L) to update XD(i,k) for all i, for the new node k.
• O(L^2) to find the minimal XD(i,j).
There are L - 3 iterations, for a total of O(L^3).
This can be improved using better data structures (e.g., a heap).
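Putting the pieces together, here is a minimal Python sketch of the naive algorithm, assuming distances in a dict of dicts. The branch lengths $d(i,u) = D(i,j)/2 + (r_i - r_j)/(2(n-2))$ are the standard Saitou and Nei formula, which is not shown on these slides; the node labels u0, u1, ... and "center" are hypothetical. Ties are broken by taking the first minimal pair in enumeration order.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Naive neighbor joining, O(L^3) overall.
    names: leaf labels (at least 3); dist: dict of dicts with dist[i][j].
    Returns the edges (u, v, length) of an unrooted tree."""
    d = {i: dict(dist[i]) for i in names}  # work on a copy
    nodes = list(names)
    edges = []
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[i][j] for j in nodes if j != i) for i in nodes}
        # O(L^2) scan for the pair minimizing XD(i, j) = D(i, j) - (r_i + r_j)/(n - 2).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: d[p[0]][p[1]] - (r[p[0]] + r[p[1]]) / (n - 2))
        u = f"u{len(names) - n}"  # hypothetical label for the new internal node
        # Branch lengths from i and j to u (standard NJ formula, not on the slides).
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, u, li), (j, u, d[i][j] - li)]
        # O(L) update: d(u, x) = [d(i, x) + d(j, x) - d(i, j)] / 2.
        d[u] = {}
        for x in nodes:
            if x not in (i, j):
                d[u][x] = d[x][u] = (d[i][x] + d[j][x] - d[i][j]) / 2
        nodes = [x for x in nodes if x not in (i, j)] + [u]
    # Three nodes left: attach them to one central node.
    a, b, c = nodes
    edges += [(a, "center", (d[a][b] + d[a][c] - d[b][c]) / 2),
              (b, "center", (d[a][b] + d[b][c] - d[a][c]) / 2),
              (c, "center", (d[a][c] + d[b][c] - d[a][b]) / 2)]
    return edges


def path_lengths(edges):
    """All-pairs path lengths in the tree given by an edge list."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    out = {}
    for s in adj:
        seen, stack = {s: 0.0}, [s]
        while stack:
            x = stack.pop()
            for y, w in adj[x]:
                if y not in seen:
                    seen[y] = seen[x] + w
                    stack.append(y)
        out[s] = seen
    return out


names = "ABCDE"
rows = [[0, 2, 7, 4, 7], [2, 0, 7, 4, 7], [7, 7, 0, 7, 6],
        [4, 4, 7, 0, 7], [7, 7, 6, 7, 0]]
dist = {a: {b: rows[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
T = path_lengths(neighbor_joining(names, dist))
print(all(abs(T[a][b] - dist[a][b]) < 1e-9 for a in names for b in names))  # True
```

On the five-leaf additive example the recovered tree reproduces every entry of the matrix, in line with the Saitou and Nei theorem. The `min` over all pairs is the Θ(L^2) step per iteration counted on the complexity slide.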
Reconstructing Trees from Additive Matrices
Q: Do we have to test additivity before running NJ?
A: No. By Saitou & Nei, if the matrix is additive, NJ will construct the correct tree. The algorithm does not need to know whether the matrix is additive, or anything else about it!
[Figure: the five-leaf example matrix (A to E) and its additive tree.]
Running NJ: Example on 4 Leaves
  A B C D
A 0 2 3 6
B 2 0 3 5
C 3 3 0 6
D 6 5 6 0
rA = 11, rB = 10, rC = 12, rD = 17
XD(A,B) = -8.5   XD(A,C) = -8.5   XD(A,D) = -8
XD(B,C) = -8     XD(B,D) = -8.5   XD(C,D) = -8.5
[Figure: A and B joined into a new node U.]
Remark: The XD values imply that the distances are not additive (why?).
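Updated Distance Matrix, Choosing A,B as Neighbors
  U   C   D
U 0   2   4.5
C 2   0   6
D 4.5 6   0
rU = 6.5, rC = 8, rD = 10.5
With only three nodes left, n - 2 = 1 and XD(U,C) = XD(U,D) = XD(C,D) = -12.5, so there is effectively only one choice: every pair gives the same unrooted tree. Here U and D are taken as the neighbors and joined into a new node V.
[Figure: A and B joined at U; U and D joined at V.]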
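Final Distance Matrix
  V    C
V 0    1.75
C 1.75 0
Here $d(V,C) = [d(U,C) + d(D,C) - d(U,D)]/2 = (2 + 6 - 4.5)/2 = 1.75$.
[Figure: the final tree on A, B, C, D, with internal nodes U and V.]
Remark: The resulting tree is unrooted.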
Reconstructing Trees from Non-Additive Matrices
Q: What if the distance matrix is not additive?
A: We can still run NJ!
Q: But can anything be said about the resulting tree?
A: Not really. The resulting tree topology may even depend on how ties are resolved along the way.
Remark: This was indeed the case in the last example, where four pairs tied for the minimal XD value in the first step.
Almost Additive Matrix
A distance matrix D' is "almost additive" if there exists an additive matrix D, realized by a tree T with edge lengths l(e), such that
$\|D - D'\| \equiv \max_{i,j} |D_{i,j} - D'_{i,j}| < \min_{e} \frac{l(e)}{2}$
Atteson: If D' is almost additive with respect to a tree T, then the output of NJ on D' is a tree T' with the same topology as T.
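Atteson's condition is easy to test when the underlying additive matrix D and its tree are known, as in a simulation; in practice they are not. The Python sketch below only checks the precondition (function names hypothetical). It uses the five-leaf example matrix and its edge lengths, whose shortest edge has length 1, so any perturbation smaller than 0.5 in every entry qualifies. The perturbations of 0.4 and 0.6 are illustrative values, not data.

```python
def max_deviation(D, Dp):
    """|D - D'| = max over all i, j of |D[i][j] - D'[i][j]|."""
    return max(abs(D[i][j] - Dp[i][j]) for i in D for j in D[i])


def almost_additive(D, Dp, edge_lengths):
    """Atteson's condition: |D - D'| < min_e l(e) / 2, where D is additive and
    realized by a tree whose edge lengths are `edge_lengths`."""
    return max_deviation(D, Dp) < min(edge_lengths) / 2


def perturbed(D, i, j, delta):
    """Copy of D with the symmetric entry (i, j) shifted by delta (illustration only)."""
    Dp = {a: dict(row) for a, row in D.items()}
    Dp[i][j] += delta
    Dp[j][i] += delta
    return Dp


names = "ABCDE"
rows = [[0, 2, 7, 4, 7], [2, 0, 7, 4, 7], [7, 7, 0, 7, 6],
        [4, 4, 7, 0, 7], [7, 7, 6, 7, 0]]
D = {a: {b: rows[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
lengths = [1, 1, 1, 2, 2, 3, 3]  # edge lengths of the example tree
print(almost_additive(D, perturbed(D, "A", "B", 0.4), lengths))  # True
print(almost_additive(D, perturbed(D, "A", "B", 0.6), lengths))  # False
```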
Distance Matrix Example
Unrooted Tree - NJ
[Figure: the unrooted NJ tree, with a point labelled "Root".]
Output - NJ Tree
[Figure: the NJ output tree; branch length is proportional to distance.]
The NJ method produces an unrooted, additive tree.