Construction of Phylogenetic Trees
Walter M. Fitch and Emanuel Margoliash
Science, New Series, Volume 155, Issue 3760 (Jan. 20, 1967), 279-284
Speaker: Fang-Ling Lin
Advisor: Prof. R. C. T. Lee
National Chi-Nan University
1
Outline
Basic terms
Constructing the phylogenetic tree
Analyzing the phylogenetic tree
Reconstruction of the ancestral cytochrome c amino acid sequences
2
Introduction
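• Biochemists have attempted to use quantitative estimates of variance between substances obtained from different species to construct phylogenetic trees.
• These methods have not been completely satisfactory because:
1. the portion of the genome examined was restricted;
2. the variable measured did not reflect the mutation distance with sufficient accuracy;
3. no adequate mathematical treatment was available for data from large numbers of species.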
3
What is cytochrome c?
Cytochrome c is a protein that participates in the metabolism of the mitochondrion, where it carries electrons in the respiratory chain.
When it is released from the mitochondrion into the cytoplasm, it triggers programmed cell death.
4
Determining the Mutation Distance
The mutation distance: the minimal number of nucleotides that would need to be altered for the gene for one cytochrome to code for the other.
Example: the sequences ACTGAT and TCTATC, aligned with gaps:
A C T G A T -
T C T - A T C
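For protein data, the definition above can be applied residue by residue. The following is a minimal sketch, assuming the standard genetic code, standard one-letter amino acid codes, and two sequences that are already aligned to the same length without gaps. The names `residue_distance` and `mutation_distance` are illustrative and do not come from the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of codons that encode it.
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)


def residue_distance(aa1, aa2):
    """Minimal number of nucleotide changes turning a codon for aa1 into one for aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )


def mutation_distance(seq1, seq2):
    """Sum of per-residue distances over two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(residue_distance(a, b) for a, b in zip(seq1, seq2))
```

For instance, phenylalanine (TTT) and leucine (TTA) differ by one nucleotide, while methionine (ATG) and tryptophan (TGG) differ by two.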
5
Problem
Given: the mutation distances between every pair of proteins
Output: a phylogenetic tree
6
The construction of the tree
• Assume there are three proteins, A, B and C, with the following mutation distances:
       A     B
B     24
C     28    32
• There are two fundamental problems:
1. Which pair does one join together first?
2. What are the lengths of edges a, b, and c?
7
Which pair does one join together first?
Simply choose the pair with the smallest mutation distance.
       A     B
B     24
C     28    32
Here A and B (distance 24) are joined first, and C is attached afterwards.
8
What are the lengths of legs a, b, and c?
       A     B
B     24
C     28    32
Legs a, b and c lead from the common node to A, B and C, so:
a+b=24
a+c=28
b+c=32
Solving gives:
a=10
b=14
c=18
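The three equations can be solved in closed form by adding two of them and subtracting the third. A small sketch (the function name `leg_lengths` is illustrative):

```python
def leg_lengths(d_ab, d_ac, d_bc):
    """Solve a+b=d_ab, a+c=d_ac, b+c=d_bc for the three legs a, b, c."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


print(leg_lengths(24, 28, 32))  # (10.0, 14.0, 18.0)
```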
9
When information from more than three proteins is utilized
The basic procedure is the same.
At each step, two subsets are joined to create a single subset.
This is repeated until all proteins are members of a single subset.
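The joining loop can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: subsets are represented as tuples, ties are broken by taking the first closest pair found, and the distance from a merged subset to any other subset is taken as the plain average of the two previous entries, which is how the reduced matrices in the 5-protein example are computed. The function name `join_order` and the variable `D5` are illustrative; `D5` holds the input distances of that example.

```python
from itertools import combinations


def join_order(names, d):
    """Return the list of merges; d[(i, j)] with i < j holds the input distances."""
    clusters = [(n,) for n in names]
    dist = {frozenset([(i,), (j,)]): v for (i, j), v in d.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current subsets (the first one found wins ties).
        x, y = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        merged = x + y
        merges.append((x, y))
        rest = [c for c in clusters if c not in (x, y)]
        # Reduced matrix: average the two old entries for every remaining subset.
        for c in rest:
            dist[frozenset([merged, c])] = (
                dist[frozenset([x, c])] + dist[frozenset([y, c])]
            ) / 2
        clusters = rest + [merged]
    return merges


# Input distances of the 5-protein example.
D5 = {
    (1, 2): 1, (1, 3): 13, (1, 4): 17, (1, 5): 16,
    (2, 3): 12, (2, 4): 16, (2, 5): 15,
    (3, 4): 10, (3, 5): 8,
    (4, 5): 1,
}
order = join_order([1, 2, 3, 4, 5], D5)
```

With these inputs, `order` lists the joins 1 with 2, then 4 with 5, then 3 with (4,5), and finally (1,2) with (3,4,5).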
10
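Example: 5 proteins
Input mutation distances:
       1     2     3     4     5
1      0     1    13    17    16
2            0    12    16    15
3                  0    10     8
4                        0     1
5                              0
Proteins 1 and 2 have the smallest distance (1) and are joined first. With A = 1, B = 2 and C = {3, 4, 5}:
a+b=1
a+c=(13+17+16)/3=15.33
b+c=(12+16+15)/3=14.33
a=1
b=0
c=14.33
Distances from the subset (1,2) to the remaining proteins:
to 3: (13+12)/2=12.5
to 4: (17+16)/2=16.5
to 5: (16+15)/2=15.5
Reduced matrix:
       1,2     3      4      5
1,2     0    12.5   16.5   15.5
3              0     10      8
4                     0      1
5                            0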
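The first step of the example can be checked with a short sketch. It assumes, as on this slide, that A and B are single proteins and that the distance from each of them to C is the mean over all members of C. The names `average_distance` and `legs_with_group` are illustrative.

```python
def average_distance(d, group_x, group_y):
    """Mean input distance between the members of two disjoint groups."""
    pairs = [(min(i, j), max(i, j)) for i in group_x for j in group_y]
    return sum(d[p] for p in pairs) / len(pairs)


def legs_with_group(d, group_a, group_b, group_c):
    """Legs a, b, c from the averaged distances a+b, a+c and b+c."""
    ab = average_distance(d, group_a, group_b)
    ac = average_distance(d, group_a, group_c)
    bc = average_distance(d, group_b, group_c)
    return (ab + ac - bc) / 2, (ab + bc - ac) / 2, (ac + bc - ab) / 2


# Input distances of the 5-protein example.
D5 = {
    (1, 2): 1, (1, 3): 13, (1, 4): 17, (1, 5): 16,
    (2, 3): 12, (2, 4): 16, (2, 5): 15,
    (3, 4): 10, (3, 5): 8,
    (4, 5): 1,
}
a, b, c = legs_with_group(D5, [1], [2], [3, 4, 5])
```

Up to floating-point rounding, this gives a = 1, b = 0 and c = 14.33, matching the values on the slide.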
11
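Example: 5 proteins
Proteins 4 and 5 (distance 1) are joined next. With A = 4, B = 5 and C = {(1,2), 3}:
a+b=1
a+c=(16.5+10)/2=13.25
b+c=(15.5+8)/2=11.75
a=1.25
b=-0.25
c=12
Distances from the subset (4,5) to the remaining subsets:
to (1,2): (16.5+15.5)/2=16
to 3: (10+8)/2=9
Reduced matrix:
       1,2     3     4,5
1,2     0    12.5    16
3              0      9
4,5                   0
Tree so far: legs of 1 and 2 are 1 and 0; legs of 4 and 5 are 1.25 and -0.25.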
12
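Example: 5 proteins
Protein 3 and the subset (4,5) have the smallest remaining distance (9) and are joined next. With A = 3, B = (4,5) and C = (1,2):
a+b=9
a+c=12.5
b+c=16
a=2.75
b=6.25
c=9.75
Distance from the subset (3,4,5) to (1,2): (12.5+16)/2=14.25
Reduced matrix:
         1,2    3,4,5
1,2       0     14.25
3,4,5             0
Tree so far: legs of 1 and 2 are 1 and 0; the leg of 3 is 2.75; legs of 4 and 5 are 1.25 and -0.25.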
13
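Example: 5 proteins
Reduced matrix:
         1,2    3,4,5
1,2       0     14.25
3,4,5             0
The legs b=6.25 and c=9.75 are average distances to the members of the subsets (4,5) and (1,2), so the internal edges x and y are found by subtracting the legs already assigned inside each subset.
x is the edge leading to the node that joins 4 and 5:
((x+1.25)+(x-0.25))/2=6.25
x=5.75
y is the edge leading to the node that joins 1 and 2:
((y+1)+(y+0))/2=9.75
y=9.25
Final tree: legs of 1 and 2 are 1 and 0; y=9.25; the leg of 3 is 2.75; x=5.75; legs of 4 and 5 are 1.25 and -0.25.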
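Both equations have the same form: the average distance to the members of a subset equals the internal edge plus the mean of the legs inside that subset. A minimal sketch, assuming the subset contains only two tips as in this example (the function name `internal_edge` is illustrative):

```python
def internal_edge(avg_to_tips, tip_legs):
    """Internal edge = average distance to the tips minus the mean tip leg."""
    return avg_to_tips - sum(tip_legs) / len(tip_legs)


x = internal_edge(6.25, [1.25, -0.25])  # edge leading to the node joining 4 and 5
y = internal_edge(9.75, [1, 0])         # edge leading to the node joining 1 and 2
print(x, y)  # 5.75 9.25
```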
14
Testing Alternative Trees
The method is deterministic: the same input always produces the same tree.
A particular assignment of species to the A and B subsets defines a tree, so different assignments of species to A and B produce different trees, and these alternatives can be tested.
Fig. 1 is the best of the 40 phylogenetic trees tested.
15
Phylogenetic Tree of 20 species
Fig.1
16
Reconstructed distances
Table of mutation distances for each pair of species i, j: the lower left half holds the original input values, and the upper right half holds the reconstructed values, found by summing the leg lengths in Fig. 1.
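Summing leg lengths along the path between two species can be sketched as below. Because Fig. 1 itself is not reproduced in this transcript, the sketch uses the small tree from the 5-protein example instead. The internal node names `n12`, `n45` and `n345` are hypothetical labels introduced only for this illustration.

```python
# Edges of the 5-protein example tree: (node, node, leg length).
TREE_EDGES = [
    ("1", "n12", 1.0), ("2", "n12", 0.0),
    ("n12", "n345", 9.25),
    ("3", "n345", 2.75),
    ("n45", "n345", 5.75),
    ("4", "n45", 1.25), ("5", "n45", -0.25),
]


def build_adjacency(edges):
    """Turn an edge list into a node -> [(neighbour, length), ...] mapping."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def path_length(adj, start, goal, parent=None):
    """Sum of leg lengths along the unique path between two nodes of a tree."""
    if start == goal:
        return 0.0
    for nxt, w in adj[start]:
        if nxt == parent:
            continue  # do not walk back along the edge we came from
        sub = path_length(adj, nxt, goal, start)
        if sub is not None:
            return w + sub
    return None  # goal is not reachable through this branch


ADJ = build_adjacency(TREE_EDGES)
print(path_length(ADJ, "1", "3"))  # 13.0 (input distance: 13)
print(path_length(ADJ, "1", "4"))  # 17.25 (input distance: 17)
```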
17
Standard deviation
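Percent change: the percentage by which each reconstructed distance differs from the corresponding input distance.
Percent “standard deviation”: obtained from the squared percent changes, summed over all values of i<j.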
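The formula itself did not survive in this transcript, so the sketch below rests on an assumption: the percent change of each pair is squared, the squares are summed over all pairs i<j, the sum is divided by the number of pairs minus one, and the square root is taken. The function name `percent_standard_deviation` and the sample values are illustrative only.

```python
from math import sqrt


def percent_standard_deviation(original, reconstructed):
    """Both arguments map a pair (i, j) with i < j to a mutation distance."""
    pct = [
        100.0 * (reconstructed[p] - original[p]) / original[p]
        for p in original
    ]
    # Assumption: divide by the number of pairs minus one before the square root.
    return sqrt(sum(x * x for x in pct) / (len(pct) - 1))


# Hypothetical input and reconstructed distances for three species.
original = {(1, 2): 10, (1, 3): 20, (2, 3): 10}
reconstructed = {(1, 2): 11, (1, 3): 20, (2, 3): 9}
psd = percent_standard_deviation(original, reconstructed)
```

Here the percent changes are 10, 0 and -10, so the result is the square root of 200/2, which is 10.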
18
The statistically optimal tree
In testing phylogenetic alternatives, one is
seeking to minimize the percent “standard
deviation.”
Fig.1 has a percent “standard deviation” of 8.7,
the lowest of the 40 alternatives so far tested.
The percent “standard deviation” for the initial
tree was 12.3.
19
The statistically optimal tree
20
Fig. 1 is remarkably similar to the tree constructed in accord with classical zoological comparisons.
Almost all the alternative phylogenetic schemes tested involved rearrangements within two groups: birds (turkey, chicken) and nonprimate mammals (cow, sheep, pig).
21
Three noticeable deviations
• Birds of flight (Neognathae) vs. the penguin (Impennae)
• The kangaroo vs. the nonprimate mammals, and placental mammals vs. marsupials
• The turtle appears more closely associated with the birds than with its fellow reptile, the rattlesnake.
Fig.1
22
• From any phylogenetic ancestor, today’s descendants are equidistant with respect to time but not equidistant genetically.
• The method indicates the lines in which the gene has undergone more rapid change.
• For example, the mutation distance from the mammalian ancestor to the primates is 7.5, and that to the nonprimate mammals is 5.8. The change in the cytochrome c gene has been much more rapid in the descent of the primates than in that of the other mammals.
Fig.1
23
Reconstruction of the ancestral cytochrome c
amino acid sequences.
The procedure is dependent upon the
phylogenetic tree on which these sequence
data are arranged.
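The slides do not spell out the reconstruction procedure. To illustrate how an inferred ancestor depends on the tree, the sketch below uses a simple parsimony rule (take the intersection of the two child sets if it is non-empty, otherwise their union), in the style of the later Fitch small-parsimony method. This is a swapped-in technique for illustration and not necessarily the procedure of the paper. The tree shape and the residues are hypothetical.

```python
def ancestral_states(tree, residues):
    """Bottom-up pass: return the candidate residue set at the root of `tree`.

    `tree` is a leaf name (str) or a pair (left, right);
    `residues` maps each leaf name to its residue at one position.
    """
    if isinstance(tree, str):
        return {residues[tree]}
    left, right = (ancestral_states(t, residues) for t in tree)
    common = left & right
    # Keep shared candidates when there are any; otherwise keep all of them.
    return common if common else left | right


# Hypothetical tree and residues at a single amino acid position.
tree = (("Monkey", "Man"), ("Donkey", "Horse"))
split = {"Monkey": "W", "Man": "W", "Donkey": "V", "Horse": "V"}
mostly_v = {"Monkey": "V", "Man": "W", "Donkey": "V", "Horse": "V"}
root_split = ancestral_states(tree, split)
root_v = ancestral_states(tree, mostly_v)
```

In the first case the root is ambiguous between W and V; in the second, V is the only candidate. Regrouping the same species under a different tree can change the inferred ancestor.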
24
Table: residues of cytochrome c at selected amino acid positions, for reconstructed ancestors and present-day species.
Amino acid No. (columns): 17, 18, 21, 39, 41, 50, 52, 53, 56, 64, 66, 68, 89, 94, 95, 98, 109
Rows: Ancestral Mammal; Ancestral Primate; Monkey; Man; Kangaroo; Rabbit; Dog; Ancestral Ungulate; Pig; Ancestral Perissodactyl; Donkey; Horse
25
There is presently no detectable relationship between the primary structures of cytochrome c and those of the hemoglobins. The reconstruction and comparison of the ancestral amino acid sequences may reveal a homology that cannot be detected in present-day proteins.
Using such ancestral sequences may be generally useful for detecting common ancestry not otherwise observable.
26
Thank you!
27