IAENG_IMECS_ICB II, Room E
10:45~13:00, March 21, 2007, Hong Kong
Pseudo-Reverse Approach in Genetic Evolution:
An Empirical Study with Enzymes
Sukanya Manna
Cheng-Yuan Liou
National Taiwan University
Department of Computer Science and
Information Engineering
• NTU land area: ~360 square kilometers
• A huge botanical garden in high mountains (>3000 meters)
• NTU water penny beetle (台大扁泥蟲)
• Eleven colleges,
• 54 departments,
• 96 graduate institutes (which offer 96 Master's programs and 83 doctoral programs),
• Research centers: the Division of Population and Gender Studies, the Center for Condensed Matter Sciences, the Center for Biotechnology, the Japanese Research Center, and the Biodiversity Center.
• The number of students reached 29,877 in 2004, including the students from the Division of Continuing Education & Professional Development
Concepts used
• Under neutral evolution
– Rate of synonymous substitution = Rate of nonsynonymous substitution
– Estimation of the rates of synonymous and nonsynonymous substitution has become an important subject in molecular evolution
Why?
• ‘Draft’ theory: an initial and intuitive evolution model
• Part of evolution is based on a set of core systems.
• They are relatively invariant (hard and strong) over evolution.
• Qualitative changes occur as distinct systems are integrated.
• Separate systems conjoin to produce distinctive patterns of evolutionary change.
• This model provides evolutionary flexibility.
Assumptions
• For comparative genomics
– species that are not distantly related, like human and mouse, share the vast majority of their genes
• the amino acid sequences obtained for each enzyme share great similarity, like homologous genes
Our Approach
Overview of the steps undertaken:
1. Obtain the amino acid (aa) sequences for each enzyme protein.
2. Find the least mismatch between two aa sequences, and select the trio.
3. Generate the nucleotide (nt) sequences for the aa sequences from the trio.
4. Perform the dn/ds ratio test among the pairs of species with the randomly generated nt sequences.
Our Approach (contd.)
Nt to AA:
AATGATTGTCAAGAGCAT AAG TTT TAT -> NDCQEHKFY
AA to nt (REVERSE):
NDCQEHKFY ->
AATGATTGTCAAGAGCAT AAG TTT TAT
AACGATTGCCAAGAACAT AAG TTT TAT
AATGACTGTCAGGAGCAC AAG TTC TAT
…
All possible combinations: infeasible, high space and time complexity
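To give a feel for why enumerating every back-translation is infeasible, the sketch below translates the example sequence from this slide and counts how many nucleotide sequences encode the same peptide. It assumes the standard genetic code; the names `translate` and `count_back_translations` are illustrative and do not come from the original work.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Number of codons that encode each amino acid (or stop, "*").
DEGENERACY = Counter(CODE.values())


def translate(nt):
    """Translate a nucleotide string (spaces ignored) codon by codon."""
    nt = nt.replace(" ", "")
    usable = len(nt) - len(nt) % 3
    return "".join(CODE[nt[i:i + 3]] for i in range(0, usable, 3))


def count_back_translations(aa):
    """Count the nucleotide sequences that encode the given peptide."""
    total = 1
    for residue in aa:
        total *= DEGENERACY[residue]
    return total


peptide = translate("AATGATTGTCAAGAGCAT AAG TTT TAT")
print(peptide, count_back_translations(peptide))  # NDCQEHKFY 512
```

Even this nine-residue example has 512 encodings, and the count multiplies with every additional residue, so full enumeration quickly becomes impractical for enzyme-length sequences.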
Basic Concepts
• Nucleotides
– A, G, T, C (DNA)
– A, G, U, C (mRNA)
• Amino acid
– 20 naturally occurring
– Coded by a triplet of nucleotide bases (referred to as a codon)
• Synonymous/Nonsynonymous substitution
– A substitution of a base within the codon that does not / does change the type of amino acid it represents.
4^3 = 64 codons code for 20 amino acids
3 of the 64 codons are stop codons, which mark the end of a protein-coding sequence
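The codon counts and the synonymous/nonsynonymous distinction can be checked with a few lines of code. This is a minimal sketch assuming the standard genetic code; `is_synonymous` is a hypothetical helper name.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def is_synonymous(codon, position, new_base):
    """True if replacing one base leaves the encoded amino acid unchanged."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    return CODE[mutant] == CODE[codon]


stop_codons = sorted(c for c, aa in CODE.items() if aa == "*")
amino_acids = set(CODE.values()) - {"*"}

print(len(CODE), len(amino_acids), stop_codons)  # 64 20 ['TAA', 'TAG', 'TGA']
print(is_synonymous("TTT", 2, "C"))  # TTT -> TTC, both Phe: True
print(is_synonymous("AAT", 0, "G"))  # AAT (Asn) -> GAT (Asp): False
```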
Model Used
• Jukes and Cantor (one-parameter method)
– Assumes the rate of substitution between all pairs of A, T, C, G is the same.
– $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$, where p is either ps or pn (the result is ds or dn, respectively)
• ps = Sd/S
• pn = Nd/N
• Sd / Nd – total number of synonymous / nonsynonymous differences for all codons compared
• S / N – numbers of synonymous / nonsynonymous sites
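A minimal sketch of the Jukes and Cantor correction applied to ps and pn follows. The counts in the example are made up for illustration, and returning NaN when the logarithm is undefined (p >= 3/4) is a design choice of this sketch, not something stated on the slide.

```python
import math


def jukes_cantor(p):
    """Return d = -(3/4) ln(1 - (4/3) p), or NaN where it is undefined."""
    if not 0.0 <= p < 0.75:
        return math.nan
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def corrected_distances(sd, s, nd, n):
    """Return (ds, dn) from difference counts (sd, nd) and site counts (s, n)."""
    ps = sd / s
    pn = nd / n
    return jukes_cantor(ps), jukes_cantor(pn)


# Hypothetical counts, chosen only to exercise the formula.
ds, dn = corrected_distances(sd=12.0, s=60.0, nd=9.0, n=180.0)
print(ds, dn, dn / ds)
```

The corrected distance d is always at least as large as the observed proportion p, because it accounts for multiple substitutions at the same site.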
Our Approach (contd.)
• Normally, amino acid sequences are obtained from nucleotide sequences by using the universal genetic code table.
• Generating the nucleotide sequences from the amino acid sequences is the reverse process.
• For a particular amino acid sequence, there can be numerous nucleotide sequences, one for each possible combination of codons.
• But generating all of these sequences is infeasible because of the very large time and space complexity.
• We use this reverse mechanism here to match the closely related nucleotide sequences of the respective amino acids.
• The next slide shows the method we used to handle this situation.
Our Approach (contd.)
[Figure: Comparison of Frequency of Codons. x-axis: Codons; y-axis: Frequency of Codons (0 to 40,000,000); series: Human, Mouse, Rat]
• Calculated the total frequency of codons from each genome
• Calculated the cumulative probability of the codons from these frequencies
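One way to turn codon frequencies into cumulative probabilities and draw random codons from them is sketched below. The counts are invented, and restricting each draw to the synonymous codons of the amino acid being back-translated is an assumption of this sketch; the slides do not spell out that detail. All function names are hypothetical.

```python
import bisect
import itertools
import random


def cumulative_probabilities(counts):
    """Map {codon: count} to (sorted codons, cumulative probabilities)."""
    codons = sorted(counts)
    total = sum(counts.values())
    cumulative = list(itertools.accumulate(counts[c] / total for c in codons))
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return codons, cumulative


def sample_codon(codons, cumulative, rng):
    """Pick the first codon whose cumulative probability exceeds a uniform draw."""
    return codons[bisect.bisect_right(cumulative, rng.random())]


def back_translate(aa_seq, usage, rng):
    """Generate one random nucleotide sequence for an amino acid sequence.

    usage maps each amino acid to the {codon: count} of its synonymous codons.
    """
    tables = {aa: cumulative_probabilities(c) for aa, c in usage.items()}
    return "".join(sample_codon(*tables[aa], rng) for aa in aa_seq)


# Invented codon counts for two amino acids (His and Lys).
usage = {"H": {"CAT": 1, "CAC": 3}, "K": {"AAA": 2, "AAG": 2}}
print(cumulative_probabilities(usage["H"]))  # (['CAC', 'CAT'], [0.75, 1.0])
print(back_translate("HKKH", usage, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the generated sequences reproducible between runs.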
Our Approach (contd.)
• Generated the random sequences using the cumulative probability, in two settings:
– Best matched pairs
• Generate sequences for the trio
– All pairs with least mismatch
• Generate sequences only for all such pairs
Our Approach (contd.)
A = [a1, a2, …, an]: aa sequences for HUMAN
B = [b1, b2, …, bm]: aa sequences for MOUSE
C = [c1, c2, …, ck]: aa sequences for RAT (its elements are written r1, r2, … below)
1. Calculate all possible mismatches between AB, BC and CA.
2. aa sequence pairs with least mismatch: a1b1, a1b3, a2b2, a1r2, a2r5, a1r1, b1r1, b1r2, b2r6
3. Select the best matched pairs: choose randomly such that the three pairs will be a1b1, b1r2 and a1r2.
4. a1b1r2 is the trio.
Our Approach (contd.)
• Least mismatch means maximum similarity between the sequences.
• Let A, B, C be the amino acid sequences for human, mouse and rat respectively.
• We compare two sequences one amino acid at a time.
• We calculated the possible mismatches between all sequences.
• We separated out the ones with the least mismatch.
• The example shown here is for the amino acid sequences of one particular enzyme.
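The mismatch comparison and trio selection can be sketched as follows. Two simplifications are assumed here and are not taken from the slides: a difference in length is counted as additional mismatches, and the trio is chosen by brute force as the triple with the smallest total pairwise mismatch, rather than by random choice among least-mismatch pairs. The sequences are toy data.

```python
import itertools


def mismatches(x, y):
    """Compare two aa sequences one residue at a time and count mismatches.

    Assumption: any difference in length is counted as extra mismatches.
    """
    return sum(a != b for a, b in zip(x, y)) + abs(len(x) - len(y))


def least_mismatch_pairs(xs, ys):
    """Return the index pairs (i, j) whose sequences have the fewest mismatches."""
    scores = {(i, j): mismatches(x, y)
              for i, x in enumerate(xs) for j, y in enumerate(ys)}
    best = min(scores.values())
    return [pair for pair, score in scores.items() if score == best]


def best_trio(a, b, c):
    """Return (i, j, k) minimizing the summed mismatch over the three pairs."""
    def total(indices):
        i, j, k = indices
        return (mismatches(a[i], b[j]) + mismatches(b[j], c[k])
                + mismatches(a[i], c[k]))
    triples = itertools.product(range(len(a)), range(len(b)), range(len(c)))
    return min(triples, key=total)


# Toy sequences standing in for the human, mouse and rat sets A, B and C.
human = ["NDCQEHKFY", "NDCQAAKFY"]
mouse = ["NDCQEHKFY", "MDCQEHKFW"]
rat = ["NDAAAAKFY", "NDCQEHKFF"]
print(least_mismatch_pairs(human, mouse))  # [(0, 0)]
print(best_trio(human, mouse, rat))        # (0, 0, 1)
```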
Our Approach (contd.)
• Generalized algorithm
– Pathway analysis by the model of Nei and Gojobori
– No transition matrix used here
– No phylogenetic tree for codon comparison
– Sliding buffer of 3 characters used for codon comparison.
– Used Jukes and Cantor’s model for multiple nucleotide substitution correction.
Our Approach (contd.)
AATGATTGTCAAGAGCAT AAG TTT TAT
AATGACTGTCAGGAGCAC AAG TTC TAT
The sliding buffer compares one codon from each sequence at a time.
Use Nei and Gojobori’s model to calculate the pathways and Jukes and Cantor’s model to get dn/ds.
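The codon-by-codon counting can be sketched as below, in the spirit of the Nei and Gojobori pathway method. This is an illustrative sketch, not the authors' code: it assumes the standard genetic code, treats a change to a stop codon as nonsynonymous when counting sites, skips pathways that pass through a stop codon, and uses hypothetical function names.

```python
import itertools

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1.0 / 3.0
    return s


def codon_differences(c1, c2):
    """Average (synonymous, nonsynonymous) differences over all pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in itertools.permutations(positions):
        current, sd, nd = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break  # assumption: ignore pathways through a stop codon
            if CODE[step] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = step
        else:
            paths.append((sd, nd))
    if not paths:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def count_sites_and_differences(seq1, seq2):
    """Return (Sd, Nd, S, N) for two aligned coding sequences."""
    seq1, seq2 = seq1.replace(" ", ""), seq2.replace(" ", "")
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned whole codons")
    sd = nd = s = 0.0
    for i in range(0, len(seq1), 3):  # sliding buffer of 3 characters
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        d_syn, d_non = codon_differences(c1, c2)
        sd += d_syn
        nd += d_non
        s += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
    return sd, nd, s, len(seq1) - s


sd, nd, s, n = count_sites_and_differences(
    "AATGATTGTCAAGAGCAT AAG TTT TAT",
    "AATGACTGTCAGGAGCAC AAG TTC TAT")
print(sd, nd)  # 4.0 0.0
```

For the two sequences on this slide, all four differing codons encode the same amino acid, so every difference is counted as synonymous.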
Experimental Results
(Best matched pairs)
[Figure: Comparison of dn/ds Ratio for the Enzymes found in all three Species. x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 5); series: HM, MR, HR. Enzymes shown: Glutamate dehydrogenase (558), Acid phosphatase (157), Catalase (526), Transaldolase (337), Carboxylesterase (563), Glucose-6-phosphatase (357), Pyridoxal phosphatase (241), Pyruvate oxidase (392)]
dn/ds ratio of the human-mouse, mouse-rat and human-rat comparison for the enzymes common to all three species.
The number in brackets is the length of the sequence compared.
Experimental Results (contd.)
(Best matched pairs)
[Figure: dn/ds Ratio for the Enzymes Not Common for Human-Mouse and Human-Rat Respectively. x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 20); series: HM, HR. Enzymes shown: Peroxidase (223), Trypsin (246), Aminopeptidase (965), Malate dehydrogenase (333), Oligopeptidase A (686)]
dn/ds ratio of the human-mouse and human-rat comparison for the enzymes not common to them.
Experimental Results (contd.)
(Best matched pairs)
[Figure: dn/ds Ratio for Mouse-Rat Comparison. x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 14). Enzymes shown: Lactate dehydrogenase (332), Lipase (137), Hexokinase (298), Lysophospholipase (230), Pyruvate carboxylase (622), Aldehyde oxidase (1334), Glucose dehydrogenase (493)]
Valid dn/ds ratio of the mouse-rat comparison for the enzymes found only in these two species but not in human.
Experimental Results (contd.)
(All pairs with least mismatch)
[Figure: x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 0.35); series: HM, HR, MR. Enzymes shown: Glutamate dehydrogenase (558), Acid phosphatase (157), Catalase (526), Hexokinase (298), Glucose-6-phosphatase (357), Pyridoxal phosphatase (241), Pyruvate oxidase (392)]
dn/ds ratio of the human-mouse, mouse-rat and human-rat comparison for the enzymes common to all three species.
This graph shows the enzymes with only one least-mismatch sequence pair for each species pair.
Experimental Results (contd.)
(All pairs with least mismatch)
[Figure, two panels: Transaldolase and Carboxylesterase. x-axis: No. of genes for each case; y-axis: dn/ds ratio; series: HM, HR, MR]
For the three-species comparison: enzymes with more than one least mismatch.
dn/ds ratio of the human-mouse, mouse-rat and human-rat comparison for the enzymes common to all three species. The graphs show the enzymes with multiple least-mismatch sequence pairs for each species pair.
The label on the x-axis indicates the sequence pair number and is insignificant.
Experimental Results (contd.)
(All pairs with least mismatch)
[Figure, two panels: Trypsin and Alkaline phosphatase. x-axis: Different gene pairs; y-axis: dn/ds Ratio]
Enzymes found only for the human-mouse comparison
Experimental Results (contd.)
(All pairs with least mismatch)
[Figure, two panels. Panel 1: x-axis: Enzymes (Lactate dehydrogenase, Lysophospholipase, Tyrosine, Pyruvate carboxylase); y-axis: dn/ds Ratio. Panel 2: Aldehyde oxidase; x-axis: Different gene pairs; y-axis: dn/ds Ratio]
Enzymes found only for the mouse-rat comparison
Experimental Results (contd.)
(All pairs with least mismatch)
[Figure: x-axis: Enzymes (Ribonuclease, Oligopeptidase-A, Tyrosine); y-axis: dn/ds Ratio (0 to 40)]
Enzymes found only for the human-rat comparison
Experimental Results (contd.)
(All pairs with least mismatch)
Estimated time for aa substitution per for the enzymes
Experimental Results (contd.)
(All pairs with least mismatch)
[Figure: x-axis: Enzymes; y-axis: Time in Myr (0 to 500); series: HM, HR, MR. Enzymes shown: Glutamate dehydrogenase, Acid phosphatase, Catalase, Transaldolase, Carboxylesterase, Glucose-6-phosphatase, Pyridoxal phosphatase, Pyruvate oxidase]
Estimated time for aa substitution per for the enzymes common in all three species
Summary
• The rate of synonymous substitution varies considerably from gene to gene.
• Many enzymes, in spite of being proteins in nature, do not provide valid results.
• The accuracy rate is about 50% to 55%.
• The number of nonsynonymous sites was too high in some cases, so no valid result was obtained.
Summary (contd.)
• In the case of enzymes, the variation is high in comparison to ordinary proteins, as mentioned in the case study with ordinary proteins by Prof Li.
• Enzymes possess a restoration capability after chemical reactions, which means they can resist many mutations.
Summary (contd.)
• In this work, the estimated time for mutation is around 5 times more (~400 Myr) than for ordinary proteins.
• We can say that they are 5 times stronger than ordinary proteins.
Summary (contd.)
Comparison between Already Established Results (Li’s Approach) and Our Approach
(NVR: No Valid Results; H: Human, M: Mouse, R: Rat)

                                           Li’s Approach        Our Approach
Enzymes                                    Codons     dn/ds     Codons    dn/ds    Codons    dn/ds
                                           compared   ratio     compared  ratio    compared  ratio
                                           (H-M/R)              (H-M)              (H-R)
Aldolase A                                 363        0.03      363       0.10     363       NVR
Creatine kinase M                          380        0.06      381       0.10     381       0.10
Lactate dehydrogenase A                    331        0.02      332       0.50     332       0.53
Glyceraldehyde-3-phosphate dehydrogenase   332        0.09      332       NVR      332       NVR
Glutamine synthetase                       371        0.08      372       0.10     372       0.11
Adenine phosphoribosyltransferase          179        0.19      179       NVR      179       NVR
Carbonic anhydrase I                       260        0.26      259       NVR      259       0.26
Summary (contd.)
• None of the values can be considered accurate.
• All may vary with the parameters or the assumptions taken into account.
• We can only observe the nature of selection: whether neutral, purifying or diversifying.
• In this table, variations have occurred, but we don’t know which pair of genes was taken by Prof Li.
• In our case, the random sequence generated might have varied a lot from what the nucleotide sequence for that gene originally was.
• NVR means no valid result.
• In these cases the ratio could not be calculated because the value of ds obtained was not a valid number.
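Reading the nature of selection from dn and ds, including the NVR case, can be sketched as follows. The usual interpretation is assumed: a ratio near 1 suggests neutral evolution, below 1 purifying selection, above 1 diversifying selection. The `tolerance` parameter and the function name are hypothetical choices for this sketch.

```python
import math


def selection_regime(dn, ds, tolerance=0.05):
    """Classify selection from dn and ds, or return "NVR" if no ratio exists.

    tolerance is an illustrative width for treating a ratio as equal to 1.
    """
    if dn is None or ds is None:
        return "NVR"
    if math.isnan(dn) or math.isnan(ds) or math.isinf(dn) or math.isinf(ds):
        return "NVR"
    if ds <= 0:
        return "NVR"  # ratio undefined when ds is zero or negative
    ratio = dn / ds
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if ratio < 1.0 else "diversifying"


print(selection_regime(0.10, 1.00))      # purifying
print(selection_regime(2.00, 1.00))      # diversifying
print(selection_regime(0.10, math.nan))  # NVR
```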
Thank You
Supplementary materials are available on the website.
The evolution model is the Hairy model.