IAENG_IMECS_ICB II, Room E, 10:45~13:00, March 21, 2007, Hong Kong

Pseudo-Reverse Approach in Genetic Evolution: An Empirical Study with Enzymes
Sukanya Manna, Cheng-Yuan Liou
National Taiwan University, Department of Computer Science and Information Engineering

• NTU land size ~ 360 square kilometers
• huge botanic garden in high mountains (> 3000 meters); NTU water penny beetle
• eleven colleges
• 54 departments
• 96 graduate institutes (which offer 96 Master's programs and 83 doctoral programs)
• research centers: the Division of Population and Gender Studies, the Center for Condensed Matter Sciences, the Center for Biotechnology, Japanese Research Center, and the Biodiversity Center
• The number of students reached 29,877 in 2004, including the students from the Division of Continuing Education & Professional Development

Concepts used
• Under neutral evolution
  – Rate of synonymous substitution = rate of nonsynonymous substitution
  – Estimation of the rates of synonymous and nonsynonymous substitution has become an important subject in molecular evolution

Why?
• 'Draft' theory: an initial and intuitive evolution model
• Part of evolution is based on a set of core systems.
• They are relatively invariant (hard and strong) over evolution.
• Qualitative changes occur as distinct systems are integrated.
• Separate systems conjoin to produce distinctive patterns of evolutionary change.
• This model provides evolutionary flexibility.

Assumptions
• For comparative genomics
  – closely related species such as human and mouse share the vast majority of their genes
• The amino acid sequences obtained for each enzyme share great similarity, like homologous genes

Our Approach
Overview of the steps undertaken:
• Obtain the amino acid (aa) sequences for each enzyme protein.
• Find the least mismatch between two aa sequences, and select the trio.
• Generate the nucleotide (nt) sequences for the aa sequences from the trio.
• Perform the dn/ds ratio test among the pairs of species with randomly generated nt sequences.

Our Approach (contd.)
[Diagram: nt to aa, then aa to nt (reverse)]
nt sequence: AAT GAT TGT CAA GAG CAT AAG TTT TAT
nt to aa: NDCQEHKFY
aa to nt (reverse):
AAT GAT TGT CAA GAG CAT AAG TTT TAT
AAC GAT TGC CAA GAA CAT AAG TTT TAT
AAT GAC TGT CAG GAG CAC AAG TTC TAT
…
All possible combinations: infeasible, high space and time complexity

Basic Concepts
• Nucleotides
  – A, G, T, C (DNA)
  – A, G, U, C (mRNA)
• Amino acids
  – 20 naturally occurring
  – Coded by a triplet of nucleotide bases (referred to as a codon)
• Synonymous / nonsynonymous substitution
  – A substitution of a base within the codon that does not / does change the type of amino acid it represents.
$4^3 = 64$ codons code for 20 amino acids. 3 of the 64 codons are stop codons, which mark the end of a gene section (i.e. end of exon).

Model Used
• Jukes and Cantor (one-parameter method)
  – Assumes the rate of substitution between all pairs of A, T, C, G is the same.
    $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$
  – where p is either ps or pn (the result is ds and dn respectively)
• ps = Sd/S
• pn = Nd/N
• Sd / Nd – total number of synonymous / nonsynonymous differences for all codons compared
• S / N – numbers of synonymous / nonsynonymous sites

Our Approach (contd.)
• Normally, amino acid sequences are obtained from nucleotide sequences by using the universal genetic mapping table.
• Generating the nucleotide sequences from the amino acid sequences is the reverse process.
• For a particular amino acid sequence, there can be numerous nucleotide sequences, one for every possible combination of codons.
• But generating all of these sequences is infeasible because of the very large time and space complexity.
• Here we use this reverse mechanism to match the closely related nucleotide sequences of the respective amino acids.
• The next slide shows the method we used to proceed in this situation.

Our Approach (contd.)
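The infeasibility of enumerating every reverse translation can be made concrete with a small Python sketch that counts how many nucleotide sequences encode a given amino acid sequence. It assumes the standard genetic code for the "universal genetic mapping table"; the names `CODON_TABLE`, `translate` and `count_back_translations` are illustrative and do not come from the original work.

```python
from collections import Counter
from math import prod

# Standard genetic code with bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of codons that encode each amino acid (its degeneracy).
DEGENERACY = Counter(CODON_TABLE.values())


def translate(nt_seq):
    """Forward direction: read the sequence three bases at a time."""
    nt_seq = nt_seq.replace(" ", "")
    return "".join(
        CODON_TABLE[nt_seq[i:i + 3]] for i in range(0, len(nt_seq) - 2, 3)
    )


def count_back_translations(aa_seq):
    """Reverse direction: product of the degeneracies of all residues."""
    return prod(DEGENERACY[aa] for aa in aa_seq)


# The nine-residue example from the slides: every residue has two codons.
print(translate("AAT GAT TGT CAA GAG CAT AAG TTT TAT"))  # NDCQEHKFY
print(count_back_translations("NDCQEHKFY"))              # 512
```

Because the count is a product over residues, it grows multiplicatively with sequence length, which is why sequences of several hundred residues cannot be enumerated exhaustively.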
[Figure: Comparison of Frequency of Codons. x-axis: Codons; y-axis: Frequency of Codons (0 to 40,000,000); series: Human, Mouse, Rat]
• Calculated the total frequency of codons from each genome
• Calculated the cumulative probability of the codons from these frequencies

Our Approach (contd.)
• Generated the random sequences using the cumulative probability:
  – Best matched pairs
    • Generate sequences for the trio
  – All pairs with least mismatch
    • Generate sequences only for those pairs

Our Approach (contd.)
A = [a1, a2, …, an]: aa sequences for HUMAN
B = [b1, b2, …, bm]: aa sequences for MOUSE
C = [c1, c2, …, ck]: aa sequences for RAT
Calculate all possible mismatches between AB, BC and CA.
aa sequences with least mismatch (r denotes a rat sequence): a1b1, a1b3, a2b2, a1r2, a2r5, a1r1, b1r1, b1r2, b2r6
a1b1r2 is the trio.
Selecting the best matched pair: choose randomly such that the three pairs will be a1b1, b1r2 and a1r2.

Our Approach (contd.)
• Least mismatch means maximum similarity in their sequences.
• Let A, B, C be the amino acid sequences for human, mouse and rat respectively.
• We compare two sequences one amino acid at a time.
• We calculate the possible mismatches between all sequences.
• We separate out the ones with least mismatch.
• Here the example is shown for the amino acid sequences of one particular enzyme.

Our Approach (contd.)
• Generalized algorithm
  – Pathway analysis by the model of Nei and Gojobori
  – No transition matrix used here
  – No phylogenetic tree for codon comparison
  – Sliding buffer of 3 characters used for codon comparison
  – Used Jukes and Cantor's model for multiple nucleotide substitution correction

Our Approach (contd.)
AAT GAT TGT CAA GAG CAT AAG TTT TAT
AAT GAC TGT CAG GAG CAC AAG TTC TAT
The sliding buffer compares one codon from each sequence at a time.
Use Nei and Gojobori's model to calculate the pathways and Jukes and Cantor's model to get dn/ds.
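Two of the steps above, drawing codons from cumulative codon frequencies and applying the Jukes and Cantor correction, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each amino acid is replaced by one of its synonymous codons drawn in proportion to genome-wide codon counts, the counts passed in the usage lines are made up, codons without a count get weight 1, and the Nei and Gojobori pathway counting that yields Sd, S, Nd and N is taken as already done. All function names are hypothetical.

```python
import math
import random
from itertools import accumulate

# Standard genetic code with bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt_seq):
    """Translate with a sliding buffer of 3 characters."""
    return "".join(
        CODON_TABLE[nt_seq[i:i + 3]] for i in range(0, len(nt_seq) - 2, 3)
    )


def pseudo_reverse(aa_seq, codon_counts, rng):
    """Draw one synonymous codon per residue using cumulative weights.

    Assumption: weights are genome-wide codon counts (positive numbers);
    a codon missing from codon_counts gets weight 1.
    """
    out = []
    for aa in aa_seq:
        synonymous = sorted(c for c, a in CODON_TABLE.items() if a == aa)
        cumulative = list(accumulate(codon_counts.get(c, 1) for c in synonymous))
        out.append(rng.choices(synonymous, cum_weights=cumulative)[0])
    return "".join(out)


def jukes_cantor(p):
    """d = -(3/4) ln(1 - (4/3) p); undefined (None) unless 0 <= p < 3/4."""
    if not 0.0 <= p < 0.75:
        return None
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds_ratio(sd, s, nd, n):
    """dn/ds from difference counts (sd, nd) and site counts (s, n)."""
    ds = jukes_cantor(sd / s)
    dn = jukes_cantor(nd / n)
    if ds is None or dn is None or ds == 0.0:
        return None  # no valid ratio can be computed
    return dn / ds


# Usage with made-up codon counts: the sampled sequence always translates
# back to the input, whichever synonymous codons were drawn.
rng = random.Random(0)
nt = pseudo_reverse("NDCQEHKFY", {"AAT": 17, "AAC": 19}, rng)
assert translate(nt) == "NDCQEHKFY"
```

Returning `None` when the proportion of differences reaches 3/4 reflects the fact that the logarithm in the correction is then undefined; this is one way a ratio can fail to be computable, and it is offered here only as an illustration.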
Experimental Results (Best matched pairs)
[Figure: Comparison of dn/ds Ratio for the Enzymes found in all three Species. x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 5); series: HM, MR, HR. Enzymes: Pyruvate oxidase (392), Pyridoxal phosphatase (241), Glucose-6-phosphatase (357), Carboxylesterase (563), Transaldolase (337), Catalase (526), Acid phosphatase (157), Glutamate dehydrogenase (558)]
dn/ds ratio of the human-mouse, mouse-rat and human-rat comparisons for the enzymes common to all three. Numbers in brackets are the lengths of the sequences compared.

Experimental Results (contd.) (Best matched pairs)
[Figure: dn/ds Ratio for the Enzymes Not Common for Human-Mouse and Human-Rat Respectively. x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 20); series: HM, HR. Enzymes: Oligopeptidase A (686), Malate dehydrogenase (333), Aminopeptidase (965), Trypsin (246), Peroxidase (223)]
dn/ds Ratio of Human-Mouse and Mouse-Rat Comparison for the Enzymes not Common in them.

Experimental Results (contd.) (Best matched pairs)
[Figure: dn/ds Ratio for Mouse-Rat Comparison. x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 14); series: Valid. Enzymes: Glucose dehydrogenase (493), Aldehyde oxidase (1334), Pyruvate carboxylase (622), Lysophospholipase (230), Hexokinase (298), Lipase (137), Lactate dehydrogenase (332)]
dn/ds ratio of the mouse-rat comparison for the enzymes found only in these two species but not in human.

Experimental Results (contd.) (All pairs with least mismatch)
[Figure: x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 0.35); series: HM, HR, MR. Enzymes: Pyruvate oxidase (392), Pyridoxal phosphatase (241), Glucose-6-phosphatase (357), Hexokinase (298), Catalase (526), Acid phosphatase (157), Glutamate dehydrogenase (558)]
dn/ds ratio of the human-mouse, mouse-rat and human-rat comparisons for the enzymes common to all three. This graph shows the enzymes with only one least-mismatch sequence pair for each species pair.

Experimental Results (contd.) (All pairs with least mismatch)
[Figure: two panels, Transaldolase and Carboxylesterase. x-axis: No. of genes for each case; y-axis: dn/ds Ratio; series: HM, HR, MR]
For the comparison across all three species, enzymes with more than one least mismatch. dn/ds ratio of the human-mouse, mouse-rat and human-rat comparisons for the enzymes common to all three. The graphs show the enzymes with multiple least-mismatch sequence pairs for each species pair. The label on the x-axis indicates the sequence pair number and is insignificant.

Experimental Results (contd.) (All pairs with least mismatch)
[Figure: two panels, Trypsin and Alkaline phosphatase. x-axis: Different gene pairs; y-axis: dn/ds Ratio]
Enzymes found only for human-mouse comparison

Experimental Results (contd.) (All pairs with least mismatch)
[Figure: two panels. First panel, x-axis: Enzymes (Lactate dehydrogenase, Lysophospholipase, Tyrosine, Pyruvate carboxylase); second panel, Aldehyde oxidase, x-axis: Different gene pairs. y-axis: dn/ds Ratio]
Enzymes found only for mouse-rat comparison

Experimental Results (contd.) (All pairs with least mismatch)
[Figure: x-axis: Enzymes (Ribonuclease, Oligopeptidase-A, Tyrosine); y-axis: dn/ds Ratio (0 to 40)]
Enzymes found only for human-rat comparison

Experimental Results (contd.) (All pairs with least mismatch)
Estimated time for aa substitution per for the enzymes

Experimental Results (contd.) (All pairs with least mismatch)
[Figure: x-axis: Enzymes; y-axis: Time in Myr (0 to 500); series: HM, HR, MR. Enzymes: Pyruvate oxidase, Pyridoxal phosphatase, Glucose-6-phosphatase, Carboxylesterase, Transaldolase, Catalase, Acid phosphatase, Glutamate dehydrogenase]
Estimated time for aa substitution per for the enzymes common in all three species

Summary
• The rate of synonymous substitution varies considerably from gene to gene.
• Many enzymes, in spite of being proteins in nature, do not provide valid results.
• The accuracy rate is about 50% to 55%.
• Nonsynonymous sites were too high for some cases, so no valid result.

Summary (contd.)
• In the case of enzymes, the variation is high in comparison to ordinary proteins, as mentioned in the case study with ordinary proteins by Prof. Li.
• Enzymes possess restoration capability after chemical reactions, which means they can resist many mutations.

Summary (contd.)
• In this work, the estimated time for mutation is around 5 times more (~400 Myr).
• We can say that they are 5 times stronger than ordinary proteins.

Summary (contd.)
Enzyme | Li's approach: codons compared (H-M/R) | Li's approach: dn/ds ratio | Our approach: codons compared (H-M) | Our approach: dn/ds ratio (H-M) | Our approach: codons compared (H-R) | Our approach: dn/ds ratio (H-R)
Aldolase A | 363 | 0.03 | 363 | 0.10 | 363 | NVR
Creatine kinase M | 380 | 0.06 | 381 | 0.10 | 381 | 0.10
Lactate dehydrogenase A | 331 | 0.02 | 332 | 0.50 | 332 | 0.53
Glyceraldehyde-3-phosphate dehydrogenase | 332 | 0.09 | 332 | NVR | 332 | NVR
Glutamine synthetase | 371 | 0.08 | 372 | 0.10 | 372 | 0.11
Adenine phosphoribosyltransferase | 179 | 0.19 | 179 | NVR | 179 | NVR
Carbonic anhydrase I | 260 | 0.26 | 259 | NVR | 259 | 0.26
Comparison between the already established result and our approach (NVR – No Valid Result, H – Human, M – Mouse, R – Rat)

Summary (contd.)
• None of the values can be considered accurate.
• All may vary with the parameters or the assumptions taken into account.
• We can only observe the nature of selection: whether neutral, purifying or diversifying.
• In this table variations have occurred, but we don't know which pair of genes was taken by Prof. Li.
• In our case, the random sequence generated might have varied a lot from what the nucleotide sequence for that gene originally was.
• NVR means no valid result.
• In these cases the ratio could not be calculated because the value of ds obtained was not a valid number.

Thank You
Supplementary materials are on the website.
The evolution model is the Hairy model.