Genomic signatures of an aquatic lifestyle: rate variation of orthologous genes from arthropods living in water versus those living on land

Prediction of genes involved in influencing terrestrial and aquatic lifestyles in arthropods: A Bioinformatics Pipeline

Mortha Sharat Kumar
Sumit Middha
John K. Colbourne
May 17th 2007

Overview
Problem Statement, Background, Data, Methods, Results, Future Work, References, Acknowledgements

Problem Statement
We start from what is known about arthropods: organismal lineages, morphology, habitat diversity and gene sequence data.
Consider arthropods with aquatic and terrestrial lifestyles, and use only the techniques and tools of comparative genomics to predict rate variation in orthologs.
Can we predict the genes that might have a key role in supporting aquatic or terrestrial lifestyles in arthropods?

Problem Statement - A Bioinformatics Pipeline
Have a structured methodology, a fixed sequence of steps, that future projects can reuse as a pipeline.
Spend less time and effort working out the correct steps to follow.
Learn from mistakes.
Spend minimal time tweaking the code.
Spend more time on the data and the analyses than on writing code for future projects.
Is it a tool? No. You won't get all the results at the click of a button. Too many programs are involved, and some tweaking is necessary depending on the data and the number of organisms.

Background - Homologs, Orthologs and Paralogs
Homologs: A gene related to a second gene by descent from a common ancestral DNA sequence. Homologs are the superset of orthologs and paralogs.
Orthologs: Genes in different species that evolved from a common ancestral gene as the result of a speciation event. Orthologs normally retain the same function in the course of evolution.
Paralogs: Genes related by duplication within a genome. Paralogs evolve new functions, even if these are related to the original one.

Background - Homologs, Orthologs and Paralogs
Figure: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif
What we are interested in are orthologs.

Background - Evolutionary Rates
Evolutionary rate: the rate at which genes evolve in a particular lineage, measured by the number of amino acid substitutions under an underlying substitution model.
Molecular clock hypothesis: postulates that the rate of evolution, measured by amino acid substitution, is roughly constant over time and across different lineages.
However, the evolutionary rates of some genes are higher or lower in certain lineage groups.
What does correlation in evolutionary rates mean? The selective forces acting on these genes have been similar between the lineages.
What might those shared forces reflect? The lifestyles and environments of the organisms.

Background - Species Introduced
Fruit Fly [Drosophila melanogaster] Lifestyle - Terrestrial
Mosquito [Anopheles gambiae] Lifestyle - Aquatic / Terrestrial
Beetle [Tribolium castaneum] Lifestyle - Terrestrial
Honey Bee [Apis mellifera] Lifestyle - Terrestrial
Water Flea [Daphnia magna] Lifestyle - Aquatic
Water Flea [Daphnia pulex] Lifestyle - Aquatic
Nematode worm [Caenorhabditis elegans] Lifestyle - Terrestrial

Background - Phylogeny of the Species
[Figure: phylogeny of the seven species]

Background - Aquatic Genes
Ortholog cluster Ox: Ox', Ox'', Ox'''
The members of ortholog cluster Ox have similar substitution rates, and hence similar evolutionary rates. Have similar selective forces acted on them?
Yet the species involved are more closely related to other species.
Could these genes play a role in supporting an aquatic lifestyle?

Background - Terrestrial Genes
Ortholog cluster Oy: Oy', Oy'', Oy'''
What about ortholog cluster Oy? Could its genes play a role in supporting a terrestrial lifestyle?

Data
About the data:
The data come from varied sources.
The number of sequences varies between organisms.
Sequence types range from annotated amino acid sequences to EST contigs.
Sequence lengths differ greatly.
[Figure: Data - sequence lengths in base pairs]

Methods
Detect orthologs
All-against-all criteria
Alignments
Cleaning of alignments
Evolutionary rate tests
Analysis

Methods - The Pipeline
Detect orthologs: RBBH
All-against-all criteria: scripts
Alignments: TCoffee
Cleaning of alignments: scripts
Evolutionary rate tests: RRTree
Results
Analysis

Methods - RBBH
RBBH [Reciprocal Best BLAST Hits]: what is it?
Proteins from different organisms that are each other's top BLAST hit.
Ax <-> By: gene x from organism A and gene y from organism B are orthologs.
What if Ax <-> By <-> Cz? Can x, y and z be considered an ortholog cluster?

Methods - All-Against-All Criteria
One protein sequence from each organism is accepted into an ortholog cluster only if each protein has an RBBH in every other organism.
All-against-all is very stringent, which gives very high confidence in the inferred orthology.
[Figure: all-against-all comparison for 5 organisms, A, B, C, D and E]

No. of organisms    No. of BLASTs
2                   2
3                   6
4                   12
5                   20
6                   30
7                   42

After checking the all-against-all criteria we are left with high-confidence ortholog clusters.

Methods - Alignments and Cleaning
Alignments were carried out using TCoffee.
The leading and trailing gaps do not correspond to indels and carry no information.
If the leading and trailing gaps are not clipped, inaccurate substitution rates result.
The leading and trailing gaps therefore have to be clipped: the alignment is cut back from the start and from the end until a highly conserved block is encountered.
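
The ortholog-detection and cleaning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scripts used in the project. The layout of the `best_hit` tables, the function names, and the rule that a "highly conserved block" means `block` consecutive gap-free columns are all hypothetical choices made for the example. `blast_runs` reproduces the table of BLAST counts, which follows n * (n - 1) for n organisms.

```python
from itertools import combinations


def blast_runs(n_species):
    """Directed BLAST comparisons needed for n species: n * (n - 1)."""
    return n_species * (n_species - 1)


def reciprocal_best_hits(best_hit):
    """Collect RBBH pairs from top-hit tables.

    best_hit maps (query_species, target_species) to a dict
    {query_gene: top_hit_gene}. This layout is an assumption.
    """
    pairs = set()
    for (sp_a, sp_b), hits in best_hit.items():
        reverse = best_hit.get((sp_b, sp_a), {})
        for gene_a, gene_b in hits.items():
            # Reciprocal: the top hit of gene_b must point back to gene_a.
            if reverse.get(gene_b) == gene_a:
                pairs.add(frozenset([(sp_a, gene_a), (sp_b, gene_b)]))
    return pairs


def all_against_all_clusters(best_hit, species):
    """Keep only clusters in which every pair of members is an RBBH."""
    rbbh = reciprocal_best_hits(best_hit)
    seed, others = species[0], species[1:]
    seed_genes = set()
    for other in others:
        seed_genes.update(best_hit.get((seed, other), {}))
    clusters = []
    for gene in sorted(seed_genes):
        members = [(seed, gene)]
        for other in others:
            partner = best_hit.get((seed, other), {}).get(gene)
            if partner is None:
                break  # no hit in this species, so no full cluster
            members.append((other, partner))
        else:
            # All-against-all: check every pair, not just pairs with the seed.
            if all(frozenset(p) in rbbh for p in combinations(members, 2)):
                clusters.append(dict(members))
    return clusters


def trim_terminal_gaps(alignment, block=3, gap="-"):
    """Clip both ends of an alignment back to the first run of `block`
    consecutive gap-free columns (a stand-in for a conserved block)."""
    length = len(alignment[0])
    gap_free = [all(seq[i] != gap for seq in alignment) for i in range(length)]

    def first_block(flags):
        run = 0
        for i, ok in enumerate(flags):
            run = run + 1 if ok else 0
            if run == block:
                return i - block + 1
        return None

    start = first_block(gap_free)
    if start is None:
        return ["" for _ in alignment]
    end = length - first_block(gap_free[::-1])
    return [seq[start:end] for seq in alignment]


# Toy data: a1/b1/c1 are mutual top hits; b2 and c2 are not reciprocal.
TOY_HITS = {
    ("A", "B"): {"a1": "b1", "a2": "b2"},
    ("B", "A"): {"b1": "a1", "b2": "a2"},
    ("A", "C"): {"a1": "c1", "a2": "c2"},
    ("C", "A"): {"c1": "a1", "c2": "a2"},
    ("B", "C"): {"b1": "c1", "b2": "c1"},
    ("C", "B"): {"c1": "b1", "c2": "b2"},
}

if __name__ == "__main__":
    print(all_against_all_clusters(TOY_HITS, ["A", "B", "C"]))
    print(trim_terminal_gaps(["--ACDEFG--", "MKACDEFGHI", "-KACDEFG-I"]))
```

In the toy data, a2 has reciprocal best hits in both B and C, but b2 and c2 are not each other's top hit, so that candidate cluster is rejected. This is the situation raised by the "What if Ax <-> By <-> Cz?" question.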

Methods - Alignments and Cleaning
[Figure: alignment before trimming (black) and after trimming (red)]

Methods - Relative Rate Tests - RRTree
What exactly is a relative rate test? It compares the rates of amino acid or nucleotide substitution between lineages, each measured with respect to an outgroup.

Methods - Relative Rate Tests - Models
Kimura 2-parameter
Jukes-Cantor
Uncorrected distance
Substitution matrix
Base frequencies?

Results
Pairwise ortholog distribution between species:
[Figure: pairwise ortholog distribution between species]

Results - Ortholog Detection Tools
Each tool has its own underlying algorithm:
COGs - Clusters of Orthologous Groups
OrthoMCL
InParanoid
KOG - euKaryotic Orthologous Groups
The paper by Tim Hulsen, Martijn A Huynen, Jacob de Vlieg and Peter MA Groenen, “Benchmarking ortholog identification methods using functional genomics data”, rated InParanoid as the best ortholog detection tool. InParanoid is also one of the most widely used tools.

Results - Why not just use a published tool like InParanoid for ortholog detection?
In the benchmarking paper, InParanoid gave the largest number of false positives.
The false positives are paralogs, which are undesirable in our study: we are interested in genes with the same function.
RBBH gave the smallest number of false positives.
How did our RBBH method fare when compared to InParanoid?
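
As an aside on the models listed under Relative Rate Tests, the sketch below shows how the uncorrected, Jukes-Cantor and Kimura 2-parameter distances can be computed for a pair of aligned nucleotide sequences, and how two lineages can be compared against an outgroup. It is a simplified illustration, not RRTree's implementation: RRTree compares groups of sequences on a tree and assesses significance, which is omitted here. The function names and the restriction to ungapped A/C/G/T columns are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_differences(seq_a, seq_b, gap="-"):
    """Return (transitions, transversions, compared_sites), skipping gaps."""
    ts = tv = n = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        n += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv, n


def p_distance(seq_a, seq_b):
    """Uncorrected distance: proportion of differing sites."""
    ts, tv, n = site_differences(seq_a, seq_b)
    return (ts + tv) / n


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq_a, seq_b):
    """Kimura 2-parameter: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    with P and Q the transition and transversion proportions."""
    ts, tv, n = site_differences(seq_a, seq_b)
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


def relative_rate_difference(seq_1, seq_2, outgroup, distance=jukes_cantor):
    """Difference between the two lineage-to-outgroup distances.
    A value near zero is what equal rates in both lineages would give."""
    return distance(seq_1, outgroup) - distance(seq_2, outgroup)


if __name__ == "__main__":
    outgroup = "ACGTACGTACGTACGTACGT"
    lineage_1 = "GAGTACGTACGTACGTACGT"  # one transition, one transversion
    lineage_2 = outgroup                # unchanged relative to the outgroup
    print(jukes_cantor(lineage_1, outgroup))
    print(kimura_2p(lineage_1, outgroup))
    print(relative_rate_difference(lineage_1, lineage_2, outgroup))
```

A real analysis would also need a variance estimate for the difference before calling it significant.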

Results
Ortholog clusters present in all of Drosophila melanogaster, Anopheles gambiae, Tribolium castaneum, Apis mellifera, Daphnia pulex, Daphnia magna and Caenorhabditis elegans:
932 - 380 = 552, ~59 % met All-Against-All
The 5 species with at least Daphnia magna or Daphnia pulex:
1052 - 360 = 692, ~65 % met All-Against-All
[Figure value without a surviving label: 69]
Total genes to work with = 1244

Results
When considering all seven species, ~6 % of the genes had high similarity in evolutionary rates in Anopheles gambiae and the Daphnia (both Daphnia pulex and Daphnia magna). Aquatic lifestyle?

Results - Now What?
We have gene IDs.
See whether the genes belong to particular gene families.
Statistical tests.
GO!
What is Gene Ontology? The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism.

Future Work / Project
Prediction of genes involved in influencing social behavior in insects.
Use the same methodology, the pipeline.
The approach would be exactly the same: instead of arthropod species with aquatic and terrestrial lifestyles, the study will have insect species with known social and non-social behavioral traits.
[Figure: species labeled social, non-social, social]

References
Zdobnov EM, von Mering C, et al. - Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster.
Dirk Steinke, Walter Salzburger, Ingo Braasch and Axel Meyer - Many genes in fish have species-specific asymmetric rates of molecular evolution.
J. W. Kijas, M. Menzies and A. Ingham - Sequence diversity and rates of molecular evolution between sheep and cattle genes.
Swofford, Olsen, Waddell and Hillis, “Phylogenetic Inference”, in Molecular Systematics, 2nd ed., Sinauer Ass., Inc., 1996, Ch. 11.
F. Tajima and M. Nei, Mol. Biol. Evol. 1984, 1, 269.
M. Kimura, J. Mol. Evol. 1980, 16, 111.
K. Tamura, Mol. Biol. Evol. 1992, 9, 678.
L. Jin and M. Nei, Mol. Biol. Evol. 1990, 7, 82.
M. Kimura, The Neutral Theory of Molecular Evolution, Camb. Uni. Press, Camb., 1983.
Insights into social insects from the genome of the honeybee Apis mellifera, Nature 443, 931-949 (26 October 2006).
Alexandre Hassanin (2006). Phylogeny of Arthropoda inferred from mitochondrial sequences: Strategies for limiting the misleading effects of multiple changes in pattern and rates of substitution. Molecular Phylogenetics and Evolution 38: 100-116.
Joel Savard, Diethard Tautz and Martin J Lercher, Genome-wide acceleration of protein evolution in flies (Diptera), BMC Evolutionary Biology 2006.
Cedric Notredame, Desmond Higgins and Jaap Heringa, T-Coffee: A Novel Method for Fast and Accurate Multiple Sequence Alignment, JMB 2000.
Robinson-Rechavi M, Huchon D, RRTree: Relative-rate tests between groups of sequences on a phylogenetic tree, Bioinformatics 2000, 16, 296-297.
Tim Hulsen, Martijn A Huynen, Jacob de Vlieg and Peter MA Groenen, Benchmarking ortholog identification methods using functional genomics data, Genome Biology 2006.
Jukes TH, Cantor CR (1969) Evolution of protein molecules. In Munro HN (Ed.) Mammalian protein metabolism. Academic Press, New York 13:2178-2189.

Acknowledgements
This would not have been possible without the aid, support, guidance and patience of:
John K. Colbourne
Sumit Middha
The CGB Staff - The Bioinformatics Group & The Genomics Group
Thanks to Memo Dalkilic and Haixu Tang for their valuable feedback on the project.
Computing Facilities - CGB
Special thanks to my family and friends and Professor Edward L Robertson.
Thank You