PHYLOPAT: AN UPDATED VERSION OF THE PHYLOGENETIC PATTERN DATABASE CONTAINS GENE NEIGHBORHOOD
Tim Hulsen et al., Nucleic Acids Research, 2009, Vol. 37, Database issue
Presenter: Reihaneh Rabbany
Presented in Bioinformatics Course (CMPUT 606), instructed by Prof. Guohui Lin, Computing Science Department, University of Alberta, Winter 2009

INTRODUCTION
Phylogenetic patterns
- Show the presence or absence of certain genes in a set of whole genome sequences
- Can be used to determine sets of genes that occur only in certain evolutionary branches
- Have become more common as increasing amounts of orthology data have become available
- Phylogenetic pattern search tools are available for querying proteins, but not for querying genes

PHYLOPAT
PhyloPat is a database that offers the possibility of querying the Ensembl database using any phylogenetic pattern
Functionalities:
- Gene neighborhood view
- Anticorrelating patterns
- Support of Entrez Gene IDs
- Direct sequence retrieval of members of a phylogenetic lineage

ENSEMBL
Human genome
- 3 billion base pairs
- 35,000 genes
- The genome alone is of little use
- Locations and relationships of individual genes
- Manual annotation
Ensembl (freely accessible)
- Sequence data is fed into a software "pipeline"
- Creates a set of predicted gene locations
- Saves them in a MySQL database
- Originally focused on human
- Now includes mouse, fruit fly, zebrafish, plants, fungi, …

PHYLOPAT - DATABASE CONTENT
A set of phylogenetic lineages
Complete set of orthologies collected
- All 39 species’ genes in Ensembl
- 741 species pairs
- 815 452 genes
- 19 010 478 orthologous relationships
  - 11 446 546 one-to-one
  - 4 588 300 one-to-many
  - 2 975 632 many-to-many
Ensembl ortholog detection pipeline
- Similarity values by best reciprocal hits and best score ratio (WU BLASTP)
- Graph of gene relations and clustering
- Multiple alignment (MUSCLE)
- Phylogenetic tree (TreeBeST)
- Orthologous relationships

PHYLOPAT - DATABASE CONSTRUCTION
Generating phylogenetic lineages
- Determining evolutionary order
  - Using the NCBI Taxonomy
  - Phylogenetic tree
- Phylogenetic lineages
  - For each gene in the first species, look for orthologs in the other species
  - Add all orthologs to the phylogenetic lineage
  - Check the orthologs themselves for further orthologs, until no additional orthologies are found for any of the genes
  - Repeat for all genes in all 39 species that are not yet connected to any phylogenetic lineage

WEB APPLICATION
A web interface
- Query the PhyloPat MySQL database
  - Phylogenetic lineages
  - Phylogenetic patterns

OMNIPRESENT, OLIGOPRESENT AND POLYPRESENT GENES
Omnipresent
- Genes present in all 39 species
- 688 omnipresent genes
- Most likely have important functions, since they are present in all species
- Phylogenetic pattern ‘11111111111111111111111111111111111111’ (or MySQL regular expression ‘^1+$’)
Oligopresent
- Genes that exist in only one or two species
- Indicate which species are evolutionarily most closely related
Polypresent
- Genes that are missing in only one or two species
- A measure of evolutionary relatedness

ANTICORRELATING PATTERNS
Patterns that are exactly opposite
- Phylogenetic lineages with anticorrelating patterns can be functionally completely different, but could also be highly similar in function
  - ‘000000000000000010111001111001111110010’
  - ‘111111111111111101000110000110000001101’
- These genes can be analogous, i.e. performing a similar function without being evolutionarily related
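The lineage construction, presence/absence patterns and anticorrelating patterns described in the slides above can be sketched in a few lines of Python. This is only an illustrative sketch, not PhyloPat's actual implementation: the species names, gene names, ortholog pairs and all function names are hypothetical toy values, and treating a gene without orthologs as its own single-gene lineage is an assumption.

```python
from collections import defaultdict, deque

# Hypothetical toy data (not from Ensembl): species listed in evolutionary
# order, the species each gene belongs to, and predicted ortholog pairs.
SPECIES = ["sp1", "sp2", "sp3"]
GENE_SPECIES = {
    "a1": "sp1", "a2": "sp2", "a3": "sp3",
    "b1": "sp1",
    "c2": "sp2", "c3": "sp3",
}
ORTHOLOGS = [("a1", "a2"), ("a2", "a3"), ("c2", "c3")]


def build_lineages(gene_species, orthologs):
    """Group genes into lineages by following ortholog links until no
    additional orthologs are found (connected components of the graph)."""
    neighbours = defaultdict(set)
    for gene, other in orthologs:
        neighbours[gene].add(other)
        neighbours[other].add(gene)

    assigned = set()
    lineages = []
    for gene in gene_species:          # genes in insertion (species) order
        if gene in assigned:
            continue                   # already connected to a lineage
        lineage = {gene}
        queue = deque([gene])
        while queue:                   # check the orthologs' own orthologs
            current = queue.popleft()
            for other in neighbours[current]:
                if other not in lineage:
                    lineage.add(other)
                    queue.append(other)
        assigned |= lineage
        lineages.append(lineage)
    return lineages


def pattern(lineage, gene_species, species):
    """Return a string with '1' where the lineage has a gene in a species
    and '0' where it has none, in the given species order."""
    present = {gene_species[gene] for gene in lineage}
    return "".join("1" if sp in present else "0" for sp in species)


def anticorrelating(pat):
    """Return the exactly opposite pattern (every 0 and 1 swapped)."""
    return pat.translate(str.maketrans("01", "10"))


lineages = build_lineages(GENE_SPECIES, ORTHOLOGS)
patterns = [pattern(lin, GENE_SPECIES, SPECIES) for lin in lineages]
print(patterns)
print([anticorrelating(p) for p in patterns])
```

In this toy data, the lineage containing only `b1` and the lineage containing `c2` and `c3` end up with exactly opposite patterns, which is the situation the anticorrelating-patterns slide describes.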
GENE NEIGHBORHOOD
Inferring ‘true’ orthology
- Orthologous conservation of gene neighborhood
- Human gene ENSG00000134398 has two predicted orthologs in chimpanzee:
  - ENSPTRG00000007893
  - ENSPTRG00000009535
- Its gene neighborhood corresponds only to that of ENSPTRG00000007893, for nine of the nearest neighbors
Inferring functional annotation
- Build hypotheses about the processes or pathways that genes might be involved in

FASTA-FORMAT SEQUENCE FILES
Both the pattern search output and the gene neighborhood view contain links to FASTA files of the peptide sequences

DISCUSSION AND CONCLUSION
PhyloPat is useful in
- Orthology detection
- Evolutionary studies
- Gene annotation
Complex queries
- It is possible to specify
  - A species set that should be included (1)
  - A species set that should be excluded (0)
  - A species set whose presence is indifferent (*)
- Use of regular expression queries
Easy-to-use web interface
Relies on only one database (Ensembl)

DISCUSSION AND CONCLUSION (CONT.)
Gene neighborhood view
- Locating evolutionarily related genomic clusters of genes
- Detecting the ‘true orthologs’ within large sets of predicted orthologs
- Functionally annotating less well-known genes
PhyloPat will be updated with each major Ensembl release to ensure up-to-date and reliable phylogenetic lineages (species added)

LINEAGE INFORMATION OF PP000255

QUESTIONS
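The include (1), exclude (0) and indifferent (*) query notation from the discussion slides can be illustrated by translating such a query into a regular expression and matching it against stored patterns. The sketch below is an assumption about how such matching might work, written with Python's `re` module rather than MySQL; the lineage identifiers `L1` to `L4`, the four-species patterns and the function names are hypothetical.

```python
import re

# Hypothetical stored patterns for four toy lineages over four species.
PATTERNS = {
    "L1": "1111",
    "L2": "1100",
    "L3": "0011",
    "L4": "1101",
}


def query_to_regex(query):
    """Translate a query over {1, 0, *} into an anchored regular expression:
    1 = gene must be present, 0 = must be absent, * = indifferent."""
    parts = []
    for symbol in query:
        if symbol in "01":
            parts.append(symbol)
        elif symbol == "*":
            parts.append("[01]")       # either presence or absence is fine
        else:
            raise ValueError(f"unexpected symbol in query: {symbol!r}")
    return "^" + "".join(parts) + "$"


def matching_lineages(patterns, query):
    """Return the ids of all lineages whose pattern satisfies the query."""
    regex = re.compile(query_to_regex(query))
    return [lid for lid, pat in patterns.items() if regex.match(pat)]


def omnipresent(patterns):
    """Return lineages present in every species, using the '^1+$' expression
    quoted on the omnipresent-genes slide."""
    return [lid for lid, pat in patterns.items() if re.search(r"^1+$", pat)]


print(query_to_regex("1*0*"))
print(matching_lineages(PATTERNS, "1*0*"))
print(omnipresent(PATTERNS))
```

Anchoring the expression at both ends keeps a short query from accidentally matching a substring of a longer pattern.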