An update on ongoing projects within Biorange SP3.2.2.1
Biorange Project Meeting, Leiden, September 15
Tim Hulsen

Biorange SP3.2.2
• User
• Gene annotation through applications: PhyloPat, BioVenn, OrthoPath, CoPub
• Knowledge integration: CoPub, Xref db, ArrayExpress db

Overview
• PhyloPat
  • Published in BMC Bioinformatics (2006)
  • Update submitted to Nucleic Acids Res. Database issue
• BioVenn
  • Revised version submitted to BMC Genomics
• Orthologous networks & OrthoPath
  • Manuscript in preparation
• CoPub (Taverna workflows)
  • Published in Nucleic Acids Res. Web Server issue (2008)

PhyloPat - Introduction
• Phylogenetic patterns show the presence or absence of certain genes in a set of full genomes derived from different species
• PhyloPat allows the complete Ensembl gene database to be queried using phylogenetic patterns
• Published in September 2006; the new version adds:
  • Ensembl v50
  • Support of HGNC and EntrezGene IDs
  • FASTA-format sequences of the members of a phylogenetic lineage
  • Gene neighborhood view
• http://www.cmbi.ru.nl/phylopat

PhyloPat: Update to Ensembl v50
• 39 species, including model organisms such as C. elegans, D. melanogaster, D. rerio, G. gallus, M. musculus, R. norvegicus, C. familiaris, M. mulatta, and human
• In total 814,936 genes
• In total 244,114 orthologous groups, created by clustering the orthologous gene pairs predicted by Ensembl

PhyloPat: Support of HGNC and EntrezGene IDs
• Choose from four types of IDs
• HGNC-Ensembl mapping for 29 species
• EntrezGene-Ensembl mapping for 18 species

PhyloPat: FASTA-format sequences
• “L”: longest peptide sequences from this orthologous group (only the longest peptide per gene)
• “A”: all peptide sequences from this orthologous group (all peptides per gene)

PhyloPat: Gene neighborhood view
• The ‘Gene neighborhood view’ shows all genes from all species in a certain phylogenetic lineage, together with all genes in their proximity on the genome (10 genes on either side)
• Neighboring genes are color-coded according to the orthologous groups they belong to
• Gene neighborhood gives information about functional relationships (genes involved in similar processes are often clustered together)
• Can be used to find the ‘true’ ortholog from a set of genes, using not only phylogenetic information but also genomic context

PhyloPat: Gene neighborhood view (example)
• ERN1 and ERN2 can be distinguished by looking at their gene context
• Each cell shows: Ensembl Gene ID, PhyloPat ID, HGNC symbol
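A phylogenetic pattern query can be illustrated with a small sketch. PhyloPat's own query syntax and data model are not described here, so this assumes a hypothetical pattern string with one symbol per species ('1' = gene present, '0' = absent, '?' = either) and a toy table mapping each orthologous group to the species it contains.

```python
from typing import Dict, List, Set

# Hypothetical toy data: orthologous group ID -> species with at least one member gene.
GROUPS: Dict[str, Set[str]] = {
    "OG1": {"human", "mouse", "rat", "fly"},
    "OG2": {"human", "mouse", "rat"},
    "OG3": {"human", "fly"},
}

# Column order of the pattern string (an assumption for this sketch).
SPECIES: List[str] = ["human", "mouse", "rat", "fly"]


def matches(pattern: str, present: Set[str], species: List[str]) -> bool:
    """Check one group against a presence/absence pattern.

    '1' = must be present, '0' = must be absent, '?' = don't care
    (a hypothetical syntax chosen for illustration).
    """
    if len(pattern) != len(species):
        raise ValueError("pattern length must equal number of species")
    for symbol, sp in zip(pattern, species):
        if symbol not in ("1", "0", "?"):
            raise ValueError(f"unknown pattern symbol: {symbol!r}")
        if symbol == "1" and sp not in present:
            return False
        if symbol == "0" and sp in present:
            return False
    return True


def query(pattern: str) -> List[str]:
    """Return the IDs of all groups whose presence/absence fits the pattern."""
    return sorted(g for g, present in GROUPS.items() if matches(pattern, present, SPECIES))


if __name__ == "__main__":
    # Present in all three mammals, absent from fly:
    print(query("1110"))  # ['OG2']
```

With a real species list, the same loop scales to the 39 Ensembl species. Only the pattern length changes.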
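The neighborhood window can be sketched in a few lines. The sketch makes three assumptions: each chromosome is available as a list of gene IDs in genomic order, a gene-to-orthologous-group mapping exists, and comparing shared groups is an adequate way to use genomic context. The window of 10 genes on either side comes from the slide. The function names and the overlap score are hypothetical illustrations, not PhyloPat's code.

```python
from typing import Dict, List, Optional, Set, Tuple

Neighborhood = List[Tuple[str, Optional[str]]]


def neighborhood(
    chromosome: List[str],
    gene: str,
    gene_to_group: Dict[str, str],
    flank: int = 10,
) -> Neighborhood:
    """Return (gene, orthologous group) pairs for `gene` and up to `flank`
    genes on either side, in genomic order.

    Genes without a known group get None, so a viewer could leave them uncolored.
    """
    i = chromosome.index(gene)  # raises ValueError if the gene is absent
    start = max(0, i - flank)
    end = min(len(chromosome), i + flank + 1)
    return [(g, gene_to_group.get(g)) for g in chromosome[start:end]]


def shared_groups(a: Neighborhood, b: Neighborhood) -> Set[str]:
    """Groups present in both neighborhoods: a crude genomic-context signal
    for deciding which of two candidate genes is the 'true' ortholog."""
    groups_a = {grp for _, grp in a if grp is not None}
    groups_b = {grp for _, grp in b if grp is not None}
    return groups_a & groups_b
```

A candidate whose neighborhood shares more orthologous groups with the query gene's neighborhood is the more likely positional ortholog. This is the intuition behind distinguishing ERN1 from ERN2.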
BioVenn
• Web application to view the overlap between different lists of biological identifiers, using area-proportional Venn diagrams
• Supports a wide range of IDs, which are recognized and linked to the corresponding database: Affymetrix, COG, Ensembl, EntrezGene, Gene Ontology, InterPro, IPI, KEGG Pathway, KOG, PhyloPat and RefSeq
• Optional mapping of Affymetrix and EntrezGene IDs to Ensembl
• Output in SVG (with drag-and-drop functionality) or PNG
• http://www.cmbi.ru.nl/biovenn/

BioVenn
• Options shown: absolute numbers / percentages; embedded / standalone, SVG / PNG; ID mapping

BioVenn
• Lists for all 13 sets (X total, X only, XY total overlap, XY only overlap, XYZ overlap, etc.)
• If the type of ID (e.g. Affymetrix, Ensembl) is recognized, the output is linked to the corresponding database

Assessing orthologous biology in groups of genes: Application to GC-induced insulin resistance
Biorange meeting 2008-03-11. Goal: gain better insight into the conservation of genes involved in glucocorticoid-induced insulin resistance (GC-induced IR) between human, mouse and rat.
• Use CoPub to build literature networks and map orthology
• Validation needed

Network clusters: insulin signaling; adipocyte differentiation; fatty acid oxidation/catabolism; cytochrome P450s; Jak/Stat/IL6; dexamethasone & insulin; lipid transport; misc. (amino acid metabolism, MAPK signaling, osteoblast)

Validation approach
• Get all genes from a KEGG pathway
• Select a random 10% of these genes
• Create a gene network using these genes (CoPub)
• Compare with the original KEGG pathway
• Repeat with varying thresholds

Results
Pathway | ID | # genes in pathway | FP_rate | TP_rate | Pos.Pred.Val | Percentage Manual Pos
Hematopoietic cell lineage | hsa04640 | 88 | 0.04 | 0.69 | 0.11 | 0.53
Jak-STAT signaling pathway | hsa04630 | 153 | 0.02 | 0.54 | 0.20 | 0.33
Cytokine-cytokine receptor interaction | hsa04060 | 256 | 0.03 | 0.51 | 0.24 | 0.53
Toll-like receptor signaling pathway | hsa04620 | 90 | 0.02 | 0.50 | 0.17 | 0.40
Metabolism of xenobiotics by cytochrome P450 | hsa00980 | 70 | 0.00 | 0.50 | 0.43 | 0.32
Melanoma | hsa05218 | 71 | 0.02 | 0.49 | 0.11 | 0.20
Renal cell carcinoma | hsa05211 | 69 | 0.04 | 0.49 | 0.06 | 0.40
VEGF signaling pathway | hsa04370 | 70 | 0.04 | 0.49 | 0.06 | 0.20
GnRH signaling pathway | hsa04912 | 97 | 0.03 | 0.48 | 0.09 | 0.13
Endometrial cancer | hsa05213 | 52 | 0.02 | 0.48 | 0.07 | 0.47
Average TP = 0.24, average FP = 0.01

Application to all human genes
A network was created for each gene with R scaled = 30 and literature count = 5, giving 6,181 networks with size > 2. The average conservation of each network was then calculated from the conservation of all genes in the network in four species (P. troglodytes, M. musculus, R. norvegicus, C. familiaris).
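The KEGG-based validation above can be scored as sketched below. The slides do not define the table's metrics, so this assumes the usual definitions:

• TP_rate: recovered pathway genes divided by pathway genes.
• FP_rate: non-pathway genes in the network divided by all non-pathway background genes.
• Positive predictive value: TP / (TP + FP).

`build_network` is a hypothetical stand-in for a CoPub query. Excluding the seed genes from scoring is also an assumption.

```python
import random
from typing import Callable, Dict, Set


def validate(
    pathway: Set[str],
    background: Set[str],
    build_network: Callable[[Set[str]], Set[str]],
    fraction: float = 0.10,
    seed: int = 0,
) -> Dict[str, float]:
    """Score one round of the seed-and-expand validation.

    `build_network` (hypothetical) takes seed genes and returns all genes in the
    resulting literature network. Seed genes are excluded from scoring, so only
    genes the network adds are counted.
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(pathway)))  # random 10% of the pathway
    seeds = set(rng.sample(sorted(pathway), k))
    predicted = build_network(seeds) - seeds

    targets = pathway - seeds         # pathway genes the network could recover
    negatives = background - pathway  # genes outside the pathway
    tp = len(predicted & targets)
    fp = len(predicted & negatives)
    return {
        "TP_rate": tp / len(targets) if targets else 0.0,
        "FP_rate": fp / len(negatives) if negatives else 0.0,
        "Pos.Pred.Val": tp / (tp + fp) if tp + fp else 0.0,
    }
```

"Repeat with varying thresholds" would correspond to calling `validate` with differently configured `build_network` functions (and several seeds) and averaging the results.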
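The network-conservation step can be sketched as follows. The slides do not say how per-gene conservation is measured. This sketch therefore assumes it is the fraction of the four species in which the human gene has an ortholog, and that a network's conservation is the plain mean over its genes. The input dictionaries and species keys are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple

# The four comparison species from the slide; these short keys are illustrative.
SPECIES = ("P.tro", "M.mus", "R.nor", "C.fam")


def gene_conservation(species_with_ortholog: Set[str]) -> float:
    """Assumed per-gene measure: fraction of the four species with an ortholog."""
    return sum(sp in species_with_ortholog for sp in SPECIES) / len(SPECIES)


def rank_networks(
    networks: Dict[str, Set[str]],
    orthologs: Dict[str, Set[str]],
    min_size: int = 3,
) -> List[Tuple[str, float]]:
    """Mean gene conservation per network, most conserved first.

    Networks smaller than `min_size` are skipped (min_size=3 mirrors 'size > 2').
    Genes missing from `orthologs` count as having no orthologs.
    """
    scored = [
        (name, mean(gene_conservation(orthologs.get(g, set())) for g in genes))
        for name, genes in networks.items()
        if len(genes) >= min_size
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def genes_in(
    ranked: List[Tuple[str, float]],
    networks: Dict[str, Set[str]],
    n: int = 100,
    most: bool = True,
) -> Set[str]:
    """Union of all genes in the n most (or n least) conserved networks."""
    if n <= 0:
        return set()
    chosen = ranked[:n] if most else ranked[-n:]
    return set().union(*(networks[name] for name, _ in chosen))
```

Because networks overlap, the union of genes from 100 networks can be far smaller than 100 times the network size. This fits the gene counts on the slide.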
The genes in the 100 most conserved networks and those in the 100 least conserved networks (two sets of 211 and 309 genes) were each tested for GO enrichment, and the two results were compared.

Non-conserved
Term | PValue
GO:0051704~multi-organism process | 1.23E-07
GO:0006952~defense response | 1.48E-07
GO:0009615~response to virus | 4.47E-06
GO:0051707~response to other organism | 2.65E-05
GO:0007586~digestion | 7.07E-05
GO:0050896~response to stimulus | 1.04E-04
GO:0009607~response to biotic stimulus | 4.16E-04
GO:0006955~immune response | 6.66E-04
GO:0030101~natural killer cell activation | 0.003385226
GO:0007565~female pregnancy | 0.003668999

Conserved
Term | PValue
GO:0048856~anatomical structure development | 5.80E-05
GO:0007399~nervous system development | 7.35E-05
GO:0005977~glycogen metabolic process | 9.18E-05
GO:0006073~glucan metabolic process | 1.02E-04
GO:0048731~system development | 1.12E-04
GO:0032502~developmental process | 1.16E-04
GO:0007275~multicellular organismal development | 2.67E-04
GO:0044262~cellular carbohydrate metabolic process | 2.90E-04
GO:0032501~multicellular organismal process | 4.49E-04
GO:0006813~potassium ion transport | 5.58E-04
GO:0048513~organ development | 6.85E-04
GO:0044264~cellular polysaccharide metabolic process | 7.09E-04
GO:0005976~polysaccharide metabolic process | 7.94E-04

OrthoPath
• OrthoPath is a gene-centric search tool for literature networks and their orthologs
• Three input methods:
  • Single gene search: get the literature network for a given gene. OrthoPath creates a network of the genes that are connected to this single gene.
  • Keyword search: get the literature network based on a certain keyword. OrthoPath looks for genes that are connected to the keyword and creates a network from all these genes.
  • Multi-gene search: get the literature network for a set of genes. OrthoPath creates a network from only the genes entered by the user.
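The three input methods differ mainly in how the node set is chosen. The sketch below is not OrthoPath's code. It assumes co-citation data as a hypothetical list of (term A, term B, strength, number of abstracts) records. It applies the two thresholds the OrthoPath interface exposes (minimum co-citation strength and minimum number of abstracts). The keyword mode here simply takes the genes co-cited with the keyword.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class CoCitation(NamedTuple):
    a: str
    b: str
    strength: float  # co-citation strength score (assumed scale)
    abstracts: int   # number of abstracts mentioning both terms


def edges(data: Iterable[CoCitation], min_strength: float, min_abstracts: int) -> List[CoCitation]:
    """Keep only co-citations that pass both thresholds."""
    return [c for c in data if c.strength >= min_strength and c.abstracts >= min_abstracts]


def neighbors(term: str, kept: List[CoCitation]) -> Set[str]:
    """All terms co-cited with `term` among the kept edges."""
    return {c.b if c.a == term else c.a for c in kept if term in (c.a, c.b)}


def network(nodes: Set[str], kept: List[CoCitation]) -> List[Tuple[str, str]]:
    """All kept edges whose two endpoints are both in `nodes`."""
    return sorted((c.a, c.b) for c in kept if c.a in nodes and c.b in nodes)


def single_gene(gene: str, data: List[CoCitation], genes: Set[str],
                min_strength: float = 0.0, min_abstracts: int = 1) -> List[Tuple[str, str]]:
    """Network of `gene` plus the genes connected to it."""
    kept = edges(data, min_strength, min_abstracts)
    return network({gene} | (neighbors(gene, kept) & genes), kept)


def keyword_search(keyword: str, data: List[CoCitation], genes: Set[str],
                   min_strength: float = 0.0, min_abstracts: int = 1) -> List[Tuple[str, str]]:
    """Network among all genes connected to `keyword`."""
    kept = edges(data, min_strength, min_abstracts)
    return network(neighbors(keyword, kept) & genes, kept)


def multi_gene(user_genes: Iterable[str], data: List[CoCitation],
               min_strength: float = 0.0, min_abstracts: int = 1) -> List[Tuple[str, str]]:
    """Network restricted to exactly the genes entered by the user."""
    kept = edges(data, min_strength, min_abstracts)
    return network(set(user_genes), kept)
```

The `genes` set separates gene symbols from other co-cited terms such as keywords. Raising either threshold prunes weak or rarely co-mentioned links before any network is built.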
OrthoPath web interface: http://ws2.grid.sara.nl/cgi-bin/orthopath/op.pl

OrthoPath
• Search with a single gene
• Search with a keyword
• Search with a list of genes
• Set the minimum strength of a co-citation between two keywords
• Set the minimum number of abstracts in which a co-citation between the two genes is found
• Output in HTML, SVG, Cytoscape or Ingenuity format

OrthoPath
Each node in the network shows:
• EntrezGene information: ID, symbol, description
• Number of neighbors
• Number of orthologs from human (for all five species)

CoPub Taverna Workflows
• Taverna: free software tool for designing and executing workflows
• Workflow files for CoPub have been developed:
  (1) Search gene
  (2) Get literature neighbors
  (3) Get a list of categories
  (4) Get the complete network

Acknowledgements
• Wynand Alkema
• Wilco Fleuren
• Raoul Frijters
• Peter Groenen