An update on ongoing projects
within Biorange SP3.2.2.1
Biorange Project Meeting
Leiden, September 15
Tim Hulsen
Biorange SP3.2.2
User
Gene annotation through applications: PhyloPat, BioVenn, OrthoPath, CoPub
Knowledge integration
CoPub
Xref db
ArrayExpress db
Overview
• PhyloPat
• Published in BMC Bioinformatics (2006)
• Update submitted to Nucleic Acids Res. Database issue
• BioVenn
• Revised version submitted to BMC Genomics
• Orthologous networks & OrthoPath
• Manuscript in preparation
• CoPub (Taverna workflows)
• Published in Nucleic Acids Res. Web Server issue (2008)
PhyloPat - Introduction
• Phylogenetic patterns show the presence or absence of certain
genes in a set of full genomes derived from different species
• PhyloPat allows the complete Ensembl gene database to be
queried using phylogenetic patterns
• Published in September 2006; a new version is now available with:
• Ensembl v50
• Support of HGNC and EntrezGene IDs
• FASTA-format sequences of the members of a phylogenetic
lineage
• Gene neighborhood view
• http://www.cmbi.ru.nl/phylopat
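To make the idea of querying by phylogenetic pattern concrete, here is a minimal Python sketch. The species order, the group IDs and the pattern alphabet ('1' = present, '0' = absent, '*' = either) are assumptions made for this illustration; they are not taken from PhyloPat itself.

```python
# Hypothetical species order: one pattern character per species.
SPECIES = ["H. sapiens", "M. musculus", "G. gallus", "D. rerio"]

# Toy presence/absence profiles for made-up orthologous groups.
PROFILES = {
    "GROUP_A": "1111",
    "GROUP_B": "1100",
    "GROUP_C": "0011",
    "GROUP_D": "1101",
}


def matches(profile, query):
    """True if the profile fits the query: '1' present, '0' absent, '*' either."""
    return len(profile) == len(query) and all(
        q == "*" or q == p for p, q in zip(profile, query)
    )


def query_groups(profiles, query):
    """Return the sorted IDs of all groups whose profile matches the query."""
    return sorted(g for g, p in profiles.items() if matches(p, query))


# Groups present in human and mouse, absent from chicken, any state in zebrafish.
mammal_only = query_groups(PROFILES, "110*")
```

A string per group keeps the query a simple position-by-position comparison; a real database would more likely index these patterns.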
PhyloPat: Update to Ensembl v50
• 39 species, including model organisms such as C. elegans,
D. melanogaster, D. rerio, G. gallus, M. musculus, R.
norvegicus, C. familiaris, M. mulatta, and human
• In total 814,936 genes
• In total 244,114 orthologous groups, created by clustering the
orthologous gene pairs predicted by Ensembl
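The slide does not say which clustering method was used. One simple possibility, shown here purely as an assumption, is to treat every predicted orthologous pair as an edge and take the connected components of the resulting graph. The gene IDs below are made up.

```python
def cluster_pairs(pairs):
    """Group genes into connected components of the ortholog-pair graph."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in groups.values())


# Toy pairs: three genes chain into one group, two form another.
toy_pairs = [("hsa1", "mmu1"), ("mmu1", "rno1"), ("hsa2", "mmu2")]
toy_groups = cluster_pairs(toy_pairs)
```

Connected components merge anything linked by a chain of pairs, so a single spurious pair can fuse two groups; stricter clustering schemes exist and may well be what was used.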
PhyloPat: Support of HGNC and EntrezGene IDs
Choose from four types of IDs
• HGNC-Ensembl mapping for 29 species
• EntrezGene-Ensembl mapping for 18 species
PhyloPat: FASTA-format sequences
“L”: Longest peptide sequences from this orthologous group
(only the longest peptide per gene)
“A”: All peptide sequences from this orthologous group (all
peptides per gene)
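The difference between the "L" and "A" options can be sketched as follows. The record layout (gene ID, peptide ID, sequence) and the FASTA header format are assumptions for this example, not PhyloPat's actual output.

```python
def select_peptides(peptides, mode):
    """Select peptides from a list of (gene_id, peptide_id, sequence) tuples.

    mode "A" keeps all peptides; mode "L" keeps only the longest per gene.
    """
    if mode == "A":
        return list(peptides)
    if mode != "L":
        raise ValueError("mode must be 'L' or 'A'")
    longest = {}
    for gene, pep, seq in peptides:
        # Keep the first peptide seen for a gene unless a longer one appears.
        if gene not in longest or len(seq) > len(longest[gene][2]):
            longest[gene] = (gene, pep, seq)
    return list(longest.values())


def to_fasta(records):
    """Render records as FASTA text, one header line and one sequence line each."""
    return "".join(f">{pep} gene={gene}\n{seq}\n" for gene, pep, seq in records)


# Toy orthologous group with two peptides for gene G1 and one for G2.
toy_group = [("G1", "P1", "MKT"), ("G1", "P2", "MKTAYL"), ("G2", "P3", "MA")]
longest_fasta = to_fasta(select_peptides(toy_group, "L"))
```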
PhyloPat: Gene neighborhood view
• The ‘Gene neighborhood view’ shows all genes from all
species in a certain phylogenetic lineage, and all genes in their
proximity on the genome (10 genes to both sides)
• Neighbouring genes are color-coded according to the
orthologous groups they belong to
• Gene neighborhood gives information about functional
relationships (genes involved in similar processes are often
clustered together)
• Can be used to find the ‘true’ ortholog from a set of genes, by
using not only phylogenetic information but also genomic
context
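A minimal sketch of the "10 genes to both sides" window, assuming the genes of one chromosome are available as a list ordered by genomic position. The function name and the toy gene IDs are hypothetical.

```python
def neighborhood(ordered_genes, gene, flank=10):
    """Return the gene plus up to `flank` genes on each side, in genomic order."""
    i = ordered_genes.index(gene)
    start = max(0, i - flank)  # clip at the start of the chromosome
    return ordered_genes[start:i + flank + 1]


# Toy chromosome with 30 genes, g0 .. g29.
toy_chromosome = [f"g{i}" for i in range(30)]
window = neighborhood(toy_chromosome, "g15")
```

Comparing such windows between species, for example by counting how many neighbours fall into the same orthologous groups, is one way the genomic context could help to separate close paralogs.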
PhyloPat: Gene neighborhood view
ERN1 and ERN2 can be distinguished by looking at gene context
Each cell: - Ensembl Gene ID
- PhyloPat ID
- HGNC Symbol
BioVenn
• Web application to see the overlap between different lists of
biological identifiers, using area-proportional Venn diagrams
• Support of wide range of IDs, which are recognized and linked
to the corresponding database: Affymetrix, COG, Ensembl,
EntrezGene, Gene Ontology, InterPro, IPI, KEGG Pathway,
KOG, PhyloPat and RefSeq
• Optional mapping of Affymetrix and EntrezGene to Ensembl
• Output in SVG (with drag-and-drop functionality) or PNG
• http://www.cmbi.ru.nl/biovenn/
BioVenn
Absolute numbers / percentages
Embedded / standalone, SVG / PNG
ID mapping
BioVenn
• Lists for all 13 sets (X total, X only, XY total overlap, XY only
overlap, XYZ overlap, etc.)
• If type of ID (e.g. Affymetrix, Ensembl) is recognized, output is
linked to the corresponding database
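The 13 lists for three input sets X, Y and Z can be derived with plain set operations. The sketch below is an illustration in Python, not BioVenn's own code; the dictionary keys follow the naming used on the slide.

```python
def venn_regions(x, y, z):
    """Return the 13 ID lists for three input lists X, Y and Z as sets."""
    x, y, z = set(x), set(y), set(z)
    return {
        "X total": x,
        "Y total": y,
        "Z total": z,
        "X only": x - y - z,
        "Y only": y - x - z,
        "Z only": z - x - y,
        "XY total overlap": x & y,
        "XZ total overlap": x & z,
        "YZ total overlap": y & z,
        "XY only overlap": (x & y) - z,
        "XZ only overlap": (x & z) - y,
        "YZ only overlap": (y & z) - x,
        "XYZ overlap": x & y & z,
    }


# Toy ID lists.
regions = venn_regions(["a", "b", "c", "d"], ["b", "c", "e"], ["c", "d", "f"])
```

The seven "only" and "XYZ" regions are disjoint and together cover every ID, which is what an area-proportional diagram has to draw.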
Assessing orthologous biology in groups of genes:
Application to GC-induced insulin resistance
Biorange meeting 2008-03-11:
Goal: Gain better insight into the conservation of genes
involved in glucocorticoid-induced insulin resistance
(GC-induced IR) between human, mouse and rat.
Use CoPub to build literature networks, then map orthology
Validation needed
Network
[Figure: literature network with clusters labelled Insulin signaling; Adipocyte differentiation; Fatty acid oxidation/catabolism; Cytochrome P450s; Jak/Stat/IL6; Dexamethasone & insulin; Lipid transport; Misc: amino acid metabolism, MAPK signaling, osteoblast]
Validation approach
(1) Get all genes from a KEGG pathway
(2) Select random 10% of these genes
(3) Create Gene Network using these genes (CoPub)
(4) Compare with original KEGG pathway
Repeat with varying thresholds
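The validation loop can be sketched as follows. This is only an outline under stated assumptions: `build_network` is a hypothetical stand-in for the CoPub network construction (its real interface is not shown in the slides), the same random seed genes are reused for every threshold, and the toy scores are invented.

```python
import random


def validate_pathway(pathway_genes, build_network, thresholds, fraction=0.10, seed=0):
    """Seed a network with a random fraction of a pathway, once per threshold.

    build_network(seed_genes, threshold) is a caller-supplied function that
    returns the genes of the resulting network (hypothetical interface).
    """
    rng = random.Random(seed)
    genes = sorted(pathway_genes)
    n_seed = max(1, round(len(genes) * fraction))
    seed_genes = rng.sample(genes, n_seed)
    results = {}
    for threshold in thresholds:
        network = set(build_network(seed_genes, threshold))
        results[threshold] = {
            "seed_genes": seed_genes,
            "recovered": network & set(genes),  # pathway genes found again
            "extra": network - set(genes),      # genes outside the pathway
        }
    return results


# Toy stand-in: a gene joins the network if its made-up score passes the threshold.
toy_pathway = {f"P{i}" for i in range(20)}
toy_scores = {f"P{i}": i * 5 for i in range(20)}
toy_scores.update({"X1": 50, "X2": 10})


def toy_network(seed_genes, threshold):
    return set(seed_genes) | {g for g, s in toy_scores.items() if s >= threshold}


toy_results = validate_pathway(toy_pathway, toy_network, thresholds=[30, 60])
```

The "recovered" and "extra" sets are the raw material for the comparison with the original pathway in step (4).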
Results
Pathway                                       ID        # genes in pathway  FP_rate  TP_rate  Pos.Pred.Val  Percentage Manual Pos
Hematopoietic cell lineage                    hsa04640  88                  0.04     0.69     0.11          0.53
Jak-STAT signaling pathway                    hsa04630  153                 0.02     0.54     0.20          0.33
Cytokine-cytokine receptor interaction        hsa04060  256                 0.03     0.51     0.24          0.53
Toll-like receptor signaling pathway          hsa04620  90                  0.02     0.50     0.17          0.40
Metabolism of xenobiotics by cytochrome P450  hsa00980  70                  0.00     0.50     0.43          0.32
Melanoma                                      hsa05218  71                  0.02     0.49     0.11          0.20
Renal cell carcinoma                          hsa05211  69                  0.04     0.49     0.06          0.40
VEGF signaling pathway                        hsa04370  70                  0.04     0.49     0.06          0.20
GnRH signaling pathway                        hsa04912  97                  0.03     0.48     0.09          0.13
Endometrial cancer                            hsa05213  52                  0.02     0.48     0.07          0.47
Average TP = 0.24
Average FP = 0.01
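The slides do not define the rate columns. Assuming the standard definitions (TP rate = TP / (TP + FN), FP rate = FP / (FP + TN), positive predictive value = TP / (TP + FP)) and assuming that the negatives are all genes in some background set outside the pathway, the rates could be computed like this:

```python
def rates(predicted, pathway, universe):
    """Compute TP rate, FP rate and positive predictive value from gene sets.

    `universe` is the assumed background of all genes under consideration.
    """
    predicted, pathway, universe = set(predicted), set(pathway), set(universe)
    tp = len(predicted & pathway)
    fp = len(predicted - pathway)
    fn = len(pathway - predicted)
    tn = len(universe - pathway - predicted)
    return {
        "TP_rate": tp / (tp + fn) if tp + fn else 0.0,
        "FP_rate": fp / (fp + tn) if fp + tn else 0.0,
        "PPV": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy example: 100 background genes, a 10-gene pathway, 10 predicted genes.
toy_universe = {f"G{i}" for i in range(100)}
toy_pathway_genes = {f"G{i}" for i in range(10)}
toy_predicted = {f"G{i}" for i in range(5)} | {f"G{i}" for i in range(10, 15)}
toy_rates = rates(toy_predicted, toy_pathway_genes, toy_universe)
```

With a large background set, a low FP rate can coexist with a modest positive predictive value, because even a small fraction of the many negatives can outnumber the true positives.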
Application to all human genes
Create network for each gene with R scaled = 30, literature count = 5
6,181 networks with size > 2
Calculate average conservation for each network based on conservation for all the genes in the network in 4 species (P. tro., M. mus., R. nor., C. fam.)
Get all genes in 100 most conserved networks
Get all genes in 100 least conserved networks
211 genes
Calculate GO enrichment
309 genes
Calculate GO enrichment
Compare

Non-conserved
Term                                    PValue
GO:0051704~multi-organism process       1.23E-07
GO:0006952~defense response             1.48E-07
GO:0009615~response to virus            4.47E-06
GO:0051707~response to other organism   2.65E-05
GO:0007586~digestion                    7.07E-05
GO:0050896~response to stimulus         1.04E-04
GO:0009607~response to biotic stimulus  4.16E-04
GO:0006955~immune response              6.66E-04
GO:0030101~natural killer cell activation  0.003385226
GO:0007565~female pregnancy             0.003668999
Conserved
Term                                                  PValue
GO:0048856~anatomical structure development           5.80E-05
GO:0007399~nervous system development                 7.35E-05
GO:0005977~glycogen metabolic process                 9.18E-05
GO:0006073~glucan metabolic process                   1.02E-04
GO:0048731~system development                         1.12E-04
GO:0032502~developmental process                      1.16E-04
GO:0007275~multicellular organismal development       2.67E-04
GO:0044262~cellular carbohydrate metabolic process    2.90E-04
GO:0032501~multicellular organismal process           4.49E-04
GO:0006813~potassium ion transport                    5.58E-04
GO:0048513~organ development                          6.85E-04
GO:0044264~cellular polysaccharide metabolic process  7.09E-04
GO:0005976~polysaccharide metabolic process           7.94E-04
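The ranking step that produces the two gene sets can be sketched as below. The conservation measure is an assumption: here a gene's conservation is the fraction of the four species in which it has an ortholog, and a network's conservation is the mean over its genes. The slides do not state the exact measure, and all data in the example are made up.

```python
def average_conservation(network_genes, orthologs, species):
    """Mean, over the genes, of the fraction of species that have an ortholog."""
    fractions = [
        sum(1 for s in species if s in orthologs.get(gene, ())) / len(species)
        for gene in network_genes
    ]
    return sum(fractions) / len(fractions)


def extreme_gene_sets(networks, orthologs, species, top=100, min_size=3):
    """Return (genes in the `top` most conserved, genes in the `top` least conserved)."""
    ranked = sorted(
        (name for name, genes in networks.items() if len(genes) >= min_size),
        key=lambda name: average_conservation(networks[name], orthologs, species),
        reverse=True,
    )
    most = set().union(*(networks[name] for name in ranked[:top]))
    least = set().union(*(networks[name] for name in ranked[-top:]))
    return most, least


# Toy data: n3 is dropped because it has fewer than three genes.
toy_species = ["P.tro.", "M.mus.", "R.nor.", "C.fam."]
toy_networks = {"n1": ["a", "b", "c"], "n2": ["d", "e", "f"], "n3": ["g", "h"]}
toy_orthologs = {g: set(toy_species) for g in "abc"}
toy_most, toy_least = extreme_gene_sets(toy_networks, toy_orthologs, toy_species, top=1)
```

Each of the two resulting gene sets would then go into a GO enrichment analysis.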
OrthoPath
• OrthoPath is a gene centric search tool for literature networks and
their orthologs
• Three input methods:
• Single gene search: Get the literature network for a given gene.
OrthoPath will create a network of genes that are connected to
this single gene.
• Keyword Search:
Get the literature network based on a certain keyword. OrthoPath
looks for genes that are connected to the keyword, and creates a
network from all these genes.
• Multi Gene Search:
Get the literature network for a set of genes. OrthoPath creates a
network from only these genes that are entered by the user.
• http://ws2.grid.sara.nl/cgi-bin/orthopath/op.pl
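The three input methods differ only in how the set of edges is chosen. The sketch below uses a tiny in-memory table of co-citation links; the links, the keyword table and the function names are invented for illustration and do not reflect OrthoPath's implementation or real literature data.

```python
# Made-up co-citation links between genes and a made-up keyword index.
GENE_LINKS = [("INS", "IRS1"), ("IRS1", "PIK3R1"), ("INS", "LEP"), ("LEP", "STAT3")]
KEYWORD_LINKS = {"insulin resistance": {"INS", "IRS1"}}


def single_gene_search(gene):
    """Network of the genes directly connected to one query gene."""
    return [edge for edge in GENE_LINKS if gene in edge]


def multi_gene_search(genes):
    """Network restricted to links between the genes supplied by the user."""
    genes = set(genes)
    return [(a, b) for a, b in GENE_LINKS if a in genes and b in genes]


def keyword_search(keyword):
    """Network built from all genes that are connected to a keyword."""
    return multi_gene_search(KEYWORD_LINKS.get(keyword, set()))
```

For example, `single_gene_search("INS")` returns the two links that touch INS, while `multi_gene_search(["INS", "IRS1", "PIK3R1"])` keeps only links whose two ends are both in the list.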
OrthoPath
• Search with a single gene
• Search with a keyword
• Search with a list of genes
• Set the minimum strength of a co-citation between two keywords
• Set the minimum number of abstracts in which a co-citation between the 2 genes is found
• Output in HTML, SVG, Cytoscape or Ingenuity format
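The two thresholds act as a filter on the co-citation links. A minimal sketch, assuming each link is a record with a strength score and an abstract count (field names and values are hypothetical):

```python
def filter_cocitations(cocitations, min_strength, min_abstracts):
    """Keep links that reach both the minimum strength and the minimum abstract count."""
    return [
        link for link in cocitations
        if link["strength"] >= min_strength and link["abstracts"] >= min_abstracts
    ]


# Toy links with invented scores.
toy_links = [
    {"pair": ("INS", "IRS1"), "strength": 45, "abstracts": 12},
    {"pair": ("INS", "LEP"), "strength": 28, "abstracts": 40},
    {"pair": ("LEP", "STAT3"), "strength": 35, "abstracts": 3},
]
kept = filter_cocitations(toy_links, min_strength=30, min_abstracts=5)
```

Raising either threshold gives a smaller, more stringent network; lowering them pulls in weaker literature associations.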
OrthoPath
Each node in the network:
- EntrezGene information: ID, symbol, description
- number of neighbours
- number of orthologs from human (for all five species)
CoPub Taverna Workflows
• Taverna: free software tool for designing and executing
workflows
• Workflow files for CoPub have been developed:
(1) Search gene
(2) Get literature neighbours
(3) Get a list of categories
(4) Get the complete network
Acknowledgements
• Wynand Alkema
• Wilco Fleuren
• Raoul Frijters
• Peter Groenen