Introduction to the Pathway Tools Software and BioCyc Database Collection

MetaCyc Family of Pathway/Genome Databases
SRI International
Bioinformatics
• 2,500+ databases from multiple institutions
  - Cover all domains of life with microbial emphasis
• All DBs derived from MetaCyc via computational pathway prediction
• Common schema
• Common controlled vocabularies
• Common methodologies

Curated Databases Within the MetaCyc Family

Database   Organism        Organization       Curated From (publications)
MetaCyc    Multiorganism   SRI                34,000
EcoCyc     E. coli         SRI                23,000
HumanCyc   H. sapiens      SRI
AraCyc     A. thaliana     Carnegie Instit.   2,282
YeastCyc   S. cerevisiae   Stanford Univ      565
MouseCyc   M. musculus     Jackson Labs

BioCyc Collection of 1,700 Pathway/Genome Databases
• Pathway/Genome Database (PGDB): combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors/sites, promoters, operons
• Tier 1: Literature-Derived PGDBs
  - MetaCyc, HumanCyc, YeastCyc
  - EcoCyc -- Escherichia coli K-12
  - AraCyc -- Arabidopsis thaliana
• Tier 2: Computationally-derived DBs, Some Curation -- 34 PGDBs
  - Bacillus subtilis, Mycobacterium tuberculosis
• Tier 3: Computationally-derived DBs, No Curation -- The remainder

Pathway/Genome Database
[Diagram: object types in a PGDB, drawn around a CELL: Pathways, Reactions, Proteins, RNAs, Genes, Compounds, Sequence Features, Operons, Promoters, DNA Binding Sites, Regulatory Interactions, Chromosomes, Plasmids]

Pathway Tools Software: PGDBs Created Outside SRI
• 3,000+ licensees: 250+ groups applying software to 1,700 organisms
• Saccharomyces cerevisiae, SGD project, Stanford University
  - 135 pathways / 565 publications -- BioCyc.org
• FungiCyc, Broad Institute
• Candida albicans, CGD project, Stanford University
• dictyBase, Northwestern University
• Mouse, MGD, Jackson Laboratory -- BioCyc.org
• Drosophila, FlyBase, Harvard University -- BioCyc.org
• Under development:
  - C. elegans, WormBase
• Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
  - 288 pathways / 2282 publications -- BioCyc.org
• ChlamyCyc, GoFORSYS
• PlantCyc, Carnegie Institution of Washington
• Six Solanaceae species, Cornell University
• GrameneDB, Cold Spring Harbor Laboratory
• Medicago truncatula, Samuel Roberts Noble Foundation

Pathway Tools Software: PGDBs Created Outside SRI
• G. Serres, MBL, Shewanella oneidensis
• M. Bibb, John Innes Centre, Streptomyces coelicolor
• TBDB Project, Mycobacterium tuberculosis
• F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
• Genoscope, Acinetobacter
• R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
• Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
• Sergio Encarnacion, UNAM, Sinorhizobium meliloti
• Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis

Pathway Tools Software: PGDBs Created Outside SRI
• Large scale users:
  - C. Medigue, Genoscope, 500+ PGDBs
  - J. Zucker, Broad Inst, 94 PGDBs
  - G. Sutton, J. Craig Venter Institute, 80+ PGDBs
  - G. Burger, U Montreal, 60+ PGDBs
  - E. Uberbacher, ORNL, 33 Bioenergy-related organisms
  - Bart Weimer, UC Davis, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
• Partial listing of outside PGDBs at http://biocyc.org/otherpgdbs.shtml

Pathway Tools Software
• Comprehensive software environment spanning computational genomics and systems biology
  - Create and maintain an organism database integrating genome, pathway, regulatory information
    - Computational inference tools
    - Interactive editing tools
  - Query and visualize that database
  - Interpret genome-scale datasets
  - Comparative analysis tools
  - Generate flux-balance models

Pathway Tools Software
[Diagram of Pathway Tools components: Annotated Genome, PathoLogic, Pathway/Genome Database, Pathway/Genome Editors, Pathway/Genome Navigator, Genome-Scale Flux Model]
Briefings in Bioinformatics 11:40-79 2010

Pathway Tools Software: PathoLogic
• Computational creation of new Pathway/Genome Databases
• Transforms genome into Pathway Tools schema and layers inferred information above the genome
• Predicts operons
• Predicts metabolic network
• Predicts which genes code for missing enzymes in metabolic pathways
• Infers transport reactions from transporter names
Bioinformatics 18:S225 2002

Pathway Tools Software: Pathway/Genome Editors
• Interactively update PGDBs with graphical editors
• Support geographically distributed teams of curators with object database system
• Gene editor
• Protein editor
• Reaction editor
• Compound editor
• Pathway editor
• Operon editor
• Publication editor

What is Curation?
• Ongoing updating and refinement of a PGDB
• Correcting false-positive and false-negative predictions
• Incorporating information from experimental literature
• Authoring of comments and citations
• Updating database fields
• Gene positions, names, synonyms
• Protein functions, activators, inhibitors
• Addition of new pathways, modification of existing pathways
• Defining TF binding sites, promoters, regulation of transcription initiation and other processes

Pathway Tools Software: Pathway/Genome Navigator
• Querying and visualization of:
  - Pathways
  - Reactions
  - Metabolites
  - Proteins
  - Genes
  - Chromosomes
• Two modes of operation:
  - Web mode
  - Desktop mode
  - Most functionality shared, but each has unique functionality

Pathway Tools Ontology / Schema
• Ontology classes: 1621
  - Datatype classes: Define objects from genomes to pathways
  - Classification systems for pathways, chemical compounds, enzymatic reactions (EC system)
  - Protein Feature ontology
  - Controlled vocabularies:
    - Cell Component Ontology
    - Evidence codes
• Comprehensive set of 248 attributes and relationships

What is a Pathway?
• A connected sequence of biochemical reactions
  - Occurs in one organism
  - Conserved through evolution
  - Regulated as a unit
  - Starts or stops at one of 13 common intermediate metabolites

Comparison of BioCyc to KEGG
• KEGG approach: A static collection of reference pathway diagrams is color-coded to produce organism-specific views
• KEGG vs MetaCyc: Resource on literature-derived pathways
  - KEGG maps are not pathways (Nuc Acids Res 34:3687 2006)
  - KEGG maps contain multiple biological pathways
  - KEGG maps are composites of pathways in many organisms; they do not identify which specific pathways were elucidated in which organisms
  - KEGG has no literature citations, no comments, less enzyme detail
• KEGG vs BioCyc organism-specific PGDBs
  - KEGG does not curate or customize pathway networks for each organism
  - Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis
  - KEGG re-annotates entire genome for each organism

Comparison of Pathway Tools to KEGG
• Inference tools
  - KEGG does not predict presence or absence of pathways
  - KEGG lacks pathway hole filler, operon predictor
• Curation tools
  - KEGG does not distribute curation tools
  - No ability to customize pathways to the organism
  - Pathway Tools schema much more comprehensive
• Visualization and analysis
  - KEGG does not perform automatic pathway layout
  - No comparative pathway analysis

Pathway Tools Implementation Details
• Allegro Common Lisp
  - PC/Windows, Linux, Macintosh platforms
• Ocelot object database
• 600,000+ lines of code
• Lisp-based WWW server at BioCyc.org
  - Manages 1,100+ PGDBs

EcoCyc iPhone App
• Available in iTunes store
  - Free
• Look up gene information while traveling, at a conference, in the library

Automated Generation of Metabolic Flux Models from PGDBs
Joint work with Mario Latendresse
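
Flux-Balance Analysis
• Steady state, constraint-based quantitative models of metabolism
• Starting information for organism of interest:
  - Metabolic Reaction List
  - Nutrients
  - Secretions
  - Biomass
[Diagram: a metabolic reaction list over metabolites A, B, C, X, and D, with Nutrients, Secretions, and Biomass marked]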
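
To make the steady-state assumption concrete, here is a minimal sketch that builds a stoichiometric matrix from a reaction list and checks whether a candidate set of fluxes leaves every metabolite balanced. The network, the reaction names, and the flux values are hypothetical toy data chosen for illustration; they are not taken from any PGDB.

```python
# Hypothetical toy network (assumption: names and stoichiometry are illustrative only).
# Each reaction maps metabolite -> coefficient (negative = consumed, positive = produced).
REACTIONS = {
    "uptake_A":  {"A": 1},            # nutrient A enters the cell
    "r1":        {"A": -1, "B": 1},   # A -> B
    "r2":        {"B": -1, "C": 1},   # B -> C
    "r3":        {"B": -1, "D": 1},   # B -> D
    "secrete_C": {"C": -1},           # C is secreted
    "biomass":   {"D": -1},           # D is drained into biomass
}


def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_ids, S), where S[i][j] is the
    coefficient of metabolite i in reaction j."""
    mets = sorted({m for coeffs in reactions.values() for m in coeffs})
    rxns = list(reactions)
    S = [[reactions[r].get(m, 0) for r in rxns] for m in mets]
    return mets, rxns, S


def net_production(reactions, fluxes):
    """Compute S * v: the net production rate of each metabolite."""
    mets, rxns, S = stoichiometric_matrix(reactions)
    return {
        m: sum(S[i][j] * fluxes[r] for j, r in enumerate(rxns))
        for i, m in enumerate(mets)
    }


def is_steady_state(reactions, fluxes, tol=1e-9):
    """True when no metabolite accumulates or is depleted (S * v = 0)."""
    return all(abs(rate) <= tol for rate in net_production(reactions, fluxes).values())


balanced = {"uptake_A": 10, "r1": 10, "r2": 4, "r3": 6, "secrete_C": 4, "biomass": 6}
unbalanced = dict(balanced, r3=5)  # B now accumulates and D is over-drained
```

With these toy values, `is_steady_state(REACTIONS, balanced)` holds, while the `unbalanced` fluxes leave a surplus of B and a deficit of D.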

Flux Balance Models
• Submit to linear optimization package
  - Optimize biomass production, ATP production, etc.
• Results
  - Steady-state reaction fluxes for the metabolic network
• Remove reactions from the model to predict knock-out phenotypes
• Supply alternative nutrient sets to predict growth phenotypes
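
The optimization step is conventionally written as a linear program. The formulation below is the standard textbook form of flux-balance analysis, not necessarily the exact problem Pathway Tools hands to its solver; the symbols S, v, c, l, and u are introduced here for illustration.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j.
\end{aligned}
```

Here S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (for example, a 1 on the biomass reaction), and l and u are lower and upper flux bounds. In this notation a knock-out corresponds to setting l_j = u_j = 0 for the removed reaction, and an alternative nutrient set corresponds to changing the bounds on the uptake reactions.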

Approach: Derive FBA Models from PGDBs
• Store and update metabolic model within Pathway Tools
  - The PGDB is the model
  - All query and visualization tools applicable to FBA model
  - FBA model is tightly coupled to genome and regulatory information
• Export to constraint solver for model execution/solving
• Reaction balance checking
• Dead-end metabolite analysis
• Visualize reaction flux using cellular overview
• Multiple gap filling
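
As an illustration of dead-end metabolite analysis, the sketch below uses a simplified working definition: a metabolite is a dead end if the network only produces it or only consumes it, and it is not listed as a nutrient, secretion, or biomass component. This definition, the function name, and the toy network are assumptions made for the example; the analysis in Pathway Tools may apply additional criteria.

```python
def dead_end_metabolites(reactions, boundary=frozenset()):
    """Find metabolites that are only produced or only consumed.

    reactions: dict of id -> (substrates, products, reversible)
    boundary:  metabolites allowed to enter or leave the system
               (nutrients, secretions, biomass components)
    Returns (only_produced, only_consumed) as sorted lists.
    """
    produced, consumed = set(), set()
    for substrates, products, reversible in reactions.values():
        consumed.update(substrates)
        produced.update(products)
        if reversible:
            # A reversible reaction can run either way, so both sides
            # count as producible and consumable.
            consumed.update(products)
            produced.update(substrates)
    only_produced = produced - consumed - set(boundary)
    only_consumed = consumed - produced - set(boundary)
    return sorted(only_produced), sorted(only_consumed)


# Hypothetical toy network: X is made from B but nothing uses it.
TOY = {
    "r1": ({"A"}, {"B"}, False),
    "r2": ({"B"}, {"C"}, False),
    "r3": ({"B"}, {"X"}, False),
    "r4": ({"C"}, {"D"}, True),
}

with_boundary = dead_end_metabolites(TOY, boundary={"A", "D"})
without_boundary = dead_end_metabolites(TOY)
```

Declaring A (a nutrient) and D (a biomass component) as boundary metabolites leaves X as the only dead end; without that declaration A is also reported, because nothing in the network produces it.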

Multiple Gap Filling of FBA Models
• Reaction gap filling (Kumar et al, BMC Bioinf 2007 8:212):
  - Reverse directionality of selected reactions
  - Add a minimal number of reactions from MetaCyc to the model to enable a solution
  - Reaction cost is a function of reaction taxonomic range
• Metabolite gap filling: Postulate additional nutrients and secretions
• Partial solutions: Identify maximal subset of biomass components for which model can yield positive production rates
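
The idea behind reaction gap filling can be sketched on a toy scale. The code below is a deliberately simplified stand-in: it replaces the optimization-based formulation with a brute-force search over candidate reactions and tests feasibility by simple reachability from the nutrients rather than by solving a flux model. It is not the algorithm of Kumar et al. or of Pathway Tools; the reaction names and the hand-assigned costs (standing in for a taxonomic-range penalty) are hypothetical.

```python
from itertools import combinations


def reachable(nutrients, reactions):
    """Network expansion: fire any reaction whose substrates are all
    available until no new metabolite appears."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def gap_fill(nutrients, targets, model, candidates):
    """Return (cost, names) for the cheapest set of candidate reactions
    that makes every target reachable, or None if no set works.

    model:      list of (substrates, products) already in the network
    candidates: dict of name -> (substrates, products, cost)
    Exhaustive search, so only suitable for very small candidate pools.
    """
    best = None
    names = sorted(candidates)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            added = [candidates[n][:2] for n in subset]
            if targets <= reachable(nutrients, model + added):
                cost = sum(candidates[n][2] for n in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best


# Hypothetical toy data: the model can make B from A and D from C,
# but has no route from B to C.
MODEL = [({"A"}, {"B"}), ({"C"}, {"D"})]
CANDIDATES = {
    "cand1": ({"B"}, {"C"}, 1.0),  # closes the gap directly
    "cand2": ({"B"}, {"X"}, 1.0),
    "cand3": ({"X"}, {"C"}, 1.0),
    "cand4": ({"A"}, {"D"}, 5.0),  # works, but assumed taxonomically distant
}

solution = gap_fill({"A"}, {"D"}, MODEL, CANDIDATES)
```

In this toy case the search selects `cand1` alone, since it restores production of D at the lowest total cost; `cand4` also works but is penalized, and the `cand2` plus `cand3` detour costs more.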

Downloading Pathway Tools
• Obtain license
  - http://biocyc.org/download.shtml
• Download directory offers several configurations
• Choose platform and database configuration
  - Many combinations of databases available
  - The all-databases configuration requires a lot of memory
  - Use registry to add PGDBs to configuration you downloaded

Information Sources
• Pathway Tools User’s Guide
  - aic-export/pathway-tools/ptools/14.0/doc/manuals/userguide.pdf
  - NOTE: Location of the aic-export directory can vary across different computers
• Pathway Tools Web Site
  - http://bioinformatics.ai.sri.com/ptools/
  - Publications, FAQ, programming examples, etc.
• Slides from this tutorial
  - http://bioinformatics.ai.sri.com/ptools/tutorial/sessions/
• BioCyc Webinars
  - http://biocyc.org/webinar.shtml
• Desktop vs Web functionality in Pathway Tools
  - http://biocyc.org/desktop-vs-web-mode.shtml

Information Sources
• Publications
  - “Pathway Tools version 13.0: Integrated Software for Pathway/Genome Informatics and Systems Biology”, Briefings in Bioinformatics 11:40-79 2010
  - “A survey of metabolic databases emphasizing the MetaCyc family”, Archives of Toxicology 2011

Information Sources
• BioCyc Web site: Help Menu
  - Basic Help
  - Search Help
  - BioCyc Glossary
  - Publications
  - Website User Guide
  - PGDB Concepts
  - Guide to EcoCyc
  - Guide to MetaCyc