Methods and resources for
pathway analysis
PABIO590B
Week 2
Pathways overview
• Introduction to pathways and networks
• Examples of pathways and networks
• Review of pathway databases and tools
• Representing pathways and networks
• Methods of inferring pathways and networks
• Pathway and cellular simulations
Pathways vs. networks
Gene networks
• Clusters of genes (or gene products) with evidence of co-expression
• Connections usually represent degrees of co-expression
• In-depth knowledge of process is not necessary
• Networks are non-predictive
Biochemical pathways
• Series of chained, chemical reactions
• Connections represent describable (and quantifiable) relations
between molecules, proteins, lipids, etc.
• Enzymatic process is elucidated
• Changes via perturbation are predictable downstream
Pathways vs. networks
• Curation
– Gene networks: relatively easy (automated and manual)
– Biochemical pathways: difficult (mostly manual)
• Nodes
– Gene networks: genes or gene products
– Biochemical pathways: any general molecule
• Edges
– Gene networks: levels of co-expression/influence, or a qualitative relation
– Biochemical pathways: representation of possibly quantifiable mechanisms between compounds
• Fidelity
– Gene networks: low (usually very little detail)
– Biochemical pathways: high (specific processes)
• Predictive power
– Gene networks: relatively low
– Biochemical pathways: relatively high
Pathway and network granularity
[Figure axes: effort to curate, level of detail]
Yeast gene interaction network
Tong, et al., Science 303, 808 (2004)
Characteristics of the yeast gene
network
• Some genes (e.g. regulatory factors) act as ‘hubs’ in the network and have many interactions
– Degree of connectivity follows a power law
– Hubs may make interesting anti-cancer targets
• Clusters of genes with known function suggest a function for hypothetical genes in the same cluster
• Network characteristics can be used to predict protein-protein interactions
• The path between two genes tends to be short (average ~3.3 hops)
Tong, et al., Science 303, 808 (2004)
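The characteristics above (hubs, degree distribution, short paths) can be computed directly from an edge list. The sketch below is a minimal illustration on a made-up toy network; the gene names and edges are hypothetical placeholders, not data from Tong et al., and a real analysis would more likely use a graph library.

```python
from collections import Counter, deque

# Hypothetical toy network: names and edges are placeholders, not real yeast data.
edges = [
    ("HUB1", "G1"), ("HUB1", "G2"), ("HUB1", "G3"), ("HUB1", "G4"),
    ("G4", "G5"), ("G5", "G6"),
]

def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def degree_distribution(adjacency):
    """Map each degree to the number of nodes that have it."""
    return Counter(len(neighbours) for neighbours in adjacency.values())

def hops_from(adjacency, source):
    """Breadth-first search: shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_path_length(adjacency):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        for target, hops in hops_from(adjacency, source).items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0

adjacency = build_adjacency(edges)
hub = max(adjacency, key=lambda node: len(adjacency[node]))  # most connected node
distribution = degree_distribution(adjacency)
mean_hops = average_path_length(adjacency)
```

On a real interaction network, plotting `distribution` on log-log axes is one common way to check whether connectivity looks roughly power-law.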
E. coli metabolic pathway
glycolysis
Karp, et al., Science 293, 2040 (2001)
Pathways: E. coli metabolic map
• Encompasses >791 chemical compounds in
>744 noted biochemical reactions
• Pathway was compiled via literature information
extraction and extensive manual curation
– System allows for users to indicate evidence of
pathway annotations
– Curation is done collaboratively with numerous
experts outside of EcoCyc
Karp, et al., Science 293, 2040 (2001)
Pathways in bioinformatics
• Most pathway resources focus on metabolic pathways (signaling and regulatory pathways are gaining prominence)
• Pathways are a very specific subtype of networks
– Like networks, they can be expressed in computable (symbolic) form
– The specificity of chemical reactions makes them more predictive
– Pathways can chain together, forming larger pathways
Karp, et al., Science 293, 2040 (2001)
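To make "computable (symbolic) form" and pathway chaining concrete, here is a small sketch. The `Reaction` class and helper functions are hypothetical names invented for illustration (they are not the EcoCyc data model), and the glycolysis steps are deliberately simplified to one substrate and one product each, with cofactors omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reaction:
    """One simplified reaction step: a single substrate converted to a single product."""
    enzyme: str
    substrate: str
    product: str

def is_chained(pathway):
    """True if each reaction's product is the next reaction's substrate."""
    return all(a.product == b.substrate for a, b in zip(pathway, pathway[1:]))

def join_pathways(first, second):
    """Concatenate two pathways when the end of the first feeds the start of the second."""
    if first and second and first[-1].product != second[0].substrate:
        raise ValueError("pathways do not share a connecting compound")
    return first + second

def downstream_of(pathway, compound):
    """Products formed from the step that consumes `compound` onward."""
    affected, reached = [], False
    for reaction in pathway:
        if reaction.substrate == compound:
            reached = True
        if reached:
            affected.append(reaction.product)
    return affected

# First steps of glycolysis, simplified (ATP/ADP and other cofactors left out).
upper = [
    Reaction("hexokinase", "glucose", "glucose-6-phosphate"),
    Reaction("phosphoglucose isomerase", "glucose-6-phosphate", "fructose-6-phosphate"),
]
next_step = [
    Reaction("phosphofructokinase", "fructose-6-phosphate", "fructose-1,6-bisphosphate"),
]

combined = join_pathways(upper, next_step)
affected = downstream_of(combined, "glucose-6-phosphate")
```

Because every edge names a specific conversion, a perturbation at one compound maps onto a definite set of downstream products, which is the sense in which pathways are more predictive than co-expression networks.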
Pathway repositories
• BioCyc/MetaCyc
• Kyoto Encyclopedia of Genes and
Genomes (KEGG) PATHWAY DB
• BioCarta
• BioModels database
BioCyc database
http://www.biocyc.org
• Pathway/genome database (PGDB) for
organisms with completely sequenced genomes
• 409 full genomes and pathways deposited
• Species-specific pathways are inferred from MetaCyc
• Query/navigation/pathway creation support
through the Pathway Tools software suite
MetaCyc database
http://www.metacyc.org
• Non-redundant reference database for
metabolic pathways, reactions, enzymes
and compounds
• Curation through experimental verification
and manual literature review
• >1200 pathways from 1600+ species
(mostly plants and microorganisms)
Glycolysis pathway in MetaCyc
http://www.metacyc.org
KEGG PATHWAY database
http://www.kegg.com
• Consolidated set of databases that cover
genomics (GENE), chemical compounds
(LIGAND) and reaction networks
(PATHWAY)
• Broad focus on metabolism, signal transduction, disease, etc.
• Species-specific views available (but
networks are static across all organisms)
Glycolysis pathway in KEGG
http://www.kegg.com
Global Pathway Map
BioCarta database
http://www.biocarta.com
• Corporate-owned, publicly-curated
pathway database
• Series of interactive, “cartoon” pathway
maps
• Predominantly human and mouse
pathways
• Contains 120,000 gene entries and 355
pathways
Glycolysis pathway in BioCarta
http://www.biocarta.com
BioModels database
http://www.biomodels.net
• Database for published, quantitative
models of biochemical processes
• All models/pathways curated manually,
compliant with MIRIAM
• Models can be output in SBML format for
quantitative modeling
• 86 curated models, 40 models pending
curation
Glycolysis pathways in BioModels
http://www.biomodels.net
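Comparison of pathway databases
• Curation
– MetaCyc/BioCyc: manual and automated
– KEGG PATHWAY: automated
– BioCarta: manual
– BioModels: manual
• Size
– MetaCyc/BioCyc: ~621+ pathways
– KEGG PATHWAY: ~289 reference pathways
– BioCarta: ~355 pathways
– BioModels: ~126 models
• Nomenclature
– MetaCyc/BioCyc: EC, GO
– KEGG PATHWAY: EC, KO
– BioCarta: none
– BioModels: GO
• Organism coverage
– MetaCyc/BioCyc: ~500 species
– KEGG PATHWAY: ~475 species
– BioCarta: primarily human and mouse
– BioModels: various
• Visuals
– MetaCyc/BioCyc: species-specific, custom
– KEGG PATHWAY: reference and species-specific
– BioCarta: animated, cartoonish
– BioModels: non-standardized
• Primary usage
– MetaCyc/BioCyc: PGDB, computational biology
– KEGG PATHWAY: PGDB, pathway comparisons
– BioCarta: human pathways, disease
– BioModels: simulations, modeling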
Pathway formats
• Extensible Markup Language (XML)
• Systems Biology Markup Language (SBML)
• BioPAX
Extensible Markup Language (XML)
• Standard of representing information in a
machine-readable way
• Similar to HTML; tags can enclose or contain data
<myXMLData>
  <someTag>Some data here</someTag>
  <anotherTag>More stuff here</anotherTag>
  <attributeTag data="embedded in tag" />
</myXMLData>
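As a quick illustration of "machine-readable", the snippet above can be parsed with Python's standard-library `xml.etree.ElementTree`. This is only a sketch: the XML is retyped here with straight quotes (which XML requires for attribute values), and the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# The example document from the slide, with straight quotes around the attribute value.
xml_text = """<myXMLData>
  <someTag>Some data here</someTag>
  <anotherTag>More stuff here</anotherTag>
  <attributeTag data="embedded in tag" />
</myXMLData>"""

root = ET.fromstring(xml_text)

# Data enclosed between an opening and a closing tag is exposed as .text.
enclosed = {child.tag: child.text for child in root if child.text}

# Data contained inside the tag itself is exposed as an attribute.
attribute_value = root.find("attributeTag").get("data")
```

The two access styles (`.text` versus `.get(...)`) correspond to the slide's distinction between tags that enclose data and tags that contain it.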
Systems Biology Markup Language (SBML)
• XML-based language for representing biochemical reactions
• Oriented towards data sharing between software tools
• Tiered, upward-compatible architecture (two levels defined, a third planned)
• Primary intended use is quantitative model simulation
SBML
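For orientation, the sketch below shows roughly what a very small SBML Level 2 document looks like and how its species and reactions could be read with the standard library. It is an abridged, hand-written example that has not been validated against the SBML schema; the model, species, and reaction identifiers are invented, and the kinetic law that a quantitative model would need is omitted.

```python
import xml.etree.ElementTree as ET

# Namespace used by SBML Level 2 Version 1 documents.
SBML_NS = "http://www.sbml.org/sbml/level2"

# Hypothetical one-reaction model; identifiers are placeholders.
sbml_text = f"""<sbml xmlns="{SBML_NS}" level="2" version="1">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cytosol" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cytosol" initialConcentration="5"/>
      <species id="g6p" compartment="cytosol" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hexokinase_step" reversible="false">
        <listOfReactants><speciesReference species="glucose"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

ns = {"sbml": SBML_NS}
root = ET.fromstring(sbml_text)

# Species id -> initial concentration.
species = {
    s.get("id"): float(s.get("initialConcentration"))
    for s in root.findall(".//sbml:species", ns)
}

# Reaction id -> (reactant ids, product ids).
reactions = {}
for reaction in root.findall(".//sbml:reaction", ns):
    reactants = [r.get("species") for r in
                 reaction.findall("sbml:listOfReactants/sbml:speciesReference", ns)]
    products = [p.get("species") for p in
                reaction.findall("sbml:listOfProducts/sbml:speciesReference", ns)]
    reactions[reaction.get("id")] = (reactants, products)
```

In practice a dedicated SBML library would normally be used instead of raw XML parsing; plain `ElementTree` is used here only to keep the example self-contained.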
BioPAX
• Like SBML, an XML-based pathway representation
• Tiered structure
– Level 1: Metabolic pathway information
– Level 2: Level 1 + molecular interactions and post-translational modifications
• Intended to be a lingua franca for pathway databases
BioPAX XML representation
Inferring pathways and networks
• Experimental methods
– Microarray co-expression
– Quantitative trait locus (QTL) mapping
– Isotope-coded affinity tagging (ICAT)
– Yeast two-hybrid assay
– Green fluorescent protein (GFP) tagging
• Computational methods
– Database-driven protein-protein interactions
– Expression clustering techniques
– Literature-mining for specified interactions
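One simple way to turn expression data into a gene network is to connect genes whose expression profiles are strongly correlated. The sketch below assumes Pearson correlation and an arbitrary cutoff of 0.9; the gene names and expression values are invented, and real pipelines typically add normalization and significance testing.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression profiles: one value per condition, made up for illustration.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| at or above the threshold."""
    edges = []
    for gene1, gene2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene1], profiles[gene2])
        if abs(r) >= threshold:
            edges.append((gene1, gene2, round(r, 3)))
    return edges

network = coexpression_edges(expression)
```

Using the absolute value keeps strongly anti-correlated pairs as edges too; whether that is appropriate depends on the question being asked. Note that the resulting edges say nothing about mechanism, which is why such networks are less predictive than curated pathways.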
Cellular simulations
• Study the effect perturbation has on a pathway
(and thus the organism)
• Generally require extensive detail on the
pathway or reactions of interest (flux equations,
metabolite concentration, etc.)
• Cellular pathway simulations must manage both
temporal and spatial complexity
[Figure: temporal intervals (picosec., nanosec., microsec., millisec., sec., min., yr.) and spatial dimension (0.1 nm, 10 nm, 1 µm, 1 mm, 1 cm, 1 m)]
Adapted from Kelly, H., http://www.fas.org/resource/05242004121456.pdf , via Neal, Yngve 2006 VHS, UW MEBI 591
Simulation methods and techniques
• Metabolism
– Phenomena: enzymatic reaction
– Computation scheme: differential-algebraic equations, flux-based analysis
• Signal transduction
– Phenomena: binding
– Computation scheme: differential-algebraic equations, stochastic algorithms, diffusion-reaction
• Gene expression
– Phenomena: binding, polymerization, degradation
– Computation scheme: object-oriented modeling, differential-algebraic equations, stochastic algorithms, boolean networks
• DNA replication
– Phenomena: binding, polymerization
– Computation scheme: object-oriented modeling, differential-algebraic equations
• Membrane transport
– Phenomena: osmotic pressure, membrane potential
– Computation scheme: differential-algebraic equations, electrophysiology
Adapted from Tomita 2001
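As a minimal illustration of the equation-based approach to an enzymatic reaction, the sketch below integrates a single Michaelis-Menten step with a forward-Euler scheme and compares a baseline run against a perturbed one. This is an assumption-laden toy: it uses a plain ordinary differential equation (the simplest case of the schemes listed), all parameter values are arbitrary, and production simulators use far more robust solvers.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Reaction velocity v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)

def simulate(s0, vmax, km, dt=0.01, steps=1000):
    """Forward-Euler integration of dS/dt = -v and dP/dt = +v.

    Returns a list of (time, substrate, product) tuples.
    """
    substrate, product = s0, 0.0
    trajectory = [(0.0, substrate, product)]
    for i in range(1, steps + 1):
        v = michaelis_menten_rate(substrate, vmax, km)
        converted = min(v * dt, substrate)  # never consume more than remains
        substrate -= converted
        product += converted
        trajectory.append((i * dt, substrate, product))
    return trajectory

# Arbitrary illustrative parameters (not taken from any real enzyme).
baseline = simulate(s0=10.0, vmax=1.0, km=2.0)

# Perturbation: halve Vmax, e.g. to mimic a partially inhibited enzyme.
perturbed = simulate(s0=10.0, vmax=0.5, km=2.0)

baseline_product = baseline[-1][2]
perturbed_product = perturbed[-1][2]
```

Comparing `baseline_product` with `perturbed_product` shows how a change at one enzyme propagates to the amount of product formed over the same time window, which is the kind of question cellular simulations are meant to answer.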
Research in simulation and modeling
• Virtual Cell (National Resource for Cell Analysis and
Modeling)
• MCell (the Salk Institute)
• Gepasi (Virginia Tech)
• E-CELL (Institute for Advanced Biosciences, Keio
University)
• Karyote/CellX (Indiana University)
Exercise
Your task is to:
• Identify the functions of proteins X, Y & Z
• Identify the pathway(s) in which they are
involved
• Look for differences in pathways between
databases
• Examine the same pathway(s) in humans