Functional Interpretation of
Large-scale Omics Data through
Pathway and Network Analysis
Bio-Trac 40 (Protein Bioinformatics)
October 9, 2008
Zhang-Zhi Hu, M.D.
Research Associate Professor
Protein Information Resource, Department of
Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center
1
Overview
• Introduction
- What are large-scale omics data?
- What do they tell you? How to interpret?
• Approaches
- Omics data integration
- Resources: databases and tools
• Case studies
• Systems biology
- Top-down, bottom-up
- Pathway, network modeling
2
Bioinformatics focus is changing…
• Individual molecules
– DNA, RNA, proteins
– Sequence, structure, function
– Evolutionary analysis
• Population of molecules
– Genome, proteome and other “-omes”
– Interactions, complexes
– Pathways, processes
– High level organizations
Genomics, Proteomics
3
From One Gene:
multiple genetic variants, multiple transcripts,
multiple protein products…
and PTMs…
4
To Global Knowledge: The “-ome” and “-omics”
Genome
Transcriptome
Proteome
Metabolome
Other “-omes”:
– ORFeome
– Promoterome
– Interactome
– Receptome
– Phenome
– more…
5
Gastric Cancer
ECM cluster
Genes
Global analysis
Potential Gene Markers
SPARC
COL3A1
SULF1
YARS
ABCA5
THY1
SIDT2
Corresponding to ECM cluster (Chen et al., 2003; Qiu et al., 2007)
6
Identification of novel MAP kinase pathway signaling targets
(PMA/TPA → K562 cells → MAPK pathway → targets)
[Figure: 2D gel, ~3500 spots; ~91 reproducible spot changes; digest of U-24]
Twenty-five targets of this signaling pathway were identified, of which
only five were previously characterized as MKK/ERK effectors. The
remaining targets suggest novel roles for this signaling cascade in
cellular processes of nuclear transport, nucleotide excision repair,
nucleosome assembly, membrane trafficking, and cytoskeletal
regulation. -- Mol Cell. 6:1343-54, 2000
7
Drosophila Embryo Interaction Map
Using Y2H technology with 102 bait proteins homologous to human cancer genes, 2300 interactions were detected, 710 of them with high confidence.
The proteins in the map that bear an RA (Ras Association) or RBD (Raf-like Ras-binding) domain define a discrete subnetwork around Ras-like GTPases (colored in yellow).
The exploration of the present map leads to numerous biological hypotheses and expands our knowledge of regulatory protein networks important in human cancer, as shown by the biological analysis of a particularly interesting network surrounding the Ras oncogene.
Genome Res. 15:376-84, 2005.
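The two steps described above (keeping only high-confidence interactions, then looking at the subnetwork around one protein) can be sketched as follows. This is a minimal illustration, not the method used in the paper: the interaction list, the scores, the 0.5 threshold and the placeholder name CG0001 are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical toy interactions: (bait, prey, confidence score in [0, 1]).
# Names and scores are illustrative only, not taken from the published map.
INTERACTIONS = [
    ("Ras85D", "Raf", 0.95),
    ("Ras85D", "Rgl", 0.80),
    ("Raf", "Dsor1", 0.90),
    ("Ras85D", "CG0001", 0.20),
    ("Dsor1", "rl", 0.85),
]


def high_confidence(interactions, threshold=0.5):
    """Keep only interaction pairs whose score meets the (assumed) threshold."""
    return [(a, b) for a, b, score in interactions if score >= threshold]


def build_graph(pairs):
    """Build an undirected adjacency map from interaction pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def neighborhood(graph, seed, radius=1):
    """Return all nodes within `radius` steps of `seed` (breadth-first)."""
    seen = {seed}
    frontier = {seed}
    for _ in range(radius):
        frontier = {n for node in frontier for n in graph.get(node, ())} - seen
        seen |= frontier
    return seen


graph = build_graph(high_confidence(INTERACTIONS))
ras_subnetwork = neighborhood(graph, "Ras85D", radius=1)
print(sorted(ras_subnetwork))
```

Raising `radius` widens the subnetwork by one interaction step at a time, which is one simple way to explore the region around a protein of interest.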
8
Strategy for Functional Analyses of Omics Data
[Flow diagram] Omics data (microarray, 2D, IP, MS, etc.) → protein mapping → data integration → functional annotation → functional analysis → biological insights (pathway, network, biomarker discovery)
Data integration sources:
• Bioinformatics databases: gene, protein, PPI, pathway, PTM, etc.
• Text mining: literature (MEDLINE)
Functional analysis:
• GO profiling: molecular function, biological process, cellular component (~50% GO annotations)
• Molecular networks (e.g. interaction, association)
• Biological pathways (e.g. KEGG, Reactome, PID, BioCarta) (<10% pathway annotations)
9
Methods for Functional Analysis
• Omics data integration
• Functional profiling
• Pathway analysis
• Resources/knowledgebases
– Molecular databases
– Omics data repositories
• Bioinformatics tools
– Open source: DAVID, FatiGO, iProXpress
– Commercial: Ingenuity, GeneGO
• Literature
– Text mining
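Functional profiling tools typically ask whether a category (a GO term or a pathway) appears in a protein list more often than expected by chance. The sketch below uses a one-sided hypergeometric test for that question. It is a generic illustration under stated assumptions: the slides do not say which statistic any of the listed tools uses, and the counts in the example are invented.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` proteins are drawn without replacement
    from `population` proteins, of which `annotated` carry the category."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20000 background proteins, 200 annotated to a pathway,
# 100 proteins in the experimental list, 8 of them in the pathway.
p_value = hypergeom_enrichment_p(population=20000, annotated=200, sample=100, hits=8)
print(f"enrichment p-value: {p_value:.3g}")
```

With many categories tested at once, the resulting p-values would normally need a multiple-testing correction, which is left out here.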
10
Principles of multi-omics data integration for Systems Biology
Protein-Centric -Omics Analysis (iProXpress)
[Figure: omics data types linked through the proteins they map to]
• Genomics: dbSNP/HapMap (NS-SNP)
• Epigenomics: DNA methylation profiling (coding genes)
• Transcriptomics: mRNA microarray; dbEST (coding EST)
• Proteomics: protein, protein precursor, splicing forms, function, sites
• Peptidomics: peptide, natural peptides, protease/peptidase
• Metabolomics: metabolites (HMDB), Enzyme1, Enzyme2
Functional Profiling and Analysis: metabolic pathways, signaling pathways, biological processes
11
ID Mapping
Batch gene/protein retrieval and profiling
Enter ID, gi #
Information matrix
Functional profiling
http://pir.georgetown.edu/pirwww/search/idmapping.shtml
12
Protein annotations
Well annotated entry: human p53 (P53_HUMAN), 21 years!
Comments (CC line)
Features (FT line)
References (RX line)
Cross References (DR line)
GO
13
what molecular function?
what biological process?
what cellular component?
14
Biological Pathways and Networks
Signaling pathways
Metabolic pathways
Organelle biogenesis
Molecular networks
15
Pathways
Human metabolic maps
Global gene expression in skeletal
muscle from gastric bypass patients
before surgery and 1 year afterward.
General trend after surgery: upregulated anaerobic metabolism;
down-regulated oxidative
phosphorylation
green, down-regulated genes
red, up-regulated genes
white, no data available
Proc Natl Acad Sci U S A.
2007 Feb 6;104(6):1777-82
http://www.pnas.org/cgi/data/0610772104/DC1/30
16
Databases of Protein Functions
• Metabolic Pathways
– KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic
Pathways
– EcoCyc: Encyclopedia of E. coli Genes and Metabolism
– MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
• Inter-Molecular Interactions and Regulatory Pathways
– IntAct: Protein interaction data from literature and user submission
– BIND: Descriptions of interactions, molecular complexes and pathways
– DIP: Catalogs experimentally determined interactions between proteins
– Reactome: A curated knowledgebase of biological pathways
– Pathway Interaction Database (PID)
– BioCarta: Biological pathways of human and mouse
– Pathway Commons
• GO and GO annotation projects
17
Gene
Ontology
(GO)
18
GO Slim
http://www.geneontology.org/GO.slims.shtml
19
Biological Pathway Resource Collection
http://www.pathguide.org/
• Protein-protein interactions
• Metabolic pathways
• Signaling pathways
• Pathway diagrams
• Transcription factors / gene regulatory networks
• Protein-compound interactions
• Genetic interaction networks
20
http://www.pathwaycommons.org/pc/home.do
21
KEGG Metabolic & Regulatory Pathways
KEGG is a suite of databases and associated software, integrating our current knowledge on molecular
interaction networks, the information of genes and proteins, and of chemical compounds and reactions.
(http://www.genome.ad.jp/kegg/pathway.html)
22
BioCarta Cellular Pathways
(http://www.biocarta.com/index.asp)
Transforming Growth Factor (TGF)
beta signaling [Homo sapiens]
23
Transforming Growth Factor (TGF) beta signaling [Homo sapiens]
Reactome: events and objects (including modified forms and complexes)
(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&)
Event ->REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]
Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]
Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]
Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]
Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus
Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm] ……
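The event/object listing above can be read as a small data structure: events (reactions) that produce objects (physical entities in a compartment). The sketch below is one possible way to hold it, using the identifiers and names from the slide. The class names, the `outputs` field and the pairing of each event with the object listed after it are assumptions made for illustration, not Reactome's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalObject:
    stable_id: str
    name: str
    compartment: str


@dataclass
class Event:
    stable_id: str
    name: str
    outputs: list = field(default_factory=list)


# IDs and names as listed on the slide; pairing each event with the
# object that follows it in the listing is an assumption.
phospho_smad = PhysicalObject("REACT_7364.1", "Phospho-R-SMAD", "cytosol")
complex_cyt = PhysicalObject("REACT_7344.1", "Phospho-R-SMAD:CO-SMAD complex", "cytosol")
complex_nuc = PhysicalObject("REACT_7382.2", "Phospho-R-SMAD:CO-SMAD complex", "nucleoplasm")

pathway = [
    Event("REACT_6879.1", "Activated type I receptor phosphorylates R-SMAD directly", [phospho_smad]),
    Event("REACT_6760.1", "Phospho-R-SMAD forms a complex with CO-SMAD", [complex_cyt]),
    Event("REACT_6726.1", "The phospho-R-SMAD:CO-SMAD transfers to the nucleus", [complex_nuc]),
]


def compartments_visited(events):
    """List output compartments in event order, collapsing consecutive repeats."""
    visited = []
    for event in events:
        for obj in event.outputs:
            if not visited or visited[-1] != obj.compartment:
                visited.append(obj.compartment)
    return visited


print(compartments_visited(pathway))
```

Keeping the same complex as two objects, one per compartment, mirrors the listing, where the cytosolic and nucleoplasmic forms carry different identifiers.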
24
PID
Transforming Growth
Factor beta signaling
25
Transforming Growth Factor (TGF) beta signaling
Reactome
PID
26
TGF-beta signaling – comparison between PID and Reactome
~26 proteins in PID are not defined in Reactome, while only 2 in Reactome are not defined in PID.
[Figure: TGF-beta signaling pathway diagram spanning the cytoplasm and nucleus. Labeled components: Ca2+, growth signals, stress signals, LAP, TGF-b, Furin, TGF-beta receptor (I, II), Smad 2, Smad 4, Shc, XIAP, CaM, TAK1, MEKK1, MAPKKK, ERK1/2, Ski, P38 MAPK pathway, JNK cascade, degradation, DNA binding and transcription regulation. PRO identifiers shown: PRO:000000616, PRO:000000410, PRO:000000650, PRO:000000523.]
Legend:
• S P, T P, Y P: Phosphorylation (P) at Serine (S), Threonine (T) and Tyrosine (Y)
• K U: Ubiquitination (U) at Lysine (K)
• Common in both Reactome & PID
• Only reported in Reactome
* All others are in PID. Not all components in the pathway from both databases are listed.
27
GEO: a gene expression/molecular abundance repository
http://www.ncbi.nlm.nih.gov/geo/
IntAct: open source database system and analysis tools for protein interaction data
PRIDE: centralized, standards compliant, public data repository for proteomics data
http://www.ebi.ac.uk/pride/
28
Analysis Tools
• iProXpress
– http://pir.georgetown.edu/iproxpress/
• DAVID
– http://david.abcc.ncifcrf.gov/
• Babelomics - FatiGO
– http://babelomics.bioinfo.cipf.es/
• Commercial:
– Ingenuity: http://www.ingenuity.com/
– GeneGO: http://www.genego.com/
• Visual tools:
– Cytoscape: http://www.cytoscape.org/
– CellDesigner: http://www.celldesigner.org/
29
iProXpress: Integrative analysis of proteomic and gene expression data
http://pir.georgetown.edu/iproxpress/
[Figure: Data (MS spectrum, peptide ident., protein ident.) → Information (function, pathway, family) → Knowledge (categorize, statistics, association)]
30
iProXpress – Pathway Profiling
• Organelle proteome data sets
ER
Mit
• Protein information matrix: extensive
annotations including protein name, family
classification, function, protein-protein
interaction, pathway…
• Functional profiling: iterative categorization,
sorting, cross-dataset comparison, coupled
with manual examination.
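The categorization and cross-dataset comparison described above can be sketched as counting, per data set, how many proteins fall into each pathway category of the information matrix. This is a minimal illustration and not the iProXpress implementation: the accessions, protein names, pathway labels and data set contents are hypothetical.

```python
from collections import Counter

# Hypothetical protein information matrix: accession -> annotations.
INFO_MATRIX = {
    "P00001": {"name": "protein A", "pathway": "Oxidative phosphorylation"},
    "P00002": {"name": "protein B", "pathway": "Oxidative phosphorylation"},
    "P00003": {"name": "protein C", "pathway": "Protein processing in ER"},
    "P00004": {"name": "protein D", "pathway": None},
}

# Hypothetical organelle proteome data sets (lists of accessions).
DATASETS = {
    "ER": ["P00003", "P00004"],
    "Mit": ["P00001", "P00002", "P00004"],
}


def pathway_profile(accessions, matrix):
    """Count proteins per pathway; proteins without one are 'unannotated'."""
    counts = Counter()
    for acc in accessions:
        pathway = matrix.get(acc, {}).get("pathway")
        counts[pathway or "unannotated"] += 1
    return counts


def compare_profiles(datasets, matrix):
    """Build a category -> {data set: count} table across all data sets."""
    profiles = {name: pathway_profile(accs, matrix) for name, accs in datasets.items()}
    categories = sorted({c for profile in profiles.values() for c in profile})
    return {c: {name: profiles[name][c] for name in datasets} for c in categories}


table = compare_profiles(DATASETS, INFO_MATRIX)
for category, row in table.items():
    print(category, row)
```

The "unannotated" row is kept on purpose: it shows how much of each data set the profile cannot explain, which is where manual examination usually starts.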
KEGG
pathway
31
iProXpress Analysis Interface
[Screenshot with callouts 1-8]
Cross-data groups comparative profiling
32
http://david.abcc.ncifcrf.gov/
33
A Literature-Derived Network for Yeast
All MEDLINE abstracts processed using statistical co-occurrence and NLP methods:
• Functional association (co-occurrence) – grey shades
• Physical interaction – green
• Regulation of expression – red
• Phosphorylation – dark blue
• Dephosphorylation – light blue
Inference: Ssn3 -> Hsp104 (b) and Ume6 -> Ino2 & Erg9 (c) expressions
Jensen et al., 2006
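The co-occurrence part of such a pipeline can be sketched as counting how often two gene names are mentioned in the same abstract. This covers only raw counting: the NLP extraction of interaction types and the statistical scoring used in the cited work are not reproduced here. The abstracts are invented placeholder sentences, and the simple tokenizer and name list are assumptions.

```python
from collections import Counter
from itertools import combinations
import re

# Invented placeholder sentences standing in for MEDLINE abstracts.
ABSTRACTS = [
    "SSN3 and HSP104 were both measured in this study.",
    "We examined UME6 together with INO2 and ERG9.",
    "Ume6 and Ino2 appear in the same experiment.",
]

# Gene names to look for (upper-cased for case-insensitive matching).
GENES = {"SSN3", "HSP104", "UME6", "INO2", "ERG9"}


def cooccurrence_counts(abstracts, genes):
    """Count, for each gene pair, the number of abstracts mentioning both."""
    counts = Counter()
    for text in abstracts:
        tokens = {token.upper() for token in re.findall(r"[A-Za-z0-9]+", text)}
        mentioned = sorted(tokens & genes)
        for pair in combinations(mentioned, 2):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(ABSTRACTS, GENES)
for pair, n in counts.most_common():
    print(pair, n)
```

Raw counts favor frequently mentioned genes, so a real analysis would compare each count with what is expected from the individual mention frequencies.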
34
Case Studies
Pathway studies: analysis of proteomics and gene
expression data from cancer research
I. Estrogen Signaling Pathways (estrogen-induced apoptosis)
Breast cancer cells (+E2) → IP (AIB1, pY) → 1D-gel → MS/MS
II. Purine Metabolic Pathways (radiation-induced DNA repair)
Human fibroblast (AT patient) + irradiation → 2D-gel → MS
→ DNA microarray
III. Melanosome Biogenesis
(comparative organelle proteomic profiling)
Melanoma cell → isolation of stage specific melanosomes → MS
35
I. Estrogen Signaling Pathways (estrogen-induced apoptosis)
[Figure: experimental design. Breast cancer cells MCF-7 and MCF-7/5C (estrogen deprived condition, mimicking clinical condition: 2nd phase antiestrogen drug resistance), treated with E2, 200nM for 2h; outcomes labeled growth and apoptosis; pY-IP and AIB1-IP followed by MS proteomics. Question: signaling pathway, early events?]
Integrated Bioinformatics: Expression Profiling, Pathway/Network Mapping
Hu ZZ, et al. (2008) US HUPO
36
Proteins only in E2 treated MCF-7/5C cells from both pY-IP and AIB1-IP
GO profiling (biological process)
Transcription
Cell communication
Chromosome remodeling & co-repression, cell cycle inhibition, apoptosis
37
Pathway Mapping:
G(o) alpha-2 subunit
(pY/5C +E2)
RAP1GAP
(AIB1/5C+E2)
38
Hypothesized E2-induced Apoptosis Pathways
[Figure: pathway diagram spanning the cytoplasm and nucleus. Labeled components: E2, GPR30, Gas (?), GNAO2, AIB1, Rap1a, Rap1GAP, E2-ERa, MEK, ERK, BAD, CDK1, Sirt3, TLE3, CIP29, RUNX3. Tags: pY, pY-IP, AIB1-IP.]
Function labels: G(o) alpha-2, GPCR signaling; growth inhibition/apoptosis; BAD-mediated apoptosis; histone modification, apoptosis; co-repression, apoptosis; cell cycle arrest/apoptosis; cell growth; apoptosis
39
Text mining for protein-protein interaction (PPI) information
40
II. Purine Metabolic Pathways (radiation-induced DNA repair)
[Figure: experimental design. AT patient fibroblast lines exposed to ionizing radiation: AT5BIVA (ATM-mutated, sensitive to IR damage) and ATCL8 (ATM introduced, ATM-wild type, resistant to IR damage).]
• 2D-gel/MS: proteins differentially expressed (1093)
• DNA microarray: mRNAs differentially expressed (231)
• Intersections (13 proteins/genes)
Integrated Bioinformatics: Expression Profiling, Pathway/Network Mapping
Hu ZZ, et al. (2008) J Prot. Bioinfo.
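Finding the intersection of a proteomics list and a microarray list requires first mapping both to a common identifier. The sketch below maps protein accessions to gene symbols and then intersects the two sets. It is illustrative only: the accessions, gene symbols and mapping table are hypothetical placeholders, not the study's data.

```python
# Hypothetical mapping from protein accession to gene symbol.
PROTEIN_TO_GENE = {
    "PROT_A": "GENE_A",
    "PROT_B": "GENE_B",
    "PROT_C": "GENE_C",
}

# Hypothetical differentially expressed proteins (2D-gel/MS) and mRNAs (microarray).
PROTEIN_HITS = ["PROT_A", "PROT_B", "PROT_X"]
MRNA_HITS = ["GENE_B", "GENE_C", "GENE_D"]


def map_to_genes(accessions, mapping):
    """Map accessions to gene symbols; also return the ones that failed to map."""
    mapped = {mapping[acc] for acc in accessions if acc in mapping}
    unmapped = [acc for acc in accessions if acc not in mapping]
    return mapped, unmapped


def intersect_omics(protein_hits, mrna_hits, mapping):
    """Genes changed at both the protein and the mRNA level."""
    genes_from_proteins, unmapped = map_to_genes(protein_hits, mapping)
    shared = sorted(genes_from_proteins & set(mrna_hits))
    return shared, unmapped


shared_genes, unmapped_proteins = intersect_omics(PROTEIN_HITS, MRNA_HITS, PROTEIN_TO_GENE)
print("shared:", shared_genes)
print("unmapped proteins:", unmapped_proteins)
```

Reporting the unmapped accessions matters, because a small intersection can reflect incomplete ID mapping as much as real biological disagreement between the two data types.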
41
KEGG pathway profiles
42
(RRM2)
43
Purine metabolic pathway
ADP → dADP, GDP → dGDP (EC 1.17.4.1)
ATP X dATP, dGTP X GTP
DNA synthesis, DNA repair
Ribonucleoside diphosphate reductase subunit M2 (RRM2), EC 1.17.4.1
44
Functional Association Networks
RRM2
HDAC1
p53
BRCA1
RRM2 connected to other major DNA repair and cell
cycle proteins, such as p53, BRCA1, HDAC1.
45
RRM2 in radiation-induced ATM-p53-mediated DNA repair pathway
[Figure: pathway diagram. Labeled components: ATM, p53, HDAC1, BRCA1, RRM2 and RRM1 (RR complex), DNA repair.]
46
III. Organelle Proteomes
Comparative organelle proteome profiling allows key proteins potentially involved in the regulation of organelle biogenesis to be proposed.
[Figure: Schematic drawing of melanosome biogenesis pathway and key proteins involved in each stage (Stage I hybrid organelle through Stage IV), in a melanocyte next to keratinocytes. Labeled compartments: endoplasmic reticulum, Golgi, early endosome, late endosome, lysosome, cell membrane, nucleus. Labeled proteins: Drebrin, ARPC4, b-actin, molecular motors (kinesin, dynein/dynactin, dynamin), Myosin V, myosin Ic, Id, I4, P21-rac1, Rab5, Rab5c, Myo-Va, Lyst, AP-2a, Pmel17, DDT?, vATPase G2, SLC24A5 (golden), Vinculin, Rab38, Rab27a, Tyrp1, TYR, Matp, OA1, MART1, Sec24, Atp7a, VAP-A, DCT, PEDF, Ib1, MGST3, Flotillin-2. Ions: H+, Na+/K+/Ca2+, Cu2+.]
Legend:
V: Newly identified and validated
M: Mouse color gene homolog
P: Proposed new protein
* Untagged are known melanosome proteins
Chi A, et al. (2006) J. Prot. Res.
47
Towards Systems Biology
(Nature 422:193, 2003)
Genomics, Transcriptomics, Proteomics, Metabolomics, Bibliomics (Literature Mining), Bioinformatics, …omics
Integrated knowledge and tools are needed for Systems Biology research
48
What is Systems Biology?
Systems Biology, 2004, 1(1):19-27.
‘Systems biology defines and analyses the
interrelationships of all of the elements in a
functioning system in order to understand
how the system works.’ -- Leroy Hood
• How an organism works from an overall perspective.
• Interactions of parts of biological systems
– how molecules work together to serve a regulatory function in cells or between cells.
– how cells work to make organs, how organs work to
make a person.
• Systems biology is the converse of reductionist biology.
49
Reductionist vs. Systems Biology
The driving force in 20th century biology
has been reductionism:
From the population to the individual
From the individual to the cell
From the cell to the biomolecule
From the biomolecule to the genome
From the genome to the genome
sequence
With the publication of genome
sequences, reductionist biology has
reached its endpoint
The driving force for 21st century biology will
be integration:
Integrating the activity of genes and
regulators into regulatory networks
Integrating the interactions of amino acids
into protein folding predictions
Integrating the interactions of metabolites
into metabolic networks
Integrating the interactions of cells into
organisms
Integrating the interactions of individuals into
ecosystems
50
Universal Organizing Principles
Level 4: Large-scale organization
Level 3: Functional modules
Level 2: Regulatory motif, pathway
Level 1: Omics data, information
Although the individual components are unique to a given organism, the
topologic properties of cellular networks share surprising similarities with
those of natural and social networks
51
Approaches: top-down or bottom-up
Three types of models
• top-down: systemic-data driven, to discover or refine pre-existing models that describe the measured data (more on regulatory models). Emerging as the dominant method due to “-omics”.
• bottom-up: starts with the molecular properties to construct models to predict systemic properties, followed by validation and model refinement (more on kinetic models) (Silicon cell program: http://www.siliconcell.net/)
Bruggeman FJ, Westerhoff HV. Trends Microbiol. 2007 15:45-50.
52
Top-down
Yeast two-hybrid
Combination of techniques (Y2H, protein arrays)
Integration of other types of information (expression, localization or genetic studies)
dynamic biologically relevant interaction subnetworks
Curr Opin Chem Biol. 2006 Dec;10(6):551-8.
53
Bottom-up
EGFR-GAB1-ERK/Akt network
EGFR signaling network model is constructed based on the reaction stoichiometry and kinetic constants
J Biol Chem. 2006 281:19925-38
The model allows predictions of temporal patterns of cellular responses to EGF under diverse perturbations (e.g., EGF doses):
• The dynamics of GAB1 tyr-phosphorylation are controlled by positive GAB1-PI3K and negative MAPK-GAB1 feedbacks.
• The essential function of GAB1 is to enhance PI3K/Akt activation and extend the duration of Ras/MAPK signaling.
• GAB1 plays a critical role in cell proliferation and tumorigenesis by amplifying positive interactions between survival and mitogenic pathways
54
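To show what "constructed based on the reaction stoichiometry and kinetic constants" means in practice, the sketch below simulates a single reversible binding reaction, L + R <-> LR, with mass-action kinetics and forward Euler integration. This is a toy model under stated assumptions and not the published EGFR-GAB1 model: the species, rate constants, concentrations and step size are all invented for illustration.

```python
def simulate_binding(l0, r0, k_on, k_off, dt=0.01, t_end=50.0):
    """Forward-Euler simulation of L + R <-> LR under mass-action kinetics.

    Returns a list of (time, complex concentration) pairs.
    """
    ligand, receptor, complex_ = l0, r0, 0.0
    trajectory = [(0.0, complex_)]
    steps = int(round(t_end / dt))
    for step in range(1, steps + 1):
        # Net rate of complex formation: binding minus dissociation.
        rate = k_on * ligand * receptor - k_off * complex_
        # Stoichiometry: one L and one R are consumed per LR formed.
        ligand -= rate * dt
        receptor -= rate * dt
        complex_ += rate * dt
        trajectory.append((step * dt, complex_))
    return trajectory


# Hypothetical parameters in arbitrary units.
trajectory = simulate_binding(l0=1.0, r0=1.0, k_on=1.0, k_off=0.1)
final_time, final_complex = trajectory[-1]
print(f"complex at t={final_time:.0f}: {final_complex:.4f}")

# Perturbation: doubling the ligand dose changes the steady-state response.
high_dose_complex = simulate_binding(l0=2.0, r0=1.0, k_on=1.0, k_off=0.1)[-1][1]
print(f"complex at double dose: {high_dose_complex:.4f}")
```

A full bottom-up model chains many such rate equations, one per reaction, and feedback loops appear when a downstream species enters the rate of an upstream reaction.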
Gene regulatory networks (GRNs)
WIRED Systems biology looks at the connections
between components in cells.
Essential elements of the role of
Dorsal in establishing dorsoventral
polarity in Drosophila embryonic
development
Reprod Toxicol. 19:281-90, 2005
55
Modeling of the main modules of cell-cycle progression
Three functional
units:
• Start function: onset
of S-phase
• Cyclin cascades
(C1, C2, C3)
• End function: onset
of mitosis to cell
division
Chembiochem 5:1322-33, 2004
56
Challenges to Systems Biology
• A complete characterization of an organism (molecular constituents → interactions → cell function)
• Spatial-temporal molecular characterization of a cell
• A thorough systems analysis of the “molecular response” of a cell to external/internal perturbations
• Information must be integrated into mathematical models to enable knowledge testing by formulating hypotheses and discovery of new biological mechanisms…
57
Cellular Maps?
signaling, metabolism, gene regulation …
58