CONFIDENTIAL
Building and using protein interaction networks:
Systems Biology for Drug Discovery
Industry perspective
Andrej Bugrim
GeneGo, Inc.
Copyright GeneGo 2000-2003
Topics
• Annotation process and collecting network content for industrial-type applications
• Biological and disease ontologies – how to improve and use them in functional analysis
• Tools: utilizing network data in pharmaceutical R&D
Multi-level understanding of human biology
Level of phenotype
Causative relations
Level of cell process/network
Mechanistic relations
Level of protein
Disease-centered knowledge base in MetaMiner (Oncology example)
GG annotation team
Disease group
Network group
Specialty group
Chemistry group
Causative disease associations: DNA, RNA, protein levels
Protein-protein, protein-DNA, protein-RNA interactions
Biomarkers
Ligand-receptor interactions: drugs, leads, hits
Compare
General BC schema
Causative BC models
BC-perturbed cell processes
Other cancers chosen by Consortium
Content
Three interaction domains in MetaCore
Ligands: metabolites, peptides, xenobiotics
Membrane receptors
Signal transduction: G proteins, secondary messengers, kinases, phosphatases
Transcription factors
• 1,600 drugs w/targets
• 4,100 endogenous metabolites
• >21,000 ligand-receptor interactions
• 850 GPCRs and other membrane receptors
• 110 nuclear hormone receptors
• 172K manually curated physical signaling interactions
• 538 canonical maps
• 42,000 13-step canonical signal transduction pathways
• 924 human transcription factors
• 6,000 target genes
• 11,300 metabolic reactions
Core effect: metabolic pathways
• 116 fine metabolic maps
• Metabolites: 4,100 endogenous metabolites
MetaBase Content Overview
– Database
• Chemical compounds: 580,000
• Drugs: 8,590
• Chemical reactions: 35,600
• Metabolic networks: 251
– Network
• Proteins + genes: 13,402
• Transcription factors: 924
• Chemical compounds: 26,000
• Drugs: 2,740
• Endogenous compounds: 4,100
• Proteins linked to drugs: 2,711
• Reactions: 5,330
• Small molecule ligands for human receptors: 3,510
• Blockers for ion channels: 629
• PubMed journals: 3,100
• PubMed articles: 81,400
• Total number of interactions: 177,000
– Content
• GeneGo regulatory networks: 120
• GeneGo disease networks: 88
• Maps: 538
• Regulatory maps: 325
• Metabolic maps: 116
• Traditional metabolic maps (EC): 97
• Diseases: 4,920
MetaBase content by type
Database
• Chemical compounds: 580,000
• Reaction substrates with kinetic data: 3,580
• Compounds with structures: 27,418
• Compounds in network: 25,662
• Metabolic reactions: 35,600
• Genes (human: 38,700), total: 137,500
• Human proteins: 14,570
• Compounds in reactions: 15,700
• Drug metabolites: 3,422
• Drugs: 8,590
• Endogenous compounds: 4,100
Network interactions
All interactions taken from articles indexed in PubMed (3,100 journals; 81,400 articles)
Manually curated interactions (172,787)
• Signalling interactions: 137,297 (79%)
• Metabolic reactions: 35,490 (21%)
Signalling interactions
• Protein-protein: 87,675 (51%)
• Small molecule-protein: 42,383 (26%)
• Y2H "Interactome": 2,370 (1%)
• Logical relations: 1,934 (1%)
• With microRNA: 1,620 (1%)
• ChIP-chip: 980 (1%)
• With virus proteins: 335 (0%)
Protein-protein interactions
• Activation/inhibition via binding: 43,079 (52%)
• Regulation of transcription: 15,725 (21%)
• Influence on expression: 10,120 (14%)
• Covalent modification: 5,967 (8%)
• Unspecified regulation: 3,990 (5%)
Small molecule-protein
• Binding to receptors: 14,497 (34%)
• Regulation of enzymes: 8,898 (21%)
• Binding to kinases: 6,984 (16%)
• Regulation of other proteins: 6,218 (15%)
• Regulation of transporters: 5,786 (14%)
Type of interactions in network
Effects: activation, inhibition, unspecified
Direct interaction, mechanism: phosphorylation, dephosphorylation, other type of covalent modification, binding, transport, cleavage, transcription regulation, transformation, catalysis, competition
Indirect interaction, mechanism: influence on expression, unspecified
Distribution of interactions by mechanism
• binding: 48%
• transcription regulation: 15%
• influence on expression: 12%
• catalysis: 8%
• unspecified: 6.4%
• phosphorylation: 4.1%
• transport: 2%
• cleavage: 2%
• covalent modification: 1%
• transformation: 1%
• dephosphorylation: 0.5%
• competition: 0.1%
Network objects
Total number of nodes: 40,229
Chemical compounds: 25,662
• Xenobiotic compounds: 15,955
• Endogenous compounds: 4,010
• Drugs: 2,741
• Metabolites of xenobiotics: 1,924
• Drug metabolites: 1,032
Proteins: 13,406
• Other: 5,922
• Enzymes: 2,910
• Transcription factors: 924
• Transporters: 804
• Membrane receptors: 764
• Receptor ligands: 640
• Kinases: 626
• Proteases: 352
• Ion channels: 217
• Phosphatases: 137
• Nuclear hormone receptors: 110
Metabolic reactions: 5,353
Proteins: distribution by tissue & localization
[Bar chart "Proteins: distribution by tissue": number of proteins per tissue, linear axis from 0 to 10,000. Tissues shown: Adrenal Gland, Brain, Colon, Heart, Kidney, Liver, Lung, Mammary Gland, Marrow, Ovary, Pancreas, Placenta, Prostate, Retina, Salivary Gland, Skin, Spinal Cord, Spleen, Testes, Thymus, Thyroid, Tonsil, Trachea, Upper GI Tract, Uterus, plus a bar for proteins common to all these tissues.]
[Bar chart "Proteins: distribution by cell compartment": number of proteins per compartment, logarithmic axis from 1 to 100,000. Compartments shown: unspecified, nucleus, integral to plasma membrane, cytoplasm, plasma membrane, membrane fraction, mitochondrion, extracellular region, extracellular space, integral to membrane, soluble fraction, membrane, cytosol, endoplasmic reticulum, intracellular, Golgi apparatus, proteinaceous extracellular matrix, cytoskeleton, actin cytoskeleton, lysosome.]
Molecular functions in Database
• binding: 8,503 (46%)
• catalytic activity: 4,086 (23%)
• signal transducer activity: 2,535 (13%)
• transcription regulator activity: 1,396 (7%)
• transporter activity: 1,078 (6%)
• enzyme regulator activity: 599 (3%)
• structural molecule activity: 459 (2%)
• motor activity: 77 (0%)
• translation regulator activity: 75 (0%)
• antioxidant activity: 51 (0%)
• chaperone regulator activity: 11 (0%)
• chemoattractant activity: 8 (0%)
• chemorepellent activity: 3 (0%)
Endogenous compounds (4,100 total)
• 3,070 endogenous compounds involved in metabolic reactions; 6,819 reactions with endogenous compounds only
• 751 endogenous ligands for 498 receptors, with 2,455 interactions
• 4,000 (98%) of endogenous compounds in network
• 15,962 network interactions with endogenous metabolites
• 3,600 compounds with structures and molecular (brutto) formulas; the other 700 are “generic” (contain acyl-, alkyl- and other variable groups)
Endogenous compounds by origin
• Lipids: 43%
• Other: 19%
• Carbohydrates: 15%
• Peptides: 10%
• Vitamins/Cofactors: 6%
• Fatty Acids: 5%
• Steroids: 4%
• Nucleotides: 2%
Network and pathway statistics in GeneGo
• >40,000 nodes
• ~177,000 edges
• Average node degree: 3.77
• 241 million shortest paths
• Average shortest path length: 5.3811
• 42,000 13-step canonical signal transduction pathways
• 200 canonical metabolic pathways: major metabolic fluxes such as glycolysis or the TCA cycle
• 72,000 pathways on metabolic maps: pathways analogous to KEGG (KEGG has 42,500)
[Diagram: Enzyme1, reaction1, metabolite, reaction2, Enzyme2]
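The slide quotes an average node degree and an average shortest path length without saying how they were computed. As a minimal sketch of what such statistics mean, the snippet below computes both on a tiny undirected toy graph with breadth-first search. The node names and edges are hypothetical and are not MetaCore content; treating the network as undirected and unweighted is also an assumption.

```python
from collections import deque

# Toy undirected network (hypothetical node names, not MetaCore content).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")]

def build_adjacency(edge_list):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_degree(adj):
    """Mean number of neighbours per node (equals 2 * edges / nodes)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def average_shortest_path_length(adj):
    """Mean shortest path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs

adj = build_adjacency(edges)
print(average_degree(adj))
print(average_shortest_path_length(adj))
```

On a network with tens of thousands of nodes the all-pairs loop is still feasible with one breadth-first search per node, which is consistent with the slide reporting hundreds of millions of shortest paths.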
Pathways in regulatory network
Start: TMR (transmembrane receptor)
[Diagram: three numbered example pathways leading from the receptor through kinases and other intermediate proteins to a TF (transcription factor); edges are labelled B, +P, Tr and Z]
End: Target genes
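One way to picture how pathways from a receptor to target genes could be counted is a depth-limited enumeration of simple paths in a directed graph. The sketch below is only an illustration under stated assumptions: the network, the node names and the helper `enumerate_pathways` are hypothetical, and the 13-step limit is taken from the slide's wording without knowing how GeneGo defines a step.

```python
# Toy directed regulatory network (hypothetical nodes and edges).
network = {
    "TMR": ["kinase_A", "kinase_B"],
    "kinase_A": ["TF"],
    "kinase_B": ["kinase_A", "TF"],
    "TF": ["gene_1", "gene_2"],
}

def enumerate_pathways(graph, start, is_end, max_steps):
    """Yield every simple path from start to a node satisfying is_end,
    using at most max_steps edges (iterative depth-first search)."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if is_end(node) and len(path) > 1:
            yield path
            continue
        if len(path) - 1 >= max_steps:
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append(path + [nxt])

def is_target_gene(node):
    return node.startswith("gene_")

pathways = sorted(enumerate_pathways(network, "TMR", is_target_gene, 13))
for p in pathways:
    print(" -> ".join(p))
```

The number of simple paths grows very quickly with network size, so a step limit of some kind is what keeps such an enumeration tractable.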
Ontologies
Knowledge base (ontologies)
By genre:
- Drama
- Action
- Romance
- Horror
- Foreign
By director:
- Lynch
- Tarantino
- Leone
- Stone
- Antonioni
By actor:
- Pitt
- Nicholson
- Depp
- Redford
- Damon
By year:
- 2007
- 2006
- 2005
- 2004
- 2003
• How do you compare “action” movies vs. Tarantino movies vs. 2003 movies?
• These are incomparable because they belong to different categories
Molecular pathway
Cellular process
Disease
Metabolic process
Mixed ontologies
Multiple ontologies in MetaDiscovery Platform: multi-dimensional knowledge base on human biology
Enrichment in GO and GeneGo processes
GO processes
• Resolution: list of proteins
• No connections between proteins
• No signaling/effect within process
GeneGo process networks
• Resolution: interactions between proteins
• Connections between all proteins in folder
• Clear signaling path, effect within process
• 4 samples from 4 patients
• Disease/norm from same patients
• Affy U133A arrays
Inflammation
Genes from GO processes, split by presence in GeneGo process networks
GO process                                      Total   In networks   Not in networks
“Inflammatory response”                           231           152                79
“Immune response”                                 446           247               199
“Inflammatory response” + “Immune response”       613           345               268
Genes in 15 process networks: 1,642
Genes added to networks: 1,297
Diseases
38,709 human genes total
• Human genes linked to diseases: 6,318 (17%)
• Human genes not linked to diseases: 32,391 (83%)
4,881 diseases, based on MeSH
• Diseases linked to genes: 1,630 (34%)
• Diseases with no gene links: 3,251 (66%)
6,318 genes are linked to 1,630 diseases
21,264 unique articles, indexed in PubMed
Disease tree – Neoplasms by Site
Drug toxicity tree
38 drug-induced pathological processes
Folders from MeSH
Folders created at GeneGo based on reviews
Gene-Disease connections in public domain and GeneGo
OMIM, GENE
• Only genetic info (mutations, SNPs)
• No expression
• No protein activity, localization
MeSH
• Only citations with disease names; low trust
• Only a hierarchical disease tree
Public domain does not have structured information about disease connectivity (by clinical classification) and causative relations with genes and proteins
GeneGo
• Hierarchical structure, disease classification: 4,888 diseases
• Genes associated with diseases: 6,429
• Cited articles: 33,792
Content. Cancer maps and networks. Breast Cancer: general scheme
Angiogenesis in tumor growth
Fine metabolic differences between rodents, human
Unique genes
Human
Mouse, Rat
• Unique genes and orthologs catalyse one reaction: 141 mouse genes, 74 rat genes
• Unique genes catalyse unique reactions: 9 mouse genes, 2 rat genes
• Orthologs catalyse different reactions: 1 mouse gene, 1 rat gene
There are no human orthologs for Protein A
Tools
Data analysis workflow in MetaDiscovery suite
Molecular bio data
Metabolites
Structures: sdf, MOL
HTS, HCS
ISIS DB
Custom interactions data:
- Y2H
- Pull-down
- Co-expression
- Annotation
MetaLink
PathwayEditor
MapEditor
Custom maps, networks, pathways
MetaCore/MetaDrug platform
Signature networks:
- Diseases
- Drug response
P-value scoring
Ontologies:
- GO processes
- GeneGo processes
- Canonical pathways
- Metabolic networks
- Diseases
- Toxicities
Cross-experiment comparison:
- Time series
- Multi-patient cohorts
- Multiple logical operations
- Complete report
Network alignment:
- Multiple algorithms
- Sub-network queries
SBML, BioPAX
Med. chemistry:
- Indications
- Toxicities
- Off-site effects
Biology:
- Biomarkers
- Pathway-based targets
Modeling software:
- CellDesigner
- Virtual Cell
MetaCore™ Platform
Pathway editor
Statistics for pathways, processes, networks
Data: microarrays, SAGE, proteomics, siRNA, metabolites, custom interactions
Logical operations module
Network building tools
Visualization tools
Curated interactions from the literature
Oracle-based database
Pathways Integration
Interactive, static maps
– 550 maps
– Signaling, regulation, metabolism, diseases
– Backbone of formalized “state of art” in the field
Networks of protein interactions
– Dynamic; built “on-the-fly”
– Exploratory tool
– Build new pathways for genes of interest
Choose direction and checkpoints within network building page
From – histamine
through – histamine H1 receptor
to – Actin
False discovery rate filter
Threshold: 0.01
Non-significant bars become semi-transparent
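The slide does not say which false discovery rate procedure the filter uses. As one common choice, and purely as an assumption, the sketch below applies the Benjamini-Hochberg adjustment to a handful of hypothetical enrichment p-values and keeps the terms whose adjusted value is at or below the 0.01 threshold shown on the slide. The term names and p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical enrichment p-values for five ontology terms.
p_values = {
    "term_a": 0.0001,
    "term_b": 0.003,
    "term_c": 0.03,
    "term_d": 0.2,
    "term_e": 0.009,
}
names = list(p_values)
q_values = dict(zip(names, benjamini_hochberg([p_values[n] for n in names])))

threshold = 0.01  # the threshold value shown on the slide
significant = [n for n in names if q_values[n] <= threshold]
print(significant)
```

In a bar-chart view like the one on the slide, the terms outside `significant` would be the bars drawn semi-transparent.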
New customization modules
• MapEditor: custom maps synchronized with MC/MD database
– Draw pathways maps from scratch
– Transform gene lists into networks into pathway maps
– Edit MetaCore’s canonical maps
– View and score your maps within the context of canonical maps
– Map experimental data on custom maps
• MetaLink: overlaying custom interactions
– Import custom interactions (Y2H, co-expression, pull-down, etc.)
– Visualize using GeneGo network building algorithms
– Score “unknown” proteins (high IP potential) based on relevance to “benchmark” networks built from MetaCore interactions
• PathwayEditor: annotation technology transfer, at the database level
– Custom annotation of interactions, compounds, diseases, metabolism in the framework of internal annotation system at GeneGo
– Use the annotation forms, workflows and QC system developed at GeneGo
– Novel objects are imported and integrated with pre-existing data in MetaCore
Adding Localizations
Additional localizations can be added
Your NEW map is now an interactive part of MetaCore
Users can visualize their experimental data on the new map
Mapping interaction sets on networks
Resulting Direct Interactions network
Pink interactions are from the uploaded links file
Mouse over an interaction to see the uploaded weight value
Blue interactions are in both the links file and the MetaCore database
Algorithms
Old and new ways to analyze data
Current way of analysis: all significance calculations done before mapping onto network
• Full data tables
• Statistical procedures, thresholds on fold change and p-value, either in MC or in 3rd-party tools
• Sets of genes
• Connect them on the network one way or another: too many choices, no clear way to choose
New way of analysis: significance calculations follow the mapping onto network
• Full data tables
• Apply to global network
• Statistical procedures in MC based on concurrent analysis of expression profiles and connectivity
• Sets of network modules
Samples are analyzed in pathway’s expression space
Gene     Sample 1   Sample 2   Sample 3   Sample 4
Gene 1          1          4          3          2
Gene 2          4          2          7          6
Gene 3          2          9          3          8
Gene 4          2          5          4          2
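To make "pathway's expression space" concrete, the sketch below treats each sample as a point whose coordinates are the expression values of the pathway's genes, then compares samples by distance. It assumes the slide's table has genes as rows and samples as columns; the choice of Euclidean distance is also an assumption, since the slide does not name a metric.

```python
import math

# Expression values as read from the slide, assuming rows are genes
# and columns are samples.
expression = {
    "Gene 1": [1, 4, 3, 2],
    "Gene 2": [4, 2, 7, 6],
    "Gene 3": [2, 9, 3, 8],
    "Gene 4": [2, 5, 4, 2],
}
samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4"]

def sample_vector(index):
    """Coordinates of one sample in the pathway's gene-expression space."""
    return [values[index] for values in expression.values()]

def euclidean(a, b):
    """Straight-line distance between two samples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

vectors = {name: sample_vector(i) for i, name in enumerate(samples)}

# Pairwise distances between samples within this pathway.
for i, a in enumerate(samples):
    for b in samples[i + 1:]:
        print(a, b, round(euclidean(vectors[a], vectors[b]), 2))
```

Samples that sit close together in this space behave similarly with respect to the pathway, which is the kind of grouping a per-pathway comparison of patients or treatments would look for.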
Network signatures for compounds effects
Mestranol
Tamoxifen
Phenobarbital
Phenobarbital
Finding topologically significant nodes
Nodes shown: A, B, C
Topologically significant: B
4 out of 6 nodes regulated by B are differentially expressed: more than a random share = significant
Not topologically significant: C
Only 1 out of 6 nodes regulated by C is differentially expressed: could be due to a random event = not significant
In reality the algorithm also considers nodes beyond first-degree neighbors
Legend: differentially expressed genes; non-differentially expressed genes
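The "more than a random share" argument can be illustrated with a hypergeometric tail probability over a node's direct targets. This is a sketch under assumptions, not the algorithm described in the talk: the slide says the real method also looks beyond first-degree neighbours, and the background figures used here (100 network nodes, 10 of them differentially expressed) are hypothetical.

```python
from math import comb

def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` nodes without replacement from
    `population` nodes, of which `successes` are differentially expressed."""
    upper = min(draws, successes)
    total = comb(population, draws)
    return sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    ) / total

# Hypothetical background: 100 nodes on the network, 10 differentially expressed.
N, K = 100, 10

p_b = hypergeom_tail(N, K, draws=6, observed=4)  # node B: 4 of 6 targets
p_c = hypergeom_tail(N, K, draws=6, observed=1)  # node C: 1 of 6 targets
print(p_b, p_c)
```

With this background, four hits out of six is very unlikely by chance while one hit out of six is close to a coin flip, matching the slide's labels for B and C.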
Why is JAK1 significant in this dataset?
Regulation via JAK1
Feedback loops
JAK1 provides an essential network conduit between PLAUR and many differentially expressed targets of STAT1
Topological significance helps to find important links in pathways that do not come up on HT screens
Regulation of lipid metabolism
Topologically significant nodes revealed by the new algorithm
Differentially expressed genes identified by microarray and confirmed by proteomic screen
Putting it all together: network activity inference
– Identifying causal relation between putative input and output signals
– Tracking effects of molecular perturbation through activation/inhibition cascades
[Diagram labels: predicted input; experimental data: start cascade; scoring intermediary nodes; inferred activity; experimental data: terminate cascade; predicted target]
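Tracking a perturbation through activation/inhibition cascades can be sketched as sign propagation on a directed graph: an activation edge passes the upstream state on unchanged, an inhibition edge flips it. The cascade, node names and the `propagate` helper below are hypothetical, and resolving each node by the first (shortest) path that reaches it is a simplifying assumption; the talk does not describe how conflicting paths are scored.

```python
from collections import deque

# Hypothetical signed cascade: +1 = activation, -1 = inhibition.
signed_edges = {
    "input": [("kinase", +1)],
    "kinase": [("phosphatase", -1), ("TF", +1)],
    "phosphatase": [("target_2", +1)],
    "TF": [("target_1", +1)],
}

def propagate(graph, source, source_state):
    """Predict the state (+1 up, -1 down) of nodes downstream of source.
    Each node takes the state implied by the first (shortest) path reaching it."""
    state = {source: source_state}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target, sign in graph.get(node, []):
            if target not in state:
                state[target] = state[node] * sign
                queue.append(target)
    return state

predicted = propagate(signed_edges, "input", +1)
print(predicted)
```

Comparing such predicted states with measured expression at the end of the cascade is one simple way to check whether a putative input is consistent with the observed output.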
Work in progress
• Finding patterns of significance (based on one experiment):
– Significant neighborhoods
– Significant receptors (by underlying cascade)
– Significant transcription factors (by upstream cascade)
– Significant interaction types (by distribution of expression at terminals)
• Finding common and different pathway modules (based on multiple samples):
– Looking for “differential pathways”: modules that distinguish one group of samples from another
– Finding common motifs in a group of pathway modules
• Inferring patterns of network activity
– Identifying causal relation between putative input and output signals
– Tracking effects of molecular perturbation through activation/inhibition cascades
• Looking into mutual gene-process information and Bayesian inference of significance
– If gene G occurs only in process P, its up-/down-regulation is significant evidence for inferring P’s status
– If gene G occurs in many other processes in addition to P, its up-/down-regulation is not significant evidence for inferring P’s status
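The last point can be made quantitative with a deliberately simple Bayesian model. Assume, purely for illustration, that each process is independently active with the same prior probability and that gene G is regulated exactly when at least one process containing it is active. Then the posterior that process P is active, given that G is regulated, is prior / (1 - (1 - prior)^k), where k is the number of processes containing G. The model, the prior value and the function name are assumptions, not the method under development at GeneGo.

```python
def posterior_process_active(prior, n_processes):
    """P(process P is active | gene G is regulated), assuming G is regulated
    exactly when at least one of the n_processes containing it is active and
    each process is independently active with probability `prior`."""
    p_gene_regulated = 1.0 - (1.0 - prior) ** n_processes
    return prior / p_gene_regulated

prior = 0.1  # hypothetical prior probability that any given process is active

# A gene found in one process is decisive; a gene found in many is not.
for k in (1, 2, 5, 20):
    print(k, round(posterior_process_active(prior, k), 3))
```

For k = 1 the posterior is 1, so the gene is decisive evidence for P. As k grows the posterior falls back toward the prior, meaning the gene's regulation says almost nothing about P specifically.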
Future products
MetaMiner Consortiums for 2007
• Oncology (breast cancer, 4 other cancers)
• Metabolic diseases (diabetes II, obesity, metabolic syndrome)
• CNS and neurodegenerative diseases
• Immunological and autoimmune diseases
MetaMiner consortiums: Analytical platform for disease areas
HTS, HCS
Cancer consortium labs
Cancer-relevant annotations, databases
Active compounds analysis, screening
MetaMiner (Oncology) platform
Biomarkers:
- Combination of different types
- Expression
- Secreted proteins
- Metabolites
- Convergence hubs (core effectors)
Drug targets:
- Divergence hubs on networks
- “Druggability” testing
- Pathways connectivity
Data parsing, normalization
Experimental data depository
• Maps for disease, processes, drug action
• Custom maps for projects
Compounds scoring:
- Indications
- Toxicities
- Off-site effects
Data analysis
MetaTox consortium. Functional descriptors
Mapping on descriptors
Enrichment by category
Pathways maps
Toxicity, process maps
Sub-networks, modules, nodes
Predictive models
Indexing & scoring by tox. category