Network inference from repeated
observations of node sets
Neil Clark, Avi Ma'ayan
Network Inference
Protein-Protein interaction network
Cell signaling network
Overview
• Network inference - the deduction of an
underlying network of interactions from indirect
data.
1. A general class of network inference problem
2. Network inference approach
3. Applications:
   1. Inference of physical interactions: PPI
   2. Inference of gene associations: stem cell genes
   3. Inference of statistical interactions: drug/side-effect network
GMT files
The inference problem
• Input: sets of entities (genes or proteins or ...) in the form
of a GMT file, holding the results of experiments or, more
generally, of sampling.
• Assumptions:
• 1 An underlying network exists which describes the
interactions between the entities in the GMT file
• 2 Each line of the GMT file contains information on the
connectivity of the underlying network
• The problem: Given a GMT file can we extract enough
information to resolve the underlying network?
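To make the input concrete, here is a minimal sketch of reading a GMT file into memory. It assumes the commonly used tab-separated GMT layout (set name, description, then the members); the function name `parse_gmt` and the toy entity names are illustrative and do not come from the talk.

```python
from typing import Dict, List


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Parse GMT-formatted text into {set_name: [members, ...]}.

    Assumed layout of each line (tab-separated):
        set_name <TAB> description <TAB> member_1 <TAB> member_2 ...
    """
    sets: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        fields = [field.strip() for field in raw.split("\t")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        # Drop empty fields and repeated members, keeping first-seen order.
        members = list(dict.fromkeys(f for f in fields[2:] if f))
        sets[fields[0]] = members
    return sets


# Toy input: two "experiments", each observing a set of entities.
example = "exp1\tna\tA\tB\tC\nexp2\tna\tB\tC\tD\n"
gmt = parse_gmt(example)
```

Each value in `gmt` is one repeated observation of a node set, which is the only information the inference has to work with.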
A synthetic example
Approach...
• Forget for the moment that we know the underlying network and
pretend we only have the GMT file.
• Attempt to use the accumulation of our coarse data to infer the fine
details of the underlying network.
• Consider the set of all networks that are consistent with our data;
there are likely to be many.
• Use an algorithm to sample this ensemble of networks randomly.
• The mean adjacency matrix gives the probability of each link being
present within the ensemble.
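The sampling idea above can be sketched as follows. The slides do not say what "consistent" means or which sampler is used, so this sketch makes two loud assumptions: a network counts as consistent when the entities on every GMT line are connected to one another, and each sample is drawn by placing a random tree on every line. The names `sample_network` and `mean_adjacency` are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Sequence, Set

Edge = FrozenSet[str]


def sample_network(lines: Sequence[Sequence[str]], rng: random.Random) -> Set[Edge]:
    """Draw one network in which every line is connected (assumed constraint)."""
    edges: Set[Edge] = set()
    for members in lines:
        order = list(dict.fromkeys(members))  # distinct members only
        rng.shuffle(order)
        # Attach each node to a randomly chosen earlier node: a random tree.
        for pos in range(1, len(order)):
            partner = order[rng.randrange(pos)]
            edges.add(frozenset((order[pos], partner)))
    return edges


def mean_adjacency(
    lines: Sequence[Sequence[str]], n_samples: int = 2000, seed: int = 0
) -> Dict[Edge, float]:
    """Fraction of sampled networks that contain each edge."""
    rng = random.Random(seed)
    counts: Dict[Edge, int] = {}
    for _ in range(n_samples):
        for edge in sample_network(lines, rng):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_samples for edge, c in counts.items()}


lines = [["A", "B", "C"], ["B", "C", "D"], ["B", "C"]]
probs = mean_adjacency(lines)
```

In this toy input the pair B, C appears alone on one line, so every sampled network must link them and their estimated probability is 1. Pairs that never share a line, such as A and D, never receive an edge.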
Inference live!
Information content
Analytic Approximation
• Real data typically involve large numbers of nodes.
• The sample space of networks can then be very large, which makes
sampling computationally demanding.
• Write a simple analytical approximation which mimics the action of
the algorithm.
$$p_{ij} = 1 - \prod_{k}\left(1 - \frac{2\alpha}{n_{ijk}}\right)$$
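A short sketch of evaluating the approximation, read here as p_ij = 1 - prod_k (1 - 2α / n_ijk). The slide does not define its symbols, so two things are assumptions: n_ijk is taken to be the number of entities on the k-th GMT line that contains both i and j, and α is treated as a free parameter with a default of 1. Clamping each factor to the range [0, 1] is also a choice made for this sketch.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence

Edge = FrozenSet[str]


def analytic_edge_probs(
    lines: Sequence[Sequence[str]], alpha: float = 1.0
) -> Dict[Edge, float]:
    """Evaluate p_ij = 1 - prod_k (1 - 2*alpha / n_ijk) for co-occurring pairs."""
    not_linked: Dict[Edge, float] = {}
    for members in lines:
        unique = sorted(set(members))
        n = len(unique)  # assumed meaning of n_ijk: size of this line
        if n < 2:
            continue
        # Probability that this line does NOT contribute the edge (clamped).
        factor = min(1.0, max(0.0, 1.0 - 2.0 * alpha / n))
        for a, b in combinations(unique, 2):
            pair = frozenset((a, b))
            not_linked[pair] = not_linked.get(pair, 1.0) * factor
    return {pair: 1.0 - q for pair, q in not_linked.items()}


lines = [["A", "B", "C"], ["B", "C", "D"], ["B", "C"]]
p = analytic_edge_probs(lines)
```

Under these assumptions, a pair seen together on a single three-member line gets probability 2/3, and a pair that appears alone on a two-member line gets probability 1.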
Compare analytic approximation
Correction for sampling bias
• Destroy any information by randomly permuting the GMT file, then
compare each actual edge weight to the distribution of edge weights
obtained from the randomly permuted GMT files.
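The permutation comparison can be sketched like this. The slides do not specify how the GMT file is permuted or how the comparison is scored, so this sketch assumes that each line is refilled with randomly chosen entities while keeping its size, and that the comparison is summarized as a z-score against the permuted weights. The helper names are hypothetical, and `edge_weights` uses the analytic approximation with the same assumed reading of its symbols (line size as n, α = 1 by default).

```python
import random
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, FrozenSet, List, Sequence

Edge = FrozenSet[str]


def edge_weights(lines: Sequence[Sequence[str]], alpha: float = 1.0) -> Dict[Edge, float]:
    """Analytic edge weight: 1 - prod over shared lines of (1 - 2*alpha/n)."""
    not_linked: Dict[Edge, float] = {}
    for members in lines:
        unique = sorted(set(members))
        if len(unique) < 2:
            continue
        factor = min(1.0, max(0.0, 1.0 - 2.0 * alpha / len(unique)))
        for a, b in combinations(unique, 2):
            pair = frozenset((a, b))
            not_linked[pair] = not_linked.get(pair, 1.0) * factor
    return {pair: 1.0 - q for pair, q in not_linked.items()}


def permute_lines(lines: Sequence[Sequence[str]], rng: random.Random) -> List[List[str]]:
    """Refill every line with random entities, preserving line sizes only."""
    universe = sorted({m for line in lines for m in line})
    return [rng.sample(universe, len(set(line))) for line in lines]


def permutation_z_scores(
    lines: Sequence[Sequence[str]], n_perm: int = 200, seed: int = 0
) -> Dict[Edge, float]:
    """z-score of each observed weight against its permuted distribution."""
    rng = random.Random(seed)
    observed = edge_weights(lines)
    null: Dict[Edge, List[float]] = {pair: [] for pair in observed}
    for _ in range(n_perm):
        shuffled = edge_weights(permute_lines(lines, rng))
        for pair in observed:
            null[pair].append(shuffled.get(pair, 0.0))
    scores: Dict[Edge, float] = {}
    for pair, weight in observed.items():
        spread = pstdev(null[pair])
        scores[pair] = (weight - mean(null[pair])) / spread if spread > 0 else 0.0
    return scores


lines = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "E"],
         ["A", "B", "F"], ["C", "G", "H"], ["D", "E", "F"]]
z = permutation_z_scores(lines)
```

A permutation scheme that also preserves how often each entity appears would be a stricter null; the size-only shuffle is used here only to keep the sketch short.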
Application to Infer PPIs
Malovannaya A et al. Analysis of the human endogenous coregulator complexome.
Cell. 2011 May 27;145(5):787-99
PPI network
Validation
• Compare inferred PPI network to the following
databases:
– BioCarta
– HPRD PPI
– InnateDB
– IntAct
– KEGG
– MINT mammalia
– MIPS
– BioGrid
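One simple way to carry out such a comparison is to ask what fraction of the highest-scoring inferred edges already appear in a reference set of known interactions. The slides do not state which metric was used, so precision at k is an assumption here, and the scores and reference pairs below are toy placeholders rather than values from any of the listed databases.

```python
from typing import Dict, FrozenSet, Set

Edge = FrozenSet[str]


def precision_at_k(scored_edges: Dict[Edge, float], reference: Set[Edge], k: int) -> float:
    """Fraction of the k highest-scoring edges that are present in the reference."""
    ranked = sorted(scored_edges.items(), key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if pair in reference)
    return hits / len(ranked)


# Toy placeholders: inferred edge scores and a set of "known" interactions.
scored = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.2,
    frozenset(("C", "D")): 0.1,
}
known = {frozenset(("A", "B")), frozenset(("C", "D"))}

top1 = precision_at_k(scored, known, 1)
top2 = precision_at_k(scored, known, 2)
```

Storing edges as `frozenset` pairs makes the comparison independent of the order in which the two interactors are listed.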
Comparison
Validation
Application to stem cells
• We used two types of high-throughput data from the
ESCAPE database (www.maayanlab.net/ESCAPE).
• ChIP-X data: from ChIP-chip and ChIP-seq experiments.
– 203,190 protein-DNA binding interactions in the proximity
of coding regions from 48 ESC-relevant source proteins.
• Logof followed by microarray data: a manually
compiled database of protein-mRNA regulatory
interactions derived from loss-of-function/gain-of-function
experiments followed by microarray profiling.
– 154,170 interactions from 16 ESC-relevant regulatory
proteins from loss-of-function studies, and 54 from
gain-of-function studies.
ChIP-X network
Logof network
Combining networks
• Each data source gives a different perspective on
the associations between the genes
• Combining the different perspectives may yield new
insights: for example, small but consistent associations
across perspectives are revealed by the enhanced
signal-to-noise ratio.
$$p_{ij} = 1 - \prod_{k_1}\left(1 - \frac{2\alpha}{n_{ijk_1}}\right)\prod_{k_2}\left(1 - \frac{2\beta}{n_{ijk_2}}\right)\cdots$$
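A sketch of the combination, read here as one product of "not linked" factors per data source, each source with its own weight (α for the first, β for the second, and so on). As before, taking n to be the size of each line that contains both entities is an assumption, and the function name, gene labels and weight values are placeholders.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Tuple

Edge = FrozenSet[str]
Source = Tuple[Sequence[Sequence[str]], float]  # (GMT lines, weight such as alpha or beta)


def combined_edge_probs(sources: Sequence[Source]) -> Dict[Edge, float]:
    """p_ij = 1 - product over sources and shared lines of (1 - 2*weight/n)."""
    not_linked: Dict[Edge, float] = {}
    for lines, weight in sources:
        for members in lines:
            unique = sorted(set(members))
            n = len(unique)  # assumed meaning of n: size of this line
            if n < 2:
                continue
            factor = min(1.0, max(0.0, 1.0 - 2.0 * weight / n))
            for a, b in combinations(unique, 2):
                pair = frozenset((a, b))
                not_linked[pair] = not_linked.get(pair, 1.0) * factor
    return {pair: 1.0 - q for pair, q in not_linked.items()}


# Placeholder inputs standing in for two perspectives on the same genes.
first_source = [["G1", "G2", "G3", "G4"]]
second_source = [["G1", "G2", "G5", "G6"]]
combined = combined_edge_probs([(first_source, 1.0), (second_source, 0.5)])
```

The pair G1, G2 is supported by both sources, so its combined probability (0.625 here) exceeds what either source gives alone (0.5 and 0.25), which is the signal-to-noise gain the slide describes.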
Combination of ChIP-X and Logof
An extension of the approach...
Application II: Inference of Network of
statistical relationships in AERS database
• Adverse Event Reporting System (AERS) database contains records of
....
Record          Drugs                  Side effects
AERS Record 1   Drug 1, Drug 2, ...    Side-effect 1, Side-effect 2, ...
AERS Record 2   Drug 3, Drug 4, ...    Side-effect 3, Side-effect 4, ...
…               …                      …
AERS sub network
AERS Large-scale Adjacency Matrix
And finally…
Summary
• We described a general class of problem in network
inference.
• A network of physical interactions between proteins is
inferred based on high-throughput IP/MS experiments
• The method has been applied to examine associations
between stem-cell genes from multiple perspectives
• We have begun to apply the approach to the inference
of statistical interactions between drugs and side effects,
based on the AERS database
• More details can be found on the website:
www.maayanlab.net/S2N