Probabilistic models for interpreting
perturbations in networks: Nested
effect models
Sushmita Roy
[email protected]
Computational Network Biology
Biostatistics & Medical Informatics 826
Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu
Dec 6th 2016
Types of algorithms used to examine
perturbations in networks
• Graph diffusion followed by subnetwork finding methods
– HOTNET
– NETBAG
• Information flow-based methods (also widely used for
integrating different types of data)
– Prize-collecting Steiner tree
– Min cost max flow
• Probabilistic graphical model-based methods
– Factor graphs
– Nested Effect Models (NEMs)
Probabilistic graphical models for interpreting
network perturbations
• C.-H. H. Yeang, T. Ideker, and T. Jaakkola, "Physical network
models," Journal of Computational Biology, vol. 11, no. 2-3, pp. 243-262, Mar. 2004.
• F. Markowetz, D. Kostka, O. G. Troyanskaya, and R. Spang,
"Nested effects models for high-dimensional phenotyping
screens," Bioinformatics, vol. 23, no. 13, pp. i305-i312, Jul. 2007.
• C. J. Vaske, C. House, T. Luu, B. Frank, C.-H. H. Yeang, N. H.
Lee, and J. M. Stuart, "A factor graph nested effects model to
identify networks from genetic perturbations," PLoS
Computational Biology, vol. 5, no. 1, e1000274, Jan. 2009.
Motivation of nested effect models
• Perturbation of genes followed by high-throughput profiling
of different phenotypes can be used to characterize functions
of genes
• However, most genes do not function independently but
interact in a network to drive a particular function
• Phenotypic measurements (e.g. mRNA levels) are indirect
measurements of the underlying network structure
– Includes direct and indirect effects
• Given perturbation data from multiple genes, can we more
systematically identify the functions of these genes and how
they interact at a pathway level?
Problem overview
• Given
– global measurements of gene expression after single gene
deletions of multiple genes
• Do
– Infer interactions between genes with deletions to enable
further characterization of these genes
• Nested Effect Models are probabilistic model-based
approaches to solve this problem
Nested Effect Models
Markowetz et al, 2007
Nested Effect Models: Key properties
• A generalization of similarity based clustering
• Orders the clusters according to subset relationships
– A gene A is upstream of another gene B if B’s effects are a
subset of A’s effects
• Build a hierarchy of all perturbed genes by constructing from
smaller sub-models of pairs and triplets of genes
Subset relationships to order genes
A complete model. The left part of the figure shows a complete model M’xyz consisting
of a transitively closed graph between genes and assignments of genes to specific
effects (the dashed arrows). Given the complete model, we can formulate a prediction
of what effects to expect: perturbing x should cause all effects, while perturbing y
should only cause E3–E6, and perturbing z only E5 and E6 (middle plot). In reality, our
observations will be noisy: there can be false positive (FP) and false negative (FN) effect
observations (right plot).
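The prediction step described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the attachment of E1 to E6 is chosen only to match the caption, and the false-positive rate `alpha` and false-negative rate `beta` are assumed values.

```python
import math

# Transitively closed S-gene graph from the caption: x -> y -> z (and x -> z).
# downstream[s] lists every S-gene whose effects perturbing s should reproduce,
# including s itself.
downstream = {"x": {"x", "y", "z"}, "y": {"y", "z"}, "z": {"z"}}

# Hypothetical attachment of effects to S-genes, chosen so that perturbing y
# shows E3-E6 and perturbing z shows E5-E6, as in the caption.
attachment = {"E1": "x", "E2": "x", "E3": "y", "E4": "y", "E5": "z", "E6": "z"}


def predicted_effects(s_gene):
    """Effects expected when s_gene is perturbed, under the complete model."""
    return {e for e, parent in attachment.items() if parent in downstream[s_gene]}


def is_upstream(a, b):
    """a is upstream of b when b's predicted effects are a subset of a's."""
    return predicted_effects(b) <= predicted_effects(a)


def log_likelihood(observed, alpha=0.1, beta=0.1):
    """Log-likelihood of observed binary effects given the model.

    observed maps each S-gene to the set of effects seen after perturbing it.
    alpha is an assumed false-positive rate, beta an assumed false-negative rate.
    """
    total = 0.0
    for s_gene, seen in observed.items():
        expected = predicted_effects(s_gene)
        for effect in attachment:
            if effect in expected:
                p = 1 - beta if effect in seen else beta
            else:
                p = alpha if effect in seen else 1 - alpha
            total += math.log(p)
    return total
```

Calling `predicted_effects("y")` returns the four effects E3 to E6, and an observation table with a missing or spurious effect scores a lower `log_likelihood` than the noise-free table, which is how noisy data can still be ranked against candidate models.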
Key properties of Factor Graph-NEMs (FG-NEMs)
• NEMs assume the genes that are perturbed interact in a
binary manner
• But many interactions have a sign
– inhibitory or stimulating action
• FG-NEMs capture a broader set of interactions among the
perturbed genes
• Formulation based on a factor graph
– Provides an efficient search over the space of NEMs
Notation
• S-genes: Set of genes that have been deleted individually
• E-genes: Set of effector genes that are measured
• Θ: The attachment of an effector gene to the S-gene network
• Φ: The interaction matrix of S-genes
• X: The phenotypic profile, each column gives the difference in
expression in a knockout compared to wild type
– Rows: E-genes
– Columns: S-genes
• Y: Hidden effect matrix, each entry is in {-1, 0, +1} and
specifies whether an S-gene affects the E-gene
An example of 4 S-genes and 13 E-genes
[Figure: S-gene network Φ, attachments Θ, and expression matrix X (rows: E-genes, columns: S-genes)]
• A → C is reflected in the scatter plot: when X_A is up, X_C is up; when X_A is down, X_C is down or shows no change
• B ⊣ D is also reflected in the scatter plot: X_D is a subset of opposite changes from X_B
S-gene interaction modes and their
expression signatures
[Figure panels: interaction mode; expected trend in E-genes for specific interaction modes; expression levels, with regions consistent and inconsistent with expectation]
Figure 1. Predicting Pair-wise Interaction Using Quantitative Nested Effects. (A) Hypothetical example with four S-genes, A, B, C, and D. The graph contains one inhibitory link, B ⊣ D (left). A heatmap of E-gene expression under knockdown of each S-gene shows both inhibitory and stimulatory effects (middle). Scatter plots of the C, A, B, and D knock-outs show that expression fits in the shaded preferred regions (right). The inhibitory link explains some of the "observed" data: expression changes under ΔD occur in a subset of the E-genes for which the opposite changes occur in ΔB. (B) Data from a known inhibitory interaction: DIG1/DIG2 is known to inhibit STE12. (C) Interaction modes. Observed E-gene expression changes are compared to five possible types of interactions between two S-genes, A and B (i–v).
Factor graphs
• A type of graphical model
• A bipartite graph with variable nodes and factor nodes
• Edges connect variables to potentials that the variables are
arguments of
• Represents a global function as product of smaller local
functions
• Perhaps the most general graphical model
– Bayesian networks and Markov networks have factor graph
representations
Example factor graph
[Figure: example factor graph with variable nodes and factor nodes. From Kschischang, Frey, Loeliger 2001]
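To make the "global function as a product of smaller local functions" idea concrete, here is a small sketch of a factor graph over three binary variables. The variable names and factor values are invented for illustration; the exhaustive search is only practical for toy sizes.

```python
from itertools import product

# A toy factor graph over three binary variables a, b, c.
# Each factor is (scope, function); the global function is the product
# of all factors. The factor values are made up for illustration.
variables = ["a", "b", "c"]
factors = [
    (("a",), lambda a: 0.6 if a == 1 else 0.4),
    (("a", "b"), lambda a, b: 0.9 if a == b else 0.1),
    (("b", "c"), lambda b, c: 0.8 if b == c else 0.2),
]

# Bipartite structure: every edge joins a variable node to a factor node.
edges = [(v, i) for i, (scope, _) in enumerate(factors) for v in scope]


def global_function(assignment):
    """Product of all local factors evaluated at one full assignment."""
    value = 1.0
    for scope, fn in factors:
        value *= fn(*(assignment[v] for v in scope))
    return value


def brute_force_map():
    """Most likely assignment found by enumerating every configuration."""
    best, best_value = None, -1.0
    for values in product([0, 1], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        value = global_function(assignment)
        if value > best_value:
            best, best_value = assignment, value
    return best, best_value
```

Enumeration grows exponentially with the number of variables, which is why message passing on the graph structure is the usual alternative.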
Probabilistic model for NEMs
• Goal is to find a network, Φ and Θ, that best fits the observed
data (X)
• This is an inference problem
• Use a maximum a posteriori (MAP) approach
• X is a noisy measurement of Y; Y is hidden, so it is the quantity we need to sum over
• Assume that each E-gene is attached to a single S-gene and that each E-gene observation
vector across the knock-downs is independent of other E-gene
observations. The maximization function can then be written:
Probabilistic model continued
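$$
J(X) = \max_{\Phi,\Theta} \left\{ P(\Phi) \sum_{Y} \prod_{e \in E} P(Y_e \mid \Phi, \theta_e)\, P(X_e \mid Y_e) \right\} \tag{5}
$$
Independence over all E-genes
$$
= \max_{\Phi,\Theta} \left\{ P(\Phi) \prod_{e \in E} \sum_{Y_e} P(Y_e \mid \Phi, \theta_e)\, P(X_e \mid Y_e) \right\} \tag{6}
$$
Re-arranging the terms
$$
= \max_{\Phi,\Theta} P(\Phi) \prod_{e \in E} L_e \tag{7}
$$
where X_e and Y_e are the row vectors of data and hidden states for
E-gene e respectively, and θ_e records the attachment point of e.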
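The "independence over all E-genes" step swaps a sum over joint hidden states for a product of per-E-gene sums. The sketch below checks that identity numerically on made-up numbers; each table `f[e]` is a hypothetical stand-in for the per-E-gene term, reduced to a single hidden state per E-gene to keep the enumeration small.

```python
import math
from itertools import product

STATES = (-1, 0, 1)

# Made-up, unnormalized per-E-gene terms f_e(y), one table per E-gene.
f = [
    {-1: 0.2, 0: 0.5, 1: 0.1},
    {-1: 0.1, 0: 0.1, 1: 0.6},
    {-1: 0.4, 0: 0.3, 1: 0.2},
]


def sum_of_products(terms):
    """Sum over joint configurations of the product over E-genes."""
    return sum(
        math.prod(t[y] for t, y in zip(terms, ys))
        for ys in product(STATES, repeat=len(terms))
    )


def product_of_sums(terms):
    """Product over E-genes of each E-gene's own sum."""
    return math.prod(sum(t.values()) for t in terms)
```

The first form touches 3^n joint configurations for n E-genes, the second only 3n table entries, which is the practical payoff of the independence assumption.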
How to compute L_e?
Digging inside the L_e term
• Note: Y_e = {Y_eA, Y_eB, Y_eC, ..., Y_eN}, where N is the total number of S-genes
• Previous approaches decompose L_e over the knock-downs, which assumes the S-gene observations are independent given the network and attachments (see [18] for an example of such a derivation)
• To facilitate scoring the expanded set of interaction modes, define L'_e, proportional to L_e, using a set of pairwise potentials (a product of pair-wise S-gene terms):
$$
L'_e = \prod_{A,B \in S} \sum_{Y_{eA}, Y_{eB}} P(Y_{eA}, Y_{eB} \mid \phi_{AB}, \theta_{eAB})\, P(X_{eA} \mid Y_{eA})\, P(X_{eB} \mid Y_{eB}) \tag{8}
$$
• φ_AB: the S-gene interaction; it defines the mode of interaction between S-genes A and B
• θ_eAB: the attachment of E-gene e with respect to A or B, referred to as e's local attachment
• Both θ_eAB and φ_AB are indexed by the unordered pair {A, B}, so that φ_AB and φ_BA are references for the same variable
• Assuming the replicates are independent given the E-gene states, P(X_eA | Y_eA) can be written as a product over replicate terms, $\prod_{r \in R_A} P(X_{eAr} \mid Y_{eA})$, where P(X_eAr | Y_eA) is modeled with a Gaussian distribution having mean μ·Y_eA and standard deviation σ estimated from the data (see Text S1)
• Substituting L'_e for L_e into Eq. (7) and distributing the maximization over attachment points gives the maximizing function used in the approach
Now the joint can be written in a more tractable way
$$
J(X) = \max_{\Phi} \left\{ P(\Phi) \prod_{e \in E,\; A,B \in S} \max_{\theta_{eAB}} \sum_{Y_{eA}, Y_{eB}} P(Y_{eA}, Y_{eB} \mid \phi_{AB}, \theta_{eAB})\, P(X_{eA} \mid Y_{eA})\, P(X_{eB} \mid Y_{eB}) \right\} \tag{9}
$$
Each of these conditional distributions will correspond to a factor
Defining the factors
• P(X_eA | Y_eA) and P(X_eB | Y_eB): modeled as Gaussian distributions
• P(Y_eA, Y_eB | φ_AB, θ_eAB): four-variable factor, over discrete variables
– Y_eA, Y_eB: hidden effect states
– φ_AB: four values, one for each possible type of interaction: inhibitory, activating, equivalent, no interaction
– θ_eAB: interaction of e with A or B: inhibited or activated by A or B, or no action
• This factor has value 1 if the E-gene e is attached to either A or B and e's state is consistent with the interaction mode between A and B
• If e's state is inconsistent with the interaction and attachment, the factor has value 0
• These hard constraints distinguish consistent from inconsistent expression changes
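The two kinds of factor described above, Gaussian observation terms and a 0/1 interaction factor, combine into one pairwise term per E-gene and S-gene pair. The following sketch shows that combination under strong assumptions: `mu` and `sigma` are fixed illustrative values rather than estimates from data, and the two consistency rules are hypothetical stand-ins for the interaction-factor tables, which are defined in the paper's supplementary text and not reproduced here.

```python
import math
from itertools import product

STATES = (-1, 0, 1)


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def replicate_likelihood(replicates, y, mu=1.5, sigma=0.5):
    """P(X_eA | Y_eA = y): product over replicates, each Gaussian with mean mu * y."""
    return math.prod(gaussian_pdf(x, mu * y, sigma) for x in replicates)


def pairwise_term(x_a, x_b, consistent):
    """Sum over (Y_eA, Y_eB) of factor * P(X_eA | Y_eA) * P(X_eB | Y_eB).

    consistent(y_a, y_b) plays the role of the 0/1 interaction factor for one
    fixed choice of interaction mode and local attachment.
    """
    return sum(
        replicate_likelihood(x_a, y_a) * replicate_likelihood(x_b, y_b)
        for y_a, y_b in product(STATES, repeat=2)
        if consistent(y_a, y_b)
    )


# Hypothetical hard constraints, for illustration only.
def same_direction(y_a, y_b):
    return y_a == y_b


def opposite_direction(y_a, y_b):
    return y_a == -y_b
```

With replicate values such as `x_a = [1.4, 1.7]` and `x_b = [-1.6, -1.2]`, the `opposite_direction` rule scores far higher than `same_direction`, which is the kind of evidence that would favor a signed (inhibitory) mode for the pair.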
The prior over S-gene graph
• The prior P(Φ) can incorporate prior knowledge of interactions among genes in pathways
• At its simplest, it should encode a transitivity relationship to force all pairwise interactions to be consistent among all triples
• Using transitivity, all paths between any two genes, A and B, are guaranteed to have the same overall effect; i.e. the product of the signs of individual links along different paths between A and B are equal
• To preserve transitivity of identified interaction modes, the prior is decomposed into transitivity constraints on all triples of S-genes:
$$
P(\Phi) \propto \prod_{A,B,C \in S} \tau_{ABC}(\phi_{AB}, \phi_{BC}, \phi_{AC}) \prod_{A,B \in S} \rho_{AB}(\phi_{AB})
$$
– τ_ABC: transitivity constraint for triples; τ is zero if the triple of interactions is intransitive, and one if the interactions are transitive (see Text S1 for the full definition)
– ρ_AB: physical network constraints
• Example transitivity: if A → B and B → C, then A → C
• Using transitivity constraints forces the search to find consistent models that best explain the observed changes
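A transitivity factor of this kind can be sketched as a small function. The rule below is a simplification assumed for illustration (the full definition is in the paper's Text S1 and is not reproduced here): modes are reduced to signed directed links, and the pairwise terms ρ are set to 1.

```python
from itertools import product

# Simplified signed modes for a directed pair:
# +1 activating, -1 inhibitory, 0 no interaction.
MODES = (-1, 0, 1)


def tau(phi_ab, phi_bc, phi_ac):
    """Return 1 if the triple is transitive under the simplified rule, else 0.

    Rule assumed here: when A->B and B->C both exist, A->C must exist and its
    sign must equal the product of the two signs along the path A->B->C.
    """
    if phi_ab != 0 and phi_bc != 0:
        return 1 if phi_ac == phi_ab * phi_bc else 0
    return 1


def unnormalized_prior(phi, genes):
    """Product of tau over ordered triples of distinct S-genes.

    phi maps ordered pairs (a, b) to a mode; missing pairs mean no interaction.
    The pairwise terms rho are taken to be 1 (no physical-network evidence).
    """
    value = 1
    for a, b, c in product(genes, repeat=3):
        if len({a, b, c}) == 3:
            value *= tau(phi.get((a, b), 0), phi.get((b, c), 0), phi.get((a, c), 0))
    return value
```

For example, `{("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1}` gets prior weight 1, while dropping the A to C link gives 0, so intransitive networks are excluded from the search.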
Factor graph representation of NEMs
[Figure annotation: Prior]
Figure 2. Structure of the factor graph for network inference. The factor graph consists of three classes of variables (circles) and three …
Inference on the factor graph
• Find the most likely configurations
• Use a message passing algorithm (standard for factor graphs)
• Called the Max-Product algorithm
• Message passing happens in two steps
– Messages are passed from observations X_eA to the …
– Messages are passed between the interaction and
transitivity factors until convergence
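As a rough illustration of max-product message passing, the sketch below runs it on a three-variable chain, where a single forward pass is exact, and compares the result with exhaustive search. The chain and its factor values are invented; the FG-NEM factor graph contains loops, so there the messages are iterated until convergence rather than computed in one pass.

```python
from itertools import product

STATES = (0, 1)

# Toy chain a - b - c with made-up unary factors.
unary = {
    "a": {0: 0.4, 1: 0.6},
    "b": {0: 0.5, 1: 0.5},
    "c": {0: 0.7, 1: 0.3},
}


def pair(u, v):
    """Pairwise factor favouring equal neighbours."""
    return 0.9 if u == v else 0.1


def score(a, b, c):
    """Global function: product of all unary and pairwise factors."""
    return unary["a"][a] * unary["b"][b] * unary["c"][c] * pair(a, b) * pair(b, c)


def max_product_chain():
    """Pass max-messages a -> b -> c, then backtrack the maximizing states."""
    # Message into b: best achievable score of a for each state of b.
    m_ab = {b: max(unary["a"][a] * pair(a, b) for a in STATES) for b in STATES}
    arg_ab = {b: max(STATES, key=lambda a: unary["a"][a] * pair(a, b)) for b in STATES}
    # Message into c: best achievable score of (a, b) for each state of c.
    m_bc = {c: max(m_ab[b] * unary["b"][b] * pair(b, c) for b in STATES) for c in STATES}
    arg_bc = {
        c: max(STATES, key=lambda b: m_ab[b] * unary["b"][b] * pair(b, c)) for c in STATES
    }
    c = max(STATES, key=lambda s: m_bc[s] * unary["c"][s])
    b = arg_bc[c]
    a = arg_ab[b]
    return {"a": a, "b": b, "c": c}, m_bc[c] * unary["c"][c]


def brute_force():
    """Reference answer by enumerating all eight configurations."""
    best = max(product(STATES, repeat=3), key=lambda s: score(*s))
    return dict(zip("abc", best)), score(*best)
```

Both functions return the same assignment and value, but the message-passing version only ever maximizes over one variable at a time.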
Does FG-NEM capture activating and
inhibitory relationships?
• FG-NEM: captures inhibitory and activating relationships
• uFG-NEM: captures only unsigned interactions
• FG-NEM AVT: FG-NEM run on absolute value data
• Solid lines: structure recovery
• Dashed lines: sign recovery
• To make the comparison of FG-NEM to uFG-NEM fair, network recovery was measured in two ways. For structure recovery, a predicted interaction was called correct if it matched an interaction (of either sign) in the simulated network
• Given these results, FG-NEMs are expected to have significantly better performance on real genetic networks where inhibition exists (see Figure S1)
Does FG-NEM expand pathways better
than the baseline approach?
[Figure: right shift in percentile rank difference of NEM methods]
Pathway expansion
• Attach new E-genes to the S-gene network
• Attaching a gene e to S-gene s asserts that e is directly downstream of s
• All E-genes attached to the S-gene network are called frontier genes
• An E-gene's connectivity is examined based on the log-likelihood attachment ratio (LAR):
$$
\mathrm{LAR}(e) = \log \left( \frac{\max_{i \neq 0} P(X_e \mid \Phi, \theta_e = i)}{P(X_e \mid \Phi, \theta_e = 0)} \right)
$$
– Numerator: e is attached to one of the S-genes
– Denominator: e is disconnected from the network, i.e. its likelihood was generated entirely by the background Gaussian distribution
• The LAR measures the degree to which e's expression data is explained by the network if it is attached to one of the S-genes compared to being disconnected
• The pair-wise attachments θ_eAB for a single E-gene provide local "best guesses" for e's attachment. Rather than aggregating e's collection of local attachments, NEM scoring, modified to incorporate both stimulatory and inhibitory attachments, is used to estimate the attachment point using the full network learned in the previous step (see Text S1)
• θ_e here represents Markowetz et al.'s attachment parameter expanded to include inhibitory and stimulatory attachments
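A log-likelihood attachment ratio of this form can be sketched as follows. Everything numeric here is assumed for illustration: the Gaussian parameters `mu` and `sigma`, and the table of expected states per candidate attachment point, which in the actual method would follow from the learned network.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log-density of a normal distribution at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_likelihood(x_e, expected_states, mu=1.5, sigma=0.5):
    """log P(X_e | attachment), one Gaussian term per knock-out."""
    return sum(gaussian_logpdf(x, mu * y, sigma) for x, y in zip(x_e, expected_states))


def lar(x_e, expected_by_attachment):
    """Return (LAR, best attachment point) for one E-gene.

    The background model (theta_e = 0) expects no change in any knock-out.
    """
    background = log_likelihood(x_e, [0] * len(x_e))
    scores = {s: log_likelihood(x_e, states) for s, states in expected_by_attachment.items()}
    best = max(scores, key=scores.get)
    return scores[best] - background, best


# Hypothetical expected states of an E-gene across knock-outs of A, B, C,
# for each candidate attachment point (not derived from a real network).
expected_by_attachment = {
    "A": [-1, 0, 0],
    "B": [-1, -1, 0],
    "C": [-1, -1, -1],
}
```

An E-gene whose profile is `[-1.4, -1.6, 0.1]` gets a positive LAR with best attachment "B", while a profile near zero in every knock-out gets a negative LAR and would be left off the network.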
FG-NEM based pathway expansion in
yeast
• Template matching: rank E-genes based on similarity in expression to an “idealized template”
FG-NEM infers a more accurate network
than the unsigned version in yeast
• FG-NEM and uFG-NEM networks inferred in
the ion-homeostasis pathway
• FG-NEM inferred more genes associated with
ion homeostasis compared to uFG-NEM
FG-NEM application to colon cancer
• Most interactions in the S-gene network are activating
• Novel expanded genes that have significant effect on the invasive phenotype
Figure 5. Invasive colon cancer network predictions. (A) Expression changes of selected E-genes following targeted S-gene knock-downs in …
Summary
• FG-NEMs: A general approach to infer an ordering of genes
from knock-down phenotypes
• Strengths
– FG-NEMs could be used in an iterative computational-experimental framework
– Handles signed interactions between S-genes
• Weaknesses
– Computational complexity of the inference procedure
might be high
– Requires independence among E-genes
– Models pairs of S-genes at a time
Concluding remarks
• We have seen a suite of problems, algorithms and applications in a
real setting
• These ranged from network inference, dynamic network inference,
network modules, network alignment and network-based
interpretation
• What we did not see (or saw less of)
– Integration of different types of networks
– Experimental design for better learning of networks
– More topological properties of networks
– Subnetwork network analysis
• If you remain interested in these topics or would like to learn more,
send me an email