Genetic network inference:
from co-expression clustering
to reverse engineering
Patrik D’haeseleer, Shoudan Liang
and Roland Somogyi
The goal of this review


Principles of genetic network
organization
Computational methods for extracting
network architectures from
experimental data
Outline






Introduction
A conceptual approach to complex network
dynamics
Inference of regulation through clustering of
gene expression data
Modeling methodologies
Gene network inference: reverse engineering
Conclusions and Outlook

Genes encode proteins, some of which in turn regulate other genes.
The goal is to determine the structure of this intricate network of genetic regulatory interactions.

Traditional approach: local

Examining and collecting data on a single gene, a single protein or a single reaction at a time
In contrast: functional genomics
Functional Genomics

Specifically, functional genomics refers to the development and application of global experimental approaches to assess gene function by making use of the information and reagents provided by structural genomics.


It is characterized by high-throughput, large-scale experimental methodologies combined with statistical and computational analysis of the results.
Functional Genomics (Cont.)

We need to define the mapping from
sequence space to functional space.
Intermediate representation


Focus on the level of single cells
A biological system can be considered to be a state machine, where the change in internal state of the system depends on both its current internal state and any external inputs.
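A toy sketch of the state-machine view, assuming a hypothetical two-gene cell with one external signal; the update rules are invented purely for illustration. The point is that the next state is a function of both the current internal state and the input.

```python
from typing import Iterable, List, Tuple

# Hypothetical cell state: (gene A on/off, gene B on/off).
State = Tuple[int, int]

def next_state(state: State, signal: int) -> State:
    """Next internal state from the current state plus an external input.
    Invented rules: A follows the external signal; B latches on once A has been on."""
    a, b = state
    return (signal, int(a or b))

def run(initial: State, inputs: Iterable[int]) -> List[State]:
    """Sequence of states visited as the inputs arrive one at a time."""
    states = [initial]
    for signal in inputs:
        states.append(next_state(states[-1], signal))
    return states

trajectory = run((0, 0), [1, 0, 0])
```

Here `trajectory` is `[(0, 0), (1, 0), (0, 1), (0, 1)]`: with the same input (signal 0), a naive cell stays at (0, 0) while a cell that has seen the signal stays at (0, 1), so the response depends on internal state and not only on the input.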
The goal

Observe the state of a cell and how it changes under different circumstances, and from this derive a model of how these state changes are generated

The state of a cell

All the variables that determine its behavior
Example

[Figure: A simple 6-node regulatory network]
A conceptual approach to complex network dynamics


The global gene expression pattern is
the result of the collective behavior of
individual regulatory pathways
Gene function depends on its cellular
context; thus understanding the
network as a whole is essential.
Boolean Networks


Each gene is considered as a binary variable (either ON or OFF), regulated by other genes through logical or Boolean functions.
Even with this simplification, the network behavior is already extremely rich.
Boolean Networks (Cont.)

Cell differentiation corresponds to
transitions from one global gene
expression pattern to another.
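A minimal sketch of a synchronously updated Boolean network, assuming a hypothetical 3-gene circuit (A, B, C) whose rules are invented for illustration. Iterating from any start state until a state repeats yields its attractor (a stable global expression pattern), and a transient pulse of C moves the network from one attractor to another, which is the kind of transition the slide equates with differentiation.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]

# Hypothetical rules (synchronous update):
#   A' = A AND NOT B   (A maintains itself unless repressed by B)
#   B' = B OR C        (B is switched on by C, then maintains itself)
#   C' = 0             (C is a transient signal that vanishes after one step)
def update(s: State) -> State:
    a, b, c = s
    return (int(a and not b), int(b or c), 0)

def find_attractor(s: State, step: Callable[[State], State] = update) -> Tuple[List[State], List[State]]:
    """Iterate until a state repeats. Returns (transient, cycle): the states visited
    before reaching the attractor, and the attractor itself (length 1 = fixed point)."""
    seen: Dict[State, int] = {}
    path: List[State] = []
    while s not in seen:
        seen[s] = len(path)
        path.append(s)
        s = step(s)
    start = seen[s]
    return path[:start], path[start:]

# All distinct attractors reachable from the 2^3 possible start states.
attractors = {tuple(find_attractor(s)[1]) for s in product((0, 1), repeat=3)}
```

Starting in the stable pattern (1, 0, 0) and switching C on gives `find_attractor((1, 0, 1))`, which passes through (1, 1, 0) and settles in the different stable pattern (0, 1, 0); this circuit has three fixed-point attractors in total.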
Inference of regulation through clustering of gene expression data
Scoring methods



Whether there has been a significant
change at any one condition
Whether there has been a significant
aggregate change over all conditions
Whether the fluctuation pattern shows
high diversity according to Shannon
entropy
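The three scoring criteria can be sketched as follows, assuming each profile holds log ratios relative to a reference condition. Using absolute values, summing them for the aggregate score, and discretizing into equal-width bins (three by default) before computing the entropy are our illustrative choices, not necessarily the exact definitions behind the slide.

```python
import math
from collections import Counter
from typing import Sequence

def max_change(profile: Sequence[float]) -> float:
    """Largest absolute change at any single condition."""
    return max(abs(v) for v in profile)

def aggregate_change(profile: Sequence[float]) -> float:
    """Aggregate change over all conditions (here: sum of absolute log ratios)."""
    return sum(abs(v) for v in profile)

def shannon_entropy(profile: Sequence[float], n_bins: int = 3) -> float:
    """Entropy in bits of the profile after equal-width discretization.
    A flat profile scores 0; one spread evenly over all bins scores log2(n_bins)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    # min(...) places the maximum value into the top bin instead of an extra one.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in profile]
    n = len(profile)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())
```

A gene can score high on one criterion and low on another: a single large spike gives a high `max_change` but a low entropy, while a gene that wanders across many levels scores high on entropy.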
Guilt By Association


Select a gene
Determine its nearest neighbors in expression space within a certain user-defined distance cut-off
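The two guilt-by-association steps can be sketched as a neighbor search. We assume a normalized Euclidean distance between profiles; the gene names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Normalized Euclidean distance between two profiles over the same N samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))

def guilt_by_association(target: str, profiles: Dict[str, Sequence[float]],
                         cutoff: float) -> List[str]:
    """Genes lying within `cutoff` of the target gene in expression space, nearest first."""
    ref = profiles[target]
    hits = [(distance(ref, prof), gene)
            for gene, prof in profiles.items() if gene != target]
    return [gene for d, gene in sorted(hits) if d <= cutoff]

# Hypothetical profiles over three samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],  # distance 0.1 from geneA
    "geneC": [3.0, 2.0, 1.0],  # anti-correlated, distance about 1.63
    "geneD": [1.5, 2.5, 3.5],  # distance 0.5
}
```

With a cut-off of 1.0, `guilt_by_association("geneA", profiles, 1.0)` returns `["geneB", "geneD"]`. The result depends heavily on the chosen cut-off and metric.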
Clustering

Extract groups of genes that are tightly
co-expressed over a range of different
experiments.
Caution


Different clustering methods can have
very different results
It’s not yet clear which clustering
methods are most useful for gene
expression analysis.
Definition: Gene Expression Profile


The expression profile $e_j$ of a particular gene $j$ over an ordered list of $N$ samples ($k = 1, \dots, N$) is a vector of scaled expression values $v_{jk}$.
The expression profile is:

$$e_j = (v_{j1}, v_{j2}, v_{j3}, \dots, v_{jN})$$
Definition: Gene Expression Profile (Cont.)


The difference between two genes $p$ and $q$ may be estimated as an $N$-dimensional metric “distance” between $e_p$ and $e_q$.
Euclidean distance:

$$d_{pq} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left(v_{pk} - v_{qk}\right)^2}$$
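A sketch of the distance computation for all gene pairs, which is the usual input to the clustering methods that follow. It assumes the normalized form $d_{pq} = \sqrt{(1/N)\sum_k (v_{pk} - v_{qk})^2}$, with the $1/N$ inside the square root; the gene names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

def profile_distance(ep: Sequence[float], eq: Sequence[float]) -> float:
    """d_pq = sqrt((1/N) * sum_k (v_pk - v_qk)^2) for profiles over the same N samples."""
    if len(ep) != len(eq):
        raise ValueError("profiles must cover the same N samples")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ep, eq)) / len(ep))

def distance_matrix(profiles: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """All pairwise distances, keyed by (gene_p, gene_q) with gene_p < gene_q."""
    genes = sorted(profiles)
    return {(p, q): profile_distance(profiles[p], profiles[q])
            for i, p in enumerate(genes) for q in genes[i + 1:]}

# Hypothetical profiles over N = 4 samples.
dm = distance_matrix({"p": [0, 0, 0, 0], "q": [1, 1, 1, 1], "r": [3, 3, 3, 3]})
```

Here `dm` is `{("p", "q"): 1.0, ("p", "r"): 3.0, ("q", "r"): 2.0}`. Dividing by $N$ keeps distances comparable between data sets with different numbers of samples.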
Clustering algorithms

Non-hierarchical methods


Cluster N objects into K groups in an
iterative process until certain goodness
criteria are optimized
E.g. K-means
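A minimal K-means sketch using Lloyd's iteration, one common way to implement it: assign each profile to its nearest centroid, recompute centroids as cluster means, and stop when assignments no longer change. Initializing the centroids from the first K profiles is a simplification; practical analyses usually use random restarts. The example points are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's K-means; returns (cluster label per point, centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments are stable
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = _mean(members)
    return labels, centroids

labels, centroids = kmeans([[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]], k=2)
```

The two well-separated groups come out as labels `[0, 1, 0, 1, 0, 1]`. Note that K must be chosen in advance, which is one reason different methods can disagree.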
Clustering algorithms

Hierarchical methods

Return a hierarchy of nested clusters,
where each cluster typically consists of the
union of two or more smaller clusters.

Agglomerative methods


Start with single object clusters and recursively
merge them into larger clusters
Divisive methods

Start with the cluster containing all objects and
recursively divide it into smaller clusters
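A naive sketch of the agglomerative variant: start from singleton clusters and repeatedly merge the closest pair. We assume average linkage (mean pairwise distance) as the cluster-to-cluster distance; single or complete linkage would be equally valid choices and can give different trees. Divisive clustering is not shown. The 4-gene distance matrix is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Cluster = FrozenSet[int]

def average_linkage(a: Cluster, b: Cluster, d: Sequence[Sequence[float]]) -> float:
    """Mean pairwise distance between the members of clusters a and b."""
    return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

def agglomerative(d: Sequence[Sequence[float]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Merge the closest pair of clusters until one remains.
    The merge history (cluster_a, cluster_b, linkage distance) encodes the tree."""
    clusters: List[Cluster] = [frozenset([i]) for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], d))
        merges.append((a, b, average_linkage(a, b, d)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical distances between genes 0..3.
d = [[0, 1, 5, 6],
     [1, 0, 4, 5],
     [5, 4, 0, 2],
     [6, 5, 2, 0]]
merges = agglomerative(d)
```

Genes 0 and 1 merge first (distance 1.0), then 2 and 3 (2.0), and finally the two pairs join at 5.0. This brute-force version is cubic in the number of genes, which is fine for illustration only.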
Other applications of co-expression clusters

Extraction of regulatory motifs


Inference of functional annotation


Genes in the same expression cluster tend to share biological functions
Functions of unknown genes may be hypothesized from genes with known function within the same cluster
As a molecular signature for distinguishing cell or tissue types based on mRNA expression
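One simple way to turn shared cluster membership into a functional hypothesis is a majority vote over the annotated cluster mates. The voting rule, gene names, and annotations below are hypothetical illustrations; the output is a hypothesis to test, not an annotation.

```python
from collections import Counter
from typing import Dict, Optional

def hypothesize_function(gene: str, cluster_of: Dict[str, int],
                         known: Dict[str, str]) -> Optional[str]:
    """Most common known function among the other genes in the same cluster,
    or None if no cluster mate is annotated."""
    c = cluster_of[gene]
    votes = Counter(known[g] for g, cl in cluster_of.items()
                    if cl == c and g != gene and g in known)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Hypothetical cluster assignments and annotations.
cluster_of = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1, "g6": 2}
known = {"g1": "ribosome biogenesis", "g2": "ribosome biogenesis", "g4": "cell cycle"}
```

For the unannotated gene g3, `hypothesize_function("g3", cluster_of, known)` suggests "ribosome biogenesis"; g6 has no annotated cluster mates, so no hypothesis is made.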
Which clustering method to
use?


There is no single best criterion for
obtaining a partition because no precise
and workable definition of ‘cluster’
exists.
Clusters can have arbitrary shapes and sizes in a multidimensional pattern space.
Challenge in cluster analysis


A gene could be a member of several
clusters, each reflecting a particular
aspect of its function and control
Solutions


clustering methods that partition genes
into non-exclusive clusters
Several clustering methods could be used
simultaneously
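Fuzzy c-means is one example of a non-exclusive method (our choice; the slide does not name one): each gene receives a graded membership in every cluster instead of a single label. Below is a sketch using the standard alternating membership and weighted-centre updates, with fuzzifier m = 2 and a fixed iteration count as assumptions. The one-dimensional points are hypothetical.

```python
from typing import List, Sequence

def fuzzy_cmeans(points: List[Sequence[float]], centroids: List[List[float]],
                 m: float = 2.0, n_iter: int = 50) -> List[List[float]]:
    """Returns u where u[k][i] is the membership of point k in cluster i (rows sum to 1).
    `centroids` supplies the starting centres; larger m gives softer partitions."""
    c = len(centroids)
    dims = len(points[0])
    u: List[List[float]] = []
    for _ in range(n_iter):
        # Membership step: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1))
        u = []
        for p in points:
            d = [sum((x - y) ** 2 for x, y in zip(p, ctr)) ** 0.5 for ctr in centroids]
            zeros = d.count(0.0)
            if zeros:  # point coincides with one or more centres
                u.append([1.0 / zeros if di == 0.0 else 0.0 for di in d])
            else:
                u.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                          for i in range(c)])
        # Centre step: mean of all points weighted by u_ki ** m
        new_centroids = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centroids.append([sum(wk * p[dim] for wk, p in zip(w, points)) / total
                                  for dim in range(dims)])
        centroids = new_centroids
    return u

# Two tight groups plus one point exactly between them.
points = [[0.0], [0.2], [5.0], [5.2], [2.6]]
u = fuzzy_cmeans(points, centroids=[[0.0], [5.0]])
```

The points in each tight group get memberships close to 1 in their own cluster, while the point at 2.6 ends up near 0.5 in both, the kind of shared assignment that exclusive partitions cannot express.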
Modeling methodologies
Level of biochemical detail

Abstract: Boolean networks
Concrete: full biochemical interaction models with stochastic kinetics in Arkin et al. (1998)
Forward and inverse modeling


Forward modeling approach
Inverse modeling, or reverse
engineering


Given a set of experimental data, what can we deduce about the unknown underlying regulatory network?
Requires the use of a parametric model,
the parameters of which are then fit to the
real-world data.
Gene network inference: reverse engineering
Goal of network inference


Construct a coarse-scale model of the
network of regulatory interactions
between the genes
In principle, a network can be reverse engineered from its activity profiles
Data requirements

To infer how a gene is regulated, we need to observe its expression under many different combinations of expression levels of its regulatory inputs


Use data from different sources
Deal with different data types
Estimates for network models

a sparse network model of N genes,
where each gene is only affected by K
other genes on average.
a sparsely connected, directed graph
with N nodes and NK edges.
Estimates for network models (Cont.)

To specify the correct model, we need

$$\log_2 \binom{N^2}{NK} = \log_2 \frac{(N^2)!}{(NK)!\,(N^2 - NK)!} \approx NK \log_2 (N/K)$$

bits of information.
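The estimate can be checked numerically: the exact count is the number of ways to place $NK$ directed edges among the $N^2$ possible ones. A sketch comparing the exact value with the approximation, where N = 1000 genes and K = 2 inputs per gene are hypothetical figures chosen for illustration:

```python
import math

def exact_bits(n: int, k: int) -> float:
    """log2 C(N^2, NK): bits needed to single out one sparse directed graph."""
    return math.log2(math.comb(n * n, n * k))

def approx_bits(n: int, k: int) -> float:
    """Leading-order estimate NK * log2(N / K)."""
    return n * k * math.log2(n / k)

n_genes, k_inputs = 1000, 2  # hypothetical network size
exact = exact_bits(n_genes, k_inputs)
approx = approx_bits(n_genes, k_inputs)
```

For this case `approx_bits(1000, 2)` comes to roughly 17,900 bits. The exact value is somewhat larger because the approximation drops a term of about $NK \log_2 e$ bits, but both are tiny compared with the $N^2 = 10^6$ bits needed to specify an arbitrary dense wiring diagram, which is why sparseness makes inference feasible.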
Correlation Metric Construction



Adam Arkin and John Ross
A method to reconstruct reaction
networks from measured time series of
the component chemical species.
The system is driven using inputs for
some of the chemical species and the
concentration of all the species is
monitored over time.
Correlation Metric Construction (Cont.)




The time-lagged correlation matrix is
calculated
From this a distance matrix is constructed
based on the maximum correlation between
any two chemical species
This distance matrix is then fed into a simple
clustering algorithm to generate a tree of
connections between the species
The results are mapped into a two-dimensional graph for visualization
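The first two CMC steps (time-lagged correlations, then a distance matrix from the maximum correlation) can be sketched in NumPy. We assume the distance $d_{ij} = \sqrt{2(1 - r_{\max})}$ with $r_{\max}$ the largest absolute correlation over a window of lags; Arkin and Ross's exact definition may differ in detail. The clustering and two-dimensional projection steps are omitted, and the time series are hypothetical.

```python
import numpy as np

def lagged_corr(x: np.ndarray, y: np.ndarray, lag: int) -> float:
    """Pearson correlation between x(t) and y(t + lag)."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return float(np.corrcoef(x, y)[0, 1])

def cmc_distances(series: np.ndarray, max_lag: int) -> np.ndarray:
    """series has shape (n_species, n_timepoints). For each pair, take the largest
    |correlation| over lags -max_lag..max_lag and map it to sqrt(2 * (1 - r_max))."""
    n = series.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            r_max = max(abs(lagged_corr(series[i], series[j], lag))
                        for lag in range(-max_lag, max_lag + 1))
            # max(0, ...) guards against r_max exceeding 1 through rounding.
            dist[i, j] = dist[j, i] = np.sqrt(max(0.0, 2.0 * (1.0 - r_max)))
    return dist

# Hypothetical species: b is a copy of a delayed by 3 steps, c is unrelated noise.
t = np.arange(50)
a = np.sin(t / 5)
b = np.sin((t - 3) / 5)
c = np.random.default_rng(0).normal(size=50)
D = cmc_distances(np.vstack([a, b, c]), max_lag=3)
```

Because the lag window covers the 3-step delay, a and b come out at distance close to 0 even though their zero-lag correlation is below 1, while the noise series stays far from both. The resulting matrix could then be passed to a hierarchical clustering routine.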
Additive regulation models

Property


The regulatory inputs are combined using
a weighted sum
Can be used as a first-order
approximation to the gene network
Additive regulation models

The change in each variable over time is given by a weighted sum of all other variables:

$$\frac{dy_i}{dt} = \sum_j w_{ji}\, y_j + b_i$$

where
$y_i$ is the level of the $i$-th variable,
$b_i$ is a bias term indicating whether $i$ is expressed or not in the absence of regulatory inputs,
$w_{ji}$ represents the influence of $j$ on the regulation of $i$.
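A forward-modeling sketch of the additive model $dy_i/dt = \sum_j w_{ji} y_j + b_i$, integrated with forward Euler. We store $w_{ji}$ as `W[j, i]`, so the weighted sum is `W.T @ y`. The 2-gene weights are hypothetical, and the negative self-weights $w_{ii}$ are our addition to keep this purely linear system stable.

```python
import numpy as np

def simulate_additive(W: np.ndarray, b: np.ndarray, y0: np.ndarray,
                      dt: float = 0.01, steps: int = 2000) -> np.ndarray:
    """Forward-Euler integration of dy_i/dt = sum_j w_ji * y_j + b_i.
    W[j, i] holds w_ji (influence of gene j on gene i). Returns shape (steps + 1, n_genes)."""
    traj = np.empty((steps + 1, len(y0)))
    traj[0] = y0
    for t in range(steps):
        traj[t + 1] = traj[t] + dt * (W.T @ traj[t] + b)
    return traj

# Hypothetical network: gene 0 activates gene 1 (w_01 = 0.5), gene 1 represses
# gene 0 (w_10 = -0.5), both decay (w_ii = -1); only gene 0 has a positive bias.
W = np.array([[-1.0, 0.5],
              [-0.5, -1.0]])
b = np.array([1.0, 0.0])
traj = simulate_additive(W, b, np.array([0.0, 0.0]))
```

Setting the derivatives to zero gives the steady state $y_0 = 0.8$, $y_1 = 0.4$, which the trajectory approaches by $t = 20$.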
Use of such models

We can infer regulatory interactions
directly from the data, by fitting these
simple network models to large scale
gene expression data.
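The inverse step can be sketched as ordinary least squares: approximate each $dy_i/dt$ by a forward difference and regress it on all expression levels plus a constant, so the coefficients estimate $w_{ji}$ and $b_i$. The data here are hypothetical, generated from a known network by the same Euler scheme, so recovery is exact up to rounding. Real measurements are noisy and sparsely sampled, and, as the data requirements discussed earlier in this section suggest, they need far more distinct input combinations than parameters.

```python
import numpy as np

def euler_trajectory(W: np.ndarray, b: np.ndarray, y0, dt: float, steps: int) -> np.ndarray:
    """Synthetic data: forward-Euler run of dy/dt = W.T @ y + b (W[j, i] = w_ji)."""
    ys = [np.asarray(y0, dtype=float)]
    for _ in range(steps):
        ys.append(ys[-1] + dt * (W.T @ ys[-1] + b))
    return np.array(ys)

def fit_additive(trajectories, dt: float):
    """Least-squares estimates of W and b from sampled trajectories.
    Derivatives are approximated by forward differences (y(t + dt) - y(t)) / dt."""
    Y = np.vstack([traj[:-1] for traj in trajectories])
    dY = np.vstack([(traj[1:] - traj[:-1]) / dt for traj in trajectories])
    X = np.hstack([Y, np.ones((Y.shape[0], 1))])  # column of ones fits the bias b_i
    coef, *_ = np.linalg.lstsq(X, dY, rcond=None)
    n = Y.shape[1]
    return coef[:n], coef[n]  # coef[j, i] estimates w_ji; last row estimates b

# Hypothetical 2-gene network, "measured" from three different starting states.
W_true = np.array([[-1.0, 0.5],
                   [-0.5, -1.0]])
b_true = np.array([1.0, 0.0])
dt = 0.05
runs = [euler_trajectory(W_true, b_true, y0, dt, 40) for y0 in ([0, 0], [2, 0], [0, 2])]
W_hat, b_hat = fit_additive(runs, dt)
```

Using several starting states supplies the varied combinations of regulator levels that the fit needs; a single trajectory sitting at steady state would leave the regression degenerate.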
Conclusion


Conceptual foundations for
understanding complex biological
networks
Several practical methods for data
analysis