How to Reconstruct a Large Genetic
Network from n Gene Perturbations in
Fewer than n^2 Easy Steps
Andreas Wagner, Bioinformatics, vol. 17, No. 12, 2001, pp. 1183-1187.
Speaker: Chuang Chieh Lin
Advisor: Professor R. C. T. Lee
National Chi-Nan University
CSIE in National Chi-Nan University
Outline
Introduction and basic definitions
Graph theoretical framework
Parsimonious network
Algorithm and complexity
Cycles in genetic networks
Conclusions
References
Introduction and basic definitions
Gene activity refers to whether and how a gene is
expressed, for example as mRNA or as protein.
Gene network: In this paper, we define a genetic
network as a group of genes in which individual genes
can influence the activity of other genes.
The core task of reconstructing genetic networks is to
identify the causal structure of a gene network.
To reconstruct a genetic network is to identify, for
each gene in the network, which other genes'
activity it influences directly.
Now, let's see an illustration of a genetic network.
[Figure: genes 1 to 5 on the DNA together with their products (two transcription factors, a protein kinase, and a protein phosphatase); proteins are drawn in inactive and active forms, with P marking phosphorylation.]
This is a hypothetical biochemical pathway involving two
transcription factors, a protein kinase and a protein phosphatase,
as well as the genes encoding them.
Genetic perturbation: an experimental manipulation
of gene activity by manipulating either a gene itself or
its product. It includes point mutations, gene
deletions, or other interference with the activity of the
product.
[Figure: the same pathway, annotated with two lists of perturbation effects for the genes G1 to G5. First list: the genetic perturbation is gene deletion and the aspect of gene activity is mRNA expression. Second list: the genetic perturbation is gene deletion and the aspect of gene activity is phosphorylation state.]
Graph theoretical framework
As the previous example indicated, we are
concerned with qualitative information on gene
interactions.
We use a “digraph”, a graph representation of
genetic networks, to represent this qualitative information.
A digraph is a directed graph consisting of nodes
and directed edges.
Let’s see an example.
We use a → b to mean that gene a influences the activity of gene
b directly. For brevity, genes will be labeled by numbers from
now on.
[Figure: an example digraph with 21 nodes labeled 0 to 20.]
Adjacency list: for each gene i, it shows which
genes’ activity state gene i influences directly.
We denote by Adj(G) the adjacency list of graph G
and by Adj(i) the set of nodes (genes) adjacent to
(directly influenced by) node i.
Adjacency list of G:
[Figure: the digraph G on nodes 0 to 20, shown next to its adjacency list Adj(G), one row per node.]
Accessibility list: the list of perturbation effects or the
list of regulatory effects. It shows all nodes (genes)
that can be accessed (influenced in their activity state)
from a given gene by paths of direct interactions.
We denote by Acc(G) the accessibility list of the
graph G and by Acc(i) the set of nodes that can be
reached (influenced) from node (gene) i.
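The relation between the two lists can be sketched in code. The following is a minimal illustration, not taken from the paper: it assumes a graph is stored as a Python dict mapping each node to the set of nodes it influences directly, and the function name `accessibility_list` and the six-node toy graph are hypothetical.

```python
def accessibility_list(adj):
    """Return Acc as {node: set of nodes reachable by a path of direct interactions}."""
    acc = {}
    for start in adj:
        reached = set()
        stack = list(adj[start])          # begin with the directly influenced nodes
        while stack:
            node = stack.pop()
            if node not in reached:
                reached.add(node)
                stack.extend(adj.get(node, ()))   # follow further direct interactions
        acc[start] = reached
    return acc


# Hypothetical toy network: 0 -> 1 -> 2 -> {3, 4} -> 5
adj = {0: {1}, 1: {2}, 2: {3, 4}, 3: {5}, 4: {5}, 5: set()}
acc = accessibility_list(adj)
for node in sorted(acc):
    print(node, sorted(acc[node]))
```

A perturbation experiment observes Acc directly; the reconstruction problem is the reverse direction, from Acc back to Adj.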
Accessibility list of G:
[Figure: the digraph G on nodes 0 to 20, shown next to its accessibility list Acc(G), one row per node.]
Before proceeding with the algorithm, we first need
some concepts and theorems.
The most parsimonious network
An acyclic digraph defines its accessibility list, but an
accessibility list may have more than one
corresponding acyclic digraph.
Let’s see an example first.
(a) Accessibility list Acc:
0: 1 2 3 4 5
1: 2 3 4 5
2: 3 4 5
3: 5
4: 5
5:
[Figure: (b), (c), and (d) are three acyclic digraphs on nodes 0 to 5.]
(d) is the most parsimonious network of Acc, i.e., (a).
An accessibility list Acc and a digraph G are
compatible if G has Acc as its accessibility list. Acc
is then the accessibility list induced by G.
The compatible graph with the fewest edges, Gpars, is
called the most parsimonious network compatible with Acc.
Why do we prefer the most parsimonious
network?
Among all gene networks compatible with the data, we
prefer the simplest, i.e., the most parsimonious one.
For any accessibility list Acc of a digraph G, there
exists a most parsimonious network Gpars. (This follows
from a theorem.) Therefore Gpars is the core of all
the corresponding digraphs.
More complicated digraphs are harder to interpret.
Theorem 1
Let Acc be the accessibility list of an acyclic digraph.
Then there exists exactly one graph Gpars that has Acc
as its accessibility list and that has fewer edges than
any other graph G with Acc as its accessibility list.
Before starting the proof, we need to introduce some
terminology.
Range and shortcut
Consider two nodes i and j of a digraph that are
connected by an edge e. The range r of the edge e is
the length of the shortest path between i and j in the
absence of e. If there is no other path connecting i
and j, then $r := \infty$.
An edge e with range $r \geq 2$ but $r \neq \infty$ is called a
shortcut.
Let’s see an example.
[Figure: an edge e from i to j, alongside the path i → z1 → z2 → … → zk-1 → zk → j.]
e is a shortcut. When eliminating e, i
and j are still connected by a path of
length k + 1, so r (e) = k + 1.
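The definition of range can be made concrete with a breadth-first search. This is only a sketch under stated assumptions: the graph is a dict of successor sets, "path between i and j" is read as a directed path from i to j, and the names `edge_range` and `is_shortcut` are hypothetical.

```python
from collections import deque
import math


def edge_range(adj, i, j):
    """Length of the shortest path from i to j that does not use the edge i -> j."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if u == i and v == j:
                continue                  # pretend the edge e = (i, j) is absent
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist.get(j, math.inf)          # infinity if no other path exists


def is_shortcut(adj, i, j):
    """An edge is a shortcut if its range is at least 2 and finite."""
    r = edge_range(adj, i, j)
    return 2 <= r < math.inf


# The figure's situation with k = 2: i -> z1 -> z2 -> j plus the edge i -> j
adj = {"i": {"z1", "j"}, "z1": {"z2"}, "z2": {"j"}, "j": set()}
print(edge_range(adj, "i", "j"), is_shortcut(adj, "i", "j"))
```

With k = 2 intermediate nodes the range is k + 1 = 3, matching the figure.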
Lemma 1
For any accessibility list Acc of a digraph, there exists
a compatible graph Gpars that is free of shortcuts.
Proof of Lemma 1
Assume that there is no such graph Gpars.
[Figure: a shortcut ei from xi to yi alongside a path Pi from xi to yi; after deleting ei, only Pi remains. The length of Pi is greater than 1.]
If there exists a shortcut ei between xi and yi, delete ei. Then,
by the definition of a shortcut, xi and yi are still
connected via Pi, whose length is greater than 1.
Suppose that there are n such pairs (xi , yi), i.e., (x1 ,
y1), …, (xn, yn). After repeating this deletion for every
pair (xi , yi), i = 1, …, n, we obtain a shortcut-free graph
compatible with the accessibility list. This
contradicts the assumption made at the
beginning of this proof.
Lemma 2
Assume that Acc is the accessibility list of a digraph
G. For each node x, the adjacency list Adj(x) of a
shortcut-free graph Gpars compatible with Acc is a
subset of the adjacency list Adj(x) of any graph
compatible with Acc.
Proof of Lemma 2
Assume that Lemma 2 is false.
W. L. O. G., suppose that a shortcut-free graph Gpars
and some other graph G induce Acc.
By assumption, Gpars contains at least one node x so
that Adj(x) of Gpars contains at least one node y that
isn’t in Adj(x) of G.
Because G and Gpars have the same accessibility list
Acc, there must exist some path x → z1 → z2 → … →
zk → y from x to y in G. For the same reason, z1 is
accessible from x in Gpars, z2 from z1 in Gpars, … and
zk from zk-1 in Gpars.
Therefore we can find two paths (x →…→y) in Gpars:
(1) the edge e between x and y
(2) the path x → z1 →z2 →… →zk →y
This is in contradiction to the assumption that Gpars is
shortcut-free because e is a shortcut.
Let’s see an example!
Acc:
x: z1 z2 y
z1: z2 y
z2: y
Adj(Gpars):
x: z1 y
z1: z2
z2: y
Adj(G):
x: z1 z2
z1: z2
z2: y
[Figure: the graphs Gpars and G on the nodes x, z1, z2, y. In Gpars the edge x → y is marked "A shortcut!".]
Corollary 1
The shortcut-free graph Gpars compatible with Acc is the
unique graph with the fewest edges among all graphs
G compatible with Acc.
This corollary follows immediately from Lemma 2.
Now, we can proceed to the algorithm.
1:  for all nodes i of G
2:      Adj(i) = Acc(i)
3:  for all nodes i of G
4:      if node i hasn’t been visited
5:          call PRUNE_ACC(i)
6:      end if
7:  PRUNE_ACC(i)
8:      for all nodes j ∈ Acc(i)
9:          if Acc(j) = ∅
10:             declare j as visited.
11:         else
12:             call PRUNE_ACC(j)
13:         end if
14:     for all nodes j ∈ Acc(i)
15:         for all nodes k ∈ Adj(j)
16:             if k ∈ Acc(i)
17:                 delete k from Adj(i)
18:             end if
19:     declare node i as visited
20: end PRUNE_ACC(i)
A recursive pruning algorithm to reconstruct the most parsimonious graph from an accessibility list.
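The pruning procedure can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: it assumes the accessibility list is a dict of sets with one key per node and that the underlying graph is acyclic. The name `prune_accessibility_list` is hypothetical, and the sketch skips already visited nodes before recursing, which is an assumption added here to avoid repeated work.

```python
def prune_accessibility_list(acc):
    """Return the adjacency list of the most parsimonious graph compatible with acc.

    Assumes acc comes from an acyclic digraph and has a key for every node.
    """
    adj = {i: set(targets) for i, targets in acc.items()}   # Adj(i) = Acc(i)
    visited = set()

    def prune(i):
        # First make sure every node accessible from i has been pruned.
        for j in acc[i]:
            if j not in visited:
                prune(j)
        # Any k adjacent to some j in Acc(i) is reachable indirectly,
        # so a direct edge i -> k would be a shortcut.
        for j in acc[i]:
            for k in adj[j]:
                adj[i].discard(k)
        visited.add(i)

    for i in acc:
        if i not in visited:
            prune(i)
    return adj


# Illustrative accessibility list on six nodes
acc = {0: {1, 2, 3, 4, 5}, 1: {2, 3, 4, 5}, 2: {3, 4, 5}, 3: {5}, 4: {5}, 5: set()}
pars = prune_accessibility_list(acc)
for node in sorted(pars):
    print(node, sorted(pars[node]))
```

Because the recursion depth can approach the number of nodes, a network with thousands of genes may need a raised recursion limit or an iterative rewrite in Python.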
This algorithm is based on the following theorem, so
we state the theorem first.
Theorem 2
Let Acc(G) be the accessibility list of an acyclic
digraph, Gpars its most parsimonious graph, and V(Gpars)
the set of all nodes of Gpars. Then the following identity
holds:
$\forall i \in V(G_{pars}):\ Adj(i) = Acc(i) \setminus \bigcup_{j \in Adj(i)} Acc(j)$
Instead of proving the theorem, we give an example.
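The identity can be checked mechanically on a small case. The sketch below assumes the identity reads Adj(i) = Acc(i) minus the union of Acc(j) over all j in Adj(i), with both lists stored as dicts of sets; the function name `satisfies_theorem_2` and the six-node data are illustrative.

```python
def satisfies_theorem_2(acc, adj):
    """Check Adj(i) == Acc(i) minus the union of Acc(j) for j in Adj(i), for every node i."""
    for i in acc:
        reachable_indirectly = set()
        for j in adj[i]:
            reachable_indirectly |= acc[j]
        if adj[i] != acc[i] - reachable_indirectly:
            return False
    return True


acc = {0: {1, 2, 3, 4, 5}, 1: {2, 3, 4, 5}, 2: {3, 4, 5}, 3: {5}, 4: {5}, 5: set()}
parsimonious = {0: {1}, 1: {2}, 2: {3, 4}, 3: {5}, 4: {5}, 5: set()}
with_shortcuts = {i: set(targets) for i, targets in acc.items()}  # every accessible node adjacent

print(satisfies_theorem_2(acc, parsimonious))
print(satisfies_theorem_2(acc, with_shortcuts))
```

The second graph induces the same accessibility list but contains shortcuts, so the identity fails for it.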
Original Acc(G):
0: 1 2 3 4 5
1: 2 3 4 5
2: 3 4 5
3: 5
4: 5
5:
[Figure: a possible corresponding G on nodes 0 to 5.]
After pruning node 0 (nodes 2, 3, 4, 5 are reached via 1):
0: 1
1: 2 3 4 5
2: 3 4 5
3: 5
4: 5
5:
After pruning node 1 (nodes 3, 4, 5 are reached via 2):
0: 1
1: 2
2: 3 4 5
3: 5
4: 5
5:
After pruning node 2 (node 5 is reached via 3 and 4):
0: 1
1: 2
2: 3 4
3: 5
4: 5
5:
Nodes 3, 4 and 5 need no pruning.
[Figure: the resulting graph 0 → 1 → 2, 2 → 3, 2 → 4, 3 → 5, 4 → 5.]
The most parsimonious network
Actually, the aforementioned example is an
illustration of our algorithm.
From this theorem, we can derive Corollary 2.
Corollary 2
Let i, j and k be any three pairwise different nodes of
an acyclic directed shortcut-free graph G. If j is
accessible from i, then no node k accessible from j is
adjacent to i.
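[Figure: nodes i, j, and k, with j accessible from i and k accessible from j; a direct edge from i to k is marked "A shortcut !!".]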
Computational complexity
Let k < n − 1 be the average number of entries in a
node’s accessibility list.
Assume that there are n genes, that is, n entries.
During execution, each node accessible from a node j
induces one recursive call of PRUNE_ACC, after which
the node accessed from j is declared as visited. Thus
each entry of the accessibility list of a node is explored
no more than once.
Line 15 of the algorithm loops over all nodes adjacent to
a node j. Let a denote the average number of entries in
Adj(j).
The overall computational complexity would be O(nka).
For practical matters, large scale experimental gene
perturbations in the yeast Saccharomyces cerevisiae
(n ≈ 6300) suggest that k < 50 ([HMJRS2000]), a ≤ 1
([W2001a]), and thus nka ≪ n^2.
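To see the size of the gap, the bounds quoted above can be plugged in directly. This is only a worked upper estimate using k = 50 and a = 1, the largest values those bounds allow:

```latex
\begin{align*}
  n k a &\le 6300 \times 50 \times 1 = 315{,}000, \\
  n^2   &= 6300^2 = 39{,}690{,}000, \\
  \frac{n^2}{n k a} &\ge \frac{39{,}690{,}000}{315{,}000} = 126 .
\end{align*}
```

Under these assumptions the pruning algorithm needs at least two orders of magnitude fewer elementary steps than n^2.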
Storage complexity
The algorithm stores two copies of the accessibility
list, as well as a list of the nodes that have been visited.
Because the graph is acyclic, the recursion depth can
be no greater than n − 1.
Note that k < n − 1 is the average number of entries in
a node’s accessibility list.
The overall storage requirements are O(nk).
Dealing with cycles
Everything so far has been restricted to acyclic graphs.
Now let us look at the problems that cyclic graphs
introduce.
Problems that single gene perturbation
can’t solve
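[Figure: two cyclic networks on the nodes 0 to 4, connecting the nodes in different orders.]
Accessibility list of both networks:
0: 1 2 3 4
1: 0 2 3 4
2: 0 1 3 4
3: 0 1 2 4
4: 0 1 2 3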
They have the same accessibility list.
Therefore, we can not reconstruct the
gene network uniquely.
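[Figure: the same two cyclic networks on the nodes 0 to 4.]
Adjacency list of the first network:
0: 3
1: 4
2: 1
3: 2
4: 0
Adjacency list of the second network:
0: 1
1: 2
2: 3
3: 4
4: 0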
Note that the order of direct regulatory interactions in these
two networks is different, as reflected in the adjacency lists.
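The ambiguity is easy to reproduce. The sketch below builds two directed 5-cycles on the same nodes (illustrative data, chosen here as an assumption), computes their accessibility lists with a hypothetical helper `accessibility`, and compares them. A node's own label is left out of its list.

```python
def accessibility(adj):
    """Accessibility list of a digraph given as {node: list of direct targets}."""
    acc = {}
    for start in adj:
        seen, stack = set(), list(adj[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node])
        acc[start] = seen - {start}       # drop the node itself, reached around the cycle
    return acc


# Two different cyclic orders of the same five genes
cycle_a = {0: [3], 1: [4], 2: [1], 3: [2], 4: [0]}   # 0 -> 3 -> 2 -> 1 -> 4 -> 0
cycle_b = {0: [1], 1: [2], 2: [3], 3: [4], 4: [0]}   # 0 -> 1 -> 2 -> 3 -> 4 -> 0

print(cycle_a == cycle_b)                                   # adjacency lists differ
print(accessibility(cycle_a) == accessibility(cycle_b))     # accessibility lists agree
```

Since perturbation data only reveal the accessibility list, no algorithm working from that list alone can tell these two networks apart.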
Instead of solving this problem, we collapse the
nodes which form a cycle into a single group of
nodes with indistinguishable order of regulatory
interactions.
Such a single group can be also called a strongly
connected component or strong component of a
directed graph G. Every two nodes in a strong
component are mutually accessible.
Let us see an example.
[Figure: a digraph G on the nodes 0 to 15 and, next to it, the graph obtained by collapsing cycles. The nodes 1, 3, 4, 5, 15 form a single group, as do the nodes 6, 9, 12; the nodes 0, 2, 7, 8, 10, 11, 13, 14 remain single nodes.]
This graph is called a
condensation of G.
How do we construct a condensation of a gene
network?
We need a theorem and a corollary before
presenting the algorithm that constructs a condensation
of a gene network.
Theorem 3
Let P be the accessibility matrix of a digraph G with n
nodes, x1, …, xn. The strong component containing xi is
determined by the unit entries of the ith row in the
matrix $P \times P^{T}$, where $\times$ denotes the element-wise product.
Corollary 3
Let i and j (i ≠ j) be two nodes of a digraph G.
i and j are in the same strong component iff
$j \in Acc(i)$ and $i \in Acc(j)$.
We use corollary 3 because we will work with
accessibility lists, not matrices.
Now we are going to present the algorithm.
for all nodes i of G
    if component[i] has not been defined
        create new node x of G*
        component[i] = x
        for all nodes j ∈ Acc(i)
            if i ∈ Acc(j)
                component[j] = x
            end if
    end if
for all nodes i of G*
    Acc(i) = ∅
for all nodes i of G
    for all nodes j ∈ Acc(i)
        if component[i] ≠ component[j]
            if component[j] ∉ Acc(component[i])
                add component[j] to Acc(component[i])
            end if
        end if
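The two passes of the condensation procedure can be sketched in Python. This is an illustration under assumptions, not the author's code: the accessibility list is a dict of sets with one key per node, component nodes of G* are numbered 0, 1, 2, … in order of creation, and the name `condense` and the seven-node data are hypothetical.

```python
def condense(acc):
    """Group mutually accessible nodes and build the accessibility list of G*."""
    component = {}        # node of G -> node of G*
    members = []          # node of G* -> list of nodes of G it contains
    for i in acc:
        if i not in component:
            x = len(members)              # create a new node x of G*
            component[i] = x
            members.append([i])
            for j in sorted(acc[i]):
                if j != i and i in acc[j]:    # i and j are mutually accessible
                    component[j] = x
                    members[x].append(j)

    acc_star = {x: set() for x in range(len(members))}
    for i in acc:
        for j in acc[i]:
            if component[i] != component[j]:
                acc_star[component[i]].add(component[j])
    return component, members, acc_star


# Illustrative accessibility list with two cycles: {1, 2, 3} and {5, 6, 7}
acc = {
    1: {2, 3, 4, 5, 6, 7}, 2: {1, 3, 4, 5, 6, 7}, 3: {1, 2, 4, 5, 6, 7},
    4: {5, 6, 7}, 5: {6, 7}, 6: {5, 7}, 7: {5, 6},
}
component, members, acc_star = condense(acc)
print(members)
print(acc_star)
```

The resulting accessibility list of G* is acyclic, so the pruning algorithm for acyclic graphs can then be applied to it.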
Example: accessibility list of a graph on the nodes 1 to 7.
1: 2 3 4 5 6 7
2: 1 3 4 5 6 7
3: 1 2 4 5 6 7
4: 5 6 7
5: 6 7
6: 5 7
7: 5 6
[Figure: the graph on the nodes 1 to 7 and its condensation G* with the nodes x1, x2, x3. The mutually accessible nodes 1, 2, 3 form one group, the nodes 5, 6, 7 form another, and node 4 stays alone.]
Storage and time complexity
The graph G* has at most as many nodes and
accessibility list entries as G.
The algorithm generates only one copy of G* and its
accessibility list.
Therefore both time and storage complexity are O(k),
where k here is the total number of entries of the
accessibility list (k < n^2).
Conclusions
Genetics is concerned with identifying the gene
interactions and their biological significance.
Functional genomics takes this concern to the next
level, that is, identifying gene interactions among
thousands of genes in a genome.
There are other ways to simplify gene networks, such
as Boolean logic design, reduction in symbolic logic,
and graph theory.
References
[BB2001] Arabidopsis Gene Knockout: Phenotypes Wanted, Bouche, N.
and Bouchez, D., Curr. Opin. Plant Biol., vol. 4, pp. 111-117.
[DIB97] Exploring the Metabolic and Genetic Control of Gene Expression
on a Genomic Scale, DeRisi, J. L., Iyer, V. R., Brown, P. O., Science, Vol.
278, pp. 680-686.
[ESBB98] Cluster Analysis and Display of Genome-Wide Expression
Patterns, Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D., Proc.
Natl Acad. Sci. USA, vol. 95, pp. 14863-14868.
[FW2000] The Small World of Metabolism, Fell, D. and Wagner, A.,
Nature Biotechnology, Vol. 18, pp. 1121-1122.
[FKZMS2000] Functional Genomic Analysis of C. elegans Chromosome I
by Systematic RNA Interference, Fraser, A. G., Kamath, R. S., Zipperlen, P.,
Martinez-Campos, M. and Sohrmann, M., Nature, Vol. 408, pp. 325-330.
[GEOCJ2000] Functional Genomic Analysis of Cell Division in C.
elegans Using RNAi of Genes on Chromosome III, Gonczy, P., Echeverri,
C., Oegema, K., Coulson, A. and Jones, S. J. M. et al., Nature, Vol. 408, pp.
331-336.
[H69] Graph Theory, Harary, F., Addison-Wesley, Reading, MA., 1969.
[HMJRS2000] Functional Discovery via a Compendium of Expression
Profiles, Hughes, T. R., Marton, M. J., Jones, A. R., Roberts, C. J. and
Stoughton, R. et al., Cell, Vol. 102, 2000, pp. 109-126.
[JTAOB2000] The Large-Scale Organization of Metabolic Networks,
Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. and Barabási, A. L., Nature,
Vol. 407, pp. 651-654.
[MN99] LEDA: a Platform for Combinatorial and Geometric Computing,
Mehlhorn, K. and Näher, S., Cambridge University Press, Cambridge, 1999.
[SSBRL99] The Berkeley Drosophila Genome Project Gene Disruption
Project: Single P-element Insertions Mutating 25% of Vital Drosophila
Genes, Spradling, A. C., Stern, D., Beaton, A., Rehm, E. J. and Laverty, T.
et al., Genetics, Vol. 153, 1999, pp. 135-177.
[THCCC99] Systematic Determination of Genetic Network Architecture,
Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J. and Church, G. M.,
Nature Genet., Vol. 22, 1999, pp. 281-285.
[W2000] Mutational Robustness in Genetic Networks of Yeast, Wagner, A.,
Nature Genet., Vol. 24, 2000, pp. 355-361.
[W2001a] Genetic Networks Are Sparse: Estimates Based on a Large-Scale
Genetic Perturbation Experiment, Wagner, A., submitted, 2001.
[W2001b] The Yeast Protein Interaction Network Evolves Rapidly and
Contains Few Redundant Duplicate Genes, Wagner, A., Mol. Bio. Evol.,
Vol. 18, 2001, pp. 1283-1292.
[WF2001] The Small World Inside Large Metabolic Networks, Wagner, A.
and Fell, D., Proceedings of the Royal Society of London, Series B, Vol.
268, pp. 1803-1810.
[W97] The Structure and Dynamics of Small World Networks, Watts, D. J.,
PhD Dissertation, Cornell University, 1999.
[WSALA99] Functional Characterization of the S. cerevisiae Genome by
Gene Deletion and Parallel Analysis, Winzeler, E. A., Shoemaker, D. D.,
Astromoff, A., Liang, H. and Anderson, K. et al., Science, Vol. 285, pp.
901-906.