Download Structure discovery in PPI networks using pattern

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Drift plus penalty wikipedia , lookup

Network tap wikipedia , lookup

Airborne Networking wikipedia , lookup

Dijkstra's algorithm wikipedia , lookup

Transcript
Structure discovery in PPI networks
using pattern-based network
decomposition
Philip Bachman and Ying Liu
BIOINFORMATICS System biology
Vol.25 no. 14 2009
May 15, 2009
Outline
•
•
•
•
•
•
Introduction
Graph-Theoretic concepts
The algorithm
Algorithm and results
Pattern-based network decomposition
Conclusion
Introduction
• The large, complex networks of interactions
between proteins provide a lens through
which one can examine the structure and
function of biological systems.
• Previous analyses of these networks:
– Large-scales statistical analysis of holistic network
properties
– Small-scale analysis of local topological features
Introduction
• Investigation of meso-scale network structure
has been hindered by the computational
complexity of structure search in networks.
• In this article, an efficient algorithm for
performing sub-graph isomorphism queries on
a network and show its computational
advantage.
Graph-Theoretic concepts
• A graph G as G=(V,E)
• Given an ordering o of vertices, one can produce
the adjacency matrix Mo
• If two graph Gi and Gj are isomorphic, then there
exist orderings oi of Gi and oj of Gj such that
Moi=Moj
• A canonical labeling:
–
Graph-Theoretic concepts
– One canonical label :
interpreting all possible orderings of a graph’s
vertices and selecting the ordering omax such that
Momax is maximized.
Graph-Theoretic concepts
• An automorphism of a graph G
– mapping of the graph’s vertices onto each other
– any two oi and oj such that Moi=Moj
• Aut(G) : the set of all automorphisms of G
• The automorphism orbit A, of a graph G
– the maximal sets of the vertices in G that are
closed under all mappings in Aut(G)
– Refer to the automorphism orbit to which a vertex
v belongs as AutOrb(v).
Graph-Theoretic concepts
– Define a canonical ID for each automorphism
orbit of a graph G :
examining the canonical matrix for G , their order
of first appearance in the matrix
The algorithm
• To solve the following problem:
Given a query graph Gq, find all sub-graphs in
a source graph Gs that are isomorphic to Gq
• Three components of this algorithm :
– Basic backtracking search used in previous motif
discovery algorithm.
– Enhanced by the second, a sysmmetry-breaking
technique present by (Grochow and Kellis, 2007).
– Third components, a constraint set for each vertex
v is added.
The algorithm
• Constraint set :
– For each vertex v in Gq, the set of all vertices in Gs
that are potentially mappable onto v under some
mapping of Gq onto an isomorphic sub-graph of Gs.
– quick elimination of candidate vertex pairs during
the backtracking search.
– Generating these constraint sets requires a
constraint database that is populated by
performing an initial set of specific sub-graph
queries.
The algorithm : Backtracking
• Find all sub-graphs in a source graph Gs that are
isomorphic to a query graph Gs
• For each vertex v in Gs and each vertex u in Gq
– A mapping , I, of Gq onto Gs is initialized for each such
(v,u) pair
– Recursively extend I using each pair of compatible
vertices v’ in Gs and u’ in Gq such that v’ and u’ are
both adjacent to vertices to vertices already in I.
– An isomorphism has been found once I maps all
vertices in Gq onto compatible vertices in Gs.
The algorithm : symmetry-breaking
• The number of mappings is equal to the
number of automorphisms in Aut(Gq), which
grow factorially with the number of vertices in
Gq.
• We can avoid these ‘repeated’ mappings by
using symmetry-breaking constraints
The algorithm : Generating constraint
sets for the source graph
• The database used for per-vertex constraint
generation contains entries for each query
graph Gq :
– The entry for Gq is referenced by lc(Gq)
• The entry for each Gq is a list of constraints
sets, one for each automorphism orbit of Gq
– The entries for its automorphism orbits are
referenced by their respective canonical IDs.
The algorithm : Generating constraint
sets for the source graph
• The constraint sets for each query graph Gq can
be generated by performing a slight modified
version of the basic backtracking
– Checking each vertex vs in Gs against a representative
vq from each automorphism orbit A of Gq
– If an isomorphic mapping exits which maps vq onto vs,
then vs is added to the constraint set for A
• This allows the skipping of any future checks for
some vs against some vq where vs is already in the
constraint set for AutOrb(vq).
The algorithm : Generating constraint
sets for the source graph
• The database can be bootstrapped :
If sub-graph with n vertices will be sampled,
starting with all connected graphs of some
small size k, and then using the generated
database to accelerate the generation of the
database entries for all connected graphs of
size k+1 , and so on, until size n is reached.
The algorithm : Vertex constraints for a
query sub-graph
The algorithm : Vertex constraints for a
query sub-graph
• The intersection of constraint performed in step 4
does not preclude any viable mapping of Gq onto Gs.
• Source graph vertices fundamentally incompatible
with a query graph vertex will fail to appear in one or
more of the constraint sets associated with that
vertex.
•
Algorithm and results
Queries comprising form 8 to 19 vertices
The ratio between the times for unconstrained and constrained searches across
set of 100 randomly generated queries at each edge density and query size
Algorithm and results
The cumulative times for processing all 400 queries of various densities at each
query size
Algorithm and results
• Due to the presence of pathological queries that required an impractical
time to complete.
• In this plot, the cumulative time for unconstrained searches appears to
plateau due to our imposed limit on search time, with the maximum
possible cumulative time at eatch graph size being 400,000 s, as we set
the individual query time-out to 1000s.
• It was common for queries with 15 or more nodes and with edge densities
of 0.6 or 0.8 to time-out during unconstrained search.
• Only one constrained search out of the 3600 performed timed-out.
• Unconstrained search time was strongly affected by the edge-density of a
query. The effect of density were drastically reduced when using our
constraints.
Algorithm and results
• The time required to fill the database
– CPU time for exhaustive search of all seven and eight
vertex graph.
– The database was bootstrapped, and time required to
do so was 4477 s. This is significantly < 37361 s.
Pattern-based network decomposition
• An applications to using sub-graph queries
that allows a PPI network to be decomposed
into sub-network exhibiting specific structural
patterns.
– Create generalizations of their topological
structure
– Search for all appearances of each generalization
in the source network
Generating and using query patterns
• Given a specific form of inter-module
interaction, deriving generalized query
patterns:
– A query pattern may be created at the smallest
size which adequately represents some desired
structural features
– Allow to ‘expand’ to cover larger sub-graph, which
still fit the interaction pattern that it was designed.
Generating and using query patterns
Top row : a four-node clique covering all edges and nodes in a dense eight-node
graph
Bottom row : overlapping four-node cliques expanding to fully cover overlapping
dense six-node graphs
Generating and using query patterns
• How a dense module may be covered by a small clique query
pattern and how a pair of interacting/overlapping modules
may be covered by a small pair of overlapping cliques
Illustration of four sample pattern generalizations
• A pattern generalization created using this property of graphs
can be used to filter the source network.
Generating and using query patterns
• Filtering a source network using a generalized
query pattern :
– All instances of the query in the source network
are found
– All edges and vertices from the source network
that do not appear in some instance of the query
are removed
– Inspect all regions of the source network that
have topologies matching the pattern that the
query was designed to represent
An application of pattern-based
network
• Core PPI network for Yeast
– 17000 interactions between 5000 proteins
– All of the generalizations that we searched for
appeared as significantly enriched motif with respect
to an ensemble of 100 random networks, indicating
potential biological significance.
– We occasionally had to determine boundaries
between overlapping groups of proteins
• Determined these boundaries heuristically, by looking for
‘bottlenecks’ between groups of densely interacting proteain.
An application of pattern-based
network
• The most immediate results and the largest extracted
network(EN) came from using a four-node clique as
the query pattern.
An application of pattern-based
network
• These components likely represent the most
evolutionarily conserved core of the Yeast PPI
network, It was shown by Wuchty et al. (2003)
that proteins participating in four-node cliques
have an evolutionary conservation rate that is
over 400 times higher than that which would
be expected.
Conclusion
• An algorithm for one such problem, sub-graph
isomorphism, that is more efficient than
previous algorithms.
• In concert with suitable query patterns that
exploit some simple properties of graphs,
query-based graph search can be used to
examine network structure at a scale that
reveals relationships within and between
groups of interacting proteins.