LTS: Discriminative Subgraph Mining
by Learning from Search History
Ning Jin, Wei Wang
ICDE 2011
Outline
 Motivation
 Objectives
 Methodology
 Experiments
 Conclusions
Motivation
 Complex structures in many scientific applications can
be represented by graphs, and many data mining and
database problems on graph databases, such as graph
indexing and graph classification, need discriminative subgraph
patterns.
 Discriminative subgraphs can be used to characterize
complex graphs, construct graph classifiers and
generate graph indices.
Objectives
 Discriminative subgraph pattern mining is usually solved in
one of two ways: a greedy approach or a
branch-and-bound approach.
 The greedy approach attempts to reach a locally optimal subgraph
as fast as possible.
 The branch-and-bound approach prunes the search space using
an estimated upper bound of the scores.
 The LTS (Learning To Search) algorithm integrates both
approaches with novel probing and pruning techniques.
Methodology
 Fast-probe algorithm
 Upper-bound estimation algorithm
 LTS algorithm
Definitions
 Graph
A graph is denoted as $G = (V, E)$, where $V$ is a set of
nodes and $E$ is a set of edges.
 Positive graph set and negative graph set
We assume that the positive set is the interesting set,
denoted as $G_p$, and the negative set is the decoy,
denoted as $G_n$.
Cont.
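 Subgraph Isomorphism
The label of node $u$ is denoted by $l(u)$ and
the label of an edge $(u, v)$ is denoted by $l((u, v))$.
For two graphs $G = (V, E)$ and $G' = (V', E')$, if
there is an injection $f: V \to V'$ such that
$l(u) = l(f(u))$ for every $u \in V$, and
$(f(u), f(v)) \in E'$ with $l((u, v)) = l((f(u), f(v)))$ for every $(u, v) \in E$,
then $G$ is a subgraph of $G'$ ($G \subseteq G'$), or
$G'$ is a supergraph of $G$, and $G'$ supports $G$.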
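The definition above can be checked directly by trying every injection between the node sets. The following is a minimal brute-force sketch, assuming undirected graphs without self-loops; the function name `is_subgraph` and the `(node_labels, edge_labels)` representation are illustrative choices, not something the slides prescribe. It is exponential in the pattern size and only meant to make the definition concrete.

```python
from itertools import permutations

def is_subgraph(small, big):
    """Return True if `small` is a labelled subgraph of `big`.

    Each graph is a pair (node_labels, edge_labels):
      node_labels: dict mapping node -> label
      edge_labels: dict mapping frozenset({u, v}) -> label (undirected, no self-loops)
    """
    s_nodes, s_edges = small
    b_nodes, b_edges = big
    s_list = list(s_nodes)
    # Try every injection f from the small graph's nodes into the big graph's nodes.
    for image in permutations(b_nodes, len(s_list)):
        f = dict(zip(s_list, image))
        # Node labels must be preserved by f.
        if any(s_nodes[u] != b_nodes[f[u]] for u in s_list):
            continue
        # Every edge must map onto an existing edge carrying the same label.
        def edge_ok(edge, label):
            u, v = tuple(edge)
            mapped = frozenset((f[u], f[v]))
            return mapped in b_edges and b_edges[mapped] == label
        if all(edge_ok(edge, label) for edge, label in s_edges.items()):
            return True
    return False

# A labelled triangle and two single-edge patterns (toy data).
triangle = (
    {1: "C", 2: "C", 3: "O"},
    {frozenset((1, 2)): "s", frozenset((2, 3)): "d", frozenset((1, 3)): "s"},
)
double_bond = ({"a": "C", "b": "O"}, {frozenset(("a", "b")): "d"})
triple_bond = ({"a": "C", "b": "O"}, {frozenset(("a", "b")): "t"})

print(is_subgraph(double_bond, triangle))
print(is_subgraph(triple_bond, triangle))
```

In this toy data the triangle supports the C-O edge labelled "d" but not the one labelled "t", because no edge in the triangle carries that label.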
Cont.
 Frequency
Given a graph set, the frequency of a subgraph pattern
is defined as:
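The formula itself is not legible in this transcript. As a hedged illustration, the sketch below assumes the common convention that frequency is the fraction of graphs in the set that support the pattern. The names `frequency` and `toy_supports` are hypothetical, and the toy support test (edge-set containment) is only a stand-in for real subgraph isomorphism.

```python
def frequency(pattern, graph_set, supports):
    """Fraction of graphs in `graph_set` that support `pattern` (assumed definition)."""
    if not graph_set:
        return 0.0
    hits = sum(1 for graph in graph_set if supports(graph, pattern))
    return hits / len(graph_set)

def toy_supports(graph, pattern):
    # Stand-in for subgraph isomorphism: graphs and patterns are sets of
    # labelled edges, and a graph supports a pattern when it contains them all.
    return pattern <= graph

positives = [
    frozenset({"C-C", "C-O"}),
    frozenset({"C-C", "C-N"}),
    frozenset({"C-O"}),
]
freq_cc = frequency(frozenset({"C-C"}), positives, toy_supports)
print(freq_cc)
```

Passing the support test in as a parameter keeps the frequency computation independent of how subgraph isomorphism is actually implemented.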
Cont.
 Discrimination Score
The more discriminative the pattern, the larger the
discrimination score. We define the discrimination score as:
Cont.
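 Lineage
The lineage of pattern $g_k$ is a sequence of patterns
$g_1, g_2, \ldots, g_k$, where
$g_{i+1}$ can be directly extended from $g_i$.
 Score record
The score record for $g_k$ is the sequence of scores for
the patterns in the lineage:
$score(g_1), score(g_2), \ldots, score(g_k)$.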
Fast-probe algorithm
 Fast-probe maintains a list of candidate subgraph patterns and generates
a good sample of discriminative subgraphs to
facilitate the subsequent branch-and-bound search.
 The candidate list is initialized with all single-edge subgraph
patterns in the positive set $G_p$.
 For each graph $G$ in $G_p$,
update the optimal pattern and
optimal score for $G$.
 Add one more edge to each candidate and repeat; the algorithm terminates when the
candidate list becomes empty, yielding an optimal pattern for
each $G$ in $G_p$.
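The probing loop above can be sketched on a toy pattern space. This is a simplified illustration, not the authors' implementation: graphs and patterns are reduced to sets of edge labels, support is set containment, the score function `toy_score` (positive frequency minus negative frequency) is a placeholder because the slide's formula is not legible here, and the rule that a candidate is extended only if it improved the optimal score of some positive graph is an assumption made so that the candidate list eventually empties.

```python
def toy_score(pattern, positives, negatives):
    # Placeholder score (assumption): positive frequency minus negative frequency.
    fp = sum(pattern <= g for g in positives) / len(positives)
    fn = sum(pattern <= g for g in negatives) / len(negatives)
    return fp - fn

def fast_probe(positives, negatives, score):
    """Greedy probing sketch; returns {positive graph index: (pattern, score)}."""
    best = {}
    # Start from all single-edge patterns that occur in the positive set.
    candidates = {frozenset([e]) for g in positives for e in g}
    seen = set(candidates)
    while candidates:
        survivors = []
        # Sorted iteration keeps the result deterministic when scores tie.
        for pattern in sorted(candidates, key=sorted):
            s = score(pattern, positives, negatives)
            improved = False
            for i, g in enumerate(positives):
                if pattern <= g and (i not in best or s > best[i][1]):
                    best[i] = (pattern, s)
                    improved = True
            # Assumed retention rule: keep only candidates that became the
            # optimal pattern for at least one positive graph.
            if improved:
                survivors.append(pattern)
        # Extend each surviving candidate by one more edge.
        candidates = set()
        for pattern in survivors:
            for g in positives:
                if pattern <= g:
                    for e in g - pattern:
                        extended = pattern | {e}
                        if extended not in seen:
                            seen.add(extended)
                            candidates.add(extended)
    return best

positives = [frozenset("ab"), frozenset("abc")]
negatives = [frozenset("a"), frozenset("bc")]
result = fast_probe(positives, negatives, toy_score)
for i, (pattern, s) in sorted(result.items()):
    print(i, sorted(pattern), s)
```

In this toy run the single edges `a` and `b` each appear in one negative graph, while the two-edge pattern `{a, b}` appears in both positives and no negative, so it ends up as the optimal pattern for both positive graphs.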
Upper-bound estimation algorithm
 The discriminative subgraph mining process generates
many score records, which can be organized into a prefix tree
called the prediction tree.
 The root node is labelled with 0.0, and each tree node is
associated with the maximum score in the sub-tree rooted
at that node.
 The maximum score at
each tree node serves as an estimated
upper bound for the corresponding part of the search space.
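A prefix tree with a per-node sub-tree maximum can be sketched as follows. The class name `PredictionTree` and its methods are illustrative. The sketch assumes scores are rounded to a fixed precision before insertion so that equal prefixes share nodes, and it returns `None` for a prefix never seen in the search history, since the slides do not say how that case is handled.

```python
class PredictionTree:
    """Prefix tree over score records; each node keeps the max score in its sub-tree."""

    def __init__(self, score=0.0):
        self.score = score        # score labelling this node (the root uses 0.0)
        self.max_score = score    # maximum score in the sub-tree rooted here
        self.children = {}        # next score in a record -> child node

    def insert(self, record):
        """Add one score record (the scores along one lineage, in order)."""
        path = [self]
        node = self
        for s in record:
            child = node.children.get(s)
            if child is None:
                child = node.children[s] = PredictionTree(s)
            node = child
            path.append(node)
        # Walk back up: each node sees the best score at or below it on this path.
        best = float("-inf")
        for n in reversed(path):
            best = max(best, n.score)
            n.max_score = max(n.max_score, best)

    def upper_bound(self, prefix):
        """Estimated upper bound for a pattern whose score record so far is `prefix`."""
        node = self
        for s in prefix:
            node = node.children.get(s)
            if node is None:
                return None  # prefix not in the search history (assumed handling)
        return node.max_score

tree = PredictionTree()
for record in [(0.2, 0.5, 0.9), (0.2, 0.5, 0.4), (0.2, 0.1), (0.3, 0.6)]:
    tree.insert(record)

print(tree.upper_bound((0.2, 0.5)))
print(tree.upper_bound((0.2, 0.1)))
print(tree.upper_bound((0.7,)))
```

A branch-and-bound search can then compare the estimate for a candidate's score record against the best score found so far and skip extensions whose estimate cannot beat it.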
LTS algorithm
 LTS first uses fast-probe to collect score records and
generates the search history, which includes a prediction tree
of score records and a [...].
 LTS utilizes a vector $F$ to keep track of the optimal pattern
for each positive graph: $F[i]$ stores the optimal pattern
for positive graph $G_i$.
 The candidate list is initialized with the optimal pattern found by the
fast-probe algorithm for each graph in the positive set $G_p$.
 For a candidate pattern $g$, LTS updates $F[i]$
if positive graph $G_i$ supports $g$ and the score of $g$
is greater than the score of $F[i]$.
 LTS terminates when the candidate list becomes empty and
returns the optimal patterns in $F$.
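The update rule for the vector of optimal patterns can be illustrated in isolation. This sketch covers only that rule, not candidate generation or pruning. The helper `update_optimal`, the parallel `scores` list, and the edge-set support test are assumptions introduced for the example.

```python
def update_optimal(F, scores, pattern, pattern_score, positives, supports):
    """Update F in place for one candidate; return True if any entry changed."""
    changed = False
    for i, graph in enumerate(positives):
        # Replace the stored pattern only when the graph supports the candidate
        # and the candidate scores strictly higher than the current optimum.
        if supports(graph, pattern) and pattern_score > scores[i]:
            F[i] = pattern
            scores[i] = pattern_score
            changed = True
    return changed

def toy_supports(graph, pattern):
    # Stand-in for subgraph isomorphism on sets of edge labels.
    return pattern <= graph

positives = [frozenset("ab"), frozenset("bc")]
F = [None] * len(positives)
scores = [float("-inf")] * len(positives)

first = update_optimal(F, scores, frozenset("b"), 0.4, positives, toy_supports)
second = update_optimal(F, scores, frozenset("ab"), 0.7, positives, toy_supports)
third = update_optimal(F, scores, frozenset("c"), 0.2, positives, toy_supports)
print(F, scores)
```

Here the candidate scores are supplied by hand; in a full search they would come from the discrimination score of each candidate pattern.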
Experiments
Conclusions
 In this paper, we investigate the feasibility of estimating
upper bounds for the discrimination scores of subgraph patterns
in discriminative subgraph mining by learning from
search history.
 On the more complex protein datasets, the branch-and-bound
search that follows fast-probe allows LTS to significantly
improve classification accuracy.