國立雲林科技大學
National Yunlin University of Science and Technology
Mining Generalized Associations of Semantic
Relations from Textual Web Content
Tao Jiang, Ah-Hwee Tan, Senior Member, IEEE, and Ke Wang
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING,
VOL. 19, NO. 2, 2007.
Presenter : Wei-Shen Tai
Advisor : Professor Chung-Chian Hsu
2007/1/10
N.Y.U.S.T.
I. M.
Outline
Introduction
Resource Description Framework and RDF
Schema
Semantic relation extraction
Mining generalized associations from RDF metadata
Experiments
Conclusion
Comments
Motivation
Text mining problem
As terms are treated as individual items in such
simplistic representations, terms lose their
semantic relations and texts lose their original
meanings.
Two short text documents with different
meanings can be represented in a similar bag of
keywords.
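The point above can be illustrated with a minimal sketch. The two sentences and the helper name bag_of_keywords are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import Counter

def bag_of_keywords(text):
    """Lowercase the text, split on whitespace, and count terms.

    Word order is discarded, which is exactly the weakness being illustrated.
    """
    return Counter(text.lower().split())

# Two hypothetical short documents with opposite meanings.
doc_a = "France defeat Italy"
doc_b = "Italy defeat France"

# The bags are identical although the documents say different things.
same_bag = bag_of_keywords(doc_a) == bag_of_keywords(doc_b)
print(same_bag)
```

Because both documents map to the same bag, any mining step that sees only the bag cannot tell who defeated whom.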
Objective
Semantic relation associations
An intermediate representation that expresses
the semantic relations between the concepts in
texts.
Major processes
Semantic relation extraction
The extracted relations are encoded in RDF statements.
Semantic relation associations
Meaningful and detailed patterns can be discovered
from text using the conceptual graph representation.
Resource Description Framework and
RDF Schema
Resource Description
Framework (RDF)
For describing and interchanging
semantic metadata.
RDF statements
<subject, predicate, object>
{France, Defeat, Italy, World Cup,
Quarter Final}
RDF Schema
Defines RDF vocabularies for
constructing RDF statements.
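A rough sketch of how <subject, predicate, object> statements could be held in memory is shown here. The Statement class is a hypothetical helper, and using only the first three elements of the slide's example as subject, predicate, and object is an assumption.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """A hypothetical in-memory form of one RDF statement (a triple)."""
    subject: str
    predicate: str
    object: str

# Assumed mapping of the slide's example onto a triple.
s1 = Statement("France", "Defeat", "Italy")
# Same terms, reversed roles: a different statement.
s2 = Statement("Italy", "Defeat", "France")

print(s1 == s2)            # the triples differ
print(set(s1) == set(s2))  # the unordered term sets do not
```

Unlike a bag of keywords, the triple keeps the role of each term, so the two statements stay distinguishable.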
Term Taxonomy Construction
Term similarity measure
Incremental term taxonomy construction
RDF model
RDF vocabulary
= {E, P, H, domain, range},
where
E = {a, b, c, d, e, f, ab, cd, ef, cdef},
P = {p},
domain = {a, b, ab},
and range = {c, d, e, f, cd, ef, cdef}
Generalized relation hierarchy
e.g. {< a, p, ef >,< b, p, c >}
is a relationset and it is also a
generalized relationset of
{< a, p, e >,< b, p, c >}.
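The generalization check in this example can be sketched as follows. The parent links (a and b under ab, c and d under cd, e and f under ef, cd and ef under cdef) are an assumption read off the term names, and the set-level test is a simplified reading of the definition, not the paper's exact formulation.

```python
# Assumed taxonomy, inferred from the term names on the slide.
PARENT = {
    "a": "ab", "b": "ab",
    "c": "cd", "d": "cd",
    "e": "ef", "f": "ef",
    "cd": "cdef", "ef": "cdef",
}

def ancestors_or_self(term):
    """Return the term together with all of its ancestors."""
    found = {term}
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found

def covers(general, specific):
    """True if `general` equals `specific` or generalizes its subject/object."""
    (gs, gp, go), (ss, sp, so) = general, specific
    return (gp == sp
            and gs in ancestors_or_self(ss)
            and go in ancestors_or_self(so))

def is_generalized_relationset(x, y):
    """Simplified test: x differs from y, each relation in x covers some
    relation in y, and each relation in y is covered by some relation in x."""
    return (x != y
            and all(any(covers(g, s) for s in y) for g in x)
            and all(any(covers(g, s) for g in x) for s in y))

X = {("a", "p", "ef"), ("b", "p", "c")}
Y = {("a", "p", "e"), ("b", "p", "c")}
print(is_generalized_relationset(X, Y))
print(is_generalized_relationset(Y, X))
```

Only the single predicate p appears in the example, so the sketch compares predicates for equality and ignores any predicate hierarchy.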
Overgeneralization
Example
{< a, p, e >,< b, p, c >},
{< a, p, ef >,< b, p, c >},
{< Score, agent, F:Inzaghi >,< Assist, agent, RuiCosta >}
{< Score, agent, AttackPlayer >, < Assist, agent, RuiCosta >}
Definition
A frequent relationset X is overgeneralized if there
exists a specialized relationset Y of X with supp(X) =
supp(Y).
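The definition can be tried on toy data, as in the sketch below. The documents, the OtherStriker entity, and the taxonomy placing both players under AttackPlayer are hypothetical; support is counted as a number of documents rather than a fraction, and only one given specialization Y is compared instead of searching all of them.

```python
# Hypothetical taxonomy: both players are assumed to be attack players.
PARENT = {"F:Inzaghi": "AttackPlayer", "OtherStriker": "AttackPlayer"}

def ancestors_or_self(term):
    """Return the term together with all of its ancestors."""
    found = {term}
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found

def covers(general, specific):
    """True if `general` equals `specific` or generalizes its subject/object."""
    (gs, gp, go), (ss, sp, so) = general, specific
    return (gp == sp
            and gs in ancestors_or_self(ss)
            and go in ancestors_or_self(so))

def support(relationset, docs):
    """Count documents in which every relation of the set is matched."""
    return sum(
        all(any(covers(g, r) for r in doc) for g in relationset)
        for doc in docs
    )

score_inzaghi = ("Score", "agent", "F:Inzaghi")
score_attack = ("Score", "agent", "AttackPlayer")
score_other = ("Score", "agent", "OtherStriker")  # hypothetical relation
assist = ("Assist", "agent", "RuiCosta")

X = {score_attack, assist}   # generalized relationset
Y = {score_inzaghi, assist}  # one specialization of X

# Toy data A: every document matching X also matches Y.
docs_a = [{score_inzaghi, assist}, {score_inzaghi, assist}, {score_inzaghi}]
# Toy data B: one more document matches X only through the other player.
docs_b = docs_a + [{score_other, assist}]

print(support(X, docs_a), support(Y, docs_a))  # equal: X adds nothing over Y
print(support(X, docs_b), support(Y, docs_b))  # X is strictly more frequent
```

In toy data A the generalized pattern has the same support as its specialization, so under the definition it is overgeneralized; in toy data B it covers an additional document and is therefore not overgeneralized with respect to Y.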
Overgeneralization Reduction
Each node is a unique generalization closure.
If a closure and its children have the same support, this
closure is not closed and can be pruned.
Such a nonclosed closure is pruned by replacing it with
the union of its equal-support children.
GP (Generalized Pattern)-Close Algorithm
GP-Close
Initializes the enumeration
tree to contain only the root
closure.
Closure-Enumeration
Starting from the root
closure of the empty set,
the closure enumeration
process recursively
traverses the closure
enumeration tree to
discover closed
generalization closures.
Experiments
Data sets
The online database of the International Policy Institute
for Counter-Terrorism (ICT) including suicide bombing
(ICT-SB) and car bombing (ICT-CB) documents.
Analysis of Patterns
71.8 percent (56 out of 78) of the patterns are
commonsense patterns already known by people.
12.8 percent (10 out of 78) of the patterns are
identified as previously unknown and not useful.
15.4 percent (12 out of 78) of the patterns are
previously unknown and potentially useful.
Conclusions
Semantic relation extraction
Discovering knowledge from free-form textual
Web content.
GP-Close algorithm
Based on mining closed generalization closures.
Substantially reduces pattern redundancy and improves
mining performance.
Comments
Advantage
A novel idea for semantic relation association extraction.
GP-Close is applicable for reducing the pattern search space.
Drawback
The example depictions are not kept consistent in their data.
The diagrams of child-closure pruning and sub-tree pruning
are confusing to the reader.
Application
Data mining applications in semantic relation association.