國立雲林科技大學
National Yunlin University of Science and Technology
Mining Generalized Associations of Semantic
Relations from Textual Web Content
Tao Jiang, Ah-Hwee Tan, Senior Member, IEEE, and Ke Wang
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING,
VOL. 19, NO. 2, 2007.
Presenter : Wei-Shen Tai
Advisor : Professor Chung-Chian Hsu
2007/1/10
N.Y.U.S.T.
I. M.
Outline







Introduction
Resource Description Framework and RDF Schema
Semantic relation extraction
Mining generalized association from RDF metadata
Experiments
Conclusion
Comments
Motivation

Text mining problem


In bag-of-keywords representations, terms are treated as individual items, so terms lose their semantic relations and texts lose their original meanings.
Two short text documents with different meanings can be represented by a similar bag of keywords.
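The second point can be illustrated with a minimal sketch. The two sentences and the `bag_of_keywords` helper are made up for illustration and are not taken from the paper; the tokenizer is a plain whitespace split.

```python
from collections import Counter


def bag_of_keywords(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count terms.

    Word order (and therefore who did what to whom) is discarded.
    """
    return Counter(text.lower().split())


# Two short documents with opposite meanings (illustrative data).
doc_a = "France defeated Italy"
doc_b = "Italy defeated France"

bag_a = bag_of_keywords(doc_a)
bag_b = bag_of_keywords(doc_b)

# The texts differ, but their keyword bags are identical.
same_bag = bag_a == bag_b
print(doc_a != doc_b, same_bag)
```

A representation that keeps the relation between the terms, such as the statements introduced on the following slides, would tell these two documents apart.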
Objective

Semantic relation associations

An intermediate representation that expresses
the semantic relations between the concepts in
texts.
Major processes

Semantic relation extraction


The extracted relations are encoded in RDF statements.
Semantic relation associations

Meaningful and detailed patterns can be discovered
from text using the conceptual graph representation.
Resource Description Framework and RDF Schema

Resource Description Framework (RDF)

For describing and interchanging semantic metadata.
RDF statements

<subject, predicate, object>
{France, Defeat, Italy, World Cup, Quarter Final}
RDF Schema

Defines RDF vocabularies for constructing RDF statements.
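As a rough sketch of these two ideas, a statement can be held as a three-field tuple and checked against a vocabulary. The `Statement` type, the `DOMAIN` and `RANGE` tables, and the `conforms` helper are all hypothetical names chosen for illustration; the sample statements borrow the Score/agent example that appears later in these slides, and no real RDF library is used.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement <subject, predicate, object>."""
    subject: str
    predicate: str
    obj: str


# Hypothetical vocabulary: which subjects and objects each predicate accepts.
DOMAIN = {"agent": {"Score", "Assist"}}
RANGE = {"agent": {"F:Inzaghi", "RuiCosta", "AttackPlayer"}}


def conforms(stmt: Statement) -> bool:
    """True if the statement's subject and object fit the predicate's
    assumed domain and range."""
    return (stmt.subject in DOMAIN.get(stmt.predicate, set())
            and stmt.obj in RANGE.get(stmt.predicate, set()))


statements = [
    Statement("Score", "agent", "F:Inzaghi"),
    Statement("Assist", "agent", "RuiCosta"),
]
all_valid = all(conforms(s) for s in statements)
print(all_valid)
```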
Term Taxonomy Construction

Term similarity measure

Incremental term taxonomy construction
RDF model

RDF vocabulary

Σ = {E, P, H, domain, range},
where
E = {a, b, c, d, e, f, ab, cd, ef, cdef},
P = {p},
domain = {a, b, ab},
and range = {c, d, e, f, cd, ef, cdef}

Generalized relation hierarchy

e.g., {<a, p, ef>, <b, p, c>} is a relationset, and it is also a generalized relationset of {<a, p, e>, <b, p, c>}.
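The example above can be checked with a small sketch. The parent links in `PARENT` are an assumption inferred from the entity names (ab generalizes a and b, cdef generalizes cd and ef, and so on); the slide does not list the hierarchy H explicitly. The test used in `is_generalized_relationset` is one plausible reading of "generalized relationset", not the paper's formal definition.

```python
# Assumed hierarchy H: child -> parent, inferred from the entity names.
PARENT = {"a": "ab", "b": "ab", "c": "cd", "d": "cd",
          "e": "ef", "f": "ef", "cd": "cdef", "ef": "cdef"}


def ancestors_or_self(entity):
    """Return the entity together with every generalization above it."""
    result = {entity}
    while entity in PARENT:
        entity = PARENT[entity]
        result.add(entity)
    return result


def generalizes(general, specific):
    """True if subject and object of `general` equal or are ancestors of
    those of `specific`; predicates must match exactly in this sketch."""
    gs, gp, go = general
    ss, sp, so = specific
    return (gp == sp
            and gs in ancestors_or_self(ss)
            and go in ancestors_or_self(so))


def is_generalized_relationset(general_set, specific_set):
    """Every specific statement is covered by some general statement, every
    general statement covers something, and the two sets differ."""
    covers_all = all(any(generalizes(g, s) for g in general_set)
                     for s in specific_set)
    no_extra = all(any(generalizes(g, s) for s in specific_set)
                   for g in general_set)
    return covers_all and no_extra and general_set != specific_set


specific = {("a", "p", "e"), ("b", "p", "c")}
general = {("a", "p", "ef"), ("b", "p", "c")}
print(is_generalized_relationset(general, specific))
print(is_generalized_relationset(specific, general))
```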
Overgeneralization

Example

{<a, p, e>, <b, p, c>}, e.g., {<Score, agent, F:Inzaghi>, <Assist, agent, RuiCosta>}
{<a, p, ef>, <b, p, c>}, e.g., {<Score, agent, AttackPlayer>, <Assist, agent, RuiCosta>}
Definition

A frequent relationset X is overgeneralized if there exists a specialized relationset Y of X with supp(X) = supp(Y).
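A minimal sketch of this definition follows, under several assumptions: the hierarchy fragment in `PARENT`, the toy document collections, and the helper names are all made up for illustration; support is taken to be the number of documents in which every statement of the relationset matches; and the minimum-support test for "frequent" is left out for brevity.

```python
# Assumed fragment of the hierarchy: e and f are both generalized by ef.
PARENT = {"e": "ef", "f": "ef"}


def ancestors_or_self(entity):
    """Return the entity together with every generalization above it."""
    result = {entity}
    while entity in PARENT:
        entity = PARENT[entity]
        result.add(entity)
    return result


def matches(pattern_stmt, data_stmt):
    """True if the pattern statement equals or generalizes the data statement."""
    ps, pp, po = pattern_stmt
    ds, dp, do = data_stmt
    return (pp == dp
            and ps in ancestors_or_self(ds)
            and po in ancestors_or_self(do))


def support(relationset, documents):
    """Count documents in which every pattern statement matches some statement."""
    return sum(
        all(any(matches(p, d) for d in doc) for p in relationset)
        for doc in documents
    )


def is_overgeneralized(x, specializations, documents):
    """X is overgeneralized if some specialization Y has supp(Y) == supp(X)."""
    supp_x = support(x, documents)
    return any(support(y, documents) == supp_x for y in specializations)


X = {("a", "p", "ef"), ("b", "p", "c")}
Y = {("a", "p", "e"), ("b", "p", "c")}

# Toy data: ef only ever occurs as e, so X adds nothing over Y.
docs_1 = [{("a", "p", "e"), ("b", "p", "c")},
          {("a", "p", "e"), ("b", "p", "c")},
          {("b", "p", "c")}]
# Toy data: one document uses f, so X now covers more than Y.
docs_2 = docs_1 + [{("a", "p", "f"), ("b", "p", "c")}]

print(is_overgeneralized(X, [Y], docs_1))
print(is_overgeneralized(X, [Y], docs_2))
```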
Overgeneralization Reduction

Each node is a unique generalization closure


If a closure and its children have the same support, this closure is not closed and can be pruned.
Such a nonclosed closure is pruned by replacing it with the union of its equal-support children.
GP (Generalized Pattern)-Close Algorithm

GP-Close


Initializes the enumeration tree to contain only the root closure.
Closure-Enumeration

Starting from the root closure of the empty set, the closure enumeration process recursively traverses the closure enumeration tree to discover closed generalization closures.
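The recursive traversal can be sketched roughly as follows. This is not the GP-Close algorithm itself: the tree is supplied ready-made instead of being generated from data, the `Closure` class, labels, and supports are invented for illustration, and a node is treated as closed when no child has the same support, which is one reading of the pruning rule stated earlier in these slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Closure:
    """A node of a toy enumeration tree (hypothetical structure)."""
    label: str
    support: int
    children: list[Closure] = field(default_factory=list)


def enumerate_closed(node, min_support, found=None):
    """Depth-first traversal that collects labels of frequent, closed nodes."""
    if found is None:
        found = []
    if node.support < min_support:
        # Assumes children never have higher support than their parent,
        # so the whole subtree can be skipped.
        return found
    # Closed here means: no child reaches the same support as this node.
    if all(child.support < node.support for child in node.children):
        found.append(node.label)
    for child in node.children:
        enumerate_closed(child, min_support, found)
    return found


# Toy tree rooted at the closure of the empty set (made-up supports).
root = Closure("root", 4, [
    Closure("A", 4, [
        Closure("B", 3, [Closure("D", 3)]),
        Closure("C", 2, [Closure("E", 1)]),
    ]),
])

closed = enumerate_closed(root, min_support=2)
print(closed)
```

In this toy run, root is skipped because A has the same support, B is skipped because D has the same support, and E falls below the minimum support.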
Experiments

Data sets


The online database of the International Policy Institute for Counter-Terrorism (ICT), including suicide bombing (ICT-SB) and car bombing (ICT-CB) documents.
Analysis of Patterns



71.8 percent (56 out of 78) of the patterns are commonsense patterns already known by people.
12.8 percent (10 out of 78) of the patterns are identified as previously unknown and not useful.
15.4 percent (12 out of 78) of the patterns are previously unknown and potentially useful.
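The three percentages can be rechecked from the counts given above. The category labels in the dictionary are shortened for illustration; only the counts come from the slide.

```python
# Pattern counts as reported on the slide.
counts = {
    "commonsense": 56,
    "unknown, not useful": 10,
    "unknown, potentially useful": 12,
}

total = sum(counts.values())

# Share of each category, in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share} percent ({counts[name]} out of {total})")
```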
Conclusions

Semantic relation extraction


Discovering knowledge from free-form textual
Web content.
GP-Close algorithm


Based on mining closed generalization closures.
Substantially reduces pattern redundancy and improves mining performance.
Comments

Advantage

A novel idea for semantic relation association extraction.
GP-Close is applicable for reducing the pattern search space.
Drawback

The example depictions are not kept consistent with the data.
The diagrams of child-closure pruning and sub-tree pruning are confusing to the reader.
Application

Data mining applications in semantic relation association.