Mining the Semantic Web:
Requirements for Machine Learning
Fabio Ciravegna, Sam Chapman
Presented by
Steve Hookway
10/20/05
What is the Semantic Web?
  A way to automate reasoning with web data
  RDF
    A uniform way to describe resources
    (subject, predicate, object)
  Ontology
    Hierarchical structure of data
    Property restrictions
    Implicit typing
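The triple model and the class hierarchy above can be sketched in a few lines of Python. This is a minimal illustration, not an RDF library: the `ex:` resource names and the `inferred_types` helper are hypothetical, and only `rdf:type` and `rdfs:subClassOf` are real RDF/RDFS vocabulary. It treats "implicit typing" as inferring types that follow from the hierarchy, which is one possible reading of the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical example data: one typed resource and a small class hierarchy.
TRIPLES = {
    Triple("ex:FabioCiravegna", "rdf:type", "ex:Researcher"),
    Triple("ex:Researcher", "rdfs:subClassOf", "ex:Person"),
    Triple("ex:Person", "rdfs:subClassOf", "ex:Agent"),
}


def inferred_types(resource, triples):
    """Return the asserted types of `resource` plus every superclass of them."""
    types = {
        t.obj
        for t in triples
        if t.subject == resource and t.predicate == "rdf:type"
    }
    frontier = list(types)
    while frontier:
        cls = frontier.pop()
        for t in triples:
            is_parent = t.subject == cls and t.predicate == "rdfs:subClassOf"
            if is_parent and t.obj not in types:
                types.add(t.obj)
                frontier.append(t.obj)
    return types


print(sorted(inferred_types("ex:FabioCiravegna", TRIPLES)))
```

Only the `ex:Researcher` type is asserted; the other two are derived by walking the hierarchy upward.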
Adding Meta-Data
  A prerequisite for Semantic Web (SW) is structured knowledge
  Manual Approach
    Too Much data
    Trust Issues
    Noise
  This process needs to be automated
Armadillo
  Automatically annotate web pages
  Validity based on a number of weak techniques
    Redundant Information
    Rating of Sources
    Context around a capture
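One way to picture how several weak signals can add up to a usable validity score is a noisy-OR combination. The sketch below is an assumption made for illustration, not Armadillo's actual scoring: the `Sighting` class, its two fields, and the `confidence` function are hypothetical names. Each sighting of a candidate fact contributes evidence weighted by how trusted its source is and how well the surrounding context fits; repeated sightings (redundancy) push the score up.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    source_rating: float  # trust in the source, between 0 and 1 (assumed scale)
    context_score: float  # fit of the surrounding context, between 0 and 1


def confidence(sightings):
    """Noisy-OR: probability that at least one sighting is reliable."""
    miss = 1.0
    for s in sightings:
        miss *= 1.0 - s.source_rating * s.context_score
    return 1.0 - miss


one = [Sighting(0.5, 1.0)]
two = one + [Sighting(0.5, 1.0)]
print(confidence(one), confidence(two))
```

A single mediocre sighting stays weak, while independent repeats raise the score, which is the intuition behind relying on redundant information.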
(LP)² - Extraction of knowledge
  Makes use of Natural Language Processing (NLP)
(LP)²
  Induce tagging rules
    Generalize NLP and keep best rules
    Remove covered instances from pool
    High Precision, Low Recall
  Contextual Tagging
    Recovers rules and constrains their application
  Correction and Validation
    Shifts tags to correct position (within d spaces)
    Validation
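The rule induction step (generalize, keep the best rules, remove covered instances) follows the shape of a sequential covering loop. The sketch below is a simplified stand-in, not the (LP)² implementation: instances are reduced to sets of hypothetical feature strings such as `prev=at`, a rule is a set of features that must all be present, and `induce_rules`, `generalizations`, and `min_precision` are names introduced here for illustration.

```python
from itertools import combinations


def matches(rule, features):
    """A rule fires when all of its conditions appear in the instance."""
    return rule <= features


def generalizations(seed):
    """Yield every non-empty subset of the seed's features, most general first."""
    items = sorted(seed)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def induce_rules(instances, min_precision=0.9):
    """instances: list of (frozenset_of_features, is_tag_position)."""
    pool = [f for f, positive in instances if positive]
    rules = []
    while pool:
        seed = pool[0]
        best, best_cover = None, 0
        for rule in generalizations(seed):
            hits = [positive for f, positive in instances if matches(rule, f)]
            precision = sum(hits) / len(hits)
            cover = sum(1 for f in pool if matches(rule, f))
            if precision >= min_precision and cover > best_cover:
                best, best_cover = rule, cover
        if best is None:
            pool.pop(0)  # no acceptable rule for this seed; move on
            continue
        rules.append(best)
        # Remove covered instances from the pool before picking the next seed.
        pool = [f for f in pool if not matches(best, f)]
    return rules


EXAMPLES = [
    (frozenset({"prev=at", "is_digit"}), True),
    (frozenset({"prev=at", "is_digit", "next=pm"}), True),
    (frozenset({"prev=in", "is_digit"}), False),
    (frozenset({"prev=at", "is_cap"}), False),
]
print(induce_rules(EXAMPLES))
```

The precision threshold is what produces the "High Precision, Low Recall" behavior: over-general rules that also fire on negative examples are rejected, so some true positions may remain uncovered.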
Heterogeneity
  Armadillo
    Uses weak NLP
    Uses intra-document relation recognition
  Requirements
    Must adapt to different document types
    Relation Extraction
Bootstrapping Learning
  Armadillo
    Unsupervised approach – user only validates
    User cannot drive system towards interesting documents and facts
  Requirements
    Identify triples
    Goal: Bootstrap learning on a large scale
    User needs a role to guide learning
Content Cleaning and Normalization
  Armadillo
    Noise added during unsupervised (LP)²
    Use the multiple weak evidence to help avoid poor seeds
  Requirements
    Handle noisy training data
Conclusion
  Semantic Web
    Meta-Data
  Armadillo – a tool for IE
    Evidence Building and Validation
    Extraction of knowledge (LP)²
  A survey of requirements in mining web content for SW meta-data