Information Extraction I: Kistler/Marais Web Language
Information extraction applications
• Find useful information
• Extract it into a form that can be processed
• Process it
• Present it back
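
The four steps above can be sketched as a small pipeline. This is only an illustrative sketch in Python, not the Web Language the lecture covers: the in-memory `PAGES` dictionary, the price pattern, and all function names are assumptions made up for this example, and a real application would fetch pages over the network.

```python
import re

# Hypothetical in-memory "web": URL -> page text (stands in for real fetching).
PAGES = {
    "http://example.com/a": "<html><body>Widget: $19.99</body></html>",
    "http://example.com/b": "<html><body>About us</body></html>",
    "http://example.com/c": "<html><body>Gadget: $5.00</body></html>",
}

# Assumed page convention for this sketch: "Name: $12.34".
PRICE = re.compile(r"(\w+): \$(\d+\.\d{2})")


def find(pages):
    """Step 1: keep only pages that look like they carry useful data."""
    return {url: html for url, html in pages.items() if "$" in html}


def extract(html):
    """Step 2: pull (name, price) pairs into a form that can be processed."""
    return [(name, float(price)) for name, price in PRICE.findall(html)]


def process(records):
    """Step 3: process the records, here by sorting on price."""
    return sorted(records, key=lambda record: record[1])


def present(records):
    """Step 4: present the result back as plain-text lines."""
    return "\n".join(f"{name}: ${price:.2f}" for name, price in records)


def run(pages):
    """Chain the four steps over a collection of pages."""
    records = []
    for html in find(pages).values():
        records.extend(extract(html))
    return present(process(records))
```

Calling `run(PAGES)` lists the two priced items, cheapest first. Keeping each step a separate function means the fragile part (extraction) can be changed without touching the rest.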
A model of info-extraction applications
(Figure: model of an info-extraction application. From: Kistler/Marais, WWW7)
Figure annotations:
• Robustness is the key criterion
• Tricky part. Theoretically, this will be obviated by “Semantic Web” and “Web Services”
• Not necessarily Web presentation
Example applications
• Shopping robots
• Personalized news
• Financial applications
– Use free data on Web
• Intra/extranets
– Manufacturing info
– Project info
• Meta-search engines
• Convert LaTeX2HTML-generated pages into printable form
Marais/Kistler Web Language
• Language for writing Web info extraction applications
– Like Perl LWP, but specialized
• Good for O(10K)-page applications
– Manual/semi-automatic resource discovery
– Manual (heuristics) for extraction
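
To make "semi-automatic resource discovery" concrete, here is a minimal sketch in Python (not the Web Language itself, whose syntax is not shown on these slides). It collects the links on a page and keeps those whose URL contains a hand-picked keyword. The `LinkCollector` class, the `discover` function, and the keyword heuristic are assumptions for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def discover(html, base_url, keywords):
    """Return links whose URL contains any of the manually chosen keywords."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return [
        url for url in parser.links
        if any(keyword in url.lower() for keyword in keywords)
    ]
```

The keyword list is the manual part and the link walk is the automatic part. A heuristic like this is cheap to write but will miss pages whose URLs do not follow the expected naming.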
Challenges of info-extraction applications
• Web is unreliable
– Internet failures
– Site failures
• Resource-discovery problem
– Where are pages with interesting data?
• Pages are unstructured
– Difficult to reliably extract information
– Pages change frequently
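
One common way to cope with an unreliable Web is to retry failed fetches with a growing delay. The following Python sketch shows that idea under stated assumptions: `fetch_with_retry` and its parameters are hypothetical names, and the `fetch` callable is supplied by the caller (for example a thin wrapper around `urllib.request.urlopen`). It is not taken from the Kistler/Marais language.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url), retrying on OSError with exponential backoff.

    fetch is any callable that returns page content or raises OSError
    (urllib.error.URLError is a subclass of OSError).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError as err:
            last_error = err
            # Wait before the next try, but not after the final failure.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Retrying helps with transient Internet and site failures only. It does nothing for pages that have changed layout, which still need extraction rules that fail loudly rather than return wrong data.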
Rest of today’s lecture
• From Marais’ SRI talk (slide 12)