Information Extraction I: Kistler/Marais Web Language

Information extraction applications
• Find useful information
• Extract it into a form that can be processed
• Process it
• Present it back

A model of info-extraction applications
(From: Kistler/Marais, WWW7)
• Robustness is the key criterion
• Tricky part. Theoretically, this will be obviated by the “Semantic Web” and “Web Services”
• Not necessarily Web presentation

Example applications
• Shopping robots
• Personalized news
• Financial applications
  – Use free data on the Web
• Intra/extranets
  – Manufacturing info
  – Project info
• Meta-search engines
• Convert LaTeX2HTML-generated pages into printable form

Marais/Kistler Web Language
• Language for writing Web info-extraction applications
  – Like Perl LWP, but specialized
• Good for O(10K)-page applications
  – Manual/semi-automatic resource discovery
  – Manual (heuristics) for extraction

Challenges of info-extraction applications
• Web is unreliable
  – Internet failures
  – Site failures
• Resource-discovery problem
  – Where are pages with interesting data?
• Pages are unstructured
  – Difficult to reliably extract information
  – Pages change frequently

Rest of today’s lecture
• From Marais’ SRI talk (slide 12)
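
To make the find / extract / process / present model and the robustness concern concrete, here is a minimal sketch in plain Python rather than in the Web Language itself. Everything in it is an assumption made for illustration: the function names (`fetch_with_retry`, `extract_offers`, `cheapest`, `present`), the page layout matched by `PRICE_RE`, and the shopping-robot scenario are hypothetical and do not come from the Kistler/Marais material.

```python
import re
import time
from typing import Callable, List, Optional, Tuple

Offer = Tuple[str, float]


def fetch_with_retry(fetch: Callable[[str], str], url: str,
                     attempts: int = 3, delay: float = 0.0) -> Optional[str]:
    """Find step: call fetch(url), retrying when the network or site fails.

    Returns None if every attempt fails, so the caller can degrade
    gracefully instead of crashing.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:  # urllib's URLError is a subclass of OSError
            if attempt < attempts - 1:
                time.sleep(delay)
    return None


# Extraction heuristic (assumed layout): "<b>item name</b> $12.34".
# A hand-written pattern like this is exactly what breaks when pages change.
PRICE_RE = re.compile(r"<b>(?P<item>[^<]+)</b>\s*\$(?P<price>\d+(?:\.\d{2})?)")


def extract_offers(html: str) -> List[Offer]:
    """Extract step: turn unstructured HTML into (item, price) records."""
    return [(m.group("item").strip(), float(m.group("price")))
            for m in PRICE_RE.finditer(html)]


def cheapest(offers: List[Offer]) -> Optional[Offer]:
    """Process step: pick the lowest-priced offer, if any were found."""
    return min(offers, key=lambda offer: offer[1]) if offers else None


def present(offer: Optional[Offer]) -> str:
    """Present step: plain text here; it need not be a Web page."""
    if offer is None:
        return "no offers found"
    item, price = offer
    return f"{item}: ${price:.2f}"
```

The `fetch` argument is injected so that the retry logic can be exercised without a network; in a real run it could wrap `urllib.request.urlopen`. Note that when the page layout changes, `extract_offers` quietly returns an empty list, which is one way the "pages change frequently" challenge shows up in practice.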