iCrawl – Master Thesis and Hiwi Jobs

Context
- iCrawl Project – A novel approach for the creation of high-quality Web archives
  - Easy-to-use and extensible Web archive crawler framework
  - Usable also by non-technicians
- User Interface
  - Key component to interact with the crawler
  - Setting up crawls
  - Maintaining and monitoring crawls
  - Quality assurance of crawls

Thomas Risse 23/05/17 1

Master Thesis: Crawl Specification Wizard

Problem Statement
- The quality of a Web archive depends on the quality of the crawl specification
- Crawl specifications for focused crawls are complex and hard to define (initial starting points, good descriptions of terms, entities, etc.)
- Crawl specifications are similar to search engine queries, but more complex

Aim of the Master Thesis
- Development of a semi-automatic tool that learns the intention of a crawl
  - Based on a set of reference pages or on search engine results
  - Iterative and interactive process
  - Requires analysis and extraction of information from Web pages

Requirements
- Interest in doing cool things in the context of a research project
- A “feeling” for good design and user-friendliness
- Programming skills in Java

Contact: Thomas Risse (L3S), [email protected]

Master Thesis: Entity-centric Linked Data Crawler

Topic
- Development of an entity-centric Linked Data crawler
- Automatic collection of metadata for Linked Data sources to enable crawler prioritization
- Integration of the crawler with the iCrawl platform for integrated crawling of Web pages and Linked Data

Requirements
- Good grades in the IR-related courses
- Good programming skills in Java
- Interest in research-oriented projects

Contact: Elena Demidova, [email protected]

Hiwi Job in the Context of Web Archiving

Topic
- User interface development for setting up, maintaining and monitoring crawls
- Easy to use (also for non-computer scientists)
- Near-real-time information

Requirements
- Interest in doing cool things in the context of a research project
- A “feeling” for good design and user-friendliness
- Programming skills in Java

Contact: Thomas Risse (L3S), [email protected]