Template-based Authoring
Knowledge Systems Laboratory, Stanford

Project Goals
- Assist analyst in everyday work
- Knowledge Authoring Tools to assist in:
  - Research for reports
  - Produce reports
  - Consume reports
  - Share reports
- Our solution: Semantic Web Templates

Semantic Web Templates
- Knowledge representation and semantics are key for information exchange
- Creation and maintenance of knowledge must be transparent
- Automate extraction of knowledge
- Enhance knowledge retrieval methods

Semantic Web Templates
- Similar to MS Word templates
  - Different templates for different tasks
- Word templates can place restrictions on text
  - Very primitive, such as length of text
  - Simplistic patterns such as “phone number”
  - No concepts such as “color” or “country”
- One template, many documents
- HTML templates are very common today
  - Many web sites use an SQL database as back end: template + SQL → HTML

Semantic Web Templates
- An HTML file with additional tags
- Tags specify:
  - Where particular knowledge is stated
  - What kind of knowledge it is
  - Where it came from, if applicable
  - References to an entity or relation
  - Repetitive regions of text

Goal: Assist Research
- Unstructured Extraction
  - Sort through buckets of data to find gold
  - Entity recognition
  - Relation recognition
- Semistructured Extraction
  - Utilize repetitive patterns within a page
  - Use similar pages to extract more data
  - Robust despite changing pages and data

Unstructured Extraction
- Natural language processing
- News feeds
- Indexing, storage, retrieval
- Plugin architecture
  - Rover news crawler
  - Web Services
- Our system, a collaboration with IBM via NIMD
- Political news articles from Yahoo!
  - 22,000 articles, ~8,500 concepts, ~1,000 relations
- Used in authoring tools

Unstructured Extraction
- Pattern-based system
- Leverages “hints” for the reader in news articles
  - Text: British Prime Minister Tony Blair
  - Hints: <type Country> <subClassOf Politician> <unknown name>
  - Inferred: “Tony Blair” is a Prime Minister who represents the Country “England”.
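The hint-based pattern above can be illustrated with a minimal sketch. It is not the KSL system's implementation: the lexicons `DEMONYMS` and `POLITICIAN_TITLES`, the `Fact` triple type, and `extract_facts` are hypothetical, and a single regular expression stands in for the project's NLP pipeline. The sketch matches the “<demonym> <title> <name>” hint, records the name's role and country, and reports names not yet in the knowledge base as new terms.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicons standing in for the knowledge base (KB).
# Demonym -> country; "British" -> "England" follows the slide's example.
DEMONYMS = {"British": "England", "French": "France"}
# Titles assumed to be subclasses of Politician.
POLITICIAN_TITLES = {"Prime Minister", "President", "Foreign Minister"}


@dataclass(frozen=True)
class Fact:
    """A (subject, predicate, object) triple destined for the KB."""
    subject: str
    predicate: str
    obj: str


def _alternation(words) -> str:
    # Longest first, so multi-word titles win over shorter overlapping ones.
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


# Hint pattern: <demonym> <title> <two or more capitalized words>
HINT_PATTERN = re.compile(
    rf"\b(?P<demonym>{_alternation(DEMONYMS)})\s+"
    rf"(?P<title>{_alternation(POLITICIAN_TITLES)})\s+"
    r"(?P<name>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)"
)


def extract_facts(text: str, known_entities=frozenset()):
    """Return (facts, new_terms) for every hint pattern found in `text`."""
    facts, new_terms = [], []
    for m in HINT_PATTERN.finditer(text):
        name = m.group("name")
        facts.append(Fact(name, "isA", m.group("title")))
        facts.append(Fact(name, "represents", DEMONYMS[m.group("demonym")]))
        if name not in known_entities and name not in new_terms:
            new_terms.append(name)  # unknown name: candidate new KB entity
    return facts, new_terms


facts, new_terms = extract_facts(
    "British Prime Minister Tony Blair met French President Jacques Chirac.",
    known_entities={"Jacques Chirac"},
)
for fact in facts:
    print(fact)
print("new terms:", new_terms)
```

A real system would also need the title-to-Politician subclass link stored as its own fact and far broader name recognition; the fixed regex is only meant to show how reader-oriented hints carry type information.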
- System runs daily on Yahoo! political news
  - Highlights known terms in green
  - Highlights new terms in red
  - Used to create search index, maintain KB
- Demo

Semi-structured Extraction
- Extract, produce knowledge
- Initial model is Domain Authorities
  - Enhance KB with ground facts
  - Strong for relations and breadth of data
  - Leverages work of others
  - Makes use of SQL databases
- Future work is a wide-scale web of trust

Semi-structured Extraction
- Site Registry
  - By description and property
  - CIA World Fact Book has data about items of type <Country>
  - CIA World Fact Book has properties <population>, <hasNeighbor>, <hasMembership>, etc.
- Demo

Semi-structured Extraction
- Publishing
  - Human editing is good for high-level concepts
  - Automated techniques are good for relations, ground-level facts, and massive repetition
- Rover web crawler
- Template construction is currently manual
  - With a critical mass of data, templates could be discovered

Enhanced Document Retrieval
- Search based on concept
- Find articles about…
  - Membership: Scottie Pippen, Trailblazers
  - Membership: Osama bin Laden, al-Qaeda
  - Subgroups: Ramadan Shallah, Islamic Jihad, al-Qaeda
- Semantic search

Enhanced Document Retrieval
- Document Augmentation
  - Sidebar acts as a glossary as you read
  - Pre-fetch data the user is likely to want
  - Adapt to user preferences and activities
  - Deeper understanding for the user: answers questions raised while reading

Enhanced Document Retrieval
- Search Augmentation
  - Google assumes users only want documents
  - Provide answers along with documents
  - Use query-term denotation to target results more closely
    - “Browns Ferry” is a garden park
    - “Browns Ferry” is a nuclear power plant
  - Automates what people do with IR systems
    - Append hints about the type of term being sought

Search Augmentation
[Demo screenshots: Basic Search, Followup Data, Disambiguation, Relations]

Basic Question Answering
- Automated techniques for ground facts
- Use reasoners for higher-level facts
- Tie in with KSL AQUAINT work
- Feedback, direction from user
- Structure of knowledge allows a simple form of question answering

Basic Question Answering
- Multiple views into data
  - Browse interface: ugly, but a complete view
  - Activity-based knowledge presentation: search, document augmentation
- Future work: accept user feedback, customization, preferred sources

Basic Question Answering
- Query by example
  - Users create many similar documents
  - These are targeted to an activity
  - Use past work to speed present work
  - Users create templates that present the data they find interesting in a way they find convenient

Query by Example
[Demo screenshots]

Goal: Produce Reports
- Most reports are made with Office
- Enhance with semantic awareness
- Provide seamless access to knowledge
  - Word processor, spreadsheet
- Transparent maintenance, creation
- Low overhead of operation
- Avoid centralized approach
  - Contrast with relational database

Word Processing
- Creation of new data
- Semantic scan
  - Annotation of text
  - Like spell check or grammar check
  - Automatically identifies referenced entities
  - Learns new entities and relations between entities
- User manually adjusts system
  - User adds new data
  - System gets smarter over time

Word Processing
- Create data via entry into templates
- Create new templates
  - For others
  - For personal use
- Extend templates with new entry areas
- Enhance analyst’s view
  - Semantic search, document augmentation
  - Sidebar boxes are templates too

Word Processing
[Demos: Semantic Scan, Annotation, Knowledge Creation]

Spreadsheets
- Spreadsheets are key tools in analysis
  - Tabular format and UI are both intuitive
  - Sorting, basic math functions
- We add semantics:
  - New formula type: “Get Data”
  - New formula type: “Put Data”
  - Summarization, new views

Spreadsheets
- Example scenario
  - Suppose SARS were found to affect Asian Americans more than other groups.
  - The analyst wants to determine, on that basis, which states are most at risk
  - Knowledge from the Census gives the Asian American population of each state as a percentage

Spreadsheets
[Demo screenshots]

Goal: Consume Reports
- Verify others’ data against yours
- Incorporate others’ results into your knowledge base, tracking sources
- Maintain data
  - Change notification
  - Document updates with new data
  - Versioning of documents and data

Goal: Share Reports
- Easily exchangeable via e-mail
- Truth maintenance techniques
- Multiple views into data
- Leverage domain expertise
  - The missile guy has a KB, …
- Collaboration, trust levels
  - Colleagues disagree, sources are unreliable

Conclusion
- KD-D effort is focused on authoring and analysis tasks
- Leverage automated techniques to complement manual techniques
- System gets smarter as it’s used
- Tie in with commonly used applications
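The Spreadsheets example scenario (ranking states by Asian American population share) can be sketched in Python to show what “Get Data” and “Put Data” formulas might do. This is an illustration under assumptions, not the project's spreadsheet integration: `KnowledgeBase`, `get_data`, `put_data`, `rank_states_by`, and the property names `asianAmericanPercent` and `sarsRiskRank` are hypothetical, the formulas are modeled as plain functions, and the state names and percentages are placeholders rather than Census figures. Each stored fact keeps its source, in the spirit of tracking sources when incorporating others' results.

```python
from typing import Optional


class KnowledgeBase:
    """Hypothetical in-memory fact store keyed by (entity, property).

    Each value is stored together with its source so results can be traced back.
    """

    def __init__(self) -> None:
        self._facts: dict[tuple[str, str], tuple[object, str]] = {}

    def put(self, entity: str, prop: str, value: object, source: str) -> None:
        self._facts[(entity, prop)] = (value, source)

    def get(self, entity: str, prop: str) -> Optional[object]:
        fact = self._facts.get((entity, prop))
        return None if fact is None else fact[0]

    def source(self, entity: str, prop: str) -> Optional[str]:
        fact = self._facts.get((entity, prop))
        return None if fact is None else fact[1]


def get_data(kb: KnowledgeBase, entity: str, prop: str):
    """Model of a "Get Data" formula: pull one KB value into a cell (None if absent)."""
    return kb.get(entity, prop)


def put_data(kb: KnowledgeBase, entity: str, prop: str, value, source: str):
    """Model of a "Put Data" formula: publish a cell's value back to the KB."""
    kb.put(entity, prop, value, source)
    return value  # the cell still displays the value it published


def rank_states_by(kb: KnowledgeBase, states: list[str], prop: str):
    """Fill one column with Get Data, then sort it descending; missing values go last."""
    rows = [(state, get_data(kb, state, prop)) for state in states]
    known = sorted((r for r in rows if r[1] is not None), key=lambda r: r[1], reverse=True)
    missing = [r for r in rows if r[1] is None]
    return known + missing


kb = KnowledgeBase()
# Placeholder numbers for illustration only; these are NOT Census figures.
for state, pct in [("State A", 4.0), ("State B", 12.0), ("State C", 1.5)]:
    kb.put(state, "asianAmericanPercent", pct, source="Census (placeholder)")

ranking = rank_states_by(kb, ["State A", "State B", "State C", "State D"], "asianAmericanPercent")
print(ranking)  # [('State B', 12.0), ('State A', 4.0), ('State C', 1.5), ('State D', None)]

# Publish the analyst's conclusion so other documents can reuse it.
put_data(kb, ranking[0][0], "sarsRiskRank", 1, source="analyst spreadsheet")
```

Keeping missing values at the bottom rather than raising mirrors how a spreadsheet shows an empty cell when a lookup finds nothing; a real integration would also need change notification when the underlying KB facts are updated.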