Reading Report on Question Answering over Knowledge Base
Paper Reading, Sun Yawei (孙亚伟), ws.nju.edu.cn

Paper: Information Extraction over Structured Data: Question Answering with Freebase
Xuchen Yao (姚旭晨) and Benjamin Van Durme, Johns Hopkins University, ACL 2014
(Some screenshots in this report are taken from Xuchen Yao's slides; thanks to Xuchen Yao.)

Outline
- Introduction
- Approach
- Graph Features
- Relation Mapping
- Experiments
- Conclusion

Introduction

Question: What is the name of Justin Bieber's brother?
- How do people search for answers?
- How does a machine answer this question?

Dataset: Freebase. How does a machine know that Jaxon Bieber is the answer?

Two prominent challenges: model and data.
- The model challenge: finding the best meaning representation, converting it into a query, and executing the query on the KB.
- The data challenge: ontology or textual schema matching, i.e., mapping between KB relations and NL text (e.g., KB "male sibling" vs. question "brother").

Approach

The model challenge: a method from an IE perspective.
- Learn the pattern of QA pairs directly, without intermediate meaning representations.
- Combine discriminative features from the dependency parse of the question and the Freebase structure of answer candidates.
- Note: the information extraction perspective means first performing relatively coarse information retrieval to obtain the set of possible answer candidates, then attempting deeper analysis.

The data challenge: mapping between KB relations and NL text.
- Method: CluewebMapping, mined directly from ClueWeb (1 billion web pages).

Graph Features
- Question graph
- Freebase topic graph
- Feature production

Question Graph

Dependency-based features:
- Question word (qword): what/who/how many.
- Question focus (qfocus): a cue to the expected answer type (name/money/time).
- Question verb (qverb): extracted from the main verb of the question (is/play/take).
- Question topic (qtopic): used to find relevant Freebase pages.

Example: What[qword] is[qverb] the name[qfocus] of (Justin Bieber)[qtopic] brother?

Convert the dependency parse of "What is the name of Justin Bieber's brother?" into a question feature graph.

Freebase Topic Graph
- Given a topic, roll out the Freebase graph by choosing the nodes within a few hops of the topic node, forming a topic graph.
- Node features: the relations (with directions) and properties of each node.
- Which node is the answer?

Feature Production
- Combine question features and Freebase features (per node).
- This captures the association between question patterns and the answer node.

Relation Mapping

Goal: P(R|Q), finding the most likely relation a question prompts.
- Given a question Q with word vector w, find the relation R that maximizes P(R|Q).
- For instance, "Who is the father of King George VI?" maps to people.person.parents.
- More interestingly, "Who is the father of the Periodic Table?" maps to law.invention.inventor (thus we count each word in Q).
- How does a machine compute this mapping?

Formula: assume conditional independence between the words of Q and apply Naïve Bayes:
P(R|Q) ∝ P(R) ∏_i P(w_i|R)
A relation R is a concatenation of sub-relations, R = r = r1.r2.r3. ..., so further assume conditional independence between sub-relations and apply Naïve Bayes again. How do we estimate the prior and conditional probabilities?

To estimate the prior and conditional probabilities, we need a massive data collection:
- ClueWeb: 1 billion web pages.
- FACC1: the Freebase Annotation of the ClueWeb Corpus, version 1.
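As a minimal sketch of the Naïve Bayes relation ranking described above: the tiny corpus below is a handful of invented sentence-relation pairs standing in for the ClueWeb/FACC1 annotations, and the sub-relation decomposition is omitted for brevity. Counting relation annotations gives the prior P(R); counting words in the annotated sentences gives P(w|R), here with add-one smoothing.

```python
from collections import Counter, defaultdict
import math

# Toy "annotated" corpus of (sentence words, Freebase relation) pairs.
# These examples are invented for illustration only; the real model is
# estimated from 1 billion ClueWeb pages via the FACC1 annotations.
corpus = [
    ("who is the father of george".split(), "people.person.parents"),
    ("the father of the periodic table invented it".split(), "law.invention.inventor"),
    ("his father and mother raised him".split(), "people.person.parents"),
]

# Estimate P(R) and P(w|R) by counting.
rel_count = Counter(rel for _, rel in corpus)
word_count = defaultdict(Counter)
for words, rel in corpus:
    word_count[rel].update(words)
vocab = {w for words, _ in corpus for w in words}

def log_score(question, rel):
    """log P(R) + sum_i log P(w_i | R), with add-one smoothing."""
    logp = math.log(rel_count[rel] / sum(rel_count.values()))
    total = sum(word_count[rel].values())
    for w in question:
        logp += math.log((word_count[rel][w] + 1) / (total + len(vocab)))
    return logp

def best_relation(question):
    """Return the relation R maximizing P(R|Q) for the question words."""
    return max(rel_count, key=lambda rel: log_score(question, rel))

print(best_relation("who is the father of king george vi".split()))
# -> people.person.parents
print(best_relation("who is the father of the periodic table".split()))
# -> law.invention.inventor
```

Even on this toy data, the word "father" alone does not decide the relation; "periodic" and "table" pull the second question toward law.invention.inventor, which is exactly the behavior the slides motivate with the King George VI / Periodic Table contrast.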
Method:
- For each binary Freebase relation, find a collection of sentences, each of which contains both of its arguments.
- Then simply learn how the words in these sentences are associated with the relation, i.e., estimate P'(w|R) and P'(w|r).
- By counting how many times each relation R was annotated, estimate the priors P'(R) and P'(r).

Aligned result: CluewebMapping.

Experiments

Evaluation measure: F1.
Data: WebQuestions (5,810 questions).

Search:
- Freebase Search API: locate the exact Freebase topic node.
- Freebase Topic API: retrieve all relevant information, resulting in a topic graph.

Model tuning:
- QA over Freebase is treated as a binary classification task: for each node in the topic graph, extract features and judge whether it is the answer node.
- Model: L1-regularized logistic regression.

How much do additional features on the mapping between Freebase relations and the original question help? Three feature settings:
- Basic: feature production (question features and node features).
- "+word overlap": adds features on whether sub-relations overlap with the question.
- "+CluewebMapping": adds the rank of the relation predicted for the question by CluewebMapping.
The additional CluewebMapping features improved overall F1 by 5%.

Test results:
- Gold retrieval (assuming a perfect IR front end that ranks the correct topic node first).
- The Freebase Search API returned the correct topic node 95% of the time in its top 10 results.
- Final F1: 42.0%, against the previous best result of 31.4% (Berant et al., 2013), a 34% relative improvement.

Error Analysis
- List questions are harder to answer.
- The system is also weak on two types of questions:
- Questions with constraints on the topic: "What is the new orleans hornets?", "What was Reagan before president?"
Our features did not cover temporal constraints such as "new" and "before".
- Counting questions ("how many ..."), which require a special count() or argmax() operator over the answer candidates.

Conclusion

Contributions:
- An automatic method for question answering over a KB (Freebase).
- Model: combines question features with answer patterns described by Freebase.
- Data: CluewebMapping helps with mapping between NL words and KB relations (~3,000 Freebase relations <-> 10,000 words).

Future work:
- More complicated questions.
- Mapping words to more implicit or higher-order relations.

Acknowledgements
Questions from teachers and classmates are welcome.