Question Answering & Linked Data
Wang Yong

Content
• Overview of QA System
• Template-based Question Answering
• Open Question Answering Over Multiple Knowledge Bases
• Structured Data and Inference in DeepQA
• Conclusion

General Structure of QA System [1]
[Pipeline figure: a natural language question passes through question analysis and query construction, is matched against the data and scored, and answers are retrieved. The pipeline draws on linguistic tools and resources, an index, and the data sources (corpora, KBs, ontologies).]

Main challenges
• Variability of natural language
  ▫ How can you tell if you have the flu?
  ▫ What are signs of the flu?
• Complexity of natural language
  ▫ Of current U.N. member countries with 4-letter names, the one that is first alphabetically.
  ▫ Who produced the most films?

Main challenges
• Gap between natural language and data sources
  ▫ String differences: "wife of", "husband of" → dbo:spouse
  ▫ Structure differences: "Who are the great-grandchildren of Bruce Lee?" must be answered through a chain of dbo:child properties
• Quality and heterogeneity of data sources
  ▫ Completeness and accuracy (e.g., Open Information Extraction data)
  ▫ Different schemas: dbo:location, dbo:headquarter, dbo:locationCity

Template-based Question Answering [2]

Motivation
• Traditional methods map a natural language question to a triple-based representation
  ▫ Who wrote The Neverending Story?
  ▫ <person; wrote; Neverending Story>
• Some questions cannot be represented this way
  ▫ Which cities have more than three universities?
  ▫ <cities; more than; three universities>

  SELECT ?y WHERE {
    ?x rdf:type onto:University .
    ?x onto:city ?y .
  }
  GROUP BY ?y
  HAVING (COUNT(?x) > 3)

Solution
• SPARQL templates that mirror
  ▫ the syntactic structure of the natural language question
  ▫ its domain-independent expressions (e.g., "more than N")

  Which y p more than N x?

  SELECT ?y WHERE {
    ?x rdf:type ?c .
    ?x ?p ?y .
  }
  GROUP BY ?y
  HAVING (COUNT(?x) > N)

Implementation
• Lexicalized Tree Adjoining Grammar (LTAG)
• Discourse Representation Structure (DRS)
  ▫ based on manually compiled grammars and rules
[Pipeline figure: the natural language input is parsed with the LTAG grammar into a derivation tree; syntactic and semantic construction produce a DRS, which after scope resolution is translated into a formal query.]

Experiment
• 50 questions from the QALD benchmark
  ▫ 11 questions are not in the analysis scope
  ▫ 5 questions cannot be parsed (unknown syntactic constructions, uncovered domain-independent expressions), e.g., Who has been the 5th president of the United States of America?
  ▫ 19 have the correct answer, 2 are almost correct
  ▫ 13 are wrong or under the threshold
• Main problems
  ▫ entity identification (Give me all movies with Tom Cruise.)
  ▫ query selection

Open Question Answering Over Multiple Knowledge Bases [3]

Motivation
• One knowledge base cannot answer all questions
• Open question answering needs information from different knowledge bases
• Natural language has high variability
• Different knowledge bases use different knowledge representations

Solution
• Scope: simple factoid questions
• Paraphrase to overcome natural language variability
• Rewrite to match the KB schema
• Express the question as triples to utilize all KBs
  ▫ What fruits are a source of vitamin C?
  ▫ ?x : (?x, is-a, fruit) (?x, source of, vitamin c)

  SELECT t0.arg1 FROM triples AS t0, triples AS t1
  WHERE keyword-match(t0.rel, "is-a")
    AND keyword-match(t0.arg2, "fruit")
    AND keyword-match(t1.rel, "source of")
    AND keyword-match(t1.arg2, "vitamin c")
    AND string-similarity(t0.arg1, t1.arg1) > 0.9

Implementation
[Pipeline figure, shown here with its running example:]
• Question: How can you tell if you have the flu?
• Paraphrase (5 million operators mined from WikiAnswers): What are signs of the flu?
• Parse (10 high-precision, manually created templates): ?x : (?x, sign of, the flu)
• Rewrite (74 million operators mined from corpora): ?x : (the flu, symptoms, ?x)
• Execute (against 1 billion assertions)
• Answer: (the flu, symptoms include, chills)
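To make the pipeline above concrete, the following Python sketch chains the four stages on the flu example. It is only an illustration of the idea, not the OQA implementation: the paraphrase table, the single parse template, the rewrite table, and the tiny assertion store are invented stand-ins for the millions of mined operators and the billion-assertion store described above.

# Minimal sketch of an OQA-style pipeline: paraphrase -> parse -> rewrite -> execute.
# All tables below are toy placeholders; the real system mines ~5M paraphrase operators,
# uses ~10 hand-written parse templates, ~74M rewrite operators, and ~1B assertions.
import re

# Paraphrase operators (the real ones are mined from WikiAnswers question clusters).
PARAPHRASES = {
    "how can you tell if you have the flu?": "what are signs of the flu?",
}

# One hand-written parse template: "what are <rel>s of <arg>?" -> (?x, <rel> of, <arg>).
PARSE_TEMPLATE = re.compile(r"what are (?P<rel>\w+?)s? of (?P<arg>.+)\?")

# Rewrite operators mapping question relations onto the KB schema (mined from corpora).
REWRITES = {
    "sign of": "symptoms",
}

# Toy assertion store standing in for the combined KBs (Freebase, Open IE, Probase, NELL).
ASSERTIONS = [
    ("the flu", "symptoms include", "chills"),
    ("the flu", "caused by", "influenza viruses"),
]

def paraphrase(question: str) -> str:
    """Map the question to a known paraphrase, or keep it unchanged."""
    return PARAPHRASES.get(question.lower(), question.lower())

def parse(question: str):
    """Turn a paraphrased question into a triple query (?x, rel, arg)."""
    m = PARSE_TEMPLATE.match(question)
    if not m:
        return None
    return ("?x", m.group("rel") + " of", m.group("arg"))

def rewrite(query):
    """Rewrite the question relation to the KB relation, flipping the argument order."""
    x, rel, arg = query
    kb_rel = REWRITES.get(rel, rel)
    return (arg, kb_rel, x)   # (?x, sign of, the flu) -> (the flu, symptoms, ?x)

def execute(query):
    """Return assertions whose subject and relation loosely match the rewritten query."""
    subj, rel, _ = query
    return [a for a in ASSERTIONS if a[0] == subj and rel in a[1]]

if __name__ == "__main__":
    q = "How can you tell if you have the flu?"
    p = paraphrase(q)          # "what are signs of the flu?"
    query = parse(p)           # ("?x", "sign of", "the flu")
    kb_query = rewrite(query)  # ("the flu", "symptoms", "?x")
    print(execute(kb_query))

Running the sketch prints [('the flu', 'symptoms include', 'chills')], mirroring the answer in the example above.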
Experiment
• KBs
  ▫ Freebase, Open IE, Probase and NELL
• Training over question–answer pairs
  ▫ linear scoring function
  ▫ latent-variable structured perceptron algorithm
• Question–answer pairs
  ▫ WebQuestions, TREC, WikiAnswers

Structured Data and Inference in DeepQA [4]

Motivation
• Unstructured data
  ▫ broad coverage
  ▫ low precision
• Structured data
  ▫ incomplete
  ▫ high precision
  ▫ has formal semantics
  ▫ supports logical reasoning (common-sense reasoning / implicit evidence)

Temporal and geospatial reasoning
• Detect time relations
  ▫ TLink, birthDate, deathDate
• Check temporal compatibility
  ▫ birthDate < TLink < deathDate
• Detect spatial relations
  ▫ relative direction, border, containment, near, far
• Convert to geo-coordinates from DBpedia to compute distances or other geospatial relations
  ▫ the symmetry of the borders relation
  ▫ the transitivity of the containment relation
• Evaluation
  ▫ 1% to 2% improvement in accuracy

Taxonomic reasoning
• Check the candidate answer's type
  ▫ data sources: DBpedia, YAGO
  ▫ candidate answer: an entity resource
  ▫ question lexical answer type (LAT): a class in the type system, mapped via WordNet, a domain-specific type-mapping file, and statistical relatedness
  ▫ scoring by type relation: equivalent/subclass, disjoint, sibling, superclass, …
• Evaluation
  ▫ 3% to 4% improvement in accuracy

Conclusion
• Analyzing complex questions is a nontrivial problem
  ▫ still relies on manually compiled grammars and rules
• The mapping between natural language and KBs has a significant impact on accuracy
  ▫ semantically light expressions ("in") and structure differences ("gf") are hard to handle
• Structured data is incomplete and needs help from unstructured data

References
• [1] Unger, C., Freitas, A., Cimiano, P.: An Introduction to Question Answering over Linked Data. In: Reasoning Web. Reasoning on the Web in the Big Data Era, LNCS, pp. 100–140 (2014)
• [2] Unger, C., Bühmann, L., Lehmann, J., et al.: Template-based Question Answering over RDF Data. In: Proceedings of the 21st International Conference on World Wide Web (WWW), pp. 639–648. ACM (2012)
• [3] Fader, A., Zettlemoyer, L., Etzioni, O.: Open Question Answering over Curated and Extracted Knowledge Bases. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD) (2014)
• [4] Kalyanpur, A., et al.: Structured Data and Inference in DeepQA. IBM Journal of Research and Development 56(3/4) (2012)