Learn Question Focus and Dependency Relations from Web Search Results for Question Classification
Wen-Hsiang Lu (盧文祥), [email protected]
Web Mining and Multilingual Knowledge System (WMMKS) Laboratory, Department of Computer Science and Information Engineering, National Cheng Kung University
2017/5/22

Research Interests
- Web Mining
- Natural Language Processing
- Information Retrieval

Research Issues
- Unknown term translation & cross-language information retrieval:
  - A Multi-Stage Translation Extraction Method for Unknown Terms Using Web Search Results
- Question answering & machine translation:
  - Using Web Search Results to Learn Question Focus and Dependency Relations for Question Classification
  - Using Phrase and Fluency to Improve Statistical Machine Translation
- User modeling & Web search:
  - Learning Question Structure based on Website Link Structure to Improve Natural Language Search
  - Improving Short-Query Web Search based on User Goal Identification
- Cross-language medical information retrieval:
  - MMODE: http://mmode.no-ip.org/
  - Example query: 雅各氏症候群 (Jacob's syndrome)

Outline
- Introduction
- Related Work
- Approach
- Experiment
- Conclusion
- Future Work

Introduction

Question Answering (QA) System
1. Question analysis: question classification, keyword extraction.
2. Document retrieval: retrieve related documents.
3. Answer extraction: extract an exact answer.

Motivation (1/3)
- Importance of question classification, as reported by Dan Moldovan [Moldovan 2000].

Motivation (2/3)
- Rule-based question classification: manual and unrealistic.
- Machine learning-based question classification, e.g., Support Vector Machines (SVMs):
  - Requires a large amount of training data.
  - Too many features may introduce noise.

Motivation (3/3)
- A new method for question classification:
- Observes useful features of questions.
- Solves the problem of insufficient training data.

Idea of Approach (1/4)
- Many questions have ambiguous question words, hence the importance of the Question Focus (QF).
- Use QF identification for question classification.

Idea of Approach (2/4)
- When there is not enough information to identify the type of the QF, use dependency features of the question: the dependency verb, dependency quantifier, and dependency noun.
- The question type is predicted from (unigram) and (bigram) semantic dependency relations.

Idea of Approach (3/4)
- Example (figure).

Idea of Approach (4/4)
- Use the QF and dependency features to classify questions.
- Learn the QF and other dependency features from the Web.
- Propose a Semantic Dependency Relation Model (SDRM).

Related Work

Rule-based Question Classification [Sutcliffe 2005; Kwok 2005; Riloff 2000]
- 5W (Who, When, Where, What, Why):
  - Who → Person; When → Time; Where → Location; Why → Reason.
  - What → a difficult type to map.

Machine Learning-based Question Classification
- Several methods based on SVMs [Zhang 2003; Suzuki 2003; Day 2005]:
  question → KDAG kernel → feature vector → SVM → question type.

Web-based Question Classification
- Use a Web search engine to identify the question type [Solorio 2004].
- Example: "Who is the President of the French Republic?"

Statistics-based Question Classification
- Language model for question classification [Li 2002].
- Too many features may introduce noise.
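As a concrete illustration of the rule-based 5W mapping discussed above, here is a minimal sketch. The rules and function name are hypothetical; as the slides note, "what" questions have no simple mapping and are left unresolved.

```python
# Hypothetical sketch of 5W rule-based question classification:
# map an English wh-word to a coarse answer type.

def classify_5w(question: str) -> str:
    """Return an answer type based on the leading wh-word."""
    q = question.lower()
    rules = [
        ("who", "Person"),
        ("when", "Time"),
        ("where", "Location"),
        ("why", "Reason"),
    ]
    for wh, qtype in rules:
        if q.startswith(wh):
            return qtype
    # "what" (and anything else) needs deeper analysis, per the slides.
    return "Unknown"

print(classify_5w("Who is the President of the French Republic?"))  # → Person
```

This also previews why purely rule-based methods are called "manual and unrealistic" in the slides: every new question pattern needs another handwritten rule.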
Approach

Architecture of Question Classification
- (figure)

Question Types
- Six types of questions: Person, Location, Organization, Number, Date, Artifact.

Basic Classification Rules
- We define 17 basic rules for simple questions.

Learning Semantic Dependency Features (1/3)
- Architecture for learning dependency features.
- Algorithm for extracting dependency features.

Learning Semantic Dependency Features (2/3)
- Architecture for learning dependency features (figure).

Learning Semantic Dependency Features (3/3)
- Algorithm for extracting dependency features (figure).

Question Focus Identification Algorithm (1/2)
- Algorithm (figure).

Question Focus Identification Algorithm (2/2)
- Example (figure).

Semantic Dependency Relation Model (SDRM) (1/12)
- Unigram-SDRM and Bigram-SDRM.

Semantic Dependency Relation Model (SDRM) (2/12)
- Unigram-SDRM: estimating P(C|Q) directly, where Q is the question and C is the question type, needs many training questions.

Semantic Dependency Relation Model (SDRM) (3/12)
- Unigram-SDRM: introduce a dependency class DC between C and Q, learned from Web search results:
  - P(DC|C): collect related search results for every question type.
  - P(Q|DC): use DC to determine the question type.

Semantic Dependency Relation Model (SDRM) (4/12)
- Unigram-SDRM (formula on slide).

Semantic Dependency Relation Model (SDRM) (5/12)
- Unigram-SDRM: Q = {QF, QD}, QD = {DV, DQ, DN}, where DV is the dependency verb, DQ the dependency quantifier, and DN the dependency noun.

Semantic Dependency Relation Model (SDRM) (6/12)
- Unigram-SDRM: DV = {dv1, dv2, …, dvi}, DQ = {dq1, dq2, …, dqj}, DN = {dn1, dn2, …, dnk}.
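The Unigram-SDRM decomposition above can be sketched in code. This is a hedged illustration, not the paper's implementation: it omits the P(DC|C) term (effectively treating one dependency class per question type), assumes independence of QF/DV/DQ/DN features given the class, and uses a simple additive smoothing constant of my own choosing. The class name, method names, and example terms are illustrative.

```python
# Sketch of Unigram-SDRM-style scoring: count QF and dependency
# features per question type from (Web-derived) training text, then
# classify a question by the product of smoothed feature probabilities.

from collections import defaultdict

class UnigramSDRM:
    def __init__(self, smoothing=1e-4):
        self.smoothing = smoothing
        # counts[qtype][kind][term]: feature occurrences per question type,
        # where kind is one of "QF", "DV", "DQ", "DN".
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        self.totals = defaultdict(lambda: defaultdict(int))

    def observe(self, qtype, kind, term):
        """Record one occurrence of a feature term for a question type."""
        self.counts[qtype][kind][term] += 1
        self.totals[qtype][kind] += 1

    def prob(self, qtype, kind, term):
        """Count-based estimate N(term)/N(kind), with additive smoothing."""
        total = self.totals[qtype][kind]
        return (self.counts[qtype][kind][term] + self.smoothing) / (total + self.smoothing)

    def classify(self, qf, dvs=(), dqs=(), dns=()):
        """Return the question type maximizing the unigram feature product."""
        best, best_score = None, 0.0
        for qtype in self.counts:
            score = self.prob(qtype, "QF", qf)
            for kind, terms in (("DV", dvs), ("DQ", dqs), ("DN", dns)):
                for t in terms:
                    score *= self.prob(qtype, kind, t)
            if score > best_score:
                best, best_score = qtype, score
        return best
```

A toy usage, with example terms borrowed from the experiment slides: after `observe("Person", "QF", "選手")` and `observe("Person", "DV", "奪得")`, `classify("選手", dvs=["奪得"])` returns `"Person"`.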
Semantic Dependency Relation Model (SDRM) (7/12)
- Parameter estimation of Unigram-SDRM: P(DC|C), P(QF|DC), P(dv|DC), P(dq|DC), P(dn|DC).
  - N(QF): the number of occurrences of the QF in Q.
  - N_QF(DC): the total number of all QFs collected from the search results.

Semantic Dependency Relation Model (SDRM) (8/12)
- Parameter estimation of Unigram-SDRM (formula on slide).

Semantic Dependency Relation Model (SDRM) (9/12)
- Bigram-SDRM (formula on slide).

Semantic Dependency Relation Model (SDRM) (10/12)
- Bigram-SDRM (formula on slide).

Semantic Dependency Relation Model (SDRM) (11/12)
- Parameter estimation of Bigram-SDRM:
  - P(DC|C) and P(QF|DC): the same as in Unigram-SDRM.
  - P(dv|QF,DC), P(dq|QF,DC), P(dn|QF,DC):
    - N_sentence(dv, QF): the number of sentences containing both dv and the QF.
    - N_sentence(QF): the total number of sentences containing the QF.

Semantic Dependency Relation Model (SDRM) (12/12)
- Parameter estimation of Bigram-SDRM (formula on slide).

Experiment
- SDRM performance evaluation:
  - Unigram-SDRM vs. Bigram-SDRM.
  - Combination with different weights.
- SDRM vs. language model (LM):
  - Using questions as training data.
  - Using the Web as training data.
  - Questions vs. Web.

Experimental Data
- Questions collected from NTCIR-5 CLQA.
- 4-fold cross-validation.

Unigram-SDRM vs. Bigram-SDRM (1/2)
- Result (table on slide).

Unigram-SDRM vs. Bigram-SDRM (2/2)
- Example: for the unigram model, 人 (person), 創下 (set), and 駕駛 (pilot) are trained successfully; for the bigram model, 人_創下 (person_set) is not.

Combination with Different Weights (1/3)
- Different weights for different features: α for the QF, β for dV, γ for dQ, δ for dN.
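The weights α, β, γ, δ just introduced combine the per-feature scores; a later slide derives each weight by normalizing the features' error rates, as in α = (1−0.77)/[(1−0.77)+(1−0.71)] for QF (accuracy 0.77) vs. DV (accuracy 0.71). A minimal sketch of that computation, with hypothetical function names and a simple weighted-sum combination:

```python
# Sketch: derive feature weights by normalizing error rates (1 - accuracy),
# matching the slides' example alpha = (1-0.77)/[(1-0.77)+(1-0.71)],
# then combine per-feature scores as a weighted sum.

def error_rate_weights(accuracies):
    """Weight each feature in proportion to its normalized error rate."""
    errors = {name: 1.0 - acc for name, acc in accuracies.items()}
    total = sum(errors.values())
    return {name: e / total for name, e in errors.items()}

def combine(feature_scores, weights):
    """Weighted sum of per-feature classification scores."""
    return sum(weights[name] * s for name, s in feature_scores.items())

# Accuracies for QF and DV taken from the slides' two-feature example.
weights = error_rate_weights({"QF": 0.77, "DV": 0.71})
print(round(weights["QF"], 2), round(weights["DV"], 2))  # → 0.44 0.56
```

Note that this formula, taken directly from the slides, assigns the larger weight to the feature with the larger error rate, consistent with the reported best weighting (0.23 QF, 0.29 DV, 0.48 DQ), whose values equal the raw error rates.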
Combination with Different Weights (2/3)
- Comparison of the four dependency features (chart on slide).

Combination with Different Weights (3/3)
- 16 experiments; best weighting: 0.23 QF, 0.29 DV, 0.48 DQ.
- The weights are derived from the features' error rates (1 − accuracy), normalized to sum to one. Example with QF (accuracy 0.77) and DV (accuracy 0.71):
  - α = (1−0.77) / [(1−0.77) + (1−0.71)]
  - β = (1−0.71) / [(1−0.77) + (1−0.71)]

Use Questions as Training Data (1/2)
- Result (table on slide).

Use Questions as Training Data (2/2)
- Example: for the LM, 網球選手 (tennis player) and 選手為 (the player is) are not trained successfully; for SDRM, 選手 (player) and 奪得 (won) are.

Use Web Search Results as Training Data (1/2)
- Result (table on slide).

Use Web Search Results as Training Data (2/2)
- Example: for the LM, 何國 (which country) is not trained successfully; for SDRM, 國 (country) and 設於 (located in) are.

Questions vs. Web (1/3)
- Result (table on slide).
- Trained question: the LM can train the QF of the question.
- Untrained question: the LM cannot train the QF of the question.

Questions vs. Web (2/3)
- Example of a trained question: for the LM, 何地 (where) is trained successfully; for SDRM, 地 (place) and 舉行 (held) are trained successfully, but these terms are also trained for other types.

Questions vs. Web (3/3)
- Example of an untrained question: for the LM, 女星 (actress) and 獲得 (won) are not trained successfully; for SDRM, they are.

Conclusion
- Discussion:
  - We need to enhance our learning method and its performance.
  - We need a better smoothing method.
- Conclusion:
  - We propose a new model, SDRM, which uses the question focus and dependency features for question classification.
  - We use Web search results as training data to solve the problem of insufficient training data.

Future Work
- Enhance the performance of the learning method.
- Consider the importance of features within the question.
- Question focus and dependency features may also be used in other processing steps of question answering systems.

Thank You