Building a Semantic Parser Overnight
Yushi Wang, Jonathan Berant, Percy Liang
Presented by T Raghuveer

Abstract
• A functionality-driven process for rapidly building a semantic parser in a new domain.
• The logical forms are meant to cover the desired set of compositional operators, and the canonical utterances are meant to capture the meaning of the logical forms (although clumsily).
• Crowdsourcing is then used to paraphrase these canonical utterances into natural utterances, and the resulting data is used to train the semantic parser.
• The paper studies compositionality and paraphrasing, and is evaluated on 7 new domains.

Logical form
• The logical form of a sentence is the form obtained by abstracting out the subject matter of its content terms, i.e., by regarding the content terms as mere placeholders or blanks on a form. In an ideal logical language, the logical form can be determined from syntax alone.
• Original argument:
– All humans are mortal.
– Socrates is human.
– Therefore, Socrates is mortal.
• Argument form:
– All H are M.
– S is H.
– Therefore, S is M.
• One sentence may have multiple logical forms, and one logical form may correspond to multiple sentences.

Seed Lexicon (L)
• A fixed database w is a set of triples (e1, p, e2), where e1 and e2 are entities (e.g., article1, 2015) and p is a property (e.g., publicationDate).
• The purpose of L is simply to connect each predicate with some representation in natural language.
• L contains entries of the form <t → s[p]>, where:
– t is a representation in natural language,
– p is a database property or entity,
– s is a syntactic category (e.g., RELNP, TYPENP).
• Example: <person → TYPENP[person]>, where "person" is the natural-language representation and TYPENP[person] is the logical representation.

Examples
• "person" has the syntactic category TYPENP.
• Entities such as "alice" and "1950" are ENTITYNP.
• Properties such as "publication date" are RELNP.
• Unary predicates are realized as verb phrases (VP); binaries as either relational noun phrases (RELNP) or generalized transitive verbs (VP/NP).

Domain-General Grammar and Canonical Utterances
• A small domain-general grammar combines lexicon entries into canonical utterances paired with logical forms, e.g., "article that has the largest publication date" and arg max(type.article, publicationDate) (see the sketch below).
• Lambda DCS is the logical language used.

Paraphrasing
• Synonym level: "block" ⇒ "brick".
• A RELNP can become a preposition: "meeting whose attendee is alice" ⇒ "meeting with alice".
• With a complex RELNP, the argument can become embedded: "player whose number of points is 15" ⇒ "player who scored 15 points".
• Superlative/comparative constructions can map to other RELNP-dependent forms: "article that has the largest publication date" ⇒ "newest article".

Some examples
• "housing unit whose housing type is apartment" ⇒ "apartment"
• "university of student alice whose field of study is music" ⇒ "At which university did Alice study music?"
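A minimal sketch of the generation step described above, not the authors' code: a tiny seed lexicon plus two illustrative grammar rules compositionally produce (canonical utterance, logical form) pairs. The rule set, the combinations generated, and the exact logical-form syntax are simplifying assumptions; the real system has a richer grammar and type-checks candidates against the database w.

```python
# Sketch: generate (canonical utterance, logical form) pairs from a seed
# lexicon and two illustrative domain-general grammar rules. Category names
# (TYPENP, RELNP, ENTITYNP) follow the paper; everything else is assumed.

# Seed lexicon L: natural-language form -> (syntactic category, predicate).
seed_lexicon = {
    "article":          ("TYPENP",   "type.article"),
    "publication date": ("RELNP",    "publicationDate"),
    "alice":            ("ENTITYNP", "alice"),
    "2015":             ("ENTITYNP", "2015"),
}

def generate_pairs():
    """Apply two toy grammar rules to every lexicon combination:
       NP -> TYPENP 'whose' RELNP 'is' ENTITYNP     (property filter)
       NP -> TYPENP 'that has the largest' RELNP    (superlative / argmax)
    The real system also type-checks logical forms against the database w,
    which is omitted here."""
    by_cat = lambda cat: [(t, p) for t, (c, p) in seed_lexicon.items() if c == cat]
    pairs = []
    for tn, tp in by_cat("TYPENP"):
        for rn, rp in by_cat("RELNP"):
            for en, ep in by_cat("ENTITYNP"):
                pairs.append((f"{tn} whose {rn} is {en}",
                              f"and({tp}, {rp}.{ep})"))
            pairs.append((f"{tn} that has the largest {rn}",
                          f"argmax({tp}, {rp})"))
    return pairs

for utterance, logical_form in generate_pairs():
    print(f"{utterance!r:50} => {logical_form}")
```

Each generated canonical utterance is clumsy but understandable, which is all the pipeline needs: crowd workers see only the utterance and paraphrase it into natural language.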
Assumptions
• Canonical compositionality: using a small grammar, all logical forms expressible in natural language can be realized compositionally based on the logical form.
• Sublexical compositionality: the hypothesis is that sublexical compositional units are small, so only a small number of canonical utterances need to be crowdsourced to learn most of the language variability in a given domain. Common multi-part concepts are compressed into single words or simpler constructions:
– "parent of alice whose gender is female" ⇒ "mother of alice"
– "person that is author of paper whose author is X" ⇒ "co-author of X"
– "person whose birthdate is birthdate of X" ⇒ "person born on the same day as X"
– "meeting whose start time is 3pm and whose end time is 5pm" ⇒ "meetings between 3pm and 5pm"
– "that allows cats and that allows dogs" ⇒ "that allows pets"
• Bounded non-compositionality: natural utterances expressing complex logical forms are compositional with respect to fragments of bounded size:
– "NP[number of NP[article CP[whose publication date is larger than NP[publication date of article 1]]]]" ⇒ "How many articles were published after article 1?"

Crowdsourcing
• Amazon Mechanical Turk (AMT) workers paraphrase the canonical utterances.
• Paraphrases that share the same canonical utterance are collapsed, while identical paraphrases that have distinct canonical utterances are deleted.
• 26,098 examples were collected over all domains.
• 20 examples in each domain were manually analysed; 17% of the utterances were found to be inaccurate.

Domains
• Seven domains, each with example natural utterances x and canonical utterances c.

Model and Learning
• A log-linear distribution is placed over candidate pairs (z, c) ∈ GEN(G ∪ Lx), where G is the domain-general grammar and Lx is the lexicon triggered by the input utterance x (a toy sketch of this model appears at the end of these notes).
• Example: for "article published in 2015 that cites article 1", Lx (also written T(x)) contains 2015 → NP[2015] and article 1 → NP[article1].
• Features: basic + lexical.

Accuracies and Analysis
• Tested on 7 domains.
• Data: facts generated using entities and properties; 80% of examples used for training, 20% for testing.
• Accuracy is the fraction of examples that yield the correct denotation.

Error Analysis
• 70% of errors are due to the paraphrasing model, e.g., "restaurants that have waiters and you can sit outside" parsed as "restaurant that has waiter service and that takes reservations".
• 12.5% are reordering issues, e.g., "What venue has fewer than two articles" parsed as "article that has less than two venue".
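To make the Model and Learning section concrete, here is a minimal sketch, assuming a toy single-feature map and hand-set weights; the paper's actual features (basic + lexical) and training procedure are much richer.

```python
# Sketch of a log-linear distribution over candidate pairs (z, c) in
# GEN(G ∪ Lx): p(z, c | x) ∝ exp(theta · phi(x, c, z)). The word-overlap
# feature and the hand-set weight are illustrative stand-ins for the
# paper's basic + lexical features and learned weights.
import math

def phi(x, c, z):
    """Toy feature map: unigram overlap between the input utterance x
    and the canonical utterance c."""
    overlap = len(set(x.lower().split()) & set(c.lower().split()))
    return {"word_overlap": float(overlap)}

def log_linear(x, candidates, theta):
    """Return [(z, c, p(z, c | x))] via a softmax over feature scores."""
    scores = [sum(theta.get(f, 0.0) * v for f, v in phi(x, c, z).items())
              for z, c in candidates]
    norm = sum(math.exp(s) for s in scores)
    return [(z, c, math.exp(s) / norm)
            for (z, c), s in zip(candidates, scores)]

# Usage: rank two candidate (logical form, canonical utterance) pairs
# for one input utterance.
x = "articles published in 2015"
candidates = [
    ("and(type.article, publicationDate.2015)",
     "article whose publication date is 2015"),
    ("and(type.article, author.alice)",
     "article whose author is alice"),
]
theta = {"word_overlap": 1.0}
for z, c, p in sorted(log_linear(x, candidates, theta), key=lambda t: -t[2]):
    print(f"p={p:.3f}  {c}  ->  {z}")
```

Roughly, theta is learned from the crowdsourced paraphrase pairs, and at test time the parser returns the logical form z of the highest-scoring candidate pair.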