CHAPTER 2 Grammars
Hsin-Hsi Chen

Background

artificial intelligence communities
computational linguistics
limited domain only
engineering communities
acoustical speech recognition system
“merry” vs. “very”
“pan” vs. “ban”

“Please hand me the ??? ”
“The choice you made was ??? good.”
他正在 讀/毒 書 (“He is reading a book”; 讀 “read” and 毒 “poison” are both pronounced dú)
・extract the most likely words from the signal
・decide their probabilities using information from surrounding words
NLU community: parsing, interpretation, …
SR community: statistical techniques, ...

Morphology & knowledge of words
analysis of written language:
  Component               Level        What it describes
  Morphological analyzer  morphology   structure of words
  Parser                  syntax       structure of sentences
  Semantic interpreter    semantics    meaning of individual sentences
                          pragmatics   how sentences relate to each other

Morphological Analyzer
word (lexical item)
dictionary (lexicon)
morphological analyzer
– applies morphological rules to find the roots of words, e.g., going → go, cats → cat

Part of speech tagger
part-of-speech (lexical category): a set of words having similar syntactic properties
  Part of speech (symbol)     Example
  noun (noun)                 dog, equation, concerts
  pronoun (pro)               I, you, it, they, them
  possessive (pos)            my, your
  verb (verb)                 is, touch, went, remitted
  adjective (adj)             red, large, remiss
  determiner (det)            the, a, some
  proper noun (prop)          Alice, Romulus
  conjunction (conj)          and, but, since
  preposition (prep)          in, to, into
  auxiliary verb (aux)        be, have
  modal verb (modal)          will, can, must, should
  adverb (adv)                closely, quickly
  wh-word (wh)                who, what, where
  final punctuation (fpunc)   . ? !
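The morphological analyzer and the part-of-speech lookup described above can be sketched in a few lines. This is only a toy illustration: the lexicon entries reuse words from the slides, while the suffix rules and the function names `find_root` and `tag` are assumptions made here, not part of the lecture.

```python
# Hypothetical toy lexicon: root form -> part-of-speech symbol from the table.
LEXICON = {"go": "verb", "touch": "verb", "cat": "noun", "dog": "noun", "the": "det"}

# Assumed suffix-stripping rules, tried in order; real morphology needs far more.
SUFFIXES = ["ing", "es", "s"]


def find_root(word):
    """Return the lexicon root of a word, or None if no rule applies."""
    word = word.lower()
    if word in LEXICON:
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)]
            if candidate in LEXICON:
                return candidate
    return None


def tag(word):
    """Return the part-of-speech symbol of a word's root, or None."""
    root = find_root(word)
    return LEXICON[root] if root is not None else None
```

With this lexicon, `find_root("going")` gives `"go"` and `tag("cats")` gives `"noun"`; an unknown word simply yields `None`.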

Tag Set
coarse-grained (粗) and fine-grained (細) tags (Academia Sinica Balanced Corpus)
Tag mapping

Words’ Syntactic Functions
Nouns refer to entities in the world like people, animals and things
Determiners describe the particular reference of a noun
Adjectives describe the properties of nouns
Verbs are used to describe actions, activities and states
Adverbs modify a verb in the same way as adjectives modify nouns

Words’ Syntactic Functions
Prepositions are typically small words that express spatial or time relationships. Prepositions can also be used as particles to create phrasal verbs (e.g., add up)
Conjunctions and complementizers link two words, phrases or clauses
– A complementizer is a conjunction which marks a complement clause.
– I know that he is here.

Features
number, person, ...
              Singular        Plural
  1st person  I, me, mine     we, us, ours
  2nd person  you, yours      you, yours
  3rd person  she, him, its   they, them, theirs
use of features
– grammatical checking (subject-verb agreement, ...)
  The dogs eat. The dog eats.
– semantic checking: PP-attachment
  The hole that had been drilled by the woman in the wrench turned out to be very useful

Syntax and Context-Free Grammars
syntactic structure
“In the hotel the fake property was sold to visitors.”
[s-maj (major sentence)
  [s [pp [prep in] [np [det the] [noun hotel]]]
     [np [det the] [adj fake] [noun property]]
     [vp [aux was] [verb sold] [pp [prep to] [np [noun visitors]]]]]
  [fpunc .]]

Grammar: specification of permitted structures in a language
Context-Free Grammars (CFGs)
– a set of terminal symbols (words & punctuation)
– a set of non-terminal symbols (pos & syntactic category, e.g., np, vp)
– start symbol
– rewrite rules: non-terminal → (terminal | non-terminal)*

Part (a) major sentence
  s-maj → s fpunc        det → the
  s → np vp              noun → dog
  vp → verb              verb → ate
  np → det noun          fpunc → .
Part (b) ambiguous grammar
  vp → verb np           noun → salespeople
  vp → verb np np        verb → sold
  np → det noun noun     noun → biscuits
  np → noun

ambiguous
“Salespeople sold the dog biscuits.”
transitive verb:
[s-maj [s [np [noun Salespeople]] [vp [verb sold] [np [det the] [noun dog] [noun biscuits]]]] [fpunc .]]

ditransitive verb:
[s-maj [s [np [noun Salespeople]] [vp [verb sold] [np [det the] [noun dog]] [np [noun biscuits]]]] [fpunc .]]

Chomsky normal form
  non-terminal → terminal
  non-terminal → non-terminal non-terminal
X-bar theory (Jackendoff)
  X''' dominates a specifier and X''
  X'' dominates X' and a modifier
  X' dominates X and an argument
  X: v, n, ...

multiplying out features (cf. unification-based method)
  s → np vp, with agreement on number: singular or plural
  s → np-singular vp-singular
  s → np-plural vp-plural

Local and Non-Local Dependencies
A local dependency is a dependency between two words expressed within the same syntactic rule
A non-local dependency is an instance in which two words can be syntactically dependent even though they occur far apart in a sentence (e.g., subject-verb agreement; long-distance dependencies such as wh-extraction).
Non-local phenomena are a challenge for certain statistical NLP approaches (e.g., n-grams) that model local dependencies.

Semantic Roles
A semantic role is the underlying relationship that a participant has with the main verb in a clause.
Also called semantic case, thematic role, theta role (generative grammar), and deep case (case grammar)
Most commonly, noun phrases are arguments of verbs.
These arguments have semantic roles: the agent of an action, the patient and other roles such as the instrument or the goal.
– John (agent) hit Bill (patient). vs. Bill (patient) was hit by John (agent).
In English, these semantic roles correspond to the notions of subject and object. But things are complicated by the notions of direct and indirect object, active and passive voice.

Agent (施事者): a person or thing who is the doer of an event.
– The boy ran down the street.
Patient (受事者): the surface object of the verb in a sentence.
– He opened the door.
Instrument (工具格): an inanimate thing that an agent uses to implement an event.
– The cook cut the cake with a knife.
Goal (目標): the thing toward which an action is directed.
– He threw the book at me.
Beneficiary (受益): a referent which is advantaged or disadvantaged by an event.
– John sold the car for a friend.
More roles and examples: Semantic Role Labeling

Subcategorization
Different verbs can relate different numbers of entities: transitive versus intransitive verbs.
Tightly related verb arguments are called complements, but less tightly related ones are called adjuncts.
Prototypical examples of adjuncts tell us the time, place, or manner of the action or state described by the verb.
Verbs are classified according to the type of complements they permit. This is called subcategorization.
Subcategorization captures syntactic as well as semantic regularities.
Academia Sinica Dictionary (Format, Sample)

Chinese PropBank
[arg0我][argm-adv已經][rel打][arg1電話][arg2給斯恩特]
  (I have already telephoned 斯恩特.)
[arg1這些算盤][arg0 產業界自己][rel打]的[argm-ext最精]
  (Industry itself does these calculations most shrewdly.)
[arg1 鮑薩]被[arg0 泰森的鐵拳][rel打]得[arg2 爬不起來]
  (鮑薩 was beaten by Tyson's iron fists until he could not get up.)
[arg0 他][argm-tmp晚上]則到體育場[rel打][arg1 籃球]
  (In the evening he goes to the stadium to play basketball.)
More examples 1-4937

Semantics
Semantics is the study of the meaning of words, constructions, and utterances
Semantics can be divided into two parts: lexical semantics and combination semantics
Lexical semantics:
– hypernymy (上義關係) vs. hyponymy (下義關係)
– antonymy (反義關係) vs. synonymy (同義詞)
– meronymy/holonymy (部分-整体/材料-實體/成員-集體: part-whole / material-object / member-collection)
– homonymy (同音異義) vs. polysemy (一字多義)
Compositionality: the meaning of the whole often differs from what the meanings of the parts would predict
Idioms correspond to cases where the compound phrase means something completely different from its parts

hypernym (上義關係)
The generic term used to designate a whole class of specific instances. Y is a hypernym of X if X is a (kind of) Y.
hyponym (下義關係)
The specific term used to designate a member of a class. X is a hyponym of Y if X is a (kind of) Y.
holonym (部分-整体/材料-實體/成員-集體)
The name of the whole of which the meronym names a part. Y is a holonym of X if X is a part of Y.
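The definitions of hypernym, hyponym and holonym above translate directly into lookups over "kind of" and "part of" links. The sketch below is an illustration only: the word pairs are hypothetical toy data, the function names are invented here, and following "kind of" links transitively is an assumption rather than something the slides state.

```python
# Hypothetical toy data; the words and links are illustrative only.
KIND_OF = {"poodle": "dog", "dog": "animal", "cat": "animal"}
PART_OF = {"wheel": "car", "engine": "car"}


def hypernyms(word):
    """Return every Y such that word is a (kind of) Y, nearest first."""
    chain = []
    while word in KIND_OF:
        word = KIND_OF[word]
        chain.append(word)
    return chain


def is_hypernym(y, x):
    """Y is a hypernym of X if X is a (kind of) Y."""
    return y in hypernyms(x)


def is_hyponym(x, y):
    """X is a hyponym of Y if X is a (kind of) Y."""
    return is_hypernym(y, x)


def is_holonym(y, x):
    """Y is a holonym of X if X is a part of Y."""
    return PART_OF.get(x) == y
```

Hypernymy and hyponymy are the same relation read in opposite directions, which is why `is_hyponym` is just `is_hypernym` with its arguments swapped.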
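
Pragmatics
Pragmatics is the area of study that goes beyond the meaning of a sentence and tries to explain what the speaker is really expressing
It covers the scope of quantifiers, speech acts, discourse analysis, and anaphoric relations
The resolution of anaphoric relations is crucial to the task of information extraction

Scope of Quantifiers
每個學生都買了一本書 (Every student bought a book.)
有一本書每個學生都買了 (There is a book that every student bought.)
有一個學生不買所有的書 (There is a student who does not buy all the books.)
有一個學生所有的書都不買 (There is a student who buys none of the books.)
不是所有的書都有一個學生買 (Not every book has a student who buys it.)
沒有一個學生買了所有的書 (No student bought all the books.)
所有的書都有一個學生不買 (For every book, there is a student who does not buy it.)
所有的書都沒有一個學生買 (For every book, no student buys it.)

Speech Act Examples
Utterance / Speech Act / Chunked Semantic Concepts
U: 我想要掛眼科門診。(I want to register for the outpatient service of the ophthalmology department.)
   Elaborate-fact: (DEPARTMENT_TYPE) (SERVICE_TYPE) (PERSON) (WANT) (REGISTER)
S: 請問要在總院看還是公館分部看?(Would you like to go to the main hospital or the Gong-Gwen branch?)
   Request-ref: (BRANCH_NAME1) (BRANCH_NAME2) (REQUEST) (WANT) (AT) (SEE_DOC) (QWORD) (SEE_DOC)
U: 總院。(The main hospital.)
   Answer-ref: (BRANCH_NAME)
S: 請問您是第一次到台大看診嗎。(Is this your first time consulting a doctor at NTU Hospital?)
   Request-if: (GENERAL_NUM) (REQUEST) (YES) (COME_TIME) (COME) (HERE) (SEE_DOC) (GROUND)
U: 不是。(No.)
   Answer-if: (NO)
S: 您掛的是眼科薛琴醫生三月二十九日十號。(You have registered Dr. Shue-Ching on March 29.)
   Elaborate-fact: (DEPARTMENT_TYPE) (DOCTOR_NAME) (DATE_VALUE) (PERSON) (REGISTER) (YES) (DOCTOR)
U: 好了。(OK.)
   Accept: (YES)

Anaphoric Relations
Anaphora vs. Co-Reference
– Anaphora
  Example 1: 張三是老師,他1教學很認真,同時,他2也是一個好爸爸。
  (Zhang San is a teacher; he1 teaches very conscientiously, and at the same time he2 is also a good father.)
  Example 2: 現在的氣溫是攝氏30度。
  (The current temperature is 30 degrees Celsius.)
– Co-reference includes
  • Type/Instance: “老師”/“張三” (“teacher”/“Zhang San”), “一個好爸爸”/“張三” (“a good father”/“Zhang San”)
  • Function/Value: “現在的氣溫”/“攝氏 30 度” (“the current temperature”/“30 degrees Celsius”)
  • Co-reference between NPs: “一隻小花貓”/“那隻貓” (“a little tabby cat”/“that cat”)

Discourse Analysis
1a: 佛羅倫斯哪個博物館在1993年的爆炸事件中受到破壞? (Which museum in Florence was damaged in the 1993 bombing?)
1b: 這個事件哪一天發生? (On what day did this event happen?)
– “this event” in question 1b refers to “the 1993 bombing” in question 1a
1c: 哪些展覽室受到牽連? (Which galleries were affected?)
– “galleries” in question 1c refers to the galleries of the Florence museum that answers question 1a
1d: 有多少人被殺? (How many people were killed?)
– “people” in question 1d are the people killed in “the 1993 bombing” of question 1a
1e: 這些人來自哪裡? (Where did these people come from?)
– “these people” in question 1e refers to the people killed mentioned in question 1d
1f: 爆炸物的量有多少? (How much explosive was there?)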
– “the explosive” in question 1f is the explosive used in “the 1993 bombing” of question 1a

Grammars

Grammar as knowledge representation:
a way to represent certain aspects of what we know about a language
General criteria for a grammar formalism
– linguistic naturalness,
– mathematical power, and
– computational effectiveness.

NL employs the following knowledge.
– A representation for syntactic categories or ‘parts of speech’.
– A data type for words (and hence a lexicon, dictionary or word list).
– A data type for syntactic rules.
– A data type for syntactic structures.

Words, rules and structures
– Lexicon: associates each word with its properties, e.g.
  syntactic category (verb, noun, ...)
  subcategory (transitive, intransitive, ...)
  morphological information,
  semantic information.

PATR notation (words)
(Ex 1) Word paid:
         <cat> = V.          (syntactic category)
(Ex 2) Word paid:
         <cat> = V           (syntactic category)
         <tense> = past      (past tense)
         <arg1> = NP.        (transitive)
A feature description is always partial.

Phrase structure rule
a particular syntactic category (LHS) is composed of the categories on the RHS.
  S → NP VP   (a sentence consists of a noun phrase followed by a verb phrase)
  VP → V NP   (a verb phrase consists of a verb followed by a noun phrase)

Phrase structure rules introduce lexical items.
  V → paid  ===>  Word paid: <cat> = V

Example. Grammar 1
Rule {simple sentence formation}
  S → NP VP.
Rule {transitive verb}
  VP → V NP.
Rule {intransitive verb}
  VP → V.
Word Dr Chan:     <cat> = NP.
Word nurses:      <cat> = NP.
Word MediCenter:  <cat> = NP.
Word patients:    <cat> = NP.
Word died:        <cat> = V.
Word employed:    <cat> = V.

Context-free phrase structure grammars (CF-PSGs)
LHS: a category, regardless of context
RHS: category or symbol.
Functions of a grammar
– (1) Define the set of grammatical strings of words in a language.
– (2) Associate one or more structures with each grammatical string.
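Grammar 1 is small enough to run directly. The following is a minimal top-down sketch of both functions of a grammar listed above: it decides whether a token list is grammatical and returns the structures assigned to it as nested lists. The function names, the lower-casing of tokens, and the treatment of "Dr Chan" as a single token are assumptions made for this illustration.

```python
# Lexicon and rules of Grammar 1; keys are lower-cased for matching.
LEXICON = {
    "dr chan": "NP", "nurses": "NP", "medicenter": "NP", "patients": "NP",
    "died": "V", "employed": "V",
}
RULES = [("S", ["NP", "VP"]), ("VP", ["V", "NP"]), ("VP", ["V"])]


def parse(cat, tokens, i):
    """Yield (tree, next_index) for every way `cat` can start at position i."""
    if i < len(tokens) and LEXICON.get(tokens[i].lower()) == cat:
        yield [cat.lower(), tokens[i]], i + 1
    for lhs, rhs in RULES:
        if lhs == cat:
            yield from expand(lhs, rhs, tokens, i, [])


def expand(lhs, rhs, tokens, i, children):
    """Match the remaining right-hand-side categories one after another."""
    if not rhs:
        yield [lhs.lower()] + children, i
        return
    for tree, j in parse(rhs[0], tokens, i):
        yield from expand(lhs, rhs[1:], tokens, j, children + [tree])


def parses(tokens):
    """Return all S trees that cover the whole token list."""
    return [tree for tree, j in parse("S", tokens, 0) if j == len(tokens)]
```

Calling `parses(["MediCenter", "employed", "nurses"])` returns one tree, while an ungrammatical string returns an empty list. The naive recursion terminates only because Grammar 1 has no left-recursive rule.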

Representation of parsing trees
(1) tree diagram: S dominates NP (MediCenter) and VP; VP dominates V (employed) and NP (nurses)
(2) [s, [np, ‘MediCenter’], [vp, [v, employed], [np, nurses]]]
(3) [s [np ‘MediCenter’] [vp [v employed] [np nurses]]]
(4) s(np(‘MediCenter’), vp(v(employed), np(nurses)))

How to evaluate a particular grammar for a (fragment of a) language?
(1) Does it undergenerate?
(2) Does it overgenerate?
(3) Does it assign appropriate structures to the strings that it generates?

undergenerate:
There are some syntactically well-formed expressions to which the grammar assigns no structure.
(Undergeneration is not necessarily a problem if the goal is to produce a language.)
[Diagram: NL vs. grammar]

overgenerate:
The grammar legitimates strings that cannot be construed as grammatical expressions of the language in question.
(Overgeneration is not necessarily a problem if the goal is to recognize or understand well-formed language.)
[Diagram: NL vs. grammar]

Subcategorization and the use of features
*Dr. Chan died patients.
*MediCenter employed.
died: intransitive
employed: transitive

Example. Grammar 2
Rule {simple sentence formation}
  S → NP VP.
Rule {intransitive verb}
  VP → V:
    <V arg1> = 0.
Rule {single complement verbs}
  VP → V X:
    <V arg1> = <X cat>.
Rule {coordination of identical categories}
  X0 → X1 C X2:
    <X0 cat> = <X1 cat>
    <X0 cat> = <X2 cat>
    <X0 arg1> = <X1 arg1>
    <X0 arg1> = <X2 arg1>.
Word died:       <cat> = V   <arg1> = 0.
Word recovered:  <cat> = V   <arg1> = 0.
Word slept:      <cat> = V   <arg1> = 0.
Word employed:   <cat> = V   <arg1> = NP.
Word paid:       <cat> = V   <arg1> = NP.
Word nursed:     <cat> = V   <arg1> = NP.
Word and:        <cat> = C.
Word or:         <cat> = C.

Examples
Nurses died.
*Nurses died patients.
*MediCenter employed.
MediCenter employed nurses.
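The effect of the `arg1` feature in Grammar 2 can be checked with a small bottom-up chart recognizer. This is a sketch under stated assumptions: each chart item is a `(cat, arg1)` pair, plain equality stands in for feature unification, the noun-phrase entries are carried over from Grammar 1, and the name `recognize` is invented here.

```python
# Lexicon of Grammar 2 as (cat, arg1) pairs; None means the feature is unspecified.
LEXICON = {
    "nurses": ("NP", None), "patients": ("NP", None),
    "medicenter": ("NP", None), "dr chan": ("NP", None),
    "died": ("V", "0"), "recovered": ("V", "0"), "slept": ("V", "0"),
    "employed": ("V", "NP"), "paid": ("V", "NP"), "nursed": ("V", "NP"),
    "and": ("C", None), "or": ("C", None),
}


def recognize(tokens):
    """Return True if Grammar 2 assigns category S to the whole token list."""
    n = len(tokens)
    chart = {}
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            if width == 1 and tokens[i].lower() in LEXICON:
                cell.add(LEXICON[tokens[i].lower()])
            for k in range(i + 1, j):
                for lcat, larg in chart[i, k]:
                    for rcat, _ in chart[k, j]:
                        if lcat == "NP" and rcat == "VP":
                            cell.add(("S", None))    # S -> NP VP
                        if lcat == "V" and larg == rcat:
                            cell.add(("VP", None))   # VP -> V X, <V arg1> = <X cat>
                # X0 -> X1 C X2: both conjuncts share cat and arg1
                if k + 1 < j and ("C", None) in chart[k, k + 1]:
                    cell |= chart[i, k] & chart[k + 1, j]
            if ("V", "0") in cell:
                cell.add(("VP", None))               # VP -> V, <V arg1> = 0
            chart[i, j] = cell
    return n > 0 and ("S", None) in chart[0, n]
```

The starred examples are rejected because `died` carries `arg1 = 0` and so cannot combine with a following NP, while `employed` carries `arg1 = NP` and so cannot form a VP on its own.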

Phrase structure tree for sentence with conjunction
[S [S [NP Dr Chan] [VP [V employed] [NP nurses]]]
   [C and]
   [S [NP nurses] [VP [VP [V died]] [C or] [VP [V recovered]]]]]

Grammar 2 admits all of the following examples:
Nurses died.
Nurses died and patients recovered.
Nurses died and patients recovered and Dr Chan slept and MediCenter employed nurses.
Nurses died and patients recovered and Dr Chan slept and MediCenter employed nurses and …

Grammar 3 = Grammar 2 +
Rule {prepositional phrases}
  PP → P X:
    <P arg1> = <X cat>.

Example: Lexicon 3
Word approved:        <cat> = V   <arg1> = PP.
Word disapproved:     <cat> = V   <arg1> = PP.
Word appeared:        <cat> = V   <arg1> = AP.
Word seemed:          <cat> = V   <arg1> = AP.
Word had:             <cat> = V   <arg1> = VP.
Word believed:        <cat> = V   <arg1> = PP.
Word thought:         <cat> = V   <arg1> = S.
Word of:              <cat> = P   <arg1> = NP.
Word fit:             <cat> = AP.
Word competent:       <cat> = AP.
Word well-qualified:  <cat> = AP.

Grammar 3 admits additional examples like:
Nurses thought Dr Chan seemed competent.
Dr Chan appeared well-qualified and disapproved of MediCenter.
Patients had believed nurses thought Dr Chan had slept.
(sentential or verb phrase complements of a verb)

Long-distance dependencies
wh-question, relative clause, …
“Whom did Freg give the ball to e?”
  (dependency between “Whom” and the empty constituent e)
“Whom does Alice believe Freg wants to give the ball to e?”
  (dependency between “Whom” and the empty constituent e)

Unbounded dependencies: there is no limit on the amount of intervening material that can occur between the displaced constituent and the empty constituent.
cf. ATN HOLD list (1971)

Slash categories
s/np, vp/np, …
  s → wh s/np
  s/np → vp
  s/np → np vp/np
  vp/np → verb np pp/np

Example: Rules for Grammar 4
– x/y: a category x0 whose cat is x and whose slash is y.
  <x0 cat> = x, <x0 slash> = y.
– Interpretation: an expression of category x/y is an expression of category x from which an expression of category y is missing.
– e.g.
  S/NP: a sentence (or clause) that has a noun phrase missing.
  VP/PP: a verb phrase that is missing a prepositional phrase.

Example. Phrase structure tree with slash categories
Dr. Chan, nurses thought MediCenter had employed …
[S [NP Dr Chan]
   [S/NP [NP nurses]
         [VP/NP [V thought]
                [S/NP [NP MediCenter]
                      [VP/NP [V had]
                             [VP/NP [V employed] [NP/NP -]]]]]]]
Transfer of information about a missing category from mother to daughter.

Rule {simple sentence formation}
  S → NP VP:
    <S slash> = <VP slash>
    <NP slash> = 0.
Rule {intransitive verb}
  VP → V:
    <V arg1> = 0
    <V slash> = 0
    <VP slash> = 0.
Rule {single complement verbs}
  VP → V X:
    <V arg1> = <X cat>
    <V slash> = 0
    <VP slash> = <X slash>.
Rule {prepositional phrases}
  PP → P X:
    <P arg1> = <X cat>
    <P slash> = 0
    <PP slash> = <X slash>.
The simple sentence formation rule written out in full:
Rule
  X0 → X1 X2:
    <X0 cat> = S
    <X1 cat> = NP
    <X2 cat> = VP
    <X0 slash> = <X2 slash>
    <X1 slash> = 0.
It abbreviates S/Y → NP VP/Y, that is:
Rule S/S → NP VP/S.
Rule S/NP → NP VP/NP.
Rule S/VP → NP VP/VP.
Rule S/AP → NP VP/AP.

Rule {coordination}
  X0 → X1 C X2:
    <X0 cat> = <X1 cat>
    <X0 cat> = <X2 cat>
    <C slash> = 0
    <X0 slash> = <X1 slash>
    <X0 slash> = <X2 slash>
    <X0 arg1> = <X1 arg1>
    <X0 arg1> = <X2 arg1>.
Rule {slash elimination}   (X/X → empty)
  X0 → :
    <X0 cat> = <X0 slash>
    <X0 empty> = yes.
Rule {topicalization}   (S → X S/X)
  X0 → X1 X2:
    <X0 cat> = S
    <X1 empty> = no
    <X2 cat> = S
    <X2 slash> = <X1 cat>
    <X2 empty> = no
    <X0 slash> = <X1 slash>.
A sentence can consist of some category followed by a sentence which is missing an expression of that category.
  MediCenter, nurses disapproved of _.
  Of MediCenter, nurses disapproved _.
  Well-qualified, Dr Chan had seemed _.

Classes of grammars and languages
Chomsky hierarchy of languages, from most to least inclusive:
  Recursively enumerable sets (type 0)
  Context-sensitive languages (type 1)
  Indexed languages
  Context-free languages (type 2)
  Finite-state languages (type 3)

Context-free grammars and languages
CF-PSGs: there exist high-efficiency algorithms for recognizing and parsing CFLs.
  factors     CF-PSGs       RTNs
  power       equivalent
  semantics   declarative   procedural
  iteration   recursion     yes

Indexed grammars and languages
a^n b^n c^n problem
<X cat> = S,
<X stack top> = a,
<X stack stack top> = a,
<X stack stack stack top> = z.
X = [cat: s
     stack: [top: a
             stack: [top: a
                     stack: [top: z
                             stack: ?]]]]

Grammar
Rule {push endmarker on to stack}
  S → a A:
    <A stack top> = z
    <A stack stack> = <S stack>.
Rule {push ‘a’ on to stack}
  A0 → a A1:
    <A0 cat> = A
    <A1 cat> = A
    <A1 stack top> = a
    <A1 stack stack> = <A0 stack>.
Rule {copy stack from A to B}
  A → B:
    <A stack> = <B stack>.
Rule {pop ‘a’ off stack}
  B0 → b B1 c:
    <B0 cat> = B
    <B1 cat> = B
    <B0 stack top> = a
    <B0 stack stack> = <B1 stack>.
Rule {pop endmarker off stack}
  B → b c:
    <B stack top> = z.
[Figure: feature structures for A and A1 whose stack values are shared]
For every n, there is a category x s.t. <x stack^n top> = z
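The stack discipline of the indexed grammar above can be traced with an ordinary Python list. The sketch below follows the five rules one by one and accepts exactly the strings a^n b^n c^n with n ≥ 1; the function names and the use of a list in place of the nested `stack` feature are assumptions made for this illustration.

```python
def accepts(s):
    """Recognize a^n b^n c^n (n >= 1) by following the indexed grammar's rules."""
    # Rule {push endmarker on to stack}: S -> a A
    if not s or s[0] != "a":
        return False
    stack = ["z"]
    i = 1
    # Rule {push 'a' on to stack}: A0 -> a A1
    while i < len(s) and s[i] == "a":
        stack.append("a")
        i += 1
    # Rule {copy stack from A to B}: A -> B
    return match_b(s[i:], stack)


def match_b(rest, stack):
    """Expand category B over `rest`, popping one stack symbol per rule."""
    if stack[-1] == "z":
        # Rule {pop endmarker off stack}: B -> b c
        return rest == "bc"
    # Rule {pop 'a' off stack}: B0 -> b B1 c
    return (
        len(rest) >= 2
        and rest[0] == "b"
        and rest[-1] == "c"
        and match_b(rest[1:-1], stack[:-1])
    )
```

Each `a` after the first pushes one symbol, and each `b … c` pair pops one, so the counts of a, b and c can only match when the endmarker `z` is reached together with the innermost `bc`.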