Natural Language Processing
CS480/580

Levels of Linguistic Analysis
• Phonology: recognizing speech sounds
• Morphology: analysis of word forms (e.g., adding s to make a plural)
• Syntax: sentence structure
• Semantics: meaning
• Pragmatics: relation of language to context

Tokenization
• A string is broken into words, punctuation is removed, and the key information is represented as a sequence of words, or tokens.
• E.g., “How are you today?” is converted to [how, are, you, today].

Tokenize.pl
% Strings are processed as lists of character codes.
% In SWI-Prolog 7 and later, the query below assumes
% :- set_prolog_flag(double_quotes, codes).

% lower_case(+Code, -Lower): map A-Z (65-90) to a-z, leave other codes alone.
lower_case(A, B) :- A>=65, A=<90, !, B is A+32.
lower_case(A, A).

% tokenize(+Codes, -Tokens): split a code list into a list of atoms.
tokenize([], []) :- !.
tokenize(A, [B|E]) :- grab_word(A, C, D), name(B, C), tokenize(D, E).

% Every code that is neither a digit nor a letter counts as punctuation.
punctuation_mark(A) :- A=<47.
punctuation_mark(A) :- A>=58, A=<64.
punctuation_mark(A) :- A>=91, A=<96.
punctuation_mark(A) :- A>=123.

% grab_word(+Codes, -Word, -Rest): read one word up to the next space (32),
% skipping punctuation and lower-casing letters.
grab_word([32|A], [], A) :- !.
grab_word([], [], []).
grab_word([A|B], C, D) :- punctuation_mark(A), !, grab_word(B, C, D).
grab_word([D|A], [E|B], C) :- grab_word(A, B, C), lower_case(D, E).

?- tokenize("This is CS480/580 course", X).
X = [this, is, cs480580, course].

?- name(john,X).
X = [106, 111, 104, 110].

Template System
• Templates: stored sentence patterns
• Each template is accompanied by a translation schema
• E.g., [X, is, a, Y] is translated to Y(X).
• process([X, is, a, Y]) :- Fact =.. [Y, X], assert(Fact).
• process([is, X, a, Y]) :- Query =.. [Y, X], call(Query).

Template.pl
grab_word([32|A], [], A) :- !.
grab_word([], [], []).
grab_word([A|B], C, D) :- punctuation_mark(A), !, grab_word(B, C, D).
grab_word([D|A], [E|B], C) :- grab_word(A, B, C), lower_case(D, E).
punctuation_mark(A) :- A=<47.
punctuation_mark(A) :- A>=58, A=<64.
punctuation_mark(A) :- A>=91, A=<96.
punctuation_mark(A) :- A>=123.
lower_case(A, B) :- A>=65, A=<90, !, B is A+32.
lower_case(A, A).
write_str([A|B]) :- put(A), write_str(B).
write_str([]).
% Stop reading at end of file (-1), line feed (10) or carriage return (13).
read_str_aux(-1, []) :- !.
read_str_aux(10, []) :- !.
read_str_aux(13, []) :- !.
read_str_aux(A, [A|B]) :- read_str(B).
do_one_sentence :- write('>'), read_str(A), tokenize(A, B), process(B).
note(A) :- asserta(A), write('OK'), nl.
read_atom(A) :- read_str(B), name(A, B).
start :- write('TEMPLATE.PL at your service.'), nl, write('Terminate by pressing Break.'), nl, repeat, do_one_sentence, fail.
check(A) :- call(A), !, write('Yes.'), nl.
check(_) :- write('Not as far as I know.'), nl.
read_num(A) :- read_str(B), name(A, B).
remove_s(A, C) :- name(A, B), remove_s_list(B, D), name(C, D).
read_str(B) :- get0(A), read_str_aux(A, B).
% 115 is the character code of s.
remove_s_list([115], []).
remove_s_list([A|B], [A|C]) :- remove_s_list(B, C).
% Each process/1 clause is one template with its translation.
process([B, is, a, A]) :- !, C=..[A, B], note(C).
process([A, is, an, B]) :- !, process([A, is, a, B]).
process([is, B, a, A]) :- !, C=..[A, B], check(C).
process([is, A, an, B]) :- !, process([is, A, a, B]).
process([A, are, B]) :- !, remove_s(A, D), remove_s(B, C), F=..[C, E], G=..[D, E], note((F:-G)).
process([does, B, A]) :- !, C=..[A, B], check(C).
process([A, B]) :- \+ remove_s(A, _), remove_s(B, C), !, D=..[C, A], note(D).
process([A, B]) :- remove_s(A, C), \+ remove_s(B, _), !, E=..[B, D], F=..[C, D], note((E:-F)).
process(_) :- write('I do not understand.'), nl.
tokenize([], []) :- !.
tokenize(A, [B|E]) :- grab_word(A, C, D), name(B, C), tokenize(D, E).

?- start.
TEMPLATE.PL at your service.
Terminate by pressing Break.
>CS480 is a course.
OK
>is CS480 a course?
Yes.
>is cs471 a course?
Not as far as I know.
>cs471 is a course.
OK
>is cs471 a course?
Yes.

Generative Grammars
• Templates are inadequate to describe human language (in the last example, the only sentences allowed had the form “X is a Y”).
• John arrived
• Max said John arrived
• Bill claimed Max said John arrived
• Mary thought Bill claimed Max said John arrived
• Chomsky’s suggestion: treat syntax as a problem in set theory, i.e., describe an infinite set of sentences with a finite description

[Photo: Noam Chomsky]

Context Free Grammars
• Phrase Structure Rules
  – S → NP VP
  – NP → Det N
  – N → N PP
  – N → N N
  – PP → P NP
  – VP → IV
  – VP → TV NP
  – VP → DV NP NP
• Lexical Entries
  – N → book, cow, course, …
  – P → in, on, with, …
  – Det → the, every, …
  – IV → ran, hid, …
  – TV → likes, hit, …
  – DV → gave, showed

Context-Free Derivations
• S ⇒ NP VP ⇒ Det N VP ⇒ the N VP ⇒ the kid VP ⇒ the kid IV ⇒ the kid ran
• Penn TreeBank bracketing notation (Lisp-like)
  – (S (NP (Det the) (N kid)) (VP (IV ran)))
• Theorem: A sequence has a derivation if and only if it has a parse tree

“Standard” Parse Tree Notation
[Figure: parse tree, with S, NP, VP and PP nodes, for “Jones followed him into the front room, ...”]

A Simple Parser
% Each predicate takes the input word list and returns the words left over.
verb_phrase(A, C) :- verb(A, B), noun_phrase(B, C).
verb_phrase(A, C) :- verb(A, B), sentence(B, C).
determiner([the|A], A).
determiner([a|A], A).
sentence(A, C) :- noun_phrase(A, B), verb_phrase(B, C).
noun_phrase(A, C) :- determiner(A, B), noun(B, C).
noun([dog|A], A).
noun([cat|A], A).
noun([boy|A], A).
noun([girl|A], A).
verb([chased|A], A).
verb([saw|A], A).
verb([said|A], A).
verb([believed|A], A).

?- sentence([the, cat, saw, the, dog], []).
true .
?- sentence([the, dog, saw, the, dog], []).
true .
?- sentence([a, dog, chased, the, cat], []).
true .
?- sentence([that, dog, chased, the, cat], []).
false.

Definite Clause Grammar (DCG)
• This is a Prolog notation that provides an easy way to write grammar rules.
• E.g., sentence --> noun_phrase, verb_phrase.
• This is equivalent to the rule:
  – sentence(X,Z) :- noun_phrase(X,Y), verb_phrase(Y,Z).
• Also, noun --> [dog]. or noun --> [dog]; [cat]; [boy]; [girl].
• or verb --> [gives, up]. where “gives up” is a single verb.
• A query to the above sentence rule calls sentence/2, e.g., sentence([the, dog, chased, the, cat],[]).
• Try sentence([A,B,C,D,E],[]) or sentence([the, A, B, C, cat|E],[]).
• Non-terminal symbols can also take arguments, e.g., sentence(N) --> noun_phrase(N), verb_phrase(N).

Parser2.pl based on DCG
sentence --> noun_phrase, verb_phrase.
noun_phrase --> determiner, noun.
verb_phrase --> verb, noun_phrase.
verb_phrase --> verb, sentence.
determiner --> [the].
determiner --> [a].
noun --> [dog]; [cat]; [boy]; [girl].
verb --> [chased]; [saw]; [said]; [believed].

Grammatical Features
• How to handle agreement in tense and number between the noun and the verb?

% N is the number feature (singular or plural) shared by subject and verb.
sentence(N) --> noun_phrase(N), verb_phrase(N).
noun_phrase(N) --> determiner(N), noun(N).
verb_phrase(N) --> verb(N), noun_phrase(_).
verb_phrase(N) --> verb(N), sentence(_).
determiner(singular) --> [a].
determiner(_) --> [the].
determiner(plural) --> [].
noun(singular) --> [dog];[cat];[boy];[girl].
noun(plural) --> [dogs];[cats];[boys];[girls].
verb(singular) --> [chases];[sees];[says];[believes].
verb(plural) --> [chase];[see];[say];[believe].

?- sentence(plural, [the, dogs, A, B, C],[]).
A = chase, B = a, C = dog ;
A = chase, B = a, C = cat ;
A = chase, B = a, C = boy ;
A = chase, B = a, C = girl ;
A = chase, B = the, C = dog

Morphology
• How to generate plural nouns from singular?
• How to generate third person singular verbs from plural verbs?
• Mostly by adding: s

sentence(N) --> noun_phrase(N), verb_phrase(N).
noun_phrase(N) --> determiner(N), noun(N).
verb_phrase(N) --> verb(N), noun_phrase(_).
verb_phrase(N) --> verb(N), sentence(_).
determiner(singular) --> [a].
determiner(_) --> [the].
determiner(plural) --> [].
% Accept any word X that morph/2 recognizes with the required number.
noun(N) --> [X], { morph(noun(N),X) }.
verb(N) --> [X], { morph(verb(N),X) }.
morph(noun(singular),dog).      % Singular nouns
morph(noun(singular),cat).
morph(noun(singular),boy).
morph(noun(singular),girl).
morph(noun(singular),child).
morph(noun(plural),children).
% The children fact above is an irregular plural noun.
morph(noun(plural),X) :-        % Rule for regular plural nouns
    remove_s(X,Y),
    morph(noun(singular),Y).
morph(verb(plural),chase).      % Plural verbs
morph(verb(plural),see).
morph(verb(plural),say).
morph(verb(plural),believe).
morph(verb(singular),X) :-      % Rule for singular verbs
    remove_s(X,Y),
    morph(verb(plural),Y).

% remove_s(+X,-X1) [lifted from TEMPLATE.PL]
% removes final S from X giving X1,
% or fails if X does not end in S.
remove_s(X,X1) :-
    name(X,XList),
    remove_s_list(XList,X1List),
    name(X1,X1List).
remove_s_list([115],[]).        % 115 is the character code of s
remove_s_list([Head|Tail],[Head|NewTail]) :-
    remove_s_list(Tail,NewTail).
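
The morphology rules above can be tried out with a small self-contained sketch like the one below. It assumes SWI-Prolog and calls the grammar through phrase/2. The trimmed lexicon and the atom(X) guard in remove_s (so that an unbound word fails instead of raising an error) are assumptions made for illustration and are not part of the original files.

```prolog
% Sketch: number agreement plus suffix-s morphology in one file.
% Assumptions: SWI-Prolog, a reduced lexicon, and an added atom/1 guard.

sentence(N)    --> noun_phrase(N), verb_phrase(N).
noun_phrase(N) --> determiner(N), noun(N).
verb_phrase(N) --> verb(N), noun_phrase(_).

determiner(singular) --> [a].
determiner(_)        --> [the].

% A word is accepted if morph/2 recognizes it with the required number.
noun(N) --> [X], { morph(noun(N), X) }.
verb(N) --> [X], { morph(verb(N), X) }.

morph(noun(singular), dog).
morph(noun(singular), cat).
morph(noun(singular), child).
morph(noun(plural), children).            % irregular plural, listed as a fact
morph(noun(plural), X) :-                 % regular plural: singular + s
    remove_s(X, Y),
    morph(noun(singular), Y).

morph(verb(plural), chase).
morph(verb(plural), see).
morph(verb(singular), X) :-               % third person singular: plural + s
    remove_s(X, Y),
    morph(verb(plural), Y).

% remove_s(+X, -X1): strip a final s, or fail if X does not end in s.
remove_s(X, X1) :-
    atom(X),                              % added guard (assumption)
    name(X, XList),
    remove_s_list(XList, X1List),
    name(X1, X1List).

remove_s_list([115], []).                 % 115 is the character code of s
remove_s_list([Head|Tail], [Head|NewTail]) :-
    remove_s_list(Tail, NewTail).
```

With this file loaded, ?- phrase(sentence(N), [the, dogs, chase, a, cat]). gives N = plural, and ?- phrase(sentence(N), [a, child, sees, the, cats]). gives N = singular, while ?- phrase(sentence(_), [the, dog, chase, a, cat]). fails because subject and verb disagree in number. The regular plural rule also accepts childs, since nothing blocks the s rule for nouns that have an irregular plural.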
