Natural Language Processing
CS480/580
Levels of Linguistic Analysis
• Phonology---recognize speech sounds
• Morphology---analysis of word forms (e.g.,
adding s to make a plural etc.)
• Syntax---sentence structure
• Semantics---meaning
• Pragmatics---relation of language to context
Tokenization
• A string is broken into words, punctuation is
removed, and the key information is represented as
a sequence of words or tokens.
• E.g., “How are you today?” is converted to
[how, are, you, today].
Tokenize.pl
lower_case(A, B) :- A>=65, A=<90, !, B is A+32.
lower_case(A, A).
tokenize([], []) :- !.
tokenize(A, [B|E]) :- grab_word(A, C, D), name(B, C), tokenize(D, E).
punctuation_mark(A) :- A=<47.
punctuation_mark(A) :- A>=58, A=<64.
punctuation_mark(A) :- A>=91, A=<96.
punctuation_mark(A) :- A>=123.
grab_word([32|A], [], A) :- !.
grab_word([], [], []).
grab_word([A|B], C, D) :- punctuation_mark(A), !, grab_word(B, C, D).
grab_word([D|A], [E|B], C) :- grab_word(A, B, C), lower_case(D, E).
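% In SWI-Prolog 7 and later, run :- set_prolog_flag(double_quotes, codes). first,
% so that "..." is read as a list of character codes.
?- tokenize("This is CS480/580 course", X).
X = [this, is, cs480580, course].
?- name(john, X).
X = [106, 111, 104, 110].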
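For readers who want to experiment outside Prolog, the steps in Tokenize.pl can be sketched in Python. This is only an illustration: the function name `tokenize` mirrors the Prolog predicate, and skipping chunks that contain nothing but punctuation is an assumption of this sketch, not something the slides specify.

```python
def tokenize(text):
    """Split on spaces, drop punctuation, and lower-case what is left."""
    tokens = []
    for chunk in text.split(" "):
        # Keep only ASCII letters and digits, which is what the
        # character-code tests in punctuation_mark/1 leave behind.
        word = "".join(ch.lower() for ch in chunk if ch.isascii() and ch.isalnum())
        if word:  # assumption of this sketch: ignore chunks that were pure punctuation
            tokens.append(word)
    return tokens


print(tokenize("How are you today?"))
print(tokenize("This is CS480/580 course"))
```

As in the Prolog version, the slash inside CS480/580 is simply dropped, so the two numbers run together into one token.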
Template System
• Templates --- stored sentence patterns
• Each template is accompanied by a translation
schema
• E.g., [X, is, a, Y] is translated to Y(X).
• process([X, is, a, Y]) :- Fact =.. [Y, X],
assert(Fact).
• process([is, X, a, Y]) :- Query =.. [Y, X],
call(Query).
Template.pl
grab_word([32|A], [], A) :- !.
grab_word([], [], []).
grab_word([A|B], C, D) :- punctuation_mark(A), !, grab_word(B, C, D).
grab_word([D|A], [E|B], C) :- grab_word(A, B, C), lower_case(D, E).
punctuation_mark(A) :- A=<47.
punctuation_mark(A) :- A>=58, A=<64.
punctuation_mark(A) :- A>=91, A=<96.
punctuation_mark(A) :- A>=123.
lower_case(A, B) :- A>=65, A=<90, !, B is A+32.
lower_case(A, A).
write_str([A|B]) :- put(A), write_str(B).
write_str([]).
read_str_aux(-1, []) :- !.
read_str_aux(10, []) :- !.
read_str_aux(13, []) :- !.
read_str_aux(A, [A|B]) :- read_str(B).
do_one_sentence :- write('>'), read_str(A), tokenize(A, B), process(B).
note(A) :- asserta(A), write('OK'), nl.
read_atom(A) :- read_str(B), name(A, B).
start :- write('TEMPLATE.PL at your service.'), nl,
write('Terminate by pressing Break.'), nl, repeat, do_one_sentence, fail.
check(A) :- call(A), !, write('Yes.'), nl.
check(_) :- write('Not as far as I know.'), nl.
read_num(A) :- read_str(B), name(A, B).
remove_s(A, C) :- name(A, B), remove_s_list(B, D), name(C, D).
read_str(B) :- get0(A), read_str_aux(A, B).
remove_s_list([115], []).
remove_s_list([A|B], [A|C]) :- remove_s_list(B, C).
process([B, is, a, A]) :- !, C=..[A, B], note(C).
process([A, is, an, B]) :- !, process([A, is, a, B]).
process([is, B, a, A]) :- !, C=.. [A, B], check(C).
process([is, A, an, B]) :- !, process([is, A, a, B]).
process([A, are, B]) :- !, remove_s(A, D), remove_s(B, C), F=..[C, E], G=..[D, E], note((F:-G)).
process([does, B, A]) :- !, C=..[A, B], check(C).
process([A, B]) :- \+ remove_s(A, _), remove_s(B, C), !, D=..[C, A], note(D).
process([A, B]) :- remove_s(A, C), \+ remove_s(B, _), !, E=..[B, D], F=..[C, D], note((E:-F)).
process(_) :- write('I do not understand.'), nl.
tokenize([], []) :- !.
tokenize(A, [B|E]) :- grab_word(A, C, D), name(B, C), tokenize(D, E).
?- start.
TEMPLATE.PL at your service.
Terminate by pressing Break.
>CS480 is a course.
OK
>is CS480 a course?
Yes.
>is cs471 a course?
Not as far as I know.
>cs471 is a course.
OK
>is cs471 a course?
Yes.
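The core of the template idea, matching a token list against stored patterns and either storing a fact or answering a query, can be sketched in Python as follows. This covers only the "X is a Y" and "is X a Y" templates. The set `facts` and the tuple encoding `(Y, X)` are assumptions of this sketch that stand in for Prolog's assert and call.

```python
def process(tokens, facts):
    """Match a token list against two stored patterns and return the reply."""
    # Template: [X, is, a, Y]  ->  store the fact Y(X)
    if len(tokens) == 4 and tokens[1] == "is" and tokens[2] in ("a", "an"):
        x, y = tokens[0], tokens[3]
        facts.add((y, x))
        return "OK"
    # Template: [is, X, a, Y]  ->  look the fact Y(X) up
    if len(tokens) == 4 and tokens[0] == "is" and tokens[2] in ("a", "an"):
        x, y = tokens[1], tokens[3]
        return "Yes." if (y, x) in facts else "Not as far as I know."
    return "I do not understand."


facts = set()
for sentence in (["cs480", "is", "a", "course"],
                 ["is", "cs480", "a", "course"],
                 ["is", "cs471", "a", "course"]):
    print(process(sentence, facts))
```

Anything that fits neither pattern falls through to the last line, which is exactly the weakness of templates that the next slide discusses.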
Generative Grammars
• Templates are inadequate to describe human language (in the last
example, the only sentences allowed were ones matching a stored pattern such as X is a Y.)
• John arrived
• Max said John arrived
• Bill claimed Max said John arrived
• Mary thought Bill claimed Max said John arrived
• Chomsky’s suggestion: Treat syntax as a problem in set theory---express
an infinite set with a finite description
Context Free Grammars
• Phrase Structure Rules
– S → NP VP
– NP → Det N
– N → N PP
– N → N N
– PP → P NP
– VP → IV
– VP → TV NP
– VP → DV NP NP
• Lexical Entries
– N → book, cow, course, …
– P → in, on, with, …
– Det → the, every, …
– IV → ran, hid, …
– TV → likes, hit, …
– DV → gave, showed
[Photo: Noam Chomsky]
Context-Free Derivations
• S ⇒ NP VP ⇒ Det N VP ⇒ the N VP ⇒ the kid VP ⇒ the kid IV ⇒ the kid ran
• Penn TreeBank bracketing notation (Lisp-like)
– (S (NP (Det the)
         (N kid))
     (VP (IV ran)))
• Theorem: A sequence has a derivation if and
only if it has a parse tree
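The derivation of "the kid ran" can be replayed mechanically. The Python sketch below is an illustration, not part of the course material. The names `RULES` and `derive` are hypothetical, and only the rules needed for this one sentence are listed. Each step rewrites the leftmost occurrence of a rule's left-hand side.

```python
# Only the rules used in this one derivation; the full grammar has more.
RULES = {
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("IV",)),
    ("Det", ("the",)),
    ("N", ("kid",)),
    ("IV", ("ran",)),
}


def derive(start, steps):
    """Apply each (lhs, rhs) rule in turn and return every sentential form."""
    form = [start]
    history = [" ".join(form)]
    for lhs, rhs in steps:
        if (lhs, tuple(rhs)) not in RULES:
            raise ValueError(f"not a rule of the grammar: {lhs} -> {rhs}")
        position = form.index(lhs)           # leftmost occurrence of lhs
        form[position:position + 1] = rhs    # replace it by the right-hand side
        history.append(" ".join(form))
    return history


steps = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("Det", ("the",)),
         ("N", ("kid",)), ("VP", ("IV",)), ("IV", ("ran",))]
for form in derive("S", steps):
    print(form)
```

The printed forms are the same seven strings as in the derivation on the slide, from S down to "the kid ran".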
“Standard” Parse Tree Notation
[Figure: parse tree drawn as a tree diagram, with node labels S, NP, VP and PP over the words "Jones followed him into the front room , ..."]
A simple Parser
verb_phrase(A, C) :- verb(A, B), noun_phrase(B, C).
verb_phrase(A, C) :- verb(A, B), sentence(B, C).
determiner([the|A], A).
determiner([a|A], A).
sentence(A, C) :- noun_phrase(A, B), verb_phrase(B, C).
noun_phrase(A, C) :- determiner(A, B), noun(B, C).
noun([dog|A], A).
noun([cat|A], A).
noun([boy|A], A).
noun([girl|A], A).
verb([chased|A], A).
verb([saw|A], A).
verb([said|A], A).
verb([believed|A], A).
2 ?- sentence([the, cat, saw, the, dog], []).
true .
3 ?- sentence([the, dog, saw, the, dog], []).
true .
4 ?- sentence([a, dog, chased, the, cat], []).
true .
5 ?- sentence([that, dog, chased, the, cat], []).
false.
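Each predicate in the parser above takes a word list and hands back what is left after consuming one phrase. The same idea can be sketched in Python with generators, where every alternative a Prolog clause would try on backtracking becomes one yielded remainder. The helper `word` and the wrapper `accepts` are names made up for this sketch.

```python
DETERMINERS = {"the", "a"}
NOUNS = {"dog", "cat", "boy", "girl"}
VERBS = {"chased", "saw", "said", "believed"}


def word(tokens, vocabulary):
    """Consume one word from the given vocabulary, if the list starts with one."""
    if tokens and tokens[0] in vocabulary:
        yield tokens[1:]


def noun_phrase(tokens):
    for rest in word(tokens, DETERMINERS):
        yield from word(rest, NOUNS)


def verb_phrase(tokens):
    for rest in word(tokens, VERBS):
        yield from noun_phrase(rest)   # verb followed by a noun phrase
        yield from sentence(rest)      # verb followed by an embedded sentence


def sentence(tokens):
    for rest in noun_phrase(tokens):
        yield from verb_phrase(rest)


def accepts(tokens):
    """True if some parse consumes the whole list, like sentence(Tokens, [])."""
    return any(rest == [] for rest in sentence(tokens))


print(accepts(["the", "cat", "saw", "the", "dog"]))
print(accepts(["that", "dog", "chased", "the", "cat"]))
```

Because a verb phrase may contain a whole sentence, recursive inputs such as "the boy said the cat saw the dog" are accepted as well.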
Definite Clause Grammar (DCG)
• This is a Prolog notation that provides an easy way to write
grammar rules.
• E.g., sentence --> noun_phrase, verb_phrase.
• This is equivalent to the rule:
– sentence(X,Z) :- noun_phrase(X,Y), verb_phrase(Y,Z).
• Also, noun --> [dog] or noun --> [dog]; [cat]; [boy]; [girl]
or verb --> [gives, up] where “gives up” is a single verb.
• The sentence rule above is translated into the predicate sentence/2, so a query supplies two arguments.
E.g., sentence([the, dog, chased, the, cat],[]).
• Try sentence([A,B,C,D,E],[]) or sentence([the, A, B, C, cat|E],[]).
• Non-terminal symbols can also take arguments: e.g., sentence(N)
--> noun_phrase(N), verb_phrase(N).
Parser2.pl based on DCG
sentence --> noun_phrase, verb_phrase.
noun_phrase --> determiner, noun.
verb_phrase --> verb, noun_phrase.
verb_phrase --> verb, sentence.
determiner --> [the].
determiner --> [a].
noun --> [dog]; [cat]; [boy]; [girl].
verb --> [chased]; [saw]; [said]; [believed].
Grammatical Features
• How to handle agreement in tense and number between the noun and
the verb?
sentence(N) --> noun_phrase(N), verb_phrase(N).
noun_phrase(N) --> determiner(N), noun(N).
verb_phrase(N) --> verb(N), noun_phrase(_).
verb_phrase(N) --> verb(N), sentence(_).
determiner(singular) --> [a].
determiner(_) --> [the].
determiner(plural) --> [].
noun(singular) --> [dog];[cat];[boy];[girl].
noun(plural) --> [dogs];[cats];[boys];[girls].
verb(singular) --> [chases];[sees];[says];[believes].
verb(plural) --> [chase];[see];[say];[believe].
?- sentence(plural, [the, dogs, A, B, C],[]).
A = chase,
B = a,
C = dog ;
A = chase,
B = a,
C = cat ;
A = chase,
B = a,
C = boy ;
A = chase,
B = a,
C = girl ;
A = chase,
B = the,
C = dog
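The effect of the number argument can also be sketched in Python: the noun phrase reports its number, and the verb phrase is only allowed to start with a verb of that same number. The dictionaries below are assumptions of this sketch that restate the lexicon above. The empty plural determiner is modelled as a bare plural noun.

```python
DETERMINERS = {"a": {"singular"}, "the": {"singular", "plural"}}
NOUNS = {"dog": "singular", "cat": "singular", "boy": "singular", "girl": "singular",
         "dogs": "plural", "cats": "plural", "boys": "plural", "girls": "plural"}
VERBS = {"chases": "singular", "sees": "singular", "says": "singular",
         "believes": "singular", "chase": "plural", "see": "plural",
         "say": "plural", "believe": "plural"}


def noun_phrase(tokens):
    """Yield (number, remaining_tokens) for each noun phrase at the front."""
    # Empty determiner: allowed only before a plural noun.
    if tokens and NOUNS.get(tokens[0]) == "plural":
        yield "plural", tokens[1:]
    # Overt determiner: its number must be compatible with the noun's.
    if len(tokens) >= 2 and tokens[0] in DETERMINERS:
        number = NOUNS.get(tokens[1])
        if number in DETERMINERS[tokens[0]]:
            yield number, tokens[2:]


def verb_phrase(tokens, number):
    """The verb must agree with `number`; the object may have any number."""
    if tokens and VERBS.get(tokens[0]) == number:
        rest = tokens[1:]
        for _, after in noun_phrase(rest):
            yield after
        yield from sentence(rest)


def sentence(tokens):
    for number, rest in noun_phrase(tokens):
        yield from verb_phrase(rest, number)


def accepts(tokens):
    return any(rest == [] for rest in sentence(tokens))


print(accepts("the dogs chase a cat".split()))
print(accepts("the dog chase a cat".split()))
```

The subject and verb must agree, but the object noun phrase is free, which corresponds to noun_phrase(_) in the grammar.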
Morphology
• How to generate plural nouns from singular?
• How to generate third person singular verbs
from plural verbs?
• Mostly by adding: s
sentence(N) --> noun_phrase(N), verb_phrase(N).
noun_phrase(N) --> determiner(N), noun(N).
verb_phrase(N) --> verb(N), noun_phrase(_).
verb_phrase(N) --> verb(N), sentence(_).
determiner(singular) --> [a].
determiner(_) --> [the].
determiner(plural) --> [].
noun(N) --> [X], { morph(noun(N),X) }.
verb(N) --> [X], { morph(verb(N),X) }.
morph(noun(singular),dog).             % Singular nouns
morph(noun(singular),cat).
morph(noun(singular),boy).
morph(noun(singular),girl).
morph(noun(singular),child).
morph(noun(plural),children).          % Irregular plural nouns
morph(noun(plural),X) :-               % Rule for regular plural nouns
    remove_s(X,Y),
    morph(noun(singular),Y).
morph(verb(plural),chase).             % Plural verbs
morph(verb(plural),see).
morph(verb(plural),say).
morph(verb(plural),believe).
morph(verb(singular),X) :-             % Rule for singular verbs
    remove_s(X,Y),
    morph(verb(plural),Y).
% remove_s(+X,-X1) [lifted from TEMPLATE.PL]
% removes final S from X giving X1,
% or fails if X does not end in S.
remove_s(X,X1) :-
    name(X,XList),
    remove_s_list(XList,X1List),
    name(X1,X1List).
remove_s_list([115],[]).               % 115 is the character code of s
remove_s_list([Head|Tail],[Head|NewTail]) :-
    remove_s_list(Tail,NewTail).
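The noun half of the morphology rules can be sketched in Python to show how a small lexicon plus one suffix rule replaces a full list of plural forms. The function `noun_numbers` and the two sets are hypothetical names for this sketch. Like the Prolog rule, it over-generates: "childs" counts as a plural because stripping the s leaves a known singular noun.

```python
SINGULAR_NOUNS = {"dog", "cat", "boy", "girl", "child"}
IRREGULAR_PLURAL_NOUNS = {"children"}


def remove_s(word):
    """Return word without its final 's', or None if it does not end in 's'."""
    return word[:-1] if word.endswith("s") else None


def noun_numbers(word):
    """Return the set of numbers ('singular', 'plural') the word can have as a noun."""
    numbers = set()
    if word in SINGULAR_NOUNS:
        numbers.add("singular")
    # Plural: either listed as irregular, or a known singular noun plus 's'.
    if word in IRREGULAR_PLURAL_NOUNS or remove_s(word) in SINGULAR_NOUNS:
        numbers.add("plural")
    return numbers


for w in ("dog", "dogs", "children", "childs", "tree"):
    print(w, sorted(noun_numbers(w)))
```

An empty result means the word is not recognised as a noun at all, which corresponds to morph/2 failing in the Prolog version.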