Artificial Intelligence
Universitatea Politehnica Bucuresti
Academic year 2003-2004
Adina Magda Florea
http://turing.cs.pub.ro/ia_2005
1
Lecture 12
Natural Language Processing
2
Defining Languages with
Backus-Naur Form (BNF)
A formal language is defined as a set of
strings, where each string is a sequence of
symbols
Languages of interest usually consist of an infinite set of
strings → need a concise way to characterize
the set → use a grammar
Terminal Symbols
– Symbols or words that make up the strings of the
language
Example
– Set of symbols for the language of simple
arithmetic expressions
– {0,1,2,3,4,5,6,7,8,9,+,-,*,/,(,)}
3
Components in a BNF Grammar
Nonterminal Symbols
– Categorize subphrases of the language
Example
– The nonterminal symbol NP (NounPhrase)
denotes an infinite set of strings, including
“you” and “the big dog”
4
Components in a BNF Grammar
Start Symbol
– Nonterminal symbol that denotes the
complete strings of the language
Set of rewrite rules or productions
– LHS → RHS
– LHS is a nonterminal
– RHS is a sequence of zero or more
symbols (either terminal or nonterminal)
5
Example: BNF Grammar for Simple
Arithmetic Expressions
Exp → Exp Operator Exp
    | ( Exp )
    | Number
Number → Digit
       | Number Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Operator → + | - | * | /
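The grammar above can be turned into a small recognizer. The following is a minimal sketch, not part of the original slides: because the rule Exp → Exp Operator Exp is left-recursive, the sketch assumes the equivalent form Exp → Term (Operator Term)*, Term → ( Exp ) | Number, which accepts the same strings. The names `parse_term`, `parse_exp` and `is_expression` are hypothetical.

```python
DIGITS = set("0123456789")
OPERATORS = set("+-*/")


def parse_term(s, i):
    """Term -> ( Exp ) | Number. Return the index just after the term, or None."""
    if i < len(s) and s[i] == "(":
        j = parse_exp(s, i + 1)
        if j is not None and j < len(s) and s[j] == ")":
            return j + 1
        return None
    j = i
    while j < len(s) and s[j] in DIGITS:  # Number -> Digit | Number Digit
        j += 1
    return j if j > i else None


def parse_exp(s, i=0):
    """Exp -> Term (Operator Term)*. Return the index just after the expression, or None."""
    j = parse_term(s, i)
    if j is None:
        return None
    while j < len(s) and s[j] in OPERATORS:
        k = parse_term(s, j + 1)
        if k is None:  # an operator must be followed by another term
            return None
        j = k
    return j


def is_expression(s):
    """True if the whole string s belongs to the language of the grammar."""
    return parse_exp(s, 0) == len(s)


print(is_expression("12+(3*4)"))
print(is_expression("1+"))
```

The recognizer only answers membership; it does not build a derivation tree and it ignores operator precedence, which the grammar itself does not encode either.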
6
The Component Steps of
Communication
A typical communication, in which the
speaker S wants to transmit the
proposition P to the hearer H using
words W, is composed of 7 processes.
3 take place in the speaker
4 take place in the hearer
7
Processes in the Speaker
Intention
– S wants H to believe P (where S typically
believes P)
Generation
– S chooses the words W (because they
express the meaning P)
Synthesis
– S utters the words W (usually addressing
them to H)
8
Processes in the Hearer
Perception
– H perceives W’ (ideally W’ = W, but
misperception is possible)
Analysis
– H infers that W’ has possible meanings
P1,…,Pn (words and phrases can have
several meanings)
9
Processes in the Hearer
Disambiguation
– H infers that S intended to express Pi
(where ideally Pi = P, but misinterpretation
is possible)
Incorporation
– H decides to believe Pi (or rejects it if it is
out of line with what H already believes)
10
Observations
If the perception refers to spoken
expressions, this is speech recognition
If the perception refers to handwritten
expressions, this is handwriting
recognition
Neural networks have been successfully
used for both speech recognition and
handwriting recognition
11
Observations
Analysis, disambiguation, and
incorporation together form natural language
understanding; they rely on the assumption
that the words of the sentence are known
Many times, recognition of individual words
may be driven by the sentence structure, so
perception and analysis interact, as well as
analysis, disambiguation, and incorporation
12
Defining a Grammar
Lexicon - list of allowable vocabulary
words, grouped in categories (parts of
speech):
– open classes - words are added to the
category all the time (natural language is
dynamic, it constantly evolves)
– closed classes - small number of words,
generally it is not expected that other
words will be added
13
Example - A Small Lexicon
Noun → stench | breeze | wumpus …
Verb → is | see | smell …
Adjective → right | left | smelly …
Adverb → here | there | ahead …
Pronoun → me | you | I | it
RelPronoun → that | who
Name → John | Mary
Article → the | a | an
Preposition → to | in | on
Conjunction → and | or | but
14
The Grammar Associated with the
Lexicon
Combine the words into phrases
Use nonterminal symbols to define
different kinds of phrases
– sentence S
– noun phrase NP
– verb phrase VP
– prepositional phrase PP
– relative clause RelClause
15
Example - The Grammar Associated with the
Lexicon
S → NP VP | S Conjunction S
NP → Pronoun | Noun | Article Noun |
     NP PP | NP RelClause
VP → Verb | VP NP | VP Adjective |
     VP PP | VP Adverb
PP → Preposition NP
RelClause → RelPronoun VP
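As an illustration, the lexicon and the grammar can be written down as plain data and checked against word sequences. This is a minimal sketch under assumptions: only the lexicon words listed explicitly on the slides are included, and the helper names (`recognizes`, `derives`, `matches`) are hypothetical. Because several rules are left-recursive (for example NP → NP PP), the sketch tests spans of the input with memoization instead of expanding rules blindly; every symbol covers at least one word, so each recursive call works on a strictly shorter span.

```python
from functools import lru_cache

LEXICON = {
    "Noun": {"stench", "breeze", "wumpus"},
    "Verb": {"is", "see", "smell"},
    "Adjective": {"right", "left", "smelly"},
    "Adverb": {"here", "there", "ahead"},
    "Pronoun": {"me", "you", "I", "it"},
    "RelPronoun": {"that", "who"},
    "Name": {"John", "Mary"},
    "Article": {"the", "a", "an"},
    "Preposition": {"to", "in", "on"},
    "Conjunction": {"and", "or", "but"},
}

GRAMMAR = {
    "S": [["NP", "VP"], ["S", "Conjunction", "S"]],
    "NP": [["Pronoun"], ["Noun"], ["Article", "Noun"],
           ["NP", "PP"], ["NP", "RelClause"]],
    "VP": [["Verb"], ["VP", "NP"], ["VP", "Adjective"],
           ["VP", "PP"], ["VP", "Adverb"]],
    "PP": [["Preposition", "NP"]],
    "RelClause": [["RelPronoun", "VP"]],
}


def recognizes(words, start="S"):
    """True if the word sequence can be derived from the start symbol."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Lexical category: must cover exactly one word from the lexicon.
        if symbol in LEXICON:
            return j == i + 1 and words[i] in LEXICON[symbol]
        return any(matches(tuple(rhs), i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def matches(rhs, i, j):
        # Does the sequence of symbols rhs cover words[i:j]?
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        # Leave at least one word for each remaining symbol.
        return any(derives(rhs[0], i, k) and matches(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return len(words) > 0 and derives(start, 0, len(words))


print(recognizes("the wumpus is smelly".split()))
print(recognizes("me smell the wumpus".split()))
```

Note that the second call also succeeds: the grammar as given says nothing about pronoun case, so it accepts some word sequences that are not grammatical English.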
16
Syntactic Analysis (Parsing)
Parsing is the problem of constructing a
derivation tree for an input string from a
formal definition of a grammar.
Parsing algorithms may be divided
into two classes:
– top-down parsing
– bottom-up parsing
17
Top-Down Parsing
Start with the top-level sentence symbol
and attempt to build a tree whose
leaves match the target sentence's
words (the terminals)
Better if many alternative terminal
symbols for each word
Worse if many alternative rules for a
phrase
18
Example for Top-Down Parsing
"John hit the ball"
1. S
2. S → NP, VP
3. S → Noun, VP
4. S → John, Verb, NP
5. S → John, hit, NP
6. S → John, hit, Article, Noun
7. S → John, hit, the, Noun
8. S → John, hit, the, ball
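The derivation above can be reproduced with a small backtracking top-down parser. This is a minimal sketch, assuming a toy grammar inferred from the example (S → NP VP, NP → Noun | Article Noun, VP → Verb NP, with "John" treated as a Noun as on the slide); the function name `top_down` is hypothetical. It always expands the leftmost unmatched symbol, so it would not terminate on left-recursive rules.

```python
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Noun"], ["Article", "Noun"]],
    "VP": [["Verb", "NP"]],
    "Noun": [["John"], ["ball"]],
    "Verb": [["hit"]],
    "Article": [["the"]],
}


def top_down(words, form=("S",), pos=0):
    """Return the sentential forms of a leftmost derivation of words, or None.

    form is the current sentential form; its first pos symbols are
    terminals that already match the first pos input words.
    """
    if pos == len(form):
        return [form] if pos == len(words) else None
    if len(form) > len(words):  # every symbol yields at least one word
        return None
    symbol = form[pos]
    if symbol not in RULES:  # terminal: must match the next input word
        if words[pos] == symbol:
            return top_down(words, form, pos + 1)
        return None
    for rhs in RULES[symbol]:  # nonterminal: try each alternative in turn
        new_form = form[:pos] + tuple(rhs) + form[pos + 1:]
        rest = top_down(words, new_form, pos)
        if rest is not None:
            return [form] + rest
    return None


for step in top_down("John hit the ball".split()):
    print(", ".join(step))
```

The printed forms go through one expansion at a time, so the sketch shows a few more intermediate steps than the slide, which merges some of them.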
19
Bottom-Up Parsing
Start with the words in the sentence (the
terminals) and attempt to find a series of
reductions that yield the sentence
symbol
Better if many alternative rules for a
phrase
Worse if many alternative terminal
symbols for each word
20
Example for Bottom-Up Parsing
1. John, hit, the, ball
2. Noun, hit, the, ball
3. Noun, Verb, the, ball
4. Noun, Verb, Article, ball
5. Noun, Verb, Article, Noun
6. NP, Verb, Article, Noun
7. NP, Verb, NP
8. NP, VP
9. S
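The same sequence of reductions can be found by a simple backtracking search. This is a minimal sketch, assuming the toy rules implied by the example; the names `REDUCTIONS` and `bottom_up` are hypothetical. Rules are tried in the listed order, and a reduction that leads to a dead end (for instance turning the last Noun into an NP too early) is undone by backtracking.

```python
# Each entry is (right-hand side, left-hand side): the RHS is reduced to the LHS.
REDUCTIONS = [
    (("John",), "Noun"),
    (("hit",), "Verb"),
    (("the",), "Article"),
    (("ball",), "Noun"),
    (("Noun",), "NP"),
    (("Article", "Noun"), "NP"),
    (("Verb", "NP"), "VP"),
    (("NP", "VP"), "S"),
]


def bottom_up(form, goal="S", failed=None):
    """Return the list of forms leading from the words to the goal, or None."""
    form = tuple(form)
    failed = set() if failed is None else failed
    if form == (goal,):
        return [form]
    if form in failed:  # already known to be a dead end
        return None
    for rhs, lhs in REDUCTIONS:
        for i in range(len(form) - len(rhs) + 1):
            if form[i:i + len(rhs)] == rhs:
                reduced = form[:i] + (lhs,) + form[i + len(rhs):]
                rest = bottom_up(reduced, goal, failed)
                if rest is not None:
                    return [form] + rest
    failed.add(form)
    return None


for step in bottom_up("John hit the ball".split()):
    print(", ".join(step))
```

With this rule order the successful path coincides with the nine steps listed above; a different order would find a different, equally valid sequence of reductions.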
21
Definite Clause Grammar (DCG)
Problems with BNF Grammar
– BNF only talks about strings, not meanings
– Want to describe context-sensitive
grammars, but BNF is context-free
Introduce a formalism that can handle
both of these problems
Use first-order logic to talk about
strings and their meanings
22
Definite Clause Grammar (DCG)
We are interested in using language for
communication → need some way of
associating a meaning with each string
Each nonterminal symbol becomes a
one-place predicate that is true of
strings that are phrases of that category
Example
– Noun(“ball”) is a true logical sentence
– Noun(“the”) is a false logical sentence
23
Definite Clause Grammar (DCG)
A definite clause grammar (DCG) is a
grammar in which every rule, written as a
logical sentence, must be a definite clause.
A definite clause is a type of Horn
clause that, when written as an
implication, has exactly one atom in the
conclusion and a conjunction of zero or
more atoms in the hypothesis, for
example A1 ∧ A2 ∧ … ⇒ C1
24
Example 1
In BNF notation, we have:
S → NP VP
In First-Order Logic notation, we have:
NP(s1) ∧ VP(s2) ⇒ S(Append(s1, s2))
We read: If there is a string s1 that is a noun
phrase and a string s2 that is a verb phrase,
then the string formed by appending them
together is a sentence
25
Example 2
In BNF notation, we have:
Noun → ball | book
In First-Order Logic notation, we have:
(s = “ball” ∨ s = “book”) ⇒ Noun(s)
We read: If s is the string “ball” or the string
“book”, then the string s is a noun
26
Rules to Translate BNF into DCG
BNF              DCG
X → Y Z          Y(s1) ∧ Z(s2) ⇒ X(Append(s1, s2))
X → word         X(["word"])
X → Y | Z        Y(s) ⇒ X(s)
                 Z(s) ⇒ X(s)
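The three translation rules can be mimicked with predicates over lists of words. This is a minimal sketch, not the DCG formalism itself: each nonterminal becomes a Python function that is true of the word lists belonging to that category. The helper names `word`, `alt` and `seq` are hypothetical, and the tiny lexicon reuses words from the earlier examples.

```python
def word(w):
    """X -> word: true only of the one-word string [w]."""
    return lambda s: list(s) == [w]


def alt(*preds):
    """X -> Y | Z: true of s if any alternative is true of s."""
    return lambda s: any(p(s) for p in preds)


def seq(y, z):
    """X -> Y Z: true of s if s = Append(s1, s2) with Y(s1) and Z(s2)."""
    return lambda s: any(y(s[:k]) and z(s[k:]) for k in range(len(s) + 1))


# A toy grammar built from the three combinators (assumed for illustration).
article = alt(word("the"), word("a"))
noun = alt(word("ball"), word("book"))
name = word("John")
verb = word("hit")
noun_phrase = alt(name, seq(article, noun))
verb_phrase = seq(verb, noun_phrase)
sentence = seq(noun_phrase, verb_phrase)

print(noun(["ball"]))
print(noun(["the"]))
print(sentence("John hit the ball".split()))
```

Trying every split point in `seq` plays the role of the existential reading of Append(s1, s2); a logic programming system would find the split by unification instead.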
27
Augmenting the DCG
Extend the notation to incorporate
grammars that cannot be expressed in
BNF
Nonterminal symbols can be
augmented with extra arguments
28
Augmenting the DCG
Add one argument for semantics
In DCG, the nonterminal NP translates
as a one-place predicate where the
single argument is a string: NP(s)
In the augmented DCG, we can write
NP(sem) to express “an NP with
semantics sem”. This gets translated
into logic as the two-place predicate
NP(sem, s)
29
Augmenting the DCG
Add one argument for semantics
DCG:    S(sem) → NP(sem1) VP(sem2)
        {compose(sem1, sem2, sem)}
FOPL:   NP(s1, sem1) ∧ VP(s2, sem2) ⇒
        S(append(s1, s2)),
        compose(sem1, sem2, sem)
PROLOG: see later on
30
Semantic Interpretation
Compositional semantics - the
semantics of any phrase is a function of
the semantics of its subphrases; it does
not depend on any other phrase before,
after, or encompassing the given phrase
But natural languages do not have a
compositional semantics in the general
case.
31
% A sentence is an NP followed by a VP; its semantics joins both parts.
sentence(S, Sem) :-
    np(S1, Sem1), vp(S2, Sem2),
    append(S1, S2, S), Sem = [Sem1 | Sem2].
np([S1, S2], Sem) :- article(S1), noun(S2, Sem).
% Three kinds of VP: a property, a color, or a part-of relation.
vp([S], Sem) :- verb(S, Sem1), Sem = [property, Sem1].
vp([S1, S2], Sem) :-
    verb(S1), adjective(S2, color, Sem1), Sem = [color, Sem1].
vp([S1, S2], Sem) :-
    verb(S1), noun(S2, Sem1), Sem = [parts, Sem1].
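For readers who prefer to experiment outside Prolog, the clauses above can be mirrored in Python. This is a minimal sketch under assumptions: the slides do not list the lexicon facts, so every word and every semantic value below is hypothetical, and the function names simply echo the Prolog predicates.

```python
ARTICLES = {"the", "a"}
NOUNS = {"ball": "ball", "car": "car", "wheels": "wheels"}  # word -> semantics
PROPERTY_VERBS = {"rolls": "rolls"}   # plays the role of verb(S, Sem)
PLAIN_VERBS = {"is", "has"}           # plays the role of verb(S)
COLORS = {"red": "red", "blue": "blue"}


def np(words):
    """np([S1, S2], Sem): an article followed by a noun."""
    if len(words) == 2 and words[0] in ARTICLES and words[1] in NOUNS:
        return NOUNS[words[1]]
    return None


def vp(words):
    """The three vp clauses: property, color, parts."""
    if len(words) == 1 and words[0] in PROPERTY_VERBS:
        return ["property", PROPERTY_VERBS[words[0]]]
    if len(words) == 2 and words[0] in PLAIN_VERBS:
        if words[1] in COLORS:
            return ["color", COLORS[words[1]]]
        if words[1] in NOUNS:
            return ["parts", NOUNS[words[1]]]
    return None


def sentence(words):
    """sentence(S, Sem): Sem = [Sem1 | Sem2] for some split of the words."""
    for k in range(1, len(words)):
        sem1, sem2 = np(words[:k]), vp(words[k:])
        if sem1 is not None and sem2 is not None:
            return [sem1] + sem2
    return None


print(sentence("the ball is red".split()))
print(sentence("the car has wheels".split()))
```

The semantics of the sentence is built only from the semantics of its noun phrase and verb phrase, which is the compositional assumption in its simplest form.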
Problems with Augmented DCG
The previous grammar will generate
sentences that are not grammatically
correct
Natural language is not a context-free language
Must deal with
– cases
– agreement between subject and main verb
in the sentence (predicate)
– verb subcategorization: the complements
that a verb can accept
33
Solution
Augment the existing rules of the
grammar to deal with context issues
Start by parameterizing the categories
NP and Pronoun so that they take a
parameter indicating their case
34
CASES (examples in English | French | Romanian)
Nominative case (subjective case) + agreement
– I take the bus | Je prends l’autobus | Eu iau autobuzul
– You take the bus | Tu prends l’autobus | Tu iei autobuzul
– He takes the bus | Il prend l’autobus | El ia autobuzul
Accusative case (objective case)
– He gives me the book | Il me donne le livre | El imi da cartea
Dative case
– You are talking to me | Il parle avec moi | El vorbeste cu mine
Example - The Grammar Using
Augmentations to Represent Noun Cases
S → NP(Subjective) VP
NP(case) → Pronoun(case) | Noun | Article Noun
Pronoun(Subjective) → I | you | he | she
Pronoun(Objective) → me | you | him | her
36
sentence(S) :-
    np(S1, subjective), vp(S2),
    append(S1, S2, S).
np([S], Case) :- pronoun(S, Case).
np([S], _ ) :- noun(S).
np([S1, S2], _ ) :- article(S1), noun(S2).
pronoun(i, subjective).
pronoun(you, _ ).
pronoun(he, subjective).
pronoun(she, subjective).
pronoun(me, objective).
pronoun(him, objective).
pronoun(her, objective).
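The effect of the case parameter can be tried out with a Python counterpart of these clauses. This is a minimal sketch under assumptions: the slide does not define vp, so the rule VP → Verb | Verb NP(Objective) is assumed here, and the nouns and verbs are hypothetical; only the pronoun table follows the slide.

```python
PRONOUNS = {
    "I": {"subjective"},
    "he": {"subjective"},
    "she": {"subjective"},
    "you": {"subjective", "objective"},  # pronoun(you, _)
    "me": {"objective"},
    "him": {"objective"},
    "her": {"objective"},
}
NOUNS = {"wumpus", "bus"}   # hypothetical
ARTICLES = {"the", "a"}     # hypothetical
VERBS = {"smell", "see"}    # hypothetical


def np(words, case):
    """NP(case) -> Pronoun(case) | Noun | Article Noun."""
    if len(words) == 1 and words[0] in PRONOUNS:
        return case in PRONOUNS[words[0]]
    if len(words) == 1:
        return words[0] in NOUNS
    return len(words) == 2 and words[0] in ARTICLES and words[1] in NOUNS


def vp(words):
    """Assumed rule: VP -> Verb | Verb NP(Objective)."""
    if not words or words[0] not in VERBS:
        return False
    return len(words) == 1 or np(words[1:], "objective")


def sentence(words):
    """S -> NP(Subjective) VP, trying every split point."""
    return any(np(words[:k], "subjective") and vp(words[k:])
               for k in range(1, len(words)))


print(sentence("I smell the wumpus".split()))
print(sentence("me smell the wumpus".split()))
```

Only pronouns are sensitive to the case argument; nouns ignore it, just as the anonymous variable does in the Prolog clauses.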
37
Verb Subcategorization
Augment the DCG with a new
parameter to describe the verb
subcategorization
The grammar must state which verbs
can be followed by which other
categories. This is the subcategorization
information for the verb
Each verb has a list of complements
38
Integrate Verb Subcategorization
into the Grammar
A subcategorization list is a list of
complement categories that the verb
accepts
Augment the category VP to take a
subcategorization argument that
indicates the complements that are
needed to form a complete VP
39
Integrate Verb Subcategorization
into the Grammar
Change the rule for S to say that it
requires a verb phrase that has all its
complements, and thus a
subcategorization list of [ ]
Rule
S → NP(Subjective) VP([ ])
– The rule can be read as “A sentence can
be composed of an NP in the subjective
case, followed by a VP which has a null
subcategorization list”
40
Integrate Verb Subcategorization
into the Grammar
– Verb phrases can take adjuncts, which are
phrases that are not licensed by the individual
verb, but rather may appear in any verb phrase
– Phrases representing time and place are adjuncts,
because almost any action or event can have a
time or a place
VP(subcat) → VP(subcat) PP
           | VP(subcat) Adverb
– I smell the wumpus now
41
VP(subcat) → VP([NP | subcat]) NP(Objective)
           | VP([Adjective | subcat]) Adjective
           | VP([PP | subcat]) PP
           | Verb(subcat)
           | VP(subcat) PP
           | VP(subcat) Adverb
The first line can be read as “A VP, with a given
subcategorization list, subcat, can be formed by
a VP followed by a NP in the objective case, as
long as that VP has a subcategorization list that
starts with the symbol NP and is followed by the
elements of the list subcat ”
42
Verb      Subcats        Example verb phrase
give      [NP, PP]       give the gold in box to me
          [NP, NP]       give me the gold
smell     [NP]           smell a wumpus
          [Adjective]    smell awful
          [PP]           smell like a wumpus
is        [Adjective]    is smelly
          [PP]           is in box
          [NP]           is a pit
died      []             died
believe   [S]            believe the wumpus is dead
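The subcategorization table can be used directly as data. This is a minimal sketch under simplifying assumptions: adjuncts and nested phrases such as "the gold in box" are left out, noun phrases ignore case, and the small word lists and function names (`is_np`, `is_phrase`, `fills`, `is_vp`) are hypothetical. A verb phrase is accepted when the words after the verb can be split into exactly the complements of one of the verb's subcategorization lists.

```python
SUBCATS = {
    "give": [["NP", "PP"], ["NP", "NP"]],
    "smell": [["NP"], ["Adjective"], ["PP"]],
    "is": [["Adjective"], ["PP"], ["NP"]],
    "died": [[]],
    "believe": [["S"]],
}
PRONOUNS = {"me", "him", "her", "you", "it"}
NOUNS = {"gold", "wumpus", "pit", "box"}
ARTICLES = {"the", "a"}
ADJECTIVES = {"smelly", "awful", "dead"}
PREPOSITIONS = {"to", "in", "like"}


def is_np(w):
    """Simplified NP: a pronoun, a noun, or an article followed by a noun."""
    if len(w) == 1:
        return w[0] in PRONOUNS or w[0] in NOUNS
    return len(w) == 2 and w[0] in ARTICLES and w[1] in NOUNS


def is_phrase(category, w):
    """True if the words w form a phrase of the given complement category."""
    if category == "NP":
        return is_np(w)
    if category == "Adjective":
        return len(w) == 1 and w[0] in ADJECTIVES
    if category == "PP":
        return len(w) >= 2 and w[0] in PREPOSITIONS and is_np(w[1:])
    if category == "S":
        return any(is_np(w[:k]) and is_vp(w[k:]) for k in range(1, len(w)))
    return False


def fills(subcat, w):
    """True if w splits into consecutive phrases matching the list subcat."""
    if not subcat:
        return not w  # an empty list is satisfied only by no words at all
    return any(is_phrase(subcat[0], w[:k]) and fills(subcat[1:], w[k:])
               for k in range(1, len(w) + 1))


def is_vp(w):
    """A complete VP, i.e. VP([]): a verb with all of its complements."""
    return bool(w) and any(fills(sc, w[1:]) for sc in SUBCATS.get(w[0], []))


print(is_vp("give me the gold".split()))
print(is_vp("give me".split()))
```

Checking the whole complement list at once is a shortcut; the grammar rules on the slides consume one complement per rule application, shrinking the list step by step.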
43
VP(subcat) → VP([NP | subcat]) NP(Objective)
           | VP([Adjective | subcat]) Adjective
           | VP([PP | subcat]) PP
           | Verb(subcat)
           | VP(subcat) PP
           | VP(subcat) Adverb
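% VP(subcat) → VP([NP | subcat]) NP(Objective)
vp(S, Subcat) :-
    vp(S1, [np | Subcat]), np(S2, objective),
    append(S1, S2, S).
% Verbs with their subcategorization lists (strings are lists of words)
vp([give], [np, pp]).
vp([give], [np, np]).
vp([smell], [np]).
vp([smell], [adjective]).
vp([smell], [pp]).
But it is dangerous to translate the left-recursive rule directly
VP(subcat) → VP(subcat) PP
(the clause would call vp again with the same subcategorization list before consuming any input, so Prolog can loop forever)
Solution: let an auxiliary predicate vp1 (not shown on the slide) cover the non-recursive part
vp(S, Subcat) :- vp1(S1, Subcat), pp(S2), append(S1, S2, S).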
Generative Capacity of
Augmented Grammars
The generative capacity of augmented
grammars depends on the number of
values for the augmentations
If there is a finite number, then the
augmented grammar is equivalent to a
context-free grammar
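The equivalence can be made concrete by expanding each augmented rule into one plain rule per value. This is a minimal sketch, assuming a single augmentation variable `case` with two values; the rule encoding and the names `flat` and `expand` are hypothetical. A symbol is a pair (name, argument), where the argument is None, a constant such as "Subjective", or the variable name.

```python
CASES = ["Subjective", "Objective"]

# The augmented rules from the noun-case example, as (lhs, rhs) pairs.
AUGMENTED = [
    (("S", None), [("NP", "Subjective"), ("VP", None)]),
    (("NP", "case"), [("Pronoun", "case")]),
    (("NP", "case"), [("Noun", None)]),
    (("NP", "case"), [("Article", None), ("Noun", None)]),
]


def flat(symbol, binding):
    """Turn an augmented symbol into a plain context-free symbol name."""
    name, arg = symbol
    if arg is None:
        return name
    return f"{name}_{binding.get(arg, arg)}"


def expand(rules, variable="case", values=CASES):
    """Replace every rule that mentions the variable by one copy per value."""
    plain = []
    for lhs, rhs in rules:
        uses_variable = any(arg == variable for _, arg in [lhs] + rhs)
        for value in (values if uses_variable else [None]):
            binding = {variable: value}
            plain.append((flat(lhs, binding), [flat(s, binding) for s in rhs]))
    return plain


for lhs, rhs in expand(AUGMENTED):
    print(lhs, "->", " ".join(rhs))
```

With finitely many values the expansion always terminates and yields an ordinary context-free grammar, at the cost of multiplying the number of rules; with unboundedly many values (for example arbitrary subcategorization lists) no such finite expansion exists.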
46
Semantic Interpretation
The semantic interpretation is
responsible for getting all possible
interpretations, and disambiguation is
responsible for choosing the best one.
Disambiguation is done starting from
the pragmatic interpretation of the
sentence.
47
Pragmatic Interpretation
Complete the semantic interpretation by
adding information about the current
situation
Pragmatics shows how the language is
used and its effects on the listener
Pragmatics will tell why it is not
appropriate to answer "Yes" to the
question "Do you know what time it is?"
48
Indexicals
Indexical - a phrase that refers directly to
the current situation
Example
– I am in Bucharest today.
49
Anaphora
Anaphora - the occurrence of phrases
referring to objects that have been
mentioned previously
Example
– John was hungry. He entered a restaurant.
– The ball hit the house. It broke the window.
50
Ambiguity
Lexical Ambiguity
Syntactic Ambiguity
Referential Ambiguity
Pragmatic Ambiguity
51
Lexical Ambiguity
A word has more than one meaning
Examples
– A clear sky
– A clear profit
– The way is clear
– John is clear
– It is clear that ...
52
Syntactic Ambiguity
Can occur with or without lexical
ambiguity
Examples
– I saw the Statue of Liberty flying over New
York.
– I saw John in a restaurant with a telescope.
53
Referential Ambiguity
Occurs because natural languages
consist almost entirely of words for
categories, not for individual objects
Example
– John met Mary and Tom. They went to a
restaurant.
– Block A is on block B and it is not clear.
54
Pragmatic Ambiguity
Occurs when the speaker and the
hearer disagree on what the current
situation is
Example
– I will meet you tomorrow.
55