The Chomsky Hierarchy
Sentences
The sentence as a string of words
E.g
I saw the lady with the binoculars
string = a b c d e b f
The relations of parts of a string to
each other may be different
I saw the lady with the binoculars
is structurally ambiguous
Who has the binoculars?
[ I ] saw the lady [ with the binoculars ]
= [a] b c d [e b f]
I saw [ the lady with the binoculars]
= a b [c d e b f]
How can we represent the difference?
By assigning them different structures.
We can represent structures with 'trees'.
I
read
the
book
a. I saw the lady with the binoculars
[S [NP I] [VP [V saw] [NP [NP the lady] [PP with the binoculars]]]]
I saw [the lady with the binoculars]
b. I saw the lady with the binoculars
[S [NP I] [VP [VP saw the lady] [PP with the binoculars]]]
I [ saw the lady ] with the binoculars
birds fly
[S [NP [N birds]] [VP [V fly]]]
Syntactic rules
S → NP VP
NP → N
VP → V
Graphs and trees
[S [NP birds] [VP fly]]
birds = a, fly = b
string = ab
Graphs and trees
[S [A a] [B b]]
string = ab
S → A B
A → a
B → b
Graphs and trees
Rules
Assumption:
natural language grammars are rule-based
systems
What kind of grammars describe natural language
phenomena?
What are the formal properties of grammatical
rules?
Chomsky, N. (1957) Syntactic Structures. The Hague: Mouton
Chomsky, N. and G.A. Miller (1958) Finite state languages. Information and Control 1, 91-112
Chomsky, N. (1959) On certain formal properties of grammars. Information and Control 2, 137-167
Rules in Linguistics
1. PHONOLOGY
/s/ → [θ] / V ___ V
Rewrite /s/ as [θ] when /s/ occurs in
context V ___ V
With:
V = auxiliary node
s, θ = terminal nodes
Rules in Linguistics
2. SYNTAX
S → NP VP
VP → V
NP → N
Rewrite S as NP VP in any context
With:
S, NP, VP = auxiliary nodes
V, N = terminal nodes
PHONOLOGY (sound system)
Maltese – Word-final devoicing
Orthography (spelling)    Pronunciation (sound)
Sabet / sab               [sa-bet] / [sap]
Ħobża / ħobż              [hob-za] / [hops]
Vjaġġi / vjaġġ            [vjağ-ği] / [vjačč]
voiced [+vd]: [b, z, ğ]
voiceless [-vd]: [p, s, č]
[+vd] → [-vd] / ____ #   (for # = end of word)
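The devoicing rule can be sketched in a few lines of Python. This is only an illustration: the function name `devoice_final` and the plain-letter transcriptions are hypothetical, the voiced/voiceless pairs are just the three given on the slide, and the choice to devoice a whole word-final run of voiced consonants (rather than only the last one) is an assumption made so that the output matches [hops] and [vjačč].

```python
# Voiced -> voiceless pairs taken from the slide: [b, z, ğ] -> [p, s, č].
VOICELESS = {"b": "p", "z": "s", "ğ": "č"}

def devoice_final(word: str) -> str:
    """Apply [+vd] -> [-vd] / ____ # to a transcription.

    Assumption: every voiced consonant in the word-final run is devoiced,
    which is what the forms [hops] and [vjačč] suggest.
    """
    chars = list(word)
    i = len(chars) - 1
    # Walk backwards from the end of the word while we still see voiced consonants.
    while i >= 0 and chars[i] in VOICELESS:
        chars[i] = VOICELESS[chars[i]]
        i -= 1
    return "".join(chars)

for form in ["sab", "hobz", "vjağğ", "sabet"]:
    print(form, "->", devoice_final(form))
```

Forms that do not end in a voiced consonant, such as "sabet", pass through unchanged, because the context ____ # is not met.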
MORPHOLOGY (word formation)
Maltese – Progressive assimilation in 3fsg imperfective (present)
Marker for verb in 3rd person feminine singular imperfective: t- (3fsg impf = she)
e.g. she breaks = t-kisser, I break = n-kisser
t-kisser   3fsg-break   she breaks
t-ressaq   3fsg-move    she moves
s-sakkar   3fsg-lock    she locks
d-dur      3fsg-turn    she turns
*t-sakkar
*t-dur
t → s, d, etc. / ____ [s, d, etc.]
where t is the morpheme μ [3fsg] and the following consonant is [+cor]
(with μ = morpheme, C = consonant, cor = coronal)
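As a rough sketch of how this rule picks the prefix, the following Python function attaches the 3fsg marker to a stem. The function name `third_fem_sg` is hypothetical, and the set of assimilating consonants contains only s and d, the two attested on the slide; the "etc." is deliberately left open.

```python
# Only the triggers attested on the slide; the slide's "etc." is not filled in.
ASSIMILATING = {"s", "d"}

def third_fem_sg(stem: str) -> str:
    """Prefix the 3fsg imperfective marker t-, assimilated where the rule applies."""
    first = stem[0]
    # t -> s, d / ____ [s, d]: the prefix copies the stem-initial consonant.
    prefix = first if first in ASSIMILATING else "t"
    return f"{prefix}-{stem}"

for stem in ["kisser", "ressaq", "sakkar", "dur"]:
    print(third_fem_sg(stem))
```

The starred forms *t-sakkar and *t-dur are exactly the outputs this function never produces.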
SYNTAX (phrase/sentence formation)
SENTENCE: The boy kissed the girl
SUBJECT (The boy) + PREDICATE (kissed the girl)
NOUN PHRASE + VERB PHRASE
ART + NOUN, VERB + NOUN PHRASE
S → NP VP
VP → V NP
NP → ART N
SEMANTICS (meaning)
The lion attacks the hunter
ATTACK (a, b)
  a
  λy [ATTACK (y, b)]
    λz λy [ATTACK (y, z)]
    b
(with a = the lion, b = the hunter)
Chomsky Hierarchy
0. Type 0 (recursively enumerable) languages
Only restriction on rules: left-hand side cannot be the
empty string
(* Ø …….)
1. Context-Sensitive languages - Context-Sensitive (CS)
rules
2. Context-Free languages - Context-Free (CF) rules
3. Regular languages - Regular (right-linear or left-linear) rules
0 ⊃ 1, 1 ⊃ 2, 2 ⊃ 3
a ⊃ b meaning a properly includes b (a is a proper superset of b),
i.e. b is a proper subset of a
Generative power
0. Type 0 (recursively enumerable) languages
Only restriction on rules: left-hand side cannot
be the empty string (* Ø …….)
is the most powerful system
3. Type 3 (regular languages)
is the least powerful
Superset/subset relation
[Figure: two sets S1 and S2 over the elements a, b, c, d, f, g, illustrating the superset/subset relation]
Rule Type – 3
Name: Regular
Example: Finite State Automata (Markov-process Grammar)
Rule type:
a) right-linear
A → xB or
A → x
with:
A, B = auxiliary nodes and
x = terminal node
b) or left-linear
A → Bx or
A → x
Generates: a^m b^n with m, n ≥ 1
Cannot guarantee that there are as many a's as b's; no embedding
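Since regular languages are exactly those a regular expression can describe, the language a^m b^n can be checked with Python's `re` module. This is a minimal sketch; the helper name `in_ambn` is hypothetical.

```python
import re

# a^m b^n with m, n >= 1: one or more a's followed by one or more b's.
AMBN = re.compile(r"a+b+")

def in_ambn(s: str) -> bool:
    """True if s consists of a block of a's followed by a block of b's."""
    return AMBN.fullmatch(s) is not None

for s in ["ab", "aab", "abbb", "ba", "abab"]:
    print(s, in_ambn(s))
```

Note that "aab" and "abbb" are both accepted: the pattern fixes the order of the two blocks but, as the slide says, it has no way to require that the number of a's equals the number of b's.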
A regular grammar for natural language sentences
S → the A
A → cat B
A → mouse B
A → duck B
B → bites C
B → sees C
B → eats C
C → the D
D → boy
D → girl
D → monkey
the cat bites the boy
the mouse eats the monkey
the duck sees the girl
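To see what this grammar generates, the right-linear rules can be written as a table and expanded mechanically. The sketch below assumes the reading S → the A, A → cat/mouse/duck B, B → bites/sees/eats C, C → the D, D → boy/girl/monkey; the dictionary layout and the function name `generate` are hypothetical.

```python
# Each rule is (terminal, next auxiliary symbol); None marks a rule A -> x
# that ends the derivation.
RULES = {
    "S": [("the", "A")],
    "A": [("cat", "B"), ("mouse", "B"), ("duck", "B")],
    "B": [("bites", "C"), ("sees", "C"), ("eats", "C")],
    "C": [("the", "D")],
    "D": [("boy", None), ("girl", None), ("monkey", None)],
}

def generate(symbol="S"):
    """Yield every word sequence derivable from `symbol`."""
    for word, nxt in RULES[symbol]:
        if nxt is None:
            yield [word]
        else:
            for rest in generate(nxt):
                yield [word] + rest

sentences = [" ".join(words) for words in generate()]
print(len(sentences))
print(sentences[0])
```

Every derivation rewrites exactly one auxiliary symbol at the right edge of the string, which is what makes the grammar right-linear. The language here is finite because no rule leads back to an earlier symbol.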
Regular grammars
Grammar 1:
A→a
A→aB
B→bA
Grammar 2:
A→a
A→Ba
B→Ab
Grammar 3:
A→a
A→aB
B→b
B→bA
Grammar 4:
A→a
A→Ba
B→b
B→Ab
Grammar 5:
S → aA
S → bB
A → aS
B → bbS
S → ε
Grammar 6:
A→Aa
A→Ba
B→b
B→Ab
A→a
Grammars
Grammar 6:
S→ A B
S → bB
A→ aS
B → bbS
S → ε
Grammar 7:
A→a
A→Ba
B→b
B→bA
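One way to work through these grammars is to check the form of each rule. The sketch below is a hypothetical helper, not part of the original slides. It assumes upper-case letters are auxiliary symbols and lower-case letters are terminals, and it allows x to be a string of terminals (as in B → bbS). It looks only at rule form, so a grammar that fails the check may still happen to generate a regular language.

```python
def rule_shape(rhs: str) -> str:
    """Classify one right-hand side as 'terminal', 'right', 'left' or 'other'."""
    upper = [i for i, ch in enumerate(rhs) if ch.isupper()]
    if not upper:
        return "terminal"              # A -> x (or the empty string)
    if len(upper) == 1 and len(rhs) > 1:
        if upper[0] == len(rhs) - 1:
            return "right"             # A -> xB
        if upper[0] == 0:
            return "left"              # A -> Bx
    return "other"                     # e.g. S -> AB

def classify(rules) -> str:
    """Rules are (lhs, rhs) pairs; returns the linearity of the whole grammar."""
    shapes = {rule_shape(rhs) for _, rhs in rules}
    if "other" in shapes or {"right", "left"} <= shapes:
        return "mixed"
    return "left-linear" if "left" in shapes else "right-linear"

grammar_1 = [("A", "a"), ("A", "aB"), ("B", "bA")]
grammar_2 = [("A", "a"), ("A", "Ba"), ("B", "Ab")]
grammar_7 = [("A", "a"), ("A", "Ba"), ("B", "b"), ("B", "bA")]

for name, g in [("1", grammar_1), ("2", grammar_2), ("7", grammar_7)]:
    print("Grammar", name, classify(g))
```

Grammar 7 comes out as "mixed" because it combines a left-linear rule (A → Ba) with a right-linear one (B → bA); a regular grammar must use one direction throughout.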
Finite-State Automaton
[Figure: finite-state automaton with states NP, NP1 and NP2; arcs: NP to NP1 on article, NP1 to NP1 on adjective, NP1 to NP2 on noun]
NP → article NP1
NP1 →adjective NP1
NP1 → noun NP2
NP2
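The three rules correspond one-to-one to the transitions of a finite-state automaton, which can be simulated with a lookup table. In this sketch the input is a sequence of word categories rather than words, and treating NP2 as the only accepting state is an assumption based on its having no outgoing rule; the names `TRANSITIONS`, `ACCEPTING` and `accepts` are hypothetical.

```python
# (state, input category) -> next state, one entry per rule on the slide.
TRANSITIONS = {
    ("NP", "article"): "NP1",      # NP  -> article NP1
    ("NP1", "adjective"): "NP1",   # NP1 -> adjective NP1
    ("NP1", "noun"): "NP2",        # NP1 -> noun NP2
}
ACCEPTING = {"NP2"}  # assumption: NP2 is the final state

def accepts(categories) -> bool:
    """Run the automaton over a sequence of categories, starting in state NP."""
    state = "NP"
    for cat in categories:
        state = TRANSITIONS.get((state, cat))
        if state is None:          # no transition defined: reject
            return False
    return state in ACCEPTING

print(accepts(["article", "noun"]))
print(accepts(["article", "adjective", "adjective", "noun"]))
print(accepts(["adjective", "noun"]))
```

The loop on NP1 is what lets the automaton accept any number of adjectives while using only a fixed, finite set of states.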
A parse tree
[Figure: parse tree rooted in S (root node), with interior nodes NP, VP, N, V, DET and the words at the leaves (terminal nodes)]
Rule Type – 2
Name: Context Free
Example:
Phrase Structure Grammars/
Push-Down Automata
Rule type:
A → γ
with:
A = auxiliary node
γ = any number of terminal or auxiliary nodes
Recursiveness allowed:
A → … A …
CF Grammar
A Context Free grammar consists of:
a) a finite terminal vocabulary VT
b) a finite auxiliary vocabulary VA
c) an axiom S ∈ VA
d) a finite number of context-free rules of the form A → γ,
where A ∈ VA and γ ∈ {VA ∪ VT}*
In natural language syntax S is interpreted as the start symbol for
sentence, as in S → NP VP
CF Grammars
The following languages cannot be generated by a regular
grammar
Language 1: a^n b^n
ab
aabb
Language 2: mirror image
abaaba
abbaabba
Context-Free rules:
Language 1:
A → aAb
A → ab
Language 2:
A → aAa
A → bAb
(plus a terminating rule such as A → aa or A → bb)
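Each context-free rule of this kind can be read directly as one case of a recursive recogniser: strip a matching outer pair and recurse on what is left inside. The sketch below assumes the rules A → ab | aAb for a^n b^n and A → aAa | bAb for the mirror-image language; the terminating rules A → aa and A → bb are an assumption, and the function names are hypothetical.

```python
def is_anbn(s: str) -> bool:
    """Recognise a^n b^n (n >= 1) with the rules A -> ab | aAb."""
    if s == "ab":                          # A -> ab
        return True
    # A -> aAb: strip one a from the left and one b from the right, then recurse.
    return len(s) >= 4 and s[0] == "a" and s[-1] == "b" and is_anbn(s[1:-1])

def is_mirror(s: str) -> bool:
    """Recognise mirror-image strings over {a, b} with A -> aAa | bAb.

    Assumption: the recursion bottoms out in A -> aa or A -> bb.
    """
    if s in ("aa", "bb"):
        return True
    # A -> aAa or A -> bAb: the outer symbols must match each other.
    return len(s) >= 4 and s[0] == s[-1] and s[0] in "ab" and is_mirror(s[1:-1])

for s in ["ab", "aabb", "aab", "abab"]:
    print(s, is_anbn(s))
for s in ["abaaba", "abbaabba", "abab"]:
    print(s, is_mirror(s))
```

The recursion depth grows with the length of the string, which is the unbounded memory that a finite-state device lacks.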
Natural language
Is English regular or CF?
If centre embedding is required, then it cannot be regular
Centre Embedding:
1. [The cat] [likes tuna fish]
   a b = ab
2. The cat the dog chased likes tuna fish
   a a b b = aabb
3. The cat the dog the rat bit chased likes tuna fish
   a a a b b b = aaabbb
4. The cat the dog the rat the elephant admired bit chased likes tuna fish
   a a a a b b b b = aaaabbbb
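The pattern in these four sentences can be reproduced mechanically: n subject noun phrases in a row, followed by their n verbs in reverse order. The following is a small sketch using only the words from the examples; the function names `embed` and `pattern` are hypothetical.

```python
# Subject NPs (the a's) and their verbs (the b's), outermost clause first.
NOUNS = ["the cat", "the dog", "the rat", "the elephant"]
VERBS = ["likes tuna fish", "chased", "bit", "admired"]

def embed(depth: int) -> str:
    """Build the centre-embedded sentence with `depth` clauses."""
    if not 1 <= depth <= len(NOUNS):
        raise ValueError("depth must be between 1 and %d" % len(NOUNS))
    nouns = NOUNS[:depth]
    # The innermost clause closes first, so the verbs come out in reverse order.
    verbs = VERBS[:depth][::-1]
    return " ".join(nouns + verbs)

def pattern(depth: int) -> str:
    """The corresponding abstract string a^n b^n."""
    return "a" * depth + "b" * depth

for n in range(1, 5):
    print(pattern(n), "=", embed(n))
```

Each noun phrase has to be paired with a verb further to the right, and the pairs are nested, which is the same dependency that a^n b^n encodes.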
Centre embedding
[Figure: parse trees for sentences 1 to 3. In each tree S branches into NP and VP, and each relative clause is an S embedded inside the subject NP. With noun phrases labelled a and verbs labelled b, the terminal strings are ab, aabb and aaabbb.]
Natural language
Is English regular or CF?
If centre embedding is required, then it cannot be
regular
Centre Embedding
1. [The cat] [likes tuna fish]
   a b = ab
2. [The cat] [the dog] [chased] [likes tuna fish]
   a a b b = aabb
3. [The cat] [the dog] [the rat] [bit] [chased] [likes ...]
   a a a b b b = aaabbb
4. [The cat] [the dog] [the rat] [the elephant] [admired] [bit] [chased] [likes ....]
   a a a a b b b b = aaaabbbb
Natural language 2
More Centre Embedding:
1. If S1, then S2 (if = a, then = a)
2. Either S3, or S4 (either = b, or = b)
3. The man who said S5 is arriving today
4. The man who said S6 is arriving the day after
Sentence with embedding:
If either the man who said S5 is arriving today or the man who said S5 is arriving
tomorrow, then the man who said S6 is arriving the day after
= abba
Natural language 2
More Centre Embedding:
1. If S1, then S2 (if = a, then = a)
2. Either S3, or S4 (either = b, or = b)
Sentence with embedding:
If either the man is arriving today or the woman is arriving tomorrow, then the child is
arriving the day after.
a = [if
b = [either the man is arriving today]
b = [or the woman is arriving tomorrow]]
a = [then the child is arriving the day after]
= abba
CS languages
The following languages cannot be generated by a CF grammar (by
pumping lemma):
a^n b^m c^n d^m
Swiss German:
A string of dative nouns (e.g. aa), followed by a string of accusative nouns
(e.g. bbb), followed by a string of dative-taking verbs (cc), followed by
a string of accusative-taking verbs (ddd)
= aabbbccddd
= a^n b^m c^n d^m
Swiss German:
Jan sait das (Jan says that) …
mer   em Hans    es Huus          hälfed   aastriiche
we    Hans/DAT   the house/ACC    helped   paint
'we helped Hans paint the house'
em Hans = a, es Huus = b, hälfed = c, aastriiche = d: abcd
NPdat NPdat NPacc NPacc Vdat Vdat Vacc Vacc
a     a     b     b     c    c    d    d
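The cross-serial pattern can be checked in two steps: first the order of the four blocks, then the two count dependencies that cross each other. This is a minimal sketch, assuming n, m ≥ 1; the function name `is_cross_serial` is hypothetical.

```python
import re

# The block order a...b...c...d is itself a regular pattern.
BLOCKS = re.compile(r"(a+)(b+)(c+)(d+)")

def is_cross_serial(s: str) -> bool:
    """True if s has the form a^n b^m c^n d^m with n, m >= 1."""
    match = BLOCKS.fullmatch(s)
    if match is None:
        return False
    a, b, c, d = match.groups()
    # The two dependencies cross: a's pair with c's, b's pair with d's.
    return len(a) == len(c) and len(b) == len(d)

for s in ["abcd", "aabbbccddd", "aabbccdd", "aabbcdd", "abdc"]:
    print(s, is_cross_serial(s))
```

The counting is done outside the regular expression on purpose. The a-c and b-d dependencies overlap rather than nest, so they cannot be matched by stripping outer pairs the way a^n b^n can.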
Natural language 3
Inadequacy of phrase structure rules (CF rules)
Transformations:
Passive
NP1 – Aux – V – NP2 → NP2 – Aux + be – V – by + NP1
Transformations are Turing powerful, i.e. can do anything to
anything: inversion, deletion
Developments in syntax
a) do away with or severely constrain:
1. Phrase Structure rules
2. Transformational Rules
b) move away from:
derivational/procedural models
to:
constraint-based/declarative models
Head-Driven Phrase Structure Grammar (HPSG) and Optimality Theory (OT)
c) development of context-free rules with non-terminals structured as sets of
features and values
E.g.
N = [+N, -V]
V = [-N, +V]
sleeps = [-N, +V, -PST, AGR: [+N, -V, +3, -PLU]]
Rules in Linguistics
Traditional syntactic rules
VP → V NP
NP → DET N
PP → P NP
etc.
X-bar syntax
(' = bar/level one, '' = bar/level two)
N'' → DET N'
N' → N
V'' → ADV V'
V' → V
A'' → DEG A'
A' → A
P'' → DET P'
P' → P
X-bar rule schema
X’’ → X’
X’ → X
X’’
│
X’
│
X
X-bar Syntax
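X'' → (SPEC) X'
X'' → X'' MODIFIER
X' → X' MODIFIER
X' → X (COMPLEMENT)
[Figure: X-bar tree schema in which X'' dominates (SPEC) and X', X'' and X' may each recur with a MODIFIER, and X' dominates the head X and its (COMPLEMENT)]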
the girl often plays the violin
[S [N'' [Det the] [N' [N girl]]] [V'' [ADV often] [V' [V plays] [N'' [Det the] [N' [N violin]]]]]]
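A tree like this one can be held as a nested tuple, with the node label first and the daughters after it. The structure below is a reconstruction from the node labels on the slide, so the exact attachments are an assumption; the names `TREE` and `leaves` are hypothetical.

```python
# (label, daughter, daughter, ...); a bare string is a word at a leaf.
TREE = (
    "S",
    ("N''", ("Det", "the"), ("N'", ("N", "girl"))),
    ("V''",
        ("ADV", "often"),
        ("V'",
            ("V", "plays"),
            ("N''", ("Det", "the"), ("N'", ("N", "violin"))))),
)

def leaves(node):
    """Return the words of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    _label, *daughters = node
    words = []
    for daughter in daughters:
        words.extend(leaves(daughter))
    return words

print(" ".join(leaves(TREE)))
```

Reading off the leaves from left to right gives back the sentence, which is a quick check that the bracketing is consistent.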