Lexicon-Grammar: The Representation of Compound Words

Maurice Gross
University Paris 7
Laboratoire d'Automatique Documentaire et Linguistique [1]
2, place Jussieu
F-75221 Paris CEDEX 05

The essential feature of a lexicon-grammar is that the elementary unit of
computation and storage is the simple sentence: subject-verb-complement(s).
This type of representation is obviously needed for verbs: limiting a verb
to its shape has no meaning other than typographic, since a verb cannot be
separated from its subject and essential complement(s) [2]. We have shown
(M. Gross 1975) that given a verb, or equivalently a simple sentence, the
set of syntactic properties that describes its variations is unique: in
general, no other verb has an identical syntactic paradigm [3]. As a
consequence, the properties of each verbal construction must be represented
in a lexicon-grammar. The lexicon has no significance taken as an isolated
component, and the grammar component, viewed as independent of the lexicon,
will have to be limited to certain complex sentences.

An early representation of verbs in a lexicon-grammar of about 12,000 verbs
is given in figure 1. Each row of the matrix is an entry whose main
construction is defined by a table or class code. In figure 1, the code 6
corresponds to the class of constructions subject-verb-direct sentential
complement, noted:

    (1)  N0 V Que P

(N0 is the subject and P stands for sentence). Each column is a syntactic
property, and corresponds to a structure into which V may enter, roughly a
syntactic transform of the main structure. For example, in columns we have
placed the Passive forms, Extraposed and Pronominal forms. Thus, the related
structures are semantically close. A "+" sign at the intersection of a row
and a column indicates that the entry in the row is accepted in the
structure associated with the column; a "-" sign corresponds to
unacceptability.

Since be-Adjective forms are close to verbs, their description is quite
similar, that is, they are considered as sentences. We have applied
lexicon-grammar representation not only to the two obvious predicative parts
of speech, verb and adjective, but to nouns and adverbs as well. In the same
way as one adjoins the verb to be to adjectives, we have systematically
introduced support verbs (Vsup) for nouns and adverbs, as in the following
examples (Z.S. Harris 1976, M. Gross 1982, 1986):

    Vsup =: to be Prep       The text is in contradiction with the law
    Vsup =: to have          This text has a certain importance for Bob [4]
    Vsup =: to occur, etc.   Accidents occur at random
                             The accident (was, happened, occurred,
                             took place) late at night

Support verbs are frequent in technical texts, and may have stylistic
variants, as in this last example.

Grammatical elements such as determiners, prepositions and conjunctions do
not belong to the lexicon-grammar in the same sense as the four major parts
of speech do, since they are parts of structures or rules. For example,
prepositions appear in the columns of the lexicon-grammar.

The process of accumulation that led to the formalized lexicon-grammar of
12,000 French verbs has run into what seemed at first to be a minor problem
of representation of words: the difference between simple and compound
words. On the one hand, there are simple words such as the verb know and
complex (idiomatic) forms such as keep in mind. Both forms play the same
syntactic and semantic role in sentences such as:

    Bob knows that Max has moved to Tampa
    Bob keeps in mind that Max has moved to Tampa

but the lexical content (one word vs three) requires different
identification procedures (simple dictionary lookup vs a certain amount of
syntactic analysis).

The representation of figure 1 treats two forms such as to know (someone,
something) and to keep (someone, something) in mind in the same way, thus
emphasizing the semantic equivalence between simple and compound verbs.

But compound terms raise a problem of representation. The unit of
representation in a linear lexicon is roughly the word [5] as defined by its
written form, that is, a sequence of letters separated from neighboring
sequences by boundary blanks. As a consequence, compound words cannot be
directly put into a dictionary the way simple words are. An identification
procedure is needed for their occurrences in texts, and this procedure will
make use of the various simple parts of the compound utterance. Hence, the
formal linguistic properties of compound terms will determine both the
procedure of identification in texts and the type of storage they require.

[1] ACKNOWLEDGMENTS. This research has been partly financed by contract
"PRC Informatique Linguistique" 1985-86 from the Ministry of Research.

[2] The notion of essential complement has been refined through the
systematic study of 12,000 verbs of French (M. Gross 1975; J.-P. Boons,
A. Guillet, C. Leclère 1976a, 1976b, 1987) and a study of adverbials, that
is, of nonessential complements (M. Gross 1986). The subject and/or the
complements may be transformed and/or omitted through various syntactic
operations, in particular by nominalizing the verb (G. Gross, R. Vivès
1986), but the full information can be recovered (Z.S. Harris 1982).

[3] A line of "+" and "-" marks in Figure 1 is such a paradigm.

[4] Neither example is an isolated entry of the lexicon-grammar; both are
rather (Z.S. Harris 1964) transforms of other forms:

    This text contradicts the law
    This text is important for Bob

[Figure 1. Extract of TABLE 6, Verbs with Sentential Complements (from
M. Gross 1975). The rows are French entries, among them observer, obtenir,
officialiser, orchestrer, oublier, ouïr, omettre, palper, parapher, passer
sous silence, penser, percevoir, perdre de vue, perforer and pérorer; the
columns are syntactic properties marked "+" or "-". The individual marks
are not legible in this copy.]

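The matrix layout of figure 1 can be pictured as a table keyed by entry and by property. The Python sketch below is only an illustration of that layout: the three property names echo the columns mentioned in the text, but the "+" and "-" values are invented placeholders rather than data from table 6, and the names TABLE_6, accepts and paradigm are hypothetical.

```python
# Toy lexicon-grammar table: one row per entry, one column per property.
# The '+'/'-' marks are made-up placeholders, NOT the values of table 6.
PROPERTIES = ("passive", "extraposition", "pronominal")

TABLE_6 = {
    "oublier":   {"passive": "+", "extraposition": "-", "pronominal": "+"},
    "penser":    {"passive": "-", "extraposition": "-", "pronominal": "+"},
    "percevoir": {"passive": "+", "extraposition": "+", "pronominal": "-"},
}

def accepts(entry, prop):
    """True if the entry is marked '+' for the given syntactic property."""
    return TABLE_6[entry][prop] == "+"

def paradigm(entry):
    """The row of marks for an entry, read in column order."""
    return "".join(TABLE_6[entry][p] for p in PROPERTIES)
```

With such a table, asking whether an entry enters a given structure is a single lookup, and two entries can be compared by comparing their rows, which is how the claim that paradigms are in general unique could be checked on real data.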
We thus have to discuss the main types of compounds and to single out those
properties that bear on automatic parsing and dictionary lookup.

1. Compound adverbs

We call adverb any circumstantial complement, including sentential phrases,
as in the following examples:

    (1)  The show took place  nightly
                              at night
                              during a busy night
                              the night Bob missed his plane

By compound adverbs, or frozen or idiomatic adverbs, we mean adverbs that
can be separated into several words, with some or all of their words frozen,
that is, semantically and/or syntactically noncompositional. In (1), at
night is a compound adverb; the lack of compositionality is apparent from
lexical restrictions such as:

    *at day, *at afternoon, *at evening

and from the impossibility of inserting material that is a priori plausible,
syntactically and semantically:

    *at (coming, present) night
    *at (cold, dark) night
    etc.

The two other adverbs of (1) are free forms. Thus, the determiners and
modifiers (Adj and Relclause) of:

    during Det Adj night Relclause

can vary freely (within semantic constraints):

    during the (coming, present) night
    during a (cold, dark) night

In the same way, the event associated with the sentence S in the form:

    the night (E, that) S

can be expressed by a large variety of unconstrained forms.

Notice that nightly can also be considered as a frozen compound, though not
constituted of words but of a word and a suffix. Again, lack of
compositionality stems from the observation that daily, weekly, monthly and
yearly are compounds of the same formal type with a regular formation, in
the sense that their interpretation is homogeneous. Thus, nightly is an
isolated case, as opposed to an open series of identical forms with a
different interpretation.

Frozen or compound adverbs constitute the simplest case of compound forms
because they do not allow variations of their components. As mentioned
above, in at night no adjective is authorized. Moreover, one cannot insert a
determiner: *at (a, this) night; the plural is forbidden: *at nights; and no
relative clause can be appended: *at night (that, which) was agreed on.

Such observations are general, and apply to many adverbs of varied form and
lexical content:

    It rained cats and dogs
             *many cats and dogs
             *big cats and dogs
             *cat and dog

    from time to time
    *from times to times
    *from a time to another time
    *from long time to long time

[5] Note that words or roots are often considered as units in most attempts
to devise semantic representations.

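Because fully frozen adverbs such as at night admit no internal variation, locating them comes down to literal string search. The following Python sketch makes that concrete under simple assumptions: the three-item list FROZEN_ADVERBS and the function name find_frozen are illustrative, and the text is assumed to be plain lowercase-able English.

```python
import re

# A tiny illustrative inventory; the real inventory discussed in the text
# is for French and is far larger.
FROZEN_ADVERBS = ["at night", "from time to time", "in the long run"]

def find_frozen(text):
    """Return the frozen adverbs that occur literally in the text.

    Word boundaries keep 'at night' from matching inside 'at nights'.
    """
    lowered = text.lower()
    return [adv for adv in FROZEN_ADVERBS
            if re.search(r"\b" + re.escape(adv) + r"\b", lowered)]
```

The starred variants never match, simply because they are different strings: no lemmatization or parsing step is involved.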
Consequently, these compound adverbs could be identified by a simple
recognition procedure, for they do not require any lemmatization or
syntactic analysis to be reduced to a dictionary form, as is the case with
verb forms for example.

A lexical study of compound adverbs has been performed in French and a
systematic inventory has been compiled from various dictionaries. Running
texts have been examined as well. It is interesting to note that whereas in
current dictionaries there are about 1,500 one-word adverbs, most of them in
-ment (-ly), we have found over 5,000 compound adverbs.

These compound adverbs have been classified according to their syntactic
shape. The syntactic forms are described at the elementary level of
sequences of parts of speech. We use symbols with obvious interpretations
such as Prep, Det, Adj, N, V, Conj (for conjunction) and W for a variable
ranging over verb complements, etc. We write:

    Prep N                =: at night
    Prep Det N            =: in the end
    Prep Det Adj N        =: in the long run
    Prep Det N of Det N   =: in every sense of the word
                             at the point of a gun
    Prep Det N Conj N     =: time and again
    V W                   =: to begin with
    S                     =: all things being equal

Figure 2 shows the classes that have been defined on this basis, together
with examples and the number of items in each class.

[Figure 2 (Tableau 2). Frozen Adverbs (M. Gross 1986). The table lists
classes of French frozen adverbs by class code and structure, each with an
example and a count of items. Legible class codes include PADV, PC, PAC,
PCDN, PF, PVCO and PPCO; legible structures include Prep C, Prep Det C,
Prep Adj C, Prep C de N, Prep C Prep C, P (phrase figée), (Adj) comme C,
(V) comme C and Conj C; one legible example is "comme un cheveu sur la
soupe". The remaining cells and the counts are not legible in this copy.]

The examples discussed so far are entirely frozen. Hence, as a practical
matter, they can be located in a text by using the search function available
for strings in any text editor system. There are however more complex
examples that require deeper analysis. Consider for example the idiomatic
adverb in the sentence:

    Max proposed solutions from the top of his hat

It is largely frozen: no other determiner is allowed, no adjectives can be
appended to either noun, etc., but the person of the possessive adjective
Poss may vary. This possessive adjective must refer to the subject of the
sentence, and varies accordingly:

    *Max proposed ideas from the top of your hat
    *My sister proposed ideas from the top of his hat
    Bob and Max proposed ideas from the top of their hat(s)

In this case, the recognition procedure is no longer a simple string
matching operation, since a variable slot must be dealt with inside the
fixed string. More general matching rules are required here [6]. Once this
compound adverb has been identified in a text to be processed, it can be
given an interpretation, for example in terms of a simple adverb such as
leisurely or lightly, and the referential information carried by Poss can
then be ignored. However, one can easily construct particular discourses
where the obligatory coreference relation involved will disambiguate some
analysis. Thus, not only must the variation of Poss be accounted for at the
lexical level, but its referential information has to be kept for possible
use in a parser.

Other compound adverbs offer different degrees of variation. There are cases
where one part of the adverb is frozen and another part is entirely free:

    Max organized a party in honor of Bob
    Max hid the car at the far end of the parking lot

The parts in honor, at the far end are frozen. For example, they do not
allow modifiers. The parts of N are free, for we observe variations such as:

    Max organized a party in his honor
    Max hid the car at the far end, I think, of the parking lot

Consider the adverbials:

    for the sake of ruining things
    for the sake of Bob
    for God's sake

We call the combination for-sake frozen, since the noun sake does not occur
elsewhere than in adverbial phrases with the preposition for: it cannot be
the subject or object of any verb. On the other hand, the modifiers of sake
are quite varied and regular from the point of view of the syntax of noun
modifiers [7].

There are also cases of seemingly free adverbs which require an ad hoc
treatment. For example, dates such as:

    Monday March 13, 1968 at 9 p.m.

are described in a natural way by a finite automaton.

Technical or specialized families of adverbs come close to being frozen
adverbs:

    (2)  They elected Bob on the (first, second) ballot
    (3)  Max ate his noodles in a bowl

The special semantic relations that hold between the adverbial complement
and the rest of the sentence are limited. There are few verbs such as to eat
which combine with in a bowl and which have the non-locative interpretation
of (3). The usual interpretation is that found in:

    Max put his noodles in a bowl

[6] PROLOG rules are particularly well adapted to recognizing such frozen
forms (P. Sabatier 1980).

[7] There are nonetheless restrictions on them: *for a heavenly sake

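The variable possessive slot of from the top of Poss hat can be handled by a pattern with one open position plus an agreement check against the subject. The Python sketch below is deliberately naive and rests on stated assumptions: the subject is taken to be whatever precedes the verb form "proposed", and the small table SUBJECT_TO_POSS is invented for the example; it is not a general coreference procedure, and the names PATTERN and analyze are hypothetical.

```python
import re

# Hypothetical agreement table: subject (lowercased) -> possessives it licenses.
SUBJECT_TO_POSS = {
    "max": {"his"},
    "my sister": {"her"},
    "bob and max": {"their"},
}

# Fixed string with one open slot (poss); any object words may precede it.
PATTERN = re.compile(
    r"^(?P<subj>.+?) proposed (?:\w+ )*?from the top of (?P<poss>\w+) hats?$",
    re.IGNORECASE,
)

def analyze(sentence):
    """Return (subject, poss) if the frozen adverb is present and Poss agrees
    with the subject; return None otherwise."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if m is None:
        return None
    subj = m.group("subj").lower()
    poss = m.group("poss").lower()
    if poss not in SUBJECT_TO_POSS.get(subj, set()):
        return None
    return subj, poss
```

Returning the pair rather than a bare yes/no keeps the referential information of Poss available to a later parsing stage, in line with the remark that it should not simply be discarded.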
Entering frozen adverbs into a lexicon-grammar raises many new questions.
The bulk of adverbs can be described by means of the following type of
derivation (Z.S. Harris 1976):

    Bob left; that Bob left occurred at 9
    = Bob left, this occurred at 9
    = Bob left at 9

and support verbs play a crucial role here. However, there are cases where
no general support verb is found and where adverbs have to be considered as
a part of the elementary sentence. Consider the adverb in:

    Bob sang at the top of his voice

It is syntactically and semantically analogous to free adverbs such as
noisily, powerfully. For these two free adverbs, a derivational source
involving the adjective is available:

    The way Bob sang was (noisy, powerful)

This is not the case for at the top of his voice, which is practically
limited to modifying the verbs of saying. Moreover, the obligatory
coreference link of his leads to a representation where this adverb is not
analyzed. Thus two semantically similar types of adverbs have to be
represented quite differently in the lexicon-grammar. All the situations
just exemplified with adverbs are quite common, and are also encountered
with nouns, adjectives and verbs. The paradox of representation they lead to
can only be solved by introducing a complex level of semantic equivalence
for the entries of the lexicon-grammar.

2. Compound nouns

Compound nouns form the bulk of the lexicon of languages. Language
creativity is largely associated with the growth of technical vocabularies,
which consist mainly of technical nouns. Compound nouns number in the
millions for European languages. They are usually built from the vocabulary
of simple words by means of grammatical rules which may involve grammatical
words. By definition, their meaning is noncompositional. The compound nouns
can be described in terms of the sequence of their grammatical categories,
in the same way as for adverbs (M. Gross, D. Tremblay 1985). We have for
example:

    Det N            =: the moon
    Adj N            =: crude oil, real estate
    N of N           =: stroke of luck, board of (governors, regents)
    Det N of Det N   =: the talk of the town
    N N              =: test tube, color TV

Such nouns can become quite complex in various technical fields.

In general, compound nouns allow variations of determiners and modifiers,
but many situations are encountered:

- the moon is a frozen combination (definite article-noun) which behaves
  like a proper name, because of its unicity of reference. It cannot be
  modified by adjectives without losing its reference: *the (big, yellow)
  moon;

- crude oil takes restricted determiners. Since it is a mass noun, there are
  difficulties in accepting its plural. It can be modified by adjectives and
  nouns as in (cheap, high quality) crude oil, but these cannot modify oil:
  *crude, (cheap, high quality) oil;

- stroke of luck has unrestricted determiners and modifiers, but no
  insertion is allowed immediately before or next to of; in particular luck
  cannot be modified: *stroke of good luck [8];

- board of governors can be modified in several ways: board and governors
  take separate determiners and modifiers: the powerful boards of the twelve
  governors of my bank. Such a compound noun comes close to being a free
  form. It is the limited number of second nouns such as director, governor
  or regent that suggests we are dealing with a compound noun. Also, the
  meaning of these phrases is noncompositional in the sense that they have a
  legal or institutional meaning that their components do not clearly have.

The variations of form we have enumerated can be partly handled by attaching
a finite automaton to a given entry, and this automaton will describe the
main grammatical changes allowed. The adjunction of free relative clauses to
compound nouns may require a different treatment.

The kinds of variation of compound nouns are so numerous that determining
whether a given nominal construction is a compound noun or not almost
requires an original demonstration. Thus, automatizing the construction of a
lexicon is an activity that will present severe limitations.

Determining the support verbs for compound nouns does not seem to raise
other problems than those encountered with simple nouns.

REMARK

Compound nouns raise other questions in some languages:

- in German, where no blanks occur between components, segmentation is a
  problem;

- in French (G. Gross 1985), where the spelling of the plural is in general
  not standardized, extra variations have to be expected.

Compound modifiers

Adjectives, noun complements and relative clauses can be complex and yet
apply to free nouns. From the point of view developed here, that is, the
representation in terms of sequences of grammatical categories allowing for
efficient matching procedures with texts, they do not differ from adverbs
and nouns. Examples are:

    The table is as clean as a new pin
    The book is up to date
    Bob is the world's (best, worst) teacher
    They discussed it, on a take it or leave it basis

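The idea of attaching a finite automaton to a compound noun entry can be sketched with regular expressions, which are one way of writing such automata down. In this Python illustration, free material is allowed before the compound but not inside it, mirroring the behaviour described for crude oil and stroke of luck. Treating "any sequence of words" as the free determiners and modifiers is a simplifying assumption, and the names MODS, ENTRIES and recognize are hypothetical.

```python
import re

# Free determiners/modifiers, crudely approximated as any run of words.
MODS = r"(?:[\w-]+ )*"

# One automaton per entry: variation is allowed only where the entry permits.
ENTRIES = {
    "crude oil": re.compile(rf"^{MODS}crude oil$"),
    "stroke of luck": re.compile(rf"^{MODS}strokes? of luck$"),
}

def recognize(noun_phrase):
    """Return the compound entry matched by the noun phrase, or None."""
    phrase = noun_phrase.lower().strip()
    for entry, automaton in ENTRIES.items():
        if automaton.match(phrase):
            return entry
    return None
```

A fuller treatment would replace MODS by category-based transitions (Det, Adj, N) and would need extra machinery for the relative clauses that the text sets aside.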
3. Compound verbs

Compound verbs, or frozen sentences as we have termed them (M. Gross 1982),
can be described as sequences of categories. We write Ni for variable noun
phrases and Ci for frozen noun phrases; for subjects i = 0, for complements
i = 1, 2. Examples are:

    (1)  N0 V C1           =: Bob hit the jackpot
    (2)  N0 V N1 Prep C2   =: Bob took your project into account
    (3)  N0 V C1 Prep C2   =: Bob took the bull by the horns
    (4)  N's C0 V C1       =: Bob's dream came true

We outlined earlier the description of a lexicon-grammar of French verbs and
the reasons why compound verbs had to be separated from simple ones.
Systematic search through dictionaries (monolingual, bilingual, and
specialized) has yielded close to 20,000 compound verbs belonging to the
same level of language as the 12,000 simple verbs. A syntactic
classification has been built for them (Figure 3).

Compound verbs are the most complex forms that have to be entered into a
lexicon [9]. The compounds discussed previously were simple because by and
large they were topologically connex, that is, either their parts could not
be separated by any extraneous linguistic material or else the inserted
material could be easily described (i.e. by means of a finite automaton).

[8] Stroke of bad luck would be a different compound word, whose relation to
stroke of luck is only etymological.

[9] There are however a limited number of frozen discourses such as:

    It was for all the world as if S

which need an extra level of complexity (L. Danlos 1985).

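The notation above separates variable positions (N0, N1) from frozen ones (C1, C2). One way to picture an entry of type (2), N0 V N1 Prep C2 =: take N1 into account, is as a token pattern in which the frozen parts are literal and N1 is an open slot of any length. The Python sketch below assumes whitespace tokenization and a hand-listed set of verb forms in place of real lemmatization; VERB_FORMS, FROZEN_TAIL and the function name are hypothetical.

```python
# Inflected forms of the verb, listed by hand for this sketch.
VERB_FORMS = {"take", "takes", "took", "taken", "taking"}
# The frozen part Prep C2.
FROZEN_TAIL = ("into", "account")

def match_take_into_account(sentence):
    """Return the tokens filling the free N1 slot, or None if no match."""
    tokens = sentence.lower().rstrip(".").split()
    for i, tok in enumerate(tokens):
        if tok not in VERB_FORMS:
            continue
        # Look for the frozen tail to the right, leaving at least one
        # token for the N1 slot between the verb and the tail.
        for j in range(i + 2, len(tokens) - 1):
            if tuple(tokens[j:j + 2]) == FROZEN_TAIL:
                return tokens[i + 1:j]
    return None
```

Because the slot is unbounded, the same pattern also covers a long, clause-like N1 between the verb and into account, which is exactly what makes such entries harder to locate than connected strings.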
[Table of classes of French frozen verbs: for each class a code, a structure
(for example N0 V Prep C1, N0 V N1 Prep C2, N0 V C1 Prep C2, Que P V Prep
C1), an example sentence and a count of entries. The cells are not legible
in this copy.]

Figure 3 (Tableau 3). Frozen Verbs (M. Gross 1982)

In the case of compound verbs, the various parts of each utterance remain
syntactically independent. Thus, the verbs of (1)-(4) can take any tensed
form, as in:

    At that time, Bob will be hitting the jackpot

Sentential inserts can separate a verb from its complements:

    Bob hit, it seems to me, the jackpot

In example (2), the direct complement N1 is free and general. Hence,
sentential structures can separate the verb from its second (frozen)
complement:

    Bob took the fact that Jo was absent yesterday into account

Notice that parts of compound verbs may be recognized directly, for example
the jackpot, or into account, but these parts may be ambiguous, whereas the
full utterances can rarely be confused with free forms [10].

4. Some conclusions

How to organize the lexicon of compound utterances is an open question. From
a computational point of view, many solutions are available for the lookup
of a compound term:

(i) In classical algorithms in which left-to-right analysis is essential,
the compound term could be viewed as an extension of the first major element
met while scanning the sentence. For example, the adjective long is the
first such element of the compound adverb in the long run. Among many other
possibilities, the program, pausing on the word long, would test the
occurrence of the and in to the left of long, and the occurrence of run to
the right. Notice that the left-to-right constraint has to be somewhat
relaxed in order to test both left and right contexts of long.

(ii) In a futuristic view of parsing involving parallel computing, one might
envision several levels of lexicon. At the first level, long on the one hand
and run on the other would lead to two sets of constructions whose
intersection would contain the compound in the long run; the latter can then
be searched for in the input text. For compound verbs, one would have to
synthesize a matching utterance, rather than simply looking it up. Such a
procedure can always be simulated sequentially.

In all cases, the representation of utterances which we have used, namely
the sequences of syntactic categories, allows for the separation of the
lexicon of compound forms into classes for which direct access can be
provided. In this way, dictionary lookup can be sped up [11].

REMARK

In favor of left-to-right analysis one could point to the fact that complex
terms can often be abbreviated and that abbreviations are mostly right
truncations. In such situations the remaining part (the leftmost part) of
the truncated term must carry the information that describes the right
context in order to allow reconstruction of the reduced part. There are
however examples where abbreviations are carried out on the left part of a
term (e.g. a programming language = a language).

Preliminary figures have shown that compound terms form the essential part
of a lexicon-grammar. It is also interesting to observe that they force both
the linguist and the computer specialist to adopt a much more abstract view
of language:

- semantically, by definition, compound utterances cannot be decomposed into
  simple utterances; in other terms, meaning is not compositional for
  compounds. Hence, in a certain sense, one has to recognize that meaning
  has not much to do with words;

- syntactically, it has become a rather general habit to attach properties
  to individual words. In the case of compounds this mode of representation
  is no longer possible: why privilege one part of a compound with marks
  rather than some other part? For example, there is no reason to attach the
  Passive marking to the verb rather than to either of the complements of
  the utterance to put the cart before the horse. Lexicon-grammar
  representations eliminate such questions by delocalizing the syntactic
  information and by attaching it to the full sentence. In this sense,
  compound expressions provide a powerful motivation for representing
  lexical and syntactic phenomena in the form of a lexicon-grammar.

[10] As a matter of fact, when an utterance is found to be ambiguous, with
one analysis as a frozen form and the other as a free form, ignoring
competing free forms altogether is a good parsing strategy.

[11] The same use of sequences of syntactic categories is found in string
grammar (Z.S. Harris 1961), which has proven to be quite efficient in
syntactic recognition (N. Sager 1981, M. Salkoff 1973, 1979).

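Lookup strategy (i) from the conclusions can be sketched as an index keyed on one major element of each compound, with the expected left and right contexts stored alongside it. In this Python illustration the three compounds come from examples in the text, while the choice of key word for each, the data layout and the function name find_compounds are assumptions made for the sketch.

```python
# Each compound is filed under one key word together with the words
# expected immediately to its left and to its right.
INDEX = {
    "long": [(("in", "the"), ("run",))],
    "sense": [(("in", "every"), ("of", "the", "word"))],
    "point": [(("at", "the"), ("of", "a", "gun"))],
}

def find_compounds(tokens):
    """Scan left to right; at each key word, test both of its contexts.

    Returns a list of (start_index, compound) pairs.
    """
    found = []
    for i, tok in enumerate(tokens):
        for left, right in INDEX.get(tok, []):
            start = i - len(left)
            end = i + 1 + len(right)
            if start < 0 or end > len(tokens):
                continue
            if (tuple(tokens[start:i]) == left
                    and tuple(tokens[i + 1:end]) == right):
                found.append((start, " ".join(tokens[start:end])))
    return found
```

The scan itself is strictly left to right, but the test at each key word looks backwards as well as forwards, which is the relaxation of the left-to-right constraint noted under (i).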
REFERENCES
Boons, Jean-Paul; Guillet, Alain; Leclère, Christian. 1976a. La structure
    des phrases simples en français. I Constructions intransitives, Geneva:
    Droz, 377p.

Boons, Jean-Paul; Guillet, Alain; Leclère, Christian. 1976b. La structure
    des phrases simples en français. III Classes de constructions
    transitives, Rapport de recherches No 6, Paris: University Paris 7,
    L.A.D.L., 143p.

Boons, Jean-Paul; Guillet, Alain; Leclère, Christian. 1987. La structure
    des phrases simples en français. II Classes de constructions locatives,
    Paris: Cantilène.

Danlos, Laurence. 1985. Génération automatique de textes en langues
    naturelles, Paris: Masson, 239p.

Gross, Gaston. 1985. Le lexique électronique des mots composés du français,
    Rapport ATP CNRS, Paris: University Paris XIII.

Gross, Gaston; Vivès, Robert, eds. 1986. Syntaxe des noms, Langue française
    69, Paris: Larousse, 128p.

Gross, Maurice. 1975. Méthodes en syntaxe, Paris: Hermann, 414p.

Gross, Maurice. 1981. Les bases empiriques de la notion de prédicat
    sémantique, Langages 63, Paris: Larousse, pp.7-52.

Gross, Maurice. 1982. Une classification des phrases figées du français,
    Revue québécoise de linguistique, Vol. 11, No 2, Montréal: Presses de
    l'Université du Québec à Montréal, pp.151-185.

Gross, Maurice. 1986. Grammaire transformationnelle du français. III
    Syntaxe de l'adverbe, Paris: Cantilène.

Gross, Maurice; Tremblay, Diane. 1985. Etude du contenu d'une banque
    terminologique, Rapport de recherche du LADL, Paris: MIDIST.

Harris, Zellig S. 1961. String Analysis of Sentence Structure, Papers on
    Formal Linguistics, The Hague: Mouton.

Harris, Zellig S. 1964. The Elementary Transformations, Transformations and
    Discourse Analysis Papers 54, in Harris, Zellig S. 1970, Papers in
    Structural and Transformational Linguistics, Dordrecht: Reidel,
    pp.482-532.

Harris, Zellig S. 1976. Notes du cours de syntaxe, Paris: Le Seuil, 237p.

Harris, Zellig S. 1982. A Grammar of English on Mathematical Principles,
    New York: Wiley Interscience, 429p.

Sabatier, Paul. 1980. Dialogue en français avec un ordinateur, Doctoral
    thesis, Marseille: Groupe d'intelligence artificielle.

Sager, Naomi. 1981. Natural Language Information Processing. A Computer
    Grammar of English and Its Applications, Reading: Addison-Wesley,
    xv-399p.

Salkoff, Morris. 1973. Une grammaire en chaîne du français. Analyse
    distributionnelle, Paris: Dunod, xiv-199p.

Salkoff, Morris. 1979. Analyse syntaxique du français. Grammaire en chaîne,
    Amsterdam: John Benjamins B.V., 334p.