Center for Computing Research
National Polytechnic Institute
Mexico
Acquiring Selectional Preferences from Untagged Text
for Prepositional Phrase Attachment Disambiguation
Hiram Calvo and Alexander Gelbukh
Presented by Igor A. Bolshakov
1/17
Introduction
• Entities must be identified adequately for database
representation:
– See the cat with a telescope
– See [the cat] [with a telescope] → 2 entities
– See [the cat with a telescope] → 1 entity
• Problem is known as Prepositional Phrase (PP)
attachment disambiguation.
2/17
Existing methods - 1
• Accuracy when using treebank statistics:
– Ratnaparkhi et al., Brill and Resnik: up to 84%
– Kudo and Matsumoto: 95.8%
• Needed weeks for training
– Lüdtke and Sato: 94.9%
• Only 3 hours for training
• But there are no treebanks for many
languages!
3/17
Existing methods - 2
• Based on Untagged text:
– Calvo and Gelbukh, 2003: 82.3% accuracy
– Uses the web as corpus:
• Slow (up to 18 queries for each PP attachment
ambiguity)
• Does this method work with very big local
corpora?
4/17
Using a big local corpus
• Corpus
– 3 years of publication of 4 newspapers
– 161 million words
– 61 million sentences
• Results:
– Recall: 36% Precision: 67%
– Disappointing!
5/17
What do we want?
• To solve PP attachment disambiguation with
– Local corpora, not the web
– No treebanks
– No supervision
– High precision and recall
• Solution proposed:
– Selectional Preferences
6/17
Selectional Preferences
• The problem of
I see a cat with a telescope
turns into
I see {animal} with {instrument}
7/17
Sources for noun semantic
classification
• Machine-Readable dictionaries
• WordNet ontology
– We use the top 25 unique beginner concepts of
WordNet
• Examples: mouse is-a {animal}, ranch is-a {place},
root is-a {part}, reality is-a {attribute},
race is-a {grouping}, etc. (a class-lookup sketch follows)
8/17
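As an illustration, here is a minimal sketch of such a class lookup, assuming NLTK's WordNet interface; its lexicographer file names (noun.animal, noun.artifact, ...) only approximate the paper's 25 unique-beginner concepts.

    # Minimal sketch: map a noun to coarse WordNet classes.
    # Assumption: NLTK's lexicographer names stand in for the paper's
    # 25 unique-beginner concepts; the exact inventory may differ.
    from nltk.corpus import wordnet as wn

    def semantic_classes(noun):
        """Return every coarse class the noun can belong to (no WSD)."""
        return {s.lexname() for s in wn.synsets(noun, pos=wn.NOUN)}

    print(semantic_classes("mouse"))  # contains 'noun.animal' (among others)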
Extracting Selectional
Preferences
• Text is shallow-parsed
• Subordinate sentences are separated
• Patterns are searched
1. Verb NEAR Preposition NEXT_TO Noun
2. Verb NEAR Noun
3. Noun NEAR Verb
4. Noun NEXT_TO Preposition NEXT_TO Noun
• All nouns are classified (see the extraction sketch below)
9/17
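A minimal sketch of this extraction step, assuming the shallow parser has already reduced a clause to (token, tag) pairs; the NEAR window size and tag names are assumptions rather than the paper's settings, so the sketch may over-generate pairs.

    # Minimal sketch of the four extraction patterns over one clause.
    # Assumptions: the shallow parser yields (token, tag) pairs with
    # tags VERB, PREP, NOUN; NEAR is a 5-token window (assumed value).
    NEAR = 5

    def extract_patterns(tagged):
        """Yield (verb, noun), (verb, prep, noun) and (noun, prep, noun)."""
        for i, (tok, tag) in enumerate(tagged):
            for j in range(i + 1, min(i + 1 + NEAR, len(tagged))):
                tok2, tag2 = tagged[j]
                nxt = tagged[j + 1] if j + 1 < len(tagged) else (None, None)
                if tag == "VERB" and tag2 == "PREP" and nxt[1] == "NOUN":
                    yield (tok, tok2, nxt[0])      # 1. Verb NEAR Prep NEXT_TO Noun
                elif tag == "VERB" and tag2 == "NOUN":
                    yield (tok, tok2)              # 2. Verb NEAR Noun
                elif tag == "NOUN" and tag2 == "VERB":
                    yield (tok2, tok)              # 3. Noun NEAR Verb
                elif (tag == "NOUN" and j == i + 1
                      and tag2 == "PREP" and nxt[1] == "NOUN"):
                    yield (tok, tok2, nxt[0])      # 4. Noun NEXT_TO Prep NEXT_TO Noun

    clause = [("see", "VERB"), ("cat", "NOUN"),
              ("with", "PREP"), ("telescope", "NOUN")]
    print(list(extract_patterns(clause)))

Each extracted noun is then replaced by its semantic classes from the previous step.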
Example
• Consider this toy-corpus:
– I see a cat with a telescope
– I see a ship in the sea with a spyglass
The following patterns are extracted:
– see,cat → see,{animal}
– see,with,telescope → see,with,{instrument}
– cat,with,telescope → {animal},with,{instrument}
– see,ship → see,{thing}
– see,in,sea → see,in,{place}
– see,with,spyglass → see,with,{instrument}
– ship,in,sea → {thing},in,{place}
10/17
Example
• See, with, {instrument} has two occurrences
• {Animal}, with, {instrument} has one
occurrence
• Thus,
– See with {instrument} is more probable than
{animal} with {instrument}, so the PP attaches to
the verb (see the counting sketch below)
11/17
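A minimal sketch of this vote on the toy corpus; the tuples are taken directly from the previous slides, while the tie-breaking rule is an assumption.

    # Minimal sketch: count classified patterns from the toy corpus and
    # attach the PP to whichever head has the more frequent pattern.
    # The >= tie-break toward the verb is an assumption.
    from collections import Counter

    counts = Counter([
        ("see", "with", "{instrument}"),        # sentence 1
        ("{animal}", "with", "{instrument}"),   # sentence 1
        ("see", "with", "{instrument}"),        # sentence 2
    ])

    verb_score = counts[("see", "with", "{instrument}")]       # 2
    noun_score = counts[("{animal}", "with", "{instrument}")]  # 1
    print("attach to verb" if verb_score >= noun_score else "attach to noun")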
Experiment
• Now, with a real corpus, we apply the
following formula:
freq(X, P, C2) = occ(X, P, C2) / (occ(X) · occ(C2))
• X can be a specific verb or a noun’s semantic
class (see or {animal})
• P is a preposition (with)
C2 is the class of the second noun, e.g.
{instrument} (a sketch of this computation follows)
12/17
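A minimal sketch of the formula; the occurrence counts below are illustrative placeholders, not figures reported in the paper.

    # Minimal sketch of the slide's formula with illustrative counts.
    def freq(occ_xpc2, occ_x, occ_c2):
        """freq(X, P, C2) = occ(X, P, C2) / (occ(X) * occ(C2))."""
        return occ_xpc2 / (occ_x * occ_c2)

    # X = "see", P = "with", C2 = "{instrument}":
    print(freq(occ_xpc2=2, occ_x=2, occ_c2=2))  # 0.5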
Experiment
• From the 161-million-word corpus of Mexican
newspapers in Spanish, the system
obtained:
• 893,278 selectional preferences for 5,387
verbs, and
• 55,469 noun patterns (like {animal} with
{instrument})
13/17
Evaluation
• We tested the obtained Selectional
Preferences doing PP attachment
disambiguation on 546 sentences from the
LEXESP corpus (in Spanish).
• Then we compared manually with the
correct PP attachments.
• Results: precision 78.2%, recall: 76.0%
14/17
Conclusions
• Results not as good as those obtained by
other methods (up to 95%)
• But we don’t need any costly resources,
such as:
– Treebanks
– Manually annotated corpora
– Web as corpus
15/17
Future Work
• To use not only 25 fixed semantic classes
(top concepts) but the whole hierarchy
• To use a WSD module
– Currently, if a word belongs to more than one
class, all classes are taken into account
16/17
Thank you!
[email protected]
[email protected]
17/17