Lexical Relations and WordNet
Ray Larson & Warren Sack
University of California, Berkeley
School of Information Management and Systems
SIMS 202: Information Organization and Retrieval
Lecture author: Warren Sack
1 Nov 2001
IS202: Information Organization and Retrieval
Last Time
• What is Cognitive Science?
• What is Artificial Intelligence?
– Knowledge Representation
• Languages and Programming Paradigms
– Representing Common Sense
• Common Sense Interfaces
• Story Understanding, Story Generation, and
Common Sense
Cognitive Science
• 10/30/01 – AI, knowledge
representation and common sense
• 11/01/01 – Computational Linguistics,
Cognitive Psychology and Lexical
Knowledge
• 11/06/01 – AI and information extraction
• 11/08/01 – Linguistics, Philosophy,
Psychology, categories, and cognition
Today
• Lexical relations
– Linguistics
• Two approaches to semantics:
– Compositional
– Relational
– Psycholinguistics
• WordNet
– Description
– Structure
– Applications
Levels of Linguistic Analysis
• Sentences
– Phonological/Morphological analysis
– Syntactic analysis
– Semantic analysis
• More than one sentence
– Pragmatic analysis
Phonology/Morphology
• Phonology: The study of the systems of
sounds which are manifested in natural
languages; the significant contrasts between
sounds that are relevant to meaning.
– E.g., consonants, vowels, stress, intonation, etc.
• Morphology: the forms of words
– E.g., word=watched; morphs=watch+ed;
morphemes=watch+past
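The watched example above can be sketched as a naive suffix analysis. This is only an illustration: the suffix table and the function name `analyze` are assumptions made here, and real morphological analyzers handle irregular forms and spelling changes that this ignores.

```python
# Hypothetical suffix table mapping a surface morph to an abstract morpheme.
SUFFIX_MORPHEMES = {"ed": "past", "ing": "progressive"}

def analyze(word):
    """Split a word into morphs (surface pieces) and morphemes (abstract units)."""
    for suffix, morpheme in SUFFIX_MORPHEMES.items():
        # Require a stem of at least three letters so "red" is not split.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            return {"morphs": [stem, suffix], "morphemes": [stem, morpheme]}
    return {"morphs": [word], "morphemes": [word]}

print(analyze("watched"))
# {'morphs': ['watch', 'ed'], 'morphemes': ['watch', 'past']}
```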
Syntax
The syntax of a language is to
be understood as a set of
rules which accounts for the
distribution of word forms
throughout the sentences of
a language. These rules
codify permissible
combinations of classes of
word forms.
Semantics
• Semantics is the study of linguistic
meaning.
• Two standard approaches to lexical
semantics (cf. sentential semantics
and logical semantics):
– (1) compositional
– (2) relational
• Other approaches…
Pragmatics
• Deixis
– E.g., “I’ll be back in an hour” depends upon the time of
the utterance.
• Conversational implicature
– A: “Can you tell me the time?”
– B: “Well, the milkman has come.” [I don’t know exactly,
but perhaps you can deduce it from some extra
information I give you.]
• Presupposition
– “Are you still such a bad driver?”
• Speech acts
– Constatives vs. performatives
– e.g., “I second the motion.”
• Conversational Structure
– E.g., turn-taking rules
Lexical Semantics:
Compositional Approach
Compositional lexical semantics, introduced by Katz & Fodor (1963),
analyzes the meaning of a word in much the same way a sentence is
analyzed into semantic components. The semantic components of a
word are not themselves considered to be words, but are abstract
elements (semantic atoms) postulated in order to describe word
meanings (semantic molecules) and to explain the semantic relations
between words. For example, the representation of bachelor might be
ANIMATE and HUMAN and MALE and ADULT and NEVER MARRIED.
The representation of man might be ANIMATE and HUMAN and MALE
and ADULT; because all the semantic components of man are included
in the semantic components of bachelor, it can be inferred that bachelor
→ man. In addition, there are implicational rules between semantic
components, e.g. HUMAN → ANIMATE, which also look very much like
meaning postulates.
George Miller, “On Knowing a Word,” 1999
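The inclusion test described in the passage can be sketched with sets. This is a minimal illustration, not Katz & Fodor's formalism: the dictionary `COMPONENTS` and the function `entails` are names invented here, and the component lists are the ones given on the slide.

```python
# Semantic components ("atoms") for each word, taken from the slide's example.
COMPONENTS = {
    "bachelor": {"ANIMATE", "HUMAN", "MALE", "ADULT", "NEVER_MARRIED"},
    "man": {"ANIMATE", "HUMAN", "MALE", "ADULT"},
}

def entails(word_a, word_b, lexicon=COMPONENTS):
    """True if every component of word_b is also a component of word_a."""
    return lexicon[word_b] <= lexicon[word_a]

print(entails("bachelor", "man"))  # True
print(entails("man", "bachelor"))  # False
```

The subset operator does all the work: bachelor → man holds because the components of man are contained in those of bachelor, and the converse fails.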
Lexical Semantics:
Relational Approach
Relational lexical semantics was first introduced by Carnap (1956) in
the form of meaning postulates, where each postulate stated a
semantic relation between words. A meaning postulate might look
something like dog → animal (if x is a dog then x is an animal) or,
adding logical constants, bachelor → man and never married [if x is a
bachelor then x is a man and not(x has married)] or tall → not short [if x
is tall then not(x is short)]. The meaning of a word was given, roughly,
by the set of all meaning postulates in which it occurs.
George Miller, “On Knowing a Word,” 1999
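Meaning postulates of this kind can be sketched as a table of implications that is chained to collect what follows from a word. The string encoding (a leading "not " for negation) and the names `POSTULATES` and `consequences` are assumptions made for this illustration, not Carnap's notation.

```python
# Each postulate maps a predicate to the predicates it implies.
# A leading "not " marks a negated predicate (hypothetical encoding).
POSTULATES = {
    "dog": ["animal"],
    "bachelor": ["man", "not has_married"],
    "tall": ["not short"],
}

def consequences(predicate, postulates=POSTULATES):
    """Collect everything that follows from `predicate` by chaining postulates."""
    derived, stack = set(), [predicate]
    while stack:
        current = stack.pop()
        for implied in postulates.get(current, []):
            if implied not in derived:
                derived.add(implied)
                stack.append(implied)
    return derived

print(sorted(consequences("bachelor")))  # ['man', 'not has_married']
print(sorted(consequences("tall")))      # ['not short']
```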
Psycholinguistics
• The introduction of Noam Chomsky’s theory
of syntax to psychologists:
• Miller, G.A., Galanter, E., Pribram, K.H.
(1960) Plans and the Structure of Behavior.
• Some areas of psycholinguistics:
– Children’s acquisition of language
– First and second language learning
– Artificial intelligence? (see Lyons, 1981)
WordNet
• Started in 1985 by George Miller, students, and
colleagues at the Cognitive Science Laboratory,
Princeton University
• Can be downloaded for free:
www.cogsci.princeton.edu/~wn/
• In terms of coverage, WordNet’s goals differ
little from those of a good standard college-level
dictionary, and the semantics of WordNet is
based on the notion of word sense that
lexicographers have traditionally used in writing
dictionaries. It is in the organization of that
information that WordNet aspires to innovation.
(Miller, 1998, chapter 1)
Presuppositions of
WordNet project
• Separability hypothesis:
The lexical component of language can be
separated and studied in its own right.
• Patterning hypothesis:
People have knowledge of the systematic
patterns and relations between word
meanings.
• Comprehensiveness hypothesis:
Computational linguistics programs need a
store of lexical knowledge that is as extensive
as that which people have.
WordNet structure
• Synsets versus Words
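The distinction between synsets and words can be sketched as a two-way index: a synset groups the words that share one sense, and a word points to every synset it belongs to. The synset IDs, glosses, and the helper `build_word_index` below are made up for illustration and are not WordNet's actual file format.

```python
from collections import defaultdict

# A few toy synsets; IDs and glosses are invented for this sketch.
SYNSETS = {
    "bank.n.1": {"words": ["bank", "depository"], "gloss": "a financial institution"},
    "bank.n.2": {"words": ["bank", "riverbank"], "gloss": "sloping land beside water"},
    "car.n.1": {"words": ["car", "automobile"], "gloss": "a motor vehicle"},
}

def build_word_index(synsets):
    """Invert the synset table: map each word to the synsets that contain it."""
    index = defaultdict(list)
    for synset_id, entry in synsets.items():
        for word in entry["words"]:
            index[word].append(synset_id)
    return dict(index)

WORD_INDEX = build_word_index(SYNSETS)
print(WORD_INDEX["bank"])           # ['bank.n.1', 'bank.n.2']
print(SYNSETS["car.n.1"]["words"])  # ['car', 'automobile']
```

A polysemous word such as bank appears in several synsets, while synonyms such as car and automobile share one.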
WordNet: Size
POS          Unique Strings    Synsets
Noun                 107930      74488
Verb                  10806      12754
Adjective             21365      18523
Adverb                 4583       3612
Totals               144684     109377
Structure of WordNet
[Three diagram slides; figures not preserved in this transcript]
Unique Beginners
• { entity, something, (anything having existence (living or nonliving)) }
• { psychological_feature, (a feature of the mental life of a living
organism) }
• { abstraction, (a general concept formed by extracting common
features from specific examples) }
• { state, (the way something is with respect to its main attributes; "the
current state of knowledge"; "his state of health"; "in a weak financial
state") }
• { event, (something that happens at a given place and time) }
• { act, human_action, human_activity, (something that people do or
cause to happen) }
• { group, grouping, (any number of entities (members) considered as
a unit) }
• { possession, (anything owned or possessed) }
• { phenomenon, (any state or process known through the senses
rather than by intuition or reasoning) }
Roget’s “Unique Beginners”
The ontology of Roget’s is headed by six Classes. The first three Classes cover the
external world: Abstract Relations deals with such ideas as number, order and time;
Space is concerned with movement, shapes and sizes, while Matter covers the
physical world and humankind’s perception of it by means of five senses. The
remaining Classes deal with the internal world of human beings: the mind (Intellect),
the will (Volition), the heart and soul (Emotion, Religion and Morality). There is a
logical progression from abstract concepts, through the material universe, to
mankind itself, culminating in what Roget saw as mankind’s highest achievements:
morality and religion (Kirkpatrick, 1998). Class Four, Intellect, is divided into
Formation of ideas and Communication of ideas, and Class Five, Volition, into
Individual volition and Social volition. In practice, therefore, the Thesaurus is
headed by eight Classes. A path in Roget’s ontology always begins with one of the
Classes. It branches to one of the 39 Sections and then to one of the 990 Heads.
Each Head is divided into paragraphs grouped by parts of speech: nouns,
adjectives, verbs and adverbs.
From Mario Jarmasz, Stan Szpakowicz, “Roget’s Thesaurus as an Electronic
Lexical Knowledge Base,” 2000.
WordNet Browsers
• http://www.cogsci.princeton.edu/cgi-bin/webwn
• http://bogart.sip.ucm.es/~jorge/browser.htm
• http://www.visualthesaurus.com/
Other WordNets
http://www.hum.uva.nl/~ewn/gwa/wordnet_table.htm
• Dutch
• Spanish
• Italian
• German
• French
• Czech
• Estonian
Forthcoming WordNets
http://www.hum.uva.nl/~ewn/gwa/wordnet_table.htm
• Bengali
• Bulgarian
• Danish
• Greek
• Hebrew
• Hindi
• Kannada
• Latvian
• Moldavian
• Romanian
• Russian
• Slovenian
• Swedish
• Tamil
• Thai
• Turkish
• Yugoslavian
• Norwegian
• Icelandic
Psycholinguistic evidence for
WordNet’s structure
• Bever and Rosenbaum, 1970:
– A pistol is more dangerous than a rifle.
– * A pistol is more dangerous than a gun.
– * A gun is more dangerous than a pistol.
• Resnik, 1993
– The direct object of the verb drink can be any
hyponym of the noun beverage.
• Collins and Quillian, 1969
– The time required to verify the statement “A robin
is a bird” is shorter than the time required to verify
the statement “A robin is an animal.”
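The Resnik and Collins and Quillian observations both rest on walking hypernym links. The sketch below uses a tiny hand-made hierarchy (an assumption for illustration, not WordNet data) to test hyponymy and to count the links between a word and an ancestor; the function names are hypothetical.

```python
# Toy hypernym table: each word points to its immediate superordinate.
HYPERNYM = {
    "robin": "bird",
    "chicken": "bird",
    "bird": "animal",
    "beer": "beverage",
    "coffee": "beverage",
    "pistol": "gun",
    "rifle": "gun",
}

def hops_up(word, ancestor, hypernym=HYPERNYM):
    """Number of hypernym links from word up to ancestor, or None if unreachable."""
    hops, current = 0, word
    while current != ancestor:
        if current not in hypernym:
            return None
        current = hypernym[current]
        hops += 1
    return hops

def is_hyponym(word, ancestor):
    """True if ancestor lies strictly above word in the hierarchy."""
    hops = hops_up(word, ancestor)
    return hops is not None and hops > 0

print(hops_up("robin", "bird"))        # 1
print(hops_up("robin", "animal"))      # 2
print(is_hyponym("beer", "beverage"))  # True
print(is_hyponym("pistol", "beverage"))  # False
```

On this picture, "A robin is a bird" needs one link and "A robin is an animal" needs two, which matches the direction of the verification-time difference. Note that chicken and robin are both one link from bird here, so link counting alone cannot explain typicality effects.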
Psycholinguistic evidence
against WordNet’s structure
• Smith and Medin, 1981
– The time required to verify that a chicken is a bird
is significantly longer than the time required to
verify that a robin is a bird, even though chicken and
robin stand in the same taxonomic relation to bird.
• Rosch, 1973
– Ratings of “typicality” have little to do with
frequency or familiarity.
• Lakoff, 1987
– Concepts are represented, not by a list of
distinguishing features, but by the focal instances
(or prototypes) that are the best examples of the
prototype.
WordNet Applications
• Using WordNet as a data structure. Many
languages used by computational linguists and
natural language processing researchers now
have WordNet packages. E.g., for Perl
– Lingua::Wordnet, and
– Lingua::Wordnet::Analysis
by Dan Brian, http://search.cpan.org/search?dist=LinguaWordnet
WordNet Applications
• Information Retrieval: Voorhees, 1998
– Query expansion via synsets
– “sense-based” rather than “stem-based”
vectors
– Unfortunately, in both cases, the inability to
automatically resolve word senses
prevented any improvement from being
made.
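Query expansion via synsets, and the reason unresolved word senses hurt it, can be sketched as follows. The synsets and the function `expand_query` are toy assumptions for illustration and do not reproduce Voorhees's experimental setup.

```python
# Toy synsets: each set holds words that share one sense.
SYNSETS = [
    {"car", "automobile", "auto"},
    {"bank", "depository"},
    {"bank", "riverbank"},
]

def expand_query(terms, synsets=SYNSETS):
    """Add every synonym from every synset that contains a query term."""
    expanded = set(terms)
    for term in terms:
        for synset in synsets:
            if term in synset:
                expanded |= synset
    return sorted(expanded)

print(expand_query(["car", "loan"]))
# ['auto', 'automobile', 'car', 'loan']
print(expand_query(["bank", "loan"]))
# ['bank', 'depository', 'loan', 'riverbank']
```

Without sense disambiguation, the query about bank loans also picks up riverbank, which is the kind of noise that can cancel the benefit of expansion.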
WordNet Applications
• Textual Cohesion and the correction of
Malapropisms: Hirst and St-Onge, 1998
Malapropism = the confounding of an
intended word with another word of
similar sound or similar spelling that has
a quite different meaning; e.g., “Super
bowl → Superb owl”
WordNet Applications
• Temporal Indexing through lexical
chaining: Al-Halimi and Kazman, 1998
• Indexing transcripts of conference
meetings by topic.
WordNet Applications
• Conversation themes in Usenet:
Sack, 2000
Next Time
• Information Extraction, Artificial
Intelligence, and “Story Understanding”
Revisited