Lecture 5: Lexical Relations & WordNet
SIMS 202: Information Organization and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2003
http://www.sims.berkeley.edu/academics/courses/is202/f03/
IS 202 – FALL 2003
2003.09.09 - SLIDE 1
Lecture Overview
• Review
• Lexical Relations
• WordNet
• Demo
• Discussion Questions
• Action Items for Next Time
Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack
Definition of AI
“... artificial intelligence [AI] is the science of
making machines do things that would
require intelligence if done by [humans]”
(Minsky, 1963)
The Goals of AI Are Not New
• Ancient Greece
– Daedalus’ automata
• Judaism’s myth of the Golem
• 18th century automata
– Singing, dancing, playing chess?
• Mechanical metaphors for mind
– Clock
– Telegraph/telephone network
– Computer
Some Areas of AI
• Knowledge representation
• Programming languages
• Natural language understanding
• Speech understanding
• Vision
• Robotics
• Planning
• Machine learning
• Expert systems
• Qualitative simulation
AI or IA?
• Artificial Intelligence (AI)
– Make machines as smart as (or smarter than)
people
• Intelligence Amplification (IA)
– Use machines to make people smarter
Furnas: The Vocabulary Problem
• People use different words to describe the
same things
– “If one person assigns the name of an item,
other untutored people will fail to access it on
80 to 90 percent of their attempts.”
– “Simply stated, the data tell us there is no one
good access term for most objects.”
The Vocabulary Problem
• How is it that we come to understand each
other?
– Shared context
– Dialogue
• How can machines come to understand
what we say?
– Shared context?
– Dialogue?
Vocabulary Problem Solutions?
• Furnas et al.
– Make the user memorize precise system
meanings
– Have the user and system interact to identify
the precise referent
– Provide infinite aliases to objects
• Minsky and Lenat
– Give the system “commonsense” so it can
understand what the user’s words can mean
CYC
• Decades-long effort to build a
commonsense knowledge base
• Storied past
• 100,000 basic concepts
• 1,000,000 assertions about the world
• The validity of Cyc’s assertions is
context-dependent (default reasoning)
Cyc Examples
• Cyc can find the match between a user's query for
"pictures of strong, adventurous people" and an
image whose caption reads simply "a man climbing
a cliff"
• Cyc can notice if an annual salary and an hourly
salary are inadvertently being added together in a
spreadsheet
• Cyc can combine information from multiple
databases to guess which physicians in practice
together had been classmates in medical school
• When someone searches for "Bolivia" on the Web,
Cyc knows not to offer a follow-up question like
"Where can I get free Bolivia online?"
Cyc Applications
• Applications currently available or in development
– Integration of Heterogeneous Databases
– Knowledge-Enhanced Retrieval of Captioned Information
– Guided Integration of Structured Terminology (GIST)
– Distributed AI
– WWW Information Retrieval
• Potential applications
– Online brokering of goods and services
– "Smart" interfaces
– Intelligent character simulation for games
– Enhanced virtual reality
– Improved machine translation
– Improved speech recognition
– Sophisticated user modeling
– Semantic data mining
Cyc’s Top-Level Ontology
• Fundamentals
• Top Level
• Time and Dates
• Types of Predicates
• Spatial Relations
• Quantities
• Mathematics
• Contexts
• Groups
• "Doing"
• Transformations
• Changes Of State
• Transfer Of Possession
• Movement
• Parts of Objects
• Composition of Substances
• Agents
• Organizations
• Actors
• Roles
• Professions
• Emotion
• Propositional Attitudes
• Social
• Biology
• Chemistry
• Physiology
• General Medicine
• Materials
• Waves
• Devices
• Construction
• Financial
• Food
• Clothing
• Weather
• Geography
• Transportation
• Information
• Perception
• Agreements
• Linguistic Terms
• Documentation
http://www.cyc.com/cyc-2-1/toc.html
Syntax
• The syntax of a language
is to be understood as a
set of rules which accounts
for the distribution of word
forms throughout the
sentences of a language
• These rules codify
permissible combinations
of classes of word forms
Semantics
• Semantics is the study of linguistic
meaning
• Two standard approaches to lexical
semantics (cf. sentential semantics and
logical semantics):
– (1) compositional
– (2) relational
Lexical Semantics: Compositional Approach
• Compositional lexical semantics, introduced by Katz & Fodor (1963),
analyzes the meaning of a word in much the same way a sentence
is analyzed into semantic components. The semantic components of
a word are not themselves considered to be words, but are abstract
elements (semantic atoms) postulated in order to describe word
meanings (semantic molecules) and to explain the semantic
relations between words. For example, the representation of
bachelor might be ANIMATE and HUMAN and MALE and ADULT
and NEVER MARRIED. The representation of man might be
ANIMATE and HUMAN and MALE and ADULT; because all the
semantic components of man are included in the semantic
components of bachelor, it can be inferred that bachelor → man. In
addition, there are implicational rules between semantic
components, e.g. HUMAN → ANIMATE, which also look very much
like meaning postulates.
– George Miller, “On Knowing a Word,” 1999
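The inference described in this quotation can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: each word is represented as a plain set of component labels (the labels are the ones from the quotation; the `COMPONENTS` table and the `entails` function are hypothetical names), and "bachelor → man" is read as "every component of man is also a component of bachelor".

```python
# Semantic components ("atoms") for each word, taken from the quoted example.
# Representing a word as a flat set of labels is a simplifying assumption.
COMPONENTS = {
    "man": {"ANIMATE", "HUMAN", "MALE", "ADULT"},
    "bachelor": {"ANIMATE", "HUMAN", "MALE", "ADULT", "NEVER MARRIED"},
}


def entails(word_a, word_b, components=COMPONENTS):
    """Return True if word_a -> word_b, i.e. every semantic component
    of word_b is also a semantic component of word_a."""
    return components[word_b] <= components[word_a]


print(entails("bachelor", "man"))  # True
print(entails("man", "bachelor"))  # False
```

The subset test runs in one direction only: being a bachelor implies being a man, but not the reverse, because NEVER MARRIED is missing from the components of man.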
Lexical Semantics: Relational Approach
• Relational lexical semantics was first introduced
by Carnap (1956) in the form of meaning
postulates, where each postulate stated a
semantic relation between words. A meaning
postulate might look something like dog →
animal (if x is a dog then x is an animal) or,
adding logical constants, bachelor → man and
never married [if x is a bachelor then x is a man
and not(x has married)] or tall → not short [if x is
tall then not(x is short)]. The meaning of a word
was given, roughly, by the set of all meaning
postulates in which it occurs.
– George Miller, “On Knowing a Word,” 1999
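A rough sketch of the relational view, assuming each meaning postulate is stored as an (antecedent, consequent) pair and negation is written as a "not " prefix on the consequent. That encoding, the string "not married" standing in for not(x has married), and the names `POSTULATES`, `postulates_for`, and `consequences` are all illustrative choices, not anything from Carnap or Miller.

```python
# Each postulate is (antecedent, consequent); a "not " prefix marks negation.
# "bachelor -> man and never married" is split into two separate postulates.
POSTULATES = [
    ("dog", "animal"),
    ("bachelor", "man"),
    ("bachelor", "not married"),
    ("tall", "not short"),
]


def postulates_for(word, postulates=POSTULATES):
    """Roughly 'the meaning' of a word: every postulate in which it occurs,
    either positively or under negation."""
    return [
        (a, c) for a, c in postulates
        if word in (a, c) or c == f"not {word}"
    ]


def consequences(word, postulates=POSTULATES):
    """Follow postulates forward from word and collect everything implied."""
    implied, frontier = set(), [word]
    while frontier:
        current = frontier.pop()
        for antecedent, consequent in postulates:
            if antecedent == current and consequent not in implied:
                implied.add(consequent)
                frontier.append(consequent)
    return implied


print(consequences("bachelor"))
print(postulates_for("short"))
```

Here the meaning of "short" is given partly by a postulate whose antecedent is a different word ("tall"), which is the sense in which the approach is relational.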
Pragmatics
• Deals with the relation between signs or linguistic
expressions and their users
• Deixis (literally “pointing out”)
– E.g., “I’ll be back in an hour” depends upon the time of the
utterance
• Conversational implicature
– A: “Can you tell me the time?”
– B: “Well, the milkman has come.” [I don’t know exactly, but
perhaps you can deduce it from some extra information I give
you.]
• Presupposition
– “Are you still such a bad driver?”
• Speech acts
– Constatives vs. performatives
– E.g., “I second the motion.”
• Conversational structure
– E.g., turn-taking rules
Language
• Language only hints at meaning
• Most meaning of text lies within our minds
and common understanding
– “How much is that doggy in the window?”
• How much: social system of barter and trade (not
the size of the dog)
• “doggy” implies childlike, plaintive, probably cannot
do the purchasing on their own
• “in the window” implies behind a store window, not
really inside a window, requires notion of window
shopping
Semantics: The Meaning of Symbols
• Semantics versus Syntax
– add(3,4)
– 3 + 4
– (different syntax, same meaning)
• Meaning versus Representation
– What a person’s name is versus who they are
• A rose by any other name...
– What the computer program “looks like”
versus what it actually does
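The add(3,4) versus 3 + 4 contrast can be made concrete with Python's standard `ast` module. This is a small sketch: binding the name `add` to `operator.add` is an assumption made here to give the prefix form a meaning, since the slide does not define it.

```python
import ast
import operator

prefix = "add(3, 4)"
infix = "3 + 4"

# Syntax: the two expressions parse to differently shaped trees.
prefix_tree = ast.parse(prefix, mode="eval").body
infix_tree = ast.parse(infix, mode="eval").body
print(type(prefix_tree).__name__)  # Call
print(type(infix_tree).__name__)   # BinOp

# Semantics: both denote the same value once `add` is given a meaning.
env = {"add": operator.add}
print(eval(prefix, env) == eval(infix, env))  # True
```

What the program "looks like" (a function call versus a binary operator) differs, while what it actually does (produce 7) is the same.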
Semantics
• Semantics: assigning meanings to
symbols and expressions
– Usually involves defining:
• Objects
• Properties of objects
• Relations between objects
– More detailed versions include
• Events
• Time
• Places
• Measurements (quantities)
The Role of Context
• The concept associated with the symbol
“21” means different things in different
contexts
– Examples?
• The question “Is there any salt?”
– Asked of a waiter at a restaurant
– Asked of an environmental scientist at work
What’s in a Sentence?
“A sentence is not a verbal snapshot or movie
of an event. In framing an utterance, you have
to abstract away from everything you know, or
can picture, about a situation, and present a
schematic version which conveys the
essentials. In terms of grammatical marking,
there is not enough time in the speech
situation for any language to allow for the
marking of everything which could possibly be
significant to the message.”
Dan Slobin, in Language Acquisition: The state of the art, 1982
Lexical Relations
• Conceptual relations link concepts
– Goal of Artificial Intelligence
• Lexical relations link words
– Goal of Linguistics
Major Lexical Relations
• Synonymy
• Polysemy
• Metonymy
• Hyponymy/Hypernymy
• Meronymy/Holonymy
• Antonymy
Synonymy
• Different ways of expressing related concepts
• Examples
– cat, feline, Siamese cat
• Overlaps with basic and subordinate levels
• Synonyms are almost never truly substitutable
– Used in different contexts
– Have different implications
• This is a point of contention
Polysemy
• Most words have more than one sense
– Homonym: same sound and/or spelling, different
meaning (http://www.wikipedia.org/wiki/Homonym)
• bank (river)
• bank (financial)
– Polysemy: different senses of same word
(http://www.wikipedia.org/wiki/Polysemy)
• That dog has floppy ears.
• She has a good ear for jazz.
• bank (financial) has several related senses
– the building, the institution, the notion of where money is stored
Metonymy
• Use one aspect of something to stand for
the whole
– The building stands for the institution of the
bank.
– Newscast: “The White House released new
figures today.”
– Waitperson: “The ham sandwich spilled his
drink.”
Hyponymy/Hyperonymy
• ISA relation
• Related to Superordinate and Subordinate
level categories
– hyponym(robin,bird)
– hyponym(emu,bird)
– hyponym(bird,animal)
– hypernym(animal,bird)
• A is a hypernym of B if B is a type of A
• A is a hyponym of B if A is a type of B
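The two definitions above can be sketched as a lookup over the slide's example facts. Two assumptions are made for illustration: each word has a single direct hypernym (so a dictionary is enough), and the ISA relation is followed transitively, so robin counts as a hyponym of animal. The names `HYPERNYM_OF`, `hypernym_chain`, `is_hyponym`, and `is_hypernym` are hypothetical.

```python
# Direct links from the slide: hyponym(robin, bird), hyponym(emu, bird),
# hyponym(bird, animal). One parent per word is a simplifying assumption.
HYPERNYM_OF = {"robin": "bird", "emu": "bird", "bird": "animal"}


def hypernym_chain(word, links=HYPERNYM_OF):
    """List the hypernyms of word, from most specific to most general."""
    chain = []
    while word in links:
        word = links[word]
        chain.append(word)
    return chain


def is_hyponym(a, b, links=HYPERNYM_OF):
    """A is a hyponym of B if A is a type of B (directly or by inheritance)."""
    return b in hypernym_chain(a, links)


def is_hypernym(a, b, links=HYPERNYM_OF):
    """A is a hypernym of B if B is a type of A."""
    return is_hyponym(b, a, links)


print(hypernym_chain("robin"))        # ['bird', 'animal']
print(is_hypernym("animal", "bird"))  # True
print(is_hyponym("bird", "robin"))    # False
```

Defining `is_hypernym` by swapping the arguments of `is_hyponym` mirrors the way the slide defines the two relations as inverses of each other.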
Basic-Level Categories (Review)
• Brown 1958, 1965, Berlin et al., 1972, 1973
• Folk biology:
– Unique beginner: plant, animal
– Life form: tree, bush, flower
– Generic name: pine, oak, maple, elm
– Specific name: Ponderosa pine, white pine
– Varietal name: Western Ponderosa pine
• No overlap between levels
• Level 3 is basic
– Corresponds to genus
– Folk biological categories correspond accurately to
scientific biological categories only at the basic level
Psychologically Primary Levels
SUPERORDINATE   BASIC LEVEL   SUBORDINATE
animal          dog           terrier
furniture       chair         rocker
• Children take longer to learn superordinate categories
• Superordinate categories are not associated with mental
images or motor actions
Meronymy/Holonymy
• Part/Whole relation
– meronym(beak,bird)
– meronym(bark,tree)
– holonym(tree,bark)
• Transitive conceptually but not lexically
– The knob is a part of the door.
– The door is a part of the house.
– ? The knob is a part of the house ?
• Holonyms are (approximately) the inverse
of meronyms
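The difference between the lexical relation and its conceptual (transitive) extension can be sketched as follows. The facts come from the slide's examples; treating the knob/door/house sentences as stored part-of facts, and the names `PART_OF`, `meronym`, `holonym`, and `conceptual_part_of`, are assumptions made for illustration.

```python
# (part, whole) facts taken from the slide's examples.
PART_OF = {
    ("beak", "bird"),
    ("bark", "tree"),
    ("knob", "door"),
    ("door", "house"),
}


def meronym(part, whole, facts=PART_OF):
    """Lexical relation: only directly stated part/whole pairs count."""
    return (part, whole) in facts


def holonym(whole, part, facts=PART_OF):
    """Holonymy modeled as the inverse of meronymy."""
    return meronym(part, whole, facts)


def conceptual_part_of(part, whole, facts=PART_OF):
    """Conceptual relation: follow part-of links transitively."""
    seen, frontier = set(), [part]
    while frontier:
        current = frontier.pop()
        for p, w in facts:
            if p == current and w not in seen:
                seen.add(w)
                frontier.append(w)
    return whole in seen


print(meronym("knob", "house"))             # False
print(conceptual_part_of("knob", "house"))  # True
print(holonym("tree", "bark"))              # True
```

Keeping the two functions separate reflects the slide's point: the transitive inference is available conceptually, but "knob" is not stored as a lexical meronym of "house".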
Antonymy
• Lexical opposites
– antonym(large, small)
– antonym(big, small)
– antonym(big, little)
– but not large, little
• Many antonymous relations can be reliably
detected by looking for statistical
correlations in large text collections.
(Justeson & Katz, 1991)
Thesauri and Lexical Relations
• Polysemy: same word, different senses of
meaning
– Slightly different concepts expressed similarly
• Synonyms: different words, related senses of
meanings
– Different ways to express similar concepts
• Thesauri help draw all these together
• Thesauri also commonly define a set of relations
between terms that is similar to lexical relations
– BT (broader term), NT (narrower term), RT (related term)
• More on Thesauri next week…
What is an Ontology?
• From Merriam-Webster’s Collegiate
– A branch of metaphysics concerned with the nature
and relations of being
– A particular theory about the nature of being or the
kinds of existence
• More prosaically
– A carving up of the world’s meanings
– Determine what things exist, but not how they interrelate
• Related terms
– Taxonomy, dictionary, category structure
• Commonly used now in CS literature to describe
structures that function as Thesauri
WordNet
• Started in 1985 by George Miller, students,
and colleagues at the Cognitive Science
Laboratory, Princeton University
– Miller also known as the author of the paper
“The Magical Number Seven, Plus or Minus
Two: Some Limits on our Capacity for
Processing Information” (1956)
• Can be downloaded for free:
– www.cogsci.princeton.edu/~wn/
Miller on WordNet
• “In terms of coverage, WordNet’s goals
differ little from those of a good standard
college-level dictionary, and the semantics
of WordNet is based on the notion of word
sense that lexicographers have
traditionally used in writing dictionaries. It
is in the organization of that information
that WordNet aspires to innovation.”
– (Miller, 1998, Chapter 1)
Presuppositions of WordNet Project
• Separability hypothesis
– The lexical component of language can be
separated and studied in its own right
• Patterning hypothesis
– People have knowledge of the systematic
patterns and relations between word
meanings
• Comprehensiveness hypothesis
– Computational linguistics programs need a
store of lexical knowledge that is as extensive
as that which people have
WordNet: Size
WordNet Uses “Synsets” – sets of synonymous terms
POS          Unique Strings   Synsets
Noun         107930           74488
Verb         10806            12754
Adjective    21365            18523
Adverb       4583             3612
Totals       144684           109377
Structure of WordNet
[Figure: WordNet structure diagrams, shown over three slides; images not available in this transcript]
Unique Beginners
• Entity, something
– (anything having existence (living or nonliving))
• Psychological_feature
– (a feature of the mental life of a living organism)
• Abstraction
– (a general concept formed by extracting common
features from specific examples)
• State
– (the way something is with respect to its main
attributes; "the current state of knowledge"; "his state
of health"; "in a weak financial state")
• Event
– (something that happens at a given place and time)
Unique Beginners
• Act, human_action, human_activity
– (something that people do or cause to happen)
• Group, grouping
– (any number of entities (members) considered as a
unit)
• Possession
– (anything owned or possessed)
• Phenomenon
– (any state or process known through the senses
rather than by intuition or reasoning)
WordNet Demo
• Available online (from Unix) if you wish to
try it…
– Log in to irony and type “wn word” for any
word you are interested in
– Demo…
Discussion Questions
• Joe Hall on Lexical Relations and
WordNet
– Which method of linguistic analysis do you
think will be more fruitful... the painstaking
process involved with building WordNet or the
relatively easy output afforded by Church et
al.'s computational method that, however,
requires much work to decipher the results?
Discussion Questions
• Joe Hall on Lexical Relations and
WordNet
– What are the problems/advantages of using
the World Wide Web itself as a "corpus"? (If
you were to incorporate the current digital
copies of all newspapers, journals, etc.
wouldn't you very quickly exceed the 15
million words of the largest corpus in the
Church article?)
Discussion Questions
• Joe Hall on Lexical Relations and
WordNet
– With the diversity of dialects of the English
language, how much does this type of
computational analysis get confused by
phrases such as "What up?" (i.e., slang)?
Aren't these some of the more interesting
parts of language (i.e., how language
evolves)?
Homework
• Read Chapters 3 and 5 of The
Organization of Information (Textbook)
• Discussion Question volunteers?
– Tu Tran
– Hong Qu
Next Time
• Introduction to Metadata