Download Basic IR Processes

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Old Irish grammar wikipedia , lookup

Lexical semantics wikipedia , lookup

Lithuanian grammar wikipedia , lookup

Ojibwe grammar wikipedia , lookup

Georgian grammar wikipedia , lookup

Inflection wikipedia , lookup

Spanish grammar wikipedia , lookup

Germanic weak verb wikipedia , lookup

Germanic strong verb wikipedia , lookup

Portuguese grammar wikipedia , lookup

Esperanto grammar wikipedia , lookup

Modern Greek grammar wikipedia , lookup

Latin syntax wikipedia , lookup

Modern Hebrew grammar wikipedia , lookup

Malay grammar wikipedia , lookup

Zulu grammar wikipedia , lookup

Japanese grammar wikipedia , lookup

Ancient Greek grammar wikipedia , lookup

Old Norse morphology wikipedia , lookup

Ukrainian grammar wikipedia , lookup

Icelandic grammar wikipedia , lookup

French grammar wikipedia , lookup

Swedish grammar wikipedia , lookup

Turkish grammar wikipedia , lookup

Sotho parts of speech wikipedia , lookup

Russian grammar wikipedia , lookup

Scottish Gaelic grammar wikipedia , lookup

Old English grammar wikipedia , lookup

Polish grammar wikipedia , lookup

Yiddish grammar wikipedia , lookup

Serbo-Croatian grammar wikipedia , lookup

English grammar wikipedia , lookup

Pipil grammar wikipedia , lookup

Transcript
CS4740 (Spring 2015!)
Intro. to Natural Language Processing
POS Tagging
Jon Park
DEPARTMENT OF COMPUTER SCIENCE
CORNELL UNIVERSITY
* Adopted/modified from Cardie(Cornell) & Lin(UMD)’s presentations.
Source: Calvin and Hobbs
Quick check on iClickerGO
A. I’m using the iClickerGO App!
B. I’m using an iClicker!
Agenda
 What are parts of speech (POS)?
 What is POS tagging?
Parts of Speech
 “Equivalence class” of linguistic entities
 “Categories” or “types” of words
 Study dates back to the ancient Greeks
 Dionysius Thrax of Alexandria (c. 100 BC)
 8 parts of speech: noun, verb, pronoun, preposition, adverb,
conjunction, participle, article
 Remarkably enduring list!
 Typically much larger nowadays:
 Penn Treebank: 45
 Brown corpus: 87
 C7 tagset: 146
5
How do we define POS?
 By meaning (can be unreliable)
 Verbs are actions
 Adjectives are properties
 Nouns are things
 By the syntactic environment
 What occurs nearby?
 What does it act as?
 By what morphological processes affect it
 What affixes does it take?
 Combination of the above
Parts of Speech
 Open class
 Impossible to completely enumerate
 New words continuously being invented, borrowed, etc.
 Closed class
 Closed, fixed membership
 Reasonably easy to enumerate
 Generally, short function words that “structure” sentences
Open Class POS
 Four major open classes in English




Nouns
Verbs
Adjectives
Adverbs
 All languages have nouns and verbs... but may
not have the other two
Open Class POS
 Nouns
 Semantics:
 Generally, words for people, places, things
 But not always (bandwidth, energy, ...)
 Syntactic environment:
 Occurring with determiners
 Pluralizable, possessivizable
 Other characteristics:
 Mass vs. count nouns
Open Class POS
 Verbs
 Semantics:
 Generally, denote actions, processes, etc.
 Syntactic environment:
 Intransitive, transitive, ditransitive
 Alternations
 Other characteristics:
 Main vs. auxiliary verbs
 Gerunds (verbs behaving like nouns)
 Participles (verbs behaving like adjectives)
Open Class POS
 Adjectives
 Generally modify nouns, e.g., tall girl
 Adverbs
 Sometimes modify verbs, e.g., sang beautifully
 Sometimes modify adjectives, e.g., extremely hot
Closed Class POS
 Prepositions
 In English, occurring before noun phrases
 Specifying some type of relation (spatial, temporal, …)
 Examples: on the shelf, before noon
 Particles
 Resembles a preposition, but combined with a verb (“phrasal
verbs”)
 Examples: find out, turn over, go on
Closed Class POS
 Determiners
 Establish reference for a noun
 Examples: a, an, the (articles), that, this, many, such, …
 Pronouns
 Refer to person or entities: he, she, it
 Possessive pronouns: his, her, its
 Wh-pronouns: what, who
Closed Class POS
 Conjunctions
 Coordinating conjunctions
 Join two elements of “equal status”
 Examples: cats and dogs, salad or soup
 Subordinating conjunctions
 Join two elements of “unequal status”
 Examples: We’ll leave after you finish eating. While I was
waiting in line, I saw my friend.
 Complementizers are a special case: I think that you should
finish your assignment
Agenda
 What are parts of speech (POS)?
 What is POS tagging?
Part of speech tagging

Assign the correct part of speech (word class) to each
word/token in a document
“The/DT planet/NN Jupiter/NNP and/CC its/PPS moons/NNS
are/VBP in/IN effect/NN a/DT mini-solar/JJ system/NN ,/, and/CC
Jupiter/NNP itself/PP is/VBZ often/RB called/VBN a/DT star/NN
that/IN never/RB caught/VBN fire/NN ./.”

Needed as an initial processing step for a number of
language technology applications
 Answer extraction in Question Answering systems
 Base step in identifying syntactic phrases for IR systems
 Critical for word-sense disambiguation
 Information extraction

…
Why is POS tagging hard?

Ambiguity





He will race/VB the car.
When will the race/NOUN end?
The boat floated/ VBD.
The boat floated/ VBD down Fall Creek.
The boat floated/VBN down Fall Creek sank.

Average of ~2 parts of speech for each word

The number of tags used by different systems
varies a lot. Some systems use < 20 tags, while
others use > 400.
POS Tagging is Hard (Even for Humans)

particle vs. preposition



past tense vs. past participle



The horse walked past the barn.
The horse walked past the barn fell.
noun vs. adjective?



He talked over the deal.
He talked over the telephone.
My favorite color is red.
The car is red.
gerund vs. present participle


Fishing can be fun.
I am fishing.
To obtain gold standards for evaluation, annotators rely on a set of
From Ralph Grishman, NYU
tagging guidelines.
Penn Treebank Tagset
POS Tagging Example
It is a nice night.
It/PP is/VBZ a/DT nice/JJ night/NN ./.
POS Tagging Example
Nobody ever takes the newspapers she sells.
Nobody/NN ever/RB takes/VBZ the/DT
newspapers/NNS she/PP sells/VBZ ./.
POS Tagging Exercise
He is a tall, skinny guy with a long, sad, mean-looking kisser, and a
mournful voice.
He/PP is/VBZ a/DT tall/JJ ,/, skinny/JJ guy/NN with/IN a/DT
long/JJ ,/, sad/JJ ,/, mean-looking/JJ kisser/NN ,/, and/CC a/DT
mournful/JJ voice/NN ./.
POS Tagging
Which of the following
has been correctly
tagged?
A. I am sitting in Mindy’s restaurant eating the gefillte fish.
B. I am sitting in Mindy’s restaurant eating the gefillte fish.
C. I am sitting in Mindy’s restaurant eating the gefillte fish.
POS Tagging Exercise
I am sitting in Mindy’s restaurant eating the gefillte fish.
I/PP am/VBP sitting/VBG in/IN Mindy/NNP ’s/POS
restaurant/NN eating/VBG the/DT gefillte/NN fish/NN ./.
a ship shipping ship shipping shipping ships
Think buffalo
buffalo buffalo buffalo buffalo buffalo buffalo buffalo
buffalo.
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo
buffalo.
Buffalo buffalo, Buffalo buffalo buffalo, buffalo Buffalo
buffalo.
n1. the city of Buffalo, NY
n2. an animal…the American bison
v. to bully, confuse, deceive, or
intimidate