Survey							
Presented by
Rani Qumsiyeh & Andrew Zitzelberger
Common approaches
 Collocation analysis: Producing anonymous
relations without a label.
 Syntactic Dependencies: The dependencies
between verbs and arguments.
 Hearst’s approach: Matching lexico-syntactic
patterns.
Definition: A pair of words which occur
together more often than expected by
chance within a certain boundary.
Can be detected with Student's t-test or the χ² test.
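The t-test variant can be sketched as follows; all counts and the corpus size below are toy numbers chosen for illustration, not figures from the survey.

```python
import math

def collocation_t_score(f_bigram, f_w1, f_w2, n_tokens):
    """t-score for a candidate collocation.

    Null hypothesis: w1 and w2 are independent, so the expected bigram
    probability is p(w1) * p(w2); for rare events the sample variance
    is approximated by the sample mean.
    """
    sample_mean = f_bigram / n_tokens              # observed bigram probability
    expected = (f_w1 / n_tokens) * (f_w2 / n_tokens)
    return (sample_mean - expected) / math.sqrt(sample_mean / n_tokens)

# toy counts for a frequent word pair in a 14M-token corpus
t = collocation_t_score(f_bigram=300, f_w1=15000, f_w2=10000,
                        n_tokens=14_000_000)
print(t > 2.576)  # True -> significant at alpha = 0.005
```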
Examples of such techniques are presented in
the related work section.
“A person works for some employer”
 Relation: work-for
 Concepts: person, employer
The acquisition of selectional restrictions
Detecting verbs denoting the same
ontological relation.
Hierarchical ordering of relations.
Discussed later in detail.
Used to discover very specific relations such as part-of, cause,
purpose.
Charniak employed part-of-speech tagging to detect such
patterns.
Other approaches to detect causation and purpose relations
are discussed later.
Learning Attributes relying on the syntactic relation
between a noun and its modifying adjectives.
Learning Relations on the basis of verbs and their
arguments.
Matching lexico-syntactic patterns, aiming to
learn qualia structures for nouns.
Attributes are defined as relations with a datatype
as range.
Attributes are typically expressed in texts using the
preposition 'of', the verb 'have', or genitive constructs:
the color of the car
every car has a color
the car's color
Peter bought a new car. Its color [...]
attitude adjectives, expressing the opinion of the speaker
such as in 'good house'
temporal adjectives, such as the 'former president' or the
'occasional visitor'
membership adjectives, such as the 'alleged criminal', a 'fake
cowboy'
event-related adjectives, such as 'abusive speech', in which
either the agent of the speech is abusive or the event itself
Find the corresponding description for the adjective by
looking up its corresponding attribute in WordNet.
Consider only those adjectives which do have such an
attribute relation.
This increases the probability that the adjective being
considered denotes the value of some attribute, quality or
property.
Tokenize and part-of-speech tag the corpus
using TreeTagger.
Match the tagged corpus against the following two
expressions and extract adjective/noun pairs:
 (\w+{DET})? (\w+{NN})+ is{VBZ} \w+{JJ}
 (\w+{DET})? \w+{JJ} (\w+{NN})+
Cond (n, a) := f(n, a)/f(n)
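A minimal sketch of the extraction-plus-weighting step. The tiny hand-tagged string stands in for real TreeTagger output, and all tokens and tags are invented; only the two patterns and the Cond(n, a) formula come from the slides.

```python
import re
from collections import Counter

# hand-tagged toy "corpus" in token{TAG} form, mimicking tagger output
tagged = ("the{DET} car{NN} is{VBZ} fast{JJ} . "
          "a{DET} small{JJ} car{NN} . "
          "the{DET} car{NN} is{VBZ} old{JJ} .")

# Pattern 1: (DET)? NN is JJ  -> predicative adjective
pred = re.findall(r"(?:\w+\{DET\} )?(\w+)\{NN\} is\{VBZ\} (\w+)\{JJ\}", tagged)
# Pattern 2: (DET)? JJ NN     -> attributive adjective
attr = re.findall(r"(?:\w+\{DET\} )?(\w+)\{JJ\} (\w+)\{NN\}", tagged)

pairs = Counter(pred) + Counter((n, a) for a, n in attr)  # keys: (noun, adj)
noun_freq = Counter(n for n, _ in pairs.elements())       # f(n)

def cond(n, a):
    """Cond(n, a) = f(n, a) / f(n); pairs below the threshold (0.01 in
    the slides) would be discarded."""
    return pairs[(n, a)] / noun_freq[n]

print(cond("car", "fast"))
```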
Tourism Corpus
Threshold = 0.01
Car
For each of the adjectives we look up the
corresponding attribute in WordNet
 age is one of {new, old}
 value is one of {black}
 numerousness/numerosity/multiplicity is one of {many}
 otherness/distinctness/separateness is one of {other}
 speed/swiftness/fastness is one of {fast}
 size is one of {small, little, big}
Evaluate every domain concept according to
(i) its attributes and
(ii) their corresponding ranges by assigning a rating
from '0' to '3'
▪ '3' means that the attribute or its range is totally
reasonable and correct.
▪ '0' means that the attribute or the range does not make
any sense.
A new approach that not only lists relations
but also finds the general relation.
 work-for (man, department),
work-for (employee, institute),
work-for (woman, store)
 work-for (person, organization)
Conditional probability.
Pointwise mutual information (PMI).
A measure based on the χ²-test.
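Of the three measures, PMI is the easiest to show compactly; the counts below are made up for illustration.

```python
import math

def pmi(f_vc, f_v, f_c, n):
    """Pointwise mutual information between a verb v and a concept c.

    f_vc: co-occurrence count; f_v, f_c: marginal counts; n: total observations.
    """
    return math.log2((f_vc / n) / ((f_v / n) * (f_c / n)))

# invented counts: v and c co-occur more often than independence predicts,
# so PMI is positive
print(round(pmi(f_vc=30, f_v=100, f_c=80, n=1000), 2))  # 1.91
```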
Evaluate by applying their approach to the
Genia corpus using the Genia ontology
Extract verb frames using Steven Abney's
chunker.
Extract tuples NP-V-NP and NP-V-P-NP.
Construct binary relations from tuples.
 Use the lemmatized verb V as corresponding
relation label
 Use the head of the NP phrases as concepts.
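The tuple-to-relation step can be sketched like this; the lemma table and the example heads are assumptions standing in for a real lemmatizer and chunker output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    label: str   # lemmatized verb
    domain: str  # head of the subject NP
    range: str   # head of the object NP

# tiny stand-in for a real lemmatizer (assumed forms)
LEMMA = {"activates": "activate", "binds": "bind"}

def relation_from_tuple(np1_head, verb, np2_head):
    """Turn an NP-V-NP tuple from the chunker into a labelled binary relation."""
    return Relation(LEMMA.get(verb, verb), np1_head, np2_head)

r = relation_from_tuple("protein", "activates", "gene")
print(r)  # Relation(label='activate', domain='protein', range='gene')
```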
protein_molecule: 5
protein_family_or_group: 10
amino-acid: 10
Take into account the
frequency of occurrence.
Choose the highest one.
Penalize concepts c which occur too frequently.
P(amino-acid) = 0.27, P(protein) = 0.14
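One plausible way to combine slot frequency with the penalty for overly frequent concepts. The scoring formula and the prior for protein_family_or_group are illustrative assumptions; only the slot counts and the two priors echo the slides, and the survey's exact weighting may differ.

```python
from collections import Counter

# annotation counts for the heads filling one argument slot (from the slides)
slot_counts = Counter({"protein_molecule": 5,
                       "protein_family_or_group": 10,
                       "amino_acid": 10})

# corpus-wide priors; very frequent concepts get penalized
prior = {"protein_molecule": 0.14, "protein_family_or_group": 0.05,
         "amino_acid": 0.27}

total = sum(slot_counts.values())

def score(c):
    # slot-conditional probability, discounted by the concept's overall
    # frequency (assumed combination, for illustration only)
    return (slot_counts[c] / total) * (1 - prior[c])

best = max(slot_counts, key=score)
print(best)  # protein_family_or_group
```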
Compares contingencies between two variables (the two
variables are statistically independent or not)
We can generalize c to ci if the χ²-test reveals the verb v
and c to be statistically dependent.
Level of significance = 0.05
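The χ² criterion can be sketched with the standard 2x2 contingency formula; the cell counts below are invented.

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Chi-square statistic for a 2x2 contingency table (1 degree of freedom).

    o11 = #(verb v with concept c),  o12 = #(v without c),
    o21 = #(c without v),            o22 = #(neither).
    """
    n = o11 + o12 + o21 + o22
    num = n * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    return num / den

# critical value at alpha = 0.05 with 1 df is 3.841
chi2 = chi_square_2x2(30, 70, 50, 850)
print(chi2 > 3.841)  # True -> treat v and c as statistically dependent
```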
The Genia corpus contains 18,546 sentences with
509,487 words and 51,170 verbs.
 Extracted 100 relations; 15 were regarded as
inappropriate by a biologist evaluator.
 The remaining 85 were evaluated using:
 Direct matches for domain and range (DM),
 Average distance in terms of number of edges between
correct and predicted concept (AD)
 A symmetric variant of the Learning Accuracy (LA)
Nature of Objects
Aristotle
 Material cause (made of)
 Agentive cause (movement, creation, change)
 Formal cause (form, type)
 Final cause (purpose, intention, aim)
 Generative Lexicon framework [Pustejovsky,
1991]
 Qualia Structures
 Constitutive (components)
 Agentive (created)
 Formal (hypernym)
 Telic (function)
Knife
Human
 Subjective decisions
Web
 Linguistic errors
 Ranking errors
 Commercial Bias
 Erroneous information
 Lexical Ambiguity
Pattern library: tuples (p, c)
 p is a pattern
 c is a clue (c: string -> string)
Given a term t and a clue c
 c(t) is sent to the search engine
π(x) refers to plural forms of x
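The (p, c) idea can be sketched as below; the patterns, the clue strings, and the naive pluralizer standing in for π are all illustrative, not the survey's actual pattern library.

```python
# rough plural form pi(x); a real system would use a morphology component
def pi(x):
    return x + "es" if x.endswith(("s", "x", "ch", "sh")) else x + "s"

# each entry pairs a lexico-syntactic pattern (matched against the returned
# snippets) with a clue c: string -> string producing the query c(t)
telic_clues = [
    ("<t> is used to <purpose>", lambda t: f'"{t} is used to"'),
    ("<pi(t)> are used to <purpose>", lambda t: f'"{pi(t)} are used to"'),
]

queries = [clue("computer") for _, clue in telic_clues]
print(queries)  # ['"computer is used to"', '"computers are used to"']
```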
Amount words:
 variety, bundle, majority, thousands, millions,
hundreds, number, numbers, set, sets, series,
range
Example:
 “A conversation is made up of a series of
observable interpersonal exchanges.”
▪ Constitutive role = exchange
PURP := (\w+{VB} NP | NP | be{VB} \w+{VBD})
No good patterns
 X is made by Y
 X is produced by Y
 Instead:
 Agentive_verbs = {build, produce, make, write,
plant, elect, create, cook, construct, design}
e = element
t = term
Lexical elements: knife, beer, book, computer
Abstract Noun: conversation
Specific multi-word terms:
 Natural language processing
 Data mining
Students score
 0 = incorrect
 1 = not totally wrong
 2 = still acceptable
 3 = totally correct
Reasoning: Formal and constitutive patterns are more ambiguous.
Maedche and Staab, 2000
 Find relations using association rules
 Transaction is defined as words occurring
together in syntactic dependency
 Calculate support and confidence
 Precision = 11%, Recall = 13%
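Support and confidence over dependency "transactions" can be shown in a few lines; the transaction data here is a made-up tourism-flavored toy set.

```python
# "transactions" = sets of words occurring together in a syntactic
# dependency (toy data); support/confidence as in association-rule mining
transactions = [
    {"hotel", "room"}, {"hotel", "pool"}, {"hotel", "room"},
    {"restaurant", "menu"}, {"hotel", "room"},
]

def support(itemset):
    # fraction of transactions containing the whole itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

print(support({"hotel", "room"}))               # 0.6
print(round(confidence({"hotel"}, {"room"}), 2))  # 0.75
```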
Kavalec and Svatek, 2005
 Added ‘above expectation’ heuristic
▪ Measure association between verb and pair of concepts
Gamallo et al., 2002
 Map syntactic dependencies to semantic relations
 1) shallow parser + heuristics to derive syntactic
dependencies
 2) cluster based on syntactic positions
 Problems
▪ Mapping is underspecified
▪ Largely domain dependent
Ciaramita et al., 2005
 Statistical dependency parser to extract:
▪ SUBJECT-VERB-DIRECT_OBJECT
▪ SUBJECT-VERB-INDIRECT_OBJECT
 χ2 test – keep those occurring significantly more
often than by chance
 83% of learned relations are correct
 53.1% of generalized relations are correct
Heyer et al., 2001
 Calculate 2nd order collocations
 Use set of defined rules to reason
Ogata and Collier, 2004
 HEARST patterns for extraction
 Use heuristic reasoning rules
Yamaguchi, 2001
 Word space algorithm using 4 word window
 Cos(angle) measure for similarity
▪ If similarity > threshold, a relationship is assumed
 Precision = 59.89% for legal corpus
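The cosine step of the word-space approach is straightforward; the co-occurrence vectors and the threshold below are toy values for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two co-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# toy context-window counts for two words from a 4-word window
court = [4, 1, 0, 2]
judge = [3, 2, 0, 1]
related = cosine(court, judge) > 0.9  # threshold chosen for illustration
print(related)  # True
```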
Poesio and Almuhareb, 2005
 Classify attributes into one of six categories:
▪ Quality, part, related-object, activity, related-agent, nonattribute
 Classifier was trained using:
▪ Morphological information, clustering results, search engine
results, and heuristics
 Better results from combining related-object and part
 F-measure = 53.8% for non-attribute class, and
between 81-95% for other classes
Claveau et al., 2003
 Inductive Logic Programming Approach
 Doesn’t distinguish between different qualia roles
Learning relations from non-verbal structures
Gold standard of qualia structures
Deriving a reasoning calculus
Strengths
 Explained (their) methods in detail
Weaknesses
 Required a lot of NLP background knowledge
 Short summaries of others' work