Presented by Rani Qumsiyeh & Andrew Zitzelberger

Common approaches
▪ Collocation analysis: producing anonymous relations without a label.
▪ Syntactic dependencies: the dependencies between verbs and their arguments.
▪ Hearst's approach: matching lexico-syntactic patterns.

Collocation analysis
▪ Definition: a pair of words which occur together more often than expected by chance within a certain boundary.
▪ Collocations can be detected with Student's t-test or the χ² test.
▪ Examples of such techniques are presented in the related work section.

Syntactic dependencies
▪ "A person works for some employer"
  ▪ Relation: work-for; concepts: person, employer
▪ Applications: the acquisition of selectional restrictions, detecting verbs denoting the same ontological relation, hierarchical ordering of relations.
▪ Discussed later in detail.

Hearst's approach
▪ Used to discover very specific relations such as part-of, cause, and purpose.
▪ Charniak employed part-of-speech tagging to detect such patterns.
▪ Other approaches to detecting causation and purpose relations are discussed later.

Approaches presented here
▪ Learning attributes: relies on the syntactic relation between a noun and its modifying adjectives.
▪ Learning relations: on the basis of verbs and their arguments.
▪ Learning qualia structures for nouns by matching lexico-syntactic patterns.

Learning attributes
▪ Attributes are defined as relations with a datatype as range.
▪ Attributes are typically expressed in text using the preposition "of", the verb "have", or genitive constructs:
  ▪ the color of the car
  ▪ every car has a color
  ▪ the car's color
  ▪ Peter bought a new car. Its color [...]
▪ Not every adjective denotes the value of an attribute:
  ▪ attitude adjectives, expressing the opinion of the speaker, as in "good house"
  ▪ temporal adjectives, such as the "former president" or the "occasional visitor"
  ▪ membership adjectives, such as the "alleged criminal" or a "fake cowboy"
  ▪ event-related adjectives, such as "abusive speech", in which either the agent of the speech is abusive or the event itself
▪ Find the corresponding description for an adjective by looking up its corresponding attribute in WordNet.
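The collocation test mentioned above can be made concrete. Below is a minimal sketch of the Student's t-test for detecting a bigram collocation; all counts (the bigram "strong tea", its frequencies, and the corpus size) are invented for illustration:

```python
def t_score(f_xy, f_x, f_y, n):
    """Student's t-score for a candidate collocation (x, y).

    f_xy: bigram frequency; f_x, f_y: unigram frequencies; n: corpus size.
    Under the independence hypothesis the expected bigram probability is
    P(x) * P(y); the t-score measures how far the observed mean exceeds it.
    """
    observed = f_xy / n                # sample mean of the bigram indicator
    expected = (f_x / n) * (f_y / n)   # mean under independence
    variance = observed                # Bernoulli variance approximated by p
    return (observed - expected) / (variance / n) ** 0.5

# Hypothetical counts: "strong tea" occurs 30 times; "strong" 200 times,
# "tea" 150 times, in a corpus of 100,000 tokens.
score = t_score(30, 200, 150, 100_000)
print(score > 2.576)  # exceeds the 0.5% critical value -> likely a collocation
```

The χ² test from the slide can be used the same way; it only changes the statistic computed from the observed and expected counts.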
Learning attributes: algorithm
▪ Consider only those adjectives which have such an attribute relation in WordNet. This increases the probability that the adjective under consideration denotes the value of some attribute, quality, or property.
▪ Tokenize and part-of-speech tag the corpus using TreeTagger.
▪ Match the following two expressions and extract adjective/noun pairs:
  (\w+{DET})? (\w+{NN})+ is{VBZ} \w+{JJ}
  (\w+{DET})? \w+{JJ} (\w+{NN})+
▪ Weight the pairs by the conditional probability Cond(n, a) := f(n, a) / f(n)

Example: tourism corpus, threshold = 0.01, concept "car"
▪ For each adjective, look up the corresponding attribute in WordNet:
  ▪ age is one of {new, old}
  ▪ value is one of {black}
  ▪ numerousness/numerosity/multiplicity is one of {many}
  ▪ otherness/distinctness/separateness is one of {other}
  ▪ speed/swiftness/fastness is one of {fast}
  ▪ size is one of {small, little, big}

Evaluation
▪ Evaluate every domain concept according to (i) its attributes and (ii) their corresponding ranges, assigning a rating from '0' to '3':
  ▪ '3' means the attribute or its range is totally reasonable and correct.
  ▪ '0' means the attribute or the range does not make any sense.

Learning relations
▪ A new approach that not only lists relations but finds the general relation:
  work-for(man, department), work-for(employee, institute), work-for(woman, store)
  → work-for(person, organization)
▪ Measures considered: conditional probability, pointwise mutual information (PMI), and a measure based on the χ²-test.
▪ Evaluated by applying the approach to the Genia corpus using the Genia ontology.

Algorithm
▪ Extract verb frames using Steven Abney's chunker.
▪ Extract tuples NP-V-NP and NP-V-P-NP.
▪ Construct binary relations from the tuples:
  ▪ Use the lemmatized verb V as the corresponding relation label.
  ▪ Use the heads of the NP phrases as concepts.

Generalizing relations
▪ Example counts: protein_molecule: 5, protein_family_or_group: 10, amino-acid: 10
▪ Conditional probability: take into account the frequency of occurrence; choose the highest one.
▪ PMI: penalize concepts c which occur too frequently.
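The ranking step above can be sketched in a few lines. This is a minimal illustration of the Cond(n, a) := f(n, a) / f(n) measure over extracted adjective/noun pairs; the pair data are invented, and the 0.01 threshold is the one quoted for the tourism corpus:

```python
from collections import Counter

# Toy adjective/noun pairs as the two POS patterns might extract them;
# the pairs themselves are illustrative, not corpus output.
pairs = [("car", "new"), ("car", "old"), ("car", "fast"), ("car", "new"),
         ("hotel", "nice"), ("hotel", "new")]

pair_freq = Counter(pairs)                 # f(n, a)
noun_freq = Counter(n for n, _ in pairs)   # f(n)

def cond(n, a):
    """Cond(n, a) := f(n, a) / f(n), the slide's weighting measure."""
    return pair_freq[(n, a)] / noun_freq[n]

threshold = 0.01  # value used on the tourism corpus
kept = {(n, a) for (n, a) in pair_freq if cond(n, a) > threshold}
print(cond("car", "new"))  # 2 of the 4 "car" pairs -> 0.5
```

Pairs surviving the threshold are then mapped to attributes via the WordNet lookup described above.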
▪ P(amino-acid) = 0.27, P(protein) = 0.14
▪ χ²-test: compares contingencies between two variables (are they statistically independent or not?).
  ▪ We can generalize c to ci if the χ²-test reveals the verb v and c to be statistically dependent.
  ▪ Level of significance = 0.05

Evaluation
▪ The Genia corpus contains 18,546 sentences with 509,487 words and 51,170 verbs.
▪ 100 relations were extracted; 15 were regarded as inappropriate by a biologist evaluator. The remaining 85 were evaluated with:
  ▪ Direct matches for domain and range (DM)
  ▪ Average distance in terms of number of edges between the correct and the predicted concept (AD)
  ▪ A symmetric variant of Learning Accuracy (LA)

Learning qualia structures
▪ Nature of objects (Aristotle):
  ▪ Material cause (made of)
  ▪ Agentive cause (movement, creation, change)
  ▪ Formal cause (form, type)
  ▪ Final cause (purpose, intention, aim)
▪ Generative Lexicon framework [Pustejovsky, 1991], qualia structures:
  ▪ Constitutive (components)
  ▪ Agentive (created)
  ▪ Formal (hypernym)
  ▪ Telic (function)
  ▪ Example: knife
▪ Sources of qualia structures and their problems:
  ▪ Human: subjective decisions
  ▪ Web: linguistic errors, ranking errors, commercial bias, erroneous information, lexical ambiguity

Pattern library
▪ Tuples (p, c), where p is a pattern and c is a clue (c: string → string).
▪ Given a term t and a clue c, c(t) is sent to the search engine.
▪ π(x) refers to plural forms of x.
▪ Amount words: variety, bundle, majority, thousands, millions, hundreds, number, numbers, set, sets, series, range
▪ Example: "A conversation is made up of a series of observable interpersonal exchanges."
  ▪ Constitutive role = exchange
▪ Telic: PURP := \w+{VB} NP | NP | be{VB} \w+{VBD}
▪ Agentive: no good patterns ("X is made by Y", "X is produced by Y"). Instead:
  Agentive_verbs = {build, produce, make, write, plant, elect, create, cook, construct, design}

Evaluation
▪ e = element, t = term
▪ Test terms:
  ▪ Lexical elements: knife, beer, book, computer
  ▪ Abstract noun: conversation
  ▪ Specific multi-word terms: natural language processing, data mining
▪ Students score each element:
  ▪ 0 = incorrect, 1 = not totally wrong, 2 = still acceptable, 3 = totally correct
▪ Reasoning: formal and constitutive patterns are more ambiguous.
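The generalization test described above (generalize c to ci when the verb v and c are statistically dependent) can be sketched with a hand-rolled 2×2 χ² statistic. All counts here are hypothetical, not Genia figures:

```python
def chi_square(f_vc, f_v, f_c, n):
    """Pearson chi-square statistic for the 2x2 contingency table of a
    verb v and a concept c: co-occurrence vs. independent occurrence.

    f_vc: co-occurrences of v and c; f_v, f_c: total occurrences; n: corpus size.
    """
    table = [[f_vc,        f_v - f_vc],
             [f_c - f_vc,  n - f_v - f_c + f_vc]]
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n   # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

CRITICAL_005 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

# Hypothetical counts: the verb and concept co-occur 40 times,
# occur 60 and 80 times overall, in 1000 observations.
print(chi_square(40, 60, 80, 1000) > CRITICAL_005)  # True -> dependent, so generalize
```

If the statistic exceeds the critical value at the 0.05 significance level, v and c are treated as dependent and the generalization step is allowed.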
Related work
▪ Maedche and Staab, 2000
  ▪ Find relations using association rules.
  ▪ A transaction is defined as words occurring together in a syntactic dependency.
  ▪ Calculate support and confidence.
  ▪ Precision = 11%, recall = 13%
▪ Kavalec and Svatek, 2005
  ▪ Added an 'above expectation' heuristic: measure the association between a verb and a pair of concepts.
▪ Gamallo et al., 2002
  ▪ Map syntactic dependencies to semantic relations:
    1) shallow parser + heuristics to derive syntactic dependencies
    2) cluster based on syntactic positions
  ▪ Problems: the mapping is underspecified and largely domain dependent.
▪ Ciaramita et al., 2005
  ▪ Statistical dependency parser to extract SUBJECT-VERB-DIRECT_OBJECT and SUBJECT-VERB-INDIRECT_OBJECT triples.
  ▪ χ² test: keep those occurring significantly more often than by chance.
  ▪ 83% of learned relations are correct; 53.1% of generalized relations are correct.
▪ Heyer et al., 2001
  ▪ Calculate 2nd-order collocations; use a set of defined rules to reason.
▪ Ogata and Collier, 2004
  ▪ Hearst patterns for extraction; heuristic reasoning rules.
▪ Yamaguchi, 2001
  ▪ Word-space algorithm using a 4-word window; cos(angle) measure for similarity.
  ▪ If similarity > threshold, a relationship is assumed.
  ▪ Precision = 59.89% on a legal corpus.
▪ Poesio and Almuhareb, 2005
  ▪ Classify attributes into one of six categories: quality, part, related-object, activity, related-agent, non-attribute.
  ▪ Classifier trained using morphological information, clustering results, search engine results, and heuristics.
  ▪ Better results from combining related-object and part.
  ▪ F-measure = 53.8% for the non-attribute class, and between 81-95% for the other classes.
▪ Claveau et al., 2003
  ▪ Inductive Logic Programming approach; does not distinguish between different qualia roles.

Future work
▪ Learning relations from non-verbal structures.
▪ A gold standard of qualia structures.
▪ Deriving a reasoning calculus.

Strengths
▪ Explained (their) methods in detail.
Weaknesses
▪ Required a lot of NLP background knowledge.
▪ Short summaries of others' work.
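The support/confidence computation in the Maedche and Staab approach can be sketched as follows. The "transactions" here are toy sets of concepts standing in for words co-occurring in a syntactic dependency; the data and concept names are invented:

```python
# Toy transactions: each set holds concepts that co-occurred in one
# syntactic dependency context (illustrative data only).
transactions = [{"hotel", "pool"}, {"hotel", "bar"},
                {"hotel", "pool"}, {"beach", "pool"}]

def support(itemset, transactions):
    """Fraction of transactions containing every item of itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs: support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

print(support({"hotel", "pool"}, transactions))       # 2 of 4 transactions -> 0.5
print(confidence({"hotel"}, {"pool"}, transactions))  # 0.5 / 0.75
```

Rules whose support and confidence clear chosen thresholds are kept as candidate relations between the two concepts.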