Learning Attributes and Relations
Presented by
Rani Qumsiyeh & Andrew Zitzelberger

Common approaches
• Collocation analysis: produces anonymous relations, i.e., relations without a label.
• Syntactic dependencies: the dependencies between verbs and their arguments.
• Hearst’s approach: matching lexico-syntactic patterns.

Definition: A pair of words which occur
together more often than expected by
chance within a certain boundary.

Can be detected with Student’s t-test or the χ² test.

Examples of such techniques are presented in
the related work section.
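A minimal sketch of the χ² detection idea (the 2×2 contingency table is the standard construction; the adjacency-based notion of "boundary" and the tokenization are assumptions made for illustration):

```python
from collections import Counter
from scipy.stats import chi2_contingency

def is_collocation(tokens, w1, w2, alpha=0.05):
    """Test whether w1 is followed by w2 more often than chance
    predicts, using a chi-squared independence test over bigrams."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())
    o11 = bigrams[(w1, w2)]                                  # w1 w2 together
    o12 = sum(c for (a, _), c in bigrams.items() if a == w1) - o11
    o21 = sum(c for (_, b), c in bigrams.items() if b == w2) - o11
    o22 = n - o11 - o12 - o21                                # neither
    _, p, _, _ = chi2_contingency([[o11, o12], [o21, o22]])
    return p < alpha
```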

“A person works for some employer”
• Relation: work-for
• Concepts: person, employer

• The acquisition of selectional restrictions.
• Detecting verbs denoting the same ontological relation.
• Hierarchical ordering of relations.

Discussed later in detail.



Used to discover very specific relations such as part-of, cause,
purpose.

Charniak employed part-of-speech tagging to detect such
patterns.

Other approaches to detect causation and purpose relations
are discussed later.

Learning attributes: relying on the syntactic relation between a noun and its modifying adjectives.

Learning relations: on the basis of verbs and their arguments.

Learning qualia structures for nouns: by matching lexico-syntactic patterns.

Attributes are defined as relations with a datatype
as range.

Attributes are typically expressed in texts using the preposition 'of', the verb 'have', or genitive constructs:
• the color of the car
• every car has a color
• the car's color
• Peter bought a new car. Its color [...]

• attitude adjectives, expressing the opinion of the speaker, as in 'good house'
• temporal adjectives, such as the 'former president' or the 'occasional visitor'
• membership adjectives, such as the 'alleged criminal' or a 'fake cowboy'
• event-related adjectives, such as 'abusive speech', in which either the agent of the speech is abusive or the event itself

Find the description for an adjective by looking up its corresponding attribute in WordNet.

Consider only those adjectives which do have such an
attribute relation.

This increases the probability that the adjective being
considered denotes the value of some attribute, quality or
property.

Tokenize and part-of-speech tag the corpus
using TreeTagger.

Match the corpus against the following two expressions and extract adjective/noun pairs:
• (\w+{DET})? (\w+{NN})+ is{VBZ} \w+{JJ}
• (\w+{DET})? \w+{JJ} (\w+{NN})+

Rank each pair by the conditional probability of the adjective given the noun:

Cond(n, a) := f(n, a) / f(n)
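A minimal sketch of this extraction-and-ranking step (the word{TAG} token format mirrors the slide's pattern notation; the TreeTagger invocation itself is omitted, and collapsing (\w+{NN})+ to a single noun is a simplifying assumption):

```python
import re
from collections import Counter

# Assumed input: POS-tagged text in the slide's notation, e.g.
# "the{DET} car{NN} is{VBZ} fast{JJ} . a{DET} red{JJ} car{NN}"
PRED = re.compile(r"(?:\w+\{DET\} )?(\w+)\{NN\} is\{VBZ\} (\w+)\{JJ\}")
ATTR = re.compile(r"(?:\w+\{DET\} )?(\w+)\{JJ\} (\w+)\{NN\}")

def extract_pairs(tagged_text):
    """Yield (noun, adjective) pairs matched by the two patterns."""
    for m in PRED.finditer(tagged_text):
        yield m.group(1), m.group(2)       # "car is fast"
    for m in ATTR.finditer(tagged_text):
        yield m.group(2), m.group(1)       # "red car"

def cond(pairs, threshold=0.01):
    """Cond(n, a) = f(n, a) / f(n); keep pairs above the threshold."""
    pairs = list(pairs)
    pair_freq = Counter(pairs)
    noun_freq = Counter(n for n, _ in pairs)
    return {(n, a): f / noun_freq[n]
            for (n, a), f in pair_freq.items()
            if f / noun_freq[n] >= threshold}
```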



Example: tourism corpus, threshold = 0.01, concept 'car'.

For each of the adjectives we look up the corresponding attribute in WordNet:
• age is one of {new, old}
• value is one of {black}
• numerousness/numerosity/multiplicity is one of {many}
• otherness/distinctness/separateness is one of {other}
• speed/swiftness/fastness is one of {fast}
• size is one of {small, little, big}
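WordNet exposes this adjective-to-attribute link directly; a minimal sketch using NLTK (assuming the WordNet corpus is downloaded):

```python
from nltk.corpus import wordnet as wn

def attributes_for(adjective):
    """Return the WordNet attribute nouns for an adjective; adjectives
    without an attribute link yield an empty set and are discarded."""
    attrs = set()
    for syn in wn.synsets(adjective, pos=wn.ADJ):
        for attr_syn in syn.attributes():
            attrs.update(lemma.name() for lemma in attr_syn.lemmas())
    return attrs

# e.g. attributes_for('fast') -> {'speed', 'swiftness', 'fastness'}
#      attributes_for('new')  -> {'age'}
```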

Evaluate every domain concept according to (i) its attributes and (ii) their corresponding ranges, assigning each a rating from '0' to '3':
▪ '3' means that the attribute or its range is totally reasonable and correct.
▪ '0' means that the attribute or the range does not make any sense.
A new approach that not only lists relations but finds the general relation:
• work-for(man, department), work-for(employee, institute), work-for(woman, store)
• generalizes to work-for(person, organization)





Measures for choosing the right level of generalization:
• Conditional probability.
• Pointwise mutual information (PMI).
• A measure based on the χ²-test.

Evaluate by applying the approach to the Genia corpus using the Genia ontology.
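The first two measures in their standard form (a sketch; the paper's exact normalization may differ, and the χ²-based measure is sketched further below):

```python
import math

def conditional(f_vc, f_v):
    """P(c|v): how often concept c fills an argument slot of verb v,
    relative to all of v's occurrences."""
    return f_vc / f_v

def pmi(f_vc, f_v, f_c, n):
    """Pointwise mutual information between verb v and concept c,
    with n = total number of extracted verb-argument pairs."""
    return math.log2((f_vc / n) / ((f_v / n) * (f_c / n)))
```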



Extract verb frames using Steven Abney's chunker.
Extract tuples NP-V-NP and NP-V-P-NP.
Construct binary relations from the tuples (see the sketch below):
• Use the lemmatized verb V as the relation label.
• Use the heads of the NP phrases as concepts.
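A minimal sketch of the tuple-to-relation step (the chunker output format, with NP heads and lemmatized verbs already available, is an assumption):

```python
from dataclasses import dataclass

@dataclass
class Relation:
    label: str    # lemmatized verb, e.g. "work_for"
    domain: str   # head of the first NP
    range: str    # head of the second NP

def to_relation(np1_head, verb_lemma, np2_head, prep=None):
    """Map an NP-V-NP or NP-V-P-NP tuple to a labeled binary relation."""
    label = f"{verb_lemma}_{prep}" if prep else verb_lemma
    return Relation(label, np1_head, np2_head)

# e.g. to_relation("person", "work", "employer", prep="for")
#   -> Relation(label="work_for", domain="person", range="employer")
```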



Example frequencies of candidate argument concepts:
• protein_molecule: 5
• Protein_family_or_group: 10
• amino-acid: 10


Take into account the frequency of occurrence and choose the highest one.

Penalize concepts c which occur too frequently overall:
P(amino-acid) = 0.27, P(protein) = 0.14
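One illustrative way to combine the two criteria (an assumption; the slide does not give the exact weighting used by the authors):

```python
def generality_score(f_vc, f_v, p_c):
    """Prefer concepts frequent among v's arguments, but penalize
    concepts that are frequent everywhere in the corpus (p_c).
    The multiplicative (1 - p_c) penalty is an illustrative choice."""
    return (f_vc / f_v) * (1.0 - p_c)

# With the slide's numbers, amino-acid (P = 0.27) is penalized more
# strongly than protein (P = 0.14) for the same conditional frequency.
```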

Compares contingencies between two variables (i.e., whether the two variables are statistically independent or not).

We can generalize c to c_i if the χ²-test reveals the verb v and c_i to be statistically dependent.

Level of significance = 0.05
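A minimal sketch of that test with SciPy (the 2×2 table layout is the standard one; counts are taken over all n extracted verb-argument pairs):

```python
from scipy.stats import chi2_contingency

def dependent(f_vc, f_v, f_c, n, alpha=0.05):
    """Chi-squared independence test between verb v and (generalized)
    concept c; True licenses generalizing the relation to c."""
    table = [[f_vc,        f_v - f_vc],
             [f_c - f_vc,  n - f_v - f_c + f_vc]]
    _, p, _, _ = chi2_contingency(table)
    return p < alpha
```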
The Genia corpus contains 18,546 sentences with 509,487 words and 51,170 verbs.
• Extracted 100 relations; 15 were regarded as inappropriate by a biologist evaluator.
• The remaining 85 were evaluated using:

• Direct matches for domain and range (DM),
• Average distance in terms of number of edges between the correct and predicted concept (AD),
• A symmetric variant of the Learning Accuracy (LA).


Nature of Objects
Aristotle:
• Material cause (made of)
• Agentive cause (movement, creation, change)
• Formal cause (form, type)
• Final cause (purpose, intention, aim)
Generative Lexicon framework [Pustejovsky, 1991]:
• Qualia structures:
  ▪ Constitutive (components)
  ▪ Agentive (created)
  ▪ Formal (hypernym)
  ▪ Telic (function)
Example: knife

Human
• Subjective decisions

Web
• Linguistic errors
• Ranking errors
• Commercial bias
• Erroneous information
• Lexical ambiguity

Pattern library: tuples (p, c)
• p is a pattern
• c is a clue (c: string -> string)

Given a term t and a clue c:
• c(t) is sent to the search engine

π(x) refers to the plural form of x
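A minimal sketch of such a pattern library (the concrete clue strings and the use of the inflect package to stand in for π(x) are assumptions for illustration):

```python
import inflect

plural = inflect.engine().plural          # stands in for pi(x)

# Each clue maps a term t to a search-engine query string; the
# lexico-syntactic pattern is then matched against returned snippets.
QUALIA_CLUES = {
    "telic":        lambda t: f'"a {t} is used to"',
    "formal":       lambda t: f'"such as {plural(t)}"',
    "constitutive": lambda t: f'"a {t} is made up of"',
}

# e.g. QUALIA_CLUES["telic"]("knife")  -> '"a knife is used to"'
#      QUALIA_CLUES["formal"]("knife") -> '"such as knives"'
```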

Amount words:
• variety, bundle, majority, thousands, millions, hundreds, number, numbers, set, sets, series, range

Example:
• "A conversation is made up of a series of observable interpersonal exchanges."
  ▪ Constitutive role = exchange
PURP := (\w+{VB} NP | NP be{VB} \w+{VBD})

No good patterns exist, such as:
• X is made by Y
• X is produced by Y
Instead, use a fixed set of agentive verbs:
• Agentive_verbs = {build, produce, make, write, plant, elect, create, cook, construct, design}


e = element
t = term

Lexical elements: knife, beer, book, computer

Abstract noun: conversation

Specific multi-word terms:
• natural language processing
• data mining

Students scored each result:
• 0 = incorrect
• 1 = not totally wrong
• 2 = still acceptable
• 3 = totally correct
Reasoning: formal and constitutive patterns are more ambiguous.

Maedche and Staab, 2000
• Find relations using association rules.
• A transaction is defined as words occurring together in a syntactic dependency.
• Calculate support and confidence (see the sketch below).
• Precision = 11%, Recall = 13%
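Support and confidence here are the standard association-rule measures; a minimal sketch (building the transactions from syntactic dependencies is assumed to have happened already):

```python
def support(transactions, items):
    """Fraction of transactions containing all the given items."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent | consequent) / support(antecedent)."""
    return (support(transactions, antecedent | consequent)
            / support(transactions, antecedent))

# e.g. transactions = [{"person", "work", "employer"}, {"person", "drive"}]
#      confidence(transactions, {"person"}, {"employer"}) -> 0.5
```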

Kavalec and Svatek, 2005
• Added an 'above expectation' heuristic:
  ▪ Measure the association between a verb and a pair of concepts.

Gamallo et al., 2002
• Map syntactic dependencies to semantic relations:
  1) A shallow parser plus heuristics to derive syntactic dependencies.
  2) Clustering based on syntactic positions.
• Problems:
  ▪ The mapping is underspecified.
  ▪ Largely domain dependent.

Ciaramita et al., 2005
• Statistical dependency parser to extract:
  ▪ SUBJECT-VERB-DIRECT_OBJECT
  ▪ SUBJECT-VERB-INDIRECT_OBJECT
• χ² test: keep those occurring significantly more often than by chance.
• 83% of learned relations are correct.
• 53.1% of generalized relations are correct.

Heyer et al., 2001
• Calculate second-order collocations.
• Use a set of defined rules to reason.

Ogata and Collier, 2004
• Hearst patterns for extraction.
• Use heuristic reasoning rules.

Yamaguchi, 2001
• Word space algorithm using a 4-word window.
• Cosine similarity measure:
  ▪ If similarity > threshold, a relationship is assumed.
• Precision = 59.89% for a legal corpus.
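A minimal sketch of the word-space idea (plain co-occurrence counts in a ±4-word window; the threshold value is not given on the slide):

```python
import math
from collections import Counter, defaultdict

def word_vectors(tokens, window=4):
    """Word space: co-occurrence counts within a +/- 4-word window."""
    vecs = defaultdict(Counter)
    for i, w in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        for ctx in context:
            vecs[w][ctx] += 1
    return vecs

def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

# Two words are related if cosine(vecs[a], vecs[b]) > threshold.
```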

Poesio and Almuhareb, 2005
• Classify attributes into one of six categories:
  ▪ quality, part, related-object, activity, related-agent, non-attribute
• The classifier was trained using:
  ▪ morphological information, clustering results, search engine results, and heuristics
• Better results from combining related-object and part.
• F-measure = 53.8% for the non-attribute class, and between 81% and 95% for the other classes.

Claveau et al., 2003
• Inductive Logic Programming approach.
• Does not distinguish between different qualia roles.



• Learning relations from non-verbal structures
• A gold standard for qualia structures
• Deriving a reasoning calculus

Strengths
• Explained their methods in detail.

Weaknesses
• Requires a lot of NLP background knowledge.
• Short summaries of others' work.