Information Inference: Mimicking human text-based reasoning
P.D. Bruza & D. Song
Information Ecology Project, Distributed Systems Technology Centre

Penguin Books U.K.
Why Linus chose a penguin
Surfing the Himalayas

Introductory remarks
- Information inference is a common and real phenomenon.
- It can be modelled by symbolic inference, but this isn’t satisfying.
- The inferences are often latent associations triggered by seeing a word (or words) in the context of other words. So inference is not deductive; it is about producing implicit associations appropriate to the context.
- We need to look at the problem from a cognitive perspective.

Since last time...
- The (philosophical) positioning of the work is clearer.
- Some encouraging experimental results using information inference to derive query models.
- Some initial ideas about how information inference fits into an abductive logic for text-based knowledge discovery.

Dretske’s information content
To a person with prior knowledge K, r being F carries the information that s is G if and only if the conditional probability of s being G, given that r is F (and K), is 1, but is less than 1 given K alone. We can then say that s being G is inferred (informationally) from r being F and K.
T = “Why Linus chose a penguin”
K = {Linus Torvalds invented Linux, The Linux logo is a penguin, Linus is a cartoon character in “Peanuts”}
$\Pr(\text{“Linus” is “Linus Torvalds”} \mid K) < 1$
$\Pr(\text{“Linus” is “Linus Torvalds”} \mid K, \text{“Linus” is with “penguin” in } T) < 1$
So Dretske’s definition does not permit the inference that “Linus” is “Linus Torvalds”, though a human being may proceed under this “hasty” judgment.
Dretske’s information content “sets too high a standard” (Barwise & Seligman).

Inferential information content (Barwise & Seligman)
To a person with prior knowledge K, r being F carries the information that s is G if the person could legitimately infer that s is G from r being F together with K (but could not from K alone).
T = “Why Linus chose a penguin”
K = {Linus Torvalds invented Linux, The Linux logo is a penguin, Linus is a cartoon character in “Peanuts”}
- “Linus” being “Linus Torvalds” can’t be legitimately inferred from K alone.
- “Linus” being with “penguin” in T, together with K, carries the information that “Linus” is “Linus Torvalds”.

Barwise & Seligman (cont’d)
“… by relativizing information flow to human inference, this definition makes room for different standards in what sorts of inferences the person is able and willing to make”
Remarks:
- A psychologistic stance is taken.
- Onerous from an engineering standpoint: “different standards” implies nonmonotonicity. Consider “Linux Online: Why Linus chose a penguin” (willing) vs. “Why Linus chose a penguin” (not willing).

Consequences of psychologism
- Representations of information need not be propositional.
- Semantics is not a model-theoretic issue but a cognitive one: the “meanings” stored and manipulated by the system should accord with what we have in our heads.

Gärdenfors’ cognitive model
- Symbolic: propositional representation
- Conceptual: geometric representation
- Associationist (sub-conceptual): connectionist representation

Conceptual spaces: the property “red”
[Figure: red(x) as a region in a space with dimensions hue, chromaticity and brightness]
Properties and concepts are dimensional (geometric) objects. Dimensions may be integral: the value in one dimension (or dimensions) determines the value in another.
Barwise & Seligman’s real-valued state spaces
red: hue: 445, chrom: 0.6, brightness: 0.7 (obtained via an observation function)

Gärdenfors’ cognitive model: how we realize it
- Symbolic (propositional representation): keywords
- Conceptual (geometric representation): LSA, HAL
- Associationist (sub-conceptual): connectionist representation

Geometric representations of words via Hyperspace Analogue to Language (HAL)
reagan = <administration: 0.45, bill: 0.05, budget: 0.07, house: 0.06, president: 0.83, reagan: 0.21, trade: 0.05, veto: 0.06, … >
This example demonstrates how a word is represented as a weighted vector whose dimensions comprise other words. The weights represent the strengths of association between “reagan” and other words seen in the same context(s).

How HAL vectors are constructed
“…Kemp urges Reagan to oppose stock tax…”
- Slide a window of width n across the corpus.
- Per word: compute the weight of association with the other words within the window; the weight is inversely proportional to distance.
HAL space: each word in the corpus is represented by a multi-dimensional vector, a weighted sum of the contexts the word appeared in. (Burgess et al. refer to it as a “high dimensional context space” or a “high dimensional semantic space”.)

Remarks about HAL
- A HAL space is easy to construct.
- Cognitive compatibility with human information processing: “word representations learned by HAL account for a variety of semantic phenomena” (Burgess et al.). HAL vectors are therefore good candidates for represented “meanings” in accord with our psychologistic stance.
- A HAL space is a real-valued state space, which opens the door to driving information inference according to Barwise & Seligman’s definition: a HAL vector represents a word’s “state” in the context of the text corpus it was derived from.

Differences with Burgess et al.
- We (often) normalize the weights.
- Pre- and post-window vectors are added into a single vector.
- HAL vectors derived from small text corpora (e.g., Reuters-21578) seem to be OK.

HAL vectors are “summed” representations, similar in spirit to “prototypical concepts” (which are averaged representations).

Reagan traces
- President Reagan was ignorant about much of the Iran arms scandal
- Reagan says U.S. to offer missile treaty
- REAGAN SEEKS MORE AID FOR CENTRAL AMERICA
- Kemp urges Reagan to oppose stock tax

Prototypical concepts
Prototypical “Reagan” = average of the vectors from the traces:
president: 3.23, administration: 1.82, trade: 0.40, budget: 0.37, veto: 0.34, bill: 0.31, congress: 0.31, tax: 0.29, …

Concept combination: “Pink Elephant”
[Figure: Elephant = <…>]

Heuristic concept combination: “Star wars”
Observation: “star” dominates “wars”.
star = <trek: 0.2, episode: 0.05, soviet: 0.3, bush: 0.4, missile: 0.25>
wars = <soviet: 0.1, missile: 0.2, iran: 0.33, iraq: 0.28, gulf: 0.4>
starwars = <trek: 0.3, episode: 0.15, soviet: 0.6, bush: 0.53, missile: 0.65, iran: 0.2, iraq: 0.18, gulf: 0.25>
How should dimensions be weighted appropriately according to context? Weights are affected by how one concept appears in the light of another concept: intersecting dimensions are emphasized, and weights are adjusted according to the degree of dominance. (NB: moving prototypical concepts in the HAL space is a cleaner way of dealing with context.)

Theoretical background: information inference via HAL-based information flow computations
- Barwise & Seligman, state-based “information flow” (symbolic level): $i_1, \ldots, i_n \vdash j$ iff $s(i_1) \cap \cdots \cap s(i_n) \subseteq s(j)$; e.g., $\text{on}, \text{live} \vdash \text{light}$ iff $s(\text{on}) \cap s(\text{live}) \subseteq s(\text{light})$.
- HAL-based “information flow” (conceptual level): $i_1, \ldots, i_n \vdash j$ iff $\mathrm{degree}(c_i \rhd c_j) > \lambda$, where $c_i$ is the concept obtained by combining the HAL vectors of $i_1, \ldots, i_n$ and $\lambda$ is a threshold; e.g., $\text{reagan}, \text{iran} \vdash \text{scandal}$.

Degree of inclusion (flow) computation
$\mathrm{degree}(c_i \rhd c_j) = \dfrac{\sum_{p_l \in (QP_\mu(c_i) \cap QP(c_j))} w_{c_i p_l}}{\sum_{p_k \in QP_\mu(c_i)} w_{c_i p_k}}$
Here $c_i$ is the source concept and $c_j$ the target. $QP_\mu(c_i)$ denotes the “quality properties” of the source: the dimensions whose weight lies above the mean weight $\mu$ of the source concept. $QP(c_j)$ denotes the properties (dimensions) of the target.
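The HAL construction from the preceding slides and the degree-of-inclusion formula above can be sketched in Python as follows. The construction slides a window, lets weights fall off with distance, adds pre- and post-window weights into one vector and normalizes it. This is a minimal sketch under stated assumptions, not the authors’ implementation. It assumes the linear HAL weighting `window - k + 1` for a word at distance k, Euclidean (L2) normalization, and $\mu$ taken as the mean of the source vector’s stored (non-zero) weights. The function names `build_hal`, `quality_properties` and `degree` are hypothetical.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {p: w / norm for p, w in vec.items()} if norm else dict(vec)


def build_hal(tokens, window=5):
    """Hypothetical HAL builder: one summed, normalized vector per word.

    Assumption: a word at distance k (1 <= k <= window) contributes
    weight window - k + 1, so nearer words count more.
    """
    vectors = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for k in range(1, window + 1):
            weight = window - k + 1
            if i - k >= 0:              # preceding context ("pre" weights)
                vectors[word][tokens[i - k]] += weight
            if i + k < len(tokens):     # following context ("post" weights)
                vectors[word][tokens[i + k]] += weight
    # Pre- and post-window weights were accumulated into the same vector.
    return {word: normalize(vec) for word, vec in vectors.items()}


def quality_properties(vec):
    """QP_mu: dimensions whose weight is above the vector's mean weight."""
    if not vec:
        return set()
    mean = sum(vec.values()) / len(vec)
    return {p for p, w in vec.items() if w > mean}


def degree(source, target):
    """degree(source |> target): share of the source's quality-property
    weight that falls on dimensions the target also has (non-zero weight)."""
    qp = quality_properties(source)
    den = sum(w for p, w in source.items() if p in qp)
    num = sum(w for p, w in source.items()
              if p in qp and target.get(p, 0.0) > 0)
    return num / den if den else 0.0


if __name__ == "__main__":
    hal = build_hal("kemp urges reagan to oppose stock tax".split(), window=2)
    print(sorted(hal["reagan"].items()))

    # Toy version of the A/F/K/Q picture: the source's quality properties
    # are A, F, K, Q, and the target contains A, F and K but not Q.
    source = {"A": 0.9, "F": 0.8, "K": 0.7, "Q": 0.6, "x": 0.1, "y": 0.1, "z": 0.1}
    target = {d: 0.5 for d in "ABCDFGKLM"}
    print(degree(source, target))  # (0.9 + 0.8 + 0.7) / (0.9 + 0.8 + 0.7 + 0.6)
```

Because the numerator only ever drops terms from the denominator, the degree lies in [0, 1]. A concept’s degree of inclusion in itself is 1.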
(Intuition: how much of the salient aspects of the source are contained in the target.) The degree is the ratio of the source’s weight on quality properties it shares with the target to its total weight on quality properties.

Visualizing degree of inclusion between HAL vectors
[Figure: source quality properties A, F, K, Q; target dimensions A, B, C, D, F, G, K, L, M]
Many of the above-average “quality properties” of the source concept (A, F, K) are present in the target, so the degree of inclusion will be high.

Information inference in practice: deriving query models
- Construct HAL vectors for all vocabulary terms from the document collection.
- Given a query such as “space program”, compute the information flows from it and use them to expand the query, e.g., $\text{space program} \vdash \text{nasa}$ (a query expansion term derived via information flow computation).
- We used the top 80 information flows for expansion without feedback, and the top 65 with feedback.

The experiments
- Associated Press 88/89 collections; TREC topics 1-50, 101-150 and 151-200 (titles only).
- Models for comparison: baseline, composition model, relevance model, Markov chain model.

Baseline model
- BM-25 term weighting (terms were stemmed)
- Replication of Lafferty & Zhai’s baseline (SIGIR 2001)
- Dot product matching function

Composition model
Combine the HAL vectors of the individual query terms by recursively applying the concept combination heuristic; query terms are ranked according to idf (dominance ranking), e.g.:
starwars = <trek: 0.3, episode: 0.15, soviet: 0.6, bush: 0.53, missile: 0.65, iran: 0.2, iraq: 0.18, gulf: 0.25>

Results
Measure   Baseline model   Composition model   Info flow model
AvgPr     0.182            0.197 (+8%)         0.247 (+35%)
InitPr    0.476            0.520 (+10%)        0.544 (+14%)
Recall    1667/3301        1996/3301 (+15%)    2269/3301 (+35%)

The effect of information inference
26% of the 35% improvement in precision of the HAL-based information flow model is due to information inference. For example, for the query “space program” the information flow model infers query expansion terms such as “Reagan”, “satellites”, “scientists”, “pentagon”, “mars” and “moon”.
These are real inferences with respect to “space program”, as these terms do not appear as dimensions in the HAL vector of the concept combination spaceprogram.

Comparison with probabilistic query language models
MC: Markov chain model (Lafferty & Zhai, SIGIR 2001)
Topics 1-50, AP89 (scores are average precision):
MC      IM      MCwP    IMwP
0.201   0.247   0.232   0.258

Comparison with probabilistic query language models (cont’d)
RM: Relevance model (Lavrenko & Croft, SIGIR 2001)
Topics         IM      IMwP    RM
101-150 AP     0.265   0.301   0.261
151-200 AP     0.298   0.344   0.319
Scores are average precision.

Text-based scientific discovery
[Figure: fish oil (A) linked to Raynaud (C) via intermediate concepts B1 blood viscosity, B2 platelet aggregation, B3 vascular reactivity]
“.., he made the connection between these literatures and formulated the hypothesis that fish oil may be used for treating Raynaud’s disease..” (Weeber et al., “Using Concepts in Literature-Based Discovery”, JASIST 52(7):548-557)

Logic of abduction (Gabbay & Woods)
[Figure: abductive logic = logic of discovery (HAL-based information flow?) + logic of justification (hypothesis testing?)]

Raw material for abduction?
Information flows from “Raynaud”:
Raynaud: 1.0, myocardial: 0.56, coronary: 0.54, renal: 0.52, ventricular: 0.52, …, oil: 0.23, …, fish: 0.20, …
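The ranked information flows from “Raynaud” above and the query models in the experiments both amount to scoring every vocabulary term by its degree of inclusion from a source concept. For a multi-term query, the source is first built with the concept combination heuristic. What follows is a minimal Python sketch under stated assumptions, not the authors’ method:
- The toy vectors are invented for illustration.
- `combine` is a simplified stand-in for the concept combination heuristic, with hypothetical parameters `l1`, `l2` and `alpha`. It does not reproduce the slide’s starwars weights exactly.
- The highest-idf term is assumed to be dominant.
- All function names are hypothetical.

The default `k=80` mirrors the 80 flows used for expansion without feedback.

```python
import math


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {p: w / norm for p, w in vec.items()} if norm else dict(vec)


def combine(dominant, other, l1=0.6, l2=0.4, alpha=1.5):
    """Hypothetical concept combination (dominant (+) other).

    l1 > l2 lets the dominant concept's weights count for more; dimensions
    present in both concepts are boosted by alpha > 1. Values are illustrative.
    """
    combined = {}
    for p in set(dominant) | set(other):
        w = l1 * dominant.get(p, 0.0) + l2 * other.get(p, 0.0)
        if p in dominant and p in other:
            w *= alpha  # emphasize intersecting dimensions
        combined[p] = w
    return normalize(combined)


def combine_query(terms, vectors, idf):
    """Recursively fold query terms into one vector, most dominant first.

    Assumption: the term with the highest idf is treated as dominant.
    """
    ranked = sorted(terms, key=lambda t: idf[t], reverse=True)
    result = vectors[ranked[0]]
    for term in ranked[1:]:
        result = combine(result, vectors[term])
    return result


def degree(source, target):
    """Share of the source's above-mean weight on dimensions the target has."""
    if not source:
        return 0.0
    mean = sum(source.values()) / len(source)
    qp = {p for p, w in source.items() if w > mean}
    den = sum(w for p, w in source.items() if p in qp)
    num = sum(w for p, w in source.items() if p in qp and target.get(p, 0.0) > 0)
    return num / den if den else 0.0


def information_flows(source, vectors, k=80):
    """Rank vocabulary terms by degree(source |> term) and keep the top k."""
    scored = [(term, degree(source, vec)) for term, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy HAL vectors, invented for illustration (not derived from AP data).
    vectors = {
        "space": {"nasa": 0.9, "shuttle": 0.7, "station": 0.2},
        "program": {"nasa": 0.6, "budget": 0.5, "funding": 0.1},
        "mars": {"nasa": 0.5, "shuttle": 0.3, "planet": 0.9},
        "tax": {"budget": 0.2, "congress": 0.9},
    }
    idf = {"space": 2.0, "program": 1.0}  # hypothetical idf values
    query = combine_query(["space", "program"], vectors, idf)
    for term, score in information_flows(query, vectors, k=3):
        print(term, round(score, 3))
```

With these toy vectors, “mars” receives degree 1.0 from the combined “space program” vector even though it is not one of that vector’s dimensions. This is the kind of “real inference” described for the space program query. As in the Raynaud list, a concept’s flow to itself has degree 1.0.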
Some promise, but the lack of representation of integral dimensions is a problem.

Index expressions
“Beneficial effects of fish oil on blood viscosity”
[Figure: index expression tree with head “beneficial effects”, linked by “of” to “fish oil” and by “on” to “blood viscosity”]

Power index expressions for representing integral dimensions
[Figure: sub-expressions such as “eff of fish oil”, “fish oil”, “eff on blood viscosity”, “effects”, “blood viscosity”]
Information flows are single terms; power index expressions determine how they may be combined into higher-order syntactic structures.

Initial results from using information flow computations as a logic of discovery
[Table residue: an unlabeled numeric column (27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 25, 23, 23, 4) alongside terms with their information-flow weights: ventricular (0.52), infarction (0.46), thromboplastin (0.17), pulmonary (0.51), arteries (0.25), placental (0.19), protein (0.42), monoamine (0.17), oxidase (0.18), lupus (0.37), nephritis (0.17), instruments (0.17), coagulant (0.21), blood (0.63), coagulation (0.29), umbilical (0.24), vein (0.32), fish (0.20), viscosity (0.21), cigarette (0.26), smokers (0.22), fish (0.20), oil (0.23)]

Summary
- (Barwise & Seligman) and Gärdenfors have a very similar stance with respect to the “human stance” (Gabbay and Woods also): psychologism is alive.
- An integration of a primitive approximation of a conceptual space with an information inference mechanism driven by information flow computations.
- An initial attempt towards realizing Gärdenfors’ conceptual spaces:
  - A HAL space is only a primitive approximation.
  - We are looking at Voronoi tessellations.
- A tiny contribution to Barwise & Seligman’s call for a “distinctively different model of human reasoning” (we are looking beyond IR).