Download Introduction – First-Order Probabilistic Models

IJCAI 2003 Workshop on Learning Statistical Models from Relational Data First-Order Probabilistic Models for Information Extraction Bhaskara Marthi, Brian Milch, Stuart Russell Computer Science Div. University of California NIPS 15th, 2003 Identity Uncertainty and Citation Matching Hanna Pasula, Bhaskara Marthi, Brian Milch, Stuart Russell, Ilya Shpitser Computer Science Div. University of California Advisor: Hsin-His Chen Reporter: Chi-Hsin Yu Date: 2007.06.21 Outlines       Introduction Related works Models for the bibliography domain Experiment on model A Desiderata for a FOPL Conclusions 2/18 Introduction – Citation Matching Problem  Citation matching:   the problem of deciding which citations correspond to the same publication Difficulties     Different citation styles An imperfect copy of the book’s title Different ways to refer an object (identity) Ambiguity  “Wauchope, K. Eucalyptus: Integrating Natural language Input with a Graphical User Interface”   Tasks    Author: “Wauchope, K. Eucalyptus” or “Wauchope, K.” ? Parsing Disambiguation Matching 3/18 Introduction – Citation Matching Problem: Examples Journal of Artificial Intelligence Research, or Artificial Intelligence Journal ?? 4/18 Introduction – First-Order Probabilistic Models Logic Propositio nal First-order Probabilistic Model Formula A  B P(j  m  a  b  e), Bayesian Network Inference/ Algorithms Resolution, Model Checking, Forward chaining, DPLL, WalkSAT … Bayes’ rule, Summing-out, smoothing, prediction, approximation (likely-hood, MCMC …), … Formula x King(x)  Greedy(x)  Evil(x) x King(x)  Greedy(x)  Evil(x) = 0.8766 Inference/ Algorithms Unification, Resolution, … Learning, Approximation, … System/ Languages Prolog, Rule Engine (JBoss), … FOPL, RPM 5/18 Introduction – Result of Model B 6/18 Related Works  IE   Bayesian modeling    the Message Understanding Conferences [DARPA,1998] finding stochastically repeated patterns (motifs) in DNA sequences [Xing et al., 2003] Robot localization [Anguelov et al., 2002] FOPL/RPM (Relational Prob. Model)  A. Pfeffer. Probabilistic Reasoning for Complex Systems. PhD thesis, Stanford, 2000. 7/18 Models for the Bibliography Domain – Model A  [Pasula et al. 2003] 8/18 Models for the Bibliography Domain – Model A (Cont.)   Suggest a declarative approach to identity uncertainty using a formal language Algorithm  Steps Generate objects/instances  Parse and fill attributes  Inference (Approximation, MCMC)   Cluster the identity (publication) 9/18 Models for the Bibliography Domain – Model A (Cont.)  Attributes using unconditional probability  learn several bigram models   using the following resources     letter-based models of first names, surnames, and title words the 2000 Census data on US names a large A.I. BibTeX bibliography a hand-parsed collection of 500 citations Attributes using conditional probability  Using noise channels for some attributes    Citation.parse    It keeps track of the segmentation of Citation.text An author segment, a title segment, and three filler segments (one before, one after, and one in between) Citation.text   the corruption models of Citation.obsTitle, AuthorAsCited.surname, and AuthorAsCited.fnames The parameters of the corruption models are learnt online, using stochastic EM Be constrained by Citation.parse, Paper.pubType, … These models were learned using our pre-segmented file. 10/18 Models for the Bibliography Domain – Model B Publisher Name, City Collection Authors Name, Type, Date, Publisher Name Area+ (Fields) Publication Citation Groups Title Area Type (Book/conf. …) AuthorList Collection Type (Area, Author) Style PublicationList CitationList Citation Publication TitleAsCited AuthorsAsCited Text Parse 11/18 Models for the Bibliography Domain – Model B (Cont.)  Generating objects     The set of Author objects, and the set of Collection objects are generated independently. the set of Publication objects is generated conditional on the Authors and Collections. CitationGroup objects are generated conditional on the Authors and Collections. Citation objects are generated from the CitationGroups. 12/18 Models for the Bibliography Domain – Model B (Cont.)  Fill attributes  Author.Name   Publications.Title   is chosen from a mixture of a letter bigram distribution with a distribution that chooses from a set of commonly occurring names is generated from an n-gram model, conditioned on Publications.area More specific relations and conditions between attributes 13/18 Experiment on model A – Experiment Setting  Dataset    Citeseer’s hand-matched datasets Each of these datasets contains several hundred citations of machine learning papers Citeseer’s phrase matching algorithm   a greedy agglomerative clustering method  based on a metric that measures the degrees to which the words and phrases of any two citations overlap half of them in clusters ranging in size from two to twenty-one citations 14/18 Experiment on model A – Experiment Result 15/18 Desiderata for a FOPL  Contains      A probability distribution over possible worlds The expression power to model the relational structure of the world An efficient inference algorithm A learning procedure which allows priors over the parameters Has the ability      to answer queries to make inferences about the existence or nonexistence of objects having particular properties to represent common types of compound objects to represent probabilistic dependencies to incorporate domain knowledge into the inference algorithms 16/18 Conclusions  First-order probabilistic models   a useful, probably necessary, component of any system that extracts complex relational information from unstructured text data Some of the directions we plan to pursue in the future   defining a representation language that allows such models to be specified declaratively, scaling up the inference procedure to handle large knowledge bases 17/18 Thanks!!

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Introduction – First-Order Probabilistic Models