Entity Resolution with Markov Logic
Parag Singla
Pedro Domingos
Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-2350, U.S.A.
Overview
Introduction
Technical Tools
Markov network
First Order Logic – Knowledge base
Markov Logic
Problem Formulation
Experiment
Introduction
• Assume we have a student who takes a few courses at different institutions.
• His grades have to be kept in one unified system.
• His name is Ellen Musk.
[Figure: the main database collects records for one real student, Ellen Musk, under five name variants: Ellen musk (Technion), Musk Ellen (Haifa University), Ellen.M (Ben-Gurion University), Musk E (Tel-Aviv University), and Mask Ellen (Bar-Ilan University).]
What we would have in the Main database

Name       | Institute             | Grades
Ellen musk | Technion              | {100}
Musk Ellen | Haifa University      | {100}
Ellen.M    | Ben-Gurion University | {100}
Musk E     | Tel-Aviv University   | {100, 74}
Mask Ellen | Bar-Ilan University   | {100, 77, 61, 54}
We have a lot of records for one student
• In a big database, this redundancy can be costly.
• In order to know the student's weak points, we need to know at least all of his grades.
[Figure: the five records (Ellen musk, Musk Ellen, Ellen.M, Musk E, Mask Ellen) all belong to the same student.]
Solution - Entity Resolution
• Entity Resolution = Record linkage = determining which records in a
database refer to the same entities.
[Figure: entity resolution links the five name variants (Ellen musk, Musk Ellen, Ellen.M, Musk E, Mask Ellen) to a single entity.]
Conclusion
• Data preparation is needed in the data mining process.
• Data from relevant sources must be collected, integrated, scrubbed
and pre-processed.
• Correctly merging these records and the information they represent is
an essential step in producing data of sufficient quality for mining.
Technical Tools
Markov Networks - MRF
• A model for the joint distribution of random variables X = (X1, X2, X3, ..., Xn).
• Defined by an undirected graph G: a node for each variable, with edges representing direct dependencies.
[Figure: example network over Smoking, Cancer, Asthma, Cough.]
• Smoking does not depend on Asthma given Cancer:
P(Smoking | Cancer) = P(Smoking | Cancer, Asthma)
Markov Networks
• Undirected graphical models.
[Figure: the same network over Smoking, Cancer, Asthma, Cough, now with potentials over its cliques.]
• Potential functions defined over cliques:
P(x) = (1/Z) ∏_c Φ_c(x_c),  where Z = Σ_x ∏_c Φ_c(x_c)
Smoking | Cancer | Φ(S,C)
False   | False  | 4.5
False   | True   | 4.5
True    | False  | 2.7
True    | True   | 4.5
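To make the definition concrete, here is a minimal Python sketch (ours, not from the slides) that computes the joint distribution of the single-clique Smoking-Cancer network from the potential table above.

```python
from itertools import product

# Potential over the (Smoking, Cancer) clique, taken from the table above.
phi = {
    (False, False): 4.5,
    (False, True):  4.5,
    (True,  False): 2.7,
    (True,  True):  4.5,
}

# Partition function: sum of the clique-potential product over all states.
Z = sum(phi[(s, c)] for s, c in product([False, True], repeat=2))

def p(smoking: bool, cancer: bool) -> float:
    """P(x) = (1/Z) * prod_c phi_c(x_c); here there is a single clique."""
    return phi[(smoking, cancer)] / Z

for s, c in product([False, True], repeat=2):
    print(f"P(Smoking={s}, Cancer={c}) = {p(s, c):.3f}")
```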
First Order Logic – Knowledge base
• A first-order knowledge base (KB) is a set of formulas in first-order logic.
First Order Logic – Formulas
A formula is built from:
• Constants: objects in the domain (e.g., people: Anna, Bob, etc.).
• Variables: range over the objects in the domain (e.g., x, y).
• Functions: map objects in the domain to objects (e.g., MotherOf(Bob)).
• Relations (predicates): hold among objects (e.g., Friends(Bob, Anna)) or express attributes of objects (e.g., Smokes(Bob)).
First Order Logic – Term and Atom
• Term: represents an object.
Term = constant | variable | function(term1, term2, ..., termN)
• Atom = Predicate(term1, term2, ..., termN), e.g., Friends(x, MotherOf(Anna)).
• Ground term = a term without any variables.
First Order Logic – Term and Atom
• Term = constant | variable | function(term1, term2, ..., termN)
• Atom = Predicate(term1, term2, ..., termN)
• Ground term = a term without any variables.
• Ground predicate/atom = a predicate applied to ground terms only.
• Negative literal = a negated atomic formula.
• Formulas are recursively constructed from atomic formulas using the logical connectives and quantifiers: ¬, ∧, ∨, ⇒, ⇔, ∀, ∃.
First-Order KB
• We would like to determine whether a KB is satisfiable: whether there is an assignment of truth values to the ground atoms under which every formula in the KB is true.
• A possible world assigns a truth value to each possible ground atom.
• We can determine whether a KB is satisfiable by using solvers.
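For a KB with only a handful of ground atoms, satisfiability can even be checked by brute force. Below is a minimal sketch (the tiny KB is a hypothetical example of ours, not from the slides); real systems use SAT solvers instead.

```python
from itertools import product

# Ground atoms of a tiny hypothetical KB.
atoms = ["Plays(A)", "Plays(B)", "Friends(A,B)"]

# Each formula is a function of a possible world (dict: ground atom -> truth value).
kb = [
    # Friends(A,B) => (Plays(A) <=> Plays(B))
    lambda w: (not w["Friends(A,B)"]) or (w["Plays(A)"] == w["Plays(B)"]),
    # Friends(A,B)
    lambda w: w["Friends(A,B)"],
]

def satisfying_world(atoms, kb):
    """Enumerate all 2^n possible worlds; return one satisfying every formula, or None."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(f(world) for f in kb):
            return world
    return None

print(satisfying_world(atoms, kb))
```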
Motivation for Markov Logic
Ideally, we want a framework that can incorporate the advantages of both first-order logic and probabilistic graphical models.
Markov Logic – Introduction
• A first-order KB can be seen as a set of hard constraints on the set of
possible worlds.
• If a world violates even one formula, it has zero probability.
• The basic idea in Markov logic is to soften these constraints: when a
world violates one formula in the KB it is less probable, but not
impossible.
• The fewer formulas a world violates, the more probable it is.
• Give each formula a weight (higher weight ⇒ stronger constraint).
Markov Logic Networks – Example
Formula: Friends(x, y) ⇒ (Plays(x) ⇔ Plays(y))
Constants: Alice (A) and Bob (B)
[Figure: the resulting ground Markov network, with nodes Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B).]
Markov Logic Network
Weighted formulas:
ω = 3: Friends(x, y) ⇒ (Plays(x) ⇔ Plays(y))
ω = 2: Plays(x) ⇒ Fired(x)
Each grounding of a formula contributes its weight ω when it is satisfied, and 0 when it is violated:

Friends(x,y) | Plays(x) | Plays(y) | ω
True         | True     | True     | 3
True         | True     | False    | 0
True         | False    | True     | 0
True         | False    | False    | 3
False        | any      | any      | 3

Plays(x) | Fired(x) | ω
True     | True     | 2
True     | False    | 0
False    | True     | 2
False    | False    | 2

[Figure: the ground network over constants A and B, with nodes Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B), Fired(A), Fired(B).]
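A minimal sketch of how the ground network above assigns a score to a world: each satisfied grounding of a formula contributes its weight. The representation (atoms as strings, formulas as Python functions) is our own simplification.

```python
from itertools import product

CONSTANTS = ["A", "B"]

def f_friends(w, x, y):
    """Friends(x,y) => (Plays(x) <=> Plays(y)), weight 3."""
    return (not w[f"Friends({x},{y})"]) or (w[f"Plays({x})"] == w[f"Plays({y})"])

def f_fired(w, x):
    """Plays(x) => Fired(x), weight 2."""
    return (not w[f"Plays({x})"]) or w[f"Fired({x})"]

def world_score(w):
    """Total weight of satisfied ground formulas (the log-weight of the world)."""
    score = 0.0
    for x, y in product(CONSTANTS, repeat=2):
        score += 3.0 * f_friends(w, x, y)
    for x in CONSTANTS:
        score += 2.0 * f_fired(w, x)
    return score

# One example world over the 8 ground atoms of the network.
world = {"Friends(A,A)": True, "Friends(A,B)": True, "Friends(B,A)": False,
         "Friends(B,B)": True, "Plays(A)": True, "Plays(B)": True,
         "Fired(A)": True, "Fired(B)": False}
print(world_score(world))  # 14.0: only the grounding Plays(B) => Fired(B) is violated
```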
Markov Logic - Definition
• A Markov logic network (MLN) L is a set of pairs (F_i, w_i), where F_i is a formula in first-order logic and w_i is a real number. Together with a set of constants C = {c1, c2, c3, ..., c_|C|}, it defines a ground Markov network M_{L,C}:
1. M_{L,C} has one feature for each grounding of each formula F_i in the MLN, with the corresponding weight w_i.
2. M_{L,C} has one node for each grounding of each predicate in the MLN.
Markov Logic - Definition
An MLN can be viewed as an extension of Markov networks:
• An MLN without variables is an ordinary Markov network.
• An MLN with all weights infinite is a purely logical KB.
Markov Logic - Definition
• P(X = x) = (1/Z) exp(Σ_i w_i n_i(x))
• F_i is the i-th formula in the MLN, n_i(x) is the number of true groundings of F_i in the world x, and w_i is the weight of formula i.
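A small sketch of this distribution for a hypothetical one-formula MLN (Plays(x) ⇒ Fired(x) with w1 = 2 over constants {A, B}); Z is computed by enumerating all 2^4 worlds, which is only feasible for tiny domains.

```python
import math
from itertools import product

ATOMS = ["Plays(A)", "Plays(B)", "Fired(A)", "Fired(B)"]
W1 = 2.0

def n1(world):
    """n_1(x): number of true groundings of Plays(x) => Fired(x) in the world."""
    return sum((not world[f"Plays({c})"]) or world[f"Fired({c})"] for c in "AB")

worlds = [dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=len(ATOMS))]
Z = sum(math.exp(W1 * n1(w)) for w in worlds)  # partition function

def p(world):
    """P(X = x) = exp(sum_i w_i * n_i(x)) / Z."""
    return math.exp(W1 * n1(world)) / Z

x = dict(zip(ATOMS, [True, True, True, False]))  # both play, only A was fired
print(f"n1(x) = {n1(x)}, P(x) = {p(x):.4f}")
```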
Relations and weights in Markov Logic Network
• When added to an MLN with a finite weight, such formulas capture an important statistical regularity: if two objects are in the same relation to the same object, this is evidence that they may be the same.
• Some relations provide stronger evidence than others; this is captured by assigning a different weight to the formula for each relation R.
Example
[Figure: are Article 1 and Article 2 the same? Different relations give evidence of different strength: R_SameTitle > R_SameAuthor > R_SameVenue > R_Nothing.]
Goal
• The goal of entity resolution: for each pair of objects of the same type (x1, x2), determine whether they represent the same entity, i.e., whether x1 = x2.
Equality in First-Order Logic
• Most systems for inference in first-order logic make the unique names
assumption: different constants refer to different objects in the
domain.
Equality
• Formally, the equality predicate Equals(x, y), written x = y, satisfies:
Reflexivity: ∀x. x = x.
Symmetry: ∀x, y. x = y ⇒ y = x.
Transitivity: ∀x, y, z. (x = y ∧ y = z) ⇒ x = z.
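Transitivity is what makes entity resolution collective: match decisions propagate across records. As an illustration (ours, not the paper's inference procedure), a union-find structure computes the transitive closure of a set of pairwise match decisions:

```python
class UnionFind:
    """Disjoint sets: merging matched pairs yields the transitive closure of '='."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

# Pairwise match decisions among the five records from the introduction.
matches = [("Ellen musk", "Musk Ellen"), ("Musk Ellen", "Ellen.M"),
           ("Ellen.M", "Musk E"), ("Musk E", "Mask Ellen")]

uf = UnionFind()
for a, b in matches:
    uf.union(a, b)

# By transitivity, all five records fall into one cluster.
print(uf.find("Ellen musk") == uf.find("Mask Ellen"))  # True
```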
Real-World Word Problems
• Words have grammatical inflections.
• Words may be misspelled.
Comparison
• We define the predicate HasWord(field, word), which is true iff the field contains the word.
• Now we can deal with inflection.
• Example:
fields = {Math, Physics, Biology, Chemistry}
HasWord(Math, Mol) = false
HasWord(Chemistry, Mol) = true
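A minimal sketch of the HasWord predicate, plus a cheap word-overlap field similarity built on it; the tokenization rule is our assumption.

```python
import re

def has_word(field: str, word: str) -> bool:
    """HasWord(field, word): true iff the field contains the word as a token."""
    return word.lower() in re.findall(r"\w+", field.lower())

def word_overlap(field1: str, field2: str) -> float:
    """Cheap field similarity: Jaccard overlap of the two word sets."""
    w1 = set(re.findall(r"\w+", field1.lower()))
    w2 = set(re.findall(r"\w+", field2.lower()))
    return len(w1 & w2) / len(w1 | w2) if (w1 | w2) else 0.0

print(has_word("Advanced Math", "Mol"))               # False
print(has_word("Chemistry: the Mol concept", "Mol"))  # True
```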
Not enough for good comparison …
• Misspellings: variant spellings and abbreviations of a word are treated as completely different words.
Comparison
• A solution to this problem is to compare word strings by the n-grams they contain.
• An n-gram is a contiguous substring of n characters of a word.
Comparison
• We add to our framework the predicate HasNgram(word, ngram), which is true iff the n-gram is a substring of the word.
• Now we can deal with misspellings.
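A short sketch of character n-grams and an n-gram-based string similarity; the Jaccard measure is our choice, and with it variant spellings like "Musk" and "Mask" still share substrings.

```python
def ngrams(word: str, n: int = 3) -> set:
    """All length-n character substrings of the word; HasNgram(word, g) holds for each g."""
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def ngram_similarity(w1: str, w2: str, n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets; robust to small misspellings."""
    g1, g2 = ngrams(w1, n), ngrams(w2, n)
    return len(g1 & g2) / len(g1 | g2) if (g1 | g2) else 0.0

print(ngram_similarity("Musk", "Mask", n=2))  # 0.2: the typo still leaves shared bigrams
```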
Scalability
• Scalability is the capability of a system, network, or process to handle
a growing amount of work, or its potential to be enlarged in order to
accommodate that growth.
• Even a small database with 1,000 constants of one type requires 1,000,000 pairwise equality decisions if all pairs are considered.
• Scalability is typically achieved by performing inference only over plausible candidate pairs, identified using a cheap similarity measure.
• We select plausible candidates using McCallum's CRF.
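The details of the candidate-selection step are beyond these slides; as a generic illustration of the idea (not McCallum's actual algorithm), here is a simple blocking sketch that only generates pairs of records sharing a cheap key:

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records, key):
    """Group records by a cheap blocking key and compare only within blocks,
    avoiding the O(n^2) blow-up of comparing every pair."""
    blocks = defaultdict(list)
    for r in records:
        blocks[key(r)].append(r)
    pairs = set()
    for group in blocks.values():
        pairs.update(combinations(sorted(group), 2))
    return pairs

records = ["Ellen musk", "Musk Ellen", "Ellen.M", "Musk E", "Mask Ellen", "John Doe"]
# Hypothetical cheap key: lowercased first letter of the last token.
print(candidate_pairs(records, key=lambda r: r.split()[-1][0].lower()))
```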
Bayesian Classification
Bayesian classifiers are statistical classifiers: they give the probability that a given object belongs to a particular class.
Bayesian Classification
• Example: digit recognition.
• X1, ..., Xn ∈ {0, 1} (black vs. white pixels)
• Y ∈ {5, 6, None} (predict whether a digit is a 5, a 6, or something else)
[Figure: an example handwritten digit.]
The Fellegi-Sunter Model
• The Fellegi-Sunter model uses naive Bayes to predict whether two
records are the same, with field comparisons as the predictors.
[Diagram: Fellegi-Sunter = naive Bayes with field comparisons as the predictors.]
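A minimal sketch of the Fellegi-Sunter scoring rule: naive Bayes over per-field agree/disagree comparisons. The m/u probabilities and the match prior below are made-up illustration values.

```python
import math

# Hypothetical parameters: P(field agrees | Match) and P(field agrees | Non-match).
FIELDS = {"author": (0.90, 0.10), "title": (0.95, 0.05), "venue": (0.70, 0.30)}
PRIOR_MATCH = 0.01

def match_log_odds(agreements: dict) -> float:
    """Naive Bayes log-odds of Match given the per-field comparison outcomes."""
    logit = math.log(PRIOR_MATCH / (1 - PRIOR_MATCH))
    for field, agrees in agreements.items():
        m, u = FIELDS[field]
        logit += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    return logit

# Two citations that agree on author and title but not on venue.
print(match_log_odds({"author": True, "title": True, "venue": False}))
```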
Conditional Random Field – CRF
• CRFs classify each sample taking "neighboring" samples into account.
[Diagram: McCallum and Wellner's CRF = naive Bayes plus transitivity predictors.]
Conditional Random Field – CRF
• Pairs that are not selected as candidates are assumed to be non-matches. While some of these apparent non-matches might be incorrect, this is a necessary and very reasonable approximation.
Our solution
• Using only binary relations.
• Using an MLN with finite weights.
• We select plausible candidates using McCallum's CRF.
• We use a solver such as MaxWalkSAT on the ground predicates and clauses.
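As a rough illustration of the inference step, here is a simplified MaxWalkSAT loop over weighted ground clauses (the clause representation is ours, and the real algorithm has further refinements):

```python
import random

def maxwalksat(atoms, clauses, max_flips=100000, p_noise=0.5):
    """Simplified MaxWalkSAT: stochastic local search that tries to minimize the
    total weight of violated ground clauses. Each clause is a triple
    (weight, clause_atoms, test_fn), with test_fn(state) -> bool."""
    state = {a: random.random() < 0.5 for a in atoms}

    def cost(s):
        return sum(w for w, _, test in clauses if not test(s))

    best, best_cost = dict(state), cost(state)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not c[2](state)]
        if not unsat:
            return state                      # every clause satisfied
        _, c_atoms, _ = random.choice(unsat)  # pick a violated clause
        if random.random() < p_noise:
            atom = random.choice(c_atoms)     # noisy step: flip a random atom in it
        else:                                 # greedy step: flip the atom helping most
            def cost_if_flipped(a):
                state[a] = not state[a]
                d = cost(state)
                state[a] = not state[a]
                return d
            atom = min(c_atoms, key=cost_if_flipped)
        state[atom] = not state[atom]
        if cost(state) < best_cost:
            best, best_cost = dict(state), cost(state)
    return best

# Tiny demo: Plays(A) => Fired(A) (weight 2) plus the unit clause Plays(A) (weight 1).
atoms = ["Plays(A)", "Fired(A)"]
clauses = [
    (2.0, atoms, lambda s: (not s["Plays(A)"]) or s["Fired(A)"]),
    (1.0, ["Plays(A)"], lambda s: s["Plays(A)"]),
]
print(maxwalksat(atoms, clauses))
```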
Experiment – Database
• We use the public Cora database, which contains 1,295 citations.
• We use the first author, title, and venue fields.
Experiment – Models
• NB: the naive Bayes model, with a different feature for every word; it uses only the HasWord predicate.
• MLN(Basic): the basic MLN model, closest to naive Bayes. It has the four predicate equivalence rules connecting each word to the corresponding field.
• MLN(Basic+Equivalence): obtained by adding predicate equivalence rules for the various fields to MLN(Basic).
• MLN(Basic+Equiv+TranClos): has both reverse predicate equivalence and transitive closure rules.
Metrics:
• CLL: conditional log-likelihood, the log-probability the model assigns to the observed configuration X1, X2, X3, ..., Xn.
• AUC: area under the precision-recall curve for the match predicates.
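A brief sketch of how the two metrics can be computed (with numpy and scikit-learn; y_true and y_prob below are placeholder arrays, not the paper's data):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

# Placeholder predictions for candidate pairs: true match labels, model probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])

# CLL: average log-probability the model assigns to the true label of each pair.
eps = 1e-12
cll = np.mean(np.log(np.where(y_true == 1, y_prob, 1 - y_prob) + eps))

# AUC of the precision-recall curve for the match predicate.
precision, recall, _ = precision_recall_curve(y_true, y_prob)
pr_auc = auc(recall, precision)

print(f"CLL = {cll:.3f}, PR-AUC = {pr_auc:.3f}")
```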
Results
• The experiment shows that no single model is optimal for all the fields.
Conclusion
• This paper proposes a unifying framework for entity resolution.
We show how a small number of axioms in Markov logic capture the
essential features of many different approaches to this problem.
References
• Database picture (slide 5): GRACE, grace-fp7.eu
• Main database picture (slide 6): clipartkid.com
• http://www.cs.washington.edu/homes/pedrod/803/