Entity Resolution with Markov Logic
Parag Singla
Pedro Domingos
Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-2350, U.S.A.
Overview
Introduction
Technical Tools
Markov network
First Order Logic – Knowledge base
Markov Logic
Problem Formulation
Experiment
Introduction
• Assume we have a student who takes a few courses at different institutions.
• His grades have to be kept in one unified system.
• His name is Ellen Musk.
[Figure: the main database collects records for one real student, Ellen Musk, under five name variants: Ellen musk (Technion), Musk Ellen (Haifa University), Ellen.M (Ben-Gurion University), Musk E (Tel-Aviv University), and Mask Ellen (Bar-Ilan University).]
What we would have in the Main database

Name       | Institute             | Grades
Ellen musk | Technion              | {100}
Musk Ellen | Haifa University      | {100}
Ellen.M    | Ben-Gurion University | {100}
Musk E     | Tel-Aviv University   | {100, 74}
Mask Ellen | Bar-Ilan University   | {100, 77, 61, 54}
We have a lot of records for one student
• In a big database, this redundancy can be costly.
• In order to know the student's weak points, we need to know at least all of his grades.
[Figure: the five records (Ellen musk, Musk Ellen, Ellen.M, Musk E, Mask Ellen) all belong to the same student.]
Solution - Entity Resolution
• Entity Resolution = Record linkage = determining which records in a
database refer to the same entities.
[Figure: entity resolution links the five name variants (Ellen musk, Musk Ellen, Ellen.M, Musk E, Mask Ellen) to a single entity.]
Conclusion
• Data preparation is needed in the data mining process.
• Data from relevant sources must be collected, integrated, scrubbed
and pre-processed.
• Correctly merging these records and the information they represent is
an essential step in producing data of sufficient quality for mining.
Technical Tools
Markov Networks - MRF
• A model for the joint distribution of random variables X = (X1, X2, X3, ..., Xn).
• Defined by an undirected graph G: a node for each variable, with edges representing direct dependencies.
[Figure: example network over Smoking, Cancer, Asthma, Cough.]
• Smoking does not depend on Asthma given Cancer:
P(Smoking | Cancer) = P(Smoking | Cancer, Asthma)
Markov Networks
• Undirected graphical models.
[Figure: the same network over Smoking, Cancer, Asthma, Cough, now with potentials over its cliques.]
• Potential functions defined over cliques:
P(x) = (1/Z) ∏_c Φ_c(x_c),  where Z = Σ_x ∏_c Φ_c(x_c)
Smoking | Cancer | Φ(S,C)
False   | False  | 4.5
False   | True   | 4.5
True    | False  | 2.7
True    | True   | 4.5
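To make the definition concrete, here is a minimal Python sketch (ours, not from the slides) that computes the joint distribution of the single-clique Smoking-Cancer network from the potential table above.

```python
from itertools import product

# Potential over the (Smoking, Cancer) clique, taken from the table above.
phi = {
    (False, False): 4.5,
    (False, True):  4.5,
    (True,  False): 2.7,
    (True,  True):  4.5,
}

# Partition function: sum of the clique-potential product over all states.
Z = sum(phi[(s, c)] for s, c in product([False, True], repeat=2))

def p(smoking: bool, cancer: bool) -> float:
    """P(x) = (1/Z) * prod_c phi_c(x_c); here there is a single clique."""
    return phi[(smoking, cancer)] / Z

for s, c in product([False, True], repeat=2):
    print(f"P(Smoking={s}, Cancer={c}) = {p(s, c):.3f}")
```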
First Order Logic – Knowledge base
• A first-order knowledge base (KB) is a set of formulas in first-order logic.
First Order Logic – Formulas
A formula is built from:
• Constants: objects in the domain (e.g., people: Anna, Bob, etc.).
• Variables: range over the objects in the domain (e.g., x, y).
• Functions: map objects in the domain to objects (e.g., MotherOf(Bob)).
• Relations (predicates): hold among objects (e.g., Friends(Bob, Anna)) or express attributes of objects (e.g., Smokes(Bob)).
First Order Logic – Term and Atom
• Term: represents an object.
Term = constant | variable | function(term1, term2, ..., termN)
• Atom = Predicate(term1, term2, ..., termN), e.g., Friends(x, MotherOf(Anna)).
• Ground term = a term without any variables.
First Order Logic – Term and Atom
• Term = constant | variable | function(term1, term2, ..., termN)
• Atom = Predicate(term1, term2, ..., termN)
• Ground term = a term without any variables.
• Ground predicate/atom = a predicate applied to ground terms only.
• Negative literal = a negated atomic formula.
• Formulas are recursively constructed from atomic formulas using the logical connectives and quantifiers: ¬, ∧, ∨, ⇒, ⇔, ∀, ∃.
First-Order KB
• We would like to determine whether a KB is satisfiable: whether there is an assignment of truth values to the ground atoms under which every formula in the KB is true.
• A possible world assigns a truth value to each possible ground atom.
• We can determine whether a KB is satisfiable by using solvers.
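For a KB with only a handful of ground atoms, satisfiability can even be checked by brute force. Below is a minimal sketch (the tiny KB is a hypothetical example of ours, not from the slides); real systems use SAT solvers instead.

```python
from itertools import product

# Ground atoms of a tiny hypothetical KB.
atoms = ["Plays(A)", "Plays(B)", "Friends(A,B)"]

# Each formula is a function of a possible world (dict: ground atom -> truth value).
kb = [
    # Friends(A,B) => (Plays(A) <=> Plays(B))
    lambda w: (not w["Friends(A,B)"]) or (w["Plays(A)"] == w["Plays(B)"]),
    # Friends(A,B)
    lambda w: w["Friends(A,B)"],
]

def satisfying_world(atoms, kb):
    """Enumerate all 2^n possible worlds; return one satisfying every formula, or None."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(f(world) for f in kb):
            return world
    return None

print(satisfying_world(atoms, kb))
```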
Motivation for Markov Logic
Ideally, we want a framework that can incorporate the advantages of both first-order logic and probabilistic graphical models.
Markov Logic – Introduction
• A first-order KB can be seen as a set of hard constraints on the set of
possible worlds.
• If a world violates even one formula, it has zero probability.
• The basic idea in Markov logic is to soften these constraints: when a
world violates one formula in the KB it is less probable, but not
impossible.
• The fewer formulas a world violates, the more probable it is.
• Give each formula a weight (higher weight ⇒ stronger constraint).
Markov Logic Networks – Example
Formula: Friends(x, y) ⇒ (Plays(x) ⇔ Plays(y))
Constants: Alice (A) and Bob (B)
[Figure: the resulting ground Markov network, with nodes Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B).]
Markov Logic Network
Weighted formulas:
ω = 3: Friends(x, y) ⇒ (Plays(x) ⇔ Plays(y))
ω = 2: Plays(x) ⇒ Fired(x)
Each grounding of a formula contributes its weight ω when it is satisfied, and 0 when it is violated:

Friends(x,y) | Plays(x) | Plays(y) | ω
True         | True     | True     | 3
True         | True     | False    | 0
True         | False    | True     | 0
True         | False    | False    | 3
False        | any      | any      | 3

Plays(x) | Fired(x) | ω
True     | True     | 2
True     | False    | 0
False    | True     | 2
False    | False    | 2

[Figure: the ground network over constants A and B, with nodes Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Plays(A), Plays(B), Fired(A), Fired(B).]
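A minimal sketch of how the ground network above assigns a score to a world: each satisfied grounding of a formula contributes its weight. The representation (atoms as strings, formulas as Python functions) is our own simplification.

```python
from itertools import product

CONSTANTS = ["A", "B"]

def f_friends(w, x, y):
    """Friends(x,y) => (Plays(x) <=> Plays(y)), weight 3."""
    return (not w[f"Friends({x},{y})"]) or (w[f"Plays({x})"] == w[f"Plays({y})"])

def f_fired(w, x):
    """Plays(x) => Fired(x), weight 2."""
    return (not w[f"Plays({x})"]) or w[f"Fired({x})"]

def world_score(w):
    """Total weight of satisfied ground formulas (the log-weight of the world)."""
    score = 0.0
    for x, y in product(CONSTANTS, repeat=2):
        score += 3.0 * f_friends(w, x, y)
    for x in CONSTANTS:
        score += 2.0 * f_fired(w, x)
    return score

# One example world over the 8 ground atoms of the network.
world = {"Friends(A,A)": True, "Friends(A,B)": True, "Friends(B,A)": False,
         "Friends(B,B)": True, "Plays(A)": True, "Plays(B)": True,
         "Fired(A)": True, "Fired(B)": False}
print(world_score(world))  # 14.0: only the grounding Plays(B) => Fired(B) is violated
```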
Markov Logic - Definition
• A Markov logic network (MLN) L is a set of pairs (F_i, w_i), where F_i is a formula in first-order logic and w_i is a real number. Together with a set of constants C = {c1, c2, c3, ..., c_|C|}, it defines a ground Markov network M_{L,C}:
1. M_{L,C} has one feature for each grounding of each formula F_i in the MLN, with the corresponding weight w_i.
2. M_{L,C} has one node for each grounding of each predicate in the MLN.
Markov Logic - Definition
An MLN can be viewed as an extension of Markov networks:
• An MLN without variables is an ordinary Markov network.
• An MLN with all weights infinite is a purely logical KB.
Markov Logic - Definition
• P(X = x) = (1/Z) exp(Σ_i w_i n_i(x))
• F_i is the i-th formula in the MLN, n_i(x) is the number of true groundings of F_i in the world x, and w_i is the weight of formula i.
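A small sketch of this distribution for a hypothetical one-formula MLN (Plays(x) ⇒ Fired(x) with w1 = 2 over constants {A, B}); Z is computed by enumerating all 2^4 worlds, which is only feasible for tiny domains.

```python
import math
from itertools import product

ATOMS = ["Plays(A)", "Plays(B)", "Fired(A)", "Fired(B)"]
W1 = 2.0

def n1(world):
    """n_1(x): number of true groundings of Plays(x) => Fired(x) in the world."""
    return sum((not world[f"Plays({c})"]) or world[f"Fired({c})"] for c in "AB")

worlds = [dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=len(ATOMS))]
Z = sum(math.exp(W1 * n1(w)) for w in worlds)  # partition function

def p(world):
    """P(X = x) = exp(sum_i w_i * n_i(x)) / Z."""
    return math.exp(W1 * n1(world)) / Z

x = dict(zip(ATOMS, [True, True, True, False]))  # both play, only A was fired
print(f"n1(x) = {n1(x)}, P(x) = {p(x):.4f}")
```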
Relations and weights in Markov Logic Network
• When added to an MLN with a finite weight, such formulas capture an important statistical regularity: if two objects are in the same relation to the same object, this is evidence that they may be the same.
• Some relations provide stronger evidence than others; this is captured by assigning a different weight to the formula for each relation R.
Example
[Figure: are Article 1 and Article 2 the same? Different relations give evidence of different strength: R_SameTitle > R_SameAuthor > R_SameVenue > R_Nothing.]
Goal
• The goal of entity resolution: for each pair of objects of the same type (x1, x2), determine whether they represent the same entity, i.e., whether x1 = x2.
Equality in First-Order Logic
• Most systems for inference in first-order logic make the unique names
assumption: different constants refer to different objects in the
domain.
Equality
• Formally, the equality predicate Equals(x, y), written x = y, satisfies:
Reflexivity: ∀x. x = x.
Symmetry: ∀x, y. x = y ⇒ y = x.
Transitivity: ∀x, y, z. (x = y ∧ y = z) ⇒ x = z.
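Transitivity is what makes entity resolution collective: match decisions propagate across records. As an illustration (ours, not the paper's inference procedure), a union-find structure computes the transitive closure of a set of pairwise match decisions:

```python
class UnionFind:
    """Disjoint sets: merging matched pairs yields the transitive closure of '='."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

# Pairwise match decisions among the five records from the introduction.
matches = [("Ellen musk", "Musk Ellen"), ("Musk Ellen", "Ellen.M"),
           ("Ellen.M", "Musk E"), ("Musk E", "Mask Ellen")]

uf = UnionFind()
for a, b in matches:
    uf.union(a, b)

# By transitivity, all five records fall into one cluster.
print(uf.find("Ellen musk") == uf.find("Mask Ellen"))  # True
```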
Real-World Word Problems
• Words have grammatical inflections.
• Words may be misspelled.
Comparison
• We define the predicate HasWord(field, word), which is true iff the field contains the word.
• Now we can deal with inflection.
• Example:
fields = {Math, Physics, Biology, Chemistry}
HasWord(Math, Mol) = false
HasWord(Chemistry, Mol) = true
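A minimal sketch of the HasWord predicate, plus a cheap word-overlap field similarity built on it; the tokenization rule is our assumption.

```python
import re

def has_word(field: str, word: str) -> bool:
    """HasWord(field, word): true iff the field contains the word as a token."""
    return word.lower() in re.findall(r"\w+", field.lower())

def word_overlap(field1: str, field2: str) -> float:
    """Cheap field similarity: Jaccard overlap of the two word sets."""
    w1 = set(re.findall(r"\w+", field1.lower()))
    w2 = set(re.findall(r"\w+", field2.lower()))
    return len(w1 & w2) / len(w1 | w2) if (w1 | w2) else 0.0

print(has_word("Advanced Math", "Mol"))               # False
print(has_word("Chemistry: the Mol concept", "Mol"))  # True
```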
Not enough for good comparison …
• Misspellings: variant spellings and abbreviations of a word are treated as completely different words.
Comparison
• A solution to this problem is to compare word strings by the n-grams they contain.
• An n-gram is a contiguous substring of n characters of a word.
Comparison
• We add to our framework the predicate HasNgram(word, ngram), which is true iff the n-gram is a substring of the word.
• Now we can deal with misspellings.
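A short sketch of character n-grams and an n-gram-based string similarity; the Jaccard measure is our choice, and with it variant spellings like "Musk" and "Mask" still share substrings.

```python
def ngrams(word: str, n: int = 3) -> set:
    """All length-n character substrings of the word; HasNgram(word, g) holds for each g."""
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def ngram_similarity(w1: str, w2: str, n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets; robust to small misspellings."""
    g1, g2 = ngrams(w1, n), ngrams(w2, n)
    return len(g1 & g2) / len(g1 | g2) if (g1 | g2) else 0.0

print(ngram_similarity("Musk", "Mask", n=2))  # 0.2: the typo still leaves shared bigrams
```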
Scalability
• Scalability is the capability of a system, network, or process to handle
a growing amount of work, or its potential to be enlarged in order to
accommodate that growth.
• Even a small database with 1,000 constants of one type requires 1,000,000 pairwise equality decisions if all pairs are considered.
• Scalability is typically achieved by performing inference only over plausible candidate pairs, identified using a cheap similarity measure.
• We select plausible candidates using McCallum's CRF.
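The details of the candidate-selection step are beyond these slides; as a generic illustration of the idea (not McCallum's actual algorithm), here is a simple blocking sketch that only generates pairs of records sharing a cheap key:

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records, key):
    """Group records by a cheap blocking key and compare only within blocks,
    avoiding the O(n^2) blow-up of comparing every pair."""
    blocks = defaultdict(list)
    for r in records:
        blocks[key(r)].append(r)
    pairs = set()
    for group in blocks.values():
        pairs.update(combinations(sorted(group), 2))
    return pairs

records = ["Ellen musk", "Musk Ellen", "Ellen.M", "Musk E", "Mask Ellen", "John Doe"]
# Hypothetical cheap key: lowercased first letter of the last token.
print(candidate_pairs(records, key=lambda r: r.split()[-1][0].lower()))
```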
Bayesian Classification
Bayesian classifiers are statistical classifiers: they give the probability that a given object belongs to a particular class.
Bayesian Classification
• Example: digit recognition.
• X1, ..., Xn ∈ {0, 1} (black vs. white pixels)
• Y ∈ {5, 6, None} (predict whether a digit is a 5, a 6, or something else)
[Figure: an example handwritten digit.]
The Fellegi-Sunter Model
• The Fellegi-Sunter model uses naive Bayes to predict whether two
records are the same, with field comparisons as the predictors.
[Diagram: Fellegi-Sunter = naive Bayes with field comparisons as the predictors.]
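A minimal sketch of the Fellegi-Sunter scoring rule: naive Bayes over per-field agree/disagree comparisons. The m/u probabilities and the match prior below are made-up illustration values.

```python
import math

# Hypothetical parameters: P(field agrees | Match) and P(field agrees | Non-match).
FIELDS = {"author": (0.90, 0.10), "title": (0.95, 0.05), "venue": (0.70, 0.30)}
PRIOR_MATCH = 0.01

def match_log_odds(agreements: dict) -> float:
    """Naive Bayes log-odds of Match given the per-field comparison outcomes."""
    logit = math.log(PRIOR_MATCH / (1 - PRIOR_MATCH))
    for field, agrees in agreements.items():
        m, u = FIELDS[field]
        logit += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    return logit

# Two citations that agree on author and title but not on venue.
print(match_log_odds({"author": True, "title": True, "venue": False}))
```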
Conditional Random Field – CRF
• CRFs classify each sample taking "neighboring" samples into account.
[Diagram: McCallum and Wellner's CRF = naive Bayes plus transitivity predictors.]
Conditional Random Field – CRF
• Pairs that are not selected as candidates are assumed to be non-matches. While some of these apparent non-matches might be incorrect, this is a necessary and very reasonable approximation.
Our solution
• Using only binary relations.
• Using an MLN with finite weights.
• We select plausible candidates using McCallum's CRF.
• We use a solver such as MaxWalkSAT on the ground predicates and clauses.
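As a rough illustration of the inference step, here is a simplified MaxWalkSAT loop over weighted ground clauses (the clause representation is ours, and the real algorithm has further refinements):

```python
import random

def maxwalksat(atoms, clauses, max_flips=100000, p_noise=0.5):
    """Simplified MaxWalkSAT: stochastic local search that tries to minimize the
    total weight of violated ground clauses. Each clause is a triple
    (weight, clause_atoms, test_fn), with test_fn(state) -> bool."""
    state = {a: random.random() < 0.5 for a in atoms}

    def cost(s):
        return sum(w for w, _, test in clauses if not test(s))

    best, best_cost = dict(state), cost(state)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not c[2](state)]
        if not unsat:
            return state                      # every clause satisfied
        _, c_atoms, _ = random.choice(unsat)  # pick a violated clause
        if random.random() < p_noise:
            atom = random.choice(c_atoms)     # noisy step: flip a random atom in it
        else:                                 # greedy step: flip the atom helping most
            def cost_if_flipped(a):
                state[a] = not state[a]
                d = cost(state)
                state[a] = not state[a]
                return d
            atom = min(c_atoms, key=cost_if_flipped)
        state[atom] = not state[atom]
        if cost(state) < best_cost:
            best, best_cost = dict(state), cost(state)
    return best

# Tiny demo: Plays(A) => Fired(A) (weight 2) plus the unit clause Plays(A) (weight 1).
atoms = ["Plays(A)", "Fired(A)"]
clauses = [
    (2.0, atoms, lambda s: (not s["Plays(A)"]) or s["Fired(A)"]),
    (1.0, ["Plays(A)"], lambda s: s["Plays(A)"]),
]
print(maxwalksat(atoms, clauses))
```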
Experiment – Database
• We use the public Cora database, which contains 1,295 citations.
• We use the first author, title, and venue fields.
Experiment – Models
• NB: the naive Bayes model, with a different feature for every word; it uses only the HasWord predicate.
• MLN(Basic): the basic MLN model, closest to naive Bayes. It has the four predicate equivalence rules connecting each word to the corresponding field.
• MLN(Basic+Equivalence): obtained by adding predicate equivalence rules for the various fields to MLN(Basic).
• MLN(Basic+Equiv+TranClos): has both reverse predicate equivalence and transitive closure rules.
Metrics:
• CLL: conditional log-likelihood, the log-probability the model assigns to the observed configuration X1, X2, X3, ..., Xn.
• AUC: area under the precision-recall curve for the match predicates.
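A brief sketch of how the two metrics can be computed (with numpy and scikit-learn; y_true and y_prob below are placeholder arrays, not the paper's data):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

# Placeholder predictions for candidate pairs: true match labels, model probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])

# CLL: average log-probability the model assigns to the true label of each pair.
eps = 1e-12
cll = np.mean(np.log(np.where(y_true == 1, y_prob, 1 - y_prob) + eps))

# AUC of the precision-recall curve for the match predicate.
precision, recall, _ = precision_recall_curve(y_true, y_prob)
pr_auc = auc(recall, precision)

print(f"CLL = {cll:.3f}, PR-AUC = {pr_auc:.3f}")
```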
Results
• The experiment shows that no single model is optimal for all the fields.
Conclusion
• This paper proposes a unifying framework for entity resolution.
We show how a small number of axioms in Markov logic capture the
essential features of many different approaches to this problem.
References
• Database picture (slide 5): GRACE, grace-fp7.eu
• Main database picture (slide 6): clipartkid.com
• http://www.cs.washington.edu/homes/pedrod/803/