Statistical Relational Learning: A Quick Intro
Lise Getoor
University of Maryland, College Park
Acknowledgements

Synthesis of ideas of many individuals who have participated in various SRL workshops:


Hendrik Blockeel, Mark Craven, James Cussens, Bruce D’Ambrosio,
Luc De Raedt, Tom Dietterich, Pedro Domingos, Saso Dzeroski, Peter
Flach, Rob Holte, Manfred Jaeger, David Jensen, Kristian Kersting,
Daphne Koller, Heikki Mannila, Tom Mitchell, Ray Mooney, Stephen
Muggleton, Kevin Murphy, Jen Neville, David Page, Avi Pfeffer, Claudia
Perlich, David Poole, Foster Provost, Dan Roth, Stuart Russell, Taisuke
Sato, Jude Shavlik, Ben Taskar, Lyle Ungar and many others…
and students:

Indrajit Bhattacharya, Mustafa Bilgic, Rezarta Islamaj, Louis Licamele,
Qing Lu, Galileo Namata, Vivek Sehgal, Prithviraj Sen
Why SRL?

Traditional statistical machine learning approaches assume:
• a random sample of homogeneous objects from a single relation

Traditional ILP/relational learning approaches assume:
• no noise or uncertainty in the data

Real-world data sets are:
• multi-relational, heterogeneous and semi-structured
• noisy and uncertain

Statistical Relational Learning:
• newly emerging research area at the intersection of research in social network and link analysis, hypertext and web mining, graph mining, relational learning and inductive logic programming
Sample Domains:

web data, bibliographic data, epidemiological data, communication
data, customer networks, collaborative filtering, trust networks,
biological data, sensor networks, natural language, vision
SRL Approaches

Directed Approaches:
• Bayesian Network Tutorial
• Rule-based Directed Models
• Frame-based Directed Models

Undirected Approaches:
• Markov Network Tutorial
• Frame-based Undirected Models
• Rule-based Undirected Models
Probabilistic Relational Models

• PRMs w/ Attribute Uncertainty
  – Inference in PRMs
  – Learning in PRMs
• PRMs w/ Structural Uncertainty
• PRMs w/ Class Hierarchies

Representation & Inference:
[Koller & Pfeffer 98; Pfeffer, Koller, Milch & Takusagawa 99; Pfeffer 00]

Learning, Structural Uncertainty & Class Hierarchies:
[Friedman et al. 99; Getoor, Friedman, Koller & Taskar 01 & 02; Getoor 01]
Relational Schema

[Schema diagram: Author (Smart, Good Writer), Paper (Quality, Accepted), Review (Mood, Length); relations: Author-of (Author–Paper), Has-Review (Paper–Review)]

Describes the types of objects and relations in the database.
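As a purely illustrative rendering (not from the tutorial), the same schema could be sketched as Python dataclasses; the class and attribute names come from the slide, everything else is an assumption:

    # Hypothetical sketch of the Author / Paper / Review schema.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Author:
        smart: Optional[bool] = None
        good_writer: Optional[bool] = None

    @dataclass
    class Review:
        mood: Optional[bool] = None     # e.g. good mood vs. pissy
        length: Optional[str] = None    # e.g. "short" / "long"

    @dataclass
    class Paper:
        author: Optional[Author] = None                       # "Author of" relation
        reviews: List[Review] = field(default_factory=list)   # "Has Review" relation
        quality: Optional[bool] = None
        accepted: Optional[bool] = None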
Probabilistic Relational Model

[PRM dependency diagram over the schema: Author.Smart, Author.Good Writer, Review.Mood, Review.Length, Paper.Quality, Paper.Accepted]
Probabilistic Relational Model

[Same dependency diagram, highlighting the local model P(Paper.Accepted | Paper.Quality, Paper.Review.Mood)]
Probabilistic Relational Model

[Same dependency diagram, with the CPD attached to Paper.Accepted, given parents Paper.Quality (Q) and Paper.Review.Mood (M):]

Q, M    P(A | Q, M)
f, f    0.1   0.9
f, t    0.2   0.8
t, f    0.6   0.4
t, t    0.7   0.3
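A minimal sketch of how this CPD could be stored and queried; which of the two probability columns corresponds to Accepted = true is not labeled on the slide, so that orientation is an assumption:

    # CPD P(Accepted | Quality, Mood) from the slide, as a lookup table.
    # Keys are (quality, mood); values are assumed to be
    # (P(Accepted = false), P(Accepted = true)).
    cpd_accepted = {
        (False, False): (0.1, 0.9),
        (False, True):  (0.2, 0.8),
        (True,  False): (0.6, 0.4),
        (True,  True):  (0.7, 0.3),
    }

    def p_accepted(quality: bool, mood: bool, accepted: bool) -> float:
        """Look up P(Accepted = accepted | Quality = quality, Mood = mood)."""
        p_false, p_true = cpd_accepted[(quality, mood)]
        return p_true if accepted else p_false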
Relational Skeleton

[Instance diagram: Authors A1, A2; Papers P1 (Author: A1, Review: R1), P2 (Author: A1, Review: R2), P3 (Author: A2, Review: R2); Reviews R1, R2. Primary keys identify the objects; foreign keys encode the relations.]

Fixed relational skeleton σ:
• set of objects in each class
• relations between them
PRM w/ Attribute Uncertainty

[Instance diagram: attribute random variables attached to each object in the skeleton — Author A1, A2 (Smart, Good Writer); Paper P1, P2, P3 (Quality, Accepted); Review R1, R2, R3 (Mood, Length)]

PRM defines a distribution over instantiations of the attributes.
A Portion of the BN

[Ground Bayesian network fragment: r2.Mood = Pissy and P2.Quality = Low are parents of P2.Accepted; r3.Mood and P3.Quality are parents of P3.Accepted. Each Accepted node uses the CPD P(A | Q, M) shown earlier.]
A Portion of the BN

[Same fragment, now with P3.Quality = High and r3.Mood = Pissy also observed; again each Accepted node uses the CPD P(A | Q, M) shown earlier.]
PRM: Aggregate Dependencies

[Diagram: Paper P1 has multiple reviews R1, R2, R3, each with Mood and Length; Paper.Accepted depends on Paper.Quality and on the moods of all of its reviews.]
PRM: Aggregate Dependencies

[Same diagram, with the multi-valued parent collapsed through an aggregate: Paper.Accepted depends on Paper.Quality and on mode(Review.Mood). Possible aggregates: sum, min, max, avg, mode, count.]

Q, M    P(A | Q, M)
f, f    0.1   0.9
f, t    0.2   0.8
t, f    0.6   0.4
t, t    0.7   0.3
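Since a paper can have many reviews, the multi-valued parent Review.Mood is collapsed through an aggregate before the CPD lookup. A sketch using mode, assuming the hypothetical Paper dataclass and cpd_accepted table from the earlier sketches:

    # Aggregate the multi-valued parent (moods of all reviews of a paper)
    # before the CPD lookup. The slide lists sum, min, max, avg, mode, count
    # as possible aggregates; mode is used here.
    from statistics import mode

    def aggregated_mood(paper) -> bool:
        """Mode of Review.Mood over all reviews of this paper."""
        return mode(review.mood for review in paper.reviews)

    def p_accepted_given_parents(paper, cpd_accepted) -> tuple:
        """(P(Accepted=false), P(Accepted=true)), with the multi-valued parent
        Review.Mood collapsed to mode(Mood) before the CPD lookup."""
        return cpd_accepted[(paper.quality, aggregated_mood(paper))]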
PRM with AU Semantics

[Diagram: PRM (Author / Paper / Review dependency structure) + relational skeleton σ (objects A1, A2, P1–P3, R1–R3) = ground Bayesian network]

PRM + relational skeleton σ ⇒ probability distribution over completions I:

P(I | σ, S, θ) = ∏_x ∏_{x.A} P(x.A | parents_{S,σ}(x.A))

where the first product ranges over the objects x in the skeleton and the second over the attributes x.A of each object.
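In code, these semantics amount to one CPD factor per object attribute in the skeleton, multiplied together. The sketch below assumes hypothetical data structures (a dict of attribute values, per-attribute CPD functions, and a parents helper), not anything defined in the tutorial:

    import math

    def prm_log_likelihood(instantiation, cpds, parents_of):
        """Log of P(I | skeleton, S, theta): sum over every object x and
        attribute x.A of log P(x.A | parents(x.A)).

        instantiation: dict mapping (object, attribute_name) -> value
        cpds:          dict mapping attribute_name -> function(value, parent_values) -> prob
        parents_of:    function(object, attribute_name, instantiation) -> parent values
        """
        logp = 0.0
        for (obj, attr), value in instantiation.items():
            parent_values = parents_of(obj, attr, instantiation)
            logp += math.log(cpds[attr](value, parent_values))
        return logp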
Probabilistic Relational Models

• PRMs w/ Attribute Uncertainty
  – Inference in PRMs
  – Learning in PRMs
• PRMs w/ Structural Uncertainty
• PRMs w/ Class Hierarchies
Kinds of structural uncertainty

• How many objects does an object relate to?
  – how many Authors does Paper1 have?
• Which object is an object related to?
  – does Paper1 cite Paper2 or Paper3?
• Which class does an object belong to?
  – is Paper1 a JournalArticle or a ConferencePaper?
• Does an object actually exist?
• Are two objects identical?
Structural Uncertainty

• Motivation: a PRM with attribute uncertainty is only well-defined when the skeleton structure is known
• We may be uncertain about the relational structure itself
• Goal: construct probabilistic models of relational structure that capture structural uncertainty
• Mechanisms:
  – Reference uncertainty
  – Existence uncertainty
  – Number uncertainty
  – Type uncertainty
  – Identity uncertainty
Existence Uncertainty

[Diagram: two document collections with question marks over which citation/link edges exist between them]
PRM w/ Exists Uncertainty

[Diagram: two Paper classes (Topic, Words) connected by a Cites relation carrying an Exists attribute]

Dependency model for the existence of the relationship.
Exists Uncertainty Example

[Diagram: Cites.Exists depends on Citer.Topic and Cited.Topic]

CPD for Cites.Exists:

Citer.Topic   Cited.Topic   False   True
Theory        Theory        0.995   0.005
Theory        AI            0.999   0.001
AI            Theory        0.997   0.003
AI            AI            0.993   0.008
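A sketch of how this CPD might be used: each potential Cites(citer, cited) pair gets its own binary Exists variable whose distribution depends on the two topics. The helper below is hypothetical; the probabilities are copied from the table:

    import random

    # P(Cites.Exists = true | Citer.Topic, Cited.Topic), from the table above.
    p_cites_exists = {
        ("Theory", "Theory"): 0.005,
        ("Theory", "AI"):     0.001,
        ("AI",     "Theory"): 0.003,
        ("AI",     "AI"):     0.008,
    }

    def sample_cites(citer_topic: str, cited_topic: str) -> bool:
        """Sample the Exists variable for one potential citation edge."""
        return random.random() < p_cites_exists[(citer_topic, cited_topic)]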
PRMs w/ EU Semantics

[Diagram: PRM with Exists uncertainty on Cites (Paper.Topic, Paper.Words) + an object skeleton of papers P1–P5 with their topics (Theory, AI); which Cites edges exist among them is unknown (???)]

PRM-EU + object skeleton σ ⇒ probability distribution over full instantiations I
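A sketch of these semantics under the same hypothetical setup: the object skeleton fixes the set of papers (and their topics), one Exists variable is introduced per ordered pair of distinct papers, and the probability of a full instantiation multiplies the corresponding CPD entries (the p_cites_exists table sketched above can be passed in as p_exists_true):

    import math
    from itertools import permutations

    def potential_cites(papers):
        """One binary Exists variable per ordered (citer, cited) pair of
        distinct papers in the object skeleton."""
        return list(permutations(papers, 2))

    def log_prob_cites(papers, topic_of, exists_value, p_exists_true):
        """Log-probability of one assignment to all Cites.Exists variables.

        papers:        list of paper identifiers in the object skeleton
        topic_of:      dict mapping paper -> "Theory" or "AI"
        exists_value:  dict mapping (citer, cited) -> bool
        p_exists_true: dict mapping (citer_topic, cited_topic) -> P(Exists=true)
        """
        logp = 0.0
        for citer, cited in potential_cites(papers):
            p_true = p_exists_true[(topic_of[citer], topic_of[cited])]
            p = p_true if exists_value[(citer, cited)] else 1.0 - p_true
            logp += math.log(p)
        return logp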
But… what about Probabilistic DBs?

Similarities:
• Representation, e.g.
  – PRMs can model attribute correlations compactly
  – PRMs can model tuple uncertainty by introducing an exists random variable for each uncertain tuple
  – PRMs can model join dependencies compactly

Differences:
• ML emphasis on generalization and compact modeling
• DB emphasis on loss-less data storage

Commonality:
• Need for efficient query processing
Conclusion

Statistical Relational Learning:
• Supports multi-relational, heterogeneous domains
• Supports noisy, uncertain, non-IID data
  – aka, real-world data!

Different approaches:
• rule-based vs. frame-based
• directed vs. undirected

Many common issues:
• Need for collective classification and consolidation
• Need for aggregation and combining rules
• Need to handle labeled and unlabeled data
• Need to handle structural uncertainty
• etc.

Great opportunity for combining machine learning for hierarchical statistical models with probabilistic databases, which can efficiently store, query, and update models.