Machine learning, probabilistic modelling
Stuart Russell
Computer Science Division, UC Berkeley
Outline
Some basic aspects of machine learning
Example: detecting artifacts in ICU data
Example: probabilistic data association
Multitarget tracking
Freeway traffic
CiteSeer
Sibyl attacks on recommender systems
Machine learning: model-free
[Diagram: data → Learning → hypothesis]
Model-free learning contd.
Supervised learning
Input: (x1, f(x1)), …, (xn, f(xn))
(many possible input and label spaces)
Output: h ≈ f
E.g., f classifies xi as earthquake/explosion (see the sketch after this list)
Unsupervised learning
Input: x1, … xn
Output: clustering of inputs into categories
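To make the supervised setting concrete, here is a minimal sketch in Python (the learner choice and toy data are mine; the talk specifies neither): a 1-nearest-neighbor learner that returns a hypothesis h approximating f.

def learn_nearest_neighbor(examples):
    """examples: list of (x, f(x)) pairs. Returns a hypothesis h: x -> label."""
    def h(x):
        _, label = min(examples, key=lambda ex: abs(ex[0] - x))
        return label
    return h

# Toy earthquake/explosion classifier on a single 1-D seismic feature.
data = [(0.2, "earthquake"), (0.4, "earthquake"), (1.1, "explosion"), (1.5, "explosion")]
h = learn_nearest_neighbor(data)
print(h(0.3), h(1.3))   # -> earthquake explosion

Any of the hypothesis classes listed on the next slide could replace the nearest-neighbor rule; the interface (labeled data in, hypothesis h out) stays the same.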
Model-free learning contd.
Application and form of data influence the choice of hypothesis class H
Linear models, logistic regression
Decision trees (classification or regression)
Nonparametric (instance-based)
Kernel methods
effectively linear separators in a transformed high-dimensional input space (see the sketch after this list)
Probabilistic grammars for strings
Etc.
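As an illustration of the kernel-methods bullet, this Python sketch (example and numbers mine) shows a quadratic kernel computing the same inner product as an explicit feature map into a higher-dimensional space, without ever constructing that space:

import numpy as np

def phi(x):
    """Explicit quadratic feature map for 2-D input x = (x1, x2)."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

def k(x, z):
    """Quadratic kernel: equals <phi(x), phi(z)> at the cost of one dot product."""
    return np.dot(x, z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))   # 16.0 -- inner product in feature space
print(k(x, z))                  # 16.0 -- same value, computed in input space

A linear separator learned on phi(x) is a quadratic separator on x, which is the sense in which kernel methods are "effectively linear separators in a transformed space".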
Model-based learning
[Diagram: prior knowledge + data → Learning → knowledge]
Bayesian model-based learning
Generative approach
P(world) describes the prior over what is out there (source), and also over model parameters and structure
P(signal | world) describes sensor model (channel)
Given new signal, compute P(world | signal)
Learning
Posterior over parameters (or structure) given data
Or use maximum a posteriori or maximum likelihood estimates (worked example below)
Substantial advances in modeling capabilities and general-purpose inference algorithms
Applications with millions of parameters and gigabytes of data are fairly routine
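A worked toy example of these estimators (mine, not from the talk), using a conjugate Beta-Bernoulli model so the posterior is available in closed form:

heads, tails = 7, 3                  # observed data
a, b = 2.0, 2.0                      # Beta(2, 2) prior: mild belief that theta is near 0.5

post_a, post_b = a + heads, b + tails                 # posterior is Beta(9, 5)
posterior_mean = post_a / (post_a + post_b)           # 9/14 ~= 0.643
map_estimate = (post_a - 1) / (post_a + post_b - 2)   # posterior mode: 8/12 ~= 0.667
ml_estimate = heads / (heads + tails)                 # 7/10 = 0.7, ignores the prior

print(posterior_mean, map_estimate, ml_estimate)

The full posterior keeps all uncertainty about theta; MAP and ML collapse it to a point estimate, and ML additionally discards the prior.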
Artifact events ubiquitous
Blood pressure signals
Artifact events
Goal: detect, categorize, and correct for artifacts in the blood pressure signal
Generative model
Parameters for event duration and frequency trained on a small sample of one-second data
Detection uses an equivalent one-minute model based on the measurement and artifact processes
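A heavily hedged sketch of this kind of model (everything below, structure and numbers alike, is an invented placeholder rather than the trained model from the talk): a latent artifact process, with frequency and duration parameters, overlaid on ordinary measurement noise.

import random

def sample_bp_signal(n_seconds, artifact_rate=0.01, mean_duration=5, true_bp=80.0):
    """One sample per second; artifacts (e.g., a line flush) swamp the reading."""
    signal, remaining = [], 0
    for _ in range(n_seconds):
        if remaining == 0 and random.random() < artifact_rate:      # artifact onset
            remaining = max(1, int(random.expovariate(1.0 / mean_duration)))
        if remaining > 0:
            signal.append(true_bp + 60.0 + random.gauss(0, 10.0))   # artifact process
            remaining -= 1
        else:
            signal.append(true_bp + random.gauss(0, 2.0))           # measurement process
    return signal

print(sample_bp_signal(30))

Detection then inverts this generative story: given a signal, infer which seconds were produced by the artifact process rather than the measurement process.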
ALARM
Example: classical data association
Generative model
World = aircraft, trajectories, blip associations
#Aircraft ~ NumAircraftPrior();
State(a, t)
if t = 0 then ~ InitState()
else ~ StateTransition(State(a, t-1));
#Blip(Source = a, Time = t)
~ NumDetectionsCPD(State(a, t));
#Blip(Time = t)
~ NumFalseAlarmsPrior();
ApparentPos(r)
if (Source(r) = null) then ~ FalseAlarmDistrib()
else ~ ObsCPD(State(Source(r), Time(r)));
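To make the generative semantics concrete, here is a forward sampler for roughly this model in Python; the concrete distributions (Poisson counts, Gaussian dynamics and observation noise, a uniform false-alarm region) are invented stand-ins for NumAircraftPrior, StateTransition, NumDetectionsCPD, ObsCPD, and FalseAlarmDistrib.

import math, random

def poisson(lam):
    """Poisson sampler (Knuth's method), standing in for the count priors."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def sample_world(T=3, lam_aircraft=2.0, lam_false=1.0, p_detect=0.9):
    n_aircraft = poisson(lam_aircraft)                # #Aircraft ~ NumAircraftPrior()
    states, blips = {}, []                            # blips: (time, apparent_pos, source)
    for a in range(n_aircraft):
        for t in range(T):
            prev = states.get((a, t - 1), 0.0)        # InitState() at t = 0
            states[a, t] = random.gauss(prev, 1.0)    # StateTransition
            if random.random() < p_detect:            # #Blip(Source = a, Time = t)
                blips.append((t, random.gauss(states[a, t], 0.5), a))
    for t in range(T):
        for _ in range(poisson(lam_false)):           # #Blip(Time = t): false alarms
            blips.append((t, random.uniform(-10.0, 10.0), None))
    return states, blips

print(sample_world())

Tracking is the inverse problem: given only the blips, infer the number of aircraft, their states, and which blip came from which source.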
Aircraft Tracking Results
[Oh et al., CDC 2004] (simulated data)
MCMC has the smallest error, and hardly degrades at all as tracks get dense
MCMC is nearly as fast as the greedy algorithm, and much faster than MHT
[Figures by Songhwai Oh]
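For intuition about what MCMC over associations looks like, here is a minimal Metropolis sketch (mine, not Oh et al.'s sampler, which uses smarter joint moves over whole tracks and enforces one blip per track): each blip is assigned to one known aircraft or to "false alarm", and each move proposes reassigning a single blip.

import math, random

predicted = [0.0, 5.0]            # predicted positions of two known aircraft
blips = [0.3, 4.6, 9.9]           # observed blip positions
FA = None                         # the "false alarm" assignment

def lik(blip, assign):
    if assign is FA:
        return 1.0 / 20.0                                      # uniform over [-10, 10]
    return math.exp(-0.5 * ((blip - predicted[assign]) / 0.5) ** 2)

assignment = [FA] * len(blips)    # initial hypothesis: everything is a false alarm
for _ in range(10000):
    i = random.randrange(len(blips))
    proposal = random.choice([FA, 0, 1])                       # propose a reassignment
    ratio = lik(blips[i], proposal) / lik(blips[i], assignment[i])
    if random.random() < min(1.0, ratio):                      # Metropolis accept rule
        assignment[i] = proposal

print(assignment)                 # typically [0, 1, None]: third blip is a false alarm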
Extending the Model: Air Bases
#Aircraft(InitialBase = b) ~ InitialAircraftPerBasePrior();
CurBase(a, t)
  if t = 0 then = InitialBase(a)
  elseif TakesOff(a, t-1) then = null
  elseif Lands(a, t-1) then = Dest(a, t-1)
  else = CurBase(a, t-1);
InFlight(a, t) = (CurBase(a, t) = null);
TakesOff(a, t)
  if !InFlight(a, t) then ~ Bernoulli(0.1);
Lands(a, t)
  if InFlight(a, t) then
    ~ LandingCPD(State(a, t), Location(Dest(a, t)));
Dest(a, t)
  if TakesOff(a, t) then ~ Uniform({Base b})
  elseif InFlight(a, t) then = Dest(a, t-1);
State(a, t)
  if TakesOff(a, t-1) then
    ~ InitState(Location(CurBase(a, t-1)))
  elseif InFlight(a, t) then
    ~ StateTrans(State(a, t-1), Location(Dest(a, t)));
Unknown Air Bases
Just add two more lines:
#AirBase ~ NumBasesPrior();
Location(b) ~ BaseLocPrior();
Example: traffic surveillance
Multiple distributed sensors
Uncertain, time-varying travel time
Prediction error ≫ object separation
Example: Citation Matching
[Lashkari et al 94] Collaborative Interface Agents,
Yezdi Lashkari, Max Metral, and Pattie Maes,
Proceedings of the Twelfth National Conference on
Articial Intelligence, MIT Press, Cambridge, MA,
1994.
Metral M. Lashkari, Y. and P. Maes. Collaborative
interface agents. In Conference of the American
Association for Artificial Intelligence, Seattle,
WA, August 1994.
Are these descriptions of the same object?
Core task in CiteSeer, Google Scholar
(Simplified) BLOG model
#Researcher ~ NumResearchersPrior();
Name(r) ~ NamePrior();
#Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r));
Title(p) ~ TitlePrior();
PubCited(c) ~ Uniform({Paper p});
Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));
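A forward-sampling sketch of this model in Python; the name and title pools and the character-dropping "noisy grammar" are invented placeholders for NamePrior, TitlePrior, and NoisyCitationGrammar.

import random

def sample_citations(n_citations=4):
    n_researchers = random.randint(1, 3)                    # #Researcher ~ prior
    names = [random.choice(["Lashkari", "Maes", "Metral"]) for _ in range(n_researchers)]
    papers = []                                             # #Paper(FirstAuthor = r)
    for name in names:
        for _ in range(random.randint(1, 2)):
            title = random.choice(["Collaborative Interface Agents", "Software Agents"])
            papers.append((name, title))
    citations = []
    for _ in range(n_citations):
        name, title = random.choice(papers)                 # PubCited(c) ~ Uniform({Paper p})
        text = f"{name}. {title}."                          # noisy rendering of Text(c):
        text = "".join(ch for ch in text if random.random() > 0.05)   # drop ~5% of chars
        citations.append(text)
    return citations

print(sample_citations())

Citation matching is the inverse: given only the noisy texts, infer how many papers exist and which citations co-refer.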
Citation Matching Results
[Chart: error (fraction of clusters not recovered correctly), 0 to 0.25, on four data sets (Reinforce, Face, Reason, Constraint), comparing Phrase Matching [Lawrence et al. 1999], Generative Model + MCMC [Pasula et al. 2002], and Conditional Random Field [Wellner et al. 2004]]
Four data sets of ~300-500 citations, referring to ~150-300 papers
Example: Sibyl attacks
Typically between 100 and 10,000 real entities
About 90% are honest and have one identity
Dishonest entities own between 10 and 1,000 identities
Transactions may occur between identities:
If two identities are owned by the same entity (sibyls), a transaction is highly likely
Otherwise, a transaction is less likely (depending on the honesty of each identity's owner)
An identity may recommend another after a transaction:
Sibyls with the same owner usually recommend each other
Otherwise, the probability of recommendation depends on the honesty of the two entities
#Entity ~ LogNormal[6.9, 2.3]();
Honest(x) ~ Boolean[0.9]();
#Identity(Owner = x) ~
  if Honest(x) then 1 else LogNormal[4.6, 2.3]();
Transaction(x, y) ~
  if Owner(x) = Owner(y) then SibylPrior()
  else TransactionPrior(Honest(Owner(x)), Honest(Owner(y)));
Recommends(x, y) ~
  if Transaction(x, y) then
    if Owner(x) = Owner(y) then Boolean[0.99]()
    else RecPrior(Honest(Owner(x)), Honest(Owner(y)));
Evidence: lots of transactions and recommendations, maybe some Honest(.) assertions
Query: Honest(x)
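To show what answering this query involves, here is a toy posterior computation in the spirit of the model (all numbers invented): two entities A and B with one identity each and different owners, conditioning on a single observed recommendation between their identities.

P_HONEST = 0.9
REC_PRIOR = {                       # P(recommend | honesty of the two entities)
    (True, True): 0.3, (True, False): 0.05,
    (False, True): 0.6, (False, False): 0.7,    # dishonest entities spam recommendations
}

num = den = 0.0
for hA in (True, False):            # enumerate the four honesty assignments
    for hB in (True, False):
        prior = (P_HONEST if hA else 1 - P_HONEST) * (P_HONEST if hB else 1 - P_HONEST)
        joint = prior * REC_PRIOR[hA, hB]       # condition on Recommends(a, b) = True
        den += joint
        if hA:
            num += joint
print("P(Honest(A) | Recommends(a, b)) =", num / den)   # ~0.80, down from the 0.9 prior

At realistic scale this enumeration is infeasible, which is why general-purpose inference (e.g., MCMC) over the model is the interesting part.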
Summary
Generative approach to machine learning
Can accommodate
strong prior knowledge
heterogeneous data
noise, artifacts
Vertically integrated probability models (not a pipeline) connect events, transmission, detection, and association