Learning curves for Gaussian process regression on random graphs

Matthew Urry & Peter Sollich
Gaussian Process Regression

- GPs are an online learning technique in which our a priori assumptions about the function (e.g. smoothness) are encoded by a Gaussian process with fixed covariance and mean functions.
- We aim to predict function values from example points D = {(i_μ, y_μ) | μ = 1, ..., N} given by a teacher, subject to noise.
- In this work we assume the simple case of Gaussian noise with variance σ² and a zero mean function.
- Predictions are then made using Bayes' rule (a minimal numerical sketch follows below):

      P(f | D) ∝ P(D | f) P(f)
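As a concrete illustration of the prediction step, here is a minimal numpy sketch of the GP posterior for a finite input space such as the nodes of a graph, assuming the zero mean function and Gaussian noise stated above; gp_posterior is an illustrative name, not code from the talk.

```python
import numpy as np

def gp_posterior(C, idx, y, sigma2):
    """Posterior mean and covariance of a zero-mean GP over a finite input space.

    C      : (V, V) prior covariance matrix over all nodes
    idx    : example node indices i_1, ..., i_N (repeats allowed)
    y      : noisy teacher outputs y_1, ..., y_N
    sigma2 : Gaussian noise variance sigma^2
    """
    N = len(idx)
    K = C[np.ix_(idx, idx)] + sigma2 * np.eye(N)   # Gram matrix of the examples plus noise
    k = C[:, idx]                                  # covariances between all nodes and examples
    mean = k @ np.linalg.solve(K, np.asarray(y))   # posterior mean at every node
    cov = C - k @ np.linalg.solve(K, k.T)          # posterior covariance over all nodes
    return mean, cov
```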
Learning on Random Graphs

- Learning in continuous spaces is relatively well understood [Opper 02, Sollich 98].
- Discrete spaces are less well studied and contain some interesting applications (e.g. social networks, metabolic pathways, chemical properties).
- We focus on GPs on random graphs (i.e. random regular or Erdos-Renyi graphs).
- We try to learn a function f : V → R, defined on a graph G(V, E) with edges E and nodes V.
- We use the random walk covariance function (a minimal numerical sketch of its construction follows below)

      C = \frac{1}{\kappa}\left[ a I + (1 - a)\, D^{-1/2} A D^{-1/2} \right]^p,   a ≥ 2,  p ≥ 0

- C is normalised by κ so that V^{-1} \sum_i C_{ii} = 1, making the average prior variance equal to 1.
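A minimal numpy sketch of this covariance construction, under the assumptions above (no isolated nodes, so D is invertible) and our reading of the extracted formula; the function name random_walk_kernel is ours, not from the talk.

```python
import numpy as np

def random_walk_kernel(A, a=2.0, p=10):
    """Random walk covariance C = (1/kappa) [a I + (1 - a) D^{-1/2} A D^{-1/2}]^p,
    normalised so that the average prior variance V^{-1} sum_i C_ii equals 1."""
    V = A.shape[0]
    d = A.sum(axis=1)                              # node degrees, assumed > 0
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M = D_inv_sqrt @ A @ D_inv_sqrt                # normalised adjacency matrix
    C = np.linalg.matrix_power(a * np.eye(V) + (1.0 - a) * M, p)
    kappa = np.trace(C) / V                        # average diagonal before normalisation
    return C / kappa
```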
Examples of functions from the prior

[Figure: sample functions drawn from the prior on random regular graphs with d = 3 and a = 2, for p = 1 (left panel) and p = 15 (right panel). A sampling sketch follows below.]
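A minimal sketch of how such prior samples could be generated, reusing the random_walk_kernel helper sketched above and assuming networkx is available for building the random regular graph; the graph size (500 nodes) is chosen arbitrarily, not taken from the figure.

```python
import networkx as nx
import numpy as np

# Hypothetical reproduction of the figure's setup: random regular graph, d = 3, a = 2.
rng = np.random.default_rng(0)
G = nx.random_regular_graph(d=3, n=500, seed=0)
A = nx.to_numpy_array(G)

for p in (1, 15):
    C = random_walk_kernel(A, a=2.0, p=p)                 # prior covariance, average variance 1
    f = rng.multivariate_normal(np.zeros(A.shape[0]), C)  # one sample function from the prior
    print(f"p = {p}: sample has std {f.std():.2f} across nodes")
```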
Learning Curves

- We are interested in the average performance of GPs over a particular class of functions and graphs.
- We look at the learning curve: the mean square error as a function of the number of examples N (a Monte Carlo sketch follows below),

      \epsilon_g = \left\langle \frac{1}{V} \sum_i (f_i - \bar{f}_i)^2 \right\rangle_{f|D,\, D,\,\mathrm{graphs}}

- We assume the matched case, where the posterior P(f | D) is also the posterior over underlying target functions.
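In the matched case the Bayes error equals the posterior variance averaged over nodes and data sets, so the learning curve can be estimated numerically without sampling teacher functions. Below is a minimal sketch for a fixed graph, reusing the conventions of the earlier snippets (the covariance C would come from random_walk_kernel above); names are illustrative.

```python
import numpy as np

def learning_curve_mc(C, sigma2, N_values, n_datasets=20, seed=0):
    """Monte Carlo estimate of the matched-case learning curve epsilon_g(N) for a fixed graph.

    In the matched case the Bayes error is the posterior variance, averaged here
    over nodes and over random draws of the example locations."""
    rng = np.random.default_rng(seed)
    V = C.shape[0]
    curve = []
    for N in N_values:
        errs = []
        for _ in range(n_datasets):
            idx = rng.integers(0, V, size=N)                # example nodes drawn uniformly
            K = C[np.ix_(idx, idx)] + sigma2 * np.eye(N)    # Gram matrix plus noise
            post_cov = C - C[:, idx] @ np.linalg.solve(K, C[idx, :])
            errs.append(np.diag(post_cov).mean())           # average posterior variance
        curve.append(np.mean(errs))
    return np.array(curve)
```

Sweeping ν = N/V and σ² in this way gives curves directly comparable to the predictions shown in the Results section.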
Approximating the Learning Curve

- Earlier work [Sollich '09] has shown that the eigenvalue approximations used in continuous spaces [Sollich '99, Opper '02] are accurate in the small and large example limits, but not in between the two.
- We have improved on these approximations for sparse graphs using the cavity method.
- Sparse connectivity is important so that we may assume the graph is treelike (required for the cavity method to converge to a sensible limit).
Constructing the Partition Function

- We shift f̄ → 0, which we can do because we are in the matched case.
- After doing this we can construct a generating partition function Z as

      Z = \int df\, \exp\!\left( -\tfrac{1}{2} f^T C^{-1} f \;-\; \frac{1}{2\sigma^2} \sum_{\mu=1}^{N} f_{i_\mu}^2 \;-\; \frac{\lambda}{2} \sum_i f_i^2 \right)

- This gives the learning curve (the λ-derivative step is spelled out below) as

      \epsilon_g = -\frac{2}{V} \lim_{\lambda \to 0} \frac{\partial}{\partial \lambda}\, \big\langle \log Z \big\rangle_{D,\,\mathrm{graphs}}
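The slides pass directly from the definition of Z to the λ-derivative formula; the following short worked step is our own filling-in and follows directly from the Gaussian form of Z above.

```latex
% Differentiating under the integral sign; <.> denotes the average under the Gaussian
% measure defined by Z, which becomes the posterior P(f|D) as lambda -> 0:
\[
  -\frac{\partial}{\partial \lambda}\log Z
   = \frac{1}{Z}\int df\;\Big(\tfrac{1}{2}\sum_i f_i^2\Big)
     \exp\!\Big(-\tfrac{1}{2} f^T C^{-1} f
                - \tfrac{1}{2\sigma^2}\sum_{\mu=1}^{N} f_{i_\mu}^2
                - \tfrac{\lambda}{2}\sum_i f_i^2\Big)
   = \tfrac{1}{2}\sum_i \big\langle f_i^2 \big\rangle
\]
% Averaging over data sets and graphs, and using that f has zero posterior mean after the shift:
\[
  -\frac{2}{V}\lim_{\lambda\to 0}\frac{\partial}{\partial\lambda}
     \big\langle \log Z \big\rangle_{D,\mathrm{graphs}}
   = \Big\langle \tfrac{1}{V}\sum_i \big\langle (f_i-\bar f_i)^2 \big\rangle \Big\rangle_{D,\mathrm{graphs}}
   = \epsilon_g
\]
```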
Constructing the Partition Function (continued)

- We begin by assuming the graph and data are fixed.
- Z is still not in a suitable form for the cavity method.
- Introduce a counter n_i of the number of examples each node has seen:

      Z = \int df\, \exp\!\left( -\tfrac{1}{2} f^T C^{-1} f \;-\; \frac{1}{2\sigma^2} \sum_i n_i f_i^2 \;-\; \frac{\lambda}{2} \sum_i f_i^2 \right)

- Rewriting via a Fourier transform turns the C^{-1} into a C, which can be written down explicitly:

      Z \propto \int dh\, \exp\!\left( -\tfrac{1}{2} h^T C h \;-\; \tfrac{1}{2}\, h^T \,\mathrm{diag}\!\left\{ \frac{1}{n_i/\sigma^2 + \lambda} \right\} h \right)
Constructing the Partition Function (continued)

- C may still contain interacting terms between nodes that are not neighbours.
- Expand the p-th power in C to get C = \sum_{q=0}^{p} c_q \,(D^{-1/2} A D^{-1/2})^q with c_q = \frac{1}{\kappa}\binom{p}{q}(1-a)^q a^{p-q}.
- Let h^q = D^{-1/2} A D^{-1/2}\, h^{q-1} and h^0 = h, with enforcement variables \hat{h}^q. Then

      Z \propto \int \prod_q dh^q \prod_q d\hat{h}^q \;\prod_i \phi_i \prod_{(ij)} \psi_{ij}

      \phi_i = \exp\!\left( -\frac{1}{2} \sum_q c_q\, d_i\, h_i^0 h_i^q \;-\; \frac{1}{2}\, \frac{d_i (h_i^0)^2}{n_i/\sigma^2 + \lambda} \;+\; i \sum_{q=1}^{p} d_i\, \hat{h}_i^q h_i^q \right)

      \psi_{ij} = \exp\!\left( -i \sum_{q=1}^{p} \big( \hat{h}_i^q h_j^{q-1} + \hat{h}_j^q h_i^{q-1} \big) \right)
Cavity Equations

- The generalisation error can be rewritten (for fixed D and graph) as

      \epsilon_g = \lim_{\lambda \to 0} \frac{1}{V} \sum_i \frac{1}{n_i/\sigma^2 + \lambda}
                   \left( 1 - \frac{d_i \langle (h_i^0)^2 \rangle}{n_i/\sigma^2 + \lambda} \right)        (1)

- To solve this we need the marginals of h_i^0.
- These are derived using the cavity method (not repeated here).
- The cavity equations are solved self-consistently by zero-mean complex Gaussians.
- The variance update equations for the cavity distributions P_j^{(i)}(h_j, \hat{h}_j) are

      V_j^{(i)} = \Big( A_j - \sum_{k \in N(j) \setminus i} X V_k^{(j)} X \Big)^{-1}        (2)
Cavity Equations (continued)

- To generalise these equations to graph ensembles:
- Let the degree distribution of the graph be p(d).
- Pick examples uniformly, so that n_i ∼ Poisson(ν) in the large N limit.
- This gives the update equation (a population-dynamics sketch follows below)

      W(V) = \left\langle \sum_d \frac{p(d)\, d}{\bar{d}} \int \prod_{k=1}^{d-1} dV_k\, W(V_k)\;
             \delta\!\left( V - \Big( A - \sum_{k=1}^{d-1} X V_k X \Big)^{-1} \right) \right\rangle_n

- Finally, the learning curve approximation is

      \epsilon_g = \left\langle \sum_d p(d) \int \prod_{k=1}^{d} dV_k\, W(V_k)\;
                   \frac{1}{n/\sigma^2 + d\,(M^{-1})_{00}} \right\rangle_n
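The self-consistent equation for W(V) is the kind of distributional fixed point usually solved by population dynamics. The sketch below shows only the structure of such a solver, with heavy caveats: the matrices A (which depends on n, d, λ and the kernel coefficients c_q) and X are defined in the accompanying paper, not in this transcript, so they appear as hypothetical placeholders build_A and X; we also read M as the full-neighbourhood analogue of the cavity precision in (2), i.e. M = A − Σ_{k=1}^{d} X V_k X. Treat this as a schematic, not a drop-in implementation.

```python
import numpy as np

def population_dynamics(p_of_d, nu, sigma2, build_A, X, dim,
                        pop_size=1000, sweeps=200, seed=0):
    """Schematic population-dynamics solver for the cavity distribution W(V).

    p_of_d  : dict {degree d: probability p(d)} for the graph ensemble
    nu      : mean number of examples per node, n_i ~ Poisson(nu)
    build_A : hypothetical callable (n, d) -> (dim, dim) matrix A   [defined in the paper]
    X       : hypothetical (dim, dim) coupling matrix               [defined in the paper]
    """
    rng = np.random.default_rng(seed)
    degrees = np.array(sorted(p_of_d))
    probs = np.array([p_of_d[d] for d in degrees])
    dbar = float(np.sum(probs * degrees))
    edge_probs = probs * degrees / dbar                  # degree-biased weights p(d) d / dbar
    zero = np.zeros((dim, dim))

    pop = [np.eye(dim) for _ in range(pop_size)]         # population representing W(V)
    for _ in range(sweeps * pop_size):
        d = int(rng.choice(degrees, p=edge_probs))       # degree of the updated cavity node
        n = int(rng.poisson(nu))                         # examples seen by that node
        S = sum((X @ pop[rng.integers(pop_size)] @ X for _ in range(d - 1)), zero)
        pop[rng.integers(pop_size)] = np.linalg.inv(build_A(n, d) - S)   # update, cf. (2)

    # Learning-curve estimate: average of 1 / (n/sigma^2 + d (M^{-1})_{00}),
    # with M read as the full-neighbourhood analogue of the cavity precision.
    samples = []
    for _ in range(10 * pop_size):
        d = int(rng.choice(degrees, p=probs))            # node degree now distributed as p(d)
        n = int(rng.poisson(nu))
        S = sum((X @ pop[rng.integers(pop_size)] @ X for _ in range(d)), zero)
        M = build_A(n, d) - S
        samples.append(1.0 / (n / sigma2 + d * np.linalg.inv(M)[0, 0]))
    return float(np.mean(samples))
```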
Random Regular Ensemble, p = 1, a = 2 and d = 3

[Figure: learning curves ε versus ν = n/V (both axes logarithmic, ν from 0.01 to 10, ε from 10⁰ down to 10⁻⁶) for a random regular graph with V = 500, d = 3, a = 2, p = 1, at noise levels σ² = 0.1, 0.01, 0.001 and 0.0001.]
Random Regular Ensemble, p = 10, a = 2 and d = 3

[Figure: learning curves ε versus ν = n/V (both axes logarithmic) for a random regular graph with V = 500, d = 3, a = 2, p = 10, at noise levels σ² = 0.1, 0.01, 0.001 and 0.0001.]
Erdos-Renyi Ensemble, p = 10, a = 2 and c = 3

[Figure: learning curves ε versus ν = n/V (both axes logarithmic) for a Poisson (Erdos-Renyi) graph without degree-0 nodes, V = 500, c = 3, a = 2, p = 10, at noise levels σ² = 0.1, 0.01 and 0.001.]
Conclusions and Further Work

Conclusions

We have shown that:

- The cavity method gives far better predictions of the learning curves for sparse graphs than the earlier eigenvalue approximations.
- Interestingly, in the thermodynamic limit the cavity prediction is exact for a wide range of graph ensembles; by contrast, in continuous spaces exact results are available in only a few cases.

Further Work

We would like to work on:

- A different choice of normalisation, perhaps C_ii = 1 for all i.
- Model mismatch, where the teacher's posterior distribution does not match the student's.
- Graph mismatch.
- Evolving graphs.