Learning curves for Gaussian process regression on random graphs

Matthew Urry & Peter Sollich
Gaussian Process Regression

- GPs are an online learning technique in which our a priori assumptions about the function (e.g. smoothness) are encoded by a Gaussian process with fixed covariance and mean functions.
- We aim to predict function values from example points D = {(i_μ, y_μ) | μ = 1, ..., N} given by a teacher, subject to noise.
- In this work we assume the simple case of Gaussian noise with variance σ² and a zero mean function.
- Predictions are then made using Bayes' rule (a minimal numerical sketch follows below):

      P(f | D) ∝ P(D | f) P(f)
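As a concrete illustration of the prediction step, here is a minimal numpy sketch of the GP posterior for a finite input space such as the nodes of a graph, assuming the zero mean function and Gaussian noise stated above; gp_posterior is an illustrative name, not code from the talk.

```python
import numpy as np

def gp_posterior(C, idx, y, sigma2):
    """Posterior mean and covariance of a zero-mean GP over a finite input space.

    C      : (V, V) prior covariance matrix over all nodes
    idx    : example node indices i_1, ..., i_N (repeats allowed)
    y      : noisy teacher outputs y_1, ..., y_N
    sigma2 : Gaussian noise variance sigma^2
    """
    N = len(idx)
    K = C[np.ix_(idx, idx)] + sigma2 * np.eye(N)   # Gram matrix of the examples plus noise
    k = C[:, idx]                                  # covariances between all nodes and examples
    mean = k @ np.linalg.solve(K, np.asarray(y))   # posterior mean at every node
    cov = C - k @ np.linalg.solve(K, k.T)          # posterior covariance over all nodes
    return mean, cov
```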
Learning on Random Graphs

- Learning in continuous spaces is relatively well understood [Opper 02, Sollich 98].
- Discrete spaces are less well studied and contain some interesting applications (e.g. social networks, metabolic pathways, chemical properties).
- We focus on GPs on random graphs (i.e. random regular or Erdos-Renyi graphs).
- We try to learn a function f : V → R, defined on a graph G(V, E) with edges E and nodes V.
- We use the random walk covariance function (a minimal numerical sketch of its construction follows below)

      C = \frac{1}{\kappa}\left[ a I + (1 - a)\, D^{-1/2} A D^{-1/2} \right]^p,   a ≥ 2,  p ≥ 0

- C is normalised by κ so that V^{-1} \sum_i C_{ii} = 1, making the average prior variance equal to 1.
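A minimal numpy sketch of this covariance construction, under the assumptions above (no isolated nodes, so D is invertible) and our reading of the extracted formula; the function name random_walk_kernel is ours, not from the talk.

```python
import numpy as np

def random_walk_kernel(A, a=2.0, p=10):
    """Random walk covariance C = (1/kappa) [a I + (1 - a) D^{-1/2} A D^{-1/2}]^p,
    normalised so that the average prior variance V^{-1} sum_i C_ii equals 1."""
    V = A.shape[0]
    d = A.sum(axis=1)                              # node degrees, assumed > 0
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M = D_inv_sqrt @ A @ D_inv_sqrt                # normalised adjacency matrix
    C = np.linalg.matrix_power(a * np.eye(V) + (1.0 - a) * M, p)
    kappa = np.trace(C) / V                        # average diagonal before normalisation
    return C / kappa
```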
Examples of functions from the prior

[Figure: sample functions drawn from the prior on random regular graphs with d = 3 and a = 2, for p = 1 (left panel) and p = 15 (right panel). A sampling sketch follows below.]
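A minimal sketch of how such prior samples could be generated, reusing the random_walk_kernel helper sketched above and assuming networkx is available for building the random regular graph; the graph size (500 nodes) is chosen arbitrarily, not taken from the figure.

```python
import networkx as nx
import numpy as np

# Hypothetical reproduction of the figure's setup: random regular graph, d = 3, a = 2.
rng = np.random.default_rng(0)
G = nx.random_regular_graph(d=3, n=500, seed=0)
A = nx.to_numpy_array(G)

for p in (1, 15):
    C = random_walk_kernel(A, a=2.0, p=p)                 # prior covariance, average variance 1
    f = rng.multivariate_normal(np.zeros(A.shape[0]), C)  # one sample function from the prior
    print(f"p = {p}: sample has std {f.std():.2f} across nodes")
```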
Learning Curves

- We are interested in the average performance of GPs over a particular class of functions and graphs.
- We look at the learning curve: the mean square error as a function of the number of examples N (a Monte Carlo sketch follows below),

      \epsilon_g = \left\langle \frac{1}{V} \sum_i (f_i - \bar{f}_i)^2 \right\rangle_{f|D,\, D,\,\mathrm{graphs}}

- We assume the matched case, where the posterior P(f | D) is also the posterior over underlying target functions.
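In the matched case the Bayes error equals the posterior variance averaged over nodes and data sets, so the learning curve can be estimated numerically without sampling teacher functions. Below is a minimal sketch for a fixed graph, reusing the conventions of the earlier snippets (the covariance C would come from random_walk_kernel above); names are illustrative.

```python
import numpy as np

def learning_curve_mc(C, sigma2, N_values, n_datasets=20, seed=0):
    """Monte Carlo estimate of the matched-case learning curve epsilon_g(N) for a fixed graph.

    In the matched case the Bayes error is the posterior variance, averaged here
    over nodes and over random draws of the example locations."""
    rng = np.random.default_rng(seed)
    V = C.shape[0]
    curve = []
    for N in N_values:
        errs = []
        for _ in range(n_datasets):
            idx = rng.integers(0, V, size=N)                # example nodes drawn uniformly
            K = C[np.ix_(idx, idx)] + sigma2 * np.eye(N)    # Gram matrix plus noise
            post_cov = C - C[:, idx] @ np.linalg.solve(K, C[idx, :])
            errs.append(np.diag(post_cov).mean())           # average posterior variance
        curve.append(np.mean(errs))
    return np.array(curve)
```

Sweeping ν = N/V and σ² in this way gives curves directly comparable to the predictions shown in the Results section.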
Approximating the Learning Curve

- Earlier work [Sollich '09] has shown that the eigenvalue approximations used in continuous spaces [Sollich '99, Opper '02] are accurate in the small and large example limits, but not in between the two.
- We have improved on these approximations for sparse graphs using the cavity method.
- Sparse connectivity is important so that we may assume the graph is treelike (required for the cavity method to converge to a sensible limit).
Constructing the Partition Function

- We shift f̄ → 0, which we can do because we are in the matched case.
- After doing this we can construct a generating partition function Z as

      Z = \int df\, \exp\!\left( -\tfrac{1}{2} f^T C^{-1} f \;-\; \frac{1}{2\sigma^2} \sum_{\mu=1}^{N} f_{i_\mu}^2 \;-\; \frac{\lambda}{2} \sum_i f_i^2 \right)

- This gives the learning curve (the λ-derivative step is spelled out below) as

      \epsilon_g = -\frac{2}{V} \lim_{\lambda \to 0} \frac{\partial}{\partial \lambda}\, \big\langle \log Z \big\rangle_{D,\,\mathrm{graphs}}
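The slides pass directly from the definition of Z to the λ-derivative formula; the following short worked step is our own filling-in and follows directly from the Gaussian form of Z above.

```latex
% Differentiating under the integral sign; <.> denotes the average under the Gaussian
% measure defined by Z, which becomes the posterior P(f|D) as lambda -> 0:
\[
  -\frac{\partial}{\partial \lambda}\log Z
   = \frac{1}{Z}\int df\;\Big(\tfrac{1}{2}\sum_i f_i^2\Big)
     \exp\!\Big(-\tfrac{1}{2} f^T C^{-1} f
                - \tfrac{1}{2\sigma^2}\sum_{\mu=1}^{N} f_{i_\mu}^2
                - \tfrac{\lambda}{2}\sum_i f_i^2\Big)
   = \tfrac{1}{2}\sum_i \big\langle f_i^2 \big\rangle
\]
% Averaging over data sets and graphs, and using that f has zero posterior mean after the shift:
\[
  -\frac{2}{V}\lim_{\lambda\to 0}\frac{\partial}{\partial\lambda}
     \big\langle \log Z \big\rangle_{D,\mathrm{graphs}}
   = \Big\langle \tfrac{1}{V}\sum_i \big\langle (f_i-\bar f_i)^2 \big\rangle \Big\rangle_{D,\mathrm{graphs}}
   = \epsilon_g
\]
```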
Constructing the Partition Function (continued)

- We begin by assuming the graph and data are fixed.
- Z is still not in a suitable form for the cavity method.
- Introduce a counter n_i of the number of examples each node has seen:

      Z = \int df\, \exp\!\left( -\tfrac{1}{2} f^T C^{-1} f \;-\; \frac{1}{2\sigma^2} \sum_i n_i f_i^2 \;-\; \frac{\lambda}{2} \sum_i f_i^2 \right)

- Rewriting via a Fourier transform turns the C^{-1} into a C, which can be written down explicitly:

      Z \propto \int dh\, \exp\!\left( -\tfrac{1}{2} h^T C h \;-\; \tfrac{1}{2}\, h^T \,\mathrm{diag}\!\left\{ \frac{1}{n_i/\sigma^2 + \lambda} \right\} h \right)
Constructing the Partition Function (continued)

- C may still contain interacting terms between nodes that are not neighbours.
- Expand the p-th power in C to get C = \sum_{q=0}^{p} c_q \,(D^{-1/2} A D^{-1/2})^q with c_q = \frac{1}{\kappa}\binom{p}{q}(1-a)^q a^{p-q}.
- Let h^q = D^{-1/2} A D^{-1/2}\, h^{q-1} and h^0 = h, with enforcement variables \hat{h}^q. Then

      Z \propto \int \prod_q dh^q \prod_q d\hat{h}^q \;\prod_i \phi_i \prod_{(ij)} \psi_{ij}

      \phi_i = \exp\!\left( -\frac{1}{2} \sum_q c_q\, d_i\, h_i^0 h_i^q \;-\; \frac{1}{2}\, \frac{d_i (h_i^0)^2}{n_i/\sigma^2 + \lambda} \;+\; i \sum_{q=1}^{p} d_i\, \hat{h}_i^q h_i^q \right)

      \psi_{ij} = \exp\!\left( -i \sum_{q=1}^{p} \big( \hat{h}_i^q h_j^{q-1} + \hat{h}_j^q h_i^{q-1} \big) \right)
Cavity Equations

- The generalisation error can be rewritten (for fixed D and graph) as

      \epsilon_g = \lim_{\lambda \to 0} \frac{1}{V} \sum_i \frac{1}{n_i/\sigma^2 + \lambda}
                   \left( 1 - \frac{d_i \langle (h_i^0)^2 \rangle}{n_i/\sigma^2 + \lambda} \right)        (1)

- To solve this we need the marginals of h_i^0.
- These are derived using the cavity method (not repeated here).
- The cavity equations are solved self-consistently by zero-mean complex Gaussians.
- The variance update equations for the cavity distributions P_j^{(i)}(h_j, \hat{h}_j) are

      V_j^{(i)} = \Big( A_j - \sum_{k \in N(j) \setminus i} X V_k^{(j)} X \Big)^{-1}        (2)
Cavity Equations (continued)

- To generalise these equations to graph ensembles:
- Let the degree distribution of the graph be p(d).
- Pick examples uniformly, so that n_i ∼ Poisson(ν) in the large N limit.
- This gives the update equation (a population-dynamics sketch follows below)

      W(V) = \left\langle \sum_d \frac{p(d)\, d}{\bar{d}} \int \prod_{k=1}^{d-1} dV_k\, W(V_k)\;
             \delta\!\left( V - \Big( A - \sum_{k=1}^{d-1} X V_k X \Big)^{-1} \right) \right\rangle_n

- Finally, the learning curve approximation is

      \epsilon_g = \left\langle \sum_d p(d) \int \prod_{k=1}^{d} dV_k\, W(V_k)\;
                   \frac{1}{n/\sigma^2 + d\,(M^{-1})_{00}} \right\rangle_n
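The self-consistent equation for W(V) is the kind of distributional fixed point usually solved by population dynamics. The sketch below shows only the structure of such a solver, with heavy caveats: the matrices A (which depends on n, d, λ and the kernel coefficients c_q) and X are defined in the accompanying paper, not in this transcript, so they appear as hypothetical placeholders build_A and X; we also read M as the full-neighbourhood analogue of the cavity precision in (2), i.e. M = A − Σ_{k=1}^{d} X V_k X. Treat this as a schematic, not a drop-in implementation.

```python
import numpy as np

def population_dynamics(p_of_d, nu, sigma2, build_A, X, dim,
                        pop_size=1000, sweeps=200, seed=0):
    """Schematic population-dynamics solver for the cavity distribution W(V).

    p_of_d  : dict {degree d: probability p(d)} for the graph ensemble
    nu      : mean number of examples per node, n_i ~ Poisson(nu)
    build_A : hypothetical callable (n, d) -> (dim, dim) matrix A   [defined in the paper]
    X       : hypothetical (dim, dim) coupling matrix               [defined in the paper]
    """
    rng = np.random.default_rng(seed)
    degrees = np.array(sorted(p_of_d))
    probs = np.array([p_of_d[d] for d in degrees])
    dbar = float(np.sum(probs * degrees))
    edge_probs = probs * degrees / dbar                  # degree-biased weights p(d) d / dbar
    zero = np.zeros((dim, dim))

    pop = [np.eye(dim) for _ in range(pop_size)]         # population representing W(V)
    for _ in range(sweeps * pop_size):
        d = int(rng.choice(degrees, p=edge_probs))       # degree of the updated cavity node
        n = int(rng.poisson(nu))                         # examples seen by that node
        S = sum((X @ pop[rng.integers(pop_size)] @ X for _ in range(d - 1)), zero)
        pop[rng.integers(pop_size)] = np.linalg.inv(build_A(n, d) - S)   # update, cf. (2)

    # Learning-curve estimate: average of 1 / (n/sigma^2 + d (M^{-1})_{00}),
    # with M read as the full-neighbourhood analogue of the cavity precision.
    samples = []
    for _ in range(10 * pop_size):
        d = int(rng.choice(degrees, p=probs))            # node degree now distributed as p(d)
        n = int(rng.poisson(nu))
        S = sum((X @ pop[rng.integers(pop_size)] @ X for _ in range(d)), zero)
        M = build_A(n, d) - S
        samples.append(1.0 / (n / sigma2 + d * np.linalg.inv(M)[0, 0]))
    return float(np.mean(samples))
```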
Random Regular Ensemble, p = 1, a = 2 and d = 3

[Figure: learning curves ε versus ν = n/V (both axes logarithmic, ν from 0.01 to 10, ε from 10⁰ down to 10⁻⁶) for a random regular graph with V = 500, d = 3, a = 2, p = 1, at noise levels σ² = 0.1, 0.01, 0.001 and 0.0001.]
Random Regular Ensemble, p = 10, a = 2 and d = 3

[Figure: learning curves ε versus ν = n/V (both axes logarithmic) for a random regular graph with V = 500, d = 3, a = 2, p = 10, at noise levels σ² = 0.1, 0.01, 0.001 and 0.0001.]
Erdos-Renyi Ensemble, p = 10, a = 2 and c = 3

[Figure: learning curves ε versus ν = n/V (both axes logarithmic) for a Poisson (Erdos-Renyi) graph without degree-0 nodes, V = 500, c = 3, a = 2, p = 10, at noise levels σ² = 0.1, 0.01 and 0.001.]
Conclusions and Further Work

Conclusions

We have shown that:

- The cavity method gives far better predictions of the learning curves for sparse graphs than the earlier eigenvalue approximations.
- Interestingly, in the thermodynamic limit the cavity prediction is exact for a wide range of graph ensembles; by contrast, in continuous spaces exact results are available in only a few cases.

Further Work

We would like to work on:

- A different choice of normalisation, perhaps C_ii = 1 for all i.
- Model mismatch, where the teacher's posterior distribution does not match the student's.
- Graph mismatch.
- Evolving graphs.