Information Theory and Learning
Tony Bell, Helen Wills Neuroscience Institute, University of California at Berkeley

One input, one output (deterministic, sensory only). Infomax: match the input distribution to the non-linearity, giving a gradient-descent learning rule to maximise the transferred information.

Examples of score functions: logistic and Laplacian. In stochastic gradient algorithms (online training), we dispense with the ensemble averages, giving the rule for a single training example and a Laplacian 'prior'.

The same theory holds in multiple dimensions: fire vectors uniformly into the unit hypercube, where the absolute determinant of the Jacobian matrix measures how stretchy the mapping is, for square or overcomplete transforms. Undercomplete transformations are not invertible, and require a more complex formula.

Post-multiplying the gradient by a positive definite transform rescales it optimally (the Natural Gradient - Amari), giving a pleasantly simple form. Decorrelation is not enough: the score function f gives higher-order statistics through its Taylor expansion.

Infomax/ICA on image patches: learn co-ordinates for natural scenes. In this linear generative model, we want u = s: recover the independent sources. After training, we calculate A = W^-1 and plot its columns. For 16x16 images, we get 256 bases (shown for f from the logistic, Laplacian, and Gaussian densities).

But this does not actually make the neurons independent. Many joint densities p(u1,u2) are decorrelated but still radially symmetric: they factorise in polar co-ordinates, but not in cartesian, unless they are Gaussian. This happens when cells have similar position, spatial frequency, and orientation selectivity, but different phase. Dependent filters can combine to make non-linear complex cells (oriented but phase-insensitive). 'Dependent' Component Analysis.
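The natural-gradient learning rule summarised above, ΔW ∝ (I + f(u)uᵀ)W with u = Wx and the logistic score f(u) = 1 − 2g(u) (g the logistic sigmoid), can be sketched in a few lines. This is a minimal illustration, not the original code: the mixing matrix, learning rate, batch size, and number of epochs are arbitrary choices for the demo.

```python
# Minimal sketch of natural-gradient Infomax/ICA (Bell & Sejnowski / Amari):
# u = W x,  delta-W = lr * (I + f(u) u^T) W,  f(u) = 1 - 2*sigmoid(u).
import numpy as np

rng = np.random.default_rng(0)

# Two independent super-Gaussian (Laplacian) sources, mixed linearly: x = A s.
n, T = 2, 20000
S = rng.laplace(size=(n, T))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

W = np.eye(n)
lr, batch = 0.02, 200
for epoch in range(30):
    for i in range(0, T, batch):
        x = X[:, i:i + batch]
        u = W @ x
        fu = 1.0 - 2.0 / (1.0 + np.exp(-u))          # logistic score f(u)
        # natural-gradient update, averaged over the mini-batch
        W += lr * (np.eye(n) + fu @ u.T / batch) @ W

# After training, W A should be close to a scaled permutation matrix,
# i.e. each recovered component u_i picks out one source s_j.
P = W @ A
print(np.round(P, 2))
```

With the logistic score, the rule separates super-Gaussian sources such as these Laplacian ones; checking that W A is close to a scaled permutation is the standard success criterion, since ICA can only recover sources up to permutation and scale.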
First, the maximum likelihood framework. What we have been doing is: Infomax = Maximum Likelihood = Minimum KL Divergence. We are fitting a model to the data. But a much more general model is the 'energy-based' model (Hinton), with a sum of functions on subsets of the cells: 'Dependent' Component Analysis.

For the completely general model, the learning rule has a second term which reduces to -I (the identity) in the case of ICA. Unfortunately this term involves an intractable integral over the model q. Nonetheless, we can still work with all dependency models which are non-loopy hypergraphs (as opposed to loopy ones): learn as before, but with a modified score function.

For example, we can split the space into subspaces such that the cells are independent between subspaces but dependent within them.

We now show a sequence of symmetry-breakings occurring as we move from training, on images, a model which is one big 256-dimensional hyperball down to a model which is 64 four-dimensional hyperballs: the logistic density with 1, 2, 4, 8, 16, 32, and 64 subspaces.

Topographic ICA: arrange the cells in a 2D map with a statistical model q constructed from overlapping subsets, e.g. overlapping 4x4 neighbourhoods as the hyperedges of the hypergraph. This is a loopy hypergraph, an un-normalised model, but it still gives a nice result. That was from Hyvarinen & Hoyer; here's a similar one from Osindero & Hinton.

Well, we did get somewhere: Conclusion.
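The "modified score function" for the subspace case can be made concrete under one common assumption (as in Hyvarinen & Hoyer's independent subspace analysis, not spelled out in the slides): each subspace S has a radially symmetric density p(u_S) ∝ exp(−‖u_S‖), so the score for cell i in S is ∂ log p/∂u_i = −u_i/‖u_S‖. A small sketch:

```python
# Hypothetical sketch of a subspace score function, assuming a radially
# symmetric density p(u_S) ~ exp(-||u_S||) within each subspace S:
# the score for cell i in subspace S is f_i(u) = -u_i / ||u_S||.
import numpy as np

def subspace_score(u, k):
    """Score vector for cells grouped into contiguous subspaces of size k.

    u : (n,) array of activations, with n divisible by k
    returns an (n,) array: -u_i / ||u_S(i)|| for each cell i
    """
    n = u.shape[0]
    groups = u.reshape(n // k, k)                        # one row per subspace
    norms = np.linalg.norm(groups, axis=1, keepdims=True)
    return (-groups / np.maximum(norms, 1e-12)).reshape(n)

u = np.array([3.0, 4.0, 1.0, 0.0])    # two subspaces of size 2
print(subspace_score(u, 2))           # scores: -0.6, -0.8, -1.0, 0.0
```

Note that with k = 1 this reduces to −sign(u), the ordinary Laplacian score, so plain ICA is the degenerate one-cell-per-subspace case of the sequence of models above.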
We seem to have an information-theoretic explanation of some properties of area V1 of visual cortex:
- simple cells (Olshausen & Field; Bell & Sejnowski)
- complex cells (Hyvarinen & Hoyer)
- topographic maps with singularities (Hyvarinen & Hoyer)
- colour receptive fields (Doi & Lewicki)
- direction sensitivity (van Hateren & Ruderman)

But we are stuck on:
- the gradient of the partition function
- still working with rate models, not spiking neurons
- no top-down feedback
- no sensory-motor loop (all passive world modelling)

References. The references for all the work in these three talks will be forwarded separately. If you don't have access to them, email me at [email protected] and I'll send them to you.