Designing High-Capacity Neural Networks for Storing, Retrieving and Forgetting Patterns in Real-Time

Dmitry O. Gorodnichy
IMMS, Cybernetics Center of the Ukrainian Academy of Sciences, Kiev, Ukraine
Dept. of Computing Science, University of Alberta, Edmonton, Canada
http://www.cv.iit.nrc.ca/~dmitry/pinn

As presented at IJCNN'99, Washington DC, July 12-17, 1999

Outline
• What is memory? What is good memory?
• The Pseudo-Inverse Neural Network is a tool to build it!
• Setting the paradigm for designing non-iterative high-capacity real-time adaptive systems:
  • Increasing the attraction radius of the network
  • Desaturating the network
  • A solution for cycles and for fast retrieval
• Just some results
• Processing a stream of data - dynamic desaturation
• Conclusions. Some food for thought

What is Memory?
Memory is that which stores data and retrieves it.

What do we want?
• To store
  • as much as possible (for a given amount of space)
  • as fast as possible
• To retrieve
  • from as much noise as possible
  • as fast as possible
• To continuously update the contents of the memory as new data arrive
We are interested in theoretically grounded solutions.

Neural Network - a tool to do that
• A fully connected network of N neurons Yi, which evolves in time according to the update rule
  Yi(t+1) = sign( sum_j Cij Yj(t) ),
until it reaches a stable state (attractor).
• Patterns can be stored as attractors.
-> Non-iterative learning - fast learning
-> Synchronous dynamics - fast retrieval
How do we find the best weight matrix C?

Pseudo-Inverse Learning Rule
• Obtained from the stability condition CV = V [Personnaz'85, Reznik'93]:
  C = VV+
(Widrow-Hoff's rule is its approximation.)
• Optical and hardware implementations exist.
• Its dynamics can be studied theoretically:
-> it can retrieve up to 0.3N patterns (the Hebbian rule retrieves only 0.13N patterns).

Attraction Radius
The attraction radius (AR) tells us how good the retrieval is.
• The direct AR can be calculated theoretically [Gorodnichy'95]:
-> the weights C determine the AR ...
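The pseudo-inverse rule can be sketched in a few lines of numpy. This is an illustrative reconstruction rather than the authors' code; the sizes N and M and the random bipolar patterns are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 64, 12                             # neurons and stored patterns, M < 0.3N
V = rng.choice([-1.0, 1.0], size=(N, M))  # bipolar patterns as the columns of V

# Pseudo-inverse (projection) learning rule: C = V V^+.
# It solves the stability condition C V = V exactly, so every stored
# pattern is a fixed point of the synchronous dynamics Y(t+1) = sign(C Y(t)).
C = V @ np.linalg.pinv(V)

print(np.allclose(C @ V, V))              # stored patterns are stable
print(np.all(np.sign(C @ V) == V))        # and survive the sign nonlinearity
```

`np.linalg.pinv` computes V+ via the SVD; for M well below N, random bipolar patterns almost surely have full column rank, so C is the orthogonal projector onto the span of the stored patterns.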
-> … and the weights satisfy the stability condition given above.
• The indirect AR can be estimated by Monte-Carlo simulations.

Desaturation of the network
When there are too many patterns in memory, the network gets saturated:
• There are too many spurious local attractors [Reznik'93].
• Global attractors are never reached.
Solution [Gorodnichy'95&96]: desaturate the network by partially reducing the self-connections:
  Cii = D * Cii,  0 < D < 1
• Desaturation:
-> preserves the main attractors
-> decreases the number of static spurious attractors
-> makes the network more flexible (increases the number of iterations)
-> drastically increases the attraction radius [Gorodnichy&Reznik'97]
But what about cycles (dynamic spurious attractors)?

Increase of AR with Desaturation
(Plots of the indirect AR and the direct AR, both of which grow as the network is desaturated.)

Dynamics of the network
The behaviour of the network is governed by its energy functions.
• Cycles are possible when D < 1.
• However:
-> they are few when D > 0.1 [Gorodnichy&Reznik'97]
-> they are detected automatically

Update flow neuro-processing [Gorodnichy&Reznik'94]
"Process only those neurons which change during the evolution" - i.e., instead of computing all N products for every neuron, do only a few of them:
-> very fast (as only a few neurons actually change in one iteration)
-> detects cycles automatically
-> suitable for parallel implementation

Dealing with a stream of data
Dynamic desaturation:
-> maintains a capacity of 0.2N (with complete retrieval)
-> allows data to be stored in real time (no need for iterative learning methods!)
-> provides a means of forgetting obsolete data
-> is the basis for the design of adaptive filters
-> gives new insights into how the brain works
-> is a ground for revising the traditional learning theory
This is the Neurocomputer designed at the IMMS of the Cybernetics Center of the Ukrainian NAS.

Conclusions
• The best performance of Hopfield-like networks is achieved with the desaturated pseudo-inverse learning rule:
  C = VV+,  Cii = D * Cii,  D = 0.15
E.g.
complete retrieval from 8% noise for M = 0.5N patterns, and from 2% noise for M = 0.7N patterns.
• The basis for non-iterative learning (to replace traditional iterative learning methods) is set. This basis is dynamic desaturation, which allows one to build real-time adaptive systems.
• The update flow neuro-processing technique makes retrieval very fast. It also resolves the issue of spurious dynamic attractors.
• Free code for the Pseudo-Inverse Memory is available! (See our web site.)
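The desaturated rule and an update-flow-style retrieval loop can be sketched together in numpy. This is a hedged illustration under my own assumptions (sizes, seed, and the `retrieve` helper are mine), not the authors' Neurocomputer code; it uses D = 0.15 and the load M = 0.5N from the conclusions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, D = 100, 50, 0.15                    # M = 0.5N patterns, desaturation D = 0.15
V = rng.choice([-1.0, 1.0], size=(N, M))   # bipolar patterns as columns of V

C = V @ np.linalg.pinv(V)                  # pseudo-inverse rule: C = V V^+
np.fill_diagonal(C, D * np.diag(C))        # desaturation: Cii <- D * Cii

def retrieve(C, y, max_iters=200):
    """Synchronous retrieval with update-flow bookkeeping: the post-synaptic
    field h = C y is computed once, then corrected using only the columns of
    the neurons that actually changed.  Stops on a fixed point or a cycle."""
    h = C @ y
    seen = {y.tobytes()}
    for _ in range(max_iters):
        y_next = np.where(h >= 0, 1.0, -1.0)
        changed = np.flatnonzero(y_next != y)
        if changed.size == 0:
            return y                        # static attractor reached
        h = h + C[:, changed] @ (y_next[changed] - y[changed])
        y = y_next
        if y.tobytes() in seen:
            return y                        # dynamic attractor (cycle) detected
        seen.add(y.tobytes())
    return y

# Desaturation preserves the stored patterns as fixed points: scaling Cii
# by D changes component i of C v only by (1 - D) * Cii * v_i, and Cii < 1
# for the projection matrix, so the sign of every component is unchanged.
v = V[:, 0].copy()
assert np.array_equal(retrieve(C, v.copy()), v)

# Retrieval from 8% noise at load M = 0.5N (the figures in the conclusions);
# whether a particular noisy probe is recovered exactly depends on the trial.
noisy = v.copy()
noisy[rng.choice(N, size=int(0.08 * N), replace=False)] *= -1
result = retrieve(C, noisy)
```

The field update touches only the columns of the neurons that flipped, which is the point of the update-flow technique: per iteration the cost is proportional to the number of changing neurons rather than to N, and the `seen` set detects cycles automatically.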