Application of Artificial Neural Networks to Prediction Problems

Leonardo Franco
Escuela Técnica Superior de Ingeniería Informática, Universidad de Málaga, Spain
http://www.lcc.uma.es/~lfranco · [email protected]
September 2, 2016

Background: Córdoba (Argentina), Trieste (Italy), Oxford (UK), Málaga (Spain)

Research lines
• Neural networks: study of generalization ability, computation of optimal architectures, measurement of problem complexity, implementation on FPGAs and microcontrollers.
• Applications: prediction of breast cancer relapse, prediction and monitoring of atmospheric pollution, prediction of consumer behaviour.

THE ICB GROUP: Computational Intelligence and Biomedicine, Universidad de Málaga
http://www.lcc.uma.es/~lfranco/
José M. Jerez (PhD), Iván Gómez (PhD), Leonardo Franco (PhD), Francis Veredas (PhD), Paco Ortega (PhD), Daniel Urda (PhD), José Subirats (PhD), Julio Montes (MSc), Gustavo Juárez (MSc)

Generalization in Neural Networks
• L. Franco and S.A. Cannas. Generalization and Selection of Examples in Feed-Forward Neural Networks. Neural Computation, 12, pp. 2405-2426 (2000). [Number of patterns needed for perfect generalization]
• L. Franco and S.A. Cannas. Generalization Properties of Modular Networks: Implementing the Parity Function. IEEE Transactions on Neural Networks, 12, pp. 1306-1313 (2001). [Generalization ability of modular architectures]
• L. Franco and M. Anthony. The Influence of Oppositely Classified Examples on the Generalization Complexity of Boolean Functions. IEEE Transactions on Neural Networks, 17, pp. 578-590 (2006). [Measuring the complexity of data sets]
• I. Gómez, L. Franco and J.M. Jerez. Neural Network Architecture Selection: Can Function Complexity Help? Neural Processing Letters, 30, pp. 71-87 (2009). [Using data set complexity for architecture selection]
• J.P. Neirotti and L. Franco. Computational Capabilities of Multilayer Committee Machines. Journal of Physics A: Mathematical & Theoretical, 43, 445103 (2010). [Analytic results for the generalization of multilayer architectures]
• I. Gómez, S.A. Cannas, O. Osenda, J.M. Jerez, and L. Franco. The Generalization Complexity Measure for Continuous Input Data. The Scientific World Journal, 815156 (2014). [Extension of the complexity measure to continuous inputs]

Constructive Neural Network Publications
• J.L. Subirats, J.M. Jerez and L. Franco. A New Decomposition Algorithm for Threshold Synthesis and Generalization of Boolean Functions. IEEE Transactions on Circuits and Systems I, 55, pp. 3188-3196 (2008). [DASG: decomposition algorithm for Boolean functions]
• J.L. Subirats, L. Franco and J.M. Jerez. Competition and Stable Learning for Creating Compact Architectures with Good Generalization Abilities: The C-Mantec Algorithm. Neural Networks, 26, pp. 130-140 (2012). [C-Mantec neural network constructive algorithm]
• J.L. Subirats, J.M. Jerez, I. Gómez and L. Franco. Multiclass Pattern Recognition Extension for the New C-Mantec Constructive Neural Network Algorithm. Cognitive Computation, 2, pp. 285-290 (2010). [C-Mantec extension to multiclass problems]
• Franco, L., Elizondo, D. and Jerez, J.M. (Eds.). Constructive Neural Networks. Springer (2010). [Springer Series on Computational Intelligence]

High Performance Computing for NN
• F. Ortega-Zamorano, J.M. Jerez, I. Gómez and L. Franco. Layer Multiplexing FPGA Implementation for Deep Back-Propagation Learning. Integrated Computer-Aided Engineering (2016). [FPGAs for deep learning]
• F. Ortega-Zamorano, M. Montemurro, S.A. Cannas, J.M. Jerez, and L. Franco. FPGA Hardware Acceleration of Monte Carlo Simulations for the Ising Model. IEEE Transactions on Parallel and Distributed Systems, in press (2015). [Application of FPGAs to magnetic systems]
• F. Ortega-Zamorano, J.M. Jerez, D. Urda, R.M. Luque-Baena and L. Franco. Efficient Implementation of the Backpropagation Algorithm in FPGAs and Microcontrollers. IEEE Transactions on Neural Networks and Learning Systems (2015). [BP implemented on FPGAs and microcontrollers]
• Ortega-Zamorano, F., Jerez, J.M., and Franco, L. FPGA Implementation of the C-Mantec Neural Network Constructive Algorithm. IEEE Transactions on Industrial Informatics (2014). [FPGA implementation of a NN constructive algorithm]
http://www.lcc.uma.es/~lfranco

Breast Cancer Relapse Prediction
• J. Montes, J.L. Subirats, et al. Advanced Online Survival Analysis Tool for Predictive Modelling in Clinical Data Science. PLOS ONE (2016), in press. [Online computational intelligence techniques for predictive modelling]
• I. Gómez, L. Franco and J.M. Jerez. Supervised Discretization Can Discover Risk Groups in Cancer Survival Analysis. Computer Methods and Programs in Biomedicine (2016), pp. 11-19. [Discretization and prediction for group discovery in cancer patients]
• J.M. Jerez, I. Molina, P.J. García-Laencina, E. Alba, N. Ribelles, M. Martín and L. Franco. Missing Data Imputation Using Statistical and Machine Learning Methods in a Real Breast Cancer Problem. Artificial Intelligence in Medicine, 50, pp. 105-115 (2010). [Missing data imputation using machine learning methods]
• J. Jerez, L. Franco, E. Alba, A. Llombart-Cussac, A. Lluch, N. Ribelles, B. Munárriz and M. Martín. Improvement of Breast Cancer Relapse Prediction in High Risk Intervals Using Artificial Neural Networks. Breast Cancer Research and Treatment, 94, pp. 265-272 (2005). [Neural network for breast cancer prediction]
http://www.lcc.uma.es/~lfranco

Where do Artificial Neural Networks fit?
Artificial intelligence comprises two broad strands:
• Symbolic-deductive intelligence: decision trees, expert systems, Bayesian networks.
• Computational intelligence (adaptive mechanisms to acquire smart behaviour in complex and changing environments): artificial neural networks, SVMs, evolutionary computation.

THE DATA MINING PROCESS
• Data preprocessing: normalisation, feature selection, missing data imputation, outlier detection.
• Overview analysis.
• Modeling the data: parameter setting, training data, regularization methods.
• Model selection: comparison to standard and simpler models.
• Analysis of the results: prediction error, ROC curve, rule generation and visualisation.

Artificial Neural Networks (ANN)
A parallel processing system of interconnected simple units, called neurons, used to store knowledge through a learning process. ANNs are inspired by brain processes in two senses:
1. Knowledge is acquired through a learning process.
2. Knowledge is stored in the connections between neurons, the "synaptic weights".

The real neuron
• Morphology: dendrites, soma, axon, synapses.
• Physiology: synaptic terminals, synaptic potential, activation/inhibition.

MILESTONES IN NEURAL NETWORKS HISTORY
1905 Neuron doctrine - Santiago Ramón y Cajal & others
1943 Simple neuron model - McCulloch & Pitts
1949 Hebb learning rule - Donald Hebb
1962 Perceptron learning - Frank Rosenblatt (a minimal sketch of the rule follows the list)
1969 Limitations of the perceptron - book by Marvin Minsky & Seymour Papert
1982 Associative memory model - Hopfield network
1986 Parallel Distributed Processing (PDP) book, BP algorithm - Rumelhart, McClelland, Hinton
2006 Deep neural networks - G. Hinton
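To make the perceptron milestone concrete, here is a minimal sketch of Rosenblatt's learning rule in Python. This is an editorial illustration of the textbook rule, not code from the talk; the AND function is just a toy linearly separable problem.

```python
import numpy as np

def train_perceptron(X, z, epochs=100, lr=0.1):
    """Rosenblatt's rule: on each pattern, w += lr * (target - output) * x."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append a constant bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, z):
            output = 1 if np.dot(w, x) >= 0 else 0
            w += lr * (target - output) * x      # weights change only on errors
    return w

# Toy problem: the (linearly separable) AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
z = np.array([0, 0, 0, 1])
w = train_perceptron(X, z)
print([int(np.dot(w, np.append(x, 1)) >= 0) for x in X])   # -> [0, 0, 0, 1]
```

By the perceptron convergence theorem, this loop is guaranteed to find a separating weight vector for any linearly separable pattern set; the 1969 Minsky-Papert critique concerns exactly the problems (such as parity) where no such vector exists.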
Most popular Neural Network Models
• Hopfield network: a recurrent network for modelling associative memories. Attractors are created for the desired memories; starting from an initial condition, the system evolves towards the closest stored memory.
• Multilayer perceptron: a feed-forward neural network trained by the back-propagation algorithm, a gradient descent method that minimizes the difference between target and actual outputs.

Learning in neural networks
Artificial neural networks learn through an adaptive process in which the synaptic weights between neurons are modified: learning is synaptic weight modification. Three learning paradigms exist: supervised, unsupervised, and reinforcement learning.

Artificial Neural Networks: Learning paradigms
• Supervised learning: uses the target values of the input patterns to minimize the difference between target and actual outputs. Used for prediction and classification.
• Unsupervised learning: finds the right behaviour according to the structure of the data; no output is provided. Used for clustering and self-organization.
• Reinforcement learning: only a signal indicating whether the system is doing well or not is provided. Biologically rooted.

The Pattern Classification Problem
Given a set of patterns {(x^1, z^1), (x^2, z^2), ..., (x^p, z^p)}, each defined by an input vector x = (x1, x2, ..., xn) and its class label z ∈ {±1}, we fit a model (NN, SVM, DT, LR, etc.) to the data. (Figure: two classes, C1 and C2, of points in the (x1, x2) plane.) After training we present novel patterns and would like to predict (generalization) to which class each new datum belongs: C1 or C2? A minimal sketch of training such a classifier follows.
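As an editorial illustration of the multilayer perceptron and the classification setting above (not the speaker's code), here is a minimal back-propagation sketch in plain NumPy, assuming one hidden layer of sigmoid units and XOR as a toy non-linearly-separable pattern set; layer sizes, seed, and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Toy pattern set {(x^u, z^u)}: XOR is not linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
z = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 4 sigmoid units, one sigmoid output
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)

lr = 0.5
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)            # forward pass: hidden layer
    y = sigmoid(h @ W2 + b2)            # forward pass: output
    dy = (y - z) * y * (1 - y)          # backward pass: squared-error gradient
    dh = (dy @ W2.T) * h * (1 - h)      # error propagated to hidden layer
    W2 -= lr * h.T @ dy; b2 -= lr * dy.sum(0)
    W1 -= lr * X.T @ dh; b1 -= lr * dh.sum(0)

print(np.round(y.ravel(), 2))           # close to [0, 1, 1, 0]
```

With a typical random seed this converges, but gradient descent on XOR can occasionally stall in a poor minimum, which previews the later point that architecture and weight initialization are among the factors governing what a network can learn and generalize.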
Artificial Neural Network Applications
• Financial markets
• Cancer survival prediction
• Image processing
• Robotics
• Creativity
• Sustainable systems

Feed-forward Neural Network
Neurons connected by synapses; given a problem, a set of examples, and a learning algorithm, the network is trained. Generalization: the ability to predict the output of unseen examples.

Theories explaining generalization
• PAC learning
• VC theory
• Statistical mechanics
Connections exist between them, and some general bounds have been derived (not very tight).

Factors affecting the generalization ability
• Complexity of the problem
• Architecture (size of the weights, activation function)
• Patterns of data
• Learning algorithm
• Initialization of the weights

A picture of learning and generalization
Given a problem and an architecture, the architecture can be characterized by the number of functions it implements and their volumes, defining the entropy of the architecture. It is not only the number of functions and their volume in function space that affects the generalization ability, but also the number of functions neighbouring the one we want to implement (their distribution). The key idea is that learning is a search for configurations compatible with the training examples. (Figure: results obtained by exact enumeration, mean-field theory, and simulated annealing in a learning experiment on a function computing the sum of 4 inputs; from Van den Broeck et al., 1990, Phys. Rev. A.)

Architecture
One of the most important features affecting generalization in practical applications, yet with few theoretical results: practical applications rely on trial and error, or on growing and pruning algorithms. Modularity can greatly enhance generalization, but most problems are not easy to modularize.

Occam's Razor [William of Occam (1285-1349)]
"Entia non sunt multiplicanda praeter necessitatem": one should not increase, beyond what is necessary, the number of entities required to explain anything.

The C-Mantec algorithm
A competitive majority network trained by error correction.
• It uses a very stable learning rule: the thermal perceptron (M. Frean, Neural Computation, 1992).
• The algorithm permits the modification (learning) of all synaptic weights at all times.

Traditional constructive algorithms vs C-Mantec
• Traditional: much plasticity at the individual neuron level, no global plasticity, severely affected by overfitting.
• C-Mantec: very stable learning rule, competitive global plasticity, built-in filtering method to avoid overfitting.
We can say that the two approaches operate at different equilibrium points of the plasticity-stability dilemma.

The C-Mantec Neural Network Algorithm
• The architecture is automatically constructed during the training phase.
• The output function is the majority function of the hidden neurons.
• Neurons compete for learning the new patterns, according to what they have learnt so far and to an internal temperature (thermal perceptron).
Reference: J.L. Subirats, L. Franco and J.M. Jerez. C-Mantec: a novel constructive neural network algorithm incorporating competition between neurons. Neural Networks, 26, pp. 130-140 (2012).

C-Mantec algorithm: operation
(Sequence of animation slides showing the inputs and the output as the network is built pattern by pattern; a sketch of the underlying thermal perceptron update follows.)
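The following is a minimal sketch of the thermal perceptron update at the heart of C-Mantec, under my reading of the rule as described above and in Frean's paper: a neuron only learns a misclassified pattern, and the correction is damped by a factor exp(-|phi|/T), so at low temperature a neuron barely disturbs what it has already learnt. This is an illustration of the single-neuron step only, not the authors' implementation.

```python
import numpy as np

def thermal_perceptron_step(w, x, target, T):
    """One thermal perceptron update (after Frean).

    w: weight vector (bias folded in), x: input pattern,
    target: desired output in {0, 1}, T: current temperature.
    """
    phi = np.dot(w, x)                  # synaptic potential
    output = 1 if phi >= 0 else 0
    if output != target:
        # Corrections shrink as |phi| grows relative to T: strongly
        # "committed" neurons are barely disturbed, giving stability.
        damping = np.exp(-abs(phi) / T)
        w = w + (target - output) * damping * x
    return w
```

In the full C-Mantec algorithm each hidden neuron keeps its own temperature, which cools as it learns; when the majority output misclassifies a pattern, the hidden neurons compete and, roughly, the one whose thermal state allows the least disruptive correction learns the pattern, with a new neuron added when none qualifies (my paraphrase; the precise competition and noise-filtering criteria are specified in the 2012 paper).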
Neural Network Constructive Algorithms
Developed essentially to avoid the problem of architecture selection. A large variety of algorithms exist: Cascade Correlation, ART-type networks, the Tiling algorithm, Upstart, etc.
Reference: Constructive Neural Networks. Franco, L., Elizondo, D., Jerez, J.M. (Eds.). Springer (2010). ISBN: 978-3-642-04511-0.

Fast Implementation of Neural Networks on Hardware
Options for accelerating CI algorithms: cluster computation, GPUs, and hardware implementation on FPGAs.

FPGA (Field Programmable Gate Array)
An FPGA consists of I/O blocks, programmable interconnections, and configurable logic blocks (CLBs) built from look-up tables (3-LUTs), together with DSP blocks and distributed RAM. C-Mantec was programmed in VHDL on an FPGA. Typical FPGA application areas: modem prototyping, neurobotics, infrastructure monitoring.

Comparative results: FPGA vs PC implementation of C-Mantec
The FPGA implementation gets much faster relative to the PC as the complexity of the problem increases.
Reference: Ortega-Zamorano, F., Jerez, J.M., and Franco, L. FPGA implementation of the C-Mantec neural network constructive algorithm. IEEE Transactions on Industrial Informatics, 10, pp. 1154-1161 (2014).

Main differences between back-propagation and deep learning
• Back-propagation: architectures with a single hidden layer. Deep learning: neural architectures with many internal layers.
• Back-propagation: random initialization of the synaptic weights. Deep learning: an unsupervised pre-training phase.
• Back-propagation: supervised training. Deep learning: a final supervised training phase.
• Back-propagation: sigmoid activation functions. Deep learning: thresholded linear activation functions.
Reference: Deep Neural Network Based Feature Representation for Weather Forecasting. Liu et al., Proceedings of the 2014 International Conference on Artificial Intelligence, pp. 261-267 (2014).

AUTOENCODER for Pre-training
Autoencoders are used for the unsupervised pre-training of the hidden layers. Convolutional networks are used to obtain translation and scale invariance; their origins can be traced back to the Neocognitron (1980). A minimal sketch of autoencoder pre-training follows.
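As an editorial illustration of the pre-training idea above (not code from the talk), here is a minimal sketch of greedy layer-wise autoencoder pre-training: each layer is trained, unsupervised, to reconstruct its own input through a sigmoid bottleneck with tied weights. Dimensions, data, and hyperparameters are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def pretrain_autoencoder(X, n_hidden, epochs=200, lr=0.1):
    """Train W so that X is reconstructed through a sigmoid bottleneck
    (tied weights, linear decoder); return W as a pre-trained layer."""
    n = X.shape[1]
    W = rng.normal(0, 0.1, (n, n_hidden))
    for _ in range(epochs):
        h = sigmoid(X @ W)             # encode
        Xr = h @ W.T                   # decode with tied weights
        err = Xr - X                   # reconstruction error
        # Gradient of 0.5*||Xr - X||^2: decoder path plus encoder path
        dW = err.T @ h + X.T @ ((err @ W) * h * (1 - h))
        W -= lr * dW / len(X)
    return W

# Greedy stacking: pre-train layer 1, encode, pre-train layer 2 on the codes
X = rng.random((100, 8))
W1 = pretrain_autoencoder(X, 4)
H1 = sigmoid(X @ W1)
W2 = pretrain_autoencoder(H1, 2)
# W1, W2 would then initialize a deep network for supervised fine-tuning.
```

The pre-trained weights serve only as an initialization: the final supervised training phase fine-tunes all layers with back-propagation, which is exactly the contrast with classic single-hidden-layer BP drawn in the comparison above.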
Current project
Monitoring the Bay of Algeciras to predict pollution alerts, combining sensor information with meteorological forecasts.

Conclusions
• In the last 20 years we have explored several aspects of neural networks related to their prediction ability, mainly towards the design of efficient architectures: selecting the proper architecture for a given problem is difficult.
• The FPGA implementation of neural network algorithms can boost training times by exploiting the intrinsic parallelism of FPGAs.
• Deep learning is a recent, successful approach that uses very large networks (deep and wide), trained in both unsupervised and supervised fashion.
• The application of traditional neural networks to weather forecasting problems did not lead to great results; deep learning seems to be an interesting option.

THANK YOU!
You can contact me at: [email protected]
http://www.lcc.uma.es/~lfranco
Escuela Técnica Superior de Ingeniería Informática y Telecomunicaciones