Application of Artificial Neural Networks to prediction problems
Leonardo Franco
Escuela Técnica Superior de Ingeniería Informática
Universidad de Málaga
Spain
http://www.lcc.uma.es/~lfranco
[email protected]
September 2, 2016
Presentation
Córdoba, Argentina
Trieste, Italy
Oxford, UK
Málaga, Spain
Neural Networks: study of the generalization ability, computation of optimal
architectures, measures of problem complexity, and implementation on FPGAs and
microcontrollers.
Applications:
Prediction of breast cancer relapse, prediction and monitoring of atmospheric
pollution, prediction of consumer behaviour.
THE ICB GROUP
Computational Intelligence and Biomedicine
UNIVERSIDAD DE MÁLAGA
http://www.lcc.uma.es/~lfranco/
José M. Jerez (PhD), Iván Gómez (PhD), Leonardo Franco (PhD),
Francis Veredas (PhD), Paco Ortega (PhD), Daniel Urda (PhD),
José Subirats (PhD), Julio Montes (MSc), Gustavo Juárez (MSc)
Generalization in Neural Networks
Comments
L. Franco and S.A. Cannas. Generalization and Selection of
Examples in Feed-Forward Neural Networks. Neural
Computation, 12, pp. 2405-- 2426 (2000)
Number of patterns needed
for perfect generalization
L. Franco and S.A. Cannas. Generalization Properties of Modular
Networks: Implementing the Parity Function. IEEE Transactions
on Neural Networks, 12, pp. 1306--1313 (2001)
Generalization ability of
modular architectures
L. Franco and M. Anthony. The influence of oppositely classified
examples on the generalization complexity of Boolean
functions. IEEE Transactions on Neural Networks, 17, pp. 578--590
(2006).
Measuring the complexity
of data sets
I. Gómez, L. Franco and J.M. Jerez. Neural Network Architecture
Selection: Can function complexity help? Neural Processing
Letters, 30, pp. 71-87 (2009)
Using data set complexity
for architecture selection
J.P. Neirotti and L. Franco. Computational capabilities of
multilayer committee machines. Journal of Physics A:
Mathematical & Theoretical, 43, 445103 (2010)
Analytical results for the
generalization of multilayer
architectures
I. Gómez, S.A. Cannas, O. Osenda, J.M. Jerez, and L. Franco. The
Generalization Complexity Measure for Continuous Input
Data. The Scientific World Journal, 815156 (2014).
Extension of the complexity
measure to continuous
input
Constructive Neural Network Publications
Comments
J.L. Subirats, J.M. Jerez & L. Franco. A New Decomposition Algorithm
for Threshold Synthesis and Generalization of Boolean Functions. IEEE
Transactions on Circuits and Systems I, 55, pp. 3188-3196 (2008).
DASG: decomposition
algorithm for Boolean
functions
J.L. Subirats, L. Franco & J.M. Jerez. (2012). Competition and stable
learning for creating compact architectures with good generalization
abilities: The C-Mantec algorithm. Neural Networks, 26, pp 130-140
C-Mantec neural network
constructive algorithm
J.L. Subirats, J.M. Jerez, I. Gómez & L. Franco (2010). Multiclass pattern
recognition extension for the new C-Mantec constructive neural
network algorithm. Cognitive Computation, 2, pp. 285-290 (2010).
C-Mantec Extension to
multiclass problems
Franco, L., Elizondo, D. & Jerez, J.M. (Eds.)
Constructive Neural Networks, Springer (2010)
Springer Series on Computational Intelligence
High Performance Computing for NN
Topic
F. Ortega-Zamorano, J.M. Jerez, Iván Gómez and L. Franco. Layer
Multiplexing FPGA Implementation for Deep Back-Propagation
Learning. Integrated Computer-Aided Engineering (2016)
FPGAs for Deep Learning
F. Ortega-Zamorano, M. Montemurro, S.A. Cannas, J.M. Jerez, and L.
Franco. FPGA Hardware Acceleration of Monte Carlo
Simulations for the Ising Model. IEEE Transactions on Parallel and
Distributed Systems, In Press (2015).
Application of FPGAs to
magnetic systems
F. Ortega-Zamorano, J.M. Jerez, D. Urda, R.M. Luque-Baena and L.
Franco. Efficient Implementation of the Backpropagation
Algorithm in FPGAs and Microcontrollers. IEEE Transactions on
Neural Networks and Learning Systems, (2015)
BP implemented in FPGAs
and microcontrollers
Ortega-Zamorano, F., Jerez, J.M., & Franco, L. FPGA implementation
of the C-Mantec neural network constructive algorithm. IEEE
Transactions on Industrial Informatics. (2014)
FPGA implementation of
NN constructive algorithm
http://www.lcc.uma.es/~lfranco
Breast Cancer Relapse Prediction
Topic
J. Montes, J.L. Subirats, et al. Advanced Online Survival Analysis Tool
for Predictive Modelling in Clinical Data Science. PLOS ONE
(2016), In Press.
On-line Computational
Intelligence techniques for
predictive modelling
I. Gomez, L. Franco, JM Jerez. Supervised discretization can
discover risk groups in cancer survival analysis. Computer
Methods and Programs in Biomedicine (2016), pp. 11-19
Discretization and
prediction for group
discovery in cancer
patients
J.M. Jerez, I. Molina, P.J. García-Laencina, E. Alba, N. Ribelles, M.
Martín and L. Franco. Missing Data Imputation Using Statistical
and Machine Learning Methods in a Real Breast Cancer
Problem. Artificial Intelligence in Medicine, 50, pp.105-115 (2010)
Missing data imputation
using machine learning
methods
J. Jerez, L. Franco, E. Alba, A. Llombart-Cussac, A. Lluch, N. Ribelles, B.
Munárriz and M. Martín. Improvement of Breast Cancer Relapse
Prediction in High Risk Intervals Using Artificial Neural
Networks. Breast Cancer Research and Treatment, 94, pp. 265--272
(2005).
Neural Network for Breast
Cancer prediction
http://www.lcc.uma.es/~lfranco
Where do Artificial Neural Networks fit?
Artificial Intelligence
Symbolic-deductive Intelligence:
• Decision trees
• Expert Systems
• Bayesian networks
Computational Intelligence (adaptive mechanisms to acquire smart behaviour in
complex and changing environments):
• Artificial Neural Networks
• SVM
• Evolutionary computation
THE DATA MINING PROCESS
Data preprocessing: overview analysis, normalisation, feature selection,
missing data imputation, outlier detection.
Modeling the data: model selection, parameter setting, training data,
regularization methods.
Analysis of the results: prediction error, ROC curve, comparison to standard &
simpler models, rule generation & visualisation.
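As an illustration of these stages, here is a minimal sketch that chains
preprocessing, modelling and evaluation with scikit-learn. The synthetic data,
the mean imputation, the scaling step and the small MLP are illustrative
assumptions, not the configuration used in the work presented here.

```python
# Sketch of the data-mining stages: preprocessing -> modelling -> analysis.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # labels defined before corruption
X[rng.random(X.shape) < 0.05] = np.nan           # simulate missing data

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # missing data imputation
    ("scale", StandardScaler()),                 # normalisation
    ("net", MLPClassifier(hidden_layer_sizes=(10,), alpha=1e-3,
                          max_iter=2000, random_state=0)),  # regularized ANN
])
model.fit(X_tr, y_tr)                            # modelling the data

auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # ROC analysis
print(f"Test AUC: {auc:.2f}")
```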
Artificial Neural Networks (ANN)
A parallel processing system of interconnected simple units, called neurons,
used to store knowledge through a learning process.
Inspired by brain processes in the following two senses:
1. Knowledge is acquired through a learning process.
2. Knowledge is stored in the connections between neurons ("synaptic weights").
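A minimal sketch of such a unit: the inputs are combined through the synaptic
weights and passed through an activation function, so all the stored knowledge
lives in the weights. The particular weights, bias and sigmoid activation used
below are illustrative assumptions.

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: knowledge is stored in the weights w and bias b."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))   # sigmoid activation

x = np.array([0.5, -1.0, 2.0])      # inputs coming from other neurons
w = np.array([0.8, 0.2, -0.5])      # synaptic weights (set by learning)
print(neuron(x, w, b=0.1))          # the neuron's output activity
```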
The real neuron
Morphology:
Dendrites
Soma
Axon
Synapses
Physiology:
Synaptic terminals
Synaptic potential
Activation/Inhibition
MILESTONES IN NEURAL NETWORKS HISTORY
1905 Neuron doctrine – Santiago Ramón y Cajal & others
1943 Simple neuron model – McCulloch & Pitts
1949 Hebb learning rule – Donald Hebb
1962 Perceptron learning – Frank Rosenblatt
1969 Limitations of the perceptron – book by Marvin Minsky & Seymour Papert
1982 Associative memory model – Hopfield network
1986 PDP (Parallel Distributed Processing) book – BP algorithm – Rumelhart,
McClelland, Hinton
2006 Deep Neural Networks – G. Hinton
Most popular Neural Network Models
Hopfield Network: a recurrent network for modelling associative memories.
Attractors are created for the desired memories; from an initial condition,
the system evolves towards the closest stored memory.
Multilayer perceptron: a feed-forward neural network trained by the
back-propagation algorithm, a gradient descent method that minimizes the
difference between target and actual output.
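A minimal sketch of the second model: a one-hidden-layer feed-forward network
trained by back-propagation, i.e. gradient descent on the squared difference
between target and actual output. The XOR task, the layer sizes, the learning
rate and the number of iterations are illustrative assumptions.

```python
# Back-propagation sketch: gradient descent on the target/output difference.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)         # XOR targets

W1 = rng.normal(scale=2.0, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=2.0, size=(4, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
eta = 1.0                                               # learning rate

for _ in range(5000):
    H = sigmoid(X @ W1 + b1)                            # hidden layer
    Y = sigmoid(H @ W2 + b2)                            # network output
    dY = (Y - T) * Y * (1 - Y)                          # output-layer error
    dH = (dY @ W2.T) * H * (1 - H)                      # back-propagated error
    W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(0)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(0)

Y = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(np.round(Y.ravel(), 2))       # typically converges close to [0, 1, 1, 0]
```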
Learning in neural networks
Artificial neural networks learn through an adaptive process in which the
synaptic weights between neurons are modified.
Learning = synaptic weight modification
Learning processes: supervised learning, unsupervised learning, reinforcement
learning.
Artificial Neural Networks: Learning paradigms
Learning = synaptic weight modification
Supervised learning: uses the target values of the input patterns to minimize
the difference between target and actual outputs. Used for prediction and
classification.
Unsupervised learning: finds the right behaviour according to the structure of
the data; no output is provided. Used for clustering and self-organization.
Reinforcement learning: only a signal indicating whether the system is doing
well or not is provided. Biologically rooted.
The Pattern Classification Problem
Given a set of patterns defined by a set of inputs and their corresponding
outputs, we adjust a model (NN, SVM, DT, LR, etc.).
Set of patterns: $\{(\mathbf{x}^1, z^1), (\mathbf{x}^2, z^2), \ldots, (\mathbf{x}^p, z^p)\}$
Inputs: $\mathbf{x} = (x_1, x_2, \ldots, x_n)$
Pattern class: $z \in \{\pm 1\}$
[Figure: patterns of two classes C1 and C2 in the (x1, x2) input plane]
The Pattern Classification Problem
After training we present novel patterns and would like to predict
(generalization) to which class each new datum belongs.
Pattern set: $\{(\mathbf{x}^1, z^1), (\mathbf{x}^2, z^2), \ldots, (\mathbf{x}^p, z^p)\}$
Inputs: $\mathbf{x} = (x_1, x_2, \ldots, x_n)$
Class: $z \in \{\pm 1\}$
[Figure: a new point (x1, x2) in the input plane; does it belong to C1 or C2?]
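A minimal sketch of this setting: a model is fitted to the labelled patterns
and then asked to predict the class z of a novel input. The two Gaussian clouds
standing in for C1 and C2, and the choice of a simple perceptron as the model,
are illustrative assumptions.

```python
# Fit a model to labelled patterns and predict the class of an unseen input.
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
C1 = rng.normal(loc=[+1.5, +1.5], scale=0.5, size=(50, 2))   # class z = +1
C2 = rng.normal(loc=[-1.5, -1.5], scale=0.5, size=(50, 2))   # class z = -1
X = np.vstack([C1, C2])
z = np.hstack([np.ones(50), -np.ones(50)])

clf = Perceptron().fit(X, z)              # adjust the model to the patterns
x_new = np.array([[1.0, 0.8]])            # novel pattern (x1, x2)
print("Predicted class:", clf.predict(x_new)[0])   # generalization: C1 or C2?
```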
Artificial Neural Network Applications
• Financial markets
• Cancer survival prediction
• Image processing
• Robotics
• Creativity
• Sustainable systems
Feed-forward Neural Network
[Diagram: neurons connected by synapses]
• Problem + examples
• Learning algorithm
Generalization: the ability to predict the output of unseen examples.
Theories explaining generalization:
• PAC learning
• VC theory
• Statistical mechanics
Connections exist between them, and some general (but not very tight) bounds
have been derived.
Factors affecting the generalization ability
• Complexity of the problem
• Architecture (size of the weights, activation function)
• Patterns of data
• Learning algorithm
• Initialization of the weights
A picture of learning and generalization
Given a problem and an architecture:
• The architecture can be characterized by the number of functions it
implements and by their volumes, which define the entropy of the architecture.
• It is not only the number of functions and their volume in function space
that affects the generalization ability, but also the number of functions
neighbouring the one we want to implement (their distribution).
• The key idea is that learning is a search for configurations compatible with
the training examples.
Example Boolean target function (truth table):
X Y O
0 0 0
0 1 0
1 0 1
1 1 0
[Figure: results obtained by exact enumeration, mean field theory and simulated
annealing for a learning experiment on a function computing the sum of 4 inputs
(from Van den Broeck et al., 1990, Phys. Rev. A)]
Architecture
• One of the most important features affecting generalization in practical
applications.
• Few theoretical results.
• Practical applications are based on trial and error.
• Growing and pruning algorithms.
• Modularity can greatly enhance generalization, but most problems are not easy
to modularize.
Occam's Razor [William of Occam (1285-1349)]
"Entia non sunt multiplicanda praeter necessitatem"
One should not increase, beyond what is necessary, the number of entities
required to explain anything.
The C-Mantec algorithm
Competitive majority network trained by error correction
• It uses a very stable learning rule (the thermal perceptron; M. Frean, Neural
Computation, 1992)
• The algorithm permits the modification (learning) of all synaptic weights at
all times
Traditional constructive algorithms vs. C-Mantec:
• Much plasticity at the individual neuron level vs. a very stable learning rule
• No global plasticity vs. competitive global plasticity
• Severely affected by overfitting vs. a built-in filtering method to avoid
overfitting
We can say that the two approaches operate at different equilibrium points in
the plasticity-stability dilemma.
The C-Mantec Neural Network Algorithm
• The architecture is automatically constructed during the training phase
• The output function is the majority function
• Neurons compete to learn the new patterns, according to what they have learnt
so far and to an internal temperature (thermal perceptron)
[Diagram: inputs feeding a single hidden layer whose neurons vote through a
majority output]
J.L. Subirats, L. Franco and J.M. Jerez. C-Mantec: a novel constructive neural
network algorithm incorporating competition between neurons. Neural Networks,
26, pp. 130-140 (2012).
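A hedged sketch of the thermal perceptron rule used by each C-Mantec neuron:
weight corrections for misclassified patterns are damped by exp(-|phi| / T),
where phi is the neuron's weighted sum and T an internal temperature, which is
what makes the rule stable. The annealing schedule and constants below are
assumptions, and the competition between neurons and the neuron-growth step of
the full C-Mantec algorithm are omitted (see Subirats et al., 2012).

```python
import numpy as np

def thermal_perceptron(X, z, T0=10.0, n_epochs=200, seed=0):
    """Train one thresholded neuron with thermally damped perceptron updates."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for epoch in range(n_epochs):
        T = T0 * (1.0 - epoch / n_epochs)       # temperature anneals towards 0
        for x, t in zip(X, z):
            phi = np.dot(w, x) + b              # weighted sum (net input)
            if t * phi <= 0:                    # pattern misclassified
                damp = np.exp(-abs(phi) / max(T, 1e-6))
                w += damp * t * x               # damped perceptron update
                b += damp * t
    return w, b

# Toy usage on a separable one-dimensional threshold problem (illustrative only).
X = np.array([[0.0], [0.2], [0.8], [1.0]])
z = np.array([-1, -1, +1, +1])
w, b = thermal_perceptron(X, z)
print("Training accuracy:", np.mean(np.sign(X @ w + b) == z))
```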
C-Mantec algorithm: operation
[Animated sequence of slides showing the inputs and the output of the network
as neurons are added and trained]
Neural Network Constructive Algorithms
They are developed essentially to avoid the problem of architecture selection.
A large variety of algorithms have been developed: Cascade Correlation,
ART-type networks, the Tiling algorithm, Upstart, etc.
Constructive Neural Networks (2010)
Franco, L., Elizondo, D., Jerez, J.M. (Editors)
Springer, ISBN: 978-3-642-04511-0
Fast Implementation of Neural Networks on Hardware
Hardware implementation of CI algorithms:
• Cluster computation
• GPUs
• FPGA
FPGA (Field Programmable Gate Array)
[Diagram of the FPGA fabric: I/O blocks, programmable interconnections,
configurable logic blocks (CLB) built from LUTs, plus DSP and RAM resources]
VHDL programming of C-Mantec in an FPGA
FPGA: Field Programmable Gate Array
[Example applications: modem prototyping, neurobotics, infrastructure
monitoring]
Comparative results: FPGA vs. PC implementation of C-Mantec
The FPGA implementation gets much faster as the complexity of the problem
increases.
Ortega-Zamorano, F., Jerez, J.M., & Franco, L. FPGA implementation of the
C-Mantec neural network constructive algorithm. IEEE Transactions on Industrial
Informatics, 10, pp. 1154-1161 (2014)
Main differences between Back-propagation and Deep Learning
Back-propagation:
• Architectures with one hidden layer
• Random initialization of the synaptic weights
• Supervised training
• Sigmoid activation functions
Deep Learning:
• Neural architectures with many hidden layers
• Unsupervised pre-training phase
• Final supervised training phase
• Linear activation functions with a threshold
Deep Neural Network Based Feature Representation for Weather Forecasting, Liu
et al., in Proceedings of the 2014 International Conference on Artificial
Intelligence, pp. 261-267, 2014.
AUTOENCODER for Pre-training
Autoencoding is used for the unsupervised pre-training of the hidden layers.
Convolutional networks are used to obtain invariance to translation and scale;
their origins can be traced back to the Neocognitron (1980).
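A minimal sketch of one autoencoder layer of the kind used for unsupervised
pre-training: a hidden code is trained to reconstruct its own input, and the
learned weights can then initialize a hidden layer of the deep network. The
layer sizes, the tied weights, the learning rate and the random data are
illustrative assumptions, not the configuration of Liu et al. (2014).

```python
# One autoencoder layer trained to reconstruct its own (unlabelled) input.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 8))                    # unlabelled training inputs
W = rng.normal(scale=0.1, size=(8, 3))      # encoder weights (decoder = W.T)
b_h, b_o = np.zeros(3), np.zeros(8)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
eta = 0.5

for _ in range(2000):
    H = sigmoid(X @ W + b_h)                # encode: compressed representation
    Y = sigmoid(H @ W.T + b_o)              # decode: reconstruct the input
    dY = (Y - X) * Y * (1 - Y)              # gradient of reconstruction error
    dH = (dY @ W) * H * (1 - H)
    W -= eta * (X.T @ dH + (H.T @ dY).T) / len(X)   # tied-weight update
    b_o -= eta * dY.mean(0)
    b_h -= eta * dH.mean(0)

print("Reconstruction MSE:", float(np.mean((Y - X) ** 2)))
```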
Current project: monitoring Algeciras Bay to predict pollution alerts,
combining sensor information with meteorological forecasts.
Conclusions
• In the last 20 years we have explored several aspects of neural networks
related to their prediction ability, mainly towards the design of efficient
architectures: selecting the proper architecture for a given problem is
difficult.
• The FPGA implementation of neural network algorithms can boost training times
by exploiting the intrinsic parallelism of FPGAs.
• Deep Learning is a recent, successful approach that uses very large (deep and
wide) networks trained with a combination of unsupervised and supervised
learning.
• The application of traditional neural networks to weather forecasting
problems did not lead to great results; Deep Learning seems to be an
interesting option.
THANK YOU!
You can contact me at: [email protected]
http://www.lcc.uma.es/~lfranco
Escuela Técnica Superior de Ingeniería Informática y Telecomunicaciones