Neural Network Implementations on Parallel Architectures
Şeniz Demir
Department of Computer Engineering,
Bogaziçi University
Bebek, İstanbul
[email protected]
1. Introduction
The trend of preferring neural networks in developed applications has been increasing for a long period. This increase brings the problem of selecting the best architecture for the network structure used. Although there are different architecture possibilities in use, usually one is better than the others. This paper focuses on the implementation of neural networks on parallel architectures and the problems that must be solved for this purpose. In the second section, what a neural network is and how it is trained are explained. In the third section, different parallel architectures are explained. The fourth section shows the difference between the parallelism of neural networks and that of parallel architectures. Finally, different applications and mapping methods are given.
2. Neural Networks
The human brain is made up of billions of individual neurons (see Figure 1) which have varying sizes and shapes. A neuron generally receives its stimulation at synapses, junctions between its dendrites and the output areas of other neurons. Signals travel from the synapses through the dendrites and are weighted and summed as they are gathered by the branching structure. The total weighted sum of input stimulation must be high enough for the cell to fire and emit its output pulse. The level of excitation that must be exceeded is called the threshold of the neuron. When the neuron does fire, an electrical pulse is carried down the axon to any synapses that the axon leads to. After the neuron has fired, it takes some time for the neuron to recover and prepare itself to respond to input information once more. This period is known as the refractory period.
Figure 1. Structure of a neuron in the brain
Research on developing intelligent systems (systems that can represent knowledge, learn, and reason) is inspired by neurons and their activities. Networks of neurons clearly can perform the computations needed for intelligence. Finally, the idea that a computer can perform computations like a human being if neurons connected into networks are simulated started the development of ANNs (Artificial Neural Networks).
Each ANN is composed of a number of neurons (see Figure 2). The main characteristics of ANNs are:
• adaptive learning: The behavior of the network changes according to the data given as input. The network decides how it will react, so there is no outside interference with the system. This learning process continues when new data is fed to the system.
• self-organization: The structure of the network changes according to the data given. The structure can be changed either by changing the strength of the connections between the neurons or by changing the learning algorithm used.
• error tolerance: The network is capable of finding a generalization for new or distorted data; a model is determined from the given data.
• real-time operation: As a consequence of parallel processing, real-time operation becomes possible.
• parallel information processing: Like the neurons in the human brain, the processing between the neurons in an ANN is parallel.
Figure 2. The structure of a neuron in an ANN
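To make the weighted-sum-and-threshold behaviour described above concrete, the following short Python sketch implements a single artificial neuron with a step activation; the input values, weights, and threshold are assumptions chosen only for illustration, not values taken from the paper.

    import numpy as np

    def neuron(inputs, weights, threshold):
        # Weighted sum of the inputs, analogous to the summation over the dendrites.
        total = np.dot(inputs, weights)
        # The neuron "fires" (outputs 1) only if the weighted sum exceeds the threshold.
        return 1 if total > threshold else 0

    # Example: two inputs with illustrative weights and an assumed threshold of 0.5.
    print(neuron(np.array([1.0, 0.5]), np.array([0.8, -0.2]), threshold=0.5))   # prints 1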
2.1. Design and Structure
In order to design an ANN, the number of layers, the distribution of the neurons across the layers, the communication used between the layers, and the strength of the connections within the network must be determined properly.
The structure of an ANN is basically a topology consisting of three layers: input, hidden (optional), and output (see Figure 3).
Figure 3. Structure of an ANN
The input layer is responsible for extracting knowledge from the environment. The output layer is responsible for the communication with the environment. The hidden layers are responsible for the computation between these two layers.
Neurons are connected with unidirectional paths, and each of them communicates with other neurons in the same layer or in different layers. The communication inside the same layer is performed in two ways:
• Each neuron is connected to every other neuron.
• Each neuron is connected only to its neighboring neurons.
The communication between different layers is performed in four ways (a small sketch of how such connectivity can be represented follows this list):
• Each neuron is connected to every neuron in the second layer.
• Each neuron is connected to only some of the neurons in the second layer.
• Each neuron can send its output to the neurons in the second layer but cannot receive from these neurons.
• Each neuron can send its output to the neurons in the second layer using a different set of connections.
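As an assumed illustration (not a construction described in the paper), the connectivity between two layers can be stored as a weight matrix in which an absent connection is simply a zero entry; the layer sizes and the connection mask below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_out = 4, 3                            # assumed layer sizes

    # Full connectivity: every neuron in the first layer reaches every neuron in the second.
    full = rng.normal(size=(n_out, n_in))

    # Partial connectivity: connections that do not exist are zeroed out by a mask
    # (this arbitrary mask stands in for a neighbourhood or restricted scheme).
    mask = np.array([[1, 1, 0, 0],
                     [0, 1, 1, 0],
                     [0, 0, 1, 1]])
    partial = full * mask

    x = rng.normal(size=n_in)
    print(full @ x)      # output of the fully connected layer
    print(partial @ x)   # output when only some connections are present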
2.2. Training (Learning)
Training of an ANN is the determination of the connection weights and the threshold values of the neurons in the network. These values are used to teach the network to find a solution to a problem. This process is inspired by the brain's learning by experience. The learning ability of a neural network is determined by its architecture and by the algorithmic method chosen for training.
The basic structure of a training algorithm starts with initializing all weights. The input vector is applied to the network and the outputs are propagated from the inner layers up to the output layer. The output obtained from the network is compared with the desired output. The error is calculated and, in a backward fashion, new weights are assigned to the neurons. All the operations, from the forward pass onward, are iterated until a solution that is sufficiently close to the desired output is obtained.
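The following Python/NumPy sketch follows this basic structure for a small one-hidden-layer network; the XOR data set, sigmoid activation, learning rate, and layer sizes are assumptions chosen to keep the example short, not details given in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Assumed toy problem: learn XOR with one hidden layer of three neurons.
    # A constant 1 is appended to each vector to act as a bias input.
    X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    Y = np.array([[0], [1], [1], [0]], dtype=float)

    # 1. Initialize all weights.
    W1 = rng.normal(size=(3, 3))   # input (+bias) -> hidden
    W2 = rng.normal(size=(4, 1))   # hidden (+bias) -> output
    lr = 0.5

    for epoch in range(20000):
        # 2. Forward pass: propagate the input vector up to the output layer.
        H = sigmoid(X @ W1)
        Hb = np.hstack([H, np.ones((len(X), 1))])
        out = sigmoid(Hb @ W2)
        # 3. Compare the obtained output with the desired output.
        err = Y - out
        # 4. Backward pass: compute error terms and assign new weights.
        d_out = err * out * (1 - out)
        d_hid = (d_out @ W2[:3].T) * H * (1 - H)
        W2 += lr * Hb.T @ d_out
        W1 += lr * X.T @ d_hid

    print(np.round(out.ravel(), 2))   # typically approaches [0, 1, 1, 0]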
This training process is a long and costly operation if it is executed on a computer that has a single processor. A parallel architecture is preferred because it decreases the training time and fits the natural parallelism of ANNs. Such an architecture is used not only in the training phase but also in the execution phase. The later sections of this paper focus on the parallelism in the training part of an ANN.
3. Parallel Computer Architecture
Parallel computers are those that have more than one processing unit (processor). This property gives them the capability of running multiple processes simultaneously. SIMD and MIMD architectures are the pioneer parallel architectures.
SIMD is an architecture in which an entire data set is manipulated by a single instruction. Its powerful ability to manipulate large vectors and matrices is the most important reason for its demand. Different SIMD architectures differ from one another in using either distributed or shared memory, in the organization of the processors, and so on. The Connection Machine (CM-1, CM-2 and CM-200), built by Thinking Machines Corporation, is an example of a SIMD computer. Adapting a serial algorithm to this architecture seems hard in theory, but it becomes simpler when each inner loop is replaced with a single broadcast instruction that performs the complete loop.
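As a loose software analogy for this loop-to-single-instruction idea (offered here only as an assumed illustration, since the paper describes hardware behaviour), the Python sketch below replaces an explicit inner loop with a single NumPy vector operation that applies the same instruction to every element of the data at once.

    import numpy as np

    a = np.arange(100_000, dtype=np.float64)
    b = np.arange(100_000, dtype=np.float64)

    # Serial formulation: an explicit inner loop visits every element.
    c_loop = np.empty_like(a)
    for i in range(len(a)):
        c_loop[i] = 2.0 * a[i] + b[i]

    # SIMD-style formulation: one vectorized "instruction" covers the complete loop.
    c_vec = 2.0 * a + b

    assert np.allclose(c_loop, c_vec)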
MIMD is an architecture in which the data is manipulated by multiple instruction streams. This architecture is the most popular parallel architecture today. In this architecture, each processor has its own copy of the program and works on a different data stream. At any time, different processors may be executing different instructions on different pieces of data. Like SIMD computers, these computers can use either a shared memory or distributed memories (see Figure 4). Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes.
Figure 4. Architecture of MIMD computers (Shared memory-Distributed
memory)
4. Parallelism in ANNs and Computers
Each ANN has a natural internal parallelism. Neurons in the ANN process widespread information simultaneously, and the outputs of some neurons become the inputs of others. In contrast to parallel architectures, the neurons exchange many short messages (high connectivity) and make simple calculations, while the performance of MIMD computers decreases when the communication between the processes increases. This difference makes it necessary to reduce the communication between processors and to change the network topology according to the application.
This necessity also restricts the developed applications. For executing a network model on a parallel architecture, the programmer must find a specific solution for his model. In order to find this solution, the programmer must also know the parallelism in the model and the parallelism in the architecture used. This can be called one model, one solution. Nowadays, the trend is to find a general solution that can be used for many models and architectures.
The solution to the problems stated above is to develop a general mapping algorithm between the parallelism in the ANN and the parallelism in the architecture, whichever are used. The following section presents such mapping algorithms used for training an ANN.
5. Applications and Mapping Methods
5.1. Historical Data Integration
In this application, neural networks are used to improve the accuracy of sensor signal processing by predicting sensor drift [1]. Testing the accuracy in different environments and in different time intervals is a hard and time-consuming process. In order to overcome this problem, sensor drift prediction using neural networks is used. As in all neural networks, the system must initially be trained using training data. However, these data are not available at the beginning of sensor exploitation, so data obtained from sensors of the same type under similar operating conditions are used. These data are called historical data, and this process is called historical data integration. Two different mapping schemes are used.
a) Parallel Calculation of Weighted Sum: In this scheme, each weighted input is calculated by a different processor and the weighted sum is obtained by combining the partial results (see Figure 5). The number of weighted-sum operations depends on the number of neurons in each neural network layer. The number of neurons per layer must be large enough, and the calculation time must be larger than the communication time between processors, so that the time lost to communication in the parallel calculation is reduced.
Figure 5. Parallel calculation of weighted sum structure
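A minimal sketch of this first scheme is given below, assuming Python with NumPy and the mpi4py MPI binding (the paper only states that the calculation is parallelized, so the library, layer size, and data values are assumptions): each process computes its part of the weighted sum and the partial results are combined by a reduction.

    # Run with, e.g.: mpirun -np 4 python parallel_weighted_sum.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    n_inputs = 1024                                  # assumed number of inputs to one neuron
    if rank == 0:
        x = np.random.default_rng(0).normal(size=n_inputs)
        w = np.random.default_rng(1).normal(size=n_inputs)
        x_parts = np.array_split(x, size)
        w_parts = np.array_split(w, size)
    else:
        x_parts = w_parts = None

    # Each processor receives its slice of the inputs and weights ...
    x_local = comm.scatter(x_parts, root=0)
    w_local = comm.scatter(w_parts, root=0)

    # ... computes its share of the weighted sum ...
    partial = float(np.dot(x_local, w_local))

    # ... and the partial sums are combined into the neuron's total input.
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("weighted sum =", total)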
b) Multiplication Cycles and Parallel Training of Each Separate ANN: In this scheme, parallel computations of neural networks are implemented on different processors (see Figure 6). Each processor is responsible for the training of its corresponding neural network. All of the results obtained from these processors are combined and the final result is obtained.
Figure 6. Multiplication cycles and parallel training of each separate ANN
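A hedged sketch of this second scheme follows, again assuming mpi4py: every process trains its own small network on the same data and the per-process results are gathered and combined on one process. Simple averaging and a one-layer network trained by gradient descent are used purely as stand-ins, since the paper does not specify the combination rule or the training details.

    # Run with, e.g.: mpirun -np 4 python parallel_separate_anns.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Assumed toy data: a noisy linear target, identical on every process.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

    # Each processor trains its own single-layer network, starting from a
    # different random initialization.
    w = np.random.default_rng(rank).normal(size=3)
    for _ in range(500):
        grad = X.T @ (X @ w - y) / len(y)
        w -= 0.1 * grad

    # The per-process predictions are gathered on rank 0 and combined (averaged here).
    preds = comm.gather(X @ w, root=0)
    if rank == 0:
        combined = np.mean(preds, axis=0)
        print("combined prediction error:", float(np.mean((combined - y) ** 2)))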
The parallel computer Origin2000, located in the Parallel Computing Laboratory, University of Calabria, Italy, is used for the experiments. The Origin2000 contains 8 MIPS R10000 RISC processors with a clock rate of 250 MHz and 512 MB of RAM. Each processor has 4 MB of cache memory. The Origin2000 runs the UNIX (IRIX) operating system. All parallel routines are developed using MPI technology.
When the first scheme is implemented, the training time increases as the number of processors increases. In the second scheme, the training time decreases as the number of processors increases (see Figure 7). The difference between these schemes comes from the difference in the number of weighted-sum operations performed. Thus, the second scheme seems to be the more appropriate scheme for this application.
Figure 7. Running time of the first and second schemes
5.2. Distributed Training Data
In this approach, one processor is selected as the master and the other processors serve as slaves. The training data is divided into smaller groups and each subgroup is sent to a different slave processor (see Figure 8). The master processor collects the weights used by the slaves and the outputs of these slaves, processes the collected information, and sends new weights to each slave. This process continues until the network is trained.
Figure 8. Distributed training data scheme
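The following is a minimal sketch of this master/slave distribution of training data, again assuming mpi4py; a single gradient step on a linear model stands in for the unspecified weight-update rule, and the data sizes and learning rate are assumptions.

    # Run with, e.g.: mpirun -np 4 python distributed_training_data.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # The master (rank 0) holds the training data and splits it into subgroups.
    if rank == 0:
        rng = np.random.default_rng(0)
        X = rng.normal(size=(400, 3))
        y = X @ np.array([2.0, -1.0, 0.5])
        chunks = list(zip(np.array_split(X, size), np.array_split(y, size)))
    else:
        chunks = None

    X_loc, y_loc = comm.scatter(chunks, root=0)   # each process gets one subgroup
    w = np.zeros(3)

    for step in range(200):
        # The master broadcasts the current weights to every process.
        w = comm.bcast(w, root=0)
        # Each process computes its contribution (here a gradient) on its own subgroup.
        grad = X_loc.T @ (X_loc @ w - y_loc) / len(y_loc)
        # The master collects the contributions, combines them, and updates the weights.
        grads = comm.gather(grad, root=0)
        if rank == 0:
            w = w - 0.1 * np.mean(grads, axis=0)

    if rank == 0:
        print("learned weights:", np.round(w, 3))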
5.3. A Library on DSM MIMD Computers
In this approach [3], a library is developed in order to bridge the parallelism difference between neural networks and a distributed shared memory (DSM) system (see Figure 9). The library allows programmers to develop neural network implementations without considering the architectural part, and also allows the implementation to be executed on the DSM architecture. The main advantage of the library is its separation between the programming and hardware parts.
Figure 9. DSM architecture
It is seen that training a Kohonen map of 100×100 neurons with 100,000 iterations on 8 processors is 7 times faster than the sequential execution. The execution time decreases as the number of processors increases, as seen in Figure 10.
Figure 10. Execution time with respect to number of processors
5.4. Fujitsu AP1000 Architecture
The Fujitsu AP1000 is a message-passing MIMD computer with a distributed memory (see Figure 11). Kuga implemented a neural network on it by vertical slicing of a feed-forward network. The aim is to develop a general mapping algorithm for different applications. In this implementation, three different communication schemes are used: one-to-one communication, rotated messages in horizontal and vertical rings, and a center processor with parallel routes. In order to increase the variety, different applications are used in this implementation and the training time is used as the heuristic criterion. The mapping algorithm selects the best mapping by taking the network structure and the input data into account.
The heuristic selects the best mapping by combining three degrees of parallelism:
• training set parallelism
• node parallelism
• pipelining parallelism
The most important feature of this implementation is its ability to select the best mapping
according to the network structure. This brings generality to the implementation.
Figure 11. AP1000 architecture
6. Conclusion
Artificial neural networks are used in many applications because of their efficiency. They exhibit the natural structure, parallelism, and execution capability of the human brain. The natural parallelism in ANNs motivates their implementation on parallel architectures. The two most important problems of executing an ANN on a parallel architecture are the parallelism difference between them and the specific adaptation of the mapping algorithm to different network structures. In this paper, how these problems can be solved in different applications is explained. The common feature of all these mapping techniques is the desire for generality.
References
1. V. Turchenko, C. Triki, and A. Sachenko. "Approach to Parallel Training of Integration Historical Data Neural Networks", 20th IASTED International Multi-Conference on Applied Informatics - AI2002, Innsbruck, Austria, February 2002.
2. Y. Boniface, F. Alexandre, and S. Vialle. "A library to implement neural networks on MIMD machines", Euro-Par, 1999.
3. Y. Boniface, F. Alexandre, and S. Vialle. "A bridge between two paradigms for parallelism: neural networks and general purpose MIMD computers", IJCNN, 1999.
4. M. Misra. "Parallel environments for implementing neural networks", Neural Computing Surveys, 1:48-60, 1997.