Neural Network Implementations on Parallel Architectures

Şeniz Demir
Department of Computer Engineering, Bogaziçi University, Bebek, İstanbul
[email protected]

1. Introduction

The trend of using neural networks in applications has been increasing for a long time. This increase raises the problem of selecting the best architecture for the network structure in use. Although several architectures are available, usually one outperforms the others. This paper focuses on the implementation of neural networks on parallel architectures and the problems that must be solved for this purpose. The second section discusses what a neural network is and explains how it is trained. The third section describes different parallel architectures. The fourth section shows the difference between the parallelism of neural networks and that of parallel architectures. Finally, different applications and mapping methods are presented.

2. Neural Networks

The human brain is made up of billions of individual neurons (see Figure 1), which have varying sizes and shapes. A neuron generally receives its stimulation at synapses, the junctions between its dendrites and the output areas of other neurons. Signals travel from the synapses through the dendrites and are weighted and summed as they are gathered by the branching structure. The total weighted sum of input stimulation must be high enough for the cell to fire and emit its output pulse. The level of excitation that must be exceeded is called the threshold of the neuron. When the neuron does fire, an electrical pulse is carried down the axon to any synapses that the axon leads to. After the neuron has fired, it takes some time for the neuron to recover and prepare itself to respond to input information once more. This period is known as the refractory period.

Figure 1.
Structure of a neuron in the brain

The investigation of intelligent systems (systems that can represent knowledge, learn, and reason) is inspired by neurons and their activities. Networks of neurons clearly can perform the computations needed for intelligence. The conclusion that a computer could perform computations like a human being, if neurons connected into artificial networks were simulated, started the development of ANNs (Artificial Neural Networks). Each ANN is composed of some number of neurons (see Figure 2). The main characteristics of ANNs are:

Adaptive learning: The behavior of the network changes according to the data given as input. The network itself decides how it will react, so there is no outside interference with the system. This learning process continues as new data is fed to the system.

Self-organization: The structure of the network changes according to the given data. It can be changed either by changing the strength of the connections between the neurons or by changing the learning algorithm used.

Error tolerance: The network is capable of generalizing to new or distorted data. A model is determined from the given data.

Real-time operation: As a consequence of parallel processing, real-time operation becomes possible.

Parallel information processing: Like the neurons in the human brain, the neurons in an ANN process information in parallel.

Figure 2. The structure of a neuron in an ANN

2.1. Design and Structure

In order to design an ANN, the number of layers, the distribution of the neurons over the layers, the communication between the layers, and the strength of the connections within the network must be determined properly. The structure of an ANN is basically a topology consisting of three kinds of layers: the input layer, the (optional) hidden layers, and the output layer (see Figure 3).

Figure 3. Structure of an ANN

The input layer is responsible for extracting knowledge from the environment.
The output layer is responsible for the communication with the environment. The hidden layers are responsible for the processing between these two layers. Neurons are connected with unidirectional paths, and each of them communicates with other neurons in the same layer or in different layers. The communication inside the same layer is performed in two ways:

Each neuron is connected to every other neuron.
Each neuron is connected only to its neighboring neurons.

The communication between different layers is performed in four ways:

Each neuron is connected to every neuron in the second layer.
Each neuron is connected to only some of the neurons in the second layer.
Each neuron can send its output to the neurons in the second layer but cannot receive from them.
Each neuron can send its output to the neurons in the second layer over a different set of connections.

2.2. Training (Learning)

Training an ANN means determining the connection weights and the threshold values of the neurons in the network. These values teach the network how to find the solution to a problem. This process is inspired by the brain's learning by experience. The learning ability of a neural network is determined by its architecture and by the algorithmic method chosen for training. The basic structure of a training algorithm starts with initializing all weights. The input vector is applied to the network, and the outputs are propagated from the inner layers up to the output layer. The output obtained from the network is compared with the desired output. The error is calculated, and, in a backward fashion, new weights are assigned to the neurons. All the operations from the forward pass onward are iterated until a good solution, one close enough to the desired output, is obtained. This training process is a long and costly operation if it is executed on a computer that has a single processor.
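The training loop described above (initialize weights, propagate inputs forward, compare with the desired output, assign new weights backward, iterate) can be sketched as follows. The two-layer network, sigmoid activation, learning rate, and XOR task are illustrative assumptions, not details taken from the surveyed papers; this is a minimal single-processor sketch of the generic algorithm.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(inputs, targets, hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a small two-layer network with plain backpropagation."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize all weights.
    w1 = rng.normal(0.0, 1.0, (inputs.shape[1], hidden))
    w2 = rng.normal(0.0, 1.0, (hidden, targets.shape[1]))
    for _ in range(epochs):
        # Forward pass: propagate the inputs up to the output layer.
        h = sigmoid(inputs @ w1)
        out = sigmoid(h @ w2)
        # Compare the network output with the desired output.
        err = targets - out
        # Backward pass: compute deltas and assign new weights.
        d_out = err * out * (1.0 - out)
        d_h = (d_out @ w2.T) * h * (1.0 - h)
        w2 += lr * h.T @ d_out
        w1 += lr * inputs.T @ d_h
    return w1, w2

def predict(inputs, w1, w2):
    return sigmoid(sigmoid(inputs @ w1) @ w2)

if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets
    w1, w2 = train(X, y)
    print(np.round(predict(X, w1, w2)).ravel())
```

Even on this four-example problem, thousands of full forward and backward passes are needed, which illustrates why training on a single processor is costly for large networks.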
Parallel architectures are preferred because they decrease this time and fit the natural parallelism of ANNs. Such architectures are used not only in the training phase but also in the execution phase. In the later sections, this paper focuses on the parallelism in the training part of an ANN.

3. Parallel Computer Architecture

Parallel computers are those that have more than one processing unit (processor). This property gives them the capability of running multiple processes simultaneously. SIMD and MIMD architectures are the pioneering parallel architectures.

SIMD is an architecture where the entire data set is manipulated by a single instruction. Its powerful ability to manipulate large vectors and matrices is the most important reason for its demand. SIMD architectures differ from one another in, for example, whether they use a distributed or a shared memory and in how the processors are organized. The Connection Machine (CM-1, CM-2, and CM-200), built by Thinking Machines Corporation, is an example of a SIMD computer. Developing algorithms for this architecture seems hard in theory, but porting a serial algorithm becomes simple when each inner loop is replaced with a single broadcast instruction that implements the complete loop.

MIMD is an architecture where the data is manipulated by multiple instructions. It is the most popular parallel architecture today. In this architecture, each processor has its own copy of the program and works on a different data stream. At any time, different processors may be executing different instructions on different pieces of data. Like SIMD computers, these computers can use either a shared memory or distributed memories (see Figure 4). Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes.

Figure 4. Architecture of MIMD computers (shared memory and distributed memory)

4.
Parallelism in ANNs and Computers

Each ANN includes a natural parallelism inside. Neurons in an ANN process widespread information simultaneously, and the outputs of some neurons become the inputs of others. Contrary to what parallel architectures favor, the neurons exchange many short messages (large connectivity) and perform only simple calculations, while the performance of MIMD computers decreases as the communication between processes increases. This difference brings the necessity of reducing the communication between processors and of changing the network topology according to the application. This necessity also restricts the applications that can be developed. To execute a network model on a parallel architecture, the programmer must find a solution specific to his model. In order to find this solution, the programmer must know both the parallelism in the model and the parallelism in the architecture used. This can be called "one model, one solution". Nowadays, the trend is toward finding a general solution that can be used for many models and architectures. The solution to the problems stated above is to develop a general mapping algorithm between the parallelism in the ANN and that in the architecture, whichever are used. The following section presents such mapping algorithms used for training an ANN.

5. Applications and Mapping Methods

5.1. Historical Data Integration

In this application, neural networks are used to improve the accuracy of sensor signal processing by predicting sensor drift [1]. Testing the accuracy in different environments and in different time intervals is a hard and time-consuming process. To overcome this problem, sensor drift prediction using neural networks is used. As with all neural networks, the system must initially be trained on training data. However, such data are not available at the beginning of sensor exploitation, so data obtained from sensors of the same type under similar operating conditions are used.
These data are called historical, and this process is called historical data integration. Two different mapping schemes are used.

a) Parallel calculation of the weighted sum: In this scheme, each weighted input is calculated by a different processor, and the weighted sum is obtained by combining them (see Figure 5). The number of weighted-sum operations executed depends on the number of neurons in each layer of the network. The number of neurons per layer must be large enough, and the calculation time must be larger than the communication time between processors, in order to reduce the overhead of parallel calculation.

Figure 5. Parallel calculation of the weighted sum

b) Multiplication cycles and parallel training of each separate ANN: In this scheme, separate neural networks are trained in parallel on different processors (see Figure 6). Each processor is responsible for the training of its corresponding neural network. The results obtained from these processors are combined into the final result.

Figure 6. Multiplication cycles and parallel training of each separate ANN

The experiments were run on the Origin2000 parallel computer located in the Parallel Computing Laboratory, University of Calabria, Italy. The Origin2000 contains 8 MIPS R10000 RISC processors with a 250 MHz clock rate and 512 MB of RAM; each processor has a 4 MB cache memory. The Origin2000 runs the UNIX (IRIX) operating system. All parallel routines were developed using MPI. When the first scheme is implemented, the training time increases as the number of processors increases. With the second scheme, the training time decreases as the number of processors increases (see Figure 7). The difference between the schemes comes from the difference in the number of weighted-sum operations performed, so the second scheme appears to be the most appropriate one for this application.

Figure 7.
Running time of the two schemes (first scheme and second scheme)

5.2. Distributed Training Data

In this approach, one processor is selected as the master and the other processors serve as slaves. The training data is divided into smaller groups, and each subgroup is sent to a different slave processor (see Figure 8). The master processor collects the weights used by the slaves and the outputs these slaves produce. It combines the collected information and sends new weights to each slave. This process continues until the network is trained.

Figure 8. Distributed training data scheme

5.3. A Library on DSM MIMD Computers

In this approach [3], a library is developed in order to bridge the parallelism difference between neural networks and a distributed shared memory (DSM) system (see Figure 9). The library allows programmers to develop neural network implementations without considering the architectural part, and it allows those implementations to be executed on a DSM architecture. The main advantage of the library is its separation between the programming and hardware parts.

Figure 9. DSM architecture

Training a Kohonen map of 100*100 neurons with 100000 iterations on 8 processors was observed to be 7 times faster than the sequential execution. The execution time decreases as the number of processors increases, as seen in Figure 10.

Figure 10. Execution time with respect to the number of processors

5.4. Fujitsu AP1000 Architecture

The AP1000 is a message-passing MIMD computer with a distributed memory (see Figure 11). Kuga implemented a neural network by vertical slicing of a feed-forward network. The aim is to develop a general mapping algorithm for different applications. In this implementation, three different communication schemes are used: one-to-one communication, rotated messages in horizontal and vertical rings, and a center processor with parallel routes. In order to increase the variety, different applications are used in this implementation, and training time is used as the heuristic criterion.
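The master-slave scheme of Section 5.2 can be sketched as below, with ordinary function calls standing in for the slave processors. The linear model, the gradient-averaging rule by which the master combines the collected results, and all function names are illustrative assumptions: the original text does not specify the model or how the master computes the new weights from what it collects.

```python
import numpy as np

def slave_step(w, X, y):
    """One slave: compute a weight update (gradient) on its own data subgroup."""
    pred = X @ w
    return X.T @ (pred - y) / len(y)  # least-squares gradient on this shard

def master_train(X, y, n_slaves=4, lr=0.1, rounds=200):
    """Master: split the training data, collect slave results, send back new weights."""
    # Divide the training data into smaller groups, one per slave.
    shards = list(zip(np.array_split(X, n_slaves), np.array_split(y, n_slaves)))
    w = np.zeros(X.shape[1])
    for _ in range(rounds):
        # Each slave works on its subgroup. Here the calls run sequentially;
        # in the surveyed scheme each would run on a separate slave processor.
        grads = [slave_step(w, Xs, ys) for Xs, ys in shards]
        # The master combines the collected information and broadcasts new weights.
        w -= lr * np.mean(grads, axis=0)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(80, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w
    print(np.round(master_train(X, y), 2))
```

With equal-sized subgroups, averaging the slave gradients reproduces the full-batch gradient, so the scheme trades one large computation for several smaller ones plus the master-slave communication each round.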
The mapping algorithm selects the best mapping by taking the network structure and the input data into account. The heuristic combines three degrees of parallelism:

training set parallelism
node parallelism
pipelining parallelism

The most important feature of this implementation is its ability to select the best mapping according to the network structure. This is what gives the implementation its generality.

Figure 11. AP1000 architecture

6. Conclusion

Artificial neural networks are used in many applications because of their efficiency. They exhibit the natural structure, parallelism, and execution capability of the human brain. The natural parallelism in ANNs motivates their implementation on parallel architectures. The two most important problems of executing an ANN on a parallel architecture are the parallelism difference between them and the specific adaptation of the mapping algorithm to different network structures. This paper has explained how these problems are solved in different applications. The common feature of all these mapping techniques is the desire for generality.

References

1. Turchenko V., Triki C., and Sachenko A. "Approach to Parallel Training of Integration Historical Data Neural Networks", 20th IASTED International MultiConference on Applied Informatics - AI2002, Innsbruck, Austria, February 2002.
2. Boniface Y., Alexandre F., and Vialle S. "A library to implement neural networks on MIMD machines", EuroPar-1, 1999.
3. Boniface Y., Alexandre F., and Vialle S. "A bridge between two paradigms for parallelism: neural networks and general purpose MIMD computers", IJCNN, 1999.
4. Misra M. "Parallel environments for implementing neural networks", Neural Computing Surveys, 1:48-60, 1997.