CHAPTER 4 ARTIFICIAL NEURAL NETWORKS

4.1 INTRODUCTION

Artificial Neural Networks (ANNs) are relatively crude electronic models based on the neural structure of the brain. The brain learns from experience, and artificial neural networks try to mimic the way it functions. Even simple animal brains are capable of functions that are currently impossible for computers. Computers do many things well, but they have trouble recognizing even simple patterns.

The brain stores information as patterns. Some of these patterns are very complicated and give us the ability to recognize individual faces from many different angles. This process of storing information as patterns, utilizing those patterns, and then solving problems defines a new field in computing, one that does not rely on traditional programming but instead involves creating massively parallel networks and training them to solve specific problems.

The exact workings of the human brain are still a mystery, yet some aspects are known. The most basic element of the human brain is a specific type of cell called a ‘neuron’. Neurons provide the abilities to remember, think, and apply previous experiences to our every action. There are about 100 billion of them, and each can connect with up to about 200,000 other neurons, although 1,000 to 10,000 connections is typical. The power of the human mind comes from the sheer number of these basic components and the multiple connections between them. It also comes from genetic programming and learning.

The individual neurons are complicated. They have a myriad of parts, subsystems, and control mechanisms. They convey information via a host of electrochemical pathways. Together, these neurons and their connections form a process that is not binary, not stable, and not synchronous.

4.2 A BIOLOGICAL NEURON

Basically, a biological neuron receives inputs from other sources, combines them in some way, performs a generally nonlinear operation on the result, and then outputs the final result. Fig. 4.1 shows the relationship of these four parts.

Fig. 4.1: A biological neuron

Within humans there are many variations on the basic type of neuron, yet all biological neurons have the same four basic components. They are known by their biological names: cell body (soma), dendrites, axon, and synapses.

Cell body (Soma): The body of the neuron cell contains the nucleus and carries out the biochemical transformations necessary to the life of the neuron.

Dendrite: Each neuron has fine, hair-like tubular extensions around it. They branch out into a tree-like structure around the cell body and accept incoming signals.

Axon: A long, thin, tubular structure that works like a transmission line.

Synapse: Neurons are connected to one another in a complex spatial arrangement. When the axon reaches its final destination it branches again; this is called terminal arborization. At the ends of the axon are highly complex and specialized structures called synapses. The connection between two neurons takes place at these synapses.

Dendrites receive input through the synapses of other neurons. The soma processes these incoming signals over time and converts the processed value into an output, which is sent out to other neurons through the axon and the synapses.

4.3 AN ARTIFICIAL NEURON

The artificial neuron simulates the four basic functions of a biological neuron. Fig. 4.2 shows a basic representation of an artificial neuron.

Fig. 4.2: A basic artificial neuron

In Fig. 4.2, the various inputs to the network are represented by the mathematical symbol x(n). Each of these inputs is multiplied by a connection weight, represented by w(n). In the simplest case, these products are summed and fed to a transfer function (activation function) to generate a result, and this result is sent as output.
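
The simplest case described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `neuron_output`, the step transfer function, and the example numbers are assumptions made for the example and do not come from the text.

```python
def neuron_output(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs followed by a step transfer function.

    The name, the step function, and the default threshold are hypothetical
    choices for illustration only.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Multiply each input x(n) by its connection weight w(n) and sum the products.
    total = sum(x * w for x, w in zip(inputs, weights))
    # Step transfer function: output 1 only if the sum exceeds the threshold.
    return 1 if total > threshold else 0


# Hypothetical example values.
x = [1.0, 0.0, 1.0]
w = [0.5, -0.3, 0.2]
print(neuron_output(x, w, threshold=0.6))
```

Replacing the final comparison with a smooth function such as a sigmoid would turn the same structure into a neuron with a continuous output.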

Other network structures are also possible; they utilize different summing functions as well as different transfer functions. Some applications, such as recognition of text, identification of speech, and image deciphering of scenes, require binary answers. These applications may utilize the binary properties of ORing and ANDing of inputs along with summing operations. Such functions can be built into the summation and transfer functions of a network.

Seven major components make up an artificial neuron. These components are valid whether the neuron is used for input, for output, or in the hidden layers.

Component 1. Weighting Factors: A neuron usually receives many simultaneous inputs. Each input has its own relative weight, which gives the input the impact that it needs on the processing element's summation function. Some inputs are made more important than others so that they have a greater effect on the processing element as they combine to produce a neural response. Weights are adaptive coefficients that determine the intensity of the input signal as registered by the artificial neuron. They are a measure of an input's connection strength. These strengths can be modified in response to various training sets and according to a network's specific topology or its learning rules.

Component 2. Summation Function: The inputs and the corresponding weights are vectors, which can be represented as (i1, i2, ..., in) and (w1, w2, ..., wn). The total input signal is the dot product of these two vectors. The result, (i1 * w1) + (i2 * w2) + ... + (in * wn), is a single number. The summation function can be more complex than a simple weighted sum of products. The inputs and weighting coefficients can be combined in many different ways before being passed on to the transfer function. In addition to summing, the summation function can select the minimum, maximum, majority, or product, or apply one of several normalizing algorithms.
The specific algorithm for combining neural inputs is determined by the chosen network architecture and paradigm. Some summation functions have an additional ‘activation function’ applied to the result before it is passed on to the transfer function, for the purpose of allowing the summation output to vary with respect to time.

Component 3. Transfer Function: The result of the summation function is transformed to a working output through an algorithmic process known as the transfer function. In the transfer function, the summation can be compared with some threshold to determine the neural output. If the sum is greater than the threshold value, the processing element generates a signal; if it is less than the threshold, no signal (or some inhibitory signal) is generated. Both types of response are significant. The threshold, or transfer function, is generally non-linear. Linear functions are limited because the output is simply proportional to the input. The step type of transfer function outputs zero and one, one and minus one, or other numeric combinations. Another type, the ‘threshold’ or ramping function, mirrors the input within a given range and acts as a step function outside that range. It is a linear function that is clipped to minimum and maximum values, which makes it non-linear. Another option is an ‘S’ curve, which approaches a minimum and a maximum value at the asymptotes. It is called a sigmoid when it ranges between 0 and 1, and a hyperbolic tangent when it ranges between -1 and 1. Both the function and its derivatives are continuous.

Component 4. Scaling and Limiting: After the transfer function, the result can pass through additional processes that scale and limit it. Scaling simply multiplies the transfer value by a scale factor and then adds an offset. Limiting is the mechanism that ensures the scaled result does not exceed an upper or lower bound.
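
The transfer functions named under Component 3 and the scaling and limiting step of Component 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the default ranges, and the default bounds are hypothetical and are not prescribed by the text.

```python
import math


def step(s, threshold=0.0):
    # Step function: 1 above the threshold, 0 otherwise.
    return 1.0 if s > threshold else 0.0


def ramp(s, low=0.0, high=1.0):
    # Ramping ('threshold') function: mirrors the input inside [low, high]
    # and is clipped to the bounds outside that range.
    return max(low, min(high, s))


def sigmoid(s):
    # S-curve ranging between 0 and 1, written in a numerically stable form
    # so that very negative inputs do not overflow math.exp.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def hyperbolic_tangent(s):
    # S-curve ranging between -1 and 1.
    return math.tanh(s)


def scale_and_limit(value, scale=1.0, offset=0.0, lower=-1.0, upper=1.0):
    # Scaling: multiply by a scale factor, then add an offset.
    scaled = scale * value + offset
    # Limiting: keep the scaled result within [lower, upper].
    return max(lower, min(upper, scaled))


for s in (-2.0, 0.0, 0.5, 2.0):
    print(s, step(s), ramp(s), sigmoid(s), hyperbolic_tangent(s))
```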
This limiting is in addition to the hard limits that the original transfer function may have imposed.

Component 5. Output Function (Competition): Each processing element is allowed one output signal, which it may pass to hundreds of other neurons. Normally, the output is directly equivalent to the transfer function's result. Some network topologies modify the transfer result to incorporate competition among neighboring processing elements. Neurons are allowed to compete with each other, inhibiting other processing elements unless those elements have great strength. Competition can occur at one or both of two levels. First, competition determines which artificial neuron will be active, or provide an output. Second, competitive inputs help determine which processing element will participate in the learning or adaptation process.

Component 6. Error Function and Back-Propagated Value: In most learning networks, the difference between the current output and the desired output is calculated as an error, which is then transformed by the error function to match a particular network architecture. Most basic architectures use this error directly, but some square the error while retaining its sign, some cube the error, and other paradigms modify the error to fit their specific purposes. The error is propagated backwards to a previous layer. This back-propagated value can be the error itself, the error scaled in some manner (often by the derivative of the transfer function), or some other desired output, depending on the network type. Normally, this back-propagated value, after being scaled by the learning function, is multiplied against each of the incoming connection weights to modify them before the next learning cycle.

Component 7. Learning Function: Its purpose is to modify the weights on the inputs of each processing element according to some neural-based algorithm.

4.4 AN ARTIFICIAL NEURAL NETWORK

Fig. 4.3: An artificial neural network

Fig. 4.3 shows an artificial neural network.
Inputs enter the processing element from the upper left. The first step is to multiply each of these inputs by its respective weighting factor [w(n)]. These modified inputs are then fed into the summing function, which usually sums the products, although many other types of operations can be selected. These operations can produce a number of different values, which are then propagated forward: values such as the average, the largest, the smallest, the ORed values, the ANDed values, etc. Other types of summing functions can also be created, and they may be further complicated by the addition of an activation function, which enables the summing function to operate in a time-sensitive way. The output of the summing function is then sent into a transfer function, which turns this number into a real output (a 0 or a 1, -1 or +1, or some other number) via some algorithm. The transfer function can also scale the output or control its value via thresholds. This output is then sent to other processing elements or to an outside connection, as dictated by the structure of the network.

4.5 TRANSFER (ACTIVATION) FUNCTIONS

The transfer function for neural networks must be differentiable, and therefore continuous, to enable error correction. The derivative of the transfer function is required for computing the local gradient. One example of a suitable transfer function is the sigmoid function, which has an S-shaped graph. It is one of the most common forms of transfer function used in the construction of ANNs. It is a strictly increasing function, so its derivative is always positive, and it exhibits a graceful balance between linear and nonlinear behavior. One example is the logistic function, represented by the equation:

$$\varphi(v) = \frac{1}{1 + e^{-v}}$$

This function has certain characteristics. At the extremes of φ(v), φ(v) is flat and φ'(v) is very small. In the midrange of φ(v), φ'(v) is at its maximum, as seen in Fig. 4.4.

Fig. 4.4: Logistic transfer function

Several other transfer functions can also be employed, as shown in Fig. 4.5.

Fig. 4.5: Transfer functions with different characteristic constant values

4.6 LAYER ARRANGEMENT IN A NEURAL NETWORK

Neurons can be clustered together in many ways. In the human mind this clustering occurs in such a way that information can be processed in a dynamic, interactive, and self-organizing way. Biologically, neural networks are constructed in a three-dimensional world from microscopic components. These neurons seem capable of nearly unrestricted interconnections. That is not true of any existing man-made network. Artificial neural networks are a simple clustering of artificial neurons, created by forming layers and interconnections as shown in Fig. 4.6.

Fig. 4.6: Layer arrangement in a neural network

Basically, a neural network is the grouping of neurons into layers, the connections between these layers, and the summation and transfer functions that together make up a functioning network. Most applications require networks that contain at least three layers: input, hidden, and output. The input layer receives data either from input files or directly from electronic sensors in real-time applications. The output layer sends information directly to the outside world, to a secondary computer process, or to other devices. Between these two layers there can be many hidden layers. These hidden layers contain many neurons in various interconnected structures. The inputs and outputs of each of these hidden neurons simply go to other neurons.

In most networks, each neuron in a hidden layer receives signals from all of the neurons in the layer before it, typically the input layer. After a neuron performs its function, it passes its output to all of the neurons in the layer after it, typically the output layer, providing a feed-forward path. The connections between neurons give a variable strength to an input. There are two types of these connections.
One causes the summing mechanism of the next neuron to add (excite), while the other causes it to subtract (inhibit). Some networks want a neuron to inhibit the other neurons in the same layer; this is called ‘lateral inhibition’. Its most common use is in the output layer. For example, in text recognition, if the probability of a character being ‘P’ is 0.85 and the probability of it being ‘F’ is 0.65, the network wants to choose the highest probability and inhibit all the others. It can do that with lateral inhibition (competition).

4.7 TYPES OF ARTIFICIAL NEURAL NETWORKS

4.7.1 SINGLE LAYER FEED FORWARD NETWORK

A neural network in which the input layer of source nodes projects onto an output layer of neurons, but not vice versa, is known as a single-layer feed-forward or acyclic network. In a single-layer network, ‘single layer’ refers to the output layer of computation nodes, as shown in Fig. 4.7.

Fig. 4.7: A single layer feedforward network (input layer, output layer)

4.7.2 MULTILAYER FEED FORWARD NETWORK

This type of network (Fig. 4.8) contains one or more hidden layers, whose computation nodes are called hidden neurons or hidden units. The function of the hidden neurons is to intervene between the external input and the network output in some useful manner and to extract higher-order statistics. The source nodes in the input layer of the network supply the input signal to the neurons in the second layer (the first hidden layer). The output signals of the second layer are used as inputs to the third layer, and so on. The set of output signals of the neurons in the output layer constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input (first) layer.

Fig. 4.8: A multilayer feed forward network (input layer, hidden layer, output layer)

Short characterization of feedforward networks:
1. Typically, activation is fed forward from input to output through ‘hidden layers’, though many other architectures exist.
2. Mathematically, they implement static input-output mappings.
3. The most popular supervised training algorithm is the backpropagation algorithm.
4. They have proven useful in many practical applications as approximators of nonlinear functions and as pattern classifiers.

4.7.3 RECURRENT NETWORK

A feed-forward neural network with one or more hidden layers and at least one feedback loop is known as a recurrent network, as shown in Fig. 4.9. The feedback may be self-feedback, i.e., the output of a neuron is fed back to its own input. Sometimes feedback loops involve the use of unit delay elements, which results in nonlinear dynamic behaviour, assuming that the neural network contains nonlinear units.

Fig. 4.9: A recurrent network (input layer, output layer)

There are various other types of networks, such as delta-bar-delta, Hopfield, vector quantization, counter propagation, probabilistic, Hamming, Boltzmann, bidirectional associative memory, spatio-temporal pattern, adaptive resonance, self-organizing map, recirculation, etc.

A recurrent neural network has at least one cyclic path of synaptic connections. Basic characteristics:
1. All biological neural networks are recurrent.
2. Mathematically, they implement dynamical systems.
3. Several types of training algorithms are known, with no clear winner.
4. Theoretical and practical difficulties have by and large prevented practical applications so far.

4.8 TRAINING OF ARTIFICIAL NEURAL NETWORKS

Once a network has been structured for a particular application, it is ready for training. At the beginning, the initial weights are chosen randomly, and then the training or learning begins. There are two approaches to training: supervised and unsupervised.

4.8.1 SUPERVISED TRAINING

In supervised training, both the inputs and the outputs are provided. The network then processes the inputs and compares its resulting outputs against the desired outputs.
Errors are then propagated back through the system, causing the system to adjust the weights that control the network. This process occurs over and over as the weights are continually tweaked. The set of data that enables the training is called the "training set." During the training of a network, the same set of data is processed many times as the connection weights are progressively refined.

Sometimes a network may never learn. This could be because the input data does not contain the specific information from which the desired output is derived. Networks also don't converge if there is not enough data to enable complete learning. Ideally, there should be enough data so that part of it can be held back as a test. Many-layered networks with multiple nodes are capable of memorizing data. To determine whether the system is simply memorizing its data in some non-significant way, supervised training needs to hold back a set of data to be used to test the system after it has undergone its training.

If a network simply can't solve the problem, the designer has to review the inputs and outputs, the number of layers, the number of elements per layer, the connections between the layers, the summation, transfer, and training functions, and even the initial weights themselves. Another part of the designer's creativity governs the rules of training. There are many laws (algorithms) used to implement the adaptive feedback required to adjust the weights during training. The most common technique is known as back-propagation.

Training is not just a technique but a conscious analysis, to ensure that the network is not overtrained. Initially, an artificial neural network configures itself with the general statistical trends of the data. Later, it continues to ‘learn’ about other aspects of the data, which may be spurious from a general viewpoint.
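
The idea of holding back part of the data for testing can be sketched as a simple random split. This is a hedged illustration only: the function name `hold_out_split`, the 20% test fraction, and the fixed seed are assumptions chosen for the example, not values given in the text.

```python
import random


def hold_out_split(samples, test_fraction=0.2, seed=0):
    """Split samples into a training set and a held-back test set.

    The name, the default fraction, and the seed are hypothetical choices.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    # Shuffle with a local generator so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # The first n_test shuffled samples are held back; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set of ten (input, desired output) pairs.
data = [(i, 2 * i) for i in range(10)]
train_set, test_set = hold_out_split(data)
print(len(train_set), len(test_set))
```

A network whose error keeps falling on the training set while its error on the held-back set rises is probably memorizing spurious detail rather than learning the general trend.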

When the system has finally been correctly trained and no further learning is needed, the weights can, if desired, be ‘frozen’. In some systems, this finalized network is then turned into hardware so that it can be fast. Other systems don't lock themselves in but continue to learn while in production use.

4.8.2 UNSUPERVISED OR ADAPTIVE TRAINING

The other type is unsupervised training (learning). In this type, the network is provided with inputs but not with desired outputs. The system itself must then decide what features it will use to group the input data. This is often referred to as self-organization or adaptation. These networks use no external influences to adjust their weights. Instead, they internally monitor their performance. They look for regularities or trends in the input signals and make adaptations according to the function of the network. Even without being told whether it is right or wrong, the network still must have some information about how to organize itself. This information is built into the network topology and learning rules.

An unsupervised learning algorithm might emphasize cooperation among clusters of processing elements. In such a scheme, the clusters would work together. If some external input activated any node in the cluster, the cluster's activity as a whole could be increased. Likewise, if the external input to nodes in the cluster was decreased, that could have an inhibitory effect on the entire cluster.

Competition between processing elements could also form a basis for learning. Training of competitive clusters could amplify the responses of specific groups to specific stimuli. As such, it would associate those groups with each other and with a specific appropriate response. Normally, when competition for learning is in effect, only the weights belonging to the winning processing element are updated.
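
A winner-take-all update of this kind can be sketched as below. This is an illustrative sketch under assumptions: the text does not give an update formula, so the rule used here (move the winner's weights a fraction of the way toward the input) is a commonly used competitive learning rule swapped in for the example, and the function name and numbers are hypothetical.

```python
def competitive_update(weights, x, learning_rate=0.1):
    """Update only the winning processing element; return its index.

    weights is a list of weight vectors, one per processing element.
    The update rule w <- w + rate * (x - w) is an assumed, commonly used
    choice, not one specified in the text.
    """
    # Each element's output is the dot product of its weights with the input.
    outputs = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    # The element with the largest output wins the competition.
    winner = max(range(len(weights)), key=lambda k: outputs[k])
    # Only the winner's weights are moved toward the input pattern.
    weights[winner] = [
        w_i + learning_rate * (x_i - w_i) for w_i, x_i in zip(weights[winner], x)
    ]
    return winner


# Hypothetical example: two processing elements, one two-dimensional input.
w = [[1.0, 0.0], [0.0, 1.0]]
print(competitive_update(w, [0.2, 0.8], learning_rate=0.5), w)
```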

At present, unsupervised learning is not well understood, and it remains an active area of research.

4.9 LEARNING RATES

The rate at which ANNs learn depends upon several controllable factors. A slower rate means more time is spent in producing an adequately trained system. With faster learning rates, however, the network may not be able to make the fine discriminations that are possible with a system that learns slowly. Most learning functions have some provision for a learning rate (learning constant). Usually this term is positive and between 0 and 1. If the learning rate is greater than 1, it is easy for the learning algorithm to overshoot in correcting the weights, and the network will oscillate. Small values of the learning rate will not correct the current error as quickly, but if small steps are taken in correcting errors, there is a good chance of converging to the best minimum.

4.10 LEARNING LAWS (ALGORITHMS)

Many learning laws are in common use. Most of them are some sort of variation of the best known and oldest, ‘Hebb's Rule’.

Hebb's Rule: This was introduced by Donald Hebb in ‘The Organization of Behavior’. The basic rule is: if a neuron receives an input from another neuron and both are highly active (same sign), the weight between the two neurons should be strengthened.

Hopfield Law: If the desired output and the input are both active or both inactive, increment the connection weight by the learning rate; otherwise decrement the weight by the learning rate.

The Delta Rule: This rule is based on the simple idea of continuously modifying the strengths of the input connections to reduce the difference (the delta) between the desired output value and the actual output of a processing element.

The Gradient Descent Rule: This is similar to the Delta Rule in that the derivative of the transfer function is still used to modify the delta error before it is applied to the connection weights.
However, an additional proportional constant tied to the learning rate is appended to the final modifying factor acting upon the weight.

Kohonen's Law: In this law, the processing elements compete for the opportunity to learn, or update their weights. The element with the largest output is declared the winner and has the capability of inhibiting its competitors as well as exciting its neighbors. Only the winner is permitted an output, and only the winner plus its neighbors are allowed to adjust their connection weights.

4.11 BACKPROPAGATION FOR FEED FORWARD NETWORKS

The backpropagation (BP) algorithm is the most commonly used training method for feed-forward networks. Consider a multi-layer perceptron with $k$ hidden layers. Together with the layer of input units and the layer of output units, this gives $k+2$ layers of units altogether, which are numbered $0, \dots, k+1$. Let the number of input units be $K$, of output units be $L$, and of units in hidden layer $m$ be $N^m$. The weight of the connection between the $j$th unit in layer $m$ and the $i$th unit in layer $m+1$ is denoted by $w_{ij}^m$. The activation of the $i$th unit in layer $m$ is $x_i^m$ (for $m = 0$ this is an input value, for $m = k+1$ an output value).

The training data for a feedforward network training task consist of $T$ input-output (vector-valued) data pairs

$$u(n) = (x_1^0(n), \dots, x_K^0(n))^t, \qquad d(n) = (d_1^{k+1}(n), \dots, d_L^{k+1}(n))^t,$$

where $n$ denotes the training instance. The activation of non-input units is computed according to

$$x_i^{m+1}(n) = f\Big(\sum_{j=1,\dots,N^m} w_{ij}^m \, x_j^m(n)\Big).$$

Presented with training input $u(n)$, the previous update equation is used to compute the activations of units in subsequent hidden layers, until a network response

$$y(n) = (x_1^{k+1}(n), \dots, x_L^{k+1}(n))^t$$

is obtained in the output layer. The objective of training is to find a set of network weights such that the summed squared error

$$E = \sum_{n=1,\dots,T} \| d(n) - y(n) \|^2 = \sum_{n=1,\dots,T} E(n)$$

is minimized. This is done by incrementally changing the weights along the direction of the error gradient with respect to the weights,

$$\frac{\partial E}{\partial w_{ij}^m} = \sum_{n=1,\dots,T} \frac{\partial E(n)}{\partial w_{ij}^m},$$

using a (small) learning rate $\gamma$:

$$\text{new } w_{ij}^m = w_{ij}^m - \gamma \, \frac{\partial E}{\partial w_{ij}^m}.$$

This is the formula used in batch learning mode, where new weights are computed after presenting all training samples. One such pass through all samples is called an epoch. Before the first epoch, the weights are initialized, typically to small random numbers. A variant is incremental learning, where the weights are changed after the presentation of each individual training sample:

$$\text{new } w_{ij}^m = w_{ij}^m - \gamma \, \frac{\partial E(n)}{\partial w_{ij}^m}.$$

The subtask in this method is the computation of the error gradients $\partial E(n) / \partial w_{ij}^m$. The backpropagation algorithm is a scheme to perform these computations. The procedure for one epoch of batch processing is given below.

Input: current weights $w_{ij}^m$, training samples.
Output: new weights.
Computation steps:

1. For each sample $n$, compute the activations of the internal and output units (forward pass).

2. Compute, by proceeding backward through $m = k+1, k, \dots, 1$, for each unit $x_i^m$ the error propagation term $\delta_i^m(n)$:

$$\delta_i^{k+1}(n) = (d_i(n) - y_i(n)) \left. \frac{\partial f(u)}{\partial u} \right|_{u = z_i^{k+1}}$$

for the output layer and

$$\delta_j^m(n) = \sum_{i=1}^{N^{m+1}} \delta_i^{m+1}(n) \, w_{ij}^m \left. \frac{\partial f(u)}{\partial u} \right|_{u = z_j^m}$$

for the hidden layers, where

$$z_i^m(n) = \sum_{j=1}^{N^{m-1}} x_j^{m-1}(n) \, w_{ij}^{m-1}$$

is the internal state (or potential) of unit $x_i^m$. This is the error backpropagation pass. Mathematically, the error propagation term $\delta_j^m(n)$ represents the (negative) error gradient with respect to the potential of the unit $x_j^m$, up to a constant factor that is absorbed into the learning rate:

$$\delta_j^m(n) \propto - \left. \frac{\partial E(n)}{\partial u} \right|_{u = z_j^m}.$$

3. Adjust the connection weights according to

$$\text{new } w_{ij}^{m-1} = w_{ij}^{m-1} + \gamma \sum_{n=1}^{T} \delta_i^m(n) \, x_j^{m-1}(n).$$

After every such epoch, compute the error. Stop when the error falls below a predetermined threshold, when the change in error falls below another predetermined threshold, or when the number of epochs exceeds a predetermined maximum.
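
The three computation steps and the stopping criteria above can be sketched in plain Python as follows. This is a minimal sketch, not a definitive implementation: the choice of tanh for the transfer function f, the function names, the tiny hand-picked network, and the thresholds are all assumptions made for illustration. As in the equations above, bias terms are omitted, and `weights[m][i][j]` stands for $w_{ij}^m$.

```python
import math


def f(u):
    # Transfer function; tanh is an assumed choice, the text leaves f generic.
    return math.tanh(u)


def f_prime(u):
    # Derivative of tanh, evaluated at the potential u.
    return 1.0 - math.tanh(u) ** 2


def forward(weights, u):
    """Step 1 (forward pass): activations x[m] and potentials z[m] per layer."""
    x, z = [list(u)], [None]  # layer 0 holds the inputs and has no potential
    for w in weights:  # w[i][j]: weight from unit j in layer m to unit i in layer m+1
        z_next = [sum(w_ij * x_j for w_ij, x_j in zip(row, x[-1])) for row in w]
        z.append(z_next)
        x.append([f(v) for v in z_next])
    return x, z


def batch_epoch(weights, samples, gamma):
    """One batch epoch, updating weights in place.

    Returns the summed squared error measured before the weight update.
    """
    # Accumulators for sum over n of delta_i(n) * x_j(n), one per weight.
    acc = [[[0.0] * len(row) for row in w] for w in weights]
    error = 0.0
    for u, d in samples:
        x, z = forward(weights, u)
        y = x[-1]
        error += sum((d_i - y_i) ** 2 for d_i, y_i in zip(d, y))
        # Step 2: error propagation terms, starting at the output layer.
        delta = [(d_i - y_i) * f_prime(z_i) for d_i, y_i, z_i in zip(d, y, z[-1])]
        for m in range(len(weights) - 1, -1, -1):
            # delta currently belongs to layer m+1; weights[m] feeds that layer.
            for i, row in enumerate(acc[m]):
                for j in range(len(row)):
                    row[j] += delta[i] * x[m][j]
            if m > 0:
                # Propagate the terms back to hidden layer m.
                delta = [
                    sum(delta[i] * weights[m][i][j] for i in range(len(delta)))
                    * f_prime(z[m][j])
                    for j in range(len(x[m]))
                ]
    # Step 3: adjust every connection weight after all samples were presented.
    for m, w in enumerate(weights):
        for i, row in enumerate(w):
            for j in range(len(row)):
                row[j] += gamma * acc[m][i][j]
    return error


# Hypothetical network: 2 inputs, 2 hidden units, 1 output; one training pair.
weights = [[[0.1, -0.2], [0.3, 0.4]], [[0.2, -0.1]]]
samples = [([1.0, 0.5], [0.5])]
errors = []
for epoch in range(100):  # maximal number of epochs (assumed value)
    errors.append(batch_epoch(weights, samples, gamma=0.1))
    if errors[-1] < 1e-4:  # error threshold (assumed value)
        break
print(errors[0], errors[-1])
```

The weights are fixed by hand here only to keep the run reproducible; as the text notes, they would normally be initialized to small random numbers.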
Many such epochs (on the order of thousands in nontrivial tasks) may be required until a sufficiently small error is achieved. One epoch requires $O(TM)$ multiplications and additions, where $M$ is the total number of network connections.

The basic gradient descent approach (and its backpropagation algorithm implementation) is notorious for slow convergence, because the learning rate $\gamma$ must typically be chosen small to avoid instability. One approach to achieving faster convergence is to use second-order gradient descent techniques, which exploit the curvature of the error surface but have epoch complexity $O(TM^2)$.

4.12 APPLICATIONS OF NEURAL NETWORKS

4.12.1 GENERAL APPLICATIONS

Many of the networks being designed at present are statistically quite accurate (up to 85% to 90% accuracy). Neural networks do not yet provide the user interface that translates spoken words into instructions for a machine, but some day this will be achieved. VCRs, home security systems, CD players, and word processors will simply be activated by voice. Touch screens and voice editing will replace the word processors of today while bringing spreadsheets and databases to a greater level of usability. Neural network design is progressing in other, more promising application areas.

(i) Language Processing: These applications include text-to-speech conversion, auditory input for machines, automatic language translation, secure voice-keyed locks, automatic transcription, aids for the deaf, aids for the physically disabled that respond to voice commands, and natural language processing.

(ii) Character Recognition: Neural network based products are available that can recognize hand-printed characters through a scanner. They are 98% accurate for numbers and a little less accurate for alphabetical characters. The Quantum Neural Network software package (Qnspec) is available for recognizing characters, including cursive.

(iii) Image (data) Compression: Neural networks can do real-time compression and decompression of data.
These networks can reduce eight bits of data to three and then reverse that process, restoring the data to eight bits.

(iv) Pattern Recognition: Many pattern recognition applications are in use, such as a system that can detect bombs in luggage at airports by identifying small variances and patterns in the outputs of specialized sensors, a back-propagation neural network that can discriminate between a true and a false heart attack, and a network that can scan and read Pap smears. Many automated quality control applications based on pattern recognition are also now in use.

(v) Signal Processing: Neural networks have proven capable of filtering out electronic noise. Another application is a system that can detect engine misfire simply from the engine sound.

(vi) Financial: Banks, credit card companies, and lending institutions deal with many decisions that are not clear-cut. These decisions involve learning and statistical trends. Neural networks are now trained on the data from past decisions and used in decision-making.

(vii) Servo Control: A neural system known as Martingale's Parametric Avalanche, a spatio-temporal pattern recognition network, is being designed to control the shuttle during in-flight maneuvers. Another application is ALVINN (Autonomous Land Vehicle In a Neural Network).

4.12.2 APPLICATIONS IN POWER SYSTEMS

The electric power industry is currently undergoing an unprecedented reform. One of the most exciting and potentially profitable recent developments is the increasing use of ANN techniques. A major advantage of the ANN approach is that the domain knowledge is distributed among the neurons and information processing is carried out in a parallel, distributed manner. Being adaptive units, ANNs are able to learn complex relationships even when no functional model exists. This provides the capability to do ‘Black Box Modeling’ with little or no prior knowledge of the function itself.
ANNs have the ability to properly classify a highly non-linear relationship and, once trained, they can classify new data much faster than would be possible by solving the model analytically. The rising interest in ANNs is largely due to the emergence of powerful new methods as well as to the availability of computational power suitable for simulation. The field is particularly exciting today because ANN algorithms and architectures can be implemented in VLSI technology for real-time applications. The application of ANNs in many areas of electrical power systems has led to acceptable results.

1. Load Forecasting

Load forecasting is a very common and popular problem, which plays an important role in the economics, finance, development, expansion, and planning of power systems.

• Short-term load forecasting, over an interval ranging from an hour to a week, is important for various applications such as unit commitment, economic dispatch, energy transfer scheduling, and real-time control.
• Mid-term load forecasting, ranging from one month to five years, is used to purchase enough fuel for power plants after electricity tariffs are calculated.
• Long-term load forecasting (LTLF), covering from 5 to 20 years or more, is used by planning engineers and economists to determine the type and the size of generating plants that minimize both fixed and variable costs.

ANNs can be used to solve these problems. Most of the projects using ANNs have successfully included many factors in the forecasting model, such as weather conditions, holidays, weekends, and days of sports matches. This is possible because of the ability of ANNs to learn from many input factors. The main advantages of ANNs that have increased their use in forecasting are:
1. They can be run off-line, without time constraints and without direct coupling to the power system for data acquisition.
2. They can adjust the parameters for ANN inputs that do not have a functional relationship between them, such as weather conditions and load profile.
2. Fault Diagnosis/Fault Location
Progress in communication and digital technology has increased the amount of information available at supervisory control and data acquisition (SCADA) systems. Although this information is very useful, during events that cause outages the operator may be overwhelmed by the excessive number of simultaneously operating alarms, which increases the time required to identify the main cause of the outage and to start the restoration process. In addition, factors such as stress and inexperience can affect the operator’s performance; a tool to support the real-time decision-making process is therefore desirable. The protection devices are responsible for detecting the occurrence of a fault and, when necessary, they send trip signals to circuit breakers (CBs) in order to isolate the defective part of the system. However, when relays or CBs do not work properly, larger parts of the system may be disconnected. After such events, in order to avoid damage to energy distribution utilities and consumers, it is essential to restore the system as soon as possible. Moreover, before starting the restoration, it is necessary to identify the event that caused the sequence of alarms, such as protection system failure, defects in communication channels, corrupted data acquisition, etc. The heuristic nature of the reasoning involved in the operator’s analysis and the absence of an analytical formulation lead to the use of artificial intelligence techniques. Model-based systems that include the temporal characteristics of protection schemes, based on a general regression neural network (GRNN) in feed-forward topology, can be successfully used for this purpose. ANNs are fault tolerant and are able to learn off-line from a set of historical or simulated data in order to make on-line inferences. They are able to produce a diagnosis even in difficult situations, such as the presence of noisy data or protection system failures.
3. Economic Dispatch
The main goal of economic dispatch (ED) is to minimize the operating costs for a given demand, subject to certain constraints, i.e. to decide how to allocate the required load demand among the available generation units. In practice, the whole of the unit operating range is not always available for load allocation due to physical operating limitations. Several methods, such as the Lagrangian relaxation method, linear programming techniques, Beale’s quadratic programming, Newton’s method and the augmented Lagrangian function, are in use. The economic dispatch problem is a nonconvex optimization problem. Neural networks, and especially the Hopfield model, have a well-demonstrated capability for solving combinatorial optimization problems.
4. Security Assessment
The principal task of an electric power system is to deliver the power requested by the customers without exceeding acceptable voltage and frequency limits. This task has to be carried out in real time and in a safe, reliable and economical manner. Fig. 4.10 shows a simplified diagram of the principal data flow in a power system, where real-time measurements are stored in a database. The state estimation then corrects bad data and fills in missing data. Based on the estimated values, the current mathematical model of the power system is established. Based on simulation of potential equipment outages, the security level of the system is determined. If the system is considered unsafe with respect to one or more potential outages, control actions have to be taken.
Fig. 4.10: Data flow in power system operation
Generally there are two types of security assessment: static and dynamic. In both types, the operational states are defined as follows:
• Normal or secure state: All customer demands are met and the operating variables are within the prescribed limits.
• Alert or critical state: The system variables are still within limits and the constraints are satisfied, but a small disturbance can drive the variables toward instability.
• Emergency or insecure state: The power system enters the emergency mode of operation upon violation of security-related inequality constraints.
An ANN with the backpropagation training algorithm can be successfully used to solve these problems.
5. Alarm Processing
In emergencies, engineers are expected to quickly evaluate various options and implement an optimal corrective action. However, the number of real-time messages (alarms) received is too large for the time available for their evaluation. Processing such alarms in real time and alerting the operator to the root cause, or to the most important of these alarms, has been identified as a valuable operational aid. ANNs have been implemented for such alarm processing. The fast response of a trained ANN and its generalization abilities are very useful in this application.
6. Eddy Current Analysis
Analysis of eddy current losses requires the numerical solution of integro-differential equations. Discretizing these equations and solving them using finite-element methods is computationally very expensive. Cellular neural networks have been developed as an alternative to finite-element methods, providing a faster, computationally less expensive and simpler way of solving these equations. These networks calculate eddy currents and eddy current losses in a source-current-carrying conductor in a time-varying magnetic field. This implementation opens up a wide range of applications in structural analysis, electromagnetic field computations, etc.
7. Harmonic Source Monitoring
It is often necessary to identify and monitor harmonic sources in systems containing non-linear loads. ANNs can be trained using simulation results for varying load conditions. They can be used in conjunction with a state estimator to pinpoint and monitor the source of the harmonics.
8. Applications in Nuclear Power Plants
ANNs have potential applications in enhancing the safety and efficiency of nuclear power plants, such as diagnosis of specific abnormal conditions, detection of a change in the mode of operation, signal validation, monitoring of check valves, modeling of the plant thermodynamics, monitoring of plant parameters, analysis of plant vibrations, etc.
ANNs can also be used in many other applications, some of which are listed below:
• Load frequency control (automatic generation control)
• Hydroelectric generation scheduling
• Power system stabilizer design
• Load flow studies
• Load modeling
• HVDC
• Non-conventional energy
• Maintenance scheduling
• Unit commitment
The main advantages of using ANNs in power system applications are:
• Their capability of dealing with stochastic variations of the scheduled operating point with increasing data
• Very fast, on-line processing and classification
• Implicit nonlinear modeling and filtering of system data
4.13 CONCLUDING REMARKS
Artificial neural networks have been dealt with in detail in this chapter. In this research work, ANN controllers based on optimal and suboptimal control strategies have been designed and developed for various types of interconnected power systems; the procedure is discussed in chapter 5. Since these ANN controllers have been trained with data obtained from optimal and suboptimal control strategies, they give results comparable to those of optimal and suboptimal control. The actual results are shown and discussed in chapter 6.