Download chaper 4_c b bangal

CHAPTER 4
ARTIFICIAL NEURAL NETWORKS
4.1 INTRODUCTION
Artificial Neural Networks (ANNs) are relatively crude electronic models
based on the neural structure of the brain. The brain learns from experience, and
artificial neural networks try to mimic the way it functions. Even simple animal
brains are capable of functions that are currently impossible for computers.
Computers do rote tasks well, but they have trouble recognizing even simple
patterns.
The brain stores information as patterns. Some of these patterns are very
complicated and give us the ability to recognize individual faces from many
different angles. This process of storing information as patterns, utilizing those
patterns, and then solving problems defines a new field in computing,
which does not rely on traditional programming but involves the creation of
massively parallel networks and the training of those networks to solve specific
problems.
The exact workings of the human brain are still a mystery, yet some aspects
are known. The most basic element of the human brain is a specific type of cell
called a ‘neuron’. Neurons provide the abilities to remember, think, and apply
previous experiences to our every action. There are about 100 billion of them, and
each can connect with up to 200,000 other neurons, although
1,000 to 10,000 connections is typical. The power of the human mind comes from the sheer
number of these basic components and the multiple connections between them. It
also comes from genetic programming and learning.
The individual neurons are complicated. They have a myriad of parts, sub-systems and control mechanisms. They convey information via a host of
electrochemical pathways. Together, these neurons and their connections form a
process, which is not binary, not stable, and not synchronous.
4.2 A BIOLOGICAL NEURON
Basically, a biological neuron receives inputs from other sources, combines
them in some way, performs a generally nonlinear operation on the result, and then
outputs the final result. Fig. 4.1 shows the relationship of these four parts.
Fig. 4.1: A biological neuron
Within humans there are many variations on the basic type of neuron, yet all
biological neurons have the same four basic components. They are known by their
biological names: cell body (soma), dendrites, axon, and synapses.
Cell body (Soma): The body of the neuron contains the nucleus and carries out
the biochemical transformations necessary to the life of the neuron.
Dendrite: Each neuron has fine, hair-like tubular extensions around it.
They branch out into a tree around the cell body and accept incoming signals.
Axon: It is a long, thin, tubular structure which works like a transmission line.
Synapse: Neurons are connected to one another in a complex spatial arrangement.
When an axon reaches its final destination it branches again, which is called terminal
arborization. At the ends of the axon are highly complex and specialized structures called
synapses. The connection between two neurons takes place at these synapses.
Dendrites receive the input through the synapses of other neurons. The soma
processes these incoming signals over time and converts that processed value into
an output, which is sent out to other neurons through the axon and the synapses.
4.3 AN ARTIFICIAL NEURON
The artificial neuron simulates the four basic functions of a biological neuron. Fig.
4.2 shows the basic representation of an artificial neuron.
Fig. 4.2: A basic artificial neuron.
In Fig. 4.2, the various inputs to the network are represented by the mathematical
symbol x(n). Each of these inputs is multiplied by a connection weight. The weights
are represented by w(n). In the simplest case, these products are summed, fed to a
transfer function (activation function) to generate a result, and this result is sent as
output. Other network structures follow the same scheme but use different
summing functions as well as different transfer functions.
Some applications, such as recognition of text, identification of speech, and
deciphering of scenes in images, require binary answers. These applications may utilize the
binary properties of ORing and ANDing of inputs along with summing operations.
Such functions can be built into the summation and transfer functions of a network.
Seven major components make up an artificial neuron. These components
are valid whether the neuron is used for input, output, or is in the hidden layers.
Component 1. Weighting Factors: A neuron usually receives many simultaneous
inputs. Each input has its own relative weight, which gives the input the impact that
it needs on the processing element's summation function. Some inputs are made
more important than others to have a greater effect on the processing element as
they combine to produce a neural response. Weights are adaptive coefficients that
determine the intensity of the input signal as registered by the artificial neuron.
They are a measure of an input's connection strength. These strengths can be
modified in response to various training sets and according to a network's specific
topology or its learning rules.
Component 2. Summation Function: The inputs and corresponding weights are
vectors which can be represented as (i1, i2, ..., in) and (w1, w2, ..., wn). The total input
signal is the dot product of these two vectors. The result, (i1 * w1) + (i2 * w2) + ...
+ (in * wn), is a single number.
The summation function can be more complex than just the weighted sum of
products. The input and weighting coefficients can be combined in many different
ways before passing on to the transfer function. In addition to summing, the
summation function can select the minimum, maximum, majority, product or
one of several normalizing algorithms. The specific algorithm for combining neural inputs
is determined by the chosen network architecture and paradigm. Some summation
functions have an additional ‘activation function’ applied to the result before it is
passed on to the transfer function for the purpose of allowing the summation output
to vary with respect to time.
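As a minimal sketch of the summation step, the following Python fragment combines weighted inputs in a few of the ways mentioned above. The function names, the mode labels, and the sample values are illustrative assumptions and do not come from any particular network package.

```python
from math import prod

def weighted_inputs(inputs, weights):
    """Multiply each input by its connection weight."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return [i * w for i, w in zip(inputs, weights)]

# A few alternative ways of reducing the weighted inputs to a single number.
SUMMATION_FUNCTIONS = {
    "sum": sum,        # the usual dot product of inputs and weights
    "min": min,
    "max": max,
    "product": prod,
}

def summation(inputs, weights, mode="sum"):
    """Combine the weighted inputs with the selected summation function."""
    return SUMMATION_FUNCTIONS[mode](weighted_inputs(inputs, weights))

inputs = [1.0, 0.5, -2.0]
weights = [0.5, 2.0, 0.25]
for mode in SUMMATION_FUNCTIONS:
    print(mode, summation(inputs, weights, mode))
```

With the default mode the result is the plain dot product; the other modes only change how the same weighted inputs are reduced to one value.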
Component 3. Transfer Function: The result of the summation function is
transformed to a working output through an algorithmic process known as the
transfer function. In the transfer function the summation can be compared with
some threshold to determine the neural output. If the sum is greater than the
threshold value, the processing element generates a signal and if it is less than the
threshold, no signal (or some inhibitory signal) is generated. Both types of response
are significant. The threshold, or transfer function, is generally non-linear. Linear
functions are limited because the output is simply proportional to the input.
The step type of transfer function outputs zero and one, one and minus
one, or other numeric combinations. Another type, the ‘threshold’ or ramping
function, mirrors the input within a given range and acts as a step function
outside that range. It is a linear function that is clipped to minimum and maximum
values, making it non-linear. Another option is an ‘S’ curve, which approaches a
minimum and maximum value at the asymptotes. It is called a sigmoid when it
ranges between 0 and 1, and a hyperbolic tangent when it ranges between -1 and 1.
Both the function and its derivatives are continuous.
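The transfer functions described above can be sketched in a few lines of Python. This is only an illustration: the function names, the default threshold of 0, and the default ramp range of 0 to 1 are assumptions chosen for the example.

```python
import math

def step(s, threshold=0.0):
    """Step function: signal (1) above the threshold, no signal (0) otherwise."""
    return 1.0 if s > threshold else 0.0

def ramp(s, low=0.0, high=1.0):
    """Ramping function: mirrors the input inside [low, high], clipped outside."""
    return max(low, min(high, s))

def sigmoid(s):
    """S-shaped curve ranging between 0 and 1 (written to avoid overflow)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def hyperbolic_tangent(s):
    """S-shaped curve ranging between -1 and 1."""
    return math.tanh(s)

for s in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(s, step(s), ramp(s), round(sigmoid(s), 3), round(hyperbolic_tangent(s), 3))
```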
Component 4. Scaling and Limiting: After the transfer function, the result can pass
through additional processes, which scale and limit. Scaling simply multiplies the
transfer value by a scale factor and then adds an offset. Limiting is the
mechanism which ensures that the scaled result does not exceed an upper or lower
bound. This limiting is in addition to the hard limits that the original transfer
function may have performed.
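A short sketch of scaling followed by limiting is given below. The parameter names and default bounds are hypothetical and are chosen only to make the two steps visible.

```python
def scale_and_limit(value, scale=1.0, offset=0.0, lower=-1.0, upper=1.0):
    """Scale the transfer value, add an offset, then clip to [lower, upper]."""
    scaled = scale * value + offset
    return max(lower, min(upper, scaled))

print(scale_and_limit(0.25, scale=2.0, offset=0.1))   # stays inside the bounds
print(scale_and_limit(0.80, scale=2.0, offset=0.1))   # clipped to the upper bound
```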
Component 5. Output Function (Competition): Each processing element is allowed
one output signal, which it may give to hundreds of other neurons. Normally, the
output is directly equivalent to the transfer function's result. Some network
topologies modify the transfer result to incorporate competition among neighboring
processing elements. Neurons are allowed to compete with each other, inhibiting
other processing elements unless they have great strength. Competition can occur at one
or both of two levels. First, competition determines which artificial neuron will be active or
provide an output. Second, competitive inputs help determine which processing
element will participate in the learning or adaptation process.
Component 6. Error Function and Back-Propagated Value: In most learning
networks the difference between the current output and the desired output is
calculated as an error which is then transformed by the error function to match a
particular network architecture. Most basic architectures use this error directly, but
some square the error while retaining its sign, some cube the error, and other paradigms
modify the error to fit their specific purposes. The error is propagated backwards to
a previous layer. This back-propagated value can be either the error, the error scaled
in some manner (often by the derivative of the transfer function) or some other
desired output depending on the network type. Normally, this back-propagated
value, after being scaled by the learning function, is multiplied against each of the
incoming connection weights to modify them before the next learning cycle.
Component 7. Learning Function: Its purpose is to modify the weights on the
inputs of each processing element according to some neural based algorithm.
4.4 AN ARTIFICIAL NEURAL NETWORK
Fig. 4.3: An artificial neural network
Fig. 4.3 shows an artificial neural network. Inputs enter the processing
element from the upper left. The first step is to multiply each of these inputs by its
respective weighting factor [w(n)]. These modified inputs are then fed into the
summing function, which usually sums these products; however, many different
types of operations can be selected. These operations can produce a number of
different values, which are then propagated forward: values such as the average, the
largest, the smallest, the ORed values, the ANDed values, etc. Other types of
summing functions can also be created, and sometimes they may be further
complicated by the addition of an activation function which enables the summing
function to operate in a time-sensitive way.
The output of the summing function is then sent into a transfer function,
which turns this number into a real output (a 0 or a 1, -1 or +1 or some other
number) via some algorithm. The transfer function can also scale the output or
control its value via thresholds. This output is then sent to other processing elements
or an outside connection, as dictated by the structure of the network.
4.5 TRANSFER (ACTIVATION) FUNCTIONS
The transfer function for neural networks must be differentiable, and therefore
continuous, to enable error correction. The derivative of the transfer function is required
for computation of the local gradient. One example of a suitable transfer function is
the sigmoid function, which has an S-shaped graph. It is one of the most
common forms of transfer function used in the construction of ANNs. It is defined as a
strictly increasing function, so its derivative is always positive. It
exhibits a graceful balance between linear and nonlinear behavior. One example of it
is the logistic function, represented by the equation
$$\varphi(v) = \frac{1}{1 + e^{-v}}$$
This function has a certain characteristic. At the extremes of φ(v), φ(v) is flat and φ′(v) is
very small. At the midrange of φ(v), φ′(v) is maximum, as seen in Fig. 4.4.
Fig. 4.4: Logistic transfer function
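The behaviour of the derivative can be checked with a short worked derivation. It assumes the logistic function in its plain form, φ(v) = 1/(1 + e^{-v}), without any additional slope constant.

```latex
\varphi(v) = \frac{1}{1 + e^{-v}}
\quad\Longrightarrow\quad
\varphi'(v) = \frac{e^{-v}}{(1 + e^{-v})^{2}}
            = \varphi(v)\,\bigl(1 - \varphi(v)\bigr)

% Midrange: the product \varphi (1 - \varphi) is largest when \varphi = 1/2, i.e. at v = 0
\varphi'(0) = \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4}

% Extremes: \varphi \to 0 or \varphi \to 1, so the product vanishes
\lim_{v \to \pm\infty} \varphi'(v) = 0
```

Because the derivative can be written in terms of the output φ(v) itself, it is available at almost no extra cost once the forward value has been computed.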
Several other transfer functions can also be employed as shown in Fig. 4.5.
Fig. 4.5: Transfer functions with different characteristic constant values
4.6 LAYER ARRANGEMENT IN A NEURAL NETWORK
The neurons can be clustered together in many ways. This clustering occurs
in the human mind in such a way that information can be processed in a dynamic,
interactive, and self-organizing way. Biologically, neural networks are constructed
in a three-dimensional world from microscopic components. These neurons seem
capable of nearly unrestricted interconnections. That is not true of any existing man-made network. Artificial neural networks are a simple clustering of artificial neurons,
formed by creating layers and interconnections as shown in Fig. 4.6.
Fig. 4.6: Layer arrangement in a neural network
Basically, a neural network is the grouping of neurons into layers, the
connections between these layers, and the summation and transfer functions that
together comprise a functioning neural network. Most applications require networks that
contain at least three layers: input, hidden, and output. The input layer receives
the data either from input files or directly from electronic sensors in real-time
applications. The output layer sends information directly to the outside world, to a
secondary computer process or to other devices. Between these two layers there can
be many hidden layers. These hidden layers contain many neurons in various
interconnected structures. The inputs and outputs of each of these hidden neurons
simply go to other neurons.
In most networks, each neuron in a hidden layer receives signals from all
the neurons in the layer before it, typically the input layer. After a neuron performs its function, it
passes its output to all of the neurons in the layer after it, typically the output layer, providing a
feed-forward path. The connections give a variable strength to an input. There are two types of
these connections. One causes the summing mechanism of the next neuron to add
(excite) while the other causes it to subtract (inhibit). Some networks want a neuron
to inhibit the other neurons in the same layer, which is called ‘lateral inhibition’. The most
common use of this is in the output layer. For example, in text recognition, if the probability
of a character being ‘P’ is 0.85 and that of it being ‘F’ is 0.65, the network wants to
choose the highest probability and inhibit all the others. It can do that with lateral
inhibition (competition).
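The character example can be sketched as a simple winner-take-all step. This is a simplified stand-in for lateral inhibition, assuming that the losing outputs are suppressed to exactly zero; the function name is hypothetical.

```python
def lateral_inhibition(outputs):
    """Keep the strongest output and suppress all others to zero."""
    winner = max(outputs, key=outputs.get)
    return {label: (value if label == winner else 0.0)
            for label, value in outputs.items()}

scores = {"P": 0.85, "F": 0.65}
print(lateral_inhibition(scores))   # {'P': 0.85, 'F': 0.0}
```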
4.7 TYPES OF ARTIFICIAL NEURAL NETWORKS
4.7.1 SINGLE LAYER FEED FORWARD NETWORK
A neural network in which the input layer of source nodes projects onto an output
layer of neurons but not vice versa is known as a single layer feed-forward or acyclic
network.
In a single layer network, ‘single layer’ refers to the output layer of
computation nodes as shown in Fig. 4.7.
Fig. 4.7: A single layer feedforward network (input layer, output layer)
4.7.2 MULTILAYER FEED FORWARD NETWORK
This type of network (Fig. 4.8) contains one or more hidden layers, whose
computation nodes are called hidden neurons or hidden units. The function of
hidden neurons is to intervene between the external input and the network output in
some useful manner and to extract higher order statistics. The source nodes in the input
layer of the network supply the input signal to neurons in the second layer (the 1st hidden
layer). The output signals of the 2nd layer are used as inputs to the third layer, and so on.
The set of output signals of the neurons in the output layer of the network constitutes
the overall response of the network to the activation pattern supplied by the source nodes in
the input (first) layer.
Fig. 4.8: A multilayer feed forward network (input layer, hidden layer, output layer)
Short characterization of feedforward networks:
1. Typically, activation is fed forward from input to output through ‘hidden
layers’, though many other architectures exist.
2. Mathematically, they implement static input-output mappings.
3. The most popular supervised training algorithm is the backpropagation algorithm.
4. They have proven useful in many practical applications as approximators of
nonlinear functions and as pattern classifiers.
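The feed-forward flow of activation can be illustrated with a small sketch. It assumes logistic units without bias terms and uses made-up weights; the function names are hypothetical.

```python
import math

def logistic(v):
    """Logistic transfer function."""
    return 1.0 / (1.0 + math.exp(-v))

def layer_forward(weights, inputs):
    """weights[i][j] is the weight from input j to neuron i of this layer."""
    return [logistic(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def feed_forward(layers, inputs):
    """Pass the activation through each layer in turn, from input to output."""
    activation = list(inputs)
    for weights in layers:
        activation = layer_forward(weights, activation)
    return activation

hidden = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]]   # 2 inputs -> 3 hidden neurons
output = [[0.7, -0.2, 0.4]]                        # 3 hidden -> 1 output neuron
print(feed_forward([hidden, output], [1.0, 0.0]))
```

Since nothing is stored between calls, the same input always produces the same output, which is what is meant by a static input-output mapping.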
4.7.3 RECURRENT NETWORK
A neural network having one or more hidden layers with at least
one feedback loop is known as a recurrent network, as shown in Fig. 4.9. The feedback
may be self-feedback, i.e., the output of a neuron is fed back to its own input.
Sometimes, feedback loops involve the use of unit delay elements, which results in
nonlinear dynamic behaviour, assuming that the neural network contains nonlinear
units.
Fig. 4.9: A recurrent network (input layer, output layer)
There are various other types of networks, such as delta-bar-delta, Hopfield,
vector quantization, counter propagation, probabilistic, Hamming, Boltzmann, bidirectional associative memory, spatio-temporal pattern, adaptive resonance, self-organizing map, recirculation etc.
A recurrent neural network has (at least one) cyclic path of synaptic
connections. Basic characteristics:
1. all biological neural networks are recurrent
2. mathematically, they implement dynamical systems
3. several types of training algorithms are known, no clear winner
4. theoretical and practical difficulties by and large have prevented practical
applications so far.
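To illustrate why a recurrent network behaves as a dynamical system, here is a minimal sketch of a single unit with self-feedback through a unit delay. The tanh transfer function and the weight values are assumptions made for the example.

```python
import math

def run_recurrent_unit(inputs, w_in=1.0, w_self=0.5):
    """Single nonlinear unit whose previous output is fed back to its input."""
    state = 0.0                 # output of the previous step (unit delay)
    outputs = []
    for u in inputs:
        state = math.tanh(w_in * u + w_self * state)
        outputs.append(state)
    return outputs

# One input pulse followed by silence: the output keeps responding.
print(run_recurrent_unit([1.0, 0.0, 0.0, 0.0]))
```

The output at each step depends on the history of inputs and not only on the current input, in contrast to the static mapping of a feedforward network.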
4.8 TRAINING OF ARTIFICIAL NEURAL NETWORKS
Once a network has been structured for a particular application, it is ready for
training. At the beginning, the initial weights are chosen randomly and then the
training or learning begins. There are two approaches to training: supervised and
unsupervised.
4.8.1 SUPERVISED TRAINING
In supervised training, both the inputs and the outputs are provided. The
network then processes the inputs and compares its resulting outputs against the
desired outputs. Errors are then propagated back through the system, causing the
system to adjust the weights, which control the network. This process occurs over
and over as the weights are continually tweaked. The set of data which enables the
training is called the "training set." During the training of a network, the same set of
data is processed many times as the connection weights are progressively refined.
Sometimes a network may never learn. This could be because the input data
does not contain the specific information from which the desired output is derived.
Networks also don't converge if there is not enough data to enable complete
learning. Ideally, there should be enough data so that part of the data can be held
back as a test. Multi-layered networks with multiple nodes are capable of
memorizing data. To determine whether the system is simply
memorizing its data in some non-significant way, supervised training needs to hold
back a set of data to be used to test the system after it has undergone its training.
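Holding back part of the data can be sketched as follows. The 20% test fraction, the fixed seed, and the function name are assumptions for illustration only.

```python
import random

def hold_out_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold back a fraction of them for testing."""
    shuffled = list(samples)
    if len(shuffled) < 2:
        raise ValueError("need at least two samples to split")
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

training_set, test_set = hold_out_split(range(10))
print(len(training_set), "training samples,", len(test_set), "held back for testing")
```

A network that scores well on the training set but poorly on the held-back set has probably memorized its data rather than learned the general trend.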
If a network simply can't solve the problem, the designer then has to review
the input and outputs, the number of layers, the number of elements per layer, the
connections between the layers, the summation, transfer, and training functions, and
even the initial weights themselves. Another part of the designer's creativity governs
the rules of training. There are many laws (algorithms) used to implement the
adaptive feedback required to adjust the weights during training. The most common
technique is known as back-propagation.
Training is not just a technique but a conscious analysis to ensure that the
network is not overtrained. Initially, an artificial neural network configures itself
with the general statistical trends of the data. Later, it continues to ‘learn’ about
other aspects of the data, which may be spurious from a general viewpoint.
When finally the system has been correctly trained and no further learning is
needed, the weights can, if desired, be ‘frozen’. In some systems, this finalized
network is then turned into hardware so that it can be fast. Other systems don't lock
themselves in but continue to learn while in production use.
4.8.2 UNSUPERVISED OR ADAPTIVE TRAINING
The other type is unsupervised training (learning). In this type, the
network is provided with inputs but not with desired outputs. The system itself
must then decide what features it will use to group the input data. This is often
referred to as self-organization or adaptation. These networks use no external
influences to adjust their weights. Instead, they internally monitor their
performance. They look for regularities or trends in the input signals, and
make adaptations according to the function of the network. Even without being
told whether it's right or wrong, the network still must have some information about
how to organize itself. This information is built into the network topology and
learning rules. An unsupervised learning algorithm might emphasize cooperation
among clusters of processing elements. In such a scheme, the clusters would work
together. If some external input activated any node in the cluster, the cluster's
activity as a whole could be increased. Likewise, if external input to nodes in the
cluster was decreased, that could have an inhibitory effect on the entire cluster.
Competition between processing elements could also form a basis for
learning. Training of competitive clusters could amplify the responses of specific
groups to specific stimuli. As such, it would associate those groups with each other
and with a specific appropriate response. Normally, when competition for learning
is in effect, only the weights belonging to the winning processing element will be
updated. At present, unsupervised learning is not well understood and
continues to be the subject of much research.
4.9 LEARNING RATES
The rate at which ANNs learn depends upon several controllable factors. A
slower rate means more time to spend in producing an adequately trained system.
With faster learning rates, however, the network may not be able to make the fine
discriminations that are possible with a system learning slowly.
Most learning functions have some provision for a learning rate (learning
constant). Usually this term is positive and between 0 and 1. If the learning rate is
greater than 1, it is easy for the learning algorithm to overshoot in correcting the
weights, and the network will oscillate. Small values of the learning rate will not
correct the current error as quickly, but if small steps are taken in correcting errors,
there is a good chance of converging to the best minimum.
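The effect of the learning rate can be seen on a toy problem. The sketch below assumes a single weight and the error function E(w) = 0.5 w², chosen only because its gradient is simply w; real error surfaces behave less neatly, and the exact thresholds differ from problem to problem.

```python
def descend(learning_rate, w=1.0, steps=5):
    """Gradient descent on E(w) = 0.5 * w**2, whose gradient is w."""
    path = [w]
    for _ in range(steps):
        w = w - learning_rate * w      # step against the gradient
        path.append(w)
    return path

print("rate 0.1:", descend(0.1))   # slow, steady approach to the minimum at 0
print("rate 1.5:", descend(1.5))   # overshoots: the sign flips on every step
print("rate 2.5:", descend(2.5))   # overshoots so far that the error grows
```

On this particular surface, any rate above 1 jumps past the minimum on every step, which matches the oscillation described above.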
4.10 LEARNING LAWS (ALGORITHMS)
Many learning laws are in common use. Most of them are some sort of
variation of the best known and oldest learning law, Hebb's Rule.
Hebb's Rule: This was introduced by Donald Hebb in ‘The Organization of Behavior’ (1949).
The basic rule is: if a neuron receives an input from another neuron and if both are
highly active (same sign), the weight between the two neurons should be
strengthened.
Hopfield Law: If the desired output and the input are both active or both inactive,
increment the connection weight by the learning rate, otherwise decrement the
weight by the learning rate.
The Delta Rule: This rule is based on the simple idea of continuously modifying the
strengths of the input connections to reduce the difference (the delta) between the
desired output value and the actual output of a processing element.
The Gradient Descent Rule: This is similar to the Delta Rule in that the derivative of
the transfer function is still used to modify the delta error before it is applied to the
connection weights. However, an additional proportional constant tied to the
learning rate is appended to the final modifying factor acting upon the weight.
Kohonen's Law: In this, the processing elements compete for the opportunity to
learn or update their weights. The element with largest output is declared the
winner and has the capability of inhibiting its competitors as well as exciting its
neighbors. Only the winner is permitted an output and only the winner plus its
neighbors are allowed to adjust their connection weights.
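Three of the learning laws above can be sketched as small update functions. These are simplified textbook-style forms written for illustration: the Hebbian update uses the product of the two activities, the delta rule assumes a linear processing element, and the Kohonen update assumes the elements are arranged in a line so that neighbors are the elements within `radius` positions of the winner. All names and default values are assumptions.

```python
def hebb_update(weight, pre, post, rate=0.1):
    """Hebb's rule: strengthen the weight when both activities have the same sign."""
    return weight + rate * pre * post

def delta_update(weights, inputs, desired, rate=0.1):
    """Delta rule for a linear element: reduce (desired - actual)."""
    actual = sum(w * x for w, x in zip(weights, inputs))
    delta = desired - actual
    return [w + rate * delta * x for w, x in zip(weights, inputs)]

def kohonen_update(units, inputs, rate=0.5, radius=0):
    """Kohonen's law: the element with the largest output wins; only the winner
    and its neighbors move their weights toward the input."""
    outputs = [sum(w * x for w, x in zip(unit, inputs)) for unit in units]
    winner = outputs.index(max(outputs))
    updated = []
    for k, unit in enumerate(units):
        if abs(k - winner) <= radius:
            unit = [w + rate * (x - w) for w, x in zip(unit, inputs)]
        updated.append(list(unit))
    return winner, updated

print(hebb_update(0.2, pre=1.0, post=1.0))
print(delta_update([0.0, 0.0], [1.0, 2.0], desired=1.0))
print(kohonen_update([[1.0, 0.0], [0.0, 1.0]], [0.9, 0.1]))
```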
4.11 BACKPROPAGATION FOR FEED FORWARD NETWORKS
The backpropagation (BP) algorithm is the most commonly used training
method for feed forward networks. Consider a multi-layer perceptron with ‘k’
hidden layers. Together with the layer of input units and the layer of output units
this gives k+2 layers of units altogether, which are numbered 0, ..., k+1. Let the
number of input units be K, of output units be L, and of units in hidden layer m be $N^m$.
The weight between the jth unit in layer m and the ith unit in layer m+1 is denoted by $w_{ij}^m$. The
activation of the ith unit in layer m is $x_i^m$ (for m = 0 this is an input value, for m = k+1
an output value). The training data for a feedforward network training task consist
of T input-output (vector-valued) data pairs
$$u(n) = (x_1^0(n), \ldots, x_K^0(n))^t, \quad d(n) = (d_1^{k+1}(n), \ldots, d_L^{k+1}(n))^t,$$
where ‘n’ denotes the training instance. The activation of non-input units is computed
according to
$$x_i^{m+1}(n) = f\Big(\sum_{j=1,\ldots,N^m} w_{ij}^m \, x_j^m(n)\Big).$$
Presented with training input u(n), the previous update equation is used to
compute activations of units in subsequent hidden layers, until a network
response $y(n) = (x_1^{k+1}(n), \ldots, x_L^{k+1}(n))^t$ is obtained in the output layer. The objective of
training is to find a set of network weights such that the summed squared error
$$E = \sum_{n=1,\ldots,T} \| d(n) - y(n) \|^2 = \sum_{n=1,\ldots,T} E(n)$$
is minimized. This is done by incrementally
changing the weights in the direction opposite to the error gradient with respect to
weights
$$\frac{\partial E}{\partial w_{ij}^m} = \sum_{n=1,\ldots,T} \frac{\partial E(n)}{\partial w_{ij}^m}$$
using a (small) learning rate γ:
$$\text{new } w_{ij}^m = w_{ij}^m - \gamma \frac{\partial E}{\partial w_{ij}^m}.$$
This is the formula used in batch learning mode, where new weights are
computed after presenting all training samples. One such pass through all samples
is called an epoch. Before the first epoch, weights are initialized, typically to small
random numbers. A variant is incremental learning, where weights are changed
after presentation of individual training samples:
$$\text{new } w_{ij}^m = w_{ij}^m - \gamma \frac{\partial E(n)}{\partial w_{ij}^m}.$$
The subtask in this method is the computation of the error gradients $\partial E(n) / \partial w_{ij}^m$.
The backpropagation algorithm is a scheme to perform these computations.
The procedure for one epoch of batch processing is given below.
Input: current weights wijm , training samples.
Output: new weights.
Computation steps:
1. For each sample n, compute activations of internal and output units (forward
pass).
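2. Compute, by proceeding backward through m = k+1, k, ..., 1, for each unit $x_i^m$ the
error propagation term $\delta_i^m(n)$:
$$\delta_i^{k+1}(n) = (d_i(n) - y_i(n)) \left. \frac{\partial f(u)}{\partial u} \right|_{u = z_i^{k+1}}$$
for the output layer and
$$\delta_j^m(n) = \sum_{i=1}^{N^{m+1}} \delta_i^{m+1}(n) \, w_{ij}^m \left. \frac{\partial f(u)}{\partial u} \right|_{u = z_j^m}$$
for the hidden layers, where
$$z_i^m(n) = \sum_{j=1}^{N^{m-1}} x_j^{m-1}(n) \, w_{ij}^{m-1}$$
is the internal state (or potential) of unit $x_i^m$. This is the error backpropagation pass.
Mathematically, the error propagation term $\delta_j^m(n)$ represents, up to a constant factor, the negative error gradient
w.r.t. the potential of the unit $x_j^m$:
$$\delta_j^m(n) = -\frac{1}{2} \left. \frac{\partial E(n)}{\partial u} \right|_{u = z_j^m}$$
3. Adjust the connection weights according to
$$\text{new } w_{ij}^{m-1} = w_{ij}^{m-1} + \gamma \sum_{n=1}^{T} \delta_i^m(n) \, x_j^{m-1}(n)$$
Here the constant factor 2 that arises from differentiating the squared error is absorbed into γ.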
After every such epoch, compute the error. Stop when the error falls below a
predetermined threshold or when the change in error falls below another
predetermined threshold or when the number of epochs exceeds a predetermined
maximal number of epochs. Many (order of thousands in nontrivial tasks) such
epochs may be required until a sufficiently small error is achieved. One epoch
requires O(T M) multiplications and additions, where M is the total number of
network connections.
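The three computation steps of one batch epoch can be sketched in plain Python. This is a minimal illustration under stated assumptions and not a reference implementation: f is taken to be the logistic function, there are no bias terms (the toy data instead carry a constant third input of 1.0), and the names `forward`, `total_error` and `backprop_epoch`, the OR task, the learning rate and the epoch count are all invented for the example.

```python
import math
import random

def f(u):
    """Logistic transfer function, written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def f_prime(u):
    """Derivative of the logistic function."""
    y = f(u)
    return y * (1.0 - y)

def forward(weights, u):
    """Forward pass; weights[m][i][j] links unit j of layer m to unit i of layer m+1.
    Returns the activations x[m] and potentials z[m] of every layer."""
    x, z = [list(u)], [None]          # layer 0 holds the inputs and has no potential
    for layer in weights:
        z.append([sum(w * xj for w, xj in zip(row, x[-1])) for row in layer])
        x.append([f(zi) for zi in z[-1]])
    return x, z

def total_error(weights, samples):
    """Summed squared error E over all (input, desired output) pairs."""
    error = 0.0
    for u, d in samples:
        y = forward(weights, u)[0][-1]
        error += sum((di - yi) ** 2 for di, yi in zip(d, y))
    return error

def backprop_epoch(weights, samples, gamma):
    """One epoch of batch learning; returns the new weights."""
    steps = [[[0.0] * len(row) for row in layer] for layer in weights]
    for u, d in samples:
        x, z = forward(weights, u)                                   # step 1
        delta = [(di - yi) * f_prime(zi)                             # step 2, output layer
                 for di, yi, zi in zip(d, x[-1], z[-1])]
        for m in range(len(weights) - 1, -1, -1):
            # delta currently belongs to layer m+1; accumulate delta_i * x_j
            for i, row in enumerate(steps[m]):
                for j in range(len(row)):
                    row[j] += delta[i] * x[m][j]
            if m > 0:                                                # step 2, hidden layer m
                delta = [sum(delta[i] * weights[m][i][j] for i in range(len(delta)))
                         * f_prime(z[m][j]) for j in range(len(x[m]))]
    return [[[w + gamma * s for w, s in zip(w_row, s_row)]           # step 3
             for w_row, s_row in zip(layer, s_layer)]
            for layer, s_layer in zip(weights, steps)]

# Toy task: logical OR; the third input is fixed at 1.0 to stand in for a bias.
samples = [([0.0, 0.0, 1.0], [0.1]), ([0.0, 1.0, 1.0], [0.9]),
           ([1.0, 0.0, 1.0], [0.9]), ([1.0, 1.0, 1.0], [0.9])]
rng = random.Random(0)
weights = [[[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)],  # 3 inputs -> 2 hidden
           [[rng.uniform(-0.5, 0.5) for _ in range(2)]]]                     # 2 hidden -> 1 output
print("error before:", total_error(weights, samples))
for epoch in range(3000):
    weights = backprop_epoch(weights, samples, gamma=0.2)
print("error after: ", total_error(weights, samples))
```

The accumulated term for each weight is the sum over samples of delta times the activation of the sending unit, so each epoch costs a number of multiplications proportional to the number of samples times the number of connections.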
The basic gradient descent approach (and its backpropagation algorithm
implementation) is notorious for slow convergence, because the learning rate γ must
typically be chosen small to avoid instability. One approach to achieve faster
convergence is to use second-order gradient descent techniques, which exploit the
curvature of the error surface but have epoch complexity $O(TM^2)$.
4.12 APPLICATIONS OF NEURAL NETWORKS
4.12.1 GENERAL APPLICATIONS
Many of the networks being designed presently are statistically quite accurate
(up to 85% to 90% accuracy). Currently, neural networks do not yet serve as the user interface
that translates spoken words into instructions for a machine, but some day this will
be achieved. VCRs, home security systems, CD players, and word processors will
simply be activated by voice. Touch screen and voice editing will replace the word
processors of today while bringing spreadsheets and databases to a new level of
usability. Neural network design is progressing in other, more promising application
areas.
(i) Language Processing: These applications include text-to-speech conversion,
auditory input for machines, automatic language translation, secure voice keyed
locks, automatic transcription, aids for the deaf, aids for the physically disabled
which respond to voice commands and natural language processing.
(ii) Character Recognition: Neural network based products are available which can
recognize hand printed characters through a scanner. It is 98% accurate for numbers,
a little less for alphabetical characters. Quantum Neural Network software package
(Qnspec) is available for recognizing characters, including cursive.
(iii) Image (data) Compression: Neural networks can do real-time compression and
decompression of data. These networks can reduce eight bits of data to three and
then reverse that process upon restructuring to eight bits again.
(iv) Pattern Recognition: Many pattern recognition applications are in use, such as a
system that can detect bombs in luggage at airports by identifying small
variances and patterns in the outputs of specialized sensors, a back-propagation
neural network which can discriminate between a true and a false heart attack, and a
network which can scan and read Pap smears. Many automated quality
control applications based on pattern recognition are now in use.
(v) Signal Processing: Neural networks have proven capable of filtering out
electronic noise. Another application is a system that can detect engine misfire
simply from the engine sound.
(vi) Financial: Banks, credit card companies and lending institutions deal with many
decisions that are not clear-cut. They involve learning and statistical trends. Neural
networks are now trained on the data from past decisions and being used in
decision-making.
(vii) Servo Control: A neural system known as Martingale's Parametric Avalanche, a spatio-temporal pattern recognition network, is being designed to control the
shuttle during in-flight maneuvers. Another application is ALVINN, for
Autonomous Land Vehicle.
4.12.2 APPLICATIONS IN POWER SYSTEMS
The electric power industry is currently undergoing an unprecedented
reform. One of the most exciting and potentially profitable recent developments is
the increasing use of ANN techniques. A major advantage of the ANN approach is that
the domain knowledge is distributed among the neurons and information processing is
carried out in a parallel, distributed manner. Being adaptive units, ANNs are able to
learn complex relationships even when no functional model exists. This
provides the capability to do ‘Black Box Modeling’ with little or no prior knowledge
of the function itself. ANNs can properly classify highly non-linear relationships
and, once trained, can classify new data much faster than would
be possible by solving the model analytically. The rising interest in ANNs is largely
due to the emergence of powerful new methods as well as to the availability of
computational power suitable for simulation. The field is particularly exciting today
because ANN algorithms and architectures can be implemented in VLSI technology
for real-time applications.
The application of ANNs in many areas of electrical power systems has
led to acceptable results.
1. Load Forecasting
Load forecasting is a common and widely studied problem that plays an
important role in the economics, financing, development, expansion and planning of
power systems.
• Short-term load forecasting over an interval ranging from an hour to a week is
important for various applications such as unit commitment, economic dispatch,
energy transfer scheduling and real time control.
• Mid-term load forecasting, which ranges from one month to five years, is used to
purchase enough fuel for power plants after electricity tariffs are calculated.
• Long-term load forecasting (LTLF), covering from 5 to 20 years or more, is used by
planning engineers and economists to determine the type and the size of generating
plants that minimize both fixed and variable costs.
ANNs can be used to solve these problems. Most projects using ANNs have
successfully included many factors in the forecasting model, such as weather
conditions, holidays, weekends and days of sports matches. This is possible because
ANNs can learn from many input factors. The main advantages of ANNs that
have increased their use in forecasting are:
1. The ability to work off-line, without time constraints or direct coupling to the
power system for data acquisition.
2. The ability to adjust the parameters for ANN inputs that have no functional
relationship between them, such as weather conditions and load profile.
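The way an ANN combines unrelated inputs such as time of day, temperature and holidays can be sketched with a very small feed-forward network trained by back-propagation. The Python sketch below is illustrative only. The load formula in make_sample is invented to generate synthetic data. The class name TinyMLP, the choice of five hidden units, the learning rate and the number of epochs are all assumptions, not values used in this work.

```python
import math
import random

def make_sample(hour, temp_c, is_holiday):
    """Return (inputs, target) from a made-up load model (synthetic data only)."""
    x = [math.sin(2 * math.pi * hour / 24), math.cos(2 * math.pi * hour / 24),
         temp_c / 40.0, float(is_holiday)]
    load = 0.5 - 0.25 * x[1] + 0.3 * x[2] ** 2 - 0.1 * x[3]  # normalised load
    return x, load

class TinyMLP:
    """One tanh hidden layer and a linear output, trained sample by sample."""

    def __init__(self, n_in, n_hid, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
        self.b1 = [0.0] * n_hid
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hid)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def train_step(self, x, target, lr):
        """One back-propagation update for the half squared error of one sample."""
        h, y = self.forward(x)
        err = y - target
        for j, hj in enumerate(h):
            delta = err * self.w2[j] * (1.0 - hj * hj)  # error at hidden unit j
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta * xi
            self.b1[j] -= lr * delta
        self.b2 -= lr * err

    def mse(self, data):
        return sum((self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [make_sample(hour, temp, hol)
        for hour in range(24) for temp in (10, 20, 30) for hol in (0, 1)]
net = TinyMLP(n_in=4, n_hid=5, rng=rng)
mse_before = net.mse(data)
for _ in range(200):
    for x, t in data:
        net.train_step(x, t, lr=0.05)
mse_after = net.mse(data)
print(f"MSE before training: {mse_before:.4f}, after: {mse_after:.4f}")
```

In a real study the synthetic samples would be replaced by recorded load, weather and calendar data, and the error would be checked on data not used for training.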
2. Fault Diagnosis/Fault Location
Progress in the areas of communication and digital technology has increased
the amount of information available at supervisory control and data acquisition
(SCADA) systems. Although this information is very useful, during events that cause
outages the operator may be overwhelmed by the excessive number of
simultaneously operating alarms, which increases the time required to identify
the main cause of the outage and to start the restoration process. Moreover, factors
such as stress and inexperience can affect the operator’s performance; thus, a tool
to support the real-time decision-making process is desirable. The protection
devices are responsible for detecting the occurrence of a fault, and when necessary,
they send trip signals to circuit breakers (CBs) in order to isolate the defective part of
the system. However, when relays or CBs do not work properly, larger parts of the
system may be disconnected. After such events, in order to avoid damage to energy
distribution utilities and consumers, it is essential to restore the system as soon as
possible. In addition, before starting the restoration, it is necessary to identify the
event that caused the sequence of alarms, such as a protection system failure, defects
in communication channels or corrupted data acquisition. The heuristic nature of the
reasoning involved in the operator’s analysis and the absence of an analytical
formulation lead to the use of artificial intelligence techniques. Model-based
systems that include the temporal characteristics of protection schemes, based on a
general regression neural network (GRNN) in a feed-forward topology, can be used
successfully for this purpose. ANNs are fault tolerant and are able to learn off-line from a
set of historical or simulated data in order to make on-line inferences. They are able
to produce a diagnosis even in difficult situations, such as the presence of noisy data
or protection system failures.
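The core computation of a GRNN can be sketched in a few lines: the output is a kernel-weighted average of the stored training targets, so a noisy or incomplete alarm pattern is pulled toward the nearest stored case. The Python sketch below is a simplified illustration. The alarm patterns, the fault names, the smoothing parameter sigma and the function names are hypothetical, and the temporal characteristics of the protection schemes are not modelled.

```python
import math

# Hypothetical training cases: each position is one alarm (1 = raised).
FAULTS = ["line A", "line B", "busbar"]
ALARM_PATTERNS = [
    [1, 1, 0, 0, 0, 0],   # alarms expected for a fault on line A
    [0, 0, 1, 1, 0, 0],   # alarms expected for a fault on line B
    [0, 0, 0, 0, 1, 1],   # alarms expected for a busbar fault
]
TARGETS = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # one-hot fault labels

def grnn_predict(x, train_x, train_y, sigma=0.5):
    """GRNN output: Gaussian-kernel-weighted average of the training targets."""
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi))
                        / (2.0 * sigma ** 2))
               for xi in train_x]
    total = sum(weights)  # assumed non-zero; may underflow for a very small sigma
    return [sum(w * y[k] for w, y in zip(weights, train_y)) / total
            for k in range(len(train_y[0]))]

def diagnose(alarms):
    """Return the fault whose score is highest for the observed alarms."""
    scores = grnn_predict(alarms, ALARM_PATTERNS, TARGETS)
    return FAULTS[scores.index(max(scores))]

# One spurious alarm (last position) does not change the diagnosis.
print(diagnose([1, 1, 0, 0, 0, 1]))
# One missing alarm still points to the same line.
print(diagnose([0, 0, 1, 0, 0, 0]))
```

Because a GRNN stores its training cases directly, "learning" here amounts to recording historical or simulated alarm patterns; only sigma has to be tuned.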
3. Economic Dispatch
The main goal of economic dispatch (ED) is to minimize the operating
costs for a given demand, subject to certain constraints, i.e. to decide how to allocate
the required load demand among the available generation units. In practice, the
whole of a unit’s operating range is not always available for load allocation because of
physical operating limitations. Several methods, such as Lagrangian relaxation,
linear programming, Beale’s quadratic programming, the Newton method and the
augmented Lagrangian function, are in use. Because of such restrictions, the economic
dispatch problem is in general a nonconvex optimization problem. Neural networks,
and especially the Hopfield model, have a well-demonstrated capability for solving
combinatorial optimization problems.
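For reference, the basic economic dispatch problem described above can be written compactly as follows. The notation is an assumption introduced here for illustration: N generating units, P_i the output of unit i, P_D the total demand, and a quadratic fuel-cost curve with coefficients a_i, b_i, c_i, which is a common but not universal choice. Transmission losses are neglected in this sketch.

```latex
\min_{P_1,\dots,P_N} \; \sum_{i=1}^{N} F_i(P_i),
\qquad F_i(P_i) = a_i + b_i P_i + c_i P_i^{2}

\text{subject to} \quad \sum_{i=1}^{N} P_i = P_D,
\qquad P_i^{\min} \le P_i \le P_i^{\max}, \quad i = 1,\dots,N
```

When part of a unit's range between its minimum and maximum output is unavailable, the feasible region is no longer a single interval per unit, which is what makes the problem hard for classical methods.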
4. Security Assessment
The principal task of an electric power system is to deliver the power
requested by the customers without exceeding acceptable voltage and frequency
limits. This task has to be carried out in real time and in a safe, reliable and
economical manner. Fig. 4.10 shows a simplified diagram of the principal data flow
in a power system, where real-time measurements are stored in a database. The state estimation
then adjusts bad and missing data. Based on the estimated values the current
mathematical model of the power system is established. Based on simulation of
potential equipment outage, the security level of the system is determined. If the
system is considered unsafe with respect to one or more potential outages, control
actions have to be taken.
Fig. 4.10: Data flow in power system operation
Generally there are two types of security assessment: static and dynamic. In
both types, the operational states are defined as follows:
• Normal or secure state: All customer demands are met and the operating variables
are within prescribed limits.
• Alert or critical state: The system variables are still within limits and the
constraints are satisfied, but a small disturbance can drive the variables toward
instability.
• Emergency or insecure state: The power system enters the emergency mode of
operation upon violation of security-related inequality constraints.
ANNs trained with the backpropagation algorithm can be used successfully to
solve these problems.
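The three operating states listed above can be expressed as a simple labelling rule, which is one way of producing target labels for training a classifier. The Python sketch below is an assumption-laden illustration: the function name classify_state, the use of bus voltages in per unit, and the interpretation of the alert state as "some simulated outage would violate a limit" are choices made here, not definitions taken from a particular security assessment tool.

```python
def classify_state(base_case, contingency_cases, limits):
    """Label an operating point as 'normal', 'alert' or 'emergency'.

    base_case:          dict of monitored variable -> value in the current state
    contingency_cases:  list of dicts, one per simulated equipment outage
    limits:             dict of monitored variable -> (low, high)
    """
    def within_limits(values):
        return all(limits[name][0] <= value <= limits[name][1]
                   for name, value in values.items())

    if not within_limits(base_case):
        return "emergency"   # an inequality constraint is already violated
    if not all(within_limits(case) for case in contingency_cases):
        return "alert"       # secure now, but a potential outage violates a limit
    return "normal"

# Hypothetical per-unit voltage limits for two monitored buses.
limits = {"V1": (0.95, 1.05), "V2": (0.95, 1.05)}
print(classify_state({"V1": 1.00, "V2": 0.99},
                     [{"V1": 0.97, "V2": 0.93}], limits))
```

A trained ANN would be expected to reproduce such labels directly from the measured quantities, without repeating the outage simulations on-line.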
5. Alarm Processing
In emergencies, engineers are expected to quickly evaluate various options
and implement an optimal corrective action. However, the number of real-time
messages (alarms) received is too large for the time available for their evaluation.
Processing such alarms in real-time and alerting the operator to the root cause or the
most important of these alarms has been identified as a valuable operational aid.
ANNs have been implemented for such alarm processing. The fast response of a
trained ANN and its generalization abilities are very useful in this application.
6. Eddy current analysis
Analysis of eddy current losses requires the numerical solution of
integro-differential equations. Discretizing these equations and solving them using
finite-element methods is computationally very expensive. Cellular neural networks
have been developed as an alternative to finite-element methods; they provide a faster,
computationally less expensive and simpler method of solving these equations.
These networks calculate eddy currents and eddy current losses in a source current
carrying conductor in a time-varying magnetic field. This implementation opens up
a wide range of applications in structural analysis, electromagnetic field
computations, etc.
7. Harmonic source monitoring
It is often required to identify and monitor harmonic sources in the systems
containing non-linear loads. The ANNs can be trained using simulation results for
varying load conditions. They can be used in conjunction with a state estimator to
pinpoint and monitor the source of the harmonics.
8. Applications in nuclear power plants
ANNs have potential applications in enhancing the safety and efficiency of
nuclear power plants, such as diagnosis of specific abnormal conditions, detection of
the change of mode of operation, signal validation, monitoring of check valves,
modeling of the plant thermodynamics, monitoring of plant parameters, analysis of
plant vibrations, etc.
ANNs can also be used in many other applications, some of which are
listed below:
• Load frequency control (Automatic Generation Control)
• Hydroelectric generation scheduling
• Power system stabilizer design
• Load flow studies
• Load modeling
• HVDC
• Non-conventional energy field
• Maintenance scheduling
• Unit commitment
The main advantages of using ANNs in power system applications are:
• Their capability of dealing with stochastic variations of the scheduled operating
point with increasing data
• Very fast and on-line processing and classification
• Implicit nonlinear modeling and filtering of system data
4.13 CONCLUDING REMARKS
Artificial neural networks have been dealt with in detail in this chapter.
In this research work, ANN controllers based on optimal and suboptimal control
strategies have been designed and developed for various types of interconnected
power systems; the procedure is discussed in Chapter 5.
Since these ANN controllers have been trained with data obtained from
optimal and suboptimal control strategies, they give results comparable to
those of optimal and suboptimal control. The actual results are shown and
discussed in Chapter 6.