IMAGE COMPRESSION AND DECOMPRESSION USING
NEURAL NETWORKS
A PROJECT REPORT
Submitted in partial fulfillment of the requirements
for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
BY
This is to certify that this project work entitled "IMAGE COMPRESSION AND DECOMPRESSION USING NEURAL NETWORKS" is being submitted in partial fulfillment of the requirements for the award of the degree of BACHELOR OF TECHNOLOGY in ELECTRONICS AND COMMUNICATION ENGINEERING, Jawaharlal Nehru Technological University, and is a record of bona fide work carried out by
INDEX
1. ABSTRACT
2. INTRODUCTION
3. THEORY
3.1 NEURAL NETWORKS
- Artificial Neural Networks
- The Analogy to the Brain
- The Biological Neuron
- The Artificial Neuron
- Design
- Layers
- Communication and Types of Connections
- Learning Laws
- Applications of Neural Networks
3.2 IMAGE PROCESSING
- Image Compression
- Principles of Image Compression
- Performance Measurement of Image Compression
- Compression Standards
4. IMAGE COMPRESSION WITH NEURAL NETWORKS
- Back-Propagation Image Compression
- Hierarchical Back-Propagation Neural Network
- Adaptive Back-Propagation Neural Network
- Hebbian Learning Based Image Compression
- Vector Quantization Neural Networks
- Predictive Coding Neural Networks
5. PROPOSED IMAGE COMPRESSION USING NEURAL NETWORK
- Levenberg-Marquardt Algorithm
- Training Procedure
6. IMPLEMENTATION OF IMAGE COMPRESSION AND DECOMPRESSION USING MATLAB
- MATLAB Code
- Functions Used in the MATLAB Program
- Results
7. CONCLUSION
8. FUTURE SCOPE
9. BIBLIOGRAPHY
ABSTRACT
Uncompressed multimedia (graphics, audio and video) data requires considerable
storage capacity and transmission bandwidth. Despite rapid progress in mass-storage
density, processor speeds, and digital communication system performance, demand for data
storage capacity and data-transmission bandwidth continues to outstrip the capabilities of
available technologies. The recent growth of data intensive multimedia-based web
applications has not only sustained the need for more efficient ways to storage and
communication technology.
Apart from the existing technology on image compression represented by series of
JPEG, MPEG and H.26x standards, new technology such as neural networks and genetic
algorithms are being developed to explore the future of image coding. Successful
applications of neural networks to vector quantization have now become well established, and other aspects of neural network involvement in this technology are stepping up to play significant roles in assisting with those traditional techniques. Here we
present an extensive survey on the development of neural network for image compression.
One of the most promising ways to utilize the power of neural networks for image compression lies in
(a) the selection of an efficient multi-layered network,
(b) the selection of training methods, and
(c) the test vectors.
Based on these criteria, networks are trained and implemented.
In this project, a literature survey has been carried out to find an efficient multi-layered neural network; a suitable network has been trained and tested using MATLAB for a test case of an image of size 64 x 64, and the trained weights and biases have been recorded.
INTRODUCTION
Neural networks are inherently adaptive systems; they are suitable for handling nonstationarities in image data. Artificial neural networks can be employed with success for image compression. The advantages of realizing a neural network in digital hardware are:
- Fast multiplication, leading to fast updates of the neural network.
- Flexibility, because different network architectures are possible.
- Scalability, as the proposed hardware architecture can be used for arbitrarily large networks, constrained only by the number of neurons in one layer.
The greatest potential of neural networks is the high speed processing that is
provided through massively parallel VLSI implementations. The choice to build a neural
network in digital hardware comes from several advantages that are typical for digital
systems:
1. Low sensitivity to electric noise and temperature.
2. Weight storage is no problem.
3. The availability of user-configurable, digital field programmable gate arrays, which
can be used for experiments.
4. Well-understood design principles that have led to new, powerful tools for digital design.
The crucial problems of neural network hardware are fast multiplication, building a large number of connections between neurons, and fast memory access for weight storage and nonlinear function look-up tables.
The most important part of a neuron is the multiplier, which performs high-speed pipelined multiplication of synaptic signals with weights. As the neuron has only one multiplier, the degree of parallelism is node parallelism. Each neuron has a local weight ROM (as it performs the feed-forward phase of the back-propagation algorithm) that stores as many values as there are connections to the previous layer. An accumulator is used to add signals from the pipeline to the neuron's bias value, which is stored in its own register.
The aim is to design and implement image compression using a neural network to achieve better SNR and compression levels. The compression is first obtained by modeling the neural network in MATLAB; this provides offline training.
3. THEORY
3.1 NEURAL NETWORKS
3.1.1 Artificial Neural Networks
An Artificial Neural Network (ANN) is an information-processing paradigm that is
inspired by the way biological nervous systems, such as the brain, process information. The
key element of this paradigm is the novel structure of the information processing system. It
is composed of a large number of highly interconnected processing elements (neurons)
working in unison to solve specific problems. ANNs, like people, learn by example. An
ANN is configured for a specific application, such as pattern recognition or data
classification, through a learning process. Learning in biological systems involves
adjustments to the synaptic connections that exist between the neurons. This is true of ANNs as well. An Artificial Neural Network is a system loosely modeled on the human brain. The field goes by many names, such as connectionism, parallel distributed processing, neurocomputing, natural intelligent systems, machine learning algorithms, and artificial neural networks. It is an attempt to simulate, within specialized hardware or sophisticated software, multiple layers of simple processing elements called neurons. Each neuron is linked to some of its neighbors with varying coefficients of connectivity that represent the strengths of these connections.
Neural networks, with their remarkable ability to derive meaning from complicated
or imprecise data, can be used to extract patterns and detect trends that are too complex to
be noticed by either humans or other computer techniques. A trained neural network can be
thought of as an "expert" in the category of information it has been given to analyze.
This expert can then be used to provide projections given new situations of interest and to answer "what if" questions.
Other advantages include:
Adaptive learning: An ability to learn how to do tasks based on the data given for training
or initial experience.
Self-Organization: An ANN can create its own organization or representation of the
information it receives during learning time.
Real Time Operation: ANN computations may be carried out in parallel, and special
hardware devices are being designed and manufactured which take advantage of this
capability.
Fault Tolerance via Redundant Information Coding: Partial destruction of a network leads to a corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.
3.1.2 The Analogy to the Brain
Neural networks process information in a similar way to the human brain. The network is composed of a large number of highly interconnected processing elements (neurons) working in parallel to solve a specific problem. Neural networks learn by example; they cannot be programmed to perform a specific task. The examples must be selected carefully, otherwise useful time is wasted or, even worse, the network might function incorrectly. The disadvantage is that, because the network finds out how to solve the problem by itself, its operation can be unpredictable.
On the other hand, conventional computers use a cognitive approach to problem solving; the way the problem is to be solved must be known and stated in small unambiguous instructions. These instructions are then converted into a high-level language program and then into machine code that the computer can understand. These machines are totally predictable; if anything goes wrong, it is due to a software or hardware fault. Neural networks and conventional algorithmic computers are not in competition but complement each other. There are tasks that are more suited to an algorithmic approach, like arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large number of tasks require systems that use a combination of the two approaches (normally a conventional computer is used to supervise the neural network) in order to perform at maximum efficiency.
The most basic components of neural networks are modeled after the structure of the brain. Some neural network structures are not closely modeled on the brain and some do not have a biological counterpart in the brain. However, neural networks have a strong similarity to the brain and therefore a great deal of the terminology is borrowed from neuroscience.
3.1.3 The Biological Neuron
The most basic element of the human brain is a specific type of cell, which provides us with the ability to remember, think, and apply previous experiences to our every action. These cells are known as neurons; each of these neurons can connect with up to 200,000 other neurons. The power of the brain comes from the number of these basic components and the multiple connections between them.
All natural neurons have four basic components: dendrites, soma, axon, and synapses. Basically, a biological neuron receives inputs from other sources, combines them in some way, performs a generally nonlinear operation on the result, and then outputs the final result. The figure below shows a simplified biological neuron and the relationship of its four components. In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.
Fig 3.1 BIOLOGICAL NEURON
3.1.4 The Artificial Neuron
The basic unit of neural networks, the artificial neuron, simulates the four basic functions of natural neurons. Artificial neurons are much simpler than biological neurons; the figure below shows the basics of an artificial neuron.
Figure 3.2 SINGLE NEURON
The various inputs to the network are represented by the mathematical symbol x(n). Each of these inputs is multiplied by a connection weight; these weights are represented by w(n). In the simplest case, these products are simply summed, fed through a transfer function to generate a result, and then output.
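As a minimal sketch of this weighted-sum-and-transfer-function operation (the input values, weights and bias below are purely illustrative, and a sigmoid is assumed as the transfer function), a single artificial neuron can be written in MATLAB as:

% Single artificial neuron sketch (illustrative values only)
x = [0.2; 0.7; 0.1];        % inputs x(n)
w = [0.5; -0.3; 0.8];       % connection weights w(n)
b = 0.1;                    % bias term
s = w' * x + b;             % weighted sum of the inputs plus bias
y = 1 / (1 + exp(-s));      % sigmoid transfer function generates the result
disp(y)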
Even though all artificial neural networks are constructed from this basic building block, the details of these building blocks may vary, and there are differences between network types.
3.1.5 Design
The developer goes through a period of trial and error in the design decisions before coming up with a satisfactory design. The design issues in neural networks are complex and are the major concerns of system developers.
Designing a neural network consists of:
- Arranging neurons in various layers.
- Deciding the type of connections among neurons of different layers, as well as among the neurons within a layer.
- Deciding the way a neuron receives input and produces output.
- Determining the strength of connections within the network by allowing the network to learn the appropriate values of the connection weights using a training data set.
The process of designing a neural network is an iterative process.
3.1.6 Layers
Biologically, neural networks are constructed in a three-dimensional way from microscopic components. These neurons seem capable of nearly unrestricted interconnections. This is not true of any man-made network. Artificial neural networks are simple clusterings of primitive artificial neurons. This clustering occurs by creating layers, which are then connected to one another. How these layers connect may also vary. Basically, all artificial neural networks have a similar structure or topology. Some of the neurons interface with the real world to receive their inputs and other neurons provide the real world with the network's outputs. All the rest of the neurons are hidden from view.
Figure 3.3 THREE-LAYER PERCEPTRON FOR IMAGE COMPRESSION
As the figure above shows, the neurons are grouped into layers. The input layer consists of neurons that receive input from the external environment. The output layer consists of neurons that communicate the output of the system to the user or the external environment. There are usually a number of hidden layers between these two layers; the figure above shows a simple structure with only one hidden layer.
When the input layer receives the input, its neurons produce output, which becomes input to the other layers of the system. The process continues until a certain condition is satisfied or until the output layer is invoked and fires its output to the external environment.
To determine the number of hidden neurons the network should have to perform its best, one is often left with the method of trial and error. If the number of hidden neurons is increased too much, overfitting occurs; that is, the net will have problems generalizing. The training set of data will be memorized, making the network useless on new data sets.
3.1.7 Communication and types of connections
Neurons are connected via a network of paths carrying the output of one neuron as input to another neuron. These paths are normally unidirectional; there might, however, be a two-way connection between two neurons, because there may be another path in the reverse direction.
3.1.7.1 Inter-layer connections
There are different types of connections used between layers; these connections
between layers are called inter-layer connections.
Fully connected
Each neuron on the first layer is connected to every neuron on the second layer.
Partially connected.
A neuron of the first layer does not have to be connected to all neurons on the
second layer.
Feed forward.
The neurons on the first layer send their output to the neurons on the second layer,
but they do not receive any input back from the neurons on the second layer.
Bi-directional.
There is another set of connections carrying the output of the neurons of the second
layer into the neurons of the first layer.
Feed forward and bi-directional connections could be fully or partially connected.
Hierarchical.
If a neural network has a hierarchical structure, the neurons of a lower layer may only communicate with neurons of the next layer up.
Resonance.
The layers have bi-directional connections, and they can continue sending messages
across the connections a number of times until a certain condition is achieved.
3.1.7.2 Intra-layer connections.
In more complex structures the neurons communicate among themselves within a
layer, this is known as intra-layer connections. There are two types of intra-layer
connections.
Recurrent.
The neurons within a layer are fully or partially connected to one another. After
these neurons receive input from another layer, they communicate their outputs with one
another a number of times before they are allowed to send their outputs to another layer.
Generally some conditions among the neurons of the layer should be achieved before they
communicate their outputs to another layer.
On-center/off-surround.
A neuron within a layer has excitatory connections to itself and its immediate neighbors, and has inhibitory connections to other neurons. One can imagine this type of connection as a competitive gang of neurons. Each gang excites itself and its gang members and inhibits all members of other gangs. After a few rounds of signal interchange, the neurons with an active output value will win and are allowed to update their own and their gang members' weights. (There are two types of connections between two neurons, excitatory or inhibitory. In an excitatory connection, the output of one neuron increases the action potential of the neuron to which it is connected. When the connection type between two neurons is inhibitory, the output of the neuron sending a message reduces the activity or action potential of the receiving neuron. One causes the summing mechanism of the next neuron to add while the other causes it to subtract. One excites while the other inhibits.)
3.1.8 Learning.
The brain basically learns from experience. Neural networks are sometimes called machine-learning algorithms, because changing their connection weights (training) causes the network to learn the solution to a problem. The strength of the connection between neurons is stored as a weight value for the specific connection. The system learns new knowledge by adjusting these connection weights.
The learning ability of a neural network is determined by its architecture and by the
algorithmic method chosen for training.
The training method usually consists of one of three schemes:
1. Unsupervised learning.
It uses no external teacher and is based only upon local information. It is also referred to as self-organization, in the sense that it self-organizes the data presented to the network and detects their emergent collective properties. Paradigms of unsupervised learning are Hebbian learning and competitive learning. Another aspect of learning concerns the distinction between a separate phase during which the network is trained and a subsequent operation phase. We say that a neural network learns off-line if the learning phase and the operation phase are distinct. A neural network learns on-line if it learns and operates at the same time. Usually, supervised learning is performed off-line, whereas unsupervised learning is performed on-line. The hidden neurons must find a way to organize themselves without help from the outside. In this approach, no sample outputs are provided to the network against which it can measure its predictive performance for a given vector of inputs. This is learning by doing.
2. Reinforcement learning
This incorporates an external teacher, so that each output unit is told what its desired
response to input signals ought to be. During the learning process global information may
be required. Paradigms of supervised learning include error-correction learning,
reinforcement learning and stochastic learning. An important issue concerning supervised
learning is the problem of error convergence, i.e. the minimization of error between the
desired and computed unit values. The aim is to determine a set of weights which minimizes
the error. One well-known method, which is common to many learning paradigms, is least mean square (LMS) convergence.
This method works on reinforcement from the outside. The connections among the neurons
in the hidden layer are randomly arranged, then reshuffled as the network is told how close
it is to solving the problem. Reinforcement learning is also called supervised learning,
because it requires a teacher. The teacher may be a training set of data or an observer who
grades the performance of the network results.
Both unsupervised and reinforcement learning suffer from relative slowness and inefficiency, relying on random shuffling to find the proper connection weights.
3. Back propagation
This method has proven highly successful in training multilayered neural nets. The network is not just given reinforcement for how it is doing on a task; information about errors is also filtered back through the system and is used to adjust the connections between the layers, thus improving performance. It is a form of supervised learning.
Off-line or On-line
One can categorize the learning methods into yet another group, off-line or on-line.
When the system uses input data to change its weights to learn the domain knowledge, the
system could be in training mode or learning mode. When the system is being used as a
decision aid to make recommendations, it is in the operation mode, this is also sometimes
called recall.
Off-line
In off-line learning methods, once the system enters the operation mode, its weights are fixed and do not change any more. Most networks are of the off-line learning type.
On-line
In on-line or real time learning, when the system is in operating mode (recall), it
continues to learn while being used as a decision tool. This type of learning has a more
complex design structure.
3.1.9 Learning laws
There are a variety of learning laws which are in common use. These laws are mathematical algorithms used to update the connection weights. Most of these laws are some sort of variation of the best-known and oldest learning law, Hebb's Rule. Man's understanding of how neural processing actually works is very limited. Learning is certainly more complex than the simplification represented by the learning laws currently developed. Research into different learning functions continues as new ideas routinely show up in trade publications. A few of the major laws are given as examples below.
Hebb’ Rule
The first and the best known learning rule was introduced by Donald Hebb. The
description appeared in his book The organization of Behavior in 1949. This basic rule is:
If a neuron receives an input from another neuron, and if both are highly active
(mathematically have the same sign), the weight between the neurons should be
strengthened.
Hopfield Law
This law is similar to Hebb’s Rule with the exception that it specifies the magnitude
of the strengthening or weakening. It states, “if the desired output and the input are both
active or both inactive, increment the connection weight by the learning rate, otherwise
decrement the weight by the learning rate.” (Most learning functions have some provision
for a learning rate, or learning constant. Usually this term is positive and between zero and
one.)
The Delta Rule
The Delta Rule is a further variation of Hebb's Rule, and it is one of the most commonly used. This rule is based on the idea of continuously modifying the strengths of the input connections to reduce the difference (the delta) between the desired output value and the actual output of a neuron. This rule changes the connection weights in the way that minimizes the mean squared error of the network. The error is back propagated into previous layers one layer at a time, and the process of back-propagating the network errors continues until the first layer is reached. The network type called feed-forward back-propagation derives its name from this method of computing the error term. This rule is also referred to as the Widrow-Hoff Learning Rule and the Least Mean Square Learning Rule.
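A rough MATLAB sketch of the delta rule for a single linear neuron is given below; the training sample, initial weights and learning rate are illustrative assumptions rather than values used later in this report:

% Delta rule sketch: w <- w + eta * (d - y) * x
x   = [0.5; 0.2; 0.9];            % input vector (assumed)
d   = 0.7;                        % desired output for this input
w   = zeros(3, 1);                % initial connection weights
eta = 0.1;                        % learning rate, between zero and one
for epoch = 1:100
    y = w' * x;                   % actual output of the neuron
    w = w + eta * (d - y) * x;    % reduce the delta between desired and actual output
end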
Kohonen’s Learning Law
This procedure, developed by Teuvo Kohonen, was inspired by learning in
biological systems. In this procedure, the neurons compete for the opportunity to learn, or
to update their weights. The processing neuron with the largest output is declared the winner
and has the capability of inhibiting its competitors as well as exciting its neighbors. Only
the winner is permitted output, and only the winner plus its neighbors are allowed to update
their connection weights.
The Kohonen rule does not require a desired output. Therefore it is used in unsupervised methods of learning. Kohonen has used this rule, combined with the on-center/off-surround intra-layer connection, to create the self-organizing neural network, which has an unsupervised learning method.
3.1.10 Applications of Neural Networks
The most common use for neural networks is to project what will most likely happen. There are many areas where prediction can help in setting priorities. For example, the emergency room at a hospital can be a hectic place; knowing who needs the most critical help can enable a more successful operation. Basically, all organizations must establish priorities, which govern the allocation of their resources. Neural networks have been used as a mechanism of knowledge acquisition for expert systems in stock market forecasting with astonishingly accurate results. Neural networks have also been used for bankruptcy prediction for credit card institutions.
Although one may apply neural network systems for interpretation, prediction, diagnosis, planning, monitoring, debugging, repair, instruction, and control, the most successful applications of neural networks are in categorization and pattern recognition. Such a system classifies the object under investigation (e.g. an illness, a pattern, a picture, a chemical compound, a word, or the financial profile of a customer) as one of numerous possible categories that, in return, may trigger the recommendation of an action (such as a treatment plan or a financial plan).
A company called Nestor has used neural networks for financial risk assessment for mortgage insurance decisions, categorizing the risk of loans as good or bad. Neural networks have also been applied to convert text to speech; NETtalk is one of the systems developed for this purpose. Image processing and pattern recognition form an important area of neural networks, probably one of the most actively researched areas of neural networks.
Another area of research for the application of neural networks is character recognition and handwriting recognition. This area has uses in banking, credit card processing and other financial services, where reading and correctly recognizing handwriting on documents is of crucial significance. The pattern recognition capability of neural networks has been used to read handwriting in processing checks, where a human must normally enter the amount into the system. A system that could automate this task would expedite check processing and reduce errors.
One of the best-known applications is the bomb detector installed in some U.S. airports. This device, called SNOOPE, determines the presence of certain compounds from the chemical configurations of their components.
In documents from an International Joint Conference, one can find reports on using neural networks in areas ranging from robotics, speech, signal processing, vision and character recognition to musical composition, detection of heart malfunction and epilepsy, fish detection and classification, optimization, and scheduling. Basically, most applications of neural networks fall into the following five categories:
Prediction
Uses input values to predict some output, e.g. pick the best stocks in the market, predict weather, identify people at risk of cancer.
Classification
Uses input values to determine the classification, e.g. is the input the letter A, is the blob of video data a plane, and what kind of plane is it.
Data Association
Like classification, but it also recognizes data that contain errors, e.g. not only identify the characters that were scanned but also identify when the scanner is not working properly.
Data Conceptualization
Analyze the inputs so that grouping relationships can be inferred, e.g. extract from a database the names of those most likely to buy a particular product.
Data Filtering
Smooth an input signal, e.g. take the noise out of a telephone signal.
3.2 IMAGE PROCESSING
The importance of visual communication has increased tremendously in the last few decades. The progress in microelectronics and computer technology, together with the creation of networks operating with various channel capacities, is the basis of an infrastructure for a new era of telecommunications. New applications are preparing a revolution in the everyday life of our modern society. Communication-based applications include ISDN surveillance. Storage-based audiovisual applications include training, education, entertainment, advertising, video mail and document annotation. Essential for the introduction of new communication services is low cost. Visual information is one of the richest and most bandwidth-consuming modes of communication.
The digital representation of raw video requires a large amount of data, and the transmission of this raw video data requires a large transmission bandwidth. To reduce the transmission and storage requirements, the video must be handled in compressed formats. To meet the requirements of the new applications, powerful data compression techniques are needed to reduce the global bit rate drastically, even in the presence of growing communication channels offering increased bandwidth. The issue of quality is of prime importance in most applications of compression. In fact, although most applications require high compression ratios, this requirement is in general in contradiction with the desire for high quality in the resulting pictures.
The standardization of video coding techniques has become a high priority because only a standard can reduce the high cost of video compression codecs and resolve the critical problem of interoperability of equipment from different manufacturers. The existence of standards is often the trigger for the volume production of the integrated circuits (VLSI) necessary for significant cost reductions. Bodies such as the International Standards Organization (ISO) and the International Telecommunication Union (ITU-T) today recommend the video compression standards used in practice.
3.2.1 Image processing
Digital image processing can be classified broadly into four areas:
I. Image Enhancement,
II. Image Restoration,
III. Image Coding,
IV. Image Understanding.
3.2.1.1 Image Enhancement
Image enhancement is the use of image processing algorithms to remove certain types of distortion in an image. Removing noise, making the edge structures in the image stand out, or any other operation that makes the image look better is considered enhancement.
The most widely used algorithms for enhancement are based on pixel functions that are known as window operations. A window operation performed on an image is nothing more than the process of examining the pixels in a certain region of the image, called the window region, and computing some type of mathematical function derived from the pixels in the window.
3.2.1.2 Image Restoration
In image restoration, an image has been degraded in some manner and the objective
is to reduce or eliminate the degradation. The development of an image restoration system
depends on the type of degradation.
3.2.1.3 Image Coding
The objective of image coding is to represent an image with as few bits as possible while preserving the level of image quality and intelligibility acceptable for a given application. Image coding can be used to reduce the bandwidth of a communication channel when an image is transmitted and to reduce storage requirements when an image needs to be retrieved.
3.2.1.4 Image Understanding
The objective of image understanding is to symbolically represent the contents of an image. Applications of image understanding include computer vision and robotics.
Image understanding differs from the other three areas in one major respect. In image enhancement, restoration and coding, both the input and the output are images, and signal processing has been the backbone of many successful systems in these areas. In image understanding, the input is an image, but the output is a symbolic representation of the contents of the image. Successful development of systems in this area involves not only signal processing but also other disciplines such as artificial intelligence.
3.2.2 Image Compression
Direct transmission of video data requires a high-bit-rate (bandwidth) channel. When such a high-bandwidth channel is unavailable or not economical, compression techniques have to be used to reduce the bit rate and ideally maintain the same visual quality. Similar arguments can be applied to storage media, in which the concern is memory space. Video sequences contain a significant amount of redundancy within and between frames. It is this redundancy that allows video sequences to be compressed. Within each individual frame, the values of neighboring pixels are usually close to one another. This spatial redundancy can be removed from the image without degrading the picture quality, using "intraframe" techniques.
Also, most of the information in a given frame may be present in adjacent frames. This temporal redundancy can also be removed, in addition to the "within frame" redundancy, by "interframe" coding.
3.2.3 Principles of Image Compression
The principles of image compression are based on information theory. The amount of information that a source produces is called entropy. The amount of information one receives from a source is equivalent to the amount of uncertainty that has been removed.
A source produces a sequence of variables from a given symbol set. For each symbol, there is a product of the symbol probability and its logarithm. The entropy is the negative summation of these products over all the symbols in a given symbol set.
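For illustration, the entropy of a source whose symbol probabilities are assumed to be 0.5, 0.25, 0.125 and 0.125 can be computed in MATLAB as:

% Entropy of a discrete source in bits per symbol (example probabilities assumed)
p = [0.5 0.25 0.125 0.125];      % symbol probabilities, summing to one
H = -sum(p .* log2(p));          % negative summation of probability times its logarithm
fprintf('Entropy = %.3f bits/symbol\n', H);   % 1.750 bits/symbol for this example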
Compression algorithms are methods that reduce the number of symbols used to represent source information, therefore reducing the amount of space needed to store the source information or the amount of time necessary to transmit it for a given channel capacity. The mapping from the source symbols into fewer target symbols is referred to as compression, and the reverse mapping as decompression.
Image compression refers to the task of reducing the amount of data required to store or transmit an image. At the system input, the image is encoded into its compressed form by the image coder. The compressed image may then be subjected to further digital processing, such as error control coding, encryption or multiplexing with other data sources, before being used to modulate the analog signal that is actually transmitted through the channel or stored in a storage medium. At the system output, the image is processed step by step to undo each of the operations that was performed on it at the system input. At the final step, the image is decoded into its original uncompressed form by the image decoder. If the reconstructed image is identical to the original image, the compression is said to be lossless; otherwise, it is lossy.
3.2.4 Performance Measurement of Image Compression
There are three basic measurements for an image compression algorithm.
1. Compression Efficiency
It is measured by the compression ratio, which is defined as the ratio of the size (number of bits) of the original image data to the size of the compressed image data.
2. Complexity
The number of data operations required to perform the encoding and decoding processes measures the complexity of an image compression algorithm. The data operations include additions, subtractions, multiplications, divisions and shift operations.
3. Distortion Measurement (DM)
For a lossy compression algorithm, DM is used to measure how much information has been lost when a reconstructed version of a digital image is produced from the compressed data. The common distortion measure is the mean-square-error between the original data and the compressed data. The signal-to-noise ratio is also used to measure the performance of lossy compression algorithms. A small MATLAB sketch of these measures follows.
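The sketch below shows how these quantities can be computed; the original and reconstructed images and the assumed compressed size are placeholders used only to illustrate the calculations (the peak signal-to-noise ratio, PSNR, is used here as one common form of the signal-to-noise measure):

% Compression ratio, mean-square-error and PSNR (placeholder data)
orig  = double(randi([0 255], 64, 64));    % placeholder original image
recon = orig + randn(64, 64);              % placeholder reconstructed image
origBits       = numel(orig) * 8;          % size of the original data in bits
compressedBits = 0.25 * origBits;          % assumed size of the compressed data
CR      = origBits / compressedBits;       % compression ratio
mse     = mean((orig(:) - recon(:)).^2);   % mean-square-error distortion
psnr_dB = 10 * log10(255^2 / mse);         % peak signal-to-noise ratio in dB
fprintf('CR = %.1f, MSE = %.2f, PSNR = %.2f dB\n', CR, mse, psnr_dB);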
3.2.5 Compression Standards
Digital images and digital video are normally compressed in order to save space on hard disks and to speed up transmission. There are presently several compression standards used for transmission of digital signals on a network. Data sent by a camera using video standards contain still images mixed with data describing changes, so that unchanged data (for instance the background) are not sent in every image. Consequently, the frame rate measured in frames per second (fps) is much greater.
3.2.6 Image compression techniques
Still images are simple and easy to send. However, it is difficult to obtain single images from a compressed video signal. The video signal uses less data to send or store a video image, and it is not possible to reduce the frame rate using video compression. Sending single images is easier when using a modem connection or, in any case, with a narrow bandwidth.
Main compression standards for still images    Main compression standards for video signals
JPEG                                            M-JPEG (Motion JPEG)
Wavelet                                         H.261, H.263, etc.
JPEG 2000                                       MPEG-1
GIF                                             MPEG-2
                                                MPEG-3
                                                MPEG-4
Table 3.1 Compression standards
JPEG (Joint Photographic Experts Group)
A popular compression standard used exclusively for still images. Each image is divided into blocks of 8 x 8 pixels; each block is then individually compressed. When using very high compression, the 8 x 8 blocks can actually be seen in the image. Due to the compression mechanism, the decompressed image is not the same as the image which was compressed; this is because the standard has been designed considering the performance limits of human eyes. The degree of detail loss can be varied by adjusting compression parameters. It can store up to 16 million colors.
Wavelet
Wavelets are functions used in representing data or other functions. They analyze the signal at different frequencies with different resolutions. It is an optimized standard for images containing large amounts of data with sharp discontinuities. Wavelet compression transforms the entire image, differently from JPEG, and is more natural as it follows the shape of the objects in the picture. Since this is a non-standardized compression method, special software is necessary for viewing.
JPEG2000
Based on Wavelet technology. Rarely used.
GIF (Graphics Interchange Format)
A graphic format used widely for Web images. It is limited to 256 colors and is a good standard for images which are not too complex. It is not recommended for network cameras because the compression ratio is too limited.
M-JPEG (Motion JPEG)
This is not a separate standard but rather a rapid flow of JPEG images that can be viewed at a rate sufficient to give the illusion of motion. Each frame within the video is stored as a complete image in JPEG format. Single images do not interact among themselves. Images are then displayed sequentially at a high frame rate. This method produces high quality video, but at the cost of large files.
H.261, H.263, etc.
Standards approved by the ITU (International Telecommunication Union). They are designed for videoconference applications and produce images with a high degree of compression.
DCT-Based Image Coding Standard
The idea of compressing an image is not new. The discovery of the DCT in 1974 was an important achievement for the research community working on image compression. The DCT can be regarded as a discrete-time version of the Fourier cosine series. It is a close relative of the DFT, a technique for converting a signal into elementary frequency components. Thus the DCT can be computed with a Fast Fourier Transform (FFT) like algorithm in O(n log n) operations. Unlike the DFT, the DCT is real-valued and provides a better approximation of a signal with fewer coefficients. The DCT of a discrete signal x(n), n = 0, 1, ..., N-1, is defined as:
X(u) = sqrt(2/N) * C(u) * sum_{n=0}^{N-1} x(n) * cos[ (2n+1) u pi / (2N) ],    u = 0, 1, ..., N-1

where C(u) = 0.707 for u = 0 and C(u) = 1 otherwise.
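A direct, unoptimized evaluation of this definition in MATLAB might look like the following sketch (the 8-point signal is an assumed example; a real coder would use a fast algorithm instead of the loop):

% Direct evaluation of the 1-D DCT definition above
x = [52 55 61 66 70 61 64 73];               % example 8-point signal (assumed values)
N = length(x);
X = zeros(1, N);
for u = 0:N-1
    C = 1; if u == 0, C = 1/sqrt(2); end     % C(u) = 0.707 for u = 0, 1 otherwise
    n = 0:N-1;
    X(u+1) = sqrt(2/N) * C * sum(x .* cos((2*n+1)*u*pi/(2*N)));
end
disp(X)    % X(1) is the DC coefficient of the signal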
In 1992 JPEG established the first international standard for still image compression, where the encoders and decoders are DCT-based. The JPEG standard specifies three modes, namely sequential, progressive, and hierarchical, for lossy encoding, and one mode of lossless encoding. The baseline JPEG coder, which is the sequential encoding in its simplest form, is briefly discussed here. Figures 3.1 and 3.2 show the key processing steps in such an encoder and decoder for grayscale images. Color image compression can be approximately regarded as compression of multiple grayscale images, which are either compressed entirely one at a time, or are compressed by alternately interleaving 8 x 8 sample blocks from each in turn. Here, we focus on grayscale images only.
The DCT-based encoder essentially compresses a stream of 8 x 8 blocks of image samples. Each 8 x 8 block makes its way through each processing step and yields output in compressed form into the data stream. Because adjacent image pixels are highly correlated, the 'forward' DCT (FDCT) processing step lays the foundation for achieving data compression by concentrating most of the signal in the lower spatial frequencies; for a typical block, most of the higher spatial frequencies have zero or near-zero amplitude and need not be encoded. In principle, the DCT introduces no loss to the source image samples; it merely transforms them to a domain in which they can be more efficiently encoded.
After output from the FDCT, each of the 64 DCT coefficients is uniformly quantized in conjunction with a carefully designed 64-element Quantization Table (QT). At the decoder, the quantized values are multiplied by the corresponding QT elements to recover the original unquantized values. After quantization, all of the quantized coefficients are ordered into a "zigzag" sequence; this ordering helps to facilitate entropy encoding by placing low-frequency non-zero coefficients before high-frequency coefficients. The DC coefficient, which contains a significant fraction of the total image energy, is differentially encoded.
Entropy Coding (EC) achieves additional compression losslessly by encoding the quantized DCT coefficients more compactly based on their statistical characteristics. The JPEG proposal specifies both Huffman coding and arithmetic coding. The baseline sequential codec uses Huffman coding, but codecs with both methods are specified for all modes of operation. Arithmetic coding, though more complex, normally achieves 5-10% better compression than Huffman coding.
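The quantization/dequantization step described above can be sketched as follows; the 8 x 8 coefficient block and the quantization table are placeholders, not the actual JPEG luminance table:

% Uniform quantization of one 8x8 block of DCT coefficients (sketch)
coeffs = randn(8, 8) * 50;                 % placeholder DCT coefficients
QT     = 16 * ones(8, 8);                  % placeholder 64-element quantization table
q      = round(coeffs ./ QT);              % encoder: uniform quantization with the QT
recon  = q .* QT;                          % decoder: multiply back by the QT elements
maxErr = max(abs(coeffs(:) - recon(:)));   % error is bounded by half a QT step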
4. IMAGE COMPRESSION WITH NEURAL NETWORKS:
Apart from the existing technology on image compression represented by series of
JPEG, MPEG, and H.26x standards, new technology such as neural networks and genetic
algorithms are being developed to explore the future of image coding. Successful
applications of neural networks to vector quantization have now become well established,
and other aspects of neural network involvement in this area are stepping up to play
significant roles in assisting with those traditional compression techniques. Existing
research can be summarized as follows:
1. Back-Propagation image Compression;
2. Hierarchical Back-Propagation Neural Network
3. Adaptive Back-Propagation Neural Network
4. Hebbian Learning Based Image Compression
5. Vector Quantization Neural Networks;
6. Predictive Coding Neural Networks.
4.1 Basic Back-Propagation Neural Network
The neural network structure is illustrated in Fig. 4.1. Three layers, one input layer, one output layer and one hidden layer, are designed. Both the input layer and the output layer are fully connected to the hidden layer. Compression is achieved by designing the value of K, the number of neurons at the hidden layer, to be less than the number of neurons at both the input and output layers. The input image is split up into blocks or vectors of 8 x 8, 4 x 4 or 16 x 16 pixels. When the input vector is referred to as N-dimensional (N being the number of pixels in each block), the coupling weights connecting it to each neuron at the hidden layer can be represented by {w_ji, j = 1, 2, ..., K and i = 1, 2, ..., N}, which can also be described by a matrix of size K x N. From the hidden layer to the output layer, the connections can be represented by {w'_ij: 1 <= i <= N, 1 <= j <= K}, which is another weight matrix of size N x K. Image compression is achieved by training the network in such a way that the coupling weights {w_ji} scale the input vector of dimension N into a narrow channel of dimension K (K < N) at the hidden layer and produce the optimum output value which makes the quadratic error between input and output minimum. In accordance with the neural network structure, the operation can be described as follows:
For encoding:

h_j = sum_{i=1}^{N} w_ji * x_i,    1 <= j <= K

and for decoding:

x̂_i = sum_{j=1}^{K} w'_ij * h_j,    1 <= i <= N

where x_i in [0, 1] denotes the normalized pixel values for grey-scale images with grey levels [0, 255]. The reason for using normalized pixel values is that neural networks operate more efficiently when both their inputs and outputs are limited to a range of [0, 1].
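In matrix form, this forward pass can be sketched in MATLAB as below; the weight matrices here are random placeholders standing in for trained values:

% Narrow-channel forward pass sketch: N inputs compressed into K hidden values
N = 64;  K = 16;                 % an 8x8 block squeezed into 16 hidden neurons
x  = rand(N, 1);                 % normalized pixel block, values in [0, 1]
W  = 0.1 * randn(K, N);          % input-to-hidden weights {w_ji} (untrained placeholder)
Wp = 0.1 * randn(N, K);          % hidden-to-output weights {w'_ij} (placeholder)
h    = W * x;                    % encoding: h_j = sum_i w_ji * x_i
xhat = Wp * h;                   % decoding: reconstructed pixel block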
Figure 4.1 BACK – PROPAGATION NEURAL NETWORK
The above linear network can also be made non-linear if a transfer function such as the sigmoid is added to the hidden layer and the output layer to scale the summation down in the above equations.
With this basic back-propagation neural network, compression is conducted in two phases: training and encoding. In the first phase, a set of image samples is used to train the network via the back-propagation learning rule, which uses each input vector as the desired output. This is equivalent to compressing the input into the narrow channel represented by the hidden layer and then reconstructing the input from the hidden layer at the output layer.
The second phase simply involves the entropy coding of the state vector h_j at the hidden layer. In cases where adaptive training is conducted, the entropy coding of the coupling weights is also required in order to catch up with input characteristics that are not encountered at the training stage. The entropy coding is normally designed as simple fixed-length binary coding, although many advanced variable-length entropy coding algorithms are available.
This neural network development, in fact, is in the direction of K-L transform technology, which actually provides the optimum solution for all linear narrow-channel type image compression neural networks. Equations (1) and (2) are represented in matrix form as:

[h] = [W]^T [x]

for encoding, and

[x̂] = [W'][h] = [W'][W]^T [x]

for decoding.
The K-L transform maps input images into a new vector space where all the coefficients in the new space are de-correlated. This means that the covariance matrix of the new vectors is a diagonal matrix whose elements along the diagonal are the eigen-values of the covariance matrix of the original input vectors. Let e_i and λ_i, i = 1, 2, ..., n, be the eigen-vectors and eigen-values of C_x, the covariance matrix for the input vector x, and let the eigen-values be arranged in descending order so that λ_i >= λ_{i+1}, for i = 1, 2, ..., n-1.
To extract the principal components, the K eigen-vectors corresponding to the K largest eigen-values in C_x are used to construct the K-L transform matrix [A_K]. In addition, all eigen-vectors in [A_K] are ordered in such a way that the first row of [A_K] is the eigen-vector corresponding to the largest eigen-value and the last row is the eigen-vector corresponding to the smallest eigen-value. Hence, the forward K-L transform or encoding can be defined as:

[y] = [A_K]([x] - [m_x])

and the inverse K-L transform or decoding can be defined as:

[x̂] = [A_K]^T [y] + [m_x]
where [m_x] is the mean value of [x] and [x̂] represents the reconstructed vectors or image blocks. Thus the mean square error between x and x̂ is given by the following equation:

e_m = E{(x - x̂)^2} = (1/M) sum_{k=1}^{M} (x_k - x̂_k)^2 = sum_{j=1}^{n} λ_j - sum_{j=1}^{K} λ_j = sum_{j=K+1}^{n} λ_j
where the statistical mean value E{·} is approximated by the average value over all the input vector samples which, in image coding, are all the non-overlapping blocks of 4 x 4 or 8 x 8 pixels.
Therefore, by selecting the K eigen-vectors associated with the largest eigen-values to run the K-L transform over the input pixels, the resulting error between the reconstructed image and the original one can be minimized, due to the fact that the values of λ_i decrease monotonically.
Comparing the equation pair (3)-(4) with the equation pair (5)-(6), it can be concluded that the linear neural network reaches the optimum solution whenever the following condition is satisfied:

[W'][W]^T = [A_K]^T [A_K]

Under this circumstance, the neuron weights from input to hidden and from hidden to output can be described respectively as follows:

[W'] = [A_K]^T [U]^{-1} ;   [W]^T = [U][A_K]

where [U] is an arbitrary K x K matrix such that [U][U]^{-1} gives an identity matrix of size K x K. Hence, it can be seen that the linear neural network can achieve the same compression performance as that of the K-L transform without necessarily having its weight matrices be equal to [A_K]^T and [A_K].
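The K-L transform described above can be sketched directly in MATLAB through the eigen-decomposition of the block covariance matrix; the training blocks below are random placeholders rather than real image data:

% K-L transform (principal component) sketch over 16-dimensional image blocks
blocks = rand(16, 500);                  % 500 training blocks as column vectors
mx     = mean(blocks, 2);                % mean vector [m_x]
Cx     = cov(blocks');                   % covariance matrix C_x of the input vectors
[E, D] = eig(Cx);                        % eigen-vectors (columns) and eigen-values
[~, idx] = sort(diag(D), 'descend');     % order eigen-values in descending order
K  = 4;                                  % number of principal components kept
AK = E(:, idx(1:K))';                    % K x 16 forward transform matrix [A_K]
x    = blocks(:, 1);                     % one block to encode
y    = AK * (x - mx);                    % forward K-L transform (encoding)
xrec = AK' * y + mx;                     % inverse K-L transform (decoding)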
4.2 Hierarchical Back-Propagation Neural Network
The basic back-propagation network can be further extended to construct a hierarchical neural network by adding two more hidden layers to the existing network, in which the three hidden layers are termed the combiner layer, the compressor layer and the decombiner layer. The structure is shown in Figure 4.2. The idea is to exploit correlation between pixels by the inner hidden layer and to exploit correlation between blocks of pixels by the outer hidden layers. From the input layer to the combiner layer and from the decombiner layer to the output layer, local connections are designed, which have the same effect as M fully connected neural sub-networks.
Figure 4.2 HIERARCHICAL NEURAL NETWORK STRUCTURE
Training such a neural network can be conducted in terms of: (i) Outer Loop Neural Network (OLNN) Training; (ii) Inner Loop Neural Network (ILNN) Training; and (iii) coupling weight allocation for the overall neural network.
4.3 Adaptive Back-Propagation Neural Network
The adaptive back-propagation neural network is designed to make the neural network compression adaptive to the content of the input image. The general structure for a typical adaptive scheme is illustrated in Fig. 4.3, in which a group of neural networks with an increasing number of hidden neurons (from h_min to h_max) is designed. The basic idea is to classify the input image blocks into a few sub-sets with different features according to their complexity measurement. A fine-tuned neural network then compresses each sub-set.
Figure 4.3 ADAPTIVE NEURAL NETWORK STRUCTURE
Training of such a neural network can be designed as: (a) parallel training; (b) serial training; and (c) activity-based training.
The parallel training scheme applies the complete training set simultaneously to all neural networks and uses the S/N (signal-to-noise) ratio to roughly classify the image blocks into the same number of sub-sets as there are neural networks. After this initial coarse classification is completed, each neural network is then further trained by its corresponding refined sub-set of training blocks.
Serial training involves an adaptive searching process to build up the necessary number of neural networks to accommodate the different patterns embedded inside the training images. Starting with a neural network with a predefined minimum number of hidden neurons, h_min, the neural network is roughly trained by all the image blocks. Whenever the S/N ratio of a block falls below a preset threshold, further training is started on the next neural network with the number of hidden neurons increased and the corresponding threshold readjusted for further classification. This process is repeated until the whole training set is classified into a maximum number of sub-sets corresponding to the same number of neural networks established.
In the next two training schemes, two extra parameters, the block activity A(P_k) and four directions, are defined to classify the training set rather than using the neural networks. Hence the back-propagation training of each neural network can be completed in one phase by its appropriate sub-set.
The so-called activity of a block P_k is defined as:

A(P_k) = sum over all pixels (i, j) of A_p(P_k(i, j))

and

A_p(P_k(i, j)) = sum_{r=-1}^{+1} sum_{s=-1}^{+1} (P_k(i, j) - P_k(i+r, j+s))^2

where A_p(P_k(i, j)) is the activity of each pixel, which involves its 8 neighbouring pixels as r and s vary from -1 to +1.
Prior to training, all image blocks are classified into four classes according to their activity values, which are identified as very low, low, high and very high activity. Hence four neural networks are designed with an increasing number of hidden neurons to compress the four different sub-sets of input images after the training phase is completed.
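As an illustrative sketch (the block values are placeholders and only interior pixels are visited so that all eight neighbours exist), the activity of one block can be computed in MATLAB as:

% Block activity sketch: sum of squared differences to the 8 neighbouring pixels
P = 255 * rand(8, 8);                      % one image block (placeholder values)
A = 0;
for i = 2:7
    for j = 2:7
        nb = P(i-1:i+1, j-1:j+1);          % the pixel and its 8 neighbours
        A  = A + sum((P(i,j) - nb(:)).^2); % activity contribution of this pixel
    end
end
% A would then be compared against thresholds chosen from the training set to label
% the block as very low, low, high or very high activity.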
On top of the activity parameter, a further feature extraction technique is applied by considering four main directions present in the image details, i.e., horizontal, vertical and the two diagonal directions. These preferential direction features can be evaluated by calculating the values of the mean squared differences among neighboring pixels along the four directions.
For the image patterns classified as high activity, four further neural networks corresponding to the above directions are added to refine their structure and tune their learning processes to the preferential orientations of the input. Hence the overall neural network system is designed to have six neural networks, among which two correspond to the low-activity and medium-activity sub-sets and the other four networks correspond to the high-activity and four-direction classifications.
4.4 Hebbian Learning Based Image Compression
While the back-propagation based narrow-channel neural network aims at achieving compression upper bounded by the K-L transform, a number of Hebbian learning rules have been developed to address the issue of how the principal components can be directly extracted from input image blocks to achieve image data compression. The general neural network structure consists of one input layer and one output layer. The Hebbian learning rule comes from Hebb's postulation that if two neurons are very active at the same time, which is illustrated by the high values of both the output and one of its inputs, the strength of the connection between the two neurons will grow or increase. Hence, for the output values expressed as [h] = [W]^T [x], the learning rule can be described as:
W_i(t+1) = ( W_i(t) + η h_i(t) X(t) ) / || W_i(t) + η h_i(t) X(t) ||

where W_i(t+1) = {w_i1, w_i2, ..., w_iN} is the i-th new coupling weight vector in the next cycle (t+1); 1 <= i <= M, and M is the number of output neurons; η is the learning rate; h_i(t) is the i-th output value; X(t) is the input vector, corresponding to each individual image block; and ||·|| is the Euclidean norm used to normalize the updated weights and make the learning stable.
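A minimal MATLAB sketch of this normalized Hebbian update is given below; the image block, initial weights and learning rate are assumptions for illustration only:

% Normalized Hebbian update sketch for one output neuron
x   = rand(16, 1);                      % input image block (placeholder)
w   = randn(16, 1);  w = w / norm(w);   % initial coupling weight vector, unit length
eta = 0.01;                             % learning rate
for t = 1:1000
    h = w' * x;                         % output value h_i(t)
    w = w + eta * h * x;                % Hebbian growth of the connection strength
    w = w / norm(w);                    % Euclidean normalization keeps the learning stable
end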
From the basic learning rule, a number of variations have been developed in the
existing research.
4.5 Vector Quantization Neural Networks
Since neural networks are capable of learning from input information and optimizing themselves to obtain the appropriate environment for a wide range of tasks, a family of learning algorithms has been developed for vector quantization. The input vector is constructed from a K-dimensional space. M neurons are designed to compute the vector quantization code-book, in which each neuron relates to one code-word via its coupling weights. The coupling weight vector {w_ij} associated with the i-th neuron is eventually trained to represent the code-word c_i in the code-book. As the neural network is being trained, all the coupling weights will be optimized to represent the best possible partition of all the input vectors. To train the network, a group of image samples known to both encoder and decoder is often designated as the training set, and the first M input vectors of the training data set are normally used to initialize all the neurons. With this general structure, various learning algorithms have been designed and developed, such as Kohonen's self-organising feature mapping, competitive learning, frequency-sensitive competitive learning, fuzzy competitive learning, general learning, distortion-equalized fuzzy competitive learning and PVQ (predictive VQ) neural networks.
Let W_i(t) be the weight vector of the i-th neuron at the t-th iteration; the basic competitive learning algorithm can be summarized as follows:

z_i = 1 if d(x, W_i(t)) = min_j d(x, W_j(t)), and z_i = 0 otherwise

W_i(t+1) = W_i(t) + α (x - W_i(t)) z_i

where d(x, W_i(t)) is the distance in the L2 metric between the input vector x and the coupling weight vector W_i(t) = {w_i1, w_i2, ..., w_iK}; K = p x p; α is the learning rate; and z_i is the output of the i-th neuron.
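A small MATLAB sketch of this basic competitive learning loop is shown below; the training vectors, code-book size and learning rate are illustrative assumptions:

% Competitive learning sketch for a vector-quantization code-book
K = 16;  M = 8;                        % 16-dimensional vectors, 8 code-words
X = rand(K, 200);                      % 200 training vectors (placeholder)
W = X(:, 1:M);                         % initialize the neurons with the first M vectors
alpha = 0.05;                          % learning rate
for t = 1:size(X, 2)
    x = X(:, t);
    [~, i] = min(sum((W - x).^2, 1));  % winning neuron: minimum L2 distance to x
    W(:, i) = W(:, i) + alpha * (x - W(:, i));   % move only the winner towards x
end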
A so-called under-utilization problem occurs in competitive learning, which means some of the neurons are left out of the learning process and never win the competition. Various schemes have been developed to tackle this problem. The Kohonen self-organising neural network overcomes the problem by updating the winning neuron as well as those in its neighborhood.
The frequency-sensitive competitive learning algorithm addresses the problem by keeping a record of how frequently each neuron wins, to ensure that all neurons in the network are updated an approximately equal number of times. To implement this scheme, the distance is modified to include the total number of times that neuron i has been the winner. The modified distance measurement is defined as:

d'(x, Wi(t)) = d(x, Wi(t)) × ui(t)

where ui(t) is the total number of winning times for neuron i up to the t'th training cycle. Hence, the more the i'th neuron wins the competition, the greater its distance from the next input vector becomes, and thus its chance of winning the competition diminishes. This way of tackling the under-utilization problem does not provide interactive solutions in optimizing the code-book.
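Within the competitive-learning sketch above, the frequency-sensitive modification only changes how the winner is picked, for example:

% Frequency-sensitive winner selection (sketch, continuing the previous example).
u = ones(1, M);                              % win counters for the M neurons
d = sum((W - x*ones(1, M)).^2) .* u;         % distance scaled by how often each neuron has won
[dmin, i] = min(d);                          % winner under the modified distance
u(i) = u(i) + 1;                             % frequent winners become harder to pick next time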
Around the competitive learning scheme, fuzzy membership functions are introduced to control the transition from soft to crisp decisions during the code-book design process. The essential idea is that one input vector is assigned to a cluster only to a certain extent, rather than being either 'in' or 'out'. The fuzzy assignment is particularly useful at earlier training stages, because it guarantees that all input vectors are included in the formation of the new code-book represented by all the neuron coupling weights. Representative examples include direct fuzzy competitive learning, fuzzy algorithms for learning vector quantization, and the distortion-equalized fuzzy competitive learning algorithm.
4.6 Predictive Coding Neural Networks
Predictive coding has proved to be a powerful technique for de-correlating input data in speech and image compression, where a high degree of correlation is embedded among neighboring data samples. Although general predictive coding is classified into various models such as AR and ARMA, it is the auto-regressive (AR) model that has been successfully applied to image compression. Hence, predictive coding for image compression can be further classified into linear and non-linear AR models. Conventional technology provides a mature environment and a well-developed theory for predictive coding, represented by LPC (linear predictive coding), PCM (pulse code modulation), DPCM (differential PCM) and their modified variations. Non-linear predictive coding, however, is very limited due to the difficulties involved in optimizing the coefficient extraction to obtain the best possible predictive values. Under this circumstance, neural networks provide a very promising approach to optimizing non-linear predictive coding.
With the linear AR model, predictive coding can be described by the following equation:

Xn = Σ (i = 1 … N) ai Xn−i + vn = p + vn

where p represents the predictive value for the pixel Xn, which is to be encoded in the next step. Its neighboring pixels, Xn−1, Xn−2, …, Xn−N, are used by the linear model to produce the predictive value. vn stands for the error between the input pixel and its predictive value, and can be modeled by a set of zero-mean, independent and identically distributed random variables.
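A hedged MATLAB illustration of this linear predictor, using an arbitrary order and made-up coefficients rather than anything derived from this project, might look like:

% Sketch of a linear AR predictor of order N (coefficients are assumptions).
N = 4;
a = [0.5 0.25 0.15 0.10];             % assumed predictor coefficients a1..aN
X = round(255*rand(1, 100));          % stand-in row of pixel values
v = zeros(size(X));                   % prediction errors
for n = N+1:length(X)
    p = a * X(n-1:-1:n-N)';           % predicted value p from the N previous pixels
    v(n) = X(n) - p;                  % error vn = Xn - p is what would be encoded
end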
Based on the above linear AR model, a multi-layer perceptron neural network can be constructed to realize the corresponding non-linear predictor, as shown in Fig. 4.4. For the pixel Xn which is to be predicted, its N neighboring pixels obtained from its predictive pattern are arranged into a one-dimensional input vector x = {Xn−1, Xn−2, …, Xn−N} for the neural network. A hidden layer is designed to carry out back-propagation learning for training the neural network. The output of each hidden neuron, say the j'th neuron, is derived from the equation given below:

hj = f( Σ (i = 1 … N) Wji Xn−i )

where f(v) = 1 / (1 + e−v) is a sigmoid transfer function.
Figure 4.4 PREDICTIVE NEURAL NETWORK I
To predict drastically changing features inside the image, such as edges and contours, high-order terms are added to improve the predictive performance. This corresponds to a non-linear AR model expressed as follows:

Xn = Σi ai Xn−i + Σi Σj aij Xn−i Xn−j + Σi Σj Σk aijk Xn−i Xn−j Xn−k + …
Hence, another, so-called functional-link type neural network can be designed to implement this type of non-linear AR model with high-order terms. The structure of the network is illustrated in Fig. 4.5. It contains only two layers of neurons, one for input and the other for output. The coupling weights, {wi}, between the input layer and the output layer are trained towards minimizing the residual energy, which is defined as:

RE = Σn en² = Σn (Xn − X̂n)²

where X̂n is the predictive value for the pixel Xn.
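For illustration, the residual energy being minimised can be evaluated as below; X̂ (Xhat) is a hypothetical vector of predicted pixel values, not output from an actual trained network:

% Sketch: residual energy between true pixels X and predicted pixels Xhat.
X    = round(255*rand(1, 100));       % stand-in pixel values
Xhat = X + randn(size(X));            % hypothetical predictions
RE   = sum((X - Xhat).^2);            % residual energy minimised during training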
Figure 4.5 PREDICTIVE NEURAL NETWORK II
5. PROPOSED IMAGE COMPRESSION USING NEURAL NETWORK
A two-layer feed-forward neural network trained with the Levenberg-Marquardt algorithm was considered. Image coding using a feed-forward neural network consists of the following steps:
An image, F, is divided into r×c blocks of pixels. Each block is then scanned to form an input vector x(n) of size P = r×c.
It is assumed that the hidden layer of the two-layer network consists of L neurons, each with P synapses, and that it is characterized by an appropriately selected weight matrix Wh.
All N blocks of the original image are passed through the hidden layer to obtain the hidden signals, h(n), which represent the encoded input image blocks x(n). If L < P, such coding delivers image compression.
It is assumed that the output layer consists of m = P = r×c neurons, each with L synapses. Let Wy be an appropriately selected output weight matrix. All N hidden vectors h(n), representing the encoded image H, are passed through the output layer to obtain the output signals, y(n). The output signals are reassembled into r×c image blocks to obtain the reconstructed image, Fr.
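As a simple worked example of the bookkeeping involved (the numbers happen to match the block and layer sizes used later in the MATLAB implementation, but are otherwise arbitrary):

% Sketch: compression ratio of the feed-forward coder.
r = 4; c = 4;               % block size
P = r*c;                    % length of each input vector x(n): 16
L = 4;                      % number of hidden neurons, L < P
ratio = P/L;                % each 16-pixel block is stored as 4 hidden values, i.e. 4:1 compression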
There are two error metrics that are used to compare the various image compression techniques: the Mean Square Error (MSE) and the Peak Signal-to-Noise Ratio (PSNR). The MSE is the cumulative squared error between the compressed and the original image, whereas the PSNR is a measure of the peak error.
MSE = (1 / (M·N)) Σ (y = 1 … M) Σ (x = 1 … N) [ I(x, y) − I′(x, y) ]²   ………… 5.1
The quality of image coding is typically assessed by the Peak signal-to-noise ratio (PSNR)
defined as
PSNR = 20 log 10 [255/sqrt(MSE)]………………5.2
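Both measures are straightforward to evaluate in MATLAB; the sketch below assumes two same-sized grey-scale images stored as doubles in the range 0-255, generated here only as stand-ins:

% Sketch: MSE and PSNR between an original image I and a reconstruction Ir.
I  = double(round(255*rand(64)));     % stand-in original 64 x 64 image
Ir = I + 2*randn(64);                 % stand-in reconstructed image
[M, N] = size(I);
MSE  = sum(sum((I - Ir).^2)) / (M*N); % equation 5.1
PSNR = 20*log10(255/sqrt(MSE));       % equation 5.2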
Training is conducted for a representative class of images using the Levenberg-Marquardt algorithm.
Once the weight matrices have been appropriately selected, any image can be quickly encoded using the Wh matrix and then decoded (reconstructed) using the Wy matrix.
Levenberg Marquardt Algorithm
The Levenberg Marquardt algorithm is a variation of Newton’s method that was
designed for minimizing functions that are sums of squares of other nonlinear functions.
This is very well suited to neural network training where the performance index is the mean
squared error.
Basic Algorithm:
Consider the form of Newton's method where the performance index is a sum of squares. Newton's method for optimizing a performance index F(x) is

Xk+1 = Xk − Ak⁻¹ gk   ………… 5.3

where Ak = ∇²F(x)|x=Xk and gk = ∇F(x)|x=Xk.
It is assumed that F(x) is a sum of squares function:

F(x) = Σ (i = 1 … n) vi²(x) = vᵀ(x) v(x)   ………… 5.4

Then the j'th element of the gradient would be

[∇F(x)]j = ∂F(x)/∂xj = 2 Σ (i = 1 … n) vi(x) ∂vi(x)/∂xj   ………… 5.5
The gradient can be written in matrix form:

∇F(x) = 2 Jᵀ(x) v(x)   ………… 5.6

where J(x) is the Jacobian matrix.
Next the Hessian matrix is considered. The (k, j) element of the Hessian matrix would be

[∇²F(x)]k,j = ∂²F(x) / ∂xk ∂xj

The Hessian matrix can then be expressed in matrix form:

∇²F(x) = 2 Jᵀ(x) J(x) + 2 S(x)

where

S(x) = Σ (i = 1 … n) vi(x) ∇²vi(x)

Assuming that S(x) is small, the Hessian matrix is approximated as

∇²F(x) ≈ 2 Jᵀ(x) J(x)
Substituting the values of ∇²F(x) and ∇F(x), we obtain the Gauss-Newton method:

Xk+1 = Xk − [Jᵀ(Xk) J(Xk)]⁻¹ Jᵀ(Xk) v(Xk)

One problem with Gauss-Newton over the standard Newton's method is that the matrix H = JᵀJ may not be invertible. This can be overcome by using the following modification to the approximate Hessian matrix:

G = H + µI

This leads to the Levenberg-Marquardt algorithm:

Xk+1 = Xk − [Jᵀ(Xk) J(Xk) + µkI]⁻¹ Jᵀ(Xk) v(Xk)

or

ΔXk = − [Jᵀ(Xk) J(Xk) + µkI]⁻¹ Jᵀ(Xk) v(Xk)

This algorithm has the very useful feature that as µk is increased it approaches the steepest descent algorithm with a small learning rate.
The iterations of the Levenberg-Marquardt back-propagation algorithm (LMBP) can be summarized as follows:
1. Present all inputs to the network and compute the corresponding network outputs and the errors eq = tq − aMq (the final-layer output for input q). Compute the sum of squared errors over all inputs, F(x):
F(x) = Σq eqᵀ eq = Σq,j (ej,q)² = Σi (vi)²
2. Compute the Jacobian matrix. Calculate the sensitivities with the recurrence relation and augment the individual matrices into the Marquardt sensitivities.
3. Obtain ΔXk from the Levenberg-Marquardt equation above.
4. Recompute the sum of squared errors using Xk + ΔXk. If this new sum of squares is smaller than that computed in step 1, then divide µ by v, let Xk+1 = Xk + ΔXk and go back to step 1. If the sum of squares is not reduced, then multiply µ by v and go back to step 3.
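The heart of each iteration is the damped Gauss-Newton step given above. The following generic sketch applies it to a small made-up least-squares problem; the residual function, Jacobian, damping value and adjustment factor are all assumptions for illustration, not the project's actual network computation:

% Sketch of the Levenberg-Marquardt adjustment for a generic least-squares problem.
mu = 0.01; vfac = 10;                         % assumed damping parameter and adjustment factor
x = [1; 1];                                   % stand-in parameter vector
v = @(x) [x(1)^2 - 2; x(2) - 1];              % hypothetical residual vector v(x)
J = @(x) [2*x(1) 0; 0 1];                     % its Jacobian J(x)
for k = 1:20
    dx = -(J(x)'*J(x) + mu*eye(2)) \ (J(x)'*v(x));   % dXk from the LM equation
    if sum(v(x + dx).^2) < sum(v(x).^2)
        x = x + dx;  mu = mu/vfac;            % step accepted: behave more like Gauss-Newton
    else
        mu = mu*vfac;                         % step rejected: behave more like steepest descent
    end
end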
Training Procedure
During the training procedure, data from a representative image or class of images is encoded into the structure of the hidden and output weight matrices.
It is assumed that the image, F, used in training is of size R × C and consists of r×c blocks.
1. The first step is to convert the block matrix F into a matrix X of size P × N containing the training vectors, x(n), formed from the image blocks. That is:
P = r·c and P·N = R·C
2. The target data is made equal to the input data, that is:
D = X
3. The network is then trained until the mean squared error, MSE, is sufficiently small.
The matrices Wh and Wy will subsequently be used in the image encoding and decoding steps.
Image Encoding
The hidden half of the two-layer network is used to encode images. The encoding procedure can be described as follows:
F → X,  H = σ(Wh · X)
where X is the block-vector form of the image F and H is the encoded image.
Image Decoding
The image is decoded (reconstructed) using the output half of the two-layer network. The decoding procedure is described as follows:
Y = σ(Wy · H),  Y → F
where σ denotes the corresponding layer's transfer function.
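As a sketch of how these two halves can be applied explicitly once training has finished, the weight and bias matrices can be pulled out of the trained network object (net_s and in3 refer to the program in the next section; note that the network also uses bias vectors, which the equations above omit):

% Sketch: explicit encoding and decoding with the trained two-layer network net_s.
X  = in3;                                % P x N matrix of scaled block vectors (from blkM2vc)
Wh = net_s.IW{1,1};  b1 = net_s.b{1};    % hidden (compression) layer weights and biases
Wy = net_s.LW{2,1};  b2 = net_s.b{2};    % output (decompression) layer weights and biases
N  = size(X, 2);
H  = tansig(Wh*X + b1*ones(1, N));       % encoded image: L x N with L < P
Y  = purelin(Wy*H + b2*ones(1, N));      % reconstructed block vectors: P x N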
These steps were performed using MATLAB (Matrix Laboratory). The compression so obtained was through off-line learning. In off-line learning methods, once the system enters the operation mode, its weights are fixed and do not change any more.
LEVENBERG-MARQUARDT ALGORITHM
6. IMPLEMENTATION OF IMAGE COMPRESSION USING
MATLAB
A sample image was taken as the input to be compressed. A 64 × 64 block of pixels, (1:64, 1:64), was considered. Using the blkM2vc function, the image was arranged column-wise into vectors of 4 × 4 blocks. The target was made equal to the input and the matrix was scaled down to the range 0–1. The network was developed with 4 neurons in the first layer (compression) and 16 neurons in the second layer (decompression).
The first layer used the tangent sigmoid transfer function and the second layer used the linear transfer function. Training was then performed using the Levenberg-Marquardt algorithm. The training goal and the maximum number of epochs were set with the following parameters:
net_c.trainParam.goal = 1e-3;
net_c.trainParam.epochs = 100;
After this the network was simulated and its output was compared against the target. The function used for this purpose was:
A = sim(net_s, in3);
Rearranging of the matrix back into blocks was done using the function vc2blkM, followed by scaling up.
MATLAB CODE
comp.m
I = imread('J:\matlab\toolbox\images\imdemos\autumn.tif');
size(I)
image(I)
in1 = I(1:64,1:64);                  % take a 64 x 64 block of the image
figure(1)
r = 4;                               % block size (r x r)
imshow(in1)
in2 = blkM2vc(in1,[r r]);            % arrange 4 x 4 blocks as 16-element columns
in3 = in2/255;                       % scale pixel values down to the range 0-1
in4 = in3;                           % target equals the input
net_c = newff(minmax(in3),[4 16],{'tansig','purelin'},'trainlm');
net_c.trainParam.show = 5;           % training parameters are set on the network object net_c
net_c.trainParam.epochs = 300;
net_c.trainParam.goal = 1e-5;
[net_s,tr] = train(net_c,in3,in4);   % train with Levenberg-Marquardt
a = sim(net_s,in3);                  % simulate the trained network
fr = vc2blkM(a,r,64);                % rearrange the columns back into a 64 x 64 image
asc = fr*255;                        % scale back up
az = uint8(asc);
figure(2)
imshow(az)
disp('training is achieved');
disp('consider a new image to be compressed')
II = imread('J:\matlab\toolbox\images\imdemos\fabric.png');
a1 = II(1:64,1:64);
figure(5)
imshow(a1)
a2 = blkM2vc(a1,[r r]);
a3 = a2/255;
out = sim(net_s,a3);                 % compress and decompress the new image
a4 = vc2blkM(out,r,64);
a5 = a4*255;
a6 = uint8(a5);
figure(6)
imshow(a6);
blkM2vc.m
function vc = blkM2vc(M, blkS)
%blkM2vc  Rearrange the r x c blocks of matrix M into the columns of vc.
%   Each r x c block of M becomes one rc-element column of vc.
[rr cc] = size(M);
r = blkS(1);
c = blkS(2);
if (rem(rr, r) ~= 0) | (rem(cc, c) ~= 0)
    error('blocks do not fit into matrix')
end
nr = rr/r;                          % number of block rows
nc = cc/c;                          % number of block columns
rc = r*c;                           % pixels per block
vc = zeros(rc, nr*nc);
for ii = 0:nr-1
    % reshape one strip of blocks (r rows of M) into rc-element columns
    vc(:,(1:nc)+ii*nc) = reshape(M((1:r)+ii*r,:),rc,nc);
end
vc2blkM.m
function M = vc2blkM(vc, r, rM)
%vc2blkM  Reshape a matrix vc of rc-element column vectors into a block matrix M
%   of size rM by cM. Each rc-element column of vc is converted into an r by c
%   block of M and placed as a block-row element.
[rc nb] = size(vc);
disp(rc);                           % diagnostic display of block length
disp(nb);                           % diagnostic display of number of blocks
pxls = rc*nb;                       % total number of pixels
if ( (rem(pxls, rM) ~= 0) | (rem(rM, r) ~= 0) )
    error('incorrect number of rows of the matrix')
end
cM = pxls/rM;
if ( (rem(rc, r) ~= 0) | (rem(nb*r, rM) ~= 0) )
    error('incorrect block size')
end
c = rc/r;
xM = zeros(r, nb*c);
xM(:) = vc;                         % unstack each column into an r x c block
nrb = rM/r;                         % number of block rows in M
M = zeros(rM, cM);
for ii = 0:nrb-1
    M((1:r)+ii*r, :) = xM(:, (1:cM)+ii*cM);
end
Functions used in MATLAB program:
newff
Create a feed-forward back propagation network
Syntax
net = newff
net = newff(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF)
Description
net = newff creates a new network with a dialog box.
newff(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF) takes,
PR -- R x 2 matrix of min and max values for R input elements
Si -- Size of ith layer, for Nl layers
TFi -- Transfer function of ith layer, default = 'tansig'
BTF -- Backpropagation network training function, default = 'trainlm'
BLF -- Backpropagation weight/bias learning function, default = 'learngdm'
PF -- Performance function, default = 'mse'
and returns an N layer feed-forward backprop network.
The transfer functions TFi can be any differentiable transfer function such as tansig, logsig,
or purelin.
The training function BTF can be any of the backprop training functions such as trainlm, trainbfg, trainrp, traingd, etc.
Caution: trainlm is the default training function because it is very fast, but it requires a lot of memory to run. If you get an "out-of-memory" error when training, try doing one of these:
1. Slow trainlm training, but reduce memory requirements, by setting net.trainParam.mem_reduc to 2 or more. (See help trainlm.)
2. Use trainbfg, which is slower but more memory-efficient than trainlm.
3. Use trainrp, which is slower but more memory-efficient than trainbfg.
The learning function BLF can be either of the backpropagation learning functions such as
learngd or learngdm.
The performance function can be any of the differentiable performance functions such as
mse or msereg.
Algorithm
Feed-forward networks consist of Nl layers using the dotprod weight function, netsum net
input function, and the specified transfer functions.
The first layer has weights coming from the input. Each subsequent layer has a weight
coming from the previous layer. All layers have biases. The last layer is the network output.
Each layer's weights and biases are initialized with initnw.
Adaption is done with trains, which updates weights with the specified learning function.
Training is done with the specified training function. Performance is measured according to
the specified performance function.
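For reference, the newff call used in this project (see comp.m above) maps onto these arguments as follows; the annotations are explanatory comments only:

net_c = newff(minmax(in3), ...            % PR: min/max of every row of the training matrix in3
              [4 16], ...                 % S1 = 4 hidden (compression) neurons, S2 = 16 output neurons
              {'tansig','purelin'}, ...   % TF1, TF2: layer transfer functions
              'trainlm');                 % BTF: Levenberg-Marquardt training function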
trainParam
This property defines the parameters and values of the current training function.
net.trainParam
The fields of this property depend on the current training function (net.trainFcn). Evaluate
the above reference to see the fields of the current training function.
train
Train a neural network
Syntax
[net,tr,Y,E,Pf,Af] = train(net,P,T,Pi,Ai,VV,TV)
Description
train trains a network net according to net.trainFcn and net.trainParam.
train(NET,P,T,Pi,Ai,VV,TV) takes,
net -- Neural Network
P -- Network inputs
T -- Network targets, default = zeros
Pi -- Initial input delay conditions, default = zeros
Ai -- Initial layer delay conditions, default = zeros
VV -- Structure of validation vectors, default = []
TV -- Structure of test vectors, default = []
and returns,
net -- New network
TR -- Training record (epoch and perf)
Y -- Network outputs
E -- Network errors.
Pf -- Final input delay conditions
Af -- Final layer delay conditions
sim
Simulate a neural network
Syntax
[Y,Pf,Af,E,perf] = sim(net,P,Pi,Ai,T)
Description
sim simulates a neural network. It takes the network net, the network inputs P, optional initial input delay conditions Pi, initial layer delay conditions Ai and targets T, and returns the network outputs Y together with the final delay conditions Pf and Af, the network errors E and the network performance perf. For a static feed-forward network, only net and P are required. In this program the trained network is simulated with A = sim(net_s,in3), which returns the reconstructed (decompressed) block vectors for the input matrix in3.
uint8
Convert data to unsigned 8-bit integers
Syntax
I = uint8(X)
Description
uint8(X) converts the elements of the array X into unsigned 8-bit integers. Values are rounded to the nearest integer, and values outside the range [0, 255] saturate at 0 or 255. In this program uint8 converts the rescaled network output (a double-precision matrix) back into an 8-bit image that imshow can display.
MATLAB Results:
6.1 Training procedure until the MSE becomes less than 1e-5
ORIGINAL IMAGE
COMPRESSED IMAGE
DECOMPRESSED IMAGE
7. CONCLUSION
The project “IMAGE COMPRESSION AND DECOMPRESSION
USING NEURAL NETWORKS” has been successfully programmed using MATLAB and
tested.
The computing world has a lot to gain from neural networks. Their ability to learn
by example makes them very flexible and powerful. Furthermore there is no need to devise
an algorithm in order to perform a specific task; i.e. there is no need to understand the
internal mechanisms of that task. They are also very well suited for real-time systems because of their fast response and computational times, which are due to their parallel
architecture. Neural networks also contribute to other areas of research such as neurology
and psychology. They are regularly used to model parts of living organisms and to
investigate the internal mechanisms of the brain. Perhaps the most exciting aspect of neural
networks is the possibility that some day 'conscious' networks might be produced. A number of scientists argue that consciousness is a 'mechanical' property and that 'conscious' neural networks are a realistic possibility.
Even though neural networks have huge potential, we will only get the best out of them when they are integrated with computing, AI, fuzzy logic and related subjects. Neural networks perform successfully where other methods do not, recognizing and matching complicated, vague, or incomplete patterns.
FUTURE SCOPE:
Artificial neural networks are currently a hot research area in image processing, and it is believed that they will receive extensive application in various fields over the next few years. In contrast with other technologies, neural networks can be used in almost every field, such as medicine, marketing and industrial process control. This makes our application flexible, so that it can be extended to any field of interest. Integrated with other fields like artificial intelligence and fuzzy logic, neural networks have huge potential. Neural networks have been applied to solving a wide variety of problems. It is an emerging and fast-growing field, and there is a huge scope for research and development.
BIBLIOGRAPHY
1. H. Demuth and M. Beale, Neural Network Toolbox User's Guide (for use with MATLAB), The MathWorks Inc., 1998.
2. H. Ossoinig, E. Reisinger and R. Weiss, Design and FPGA-Implementation of a Neural Network.
3. K. Z. Pekmestzi, Multiplexer-Based Array Multipliers, IEEE Transactions on Computers, Vol. 48, January 1998.
4. J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 1990.
5. J. Jiang, Image compression with neural networks, Signal Processing: Image Communication 14 (1999) 737-760.