Lossless, Data-Mining Ready Image Compression using P-tree¹
Mohammad Kabir Hossain, Fazle Rabbi, and William Perrizo*
Department of Computer Science and Engineering, North South University, Dhaka 1213, Bangladesh
*Department of Computer Science, North Dakota State University, Fargo, ND 58105, USA
Abstract: Application areas such as remote sensing, geographical information systems and medical imaging produce and process images of colossal size, which in their original form require a large amount of storage space or high bandwidth for communication. Image compression techniques can be highly effective for such applications [1]. Lossless image compression techniques retain the original information in a compact form, whereas lossy compression techniques do not and may also introduce visual artifacts. In this paper, a new lossless image compression technique is proposed that exploits the benefits of the Peano count tree (P-tree), a spatial data structure that provides an efficient, data-mining-ready representation of data. Data mining applications would especially benefit from such a compression scheme, because complete reconstruction of the original image is possible and the compressed data itself is data-mining ready.
Keywords: P-tree, Peano ordering, Z-ordering, Lossless
image compression, Linearization, Spatial data mining.
1. INTRODUCTION
Image compression plays a vital role in applications such as geographical information systems, video-conferencing, satellite imaging, medical imaging and facsimile transmission, which depend on the efficient manipulation, storage and transmission of binary, gray-scale or color images [2]. There are two categories of image compression techniques: lossless and lossy. In a lossless compression scheme the original image can be reconstructed exactly, whereas in a lossy scheme only a close approximation of the original image can be obtained. When the image is also to be used for data mining, only one of the two choices remains: lossless image compression. In this paper we propose a new technique for image compression using the P-tree. Image data encoded as a P-tree structure gives a lossless compressed image that can, at the same time, readily be used in data mining if required. The rest of the paper is organized as follows. Section 2 reviews a generalized lossless compression scheme. Section 3 gives a glimpse of row-major scan linearization. Section 4 describes the P-tree structure and its variations. Sections 5 and 6 discuss the encoding scheme and experimental results respectively. Section 7 argues why this compression scheme should be adopted. The conclusion is given in Section 8.
¹ P-tree technology is patent pending at North Dakota State University.
2. REVIEW OF LOSSLESS COMPRESSION
There are two types of data redundancy that lossless image compression can exploit: coding redundancy and inter-pixel redundancy. Eliminating them leads to a more compact representation of the information. Practical lossless image compression systems usually combine techniques for both to achieve better compression ratios. A typical compression system consists of two parts: an encoder and a decoder. The input image f(x,y) is fed into the encoder, which produces a set of symbols g(f(x,y)) describing the image. When required, this set of symbols is fed into the decoder, which generates a reconstructed output image. Since we are dealing with lossless compression, the reconstructed image is an exact replica of f(x,y). The encoder is responsible for reducing or eliminating any coding or inter-pixel redundancies present in the image. This is done by two independent operations, each dealing with one type of redundancy. In the first stage of encoding, inter-pixel redundancies are reduced or eliminated by the mapper. The resulting data still contains coding redundancies, which are reduced or eliminated in the second stage by the symbol encoder. Generally, the mapper and the symbol encoder implement two independent algorithms, and to operate in one system they need to agree on the format of the data interchanged between them. The decoder works in reverse order, applying first symbol decoding (the inverse of symbol encoding) and then the inverse mapper to recover the original image f(x,y).
Our proposed technique works in much the same way: constructing the P-tree from the image data corresponds to mapping, and storing it in a file corresponds to encoding.
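The two-stage structure described above can be sketched in a few lines. This is only an illustration of the mapper / symbol encoder split, not the scheme proposed in this paper: the mapper is assumed to be a simple left-neighbor difference, Python's zlib module stands in for the symbol encoder, and all function names are hypothetical.

```python
import zlib


def mapper(rows):
    """Stage 1 (assumed mapper): replace each pixel by its difference
    from the left neighbor, modulo 256, to reduce inter-pixel redundancy."""
    out = bytearray()
    for row in rows:
        prev = 0
        for pixel in row:
            out.append((pixel - prev) % 256)
            prev = pixel
    return bytes(out)


def inverse_mapper(data, width):
    """Undo the differencing and rebuild rows of the given width."""
    rows = []
    for start in range(0, len(data), width):
        prev = 0
        row = []
        for diff in data[start:start + width]:
            prev = (prev + diff) % 256
            row.append(prev)
        rows.append(row)
    return rows


def encode(rows):
    # Stage 2: zlib is a stand-in for the symbol encoder (an assumption).
    return zlib.compress(mapper(rows))


def decode(blob, width):
    # Reverse order: symbol decoding first, then the inverse mapper.
    return inverse_mapper(zlib.decompress(blob), width)


image = [[10, 11, 12, 13], [10, 11, 12, 13], [20, 21, 22, 23]]
restored = decode(encode(image), width=4)
```

The only contract between the two stages is the byte string passed from mapper to encode, which is the agreed interchange format the text refers to.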
Figure 1(a). Encoding: data -> Mapper -> mapped data -> Encoder -> encoded data
Figure 1(b). Decoding: encoded data -> Decoder -> decoded data -> Inverse Mapper -> original data
Figure 3. 8-bit by 8-bit image and its P-tree
3. LINEARIZATION
When compression schemes such as Huffman coding, arithmetic coding or LZW coding are used to compress a two-dimensional image, the image must first be converted into a one-dimensional sequence. This conversion is referred to as linearization [2]. Row-major scan, depicted in Figure 2, is one of the popular linearization schemes, and it is the one the proposed compression technique adopts.
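As a minimal sketch of row-major scan linearization (the function names are illustrative and do not come from the paper), the image is read one row at a time, left to right, top to bottom:

```python
def row_major(image):
    """Flatten a 2-D image (list of rows) into a 1-D sequence, row by row."""
    return [pixel for row in image for pixel in row]


def delinearize(sequence, width):
    """Rebuild the 2-D image from the 1-D sequence, given the row width."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


image = [[1, 2, 3],
         [4, 5, 6]]
flat = row_major(image)  # [1, 2, 3, 4, 5, 6]
```

The row width has to be kept alongside the sequence, since the flattened data alone does not say where one row ends and the next begins.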
Figure 2. Row-major scan linearization
The concept of linearization is somewhat loose here, however: although scanning is performed in row-major sequence, compression still operates in two dimensions. Thus, local redundancy, or similarity among neighboring pixels, is eliminated along both dimensions. This is due to the P-tree structure, which is explained in Section 4.
4. PROPOSED MAPPER
In this section we introduce the P-tree, a relatively new data structure that is used as the mapper of the image data to be encoded.
4.1 Introducing P-tree
A P-tree is a quadrant-based tree. It recursively divides the entire image into quadrants and records the count of 1-bits for each quadrant, thus forming a quadrant count tree. P-trees are somewhat similar in construction to other data structures in the literature (e.g. Quadtrees [3] and HHCodes [4]).
For example, given an 8-row by 8-column image of single bits, its P-tree is as shown in Figure 3.
In this example, 36 is the number of 1's in the entire image. This root level is labeled level 0. The numbers 16, 7, 13 and 0 found at the next level (level 1) are the 1-bit counts for the four major quadrants in raster order, or Z order (upper left, upper right, lower left and lower right). Since the first level-1 quadrant (count 16) is composed entirely of 1-bits (a pure-1 quadrant), no sub-tree is needed and this branch terminates. Similarly, quadrants composed entirely of 0-bits, such as the last level-1 quadrant (count 0), are called pure-0 quadrants, and they also terminate tree branches.
This pattern is continued recursively using the Peano, or Z, ordering (recursive raster ordering) of the four sub-quadrants at each new level. Eventually, every branch terminates (since, at the "leaf" level, all quadrants are pure). If we were to expand all sub-trees, including those for pure quadrants, then the leaf sequence would be the Peano ordering of the image. Thus we use the name Peano Count Tree. More discussion of the P-tree can be found in [5].
4.1.1 Peano Mask Tree (PM-tree)
A variation of the P-tree data structure, the Peano Mask Tree (PM-tree), is a similar structure in which masks rather than counts are used. In a PM-tree, we use a three-valued logic to represent pure-1, pure-0 and mixed quadrants (1 denotes pure-1, 0 denotes pure-0 and m denotes mixed). The PM-tree for the previous example is given in Figure 4. We can easily reconstruct the original P-tree from its PM-tree by calculating counts from the leaves to the root in a bottom-up fashion [6]. Since the PM-tree is just an alternative implementation of a Peano Count Tree, for simplicity we will use the same term, "P-tree", for the Peano Mask Tree.
Figure 4. 8 by 8 image and its Peano mask tree
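The construction of the count tree and its mask variant can be sketched as follows. The bit pattern of the figures is not reproduced in the text, so the 8 by 8 image below is a hypothetical one, chosen only so that its level-1 counts match the values 16, 7, 13 and 0 quoted in Section 4.1. The tuple representation and function names are likewise assumptions made for illustration, and the image side is assumed to be a power of two.

```python
# Hypothetical 8x8 bit image (not the one from the paper's figure).
IMAGE = [
    "11111110",
    "11111100",
    "11111000",
    "11111000",
    "11110000",
    "11110000",
    "11100000",
    "11000000",
]
BITS = [[int(ch) for ch in row] for row in IMAGE]


def build_ptree(bits, row=0, col=0, size=None):
    """Return (count, children); children is empty for a pure quadrant."""
    if size is None:
        size = len(bits)
    count = sum(bits[r][c]
                for r in range(row, row + size)
                for c in range(col, col + size))
    if count == 0 or count == size * size:  # pure-0 or pure-1: branch ends
        return (count, [])
    half = size // 2
    # Z order: upper left, upper right, lower left, lower right
    offsets = ((0, 0), (0, half), (half, 0), (half, half))
    return (count, [build_ptree(bits, row + dr, col + dc, half)
                    for dr, dc in offsets])


def to_pm_tree(node, size):
    """Replace counts by masks: '1' pure-1, '0' pure-0, 'm' mixed."""
    count, children = node
    if count == size * size:
        return ("1", [])
    if count == 0:
        return ("0", [])
    return ("m", [to_pm_tree(child, size // 2) for child in children])


def count_from_pm(pm_node, size):
    """Recover a quadrant's 1-bit count bottom-up from its PM-tree."""
    mask, children = pm_node
    if mask == "1":
        return size * size
    if mask == "0":
        return 0
    return sum(count_from_pm(child, size // 2) for child in children)


root_count, level1 = build_ptree(BITS)
level1_counts = [count for count, _ in level1]           # [16, 7, 13, 0]
pm_root = to_pm_tree((root_count, level1), len(BITS))
level1_masks = [mask for mask, _ in pm_root[1]]          # ['1', 'm', 'm', '0']
```

For this image the root count is 36, the first quadrant is pure-1 and the last is pure-0, so only the two mixed quadrants carry sub-trees.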
5. ENCODING SCHEME
An image can be viewed as a two-dimensional array of pixels. Associated with each pixel are various descriptive attributes, called "bands" [7]. A typical RSI image contains at least seven bands (Blue, Green, Red, NIR, MIR, TIR and MIR2), while a TIFF or BMP image contains only three bands (Blue, Green and Red). Each band has an intensity value in the range 0 to 255. Thus, for TIFF and BMP images 24 bits are required per pixel.
Assume N bands are associated with each pixel. Each band Bi (i = 1, 2, 3, ..., N) is represented by a byte. For the jth bit of the ith band, a bitwise row-major linearization is performed and a two-dimensional array is generated: the jth bit of the ith band is selected from every pixel to construct the array, which has dimension n x n for an image of n x n pixels. The P-tree constructed over this array is known as the basic P-tree Pi,j. Thus for each band there are eight basic P-trees, one for each bit position, and an N-band image has 8N basic P-trees altogether. As far as the encoding scheme is concerned, the P-tree is not stored as a tree at all. Instead, each array is divided into quadrants recursively, using the same P-tree construction concept, and stored in a file in depth-first order as follows:
1. For a mixed quadrant, store the binary value 10
2. For a pure-1 quadrant, store the binary value 11
3. For a pure-0 quadrant, store the binary value 00
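The bit-plane extraction and the depth-first 2-bit code can be sketched as below. Several details are assumptions rather than statements from the paper: bit position j = 1 is taken to be the most significant bit, the code is kept as a string of '0' and '1' characters instead of packed bits, the array side is a power of two, and all function names and the sample band are hypothetical.

```python
Z_ORDER = ((0, 0), (0, 1), (1, 0), (1, 1))  # UL, UR, LL, LR


def bit_plane(band, j):
    """Array of the j-th bit of every pixel of one band.
    Assumption: j = 1 is the most significant bit, j = 8 the least."""
    shift = 8 - j
    return [[(value >> shift) & 1 for value in row] for row in band]


def encode(bits, row=0, col=0, size=None):
    """Depth-first code: 10 mixed, 11 pure-1, 00 pure-0."""
    if size is None:
        size = len(bits)
    count = sum(bits[r][c]
                for r in range(row, row + size)
                for c in range(col, col + size))
    if count == size * size:
        return "11"
    if count == 0:
        return "00"
    half = size // 2
    return "10" + "".join(
        encode(bits, row + dr * half, col + dc * half, half)
        for dr, dc in Z_ORDER)


def decode(code, size):
    """Rebuild the size x size bit array from its depth-first code."""
    bits = [[0] * size for _ in range(size)]
    pos = 0

    def fill(row, col, n):
        nonlocal pos
        symbol = code[pos:pos + 2]
        pos += 2
        if symbol == "10":                      # mixed: four sub-quadrants follow
            half = n // 2
            for dr, dc in Z_ORDER:
                fill(row + dr * half, col + dc * half, half)
        elif symbol == "11":                    # pure-1: set the whole quadrant
            for r in range(row, row + n):
                for c in range(col, col + n):
                    bits[r][c] = 1
        elif symbol != "00":                    # pure-0 needs no action
            raise ValueError("unexpected symbol: " + symbol)

    fill(0, 0, size)
    return bits


# Hypothetical 4x4 single-band image with intensities in 0..255.
band = [[200, 210, 40, 50],
        [220, 230, 60, 70],
        [10, 20, 30, 35],
        [15, 25, 33, 38]]
plane = bit_plane(band, 1)
code = encode(plane)        # "1011000000": 10 bits instead of 16
```

A 1-bit quadrant is always pure, so the decoder never needs special handling for the leaf level.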
It has been observed that the storage needed for the basic P-trees of the first four bit positions of each band is much less than the actual data, resulting in good compression. This happens because, in image data, neighboring pixels usually have similar properties: pixels that are close together share the same values in their high-order bits, whereas the low-order bits usually differ because of differences in precision. A basic P-tree Pi,j with j > 4 therefore contains more mixed quadrants, and more often than not the recursive division into quadrants continues until the pure-1 or pure-0 quadrants are 1 bit in size. In such cases the encoding takes more storage than the actual bits. So, for the four least significant bit positions we do not generate any arrays or basic P-trees; we store the bit values as they are in the original uncompressed image.
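The storage argument can be checked with a little arithmetic. Every quadrant visited costs 2 bits, so an array that stays mixed all the way down to 1-bit quadrants costs more than the raw bits. The sketch below uses two made-up 8 by 8 arrays (a checkerboard as a stand-in for a noisy low-order bit plane, and a half-and-half array as a stand-in for a smooth high-order plane); the function name is hypothetical.

```python
def encoded_bits(bits, row=0, col=0, size=None):
    """Number of bits the 2-bit depth-first code needs for a square 0/1 array."""
    if size is None:
        size = len(bits)
    count = sum(bits[r][c]
                for r in range(row, row + size)
                for c in range(col, col + size))
    if count == 0 or count == size * size:
        return 2                                 # one symbol for a pure quadrant
    half = size // 2
    return 2 + sum(encoded_bits(bits, row + dr, col + dc, half)
                   for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)))


raw_bits = 8 * 8                                                    # 64
checker = [[(r + c) % 2 for c in range(8)] for r in range(8)]       # noisy stand-in
smooth = [[1 if c < 4 else 0 for c in range(8)] for r in range(8)]  # smooth stand-in

noisy_cost = encoded_bits(checker)   # (1 + 4 + 16 + 64) symbols * 2 bits = 170
smooth_cost = encoded_bits(smooth)   # (1 + 4) symbols * 2 bits = 10
```

In this worst case the code is more than twice the size of the raw plane, which is the situation that motivates storing the low-order bit positions uncompressed.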
7. PROSPECTS OF THE COMPRESSION SCHEME
Basic P-trees can be reconstructed from the compressed file trivially, since the file stores each P-tree in depth-first order. Once the basic P-trees have been created, we have a data-mining-ready structure that facilitates efficient data mining tasks. Previous work has demonstrated that the P-tree algebra can perform data mining techniques efficiently and effectively. The P-tree based decision tree induction method is significantly faster than existing classification methods [8]. The P-tree data structure allows Bayesian probability values to be computed efficiently, and Bayesian classification with P-trees has been used successfully on remotely sensed image data to predict yield in precision agriculture [9]. Experimental results showed that P-ARM, an efficient association rule mining algorithm based on P-tree techniques, achieves a significant improvement over the FP-growth and Apriori algorithms [8].
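As one small illustration of what "data-mining ready" can mean in practice, the 1-bit count of a bit plane can be read off the depth-first code directly, without rebuilding the pixel array. This is a sketch under assumptions: the code is given as a string of '0' and '1' characters using the symbols 10 (mixed), 11 (pure-1) and 00 (pure-0), the array side is a power of two, and the function name is hypothetical. It does not reproduce the P-tree algebra of the cited works.

```python
def count_ones(code, size):
    """Count the 1-bits of a size x size bit plane from its depth-first code."""
    pos = 0

    def visit(n):
        nonlocal pos
        symbol = code[pos:pos + 2]
        pos += 2
        if symbol == "11":            # pure-1 quadrant contributes n*n ones
            return n * n
        if symbol == "00":            # pure-0 quadrant contributes none
            return 0
        if symbol == "10":            # mixed: add up the four sub-quadrants
            return sum(visit(n // 2) for _ in range(4))
        raise ValueError("unexpected symbol: " + symbol)

    return visit(size)


# Code of a 4x4 plane whose upper-left 2x2 quadrant is all ones.
ones = count_ones("1011000000", 4)
```

Pure quadrants are counted in a single step regardless of their size, so the cost depends on the number of stored symbols rather than on the number of pixels.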
8. CONCLUSION
We are aware that it can be argued that the proposed compression scheme does not compress data as competitively as other successful lossless compression schemes. However, no other scheme has been proposed that achieves the following two objectives at the same time:
1) Data compression
2) Data-mining-ready structure.
Our proposed P-tree based compression achieves both, and thus has an advantage over other compression techniques.
REFERENCES
[1] Erickson BJ, Manduca A, Persons KR, et al., "Evaluation of irreversible compression of digitized posterior-anterior chest radiographs", J Digit Imaging 1997; 10(3): 97-102.
[2] B.C. Vemuri, S. Sahni, F. Chen, C. Kapoor, C. Leonard, and J. Fitzsimmons, "Lossless image compression". Available at http://citeseer.nj.nec.com/559352.html
[3] H. Samet, "Quadtree and related hierarchical data structure", ACM Computing Surveys, 16(2): 187-260, June 1984.
[4] HH-codes. Available at http://www.statkart.no/nlhdb/iveher/hhtext.htm, 03.10.2000
[5] W. Perrizo, Peano count tree lab notes, Technical report NDSU-CSOR-TR-01-1, Fargo, ND, 2001.
[6] M. K. Hossain and W. Perrizo, "Automatic fingerprint identification system using p-tree", Proceedings of 6th International Conference of Computer and Information Technology.
[7] Qiang Ding, William Perrizo, "Cluster analysis of spatial data using peano count tree", Proceedings of CATA2002, San Francisco, USA, April 4-6, 2002.
[8] Qiang Ding, Qin Ding and William Perrizo, "Decision tree classification of spatial data streams using peano count trees", Proceedings of ACM Symposium on Applied Computing (SAC '02), Madrid, Spain, March 2002, pp. 413-417.
[9] Mohamed Hossain, "Bayesian Classification using P-Tree", Master of Science Thesis, North Dakota State University, December 2001.