Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Lossless, Data-Mining ready Image Compression using P-tree1 Mohammad Kabir Hossain, Fazle Rabbi, and William Perrizo* Department of Computer Science and Engineering, North South University, Dhaka 1213, Bangladesh *Department of Computer Science, North Dakota State University, Fargo, ND 58105, USA Emails: [email protected], [email protected], [email protected] Abstract: Application areas like remote sensing, geographical information system, medical imaging etc. produce and process images of colossal size which require a large amount of storage space 1or high bandwidth for communication in its original form. Image compression techniques can be highly effective for such applications [1]. Lossless image compression techniques retain original information in a compact form whereas lossy compression techniques don't and at the same time might introduce visual artifacts. In this paper, a new lossless image compression technique is proposed which exploits the benefits of Peano count tree, a spatial data stucture providing an efficient data mining ready representation of data. Application areas in data mining would be especially benefited from such compression scheme as complete reconstruction of the original image is possible and the compressed data itself is data mining ready. Keywords: P-tree, Peano ordering, Z-ordering, Lossless image compression, Linearization, Spatial data mining. 1. INTRODUCTION Image compression plays a very vital role in applications like geographical information system, video-conferencing, satellite imaging, medical imaging, facsimile transmission etc. which depend on the efficient manipulation, storage and transmission of binary, gray scale or color images [2]. There are two categories of image compression techniques: lossless and lossy. In lossless compression scheme the original image can be reconstructed, whereas in lossy scheme only a close approximation of the original image can be obtained. In this context, if data mining comes into play its part then we are left with only choice of the twolossless image compression. In this paper we are proposing a new technique for image compression using P-tree. Image data when encoded as P-tree sturcture gives a lossless compressed image, which at the same time can readily be used in data mining if required. The rest of the paper is organized as follows. In section 2, we reviewed and genaralized lossless compression scheme. In section 3 a glimpse of row-major scan linearization is given. Section 4 describes P-tree stucture and its variations. Section 5 and section 6 discuss the encoding scheme and experimental results respectively. Section 7 argues why this compression scheme should be adopted. Conclusion is given in section 8. 1 P-tree technology is patent pending at North Dakota State University 2. REVIEW OF LOSSLESS COMPRESSION There are two types of data redundancies, which can be exploited by lossless image compression: coding redundancy and inter-pixel redundancy. Elimination of them leads to more compact information representation. Usually, practical lossless image compression systems combine these techniques to achieve better compression ratios. Typical compression system consists of two parts: encoder and decoder. The input image f(x,y) is fed into encoder producing a set of symbols g(f(x,y)) describing the image. Then this set of symbols when required is fed into decoder, where, a reconstructed output image is generated. Since we are dealing with lossless compression, f(x,y) is an exact replica of. The encoder is responsible for reducing or eliminating any coding or inter-pixel redundancies presented in the image. This is done by two independent operations, each one dealing with certain type of redundancy. In the first stage of encoding inter-pixel redundancies are reduced or eliminated by mapper. The resulting data still contains coding redundancies, which are reduced or eliminated in the second stage by symbol encoder. Generally, mapper and encoder implement two independent algorithms, and to operate in one system they need to agree on the format of data interchanged between them. The decoder works in reverse order, applying firstly symbol decoding (inverse operation to symbol encoding) and then inverse mapper to get the original image f(x,y). Our proposed technique works much the same way. Construction of p-tree from image data resembles mapping and storing it in file resembles encoding. (a) data Mapper mapped data Encoder encoded data Figure 1(a). Encoding (b) encoded data decoded dat data Decoder Inverse Mapper original data Figure 3. 8-bit by 8-bit image and its p-tree Figure 1(b). Decoding 3. LINEARIZATION When compression schemes such as Huffman coding, Arithmetic coding, LZW coding are used to compress twodimensional image, the image first must be converted into a one-dimensional sequence. This conversion is reffered to as linearization [2]. Row-major scan as depicted in figure 2 is one of the popular linearization schemes which the proposed compression technique adopts. In this example, 36 is the number of 1's in the entire image. This root level is labeled level 0. The numbers 16, 7, 13, and 0 found at next level (level 1) are the 1-bit count for the four major quadrants in raster order, or Z order (upper left, upper right, lower left and lower right). Since the first and last level-1 quadrants are composed entirely of 1-bits (called pure-1 quadrants), sub-trees are not needed, and these branches terminate. Similarly, quadrants composed entirely of 0-bits are called pure-0 quadrants, which also cause termination of tree branches. This pattern is continued recursively using Peano, or Zordering (recursive raster ordering), of the four subquadrants at each new level. Eventually, every branch terminates (since, at the "leaf" level, all quadrants are pure). If we were to expand all sub-trees, including those for pure quadrants, then the leaf sequence would be the Peano-ordering of the image. Thus we use the name Peano Count Tree. More discussion on P-tree can be found in [5]. Figure 2. Row-major scan linearization But the concept of linearization is somewhat vague here. Because, though scanning is performed in row-major sequence, compression is still performed on twodimension. Thus, elimination of local redundancy or similarity in neighboring pixels occurs along both dimension. This happens due to P-tree structure, which will be explained in section 4. 4. PROPOSED MAPPER In this section, a relatively new data structure p-tree is introduced which has been used as the mapper of image data to be encoded. 4.1.1 Peano Mask Tree (PM-tree) A variation of the P-tree data structure, the Peano Mask Tree (PM-tree), is similar structure in which masks rather than counts are used. In a PM-tree, we use a three-value logic to represent pure-1, pure-0 and mixed quadrants (A 1 denotes pure-1; 0 denotes pure-0; and m denotes mixed). The PM-tree for the previous example is also given in Figure 4. We can easily construct the original P-tree from its PM-tree by calculating counts from leaves to the root in a bottom-up fashion [6]. Since PM-tre is just and alternative implementation for a Peano Count Tree, for simplicity we will use the same term, "P-tree," for Peano Mask Tree. 4.1 Introducing P-tree A P-tree is a quadrant-based tree. It recursively divides the entire image into quadrants and records the count of 1-bits for each quadrant, thus forming a quadrant count tree. Ptrees are somewhat similar in construction to other data structures in the literature (e.g. Quadtrees[3] and HHCodes [4]). For example, given an 8-row-8-column image of single bits, its P-tree is as shown in Figure 3. Figure 4. 8 by 8 image and its peano mask tree 5. ENCODING SCHEME An image can be viewed as a two-dimensional array of pixels. Associated with each pixel are various descriptive attributes, called "bands" [7]. A typical RSI image contains at least seven bands (Blue, Green, Red, NIR, MIR, TIR and MIR2) while a TIFF or BMP image contains only three bands (Blue, Green and Red). Each band has intensity value in the range 0 to 255. Thus, for TIFF and BMP images 24 bits are required per pixel. Assume N bands are associated with each pixel. Each band Bi (i =1,2,3, ..., N) is represented by a byte. For jth bit of ith bands a bitwise row-major linearization is performed and a two-dimensional array is generated. The array has nxn dimension for n pixel image. That is, every j th bit of ith band is selected from every pixel to construct the array. The P-tree constucted over this array is known as basic Ptree Pi,j. Thus for each band, there are eight basic P-trees, one for each bit position. An N band image has altogether 8N basic P-trees. As far as the encoding scheme is concerned, the p-tree is not stored as a tree at all. Instead each array is divided into quadrants recursively using the same p-tree construction concept and stored in a file as follows in depth-first order: 1. For mixed quadrant store binary value 10 2. For pure-1 quadrant store binary value 11 3. For pure-0 quadrant store binary value 00 It has been observed that storage of basic p-trees for the first four bit positions of each band is much less than the actual data, resulting in good compression. This happens as usually in image data, neighborhood pixels have similar properties. Close pixels share the same bit values for high order bits. Low order bits because of precision difference usually have different values. Basic p-tree Pi,j where j4 introduces more mixed quadrants and more often than not recursive division of data into quadrants goes on till pure-1 or pure-0 quadrants are 1 bit long. In such cases, encoding takes more storage than the actual bits. So, for less four significant bit positions we do not generate any array or basic p-trees. We store the bit values as they are in original uncompressed image. 6. PROSPECTS OF THE COMPRESSION SCHEME Basic p-trees can be constructed from the compressed file trivially as the file maintains depth-first order in storing ptrees. Once basic p-trees have been created we get datamining ready structure that facilitates efficient data mining tasks. Previous works have demonstrated that the p-tree algebra can perform data mining techniques efficiently and effectively. The p-tree based decision tree induction method is significantly faster than existing classification methods [8]. P-tree data structure allows computing the Bayesian probability values efficiently. Bayesian classification with P-trees has been used successfully on remotely sensed image data to predict yield in precision agriculture [9]. Experimental results showed that using ptree techniques in an efficient association rule-mining algorithm, P-ARM has significant improvement compared to FP-growth and Apriori algorithms [8]. 8. CONCLUSION We are knowledgeable of the fact that it can be argued that proposed compression scheme doesn’t competitively compress data like other successful lossless compression schemes. But no other scheme has ever been proposed that achieves the two following objectives at the same time: 1) Data compression 2) Data mining ready structure. Our proposed p-tree based compression achieves both thus attains an upper hand over other compression techniques. REFERENCES [1] Erickson BJ, Manduca A, Persons KR, et. al. “Evaluation of irreversible compression of digitized posterior-anterior chest radiographs” J Digit Imaging 1997; 10(3): 97-102. [2] B.C. Vemuri, S. Sahni, F.Chen, C. Kapoor, C. Leonard, and J. Fitzsimmons "Lossless image compression". Availabel at http://citeseer.nj.nec.com/559352.html [3] H. Samet, “Quadtree and related hierarchical data structure”, ACM Computing Surveys, 16(2): 187-260, June 1984. [4] HH-codes. Available at http://www.statkart.no/nlhdb/iveher /hhtext.htm, 03.10.2000 [5] W. Perrizo, Peano count tree lab notes, Technical report NDSU-CSOR-TR-01-1, Fargo, ND, 2001. [6] M. K. Hossain and W. Perrizo, “Automatic fingerprint identification system using p-tree” Proceedings of 6th International Conference of Computer and Information technology [7] Qiang Ding, William Perrizo, “Cluster analysis of spatial data using peano count tree”, Proceedings of CATA2002, San Francisco, USA, April 4-6, 2002. [8] "Decision tree classification of spatial data streams using peano count trees", Quang Ding, Qin Ding and William Perrizo, Proceedings of ACM Symposium on Applied Computing (SAC' 02), Madrid, Spain, March 2002, pp. 413-417. [9] Mohamed Hossain, “Bayesian Classification using P-Tree”, Master of Science Thesis, North Dakota State University, December 2001.