
Joint Segmentation and Image Interpretation using Hidden Markov Models

Nidish Kamath, K. Sunil Kumar, U. B. Desai and Rakesh Dugud
Indian Institute of Technology - Bombay
Signal Processing and Artificial Neural Networks Laboratory
Department of Electrical Engineering, Powai, Mumbai 400 076, India.
[email protected]

Abstract

Image interpretation consists of interleaving the low-level task of image segmentation and the high-level task of interpretation. The idea is that the interpretation block guides the segmentation block, which in turn helps the interpretation block arrive at a better interpretation. In this paper, we develop a joint segmentation and image interpretation scheme using the notion of a joint hidden Markov model (HMM) for probabilistic modeling of spatial relationships. We find the optimal interpretation labels, which are nothing but the optimal state sequence of the HMM.

1. Introduction

Image interpretation is the process of understanding an image by identifying some important features in it and analyzing them according to their spatial relationships. The problem of image interpretation requires the use of both low-level (segmentation) and high-level (interpretation) vision tasks. Most of the early interpretation schemes assumed the availability of a good segmented image of the scene. In practice, however, obtaining a good segmented image is difficult for the simple reason that segmentation itself depends on, and hence is a function of, the output of interpretation. Bajcsy [1] discusses the need for integration of segmentation and interpretation. Modestino and Zhang [8] introduced the use of Markov random field (MRF) models for image interpretation.

In this paper, we propose a scheme based on hidden Markov models (HMM) for joint segmentation and image interpretation. Unlike most other work on interpretation we do not assume a priori knowledge of the segmented image.
In our approach segmentation and interpretation are interleaved. Like some interpretation schemes [8, 4, 6] we assume a priori domain knowledge, but unlike them we use a probabilistic framework for domain knowledge; namely, we assume that the HMMs for the various interpretation labels are known. Moreover, the domain knowledge for spatial relationships is characterized by a joint HMM. To our knowledge, the notion of a joint HMM is new. Thus, in our scheme the deterministic domain knowledge is replaced by the parameters of the HMM and the joint HMM.

The layout of the paper is as follows: In Section 2 we formulate the problem of image interpretation using HMMs. In Section 3 we describe the construction of the HMMs and also give a method of constructing clique functions. We describe the HMM based joint segmentation and interpretation scheme in Section 4. The proposed scheme is validated on real scenes in Section 5 and we conclude in Section 6.

This work is supported by a DTSR project on Scene Understanding.

2. Problem Formulation

The image interpretation problem involves (i) segmenting the image $Y$ to obtain ${}^{s}Y$ and (ii) using ${}^{s}Y$ along with the domain knowledge to interpret the image $Y$. First we construct the wavelet transform of $Y$ [7] ($Y^{-1} = D^{-1}_{Y,LL}$, $D^{-1}_{Y,HL}$, $D^{-1}_{Y,LH}$, $D^{-1}_{Y,HH}$). The low-pass filtered image $Y^{-1}$ is segmented and refined using the difference images ($D^{-1}_{Y,HL}$, $D^{-1}_{Y,LH}$, $D^{-1}_{Y,HH}$). A no-interpretation label is a possible label in our scheme and is used to refine the segmented image. Interpretation is carried out until no region is labeled no-interpretation.

The segmented image is interpreted using HMMs. Let $R = \{R_i\}_{i=1}^{N}$ represent the $N$ regions in the segmented image and $I = \{I_i\}_{i=1}^{N}$ the corresponding interpretations. If there are $M$ labels $L = \{L_i\}_{i=1}^{M}$, then $I_i \in L$, implying that there are $M^N$ possible interpretations. Let $\mathcal{H}(K)$ and $\mathcal{H}(R)$ represent the HMMs of the domain knowledge of interpretation labels and that of the regions respectively.
Then image interpretation can be formulated as a MAP estimation problem

$$I^{*}(R) = \arg\max_{I} P\big(I \mid \mathcal{H}(K), \mathcal{H}(R)\big) \tag{1}$$

Assuming $P(\cdot \mid \cdot)$ is an MRF and using the Hammersley-Clifford theorem [2], we can express the probability in (1) as a Gibbs distribution:

$$P\big(I \mid \mathcal{H}(K), \mathcal{H}(R)\big) = \frac{1}{Z} \exp\Big(-\sum_{c \in C} V_c\big(I; \mathcal{H}(K), \mathcal{H}(R)\big)\Big) \tag{2}$$

where $C$ represents the collection of all the cliques, $V_c(\cdot)$ is the clique function and $Z$ is the partition function. We expect the interpretation process to be less dependent on the domain knowledge because of a two level probabilistic structure. At one level, the interpretation is modeled as an MRF and at the other level the clique functions for the features are based on HMMs. Solution of our problem involves two basic steps: getting explicit expressions for the clique functions and using an optimization method to solve (2).

Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on December 2, 2008 at 23:56 from IEEE Xplore. Restrictions apply.

3. HMM for Clique Functions

We give an example of the construction of single node (e.g. shape) and two node clique functions using HMMs. The shape contour is represented as a sequence $S = \{s_i\}_{i=1}^{\ell}$, where $\ell$ is the total number of boundary points. Let $(r_i, \theta_i)$ be the polar representation of $s_i$. The representation is made scale invariant by normalizing. The discrete wavelet transform (DWT) of $r_i$ is found for scales $0, \ldots, n_r$ and that of $\theta_i$ for scales $0, \ldots, n_\theta$. The feature vector used for training the HMM is $X(i) = [r_i^{0}, \ldots, r_i^{n_r}, \theta_i^{0}, \ldots, \theta_i^{n_\theta}]^T$. $X(i)$ provides a representation of the object at different scales and is hence robust to data perturbation or noise in the data.
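As an illustration of the first stage of this feature construction, the scale-normalised polar representation of a boundary could be computed as in the sketch below. It is only a sketch under stated assumptions: the function name `polar_contour`, the use of the centroid as the polar origin, and normalisation by the mean radius are choices made here for illustration, since the text says only that the representation is normalised. The multi-scale DWT of the resulting sequences is not shown.

```python
import numpy as np

def polar_contour(points):
    """Return scale-normalised (r, theta) sequences for an ordered boundary.

    points: array-like of shape (n, 2) with (x, y) boundary coordinates.
    Assumptions for this sketch: the polar origin is the centroid of the
    boundary points, and r is normalised by its mean value.
    """
    pts = np.asarray(points, dtype=float)
    centred = pts - pts.mean(axis=0)            # move the centroid to the origin
    r = np.hypot(centred[:, 0], centred[:, 1])  # radial distance of each point
    theta = np.arctan2(centred[:, 1], centred[:, 0])
    return r / r.mean(), theta

# The same square at two different sizes yields the same representation.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
r_small, th_small = polar_contour(square)
r_big, th_big = polar_contour(10 * np.array(square))
```

Dividing by the mean radius makes the radial sequence independent of object size, which is the property the scale normalisation is meant to provide.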
Rotational invariance is achieved by considering all the rotated versions of $X(i)$, namely, $X^{j} \stackrel{\text{def}}{=} \{X(j), X(j+1), \ldots, X(j-2), X(j-1)\}$, where the indices wrap around the closed contour. Thus $\{X^{j}\}_{j=1}^{\ell}$, with $j$ running over all $\ell$ boundary points, would be the training sequence for the shape based HMM.

Spatial relationships in an image can be modeled using multiple node cliques. Deterministic domain knowledge parameters like the common perimeter ratio are used in [8, 4, 6]. In this paper, we propose a new method for modeling adjacency relationships using a joint HMM. Consider a two node clique. Let $R_1$, $R_2$ represent two adjacent regions in the segmented image. We build HMMs $\mathcal{H}_1$, $\mathcal{H}_2$ (footnote 1) for a specific feature (say shape) corresponding to $R_1$ and $R_2$ respectively. Since the regions are adjacent, there exists what we term a joint HMM, which gives information about the spatial relationship between $R_1$ and $R_2$ and which is different from $\mathcal{H}_1$ and $\mathcal{H}_2$ individually. It is obtained using the following scheme:

(i) Let $O_1$ and $O_2$ be two observation sequences generated using $\mathcal{H}_1$ and $\mathcal{H}_2$ respectively. Construct a vector $\xi = [P(O_1, I_2 \mid \mathcal{H}_2), P(O_2, I_1 \mid \mathcal{H}_1)]^T$; here $I_2$ denotes the optimal state sequence corresponding to the observation $O_1$, supposing that $O_1$ was generated by $\mathcal{H}_2$.

(ii) $\xi$ is assumed to model the spatial relationship between the two sequences $O_1$ and $O_2$. The generated $\xi$ is then used as the observation sequence for training the two node clique HMM. The number of states in this HMM is taken as the number of nodes in the clique.

The spatial relations can thus be modeled using joint HMMs.

4. The Joint Segmentation and Image Interpretation Scheme

The scheme is pictorially depicted in Figure 1.

Step 0 - Initialization: Given $\mathcal{H}(K)$, the a priori knowledge base HMMs for $M$ labels and $F$ features, and $Y$ (Figure 1(a)), the scene to be interpreted, construct $D^{-1}_{Y,LL}$, $D^{-1}_{Y,HH}$, $D^{-1}_{Y,HL}$ and $D^{-1}_{Y,LH}$ using a wavelet filter (Figure 1(b)).
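The decomposition in Step 0 can be illustrated with a one-level two-dimensional Haar transform, sketched below. The choice of the Haar filter, the function name `haar_dwt2`, and the convention used to name the HL and LH subbands are assumptions made for this sketch; the wavelet filter actually used is the one referred to in [7].

```python
import numpy as np

def haar_dwt2(image):
    """One-level 2-D Haar transform of an image with even height and width.

    Returns the subbands (LL, HL, LH, HH). The Haar filter and the
    subband naming convention are assumptions of this sketch.
    """
    y = np.asarray(image, dtype=float)
    # Low-pass and high-pass along each row, keeping every second sample.
    lo = (y[:, 0::2] + y[:, 1::2]) / np.sqrt(2)
    hi = (y[:, 0::2] - y[:, 1::2]) / np.sqrt(2)
    # Repeat the filtering along each column of both intermediate images.
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, hl, lh, hh

scene = np.arange(16, dtype=float).reshape(4, 4)
ll, hl, lh, hh = haar_dwt2(scene)
```

The LL subband is the half-resolution low-pass image that would be passed to the segmentation step, while the three detail subbands play the role of the difference images used for refining.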
Step I - Segmentation and refining: Segment $Y^{-1}$ using the k-means clustering algorithm (Figure 1(c)) and refine the segmented image ${}^{s}\tilde{Y}^{-1}$, (i) using $D^{-1}_{Y,HH}$, $D^{-1}_{Y,HL}$ and $D^{-1}_{Y,LH}$ and (ii) using a predefined threshold to merge segments whose area is less than a prespecified minimum area, to get ${}^{s}Y^{-1}$ (Figure 1(d)). For the details about refining please see [5].

Step II - Interpretation-Segmentation loop:

(1) Interpretation: The segments are interpreted using the knowledge base HMMs $\mathcal{H}(K)$ and $\mathcal{H}(R)$, the HMMs derived from the segmented image ${}^{s}Y^{-1}$. The clique functions are constructed as described in Section 3, and the interpretation task reduces to that of obtaining the optimal labels or, equivalently, determining the optimal state sequence corresponding to the observation sequence for all the regions. In our simulations we used the simulated annealing algorithm for the purpose of optimization. Recall that $L$ contains a no-interpretation label. A region $R$ is assigned a no-interpretation label when the distance between $\mathcal{H}(R)$ and $\mathcal{H}(K)$ exceeds a threshold (see Appendix A and [3]).

(2) Segmentation: In case any segment, say $R_i$, has the label no-interpretation, we merge it with one of the interpreted segments adjacent to it, chosen by the minimum HMM distance criterion. If

$$R_m = \arg\min_{R_j \in n(R_i)} \sum_{k} D\big(\mathcal{H}_k(R_i), \mathcal{H}_k(R_j)\big) \tag{3}$$

then merge region $R_i$ with $R_m$.

(3) Go back to Step II (Interpretation-Segmentation loop).

Step III - Coarse to fine resolution: Quadtree interpolate the segmented and interpreted images to obtain the final segmented and interpreted images (see Figure 1).

Figure 1. Joint segmentation and image interpretation scheme.

Footnote 1: Note that $\mathcal{H}_1 \in \mathcal{H}(R_1)$ and $\mathcal{H}_2 \in \mathcal{H}(R_2)$.

5. Experimental Results

The robustness of the proposed scheme was tested by conducting simulations on noisy images. Because of lack of space we give results for one set of images.
The knowledge base, consisting of the HMMs for the features most likely to occur in that class of scene images (in our scene images this would correspond to sky, path, tree, road), was constructed using a scene image (Figure 2) which is different from the scene (Figure 3) to be interpreted. Having acquired the knowledge base HMMs, the scheme described in Section 4 is adopted. Experimental results using the proposed scheme are given for a single scene (Figure 3), and the robustness of the proposed scheme was tested on Figure 4 by artificially adding Gaussian noise with 0 mean and variance 10. In each experiment, the discrete wavelet transform [7] of the scene was found (Figures 3(b), 4(b)). For each of the scene images considered the final segmented image is shown (Figures 3(c) and 4(c)). The segmented image in each case is to be interpreted based on the legends following each result.

Observe: (i) The algorithm results in a correctly segmented image (Figures 3, 4) and the segmented image is assigned labels from the set {sky, tree, path, road}, thus correctly interpreting the scene image. (ii) The correct interpretation obtained by the proposed scheme on a noisy scene image (Figure 4) demonstrates the robustness of the scheme to noise and hence to gray level variations.

Figure 2. (a) Image for generating knowledge base, (b) manually segmented image for finding knowledge base HMMs.

Figure 3. (a) Scene to be interpreted, (b) DWT of (a) and (c) final segmented image with interpretations. Legend: left tree, sky, road, path, right tree.

Figure 4. (a) Noisy scene image corrupted by Gaussian $N(0, 10)$, (b) DWT of (a) and (c) final segmented image with interpretation labels. Legend: left tree, sky, road, path, right tree.

6. Conclusion

A joint segmentation and image interpretation algorithm using HMMs is proposed. A method to obtain the joint HMM for multiple node cliques is introduced and is helpful in building the adjacency relationships. Future work on HMMs can involve (i) obtaining an optimal training scheme for HMM building, (ii) use of higher order statistical information in building HMMs, (iii) updating HMM databases on obtaining new knowledge data, and (iv) theoretical issues, which have not been looked into and can be explored further.

References

[1] R. Bajcsy, F. Solina, and A. Gupta. “Segmentation versus Object Representation–Are They Separable?”. Springer-Verlag, 1990.
[2] S. Geman and D. Geman. “Stochastic relaxation, Gibbs distribution, and Bayesian restoration of images”. IEEE Trans. on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.
[3] B. H. Juang and L. R. Rabiner. “A probabilistic distance measure between HMMs”. AT&T Technical Journal, 64(2):391–408, 1985.
[4] I. Y. Kim and H. S. Yang. “An integration scheme for image segmentation and labeling based on Markov random fields”. IEEE Trans. on Pattern Analysis and Machine Intelligence, 18:69–73, 1996.
[5] K. S. Kumar and U. B. Desai. “Joint segmentation and image interpretation”. Pattern Recognition, to appear, 1998.
[6] V. P. Kumar and U. B. Desai. “Image interpretation using Bayesian networks”. IEEE Trans. on Pattern Analysis and Machine Intelligence, 18:74–77, 1996.
[7] S. G. Mallat. “Multifrequency channel decompositions of images and wavelet models”. IEEE Trans. Acoustics, Speech and Signal Processing, 37:2091–2110, 1989.
[8] J. A. Modestino and J. Zhang. “A Markov random field model based approach to image interpretation”. IEEE Trans. on Pattern Analysis and Machine Intelligence, 14:606–615, 1992.

A. Distance measure between two HMMs

The distance between two HMMs is a measure

$$D'(\mathcal{H}_i, \mathcal{H}_j) = \lim_{T \to \infty} \frac{1}{T} \Big| \log P(O^{j}, Q_i \mid \mathcal{H}_i) - \log P(O^{j}, Q_j \mid \mathcal{H}_j) \Big| \tag{4}$$

where $O^{j}$ denotes an observation sequence generated by HMM $\mathcal{H}_j$, $Q_i$ denotes the optimal state sequence corresponding to $O^{j}$ against the HMM $\mathcal{H}_i$, and $Q_j$ denotes that against $\mathcal{H}_j$. In essence, (4) measures how well the observation sequence generated using one HMM matches against another HMM. However, (4) is not symmetric and hence a symmetric distance is obtained as

$$D(\mathcal{H}_i, \mathcal{H}_j) \stackrel{\text{def}}{=} \frac{1}{2} \big[ D'(\mathcal{H}_i, \mathcal{H}_j) + D'(\mathcal{H}_j, \mathcal{H}_i) \big]$$
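A finite-length estimate of the measure in (4) and of its symmetrised form might be computed as in the following sketch. It assumes discrete-emission HMMs stored as tuples `(pi, A, B)` with strictly positive entries, replaces the limit by a fixed sequence length, and uses hypothetical helper names (`sample`, `viterbi_logprob`, `directed_distance`, `symmetric_distance`); none of these details are specified in the text.

```python
import numpy as np

def sample(hmm, length, rng):
    """Draw an observation sequence from a discrete HMM (pi, A, B)."""
    pi, A, B = hmm
    state = rng.choice(len(pi), p=pi)
    obs = []
    for _ in range(length):
        obs.append(rng.choice(B.shape[1], p=B[state]))  # emit a symbol
        state = rng.choice(len(pi), p=A[state])         # move to next state
    return np.array(obs)

def viterbi_logprob(hmm, obs):
    """log P(O, Q* | H): joint log-probability along the optimal state path."""
    log_pi, log_A, log_B = (np.log(m) for m in hmm)
    delta = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # Best predecessor for every state, then emission of symbol o.
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[:, o]
    return float(np.max(delta))

def directed_distance(h_i, h_j, length=500, seed=0):
    """Finite-length estimate of (4), using a sequence generated by h_j."""
    obs = sample(h_j, length, np.random.default_rng(seed))
    return abs(viterbi_logprob(h_i, obs) - viterbi_logprob(h_j, obs)) / length

def symmetric_distance(h_i, h_j, **kwargs):
    """Average of the two directed estimates."""
    return 0.5 * (directed_distance(h_i, h_j, **kwargs)
                  + directed_distance(h_j, h_i, **kwargs))

# Two toy two-state, two-symbol HMMs (illustrative values only).
h_a = (np.array([0.6, 0.4]),
       np.array([[0.7, 0.3], [0.4, 0.6]]),
       np.array([[0.9, 0.1], [0.2, 0.8]]))
h_b = (np.array([0.5, 0.5]),
       np.array([[0.5, 0.5], [0.5, 0.5]]),
       np.array([[0.5, 0.5], [0.5, 0.5]]))
d_ab = symmetric_distance(h_a, h_b)
```

In a scheme like the one described, such a value would be compared with a threshold when assigning the no-interpretation label, and minimised over neighbouring regions in the merge rule (3).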
