Joint Segmentation and Image Interpretation using Hidden Markov Models
Nidish Kamath, K. Sunil Kumar, U. B. Desai and Rakesh Dugud
Indian Institute of Technology - Bombay
Signal Processing and Artificial Neural Networks Laboratory
Department of Electrical Engineering, Powai, Mumbai 400 076, India.
[email protected]
Abstract
Image interpretation consists of interleaving the low-level task of image segmentation and the high-level task of interpretation. The idea is that the interpretation block guides the segmentation block, which in turn helps the interpretation block produce a better interpretation. In this paper, we develop a joint segmentation and image interpretation scheme using the notion of a joint hidden Markov model (HMM) for probabilistic modeling of spatial relationships. We find the optimal interpretation labels, which are nothing but the optimal state sequence of the HMM.
1. Introduction
Image interpretation is the process of understanding an image by identifying important features in it and analyzing them according to their spatial relationships. The problem of image interpretation requires both low-level (segmentation) and high-level (interpretation) vision tasks. Most early interpretation schemes assumed the availability of a good segmented image of the scene. In practice, however, obtaining a good segmented image is difficult because segmentation itself depends on, and hence is a function of, the output of interpretation. Bajcsy [1] discusses the need for integration of segmentation and interpretation. Modestino and Zhang [8] introduced the use of Markov random field (MRF) models for image interpretation.
In this paper, we propose a scheme based on hidden Markov models (HMMs) for joint segmentation and image interpretation. Unlike most other work on interpretation, we do not assume a priori knowledge of the segmented image. In our approach segmentation and interpretation are interleaved. Like some interpretation schemes [8, 4, 6] we assume a priori domain knowledge, but unlike them we use a probabilistic framework for domain knowledge; namely, we assume that the HMMs for the various interpretation labels are known. Moreover, the domain knowledge for spatial relationships is characterized by a joint HMM. To our knowledge, the notion of a joint HMM is new. Thus, in our scheme the deterministic domain knowledge is replaced by the parameters of the HMMs and the joint HMM. The layout of the paper is as follows: In Section 2 we formulate the problem of image interpretation using HMMs. In Section 3 we describe the construction of the HMMs and also give a method of constructing clique functions. We describe the HMM based joint segmentation and interpretation scheme in Section 4. The proposed scheme is validated on real scenes in Section 5 and we conclude in Section 6.
This work is supported by a DTSR project on Scene Understanding.
2. Problem Formulation
The image interpretation problem involves (i) segmenting the image $Y$ to obtain ${}^{s}Y$ and (ii) using ${}^{s}Y$ along with the domain knowledge to interpret the image $Y$. First we construct the wavelet transform of $Y$ [7] ($Y^{-1} = D^{-1}_{Y;LL}$, $D^{-1}_{Y;HL}$, $D^{-1}_{Y;LH}$, $D^{-1}_{Y;HH}$). The low-pass filtered image $Y^{-1}$ is segmented and refined using the difference images ($D^{-1}_{Y;HL}$, $D^{-1}_{Y;LH}$, $D^{-1}_{Y;HH}$). A no-interpretation label is a possible label in our scheme and is used to refine the segmented image. Interpretation is carried out until no region is labeled no-interpretation. The segmented image is interpreted using HMMs.
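As an illustration of the decomposition into one low-pass image and three difference images, the following is a minimal sketch of a one-level 2-D transform. It assumes a Haar filter with averaging normalization, which is our choice for brevity; the paper does not name the wavelet filter it uses. The function name `haar_dwt2` is hypothetical, and the naming of the HL and LH sub-bands varies between texts.

```python
def haar_dwt2(image):
    """One-level 2-D Haar decomposition of a list-of-lists image.

    Assumes an even number of rows and columns. Returns (ll, hl, lh, hh),
    each with half the rows and half the columns of the input.
    """
    ll, hl, lh, hh = [], [], [], []
    for r in range(0, len(image), 2):
        ll_row, hl_row, lh_row, hh_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            # The 2x2 block [[a, b], [d, e]] maps to one coefficient per sub-band.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 4.0)  # low-pass in both directions
            hl_row.append((a - b + d - e) / 4.0)  # difference across columns
            lh_row.append((a + b - d - e) / 4.0)  # difference across rows
            hh_row.append((a - b - d + e) / 4.0)  # diagonal difference
        ll.append(ll_row)
        hl.append(hl_row)
        lh.append(lh_row)
        hh.append(hh_row)
    return ll, hl, lh, hh
```

In this sketch the `ll` output plays the role of the low-pass image that is segmented, and the other three outputs play the role of the difference images used for refinement.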
Let $R = \{R_i\}_{i=1}^{N}$ represent the $N$ regions in the segmented image and $I = \{I_i\}_{i=1}^{N}$ the corresponding interpretations. If there are $M$ labels $L = \{L_i\}_{i=1}^{M}$, then $I_i \in L$, implying that there are $M^N$ possible interpretations. Let $H(K)$ and $H(R)$ represent the HMMs of the domain knowledge of interpretation labels and that of the regions respectively. Then image interpretation can be formulated as a MAP estimation problem
$$I^{*}(R) = \arg\max_{I} P(I \mid H(K), H(R)) \qquad (1)$$
Assuming $P(\cdot \mid \cdot)$ is a MRF and using the Hammersley-Clifford theorem [2], we can express (1) as a Gibbs distribution:
$$P(I \mid H(K), H(R)) = \frac{1}{Z} \exp\Big(-\sum_{c \in C} V_c(I; H(K); H(R))\Big) \qquad (2)$$
where $C$ represents the collection of all the cliques, $V_c(\cdot)$ is the clique function and $Z$ is the partition function. We expect the interpretation process to be less dependent on the domain knowledge because of a two level probabilistic structure. At one level, the interpretation is modeled as a MRF and at the other level the clique functions for the features are based on HMMs. Solution of our problem involves two basic steps: getting explicit expressions for the clique functions and using an optimization method to solve (2).
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on December 2, 2008 at 23:56 from IEEE Xplore. Restrictions apply.
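To make the MAP formulation concrete, the sketch below enumerates all M^N labelings of a toy problem, evaluates the Gibbs probability of each, and returns the most probable one. Because the partition function is common to all labelings, maximizing the probability is the same as minimizing the summed clique potentials. The function `gibbs_map` and the numeric potentials are hypothetical placeholders; in the paper the clique functions are derived from HMMs, which is not reproduced here.

```python
import itertools
import math


def gibbs_map(labels, unary, pairwise, edges):
    """Exhaustive MAP labeling under a Gibbs distribution (toy sizes only).

    unary[i][label]  : single-node clique potential for region i
    pairwise[(a, b)] : two-node clique potential for adjacent labels a, b
    edges            : list of (i, j) pairs of adjacent regions
    """
    n_regions = len(unary)
    energies = {}
    for assignment in itertools.product(labels, repeat=n_regions):
        energy = sum(unary[i][assignment[i]] for i in range(n_regions))
        energy += sum(pairwise[(assignment[i], assignment[j])] for i, j in edges)
        energies[assignment] = energy
    z = sum(math.exp(-u) for u in energies.values())  # partition function
    probs = {a: math.exp(-u) / z for a, u in energies.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Illustrative potentials for three regions in a chain (assumed values).
labels = ["sky", "road"]
unary = [
    {"sky": 0.2, "road": 1.5},
    {"sky": 1.0, "road": 0.9},
    {"sky": 1.4, "road": 0.3},
]
pairwise = {(a, b): (0.0 if a == b else 0.5) for a in labels for b in labels}
edges = [(0, 1), (1, 2)]
best, probs = gibbs_map(labels, unary, pairwise, edges)
```

Exhaustive search is only feasible for very small N, which is why a stochastic optimizer is needed for real images.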
3. HMM for Clique Functions
We give an example of the construction of single node (e.g. shape) and two node clique functions using HMMs. The shape contour is represented as a sequence $S = \{s_i\}_{i=1}^{N_b}$, where $N_b$ is the total number of boundary points. Let $(r_i, \theta_i)$ be the polar representation of $s_i$. The representation is made scale invariant by normalizing. The discrete wavelet transform (DWT) of the $r_i$ is found for scales $0, \ldots, n_r$, and that of the $\theta_i$ for scales $0, \ldots, n_\theta$. The feature vector used for training the HMM is $X(i) = [r_i^0, \ldots, r_i^{n_r}, \theta_i^0, \ldots, \theta_i^{n_\theta}]^T$. $X(i)$ provides a representation of the object at different scales and is hence robust to data perturbation or noise in the data. Rotational invariance is achieved by considering all the rotated versions of $\{X(i)\}$, namely, $X^j \stackrel{\text{def}}{=} \{X(j), X(j+1), \ldots, X(j-2), X(j-1)\}$. Thus $\{X^j\}_{j=1}^{N_b}$ would be the training sequence for the shape based HMM.
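The contour representation can be sketched as follows. This is a simplified illustration under stated assumptions: the polar origin is taken to be the contour centroid and radii are normalized by the largest radius, neither of which is specified in the paper, and the multi-scale DWT step is omitted. The names `polar_signature` and `rotated_versions` are hypothetical.

```python
import math


def polar_signature(contour):
    """Return [(r_i, theta_i)] for boundary points given as (x, y) pairs.

    Assumptions: polar origin at the centroid, radii divided by the maximum
    radius for scale invariance, and a non-degenerate contour (max radius > 0).
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    radii = [math.hypot(x - cx, y - cy) for x, y in contour]
    r_max = max(radii)
    return [
        (r / r_max, math.atan2(y - cy, x - cx))
        for r, (x, y) in zip(radii, contour)
    ]


def rotated_versions(sequence):
    """All circular shifts of a sequence: shift j starts at element j."""
    return [sequence[j:] + sequence[:j] for j in range(len(sequence))]
```

Each circular shift corresponds to starting the boundary trace at a different point, so training on all shifts removes the dependence on the starting point of the contour.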
Spatial relationships in an image can be modeled using multiple node cliques. Deterministic domain knowledge parameters such as the common perimeter ratio are used in [8, 4, 6]. In this paper, we propose a new method for modeling adjacency relationships using a joint HMM. Consider a two node clique. Let $R_1, R_2$ represent two adjacent regions in the segmented image. We build HMMs $H_1$, $H_2$ (see footnote 1) for a specific feature (say shape) corresponding to $R_1$ and $R_2$ respectively. Since the regions are adjacent, there exists what we term a joint HMM, which gives information about the spatial relationship between $R_1$ and $R_2$ and is different from $H_1$ and $H_2$ individually. It is constructed using the following scheme: (i) Let $O_1$ and $O_2$ be two observation sequences generated using $H_1$ and $H_2$ respectively. Construct a vector $\xi = [P(O_1, I_2 \mid H_2), P(O_2, I_1 \mid H_1)]^T$; here $I_2$ denotes the optimal state sequence corresponding to the observation $O_1$, supposing that $O_1$ was generated by $H_2$, and $I_1$ is defined analogously for $O_2$ and $H_1$. (ii) $\xi$ is assumed to model the spatial relationship between the two sequences $O_1$ and $O_2$. The generated $\xi$ is then used as the observation sequence for training the two node clique HMM. The number of states in this HMM is taken as the number of nodes in the clique. The spatial relations can thus be modeled using joint HMMs.
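The cross-scoring step of the joint HMM construction can be illustrated with a small sketch. It assumes discrete-observation HMMs given as `(start, trans, emit)` tuples of plain lists, which is our simplification; the paper does not state the observation model. The function names are hypothetical. Each sequence is scored against the other region's model along its optimal (Viterbi) state path, giving the two-element vector described above.

```python
def viterbi_joint(obs, start, trans, emit):
    """Return (best_path, P(obs, best_path | model)) for a discrete HMM.

    Plain probabilities are used for readability; a practical version would
    work with log-probabilities to avoid underflow on long sequences.
    """
    n_states = len(start)
    prob = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    back = []
    for o in obs[1:]:
        step, new_prob = [], []
        for s in range(n_states):
            # Best predecessor of state s at this time step.
            prev = max(range(n_states), key=lambda p: prob[p] * trans[p][s])
            step.append(prev)
            new_prob.append(prob[prev] * trans[prev][s] * emit[s][o])
        back.append(step)
        prob = new_prob
    last = max(range(n_states), key=lambda s: prob[s])
    path = [last]
    for step in reversed(back):
        path.append(step[path[-1]])
    path.reverse()
    return path, prob[last]


def cross_score_vector(o1, o2, h1, h2):
    """[P(O1, I2 | H2), P(O2, I1 | H1)] with Viterbi-optimal paths I2, I1."""
    _, p_o1_under_h2 = viterbi_joint(o1, *h2)
    _, p_o2_under_h1 = viterbi_joint(o2, *h1)
    return [p_o1_under_h2, p_o2_under_h1]
```

Repeating this for many generated sequence pairs would yield the observation sequence on which the two node clique HMM is trained; that training step is not shown.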
4. The Joint Segmentation and Image Interpretation Scheme
The scheme is pictorially depicted in Figure 1.
Step 0 - Initialization: Given $H(K)$, the a priori knowledge base HMMs for $M$ labels and $F$ features, and $Y$ (Figure 1(a)), the scene to be interpreted, construct $D^{-1}_{Y;LL}$, $D^{-1}_{Y;HH}$, $D^{-1}_{Y;HL}$, $D^{-1}_{Y;LH}$ using a wavelet filter (Figure 1(b)).
Figure 1. Joint segmentation and image interpretation scheme. [Flowchart panels: Step 0: Initialization (wavelet transform); Step I: Segmentation and Refining (k-means segmentation, refining); Step II: Interpretation - Segmentation loop (interpretation; presence of no-interpretation labels); Step III: Coarse to fine resolution (quadtree interpolate).]
1 Note that $H_1 \in H(R_1)$ and $H_2 \in H(R_2)$.
Step I - Segmentation and refining: Segment $Y^{-1}$ using the k-means clustering algorithm (Figure 1(c)) and refine the segmented image ${}^{s}\tilde{Y}^{-1}$ (i) using $D^{-1}_{Y;HH}$, $D^{-1}_{Y;HL}$ and $D^{-1}_{Y;LH}$ and (ii) using a predefined threshold to merge segments whose area is less than a prespecified minimum area, to get ${}^{s}Y^{-1}$ (Figure 1(d)). For the details about refining please see [5].
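The initial clustering step can be sketched with a minimal k-means on gray levels. This is an illustration only: it clusters scalar intensities, uses a deterministic, evenly spaced initialization, and runs a fixed number of iterations, all of which are our assumptions. The refinement with the difference images and the small-area merging described in [5] are not reproduced. The name `kmeans_gray` is hypothetical.

```python
def kmeans_gray(pixels, k, iterations=20):
    """Cluster scalar gray levels into k groups.

    Returns (assignments, centers). Centers are initialized evenly between
    the minimum and maximum intensity (an assumption made for determinism).
    """
    lo, hi = min(pixels), max(pixels)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        # Assignment step: nearest center for every pixel.
        assign = [min(range(k), key=lambda c: abs(p - centers[c])) for p in pixels]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(pixels, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    assign = [min(range(k), key=lambda c: abs(p - centers[c])) for p in pixels]
    return assign, centers
```

For an image, `pixels` would be the flattened low-pass sub-band and the assignments would be reshaped back into a label image.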
Step II - Interpretation - Segmentation loop:
(1) Interpretation: The segments are interpreted using the knowledge base HMMs $H(K)$ and $H(R)$, the HMMs derived from the segmented image ${}^{s}Y^{-1}$. With the clique functions constructed as described in Section 3, the interpretation task reduces to that of obtaining the optimal labels or, equivalently, determining the optimal state sequence corresponding to the observation sequence for all the regions. In our simulations we used the simulated annealing algorithm for the purpose of optimization. Recall that $L$ contains a no-interpretation label; a region $R$ is assigned a no-interpretation label when the distance between $H(R)$ and $H(K)$ exceeds a threshold (see Appendix A and [3]).
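A generic simulated annealing loop for label assignment might look like the sketch below. The paper does not give its annealing schedule or proposal rule, so the geometric cooling, the single-region relabeling move, and all parameter values here are assumptions. The function `anneal_labels` is hypothetical, and `energy` stands for any callable that returns the summed clique potentials of a labeling.

```python
import math
import random


def anneal_labels(n_regions, labels, energy, steps=2000, t0=2.0,
                  cooling=0.995, seed=0):
    """Search for a low-energy labeling; returns (best_labels, best_energy)."""
    rng = random.Random(seed)
    current = [rng.choice(labels) for _ in range(n_regions)]
    current_e = energy(current)
    best, best_e = list(current), current_e
    temp = t0
    for _ in range(steps):
        # Propose a new label for one randomly chosen region.
        candidate = list(current)
        candidate[rng.randrange(n_regions)] = rng.choice(labels)
        cand_e = energy(candidate)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if cand_e <= current_e or rng.random() < math.exp((current_e - cand_e) / temp):
            current, current_e = candidate, cand_e
            if current_e < best_e:
                best, best_e = list(current), current_e
        temp *= cooling  # geometric cooling schedule (assumed)
    return best, best_e
```

The sketch keeps the best labeling seen so far, so the returned energy is never worse than that of the random starting point; it does not guarantee the global optimum.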
(2) Segmentation: In case any segment, say $R_i$, has the label no-interpretation, we merge it with one of the interpreted segments adjacent to it, chosen by the minimum HMM distance criterion. If
$$R_m = \arg\min_{R_j \in n(R_i)} \sum_{k} D(H_k(R_i), H_k(R_j)) \qquad (3)$$
where $n(R_i)$ denotes the neighbourhood of $R_i$, then merge region $R_i$ with $R_m$.
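The merge rule can be sketched as a small selection function. The data layout is hypothetical: `adjacency` maps a region to its neighbours, `labels` maps regions to their current labels, and `feature_distances[(a, b)]` holds a list of precomputed HMM distances between regions `a` and `b`, one per term of the sum. How those distances are obtained is outside this sketch.

```python
NO_INTERPRETATION = "no-interpretation"


def merge_target(region, adjacency, labels, feature_distances):
    """Pick the interpreted neighbour with the smallest summed HMM distance.

    Returns None when the region has no interpreted neighbour.
    """
    candidates = [
        r for r in adjacency[region] if labels[r] != NO_INTERPRETATION
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: sum(feature_distances[(region, r)]))
```

After merging, the combined region would be passed through the interpretation step again, which is the loop described in Step II.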
(3) Go back to Step II (Interpretation-Segmentation loop)
Step III - Coarse to fine resolution: Quadtree interpolate the segmented and interpreted images to obtain the final segmented and
interpreted images (see Figure 1).
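One simple reading of the coarse to fine step is that every label at the coarse resolution is copied to its four children at the next finer resolution. The sketch below implements that reading; whether the paper's quadtree interpolation does exactly this, or applies further refinement, is not stated, so this is an assumption. The name `quadtree_upsample` is hypothetical.

```python
def quadtree_upsample(label_image):
    """Double the resolution of a label image by replicating each label 2x2."""
    fine = []
    for row in label_image:
        wide = [label for label in row for _ in range(2)]  # duplicate columns
        fine.append(wide)
        fine.append(list(wide))  # duplicate the row
    return fine
```

Applied once, this maps labels computed on the half-resolution low-pass image back to the size of the original scene.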
5. Experimental Results
The robustness of the proposed scheme was tested by conducting simulations on noisy images. Because of lack of space we give results for one set of images. The knowledge base, consisting of the HMMs for features most likely to occur in that class of scene images (in our scene images this would correspond to sky, path, tree, road), was constructed using a scene image (Figure 2) which is different from the scene (Figure 3) to be interpreted. Having acquired the knowledge base HMMs, the scheme described in Section 4 is adopted. Experimental results using the proposed scheme are given for a single scene (Figure 3), and the robustness of the proposed scheme was tested on Figure 4 by artificially adding Gaussian noise with 0 mean and variance 10. In each experiment, the discrete wavelet transform [7] of the scene was found (Figures 3(b), 4(b)). For each of the scene images considered the final segmented image is shown (Figures 3(c) and 4(c)). The segmented image in each case is to be interpreted based on the legends following each result. Observe: (i) The algorithm results in a correctly segmented image (Figures 3, 4) and the segmented image is assigned labels from the set {sky, tree, path, road}, thus correctly interpreting the scene image. (ii) The correct interpretation produced by the proposed scheme on a noisy scene image (Figure 4) demonstrates the robustness of the scheme to noise and hence to gray level variations.
Figure 2. (a) Image for generating knowledge base, (b) manually segmented image for finding knowledge base HMMs.
Figure 3. (a) Scene to be interpreted (b) DWT of (a) and (c) final segmented image with interpretations. [Region labels in (c): left tree, sky, road, path, right tree.]
Figure 4. (a) Noisy scene image corrupted by Gaussian N(0, 10) (b) DWT of (a) and (c) final segmented image with interpretation labels. [Region labels in (c): left tree, sky, road, path, right tree.]
6. Conclusion
A joint segmentation and image interpretation algorithm using HMMs is proposed. A method to obtain a joint HMM for multiple node cliques is introduced and is helpful in building the adjacency relationships. Future work on HMMs can involve (i) obtaining an optimal training scheme for HMM building, (ii) using higher order statistical information in building HMMs, and (iii) updating HMM databases on obtaining new knowledge data. (iv) Theoretical issues have not been looked into and can be explored further.
References
[1] R. Bajcsy, F. Solina, and A. Gupta. “Segmentation versus Object Representation–Are They Separable?”. Springer-Verlag, 1990.
[2] S. Geman and D. Geman. “Stochastic relaxation, Gibbs distribution, and Bayesian restoration of images”. IEEE Trans. on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.
[3] B. H. Juang and L. R. Rabiner. “A probabilistic distance measure between HMMs”. AT & T Technical Journal, 64(2):391–408, 1985.
[4] I. Y. Kim and H. S. Yang. “An integration scheme for image segmentation and labeling based on Markov random fields”. IEEE Trans. on Pattern Analysis and Machine Intelligence, 18:69–73, 1996.
[5] K. S. Kumar and U. B. Desai. “Joint segmentation and image interpretation”. Pattern Recognition, page to appear, 1998.
[6] V. P. Kumar and U. B. Desai. “Image interpretation using Bayesian networks”. IEEE Trans. on Pattern Analysis and Machine Intelligence, 18:74–77, 1996.
[7] S. G. Mallat. “Multifrequency channel decompositions of images and wavelet models”. IEEE Trans. Acoustics, Speech and Signal Processing, 37:2091–2110, 1989.
[8] J. A. Modestino and J. Zhang. “A Markov random field model based approach to image interpretation”. IEEE Trans. on Pattern Analysis and Machine Intelligence, 14:606–615, 1992.
A. Distance measure between two HMMs
The distance between two HMMs is measured as
$$D'(H_i, H_j) = \lim_{T \to \infty} \frac{1}{T} \left| \log P(O^j, Q^i \mid H_i) - \log P(O^j, Q^j \mid H_j) \right| \qquad (4)$$
where $O^j$ denotes an observation sequence generated by HMM $H_j$, $Q^i$ denotes the optimal state sequence corresponding to $O^j$ against the HMM $H_i$, and $Q^j$ denotes that against $H_j$. In essence, (4) measures how well the observation sequence generated using one HMM matches against another HMM. However, (4) is not symmetric and hence a symmetric distance is obtained as
$$D(H_i, H_j) \stackrel{\text{def}}{=} \frac{1}{2} \left[ D'(H_i, H_j) + D'(H_j, H_i) \right]$$
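A finite-length approximation of the distance measure can be sketched as follows. The limit over the sequence length is replaced by a single sequence of fixed length, the HMMs are assumed to be discrete with strictly positive parameters (so that all logarithms are finite), and the observation sequences are supplied by the caller rather than sampled. These are simplifications of ours, and the function names are hypothetical.

```python
import math


def log_viterbi(obs, start, trans, emit):
    """log max_Q P(obs, Q | model) for a discrete HMM.

    All model parameters are assumed strictly positive.
    """
    n = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        score = [
            max(score[p] + math.log(trans[p][s]) for p in range(n))
            + math.log(emit[s][o])
            for s in range(n)
        ]
    return max(score)


def directed_distance(obs_j, h_i, h_j):
    """Finite-T version of the non-symmetric distance; obs_j comes from h_j."""
    t = len(obs_j)
    return abs(log_viterbi(obs_j, *h_i) - log_viterbi(obs_j, *h_j)) / t


def symmetric_distance(obs_i, obs_j, h_i, h_j):
    """Average of the two directed distances."""
    return 0.5 * (
        directed_distance(obs_j, h_i, h_j) + directed_distance(obs_i, h_j, h_i)
    )
```

In use, `obs_i` and `obs_j` would be long sequences generated from `h_i` and `h_j` respectively, so that the per-symbol average approaches the limit in the definition.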