Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Computer vision wikipedia , lookup
List of 8-bit computer hardware palettes wikipedia , lookup
Edge detection wikipedia , lookup
Color vision wikipedia , lookup
Spatial anti-aliasing wikipedia , lookup
Stereoscopy wikipedia , lookup
Anaglyph 3D wikipedia , lookup
BSAVE (bitmap format) wikipedia , lookup
Hold-And-Modify wikipedia , lookup
Stereo display wikipedia , lookup
IMAGE DATABASES Prof. Hyoung-Joo Kim OOPSLA Lab. Computer Engineering Seoul National University Contents • • • • • • • Introduction Image database systems Indexing issues and basic mechanisms A taxonomy on image indexes Color-spatial hierarchical indexes Signature-based color-spatial retrieval Summary Introduction • Traditional DBMS – effective in managing structured data – not effective for images that are non-alphanumeric and unstructured • Applications to manage images – medical application – geographic information system – satellite image database – criminal database – interior design – art galleries and museum management – etc... Introduction(2) • Content-based image retrieval techniques – techniques that retrieve image based on their visual properties such as texture, color, shape, etc – sequentially comparing image feature is timeconsuming and impractical – need content-based image index Image Database System • Additional functionalities – feature extraction • the system must be able to analyze an image to extract key features such as shape, color, texture – feature-based indexing • the system must build indexes based on the features extracted – content-based retrievals • the system should support a wide range of queries that involve the contents of the image – measure of similarity • the system requires a measure to capture what we humans perceive as similarity between two images Architecture of Image DBMS Input Image PREPROCESSING MODULE Image Input/Scanner Feature Extraction Update Index/ Database QUERY MODULE User Query Output Retrieved Image Interactive Query Formulation Browsing & Feedback Runtime Processor Feature Extraction Feature Matching Concurrency Control & Recovery Manager Feature/Image Database Indexing Issues • Three issues in content-based index design – determine a representation for the indexing feature – determine a similarity measure between two images based on their representation – determine an appropriate index organization Indexing Issues • Desirable properties of a representation(first issue) – exactness – space efficiency – computationally inexpensive similarity matching – preservation of the similarity between the features – automatic extraction – insensitivity to noise, distortion, rotation Indexing Issues • Main criterion of the similarity measure(second issue) – if two images are similar under the indexing feature, then their representations should remain so. • Several alternatives to determine the similarity – exact match • the representation of an image feature is usually coarse, in sense that images with similar feature will be mapped to the same representation – approximate match • the degree of similarity between the image representations is computed based on some approximation techniques Indexing Issues • Criteria for selection of an index structure – the similarity measure can be supported efficiently – storage efficiency – maintenance(update) overhead • Appropriate index structure – be determined by the representation and similarity measure – example • if image feature is represented as a vector, an the similarity measure is the Euclidean distance, then a natural choice is the multi-dimensional point access method Basic Index Schemes • Spatial access methods(SAMs) – basic idea • extract k image features for each image • map images into points in a k-dimensional feature space • use SAMs such as the grid file, quad-tree, the family of Rtree – problem: “high-dimensionality curse” • these techniques perform no better than sequential scanning as the dimension becomes sufficiently large Basic Index Schemes • Inverted file – basic idea • an inverted list is created for each distinct key(indexed features) • the inverted list consists of a list of pointers to the objects that contain features that are similar to the indexed feature – problem • high storage overhead & expensive update • Signature file – refer [Text Database] Taxonomy on Image Indexes • • • • Shape feature Spatial relationship Texture Color Taxonomy on Image Indexes Content-based indexes Objects Spatial relationship Signature Texture 2-D string Multi-level signature file Sequential file Tamura features Signature file Multi-dimensional index Color Shape 1-D string Inverted file Signature Numerical vectors Multi-dimensional index Similarity against representative objects Inverted file Color histogram Color-Spatial Multi-dimensional index Multi-level histogram 2-level B+-tree 3-tier color index Shape Feature • Representation – boundary information – a collection of rectangle that forms a rectangular cover of the shape – mathematical morphology – pattern spectrum – numerical vector using 2-D Fourier transformation or Wavelet transformation • Problems – shape vary widely from object to object – unless the images have very distinct shape, the performance may suffer Spatial relationship • Representation – 2-D string • semantic representation for spatial relationship using a two-dimensional string • projection of the symbols along the x-axis and y-axis • example : “O1 is to left of O2 which is left of O3” • the projection on the x-axis - O1 < O2 < O3 • variations and extensions to the 2-D string have been proposed – Problem • spatial relationships can be drastically affected by the orientation of the image Texture and Color • Texture – can be represented by the coarseness, contrast, directionality – the extraction of text information is a computationally intensive operations • Color – color histogram that captures the color composition of images – color alone is not sufficient to characterize an image • consider two image - one with the top half blue and bottom half red, while the other’s left half is red and it’s right half is blue • although these two images are similar in color composition, the are entirely different to a human observer. – recent studies have proposed to integrate color and its spatial distribution Color-spatial hierarchical Indexes • Hierarchical Indexes – multiple indexing mechanisms are integrated to form a single index structure – three indexes that have been proposed to integrated color and spatial information for retrieval • Two-level B+-tree structure • Three-tier color index • Sequential Multi-Attribute Tree(SMAT) Two-level B+-tree • Feature extraction – the color-spatial information of an image is modeled by splitting the image into 9 equal subareas(33) – the color information within each sub-area is represented by a color histogram – one can obtain a more accurate similarity by matching the corresponding color histogram of two image – the color histogram within each sub-area is mapped into a numerical key • Retrieval technique Two-level B+-tree • Retrieval technique – use two level information – the first level • describes the composition of colors corresponding to the histogram of the region • the colors are grouped into 11 bins • each group is assigned a range which bounds the percentage of pixels in the mage with colors of group – the second level • contains the average H, average V, and average C values of all the 11 histogram bins Two-level B+-tree Level 1: B+-tree on Normal Pixel Count Level 2: B+-tree on Average H,V, and C values Three-tier Color Index • Layer 1: – dominant color classification • a fixed number of dominant colors is extracted • the dominant colors are those with the largest number of pixel count • Layer 2: – multidimensional R-tree structure • the image can be assigned to a partition • prune away image within the candidate partitions that are not relevant • Layer 3: – multi-level color histogram(quad-tree structure) • compare the histograms of the query image with those of remaining potential candidate image Tree-tier Color Index Tier 1: Dominant Color Classification K=1 K=2 K=3 0 1 (0,1) (0,2) .. (0,n) 2 n(no.of colors) (1,2) …...…(2,n) …. (0,1,2) ... (0,1,n) ……. (0,2,n) …...… …. Tier 2: R-tree Tier 1: Multi-Level Color Histogram K : dominant colors SMAT • Height-balanced color-spatial index – the problem with the two approaches • individual tree structures(B+-tree, R-tree, Dominant Color Classification) are height-balanced, the entire hierarchical index structure may not be so. – height-balancing of SMAT • SMAT is a multi-tier index structure similar to two approaches, but remains height-balancing of the entire index structure by controlling the growth of the tree using a certain threshold which is application-specific value. • refer to the VLDB Journal(1998) 7: 115-128 for more information Signature-based Color-Spatial Retrieval • Representation of color-spatial information – an image is partitioned into a grid of mn cells of equal size – the colors that can be used to represent a cell are determined – for a given color, each cell is examined to determine the percentage of the total number of pixels in the cell having that color 0 1 2 Cell does satisfy the threshold Cell does not satisfy the threshold 31 An image partitioned into a 4x8 grid - an image is represented by 32-bit color signature, 10001111001000110000000001100011 Signature-based Color-Spatial Retrieval • The retrieval process – an image with k colors has k color signatures – let Qi and Di denote the signature of color i for a query image Q and a database image D. – let the representative color sets of Q and D be CQ and CD. BitSet(Qi Di ) if color i C D SIM basic(Q, D, i) { BitSet(Qi ) 0 otherwise SIM basic(Q, D) SIM iCQ (Q, D, i) basic Summary • Promising area that require further research – performance evaluation • comparative study will be useful for application designers and practitioners to pick the best method for their applications – more on access method • designing efficient access methods will make the contentbased retrieval techniques more practical and useful – concurrent access and distributed indexing • we expect to see more real-time application as well as application running in parallel or distributed environment Summary – Integration and optimization • most content-based image retrieval techniques only capture a part of the image’s semantics • important issues – selecting an “optimal” set of image features that fits best for an application – developing techniques that can integrate them into achieve the optimal results • one promising method(semantic-based retrieval technique) – use content-based techniques as the basis, but also exploits semantic meanings of the image and queries to support concept-based queries