Ontology Driven Content Based Image Retrieval
John Osborne
Paper: Popescu et al, 2007
July 30th, 2010

Overview
• Review
  – CBIR
  – Ontology Definition and Example
  – SCBIR
• Concept Hierarchy (Ontology)
• Picture Database
  – Construction and Properties of Database
• Image Processing
  – Filtering and Indexing
• RetrieveOnto System
  – Modes
  – Evaluation
• Conclusion and Future Directions

Content Based Image Retrieval
• Wikipedia definition
  – “application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases”
• Problems Addressed:
  – Lack of human understandable semantics
    • System here allows control of querying conceptual neighborhoods
  – Scalability
    • CBIR gets more difficult as database size increases
  – Interactivity
    • CBIR not understandable to users

Ontology
• “Specification of a conceptualization”
• Example: leukocyte hierarchy from the Cell Ontology

Semantic CBIR
• Use of semantics (keywords, ontologies) to aid CBIR
  – Employing ontologies to define high level concepts
    • Map high level concepts to low level features
    • Manually
    • Use machine learning to bridge the “semantic gap”
• Use visual content and surrounding text from the Web to assist CBIR

Authors' Concept Hierarchy
• Not a “true” ontology, but a structure using a term hierarchy extracted from WordNet
  – Sub hierarchy of all terms under “placental”
• Not a classification system, includes “dog has puppy”
  – Better for their purposes, since they want “general purpose” information

Placental WordNet Statistics
• 144 leaf nodes under dog, 10 sub-concepts of dolphin
• Hierarchy depth: 1 to 8
  – Livestock: terminal node from root
  – Brown Swiss -> dairy cattle -> cattle -> bovine -> bovid -> ruminant -> even-toed ungulate -> ungulate
• 1113 nodes with 841 leaf terms
• Leaf terms (and only leaf terms) have associated picture sets

Bird WordNet Ontology
• Paper used placental, not birds
• Obviously not scientific

The Point
• “The role of the term hierarchy is to control, in a humanly understandable fashion, the region of the database where similar items are retrieved”

Picture Database Construction
• Database not standardized, but created by querying the web
• Wanted to deal with heterogeneous sources
• Employed the “Ask” search engine to populate the database, as it gave better precision than Google, Yahoo or Picsearch
  – Did their own testing on 20 concepts (50 images per query): correct content (keyword association) was 80% for Ask and 70% for Picsearch (2nd best)

Picture Database Details
• Collected over 33K images
• 31287 after invalid links/files removed
• Image filtering reduced image count to 25470
• Mean # of pictures in a class: 30
  – Standard deviation 23.8
  – Numbers range from 0 to 147
  – Well represented: lion, grizzly; poorly represented: “Doe”, “Yearling”, Pteropus capistratus

Image Processing
• Database is intended to contain only pictures of animals, so they remove common non-animal pictures such as faces
  – Used “multi-stage AdaBoost detector”
  – Details unknown to me
  – “Aardvark”

Clip Art Removal
• Clipart and scanned texts (scientific publications)
• Detect based on luminance histograms
  – Detect the maximum and compute the standard deviation, compared against a threshold
  – Not for the whole picture but for 16 equal rectangles, because uniform regions look like clipart
• Performs better than color counting
• 99.8% classification of the picture database (11.3K+), 93% classification of the clipart database (5.4K+)

Image Indexing
• Index with border/interior pixel classification from a previous publication
  – Quantizes each R, G and B component into 4 values
  – Classifies each pixel as border or interior (a pixel whose 4 neighbors have the same quantized color is called interior, border otherwise)
  – Creates two 64-bin RGB histograms (one for border, one for interior)
  – Only looks at the central ¼ of the picture
• Automatic segmentation is hard

RetrieveOnto System
• 3 Major Pieces
  – Conceptual Hierarchy (Ontology)
  – Processed Dataset
  – User interface
• UI has 2 modes (Query Mode and Answers Page)
• Query Mode
  – Default query mode
    • Displays a random set of different leaf concepts
  – Concept browsing query mode displays 30 random images
    • Clicking on one brings up a page with a selection of images from that leaf node's image set

Answers Page
• Query was giant panda, but the sub hierarchy was defined by procyonid
• The user controls, via a button that moves the root concept, whether to search just the particular concept's images or a larger sub hierarchy (later slide)

Traditional CBIR Answers Page
• Entire database is searched

User Interface to Refine Search Space
• In this case there are 8 levels for the user to move up and down from giant_panda

Database and Filtering Evaluation
• Database evaluation
  – Thirty classes covering a wide area of the database were used, and 20 of those were presented to reviewers, who were asked whether it was representative of the class
    • 86% were judged representative
• Filtering evaluation
  – Similar evaluation with 200 pictures (drawings and faces), with 35% not representative

Ontology Driven versus Classical CBIR
• Depth of hierarchy was from 3 to 9
• Conceptual level 1 was the leaf node
• A large number of correct results is fetched when the search is restricted to your concept; this drops off as the searched part of the database expands
• Some exceptions: blue whale did well, as did western lowland gorilla (not shown here)
• Percentages are not shown
• Classical CBIR is effectively what is shown at the right
• Results for 10 images are shown; results were also given for 30 images, but just one chart is shown here

Future Directions
• Different with medical/biological ontologies
  – General image matching not useful
  – Image types fall into fewer classes
  – Images may differ based on the particular region that is altered
• More structured ontologies
  – Not just is_a relationship
  – Synsets will be useless for most tasks, need rigour
• Searching is more defined
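
Sketch: border/interior indexing
The indexing step from the Image Indexing slide (quantize R, G and B into 4 values each, label each pixel as border or interior, build two 64-bin histograms) can be sketched as below. This is a minimal illustration, not the authors' implementation. The function name `bic_histograms` and the parameters `levels` and `central_only` are hypothetical. Two further assumptions: "central ¼" is read as the centered region with half the width and half the height, and pixels on the edge of the examined region are counted as border because they lack four neighbors.

```python
import numpy as np


def bic_histograms(image, levels=4, central_only=True):
    """Return (border_hist, interior_hist) for an H x W x 3 uint8 RGB image.

    Hypothetical helper illustrating border/interior pixel classification.
    """
    img = np.asarray(image, dtype=np.uint8)

    if central_only:
        # Assumption: "central 1/4" means the centered quarter of the area.
        h, w = img.shape[:2]
        img = img[h // 4 : h - h // 4, w // 4 : w - w // 4]

    # Quantize each channel into `levels` values (0..3 for 8-bit input).
    q = (img.astype(np.int32) * levels) // 256

    # Combine the three quantized channels into one color code (0..63).
    codes = q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]

    # A pixel is interior when its 4 neighbors share its quantized color.
    # Assumption: pixels on the region's edge are treated as border.
    interior = np.zeros(codes.shape, dtype=bool)
    core = codes[1:-1, 1:-1]
    interior[1:-1, 1:-1] = (
        (core == codes[:-2, 1:-1])    # neighbor above
        & (core == codes[2:, 1:-1])   # neighbor below
        & (core == codes[1:-1, :-2])  # neighbor to the left
        & (core == codes[1:-1, 2:])   # neighbor to the right
    )

    n_bins = levels ** 3
    border_hist = np.bincount(codes[~interior], minlength=n_bins)
    interior_hist = np.bincount(codes[interior], minlength=n_bins)
    return border_hist, interior_hist
```

Two images can then be compared by a distance between their concatenated histograms. The slides do not say which distance the system uses, so that choice is left open here.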