MURI Projects
• Cortical mechanisms of integration and inference
• Biomimetic models uniting cortical architectures with graphical/statistical models
• Application of the Bayesian Hypercolumn model to target recognition in HSI
• Image statistics of hyperspectral and spatiotemporal imagery
• Transition of models to MR spectroscopy for biomedical applications
Institutions: University of Pennsylvania, Columbia University, MIT
Investigators: Leif Finkel, Paul Sajda, Ted Adelson, Kwabena Boahen, Josh Ze'evi, Yair Weiss, Diego Contreras

The Fundamental Problem: Integration
• Integration of multiple cues (contour, surface, texture, color, motion, depth, ...)
• Bottom-up and top-down integration
• Horizontal integration
• Spatiotemporal and spatiospectral integration
Probabilistic models offer a unifying approach to integration.

How Does the Brain Do It? The Cortical Hypercolumn
• Fundamental module for processing a localized "patch" (~2° of visual angle) of the visual field
• Contains the neural machinery needed to construct a statistical description (i.e., a multivariate PDF across orientation, scale, wavelength, disparity, velocity, etc.)

The Generalized Aperture Problem: Capturing Non-local Dependencies (i.e., Context)
[Figure: array of interacting hypercolumns h1-h8.]
• Possible mechanism for capturing non-local dependency structure: long-range corticocortical connections (Bosking et al., 1997)
• Statistical properties of natural images are consistent with such a mechanism (Geisler et al., 2001)

Approach: Bayesian Hypercolumn Network
[Figure: hypercolumns h1-h3 coupled by transition matrices T11-T33; each hypercolumn computes a local likelihood P(*|h_i), plotted as rate over space.]

Bayesian Hypercolumn as a Canonical Unit in Biological Visual Processing
[Figure: cortical hierarchy, with a probability distribution computed at each level.]
P(Logan Airport) = 1 at FC
P(DC-10, 747, F-15, ...) at IT
P(dof, ab, dp, cl, c1, c2, ...) at
V2-V3-V4
P(..., d) at V1

Bridging Bayesian Networks and Cortical Processing
• A Hypercolumn Architecture for Computing Target Salience
• A Bayesian Network Model for Capturing Contextual Cues: Applications to Target Classification, Synthesis and Compression

Contour salience: Sha'ashua & Ullman, 1988

Orientation Pinwheels in Visual Cortex - Shmuel & Grinvald (2000)
Anatomical Connectivity in Striate Cortex - W.H. Bosking et al. (1997), J. Neurosci.
Co-circularity: Physiology/Psychophysics - Sigman, M., et al., PNAS 98 (2001)
Natural Image Statistics - Geisler, W.S., et al., Vis. Res. 41 (2001)
Contour Salience - R. Hess & D. Field (1999), Trends in Cog. Sci.
Intracellular In Vivo Physiological Recordings - D. Contreras & L. Palmer, unpublished data

A Hypercolumn-Based Model for Estimating Co-Circularity
• Detect a match between the distributions D(·) of a local and a distant hypercolumn
• A hypercolumn receives "matched" inputs from multiple other hypercolumns

Multiple Matches Cause a Transition to "Chattering" Behavior (D. McCormick)

Synchronization of Chattering Bursts Detects a Clique of Connected Hypercolumns
• The same chattering frequency synchronizes; different frequencies do not synchronize
• Yields a hypercolumn-based co-circularity measure (cf. Sha'ashua & Ullman, 1988)

Bridging Bayesian Networks and Cortical Processing
• A Hypercolumn Architecture for Computing Target Salience
• A Bayesian Network Model for Capturing Contextual Cues: Applications to Target Classification, Synthesis and Compression

Problem: Integrating Multi-scale Features for Object Recognition/Detection
• Detecting small objects having few features
• Discriminating large objects having subtle differences
• Aim is to do this within a machine-learning framework

Analogous Problems in Medical Imaging
• Anatomical and physiological context: breast cancers tend to be highly vascularized
• Context provided by multiple modalities: leakage seen in a fluorescein image can provide insight into the clinical significance of drusen in a fundus photo

Generative Probability Models
Statistical pattern recognizers are
important components of Automatic Target Recognition (ATR) and Computer-Aided Detection (CAD) systems. Most are trained as discriminative models: they model Pr(C | I), where C is the class and I is the image. However, there are advantages to generative models, which model Pr(I | C) or Pr(I). By applying Bayes' rule, a generative model can be used for classification:
$\Pr(C \mid I) = \Pr(I \mid C)\,\Pr(C)/\Pr(I)$
[Figure: a discriminative model learns a decision boundary between the x and o classes; a generative model learns a density for each class.]

Utility of a Generative Model
• Novelty detection: compute the absolute value of Pr(I | C) to detect images very different from those used to construct the model; this provides a confidence measure on the output of the ATR/CAD system.
• Synthesis: by sampling Pr(I | C) we can generate new images for class C, giving insight into the image structure captured by the model.
• Compression: knowing Pr(I | C) gives the optimal code for compressing the image (object-optimized compression).
• Also noise suppression, segmentation, etc.

The Hierarchical Image Probability (HIP) Model
• Coarse-to-fine conditional dependence.
• Short-range (relative to pyramid level) dependencies captured by modeling the distribution of feature vectors.
• Longer-range dependencies captured through a set of hidden variables.
• Factor probabilities over position to make the model tractable.

Coarse-to-fine Conditional Dependence
A pyramid divides image structure into scales; finer scales are conditioned on coarser scales (i.e., objects contain parts, which contain sub-parts, etc.):
$\Pr(I) \sim \Pr(I_0 \mid I_1)\,\Pr(I_1 \mid I_2)\cdots$
Define feature images $G_l$ extracted from $I_l$, let $\tilde G_l \equiv (I_{l+1}, G_l)$, and define the map $\Gamma_l : I_l \mapsto \tilde G_l$.

Factoring Across Scale
For any $\Pr(I)$, if $\Gamma_l$ is invertible for $l \in \{0, \dots, L-1\}$, then
$\Pr(I) \propto \prod_{l=0}^{L-1} \Pr(G_l \mid I_{l+1})\,\Pr(I_L)$.
Proof: $\Gamma_l : I_l \mapsto \tilde G_l$ is a change of variables, so $\Pr(I_0) \propto \Pr(\tilde G_0)$. Since $\tilde G_0 = (G_0, I_1)$, $\Pr(I_0) \propto \Pr(G_0 \mid I_1)\,\Pr(I_1)$. Repeat for $I_1, \dots, I_{L-1}$.

Models of $\Pr(G_l \mid I_{l+1})$
• Factor over position to make the computations tractable.
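Factoring over position means the level-$l$ term becomes a product of per-position terms, i.e. a sum in log space. A minimal sketch of this bookkeeping, assuming a hypothetical 1-D Gaussian local model (`toy_local_model` and all numeric values are illustrative, not the trained HIP model):

```python
import math

def log_pr_level(g, f, pr_g_given_f):
    """log Pr(G_l | I_{l+1}) factored over position:
    the sum over x of log Pr(g_l(x) | f_{l+1}(x))."""
    return sum(math.log(pr_g_given_f(g[x], f[x])) for x in range(len(g)))

def toy_local_model(g_x, f_x, std=1.0):
    """Illustrative local model: g_l(x) ~ N(f_{l+1}(x), std^2)."""
    return math.exp(-0.5 * ((g_x - f_x) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Evaluate the log level-likelihood for toy feature images g and f.
ll = log_pr_level(g=[0.1, -0.3, 0.2], f=[0.0, 0.0, 0.0],
                  pr_g_given_f=toy_local_model)
```

Because the positions factor, the log-likelihood of a level is just the sum of the log-likelihoods of its parts, which is what makes evaluation and EM training tractable.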
• Hidden variables $A$ are needed to capture non-local dependencies.
• Assume $F_{l+1}$ and $A$ carry the relevant information of $I_{l+1}$:
$\Pr(I) \propto \sum_A \prod_{l=0}^{L-1} \prod_{x \in I_{l+1}} \Pr(g_l \mid f_{l+1}, x, A)\,\Pr(A \mid I_L)\,\Pr(I_L)$,
where $A$ and its dependency structure are so far arbitrary.

Capturing Long-range Dependencies (Context) with Hidden Variables
• If a large area of $I_{l+1}$ implies object class A, and class A implies a certain texture in $I_l$, then local structure in $I_l$ depends on non-local information in $I_{l+1}$.
• If $I_{l+1}$ implies an object class which in turn implies a texture over the region of the object, but $I_{l+1}$ contains no information for differentiating object classes A and B, then distant patches are mutually dependent.
• Coarse-to-fine conditioning alone does not make dependencies local.

Tree Structure of Hidden Variables
• Choose a tree structure (1-D or 2-D) for the hidden variables/labels: a belief network or HMM on a tree ($A_{l+2} \to A_{l+1} \to A_l$).
• The hidden labels can be thought of as a "learned segmentation" of the image.

One Model for $\Pr(G_l \mid I_{l+1})$
Choose a local integer "label" $a_l$ at each position $x$ in $G_l$, with coarse-to-fine conditioning of $a_l$ on $a_{l+1}$ to form the dependency tree:
$\Pr(I) \propto \sum_{A_0, \dots, A_{L-1}} \left[\prod_{l=0}^{L-1} \prod_{x \in I_{l+1}} \Pr(g_l \mid f_{l+1}, a_l, x)\,\Pr(a_l \mid a_{l+1}, x)\right] \Pr(A_{L-1} \mid I_L)\,\Pr(I_L)$
$\Pr(g_l \mid f_{l+1}, a_l, x)$ is a normal distribution with mean $\bar g_{l, a_l} + M_{a_l} f_{l+1}$ and covariance $\Lambda_{a_l}$; choosing $M_{a_l}$ and $\Lambda_{a_l}$ diagonal simplifies the model.
Train the model using the Expectation-Maximization (EM) algorithm.

Structure of the HIP Model
[Figure: levels l, l+1, l+2, each with features f, g and labels a. The label tree is an analog of long-range cortico-cortical connections; the local (f, g, a) unit is an analog of a hypercolumn.]

Example: X-Ray Mammography (Dataset and Training)
• Regions of interest (ROIs) provided by Dr. Maryellen Giger of the University of Chicago (UofC).
• ROIs represent outputs from the UofC CAD system for mass detection.
• 72 positive and 96 negative ROIs; half of the data used for training, half for testing.
• Train two HIP models: masses (positives) and non-masses (negatives).
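With one generative model per class, classification follows from Bayes' rule, $\Pr(\text{mass} \mid I) = \Pr(I \mid \text{mass})\Pr(\text{mass})/\Pr(I)$. A minimal sketch; the log-likelihood arguments are illustrative stand-ins for the outputs of the two trained HIP models:

```python
import math

def posterior_mass(log_pr_i_mass, log_pr_i_nonmass, prior_mass=0.5):
    """Pr(mass | I) via Bayes' rule from class log-likelihoods.
    The inputs stand in for log Pr(I | mass) and log Pr(I | non-mass)
    as computed by the two class models (illustrative, not trained values)."""
    joint_mass = math.exp(log_pr_i_mass) * prior_mass
    joint_non = math.exp(log_pr_i_nonmass) * (1.0 - prior_mass)
    return joint_mass / (joint_mass + joint_non)  # normalize by Pr(I)

# An ROI slightly better explained by the mass model than the non-mass model.
p = posterior_mass(log_pr_i_mass=-10.0, log_pr_i_nonmass=-12.0)
```

Thresholding this posterior (equivalently, the log-likelihood ratio) gives the detector; sweeping the threshold traces out the ROC curve summarized by Az below.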
• Choose the architecture using a minimum description length (MDL) criterion; the number of labels is bounded above at 17.
• Best architecture: 17, 17, 11, 2, 1 hidden labels in levels 0-4, respectively.

Mass Detection
• Reduces false positives by 25% without loss in sensitivity.

Novelty Detection
[Figure: score distributions for positive and negative cases.]
• Use novelty detection to establish a confidence measure for the detector.

Image Synthesis
• ROI image synthesized from the positive model; ROI images synthesized from the negative model.
• Synthesized images can be used to develop intuition about how well the model represents the data.

Compression
[Figure: original ROI vs. JPEG vs. HIP compression.]

HIP Results on Aerial Imagery
• Classification: Az(HIP) = 0.87 vs. Az(HPNN) = 0.86; %correct(HIP) = 85% vs. %correct(D/V) = 78%.
• Example images: synthesis, hidden-variable probabilities (labels 1 and 2), and compression.
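The novelty-detection and compression uses of Pr(I | C) can be sketched directly from model log-likelihoods; the threshold and input values here are illustrative, not outputs of the trained models:

```python
import math

def is_novel(log_pr_i_mass, log_pr_i_nonmass, floor=-50.0):
    """Flag an image as novel when it is improbable under every class model,
    i.e. max over C of log Pr(I | C) falls below a chosen floor
    (the floor is an illustrative value, tuned on held-out data in practice)."""
    return max(log_pr_i_mass, log_pr_i_nonmass) < floor

def code_length_bits(log_pr_i):
    """Ideal code length implied by the model: -log2 Pr(I | C) bits
    (Shannon's source-coding bound); a better model gives a shorter code."""
    return -log_pr_i / math.log(2.0)

novel = is_novel(-60.0, -70.0)          # improbable under both models
bits = code_length_bits(math.log(0.5))  # Pr = 1/2 costs exactly 1 bit
```

The same quantity serves both roles: a low Pr(I | C) flags inputs the detector should not be trusted on, while -log2 Pr(I | C) is the code length an object-optimized compressor would achieve.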