SUMMARY
OPEN PROBLEMS

The slides come from several sources via James Hays (Brown), including his own slides.

Fundamentals of Computer Vision
• Geometry
  – What we can find out from mathematics.
• Matching
  – How to measure the similarity of two regions.
• Alignment
  – How to align points/patches.
  – How to recover transformation parameters based on matched points.
• Grouping
  – Which points/regions/lines belong together.
• Categorization/Recognition
  – Which similarities are more important.
Around the year 2010...

Geometry
• x = K [R t] X
  – Maps a 3D point X to a 2D point x.
  – Rotation R and translation t map the point into 3D camera coordinates.
  – Intrinsic matrix K maps camera coordinates to 2D image coordinates (calibration).
• Parallel lines in 3D converge at a vanishing point in the 2D image.
  – A 3D plane has a vanishing line in the 2D image.
• x'^T F x = 0, or vice versa x^T F^T x' = 0 (F => F^T)
  – Points in two views that correspond to the same 3D point are related by the fundamental matrix F.
For uncalibrated 2D images, the two camera matrices (3D => 2D) are determined only up to a 3D (4x4) projective transformation!

Matching
• Does this patch match that patch...
  – in two simultaneous views -- stereo
  – in two successive frames -- tracking, flow, SFM
  – in two pictures of the same object -- recognition

Matching
Representations should be invariant/robust to the expected deformations.
• Often we assume that the shape is locally constant.
• Robustness to a change in viewpoint can mostly be achieved.
• A change in lighting or camera gain is compensated by local averages.
For example, SIFT is a relatively good descriptor, mainly for 2D components, which provides robustness to in-plane orientation, lighting, contrast, and translation.

Alignment of points
Search efficiently to align matching patches.
• Interest points: find repeatable, distinctive points.
  – long-range matching -- wide baseline stereo, object recognition
  – Harris detector -- 2D corners are often repeatable
  – difference of Gaussians -- used in the Laplacian image pyramid
• Local search
  – short-range matching -- tracking, optical flow
  – gradient descent on SSD -- often with an image pyramid
• Windowed search
  – long-range matching -- recognition, stereo with scanline

Alignment of sets
Find a transformation to align matching sets of points.
• Geometric transformation
  – least squares fit (with SVD), if all matches can be trusted
  – RANSAC: works if the fraction of inliers is high -- more advanced robust estimators already exist...
• Other cases...
  – thin plate spline for more general distortions
  – one-to-one correspondence with graph algorithms

Grouping
• Clustering: group items (patches, pixels, lines, etc.) that have similar appearance.
  – unknown regions can often be assigned to one of the clusters
• Segmentation: group pixels into regions of coherent color, texture, motion, and/or label.
• Probabilistic grouping: for parametric distributions, such as Gaussians, the parameters can often be estimated, but nonparametric distributions should not be neglected either.

Categorization
Match objects, parts, or scenes that may vary in appearance.
• Categories are typically defined by humans and may be related by function, location, or other non-visual attributes.
• What the important similarities are can be learned only from training examples.
[Diagram: Training Images -> Image Features -> Classifier Training (with Training Labels) -> Trained Classifier]

Object categorization
Search with a sliding window detector.
• May work well for rigid objects in isolation and in a given pose.
• Simple alignment works for simple deformations, but more flexible alignment for articulated objects (part-based models) is difficult.
[Figure: Object or Background?]
Real object categorization remains a difficult problem for the moment...
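The two options on the "Alignment of sets" slide (least squares fit when all matches can be trusted, RANSAC when some cannot) can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the method from the original slides: the transformation is restricted to a pure 2D translation, so the least squares fit reduces to the mean offset and no SVD is needed (a rigid, affine, or projective model would need it). The function names, the toy `matches` data, the threshold, and the iteration count are all hypothetical.

```python
import random


def fit_translation(matches):
    """Least squares translation for matches [((x, y), (x2, y2)), ...].

    For a translation-only model the least squares solution is the mean offset.
    """
    n = len(matches)
    tx = sum(q[0] - p[0] for p, q in matches) / n
    ty = sum(q[1] - p[1] for p, q in matches) / n
    return tx, ty


def residual(match, t):
    """Euclidean distance between the translated source point and its match."""
    (px, py), (qx, qy) = match
    dx = px + t[0] - qx
    dy = py + t[1] - qy
    return (dx * dx + dy * dy) ** 0.5


def ransac_translation(matches, threshold=1.0, iterations=100, seed=0):
    """RANSAC: hypothesize from a minimal sample, keep the largest consensus set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single match fully determines a translation.
        sample = [rng.choice(matches)]
        t = fit_translation(sample)
        inliers = [m for m in matches if residual(m, t) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Final least squares refit on all inliers of the best hypothesis.
    return fit_translation(best_inliers), best_inliers


# Toy data (hypothetical): 8 correct matches shifted by (5, -2), plus 3 wrong matches.
true_t = (5, -2)
points = [(0, 0), (1, 0), (2, 3), (4, 1), (5, 5), (7, 2), (3, 8), (9, 9)]
matches = [(p, (p[0] + true_t[0], p[1] + true_t[1])) for p in points]
matches += [((1, 1), (20, 30)), ((2, 2), (-10, 4)), ((6, 6), (6, 40))]

estimated, inliers = ransac_translation(matches)
print(estimated, len(inliers))  # (5.0, -2.0) 8
```

Calling `fit_translation(matches)` directly on the same data would average the three wrong matches into the estimate, which is why the plain least squares fit is reserved for the case where all matches can be trusted.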
Vision as part of an intelligent system
[Diagram: 3D Scene -> Feature Extraction -> Grouping -> Interpretation -> Action]
• Feature Extraction: texture, color, optical flow, stereo disparity
• Grouping: surfaces, bits of objects, sense of depth, motion patterns
• Interpretation: objects, agents and goals, shapes and properties, open paths, words
• Action: walk, touch, contemplate, smile, evade, read on, pick up, …

Computer vision is potentially worth major $$$, but there are major challenges to overcome first.
These were some successful examples...
• Driver assistance
  – Mobileye received >$100M in funding from Goldman Sachs.
• Entertainment (Kinect, movies, etc.)
  – Intel is spending $100M on visual computing over the next five years.
• Security
  – Potential for billions of deployed cameras.
• Robot workers
  – Machines instead of people.
• many more applications...

There are many open problems that stem from the differences between human and computer vision. While human vision (which works completely differently and is still largely not understood) can process many objects simultaneously and robustly, today's best computer vision algorithms do not handle more than four or five objects, and only in poses that are not very difficult. Even the low-level algorithms, such as edge or corner detection, should be re-examined, since images now contain many more pixels. Maybe better algorithms will come. Top-down ("in the memory") and bottom-up ("what we see now") processing should interact more closely, and from a much earlier stage in the processing. Here, we will examine only the open problems in object recognition and how they are reflected in the low-level solutions.

Open problems
Object category recognition: where is the cat?
Important questions:
• How can we better align two object instances?
• How do we identify the important similarities of objects within a category?
• How do we tell if two patches depict similar shapes?

Open problems
Spatial understanding: what would she/he do here if she/he had to?
Important questions:
• What are good representations of space for navigation and interaction? What kind of details are important?
• How can we combine single-image cues with multi-view cues?

Open problems
Object representation: what is it?
Important questions:
• How can we pose recognition so that it lets us deal with new objects?
• What do we want to predict or infer, and to what extent does that rely on categorization?
• How do we transfer knowledge of one type of object to another?
It is far from a commercial system... yet.

Example:
A. Farhadi, I. Endres, D. Hoiem, D.A. Forsyth, “Describing Objects by their Attributes”, CVPR 2009
Hays and Efros, SIGGRAPH 2007