Download Slide - Virginia Tech

CloudCV: Large-Scale Computer Vision on the Cloud http://cloudcv.org/ Dhruv Batra Virginia Tech Lead: Machine Learning & Perception Group @ VT Harsh Agrawal CloudCV Team: Clint Solomon Neelima Chavali Yash Goyal Prakriti Banik Outline •  Historical context about Computer Vision •  CloudCV –  A mix of •  Research in my group •  Deployment and demos at cloudcv.org (C) Dhruv Batra 2 Computer Vision: Making Computers See Image from: http://kirkh.deviantart.com/art/BioMech-Eye-168367549 3 Image Understanding Objects Activities Scenes Locations Text / writing Faces Gestures Motions Emotions… “Color College Avenue”, Blacksburg, VA, May 2012 Slide credit: Devi Parikh Computer Vision “spend the summer linking a camera to a computer and getting the computer to describe what it saw” - Marvin Minsky (1966), MIT … 45 years later (C) Dhruv Batra Slide Credit: Devi Parikh 5 Computer Vision OR Vision is HARD! (C) Dhruv Batra Slide Credit: Devi Parikh 6 A Brief History of AI (C) Dhruv Batra 7 A Brief History of AI •  “We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire.” •  The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. •  An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. •  We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.” (C) Dhruv Batra 8 AI Predictions: Experts (C) Dhruv Batra Image Credit: http://intelligence.org/files/PredictingAI.pdf 9 AI Predictions: Non-Experts (C) Dhruv Batra Image Credit: http://intelligence.org/files/PredictingAI.pdf 10 AI Predictions: Failed (C) Dhruv Batra Image Credit: http://intelligence.org/files/PredictingAI.pdf 11 What humans see (C) Dhruv Batra Slide Credit: Larry Zitnick 12 What computers see (C) Dhruv Batra 243 239 240 225 206 185 188 218 211 206 216 225 242 239 218 110 67 31 34 152 213 206 208 221 243 242 123 58 94 82 132 77 108 208 208 215 235 217 115 212 243 236 247 139 91 209 208 211 233 208 131 222 219 226 196 114 74 208 213 214 232 217 131 116 77 150 69 56 52 201 228 223 232 232 182 186 184 179 159 123 93 232 235 235 232 236 201 154 216 133 129 81 175 252 241 240 235 238 230 128 172 138 65 63 234 249 241 245 237 236 247 143 59 78 10 94 255 248 247 251 234 237 245 193 55 33 115 144 213 255 253 251 248 245 161 128 149 109 138 65 47 156 239 255 190 107 39 102 94 73 114 58 17 7 51 137 23 32 33 148 168 203 179 43 27 17 12 8 17 26 12 160 255 255 109 22 26 19 35 24 Slide Credit: Larry Zitnick 13 We’ve come a long way… (C) Dhruv Batra Slide Credit: Devi Parikh 14 We’ve come a long way… (C) Dhruv Batra Slide Credit: Devi Parikh 15 We’ve come a long way… [Fischler and Elschlager, 1973] (C) Dhruv Batra Slide Credit: Devi Parikh 16 We’ve come a long way… (C) Dhruv Batra Slide Credit: Devi Parikh 17 Datasets and computer vision UIUC Cars (2004) S. Agarwal, A. Awan, D. Roth CMU/VASC Faces (1998) H. Rowley, S. Baluja, T. Kanade FERET Faces (1998) P. Phillips, H. Wechsler, J. Huang, P. Raus COIL Objects (1996) S. Nene, S. Nayar, H. Murase MNIST digits (1998-‐10) KTH human acCon (2004) Sign Language (2008) SegmentaCon (2001) Y LeCun & C. Cortes I. Leptev & B. Caputo P. Buehler, M. Everingham, A. Zisserman D. MarVn, C. Fowlkes, D. Tal, J. Malik. 3D Textures (2005) S. Lazebnik, C. Schmid, J. Ponce (C) Dhruv Batra CuRRET Textures (1999) CAVIAR Tracking (2005) Middlebury Stereo (2002) K. Dana B. Van Ginneken S. Nayar J. Koenderink R. Fisher, J. Santos-‐Victor J. Crowley D. Scharstein R. Szeliski Slide Credit: Li Fei-Fei 18 Backpack (C) Dhruv Batra Slide Credit: Li Fei-Fei 19 Flute Strawberry Traffic light Backpack Matchstick Bathing cap Sea lion Racket (C) Dhruv Batra Slide Credit: Li Fei-Fei 20 Large-scale recognition (C) Dhruv Batra Slide Credit: Li Fei-Fei 21 PASCAL VOC 2005-2012 Everingham, Van Gool, Williams, Winn and Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV 2010. 20 object classes 22,591 images Classification: person, motorcycle Detection Segmentation Person Motorcycle Action: riding bicycle (C) Dhruv Batra Slide Credit: Li Fei-Fei 22 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 20 object classes 22,591 images Classification: 1000 object classes 1.4M/50k/100k images Detection: 200 object classes 400k/20k/40k images Dalmatian http://image-net.org/challenges/LSVRC/{2010,…,2014} (C) Dhruv Batra Slide Credit: Li Fei-Fei 23 Data Enabling Richer Models •  [Krizhevsky et al. NIPS12, Donahue ICML14] –  54 million parameters –  Trained on 1.4M images in ImageNet 1k output units Input Image Convolution Layer + Non-Linearity (C) Dhruv Batra Pooling Layer Convolution Layer + Non-Linearity Pooling Layer Fully-Connected MLP 24 Data Enabling Richer Models •  DistBelief [Dean et al. NIPS12] (C) Dhruv Batra 25 Data Enabling Richer Models •  [Le et al. ICML12] –  2,000 machines / 32,000 cores for 1 week •  DistBelief [Dean et al. NIPS12] –  16 million images and 21k categories –  1.7 Billion parameters –  12,000 cores (C) Dhruv Batra 26 Historical Perspective •  Challenges in computer vision research: future directions of research. Shahriar Negahdaripour and Anil K. Jain. NSF Workshop 1991 •  Panel stressed the need for: –  more experimental validation of models on large datasets –  sharing of images, algorithms, and models between research groups –  greater interaction between academia and industry –  the need for complete computer vision systems that perform real world tasks (C) Dhruv Batra 27 Back to Present •  Frontiers in Computer Vision. Alan Yuille and Aude Oliva. NSF Workshop Nov 2010 •  Noticeable changes since 1991: –  Computers are much faster, have far greater memory, and are much cheaper. –  Computer vision researchers have continued to learn, adapt, develop, and apply tools from mathematics, statistics, computer science, and engineering. –  New tools specific to vision (e.g., SIFT and HOG) –  The use of benchmarked image databases and learning algorithms has become common (C) Dhruv Batra 28 Back to Present •  Frontiers in Computer Vision. Alan Yuille and Aude Oliva. NSF Workshop Nov 2010 •  Remaining concerns: –  increased the fragmentation of the field –  there remains lack of scholarship and little progress made on building on research done by others. –  computer vision datasets do not compare yet to the complexity of the natural world –  academic research is seen as being neither realistic enough to help develop practical real world systems nor insightful enough to yield new theories (C) Dhruv Batra 29 Challenges •  Big data is an enabler and an isolator! •  All researchers repeatedly solving the same problems –  Build and maintain a cluster •  Job scheduler (PBS, Torque) •  Distributed storage (Hadoop FS) –  Scale vision algorithms •  Identify model/data parallelism •  Design & implement multi-threaded vision primitives –  Distributed computing •  Implement mechanisms to avoid race conditions & dead-locks •  Ensure data consistency, locking, good scheduling (C) Dhruv Batra Logistical Computer Vision Distributed Computing 30 CloudCV Caffe Back-‐End Frameworks Cloud / Cluster h]p://CloudCV.org CloudCV-‐API CloudCV-‐API Front-‐End Users / Developers / Mobile Apps (C) Dhruv Batra CV & other researchers 31 CloudCV: Architecture (C) Dhruv Batra 32 CloudCV: Big Picture •  Goal: For developers –  Reduced barrier to entry –  Democratize Computer Vision •  Goal: For researchers –  Easy comparison to baselines –  Access to state-of-art techniques “off-the-shelf” •  Mini-steps –  –  –  –  (C) Dhruv Batra What we have today A few algorithms A few ways to reach CloudCV Where we are headed 33 CloudCV •  Demo 1 –  Support for ImageNet Challenge •  Demo 2 –  Image Classification •  Demo 3 –  Training a new classifier for your categories •  Demo 4 –  Finding Important People in Images •  Demo 5 –  GigaPixel Image Stitching (C) Dhruv Batra 34 “Demo” 1 •  ImageNet Challenge (ILSVRC13) –  Training: 1.4 million –  Val: 50k –  Test: 100k •  Features –  16 “industry standard” •  DeCAF, GIST, HOG2x2, Dense/Sparse SIFT, LBP, Self-Similarity … •  Webpage –  http://cloudcv.org/objdetect/#features •  Total: 400 GB, 19 months or 1.5 years of CPU-time (C) Dhruv Batra 35 CloudCV •  Demo 1 –  Support for ImageNet Challenge •  Demo 2 –  Image Classification •  Demo 3 –  Training a new classifier for your categories •  Demo 4 –  Finding Important People in Images •  Demo 5 –  GigaPixel Image Stitching (C) Dhruv Batra 36 Demo 2 •  [Krizhevsky et al. NIPS12, Donahue ICML14] –  –  –  –  Trained on 1.4M images in ImageNet 1000 categories Available in Caffe framework from BVLC http://cloudcv.org/classify/ 1k output units Input Image Convolution Layer + Non-Linearity (C) Dhruv Batra Pooling Layer Convolution Layer + Non-Linearity Pooling Layer Fully-Connected MLP 37 Demo 2 •  Drop-box integration –  Files can live on dropbox –  http://cloudcv.org/decaf-server/ (C) Dhruv Batra 38 Demo 2 •  How about if you want to write code? –  Python-API: https://github.com/batra-mlp-lab/pcloudcv •  “python run.py myconfig.json –nologin“ –  Matlab-API: https://github.com/batra-mlp-lab/mcloudcv Caffe Back-‐End AbstracVon Cluster h]p://CloudCV.org API Front-‐End (C) Dhruv Batra 39 CloudCV •  Demo 1 –  Support for ImageNet Challenge •  Demo 2 –  Image Classification •  Demo 3 –  Training a new classifier for your categories •  Demo 4 –  Finding Important People in Images •  Demo 5 –  GigaPixel Image Stitching (C) Dhruv Batra 40 Demo 3 •  [Krizhevsky et al. NIPS12, Donahue ICML14] –  Trained on 1.4M images in ImageNet –  1000 categories –  Available in Caffe framework from BVLC How about adding a 1001th category? Your company logo classifier? In a few seconds, not weeks? http://cloudcv.org/trainaclass/ 1k output units Input Image Convolution Layer + Non-Linearity (C) Dhruv Batra Pooling Layer Convolution Layer + Non-Linearity Pooling Layer Fully-Connected MLP 41 CloudCV •  Demo 1 –  Support for ImageNet Challenge •  Demo 2 –  Image Classification •  Demo 3 –  Training a new classifier for your categories •  Demo 4 –  Finding Important People in Images •  Demo 5 –  GigaPixel Image Stitching (C) Dhruv Batra 42 Who is the most important person in the photo? (C) Dhruv Batra 43 Why is this useful? •  Better image descriptions •  Automatic photo cropping Two people walking past a crowd (C) Dhruv Batra 44 Why is this useful? •  Better image descriptions •  Automatic photo cropping •  Sort consumer photos (C) Dhruv Batra 45 How do we do this? •  Collect a large dataset –  VT Person Importance Dataset –  Images scraped from Flickr –  Annotations using Mechanical Turk •  For each face measure: –  –  –  –  –  Distance from center Scale Sharpness Face Pose Face Occlusion •  Train a relative importance predictor Results •  http://cloudcv.org/vip/ Method Accuracy Our Approach 78.91% Center Baseline 68.46% Scale Baseline 67.86% Sharpness Baseline 71.03% •  Technical Details: –  VIP: Finding Important People in Images –  Clint S. Mathialagan, Andrew C. Gallagher, Dhruv Batra –  http://arxiv.org/abs/1502.05678 CloudCV •  Demo 1 –  Support for ImageNet Challenge •  Demo 2 –  Image Classification •  Demo 3 –  Training a new classifier for your categories •  Demo 4 –  Finding Important People in Images •  Demo 5 –  GigaPixel Image Stitching (C) Dhruv Batra 48 Parallelization •  Some steps in vision embarrassingly parallel –  Ideal for MapReduce •  However –  Most pipelines in Computer Vision are not! –  Example •  Image Stitching (C) Dhruv Batra 49 GigaPixel Image Stitching Image SVtching (C) Dhruv Batra 50 GigaPixel Image Stitching (C) Dhruv Batra 51 GigaPixel Image Stitching Feature Extraction Vertex Parallel (C) Dhruv Batra 52 GigaPixel Image Stitching Feature Extraction Image/Feature Matching Vertex Parallel Edge Parallel (C) Dhruv Batra 53 GigaPixel Image Stitching Feature Extraction Image/Feature Matching Vertex Parallel Edge Parallel Global Camera Refinement Bundle Adjustment min P̂i ,X̂p X X d(xip , P̂i X̂p ) image i point p Non-linear optimization over camera parameters Pi and 3D locations of points Xp (C) Dhruv Batra 54 GigaPixel Image Stitching Feature Extraction Image/Feature Matching Vertex Parallel Edge Parallel (C) Dhruv Batra Global Camera Refinement Seam Blending Edge Parallel 55 GigaPixel Image Stitching Feature Extraction Image/Feature Matching Vertex Parallel Edge Parallel Global Camera Refinement Seam Blending Edge Parallel Bundle Adjustment min P̂i ,X̂p X X d(xip , P̂i X̂p ) image i point p Levenberg–Marquard Updates Graph-Parallel (C) Dhruv Batra 56 GigaPixel Image Stitching •  http://cloudcv.org/image-stitch/ Feature Extraction Image/Feature Matching Vertex Parallel Edge Parallel Global Camera Refinement Seam Blending Edge Parallel Bundle Adjustment min P̂i ,X̂p X X d(xip , P̂i X̂p ) image i point p Levenberg–Marquard Updates Graph-Parallel (C) Dhruv Batra 57 CloudCV •  Demo 1 –  Support for ImageNet Challenge •  Demo 2 –  Image Classification •  Demo 3 –  Training a new classifier for your categories •  Demo 4 –  Finding Important People in Images •  Demo 5 –  GigaPixel Image Stitching (C) Dhruv Batra 58 Where is CloudCV headed? •  Back-end –  Open model for contributing code •  Dynamic Database –  If “familiar” image, we can get you results without computing –  If new image, we’ll cache the results for the next person •  Lots of challenges unsolved –  Bandwidth, optimal compression –  Computation on front end vs back end –  Compressions on front end that bound performance? •  Coresets, summarization, etc (C) Dhruv Batra 59 Where is CloudCV headed? •  Long way to go •  But we think this is exciting! •  Think about the first APIs for –  Designing webpages –  User authentication, Credit-card processing –  Search, Maps, Twitter feeds, … •  We want to do that for the scientific research and development community. (C) Dhruv Batra 60 Acknowledgements •  Collaborator and Mentor –  Carlos Guestrin (UW / Graphlab / Dato) •  Sponsors Harsh Agrawal Clint Solomon Neelima Chavali Yash Goyal Prakriti Banik CloudCV Team: (C) Dhruv Batra 61 Thanks! (C) Dhruv Batra 62

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Slide - Virginia Tech