CloudCV: Large-Scale Computer Vision on the Cloud
http://cloudcv.org/
Dhruv Batra
Virginia Tech
Lead: Machine Learning & Perception Group @ VT
CloudCV Team:
Harsh Agrawal
Clint Solomon
Neelima Chavali
Yash Goyal
Prakriti Banik
Outline
•  Historical context about Computer Vision
•  CloudCV
–  A mix of
•  Research in my group
•  Deployment and demos at cloudcv.org
(C) Dhruv Batra
Computer Vision:
Making Computers See
Image from: http://kirkh.deviantart.com/art/BioMech-Eye-168367549
Image Understanding
Objects
Activities
Scenes
Locations
Text / writing
Faces
Gestures
Motions
Emotions…
“Color College Avenue”, Blacksburg, VA, May 2012
Slide credit: Devi Parikh
Computer Vision
“spend the summer linking a camera to a
computer and getting the computer to
describe what it saw”
- Marvin Minsky (1966), MIT
… 45 years later
Slide Credit: Devi Parikh
Computer Vision
OR
Vision is HARD!
Slide Credit: Devi Parikh
A Brief History of AI
A Brief History of AI
•  “We propose that a 2 month, 10 man study of artificial
intelligence be carried out during the summer of 1956 at
Dartmouth College in Hanover, New Hampshire.”
•  The study is to proceed on the basis of the conjecture that
every aspect of learning or any other feature of
intelligence can in principle be so precisely described that
a machine can be made to simulate it.
•  An attempt will be made to find how to make machines
use language, form abstractions and concepts, solve
kinds of problems now reserved for humans, and improve
themselves.
•  We think that a significant advance can be made in one or
more of these problems if a carefully selected group of
scientists work on it together for a summer.”
AI Predictions: Experts
Image Credit: http://intelligence.org/files/PredictingAI.pdf
AI Predictions: Non-Experts
Image Credit: http://intelligence.org/files/PredictingAI.pdf
AI Predictions: Failed
Image Credit: http://intelligence.org/files/PredictingAI.pdf
What humans see
Slide Credit: Larry Zitnick
What computers see
[Figure: a grid of grayscale pixel intensity values (0 to 255), i.e., the photo as a computer receives it]
Slide Credit: Larry Zitnick
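To make the "numbers, not objects" point concrete, here is a minimal Python sketch. The 3x5 patch below is an illustrative assumption, not a crop of the slide's image. A grayscale image is just a grid of brightness values. Even a simple horizontal difference locates the dark/bright boundary, yet it says nothing about what the object is.

```python
# Illustrative 3x5 grayscale patch (assumed values): 0 = black, 255 = white.
patch = [
    [240, 238, 236, 60, 55],
    [242, 239, 235, 58, 52],
    [241, 237, 233, 61, 57],
]

height, width = len(patch), len(patch[0])
mean_brightness = sum(sum(row) for row in patch) / (height * width)

# Horizontal differences between neighboring pixels; a large magnitude marks an edge.
grad_x = [[row[c + 1] - row[c] for c in range(width - 1)] for row in patch]
edge_column = [max(range(width - 1), key=lambda c: abs(row[c])) for row in grad_x]

print(f"mean brightness: {mean_brightness:.1f}")
print("strongest edge after column:", edge_column)
```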
We’ve come a long way…
Slide Credit: Devi Parikh
We’ve come a long way…
Slide Credit: Devi Parikh
We’ve come a long way…
[Fischler and Elschlager, 1973]
Slide Credit: Devi Parikh
We’ve come a long way…
Slide Credit: Devi Parikh
Datasets and computer vision
•  UIUC Cars (2004): S. Agarwal, A. Awan, D. Roth
•  CMU/VASC Faces (1998): H. Rowley, S. Baluja, T. Kanade
•  FERET Faces (1998): P. Phillips, H. Wechsler, J. Huang, P. Rauss
•  COIL Objects (1996): S. Nene, S. Nayar, H. Murase
•  MNIST digits (1998-10): Y. LeCun & C. Cortes
•  KTH human action (2004): I. Laptev & B. Caputo
•  Sign Language (2008): P. Buehler, M. Everingham, A. Zisserman
•  Segmentation (2001): D. Martin, C. Fowlkes, D. Tal, J. Malik
•  3D Textures (2005): S. Lazebnik, C. Schmid, J. Ponce
•  CUReT Textures (1999): K. Dana, B. Van Ginneken, S. Nayar, J. Koenderink
•  CAVIAR Tracking (2005): R. Fisher, J. Santos-Victor, J. Crowley
•  Middlebury Stereo (2002): D. Scharstein, R. Szeliski
Slide Credit: Li Fei-Fei
Backpack
Slide Credit: Li Fei-Fei
Flute
Strawberry
Traffic light
Backpack
Matchstick
Bathing cap
Sea lion
Racket
Slide Credit: Li Fei-Fei
Large-scale recognition
Slide Credit: Li Fei-Fei
PASCAL VOC 2005-2012
Everingham, Van Gool, Williams, Winn and Zisserman.
The PASCAL Visual Object Classes (VOC) Challenge. IJCV 2010.
20 object classes
22,591 images
Classification: person, motorcycle
Detection
Segmentation
Person
Motorcycle
Action: riding bicycle
Slide Credit: Li Fei-Fei
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
PASCAL VOC, for comparison: 20 object classes, 22,591 images
Classification:
1000 object classes
1.4M/50k/100k images (train/val/test)
Detection:
200 object classes
400k/20k/40k images (train/val/test)
Dalmatian
http://image-net.org/challenges/LSVRC/{2010,…,2014}
Slide Credit: Li Fei-Fei
Data Enabling Richer Models
•  [Krizhevsky et al. NIPS12, Donahue ICML14]
–  54 million parameters
–  Trained on 1.4M images in ImageNet
[Figure: CNN architecture: Input Image → Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 1k output units]
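The layer sequence above can be sketched in plain Python as a toy, single-channel forward pass. Several choices here are assumptions made for illustration. ReLU stands in for the unspecified "Non-Linearity". The kernels, weights, and 10x10 input are made up. The network is vastly smaller than the 54-million-parameter model on the slide, and real implementations such as Caffe use many channels per layer and optimized array code.

```python
import math

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, which is what CNN convolution layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Element-wise non-linearity max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def fully_connected(vec, weights, biases):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, kernels, fc_weights, fc_biases):
    """(conv -> ReLU -> pool) per kernel, then flatten -> fully connected -> softmax."""
    x = image
    for kernel in kernels:
        x = max_pool(relu(conv2d(x, kernel)))
    flat = [v for row in x for v in row]
    return softmax(fully_connected(flat, fc_weights, fc_biases))

# Toy usage: 10x10 input, two conv layers (10 -> 8 -> 4 -> 2 -> 1), three classes.
image = [[float((r * c) % 5) for c in range(10)] for r in range(10)]
edge = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
box = [[1 / 9] * 3 for _ in range(3)]
probs = forward(image, [edge, box], [[1.0], [0.0], [-1.0]], [0.0, 0.0, 0.0])
print(probs)
```

The output is a probability per class. In the slide's network the same pattern ends in 1,000 output units, one per ImageNet category.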
Data Enabling Richer Models
•  DistBelief [Dean et al. NIPS12]
Data Enabling Richer Models
•  [Le et al. ICML12]
–  2,000 machines / 32,000 cores for 1 week
•  DistBelief [Dean et al. NIPS12]
–  16 million images and 21k categories
–  1.7 Billion parameters
–  12,000 cores
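Scaling training to millions of images usually starts with data parallelism: each worker computes a gradient on its own shard of the data, and the results are combined. The sketch below assumes synchronous gradient averaging for a one-variable linear model, with a thread pool standing in for machines. It is not DistBelief's asynchronous Downpour SGD, and it omits the model parallelism needed for billion-parameter networks.

```python
from concurrent.futures import ThreadPoolExecutor

def shard_gradient(w, b, shard):
    """Mean-squared-error gradient for y ~ w*x + b on one worker's shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw / len(shard), gb / len(shard)

def data_parallel_gd(data, num_workers=4, lr=0.1, steps=1000):
    # Round-robin sharding; with equal-sized shards the average of the shard
    # gradients equals the full-batch gradient.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w = b = 0.0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(steps):
            # Every worker sees the same (w, b) snapshot: a synchronous step.
            grads = list(pool.map(shard_gradient,
                                  [w] * num_workers, [b] * num_workers, shards))
            w -= lr * sum(g[0] for g in grads) / num_workers
            b -= lr * sum(g[1] for g in grads) / num_workers
    return w, b

# Toy data drawn exactly from y = 3x + 1 (an assumption for the example).
data = [(x / 10, 3 * (x / 10) + 1) for x in range(20)]
w, b = data_parallel_gd(data)
print(f"w = {w:.3f}, b = {b:.3f}")
```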
Historical Perspective
•  Challenges in computer vision research:
future directions of research.
Shahriar Negahdaripour and Anil K. Jain.
NSF Workshop 1991
•  Panel stressed the need for:
–  more experimental validation of models on large datasets
–  sharing of images, algorithms, and models between
research groups
–  greater interaction between academia and industry
–  complete computer vision systems that perform
real-world tasks
Back to Present
•  Frontiers in Computer Vision.
Alan Yuille and Aude Oliva.
NSF Workshop Nov 2010
•  Noticeable changes since 1991:
–  Computers are much faster, have far greater memory, and
are much cheaper.
–  Computer vision researchers have continued to learn, adapt,
develop, and apply tools from mathematics, statistics,
computer science, and engineering.
–  New tools specific to vision (e.g., SIFT and HOG)
–  The use of benchmarked image databases and learning
algorithms has become common
Back to Present
•  Frontiers in Computer Vision.
Alan Yuille and Aude Oliva.
NSF Workshop Nov 2010
•  Remaining concerns:
–  increased fragmentation of the field
–  there remains a lack of scholarship, and little progress has been made
on building on research done by others
–  computer vision datasets do not compare yet to the
complexity of the natural world
–  academic research is seen as being neither realistic enough
to help develop practical real world systems nor insightful
enough to yield new theories
Challenges
•  Big data is an enabler and an isolator!
•  All researchers are repeatedly solving the same problems
–  Build and maintain a cluster
•  Job scheduler (PBS, Torque)
•  Distributed storage (Hadoop FS)
–  Scale vision algorithms
•  Identify model/data parallelism
•  Design & implement multi-threaded vision primitives
–  Distributed computing
•  Implement mechanisms to avoid race conditions & deadlocks
•  Ensure data consistency, locking, good scheduling
[Figure labels: Logistical, Computer Vision, Distributed Computing]
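As a small illustration of the locking discipline listed above, the sketch below runs a hypothetical feature extractor on a thread pool and updates two pieces of shared state. Two practices carry the example. First, the heavy work runs outside the locks. Second, the locks are always acquired in one fixed order. Together these avoid lost updates (races) and lock-order deadlocks. `extract_features`, `feature_store`, and `stats` are placeholder names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Shared state, each piece guarded by its own lock (hypothetical names).
feature_store = {}
feature_lock = threading.Lock()
stats = {"images": 0, "features": 0}
stats_lock = threading.Lock()

def extract_features(image_id):
    """Placeholder for a real extractor: returns a fake 2-D descriptor."""
    return [image_id % 7, image_id % 11]

def process(image_id):
    descriptor = extract_features(image_id)  # expensive work happens outside any lock
    # Global lock order: feature_lock, then stats_lock. If every code path that
    # needs both locks follows this order, no two threads can deadlock waiting
    # on each other; holding the locks makes the read-modify-write updates atomic.
    with feature_lock:
        with stats_lock:
            feature_store[image_id] = descriptor
            stats["images"] += 1
            stats["features"] += len(descriptor)

def report():
    """A reader that needs only one lock takes just that one."""
    with stats_lock:
        return dict(stats)

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(process, range(1000)))

print(report())
```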
CloudCV
[Figure: CloudCV layers: Front-End (Users / Developers / Mobile Apps; CV & other researchers); http://CloudCV.org and CloudCV-API; Back-End Frameworks (Caffe); Cloud / Cluster]
CloudCV: Architecture
CloudCV: Big Picture
•  Goal: For developers
–  Reduced barrier to entry
–  Democratize Computer Vision
•  Goal: For researchers
–  Easy comparison to baselines
–  Access to state-of-the-art techniques “off-the-shelf”
•  Mini-steps
–  What we have today
–  A few algorithms
–  A few ways to reach CloudCV
–  Where we are headed
CloudCV
•  Demo 1
–  Support for ImageNet Challenge
•  Demo 2
–  Image Classification
•  Demo 3
–  Training a new classifier for your categories
•  Demo 4
–  Finding Important People in Images
•  Demo 5
–  GigaPixel Image Stitching
“Demo” 1
•  ImageNet Challenge (ILSVRC13)
–  Training: 1.4 million
–  Val: 50k
–  Test: 100k
•  Features
–  16 “industry standard”
•  DeCAF, GIST, HOG2x2, Dense/Sparse SIFT, LBP, Self-Similarity …
•  Webpage
–  http://cloudcv.org/objdetect/#features
•  Total: 400 GB, 19 months (about 1.5 years) of CPU time
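Here is a back-of-the-envelope check of what the totals above mean per image. It rests on three assumptions, none of them from the slide. An average month has 30.44 days. The CPU time covers all 1.55M train/val/test images and all 16 feature types. The cluster is a hypothetical 500 cores with perfect scaling.

```python
# Numbers from the slide
cpu_months = 19
num_images = 1_400_000 + 50_000 + 100_000  # train + val + test
num_feature_types = 16

# Assumptions (not from the slide)
days_per_month = 30.44
cluster_cores = 500  # hypothetical cluster size, perfect parallel scaling

cpu_hours = cpu_months * days_per_month * 24
cpu_seconds_per_image = cpu_hours * 3600 / num_images
seconds_per_image_per_feature = cpu_seconds_per_image / num_feature_types
wall_clock_hours = cpu_hours / cluster_cores

print(f"{cpu_hours:,.0f} CPU-hours in total")
print(f"{cpu_seconds_per_image:.1f} CPU-seconds per image (all features)")
print(f"{seconds_per_image_per_feature:.1f} CPU-seconds per image per feature type")
print(f"{wall_clock_hours:.1f} hours of wall-clock time on {cluster_cores} cores")
```

Under these assumptions the job costs roughly half a minute of CPU per image. That is prohibitive on one machine, but about a day of wall-clock time on a modest cluster.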
Demo 2
•  [Krizhevsky et al. NIPS12, Donahue ICML14]
–  Trained on 1.4M images in ImageNet
–  1000 categories
–  Available in Caffe framework from BVLC
–  http://cloudcv.org/classify/
[Figure: CNN architecture: Input Image → Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 1k output units]
Demo 2
•  Dropbox integration
–  Files can live on Dropbox
–  http://cloudcv.org/decaf-server/
Demo 2
•  How about if you want to write code?
–  Python-API: https://github.com/batra-mlp-lab/pcloudcv
•  “python run.py myconfig.json --nologin”
–  Matlab-API: https://github.com/batra-mlp-lab/mcloudcv
[Figure: CloudCV layers: Front-End; API; http://CloudCV.org; Back-End Abstraction (Caffe); Cluster]
Demo 3
•  [Krizhevsky et al. NIPS12, Donahue ICML14]
–  Trained on 1.4M images in ImageNet
–  1000 categories
–  Available in Caffe framework from BVLC
How about adding a 1001st category?
Your company logo classifier?
In a few seconds, not weeks?
http://cloudcv.org/trainaclass/
[Figure: CNN architecture: Input Image → Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 1k output units]
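One plausible way to add a category in seconds works in two parts. The pretrained network stays fixed and its activations serve as features. Only a small linear classifier is trained, separating the new class from everything else. The slide does not say exactly how CloudCV does this, so the sketch below is an assumption: a binary logistic regression trained by gradient descent, with synthetic 32-dimensional vectors standing in for real CNN features.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_new_category(features, labels, lr=0.5, epochs=200):
    """Binary logistic regression (new category vs. everything else).

    `features` are fixed vectors (e.g. activations of a pretrained CNN);
    only w and b are learned, which is why training is fast.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(dot(w, x) + b) - y  # gradient of log-loss w.r.t. the logit
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    return sigmoid(dot(w, x) + b) >= 0.5

# Synthetic stand-ins for CNN features: positives (say, your logo) centered at +0.5,
# negatives (other categories) centered at -0.5 in each of 32 dimensions.
rng = random.Random(0)
dim = 32
pos = [[rng.gauss(0.5, 1.0) for _ in range(dim)] for _ in range(50)]
neg = [[rng.gauss(-0.5, 1.0) for _ in range(dim)] for _ in range(200)]
features, labels = pos + neg, [1] * len(pos) + [0] * len(neg)

w, b = train_new_category(features, labels)
accuracy = sum(predict(w, b, x) == bool(y) for x, y in zip(features, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2%}")
```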
Who is the most important
person in the photo?
Why is this useful?
•  Better image descriptions
•  Automatic photo cropping
Two people walking past a crowd
Why is this useful?
•  Better image descriptions
•  Automatic photo cropping
•  Sort consumer photos
How do we do this?
•  Collect a large dataset
–  VT Person Importance Dataset
–  Images scraped from Flickr
–  Annotations using Mechanical Turk
•  For each face measure:
–  Distance from center
–  Scale
–  Sharpness
–  Face Pose
–  Face Occlusion
•  Train a relative importance predictor
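The recipe above can be sketched under a few stated assumptions. Faces arrive as bounding boxes plus precomputed sharpness, pose, and visibility scores in [0, 1], all of them hypothetical inputs. The two geometric cues are computed from the box. A pairwise logistic model on feature differences is then trained on synthetic pairs labeled by a made-up scoring rule. This is a generic pairwise ranker, not necessarily the model used in the VIP paper.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Face:
    x: float          # bounding-box left edge, pixels
    y: float          # bounding-box top edge, pixels
    w: float          # box width, pixels
    h: float          # box height, pixels
    sharpness: float  # assumed precomputed, 0 (blurry) to 1 (sharp)
    frontal: float    # assumed pose score, 1 = facing the camera
    visible: float    # assumed 1 - occluded fraction

def face_features(face, img_w, img_h):
    """[distance from center, scale, sharpness, pose, visibility], roughly in [0, 1]."""
    cx, cy = face.x + face.w / 2, face.y + face.h / 2
    half_diag = math.hypot(img_w, img_h) / 2
    dist = math.hypot(cx - img_w / 2, cy - img_h / 2) / half_diag
    scale = face.h / img_h  # face height relative to image height
    return [dist, scale, face.sharpness, face.frontal, face.visible]

def sigmoid(z):
    return 1 / (1 + math.exp(-z)) if z >= 0 else math.exp(z) / (1 + math.exp(z))

def train_pairwise(pairs, lr=2.0, epochs=500):
    """Logistic regression on f_a - f_b; label 1 means face a is more important."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for fa, fb, label in pairs:
            d = [a - b for a, b in zip(fa, fb)]
            err = sigmoid(sum(wi * di for wi, di in zip(w, d))) - label
            for j in range(dim):
                grad[j] += err * d[j]
        w = [wi - lr * g / len(pairs) for wi, g in zip(w, grad)]
    return w

def more_important(w, fa, fb):
    return sum(wi * (a - b) for wi, a, b in zip(w, fa, fb)) > 0

# Synthetic training pairs (hypothetical data, for this demo only).
rng = random.Random(1)
IMG_W, IMG_H = 640, 480

def random_face():
    s = rng.uniform(20, 160)
    return Face(rng.uniform(0, IMG_W - s), rng.uniform(0, IMG_H - s), s, s,
                rng.random(), rng.random(), rng.random())

def hidden_score(f):  # made-up "true" importance used only to label the pairs
    return -1.0 * f[0] + 2.0 * f[1] + 0.5 * (f[2] + f[3] + f[4])

pairs = []
for _ in range(200):
    fa = face_features(random_face(), IMG_W, IMG_H)
    fb = face_features(random_face(), IMG_W, IMG_H)
    pairs.append((fa, fb, 1 if hidden_score(fa) > hidden_score(fb) else 0))

w = train_pairwise(pairs)
acc = sum(more_important(w, fa, fb) == bool(y) for fa, fb, y in pairs) / len(pairs)
print(f"pairwise training accuracy: {acc:.2%}")
```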
Results
•  http://cloudcv.org/vip/
Method               Accuracy
Our Approach         78.91%
Center Baseline      68.46%
Scale Baseline       67.86%
Sharpness Baseline   71.03%
•  Technical Details:
–  VIP: Finding Important People in Images
–  Clint S. Mathialagan, Andrew C. Gallagher, Dhruv Batra
–  http://arxiv.org/abs/1502.05678
Parallelization
•  Some steps in vision are embarrassingly parallel
–  Ideal for MapReduce
•  However
–  Most pipelines in Computer Vision are not!
–  Example
•  Image Stitching
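Embarrassingly parallel steps fit the MapReduce pattern: an independent map job per image, followed by an associative reduce that merges partial results. The sketch below stands in for a real per-image computation with a coarse intensity histogram over synthetic images. A thread pool stands in for cluster workers, so it shows the structure rather than real speedup, since CPU-bound pure Python does not run faster across CPython threads.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_histogram(pixels, bins=4):
    """Map step: an independent per-image job (here a coarse intensity histogram)."""
    return Counter(min(p * bins // 256, bins - 1) for p in pixels)

def reduce_histograms(a, b):
    """Reduce step: merge two partial results; associative, so order doesn't matter."""
    return a + b

# Synthetic "images": 50 flat lists of 1,000 grayscale values each.
images = [[(i * 37 + j) % 256 for j in range(1000)] for i in range(50)]

with ThreadPoolExecutor(max_workers=8) as pool:
    partials = list(pool.map(map_histogram, images))

total = reduce(reduce_histograms, partials, Counter())
print(dict(sorted(total.items())))
```

Image stitching, by contrast, cannot be expressed as a single map followed by a reduce, as the next slides show.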
GigaPixel Image Stitching
Feature Extraction: Vertex Parallel
Image/Feature Matching: Edge Parallel
Global Camera Refinement: Bundle Adjustment
$$\min_{\hat{P}_i,\,\hat{X}_p} \sum_{\text{image } i} \sum_{\text{point } p} d\left(x_{ip},\, \hat{P}_i \hat{X}_p\right)$$
Non-linear optimization over camera parameters $P_i$ and 3D locations of points $X_p$
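Bundle adjustment is typically solved with Levenberg–Marquardt. Each iteration solves the damped normal equations (J^T J + lambda * diag(J^T J)) delta = -J^T r, where r is the residual vector and J its Jacobian. The damping lambda is lowered after a successful step and raised after a failed one. The minimal sketch below applies that update to a toy two-parameter curve fit, y = a * exp(b * x), rather than to cameras P_i and points X_p. Real bundle adjusters solve the same kind of system over thousands of parameters with sparse solvers.

```python
import math

def residuals(params, xs, ys):
    a, b = params
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]

def jacobian(params, xs):
    """One row per sample: [d(residual)/da, d(residual)/db]."""
    a, b = params
    return [[math.exp(b * x), a * x * math.exp(b * x)] for x in xs]

def solve_2x2(m, v):
    (p, q), (r, s) = m
    det = p * s - q * r
    return [(s * v[0] - q * v[1]) / det, (p * v[1] - r * v[0]) / det]

def levenberg_marquardt(params, xs, ys, lam=1e-2, iters=100):
    cost = sum(r * r for r in residuals(params, xs, ys))
    for _ in range(iters):
        r = residuals(params, xs, ys)
        J = jacobian(params, xs)
        JtJ = [[sum(row[i] * row[j] for row in J) for j in range(2)] for i in range(2)]
        Jtr = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(2)]
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = -J^T r
        A = [[JtJ[i][j] * (1 + lam) if i == j else JtJ[i][j] for j in range(2)]
             for i in range(2)]
        delta = solve_2x2(A, [-g for g in Jtr])
        candidate = [p + d for p, d in zip(params, delta)]
        new_cost = sum(e * e for e in residuals(candidate, xs, ys))
        if new_cost < cost:
            # Accept: trust the Gauss-Newton model more next time.
            params, cost, lam = candidate, new_cost, lam / 10
        else:
            # Reject: lean toward a short gradient-descent step (capped to stay finite).
            lam = min(lam * 10, 1e12)
    return params, cost

xs = [i / 10 for i in range(20)]
ys = [2.0 * math.exp(0.5 * x) for x in xs]  # synthetic, noise-free data
(a, b), final_cost = levenberg_marquardt([1.0, 0.1], xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}, cost = {final_cost:.2e}")
```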
GigaPixel Image Stitching
•  http://cloudcv.org/image-stitch/
Feature Extraction: Vertex Parallel
Image/Feature Matching: Edge Parallel
Global Camera Refinement (Bundle Adjustment): Graph-Parallel, Levenberg–Marquardt updates
$$\min_{\hat{P}_i,\,\hat{X}_p} \sum_{\text{image } i} \sum_{\text{point } p} d\left(x_{ip},\, \hat{P}_i \hat{X}_p\right)$$
Seam Blending: Edge Parallel
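The parallel structure of this pipeline can be sketched as a graph: images are vertices and overlapping image pairs are edges. Feature extraction becomes a map over vertices and matching a map over edges. Seam blending would be a second edge map. Global refinement couples every camera and is left out. The toy data (images as sets of feature IDs) and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Hypothetical toy data: each "image" is the set of feature IDs visible in it.
images = {
    "img0": {1, 2, 3, 4},
    "img1": {3, 4, 5, 6},
    "img2": {5, 6, 7, 8},
}
# Edges of the overlap graph: image pairs assumed to overlap.
edges = [("img0", "img1"), ("img1", "img2")]

def extract_features(item):
    """Vertex-parallel stage: depends on one image only."""
    name, content = item
    return name, sorted(content)  # stand-in for real keypoint descriptors

def match_pair(edge, features):
    """Edge-parallel stage: depends only on the two endpoint images."""
    a, b = edge
    return edge, sorted(set(features[a]) & set(features[b]))

with ThreadPoolExecutor() as pool:
    features = dict(pool.map(extract_features, images.items()))
    matches = dict(pool.map(partial(match_pair, features=features), edges))

print(matches)
```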
Where is CloudCV headed?
•  Back-end
–  Open model for contributing code
•  Dynamic Database
–  If “familiar” image, we can get you results without computing
–  If new image, we’ll cache the results for the next person
•  Lots of challenges unsolved
–  Bandwidth, optimal compression
–  Computation on front end vs back end
–  Compressions on front end that bound performance?
•  Coresets, summarization, etc
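The simplest form of the dynamic database is a cache keyed by image content. The sketch below assumes "familiar" means byte-identical and keys on a SHA-256 digest. Recognizing near-duplicates, such as resized or re-encoded copies, would need a perceptual hash or feature matching instead. `ResultCache` and the `compute` callback are hypothetical names.

```python
import hashlib

class ResultCache:
    """Return stored results for byte-identical images; compute and store otherwise."""

    def __init__(self, compute):
        self._compute = compute  # hypothetical callback, e.g. a classifier call
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._store:
            self.hits += 1  # "familiar" image: no computation needed
            return self._store[key]
        self.misses += 1    # new image: compute once, cache for the next person
        result = self._compute(image_bytes)
        self._store[key] = result
        return result

cache = ResultCache(compute=lambda data: {"num_bytes": len(data)})
cache.get(b"fake-jpeg-1")
cache.get(b"fake-jpeg-1")
cache.get(b"fake-jpeg-2")
print(cache.hits, cache.misses)
```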
Where is CloudCV headed?
•  Long way to go
•  But we think this is exciting!
•  Think about the first APIs for
–  Designing webpages
–  User authentication, Credit-card processing
–  Search, Maps, Twitter feeds, …
•  We want to do that for the scientific research and
development community.
Acknowledgements
•  Collaborator and Mentor
–  Carlos Guestrin (UW / Graphlab / Dato)
•  Sponsors
CloudCV Team:
Harsh Agrawal
Clint Solomon
Neelima Chavali
Yash Goyal
Prakriti Banik
Thanks!