Populating the Galaxy Zoo
Real-time Image Classification with SQL Server R Services
David M Smith @revodavid
R Community Lead
Microsoft Algorithms and Data Science
Meet me at the Community Zone
After this session, you can speak with me in the
Community Zone
WE MIGHT
• Discuss additional questions
• Review parts of my session in more detail
• Network
• Take selfies… ☺
Session goals
The Origin and Eventual Fate of the Universe
Computer Vision and Deep Neural Networks
Deploying a Convolutional Neural Network Using Microsoft R and SQL Server
Image Credit: NASA / Hubble
http://sploid.gizmodo.com/the-incredibly-huge-size-of-andromeda-1493036499
Whirlpool Galaxy (M51) and companion galaxy
“Grand design” spiral galaxy M81
Barred spiral galaxy NGC 1300
Elliptical galaxy IC 2006
Centaurus A, from European Southern Observatory: http://www.eso.org
NGC 3125
Forming
Image: http://www.nasa.gov/image-feature/goddard/2016/hubble-views-a-galaxy-fit-to-burst
Ancient
Spiral galaxies
Elliptical galaxies
M10
M50
Collisions and
other events
Forming
NASA, ESA, K. Kuntz (JHU), F. Bresolin (University of Hawaii), J. Trauger (Jet Propulsion Lab), J. Mould (NOAO), Y.-H. Chu (University of Illinois, Urbana), and STScI
ESO 3250G004
Ancient
The “Hubble tuning fork”
Source: Wikipedia
Galaxies in observable universe: estimates of 100 billion, 200 billion, and 2 trillion (chart labels: Hubble deep field, Hubble ultra deep)
http://www.nasa.gov/feature/goddard/2016/hubble-reveals-observable-universe-contains-10-times-more-galaxies-than-previously-thought
Professional astronomers: thousands of images
Citizen data science: 250K images
Computer vision: millions of images
The Astronomer by Johannes Vermeer (Wikipedia)
Demonstration
Data → Hidden layer(s) → Outcome
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng
A two-dimensional array of pixels
Neural network
Spiral
Elliptical
rotation, scaling, translation
Neural network
Match pieces of the image
Convolution
Matches specific shape (kernel) across entire image
Automatic feature generation
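To make "matches specific shape (kernel) across entire image" concrete, here is a minimal base R sketch of a 2-D convolution. The function name `convolve2d`, the toy image, and the vertical-bar kernel are all hypothetical illustrations, not code from the talk; the sketch assumes no padding, so the kernel never hangs over the image edge.

```r
# Slide a kernel across a 2-D image and record how strongly each patch matches.
# No padding is applied: the kernel always lies fully inside the image.
convolve2d <- function(image, kernel, stride = 1) {
  kh <- nrow(kernel)
  kw <- ncol(kernel)
  rows <- seq(1, nrow(image) - kh + 1, by = stride)
  cols <- seq(1, ncol(image) - kw + 1, by = stride)
  out <- matrix(0, nrow = length(rows), ncol = length(cols))
  for (i in seq_along(rows)) {
    for (j in seq_along(cols)) {
      patch <- image[rows[i]:(rows[i] + kh - 1),
                     cols[j]:(cols[j] + kw - 1), drop = FALSE]
      out[i, j] <- sum(patch * kernel)  # high value = patch resembles the kernel
    }
  }
  out
}

# Toy 5x5 image with a vertical bar in the middle column
image <- matrix(0, nrow = 5, ncol = 5)
image[, 3] <- 1

# Hypothetical 3x3 kernel that responds to a vertical bar (every row is -1, 2, -1)
kernel <- matrix(c(-1, 2, -1), nrow = 3, ncol = 3, byrow = TRUE)

feature_map <- convolve2d(image, kernel)
print(feature_map)  # each row is -3, 6, -3: the response peaks where the bar is centered
```

In a trained network the kernel values are learned rather than written by hand, which is what the slide means by automatic feature generation.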
Layers can be repeated several (or many) times.
Diagram: Convolution, Pooling, Convolution, Pooling; outputs: Spiral, Elliptical
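The pooling step can be sketched in base R as follows. This is an illustrative max pooling over a small matrix; the function name `max_pool` and the 4x4 input are hypothetical, and the window size and stride are chosen only for the example.

```r
# Max pooling: keep only the largest value in each window of the feature map.
max_pool <- function(feature_map, size = 2, stride = size) {
  rows <- seq(1, nrow(feature_map) - size + 1, by = stride)
  cols <- seq(1, ncol(feature_map) - size + 1, by = stride)
  out <- matrix(0, nrow = length(rows), ncol = length(cols))
  for (i in seq_along(rows)) {
    for (j in seq_along(cols)) {
      window <- feature_map[rows[i]:(rows[i] + size - 1),
                            cols[j]:(cols[j] + size - 1), drop = FALSE]
      out[i, j] <- max(window)
    }
  }
  out
}

# Toy 4x4 feature map, filled column by column with 1..16
feature_map <- matrix(1:16, nrow = 4, ncol = 4)
pooled <- max_pool(feature_map, size = 2)
print(pooled)  # 2x2 result: rows are (6, 14) and (8, 16)
```

Pooling shrinks the feature map while keeping the strongest responses, which makes the following layers cheaper and less sensitive to small shifts in the image.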
R Usage Growth
Rexer Data Miner Survey, 2007-2015
76% of analytic professionals report using R
36% select R as their primary tool
Language Popularity
IEEE Spectrum Top Programming Languages, 2016
ConnectR
Microsoft R Open
RevoScaleR
MicrosoftML
DistributedR
Available in: Microsoft R Server 9, SQL Server 2016/2017
Load the required R packages
Run the neural network
Use GPU acceleration
Specify hyperparameters
library(RevoScaleR)
library(MicrosoftML)
# Train a multi-class neural network on the galaxy images
model <- rxNeuralNet(
  formula, data = galaxy_data,
  netDefinition = netDefinition,  # NET# network specification
  type = "multiClass",
  acceleration = "gpu",           # train on the GPU
  miniBatchSize = 32,
  initWtsDiameter = 0.1,
  numIterations = 50)
What about the network
definition?
NET#
https://docs.microsoft.com/en-us/azure/machinelearning/machine-learning-azure-ml-netsharpreference-guide
input pixels [3, 50, 50];
hidden conv1 [64, 24, 24] rlinear from pixels convolve {
  KernelShape = [3, 5, 5];
  Stride      = [1, 2, 2];
  MapCount    = 64;
}
hidden rnorm1 [64, 11, 11] from conv1 response norm {
  KernelShape = [1, 4, 4];
  Stride      = [1, 2, 2];
}
hidden pool1 [64, 9, 9] from rnorm1 max pool {
  KernelShape = [1, 3, 3];
}
hidden hid1 [256] rlinear from pool1 all;
hidden hid2 [256] rlinear from hid1 all;
output Class [13] softmax from hid2 all;
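The layer sizes declared in the NET# specification can be checked with the usual sliding-window arithmetic. The base R sketch below assumes the standard formula floor((input + padding - kernel) / stride) + 1 applies to each spatial dimension; the helper `conv_output_size` and its `padding` argument are hypothetical. The specification as shown lists no padding for conv1, and the declared width of 24 is consistent with one pixel of padding, which is an assumption here rather than something the slide states. A stride of 1 is assumed for pool1, since none is declared.

```r
# Output length along one spatial dimension of a sliding-window layer.
# `padding` is the total number of padded pixels along that dimension (assumption).
conv_output_size <- function(input, kernel, stride = 1, padding = 0) {
  floor((input + padding - kernel) / stride) + 1
}

conv1_no_pad <- conv_output_size(50, kernel = 5, stride = 2)               # 23
conv1_padded <- conv_output_size(50, kernel = 5, stride = 2, padding = 1)  # 24, as declared
rnorm1       <- conv_output_size(24, kernel = 4, stride = 2)               # 11, as declared
pool1        <- conv_output_size(11, kernel = 3, stride = 1)               # 9, as declared

print(c(conv1_no_pad = conv1_no_pad, conv1_padded = conv1_padded,
        rnorm1 = rnorm1, pool1 = pool1))
```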
Network diagram: input images [3, 50, 50] → convolve (rlinear, kernel [3, 5, 5], 64 maps) → normalize (response norm, kernel [1, 4, 4]) → max pooling (max pool, kernel [1, 3, 3]) → fully connected (all) → fully connected (all) → output ([13] softmax)
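The final layer turns 13 raw scores into class probabilities with a softmax. A minimal base R sketch follows; the function name `softmax` and the random scores are illustrative placeholders, not values from the trained model.

```r
# Softmax: convert raw scores into probabilities that sum to 1.
# Subtracting the maximum first avoids overflow in exp() without changing the result.
softmax <- function(scores) {
  shifted <- scores - max(scores)
  exp(shifted) / sum(exp(shifted))
}

set.seed(1)
scores <- rnorm(13)                   # placeholder scores for the 13 output classes
probs <- softmax(scores)
predicted_class <- which.max(probs)   # index of the most probable class

print(round(probs, 3))
print(predicted_class)
```

Because softmax preserves the ordering of the scores, the predicted class is always the one with the highest raw score.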
Azure storage
Storage blob
Images
SQL Server
Train model
Data Science Virtual machine
Skyserver database
Web
SQL2016 R Services
Azure N Series GPU VM
Azure
Train neural network using GPU on Azure
GPU = Graphics processing unit
CPU: 30 hrs; GPU: 3 hrs
Call to remote SQL Server instance with R inside
How is it Integrated?
R Integration
• T-SQL calls a Stored Procedure
• Script is run in SQL through extensibility model
• Result sets sent through Web API to database or applications
Benefits
• Faster deployment of ML models
• Less data movement, faster insights
• Work with large datasets: mitigate R memory and scalability limitations
Example Solutions
• Fraud detection
• Sales forecasting
• Warehouse efficiency
• Predictive maintenance
Data Scientist: Interacts directly with data; creates models and experiments
Data Analyst/DBA: Manages data and analytics together
Diagram labels: Extensibility; T-SQL Interface; open source/Microsoft R; Relational Data; Analytic Library
Demonstration
Publish service with mrsdeploy
Easy Consumption
Data Scientist
Easy Deployment
Microsoft R Client
(mrsdeploy package)
Data Scientist
Microsoft R Client
Services /
Sessions
publishService
(mrsdeploy package)
Microsoft R Server configured for operationalizing R analytics
Easy Setup
▪ In-cloud or on-prem
▪ Adding nodes to scale
▪ High availability & load balancing
▪ Remote execution server
Developer
Easy Integration
100K * 3: Training images, augmented with rotation
8: Layers in deep network
176K: Weights to compute in network
2.5B: Weight updates per second
1.8 hours: Computing time on Azure N series GPU
88%: Overall accuracy - training data
55%: Overall accuracy - test data
The technique works, but has scope for improvement!
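Augmenting training images with rotation can be sketched in base R as below. The slides do not say which rotation angles were used; this example assumes each image is kept as-is and also rotated by 90 and 180 degrees, which would triple the training set, and both function names are hypothetical.

```r
# Rotate a matrix (one image channel) by 90 degrees clockwise.
rotate90 <- function(img) t(img[nrow(img):1, , drop = FALSE])

# Return the original image plus two rotated copies (assumed angles: 90 and 180 degrees).
augment_with_rotation <- function(img) {
  list(
    original = img,
    rot90    = rotate90(img),
    rot180   = rotate90(rotate90(img))
  )
}

img <- matrix(1:4, nrow = 2)   # toy 2x2 "image": rows are (1, 3) and (2, 4)
augmented <- augment_with_rotation(img)

print(augmented$rot90)    # rows are (2, 1) and (4, 3)
print(augmented$rot180)   # rows are (4, 2) and (3, 1)
```

A galaxy's class does not depend on its orientation in the sky, so rotated copies are valid extra training examples.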
• Convolutional neural nets can predict galaxy class
• You can use R Server to train and deploy a model
• Use Azure GPU machines for faster training
• Deploy to SQL Server
Easy deployment
Build the model first
Deploy as a web service instantly
Johannes Vermeer, The Astronomer
R Open
Microsoft R Server
RTVS
R Open
• Open source R
• Compatible with CRAN
• MKL for fast linear algebra
ScaleR
• Parallel computing
• Large scale analytics
DeployR
ConnectR
• Connectivity to databases and Hadoop
DistributedR
• Distributed computing
• Cross-platform portability
Scalable computing, storage and services
SQL Server 2016 Enterprise Edition
SQL Server R Services
SQL Server Query Processor
Integration Facilities:
• Component Integration
  • Launchers
  • Parameter Passing
  • Results Return
  • Console Output Return
• Parallel Data Exchange (RTM)
• Stored Procedures
• Package Administration
Microsoft R Open
Open Source R Interpreter
• 100% Open Source R
• Fully CRAN Compatible
• Accelerated Math
Algorithm Library
Fast, Parallel, Storage Efficient Algorithms
• Data Prep
• Descriptive Stats
• Sampling
• Statistical Tests
• Predictive Models
• Variable Selection
• Clustering
• Classification
• Custom APIs for R + CRAN
• Parallel Scoring