Machine Learning: MLlib and Tensorflow
Amir H. Payberah ([email protected])
KTH Royal Institute of Technology
2016/10/10

Data → Actionable Knowledge
That is roughly the problem that Machine Learning addresses!

Data and Knowledge
Data → Knowledge:
- Is this email spam or no spam?
- Is there a face in this picture?
- Should I lend money to this customer given his spending behaviour?

Data and Knowledge
- Knowledge is not concrete:
  • Spam is an abstraction
  • Face is an abstraction
  • Who to lend to is an abstraction
- You do not find spam, faces, and financial advice in datasets; you just find bits!

Knowledge Discovery from Data (KDD)
- Preprocessing
- Data mining
- Result validation

KDD - Preprocessing
- Data cleaning
- Data integration
- Data reduction, e.g., sampling
- Data transformation, e.g., normalization

KDD - Mining Functionalities
- Classification and regression (supervised learning)
- Clustering (unsupervised learning)
- Mining frequent patterns
- Outlier detection

KDD - Result Validation
- Evaluates the performance of the model against some criteria.
- The criteria depend on the application and its requirements.

Data Mining Functionalities
- Classification and regression (supervised learning)
- Clustering (unsupervised learning)
- Mining frequent patterns
- Outlier detection

Classification and Regression (Supervised Learning)

Supervised Learning (1/3)
- Right answers are given:
  • Training data (input data) is labeled, e.g., spam/not-spam or a stock price at a time.
- A model is prepared through a training process.
- The training process continues until the model achieves a desired level of accuracy on the training data.

Supervised Learning (2/3)
- Face recognition
[Figure: training data and testing data, ORL dataset, AT&T Laboratories, Cambridge UK]

Supervised Learning (3/3)
- A set of n training examples: (x_1, y_1), ..., (x_n, y_n).
- x_i = <x_i1, x_i2, ..., x_im> is the feature vector of the ith example.
- y_i is the label of the ith example.
- A learning algorithm seeks a function f such that y_i = f(x_i).

Classification vs. Regression
- Classification: the output variable takes class labels.
- Regression: the output variable takes continuous values.

Classification/Regression Models
- Linear models
- Decision trees
- Naive Bayes models

Linear Models (1/2)
- Training dataset: (x_1, y_1), ..., (x_n, y_n).
- x_i = <x_i1, x_i2, ..., x_im>
- Model the target as a function of a linear predictor applied to the input variables: y_i = g(w^T x_i).
  • E.g., y_i = w_1 x_i1 + w_2 x_i2 + ... + w_m x_im
- Loss function: f(w) := \sum_{i=1}^{n} L(g(w^T x_i), y_i)
- An optimization problem: \min_{w \in R^m} f(w)

Linear Models (2/2)
- g(w^T x_i) = w_1 x_i1 + w_2 x_i2 + ... + w_m x_im
- Loss function: minimize the squared difference between the predicted and the actual value: L(g(w^T x_i), y_i) := (1/2)(w^T x_i - y_i)^2
- Gradient descent (a minimal sketch follows below).

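To make the gradient descent step concrete, here is a minimal sketch in plain Scala; it is not MLlib's implementation, and the toy data, step size, and iteration count are assumptions for illustration. For f(w) = \sum_i (1/2)(w^T x_i - y_i)^2 the gradient is \sum_i (w^T x_i - y_i) x_i, which is what the loop accumulates.

object GradientDescentSketch {
  // Dot product of the weight vector and a feature vector.
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  def main(args: Array[String]): Unit = {
    // Toy dataset generated from y = 2*x_1 + 3*x_2 (an assumption).
    val data = Seq(
      (Array(1.0, 1.0), 5.0),
      (Array(2.0, 1.0), 7.0),
      (Array(1.0, 2.0), 8.0))
    var w = Array(0.0, 0.0) // initial weights
    val stepSize = 0.1      // learning rate (an assumption)
    for (_ <- 1 to 100) {
      // Full-batch gradient: sum_i (w^T x_i - y_i) * x_i.
      val grad = data.map { case (x, y) =>
        val err = dot(w, x) - y
        x.map(_ * err)
      }.reduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })
      // Step against the gradient.
      w = w.zip(grad).map { case (wi, gi) => wi - stepSize * gi }
    }
    println(w.mkString(", ")) // approaches 2.0, 3.0
  }
}
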
Decision Tree (1/2)
- A greedy algorithm.
- It performs a recursive binary partitioning of the feature space.
- Decision tree construction algorithm:
  • Find the best split condition (quantified based on the impurity measure; see the sketch after this list).
  • Stop when no improvement is possible.

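For intuition, here is a minimal Scala sketch of one common impurity measure, Gini impurity (the one the classification examples later in these slides use); the label counts below are toy assumptions:

// Gini impurity: 1 - sum_k p_k^2, where p_k is the fraction of examples of class k.
def gini(labelCounts: Seq[Int]): Double = {
  val total = labelCounts.sum.toDouble
  1.0 - labelCounts.map(c => (c / total) * (c / total)).sum
}

println(gini(Seq(5, 5)))  // 0.5: a maximally impure two-class node
println(gini(Seq(10, 0))) // 0.0: a pure node

A split is good if it reduces the weighted impurity of the resulting child nodes.
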
Decision Tree (2/2)
- Random forest: train a set of decision trees separately.
- The training can be done in parallel.
- The algorithm injects randomness into the training process, so that each decision tree is a bit different.

Naive Bayes (1/3)
- Uses probability theory to classify things.
- Find p(y|x).
- Replace p(y|x) with p(x|y)p(y) / p(x).

Naive Bayes (2/3)
- Bayes' theorem: p(y|x) = p(x|y)p(y) / p(x)
- p(y|x): probability of instance x being in class y.
- p(x|y): probability of generating instance x given class y.
- p(y): probability of occurrence of class y.
- p(x): probability of instance x occurring.

Naive Bayes (3/3)
Is officer Drew male or female?
- p(y|x) = p(x|y)p(y) / p(x)
- p(male|drew) = p(drew|male)p(male) / p(drew) = (1/3 × 3/8) / (3/8) = 0.33
- p(female|drew) = p(drew|female)p(female) / p(drew) = (2/5 × 5/8) / (3/8) = 0.66
- Officer Drew is female.

Clustering (Unsupervised Learning)

Clustering (1/4)
- Clustering is a technique for finding similarity groups in data, called clusters.
- It groups data instances that are similar to each other into one cluster, and data instances that are very different from each other into different clusters.
- Clustering is often called an unsupervised learning task, as no class values denoting an a priori grouping of the data instances are given.

Clustering (2/4)
- k-means clustering is a popular method for clustering.

Clustering (3/4)
- K: the number of clusters (given).
  • One mean per cluster.
- Initialize means by picking K samples at random.
- Iterate (see the sketch after this list):
  • Assign each point to the nearest mean.
  • Move each mean to the center of its cluster.

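A minimal sketch of those two steps in plain Scala; it is not MLlib's implementation, and the one-dimensional points, k = 2, and fixed iteration count are assumptions for illustration:

object KMeansSketch {
  def main(args: Array[String]): Unit = {
    val points = Seq(1.0, 1.5, 2.0, 9.0, 9.5, 10.0) // toy 1-D data (an assumption)
    var means = Seq(points(0), points(1))           // k = 2, initialized from samples
    for (_ <- 1 to 10) {
      // Step 1: assign each point to its nearest mean.
      val clusters = points.groupBy(p => means.minBy(m => math.abs(p - m)))
      // Step 2: move each mean to the center of its cluster.
      means = means.map(m => clusters.get(m).map(c => c.sum / c.size).getOrElse(m))
    }
    println(means) // settles at roughly List(1.5, 9.5)
  }
}

Each pass regroups the points around the current means and then recenters the means; a production implementation would also handle empty clusters and test for convergence.
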
Clustering (4/4)
[Figure: k-means iterations illustrated]

MLlib

MLlib Data Types - Local Vector
- Stored on a single machine.
- Dense and sparse representations:
  • Dense (1.0, 0.0, 3.0): [1.0, 0.0, 3.0]
  • Sparse (1.0, 0.0, 3.0): (3, [0, 2], [1.0, 3.0])

import org.apache.spark.mllib.linalg.{Vector, Vectors}

val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)
val sv1: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))
val sv2: Vector = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))

Data Types - Labeled Point
- A local vector (dense or sparse) associated with a label.
- label: the label for this data point.
- features: the list of features for this data point.

case class LabeledPoint(label: Double, features: Vector)

import org.apache.spark.mllib.regression.LabeledPoint

val pos = LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0))
val neg = LabeledPoint(0.0, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))

Supervised Learning in Spark
- Linear models
- Decision trees
- Naive Bayes models

Linear Models - Regression

import org.apache.spark.mllib.regression.LinearRegressionWithSGD

val data: RDD[LabeledPoint] = ...
val splits = data.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

val numIterations = 100
val stepSize = 0.00000001
val model = LinearRegressionWithSGD
  .train(trainingData, numIterations, stepSize)

val valuesAndPreds = testData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}

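A natural follow-up, mirroring the MLlib documentation (it is not on the original slide), is to score the model with the mean squared error over the test pairs:

// Mean squared error over (label, prediction) pairs.
val MSE = valuesAndPreds.map { case (label, pred) => math.pow(label - pred, 2) }.mean()
println("Test Mean Squared Error = " + MSE)
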
Linear Models - Classification (Logistic Regression)

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

val data: RDD[LabeledPoint] = ...
val splits = data.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(10)
  .run(trainingData)

val predictionAndLabels = testData.map { point =>
  val prediction = model.predict(point.features)
  (prediction, point.label)
}

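Again following the MLlib documentation (not on the original slide), classification quality can be summarized with MulticlassMetrics; the accuracy field assumes Spark 2.0 or later:

import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Overall fraction of correctly classified test points (Spark 2.0+).
val metrics = new MulticlassMetrics(predictionAndLabels)
println("Accuracy = " + metrics.accuracy)
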
Decision Tree - Regression

import org.apache.spark.mllib.tree.DecisionTree

val data: RDD[LabeledPoint] = ...
val splits = data.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

val categoricalFeaturesInfo = Map[Int, Int]()
val impurity = "variance"
val maxDepth = 5
val maxBins = 32

val model = DecisionTree.trainRegressor(trainingData,
  categoricalFeaturesInfo, impurity, maxDepth, maxBins)

val labelsAndPredictions = testData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}

Decision Tree - Classification

import org.apache.spark.mllib.tree.DecisionTree

val data: RDD[LabeledPoint] = ...
val splits = data.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

val numClasses = 2
val categoricalFeaturesInfo = Map[Int, Int]()
val impurity = "gini"
val maxDepth = 5
val maxBins = 32

val model = DecisionTree.trainClassifier(trainingData, numClasses,
  categoricalFeaturesInfo, impurity, maxDepth, maxBins)

val labelAndPreds = testData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}

Random Forest - Regression

import org.apache.spark.mllib.tree.RandomForest

val data: RDD[LabeledPoint] = ...
val splits = data.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

val categoricalFeaturesInfo = Map[Int, Int]()
val numTrees = 3
val featureSubsetStrategy = "auto" // let the algorithm choose
val impurity = "variance"
val maxDepth = 4
val maxBins = 32

val model = RandomForest.trainRegressor(trainingData, categoricalFeaturesInfo,
  numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)

val labelsAndPredictions = testData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}

Random Forest - Classification

import org.apache.spark.mllib.tree.RandomForest

val data: RDD[LabeledPoint] = ...
val splits = data.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

val numClasses = 2
val categoricalFeaturesInfo = Map[Int, Int]()
val numTrees = 3
val featureSubsetStrategy = "auto" // let the algorithm choose
val impurity = "gini"
val maxDepth = 4
val maxBins = 32

val model = RandomForest.trainClassifier(trainingData, numClasses,
  categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity,
  maxDepth, maxBins)

val labelAndPreds = testData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}

Naive Bayes

import org.apache.spark.mllib.classification.NaiveBayes

val data: RDD[LabeledPoint] = ...
val splits = data.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

val model = NaiveBayes.train(trainingData, lambda = 1.0,
  modelType = "multinomial")

val predictionAndLabel = testData.map(p =>
  (model.predict(p.features), p.label))

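As with the other classifiers, the MLlib documentation follows this with a simple accuracy computation over the test set (not on the original slide):

// Fraction of test points whose predicted label matches the true label.
val accuracy = predictionAndLabel.filter(x => x._1 == x._2).count().toDouble / testData.count()
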
Unsupervised Learning in Spark
- K-means clustering

KMeans Clustering

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vector

val data: RDD[Vector] = ...
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(data, numClusters, numIterations)

// Evaluate clustering by computing Within Set Sum of Squared Errors
val WSSSE = clusters.computeCost(data)
println("Within Set Sum of Squared Errors = " + WSSSE)

Tensorflow
- TensorFlow:
  • An interface for expressing machine learning algorithms
  • An implementation for executing such algorithms
- Tensor: the data structure used.
- Flow: the computational model.

Programming Model - Dataflow
- A computation is described by a directed graph composed of a set of nodes.
- Tensors (values) flow along normal edges in the graph.
- Control dependencies are special edges along which no data flows.

Programming Model - Operations and Kernels
- Operation: a node in the dataflow graph that represents an abstract computation.
- Operations can have attributes; they must be provided or inferred at graph-construction time.
  • Attributes make operations polymorphic over different tensor element types, e.g., add of two tensors of type float or int32.
- Kernel: a particular implementation of an operation.
  • It can be run on a particular type of device, e.g., CPU or GPU.

Programming Model - Tensors (1/4)
- Tensor: the data structure used.
- Dense numerical arrays of the same datatype.
- Specified by:
  • Rank
  • Shape
  • Type

Programming Model - Tensors (2/4)
- Rank: the level of nesting, e.g., a scalar has rank 0, a vector rank 1, and a matrix rank 2.

Programming Model - Tensors (3/4)
- Shape: the number of elements in each dimension, e.g., a 2×3 matrix has shape [2, 3].

Programming Model - Tensors (4/4)
- Type: the datatype of the numbers stored in the tensor, e.g., float32 or int32.

Programming Model - Variables
- Most tensors do not survive past a single execution of the graph.
- Variable: a special kind of operation that returns a handle to a persistent mutable tensor that survives across executions of a graph.
- Handles to these persistent mutable tensors can be passed to a handful of special operations, e.g., AssignAdd (equivalent to +=), that mutate the referenced tensor.

Programming Model - Sessions
- Session: client programs interact with the TensorFlow system by creating a Session.
- Extend method: augments the current graph managed by the session with additional nodes and edges.
- Run method: computes the transitive closure of all nodes that must be executed to compute the requested outputs.
- Usually a Session is set up once, and the full graph or a few distinct subgraphs are then executed thousands or millions of times via Run calls.

Example - Logistic Regression
- ŷ = 1 / (1 + e^{-(β_0 + β·x)})

import tensorflow as tf

x = tf.placeholder(tf.float32, [1])  # add a placeholder node for the input x
beta0 = tf.Variable(tf.zeros(1))     # the bias
beta = tf.Variable(tf.ones(1))       # the coefficient for x
log_odds = beta0 + x * beta          # perform the linear regression step
y_hat = 1 / (1 + tf.exp(-log_odds))  # apply the logistic function

initializer = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(initializer)
    pred = sess.run(y_hat, feed_dict={x: [1]})  # evaluate the y_hat node
    print(pred)

Implementation
- Master-worker model
- Two implementations:
  • Local (single machine)
  • Distributed (multiple machines)

Summary
- Data mining: classification, clustering, frequent patterns, anomaly detection
- MLlib and Tensorflow

Questions?