Data Preprocessing: Classification & Regression
Supervised Learning
Regression
Given the value of an input X, the output Y belongs to the set of real values R. The goal is to predict the output accurately for a new input.

Classification
The predictions or outputs y are categorical, while x can take any set of values (real or categorical). The goal is to select the correct class for a new instance.
Classification
The task of assigning objects to one of several predefined categories.
Classification
Feature set (X) → Classification Model → Class label (y)
Definition of Classification
• The input data for a classification task is a collection of records, each known as an instance, composed of a feature set and a class label.
• Classification, then, is the task of learning a target function f that maps each feature set x to one of the predefined class labels y.
• The target function is also known informally as the classification model. A classification model is useful for several purposes.
Types of Classification Models
Two purposes of classification models:
1. Descriptive Modeling: A classification model can serve as an explanatory tool to distinguish between objects of different classes.
2. Predictive Modeling: A classification model can also be used to predict the class label of unknown records.
Classification Models: An Example
Input features X: Make, Cylinders, Length, Weight. Class Y: Style. Each row is an instance.

Make    Cylinders  Length  Weight  Style
Honda   Four       150     1956    Hatchback
Toyota  Four       167.9   2280    Wagon
BMW     Six        176.8   2765    Sedan

Descriptive modeling: Explain what features define a hatchback, wagon, or sedan.
Predictive modeling: Given another car, determine the style to which it belongs.
Building a Classification Model
• A classification technique (or classifier) is a systematic approach to building classification models from an input dataset.
• Each technique employs a learning algorithm to identify the model that best fits the relationship between the features and the class of the input data.
• A key objective of the learning algorithm is to build models with good generalization capability.
Building a Classification Model
[Diagram: a Training Set feeds a Learning Algorithm, which learns a Model; the Model is then applied to a Test Set.]
Types of Learners
• Eager learners learn a model that maps the input features to the class label as soon as the training data becomes available.
• Lazy learners delay modeling the training data until it is needed to classify the test examples.
  – An example of a lazy learner is a rote classifier, which memorizes the entire training data and performs classification only if a test instance matches one of the training examples exactly.
Rote Learning
Make    Cylinders  Length  Weight  Style
Honda   Four       150     1956    Hatchback
Toyota  Four       167.9   2280    Wagon
BMW     Six        176.8   2765    Sedan

Test instance: BMW, Six, 176.8, 2765. It matches the third training example exactly, so the rote classifier predicts Sedan.
Nearest Neighbor Classifier
The Idea:
Find the training instance that is closest to the test instance, and classify the test instance with that instance's class (see the sketch after this list).
Problems:
– Computationally expensive.
– Not interpretable.
– Very sensitive to noise/outliers.
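A minimal sketch of this idea in Python, assuming numeric feature vectors and plain Euclidean distance (the function names and toy data are illustrative, not from the slides):

```python
import math

def euclidean(a, b):
    # Straight-line distance between two numeric feature vectors.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nn_classify(train, test_x):
    # train is a list of (feature_vector, label) pairs; return the
    # label of the single closest training instance.
    _, label = min(train, key=lambda pair: euclidean(pair[0], test_x))
    return label

# Toy usage with two classes of points in the plane.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")]
print(nn_classify(train, (0.9, 1.1)))  # -> "A"
```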
Nearest Neighbor Learning
Make    Cylinders  Length  Weight  Style
Honda   Four       150     1956    Hatchback
Toyota  Four       167.9   2280    Wagon
BMW     Six        176.8   2765    Sedan

Test instance: BMW, Six, 174.2, 2456. The closest training example is the BMW sedan, so the classifier predicts Sedan.
π‘˜-Nearest Neighbor (kNN) Classifier
The Idea:
Find the π‘˜ training instances that are closest to the test
instance, and classify the test instance as the majority
class of the k nearest training instances.
β€œIf it walks like a duck, quacks like a duck, and
looks like a duck, then it’s probably a duck.”
13
π‘˜-Nearest Neighbor (kNN) Classifier
Training
New Instance
14
?
Classification
The kNN Algorithm
The k-nearest neighbor classification algorithm:
1: Let k be the number of nearest neighbors and D be the set of training examples.
2: for each test example z = (x', y') do
3:   Compute d(x', x), the distance between z and every example (x, y) ∈ D.
4:   Select D_z ⊆ D, the set of k closest training examples to z.
5:   y' = argmax_v Σ_{(x_i, y_i) ∈ D_z} I(v = y_i)
6: end for
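A straightforward Python rendering of steps 3-5 for a single test example, assuming numeric features and Euclidean distance (ties in the vote are broken arbitrarily):

```python
import math
from collections import Counter

def knn_classify(D, test_x, k):
    # D is the set of training examples as (feature_vector, label) pairs.
    # Step 3: compute d(x', x) for every training example.
    dists = [(math.dist(x, test_x), y) for x, y in D]
    # Step 4: select D_z, the k closest training examples.
    D_z = sorted(dists)[:k]
    # Step 5: y' = argmax_v of the count of I(v = y_i) over D_z (majority vote).
    votes = Counter(y for _, y in D_z)
    return votes.most_common(1)[0][0]
```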
Nearest Neighbors
[Figure: the 1-nearest, 2-nearest, and 3-nearest neighborhoods of an example x.]
The k nearest neighbors of an example x are the data points that have the k smallest distances to x.
Choosing the k for kNN
β€’ If π‘˜ is too small, then the nearest-neighbor classifier
may be susceptible to overfitting because of noise in
the training data.
β€’ If π‘˜ is too large, the nearest-neighbor classifier may
misclassify the test instance because its list of nearest
neighbors may include data points that are located far
away from its neighborhood.
17
Choosing the k for kNN
[Figure: k-nearest neighbor classification with small k (left) versus large k (right).]
kNN Prediction: Classification
Classification: Take the majority vote of the class labels among the k nearest neighbors.

Majority voting:

y' = argmax_v Σ_{(x_i, y_i) ∈ D_z} I(v = y_i)

where v is a class label, y_i is the class label of one of the nearest neighbors, and I(·) is an indicator function that returns 1 if its argument is true and 0 otherwise.
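As a concrete illustration of the vote (the neighbor labels below are made up for the example):

```python
from collections import Counter

# Class labels of the k = 5 nearest neighbors of some test instance.
neighbor_labels = ["Sedan", "Wagon", "Sedan", "Sedan", "Hatchback"]

# argmax_v of the count of I(v = y_i): count each label, take the most frequent.
y = Counter(neighbor_labels).most_common(1)[0][0]
print(y)  # -> Sedan (3 of the 5 votes)
```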
kNN Prediction: Regression
Regression: Predict the average of the target values of the k nearest neighbors.

Average value:

y' = (1/|D_z|) Σ_{(x_i, y_i) ∈ D_z} y_i

where y_i is the target value of one of the nearest neighbors and |D_z| = k.
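A minimal sketch, reusing the same neighbor search as before and assuming numeric targets (the toy data is illustrative):

```python
import math

def knn_regress(D, test_x, k):
    # D is a list of (feature_vector, numeric target) pairs.
    D_z = sorted((math.dist(x, test_x), y) for x, y in D)[:k]
    # y' = (1/|D_z|) * sum of the targets of the k nearest neighbors.
    return sum(y for _, y in D_z) / k

# Toy usage: predict a car's weight from its length.
train = [((150.0,), 1956.0), ((167.9,), 2280.0), ((176.8,), 2765.0)]
print(knn_regress(train, (170.0,), k=2))  # (2280 + 2765) / 2 = 2522.5
```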
kNN Prediction: Distance-Weighted
• In the majority voting approach, every neighbor has the same impact on the classification.
• Instead, the influence of each nearest neighbor x_i can be weighted according to its distance: w_i = 1 / d(x', x_i)^2.
• The class label can then be determined as follows:

y' = argmax_v Σ_{(x_i, y_i) ∈ D_z} w_i × I(v = y_i)
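The same vote with inverse-square-distance weights; the small epsilon is an added guard (not in the slide) for the case where the test instance coincides with a training point:

```python
import math
from collections import defaultdict

def weighted_knn_classify(D, test_x, k):
    # The k closest training examples, as (distance, label) pairs.
    D_z = sorted((math.dist(x, test_x), y) for x, y in D)[:k]
    scores = defaultdict(float)
    for d, y in D_z:
        # w_i = 1 / d(x', x_i)^2, with an epsilon so d = 0 cannot divide by zero.
        scores[y] += 1.0 / (d * d + 1e-12)
    # y' = argmax_v of the sum of w_i * I(v = y_i)
    return max(scores, key=scores.get)
```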
kNN Efficiency
• Very efficient in model induction (training)
  – Only needs to store the training data.
• Not particularly efficient in testing
  – Requires computing the distance to every training instance.
• Note that 1NN and kNN are equally efficient
  – Retrieving the k nearest neighbors is (almost) no more expensive than retrieving a single nearest neighbor.
  – The k nearest neighbors can be maintained in a queue (see the sketch below).
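One way to realize that queue is a bounded max-heap over distances, sketched below. Python's heapq is a min-heap, so distances are stored negated; that is an implementation choice, not something the slide prescribes:

```python
import heapq
import math

def k_nearest(D, test_x, k):
    # Single pass over the training data: O(n log k) time, O(k) extra space.
    heap = []  # entries are (-distance, label); the root is the farthest kept
    for x, y in D:
        d = math.dist(x, test_x)
        if len(heap) < k:
            heapq.heappush(heap, (-d, y))
        elif -d > heap[0][0]:  # closer than the farthest of the current k
            heapq.heapreplace(heap, (-d, y))
    return sorted((-nd, y) for nd, y in heap)  # (distance, label), nearest first
```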
kNN Summary
• Nearest-neighbor classification is part of a more general technique known as instance-based learning.
• Lazy learners such as nearest-neighbor classifiers do not require model building.
• They generate their predictions based on local information.
• They can produce arbitrarily shaped decision boundaries.
• They can easily produce wrong predictions without appropriate data preprocessing.
Back to Building a Classification Model
[Diagram: a Training Set feeds a Learning Algorithm, which learns a Model; the Model is then applied to a Test Set.]
For Nearest Neighbors Classification,
the model is the training set itself.
Let's see how a model that actually learns works.
Components of Learning
• Suppose that a bank wants to automate the process of evaluating credit card applications.
  – Input x (customer information that is used to make a credit application).
  – Target function f: X → Y (ideal formula for credit approval), where X and Y are the input and output space, respectively.
  – Dataset D of input-output examples (x_1, y_1), ..., (x_n, y_n).
  – Hypothesis (skill) with hopefully good performance: g: X → Y (the "learned" formula to be used).
Components of Learning
Unknown Target Function f: X → Y (ideal credit approval formula)
        ↓
Training Examples (x_1, y_1), ..., (x_n, y_n) (historical records of credit customers)
        ↓
Learning Algorithm A  ←  Hypothesis Set H (set of candidate formulas)
        ↓
Final Hypothesis g ≈ f (learned credit approval formula)

Use the data to compute a hypothesis g that approximates the target f.
Simple Learning Model: The Perceptron
For x = (x_1, ..., x_d) ("features of the customer"), compute a weighted score and:

Approve credit if  Σ_{i=1}^{d} w_i x_i > threshold,
Deny credit if     Σ_{i=1}^{d} w_i x_i < threshold.
Simple Learning Model: The Perceptron
This formula can be written more compactly as

h(x) = sign( Σ_{i=1}^{d} w_i x_i − threshold ),

where h(x) = +1 means 'approve credit' and h(x) = −1 means 'deny credit'; sign(s) = +1 if s > 0 and sign(s) = −1 if s < 0. This model is called a perceptron.
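In code, the whole model is one weighted sum and a threshold. A minimal sketch (sign(0) is mapped to −1 here as a convention; the slide leaves that case undefined):

```python
def perceptron(x, w, threshold):
    # h(x) = sign(sum_i w_i * x_i - threshold); +1 = approve, -1 = deny.
    s = sum(wi * xi for wi, xi in zip(w, x)) - threshold
    return 1 if s > 0 else -1
```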
Simple Learning Model: The Perceptron
Example: three binary input features x_1, x_2, x_3 and output y:

x_1  x_2  x_3    y
 1    0    0   -1
 1    0    1    1
 1    1    0    1
 1    1    1    1
 0    0    1   -1
 0    1    0   -1
 0    1    1    1
 0    0    0   -1

[Diagram: input nodes x_1, x_2, x_3 each feed the output node Σ with weight 0.3; the output node applies threshold t = 0.4 to produce y.]

With these values, y = sign(0.3 x_1 + 0.3 x_2 + 0.3 x_3 − 0.4), which is +1 exactly when at least two of the three inputs are 1.
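Running the perceptron() sketch from the previous slide over this table with w_i = 0.3 and t = 0.4 reproduces every label:

```python
w, t = [0.3, 0.3, 0.3], 0.4

table = [((1, 0, 0), -1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),
         ((0, 0, 1), -1), ((0, 1, 0), -1), ((0, 1, 1), 1), ((0, 0, 0), -1)]

for x, y in table:
    assert perceptron(x, w, t) == y  # e.g. (1, 0, 1): 0.6 - 0.4 > 0 -> +1
print("all 8 rows reproduced")
```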
Perceptrons (in ℝ²)

h(x) = sign(w_0 + w_1 x_1 + w_2 x_2)

The perceptron is a linear (binary) classifier:
• Customer features x: points on the plane
• Labels y: +1 or −1
• Hypothesis h: a line (dividing positive from negative)
And now...
Let's see some classifying!