Lecture 3
Knowledge-based systems
Sanaullah Manzoor
CS&IT, Lahore Leads University
[email protected]
https://sites.google.com/site/engrsanaullahmanzoor/home
Overview
- What is “Machine Learning”?
- Learning Types
  - Supervised learning
  - Unsupervised learning
- Data Pre-Processing
  - Discretization Methods: Binning
- Classification
  - Bayesian Classifier
Why “Learn”?
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
There is no need to “learn” to calculate payroll. Learning is used when:
- Human expertise does not exist (navigating on Mars)
- Humans are unable to explain their expertise (speech recognition)
- The solution changes over time (routing on a computer network)
- The solution needs to be adapted to particular cases (user biometrics)
What is Machine Learning?
Machine learning is the study of algorithms that improve their performance at some task with experience; such algorithms optimize a performance criterion using example data or past experience.
The role of computer science: efficient algorithms to
- solve the optimization problem
- represent and evaluate the model for inference
Data Mining
Definition: “Data mining is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.”
Applications:
- Retail: customer relationship management (CRM)
- Finance: credit scoring, fraud detection
- Manufacturing: optimization, troubleshooting
- Medicine: medical diagnosis
- Telecommunications: quality-of-service optimization
- Web mining: search engines
Growth of Machine Learning
Machine learning is the preferred approach to:
- Speech recognition, natural language processing
- Computer vision
- Medical outcomes analysis
- Robot control
- Computational biology
Machine learning problems
What high-level machine learning problems have you seen or heard of before?
[Slide figures: several datasets, each shown as a collection of examples.]
Supervised learning
[Slide figures: a set of labeled examples (each example paired with a label) is fed to a learning algorithm, which produces a model/predictor; the trained model then outputs a predicted label for a new example.]
Supervised learning: given labeled examples, learn to predict the label of a new example.
Supervised learning: classification
Classification: the label comes from a finite set of labels.
[Slide figure: example images labeled “apple” or “banana”.]
Supervised learning: given labeled examples.
Classification Example
Differentiate between low-risk and high-risk customers from their income and savings, as in the sketch below.
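A minimal sketch of one possible discriminant rule for this example; the thresholds THETA1 and THETA2 are hypothetical stand-ins for values a learner would estimate from labeled customer data.

```python
# Hypothetical hand-written discriminant for the credit-scoring example.
THETA1 = 30_000  # income threshold (assumed, not from the slides)
THETA2 = 10_000  # savings threshold (assumed, not from the slides)

def classify_customer(income: float, savings: float) -> str:
    """Label a customer low-risk or high-risk from two features."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(classify_customer(45_000, 20_000))  # low-risk
print(classify_customer(15_000, 2_000))   # high-risk
```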
Supervised learning: regression
Regression: the label is real-valued.
[Slide figure: examples paired with real-valued labels such as -4.5, 10.1, 3.2, and 4.3.]
Supervised learning: given labeled examples.
Regression Example
Price of a used car: y = wx + w0, where x is a car attribute (e.g. mileage) and y is the price. A fitting sketch follows.
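A minimal sketch of fitting y = wx + w0 by ordinary least squares; the mileage/price pairs are invented purely for illustration.

```python
# Fit y = w*x + w0 for the used-car example (data made up).
import numpy as np

mileage = np.array([20_000, 50_000, 80_000, 120_000], dtype=float)  # x
price = np.array([15_000, 12_000, 9_000, 6_000], dtype=float)       # y

# np.polyfit with degree 1 returns [w, w0] for y = w*x + w0.
w, w0 = np.polyfit(mileage, price, 1)
print(f"price = {w:.3f} * mileage + {w0:.1f}")
print("predicted price at 60,000 km:", w * 60_000 + w0)
```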
Supervised learning: ranking
Ranking: the label is a ranking.
[Slide figure: examples paired with rank labels 1, 2, 3, 4.]
Supervised learning: given labeled examples.
Ranking example
Given a query and a set of web pages, rank them according to relevance.
Ranking Applications
- User preference, e.g. the Netflix “My List” movie-queue ranking
- iTunes
- Flight search (and search in general)
Unsupervised learning
Unsupervised learning: given data, i.e. examples, but no labels.
Approaches
- Supervised learning
- Unsupervised learning
Learning system model
[Slide figure: during training, input samples are fed to a learning method to build a system; during testing, the trained system produces predictions for new input samples.]
Learning Types
- Supervised learning: the computer is presented with example inputs and their desired outputs, and the goal is to learn a general rule that maps inputs to outputs. Also known as classification.
- Regression is also a supervised problem, where the outputs are continuous rather than discrete.
Machine learning structure: supervised learning
[Slide figure.]
Learning Types
- Unsupervised learning: no labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data). Known as clustering.
Machine learning structure: unsupervised learning
[Slide figure.]
Data Pre-processing
[Slide figure: the forms of data preprocessing, detailed below.]
Data Pre-processing
- Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
- Data integration: integration of multiple databases, data cubes, or files
- Data transformation: normalization and aggregation
- Data reduction: obtains a representation reduced in volume that produces the same or similar analytical results
Why Data Preprocessing?
Data in the real world is dirty:
- incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
- noisy: containing errors or outliers
- inconsistent: containing discrepancies in codes or names
No quality data, no quality mining results! Quality decisions must be based on quality data, and a data warehouse needs consistent integration of quality data.
Data Cleaning
Data cleaning tasks:
- Fill in missing values
- Identify outliers and smooth out noisy data
- Correct inconsistent data
Missing Data
Data is not always available; e.g., many tuples have no recorded value for several attributes, such as customer income in sales data.
Missing data may be due to:
- equipment malfunction
- values inconsistent with other recorded data, and thus deleted
- data not entered due to misunderstanding
- certain data not considered important at the time of entry
- history or changes of the data not being registered
Missing data may need to be inferred.
How to Handle Missing Data?
- Fill in the missing value manually: tedious + infeasible?
- Use a global constant to fill in the missing value: e.g., “unknown”, a new class?!
- Use the attribute mean to fill in the missing value
- Use the most probable value to fill in the missing value: inference-based, such as a Bayesian formula or a decision tree
A sketch of the constant and mean strategies follows.
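A minimal sketch of the global-constant and attribute-mean strategies, assuming pandas (the slides do not prescribe a library); the income values are invented:

```python
# Imputing a missing "income" attribute (values invented for illustration).
import pandas as pd

df = pd.DataFrame({"income": [50_000, None, 42_000, None, 61_000]})

# Global constant: fill with a sentinel value / "unknown" class flag.
filled_const = df["income"].fillna(-1)

# Attribute mean: fill with the mean of the observed values.
filled_mean = df["income"].fillna(df["income"].mean())

print(filled_mean.tolist())  # [50000.0, 51000.0, 42000.0, 51000.0, 61000.0]
```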
Noisy Data
Noise: random error or variance in a measured variable.
Incorrect attribute values may be due to:
- faulty data collection instruments
- data entry problems
- data transmission problems
- technology limitations
- inconsistency in naming conventions
Other data problems which require data cleaning:
- duplicate records
- incomplete data
- inconsistent data
How to Handle Noisy Data?
- Binning: first sort the data and partition it into (equi-depth) bins; then smooth by bin means, by bin medians, by bin boundaries, etc.
- Clustering: detect and remove outliers
- Combined computer and human inspection: detect suspicious values and have a human check them
- Regression: smooth by fitting the data to regression functions
Simple Discretization Methods: Binning
Equal-width (distance) partitioning:
- Divides the range into N intervals of equal size: a uniform grid
- If A and B are the lowest and highest values of the attribute, the width of the intervals is W = (B - A) / N (see the sketch below)
Equal-depth (frequency) partitioning:
- Divides the range into N intervals, each containing approximately the same number of samples
- Good data scaling
- Managing categorical attributes can be tricky
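A minimal sketch of equal-width partitioning using W = (B - A) / N, applied to the price data listed on the next slide:

```python
# Compute the N+1 edges of N equal-width intervals over the data range.
def equal_width_edges(values, n_bins):
    a, b = min(values), max(values)  # A = lowest, B = highest value
    w = (b - a) / n_bins             # interval width W = (B - A) / N
    return [a + i * w for i in range(n_bins + 1)]

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
print(equal_width_edges(prices, 3))  # [4.0, 14.0, 24.0, 34.0]
```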
Binning Methods for Data Smoothing
Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
Partition into (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
Method 1: smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
Method 2: smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
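A minimal sketch reproducing both smoothing methods on the data above:

```python
# Equal-depth bins, then smoothing by bin means and by bin boundaries.
prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]  # already sorted

depth = 4  # equi-depth: four values per bin
bins = [prices[i:i + depth] for i in range(0, len(prices), depth)]

# Smoothing by bin means: replace each value by its bin's (rounded) mean.
by_means = [[round(sum(b) / len(b))] * len(b) for b in bins]

# Smoothing by bin boundaries: snap each value to the nearer boundary.
by_bounds = [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b]
             for b in bins]

print(by_means)   # [[9, 9, 9, 9], [23, 23, 23, 23], [29, 29, 29, 29]]
print(by_bounds)  # [[4, 4, 4, 15], [21, 21, 25, 25], [26, 26, 26, 34]]
```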
Cluster Analysis
[Slide figure: data points grouped into clusters; outliers fall outside every cluster and can be removed.]
Regression
[Slide figure: noisy data smoothed by fitting the regression line y = x + 1; a raw value Y1 at X1 is replaced by the fitted value Y1'.]
Data Integration
Data integration: combines data from multiple sources into a coherent store.
- Schema integration: integrate metadata from different sources
- Detecting and resolving data value conflicts: for the same real-world entity, attribute values from different sources differ; possible reasons include different representations or different scales, e.g., metric vs. British units
Handling Redundant Data
Redundant data often occur when multiple databases are integrated:
- The same attribute may have different names in different databases
- One attribute may be a “derived” attribute in another table, e.g., annual revenue
Redundant data may be detected by correlation analysis (similarity measures), as in the sketch below. Careful integration of data from multiple sources may help reduce/avoid redundancies and inconsistencies and improve mining speed and quality.
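A minimal sketch of detecting a derived attribute by correlation analysis; the revenue figures are invented for illustration:

```python
# A near-perfect correlation flags one of the two attributes as redundant.
import numpy as np

monthly_revenue = np.array([10.0, 12.0, 9.0, 11.0, 13.0])
annual_revenue = 12 * monthly_revenue  # a "derived" attribute

r = np.corrcoef(monthly_revenue, annual_revenue)[0, 1]  # Pearson r
print(f"correlation = {r:.3f}")  # 1.000 -> candidate for removal
```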
Data Reduction Strategies
A warehouse may store terabytes of data, so complex data analysis/mining may take a very long time to run on the complete data set.
Data reduction obtains a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results.
Data reduction strategy: dimensionality reduction.
Dimensionality Reduction
Feature selection (i.e., attribute subset selection):
- Select a minimum set of features such that the probability distribution of the different classes given the values of those features is as close as possible to the original distribution given the values of all features
- This reduces the number of attributes appearing in the discovered patterns, making them easier to understand
Heuristic methods (due to the exponential number of choices):
- step-wise forward selection (see the sketch below)
- step-wise backward elimination
- combining forward selection and backward elimination
- decision-tree induction
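A minimal sketch of step-wise forward selection; the scoring function is a hypothetical stand-in for any measure of how well a feature subset preserves class separation:

```python
# Greedily add the feature that most improves the subset score.
def forward_selection(features, score, max_features):
    selected = []
    while len(selected) < max_features:
        remaining = [f for f in features if f not in selected]
        if not remaining:
            break
        best = max(remaining, key=lambda f: score(selected + [f]))
        if score(selected + [best]) <= score(selected):
            break  # no remaining feature improves the score
        selected.append(best)
    return selected

# Toy usage with a made-up additive score favoring A1, A4, A6.
gains = {"A1": 0.30, "A2": 0.02, "A3": 0.01,
         "A4": 0.40, "A5": 0.03, "A6": 0.20}
score = lambda subset: sum(gains[f] for f in subset)
print(forward_selection(list(gains), score, 3))  # ['A4', 'A1', 'A6']
```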
Example of Decision Tree Induction
Initial attribute set: {A1, A2, A3, A4, A5, A6}
[Slide figure: a decision tree that splits on A4 at the root, then on A1 and A6, with leaves labeled Class 1 and Class 2.]
Reduced attribute set: {A1, A4, A6}
Classification
Sample case: classify fish in a fishery.
- Manually employ labor to classify the fish
- Use automated processing with the help of a camera
In this sample case, classification is based on lightness and darkness of the skin tone.
A Simple Classification Problem
A classification problem: the grades for students taking this course.
Key steps:
1. Data (what past experience can we rely on?)
2. Assumptions (what can we assume about the students or the course?)
3. Representation (how do we “summarize” a student?)
4. Estimation (how do we construct a map from students to grades?)
5. Evaluation (how well are we predicting?)
6. Model selection (perhaps we can do even better?)
Classification Approaches
- Bayesian networks
- Artificial neural networks
- Genetic algorithms
- Decision trees
- Support vector machines
What is Feature Selection?
Feature selection: the problem of selecting some subset of a learning algorithm’s input variables upon which it should focus attention, while ignoring the rest. Also known as DIMENSIONALITY REDUCTION.
Feature Subset Selection
Filter methods select subsets of variables as a pre-processing step, independently of the classifier that will be used! A sketch follows.
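A minimal sketch of a filter method: rank features by a simple class-separation statistic before, and independently of, any classifier. The data and the cutoff are invented for illustration:

```python
import numpy as np

X = np.array([[1.0, 5.0], [1.2, 3.0], [3.9, 4.8], [4.1, 3.2]])
y = np.array([0, 0, 1, 1])

def separation(col):
    """|difference of class means| / overall std: higher = more useful."""
    a, b = col[y == 0], col[y == 1]
    return abs(a.mean() - b.mean()) / (col.std() + 1e-12)

scores = [separation(X[:, j]) for j in range(X.shape[1])]
keep = [j for j, s in enumerate(scores) if s > 1.0]  # hypothetical cutoff
print(scores, "-> keep feature indices:", keep)      # keeps feature 0 only
```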
Bayesian Classifier
Classification!!!!
Let’s start with the classification problem in its general form.
Problem statement:
- Given features X1, X2, …, Xn
- Predict a label Y
Example: digit recognition.
- Features: X1, …, Xn ∈ {0, 1} (black vs. white pixels)
- Labels: Y ∈ {5, 6} (predict whether a digit is a 5 or a 6)
[Slide figure: a pixelated image of the digit 5 fed to the classifier.]
Our problem can be stated as: “What is the probability that the image represents a 5, given its pixels?”
Bayesian Classification!!!!
Let’s solve our problem with Bayes’ rule (Thomas Bayes, 1702-1761). Bayesian classification is a probabilistic approach.
Bayes’ rule:
P(Y | X1, …, Xn) = [ P(X1, …, Xn | Y) · P(Y) ] / P(X1, …, Xn)
where P(X1, …, Xn | Y) is the likelihood, P(Y) is the prior, and P(X1, …, Xn) is the normalization constant.
Bayesian Classification!!!!
The three components of Bayes’ rule are:
1. Likelihood
2. Prior
3. Normalization constant (evidence)
Likelihood: the probability of the features given a class, P(X1, …, Xn | Y). In our case, the probability of the features (X1, …, Xn) given Y = {class 5 or class 6}.
Prior: the probability of occurrence of a class. If the class is 5: P(Y = 5). If the class is 6: P(Y = 6).
Normalization constant (evidence): the probability of occurrence of the features, P(X1, …, Xn).
Bayesian Classification!!!!
Solution of our example: compute the posterior for each class.
If the class is 5: P(Y = 5 | X1, …, Xn) ∝ P(X1, …, Xn | Y = 5) · P(Y = 5)
If the class is 6: P(Y = 6 | X1, …, Xn) ∝ P(X1, …, Xn | Y = 6) · P(Y = 6)
Whichever class has the greater probability is our solution 
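A minimal sketch of this decision rule; the likelihoods and priors are hypothetical stand-ins for values that would be estimated from training images:

```python
# Pick the class whose (likelihood x prior) product is larger.
likelihood = {5: 0.020, 6: 0.008}  # P(pixels | class), hypothetical
prior = {5: 0.5, 6: 0.5}           # P(class), hypothetical

posterior_scores = {c: likelihood[c] * prior[c] for c in (5, 6)}
prediction = max(posterior_scores, key=posterior_scores.get)
print(posterior_scores, "-> predicted digit:", prediction)  # 5
```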
Bayesian Classification!!!!
Example 2: the classic “play tennis” data set; 14 days are described by the features Outlook, Temperature, Humidity, and Wind, and each day is labeled Play = Yes or No.
[Slide figure: the 14-row training table.]
Solution:
1. Calculate the total Yes and No probabilities.
2. Calculate the Yes and No probabilities within the feature “Outlook”.
3. Calculate the Yes and No probabilities within the feature “Temperature”.
4. Calculate the Yes and No probabilities within the features “Humidity” and “Wind”.
[Slide figures: per-feature frequency tables; the resulting probabilities are listed below.]
Bayesian Classification!!!!
Given a new instance (testing phase):
x’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
Using our calculations:
P(Outlook=Sunny | Play=No) = 3/5
P(Temperature=Cool | Play=No) = 1/5
P(Humidity=High | Play=No) = 4/5
P(Wind=Strong | Play=No) = 3/5
P(Play=No) = 5/14
P(Outlook=Sunny | Play=Yes) = 2/9
P(Temperature=Cool | Play=Yes) = 3/9
P(Humidity=High | Play=Yes) = 3/9
P(Wind=Strong | Play=Yes) = 3/9
P(Play=Yes) = 9/14
P(No | x’) ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] · P(Play=No) = 0.0206
P(Yes | x’) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] · P(Play=Yes) = 0.0053
Since P(No | x’) > P(Yes | x’), we predict Play = No for x’.
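A minimal sketch reproducing the calculation above with exact fractions:

```python
# Naive Bayes on the play-tennis example: multiply the per-feature
# conditional probabilities by the class prior and pick the larger product.
from fractions import Fraction as F

x_new = ("Sunny", "Cool", "High", "Strong")

cond = {  # P(feature value | Play), taken from the slides
    "Yes": {"Sunny": F(2, 9), "Cool": F(3, 9), "High": F(3, 9), "Strong": F(3, 9)},
    "No": {"Sunny": F(3, 5), "Cool": F(1, 5), "High": F(4, 5), "Strong": F(3, 5)},
}
prior = {"Yes": F(9, 14), "No": F(5, 14)}

for label in ("Yes", "No"):
    score = prior[label]
    for value in x_new:
        score *= cond[label][value]
    print(f"P({label}|x') is proportional to {float(score):.4f}")
# P(Yes|x') ~ 0.0053, P(No|x') ~ 0.0206 -> predict Play = No
```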