Classification and Prediction
Classification, Regression, and Prediction

Classification:
  Predict categorical class labels
  Construct a model from the training set and the values (class labels)
    of a classifying attribute, then use it to classify new data
Regression:
  Model continuous-valued functions, i.e., predict unknown or missing values
Prediction:
  Classification + Regression
  Sometimes refers only to regression (e.g., in the textbook)

Classification—A Two-Step Process

Step 1. Model construction: describing a set of predetermined classes
  Set of tuples used for model construction: training set
  Each tuple/sample is assumed to belong to a predefined class,
    as determined by the class label attribute
  Model is represented as classification rules, decision trees,
    or mathematical formulae

Training set:

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Resulting model (classification rule):

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

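The rule above can be written as a small predicate and checked against the six training tuples. This is only a minimal sketch in Python: the names `TRAINING_SET` and `predict_tenured` are illustrative assumptions, not part of the slides, and the rule is hand-coded here rather than learned by a classification algorithm.

```python
# Training tuples from the slide: (name, rank, years, tenured)
TRAINING_SET = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]


def predict_tenured(rank: str, years: int) -> str:
    """Hand-coded rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# Names of training tuples whose label the rule fails to reproduce.
misclassified = [
    name
    for name, rank, years, label in TRAINING_SET
    if predict_tenured(rank, years) != label
]
print(misclassified)  # prints []
```

The empty list shows that the rule is consistent with every training tuple, which says nothing yet about how it behaves on data it has not seen.
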
Classification—A Two-Step Process


Step 2. Model usage: for classifying future or unknown objects
  Estimate predictive accuracy of model
    Known label of test sample is compared with classified result
      from model
    Accuracy rate is percentage of test set samples that are
      correctly classified by model
    Test set must be independent of training set; otherwise the
      accuracy estimate is overly optimistic and overfitting goes
      undetected

Example model:

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

Example unseen tuple: (Jeff, Professor, 4)

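Why the test set has to be independent can be illustrated with a deliberately overfitted model. The sketch below is an assumption-laden toy, not a method from the slides: `memorizer` simply looks up training tuples and falls back to an arbitrary default of "no". The training tuples come from the table in Step 1, and the test tuples are the four rows (Tom, Merlisa, George, Joseph) of the deck's test data table, with names dropped.

```python
def accuracy(classify, samples):
    """Fraction of (rank, years, label) samples whose label is reproduced."""
    correct = sum(
        1 for rank, years, label in samples if classify(rank, years) == label
    )
    return correct / len(samples)


# (rank, years, tenured) tuples taken from the slides.
train = [
    ("Assistant Prof", 3, "no"), ("Assistant Prof", 7, "yes"),
    ("Professor", 2, "yes"), ("Associate Prof", 7, "yes"),
    ("Assistant Prof", 6, "no"), ("Associate Prof", 3, "no"),
]
test = [
    ("Assistant Prof", 2, "no"), ("Associate Prof", 7, "no"),
    ("Professor", 5, "yes"), ("Assistant Prof", 7, "yes"),
]

# Overfitted "model": memorizes training tuples verbatim.
memory = {(rank, years): label for rank, years, label in train}


def memorizer(rank, years):
    # Unseen combinations get an arbitrary default label (an assumption).
    return memory.get((rank, years), "no")


train_acc = accuracy(memorizer, train)
test_acc = accuracy(memorizer, test)
print(train_acc, test_acc)  # prints 1.0 0.5
```

Measured on the training set, the memorizer looks perfect; only the independent test set reveals that it generalizes poorly.
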
Classification Process (1): Model Construction
[Figure: Training Data -> Classification Algorithms -> Classifier (Model)]

Training Data: the six tuples (NAME, RANK, YEARS, TENURED) for Mike,
Mary, Bill, Jim, Dave, and Anne listed under Step 1.

Classifier (Model):

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

Classification Process (2): Use Model in Prediction
[Figure: Classifier (Model) applied to Test Data and to Unseen Data]

Classifier (Model):

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

Test Data:

NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen Data: (Jeff, Professor, 4)  Tenured?

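Applying the rule to the test data and to the unseen tuple can be worked through in a few lines. This is a sketch under the assumption that the rule is hand-coded as `predict_tenured` (a hypothetical name); the data are the four test tuples and the Jeff tuple from the slide.

```python
def predict_tenured(rank: str, years: int) -> str:
    """Hand-coded rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# Test tuples from the slide: (name, rank, years, known label)
TEST_SET = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Compare the known label of each test sample with the model's result.
errors = [
    name
    for name, rank, years, label in TEST_SET
    if predict_tenured(rank, years) != label
]
accuracy = 1 - len(errors) / len(TEST_SET)

print(errors)                          # prints ['Merlisa']
print(f"accuracy = {accuracy:.0%}")    # prints accuracy = 75%
print(predict_tenured("Professor", 4)) # prints yes
```

The rule misclassifies Merlisa (7 years, but not tenured), so the estimated accuracy on this test set is 3 out of 4, and the unseen tuple (Jeff, Professor, 4) is classified as tenured.
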
Supervised versus Unsupervised Learning

Supervised learning (classification)
  Supervision: Training data (observations, measurements, etc.) are
    accompanied by labels indicating the class of the observations
  New data is classified based on training set
Unsupervised learning (clustering)
  Class labels of training data are unknown
  Given a set of measurements, observations, etc., need to establish
    existence of classes or clusters in data

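The contrast can be made concrete on a handful of one-dimensional measurements. The sketch below uses made-up illustrative values and two techniques chosen here for brevity, not named on the slide: a 1-nearest-neighbor classifier for the supervised case and a two-cluster k-means for the unsupervised case. All function and variable names are assumptions.

```python
def nearest_neighbor_label(x, labeled):
    """Supervised: return the label of the closest labeled training value."""
    _, label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return label


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means, k=2)."""
    centers = [min(values), max(values)]  # simple deterministic initialization
    clusters = ([], [])
    for _ in range(iterations):
        clusters = ([], [])
        for v in values:
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[idx].append(v)
        # Recompute each center as the mean of its cluster (keep it if empty).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return clusters


# Illustrative measurements (made up for this sketch).
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]

# Supervised: the labels accompany the training data.
predicted = nearest_neighbor_label(4.5, list(zip(values, labels)))

# Unsupervised: only the values are given; groups must be discovered.
clusters = two_means(values)

print(predicted)  # prints high
print(clusters)   # prints ([1.0, 1.2, 0.8], [5.0, 5.3, 4.9])
```

The supervised routine can name the class of a new value because labels were supplied; the clustering routine finds two groups but has no names for them.
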
Classification and Prediction

  What is classification? What is prediction?
  Issues regarding classification and prediction
  Classification by decision tree induction
  Bayesian Classification
  Classification based on concepts from association rule mining
  Other Classification Methods
  Prediction
  Classification accuracy
  Summary

Issues (1): Data Preparation

Data cleaning
  Preprocess data in order to reduce noise (e.g., by smoothing) and
    handle missing values (e.g., use most commonly occurring value)
  Helps to reduce confusion during learning
Relevance analysis (feature selection)
  Remove irrelevant or redundant attributes
Data transformation
  Generalize (to higher level concepts) and/or normalize data
    (scaling values so that they fall within specified range)

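Three of the preparation steps named above can be sketched in plain Python: filling missing values with the most common value, smoothing, and normalization. The function names are assumptions, and smoothing by bin means and min-max scaling are one possible choice each; the slide does not prescribe a specific smoothing or normalization method. Relevance analysis is not covered by this sketch.

```python
from collections import Counter


def fill_missing_with_mode(values):
    """Data cleaning: replace None with the most commonly occurring value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Noise reduction: sort, split into bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Data transformation: scale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Illustrative inputs (the ranks and numbers are made up for this sketch).
ranks = fill_missing_with_mode(
    ["Assistant Prof", None, "Professor", "Assistant Prof"]
)
smoothed = smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], bin_size=3)
scaled = min_max_normalize([2, 7])

print(ranks[1])   # prints Assistant Prof
print(smoothed)   # prints [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
print(scaled)     # prints [0.0, 1.0]
```
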
Issues (2): Evaluating Classification Methods

Predictive accuracy
  Predict class labels correctly
Speed
  Time to construct model
  Time to use model
Robustness
  Make correct prediction given noise and missing values
Scalability
  Construct model efficiently given data size
Interpretability
  Understanding and insight provided by model
Goodness of rules
  Decision tree size
  Compactness of classification rules

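Two of these criteria, predictive accuracy and speed, are easy to measure together. The sketch below is a hypothetical harness, not something the slides define: `evaluate` and `build_rule_model` are assumed names, and the "learner" simply returns the hand-written tenure rule from the earlier slides instead of inducing one from the data. Robustness, scalability, and interpretability would need separate experiments.

```python
import time


def evaluate(build_model, train, test):
    """Report accuracy, time to construct the model, and time to use it."""
    start = time.perf_counter()
    model = build_model(train)
    build_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [model(features) for features, _ in test]
    predict_seconds = time.perf_counter() - start

    correct = sum(1 for p, (_, label) in zip(predictions, test) if p == label)
    return {
        "accuracy": correct / len(test),
        "build_seconds": build_seconds,
        "predict_seconds": predict_seconds,
    }


def build_rule_model(train):
    """Placeholder learner: ignores the data and returns the slide's rule."""
    def model(features):
        rank, years = features
        return "yes" if rank.lower() == "professor" or years > 6 else "no"
    return model


# ((rank, years), tenured) pairs taken from the training and test tables.
train = [
    (("Assistant Prof", 3), "no"), (("Assistant Prof", 7), "yes"),
    (("Professor", 2), "yes"), (("Associate Prof", 7), "yes"),
    (("Assistant Prof", 6), "no"), (("Associate Prof", 3), "no"),
]
test = [
    (("Assistant Prof", 2), "no"), (("Associate Prof", 7), "no"),
    (("Professor", 5), "yes"), (("Assistant Prof", 7), "yes"),
]

report = evaluate(build_rule_model, train, test)
print(report["accuracy"])  # prints 0.75
```

The timing fields vary from run to run and are only meaningful on data sets far larger than this toy example.
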