Classification and Decision Trees

Iza Moise, Evangelos Pournaras, Dirk Helbing
Overview
• Classification
• Decision Trees
Classification
Definition
Classification is a data mining function that assigns items in a collection to target categories or classes.
The goal is to accurately predict the target class for each data point.
• Supervised: the model learns from examples whose class labels are known
• Outcome → class: the predicted output is a class label
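As a minimal illustration of "supervised" data, each training item pairs attribute values with a known class. The attribute names and labels below are invented for the example, not taken from the slides:

```python
# Each training instance pairs attribute values with a known class
# label (supervised learning). All names here are illustrative.
training_data = [
    # (outlook, temperature) -> target class
    (("sunny", "hot"),  "stay inside"),
    (("sunny", "mild"), "play outside"),
    (("rain",  "mild"), "stay inside"),
]

for features, label in training_data:
    print(features, "->", label)
```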
Types of Classification
• binary classification → the target attribute has only two values
• multi-class classification → the target attribute has more than two values
• crisp classification → given an input, the classifier returns a single label
• probabilistic classification → given an input, the classifier returns its probability of belonging to each class (see the sketch below)
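To make the crisp/probabilistic distinction concrete, here is a minimal sketch assuming scikit-learn and the Iris dataset; both are illustrative choices, not prescribed by the slides:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Iris is a multi-class problem: the target attribute has 3 values.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

print(clf.predict(X[:1]))        # crisp: a single label per input, e.g. [0]
print(clf.predict_proba(X[:1]))  # probabilistic: one probability per class
```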
Applications
Classification Example: Spam Filtering
Classify each email as “Spam” or “Not Spam”.
[Figure omitted. Source: Machine Learning: CS 6375 Introduction, Instructor: Vibhav Gogate, The University of Texas at Dallas]
Applications [cont.]
Classification Example: Weather Prediction
[Figure omitted. Source: Machine Learning: CS 6375 Introduction, Instructor: Vibhav Gogate, The University of Texas at Dallas]
Applications [cont.]
• Customer Target Marketing
• Medical Disease Diagnosis
• Supervised Event Detection
• Multimedia Data Analysis
• Document Categorization and Filtering
• Social Network Analysis
A Three-Phase Process
1. Training phase: a model is constructed from the training instances.
→ the classification algorithm finds relationships between predictors and targets
→ these relationships are summarised in a model
→ the model is trained on data with known labels (training data)
2. Testing phase: test the model on a sample whose class labels are known but were not used for training the model (testing data).
3. Usage phase: use the model for classification on new data whose class labels are unknown (new data). A minimal end-to-end sketch of all three phases follows.
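The sketch below walks through the three phases, assuming scikit-learn and the Iris dataset (illustrative choices; the slides do not fix a library or dataset):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 1. Training phase: fit a model on data with known labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# 2. Testing phase: evaluate on held-out data whose labels are
#    known but were not used for training.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 3. Usage phase: classify new data whose labels are unknown.
new_point = [[5.0, 3.4, 1.5, 0.2]]  # hypothetical new measurement
print("predicted class:", model.predict(new_point))
```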
Training Phase - Model Construction
[Figure omitted. Source: Data Warehousing and Data Mining, Instructor: Prof. Hany Saleeb]
Testing Phase - Model Usage
[Figure omitted. Source: Data Warehousing and Data Mining, Instructor: Prof. Hany Saleeb]
Methods of Classification
• Decision Trees
• k-Nearest Neighbours
• Neural Networks
• Logistic Regression
• Linear Discriminant Analysis
Decision Trees
Main principles
A decision tree creates a hierarchical partitioning of the data which relates the partitions at the leaf level to the different classes.
Data requirements:
• Attribute-value description: each object is expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
• Predefined classes (target values): the target function has discrete output values (boolean or multi-class).
• Sufficient data: enough training cases must be provided to learn the model.
Main principles [cont.]
• decision node = a test on an attribute
• branch = an outcome of the test
• leaf node = a classification or decision
• root = the best predictor
• path = a conjunction of tests that leads to the final decision
Classification of new instances is done by following the matching path from the root to a leaf node.
[Figure omitted: an example decision tree. Source: Dr. Saed Sayad, Adjunct Professor at the University of Toronto]
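These elements can be seen on a fitted tree by printing its structure. The sketch below assumes scikit-learn and the Iris dataset, which are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# In the printout, each attribute test is a decision node, each
# outcome a branch, each "class: ..." line a leaf, and the
# top-most test is the root.
print(export_text(clf, feature_names=list(iris.feature_names)))
```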
Split criterion
A condition (or predicate) on:
• a single attribute → univariate split
• multiple attributes → multivariate split
The training data are split recursively on such conditions.
Goal: maximize the information gain (the discrimination among the classes), i.e., how well an attribute separates the examples according to their target classification. A computational sketch follows.
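The slides do not fix a formula, but a common choice measures information gain as the reduction in entropy. A plain-Python sketch, with toy attribute values and labels invented for the example:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, label):
    """Entropy reduction obtained by splitting `rows` on `attr`."""
    gain = entropy([r[label] for r in rows])
    n = len(rows)
    for value in {r[attr] for r in rows}:
        subset = [r[label] for r in rows if r[attr] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Invented toy data: 'outlook' separates the classes perfectly here.
data = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain",  "play": "yes"},
    {"outlook": "rain",  "play": "yes"},
]
print(information_gain(data, "outlook", "play"))  # 1.0: a perfect split
```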
How to build a decision tree?
Top-down tree construction:
• all training data start at the root
• data are partitioned recursively based on selected attributes
• bottom-up tree pruning
→ remove subtrees or branches, in a bottom-up manner, to improve the estimated accuracy on new cases
• conditions for stopping the partitioning (see the sketch after this list):
  • all samples for a given node belong to the same class
  • there are no remaining attributes for further partitioning
  • there are no samples left
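The sketch below implements this top-down, information-gain-driven construction in the style of ID3 (a simplification: pruning is omitted, and all data and attribute names are invented):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attrs, label):
    """Pick the attribute whose split maximizes information gain."""
    def gain(attr):
        n = len(rows)
        remainder = sum(
            len(sub) / n * entropy([r[label] for r in sub])
            for sub in ([r for r in rows if r[attr] == v]
                        for v in {r[attr] for r in rows}))
        return entropy([r[label] for r in rows]) - remainder
    return max(attrs, key=gain)

def build_tree(rows, attrs, label):
    classes = [r[label] for r in rows]
    # Stop: all samples at this node belong to the same class.
    if len(set(classes)) == 1:
        return classes[0]
    # Stop: no remaining attributes -> return the majority class.
    if not attrs:
        return Counter(classes).most_common(1)[0][0]
    attr = best_attribute(rows, attrs, label)
    rest = [a for a in attrs if a != attr]
    tree = {}
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        # (The "no samples left" stop never triggers here, because
        # branches are created only for values observed in `rows`.)
        tree[value] = build_tree(subset, rest, label)
    return {attr: tree}

# Invented toy data set: 'windy' is the more informative attribute.
data = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
]
print(build_tree(data, ["outlook", "windy"], "play"))
# e.g. {'windy': {'yes': 'no', 'no': 'yes'}}
```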
Pros and Cons
Pros:
✓ simple to understand and interpret
✓ little data preparation and little computation
✓ indicates which attributes are most important for classification
Pros and Cons [cont.]
Cons:
✗ learning an optimal decision tree is NP-complete
✗ perform poorly with many classes and small data
✗ computationally expensive to train
✗ over-complex trees do not generalise well from the training data (overfitting); a mitigation sketch follows
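One common way to counter overfitting is to constrain or prune the tree. The sketch below uses scikit-learn's depth limit and cost-complexity pruning; the library, dataset, and parameter values are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unconstrained tree: may grow until it memorises the training data.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Constrained tree: limit depth and prune weak branches (ccp_alpha),
# trading training fit for better generalisation.
pruned = DecisionTreeClassifier(
    max_depth=3, ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

print("full   test accuracy:", full.score(X_te, y_te))
print("pruned test accuracy:", pruned.score(X_te, y_te))
```

On a dataset this small the pruned tree is not guaranteed to score higher; the point is the mechanism for keeping the tree from over-fitting the training data.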
What’s next?
• k-Nearest Neighbours
• Clustering