Department of MCA
Data Mining & Warehousing-CH-5 Notes KNS Institute of Technology
Chapter 5: Classification
5.1 CLASSIFICATION:
Definition: Classification is the task of learning a target function f that maps each attribute set x
to one of the predefined class labels y.
The target function is also known informally as a classification model
Purpose of Classification Model:
1) Descriptive Modeling: A classification model can serve as an explanatory tool to
distinguish between objects of different classes.
Example: it would be useful, for both biologists and others, to have a descriptive model
that summarizes the data and explains which features characterize a mammal, reptile, bird,
or amphibian.
Name        | Body Temperature | Skin Cover | Gives Birth | Aquatic Creature | Aerial Creature | Has Legs | Hibernates | Class Label
Human       | warm-blooded     | Hair       | Yes         | no               | no              | yes      | no         | mammal
Python      | cold-blooded     | Scales     | No          | no               | no              | no       | yes        | reptile
Salmon      | cold-blooded     | Scales     | No          | yes              | no              | no       | no         | fish
Whale       | warm-blooded     | Hair       | Yes         | yes              | no              | no       | no         | mammal
Frog        | cold-blooded     | None       | No          | semi             | no              | yes      | yes        | amphibian
Komodo      | cold-blooded     | Scales     | No          | no               | no              | yes      | no         | reptile
Eel         | cold-blooded     | Scales     | No          | yes              | no              | no       | no         | fish
Salamander  | cold-blooded     | None       | No          | semi             | no              | yes      | yes        | amphibian
Table 4.1: The vertebrate data set
2) Predictive Modeling: A classification model can also be used to predict the class label
of unknown records.
As shown in Figure 4.2, a classification model can be treated as a black box that
automatically assigns a class label when presented with the attribute set of an unknown
record. Suppose we are given the following characteristics of a creature known as a gila
monster:
Name         | Body Temperature | Skin Cover | Gives Birth | Aquatic Creature | Aerial Creature | Has Legs | Hibernates | Class Label
Gila monster | cold-blooded     | Scales     | no          | no               | no              | yes      | yes        | ?
We can use a classification model built from the data set shown in Table 4.1 to determine the
class to which the creature belongs.
Classification techniques are most suited for predicting or describing data sets with
binary or nominal categories; they are less effective for ordinal categories.
5.2 GENERAL APPROACH FOR SOLVING A CLASSIFICATION PROBLEM:
o A classification technique (or classifier) is a systematic approach to building classification
models from an input data set.
o The model generated by a learning algorithm should both fit the input data well and
correctly predict the class labels of records it has never seen before.
o Therefore, a key objective of the learning algorithm is to build models with good
generalization capability; i.e., models that accurately predict the class labels of previously
unknown records.
o Figure 4.3 shows a general approach for solving classification problems. First, a training
set consisting of records whose class labels are known must be provided. The training set
is used to build a classification model, which is subsequently applied to the test set, which
consists of records with unknown class labels.
Evaluation of the performance of a classification model is based on the counts of test records
correctly and incorrectly predicted by the model. These counts are tabulated in a table known as a
confusion matrix.
Table 4.2. Confusion matrix for a 2-class problem

                 | Predicted Class = 1 | Predicted Class = 0
Actual Class = 1 | f11                 | f10
Actual Class = 0 | f01                 | f00
o Table 4.2 depicts the confusion matrix for a binary classification problem. Each entry fij in
this table denotes the number of records from class i predicted to be of class j. For
instance, f01 is the number of records from class 0 incorrectly predicted as class 1.
o Based on the entries in the confusion matrix, the total number of correct predictions made
by the model is (f11 + f00) and the total number of incorrect predictions is (f10 + f01).
Comparison of different models using performance metrics:
Based on the confusion matrix, two summary metrics are commonly used.
Accuracy = (f11 + f00) / (f11 + f10 + f01 + f00), the fraction of test records that are correctly predicted.
Error rate = (f10 + f01) / (f11 + f10 + f01 + f00), the fraction of test records that are wrongly predicted.
Most classification algorithms seek models that attain the highest accuracy, or equivalently
the lowest error rate, when applied to the test set.
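As a quick illustration (not part of the original notes), the following Python sketch computes these two metrics from the entries of a 2-class confusion matrix; the counts used here are made up.

# Illustrative only: the counts f11, f10, f01, f00 are invented for this example.
f11, f10 = 40, 10   # actual class 1: correctly / incorrectly predicted
f01, f00 = 5, 45    # actual class 0: incorrectly / correctly predicted

total = f11 + f10 + f01 + f00
accuracy = (f11 + f00) / total      # fraction of correct predictions
error_rate = (f10 + f01) / total    # fraction of wrong predictions

print(f"accuracy   = {accuracy:.3f}")    # 0.850
print(f"error rate = {error_rate:.3f}")  # 0.150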
5.3 DECISION TREE INDUCTION:
The decision tree is a widely used classification technique.
5.3.1 HOW A DECISION TREE WORKS:
o Suppose a new species is discovered by scientists. How can we tell whether it is a mammal
or a non-mammal?
o One approach is to pose a series of questions about the characteristics of the species.
o The first question we may ask is whether the species is cold- or warm-blooded. If it is
cold-blooded, then it is definitely not a mammal.
o Otherwise, it is either a bird or a mammal.
o In the latter case, we need to ask a follow-up question: Do the females of the species give
birth to their young? Those that do give birth are definitely mammals, while those that do
not are likely to be non-mammals
o The series of questions and their possible answers can be organized in the form of a
decision tree, which is a hierarchical structure consisting of nodes and directed edges.
Figure 4.4 shows the decision tree for the mammal classification problem.
o The tree has three types of nodes:
 A root node, which has no incoming edges and zero or more outgoing edges.
 Internal nodes, each of which has exactly one incoming edge and two or more
outgoing edges.
 Leaf or terminal nodes, each of which has exactly one incoming edge and no
outgoing edges.
o Classifying a test record is straightforward once a decision tree has been constructed.
Starting from the root node, we apply the test condition to the record and follow the
appropriate branch based on the outcome of the test as shown in figure 4.5
5.3.2 HOW TO BUILD DECISION TREE:
HUNT'S ALGORITHM:
o In Hunt's algorithm, a decision tree is grown in a recursive fashion by partitioning the
training records into successively purer subsets. Let Dt be the set of training records that
are associated with node t and y = {y1, y2, ..., yc} be the class labels.
The following is a recursive definition of Hunt's algorithm:
 Step 1: If all the records in Dt belong to the same class yt, then t is a leaf node
labelled as yt.
 Step 2: If Dt contains records that belong to more than one class, an attribute test
condition is selected to partition the records into smaller subsets. A child node is
created for each outcome of the test condition and the records in Dt are distributed to
the children based on the outcomes. The algorithm is then recursively applied to each
child node.
o In Figure 4.6, each record contains the personal information of a borrower along with a class
label indicating whether the borrower has defaulted on loan payments.
o The initial tree for the classification problem contains a single node with class label
Defaulted = No (see Figure 4.7(a)), which means that most of the borrowers successfully
repaid their loans. The tree, however, needs to be refined since the root node contains
records from both classes. The records are subsequently divided into smaller subsets based
on the outcomes of the Home Owner test condition
o Hunt's algorithm is then applied recursively to each child of the root node.
o The left child of the root is therefore a leaf node labelled Defaulted = No (see Figure
4.7(b)). For the right child, we need to continue applying the recursive step of Hunt's
algorithm until all the records belong to the same class.
Design Issues of Decision Tree Induction:
A learning algorithm for inducing decision trees must address the following two issues:
1. How should the training records be split? Select an attribute test condition to divide the
records into smaller subsets.
To implement this step, the algorithm must provide a method for specifying the test condition
for different attribute types, as well as an objective measure for evaluating the goodness of
each test condition.
2. How should the splitting procedure stop?
i) A stopping condition is needed to terminate the tree-growing process.
ii) A natural strategy is to continue expanding a node until either all the records belong to the
same class or all the records have identical attribute values.
iii) Other criteria can be imposed to allow the tree-growing procedure to terminate earlier.
5.3.3 METHODS FOR EXPRESSING ATTRIBUTE TEST CONDITIONS:
o A decision tree induction algorithm must provide a method for expressing an attribute test
condition and its corresponding outcomes for different attribute types.
o Binary Attributes : The test condition for a binary attribute generates two potential
outcomes, as shown in fig:
o Nominal Attributes:
o Since a nominal attribute can have many values, its test condition can be expressed in
two ways, as shown in Figure 4.9.
o For a multiway split (Figure 4.9(a)), the number of outcomes depends on the number
of distinct values for the corresponding attribute.
o For example, if an attribute such as marital status has three distinct values—single,
married, or divorced—its test condition will produce a three-way split
o Ordinal Attributes:
o Ordinal attributes can also produce binary or multiway splits. Ordinal attribute values
can be grouped as long as the grouping does not violate the order property of the
attribute values.
o Figure 4.10 illustrates various ways of splitting training records based on the Shirt
Size attribute. The groupings shown in Figures 4.10(a) and (b) preserve the order
among the attribute values, whereas the grouping shown in Figure 4.10(c) violates this
property because it combines the attribute values Small and Large into the same
partition while Medium and Extra Large are combined into another partition.
o Continuous Attributes:
o For continuous attributes, the test condition can be expressed as a comparison test
(A < v) or (A ≥ v), with binary outcomes, or a range query with outcomes of the form
vi ≤ A < vi+1, for i = 1, … , k .
5.3.4 MEASURES FOR SELECTING THE BEST SPLIT:
o There are many measures that can be used to determine the best way to split the records.
These measures are defined in terms of the class distribution of the records before and after
splitting.
o Let p(i|t) denote the fraction of records belonging to class i at a given node t.
o The measures developed for selecting the best split are often based on the degree of
impurity of the child nodes.
o The smaller the degree of impurity, the more skewed the class distribution.
o Examples of impurity measures include entropy, the Gini index, and the classification error.
For a node t, where p(i|t) is the fraction of records of class i at t and c is the number of classes:
Entropy(t) = - sum over i of p(i|t) log2 p(i|t)
Gini(t) = 1 - sum over i of [p(i|t)]^2
Classification error(t) = 1 - max over i of p(i|t)
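A minimal Python sketch (not from the notes) of the three impurity measures, taking the class distribution p(i|t) at a node as a list of fractions:

import math

def entropy(p):
    # Entropy(t) = -sum_i p(i|t) * log2 p(i|t), with 0*log(0) treated as 0.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def gini(p):
    # Gini(t) = 1 - sum_i p(i|t)^2.
    return 1 - sum(pi ** 2 for pi in p)

def classification_error(p):
    # Error(t) = 1 - max_i p(i|t).
    return 1 - max(p)

# A node with a uniform class distribution is maximally impure ...
print(entropy([0.5, 0.5]), gini([0.5, 0.5]), classification_error([0.5, 0.5]))  # 1.0 0.5 0.5
# ... while a skewed distribution gives lower impurity values.
print(entropy([5/6, 1/6]), gini([5/6, 1/6]), classification_error([5/6, 1/6]))  # ~0.65 ~0.28 ~0.17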
5.3.5 ALGORITHM FOR DECISION TREE INDUCTION:
o A skeleton decision tree induction algorithm called TreeGrowth is shown in Algorithm
4.1. The input to this algorithm consists of the training records E and the attribute set F.
The algorithm works by recursively selecting the best attribute to split the data (step 7)
and expanding the leaf nodes of the tree (steps 11 and 12) until the stopping criterion is met
(step 1).
Algorithm 4.1 A skeleton decision tree induction algorithm.
TreeGrowth(E, F)
1.  if stopping_cond(E, F) = true then
2.    leaf = createNode()
3.    leaf.label = Classify(E)
4.    return leaf
5.  else
6.    root = createNode()
7.    root.test_cond = find_best_split(E, F)
8.    let V = {v | v is a possible outcome of root.test_cond}
9.    for each v ∈ V do
10.     Ev = {e | root.test_cond(e) = v and e ∈ E}
11.     child = TreeGrowth(Ev, F)
12.     add child as descendent of root and label the edge (root -> child) as v
13.   end for
14. end if
15. return root
The Details of the Algorithm are explained below:
1. The createNode() function extends the decision tree by creating a new node. A node in the
decision tree has either a test condition, denoted as node.test_cond, or a class label,
denoted as node.label.
2. The find_best_split() function determines which attribute should be selected as the test
condition for splitting the training records.
3. The Classify() function determines the class label to be assigned to a leaf node.
4. The stopping_cond() function is used to terminate the tree-growing process by testing
whether all the records have either the same class label or the same attribute values.
Another way to terminate the recursive function is to test whether the number of records
has fallen below some minimum threshold.
5. After building the decision tree, a tree-pruning step can be performed to reduce the size of
the decision tree
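The following Python sketch is one possible rendering of the TreeGrowth skeleton, assuming categorical attributes, a Gini-based find_best_split, and records stored as (attribute-dict, label) pairs; the tiny data set at the end is invented for illustration and is not part of Algorithm 4.1.

from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def stopping_cond(E, F):
    # Stop when all records share a class, share identical attribute values,
    # or no attributes are left to split on.
    labels = [y for _, y in E]
    same_class = len(set(labels)) <= 1
    same_attrs = len({tuple(sorted(x.items())) for x, _ in E}) <= 1
    return same_class or same_attrs or not F

def classify(E):
    # Majority class of the records reaching this leaf.
    return Counter(y for _, y in E).most_common(1)[0][0]

def find_best_split(E, F):
    # Choose the attribute whose multiway split gives the lowest weighted Gini.
    def weighted_gini(attr):
        groups = {}
        for x, y in E:
            groups.setdefault(x[attr], []).append(y)
        return sum(len(g) / len(E) * gini(g) for g in groups.values())
    return min(F, key=weighted_gini)

def tree_growth(E, F):
    if stopping_cond(E, F):
        return {"label": classify(E)}                    # leaf node
    attr = find_best_split(E, F)
    root = {"test_cond": attr, "children": {}}
    for v in {x[attr] for x, _ in E}:                    # one child per outcome
        Ev = [(x, y) for x, y in E if x[attr] == v]
        root["children"][v] = tree_growth(Ev, [f for f in F if f != attr])
    return root

E = [({"home_owner": "yes", "married": "no"},  "no"),
     ({"home_owner": "no",  "married": "yes"}, "no"),
     ({"home_owner": "no",  "married": "no"},  "yes")]
print(tree_growth(E, ["home_owner", "married"]))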
5.3.6 AN EXAMPLE: WEB ROBOT DETECTION
o Web usage mining is the task of applying data mining techniques to extract useful patterns
from Web access logs.
o These patterns can reveal interesting characteristics of site visitors; e.g., people who
repeatedly visit a Web site and view the same product description page are more likely to
buy the product if certain incentives such as rebates or free shipping are offered.
o In Web usage mining, it is important to distinguish accesses made by human users from
those due to Web robots.
o A Web robot (also known as a Web crawler) is a software program that automatically
locates and retrieves information from the Internet by following the hyperlinks embedded
in Web pages.
Decision Tree Classification can be used to distinguish between accesses by human users
and those by web Robots.
o To classify the Web sessions, features are constructed to describe the
characteristics of each session.
o Some of the features used for the Web robot detection task are depth and
Breadth
o Depth determines the maximum distance of a requested page, where distance
is measured in terms of the number of hyperlinks away from the entry point of
the Web site.
o For example, the home page http://www.cs.umn.edu/kumar is assumed to be
at depth 0,
o whereas http://www.cs.umn.edu/kumar/MINDS/MINDS_papers.htm is located at depth 2
o The breadth attribute measures the width of the corresponding Web graph
o Here, sessions due to Web robots are assigned class label 1,
o and sessions due to human users are assigned class label 0 (the labels used in the decision tree below).
The model suggests that Web robots can be distinguished from human users in the
following way:
1. Accesses by Web robots tend to be broad but shallow, whereas accesses by human
users tend to be more focused (narrow but deep).
2. Unlike human users, Web robots seldom retrieve the image pages associated with a
Web document.
3. Sessions due to Web robots tend to be long and contain a large number of requested
pages.
4. Web robots are more likely to make repeated requests for the same document, since the
web pages retrieved by human users are often cached by the browser.
Decision Tree:
depth = 1:
|  breadth > 7: class 1
|  breadth <= 7:
|  |  breadth <= 3:
|  |  |  ImagePages > 0.375: class 0
|  |  |  ImagePages <= 0.375:
|  |  |  |  totalPages <= 6: class 1
...
Figure 4.18 Decision tree model for Web robot detection (excerpt)
5.3.7 CHARACTERISTICS OF DECISION TREE INDUCTION:
1. Decision tree induction is a non-parametric approach for building classification models.
2. Finding an optimal decision tree is an NP-complete problem. Many decision tree
algorithms employ a heuristic-based approach to guide their search in the vast hypothesis
space.
3. Techniques developed for constructing decision trees are computationally inexpensive,
making it possible to quickly construct models even when the training set size is very
large.
4. Decision trees, especially smaller-sized trees, are relatively easy to interpret. The
accuracies of the trees are also comparable to other classification techniques for many
simple data sets.
5. Decision trees provide an expressive representation for learning discrete-valued functions.
6. Decision tree algorithms are quite robust to the presence of noise, especially when
methods for avoiding over-fitting are employed.
7. The presence of redundant attributes does not adversely affect the accuracy of decision
trees.
8. Since most decision tree algorithms employ a top-down, recursive partitioning approach,
the number of records becomes smaller as we traverse down the tree.
9. A subtree can be replicated multiple times in a decision tree, as shown in Figure 4.19.
This replication arises because the divide-and-conquer strategy solves each subtree
independently, which can make the resulting tree more complex than necessary.
10. Studies have shown that the choice of impurity measure has little effect on the
performance of decision tree induction algorithms.
5.4 RULE BASED CLASSIFIER:
A rule-based classifier is a technique for classifying records using a collection of "if ... then ..." rules.
Rule-based Classifier (Example):
Name          | Blood Type | Give Birth | Can Fly | Live in Water | Class
human         | warm       | yes        | no      | no            | mammals
python        | cold       | no         | no      | no            | reptiles
salmon        | cold       | no         | no      | yes           | fishes
whale         | warm       | yes        | no      | yes           | mammals
frog          | cold       | no         | no      | sometimes     | amphibians
komodo        | cold       | no         | no      | no            | reptiles
bat           | warm       | yes        | yes     | no            | mammals
pigeon        | warm       | no         | yes     | no            | birds
cat           | warm       | yes        | no      | no            | mammals
leopard shark | cold       | yes        | no      | yes           | fishes
turtle        | cold       | no         | no      | sometimes     | reptiles
penguin       | warm       | no         | no      | sometimes     | birds
porcupine     | warm       | yes        | no      | no            | mammals
eel           | cold       | no         | no      | yes           | fishes
salamander    | cold       | no         | no      | sometimes     | amphibians
gila monster  | cold       | no         | no      | no            | reptiles
platypus      | warm       | no         | no      | no            | mammals
owl           | warm       | no         | yes     | no            | birds
dolphin       | warm       | yes        | no      | yes           | mammals
eagle         | warm       | no         | yes     | no            | birds
The table above shows the vertebrate data set used for this example; the rule set below is an
example of a model generated by a rule-based classifier for the vertebrate classification problem.
Example of a rule set for the vertebrate classification problem.
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Rule: (Condition) → y
– where
 Condition is a conjunction of attribute tests
 y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
– Examples of classification rules:
 (Blood Type = warm) ∧ (Lay Eggs = yes) → Birds
 (Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No
Application of Rule-Based Classifier:
A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule
Name         | Blood Type | Give Birth | Can Fly | Live in Water | Class
hawk         | warm       | no         | yes     | no            | ?
grizzly bear | warm       | yes        | no      | no            | ?
The rule R1 covers a hawk => Bird
The rule R3 covers the grizzly bear => Mammal
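A small Python sketch (not from the notes) of this covering test: a rule covers a record when every attribute test in its antecedent is satisfied; the attribute values are taken from the table above.

RULES = [
    ("R1", {"Give Birth": "no",  "Can Fly": "yes"},       "Birds"),
    ("R2", {"Give Birth": "no",  "Live in Water": "yes"}, "Fishes"),
    ("R3", {"Give Birth": "yes", "Blood Type": "warm"},   "Mammals"),
    ("R4", {"Give Birth": "no",  "Can Fly": "no"},        "Reptiles"),
    ("R5", {"Live in Water": "sometimes"},                "Amphibians"),
]

def covers(antecedent, record):
    return all(record.get(attr) == value for attr, value in antecedent.items())

hawk = {"Blood Type": "warm", "Give Birth": "no",  "Can Fly": "yes", "Live in Water": "no"}
bear = {"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no",  "Live in Water": "no"}

for name, record in [("hawk", hawk), ("grizzly bear", bear)]:
    triggered = [(rid, cls) for rid, ant, cls in RULES if covers(ant, record)]
    print(name, "->", triggered)
# hawk -> [('R1', 'Birds')]; grizzly bear -> [('R3', 'Mammals')]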
Rule Coverage and Accuracy:
 Coverage of a rule: the fraction of records in the data set that satisfy the antecedent of the rule.
 Accuracy of a rule: the fraction of the records that satisfy the antecedent which also satisfy the
consequent of the rule.
Example: for the data set below, the rule (Status = Single) → No has
Coverage = 40% (4 of the 10 records are Single) and Accuracy = 50% (2 of those 4 records have Class = No).

Tid | Refund | Marital Status | Taxable Income | Class
1   | Yes    | Single         | 125K           | No
2   | No     | Married        | 100K           | No
3   | No     | Single         | 70K            | No
4   | Yes    | Married        | 120K           | No
5   | No     | Divorced       | 95K            | Yes
6   | No     | Married        | 60K            | No
7   | Yes    | Divorced       | 220K           | No
8   | No     | Single         | 85K            | Yes
9   | No     | Married        | 75K            | No
10  | No     | Single         | 90K            | Yes
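A minimal Python check (not from the notes) of the figures quoted above for the rule (Status = Single) -> No over this ten-record data set:

records = [  # (Tid, Refund, Marital Status, Taxable Income in K, Class)
    (1, "Yes", "Single",   125, "No"),  (2,  "No",  "Married", 100, "No"),
    (3, "No",  "Single",    70, "No"),  (4,  "Yes", "Married", 120, "No"),
    (5, "No",  "Divorced",  95, "Yes"), (6,  "No",  "Married",  60, "No"),
    (7, "Yes", "Divorced", 220, "No"),  (8,  "No",  "Single",   85, "Yes"),
    (9, "No",  "Married",   75, "No"),  (10, "No",  "Single",   90, "Yes"),
]

covered = [r for r in records if r[2] == "Single"]   # records satisfying the antecedent
correct = [r for r in covered if r[4] == "No"]       # covered records whose class matches

coverage = len(covered) / len(records)
accuracy = len(correct) / len(covered)
print(f"coverage = {coverage:.0%}, accuracy = {accuracy:.0%}")   # coverage = 40%, accuracy = 50%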
5.4.1 HOW DOES RULE-BASED CLASSIFIER WORK?
A rule-based classifier classifies a test record based on the rule triggered by the record. To
illustrate how a rule-based classifier works, consider the rule set given below
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name          | Blood Type | Give Birth | Can Fly | Live in Water | Class
lemur         | warm       | yes        | no      | no            | ?
turtle        | cold       | no         | no      | sometimes     | ?
dogfish shark | cold       | yes        | no      | yes           | ?
 A lemur triggers rule R3, so it is classified as a mammal.
 The second vertebrate, the turtle, triggers rules R4 and R5. Since the classes
predicted by the rules are contradictory (reptiles versus amphibians), their conflicting
classes must be resolved.
 None of the rules are applicable to a dogfish shark. In this case, we need to ensure that
the classifier can still make a reliable prediction even though a test record is not
covered by any rule.
Characteristics of Rule-Based Classifiers:
(The example above of the lemur, turtle, and dogfish shark illustrates two important properties of a
rule set: mutually exclusive rules and exhaustive rules.)
 Mutually exclusive rules:
– A classifier contains mutually exclusive rules if the rules are independent of each other, i.e.,
every record is covered by at most one rule.
– Example: the lemur and the dogfish shark are covered in a mutually exclusive way, since each
triggers at most one rule, but the turtle is not, since it triggers more than one rule (R4 and R5).
 Exhaustive rules:
– A classifier has exhaustive coverage if it accounts for every possible combination of
attribute values, i.e., each record is covered by at least one rule.
– Example: the lemur and the turtle are covered exhaustively, since each triggers at least one
rule, whereas the dogfish shark is not covered, since it does not trigger any rule.
Note: Together these properties ensure that every record is covered by exactly one rule.
[Figure: a decision tree with test nodes Refund, Marital Status, and Taxable Income, together with the classification rules extracted from it]
Classification Rules:
(Refund = Yes) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income < 80K) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income > 80K) ==> Yes
(Refund = No, Marital Status = {Married}) ==> No
Here the rules are mutually exclusive and exhaustive.
Note: the rule set contains as much information as the tree.
Rules Can Be Simplified:
[Figure: the ten-record training data set shown above and the corresponding decision tree]
Initial Rule: (Refund = No) ∧ (Status = Married) → No
Simplified Rule: (Status = Married) → No
Effect of Rule Simplification:
 If the rules are not mutually exclusive then
– A record may trigger more than one rule
– Solution?
 Use an ordered rule set
 Use an unordered rule set with a voting scheme
 If the rules are not exhaustive then
– A record may not trigger any rule
– Solution?
 Use a default class: a default rule, Rd: { } → yd, must be added to cover the remaining cases.
A default rule has an empty antecedent and is triggered when all other rules have failed.
Ordered Rule Set:
In this approach, the rules in a rule set are ordered in decreasing order of their priority, which
can be defined in many ways (e.g., based on accuracy, coverage, total description length, or
the order in which the rules are generated). An ordered rule set is also known as a decision list.
Example: rules ordered according to the order in which they are generated.
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name   | Blood Type | Give Birth | Can Fly | Live in Water | Class
turtle | cold       | no         | no      | sometimes     | ?
With this ordering, the turtle is assigned the class of the highest-ranked rule it triggers, R4, and
is therefore classified as a reptile.
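A sketch (assumed, not from the notes) of classification with an ordered rule set: the rules are scanned in priority order and the record takes the class of the first rule it triggers.

ORDERED_RULES = [
    ({"Give Birth": "no",  "Can Fly": "yes"},       "Birds"),       # R1
    ({"Give Birth": "no",  "Live in Water": "yes"}, "Fishes"),      # R2
    ({"Give Birth": "yes", "Blood Type": "warm"},   "Mammals"),     # R3
    ({"Give Birth": "no",  "Can Fly": "no"},        "Reptiles"),    # R4
    ({"Live in Water": "sometimes"},                "Amphibians"),  # R5
]

def classify(record, rules, default="Unknown"):
    for antecedent, label in rules:                 # highest-priority rule first
        if all(record.get(a) == v for a, v in antecedent.items()):
            return label
    return default                                  # default rule when nothing fires

turtle = {"Blood Type": "cold", "Give Birth": "no",
          "Can Fly": "no", "Live in Water": "sometimes"}
print(classify(turtle, ORDERED_RULES))   # Reptiles (R4 fires before R5)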
Unordered Rules:
 This approach allows a test record to trigger multiple classification rules and considers the
consequent of each rule as a vote for a particular class.
 The votes are then tallied to determine the class label of the test record. The record is usually
assigned to the class that receives the highest number of votes.
 In some cases, the vote may be weighted by the rule's accuracy.
Advantages:
o Unordered rules are less susceptible to errors caused by the wrong rule being selected
to classify a test record.
o Model building is also less expensive because the rules do not have to be kept in
sorted order.
Disadvantages:
o Classifying a test record can be quite an expensive task because the attributes of the
test record must be compared against the precondition of every rule in the rule set.
5.4.2 RULE ORDERING SCHEMES:
 Rule ordering can be implemented on a rule-by-rule basis or on a class-by-class basis.
Rule-Based Ordering Scheme
o Individual rules are ranked based on their quality.
o This ordering scheme ensures that every test record is classified by the "best" rule
covering it.
Class-Based Ordering Scheme
o Rules that belong to the same class appear together in the rule set R.
o The rules are then collectively sorted on the basis of their class information.
Comparison Between Rule-Based and Class-Based Ordering Schemes:
Rule-based Ordering:
(Refund = Yes) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income < 80K) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income > 80K) ==> Yes
(Refund = No, Marital Status = {Married}) ==> No
Class-based Ordering:
(Refund = Yes) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income < 80K) ==> No
(Refund = No, Marital Status = {Married}) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income > 80K) ==> Yes
5.4.3 HOW TO BUILD A RULE BASED CLASSIFIER:
There are two broad classes of methods for extracting classification rules:
(1) Direct methods, which extract classification rules directly from data, and
(2) Indirect methods, which extract classification rules from other classification models, such
as decision trees and neural networks.
5.4.3.1 DIRECT METHODS FOR RULE EXTRACTION:
 The sequential covering algorithm is often used to extract rules directly from data.
 The algorithm extracts the rules one class at a time for data sets that contain more than two
classes.
 For the vertebrate classification problem, the sequential covering algorithm may generate
rules for classifying birds first, followed by rules for classifying mammals, amphibians,
reptiles, and finally, fishes.
 The criterion for deciding which class should be generated first depends on a number of
factors, such as the fraction of training records that belong to a particular class or the cost of
misclassifying records from a given class.
Sequential Covering Algorithm:
1. Start from an empty rule
2. Grow a rule using the Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat Steps (2) and (3) until the stopping criterion is met
 During rule extraction, all training records for class y are considered to be positive examples,
while those that belong to other classes are considered to be negative examples.
 A rule is desirable if it covers most of the positive examples and none (or very few) of the
negative examples.
 Once such a rule is found, the training records covered by the rule are eliminated.
 The new rule is added to the bottom of the decision list R.
 This procedure is repeated until the stopping criterion is met.
 The algorithm then proceeds to generate rules for the next class.
 Figure 5.2 demonstrates how the sequential covering algorithm works for a data set that
contains a collection of positive and negative examples.
 The rule R1, whose coverage is shown in Figure 5.2(b), is extracted first because it covers the
largest fraction of positive examples.
 All the training records covered by R1 are subsequently removed and the algorithm proceeds
to look for the next best rule, which is R2.
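A simplified Python sketch of sequential covering with a greedy general-to-specific Learn-One-Rule: each round adds the single conjunct that most improves the rule's accuracy on the remaining records. The helper structure and the tiny data set are assumptions made for illustration, not the exact procedure described in the text.

def covers(rule, x):
    return all(x.get(a) == v for a, v in rule.items())

def learn_one_rule(examples, target):
    rule = {}                                  # start from the empty antecedent {}
    candidates = {(a, v) for x, _ in examples for a, v in x.items()}
    while True:
        pos = sum(1 for x, y in examples if covers(rule, x) and y == target)
        neg = sum(1 for x, y in examples if covers(rule, x) and y != target)
        if neg == 0:                           # the rule no longer covers negatives
            return rule
        best, best_acc = None, pos / (pos + neg)
        for a, v in candidates - set(rule.items()):
            r = dict(rule, **{a: v})
            p = sum(1 for x, y in examples if covers(r, x) and y == target)
            n = sum(1 for x, y in examples if covers(r, x) and y != target)
            if p and p / (p + n) > best_acc:
                best, best_acc = (a, v), p / (p + n)
        if best is None:
            return rule                        # no conjunct improves the rule further
        rule[best[0]] = best[1]

def sequential_covering(examples, target):
    rules, remaining = [], list(examples)
    while any(y == target for _, y in remaining):
        rule = learn_one_rule(remaining, target)
        if not rule:                           # only the empty rule was found; stop
            break
        rules.append((rule, target))
        remaining = [(x, y) for x, y in remaining if not covers(rule, x)]
    return rules

data = [({"Can Fly": "yes", "Give Birth": "no"},  "bird"),
        ({"Can Fly": "yes", "Give Birth": "no"},  "bird"),
        ({"Can Fly": "no",  "Give Birth": "yes"}, "mammal"),
        ({"Can Fly": "yes", "Give Birth": "yes"}, "mammal")]
print(sequential_covering(data, "bird"))   # e.g. [({'Give Birth': 'no'}, 'bird')]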
Steps Followed to Build Rule In Detail:
1. Learn-One-Rule Function
o The objective of the Learn-One-Rule function is to extract a classification rule that
covers many of the positive examples and none (or very few) of the negative
examples in the training set. However, finding an optimal rule is computationally
expensive given the exponential size of the search space.
2. Rule Growing Strategy:
o There are two common strategies for growing a classification rule:
General-to-specific or Specific-to-general
o Under the general-to-specific strategy, an initial rule r: { } → y is created, where the
left-hand side is an empty set and the right-hand side contains the target class. The
rule has poor quality because it covers all the examples in the training set.
o New conjuncts are subsequently added to improve the rule's quality.
General-to-specific rule-growing strategy for the vertebrate classification problem:
o The conjunct Body Temperature = warm-blooded is initially chosen to form the rule
antecedent.
o The algorithm then explores all the possible candidates like Gives-Birth=yes.
o This process continues until the stopping criterion is met.
Specific-to-general strategy rule-growing strategy for the vertebrate classification
problem:
o Suppose a positive example for mammals is chosen as the initial seed.
o To improve its coverage, the rule is generalized by removing the conjunct
Hibernate=no.
o The refinement step is repeated until the stopping criterion is met, e.g., when the rule
starts covering negative examples.
3. Rule Evaluation:
o An evaluation metric is needed to determine which conjunct should be added (or
removed) during the rule-growing process.
Example: consider a training set that contains 60 positive examples and 100 negative
examples. Suppose we are given the following two candidate rules:
Rule R1: covers 50 positive examples and 5 negative examples,
Rule R2: covers 2 positive examples and no negative examples.
o The accuracies of R1 and R2 are 90.9% and 100%, respectively.
o However, R1 is the better rule despite its lower accuracy.
o The high accuracy of R2 is potentially spurious because the coverage of the rule is
too low.
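A two-line check (illustrative, not from the notes) of the accuracies quoted for the two candidate rules:

p1, n1 = 50, 5    # R1 covers 50 positive and 5 negative examples
p2, n2 = 2, 0     # R2 covers 2 positive and no negative examples
print(f"accuracy(R1) = {p1 / (p1 + n1):.1%}")   # 90.9%
print(f"accuracy(R2) = {p2 / (p2 + n2):.1%}")   # 100.0%
# R1 is still preferred: its far larger coverage makes the estimate more reliable.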
4. Rule Pruning:
o The rules generated by the Learn-One-Rule function can be pruned to improve their
generalization error.
Example: if the error on the validation set decreases after pruning, we keep the simplified rule.
5. Rationale for sequential covering:
o After a rule is extracted, the sequential covering algorithm must eliminate all the
positive and negative examples covered by the rule.
The rationale for doing this is given below:
o The figure above shows three possible rules, R1, R2, R3, extracted from a data set that
contains 29 positive examples and 21 negative examples.
o The accuracies of R1, R2, and R3 are 12/15 (80%), 7/10 (70%), and 8/12 (66.7%),
respectively.
o R1 is generated first because it has the highest accuracy.
5.4.3.2 INDIRECT METHOD FOR RULE EXTRACTION:
It is a method for generating a rule set from a decision tree. In principle, every path from the
root node to the leaf node of a decision tree can be expressed as a classification rule
[Figure: a decision tree with root test P; the left branch (P = No) tests Q, and the right branch (P = Yes) tests R and then Q; leaves are labelled + or -]
Rule Set:
r1: (P = No, Q = No) ==> -
r2: (P = No, Q = Yes) ==> +
r3: (P = Yes, R = No) ==> +
r4: (P = Yes, R = Yes, Q = No) ==> -
r5: (P = Yes, R = Yes, Q = Yes) ==> +
Example: Consider the following three rules from the figure above:
r2: (P = No) ∧ (Q = Yes) → +
r3: (P = Yes) ∧ (R = No) → +
r5: (P = Yes) ∧ (R = Yes) ∧ (Q = Yes) → +
Observe that the rule set always predicts a positive class when the value of Q is Yes.
Therefore, we may simplify the rules as follows:
r2': (Q = Yes) → +
r3: (P = Yes) ∧ (R = No) → +
r3 is retained to cover the remaining instances of the positive class. Although the rules
obtained after simplification are no longer mutually exclusive, they are less complex and are
easier to interpret.
Rule Generation:
o Classification rules are extracted for every path from the root to one of the leaf nodes
in the decision tree.
o Given a classification rule r: A → y, we consider a simplified rule, r': A' → y,
where A' is obtained by removing one of the conjuncts in A.
Example: r2: (P = No) ∧ (Q = Yes) → + is simplified to r2': (Q = Yes) → +
o The simplified rule with the lowest pessimistic error rate is retained provided its error
rate is less than that of the original rule.
o The rule-pruning step is repeated until the pessimistic error of the rule cannot be
improved further.
o Because some of the rules may become identical after pruning, the duplicate rules
must be discarded.
Rule Ordering:
o After generating the rule set, the extracted rules are ordered using the class-based ordering
scheme.
o Rules that predict the same class are grouped together into the same subset.
o The total description length for each subset is computed, and the classes are arranged
in increasing order of their total description length.
o The class that has the smallest description is given the highest priority because it is
expected to contain the best set of rules.
Advantages of Rule-Based Classifiers:
o As highly expressive as decision trees
o Easy to interpret
o Easy to generate
o Can classify new instances rapidly
o Performance comparable to decision trees
5.5 NEAREST-NEIGHBOR CLASSIFIERS:
 The classification framework shown in the figure involves a two-step process:
1. an inductive step for constructing a classification model from data, and
2. a deductive step for applying the model to test examples.
 Decision tree and rule-based classifiers are examples of eager learners because they are
designed to learn a model that maps the input attributes to the class label as soon as the
training data becomes available.
 An opposite strategy would be to delay the process of modeling the training data until it is
needed to classify the test examples. Techniques that employ this strategy are known as lazy
learners.
 An example of a lazy learner is the Rote classifier, which memorizes the entire training data
and performs classification only if the attributes of a test instance match one of the training
examples exactly. An obvious drawback of this approach is that some test records may not be
classified because they do not match any training example.
 One way to make this approach more flexible is to find all the training examples that are
relatively similar to the attributes of the test example.
 These examples, which are known as nearest neighbors, can be used to determine the class
label of the test example.
The justification for using nearest neighbors is best exemplified by the following saying:
Basic idea:
"If it walks like a duck, quacks like a duck, and looks like a duck, then it's probably a
duck."
[Figure: given a test record, compute its distance to the training records and choose the k "nearest" records.]
 A nearest-neighbor classifier represents each example as a data point in a d-dimensional
space, where d is the number of attributes.
 Given a test example, we compute its proximity to the rest of the data points in the training
set using one of the proximity measures.
 Figure 5.7 illustrates the 1-, 2-, and 3-nearest neighbors of a data point located at the center of
each circle.
 The data point is classified based on the class labels of its neighbors.
 In the case where the neighbors have more than one label, the data point is assigned to the
majority class of its nearest neighbors.
 In Figure 5.7(a), the 1-nearest neighbor of the data point is a negative example.
Therefore the data point is assigned to the negative class.
 If the number of nearest neighbors is three, as shown in Figure 5.7(c), then the neighborhood
contains two positive examples and one negative example. Using the majority voting scheme,
the data point is assigned to the positive class.
 In the case where there is a tie between the classes (see Figure 5.7(b)), we may randomly
choose one of them to classify the data point.
 The preceding discussion underscores the importance of choosing the right value for k. If k is
too small, then the nearest-neighbor classifier may be susceptible to over-fitting because of
noise in the training data.
 On the other hand, if k is too large, the nearest-neighbor classifier may misclassify the test
instance because its list of nearest neighbors may include data points that are located far away
from its neighborhood (see Figure 5.8).
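A minimal Python sketch (not from the notes) of a k-nearest-neighbor classifier with Euclidean distance and majority voting; the 2-D points are invented for illustration.

import math
from collections import Counter

def knn_classify(test_point, training_data, k):
    # training_data: list of (point, label) pairs, where point is a tuple of numbers.
    by_distance = sorted(training_data,
                         key=lambda item: math.dist(test_point, item[0]))
    nearest = by_distance[:k]                        # the k closest training records
    votes = Counter(label for _, label in nearest)   # majority class wins
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "+"), ((1.5, 2.0), "+"), ((3.0, 3.5), "-"),
         ((4.0, 4.0), "-"), ((3.5, 2.5), "-")]
print(knn_classify((2.0, 2.0), train, k=1))   # '+'
print(knn_classify((2.0, 2.0), train, k=3))   # '+' (two of the three nearest are positive)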
Characteristics of Nearest-Neighbor Classifiers
The characteristics of the nearest-neighbor classifier are summarized below:
 Nearest-neighbor classification is part of a more general technique known as instance-based
learning, which uses specific training instances to make predictions without having to
maintain an abstraction (or model) derived from the data.
 Lazy learners such as nearest-neighbor classifiers do not require model building. However,
classifying a test example can be quite expensive because we need to compute the proximity
values individually between the test and training examples.
 Nearest-neighbor classifiers make their predictions based on local information, whereas
decision tree and rule-based classifiers attempt to find a global model that fits the entire input
space.
 Nearest-neighbor classifiers can produce arbitrarily shaped decision boundaries. Such
boundaries provide a more flexible model representation compared to decision tree and
rule-based classifiers.
 Nearest-neighbor classifiers can produce wrong predictions unless the appropriate proximity
measure and data preprocessing steps are taken. For example, suppose we want to classify a
group of people based on attributes such as height (measured in meters) and weight
(measured in pounds). The height attribute has low variability, ranging from 1.5 m to 1.85 m,
whereas the weight attribute may vary from 90 lb to 250 lb. If the scale of the attributes is
not taken into consideration, the proximity measure may be dominated by differences in the
weights of the people.
5.6 BAYESIAN CLASSIFIERS:
 In many applications the relationship between the attribute set and the class variable is
non-deterministic.
 In other words, the class label of a test record cannot be predicted with certainty even though
its attribute set is identical to some of the training examples.
 This situation may arise because of noisy data.
 For example, consider the task of predicting whether a person is at risk for heart disease
based on the person's diet and workout frequency.
 Although most people who eat healthily and exercise regularly have less chance of
developing heart disease, they may still do so because of other factors such as heredity,
excessive smoking, and alcohol abuse.
 Determining whether a person's diet is healthy or the workout frequency is sufficient is also
subject to interpretation, which in turn may introduce uncertainties into the learning problem.
Bayes theorem is a statistical principle for combining prior knowledge of the classes
with new evidence gathered from data.
5.6.1 Bayes Theorem:
 Consider a football game between two rival teams: Team 0 and Team 1. Suppose Team 0
wins 65% of the time and Team 1 wins the remaining matches. Among the games won by
Team 0, only 30% of them come from playing on Team 1's football field. On the other hand,
75% of the victories for Team 1 are obtained while playing at home. If Team 1 is to host the
next match between the two teams, which team will most likely emerge as the winner?
This question can be answered by using the well-known Bayes theorem.
 To solve this problem we may use the concept of conditional probability.
Formula of Bayes theorem:
P(Y | X) = P(X | Y) P(Y) / P(X)
 The Bayes theorem can be used to solve the prediction problem stated at the beginning of this
section.
Let
X be the random variable that represents the team hosting the match, and
Y be the random variable that represents the winner of the match.
Both X and Y can take on values from the set {0, 1}.
We can summarize the information given in the problem as follows:
Probability Team 0 wins is P(Y = 0) = 0.65.
Probability Team 1 wins is P(Y = 1) = 1 - P(Y = 0) = 0.35.
Probability Team 1 hosted the match it won is P(X = 1 | Y = 1) = 0.75.
Probability Team 1 hosted the match won by Team 0 is P(X = 1 | Y = 0) = 0.3.
Our objective is to compute P(Y = 1 | X = 1), which is the conditional probability that Team 1
wins the next match it will be hosting, and to compare it against P(Y = 0 | X = 1).
Using the Bayes theorem we obtain
P(Y = 1 | X = 1) = P(X = 1 | Y = 1) P(Y = 1) / [P(X = 1 | Y = 1) P(Y = 1) + P(X = 1 | Y = 0) P(Y = 0)]
               = (0.75 x 0.35) / (0.75 x 0.35 + 0.3 x 0.65) = 0.2625 / 0.4575 = 0.5738.
It follows that
P(Y = 0 | X = 1) = 1 - P(Y = 1 | X = 1) = 1 - 0.5738 = 0.4262.
Since P(Y = 1 | X = 1) > P(Y = 0 | X = 1), Team 1 has a better chance than Team 0 of winning
the next match.
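The calculation can be reproduced directly (a small illustrative script, not part of the notes):

p_y1, p_y0 = 0.35, 0.65          # prior probabilities of Team 1 / Team 0 winning
p_x1_given_y1 = 0.75             # Team 1 hosted 75% of the matches it won
p_x1_given_y0 = 0.30             # Team 1 hosted 30% of the matches Team 0 won

p_x1 = p_x1_given_y1 * p_y1 + p_x1_given_y0 * p_y0   # law of total probability
p_y1_given_x1 = p_x1_given_y1 * p_y1 / p_x1          # Bayes theorem
print(round(p_y1_given_x1, 4))        # 0.5738
print(round(1 - p_y1_given_x1, 4))    # 0.4262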
To estimate the class-conditional probabilities P(X | Y), two implementations of Bayesian
classification methods are used:
1. Naive Bayes classifier
2. Bayesian belief network
1. Naive Bayes Classifier
A naive Bayes classifier estimates the class-conditional probability by assuming that the attributes
are conditionally independent given the class label y, i.e.,
P(X | Y = y) = P(X1 | Y = y) P(X2 | Y = y) ... P(Xd | Y = y),
where the attribute set X = {X1, X2, ..., Xd} consists of d attributes. A test record is then classified
to the class y that maximizes P(Y = y) P(X | Y = y).
Conditional independence:
An example of conditional independence is the relationship between a person's arm length and his or
her reading skills. One might observe that people with longer arms tend to have higher levels of
reading skills. This relationship can be explained by the presence of a confounding factor, which is
age. A young child tends to have short arms and lacks the reading skills of an adult. If the age of a
person is fixed, then the observed relationship between arm length and reading skills disappears.
Thus, we can conclude that arm length and reading skills are conditionally independent when the age
variable is fixed.
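A minimal categorical naive Bayes sketch (an assumed illustration, not a reference implementation): P(Y|X) is scored as P(Y) times the product of the estimated P(Xi|Y) terms, and the class with the highest score wins. The tiny data set is invented.

from collections import Counter, defaultdict

def train_naive_bayes(records):
    # records: list of (attribute-dict, class label) pairs
    class_counts = Counter(y for _, y in records)
    cond_counts = defaultdict(Counter)          # (class, attribute) -> value counts
    for x, y in records:
        for attr, value in x.items():
            cond_counts[(y, attr)][value] += 1
    return class_counts, cond_counts, len(records)

def predict(x, model):
    class_counts, cond_counts, n = model
    scores = {}
    for y, cy in class_counts.items():
        score = cy / n                          # prior P(Y)
        for attr, value in x.items():           # class-conditional P(Xi | Y)
            score *= cond_counts[(y, attr)][value] / cy
        scores[y] = score
    return max(scores, key=scores.get), scores

data = [({"Give Birth": "yes", "Can Fly": "no"},  "mammal"),
        ({"Give Birth": "yes", "Can Fly": "no"},  "mammal"),
        ({"Give Birth": "no",  "Can Fly": "yes"}, "bird"),
        ({"Give Birth": "no",  "Can Fly": "no"},  "reptile")]
model = train_naive_bayes(data)
print(predict({"Give Birth": "yes", "Can Fly": "no"}, model))   # mammal scores highest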
5.7 ESTIMATING PREDICTIVE ACCURACY OF CLASSIFICATION METHODS
 The accuracy of a classification method is the ability of the method to correctly determine the
class of a randomly selected data instance.
 It may be expressed as the probability of correctly classifying unseen data.
 Estimating the accuracy of a supervised classification method can be difficult if only the
training data is available and all of that data has been used in building the model.
 In such situations, overoptimistic predictions are often made regarding the accuracy of the
model.
 The accuracy estimation problem is much easier when much more data is available than is
required for training the model.
 We would obviously like to obtain an accuracy estimate (e.g. a mean value and variance) that
has low bias and low variance.
 Accuracy may be measured using a number of metrics. These include sensitivity, specificity,
precision, and accuracy.
 The methods for estimating errors include the holdout method, random sub-sampling,
cross-validation, and leave-one-out.
 Let us assume that the test data has a total of T objects. When testing a method we find that C
of the T objects are correctly classified. The error rate then may be defined as
Error rate = (T - C) / T
 This error rate is expected to be a good estimate if the number of test data T is large.
 A method of estimating the error rate is considered biased if it either tends to underestimate
the error or tends to overestimate it.
 A "confusion matrix" is sometimes used to represent the result of testing.
 Advantage of using a confusion matrix: it not only tells us how many test objects got
misclassified but also which misclassifications occurred.
 For example, in Table 3.11 we can see that 1 out of 10 objects that belonged to Class 3 got
misclassified to Class 1.
Table 3.11 A confusion matrix for three classes

             | Predicted Class 1 | Predicted Class 2 | Predicted Class 3
True Class 1 | 8                 | 2                 | 0
True Class 2 | 1                 | 9                 | 0
True Class 3 | 1                 | 2                 | 7
 Using Table 3.11 we can define the terms "false positive" (FP) and "false negative" (FN).
False positive cases for a class are those that did not belong to the class but were allocated to it;
false negative cases are those that belonged to the class but were allocated elsewhere.
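An illustrative Python computation of the per-class true positives, false negatives, and false positives from Table 3.11 (rows are the true classes, columns the predicted classes):

matrix = [
    [8, 2, 0],   # true class 1
    [1, 9, 0],   # true class 2
    [1, 2, 7],   # true class 3
]

for c in range(3):
    tp = matrix[c][c]
    fn = sum(matrix[c]) - tp                       # class-c objects sent elsewhere
    fp = sum(matrix[r][c] for r in range(3)) - tp  # other objects allocated to class c
    print(f"class {c + 1}: TP={tp}, FN={fn}, FP={fp}")
# class 1: TP=8, FN=2, FP=2;  class 2: TP=9, FN=1, FP=4;  class 3: TP=7, FN=3, FP=0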
5.7.1 Methods for estimating the accuracy of a method:
Holdout Method:
o The holdout method requires a training set and a test set.
o The sets are mutually exclusive.
o A larger training set would produce a better classifier, while a larger test set would
produce a better estimate of the accuracy. A balance must be achieved.
Random Sub-sampling Method:
o Random sub-sampling is very much like the holdout method except that it does not
rely on a single test set.
o The holdout estimation is repeated several times and the accuracy estimate is obtained
by computing the mean of the several trials.
o Random sub-sampling is likely to produce better error estimates than those by
the holdout method.
K-fold Cross-validation Method:
o In k-fold cross-validation, the available data is randomly divided into k disjoint
subsets of approximately equal size.
o One of the subsets is then used as the test set and the remaining k - 1 sets are used for
building the classifier.
o The test set is then used to estimate the accuracy.
o This is done repeatedly k times so that each subset is used as a test subset once.
o Cross-validation has been tested extensively and has been found to generally work
well when sufficient data is available.
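A sketch of k-fold cross-validation in Python, assuming a train_and_test(train, test) helper that builds a classifier on the first argument and returns its accuracy on the second; the toy data and the majority-class baseline are invented for illustration.

import random

def k_fold_cross_validation(data, k, train_and_test, seed=0):
    data = list(data)
    random.Random(seed).shuffle(data)           # random division into k subsets
    folds = [data[i::k] for i in range(k)]      # k disjoint, roughly equal subsets
    accuracies = []
    for i in range(k):
        test = folds[i]                         # each subset is the test set exactly once
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        accuracies.append(train_and_test(train, test))
    return sum(accuracies) / k

def majority_baseline(train, test):
    # Dummy classifier: always predict the majority class seen in the training folds.
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return sum(1 for _, y in test if y == majority) / len(test)

toy = [((i,), "pos" if i % 3 else "neg") for i in range(30)]
print(k_fold_cross_validation(toy, k=5, train_and_test=majority_baseline))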
Leave-one-out Method:
o Leave-one-out is a simpler version of k-fold cross-validation.
o In this method, one of the training samples is taken out and the model is generated
using the remaining training data.
o Once the model is built, the one remaining sample is used for testing and the result is
coded as 1 or 0 depending on whether it was classified correctly or not.
o The average of such results provides an estimate of the accuracy.
o The leave-one-out method is useful when the dataset is small.
o For large training datasets, leave-one-out can become expensive.
Bootstrap Method:
o In this method, given a dataset of size n, a bootstrap sample is randomly selected
uniformly with replacement by sampling n times and used to build a model.
o It can be shown that, on average, only 63.2% of the original records appear in such a sample.
o The error in building the model is estimated by using the remaining 36.8% of objects
that are not in the bootstrap sample.
o The final error is then computed as 0.632 times the error on the objects left out of the
bootstrap sample plus 0.368 times the error on the training (bootstrap) sample.
o The figures 0.632 and 0.368 come from the fact that if n samples are randomly selected
with replacement from n available objects, then the expected percentage of unique
objects in the training data is 63.2%, and the remaining 36.8% of the objects, which do
not appear in the bootstrap sample, are used as the test data.
o This is repeated and the average of the error estimates is obtained.
o In contrast to leave-one-out, the bootstrap method has low variance, but many
iterations are needed for good error estimates if the sample is small.
o K-fold cross-validation, leave-one-out, and bootstrap techniques are used in many
decision tree packages including CART and Clementine.
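The 63.2% figure can be illustrated with a short experiment (an assumed demonstration, not part of the notes): when n records are sampled n times with replacement, the expected fraction of distinct original records in the sample approaches 1 - 1/e, which is about 0.632.

import random

def unique_fraction(n, trials=1000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        sample = [rng.randrange(n) for _ in range(n)]   # bootstrap sample with replacement
        total += len(set(sample)) / n                   # fraction of distinct records drawn
    return total / trials

print(unique_fraction(1000))   # close to 0.632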
5.8 IMPROVING ACCURACY OF CLASSIFICATION METHODS
 Bootstrapping, bagging and boosting are techniques for improving the accuracy of
classification results.
 They have been shown to be very successful for certain models, for example, decision trees.
 All three involve combining several classification results obtained from the same training data.
 The aim of building several decision trees from resampled versions of the training data is
to find out how these decision trees differ from those that have been obtained earlier.
 Bootstrapping shows that all the samples selected are somewhat different from each
other since, on average, only 63.2% of the objects in them are unique. The bootstrap
samples are then used for building decision trees, which are then combined to form a single
decision tree.
 Bagging (the name is derived from bootstrap aggregating) combines classification
results from multiple models, or the results of using the same method on several different sets of
training data.
 Bagging may also be used to improve the stability and accuracy of a complex classification
model with limited training data, by using sub-samples obtained by resampling with
replacement for generating the models.
 In contrast to the simple voting used in bagging, weighted voting can also be used.
 A technique called boosting involves applying weights to different instances of the training
data depending on the quality of the previous result.
 Often the idea is to give more weight to those instances that are not being correctly classified
and less weight to those that are frequently classified correctly. One approach is that different
weights can be applied to data in different classes. For example, the weights could be
inversely proportional to the accuracy of prediction in each class.
 Thus classes that are not being classified with a high level of accuracy will get a higher
weight, and the data from these classes will be selected more frequently in resampling.
 When the classifier is used again, the result can improve the performance of the classes that
were doing poorly in the last round. This is followed by another iteration of computing weights,
and so on, until some stopping criterion is met.
Benefits of these methods are:
o These techniques can provide a level of accuracy that usually cannot be obtained by a
large single-tree model.
o Creating a single decision tree from a collection of trees in bagging and boosting is
not difficult.
o These methods can often help in avoiding the problem of over-fitting, since a number
of trees based on random samples are used.
o Boosting appears to be on average better than bagging, although it is not always so.
o On some problems bagging does better than boosting.
o On the other hand, it should be noted that it is more difficult to visualize a large set of
decision trees compared to a single tree.
o Also, generally bagging and boosting produce a final classifier that is much more
difficult to understand. Nevertheless, bagging and boosting generally do improve
accuracy and are being used in several decision tree software packages.
5.9 OTHER EVALUATION CRITERIA FOR CLASSIFICATION METHODS
 Other criteria for evaluation of classification methods, like those for all data mining
techniques, are:
1. Speed
2. Robustness
3. Scalability
4. Interpretability
5. Goodness of the model
6. Flexibility
7. Time complexity
Speed
o Speed involves not just the time or computation cost of constructing a model (e.g. a
decision tree); it also includes the time required to learn to use the model.
o Obviously, a user wishes to minimize both times, although it has to be understood
that any significant data mining project will take time to plan and prepare the data.
Robustness
o Data errors are common, in particular when data is being collected from a number of
sources and errors may remain even after data cleaning.
o It is therefore desirable that a method be able to produce good results in spite of some
errors and missing values in datasets.
Scalability
o Many data mining methods were originally designed for small datasets.
o Many have been modified to deal with large problems. Given that large datasets are
becoming common, it is desirable that a method continues to work efficiently for
large disk-resident databases as well.
Interpretability
o An important task of a data mining professional is to ensure that the results of data
mining are explained to the decision makers.
o It is therefore desirable that the end-user be able to understand and gain insight from
the results produced by the classification method.
Goodness of the Model
o For a model to be effective, it needs to fit the problem that is being solved for
example, in a decision tree classification, it is desirable to find a decision tree of the
right size and compactness with high accuracy.
Possible Questions from This Chapter:
1. Define classification and explain the purpose of a classification model.
2. Explain the general approach followed for solving classification problems.
3. Explain how a decision tree works. (5 marks)
4. Explain how to build a decision tree, or explain Hunt's algorithm. (5 marks)
5. Explain the different methods used to express the attribute test condition, using figures, and
also explain the measures used for selecting the best split. (5+5 marks)
6. Explain the algorithm for decision tree induction. (5 marks)
7. Write a note on Web robot detection. (5 marks)
8. Explain the different characteristics of decision tree induction. (5 marks)
9. Explain the rule-based classifier and how a rule-based classifier works in detail. (10 marks)
10. Explain the characteristics of a rule-based classifier. (5 marks)
11. Explain the effect of rule simplification. (5 marks)
12. Explain rule ordering schemes in rule-based classifiers. (5 marks)
13. Explain in detail the building of a rule-based classifier using the direct method technique. (10 marks)
14. Explain in detail the building of a rule-based classifier using the indirect method technique and
also give the advantages of using rule-based classifiers. (10 marks)
15. Explain the nearest-neighbour classifier in detail and also give the characteristics of the
nearest-neighbour classifier. (10 marks)
16. Explain Bayesian classifiers in detail. (10 marks)
17. Explain the methods used for estimating accuracy. (5 marks)
18. Explain the evaluation criteria for classification methods. (5 marks)