Data Mining — Classification: Alternative Techniques
Lecture Notes for Chapter 5: Rule-Based Classifier
Introduction to Data Mining, by Tan, Steinbach, Kumar

Rule-Based Classifier
- Classify records by using a collection of "if…then…" rules
- Rule: (Condition) → y
  - where Condition is a conjunction of attribute tests and y is the class label
  - LHS: rule antecedent or condition
  - RHS: rule consequent
  - Examples of classification rules:
      (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
      (Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No

Rule-Based Classifier (Example)

  Name            Blood Type  Give Birth  Can Fly  Live in Water  Class
  human           warm        yes         no       no             mammals
  python          cold        no          no       no             reptiles
  salmon          cold        no          no       yes            fishes
  whale           warm        yes         no       yes            mammals
  frog            cold        no          no       sometimes      amphibians
  komodo          cold        no          no       no             reptiles
  bat             warm        yes         yes      no             mammals
  pigeon          warm        no          yes      no             birds
  cat             warm        yes         no       no             mammals
  leopard shark   cold        yes         no       yes            fishes
  turtle          cold        no          no       sometimes      reptiles
  penguin         warm        no          no       sometimes      birds
  porcupine       warm        yes         no       no             mammals
  eel             cold        no          no       yes            fishes
  salamander      cold        no          no       sometimes      amphibians
  gila monster    cold        no          no       no             reptiles
  platypus        warm        no          no       no             mammals
  owl             warm        no          yes      no             birds
  dolphin         warm        yes         no       yes            mammals
  eagle           warm        no          yes      no             birds

  R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
  R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
  R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
  R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
  R5: (Live in Water = sometimes) → Amphibians

Application of Rule-Based Classifier
- A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule

  Name          Blood Type  Give Birth  Can Fly  Live in Water  Class
  hawk          warm        no          yes      no             ?
  grizzly bear  warm        yes         no       no             ?

- The rule R1 covers the hawk => Bird
- The rule R3 covers the grizzly bear => Mammal

Rule Coverage and Accuracy
- Coverage of a rule: the fraction of records that satisfy the antecedent of the rule
- Accuracy of a rule: the fraction of the records satisfying the antecedent that also satisfy the consequent of the rule

  Tid  Refund  Marital Status  Taxable Income  Class
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  7    Yes     Divorced        220K            No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes

  (Status=Single) → No
  Coverage = 40%, Accuracy = 50%
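These two measures are easy to verify programmatically. The Python sketch below is not part of the original notes: it re-creates the ten-record tax table and recomputes coverage and accuracy for (Status=Single) → No; the record layout and helper names are illustrative only.

```python
# Minimal sketch (illustrative, not from the notes): coverage and accuracy of a rule,
# checked against the ten-record tax table above.

records = [  # (Refund, Marital Status, Taxable Income in K, Class)
    ("Yes", "Single",   125, "No"),
    ("No",  "Married",  100, "No"),
    ("No",  "Single",    70, "No"),
    ("Yes", "Married",  120, "No"),
    ("No",  "Divorced",  95, "Yes"),
    ("No",  "Married",   60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No",  "Single",    85, "Yes"),
    ("No",  "Married",   75, "No"),
    ("No",  "Single",    90, "Yes"),
]

def antecedent(r):           # (Status = Single)
    return r[1] == "Single"

consequent = "No"            # ... -> Class = No

covered = [r for r in records if antecedent(r)]
coverage = len(covered) / len(records)
accuracy = sum(r[3] == consequent for r in covered) / len(covered)

print(coverage, accuracy)    # 0.4 and 0.5, matching the slide
```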
How Does a Rule-Based Classifier Work?

  R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
  R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
  R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
  R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
  R5: (Live in Water = sometimes) → Amphibians

  Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
  lemur          warm        yes         no       no             ?
  turtle         cold        no          no       sometimes      ?
  dogfish shark  cold        yes         no       yes            ?

- A lemur triggers rule R3, so it is classified as a mammal
- A turtle triggers both R4 and R5
- A dogfish shark triggers none of the rules

Characteristics of Rule-Based Classifier
- Mutually exclusive rules
  - Every record is covered by at most one rule
- Exhaustive rules
  - The classifier has exhaustive coverage if it accounts for every possible combination of attribute values
  - Each record is covered by at least one rule

From Decision Trees To Rules
  Refund?
    Yes -> NO
    No  -> Marital Status?
             {Single, Divorced} -> Taxable Income?
                                     < 80K -> NO
                                     > 80K -> YES
             {Married}          -> NO
Classification rules read off the tree:
  (Refund=Yes) ==> No
  (Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
  (Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
  (Refund=No, Marital Status={Married}) ==> No
- The rules are mutually exclusive and exhaustive
- The rule set contains as much information as the tree

Rules Can Be Simplified
[Figure: the same decision tree, shown next to the ten-record tax table (Tid, Refund, Marital Status, Taxable Income, Cheat) from above]
  Initial Rule:    (Refund=No) ∧ (Status=Married) → No
  Simplified Rule: (Status=Married) → No

Effect of Rule Simplification
- Rules are no longer mutually exclusive
  - A record may trigger more than one rule
  - Solution? Ordered rule set, or unordered rule set with a voting scheme
- Rules are no longer exhaustive
  - A record may not trigger any rule
  - Solution? Use a default class

Ordered Rule Set
- Rules are rank ordered according to their priority
  - An ordered rule set is known as a decision list
- When a test record is presented to the classifier
  - It is assigned to the class label of the highest-ranked rule it triggers
  - If none of the rules fire, it is assigned to the default class

  R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
  R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
  R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
  R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
  R5: (Live in Water = sometimes) → Amphibians

  Name    Blood Type  Give Birth  Can Fly  Live in Water  Class
  turtle  cold        no          no       sometimes      ?

With this ordering, the turtle is assigned to the first rule it triggers, R4 (Reptiles).

Rule Ordering Schemes
- Rule-based ordering: individual rules are ranked based on their quality
- Class-based ordering: rules that belong to the same class appear together

Building Classification Rules
- Direct method: extract rules directly from data
  - e.g. RIPPER, CN2, Holte's 1R
- Indirect method: extract rules from other classification models (e.g. decision trees, neural networks, etc.)
  - e.g. C4.5rules

Direct Method: Sequential Covering
1. Start from an empty rule list
2. Grow a rule using the Learn-One-Rule function
3. Remove the training records covered by the rule
4. Repeat steps (2) and (3) until the stopping criterion is met
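The loop above is straightforward to express in code. The following Python sketch is illustrative only and is not the book's algorithm: learn_one_rule here is a deliberately crude stand-in (it picks the single attribute test with the highest accuracy for the target class), and the stopping criterion is simplified to "no positive records remain".

```python
# Hedged sketch of sequential covering (illustrative only).
# A "rule" is (conditions, label), where conditions is a dict of attribute tests.

def learn_one_rule(records, target):
    """Toy Learn-One-Rule stand-in: pick the single attribute test whose
    covered records are most frequently of the target class."""
    best, best_acc = None, -1.0
    for attr in records[0]:
        if attr == "class":
            continue
        for value in {r[attr] for r in records}:
            covered = [r for r in records if r[attr] == value]
            acc = sum(r["class"] == target for r in covered) / len(covered)
            if acc > best_acc:
                best, best_acc = ({attr: value}, target), acc
    return best

def covers(rule, record):
    conditions, _ = rule
    return all(record[a] == v for a, v in conditions.items())

def sequential_covering(records, target):
    rules, remaining = [], list(records)
    # Grow a rule, then remove the training records it covers; repeat.
    # A real implementation would also evaluate and prune each rule.
    while any(r["class"] == target for r in remaining):
        rule = learn_one_rule(remaining, target)
        rules.append(rule)
        remaining = [r for r in remaining if not covers(rule, r)]
    return rules

# Tiny illustrative dataset (hypothetical records).
data = [
    {"give_birth": "yes", "can_fly": "no",  "class": "mammal"},
    {"give_birth": "no",  "can_fly": "yes", "class": "bird"},
    {"give_birth": "no",  "can_fly": "no",  "class": "reptile"},
    {"give_birth": "yes", "can_fly": "yes", "class": "mammal"},
]
print(sequential_covering(data, "mammal"))
# -> [({'give_birth': 'yes'}, 'mammal')]
```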
Example of Sequential Covering
[Figures: (i) the original data, (ii) Step 1 — rule R1 is grown and the records it covers are removed, (iii) Step 2 — rule R2 is grown on the remaining records, (iv) Step 3]

Aspects of Sequential Covering
- Rule Growing
- Instance Elimination
- Rule Evaluation
- Stopping Criterion
- Rule Pruning

Rule Growing
- Two common strategies: general-to-specific (start from an empty rule and add conjuncts) and specific-to-general (start from a positive example and remove conjuncts)

Rule Growing (Examples)
- CN2 Algorithm:
  - Start from an empty conjunct: {}
  - Add conjuncts that minimize the entropy measure: {A}, {A,B}, …
  - Determine the rule consequent by taking the majority class of the instances covered by the rule
- RIPPER Algorithm:
  - Start from an empty rule: {} => class
  - Add conjuncts that maximize FOIL's information gain measure:
      R0: {} => class            (initial rule)
      R1: {A} => class           (rule after adding a conjunct)
      Gain(R0, R1) = p1 [ log(p1/(p1+n1)) − log(p0/(p0+n0)) ]
    where
      p0: number of positive instances covered by R0
      n0: number of negative instances covered by R0
      p1: number of positive instances covered by R1
      n1: number of negative instances covered by R1

Rule growing example (FOIL's information gain):
Suppose R0 covers 60 positive and 40 negative cases. We compare two possible refinements of R0:
1. R1 covers 50 positive and 10 negative cases.
2. R2 covers 30 positive and 5 negative cases.
Their information gains are
  Gain(R0, R1) = 50 [ log2(50/60) − log2(60/100) ] ≈ 23.7
  Gain(R0, R2) = 30 [ log2(30/35) − log2(60/100) ] ≈ 15.4
Hence we prefer refinement R1, even though it has lower accuracy than R2 (50/60 ≈ 0.83 versus 30/35 ≈ 0.86).

Instance Elimination
- Why do we need to eliminate instances?
  - Otherwise, the next rule is identical to the previous rule
- Why do we remove positive instances?
  - To ensure that the next rule is different
- Why do we remove negative instances?
  - To prevent underestimating the accuracy of the rule
  - Compare rules R2 and R3 in the diagram
[Figure: training instances together with rules R1, R2 and R3]
If we don't remove the cases covered by R1, then R3 covers 8 positive and 4 negative cases; otherwise it covers 6 positive and 2 negative cases. For R2 it makes no difference: it covers 7 positive and 3 negative cases either way. So if we don't remove the covered cases, the accuracy of R3 is underestimated (8/12 instead of 6/8).

Rule Evaluation
- Metrics, for a rule R: condition ⇒ class = c:
  - Accuracy   = nc / n
  - Laplace    = (nc + 1) / (n + k)
  - M-estimate = (nc + k·p) / (n + k)
  where
    n  : number of instances covered by the rule
    nc : number of instances of class c covered by the rule
    k  : number of classes
    p  : prior probability of class c
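The three metrics translate directly into code. The sketch below is illustrative only; the demo numbers are taken from the worked example that follows (60 positive / 100 negative training records, R1 covering 50 positive and 5 negative, R2 covering 2 positive and none negative).

```python
# Hedged sketch of the three rule-evaluation metrics defined above.

def accuracy(nc, n):
    return nc / n

def laplace(nc, n, k):
    return (nc + 1) / (n + k)

def m_estimate(nc, n, k, p):
    return (nc + k * p) / (n + k)

# Numbers from the worked example below: k = 2 classes, prior p = 60/160.
k, p = 2, 60 / 160
for name, nc, n in [("R1", 50, 55), ("R2", 2, 2)]:
    print(name, accuracy(nc, n), laplace(nc, n, k), m_estimate(nc, n, k, p))
# R1: 0.909..., 0.894..., 0.890...  -> the slide reports 0.91, 0.89, 0.89
# R2: 1.0,      0.75,     0.6875    -> the slide reports 1,    0.75, 0.69
```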
Rule Evaluation: Example
Suppose we have a 2-class problem with 60 positive and 100 negative examples. Consider:
1. R1 covers 50 positive examples and 5 negative.
2. R2 covers 2 positive examples and no negative.
The accuracy of R1 is 50/55 = 0.91; the accuracy of R2 is 2/2 = 1.
With the Laplace correction this becomes:
  R1: (50 + 1)/(55 + 2) = 0.89
  R2: (2 + 1)/(2 + 2)   = 0.75
With the m-estimate (p = 60/160 = 3/8):
  R1: (50 + 2×3/8)/(55 + 2) = 0.89
  R2: (2 + 2×3/8)/(2 + 2)   = 0.69

Stopping Criterion and Rule Pruning
- Stopping criterion
  - Compute the gain
  - If the gain is not significant, stop growing the rule
- Rule pruning
  - Similar to post-pruning of decision trees
  - Reduced-error pruning:
    - Remove one of the conjuncts in the rule
    - Compare the error rate on a validation set before and after pruning
    - If the error improves, prune the conjunct

Summary of Direct Method
- Grow a single rule
- Prune the rule (if necessary)
- Remove the instances covered by the rule
- Add the rule to the current rule set
- Repeat

Direct Method: RIPPER
- For a 2-class problem, choose one of the classes as the positive class and the other as the negative class
  - Learn rules for the positive class
  - The negative class will be the default class
- For a multi-class problem
  - Order the classes according to increasing class prevalence (the fraction of instances that belong to a particular class)
  - Learn the rule set for the smallest class first, treating the rest as the negative class
  - Repeat with the next smallest class as the positive class

Direct Method: RIPPER — Growing a Rule
- Start from an empty rule body
- Add conjuncts as long as they improve FOIL's information gain
- Stop when the rule no longer covers negative examples
- Prune the rule immediately using incremental reduced-error pruning
- Measure for pruning: v = (p − n)/(p + n)
  - p: number of positive examples covered by the rule in the validation set
  - n: number of negative examples covered by the rule in the validation set
- Pruning method: delete any final sequence of conditions that maximizes v

Pruning in Ripper: Example (cf. exercise 2)
Suppose we have the rule R: A and B ⇒ class=c, and a validation set (not used for growing the rule) that contains 500 positive examples (i.e. with class=c) and 500 negative examples. Among the validation cases that satisfy condition A, there are 200 positive examples and 50 negative examples. Among the cases that satisfy both condition A and condition B, there are 100 positive examples and 5 negative examples. Should condition B be pruned?

  LHS     v = (p − n)/(p + n)
  {}      (500 − 500)/1000 = 0
  {A}     (200 − 50)/250   = 0.6
  {A,B}   (100 − 5)/105    = 0.9

Since {A,B} maximizes v, the rule is not pruned. (A code sketch of this decision appears at the end of this subsection.)

Direct Method: RIPPER — Building a Rule Set
- Use the sequential covering algorithm
  - Find the best rule that covers the current set of positive examples
  - Eliminate both the positive and the negative examples covered by the rule
- Each time a rule is added to the rule set, compute the new description length
  - Stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far (the description length depends on the accuracy and complexity of the rule)
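Before moving on to indirect methods, here is the promised sketch of the Ripper pruning decision above. It is illustrative only: it evaluates v = (p − n)/(p + n) on the validation counts for each candidate suffix deletion and keeps the variant with the largest v.

```python
# Hedged sketch of RIPPER-style incremental reduced-error pruning.
# Validation counts (positive, negative) for each candidate rule body,
# taken from the example above.
candidates = {
    "{}":    (500, 500),   # drop both conditions
    "{A}":   (200, 50),    # drop the final condition B
    "{A,B}": (100, 5),     # keep the rule as grown
}

def v(p, n):
    return (p - n) / (p + n)

best = max(candidates, key=lambda lhs: v(*candidates[lhs]))
for lhs, (p, n) in candidates.items():
    print(lhs, round(v(p, n), 2))
print("keep:", best)   # {A,B} wins (v ~ 0.90), so nothing is pruned
```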
Indirect Methods

Indirect Method: C4.5rules
- Extract rules from an unpruned decision tree
- For each rule R: A → class=c:
  - Consider an alternative rule R-: A- → class=c, where A- is obtained by removing one of the conditions in A
  - Compare the pessimistic error rate of R against that of every R-
  - Prune if one of the R- has a lower pessimistic error rate
  - Repeat until we can no longer improve the generalization error

Rule Pruning in C4.5rules
R:  if A  then class=c
R-: if A- then class=c, where A- has one condition less than A.
Let X be the condition considered for removal from A. Make a cross-table:

                  Class = c   Class ≠ c
  A- and X        Y1          E1
  A- and not X    Y2          E2

R covers Y1 + E1 cases and makes E1 errors; its pessimistic estimate is U_CF(E1, Y1 + E1). Likewise, R- covers Y1 + Y2 + E1 + E2 cases and makes E1 + E2 errors, with pessimistic estimate U_CF(E1 + E2, Y1 + Y2 + E1 + E2).

Rule Pruning in C4.5rules (example)
Consider the rule
  if TSH > 6
     FTI ≤ 64
     TSH measured = true
     T4U measured = true
     thyroid surgery = true
  then class negative
Suppose this rule covers 3 training cases, 2 of which have class negative. The pessimistic error estimate, using CF = 0.25, is U_25%(1, 3) = 68%, because under a binomial distribution with N = 3 and a per-draw error probability of 0.68, P(E ≤ 1) is approximately 0.25.

  Condition deleted      Y1+Y2   E1+E2   Pessimistic error rate
  TSH > 6                3       1       55%
  FTI ≤ 64               6       1       34%
  TSH measured = t       2       1       68%
  T4U measured = t       2       1       68%
  thyroid surgery = t    59              97%

The lowest pessimistic error is obtained by deleting FTI ≤ 64, so that condition is removed and the procedure is repeated on the shortened rule.

Indirect Method: C4.5rules (rule ordering)
- Instead of ordering the rules, order subsets of rules (class ordering)
  - Each subset is a collection of rules with the same rule consequent (class)
  - Compute the description length of each subset
    - Description length = L(error) + g·L(model)
    - g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5)
  - Order the subsets according to increasing description length
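The pessimistic estimate U_CF(E, N) quoted above can be reproduced numerically, at least under my reading of the slide: find the per-case error probability at which observing E or fewer errors among N covered cases has probability CF. The sketch below is illustrative (C4.5's own tables may round slightly differently) and uses only a hand-rolled binomial CDF.

```python
from math import comb

def binom_cdf(e, n, p):
    """P(X <= e) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(e + 1))

def pessimistic_error(e, n, cf=0.25, tol=1e-9):
    """Upper confidence limit U_CF(e, n): the error rate at which seeing
    at most e errors in n covered cases has probability cf (bisection)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(e, n, mid) > cf:
            lo = mid          # error rate still too small
        else:
            hi = mid
    return (lo + hi) / 2

# Thyroid rule above: 3 covered cases, 1 error, CF = 0.25.
print(pessimistic_error(1, 3))   # ~0.67-0.68; the slide quotes roughly 68%
# pessimistic_error(1, 4) ~ 0.54-0.55 and pessimistic_error(1, 7) ~ 0.34,
# close to the 55% and 34% rows of the condition-deletion table above.
```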
Example

  Name            Give Birth  Lay Eggs  Can Fly  Live in Water  Have Legs  Class
  human           yes         no        no       no             yes        mammals
  python          no          yes       no       no             no         reptiles
  salmon          no          yes       no       yes            no         fishes
  whale           yes         no        no       yes            no         mammals
  frog            no          yes       no       sometimes      yes        amphibians
  komodo          no          yes       no       no             yes        reptiles
  bat             yes         no        yes      no             yes        mammals
  pigeon          no          yes       yes      no             yes        birds
  cat             yes         no        no       no             yes        mammals
  leopard shark   yes         no        no       yes            no         fishes
  turtle          no          yes       no       sometimes      yes        reptiles
  penguin         no          yes       no       sometimes      yes        birds
  porcupine       yes         no        no       no             yes        mammals
  eel             no          yes       no       yes            no         fishes
  salamander      no          yes       no       sometimes      yes        amphibians
  gila monster    no          yes       no       no             yes        reptiles
  platypus        no          yes       no       no             yes        mammals
  owl             no          yes       yes      no             yes        birds
  dolphin         yes         no        no       yes            no         mammals
  eagle           no          yes       yes      no             yes        birds

C4.5 versus C4.5rules versus RIPPER

C4.5 decision tree:
  Give Birth?
    Yes -> Mammals
    No  -> Live In Water?
             Yes       -> Fishes
             Sometimes -> Amphibians
             No        -> Can Fly?
                            Yes -> Birds
                            No  -> Reptiles

C4.5rules:
  (Give Birth=No, Can Fly=Yes) → Birds
  (Give Birth=No, Live in Water=Yes) → Fishes
  (Give Birth=Yes) → Mammals
  (Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
  ( ) → Amphibians

RIPPER:
  (Live in Water=Yes) → Fishes
  (Have Legs=No) → Reptiles
  (Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
  (Can Fly=Yes, Give Birth=No) → Birds
  ( ) → Mammals

C4.5 versus C4.5rules versus RIPPER (confusion matrices on the data above)

C4.5 and C4.5rules:
                               PREDICTED CLASS
  ACTUAL CLASS   Amphibians  Fishes  Reptiles  Birds  Mammals
  Amphibians     2           0       0         0      0
  Fishes         0           2       0         0      1
  Reptiles       1           0       3         0      0
  Birds          1           0       0         3      0
  Mammals        0           0       1         0      6

RIPPER:
                               PREDICTED CLASS
  ACTUAL CLASS   Amphibians  Fishes  Reptiles  Birds  Mammals
  Amphibians     0           0       0         0      2
  Fishes         0           3       0         0      0
  Reptiles       0           0       3         0      1
  Birds          0           0       1         2      1
  Mammals        0           2       1         0      4

Advantages of Rule-Based Classifiers
- As highly expressive as decision trees
- Easy to interpret
- Easy to generate
- Can classify new instances rapidly
- Performance comparable to decision trees
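As a final check on the "performance comparable to decision trees" bullet, the overall accuracies implied by the two confusion matrices above can be recovered with a few lines. The matrices are copied as given; the code itself is illustrative only.

```python
# Hedged sketch: overall accuracy from the confusion matrices above
# (rows = actual class, columns = predicted class, in the order
#  Amphibians, Fishes, Reptiles, Birds, Mammals).
c45 = [
    [2, 0, 0, 0, 0],
    [0, 2, 0, 0, 1],
    [1, 0, 3, 0, 0],
    [1, 0, 0, 3, 0],
    [0, 0, 1, 0, 6],
]
ripper = [
    [0, 0, 0, 0, 2],
    [0, 3, 0, 0, 0],
    [0, 0, 3, 0, 1],
    [0, 0, 1, 2, 1],
    [0, 2, 1, 0, 4],
]

def overall_accuracy(m):
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total

print(overall_accuracy(c45), overall_accuracy(ripper))   # 0.8 vs 0.6 on these 20 animals
```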