Transcript
Fuzzy Logic in Machine Learning
Eyke Hüllermeier, Knowledge Engineering & Bioinformatics Lab, Department of Mathematics and Computer Science, Marburg University, Germany
SBAI-2011, Sao Joao del-Rei, Brazil, September 2011

PROLOGUE
Fuzzy Logic for Machine Learning: use of concepts, tools, and techniques from fuzzy sets and fuzzy logic to enhance machine learning methods.
Machine Learning for Fuzzy Logic: use of machine learning methods to support the data-driven design of (rule-based) fuzzy systems (helping to overcome the knowledge acquisition bottleneck).
And things in between, e.g., neuro-fuzzy systems.
AGENDA
1. Introduction and Background
   a. Machine Learning
   b. Fuzzy Logic
2. Fuzzy Modeling in Machine Learning
3. Fuzzy Pattern Trees
4. Summary & Conclusions
This is a purely non-technical talk!
BACKGROUND ON MACHINE LEARNING
§ The major goal of machine learning (ML) is to create computer systems that are able to automatically improve their performance on the basis of experience, and to adapt their behavior to changing environmental conditions → "creating machines that can learn"
(Example: computer poker)
BACKGROUND ON MACHINE LEARNING
§ The major goal of machine learning (ML) is to create computer systems that are able to automatically improve their performance on the basis of experience, and to adapt their behavior to changing environmental conditions → "creating machines that can learn"
§ During 30-40 years of research, ML has developed
  - sound theoretical frameworks (such as statistical learning theory, algorithmic learning theory, MDL, PAC learning, …), as well as
  - a large variety of methods (including kernel methods, neural networks, statistical models, Bayesian methods, graphical models, rules and decision trees, instance-based learning, inductive logic programming, relational learning, …)
MACHINE LEARNING APPLICATIONS
banking and finance (stock prediction, fraud detection, …), business (CRM, response prediction, …), smart environments, Internet (information retrieval, email classification, personalization, …), technical systems (diagnosis, control, monitoring, …), biometrics (person identification, …), medicine (diagnosis, prosthetics, …), media (speech/image recognition, video mining, …), bioinformatics and genomic data analysis, games (e.g. poker, soccer, …)
TOPICS OF ONGOING RESEARCH
Active learning, bagging/boosting, constructive induction, cooperative learning, cost-sensitive learning, co-training, hierarchical classification, deep learning, dimensionality reduction, ensemble learning, graph-based learning, incremental induction, kernel methods, learning on data streams, ordered classification, manifold learning, multi-instance learning, multi-strategy learning, multi-task learning, multi-view learning, on-line learning, preference learning, ranking, relational learning, semi-supervised learning, transfer learning … and a lot more!
MODEL INDUCTION
The core of machine learning is model induction, i.e., learning (predictive) models from observed training data.
A simple setting of supervised learning: given (i.i.d.) training data, a hypothesis space, and a loss function, induce a model of the unknown data-generating process.
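The formulas shown on this slide did not survive extraction into the transcript. The following is a hedged reconstruction using the standard notation of empirical risk minimization (the symbols D, H, P and the loss l are mine, not read off the slide):

\[
\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\} \subset \mathcal{X} \times \mathcal{Y},
\qquad (x_i, y_i) \overset{\text{i.i.d.}}{\sim} P \ \text{(unknown data-generating process)},
\]
\[
\text{hypothesis space } \mathcal{H} \subseteq \mathcal{Y}^{\mathcal{X}},
\qquad \text{loss function } \ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_{\ge 0},
\]
\[
\text{induce } \hat{h} \in \mathcal{H} \text{ with small true risk }
R(h) = \mathbb{E}_{(x, y) \sim P}\big[\ell(h(x), y)\big],
\ \text{e.g. by minimizing } R_{\mathrm{emp}}(h) = \frac{1}{n} \sum_{i=1}^{n} \ell(h(x_i), y_i).
\]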
MODEL INDUCTION
Example (classifier learning): [figure: example of classifier learning]
LEARNING WITH/FROM STRUCTURED DATA
[figure: examples of structured data, e.g. a molecule, a sentence, a parse tree]
INGREDIENTS OF MODEL INDUCTION
§ Model induction = searching for an optimal model among a set of candidate models, as specified by the hypothesis (model) space.
What is a plausible model in light of the data and knowledge? … the realm of statistics and probability!
[diagram: background knowledge, data/observations, and an induction principle feed a learning algorithm, which performs model induction and outputs a model (and its representation)]
CAPACITY CONTROL
The key to successful learning/generalization is proper capacity control: the model class must be flexible enough (to allow approximation of the pointwise loss-minimizer) but not too flexible (to prevent overfitting the data).
Typical bound on the true risk: with high probability, the true risk is bounded by the empirical risk plus a correction term that depends on the capacity of the model class and on the sample size.
→ The more prior knowledge is available (about the right type of model and the right level of flexibility), the easier learning becomes!
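The bound itself is garbled in the transcript; a generic bound of this type (written here for illustration, not necessarily the exact one shown on the slide) has the form

\[
\text{With probability at least } 1 - \delta: \qquad
R(h) \;\le\; R_{\mathrm{emp}}(h) \;+\; \Phi\big(\mathrm{capacity}(\mathcal{H}),\, n,\, \delta\big),
\]
where the correction term \(\Phi\) grows with the capacity of the hypothesis space and shrinks with the sample size \(n\); for VC-type bounds, for instance,
\[
\Phi = \sqrt{\frac{d\,\big(\ln(2n/d) + 1\big) + \ln(4/\delta)}{n}}
\]
with \(d\) the VC dimension of \(\mathcal{H}\).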
BEYOND ACCURACY
§ Apart from accuracy, a model can be judged on the basis of many other properties, notably interpretability.
§ In many application domains (e.g., medical diagnosis), "black box" models like neural networks are rejected.
§ Instead, more comprehensible and transparent "white box" models are sought, e.g. rules of the form "IF x = a THEN y = b".
AGENDA
1. Introduction and Background
   a. Machine Learning
   b. Fuzzy Logic
2. Fuzzy Modeling in Machine Learning
3. Fuzzy Pattern Trees
4. Summary & Conclusions
WHAT IS FUZZY LOGIC?
§ Fuzzy logic in the narrow sense is a special type of multi-valued logic (a branch of mathematical logic, see e.g. the works of P. Hajek).
§ Fuzzy logic in the wide sense is a collection of methods, tools, and techniques for modeling and reasoning about vagueness and vague concepts.
"Fuzzy logic as a mathematical framework for formalizing (human-like) approximate reasoning."
IS TRUTH A BIVALENT CONCEPT?
P: "The Wembley goal was indeed a goal." … uncertain, but indeed either true or false!
P: "This was a foul by Adriano." The truth degree might be unclear, since "foul" is a vague concept …
IS TRUTH A BIVALENT CONCEPT?
[diagram: situations in a football match (the extension, i.e. the state of "reality") are matched gradually against the human conception of "foul" (the intension)]
An intelligent system that is expected to mimic the learning or decision-making behavior of a human being should have a proper model of human conception.
MEMBERSHIP FUNCTIONS
[figure: membership function of the fuzzy set "fever" over body temperature, with the membership degree on the vertical axis (0 to 1) and temperature on the horizontal axis (roughly 35 to 40 °C); an example temperature of 37.4 °C receives an intermediate membership degree]
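Since the plot itself is lost, here is an illustrative piecewise-linear membership function for "fever" in Python; the breakpoints (36.5 and 38.5 °C) are my own choice, not values read off the slide:

# Illustrative membership function for the vague concept "fever".
# The breakpoints are assumptions, not taken from the slide.
def fever(temp_celsius: float) -> float:
    """Degree (in [0, 1]) to which a temperature counts as fever."""
    low, high = 36.5, 38.5          # below low: clearly no fever; above high: clearly fever
    if temp_celsius <= low:
        return 0.0
    if temp_celsius >= high:
        return 1.0
    return (temp_celsius - low) / (high - low)   # linear transition in between

print(fever(37.4))   # 0.45: a borderline temperature gets a partial membership degree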
FUZZY LOGIC TOOLS
Fuzzy logic offers sound theoretical foundations as well as tools and techniques for modeling and (approximate) reasoning with vague concepts:
§ fuzzy sets and fuzzy relations
§ generalized logical connectives (conjunction, disjunction, …)
§ fuzzy rules
§ generalized aggregation operators
§ fuzzy quantifiers ("for most", "for some", …)
§ generalized (non-additive) measures
§ …
The calculus of fuzzy logic is truth-functional (in contrast to probability)!
FUZZY RULES
IF (x1 is A1) AND (x2 is A2) THEN (y is B)
Example: "temp is HIGH", where HIGH is a fuzzy set over the temperature domain [figure: membership function of HIGH rising from 0 to 1 between roughly 35 and 40 °C]
FUZZY RULES / FUZZY INFERENCE
A rule base of the form
IF (x1 is A11) AND (x2 is A21) THEN (y is B1)
IF (x1 is A12) AND (x2 is A22) THEN (y is B2)
IF (x1 is A13) AND (x2 is A23) THEN (y is B3)
IF (x1 is A14) AND (x2 is A24) THEN (y is B4)
IF (x1 is A15) AND (x2 is A25) THEN (y is B5)
[diagram: crisp inputs x1, x2, … are FUZZIFIED, propagated through the rule base (FUZZY INFERENCE), and the result is DE-FUZZIFIED into a crisp output y]
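To make the fuzzify / infer / defuzzify pipeline concrete, here is a minimal Mamdani-style sketch in Python. Everything concrete in it (the fan-speed example, the shapes of the membership functions, min as AND, max as aggregation, centroid defuzzification) is an illustrative assumption; the slide only shows the generic rule schema.

import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Input fuzzy sets (shapes are assumptions, not taken from the slide)
def temp_high(t):       # ramp from 0 at 35 °C to 1 at 40 °C
    return min(1.0, max(0.0, (t - 35.0) / 5.0))

def humidity_low(h):    # ramp from 1 at 30 % to 0 at 60 %
    return min(1.0, max(0.0, (60.0 - h) / 30.0))

# Output fuzzy sets over fan speed in [0, 10]
def speed_low(y):  return tri(y, -1.0, 2.0, 5.0)
def speed_high(y): return tri(y, 5.0, 8.0, 11.0)

def infer(t, h):
    # FUZZIFY: crisp inputs -> membership degrees
    mu_temp, mu_hum = temp_high(t), humidity_low(h)

    # RULES (AND = min; a rule's firing strength clips its output set):
    #   R1: IF temp is HIGH AND humidity is LOW THEN speed is HIGH
    #   R2: IF temp is NOT HIGH                 THEN speed is LOW
    w1 = min(mu_temp, mu_hum)
    w2 = 1.0 - mu_temp

    # AGGREGATE the clipped output sets (max) and DE-FUZZIFY (centroid)
    ys  = np.linspace(0.0, 10.0, 501)
    agg = np.array([max(min(w1, speed_high(y)), min(w2, speed_low(y))) for y in ys])
    return float(np.dot(ys, agg) / agg.sum()) if agg.sum() > 0 else 0.0

print(infer(t=38.0, h=40.0))   # crisp fan speed, pulled toward HIGH for this input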
AGENDA
1. Introduction and Background
   a. Machine Learning
   b. Fuzzy Logic
2. Fuzzy Modeling in Machine Learning
3. Fuzzy Pattern Trees
4. Summary & Conclusions
POTENTIAL CONTRIBUTIONS OF FL TO ML
§ graduality
§ granularity
§ robustness
§ representation of uncertainty
§ incorporation of background knowledge
§ aggregation, combination and information fusion
E.H. Fuzzy Machine Learning and Data Mining. WIREs Data Mining and Knowledge Discovery, 2011.
FUZZY MODELING IN MACHINE LEARNING
[diagram: background knowledge, data/observations, and an induction principle feed a learning algorithm, which performs model induction and outputs a model (and its representation)]
What you get out strongly depends on what you put in!
FUZZY MODELING IN MACHINE LEARNING
[the same diagram] There is a need for modeling: output modeling, specification of the model space (with the right capacity), feature modeling and selection.
FUZZY MODELING IN MACHINE LEARNING
Learning/generalization without a bias (prior knowledge about the model to be learned) is impossible!
[figure: scatter plot of two classes over attribute 1 and attribute 2, both on [0, 1]]
FUZZY MODELING IN MACHINE LEARNING
Learning/generalization without a bias (prior knowledge about the model to be learned) is impossible!
[figure: the same data separated by a quadratic discriminant]
FUZZY MODELING IN MACHINE LEARNING
Learning/generalization without a bias (prior knowledge about the model to be learned) is impossible!
[figure: the best "explanation" of the same data in terms of a linear model]
→ significantly different results for the same data!
→ a learning method should support the easy incorporation of an expert's background knowledge
FUZZY MODELING IN MACHINE LEARNING
Fuzzy logic supports a seamless transition between "knowledge-driven" and "data-driven" model specification!
[rule base: IF (x1 is A1j) AND (x2 is A2j) THEN (y is Bj), for j = 1, …, 5]
Complete model specified by hand, using linguistic modeling techniques.
FUZZY MODELING IN MACHINE LEARNING
Fuzzy logic supports a seamless transition between "knowledge-driven" and "data-driven" model specification!
[the same rule base]
Model structure (qualitative part) specified by hand, membership functions (quantitative part) learned from data.
FUZZY MODELING IN MACHINE LEARNING
Fuzzy logic supports a seamless transition between "knowledge-driven" and "data-driven" model specification!
[the same rule base]
Model structure (qualitative part) partly specified by hand, partly learned from data.
FEATURE MODELING AND SELECTION
[figure: the same instances described once in terms of features A (weight) and B (height), and once in terms of features X (age) and Y (BMI)]
A description of instances in terms of the features X, Y allows for a much better discrimination between the two classes!
FUZZY FEATURE MODELING
[images: shapes of cell nuclei, healthy versus pathological (Hutchinson-Gilford syndrome)]
[Thibault et al. Cells nuclei classification using shape and texture indexes. WSCG 2008.]
FUZZY FEATURE MODELING
[diagram: low-level features are combined, using expert knowledge, linguistic modeling and fuzzy inference, into a meaningful high-level feature (e.g., roundness)]
Reduces dimensionality and increases interpretability!
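As a concrete, hypothetical instance of such a high-level feature: a fuzzy degree of "roundness" can be derived from low-level contour measurements via the isoperimetric quotient 4*pi*A / P^2, which equals 1 for a perfect circle. The mapping to a membership degree and its breakpoints below are my own illustrative choices, not the construction used in the talk.

import math

def roundness(area: float, perimeter: float) -> float:
    """Fuzzy degree of 'roundness' of a cell nucleus from low-level measurements.

    Uses the isoperimetric quotient 4*pi*A / P**2 (1.0 for a circle, smaller for
    elongated or irregular shapes) and maps it to a degree in [0, 1] with an
    assumed linear transition between 0.6 ('clearly not round') and 0.9 ('clearly round').
    """
    q = 4.0 * math.pi * area / (perimeter ** 2)
    lo, hi = 0.6, 0.9                       # illustrative breakpoints
    return max(0.0, min(1.0, (q - lo) / (hi - lo)))

# A round nucleus versus an irregular one (hypothetical measurements):
print(roundness(area=100.0, perimeter=36.0))   # q ~ 0.97 -> degree 1.0
print(roundness(area=100.0, perimeter=50.0))   # q ~ 0.50 -> degree 0.0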
FUZZY FEATURE MODELING
Use of gradual (instead of binary) features can increase discriminative power!
[figure: classes plotted over roundness (shape) and homogeneity (texture); with binary features (round / not round, homogeneous / not homogeneous) the classes are non-separable, with gradual features in [0, 1] they become separable]
MODELING DATA/OBSERVATIONS
- unordered, categorical (classification)
- ordered, categorical (ordinal classification), e.g. * < ** < *** < **** < *****
- numerical (regression)
- fuzzy (modeling imprecise data)
FUZZY MODELING IN MACHINE LEARNING
Despite being largely "data-driven", machine learning is not a "knowledge-free" methodology. Instead, successful learning requires a-priori background knowledge and a proper modeling
- of the data (which is often overlooked)
- and of the underlying hypothesis space (type of model),
and fuzzy logic can be very useful in this regard (incorporating expert knowledge into the learning process)!
→ Fuzzy logic is complementary to statistics and probability!
→ This aspect (like the "true strengths" of FL in general) is still not sufficiently emphasized in the current literature.
AGENDA
1. Introduction and Background
   a. Machine Learning
   b. Fuzzy Logic
2. Fuzzy Modeling in Machine Learning
3. Fuzzy Pattern Trees
4. Summary & Conclusions
FUZZY PATTERN TREES
§ The FPT is a type of fuzzy model that was independently introduced in
- Z. Huang, T.D. Gedeon, and M. Nikravesh. Pattern trees induction: A new machine learning approach. IEEE TFS 16(4), 2008.
- Y. Yi, T. Fober and E.H. Fuzzy Operator Trees for Modeling Rating Functions. Int. J. Comp. Intell. and Appl. 8(1), 2009.
§ It has recently been further developed in
- R. Senge and E.H. Pattern Trees for Regression and Fuzzy Systems Modeling. Proc. WCCI-2010, Barcelona, Spain, 2010.
- R. Senge and E.H. Top-Down Induction of Fuzzy Pattern Trees. IEEE TFS, 19(2), 2011.
§ Two main motivations:
- disadvantages of rule-based fuzzy models
- quality assessment in production (measurement robotics)
INTERPRETABILITY OF RULE-BASED MODELS
§ "Reasonable" fuzzy sets are normally not guaranteed by a learning algorithm (and enforcing them may compromise accuracy);
§ without knowing the membership functions, the true meaning of a rule remains ambiguous;
§ interpretability is further compromised by
- the length and number of rules,
- the interaction between rules,
- the inference mechanism (logical operators, rule weighting, …).
After all, fuzzy rule-based models might be less "white-box" than many people tend to believe …
KEY PROBLEMS OF RULE-BASED METHODS
§ flat structure of the model (→ curse of dimensionality) [figures: grid-based models, sequential covering]
§ restriction to (quasi-)axis-parallel decision boundaries
Flexibility of fuzzy models requires many rules!
Fuzzy pattern trees overcome these problems by using a hierarchical model structure and generalized aggregation operators.
SIMPLE QUALITY CONTROL
[figures: a series of measurements, each checked as "all right"; the same series with some checks failing ("not all right")]
[diagram: the overall verdict is the AND-combination of several basic criteria]
DRAWBACKS OF THIS APPROACH
- Bivalent, non-gradual evaluation is not natural and does not support a proper ranking of products.
- No compensation: several good properties cannot compensate for a single bad one.
- Extremely sensitive toward noise.
- The "flat" structure of the evaluation scheme does not scale (from a single device to, say, the whole interior of a car).
Remedies: fuzzy logic-based evaluation, generalized aggregation operators, hierarchical modeling.
A SYSTEM IMITATING HUMAN ASSESSMENT
[diagram: perceptions (sound, smell, sensation, …) lead to a human user's assessment]
Step 1: quantification of the individual criteria through measurement robotics.
Step 2: evaluation of the individual criteria and aggregation into an overall assessment.
EVALUATION OF INDIVIDUAL CRITERIA
[figure: a "fuzzy range" of acceptable values for a measurement, with membership 0 in the excluded regions, 1 in the ideal region, and gradual transitions in between]
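A "fuzzy range of acceptable values" like the one sketched on the slide is typically encoded as a trapezoidal membership function. The following Python helper is a generic sketch; the concrete breakpoints for any given measurement are application-specific and chosen here purely for illustration.

def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 0 outside [a, d] (excluded), 1 on [b, c] (ideal),
    with linear transitions on [a, b] and [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Example: a measurement is ideal between 4.0 and 6.0, excluded below 3.0 or above 7.0
acceptable = lambda m: trapezoid(m, 3.0, 4.0, 6.0, 7.0)
print(acceptable(3.5), acceptable(5.0), acceptable(6.8))   # 0.5 1.0 0.2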
GENERALIZED AGGREGATION
Three modes of aggregation:
- conjunctive ("and"): both criteria must be fulfilled
- disjunctive ("or"): either of the criteria must be fulfilled
- averaging
GENERALIZED AGGREGATION: CONJUNCTION / DISJUNCTION / AVERAGING
[figures illustrating conjunctive, disjunctive and averaging aggregation operators]
Aggregation operators range on a spectrum from very strict (conjunctive) to very tolerant (disjunctive), with averaging operators providing a seamless transition from strict to tolerant in between.
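A minimal Python sketch of the three aggregation modes and of the strict-to-tolerant ordering; the operator choices (minimum t-norm, maximum t-conorm, arithmetic mean) are standard textbook examples, not necessarily the ones shown on the slides.

u, v = 0.9, 0.3                # degrees to which two criteria are satisfied

conjunctive = min(u, v)        # "and": both criteria must hold        -> 0.3 (strict)
disjunctive = max(u, v)        # "or": either criterion suffices       -> 0.9 (tolerant)
average     = 0.5 * (u + v)    # compensatory: a good value offsets a bad one -> 0.6

# For any t-norm T and t-conorm S:  T(u, v) <= average(u, v) <= S(u, v),
# so averaging operators fill the gap between "very strict" and "very tolerant".
assert conjunctive <= average <= disjunctive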
HIERARCHICAL MODELING
Recursive partitioning of criteria into subcriteria:
[tree diagram: skills = AND(formal skills, language skills); formal skills = AVG(math, CS); language skills = OR(Chinese, AND(Spanish, French))]
[second diagram: the same tree evaluated bottom-up, with each leaf criterion (number of points) mapped to a degree and the degrees propagated through the AND/OR/AVG nodes to an overall "skills" degree]
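Such a tree is evaluated bottom-up once every leaf criterion has a degree in [0, 1]. The sketch below uses min for AND, max for OR and the arithmetic mean for AVG; the leaf degrees are illustrative, since the exact numbers on the slide are not reliably recoverable from the transcript.

from statistics import mean

# Illustrative leaf degrees in [0, 1]
leaves = {"math": 0.5, "CS": 0.9, "Chinese": 0.1, "Spanish": 0.7, "French": 0.9}

AND = min                      # conjunctive aggregation
OR  = max                      # disjunctive aggregation
AVG = lambda *xs: mean(xs)     # averaging aggregation

def skills(d):
    formal_skills   = AVG(d["math"], d["CS"])                           # 0.7
    language_skills = OR(d["Chinese"], AND(d["Spanish"], d["French"]))  # 0.7
    return AND(formal_skills, language_skills)                          # overall degree: 0.7

print(skills(leaves))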
KNOWLEDGE-DRIVEN MODELING
A human expert (quality engineer) specifies the model by hand …
DATA-DRIVEN MODELING
§ Known versus unknown structure
- estimation of parameters (model calibration)
- structure + parameter learning (model induction)
§ Direct versus indirect feedback
- direct: product + rating (numeric or ordinal)
- indirect: comparison between products
FUZZY PATTERN TREE INDUCTION
§ Starting with primitive pattern trees (fuzzy subsets of an attribute's domain),
§ candidate trees are iteratively expanded
§ and selected based on a tree performance measure (MSE on the training data),
§ until a stopping condition is met.
[diagrams across the slides: a single primitive tree A grows step by step, e.g. into AVG(A, B), then AVG(A, AND(B, D)), then AVG(A, AND(D, OR(B, C))), and finally into larger trees combining AVG, MIN, MAX and further attributes]
The search is a greedy beam search.
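A much simplified sketch of this induction loop: candidate trees are built by combining the current best tree with primitive trees through a set of aggregation operators, scored by MSE on the training data, and the search stops when no candidate improves on its parent. Everything below (the operator set, beam width 1, trees represented as nested Python functions) is my own simplification for illustration, not the algorithm of the cited papers in detail.

from statistics import mean

def induce_pattern_tree(primitives, X, y, max_depth=3):
    """Greedy (beam width 1) induction sketch for a fuzzy pattern tree.

    primitives : dict mapping a name to a function instance -> degree in [0, 1]
                 (a fuzzy subset of a single attribute's domain)
    X, y       : training instances and target degrees in [0, 1]
    """
    OPS = {"MIN": min, "MAX": max, "AVG": lambda a, b: 0.5 * (a + b)}

    def mse(tree):                                  # tree performance measure
        return mean((tree(x) - t) ** 2 for x, t in zip(X, y))

    best = min(primitives.values(), key=mse)        # best primitive pattern tree

    for _ in range(max_depth):
        # expand: combine the current tree with each primitive via each operator
        candidates = [
            (lambda x, op=op, p=p, b=best: op(b(x), p(x)))
            for op in OPS.values()
            for p in primitives.values()
        ]
        challenger = min(candidates, key=mse)
        if mse(challenger) >= mse(best):            # stopping condition: no improvement
            break
        best = challenger
    return best

With primitives built from membership functions over wine attributes (e.g. "alcohol is high", "acidity is low"), such a procedure could return a tree of the kind shown for the wine-quality data below.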
THE WINE QUALITY DATA
The task is to predict the quality of wine (on a scale from 0 to 10) based on properties like acidity, level of alcohol, etc. (UCI benchmark data).

  acidity   alcohol   …   sulfates   quality
  7.4       9.4       …   0.56       5
  7.8       10        …   0.46       3
  7.8       10.5      …   0.80       6
  11.2      9.3       …   0.91       3
  7.4       9.8       …   0.55       5
  7.3       10.6      …   0.53       4
  …         …         …   …          …
THE WINE QUALITY DATA
Pattern tree induced from a given set of data (wine properties + rating):
[tree diagram: root AVG with children "alcohol is high" and an AND node; the AND node combines "acidity is low" with an OR of "acidity is med" and "sulfates is med"]
FEATURES OF FUZZY PATTERN TREES
§ interpretability of the model class
§ modularity: recursive partitioning of criteria into sub-criteria
§ flexibility without the tendency to overfit the data
§ monotonicity in single attributes
§ built-in feature selection
§ competitive performance for classification and regression
[tree diagram as on the previous slide]
SUMMARY & CONCLUSIONS §  ML is a more recent research topic in the fuzzy community (compared to control, op2miza2on, decision making, …). §  Like in other fields, FL can reasonably extend, generalize, and complement exis2ng founda2ons and methods in ML. §  Modeling (background knowledge) is one of the main strengths, and crucial for successful learning (which does not only require data). §  Besides, ML can benefit from other fuzzy logic tools, including generalized uncertainy formalisms, aggrega2on operators, … 72
SUMMARY & CONCLUSIONS §  Fuzzy padern trees as a novel model class for machine learning. §  InteresNng features: interpretability , modularity, non-­‐linearity, monotonicity, handling imprecise data, … §  Viable alternaNve to rule-­‐based models (more compact models thanks to hierarchical instead of flat model structure) §  Compe22ve performance in classificaNon and regression. §  Evolving fuzzy padern trees (eFPT) for learning on data streams. 73
LITERATURE
Fuzzy logic and machine learning (survey articles):
§ E.H. Fuzzy Sets in Machine Learning and Data Mining: Status and Prospects. Fuzzy Sets and Systems, 156(3), 2005.
§ E.H. Fuzzy Machine Learning and Data Mining. WIREs Data Mining and Knowledge Discovery, 2011.
Fuzzy pattern trees:
§ Z. Huang, T.D. Gedeon, and M. Nikravesh. Pattern trees induction: A new machine learning approach. IEEE TFS 16(4), 2008.
§ Y. Yi, T. Fober and E.H. Fuzzy Operator Trees for Modeling Rating Functions. Int. J. Comp. Intell. and Appl. 8(1), 2009.
§ R. Senge and E.H. Pattern Trees for Regression and Fuzzy Systems Modeling. Proc. WCCI-2010, Barcelona, Spain, 2010.
§ R. Senge and E.H. Top-Down Induction of Fuzzy Pattern Trees. IEEE TFS, 19(2), 2011.
§ A. Shaker and E.H. Evolving Fuzzy Pattern Trees for Binary Classification on Data Streams. Information Sciences, to appear.
Java implementation of fuzzy pattern tree induction: http://www.uni-marburg.de/fb12/kebi