Combining fuzzy and statistical uncertainty: probabilistic fuzzy systems and their applications
prof.dr.ir. Jan van den Berg, TU Delft (Faculties of TPM and EWI), Cyber Security Academy The Hague
[email protected]
http://tbm.tudelft.nl/index.php?id=30084&L=1

Summary of the talk
• Two complementary conceptualizations of uncertainty will be discussed: statistical and fuzzy uncertainty.
• These uncertainties can be combined into one theory of probabilistic fuzzy events. Using this theory, classical fuzzy systems can be generalized to probabilistic fuzzy systems (PFS).
• PFS can be induced using both expert knowledge and data, enabling both interpretability and accuracy (although an accuracy-interpretability dilemma remains that we have to deal with in practice).
• To finalize, one or two examples of PFSs we developed will be shown: next time!

Agenda
• Probabilistic/statistical uncertainty
• Fuzzy uncertainty
• Fuzzy systems
• Probabilistic fuzzy theory
• Probabilistic fuzzy systems
• Applications (next time)
• Conclusions

Probabilistic/statistical uncertainty
• Probabilistic/statistical uncertainty is a well-known notion
• Crisp events occur with a certain probability
• These probabilities can be assessed statistically
• E.g., frequentist approach: by repeating experiments
• There is a lot of theory on unbiased estimation, maximum-likelihood estimation, etc.
• In continuous outcome spaces, probability distributions are used
• Mathematical statistics: descriptive and inferential statistics (the latter on drawing conclusions from data using some model for the data)

Agenda (next item: Fuzzy uncertainty)

Crisp sets
A crisp set is a collection of definite, well-definable objects (elements) that form a whole and has crisp boundaries.
Representations of sets:
• list of all elements: A = {x1, …, xn}, xj ∈ X
• elements with a property P: A = {x | x satisfies P}, x ∈ X
• Venn diagram
• characteristic function f_A: X → {0, 1}, with f_A(x) = 1 if x ∈ A and f_A(x) = 0 if x ∉ A
Example (figure): the crisp set A of "real numbers larger than 3" has a characteristic function that jumps from 0 to 1 at x = 3.

Fuzzy sets
• Sets with fuzzy, gradual boundaries (Zadeh, 1965)
• A fuzzy set A in X is characterized by its membership function μ_A: X → [0, 1]
• A fuzzy set A is completely determined by the set of ordered pairs A = {(x, μ_A(x)) | x ∈ X}
• X is called the domain or universe of discourse
Example (figure): the fuzzy set of "real numbers about 3" has a membership function that peaks at 1 around x = 3 and decreases gradually on both sides.

Fuzzy sets on discrete universes
• Fuzzy set C = "desirable city to live in", with X = {SF, Boston, LA} (discrete and non-ordered):
C = {(SF, 0.9), (Boston, 0.8), (LA, 0.6)}
• Fuzzy set A = "sensible number of children", with X = {0, 1, 2, 3, 4, 5, 6} (discrete universe):
A = {(0, 0.1), (1, 0.3), (2, 0.7), (3, 1), (4, 0.6), (5, 0.2), (6, 0.1)}

Fuzzy sets on continuous universes
• Fuzzy set B = "about 50 years old", with X = the set of positive real numbers (continuous):
B = {(x, μ_B(x)) | x ∈ X}, where μ_B(x) = 1 / (1 + ((x − 50)/10)²)

Fuzzy partition
• A fuzzy partition of the universe Age is formed by the linguistic values "young", "middle aged", and "old"
• For any age, the sum of the membership values equals 1

Operations with fuzzy sets
• Complement: Ā = X \ A, with μ_Ā(x) = 1 − μ_A(x)
• Union: C = A ∪ B, with μ_C(x) = max(μ_A(x), μ_B(x)) or μ_C(x) = μ_A(x) + μ_B(x) − μ_A(x)·μ_B(x)
• Intersection: C = A ∩ B, with μ_C(x) = min(μ_A(x), μ_B(x)) or μ_C(x) = μ_A(x)·μ_B(x)
Note the multiple definitions!
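As a small illustration of these operations, and of the fact that the alternative definitions generally give different results, consider the Python sketch below. It is illustrative only: the bell-shaped membership functions "about 3" and "about 6" are assumptions, not taken from the slides.

```python
# Sketch: fuzzy complement, union and intersection under the two alternative
# definitions listed above. The membership functions are illustrative assumptions.
import numpy as np

x = np.linspace(0.0, 10.0, 101)                  # sampled universe of discourse X

# Assumed bell-shaped fuzzy sets "about 3" and "about 6"
mu_A = 1.0 / (1.0 + ((x - 3.0) / 1.0) ** 2)
mu_B = 1.0 / (1.0 + ((x - 6.0) / 1.0) ** 2)

mu_not_A = 1.0 - mu_A                            # complement

mu_union_max  = np.maximum(mu_A, mu_B)           # union: max
mu_union_psum = mu_A + mu_B - mu_A * mu_B        # union: probabilistic sum

mu_inter_min  = np.minimum(mu_A, mu_B)           # intersection: min
mu_inter_prod = mu_A * mu_B                      # intersection: product

# The alternative definitions coincide only where one of the memberships is 0 or 1
print(np.max(np.abs(mu_union_max - mu_union_psum)))
print(np.max(np.abs(mu_inter_min - mu_inter_prod)))
```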
Set theoretic operations, examples
Figure, four panels: (a) fuzzy sets A and B; (b) the fuzzy set "not A"; (c) the fuzzy set "A OR B" (maximum); (d) the fuzzy set "A AND B" (minimum).

Agenda (next item: Fuzzy systems)

Fuzzy modeling (expert-driven)
• A fuzzy model (FM) can be defined based on expert knowledge:
• Many human concepts (big, long, high, very much, adequate, satisfactory, …) are defined in a (context-dependent) quantitative way, describing a state of nature
• Examples of facts about the world:
– The president is middle-aged
– The water supply is insufficient
– Birth weight of Romanian children is quite low
• Need for formalization: the linguistic variable

Linguistic variable
• A numerical variable takes numerical values: Age = 65 (defines a crisp event)
• A linguistic variable takes linguistic values: Age is old (defines a fuzzy event)
• A linguistic value is defined by a fuzzy set (enabling a characterization with gradual transitions)
Example (figure): a fuzzy partition of the linguistic variable 'Age' formed with the linguistic values "young", "middle aged", and "old".

Expert-driven fuzzy modeling, cont.
• Experts can express their knowledge in
– facts about the world (see above), and
– fuzzy IF-THEN rules
• Example fuzzy rules:
– IF Nutrition state is poor AND Birth weight is medium AND Respiration disease is absent THEN Child mortality rate is medium
– IF Nutrition state is medium AND Birth weight is not too low AND Respiration disease is absent THEN Child mortality rate is rather low
• Need for a reasoning/inference mechanism

Fuzzy (Inference) System (FS)
Figure: block diagram from input, via fuzzifier, rule base and inference engine, and defuzzifier, to output.
• Fuzzifier (interface: from crisp to fuzzy)
• Rule base / knowledge base (enabling interpretability/transparency)
• Inference engine (implements fuzzy reasoning)
• Defuzzifier (interface: from fuzzy to crisp)
Note: the fuzzifier or defuzzifier may be absent.

Example fuzzy system: Mamdani model
• Five major steps:
1. Fuzzification
2. Degree of fulfillment
3. Inference
4. Aggregation
5. Defuzzification
• Computations according to Mamdani reasoning apply a generalized form of the classical modus ponens:
Given: x is A′ and "If x is A, then y is B"; conclude: y is B′

Mamdani reasoning – example (figure)

Resulting FS: a smooth non-linear mapping (figure)

Inducing a fuzzy model from data
Example (figure): rules induced from income-tax data, such as
• If income is Low then tax is Low
• If income is High then tax is High

Bias-variance dilemma (!) Interpretability-accuracy dilemma (!)
• Algorithms exist to gradually induce more and more rules from data
• Bias-variance dilemma: finding models of the right complexity
• The interpretability-accuracy dilemma is another key issue (of data mining)

Agenda (next item: Probabilistic fuzzy theory)

Probability of crisp and fuzzy events
• Crisp: Pr(A) = ∫_{x∈A} p(x) dx = ∫_X χ_A(x) p(x) dx
• Fuzzy (Zadeh, 1968): the probability of a fuzzy event is the expectation of its membership function:
Pr(A) = ∫_X μ_A(x) p(x) dx
• Conditional probability:
Pr(A|B) = Pr(A ∩ B) / Pr(B) = ∫_X μ_A(x) μ_B(x) p(x) dx / ∫_X μ_B(x) p(x) dx
• Satisfies Pr(A|A) = 1
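These definitions are easy to approximate numerically. The sketch below is illustrative only: the pdf and the membership functions for "tall" and "not short" are assumptions, not the data behind the graphical example that follows.

```python
# Sketch: Pr(A) = integral over X of mu_A(x) p(x) dx, approximated on a grid.
# The pdf and the membership functions below are illustrative assumptions.
import numpy as np

x = np.linspace(1.2, 2.2, 2001)                     # body length in metres
dx = x[1] - x[0]

# Assumed pdf of length: normal with mean 1.55 m and standard deviation 0.07 m
p = np.exp(-0.5 * ((x - 1.55) / 0.07) ** 2) / (0.07 * np.sqrt(2 * np.pi))

# Assumed fuzzy events on the same universe of discourse
mu_tall      = np.clip((x - 1.60) / 0.15, 0.0, 1.0)   # "tall": ramps up between 1.60 and 1.75 m
mu_not_short = np.clip((x - 1.45) / 0.10, 0.0, 1.0)   # "not short": ramps up between 1.45 and 1.55 m

pr_tall = np.sum(mu_tall * p) * dx                    # Pr(A) = E(mu_A(x))
pr_tall_given_not_short = (np.sum(mu_tall * mu_not_short * p)
                           / np.sum(mu_not_short * p))  # Pr(A|B) as defined above

print(f"Pr(tall)             ~ {pr_tall:.3f}")
print(f"Pr(tall | not short) ~ {pr_tall_given_not_short:.3f}")
```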
An example, continuous domain
Answering the question: "What is the probability that a randomly selected Indian woman is tall?"
Pr(A) = ∫ μ_A(x) f(x) dx = E(μ_A(x))
Figure: the pdf f(x) of length plotted together with the membership function of the fuzzy set "tall" (length axis from 1.50 to 2.0 m).

Probabilistic fuzzy events, discrete case
• Two discrete probabilistic fuzzy events:
A1 = (μ(x1), μ(x2)) = (m, 1 − n) and A2 = (μ′(x1), μ′(x2)) = (1 − m, n)
• If x1 occurs, then A1 occurs with degree m and A2 occurs with degree 1 − m
• Similarly, if x2 occurs…
• E.g., A1 means "tall" and A2 means "small", where x1 and x2 are two values of the variable length
• With Pr(x1) = p and Pr(x2) = 1 − p:
Pr(A1) = mp + (1 − n)(1 − p) and Pr(A2) = (1 − m)p + n(1 − p)

Simple estimation of probabilities
• Let x1, …, xn be a random sample on a domain X
• The probability of a crisp event A can be estimated by (1/n) Σ_k χ_A(x_k)
• The probability of a fuzzy event A_i can be estimated by (1/n) Σ_k μ_{A_i}(x_k),
assuming that X is well-formed (~ fuzzy partition), i.e. for every x_k: Σ_i μ_{A_i}(x_k) = 1

Deterministic and probabilistic rules
Rule R_q: If x is A_q then y is C_q. Model parameters: A_q, C_q.
Example:
• Deterministic rule (linguistic vagueness only): If current returns are large, then future returns will be large
• Probabilistic rule (linguistic vagueness plus probabilistic uncertainty): If current returns are large, then future returns will be large with probability r1 and future returns will be small with probability 1 − r1

Probabilistic fuzzy rules
Rule R_q: If x is A_q then y is C_1q with Pr(C_1q|A_q) and … and y is C_jq with Pr(C_jq|A_q) and … and y is C_Jq with Pr(C_Jq|A_q)
Model parameters: A_q, C_jq, Pr(C_jq|A_q)

A 'crazy' theoretical sidestep
• Probabilistic fuzzy entropy (PFE)
• To be used to induce fuzzy decision trees

Definition of PF entropy, discrete source
• Given a (well-formed) sample space with a fuzzy partition of fuzzy events A1, …, AC, defined by membership functions μ_1(x), …, μ_C(x) and occurring with probabilities Pr(A1), …, Pr(AC), the PFE is defined as:
H_sf(μ, p) = − Σ_{c=1}^{C} Pr(A_c) log₂(Pr(A_c)) = E(−log₂(Pr(A_c)))
= − Σ_{c=1}^{C} E(μ_c(x)) log₂(E(μ_c(x))) = E(−log₂(E(μ_c(x))))
• PFE is a probabilistic type of entropy defined in a fuzzily partitioned (sample) space!

A very special information source
Consider 2 discrete, strictly complementary statistical fuzzy events:
A1 = (μ(x1), μ(x2)) = (m, 1 − m) and A2 = (μ′(x1), μ′(x2)) = (1 − m, m)

A very special information source, cont.
Since A1 = (m, 1 − m), A2 = (1 − m, m), Pr(x1) = p, and Pr(x2) = 1 − p, it follows that:
Pr(A1) = mp + (1 − m)(1 − p) = 2mp + 1 − m − p = 1 − Q   (1)
Pr(A2) = p(1 − m) + (1 − p)m = m + p − 2mp = Q   (2)

Entropy of an information source generating 2 strictly complementary statistical fuzzy events
• Using the definition of PFE above and equations (1) and (2), it follows that
H_sf(m, p) = −Q log₂ Q − (1 − Q) log₂(1 − Q), where Q(m, p) = m + p − 2mp
• Q (like 1 − Q) relates to the combined uncertainty of the probabilistic fuzzy events, based on their fuzziness m and the probability of occurrence p

Further interpreting Q
• Q(m, p) = m + p − 2mp
• Q = 0 or 1 ⇒ no uncertainty; Q = 0.5 ⇒ highest uncertainty
• Illumination and interpretation:
– m = p = 0 or m = p = 1 ⇒ Q = 0: two crisp events, one of which occurs with probability 1
– m = 0, p = 1 or m = 1, p = 0 ⇒ Q = 1: same explanation!
– p = 0.5 or m = 0.5 ⇒ Q = 0.5: two fuzzy events having equal probability, or two non-distinguishable fuzzy events!!
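A short numerical check of Q(m, p) and of the resulting entropy may help here; the Python sketch below simply reproduces the cases discussed above.

```python
# Sketch: combined uncertainty Q(m, p) = m + p - 2*m*p and the probabilistic
# fuzzy entropy H(m, p) = -Q*log2(Q) - (1 - Q)*log2(1 - Q) for two strictly
# complementary probabilistic fuzzy events A1 = (m, 1 - m), A2 = (1 - m, m),
# where Pr(x1) = p and Pr(x2) = 1 - p.
import numpy as np

def Q(m, p):
    return m + p - 2.0 * m * p

def pf_entropy(m, p):
    q = Q(m, p)
    if q <= 0.0 or q >= 1.0:          # corners: no uncertainty
        return 0.0
    return float(-q * np.log2(q) - (1.0 - q) * np.log2(1.0 - q))

cases = [
    (0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0),   # corners: Q = 0 or 1, H = 0
    (0.5, 0.2), (0.3, 0.5),                           # m = 0.5 or p = 0.5: Q = 0.5, H = 1
    (0.0, 0.2),                                       # crisp events (m = 0): classical entropy of p
    (0.2, 0.0),                                       # certain outcome (p = 0): 'fuzzy' entropy only
]
for m, p in cases:
    print(f"m={m:.1f}, p={p:.1f} -> Q={Q(m, p):.2f}, H={pf_entropy(m, p):.3f}")
```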
Interpretation of H_pf(m, p)
H_pf(m, p) = −Q log₂ Q − (1 − Q) log₂(1 − Q)
• PFE quantifies the combined uncertainty
• Illumination:
– Q = 0 or 1 ~ no uncertainty ~ H(m, p) = 0, in the 4 corners of the (m, p) unit square
– p = 0.5 or m = 0.5 ~ Q = 0.5 ~ two fuzzy events having equal probability, or two non-distinguishable fuzzy events ~ highest uncertainty ~ H(m, p) = 1
– if m = 0 or 1: classical 'crisp' entropy
– if p = 0 or 1: 'fuzzy' entropy only

Agenda (next item: Probabilistic fuzzy systems)

Probabilistic fuzzy systems, with a discrete probability distribution
A PFS consists of a set of rules whose antecedents are fuzzy conditions (antecedent fuzzy sets) and whose consequents are probability distributions (see the probabilistic fuzzy rule format above).

Deterministic vs. probabilistic FS
Figure: an input fuzzy partition A1, …, A5 on X and output fuzzy sets B1, B2, B3 on Y.
• Deterministic rule: If x is A4 then y is B2
• Probabilistic rule: If x is A4 then y is B1 with probability p(B1|A4), and y is B2 with probability p(B2|A4), and y is B3 with probability p(B3|A4)

Probabilistic fuzzy system, with a continuous probability distribution
Rule R_q: If x is A_q then f(y) = f(y|A_q)
Additive reasoning:
f(y|x) = Σ_{q=1}^{Q} μ_{A_q}(x) f(y|A_q) / Σ_{q=1}^{Q} μ_{A_q}(x) = Σ_{q=1}^{Q} β_q(x) f(y|A_q)
ŷ = E(y|x) = Σ_{q=1}^{Q} β_q(x) E(y|A_q),
where β_q(x) = μ_{A_q}(x) / Σ_{q′=1}^{Q} μ_{A_q′}(x) are the normalized degrees of fulfilment.

Probability distribution characterization
• In general, different characterizations can be used for the conditional probability density in the rule consequents
• The characterization could be an approximation with a histogram or an explicit model for the density, e.g., a normal or other distribution
• In a PFS, we can select a fuzzy histogram characterization

Histograms, classical crisp case
• Let x1, …, xn be a random sample from a univariate distribution with pdf f(x)
• Let the characteristic functions χ_i(x) (defining crisp bins/intervals A_i) constitute a crisp partitioning:
∀x: Σ_i χ_i(x) = 1 and ∀ i ≠ j: χ_i(x)·χ_j(x) = 0
• A histogram estimates f(x) (from data x_k) as follows:
f̂(x) = Σ_i χ_i(x) · p̂_i / |A_i|, with p̂_i = (1/n) Σ_k χ_i(x_k) and |A_i| = ∫ χ_i(x) dx

Fuzzy histograms
• Let x1, …, xn be a random sample of size n from a (univariate) distribution with pdf f(x)
• Let the membership functions μ_i(x) (defining fuzzy bins A_i) constitute a fuzzy partitioning: ∀x: Σ_i μ_i(x) = 1
• A fuzzy histogram estimates f(x) (from data x_k) as follows:
f̂(x) = Σ_i μ_i(x) · p̂_i / |A_i|, with p̂_i = (1/n) Σ_k μ_i(x_k) and |A_i| = ∫ μ_i(x) dx
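The fuzzy histogram estimator translates directly into a few lines of code. The sketch below is an illustration only: it assumes triangular fuzzy bins that form a fuzzy partition, which is not necessarily the parameterization used later in the talk.

```python
# Sketch: fuzzy histogram density estimate
#   p_i = (1/n) * sum_k mu_i(x_k),    f_hat(x) = sum_i mu_i(x) * p_i / |A_i|,
# using triangular fuzzy bins that form a fuzzy partition (memberships sum to 1).
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=1000)        # random sample from N(0, 1)

centers = np.linspace(-3.5, 3.5, 8)           # centers of the fuzzy bins A_i
width = centers[1] - centers[0]               # spacing between neighbouring centers

def memberships(x):
    """Triangular membership of each value in x in every fuzzy bin (rows: bins)."""
    x = np.atleast_1d(x)
    return np.clip(1.0 - np.abs(x[None, :] - centers[:, None]) / width, 0.0, 1.0)

p_hat = memberships(data).mean(axis=1)        # p_i = (1/n) * sum_k mu_i(x_k)
bin_area = np.full(len(centers), width)       # |A_i| = integral of mu_i(x) dx (full triangle)

xs = np.linspace(-3.5, 3.5, 701)
f_hat = (memberships(xs) * (p_hat / bin_area)[:, None]).sum(axis=0)

print(p_hat.sum())                            # ~1: the fuzzy bins form a well-formed partition
print(f_hat.sum() * (xs[1] - xs[0]))          # ~1: f_hat is (approximately) a density
```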
Crisp vs. fuzzy histogram
Figure: the pdf of a normal distribution approximated by a crisp histogram and by a fuzzy histogram, using seven bins A1, …, A7 on the interval [−3.5, 3.5].

PFS: fuzzy histogram model (figure)

Fuzzy histogram model (figure)

Probabilistic Mamdani systems
Rule R_q: If x is A_q then y ∈ C_1 with Pr(C_1|A_q) and … and y ∈ C_j with Pr(C_j|A_q) and … and y ∈ C_J with Pr(C_J|A_q)
Reasoning:
ŷ = E(y|x) = Σ_{q=1}^{Q} β_q(x) Σ_{j=1}^{J} Pr(C_j|A_q) z_j,
where z_j is the centroid of the fuzzy consequent set C_j.

Probabilistic fuzzy output model (figure)

Probabilistic TS systems
Zero-order probabilistic Takagi-Sugeno system.
Rule R_q: If x is A_q then y = y_1 with Pr(y_1|A_q) and y = y_2 with Pr(y_2|A_q) and … and y = y_J with Pr(y_J|A_q)
In essence, the consequents are crisp sets centered around the y_j.
Reasoning: ŷ = E(y|x) = Σ_{q=1}^{Q} β_q(x) Σ_{j=1}^{J} y_j Pr(y_j|A_q)

Relation to deterministic FSs
• Zero-order Takagi-Sugeno system (Takagi-Sugeno reasoning): y* = Σ_{q=1}^{Q} β_q(x) c_q
• Compare: ŷ = E(y|x) = Σ_{q=1}^{Q} β_q(x) Σ_{j=1}^{J} Pr(y_j|A_q) z_j
• Select c_q = Σ_{j=1}^{J} Pr(y_j|A_q) z_j

Probabilistic fuzzy systems summary
• Essentially a fuzzy system that estimates a probability density function, i.e. the fuzzy system approximates a p.d.f.
• Usually the p.d.f. is conditional on the input
• Linguistic information is coded in fuzzy rules
• Combines linguistic uncertainty with probabilistic uncertainty
• Different types of fuzzy systems can be extended to a PFS equivalent (e.g. Mamdani fuzzy systems, Takagi-Sugeno fuzzy systems)

PFS design
• Identifying the mental world vs. the observed world (van den Eijkel 1999)
• Mental world: linguistic descriptions, fuzzy conceptualization, experts' knowledge
• Observed world: data measurements, probability density functions, optimal consequent parameters
• Optimal design given a mental world: application of conditional probability measures for fuzzy events
• Optimal design given an observed world: nonlinear optimization techniques

Parameter determination
Two approaches (MF = membership function):
• Sequential method: Part 1 determines the MF parameters; Part 2 determines the probability parameters
• Maximum likelihood method: Part 1 is the initialization; Part 2 optimizes the MF parameters and the probability parameters

Sequential method, Part 1: finding the membership function (MF) parameters
• A_j is a fuzzy set defined by a membership function, e.g. a Gaussian MF with parameters
– v: center of the MF
– σ: width of the MF
• The MF parameters are found by FCM (fuzzy c-means) clustering

MF determination
• For the first part of the sequential method, well-known techniques from fuzzy modeling can be applied:
– Fuzzy clustering in the input-output product space
– Fuzzy clustering in the input and output spaces
– Expert-driven design
– Similarity-based rule-base simplification
– Feature selection
– Heuristic approaches
– Etc.

Sequential method, Part 2: finding the probability parameters Pr(ω_c|A_j)
• Set the parameters Pr(ω_c|A_j) equal to estimates of the conditional probabilities (conditional probability estimation)

Estimation of probability parameters
• The conditional probabilities Pr(C_j|A_q) can be assessed directly by using the definition of the probability of joint events:
Pr(C_j|A_q) = Pr(A_q ∩ C_j) / Pr(A_q) ≈ Σ_{(x_k, y_k)} μ_{A_q}(x_k) μ_{C_j}(y_k) / Σ_{x_k} μ_{A_q}(x_k)
• This method does not provide maximum likelihood estimates of the probability parameters.
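The estimator above can be written compactly with membership matrices. The following sketch uses hypothetical values; the helper name estimate_rule_probabilities is an assumption for illustration, not from the slides.

```python
# Sketch: estimate Pr(C_j | A_q) from data as
#   sum_k mu_Aq(x_k) * mu_Cj(y_k) / sum_k mu_Aq(x_k)
# mu_x[k, q]: membership of input sample x_k in antecedent fuzzy set A_q
# mu_y[k, j]: membership of output y_k in consequent class / fuzzy set C_j
import numpy as np

def estimate_rule_probabilities(mu_x, mu_y, eps=1e-12):
    joint = mu_x.T @ mu_y                     # sum_k mu_Aq(x_k) * mu_Cj(y_k), shape (Q, J)
    marginal = mu_x.sum(axis=0)[:, None]      # sum_k mu_Aq(x_k), shape (Q, 1)
    return joint / (marginal + eps)           # rows sum to ~1 if the C_j form a fuzzy partition

# Tiny illustrative example: 4 samples, 2 antecedent sets, 2 (crisp) classes
mu_x = np.array([[0.9, 0.1],
                 [0.7, 0.3],
                 [0.2, 0.8],
                 [0.0, 1.0]])
mu_y = np.array([[1.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 0.0]])
print(estimate_rule_probabilities(mu_x, mu_y))
```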
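Combining such rule probabilities with the additive reasoning of the probabilistic Mamdani / zero-order probabilistic TS systems above gives the model output as a weighted conditional expectation; again an illustrative sketch with hypothetical numbers.

```python
# Sketch: probabilistic Mamdani / zero-order probabilistic TS output
#   y_hat = E(y | x) = sum_q beta_q(x) * sum_j Pr(C_j | A_q) * z_j
# beta_q(x): normalized antecedent memberships; z_j: consequent centroids.
import numpy as np

def pfs_output(mu_aq_x, pr_c_given_a, z):
    beta = mu_aq_x / (mu_aq_x.sum() + 1e-12)      # beta_q(x)
    return float(beta @ (pr_c_given_a @ z))       # E(y | x)

# Hypothetical model: 2 rules, 3 consequent sets with centroids z_j
pr_c_given_a = np.array([[0.7, 0.2, 0.1],
                         [0.1, 0.3, 0.6]])
z = np.array([-1.0, 0.0, 1.0])
print(pfs_output(np.array([0.8, 0.2]), pr_c_given_a, z))   # weighted conditional expectation
```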
Maximum likelihood method
Part 2: optimization of v_j, σ_j and Pr(ω_c|A_j)
• Write down the likelihood of the data set
• Minimize the negative log-likelihood
• Optimize the parameters v_j, σ_j and Pr(ω_c|A_j) that minimize this error function
• This is a constrained optimization problem (the probability parameters Pr(ω_c|A_j) must satisfy summation conditions)

Maximum likelihood method, cont.
• The constrained optimization problem is turned into an unconstrained optimization problem by reparameterizing the probabilities with parameters u_jc
• Unconstrained minimization over v_j, σ_j and u_jc
• A gradient descent optimization algorithm is used to minimize the objective function, i.e. the available classification examples are processed one by one and updates are performed after each sample

Experimental comparison (1)
• Use Gaussian membership functions:
μ_{A_q}(x) = exp(− Σ_{l=1}^{d} (x_l − c_ql)² / (2 σ_ql²))
• The centers c_ql are determined using fuzzy c-means clustering
• The widths σ_ql are set equal to σ_ql = min_{q′ ≠ q} ||c_q − c_q′||

Experimental comparison (2)
Misclassification rates, calculated using ten-fold cross-validation (standard deviations within parentheses):

                          Sequential method    Maximum likelihood
Wisconsin breast cancer   0.261 (0.036)        0.029 (0.021)
Wine                      0.034 (0.048)        0.023 (0.041)
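For concreteness, a sketch of the Gaussian membership parameterization used in this comparison (hypothetical code, not the authors' implementation): the centers are assumed to come from fuzzy c-means clustering computed elsewhere, and each width is set to the distance from its center to the nearest other center.

```python
# Sketch: Gaussian antecedent memberships
#   mu_Aq(x) = exp(- sum_l (x_l - c_ql)^2 / (2 * sigma_q^2))
# with sigma_q = min_{q' != q} ||c_q - c_q'|| (nearest-center distance).
import numpy as np

def nearest_center_widths(centers):
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)               # ignore the zero distance to itself
    return dists.min(axis=1)                      # sigma_q per rule

def gaussian_memberships(x, centers, sigmas):
    d2 = ((x[None, :] - centers) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * sigmas ** 2))      # one membership value per rule

# Hypothetical centers (e.g. obtained by fuzzy c-means on the training data)
centers = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, 3.0]])
sigmas = nearest_center_widths(centers)
print(gaussian_memberships(np.array([1.0, 0.5]), centers, sigmas))
```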
Future research directions
• New estimation methods for the model parameters
– Joint estimation
– Information-theory based techniques
– Better optimization methods
• Interaction between linguistic knowledge and data-driven estimation
• Optimizing model complexity, model simplification
• Interpretability of probabilistic fuzzy models
• Linguistic descriptions of probability density functions
• Equivalence to other systems: e.g. fuzzy Markov models
• Density estimation using more complex models as rule consequents: e.g. fuzzy GARCH models
• New applications

Agenda (next item: Applications)

Applications
• Next time!

Agenda (next item: Conclusions)

Concluding remarks
• Probabilistic fuzzy systems combine linguistic uncertainty and probabilistic uncertainty
• They are very useful in applications where a probabilistic model (pdf estimation) has to be conditioned (or constrained) by linguistic information
• Good parameter estimation methods exist and the added value of these models has been demonstrated in various applications

Conclusions
• Fuzzy models usually show smooth non-linear behavior
• If certain measures are taken, fuzzy models are interpretable
• Fuzzy models can be induced from
– expert knowledge (grid selection and definition of fuzzy rules); this expert-driven approach was very successful in control theory
– a data set: the data-driven approach
• The accuracy-interpretability dilemma is prominent!
• The bias-variance dilemma is prominent!

Selected bibliography
• J. van den Berg, W.-M. van den Bergh, and U. Kaymak. Probabilistic and statistical fuzzy set foundations of competitive exception learning. In Proceedings of the Tenth IEEE International Conference on Fuzzy Systems, volume 2, pages 1035–1038, Melbourne, Australia, Dec. 2001.
• J. van den Berg, U. Kaymak, and W.-M. van den Bergh. Probabilistic reasoning in fuzzy rule-based systems. In P. Grzegorzewski, O. Hryniewicz, and M. A. Gil, editors, Soft Methods in Probability, Statistics and Data Analysis, Advances in Soft Computing, pages 189–196. Physica Verlag, Heidelberg, 2002.
• J. van den Berg, U. Kaymak, and W.-M. van den Bergh. Fuzzy classification using probability-based rule weighting. In Proceedings of the 2002 IEEE International Conference on Fuzzy Systems, pages 991–996, Honolulu, Hawaii, May 2002.
• U. Kaymak, W.-M. van den Bergh, and J. van den Berg. A fuzzy additive reasoning scheme for probabilistic Mamdani fuzzy systems. In Proceedings of the 2003 IEEE International Conference on Fuzzy Systems, volume 1, pages 331–336, St. Louis, USA, May 2003.
• U. Kaymak and J. van den Berg. On probabilistic connections of fuzzy systems. In Proceedings of the 15th Belgium-Netherlands Artificial Intelligence Conference, pages 187–194, Nijmegen, Netherlands, Oct. 2003.
• J. van den Berg, U. Kaymak, and W.-M. van den Bergh. Financial markets analysis by using a probabilistic fuzzy modelling approach. International Journal of Approximate Reasoning, 35:291–305, 2004.
• L. Waltman, U. Kaymak, and J. van den Berg. Maximum likelihood parameter estimation in probabilistic fuzzy classifiers. In Proceedings of the 14th Annual IEEE International Conference on Fuzzy Systems, pages 1098–1103, Reno, Nevada, USA, May 2005.
• D. Xu and U. Kaymak. Value-at-risk estimation by using probabilistic fuzzy systems. In Proceedings of the 2008 IEEE World Congress on Computational Intelligence (WCCI 2008), pages 2109–2116, Hong Kong, June 2008.
• R. J. Almeida and U. Kaymak. Probabilistic fuzzy systems in value-at-risk estimation. International Journal of Intelligent Systems in Accounting, Finance and Management, 16(1/2):49–70, 2009.
• J. Hinojosa, S. Nefti, and U. Kaymak. Systems control with generalized probabilistic fuzzy-reinforcement learning. IEEE Transactions on Fuzzy Systems, 19(1):51–64, February 2011.
• R. J. Almeida, N. Basturk, U. Kaymak, and V. Milea. A multi-covariate semi-parametric conditional volatility model using probabilistic fuzzy systems. In Proceedings of the 2012 IEEE International Conference on Computational Intelligence in Financial Engineering and Economics (CIFEr 2012), pages 489–496, New York City, USA, 2012.
• J. van den Berg, U. Kaymak, and R. J. Almeida. Function approximation using probabilistic fuzzy systems. IEEE Transactions on Fuzzy Systems, 2013.