Download Information Mining with Relational and Possibilistic Graphical Models

Information Mining with Relational and Possibilistic Graphical Models Prof. Dr. Rudolf Kruse University of Magdeburg Faculty of Computer Science Magdeburg, Germany [email protected] N SF EURO UZZY Example: Continuously Adapting Gear Shift Schedule in VW New Beetle classification of driver / driving situation by fuzzy logic fuzzification inference machine defuzzification gear shift computation interpolation accelerator pedal filtered speed of accelerator pedal number of changes in pedal direction rule base sport factor [t] determination of speed limits for shifting into higher or lower gear depending on sport factor gear selection sport factor [t-1] N SF EURO UZZY Continuously Adapting Gear Shift Schedule: Technical Details  Mamdani controller with 7 rules  Optimized program 24 Byte RAM AG4 } on Digimat 702 Byte ROM  Runtime 80 ms 12 times per second a new sport factor is assigned  How to find suitable rules? N SF EURO UZZY Information Mining  Information mining is the non-trivial process of identifying valid, novel, potentially useful, and understandable information and patterns in heterogeneous information sources.  Information sources are  data bases,  expert background knowledge,  textual description,  images,  sounds, ... N SF EURO UZZY Information Mining Problem Information Information Modeling Understanding Understanding Preparation Evaluation Deployment Determine Problem Objectives Collect Initial Information Select Information Select Modeling Technique Evaluate Results Plan Deployment Assess Situations Describe Information Clean Information Generate Test Review Design Process Determine Information Mining Goals Explore Information Construct In- Build Model formation Verify Produce Project Information Plan Quality Integrate Information Assess Model Determine Next Steps Plan Monitoring and Maintenance Produce Final Results Review Project Format Information N SF EURO UZZY Example: Line Filtering  Extraction of edge segments (Burns’ operator)  Production net: edges  lines  long lines  parallel lines  runways N SF EURO UZZY SOMAccess V1.0 Available on CD-ROM: G. Hartmann, A. Nölle, M. Richards, and R. Leitinger (eds.), Data Utilization Software Tools 2 (DUST-2 CD-ROM), Copernicus Gesellschaft e.V., Katlenburg-Lindau, 2000 (ISBN 3-9804862-3-0) N SF EURO UZZY Current Research Topics  Multi-dimensional data analysis: Data warehouse and OLAP (on-line analytical processing)  Association, correlation, and causality analysis  Classification: scalability and new approaches  Clustering and outlier analysis  Sequential patterns and time-series analysis  Similarity analysis: curves, trends, images, texts, etc.  Text mining, web mining and weblog analysis  Spatial, multimedia, scientific data analysis  Data preprocessing and database completion  Data visualization and visual data mining  Many others, e.g., collaborative filtering N SF EURO UZZY Fuzzy Methods in Information Mining here: Exploiting quantitative and qualitative information  Fuzzy Data Analysis (Projects with Siemens)  Dependency Analysis (Project with Daimler) N SF EURO UZZY Analysis of Imprecise Data Fuzzy Database A B C 1 Large Very large Medium 2 2.5 Medium About 7 3 [3,4] Small [7,8]  A Linguistic modeling C   1 2 3  Computing with words The mean w.r.t. A is „approximately 5“ B Statistics with fuzzy sets Linguistic approximation Mean of attribute A N SF EURO UZZY Fuzzy Data Analysis Strong law of large numbers (Ralescu, Klement, Kruse, Miyakoshi, ...) Let {xk | k 1} be independent and identically distributed fuzzy random variables such that E||supp x1|| <  . Then  x1  x2    xn  d , E (co( x1))   0 n   Books:Kruse, Meyer: Statistics with Vague Data, Reidel, 1987 Bandemer, Näther: Fuzzy Data Analysis, Kluwer, 1992 Seising, Tanaka and Guo, Wolkenhauer, Viertl, ... N SF EURO UZZY Analysis of Daimler/Chrysler Database  Database: ~ 18.500 passenger cars > 100 attributes per car  Analysis of dependencies between special equipment and faults.  Results used as a starting point for technical experts looking for causes. N SF EURO UZZY Bayesian Networks Qualitative Part knowledge about (conditional) independence, causality, ... directed acyclic graph A B C + Quantitative Part = Model local models on low-dimensional spaces ABC P(A,B,C) abc abc abc abc 0.8 0.1 0.1 0.0 unique joint model on the (high-dimensional) space N SF EURO UZZY Example: Genotype Determination of Jersey Cattle variables: 22, state space 6  1013, parameters: 324 Graphical Model • node  random variable Phenogr.1 (3 diff.) • edges  conditional dependencies Phenogr.2 (3 diff.) Genotype (6 diff.) • decomposition  P ( X 1 ,, X 22 )  22  P( X i | parents ( X i )) i 1 • diagnosis  P(  | knowledge) N SF EURO UZZY Learning Graphical Models data + prior information A Inducer B C local models N SF EURO UZZY The Learning Problem known structure B A C complete data A <a4, <a3, B b3, b2, C c1> c4> Statistical Parametric Estimation (closed from eq.): statistical parameter fitting, ML Estimation, Bayesian Inference, ... incomplete data Parametric Optimization: (missing values, hidden variables,...) A <a4, <a3, B ?, b2, C c1> ?> EM, gradient descent, ... unknown structure B A C Discrete Optimization over Structures (discrete search): likelihood scores, MDL Problem: search complexity heuristics Combined Methods: structured EM only few approaches Problems: criterion for fit? new variables? local maxima? fuzzy values? N SF EURO UZZY Information Mining 18.500 passenger cars Linguistic modeling 130 attributes per car Fuzzy Database Imprecise data Computing with words IF air conditioning and electr. roof top Then more battery faults Learning graphical models Rule generation relational/possibilistic graphical model N SF EURO UZZY A Simple Example Example World Relation color shape size • 10 simple geometric objects, 3 attributes • one object is chosen at random and examined.                     small medium small medium medium large medium medium medium large • Inferences are drawn about the unobserved attributes. N SF EURO UZZY The Reasoning Space Geometric Interpretation Relation color shape size                     small medium small medium medium large medium medium medium large        large medium small Each cube represents one tuple N SF EURO UZZY Prior Knowledge and Its Projections             large medium small    large medium small              large medium small large medium small N SF EURO UZZY Cylindrical Extensions and Their Intersection        large medium small     Intersecting the cylindrical extensions of the projection to the subspace formed by color and shape and of the projection to the subspace formed by shape and size yields the original three-dimensional relation.          large medium small  large medium small N SF EURO UZZY Reasoning  Let it be known (e.g. from an observation) that the given object is green. This information considerably reduces the space of possible value combinations.  From the prior knowledge it follows that the given object must be - either a triangle or a square and - either medium or large              large medium small  large medium small N SF EURO UZZY Reasoning with Projections The same result can be obtained using only the projections to the subspaces without reconstructing the original three-dimensional space:     s m l color extend  size shape project project  extend         This justifies a network representation color s m l shape size N SF EURO UZZY Interpretation of Graphical Models  Relational Graphical Model  Decomposition + local models  Example colour shape graph size colour shape size hypergraph  Learning a relational graphical model  Searching for a suitable decomposition + local relations N SF EURO UZZY Genotype Determination of Danish Jersey Cattle Assumptions about parents: risk about misstatement genotype mother Reliability of databases genotype father Inheritance rules genotype child, 6 possible values 4 lysis values measured by photometer Blood group determination N SF EURO UZZY Qualitative Knowledge parental error Dam correct Sire correct phenogroup 1 stated dam phenogroup 2 stated dam phenogroup 1 stated sire phenogroup 2 stated sire phenogroup 1 true dam phenogroup 2 true dam phenogroup 1 true sire phenogroup 2 true sire phenogroup 1 offspring phenogroup 2 offspring genotype offspring factor 40 (F1) factor 41 (F2) factor 42 (V1) factor 43 (V2) lysis 40 lysis 41 lysis 42 lysis 43 N SF EURO UZZY Example: Genotype Determination of Jersey Cattle variables: 22, state space 6  1013, parameters: 324 Graphical Model • node  random variable Phenogr.1 (3 diff.) • edges  conditional dependencies Phenogr.2 (3 diff.) Genotype (6 diff.) • decomposition  P ( X 1 ,, X 22 )  22  P( X i | parents ( X i )) i 1 • diagnosis  P(  | knowledge) N SF EURO UZZY Learning Graphical Models from Data • Test whether a distribution is decomposable w.r.t. a given graph. This is the most direct approach. It is not bound to a graphical representation, but can also be carried out w.r.t. other representations of the set of subspaces to be used to compute the (candidate) decomposition of the given distribution. • Find an independence map by conditional independence tests. This approach exploits the theorems that connect conditional independence graphs and graphs that represent decompositions. it has the advantage that a single conditional independence test, if it fails, can exclude several candidate graphs. • Find a suitable graph by measuring the strength of dependences. This is a heuristic, but often highly successful approach, which is based on the frequently valid assumption that in a distribution that is decomposable w.r.t. a graph an attribute is more strongly dependent on adjacent attributes than on attributes that are not directly connected to them. N SF EURO UZZY Is Decomposition Always Possible?       2       large medium small 1    large medium small              large medium small large medium small N SF EURO UZZY Direct Test for decomposability 1. 2. color shape shape size     4. color shape size color shape size                   size      7. color shape size      large medium small large medium small 6. color shape size     large medium small 5. 3. color 8. color shape large medium small size     color shape size               large medium small large medium small  large medium small  large medium small N SF EURO UZZY Evaluation Measures and Search Methods  An exhaustive search over all graphs is too expensive:   n   2 2  possible n undirected graphs for n attributes. i 1  n  i n1  f n     1 i 1  2 i f n  i  possible directed acyclic graphs.  Therefore all learning algorithms consist of an evaluation measure (scoring function), e.g.  Hartley information gain  relative number of occurring value combinations and a (heuristic) search method, e.g.  guided random search  greedy search (K2 algorithm) conditional independence search N SF EURO UZZY Measuring the Strengths of Marginal Dependences  Relational networks: Find a set of subspaces, for which the intersection of the cylindrical extensions of the projections to these subspaces contains as few additional states as possible.  This size of the intersection depends on the sizes of the cylindrical extensions, which in turn depend on the sizes of the projections.  Therefore it is plausible to use the relative number of occurring value combinations to assess the quality of a subspace. subspace color  shape shape  size size  color possible combinations 12 9 12 occurring combinations 6 5 8 relative number 50% 56% 67%  The relational network can be obtained by interpreting the relative numbers as edge weights and constructing the minimal weight spanning tree. N SF EURO UZZY Conditional Independence Tests        Hartley information needed to determine coordinates: log24+ log23= log212 3.58 coordinate pair: log26 2.58 gain: log212- log26= log22 =1 Definition: Let A and B be two attributes and R a discrete possibility measure with adom(A):  bdom(B):R(A=a,B=b)=1 Then     Hartley   A, B   log 2 adom  A R A  a   log 2 bdom  B  RB  b  I gain  log 2 adom  A bdom  B  R A  a, B  b  adom  A R A  a   bdom  B  RB  b   log 2 adom  A bdom  B  R A  a, B  b       is called the Hartley information gain of A and B w.r.t. R. N SF EURO UZZY Conditional Independence Tests (continued)  The Hartley information gain can be used directly to test for (approximate) marginal independence. attributes relative number of possible value combinations Hartley information gain color, shape 6/(3*4)=1/2=50% log23+ log24- log26=1 color, size 6/(3*4)=2/3=67% log23+ log24- log280.58 shape, size 5/(3*3)=5/9=56% log23+ log23- log25  0.85  In order to test for (approximate) conditional independence:  Compute the Hartley information gain for each possible instantiation of the conditioning attributes.  Aggregate the result over all possible instantiations, for instance, by simply averaging them. N SF EURO UZZY Direct Test for Decomposability Definition: Let p1 and p2 be two strictly positive probability distributions on the same set e of events. Then I KLdiv  p1, p2    p1 E  log 2 Ee p1 E  p2  E  is called the Kullback-Leibler information divergence of p1 and p2.  The Kullback-Leibler information divergence is non-negative.  It is zero if and only if p1  p2.  Therefore it is plausible that this measure can be used to asses the quality of the approximation of a given multi-dimensional distribution p1 by the distribution p2 that is represented by a given graph: The smaller the value of this measure, the better the approximation. N SF EURO UZZY Direct Test for Decomposability (continued) 1. 2. A B B C 6. A B C 0 -4401 B C 7. C 0.111 -4563 A B C 0.429 -4830 A B 4. A 0.137 -4612 0.566 -5041 5. 3. A 0.540 -4991 8. A B C 0.402 -4780 C A B C 0 -4401 Upper numbers: The Kullback-Leibler information divergence of the original distribution and its approximation. Lower numbers: The binary logarithms of the probability of an example database (log-likelihood of data). N SF EURO UZZY Evaluation Measures / Scoring Functions Relational Networks Relative number of occurring value combinations  Hartley Information Gain  Probabilistic Networks 2-Measure  Mutual Information / Cross Entropy / Information Gain  (Symmetric) Information Gain Ratio  (Symmetric/Modified) Gini Index  Bayesian Measures (g-function, BDeu metric)  Other measures that are known from Decision Tree Induction  N SF EURO UZZY A Probabilistic Evaluation Measure Mutual Information / Cross Entropy / Information Gain n  based on Shannon entropy H    pi log 2 pi i 1 I gain  A, B   H A  H A|B  H A  H B  H AB    P A  a  log 2 P A  a   adom  A   PB  b  log 2 PB  b  bdom  B    P A  a, B  b  log 2 P A  a, B  b  P B  b   P A  a | B  b  log 2 P A  a | B  b   adom  A  bdom  B  Idea: H A|B    bdom B  adom A  N SF EURO UZZY Possiblity Theory  fuzzy set induces possibility  A   sup   1 A cloudy 2  0  55, 60  3  axioms 50 65 85 100   0  0     1  A  B  max  A ,  B  A  B  min  A ,  B N SF EURO UZZY Possibility Distributions and the Context Model  Let  be the set of all possible states of the world, 0 the actual (but unknown) state.  Let C={c1,…,ck} be a set of contexts (observers, frame conditions etc.), (C,2C,P) a finite probability space (context weights).  Let g:C2 be a set-valued mapping, assigning to each context the most specific correct set-valued specification of 0. g is called a random set (since it is a set-valued random variable); thesets g(c) are also called focal sets.  The induced one point coverage of g or the induced possibility distribution is  g :   0,1  g    Pc  C |   g c . N SF EURO UZZY Database-induced Possibility Distributions Focal Sets A C D a1 b2 c3 d1 D {d1, d2} d3 {d1, d3 , d4} a1 b3 c3 d1 a1 b2 c3 d2 a1 b3 c3 d2 a3 b1 c2 d3 {d1, d4}  a3 b2 c2 d3 a2 b3 c1 d1   Imprecise Database A a1 a3 {a2, a4} B {b2, b3} {b1, b2} b3 {a1, a2 , a3}  b2  C c3 c2 {c1, c2} *  B    Each imprecise tuple – or, more precisely, the set of all precise tuples compatible with it – is interpreted as a focal set of a random set.  In the absence of other information equal weights are assigned to the contexts.  In this way an imprecise database induces a possibility distribution. N SF EURO UZZY Reasoning 0 0 0 0 0 0 0 0 0 70 0 0 0 0 0 0 0 0 0 70 20 10 all numbers in parts per 1000 70 60 10 large 70 0 0 large 0 small 0 0 medium 0 40 70 0 0 small 0 60 • Using the information that the given 10 object is green. 70 70 40 20 40 10 0 0 0 0 l 70 20 10 0 0 0 0 0 0 0 m 70 60 10 0 0 0 0 0 0 0 0 0 0 70 60 10 medium 70 s 20 40 10 N SF EURO UZZY Reasoning with Projections Again the same result can be obtained using only projections to subspaces (maximal degrees of possibility): s m 0 0 0 70 new old 90 80 80 old new 90 70 70 old size new 40 shape min new 40 80 10 70 0 70 0 0 30 10 70 60 0 0 0 60 80 90 20 10 0 10 0 0 color max l 70 new old 70 80 max min 60 70 10 90 20 80 70 20 70 70 40 70 20 40 60 20 90 60 30 10 10 10 s This justifies a network representation: color 70 old new column new line 70 shape m size l N SF EURO UZZY POSSINFER N SF EURO UZZY Possibilistic Evaluation Measures / Scoring Functions  Specificity Gain [Gebhardt and Kruse 1996, Borgelt et al. 1996]  (Symmetric) Specificity Gain Ratio [Borgelt et al. 1996]  Analog of Mutual Information [Borgelt and Kruse 1997]  Analog of the 2-measure [Borgelt and Kruse 1997] N SF EURO UZZY Possibilistic Evaluation Measures Reduction to the relational case via -cuts log21 + log21 - log21 = 0 0.4 0.4 log22 + log22 - log23  0.42 0.3 0.3 log23 + log22 - log25  0.26 0.2 0.2 log24 + log23 - log28  0.58 0.1 0.1 log24 + log23 - log212 = 0 0 0 Usable relational measures  relative number of value combinations/Hartley information gain  specificity gain  number of additional value combinations in the Cartesian product of the marginal distributions N SF EURO UZZY Specificity Gain Definition: Let A and B be two attributes and  a possibility measure.  S gain ( A, B)   sup log 2 ( 0  log 2 (  log 2 (   ( A  a)) adom ( A)   ( A a)) adom ( A)    ( A  a, B  b)) d adom ( A) bdom ( B ) is called the specificity gain of A and B w.r.t. .  Generalization of Hartley information gain on the basis of the -cut view of possibility distributions.  Analogous to Shannon information gain. N SF EURO UZZY Specificity Gain in the Example projection to subspace 40 30 80 80 10 90 10 70 20 s 20 40 90 m 80 70 60 l 70 20 30 large 40 medium 60 small 80 70 80 90 20 70 40 minimum of marginals 70 60 10 70 70 40 80 70 80 80 70 90 70 70 70 s 70 80 90 m 70 70 70 l 70 80 80 large 70 medium 80 small 80 70 80 90 70 70 70 specificity gain 70 70 70 0.055 bit 0.048 bit 70 70 70 0.027 bit N SF EURO UZZY Learning Graphical Models from Data 1. 2. color shape shape size     4. color shape size color shape size                   size      7. color shape size      large medium small large medium small 6. color shape size     large medium small 5. 3. color 8. color shape large medium small size     color shape size               large medium small large medium small  large medium small  large medium small N SF EURO UZZY Data Mining Tool Clementine N SF EURO UZZY Analysis of Daimler/Chrysler Database electrical roof top air conditioning faulty battery type of engine faulty compressor type of tyres slippage control faulty brakes Fictitious example: There are significantly more faulty batteries, if both air conditioning and electrical roof top are built into the car. N SF EURO UZZY Example Subnet Influence of special equipment on battery faults: (fictitious) frequency of battery faults electrical sliding roof with without air conditioning with without 8% 3% 3% 2%  significant deviation from independent distribution  hints to possible causes and improvements  here: larger battery may be required, if an air conditioning system and an electrical sliding roof are built in (The dependencies and frequencies of this example are fictious, true numbers are confidential.) N SF EURO UZZY Resources http://fuzzy.cs.uni-magdeburg.de free software tools such as NEFCLASS, … C. Borgelt, R. Kruse: Graphical models – Methods for data analysis and mining Wiley, Chichester, January 2002. N SF EURO UZZY

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Information Mining with Relational and Possibilistic Graphical Models