Causal Models, Learning Algorithms and their Application to Performance Modeling
Jan Lemeire, Parallel Systems Lab, November 15th 2006

Overview
I. Causal Models
II. Learning Algorithms
III. Performance Modeling
IV. Extensions

I. Multivariate Analysis
- Given: variables and experimental data.
- Sought: a probabilistic model of the joint distribution. Relational information? The relations are a priori unknown.

A. Representation of Distributions
- Factorization: P(A, B, C, D) = P(A).P(B|A).P(C|A, B).P(D|A, B, C)
- Reduction of the factorization complexity: P(C|A, B) = P(C|B) <=> the edge A -> C can be dropped from the graph.
- The result is a Bayesian network. [Figure: two variable orderings, (A, B, C, D) and (A, D, B, C), yield two different network structures for the same distribution.]

B. Representation of Independencies
- Conditional independence: P(A|B, C) = P(A|B) <=> A is independent of C given B.
- A qualitative property: P(rain | quality of speech) = P(rain)?
- Markov condition in the graph: a variable becomes independent of all its non-descendants by conditioning on its direct parents.
- Graphical d-separation criterion. [Figure: in the chain A -> B -> C, B d-separates A from C; in the collider A -> B <- D, A is d-separated from D, but A is NOT d-separated from D given B.]

Faithfulness
- Independence-map: all independencies encoded in the Bayesian network appear in the distribution.
- Faithfulness: joint distribution <=> directed acyclic graph; conditional independencies <=> d-separation.
- Theorem: if a faithful graph exists, it is the minimal factorization.
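The factorization reduction above can be checked numerically: build a joint distribution from the chain A -> B -> C and verify that conditioning on A changes nothing once B is known. The CPD numbers below are made up purely for illustration; only the structure matters.

```python
from itertools import product

# Toy CPDs (made-up numbers, illustration only) for the chain A -> B -> C,
# so by construction P(C|A,B) = P(C|B).
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P_B_given_A[a][b]
P_C_given_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # P_C_given_B[b][c]

# Joint distribution from the factorization P(A,B,C) = P(A) P(B|A) P(C|B).
joint = {(a, b, c): P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]
         for a, b, c in product((0, 1), repeat=3)}

def conditional(c, given):
    """P(C=c | given), computed directly from the joint distribution."""
    num = sum(p for (a, b, cc), p in joint.items()
              if cc == c and all(dict(A=a, B=b)[k] == v for k, v in given.items()))
    den = sum(p for (a, b, cc), p in joint.items()
              if all(dict(A=a, B=b)[k] == v for k, v in given.items()))
    return num / den

# The reduction P(C|A,B) = P(C|B): B d-separates A from C.
for a, b in product((0, 1), repeat=2):
    assert abs(conditional(1, {"A": a, "B": b}) - conditional(1, {"B": b})) < 1e-12
print("P(C|A,B) = P(C|B): A is independent of C given B")
```

This is exactly the reduction of factorization complexity: the full factor P(C|A, B) is replaced by the smaller block P(C|B) without changing the joint distribution.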
C. Representation of Causal Mechanisms
- A causal model describes the underlying physical mechanisms.
- Definition through interventions: after do(A=a), B follows P(B|A=a), while A itself is set externally, so its own mechanism is cut.
- Causal model + conditional probability distributions + Causal Markov Condition = Bayesian network.

Reductionism
- Causal modeling = reductionism.
- Canonical representation: unique, minimal, independent.
- Building block = P(Xi | parents(Xi)); the whole theory is based on this modularity.
- Intervention = change of one block: do(X3=a) replaces the mechanism of X3 and leaves the other blocks untouched.

Ultimate Motivation for Causality
- If the causal mechanisms are unrelated, the model is faithful.
- The model is then a canonical representation, able to explain all qualitative properties (independencies), and close to reality.

II. Learning Algorithms
Two types:
- Constraint-based: based on the independencies found in the data.
- Scoring-based: searches the set of all models and scores how well each represents the distribution.

Step 1: Adjacency Search
- Property: adjacent nodes never become independent.
- Algorithm: start with the fully connected graph; remove edges for which a marginal independency is found; then remove edges for which a conditional independency is found.

Step 2: Orientation
- Property: v-structures (A -> C <- B) can be recognized from the independencies.
- Algorithm: look for v-structures, then apply the derived orientation rules.

Assumptions
- General statistical assumptions: no selection bias, a random sample, and sufficient data for the correctness of the statistical tests.
- The underlying network is faithful.
- Causal sufficiency: no unknown common causes.

Criticism
- What is the definition of causality?
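The two learning steps above (adjacency search, then v-structure orientation) can be sketched on a toy three-variable distribution. The collider A -> C <- B and its CPD numbers are illustrative assumptions; the independence test is exact rather than statistical, so the sketch sidesteps the test-error criticism noted above.

```python
from itertools import product

# Toy collider A -> C <- B with made-up CPDs (illustrative numbers only).
P_A = {0: 0.5, 1: 0.5}
P_B = {0: 0.3, 1: 0.7}
P_C1_given_AB = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.8, (1, 1): 0.4}  # P(C=1|a,b)

joint = {}
for a, b, c in product((0, 1), repeat=3):
    pc1 = P_C1_given_AB[(a, b)]
    joint[(a, b, c)] = P_A[a] * P_B[b] * (pc1 if c == 1 else 1 - pc1)

VARS = ["A", "B", "C"]
IDX = {v: i for i, v in enumerate(VARS)}

def marg(assign):
    """P(assignment) for a partial assignment {var: value}."""
    return sum(p for key, p in joint.items()
               if all(key[IDX[v]] == val for v, val in assign.items()))

def independent(x, y, cond=()):
    """Exact check of X _|_ Y | cond on the toy joint (no sampling error)."""
    for zvals in product((0, 1), repeat=len(cond)):
        z = dict(zip(cond, zvals))
        pz = marg(z)
        if pz == 0:
            continue
        for xv, yv in product((0, 1), repeat=2):
            lhs = marg({x: xv, y: yv, **z}) / pz
            rhs = (marg({x: xv, **z}) / pz) * (marg({y: yv, **z}) / pz)
            if abs(lhs - rhs) > 1e-12:
                return False
    return True

pairs = [("A", "B"), ("A", "C"), ("B", "C")]

# Step 1: adjacency search -- start fully connected, drop an edge when some
# conditioning set makes its endpoints independent; remember that set.
edges = {frozenset(p) for p in pairs}
sepset = {}
for x, y in pairs:
    for cond in [(), tuple(v for v in VARS if v not in (x, y))]:
        if frozenset((x, y)) in edges and independent(x, y, cond):
            edges.discard(frozenset((x, y)))
            sepset[frozenset((x, y))] = set(cond)

# Step 2: orientation -- an unshielded triple X - Z - Y whose middle node Z
# is NOT in sepset(X, Y) is a v-structure X -> Z <- Y.
arrows = set()
for x, y in pairs:
    if frozenset((x, y)) in sepset:
        for z in VARS:
            if z not in (x, y) and frozenset((x, z)) in edges \
               and frozenset((y, z)) in edges and z not in sepset[frozenset((x, y))]:
                arrows |= {(x, z), (y, z)}

print(sorted(tuple(sorted(e)) for e in edges), sorted(arrows))
```

The edge A - B is removed because A and B are marginally independent, and the collider is then recovered because conditioning on C makes them dependent, i.e. C is absent from their separating set.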
- Causality is about predicting the effect of changes to the system.
- The faithfulness assumption can fail, e.g. through accidental cancellation: paths X -> U -> Y and X -> V -> Y cancel, leaving X and Y independent.
- The Causal Markov Condition assumes that "all relations are causal".
- The learning algorithms are not robust: statistical tests make mistakes.

Part III: Performance Analysis
High-performance computing: a parallel system versus a single processor.
Performance questions:
- Performance prediction.
- Parameter dependency?
- Reasons for bad performance?
- System dependency?
- Effect of optimizations?

PhD??
Causal modeling (cf. COMO lab, VUB):
- a representation form close to reality;
- learning algorithms;
- the TETRAD tool (open-source, Java).

Performance Models
Aim of the performance analysis: support the developer of high-performance applications.
Expected properties:
- offer insight into the causes of performance degradation;
- prediction;
- estimate the effect of optimizations;
- reusable submodels;
- separate the application- and system-dependency;
- reason under uncertainty => causal models.

Integrated in a Statistical Analysis
Statistical characteristics: regression analysis, probability table compression, outlier detection.
Workflow: (1) profile the application with experiments; (2) construct the causal model, with curve fitting and CPT compression, to obtain an analytical model; (3) record exceptions in a database; (4) let the user inspect the divergences.
Iterative process:
1. Perform additional experiments.
2. Extract additional characteristics.
3. Indicate exceptions.
4. Analyze the divergences of the data points from the current hypotheses.

A. Model Construction
Model of the computation time of the LU decomposition algorithm, with variables n, #op, #instrop, Cop, datatype, elementsize, L1Mop, L2Mop, fclock and Tcomp.
- elementsize (a redundant variable) is sufficient to capture the influence datatype -> cache misses.
- Regression analysis on the submodels X = f(parents(X)); analysis of the parameters.
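The regression step on a single submodel X = f(parents(X)) can be sketched as follows. The cubic form Tcomp ~ a*n^3 (the LU operation count grows as n^3) and the timing numbers are illustrative assumptions, not measured data from the slides.

```python
# Fitting one submodel X = f(parents(X)) by least squares, as in the
# regression-analysis step.  The cubic model Tcomp ~ a*n^3 and the
# "measured" times below are made-up illustrative values.
ns    = [100, 200, 400, 800]
tcomp = [0.0068, 0.0531, 0.4270, 3.4100]   # synthetic timings in seconds

# One-parameter linear least squares: minimize sum (t_i - a*n_i^3)^2
# => a = sum(t_i * n_i^3) / sum(n_i^6).
num = sum(t * n**3 for n, t in zip(ns, tcomp))
den = sum(n**6 for n in ns)
a = num / den
print(f"Tcomp ~ {a:.3e} * n^3")

# Residual analysis: large divergences from the fitted curve are exactly
# the "exceptions" that would be flagged for user inspection.
residuals = [t - a * n**3 for n, t in zip(ns, tcomp)]
rel_error = max(abs(r) / t for r, t in zip(residuals, tcomp))
print(f"max relative error: {rel_error:.1%}")
```

Fitting each block separately is what makes the submodels reusable: the application-dependent factor (operation count as a function of n) and the system-dependent factor (time per operation) can be exchanged independently.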
B. Detection of Unexpected Dependencies
- Point-to-point communication performance: the model revealed a dependency on background communication.

C. Finding Explanations for Outliers
- Exceptional data in the communication performance measurements.
- Probability table compression: states of the parent with identical distributions over the child are grouped; the groups define a derived variable that captures the interesting features. [Figure: states X0, X1 compress to one group and X2, X3 to another, giving the derived variable Y with states Y0, Y1.]

IV. Complexity of Performance Data
- Mixture of discrete and continuous variables => mutual information & kernel density estimation.
- Non-linear relations => mutual information & kernel density estimation.
- Deterministic relations => augmented models & a complexity criterion.
- Context variables => work in progress.
- Context-specific independencies => work in progress.

A. Information-theoretic Dependency
- Entropy of a random variable X: H(X) = -sum_x p(x).log p(x).
- Mutual information: I(X;Y) = sum_{x,y} p(x,y).log[ p(x,y) / (p(x).p(y)) ].
- For a continuous variable, the entropy is computed on a discretized version of the variable.

B. Kernel Density Estimation
- See the applets.
- Trade-off: maximal entropy <-> typicalness.
Conclusions:
- a limited number of data points is needed;
- discretization of continuous data is justified;
- a form-free dependency measure.

C. Deterministic Relations
- Y = f(X): Y becomes independent of Z when conditioned on X.
- This violates the intersection condition (Pearl '88), so the distribution is not faithfully describable by a DAG.
- Solution: an augmented causal model: add the regularity to the model and adapt the inference algorithms.

The Complexity Criterion
- X and Y contain equivalent information about Z.
- Select the simplest relation: prefer the edge Y - Z when Complexity(Y-Z) < Complexity(X-Z).

Augmented Causal Model
- Restrict the conditional independencies, generalize d-separation, and re-establish faithfulness.
- Models are consistent under the Complexity Increase assumption: for X -> Y -> Z, Compl(X-Z) >= Compl(X-Y) and Compl(X-Z) >= Compl(Y-Z).
- The theory works!
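The form-free dependency measure of section IV.A can be sketched with plain histogram discretization: estimate H(X), H(Y) and I(X;Y) = H(X) + H(Y) - H(X,Y) on discretized samples. The non-linear relation y = x^2 + noise below is an illustrative assumption; note that a linear correlation coefficient would largely miss it, while mutual information does not.

```python
import math
import random

# Synthetic continuous data with a non-linear dependency (illustration only).
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(5000)]
ys = [x * x + random.gauss(0, 0.05) for x in xs]

def discretize(vals, bins=8):
    """Map continuous values to equal-width histogram bins."""
    lo, hi = min(vals), max(vals)
    return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in vals]

def entropy(symbols):
    """Discretized Shannon entropy H = -sum p log2 p of a sample."""
    counts = {}
    for s in symbols:
        counts[s] = counts.get(s, 0) + 1
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

dx, dy = discretize(xs), discretize(ys)
h_x, h_y = entropy(dx), entropy(dy)
# I(X;Y) = H(X) + H(Y) - H(X,Y), estimated on the joint bin labels.
mi = h_x + h_y - entropy(list(zip(dx, dy)))
print(f"H(X)={h_x:.2f} bits, H(Y)={h_y:.2f} bits, I(X;Y)={mi:.2f} bits")
```

The slides pair mutual information with kernel density estimation; this sketch substitutes simple histogram bins, which is the discretization the slides argue is justified with a limited number of data points.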
[Figure: learned models for the deterministic and the probabilistic case.]

Conclusions
- Benefit of the integration of statistical techniques.
- Causal modeling is a challenge: it wants to know the inner from the outer.
More information:
- http://parallel.vub.ac.be
- http://parallel.vub.ac.be/~jan