First Annual LEAP-KMC Workshop
Data Mining and Machine Learning for Predicting Energetics
Wednesday, 25 May 2005

William H. Hsu
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences, Kansas State University
http://www.kddresearch.org
This presentation: http://www.kddresearch.org/KSU/CIS/KMC-20050525.ppt

Outline of Talk
• Research Overview
  – Data Mining in LEAP-KMC: problem specification
  – Design Choices for Machine Learning
• General Approaches
  – Time Series
  – Unsupervised Learning
  – Supervised Learning: Spatial Pattern Recognition (see Methodology)
• Machine Learning Methodologies
  – Traditional Inducers (Waikato Environment for Knowledge Analysis)
  – Graphical Models: Bayesian Networks, DBNs
  – Genetic and Evolutionary Computation (GEC)
  – Artificial Neural Networks (ANNs)
• Early Results
• Data Model Requirements
• Current and Continuing Work

Data Mining in LEAP-KMC
Rahman, Kara, et al. (2004)
[Figure: lattice neighborhood around a central atom, showing its 1st, 2nd, and 3rd shells]
Source Data → Data Mining Process → Target Model: Equations, Parametric Form

Design Choices for Machine Learning
• Determine type of training experience: "keyframed" states and intermediate trajectories; typed transitions; occupancy-vector state representation
• Determine target function: transition prediction model; external function estimator
• Determine representation of learned function: polynomial; linear combination of occupancy variables; artificial neural network
• Determine learning algorithm: linear regression; gradient descent
• Result: completed design

Linear Time Series Models
• Linear Models
  – Moving average (MA(q)) model, aka finite impulse response (FIR) filter:
    $x(t) = \sum_{j=0}^{q} b_j\, e(t-j)$
    • x(t) = observed stochastic process
    • e(t) = external signal (e.g., white Gaussian noise)
  – Autoregressive (AR(p)) model, aka infinite impulse response (IIR) filter:
    $x(t) = \sum_{i=1}^{p} a_i\, x(t-i) + e(t)$
    • If e(t) is additive WGN: $\hat{x}(t) = \sum_{i=1}^{p} a_i\, x(t-i)$
  – Autoregressive moving average (ARMA(p, q)) model:
    $x(t) = \sum_{i=1}^{p} a_i\, x(t-i) + \sum_{j=0}^{q} b_j\, e(t-j)$
    • Combines AR and MA model parameters (can express either or both)
• Order of a Linear Model: p, q (e.g., AR(1))
• Learning: finding hyperparameters (p, q) and parameters (aᵢ, bⱼ); see the fitting sketch below
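As a concrete illustration of the last bullet, here is a minimal sketch (not from the talk) of estimating AR(p) parameters by ordinary least squares with NumPy; the function name fit_ar and the synthetic AR(2) data are assumptions for the example.

```python
import numpy as np

def fit_ar(x, p):
    """Estimate AR(p) coefficients a_1..a_p by ordinary least squares.

    Fits x(t) ~ sum_i a_i * x(t - i) over all valid t; the residual
    plays the role of the noise term e(t)."""
    # Row t of the design matrix holds the p lagged values x(t-1)..x(t-p).
    X = np.column_stack([x[p - i: len(x) - i] for i in range(1, p + 1)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

# Usage: recover the coefficients of a synthetic AR(2) process.
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(2, len(x)):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
print(fit_ar(x, p=2))  # approximately [0.6, -0.3]
```

Choosing the order p itself (a hyperparameter, per the slide) would be done separately, e.g., by held-out prediction error.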
Temporal Probabilistic Reasoning
Adapted from Murphy (2001), Guo (2002)
• Goal: estimate $P(X_t \mid y_{1:r})$
• Filtering: r = t
  – Intuition: infer current state from observations
  – Applications: signal identification
  – Variation: Viterbi algorithm
• Prediction: r < t
  – Intuition: infer future state
  – Applications: prognostics
• Smoothing: r > t
  – Intuition: infer past hidden state
  – Applications: signal enhancement
• Applications to modeling
  – State transitions (prediction)
  – Energetics (estimation)

Unsupervised Learning [1]: Objectives
[Figure: supervised learning maps x to an estimate f̂(x) of f(x); unsupervised learning maps x to a descriptor y]
• Unsupervised Learning
  – Given: data set D
    • Vectors of attribute values (x1, x2, …, xn)
    • No distinction between input attributes and output attributes (class label)
  – Return: (synthetic) descriptor y of each x
    • Clustering: grouping points (x) into inherent regions of mutual similarity
    • Vector quantization: discretizing continuous space with best labels
    • Dimensionality reduction: projecting many attributes down to a few
    • Feature extraction: constructing (few) new attributes from (many) old ones
• Intuitive Idea
  – Want to map independent variables (x) to dependent variables (y = f(x))
  – Need to discover y based on a numerical criterion (e.g., a distance metric)
  – Don't always know what the "dependent variables" (y) are

Unsupervised Learning [2]: Clustering
[Figure: sketch of a cluster formation and segmentation algorithm; a dimensionality-reducing projection (x′) yields clusters of similar records, shown with the Delaunay triangulation and the Voronoi (nearest neighbor) diagram (y)]

Graphical Models [1]: Bayesian Networks
• Conditional Independence
  – X is conditionally independent (CI) of Y given Z (sometimes written X ⊥ Y | Z) iff P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z
  – Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning), i.e., T ⊥ R | L
• Bayesian (Belief) Network
  – Acyclic directed graph model B = (V, E, Θ) representing CI assertions over a set of random variables
  – Vertices (nodes) V: denote events (each a random variable)
  – Edges (arcs, links) E: denote conditional dependencies
• Markov Condition for BBNs (Chain Rule):
  $P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i))$
• Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
From slides for Russell & Norvig, 2nd edition (2003)
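A minimal sketch of the chain rule on a toy network matching the Thunder/Rain example above, with Lightning as the common parent so that T ⊥ R | L holds by construction; all CPT numbers here are made-up placeholders, not from the talk.

```python
# Toy network: Lightning -> Thunder, Lightning -> Rain.
P_L = {True: 0.1, False: 0.9}   # P(Lightning)
P_T = {True: 0.8, False: 0.05}  # P(Thunder = True | Lightning)
P_R = {True: 0.7, False: 0.2}   # P(Rain = True | Lightning)

def joint(l, t, r):
    """P(L, T, R) = P(L) * P(T | L) * P(R | L), by the Markov condition."""
    p_t = P_T[l] if t else 1 - P_T[l]
    p_r = P_R[l] if r else 1 - P_R[l]
    return P_L[l] * p_t * p_r

# Sanity check: the joint sums to 1 over all 8 assignments.
vals = (True, False)
print(sum(joint(l, t, r) for l in vals for t in vals for r in vals))  # 1.0

# CI check: P(Thunder | Rain, Lightning) equals P(Thunder | Lightning).
num = joint(True, True, True)
den = sum(joint(True, t, True) for t in vals)
print(num / den, P_T[True])  # both print 0.8
```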
Graphical Models [2]: Inference and Learning
• General-case BBN structure learning: use inference to compute scores
• Optimal strategy: Bayesian model averaging
  – Assumption: models h ∈ H are mutually exclusive and exhaustive
  – Combine predictions of models in proportion to marginal likelihood
    • Compute the conditional probability of hypothesis h given observed data D
    • i.e., compute the expectation over the unknown h for unseen cases
    • Let h ≡ structure, parameters Θ ≡ CPTs
  – Prediction: $P(x_{m+1} \mid D) = \sum_{h \in H} P(x_{m+1} \mid D, h)\, P(h \mid D)$
  – Posterior score: $P(h \mid D) \propto P(D \mid h)\, P(h)$ (marginal likelihood × prior over structures)
  – Marginal likelihood: $P(D \mid h) = \int P(D \mid h, \Theta)\, P(\Theta \mid h)\, d\Theta$ (likelihood integrated against the prior over parameters)

Genetic and Evolutionary Computation [1]: Simple Genetic Algorithm (SGA)
• Algorithm Simple-Genetic-Algorithm (Fitness, Fitness-Threshold, p, r, m)
  // p: population size; r: replacement rate (aka generation gap width); m: mutation rate
  – P ← p random hypotheses // initialize population
  – FOR each h in P DO f[h] ← Fitness(h) // evaluate Fitness: hypothesis → ℝ
  – WHILE (Max(f) < Fitness-Threshold) DO
    1. Select: probabilistically select (1 − r)·p members of P to add to Pₛ, choosing hᵢ with probability
       $P(h_i) = \frac{f(h_i)}{\sum_{j=1}^{p} f(h_j)}$
    2. Crossover:
       – Probabilistically select (r · p)/2 pairs of hypotheses from P
       – FOR each pair <h1, h2> DO Pₛ += Crossover(<h1, h2>) // Pₛ[t+1] = Pₛ[t] + <offspring1, offspring2>
    3. Mutate: invert a randomly selected bit in m · p random members of Pₛ
    4. Update: P ← Pₛ
    5. Evaluate: FOR each h in P DO f[h] ← Fitness(h)
  – RETURN the hypothesis h in P that has maximum fitness f[h]
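A minimal runnable sketch of the SGA pseudocode above over bit-string hypotheses. The count-the-ones fitness, the default parameter values, and the 36-bit string length (chosen to echo the occupancy vectors) are assumptions for illustration.

```python
import random

def simple_ga(fitness, threshold, p=100, r=0.6, m=0.01, bits=36):
    """Simple GA: p = population size, r = replacement rate, m = mutation rate."""
    pop = [[random.randint(0, 1) for _ in range(bits)] for _ in range(p)]
    while True:
        f = [fitness(h) for h in pop]   # evaluate Fitness: hypothesis -> R
        if max(f) >= threshold:
            return pop[f.index(max(f))]
        # 1. Select: (1 - r) * p members, with probability f(h_i) / sum_j f(h_j).
        survivors = [h[:] for h in random.choices(pop, weights=f, k=round((1 - r) * p))]
        # 2. Crossover: (r * p) / 2 pairs, single-point crossover per pair.
        offspring = []
        for _ in range(round(r * p / 2)):
            h1, h2 = random.choices(pop, weights=f, k=2)
            cut = random.randrange(1, bits)
            offspring += [h1[:cut] + h2[cut:], h2[:cut] + h1[cut:]]
        next_pop = survivors + offspring
        # 3. Mutate: invert one randomly chosen bit in m * p random members of P_S.
        for h in random.sample(next_pop, max(1, round(m * p))):
            i = random.randrange(bits)
            h[i] = 1 - h[i]
        pop = next_pop                  # 4. Update: P <- P_S
```

Usage: `simple_ga(fitness=sum, threshold=36)` evolves a 36-bit string of all ones; any real fitness function over bit strings can be substituted.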
Genetic and Evolutionary Computation [2]: Genetic Programming (GP)
[Figure-only slide: genetic programming overview]
Adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez, http://www.geneticprogramming.com

Artificial Neural Networks [1]: Backpropagation of Error
• Recall: backprop training rule derived from the error gradient formula
• Algorithm Train-by-Backprop (D, r)
  – r: constant learning rate (e.g., 0.05)
  – Initialize all weights wᵢ to (small) random values
  – UNTIL the termination condition is met, DO
    • FOR each <x, t(x)> in D, DO
      – Input the instance x to the network and compute the output o(x) = σ(net(x))
      – FOR each output unit k, DO
        $\delta_k = o_k(x)\,(1 - o_k(x))\,(t_k(x) - o_k(x))$
      – FOR each hidden unit j, DO
        $\delta_j = h_j(x)\,(1 - h_j(x)) \sum_{k \in \text{outputs}} v_{j,k}\, \delta_k$
      – Update each weight w = uᵢ,ⱼ (input-to-hidden, with input activation a = xᵢ) or w = vⱼ,ₖ (hidden-to-output, with input activation a = hⱼ):
        $w \leftarrow w + \Delta w$, where $\Delta w = r\, \delta_{\text{end-layer}}\, a$
  – RETURN final u, v
[Figure: feedforward network with input layer (x1, x2, x3), hidden layer (h1, …, h4), and output layer (o1, o2); u denotes input-to-hidden weights (e.g., u11), v hidden-to-output weights (e.g., v42)]

Artificial Neural Networks [2]: Recurrence and Time Series
• Representing time series with ANNs
  – Feedforward ANN: y(t + 1) = net(x(t))
  – Need to capture temporal relationships
• Solution approaches
  – Directed cycles
  – Feedback
    • Output-to-input [Jordan]
    • Hidden-to-input [Elman]
    • Input-to-input
  – Captures time-lagged relationships
    • Among x(t′ ≤ t) and y(t + 1)
    • Among y(t′ ≤ t) and y(t + 1)
  – Learning with recurrent ANNs
    • Elman, 1990; Jordan, 1987
    • Principe and deVries, 1992
    • Mozer, 1994; Hsu and Ray, 1998

Mixture Models, Instance-Based Learning (IBL), and Radial Basis Functions (RBFs)
• Mixture model construction [cf. McCullagh and Nelder, 1990]
  – Update-Inducer: a single training step for each expert module (e.g., for an ANN, one backprop cycle, aka epoch)
  – Compute-Activation: depends on the mixture-of-experts (ME) architecture
    • Idea: smoothing of "winner-take-all" ("hard" max)
    • Softmax activation function (Gaussian mixture model):
      $g_l = \frac{e^{w_l \cdot x}}{\sum_{j=1}^{k} e^{w_j \cdot x}}$
[Figure: gating network computing g1, g2 over two expert networks producing y1, y2, all fed by input x]
• Possible modifications
  – Batch (as opposed to online) updates: lift Update-Weights out of the outer FOR loop
  – Classification learning (versus concept learning): multiple yⱼ values
  – Arrange gating networks (combiner inducers) in a hierarchy (HME)

Kernel Methods: Radial Basis Function (RBF) Networks and Support Vector Machines (SVM)
• What are RBF networks?
  – A global approximation to the target function f, in terms of a linear combination of local approximations
  – Typical uses: image and signal classification
  – A different kind of artificial neural network (ANN)
  – Closely related to distance-weighted regression, but "eager" instead of "lazy"
• Activation function
  [Figure: RBF network whose input attributes a1(x), a2(x), …, an(x) feed k kernel units]
  – $f(x) = w_0 + \sum_{u=1}^{k} w_u\, K_u(d(x_u, x))$
    where the aᵢ(x) are attributes describing instance x
  – Common choice for Kᵤ: the Gaussian kernel function
    $K_u(d(x_u, x)) = e^{-\frac{d(x_u, x)^2}{2\sigma_u^2}}$
[Mitchell, 1997]
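A minimal sketch of evaluating the RBF form above with Gaussian kernels (NumPy; the centers, weights, and widths are illustrative placeholders, and training, e.g., fixing the centers by clustering and solving for the weights linearly, is left out):

```python
import numpy as np

def rbf_predict(x, centers, weights, w0, sigmas):
    """f(x) = w0 + sum_u w_u * exp(-d(x_u, x)^2 / (2 * sigma_u^2))."""
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distances d(x_u, x)^2
    k = np.exp(-d2 / (2.0 * sigmas ** 2))     # Gaussian kernel activations K_u
    return w0 + weights @ k

# Usage with k = 3 kernels over 2-D instances; all numbers are made up.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # prototypes x_u
weights = np.array([0.5, -0.2, 0.8])
sigmas = np.array([0.3, 0.3, 0.5])
print(rbf_predict(np.array([0.1, 0.9]), centers, weights, w0=0.1, sigmas=sigmas))
```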
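The classification results on the slides below discretize the numeric energy target with WEKA's unsupervised Discretize filter using equal-width bins (3 and 10 of them). As a bridge to those slides, here is a minimal sketch of equal-width binning, assuming cut points computed from the observed range; the random energies stand in for the real data.

```python
import numpy as np

def equal_width_bins(y, n_bins):
    """Assign each value to one of n_bins equal-width intervals,
    matching bin labels like '(-inf-0.333333]' for n_bins = 3."""
    lo, hi = y.min(), y.max()
    edges = lo + (hi - lo) * np.arange(1, n_bins) / n_bins
    # digitize with right=True gives right-closed intervals (a, b].
    return np.digitize(y, edges, right=True), edges

# Usage: 713 instances (as in the results below) with normalized energies.
y = np.random.default_rng(0).random(713)
labels, edges = equal_width_bins(y, 3)
print(edges)                # cut points near 0.333333 and 0.666667
print(np.bincount(labels))  # instance count per bin
```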
Results [1]: Supervised Learning – Energy Estimation
• Results for 36-bit occupancy vector, 10-fold cross-validation
• Target attribute: external energy function (numeric)
• Source data: Baza C500, Step16MDD

Results [2]: Unsupervised Learning – Clustering
• Cluster tree for the 36-bit occupancy vector database, from Cobweb (WEKA): 238 merges, 186 splits, 1106 clusters
• Visualization of energy (x) vs. cluster membership (y), using clusters found by expectation-maximization (EM) in WEKA: 20 clusters, log likelihood = -11.66
• Customizable visualization interface (future work): King

Results [3]: Unsupervised and Supervised – Classification
Results for 36-bit occupancy vector, unsupervised Discretize (scalar quantization) filter
Algorithm: J48 (cf. Quinlan's C4.5)

3 bins:
  Correctly Classified Instances     559    78.4011 %
  Incorrectly Classified Instances   154    21.5989 %
  Mean absolute error                0.1683
  Root mean squared error            0.3532

  === Confusion Matrix ===
      a    b    c    <-- classified as
    115   15    3 |  a = '(-inf-0.333333]'
     14  221   59 |  b = '(0.333333-0.666667]'
      2   61  223 |  c = '(0.666667-inf)'

10 bins:
  Correctly Classified Instances     362    50.7714 %
  Incorrectly Classified Instances   351    49.2286 %
  Mean absolute error                0.1065
  Root mean squared error            0.281

  === Confusion Matrix ===
     a   b   c   d   e   f   g   h   i   j   <-- classified as
    26   2   1   2   1   0   0   0   0   0 |  a = '(-inf-0.1]'
     2  22   2   2   0   0   1   0   0   0 |  b = '(0.1-0.2]'
     0   1  32  10   2   4   1   0   2   0 |  c = '(0.2-0.3]'
     2   7  10  20   7   3   3   1   1   0 |  d = '(0.3-0.4]'
     0   0   0   7  23  15   7   4   0   0 |  e = '(0.4-0.5]'
     0   0   2   1  15  71  21   9   7   0 |  f = '(0.5-0.6]'
     0   2   1   3   7  19  60  24  13   1 |  g = '(0.6-0.7]'
     0   0   2   0   3   8  27  40  19   4 |  h = '(0.7-0.8]'
     0   0   0   0   3   7  11  20  57   6 |  i = '(0.8-0.9]'
     0   0   0   1   1   2   1   4   7  11 |  j = '(0.9-inf)'

Plan and Progress
• Progress to date
  – Supervised, unsupervised learning: baseline experiments using WEKA
  – Draft design of extended data model and annotators
• Timeline
  – 4th Q 2004 - 1st Q 2005: from SL-KMC to LEAP-KMC
    • Understanding, redesigning KMC data models: database, trace files, etc.
    • Preliminary experiments using traditional inducers
  – 2nd Q 2005: extensible data models and scalable learning algorithms
    • Data model for 36-bit, 200-bit geometry: transitions, in-betweening
    • Refined learning algorithms: interpolation, prediction
  – 2nd half of 2005: dynamic models, general ANN and GP architecture; information visualization
  – 2005-2006: theory-guided constructive induction application
  – 2006-2007: 3-D spatial representations (esp. graphical models)
  – 2007 and beyond: learning equivalence classes, abstractions

Continuing Work: Change of Representation (Parameter Tuning Wrappers)
[Figure: genetic wrapper for change of representation and inductive bias control. Training data D is split into Dtrain (inductive learning) and Dval (inference); a representation evaluator for learning problems, given an inference specification, scores each candidate representation α with fitness f(α); a genetic algorithm searches for the optimized representation α̂]
Hsu, Guo, Perry, Stilson (GECCO 2002); Hsu (Information Sciences, 2005)

Future Work: 3-D Modeling and Graphical Models
• Dynamic Bayes net for predicting 3-D energetics
• Continuing work: speeding up approximate inference using edge deletion (J. Thornton, 2005)
• Bayesian Network tools in Java (BNJ) v4 (W. Hsu, J. M. Barber, J. Thornton, 2005)

References: Machine Learning, KDD, Monte Carlo Simulation
• Machine Learning, Data Mining, and Knowledge Discovery
  – K-State KDD Lab: literature survey and resource catalog (2005), http://www.kddresearch.org/Resources
  – Bayesian Network tools in Java (BNJ): Hsu, Barber, Guo, King, Thornton, http://bnj.sourceforge.net
  – KDD Lab thesis repository: http://www.kddresearch.org
• Learning in Computational Physics
  – Illinois Genetic Algorithms Lab (IlliGAL): http://www-illigal.ge.uiuc.edu
  – K-State Visualization Lab: http://www.kddresearch.org/Groups/Visualization

Acknowledgements
• Kansas State University Laboratory for Knowledge Discovery in Databases
  – Undergraduate programmers: Andrew L. King ([email protected]), Joanne Stone ([email protected])
  – Other graduate students: Jeffrey M. Barber ([email protected])
  – Graduate research assistants: Jason Li ([email protected]), Waleed Al-Jandal ([email protected])
• Joint work with
  – KSU Department of Physics: Dr. Talat Rahman, Dr. Abdelkader Kara, Dr. Ahlam Al-Rawi, Altaf Karim
  – KSU Computing and Information Sciences: Dr. Virgil Wallentine, Charlie Thornton, Arthi Ramakrishnan
• Other research affiliates
  – University of Toledo: Dr. Jacques Amar
  – Russian Academy of Sciences: Dr. Oleg Trushin
  – Hong Kong University of Science and Technology: Dr. Haipeng Guo