Operations Research & Data Mining

Siggi Olafsson
Associate Professor, Department of Industrial Engineering, Iowa State University
20th European Conference on Operational Research, Rhodes, Greece, July 4-7, 2004

Purpose of Talk
- Should I be here?
- Give a definition and an overview of data mining as it relates to operations research
- Present some examples to give a flavor of the type of work that is possible
- Offer my views on the future of OR and data mining
- Aim for the talk to be accessible without prior knowledge of data mining

Overview
- Background
- The intersection of OR and data mining:
  - Optimization algorithms used for data mining: data visualization, attribute selection, classification, unsupervised learning
  - Data mining used in OR applications: production scheduling
  - Optimization methods applied to the output of standard data mining algorithms: selecting and improving decision trees
- Open research areas

Background
- Rapidly growing interest in data mining among operations research academics and practitioners
- Evidenced, for example, by the increased data mining presence in professional organizations:
  - The new INFORMS Section on Data Mining
  - A large number of data mining sessions at INFORMS and IIE research conferences
  - Special issues of Computers & Operations Research, IIE Transactions, Discrete Applied Mathematics, etc.
  - Numerous presentations and sessions at this conference

What is Data Mining?

What is Data Mining, Really?
- Extracting meaningful, previously unknown patterns or knowledge from large databases

The Knowledge Discovery Process
- Define objective: business/scientific objective, data mining objective
- Prepare data: data cleaning, data selection, attribute selection, visualization
- Mine knowledge: classification, association rule discovery, clustering
- Interpret results: predictive models, structural insights

Interdisciplinary Field
- Data mining sits at the intersection of statistics, machine learning, databases, and optimization

Input Engineering
- Preparing the data may take as much as 70% of the entire effort
- Numerous steps, including:
  - Combining data sources
  - Transforming attributes
  - Data cleaning
  - Data selection
  - Attribute selection
  - Data visualization
- Many of these steps have connections with operations research, and with optimization in particular

Data Visualization
- Visualizing the data is important in any data mining project
- Generally difficult because the data is typically high-dimensional, i.e., has hundreds or thousands of attributes (variables)
- How can we best visualize such data in 2 or 3 dimensions?
- Traditional techniques include multidimensional scaling, which uses nonlinear optimization

Optimization Formulation
- A recent combinatorial optimization formulation by Abbiw-Jackson, Golden, Raghavan, and Wasil (2004)
- Map a set M of m points from R^r to R^q, q = 2, 3
- Approximate the q-dimensional space by a lattice N:

  \min \sum_{i \in M} \sum_{j \in M, j \ne i} \sum_{k \in N} \sum_{l \in N} F\bigl(d_{\text{original}}(i,j),\, d_{\text{new}}(k,l)\bigr)\, x_{ik} x_{jl}
  \text{s.t.} \; \sum_{k \in N} x_{ik} = 1, \; i \in M; \qquad x_{ik} \in \{0,1\}

  where d_original(i,j) is the distance measure in R^r, d_new(k,l) is the distance measure in R^q, and F is a function such as least squares, the Sammon map, etc.

Solution Methods
- The formulation is a Quadratic Assignment Problem (QAP)
- Not possible to solve exactly for large-scale problems; a local search procedure is proposed
- Key to the formulation is the selection of the objective function, e.g., the Sammon map:

  \min \; \frac{1}{\sum_{i \in M} \sum_{j \in M, j \ne i} d_{\text{original}}(i,j)} \sum_{i \in M} \sum_{j \in M, j \ne i} \sum_{k \in N} \sum_{l \in N} \frac{\bigl(d_{\text{original}}(i,j) - d_{\text{new}}(k,l)\bigr)^2}{d_{\text{original}}(i,j)} \, x_{ik} x_{jl}

Attribute Selection
- There is usually a large number of attributes
- Some attributes are redundant or irrelevant and should be removed
- Benefits:
  - Faster subsequent induction
  - Simpler models (important in data mining)
  - Better (predictive) performance of the models
  - Discovering which attributes are important (descriptive or structural knowledge)

Optimization Formulation
- Define the decision variable

  x_j = \begin{cases} 1, & \text{if attribute } j \text{ is selected,} \\ 0, & \text{otherwise.} \end{cases}

- This gives the combinatorial optimization problem

  \max_{x} f(x), \quad x = (x_1, x_2, \ldots, x_n), \quad \text{s.t. } x_j \in \{0,1\} \; \forall j

- The number of (non-empty) solutions is 2^n - 1
- How should the objective function f be defined?

Solution Methods
- The objective function is nonlinear, and defining a good objective is a major issue
- Mathematical programming approach (Bradley, Mangasarian, and Street, 1998)
- Metaheuristics have been applied extensively: genetic algorithms, simulated annealing
- Nested partitions method (Olafsson and Yang, 2004):
  - Intelligent partitioning: takes advantage of what is known in data mining about evaluating attributes
  - Random instance sampling: in each step the algorithm uses a sample of instances, which improves scalability

Learning from Data
- Each data point (instance) represents an example from which we can learn
- The instances are either:
  - Labeled (supervised learning): one attribute is of special interest (called the class or target) and each instance is labeled by its class value
  - Unlabeled (unsupervised learning)
- Instances are assumed to be independent (however, spatial and temporal data mining are active areas of research)

Learning Tasks in Data Mining
- Classification (supervised learning): learn how to classify data into one of a given number of categories or classes
- Clustering (unsupervised learning): learn natural groupings (clusters) of the data
- Association rule discovery: learn correlations (associations) among the data instances; also called market basket analysis
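The lattice formulation above can be prototyped with a simple improvement-only local search over assignments of points to lattice cells. This is a minimal sketch, not the Abbiw-Jackson et al. heuristic: the toy data, the lattice size, and the function names (`sammon_stress`, `map_to_lattice`) are illustrative choices, not from the paper.

```python
import math
import random

def sammon_stress(D, pos, lattice):
    """Sammon stress of mapping point i to lattice cell pos[i].

    D[i][j] are the pairwise distances in the original space;
    lattice is a list of 2-D coordinates approximating the plane."""
    m = len(D)
    scale = sum(D[i][j] for i in range(m) for j in range(i + 1, m))
    s = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            d_new = math.dist(lattice[pos[i]], lattice[pos[j]])
            s += (D[i][j] - d_new) ** 2 / D[i][j]
    return s / scale

def map_to_lattice(points, lattice, iters=5000, seed=1):
    """Greedy local search: relocate one point at a time if stress improves."""
    rng = random.Random(seed)
    D = [[math.dist(p, q) for q in points] for p in points]
    pos = rng.sample(range(len(lattice)), len(points))  # the x_ik assignment
    best = sammon_stress(D, pos, lattice)
    for _ in range(iters):
        i = rng.randrange(len(points))
        k = rng.randrange(len(lattice))
        if k in pos:
            continue                     # keep each point in its own cell
        old, pos[i] = pos[i], k
        cand = sammon_stress(D, pos, lattice)
        if cand < best:
            best = cand                  # accept the improving move
        else:
            pos[i] = old                 # undo a non-improving move
    return pos, best

# Toy run: map 4 points from R^3 onto a 5x5 integer lattice.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5)]
lattice = [(x, y) for x in range(5) for y in range(5)]
pos, stress = map_to_lattice(points, lattice)
```

A real implementation would evaluate moves incrementally rather than recomputing the full stress, which is what makes the QAP view useful.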
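For very small n, the 2^n - 1 candidate attribute subsets from the formulation above can simply be enumerated. The sketch below does so with a toy "relevance minus redundancy" merit objective in the spirit of correlation-based feature selection; the scores and function names are hypothetical stand-ins for whatever f(x) the analyst chooses, not a real evaluation on data.

```python
from itertools import combinations

# Hypothetical per-attribute relevance and pairwise redundancy scores
# (in practice these would be estimated from the data, e.g. correlation
# with the class and correlations between attributes).
relevance = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.0}
redundancy = {("a", "b"): 0.85, ("a", "c"): 0.1,
              ("b", "c"): 0.15, ("c", "d"): 0.2}

def merit(subset):
    """Toy objective f(x): total relevance minus total redundancy."""
    rel = sum(relevance[j] for j in subset)
    red = sum(v for (p, q), v in redundancy.items()
              if p in subset and q in subset)
    return rel - red

def best_subset(attrs):
    """Enumerate all 2^n - 1 non-empty subsets (feasible only for small n)."""
    best, best_val = None, float("-inf")
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            v = merit(subset)
            if v > best_val:
                best, best_val = subset, v
    return best, best_val

subset, value = best_subset(["a", "b", "c", "d"])
# Here {a, c} wins: a and b are relevant but highly redundant,
# and d contributes nothing.
```

The exponential growth of the search space is exactly why metaheuristics and the nested partitions method are used instead of enumeration.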
Classification
- Classification is the most common learning task in data mining
- Many methods have been proposed: decision trees, neural networks, support vector machines, Bayesian networks, etc.
- The algorithm is trained on part of the data and its accuracy is tested on independent data (or via cross-validation)
- Optimization is relevant to many classification methods

Optimization Formulation
- Suppose we have n attributes and each instance has been labeled as belonging to one of two classes
- Represent the two classes by two matrices A and B
- We need to learn what separates the points in the two sets (if they can be separated)
- In a 1965 Operations Research article, Olvi Mangasarian studied the case where the two sets can be separated by a hyperplane \{x : w^T x = \gamma\}, i.e., where there exist w \ne 0 and \gamma such that

  Aw \ge e\gamma, \qquad Bw \le e\gamma

Separating Hyperplane
- [Figure: classes A and B with their convex hulls; the separating hyperplane is the perpendicular bisector of the segment between the closest points c and d of the two hulls]

Finding the Closest Points
- Formulate as a QP:

  \min_{\alpha} \; \tfrac{1}{2} \| c - d \|^2
  \text{s.t.} \; c = \sum_{i \in \text{Class A}} \alpha_i x_i, \quad d = \sum_{i \in \text{Class B}} \alpha_i x_i,
  \sum_{i \in \text{Class A}} \alpha_i = 1, \quad \sum_{i \in \text{Class B}} \alpha_i = 1, \quad \alpha_i \ge 0

Support Vector Machines
- [Figure: the separating hyperplane between classes A and B; the points that determine it are the support vectors]

Limitations
- The points (instances) may not be separable by a hyperplane; add error terms to be minimized
- A linear separation is quite limited
- [Figure: two classes in the plane that no single line can separate]
- The solution is to map the data to a higher-dimensional space

Wolfe Dual Problem
- First formulate the Wolfe dual:

  \max_{\alpha} \; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j
  \text{s.t.} \; 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0

- Now the data appears only in the dot products in the objective function

Kernel Functions
- Use kernel functions to map the data and replace the dot product:

  K(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y}), \qquad \Phi : R^n \to H

- For example:

  K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p
  K(\mathbf{x}, \mathbf{y}) = e^{-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2}
  K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa \, \mathbf{x} \cdot \mathbf{y} - \delta)

Other Classification Work
- Extensive publications exist on SVMs and mathematical programming for classification
- Several other approaches are also relevant, e.g.:
  - Logical Analysis of Data (LAD), which learns logical expressions to classify the target attribute (series of papers by Hammer, Boros, et al.)
  - The related logic data miner Lsquare (e.g., the talk by Felici, Truemper, and Paola last Monday)
- Bayesian networks are often used, and finding the best structure of such a network is a combinatorial optimization problem (further discussed in the next talk)

Data Clustering
- Now we do not have labeled data to train on (unsupervised learning)
- We want to identify natural clusters, or groupings, of the data instances
- There are many possible sets of clusters; what makes a set of clusters good?

Optimization Formulation
- Given a set A of m points, find the centers C_j of k clusters that minimize the 1-norm distances:

  \min_{C, D} \; \sum_{i=1}^{m} \; \min_{j = 1, \ldots, k} \; e^T D_{ij}
  \text{s.t.} \; -D_{ij} \le A_i^T - C_j \le D_{ij}, \quad i = 1, \ldots, m; \; j = 1, \ldots, k

- This formulation is due to Bradley, Mangasarian, and Street (1997)
- Much more work is needed in this area

Association Rule Discovery
- Find strong associations among instances (e.g., rules with high support and confidence)
- Originally used in market basket analysis, e.g., which products are candidates for cross-selling, up-selling, etc.
- Define an item as an attribute-value pair
- Algorithmic approach (Agrawal et al., 1992; Apriori and related methods):
  - Generate frequent item sets with high support
  - Generate rules with high confidence from these sets

Objectives for Association Rules
- We want both high support and high confidence
- Maximizing support alone would discover only a few trivial rules (those that occur very frequently)
- Maximizing confidence alone leads to obvious rules (those that are 100% accurate)
- Support and confidence are therefore usually treated as constraints (user-specified minimums)
- We still need measures of good rules, i.e., rules that add insight and are hence interesting
- There are significant opportunities for optimizing the rules that are obtained (not much work, yet)

Data Mining for OR Applications
- Data mining can be used to complement traditional OR methods in many areas
- Example application areas:
  - E-commerce
  - Supply chain management (e.g., to enable customer-value management in the chain)
  - Production scheduling
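The kernel trick from the classification slides can be made concrete in a few lines. The sketch below implements the three kernels listed there and, for the degree-2 polynomial kernel on R^2, checks that the kernel value equals the dot product under an explicit feature map Φ; the function names are illustrative choices of mine.

```python
import math

def k_poly(x, y, p=2):
    """Polynomial kernel K(x, y) = (x . y + 1)^p."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** p

def k_rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y))
                    / (2 * sigma ** 2))

def k_sigmoid(x, y, kappa=1.0, delta=0.0):
    """Sigmoid kernel K(x, y) = tanh(kappa x . y - delta)."""
    return math.tanh(kappa * sum(a * b for a, b in zip(x, y)) - delta)

def phi(x):
    """Explicit feature map into H for the degree-2 polynomial kernel on R^2:
    phi(x) = (1, sqrt(2) x1, sqrt(2) x2, x1^2, x2^2, sqrt(2) x1 x2)."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)

x, y = (1.0, 2.0), (3.0, -1.0)
lhs = k_poly(x, y)                                 # kernel computed directly
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))   # dot product in H
# lhs == rhs: the kernel evaluates the dot product in the mapped space
# without ever constructing phi(x) explicitly
```

This identity is why the Wolfe dual matters: since the data enters only through dot products, replacing them with K gives a nonlinear separator at linear cost.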
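Support and confidence, the two measures constrained above, are straightforward to compute directly. A minimal sketch over a hypothetical market-basket data set (the transactions and item names are invented for illustration):

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimate P(rhs | lhs) = support(lhs U rhs) / support(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

# Hypothetical transactions: the set of items in each market basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# The rule {bread} -> {milk}:
s = support({"bread", "milk"}, transactions)       # 2 of 4 baskets
c = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread baskets
```

An Apriori-style algorithm would first enumerate the item sets whose support clears the user-specified minimum and only then generate rules whose confidence does.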
Data Mining for Scheduling
- Production scheduling is often ad hoc in practice, relying on the experience and intuition of human schedulers
- Li and Olafsson (2004) propose a method to learn directly from production data
- Benefits:
  - Makes scheduling practices explicit
  - Can be incorporated into an automatic scheduling system
  - Provides insights into operations
  - Can improve schedules

Background
- The scheduling task: given a finite set of jobs, sequence the jobs in order of priority
- Many simple dispatching rules are available
- Machine learning in scheduling:
  - Considerable work over the past two decades: expert systems, inductive learning, selecting dispatching rules from simulated data
  - It has not previously been applied directly to scheduling data (which would be data mining)

Simple Example: Dispatching List

  Job ID | Release Time | Start Time | Processing Time | Completion Time
  J5     | 0            | 0          | 17              | 17
  J1     | 10           | 17         | 15              | 32
  J3     | 18           | 32         | 20              | 52
  J4     | 0            | 52         | 7               | 59
  J2     | 30           | 59         | 5               | 64

- How were these five jobs scheduled? Longest processing time first (LPT)

Data Mining Formulation
- Determine the target concept: dispatching rules are a pairwise comparison
- Learning task: given two jobs, which job should be dispatched first?
- Data preparation: construct a flat file in which each line (instance/data object) is an example of the target concept

Prepared Data File

  Job 1 | Processing Time 1 | Release 1 | Job 2 | Processing Time 2 | Release 2 | Job 1 Scheduled First?
  J1    | 15                | 10        | J2    | 5                 | 30        | Yes
  J1    | 15                | 10        | J3    | 20                | 18        | Yes
  J1    | 15                | 10        | J4    | 7                 | 0         | Yes
  J1    | 15                | 10        | J5    | 17                | 0         | No
  J2    | 5                 | 30        | J1    | 15                | 10        | No
  J2    | 5                 | 30        | J3    | 20                | 18        | No
  J2    | 5                 | 30        | J4    | 7                 | 0         | No

Input Engineering
- Attribute creation (i.e., composite attributes) and attribute selection are an important part of data mining
- Added attributes: ProcessingTimeDifference, ReleaseDifference, Job1Longer, Job1ReleasedFirst
- Select the best subset of attributes
- Apply the C4.5 decision tree algorithm

Decision Tree
- [Figure: the induced decision tree. The root tests Job1Longer; subsequent nodes test Job1ReleasedFirst and ProcessingTimeDifference (thresholds 5 and -8). The branches encode: LPT among released jobs; do not wait for Job 1 if it is not much longer than Job 2; wait for Job 1 to be released if it is much longer than Job 2]

Structural Knowledge
- The dispatching rule is LPT: mine data that uses this rule, together with the processing time and release time data
- The induced model takes into account:
  - The possible range of processing times
  - The largest delay caused by a not-yet-released job
- New structural patterns, not explicitly known by the dispatcher, are discovered
- The next step is to improve the schedules:
  - Instance selection: learn from best practices
  - Optimize the decision tree
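The flat-file construction above can be sketched directly from the example dispatching list. The dictionary keys (`proc1`, `job1_first`, etc.) are my own stand-ins for the attribute names on the slides, and the composite attribute shown is one of the ones added during input engineering.

```python
from itertools import permutations

# Observed dispatching list (job, release time, processing time),
# in the order the jobs were actually started (J5 first, J2 last).
schedule = [("J5", 0, 17), ("J1", 10, 15), ("J3", 18, 20),
            ("J4", 0, 7), ("J2", 30, 5)]

position = {job: i for i, (job, _, _) in enumerate(schedule)}
release = {job: r for job, r, _ in schedule}
processing = {job: p for job, _, p in schedule}

def pairwise_instances():
    """Flatten the schedule into one instance per ordered pair of jobs.

    Each instance is an example of the target concept: given two jobs,
    which one should be dispatched first?"""
    rows = []
    for j1, j2 in permutations(position, 2):
        rows.append({
            "job1": j1, "proc1": processing[j1], "rel1": release[j1],
            "job2": j2, "proc2": processing[j2], "rel2": release[j2],
            # a composite attribute of the kind added during input engineering
            "proc_diff": processing[j1] - processing[j2],
            # class label: was job 1 dispatched before job 2?
            "job1_first": position[j1] < position[j2],
        })
    return rows

rows = pairwise_instances()   # 5 * 4 = 20 instances
```

Feeding these rows to a decision tree learner such as C4.5 is then a standard classification task.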
Optimizing Decision Trees
- Decision tree induction is often unstable
- Genetic algorithms have been used to select the best tree from a set of trees:
  - Kennedy et al. (1997) encode decision trees and define crossover and mutation operators; the accuracy of the tree is the fitness function
  - A series of papers by Fu, Golden, et al. (2003; 2004a; 2004b) builds further on this approach
- Other optimization methods could also apply, and other outputs can be optimized

Conclusions
- Although data mining-related optimization work dates back to the 1960s, most problems are still open or need more research
- We need to be aware of the key concerns of data mining: extracting meaningful, previously unknown patterns or knowledge from large databases
  - Algorithms should handle massive data sets, that is, be scalable with respect to both time and memory use
  - "Meaningful" often means a focus on simple-to-interpret patterns that provide structural insights
  - "Previously unknown" means making few modeling assumptions that restrict what can be discovered

Open Problems
- Many data mining problems can be formulated as optimization problems
  - We have seen numerous examples, e.g., classification and attribute selection (where most of the existing work lies)
  - Many areas have not been addressed or need more work (in particular, clustering and association rule mining)
- Optimizing model outputs is very promising
- The use of data mining in OR applications has seen very little investigation: supply chain management, logistics and transportation, planning and scheduling

Questions?
- For more information after today:
  - Email me at [email protected]
  - Visit my homepage at http://www.public.iastate.edu/~olafsson
  - Consult Dilbert

Select References

Surveys on optimization and data mining:

1. Padmanabhan, B. and A. Tuzhilin (2003). "On the Use of Optimization for Data Mining: Theoretical Interactions and eCRM Opportunities," Management Science 49: 1327-1343.
2. Bradley, P.S., U.M. Fayyad, and O.L. Mangasarian (1999). "Mathematical Programming for Data Mining: Formulations and Challenges," INFORMS Journal on Computing 11: 217-238.

Work mentioned in the presentation:

3. Abbiw-Jackson, B. Golden, S. Raghavan, and E. Wasil (2004). "A Divide-and-Conquer Local Search Heuristic for Data Visualization," Working Paper, University of Maryland.
4. Boros, E., P.L. Hammer, T. Ibaraki, and A. Kogan (1997). "Logical Analysis of Numerical Data," Mathematical Programming 79: 163-190.
5. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1997). "Clustering via Concave Minimization," in M.C. Mozer, M.I. Jordan, and T. Petsche (eds.), Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA.
6. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1998). "Feature Selection via Mathematical Programming," INFORMS Journal on Computing 10: 209-217.
7. Fu, Z., B. Golden, S. Lele, S. Raghavan, and E. Wasil (2003). "A Genetic Algorithm-Based Approach for Building Accurate Decision Trees," INFORMS Journal on Computing 15: 3-22.
8. Kennedy, H., C. Chinniah, P. Bradbeer, and L. Morss (1997). "The Construction and Evaluation of Decision Trees: A Comparison of Evolutionary and Concept Learning Methods," in D. Corne and J.L. Shapiro (eds.), Evolutionary Computing, Lecture Notes in Computer Science, Springer-Verlag, 147-161.
9. Li, X. and S. Olafsson (2004). "Discovering Dispatching Rules Using Data Mining," Journal of Scheduling, to appear.
10. Mangasarian, O.L. (1965). "Linear and Nonlinear Separation of Patterns by Linear Programming," Operations Research 13: 455-461.
11. Olafsson, S. and J. Yang (2004). "Intelligent Partitioning for Feature Selection," INFORMS Journal on Computing, to appear.