Applications of Artificial Intelligence
Data Mining
©J.K. Debenham, 2003

• The Problem
• Data Mining
• Knowledge Discovery
• Data Mining Methods
• Knowledge Discovery Methods

The Problem
• How do we identify useful patterns in large amounts of noisy, incomplete raw data?

Remember
• No inductive technique can produce a definitive “answer”.
• No inductive technique can make a firm decision.

Topic: Data Mining

• The growth of databases has far outpaced our ability to interpret this data.
• There is a need for a new generation of tools and techniques for automated and intelligent database analysis.
• These tools and techniques are the subject of the rapidly emerging field of knowledge discovery in databases (KDD).

Advances in data collection:
– remote sensors
– widespread use of bar codes
– computerisation of transactions
have generated a flood of data.

Advances in data storage technology:
– faster, higher capacity, and cheaper storage devices
– better database management systems
– data warehousing technology
have allowed us to transform this data deluge into “mountains” of stored data.

Eg:
– Wal-Mart handles over 20 million transactions a day.
– Mobil Oil Corporation is developing a data warehouse capable of storing over 100 terabytes of data related to oil exploration.
– The NASA Earth Observing System (EOS) was projected to generate of the order of 50 gigabytes of data per hour when operational in the late 1990s.

History
• First International Conference on Knowledge Discovery and Data Mining (1995)

Finding useful patterns (or nuggets of knowledge) in raw data:
– knowledge discovery in databases
– data mining
– knowledge extraction
– information discovery
– information harvesting
– data archaeology
– data pattern processing

Knowledge Discovery in Databases (KDD)
– term coined in 1989
– refers to the broad process of finding knowledge in data
– emphasises the “high-level” application of particular data mining methods
– term used by artificial intelligence and machine learning researchers

The term Data Mining
– has been commonly used by statisticians, data analysts and the MIS community
– “fishing” or “dredging”, and sometimes “mining”, can be a dangerous activity in that invalid patterns can be discovered without proper interpretation

KDD
– the overall process of discovering useful knowledge from data
– = data mining + additional steps to ensure that useful information (knowledge) is derived from the data
– typically interactive and iterative, involving the repeated application of data mining and the interpretation of the patterns generated

KDD is related to machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition for expert systems, and data visualisation.

Data warehousing
– MIS trend for collecting and cleaning transactional data and making them available on-line
– OLAP (on-line analytical processing) provides multi-dimensional data analysis, which is superior to SQL for computing summaries and breakdowns along many dimensions
– KDD and OLAP: a new generation of intelligent information extraction and management tools

[Figure: A simple data set with 2 classes]
– horizontal axis represents the income of the person
– vertical axis represents the total personal debt of the person
– x’s represent persons who have defaulted on their loans
– o’s represent persons whose loans are in good status

Topic: Knowledge Discovery

Knowledge discovery in databases
– the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data

• Data: a set of facts F (eg, cases in a database). Eg: the 23 cases.
• Pattern: an expression E in a language L describing facts in a subset F_E of F. E is called a pattern if it is simpler than the enumeration of all facts in F_E. Eg the pattern: “If income < $t, then person has defaulted on the loan”.

[Figure: Single threshold on income variable to try to classify the loan data set]

• Process: KDD is a multi-step process: data preparation, search for patterns, knowledge evaluation, and refinement involving iteration after modification.
• Non-trivial: the process must have some degree of search autonomy.
  Eg: computing the mean income of persons does not qualify as discovery.
• Validity: discovered patterns should be valid on new data with some degree of certainty.

• A measure of certainty is a function C mapping expressions in L to an ordered measurement space M_C. An expression E ∈ L about a subset F_E ⊆ F can be assigned a certainty measure c = C(E, F).
  – Eg: if the boundary for the pattern is moved to the right, its certainty measure would drop, since more good loans would be admitted into the shaded region (no loan).

• Novel: novelty can be measured with respect to changes in data (by comparing current values to previous or expected values) or knowledge (how a new finding is related to old ones).
• Potentially Useful: the patterns should potentially lead to some useful actions. Eg: the expected increase in profits to the bank (in dollars) associated with the decision rule.

• Utility function: U maps expressions in L to a partially or totally ordered measure space M_U: hence, u = U(E, F).
• Ultimately Understandable: a goal of KDD is to make patterns understandable, to facilitate a better understanding of the underlying data.
• Simplicity measure: a function S mapping expressions E ∈ L to a partially or totally ordered measure space M_S: hence, s = S(E, F).

Interestingness
– an overall measure of pattern value, combining validity, novelty, usefulness, and simplicity
– some KDD systems have an explicit interestingness function i = I(E, F, C, N, U, S) which maps expressions in L to a measure space M_I
– other systems define interestingness indirectly via an ordering of the discovered patterns
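
As a rough illustration of the measures above, the sketch below scores the threshold pattern “if income < t then person has defaulted” on a small made-up data set. The data values, the choice of “fraction of covered cases that really defaulted” as the certainty measure C, and the weighted-sum form of I are all assumptions made for illustration; the slides do not prescribe any particular definition.

```python
# Hypothetical loan cases as (income, defaulted); values invented for illustration.
CASES = [
    (15, True), (18, True), (22, True), (25, False), (28, True),
    (35, False), (40, False), (45, True), (55, False), (70, False),
]


def certainty(threshold, cases):
    """Assumed certainty measure C for the pattern 'income < threshold => defaulted'.

    Returns the fraction of covered cases (income < threshold) that did default.
    """
    covered = [defaulted for income, defaulted in cases if income < threshold]
    if not covered:
        return 0.0
    return sum(covered) / len(covered)


def interestingness(c, n, u, s, weights=(0.4, 0.2, 0.2, 0.2)):
    """Assumed interestingness I: weighted sum of validity, novelty, utility, simplicity.

    Each input is expected to be scaled to [0, 1] already.
    """
    return sum(w * v for w, v in zip(weights, (c, n, u, s)))


if __name__ == "__main__":
    # Moving the boundary to the right admits more good loans, so C drops.
    print(certainty(24, CASES))  # 1.0   (covers incomes 15, 18, 22)
    print(certainty(50, CASES))  # 0.625 (covers the first eight cases)
```

In this toy setting the drop from 1.0 to 0.625 mirrors the remark that the certainty measure falls as the threshold moves right.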

Knowledge: a pattern E in L is called knowledge if, for some user-specified threshold i ∈ M_I, I(E, F, C, N, U, S) > i.
– this definition is by no means absolute
– it is purely user-oriented, and determined by whatever functions and thresholds the user chooses
– I is “interestingness”, E is “pattern”, F is “data”, C is “validity”, N is “novelty”, U is “utility”, S is “simplicity”

Data Mining is a step in the KDD process consisting of particular data mining algorithms that, under some acceptable computational efficiency limitations, produce a particular enumeration of patterns E_j over F.

KDD Process is the process of using data mining methods (algorithms) to extract (identify) knowledge according to the specifications of measures and thresholds, using the database F along with any required preprocessing, sub-sampling, and transformations of F.

• The data mining component of the KDD process is mainly concerned with the means by which patterns are extracted and enumerated from the data.
• Knowledge discovery involves:
  – the evaluation and possibly interpretation of the patterns to make the decision of what constitutes knowledge and what does not
  – the choice of encoding schemes, preprocessing, sampling, and projections of the data prior to data mining

[Figure: Overview of KDD process]

Steps in KDD process:
1 Developing an understanding of the application domain, the relevant prior knowledge, and the goals of the end-user.
2 Creating a target data set: selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.
3 Data cleaning and pre-processing: removal of noise and outliers, collecting the necessary information to account for noise, strategies for missing data fields, accounting for time and other changes.
4 Data reduction and projection: representing the data depending on the goal of the task. Dimensionality reduction to reduce the effective number of variables under consideration.
5 Choose the data mining task: is the goal of the KDD process classification, regression, clustering, etc?
6 Choose the data mining algorithm(s): select method(s) to be used to search for patterns. Decide which models and parameters are appropriate. Match the data mining method with the overall criteria of the KDD process
  – (eg, the user may be more interested in understanding the model than its predictive capabilities).
7 Data mining: search for patterns: classification rules or trees, regression, clustering, etc.
8 Interpret mined patterns.
9 Consolidate discovered knowledge: incorporate knowledge into the performance system, or simply document it. Includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.

Topic: Data Mining Methods

Data mining component of the KDD process
– iterative application of particular data mining methods
Terms
– model: f(x) = αx² + βx is a model
– pattern: an instantiation of a model, eg, f(x) = 3x² + x is a pattern

Data mining
– fitting models to,
– or determining patterns from,
observed data.
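
To make the model/pattern distinction concrete, here is a minimal sketch that fits the model f(x) = αx² + βx to observed data by ordinary least squares, which yields a pattern (specific values of α and β). Least squares is an assumed choice of fitting criterion, and the function name and data are hypothetical; the data were generated from 3x² + x so the result can be checked by hand.

```python
def fit_quadratic_through_origin(xs, ys):
    """Least-squares estimate of (alpha, beta) in the model f(x) = alpha*x**2 + beta*x.

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    b1 = sum(x ** 2 * y for x, y in zip(xs, ys))
    b2 = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    if det == 0:
        raise ValueError("data do not determine alpha and beta uniquely")
    alpha = (b1 * s2 - b2 * s3) / det
    beta = (s4 * b2 - s3 * b1) / det
    return alpha, beta


# Observations generated (without noise) from the pattern f(x) = 3x^2 + x.
XS = [-2, -1, 0, 1, 2, 3]
YS = [3 * x ** 2 + x for x in XS]

if __name__ == "__main__":
    alpha, beta = fit_quadratic_through_origin(XS, YS)
    print(alpha, beta)  # 3.0 1.0
```

The model is the family of curves; the fitted pair (α, β) = (3, 1) picks out one member of that family, which is the pattern.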
Fitted models are inferred knowledge:
– whether or not the models reflect useful or interesting knowledge is part of the overall, interactive KDD process, where subjective human judgment is usually required

Two primary formalisms for model fitting
– the statistical approach allows for nondeterministic effects (for example, f(x) = αx + e, where e could be a Gaussian random variable)
– the logical approach is purely deterministic [f(x) = αx] and does not admit uncertainty
Focus on the statistical/probabilistic approach
– the most widely used in practice

Data mining methods are based on
– machine learning,
– pattern recognition,
– statistics: classification, clustering, graphical models, etc.
There is a bewildering array of different algorithms.
They consist of three primary components:
– model representation,
– model evaluation, and
– search

Two “high-level” goals of data mining:
– Prediction: involves using some variables or fields in the database to predict unknown or future values of other variables of interest.
– Description: focuses on finding human-interpretable patterns describing the data.
In KDD, description tends to be more important than prediction.

Primary Data Mining tasks:
• Classification
• Regression
• Clustering
• Summarisation
• Dependency Modelling
• Change and Deviation Detection

Classification
– learning a function that classifies a data item into one of several predefined classes
Eg: classifying trends in financial markets, and automated identification of objects of interest in large image databases
Eg: partition the loan data into two class regions (it is not possible to separate the classes perfectly using a linear decision boundary)
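
The loan classification example can be sketched as a fixed linear decision boundary in the (income, debt) plane. The cases, the slope and the intercept below are invented for illustration and are not taken from the slides; the point is only that a straight line classifies most cases but leaves some on the wrong side.

```python
# Hypothetical cases as (income, debt, defaulted); values invented for illustration.
CASES = [
    (20, 30, True), (25, 20, True), (30, 35, True), (40, 15, False),
    (50, 20, False), (60, 30, True), (35, 30, False), (70, 25, False),
]


def predict_default(income, debt, slope=0.5, intercept=5.0):
    """Predict default when the point lies above the line debt = slope*income + intercept."""
    return debt > slope * income + intercept


def accuracy(cases, slope=0.5, intercept=5.0):
    """Fraction of cases whose predicted class matches the recorded class."""
    hits = sum(
        predict_default(income, debt, slope, intercept) == defaulted
        for income, debt, defaulted in cases
    )
    return hits / len(cases)


if __name__ == "__main__":
    # Two of the eight cases fall on the wrong side of this particular line.
    print(accuracy(CASES))  # 0.75
```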

[Figure: Linear classification boundary]

Regression
– learning a function which maps a data item to a real-valued prediction variable
Eg: estimating the probability that a patient will die given test results; predicting consumer demand for a new product as a function of advertising expenditure
Eg: “total debt” is fitted as a linear function of “income”:
– the fit is poor since there is only a weak correlation between the two variables

[Figure: Regression for the loan data set]

Clustering
– identify a finite set of categories or clusters to describe the data
– categories may be mutually exclusive and exhaustive, or otherwise
Eg: discovering sub-populations of consumers in marketing databases
Eg: the loan data set grouped into 3 clusters
– the clusters overlap, allowing data points to belong to more than one cluster
– labels are replaced by +’s because class membership is no longer assumed known

Probability density estimation
• closely related to clustering
• consists of techniques for estimating the joint multivariate probability density function of all of the variables/fields in the database

Summarisation
– methods for finding a compact description for a subset of data
Eg: tabulating the mean and standard deviations for all fields, the derivation of summary rules, multivariate visualisation techniques, and the discovery of functional relationships between variables
– often applied to interactive exploratory data analysis and automated report generation
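
The simplest summarisation mentioned above, tabulating the mean and standard deviation of every field, might look like the following sketch. The record layout and values are hypothetical, and the sample standard deviation is an assumed choice.

```python
from statistics import mean, stdev

# Hypothetical records; field names and values invented for illustration.
RECORDS = [
    {"income": 20.0, "debt": 30.0},
    {"income": 40.0, "debt": 10.0},
    {"income": 60.0, "debt": 20.0},
]


def summarise(records):
    """Return {field: {"mean": ..., "stdev": ...}} for every field in the records."""
    summary = {}
    for field in records[0]:
        values = [record[field] for record in records]
        summary[field] = {"mean": mean(values), "stdev": stdev(values)}
    return summary


if __name__ == "__main__":
    for field, stats in summarise(RECORDS).items():
        print(field, stats["mean"], stats["stdev"])
```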

Dependency Modelling
– finding a model which describes dependencies between variables
Two levels of models:
– the structural level
  » specifies (often in graphical form) which variables are locally dependent on each other
– the quantitative level
  » specifies the strengths of the dependencies

Dependency Modelling contd
Eg: probabilistic dependency networks
– use conditional independence to specify the structure of the model and probabilities to specify the strengths of the dependencies
– increasingly finding applications
  » Eg: the development of probabilistic medical expert systems from databases, information retrieval, and modelling of the human genome

Change and Deviation Detection
– focuses on discovering the most significant changes in the data from previously measured or normative values

Components of Data Mining Algorithms
Three primary components in any data mining algorithm:
– model representation,
– model evaluation,
– search.

Model Representation
– the language L for describing patterns
If the representation is too limited, then no amount of training will produce an accurate model for the data.
– Eg: a decision tree representation, using univariate (single-field) node-splits, partitions the input space using hyper-planes which are parallel to the attribute axes. Such a decision-tree method cannot discover from data the formula x = y no matter how much training data it is given.

Model Representation contd
So it is important to understand the representational assumptions inherent in the method.
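
A small sketch of this representational limit: points are labelled by whether x > y, and every single axis-parallel split (a depth-1 tree, used here as a simplifying assumption) is compared against one oblique test on x − y. The grid of points and the function names are invented for illustration. A deeper tree would approximate the diagonal with a staircase of such splits but would still not express x = y itself.

```python
from itertools import product

# Grid points off the diagonal, labelled by the relation x > y.
POINTS = [(x, y) for x, y in product(range(5), repeat=2) if x != y]
LABELS = [x > y for x, y in POINTS]


def stump_accuracy(axis, threshold, positive_above):
    """Accuracy of one univariate split: test a single coordinate against a threshold."""
    correct = 0
    for point, label in zip(POINTS, LABELS):
        prediction = (point[axis] > threshold) == positive_above
        correct += prediction == label
    return correct / len(POINTS)


def best_stump_accuracy():
    """Best accuracy over both axes, all useful thresholds and both polarities."""
    thresholds = [t + 0.5 for t in range(-1, 5)]
    return max(
        stump_accuracy(axis, t, polarity)
        for axis in (0, 1)
        for t in thresholds
        for polarity in (True, False)
    )


def oblique_accuracy():
    """Accuracy of a single multivariate test on x - y (a line at 45 degrees)."""
    hits = sum(((x - y) > 0) == label for (x, y), label in zip(POINTS, LABELS))
    return hits / len(POINTS)


if __name__ == "__main__":
    print(best_stump_accuracy(), oblique_accuracy())
```

No axis-parallel split separates the two classes perfectly on this grid, while the single oblique test does.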
Greater representational power for models increases the danger of over-fitting the training data, resulting in reduced prediction accuracy on unseen data. In addition, the search becomes much more complex and interpretation of the model is typically more difficult.

Model Evaluation
– estimates how well a particular pattern meets the criteria of the KDD process
– evaluation of predictive accuracy (validity) is based on cross validation
– evaluation of descriptive quality involves predictive accuracy, novelty, utility, and understandability
– both logical and statistical criteria can be used for model evaluation
Eg: the maximum likelihood principle chooses the parameters for the model which yield the best fit to the training data.

Search Method
Two components
– Parameter Search
– Model Search
• Parameter search: the search for parameters which optimise the model evaluation criteria given data and a fixed representation.
  – Eg: for general models, greedy iterative methods are commonly used, eg, the gradient descent method of backpropagation for neural networks.

Search Method contd
• Model Search
  – a loop over the parameter search method
  – for each specific model representation, the parameter search method is instantiated to evaluate the quality of that particular model
  – heuristic search techniques are used because of the size of the space

Data mining methods
– Decision Trees and Rules
– Non-linear Regression and Classification Methods
– Example-based Methods
– Probabilistic Graphical Dependency Models
– Relational Learning Models
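
The two search components described above (a parameter search nested inside a model search, scored by model evaluation) can be sketched as follows. Everything here is an illustrative assumption: two toy model representations, mean squared error as the evaluation criterion, plain gradient descent as the parameter search, and a single train/held-out split standing in for cross validation.

```python
def mse(predict, params, data):
    """Model evaluation: mean squared error of a parameterised model on data."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def gradient_descent(predict, grad, params, data, rate=0.1, steps=2000):
    """Parameter search: greedy iterative minimisation of the mean squared error."""
    for _ in range(steps):
        totals = [0.0] * len(params)
        for x, y in data:
            err = predict(params, x) - y
            for k, g in enumerate(grad(params, x)):
                totals[k] += 2 * err * g / len(data)
        params = [p - rate * t for p, t in zip(params, totals)]
    return params


# Candidate representations: name -> (predict, gradient w.r.t. params, start params).
MODELS = {
    "constant": (lambda p, x: p[0], lambda p, x: [1.0], [0.0]),
    "linear": (lambda p, x: p[0] * x + p[1], lambda p, x: [x, 1.0], [0.0, 0.0]),
}


def model_search(train, held_out):
    """Model search: loop over representations, run the parameter search for each,
    and keep the model with the lowest error on the held-out data."""
    best = None
    for name, (predict, grad, start) in MODELS.items():
        params = gradient_descent(predict, grad, list(start), train)
        score = mse(predict, params, held_out)
        if best is None or score < best[1]:
            best = (name, score, params)
    return best


# Hypothetical noise-free data from y = 2x + 1, split into train and held-out halves.
DATA = [(x / 10, 2 * (x / 10) + 1) for x in range(10)]
TRAIN, HELD_OUT = DATA[::2], DATA[1::2]

if __name__ == "__main__":
    name, score, params = model_search(TRAIN, HELD_OUT)
    print(name, params)
```

With only two candidate representations the model search is exhaustive; over a realistic model space this outer loop is where the heuristic search techniques would come in.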

Decision Trees and Rules
– trees and rules that use univariate splits have a simple representational form, making the inferred model relatively easy for the user to comprehend
– restriction to a particular tree or rule representation can significantly restrict the functional form (and thus the approximation power) of the model

Threshold “split” severely limits the type of classification boundaries

Solution
– enlarge the model space to allow more general expressions (such as multi-variate hyper-planes at arbitrary angles)
– this makes the model more powerful for prediction but more difficult to comprehend

Non-linear Regression and Classification Methods
– a family of techniques for prediction which fit linear and non-linear combinations of basis functions (sigmoids, splines, polynomials) to combinations of the input variables
Eg: feedforward neural networks, adaptive spline methods, projection pursuit regression, etc.

[Figure: Non-linear decision boundary which a neural network might find for the loan data]

But while the classification boundaries of the previous example may be more accurate than the simple linear threshold boundary, the linear boundary has the advantage that the model can be expressed as a simple rule of the form “if income is greater than threshold t then loan will have good status” to some degree of certainty.

Example-based Methods
– the representation is simple: use representative examples from the database to approximate a model, ie, predictions on new examples are derived from the properties of “similar” examples in the model whose prediction is known
– techniques include nearest-neighbour classification and regression algorithms and case-based reasoning systems

Nearest neighbour classifier
– the class at any new point in the 2-dimensional space is the same as the class of the closest point in the original training data set
– the method requires a well-defined distance metric for evaluating the distance between data points
Related techniques: kernel density estimation, and mixture modelling.

[Figure: Nearest neighbour classifier for the loan data set]

Probabilistic Graphical Dependency Models
– graphical models specify the probabilistic dependencies using a graph structure
– in its simplest form, the model specifies which variables are directly dependent on each other
– recent work on methods whereby both the structure and parameters of graphical models can be learned from databases directly

Relational Learning Models
– relational learning (= inductive logic programming) uses the more flexible pattern language of first-order logic
– can easily find formulas such as X = Y
– most research on relational learning is logical in nature
– the extra representational power of relational models comes at the price of significant computational demands in terms of search

Two important points:
• Automated Search: focus mainly on automated methods for extracting patterns and/or models from data.
• Beware the Hype: automated methods in data mining are still in a fairly early stage of development. There are no established criteria for deciding which methods to use in which circumstances, and many of the approaches are based on crude heuristic approximations.

Topic: Knowledge Discovery Methods

Data analyst
– in response to a goal, queries a database to extract data
– “analyses” the data using data analysis and/or visualisation tools
– hence gains “insight” about the data
– uses presentation tools to disseminate this insight to a broader audience

[Figure: Analyst’s task]

For example: “What are the factors that lead to a successful Father’s Day promotion?”
– extract data such as sales volume of products sold during a specific Father’s Day period; characteristics of these products; and characteristics of the promotion itself, such as price discount, amount of in-store support, advertising support, etc.

Example contd: analyse
– define a measure that could be used to quantify achievement of the goal, eg “percentage increase in sales”
– segment the products based on this percentage sales increase measure
– investigate the characteristics of products with relatively higher sales increases
– contrast their attributes with those with relatively lower sales increases

Example contd: visualise the data in each segment
– products were evenly distributed across the segments
– characteristics of products within a segment varied significantly
– hence probably not something intrinsic about the type of product that made it a successful seller

Example contd: conjecture concerning some extrinsic properties of the products
– analysis of the correlation of some specific attributes with the percentage sales increase could lead to the discovery that price discount, in-store promotional support, and advertising support were all higher in the segments with higher percentage sales increases

Example
The analyst was involved in three main tasks:
– (1) model selection and evolution,
– (2) data analysis, and
– (3) output generation.

Complex Aspects of the KDD Process
– Task Discovery
– Data Discovery
– Data Cleaning
– Background Knowledge

Task Discovery
– the client will state the problem or goal as if it were clear and focused, but further investigation is always warranted
– the requirements for the task must be engineered by spending time with the customer and various parts of the customer’s organisation

Data Discovery
– spend time sifting through the raw data, just getting a feel for what the data looks like, what ground it covers and what it does not cover
– the main goal of KDD cannot be pinned down completely without a detailed understanding of the structure, coverage, and quality of the data

Data Cleaning
– the client’s data virtually always has problems
– it may have been collected in an ad hoc manner, have unfilled fields in records, or have mistakes in data entry
– so the KDD process cannot succeed without a serious effort to clean the data
– without the previous data discovery phase, the analyst will have no idea if the data quality can support the task at all

Background Knowledge
– background knowledge resident only in the mind of the expert
– some analysis techniques take advantage of formally-represented knowledge in the course of fitting data to a model
  » ReMind uses a qualitative model in generating a decision tree
  » time-series forecasting techniques can take advantage of explicit representation of seasonality

[Figure: Complete KDD process]