Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Agenda • • • • • Introduction 2 Data Mining Opas Wongtaweesap (Aj’OaT) Intelligent Information Systems Development and Research Laboratory Centre Motivation Motivation: Why data mining? What is data mining? Data Mining: On what kind of data? Data mining functionality Classification of data mining systems • “Necessity is the Mother of Invention” • Data explosion problem • Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories. URL:: http://www.facebook.com/OaTCoM, http://twitter.com/OaTCoM E-mail:: [email protected], [email protected] Data Mining & WEKA Evolution of Database Technology 2 Motivation (Con’t) • 1960s: • 1970s: – Relational data model, relational DBMS implementation • Solution: Data warehousing and data mining – Data warehousing and on-line analytical processing – Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases • 1980s: – RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.) Input Data Data Mining & WEKA Hidden Information Patterns or Knowledge Raw Information – Data mining and data warehousing, multimedia databases, and Web databases 4 Output (Information) Process • 1990s—2000s: Data Mining & WEKA What is Data Mining? 5 What is not Data Mining? • Data mining (knowledge discovery in databases): • • • • – Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases Query Processing. Expert Systems. Small Machine Learning. Statistical Programs. Database Technology Information Science Data Mining & WEKA 6 Statistics Machine Learning – Knowledge discovery in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence etc. 7 Data Mining & WEKA Confluence of Multiple Disciplines • Alternative names: Data Mining & WEKA 3 Motivation (Con’t) • Rutherford D. Rogers. • We are drowning in data, but starving for knowledge! – Data collection, database creation, IMS and network DBMS Data Mining & WEKA 8 Visualization Other Disciplines Data Mining & WEKA 9 Why Data Mining? Potential Applications • Database analysis and decision support Potential Applications Forecasting • http://www.tops.co.th/crm/index-crm-th.html – Market analysis and management • Target Marketing, Market Basket Analysis. • Customer Relation Management (CRM). • Cross Selling, Market Segmentation – Risk analysis and management • Forecasting, Customer Retention. • Improved Underwriting, Quality Control. • Competitive Analysis – Fraud detection and management Data Mining & WEKA 10 Why Data Mining? Potential Applications Data Mining & WEKA 11 Data Mining: A KDD Process Data Mining & WEKA 12 Steps of a KDD Process • Other Applications • Learning the application domain: – relevant prior knowledge and goals of application – Text Mining (NEWS Group, Email, Documents) and Web Analysis. – Intelligent query answering. – Medical outcomes analysis, Astronomy. – Sports scouting. • Creating a target data set: data selection • Data cleaning and preprocessing: (may take 60% of effort!) • Data reduction and transformation: – Find useful features, dimensionality/variable reduction, invariant representation. Data Mining & WEKA 13 Steps of a KDD Process 14 Data Mining and BI • Choosing functions of data mining – summarization, classification, regression, association, clustering. • Choosing the mining algorithm(s) • Data mining: search for patterns of interest • Pattern evaluation and knowledge presentation Increasing potential to support business decisions Making Decisions Data Presentation Visualization Techniques Data Mining Information Discovery Data Mining & WEKA 15 DM System Architecture End User Business Analyst Data Analyst Data Exploration Statistical Analysis, Querying and Reporting – visualization, transformation, removing redundant patterns, etc. • Use of discovered knowledge Data Mining & WEKA Data Mining & WEKA 16 Data Warehouses / Data Marts OLAP, MDA Data Sources Paper, Files, Information Database Systems, OLTP Data Providers, Mining & WEKA DBA 17 Data Mining & WEKA 18 DM: On What Kind of Data? • • • • Spatial databases Time-series data Relational databases Data warehouses Transactional databases Advanced DB and information repositories – – – – – – Object-oriented and object-relational databases Spatial databases Time-series data and temporal data Text databases and multimedia databases Heterogeneous and legacy databases WWW Data Mining & WEKA 19 Data Mining & WEKA 20 Data Mining Functionalities I • Classification and Prediction – Finding models (functions) that describe and distinguish classes or concepts for future prediction – E.g., classify countries based on climate, or classify cars based on gas mileage – Presentation: decision-tree, classification rule, neural network – Prediction: Predict some unknown or missing numerical values – Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions • Association (correlation and causality) – Multi-dimensional vs. single-dimensional association – age(X, “20..29”) ^ income(X, “20..29K”) buys(X, “PC”) [support = 2%, confidence = 60%] – contains(T, “computer”) contains(x, “software”) [1%, 75%] 22 Data Mining & WEKA 23 Data Mining Functionalities III Data Mining Functionalities IV • Cluster analysis • Outlier analysis – Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns – Clustering based on the principle: maximizing the intra-class similarity and minimizing the interclass similarity Data Mining & WEKA 21 Data Mining Functionalities II • Concept description: Characterization and discrimination Data Mining & WEKA Data Mining & WEKA Data Mining & WEKA 24 Outlier Analysis – Outlier: a data object that does not comply with the general behavior of the data – It can be considered as noise or exception but is quite useful in fraud detection, rare events analysis • Trend and evolution analysis – Trend and deviation: regression analysis – Sequential pattern mining, periodicity analysis – Similarity-based analysis • Other pattern-directed or statistical analyses 25 Data Mining & WEKA 26 Data Mining & WEKA 27 Outlier Analysis (Con’t) DM: Classification Schemes Data Mining Output Model • General functionality • Data Mining functionalities are used to specify the kind of patterns to be found in mining tasks. – Descriptive data mining – Predictive data mining – Descriptive data mining / Unsupervised Model • Characterize general properties of data in DB e.g. Association rules mining. • Different views, different classifications – – – – Data Mining & WEKA 28 A Multi-Dimensional View of Data Mining Classification Data Mining & WEKA – Predictive data mining / Supervised Model • Perform inference on current data in order to make predictions. – Prediction Model. – Classification Model. 29 A Multi-Dimensional View of Data Mining Classification • Databases to be mined Data Mining & WEKA 30 Data Mining: A KDD Process • Techniques utilized – Relational, transactional, object-oriented, objectrelational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW, etc. – Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc. • Applications adapted • Knowledge to be mined – Retail, telecommunication, banking, fraud analysis, DNA mining (Bioinformatic), stock market analysis, Web mining, Weblog analysis, etc. – Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc. – Multiple/integrated functions and mining at multiple levels Data Mining & WEKA Kinds of databases to be mined Kinds of knowledge to be discovered Kinds of techniques utilized Kinds of applications adapted 31 Data Mining & WEKA 32 Data Mining & WEKA 33