DME Introduction
Data Mining and Exploration
Amos Storkey
School of Informatics, University of Edinburgh

Outline
1 Overview
2 What is Data Mining?
3 Examples
4 History of Data Mining
5 Data Science

Data Mining and Exploration: Course Introduction
http://www.inf.ed.ac.uk/teaching/courses/dme/
Please sign up on nb.mit.edu by following the link in your email.
• Welcome
• Administration
• Books (Hand, Mannila and Smyth)
• Mini project
• Paper presentations
• Lab classes
These lecture slides are based extensively on previous versions of the course written by Chris Williams.
Overview
• Relationships between courses
• What is data mining?
• Example applications
• Data mining and KDD (Knowledge Discovery in Databases)
• Models and patterns
• Data mining tasks
• Components of data mining algorithms
• Issues in data mining
Relationships between courses
• PMR: Probabilistic Modelling and Reasoning. Learning and inference for probabilistic models.
• IAML: Introductory Applied Machine Learning. Basic introductory course on supervised and unsupervised learning.
• MLPR: Machine Learning and Pattern Recognition. More detailed course on Bayesian machine learning.
• RL: Reinforcement Learning. Apologies: this course is not running this year.
• DME: This course. Develops ideas from MLPR, IAML and PMR to deal with real-world data sets; also covers data visualization and new techniques. Beginning to end of the machine learning and data mining process.

What is data mining?
"Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner." (Hand, Mannila, Smyth)
"We are drowning in information, but starving for knowledge!" (Naisbitt)
"[Data mining is the] extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases." (Han)
Data mining: pejorative sense
Historically, statisticians used "data mining" in a pejorative sense for the idea that, if you search long enough, you can always find some model that fits your data arbitrarily well.
Example: "David Rhine, a 'parapsychologist' at Duke in the 1950s, tested students for 'extrasensory perception' by asking them to guess 10 cards, red or black. He found about 1/1000 of them guessed all 10, and instead of realizing that that is what you would expect from random guessing, declared them to have ESP. When he retested them, he found they did no better than average. His conclusion: telling people they have ESP causes them to lose it!" (Quote from Jeffrey Ullman, Stanford.)

Example applications
• Scientific: SKICAT (Sky Image Cataloging and Analysis Tool), developed at JPL and Caltech, predicts whether an object is a star or a galaxy. See http://www-aig.jpl.nasa.gov/public/mls/skicat/skicat_home.html.
• Commercial: decision trees constructed from bank-loan histories to decide whether or not to grant a loan.
• Marketing: "diapers and beer". The observation that customers who buy diapers are more likely than average to buy beer allowed supermarkets to place beer and diapers near each other, knowing that many customers would walk between them. Placing potato chips between them increased sales of all three items.
• Financial: predict price movements in order to make more lucrative investments.

Data mining and KDD
Knowledge Discovery in Databases. Figure from Han and Kamber.
[Figure: the KDD process. Databases and flat files → cleaning and integration → data warehouse → selection and transformation → data mining → patterns → evaluation and knowledge presentation]
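The Rhine example on the pejorative-sense slide is a chance calculation. If each guess is an independent, fair red/black guess (an assumption made for this sketch), the probability of getting all 10 cards right is (1/2)^10 = 1/1024, close to the 1/1000 that was observed. A minimal Python sketch; the function names are illustrative, not from the course:

```python
# Chance calculations behind the ESP story, assuming each card is an
# independent guess that is correct with probability p_correct.


def prob_all_correct(n_cards: int = 10, p_correct: float = 0.5) -> float:
    """Probability that one student guesses every card correctly by chance."""
    return p_correct ** n_cards


def expected_lucky(n_students: int, n_cards: int = 10) -> float:
    """Expected number of students who get every card right purely by chance."""
    return n_students * prob_all_correct(n_cards)


def prob_at_least_one_lucky(n_students: int, n_cards: int = 10) -> float:
    """Probability that at least one of n_students gets every card right by chance."""
    return 1.0 - (1.0 - prob_all_correct(n_cards)) ** n_students


if __name__ == "__main__":
    print(f"P(all 10 correct) = 1/{1 / prob_all_correct():.0f}")  # 1/1024
    print(f"Expected lucky students out of 1000: {expected_lucky(1000):.2f}")
    print(f"P(at least one lucky student out of 1000): {prob_at_least_one_lucky(1000):.2f}")
```

Testing 1,000 students already gives roughly a 62% chance that someone "succeeds", which is exactly the trap the slide describes: searching long enough guarantees an apparent finding.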
CRISP-DM methodology
Cross Industry Standard Process for Data Mining, http://www.crisp-dm.org/
Six phases:
• Business understanding
• Data understanding
• Data preparation
• Modelling
• Evaluation
• Deployment

Data Mining: History
• 1989: IJCAI workshop on KDD (Piatetsky-Shapiro)
• 1991-1994: workshops on KDD
• 1995 onwards: international conferences
• 1996: Advances in Knowledge Discovery and Data Mining (eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy)

Data Mining: Relationships to Other Fields
Statistics, machine learning, database technology, visualization, ...
Relationship of machine learning to data mining: machine learning is concerned with making computers that learn things for themselves; data mining is more concerned with enabling humans to learn from data.

Data Science
• In fact, this course is really about data science.
• Data science is about integrating data-driven enterprise into a whole process of doing things.
• Data science is about the skills of a practitioner. It recognises the need for people in the process.
• However, data science is also about automation: doing the right things efficiently.

Models and Patterns
• A model structure is a global summary of the data set. Example: linear regression, which makes a prediction for all input values.
• Pattern structures make statements only about restricted regions of the space spanned by the variables.
  Example: if X > x1 then prob(Y > y1) = p1 [equivalently, prob(Y > y1 | X > x1) = p1].
  Example: detection of outliers.
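To make the model/pattern distinction concrete, here is a minimal Python sketch (standard library only; the function names and data are illustrative assumptions). `fit_line` is a global model: least-squares linear regression, which predicts y for any x. `pattern_probability` estimates the pattern prob(Y > y1 | X > x1) by counting only the points in the region X > x1, so it says nothing about the rest of the space.

```python
# Global model versus local pattern, computed on the same (x, y) data.


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares linear regression y = a + b*x, returned as (a, b).

    A global model: the fitted line makes a prediction for every input value x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def pattern_probability(xs: list[float], ys: list[float], x1: float, y1: float) -> float:
    """Empirical estimate of prob(Y > y1 | X > x1).

    A pattern: it only uses, and only says something about, points with X > x1.
    """
    region = [y for x, y in zip(xs, ys) if x > x1]
    if not region:
        raise ValueError("no data points with X > x1")
    return sum(1 for y in region if y > y1) / len(region)
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` returns `(1.0, 2.0)`, a statement about every x, while `pattern_probability([1, 2, 3, 4, 5, 6], [0, 1, 0, 1, 1, 1], x1=1, y1=0.5)` returns `0.8`, a statement about the region X > 1 only.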
Data Mining Tasks
• Exploratory data analysis
• Descriptive modelling
  - Density estimation
  - Cluster analysis/segmentation
• Predictive modelling: classification and regression
• Discovering patterns and rules
  - Association rules
  - Outlier detection
• Mining complex types of data
  - Retrieval by content (RBC) for text, images
  - Time series and sequence data
  - Spatial data
  - Text mining
  - Mining the WWW (content, structure, usage)
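The "diapers and beer" story from the example applications is an association rule, diapers ⇒ beer. A minimal Python sketch of the usual rule measures (support, confidence, lift), assuming market baskets are represented as sets of item names; the baskets below are invented toy data, not the supermarket data behind the story. This only scores a given rule; finding candidate rules efficiently is the job of algorithms such as Apriori.

```python
# Support, confidence and lift of an association rule A => B over market baskets.


def support(baskets: list[set[str]], items: set[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)


def confidence(baskets: list[set[str]], antecedent: set[str], consequent: set[str]) -> float:
    """Estimate of P(consequent | antecedent) = support(A and B) / support(A)."""
    return support(baskets, antecedent | consequent) / support(baskets, antecedent)


def lift(baskets: list[set[str]], antecedent: set[str], consequent: set[str]) -> float:
    """Confidence divided by the consequent's base rate; > 1 means 'more likely than average'."""
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)


# Hypothetical toy baskets, invented for illustration (not real sales data).
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"bread", "chips"},
]
```

On this toy data, half the baskets contain beer, but two of the three diaper baskets do, so `confidence(BASKETS, {"diapers"}, {"beer"})` is 2/3 and the lift is 4/3: diaper buyers are more likely than average to buy beer, which is the form of the observation on the slide.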
Components of Data Mining Algorithms
Headings, with a neural network as the example:

Component                         Example: neural network
Task                              Regression
Structure of model or pattern     Neural network function
Score function                    Squared error
Optimization and search method    Gradient descent
Data management strategy          Unspecified

Ref: HMS chapter 1.
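The components table can be read as a recipe. Below is a minimal Python sketch in which each component is a separate piece of code; it uses only the standard library, and all names, the tanh hidden layer, the learning rate and the toy data are illustrative assumptions rather than the course's code. The task is regression, the structure is a one-hidden-layer neural network function, the score is squared error (averaged over the data points), and the optimization method is full-batch gradient descent. As in the table, no particular data management strategy is assumed: the whole data set is simply held in memory.

```python
import math
import random

# Each component from the table is a separate piece:
#   task = regression, structure = one-hidden-layer network,
#   score = squared error (averaged over points), optimization = gradient descent.


def init_params(n_hidden: int, seed: int = 0) -> dict:
    """Small random input->hidden and hidden->output weights, zero biases."""
    rng = random.Random(seed)
    return {
        "w1": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def hidden_layer(params: dict, x: float) -> list[float]:
    """Hidden activations tanh(w1[j] * x + b1[j])."""
    return [math.tanh(w * x + b) for w, b in zip(params["w1"], params["b1"])]


def predict(params: dict, x: float) -> float:
    """Model structure: f(x) = b2 + sum_j w2[j] * tanh(w1[j] * x + b1[j])."""
    return params["b2"] + sum(v * h for v, h in zip(params["w2"], hidden_layer(params, x)))


def mean_squared_error(params: dict, xs: list[float], ys: list[float]) -> float:
    """Score function: squared error averaged over the data points."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_step(params: dict, xs: list[float], ys: list[float], lr: float) -> None:
    """Optimization: one full-batch gradient-descent update, applied in place."""
    n, k = len(xs), len(params["w1"])
    g_w1, g_b1, g_w2, g_b2 = [0.0] * k, [0.0] * k, [0.0] * k, 0.0
    for x, y in zip(xs, ys):
        h = hidden_layer(params, x)
        out = params["b2"] + sum(v * hj for v, hj in zip(params["w2"], h))
        d_out = 2.0 * (out - y) / n  # d(score)/d(out) for this point
        g_b2 += d_out
        for j in range(k):
            g_w2[j] += d_out * h[j]
            d_pre = d_out * params["w2"][j] * (1.0 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)^2
            g_w1[j] += d_pre * x
            g_b1[j] += d_pre
    # Update only after all gradients are accumulated, so they use the same parameters.
    for j in range(k):
        params["w1"][j] -= lr * g_w1[j]
        params["b1"][j] -= lr * g_b1[j]
        params["w2"][j] -= lr * g_w2[j]
    params["b2"] -= lr * g_b2


def train(xs, ys, n_hidden: int = 4, lr: float = 0.05, steps: int = 2000, seed: int = 0) -> dict:
    """Run gradient descent from a seeded random start and return the fitted parameters."""
    params = init_params(n_hidden, seed)
    for _ in range(steps):
        gradient_step(params, xs, ys, lr)
    return params
```

For example, fitting `ys = [x * x for x in xs]` with `xs = [i / 10 for i in range(-10, 11)]` via `train(xs, ys)` lowers `mean_squared_error` well below its value at the random start. Swapping one component, say absolute error instead of squared error as the score, changes only the score and its gradient, mirroring the way the table separates the headings.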
Some Issues in Data Mining (based on list by Han)
• Mining methodology and user interaction
  - e.g. incorporation of background knowledge
  - e.g. handling noise and incomplete data
• Performance and scalability
• Diversity of data types
  - Handling relational and complex types of data
  - Mining information from heterogeneous databases and the WWW
• Applications, social impacts

Tentative Lecture Outline
• Visualizing and exploring data
• Descriptive data modelling
  - Including hierarchical clustering
• Data preprocessing
  - Data cleaning
  - Data integration and transformation
  - Data reduction
• Predictive modelling
  - Overview of regression and classification
  - Decision trees
  - Support vector machines
  - Performance evaluation
  - Dealing with unbalanced classes
Tentative Lecture Outline (continued)
• Patterns
  - Apriori algorithm
• Mining complex data
  - Web mining: PageRank (Google)
  - Retrieval by content: text, time series, images
• Guest lectures
• Paper presentations