Overview What is Data Mining? Examples History of Data Mining Data Science

DME Introduction
Data Mining and Exploration
Amos Storkey
School of Informatics, University of Edinburgh

Outline
1 Overview
2 What is Data Mining?
3 Examples
4 History of Data Mining
5 Data Science
Amos Storkey — DME Introduction

Data Mining and Exploration: Introduction
http://www.inf.ed.ac.uk/teaching/courses/dme/
Please sign up on nb.mit.edu by following the link in your email.
These lecture slides are based extensively on previous versions of the course written by Chris Williams.

Data Mining and Exploration
Course Introduction
Welcome
Administration
  Books (Hand, Mannila and Smyth)
  Mini Project
  Paper presentations
  Lab classes

Overview
Relationships between courses
What is data mining?
Example applications
Data mining and KDD (Knowledge Discovery in Databases)
Models and patterns
Data mining tasks
Components of data mining algorithms
Issues in data mining

Relationships between courses
PMR: Probabilistic Modelling and Reasoning. Learning and inference for probabilistic models.
IAML: Introductory Applied Machine Learning. Basic introductory course on supervised and unsupervised learning.
MLPR: Machine Learning and Pattern Recognition. More detailed course on Bayesian machine learning.
RL: Reinforcement Learning. Apologies, this course is not running this year.
DME: Develops ideas from MLPR, IAML and PMR to deal with real-world data sets. Also covers data visualization and new techniques.

This course
Beginning to end of the machine learning and data mining process.

What is data mining?
"Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner." (Hand, Mannila, Smyth)
"We are drowning in information, but starving for knowledge!" (Naisbitt)
"[Data mining is the] extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases." (Han)

Data mining: pejorative sense
Historically, data mining was used in a pejorative sense by statisticians for the idea that, if you search long enough, you can always find some model to fit your data arbitrarily well.
Example: David Rhine, a "parapsychologist" at Duke in the 1950s, tested students for "extrasensory perception" by asking them to guess 10 cards, red or black. He found about 1/1000 of them guessed all 10, and instead of realizing that that is what you would expect from random guessing, declared them to have ESP. When he retested them, he found they did no better than average. His conclusion: telling people they have ESP causes them to lose it! (Quote from Jeffrey Ullman, Stanford)

Example applications
Scientific: SKICAT (Sky Image Cataloging and Analysis Tool), developed at JPL and Caltech. See http://www-aig.jpl.nasa.gov/public/mls/skicat/skicat_home.html. Predicts whether an object is a star or a galaxy.
Commercial: Decision trees constructed from bank-loan histories to decide whether or not to grant a loan.
Marketing: "Diapers and beer". The observation that customers who buy diapers are more likely than average to buy beer allowed supermarkets to place beer and diapers nearby, knowing that many customers would walk between them. Placing potato chips between them increased sales of all three items.
Financial: Predict price movements in order to make more lucrative investments.
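
The arithmetic behind the card-guessing example above can be checked directly. This is a minimal sketch, assuming each guess is an independent fair coin flip. The pool of 1000 students is a hypothetical figure chosen for illustration; the source does not state how many students were tested.

```python
from fractions import Fraction

N_CARDS = 10        # cards each student guesses (from the example)
N_STUDENTS = 1000   # hypothetical pool size, assumed for illustration

# Probability that one student guesses all 10 red/black cards by chance.
p_all_correct = Fraction(1, 2) ** N_CARDS

# Expected number of "perfect" students if everyone guesses at random.
expected_perfect = N_STUDENTS * p_all_correct

# Probability that at least one student in the pool is perfect by chance.
p_at_least_one = 1 - (1 - p_all_correct) ** N_STUDENTS

print(p_all_correct)            # 1/1024
print(float(expected_perfect))  # 0.9765625
print(float(p_at_least_one))
```

A retest is just another 10 independent guesses, so a student who was perfect once has the same 1/1024 chance of being perfect again. That is why the retested students did no better than average.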

Data mining and KDD
Knowledge Discovery in Databases. Figure from Han and Kamber.
[Figure: the KDD process. Databases and flat files -> Cleaning and Integration -> Data warehouse -> Selection and Transformation -> Data Mining -> Patterns -> Evaluation and Presentation -> Knowledge]

CRISP-DM methodology
Cross Industry Standard Process for Data Mining, http://www.crisp-dm.org/
Six phases:
1 Business Understanding
2 Data Understanding
3 Data Preparation
4 Modelling
5 Evaluation
6 Deployment

Data Mining: History
1989 IJCAI workshop on KDD (Piatetsky-Shapiro)
1991-1994 workshops on KDD
1996 Advances in Knowledge Discovery and Data Mining (eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy)
1995 onwards: International Conferences

Data Mining: Relationships to Other Fields
Statistics
Machine Learning
Database technology
Visualization
...
Relationship of Machine Learning to Data Mining
  Machine Learning is concerned with making computers that learn things for themselves.
  Data mining is more concerned with enabling humans to learn from data.

Data Science
In fact this course is really about Data Science.
Data Science is about integrating data-driven enterprise into a whole process of doing things.
Data Science is about the skills of a practitioner. It recognises the need for people in the process.
However, Data Science is also about automation: do the right things efficiently.

Models and Patterns
A model structure is a global summary of the data set.
  Example: linear regression, which makes a prediction for all input values
Pattern structures make statements only about restricted regions of the space spanned by the variables.
  Example: if X > x1 then prob(Y > y1) = p1 [equivalently, prob(Y > y1 | X > x1) = p1]
  Example: detection of outliers
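
The contrast between a global model and a local pattern can be made concrete. This is a minimal sketch on synthetic data. The data-generating line, the thresholds x1 = 8 and y1 = 15, and the helper names fit_line and conditional_prob are all assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
xs = [random.uniform(0.0, 10.0) for _ in range(500)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 1.0) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (a global model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def conditional_prob(xs, ys, x1, y1):
    """Empirical estimate of prob(Y > y1 | X > x1) (a local pattern)."""
    region = [y for x, y in zip(xs, ys) if x > x1]
    if not region:
        raise ValueError("no observations with X > x1")
    return sum(y > y1 for y in region) / len(region)


# Model: one fitted line that gives a prediction for every input value.
a, b = fit_line(xs, ys)

# Pattern: a statement about the region X > x1 only (thresholds are arbitrary).
p1 = conditional_prob(xs, ys, x1=8.0, y1=15.0)

print(f"model:   y = {a:.2f} x + {b:.2f}")
print(f"pattern: prob(Y > 15 | X > 8) = {p1:.2f}")
```

The fitted line can be evaluated anywhere, whereas the pattern says nothing about inputs with X <= 8.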

Data Mining Tasks
Exploratory Data Analysis
Descriptive Modelling
  Density estimation
  Cluster analysis/segmentation
Predictive Modelling: Classification and Regression
Discovering Patterns and Rules
  Association rules
  Outlier detection
Mining Complex Types of Data
  Retrieval by Content (RBC) for text, images
  Time series and sequence data
  Spatial data
  Text mining
  Mining the WWW (content, structure, usage)
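
Association rules, listed above under Discovering Patterns and Rules, are commonly scored by support, confidence and lift. This is a minimal sketch in the spirit of the "diapers and beer" example. The baskets are invented and the function names are assumptions for illustration, not data or code from the course.

```python
# Toy transactions, invented purely for illustration.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread", "chips"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Empirical estimate of prob(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the baseline support of the consequent."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Rule under test: diapers -> beer.
s = support({"diapers", "beer"}, transactions)
c = confidence({"diapers"}, {"beer"}, transactions)
l = lift({"diapers"}, {"beer"}, transactions)

print(s)  # 0.375
print(c)  # 0.75
print(l)  # 1.5
```

A lift above 1 means that diaper buyers in these toy baskets buy beer more often than the average customer does, which is the kind of statement the marketing example relies on.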

Components of Data Mining Algorithms
Component                        Example: Neural Network
Task                             Regression
Structure of model or pattern    Neural network function
Score function                   Squared error
Optimization and search method   Gradient descent
Data management strategy         Unspecified
Ref: HMS chapter 1
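
The neural network column of the components table can be mapped onto code. This is a minimal sketch, not the course's implementation. The toy data (y = sin x), the three tanh hidden units, the learning rate and the number of steps are all assumptions chosen for illustration. The data management strategy is simply a list held in memory.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Task: regression on toy data (invented for illustration): y = sin(x).
xs = [-2.0 + 4.0 * i / 19 for i in range(20)]
ys = [math.sin(x) for x in xs]

# Structure: one hidden layer of H tanh units and a linear output.
H = 3
w1 = [random.uniform(-1, 1) for _ in range(H)]  # input-to-hidden weights
b1 = [0.0] * H                                  # hidden biases
w2 = [random.uniform(-1, 1) for _ in range(H)]  # hidden-to-output weights
b2 = 0.0                                        # output bias


def forward(x):
    """Return the hidden activations and the network output for input x."""
    h = [math.tanh(w1[j] * x + b1[j]) for j in range(H)]
    return h, sum(w2[j] * h[j] for j in range(H)) + b2


def score():
    """Score function: mean squared error over the data set."""
    return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_step(lr):
    """Optimization: one batch gradient descent step on the score."""
    global b2
    n = len(xs)
    g_w1, g_b1, g_w2, g_b2 = [0.0] * H, [0.0] * H, [0.0] * H, 0.0
    for x, y in zip(xs, ys):
        h, out = forward(x)
        d_out = 2.0 * (out - y) / n  # derivative of the score w.r.t. out
        g_b2 += d_out
        for j in range(H):
            g_w2[j] += d_out * h[j]
            d_pre = d_out * w2[j] * (1.0 - h[j] ** 2)  # back through tanh
            g_w1[j] += d_pre * x
            g_b1[j] += d_pre
    for j in range(H):
        w1[j] -= lr * g_w1[j]
        b1[j] -= lr * g_b1[j]
        w2[j] -= lr * g_w2[j]
    b2 -= lr * g_b2


initial = score()
for _ in range(2000):
    gradient_step(lr=0.05)
final = score()

print(f"squared error before: {initial:.4f}, after: {final:.4f}")
```

Swapping any one component (a different score function, or a different search method such as stochastic gradient descent) leaves the other parts of the sketch unchanged, which is the point of the decomposition.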

Some Issues in Data Mining
(based on list by Han)
Mining methodology and user interaction
  e.g. Incorporation of background knowledge
  e.g. Handling noise and incomplete data
Performance and scalability
Diversity of data types
  Handling relational and complex types of data
  Mining information from heterogeneous databases and WWW
Applications, social impacts

Tentative Lecture Outline
Visualizing and Exploring Data
Descriptive Data Modelling
  Including hierarchical clustering
Data Preprocessing
  Data cleaning
  Data integration and transformation
  Data reduction
Predictive Modelling
  Overview of regression and classification
  Decision trees
  Support vector machines
  Performance evaluation
  Dealing with unbalanced classes

Tentative Lecture Outline (continued)
Patterns
  Apriori algorithm
Mining Complex Data
  Web mining: PageRank (Google)
  Retrieval by Content
  Text, time series, images
Guest lectures.
Paper presentations.