Download Introduction to Data Mining and Machine Learning Techniques

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Cluster analysis wikipedia , lookup

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
Introduction to Data Mining and Machine
Learning Techniques
Iza Moise, Evangelos Pournaras
Iza Moise, Evangelos Pournaras
1
Overview
Main principles of data mining
Definition
Steps of a data mining process
Supervised vs. unsupervised data mining
Applications
Data mining functionalities
Iza Moise, Evangelos Pournaras
2
Definition
Data mining
is the automated process of discovering interesting (non-trivial, previously unknown, insightful and potentially useful) information or
patterns, as well as descriptive, understandable, and predictive models from (large-scale) data.
Also known as:
• Knowledge discovery in databases (KDD)
• Data analysis
• Information harvesting
• Business intelligence
Iza Moise, Evangelos Pournaras
3
Goals
X search consistent patterns and/or systemic relationships
between data
X validate the findings by applying the detected patterns to new
subsets of data
X predict new findings on new datasets
Iza Moise, Evangelos Pournaras
4
Data Mining is...
• k-means clustering
• decision trees
• neural networks
• Bayesian networks
Iza Moise, Evangelos Pournaras
5
Data Mining is not...
• Data warehousing
• SQL / Ad Hoc Queries / Reporting
• Software Agents
• Online Analytical Processing (OLAP)
• Data Visualization
Iza Moise, Evangelos Pournaras
6
Knowledge Discovery Process
Iza Moise, Evangelos Pournaras
7
Steps of a data mining process
1. Exploration
2. Model building and validation
3. Deployment → prediction
Iza Moise, Evangelos Pournaras
8
Exploration
• data preparation
→ data cleaning, data transformation etc.
• elaborate exploratory analyses using a wide variety of
graphical and statistical methods, depending on the nature of
the analytic problem
• determine the general nature of models that can be taken into
account in the next stage
Iza Moise, Evangelos Pournaras
9
Model building and validation, and Deployment
• Model building and validation:
→ consider various models and choose the best one
→ validate the model on existing data
• Deployment:
→ apply the model to new data
Iza Moise, Evangelos Pournaras
10
Model building and validation, and Deployment
• Model building and validation:
→ consider various models and choose the best one
→ validate the model on existing data
• Deployment:
→ apply the model to new data
Iza Moise, Evangelos Pournaras
10
Model building and validation, and Deployment
• Model building and validation:
→ consider various models and choose the best one
→ validate the model on existing data
• Deployment:
→ apply the model to new data
Iza Moise, Evangelos Pournaras
10
Two styles of data mining: Predictive and
Descriptive
1. Predictive
→ Predict the value of a specific attribute (target or dependent)
based on the value of other attributes (explanatory or
predictors)
2. Descriptive
→ Derive patterns that summarise the relationships among
data points
Iza Moise, Evangelos Pournaras
11
Supervised vs. Unsupervised
Supervised
• predictive or directed
• useful when you have a specific
target value to predict about
your data
• Classification, Regression,
Anomaly Detection
Unsupervised
• descriptive or undirected
• finds hidden structure and
relation within the data
• Clustering, Association, Feature
Extraction
Iza Moise, Evangelos Pournaras
12
Supervised Data Mining
• a pre-specified target variable
• explanatory variables or predictors and ↔ one (or more)
dependent variables or target
• goal: specify relationships between predictors and target
• training: the model is given many training data where the target
is known
• testing: the model is applied to data for which the target is
unknown
Iza Moise, Evangelos Pournaras
13
Unsupervised Data Mining
• determine the existence of classes or clusters in the data
• exploratory analysis
• all variable are treated in the same way
Iza Moise, Evangelos Pournaras
14
Overview
Main principles of data mining
Definition
Steps of a data mining process
Supervised vs. unsupervised data mining
Applications
Data mining functionalities
Iza Moise, Evangelos Pournaras
15
Fielded Applications
• Web mining
→ PageRank, Search Engine, Recommenders, Social Networks
• Screening Images
• Diagnosis
• Marketing and Sales
→ Market basket analysis, Direct marketing
• Biology, biomedicine, astronomy, chemistry
Iza Moise, Evangelos Pournaras
16
Beers and Diapers
Iza Moise, Evangelos Pournaras
17
Overview
Main principles of data mining
Definition
Steps of a data mining process
Supervised vs. unsupervised data mining
Applications
Data mining functionalities
Iza Moise, Evangelos Pournaras
18
1. Supervised Data Mining
X Classification
X Regression
X Outlier detection
X Frequent pattern mining
2. Unsupervised Data Mining
X Clustering
X Feature Extraction
→ definition
→ real use-cases
→ method
→ pros and cons
Iza Moise, Evangelos Pournaras
19
1. Supervised Data Mining
X Classification
X Regression
X Outlier detection
X Frequent pattern mining
2. Unsupervised Data Mining
X Clustering
X Feature Extraction
→ definition
→ real use-cases
→ method
→ pros and cons
Iza Moise, Evangelos Pournaras
19