Introduction
1. Data Mining and Knowledge Discovery
2. Data Mining Methods
3. Supervised Learning
4. Unsupervised Learning
5. Other Learning Paradigms
6. Introduction to Data Preprocessing
Data Mining and Knowledge Discovery
• Vast amounts of data are around us in our
world, raw data that is mainly intractable for
human or manual applications.
• Data Mining (DM) is about solving problems
by analyzing data present in real databases.
• DM is regarded either as a synonym of the
Knowledge Discovery in Databases (KDD)
process or as the main step of KDD.
Data Mining and Knowledge Discovery
• KDD definition: “the nontrivial process of identifying
valid, novel, potentially useful, and ultimately
understandable patterns in data”
• Stages:
– Problem Specification.
– Problem Understanding.
– Data Preprocessing.
– Data Mining.
– Evaluation.
– Results Exploitation.
Data Mining and Knowledge Discovery
[Figure: KDD process]
Data Mining Methods
• Statistical Methods:
– Regression Models:
• They are used in estimation tasks and require the form
of the model equation to be specified in advance.
• Linear, quadratic and logistic regression are the most
well known.
• They may have problems with missing values, outliers
and redundant/harmful features.
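As a minimal sketch of the regression bullet above, the following fits a straight line by ordinary least squares and then refits it after one value is corrupted. The function name `fit_line` and the toy data are illustrative assumptions, not part of the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]            # toy data lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)

ys_outlier = [3.0, 5.0, 7.0, 29.0]   # same data with one corrupted value
slope_out, intercept_out = fit_line(xs, ys_outlier)

print(slope, intercept)              # fit on clean data
print(slope_out, intercept_out)      # fit pulled away by a single outlier
```

A single outlying value changes the fitted slope from 2 to 8 in this toy case, which is the kind of sensitivity the last bullet refers to.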
Data Mining Methods
• Statistical Methods:
– Artificial Neural Networks (ANNs):
• They are powerful mathematical models suitable for
almost all DM tasks, especially predictive ones.
• Multi-layer perceptron (MLP), Radial Basis Function
Networks (RBFNs) and Learning Vector Quantization
(LVQ) are the most well known.
• They require numeric attributes and may have
problems with missing values.
• They are robust to outliers and noise.
Data Mining Methods
• Statistical Methods:
– Bayesian Learning:
• It uses probability theory as a framework for
making rational decisions under uncertainty.
• Naïve Bayes is the most well known technique.
• These methods are very sensitive to redundant or
uninformative attributes and examples in the data,
as well as to noisy examples and outliers.
• They require nominal attributes and cannot deal with
missing values.
Data Mining Methods
• Statistical Methods:
– Instance-based Learning:
• The examples are stored verbatim, and a distance
function is used to determine which members of the
database are closest to a new example for which a
prediction is desired.
• The K-Nearest Neighbor (KNN) is the most
representative method.
• They are good candidates to be improved through data
reduction procedures.
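The instance-based scheme above can be sketched as a small K-Nearest Neighbor classifier. This is an illustrative sketch that assumes Euclidean distance and majority voting; `knn_predict` and the toy training set are hypothetical names and data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k stored examples closest to query.

    train is a list of (features, label) pairs kept verbatim.
    """
    by_distance = sorted(train, key=lambda example: math.dist(example[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

print(knn_predict(train, (1.1, 1.0)))        # query near the first group
print(knn_predict(train, (5.0, 5.0), k=1))   # query equal to a stored example
```

Every prediction scans the whole stored set, which is why reducing the number of stored examples directly lowers the cost of this method.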
Data Mining Methods
• Statistical Methods:
– Support Vector Machines (SVMs):
• They are machine learning algorithms based on
learning theory and similar to ANNs in the sense that
they are used for estimation and perform very well
when data is linearly separable.
• They require numeric non-missing data and are
commonly robust against noise and outliers.
Data Mining Methods
• Symbolic Methods:
– Rule Learning:
• They search for a rule that explains some part of the
data, separate the covered examples and recursively
conquer the remaining examples.
• AQ, CN2, RIPPER, PART and FURIA are good examples of
this family.
• They require nominal data (sometimes obtained through
an implicit discretization process) and have a built-in
selector of interesting attributes from the data.
Data Mining Methods
• Symbolic Methods:
– Decision Trees:
• They construct predictive models formed by iterations
of a divide and conquer scheme of hierarchical
decisions.
• CART, C4.5 and PUBLIC are good examples of this
family.
• They are closely related to rule learning methods and
suffer from the same disadvantages.
Data Mining Methods
• Data descriptive tasks:
– Clustering:
• It applies when there is no class information to be
predicted but the examples must be divided into
natural groups or clusters.
• Well known examples of clustering algorithms are
k-Means, COBWEB and Self-Organizing Maps.
• They prefer numeric data with no missing values
and without noise or outliers.
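To make the clustering bullet concrete, here is a minimal k-Means sketch. It assumes Euclidean distance, a fixed number of iterations and caller-supplied initial centroids; the function name `k_means` and the toy points are illustrative only.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.6),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])

print(centroids)
print([len(cluster) for cluster in clusters])
```

Because centroids are plain means, a single noisy point or outlier shifts them, which matches the preference for clean numeric data stated above.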
Data Mining Methods
• Data descriptive tasks:
– Association Rules:
• A set of techniques that aim to find association
relationships in the data.
• Apriori is the most emblematic technique for
this problem.
• Data transformation (mainly discretization) and
reduction are often needed to perform high quality
analysis in this DM problem.
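The level-wise idea behind Apriori can be sketched as follows: an itemset of size k is counted only if all of its subsets of size k-1 are already frequent. This is a simplified illustration of the frequent-itemset step only (no rule generation); `frequent_itemsets` and the toy transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return a dict mapping each frequent itemset to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: count / n for c, count in counts.items()
                 if count / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent itemsets, then prune candidates with an infrequent subset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in frequent.items():
    print(sorted(itemset), support)
```

The transactions here are already sets of nominal items; numeric attributes would first need discretization, as the last bullet notes.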
Supervised Learning
• Prediction methods are commonly referred to
as supervised learning. Supervised methods
attempt to discover the relationships between
the input attributes and a target attribute.
• A training set is given and the objective is to
form a description that can be used to predict
unseen examples.
Supervised Learning
• Problems:
– Classification
• The domain of the target attribute is finite and
categorical.
• A classifier must assign a class to an unseen example.
– Regression
• The target attribute takes values from an infinite
(continuous) domain.
• The goal is to fit a model that expresses the target
attribute as a function of the input attributes.
– Time Series Analysis
• Making predictions in time.
Unsupervised Learning
• There is no supervisor and only input data is
available.
• The aim is now to find regularities,
irregularities, relationships, similarities and
associations in the input.
Unsupervised Learning
• Problems:
– Clustering
– Association Rules
– Pattern Mining
• It is adopted as a more general term than frequent
pattern mining or association mining.
– Outlier Detection
• It is the process of finding data examples whose
behaviour is very different from what is expected
(outliers or anomalies).
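One simple way to illustrate outlier detection is a z-score rule: flag values that lie more than a chosen number of standard deviations from the mean. The rule, the threshold of 2.0 and the name `find_outliers` are assumptions made for this sketch; the slides do not prescribe a particular detector.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []          # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_outliers(readings))
```

The mean and standard deviation are themselves distorted by the outlier, so robust alternatives (for example, rules based on the median) are often preferred in practice.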
Other Learning Paradigms
• Imbalanced Learning
– A classification problem where the data has a
highly skewed distribution of the target attribute.
– The number of examples representing the class of
interest is much lower than that of the other
classes.
• Multi-instance Learning
– Imposes restrictions on models, since each
example consists of a bag of instances instead of
a single instance.
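For the imbalanced learning setting described above, the following sketch measures the imbalance ratio of a label list and balances it by random oversampling, that is, by duplicating minority-class examples. Random oversampling is one common remedy chosen here for illustration; the function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(examples), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(examples, labels) if y == cls]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


examples = list(range(10))
labels = ["neg"] * 9 + ["pos"]       # class of interest has a single example
print(imbalance_ratio(labels))

balanced_x, balanced_y = random_oversample(examples, labels)
print(Counter(balanced_y))
```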
Other Learning Paradigms
• Multi-label Classification
– Each instance is associated not with a class, but
instead with a subset of them.
• Semi-supervised Learning
– It is concerned with the design of models in the
presence of both labeled and unlabeled data.
– Semi-supervised classification and
semi-supervised clustering.
– Relationship with Active Learning.
Other Learning Paradigms
• Subgroup Discovery
– It arises as a hybridization of classification
and association mining.
– It aims to extract interesting rules with respect
to a target attribute.
• Transfer Learning
– Aims to extract the knowledge from one or more
source tasks and apply the knowledge to a target
task.
– The so-called data shift problem is closely related.
Other Learning Paradigms
• Data Stream Learning
– When not all data is available at a given
moment, it is necessary to develop learning
algorithms that treat the input as a continuous
data stream.
– Each instance can be inspected only once and
must then be discarded to make room for
subsequent instances.
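The single-pass constraint can be illustrated with a running mean and variance that are updated one instance at a time (Welford's method), so no instance has to be kept after it is processed. This is only a sketch of the constraint, not a full stream learner; the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Mean and population variance maintained in one pass over a stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0      # running sum of squared deviations from the mean

    def update(self, x):
        """Consume one instance; it can be discarded afterwards."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.update(value)

print(stats.mean, stats.variance)
```

Memory use stays constant regardless of how many instances arrive, which is the property a stream algorithm needs.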
Introduction to Data Preprocessing
• Unfortunately, real-world databases are highly
influenced by negative factors such as the
presence of noise, missing values, inconsistent
and superfluous data, and huge sizes in both
dimensions: examples and features.
• Low-quality data will lead to low-quality DM
performance.
Introduction to Data Preprocessing
• Forms of Data Preparation
Introduction to Data Preprocessing
• Data Cleaning
– Correct bad data, filter some incorrect data out of
the data set and reduce the unnecessary detail of
data.
• Data Transformation
– The data is converted or consolidated so that the
mining process can be applied or becomes more
efficient.
• Data Integration
– Merging of data from multiple data stores.
Introduction to Data Preprocessing
• Data Normalization
– To express data in the same measurement units,
scale or range.
• Missing Data Imputation
– To fill the variables that contain missing values
with reasonable estimates.
• Noise Identification
– To detect random errors or variances in a
measured variable.
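Two of the steps above, missing data imputation and normalization, can be sketched on a single numeric column. Mean imputation and min-max scaling to [0, 1] are the simplest choices and are assumptions of this example, as are the names `impute_mean` and `min_max` and the use of `None` to mark a missing value.

```python
def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]   # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in column]


ages = [20, None, 40, 60]
filled = impute_mean(ages)
scaled = min_max(filled)

print(filled)
print(scaled)
```

Imputation runs first here because min-max scaling cannot be computed while missing values remain in the column.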
Introduction to Data Preprocessing
• Forms of Data Reduction
Introduction to Data Preprocessing
• Feature Selection
– Achieves the reduction of the data set by
removing irrelevant or redundant features (or
dimensions).
• Instance Selection
– Consists of choosing a subset of the total available
data to achieve the original purpose of the DM
application as if the whole data had been used.
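A very simple filter illustrates the feature selection bullet above: drop features that are constant (they carry no information) and features that exactly duplicate an earlier one (they are redundant). Real feature selectors use richer relevance and redundancy criteria; `select_features` and the toy rows are hypothetical.

```python
def select_features(rows):
    """Return the indices of features that are neither constant nor duplicated."""
    columns = list(zip(*rows))
    kept, seen = [], set()
    for index, column in enumerate(columns):
        if len(set(column)) <= 1:     # constant feature
            continue
        if column in seen:            # exact copy of an earlier feature
            continue
        seen.add(column)
        kept.append(index)
    return kept


rows = [
    (1, 7, 1, 0.5),
    (2, 7, 2, 0.1),
    (3, 7, 3, 0.9),
]
kept = select_features(rows)
reduced = [tuple(row[i] for i in kept) for row in rows]

print(kept)
print(reduced)
```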
Introduction to Data Preprocessing
• Discretization
– Transforms quantitative data into qualitative data,
that is, numerical attributes into nominal
attributes with a finite number of intervals.
• Feature Extraction/Instance Generation
– Extend feature and instance selection by
allowing the modification of the internal values
that represent each example or attribute.
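Discretization, as described above, can be sketched with equal-width binning, which splits the observed range of a numeric attribute into a fixed number of intervals and replaces each value by its interval index. Equal-width binning is just one possible discretizer, chosen here for brevity; `equal_width_bins` and the sample values are illustrative.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]    # constant attribute: a single interval
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


temperatures = [0.0, 4.0, 5.0, 9.0, 10.0]
bins = equal_width_bins(temperatures, n_bins=2)
names = ["low", "high"]

print(bins)
print([names[b] for b in bins])
```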