Outline
Preface
A System Overview
System Types and Components
Audio Information Retrieval: Machine Learning Basics
C. G. v. d. Boogaart, R. Lienhart
Multimedia Computing Lab
University of Augsburg
{boogaart,lienhart}@multimedia-computing.org
www.multimedia-computing.org
Literature
This chapter is based in part on the book chapter
• “Pattern recognition for multimedia content analysis” in
Multimedia Retrieval, 2007, [Ranguelova and Huiskes, 2007].
It gives a compact overview of pattern recognition and machine
learning for multimedia information retrieval. See also the already
mentioned references
• T. M. Mitchell, “Machine Learning”, 1997, [Mitchell, 1997]
• C. M. Bishop, “Pattern Recognition and Machine Learning”,
2006, [Bishop, 2006]
for the topic of state-of-the-art machine learning methods.
Repetition: Terms
Audio Information Retrieval
Audio Information Retrieval: The collection of techniques, systems
and applications applying information retrieval
and/or data mining methods to sound.
Synonyms: Audio Mining, Machine Listening, . . .
Definitions of the term and its day-to-day use vary!
Repetition: Terms
Relatives . . .
We already mentioned the synonyms:
• pattern recognition
• machine learning
• data mining
• information retrieval
Apart from their different histories, they describe very similar things and
are often used interchangeably.
“Pattern recognition has its origins in engineering, whereas machine
learning grew out of computer science. However, these activities
can be viewed as two facets of the same field.” [Bishop, 2006].
Repetition: Components, Building Blocks
AIR is enabled and driven by several scientific fields:
• (Digital) signal processing, filter theory (done)
• Learning, intelligence (now)
• Pattern recognition
• Machine learning
• Data mining
• Information retrieval
• Linguistics, phonetics (perhaps −)
• Music theory (perhaps +)
Last part of lecture: AIR applications (to come).
• Speech
• Music
• General, miscellaneous, environmental sound
Learning Machines
• We try now to summarize some of the very basic terms and
facts that have to be mentioned in discussing this topic.
• There are several ways of phrasing machine learning and
pattern recognition systems for a discussion of their common
ground.
• Classification
• Regression
• Statistics in general
• Vector spaces
A System Overview
Pattern recognition (PR) aims to classify data (patterns) based
on either a priori knowledge or statistical information
extracted from them. A PR system consists of:
• A sensor that gathers the data.
• A feature extractor that computes numeric or symbolic
representations (features) from the data.
• A classifier that assigns the patterns to suitable categories.
A System Overview
An input-output categorization of PR systems:
Pattern classification/supervised learning: From a training set of
example patterns with known classification, the
system learns a prediction function. It is applied to
new input patterns of unknown classification. The
goal is good generalization while avoiding overfitting.
Reinforcement learning: The system responds to a given input
pattern by generating an output. The output is
rewarded or punished according to a reward function,
allowing the system to improve its output. Typically
the inputs are perceived states and the outputs are
actions of the system.
Pattern clustering/unsupervised learning: The system is expected
to discover natural structure in unlabeled patterns
by itself, e.g. by grouping the patterns into clusters
(clustering).
A System Overview
For pattern classification and clustering, the PR process can be
subdivided into the following stages:
1. Pattern representation
• Feature extraction
• Feature selection, dimension reduction
2. Modeling: Choosing a model whose trained
prediction function explains the outputs based on the inputs.
3. Learning: Classification or clustering
4. Evaluation: Measuring the performance of the system.
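As an illustration, the four stages above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a prescribed method: the toy sine signals, the two features (energy and zero-crossing rate), the nearest-class-mean model and all function names are hypothetical choices made for this example.

```python
import math

def sine(freq, n=200, rate=8000):
    """Hypothetical toy 'sensor': n samples of a sine tone at the given frequency."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

def extract_features(signal):
    """Stage 1: map a raw signal to a feature vector (energy, zero-crossing rate)."""
    energy = sum(s * s for s in signal) / len(signal)
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    return (energy, crossings / (len(signal) - 1))

def train_nearest_mean(features, labels):
    """Stages 2 and 3: the model is one mean vector per class, learned from the data."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}

def predict(model, feature):
    """Prediction function: return the class whose mean is closest to the feature vector."""
    return min(model, key=lambda y: math.dist(model[y], feature))

train_signals = [(sine(200), "low"), (sine(300), "low"),
                 (sine(2000), "high"), (sine(2500), "high")]
test_signals = [(sine(250), "low"), (sine(350), "low"),
                (sine(2200), "high"), (sine(2400), "high")]

train_x = [extract_features(s) for s, _ in train_signals]
train_y = [y for _, y in train_signals]
model = train_nearest_mean(train_x, train_y)

# Stage 4: evaluation on signals that were not used for training.
correct = sum(predict(model, extract_features(s)) == y for s, y in test_signals)
accuracy = correct / len(test_signals)
print(f"test accuracy: {accuracy:.2f}")
```

In this toy setting the zero-crossing rate alone separates the two classes, so the energy feature is an example of a feature that a feature selection step might drop.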
A System Overview
Data sets involved:
Training set: Data set used for parameter estimation of the model,
i.e. for estimating the underlying probability density
function.
Validation set: Independent data set used during training, e.g. to
measure the recognition performance and decide
when to stop the training.
Test set: Independent data set for measuring the achieved
performance.
A System Overview
Typical pitfalls are:
Overfitting: High performance on the training set, even on
outliers and errors in the set.
No “generalization”: High performance only if the data is from the
same domain (e.g. old English literature), but poor performance on
all other domains (e.g. radio news).
The goal is high performance on a new (unseen) test set, using a model learned
on the training set.
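A random split into training, validation and test sets could look as follows. This is a minimal sketch: the function name `split_dataset`, the 60/20/20 proportions and the fixed seed are assumptions made for this example, not values taken from the lecture.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into disjoint training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # everything that is left over
    return train, val, test

samples = list(range(100))                  # stand-in for 100 labeled patterns
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

Because the three sets are disjoint, performance measured on the test set is not influenced by patterns the model has already seen during training or validation.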
Pattern Classification
Mission Statement
The aim is to generalize from the class structure of a set of labeled
example patterns (training set). For the patterns in the training set
• We know their class.
• We know their feature representation.
The feature representation can be considered as a point or vector
in a feature space.
Pattern Classification
Issues
Residual uncertainty: The relationship between features and classes is
not deterministic, but probabilistic. Given a feature
vector, the class is not fully determined: features $\vec{x}$,
class $k$, joint probability $p(\vec{x}, k)$.
Limited availability of data: Training examples are not available
for every possible input vector,
especially in high-dimensional feature spaces.
Need for prior assumptions
Noise and error: These affect both features and class memberships.
Under- and overfitting: How closely can the model follow the data?
Too simple models can lead to underfitting, too
complex models can lead to overfitting, because they
can model everything.
Irrelevant feature variables
Models and Classifiers
The classifier determines a decision boundary in the feature space,
which separates the classes from each other.
The complexity of the shape of the decision boundary depends on
the complexity of the classifier. Some classifiers can separate
several classes; others separate only two classes, and a tree of classifiers
has to be employed to separate more classes.
Examples:
• Probability density functions (PDF)
• Gaussian Mixture Models (GMM)
• Hidden Markov Models (HMM)
• Dynamic Bayesian Nets (DBN)
• (N-)nearest neighbors
• Artificial neural networks
• Support vector machines, kernelized methods (Kernel Trick)
• Boosting (ensemble method)
Pattern Classification
Measuring Classifier Performance
The performance of the classification can be measured with
• Training error
• Test error
• False positive rate, false negative rate
Often access to (labeled) data is extremely limited. To use as
much of the data as possible for both training and testing, K-fold
cross-validation is used:
• The sample set is split into K subsets.
• K tests are performed, where in each test one of the subsets is
taken as the test set and the remaining subsets are used for training.
• The results are averaged.
Unsupervised Learning and Clustering
Types of clustering methods:
Hierarchical methods: Items are grouped step by step in the order of
their proximity, closest first (or split step by step, with the
items farthest apart first). Examples:
single link, complete link and Ward clustering
([Ward, 1963]).
Iterative optimization methods: A criterion which measures the
quality of the clustering is optimized. E.g.: K-means
clustering.
The obtained clustering depends on several choices, especially on
the distance measure between two patterns.
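A minimal sketch of K-means clustering as an iterative optimization method is given below. It assumes Euclidean distance as the distance measure, a fixed number of iterations and explicitly given initial centers; the function name and the toy points are hypothetical. The criterion being reduced is the sum of squared distances between the patterns and their cluster centers.

```python
import math

def k_means(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and updating the centers."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
centers, clusters = k_means(points, centers=[(0, 0), (5, 5)])
print(centers)
print(clusters)
```

Replacing `math.dist` with another distance measure, or starting from other initial centers, can change the resulting clustering, which is the dependence on design choices noted above.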
Dimension Reduction
Feature extraction: Creating new features by combination and
transformation of the original features.
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA), Oriented
PCA (OPCA)
• Multidimensional scaling
Feature selection: Selecting from the original features a subset of
features relevant for building a good classifier.
Literature
Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer.
Mitchell, T. M. (1997). Machine Learning. McGraw-Hill, New York.
Ranguelova, E. and Huiskes, M. (2007). Pattern recognition for multimedia content analysis. In Blanken, H. M., Blok, H. E., Feng, L., and de Vries, A. P., editors, Multimedia Retrieval, Data-Centric Systems and Applications, pages 53–95. Springer-Verlag, Berlin, Germany / Heidelberg, Germany / London, UK / etc.
Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244.