A Comparison of Statistical and
Rule-Based Models of Melodic
Segmentation
Janaki Ramachandran
Dr. Chuan
Music Informatics and Computing
Spring 2011
What is Melodic Segmentation?
● Identifying melodic “phrases.” A phrase is an important musical organizational unit.
● Used in MIR:
  – feature computation
  – melody indexing
  – retrieval of melodic excerpts
● Humans perceive music in terms of phrases
Melodic Segmentation
● Different algorithms use different inputs and approaches to automatically identify melodic segments:
  – Supervised learning
  – Unsupervised learning
  – Music-theoretic rules
  – Global/local information
● This work compares the performance of various existing algorithms and proposes a new hybrid model.
Model Measures
● TP (True Positives) – number of times a model correctly predicts a certain outcome
● FP (False Positives) – number of times a model predicts a certain outcome incorrectly
● FN (False Negatives) – number of times a model fails to predict an outcome that actually occurs
Metrics
● Model performance is assessed in terms of:
  – Precision = TP / (TP + FP)
  – Recall = TP / (TP + FN)
  – F1 = (2 · precision · recall) / (precision + recall)
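A minimal Python sketch of these three metrics; the counts in the example are made up for illustration:

```python
def segmentation_scores(tp, fp, fn):
    """Precision, recall, and F1 from raw boundary counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 8 boundaries found correctly, 2 spurious, 4 missed.
p, r, f = segmentation_scores(8, 2, 4)
# p = 0.8, r ≈ 0.667, f ≈ 0.727
```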
Melodic Segmentation Models:
GTTM
● Generative Theory of Tonal Music
● Identifies discontinuities and changes between musical events:
  – Temporal proximity (rests)
  – Pitch
  – Duration
  – Dynamics
GTTM Cont'd
● Example of GPRs (Grouping Preference Rules):
“The most widely studied of these GPRs predict that
phrase boundaries will be perceived between two
melodic events whose temporal proximity is less
than that of the immediately neighbouring events
due to a slur, a rest (GPR 2a) or a relatively long
inter-onset interval or IOI (GPR 2b) or when the
transition between two events involves a greater
change in register (GPR 3a), dynamics (GPR 3b),
articulation (GPR 3c) or duration (GPR 3d) than the
immediately neighbouring transitions.” [1]
Melodic Segmentation Models:
LBDM
● Local Boundary Detection Model
● Boundaries associated with change
● Boundary strengths are calculated and normalized statistically, based on the change in the intervals between events and the size of the intervals (proximity)
● A weighted sum gives the boundary strength profile, using weights for pitch, rest, IOI, etc. (determined by trial and error); boundaries are predicted to occur where the profile exceeds a predefined threshold
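A minimal sketch of this kind of boundary strength profile. The weights and the change function here are illustrative assumptions, not the published LBDM parameter values:

```python
def change(xs):
    """Degree of change between successive interval values, in [0, 1]."""
    out = []
    for a, b in zip(xs, xs[1:]):
        denom = abs(a) + abs(b)
        out.append(abs(a - b) / denom if denom else 0.0)
    return out

def normalize(xs):
    """Scale a profile so its maximum is 1 (no-op if all zeros)."""
    top = max(xs)
    return [x / top for x in xs] if top else xs

def lbdm_profile(pitch_intervals, iois, rests, weights=(0.25, 0.5, 0.25)):
    """Weighted sum of normalized change profiles for each parameter."""
    profiles = [normalize(change(p)) for p in (pitch_intervals, iois, rests)]
    return [sum(w * v for w, v in zip(weights, col)) for col in zip(*profiles)]
```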
Melodic Segmentation: Grouper
● Melody denoted by assigning an onset time, off time, and chromatic pitch to each note and placing it into a hierarchy
● Three PSPRs (Phrase Structure Preference Rules):
  – “PSPR 1 (Gap Rule): prefer to locate phrase boundaries at (a) large IOIs and (b) large offset-to-onset intervals (OOI); PSPR 1 is calculated as the sum of the IOI and OOI divided by the mean IOI of all previous notes;” [1]
  – “PSPR 2 (Phrase Length Rule): prefer phrases with about 10 notes, achieved by penalising predicted phrases by |(log2 N) − 3| where N is the number of notes in the predicted phrase;” [1]
  – “PSPR 3 (Metrical Parallelism Rule): prefer to begin successive groups at parallel points in the metrical hierarchy.” [1]
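The first two rules are simple enough to write down directly; a sketch, where the function names are my own:

```python
import math

def pspr1_gap(ioi, ooi, mean_prev_ioi):
    """Gap Rule score: (IOI + OOI) / mean IOI of all previous notes."""
    return (ioi + ooi) / mean_prev_ioi

def pspr2_length_penalty(n_notes):
    """Phrase Length Rule penalty |log2(N) - 3|; zero at N = 8 notes."""
    return abs(math.log2(n_notes) - 3)
```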
Grouper Cont'd
● Melody is analyzed using a dynamic programming approach
● Phrases are analyzed using all three rules, with different weights assigned to each rule (determined by trial and error)
Information Dynamics of Music
Model (IDyOM)
● Model proposed by this work, based on perception of groupings in terms of anticipation of change
● Boundaries occur where the context does not strongly imply a continuation, i.e., at unexpected continuations
● Model generates the probability of an event e occurring given a specific context c
● Information content h(e | c) – degree to which the event is unexpected
● Entropy (H) – uncertainty; the average information content of all events that the model 'experiences'
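These two quantities follow the standard information-theoretic definitions (base-2 logarithms, so values are in bits):

```python
import math

def information_content(p):
    """h(e | c) = -log2 P(e | c): how unexpected the event is."""
    return -math.log2(p)

def entropy(distribution):
    """H(c): average information content over the event distribution."""
    return sum(p * information_content(p) for p in distribution if p > 0)
```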
IDyOM Cont'd
● Also grounded in psychology
● Adults and infants instinctively use statistical cues such as pitch and interval to identify groupings
● Also related to cognitive language processing and how we separate words based on the probability of a transition
● Successful melodic segmentation algorithms can also separate word boundaries in written/spoken language
IDyOM Cont'd
● Model based on n-grams – sequences of n symbols associated with frequency counts
● After the model is 'trained,' it uses these frequency counts for analysis
● The order of the model is n − 1: the number of preceding symbols used as context
● Lower-order models are more general and take less of the context into account
● Higher-order models are very specific; if the melody is not as 'regular' as the training set, this skews the probabilities
● IDyOM attempts to optimize accuracy by giving more weight to higher-order predictions
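A minimal n-gram frequency model over a symbol sequence. This is an illustrative sketch, not IDyOM's implementation (which smooths and blends predictions across model orders):

```python
from collections import Counter

def ngram_counts(seq, n):
    """Frequency count of every length-n window in the sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def prob(symbol, context, training_seq):
    """Maximum-likelihood P(symbol | context) from the frequency counts."""
    context = tuple(context)
    full = ngram_counts(training_seq, len(context) + 1)
    ctx = ngram_counts(training_seq, len(context))
    return full[context + (symbol,)] / ctx[context] if ctx[context] else 0.0

# In the training sequence "abab", 'a' is always followed by 'b':
# prob('b', ('a',), 'abab') == 1.0
```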
IDyOM Cont'd
● Takes the local statistical nature of the current melody into account, which some general models do not
● Uses long-term and short-term models:
  – Long-term: trained on the entire training set
  – Short-term: trained incrementally on each individual song
IDyOM Cont'd
● Music is multidimensional, and the combination of attributes affects perception
● Each feature is analyzed independently; the probability of each note is then calculated as the product of the probabilities of its attributes
IDyOM Cont'd
● Model output consists of the 'expectation' of the next note based on context:
  – Pitch, IOI, OOI
  – Overall probability of the event is the product of the probabilities of its attributes
  – Unexpectedness
● Based on statistical rules rather than symbolic rules
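Combining independent per-attribute probabilities into one note-level probability, as described above; the attribute values here are invented for illustration:

```python
import math

def note_probability(attribute_probs):
    """Product of the independent per-attribute probabilities."""
    p = 1.0
    for value in attribute_probs.values():
        p *= value
    return p

p = note_probability({"pitch": 0.5, "ioi": 0.8, "ooi": 0.9})  # 0.36
unexpectedness = -math.log2(p)  # information content of the note
```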
Previous Comparison
● Prior to this work, there was no overall comparison covering a large number of melodic segmentation algorithms.
● Previous comparisons used segmentations manually annotated by humans, which prevented the study of a large group of melodies.
● Previous works compared only smaller groups of algorithms.
● Different works used different metrics, performance factors, and analytical techniques, so they could not be compared directly.
Method
● Grouper has a binary boundary indicator for each note (0 – no boundary, 1 – boundary)
● Other algorithms output numbers that indicate boundary strength
● Algorithms were made comparable by picking boundaries based on boundary strengths
Simple Picker Method
● Principle 1: the note after a boundary should have a boundary strength greater than that of the note after it.
● Principle 2: the note after a boundary should have a boundary strength greater than that of the previous note.
● To be considered more than just a local peak, the boundary strength should exceed a threshold: a certain number of standard deviations above the mean.
● The last note implies a phrase boundary.
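The two principles plus the threshold amount to local-peak picking; a sketch, with the number of standard deviations k as an assumed parameter:

```python
import statistics

def simple_picker(strengths, k=1.0):
    """Mark boundaries at local peaks that exceed mean + k * stdev."""
    threshold = statistics.mean(strengths) + k * statistics.pstdev(strengths)
    boundaries = [0] * len(strengths)
    for i in range(1, len(strengths) - 1):
        s = strengths[i]
        # Principles 1 and 2: a local peak relative to both neighbours.
        if s > strengths[i + 1] and s > strengths[i - 1] and s > threshold:
            boundaries[i] = 1
    boundaries[-1] = 1  # the last note always implies a phrase boundary
    return boundaries
```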
Results
● Grouper, LBDM, GPR2, and IDyOM had the highest F1 scores. [1]
● There were significant differences between these.
● A hybrid algorithm was created to test for potential further performance increases.
Analysis
● GPR2 had very good performance, implying that rests have a stronger influence on boundary definitions.
● Models that performed better took rests into account explicitly, whereas IDyOM included them within its probabilities.
● Future research: consider algorithm performance in terms of different musical contexts.
Analysis Cont'd
● The hybrid model performed the best.
● Logistic regression seems to improve the boundary prediction function.
● IDyOM did not use any rules of music theory and did not train on pre-segmented data, but still performed well – useful for songs that do not have pre-annotated boundaries.
● 'Expectedness' is strongly related to boundary detection.
● Future research: include entropy.
Reference
● [1] D. Müllensiefen, M. Pearce, and G. Wiggins, “A Comparison of Statistical and Rule-Based Models of Melodic Segmentation,” in Proc. ISMIR, 2008.