A Comparison of Statistical and Rule-Based Models of Melodic Segmentation
Janaki Ramachandran
Dr. Chuan
Music Informatics and Computing, Spring 2011

What is Melodic Segmentation?
● Identifying melodic “phrases.” A phrase is an important musical organizational unit.
● Used in MIR:
  – feature computation
  – melody indexing
  – retrieval of melodic excerpts
● Humans perceive music in terms of phrases.

Melodic Segmentation
● Different algorithms use different inputs and approaches to automatically identify melodic segments:
  – Supervised learning
  – Unsupervised learning
  – Music-theoretic rules
  – Global/local information
● This work compares the performance of various existing algorithms and proposes a new hybrid model.

Model Measures
● TP (True Positives): number of times a model correctly predicts a certain outcome
● FP (False Positives): number of times a model predicts a certain outcome incorrectly
● FN (False Negatives): number of times a model incorrectly fails to predict a certain outcome

Metrics
● Model performance is assessed in terms of:
  – Precision = TP / (TP + FP)
  – Recall = TP / (TP + FN)
  – F1 = (2 · precision · recall) / (precision + recall)

Melodic Segmentation Models: GTTM
● Generative Theory of Tonal Music
● Identifies discontinuities and changes between musical events:
  – Temporal proximity (rests)
  – Pitch
  – Duration
  – Dynamics

GTTM Cont'd
● Example
of GPRs (Grouping Preference Rules):
  “The most widely studied of these GPRs predict that phrase boundaries will be perceived between two melodic events whose temporal proximity is less than that of the immediately neighbouring events due to a slur, a rest (GPR 2a) or a relatively long inter-onset interval or IOI (GPR 2b) or when the transition between two events involves a greater change in register (GPR 3a), dynamics (GPR 3b), articulation (GPR 3c) or duration (GPR 3d) than the immediately neighbouring transitions.” [1]

Melodic Segmentation Models: LBDM
● Local Boundary Detection Model
● Boundaries are associated with change.
● Boundary strengths are calculated and normalized statistically, based on the change in the intervals between events and the size of the intervals (proximity).
● A weighted sum gives the boundary strength profile, using weights for pitch, rest, IOI, etc. (determined by trial and error); boundaries are predicted where the profile exceeds a predefined value.

Melodic Segmentation: Grouper
● A melody is denoted by assigning an onset time, off time, and chromatic pitch to each note and placing it into a hierarchy.
● Three PSPRs (Phrase Structure Preference Rules):
  – “PSPR 1 (Gap Rule): prefer to locate phrase boundaries at (a) large IOIs and (b) large offset-to-onset intervals (OOI); PSPR 1 is calculated as the sum of the IOI and OOI divided by the mean IOI of all previous notes;” [1]
  – “PSPR 2 (Phrase Length Rule): prefer phrases with about 10 notes, achieved by penalising predicted phrases by |(log2 N) − 3| where N is the number of notes in the predicted phrase;” [1]
  – “PSPR 3 (Metrical Parallelism Rule): prefer to begin successive groups at parallel points in the metrical hierarchy.” [1]

Grouper Cont'd
● The melody is analyzed using a dynamic programming approach.
● Phrases are analyzed using all three rules, with a different weight assigned to each rule (determined by trial and error).

Information Dynamics of Music Model (IDyOM)
● Model proposed by this work, based on
the perception of groupings in terms of the anticipation of change: boundaries tend to occur where the context does not imply the continuation, or where the continuation is unexpected.
● The model generates the probability of an event e occurring given a specific context c.
● Information content h(e | c): the degree to which the event is unexpected.
● Entropy (H): uncertainty; the average information content of all events that the model ‘experiences.’

IDyOM Cont'd
● Also grounded in psychology: adults and infants instinctively use statistical cues such as pitch and interval to identify groupings.
● Also related to cognitive language processing and how we separate words based on the probability of a transition.
● Successful melodic segmentation algorithms can also separate word boundaries in written/spoken language.

IDyOM Cont'd
● Model based on n-grams: collections of sequences of n symbols associated with frequency counts.
● After the model is ‘trained,’ it uses these frequency counts for analysis.
● n − 1 (the number of context symbols) is called the order.
● Lower-order models are more general and take less of the context into account.
● Higher-order models are very specific, but their contexts may be rare in the training set, which skews the probabilities.
● IDyOM attempts to optimize accuracy by giving more weight to higher-order predictions.

IDyOM Cont'd
● Tries to take the local statistical nature of the current melody into account, which some general models do not.
● Uses long-term and short-term models:
  – Long-term: trained on the entire training set
  – Short-term: trained incrementally on each individual song

IDyOM Cont'd
● Music is multidimensional, and the combination of attributes affects perception.
● Each feature is analyzed independently; the probability of each note is then calculated as the product of the probabilities of its attributes.

IDyOM Cont'd
● Model output consists of the ‘expectation’ of the next note based on context.
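The n-gram machinery described in the IDyOM slides can be sketched as follows. This is a minimal illustration on a toy symbol sequence, not the authors' implementation: the function names are invented here, and the maximum-likelihood estimate below omits IDyOM's smoothing and its blending of long- and short-term models.

```python
import math
from collections import defaultdict

def train_ngrams(sequence, n):
    """Count n-grams: map each (n - 1)-symbol context to next-symbol counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(sequence) - n + 1):
        context = tuple(sequence[i:i + n - 1])
        counts[context][sequence[i + n - 1]] += 1
    return counts

def probability(counts, context, event):
    """Maximum-likelihood estimate of P(event | context)."""
    followers = counts[tuple(context)]
    total = sum(followers.values())
    return followers[event] / total if total else 0.0

def information_content(counts, context, event):
    """h(e | c) = -log2 P(e | c): how unexpected the event is in context."""
    p = probability(counts, context, event)
    return -math.log2(p) if p > 0 else math.inf

def entropy(counts, context):
    """H(c): average information content over the events seen in this context."""
    followers = counts[tuple(context)]
    total = sum(followers.values())
    return -sum((c / total) * math.log2(c / total) for c in followers.values())

# Toy melody as pitch symbols (illustrative only).
melody = list("CDECDECDG")
model = train_ngrams(melody, n=3)            # order-2 model: 2-symbol contexts
p_e = probability(model, "CD", "E")          # "CD" is followed by "E" 2/3 of the time
h_g = information_content(model, "CD", "G")  # rarer continuation, higher information content
```

Per the slides, the same machinery would run once per attribute (pitch, IOI, OOI), with a note's overall probability taken as the product of its attribute probabilities.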
● The output covers:
  – Pitch, IOI, OOI
  – The overall probability of an event: the product of the probabilities of its attributes
  – Unexpectedness
● Based on statistical rules rather than symbolic rules.

Previous Comparison
● Prior to this work, there was no overall comparison covering a large number of melodic segmentation algorithms.
● Previous comparisons used segmentations manually annotated by humans, which prevented the study of a large group of melodies.
● Previous works only compared smaller groups of algorithms.
● Different works used different metrics, performance factors, and analytical techniques, so they could not be compared directly.

Method
● Grouper outputs a binary boundary indicator for each note (0 = no boundary, 1 = boundary).
● The other algorithms output numbers that indicate boundary strength.
● The algorithms were made comparable by picking boundaries based on boundary strength.

Simple Picker Method
● Principle 1: the note after a boundary should have a boundary strength greater than that of the note after it.
● Principle 2: the note after a boundary should have a boundary strength greater than that of the previous note.
● To be considered more than just a local peak, the boundary strength should exceed a threshold: a certain number of standard deviations above the mean.
● The last note implies a phrase boundary.

Results
● Grouper, LBDM, GPR2, and IDyOM had the highest F1 scores. [1]
● There were significant differences between these models.
● A hybrid algorithm was created to test for potential further performance increases.

Analysis
● GPR2 had very good performance, implying that rests have a stronger influence on boundary definitions.
● Models that performed better took rests into account, whereas IDyOM folded rests into its probabilities.
● Future research: consider algorithm performance in terms of different musical contexts.

Analysis Cont'd
● The hybrid performed the best.
● Logistic regression seems to improve the boundary prediction function.
● IDyOM did not use any rules of music theory and did not train on pre-segmented data, but still performed well.
● It is useful for songs that do not have pre-annotated boundaries.
● ‘Expectedness’ is strongly related to boundary detection.
● Future research: include entropy.

Reference
[1] D. Muellensiefen, M. Pearce, and G. Wiggins, “A Comparison of Statistical and Rule-Based Models of Melodic Segmentation,” ISMIR 2008.
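As an appendix-style sketch, the Simple Picker principles and the precision/recall/F1 metrics from the slides above can be written out as follows. The threshold parameter k (how many standard deviations above the mean) and all function names are assumptions for illustration, not the values or code used in [1].

```python
import statistics

def simple_picker(strengths, k=1.0):
    """Pick phrase-boundary indices from a boundary-strength profile.

    A note marks a boundary when its strength exceeds both neighbours
    (Principles 1 and 2) and lies more than k standard deviations above
    the mean; the last note always implies a phrase boundary.
    """
    threshold = statistics.mean(strengths) + k * statistics.stdev(strengths)
    boundaries = {
        i for i in range(1, len(strengths) - 1)
        if strengths[i] > strengths[i + 1]      # Principle 1: stronger than next note
        and strengths[i] > strengths[i - 1]     # Principle 2: stronger than previous note
        and strengths[i] > threshold            # more than just a local peak
    }
    boundaries.add(len(strengths) - 1)          # last note implies a boundary
    return boundaries

def evaluate(predicted, actual):
    """Precision, recall, and F1 from predicted vs. annotated boundary sets."""
    tp = len(predicted & actual)                # true positives
    fp = len(predicted - actual)                # false positives
    fn = len(actual - predicted)                # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

profile = [0.1, 0.2, 0.9, 0.1, 0.2, 0.1, 0.8, 0.1]   # made-up boundary strengths
found = simple_picker(profile)                        # → {2, 6, 7}
```

Grouper's binary per-note output can be turned into a boundary set and passed to evaluate() directly, which mirrors how the slides make the algorithms comparable.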