Visualization in Comparative Music Research
Petri Toiviainen and Tuomas Eerola
University of Jyväskylä, Finland
1 Introduction
A great deal of research in musicology has concentrated on the analysis
and comparison of different musical styles, genres, and traditions. This
paradigm stems from the comparative and systematic musicology of late
19th century. Typical research questions in this area of inquiry involve the
evolution of a musical style, typical musical features in the works of a composer, or similarities and differences across music traditions from various
geographical regions.
Research aimed at tackling these kinds of questions has traditionally
been based on visual analysis of notated scores (when these have been
available), or aural analysis of music recordings. While studies utilizing
these kinds of methods have undoubtedly shed light on similarities and differences on both temporal and spatial dimensions, they have two potential
limitations. First, visual or aural analysis of music is time-consuming, and,
consequently, studies utilizing these methods are necessarily based on relatively small sets of musical material, which may not be representative of
the musical styles or traditions in question. Second, these kinds of analysis
methods may be subjective, or prone to errors, both of which can hinder
the replicability of the study.
A possible way of overcoming these limitations would be to adopt a
computational approach. This would include the use of large digital collections of appropriate musical material, computational extraction of relevant
musical features from this material, and the subsequent application of, for instance, statistical methods to the extracted features. Such computational approaches to the analysis of large collections of music have been
utilized since the 1980s (e.g. Marillier 1983, Vos and Troost 1989).
In addition to testing specific hypotheses concerning music, large musical collections can be used as material for exploratory research, the aim of
which is to find interesting structures within, or similarities and differences
between musical collections. To this end, methods of data mining can be
applied. The present paper aims at providing an overview of the computational approach to comparative music research. First, issues related to
forms of music representation, musical feature extraction, digital music
collections, and data mining techniques are discussed. Second, examples
of visualization of large musical collections are presented.
2 Music representations
There are several alternatives for digital representation of music. On a
general level, music representations can be divided into three categories based on their degree of structuredness: (1) notation-based, (2) event-based, and (3) signal representations. Notation-based representations (e.g.,
**kern, SCORE, GUIDO, NIFF, DARMS, Common Music Notation)
consist of discrete musical events like notes, chords, time values, etc., and
describe these events in relation to formalized concepts of music theory.
Event-based representations (e.g., MIDI, MIDI File) are somewhat less
structured than notation-based ones, containing information about the
pitch, onset and offset times, dynamics (velocity) and timbre (channel).
Signal representations (e.g., AIFF, WAV, MP3, AAC) result from audio
recordings and contain no structured information about music.
From the viewpoint of computational music analysis, each of the three
representation categories has its advantages and shortcomings. Notation-based and event-based representations are especially suitable for the investigation of high-level musical phenomena such as melodic, harmonic, and
tonal structure. Signal representations are best suited for the analysis of,
for instance, timbre, rhythmic structure and, to some degree, harmony and
tonality. Although limited success has been achieved in extracting instrument parts and melodic lines from music recordings (see Klapuri 2005), this problem remains largely unsolved.
For each of these three main representation types, there are tools available for computational analysis of music. For notation-based representations, perhaps the best known is Humdrum (Huron 1995), which is a versatile collection of UNIX-based tools for musicological analysis. For event-based representations, the MIDI Toolbox (Eerola & Toiviainen 2004a),
containing about 100 functions for cognitively oriented analysis of MIDI
files, is available on the Internet1. With the IPEM Toolbox (Leman, Lesaffre, & Tanghe 2000), signal representations of music can be analyzed in
terms of, for instance, their spectral structure, roughness, tone onset structure, and tonal centres.
1 http://www.jyu.fi/musica/miditoolbox/
3 Musical databases
There is a relatively long tradition in organizing musical material into
various kinds of thematic collections. Barlow's and Morgenstern's (1948)
A Dictionary of Musical Themes contains the opening phrases of ca.
10,000 compositions, organized in a manner that allows searches based
on musical content.
The largest digital database of music is the RISM (1997) incipits database that was initiated in the 1940s and currently contains ca. 450,000
works by ca. 20,000 composers. The compositions are encoded in the database using a simple notation-based representation that includes pitch,
time value, location of bar lines and key and meter signatures. Musical databases that are freely available on the Internet, such as Melodyhound 2 and
Themefinder 3, are not quite as extensive, containing a few thousand
items from the classical repertoire. On the web pages of Ohio State University one can find a few thousand classical works 4.
In the field of folk music, the most extensive collection is the Digital
Archive of Finnish Folk Tunes 5 (Eerola & Toiviainen 2004b), containing
ca. 9000 folk melodies and related metadata. Another extensive digital collection of folk music is the Essen Folk Song Collection (Schaffrath 1995)
that consists of ca. 6000 folk melodies of mainly European origin; this collection also contains extensive metadata concerning each melody. The
MELDEX collection contains ca. 2000 folk melodies
from the Digital Tradition collection, comprising traditional music mainly
from the British Isles.
Although the number of music recordings greatly exceeds the number of
notation-based or event-based representations of music, organized music
databases in signal representations are as yet less common than databases in other representations. This is mainly due to the memory requirements associated with audio. However, the Variations2 project6 at Indiana
University aims at creating a digital music library that will contain the entire catalogue of Classical, Jazz, and Asian digital recordings of the recording company Naxos, consisting of about three terabytes of digital music information. In addition, the Real World Computing Music Database (Goto
et al. 2002) contains works in pop, rock, jazz, and classical styles in both
acoustical and event-based forms.
2 http://www.musipedia.org/
3 http://www.themefinder.org
4 http://kern.humdrum.net/
5 http://www.jyu.fi/musica/sks/index_en.html
6 http://dml.indiana.edu
4 Musical feature extraction
In comparative research based on musical databases, the first step in the
investigation is to extract relevant features from the musical material. The
choice of features to be extracted is mainly dictated by the type of representation of the musical material at hand, and the research questions one
aims to study. As indicated before, the set of musical features that can be
reliably extracted with computational algorithms depends on the type of
music representation. On a general level, the features could be divided into
low-level features related to, for instance, spectrum, roughness, and pitch,
and high-level features such as texture, rhythmic, melodic, and tonal structure. Another distinction can be made between temporal and static features. Temporal features represent aspects of sequential evolution in the
music; examples of such features include the melodic contour vector (e.g.
Juhasz 2000) and the self-similarity matrix (Cooper & Foote 2002). Static
features are overall descriptors of the musical piece collapsed over time,
such as spectrum histograms (e.g. Pampalk, Dixon, Widmer 2004), statistical distributions of pitch-classes, intervals, and time values (Ponce de
León, Pérez-Sancho, & Iñesta 2004, Eerola & Toiviainen 2001), as well as
periodicity histograms (Dixon, Pampalk & Widmer 2003, Toiviainen &
Eerola, 2006). An overview of the state of the art in computational feature
extraction of music can be obtained at the ISMIR (International Conference on Music Information Retrieval) website7.
The musical feature extraction process results in a musical feature matrix M = (m_ij), an N x M matrix in which each of the N musical items is represented by an M-component feature vector. This matrix is the starting point of subsequent analyses.
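As an illustration, the following sketch shows one way such a feature matrix could be assembled, here using 12-component pitch-class distributions as the features. The melodies are invented toy data, and representing each melody as a list of MIDI pitch numbers is a simplification of a real event-based collection; the helper function is illustrative, not part of any existing toolbox.

```python
import numpy as np

def pitch_class_distribution(midi_pitches):
    """Return a 12-component vector with the relative frequency of each
    pitch class (C=0, ..., B=11) in a list of MIDI pitch numbers."""
    counts = np.bincount(np.asarray(midi_pitches) % 12, minlength=12)
    return counts / counts.sum()

# Hypothetical collection: each melody is just a list of MIDI pitch numbers.
melodies = [
    [60, 62, 64, 65, 67, 67, 65, 64, 62, 60],   # C-major-like phrase
    [57, 60, 62, 64, 62, 60, 57, 55, 57],       # A-minor-like phrase
    [62, 66, 69, 74, 69, 66, 62],               # D-major arpeggio
]

# N x M feature matrix: one 12-component feature vector per melody.
M = np.vstack([pitch_class_distribution(m) for m in melodies])
print(M.shape)   # (3, 12)
```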
5 Data mining
Depending on the research approach, the obtained musical feature matrix
can be subjected to either confirmatory or exploratory data analysis. If one
has specific hypotheses concerning, for instance, aspects in which two musical collections differ, these can be tested using a deductive approach, that
is, using inferential statistics. If, however, there are no clear hypotheses
concerning the data, an inductive, exploratory approach can be adopted.
The aim of this latter approach is to find interesting structures in the data
set, such as clusters, trends, correlations, and associations, as well as to
find questions (rather than answers), and create hypotheses for further
study. To this end, methods of data mining can be useful. Data mining can
be described as a collection of methods for exploratory analysis of large
data sets. Central methods utilized in data mining include projection, clustering, estimation, and visualization. Each of these methods is summarized
below.

7 http://www.ismir.net/all-papers.html
5.1 Projection
In many cases, the musical feature matrix has a large number of feature
dimensions. To reduce the number of feature dimensions, various methods
of projection can be applied. The various projection methods differ in
terms of their criteria for the choice of projection direction in the high-dimensional space. Typical methods used for dimensionality reduction include the following:
• Principal Components Analysis (PCA). The PCA is a standard projection method that uses maximal variance as the projection criterion, and
produces orthogonal projection directions.
• Independent Component Analysis (ICA; Hyvärinen, Karhunen & Oja
2001). The ICA utilizes a latent variable model to project the data onto
statistically independent dimensions.
• Fisher Discriminant Function (FDF). If the data consists of items belonging to different classes, and the class labels are available, the FDF
can be used to project the data onto dimensions that maximize the ratio
of between-class variance to within-class variance, thus resulting in projections that produce maximal separation between the classes.
• Projection Pursuit (PP; Friedman 1987). The PP attempts to find projection directions according to a criterion of "interestingness". A typical
such criterion is that the distribution of the projected data be maximally
non-Gaussian.
• Self-Organizing Map (SOM; Kohonen 1995). The SOM utilizes an
unsupervised learning algorithm to produce a non-linear projection of
the data set that maximizes the local variance.
The projections obtained by each of the aforementioned methods can be
visualized to allow exploratory study of the data. Moreover, the projection
directions themselves contain information about the musical features that
are significant for the projection criterion of the particular projection
method.
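As a concrete example of the first of these methods, the following is a minimal sketch of a PCA projection, assuming a feature matrix with one musical item per row. It computes the projection via the singular value decomposition of the mean-centred matrix; the random stand-in data merely demonstrates the shapes involved and should be replaced by a real feature matrix.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the directions of maximal variance."""
    Xc = X - X.mean(axis=0)                   # centre each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]            # orthogonal projection directions
    projected = Xc @ components.T             # N x n_components coordinates
    explained = (s ** 2)[:n_components] / (s ** 2).sum()
    return projected, components, explained

# Example with random stand-in data (replace with a real feature matrix).
rng = np.random.default_rng(0)
X = rng.random((200, 12))
coords, directions, var_ratio = pca_project(X, 2)
print(coords.shape, var_ratio)
```

Inspecting the rows of `components` shows which musical features load most strongly on each projection direction, which is the sense in which the projection directions themselves are informative.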
5.2 Clustering
If the musical collection under investigation is large, it is often useful to
reduce the amount of information by representing the items by a smaller
number of representative exemplars. To this end, various clustering methods are available.
• Hierarchical Clustering methods proceed successively by merging small
clusters into larger ones. This results in a tree of clusters referred to as
the dendrogram, which shows how the clusters are related.
• Partitional Clustering methods attempt to decompose the data set into a
predefined number of clusters. This is usually carried out by minimizing
some measure of dissimilarity between the items within each cluster, or
maximizing the dissimilarity between the clusters. An example of a partitional clustering method is k-means clustering; a minimal sketch is given after this list.
• The Self-Organizing Map (SOM), in addition to performing a non-linear
projection of the data set, carries out clustering by representing the data
set using a reduced set of prototype vectors. The combination of projection and clustering makes the SOM particularly suitable for data visualization.
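The sketch below implements the k-means idea mentioned above in its simplest form, assuming a feature matrix with one item per row. In practice an existing library implementation would normally be used; this illustrative loop only shows the assign-and-update mechanics.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Very small k-means: returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each item to the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned items.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example with stand-in two-dimensional data containing two groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(np.bincount(labels))
```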
5.3 Estimation
Musical feature matrices with high feature dimensions (M) can be visualized as, for instance, scatter plots on two (or three) projection directions. If
the number of items (N) is large, it may, however, be difficult to observe
the structure of the data set due to extensive overlapping of markers. In
other words, it is possible that one observes mainly the outliers rather than
the bulk of the data. This problem may be overcome by estimating the
probability density of the projected data set with a nonparametric method,
such as kernel density estimation (Silverman 1986). Kernel density estimation is carried out by summing kernel functions located at each data point,
which in the present case comprise the projections of each musical feature
vector. The kernel function is often a (one- or two-dimensional) Gaussian.
The result of the estimation is a smooth curve or surface – depending on
the dimensionality of the projection – the visualization of which may facilitate the observation of interesting structures in the data set.
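The following sketch shows two-dimensional kernel density estimation in exactly the form described above, summing a Gaussian kernel centred on each data point over an evaluation grid. The data stand in for the 2-D projections of the feature vectors, and the bandwidth value is an arbitrary illustrative choice.

```python
import numpy as np

def kde_2d(points, grid_x, grid_y, bandwidth=0.3):
    """Sum a 2-D Gaussian kernel centred on each data point over a grid."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    density = np.zeros_like(gx)
    norm = 1.0 / (2 * np.pi * bandwidth ** 2 * len(points))
    for x, y in points:
        density += np.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2 * bandwidth ** 2))
    return norm * density

# Example: estimate the density of 2-D projected feature vectors.
rng = np.random.default_rng(2)
projected = rng.normal(size=(500, 2))
grid = np.linspace(-3, 3, 100)
density = kde_2d(projected, grid, grid, bandwidth=0.4)
print(density.shape)   # (100, 100); visualize e.g. as a contour or surface plot
```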
6 Examples of visualization of musical collections
This section presents examples in which methods of musical feature extraction, projection, clustering, and estimation have been applied to musical collections.
6.1 Pitch-class distributions and SOM
Pitch-class distributions can be used, for instance, to infer the key and the
mode of a piece of music. They also enable a more detailed analysis on the
importance of different tones in the musical material. Fig. 1 displays the
component planes of a SOM with 12 x 18 cells that was trained with the
pitch-class distributions of 2240 Chinese, 2323 Hungarian, 6236 German,
and 8613 Finnish melodies. The musical feature matrix used to train the
SOM thus had 19412 x 12 components. Each of the 12 subplots of the figure corresponds to one pitch-class, from C to B, and the colour displays
the value of the respective component in the cells' prototype vectors, the
red colour standing for a high value, and the blue colour for a low value.
For instance, the lower left region of the SOM contains cells with prototype vectors having high values for the pitch classes G and A. Consequently, melodies in which these pitch-classes are frequently used are
mapped to this region.
Fig. 1. The component planes of a SOM trained with pitch class distributions of
19412 folk melodies.
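The SOM of Fig. 1 was presumably trained with a dedicated toolbox; the sketch below only illustrates the basic on-line SOM learning rule on data of the same form (pitch-class distribution vectors). The grid size matches the 12 x 18 map described above, but the learning-rate and neighbourhood schedules, and the stand-in data, are illustrative assumptions rather than the settings of the original analysis.

```python
import numpy as np

def train_som(data, rows=12, cols=18, n_epochs=20, seed=0):
    """Minimal on-line SOM: returns prototypes of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    protos = rng.random((rows, cols, data.shape[1]))
    # Map coordinates of each cell, used by the neighbourhood function.
    grid = np.dstack(np.meshgrid(np.arange(cols), np.arange(rows)))
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in rng.permutation(data):
            frac = step / n_steps
            lr = 0.5 * (0.01 / 0.5) ** frac                 # decaying learning rate
            radius = (max(rows, cols) / 2) ** (1 - frac)    # shrinking neighbourhood
            # Best-matching unit: the cell whose prototype is closest to x.
            bmu = np.unravel_index(np.linalg.norm(protos - x, axis=2).argmin(),
                                   (rows, cols))
            # Gaussian neighbourhood around the BMU on the map grid.
            grid_dist2 = ((grid - np.array([bmu[1], bmu[0]])) ** 2).sum(axis=2)
            h = np.exp(-grid_dist2 / (2 * radius ** 2))
            protos += lr * h[:, :, None] * (x - protos)
            step += 1
    return protos

# Stand-in pitch-class distributions (rows sum to 1); replace with real data.
rng = np.random.default_rng(3)
pc = rng.random((300, 12))
pc /= pc.sum(axis=1, keepdims=True)
prototypes = train_som(pc, rows=12, cols=18, n_epochs=5)
print(prototypes.shape)   # (12, 18, 12): one component plane per pitch class
```

Slicing `prototypes[:, :, k]` gives the component plane for pitch class k, i.e. one of the twelve subplots of Fig. 1.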
Differences in the pitch-class distributions between the collections can
be investigated by visualizing the number of melodies that are mapped to
each cell. This is shown in Fig. 2. As can be seen, the melodies of each
collection largely occupy different regions of the map, suggesting
that there are significant differences in the pitch-class usage between these
collections.
Fig. 2. Number of melodies mapped to each cell of the SOM of Fig. 1 for each of
the four collections.
6.2 Metrical structure and PP
Most music exhibits a hierarchical periodic grouping structure, commonly
referred to as meter. The metrical structure of a piece of music can be represented by, for instance, an autocorrelation-based function (Brown 1993,
Toiviainen & Eerola 2006).
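As a rough illustration of the idea, the sketch below computes an autocorrelation vector from a duration-weighted onset signal. The sampling grid, accent weighting, and number of lags are simplified stand-ins and not the exact method of Toiviainen & Eerola (2006); the example melody is hypothetical.

```python
import numpy as np

def metrical_autocorrelation(onsets, durations, resolution=0.25, n_lags=32):
    """Autocorrelation of a duration-weighted onset signal.

    onsets and durations are in quarter-note units; resolution is the
    sampling grid (here one sixteenth note); returns n_lags coefficients
    for lags of 1 ... n_lags grid points."""
    length = int(np.ceil((max(onsets) + max(durations)) / resolution)) + 1
    signal = np.zeros(length)
    for t, d in zip(onsets, durations):
        signal[int(round(t / resolution))] += d   # durational accent
    signal = signal - signal.mean()
    denom = np.dot(signal, signal)
    return np.array([
        np.dot(signal[:-lag], signal[lag:]) / denom for lag in range(1, n_lags + 1)
    ])

# Hypothetical melody in 3/4: onsets on every beat, long note on each downbeat.
onsets = np.arange(0, 12, 1.0)                   # 12 quarter-note beats
durations = np.tile([1.5, 0.75, 0.75], 4)        # accent pattern of a 3/4 bar
acf = metrical_autocorrelation(onsets, durations)
print(acf.round(2))   # peaks at the lags of one beat and one 3/4 bar
```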
Fig. 3a displays a visualization of metrical structures in a collection of
Finnish folk melodies. To obtain the visualization, 8613 melodies from the
Digital Archive of Finnish Folk Tunes (Eerola & Toiviainen 2004b) were
subjected to autocorrelation analysis, using the method of Toiviainen &
Eerola (2006). This resulted in 32-component autocorrelation vectors, representing the metrical structure of each melody in the collection. Subsequently, PP was applied to these vectors, and kernel density estimation to the resulting projection.
Fig. 3. Visualization of metrical structures in (a) the Digital Archive of Finnish
Folk Tunes, and its (b) Folk songs and (c) Rune songs subcollections.
The obtained probability density shows an interesting structure, with
three arms growing from the central body. Inspection of the projection directions suggests that the three arms can be associated with the 2/4, 3/4,
and 5/4 meters. Probability densities for the folk song and rune song subcollections (Figs. 3b-c) imply differences between the distributions of meters within these subcollections.
6.3 Melodic contour and SOM
Melodic contour, or the overall temporal development of the pitch height,
is one of the most salient features of a melody (Dowling 1971). Some melodic contour shapes have been found to be more frequent
than others. For instance, Huron (1996) investigated the melodies of the
Essen collection and found that an arch-shaped (i.e. ascending pitch followed by descending pitch) contour was the most frequent contour form in
the collection. The SOM can be used to study and visualize typical contour
shapes. Fig. 4 displays a SOM with 6 x 9 cells that was trained with 64-component melodic contour vectors. The material consisted of 9696 melodic phrases from Hungarian folk melodies and 13861 melodic phrases
from German folk melodies. The musical feature matrix thus had 23557 x
64 components. The prototype vectors of the obtained SOM are displayed
in Fig. 4. As can be seen, the arch-shaped contour is prevalent on the right
side of the map, but the left side of the map is partly occupied by descending and ascending contours.
Fig. 4. The prototype vectors of a SOM trained with 23557 melodic contour vectors.
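A generic way to obtain fixed-length contour vectors is to sample the pitch of the sounding note at equally spaced time points across a phrase; the sketch below does this with 64 points. It is an illustrative simplification, not necessarily the exact contour representation used for Fig. 4, and the example phrase is invented.

```python
import numpy as np

def contour_vector(onsets, durations, pitches, n_points=64):
    """Sample the pitch of the sounding note at n_points equally spaced
    time points over the phrase, giving a fixed-length contour vector."""
    onsets = np.asarray(onsets, dtype=float)
    offsets = onsets + np.asarray(durations, dtype=float)
    times = np.linspace(0, offsets[-1], n_points, endpoint=False)
    # Index of the last note that has started by each sample time.
    idx = np.searchsorted(onsets, times, side="right") - 1
    contour = np.asarray(pitches, dtype=float)[idx]
    return contour - contour.mean()              # remove overall pitch level

# Hypothetical arch-shaped phrase (MIDI pitches, quarter-note durations).
onsets    = [0, 1, 2, 3, 4, 5, 6, 7]
durations = [1, 1, 1, 1, 1, 1, 1, 1]
pitches   = [60, 62, 64, 67, 67, 64, 62, 60]
print(contour_vector(onsets, durations, pitches).round(1))
```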
To compare the distribution of contour types between the two collections, the number of melodic phrases mapped to each cell can again be displayed. This is done in Fig. 5. As can be seen, the arch-shaped contour
types are somewhat more prevalent in the German collection than in the
Hungarian, whereas the opposite holds true for the descending contour
types.
Fig. 5. Number of melodic phrases mapped to each cell of the SOM of Fig. 4 for
both collections.
6.4 Spatial estimation of musical features
If a musical database contains precise information about the geographical
origin of each musical piece, geographical variation of musical features
can be studied by applying methods of spatial estimation. Aarden and
Huron (2001) created visualizations of the geographical variation of various musical features in the Essen collection. The Digital Archive of Finnish Folk Tunes contains detailed geographical information about the origin of each tune. Fig. 6 shows visualizations obtained using this
information and kernel density estimation.
Fig. 6. (a) The proportion of melodies in minor mode in different regions of Finland. The red colour denotes a high proportion and the blue colour a low proportion. (b) The proportion of melodies starting with the tonic.
Fig. 6a displays the geographical variation of the proportion of melodies in
minor mode in the Folk song subcollection (N = 4842). As can be seen, melodies in minor are significantly more prevalent in the northeast than they are
in the southwest. Fig. 6b displays the proportion of melodies that start with
the tonic. The highest proportion of such melodies is in the western part of
the country.
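In the spirit of Fig. 6, the sketch below kernel-smooths a binary feature (here, "melody is in minor mode") over map coordinates to obtain a spatially varying proportion. The coordinates, bandwidth, and data are hypothetical, and the method shown is a generic kernel-weighted proportion estimate rather than the exact procedure used for the figure.

```python
import numpy as np

def spatial_proportion(coords, is_minor, grid_xy, bandwidth=50.0):
    """Kernel-smoothed proportion of a binary feature over map coordinates.

    coords:   (N, 2) array of (x, y) locations of the melodies
    is_minor: (N,) array of 0/1 flags
    grid_xy:  (G, 2) array of locations at which to evaluate the proportion"""
    d2 = ((grid_xy[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * bandwidth ** 2))        # Gaussian weight per melody
    return (w * is_minor).sum(axis=1) / w.sum(axis=1)

# Hypothetical example: minor mode more common toward the "northeast" corner.
rng = np.random.default_rng(4)
coords = rng.uniform(0, 500, size=(1000, 2))          # map coordinates in km
p_minor = 0.2 + 0.6 * (coords[:, 0] + coords[:, 1]) / 1000.0
is_minor = (rng.random(1000) < p_minor).astype(float)
grid = np.array([[100.0, 100.0], [400.0, 400.0]])     # southwest vs northeast
print(spatial_proportion(coords, is_minor, grid).round(2))
```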
7 Conclusion
This article has provided an overview of visualization methods in comparative music research. The application of computational methods to the
investigation of large musical collections has the potential to afford insights into the material that would be difficult to obtain through manual
analysis of musical notations, or aural analysis of recorded material. It also
avoids the pitfalls of traditional methods by allowing one to study larger,
and thus more representative, sets of musical material with objective
methods.
Exploratory investigation of properly visualized collections may help to
discover interesting structures, such as clusters, trends, correlations, and
associations in various musical feature dimensions. These can in turn generate
hypotheses for further studies, in which additional methodologies can be
used.
References
Aarden B, Huron D (2001) Mapping European folksong: Geographical localization of musical features. Computing in Musicology 12:169-183
Barlow SH, Morgenstern S (1948) A Dictionary of Musical Themes. Crown Publishers, New York
Brown JC (1993) Determination of meter of musical scores by autocorrelation.
Journal of the Acoustical Society of America 94:1953–1957
Cooper M, Foote J (2002) Automatic Music Summarization via Similarity Analysis. In Proceedings of the 3rd International Conference on Music Information
Retrieval (ISMIR 2002), pp. 81-85
Dixon S, Pampalk E, Widmer G (2003) Classification of dance music by periodicity patterns. Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR'03), pp. 159-165
Dowling WJ (1971) Recognition of inversions of melodies and melodic contours.
Perception & Psychophysics 9:348-349
Eerola T, Toiviainen P (2001) A method for comparative analysis of folk music
based on musical feature extraction and neural networks. In: H Lappalainen
(ed) Proceedings of the VII International Symposium of Systematic and
Comparative Musicology and the III International Conference on Cognitive
Musicology. University of Jyväskylä
Eerola T, Toiviainen P (2004a) MIDI toolbox: MATLAB tools for music research.
University of Jyväskylä, available at: http://www.jyu.fi/musica/miditoolbox
Eerola T, Toiviainen P (2004b) The Digital Archive of Finnish Folk Tunes. University of Jyväskylä, Jyväskylä, available at: http://www.jyu.fi/musica/sks
Friedman JH (1987) Exploratory projection pursuit. Journal of the American Statistical Association 82(397):249-266
Goto M, Hashiguchi H, Nishimura T, Oka R (2002) RWC Music Database: Popular, Classical and Jazz Music Databases. In Proceedings of the 3rd International Conference on Music Information Retrieval (ISMIR 2002), pp. 287-288
Huron D (1995) The Humdrum Toolkit: Reference Manual. Center for Computer
Assisted Research in the Humanities, Menlo Park, CA
Huron D (1996) The melodic arch in Western folksongs. Computing in Musicology 10:3-23
Hyvärinen A, Karhunen J, Oja E (2001) Independent Component Analysis. John
Wiley & Sons, New York
Juhász Z (2000) Contour analysis of Hungarian folk music in a multidimensional
metric-space. Journal of New Music Research 29(1):71-83
Klapuri A (2005) Automatic music transcription as we know it today. Journal of
New Music Research 33(3):269-282
Kohonen T (1995) Self-organizing maps. Springer-Verlag, Berlin
Leman M, Lesaffre M, Tanghe K (2000) The IPEM toolbox manual. University of
Ghent, IPEM
Marillier CG (1983) Computer assisted analysis of tonal structure in the classical
symphony. Haydn Yearbook 14:187-199
Pampalk E, Dixon S, Widmer G (2004) Exploring music collections by browsing
different views. Computer Music Journal 28(2):49-62
Ponce de León PJ, Pérez-Sancho C, Iñesta JM (2004) A shallow description
framework for music style recognition. Lecture Notes in Computer Science
3138:876-884
RISM (1997) Répertoire international des sources musicales: International inventory of musical sources. In Series A/II Music manuscripts after 1600 [CD-ROM database]. K. G. Saur Verlag, Munich
Schaffrath H (1995) The Essen folksong collection in kern format [computer database]. Edited by D Huron. Center for Computer Assisted Research in the Humanities, Menlo Park, CA
Silverman BW (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall, London
Toiviainen P, Eerola T (2006) Autocorrelation in meter induction: The role of accent structure. Journal of the Acoustical Society of America 119(2):1164-1170
Vos PG, Troost JM (1989) Ascending and descending melodic intervals: statistical findings and their perceptual relevance. Music Perception 6(4):383-396