A Musical Regular Expression Matching
System for Relational Databases
John Lane
William Punch
June 21, 2001
Abstract

We present a large scale model for a database of multi-track polyphonic songs searchable via a modified regular expression format and stored in a relational database. The regular expression format is of our own design and will likely be familiar to those with experience using common text search utilities. The relational database is a standard, off-the-shelf SQL-based database management system. The songs have been decomposed into a relational model and searching is done via a translation between our query language and SQL.

Keywords: musical regular expression, relational database

Contents

1 What is needed?
2 State of the Art
3 What could be?
   3.1 String Matching
   3.2 Transposition invariance
   3.3 Rhythmic-notational and Tempo invariance
   3.4 The Case for Clusters
   3.5 Regular Expressions
   3.6 Relational Schema
   3.7 Dealing with MIDI
4 The Gory Details
   4.1 Relational Schema, Part II
   4.2 Breaking up a Query
       4.2.1 Class Definitions
   4.3 Generating SQL Queries
   4.4 Indexes
   4.5 Computational Analysis
5 Results
6 Conclusion
7 Appendix

1 What is needed?

Musical analysis, algorithmic composition and even query-by-tune have great music cognition needs that parallel those of text cognition. Text cognition is based heavily on language theory, and text cognition systems are rooted in regular languages and the regular expressions which alternately recognize or generate them. Viewing music as a language, if we hope to move beyond simple matching of music and into cognition of it, we must begin with the application of language theory. The simplest set of languages are the regular languages – those recognized by finite state automata; on these our search system will be based. [9]

Vast amounts of data are tamed on a daily basis by relational databases and the relational models which back them. We show here that these same relational models allow for quick searching and much flexibility in the case of large databases of music as well. Combining these ideas, we present here a regular expression query language for searching of multi-track, polyphonic music and a relational database to provide a quick and efficient backend for it.

The paper is constructed as follows. We begin in the next section by discussing some existing search systems and the various query and database formats these systems use. In section three we discuss possibilities for query and database schemes, present the construction of our own and prove various properties of it. Section four contains the technical details of our query language and database system, and proves various claims regarding each. Section five presents empirical results from our implemented system, and section six concludes.
2 State of the Art

There are several music search systems and prototypes in existence today, including Semex [1], Meldex [5], Themefinder [6] and several others lacking short monikers [3].

For purposes of comparison, we restrict our attention to those which parallel our use of digital scores and our handling of polyphony, not considering analog-based systems and those which disregard polyphony, as these systems tend to be quite different internally. We will focus on each system's query format, database schema, and respective advantages and disadvantages. These restrictions set our focus on Semex [1] and a system by Uitdenbogerd & Zobel [3].

Semex's query format consists of a string of lettered pitches which are in turn converted into a lettered pitch and subsequent pitch intervals (the distances between pitches). Semex's database scheme is based around a "fast bit-parallel" algorithm for what amounts to a computation of the edit distance (the number of changes necessary to transform one string into another) between a potential match and a given pattern. Strings (of pitches) having the smallest edit distance to a given pattern are considered best matches. The database itself is an index of various score files residing in memory. Polyphony is handled in one of two ways: either by reducing any query or piece to monophony by only considering the highest pitch of any would-be chord, or by successive, combined monophonic searches. Semex has the advantage of being small and fast, but currently lacks a proper long-term storage and indexing method. Please refer to [1] for further details.
The Uitdenbogerd & Zobel system presents a new method for indexing melodies and handles polyphonic indexing by simply considering the highest pitch of any given chord (as was the case with the first of Semex's methods). This method has the distinct and obvious disadvantage of returning many false matches to any given query. Please see [4] for more details.
As best as can be determined, neither prototype handles the encoding of rhythmic information such as note durations or tempos.

We make a number of observations and conclusions. First, only Semex handles polyphony fully (via its second method of repeated pitch matching) while maintaining transposition invariance, and Semex is only able to do so in an awkward manner. Thus, polyphony presents a special challenge (as we will see again in section 3.4), particularly for database schemas using relative pitches and/or durations.

Second, both provide entirely separate schemas and sets of algorithms for exact matching and inexact (fuzzy) matching, thus limiting the user to one or the other, or forcing the database maintainer to implement both. Ideally, query languages should be flexible enough to handle a wide range of uses, including exact and inexact matching and polyphony.

Lastly, some manner of long-term indexing is necessary to manage data extraction in large music databases. Further, industrial size (every piece of music ever written, for example) requires industrial strength storage, retrieval and indexing systems, and this is exactly the reason relational databases were developed. To use a relational database, however, one must first develop an efficient search system in the relational model. We will address this more fully in section 3.6 and refine this idea for practical use in section 4.1.
3 What could be?

3.1 String Matching
Application of modern string matching techniques to music is well-covered in the literature. [2] Two of the more popular techniques (mainly so because they allow sub-linear time matching of multiple strings to a pattern) are the Aho-Corasick tree-based matching technique and various suffix tree-based methods. A good treatment of these two and more is available in [7].
These two, in their general forms, require a memory model with pointers (in order to form a tree structure). Though there exist non-pointer-based variants (e.g., suffix arrays), these are not nearly as flexible. Unfortunately, pointer-based structures are inefficiently represented in a relational model. For example, a tree can be represented in a table, each row being a character (or a word). However, assuming a standard index on the table, a traversal of one node now requires O(log(n)) time, where n is proportional to the number of rows in the table (i.e., the number of nodes in the tree). (We will cover indexes in more depth in section 4.4.)
This means an average of O(log²(n)) to traverse the tree stored in a table. If N is proportional to the number of characters (or words) of input encoded in the tree, given a suffix tree's O(N²) tree size, the O(log²(n)) search time is O(log²(N²)) or, asymptotically, O(log²(N)). This is actually worse than simply indexing a linked list structure and finding a match using it! This simpler method would amount to O(k ∗ log(N)), where k is the length of the desired match.
When representing music within a relational database, pointer-based structures can generally be said to be inefficient in terms of storage space and processing time. This fact removes many pointer-based algorithms from our arsenal of potential solutions.

Object-oriented databases provide hope for long-term storage and quick (index-based) retrieval of these types of pointer-based structures, and this may well prove to be a fruitful area of research. [8] However, the current generation is highly priced and generally consists of add-ons to existing RDBMS packages. We restrict our focus in this paper, therefore, to standard RDBMSs.
3.2 Transposition invariance

Transposition invariance, or providing for the matching of queries to transposed pieces, is, in all cases studied above, accomplished through the storage of relative pitches – as changes in pitch from a previous (or subsequent) note.

This works wonderfully for monophony. However, the introduction of polyphony raises the question of which pitch another should be relative to, since pitches may have the same onset time and could have different durations. One could sort the pitches for a given onset and accept a vector of relative pitches for that onset to the next, but even after one handles the problem of mismatched set sizes (between onsets), this would prevent partial chord matches because any addition or deletion from a set effectively changes its order and thereby changes the vector of relative pitches representing it.

Another possibility is to perform, as above, repeated searches of all possible transpositions. If the system only encodes 12 different pitches (one octave), this becomes much more feasible. We make the assumption henceforth that this is how transposition invariance is handled. Instead of pushing this complexity into the search engine itself, this places it on the outside, keeping the engine as simple and fast as possible.

3.3 Rhythmic-notational and Tempo invariance

Rhythmic-notational invariance simply implies invariance over the different notations for encoding beat divisions (e.g., different time signatures). We assume henceforth that this is handled by encoding note durations as divisions of a beat (instead of a "note").

Tempo invariance, or providing for the matching of queries to pieces which would match if their tempos were modified (i.e., if all their durations were scaled by some rational number), is more difficult. It can be provided for in the same manner as transposition invariance – using relative durations. This, of course, suffers the same problems described above for transposition invariance when polyphony is allowed.

We therefore henceforth assume a similar scheme in which all reasonable tempo multiples are tried until a match succeeds (and while the query agent deems this step useful). The lack of different possibilities for non-beat-transforming tempo variation (generally only a doubling or halving of beat value is reasonable, since each requires a doubling or halving of the tempo, which becomes quite unnatural) makes this a reasonable solution.
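To make this outside-the-engine approach concrete, the following is a minimal sketch (our own illustration, not code from the system) of how a query agent might enumerate candidate queries: all 12 octave-invariant transpositions of a query cluster, each at the original, doubled and halved tempos. The data layout here is an assumption for illustration only.

use strict;
use warnings;

# One query cluster: a list of [pitch, duration] pairs, with pitches 0-11
# (octave-invariant, as assumed above) and durations in beat units.
my @cluster = ([0, 120], [4, 120], [7, 120]);    # e.g., a one-beat triad

my @candidates;
for my $semitones (0 .. 11) {                    # all 12 transpositions
    for my $scale (1, 2, 0.5) {                  # original, doubled, halved
        push @candidates,
            [ map { [ ($_->[0] + $semitones) % 12, $_->[1] * $scale ] }
                  @cluster ];
    }
}
# Each candidate would then be translated to SQL and tried in turn,
# for as long as the query agent deems the step useful.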
3.4 The Case for Clusters

We now lay out our structure for storing polyphonic music data. We begin with monophony and extend the structure for handling it to handle polyphony. Monophony is a restricted form of polyphony in which a set (or cluster) of cardinality one, Cb, of notes from a set of pitches P (depending upon the scale being used) occurs in a piece at a beat offset b, where b is from the set Q of rational numbers. Notes are generally (pitch, duration) two-tuples, but since Cb contains all the information needed regarding the onset time of the note, the duration portion of the tuple is unnecessary. A set of notes (or single-note note onset clusters) Cb, ordered by their offsets, defines a piece.

In other words, monophony requires only a set of (pitch, offset) pairs (or an ordered set of (pitch, duration) pairs) to form a piece. Polyphony introduces additional layers of complexity.

Polyphony is unrestricted in that at any beat offset b, any set Cb of any number of (pitch, duration) two-tuples (pitch from P and duration from Q) may occur. Again, a set of clusters Cb, ordered by their offsets, defines a piece. In other words, polyphony requires the introduction of an extra layer of complexity: a set of (pitch, duration) two-tuples (as defined above) can begin at any piece offset. Here the duration portion of the two-tuple is necessary to provide the needed freedom for note overlap, whereas this was not necessary in the monophonic case.

We assume henceforth that P is a set of non-negative integers, less than 128, mapping to a pitch in the equitempered scale as defined by the MIDI standard [10], and that the durations are represented by positive integers using a scheme in which the number 120 has the value of one beat (again, taken from MIDI). Each of these conveniences limits the space of possible pitches, durations and therefore note expressions, but as can be seen within MIDI itself, the limitation is not significant.

These sets of note onsets, or note onset clusters, as defined above are a simple way to manage polyphony while maintaining full expressiveness.
3.5 Regular Expressions

To describe music as a language of emotion is nearly a cliché; even so, as so many have already stated, music is heavily based on grammatical productions. While music does differ significantly from text and demands its own treatment, it still seems only natural that, if progress in music cognition is to achieve anything of what text cognition has, language theory must be applied to it in a meaningful way. Regular expressions, or languages generated by Chomsky type three grammars [9], form the basis of modern language processing and analysis.

We create a regular expression query language for music by successive transformation of a known regular expression format for character or text recognition, as follows. We begin with the following grammar:
start:          pattern_chain(s)
pattern_chain:  or_pattern(s) |
                pattern(s)
or_pattern:     pattern(s) '|'
                pattern(s)
pattern:        '(' pattern_chain ')' qfier |
                '(' char_chain ')' qfier
char_chain:     char(s)
qfier:          '{' int ',' int '}' |
                ''
char:           /[a-zA-Z]/
int:            /[0-9]+/

A typical expression might be of the form:

((a){0,.}|(b){1,.}(c){1,.}){1,.}

corresponding to the following regular expression:

(a*|b+c+)+

The above language (grammar) for generating regular expressions is useful to us if and only if it is capable of generating a set of regular expressions which is in turn capable of generating any regular language. We set about proving that it is capable of this by first defining just what a regular expression is, and then moving on to prove that it is indeed capable of generating any such expression.

A recursive definition from a standard undergraduate theory textbook [9] for regular expressions over an alphabet Σ and their corresponding languages is as follows:

1. ∅, corresponding to the empty language
2. Λ, corresponding to the language {Λ}
3. For each symbol a ∈ Σ, a, corresponding to the language {a}
4. For any regular expressions r and s over Σ, corresponding to languages Lr and Ls, respectively, each of the following is a regular expression over Σ, corresponding to the language indicated:
   (a) (rs), corresponding to LrLs (concatenation)
   (b) (r + s), corresponding to Lr ∪ Ls (union)
   (c) (r*), corresponding to Lr* (repetition)
5. Only those expressions generated by one of the above are regular expressions over Σ.
The first two rules are useful in proving theorems about regular languages but not as useful in generating them. The first case is handled by any expression of the form (){0,0}. Not very useful, but nevertheless conformant. The second case is handled by any expression of the form (){1,1}. Again, not very useful, but still obtainable using the grammar. The third is handled by any expression of the form (a){1,1}. Part (a) of the fourth, or concatenation, is handled by the form (r){1,1}(s){1,1}. Part (b), or union, is handled by the form (r){1,1}|(s){1,1}. Part (c), or repetition, is given by (r){0,.}.

All four necessary components of the definition having been satisfied (having been generable), it must be that the given language is capable of generating an expression language equivalent to that generated by any regular expression; syntax may differ, but the set of generable languages is equivalent.

The flexibility of the {min,max} notation is obviously not strictly necessary for a regular expression, but becomes useful particularly when dealing with highly repetitive data such as that represented in music.

One possible grammar for cluster (as defined above), note, pitch and duration representation is as follows:

cluster:        '[' note(s) ';' duration ']' |
                '[' note(s) ']'
note:           '<' pitch ',' duration '>'
pitch:          /[a-gA-Gs]+/
duration:       int '/' int |
                int |
                /(\.)/
int:            /[0-9]+/

Plugging into our regular expression grammar above and modifying slightly to handle clusters instead of characters yields the following:

start:          pattern_chain(s)
pattern_chain:  or_pattern(s) |
                pattern(s)
or_pattern:     pattern(s) '|'
                pattern(s)
pattern:        '(' pattern_chain ')' qfier |
                '(' cluster_chain ')' qfier
cluster_chain:  cluster(s)
qfier:          '{' int ',' int '}' |
                ''
cluster:        '[' note(s) ';' duration ']' |
                '[' note(s) ']'
note:           '<' pitch ',' duration '>'
pitch:          /[a-gA-Gs]+/
duration:       int '/' int |
                int |
                /(\.)/
int:            /[0-9]+/

As a simple extension of the above regular expression grammar, it itself must also be capable of generating the same set of regular expressions, this time for music (for clusters). It also allows for the full expressibility of clusters and since, as shown above, clusters in turn allow for the full expressibility of any polyphonic piece, the language must also therefore be fully musically expressible.
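For illustration, a hypothetical query of our own construction conforming to this grammar – a one-beat C major triad cluster occurring once or twice in succession – might be written:

([<c,1><e,1><g,1>;1]){1,2}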
3.6 Relational Schema

Relational databases require a relational schema. Entity/Relationship diagrams are a common, readily-understood form of representation for these schemas, and we present such a diagram in figure 1. [8]

[Figure 1: E/R Diagram for Cluster-based Music Database – the entities Note, Cluster, Track and Piece, joined by the relationships Cluster-Notes, Track-Cluster and Piece-Track]

We begin with our "bag" of entities: Note, Cluster, Track, and Piece, each with their respective attributes. We then define three relations joining the four entities. As in the diagram in figure 1, a note can relate to any number of clusters; a cluster must have at least one note, but can relate to as many as necessary. A cluster relates to at least one track, and as many as necessary; a track relates to at least one cluster and as many as necessary. A track relates to one and only one piece, and a piece must have at least one track, but can have as many as needed. The (candidate) keys of each entity are their underlined attributes. Decomposition into tables from the diagram is fairly simple, and is shown in figure 2, with the corresponding dependencies shown in figure 3.

Note:           id, pitch, duration
Cluster:        id, note
Track-Cluster:  onset, track, piece, cluster
Track:          id, piece, name, patch, tempo
Piece:          id, title

[Figure 2: Decomposed Tables for Cluster-based Music Database]

id (piece) → title
piece (id), id (track) → name, patch, tempo
onset, track (id), piece (id) → cluster
id (note) → pitch, duration
id (cluster) → note

[Figure 3: Dependencies for Cluster-based Music Database]

Since there exist no multi-valued dependencies, full decomposition requires that the tables be in the so-called Boyce/Codd Normal Form, or BCNF [8]. First normal form is trivially reached: clearly in every legal value of every relvar, every tuple contains one value. Second normal form is already obtained because every non-key attribute is irreducibly dependent on the keys. The set of relvars is in third normal form because each non-key is directly dependent upon the key. Finally, the set is in BCNF because the determinants of the relations are the candidate keys.

With our decomposed tables in hand, we move to dealing with data to populate them.
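As a minimal sketch of how the figure 2 decomposition might be realized on a standard SQL DBMS (ours, not the system's actual DDL; all column types are assumptions), issued through the Mdb interface described in the appendix:

use strict;
use warnings;

# $mdb: an opened Mdb handle (see the appendix); types are our guesses.
my @ddl = (
    'create table note          (id int, pitch int, duration int)',
    'create table cluster       (id int, note int)',
    'create table track_cluster (onset int, track int, piece int, cluster int)',
    'create table track         (id int, piece int, name varchar(255), patch int, tempo int)',
    'create table piece         (id int, title varchar(255))',
);
$mdb->run_non_select($_) for @ddl;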
3.7 Dealing with MIDI

In any music database the encoding of notes and durations is of major import; indeed, it is this very encoding which determines the search properties of the system (for example, the transposition and tempo invariance properties discussed in the sections above). Because our database will use MIDI [10] files as input, we begin by explaining our format choices and the problems inherent in dealing with MIDI.

The MIDI format for encoding note event durations is based upon a tick value. There exists some (modifiable) number of microseconds per tick. These ticks form the basis of another unit: beats; the number of ticks per beat is also controllable but is generally (simply by convention) set to be 96. Higher tick-per-beat values (such as 96) provide sufficient resolution to allow for widely varying note durations and subsequent musical expressive ability.

Beats, in turn, are the underlying value for music encoded in common Western musical notation, with note durations being formed by beat multiples. The beat notation is a useful one for an internal duration since it is notation (time signature) invariant.

However, most MIDI music on the web is "played" at a keyboard, meaning that there is some human being pushing keyboard keys to produce MIDI events on a MIDI keyboard or synthesizer. Invariably the beat value played by a keyboard player on her keyboard will differ from its ideal notational equivalent; a high number of ticks per beat is allowed by MIDI for expressiveness (e.g., notes can be "held" or shortened for emphasis or effect). However, a musician's (emotional) "expressiveness" is simply imprecision from the perspective of one creating a music database, and this imprecision haunts us when we try to make sense of what was meant, notation-wise, by the human-generated events.

Various methods exist for dealing with this problem, but the most common in use today is known as note duration "quantization" and works by mapping each respective note duration of an existing piece onto a member of a smaller, fixed set of durations which correspond to well-known note durations (e.g., to a quarter note or a dotted thirty-second). This process is most often done using an approximation algorithm – one which chooses the note from the fixed set which is closest in duration to the value at hand. Musicologically-minded methods exist for eliminating some notes from the fixed set through the use of domain knowledge of what note durations could possibly appear in certain situations, but rules for doing so depend greatly on the type of music being evaluated. In order to handle any type of music, we will assume, henceforth, that a generic approximation-based quantization algorithm is used to some fixed, smallest duration value (e.g., a thirty-second note) without regard to the type of music being evaluated.
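A minimal sketch of such a generic approximation-based quantizer (our own, not the system's code), assuming the 120-units-per-beat duration scheme of section 3.4 with a thirty-second note (15 units) as the smallest value:

use strict;
use warnings;
use POSIX qw(floor);

# Snap a raw duration to the nearest multiple of the smallest unit.
sub quantize {
    my ($ticks, $smallest) = @_;
    $smallest ||= 15;                       # thirty-second note at 120/beat
    my $q = $smallest * floor($ticks / $smallest + 0.5);
    return $q > 0 ? $q : $smallest;         # never quantize to zero
}

print quantize(118), "\n";                  # 120: read as exactly one beat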
Clearly there will be times when such a system makes an error in its guess as to which duration was intended by the player, and this creates great problems for any music matching system based on such quantized data. We will assume henceforth that this quantization problem is solved in one of two ways. The first is simply allowing the user to enter a wildcard for a duration (i.e., a duration which matches any duration). The second is to allow the user to specify some 'fuzz' value for their duration match, such that if a given note duration within the database is within the given number of 'fuzz' units of the note duration being sought, it is allowed to match.
Having dealt with the more theoretical considerations of a musical regular expression
matching system, we now use these as we
turn our attention to implementation details
of our system.
4 The Gory Details
Full decomposition minimizes redundancy but does not necessarily maximize performance. Here we will introduce some alterations to the above schema to improve performance through techniques such as the addition of non-essential keys (e.g., identity values) and the creation of redundant tables or the storing of redundant values in different tables.

Note:     id, pitch, duration
Score:    id, offset, delta, track, piece, cluster, next
Cluster:  id, note, ncount
Track:    id, piece, name, patch, tempo
Piece:    id, title, md5sum, url, ntracks

[Figure 4: Decomposed Tables for Cluster-based Music Database]
4.1 Relational Schema, Part II

In the implemented version of our database we have chosen, for reasons of efficiency and ease of querying, the layout shown in figure 4. The changes mainly apply to what was the Track-Cluster table – what is now the Score table. We have added a field to uniquely identify any offset and corresponding cluster in any track.

Some changes were also made to the Piece table, mainly for the purposes of adding additional information since, practically speaking, some pieces are without a unique name. Finally, to at least attempt to avoid duplicate piece entries, we add an MD5 checksum on MIDI input files.
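A minimal sketch of that duplicate check at insertion time (ours; the Mdb calls and their return values are assumptions based on the appendix interface):

use strict;
use warnings;
use Digest::MD5;

# $mdb: an opened Mdb handle; $midi_file: path to the incoming MIDI file.
open my $fh, '<:raw', $midi_file or die "cannot open $midi_file: $!";
my $md5sum = Digest::MD5->new->addfile($fh)->hexdigest;
close $fh;

# Assumes run_select returns a reference to the matching rows.
my $seen = $mdb->run_select("select id from piece where md5sum='$md5sum'");
$mdb->insert($piece) unless @$seen;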
4.2 Breaking up a Query
Data insertion, given that the database can handle full polyphony, is straightforward and in deference to brevity will not be covered here. We move to the more important task of finding the result set of a query. Let us assume, for the purposes of explanation, that we have a populated database.

The first step is defining exactly what we would like from a result set. What we would like for each match is a (start, finish) two-tuple, both start and finish being a score id, as detailed above. While useful from an end-user (or higher level) perspective, this scheme also makes it easy to pass intermediate answer sets between query segments, allowing for a good deal of modularity.

The first step in searching is the parsing or breaking up of a query into searchable bits and pieces. This breaking up is most easily done using a parser generator (recursive descent or bottom-up) in concert with the grammar as laid out above, and appropriate data structures, in our case in the form of classes. Concise definition of these classes makes parsing the query straightforward.
4.2.1 Class Definitions
Most notably, each of the defined classes has a method named sql which is used by higher levels in the parse tree of the query string for producing SQL suitable for finding a match of its own instantiation in the database. To avoid clutter, the class hierarchy is printed in the appendix of this document; please refer to it there for the next section on SQL generation.
4.3 Generating SQL Queries
Building an SQL query from a query string is straightforward when viewed from the perspective of the parser. As alluded to in the previous paragraph, the parser instantiates objects of the appropriate class to represent objects encountered in the parse tree of a query string. When finished, it has a Query object with a single PatternChain in tow forming the root of a tree of instantiated classes representing the given query.

More concretely, in order to parse a query string, the Query object actually creates a Parser object and uses it to parse the given query string. If the Parser object is successful in building the tree of objects described above, it returns the aforementioned PatternChain object; SQL is obtained through its sql method, and the name of the table of final results is obtained via its table_name method. Thus, running the query is simply a matter of passing the generated SQL to the database via the Mdb class and performing the necessary action with the result set (e.g., user formatting).

Internally, the task of generating SQL can be somewhat complex but generally works using the interfaces mentioned directly above, with each object being responsible for its own SQL generation. Once formed into a tree by the parser, all one need do is traverse the tree correctly (depth first, in this case), calling the sql method and, where necessary, the table_name method to piece together successive queries.
We treat the respective classes here from the bottom (of a query) up, with a mind to cover enough implementation detail to provide a base for further work, but not so much as to unduly encumber the reader.

The ClusterChain class acts as a container for an ordered set of Clusters which must appear in series. ClusterChain produces SQL to generate a temporary table of (start, finish) pairs matching the chain, or series of clusters. It does this by calling its constituent Clusters' sql methods and using the results.

A Pattern acts as a container for anything that can appear within a query (within parentheses); that is to say, a series of PatternChains, a single ClusterChain, or a series of PatternOrs. Accompanying it is a (min, max) pair of integers noting that the given pattern must repeat at least min times but no more than max times in any given match. To produce a query (SQL) generating its own result set, it uses the sql method of its constituent PatternChain, ClusterChain and PatternOr objects to form a temporary table of (start, finish) tuples unique to it. Its SQL code essentially performs successive joins on the finish and start fields of its elements' solution tables, piecing together a new table with all solutions and storing this solution set in a new table unique to itself.

The PatternChain and PatternOr classes are the highest level subclasses of class Query, and similarly use the sql method of member Patterns to generate SQL code which generates solution sets (in temporary tables). The PatternChain might be more aptly named PatternAnd since it seeks to 'and' (chain) together a series of patterns using essentially the same successive join-type method as described above for Pattern. The PatternOr class exists for housing a series of Patterns in an 'or' clause. Its solution set is generated by simple successive unions of its constituent solution sets (those generated by its constituent Pattern objects) into a new solution set in a new temporary table.

The respective queries are passed up the tree during the afore-mentioned depth-first traversal in the form of a successively-appended array of SQL statements. The topmost object in the complete tree is queried by the Query class (which serves as a driver) for the table name of its solution set, which contains the final results. These results are then passed to other classes which format them for the requester.

In the prototype system, formatting consists of a conversion from the score id (start, finish) pair form to a form containing information regarding piece, track, and beat offset. Dynamic MIDI generation from the database itself is facilitated from the beginning of a match and presented to the user as a link on each result displayed on the results page.
4.4 Indexes
Indexes are what make relational databases usable. Deciding where to place indexes without wasting space can be a difficult task, and we do so here using the most common method – by considering which fields are being used for selecting rows from given tables, and which fields are used to join which tables. What follows is also covered in the next section, which treats the computational complexity of query segments.

From bottom to top, we detail here what types of statements are executed during the running of a Query, at each step detailing the index we have chosen to create as a result and the reason for it.

First, Clusters search for their constituent elements via a join of the Cluster and Note tables on the note field of the Cluster table and the id field of the Note table. The Cluster table can grow to be quite large (theoretically as large as the Score table itself), but indexes on the id, pitch and duration fields of the Note table, combined with an index on the note and ncount columns of the Cluster table, provide reasonable access to clusters. Without this, because of the computationally intense nature of the effective set equality test being done between our query set of notes and the set of all notes, even exact match searches (non-wildcarded pitch and durations) could require an inordinate amount of time.

Next, ClusterChains retrieve matching (start, finish) pairs from the Score table by selecting on the cluster and delta fields, and we therefore index the Score table on those fields, lest we be forced to perform an entire scan of that table.

Finally, the temporary tables passed between Pattern, PatternChain, PatternOr and Query are indexed on both start and finish fields, as successive queries use both fields separately, and the size of these tables could potentially be quite large (theoretically, the entire Score table).
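A minimal sketch of the indexes just described (ours; the index names and exact dialect are assumptions), again issued through the Mdb interface:

use strict;
use warnings;

# $mdb: an opened Mdb handle (see the appendix).
my @index_ddl = (
    'create index notes_fields  on notes (id, pitch, duration)',
    'create index clusters_note on clusters (note, ncount)',
    'create index score_cluster on score (cluster, delta)',
);
$mdb->run_non_select($_) for @index_ddl;
# The temporary result tables are likewise indexed on start and finish.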
4.5 Computational Analysis
Given the heavy reliance upon indexes here, their tendency to obfuscate results from the user, and the need to compare this method with others on a more theoretical level, we close this section with a computational analysis of the elements of a query on our database, covering the indexes and their role.

The main database indexing tools buried within most modern relational databases are hashes and B+ trees. Within our database we have used B+ tree-based indexes; they generally provide O(log(n)) access time to rows of their tables when provided their member attributes. [8]

Despite the varied number of queries that are possible within the above musical regular expression grammar, there are essentially only two types of queries performed on the database. We have seen a bit of both already in section 4.4; the first involves finding all clusters containing a given set of notes. The second involves determining which (finish, start) pairs (from two different tables) are sufficiently adjacent to be joined and passed up as part of another result set.

The first type of query is the more complex of the two and we will deal with it first. The query amounts to a set equality test between the set of pitches within a cluster, in a given query, and any set of pitches in any cluster in the database. It is, on the average, the most computationally expensive single query made, and there is likely room for some improvement with respect to it in the database layout and query structure.
A typical query of this type looks something like this:

select id, id+<num clusters> from
score where cluster in
2.( select id from clusters C
    where exists
1.( select C2.id from notes N,
      clusters C2 where
      pitch=<pitch>
      and duration=<duration>
      and C2.note=N.id
      and C2.ncount=<num notes>
      and C2.delta=<delta>
      and C2.id=C.id )
  [ and exists (<as
      previous exists clause>)])
  [ and next in <as
      previously nested clause>])

The inner-most subselect (labeled 1.) finds all clusters which have one of the given notes in them. Just outside of that subselect is another (labeled 2.) which selects the clusters which match all the given pitches. At this point in the query we have found, if one exists, a cluster (or set of clusters in the case of a wildcard search) which has the given size, the given set of notes and the given delta value.

From here, we can attempt to find consecutively occurring clusters. This step is performed by the outer-most select (select id, id+<num clusters> ...). Each cluster is found via a subselect like the one detailed in the previous paragraph (the one labeled 2.). We now look in the Score table for entries that match the given set of returned (from the subselect) cluster id's. Using the next field of the Score table, "chained" or consecutive cluster entries in the score can be found.

Let DN, DC, and DS be the number of notes, clusters and score items stored in the database, respectively. Let QN and QC have like meaning for the number of notes and clusters in a given query. (For multi-cluster queries, let QN denote an average number of notes per cluster.) Let us further assume that for all queries made as above, the respective tables have indices built on the sets of fields on which they are being selected (i.e., all selects go through an index, as we have already prescribed, for data retrieval).

Query number 1 performs a join on the Clusters and Notes tables. Computationally, given the afore-mentioned index on each, this amounts to a pairing of each item selected from each table with each other, or a query of order O(log(DN) ∗ log(DC)). This is done once for each note in the cluster and therefore becomes O(log(DN) ∗ log(DC) ∗ QN).

Let R1 be the number of clusters that match query one. Clearly, 0 ≤ R1 ≤ DC. Query number 2 performs a select on the Clusters table for each of the R1 results. This translates to a retrieval time of O(log(DC) ∗ R1).

Lastly, let R2 be the size of the result set of query 2. As above, 0 ≤ R2 ≤ DC. The outer-most select selects from the Score table once for each result. This translates to an answer retrieval time of O(log(DS) ∗ R2).

Putting this all together we arrive at
O(log(DN) ∗ log(DC) ∗ QN + log(DC) ∗ R1 + log(DS) ∗ R2). This is difficult to deal with because the sizes of the result sets determine what drives the expression the most. The worst case is quite poor: O(log(DN) ∗ log(DC) ∗ QN + log(DC) ∗ DC + log(DS) ∗ DC). The average case is very difficult to speculate on without more empirical evidence from the search domain. Both R1 and R2 have very high bounds (the size of the Clusters table), and these bounds are not unattainable – especially for poorly formed wildcard queries.
Unfortunately, the second type of query suffers similarly – it is completely based upon the results of a user query. It is, however, much simpler:
select T1.start, T2.finish from
<table name> T1, <table name> T2
where T1.finish=(T2.start-1)
Let R1 and R2 here denote the sizes of the tables labeled T1 and T2, respectively. Again, assuming an index on the start and finish fields of each table, this amounts to a join of the two tables on the finish field of one and the start field of the other.
The only problem with this query is that table T2 is not indexed on (T2.start-1), but that is what is requested of it, and so a full (indexed) scan must be performed. It would be trivial to modify the construction of the internal tables such that the start value is actually one less than the real start value of the score id in question. It would also be trivial to store both values and index one of them. Either of these options would result in a data retrieval time of O(log(R1) ∗ log(R2)).
Without these tricks, however, the analysis is somewhat more complex. Different databases handle this differently, but one common method is to perform the scan on T2, retrieve arithmetically altered values from it, and use T1's index to retrieve matching values out of the join. This results in O(log(R1) ∗ log(R2) + R2).
5 Results
Though our generated SQL queries are standard enough to be capable of running on Sybase and Oracle, we have implemented the system on a Pentium III 750 MHz system with 512 megabytes of RAM running Microsoft SQL Server 7.0 and Windows 2000 Advanced Server.

The client code, including the parser and all classes discussed in this paper, has been written in Perl [13], and though it runs on nearly any modern operating system (with Perl and the proper database interface libraries installed), we have used Solaris 8. Our web server of choice (to parse queries and send them off to the database via a CGI interface) is Apache, running on a Sun Enterprise E250 with one gigabyte of RAM.
In order to test our database we devised three types of queries. The first is simply a single cluster of a random set of notes ranging in cardinality from one to three. We will refer to these as "singles". The second is a series of such randomly generated clusters, ranging in length from one to three. We will refer to these as "chains". Finally, we randomly generate chains and "or" between one and three of them together into a new query that we will refer to simply as an "or".
In order to gauge how well the system scales, after each piece insertion we randomly generated the three types of queries above, ran them on the database and timed the results. Clearly, given the random nature of the queries, they will vary greatly in run-time, but taken as a whole they should give a clear indication of the scalability of the system.

The graph in figure 5 shows the results of these time trials. On the X-axis is the number of clusters in the system at the time of the test, numbered between 0 and roughly 500,000. On the Y-axis is the time in seconds, measured between 0 and 10. The "or" queries have a good deal of variance on the Y-axis, but still stay well under 10 seconds for all tests. The single and chain queries display much less variance and have reasonable times (generally less than three seconds) throughout all trials.

[Figure 5: Three Queries Varying by DB Size]
6 Conclusion

We set out to provide a base for future music cognition research using modern relational database techniques and the application of elemental language theory in the form of regular expressions. We formed a query language which mimics a known syntax for searching text and implemented a prototype search engine for long-term storage of massive amounts of music. We tested the performance of this engine against a number of common queries and found non-wildcard search times to be generally under three seconds and always under eight, with a maximum of 364 songs in the system and just over half a million clusters.

We see room for further research in a number of areas. The first involves alternate database schemas and more efficient queries. The second involves a friendlier user interface, as the current one leaves much to be desired for even advanced users, much less the general music community or the public at large. Finally, we envision alternate uses for the database itself, including algorithmic composition and music analysis.
7 Appendix
The class hierarchy:

Score::Note
    int pitch
    int duration
    method midi_pitch    # returns absolute pitch 0-127
    method pitch         # returns octave-invariant pitch 0-11
    method duration      # returns duration
    method sql           # returns SQL suitable for finding the note instance

Score::Cluster
    Score::Note array notes
    method count         # returns the number of notes in the cluster
    method add           # adds the passed note to the cluster
    method sql           # returns SQL suitable for finding the cluster instance

Score::Track
    Score::Cluster array clusters
    string title
    int patch            # MIDI patch ID
    int tempo            # MIDI tempo
    method clusters      # returns the clusters array
    method title         # returns the title
    method size          # returns the number of clusters
    method parse         # parses a passed MIDI track
    method tempo         # returns the MIDI tempo of the piece

Score::Piece
    Score::Track array tracks
    string title         # title of the piece
    string md5sum        # the md5sum of the piece
    string url           # the originating URL of the piece
    method parse         # parses a passed MIDI piece
    method title         # returns the title of the piece
    method num_tracks    # returns the number of tracks
    method url           # returns the url of the piece
    method md5sum        # returns the md5sum of the piece

Query::ClusterChain
    Score::Cluster array clusters
    method add           # add a cluster to the chain
    method count         # returns the number of clusters
    method sql           # returns SQL suitable for finding the chain
    method table_name    # returns the table name of the sql-generated result set

Query::Pattern
    Query::Pattern or
    Query::ClusterChain or
    Query::PatternChain or
    Query::PatternOr array chain
    method bypassable    # returns boolean -- bypassable (min==0) or not
    method sql           # returns SQL suitable for finding the pattern
    method table_name    # returns the table name of the sql-generated result set

Query::PatternChain
    Query::Pattern array patterns
    method sql           # returns SQL suitable for finding the pattern
    method table_name    # returns the table name of the sql-generated result set

Query::PatternOr
    Query::Pattern array or_list
    method sql           # returns SQL suitable for finding the pattern
    method table_name    # returns the table name of the sql-generated result set

Query::Parser
    string grammar       # the grammar, as specified previously
    method parse         # parses the passed query string

Mdb (intended for abstracting database interface and providing a
"handle" to the DB for passing to multiple existing Queries)
    string username
    string password
    string server
    Dbh dbh              # a database handle
    bool connected
    hash cluster_cache   # hash on cluster string
    hash note_cache      # hash on note string
    method open_db       # opens a connection to the database
    method close         # closes database connection
    method connected     # whether or not the database is connected
    method run_select    # returns results of a passed SQL select statement
    method run_non_select # runs a passed SQL non-select statement
    method insert        # external method for insertion of a Score::Piece
References

[1] K. Lemstrom, S. Perttu, SEMEX – An Efficient Music Retrieval Prototype, http://ciir.cs.umass.edu/music2000/papers/lemstrom paper.pdf

[2] K. Lemstrom, String Matching Techniques for Music (Doctoral Thesis), http://www.cs.helsinki.fi/ lemstrom/

[3] A. Uitdenbogerd, J. Zobel, Melodic Matching Techniques for Large Music Databases, ACM Conference on Multimedia, 1999.

[4] A. Uitdenbogerd, J. Zobel, Manipulation of Music For Melody Matching, ACM Conference on Multimedia, 1998.

[5] R. McNab, et al., The New Zealand Digital Library MELody inDEX, http://www.dlib.org/dlib/may97/meldex/05witten.html

[6] D. Huron, Themefinder, http://www.themefinder.org/

[7] D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press, 1997.

[8] C.J. Date, An Introduction to Database Systems, Addison-Wesley, 2000.

[9] J. Martin, Introduction to Languages and the Theory of Computing, McGraw-Hill, 1991.

[10] MIDI Group, MIDI Standard, http://www.midi.org/

[11] S. Burke, MIDI-Perl module, http://search.cpan.org/doc/SBURKE/MIDI-Perl-0.79/lib/MIDI.pm

[12] D. Conway, Parse::RecDescent (recursive descent Perl parser), http://search.cpan.org/doc/DCONWAY/Parse-RecDescent-1.80/lib/Parse/RecDescent.pod

[13] Perl, http://www.perl.org/