A Musical Regular Expression Matching System for Relational Databases

John Lane, William Punch

June 21, 2001

Abstract

We present a large-scale model for a database of multi-track polyphonic songs searchable via a modified regular expression format and stored in a relational database. The regular expression format is of our own design and will likely be familiar to those with experience using common text search utilities. The relational database is a standard, off-the-shelf SQL-based database management system. The songs have been decomposed into a relational model and searching is done via a translation between our query language and SQL.

Keywords: musical regular expression, relational database

Contents

1 What is needed?
2 State of the Art
3 What could be?
  3.1 String Matching
  3.2 Transposition invariance
  3.3 Rhythmic-notational and Tempo invariance
  3.4 The Case for Clusters
  3.5 Regular Expressions
  3.6 Relational Schema
  3.7 Dealing with MIDI
4 The Gory Details
  4.1 Relational Schema, Part II
  4.2 Breaking up a Query
    4.2.1 Class Definitions
  4.3 Generating SQL Queries
  4.4 Indexes
  4.5 Computational Analysis
5 Results
6 Conclusion
7 Appendix

1 What is needed?

Musical analysis, algorithmic composition and even query-by-tune have great music cognition needs that parallel those of text cognition. Text cognition is based heavily on language theory, and text cognition systems are rooted in regular languages and the regular expressions which alternately recognize or generate them. Viewing music as a language, if we hope to move beyond simple matching of music and into cognition of it, we must begin with the application of language theory. The simplest set of languages are the regular languages, those recognized by finite state automata; on these our search system will be based. [9]

Vast amounts of data are tamed on a daily basis by relational databases and the relational models which back them. We show here that these same relational models allow for quick searching and much flexibility in the case of large databases of music as well.

Combining these ideas, we present here a regular expression query language for searching multi-track, polyphonic music and a relational database to provide a quick and efficient backend for it.

The paper is constructed as follows. We begin in the next section by discussing some existing search systems and the various query and database formats these systems use. In section three we discuss possibilities for query and database schemes, present the construction of our own and prove various properties of it. Section four contains the technical details of our query language and database system, and proves various claims regarding each. Section five presents empirical results from our implemented system, and section six concludes.

2 State of the Art

There are several music search systems and prototypes in existence today, including Semex [1], Meldex [5], Themefinder [6] and several others lacking short monikers [3]. For purposes of comparison, we restrict our attention to those which parallel our use of digital scores and our handling of polyphony, not considering analog-based systems and those which disregard polyphony, as these systems tend to be quite different internally. We will focus on each system's query format, database schema, and respective advantages and disadvantages. These restrictions set our focus on Semex [1] and a system by Uitdenbogerd & Zobel [3].

Semex's query format consists of a string of lettered pitches which are in turn converted into a lettered pitch and subsequent pitch intervals (the distances between pitches). Semex's database scheme is based around a "fast bit-parallel" algorithm for what amounts to a computation of the edit distance (the number of changes necessary to transform one string into another) between a potential match and a given pattern. Strings (of pitches) having the smallest edit distance to a given pattern are considered best matches. The database itself is an index of various score files residing in memory. Polyphony is handled in one of two ways: either by reducing any query or piece to monophony by only considering the highest pitch of any would-be chord, or by successive, combined monophonic searches. Semex has the advantage of being small and fast, but currently lacks a proper long-term storage and indexing method. Please refer to [1] for further details.

The Uitdenbogerd & Zobel system presents a new method for indexing melodies and handles polyphonic indexing by simply considering the highest pitch of any given chord (as was the case with the first of Semex's methods). This method has the distinct and obvious disadvantage of returning many false matches to any given query. Please see [4] for more details.

As best as can be determined, neither prototype handles the encoding of rhythmic information such as note durations or tempos.

We make a number of observations and conclusions. First, only Semex handles polyphony fully (via its second method of repeated pitch matching) while maintaining transposition invariance, and Semex is only able to do so in an awkward manner. Thus, polyphony presents a special challenge (as we will see again in section 3.4), particularly for database schema using relative pitches and/or durations. Second, both provide entirely separate schemas and sets of algorithms for exact matching and inexact (fuzzy) matching, thus limiting the user to one or the other, or forcing the database maintainer to implement both.
Ideally, query languages should be flexible enough to handle a wide range of uses, including exact and inexact matching and polyphony. Lastly, some manner of long-term indexing is necessary to manage data extraction in large music databases. Further, industrial size (every piece of music ever written, for example) requires industrial strength storage, retrieval and indexing systems, and this is exactly the reason relational databases were developed. To use a relational database, however, one must first develop an efficient search system in the relational model. We will address this more fully in section 3.6 and refine this idea for practical use in section 4.1.

3 What could be?

3.1 String Matching

Application of modern string matching techniques to music is well-covered in the literature. [2] Two of the more popular techniques (mainly so because they allow sub-linear time matching of multiple strings to a pattern) are the Aho-Corasick tree-based matching technique and various suffix tree-based methods. A good treatment of these two and more is available in [7]. These two, in their general forms, require a memory model with pointers (in order to form a tree structure). Though there exist non-pointer-based variants (e.g., suffix arrays), these are not nearly as flexible.

Unfortunately, pointer-based structures are inefficiently represented in a relational model. For example, a tree can be represented in a table, each row holding a character (or a word). However, assuming a standard index on the table, a traversal of one node now requires O(log(n)) time, where n is proportional to the number of rows in the table (i.e., the number of nodes in the tree). (We will cover indexes in more depth in section 4.4.) This means an average of O(log^2(n)) to traverse the tree stored in a table.
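The cost of walking a tree stored in a table can be made concrete with a small sketch. The following is an illustration only, not part of the system described in this paper: it stores a tiny trie in an indexed table using Python's sqlite3 module and counts one indexed lookup per step down the tree, where a pointer-based structure would need only a dereference. The table name trie, its columns, and the helper names build_trie_table and has_prefix are assumptions made for this example.

```python
import sqlite3

def build_trie_table(words):
    """Store a trie as rows (id, parent, ch); node 0 is an implicit root (assumed layout)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE trie (id INTEGER PRIMARY KEY, parent INTEGER, ch TEXT)")
    # A standard B-tree index: each child lookup costs O(log n) in the number of rows.
    db.execute("CREATE UNIQUE INDEX trie_edge ON trie (parent, ch)")
    next_id = 1
    for word in words:
        node = 0
        for ch in word:
            row = db.execute(
                "SELECT id FROM trie WHERE parent = ? AND ch = ?", (node, ch)
            ).fetchone()
            if row is None:
                db.execute(
                    "INSERT INTO trie (id, parent, ch) VALUES (?, ?, ?)",
                    (next_id, node, ch),
                )
                node = next_id
                next_id += 1
            else:
                node = row[0]
    return db

def has_prefix(db, prefix):
    """Walk the stored trie; return (found, number_of_indexed_lookups)."""
    node, lookups = 0, 0
    for ch in prefix:
        lookups += 1  # one index probe per edge followed
        row = db.execute(
            "SELECT id FROM trie WHERE parent = ? AND ch = ?", (node, ch)
        ).fetchone()
        if row is None:
            return False, lookups
        node = row[0]
    return True, lookups

db = build_trie_table(["cab", "cad", "bag"])
print(has_prefix(db, "ca"))
print(has_prefix(db, "cx"))
```

Each character consumed costs one index probe, so the per-step cost grows with the size of the table rather than staying constant.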
If N is proportional to the number of characters (or words) of input encoded in the tree, given a suffix tree's O(N^2) tree size, the O(log^2(n)) search time is O(log^2(N^2)) or, asymptotically, O(log^2(N)). This is actually worse than simply indexing a linked list structure and finding a match using it! This simpler method would amount to O(k * log(N)), where k is the length of the desired match.

When representing music within a relational database, pointer-based structures can generally be said to be inefficient in terms of storage space and processing time. This fact removes many pointer-based algorithms from our arsenal of potential solutions.

Object-oriented databases provide hope for long-term storage and quick (index-based) retrieval of these types of pointer-based structures, and this may well prove to be a fruitful area of research. [8] However, the current generation are highly priced and are generally add-ons to existing RDBMS packages. We restrict our focus in this paper, therefore, to standard RDBM systems.

3.2 Transposition invariance

Transposition invariance, or providing for the matching of queries to transposed pieces, is, in all cases studied above, accomplished through the storage of relative pitches, that is, changes in pitch from a previous (or subsequent) note. This works wonderfully for monophony. However, the introduction of polyphony raises the question of which pitch another should be relative to, since pitches may have the same onset time and could have different durations. One could sort the pitches for a given onset and accept a vector of relative pitches from that onset to the next, but even after one handles the problem of mismatched set sizes (between onsets), this would prevent partial chord matches, because any addition or deletion from a set effectively changes its order and thereby changes the vector of relative pitches representing it.

Another possibility is to perform, as above, repeated searches of all possible transpositions. If the system only encodes 12 different pitches (one octave), this becomes much more feasible. We make the assumption henceforth that this is how transposition invariance is handled. Instead of pushing this complexity into the search engine itself, this places it on the outside, keeping the engine as simple and fast as possible.

3.3 Rhythmic-notational and Tempo invariance

Rhythmic-notational invariance simply implies invariance over the different notations for encoding beat divisions (e.g., different time signatures). We assume henceforth that this is handled by encoding note durations as divisions of a beat (instead of a "note").

Tempo invariance, or providing for the matching of queries to pieces which would match if their tempos were modified (i.e., if all their durations were scaled by some rational number), is more difficult. It can be provided for in the same manner as transposition invariance: using relative durations. This, of course, suffers the same problems described above for transposition invariance when polyphony is allowed. We therefore henceforth assume a similar scheme in which all reasonable tempo multiples are tried until a match succeeds (and while the query agent deems this step useful). The lack of different possibilities for non-beat-transforming tempo variation (generally only a doubling or halving of beat value is reasonable, since each requires a doubling or halving of the tempo, which becomes quite unnatural) makes this a reasonable solution.

3.4 The Case for Clusters

We now lay out our structure for storing polyphonic music data. We begin with monophony and extend the structure for handling it to handle polyphony.

Monophony is a restricted form of polyphony in which a set (or cluster) of cardinality one, C_b, of notes from a set of pitches P (depending upon the scale being used) occurs in a piece at a beat offset b, where b is from the set Q of rational numbers. Notes are generally (pitch, duration) two-tuples, but since C_b contains all the information needed regarding the onset time of the note (C_b), the duration portion of the tuple is unnecessary. A set of notes (or single-note note onset clusters) C_b, ordered by their offsets, defines a piece. In other words, monophony requires only a set of (pitch, offset) pairs (or an ordered set of (pitch, duration) pairs) to form a piece. Polyphony introduces additional layers of complexity.

Polyphony is unrestricted in that at any beat offset b, any set C_b of any number of (pitch, duration) two-tuples (pitch from P and duration from Q) may occur. Again, a set of clusters C_b, ordered by their offsets, defines a piece. In other words, polyphony requires the introduction of an extra layer of complexity: a set of (pitch, duration) two-tuples (as defined above) can begin at any piece offset. Here the duration portion of the two-tuple is necessary to provide the needed freedom for note overlap, whereas this was not necessary in the monophonic case.

We assume henceforth that P is a set of non-negative integers, less than 128, mapping to a pitch in the equal-tempered scale as defined by the MIDI standard [10], and that the durations are represented by positive integers using a scheme in which the number 120 has the value of one beat (again, taken from MIDI). Each of these conveniences limits the space of possible pitches, durations and therefore note expressions, but as can be seen within MIDI itself, the limitation is not significant. These sets of note onsets, or note onset clusters, as defined above are a simple way to manage polyphony while maintaining full expressiveness.
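As an illustration of the cluster idea, the sketch below groups note events into onset clusters and tries all 12 transpositions of a query from the outside, in the spirit of the repeated-search approach assumed in this paper. It is a simplified sketch under stated assumptions, not the implemented system: the names Cluster, make_piece, pitch_classes and find_transpositions are hypothetical, pitches are folded into one octave, and consecutive clusters are compared without checking the distance between their offsets.

```python
from dataclasses import dataclass

TICKS_PER_BEAT = 120  # the text assumes 120 duration units per beat

@dataclass(frozen=True)
class Cluster:
    """All notes that begin at one beat offset (offset in duration units)."""
    offset: int
    notes: frozenset  # of (pitch, duration) two-tuples, pitch in 0..127

def make_piece(events):
    """Group (offset, pitch, duration) events into clusters ordered by offset."""
    by_offset = {}
    for offset, pitch, duration in events:
        by_offset.setdefault(offset, set()).add((pitch, duration))
    return [Cluster(o, frozenset(n)) for o, n in sorted(by_offset.items())]

def pitch_classes(cluster, shift=0):
    """Fold pitches into one octave (12 classes), optionally transposed by shift."""
    return frozenset(((p + shift) % 12, d) for p, d in cluster.notes)

def find_transpositions(piece, query):
    """Try all 12 transpositions of the query; return (shift, start index) hits."""
    hits = []
    for shift in range(12):
        wanted = [pitch_classes(c, shift) for c in query]
        for i in range(len(piece) - len(query) + 1):
            window = [pitch_classes(c) for c in piece[i:i + len(query)]]
            if window == wanted:
                hits.append((shift, i))
    return hits

piece = make_piece([
    (0, 60, 120), (0, 64, 240),                  # two notes share an onset, different durations
    (TICKS_PER_BEAT, 67, 120),                   # a single note one beat later
])
# The same shape written two semitones lower.
query = make_piece([(0, 58, 120), (0, 62, 240), (TICKS_PER_BEAT, 65, 120)])
print(find_transpositions(piece, query))
```

Because each cluster is a set, two notes sharing an onset need no ordering between them, which is exactly the difficulty that relative-pitch encodings run into under polyphony.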
3.5 Regular Expressions

To describe music as a language of emotion is nearly a cliche; even so, as so many have already stated, music is heavily based on grammatical productions. While music does differ significantly from text and demands its own treatment, it still seems only natural that, if progress in music cognition is to achieve anything of what text cognition has, language theory must be applied to it in a meaningful way. Regular expressions, or languages generated by Chomsky type three grammars [9], form the basis of modern language processing and analysis.

We create a regular expression query language for music by successive transformation of a known regular expression format for character or text recognition as follows. We begin with the following grammar:

    start:         pattern_chain(s)
    pattern_chain: or_pattern(s) | pattern(s)
    or_pattern:    pattern(s) '|' pattern(s)
    pattern:       '(' pattern_chain ')' qfier |
                   '(' char_chain ')' qfier
    char_chain:    char(s)
    qfier:         '{' int ',' int '}' | ''
    char:          /[a-zA-Z]/
    int:           /[0-9]+/

For which a typical expression might be of the form:

    ((a){0,.}|(b){1,.}(c){1,.}){1,.}

corresponding to the following regular expression:

    (a*|b+c+)+

The above language (grammar) for generating regular expressions is useful to us if and only if it is capable of generating a set of regular expressions which is in turn capable of generating any regular language. We set about proving that it is capable of this by first defining just what a regular expression is, and then moving on to prove that it is indeed capable of generating any such expression.

A recursive definition from a standard undergraduate theory textbook [9] for regular expressions over an alphabet Σ and their corresponding languages is as follows:

1. ∅, corresponding to the empty language
2. Λ, corresponding to the language {Λ}
3. For each symbol a ∈ Σ, a, corresponding to the language {a}
4. For any regular expressions r and s over Σ, corresponding to languages L_r and L_s, respectively, each of the following is a regular expression over Σ, corresponding to the language indicated.
   (a) (rs), corresponding to L_r L_s (concatenation)
   (b) (r + s), corresponding to L_r ∪ L_s (union)
   (c) (r*), corresponding to L_r* (repetition)
5. Only those expressions generated by one of the above are regular expressions over Σ.

The first two rules are useful in proving theorems about regular languages but not as useful in generating them. The first case is handled by any expression of the form (){0,0}. Not very useful, but nevertheless conformant. The second case is handled by any expression of the form (){1,1}. Again, not very useful, but still obtainable using the grammar.

The third is handled by any expression of the form (a){1,1}.

Part (a) of the fourth, or concatenation, is handled by the form (r){1,1}(s){1,1}. Part (b), or union, is handled by the form (r){1,1}|(s){1,1}. Part (c), or repetition, is given by (r){0,.}.

All four necessary components of the definition having been satisfied (having been shown generable), it must be that the given language is capable of generating an expression language-equivalent to that generated by any regular expression; syntax may differ, but the set of generable languages is equivalent.

The flexibility of the {min,max} notation is obviously not strictly necessary for a regular expression, but becomes useful particularly when dealing with highly repetitive data such as that represented in music.

One possible grammar for cluster (as defined above), note, pitch and duration representation is as follows:

    cluster:  '[' note(s) ';' duration ']' |
              '[' note(s) ']'
    note:     '<' pitch ',' duration '>'
    pitch:    /[a-gA-Gs]+/
    duration: int '/' int | int | /(\.)/
    int:      /[0-9]+/

Plugging into our regular expression grammar above and modifying slightly to handle clusters instead of characters yields the following:

    start:         pattern_chain(s)
    pattern_chain: or_pattern(s) | pattern(s)
    or_pattern:    pattern(s) '|' pattern(s)
    pattern:       '(' pattern_chain ')' qfier |
                   '(' cluster_chain ')' qfier
    cluster_chain: cluster(s)
    qfier:         '{' int ',' int '}' | ''
    cluster:       '[' note(s) ';' duration ']' |
                   '[' note(s) ']'
    note:          '<' pitch ',' duration '>'
    pitch:         /[a-gA-Gs]+/
    duration:      int '/' int | int | /(\.)/
    int:           /[0-9]+/

As a simple extension of the above regular expression grammar, it itself must also be capable of generating the same set of regular expressions; this time for music (for clusters). It also allows for the full expressibility of clusters and since, as shown above, clusters in turn allow for the full expressibility of any polyphonic piece, the language must also therefore be fully musically expressible.

3.6 Relational Schema

Relational databases require a relational schema. Entity/Relationship diagrams are a common, readily-understood form of representation for these schema and we present such a diagram in figure 1. [8]

[Figure 1: E/R Diagram for Cluster-based Music Database. Entities: Note, Cluster, Track, Piece; relationships: Cluster-Notes, Track-Cluster, Piece-Track.]

We begin with our "bag" of entities: Note, Cluster, Track, and Piece; each with their respective attributes. We then define three relations joining the four entities. As in the diagram in figure 1, a note can relate to any number of clusters; a cluster must have at least one note, but can relate to as many as necessary. A cluster relates to at least one track, and as many as necessary; a track relates to at least one cluster and as many as necessary. A track relates to one and only one piece, and a piece must have at least one track, but can have as many as needed. The (candidate) keys of each entity are their underlined attributes. Decomposition into tables from the diagram is fairly simple, and is shown in figure 2, with the corresponding dependencies shown in figure 3.

    Note:          id, pitch, duration
    Cluster:       id, note
    Track-Cluster: onset, track, piece, cluster
    Track:         id, piece, name, patch, tempo
    Piece:         id, title

Figure 2: Decomposed Tables for Cluster-based Music Database

    id (piece) → title
    piece (id), id (track) → name, patch, tempo
    onset, track (id), piece (id) → cluster
    id (note) → pitch, duration
    id (cluster) → note

Figure 3: Dependencies for Cluster-based Music Database

Since there exist no multi-valued dependencies, full decomposition requires that the tables be in the so-called Boyce/Codd Normal Form, or BCNF [8]. First normal form is trivially reached: clearly in every legal value of every relvar, every tuple contains one value. Second normal form is already obtained because every non-key attribute is irreducibly dependent on the keys. The set of relvars is in third normal form because each non-key is directly dependent upon the key. Finally, the set is in BCNF because the determinants of the relations are the candidate keys. With our decomposed tables in hand, we move to dealing with data to populate them.

3.7 Dealing with MIDI

In any music database the encoding of notes and durations is of major import; indeed it is this very encoding which determines the search properties of the system (for example, the transposition and tempo invariance properties discussed in the sections above). Because our database will use MIDI [10] files as input, we begin by explaining our format choices and the problems inherent in dealing with MIDI.

The MIDI format for encoding note event durations is based upon a tick value. There exists some (modifiable) number of microseconds per tick. These ticks form the basis of another unit: beats; the number of ticks per beat is also controllable but is generally (simply by convention) set to be 96. Higher tick-per-beat values (such as 96) provide for sufficient resolution to allow for widely varying note durations and subsequent musical expressive ability.

Beats, in turn, are the underlying value for music encoded in common Western musical notation, with note durations being formed by beat multiples. The beat notation is a useful one for an internal duration since it is notation (time signature) invariant. However, most MIDI music on the web is "played" at a keyboard, meaning that there is some human being pushing keyboard keys to produce MIDI events on a MIDI keyboard or synthesizer. Invariably the beat value played by a keyboard player on her keyboard will differ from its ideal notational equivalent; a high number of ticks per beat is allowed by MIDI for expressiveness (e.g., notes can be "held" or shortened for emphasis or effect). However, a musician's (emotional) "expressiveness" is simply imprecision from the perspective of one creating a music database, and this imprecision haunts us when we try to make sense of what was meant, notation-wise, by the human-generated events.

Various methods exist for dealing with this problem, but the most common in use today is known as note duration "quantization" and works by mapping each respective note duration of an existing piece onto a member of a smaller, fixed set of durations which correspond to well-known note durations (e.g., a quarter-note or a dotted thirty-second). This process is most often done using an approximation algorithm, one which chooses the note from the fixed set which is closest in duration to the value at hand. Musicologically-minded methods exist for eliminating some notes from the fixed set through the use of domain knowledge of what note durations could possibly appear in certain situations, but rules for doing so depend greatly on the type of music being evaluated. In order to handle any type of music, we will assume, henceforth, that a generic approximation-based quantization algorithm is used to some fixed, smallest duration value (e.g., a thirty-second note) without regard to the type of music being evaluated. Clearly there will be times when such a system makes an error in its guess as to which duration was intended by the player, and this creates great problems for any music matching system based on such quantized data.

We will assume henceforth that this quantization problem is solved in one of two ways. The first is simply allowing the user to enter a wildcard for a duration (i.e., a duration which matches any duration). The second is to allow the user to specify some 'fuzz' value for their duration match, such that if a given note duration within the database is within the given number of 'fuzz' units of the note duration being sought, it is allowed to match.

Having dealt with the more theoretical considerations of a musical regular expression matching system, we now use these as we turn our attention to implementation details of our system.

4 The Gory Details

Full decomposition minimizes redundancy but does not necessarily maximize performance. Here we will introduce some alterations in the above schema to improve performance through techniques such as the addition of non-essential keys (e.g., identity values) and the creation of redundant tables or the storing of redundant values in different tables.

    Note:    id, pitch, duration
    Score:   id, offset, delta, track, piece, cluster, next
    Cluster: id, note, ncount
    Track:   id, piece, name, patch, tempo
    Piece:   id, title, md5sum, url, ntracks

Figure 4: Decomposed Tables for Cluster-based Music Database

4.1 Relational Schema, Part II

In the implemented version of our database we have chosen, for reasons of efficiency and ease of querying, the layout shown in figure 4. The changes mainly apply to what was the Track-Cluster table, which is now the Score table. We have added a field to uniquely identify any offset and corresponding cluster in any track. Some changes were also made to the Piece table, mainly for the purposes of adding additional information since, practically speaking, some pieces are without a unique name. Finally, to at least attempt to avoid duplicate piece entry, we add an MD5 checksum on MIDI input files.

4.2 Breaking up a Query

Data insertion, given that the database can handle full polyphony, is straightforward and in deference to brevity will not be covered here. We move to the more important task of finding the result set of a query. Let us assume, for the purposes of explanation, that we have a populated database.

The first step is defining exactly what we would like from a result set. What we would like for each match is a (start, finish) two-tuple, both start and finish being a score id, as detailed above. While useful from an end-user (or higher level) perspective, this scheme also makes it easy to pass intermediate answer sets between query segments, allowing for a good deal of modularity.

The first step in searching is the parsing or breaking up of a query into searchable bits and pieces. This breaking up is most easily done using a parser generator (recursive descent or bottom-up) in concert with the grammar as laid out above, and appropriate data structures, in our case, in the form of classes. Concise definition of these classes makes parsing the query straightforward.

4.2.1 Class Definitions

Most notably, each of the defined classes has a method named sql which is used by higher levels in the parse tree of the query string for producing SQL suitable for finding a match of its own instantiation in the database. To avoid clutter, the class hierarchy is printed in the appendix of this document; please refer to it there for the next section on SQL generation.

4.3 Generating SQL Queries

Building an SQL query from a query string is straightforward when viewed from the perspective of the parser. As alluded to in the previous paragraph, the parser instantiates objects of the appropriate class to represent objects encountered in the parse tree of a query string. When finished, it has a Query object with a single PatternChain in tow forming the root of a tree of instantiated classes representing the given query.

More concretely, in order to parse a query string, the Query object actually creates a Parser object and uses it to parse the given query string. If the Parser object is successful in building the tree of objects described above, it returns the aforementioned PatternChain object; SQL is obtained through its sql method; the name of the table of final results is obtained via its table name method. Thus, running the query is simply a matter of passing the generated SQL to the database via the Mdb class and performing the necessary action with the result set (e.g., user formatting).

Internally, the task of generating SQL can be somewhat complex but generally works using the interfaces mentioned directly above, with each object being responsible for its own SQL generation. Once formed into a tree by the parser, all one need do is traverse the tree correctly (depth first, in this case), calling the sql method and, where necessary, the table name method to piece together successive queries.

We treat the respective classes here from the bottom (of a query) up with a mind to cover enough implementation detail to provide a base for further work, but not so much as to unduly encumber the reader.

The ClusterChain class acts as a container for an ordered set of Clusters which must appear in series.
ClusterChain produces SQL to generate a temporary table of (start, finish) pairs matching the chain, or series, of clusters. It does this by calling the sql method of each constituent Cluster and using the results.

A Pattern acts as a container for anything that can appear within parentheses in a query: a series of PatternChains, a single ClusterChain, or a series of PatternOrs. Accompanying it is a (min, max) pair of integers noting that the given pattern must repeat at least min times but no more than max times in any given match. To produce SQL generating its own result set, it uses the sql method of its constituent PatternChain, ClusterChain and PatternOr objects to form a temporary table of (start, finish) tuples unique to it. Its SQL code essentially performs successive joins on the finish and start fields of its elements' solution tables, piecing together all solutions and storing this solution set in a new table unique to itself.

The PatternChain and PatternOr classes are the highest level subclasses of class Query, and similarly use the sql method of member Patterns to generate SQL code which generates solution sets (in temporary tables). The PatternChain might be more aptly named PatternAnd since it seeks to 'and' (chain) together a series of patterns using essentially the same successive join-type method as described above for Pattern. The PatternOr class exists for housing a series of Patterns in an 'or' clause. Its solution set is generated by simple successive unions of its constituent solution sets (those generated by its constituent Pattern objects) into a new solution set in a new temporary table.

The respective queries are passed up the tree during the aforementioned depth-first traversal in the form of a successively-appended array of SQL statements.
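The join-and-union scheme described above can be sketched in a few lines. The following is a minimal illustration and not the class hierarchy from the appendix: the class names Node, Leaf, Chain and Or, the column names start_id and finish_id, the tmp_ table naming, and the adjacency rule (a match may follow another when its start equals the previous finish plus one) are all assumptions made for this example, and {min,max} repetition is left out. It uses Python's sqlite3 module.

```python
import itertools
import sqlite3

_counter = itertools.count(1)

class Node:
    """Base class: every node owns one temp table of (start_id, finish_id) pairs."""
    def __init__(self):
        self.table_name = f"tmp_{next(_counter)}"

    def sql(self):
        raise NotImplementedError

class Leaf(Node):
    """Stand-in for a ClusterChain whose matches are already known (hypothetical)."""
    def __init__(self, pairs):
        super().__init__()
        self.pairs = pairs

    def sql(self):
        t = self.table_name
        stmts = [f"CREATE TEMP TABLE {t} (start_id INTEGER, finish_id INTEGER)"]
        stmts += [f"INSERT INTO {t} VALUES ({int(s)}, {int(f)})" for s, f in self.pairs]
        return stmts

class Chain(Node):
    """'And' of children: join each child's finish to the next child's start."""
    def __init__(self, children):
        super().__init__()
        self.children = children

    def sql(self):
        stmts = [s for c in self.children for s in c.sql()]  # depth-first traversal
        tables = [c.table_name for c in self.children]
        joins = " ".join(
            f"JOIN {b} ON {b}.start_id = {a}.finish_id + 1"  # assumed adjacency rule
            for a, b in zip(tables, tables[1:])
        )
        stmts.append(
            f"CREATE TEMP TABLE {self.table_name} AS "
            f"SELECT {tables[0]}.start_id AS start_id, {tables[-1]}.finish_id AS finish_id "
            f"FROM {tables[0]} {joins}"
        )
        return stmts

class Or(Node):
    """'Or' of children: union of the children's solution tables."""
    def __init__(self, children):
        super().__init__()
        self.children = children

    def sql(self):
        stmts = [s for c in self.children for s in c.sql()]
        union = " UNION ".join(
            f"SELECT start_id, finish_id FROM {c.table_name}" for c in self.children
        )
        stmts.append(f"CREATE TEMP TABLE {self.table_name} AS {union}")
        return stmts

# Matches for three leaves, expressed as score-id ranges.
a = Leaf([(1, 2), (5, 6)])
b = Leaf([(3, 4), (9, 9)])
c = Leaf([(3, 3)])
query = Chain([a, Or([b, c])])

db = sqlite3.connect(":memory:")
for stmt in query.sql():          # the appended array of SQL statements
    db.execute(stmt)
rows = db.execute(
    f"SELECT start_id, finish_id FROM {query.table_name} ORDER BY 1, 2"
).fetchall()
print(rows)
```

Because every node exposes the same two things, a list of statements and the name of its result table, a parent never needs to know what kind of child it is combining.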
The topmost object in the complete tree is queried by the Query class (which serves as a driver) for the table name of its solution set, which contains the final results. These results are then passed to other classes which format them for the requester. In the prototype system, formatting consists of a conversion from the score id (start, finish) pair form to a form containing information regarding piece, track, and beat offset. Dynamic MIDI generation from the database itself is facilitated from the beginning of a match and presented to the user as a link on each result displayed on the results page.

4.4 Indexes

Indexes are what make relational databases usable. Deciding where to place indexes without wasting space can be a difficult task, and we do so here using the most common method: by considering which fields are being used for selecting rows from given tables, and which fields are used to join which tables. What follows is also covered in the next section covering the computational complexity of query segments.

From bottom to top, we detail here what types of statements are executed during the running of a Query, at each step detailing the index we have chosen to create as a result and the reason for it.

First, Clusters search for their constituent elements via a join on the Cluster and Note tables on the note field of the Cluster table and the id field of the Note table. The Cluster table can grow to be quite large (theoretically as large as the Score table itself), but indexes on the id, pitch and duration fields of the Note table combined with an index on the note and ncount columns of the Cluster table provide reasonable access to clusters. Without this, because of the computationally intense nature of the effective set equality test being done between our query set of notes and the set of all notes, even exact match searches (non-wildcarded pitches and durations) could require an inordinate amount of time.
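The index choices described here can be tried out on a toy database. The sketch below is an assumption-laden illustration rather than the schema of the implemented system: the column lists follow figure 4, but the lowercase table names, the index names, the column types, and the renaming of the offset column to beat_offset are choices made for this example. It uses Python's sqlite3 module and asks SQLite to explain how it would run a typical selection on the Score table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE note    (id INTEGER PRIMARY KEY, pitch INTEGER, duration INTEGER);
CREATE TABLE cluster (id INTEGER, note INTEGER, ncount INTEGER);
CREATE TABLE score   (id INTEGER PRIMARY KEY, beat_offset INTEGER, delta INTEGER,
                      track INTEGER, piece INTEGER, cluster INTEGER, next INTEGER);
-- Indexes on the fields used for selecting rows and for joining tables.
CREATE INDEX note_pitch_duration ON note (pitch, duration);
CREATE INDEX cluster_note_ncount ON cluster (note, ncount);
CREATE INDEX score_cluster_delta ON score (cluster, delta);
"""

def query_plan(db, sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# A ClusterChain-style selection on the cluster and delta fields.
plan = query_plan(
    db, "SELECT id FROM score WHERE cluster = ? AND delta = ?", (1, 120)
)
print(plan)
```

The exact wording of the plan differs between SQLite versions, but it should name score_cluster_delta, showing that the selection goes through the index rather than scanning the whole table.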
Next, ClusterChain objects retrieve matching (start, finish) pairs from the Score table by selecting on the cluster and delta fields, and we therefore index the Score table on those fields, lest we be forced to perform a full scan of the table. Finally, the temporary tables passed between Pattern, PatternChain, PatternOr and Query are indexed on both the start and finish fields, as successive queries use both fields separately, and the size of these tables could potentially be quite large (theoretically, the entire Score table).

4.5 Computational Analysis

Given the heavy reliance upon indexes here, their tendency to obfuscate results from the user, and the need to compare this method with others on a more theoretical level, we close this section with a computational analysis of elements of a query on our database, covering the indexes and their role.

The main database indexing tools buried within most modern relational databases are hashes and B+ trees. Within our database we have used B+ tree-based indexes; they generally provide $O(\log n)$ access time to rows of their tables when provided their member attributes [8].

Despite the varied number of queries that are possible within the above musical regular expression grammar, there are essentially only two types of queries performed on the database. We have seen a bit of both already in section 4.4. The first involves finding all clusters containing a given set of notes. The second involves determining which (finish, start) pairs (from two different tables) are sufficiently adjacent to be joined and passed up as part of another result set.

The first type of query is the more complex of the two and we will deal with it first. The query amounts to a set equality test between the set of pitches within a cluster of a given query and the set of pitches in any cluster in the database.
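As a rough illustration of this set equality test, the following sketch finds the clusters whose note set equals a queried set, using one exists clause per queried note together with a check on the cluster size. It is a minimal sketch under assumptions, not the prototype's code: it uses Python and SQLite, the two-table layout (one Clusters row per cluster and note, with ncount repeated on each row) is a simplified guess at the schema, the index names and the function matching_clusters are hypothetical, and the queried notes are assumed to be distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notes (id INTEGER PRIMARY KEY, pitch INTEGER, duration INTEGER);
CREATE TABLE clusters (id INTEGER, note INTEGER, ncount INTEGER);
-- Indexes in the spirit of section 4.4 (names are made up for this sketch).
CREATE INDEX notes_pitch_duration ON notes (pitch, duration);
CREATE INDEX clusters_note_ncount ON clusters (note, ncount);
INSERT INTO notes VALUES (1, 0, 4), (2, 4, 4), (3, 7, 4);
INSERT INTO clusters VALUES
  (100, 1, 2), (100, 2, 2),
  (101, 1, 3), (101, 2, 3), (101, 3, 3),
  (102, 1, 1);
""")


def matching_clusters(conn, query_notes):
    """Return ids of clusters whose notes are exactly query_notes.

    query_notes is a list of distinct (pitch, duration) pairs.
    """
    if not query_notes:
        return []
    exists = (
        "EXISTS (SELECT 1 FROM notes n JOIN clusters c2 ON c2.note = n.id "
        "WHERE n.pitch = ? AND n.duration = ? "
        "AND c2.ncount = ? AND c2.id = c.id)"
    )
    sql = (
        "SELECT DISTINCT c.id FROM clusters c WHERE "
        + " AND ".join([exists] * len(query_notes))
        + " ORDER BY c.id"
    )
    params = []
    for pitch, duration in query_notes:
        # Same size for every clause: size match plus membership gives equality.
        params += [pitch, duration, len(query_notes)]
    return [row[0] for row in conn.execute(sql, params)]


print(matching_clusters(conn, [(0, 4), (4, 4)]))
```

Requiring the cluster size to equal the number of queried notes is what turns a containment test into an equality test; without it, cluster 101 would also match the two-note query.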
This first type of query is, on average, the most computationally expensive single query made, and there is likely room for some improvement with respect to it in the database layout and query structure. A typical query of this type looks something like this:

    select id, id+<num clusters> from score where cluster in
     2.( select id from clusters C where exists
      1.( select C2.id from notes N, clusters C2 where
           pitch=<pitch>
           and duration=<duration>
           and C2.note=N.id
           and C2.ncount=2
           and C2.ncount=<num notes>
           and C2.delta=<delta>
           and C2.id=C.id )
       [ and exists (<as previous exists clause>)])
     [ and next in <as previously nested clause>])

The inner-most subselect (labeled 1.) finds all clusters which have one of the given notes in it. Just outside of that subselect is another (labeled 2.) which selects the clusters which match all the given pitches. At this point in the query, we have found, if one exists, a cluster (or set of clusters in the case of a wildcard search) which has the given size, the given set of notes and the given delta value.

From here, we can attempt to find consecutively occurring clusters. This step is performed by the outer-most select (select id, id+<num clusters> ...). Each cluster is found via a subselect like the one detailed in the previous paragraph (the one labeled 2.). We now look in the Score table for entries that match the given set of cluster ids returned from the subselect. Using the next field of the Score table, "chained" or consecutive cluster entries in the score can be found.

Let $D_N$, $D_C$, and $D_S$ be the number of notes, clusters and score items stored in the database, respectively. Let $Q_N$ and $Q_C$ have like meaning for the number of notes and clusters in a given query. (For multi-cluster queries, let $Q_N$ denote an average number of notes per cluster.) Let us further assume that for all queries made as above, the respective tables have indices built on the sets of fields on which they are being selected (i.e., all selects go through an index, as we have already prescribed, for data retrieval).

Query number 1 performs a join on the Clusters and Notes tables. Computationally, given the aforementioned index on each, this amounts to a pairing of each item selected from each table with each other, or a query of order $O(\log(D_N) \cdot \log(D_C))$. This is done once for each note in the cluster, and this therefore becomes $O((\log(D_N) \cdot \log(D_C)) \cdot Q_N)$.

Let $R_1$ be the number of clusters that match query one. Clearly, $0 \le R_1 \le D_C$. Query number 2 performs a select on the Clusters table for each of the $R_1$ results. This translates to a retrieval time of $O(\log(D_C) \cdot R_1)$.

Lastly, let $R_2$ be the size of the result set of query 2. As above, $0 \le R_2 \le D_C$. The outer-most select selects from the Score table, once for each result. This translates to an answer retrieval time of $O(\log(D_S) \cdot R_2)$.

Putting this all together we arrive at

$O((\log(D_N) \cdot \log(D_C) \cdot Q_N) + \log(D_C) \cdot R_1 + \log(D_S) \cdot R_2)$.

This is difficult to deal with because the sizes of the result sets determine what drives the expression the most. The worst case is quite poor:

$O((\log(D_N) \cdot \log(D_C) \cdot Q_N) + \log(D_C) \cdot D_C + \log(D_S) \cdot D_C)$.

The average case is very difficult to speculate on without more empirical evidence from the search domain. Both $R_1$ and $R_2$ have very high bounds (the size of the Clusters table), and these bounds are not unattainable, especially for poorly formed wildcard queries.

Unfortunately, the second type of query suffers similarly: it is completely based upon the results of a user query. It is, however, much simpler:

    select T1.start, T2.finish
    from <table name> T1, <table name> T2
    where T1.finish=(T2.start-1)

Let $R_1$ and $R_2$ here denote the sizes of the tables labeled T1 and T2, respectively. Again, assuming an index on the start and finish fields of each table, this amounts to a join of the two tables on the finish field of one and the start field of the other.
The only problem with this query is that table T2 is not indexed on (T2.start-1), but that is what is requested of it, and so a full (indexed) scan must be performed. It would be trivial to modify the construction of the internal tables such that the stored start value is actually one less than the real start value of the score id in question. It would also be trivial to store both values and index one of them. Either one of these options would result in a data retrieval time of $O(\log(R_1) \cdot \log(R_2))$.

Without these tricks, however, the analysis is somewhat more complex. Different databases handle this differently; one common method is to perform the scan on T2, retrieve arithmetically altered values from it and use T1's index to retrieve matching values out of the join. This results in $O(\log(R_1) \cdot \log(R_2) + R_2)$.

5 Results

Though our generated SQL queries are standard enough to be capable of running on Sybase and Oracle, we have implemented the system on a Pentium III 750 MHz system with 512 megabytes of RAM running Microsoft SQL Server 7.0 and Windows 2000 Advanced Server. The client code, including the parser and all classes discussed in this paper, has been written in Perl [13], and though it runs on nearly any modern operating system (with Perl and the proper database interface libraries installed), we have used Solaris 8. Our web server of choice (to parse queries and send them off to the database via a CGI interface) is Apache running on a Sun Enterprise E250 with one gigabyte of RAM.

In order to test our database we devised three types of queries. The first is simply a single cluster of a random set of notes ranging in cardinality from one to three. We will refer to these as "singles". The second is a series of such randomly generated clusters, ranging in length from one to three. We will refer to these as "chains".
Finally, we randomly generate chains and "or" between one and three of them together into a new query that we will refer to simply as an "or".

In order to gauge how well the system scales, after each piece insertion we randomly generated the three types of queries above, ran them on the database and timed the results. Clearly, given the random nature of the queries, they will vary greatly in run-time, but taken as a whole, they should give a clear indication of the scalability of the system.

The graph in Figure 5 shows the results of these time trials. On the X-axis is the number of clusters in the system at the time of the test, ranging from 0 to roughly 500,000. On the Y-axis is the time in seconds, ranging from 0 to 10. The or queries have a good deal of variance on the Y-axis, but still stay well under 10 seconds for all tests. The single and chain queries display much less variance and have reasonable times (generally less than three seconds) throughout all trials.

Figure 5: Three Queries Varying by DB Size

6 Conclusion

We set out to provide a base for future music cognition research using modern relational database techniques and the application of elemental language theory in the form of regular expressions. We formed a query language which mimics a known syntax for searching text and implemented a prototype search engine for long-term storage of massive amounts of music. We tested the performance of this engine against a number of common queries and found non-wildcard search times to be generally under three seconds and always under eight, with a maximum number of 364 songs in the system and just over half a million clusters.

We see room for further research in a number of areas. The first involves alternate database schemas and more efficient queries. The second involves a friendlier user interface, as the current one leaves much to be desired for even advanced users, much less the general music community or the public at large. Finally, we envision alternate uses for the database itself, including algorithmic composition and music analysis.

7 Appendix

The class hierarchy:

Score::Note
  int pitch
  int duration
  method midi_pitch   # returns absolute pitch 0-127
  method pitch        # returns octave-invariant pitch 0-11
  method duration     # returns duration
  method sql          # returns SQL suitable for finding the note instance

Score::Cluster
  Score::Note array notes
  method count        # returns the number of notes in the cluster
  method add          # adds the passed note to the cluster
  method sql          # returns SQL suitable for finding the cluster instance

Score::Track
  Score::Cluster array clusters
  string title
  int patch           # MIDI patch ID
  int tempo           # MIDI tempo
  method clusters     # returns the clusters array
  method title        # returns the title
  method size         # returns the number of clusters
  method parse        # parses a passed MIDI track
  method tempo        # returns the MIDI tempo of the piece

Score::Piece
  Score::Track array tracks
  string title        # title of the piece
  string md5sum       # the md5sum of the piece
  string url          # the originating URL of the piece
  method parse        # parses a passed MIDI piece
  method title        # returns the title of the piece
  method num_tracks   # returns the number of tracks
  method url          # returns the url of the piece
  method md5sum       # returns the md5sum of the piece

Query::ClusterChain
  Score::Cluster array clusters
  method add          # add a cluster to the chain
  method count        # returns the number of clusters
  method sql          # returns SQL suitable for finding the chain
  method table_name   # returns the table name of the sql-generated result set

Query::Pattern
  Query::Pattern or Query::ClusterChain or Query::PatternChain or Query::PatternOr array chain
  method bypassable   # returns boolean -- bypassable (min==0) or not
  method sql          # returns SQL suitable for finding the pattern
  method table_name   # returns the table name of the sql-generated result set

Query::PatternChain
  Query::Pattern array patterns
  method sql          # returns SQL suitable for finding the pattern
  method table_name   # returns the table name of the sql-generated result set

Query::PatternOr
  Query::Pattern array or_list
  method sql          # returns SQL suitable for finding the pattern
  method table_name   # returns the table name of the sql-generated result set

Query::Parser
  string grammar      # the grammar, as specified previously
  method parse        # parses the passed query string

Mdb (intended for abstracting database interface and providing a "handle" to the DB for passing to multiple existing Queries)
  string username
  string password
  string server
  Dbh dbh               # a database handle
  bool connected
  hash cluster_cache    # hash on cluster string
  hash note_cache       # hash on note string
  method open_db        # opens a connection to the database
  method close          # closes database connection
  method connected      # whether or not the database is connected
  method run_select     # returns results of a passed SQL select statement
  method run_non_select # runs a passed SQL non-select statement
  method insert         # external method for insertion of a Score::Piece

References

[1] K. Lemstrom, S. Perttu, SEMEX – An Efficient Music Retrieval Prototype, http://ciir.cs.umass.edu/music2000/papers/lemstrom paper.pdf
[2] K. Lemstrom, String Matching Techniques for Music (Doctoral Thesis), http://www.cs.helsinki.fi/ lemstrom/
[3] A. Uitdenbogerd, J. Zobel, Melodic Matching Techniques for Large Music Databases, ACM Conference on Multimedia, 1999
[4] A. Uitdenbogerd, J. Zobel, Manipulation of Music For Melody Matching, ACM Conference on Multimedia, 1998
[5] R. McNab et al., The New Zealand Digital Library MELody inDEX, http://www.dlib.org/dlib/may97/meldex/05witten.html
[6] D. Huron, http://www.themefinder.org/
[7] D. Gusfield, Algorithms on Strings and Sequences, Cambridge Press, 1999
[8] C.J. Date, Introduction to Database Systems, Addison-Wesley, 2000
[9] J. Martin, Introduction to Languages and the Theory of Computing, McGraw-Hill, 1991
[10] MIDI Group, MIDI Standard, http://www.midi.org/
[11] S. Burke, MIDI Perl module, http://search.cpan.org/doc/SBURKE/MIDI-Perl-0.79/lib/MIDI.pm
[12] D. Conway, Recursive Descent Perl Parser, http://search.cpan.org/doc/DCONWAY/Parse-RecDescent1.80/lib/Parse/RecDescent.pod
[13] http://www.perl.org/