Using an Evolutionary Algorithm to Generate Four-Part 18th Century Harmony

TAMARA A. MADDOX
Department of Computer Science
George Mason University
Fairfax, Virginia USA

JOHN E. OTTEN
Veridian/MRJ Technology Solutions
Fairfax, Virginia USA

Abstract: - We discuss the use of various types of evolutionary algorithms for the generation of 18th-century four-part musical chorales in the style of J.S. Bach. We briefly summarize known existing work on computerized music generation, and some of the roadblocks that have been reached. We propose the use of fitness functions embodying specific constraints to remove human subjectivity from the evaluation process. We compare multiple evolutionary selection and mutation methods, and we conclude that: (1) Chord fitness must be valued much more highly than voice leading fitness; (2) The use of multiple-chord mutation causes stalling of the algorithm at a low fitness value; and (3) The use of fitness proportional selection of parents and one-point crossover produces significantly better results than uniform parent selection.

Key-Words: - Evolutionary Computation, Genetic Algorithms, Bach, Chorale, Music, Harmony, Voice Leading

1 Introduction

Our goal was to use an evolutionary algorithm to create a musical construct with an objectively discernible level of quality. Attempting to create subjectively appropriate music beginning with random notes is somewhat akin to the traditional hypothesis that a monkey hitting random keys on a typewriter will eventually turn out a Shakespearean play: theoretically possible, but over a time period so lengthy as to be completely impractical. In addition, if the hope were to obtain an original work of literature, rather than simply a letter-by-letter copy of a known work, it would be extremely difficult to programmatically judge any such output without human intervention.
Generation of random musical notes, too, might eventually turn out something worthwhile, but without (1) reasonable constraints, and (2) a method for automatic determination of quality, such a project seems doomed to failure due to the inordinate time periods necessary, as well as the limitations arising from individual taste and bias.

Due to the inherently subjective nature of judging the general quality of a musical composition, we have concentrated upon a narrower focus: the writing of four-part harmony by following specific rules predetermined to embody the style of chorales written by J.S. Bach [1]. These rules are widely accepted as embodying the concepts necessary to compose “correct” 18th century style chorales, and are regularly used as the basis for teaching undergraduate music majors techniques for such composition [2]. Although the rules are fairly straightforward, the potential for thousands of combinations of voice lines can make it extremely difficult to write a chorale in 18th-century style while adhering to all the accepted rules. We propose the use of an evolutionary algorithm to attempt this task.

By focusing on a musical form with predefined rules, we are able to encode many of the rules into our evolutionary algorithm’s fitness function. In this way, we avoid the otherwise predictable requirement of subjective human intervention as the basis for fitness determination, which not only would cripple the algorithm, but also would slow down the overall process so tremendously as to make it basically untenable to achieve meaningful results.

Potential uses for an algorithm of this type include verification of previously written chorales (including grading student efforts), and generation of new chorales for a given chord progression and Bass line. The algorithm may also be useful to music theorists as a means to study hypotheses regarding the addition, modification, or removal of various voice-leading rules and how they affect resulting chorales.
To establish the framework for generating chorales, we supply a line for the Bass voice and a chord progression. Based on these data, the algorithm will create 4-part chorales by filling in the Soprano, Alto, and Tenor voices and then applying the rules for 18th century harmony and voice-leading to each individual chorale in order to determine its “fitness.” In analyzing a chorale, a number of different criteria must be taken into account. The figured Bass line provided as input allows the fitness function to evaluate whether a given chord is composed of allowable pitches. Each time a chord contains an improper pitch, the chorale is penalized. However, even when chord pitches are technically “correct” given a figured Bass progression, they may violate established rules of voice leading. Each chord can consist of 3 or 4 distinct pitches. However, the position of these pitches (or voicing) in the chord can affect the flow of voicing for later chords based upon the various rules for voice leading. In order to adapt four-part harmony to an evolutionary algorithm, we generate full chorales using completely random notes for all voices except the Bass line (provided as input) and then use the rules for both chord structure and voice leading to determine the fitness of each chorale. The algorithm then creates “offspring” and prunes the chorale “population” using user selected evolutionary computational methods in an attempt to evolve chorales that fit the rules as closely as possible. We test multiple mutation mechanisms, multiple child selection mechanisms and multiple parent selection mechanisms in an effort to determine both the most effective type of evolutionary algorithm and the most effective specific mutation method for this type of problem. 
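The pitch check described above (penalizing a chorale each time a chord contains a pitch that the figured Bass does not allow) might be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the list-of-integers chord encoding, and the set of allowed pitch classes are hypothetical. It assumes notes are stored as semitone integers so that `note % 12` gives the pitch class (the representation adopted in Section 3.1), and it borrows the -10.0 per-note penalty mentioned for the initial implementation in Section 3.5.

```python
# Illustrative sketch only: names and encoding are assumptions, not the paper's code.

def illegal_pitch_penalty(notes, allowed_pitch_classes, penalty=-10.0):
    """Add one penalty for every note whose pitch class is not in the chord."""
    return sum(penalty for note in notes if note % 12 not in allowed_pitch_classes)

# Pitch classes of a C major triad when C = 0: C, E, G.
C_MAJOR = {0, 4, 7}

in_chord = [12, 28, 31, 36]      # C, E, G, C: every pitch belongs to the chord
out_of_chord = [12, 29, 31, 36]  # C, F, G, C: F is not part of the chord

print(illegal_pitch_penalty(in_chord, C_MAJOR))
print(illegal_pitch_penalty(out_of_chord, C_MAJOR))
```

Because only pitch classes are compared, the same check applies in any octave and any key; only the allowed set changes from chord to chord.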
Although initially we encountered great difficulties in achieving more than rudimentary convergence for a very short chorale segment, we were able to overcome these initial difficulties by making four major modifications in our approach: (1) a less greedy mutation method; (2) a more carefully refined fitness evaluation function; (3) the use of a brood size larger than that of the parents; and (4) the use of fitness-proportional parental selection and one-point crossover with random mutation for child generation. The final results were quite impressive, achieving chorales with significantly higher fitness in a much shorter number of generations. In addition to being technically “correct,” many of these chorales also incorporated a number of the “optional” fitness criteria that would lead to a higher fitness score.

2 Background

The bulk of past work involving music generation seems to have required subjective analysis to determine the fitness level of any individual work produced by the algorithm. In fact, when we conceived this project, we were unaware that this typical background had been expanded. In the course of our work on this project, however, we uncovered several recent efforts similar to our own, that is, efforts using constraints of a particular musical style to provide methods of objective, automated assessment of output. We discuss here two such systems.

In his paper “Bach in a Box” [3], Ryan McIntyre sets forth a variant approach towards generating four-part 18th-century harmony. McIntyre uses a four-octave range and provides the melody (Soprano line) as the initial input, rather than the figured Bass used here.
Besides a number of differences related to the initial input, McIntyre’s system of generation differs from the one presented here in two significant respects: (1) the system is specifically limited to the key of C Major, thus prohibiting not only other key signatures, but also modulation of key signatures within an individual chorale (a standard practice in Bach chorales); and (2) the system uses a “stepwise, three-tiered” approach which McIntyre touts as being a practical necessity to achieving reasonable success in chorale fitness, but which also limits the ability of the algorithm to work simultaneously towards correct chord structure and appropriate voice leading. McIntyre concludes, as we do, that an evolutionary algorithm can be a useful tool for generating four-part harmony in the style of J.S. Bach.

A more recent paper by Wiggins, et al. [4] discusses a broader implementation of a system for generation of four-part harmony, but again, provides the melody line as the initial input. While this can certainly be viewed as a more difficult problem, Wiggins’ algorithm was unable to provide acceptable solutions within 300 generations, leading the authors to conclude that “a conventional rule-based system (perhaps in conjunction with as [sic] one or more GAs) is a more appropriate method for the harmonization task.”[4] Additionally, the authors seemed primarily concerned with whether the algorithm correctly simulated human behavior, a concern not at issue to us.

3 Our Approach

As previously mentioned, our approach uses the figured Bass as system input rather than the melody line used by the systems mentioned above. This provides our system with clearer information regarding chord progression, thus allowing a more tailored fitness function. While arguably creating a “simpler” problem, using the figured Bass as input also creates a more clearly defined focus for results, thus setting up an environment more geared towards a successful outcome.

Additionally, our chorale representation is not confined to any specific key signature. Not only may any key be used in a given chorale, but a change of key (or “modulation”) is permissible within individual chorales. This aspect is quite significant, since we are attempting to create harmony in the style of J.S. Bach, who typically incorporated such modulation into his own work.

Our approach also uses a single evaluation level for determination of chorale fitness. McIntyre states quite firmly that “it is important to first develop a base of individuals who have proper chord spellings, and continue from there.”[3] While we agree that “correctness” of individual chords does take precedence over voice leading, a bad voicing in the first or second chord of a chorale can sometimes make it impossible to have a proper “solution” for the rest of the chorale. For this reason, we believe that McIntyre’s “step-wise three-tiered” approach is inherently limited and thus less likely to find the best harmonization solutions. Our solution to the difficulties of struggles between the two fitness areas was simply to weight chord correctness significantly higher than good voice leading. This allowed the algorithm to continue to work on both evaluation efforts simultaneously while moving more steadily towards chorale solutions containing correct chords.

Out of the different parent/child selection mechanisms and mutation methods used, several combinations produced output providing acceptable solutions within 300 generations (an improvement over Wiggins’ results, although Wiggins did use some fitness constraints not present in our system), while also providing versatility of key signature and modulation not present in McIntyre’s system.

3.1 Representation of the chorales

Each chorale is represented by a structure containing an array of chords (the “individual” chorale), together with information concerning various aspects of fitness of that chorale, the key signature input by the user (not technically necessary), the initial ancestor for this chorale’s “generation line” and a flag designating whether this chorale has ever placed within the top ten chorales when ranked by fitness.

Each chord is itself represented by a structure that contains the initial input information regarding chord type and Bass voice value (in order to evaluate the chord’s fitness) and the notes currently contained in the chord. The notes are represented by an integer array with values between 4 and 43, inclusive, representing the notes from E nearly two octaves below Middle C through G 1½ octaves above Middle C. (Since C is commonly used as a starting point, we have established the C two octaves below Middle C as represented by the integer 0. However, since that note is below the established range for the Bass voice, the first legitimate note value is actually 4, representing E, which is the bottom of the Bass voice range.) Since pitch value generally corresponds to the twelve values of the chromatic scale (A, A#/Bb, B, C, C#/Db, D, D#/Eb, E, F, F#/Gb, G, and G#/Ab), this representation provides an easy method to determine the pitch of any given note, since { note value modulo 12 } will always provide the proper pitch value. In our notation, therefore, since we begin with C, the pitch value of C=0, C#/Db=1, D=2, D#/Eb=3, and so on through B=11.

The chord structure also contains a matrix representing the interval relationships among all four voices. This matrix allows the program to generate chorales independent of any specific key, since the individual notes and the relationships among the four voices are all that is necessary to evaluate the chord type and voice leading and thereby determine the chord’s fitness. Finally, the structure contains the current value of the chord’s fitness, based both on note makeup and on voice leading from the preceding chord.

3.2 Generating initial chorales

As previously stated, the Bass line for the chorales is part of the original input. We considered constraining the initial generation of the notes in the Soprano, Alto and Tenor lines so that the pitches would always correspond to a “legal” note in the chord (as determined by the given chord type). Although this method has the potential to create solutions faster, it may actually make finding a final solution more difficult, depending upon the mutation methods that are employed. (For example, if mutation is limited to a one-step or half-step modification, then forcing the initially generated notes into the “correct” chord would be likely to restrict the chorale from developing in any direction other than the one initially generated, despite poor voice leading. Even random note mutation would be unlikely to create diversity, since a note modified from its original position in a technically correct chorale is most likely to produce a chorale with a lower fitness value.)

Accordingly, we chose to generate random pitches for all initial chord values, without regard to how well such pitch initially fits the “goal” chord type. The only constraint used during initial generation was to restrict the notes in each voice to the appropriate range for that voice (e.g., a randomly created Alto voice note will always be between the values of 19 and 36, inclusive). Although this method frequently creates initial chords with low fitness values, it allows a wider range of effective mutation methods, some of which would become worthless if the initial pitch were always a legitimate part of the appropriate chord. The number of chords in each chorale is variable, based on the initial user input.

3.3 Fitness Function

In order to ascertain the fitness of individual chorales, we evaluate the chorales in two different ways: by the fitness of the individual chords in the chorale (without reference to each other), known as the “chord” fitness; and by the fitness of each chord respective to other chords in the chorale, particularly the immediately preceding chords, known as the “voice leading” fitness.

In evaluating the individual chords independently, the primary concern involves whether the subject chord contains “illegal” notes, that is, notes that do not legitimately fall within the chord’s intended chord type. However, other rules are also considered. Our present implementation considers five additional items that cause penalties if present: (1) voice crossing; (2) notes present that are outside the proper range for their intended voices (this is necessary because although the initial parent chorales keep voices within their proper range, no such constraint is made on some mutations); (3) doubled leading tones for chords V and vii; (4) intervals of more than one octave between any adjacent upper voices (e.g., Soprano and Alto or Alto and Tenor); and (5) doubling other pitches in a chord. In addition, if all pitches in a chord are represented (implemented by determining the number of distinct notes in the chord), the chord fitness is rewarded.

In evaluating the “voice leading” fitness of a chord, our fitness function penalizes the following items: (1) parallel 5ths; (2) parallel unisons or octaves; (3) voice “leap” greater than one octave; (4) improper resolution of a large leap; and (5) overlapping voices. Voice leading fitness is also rewarded for correct resolution of voice leaps. Thus the fitness for voice leading of a given chord depends only on the two chords immediately preceding it.

These criteria are used primarily to penalize the fitness of chorales having certain properties; fitness “rewards” occur only rarely. This methodology is a deliberate attempt to create chorale fitness values that will remain negative so long as the subject chorale has any significant flaws, according to traditional rules. In other words, chorales having a fitness value >= 0 are essentially “correct,” though perhaps not as well harmonized as they could be. This method allows the user to glean an immediate idea of the chorale’s overall performance based solely on numeric fitness value, without having to look at the individual chorale notes or perform lengthy comparisons with other chorales.

3.4 Mutation, Selection and Survival Methods

We considered a number of possible methods for chorale mutation, including some we elected not to use, such as swapping the pitch values of neighboring chords or voices within the same chord, and rotating the pitch values among all voices. The mutation methods included in our implementation are:

(1) Mutate one randomly chosen note in each chord of the chorale to a new randomly chosen note (conceivably a note can mutate to itself);
(2) Mutate one randomly chosen note in each chord to a note that is within one whole step of the original note;
(3) Stochastically change one randomly chosen note in each chord (72% chance that the note will be changed within one whole step of the original note, 18% chance the note is changed to a randomly selected note, and 10% chance it is not changed at all);
(4) Mutate only one chord in the chorale using method #1 above;
(5) Mutate only one chord in the chorale using method #3 above.

We also implemented various types of parent selection methods including (1) each parent creates one child, (2) each parent creates N children, (3) fitness proportional selection of N children, and (4) crossover with fitness proportional selection of parents.
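The five mutation methods can be sketched compactly as three note-level changes combined with a choice between mutating every chord and mutating a single chord. This is only an illustrative sketch under stated assumptions, not the authors' implementation: all names are hypothetical, “within one whole step” is read as a move of 1 or 2 semitones, the Bass is assumed to sit at index 0 and to be left untouched because it is part of the input, and clamping to the overall 4 to 43 range is an added safeguard the paper does not describe.

```python
import random

LOW, HIGH = 4, 43  # legal note values from Section 3.1


def clamp(note):
    """Keep a note inside the overall range (an assumption, see lead-in)."""
    return max(LOW, min(HIGH, note))


def random_note(note, rng):
    """Method (1) style: any note in the full range, possibly the same one."""
    return rng.randint(LOW, HIGH)


def step_note(note, rng):
    """Method (2) style: move by 1 or 2 semitones (assumed reading)."""
    return clamp(note + rng.choice([-2, -1, 1, 2]))


def stochastic_note(note, rng):
    """Method (3) style: 72% step, 18% fully random, 10% unchanged."""
    roll = rng.random()
    if roll < 0.72:
        return step_note(note, rng)
    if roll < 0.90:
        return random_note(note, rng)
    return note


def mutate_chord(chord, change, rng):
    """Change one randomly chosen upper voice; index 0 (Bass) is kept."""
    new_chord = list(chord)
    voice = rng.randrange(1, len(new_chord))
    new_chord[voice] = change(new_chord[voice], rng)
    return new_chord


def mutate_chorale(chorale, change, rng, every_chord=True):
    """every_chord=True covers methods (1)-(3); False covers (4) and (5)."""
    if every_chord:
        return [mutate_chord(chord, change, rng) for chord in chorale]
    target = rng.randrange(len(chorale))
    return [mutate_chord(chord, change, rng) if i == target else list(chord)
            for i, chord in enumerate(chorale)]


rng = random.Random(0)
chorale = [[12, 24, 28, 31], [7, 23, 26, 31], [12, 24, 28, 36]]
print(mutate_chorale(chorale, step_note, rng))                       # method (2)
print(mutate_chorale(chorale, random_note, rng, every_chord=False))  # method (4)
```

Note that the fully random change draws from the whole 4 to 43 range rather than the voice's own range, which is consistent with the fitness function needing an out-of-range penalty for mutated chorales.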
Finally, we implemented two survival methods. The first method merged parents and children into one list, sorted by fitness, and truncated the list down to the parent population size. The second method was one where parents do not survive to the next generation. Instead, the list of children is sorted and truncated down to the parent population size.

3.5 Testing Framework

We initially began testing the algorithm with population sizes of 1000 over 1000 generations using 12-chord chorales and far fewer fitness rules than those set forth above. The initial implementation incorporated a penalty of -10.0 for each incorrect note in a chord and a penalty of up to -10.0 for improper voice leading. Without foresight as to how the algorithm would behave, these parameters seemed sufficient to provide a fairly significant test, as well as providing a reasonable time frame in which to expect convergence.

Our initial trials, however, did not perform up to expectations. Our first response was to allow the algorithm to run longer, extending the number of generations first to 2000, then to 5000. Although this produced slightly better results, the fitness of the best chorales still remained firmly in the negative zone. On average, even continuing through 5000 generations, runs produced “best-so-far” values in the -700 to -500 range, still quite a distance from comprising the “technically correct” chorale presumed to have 0 value.

We then reduced the number of chords to four, and finally began to see positive results. While generally hovering around 0, the results were so much better that we were determined to find a way to enhance the algorithm so that longer chorales might also be able to converge to good results. When we translated the results to actual notes, however, we realized that the harmonization was not particularly good. Therefore, we enhanced the fitness evaluation function to take into account most of the items set forth above. While a larger number of generations was required for convergence, the four-chord chorales produced were much better representatives.

Upon increasing the number of chords to eight, however, we again experienced major difficulties with convergence, and realized that a modification of the program was in order. After much deliberation, we determined that the chord fitness and voice leading fitness were battling each other (i.e., a mutation that seemed helpful on the voice leading front actually served to disrupt individual chord fitness). As McIntyre noted, voice leading fitness is not particularly helpful without the basic notion of correct chords [3]. Accordingly, we realized that chord fitness needed to be weighted much more heavily relative to voice leading fitness. The final result creates a much larger gap between the two types of fitness than we initially would have imagined, but leaves the basic algorithm intact.

While delving into the fitness function, we also reviewed our mutation methods. Our initial runs allowed two types of mutation: mutation of a randomly chosen note to a new random note, and mutation of a randomly chosen note to a note within one whole step of the original note value. Our initial runs suggested that the fully random mutation type was somewhat more successful at evolving chorales, although neither type was demonstrating particular success at that point. Accordingly, we added the third type: mutation of a random note by one of several methods chosen stochastically. Each of the three types mutated one note in each chord of every chorale. However, this new mutation did not perform significantly better than the first two. Finally, we determined that the negative convergence level might be improved by decreasing the mutation rate. Accordingly, we added the remaining two mutation methods set forth above: basically using the first and third methods applied only to one randomly chosen chord in the chorale rather than to each chord.

The two improvements discussed above (increased weighting of chord fitness and addition of mutation strategies with decreased mutation rates) dramatically changed the performance of our algorithm. We later added fitness proportional selection and one-point crossover with fitness proportional selection between two chorales (the crossover point being a randomly chosen chord). Using mutation method (4) over 1000 generations, the 8-chord chorales immediately began to converge to fitness values greater than 0. After several tests, we reverted to the 12-chord chorales, which also converged quickly. Accordingly, we decided to increase our initial input to a 25-chord chorale, a far more challenging test for the algorithm. When many of these chorales also converged, we reduced the number of generations to 500 in order to allow time to run several similar executions for comparison of particular results based on specific methods used.

The results were very encouraging. However, individual runs of the 25-chord chorales were far more time-consuming than the shorter chorales had been: one 500-generation run typically took 25-30 minutes on a 166 MHz Pentium processor. Although we were able to run a number of different 500-generation runs, we realized we would be unable to complete a sufficient number to make appropriately tested conclusions regarding the different types of mutation method, etc. Upon review of some of the runs, however, we realized that most mutation methods portrayed their basic characteristics within the first two hundred generations, although their fitness usually continued to increase.
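The fitness proportional selection, one-point crossover at a chord boundary, and the two survival methods described in this section could be sketched as below. This is a hedged illustration rather than the authors' code: every function name is hypothetical, and the paper does not say how negative fitness values were turned into selection probabilities, so the shift that makes all weights positive is purely an assumption of this sketch.

```python
import random


def fitness_proportional_pick(population, fitnesses, rng):
    """Roulette-wheel pick. Fitness is often negative here, so values are
    shifted to be positive first (the shift is an assumption)."""
    lowest = min(fitnesses)
    weights = [f - lowest + 1.0 for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]


def one_point_crossover(parent_a, parent_b, rng):
    """Child takes chords before a random chord index from parent_a and the
    rest from parent_b. Both parents must have the same length (>= 2)."""
    point = rng.randrange(1, len(parent_a))
    head = [list(chord) for chord in parent_a[:point]]
    tail = [list(chord) for chord in parent_b[point:]]
    return head + tail


def survive(parents, children, fitness, size, merge=True):
    """merge=True: parents and children compete together, best `size` kept.
    merge=False: parents are discarded and only children compete."""
    pool = parents + children if merge else children
    return sorted(pool, key=fitness, reverse=True)[:size]


rng = random.Random(0)
parent_a = [[12, 24, 28, 31] for _ in range(4)]
parent_b = [[7, 23, 26, 31] for _ in range(4)]
print(one_point_crossover(parent_a, parent_b, rng))
```

Cutting only between chords keeps each chord's internal voicing intact, so crossover can disturb voice leading at no more than the single join between the two parents' material.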
Accordingly, we executed a new series of runs on all combinations of Parent Selection, Child Mutation, and Survival methods (4 parent selection methods × 5 mutation methods × 2 survival methods, for a total of 40 different combinations) using the following standardized parameters: 25 chords, 500 Parents, 1000 Children, 250 generations and a figured Bass line extracted from J.S. Bach’s O Herzenangst (Note: the one exception to the above parameters is Parent Selection Method #1, where each parent produces 1 offspring, thereby producing 500 children).

Figure 1 below shows the chorale for an initial random note generation and its corresponding fitness (Maximum possible fitness for a 25-chord chorale is 47.5).

Fig. 1. O Herzenangst, Generation 0 (Fitness = -1413.7)

The fitness level of the chorales became zero (a technically “correct” chorale) at approximately generation 40. However, further generations continued to improve the fitness, leveling off at approximately generation 100. Later generations did not significantly improve upon the fitness score. Figure 2 below shows the chorale at generation 500 (the end of the run).

Fig. 2. O Herzenangst, Generation 500 (Fitness = 45.4)

4 Conclusions and Future Work

Fitness proportional selection of parents dramatically increased the effectiveness of our algorithm, performing far more effectively than uniform parent distribution. Increasing the brood size appeared to increase performance for all eligible child creation schemes. Finally, when we added the option of one-point crossover together with fitness proportional selection, we found that this option consistently outperformed all others.

Based on our testing history and the above results, we conclude that an evolutionary algorithm can be a useful vehicle for generation of 4-part harmony in the style of J.S. Bach. We are especially satisfied with our representation of chord types, which obviates any need to constrain the algorithm to a certain key, and furthermore allows modulation among different keys.

Using the figured Bass line as input enabled us to create an algorithm that produced satisfactory harmonies in a relatively short time frame. However, we believe the algorithm can be easily modified to allow generation of the Bass line as well as the other three voices, given a chord progression.

Although we disagree with McIntyre’s use of a step-wise “tiered” approach in which chorales must “pass” one level at an 85% fitness level before the next level’s issues are even considered, we did learn that some type of solution is necessary to ensure that the system progresses primarily towards chorales that contain basically correct chords. Accordingly, we heavily weighted our basic chord correctness criteria, while leaving remaining criteria in place. This strategy appears to provide an excellent mechanism for driving towards both criteria simultaneously while maintaining a strong preference for chorales containing correct chords.

Based on our testing, we conclude that the mutation rate is a significant factor in the likelihood that the algorithm will produce successful convergence rather than “stalling out” at a fairly low fitness value. Specifically, continuing progress is benefited most by having no more than a small portion of each chorale affected by mutation at any given time.

Surprisingly, random mutation appears to work more effectively than constrained mutation. Although multiple mutation methods produce satisfactory results, perhaps the random quality produces a fuller spectrum of children to choose from, allowing better progress for the algorithm as a whole.

Finally, the survival selection method (between merging parents with children and pruning less fit individuals vs. replacing parents with children) did not seem to be particularly significant in the overall outcome. Generally, merging parents with children did produce a steeper fitness curve.
However, when the otherwise more successful strategies were compared, a comparison of the survival strategy produced almost identical curves.

The algorithm could certainly be extended to increase functionality and to enhance ease of use. Potential future developments include (1) adding a more sophisticated fitness evaluation (including fine tuning the penalties and rewards); (2) adding an extension whereby the algorithm would develop four-part harmony using the chord progression alone, without the benefit of the Bass line; and (3) introducing non-harmonic tones into the chorale generation. In addition, it might be worthwhile to consider a “cooperation” model, in which chorales would be divided into “sub-populations” of their individual chords.

Although the current implementation appears to work well, the efficiency of the algorithm might be greatly enhanced if run on parallel processors. Individual chords provide ready-made units for such parallel analysis.

Acknowledgements: We would like to thank Dr. Kenneth A. DeJong for his assistance in putting together this research. We would also like to acknowledge R. Duane King for his helpful comments.

References:
[1] Bach, Johann Sebastian (1736, 1832). 371 Harmonized Chorales and 69 Chorale Melodies with figured bass, edited by Albert Riemenschneider (1941). G. Schirmer, New York, N.Y.
[2] Aldwell, Edward and Schachter, Carl (1978). Harmony and Voice Leading, Volume 1. Harcourt Brace Jovanovich, Inc., New York, N.Y.
[3] McIntyre, Ryan A. (1994). Bach in a Box: The Evolution of Four Part Baroque Harmony Using the Genetic Algorithm. In First IEEE Conference on Evolutionary Computation, pp. 852-857.
[4] Wiggins, G.A., Papadopoulos, G., Phon-Amnuaisuk, S., and Tuson, A. (1998). Evolutionary Methods for Musical Composition. In Naranjo, M. and Deliege, I., editors, Proceedings of the CASYS’98 Workshop on Anticipation, Cognition and Music, Liege, Belgium. Also available as a Research Paper from the Department of Artificial Intelligence, University of Edinburgh.

About the Authors: Tamara A. Maddox teaches Computer Ethics and Computer Science at George Mason University in Fairfax, Virginia. John E. Otten was formerly a professional musician and music educator. He is now employed as a computer scientist at Veridian/MRJ Technology Solutions in Fairfax, Virginia.