Using an Evolutionary Algorithm to Generate
Four-Part 18th Century Harmony
TAMARA A. MADDOX
Department of Computer Science
George Mason University
Fairfax, Virginia
USA
JOHN E. OTTEN
Veridian/MRJ Technology Solutions
Fairfax, Virginia
USA
Abstract: - We discuss the use of various types of evolutionary algorithms for the generation of 18th-century
four-part musical chorales in the style of J.S. Bach. We briefly summarize known existing work on
computerized music generation, and some of the roadblocks that have been reached. We propose the use of
fitness functions embodying specific constraints to remove human subjectivity from the evaluation process. We
compare multiple evolutionary selection and mutation methods, and we conclude that: (1) Chord fitness must be
valued much more highly than voice leading fitness; (2) The use of multiple-chord mutation causes stalling of
the algorithm at a low fitness value; and (3) The use of fitness proportional selection of parents and one-point
crossover produces significantly better results than uniform parent selection.
Key-Words: - Evolutionary Computation, Genetic Algorithms, Bach, Chorale, Music, Harmony, Voice Leading
1 Introduction
Our goal was to use an evolutionary algorithm to
create a musical construct with an objectively
discernible level of quality. Attempting to create
subjectively appropriate music beginning with
random notes is somewhat akin to the traditional
hypothesis that a monkey hitting random keys on a
typewriter will eventually turn out a Shakespearean
play: theoretically possible, but the time period involved
would be so lengthy as to be completely impractical.
In addition, if the hope were to obtain an original
work of literature, rather than simply a letter-by-letter
copy of a known work, it would be extremely difficult
to programmatically judge any such output without
human intervention.
Generation of random musical notes, too, might
eventually turn out something worthwhile, but
without (1) reasonable constraints, and (2) a method
for automatic determination of quality, such a project
seems doomed to failure due to the inordinate time
periods necessary, as well as the limitations arising
from individual taste and bias. Due to the inherently
subjective nature of judging the general quality of a
musical composition, we have concentrated upon a
narrower focus: the writing of four-part harmony by
following specific rules predetermined to embody the
style of chorales written by J.S. Bach [1]. These rules
are widely accepted as embodying the concepts
necessary to compose “correct” 18th century style
chorales, and regularly are used as the basis for
teaching undergraduate music majors techniques for
such composition [2]. Although the rules are fairly
straightforward, the potential for thousands of
combinations of voice lines can make it extremely
difficult to write a chorale in 18th-century style while
adhering to all the accepted rules.
We propose the use of an evolutionary algorithm to
attempt this task. By focusing on a musical form with
predefined rules, we are able to encode many of the
rules into our evolutionary algorithm’s fitness
function. In this way, we avoid the otherwise
predictable requirement of subjective human
intervention as the basis for fitness determination,
which not only would cripple the algorithm, but also
would slow down the overall process so tremendously
as to make it basically untenable to achieve
meaningful results. Potential uses for an algorithm of
this type include verification of previously written
chorales (including grading student efforts), and
generation of new chorales for a given chord
progression and Bass line. The algorithm may also be
useful to music theorists as a means to study
hypotheses regarding the addition, modification, or
removal of various voice-leading rules and how they
affect resulting chorales.
To establish the framework for generating chorales,
we supply a line for the Bass voice and a chord
progression. Based on these data, the algorithm will
create 4-part chorales by filling in the Soprano, Alto,
and Tenor voices and then applying the rules for 18th
century harmony and voice-leading to each individual
chorale in order to determine its “fitness.”
In analyzing a chorale, a number of different
criteria must be taken into account. The figured Bass
line provided as input allows the fitness function to
evaluate whether a given chord is composed of
allowable pitches. Each time a chord contains an
improper pitch, the chorale is penalized. However,
even when chord pitches are technically “correct”
given a figured Bass progression, they may violate
established rules of voice leading. Each chord can
consist of 3 or 4 distinct pitches. However, the
position of these pitches (or voicing) in the chord can
affect the flow of voicing for later chords based upon
the various rules for voice leading.
In order to adapt four-part harmony to an
evolutionary algorithm, we generate full chorales
using completely random notes for all voices except
the Bass line (provided as input) and then use the
rules for both chord structure and voice leading to
determine the fitness of each chorale. The algorithm
then creates “offspring” and prunes the chorale
“population” using user selected evolutionary
computational methods in an attempt to evolve
chorales that fit the rules as closely as possible. We
test multiple mutation mechanisms, multiple child
selection mechanisms and multiple parent selection
mechanisms in an effort to determine both the most
effective type of evolutionary algorithm and the most
effective specific mutation method for this type of
problem.
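The generate, evaluate, and prune cycle described above can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the names `random_chorale` and `evolve` are hypothetical, the Soprano and Tenor ranges are assumed (only the Alto range 19-36 and the overall bounds 4 and 43 appear in the paper), and the fitness and mutation functions are left as parameters.

```python
import random

# Inclusive note ranges per upper voice. Only the Alto range (19-36) is stated
# in the paper; the Soprano and Tenor ranges here are assumptions.
VOICE_RANGES = {"soprano": (24, 43), "alto": (19, 36), "tenor": (12, 31)}


def random_chorale(bass_line, rng):
    """Build one chorale: the given Bass note plus random upper voices per chord."""
    return [
        {"bass": b, **{v: rng.randint(lo, hi) for v, (lo, hi) in VOICE_RANGES.items()}}
        for b in bass_line
    ]


def evolve(bass_line, fitness, mutate, pop_size, generations, rng):
    """Generic loop: create offspring, merge with parents, keep the fittest."""
    population = [random_chorale(bass_line, rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(parent, rng) for parent in population]
        merged = sorted(population + children, key=fitness, reverse=True)
        population = merged[:pop_size]  # prune back to the parent population size
    return population[0]  # best chorale found
```

Passing `fitness` and `mutate` as arguments keeps the loop independent of any particular rule set, which matches the paper's aim of comparing several mutation and selection methods under one framework.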
Although initially we encountered great difficulties
in achieving more than rudimentary convergence for a
very short chorale segment, we were able to overcome
these initial difficulties by making four major
modifications in our approach: (1) a less greedy
mutation method; (2) a more carefully refined fitness
evaluation function; (3) the use of a brood size larger
than that of the parents; and (4) the use of fitness-proportional parental selection and one-point
crossover with random mutation for child generation.
The final results were quite impressive, achieving
chorales with significantly higher fitness in far
fewer generations. In addition to being
technically “correct,” many of these chorales also
incorporated a number of the “optional” fitness
criteria that would lead to a higher fitness score.
2 Background
The bulk of past work involving music generation
seems to have required subjective analysis to
determine the fitness level of any individual work
produced by the algorithm. In fact, when we
conceived this project, we were unaware that this
typical background had been expanded. In the course
of our work on this project, however, we uncovered
several recent efforts similar to our own – that is,
efforts using constraints of a particular musical style
to provide methods of objective, automated
assessment of output. We discuss here two such
systems.
In his paper “Bach in a Box” [3], Ryan McIntyre
sets forth a variant approach towards generating four-part 18th-century harmony. McIntyre uses a four-octave range and provides the melody (Soprano line)
as the initial input, rather than the figured Bass used
here. Besides a number of differences related to the
initial input, McIntyre’s system of generation differs
from the one presented here in two significant
respects: (1) the system is specifically limited to the
key of C Major, thus prohibiting not only other key
signatures, but also modulation of key signatures
within an individual chorale (a standard practice in
Bach chorales); and (2) the system uses a “stepwise,
three-tiered” approach which McIntyre touts as being
a practical necessity to achieving reasonable success
in chorale fitness, but which also limits the ability of
the algorithm to work simultaneously towards correct
chord structure and appropriate voice leading.
McIntyre concludes, as we do, that an evolutionary
algorithm can be a useful tool for generating four-part
harmony in the style of J.S. Bach.
A more recent paper by Wiggins, et al. [4],
discusses a broader implementation of a system for
generation of four-part harmony, but again, provides
the melody line as the initial input. While this can
certainly be viewed as a more difficult problem,
Wiggins’ algorithm was unable to provide acceptable
solutions within 300 generations, leading the authors
to conclude that “a conventional rule-based system
(perhaps in conjunction with as [sic] one or more
GAs) is a more appropriate method for the
harmonization task.”[4] Additionally, the authors
seemed primarily concerned with whether the
algorithm correctly simulated human behavior, a
concern not at issue to us.
3 Our Approach
As previously mentioned, our approach uses the
figured Bass as system input rather than the melody
line used by the systems mentioned above. This
provides our system with clearer information
regarding chord progression, thus allowing a more
tailored fitness function. While arguably creating a
“simpler” problem, using the figured Bass as input
also creates a more clearly defined focus for results,
thus setting up an environment more geared towards a
successful outcome.
Additionally, our chorale representation is not
confined to any specific key signature. Not only may
any key be used in a given chorale, but a change of
key (or “modulation”) is permissible within individual
chorales. This aspect is quite significant, since we are
attempting to create harmony in the style of J.S. Bach,
who typically incorporated such modulation into his
own work.
Our approach also uses a single evaluation level for
determination of chorale fitness. McIntyre states
quite firmly that “it is important to first develop a base
of individuals who have proper chord spellings, and
continue from there.”[3] While we agree that
“correctness” of individual chords does take
precedence over voice leading, a bad voicing in the
first or second chord of a chorale can sometimes make
it impossible to have a proper “solution” for the rest
of the chorale. For this reason, we believe that
McIntyre’s “step-wise three-tiered” approach is
inherently limited and thus less likely to find the best
harmonization solutions. Our solution to the
struggle between the two fitness areas
was simply to weight chord correctness significantly
higher than good voice leading. This allowed the
algorithm to continue to work on both evaluation
efforts simultaneously while moving more steadily
towards chorale solutions containing correct chords.
Out of the different parent/child selection
mechanisms and mutation methods used, several
combinations produced output providing acceptable
solutions within 300 generations (an improvement
over Wiggins’ results, although Wiggins did use
some fitness constraints not present in our system),
while also providing versatility of key signature and
modulation not present in McIntyre’s system.
3.1 Representation of the chorales
Each chorale is represented by a structure containing
an array of chords (the “individual” chorale), together
with information concerning various aspects of fitness
of that chorale, the key signature input by the user
(not technically necessary), the initial ancestor for this
chorale’s “generation line” and a flag designating
whether this chorale has ever placed within the top ten
chorales when ranked by fitness.
Each chord is itself represented by a structure that
contains the initial input information regarding chord
type and Bass voice value (in order to evaluate the
chord’s fitness) and the notes currently contained in
the chord. The notes are represented by an integer
array with values between 4 and 43, inclusive,
representing the notes from E nearly two octaves
below Middle C through G 1 ½ octaves above Middle
C. (Since C is commonly used as a starting point, we
have established the C two octaves below Middle C as
represented by the integer 0. However, since that note
is below the established range for the Bass voice, the
first legitimate note value is actually 4, representing
E, which is the bottom of the Bass voice range.)
Since pitch value generally corresponds to the twelve
values of the chromatic scale (A, A#/Bb, B, C, C#/Db,
D, D#/Eb, E, F, F#/Gb, G, and G#/Ab), this
representation provides an easy method to determine
the pitch of any given note, since { note value modulo
12 } will always provide the proper pitch value. In
our notation, therefore, since we begin with C, the
pitch value of C=0, C#/Db =1, D=2, D#/Eb=3, and so
on through B=11.
The chord structure also contains a matrix
representing the interval relationships among all four
voices. This matrix allows the program to generate
chorales independent of any specific key, since the
individual notes and the relationships among the four
voices are all that is necessary to evaluate the chord
type and voice leading and thereby determine the
chord’s fitness. Finally, the structure contains the
current value of the chord’s fitness, based both on
note makeup and on voice leading from the preceding
chord.
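The integer note encoding described in this section can be illustrated with a short sketch. The function names are hypothetical, and the layout of the interval matrix (signed semitone differences between every pair of voices) is an assumption, since the paper does not specify how its matrix is stored.

```python
# Pitch names indexed by pitch value, starting from C = 0 as in the paper.
PITCH_NAMES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
               "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

LOWEST_NOTE, HIGHEST_NOTE = 4, 43  # legal note values, inclusive
MIDDLE_C = 24                      # C two octaves below Middle C is 0


def pitch_class(note):
    """Return the pitch value (0-11) of a note via note modulo 12."""
    if not LOWEST_NOTE <= note <= HIGHEST_NOTE:
        raise ValueError(f"note {note} is outside the range 4..43")
    return note % 12


def pitch_name(note):
    """Return the pitch name of a note, ignoring its octave."""
    return PITCH_NAMES[pitch_class(note)]


def interval_matrix(notes):
    """Assumed layout: entry [i][j] is notes[j] - notes[i] in semitones."""
    return [[b - a for b in notes] for a in notes]
```

For example, `pitch_name(4)` gives "E" (the bottom of the Bass range) and `pitch_name(43)` gives "G" (the top of the overall range), consistent with the bounds stated in the text.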
3.2 Generating initial chorales
As previously stated, the Bass line for the chorales is
part of the original input. We considered constraining
the initial generation of the notes in the Soprano, Alto
and Tenor lines so that the pitches would always
correspond to a “legal” note in the chord (as
determined by the given chord type). Although this
method has the potential to create solutions faster, it
may actually make finding a final solution more
difficult, depending upon the mutation methods that
are employed. (For example, if mutation is limited to
a one-step or half-step modification, then forcing the
initially generated notes into the “correct” chord
would be likely to restrict the chorale from developing
in any direction other than the one initially generated,
despite poor voice leading. Even random note
mutation would be unlikely to create diversity, since a
note modified from its original position in a
technically correct chorale is most likely to produce a
chorale with a lower fitness value.)
Accordingly, we chose to generate random pitches
for all initial chord values, without regard to how well
each such pitch initially fits the “goal” chord type. The
only constraint used during initial generation was to
restrict the notes in each voice to the appropriate
range for that voice (e.g., a randomly created Alto
voice note will always be between the values of 19
and 36, inclusive). Although this method frequently
creates initial chords with low fitness values, it allows
a wider range of effective mutation methods, some of
which would become worthless if the initial pitch
were always a legitimate part of the appropriate
chord.
The number of chords in each chorale is variable,
based on the initial user input.
3.3 Fitness Function
In order to ascertain the fitness of individual chorales,
we evaluate the chorales in two different ways: by the
fitness of the individual chords in the chorale (without
reference to each other), known as the “chord” fitness;
and by the fitness of each chord respective to other
chords in the chorale, particularly the immediately
preceding chords, known as the “voice leading”
fitness.
In evaluating the individual chords independently,
the primary concern involves whether the subject
chord contains “illegal” notes – that is, notes that do
not legitimately fall within the chord’s intended chord
type. However, other rules are also considered. Our
present implementation considers five additional
items that cause penalties if present: (1) voice
crossing; (2) notes present that are outside the proper
range for their intended voices (this is necessary
because although the initial parent chorales keep
voices within their proper range, no such constraint is
made on some mutations); (3) doubled leading tones
for chords V and vii; (4) intervals of more than one
octave between any adjacent upper voices (e.g.,
Soprano and Alto or Alto and Tenor); and (5)
doubling other pitches in a chord. In addition, if all
pitches in a chord are represented (implemented by
determining the number of distinct notes in the
chord), the chord fitness is rewarded.
In evaluating the “voice leading” fitness of a chord,
our fitness function penalizes the following items: (1)
parallel 5ths; (2) parallel unisons or octaves; (3) voice
“leap” greater than one octave; (4) improper
resolution of a large leap; and (5) overlapping voices.
Voice leading fitness is also rewarded for correct
resolution of voice leaps. Thus the fitness for voice
leading of a given chord depends only on the two
chords immediately preceding it.
These criteria are used primarily to penalize the
fitness of chorales having certain properties; fitness
“rewards” occur only rarely. This methodology is a
deliberate attempt to create chorale fitness values that
will remain negative so long as the subject chorale has
any significant flaws, according to traditional rules.
In other words, chorales having a fitness value >= 0
are essentially “correct,” though perhaps not as well
harmonized as they could be. This method allows the
user to glean an immediate idea of the chorale’s
overall performance based solely on numeric fitness
value, without having to look at the individual chorale
notes or perform lengthy comparisons with other
chorales.
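A penalty-based fitness of the kind described in this section, combining a per-chord score with a voice-leading score, might be sketched as follows. This covers only a few of the listed rules (illegal pitches, voice crossing, and parallel fifths, octaves, or unisons). The weights `CHORD_WEIGHT` and `VOICE_WEIGHT` and all function names are assumptions chosen for illustration; the paper does not state its final penalty values.

```python
from itertools import combinations

CHORD_WEIGHT = 10.0  # assumed: chord errors weigh much more than voice leading
VOICE_WEIGHT = 1.0   # assumed


def chord_penalties(notes, allowed_pitch_classes):
    """Count illegal pitches and voice crossings; notes run Bass to Soprano."""
    illegal = sum(1 for n in notes if n % 12 not in allowed_pitch_classes)
    crossings = sum(1 for lower, upper in zip(notes, notes[1:]) if lower > upper)
    return illegal + crossings


def parallel_motion_penalties(prev, curr):
    """Count voice pairs that move while keeping a perfect fifth or octave/unison."""
    count = 0
    for i, j in combinations(range(len(curr)), 2):
        before = (prev[j] - prev[i]) % 12
        after = (curr[j] - curr[i]) % 12
        both_moved = prev[i] != curr[i] and prev[j] != curr[j]
        if both_moved and before == after and before in (0, 7):
            count += 1
    return count


def chorale_fitness(chords, allowed):
    """Sum weighted penalties; a flawless chorale scores 0 under these rules."""
    total = 0.0
    for k, notes in enumerate(chords):
        total -= CHORD_WEIGHT * chord_penalties(notes, allowed[k])
        if k > 0:
            total -= VOICE_WEIGHT * parallel_motion_penalties(chords[k - 1], notes)
    return total
```

Because every term is a penalty, the score stays negative while any counted flaw remains, which mirrors the design goal that a fitness of 0 or more signals an essentially correct chorale.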
3.4 Mutation, Selection and Survival Methods
We considered a number of possible methods for
chorale mutation, including some we elected not to
use, such as swapping the pitch values of neighboring
chords or voices within the same chord, and rotating
the pitch values among all voices. The mutation
methods included in our implementation are: (1)
Mutate one randomly chosen note in each chord of the
chorale to a new randomly chosen note (conceivably a
note can mutate to itself); (2) Mutate one randomly
chosen note in each chord to a note that is within one
whole step of the original note; (3) Stochastically
change one randomly chosen note in each chord (72%
chance that the note will be changed within one whole
step of the original note, 18% chance the note is
changed to a randomly selected note, and 10% chance
it is not changed at all); (4) Mutate only one chord in
the chorale using method #1 above; (5) Mutate only
one chord in the chorale using method #3 above.
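Mutation methods (3) and (5) above can be sketched as follows. Several details are assumptions rather than facts from the paper: chords are lists ordered Bass, Tenor, Alto, Soprano; only the three upper voices mutate because the Bass is fixed input; "within one whole step" is read as a shift of 1 or 2 semitones in either direction; and results are clamped to the overall range 4..43.

```python
import random

LOWEST_NOTE, HIGHEST_NOTE = 4, 43
UPPER_VOICES = (1, 2, 3)  # assumed indices of Tenor, Alto, Soprano; 0 is the Bass


def clamp(note):
    """Keep a note inside the overall legal range (an assumption of this sketch)."""
    return max(LOWEST_NOTE, min(HIGHEST_NOTE, note))


def mutate_note_stochastic(note, rng):
    """72%: shift by up to a whole step; 18%: random note; 10%: unchanged."""
    roll = rng.random()
    if roll < 0.72:
        return clamp(note + rng.choice([-2, -1, 1, 2]))
    if roll < 0.90:
        return rng.randint(LOWEST_NOTE, HIGHEST_NOTE)
    return note


def mutate_every_chord(chorale, rng):
    """Method (3): change one randomly chosen upper-voice note in each chord."""
    child = [list(chord) for chord in chorale]
    for chord in child:
        voice = rng.choice(UPPER_VOICES)
        chord[voice] = mutate_note_stochastic(chord[voice], rng)
    return child


def mutate_one_chord(chorale, rng):
    """Method (5): apply the same change to a single randomly chosen chord."""
    child = [list(chord) for chord in chorale]
    chord = rng.choice(child)
    voice = rng.choice(UPPER_VOICES)
    chord[voice] = mutate_note_stochastic(chord[voice], rng)
    return child
```

Both functions copy the parent before changing it, so a parent can produce several distinct children without being altered itself.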
We also implemented various types of parent
selection methods including (1) each parent creates
one child, (2) each parent creates N children, (3)
fitness proportional selection of N children, and (4)
crossover with fitness proportional selection of
parents.
Finally, we implemented two survival methods. The
first method merged parents and children into one list,
sorted by fitness, and truncated the list down to the
parent population size. The second method was one
where parents do not survive to the next generation.
Instead, the list of children is sorted and truncated
down to the parent population size.
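The selection, crossover, and survival options listed above might look like this in outline. Function names are hypothetical. One point is an assumption worth flagging: because chorale fitness is usually negative, this sketch shifts all fitness values so the weights are positive before drawing parents; the paper does not say how negative values were handled.

```python
import random


def select_parents(population, fitnesses, k, rng):
    """Fitness-proportional selection with an assumed shift to positive weights."""
    lowest = min(fitnesses)
    weights = [f - lowest + 1.0 for f in fitnesses]
    return rng.choices(population, weights=weights, k=k)


def one_point_crossover(mother, father, rng):
    """Take chords before a random crossover chord from one parent, rest from the other."""
    point = rng.randrange(1, len(mother))  # requires at least two chords
    return [list(c) for c in mother[:point]] + [list(c) for c in father[point:]]


def survive_merged(parents, children, fitness, size):
    """First survival method: merge, sort by fitness, truncate."""
    return sorted(parents + children, key=fitness, reverse=True)[:size]


def survive_children_only(children, fitness, size):
    """Second survival method: parents are discarded; best children survive."""
    return sorted(children, key=fitness, reverse=True)[:size]
```

A child produced by `one_point_crossover` would then typically be passed through one of the mutation methods before its fitness is evaluated.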
3.5 Testing Framework
We initially began testing the algorithm with
population sizes of 1000 over 1000 generations using
12 chord chorales and far fewer fitness rules than
those set forth above. The initial implementation
incorporated a penalty of -10.0 for each incorrect note
in a chord and a penalty of up to -10.0 for improper
voice leading. Without foresight as to how the
algorithm would behave, these parameters seemed
sufficient to provide a fairly significant test, as well as
providing a reasonable time frame in which to expect
convergence. Our initial trials, however, did not
perform up to expectations.
Our first response was to allow the algorithm to run
longer, extending the number of generations first to
2000, then to 5000. Although this produced slightly
better results, the fitness of the best chorales still
remained firmly in the negative zone. On average,
even continuing through 5000 generations, runs
produced “best-so-far” values in the -700 to -500
range, still quite a distance from comprising the
“technically correct” chorale presumed to have 0
value.
We then reduced the number of chords to four, and
finally began to see positive results. While generally
hovering around 0, the results were so much better
that we were determined to find a way to enhance the
algorithm so that longer chorales might also be able to
converge to good results. When we translated the
results to actual notes, however, we realized that the
harmonization was not particularly good. Therefore,
we enhanced the fitness evaluation function to take
into account most of the items set forth above. While
a larger number of generations was required for
convergence, the four-chord chorales produced were
much better representatives. Upon increasing the
number of chords to eight, however, we again experienced
major difficulties with convergence, and realized that
a modification of the program was in order.
After much deliberation, we determined that the
chord fitness and voice leading fitness were battling
each other (i.e., a mutation that seemed helpful on the
voice leading front actually served to disrupt
individual chord fitness). As McIntyre noted, voice
leading fitness is not particularly helpful without the
basic notion of correct chords [3]. Accordingly, we
realized that chord fitness needed to be weighted
much more heavily relative to voice leading fitness.
The final result creates a much larger gap between
the two types of fitness than we initially would have
imagined, but leaves the basic algorithm intact.
While delving into the fitness function, we also
reviewed our mutation methods. Our initial runs
allowed two types of mutation: mutation of a
randomly chosen note to a new random note, and
mutation of a randomly chosen note to a note within
one whole step of the original note value. Our initial
runs suggested that the fully random mutation type
was somewhat more successful at evolving chorales,
although neither type was demonstrating particular
success at that point. Accordingly, we added the third
type: mutation of a random note by one of several
methods chosen stochastically. Each of the three
types mutated one note in each chord of every
chorale. However, this new mutation did not perform
significantly better than the first two.
Finally, we determined that the negative
convergence level might be improved by decreasing
the mutation rate. Accordingly, we added the
remaining two mutation methods set forth above:
basically using the first and third methods applied
only to one randomly chosen chord in the chorale
rather than to each chord.
The two improvements discussed above (increased
weighting of chord fitness and addition of mutation
strategies with decreased mutation rates) dramatically
changed the performance of our algorithm. We later
added fitness proportional selection and one-point
crossover with fitness proportional selection between
two chorales (the crossover point being a randomly
chosen chord). Using mutation method (4) over 1000
generations, the 8-chord chorales immediately began
to converge to fitness values greater than 0. After
several tests, we reverted to the 12-chord chorales,
which also converged quickly. Accordingly, we
decided to increase our initial input to a 25-chord
chorale, a far more challenging test for the algorithm.
When many of these chorales also converged, we
reduced the number of generations to 500 in order to
allow time to run several similar executions for
comparison of particular results based on specific
methods used. The results were very encouraging.
However, individual runs of the 25-chord chorales
were far more time-consuming than the shorter
chorales had been: one 500-generation run typically
took 25-30 minutes on a 166 MHz Pentium processor.
Although we were able to run a number of different
500-generation runs, we realized we would be unable
to complete a sufficient number to make appropriately
tested conclusions regarding the different types of
mutation method, etc. Upon review of some of the
runs, however, we realized that most mutation
methods displayed their basic characteristics within
the first two hundred generations, although their
fitness usually continued to increase. Accordingly,
we executed a new series of runs on all combinations
of Parent Selection, Child Mutation, and Survival
methods (for a total of 40 different combinations)
using the following standardized parameters: 25
chords, 500 Parents, 1000 Children, 250 generations
and a figured Bass line extracted from J.S. Bach’s O
Herzenangst (Note: the one exception to the above
parameters is Parent Selection Method #1, where each
parent produces 1 offspring, thereby producing 500
children). Figure 1 below shows the chorale for an
initial random note generation and its corresponding
fitness (Maximum possible fitness for a 25 chord
chorale is 47.5).
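The count of 40 combinations follows from crossing the four parent selection methods, five mutation methods, and two survival methods described in Section 3.4. The short sketch below enumerates them; the string labels and the helper `children_per_generation` are hypothetical names used only for illustration.

```python
from itertools import product

PARENT_SELECTION = [
    "each parent creates one child",
    "each parent creates N children",
    "fitness-proportional selection of N children",
    "crossover with fitness-proportional parents",
]
MUTATION = [
    "random note, every chord",
    "whole-step note, every chord",
    "stochastic note, every chord",
    "random note, one chord",
    "stochastic note, one chord",
]
SURVIVAL = ["merge parents and children", "children only"]

# Every (parent selection, mutation, survival) triple tested in the runs.
combinations = list(product(PARENT_SELECTION, MUTATION, SURVIVAL))
print(len(combinations))  # 4 * 5 * 2 = 40


def children_per_generation(parent_method, parents=500, children=1000):
    """Parent Selection Method #1 yields one child per parent, hence 500."""
    return parents if parent_method == PARENT_SELECTION[0] else children
```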
Fig. 1. O Herzenangst, Generation 0
(Fitness = -1413.7)
The fitness level of the chorales became zero (a
technically “correct” chorale) at approximately
generation 40. However, further generations continued
to improve the fitness, leveling off at approximately
generation 100. Later generations did not significantly
improve upon the fitness score. Figure 2 below shows
the chorale at generation 500 (the end of the run).
Fig. 2. O Herzenangst, Generation 500
(Fitness = 45.4)
4 Conclusions and Future Work
Fitness proportional selection of parents
dramatically increased the effectiveness of our
algorithm, performing far more effectively
than uniform parent distribution. Increasing the brood
size appeared to increase performance for all eligible
child creation schemes. Finally, when we added the
option of one-point crossover together with fitness
proportional selection, we found that this option
consistently outperformed all others.
Based on our testing history and the above results,
we conclude that an evolutionary algorithm can be a
useful vehicle for generation of 4-part harmony in the
style of J.S. Bach. We are especially satisfied with
our representation of chord types, which obviates any
need to constrain the algorithm to a certain key, and
furthermore allows modulation among different keys.
Using the figured Bass line as input enabled us to
create an algorithm that produced satisfactory
harmonies in a relatively short time frame. However,
we believe the algorithm can be easily modified to
allow generation of the Bass line as well as the other
three voices, given a chord progression.
Although we disagree with McIntyre’s use of a
step-wise “tiered” approach in which chorales must
“pass” one level at an 85% fitness level before the
next level’s issues are even considered, we did learn
that some type of solution is necessary to ensure that
the system progresses primarily towards chorales that
contain basically correct chords. Accordingly, we
heavily weighted our basic chord correctness criteria,
while leaving remaining criteria in place. This
strategy appears to provide an excellent mechanism
for driving towards both criteria simultaneously while
maintaining a strong preference for chorales
containing correct chords.
Based on our testing, we conclude that the mutation
rate is a significant factor in the likelihood that the
algorithm will produce successful convergence rather
than “stalling out” at a fairly low fitness value.
Specifically, continuing progress is benefited most by
having no more than a small portion of each chorale
affected by mutation at any given time.
Surprisingly, random mutation appears to work
more effectively than constrained mutation. Although
multiple mutation methods produce satisfactory
results, perhaps the random quality produces a fuller
spectrum of children to choose from, allowing better
progress for the algorithm as a whole.
Finally, the survival selection method (between
merging parents with children and pruning less fit
individuals vs. replacing parents with children) did
not seem to be particularly significant in the overall
outcome. Generally, merging parents with children
did produce a steeper fitness curve. However, when
the otherwise more successful strategies were
compared, a comparison of the survival strategy
produced almost identical curves.
The algorithm could certainly be extended to
increase functionality and to enhance ease of use.
Potential future developments include (1) adding a
more sophisticated fitness evaluation (including fine
tuning the penalties and rewards); (2) adding an
extension whereby the algorithm would develop four-part harmony using the chord progression alone,
without the benefit of the Bass line; and (3)
introducing non-harmonic tones into the chorale
generation.
In addition, it might be worthwhile to consider a
“cooperation” model, in which chorales would be
divided into “sub-populations” of their individual
chords. Although the current implementation appears
to work well, the efficiency of the algorithm might be
greatly enhanced if run on parallel processors.
Individual chords provide ready-made units for such
parallel analysis.
Acknowledgements:
We would like to thank Dr. Kenneth A. DeJong for
his assistance in putting together this research. We
would also like to acknowledge R. Duane King for his
helpful comments.
References:
[1] Bach, Johann Sebastian (1736, 1832). 371
Harmonized Chorales and 69 Chorale Melodies with
figured bass, edited by Albert Riemenschneider
(1941). G. Schirmer, New York, N.Y.
[2] Aldwell, Edward and Schachter, Carl (1978).
Harmony and Voice Leading, Volume 1. Harcourt
Brace Jovanovich, Inc., New York, N.Y.
[3] McIntyre, Ryan A. (1994). Bach in a Box: The
Evolution of Four Part Baroque Harmony Using the
Genetic Algorithm. In First IEEE Conference on
Evolutionary Computation, pp. 852-857.
[4] Wiggins, G.A., Papadopoulos, G., Phon-Amnuaisuk, S., and Tuson, A. (1998).
Evolutionary Methods for Musical Composition. In
Naranjo, M. and Deliege, I., editors, Proceedings of
the CASYS’98 Workshop on Anticipation, Cognition
and Music, Liege, Belgium. Also available as a
Research Paper from the Department of Artificial
Intelligence, University of Edinburgh.
About the Authors:
Tamara A. Maddox teaches Computer Ethics and
Computer Science at George Mason University in
Fairfax, Virginia.
John E. Otten was formerly a professional musician
and music educator. He is now employed as a
computer scientist at Veridian/MRJ Technology
Solutions in Fairfax, Virginia.