Download PDF 565k - Signata

yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Language development wikipedia , lookup

Origin of language wikipedia , lookup

Annales des sémiotiques / Annals of Semiotics
6 | 2015
Sémiotique de la musique
Seven Theses on the Biology of Music and
Bjorn Merker
Presses universitaires de Liège (PULg)
Electronic version
DOI: 10.4000/signata.1081
ISSN: 2565-7097
Printed version
Date of publication: 31 December 2015
Number of pages: 195-213
ISBN: 978-2-87562-087-3
ISSN: 2032-9806
Electronic reference
Bjorn Merker, « Seven Theses on the Biology of Music and Language », Signata [Online], 6 | 2015,
Online since 31 December 2016, connection on 29 March 2017. URL :
Signata - PULg
muSic, Song, language
Seven heses on the Biology of Music and Language
Bjorn Merker
he nature and origin of music, its relation to language, and to our biology more
generally have been active topics of inquiry and debate at least since Darwin and
Spencer pondered them in the nineteenth century (Darwin 1871; Spencer 1911).
hey are still subject to both debate and inquiry (Wallin et al. 2000; Honing et al.
2015). As a contribution to this Signata special issue on “Music and Meaning” I
present seven theses on the biology of music and language, with annotation for
each thesis.
he theses themselves are given in compressed form, and then expanded
upon in an annotation following each thesis. hese annotations do not present
comprehensive treatments of any given thesis, but rather are intended to explicate
each thesis suiciently to make its compressed form comprehensible, and to
provide the reader with key references to relevant treatments of its topic in the
Taken together the theses represent a summary of my conceptual commitments
regarding topics that I perceive to have received insuicient attention or even
to have sufered outright neglect in our attempts to understand the nature and
biology of music and language. Rather than embodying a theory of the biology of
music and language (for which see Merker 2012) they are intended as a primer of
facts and concepts that are all too easily overlooked in fashioning such a theory.
Music, Song, Language
hesis 1: Generativity
he most fundamental property shared by human language and music is their
membership in the small class of generative systems based on the so called “particulate
principle”, by which unlimited pattern diversity is generated through combinations
among a inite (typically small) set of non-blending elements (von Humboldt 1836;
Abler 1989; Merker 2002). his, rather than recursion, is the key to the unbounded
generativity of music and language.
In a too seldom noted 1989 paper, William Abler combined Wilhelm von
Humboldt’s idea regarding the combinatorics of human language with the nonblending principle of genetic recombination from Roland Fisher to deine the
“particulate principle of self-diversifying systems” (Abler 1989). Abler highlighted
three principal instantiations of such “Humboldt systems”: chemistry and
genetics employing physical particles (atoms and nucleotides, respectively), and
human language employing conventional particles (phonemes) for combinatorial
purposes. Abler mentioned music in passing in this connection but did not
elaborate. Music does, however, fully qualify as a second human Humboldt system
on account of its structural and generative characteristics, as detailed in my paper
“Music: he missing Humboldt system” (Merker 2002).
Fig. 1. Cartoon illustration of the “particulate principle of self-diversifying systems”, following
Abler (1989). (a) A “blending system” in which combining ingredients average. Here exemplified
by a drop of ink in water: the combining elements do not generate a qualitatively new entity. Other
examples are most mixtures of liquids as well as gases, as in weather systems, and patterns of heat
conduction. (b) A “particulate system”, in which the combining elements generate a qualitatively
new entity by retaining their individuality on combining. (c) A miniscule sample of the infinite
generativity of a combinatorics of as few as one or two discrete non-averaging elements.
In such systems the combining elements must be non-blending in the sense
of not producing an average when combined (Fisher 1930), i.e. they must retain
their individuality on combining (Figure 1a, b). When that is the case, each such
Seven Theses on the Biology of Music and Language
combination “creates something which is not present per se in any of the associated
constituents” (von Humboldt 1836, p. 67), making ininite pattern variety possible
(Figure 1c). his is the badly named “digital ininity” of Chomskyan linguistics,
where the “digital” term stands for the discrete, non-blending aspect of particulate
combinatorics. When Chomsky stated that “Language is, at its core, a system that
is both digital and ininite. To my knowledge, there is no other biological system
with these properties” (Chomsky 1991, p. 50), he was only displaying the limits of
his knowledge, because a total of four major such systems are in fact lourishing
in and all around us. Two of these are independent of humans (chemistry and
genetics), while two of them lie at the heart of human culture (music and language),
as itemized in Table 1.
All molecules
All life forms
All “melodies”
All sentences
Particle type
Table 1. he principal open-ended or “self-diversifying” systems.
By combining inite sets of particulate elements music and language create
composite patterns without limit. he role of phonemes, and their higher order
combinations into morphemes, in the “duality of patterning” of human language
for this purpose is well known (Hockett 1960). Music derives its particulate
elements from a radical reduction in the degrees of freedom available to vocal or
instrumental sound production by discretizing two continua, those of pitch and
duration, to yield musical notes with determinate pitch and - in all rhythmic music
- discrete durations (based upon isochronous discretization of time: Merker 2002,
2014a; Merker et al. 2009). he result is the variety of musical notes that serve as
grist for the combinatorial mill of musical patterns of boundless variety.
It is in their capacity of Humbold systems that music and language are most
fruitfully compared and contrasted. It may even be that there is a parallelism
between the two “natural” and the two “cultural” Humbold systems with respect to
their genesis. Chemistry obviously is a necessary though not suicient prerequisite
and antecedent for the genetics of life forms. Similarly music in the form of human
song may be a necessary but not suicient prerequisite and antecedent for human
language (see further heses 2, 3, and 4). We might, in other words, have been
singing apes before we became talking humans, a possibility that will receive
further notice in what follows.
Music, Song, Language
hesis 2: Genesis
Natural selection is but one - and possibly the least important one - of three
agencies likely to have driven the genesis of music and language. he second of these
is sexual selection, in accordance with Zahavi’s “handicap principle”, particularly
as elaborated in the “developmental stress hypothesis” by Hasselquist. he third is
the “learner bottleneck” in cultural transmission across generations, whose powers
suice to transform random strings into a shared grammar without either Darwinian
selection or reinforcement of outcomes (Kirby 1998; Kirby et al. 2008 and references
he powers of human language to communicate information are so formidable that
we all too readily assume that this utility itself drove the evolution of language by
natural selection in our line (for which see Pinker & Bloom 1995). If so, one would
expect that some of the other large-brained animals on earth would have evolved
some version of it as well. Yet among the 15000 species of birds and mammals
(Clements 2007; Jones & Sai 2011) - the two taxonomic groups that contain all the
world’s large-brained species - there is not a single species with language except
our own (Maynard Smith and Szathmáry 1995). his indicates that language is
something that simply does not tend to evolve in the animal kingdom.
Any theory of language origin must account for this reluctance of biological
evolution to produce language, despite its utility once in hand. here are few
ways to escape this quandary except to assume that to arrive at language a species
or lineage must come into possession of a number of independent traits or
adaptations each of which is rare considered by itself, and all of which together
are needed as prerequisites for or stepping stones to language. In such a situation,
multiplicative probabilities determine outcomes, and since each trait is rare, their
small probabilities multiplied might shrink the product to the ininitesimal level
needed to ensure the empirical end result: a single species with language among
the millions of animal species on earth. A gapless and biologically plausible path to
human language traversing such rapidly dwindling probabilities has been charted
in two previous publications of mine (Merker and Okanoya 2007; Merker 2012).
According to the scenario presented there, the expressive powers of our
early Homo ancestors did not evolve to serve the communication of information,
but as means to impress potential mates and rivals by elaborately structured but
unsemanticized vocal displays based on vocal production learning. hey evolved, in
other words, through sexual selection, in keeping with the handicap principle and
the “developmental stress hypothesis” for the evolution of complex learned song in
many songbirds and some mammals (Hasslequist et al. 1996; Nowicki et al. 2002;
Zahavi and Zahavi 1997). Species thus equipped with the specialized telencephalic
mechanism of vocal production learning (see hesis 4) become carriers of cultural
traditions of complex learned song.
Seven Theses on the Biology of Music and Language
Once a species maintains a complex, learned song tradition, the process of
intergenerational transmission itself begins to shape and transform the patterns
of the transmitted vocal “lore”. When unconstrained by a species-speciic song
template, as in vocal mimics, this process is even capable of transforming an initial
state in which songstrings of arbitrary patterning are randomly distributed across
individuals into a state in which they all share a formal grammar. Yet contrary
to what one might expect, this process does not require any form of Darwinian
selection or reinforcement of outcomes, as shown in large-scale computer
simulations of populations of learning agents subject to such intergenerational
transmission (Kirby 1998, 2000, 2001, 2002).
he key to this apparent magic is the obligatory dependence of the process
of intergenerational transmission on the sparse sampling of the vocal lore
available to the individual learner in a population of culture-carriers. he process
through which utterances are iltered through this “learner bottleneck” creates
competition among vocal patterns for access to subsequent generations. In this
competition simple statistics of intergenerational transmission lead more general
forms to outlast less general ones in the population over time, eventually assuring
convergence on a stable and eicient grammar (see Kirby 2000 for details).
hus, an aspect of the so called “poverty of the stimulus”, which Chomsky
employed in arguments for the necessity of innate syntactic structure (Chomsky
1980, see further hesis 5), turns out to be a key factor in the emergence of syntax
from syntax-free songstrings by learning alone (Smith 2003; Zuidema 2003). It
amounts, in other words, to a process of data compression, as eicient as it is slow,
requiring hundreds and thousands of generations to do its transformative work.
hat transformative work takes place unbeknownst to the learners, without active
agency on their part, as a glacially slow transgenerational efect of their limited
learning capacity, a process sustained by the display function of song, free of the
necessity to communicate meaning. Further aspects of this process are briely
touched upon in the annotations to heses 3, 4, and 5, and in Merker et al. 2015.
hesis 3: Paths and precursors
here is no natural path from the semantics of animal calls to the syntax of human
language, but there is a natural path from the syntax of learned animal song to the
semantics of human language (Merker 2012). hus, our path to language must have
traversed a stage of learned, unsemanticized, and complex song.
Animal call systems tend to be innately based, and to feature tight one-to-one
coupling between sound gesture and “meaning” (or “functional reference”, see
Marler 2004). hey are capable of signaling both the varied emotional states of the
Music, Song, Language
animal calling (typically with graded intensity relecting emotional intensity) and of
carrying “functional reference” to external social or environmental circumstances
(as in alarm calls: Seyfarth et al. 1980, or food calls: Marler et al. 1992).
Animal calls, being compact meaning-carriers, would accordingly seem ideally
placed to serve as “irst words” of an open-ended particulate signalling system that
conveys new meanings by combining pre-existing call types by syntactic rules,
yielding a proto-language. hey are, however, inherently incapable of serving in
that role.
he problem is this: in order to function as elements of a combinatorial system
for conveying meaning, vocal gestures must possess a modicum of neutrality and
independence with respect to the basic emotional-motivational forces animating
an animal’s behaviour. Consider an animal whose innate repertoire includes two
conspicuous calls, one meaning “food”, normally given when the animal inds
food, the other meaning “fear”, an alarm call normally given when the animal
is frightened, as by a predator. Why not combine these calls to generate four
compound meanings from the two, as follows: the alarm call followed by the food
call would signal “fear-food”, i.e. “fear-with-food”, meaning “poisonous food”,
while the food call followed by the alarm call would mean “food-fear”, i.e. “foodwith-fear”, and hence “prey”?
But in order to do so, the animal would have to use a call charged with
positive valence (inding food) to signal something with strong negative valence
(poison), and also to use a call inherently linked to its own state of fear (as evoked
by a predator) to signal something the animal has no reason to fear (its prey). An
inherent feature of calls - their strong unitary referential loading, on a typically
innate and emotionally charged basis - prevents the animal from doing so. hus
the very traits which make calls so economical and eicient as components of an
animal call system erect a functional barrier to the adoption of call combinations
as a means to multiply meanings. Given this intrinsic constraint, animals in the
state of nature are unlikely to develop call combinatorics much beyond the level
documented for the referential call combinations of Campbell’s monkeys (Ouattara
et al. 2009).
By contrast, a path from the syntax of animal song to the semantics of human
language is encumbered by no such diiculties. Assume a species that has a large
repertoire of learned culturally transmitted songstrings composed of sets of
discrete elements, as in some songbirds with vocal learning. Unlike such birds, this
hypothetical species produces song year round and in the course of all its various
activities (a biologically plausible setting for such rare behaviour - a rarity that is
welcome given a need for diminishing probabilities - is found in Merker 2012).
Under such circumstances the “learner bottleneck” will segregate songstring
usage by behavioral and situational context over generations, saddling songstrings
with diferential contextual associations. Some songs are sung during hunting,
others in gatherings around the camp ire, and so on, extended across the
Seven Theses on the Biology of Music and Language
full spectrum of the species’ activities. In acquiring these passive contextual
associations songstrings have, in other words, undergone unplanned and implicit
he fact that under these circumstances songstrings will contain various substrings that occur scattered haphazardly across strings associated with diferent
contexts provides a mechanism by which the learner bottleneck will associate each
such sub-strings with features shared by those contexts in which it occurs. his
would result in a glacial process of diferential semantic allocation of sub-strings
to more general contextual associations over generations, in accordance with
principles irst enunciated independently by Alison Wray (Wray 1998) and Simon
Kirby (1998).
his process, which requires no other operations than those of generalization
and segmentation (Hurford 2000; Okanoya and Merker 2007), would produce
ever more abstract contextual associations for ever smaller sub-string units, albeit
still not in order to communicate information, but purely for display purposes.
How this system of highly diferentiated contextual songstring associations, which
so far are strictly and entirely tacit or implicit, ends up being deployed as a system
for communicating information - a language - is outlined in Merker 2012.
In this process there are no “irst words” but only long and complex “irst
(holistic) sentences” (the semanticized songstrings) carrying speciic (and initially
rather concrete) situational reference on a holistic, unanalyzed basis. Meaningful
words appear only at asymptote of the abstraction process by which sub-strings
come to be associated with shared features of disparate contexts. hat is, words
would emerge only in the inal phases of glossogenesis, at which point the logic
of grammaticalization processes would also become historically active (Bybee
2006; DeLancey 1993; Lehmann 1995; Merker and Okanoya 2007). Incidentally,
this process helps explain the historical pattern according to which languages
at earlier historical stages oten exhibit a more complex grammar than they do
today, a result altogether unexpected if language is assumed to originate by gradual
accretion from simple to complex.
A special problem in the conversion from a system of song for display purposes
to a system of language for conveying information is the fact that the attempt to
use semanticized songstrings in “displacement mode”, i.e. to refer to circumstances
other than those present, would undermine the contextual associations by which
songstrings acquire their semanticization. hat is the one point at which natural
selection will enter glossogenesis, according to the scenario presented in Merker
Music, Song, Language
hesis 4: Vocal learning
Humans are the only primate with a capacity for vocal production learning (Janik
and Slater 1997; Egnor and Hauser 2004). hat capacity is an absolute requirement
for both human song and spoken language in that it allows us to shape vocal output
to match auditory models. How and why humans alone added vocal learning to
the primate brain holds the biological key to the riddle of human music as well as
language (Merker 2009, 2012, 2014c; Nottebohm 1976).
An unabridgeable requirement for the existence of cultural traditions of learned
songstrings - a key stepping stone to language on the present account - is the
mechanism of vocal production learning. It gives us the capacity to reproduce,
through the voice, patterns of sound irst received by ear (i.e. not already in the
innate repertoire of calls). hat capacity is absent from all non-human primates,
our closest living relatives among the apes included (Janik and Slater 1997; see also
Egnor and Hauser 2004).
It is by means of this “watershed adaptation”, added to the primate brain at
or ater our ancestral split from our common ancestor with chimpanzees, that we
acquire every song we know how to sing and every word and phrase we know how
to pronounce. Human song and speech did not evolve, in other words, by way of
reinements of the “primate communication system”, as is oten assumed (see, for
example, Ackermann et al. 2014, and commentary by Merker 2014c). Rather, they
are products of a mechanism evolved de novo in our line, and distantly shared with
numerous species of songlearning songbirds and a few mammals.
Without primate homologs, convergently evolved vocal learning in birds and
mammals is the sole source of comparative insight into the biological signiicance
and origin of this trait of ours. he “developmental stress hypothesis” (a
sophisticated descendant of Zahavi’s handicap principle) provides the so far most
comprehensive biological framework for understanding the function of elaborate
vocal production in birds with vocal learning (Hasselquist et al 1996; Nowicki et al.
2002). It also helps answer outstanding questions regarding our own path to vocal
learning (Merker 2012).
Evolved through sexual selection for elaborate vocal display, this capacity
for vocal production learning provides the key enabling mechanism that brings
the data compression powers of the “learner bottleneck” to bear on culturally
transmitted vocal lore, as in human song and language (see annotation to hesis 2).
In vocal learners vocal output is shaped by learning, using feedback of their
own voice to match vocal production to heard models (Konishi 2004). his makes
the shape of the vocal tract a negligible factor in vocal production, as demonstrated
by the ability of bird mimics to duplicate the human voice with their utterly
diferent vocal anatomy (Nottebohm 1976).
Seven Theses on the Biology of Music and Language
Vocal learning requires a lengthy period of unreinforced practice to achieve
duplication of heard patterns. his necessitates a motivational mechanism to
sustain practice till a match is achieved. his “conformal motive” (Merker 2005),
once in place, provides a pivot for generalization to other imitative abilities,
and may account for the unique imitative, expressive, and ritual propensities of
humans, including their ready acceptance of arbitrary words as the obligatory
names of things. It may account, in other words, for the length to which humans,
alone among primates, have gone in developing a capacity for “expressive mimesis”
(for which see Donald 1991).
hesis 5: Biological endowment
In a big-brained primate with a capacity for vocal production learning (see hesis 4),
a biological endowment for human language is unlikely to take the form of the
innate, universal grammar postulated by Noam Chomsky (Chomsky 1968, 1975,
p. 38). Given that the “learner bottleneck” in transgenerational learning traditions
has the power to ensure language learnability (Christiansen and Chater 2008; Kirby
et al. 2008 and references therein; Merker 2009; Zuidema 2003), the mechanism of
vocal production learning that sustains traditions of learned vocal lore constitutes
our biological endowment for human language and song (Merker 2009).
Noam Chomsky confounded the question of the nature of a human “endowment
for language” with the issue of the learnability of language by means of the formal
behaviourist learning theory of his time (Merker 2009). He took its unlearnability
by those means to imply that we are possessed of a so called universal grammar on
an innate basis (Chomsky 1975, in Piatelli-Palmarini 1980, p. 111).
One need not follow Chomsky in this irst false step into the biology of
language in order to recognize the existence of a human “endowment for language”.
hat endowment is supplied by our telencephalic mechanism dedicated to vocal
production learning discussed in hesis 4. Once this mechanism was in place for
purposes of handicap-driven song displays, the learner bottleneck brought its
data compression powers to bear on the pattern content of the intergenerationally
transmitted vocal lore, with consequences briely outlined in heses 2, 3, and 4.
Supported by a conformal motive (Merker 2005) and de novo evolution of
a direct projection from primary motor cortex to the respiratory and phonatory
motor nuclei of the lower brain stem (Brown et al. 2008; Okanoya & Merker 2007),
vocal learning turns the cerebral territories centred on Wernicke’s and Broca’s
areas from their non-language uses in other primates to the service of human
language by recruiting them to the generative production and intergenerational
transmission of culturally learned vocal lore (for details, see Merker 2009, 2014c;
Okanoya & Merker 2007).
Music, Song, Language
To this decidated cerebral mechanism for vocal learning we owe not only our
developmental trajectory for language learning, infant babbling included (Doupé
& Kuhl 1999), but our propensity for imitation and ritual culture more generally
(Merker 2005), along with a robust selection pressure for encephalization (Merker
& Okanoya 2007; Merker 2012). hrough vocal learning, and unconstrained by
innate so called universal grammar, the historical ilter of cultural transmission
- which passes only the possible (Zuidema 2003) - continually adapts the
actual forms of languages to multiple interacting constraints such as use, utility,
learnability and neural resources (Christiansen & Chater 2008), as well as to
cultural norms (Everett 2005).
An endowment for human language in the form of vocal production learning,
as opposed to so called universal grammar, explains the ineluctably historical
nature of human languages, for which grammaticalization theory supplies some
of the general laws (Bybee 2006; Lehmann 1995). he principal consequence of
this inherently historical nature of human language is the empirically well attested
rampant structural diversity exhibited by the world’s languages (Evans & Levinson
hesis 6: Gesture
he plausibility of a gestural origin of language (Arbib & Rizzolatti 1996; Corballis
1999) can be estimated by the likelihood that a species already in possession of one
functioning form of language (manual-visual) would switch to another (vocalauditory) at the cost of having to develop the entire machinery of vocal production
learning de novo in order to do so, and this for beneits such as “freeing the hands”
from their encumbrance by gestural language, an encumbrance no greater than that
it did not prevent that species from developing gestural language in the irst place.
An absolute prerequisite for language in the vocal-auditory mode is vocal
production learning (see hesis 4). he absence of that capacity in our common
ancestor with chimpanzees might seem to make a gestural origin of language an
attractive possibility, since it would draw on the far more widely distributed learning
capacity for manual dexterity, which our closest primate relatives do possess. he
supplemental role of gesture in speech, and the ease with which humans acquire
or develop sign language when the auditory-vocal mode is unavailable, has been
taken to support a gestural origin of language in humans (Arbib & Rizzolatti 1996;
Corballis 1999).
hat, however, does not remove the vocal learning diiculty; it only postpones
it, because the end state of language evolution is vocal-auditory speech. How would
a species already in possession of gestural language implemented as a manual-visual
Seven Theses on the Biology of Music and Language
system of conventional gestures ever switch to vocal language in the absence of a
capacity for vocal learning, which is an absolute requirement for human speech?
Obviously, if chimpanzees were vocal production learners and easily acquired
a spoken vocabulary, a gestural origin for language would lose all plausibility. But
they are not, so our common ancestor with chimpanzees can be assumed to have
lacked vocal learning. hat lack, which favours a gestural origin, must conveniently
be forgotten in order to explain how the original gestural mode of language
(manual-visual) was replaced by a radically diferent mode (vocal-auditory) at
reasonable cost.
Assuming an absence of the capacity for vocal learning at that point forces us
to assume that the advantages of a switch suiced to pay the evolutionary cost of
developing the capacity for vocal learning de novo. he encumbrance of the hands
by gestural language is no greater than that it allowed the evolution of gestural
language in the irst place, on the gestural theory. Other advantages of speech,
such as communicating propositionally when the audience is out of sight, are
also marginal, and come with disadvantages as well: vocal language makes both
predators and prey privy to the communicative act, even when out of sight.
How encumbered are sign language users, really, compared to speakers,
except by the circumstance that the population at large speaks rather than signs?
Is whatever they are lacking by having their hands “encumbered” by language
of suicient magnitude to serve as a selection pressure for the evolution of a
sophisticated cerebral machinery for vocal production learning, a machinery
which up to that point has not made its appearance in any other primate?
he obvious alternative is to assume that human language originated in the
auditory-vocal mode, because we had already evolved vocal learning for nonspeech purposes before employing it for language. In us, as in most species that
possess it, vocal learning is likely to have evolved for learned song, as outlined in
heses 2, 3, 4, and 5. Under such circumstances, a gestural origin of language loses
its appeal. Given that the end-state is vocal-auditory, a species already in possession
of vocal production learning has no reason to make a detour into gestural language
along its path to speech.
his is all the more so since manual gestures do in fact have a natural, if
subsidiary, role to play in the vocal learning path to speech. hey are needed as
auxiliaries in the transition to the use of semanticized songstrings in “displacement
mode” (Hockett 1960) on the vocal path to human language, as markers for
communicative intent in that mode, as detailed in a previous publication of mine
(Merker 2012, p. 244). hat subsidiary role is eminently compatible with the way
present day speakers supplement spoken language with manual gestures.
Finally, and parenthetically, the acquisition of conventional gestures is
inherently imitative. Our unusual imitative capacities compared to our primate
relatives likely arose by generalization from our capacity for vocal production
learning, as outlined in the annotation to hesis 4, following Merker (2005). No
Music, Song, Language
other species has, ater all, developed a system of conventional manual gestures for
language-like communication, even though the manual learning capacity required
for it is available, as shown by apes taught sign language in captivity (e.g. Miles
1990). he lack of vocal production learning from which to generalize high-idelity
duplicative behaviour may explain this anomaly, and reinforces the plausibility of
a vocal rather than gestural origin of human language.
hesis 7: Meaning
Language alone “means”, which is to say that it conveys articulated semantic content
through a system of multi-level conventional coding rather than by having its
patterns in any way resemble their referents (Staal 1989). here is no corresponding
device in music, whose patterns do not serve as structural vehicles for conveying
another domain of content than themselves, except when they are made to do so (in
programmatic music) by means of mimicry or resemblance (Hanslick 1854).
Language in the form of speech embodies a bona ide code for information in that
it avails itself of a multilevel combinatorics of phonological and lexical elements
to perform arbitrary (in the sense of conventional) mappings between the form
of utterances and their meaning. An elementary example is the combinatorics of
phoneme sequence that assigns distinctive meanings to the tens of thousands of
words that make up our lexicon. hus bord, Tisch, and table are such arbitrary
phonological combinations for the self-same type of physical object, in Swedish,
German and English, respectively. A higher level example is the grammatical
device of sequential ordering of major sentence constituents such as subject, verb
and object (for which essentially every possible variant ordering is attested across
the world’s languages [Comrie 1983]).
his language code of meaning is so detailed and comprehensive that virtually
every diference between strings of phonemes makes a diference in the information
conveyed by those strings. By means of this elaborate, multilevel conventional
coding arrangement of phonemes and morphemes, sequential patterns of vocally
produced sounds become a vehicle for making statements about things that bear
not the slightest resemblance to those sound sequences themselves (see the table
example above). hus language allows us to communicate about objects, events,
matters of fact, states of the world, ideas, intentions, beliefs and desires, without
Nothing of the kind takes place in human song and music (unless they feature
lyrics), however structurally complex they may be. In song (and music) it is the
vocal (instrumental) patterns themselves that are the information conveyed, not
another domain of content which rides upon those patterns by means of a syntactic
Seven Theses on the Biology of Music and Language
system of conventional coding. Attempts have been made to assimilate music to
language by treating music as a “language of emotions” (Spencer 1911). However,
the weak sense in which music is capable of evoking or portraying emotions
(Konečni 2008, 2015; Konečni et al. 2008; Scherer 2003) or circumstances in the
world (storms, battles: Hanslick 1854) is achieved by using the pattern richness of
music to dynamically mimic, resemble, or caricature the things to be evoked (say,
the mimicking of emotional prosody as a means to portray emotion in Western
music [Juslin 2001]).
So even when music attempts to portray - which is far from always the case
- it does not mean, it mimics, a distinction easily accommodated within the
conceptual apparatus of semiotics. Such mimicking is exactly what language does
not do, except in the trivial case of onomatopoeia. here is, however, an aspect of
speech that does so. We have a rich repertoire of non-verbal vocal expressiveness,
ultimately based upon our innate calls (see hesis 3). It coexists with the meaningcode of language in speech. hat non-verbal expressiveness enters speech by
means of its prosody, and is used among other things to give speech its emotional
colouring. To confuse that aspect of language with its meaning-carrying coding
arrangements would of course be fatal. he two dwell in diferent qualitative and
formal universes and must not be confused.
hus, the fact that music can be made to suggest or mimic emotion tells us as
little about the nature of its generative pattern richness as the facts of speech prosody
tell us about the meaning-carrying generative pattern richness of language. he
secret of musical pattern richness (see hesis 1) is best approached by abandoning
the notion that it contains a code for a domain of information other than those
patterns themselves.
hose patterns expose our perception and imagination to their structural
pattern content: their mix of variation and repetition, their symmetries and
asymmetries, the evolution of their temporal trajectories with their tensions and
resolutions, their intricacy, complexity, or simplicity, the balance of such factors
among themselves as the music unfolds in time, and more. It is not to our emotions
that this content is addressed in the irst place, but to our imagination, as Arthur
Schopenhauer, followed by Eduard Hanslick, insisted (Hanslick 1854; Merker
2007; Shopenhauer 1844, vol. 2, p. 447f).
In so doing music will exhibit greater or lesser elegance, will achieve greater
or lesser degrees of well-formedness (see, e.g., Lerdahl & Jackendof 1983) and
- depending on our background of acquired familiarity forged by our listening
history - may compel our attention more or less fully, or please us more or less
well. In all of these respects the patterns of music deine themselves as esthetic
objects that address the informational dimension of our cognitive capacities
directly, in the manner of auditory analogues of visually presented arabesques
(Hanslick 1854).
Music, Song, Language
his by no means relegates music to an abstract domain of passively
contemplative connoisseurship. Some of its patterns - say of rhythmic music
meant to accompany dancing - access a presumably species-speciic predisposition
for bodily entrainment to isochronous auditory patterns, and help optimize such
entrainment (Merker et al. 2009; Merker 2014a). Its central role in youth and
popular culture suices to dispel any overly formalist notion of its nature. It is also
at the farther reaches of this “informational dimension of our cognitive capacities”
that music exerts its own genuinely emotional efects.
Just as an elegantly crated arabesque difers in its impact on our imagination
and esthetic sensibilities from that of a crude and clumsy doodle, so musical
patterns have analogous diferential impact on our sensibilities by their pattern
characteristics alone. Where an arabesque must “look good” to capture our
imagination, musical patterns must “sound good”, the more so the greater their
impact. So much so that those that sound exceptionally good tend to impress, even
to the point of inspiring awe and a sense of the sublime, emotions so powerful as to
drive tears to our eyes and chills up our spines (Gabrielsson 2011; Konečni 2005,
2011; Merker 2014b).
hat is where the notion of an emotional impact of music has its true point
of support: not in any analogy to the meaning encoded in language, nor in any
kinship with the biology of basic emotions, but in the basic biology of the Zahavian
handicap principle (Zahavi 1975). Where handicaps take the form of esthetic
displays - from the peacock’s tail to the vocal artistry of pied butcherbirds appraisal mechanisms for judging their quality must be in place, typically in the
medium of a cognitive dimension spanning from bordom, via interest/curiosity, to
being impressed, with awe and a sense of sublimity at its high end (Konečni 2005,
2011; Merker 2012, 2014b).
he elaboration of Zahavi’s handicap principle in the developmental
stress hypothesis to account for complexity and size of birdsong repertoires by
Hasselquist and colleagues (Hasselquist et al. 1996; Nowicki et al. 2002; see heses
2 and 4) provides an eminently plausible interpretive framework for the nature and
function of human song and music as well. It dispels the appearance of frivolity
encumbering our expenditure of efort and resources on acquiring and producing
the pattern richness of human song and music. Our musical developmental
trajectory does not equip us with a language-like device for conveying referential
meaning (emotional or otherwise). Rather, by exact analogy to the case of learned
birdsong, it gives us a means to display command and mastery of a trove of
culturally patterned and transmitted lore. Such command and mastery serves not
only as a badge of competence in the culture, but as a certiicate of the phenotypic
traits needed to achieve that competence (see further Merker 2012, 2014b).
Seven Theses on the Biology of Music and Language
Bibliographical References
Abler, W.L. (1989), “On the particulate principle of self-diversifying systems”, Journal of
Social and Biological Structures, 1, pp. 1-13.
Ackermann, H., Hage, S.R. & Ziegler, W. (2014), “Brain mechanisms of acoustic
communication in humans and nonhuman primates: An evolutionary perspective”,
Behavioral and Brain Sciences, 37, pp. 529-546.
Arbib, M.A. & Rizzolatti, G. (1996), “Neural expectations: A possible evolutionary path
from manual skills to language”, Communication & Cognition, 29, pp. 393-424.
Brown, S., Ngan, E. & Liotti, M. (2008), “A larynx area in the human motor cortex”,
Cerebral Cortex, 18, pp. 837-845.
Bybee, J. (2006), “Language change and universals”,. in R. Mairal & J. Gil (eds.), Linguistic
Universals, Cambridge: Cambridge University Press, pp. 179-194.
Chomsky, N. (1968), “Linguistics and Philosophy”, in Hook (ed.), Language and
Philosophy, New York: New York University Press, pp. 51-94.
— (1975), Relections on Language, New York: Pantheon Books.
— (1980), Rules and Representations, New York: Columbia University Press.
— (1991), “Linguistics and cognitive Science: Problems and mysteries”, in Kasher (ed.),
he Chomskyan Turn, Oxford: Blackwell, pp. 26-53.
Christiansen, M.H. & Chater, N. (2008), “Language as shaped by the brain”, Behavioral
and Brain Sciences, 31, pp. 489-558.
Clements, J.F. (2007), he Clements Checklist of the Birds of the World, 6th Ed., Ithaca, NY:
Cornell University Press.
Comrie, B. (1983), Language Universals and Linguistic Typology, Oxford: Basil Blackwell.
Corballis, M.C. (1999), “he gestural origins of language”, American Scientist, 87,
pp. 138-145.
Darwin, Ch. (1871), he Descent of Man and Selection in Relation to Sex, New York:
D. Appleton & Company.
DeLancey, S. (1993), “Grammaticalization and Linguistic heory”, in Gomez de
Garcia & Rood (eds.), Proceedings of the 1993 Mid-America Linguistics Conference,
Department of Linguistics, University of Colorado, pp. 1-22.
Donald, M. (1991), Origins of the Modern Mind, Cambridge, MA: Harvard University
Doupé, A.J. & Kuhl, P.K. (1999), “Birdsong and human speech: Common themes and
mechanisms”, Annual Review of Neuroscience, 22, pp. 567-631.
Egnor, S.E.R. & Hauser, M.D. (2004), “A paradox in the evolution of primate vocal
learning”, Trends in Neurosciences, 27, pp. 649-654.
Music, Song, Language
Evans, N. & Levinson, S.C. (2009), “he myth of language universals: Language diversity
and its importance for cognitive science”, Behavioral and Brain Sciences, 32, pp. 429492.
Everett, D.L. (2005), “Cultural constraints on grammar and cognition in Pirahã. Another
look at the design features of human language”, Current Anthropology, 46, pp. 621-646.
Fisher, R.A. (1930), he Genetical heory of Natural Selection, Oxford: he Clarendon
Gabrielsson, A. (2011), Strong Experiences with Music, Oxford: Oxford University Press.
Hanslick, E. (1854), Vom Musikalisch-Schönen. Ein Beitrag zur Revision der Aesthetik der
Tonkunst, Leipzig: Weigel.
Hasselquist, D., Bensch, S. & von Schantz, T. (1996), “Correlation between song
repertoire, extra-pair paternity and ofspring survival in the great reed warbler”,
Nature, 381, pp. 229-232.
Hockett, C.F. (1960), “Logical considerations in the study of animal communication”, in
Lanyon & Tavolga (eds.), Animal Sounds and Communication, American Institute
of Biological Sciences, Washington, D.C., pp. 392-430.
Honing, H., ten Cate, C., Peretz, I. & Trehub, S.E. (2015), “Without it no music:
cognition, biology and evolution of musicality”, Philosophical Transactions of the
Royal Society B, 370, 1664, pp. 1-8.
Hurford, J.R. (2000), “he emergence of syntax [Editorial introduction to the section
on syntax]”, in Knight, Studdert-Kennedy & Hurford (eds.), he Evolutionary
Emergence of Language: Social Function and the Origins of Linguistic Form, Cambridge,
Cambridge University Press, pp. 219-230.
Janik, V.M. & Slater, P.J.B. (1997), “Vocal learning in mammals”, Advances in the Study
of Behaviour, 26, pp. 59-99.
Jones, K.E. & Safi, K. (2011), “Ecology and evolution of mammalian biodiversity”,
Philosophical Transactions of the Royal Society B, 366, 1577, pp. 2451-2461.
Juslin, P.N. (2001), “Communicating emotion in music performance: A review and a
theoretical framework”, in Juslin & Sloboda (eds.), Music and Emotion: heory and
Research, New York: Oxford University Press, pp. 309-337.
Kirby, S. (1998), “Language evolution without natural selection: From vocabulary to
syntax in a population of learners. Technical Report”, Edinburgh Occasional Papers
in Linguistics, 98 (1).
— (2000), “Syntax without Natural Selection: How compositionality emerges from
vocabulary in a population of learners”, in Knight, Studdert-Kennedy & Hurford
(eds.), he Evolutionary Emergence of Language: Social Function and the Origins of
Linguistic Form, Cambridge: Cambridge University Press, pp. 303-323.
— (2001), “Spontaneous evolution of linguistic structure: an iterated learning model of
the emergence of regularity and irregularity”, IEEE Transactions on Evolutionary
Computation, 5, pp. 102-110.
Seven Theses on the Biology of Music and Language
— (2002), “Learning, bottlenecks and the evolution of recursive syntax”, in Briscoe
(ed.), Linguistic Evolution through Language Acquisition: Formal and Computational
Models, Cambridge: Cambridge University Press, pp. 173-204.
Kirby, S, Cornish, H. & Smith, K. (2008), “Cumulative cultural evolution in the
laboratory: An experimental approach to the origins of structure in human language”,
Proceedings of the National Academy of Sciences USA, 105, pp. 10681-10686.
Konečni, V.J. (2005), “he aesthetic trinity: Awe, being moved, thrills”, Bulletin of
Psychology and the Arts, 5, pp. 27-44.
— (2008), “Does music induce emotion? A theoretical and methodological analysis”,
Psychology of Aesthetics, Creativity, and the Arts, 2, pp. 115-129.
— (2011), “Aesthetic trinity theory and the sublime”, Philosophy Today, 55, pp. 64-73.
— (2015), “Book review of T. Cochrane, B. Fantini and K. Scherer (eds.), he Emotional
Power of Music: Multidisciplinary Perspectives on Musical Arousal, Expression, and
Social Control. Oxford University Press, 2013”, he Journal of Aesthetics and Art
Criticism, 73, pp. 214-218.
Konečni, V., Brown, A & Wanic, R.A. (2008), “Comparative efects of music and recalled
life-events on emotional state”, Psychology of Music, 36, pp. 289-308.
Konishi, M. (2004), “he role of auditory feedback in birdsong”, in Ziegler & Marler
(eds.), he Behavioral Neurobiology of Birdsong. Annals of the New York Academy of
Sciences, 1016, pp. 463-475.
Lehmann, C. (1995), houghts on Grammaticalization, Second, revised edition, München:
LINCOM Europa.
Lerdahl, F. & Jackendoff, R. (1983), A Generative heory of Tonal Music, Cambridge,
Mass.: MIT Press.
Marler, P. (2004), “Animal calls: heir potential for neurobiology”, in Zeigler & Marler
(eds.), Behavioral Neurobiology of Birdsong. Annals of the New York Academy of
Sciences, 1016, pp. 31-44.
Marler, P., Evans, C.S., & Hauser, M.D. (1992), “Animal signals. Reference, motivation or
both?”, in Papousek, Jürgens & Papousek (eds.), Nonverbal Vocal Communication:
Comparative and Developmental Approaches, Cambridge, UL: Cambridge University
Press, pp. 66-86.
Maynard Smith, J. & Szathmáry, E. (1995), he Major Transitions in Evolution, Oxford:
Oxford University Press.
Merker, B. (2002), “Music: he missing Humboldt system”, Musicae Scientiae, 6, pp. 3-21.
— (2005), “he conformal motive in birdsong, music and language: An Introduction”, in
Avanzini, Lopez, Koelsch & Majno (eds.), he Neurosciences and Music II: From
Perception to Performance, Annals of the New York Academy of Sciences, 1060,
pp. 17-28.
Music, Song, Language
— (2007), “Music at the limits of the mind”, in Kugiumutzakis (ed.), Sympantiki Armonia,
Musike kai Epistimi. Ston Miki heodoraki (Universal Harmony, Music and Science. In
honour of Mikis heodorakis), Herakleion: Crete University Press.
— (2009), “Returning language to culture by way of biology”, Behavioral and Brain
Sciences, 32, pp. 460-461.
— (2012), “he vocal learning constellation: Imitation, ritual culture, encephalization”, in
Bannan (ed.), Music, Language, and Human Evolution, Oxford: Oxford University
— (2014a), “Groove or swing as distributed rhythmic consonance: Introducing the groove
matrix”, Frontiers in Human Neuroscience, 8, article 454, 1-4.
— (2014b), “Warum wir musikalisch sind – Antworten aus der Evolutionsbiologie”, in
Gruhn & Seither-Preisler (eds.), Der musikalische Mensch. Evolution, Biologie und
Pädagogik musikalischer Begabung, Hildesheim: Georg Olms Verlag, pp. 255-280. [In
German; an English version will eventually be archived in CogPrints under the title
“When extravaganze impresses”].
— (2014c), “Speech, vocal production learning, and the comparative method”, Behavioral
and Brain Sciences, 37, pp. 566-567.
Merker, B., Madison, G. & Eckerdal, P. (2009), “On the role and origin of isochrony in
human rhythmic entrainment”, Cortex, 45, pp. 4-17.
Merker, B. & Okanoya, K. (2007), “he natural history of human language: Bridging
the gaps without magic”, in Lyon, Nehaniv & Cangelosi (eds.), Emergence of
Communication and Language, London: Springer-Verlag, pp. 403-420.
Merker, B., Morley, I. & Zuidema, W. (2015), “Five fundamental constraints on theories
of the origins of music”, Philosophical Transactions of the Royal Society B, 370,
Miles, H.L. (1990), “he cognitive foundations for reference in a signing orang-utan”,
in Parker & Gibson (eds.), “Language” and Intelligence in Monkeys and Apes:
Comparative Developmental Perspectives, New York: Cambridge University Press,
pp. 511-539.
Nottebohm, F. (1976), “Discussion paper. Vocal tract and brain: A search for evolutionary
bottlenecks”, in Harnad, Steklis & Lancaster (eds.), Origins and Evolution of
Language and Speech, New York: Annals of the New York Academy of Sciences, 280,
pp. 643-649.
Nowicki, S., Searcy, W.A. & Peters, S. (2002), “Brain development, song learning
and mate choice in birds: a review and experimental test of the ‘nutritional stress
hypothesis’”, Journal of Comparative Physiology A: Sensory, Neural, and Behavioral
Physiology, 188, pp. 1003-1014.
Okanoya, K. & Merker, B. (2007), “Neural substrates for string-context mutual
segmentation: A path to human language”, in Lyon, Nehaniv & Cangelosi (eds.),
Emergence of Communication and Language, London: Springer-Verlag, pp. 421-434.
Seven Theses on the Biology of Music and Language
Ouattara K., Lemasson A., Zuberbühler K. (2009), Campbell’s Monkeys Concatenate
Vocalizations into Context-speciic Call Sequences, proceedings of the National
Academy of Sciences (USA) 106, 22026-22031. Doi:10.1073pnas.0908118106.
Piattelli-Palmarini, M. (ed., 1980), Language and Learning: he Debate between Jean
Piaget and Noam Chomsky, Harvard: Harvard University Press.
Pinker, S. & Bloom, P. (1990), “Natural language and natural selection”, Behavioral and
Brain Sciences, 13, pp. 707-784.
Scherer, K.R. (2003), “Why music does not produce basic emotions”, in Breslin (ed.),
Proceedings of the Stockholm Music Acoustic Conference, 1, pp. 25-28, online: [http://].
Schopenhauer, A. (1958), he World as Will and Representation [1819], 2nd ed., E.F.J.
Payne transl., 2 vols, New York: Dover.
Seyfarth, R.M., Cheney, D.L., & Marler, P. (1980), “Vervet monkey alarm calls: Semantic
communication in a free-ranging primate”, Animal Behaviour, 28, 1070-1094.
Smith, K. (2003), “Learning biases and language evolution”, in Kirby (ed.) Language
Evolution and Computation. Proceedings of the Workshop on Language Evolution and
Computation, 15th European Summer School on Logic, Language and Information,
Vienna, pp. 22-31.
Spencer, H. (1911), “On the Origin and Function of Music”, in Essays on Education and
Kindred Subjects, London: J.M. Dent & Sons, pp. 312-330.
Staal, F. (1989), Rules without Meaning. Ritual, Mantras and the Human Sciences, New
York: Peter Lang.
Von Humboldt, W. (1836), Über die Verschiedenheit des Menschlichen Sprachbaues und
ihren Einluss auf die geistige Entwicklung des Menschengeschlechts, Berlin, Germany:
Royal Academy of Sciences; English transl.: Buck & Raven transls. (1971), Linguistic
Variability and Intellectual Development, Baltimore, MD: University of Miami Press.
Wallin, N.L., Merker, B. & Brown, S. (eds., 2000), he Origins of Music, Cambridge,
Mass.: MIT Press.
Wray, A. (1998), “Protolanguage as a holistic system for social interaction”, Language and
Communication, 18, pp. 47-67.
Zahavi, A. (1975), “Mate selection – A selection for a handicap”, Journal of heoretical
Biology, 53, pp. 205-214.
Zahavi, A. & Zahavi, A. (1997), he Handicap Principle: A Missing Piece of Darwin’s
Puzzle, Oxford University Press, Oxford.
Zuidema, W. (2003), “How the poverty of the stimulus solves the poverty of the stimulus”,
in Becker, Thrun & Obermayer (eds.), Advances in Neural Information Processing
Systems 15 (Proceedings of NIPS’02), Cambridge: MIT Press, pp. 51-58.