* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Signata Annales des sémiotiques / Annals of Semiotics 6 | 2015 Sémiotique de la musique Seven Theses on the Biology of Music and Language Bjorn Merker Publisher Presses universitaires de Liège (PULg) Electronic version URL: http://signata.revues.org/1081 DOI: 10.4000/signata.1081 ISSN: 2565-7097 Printed version Date of publication: 31 December 2015 Number of pages: 195-213 ISBN: 978-2-87562-087-3 ISSN: 2032-9806 Electronic reference Bjorn Merker, « Seven Theses on the Biology of Music and Language », Signata [Online], 6 | 2015, Online since 31 December 2016, connection on 29 March 2017. URL : http://signata.revues.org/1081 Signata - PULg muSic, Song, language Seven heses on the Biology of Music and Language Bjorn Merker Introduction he nature and origin of music, its relation to language, and to our biology more generally have been active topics of inquiry and debate at least since Darwin and Spencer pondered them in the nineteenth century (Darwin 1871; Spencer 1911). hey are still subject to both debate and inquiry (Wallin et al. 2000; Honing et al. 2015). As a contribution to this Signata special issue on “Music and Meaning” I present seven theses on the biology of music and language, with annotation for each thesis. he theses themselves are given in compressed form, and then expanded upon in an annotation following each thesis. hese annotations do not present comprehensive treatments of any given thesis, but rather are intended to explicate each thesis suiciently to make its compressed form comprehensible, and to provide the reader with key references to relevant treatments of its topic in the literature. Taken together the theses represent a summary of my conceptual commitments regarding topics that I perceive to have received insuicient attention or even to have sufered outright neglect in our attempts to understand the nature and biology of music and language. Rather than embodying a theory of the biology of music and language (for which see Merker 2012) they are intended as a primer of facts and concepts that are all too easily overlooked in fashioning such a theory. 196 Music, Song, Language hesis 1: Generativity he most fundamental property shared by human language and music is their membership in the small class of generative systems based on the so called “particulate principle”, by which unlimited pattern diversity is generated through combinations among a inite (typically small) set of non-blending elements (von Humboldt 1836; Abler 1989; Merker 2002). his, rather than recursion, is the key to the unbounded generativity of music and language. Annotation In a too seldom noted 1989 paper, William Abler combined Wilhelm von Humboldt’s idea regarding the combinatorics of human language with the nonblending principle of genetic recombination from Roland Fisher to deine the “particulate principle of self-diversifying systems” (Abler 1989). Abler highlighted three principal instantiations of such “Humboldt systems”: chemistry and genetics employing physical particles (atoms and nucleotides, respectively), and human language employing conventional particles (phonemes) for combinatorial purposes. Abler mentioned music in passing in this connection but did not elaborate. Music does, however, fully qualify as a second human Humboldt system on account of its structural and generative characteristics, as detailed in my paper “Music: he missing Humboldt system” (Merker 2002). Fig. 1. Cartoon illustration of the “particulate principle of self-diversifying systems”, following Abler (1989). (a) A “blending system” in which combining ingredients average. Here exemplified by a drop of ink in water: the combining elements do not generate a qualitatively new entity. Other examples are most mixtures of liquids as well as gases, as in weather systems, and patterns of heat conduction. (b) A “particulate system”, in which the combining elements generate a qualitatively new entity by retaining their individuality on combining. (c) A miniscule sample of the infinite generativity of a combinatorics of as few as one or two discrete non-averaging elements. In such systems the combining elements must be non-blending in the sense of not producing an average when combined (Fisher 1930), i.e. they must retain their individuality on combining (Figure 1a, b). When that is the case, each such Seven Theses on the Biology of Music and Language 197 combination “creates something which is not present per se in any of the associated constituents” (von Humboldt 1836, p. 67), making ininite pattern variety possible (Figure 1c). his is the badly named “digital ininity” of Chomskyan linguistics, where the “digital” term stands for the discrete, non-blending aspect of particulate combinatorics. When Chomsky stated that “Language is, at its core, a system that is both digital and ininite. To my knowledge, there is no other biological system with these properties” (Chomsky 1991, p. 50), he was only displaying the limits of his knowledge, because a total of four major such systems are in fact lourishing in and all around us. Two of these are independent of humans (chemistry and genetics), while two of them lie at the heart of human culture (music and language), as itemized in Table 1. System CHEMISTRY GENETICS MUSIC LANGUAGE product All molecules All life forms All “melodies” All sentences Combining “particles” atoms nucleotides Notes phonemes Particle type physical conventional domain nature culture Table 1. he principal open-ended or “self-diversifying” systems. By combining inite sets of particulate elements music and language create composite patterns without limit. he role of phonemes, and their higher order combinations into morphemes, in the “duality of patterning” of human language for this purpose is well known (Hockett 1960). Music derives its particulate elements from a radical reduction in the degrees of freedom available to vocal or instrumental sound production by discretizing two continua, those of pitch and duration, to yield musical notes with determinate pitch and - in all rhythmic music - discrete durations (based upon isochronous discretization of time: Merker 2002, 2014a; Merker et al. 2009). he result is the variety of musical notes that serve as grist for the combinatorial mill of musical patterns of boundless variety. It is in their capacity of Humbold systems that music and language are most fruitfully compared and contrasted. It may even be that there is a parallelism between the two “natural” and the two “cultural” Humbold systems with respect to their genesis. Chemistry obviously is a necessary though not suicient prerequisite and antecedent for the genetics of life forms. Similarly music in the form of human song may be a necessary but not suicient prerequisite and antecedent for human language (see further heses 2, 3, and 4). We might, in other words, have been singing apes before we became talking humans, a possibility that will receive further notice in what follows. 198 Music, Song, Language hesis 2: Genesis Natural selection is but one - and possibly the least important one - of three agencies likely to have driven the genesis of music and language. he second of these is sexual selection, in accordance with Zahavi’s “handicap principle”, particularly as elaborated in the “developmental stress hypothesis” by Hasselquist. he third is the “learner bottleneck” in cultural transmission across generations, whose powers suice to transform random strings into a shared grammar without either Darwinian selection or reinforcement of outcomes (Kirby 1998; Kirby et al. 2008 and references therein). Annotation he powers of human language to communicate information are so formidable that we all too readily assume that this utility itself drove the evolution of language by natural selection in our line (for which see Pinker & Bloom 1995). If so, one would expect that some of the other large-brained animals on earth would have evolved some version of it as well. Yet among the 15000 species of birds and mammals (Clements 2007; Jones & Sai 2011) - the two taxonomic groups that contain all the world’s large-brained species - there is not a single species with language except our own (Maynard Smith and Szathmáry 1995). his indicates that language is something that simply does not tend to evolve in the animal kingdom. Any theory of language origin must account for this reluctance of biological evolution to produce language, despite its utility once in hand. here are few ways to escape this quandary except to assume that to arrive at language a species or lineage must come into possession of a number of independent traits or adaptations each of which is rare considered by itself, and all of which together are needed as prerequisites for or stepping stones to language. In such a situation, multiplicative probabilities determine outcomes, and since each trait is rare, their small probabilities multiplied might shrink the product to the ininitesimal level needed to ensure the empirical end result: a single species with language among the millions of animal species on earth. A gapless and biologically plausible path to human language traversing such rapidly dwindling probabilities has been charted in two previous publications of mine (Merker and Okanoya 2007; Merker 2012). According to the scenario presented there, the expressive powers of our early Homo ancestors did not evolve to serve the communication of information, but as means to impress potential mates and rivals by elaborately structured but unsemanticized vocal displays based on vocal production learning. hey evolved, in other words, through sexual selection, in keeping with the handicap principle and the “developmental stress hypothesis” for the evolution of complex learned song in many songbirds and some mammals (Hasslequist et al. 1996; Nowicki et al. 2002; Zahavi and Zahavi 1997). Species thus equipped with the specialized telencephalic mechanism of vocal production learning (see hesis 4) become carriers of cultural traditions of complex learned song. Seven Theses on the Biology of Music and Language 199 Once a species maintains a complex, learned song tradition, the process of intergenerational transmission itself begins to shape and transform the patterns of the transmitted vocal “lore”. When unconstrained by a species-speciic song template, as in vocal mimics, this process is even capable of transforming an initial state in which songstrings of arbitrary patterning are randomly distributed across individuals into a state in which they all share a formal grammar. Yet contrary to what one might expect, this process does not require any form of Darwinian selection or reinforcement of outcomes, as shown in large-scale computer simulations of populations of learning agents subject to such intergenerational transmission (Kirby 1998, 2000, 2001, 2002). he key to this apparent magic is the obligatory dependence of the process of intergenerational transmission on the sparse sampling of the vocal lore available to the individual learner in a population of culture-carriers. he process through which utterances are iltered through this “learner bottleneck” creates competition among vocal patterns for access to subsequent generations. In this competition simple statistics of intergenerational transmission lead more general forms to outlast less general ones in the population over time, eventually assuring convergence on a stable and eicient grammar (see Kirby 2000 for details). hus, an aspect of the so called “poverty of the stimulus”, which Chomsky employed in arguments for the necessity of innate syntactic structure (Chomsky 1980, see further hesis 5), turns out to be a key factor in the emergence of syntax from syntax-free songstrings by learning alone (Smith 2003; Zuidema 2003). It amounts, in other words, to a process of data compression, as eicient as it is slow, requiring hundreds and thousands of generations to do its transformative work. hat transformative work takes place unbeknownst to the learners, without active agency on their part, as a glacially slow transgenerational efect of their limited learning capacity, a process sustained by the display function of song, free of the necessity to communicate meaning. Further aspects of this process are briely touched upon in the annotations to heses 3, 4, and 5, and in Merker et al. 2015. hesis 3: Paths and precursors here is no natural path from the semantics of animal calls to the syntax of human language, but there is a natural path from the syntax of learned animal song to the semantics of human language (Merker 2012). hus, our path to language must have traversed a stage of learned, unsemanticized, and complex song. Annotation Animal call systems tend to be innately based, and to feature tight one-to-one coupling between sound gesture and “meaning” (or “functional reference”, see Marler 2004). hey are capable of signaling both the varied emotional states of the 200 Music, Song, Language animal calling (typically with graded intensity relecting emotional intensity) and of carrying “functional reference” to external social or environmental circumstances (as in alarm calls: Seyfarth et al. 1980, or food calls: Marler et al. 1992). Animal calls, being compact meaning-carriers, would accordingly seem ideally placed to serve as “irst words” of an open-ended particulate signalling system that conveys new meanings by combining pre-existing call types by syntactic rules, yielding a proto-language. hey are, however, inherently incapable of serving in that role. he problem is this: in order to function as elements of a combinatorial system for conveying meaning, vocal gestures must possess a modicum of neutrality and independence with respect to the basic emotional-motivational forces animating an animal’s behaviour. Consider an animal whose innate repertoire includes two conspicuous calls, one meaning “food”, normally given when the animal inds food, the other meaning “fear”, an alarm call normally given when the animal is frightened, as by a predator. Why not combine these calls to generate four compound meanings from the two, as follows: the alarm call followed by the food call would signal “fear-food”, i.e. “fear-with-food”, meaning “poisonous food”, while the food call followed by the alarm call would mean “food-fear”, i.e. “foodwith-fear”, and hence “prey”? But in order to do so, the animal would have to use a call charged with positive valence (inding food) to signal something with strong negative valence (poison), and also to use a call inherently linked to its own state of fear (as evoked by a predator) to signal something the animal has no reason to fear (its prey). An inherent feature of calls - their strong unitary referential loading, on a typically innate and emotionally charged basis - prevents the animal from doing so. hus the very traits which make calls so economical and eicient as components of an animal call system erect a functional barrier to the adoption of call combinations as a means to multiply meanings. Given this intrinsic constraint, animals in the state of nature are unlikely to develop call combinatorics much beyond the level documented for the referential call combinations of Campbell’s monkeys (Ouattara et al. 2009). By contrast, a path from the syntax of animal song to the semantics of human language is encumbered by no such diiculties. Assume a species that has a large repertoire of learned culturally transmitted songstrings composed of sets of discrete elements, as in some songbirds with vocal learning. Unlike such birds, this hypothetical species produces song year round and in the course of all its various activities (a biologically plausible setting for such rare behaviour - a rarity that is welcome given a need for diminishing probabilities - is found in Merker 2012). Under such circumstances the “learner bottleneck” will segregate songstring usage by behavioral and situational context over generations, saddling songstrings with diferential contextual associations. Some songs are sung during hunting, others in gatherings around the camp ire, and so on, extended across the Seven Theses on the Biology of Music and Language 201 full spectrum of the species’ activities. In acquiring these passive contextual associations songstrings have, in other words, undergone unplanned and implicit semanticization. he fact that under these circumstances songstrings will contain various substrings that occur scattered haphazardly across strings associated with diferent contexts provides a mechanism by which the learner bottleneck will associate each such sub-strings with features shared by those contexts in which it occurs. his would result in a glacial process of diferential semantic allocation of sub-strings to more general contextual associations over generations, in accordance with principles irst enunciated independently by Alison Wray (Wray 1998) and Simon Kirby (1998). his process, which requires no other operations than those of generalization and segmentation (Hurford 2000; Okanoya and Merker 2007), would produce ever more abstract contextual associations for ever smaller sub-string units, albeit still not in order to communicate information, but purely for display purposes. How this system of highly diferentiated contextual songstring associations, which so far are strictly and entirely tacit or implicit, ends up being deployed as a system for communicating information - a language - is outlined in Merker 2012. In this process there are no “irst words” but only long and complex “irst (holistic) sentences” (the semanticized songstrings) carrying speciic (and initially rather concrete) situational reference on a holistic, unanalyzed basis. Meaningful words appear only at asymptote of the abstraction process by which sub-strings come to be associated with shared features of disparate contexts. hat is, words would emerge only in the inal phases of glossogenesis, at which point the logic of grammaticalization processes would also become historically active (Bybee 2006; DeLancey 1993; Lehmann 1995; Merker and Okanoya 2007). Incidentally, this process helps explain the historical pattern according to which languages at earlier historical stages oten exhibit a more complex grammar than they do today, a result altogether unexpected if language is assumed to originate by gradual accretion from simple to complex. A special problem in the conversion from a system of song for display purposes to a system of language for conveying information is the fact that the attempt to use semanticized songstrings in “displacement mode”, i.e. to refer to circumstances other than those present, would undermine the contextual associations by which songstrings acquire their semanticization. hat is the one point at which natural selection will enter glossogenesis, according to the scenario presented in Merker 2012. 202 Music, Song, Language hesis 4: Vocal learning Humans are the only primate with a capacity for vocal production learning (Janik and Slater 1997; Egnor and Hauser 2004). hat capacity is an absolute requirement for both human song and spoken language in that it allows us to shape vocal output to match auditory models. How and why humans alone added vocal learning to the primate brain holds the biological key to the riddle of human music as well as language (Merker 2009, 2012, 2014c; Nottebohm 1976). Annotation An unabridgeable requirement for the existence of cultural traditions of learned songstrings - a key stepping stone to language on the present account - is the mechanism of vocal production learning. It gives us the capacity to reproduce, through the voice, patterns of sound irst received by ear (i.e. not already in the innate repertoire of calls). hat capacity is absent from all non-human primates, our closest living relatives among the apes included (Janik and Slater 1997; see also Egnor and Hauser 2004). It is by means of this “watershed adaptation”, added to the primate brain at or ater our ancestral split from our common ancestor with chimpanzees, that we acquire every song we know how to sing and every word and phrase we know how to pronounce. Human song and speech did not evolve, in other words, by way of reinements of the “primate communication system”, as is oten assumed (see, for example, Ackermann et al. 2014, and commentary by Merker 2014c). Rather, they are products of a mechanism evolved de novo in our line, and distantly shared with numerous species of songlearning songbirds and a few mammals. Without primate homologs, convergently evolved vocal learning in birds and mammals is the sole source of comparative insight into the biological signiicance and origin of this trait of ours. he “developmental stress hypothesis” (a sophisticated descendant of Zahavi’s handicap principle) provides the so far most comprehensive biological framework for understanding the function of elaborate vocal production in birds with vocal learning (Hasselquist et al 1996; Nowicki et al. 2002). It also helps answer outstanding questions regarding our own path to vocal learning (Merker 2012). Evolved through sexual selection for elaborate vocal display, this capacity for vocal production learning provides the key enabling mechanism that brings the data compression powers of the “learner bottleneck” to bear on culturally transmitted vocal lore, as in human song and language (see annotation to hesis 2). In vocal learners vocal output is shaped by learning, using feedback of their own voice to match vocal production to heard models (Konishi 2004). his makes the shape of the vocal tract a negligible factor in vocal production, as demonstrated by the ability of bird mimics to duplicate the human voice with their utterly diferent vocal anatomy (Nottebohm 1976). Seven Theses on the Biology of Music and Language 203 Vocal learning requires a lengthy period of unreinforced practice to achieve duplication of heard patterns. his necessitates a motivational mechanism to sustain practice till a match is achieved. his “conformal motive” (Merker 2005), once in place, provides a pivot for generalization to other imitative abilities, and may account for the unique imitative, expressive, and ritual propensities of humans, including their ready acceptance of arbitrary words as the obligatory names of things. It may account, in other words, for the length to which humans, alone among primates, have gone in developing a capacity for “expressive mimesis” (for which see Donald 1991). hesis 5: Biological endowment In a big-brained primate with a capacity for vocal production learning (see hesis 4), a biological endowment for human language is unlikely to take the form of the innate, universal grammar postulated by Noam Chomsky (Chomsky 1968, 1975, p. 38). Given that the “learner bottleneck” in transgenerational learning traditions has the power to ensure language learnability (Christiansen and Chater 2008; Kirby et al. 2008 and references therein; Merker 2009; Zuidema 2003), the mechanism of vocal production learning that sustains traditions of learned vocal lore constitutes our biological endowment for human language and song (Merker 2009). Annotation Noam Chomsky confounded the question of the nature of a human “endowment for language” with the issue of the learnability of language by means of the formal behaviourist learning theory of his time (Merker 2009). He took its unlearnability by those means to imply that we are possessed of a so called universal grammar on an innate basis (Chomsky 1975, in Piatelli-Palmarini 1980, p. 111). One need not follow Chomsky in this irst false step into the biology of language in order to recognize the existence of a human “endowment for language”. hat endowment is supplied by our telencephalic mechanism dedicated to vocal production learning discussed in hesis 4. Once this mechanism was in place for purposes of handicap-driven song displays, the learner bottleneck brought its data compression powers to bear on the pattern content of the intergenerationally transmitted vocal lore, with consequences briely outlined in heses 2, 3, and 4. Supported by a conformal motive (Merker 2005) and de novo evolution of a direct projection from primary motor cortex to the respiratory and phonatory motor nuclei of the lower brain stem (Brown et al. 2008; Okanoya & Merker 2007), vocal learning turns the cerebral territories centred on Wernicke’s and Broca’s areas from their non-language uses in other primates to the service of human language by recruiting them to the generative production and intergenerational transmission of culturally learned vocal lore (for details, see Merker 2009, 2014c; Okanoya & Merker 2007). 204 Music, Song, Language To this decidated cerebral mechanism for vocal learning we owe not only our developmental trajectory for language learning, infant babbling included (Doupé & Kuhl 1999), but our propensity for imitation and ritual culture more generally (Merker 2005), along with a robust selection pressure for encephalization (Merker & Okanoya 2007; Merker 2012). hrough vocal learning, and unconstrained by innate so called universal grammar, the historical ilter of cultural transmission - which passes only the possible (Zuidema 2003) - continually adapts the actual forms of languages to multiple interacting constraints such as use, utility, learnability and neural resources (Christiansen & Chater 2008), as well as to cultural norms (Everett 2005). An endowment for human language in the form of vocal production learning, as opposed to so called universal grammar, explains the ineluctably historical nature of human languages, for which grammaticalization theory supplies some of the general laws (Bybee 2006; Lehmann 1995). he principal consequence of this inherently historical nature of human language is the empirically well attested rampant structural diversity exhibited by the world’s languages (Evans & Levinson 2009). hesis 6: Gesture he plausibility of a gestural origin of language (Arbib & Rizzolatti 1996; Corballis 1999) can be estimated by the likelihood that a species already in possession of one functioning form of language (manual-visual) would switch to another (vocalauditory) at the cost of having to develop the entire machinery of vocal production learning de novo in order to do so, and this for beneits such as “freeing the hands” from their encumbrance by gestural language, an encumbrance no greater than that it did not prevent that species from developing gestural language in the irst place. Annotation An absolute prerequisite for language in the vocal-auditory mode is vocal production learning (see hesis 4). he absence of that capacity in our common ancestor with chimpanzees might seem to make a gestural origin of language an attractive possibility, since it would draw on the far more widely distributed learning capacity for manual dexterity, which our closest primate relatives do possess. he supplemental role of gesture in speech, and the ease with which humans acquire or develop sign language when the auditory-vocal mode is unavailable, has been taken to support a gestural origin of language in humans (Arbib & Rizzolatti 1996; Corballis 1999). hat, however, does not remove the vocal learning diiculty; it only postpones it, because the end state of language evolution is vocal-auditory speech. How would a species already in possession of gestural language implemented as a manual-visual Seven Theses on the Biology of Music and Language 205 system of conventional gestures ever switch to vocal language in the absence of a capacity for vocal learning, which is an absolute requirement for human speech? Obviously, if chimpanzees were vocal production learners and easily acquired a spoken vocabulary, a gestural origin for language would lose all plausibility. But they are not, so our common ancestor with chimpanzees can be assumed to have lacked vocal learning. hat lack, which favours a gestural origin, must conveniently be forgotten in order to explain how the original gestural mode of language (manual-visual) was replaced by a radically diferent mode (vocal-auditory) at reasonable cost. Assuming an absence of the capacity for vocal learning at that point forces us to assume that the advantages of a switch suiced to pay the evolutionary cost of developing the capacity for vocal learning de novo. he encumbrance of the hands by gestural language is no greater than that it allowed the evolution of gestural language in the irst place, on the gestural theory. Other advantages of speech, such as communicating propositionally when the audience is out of sight, are also marginal, and come with disadvantages as well: vocal language makes both predators and prey privy to the communicative act, even when out of sight. How encumbered are sign language users, really, compared to speakers, except by the circumstance that the population at large speaks rather than signs? Is whatever they are lacking by having their hands “encumbered” by language of suicient magnitude to serve as a selection pressure for the evolution of a sophisticated cerebral machinery for vocal production learning, a machinery which up to that point has not made its appearance in any other primate? he obvious alternative is to assume that human language originated in the auditory-vocal mode, because we had already evolved vocal learning for nonspeech purposes before employing it for language. In us, as in most species that possess it, vocal learning is likely to have evolved for learned song, as outlined in heses 2, 3, 4, and 5. Under such circumstances, a gestural origin of language loses its appeal. Given that the end-state is vocal-auditory, a species already in possession of vocal production learning has no reason to make a detour into gestural language along its path to speech. his is all the more so since manual gestures do in fact have a natural, if subsidiary, role to play in the vocal learning path to speech. hey are needed as auxiliaries in the transition to the use of semanticized songstrings in “displacement mode” (Hockett 1960) on the vocal path to human language, as markers for communicative intent in that mode, as detailed in a previous publication of mine (Merker 2012, p. 244). hat subsidiary role is eminently compatible with the way present day speakers supplement spoken language with manual gestures. Finally, and parenthetically, the acquisition of conventional gestures is inherently imitative. Our unusual imitative capacities compared to our primate relatives likely arose by generalization from our capacity for vocal production learning, as outlined in the annotation to hesis 4, following Merker (2005). No 206 Music, Song, Language other species has, ater all, developed a system of conventional manual gestures for language-like communication, even though the manual learning capacity required for it is available, as shown by apes taught sign language in captivity (e.g. Miles 1990). he lack of vocal production learning from which to generalize high-idelity duplicative behaviour may explain this anomaly, and reinforces the plausibility of a vocal rather than gestural origin of human language. hesis 7: Meaning Language alone “means”, which is to say that it conveys articulated semantic content through a system of multi-level conventional coding rather than by having its patterns in any way resemble their referents (Staal 1989). here is no corresponding device in music, whose patterns do not serve as structural vehicles for conveying another domain of content than themselves, except when they are made to do so (in programmatic music) by means of mimicry or resemblance (Hanslick 1854). Annotation Language in the form of speech embodies a bona ide code for information in that it avails itself of a multilevel combinatorics of phonological and lexical elements to perform arbitrary (in the sense of conventional) mappings between the form of utterances and their meaning. An elementary example is the combinatorics of phoneme sequence that assigns distinctive meanings to the tens of thousands of words that make up our lexicon. hus bord, Tisch, and table are such arbitrary phonological combinations for the self-same type of physical object, in Swedish, German and English, respectively. A higher level example is the grammatical device of sequential ordering of major sentence constituents such as subject, verb and object (for which essentially every possible variant ordering is attested across the world’s languages [Comrie 1983]). his language code of meaning is so detailed and comprehensive that virtually every diference between strings of phonemes makes a diference in the information conveyed by those strings. By means of this elaborate, multilevel conventional coding arrangement of phonemes and morphemes, sequential patterns of vocally produced sounds become a vehicle for making statements about things that bear not the slightest resemblance to those sound sequences themselves (see the table example above). hus language allows us to communicate about objects, events, matters of fact, states of the world, ideas, intentions, beliefs and desires, without limit. Nothing of the kind takes place in human song and music (unless they feature lyrics), however structurally complex they may be. In song (and music) it is the vocal (instrumental) patterns themselves that are the information conveyed, not another domain of content which rides upon those patterns by means of a syntactic Seven Theses on the Biology of Music and Language 207 system of conventional coding. Attempts have been made to assimilate music to language by treating music as a “language of emotions” (Spencer 1911). However, the weak sense in which music is capable of evoking or portraying emotions (Konečni 2008, 2015; Konečni et al. 2008; Scherer 2003) or circumstances in the world (storms, battles: Hanslick 1854) is achieved by using the pattern richness of music to dynamically mimic, resemble, or caricature the things to be evoked (say, the mimicking of emotional prosody as a means to portray emotion in Western music [Juslin 2001]). So even when music attempts to portray - which is far from always the case - it does not mean, it mimics, a distinction easily accommodated within the conceptual apparatus of semiotics. Such mimicking is exactly what language does not do, except in the trivial case of onomatopoeia. here is, however, an aspect of speech that does so. We have a rich repertoire of non-verbal vocal expressiveness, ultimately based upon our innate calls (see hesis 3). It coexists with the meaningcode of language in speech. hat non-verbal expressiveness enters speech by means of its prosody, and is used among other things to give speech its emotional colouring. To confuse that aspect of language with its meaning-carrying coding arrangements would of course be fatal. he two dwell in diferent qualitative and formal universes and must not be confused. hus, the fact that music can be made to suggest or mimic emotion tells us as little about the nature of its generative pattern richness as the facts of speech prosody tell us about the meaning-carrying generative pattern richness of language. he secret of musical pattern richness (see hesis 1) is best approached by abandoning the notion that it contains a code for a domain of information other than those patterns themselves. hose patterns expose our perception and imagination to their structural pattern content: their mix of variation and repetition, their symmetries and asymmetries, the evolution of their temporal trajectories with their tensions and resolutions, their intricacy, complexity, or simplicity, the balance of such factors among themselves as the music unfolds in time, and more. It is not to our emotions that this content is addressed in the irst place, but to our imagination, as Arthur Schopenhauer, followed by Eduard Hanslick, insisted (Hanslick 1854; Merker 2007; Shopenhauer 1844, vol. 2, p. 447f). In so doing music will exhibit greater or lesser elegance, will achieve greater or lesser degrees of well-formedness (see, e.g., Lerdahl & Jackendof 1983) and - depending on our background of acquired familiarity forged by our listening history - may compel our attention more or less fully, or please us more or less well. In all of these respects the patterns of music deine themselves as esthetic objects that address the informational dimension of our cognitive capacities directly, in the manner of auditory analogues of visually presented arabesques (Hanslick 1854). 208 Music, Song, Language his by no means relegates music to an abstract domain of passively contemplative connoisseurship. Some of its patterns - say of rhythmic music meant to accompany dancing - access a presumably species-speciic predisposition for bodily entrainment to isochronous auditory patterns, and help optimize such entrainment (Merker et al. 2009; Merker 2014a). Its central role in youth and popular culture suices to dispel any overly formalist notion of its nature. It is also at the farther reaches of this “informational dimension of our cognitive capacities” that music exerts its own genuinely emotional efects. Just as an elegantly crated arabesque difers in its impact on our imagination and esthetic sensibilities from that of a crude and clumsy doodle, so musical patterns have analogous diferential impact on our sensibilities by their pattern characteristics alone. Where an arabesque must “look good” to capture our imagination, musical patterns must “sound good”, the more so the greater their impact. So much so that those that sound exceptionally good tend to impress, even to the point of inspiring awe and a sense of the sublime, emotions so powerful as to drive tears to our eyes and chills up our spines (Gabrielsson 2011; Konečni 2005, 2011; Merker 2014b). hat is where the notion of an emotional impact of music has its true point of support: not in any analogy to the meaning encoded in language, nor in any kinship with the biology of basic emotions, but in the basic biology of the Zahavian handicap principle (Zahavi 1975). Where handicaps take the form of esthetic displays - from the peacock’s tail to the vocal artistry of pied butcherbirds appraisal mechanisms for judging their quality must be in place, typically in the medium of a cognitive dimension spanning from bordom, via interest/curiosity, to being impressed, with awe and a sense of sublimity at its high end (Konečni 2005, 2011; Merker 2012, 2014b). he elaboration of Zahavi’s handicap principle in the developmental stress hypothesis to account for complexity and size of birdsong repertoires by Hasselquist and colleagues (Hasselquist et al. 1996; Nowicki et al. 2002; see heses 2 and 4) provides an eminently plausible interpretive framework for the nature and function of human song and music as well. It dispels the appearance of frivolity encumbering our expenditure of efort and resources on acquiring and producing the pattern richness of human song and music. Our musical developmental trajectory does not equip us with a language-like device for conveying referential meaning (emotional or otherwise). Rather, by exact analogy to the case of learned birdsong, it gives us a means to display command and mastery of a trove of culturally patterned and transmitted lore. Such command and mastery serves not only as a badge of competence in the culture, but as a certiicate of the phenotypic traits needed to achieve that competence (see further Merker 2012, 2014b). Seven Theses on the Biology of Music and Language 209 Bibliographical References Abler, W.L. (1989), “On the particulate principle of self-diversifying systems”, Journal of Social and Biological Structures, 1, pp. 1-13. Ackermann, H., Hage, S.R. & Ziegler, W. (2014), “Brain mechanisms of acoustic communication in humans and nonhuman primates: An evolutionary perspective”, Behavioral and Brain Sciences, 37, pp. 529-546. Arbib, M.A. & Rizzolatti, G. (1996), “Neural expectations: A possible evolutionary path from manual skills to language”, Communication & Cognition, 29, pp. 393-424. Brown, S., Ngan, E. & Liotti, M. (2008), “A larynx area in the human motor cortex”, Cerebral Cortex, 18, pp. 837-845. Bybee, J. (2006), “Language change and universals”,. in R. Mairal & J. Gil (eds.), Linguistic Universals, Cambridge: Cambridge University Press, pp. 179-194. Chomsky, N. (1968), “Linguistics and Philosophy”, in Hook (ed.), Language and Philosophy, New York: New York University Press, pp. 51-94. — (1975), Relections on Language, New York: Pantheon Books. — (1980), Rules and Representations, New York: Columbia University Press. — (1991), “Linguistics and cognitive Science: Problems and mysteries”, in Kasher (ed.), he Chomskyan Turn, Oxford: Blackwell, pp. 26-53. Christiansen, M.H. & Chater, N. (2008), “Language as shaped by the brain”, Behavioral and Brain Sciences, 31, pp. 489-558. Clements, J.F. (2007), he Clements Checklist of the Birds of the World, 6th Ed., Ithaca, NY: Cornell University Press. Comrie, B. (1983), Language Universals and Linguistic Typology, Oxford: Basil Blackwell. Corballis, M.C. (1999), “he gestural origins of language”, American Scientist, 87, pp. 138-145. Darwin, Ch. (1871), he Descent of Man and Selection in Relation to Sex, New York: D. Appleton & Company. DeLancey, S. (1993), “Grammaticalization and Linguistic heory”, in Gomez de Garcia & Rood (eds.), Proceedings of the 1993 Mid-America Linguistics Conference, Department of Linguistics, University of Colorado, pp. 1-22. Donald, M. (1991), Origins of the Modern Mind, Cambridge, MA: Harvard University Press. Doupé, A.J. & Kuhl, P.K. (1999), “Birdsong and human speech: Common themes and mechanisms”, Annual Review of Neuroscience, 22, pp. 567-631. Egnor, S.E.R. & Hauser, M.D. (2004), “A paradox in the evolution of primate vocal learning”, Trends in Neurosciences, 27, pp. 649-654. 210 Music, Song, Language Evans, N. & Levinson, S.C. (2009), “he myth of language universals: Language diversity and its importance for cognitive science”, Behavioral and Brain Sciences, 32, pp. 429492. Everett, D.L. (2005), “Cultural constraints on grammar and cognition in Pirahã. Another look at the design features of human language”, Current Anthropology, 46, pp. 621-646. Fisher, R.A. (1930), he Genetical heory of Natural Selection, Oxford: he Clarendon Press. Gabrielsson, A. (2011), Strong Experiences with Music, Oxford: Oxford University Press. Hanslick, E. (1854), Vom Musikalisch-Schönen. Ein Beitrag zur Revision der Aesthetik der Tonkunst, Leipzig: Weigel. Hasselquist, D., Bensch, S. & von Schantz, T. (1996), “Correlation between song repertoire, extra-pair paternity and ofspring survival in the great reed warbler”, Nature, 381, pp. 229-232. Hockett, C.F. (1960), “Logical considerations in the study of animal communication”, in Lanyon & Tavolga (eds.), Animal Sounds and Communication, American Institute of Biological Sciences, Washington, D.C., pp. 392-430. Honing, H., ten Cate, C., Peretz, I. & Trehub, S.E. (2015), “Without it no music: cognition, biology and evolution of musicality”, Philosophical Transactions of the Royal Society B, 370, 1664, pp. 1-8. Hurford, J.R. (2000), “he emergence of syntax [Editorial introduction to the section on syntax]”, in Knight, Studdert-Kennedy & Hurford (eds.), he Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, Cambridge, Cambridge University Press, pp. 219-230. Janik, V.M. & Slater, P.J.B. (1997), “Vocal learning in mammals”, Advances in the Study of Behaviour, 26, pp. 59-99. Jones, K.E. & Safi, K. (2011), “Ecology and evolution of mammalian biodiversity”, Philosophical Transactions of the Royal Society B, 366, 1577, pp. 2451-2461. Juslin, P.N. (2001), “Communicating emotion in music performance: A review and a theoretical framework”, in Juslin & Sloboda (eds.), Music and Emotion: heory and Research, New York: Oxford University Press, pp. 309-337. Kirby, S. (1998), “Language evolution without natural selection: From vocabulary to syntax in a population of learners. Technical Report”, Edinburgh Occasional Papers in Linguistics, 98 (1). — (2000), “Syntax without Natural Selection: How compositionality emerges from vocabulary in a population of learners”, in Knight, Studdert-Kennedy & Hurford (eds.), he Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, Cambridge: Cambridge University Press, pp. 303-323. — (2001), “Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity”, IEEE Transactions on Evolutionary Computation, 5, pp. 102-110. Seven Theses on the Biology of Music and Language 211 — (2002), “Learning, bottlenecks and the evolution of recursive syntax”, in Briscoe (ed.), Linguistic Evolution through Language Acquisition: Formal and Computational Models, Cambridge: Cambridge University Press, pp. 173-204. Kirby, S, Cornish, H. & Smith, K. (2008), “Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language”, Proceedings of the National Academy of Sciences USA, 105, pp. 10681-10686. Konečni, V.J. (2005), “he aesthetic trinity: Awe, being moved, thrills”, Bulletin of Psychology and the Arts, 5, pp. 27-44. — (2008), “Does music induce emotion? A theoretical and methodological analysis”, Psychology of Aesthetics, Creativity, and the Arts, 2, pp. 115-129. — (2011), “Aesthetic trinity theory and the sublime”, Philosophy Today, 55, pp. 64-73. — (2015), “Book review of T. Cochrane, B. Fantini and K. Scherer (eds.), he Emotional Power of Music: Multidisciplinary Perspectives on Musical Arousal, Expression, and Social Control. Oxford University Press, 2013”, he Journal of Aesthetics and Art Criticism, 73, pp. 214-218. Konečni, V., Brown, A & Wanic, R.A. (2008), “Comparative efects of music and recalled life-events on emotional state”, Psychology of Music, 36, pp. 289-308. Konishi, M. (2004), “he role of auditory feedback in birdsong”, in Ziegler & Marler (eds.), he Behavioral Neurobiology of Birdsong. Annals of the New York Academy of Sciences, 1016, pp. 463-475. Lehmann, C. (1995), houghts on Grammaticalization, Second, revised edition, München: LINCOM Europa. Lerdahl, F. & Jackendoff, R. (1983), A Generative heory of Tonal Music, Cambridge, Mass.: MIT Press. Marler, P. (2004), “Animal calls: heir potential for neurobiology”, in Zeigler & Marler (eds.), Behavioral Neurobiology of Birdsong. Annals of the New York Academy of Sciences, 1016, pp. 31-44. Marler, P., Evans, C.S., & Hauser, M.D. (1992), “Animal signals. Reference, motivation or both?”, in Papousek, Jürgens & Papousek (eds.), Nonverbal Vocal Communication: Comparative and Developmental Approaches, Cambridge, UL: Cambridge University Press, pp. 66-86. Maynard Smith, J. & Szathmáry, E. (1995), he Major Transitions in Evolution, Oxford: Oxford University Press. Merker, B. (2002), “Music: he missing Humboldt system”, Musicae Scientiae, 6, pp. 3-21. — (2005), “he conformal motive in birdsong, music and language: An Introduction”, in Avanzini, Lopez, Koelsch & Majno (eds.), he Neurosciences and Music II: From Perception to Performance, Annals of the New York Academy of Sciences, 1060, pp. 17-28. 212 Music, Song, Language — (2007), “Music at the limits of the mind”, in Kugiumutzakis (ed.), Sympantiki Armonia, Musike kai Epistimi. Ston Miki heodoraki (Universal Harmony, Music and Science. In honour of Mikis heodorakis), Herakleion: Crete University Press. — (2009), “Returning language to culture by way of biology”, Behavioral and Brain Sciences, 32, pp. 460-461. — (2012), “he vocal learning constellation: Imitation, ritual culture, encephalization”, in Bannan (ed.), Music, Language, and Human Evolution, Oxford: Oxford University Press. — (2014a), “Groove or swing as distributed rhythmic consonance: Introducing the groove matrix”, Frontiers in Human Neuroscience, 8, article 454, 1-4. — (2014b), “Warum wir musikalisch sind – Antworten aus der Evolutionsbiologie”, in Gruhn & Seither-Preisler (eds.), Der musikalische Mensch. Evolution, Biologie und Pädagogik musikalischer Begabung, Hildesheim: Georg Olms Verlag, pp. 255-280. [In German; an English version will eventually be archived in CogPrints under the title “When extravaganze impresses”]. — (2014c), “Speech, vocal production learning, and the comparative method”, Behavioral and Brain Sciences, 37, pp. 566-567. Merker, B., Madison, G. & Eckerdal, P. (2009), “On the role and origin of isochrony in human rhythmic entrainment”, Cortex, 45, pp. 4-17. Merker, B. & Okanoya, K. (2007), “he natural history of human language: Bridging the gaps without magic”, in Lyon, Nehaniv & Cangelosi (eds.), Emergence of Communication and Language, London: Springer-Verlag, pp. 403-420. Merker, B., Morley, I. & Zuidema, W. (2015), “Five fundamental constraints on theories of the origins of music”, Philosophical Transactions of the Royal Society B, 370, 20140095. Miles, H.L. (1990), “he cognitive foundations for reference in a signing orang-utan”, in Parker & Gibson (eds.), “Language” and Intelligence in Monkeys and Apes: Comparative Developmental Perspectives, New York: Cambridge University Press, pp. 511-539. Nottebohm, F. (1976), “Discussion paper. Vocal tract and brain: A search for evolutionary bottlenecks”, in Harnad, Steklis & Lancaster (eds.), Origins and Evolution of Language and Speech, New York: Annals of the New York Academy of Sciences, 280, pp. 643-649. Nowicki, S., Searcy, W.A. & Peters, S. (2002), “Brain development, song learning and mate choice in birds: a review and experimental test of the ‘nutritional stress hypothesis’”, Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 188, pp. 1003-1014. Okanoya, K. & Merker, B. (2007), “Neural substrates for string-context mutual segmentation: A path to human language”, in Lyon, Nehaniv & Cangelosi (eds.), Emergence of Communication and Language, London: Springer-Verlag, pp. 421-434. Seven Theses on the Biology of Music and Language 213 Ouattara K., Lemasson A., Zuberbühler K. (2009), Campbell’s Monkeys Concatenate Vocalizations into Context-speciic Call Sequences, proceedings of the National Academy of Sciences (USA) 106, 22026-22031. Doi:10.1073pnas.0908118106. Piattelli-Palmarini, M. (ed., 1980), Language and Learning: he Debate between Jean Piaget and Noam Chomsky, Harvard: Harvard University Press. Pinker, S. & Bloom, P. (1990), “Natural language and natural selection”, Behavioral and Brain Sciences, 13, pp. 707-784. Scherer, K.R. (2003), “Why music does not produce basic emotions”, in Breslin (ed.), Proceedings of the Stockholm Music Acoustic Conference, 1, pp. 25-28, online: [http:// www.speech.kth.se/smac03]. Schopenhauer, A. (1958), he World as Will and Representation , 2nd ed., E.F.J. Payne transl., 2 vols, New York: Dover. Seyfarth, R.M., Cheney, D.L., & Marler, P. (1980), “Vervet monkey alarm calls: Semantic communication in a free-ranging primate”, Animal Behaviour, 28, 1070-1094. Smith, K. (2003), “Learning biases and language evolution”, in Kirby (ed.) Language Evolution and Computation. Proceedings of the Workshop on Language Evolution and Computation, 15th European Summer School on Logic, Language and Information, Vienna, pp. 22-31. Spencer, H. (1911), “On the Origin and Function of Music”, in Essays on Education and Kindred Subjects, London: J.M. Dent & Sons, pp. 312-330. Staal, F. (1989), Rules without Meaning. Ritual, Mantras and the Human Sciences, New York: Peter Lang. Von Humboldt, W. (1836), Über die Verschiedenheit des Menschlichen Sprachbaues und ihren Einluss auf die geistige Entwicklung des Menschengeschlechts, Berlin, Germany: Royal Academy of Sciences; English transl.: Buck & Raven transls. (1971), Linguistic Variability and Intellectual Development, Baltimore, MD: University of Miami Press. Wallin, N.L., Merker, B. & Brown, S. (eds., 2000), he Origins of Music, Cambridge, Mass.: MIT Press. Wray, A. (1998), “Protolanguage as a holistic system for social interaction”, Language and Communication, 18, pp. 47-67. Zahavi, A. (1975), “Mate selection – A selection for a handicap”, Journal of heoretical Biology, 53, pp. 205-214. Zahavi, A. & Zahavi, A. (1997), he Handicap Principle: A Missing Piece of Darwin’s Puzzle, Oxford University Press, Oxford. Zuidema, W. (2003), “How the poverty of the stimulus solves the poverty of the stimulus”, in Becker, Thrun & Obermayer (eds.), Advances in Neural Information Processing Systems 15 (Proceedings of NIPS’02), Cambridge: MIT Press, pp. 51-58.