Scuola di Dottorato “Vito Volterra”
Dottorato di Ricerca in Fisica

The complexity of numeral systems

Thesis submitted to obtain the degree of “Dottore di Ricerca” – Philosophiæ Doctor
PhD in Physics – XXI cycle – October 2009

by Alessio Ansuini

Program Coordinator: Prof. Enzo Marinari
Thesis Advisor: Prof. Vittorio Loreto

... to my brother, Federico

Acknowledgments

I am grateful to Vittorio Loreto, with whom I have had the chance to work during these years of PhD. His constant encouragement in the critical moments has been precious to me, no less than his great scientific competence and the generosity of his ideas. I thank Enzo Marinari for his patience with me and for the fairness and honesty with which he has always talked to me. I warmly thank Vito Servedio for his useful comments and encouragement during these years. Last but not least, I am very indebted to my friend Alessandro Attanasi: he is great.

Contents

1 Numeral Systems
  1.1 The perception of abstract numbers
    1.1.1 Our cognitive limits
    1.1.2 Approximate Representation of numerosity
    1.1.3 Distance and Size Effects
  1.2 The Language of Numbers
    1.2.1 The origins of a language for number
  1.3 Linguistics of numeral systems
    1.3.1 Composition of Numerals
  1.4 The complexity of numeral systems
2 A network model for Numeral Systems
  2.1 Ω symbolic systems
    2.1.1 Axioms for Ω systems
  2.2 Elements of Ω networks
    2.2.1 Nodes
    2.2.2 Triple
  2.3 To build an Ω network
    2.3.1 Fuyuge and Miskito
    2.3.2 Italian and French systems
    2.3.3 Holistic and Unary system
    2.3.4 Positional systems with arbitrary base
    2.3.5 Canonical systems
    2.3.6 Primes systems
    2.3.7 The ensemble of random numeral systems
  2.4 Elementary observables of the Ω networks
    2.4.1 The number of elementary symbols
    2.4.2 The description of a simple Ω network
    2.4.3 Degree sequence and its distribution
    2.4.4 The Tree
    2.4.5 The logical depth
    2.4.6 Relation between LD and k_out
  2.5 Other functionals defined on Ω networks
    2.5.1 Entropy of the Degree distribution
  2.6 Comparison of different models
    2.6.1 Holistic and Unary
    2.6.2 Canonical and Positional systems
    2.6.3 Primes
    2.6.4 Random
  2.7 The space Ω
    2.7.1 The size of the space of simple networks
    2.7.2 Dynamics in the Ω space
  2.8 Conclusions
3 Development of the formalism
  3.1 Generalization of Ω networks
    3.1.1 Categories
    3.1.2 Generalized Triples
    3.1.3 Description of a generic Ω network
    3.1.4 Other concepts related to Ω networks
  3.2 Distance between symbols
4 Reduction of redundancies
  4.1 The NSRPS Algorithm
  4.2 The reduction transformation R
    4.2.1 Reduction of two simple Triples
    4.2.2 Transformation R(e, ⋆)
    4.2.3 Reduction of two general Triples
    4.2.4 Reversibility of R and separation
    4.2.5 The causality constraint
    4.2.6 The reduction algorithm
    4.2.7 The orbits of R, and the (approximate) reduced network
    4.2.8 Holistic and Unary system are two fixed points of R
  4.3 The ω^R networks and their relevant quantities
    4.3.1 Irreducible operations: π⋆
  4.4 The complexity functional
5 Complexity of numeral systems
  5.1 Holistic and Unary
  5.2 Complexity of the Italian and French system
  5.3 Complexity of the positional systems
  5.4 Conclusions
6 Perspectives and conclusions

Introduction

Numbers are pervasive in our everyday life. They lie at the heart of our technology and science. They are the building blocks of mathematics, and the holy Grail of its deepest mysteries. Eminent philosophers of the ancient world put them at the basis of reality itself. Numbers are in our language and in our writing systems.
Recent archaeological findings suggest that the origin of writing systems is rooted in the ancient accounting systems developed in Mesopotamian cultures [SB92]. An (apparently) rudimentary accounting system, based on small clay objects (cones, spheres, disks and other forms), evolved over millennia through a sequence of increasingly abstract stages into a complex system of symbols, first for abstract numbers and then for words and sentences (see Fig. 1 and its caption).

The variety of numeral systems across the languages and cultures of the world is astonishing. Many languages have very small numeral inventories, with words only up to two or three, and perhaps the possibility of expressing exact numbers up to at most ten by combining these with the word for “hand” [Ham06]. But in languages that do not have small numeral inventories, numeral expressions form a system: a set of interrelated entities that can be studied from the point of view of its complexity.

We want to address the following question: “What is the complexity of numeral systems?”. More precisely: “How can we define a notion of complexity that reflects the cognitive effort required to memorize and master a numeral system?”. To give meaning to this question we must first explain what numeral systems are. This is addressed in Chapter 1, where we review a (very small) part of the literature dedicated to this subject. In the first section (1.1) we report the salient facts from neuropsychology and cognitive science about how the human (and, to a great extent, animal) brain represents and manipulates exact quantities. This is a necessary premise: how our brain works turns out to be essential in shaping the evolution and the actual form of numeral systems. Then, in the remainder of Chapter 1, the focus is more on what numeral systems look like in different cultures all over the world. In general, less attention is paid to the question of the origin and evolution of numeral systems, which is part of a more general question: that of the origin and evolution of language. In the last section (1.4) we briefly review the early attempts by linguists to define the complexity of a numeral system, and then describe our approach.

Figure 1: Envelope with tokens from Susa, Iran. A clay envelope was used for storing plain tokens, small clay objects (usually an inch or less) that represented specific commodities by virtue of their shapes: a cylinder stood for an animal; cones and spheres referred to two common measures of grain. Clay envelopes were gradually superseded: Sumerian accountants began impressing the tokens on the soft exteriors of the envelopes before enclosing them, thus leaving a visible record of the number and shape of the tokens held inside. At some point, accountants must have realized that the markings on the envelope, reflecting everything significant about its contents, rendered the tokens superfluous. Thus the first written tablets were created, with two-dimensional symbols: a circle replaced the sphere, and a wedge the cone.

In recent decades physicists have devoted great attention to the study of complex systems, i.e. systems that “acquire a functional, spatial or temporal structure without specific interference from the outside” [Hak49]. Language, as has been recognized in recent years, is to a great extent a complex adaptive system that organizes itself through social interactions [LS07]. With this in mind, in Chapter 2 we introduce a new network model, which we call Ω, aimed at describing numeral systems. The point of view we adopt is that of describing the inner “logic” of the formation of higher symbols from lower ones, abstracting from the concrete form of the representation, be it the waveform of the pronounced word or the shape of a sign traced on a sheet of paper.
This approach has the advantage that we can compare natural-language and written numeral systems on a common basis, but we pay a price for it: we cannot include in our networks information on how complex the “mapping” from concepts to concrete representations is.

We study the general properties of the Ω networks, establishing a language suitable for their description. Formalizing the description of a mathematical object is essential in order to define its complexity. In Chapter 2 we also report the results of extensive numerical simulations on different networks, built upon several models of numeral systems. These models are inspired by natural languages and written systems, but we also introduce abstract models that are useful to illustrate crucial theoretical aspects. The main focus of these simulations is the characterization of the structure of the networks, i.e. their static aspects. What the networks of numeral systems look like is one of the original contributions of this Thesis: to our knowledge, this is not covered in the literature. The main conclusion we draw from these measures is that the topological and statistical properties of these networks can signal whether a numeral system is governed by some rule, but it is hard to extract from this information a representation of the rules themselves.

This observation motivated us to focus, at a second stage, on the detailed way in which the symbols are related. The idea was born from a very simple observation: when trying to draw these networks on a blackboard, we spontaneously drifted towards a reduced representation. This was possible only for highly redundant systems, like the decimal one, and absolutely impossible for the systems built on prime numbers or on random rules.¹ The natural next step was to devise a procedure that systematically reduces the redundancies in the networks, in order to obtain a more compact, but equivalent,² version of the original ones. The idea behind a well-known algorithm for the lossless compression of sequences (the NSRPS algorithm [WM80] [Gra02] [BCG06]) was the right one, but it had to be adapted, and to some extent reinvented, for the very different structure of the object to compress: a network, albeit of a special type. This effort is described in Chapter 4, where we define a transformation R in the space Ω that exploits the redundancies (of a certain class³) of our networks and can be applied recursively until all these redundancies are exploited. In the case of sequences, the length of the compressed sequence is related to the entropy of the emitting source; analogously, we adopt the point of view of the shortest description, and define the complexity of a numeral system in terms of the description of its reduced network.

In the final Chapter we study in detail how the reduction R works, calculating the reduced representations and the complexities of some relevant systems from natural languages and written systems. Remarkably, we observe that the topological structure and the description of the reduced networks give valuable information on the role and function that different symbols have within numeral systems, and we think that this is a clear indication that our complexity measure captures the essential elements of the perceived complexity of these cognitive-linguistic systems.

¹ Every natural number x has a unique decomposition into prime factors, so it is natural to consider this decomposition as a representation of x. This system of representation defines the Primes numeral system.
² It is always possible to recover the original network from the reduced one.
³ The redundancies are precisely defined in terms of the topological properties of the networks.
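To make the sequence version of the idea concrete, the sketch below shows the core loop usually associated with NSRPS (commonly expanded as Non-Sequential Recursive Pair Substitution): at each step the most frequent pair of adjacent symbols is replaced by a brand-new symbol, and the substitution rule is stored so that the original sequence can be recovered exactly. It is a minimal Python sketch of the sequence algorithm only, not of the network transformation R; the function names, the `<k>` naming of new symbols and the naive counting of overlapping pairs are our own simplifying assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rules = Dict[str, Tuple[str, str]]

def _replace_pair(seq: List[str], pair: Tuple[str, str], new: str) -> List[str]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def nsrps(seq: str, max_steps: int = 100) -> Tuple[List[str], Rules]:
    """Recursively substitute the most frequent adjacent pair with a new symbol."""
    current, rules = list(seq), {}
    for k in range(max_steps):
        # Overlapping pairs are counted naively (a simplification).
        counts = Counter(zip(current, current[1:]))
        if not counts:
            break
        pair, n = counts.most_common(1)[0]
        if n < 2:  # a pair seen only once gives no saving: stop
            break
        new = f"<{k}>"  # hypothetical naming scheme for new symbols
        rules[new] = pair
        current = _replace_pair(current, pair, new)
    return current, rules

def expand(seq: List[str], rules: Rules) -> List[str]:
    """Undo the substitutions recursively: the compression is lossless."""
    out: List[str] = []
    for s in seq:
        out.extend(expand(list(rules[s]), rules) if s in rules else [s])
    return out

compressed, rules = nsrps("abababab")
print(compressed)  # ['<1>', '<1>']
print(rules)       # {'<0>': ('a', 'b'), '<1>': ('<0>', '<0>')}
```

The stored rules play a role loosely analogous to the reversibility of R discussed in Chapter 4: the compact form plus the rules is an equivalent description of the original object.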
Chapter 1
Numeral Systems

In this Chapter we review some relevant features of numeral systems, as part of natural language and of writing systems. The words and signs for numbers are the visible part of a cognitive system: they give us access to the knowledge of exact quantities, which would otherwise be inaccessible because of fundamental cognitive limitations. They are shaped by two fundamental forces:
• a dedicated neuronal circuitry for the access, manipulation and representation of arithmetical concepts;
• the social communicative interactions of billions of humans who, over millennia, needed to communicate exact quantities in order to cooperate, often coordinating complex collective behaviour, or simply to survive.

1.1 The perception of abstract numbers

Our attention is focused on a restricted but very important domain of semantic knowledge: that of abstract numbers. The concept of abstract number lies at the foundation of ancient and modern mathematics, but paradoxically its mathematical foundation was developed only between the end of the nineteenth century and the beginning of the twentieth, thanks to the efforts of the logician Gottlob Frege [Fre79] and then, independently, of the eminent philosophers and mathematicians Russell and Whitehead in their monumental Principia Mathematica [WR27]. Here, however, we are not concerned with the mathematical conception of number, but only with its everyday meaning, as part of our semantic world. In this view, the number (or cardinality) of a set can be defined as “the only property that remains invariant under substitutions of any of its items”. Thus we can talk about three objects (see Fig. 1.1), three persons, three sounds, or three events: we can recognize that the cardinality of a set is three, regardless of its composition [DDLC98].
Figure 1.1: The concept of abstract number: the two sets A and B are different, but contain the same number of elements. They are equally representative of the “threeness” quality.

The possibility that humans have access, through their cognitive system, to the numerosity of a set relies on a dedicated neuronal circuitry inherited from biological evolution [DDLC98]. Animals, young infants and adult humans possess a biologically determined, domain-specific representation of number and of elementary arithmetic operations: strong evidence points to an evolutionary endowment of abstract numerical knowledge in the brain. But our perception of number is far from sharp: there is a surprising difference between the mathematical conception of number and the way our cognitive system perceives it. In this respect it is important to distinguish clearly between the symbolic and non-symbolic aspects of the knowledge of number and arithmetic. Symbolic arithmetic deals with how we understand and manipulate numerals and number words, while non-symbolic arithmetic concerns how we grasp and combine the approximate numerosity of concrete sets of objects. Our core knowledge of arithmetic is essentially non-symbolic. But when number symbols are available, they become strongly attached to the corresponding non-symbolic representations of numbers and, thereafter, a form of “second-order” intuition seems to develop, as the links between symbols and the quantities themselves become fast, automatic and unconscious.

1.1.1 Our cognitive limits

Our non-symbolic knowledge of abstract numbers is comparable to that of animals and young infants. We have no direct knowledge of exact quantities without performing calculations, i.e. without symbolic computation.

Figure 1.2: When we (humans) look at a set of objects, points, etc. whose numerosity is no larger than 3 or 4, we have an immediate perception of the numerical quantity without relying on counting.
This cognitive faculty is called subitizing.

The value of the subitizing limit (3 or 4 for humans) influences our behaviour; it may be an important parameter in studying the collective dynamics of societies in suitable circumstances.¹

Figure 1.3: When the number of objects is higher, we rely on computational strategies in order to achieve a knowledge of the exact quantity of a set of perceptual objects.

When we observe a number of spots exceeding our subitizing limit, we have to process each item individually or in small subitizable groups, and put them in one-to-one correspondence with more complex representations. Two crucial mechanisms are involved in this operation. The first is the individuation of the items (the groups of Fig. 1.3): we navigate the image with ocular movements, fixing our gaze on one group at a time. Counting, but not subitizing, is made impossible if ocular and/or attentional movements are prevented [OTI81]. The second crucial component of counting is working memory, which is needed to keep the running total in mind while integrating the successive items. While the estimation of small numerosities is shared with non-human animals, the counting mechanisms are peculiar to humans, are not universal among cultures, and are acquired progressively through learning by human children. Counting appears to be a uniquely human activity, and a cultural invention.

¹ In [BCC+08], for example, the relationship between the subitizing limit of (a certain species of) birds and the emergent properties of their flocking dynamics is discussed.

1.1.2 Approximate Representation of numerosity

When humans discriminate or compare the numerosity of two sets of dots under conditions that prevent counting, their responses are approximate and become increasingly accurate as the difference between the numbers increases, in a way that is modulated by their ratio.
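The ratio dependence can be illustrated with a toy model in the spirit of the logarithmic internal representation attributed to Fechner [Fec60] later in this section. The sketch assumes, purely for illustration, that a numerosity n is encoded as log(n) plus Gaussian noise with a fixed standard deviation `w`; the function name `p_correct` and the value `w = 0.2` are hypothetical choices, not parameters taken from the cited studies.

```python
import math

def p_correct(n1: int, n2: int, w: float = 0.2) -> float:
    """Probability of correctly judging which of n1, n2 is larger.

    Toy assumption: numerosity n is encoded as log(n) plus Gaussian noise
    with standard deviation w (a hypothetical 'internal Weber fraction')."""
    separation = abs(math.log(n1) - math.log(n2))
    # The difference of two independent noisy codes has std w*sqrt(2), so
    # P(correct) = Phi(separation / (w*sqrt(2))) = 0.5 * (1 + erf(separation / (2*w))).
    return 0.5 * (1.0 + math.erf(separation / (2.0 * w)))

# Same ratio, same accuracy (ratio dependence, i.e. Weber's law):
print(p_correct(8, 10), p_correct(16, 20))
# Distance effect: 4 vs 8 is easier than 4 vs 5.
print(p_correct(4, 8) > p_correct(4, 5))  # True
# Size effect: at fixed distance, larger numbers are harder.
print(p_correct(4, 5) > p_correct(8, 9))  # True
```

Because only the difference of logarithms enters, accuracy depends on the ratio n1/n2 alone; the same assumption reproduces, qualitatively, the distance and size effects described in Section 1.1.3.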
This ratio-dependent behavior is an instance of Weber’s law [Deh03] [PI09], which is typically found in judgments of continuous perceptual variables such as length, luminance, or frequency. Weber’s law can be stated as follows: over a large dynamic range, the threshold of discrimination between two stimuli increases linearly with stimulus intensity. Weber’s law can be accounted for by postulating that the external stimulus is scaled into a logarithmic internal representation of sensation [Fec60]. It is very pervasive in numerical cognition: it is observed independently of culture, degree of instruction and age. It is also observed in various animal species performing many different tasks. The universality of Weber’s law in animals and in humans of all ages and levels of education is taken to indicate the presence of a universal mechanism for approximate number processing [PI09].

1.1.3 Distance and Size Effects

• The numerical distance effect (Fig. 1.4) refers to the empirical finding that the ability to discriminate between two numbers improves as the numerical distance between them increases. It is faster and easier to compare four with eight than four with five, even after intensive training. The distance effect is common to humans and animals and, in humans, it also manifests itself when processing Arabic digits or number words. The occurrence of a distance effect even when numbers are presented in a symbolic notation suggests that the human brain internally converts numbers from the symbolic format to a continuous analogical format.
• The size effect: the discrimination of two numbers worsens as their numerical size increases. This effect is essentially an instance of Weber’s law, and it holds cross-modally: it is found in animals and humans engaged in discrimination or comparison tasks with visual objects or sounds. The number size effect is found also when humans are presented with Arabic digits or number words, even when the subjects are highly trained in mathematics.
This indicates that, in certain circumstances, humans access a numerical representation similar to that of animals.

Figure 1.4: Distance Effect. When subjects are asked to compare two numbers, independently of the modality of presentation, the error rates decrease monotonically, approximately as a logarithmic function of the numerical distance between the two numbers. (D) Humans exhibit a distance effect also when processing symbolic numerals, but the error rates are much lower than in the case of non-symbolic processing (C). (Reproduced from Ref. [DDLC98])

In Chapter 3 we will introduce a “distance” between symbols inspired by the distance and size effects. This distance is expressed in terms of the logic of construction of higher symbols from lower ones, and we will use it mainly to illustrate some general properties of numeral systems.

In conclusion, as humans we have a very limited capacity to discriminate numerosities by perception, a capacity we share with animals and young infants. Our brain seems better designed for approximate calculation than for precise calculation. Fortunately we are a symbolic species [Dea97], and this has been the impulse for a cultural effort: the invention of a language for numbers. In the next Section we take a look at the solutions that humans have developed in their cultures in order to grasp larger exact numerical quantities.

Figure 1.5: Size Effect. The performance on various numerical tasks becomes increasingly imprecise as the numbers involved get larger. (A) Rats compared the number of lever presses to several targets. The dispersion of the distribution of correct responses grows as the numerical target grows. (B) Humans compared the numerosity of a dot pattern to a target. The distribution of the errors shows a similar behavior, but the scale is different. (Adaptation from Ref.
[DDLC98])

1.2 The Language of Numbers

1.2.1 The origins of a language for number

Not all languages have a numeral system. Some languages have quite simple systems, capable of counting only to about 20 or even lower. In primitive systems, the words have not always fully lost their non-numerical meanings. So the word for 5 might also mean “(left) hand”; the expression for “+ 1” might also mean “and another”; the expression for 10 might also mean “man” or “whole” or “finished” or “right hand”. In these systems, either all the numeral expressions are monomorphemic (or at least do not contain more than one morpheme with a numerical interpretation), or a relatively low number, such as 2, 3, 4 or 5, is used as a basis of addition (or, very much more rarely, of subtraction or multiplication). Sometimes, after a base number appears in the counting sequence, it is used for all higher numbers. But this is not always so, and there can be what appears to be a fairly random interspersing of morphologically complex numerals with monomorphemic numerals. In natural language numeral systems, monomorphemic numerals will be called elementary symbols.

Numeral systems have evolved by successive small increments of linguistic invention. The successive inventions are built somewhat roughly on the pre-existing structures, so that growth marks can be seen in the resulting developed systems. Languages, like organisms, can have vestigial characters that have lost all or most of their original function through evolution. The earliest stage in the development of a numeral system almost certainly lies in the practice of counting body parts. Fingers are put into one-to-one correspondence with any set of items, and the gesture of raising three fingers comes to serve as a symbol for the quantity 3. In many aboriginal groups there is a rich vocabulary of numerical gestures that fulfil the same role of a symbolic representation. Immediately beyond the gesture comes naming.
Naming a body part suffices to evoke the corresponding numeral. In many societies in New Guinea, for example, the word for six is literally “wrist”. In countless languages throughout the world the etymology of the word “five” evokes the word “hand”. But body parts are few, and a very natural limit is twenty, the total number of our fingers and toes. In order to go beyond this limit we must make “infinite use of finite means”: we need a syntax that allows larger numerals to be expressed by combining several smaller ones. This often implies the choice of a base number, and the expression of larger numbers by means of a combination of sums and products. Most languages have adopted a base number such as 10 or 20, whose name is often a contraction of smaller units (e.g. 10 as “two hands”). Once the new form is established, it can itself enter into more complex constructions, and contractions as well as morphological and phonetic distortions are possible.

The most familiar type of numeral system in better-known languages is decimal, and sometimes also partly vigesimal. This canonical type [Hur99] has the following characteristics:
• single words for 1–10;
• use of addition to 10 for 11–19;
• use of multiplication by 10 (or 20), and addition, for 20–99;
• single words for higher bases, typically 100, 1000, and sometimes also 20.

In the following we will use the word canonical for such systems with any base, but with the same regular rules of formation. Further characteristics, common to both primitive and developed types of system, are:
• Complete coverage up to some limit: there are no gaps in the counting sequence;
• No ambiguity or homonymy (examples of ambiguous numerals are extremely scarce, if they occur at all);
• Little, if any, redundancy or synonymy (from a vast set of arithmetically possible combinations for any given number, typically there is only a single well-formed numeral used in the canonical counting sequence; the occasional exception to this generalization occurs, as in paraphrases like English one thousand one hundred versus eleven hundred);
• Recursion: expressions for higher numbers typically contain expressions for lower numbers nested within them;
• Packing Strategy: the recursive possibilities are severely constrained by a principle to the general effect that one builds on the highest-valued expression available (see [Hur87] for details and discussion).

In the following, by the canonical numeral system we will mean a slightly simplified system with base B and the properties just described.

1.3 Linguistics of numeral systems

1.3.1 Composition of Numerals

Phrase-structure rules are a way to describe a given language’s syntax. They are used to break a natural language sentence down into its constituent parts (syntactic categories), namely phrasal categories (like noun phrases or verb phrases) and lexical categories (parts of speech, like nouns, verbs, adjectives, and so on) [Cho88] [Cho02]. With remarkable uniformity, the basic form of most syntactically complex numerals in most languages can be generated from a universal schema of just two simple phrase structure rules (Fig. 1.6).

Figure 1.6: Phrase structure rules for a syntax of numeral systems.

Here, “NUMBER” represents the category Numeral itself, the set of possible numeral expressions in a language; “DIGIT” represents any single numeral word up to the value of the base number (e.g., English one, two, ..., nine); and “M” represents a category of mainly noun-like numeral forms used as multiplicational bases (e.g., English -ty, thousand, and billion). The curly brackets in the rules enclose alternatives; thus a numeral may be either a DIGIT (e.g. eight) or a so-called PHRASE (numeral phrase) followed optionally by another numeral (e.g., eight hundred or eight hundred and eight).
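The simplified canonical rules listed above (single words up to the base, addition to the base for the “teens”, multiplication by the base plus addition beyond that) can be sketched as a small recursive decomposition. This Python sketch is our own illustration, not the construction used later in the Thesis: the names `canonical` and `value` and the nested-tuple encoding `(op, left, right)` are hypothetical, and only numbers from 1 to B² − 1 are covered.

```python
from typing import Tuple, Union

# An expression is either an elementary number or (operation, left, right).
Expr = Union[int, Tuple[str, "Expr", "Expr"]]

def canonical(n: int, base: int = 10) -> Expr:
    """Simplified canonical decomposition of n, valid for 1 <= n < base**2."""
    if not 1 <= n < base * base:
        raise ValueError("this sketch only covers 1 .. base**2 - 1")
    if n <= base:
        return n                      # single word: an elementary symbol
    if n < 2 * base:
        return ("+", base, n - base)  # 'teens': addition to the base
    tens, units = divmod(n, base)
    head = ("*", tens, base)          # multiplication by the base
    return head if units == 0 else ("+", head, units)

def value(expr: Expr) -> int:
    """Read an expression back: constituents are added or multiplied."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    a, b = value(left), value(right)
    return a + b if op == "+" else a * b

print(canonical(64))      # ('+', ('*', 6, 10), 4)
print(canonical(17, 12))  # ('+', 12, 5)
```

Reading the tuple back with `value` mirrors the additive and multiplicative interpretation of numeral constituents: ‘sixty four’ is (6 × 10) + 4. Changing `base` gives canonical systems with any base B.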
If a numeral has two immediate constituents (i.e., is not just a single word), the value of the whole is calculated by adding the values of the constituents; thus sixty four means 60 + 4. If a numeral phrase (as distinct from a numeral) has two immediate constituents, the value of the whole is calculated by multiplying the values of the constituents; thus two hundred means 2 × 100.

1.4 The complexity of numeral systems

We will introduce and study in detail a model for the representation of numeral systems as networks, which we will call Ω. This model captures the essential aspects of numeral systems as symbolic systems:
• existence of elementary symbols;
• compositionality;
• recursivity;
but it does not introduce grammatical categories a priori, as the phrase structure rules of Fig. 1.6 do. We can consider this as a zero model for a grammar of numeral systems.

Figure 1.7: These “rules” are indeed a zero model for a grammar of numeral systems. They simply mean that a number can be represented either by an elementary symbol, or by a representation built upon the representations of two other numbers.

In our approach, as a first step (Chapter 2), we will transform these rules into a network. These networks are constructed starting from real data from natural languages or written numeral systems, or from abstract models that we introduce as analytical tools. In this way we can view the growth of numeral systems as the growth of networks [DM03]. Then, in Chapter 4, we will define a transformation that tries to reduce the redundancies present in the original network and obtains a compact version of it. In this process, the numbers that have the same role or are constructed following similar patterns are grouped together. We will call these groups Categories, and we will see that in some relevant cases (Chapter 5) we find meaningful grammatical categories. The quantities related to the compact version of the network will permit us to define the complexity of numeral systems in a precise way.

Chapter 2
A network model for Numeral Systems

The aim of this Chapter is to introduce, and study in detail, a network model capable of describing the “logical” organization of numeral systems, as a model for simple symbolic systems. This model can describe some elementary but very general properties of symbolic systems. We do not make any reference to the meaning of the symbols: the focus is on how complex symbols are put together in terms of more elementary ones. The fact that symbolic systems, in particular language, are composed of elementary constituents that are combined to form higher and more complex expressions is known as compositionality, and is one of the common features of all non-trivial symbolic systems.

2.1 Ω symbolic systems

Ω symbolic systems are a mathematical model that we introduced in order to describe the relevant features of simple systems of symbols, like numeral words and sentences in natural language, or graphemes¹ and combinations of these in written systems. We will call these elements simply symbols.

¹ A grapheme is a fundamental unit in a written language.

2.1.1 Axioms for Ω systems

An Ω symbolic system is a finite set of symbols that satisfies the following axioms:
• there is a subset of elementary symbols that cannot be expressed in terms of others; all the other symbols are said to be composed;
• composed symbols are the result of a unique binary operation; the nature and the number of these operations depend on the system, but the output is always a well-defined element of the system. The input symbols can be either elementary or composed.

Relationships defined on the set E describe the way composed symbols are constructed from “more elementary” ones.
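The two axioms can be turned into a mechanical check on a finite numeral system. In this hedged Python sketch, a system is given as a set of elementary numbers plus a dictionary mapping each composed number to exactly one `(op, left, right)` decomposition, so the dictionary itself enforces the uniqueness of the binary operation. The name `is_omega_system`, the restriction to `+` and `*`, the arithmetic-consistency test and the “no circular definitions” test are our own assumptions, added because the axioms speak of symbols built from “more elementary” ones.

```python
from typing import Dict, Set, Tuple

# Assumed fixed set of binary operations (addition and multiplication).
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def is_omega_system(elementary: Set[int],
                    composition: Dict[int, Tuple[str, int, int]]) -> bool:
    """Check the Ω axioms on a finite numeral system (illustrative sketch)."""
    composed = set(composition)
    symbols = elementary | composed
    # Axiom 1: elementary symbols are not themselves composed.
    if elementary & composed:
        return False
    for x, (op, a, b) in composition.items():
        # Axiom 2: exactly one binary operation per composed symbol (one dict
        # entry), taken from a fixed set, with inputs inside the system.
        if op not in OPS or a not in symbols or b not in symbols:
            return False
        # Assumption for numeral systems: the operation must yield x itself.
        if OPS[op](a, b) != x:
            return False
    # Assumption: no circular definitions; every composed symbol must rest,
    # directly or indirectly, on elementary symbols.
    grounded = set(elementary)
    changed = True
    while changed:
        changed = False
        for x, (_, a, b) in composition.items():
            if x not in grounded and a in grounded and b in grounded:
                grounded.add(x)
                changed = True
    return grounded == symbols

decimal_fragment = {11: ("+", 10, 1), 20: ("*", 2, 10),
                    30: ("*", 3, 10), 31: ("+", 30, 1)}
print(is_omega_system({1, 2, 3, 10}, decimal_fragment))  # True
print(is_omega_system({0}, {5: ("+", 0, 5)}))            # False: 5 defined via itself
```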
It is natural to describe these composition relations among the symbols of E with a network structure, which we will call Ω.

2.2 Elements of Ω networks

Ω networks are intended to represent the "logic" behind the composition of elementary symbols into complex ones. This logic is deduced by inspecting the representations of these symbols in the physical world, like signs traced on paper or linguistic sounds floating in the air. We will use the notation Ψ to signal when we are talking about representations of numerical concepts. So in natural-language numeral systems Ψ is a function that assigns words or phrases to natural numbers; analogously, in written systems, it assigns strings of written symbols. We will see in Section 2.3 how to build Ω networks from these representations.

The description of how elementary symbols are represented by means of concrete objects, and of how these representations are put together (and possibly transformed) to construct the representations of composed symbols (see for example [RMR08]), falls outside the purpose of the Ω network model, which abstracts from these details and concentrates on the logic behind the system. Nevertheless, these aspects are important in a study of the complexity of numeral systems. We will not follow this line; we will only observe that a physically realizable (and realistic) symbolic system cannot have an infinite number of elementary symbols. This was stressed by Turing in his classic 1936 paper [Tur36], where he observed that in any realistic symbolic system the concrete representations of the elementary symbols occupy a limited (compact) space, for example a small square of side 1. Since any realistic symbol occupies a non-zero area of this space, it is impossible to have infinitely many symbols that are also distinguishable. From now on we will concentrate only on how the symbols are organized together into a system, and study the complexity of this organization.
2.2.1 Nodes

In Ω networks there are two kinds of nodes:
• circular nodes represent the natural numbers 0, 1, 2, . . . ; they can also represent more complex (hierarchical) structures that we will call Categories², but for the moment we will not consider this possibility (see 3.1.1);
• square nodes represent arithmetical binary operations chosen from a set {⋆1, ⋆2, . . . , ⋆r} fixed a priori.³

In this Thesis we will consider addition and multiplication, {+, ×}, as the only possible arithmetical operations, because they are sufficient to discuss almost all known numeral systems [Hur87]. A notable exception is the Roman written numeral system, in which the possible operations are {+, −}, but we will not discuss it here.

Circular nodes inherit the "elementariness" of the numerical symbols they represent. We usually colour circular nodes yellow (or orange) if they are elementary and green if they are composed.

2.2.2 Triple

Every composed number is the output of a binary operation. The two input nodes have a directed link to the operational node, and the operational node has a directed link to the output node. Input numbers can be elementary or composed. The input and output nodes, the operational node and the links form a unit that we will call a Triple. An example of this structure is illustrated in Fig. 2.1. We call T(x) the Triple associated with the output node containing x (it is defined only for composed numbers). Our networks are organized into such units, as shown in Fig. 2.2. Ω networks, with their characteristic Triple structure, are a natural network representation of symbolic systems that are well described by the phrase structure rules of Fig. 1.7.
But while the phrase structure rules used by linguists are meant to represent all possible grammatical sentences of a language [Cho88], and are given in terms of recursive relations between grammatical categories recognized a priori, Ω networks are built upon the actual sentences of a language, the raw data, which in our case are the numerals or the written symbols of a numeral system.

Figure 2.1: The Triple of the number x (T(x)). The elementary numbers are associated with the yellow nodes and the composed numbers with the green ones. The square node represents a binary operation among the available ones, typically {+, ×}. Only in special cases (for example the Roman numeral system) is the inventory of possible operations {+, −}.

A subset of Ω networks, which I will call $\Omega_{Hurford}$ networks, is a representation of the more restrictive phrase structure rules described in Fig. 1.6, which are well suited to developed numeral systems. These networks are no longer organized into Triples; their recurrent structure is shown in Fig. 2.3. It is clear that the $\Omega_{Hurford}$ subset is very small with respect to Ω. We choose the zero model for two reasons. The first is that it is simpler to deal with using analytical and numerical simulation tools. The second, which is the real reason behind our approach, is that we do not want to fix the grammatical categories of numbers a priori. On the contrary, we want them to emerge from the topological properties of a "zero" model that introduces no category a priori except one: the number.

² The Categories have nothing to do with the concept of "category" in mathematics.
³ In this work we do not consider the possibility of a dynamic generation of operations, for example under selective pressures or optimality principles, but it is a very interesting direction for future research.
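The node and Triple structure of Section 2.2 maps naturally onto a small data structure. The sketch below is a minimal illustration, assuming hypothetical names (`Triple`, `OmegaNetwork`, `add_triple`, `links`) that are ours: each composed number owns exactly one Triple, each Triple contributes one square operation node and three directed links, and synonyms are rejected, as the Thesis assumes from Section 2.4.3 on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Triple:
    """Two input numbers, one operation (square) node, one output number."""
    left: int
    op: str        # "+" or "×"
    right: int
    output: int


@dataclass
class OmegaNetwork:
    elementary: Set[int] = field(default_factory=set)
    triples: Dict[int, Triple] = field(default_factory=dict)

    def add_elementary(self, x: int) -> None:
        if x in self.triples:
            raise ValueError(f"{x} is already composed")
        self.elementary.add(x)

    def add_triple(self, left: int, op: str, right: int) -> Triple:
        known = self.elementary | set(self.triples)
        if left not in known or right not in known:
            raise ValueError("inputs must already belong to the network")
        if op == "+":
            out = left + right
        elif op == "×":
            out = left * right
        else:
            raise ValueError(f"unsupported operation {op!r}")
        if out in known:  # no synonyms: one representation per number
            raise ValueError(f"{out} already has a representation")
        t = Triple(left, op, right, out)
        self.triples[out] = t
        return t

    def T(self, x: int) -> Optional[Triple]:
        """T(x); None when x is elementary (T is defined only for composed numbers)."""
        return self.triples.get(x)

    def links(self) -> List[Tuple[object, object]]:
        """Directed links: each input -> its operation node -> the output."""
        edges: List[Tuple[object, object]] = []
        for t in self.triples.values():
            op_node = ("op", t.output)  # one square node per Triple
            edges += [(t.left, op_node), (t.right, op_node), (op_node, t.output)]
        return edges
```

For example, the first Fuyuge numerals give a network with elementary nodes 1 and 2 and the Triples 3 = 2 + 1 and 4 = 2 + 2. The elementary 5 (bodo fida) would be rejected by `add_elementary` if 5 = 4 + 1 had already been added as a Triple, because this sketch does not allow synonyms.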
2.3 To build an Ω network

Now we have all the elements for building the networks of numeral systems. We illustrate how to do so with some abstract and concrete models, among which are two primitive numeral systems (Fuyuge and Miskito) and two developed ones (Italian and French).

Figure 2.2: The network of a small (randomly built) numeral system. The yellow nodes represent elementary symbols; all composed numbers are built upon them through the arithmetical operations {+, ×}. Every composed number forms a Triple with its operational node and the nodes of its parent numbers. The networks associated with numeral systems are organized into Triples.

Figure 2.3: The fundamental unit of an $\Omega_{Hurford}$ network. A number (category Number) can be a digit (elementary symbol) or a more complex numeral. In the latter case it is composed of a Phrase and (possibly) another number, connected by an addition. The Phrase is always composed of a numeral (category M) expressing a base or a power of it, and (possibly) another number; in this case the operation between them is a multiplication.

2.3.1 Fuyuge and Miskito

Mafulu is the name of the people who live in a group of villages within and near the north-westerly corner of the area of the Fuyuge-speaking people; Fuyuge is a Papuan language and may be regarded as one common language throughout the Fuyuge area [Wil12]. We give the first few numerals of its numeral system, which is substantially a base-2 one. This quite regular (and redundant) structure is made visible in its associated Ω network (see Fig. 2.4).
• 1 = Fida (One).
• 2 = Gegedo (Two).
• 3 = Gegedo minda (Two and another).
• 4 = Gegedo ta gegedo (Two and Two).
• 5 = Gegedo ta gegedo minda (Two and Two and another) [or Bodo fida (one hand)].
• 6 = Gegedo ta gegedo ta gegedo (Two and Two and Two).
• 7 = Gegedo ta gegedo ta gegedo minda (Two and Two and Two and another) [or Bodo fida ta gegedo (one hand and Two)].
• 8 = Gegedo ta gegedo ta gegedo ta gegedo (Two and Two and Two and Two) [or Bodo fida ta gegedo minda (one hand and Two and another)].
• 9 = Gegedo ta gegedo ta gegedo ta gegedo minda (Two and Two and Two and Two and another) [or Bodo fida ta gegedo ta gegedo (one hand and Two and Two)].
• 10 = Bodo gegedo (Two hands).
• 11 = Bodo gegedov' u minda (Two hands and another).
• 12 = Bodo gegedo ta gegedo (Two hands and Two).
• 13 . . .

Figure 2.4: The network of the Fuyuge system. We notice that the number 5 is represented in two different forms: as an elementary symbol, and as the output of 2 + 2 + 1. Synonymy is very rare in numeral systems and is present mainly in primitive ones (an occasional exception occurs in English, with numbers like one thousand one hundred (1100) and its paraphrase eleven hundred).

Miskito is an indigenous language of Central America, spoken by nearly 200,000 people in Nicaragua, Honduras and Belize. The Miskito numeral system is substantially base-5 [Hal91], but there is evidence of base-2 and base-6 structures [Hur99]. The irregularities (often called idiosyncrasies in linguistic jargon) are reflected in the disordered structure of its Ω network (see Fig. 2.5).
• 1 = Kum (One)
• 2 = Wol (Two)
• 3 = Yumpa (Three)
• 4 = Wol Wol (Two Two)
• 5 = Matsip (Five)
• 6 = Matlalkahbi (Six)
• 7 = Matlalkahbi pura kum (Six + One)
• 8 = Matlalkahbi pura wal (Six + Two)
• 9 = Matlalkahbi pura yumhpa (Six + Three)
• 10 = Matawalsip (Ten)
• 11 = Matawalsip pura kum (Ten + One)
• 12 = Matawalsip pura wal (Ten + Two)
• 13 . . .

2.3.2 Italian and French systems

The most familiar type of numeral system is decimal, like the Italian system, and sometimes also partly vigesimal, like the French system.
Figure 2.5: The network of the Miskito numeral system.

The Italian numeral system is of canonical type for numbers less than $10^4$; then there is a transition to a higher base (the superbase $10^3$).⁴ Its elementary symbols correspond to the single words for 0, 1, . . . , 10 (zero, uno, due, . . . , dieci) and for the following powers of ten: $10^2, 10^3, 10^6, 10^9, \dots$ (cento, mille, un milione, un miliardo, . . . ). Addition to 10 is used for 11, . . . , 19, multiplication by 10 for 20, . . . , 90, and multiplication followed by addition for 21, . . . , 29, . . . , 91, . . . , 99. The Italian numeral system has a very regular (and redundant) network, as is evident from Fig. 2.6, where it is also possible to note the peculiar role played by the base 10 and by the unit.

⁴ The introduction of a superbase is very common in developed systems (see [Hur87]).

Figure 2.6: A small part of the network of the Italian numeral system. Notice the different treatment of the number 1 for addition and multiplication. All the nodes containing the same number are to be considered identified; we draw them separately only for the sake of clarity.

The French counting system is partially vigesimal: 20 (vingt) is used as a base number in the names of numbers from 60 to 99. The French word for 80, for example, is quatre-vingts, which literally means "four twenties", and soixante-quinze (literally "sixty-fifteen") means 75.⁵ This system is comparable to the archaic English use of score, as in "fourscore and seven", meaning 87, or "threescore and ten", meaning 70. Belgian French and Swiss French differ in this respect: in Belgium and Switzerland 70 and 90 are septante and nonante.

⁵ This particular structure was introduced during the French Revolution as an attempt to unify the different counting systems (mostly vigesimal near the coast, because of Celtic and Viking influences).
In Switzerland, depending on the local dialect, 80 can be quatre-vingts or huitante; in Belgium, however, quatre-vingts is universally used. The elementary symbols for French are 0, 1, 2, 3, 4, . . . , 10 (zéro, un, deux, trois, quatre, . . . , dix) and $10^2, 10^3, 10^6, \dots$ (cent, mille, un million, . . . ). The words for 11, 12, . . . , 16 (onze, douze, . . . , seize) look like independent symbols, but in reality, like the corresponding Italian numerals, they are derived from their Latin ancestors, so we will consider them as 10 + 1, 10 + 2, . . . , 10 + 6. The numbers 20-69 consist of a word for the multiple of 10 plus, optionally, the word for the units 1-9. The names of the tens {20, 30, 40, 50, 60} are vingt, trente, quarante, cinquante, soixante. The sequence continues with {70, 71, 72, . . . , 79} (soixante-dix, soixante et onze, soixante-douze, . . . ); notice that the et in 71 mimics the behaviour of 21, 31, . . . . The French for 80 is quatre-vingts. The numbers 81-99 consist of quatre-vingt- (without the -s) plus a number 1-19 (quatre-vingt-un, quatre-vingt-deux, . . . , quatre-vingt-dix, quatre-vingt-onze, . . . , quatre-vingt-dix-neuf). The Ω network of the French system is less regular than the Italian one: it shows the relics of the vigesimal system, emphasizing the role of the number 20, but at the same time it also shows the substantially decimal nature of the French numeral system (see Fig. 2.7).

Figure 2.7: A small part of the network of the French numeral system. Notice the peculiar construction of the numbers 70 (soixante-dix), 80 (quatre-vingts) and 90 (quatre-vingt-dix).

Now we introduce two idealized numeral systems that will have a very important role in the following.

2.3.3 Holistic and Unary system

In all realistic numeral systems there is, in a certain sense, a compromise between the Holistic and the Unary system.
In the evolution of numerical symbols, as we saw in the preceding Chapter, there is evidence that almost all written systems begin with a sequence of Unary-like symbols. On the other hand, each numeral system has a repertory of elementary symbols, typically the first few numbers, and it is Holistic within this range. The Holistic system's Ω network is a set of isolated nodes, all of them standing for elementary symbols (see Fig. 2.8). The Unary system instead has only one elementary symbol, standing for the number 1, and all other numbers are created by reiterated additions (see Fig. 2.9). These models are somewhat abstract, but we will find them very useful in the development of the theory, since they are, in a certain sense, two extremal points in the space of Ω networks.

Figure 2.8: The Holistic system.

Figure 2.9: The Unary system. Notice that in order to construct any symbol it is necessary to construct all the preceding ones first. The only exception is 1, which is the only elementary symbol.

If we consider the first few words in the numeral systems of all cultures, in the great majority of cases we find that they are independent and irreducible words from a morphosyntactic point of view, i.e. they are elementary symbols. This is so, for example, in Italian for the words from 1 to 10; then compositionality arises. There are exceptional cases, for example the Panjabi numerals, where the Holistic part of the system extends up to 100. In this sense every numeral system has a Holistic part, but this part can have very different extents.

2.3.4 Positional systems with arbitrary base

Positional notation, or place-value notation, is a generalization of decimal notation to an arbitrary base. It includes the binary (base 2) and hexadecimal (base 16) notations used by computers, as well as the base-60 notation of Babylonian numerals. Indian mathematicians developed the Hindu-Arabic numeral system, the modern decimal positional notation, in the 9th century. Positional notation is distinguished from previous notations (such as Roman numerals) by its use of the same symbol for different orders of magnitude (for example, the "ones place", "tens place", "hundreds place"). This greatly simplified arithmetic and led to the quick spread of the notation across the world.

In order to construct the Ω network of a positional system we must first establish which numbers are elementary symbols. In positional systems the digits and the base are to be considered elementary symbols, because they are independent graphical signs; the other elementary symbols are the powers of the base. For any given natural number x, the natural decomposition in a positional system of size $N = B^k$ is

$$x = a_{k-1} B^{k-1} + a_{k-2} B^{k-2} + \cdots + a_0 B^0$$

where the $\{a_i\}$ are digits. In order to build the Ω network we first construct the Triple

$$T(x) : \; x = \left(a_{k-1} B^{k-1}\right) + \left(a_{k-2} B^{k-2} + \cdots + a_0 B^0\right)$$

and then the inner Triples corresponding to the two inner terms, $a_{k-1} B^{k-1}$ and $a_{k-2} B^{k-2} + \cdots + a_0 B^0$. This procedure is repeated iteratively until all the Triples involved have elementary symbols as input nodes.

The Ω network associated with a positional system is therefore highly redundant, and its topology must reflect the special role of the digits and of the powers of the base. We observe that all the digits are treated on the same footing in these networks; in particular there is no special role for the number 1.

2.3.5 Canonical systems

We described canonical systems in Chapter 1; they are a class of models inspired by the structure of the decimal system as incorporated in natural languages. They reserve a special role for the symbols for unity and zero.
The unity does not appear in multiplicative expressions, and zero is a standalone elementary symbol (note that in positional systems zero is highly connected to the rest of the network; it is indeed a hub). The Ω networks associated with canonical systems are highly ordered, as we can see in Figs. 2.10 and 2.11.

Figure 2.10: The Ω network associated with the canonical (base 4) numeral system of size 64 ($4^3$). The elementary symbols are painted orange.

Figure 2.11: A larger part of the Ω network of the canonical (base 4) system (size $N = 256 = 4^4$).

2.3.6 Prime systems

Prime numbers are the "building blocks" of the natural numbers. Their crucial importance in mathematics, and particularly in number theory, stems from the fundamental theorem of arithmetic, which states that every positive integer larger than 1 can be written as a product of one or more primes in a way that is unique except for the order of the prime factors [GW79]. For example, we can write

$$666 = 2 \times 3 \times 3 \times 37$$

We will adopt a standard factorization in which the prime factors are in increasing order ($p_1 < p_2 < \cdots < p_r$), so that for any given number n we have

$$n = p_1^{\mu_1} p_2^{\mu_2} \cdots p_r^{\mu_r} \qquad (2.1)$$

where $\mu_i$ is the multiplicity of the i-th prime factor.

Our aim here is to use this decomposition as a means to represent natural numbers. We imagine that every prime p is associated with an independent symbol Ψ(p), and that the expression of a number n is given by the string formed by the symbols $\Psi(p_i)$ of all the primes of its decomposition, each repeated a number of times equal to the multiplicity $\mu_i$ of the corresponding prime factor. For example, in the case of 666 the representation is Ψ(666) = Ψ(2)Ψ(3)Ψ(3)Ψ(37). We can associate a network with this representation in the following way.
The elementary symbols, as we said, are the primes; for all other numbers we define their Triple T(n) as $p_m \times (n / p_m)$, where $p_m$ is the smallest prime factor of n, taken with multiplicity 1. The resulting network has interesting properties, as we will see in the rest of this Chapter. Like the Unary and the Holistic systems, the Prime system's Ω network can be considered a frontier point in the space Ω.

Let us consider the set of integers divisible by a prime p. The probability that a random integer (extracted with uniform measure) belongs to this set is clearly $1/p$. Take now the set of integers divisible by both p and q, where q is another prime. Being divisible by p and q is equivalent to being divisible by pq, and consequently the probability of this set is $1/(pq)$. Since

$$\frac{1}{pq} = \frac{1}{p} \times \frac{1}{q}$$

we can interpret this by saying that the "events" of being divisible by p and by q are independent, and so, in a certain sense, "primes play a game of chance" [Kac59]. This is the beginning of a development that links number theory and probability theory in a significant way. One of the earliest findings on the probabilistic properties of prime numbers is that the number of primes lower than N, usually indicated with π(N), is asymptotically equal to $N / \log N$. This is the celebrated Prime Number Theorem, obtained independently by Hadamard and de la Vallée Poussin in 1896. Another probabilistic result of number theory that we will find useful for studying the Ω network of the primes is described below.

Let us define ω(n) as the number of distinct prime factors of n, and Ω(n) as its total number of prime factors⁶; thus, referring to equation (2.1),

$$\omega(n) = r, \qquad \Omega(n) = \mu_1 + \mu_2 + \cdots + \mu_r$$

⁶ This is the reason for the name Ω given to the networks representing numbers: it is a homage to these functions and, incidentally, to Cantor's first transfinite ordinal number ω.
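The Prime system's Triples and representations follow directly from the definitions above. This is a minimal sketch using trial division; the function names `smallest_prime_factor`, `prime_triple` and `psi` are hypothetical, and Ψ(n) is modelled simply as the increasing list of prime factors with multiplicity.

```python
def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n (n >= 2), by trial division."""
    if n < 2:
        raise ValueError("n must be >= 2")
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n): n is prime


def prime_triple(n: int):
    """T(n) = p_m × (n / p_m) in the Prime system; None if n is prime."""
    p = smallest_prime_factor(n)
    if p == n:
        return None  # primes are the elementary symbols
    return (p, "×", n // p)


def psi(n: int) -> list:
    """Ψ(n): the prime factors of n in increasing order, with multiplicity."""
    factors = []
    while n > 1:
        p = smallest_prime_factor(n)
        factors.append(p)
        n //= p
    return factors


print(prime_triple(666), psi(666))  # → (2, '×', 333) [2, 3, 3, 37]
```

Repeatedly applying `prime_triple` to the right-hand input walks down the same chain of factors that `psi` lists, which is why each composed number of this system hangs from its smallest prime.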
Figure 2.12: The Ω network associated with the Prime numeral system of size 100. The presence of three hubs is evident, corresponding to the numbers 2 (in the middle of the picture), 3 (upper left) and 5 (upper right). The orange numbers are the primes.

Both ω(n) and Ω(n) behave irregularly for large n. Both functions are 1 when n is prime, while, for example, $\Omega(n) = \log n / \log 2$ when n is a power of 2. Although the behaviour of ω(n) and Ω(n) is erratic (see Fig. 2.13), both functions show a statistical regularity, captured by a theorem (see [GW79], Theorem 430, page 355): the average order⁷ of both ω(n) and Ω(n) is $\log\log n$. More precisely,

$$\sum_{n \le N} \omega(n) = N \log\log N + B_1 N + o(N) \qquad (2.2)$$

$$\sum_{n \le N} \Omega(n) = N \log\log N + B_2 N + o(N) \qquad (2.3)$$

where the constant $B_1$ is the Mertens constant (see [GW79]) and $B_2$ can be expressed in terms of $B_1$ by

$$B_2 = B_1 + \sum_{k=1}^{\infty} \frac{1}{p_k (p_k - 1)}$$

$p_k$ being the k-th prime in the natural ordering. Their approximate values are $B_1 = 0.26149$ and $B_2 = 1.03465$.

⁷ In number theory, the average order of an arithmetic function is often studied by means of some simpler or better-understood function which takes the same values on average. Let f be an arithmetic function. We say that the average order of f is g if $\sum_{n \le x} f(n) \sim \sum_{n \le x} g(n)$ as x tends to infinity.

[Plot: LD(n) versus n, for n from 0 to 300.]
Figure 2.13: Given an integer n, Ω(n) is the number of its prime factors, counted with their multiplicity. On the y axis we report the related function LD(n) = Ω(n) − 1 (see Sec. 2.4.5). The erratic behaviour of the Ω(n) function is evident.

Figure 2.14: A larger part (size $N = 10^3$) of the network associated with the Prime system. Every prime number plays the role of a hub in such a network. Here only two hubs (2 and 3) are distinguishable.
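Equations (2.2) and (2.3) can be checked numerically. The sketch below is illustrative only: the function names are ours, and $B_1$, $B_2$ are the approximate values quoted above. It uses the fact that a prime power $p^k \le N$ divides exactly $\lfloor N/p^k \rfloor$ integers in 1..N, so the partial sums can be computed without factoring every n.

```python
import math

B1 = 0.26149  # Mertens constant (value quoted in the text)
B2 = 1.03465  # B1 + sum over primes of 1/(p(p-1))


def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def omega(n: int) -> int:
    """ω(n): number of distinct prime factors of n."""
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:
                n //= d
        d += 1
    return count + (1 if n > 1 else 0)


def big_omega(n: int) -> int:
    """Ω(n): number of prime factors of n counted with multiplicity."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            count += 1
            n //= d
        d += 1
    return count + (1 if n > 1 else 0)


def sum_omega(N: int, primes: list) -> int:
    # Each prime p <= N divides exactly N // p of the integers 1..N.
    return sum(N // p for p in primes if p <= N)


def sum_big_omega(N: int, primes: list) -> int:
    # Each prime power p^k <= N is counted once per multiple in 1..N.
    total = 0
    for p in primes:
        q = p
        while q <= N:
            total += N // q
            q *= p
    return total


def predicted(N: int, B: float) -> float:
    """Leading terms N log log N + B N of equations (2.2) and (2.3)."""
    return N * math.log(math.log(N)) + B * N


N = 10 ** 5
P = primes_up_to(N)
print(sum_omega(N, P), round(predicted(N, B1)))
print(sum_big_omega(N, P), round(predicted(N, B2)))
```

At this size the observed sums and the leading terms should agree to within a few per cent; the remaining gap is the $o(N)$ term, which decays only slowly with N.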
2.3.7 The ensemble of random numeral systems

Random numeral systems are artificial systems that we introduce for two reasons:
• to study the properties of the Ω networks associated with human-created systems by comparison with those of a random ensemble;
• to study the properties of the space of Ω networks.

A random system is created by random growth, starting from the only preexisting (and necessary) symbol that we suppose to be available at the beginning: 1. We randomly create a Triple for each x, starting from 2, then 3, and so on. Once a number x has its symbolic representation (once its Triple is created), it is available for the creation of the representations of higher numbers. We also add the elementary symbol for 0, but it is an isolated node and has no function or role in this system. The creation of a random system depends on two parameters:
• the probability that at each step an elementary symbol is created, which we will call ε;
• the probability of creating a + Triple, given that a Triple must be created (i.e. when the symbol is not elementary), which we will call p.

To be more precise, we give here the pseudocode for its generation:
• x = 0 and x = 1 are always elementary symbols, so T(0) : {0}, T(1) : {1};
• for every other x, T(x) : {x} (x is elementary) with probability ε; otherwise:
  – if x is prime, create a random Triple with the + operation, T(x) : {x = z + y};
  – if x is not prime, extract an operation from {+, ×} with a Bernoulli distribution with parameters {p, 1 − p};
  – create a random Triple T(x) : {x = z ⋆ y} with the extracted operation ⋆.

The ensemble so defined depends on the two parameters (ε, p), and we will denote it E(ε, p). When we want to compare random numeral systems of size N with other systems of the same size, we will set the parameters (ε, p) so that π (the number of elementary symbols) and p are the same on average.
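The pseudocode above translates into a short sampler for E(ε, p). This is a minimal sketch, not the Thesis's generator: the pseudocode does not say how the inputs z and y are drawn, so we assume (hypothetically) a uniform split z ∈ {1, …, x − 1} for additions and a uniformly chosen non-trivial divisor for multiplications, which keeps every input strictly smaller than x.

```python
import random
from typing import Dict, Set, Tuple

Triple = Tuple[int, str, int]


def nontrivial_divisors(x: int) -> list:
    """Divisors d of x with 1 < d < x (empty when x is prime)."""
    return [d for d in range(2, x) if x % d == 0]


def random_numeral_system(N: int, eps: float, p: float,
                          rng: random.Random) -> Tuple[Set[int], Dict[int, Triple]]:
    """Sample a system of size N (numbers 0..N-1) from E(eps, p)."""
    elementary: Set[int] = {0, 1}            # 0 is isolated, 1 is the seed
    triples: Dict[int, Triple] = {}
    for x in range(2, N):
        if rng.random() < eps:               # x becomes an elementary symbol
            elementary.add(x)
            continue
        divisors = nontrivial_divisors(x)
        if not divisors or rng.random() < p:
            # Primes can only be built by addition; others use '+' with prob. p.
            z = rng.randint(1, x - 1)        # assumption: uniform split, z, y >= 1
            triples[x] = (z, "+", x - z)
        else:
            d = rng.choice(divisors)         # assumption: uniform non-trivial factor
            triples[x] = (d, "×", x // d)
    return elementary, triples


elementary, triples = random_numeral_system(100, 0.1, 0.5, random.Random(0))
print(len(elementary), len(triples))
```

Averaged over many seeds, `len(elementary)` approaches 2 + ε(N − 2), because 0 and 1 are always elementary and each of the other N − 2 numbers is elementary with probability ε.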
If we indicate with $\langle \pi \rangle$ the average number of elementary symbols in such a random network, we have

$$\langle \pi \rangle = 2 + \varepsilon (N - 2) \qquad (2.4)$$

and when we want to compare a random system with another system in which a certain π is given, we have to use the corresponding ε:

$$\varepsilon = \frac{\pi - 2}{N - 2} \qquad (2.5)$$

As regards the parameter p, in a network with a given number of + operation nodes, called $\pi_{\star+}$ in the following, we find that the right p is

$$p = \frac{\pi_{\star+}}{N - 2} \qquad (2.6)$$

Figure 2.15: The Ω network associated with a Random numeral system of size 100. The elementary symbols are painted orange (parameters of the ensemble: ε = 0.1, p = 1/2).

2.4 Elementary observables of the Ω networks

Now that we have introduced some models of numeral systems, we develop the tools for their analysis. In this section we will introduce simple definitions and use standard tools from the theory of complex networks [RB02] [New03] [CRTVB07].

2.4.1 The number of elementary symbols

The first quantity that we will consider is the number of elementary symbols, which we will indicate with π.⁸ This is one of the most relevant quantities regarding numeral systems and their networks: the number of elementary symbols is related to the memory resources required to learn to use such systems. For any given Ω network ω we define

$$\pi(\omega) = \left| \{\, n \in \omega \mid n \text{ is elementary} \,\} \right| \qquad (2.7)$$

where the operation $|\cdot|$ extracts the cardinality of the set within the curly brackets. As we will see in Sec. 6, this definition admits a generalization that reduces to the present one when the networks involved are simple. Let us now consider some examples. In the Holistic system π = N, while in the Unary system, which can be considered to lie at the opposite side of the space Ω, we have π = 1.

⁸ This coincides with the notation introduced for the Prime system.

Figure 2.16: A larger part (size $N = 10^3$) of the network associated with a Random system. We note the absence of any structure.
In positional and canonical systems the growth of the function π(N) is logarithmic, after an initial linear behaviour. This is due to the fact that the first numbers are also the digits (elementary symbols), and a new elementary symbol is introduced whenever the size exceeds a power of the base. In the Prime system the situation is more intriguing: as we saw in Sec. 2.3.6, the number of primes lower than a given N is an irregular function that behaves asymptotically like $N / \log N$, by the Prime Number Theorem. Finally, for random systems the function π(N) is stochastic, but its mean value grows linearly: $\langle \pi(N) \rangle = 2 + \varepsilon (N - 2)$. We can see that, even considering just a very simple aspect of Ω networks and a limited set of models, the phenomenology of Ω networks is already very rich.

2.4.2 The description of a simple Ω network

In order to completely specify an Ω network we must give its elementary symbols and a description of its wiring diagram (i.e. a description of how the nodes are linked together). Since Ω networks are organized into Triples, a description of the wiring diagram is a list of these Triples. We stipulate that the descriptional length of an elementary symbol is equal to 1, and we adopt the same convention for the descriptional length of a Triple.⁹ Since the number of Triples is N − π, the descriptional length of a (simple) Ω network is L = π + (N − π) = N. We will see in the development of the Thesis how this concept of descriptional length will be useful. At this point the descriptional length is the same for all Ω networks, independently of their level of organization or randomness: it depends only on their size.

2.4.3 Degree sequence and its distribution

The degree is an essential property of a node. Let us consider the node representing a number x in an Ω network.
If x is an elementary symbol then $k_{in}(x) = 0$; otherwise, if it is a composed symbol, $k_{in}(x) = 1$. These are the only two possible values because we have excluded the possibility of "synonyms", i.e. multiple different ways to construct the same number. The possible values of $k_{out}$, instead, are all the non-negative integers; if x is a dominant element, its $k_{out}$ will be high with respect to the others. Such dominant elements are often called hubs in the language of complex networks, and they usually play key roles in networked systems. As we saw in Sec. 2.3.5, in canonical systems the base and its powers are hubs in the corresponding network. The possible values of the degrees $k_{in}$ and $k_{out}$ of a node are described in Fig. 2.17.

⁹ Other choices are possible, for example assigning length 3 to the description of a Triple, but such differences are immaterial.

Figure 2.17: The generic node x and its possible degree values $k_{in}$ and $k_{out}$.

In the following we will always refer to the $k_{out}$ of a node as its degree, because $k_{in}$ only signals whether a symbol is elementary or not. $k_{out}$ is one of the fundamental observables in the study of the structure of Ω networks. We will analyze $k_{out}$ as a function of x (the degree function) and its probability distribution $P(k_{out})$ (often written P(k)). For example, the degree function of the canonical decimal system is reported in Fig. 2.18.

[Plot: $k_{out}(x)$ versus x on log-log axes, x from 1 to $10^4$.]
Figure 2.18: The degree function for the canonical base-10 system. We can observe the hubs corresponding to the powers of the base and to their multiples. The recursive structure of its Ω network is reflected in the self-similar structure of the degree-function graph.

We analyzed P(k) for all the models of numeral systems introduced in the preceding section. We see an example in Fig.
2.19, where the P(k) of a base-4 canonical system is reproduced.

[Plot: P(k) versus k on log-log axes, k from 1 to 4096.]
Figure 2.19: The P(k) of a base-4 canonical system. The tail of the distribution is generated by the presence of the hubs, corresponding to the higher powers of the base.

2.4.4 The Tree

Let us consider a number x and its representation in its Ω network. If x is composed, it is the output of a binary operation involving its parent nodes. These, in their turn, can be either elementary or composed, and so on. If we pick the part of the Ω network consisting of all the Triples that are "upstream" of a certain number x, we obtain its Tree (see Fig. 2.20), and the function so defined will be called Tree(x).

Definition 1 (Tree) The Tree of a number x ∈ Ω is the set constituted by all the Triples that are upstream of x.

Tree(x) retains all the information about the decomposition of x in terms of its elementary constituents. At the level of representation, Ψ(x) is typically composed of a certain combination of the representations of the elementary symbols that are the leaves of Tree(x). Every elementary symbol leaves a trace in this representation; for example, in the Italian system the representation of a composed number like 234 is given by the numeral phrase

Ψ(234) = duecentotrentaquattro

in which the representations of the elementary symbols 2 (due), 100 (cento), 3 (tre), 10 (-enta) and 4 (quattro) are recognizable.

Figure 2.20: The Tree of a number (275) in a (randomly generated) Ω network.

The main characteristic of the Tree is its length, which we explore in the next Section. The length of the Tree will be interpreted as the analogue of the length of a computation. This computation starts from certain available numbers (which do not need to be computed), namely the elementary symbols, and is described by the sequence of operations in Tree(x).
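Both the Tree of the Italian example and the degree function of Section 2.4.3 can be reproduced on a simplified model. The sketch below assumes a hypothetical value-level rule for a canonical base-B system, which is our simplification and not the exact construction of Chapter 1: the digits 0, …, B − 1 and the powers of B are elementary, and any other x is split as $x = aB^j + \text{rest}$, with the leading term built as a × B^j (never 1 × B^j, since unity does not enter products). With B = 10, the leaves of Tree(234) come out as 2, 100, 3, 10, 4, matching due, cento, tre, -enta, quattro.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[int, str, int]


def canonical_triple(x: int, B: int = 10) -> Optional[Triple]:
    """Hypothetical value-level canonical rule; None if x is elementary."""
    if x < B:
        return None                 # digits 0..B-1
    q = 1
    while q * B <= x:
        q *= B                      # q = largest power of B not above x
    if x == q:
        return None                 # powers of the base
    a, rest = divmod(x, q)
    if rest == 0:
        return (a, "×", q)          # e.g. 200 = 2 × 100
    return (a * q, "+", rest)       # e.g. 234 = 200 + 34


def tree(x: int, B: int = 10) -> Set[Tuple[int, Triple]]:
    """Tree(x): the set of Triples upstream of x, as (output, triple) pairs."""
    t = canonical_triple(x, B)
    if t is None:
        return set()
    return {(x, t)} | tree(t[0], B) | tree(t[2], B)


def leaves(x: int, B: int = 10) -> list:
    """Elementary symbols at the leaves of Tree(x), with multiplicity."""
    t = canonical_triple(x, B)
    if t is None:
        return [x]
    return leaves(t[0], B) + leaves(t[2], B)


def out_degrees(N: int, B: int = 10) -> Counter:
    """k_out(x) for x in 0..N-1: links from x into operation nodes."""
    k: Counter = Counter()
    for x in range(N):
        t = canonical_triple(x, B)
        if t is not None:
            k[t[0]] += 1
            k[t[2]] += 1
    return k


def degree_distribution(N: int, B: int = 10) -> Dict[int, float]:
    """P(k): fraction of the N nodes having out-degree k."""
    k = out_degrees(N, B)
    counts = Counter(k[x] for x in range(N))
    return {deg: c / N for deg, c in sorted(counts.items())}


print(leaves(234))  # → [2, 100, 3, 10, 4]
```

For N = 100 the base 10 is the largest hub in this sketch, since it enters both the additions 11, …, 19 and the multiplications 20, …, 90. Raising N adds the higher powers of the base as further hubs, in line with the tail described for Fig. 2.19.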
This length will also be called logical depth, by analogy with a complexity measure introduced in the context of Algorithmic Information Theory (or Kolmogorov Complexity).

2.4.5 The logical depth

The logical depth is a complexity measure introduced by Charles Bennett [Ben88], rooted in the theory of Kolmogorov Complexity [LV97] [Par03]; roughly speaking, it measures the time required to compute a number from the shortest program that generates it. We use the term logical depth for our Ω networks by analogy with Bennett's logical depth, because the number of operations needed to "compute" a number x in a given numeral system is the length of its Tree. Consequently this length can be thought of as the computational time required to build an object from its minimal representation.¹⁰ We stress that this is only an analogy: Bennett's logical depth is a sophisticated concept rooted in Kolmogorov Complexity theory, and its definition requires mathematical rigour.

Our main focus will not be on the logical depth of a single number x, but on its average over the whole network, which we will call π_C (or sometimes ALD, depending on the context). The value of π_C is a measure of the mean cognitive effort required to build the representations of numbers in a given system, and it depends mainly on two factors:
• the topology of the wiring diagram;
• the number of elementary symbols π.
The tendency is that disordered Ω networks, corresponding to systems with a high number of intricate rules (random systems, for example), show a high value of π_C compared with ordered ones, corresponding to systems with a few, "intelligent" rules.
On the other hand, a large number of elementary symbols (high π) tends to lower π_C, although the effectiveness of raising the number of elementary symbols depends on the LD of the numbers that are promoted to the "elementary" condition (see Fig. 2.32). Large values of π require a proportional amount of memory resources, in order to remember the elementary symbols. In fact we find that the numeral systems developed in natural languages, which evolved under the pressure of lowering the cognitive effort required for their use, are highly ordered and at the same time use a low number of elementary symbols: both factors work in the direction of lowering the demands made on our cognitive system.

Definition 2 (Logical depth (narrow sense)) The logical depth of a number x is the number of arithmetical operations contained in its Tree:
$$LD(x) = \left| \{\text{operations in } \mathrm{Tree}(x)\} \right|.$$
A number x is an elementary symbol if and only if its logical depth is zero:
$$\{x \text{ is elementary}\} \Leftrightarrow \{LD(x) = 0\}.$$

In terms of LD we can express other quantities, for example the number of elementary symbols contained in Tree(x), counted with their multiplicity (as we can see from Fig. 2.20):
$$\left| \{\text{elementary symbols in } \mathrm{Tree}(x)\} \right| = LD(x) + 1.$$
This holds because each binary operation merges two inputs into one output, so a Tree with LD(x) operations has LD(x) + 1 leaves. The number of elementary symbols π can be easily rewritten in terms of the LD function:
$$\pi = \sum_{x=0}^{N-1} \delta\big(LD(x), 0\big).$$
In conclusion, the LD function, together with its probability distribution P(LD) (see Fig. 2.21), contains a lot of information about Ω networks.

¹⁰ This is true only in systems that make an "optimized" use of their resources, corresponding to an optimized Ω network.

[Plot: P(LD) versus LD, from 0 to 30; curves: Random, Canonical Base 10]

Figure 2.21: The probability distributions of LD in two completely different systems. In the random system (red) we observe a Gaussian behaviour of LD around a rather high value (around 14).
The size N of these networks is 10^4. The random system was generated with an ε value (the probability of creating an elementary symbol) ten times higher than the value suitable for a comparison (see Eq. 2.5). The canonical base 10 network shows a much smaller modal value (6) of the LD distribution: this contribution is given by numbers like 4598 (for which LD = 6).

[Plot: P(LD) versus LD, with Poissonian fit, ⟨LD⟩ = 2.37046]

Figure 2.22: The Prime system shows a Poissonian distribution of LD. The size of the network is N = 5 × 10^5; notice that the average value of LD is very small compared with other systems. This means that the representation of a number in terms of its prime factor decomposition is highly compressed, but this is paid for with a large number of elementary symbols. The number of primes lower than N grows asymptotically as N/log N: this is the Prime Number Theorem, proved independently by J. Hadamard and C.-J. de la Vallée Poussin (1896).

Given a number x, its logical depth refers to the way this element is constructed from the absolutely elementary symbols. Sometimes it can be useful to measure the depth relative to other numbers, which are not necessarily elementary:

Definition 3 (Relative logical depth) The logical depth of a number x ∈ ω relative to a set of numbers y_1, y_2, ..., y_r, written RD(x | y_1, y_2, ..., y_r), is the logical depth of x computed as if all the numbers y_1, y_2, ..., y_r were elementary.

Obviously this definition differs from that of logical depth only when at least one of the y_i is composed. Let us consider the case in which a single reference y is involved. If x is not in the set of numbers that are "downstream" of y we put LD(x | y) = ∞, meaning that it is impossible to construct the symbol x starting from the symbol y within the Ω network.
This is the case, for example, when y belongs to the set of elements that are upstream of x. Hence LD(x | y) is not a symmetric function of its arguments.

Definition 4 (Logical independence) Two symbols (x, y) will be called logically independent if and only if
$$LD(x \mid y) = LD(y \mid x) = \infty. \tag{2.8}$$
This is always the case when (x, y) are two elementary symbols, and it means that it is impossible to construct either of them starting from the other.

Metaphorically, the quantity LD(x | y) measures how far a certain result x, considered as the final point of a computation, is from a premise y, taken as the starting point of a manipulation. When this quantity is infinite it means that, at least in this system, x is not obtainable from y or, equivalently, that it is obtainable only in an infinite number of operations (and consequently in an infinite time).

We now discuss an important concept, the average logical depth π_C (sometimes also referred to as ALD), defined as the logical depth averaged over all the numbers of an Ω network.

Definition 5 (Average logical depth) The average logical depth of ω is the quantity
$$\pi_C(\omega) = \frac{1}{N} \sum_{x} LD(x), \tag{2.9}$$
where N is the size of ω.

The value of π_C is the average number of operations necessary to build the representations of the different numbers in a given numeral system. The ω for which π_C is absolutely minimal is clearly the one with all isolated nodes, corresponding to the Holistic system: for all its numbers LD = 0. The price paid to reach optimality from the side of π_C is a high number of elementary symbols, in other words a high value of π. These two quantities typically push in opposite directions. We studied π_C in all the models we introduced. In canonical systems, for example, it has a logarithmic behaviour, as we can see in Fig. 2.23.
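Definitions 2 to 5 translate almost directly into code. The sketch below assumes a canonical base-10 network in which the digits and the powers of 10 are elementary and every other x is written as a·10^k + r (an assumed construction, with illustrative function names). It computes LD(x) recursively as 1 + LD(left) + LD(right), π as the number of zeros of LD, π_C as the plain average of Eq. 2.9, and a single-reference LD(x | y) that returns infinity when y is not upstream of x. Passing a set `cut` of numbers to be treated as elementary is only an implementation choice for Definition 3.

```python
import math
from collections import Counter

def canonical_triples(N, B=10):
    """Assumed canonical base-B network on 0..N-1: {x: (left, op, right)}."""
    triples = {}
    for x in range(B, N):
        p = B
        while p * B <= x:          # largest power of B not exceeding x
            p *= B
        if x == p:
            continue               # powers of the base are elementary
        a, r = divmod(x, p)
        triples[x] = (a, "*", p) if r == 0 else (p if a == 1 else a * p, "+", r)
    return triples

def LD(x, triples, cut=frozenset()):
    """Operations in Tree(x), with multiplicity; numbers in `cut` are
    treated as elementary (relative logical depth, Definition 3)."""
    if x in cut or x not in triples:
        return 0
    left, _, right = triples[x]
    return 1 + LD(left, triples, cut) + LD(right, triples, cut)

def upstream(x, triples):
    """All numbers upstream of x, excluding x itself."""
    if x not in triples:
        return set()
    left, _, right = triples[x]
    return {left, right} | upstream(left, triples) | upstream(right, triples)

def LD_rel(x, y, triples):
    """LD(x | y) for a single reference y: infinite if y is not upstream of x."""
    if x == y:
        return 0
    if y not in upstream(x, triples):
        return math.inf
    return LD(x, triples, cut=frozenset({y}))

N = 10_000
T = canonical_triples(N)
depth = [LD(x, T) for x in range(N)]
pi = sum(1 for d in depth if d == 0)          # pi = sum_x delta(LD(x), 0)
pi_C = sum(depth) / N                          # Eq. 2.9
mode = Counter(depth).most_common(1)[0][0]     # modal value of P(LD)
print(pi, pi_C, mode)
print(LD(4598, T), LD_rel(234, 34, T), LD_rel(34, 234, T))   # 6 2 inf
```

Under these assumptions the sketch finds LD(4598) = 6 and a modal LD of 6 for N = 10^4, consistent with the values quoted for Fig. 2.21, and it shows the asymmetry LD(234 | 34) = 2 versus LD(34 | 234) = ∞.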
Figure 2.23: The average logical depth π_C (ALD) as a function of the size N for several models. "Base 10 S3" is the canonical decimal system with superbase 10^3. This is like the Italian system, in which we do not have an elementary symbol for every power of the base (10^4 is dieci-mila, for example) as in purely canonical systems; beyond the number 10^3 it behaves like a base 10^3 system. We notice that the random numeral system is highly inefficient, while the Prime system achieves the best performance from the point of view of π_C.

2.4.6 Relation between LD and k_out

Most natural complex networks show a high degree of heterogeneity in the degree of their nodes [RB02] [New03] [BLM+06]. Very often there are a few nodes with an extremely high connectivity (hubs), which play a fundamental role in the network's functioning, while the great majority of nodes is linked to only a few others. This heterogeneity establishes a natural hierarchy between the nodes, but there are situations in which other natural hierarchies are possible. This is true especially in systems of symbols, like numeral systems, where some symbols are elementary and others are composed, with a higher or lower degree of complexity corresponding to the number of operations required to construct them, as described by the LD function. In these cases it is interesting to check whether the highly connected nodes are also the ones with a high degree of elementariness (low LD). We find that the tendency in natural systems like the canonical ones (Fig. 2.25), but also in the Prime system, which can be considered natural for mathematical reasons (Fig. 2.26), can be expressed in a somewhat imprecise but evocative fashion as follows: "The more elementary a symbol is (low LD), the more connected it is (high k_out)".
Figure 2.24: LD(x) and k_out(x) in the canonical (base 5) system (k_in(x) is also drawn). We observe the tendency of numbers with a high k_out (the peaks in the black graph) to have a low value (0 or 1) of LD (the minima in the green graph).

A direct consequence is that the symbols that are more directly accessible from a cognitive point of view are also the ones used more frequently. This property seems a very natural trait of real symbolic systems, and it is verified experimentally (see for example the study of the frequency of numbers on the WWW described in [DMO05]).

2.5 Other functionals defined on Ω networks

In the preceding sections we introduced two functionals on Ω networks: π and π_C. The first is the number of elementary symbols, intended as a measure of the cognitive resources required to memorize symbols; the second represents the average computational length, and is a measure of the cognitive cost of constructing numbers from their elementary components. These are only two aspects of the complexity of Ω networks; other aspects can be captured by introducing other functionals, for example the entropy of the degree distribution P(k) [CRTVB07]. The entropy of P(k) is a measure of the heterogeneity of the degree function, and it is expected to be high in systems with a disordered network topology, reflecting the presence of a large number of rules.

Figure 2.25: In the Ω network of a canonical base 4 system, we counted the number of nodes with fixed values of LD and k_out. The graph shows the tendency of numbers with a low LD to be highly connected, and vice versa.

Figure 2.26: In the Prime system there is a strong correlation between LD and k_out. Notice that the relevant values of LD are particularly low.

Figure 2.27: The residual correlation between LD and k_out in a random system is due to finite size effects.
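The tendency "the more elementary, the more connected" and the entropy of P(k) can both be checked on a small example. This sketch uses an assumed canonical base-10 construction (digits and powers of 10 elementary, x = a·10^k + r otherwise) and hypothetical helper names. It computes LD bottom-up, which works because both inputs of a Triple are smaller than its output, accumulates k_out, compares the mean degree of elementary and composed nodes, and evaluates the Shannon entropy of the empirical P(k) with the natural logarithm; no rescaling is applied.

```python
import math
from collections import Counter

def canonical_triples(N, B=10):
    """Assumed canonical base-B network on 0..N-1: {x: (left, op, right)}."""
    triples = {}
    for x in range(B, N):
        p = B
        while p * B <= x:          # largest power of B not exceeding x
            p *= B
        if x == p:
            continue               # powers of the base are elementary
        a, r = divmod(x, p)
        triples[x] = (a, "*", p) if r == 0 else (p if a == 1 else a * p, "+", r)
    return triples

def depth_and_degree(N, triples):
    """LD(x) bottom-up (inputs are always smaller than x) and k_out(x)."""
    ld, k_out = [0] * N, [0] * N
    for x in range(N):
        if x in triples:
            left, _, right = triples[x]
            ld[x] = 1 + ld[left] + ld[right]
            k_out[left] += 1
            k_out[right] += 1
    return ld, k_out

def degree_entropy(k_out):
    """Shannon entropy (natural log) of the empirical degree distribution P(k)."""
    n = len(k_out)
    return -sum(c / n * math.log(c / n) for c in Counter(k_out).values())

N = 10_000
T = canonical_triples(N)
ld, k_out = depth_and_degree(N, T)
elem = [k for d, k in zip(ld, k_out) if d == 0]   # elementary symbols
comp = [k for d, k in zip(ld, k_out) if d > 0]    # composed numbers
mean_elem, mean_comp = sum(elem) / len(elem), sum(comp) / len(comp)
print(mean_elem, mean_comp, degree_entropy(k_out))
```

Comparing only the two averages is a deliberately coarse summary; a histogram over (LD, k_out) pairs, as in Fig. 2.25, carries more detail.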
2.5.1 Entropy of the degree distribution

In directed networks the total degree is the sum of the in-degree and the out-degree, k = k_in + k_out. As we already know, in Ω networks the in-degree can only be 0 or 1, so we concentrate on the out-degree only and on its probability distribution P(k): the probability that the node of a number, picked uniformly at random, has exactly k outgoing links. The entropy of the degree distribution P(k) is a measure of the heterogeneity of the degree function. The entropy of P(k) as a function of the size was studied extensively in canonical systems with bases ranging from 2 to 100. We report in Fig. 2.28 the results for the first small bases.

2.6 Comparison of different models

We now briefly compare some of the models introduced in this Chapter. We already discussed the behaviour of the functional π in the different models in Sec. 2.4.1.

[Plot: "Entropy of P(h,k) (rescaled)"; y-axis: entropy/(number of elementary symbols); x-axis: log(N)/log(Base); curves: Base 2 to Base 6]

Figure 2.28: The entropy of P(k) of canonical systems with bases from 2 to 6. A suitable rescaling of the dependent variable is introduced in order to compare the different datasets. We do not get a clear indication of the complexity of a numeral system from these kinds of statistical measures on the network topology.

2.6.1 Holistic and Unary

Let us call ω_H and ω_1 the networks associated respectively with the Holistic and the Unary system. We already know that
$$\pi(\omega_H) = N, \qquad \pi(\omega_1) = 1,$$
and from Definition 5 it is easy to see that
$$\pi_C(\omega_H) = 0, \qquad \pi_C(\omega_1) = \frac{N-1}{2},$$
since in the Unary system LD(x) = x − 1, so that the total depth is $\sum_x LD(x) = N(N-1)/2$. Although in a trivial case, these formulas show a sort of compensation that is typical of all symbolic systems: if we have many elementary symbols at our disposal the expressions are short, and vice versa.
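The compensation between π and π_C can be checked numerically on the two extreme systems. A minimal sketch, assuming the numbers 1, ..., N, with 1 as the only elementary symbol of the Unary system (x = (x − 1) + 1, so LD(x) = x − 1) and every number elementary in the Holistic system. The helper names are illustrative; the function reports π, the average of Definition 5 and the total depth Σ LD(x).

```python
def unary_depths(N):
    """Unary system on 1..N: LD(1) = 0 and LD(x) = LD(x - 1) + 1."""
    ld = {1: 0}
    for x in range(2, N + 1):
        ld[x] = ld[x - 1] + 1
    return ld

def holistic_depths(N):
    """Holistic system: every number is an elementary symbol."""
    return {x: 0 for x in range(1, N + 1)}

def summary(ld):
    """Return (pi, pi_C, total depth) for a table {x: LD(x)}."""
    N = len(ld)
    total = sum(ld.values())
    pi = sum(1 for d in ld.values() if d == 0)
    return pi, total / N, total

N = 10
print(summary(holistic_depths(N)))   # (10, 0.0, 0)
print(summary(unary_depths(N)))      # (1, 4.5, 45)
```

For N = 10 the Unary chain gives a total depth of 45 = N(N − 1)/2 and an average of 4.5 = (N − 1)/2, while the Holistic system has π = N and zero depth.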
2.6.2 Canonical and Positional systems

Let us call ω_c and ω_p the networks of, respectively, a canonical system and a positional system with the same base B and the same size N. As we already observed in Sec. 2.4.1,
$$\pi(\omega_c) = \pi(\omega_p) = B + \lfloor \log_B N \rfloor,$$
where ⌊·⌋ denotes the integer part. In both these systems the behaviour of π_C is logarithmic (only approximately for the canonical one; see Fig. 2.29). In particular, for positional systems we find an exact expression:
$$\pi_C(\omega_p) = 2 \lfloor \log_B N \rfloor - 1.$$
The canonical system exhibits a more complex behaviour because 0 and 1 are treated differently from all the other digits of the base, while in positional systems all these numbers are treated on the same footing. This introduces some redundancies (as in the case of 10 × 1 = 10) that are absent in canonical systems and account for the inequality π_C(ω_c) < π_C(ω_p).

[Plot: ALD versus N (10 to 10^4, logarithmic axis); curves: Canonical base 10, Positional base 10 (m = 2), logarithmic fit (m = 1.7)]

Figure 2.29: The canonical system has a lower π_C than the positional system of the same base.

2.6.3 Primes

Let us call ω_P the network associated with the Prime system. By the prime number theorem we already know that
$$\pi(\omega_P) \sim \frac{N}{\log N},$$
where log is the natural logarithm and N is the size of the network. From Eq. 2.3, and remembering that the relation between the function Ω(n) and the logical depth is
$$\Omega(n) = LD(n) + 1,$$
we find that¹¹
$$\pi_C \sim \log \log N.$$
We now want to recover this result with a heuristic argument, based on the experimental observation of P(LD). The experimental distribution of LD is well fitted by a Poissonian (see Fig. 2.22).
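The quantities entering the heuristic argument introduced above can also be computed directly. A minimal sketch, assuming the Prime system on 2 ≤ n < N (0 and 1 are left out for simplicity): a smallest-prime-factor sieve gives Ω(n), LD(n) = Ω(n) − 1, π counts the primes and π_C is the average LD. The last line prints e^{−π_C} next to the fraction of primes and 1/log N, which is the comparison made in the Poissonian argument; no particular numerical agreement is claimed here.

```python
import math

def big_omega(N):
    """Omega(n), the number of prime factors of n counted with
    multiplicity, for 0 <= n < N (entries 0 and 1 are left at 0)."""
    spf = list(range(N))                      # smallest prime factor of n
    for p in range(2, math.isqrt(N - 1) + 1):
        if spf[p] == p:                       # p is prime
            for m in range(p * p, N, p):
                if spf[m] == m:
                    spf[m] = p
    om = [0] * N
    for n in range(2, N):
        om[n] = om[n // spf[n]] + 1           # strip one prime factor
    return om

N = 100_000
om = big_omega(N)
ld = [om[n] - 1 for n in range(2, N)]         # LD(n) = Omega(n) - 1
pi = sum(1 for d in ld if d == 0)              # number of primes below N
pi_C = sum(ld) / len(ld)
print(pi, pi_C)
print(math.exp(-pi_C), pi / len(ld), 1 / math.log(N))
```

The sieve runs in roughly N log log N steps, so sizes comparable to the N = 5 × 10^5 of Fig. 2.22 remain practical in pure Python.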
The mean (and variance) of LD, π_C = ⟨LD⟩, depends on the size N, and as long as the Poissonian approximation is good we can write
$$P(LD = 0) = e^{-\pi_C}.$$
On the other hand, P(LD = 0) is the probability that a number x < N picked at random is prime, so that asymptotically, by the prime number theorem,
$$P(LD = 0) \sim \frac{1}{\log N};$$
comparing the two expressions we find
$$\pi_C \sim \log \log N.$$
The average length of the representations in the Prime system is significantly lower than what we found in more natural systems, like the canonical or positional ones, but this has a price in terms of a higher and diverging π (infeasible according to Turing's argument on the finiteness of the number of elementary symbols of any realistic symbolic system).

2.6.4 Random

In random Ω networks extracted from some ensemble E(ε, p) (see Sec. 2.3.7) the expected value of π is (see Eq. 2.4)
$$\langle \pi(N) \rangle = 2 + \varepsilon (N - 2).$$
The behaviour of π_C as a function of N is not so easy to understand: it depends on the values of (ε, p). A typical behaviour is shown in Fig. 2.23, but a study of this dependency requires further investigation. P(LD) in a typical situation is reported in Fig. 2.21 and shows a Gaussian behaviour.

2.7 The space Ω

The space of (simple) Ω networks of a given size N is very large, and grows with N at a very fast rate.

¹¹ We do not take into account the constant B_2 because we are only interested in the dependence on N.

Figure 2.30: An artistic view of the space Ω. Notice that we draw the representative points of the Holistic, Unary and Prime systems on the boundary.

2.7.1 The size of the space of simple networks

Counting the number of microstates of a macroscopic physical system has been one of the fundamental concerns of Statistical Physics since its foundation by L. Boltzmann, and it has recently been recognized that a similar approach is relevant in the context of network ensembles [Bia08] [AB09] [GBPV08].
We want to estimate here the number of simple Ω networks of a given size, that is, the cardinality of the space Ω. This is a "zero-degree" ensemble, because no other quantity with a "macroscopic" meaning, like π or π_C, is restricted to a fixed value. Let us consider the space of all simple Ω networks of size N. This space, which we will denote by Ω_N in the following, contains a huge number of points (ω) as N grows. Let us define the volume of Ω_N as its cardinality:
$$V(N) = |\Omega_N|.$$
We know that Ω networks are organized into Triples T and, since every Triple is in principle independent of the others, V(N) is given by
$$V(N) = \prod_{x=2}^{N} \{1 + \zeta(x)\},$$
where the product starts from 2 because 1 is always an elementary symbol, and the factor {1 + ζ(x)} needs an explanation. The contribution 1 accounts for the possibility that x is an elementary symbol; the function ζ(x) is the number of all possible ways in which x can be represented as an addition or a multiplication of two smaller numbers.¹² Separating the two contributions,
$$\zeta(x) = \left\lfloor \frac{x}{2} \right\rfloor + \nu(x),$$
where ⌊x/2⌋ counts the unordered sums x = a + b with 1 ≤ a ≤ b, and ν(x) is the number of possible ways of splitting x into two (not necessarily prime) factors.¹³ So we have
$$V(N) = \prod_{x=2}^{N} \left(1 + \left\lfloor \frac{x}{2} \right\rfloor + \nu(x)\right),$$
and taking the logarithm we obtain the "entropy" of the network ensemble Ω_N:
$$S(N) = \log V(N) = \sum_{x=2}^{N} \log\left(1 + \left\lfloor \frac{x}{2} \right\rfloor + \nu(x)\right).$$
We show in Fig. 2.31 how large V(N) becomes, already for small sizes like N ≤ 20.

¹² Addition by zero and multiplication by one are excluded.

Figure 2.31: The volume V(N) as a function of the size N, compared with the same function (the number of microstates) for a chain of N spins.

¹³ This function is really hard to compute, and we calculated it explicitly for the first few numbers.

Due to the astronomical growth of V(N), it is particularly difficult to control the dynamics in the space of Ω networks.
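A direct transcription of V(N) and S(N) helps to see how fast the space grows. This sketch assumes that ζ(x) counts unordered sums a + b with 1 ≤ a ≤ b plus unordered factor pairs a·b with 2 ≤ a ≤ b (addition by zero and multiplication by one excluded), so that the sum part equals ⌊x/2⌋; ν(x) is obtained here by trial division up to √x. The function names are illustrative. For comparison the loop also prints 2^N, the number of microstates of a chain of N Ising spins.

```python
import math

def nu(x):
    """Unordered factorizations x = a * b with 2 <= a <= b."""
    return sum(1 for a in range(2, math.isqrt(x) + 1) if x % a == 0)

def zeta(x):
    """Ways to write x as a + b (1 <= a <= b) or a * b (2 <= a <= b)."""
    return x // 2 + nu(x)

def volume(N):
    """V(N) = prod_{x=2}^{N} (1 + zeta(x)), as an exact integer."""
    V = 1
    for x in range(2, N + 1):
        V *= 1 + zeta(x)
    return V

def entropy(N):
    """S(N) = log V(N), summed term by term to avoid huge products."""
    return sum(math.log(1 + zeta(x)) for x in range(2, N + 1))

for N in (5, 10, 20):
    print(N, volume(N), 2 ** N, round(entropy(N), 3))
```

Python integers are unbounded, so volume(N) stays exact; entropy(N) is the numerically convenient form for larger N.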
We could ask, for example: starting from a disordered network and rewiring it at random, accepting a move if it lowers a certain functional, can we reach ordered networks? We define the rewiring dynamics in the next section.

2.7.2 Dynamics in the Ω space

There are three possible ways to rewire a single Triple T(x) in an Ω network:
• raise x to the status of elementary symbol;
• lower the elementary symbol x to a non-elementary status, building a new Triple T′(x);
• rewire the input links and/or the operation of T(x), obtaining a new Triple T′(x).
All these operations are represented in Fig. 2.32.

2.8 Conclusions

In this Chapter we began studying numeral systems from the point of view of complex systems. We first introduced a suitable network representation, which takes into account how numerical concepts are organized into a system, making explicit how composed numbers are built from elementary ones and by which arithmetical operations. Then, after illustrating how to construct such networks from linguistic data or from abstract auxiliary models, we described the tools to analyze these networks quantitatively. Some of these tools come from the theory of graphs and complex networks, while others, specific to Ω networks, have been introduced by us. We studied the statistical and topological properties relevant in this context, and the observed quantities gave valuable insight into the organization of these systems. We realized that Ω networks exhibit a very rich structure and are a solid framework for the quantitative study of simple symbolic systems. All the observables we considered in this Chapter, especially π and π_C, describe isolated aspects of the complexity of these systems, but none of them captures all these aspects at once.
The global statistical and topological properties usually studied in the context of complex networks, like the degree distribution P(k), give important information on the structure of these systems, but it is hard to derive from them precise insights on the rules behind their construction.

Figure 2.32: There are three possible ways to rewire a Triple in an Ω network. The first (a) is to promote a composed symbol (green) to elementarity, eliminating the incoming link, the operational node and the other links of its original Triple. The second (b) is to downgrade an elementary symbol to a composed one, building a new Triple (we show only the new incoming link). The third (c) is to substitute the original Triple with another one. These operations can be completely random, or can be driven by the minimization of some suitable functional φ defined on Ω.

This observation was the motivation for a new approach to the complexity of numeral systems, which we describe in the remainder of this Thesis. In Chapter 4 we will propose to define the complexity of a numeral system through a procedure, which we call reduction, that systematically exploits the "redundancies" present in simple Ω networks in order to reach an equivalent but shorter network representation. In order to define this "reduced" network we have to enlarge the space of simple Ω networks: this will be done in the next Chapter.

[Plot: number of elementary symbols and ALD versus M]

Figure 2.33: During a rewiring simulation (M is the number of iterations) following the π gradient (blue line), an elementary symbol (LD = 0) which is a hub of the network is rewired into a composed one. Its LD rises from zero, causing a big leap in the π_C value (green line).
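The rewiring moves of Fig. 2.32 can be combined into a simple zero-temperature dynamics. The sketch below is only an illustration under hypothetical choices: networks live on the numbers 1, ..., N with 1 always elementary, the functional is φ = π + w·π_C (w is an arbitrary weight, not a value taken from the thesis), a random x ≥ 2 is picked at each step, and a move is accepted only if φ does not increase. New Triples are drawn uniformly among the unordered sums and products that give x, excluding addition by zero and multiplication by one.

```python
import math
import random

def random_triple(x, rng):
    """Random Triple for x >= 2: a + b with 1 <= a <= b, or a * b with 2 <= a <= b."""
    options = [(a, "+", x - a) for a in range(1, x // 2 + 1)]
    options += [(a, "*", x // a) for a in range(2, math.isqrt(x) + 1) if x % a == 0]
    return rng.choice(options)

def phi(net, N, weight=1.0):
    """Hypothetical functional phi = pi + weight * pi_C on the numbers 1..N."""
    ld = [0] * (N + 1)
    for x in range(2, N + 1):          # inputs are smaller than x
        t = net[x]
        if t is not None:
            a, _, b = t
            ld[x] = 1 + ld[a] + ld[b]
    pi = sum(1 for x in range(1, N + 1) if net[x] is None)
    return pi + weight * sum(ld[1:]) / N

def rewire(N=100, steps=2000, seed=0, weight=1.0):
    rng = random.Random(seed)
    net = {1: None}                    # None marks an elementary symbol
    net.update({x: random_triple(x, rng) for x in range(2, N + 1)})
    current = phi(net, N, weight)
    history = [current]
    for _ in range(steps):
        x = rng.randint(2, N)
        old = net[x]
        if old is None:
            net[x] = random_triple(x, rng)     # (b) downgrade to composed
        elif rng.random() < 0.5:
            net[x] = None                      # (a) promote to elementary
        else:
            net[x] = random_triple(x, rng)     # (c) substitute the Triple
        new = phi(net, N, weight)
        if new <= current:
            current = new                      # accept: phi does not increase
        else:
            net[x] = old                       # reject: restore the old Triple
        history.append(current)
    return net, history

net, history = rewire()
print(history[0], history[-1])
```

A small w makes the dynamics mostly reduce π (fewer elementary symbols), while a large w rewards short Trees, i.e. a low π_C; the history list can be plotted against the iteration count M in the spirit of Fig. 2.33.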
Chapter 3 Development of the formalism

Until now we have studied simple Ω networks, organized into Triples with the characteristic structure drawn in Fig. 2.1. In these networks a circular node represents an individual symbol for a certain natural number. Now we want to generalize this network structure, allowing circular nodes to contain sets of numbers, organized into a hierarchical structure of Categories. The space obtained in this way will be called Ω (without the qualifier "simple", it is the full Ω space) and contains all simple networks, and much more. The elements of this new space are still organized into (generalized) Triples, and the inventory of operations is the same as for simple networks, but their meaning is different: as we will see, an operation now involves all the elements contained in the first Category with all the elements contained in the second. The generalized Ω networks are always obtained from simple ones (representing concrete numeral systems) through a recursive transformation called reduction, which will be the object of the next Chapter. We develop here the formalism needed to describe these generalized networks.

3.1 Generalization of Ω networks

3.1.1 Categories

Until now a circular node contained a single number. In view of a future development of the theory (see Chapter 4), we allow a circular node to contain sets of numbers, organized into a hierarchical structure. We will call a node of this kind a Category, for reasons that will become clear later; the idea is that a Category contains numbers (and other smaller Categories) that play the same role in the system, or are constructed following similar computational paths. The most general Category (see Fig. 3.1) contains sets of numbers and of other smaller Categories.

Definition 6 Suppose that a certain Category C contains the numbers x_1, x_2, ..., x_r and other Categories C_1, C_2, ..., C_s. The description of C is the list of its inner elements at the first nested level,
$$C = \{x_1, x_2, \ldots, x_r, C_1, C_2, \ldots, C_s\},$$
and its length is
$$l(C) = |C|.$$
The inner Categories C_1, C_2, ..., C_s can have a rich inner structure, but each contributes to the descriptional length l(C) as much as a single number.

A circular node containing only one number x will sometimes be called a self-Category and labelled C_x.¹ When there is no need to specify whether we are referring to a number or a Category, we refer generically to an element e of the network.

Figure 3.1: A Category C_0 and its inner structure. In the description of C_0 the inner structure of C_1 is not considered: it stops at the first nested level.

It is useful to consider an operation on a Category that destroys its inner hierarchical structure, bringing all numbers contained in every inner Category, at each level, to the "surface". This operation is called lysis.²

¹ It is somewhat pedantic, but we will find it useful in the future.
² In analogy with bacterial lysis in biology.

Definition 7 (Lysis of a Category) The lysis of a Category C is obtained by popping its inner Categories at every nested level. The result will be called C^lysed. For example, if C = {x_1, C_1}, C_1 = {x_2, x_3, C_2} and C_2 = {x_4, x_5}, we have
$$C_2^{lysed} = C_2, \qquad C_1^{lysed} = \{x_2, x_3, x_4, x_5\}, \qquad C^{lysed} = \{x_1, x_2, x_3, x_4, x_5\}.$$

Definition 8 (Equal and Identical Categories) Two Categories (C_1, C_2) are equal if their lysed counterparts contain the same symbols. They are identical if their Category structure is the same at all nested levels.

Obviously two identical Categories are also equal, but in general the converse is false. For example, C and C^lysed in the preceding example are equal but not identical. We will use this distinction in the description of the reduction transformation R in Chapter 4.
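Categories map naturally onto nested immutable sets. In this sketch (an illustration only, with hypothetical function names) a Category is a Python frozenset whose members are numbers or other frozensets; l(C) is the size of the top level, lysis flattens every nested level, "equal" compares the lysed sets and "identical" compares the full nested structure. Representing Categories as sets, rather than ordered lists, is an assumption: it ignores ordering and merges repeated members.

```python
def category(*items):
    """A Category: a frozenset of numbers and inner Categories."""
    return frozenset(items)

def lysis(C):
    """Pop inner Categories at every nested level (Definition 7)."""
    out = set()
    for e in C:
        if isinstance(e, frozenset):
            out |= lysis(e)            # recurse into an inner Category
        else:
            out.add(e)
    return frozenset(out)

def length(C):
    """Descriptional length l(C) = |C|: an inner Category counts as one."""
    return len(C)

def equal(C1, C2):
    """Definition 8: the lysed Categories contain the same symbols."""
    return lysis(C1) == lysis(C2)

def identical(C1, C2):
    """Definition 8: same structure at every level (frozensets compare recursively)."""
    return C1 == C2

# The example of Definition 7, with placeholder numbers x1..x5 = 1..5.
C2 = category(4, 5)
C1 = category(2, 3, C2)
C = category(1, C1)
print(sorted(lysis(C)), length(C), length(C1))   # [1, 2, 3, 4, 5] 2 3
print(equal(C, lysis(C)), identical(C, lysis(C)))  # True False
```

The last line reproduces the remark after Definition 8: C and C^lysed are equal but not identical.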
A Category C that, in an Ω network, is not contained in any other Category is called external. When a Category C appears only as an input in a certain number of Triples (or is isolated) we call it an input Category. Input Categories are the generalization to Categories of the elementary symbols (Sec. 2.2.1), and they are the only ones really relevant in the description of an Ω network.³ All the other Categories are called output Categories.

3.1.2 Generalized Triples

Triples in which the circular nodes contain only one number will be called simple; Ω networks composed of simple Triples are called simple too. Simple networks are very important: when we build an Ω network from a numeral system (see Sec. 2.3) we build a simple network, and these networks are the starting point of all our analysis. In general, Ω networks are composed of Triples that are not simple, in which the circular input and output nodes can be Categories. For a simple Triple the output of an operation, given its inputs, is clear: it is an arithmetical fact. But for general Triples (see Fig. 3.2) the situation is ambiguous: "What is the result of an addition or a multiplication between two Categories?"

³ This is evident for simple Triples like the one in Fig. 2.1, but the general case requires clarification. This is an important point and it will be stated precisely in Sec. 4.2.4.

Figure 3.2: A generalized Triple has the same form as a simple Triple. The difference is that the arithmetical operation involves Categories, which can have an arbitrarily complex inner structure (the meaning of the operation in this case will be explained in the next Chapter; see the text).

If the input Categories C and D contain only sets of numbers,
$$C = \{x_1, x_2, \ldots, x_r\}, \qquad D = \{y_1, y_2, \ldots, y_s\},$$
then it is still possible to give a meaning to the operation C ⋆ D = E, by setting
$$E = \{x_1 \star y_1, x_2 \star y_1, \ldots, x_r \star y_s\} = \{x_i \star y_j : 1 \le i \le r,\ 1 \le j \le s\}.$$
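For Categories that contain only numbers, the rule E = C ⋆ D can be sketched as the set of all pairwise results x_i ⋆ y_j. The operator table OPS and the name category_op are hypothetical helpers; because the result is a set, pairs that give the same number are merged into one element of E.

```python
import operator
from itertools import product

# Hypothetical operator table: the two operations used in Omega networks.
OPS = {"+": operator.add, "*": operator.mul}

def category_op(C, D, op):
    """E = C op D for Categories of plain numbers: every x_i op y_j."""
    f = OPS[op]
    return frozenset(f(x, y) for x, y in product(C, D))

print(sorted(category_op({2, 3}, {10, 100}, "*")))   # [20, 30, 200, 300]
print(sorted(category_op({20, 30}, {1, 2}, "+")))    # [21, 22, 31, 32]
```

The first call combines a Category of digits with a Category of powers of ten, the kind of grouping that makes multiplications by the base appear as a single generalized Triple.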
But when Categories with a more complex inner structure are involved, it is not clear how to define the ⋆ operation. As we will see in Chapter 4, this is only an apparent problem, because all the non-simple Ω networks we will consider result from certain transformations R applied to an initial simple ω, and during this transformation process the meaning of the operations on input Categories, and the resulting output Category, are always well defined.

3.1.3 Description of a generic Ω network

A general Ω network can consist of both simple and non-simple Triples; an example is shown in Fig. 3.3. In this Section we define its description, establishing some terminology useful for future developments. Let us call the size N of a network ω ∈ Ω the number of its elements (single numbers and Categories): |ω| = N. The description of ω has two contributions: one from the wiring diagram, describing the inputs, the output and the operation of each Triple, and one from the description of the Categories (as defined in Definition 6).

Figure 3.3: A generic Ω network. There are four Triples and seven input Categories: four of them (C_a, C_b, C_c, C_f) are self-Categories, while the others (A, H, I) have an inner structure: A = {g, f}, H = {n, o, C}, I = {D, E}. The description of this network is given by its wiring diagram, i.e. the list of its Triples, and the description of its input elements.

The description of the wiring diagram is the list of the Triples of each element e of the Ω network. The Triple of a generic element e is
$$T(e) : \{e = f \star_e g\},$$
where f, g are the inputs and ⋆_e is the operation. We take as the descriptional length of the wiring diagram the length of the list of Triples, which coincides with the number of operational nodes. This number is called π_⋆, and it is easy to see that, if we denote by π_I the number of input elements,
$$\pi_\star = N - \pi_I.$$

Suppose that in ω there are the input elements {e_1, e_2, ..., e_r}.

Definition 9 The length of the input elements of a network ω is
$$\pi = \sum_{e_i} l(e_i). \tag{3.1}$$
In simple networks the length of the input elements coincides with the number of elementary symbols.

Definition 10 (Description of ω and its length) The description of ω ∈ Ω is given by the list of its Triples and the description of its input elements. The length of this description is
$$L = \pi + \pi_\star. \tag{3.2}$$

3.1.4 Other concepts related to Ω networks

Definition 11 (Similar Triples) Two Triples T_1, T_2 will be called similar iff they share at least one input element e and their operational nodes contain the same operation. Once we fix an element e and an operation ⋆, we identify a set of similar Triples.

Definition 12 (Offspring) The Offspring of an element e, O(e), is the set of all elements, not including e, that are reachable from e by following links through a finite number of operations.

Definition 13 (Ancestors) The Ancestors of an element e, A(e), is the set of all elements, not including e, from which e can be reached by following links, through a finite or possibly infinite number of operations.

We also introduce a generalized version of both the offspring and the ancestors of a given element e (O_g, A_g), which include the element e itself. The following definition regards only simple networks. It is a fundamental ingredient of the notion of "logical (or symbolic) distance" that will be introduced in Sec. 3.2.

Definition 14 (Common ancestors) The common ancestor z of two numbers x, y is the largest number in the intersection of the generalized ancestors of the two elements:
$$z = \max\{A_g(x) \cap A_g(y)\}.$$

Figure 3.4: The common ancestor of the elements x and y. It is an essential ingredient for the definition of a distance between symbols.
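As a worked check of Definitions 9 and 10, the sketch below computes π, π_⋆ and L from the lengths of the input elements and the number of Triples (the function name is illustrative). The first call uses the counts stated in the caption of Fig. 3.3: four self-Categories of length 1, A, H and I of lengths 2, 3 and 2, and four Triples. The second illustrates a direct consequence of the definitions: in a simple network every input element has length 1, so π = π_I and L = π_I + (N − π_I) = N, whatever the wiring; the value π_I = 13 for N = 10^4 is only an illustrative choice.

```python
def description_length(input_lengths, n_triples):
    """Return (pi, pi_star, L) with L = pi + pi_star (Definition 10).

    pi sums l(e_i) over the input elements (Definition 9); pi_star is the
    number of Triples, i.e. of operational nodes.
    """
    pi = sum(input_lengths)
    return pi, n_triples, pi + n_triples

# Fig. 3.3: C_a, C_b, C_c, C_f (length 1), A = {g, f}, H = {n, o, C}, I = {D, E}.
print(description_length([1, 1, 1, 1, 2, 3, 2], 4))   # (11, 4, 15)

# A simple network of size N with pi_I input elements, all of length 1,
# and pi_star = N - pi_I Triples: the description length is always N.
N, pi_I = 10_000, 13
print(description_length([1] * pi_I, N - pi_I))       # (13, 9987, 10000)
```

Since every simple network of size N has L = N, the description length can only discriminate between networks once Categories compress the wiring diagram.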
Definition 15 (Relative distance) The relative distance between two elements (e, f), denoted RD(e | f), is the minimum number of operations needed to reach e starting from f. This is an oriented, asymmetric distance. It will be used only in a symmetrized form, when we define the "logical distance" in Sec. 3.2.

3.2 Distance between symbols

We now define a distance between Categories, and in particular between numbers. We consider only simple networks in the examples.

Definition 16 (Logical distance) Let e and f be two elements with common ancestor g. We define the logical (or symbolic) distance between them as
$$d(e, f) = \frac{RD(e \mid g) + RD(f \mid g)}{LD(e) + LD(f)}$$
When a common ancestor exists, d satisfies
$$0 \le d(e, f) \le 1 \qquad (3.3)$$
When e and f have no common ancestor, their distance is undefined, and we set d(e, f) = 2. This happens in particular when e and f are distinct elementary symbols or, more generally, distinct input-only elements.

Figure 3.5: Histograms of P(d), the distribution of the distances, in the canonical base 3 system (blue) and, for comparison, in a random system (red). The distances in a random system are systematically lower than in the base 3 system. We interpret this as a worse discrimination power of the random system. ⟨d_C⟩ = 0.507, ⟨d_R⟩ = 0.247. Notice that for the Holistic system, which has the highest discrimination power as a symbolic system, this probability is P(d) = δ(d − 2).

[Plot: cumulative P(distance) versus distance (from 0 to 2), for Positional (Base 3) and Random.]

Figure 3.6: The cumulative distribution for the data of the preceding Figure. We note that P(d < 0.2) > 0.8 for the random system, while in base 3 we have P(d > 0.2) > 0.9.
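Definitions 14 to 16 can be turned into a short computation on a simple network. The Python sketch below is a hypothetical illustration, not the thesis's own code. Triples are assumed to be tuples (output, left input, operation, right input) of integers. RD(e | f) is computed as a shortest path along the links by breadth-first search. The logical depth LD, defined earlier in the thesis, is assumed here to be the length of the longest construction chain from the input elements, with inputs at depth 0. The function name `logical_distance` is ours.

```python
from collections import defaultdict, deque


def logical_distance(e, f, triples):
    """d(e, f) of Definition 16 on a simple network; 2.0 if no common ancestor."""
    parents, children = defaultdict(set), defaultdict(set)
    for out, left, _op, right in triples:
        for x in (left, right):
            parents[out].add(x)
            children[x].add(out)

    def ancestors_g(x):
        # Generalized ancestors A_g(x): x itself plus everything it is built from.
        seen, stack = {x}, [x]
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def rd(target, source):
        # RD(target | source): minimum number of operations from source to target (BFS).
        dist, queue = {source: 0}, deque([source])
        while queue:
            x = queue.popleft()
            if x == target:
                return dist[x]
            for c in children[x]:
                if c not in dist:
                    dist[c] = dist[x] + 1
                    queue.append(c)
        raise ValueError("target not reachable from source")

    depth = {}

    def ld(x):
        # Assumed logical depth: longest chain of operations from the inputs (inputs: 0).
        if x not in depth:
            ps = parents[x]
            depth[x] = 1 + max(ld(p) for p in ps) if ps else 0
        return depth[x]

    if e == f:
        return 0.0
    common = ancestors_g(e) & ancestors_g(f)
    if not common:
        return 2.0  # convention for elements without a common ancestor
    g = max(common)  # Definition 14: the biggest common ancestor
    return (rd(e, g) + rd(f, g)) / (ld(e) + ld(f))
```

On a Unary network `[(k + 1, 1, "+", k) for k in range(1, 6)]`, the call `logical_distance(2, 3, ...)` takes g = 2 and returns (0 + 1)/(1 + 2) = 1/3. Two distinct input elements, for example `logical_distance(1, 2, [])`, get the conventional value 2.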
Figure 3.7: Logical distances in a positional (base 4) system compared with distances in a random system (size N = 256 for both). We plot the distances d(x, y), where x is the abscissa and y is a number at numerical distance δ from x: |x − y| = δ. The function is plotted against x for several values of δ (1, 2, 3, 4, 5), for both the canonical base 4 system and the random system. The peaks at d = 2 correspond to the maximal possible logical distance between x and y, which is realized when they have no common ancestor. The logical distances in the random system are systematically smaller.

Chapter 4 Reduction of redundancies

In mathematics a transformation can be any function from a set X to itself. Often, however, the set X has some additional structure (geometric, algebraic), and the term "transformation" refers to a function from X to itself that preserves this structure. Examples include linear transformations and affine transformations such as rotations, reflections and translations. More generally, a transformation is one facet of the mathematical concept of function, and the term mapping is used as a close synonym. In this sense the term transformation only signals that the more geometrical aspects of a function are being considered, with special attention paid to its invariants.

In this Chapter I will introduce a transformation R on the space Ω, which will be called reduction. R preserves the Ω-structure of the elements ω to which it is applied, R: Ω → Ω. It is also reversible, i.e. R⁻¹ is always defined. The inverse transformation will be called separation and denoted S.¹ The aim of R is to exploit the regularities contained in ω, seen as redundancies in its wiring diagram, in order to construct a more compact but equivalent representation of it. Equivalence is defined in Definition 19 and is guaranteed by reversibility.
Because R preserves the Ω-structure, an arbitrary number of R transformations can be composed. The points ω visited in this way form an orbit in Ω. The idea behind R was inspired by a well-known data-compression algorithm, non-sequential recursive pair substitution (NSRPS). It was introduced for sequences of symbols by Jiménez-Montaño, Ebeling and others [WM80] and subsequently refined by Peter Grassberger [Gra02]. The NSRPS algorithm is simple to describe. It searches the initial sequence for the most frequent pair of consecutive symbols and substitutes this pair with a new symbol created for the purpose. It then repeats this transformation iteratively until the length of the sequence (plus the length of a suitable description of the substitutions) reaches a minimum, so that no further substitution can improve the compression.

We adapted the basic idea of NSRPS to Ω networks. In doing so, however, we faced the subtleties of dealing with a network structure, and it was necessary to define new concepts in order to describe our manipulations. The very definition of an Ω network and of its constituents (Categories), as well as the other concepts introduced in this Chapter, are largely the result of this effort. The transformation R depends on two parameters: an element e of ω and an operation ⋆ to which e is connected. We can iterate R until we reach a point where there are no more redundancies to exploit. The main difficulty in this context lies in the following question: "How can we choose the sequence of reductions (the sequence of pairs (e, ⋆)) so as to exploit the redundancies contained in ω optimally?"

¹ Although we are mainly interested in numeral systems, R can be applied to any symbolic system that can be represented through an Ω network (see Sec. 2.1). This will become clear from its description in this Chapter.
The answer to this question is not easy, but we propose a greedy algorithm (as NSRPS is) that works well in simple cases. The result of a complete reduction (that is, the final point of an orbit) will be called the "reduced" network and will be marked with the superscript R. The descriptional length L of an Ω network (see Def. 10 in Chapter 3) now plays a crucial role. The value of this functional on a reduced network ω^R is defined as the descriptional complexity of the original network ω. In this way the search for the reduced network ω^R can be formulated as the minimization of the functional L.

This Chapter is organized as follows. In the first Section (4.1) we explain how the NSRPS algorithm works. In Sec. 4.2 we describe the R transformation and the greedy reduction algorithm in detail. In Sec. 4.3 we depict the general aspect of the reduced networks ω^R and introduce a suitable description for them, explaining their elements and their meaning in our framework. In the final Section (4.4) we introduce the complexity functional(s) for Ω networks and discuss the reasons for this choice. So far we have not really defined what ω^R is. We have simply stated that it is the final point of a sequence of R transformations. The complexity functional allows us to define reduced networks precisely, as the solutions of a minimum problem.

4.1 The NSRPS Algorithm

The NSRPS algorithm was studied and precisely defined by P. Grassberger as a tool for data compression and entropy estimation [Gra02]. He derived some important properties of the method and used it to estimate the entropy of written English. The results in [Gra02] and the conjectures made therein were investigated in a rigorous setting in a subsequent paper [BCG06]. Our aim here is to describe its basic idea and functioning, with the explicit intention of bringing out the features it shares with our R transformation.
Let us call $\sigma = s_0 s_1 \dots$ the original sequence. It is built from the symbols of a finite alphabet $\{\alpha_i\}$ ($i \in \{0, \dots, m-1\}$) of size m. We count the numbers $n_{jk}$ of non-overlapping consecutive pairs of symbols in σ with $s_t = \alpha_j$ and $s_{t+1} = \alpha_k$, and find their maximum
$$n_{\max} = \max_{(j,k) < m} n_{jk}$$
The corresponding index pair is $(j_0, k_0)$. We then create a new symbol corresponding to the concatenation
$$\alpha_m = (\alpha_{j_0} \alpha_{k_0}) \qquad (4.1)$$
and form the sequence $\sigma^1$ by replacing the pair $\alpha_{j_0}\alpha_{k_0}$ everywhere by $\alpha_m$. For the special case $j_0 = k_0$, any string of 2r + 1 symbols $\alpha_{j_0}$ is replaced by r characters $\alpha_m$ followed by one $\alpha_{j_0}$. This is the elementary step of the transformation (let us call it S), and it is repeated recursively. The sequence $\sigma^{i+1}$ is obtained from $\sigma^i$ by replacing the most frequent pair $\alpha_{j_i}\alpha_{k_i}$ with a new symbol $\alpha_{m+i}$. The procedure stops when the total length, consisting of a description of $\sigma^{i+1}$ and a description of the pair $(j_i, k_i)$, is definitely longer than the corresponding description of $\sigma^i$, for the present and all subsequent i.

We can regard the sequence of the $\sigma^i$ as an orbit in the space of sequences S. All the points of an orbit are equivalent in that they represent exactly the same sequence, because the substitution S is invertible.²

Figure 4.1: The substitution of a frequent pair (a, b) with the new symbol c.

At the end of this process, the length of the final sequence together with the description of the substitutions gives an estimate of the entropy of the sequence. Unfortunately we could not address the very interesting questions related to how the substitution changes the statistical properties of the sequences. The results in [Gra02] and [BCG06] showed that NSRPS is effective as a tool for data compression and entropy estimation.

² Remember that the description of the pairs is to be considered part of the description of the sequence.
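The elementary step S and its iteration can be sketched in Python as follows. This is a minimal illustration, not Grassberger's implementation. The stopping rule uses a hypothetical, simplified cost model: one unit per symbol of the sequence plus two units per stored pair. [Gra02] instead uses a more refined, entropy-based description length. The names `count_pairs`, `substitute`, `nsrps` and `expand` are ours. New symbols are written `#0`, `#1`, ..., which assumes that the input alphabet contains no such strings.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences n_jk of each consecutive pair."""
    counts, next_free = Counter(), {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if next_free.get(pair, 0) <= i:  # skip overlaps such as 'aa' inside 'aaa'
            counts[pair] += 1
            next_free[pair] = i + 2
    return counts


def substitute(seq, pair, new_symbol):
    """Elementary step S: replace `pair` from left to right by `new_symbol`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def nsrps(seq):
    """Iterate S greedily while the simplified description length decreases."""
    seq, rules = list(seq), {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, n = counts.most_common(1)[0]
        # Hypothetical cost: 1 per symbol + 2 per stored pair, so replacing
        # n occurrences pays off only if n > 2.
        if n <= 2:
            break
        new_symbol = f"#{len(rules)}"
        rules[new_symbol] = pair
        seq = substitute(seq, pair, new_symbol)
    return seq, rules


def expand(seq, rules):
    """Invert the substitutions and recover the original sequence."""
    out = []
    for s in seq:
        out.extend(expand(rules[s], rules) if s in rules else [s])
    return out
```

For `"abcabcabcabc"` the sketch first creates `#0 = (a, b)` and then `#1 = (#0, c)`, which leaves four copies of `#1`. Calling `expand` on the result returns the original sequence. This mirrors the invertibility of S, provided that the substitution rules are kept as part of the description.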
This conclusion rests on the fact that Markov sequences are attractive fixed points of the S transformation. We think that these aspects can also be fruitfully investigated in Ω networks, and we hope to return to this subject in future research.

Figure 4.2: The orbit of NSRPS.

4.2 The reduction transformation R

As we said, the transformation R depends on two variables (e, ⋆), where e is an element of ω and ⋆ an operation. We suppose for the moment that these are given, and describe the generic R(e, ⋆). R(e, ⋆) acts on all the similar Triples (the ones identified by the pair (e, ⋆)) and is itself articulated in elementary steps. It starts by considering two similar Triples (how they are selected is not important) and tries to reduce them together while preserving the Ω-structure. The result of an elementary reduction is the creation of a new Triple and of new Categories (see Sec. 2.2.2). This is the analogue of creating a new symbol in a sequence from a frequent pair of consecutive symbols. We now describe the elementary reduction step.

4.2.1 Reduction of two simple Triples

Let us suppose that the two selected Triples are T1: {z1 = x ⋆ y1} and T2: {z2 = x ⋆ y2}. Here the number x acts as a Pivot (see Sec. 4.2.6 for its definition), around which the reduction takes place. When two Triples are reduced together, several rearrangements are performed in the network, affecting both the wiring diagram and the nature and number of the nodes, in order to preserve the Ω-structure:
• the two Triples are destroyed.
This means that all their links and their operational nodes are removed;
• two new Categories Y and Z are created.³ The first contains the non-pivot inputs y1 and y2 of the destroyed Triples, and the second contains their outputs z1 and z2;
• a new Triple is created, involving the pivot x, the new Categories and a new operational node.

Figure 4.3: The reduction of two simple Triples. Two new Categories are formed, Y = {y1, y2} and Z = {z1, z2}. The two operational nodes have been identified, so their total number decreases by one.

4.2.2 Transformation R(e, ⋆)

Now that the elementary reduction step is defined, we must describe how to perform all the reductions within the set of similar Triples identified by (e, ⋆). The order in which the elementary steps within R(e, ⋆) are executed is established by the following rule. Suppose the similar Triples are named {T1, T2, ..., Tr}. The first step is the reduction of {T1, T2}. These are then destroyed, the new Triple becomes the new T1, and the remaining ones T3, ..., Tr are renamed T2, ..., Tr−1.⁴ Two points need to be made clear:
• as long as the steps within R(x, ⋆) are going on, the newly formed Categories and Triples are temporary. The Categories created in one step may well be destroyed in a later one. For example, if the second step involves the newly formed Triple and T3 = {z3 = x ⋆ y3}, then the newly formed Triple and the Categories C_in and C_out are destroyed;
• once R(x, ⋆) is completed, the situation is like the one in Fig. 4.4. At this point the new Triple is definitively created. The newly formed Categories are created too, unless they were already created during another R transformation. In that case the creation simply updates the Category's input and output degrees.

Figure 4.4: The similar Triples after R(x, ⋆).

³ In fact they are created only if they are not already present in the network. At this point this is certainly the case.
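The net effect of R(e, ⋆) on a set of similar simple Triples can be sketched in Python. The sketch ignores for now the causality constraint of Sec. 4.2.5 and is a hypothetical illustration. Triples are named tuples (output, pivot, operation, other input), and the pivot is always stored in the second slot. The new Categories are stored as ordered tuples rather than sets, so that position i of Y and of Z comes from the same original Triple. This ordering is our assumption, chosen so that the separation of Sec. 4.2.4 reduces to an unzip. Folding all similar Triples at once gives the same final Triple as the pairwise rule described above, provided every pair is reducible.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A Triple out = pivot ⋆ other; out and other may be Categories (tuples)."""
    out: object
    pivot: object
    op: str
    other: object


def reduce_similar(triples, pivot, op):
    """Fuse all Triples sharing `pivot` and `op` into one Triple Z = pivot ⋆ Y."""
    similar = [t for t in triples if t.pivot == pivot and t.op == op]
    rest = [t for t in triples if not (t.pivot == pivot and t.op == op)]
    if len(similar) < 2:
        return list(triples)  # nothing to reduce
    # Ordered Categories: position i of Y and Z comes from the i-th similar Triple.
    Y = tuple(t.other for t in similar)
    Z = tuple(t.out for t in similar)
    return rest + [Triple(Z, pivot, op, Y)]


def separate(triple):
    """Inverse step: split a reduced Triple back into simple Triples."""
    return [Triple(z, triple.pivot, triple.op, y)
            for z, y in zip(triple.out, triple.other)]


if __name__ == "__main__":
    # Hypothetical fragment of a positional network: 21 = 1 + 20, 31 = 1 + 30, 41 = 1 + 40.
    net = [Triple(21, 1, "+", 20), Triple(31, 1, "+", 30),
           Triple(41, 1, "+", 40), Triple(6, 2, "*", 3)]
    reduced = reduce_similar(net, 1, "+")
    print(len(net), "->", len(reduced), "operational nodes")
    print(reduced[-1])
    assert separate(reduced[-1]) == net[:3]
```

In this toy fragment the three "+ 1" Triples collapse into a single operational node with the Categories (20, 30, 40) and (21, 31, 41). Separating that node gives back the original three Triples.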
It may happen that not all the Triples can be reduced together (see Sec. 4.2.5). In that case the remaining ones are left unreduced.

⁴ We are aware that other choices are equally possible.

4.2.3 Reduction of two general Triples

The reduction of two general Triples follows the same lines as for simple Triples. Suppose, for example, that at some point during the reduction process we must reduce T1: {Cγ1 = Cα ⋆ Cβ1} and T2: {Cγ2 = Cα ⋆ Cβ2}, where all the Categories involved can have an arbitrary inner structure (see Fig. 3.1). The result is the same as for simple Triples and is depicted in Fig. 4.5.

Figure 4.5: The reduction of general Triples follows the same lines as that of simple Triples.

4.2.4 Reversibility of R and separation

During the reduction process the description of the ω network, defined in Sec. 3.1.3, evolves. New Categories appear, new Triples are formed, and old ones are destroyed. Besides providing the descriptional length that guides us in finding an optimal network (see Sec. 4.4), the description is central to a fundamental property of the transformation R, namely its reversibility.⁵ The definition of the elementary reduction step in Sec. 4.2.1 makes this clear. If two Triples T1 and T2 are fused into a new Triple T3, we can always reconstruct the original Triples from the description of the Categories and the wiring diagram of T3. This process will be called separation.

⁵ A similar consideration holds for the NSRPS algorithm: the S transformation is reversible only if a description of the substitutions is kept during the compression.

Definition 17 (Separation) The separation S is the inverse of the reduction, S = R⁻¹, defined for all (e, ⋆).

The transformation R is now defined, but one problem remains. It will force us to impose a constraint on when two similar Triples may be reduced.
We illustrate this problem with the Unary system described in Sec. 2.3.3. In the Unary system only one number has a large k_out, namely 1, which is also the only elementary symbol. The obvious transformation that exploits the redundancies in its network is R(1, +) (although other choices are possible). Performing this reduction gives the network shown in Fig. 4.6.

Figure 4.6: The Unary network after the (unconstrained) R(1, +). Only one operational node is left, but the reduction is only apparent. To build a number n we must pass through this operational node n − 1 times. Since we want our reduced networks to represent the cognitive resources exploited to represent numbers, we will introduce a constraint on reducibility in order to avoid this kind of situation.

Why, then, are we unsatisfied with this elegant result? After all, the Unary system is very simple to define, and its network ω_1 is very redundant and deserves a description as compact as possible. This simplicity, however, hides the fact that the Unary system requires lengthy (and, for humans, tedious) manipulations of its only elementary symbol. To represent the number 4, for example, we must necessarily pass through the representations of all the lower numbers. This is very clear from ω_1 itself (Fig. 4.7). From its reduced version in Fig. 4.6, instead, it could seem that any number can be reached with a single operation.

Figure 4.7: The Unary system is irreducible when the constraint is applied.

We are unsatisfied with the network in Fig. 4.6 because we want the operational nodes of the reduced network ω^R to represent the irreducible computations needed to build any of the composed symbols from the elementary ones.
This is essential if the reduced networks are to reflect the complexity of the numeral system. They should take into account not only its definition but also the effort of constructing higher symbols out of lower ones. This requirement can be expressed elegantly through a causality principle. In the next Section we formulate a constraint on the reduction step that forbids violations of causality when ω networks are transformed by means of R.

4.2.5 The causality constraint

The constraint we put on the reduction of two similar Triples has the nature of a causality constraint. We cannot reduce two Triples if some element of the newly formed Triple appears in both the new input Category and the new output Category. More generally, the reduction is forbidden if some element of ω belongs both to the ancestors of the two input elements and to the offspring of one of the output elements. In the Unary system, for example, this constraint clearly forbids every reduction. Consider two arbitrary Triples T1: {k + 1 = 1 + k} and T2: {h + 1 = 1 + h} with k < h. Fig. 4.7 shows that reducing T1 and T2 together would violate causality. A similar example occurs in the ω network associated with the Fuyuge numeral system (see Sec. 2.3.1). Consider the two Triples T1: {4 = 2 + 2} and T2: {6 = 4 + 2}. Here we face exactly the same problem as in the Unary system, and the reduction is impossible.

Figure 4.8: In the Fuyuge system, which is essentially base two, 6 and 4 are constructed in this way. These two Triples cannot be reduced together, because of the causality constraint.

The conclusion is that the computations represented by the two operational nodes cannot be the same thing. The causality constraint covers more general situations than the ones just shown. The general statement of the constraint is given in terms of
the generalized Ancestors (A_g) and Offspring (O_g) (see Definitions 12 and 13) of the input elements that the two Triples do not have in common:

Definition 18 (Reducibility) Two similar Triples T1, T2 (i.e. involving the same operation and sharing one of the input elements) are said to be reducible if and only if both of the following conditions hold, where e1, e2 denote the input elements that they do not have in common:
$$A_g(e_1) \cap O_g(e_2) = \emptyset \qquad (4.2)$$
$$A_g(e_2) \cap O_g(e_1) = \emptyset \qquad (4.3)$$

When two Triples are reduced, the Offspring of the newly formed output Category is the union of the Offspring of the two output elements (z1, z2) of the original Triples. The same holds for the Ancestors of the newly formed input Category:
$$A(C_{in}) = A(e_1) \cup A(e_2) \qquad (4.4)$$
$$O(C_{out}) = O(z_1) \cup O(z_2) \qquad (4.5)$$

This is a highly non-local constraint. Under certain conditions, especially for very large systems, checking whether two Triples are reducible can be computationally hard. We will not consider the computational complexity of the reduction algorithm here.

4.2.6 The reduction algorithm

So far we have described in full detail the reduction transformation R and its effect on a set of similar Triples identified by a pair (e, ⋆). The other Triples are left unaffected by R. However, the redundancies are typically spread over different places in a network ω and, as we will see, over different hierarchical levels. A single transformation R is usually not sufficient to exploit all of them, so we must explore the landscape. What is the best strategy for doing so? In other words, what algorithm (if one exists) gives us the right sequence of pairs (e_i, ⋆_i), or can explore the landscape stochastically so as to exploit the redundancies optimally? Suppose for the moment that we have such an algorithm. Its final result will be an ω network that is no longer reducible.
I call these ω reduced networks, although I have not yet defined them; I will define them in the last Section, as the solutions of a minimum problem. In NSRPS the greedy algorithm was efficient, and there was a simple criterion for stopping it: the algorithm stops when the descriptional length reaches its minimum. Here the situation is much more complicated, but we adopt a similar point of view. We define a greedy algorithm that generates a sequence of pairs (e_i, ⋆_i) based only on the topological properties (specifically the degree function) of the networks ω^i visited during the process. Starting from a simple ω network, the reduction algorithm proceeds through the following steps:
• first, we find the set of elements (symbols or Categories) that are dominant in the network, in the sense that they have the maximal k_out (there can be more than one): P = {e1, e2, ..., er}. We call these elements Pivots. The Pivots are not necessarily elementary symbols or Categories, but elementary or low-LD elements very often fall into P. We usually sort this set in lexicographic order, but this is not important. What matters is that when there is more than one Pivot we are at a crossing and must choose which way to go. We will discuss these "bifurcations"⁶ later;
• choose an element of P, say e1, and an operation ⋆ among all the operations in which e1 is involved. Concretely, we can choose the operation ⋆ that maximizes k_out, i.e. the number of outgoing links of e1 connected with ⋆ operations;
• consider all the similar Triples identified by (e1, ⋆) and apply R(e1, ⋆);
• return to the first step.
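The selection of (e, ⋆) in the first two steps can be sketched as follows for a simple network. This is a minimal, hypothetical helper that neither performs R nor checks reducibility. It assumes that Triples are tuples (output, left input, operation, right input) of integers. It counts k_out(e) as the number of Triples in which e appears as an input, so an element used twice in the same Triple, like 2 in 4 = 2 + 2, is counted once. Ties are broken by taking the smallest element, which stands in for the lexicographic order mentioned above.

```python
from collections import Counter


def select_pivot_and_operation(triples):
    """Return (Pivots, chosen pivot, chosen operation) for one greedy step.

    Each Triple is a tuple (output, left_input, operation, right_input).
    k_out(e) counts the Triples in which e is an input (once per Triple).
    """
    k_out = Counter()
    k_out_by_op = Counter()
    for _out, left, op, right in triples:
        for e in {left, right}:          # a repeated input counts once
            k_out[e] += 1
            k_out_by_op[(e, op)] += 1
    if not k_out:
        return [], None, None            # no Triples: nothing to reduce
    k_max = max(k_out.values())
    pivots = sorted(e for e, k in k_out.items() if k == k_max)
    e1 = pivots[0]                       # bifurcation resolved by ordering
    ops = sorted({op for (e, op) in k_out_by_op if e == e1})
    best_op = max(ops, key=lambda op: k_out_by_op[(e1, op)])
    return pivots, e1, best_op


if __name__ == "__main__":
    # Unary system up to N = 5: k + 1 = 1 + k.
    unary = [(k + 1, 1, "+", k) for k in range(1, 5)]
    print(select_pivot_and_operation(unary))
```

On the Unary network this singles out the pair (1, +), i.e. the transformation R(1, +) discussed above for that system. When several elements share the maximal k_out, all of them are returned in the list of Pivots. A stochastic variant would pick among them at random instead of taking the first.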
The process stops when one of the following conditions is satisfied:
• for every element e, k_out(e) = 1 or k_out(e) = 0 (this happens only in very special systems);
• for every element e with k_out(e) > 1, no reducible pair is left among the similar Triples associated with each operation ⋆.
The greedy algorithm contains an unavoidable stochastic part, which arises at bifurcations caused by the presence of more than one Pivot.

⁶ This term is not completely appropriate, because there can be more than two Pivots.

There are virtually infinite ways of modifying this greedy algorithm with stochastic perturbations. For example, the Pivots could be drawn from a probability distribution that gives a high probability to elements with a high k_out and vice versa.

4.2.7 The orbits of R, and the (approximate) reduced network

Starting from a given ω, the sequence of reductions gives a sequence of networks
$$\omega,\; R(e_1, \star_1)\,\omega,\; R(e_2, \star_2)\,R(e_1, \star_1)\,\omega,\; \dots$$
where (e_i, ⋆_i) is the sequence of Pivots and operations selected as described in Sec. 4.2.6. This sequence forms an orbit in the space Ω,
$$\omega,\; \omega^{(1)},\; \omega^{(2)},\; \dots,\; \omega^R,$$
which we will call Γ(ω). The reduction R, and in particular the elementary reduction step, is reversible. All the elements lying on the same orbit are therefore equivalent, in the sense that they contain the same information.

Definition 19 (Equivalent networks) Two Ω networks ω1 and ω2 are equivalent if and only if they can be connected by a finite path of elementary reduction (or separation) steps.

This is an equivalence relation, and it partitions the space Ω into regions A1, A2, .... These regions are invariant under the dynamics defined by R. Consequently, any orbit is confined to the region A_i that contains the initial ω.
The networks belonging to a region A_i are all equivalent to a certain network ω_i, typically the simple one, which can be taken as the representative element of the region. As we observed, bifurcations can occur along the sequence. If we consider all the possible orbits simultaneously we obtain a tree, and the leaves of this tree will be called the (approximate) reduced networks. We will write
$$\omega_*^R = R(\omega) \qquad (4.6)$$
This situation is typical of optimization problems in a complex landscape. In the interesting cases we want the solution of some complex optimization problem but have no exact algorithm that gives it, either because we lack insight into the problem or because the problem is intrinsically unfeasible. We are then forced to use approximate algorithms based on heuristics, i.e. simple computational strategies that are phenomenologically effective [GJ79]. A more elaborate computational strategy could perform stochastic separations and reductions after an initial greedy phase. In the next Section we describe the general aspect of the (approximate or exact) reduced networks and introduce the appropriate language for describing them. First, however, we observe that the R transformation has at least two relevant fixed points.

4.2.8 Holistic and Unary systems are two fixed points of R

Let us consider the Holistic numeral system, with which the network ω_H is associated. Since ω_H is a set of isolated (elementary) nodes, R is obviously the identity in this case: ω_H → ω_H.

Figure 4.9: The Holistic system is irreducible.

The Unary system reaches the same conclusion for a different reason, as we saw in the preceding Section. Therefore ω_1 → ω_1, and the R transformation has at least two fixed points. This makes sense in light of the meaning that we want to give to reduced networks.
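The causality constraint of Definition 18 is what makes the Unary system a fixed point, and it can be checked mechanically on a simple network. The following Python sketch is a hypothetical illustration, not the thesis's implementation. Triples are tuples (output, left input, operation, right input). The generalized ancestors A_g and offspring O_g are computed by graph traversal and both include the element itself. The function `reducible` then tests conditions (4.2) and (4.3) for the non-shared inputs e1 and e2. The helper names are ours.

```python
from collections import defaultdict


def build_links(triples):
    """Map each element to its direct inputs (parents) and direct outputs (children)."""
    parents, children = defaultdict(set), defaultdict(set)
    for out, left, _op, right in triples:
        for e in (left, right):
            parents[out].add(e)
            children[e].add(out)
    return parents, children


def closure(start, links):
    """All elements reachable from `start` via `links`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in links[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reducible(e1, e2, triples):
    """Conditions (4.2) and (4.3) for the non-shared inputs e1, e2 of two similar Triples."""
    parents, children = build_links(triples)
    a1, a2 = closure(e1, parents), closure(e2, parents)     # A_g(e1), A_g(e2)
    o1, o2 = closure(e1, children), closure(e2, children)   # O_g(e1), O_g(e2)
    return not (a1 & o2) and not (a2 & o1)


if __name__ == "__main__":
    N = 8
    unary = [(k + 1, 1, "+", k) for k in range(1, N)]
    # Triples k+1 = 1+k and h+1 = 1+h have non-shared inputs k and h.
    print(any(reducible(k, h, unary) for k in range(1, N) for h in range(k + 1, N)))
    fuyuge = [(4, 2, "+", 2), (6, 4, "+", 2)]
    print(reducible(2, 4, fuyuge))
    positional = [(20, 2, "*", 10), (30, 3, "*", 10), (21, 20, "+", 1), (31, 30, "+", 1)]
    print(reducible(20, 30, positional))
```

In the Unary network no pair of similar Triples passes the test, so R cannot change ω_1. The Fuyuge pair {4 = 2 + 2}, {6 = 4 + 2} is also rejected. The toy positional Triples 21 = 20 + 1 and 31 = 30 + 1 are reducible, because 20 and 30 do not lie on each other's construction chains.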
Figure 4.10: A piece of an orbit of the reduction algorithm. Starting from ω, the first four R(e_i, ⋆_i) are represented. The green line represents a step backward in the reduction, corresponding to the separation of a certain group of similar Triples. Separation is not allowed in the reduction algorithm we have described, but it can be considered in other heuristics. Starting from ω^(2) there are several possible directions, and our reduction algorithm can choose one of them at random. The fact that the Holistic and Unary systems are fixed points of R, by contrast, does not depend on the heuristics.

4.3 The ω^R networks and their relevant quantities

We now suppose that we have determined the actual ω_i^R associated with ω. The complete description of an Ω network, as defined in Sec. 3.1.3, requires the descriptions of all input and output elements in addition to the description of the wiring diagram. Recall that the description of an element is the list of its inner elements, and that its length is their number (see Sec. 3.1.1). This description is clearly redundant, because in simple Ω networks the output of a Triple is completely determined by its inputs and its operation. We can always recover an arbitrary Ω network from its wiring diagram and its input elements. The recovered network is equal but not identical, because the inner structure of the output Categories is lost.

4.3.1 Irreducible operations: π⋆

Definition 20 (Irreducible operations) The number π⋆ of operational nodes of ω^R is the number of irreducible operational nodes of the original network ω.

The effect of a reduction on π⋆ is monotonic: π⋆(R(ω)) ≤ π⋆(ω). If there are many redundancies, π⋆ can decrease substantially at a small cost in the growth of π, which may increase because new Categories are defined. Some simple Triples may remain in ω^R; we will call them unreduced.
Unreduced Triples arise in two kinds of situation. Some computational steps are intrinsically "unreducible", as in the Unary system. In other cases, some parts of the network ω do not fit the regularity schemes that emerged during the reduction process. Finally, we note that π_C is invariant along the orbit: π_C(ω) = π_C(R(ω)).

4.4 The complexity functional

Definition 21 (Descriptional Complexity) The descriptional complexity of an ω network is given by the descriptional length of its reduced network(s):
$$K = \pi + \pi_\star \qquad (4.7)$$
This is the analogue of the descriptional length of a sequence S maximally compressed by means of the NSRPS algorithm. In particular, the π⋆ contribution (which coincides with the number of Triples) is analogous to the length |S^R| of the compressed sequence, while π is the analogue of a concise description of the substitutions. Together they essentially amount to the information needed to recover the initial ω without any loss. Prefactors could be introduced, but this is the most natural choice. The descriptional complexity does not always capture the information contained in π_C, the average logical depth. For example, if the Triples are all connected, as in positional systems, the system has few input Categories. It nevertheless involves lengthy computations compared with a system in which all Triples are disconnected.

Definition 22 (Complexity) We define the complexity of a numeral system ω as the quantity
$$C = K + \pi_C \qquad (4.8)$$
We can now finally give a precise definition of the reduced network:

Definition 23 (Reduced network) Given a simple ω ∈ Ω, its reduced network(s) {ω_i^R} are the solutions of the minimum problem
$$\min_{\omega' \in A_\omega} K(\omega') \qquad (4.9)$$
where A_ω is the set of all networks that are equivalent to ω (in the sense of Definition 19).

In the next Chapter we will study the complexities of some network models; we close this Chapter with an observation.
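As a concrete check of Definitions 21 and 22, these quantities can be evaluated directly on networks that are already irreducible, such as the Holistic and Unary systems, which are their own reduced networks (Sec. 4.2.8). The Python sketch below is a hypothetical illustration. It takes the input elements with their lengths l(e) and the list of Triples, and computes π (3.1), π⋆, K (4.7) and C (4.8). It assumes that the logical depth LD(e) is the number of operations along the longest construction chain from the inputs (0 for inputs), and that π_C is the average of LD over all N elements. Under these assumptions the Unary system of size N gives π = 1, π⋆ = N − 1 and π_C = (N − 1)/2, which are the values quoted in Sec. 5.1.

```python
def complexity(input_lengths, triples):
    """Return (pi, pi_star, K, pi_C, C) for a network that is already reduced.

    input_lengths: dict element -> l(e); triples: list of (out, left, op, right).
    """
    pi = sum(input_lengths.values())          # (3.1)
    pi_star = len(triples)                    # one operational node per Triple
    inputs_of = {out: (left, right) for out, left, _op, right in triples}
    depth = {}

    def ld(e):
        # Assumed logical depth: longest chain of operations from the inputs.
        # Recursive, so very deep networks may hit Python's recursion limit.
        if e not in depth:
            depth[e] = 1 + max(ld(x) for x in inputs_of[e]) if e in inputs_of else 0
        return depth[e]

    elements = set(input_lengths) | set(inputs_of)
    pi_c = sum(ld(e) for e in elements) / len(elements)
    K = pi + pi_star                          # (4.7)
    return pi, pi_star, K, pi_c, K + pi_c     # C = K + pi_C, (4.8)


if __name__ == "__main__":
    N = 10
    unary = [(k + 1, 1, "+", k) for k in range(1, N)]
    print("Unary:", complexity({1: 1}, unary))
    print("Holistic:", complexity({k: 1 for k in range(1, N + 1)}, []))
```

For N = 10 the Unary case gives C = 1 + 9 + 4.5 = 14.5 = (3N − 1)/2, and the Holistic case gives C = N = 10. Both grow linearly with N.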
If a reduced network contains unreduced Triples, they fall outside the regularity schemes that the reduction algorithm can recognize. In this sense they can be treated on the same level as noise. This gives us an unambiguous criterion for distinguishing regularities from irregularities in a numeral system, which is very useful for defining complexity. These unreduced Triples do not deserve the degree of precision used to describe the regular part, so we could adopt a statistical description for the irregular part. This amounts to a constant term, which stands for the description of the statistical ensemble from which these Triples are imagined to be drawn. We thus obtain an interesting result: our complexity is low both for very regular systems and for random ones, and high only for intermediate systems with intricate rules.

Chapter 5 Complexity of numeral systems

In this Chapter we apply the transformation R to several numeral systems and find their ω_*^R (the final points of the R orbits) and their complexities. These results are obtained with the greedy algorithm described in Sec. 4.2.6. In Sec. 5.2 we discuss in detail how the greedy algorithm works for the Italian and French natural-language numeral systems, based on their network representations introduced in Sec. 2.3.2. Special attention is given to how the degree function, which determines the sequence of Pivots, evolves under the transformation R. At the end of that Section we describe the ω_*^R found in the two cases, stressing their differences and the interpretation of their input Categories. In the last Section (5.3) we consider positional systems, whose network representation was introduced in (5.2), for all bases B ≥ 2 and all sizes.
In this case we find an explicit formula for the complexity as a function of the base B and of the size N (the latter is the highest number that has a symbolic representation). The complexity shows an interesting behaviour: for each fixed N it has a minimum at small (but not too small) bases, with a very slow dependence on N. Since positional systems are an abstract model that approximates developed natural-language systems very well (that is, the way higher symbols are constructed from lower ones is similar), we think this is a qualitative argument suggesting that the use of small (but not too small) bases in numeral systems all over the world is not only the result of historical accidents, but also responds to the objective need for simplicity of a cognitive system [CVA03].

5.1 Holistic and Unary

Let us set $|\omega| = N$. The complexity of the Holistic system has only one positive contribution, from $\pi$: the length of the input Categories, which in this case coincides with the number of elementary symbols (see Sec. 2.3.3). This is equal to the system's size:

$$C(\omega_H) = N.$$

For the Unary system $\pi = 1$ and $\pi^\star = N - 1$ (see Sec. 2.3.3), and finally

$$\pi_C = \frac{1}{N}\sum_{k=1}^{N-1} k = \frac{N-1}{2}.$$

So the complexity of the Unary system is asymptotically

$$C(\omega_1) \simeq \frac{3}{2}N.$$

Prefactors in complexity formulas are not very important; what matters is the functional form. We observe that the complexity of both the Holistic and the Unary system grows linearly with the system's size. This is a rapid growth compared with more familiar systems such as positional ones, whose complexity grows logarithmically, as we will see in Sec. 5.3.

5.2 Complexity of the Italian and French systems

The aim of this section is to describe in detail how the greedy reduction algorithm works. To do so it is better to consider small networks (N = 100).

Italian

The starting point is the simple $\omega_{\text{Italian}}$ described in Sec. 2.3.2 (we will call it ω in the following).
Its degree function is shown in Fig. 5.1.

• The first Pivot is (as we could expect) the base 10, with $k_{out} = 17$, and it is the only Pivot at this point. Then we choose an operation among {+, ×}; the greedy algorithm leaves us free to choose at random, for example +. There are nine similar Triples: $T_1: \{11 = 10 + 1\}$, $T_2: \{12 = 10 + 2\}$, ..., $T_9: \{19 = 10 + 9\}$. The reduction of these Triples is denoted R(10, +) (see Sec. 4.2.6). The effect of R(10, +) is the creation of two new Categories, $C_0^+ = \{0, 1, 2, \dots, 9\}$ and $C_{10} = \{11, 12, \dots, 19\}$ (see Fig. 5.4, quadrant 1); this is $\omega^1$.
• At this point the Pivots are {2, 3, ..., 10, 20, 30, ..., 90} (see Fig. 5.2), and the greedy algorithm enters its first bifurcation (see Sec. 4.2.7). Suppose, for example, that 3 is chosen as the next Pivot, with the + operation. The transformation R(3, +) leads to the creation of two new Categories: $C_d = \{20, 30, \dots, 90\}$ and $C_1 = \{23, 33, \dots, 93\}$ (see Fig. 5.4, quadrant 2). The Category $C_1$ may seem strange, but in principle there is nothing wrong with the creation of such a Category.
• The Pivots are now {2, 4, 5, ..., 10} (see Fig. 5.3); we suppose that the following reductions are R(2, +), ..., R(9, +) (see Fig. 5.4, quadrant 3). Each of these reductions creates a new Category $C_i$ and "recreates" $C_d$, which is not really created two or more times: a new outgoing link is added to the unique $C_d$ every time.
• The remaining reductions are described in Figs. 5.4 and 5.5, which report the sequence of reductions and the formation of the new Triples and Categories.

Figure 5.1: The degree function $k_{out}(x)$ of the simple ω associated with the Italian system. The number 10 is the only Pivot, with $k_{out} = 17$.
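A single greedy step can be sketched on a toy fragment of the Italian network that contains only the Triples 11, ..., 19 = 10 + k and 20, ..., 90 = k × 10. In this Python sketch, which is an illustration and not the thesis's implementation, a Triple is a hypothetical `Triple(left, op, right, out)` record, $k_{out}(x)$ is assumed to be the number of Triples that use x as an operand, the Pivots are the nodes of maximal $k_{out}$, and `reduce_similar` plays the role of R(pivot, op): it merges all Triples sharing the Pivot and the operation into one Triple between a new input Category and a new output Category. On this fragment 10 is the unique Pivot with $k_{out} = 17$, and after R(10, +) the new input Category has $k_{out} = 1$ while the output Category stays at $k_{out} = 0$, as described for $C_0^+$ and $C_{10}$.

```python
from collections import Counter
from collections.abc import Hashable
from typing import NamedTuple


class Triple(NamedTuple):
    left: Hashable   # first operand (a number or a Category label)
    op: str          # "+" or "*"
    right: Hashable  # second operand
    out: Hashable    # result node: out = left op right


def k_out(triples):
    """Out-degree of each node: how many Triples use it as an operand."""
    deg = Counter()
    for t in triples:
        deg[t.left] += 1
        deg[t.right] += 1
    return deg


def pivots(triples):
    """Nodes of maximal out-degree."""
    deg = k_out(triples)
    best = max(deg.values())
    return sorted((x for x, d in deg.items() if d == best), key=str)


def reduce_similar(triples, pivot, op, in_label, out_label):
    """R(pivot, op): merge all Triples 'pivot op x' into one Triple between two new Categories."""
    similar, rest = [], []
    for t in triples:
        (similar if t.op == op and pivot in (t.left, t.right) else rest).append(t)
    inputs = sorted(t.right if t.left == pivot else t.left for t in similar)
    outputs = sorted(t.out for t in similar)
    merged = Triple(pivot, op, in_label, out_label)
    return rest + [merged], {in_label: inputs, out_label: outputs}


# Toy fragment of the Italian network: 11..19 = 10 + k and 20..90 = k * 10.
fragment = [Triple(10, "+", k, 10 + k) for k in range(1, 10)]
fragment += [Triple(k, "*", 10, 10 * k) for k in range(2, 10)]

print(k_out(fragment)[10], pivots(fragment))   # 17 [10]
reduced, categories = reduce_similar(fragment, 10, "+", "C0+", "C10")
print(categories["C10"])  # [11, 12, 13, 14, 15, 16, 17, 18, 19]
print(k_out(reduced)["C0+"], k_out(reduced)["C10"])  # 1 0
```

In this fragment the input Category simply collects the operands 1, ..., 9 of the merged Triples; the full network of Sec. 2.3.2 contains more Triples than this toy.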
Figure 5.2: The degree function $k_{out}(x)$ of the network $R(10, +)(\omega) = \omega^1$. There is now a large number of elements with maximal $k_{out}(x)$ (Pivots). One of these is still 10, whose $k_{out}$ has diminished because its similar Triples have been reduced together into one big Triple (see also Fig. 5.4). We have assigned the indices 100 and 101 to the newly formed Categories $C_0^+$ and $C_{10}$, respectively. The Category $C_0^+$ acquires $k_{out} = 1$ while $C_{10}$ remains at $k_{out} = 0$, because they are respectively the input and the output Category of the newly formed Triple.

Figure 5.3: The degree function $k_{out}(x)$ of the network $R(3, +)(\omega^1) = \omega^2$. Not all the elements which were Pivots in $\omega^1$ behave in the same way. Taking into account that 3 has diminished its $k_{out}(x)$ through the reduction of its similar Triples, there are two groups: A = {2, 3, 4, ..., 9} and B = {20, 30, ..., 90}. The 10 forms a group apart (C) because all its remaining links are connected to a × operation.

Figure 5.4: The sequence of reductions 1–9 for the Italian system; the newly formed Categories and Triples at each step are represented.

The reduced network $\omega_*^R$ that we obtain for the Italian system is represented in Fig. 5.6. From it we can see that $\pi = 21$ and $\pi^\star = 2$. Numerically we find $\pi_C = 3.5$ and consequently:

$$K = \pi + \pi^\star = 23, \qquad C = K + \pi_C = 26.5.$$

French

The discussion for the French system proceeds along the same lines, and we will not reproduce it here; we just report the sequence of reductions.

• Pivots = {60, 80}: R(60, +), $C_v$ is created;
• Pivots = {80}: R(80, +), $C_v$ raises its degree;
• Pivots = {10}: R(10, +), $C_0$ is created;
• Pivots = {20}: R(20, +), $C_0$ raises its degree;
• Pivots = {30, 40, 50}: R(30, +), R(40, +), R(50, +), $C_0$ raises its degree;
• Pivots = {10}: R(10, ×), $C_0'$ is created;
• Pivots = {$C_0$}: R($C_0$, +), $C_d'$ is created;
• Pivots = {$C_v$}: R($C_v$, +), $C_s$ is created.

Figure 5.5: The sequence of reductions 10–13 for the Italian system; the newly formed Categories and Triples at each step are represented.

The reduced network $\omega_*^R$ for the French system is shown in Fig. 5.8. From it we can see that $\pi = 44$ and $\pi^\star = 4$. Numerically we find $\pi_C = 4.5$ and consequently:

$$K = \pi + \pi^\star = 48, \qquad C = K + \pi_C = 52.5.$$

5.3 Complexity of the positional systems

We consider positional systems in their written form. Here the symbols 0 and 1 play a particular role, different from the one they play in natural-language systems. In particular, 0 appears in several places in the ω network. This is because the same numerical quantity can be represented in several ways in positional systems: for example, 0 and 0000 are both zero, but their representations differ.

Figure 5.6: The $\omega_*^R$ for the Italian system. The input Categories are in complete agreement with the linguistic reality. $C_0$ collects the numbers that have independent words; these are called "digits" because they are the natural-language analogue of the digits in written systems. $C_0'$ collects the digits that are multiplied by 10 (dieci). In Italian the number words for the elements of $C_0'$ (v-enti, tr-enta, quar-anta) are constructed from the number words for the digits (due, tre, quattro, ...) and the suffixes -enti, -enta, -anta, which mark multiplication by 10. (We do not take into account the complex linguistic transformations that change the form of the digit words when the suffix is attached.)
In positional systems we must take this difference into account in our network because, as we have remarked, we are concerned with the logical organization of the symbols' representations. With this in mind, 0 is an elementary symbol, whereas 0000 is the result of the sequence of operations $(0 \times B^3) + (0 \times B^2) + (0 \times B^1) + (0 \times B^0)$.

We can calculate the complexity explicitly for positional systems with any base B and any size $B^k$. We do not go into the details of how the reduction algorithm operates; we only note that the first Pivot, as expected, is $B^{k-1}$. The reduction of its similar Triples leads to the formation of two new Categories: the first (input) contains all elements from 0 to $B^{k-1} - 1$, and the output Category is formed by all elements of the form $B^{k-1} + x$, where $x \in C_{0,k-1}$. The reduction process proceeds iteratively along the usual lines, and the final result is depicted in Fig. 5.10.

Figure 5.7: The degree function $k_{out}(x)$ for the French system. There are two Pivots, {60, 80}, with $k_{out} = 19$. The $k_{out}$ of the numbers 4 (quatre) and 20 (vingt) is higher than that of their "peers" (numbers that in $\omega^R$ are in the same Category; see Fig. 5.8).

The $\omega_*^R$ consists of a single connected component. The Triple with the × operation represents the formation of numbers of the form $x \times B^l$, where x is a digit and l a positive integer, so that $B^l$ is a positive power of the base. The other Triples describe the formation of numbers of the form $x \times B^l + y$, where $y \in \{0, 1, 2, \dots, B^l - 1\}$. There are only two input Categories: $C_0 = \{0, 1, 2, \dots, B - 1\}$, containing the digits, and $C^* = \{B, B^2, B^3, \dots, B^{k-1}\}$, containing the powers of the base.

Now we calculate the complexities K and C as functions of the base and of the size.
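Before carrying out this calculation, the structure just described can be checked on concrete numbers. The Python sketch below uses hypothetical helper names (`decompose`, `expand`, `input_categories`), and `expand` only handles the digit characters 0–9. It splits any n ≥ B as $x \times B^l + y$ with x a digit in $C_0$, $B^l$ an element of $C^*$ and $0 \le y < B^l$, and spells out a written numeral such as 0000 as its sequence of operations. It illustrates the Triple structure of $\omega_*^R$, not the reduction algorithm that finds it.

```python
def decompose(n: int, base: int):
    """Return n itself if it is a digit (input Category C0); otherwise the
    triple (x, p, y) with n = x * p + y, x a digit, p = base**l the largest
    power of the base not exceeding n (an element of C*), and 0 <= y < p."""
    if n < base:
        return n
    p = base
    while p * base <= n:
        p *= base
    x, y = divmod(n, p)
    return (x, p, y)


def expand(digits: str, base: int) -> str:
    """Spell out a written numeral (characters 0-9 only) as its operations."""
    k = len(digits)
    return " + ".join(f"({d} × {base}^{k - 1 - i})" for i, d in enumerate(digits))


def input_categories(base: int, k: int):
    """C0 = digits and C* = {B, B^2, ..., B^(k-1)} for size N = B^k."""
    return list(range(base)), [base ** l for l in range(1, k)]


print(decompose(347, 10))     # (3, 100, 47)
print(expand("0000", 10))     # (0 × 10^3) + (0 × 10^2) + (0 × 10^1) + (0 × 10^0)
print(input_categories(10, 3))
```

The leading power is found by repeated multiplication rather than a floating-point logarithm, so the split is exact for any integer input.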
Taking into account that $k = \log_B N$, the length of the input Categories is

$$\pi = B + (k - 1) = B + \log_B N - 1,$$

and the number of irreducible operations (that is, the number of Triples) of $\omega^R$ is $\pi^\star = k = \log_B N$. The descriptional complexity of $\omega^R$ for positional systems with base B and size $N = B^k$ is therefore

$$K = B + 2\log_B N - 1. \tag{5.1}$$

Figure 5.8: The $\omega^R$ for the French system. The Triple for the construction of the number 80, quatre-vingts, is unreduced. A possible interpretation is that this construction breaks the regularity of a numeral system that would otherwise be substantially similar to the Italian one (see also Sec. 4.4 for our interpretation of unreducible Triples). The other Categories group together numbers that are constructed in a similar way (output Categories) or have similar roles (input Categories) in the system. For example, $C_v$ collects the numbers that are added to 60 and 80, so they have similar roles.

Figure 5.9: The degree function $k_{out}(x)$ of the simple $\omega_{\text{Decimal}}$ for the positional base-10 system. The size is $N = 10^5$, but the abscissa range is set to $[0, 10^4]$.

Figure 5.10: The $\omega_*^R$ for base-B positional systems of size $N = B^k$. We do not reproduce the details of the output Categories' inner structure, because it is described in the text. There are two input Categories: $C_0$, which contains the digits 0, 1, ..., B − 1, and $C^*$, which contains the positive powers of the base, $B, B^2, \dots, B^{k-1}$. In $\Omega_{\text{Hurford}}$ networks these Categories correspond exactly to the DIGIT and M grammatical categories.

The additional term (the average logical depth $\pi_C$) is simply $\pi_C = 2\log_B N$, and the complexity is finally

$$C = B + 4\log_B N - 1. \tag{5.2}$$

For example, for the decimal positional system with $N = 10^2$ we find C = 17, to be compared with the value C = 26.5 obtained for the Italian system (see Sec. 5.2).
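The formulas of this section, together with the Holistic and Unary results of Sec. 5.1, can be tabulated with a few lines of code. This Python sketch takes its inputs from the text ($\pi = B + \log_B N - 1$, $\pi^\star = \log_B N$ and $\pi_C = 2\log_B N$ for positional systems; $\pi = 1$, $\pi^\star = N - 1$ and $\pi_C = (N-1)/2$ for the Unary system; $\pi + \pi^\star + \pi_C = 21 + 2 + 3.5$ for the Italian network); the function names are hypothetical. Logarithms use `math.log(size, base)`, so $\log_B N$ need not be an integer.

```python
import math
from fractions import Fraction


def positional_terms(base: int, size: float):
    """pi, pi_star, pi_C for a base-B positional system of size N = B**k."""
    k = math.log(size, base)       # k = log_B N
    return base + k - 1, k, 2 * k  # pi, pi_star, pi_C


def positional_K(base, size):
    pi, pi_star, _ = positional_terms(base, size)
    return pi + pi_star            # Eq. (5.1)


def positional_C(base, size):
    pi, pi_star, pi_c = positional_terms(base, size)
    return pi + pi_star + pi_c     # Eq. (5.2)


def holistic_C(size: int) -> int:
    return size                    # one elementary symbol per number


def unary_C(size: int) -> Fraction:
    # pi = 1, pi_star = N - 1, pi_C = (1/N) * sum_{k=1}^{N-1} k = (N - 1)/2
    return 1 + (size - 1) + Fraction(size - 1, 2)


italian_C = 21 + 2 + 3.5           # pi + pi_star + pi_C read off Fig. 5.6

for n in (10**2, 10**4, 10**6):
    print(n, holistic_C(n), float(unary_C(n)), round(positional_C(10, n), 6))
print("Italian:", italian_C)
```

At N = 100 this gives 100 for the Holistic system, 149.5 for the Unary system, 17 for decimal positional notation and 26.5 for Italian; only the positional value grows logarithmically with N.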
Since $\log_B N = \log N / \log B$, differentiating Eq. (5.2) with respect to B and setting the derivative to zero, $\partial C/\partial B = 1 - \frac{4\log N}{B(\log B)^2} = 0$, we obtain the implicit equation that gives the optimal base as a function of the size:

$$N = \exp\left(\frac{1}{4} B \log(B)^2\right). \tag{5.3}$$

Figure 5.11: The complexity functionals K and C in a positional network of size $N = 10^2$, as a function of the base.

Figure 5.12: The complexity functionals K and C in a positional network of size $N = 10^4$, as a function of the base.

5.4 Conclusions

It seems that our reduction algorithm, at least in these simple cases, is able to find meaningful categories. In the Italian system, for example, 1 and 10 form a category apart, as do 4 and 20 in the French system. A category for the decades, $C_d$, is found in both the Italian and the French systems. For positional systems we find explicit formulas for the complexities K and C, which share the interesting feature of exhibiting a range of approximately minimal solutions around "small but not too small" bases. In both cases these solutions depend on the size in the expected way. Natural-language numeral systems cover a range of bases from 2 to 100, but the predominance of base 10 in evolved systems is almost universal, and we think this is related to the fact that an optimization of cognitive resources is achieved in this range of bases.

Figure 5.13: The complexity functionals K and C in a positional network of size $N = 10^{12}$, as a function of the base.

Figure 5.14: The optimal base as an implicit function of the size N with respect to the functionals K (blue) and C (red). As one expects, the optimal base adapts to the range of numbers involved.
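Eq. (5.3), plotted in Fig. 5.14, has no closed-form solution for B, but it is easy to solve numerically. With natural logarithms the stationarity condition for C reads $B(\ln B)^2 = 4\ln N$, and the same computation for K gives $B(\ln B)^2 = 2\ln N$. The Python sketch below (function names hypothetical) solves this condition by bisection, using the fact that $B(\ln B)^2$ is increasing for B > 1, and compares it with a brute-force search over integer bases in the spirit of Figs. 5.11–5.13.

```python
import math


def optimal_base(size: float, weight: float = 4.0) -> float:
    """Solve B * ln(B)**2 = weight * ln(N) for B > 1 by bisection.

    weight = 4 reproduces Eq. (5.3) (stationary point of C);
    weight = 2 gives the analogous condition for K = B + 2 log_B N - 1.
    """
    target = weight * math.log(size)

    def f(b):
        return b * math.log(b) ** 2 - target  # increasing for b > 1

    lo, hi = 1.0, 2.0
    while f(hi) < 0:   # expand the bracket until it contains the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def best_integer_base(size: float, weight: float = 4.0, max_base: int = 100) -> int:
    """Brute-force integer minimiser of B + weight * log_B N - 1."""
    return min(range(2, max_base + 1),
               key=lambda b: b + weight * math.log(size) / math.log(b) - 1)


for n in (1e2, 1e4, 1e12):
    print(f"N = {n:.0e}: B*(C) = {optimal_base(n):.2f}, "
          f"B*(K) = {optimal_base(n, 2):.2f}, best integer (C) = {best_integer_base(n)}")
```

For $N = 10^2$ the continuous optimum for C is B ≈ 5.9 and the best integer base is 6; for $N = 10^{12}$ the optimum moves to B ≈ 15, in line with the slow growth of the optimal base with N shown in Fig. 5.14.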
Chapter 6 Perspectives and conclusions

In this Thesis we studied numeral systems as representatives of symbolic systems. We decided to abstract from their physical manifestations (the sound of words and phrases, written signs) in order to capture natural-language and written numeral systems in the same model, and to concentrate on the logical organization of these symbols into a system, which becomes possible once a network model is introduced. At first we focused on measurements of the structure of these systems, learning a lot about their relevant features and about the quantities that can be useful to parametrize this space. The initial idea was to define the complexity of such systems by looking at topological and statistical properties, especially by comparing different models of numeral systems: those inspired by natural-language and written systems, and abstract models introduced as theoretical tools. Then we moved to a completely different approach. To find a suitable definition of complexity, we drew inspiration from a well-known compression algorithm introduced by P. Grassberger for sequences of symbols. The central idea of this algorithm is to search the given sequence for a dominant (most frequent) pair of adjacent symbols and to replace this pair with a new symbol. The result is a new sequence over a different alphabet: the old one plus the newly created symbol. The transformation is repeated iteratively and stops only when the descriptional length of the sequence, defined as the length of the current sequence plus the length of a concise description of the new symbols, reaches a stationary regime. The second term was in fact discarded, because it was very small compared with the first. Compared with previously existing approaches, the contribution of this Thesis is twofold.
First, we established a new mathematical framework that has proved very fruitful for understanding, at a systemic level, the organization of numeral systems. This framework is quite flexible, and we think it can be adapted to other (simple) symbolic systems by reinterpreting the meaning of the operations and of the symbols. The second contribution is the introduction of a transformation (the reduction R) in a space of networks with non-trivial topology, which maps the original network into an equivalent version suitable for describing it concisely and, in a certain sense, meaningfully. Two main lines of research seem promising for the immediate future.

• One, more mathematical, consists in studying the reduction transformation R more deeply. Especially intriguing, from my point of view, would be to investigate the possible relationships of the functional K with information-theoretical measures such as Kolmogorov complexity. Another interesting issue could be a more systematic investigation of the space Ω from the point of view of Statistical Mechanics, for example calculating the cardinality (volume) of the network ensemble E(ε, p) as a function of (ε, p), and studying the surfaces defined in Ω by fixing some functional, such as $\pi_C$. We tried to move in this direction, especially in the early stages of this work, relying mainly on simulations, but without great success. However, recent developments in the Statistical Mechanics of networks show that such entropy calculations are possible and shed considerable light on the information-theoretical aspects of network ensembles [AB09].
• The other research line, for which we think the tools developed in this work are useful, is the modeling of the evolution of numeral systems through interactions in a community of cognitive agents and, more generally, of the evolution of language, broadly interpreted [LS07].
In many biological, technological and social systems, the units composing the system initially interact among themselves and with the environment in a sensorial, non-symbolic way, their communication system being neither predetermined nor fixed by a global entity [MN06] [NF04] [Ste06]. The communication system emerges spontaneously as a result of the agents' interactions, and it can change continuously due to mutations occurring in the agents, in their objectives, and in the environment. Important questions concern how conventions are established, how communication arises, what kinds of communication systems are possible, and what the prerequisites are for such an emergence to occur. In this perspective it is interesting to investigate the emergence of syntactic forms of agreement: compositionality, categories, syntactic or grammatical structures [CFL09].

Bibliography

[AB09] Kartik Anand and Ginestra Bianconi. Entropy measures for networks: Toward an information theory of complex topologies. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 80(4), 2009.
[BCC+08] M. Ballerini, N. Cabibbo, R. Candelier, A. Cavagna, E. Cisbani, I. Giardina, V. Lecomte, A. Orlandi, G. Parisi, A. Procaccini, M. Viale, and V. Zdravkovic. Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proceedings of the National Academy of Sciences, 105(4):1232–1237, January 2008.
[BCG06] Dario Benedetto, Emanuele Caglioti, and Davide Gabrielli. Non sequential recursive pair substitution: Some rigorous results, 2006.
[Ben88] Charles H. Bennett. Logical depth and physical complexity. In The Universal Turing Machine: A Half-Century Survey, pages 227–257, 1988.
[Bia08] Ginestra Bianconi. Entropy of randomized network ensembles. Europhysics Letters, 81, 2008.
[BLM+06] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D. Hwang. Complex networks: Structure and dynamics.
Physics Reports, 424(4-5):175–308, February 2006.
[CFL09] C. Castellano, S. Fortunato, and V. Loreto. Statistical physics of social dynamics. Reviews of Modern Physics, 81:591–646, 2009.
[Cho88] N. Chomsky. Language and Problems of Knowledge. The Managua Lectures. Cambridge, Mass., 1988.
[Cho02] Noam Chomsky. Syntactic Structures. Walter de Gruyter, 2nd edition, December 2002.
[CRTVB07] Luciano da F. Costa, Francisco A. Rodrigues, Gonzalo Travieso, and P. R. Villas Boas. Characterization of complex networks: A survey of measurements. Advances in Physics, 56:167, 2007.
[CVA03] Nick Chater and Paul Vitányi. Simplicity: A unifying principle in cognitive science? In Trends in Cognitive Sciences, pages 7–19, 2003.
[DDLC98] S. Dehaene, G. Dehaene-Lambertz, and L. Cohen. Abstract representations of numbers in the animal and human brain. Trends Neurosci, 21(8):355–361, August 1998.
[Dea97] Terrence Deacon. The Symbolic Species: The Co-evolution of Language and the Brain. W. W. Norton, New York, 1997.
[Deh03] S. Dehaene. TRENDS in Cognitive Sciences, 7:145–147, April 2003.
[DM03] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of Networks: From Biological Nets to the Internet and WWW (Physics). Oxford University Press, March 2003.
[DMO05] S. Dorogovtsev, J. Mendes, and J. Oliveira. Frequency of occurrence of numbers in the World Wide Web, April 2005.
[Fec60] G. T. Fechner. Elements of psychophysics. New York: Holt, Rinehart & Winston, 1860.
[Fre79] G. Frege. Begriffsschrift. Halle, 1879.
[GBPV08] G. Bianconi, A. C. C. Coolen, and C. J. Perez-Vicente. Entropies of complex networks with hierarchically constrained topologies. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 78, 2008.
[GJ79] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.
[Gra02] Peter Grassberger.
Data compression and entropy estimates by non-sequential recursive pair substitution, 2002.
[GW79] G. H. Hardy and E. M. Wright. An Introduction to the Theory of Numbers, 5th edn. Oxford University Press, 1979.
[Hak49] H. Haken. Information and Self-Organization: a macroscopic approach to complex systems. Springer-Verlag, Berlin 1988, 1949.
[Hal91] Ken Hale. Miskito numerals, 1991.
[Ham06] H. Hammarström. Complexity in numeral systems with an investigation into pidgins, pidgincreoles and creoles. Language Complexity: Typology, Contact, Change [Studies in Language Companion Series]. Amsterdam: John Benjamins, 2006.
[Hur87] J. Hurford. Language and Number: the emergence of a cognitive system. Basil Blackwell, Oxford, 1987.
[Hur99] J. Hurford. Artificially growing a numeral system. In Jadranka Gvozdanovic, editor, Numeral Types and Changes Worldwide, pages 7–41. 1999.
[Kac59] Marc Kac. Statistical Independence in Probability, Analysis and Number Theory (Carus Mathematical Monographs, No. 12). Wiley, New York, 1959.
[LS07] Vittorio Loreto and Luc Steels. Social dynamics: Emergence of language. Nature Physics, 3:758–760, November 2007.
[LV97] Ming Li and Paul Vitányi. An Introduction to Kolmogorov Complexity and Its Applications (Texts in Computer Science). Springer, February 1997.
[MN06] D. Marocco and S. Nolfi. Origins of communication in evolving robots. In S. Nolfi, G. Baldassarre, R. Calabretta, J. Hallam, D. Marocco, O. Miglino, J.-A. Meyer, and D. Parisi, editors, From animals to animats 9: Proceedings of the Ninth International Conference on Simulation of Adaptive Behaviour. LNAI, Volume 4095. Springer Verlag, Berlin, Germany, 2006.
[New03] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, 2003.
[NF04] Stefano Nolfi and Dario Floreano. Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines (Intelligent Robotics and Autonomous Agents). The MIT Press, March 2004.
[OTI81] T. Oyama, T. Kikuchi, and S. Ichihara. Span of attention, backward masking and reaction time. Percept. Psychophys., 29:106–112, 1981.
[Par03] G. Parisi. Complexity and intelligence. In The Kolmogorov Legacy in Physics. Part II. Algorithmic Complexity and Information Theory, Lecture Notes in Physics, pages 76–88. Springer Berlin / Heidelberg, 2003.
[PI09] Manuela Piazza and Véronique Izard. How humans count: Numerosity and the parietal cortex. The Neuroscientist, 15(3):261–273, June 2009.
[RB02] Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys., 74:47–97, June 2002.
[RMR08] Andrij A. Rovenchak, Ján Mačutek, and Charles Riley. Distribution of complexities in the Vai script. CoRR, abs/0810.0200, 2008.
[SB92] Denise Schmandt-Besserat. Before writing: from counting to cuneiform. University of Texas Press, Austin, 1992.
[Ste06] Luc Steels. Experiments on the emergence of human communication. Trends in Cognitive Sciences, 10(8):347–349, August 2006.
[Tur36] A. M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc., 2(42):230–265, 1936.
[Wil12] The Mafulu. Mountain People of British New Guinea. 1912.
[WM80] W. Ebeling and M. A. Jiménez-Montaño. On grammars, complexity, and information measures of biological macromolecules. Math. Biosci., 52:53–71, 1980.
[WR27] Alfred North Whitehead and Bertrand Russell. Principia Mathematica. Cambridge University Press, 1925–1927.