Scuola di Dottorato “Vito Volterra”
Dottorato di Ricerca in Fisica
The complexity of numeral systems
Thesis submitted to obtain the degree of
“Dottore di Ricerca” – Philosophiæ Doctor
PhD in Physics – XXI cycle – October 2009
by
Alessio Ansuini
Program Coordinator
Prof. Enzo Marinari
Thesis Advisor
Prof. Vittorio Loreto
... to my brother, Federico
Acknowledgments
I am grateful to Vittorio Loreto, with whom I have had the chance to
work during these years of PhD. His constant encouragement in the critical
moments has been precious to me, no less than his great scientific competence
and the generosity of his ideas. I thank Enzo Marinari for his patience
with me and for the fairness and honesty with which he has always talked to me.
I warmly thank Vito Servedio for his useful comments and encouragement
during these years. Last but not least, I am very indebted to my friend
Alessandro Attanasi: he is great.
Contents
1 Numeral Systems
1.1 The perception of abstract numbers
1.1.1 Our cognitive limits
1.1.2 Approximate Representation of numerosity
1.1.3 Distance and Size Effects
1.2 The Language of Numbers
1.2.1 The origins of a language for number
1.3 Linguistics of numeral systems
1.3.1 Composition of Numerals
1.4 The complexity of numeral systems
2 A network model for Numeral Systems
2.1 Ω symbolic systems
2.1.1 Axioms for Ω systems
2.2 Elements of Ω networks
2.2.1 Nodes
2.2.2 Triple
2.3 To build an Ω network
2.3.1 Fuyuge and Miskito
2.3.2 Italian and French systems
2.3.3 Holistic and Unary system
2.3.4 Positional systems with arbitrary base
2.3.5 Canonical systems
2.3.6 Primes systems
2.3.7 The ensemble of random numeral systems
2.4 Elementary observables of the Ω networks
2.4.1 The number of elementary symbols
2.4.2 The description of a simple Ω network
2.4.3 Degree sequence and its distribution
2.4.4 The Tree
2.4.5 The logical depth
2.4.6 Relation between LD and kout
2.5 Other functionals defined on Ω networks
2.5.1 Entropy of the Degree distribution
2.6 Comparison of different models
2.6.1 Holistic and Unary
2.6.2 Canonical and Positional systems
2.6.3 Primes
2.6.4 Random
2.7 The space Ω
2.7.1 The size of the space of simple networks
2.7.2 Dynamics in the Ω space
2.8 Conclusions
3 Development of the formalism
3.1 Generalization of Ω networks
3.1.1 Categories
3.1.2 Generalized Triples
3.1.3 Description of a generic Ω network
3.1.4 Other concepts related to Ω networks
3.2 Distance between symbols
4 Reduction of redundancies
4.1 The NSRPS Algorithm
4.2 The reduction transformation R
4.2.1 Reduction of two simple Triples
4.2.2 Transformation R (e, ⋆)
4.2.3 Reduction of two general Triples
4.2.4 Reversibility of R and separation
4.2.5 The causality constraint
4.2.6 The reduction algorithm
4.2.7 The orbits of R, and the (approximate) reduced network
4.2.8 Holistic and Unary system are two fixed point of R
4.3 The ω R networks and their relevant quantities
4.3.1 Irreducible operations: π⋆
4.4 The complexity functional
5 Complexity of numeral systems
5.1 Holistic and Unary
5.2 Complexity of the Italian and French system
5.3 Complexity of the positional systems
5.4 Conclusions
6 Perspectives and conclusions
Introduction
Numbers are pervasive in our everyday life. They lie at the heart of our
technology and science. They are the building blocks of mathematics, and
the holy Grail of its deeper mysteries. Eminent philosophers in the ancient
world put them at the basis of reality itself.
Numbers are in our language and our writing systems. Recent archeological
findings suggest that the origin of writing systems is rooted in
the ancient accounting systems developed in Mesopotamian cultures [SB92].
An (apparently) rudimentary accounting system, based on
small clay objects (cones, spheres, disks and other forms), evolved in the
course of millennia, through a sequence of higher and higher abstractions, into
a complex system of symbols for abstract numbers and then for words
and sentences (see Fig. 1 and its caption).
The variety of numeral systems across languages and
cultures in the world is astonishing. Many languages have very small numeral inventories,
just words up to two or three, and perhaps a possibility to express exact
numbers up to at most ten, using these and the word for “hand” [Ham06].
But in languages which do not have small numeral inventories, numeral
expressions form a system: a set of interrelated entities that can be studied
from the point of view of its complexity.
We want to address the following question: “What is the complexity of numeral systems?”. More precisely: “How can we define a notion of complexity
that reflects the cognitive effort required in the memorization and the mastering of numeral systems?”.
In order to give meaning to this question we must first explain what
numeral systems are. This will be addressed in Chapter 1, where we review
a (very small) part of the literature dedicated to this subject. In the first
section (1.1) we report the salient facts from neuropsychology and cognitive
science about how the human (and, to a great extent, animal) brain represents
exact quantities and manipulates them. This is a necessary premise: how our
brain works proves essential in shaping the evolution, and the actual form,
of numeral systems. Then, in the remainder of Chapter 1, the focus is more
on what numeral systems are like in the different cultures all over the world.
In general less attention is paid to the question of the origin and evolution of
numeral systems, which is part of a more general question: that of the origin
and evolution of language. In the last section (1.4) we briefly review the
early attempts by linguists to define a complexity for a numeral system,
and then describe our approach. In the last decades physicists have devoted
great attention to the study of complex systems, i.e. systems that “acquire
a functional, spatial or temporal structure without specific interference from
the outside” [Hak49]. Language, as was recognized in recent years, is to a
great extent a complex adaptive system that organizes itself through social
interactions [LS07]. With this in mind we introduce in Chapter 2 a new
network model, that we call Ω, aimed at describing numeral systems. The
point of view that we adopt is that of describing the inner “logic” in the
formation of higher symbols from lower ones, abstracting from the concrete
form of the representation, be it the waveform of the pronounced word or
the shape of a sign traced on a sheet of paper.

Figure 1: Envelope with tokens from Susa, Iran. A clay envelope was used for
storing plain tokens, small clay objects (usually an inch or less) that were
used to represent, by virtue of their shapes, specific commodities: a cylinder
stood for an animal; cones and spheres referred to two common measures of
grain. Clay envelopes were gradually replaced: Sumerian accountants
began impressing the tokens on the soft exteriors of the envelopes before
enclosing them, thus leaving a visible record of the number and shape of
tokens held inside. At some point, accountants must have realized that the
markings on the envelope, reflecting everything significant about its contents,
rendered the tokens superfluous. Thus were the first written tablets created,
as two-dimensional symbols; a circle replaced the sphere, a wedge the
cone.
This approach has the advantage that we can compare natural language
and written numeral systems on a common basis, but we pay a price for this.
The price is that we cannot include in our networks the information on how
complex the “mapping” from the concepts to the concrete representations is.
We study the general properties of the Ω networks, establishing a language
suitable for their description. The formalization of the description of
a mathematical object is essential in order to define its complexity.
In this Chapter we also report the results of extensive numerical simulations
on different networks, built upon several models of numeral systems.
These models are inspired by natural languages and written systems, but we
also introduce abstract models that are useful to illustrate crucial theoretical
aspects. The main focus of these simulations is on the characterization of
the structure of the networks, i.e. the static aspects. What we learn in this
Chapter, and this is one of the original contributions of this Thesis, is what the
networks of numeral systems are like: this is, to our knowledge, not covered
in the literature.
The main conclusion we draw from these measures is that the topological
and statistical properties of these networks can signal whether our numeral system
is governed by some rule, but it is hard to extract from this information
the representation of these rules.
This fact motivated us to focus, at a later stage,
on the detailed way the symbols are related. The idea was born from a very
simple observation: when trying to draw these networks on a blackboard, we
were spontaneously drifting towards a reduced representation. This was possible
only for highly redundant systems, like the decimal one, but was absolutely
impossible with the systems built on prime numbers or from random rules.¹
The natural step was to elaborate a procedure to systematically reduce the
redundancies in the networks, in order to obtain a more compact, but
equivalent,² version of the original ones. The idea behind a well known algorithm
for the lossless compression of sequences (the NSRPS algorithm [WM80]
[Gra02] [BCG06]) was the right one, but it had to be adapted, and somewhat
reinvented, for the very different structure of the object to compress: a
network, although of a special type.
This effort is described in Chapter 4, where we define a transformation R
in the space Ω that exploits the redundancies (of a certain class³) of our
networks, and can be applied recursively, until all these redundancies are
exploited.

¹ Every natural number x has a unique decomposition in prime factors, and so it is
natural to consider it as a representation for x. This system of representation defines the
Primes numeral system.
² It is always possible to recover the original network from the reduced one.
³ The redundancies are precisely defined in terms of the topological properties of the
networks.
In the case of sequences the length of the compressed sequence is
related to the entropy of the emitting source; we adopt the point of
view of the shortest description, defining the complexity of a numeral system as
the description of its reduced network.
In the final Chapter we study in detail how the reduction R works,
calculating the reduced representations and the complexities of some relevant
systems from natural language and written systems. Remarkably, we observe
that the topological structure and the description of the reduced networks
give valuable information on the role and the function that different symbols
have within the numeral systems, and we think that this is a clear indication
that our complexity measure captures the essential elements of the perceived
complexity of these cognitive-linguistic systems.
Chapter 1
Numeral Systems
In this Chapter we will review some relevant features of numeral systems, as
a part of natural language and of writing systems.
The words and signs for numbers are the visible part of a cognitive system:
they permit us to have access to the knowledge of exact quantities, which
otherwise would be inaccessible, due to fundamental cognitive limitations.
They are shaped by two fundamental forces:
• a dedicated neuronal circuitry for the access, manipulation and representation of arithmetical concepts;
• the social communicative interactions of billions of humans that, over
millennia, needed to communicate exact quantities in order to cooperate,
often coordinating a complex collective behaviour, or, simply, to
survive.
1.1 The perception of abstract numbers
Our attention is focused on a restricted but very important domain of
semantic knowledge: that of abstract numbers. The concept of abstract number
lies at the foundation of ancient and modern mathematics, but paradoxically
it was only between the end of the nineteenth century and the beginning
of the twentieth century that its mathematical foundation was developed,
thanks to the efforts of the logician Gottlob Frege [Fre79], and then,
independently, of the eminent philosophers and mathematicians Russell and
Whitehead in their monumental work Principia Mathematica [WR27].
However, we will not focus here on the mathematical conception of
number, but only on its everyday-life meaning, as a part of our semantic
world. In this view, the number (or cardinality) of a set can be defined as
“the only property that remains invariant under substitutions of any of its
items”. Thus we can talk about three objects (see Fig. 1.1), three persons,
three sounds, or three events: we can recognize that the cardinality of a set is
three, regardless of its composition [DDLC98].
Figure 1.1: The concept of abstract number: the two sets A and B are different, but contain the same number of elements. They are equally representative
of the “threeness” quality.
The possibility that humans have access, through their cognitive system,
to the numerosity of a set relies on a dedicated neuronal circuitry, inherited
from biological evolution [DDLC98]. Animals, young infants and adult humans
possess a biologically determined, domain-specific representation of
number and of elementary arithmetic operations: strong evidence points to
an evolutionary endowment of abstract numerical knowledge in the brain.
But our perception of number is far from being sharp: there is a very
surprising difference between the mathematical conception of number and how
our cognitive system perceives it.
It is important in this respect to clearly distinguish between symbolic and
non-symbolic aspects of the knowledge of number and arithmetic. Symbolic
arithmetic deals with how we understand and manipulate numerals
and number words, while non-symbolic arithmetic concerns how we grasp
and combine the approximate numerosity of concrete sets of objects. Our
core knowledge of arithmetic is essentially non-symbolic. But when number
symbols are available, they become strongly attached to the corresponding
non-symbolic representations of numbers and, thereafter, a form of
“second-order” intuition seems to develop, as the links between symbols and the
quantities themselves become fast, automatic and unconscious.
1.1.1 Our cognitive limits
Our non-symbolic knowledge of abstract numbers is comparable to that of
animals and young infants. We do not have a direct knowledge of exact quantities without performing calculations, i.e. without symbolic computation.
Figure 1.2: If we look at a numerosity of objects, points etc. no larger than 3
or 4 (as humans) we have an immediate perception of the numerical quantity
without relying on counting. This cognitive faculty is called subitizing.
The value of the subitizing limit (3 or 4 for humans) influences our
behaviour; it is perhaps an important parameter in studying the collective
dynamics of societies in suitable circumstances.¹

¹ In [BCC+08], for example, the relationship between the subitizing limit
of (a certain species of) birds and the emergent properties of their flocking
dynamics is discussed.

Figure 1.3: When the number of objects is higher we rely on computational
strategies in order to achieve a knowledge of the exact quantity of a set of
perceptual objects.

When we observe a number of spots exceeding our subitizing limit we have
to process each item individually or in small subitizable groups, and put them
in a one-to-one correspondence with more complex representations.
There are two crucial mechanisms involved in this operation. The first is
the individuation of the items (the groups of Fig. 1.3). When we do that, we
navigate the image with ocular movements, fixing our gaze on one group
at a time. Counting, but not subitizing, is made impossible if ocular and/or
attentional movements are prevented [OTI81].
The second crucial component of counting is working memory. This is
needed to keep in mind the total while integrating the successive items. While
the estimation of small numerosity is shared with non-human animals, the
counting mechanisms are peculiar to humans, are not universal among cultures,
and are acquired progressively by learning in human children. Counting
appears as a uniquely human activity, and a cultural invention.
1.1.2 Approximate Representation of numerosity
When humans discriminate or compare the numerosity of two sets of dots,
under conditions that prevent counting, responses are approximate and become
increasingly accurate as the difference between the numbers increases,
in a way that is modulated by their ratio. This ratio-dependent behavior
is an instance of Weber’s law [Deh03] [PI09], which is typically found in
judgments of continuous perceptual variables such as length, luminance, or
frequency. Weber’s law can be stated as follows: over a large dynamic range,
the threshold of discrimination between two stimuli increases linearly with
stimulus intensity. Weber’s law can be accounted for by postulating that
the external stimulus is scaled into a logarithmic internal representation of
sensation [Fec60]. It is very pervasive in numerical cognition: it is observed
independently of culture, degree of instruction, and age. It is also observed in
various animal species performing many different tasks. The universality of
Weber’s law in animals and in humans of all ages and levels of education is taken to indicate
the presence of a universal mechanism for approximate number processing
[PI09].
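As a short worked step (standard psychophysics, added here for illustration and not part of the original derivations), Weber's law and the logarithmic internal scale can be connected as follows. The symbols are notation assumed for this sketch: I is the stimulus intensity (here, the numerosity), ΔI the discrimination threshold, k the Weber fraction, S the internal sensation, and c and I_0 are constants.

```latex
\Delta I = k\, I
\quad\Longrightarrow\quad
dS = c\,\frac{dI}{I}
\quad\Longrightarrow\quad
S(I) = c \,\ln\frac{I}{I_0}
```

The middle step rests on the additional assumption that each just-noticeable difference corresponds to the same increment of sensation. Under this assumption two numerosities n_1 and n_2 are separated on the internal scale by c ln(n_2/n_1), a quantity that depends only on their ratio.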
1.1.3 Distance and Size Effects
• The numerical distance effect (Fig. 1.4) refers to the empirical finding
that the ability to discriminate between two numbers improves as the
numerical distance between them increases. It is faster and easier to
compare four with eight than four with five, even after intensive training.
The distance effect is common to humans and animals and, in
humans, manifests itself also when processing Arabic digits or number
words. The occurrence of a distance effect even when numbers are presented
in a symbolic notation suggests that the human brain converts
numbers internally from the symbolic format to a continuous analogical
format.
• The size effect (Fig. 1.5) refers to the finding that the discrimination of two numbers
worsens as their numerical size increases. This effect is essentially an instance of Weber’s law and holds
cross-modally: it is found in animals and humans engaged in discrimination
or comparison tasks with visual objects or sounds. The number
size effect is found also when humans are presented with Arabic digits
or number words, even when the subjects are highly trained in mathematics.
This indicates that, in certain circumstances, humans access a
numerical representation that is similar to that of animals.
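Both effects can be illustrated with a minimal sketch, assuming (as in the logarithmic account of Weber's law) that numbers are compared through the distance between their logarithms. The function name internal_distance is hypothetical and introduced only for this example; it is not a model taken from the cited literature.

```python
import math


def internal_distance(a: int, b: int) -> float:
    """Distance between two numerosities on an assumed logarithmic scale."""
    return abs(math.log(a) - math.log(b))


# Distance effect: 4 vs 8 is easier to discriminate than 4 vs 5.
far = internal_distance(4, 8)
near = internal_distance(4, 5)

# Size effect: at equal numerical distance, 4 vs 5 is easier than 8 vs 9.
small = internal_distance(4, 5)
large = internal_distance(8, 9)

print(far > near)     # True
print(small > large)  # True
```

On this assumed scale a larger internal distance stands for an easier discrimination, so a single logarithmic compression accounts for both effects.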
Figure 1.4: Distance Effect. When subjects are asked to make comparisons
between two numbers, independently of the modality of presentation, the
error rates decrease monotonically, approximately as a logarithmic function
of the numerical distance between the two numbers. (D) Humans exhibit a
distance effect also when processing symbolic numerals, but the error rates are
much lower than in the case (C) of non-symbolic processing. (Reproduced
from Ref. [DDLC98])
In Chapter 3 we will introduce a “distance” between symbols inspired by
the distance and size effects. This distance is expressed in terms of the logic
of construction of higher symbols from lower ones, and we will use it mainly
to illustrate some general properties of numeral systems.
In conclusion, as humans we have a very limited discriminating capacity
in number perception, shared with animals and young infants. Our brain
seems better designed for approximate calculation than for precise
calculation. Fortunately we are a symbolic species [Dea97], and this has been
the impulse for a cultural effort, the invention of a language for numbers. In
the next Section we will take a look at the solutions that humans developed
in their cultures in order to grasp the knowledge of larger exact numerical
quantities.
Figure 1.5: Size Effect. The performance on various numerical tasks becomes increasingly imprecise as the numbers involved get larger. (A) Rats
compared the number of lever presses to several targets. The dispersion of
the distribution of correct responses grows as the numerical target grows.
(B) Humans compared the numerosity of a dot pattern to a target. The
distribution of the errors shows a similar behavior, but the scale is different.
(Adapted from Ref. [DDLC98])
1.2 The Language of Numbers
1.2.1 The origins of a language for number
Not all languages have a numeral system. Some languages have quite simple
systems, capable of counting only to about 20 or even lower. In primitive
systems, the words have not always fully lost their non-numerical meanings.
So the word for 5 might also mean “(left) hand”; the expression for “+ 1”
might also mean “and another”; the expression for 10 might also mean “man”
or “whole” or “finished” or “right hand”.
In these systems, either all the numeral expressions are monomorphemic
(or at least do not contain more than one morpheme with a numerical interpretation), or a relatively low number, such as 2, 3, 4, 5, is used as a basis of
addition (or very much more rarely of subtraction or multiplication). Sometimes, after a base number appears in the counting sequence, it is used for
all higher numbers. But this is not always so, and there can be what appears
to be fairly random interspersing of morphologically complex numerals with
monomorphemic numerals.
In natural language numeral systems, monomorphemic numerals will be
called elementary symbols.
Numeral systems have evolved by successive small increments of linguistic
invention. The successive inventions are built somewhat roughly on the
pre-existing structures, so that growth marks can be seen in the resulting
developed systems. Languages, like organisms, can have vestigial characters
that lost all or most of their original function through evolution.
The early stage in the development of a numeral system almost certainly
relies on the practice of counting on body parts. Fingers are put in a one-to-one
correspondence with any set of items. The gesture of raising three fingers
comes to serve as a symbol for the quantity 3. In many aboriginal groups
there is a rich vocabulary of numerical gestures, which fulfill the same role
of a symbolic representation. Then, immediately beyond the gesture is the
naming. Naming a body part suffices to evoke the corresponding numeral. In
many societies in New Guinea, for example, the word for six is literally “wrist”.
In countless languages throughout the world the etymology of the word “five”
evokes the word “hand”. But the body parts are few in number; a very natural
limit is twenty, given by the sum of our fingers and toes. In order to go beyond
this limit we must make “infinite use of finite means”: we need a syntax that
allows larger numerals to be expressed by combining several smaller ones.
This often implies the choice of a base number, and the expression of larger
numbers by means of a combination of sums and products.
Most languages have adopted a base number such as 10 or 20 whose
name is often a contraction of smaller units (e.g. 10 as “two hands”). Once the new
form is established, it can itself enter into more complex constructions, and
contractions, as well as morphological and phonetic distortions, are possible.
The most familiar type of numeral system in better known languages is
decimal, and sometimes also partly vigesimal. This canonical type [Hur99]
has the following characteristics:
• Single words for 1 − 10,
• Use of addition to 10 for 11 − 19
• Use of multiplication by 10 (or 20), (and addition) for 20 − 99,
• Single words for higher bases, typically 100, 1000, and sometimes also
20.
We will use in the following the word canonical for such systems with any
base, but with the same regular rules of formation. Further characteristics,
common to both primitive and developed types of system, are:
• Complete coverage to some limit: there are no gaps in the counting
sequence;
• No ambiguity or homonymy (examples of ambiguous numerals are extremely scarce, if they occur at all);
• Little, if any, redundancy or synonymy (from a vast set of arithmetically possible combinations for any given number, typically there is
only a single well-formed numeral used in the canonical counting sequence. The occasional exception to this generalization occurs, as in
paraphrases like English one thousand one hundred versus eleven hundred);
• Recursion: expressions for higher numbers typically contain expressions
for lower numbers nested within them;
• Packing Strategy: the recursive possibilities are severely constrained by
a principle to the general effect that one builds on the highest valued
expression available (See [Hur87] for details and discussion).
In the following we will refer to the canonical numeral system as one with
base B and the properties just described, in a slightly simplified form.
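The regular formation rules listed above can be illustrated with a minimal sketch. It assumes a simplified reading in which numbers up to the base and all powers of the base are single words, and every other number is built on the highest power of the base that fits (our interpretation of the packing strategy for this example). The function name canonical_structure is hypothetical, and the sketch is not the formal definition of canonical systems used later in the Thesis.

```python
def canonical_structure(n: int, base: int = 10) -> str:
    """Return the additive/multiplicative structure of n in an assumed
    canonical system with the given base (illustrative simplification)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n <= base:
        return str(n)  # single word: 1 .. base
    # Packing strategy (simplified): build on the highest power that fits.
    power = base
    while power * base <= n:
        power *= base
    multiplier, remainder = divmod(n, power)
    # A power of the base taken once is itself a single word (e.g. 100).
    head = str(power) if multiplier == 1 else f"({multiplier} * {power})"
    if remainder == 0:
        return head
    return f"{head} + {canonical_structure(remainder, base)}"


print(canonical_structure(14))   # 10 + 4
print(canonical_structure(64))   # (6 * 10) + 4
print(canonical_structure(808))  # (8 * 100) + 8
```

The recursion on the remainder mirrors the Recursion property: the expression for 808 contains the expression for 8 nested within it.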
1.3 Linguistics of numeral systems
1.3.1 Composition of Numerals
Phrase-structure rules are a way to describe a given language’s syntax. They
are used to break a natural language sentence down into its constituent parts
(syntactic categories), namely phrasal categories (like noun phrases or verb
phrases) and lexical categories (parts of speech, like nouns, verbs, adjectives,
and so on) [Cho88] [Cho02]. With remarkable uniformity, the basic form
of most syntactically complex numerals in most languages can be generated
from a universal schema of just two simple phrase structure rules (Fig. 1.6).

Figure 1.6: Phrase structure rules for a syntax of numeral systems.

Here, “NUMBER” represents the category Numeral itself, the set of
possible numeral expressions in a language; “DIGIT” represents any single
numeral word up to the value of the base number (e.g., English one, two, ...,
nine); and “M” represents a category of mainly noun-like numeral forms used
as multiplicational bases (e.g., English -ty, thousand, and billion). The curly
brackets in the rules enclose alternatives; thus a numeral may be either a
DIGIT (e.g. eight) or a so-called PHRASE (numeral phrase) followed optionally
by another numeral (e.g., eight hundred or eight hundred and eight).
If a numeral has two immediate constituents (i.e., is not just a single word)
the value of the whole is calculated by adding the values of the constituents;
thus sixty four means 60 + 4. If a numeral phrase (as distinct from a numeral)
has two immediate constituents the value of the whole is calculated by
multiplying the values of the constituents; thus two hundred means 2 × 100.
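The interpretation rule just described (addition for the constituents of a numeral, multiplication for the constituents of a numeral phrase) can be sketched as a small evaluator. The tuple encoding of parse trees and the function name value are assumptions made for this example only; leaves are the numerical values of DIGIT or M words.

```python
def value(node) -> int:
    """Evaluate a numeral parse tree.

    A leaf is an int (the value of a DIGIT or M word). An inner node is a
    tuple ("NUMBER", ...) whose constituents are added, or a tuple
    ("PHRASE", ...) whose constituents are multiplied.
    """
    if isinstance(node, int):
        return node
    kind, *parts = node
    values = [value(part) for part in parts]
    if kind == "NUMBER":
        return sum(values)
    if kind == "PHRASE":
        product = 1
        for v in values:
            product *= v
        return product
    raise ValueError(f"unknown category: {kind}")


sixty_four = ("NUMBER", ("PHRASE", 6, 10), 4)        # six -ty four
two_hundred = ("PHRASE", 2, 100)                     # two hundred
eight_hundred_and_eight = ("NUMBER", ("PHRASE", 8, 100), 8)

print(value(sixty_four))               # 64
print(value(two_hundred))              # 200
print(value(eight_hundred_and_eight))  # 808
```

Here sixty is encoded as the phrase six × -ty, following the remark above that -ty belongs to the category M.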
1.4 The complexity of numeral systems
We will introduce and study in detail a model for the representation of
numeral systems as networks, which we will call Ω. This model captures the
essential aspects of numeral systems as symbolic systems:
• existence of elementary symbols
• compositionality
• recursivity
but does not introduce a priori the grammatical categories, as the
phrase structure rules of Fig. 1.6 do. We can consider this as a zero model for a
grammar of numeral systems.
Figure 1.7: These “rules” are indeed a zero model for a grammar of numeral
systems. They simply mean that a number can be represented with an
elementary symbol, or with a representation built upon the representations
of two other numbers.
In our approach, as a first step (Chapter 2) we will transform these rules
into a network. These networks are constructed starting from the real data
of natural language or written numeral systems, or from abstract models
introduced by us as analytical tools.
In this way we can see the issue of growing numeral systems as that of growing
networks [DM03].
Then, in Chapter 4 we will define a transformation that tries to reduce the
redundancies present in the original network, and obtains a compact version
of it. In this process the numbers that have the same role or are constructed
following similar patterns are grouped together. We will call these groups
Categories, and we will see that in some relevant cases (Chapter 5) we find
meaningful grammatical categories. The quantities related to the compact
version of the network will permit us to define a complexity for numeral
systems in a precise way.
Chapter 2
A network model for Numeral Systems
The aim of this Chapter is to introduce, and study in detail, a network model
that is capable of describing the “logical” organization of numeral systems as
a model for simple symbolic systems.
This model is capable of describing some elementary but very general properties
of symbolic systems. We do not make any reference to the meaning of
the symbols: the focus is on how complex symbols are put together in terms
of more elementary ones.
The fact that symbolic systems, in particular language, are composed of
elementary constituents that are combined together in order to form higher
and more complex expressions is known as compositionality and is one of the
common features of all non-trivial symbolic systems.
2.1 Ω symbolic systems
Ω symbolic systems are a mathematical model that we introduced in order
to describe the relevant features of simple systems of symbols, like numeral
words and sentences in natural language, or graphemes¹ and combinations of
these in written systems. We will call these elements simply symbols.
¹ A grapheme is a fundamental unit in a written language.
2.1.1 Axioms for Ω systems
An Ω symbolic system is a finite set of symbols that satisfies the following
axioms:
• there is a subset of elementary symbols that cannot be expressed in
terms of others; all the other symbols are said to be composed;
• composed symbols are the result of a unique binary operation; the
nature and the number of these operations depend on the system, but
their output is always a well defined element of the system. The input
symbols can be either elementary or composed.
In the set E, relationships are defined between elements that describe the
way composed symbols are constructed from “more elementary” ones. It is
natural to define a network structure, which we will call Ω, that describes these
relations.
2.2 Elements of Ω networks
Ω networks are intended to represent the “logic” behind the composition of
elementary symbols into complex ones. This logic is deduced by inspecting the
representations of these symbols living in the physical world, like signs traced
on paper, or linguistic sounds floating in the air. We will use the peculiar
notation Ψ in order to signal when we are talking about representations of
numerical concepts. So in natural language numeral systems Ψ is a function
that assigns words or phrases to natural numbers and, analogously, in written
systems it assigns strings of written symbols. We will see in Section 2.3
how to build the Ω networks from these representations.
The description of how elementary symbols are represented by means of
concrete objects, and how these representations are put together (and possibly
transformed) to construct the representations of composed symbols (See for
example [RMR08]) falls outside the scope of the Ω network model, which
abstracts from these details and concentrates on the logic behind the system.
Nevertheless these aspects are important in a study on the complexity of
numeral systems. We will not follow this line: we will only observe that a
physically realizable (and realistic) symbolic system cannot have an infinite
number of elementary symbols. This was stressed by Turing in his classic
1936 paper [Tur36], where he observed that in any realistic symbolic system
the concrete representations for the elementary symbols occupy a limited
(compact) space, as for example a small square of side length 1. Since any realistic
symbol occupies a non-zero area of this space, it is impossible to have infinitely
many symbols that are also distinguishable.
From now on we will concentrate only on how the symbols are organized
together into a system, and study the complexity of this organization.
2.2.1 Nodes
In Ω networks there are two kinds of nodes:
• circular nodes are representatives of natural numbers 0, 1, 2, . . . ; they
can also represent more complex (hierarchical) structures that we will
call Categories, but for the moment we will not consider this possibility
(See 3.1.1);²
• square nodes are representatives of arithmetical binary operations chosen
from an a priori fixed set $\{\star_1, \star_2, \dots, \star_r\}$.³
In this Thesis we will consider addition and multiplication as the
only possible arithmetical operations, {+, ×}, because these are sufficient
to discuss almost all known numeral systems [Hur87]. A notable
exception is the Roman written numeral system, in which
the possible operations are {+, −}, but we will not discuss it here.
The circular nodes inherit the “elementariness” from the numerical symbols
they represent. We usually colour circular nodes yellow (or orange) if
they are elementary, and green if they are composed.
2.2.2 Triple
Every composed number is the output of a binary operation. The two input
nodes have a direct link to the operational node, and this has a direct link to
the output node. Input numbers can be elementary or composed. The input
and output nodes, the operational node and the links are part of a unit that
we will call Triple. An example of this structure is illustrated in Fig. 2.1.
We call T (x) the Triple associated to the output node containing x (it is
defined only for composed numbers). Our networks are organized into such
units, as shown in Fig. 2.2.
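The Triple structure described above can be sketched as a small data structure. This is only an illustration under our own assumptions: the names `Triple`, `OPS`, `elementary` and `triples`, and the choice of storing an Ω network as a dictionary from composed numbers to their Triples, are hypothetical and do not come from the Thesis.

```python
from dataclasses import dataclass
import operator

# Hypothetical encoding of the two operations considered in the Thesis.
OPS = {"+": operator.add, "*": operator.mul}


@dataclass(frozen=True)
class Triple:
    """T(x): the output node x, the operation node and the two input nodes."""
    output: int
    op: str
    left: int
    right: int

    def is_consistent(self) -> bool:
        # The output must equal the operation applied to the two inputs.
        return OPS[self.op](self.left, self.right) == self.output


# A tiny network: 1 and 2 are elementary, 3 and 6 are composed.
elementary = {1, 2}
triples = {3: Triple(3, "+", 2, 1), 6: Triple(6, "*", 2, 3)}
```

In this sketch an input of a Triple may be elementary (2 and 1 in T(3)) or composed (3 in T(6)), as required by the text.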
Ω networks, with their characteristic Triple structure, are a natural network representation of symbolic systems that are well described by the phrase
structure rules of Fig. 1.7.
But, while the phrase structure rules used by linguists are meant to represent
all possible grammatical sentences in a language [Cho88], and are given in
terms of recursive relations between a priori recognized grammatical categories, Ω networks are built upon the actual sentences in a language, the
raw data, which in our case are given by the numerals or the written symbols
of a numeral system.
² The Categories have nothing to do with the concept of “category” in mathematics.
³ In this work we do not consider the possibility of a dynamic generation of operations,
for example under selective pressures or optimality principles, but it is a very interesting
direction of future research.
Figure 2.1: The Triple of the number x (T (x)). The elementary numbers are
associated to the yellow nodes and the composed numbers to the green
ones. The square node represents a binary operation chosen among the
available ones, typically {+, ×}. Only in special cases (for example in the
case of the Roman numeral system) is the inventory of possible operations
{+, −}.
A subset of Ω networks, that I will call Ω_Hurford networks, is a representation of the more restrictive phrase structure rules described in Fig. 1.6, which
are well suited for developed numeral systems.
These networks are no longer organized into Triples. Their recurrent structure is described in Fig. 2.3.
It is clear that the Ω_Hurford subset is very small with respect to Ω. We
choose the zero model Ω for two reasons. The first is that it is simpler to deal
with using analytical and numerical simulation tools. The second, which is the
real reason behind our approach, is that we don’t want to fix the grammatical
categories of numbers a priori. On the contrary we want them to emerge from
the topological properties of a “zero” model, that does not introduce a priori
any category but one: the number.
2.3 To build an Ω network
Now we have all the elements for building the networks of numeral systems.
We illustrate how to do that with some abstract and concrete models, among
which are two primitive numeral systems (Fuyuge and Miskito) and two developed ones (Italian and French).
Figure 2.2: The network of a small (randomly built) numeral system. The
yellow nodes represent elementary symbols; all composed numbers are built
upon them through the arithmetical operations {+, ×}. Every composed number forms a Triple with its operational node and the nodes of its parental
numbers. The networks associated to numeral systems are organized into
Triples.
2.3.1 Fuyuge and Miskito
Mafulu is the name of the people who live in a group of villages within and
near the north-westerly corner of the area of the Fuyuge-speaking people. Fuyuge is a
Papuan language, and may be regarded as one common language throughout
the Fuyuge area [Wil12]. We give the first few numerals of its numeral system,
which is essentially a base-2 one. This quite regular (and redundant)
structure is made visible in its associated Ω network (See Fig. 2.4).
• 1 = Fida (One).
• 2 = Gegedo (Two).
• 3 = Gegedo minda (Two and another).
• 4 = Gegedo ta gegedo (Two and Two).
• 5 = Gegedo ta gegedo minda (Two and Two and another) [or Bodo fida
(one hand)].
• 6 = Gegedo ta gegedo ta gegedo (Two and Two and Two).
• 7 = Gegedo ta gegedo ta gegedo minda (Two and Two and Two and
another) [or Bodo fida ta gegedo (one hand and Two)].
• 8 = Gegedo ta gegedo ta gegedo ta gegedo (Two and Two and Two and
Two) [or Bodo fida ta gegedo minda (one hand and Two and another)].
• 9 = Gegedo ta gegedo ta gegedo ta gegedo minda (Two and Two and
Two and Two and another) [or Bodo fida ta gegedo ta gegedo (one hand
and Two and Two)].
• 10 = Bodo gegedo (Two hands).
• 11 = Bodo gegedov’ u minda (Two hands and another).
• 12 = Bodo gegedo ta gegedo (Two hands and Two).
• 13 . . .
Figure 2.3: The fundamental unit of an Ω_Hurford network. This means that
a number (category Number) can be a digit (elementary symbol), or a more
complex numeral. In this case it is composed of a Phrase and (possibly)
another number, connected by an addition. The Phrase is always composed
of a numeral (category M) expressing a base or a power of it, and (possibly)
another number. In this case the operation between them is a multiplication.
Figure 2.4: The network of the Fuyuge system. We notice that the number 5
is represented in two different forms, one as an elementary symbol, one as
the output of 2 + 2 + 1. Synonymy is very rare in numeral systems
and is mainly present in primitive ones (one occasional exception occurs in
English, with numbers like one thousand one hundred, 1100, and its paraphrase
eleven hundred).
Miskito is an indigenous language of Central America, spoken by nearly
200,000 people in Nicaragua, Honduras and Belize. The Miskito numeral
system is essentially base-5 [Hal91], but there is evidence of base-2 and
base-6 structures [Hur99]. The irregularities (often called idiosyncrasies in
linguistic jargon) are reflected in the disordered structure of its Ω network
(See Fig. 2.5).
• 1 = Kum (One)
• 2 = Wal (Two)
• 3 = Yumhpa (Three)
• 4 = Wal Wal (Two Two)
• 5 = Matsip (Five)
• 6 = Matlalkahbi (Six)
• 7 = Matlalkahbi pura kum (Six + One)
• 8 = Matlalkahbi pura wal (Six + Two)
• 9 = Matlalkahbi pura yumhpa (Six + Three)
• 10 = Matawalsip (Ten)
• 11 = Matawalsip pura kum (Ten + One)
• 12 = Matawalsip pura wal (Ten + Two)
• 13 . . .
2.3.2 Italian and French systems
The most familiar type of numeral system is decimal, like the Italian system,
and sometimes also partly vigesimal, like the French system. The Italian numeral
system is of a canonical type for numbers less than $10^4$; then there is a
transition to a higher base (the superbase $10^3$).⁴ Its elementary symbols
correspond to the single words for 0, 1, . . . , 10
zero, uno, due, . . . , dieci
and for the following powers of ten: $10^2, 10^3, 10^6, 10^9, \dots$
cento, mille, un milione, un miliardo, . . . .
⁴ The insertion of a superbase is very common in developed systems (See [Hur87]).
Figure 2.5: The network of the Miskito numeral system.
The addition by 10 is used for 11, . . . , 19, the multiplication by 10 for 20, . . . , 90,
and multiplication followed by addition is used for 21, . . . , 29, . . . , 91, . . . , 99.
The Italian numeral system has a very regular (and redundant) network, as is evident from Fig. 2.6, where it is also possible to note the peculiar role played
by the base 10 and the unity.
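The three rules just stated for the Italian numerals below 100 can be written as a short function. This is a hedged sketch: the function name `italian_triple`, the tuple encoding `(op, left, right)` and the order of the two inputs are our own illustrative choices, and the sketch deliberately stops at 99.

```python
def italian_triple(x: int):
    """Return (op, left, right) for 11 <= x <= 99, or None if x is elementary."""
    if not 0 <= x <= 99:
        raise ValueError("this sketch covers 0..99 only")
    if x <= 10:
        return None                   # single words: zero, uno, ..., dieci
    if x < 20:
        return ("+", 10, x - 10)      # 11..19: addition by 10
    tens, units = divmod(x, 10)
    if units == 0:
        return ("*", tens, 10)        # 20, 30, ..., 90: multiplication by 10
    return ("+", tens * 10, units)    # 21..99: multiplication followed by addition
```

For example, 47 is first split into 40 + 7, and the input 40 is in turn the output of the Triple 4 × 10.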
The French counting system is partially vigesimal: 20 (vingt) is used as
a base number in the names of numbers from 60 to 99. The French word for
80, for example, is quatre-vingts, which literally means “four twenties”, and
soixante-quinze (literally “sixty-fifteen”) means 75.⁵ This system is comparable to the archaic English use of score, as in fourscore and seven, meaning
87, or threescore and ten, meaning 70. Belgian French and Swiss French
are different in this respect. In Belgium and Switzerland 70 and 90 are septante and nonante. In Switzerland, depending on the local dialect, 80 can be
quatre-vingts or huitante. In Belgium, however, quatre-vingts is universally
used.
⁵ This particular structure was introduced during the French Revolution as an attempt
to unify the different counting systems (mostly vigesimal near the coast, because of Celtic
and Viking influences).
Figure 2.6: A small part of the network of the Italian numeral system. Notice
the different treatment of number 1 for addition and multiplication. All the
nodes containing the same numbers are to be considered identified; we draw
them separately only for the sake of clarity.
The elementary symbols for French are 0, 1, 2, 3, 4, . . . , 10
zéro, un, deux, trois, quatre, . . . , dix
and $10^2, 10^3, 10^6, \dots$
cent, mille, un million, . . .
The seemingly independent symbols for 11, 12, . . . , 16
onze, douze, . . . , seize
are in reality, like the corresponding Italian numerals, derived from their
Latin ancestors. So we will consider them as
10 + 1, 10 + 2, . . . , 10 + 6.
Higher numbers 20 − 69 consist of a word for the multiple of 10 plus,
optionally, the word for a number 1 − 9 from the list above. The names of the
tens {20, 30, 40, 50, 60} are vingt, trente, quarante, cinquante, soixante.
These continue with {70, 71, 72, . . . , 79} (soixante-dix, soixante et onze,
soixante-douze, . . . ).
Notice that the et in 71 mimics the behaviour of 21, 31, . . . .
The French for 80 is quatre-vingts. Numbers 81 − 99 consist of quatre-vingt- (minus the -s) plus a number 1 − 19 (quatre-vingt-un, quatre-vingt-deux, . . . ,
quatre-vingt-dix, quatre-vingt-onze, . . . , quatre-vingt-dix-neuf).
The Ω network for the French system is less regular than the Italian
network; it shows the relics of the vigesimal system, emphasizing the role of
number 20, but at the same time it also shows the essentially decimal nature
of the French numeral system (See Fig. 2.7).
Figure 2.7: A small part of the network of French numeral system. Notice
the peculiar construction of the numbers 70 (soixante-dix), 80 (quatre-vingts)
and 90 (quatre-vingt-dix).
Now we introduce two idealized numeral systems that will have a very
important role in the following.
2.3.3 Holistic and Unary systems
In all realistic numeral systems there is, in a certain sense, a compromise
between the Holistic and the Unary system. In the evolution of numerical
symbols, as we saw in the preceding Chapter, there is evidence of the fact that
almost all written systems begin with a sequence of Unary-like symbols. On
the other hand each numeral system has a repertory of elementary symbols,
typically the first few numbers, and it is Holistic within this range.
The Holistic system’s Ω network is a set of isolated nodes, all of them
standing for elementary symbols (See Fig. 2.8). The Unary system instead
has only one elementary symbol, standing for the number 1, and all other
numbers are created by reiterated additions (See Fig. 2.9). These models are
somewhat abstract, but we will find them very useful in the development of
the theory, in that they are in a certain sense two extremal points in the
space of Ω networks.
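The two extremal systems are simple enough to be generated in a few lines. The following is a minimal sketch under our own assumptions: a network is returned as a pair (set of elementary symbols, dictionary of Triples), the function names are hypothetical, and in the Unary system each x is taken to be built as (x − 1) + 1, which is one way of reading "reiterated additions".

```python
def holistic(n_max: int):
    """Holistic system on 1..n_max: every number is elementary, no Triples."""
    return set(range(1, n_max + 1)), {}


def unary(n_max: int):
    """Unary system on 1..n_max: only 1 is elementary.

    Assumption of this sketch: x is the output of the Triple (x - 1) + 1,
    so building x requires building every smaller number first.
    """
    triples = {x: ("+", x - 1, 1) for x in range(2, n_max + 1)}
    return {1}, triples
```

With this encoding the number of elementary symbols is n_max in the Holistic system and 1 in the Unary system, and the number of Triples is 0 and n_max − 1 respectively.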
Figure 2.8: The Holistic system. If we consider the first few words in the
numeral systems of all cultures, in the great majority of cases we find that they are
independent and irreducible words, from a morphosyntactic point of view,
i.e. they are elementary symbols. This is so, for example, in Italian for the words
from 1 to 10, after which compositionality arises. There are exceptional cases, as for
example in Panjabi numerals, where the Holistic part of the system extends
up to 100. In this sense every numeral system has a Holistic part, but this
part can have very different extensions.
2.3.4 Positional systems with arbitrary base
Positional notation or place-value notation is a generalization of decimal notation to arbitrary base. These include binary (base 2) and hexadecimal
(base 16) notations used by computers as well as the base 60 notation of
Babylonian numerals. Indian mathematicians developed the Hindu-Arabic
numeral system, the modern decimal positional notation, in the 9th century.
Positional notation is distinguished from previous notations (such as Roman
numerals) for its use of the same symbol for the different orders of magnitude (for example, the “one’s place”, “ten’s place”, “hundred’s place”). This
greatly simplified arithmetic and led to the quick spread of the notation
across the world.
Figure 2.9: The Unary system. Notice that in order to construct any symbol
it is necessary to construct all the others first. The only exception is for 1,
that is the only elementary symbol.
In order to construct the Ω network for positional systems
we must first establish which numbers are elementary symbols. In positional
systems the digits and the base are to be considered elementary symbols,
because they are independent graphical signs. The other elementary symbols are the powers of the base. For any given natural number x the natural
decomposition in a positional system of size $N = B^k$ is
\[ x = B^{k-1} \times a_{k-1} + B^{k-2} \times a_{k-2} + \cdots + B^{0} \times a_{0} \]
where the $\{a_i\}$ are digits. In order to build the Ω network we first construct
the Triple
\[ T(x) : \left\{ x = B^{k-1} \times a_{k-1} + B^{k-2} \times a_{k-2} + \cdots + B^{0} \times a_{0} \right\} \]
and then the inner Triples corresponding to the two inner terms
\[ B^{k-1} \times a_{k-1}, \qquad B^{k-2} \times a_{k-2} + \cdots + B^{0} \times a_{0}. \]
This procedure is repeated iteratively, until all the Triples involved have
elementary symbols as input nodes.
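The iterative splitting described above can be sketched as a nested binary expression: the leading term on one side, all the remaining terms on the other. This is an illustration only. The names `digits`, `decompose` and `evaluate` are hypothetical, the nested tuples stand for the symbolic decomposition, and the sketch makes no attempt to decide which intermediate values are identified with existing nodes of the network.

```python
def digits(x: int, base: int, k: int):
    """Digits a_{k-1}, ..., a_0 of x in the given base, padded to k places."""
    return [(x // base**i) % base for i in range(k - 1, -1, -1)]


def decompose(x: int, base: int, k: int):
    """Nested form of x = B^(k-1) * a_(k-1) + (B^(k-2) * a_(k-2) + ...)."""
    ds = digits(x, base, k)
    terms = [("*", base ** (k - 1 - i), a) for i, a in enumerate(ds)]
    expr = terms[-1]                      # innermost term B^0 * a_0
    for term in reversed(terms[:-1]):
        expr = ("+", term, expr)          # leading term + all remaining terms
    return expr


def evaluate(expr):
    """Evaluate a nested (op, left, right) expression back to an integer."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs + rhs if op == "+" else lhs * rhs
```

For instance, in base 4 with k = 3 the number 23 has digits 1, 1, 3, so `decompose(23, 4, 3)` gives `("+", ("*", 16, 1), ("+", ("*", 4, 1), ("*", 1, 3)))`. Every multiplicative term has a power of the base and a digit as inputs, which is where the splitting stops.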
The Ω network associated to a positional system is then clearly highly
redundant, and its topology must reflect a special role of the digits and the
powers of the base. We observe that all the digits are treated on the same
footing in these networks, in particular there is no special role for number 1.
2.3.5 Canonical systems
We described canonical systems in Chapter 1; they are a class of models
inspired by the structure of the decimal system, when incorporated in natural
languages. They reserve a special role to the symbols for unity and zero. The
unity does not appear in multiplicative expressions, and the zero is a stand-alone elementary symbol (note that in positional systems the zero is highly
connected to the rest of the network: it is indeed a hub). The Ω networks
associated to positional systems are highly ordered, as we can see in Figs. 2.10
and 2.11.
Figure 2.10: The Ω network associated to the canonical (base 4) numeral
system with size 64 ($4^3$). The elementary symbols are painted in orange.
Figure 2.11: A larger part of the Ω network for the canonical (base 4) system (size
$N = 256 = 4^4$).
2.3.6 Primes systems
Prime numbers are the “building blocks” of natural numbers. Their crucial
importance in mathematics, and particularly in number theory, stems from
the fundamental theorem of arithmetic, which states that every positive integer larger than 1 can be written as a product of one or more primes in a way
which is unique except possibly for the order of the prime factors [GW79].
For example, we can write
666 = 2 × 3 × 3 × 37
We will adopt a standard factorization in which the prime factors are
in increasing order ($p_1 < p_2 < \cdots < p_r$) so that for any given number n
we will have
\[ n = p_1^{\mu_1} p_2^{\mu_2} \cdots p_r^{\mu_r} \qquad (2.1) \]
where $\mu_i$ is the multiplicity of the i-th prime factor.
Our aim here is to use this decomposition as a means to represent natural
numbers. We imagine that every prime p is associated to an independent
symbol Ψ(p) and the expression of a number n is given by the string formed
by all the symbols $\Psi(p_i)$ of its decomposition, each drawn a number of times equal
to the multiplicity $\mu_i$ of the relative prime factor. For example in the case of
666 the representation will be given by
Ψ(666) = Ψ(2)Ψ(3)Ψ(3)Ψ(37).
We can associate a network to this representation in the following way.
The elementary symbols, as we said, are the primes; for all other numbers we
define their Triple T (n) as formed by $p_m \times \frac{n}{p_m}$, where $p_m$ is the smallest prime
factor of n, taken with multiplicity 1. The resulting network has interesting
properties, as we will see in the rest of this Chapter. Like the Unary and the
Holistic, we can consider the Prime system’s Ω network as a frontier point
in the space Ω.
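The construction of the Prime system can be sketched directly from the rule "smallest prime factor times the cofactor". The function names and the (elementary set, Triple dictionary) encoding below are our own illustrative assumptions; the numbers 0 and 1 are simply left out of the sketch.

```python
def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n (n >= 2), found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n): n itself is prime


def prime_system(n_max: int):
    """Elementary symbols (primes) and Triples n = p_m * (n / p_m) for 2..n_max."""
    elementary, triples = set(), {}
    for n in range(2, n_max + 1):
        p = smallest_prime_factor(n)
        if p == n:
            elementary.add(n)
        else:
            triples[n] = ("*", p, n // p)
    return elementary, triples


def representation(n: int, triples: dict):
    """The primes appearing in Ψ(n), in increasing order and with multiplicity."""
    if n not in triples:
        return [n]
    _, p, rest = triples[n]
    return [p] + representation(rest, triples)
```

Following the Triples from 666 one finds 666 = 2 × 333, 333 = 3 × 111 and 111 = 3 × 37, which reproduces the string Ψ(2)Ψ(3)Ψ(3)Ψ(37) given above.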
Let us consider the set of integers divisible by a prime p. The probability
that, extracting a random integer (with uniform measure), we find a number
in this set is clearly $\frac{1}{p}$. Take now the set of integers divisible by both p and
q, where q is another prime. To be divisible by p and q is equivalent to being
divisible by pq, and consequently the probability of this set is $\frac{1}{pq}$. Since
\[ \frac{1}{pq} = \frac{1}{p} \times \frac{1}{q} \]
we can interpret this by saying that the “events” of being divisible by p
and q are independent, and so, in a certain sense, “primes play a game of
chance” [Kac59]. This is the beginning of a new development which links in
a significant way number theory and probability theory.
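The independence statement can be checked on a finite window of integers. This is a small sketch under an explicit assumption: since there is no uniform measure on all the integers, we count inside 1..n_max with n_max a multiple of pq, so that the densities are exact fractions. The names `density`, `p`, `q`, `n_max` and `both` are illustrative.

```python
from fractions import Fraction


def density(d: int, n_max: int) -> Fraction:
    """Fraction of the integers 1..n_max that are divisible by d."""
    return Fraction(n_max // d, n_max)


p, q = 3, 5
n_max = 15_000  # a multiple of p * q, so that the densities are exact

# Direct count of the integers divisible by both p and q.
both = sum(1 for n in range(1, n_max + 1) if n % p == 0 and n % q == 0)
```

Here `Fraction(both, n_max)` equals `density(p, n_max) * density(q, n_max)`, that is 1/15 = 1/3 × 1/5.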
One of the earliest findings in the probabilistic properties of prime numbers is that the number of prime numbers lower than N, usually indicated
with π(N), is asymptotically equal to $\frac{N}{\log N}$. This is the celebrated Prime
Number Theorem, obtained independently by Hadamard and de la Vallée
Poussin in 1896.
Another probabilistic result in number theory that we will find useful for
studying the Ω network of the primes is described below. Let us define ω(n)
as the number of different prime factors of n and Ω(n) as its total number of
prime factors⁶; thus, referring to equation 2.1,
\[ \omega(n) = r, \qquad \Omega(n) = \mu_1 + \mu_2 + \cdots + \mu_r \]
⁶ This is the reason for the name Ω given to the networks representing numbers: it
is an homage to these functions and, incidentally, to Cantor's first transfinite ordinal
number ω.
Figure 2.12: The Ω network associated to the Prime numeral system with
size 100. The presence of three hubs is evident, corresponding to the numbers
2 (in the middle part of the picture), 3 (upper left) and 5 (upper right). The
orange numbers are the primes.
Both ω(n) and Ω(n) behave irregularly for large n, and both functions
are 1 when n is prime, while, for example,
\[ \Omega(n) = \frac{\log n}{\log 2} \]
when n is a power of 2.
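The two functions are easy to compute from the factorization (2.1). A minimal sketch follows, with hypothetical function names (`factorize`, `small_omega`, `big_omega`) and plain trial division, which is adequate only for small n.

```python
def factorize(n: int) -> dict:
    """Prime factorization of n >= 2 as a dictionary {prime: multiplicity}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:                       # what is left is a prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors


def small_omega(n: int) -> int:
    """ω(n): number of distinct prime factors, r in equation (2.1)."""
    return len(factorize(n))


def big_omega(n: int) -> int:
    """Ω(n): number of prime factors counted with multiplicity."""
    return sum(factorize(n).values())
```

For 666 = 2 × 3 × 3 × 37 this gives ω = 3 and Ω = 4, while for the power of two 1024 it gives Ω = 10 = log 1024 / log 2.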
Although the behavior of ω(n) and Ω(n) is erratic (see Fig. 2.13), both
these functions show a statistical regularity, captured by a theorem (see
[GW79], Theorem 430, page 355): the average order⁷ of both ω(n) and Ω(n)
is log log n.
⁷ In number theory, the average order of an arithmetic function is often studied by means
of some simpler or better-understood function which takes the same values on average.
Let f be an arithmetic function. We say that the average order of f is g if
\[ \sum_{n \le x} f(n) \sim \sum_{n \le x} g(n) \]
as x tends to infinity.
Figure 2.13: [Plot of LD[n] against n, for n from 0 to 300.] Given an integer n, Ω(n) is the number of its prime factors,
counted with their multiplicity. On the y axis we report the related function
LD(n) = Ω(n) − 1 (See Sec. 2.4.5). The erratic behavior of the Ω(n) function
is evident.
More precisely
\[ \sum_{n \le N} \omega(n) = N \log\log N + B_1 N + o(N) \qquad (2.2) \]
\[ \sum_{n \le N} \Omega(n) = N \log\log N + B_2 N + o(N) \qquad (2.3) \]
where the constant $B_1$ is the Mertens constant (see [GW79]) and $B_2$ can
be expressed in terms of $B_1$ by
\[ B_2 = B_1 + \sum_{k=1}^{\infty} \frac{1}{p_k (p_k - 1)} \]
$p_k$ being the k-th prime in the natural ordering.
Their values are approximately
\[ B_1 = 0.26149, \qquad B_2 = 1.03465. \]
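The asymptotic formulas for the sums of ω(n) and Ω(n) can be compared with a direct count. The sketch below is only a numerical illustration: the sieve, the function names `omega_sums` and `predicted`, and the choice of N are ours, and the constants are the approximate values quoted in the text. Since the formulas hold up to a o(N) term, only a rough agreement should be expected at moderate N.

```python
import math

B1, B2 = 0.26149, 1.03465  # approximate constants quoted in the text


def omega_sums(n_max: int):
    """Return (sum of ω(n), sum of Ω(n)) over 2 <= n <= n_max, using a sieve."""
    small = [0] * (n_max + 1)   # small[n] will hold ω(n)
    big = [0] * (n_max + 1)     # big[n] will hold Ω(n)
    for p in range(2, n_max + 1):
        if small[p] == 0:       # no smaller prime divides p, so p is prime
            for m in range(p, n_max + 1, p):
                small[m] += 1
            pk = p
            while pk <= n_max:  # every power p^k dividing m adds one to Ω(m)
                for m in range(pk, n_max + 1, pk):
                    big[m] += 1
                pk *= p
    return sum(small), sum(big)


def predicted(n_max: int, b: float) -> float:
    """Leading terms N log log N + b N of the two asymptotic formulas."""
    return n_max * math.log(math.log(n_max)) + b * n_max
```

One possible use is to compare `omega_sums(10_000)` with `predicted(10_000, B1)` and `predicted(10_000, B2)`; the ratios are close to 1, with the exact sums lying slightly below the leading terms.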
Figure 2.14: A larger part (size $N = 10^3$) of the network associated to the
Prime system. Every prime number plays the role of a hub in such a network.
Here only two hubs (2 and 3) are distinguishable.
2.3.7 The ensemble of random numeral systems
Random numeral systems are artificial systems that we introduce for two
reasons:
• to study the properties of Ω networks associated to human-created
systems by comparison with those of a random ensemble;
• to study the properties of the space of Ω networks.
The creation of a random system is obtained by a random growth from
the only preexisting (and necessary) symbol that we suppose to be available
at the beginning: the 1. We randomly create a Triple for each x, starting
from 2, then for 3, etc. Once a number x has its symbolic representation
(once its Triple is created) it is available for the creation of the representations
of higher numbers. We add the elementary symbol for 0, but it is an isolated
node, and has no function or role in this system.
The creation of a random system depends on two parameters:
• the probability that at each step an elementary symbol is created, that
we will call ε;
• the probability of creating a + Triple, once we know that a Triple
must be created (when the symbol is not elementary). We will call this
probability p.
To be more precise we give here the pseudocode for its generation:
• x = 0 and x = 1 are always elementary symbols, so
T (0) : {0} , T (1) : {1}
• for all other x: T (x) : {x} with probability ε and, if this is not the
case:
– if x is prime, create a random Triple with the + operation, T (x) :
{x = z + y};
– if x is not prime, extract an operation ⋆ from {+, ×} with a Bernoulli
distribution with parameters {p, 1 − p}, and create a random Triple
T (x) : {x = z ⋆ y} with the operation extracted.
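The pseudocode above can be turned into a short runnable sketch. Several details are not fixed by the text and are assumptions of this sketch, not the procedure actually used in the Thesis: a system of size N is taken to contain the numbers 0, . . . , N − 1; for a + Triple the split x = z + y is drawn uniformly with z between 1 and x − 1 (so the isolated 0 is never used); for a × Triple z is drawn uniformly among the proper divisors of x. All function and variable names are hypothetical.

```python
import random


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))


def random_system(size: int, eps: float, p: float, seed=None):
    """Sample one system from E(eps, p) on the numbers 0..size-1."""
    rng = random.Random(seed)
    elementary, triples = {0, 1}, {}      # 0 and 1 are always elementary
    for x in range(2, size):
        if rng.random() < eps:            # new elementary symbol
            elementary.add(x)
            continue
        # A prime has no non-trivial factorization, so it must use +.
        use_plus = is_prime(x) or rng.random() < p
        if use_plus:
            z = rng.randint(1, x - 1)     # assumption: uniform split, 0 excluded
            triples[x] = ("+", z, x - z)
        else:
            divisors = [d for d in range(2, x) if x % d == 0]
            z = rng.choice(divisors)      # assumption: uniform over proper divisors
            triples[x] = ("*", z, x // z)
    return elementary, triples
```

Because the numbers are processed in increasing order, both inputs of every Triple are smaller than its output and are therefore already available when the Triple is created.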
The ensemble so defined depends on the two parameters (ε, p), and we
will denote it E (ε, p). When we want to compare a random numeral system of
size N with other systems of the same size, we will set the parameters (ε, p)
in such a way that π (the number of elementary symbols) and the number of + nodes are the same, on
average. If we indicate with ⟨π⟩ the average number of elementary symbols
in such a random network we have
\[ \langle \pi \rangle = 2 + \varepsilon (N - 2) \qquad (2.4) \]
and when we want to compare a random system with another system in
which a certain π is given, we will have to use the correct ε:
\[ \varepsilon = \frac{\pi - 2}{N - 2} \qquad (2.5) \]
As regards the parameter p, in a network with a certain number of +
nodes, called $\pi_{\star+}$ in the following, we find that the right p is
\[ p = \frac{\pi_{\star+}}{N - 2} \qquad (2.6) \]
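Equations (2.4), (2.5) and (2.6) translate into a small helper that returns the parameters of the random ensemble matched to a given system. The function names are hypothetical, and the numbers in the example are invented purely to illustrate the arithmetic.

```python
def expected_pi(eps: float, size: int) -> float:
    """Average number of elementary symbols, equation (2.4)."""
    return 2 + eps * (size - 2)


def matching_parameters(pi: int, n_plus: int, size: int):
    """Return (eps, p) from equations (2.5) and (2.6).

    pi is the number of elementary symbols and n_plus the number of
    + nodes of the system to be matched; size is N.
    """
    eps = (pi - 2) / (size - 2)
    p = n_plus / (size - 2)
    return eps, p


# Invented example: a system of size 100 with 12 elementary symbols
# and 49 + nodes.
eps, p = matching_parameters(pi=12, n_plus=49, size=100)
```

In this example ε = 10/98 and p = 1/2, and inserting ε back into (2.4) returns ⟨π⟩ = 12.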
Figure 2.15: The Ω network associated to a Random numeral system with
size 100. The elementary symbols are painted in orange. (Parameters of the
ensemble: ε = 0.1, p = 1/2.)
2.4 Elementary observables of the Ω networks
Now that we have introduced some models of numeral systems we develop
the tools for their analysis. In this section we will introduce simple definitions
and use standard tools from the theory of complex networks [RB02] [New03]
[CRTVB07].
2.4.1 The number of elementary symbols
The first quantity that we will consider is the number of elementary symbols,
that we will indicate with π.⁸ This is one of the most relevant quantities
regarding numeral systems and their networks: the number of elementary
symbols has to do with the memory resources required to learn to use such
systems.
⁸ This coincides with the notation introduced for the Prime system.
Figure 2.16: A larger part (size $N = 10^3$) of the network associated to a
Random system. We note the absence of any structure.
For any given Ω network ω we define:
\[ \pi(\omega) = \left| \{ n \in \omega \mid n \text{ is elementary} \} \right| \qquad (2.7) \]
where the operation | · | extracts the cardinality of the set within the curly
brackets. As we will see in Sec. 6, this definition admits a generalization that
reduces to the present definition when the networks involved are simple.
Let us now consider some examples. In the Holistic system π = N, while
in the Unary system, which can be considered to lie at the opposite side of the space
Ω, we have π = 1. In positional and canonical systems the growth of the
function π(N) is logarithmic, after an initial linear behavior. This is due
to the fact that the first numbers are also the digits (elementary symbols), and
a new elementary symbol is introduced whenever the size exceeds a power
of the base. In the Prime system the situation is more intriguing: as we
saw in Sec. 2.3.6, the number of primes lower than a given N is an
irregular function that behaves asymptotically like $\frac{N}{\log N}$, by the Prime Number
Theorem. Finally, for random systems the function π(N) is stochastic but its
mean value grows linearly: ⟨π(N)⟩ = 2 + ε (N − 2). We can see that, just
considering a very simple aspect of Ω networks, and a limited set of models,
the phenomenology of Ω networks is already very rich.
2.4.2 The description of a simple Ω network
In order to completely specify an Ω network we must give its elementary
symbols and a description of its wiring diagram (i.e. a description of how
the nodes are linked together). Since Ω networks are organized into Triples,
a description of the wiring diagram is a list of these Triples. We stipulate
that the descriptional length of an elementary symbol is equal to 1, and the
same convention is made for the descriptional length of a Triple.⁹
Since the number of Triples is N − π, the descriptional length of a (simple)
Ω network is given by
L = π + (N − π) = N.
We will see in the development of the Thesis how this concept of descriptional length will be useful. At this point the descriptional length is the
same for all Ω networks, independently of their level of organization or of
randomness: it depends only on their size.
2.4.3 Degree sequence and its distribution
The degree is an essential property of a node. Let us consider the node representing a number x in an Ω network. If x is an elementary symbol then
$k_{in}(x) = 0$; otherwise, if it is a composed symbol, $k_{in}(x) = 1$. {0, 1} are the
only two possible values because we have excluded the possibility of having
“synonyms”, i.e. multiple different ways to construct the same number. The
possible values of $k_{out}$ instead are all non-negative integers; if x is a dominant
element, its $k_{out}$ will be high with respect to the others. Such dominant elements are often called hubs in the language of complex networks and usually
play key roles in networked systems. As we saw in Sec. 2.3.5, in canonical
systems the base number and its powers are hubs in the corresponding network. The possible values for the degrees $k_{in}$ and $k_{out}$ of a node are described
in Fig. 2.17.
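The degrees can be read off a list of Triples with a simple count. The sketch below assumes, as an illustration, that a network is given as a dictionary from each composed number to a tuple (op, left, right), and that a number used twice in the same Triple (as in 4 = 2 + 2) contributes two outgoing links; both conventions and all names are ours.

```python
from collections import Counter


def in_degree(x: int, triples: dict) -> int:
    """k_in(x): 0 for an elementary symbol, 1 for a composed one."""
    return int(x in triples)


def out_degrees(triples: dict) -> Counter:
    """k_out(x): number of links leaving x towards operation nodes."""
    k_out = Counter()
    for _, left, right in triples.values():
        k_out[left] += 1
        k_out[right] += 1   # assumption: in x = z op z the input z counts twice
    return k_out


def degree_distribution(k_out: Counter, nodes: list) -> dict:
    """P(k): fraction of the given nodes that have out-degree k."""
    counts = Counter(k_out[x] for x in nodes)
    return {k: c / len(nodes) for k, c in counts.items()}


# Example: a Unary-like system on 1..5, where x is built as (x - 1) + 1.
unary_triples = {x: ("+", x - 1, 1) for x in range(2, 6)}
k = out_degrees(unary_triples)
```

In this example the only elementary symbol, 1, is the hub (k_out = 5), the numbers 2, 3 and 4 have k_out = 1, and the largest number has k_out = 0.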
⁹ Other choices are possible, like for example to assign length 3 to the description of a
Triple, but such differences are immaterial.
Figure 2.17: The generic node x and its possible degree values $k_{in}$ and $k_{out}$.
In the following we will always refer to the $k_{out}$ of a node as its degree,
because the $k_{in}$ only signals if a symbol is elementary or not. The $k_{out}$ is one
of the fundamental observables in studying the structure of Ω networks. We
will analyze the $k_{out}$ as a function of x (degree function) and its probability
distribution $P(k_{out})$ (often replaced with P (k)). For example the degree
function for the canonical decimal system is reported in Fig. 2.18.
Figure 2.18: [Log-log plot of k_out(x) against x, for x from 1 to 10000.] The degree function for the canonical base 10 system. We can
observe the hubs in correspondence of the powers of the base and of their
multiples. The recursive structure of its Ω network is reflected in the self-similar structure of the degree function graph.
We analyzed the P (k) for all the models of numeral systems introduced
in the preceding section. We see an example in Fig. 2.19, where the P (k) of
a base 4 canonical system is reproduced.
Figure 2.19: [Log-log plot of P(k) against k, for k up to 4096.] The P (k) of a base 4 canonical system. The tail of the distribution is generated by the presence of the hubs, corresponding to the higher
powers of the base.
2.4.4 The Tree
Let us consider a number x and its representation in its Ω network.
If x is composed, it will be the output of a binary operation involving its
parental nodes. The latter, in turn, can be either elementary or composed, and so on. If we pick out the part of the Ω network consisting of all
the Triples that are “upstream” of a certain number x we obtain its Tree
(See Fig. 2.20), and the function so defined will be called Tree(x).
Definition 1 (Tree) The Tree of a number x ∈ Ω is the set constituted by
all the Triples that are upstream of x.
The Tree(x) retains all the information about the decomposition of x in
terms of its elementary constituents. At the level of representation, Ψ(x) is
typically composed of a certain combination of the representations of the
elementary symbols that are the leaves of Tree(x). Every elementary symbol
leaves a trace in this representation; for example, in the Italian system the
representation of a composed number like 234 is given by the numeral phrase:
Ψ(234) = duecentotrentaquattro
in which the representations of the elementary symbols 2 (due), 100 (cento),
3 (tre), 10 (-enta), 4 (quattro) are recognizable. The main characteristic of
the Tree is its length. In the next Section we will explore this property.
Figure 2.20: The Tree of a number (275) in a (randomly generated) Ω network.
The length of the Tree will be interpreted as the analogue of the length of a
computation. This computation starts from certain available numbers that
do not need to be computed (the elementary symbols), and is
described by the sequence of operations in Tree(x). This length will also be
called logical depth, by analogy with a complexity measure introduced in the
context of Algorithmic Information Theory (or Kolmogorov Complexity).
2.4.5 The logical depth
The logical depth is a complexity measure introduced by Charles Bennett
[Ben88] that is rooted in the theory of Kolmogorov Complexity [LV97]
[Par03] and, roughly speaking, measures the time required for computing
a number from the shortest program that generates it.
We will use the term logical depth in the context of our Ω networks in
analogy with Bennett’s logical depth, because the number of operations
needed to “compute” a number x in a given numeral system is the length of
its Tree. Consequently this length can be thought of as the computational time
required for building an object from its minimal representation (see footnote 10).
We stress that this is only an analogy: the logical depth is a sophisticated
concept rooted in Kolmogorov Complexity theory, and its definition requires
mathematical rigour. Our main focus will not be on the logical depth of a
single number x, but on its average over the whole network, which we will
call πC (or sometimes ALD, depending on the situation). The value of
πC is a measure of the mean cognitive effort required to build the
representations of numbers in a given system, and it depends mainly on two
factors:
• the topology of the wiring diagram;
• the number of elementary symbols π.
The tendency is that disordered Ω networks, corresponding to systems
with a high number of intricate rules (for example random systems),
show a high value of πC compared with ordered ones, corresponding to
systems with a few, “intelligent” rules. On the other hand, a large number of
elementary symbols (high π) tends to lower πC, with the caveat
that the effectiveness of raising the number of elementary symbols
depends on the LD of the numbers that are promoted to the “elementariness”
condition (See Fig. 2.32). Large values of π require a proportional amount
of memory resources, in order to remember the elementary symbols. In fact
we find that the developed numeral systems in natural languages, which evolved
under the pressure of lowering the cognitive effort required for their use, are
highly ordered and simultaneously make use of a low number of elementary
symbols: these two factors both work in the direction of lowering the demands
made on our cognitive system.
Footnote 10: This is true only in systems that make an “optimized” use of the resources, corresponding to an optimized Ω network.
Definition 2 (Logical depth (narrow sense)) The logical depth of a number x is the number of arithmetical operations contained in its Tree:
LD(x) = |{operations in Tree(x)}|.
A number x is an elementary symbol if and only if its logical depth is
zero:
{x is elementary} ⇔ {LD(x) = 0}.
In terms of LD we can express other quantities, for example the number
of elementary symbols contained in Tree(x), counted with their multiplicity:
|{elementary symbols in Tree(x)}| = LD(x) + 1,
as we can see from Fig. 2.20. The number of elementary symbols π can be
easily rewritten in terms of the LD function:
\pi = \sum_{x=0}^{N-1} \delta(LD(x), 0)
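The relations above can be checked on a concrete case with a short Python sketch. It assumes a canonical decomposition rule (digits and powers of the base are elementary, any other x is d·p + r with p the largest power of the base not exceeding x); this rule is our assumption, chosen because it reproduces the elementary symbols 2, 100, 3, 10, 4 of 234 and the value LD(4598) = 6 quoted in Fig. 2.21. All function names are hypothetical.

```python
def canonical_triple(x, base=10):
    """None if x is elementary, otherwise (op, a, b) with a op b == x (assumed rule)."""
    if x < base:
        return None
    p = 1
    while p * base <= x:
        p *= base                      # largest power of the base <= x
    d, r = divmod(x, p)
    if r:
        return ("+", d * p, r)
    return None if d == 1 else ("*", d, p)

def tree(x, base=10):
    """Triples upstream of x, listed from the leaves down to x."""
    triple = canonical_triple(x, base)
    if triple is None:
        return []
    op, a, b = triple
    return tree(a, base) + tree(b, base) + [(x, op, a, b)]

def leaves(x, base=10):
    """Elementary symbols of Tree(x), counted with their multiplicity."""
    triple = canonical_triple(x, base)
    if triple is None:
        return [x]
    _, a, b = triple
    return leaves(a, base) + leaves(b, base)

def logical_depth(x, base=10):
    """LD(x): number of operations in Tree(x)."""
    return len(tree(x, base))

def count_elementary(n, base=10):
    """pi as the sum over x in 0..n-1 of delta(LD(x), 0)."""
    return sum(1 for x in range(n) if logical_depth(x, base) == 0)

print(tree(234))            # Triples of 200, 30, 34 and 234
print(leaves(234))          # [2, 100, 3, 10, 4]
print(logical_depth(4598))  # 6
```

In this sketch no Triple is shared between two branches of a Tree, so counting operations with or without multiplicity gives the same LD; in a generic Ω network the two counts may differ.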
In conclusion the LD function contains a lot of information regarding Ω
networks, together with its probability distribution P (LD) (See Fig. 2.21).
Figure 2.21: The probability distributions P(LD) of LD in two completely different
systems (Random and Canonical Base 10). In the random system (red) we observe a Gaussian behaviour of
the LD around a quite high value (around 14). The size N of these networks
is 10^4. The random system was generated with an ε value (probability of
creating an elementary symbol) ten times higher than the value suitable for
a comparison (See eq. 2.5). The canonical base 10 network shows a very
small modal value (6) of the LD distribution: this contribution is given by
numbers like 4598 (for which LD = 6).
Figure 2.22: The Prime system shows a Poissonian distribution of the LD (Poissonian fit with ⟨LD⟩ = 2.37046).
The size of the network used is N = 5 × 10^5; notice that the average value
of LD is very small compared to other systems. This means that the
representation of a number in terms of its prime factor decomposition is
highly compressed, but this is paid for in terms of a large number of elementary
symbols. The number of primes lower than N in fact grows asymptotically
as N / log N: this is the Prime Number Theorem, obtained independently by
J. Hadamard and de la Vallée Poussin (1896).
Given a number x, its logical depth refers to the way this element is
constructed from the elementary symbols. Sometimes it can
be useful to measure the depth relative to other numbers which are not
necessarily elementary:
Definition 3 (Relative logical depth) The logical depth of a number x ∈
ω, relative to a set of numbers y1, y2, ..., yr:
LD(x | y1, y2, ..., yr)
is the logical depth of x computed as if all the numbers y1, y2, ..., yr were
elementary.
Obviously this definition differs from that of logical depth only when at
least one of the yi is composed.
Let us consider the case in which a single reference y is involved. If x is
not in the set of numbers that are “downstream” of y we put
LD(x | y) = ∞,
meaning that it is impossible to construct the symbol x starting from the
symbol y within the Ω network. This is the case, for example, when x
belongs to the set of elements that are upstream of y. So we can state that
LD(x | y) is not a symmetric function of its arguments.
Definition 4 (Logical independence) Two symbols (x, y) will be called
logically independent if and only if
LD(x | y) = LD(y | x) = ∞ (2.8)
This is always the case when (x, y) are two elementary symbols, and the
meaning is that it is impossible to construct one of them starting from the
other. Metaphorically, the quantity LD(x | y) measures how far a certain
result x, considered as the final point of a computation, is from a premise y,
when the latter is taken as the starting point for a manipulation. When this
quantity is infinite it means that, at least in this system, x is not obtainable
from y or, which is the same, that it is obtainable only in an infinite number of
operations (and consequently in infinite time).
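A small Python sketch may help to fix the ideas on the relative logical depth and on logical independence. The network below is a hypothetical fragment (the numbers upstream of 234 under a canonical decimal decomposition), and we assume that LD(x | x) = 0; both choices are ours and serve only as an illustration.

```python
import math

# Hypothetical fragment of an Omega network: None marks an elementary symbol.
network = {
    2: None, 3: None, 4: None, 10: None, 100: None,
    30: ("*", 3, 10),
    34: ("+", 30, 4),
    200: ("*", 2, 100),
    234: ("+", 200, 34),
}

def upstream(x):
    """All elements that appear in the Tree of x (x itself excluded)."""
    triple = network[x]
    if triple is None:
        return set()
    _, a, b = triple
    return {a, b} | upstream(a) | upstream(b)

def relative_depth(x, y):
    """LD(x | y): depth of x computed as if y were elementary."""
    if x != y and y not in upstream(x):
        return math.inf              # x is not downstream of y
    def depth(z):
        if z == y or network[z] is None:
            return 0
        _, a, b = network[z]
        return 1 + depth(a) + depth(b)
    return depth(x)

def independent(x, y):
    """True when neither symbol can be constructed starting from the other."""
    return relative_depth(x, y) == math.inf and relative_depth(y, x) == math.inf

print(relative_depth(234, 30), relative_depth(30, 234))   # 3 inf
print(independent(2, 3), independent(200, 34))            # True True
```

The first line of output shows the asymmetry of LD(x | y); the second shows that, in this fragment, two composed symbols such as 200 and 34 can also be logically independent.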
We now discuss an important concept, the average logical depth πC
(sometimes also referred to as ALD). We define it as the logical depth averaged
over all the numbers of an Ω network.
Definition 5 (Average logical depth) The average logical depth of ω is
the quantity
\pi_C(\omega) = \frac{1}{N} \sum_{x} LD(x), (2.9)
where N is the size of ω.
The value of πC is the average number of operations that are necessary in
order to build the representations of the different numbers in a given numeral
system. The ω for which πC is absolutely minimal is clearly the one with
all isolated nodes, corresponding to the Holistic system: for all its numbers
LD = 0. The price paid in order to reach optimality from the side of πC
is a high number of elementary symbols, or in other words a high value of
π. These two functionals typically push in opposite directions.
We studied πC in all the models we introduced. In canonical systems,
for example, it has a logarithmic behaviour, as we can see in Fig. 2.23.
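The slow growth of πC in a canonical system can be explored numerically with the following Python sketch. As before, the decomposition rule (x = d·p + r with digits and powers of the base elementary) is an assumption of ours, and the numbers are taken to run from 0 to N − 1; the function names are illustrative.

```python
def logical_depth(x, base=10):
    """LD(x) under the assumed canonical rule x = d*p + r."""
    if x < base:
        return 0                                  # digits are elementary
    p = 1
    while p * base <= x:
        p *= base                                 # largest power of the base <= x
    d, r = divmod(x, p)
    if r:
        # one addition, plus the depths of the two addends
        return 1 + logical_depth(d * p, base) + logical_depth(r, base)
    return 0 if d == 1 else 1                     # power of the base, or d * p

def average_logical_depth(n, base=10):
    """pi_C as in Definition 5, averaging LD over x = 0 .. n-1."""
    return sum(logical_depth(x, base) for x in range(n)) / n

for n in (10, 100, 1000, 10_000):
    print(n, round(average_logical_depth(n), 3))
```

Each tenfold increase of N adds a roughly constant amount to the printed average, which is the signature of a logarithmic dependence on N.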
2.4.6 Relation between LD and k_out
Most natural complex networks show a high degree of heterogeneity in the
degree of their nodes [RB02][New03][BLM+06]. Very often there are a few
nodes with an extremely high connectivity (hubs), which play a fundamental
role in the network’s functioning, while the great majority of nodes are linked to
only a few others. This heterogeneity establishes a natural hierarchy between the
nodes, but there are situations in which other natural hierarchies are possible.
This is true especially in systems of symbols, like numeral systems, where
some symbols are elementary and others are composed, with a high or low
degree of complexity, corresponding to the number of operations required to
construct them, described by the LD function. In these cases it is interesting
to check whether nodes that are highly connected are also the ones with a high
degree of elementariness (low LD). We find that the tendency in natural
systems like the canonical (Fig. 2.25) (but also in the Prime system, which can
be considered natural for mathematical reasons (Fig. 2.26)) can be expressed
in a somewhat imprecise but evocative fashion in this way: “The more elementary a
symbol is (low LD), the more connected it is (high k_out)”.
Figure 2.23: The average logical depth πC (ALD) as a function of the size
N for several models. The “Base 10 S3” is the canonical decimal system
with superbase 10^3. This is like the Italian system, in which we do not
have an elementary symbol for every power of the base (10^4 is dieci-mila,
for example) as in purely canonical systems; instead, beyond the number 10^3 it
behaves like a base 10^3 system. We notice that the random numeral system
is highly inefficient, while the Prime system is the one that achieves the best
performance from the point of view of πC.
Figure 2.24: The LD(x) and k_out(x) in the canonical (base 5) system (k_in(x)
is also drawn). We observe the tendency of numbers with a high k_out (the
peaks in the black graph) to have a low value (0 or 1) of LD (the minima
in the green graph).
A direct consequence is that the symbols that are from a cognitive point
of view more directly accessible are also the ones more frequently used. This
property seems a very natural trait of real symbolic systems and it is experimentally verified (See for example the experiment on the frequency of
numbers in the WWW described in [DMO05]).
2.5 Other functionals defined on Ω networks
In the preceding sections we introduced two functionals on Ω networks: π
and πC. The first is the number of elementary symbols, which is intended to
be a measure of the cognitive resources required to memorize symbols, and
the second represents the average computational length, and is a measure
of the cognitive cost of constructing numbers starting from their elementary
components. These are only two aspects of the complexity of Ω networks;
other aspects can be captured by introducing other functionals, for example
the entropy of the degree distribution P(k) [CRTVB07]. The entropy of P(k)
is a measure of the heterogeneity of the degree function, and it is expected to
be high in systems with a disordered network topology, reflecting the presence
of a large number of rules.
Figure 2.25: In the Ω network of a canonical base 4 system, we calculated
the number of nodes with fixed values of LD and k_out. The graph shows the
tendency of numbers with a low LD to be highly connected, and vice versa.
Figure 2.26: In the Prime system there is a strong correlation between LD
and k_out. Notice that the relevant values of LD are particularly low.
Figure 2.27: The residual correlation between LD and kout in a random
system is due to finite size effects.
2.5.1 Entropy of the Degree distribution
In directed networks the total degree is the sum of the inner and outer degree
k = k_in + k_out.
As we already know, in Ω networks the inner degree can only be 0 or 1,
so we concentrate on the outer degree only and on its probability distribution P(k): this is the probability that a number node, picked at random with uniform measure, has
exactly k outgoing links. The entropy of the degree distribution P(k) is a measure of
the heterogeneity of the degree function. The entropy of P(k) as a function
of the size was studied extensively in canonical systems with several bases,
ranging from 2 to 100. We report in Fig. 2.28 the results for the first small
bases.
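For concreteness, the entropy of an empirical degree distribution can be sketched in Python as follows. We assume here the Shannon form −Σ P(k) log P(k) with the natural logarithm; the text does not fix the base of the logarithm, so this choice, like the function name, is only illustrative.

```python
import math
from collections import Counter

def degree_entropy(degrees):
    """Shannon entropy (natural logarithm) of the empirical P(k)."""
    total = len(degrees)
    counts = Counter(degrees)                     # how many nodes have each k
    return sum(c / total * math.log(total / c) for c in counts.values())

homogeneous = [1, 1, 1, 1]        # every node has the same k_out
heterogeneous = [0, 0, 1, 3]      # one node acts as a small hub
print(degree_entropy(homogeneous), degree_entropy(heterogeneous))
```

A perfectly homogeneous degree function gives zero entropy, while any spread of the degrees over several values gives a positive one.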
2.6 Comparison of different models
Now we briefly compare some of the models introduced in this Chapter. We
already discussed the behavior of the functional π in the different models in
2.4.1.
Figure 2.28: The entropy of P(k) of canonical systems with bases from 2
to 6, rescaled as entropy/(number of elementary symbols) and plotted against log(N)/log(Base). A suitable rescaling of the dependent variable is introduced in order
to compare the different datasets. We do not get a clear indication of the
complexity of a numeral system from these kinds of statistical measures on
the network topology.
2.6.1 Holistic and Unary
Let us call ωH and ω1 the networks associated respectively with the Holistic
and the Unary system. We already know that
\pi(\omega_H) = N, \quad \pi(\omega_1) = 1;
and it is easy to see that
\pi_C(\omega_H) = 0, \quad \pi_C(\omega_1) = \frac{N(N-1)}{2}
from definition 5. Although in a trivial case, we see from these formulas that
there is a sort of compensation that is typical of all symbolic systems: if we
have many elementary symbols at our disposal the expressions are short, and
vice versa.
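The two extreme cases can be reproduced with a few lines of Python. The sketch assumes that the numbers run from 1 to N, that the Holistic system has one elementary symbol per number, and that the Unary system builds every x > 1 as (x − 1) + 1; these conventions are our assumptions. Under them the summed logical depth of the Unary system is N(N − 1)/2, the expression quoted above, and dividing it by the size N as in Definition 5 gives (N − 1)/2.

```python
def unary_depths(n):
    """LD(x) for x = 1..n, assuming x is built as (x - 1) + 1."""
    depths = {1: 0}                       # 1 is the only elementary symbol
    for x in range(2, n + 1):
        depths[x] = depths[x - 1] + 1     # one extra addition per step
    return depths

def holistic_depths(n):
    """Every number is an elementary symbol, so LD(x) = 0 everywhere."""
    return {x: 0 for x in range(1, n + 1)}

def summary(depths):
    """Return (pi, summed LD, LD averaged over the network)."""
    n = len(depths)
    pi = sum(1 for d in depths.values() if d == 0)
    total = sum(depths.values())
    return pi, total, total / n

print(summary(holistic_depths(10)))   # (10, 0, 0.0)
print(summary(unary_depths(10)))      # (1, 45, 4.5)
```

The compensation between the two functionals is visible directly: the system with N elementary symbols has zero depth, the one with a single elementary symbol has a depth that grows with N.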
2.6.2 Canonical and Positional systems
Let us call ωc and ωp the networks of, respectively, a canonical system and
a positional system with the same base B and the same size N. As we already
observed in 2.4.1,
\pi(\omega_c) = \pi(\omega_p) = B + [\log_B N],
where the square brackets stand for the integer part. In both these systems
the behavior of πC is logarithmic (in an approximate way for the canonical, see Fig. 2.29). In particular, for positional systems we find an exact
expression:
\pi_C(\omega_p) = 2[\log_B N] - 1.
The canonical system exhibits a more complex behavior because
0 and 1 are treated differently from all the
other numbers of the base, while in positional systems these numbers are all
treated on the same footing. This introduces in positional systems some redundancies (as in the
case of 10 × 1 = 10) that are absent in canonical systems and account for the
inequality
\pi_C(\omega_c) < \pi_C(\omega_p).
Figure 2.29: ALD as a function of N (logarithmic N axis). Legend: Canonical base 10; Positional base 10 (m = 2); Logarithmic fit (m = 1.7). The canonical system has a lower πC than the positional system
of the corresponding base.
2.6.3 Primes
Let us call ωP the network associated with the Prime system. We already know
that, by the “prime number theorem” of Hadamard,
\pi(\omega_P) \sim \frac{N}{\log N},
where log is the natural logarithm and N is the size of the network. From
equation 2.3, and remembering that the relation between the function Ω(n)
and the logical depth is
Ω(n) = LD(n) + 1,
we find that (see footnote 11)
\pi_C \sim \log \log N.
We now want to recover this result with a heuristic argument, based on the
experimental observation of P(LD). The experimental distribution of the
LD is well fitted by a Poissonian (See Fig. 2.22). The mean (and variance)
of LD, πC = ⟨LD⟩, depends on the size N, and as long as the Poissonian
approximation is good we can say that
P(LD = 0) = e^{-\pi_C}.
On the other hand, P(LD = 0) is equal to the probability that a number x < N
picked at random is prime, so that asymptotically, by the
Hadamard theorem, we have
P(LD = 0) \sim \frac{1}{\log N}
and, comparing the two expressions, we find
\pi_C \sim \log \log N.
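The two ingredients of this heuristic argument can be inspected numerically with the Python sketch below. It computes Ω(n) with a simple sieve, sets LD(n) = Ω(n) − 1 for n ≥ 2 and, as an assumption of ours, treats 0 and 1 as elementary; it then prints the mean LD next to log log N and the fraction of numbers with LD = 0 next to 1/log N. The function names are hypothetical, and only the trend with N is meaningful, since constants and finite-size corrections are ignored.

```python
import math

def big_omega(n_max):
    """Omega(n) for n < n_max: number of prime factors counted with multiplicity."""
    omega = [0] * n_max
    for p in range(2, n_max):
        if omega[p] == 0:                 # no smaller prime divides p: p is prime
            q = p
            while q < n_max:              # every power of p contributes one factor
                for m in range(q, n_max, q):
                    omega[m] += 1
                q *= p
    return omega

def prime_system_stats(n_max):
    """Return (mean LD, fraction of numbers with LD = 0) for x = 0 .. n_max-1."""
    omega = big_omega(n_max)
    depths = [max(w - 1, 0) for w in omega]   # LD = Omega - 1; 0 and 1 taken as elementary
    mean_ld = sum(depths) / n_max
    p_zero = sum(1 for d in depths if d == 0) / n_max
    return mean_ld, p_zero

for n in (10**3, 10**4, 10**5):
    mean_ld, p_zero = prime_system_stats(n)
    print(n, round(mean_ld, 3), round(math.log(math.log(n)), 3),
          round(p_zero, 3), round(1 / math.log(n), 3))
```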
The average length of the representations in the Prime system is significantly lower than what we found in more natural systems, like the canonical or
the positional ones, but this has a price in terms of a higher and diverging π
(unfeasible, following Turing’s argument on the finiteness of the number
of elementary symbols of any realistic symbolic system).
2.6.4 Random
In random Ω networks drawn from some ensemble E(ε, p) (See Sec. 2.3.7)
the expected value of π is (See eq. 2.4):
⟨π(N)⟩ = 2 + ε(N − 2).
The behaviour of πC as a function of N is not so easy to understand: it
depends on the values of (ε, p). A typical behavior is shown in Fig. 2.23, but
a study of this dependency requires further investigation. The P(LD) in a
typical situation is reported in Fig. 2.21, and shows a Gaussian behavior.
2.7 The space Ω
The space of (simple) Ω networks of a given size N is a very large one, and
grows with N at a very fast rate.
Footnote 11: We do not take into account the constant B_2 because we are only interested in the N dependency.
Figure 2.30: An artistic view of the space Ω. Notice that we draw the
representative point of Holistic, Unary and Prime systems on the boundary.
2.7.1 The size of the space of simple networks
Counting the number of microstates of a macroscopic physical system is one
of the fundamental concerns of Statistical Physics since its foundation by
L.Boltzmann, and it has been recently recognized that a similar approach is
relevant in the context of network ensembles [Bia08] [AB09] [GBPV08].
We want to estimate here the number of simple Ω networks of a given
size, that is the cardinality of the space Ω. This is a “zero-degree” ensemble,
because no other quantity with a “macroscopical” meaning like π or πC is
restricted to some fixed value.
Let us consider the space of all simple Ω networks of size N. This space,
that we will denote with Ω_N in the following, contains a huge number of
points (ω) as N grows. Let us define the volume of Ω_N as its cardinality:
V(N) = |Ω_N|.
We know that Ω networks are organized into Triples T and, since every
Triple is in principle independent of the others, V(N) is given by
V(N) = \prod_{x=2}^{N} \{ 1 + \zeta(x) \}
where the product starts from 2 because 1 is always an elementary symbol,
and the factor {1 + ζ(x)} needs an explanation. The contribution 1 accounts
for the possibility that x is an elementary symbol; the function ζ(x) is the
number of all possible ways in which x can be represented in terms of an
addition or a multiplication of two smaller numbers (see footnote 12). Separating the two
contributions:
\zeta(x) = \frac{x}{2} + \nu(x),
where ν(x) is the number of possible ways of splitting x into two (not necessarily prime) factors (see footnote 13).
So we have:
V(N) = \prod_{x=2}^{N} \left\{ 1 + \frac{x}{2} + \nu(x) \right\},
and taking the logarithm we obtain the “entropy” of the network ensemble
Ω_N:
S(N) = \log V(N) = \sum_{x=2}^{N} \log \left\{ 1 + \frac{x}{2} + \nu(x) \right\}.
Footnote 12: Addition by zero and multiplication by one are excluded.
We show in Fig. 2.31 how huge V(N) can be, already for small sizes like
N ≤ 20.
Figure 2.31: The Volume V(N) as a function of the size N, compared
with the same function (the number of microstates) for a chain of N spins.
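The growth of V(N) can be reproduced for small sizes with the following Python sketch. Two conventions are assumed here and are not stated in the text: the term x/2 is read as its integer part, i.e. the number of unordered sums a + b = x with 1 ≤ a ≤ b, and ν(x) counts the unordered factorizations a · b = x with 2 ≤ a ≤ b, obtained by trial division (adequate only for small x). The comparison term 2^N is the number of microstates of a chain of N two-state spins.

```python
import math

def nu(x):
    """Unordered splittings x = a * b with 2 <= a <= b (trial division)."""
    return sum(1 for a in range(2, math.isqrt(x) + 1) if x % a == 0)

def zeta(x):
    """Ways to write x as a sum or a product of two smaller numbers."""
    return x // 2 + nu(x)          # x // 2 unordered sums, excluding addition by zero

def volume(n):
    """V(N): product over x = 2..N of (1 + zeta(x))."""
    v = 1
    for x in range(2, n + 1):
        v *= 1 + zeta(x)
    return v

def entropy(n):
    """S(N) = log V(N), computed as a sum of logarithms."""
    return sum(math.log(1 + zeta(x)) for x in range(2, n + 1))

for n in (5, 10, 20):
    print(n, volume(n), 2 ** n, round(entropy(n), 3))
```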
Footnote 13: This function is really hard to compute, and we calculated it explicitly for the first
few numbers.
Due to the astronomical growth of V(N) it is particularly difficult to
control the dynamics in the space of Ω networks. We could ask, for example:
starting from a disordered network, and rewiring it at random while accepting a
move only if it lowers a certain functional, can we reach ordered networks? We
define the rewiring dynamics in the next section.
2.7.2 Dynamics in the Ω space
There are three possible ways to rewire a single Triple T(x) in an Ω network:
• raise x to the elementary symbol status;
• lower the elementary symbol x to a non-elementary status, building a
new Triple T′(x);
• rewire the input links and/or the operation of T(x), obtaining a new
Triple T′(x).
All these operations are represented in Fig. 2.32.
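The three moves can be sketched as a simple greedy dynamics in Python. Everything specific in this fragment is an assumption of ours made for illustration: the network is a dictionary from each number to its Triple (None for an elementary symbol), the functional to be lowered is π plus the average logical depth, LD is counted as the number of distinct Triples upstream of a number, and the starting point is the Holistic system rather than a disordered network.

```python
import math
import random

def candidate_triples(x):
    """All Triples (op, a, b) with a op b == x, excluding +0 and *1."""
    sums = [("+", a, x - a) for a in range(1, x // 2 + 1)]
    prods = [("*", a, x // a) for a in range(2, math.isqrt(x) + 1) if x % a == 0]
    return sums + prods

def tree(x, net):
    """Set of composed numbers upstream of x (x included if composed)."""
    triple = net[x]
    if triple is None:
        return set()
    _, a, b = triple
    return {x} | tree(a, net) | tree(b, net)

def cost(net):
    """Illustrative functional: pi plus the average logical depth."""
    pi = sum(1 for t in net.values() if t is None)
    pi_c = sum(len(tree(x, net)) for x in net) / len(net)
    return pi + pi_c

def rewiring_step(net, rng):
    """Try one random move on a Triple; keep it only if the cost decreases."""
    x = rng.randrange(2, len(net) + 1)               # 1 always stays elementary
    old, before = net[x], cost(net)
    if old is not None and rng.random() < 0.5:
        net[x] = None                                # promote x to elementary
    else:
        net[x] = rng.choice(candidate_triples(x))    # demote x, or rewire T(x)
    if cost(net) >= before:
        net[x] = old                                 # reject the move

rng = random.Random(0)
net = {x: None for x in range(1, 31)}                # start from the Holistic system
start = cost(net)
for _ in range(300):
    rewiring_step(net, rng)
print(start, cost(net))
```

Because a move is kept only when the functional decreases, the final value can never exceed the initial one; which networks are actually reached depends on the functional and on the random sequence of moves.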
2.8 Conclusions
In this Chapter we began studying numeral systems from the point of view
of complex systems. This has been done by first introducing a suitable network
representation that takes into account how the numerical concepts are organized into a system, making explicit the way composed numbers are built
on elementary ones and by which arithmetical operations. Then, after illustrating how to construct such networks from linguistic data or from abstract
auxiliary models, we started describing the tools to analyze these networks
quantitatively. Some of these tools have been developed in the theory of graphs and
of complex networks, and others, specific to Ω networks, have been introduced
by us. We studied the statistical and topological properties relevant in this
context, and the observed quantities gave much valuable information about the
organization of these systems. We realized that Ω networks exhibit a very
rich structure and are a solid framework for the quantitative study of simple
symbolic systems. All the observables we considered in this Chapter, especially π and πC, describe isolated aspects of the complexity of these systems,
but none of them captures all these aspects at once. The statistical and
topological global properties usually studied in the context of complex networks, like the degree distribution P(k), give important information on the
structure of these systems, but it is hard to obtain from them precise insights
into the rules that are behind their construction. This observation was the
motivation for a new approach to the complexity of numeral systems, which
we will describe in the remainder of this Thesis.
In Chapter 4 we will propose to define the complexity of a numeral system
through a procedure, which we will call reduction, that systematically exploits
the “redundancies” present in simple Ω networks in order to reach an equivalent, but shorter, network representation. In order to define this “reduced”
network we have to enlarge the space of simple Ω networks: this will be done
in the next Chapter.
Figure 2.32: There are three possible ways to rewire a certain Triple in an Ω network. The first (a) is promoting a composed symbol (green) to elementarity,
eliminating the incoming link, the operational node and the other links of its
original Triple. The second (b) is to downgrade a certain elementary symbol
to a composed one, building up a new Triple (we show only the new incoming link). The third (c) is to substitute the original Triple with another one.
These operations can be completely random or can be driven by the
minimization of some suitable functional φ defined on Ω.
Figure 2.33: The number of elementary symbols and the ALD during a rewiring simulation (M is the number of iterations).
Following the π gradient (blue line), an elementary symbol (LD = 0) which
is a hub of the network is rewired into a composed one. Its LD rises from
zero, causing a big leap in the πC value (green line).
Chapter 3
Development of the formalism
Until now we have studied simple Ω networks, organized into Triples with
the characteristic structure drawn in Fig. 2.1. In these networks a circular
node represented an individual symbol for a certain natural number. Now
we want to generalize this network structure, allowing circular nodes to
contain sets of numbers, organized into a hierarchical structure of Categories.
The space obtained in this way will be called Ω (without the specification
simple: it is the true Ω space) and contains all simple networks, and much more.
The elements of this new space are always organized into Triples (generalized
ones), and the inventory of operations is the same as for simple networks, but
their meaning is different. As we will see, the meaning of such an operation
is that the arithmetic operation involves all the elements contained in the
first Category with all the elements contained in the second.
The generalized Ω networks are always obtained from simple ones (representing concrete numeral systems) through a recursive transformation called
reduction, which will be the subject of the next Chapter. We develop here
the formalism necessary to describe these generalized networks.
3.1 Generalization of Ω networks
3.1.1 Categories
Until now a circular node contained a single number. In view of a future
development of the theory (see Chapter 4), we allow a circular node to contain
sets of numbers, organized into a hierarchical structure. We will call a node of
this kind a Category for reasons that will be clear later; the idea is that
Categories can contain numbers (and other smaller Categories) that have the
same role in the system, or are constructed following similar computational
paths.
The most general Category (See Fig. 3.1) contains sets of numbers and of
other smaller Categories.
Definition 6 Suppose that a certain Category C contains the numbers
x1, x2, ..., xr
and other Categories
C1, C2, ..., Cs.
The description of a Category C is the list of its inner elements at the first
nested level
C = {x1, x2, ..., xr, C1, C2, ..., Cs}
and its length is
l(C) = |C|.
The inner Categories C1, C2, ..., Cs can have a rich inner structure, but
each of them contributes to the descriptional length l(C) as much as a single number.
A circular node containing only one number x will sometimes be called a
self-Category and labelled with C_x (see footnote 1). When there is no need to specify whether
we are referring to a number or a Category, we refer generically to an element
e of the network.
Figure 3.1: A Category C0 and its inner structure. In the description of C0
the inner structure of C1 is not considered: it stops at the first nested level.
It is useful to consider an operation on a Category that destroys its inner hierarchical structure, bringing all the numbers contained in every inner
Category, at each level, to the “surface”. This operation is called lysis (see footnote 2).
Footnote 1: It is somewhat pedantic but we will find it useful in the future.
Footnote 2: In analogy with the bacterial lysis in biology.
Definition 7 (Lysis of a Category) The lysis of a Category C is obtained
by popping its inner Categories at every nested level. The result will be called
C^lysed.
For example, if we have C = {x1, C1}, C1 = {x2, x3, C2} and C2 = {x4, x5}
we have
C2^lysed = C2;
C1^lysed = {x2, x3, x4, x5};
C^lysed = {x1, x2, x3, x4, x5}.
Definition 8 (Equal and Identical Categories) Two Categories (C1, C2)
are equal if their lysed counterparts contain the same symbols. They are identical if their Category structure at all nested levels is the same.
Obviously two identical Categories are also equal, but in general the converse is false. For example, C and C^lysed in the preceding example are equal
but not identical. We will use this distinction during the description of the
reduction transformation R in Chapter 4.
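The notions of description length, lysis, equality and identity can be made concrete with a short Python sketch. Representing a Category as a nested frozenset whose members are either number labels (strings) or other frozensets is our own choice for illustration, as are the function names.

```python
def lyse(category):
    """Flatten a nested Category into the set of all the numbers it contains."""
    flat = set()
    for element in category:
        if isinstance(element, frozenset):
            flat |= lyse(element)        # pop an inner Category
        else:
            flat.add(element)            # a number stays as it is
    return frozenset(flat)

def length(category):
    """l(C) = |C|: only the first nested level is counted."""
    return len(category)

def are_equal(c, d):
    """Equal Categories: same symbols after lysis."""
    return lyse(c) == lyse(d)

def are_identical(c, d):
    """Identical Categories: same structure at all nested levels."""
    return c == d

# The example of the text: C = {x1, C1}, C1 = {x2, x3, C2}, C2 = {x4, x5}.
C2 = frozenset({"x4", "x5"})
C1 = frozenset({"x2", "x3", C2})
C = frozenset({"x1", C1})

print(sorted(lyse(C)))                                     # ['x1', 'x2', 'x3', 'x4', 'x5']
print(length(C), length(lyse(C)))                          # 2 5
print(are_equal(C, lyse(C)), are_identical(C, lyse(C)))    # True False
```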
A Category C that, in an Ω network, is not contained in any other Category is called external. When a certain Category C appears only as input in
a certain number of Triples (or is isolated) we will call it an input Category.
Input Categories are the generalization to Categories of the elementary symbols (Sec. 2.2.1).
The input Categories are the only ones really relevant in the description of
an Ω network (see footnote 3). All the other Categories are called output Categories.
3.1.2 Generalized Triples
The Triples in which circular nodes contain only one number will be called
simple; the Ω networks composed of simple Triples are called simple too.
Simple networks have a great importance: when we build an Ω network
from a numeral system (See Sec. 2.3) we build a simple network, and simple networks are
the starting point of all our analysis. In general, Ω networks are composed
of Triples that are not simple, in which the circular input and output nodes
can be Categories.
When we consider a simple Triple it is clear what the output of a certain
operation is, given its input: it is an arithmetical fact. But when we consider
general Triples (See Fig. 3.2) the situation is ambiguous: “What is the result
of an addition or a multiplication between two Categories?”.
Footnote 3: This is evident for simple Triples like the one in Fig. 2.1, but the general case requires
clarifications. This is an important point and it will be stated precisely in Sec. 4.2.4.
Figure 3.2: A generalized Triple has the same form as a simple Triple. The
difference is that the arithmetical operation involves Categories, which can
have an arbitrarily complex inner structure (the meaning of the operation in
this case will be explained in the next Chapter, see the text).
If the input Categories C and D contain only sets of numbers
C = {x1, x2, ..., xr}
D = {y1, y2, ..., ys}
then it is still possible to give a meaning to the operation C ⋆ D = E,
by setting
E = {x1 ⋆ y1, x2 ⋆ y1, ..., xr ⋆ ys}.
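For flat Categories this element-by-element rule is easy to write down explicitly. The Python fragment below is a minimal sketch in which a Category without inner structure is represented as a plain set of integers; the function name combine and the sample Categories are illustrative choices of ours.

```python
import operator

def combine(c, d, op):
    """E = {x op y for every x in C and every y in D}, for flat Categories."""
    return {op(x, y) for x in c for y in d}

C = {2, 3}
D = {10, 100}
E_mul = combine(C, D, operator.mul)   # {20, 30, 200, 300}
E_add = combine(C, D, operator.add)   # {12, 13, 102, 103}
print(sorted(E_mul), sorted(E_add))
```

Since E is a set, two pairs that give the same result contribute a single element, so E can contain fewer than r · s numbers.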
But when Categories with a more complex inner structure are involved,
it is not clear how we can define the ⋆ operation. As we will see in
Chapter 4, this is only an apparent problem, because all the non-simple Ω networks we
will consider are the result of certain transformations R applied to an initial
simple ω, and during this transformation process the meaning of the operations on input Categories, and the resulting output Category, are always well
defined.
3.1.3 Description of a generic Ω network
A general Ω network can be constitued of both non-simple and simple Triples.
We can see an example in Fig. 3.3.
In this Section we define its description, establishing some terminology
useful for future developments. Let us call the size N of a network ω ∈ Ω the
number of its elements (among single numbers and Categories): | ω |= N.
The description of ω has two contributions: one from the wiring diagram,
3.1. GENERALIZATION OF Ω NETWORKS
63
Figure 3.3: A generic Ω network. There are four Triples, and seven input
Categories (Ca , Cb , Cc , Cf ) are self-Categories, the others (A, H, I)
have an inner structure: A = {g, f }, H = {n, o, C}, I = {D, E}. The
description of this network is given by it wiring diagram, i.e. the list of its
Triples, and the description of its input elements.
describing the input, the output and the operation of each Triple, and one
from the description of the Categories (as it was defined in (6)).
The description of the wiring diagram is the list of the Triples of each
element e of the Ω network. The Triple of the generic element e is
T (e) : {e = f ⋆e g} ,
where f, g are the inputs and ⋆e the operation.
We take as descriptional length of the wiring diagram the length of the list
of Triples, that coincides with the number of operational nodes. This number
is called π⋆ and it is easy to see that, if we denote with πI the number of
input elements
π⋆ = N − πI
Suppose that in ω there are the input elements
{e1 , e2 , . . . , er }
Definition 9 The length of the input elements of an ω network is
π = \sum_{e_i} l(e_i)    (3.1)
In simple networks the length of the input elements coincides with the
number of elementary symbols.
Definition 10 [Description of ω and its length] The description of ω ∈ Ω is
given by the list of its Triples and the description of its input elements. The
length of this description is
L = π + π⋆ .    (3.2)
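The two contributions to L can be computed directly from a list of Triples. The sketch below is only illustrative and rests on stated assumptions: each Triple is encoded as a tuple (output, left input, right input, operation), an element counts as an input element when it is never the output of a Triple, a single symbol has length 1, and a Category (encoded as a frozenset) has length equal to the number of its inner elements. The function name and the encoding are hypothetical.

```python
def description_length(triples):
    """Return (pi, pi_star, L) for a network given as a list of Triples.

    Each Triple is a tuple (output, left_input, right_input, operation).
    Assumptions of this sketch: an element is an input element when it is
    never the output of a Triple; a single symbol has length 1 and a Category
    (a frozenset) has length equal to the number of its inner elements.
    """
    outputs = {t[0] for t in triples}
    elements = set(outputs)
    for _, left, right, _ in triples:
        elements.update((left, right))
    inputs = elements - outputs

    def length(e):
        return len(e) if isinstance(e, frozenset) else 1

    pi = sum(length(e) for e in inputs)
    pi_star = len(triples)  # one operational node per Triple
    return pi, pi_star, pi + pi_star


# Hypothetical Unary-like simple network: 2 = 1 + 1, 3 = 1 + 2, 4 = 1 + 3.
unary = [(2, 1, 1, "+"), (3, 1, 2, "+"), (4, 1, 3, "+")]
pi, pi_star, L = description_length(unary)
print(pi, pi_star, L)
```

In this example N = 4 and there is a single input element, so π⋆ = N − πI = 3, in agreement with the number of Triples in the list.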
3.1.4 Other concepts related to Ω networks
Definition 11 (Similar Triples) Two Triples T1 , T2 will be called similar
iff they share at least an input element e and their operational node contains
the same operation.
Once we fix an element e and an operation ⋆, we identify a set of
similar Triples.
Definition 12 (Offspring) The Offspring of an element e (O(e)) is the set
of all elements, not including e, that are reachable from e following some link,
through a finite number of operations.
Definition 13 (Ancestors) The Ancestors of an element e (A(e)) is the
set of all elements, not including e, from which e is reachable following some
link, in a finite or possibly infinite number of operations.
We introduce also a generalized version of both offspring and ancestors
of a given element e (Og , Ag ), both including the element e itself.
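Offspring and Ancestors can be obtained with a plain graph traversal. The sketch below assumes that links run from each input of a Triple to its output, and that the Ancestors of e are the elements from which e can be reached (the reverse direction of the Offspring). The tuple encoding (output, left input, right input, operation) and all function names are illustrative choices, not the thesis's notation.

```python
from collections import deque


def _neighbours(triples, forward=True):
    """Map each element to the elements one link away (forward: input -> output)."""
    links = {}
    for out, left, right, _ in triples:
        for src in (left, right):
            a, b = (src, out) if forward else (out, src)
            links.setdefault(a, set()).add(b)
    return links


def _reachable(start, links):
    """Breadth-first search; the start element itself is not included."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def offspring(e, triples, generalized=False):
    """O(e), or Og(e) when generalized=True."""
    found = _reachable(e, _neighbours(triples, forward=True))
    return found | {e} if generalized else found


def ancestors(e, triples, generalized=False):
    """A(e), or Ag(e) when generalized=True."""
    found = _reachable(e, _neighbours(triples, forward=False))
    return found | {e} if generalized else found


# Hypothetical simple network: 2 = 1 + 1, 3 = 2 + 1, 4 = 2 + 2, 5 = 4 + 1.
network = [(2, 1, 1, "+"), (3, 2, 1, "+"), (4, 2, 2, "+"), (5, 4, 1, "+")]
print(sorted(offspring(2, network)))  # elements built, directly or not, from 2
print(sorted(ancestors(5, network)))  # elements that 5 is built from
```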
The following definition regards only simple networks. It is a fundamental ingredient of the notion of “logical (or symbolic) distance” that will be
introduced in (3.2).
Definition 14 (Common ancestors) The common ancestor z of two numbers x, y is the biggest number in the intersection of the generalized ancestors
of the two elements
z = max {Ag (x) ∩ Ag (y)}
Figure 3.4: The common ancestor of the elements x and y. It is an essential
ingredient for the definition of a distance between symbols.
Definition 15 (Relative distance) The relative distance between two elements (e, f ), denoted as RD (e | f ), is the minimum number of operations
that are needed in order to reach e starting from f .
This is an oriented, asymmetric distance. It will be used only in its
symmetrized version, when we define the “logical distance” in Sec. 3.2.
3.2 Distance between symbols
In this Section we define a distance between Categories; in particular, we look at the distance between
numbers. We consider only simple networks in the examples.
Definition 16 (Logical distance) We define as logical (or symbolic) distance between two elements e and f , having g as their common ancestor, the
following function
d(e, f) = \frac{RD(e | g) + RD(f | g)}{LD(e) + LD(f)} .
It is obvious that, when a common ancestor exists, d satisfies
0 ≤ d(e, f ) ≤ 1.    (3.3)
When (e, f ) do not have a common ancestor, their distance is undefined, and
we will set d (e, f ) = 2. This happens in particular when (e, f ) are distinct
elementary symbols or, more generally, only-input elements.
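The definitions of common ancestor, relative distance and logical distance can be put together in a short sketch. Several points are assumptions made only for this illustration: Triples are encoded as tuples (output, left input, right input, operation), one operation corresponds to one link, the values of LD are passed in by the caller (LD is defined elsewhere in the thesis, and the numbers used below are hand-assigned, hypothetical values), and d(e, e) is set to 0 by convention.

```python
from collections import deque


def _links(triples, forward):
    """Adjacency map; forward means input -> output, otherwise output -> input."""
    links = {}
    for out, left, right, _ in triples:
        for src in (left, right):
            a, b = (src, out) if forward else (out, src)
            links.setdefault(a, set()).add(b)
    return links


def _steps_from(start, links):
    """Minimum number of links from `start` to every reachable element (BFS)."""
    dist, queue = {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def logical_distance(e, f, triples, logical_depth):
    """d(e, f) as in Definition 16; returns 2 when no common ancestor exists.

    `logical_depth` maps each element to its LD and is supplied by the caller.
    """
    if e == f:
        return 0.0  # convention of this sketch
    backward = _links(triples, forward=False)
    common = set(_steps_from(e, backward)) & set(_steps_from(f, backward))
    if not common:
        return 2.0
    g = max(common)  # common ancestor (Definition 14)
    from_g = _steps_from(g, _links(triples, forward=True))
    return (from_g[e] + from_g[f]) / (logical_depth[e] + logical_depth[f])


# Hypothetical simple network: 2 = 1 + 1, 3 = 2 + 1, 4 = 2 + 2, 5 = 4 + 1.
network = [(2, 1, 1, "+"), (3, 2, 1, "+"), (4, 2, 2, "+"), (5, 4, 1, "+")]
depth = {1: 0, 2: 1, 3: 2, 4: 2, 5: 3}  # hand-assigned LD values (assumption)
print(logical_distance(3, 4, network, depth))
```

For the pair (3, 4) the common ancestor is 2, each element is one operation away from it, and the sum of the assumed depths is 4, so the sketch returns 0.5.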
Figure 3.5: We plot the histograms of P (d) (the distribution of the
distances) in the Canonical Base 3 system (blue) and, for comparison, in a
random system (red). We note that the distances in a random system are
systematically lower than in the Base 3 system. This is interpreted as a worse
discrimination power for the random system: ⟨dC⟩ = 0.507, ⟨dR⟩ =
0.247. Notice that for the Holistic system, which has the highest discrimination
power as a symbolic system, this probability is P (d) = δ (d − 2).
[Plot: cumulative P(distance) as a function of distance, for the Positional (Base 3) and Random systems.]
Figure 3.6: The cumulative function, referred to the preceding Figure. We
note that P (d < 0.2) > 0.8 for random systems, while in Base 3 we have
P (d > 0.2) > 0.9.
Figure 3.7: Logical distances in a positional (base 4) system are compared
with distances in a random system (size N = 256 for both systems). We
plot the distances d(x, y), where the number x is also the abscissa, and
y is a number at a numerical distance δ from x: | x − y | = δ. We plot this quantity as a function of x for several values (1, 2, 3, 4, 5) of δ for
both a Canonical base 4 and a random system. The peaks at d = 2
correspond to the maximal possible logical distance between (x, y), which
is realized when they have no common ancestor. The logical distances in a
random system are systematically smaller.
Chapter 4
Reduction of redundancies
In mathematics a transformation could be any function from a set X to
itself. However, often the set X has some additional structure (geometric,
algebraic) and the term “transformation” refers to a function from X to itself
which preserves this structure. Examples include linear transformations and
affine transformations such as rotations, reflections and translations. More
generally a transformation in mathematics is one facet of the mathematical
concept of function; the term mapping is also used in ways that are quite close
synonyms. In this sense the term transformation only flags that a function’s
more geometrical aspects are being considered, and that special attention is paid
to invariants.
In this Chapter I will introduce a transformation R on the space Ω that
will be called reduction. R preserves the Ω-structure of the elements ω to
which it is applied
R : Ω → Ω
and is reversible, i.e. R^{-1} is always defined. The latter transformation
will be called separation and will be denoted S.¹ The aim of R is to exploit the
regularities contained in ω, conceived as redundancies in its wiring diagram,
in order to construct a more compact but equivalent representation of it.
The equivalence is defined in Definition 19 and is guaranteed by the reversibility. The
fact that R preserves the Ω-structure allows the composition of an arbitrary
number of R transformations; the points ω visited in this way form an orbit
in Ω.
The idea behind R was inspired by a well-known data-compression algorithm, non-sequential recursive pair substitution (NSRPS), introduced for
sequences of symbols by Jiménez-Montaño, Ebeling and others [WM80], and
subsequently refined by Peter Grassberger [Gra02].
¹ Although we are interested mainly in numeral systems, R can be applied to any symbolic
system that can be represented through an Ω network (see Sec. 2.1). This will be clear
from its description in this Chapter.
The NSRPS algorithm is simple to describe: it searches the initial
sequence for the most frequent pair of consecutive symbols, and substitutes
this pair with a new symbol, created for the purpose. It then repeats this
transformation iteratively, until the length of the sequence (plus the length
of a suitable description of the substitutions) reaches a minimum, so that no
further substitutions can improve the compression.
We adapted the basic idea of NSRPS to Ω networks. But in doing that
we faced the subtleties of dealing with a network structure, and it has been
necessary to define new concepts in order to describe our manipulations. The
definition itself of Ω network, of its constituents (Categories) and the other
concepts that we will introduce in this Chapter, are in large portion the result
of this effort. The transformation R depends on two parameters: an element
e of ω and an operation ⋆ to which e is connected. We can iterate R until we
reach a point in which there are no more redundancies to exploit. The main
difficulty in this context is contained in the following question: “How can we
choose the sequence of reductions (the sequence of (e, ⋆) ) in order to exploit
in an optimal way the redundancies contained in ω?”. The answer to this
question is not easy, but we propose a greedy algorithm (as NSRPS is)
that works well in simple cases. The result of a complete reduction (that
is, the final point of an orbit) will be called the “reduced” network and will be
marked with the superscript R.
The descriptional length L of an Ω network (see Def. 10 in Chapter 3)
now plays a crucial role. The value of this functional on a reduced network
ω^R is defined as the descriptional complexity of the original network ω. In this
way we can formulate the problem of the search for the reduced network ω^R
as the minimization of the functional L.
This Chapter is organized as follows. In the first Section (4.1) we will
explain the functioning of the NSRPS algorithm. Then in Sec. 4.2 we will
describe the R transformation in great detail, together with the greedy reduction algorithm. In Sec. 4.3 we will depict the general aspect of the reduced networks
ω^R and introduce a suitable description for them. We describe its elements
and their meaning in our framework. In the final Section (4.4) we will introduce the complexity functional(s) for Ω networks and discuss the reasons for
this choice. Until now we have not really defined what an ω^R is; we simply stated
that it is the final point of a sequence of R transformations. The complexity
functional permits us to define clearly the concept of reduced networks as
the solutions of a minimum problem.
4.1 The NSRPS Algorithm
The NSRPS algorithm has been studied and precisely defined by P. Grassberger
as a tool for data compression and entropy estimation [Gra02]. He deduced
some important properties of the method and used it to estimate the entropy of written English. The results in [Gra02] and the conjectures
made therein have been investigated in a subsequent paper [BCG06] in a rigorous setting.
Our aim here is to describe its basic idea and functioning, with the explicit
intention of introducing the features it has in common with our R transformation.
Let us call the original sequence σ
σ = s0 s1 . . .
built from the symbols of a finite alphabet {αi} (i ∈ {0, . . . , m − 1}) of
size m.
We count the numbers njk of non-overlapping consecutive pairs of symbols
in σ where st = αj and st+1 = αk , and find their maximum
n_{max} = \max_{j,k<m} n_{jk} .
The corresponding index pair is (j0 , k0 ). Then we create a new symbol
corresponding to the concatenation
αm = (αj0 αk0 )    (4.1)
and form the sequence σ 1 by replacing everywhere the pair αj0 αk0 by αm .
For the special case j0 = k0 , any string of 2r + 1 symbols αj0 is replaced
by r characters αm , followed by one αj0 .
This is the elementary step of this transformation - let us call it S - that
is repeated recursively: the sequence σ i+1 is obtained from σ i by replacing
the most frequent pair αji αki by a new symbol αm+i .
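A single substitution step can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in [Gra02]: the function names are hypothetical, sequences are plain lists of symbols, and ties between equally frequent pairs are broken in favour of the pair that appears first (an assumption, since the text does not specify a rule).

```python
def count_pairs(seq):
    """Count non-overlapping occurrences of each pair of consecutive symbols."""
    counts, last_start = {}, {}
    for t in range(len(seq) - 1):
        pair = (seq[t], seq[t + 1])
        if last_start.get(pair) == t - 1:
            continue  # overlaps the previous occurrence (only possible for "aa"-type pairs)
        counts[pair] = counts.get(pair, 0) + 1
        last_start[pair] = t
    return counts


def substitute(seq, pair, new_symbol):
    """Replace every occurrence of `pair`, scanning from left to right."""
    out, t = [], 0
    while t < len(seq):
        if t + 1 < len(seq) and (seq[t], seq[t + 1]) == pair:
            out.append(new_symbol)
            t += 2
        else:
            out.append(seq[t])
            t += 1
    return out


def nsrps_step(seq, new_symbol):
    """One elementary step: substitute the most frequent pair with `new_symbol`."""
    counts = count_pairs(seq)
    if not counts:
        return list(seq), None
    pair = max(counts, key=counts.get)  # ties: first pair encountered
    return substitute(seq, pair, new_symbol), pair


sigma = list("abababcab")
sigma_1, pair = nsrps_step(sigma, "X")
print("".join(sigma_1), pair)
```

The left-to-right scan in substitute reproduces the special case described above: a run of 2r + 1 equal symbols becomes r new symbols followed by one old symbol.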
The procedure stops when the length, consisting of both a description of
σ i+1 and a description of the pair (ji , ki) is definitely longer than the corresponding description of σ i , for the present and all subsequent i.
We can see the sequence σ i as an orbit in the space of sequences S. All
the points in an orbit are equivalent in that they represent exactly the same
sequence, because the substitution S is invertible. 2
2
Remember that the description of the pairs is to be considered as a part of the description of the sequence.
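The invertibility of the substitution can be made concrete with a small sketch that expands a compressed sequence back to the original one, given the recorded pairs. The names expand, total_length and rules are hypothetical, and counting two symbols per recorded pair is only one possible (assumed) measure of the length of the description of the substitutions.

```python
def expand(seq, rules):
    """Invert the pair substitutions.

    `rules` maps every new symbol to the pair it replaced; a pair may itself
    contain new symbols introduced at earlier steps.
    """
    out, stack = [], list(reversed(seq))
    while stack:
        symbol = stack.pop()
        if symbol in rules:
            left, right = rules[symbol]
            stack.extend((right, left))  # left is popped first
        else:
            out.append(symbol)
    return out


def total_length(seq, rules):
    """Length of the sequence plus two symbols per recorded pair (an assumed cost)."""
    return len(seq) + 2 * len(rules)


# Hypothetical result of two substitution steps on "abababcab".
rules = {"X": ("a", "b"), "Y": ("X", "X")}
compressed = ["Y", "X", "c", "X"]
original = expand(compressed, rules)
print("".join(original), total_length(compressed, rules))
```

Without the dictionary of pairs the expansion would be impossible, which is why the description of the pairs has to be counted as part of the description of the sequence.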
Figure 4.1: The substitution of a frequent pair (a, b) with the new symbol c.
At the end of this process the length of the final sequence, together with that of the description of the substitutions, gives an estimate of the entropy of the sequence.
Unfortunately we could not touch the very interesting questions related
to the transformation of the statistical properties of the sequences induced
by the substitution. The results in [Gra02] and [BCG06] showed that the
NSRPS is effective as a tool for data compression and entropy estimation.
This conclusion is based on the fact that Markov sequences are attractive
fixed points for the S transformation. We think that these aspects can be
fruitfully investigated also in Ω networks, and we hope to return to this
subject in future research.
Figure 4.2: The orbit of NSRPS.
4.2 The reduction transformation R
The transformation R depends, as we said, on two variables (e, ⋆), where e
is an element of ω and ⋆ an operation. We will suppose for the moment that
these are given, and describe the generic R (e, ⋆).
R (e, ⋆) takes all the similar Triples (the ones identified by the
pair (e, ⋆)), and is itself articulated in elementary steps. It starts by considering two similar Triples (it is not important how they are selected) and
tries to reduce them together while preserving the Ω-structure. The result of an
elementary reduction is the creation of a new Triple and new Categories
(see 2.2.2). This is the analogue of the creation of a new symbol in a sequence, from a pair of consecutive and frequent ones. We now describe
the elementary reduction step.
4.2.1 Reduction of two simple Triples
Let us suppose that the two selected Triples are T1 : {z1 = x ⋆ y1 } and T2 :
{z2 = x ⋆ y2 }.
Here the number x acts like a Pivot (See 4.2 for its definition), around
which the reduction is going on.
When we reduce two Triples together, several rearrangements are performed in the network, regarding both the wiring diagram and the nature
and number of the nodes, in order to preserve the Ω-structure:
• the two Triples are destroyed. This means that all the links and the
operational node are removed;
• two new Categories Y and Z are created.³ The first contains the input
symbols of the destroyed Triples, y1 and y2 ; the second contains the
output ones, z1 and z2 ;
• a new Triple is created, involving the pivot x and the new Categories,
and a new operational node.
Figure 4.3: The reduction of two simple Triples. Two new Categories are
formed Y = {y1 , y2 } and Z = {z1 , z2 }. The two operational nodes have been
identified and their total number is diminished by one.
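The elementary step can be sketched as follows, under assumptions made only for this illustration: a Triple z = x ⋆ y is encoded as the tuple (z, x, y, operation) with the pivot x as first operand, Categories are encoded as frozensets, and the operations are limited to those listed in the hypothetical OPERATIONS table. The sketch does not create or update any network-wide bookkeeping and does not check the causality constraint discussed later in the Chapter.

```python
import operator

OPERATIONS = {"+": operator.add, "*": operator.mul}


def _as_set(element):
    """View a single symbol as a one-element set; a Category is already a frozenset."""
    return element if isinstance(element, frozenset) else frozenset([element])


def reduce_pair(t1, t2):
    """Fuse two similar Triples (output, pivot, other_input, operation) into one."""
    z1, x1, y1, op1 = t1
    z2, x2, y2, op2 = t2
    if x1 != x2 or op1 != op2:
        raise ValueError("the Triples are not similar")
    return (_as_set(z1) | _as_set(z2), x1, _as_set(y1) | _as_set(y2), op1)


def separate(triple):
    """Rebuild the simple Triples; valid when the pivot and all inner elements are numbers."""
    _, x, Y, op = triple
    return sorted((OPERATIONS[op](x, y), x, y, op) for y in Y)


t1 = (11, 10, 1, "+")
t2 = (12, 10, 2, "+")
fused = reduce_pair(t1, t2)
print(fused)
print(separate(fused))
```

Here the pivot is 10, the new input Category is Y = {1, 2} and the new output Category is Z = {11, 12}; separate recovers the two original Triples because, in a simple network, each output is determined by the pivot, the input and the operation.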
4.2.2 Transformation R (e, ⋆)
Now that the elementary reduction step is defined, we must describe how to
perform all the reductions inside the set of similar Triples held by (e, ⋆).
The order with which the elementary steps within R (e, ⋆) are executed is
established by the following rule. If the similar Triples are named {T1 , T2 , . . . , Tr },
the first step is the reduction of {T1 , T2 }. These are then destroyed, the
newly formed Triple becomes the new T1 , and the remaining ones T3 , . . . , Tr are renamed T2 , . . . , Tr−1 .⁴
³ In reality they are created only if they are not yet present in the network. At this
point this is certainly so.
We now need to make two points clear:
• As long as the steps within R (x, ⋆) are going on, the newly formed
Categories and Triples are temporary. It is quite possible that, say in the
second step, the Categories created in the first step are destroyed. If
the second step involves the newly formed Triple and T3 = {z3 = x ⋆ y3 }, the newly
formed Triple and the Categories Cin and Cout are destroyed;
• After R (x, ⋆) is completed, the situation is like the one in Fig.
4.4. At this point the new Triple is definitively created. The newly
formed Categories are created too, unless they were already created
during another R transformation. In that case the effect of the creation
is just the update of the Category’s input and output degrees.
Figure 4.4: The similar Triples after R (x, ⋆). It is possible that
not all the Triples can be reduced together (see Sec. 4.2.5); in this case they
are left unreduced.
4
We are aware that other choices are equally possible.
4.2.3 Reduction of two general Triples
The reduction of two general Triples goes along the same lines described for
simple Triples. Suppose for example that at a certain point during the reduction process we must reduce T1 : {Cγ1 = Cα ⋆ Cβ1 } and T2 : {Cγ2 = Cα ⋆ Cβ2 },
where all the Categories involved can have an arbitrary inner structure (see
Fig. 3.1).
The result is the same as for simple Triples and is depicted in Fig. 4.5.
Figure 4.5: The reduction of general Triples goes along the same lines as that of
simple Triples.
4.2.4 Reversibility of R and separation
During the reduction process the description of the ω network, defined in Sec.
3.1.3, evolves. New Categories appear, new Triples are formed, and old ones
are destroyed. Apart from the descriptional length, which guides us in finding
an optimal network (see Sec. 4.4), the description enters in a fundamental
property of the transformation R, that is its reversibility.⁵
It is clear from the definition of the elementary reduction step in 4.2.1
that if two Triples T1 and T2 are fused together in a new Triple T3 , we can
always reconstruct the original Triples from the description of the Categories
and the wiring diagram of T3 . This process will be called separation.
5
A similar consideration can be given for the NSRPS algorithm: the S transformation was reversible only if a description of the substitutions was maintained during the
compression.
Definition 17 (Separation) The separation S is the inverse transformation of the reduction,
S = R^{-1} ,
defined for all (e, ⋆).
The transformation R is now defined, but there is still a problem that
will force us to define a constraint on the possibility of reducing two similar
Triples. We will illustrate it for the Unary system described in 2.3.3.
In the Unary system we have only one number with a large kout , that is 1,
and this is also the unique elementary symbol. The obvious transformation
that exploits the redundancies in its network is R (1, +) (although other
choices are possible). If we perform this reduction we obtain the network
reported in Fig. 4.6.
Figure 4.6: The Unary network after the (unconstrained) R (1, +). Only
one operational node is left, but the reduction is only apparent. In order to
build a number n we must pass through this operational node n − 1
times. Since we want our reduced networks to represent the cognitive
resources exploited to represent numbers, we will introduce a constraint on
the reducibility in order to avoid this kind of situation.
Why, then, are we unsatisfied with this elegant result? After all, the Unary
system is very simple to define, and its network ω1 is very redundant and
deserves a description as compact as possible. In reality this simplicity does
not reflect the fact that the Unary system involves lengthy (and, for humans,
tedious) manipulations of the unique elementary symbol.
In order to represent the number 4 for example, we must necessarily pass
through the representations of all the lower numbers: looking at ω1 (Fig.
4.7) this fact is very clear. Instead, from its reduced version in Fig. 4.6, it
could seem that with only one operation we can reach any number.
Figure 4.7: The Unary system is irreducible, when we apply the constraint.
We are unsatisfied with the network in Fig. 4.6 because we want
the operational nodes in the reduced network ω^R to represent the irreducible
computations needed in order to build any one of the composed symbols from
the elementary ones.
This is essential if the reduced networks are to reflect the complexity of the numeral system, taking into account not only its definition,
but also the effort in constructing higher symbols out of the lower ones. This
can be expressed in an elegant way by referring to a causality principle. In the
next Section we formulate a constraint on the reduction step that forbids
violations of causality when transforming ω networks by means of our R.
4.2.5 The causality constraint
The constraint we put on the reduction of two similar Triples has the nature
of a causality constraint. We cannot reduce two Triples if in the newly formed
Triple some element appears both in the new input and output Categories.
More generally, the reduction is forbidden if there is some element in ω
belonging to the ancestors of the two input elements that is also in the
offspring of one of the output elements.
In the Unary system for example, this constraint clearly forbids any reduction. Let us consider two arbitrary Triples: T1 : {k + 1 = 1 + k} , T2 :
{h + 1 = 1 + h} with k < h. Looking at Fig. 4.7 it is easy to see that a
reduction of T1 , T2 has as a consequence the violation of causality.
Another similar example is given by the ω network associated with the
Fuyuge numeral system (see Sec. 2.3.1). If we consider these two Triples:
T1 : {4 = 2 + 2}, T2 : {6 = 4 + 2},
we are faced with exactly the same problem as in the Unary system, and the
reduction is impossible. The conclusion is that the computations represented
by the two operational nodes cannot be the same thing.
Figure 4.8: In the Fuyuge system, which is substantially Base two, the 6 and
the 4 are constructed in this way. These two Triples cannot be reduced
together, due to the causality constraint.
The causality constraint involves more general situations than the ones
we just showed. The general statement of the constraint is given in terms of
the generalized Ancestors (Ag ) and Offspring (Og ) (see Definitions 12 and
13) of the input elements that are not in common between the two Triples:
Definition 18 (Reducibility) Two similar Triples (i.e. involving the same
operation and sharing one of the input elements) T1 , T2 are said to be reducible if and only if, denoting with e1 , e2 the input elements that are not in
common between them, the following conditions are both satisfied:
Ag (e1 ) ∩ Og (e2 ) = ∅    (4.2)
Ag (e2 ) ∩ Og (e1 ) = ∅    (4.3)
When two Triples are reduced, the Offspring of the newly formed output
Category is the union of the Offspring of the two output elements (z1 , z2 )
in the original Triples. The same holds for the Ancestors of the newly formed
input Category:
A(Cin ) = A(e1 ) ∪ A(e2 )    (4.4)
O(Cout ) = O(z1 ) ∪ O(z2 )    (4.5)
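Conditions (4.2) and (4.3) can be checked with two graph traversals per element. The sketch below is a minimal illustration under assumptions of its own: Triples are encoded as tuples (output, left input, right input, operation), links run from inputs to outputs, and the generalized Ancestors are the elements from which a given element can be reached (the element itself included). The three small networks are hypothetical test cases modelled on the examples in the text.

```python
def _closure(start, links):
    """Set of elements reachable from `start` along `links`, `start` included."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in links.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_reducible(e1, e2, triples):
    """Check conditions (4.2) and (4.3) for the non-shared inputs e1, e2."""
    children, parents = {}, {}
    for out, left, right, _ in triples:
        for src in (left, right):
            children.setdefault(src, set()).add(out)
            parents.setdefault(out, set()).add(src)
    Ag1, Ag2 = _closure(e1, parents), _closure(e2, parents)
    Og1, Og2 = _closure(e1, children), _closure(e2, children)
    return not (Ag1 & Og2) and not (Ag2 & Og1)


# Unary-like network: n + 1 = 1 + n for n = 1, ..., 5.
unary = [(n + 1, 1, n, "+") for n in range(1, 6)]
# The two Triples quoted for the Fuyuge system: 4 = 2 + 2 and 6 = 4 + 2.
fuyuge_like = [(4, 2, 2, "+"), (6, 4, 2, "+")]
# Hypothetical decimal-like fragment in which 1, 2 and 10 are input elements.
decimal_like = [(11, 10, 1, "+"), (12, 10, 2, "+")]

print(is_reducible(2, 4, unary))         # non-shared inputs k = 2, h = 4
print(is_reducible(2, 4, fuyuge_like))   # non-shared inputs 2 and 4
print(is_reducible(1, 2, decimal_like))  # non-shared inputs 1 and 2
```

In the first two cases one non-shared input is an ancestor of the other, so the intersections are not empty and the reduction is forbidden; in the third case the two inputs are causally unrelated and the pair is reducible.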
This is a highly non-local constraint and, in certain conditions, especially
for very large systems, it can be computationally hard to check whether two Triples
are reducible. We will not consider the computational complexity of the
reduction algorithm here.
4.2.6 The reduction algorithm
Until now we have described in full detail the reduction transformation R
and its effect on a set of similar Triples held by a pair (e, ⋆). The other
Triples are left unaffected by R. But the redundancies are typically spread
over different places in a network ω and, as we will see, at different hierarchical levels. In order to exploit all these redundancies a single transformation
R is usually not sufficient: we must explore the landscape. But what is the
best strategy to do that? In other words, what is the algorithm (if it exists)
that gives us the right sequence of (ei , ⋆i ), or that can explore the
landscape stochastically in order to exploit the redundancies optimally? Suppose for the
moment that we have this algorithm. It will produce as a final result an
ω network that is no longer reducible. I call these ω-s reduced networks, but
I have not yet defined them. I will define the reduced networks in the last
Section, as solutions of a minimum problem.
In NSRPS the greedy algorithm was efficient, and there was a simple criterion to know when the algorithm must stop: when the descriptional length
reaches its minimum. Here the situation is much more complicated, but we
will adopt a similar point of view, defining a greedy algorithm that generates
a sequence of (ei , ⋆i ), based only on the topological properties (specifically
the degree function) of the networks ω^(i) that are visited during this process.
The reduction algorithm proceeds following these steps. Starting from a
simple ω network:
• firstly we find a set of elements (symbols or Categories) that are dominant in the network, in the sense that they have the maximal kout
(there can be more than one)
P = {e1 , e2 , . . . , er }
that we will call Pivots.
The Pivots are not necessarily elementary symbols or Categories, but
very often elementary or low-LD elements fall into P . We will usually
sort this set in a lexicographic order, but this is not that important.
The important thing to realize is that when we have more than one
Pivot it is like when we are at a crossing, and we have to choose which
way will be walked. We will discuss these “bifurcations”6 later;
• choose an element, say e1 , from P , and an operation ⋆ among all operations in which e1 is involved. Concretely, we can choose the operation
⋆ for which k_out^⋆ , i.e. the number of outgoing links that are connected
with ⋆ operations, is maximal;
• consider all the similar Triples individuated by (e1 , ⋆) and apply R (e1 , ⋆)
• Return to the first step.
This process stops when one of the following conditions is satisfied:
• ∀ element e: (kout (e) = 1) ∨ (kout (e) = 0) (this is realized only in very
special systems);
• ∀ element e with kout (e) > 1, there is no more reducible couple within
the similar Triples associated to each operation ⋆.
The greedy algorithm contains a stochastic part that is unavoidable, occurring when there are bifurcations due to the presence of more than one
Pivot.
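The selection of a Pivot and of an operation in one greedy step can be sketched as follows. The encoding of Triples as tuples (output, left input, right input, operation), the function name, and the way outgoing links are counted (one link for each input slot that an element occupies) are assumptions of this illustration. When several Pivots are available the sketch simply takes the first one in sorted order, whereas the algorithm described above may choose among them at random. The reduction itself and the stopping conditions are not implemented here.

```python
from collections import Counter


def select_pivot_and_operation(triples):
    """Return (Pivots, chosen Pivot, chosen operation) for one greedy step."""
    kout, kout_by_op = Counter(), Counter()
    for _, left, right, op in triples:
        for src in (left, right):
            kout[src] += 1              # one outgoing link per input slot (assumption)
            kout_by_op[(src, op)] += 1
    k_max = max(kout.values())
    pivots = sorted((e for e, k in kout.items() if k == k_max), key=str)
    pivot = pivots[0]                   # deterministic choice among the Pivots
    ops = sorted(op for (e, op) in kout_by_op if e == pivot)
    best_op = max(ops, key=lambda op: kout_by_op[(pivot, op)])
    return pivots, pivot, best_op


# Hypothetical network fragment in which 10 is the dominant element.
network = [
    (11, 10, 1, "+"), (12, 10, 2, "+"), (13, 10, 3, "+"),
    (20, 10, 2, "*"), (30, 10, 3, "*"),
]
pivots, pivot, op = select_pivot_and_operation(network)
print(pivots, pivot, op)
```

In this fragment the element 10 has five outgoing links, three of which go to + operations, so the step would apply R (10, +).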
6
This is not completely appropriate as a term, because the Pivots can be more than
two.
There are virtually infinite possibilities to modify this greedy algorithm
with stochastic perturbations. One possibility is, for example, to extract the
Pivots with a probability distribution that gives high probability to elements
with a high kout and low probability to elements with a low kout .
4.2.7 The orbits of R, and the (approximate) reduced network
Starting from a certain ω the sequence of reductions gives a sequence of
networks
ω, R (e1 , ⋆1 ) ω, R (e2 , ⋆2 ) R (e1 , ⋆1 ) ω, . . .
where ei , ⋆i is the sequence of Pivots and operations selected as described
in 4.2.6.
This sequence forms an orbit in the space Ω
ω, ω (1) , ω (2) , . . . , ω R
that we will call Γ(ω).
Since the reduction R, and in particular the elementary reduction step,
is reversible, all the elements lying on the same orbit are in this sense
equivalent: they all contain the same information.
Definition 19 (Equivalent networks) Two Ω networks ω1 and ω2
are equivalent if and only if they can be connected by a finite path of elementary reduction (or separation) steps.
This is an equivalence relation, and realizes a partition of the space Ω into
regions A1 , A2, . . . . These regions are invariant with respect to the dynamics
defined by R, and consequently any orbit is confined to the Ai that contains
the initial ω.
The networks belonging to a certain region Ai are all equivalent to a
certain simple network ωi , which can be taken as the representative element
of the region.
As we observed, there are possible bifurcations along the sequence; if we
consider simultaneously all the possible orbits we obtain a tree, and the
leaves of this tree will be called the (approximate) reduced networks. We will
write:
ω_∗^R = R (ω) .    (4.6)
The situation is typical of optimization problems in a complex
landscape. In the interesting situations we want to find the solution of some
complex optimization problem, but we do not have an exact algorithm that
gives this solution, due to the lack of insight into the problem, or to its intrinsic
unfeasibility. Then we are forced to use approximate algorithms, based on
heuristics, i.e. simple computational strategies that are phenomenologically
effective [GJ79]. A more complex computational strategy could be realized by
performing stochastic separations and reductions, after an initially greedy
phase.
In the next Section we will describe the general aspect of the (approximate
or not) reduced networks, and will introduce the appropriate language for
describing them. But first we want to observe that the R transformation has
at least two relevant fixed points.
4.2.8 Holistic and Unary systems are two fixed points of R
Let us consider the Holistic numeral system, to which the ωH network is
associated. It is obvious that the transformation R in this case is the identity,
being ωH a set of isolated (elementary) nodes:
ωH → ωH .
Figure 4.9: The holistic system is irreducible
The same conclusion, but for a different reason, is reached in the case of
the Unary system, as we saw in the preceding Section. Therefore
ω1 → ω1 .
So the R transformation has at least two fixed points. This makes sense
in the light of the discussion on the meaning that we want to give to reduced
networks.
Figure 4.10: A piece of an orbit of the reduction algorithm. Starting from
ω, the first four R (ei , ⋆i ) are represented. The green line represents a step
backward in the reduction, corresponding to a separation of a certain group of
similar Triples. Separation is not allowed in the reduction algorithm we
have described, but it can be considered for different heuristics. Starting from
ω (2) there are several possible directions available. Our reduction algorithm
can choose one of these at random. The Holistic and Unary systems are fixed
points of R, and this, instead, is independent of the heuristics.
4.3 The ω^R networks and their relevant quantities
Now we will suppose that we have determined the real ω_i^R associated with ω. The
complete description of an Ω network, as it was defined in Sec. 3.1.3, requires
the descriptions of all input and output elements, besides the description of
the wiring diagram. We recall that the description of an element is a list
of its inner elements, and its length is their numerosity (see Sec. 3.1.1).
This is clearly redundant, because in simple Ω networks the output of
a Triple is completely determined by its inputs and the operation. We can
always recover an arbitrary Ω network from its wiring diagram and its input
elements. This network is equal but not identical, because the inner structure
of the output Category is lost.
4.3.1 Irreducible operations: π⋆
Definition 20 (Irreducible operations) The number of operational nodes
π⋆ of ω R is the number of irreducible operational nodes of the original network ω.
The effect of a reduction on π⋆ is monotonic, since a reduction can only lower the number of operational nodes:
π⋆ (R (ω)) ≤ π⋆ (ω) .
If we have a lot of redundancies, π⋆ can be decreased very much, at the
low expense of a growth of π, due to the possible definition of new Categories. In
ω^R there can remain simple Triples, which we will call unreduced. Unreduced
Triples arise, depending on the case, when some computational steps are
particularly irreducible, as for example in the case of the Unary system,
or when some parts of the network ω do not fit the regularity schemas
that emerged during the reduction process.
We finally note that πC is invariant along the orbit:
πC (ω) = πC (R (ω))
4.4 The complexity functional
Definition 21 (Descriptional Complexity) The descriptional complexity of an ω network is given by the descriptional length of its reduced network(s):
K = π + π⋆    (4.7)
This is the analogue of the descriptional length of a sequence S maximally compressed by means of the NSRPS algorithm. In particular, the π⋆
contribution is analogous to the length of the compressed sequence | S^R | (it
coincides with the number of Triples); π instead is the analogue of a concise
description of the substitutions. This is substantially the amount of information that we need in order to recover the initial ω without any loss. There
can be prefactors, but it is the most natural choice. The descriptional complexity does not always capture the information contained in πC , the average
logical depth. For example, if the Triples are all connected, as in positional systems, the system has a low number of input Categories, but involves lengthy
computations with respect to a system in which all Triples are disconnected.
Definition 22 (Complexity) We will define complexity of a numeral system ω the following quantity:
C = K + πC .
(4.8)
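As a purely illustrative sketch (not part of the original formalism), the two functionals of Eqs. (4.7) and (4.8) can be written as a small Python helper. The class name `ReducedNetworkSummary` and its field names are hypothetical; the numerical values are the ones reported for the Italian system in Sec. 5.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReducedNetworkSummary:
    """Hypothetical container for the three functionals of a reduced network."""
    pi: float        # length of the input Categories
    pi_star: float   # number of irreducible operations (Triples)
    pi_c: float      # average logical depth

    def descriptional_complexity(self) -> float:
        # Eq. (4.7): K = pi + pi_star
        return self.pi + self.pi_star

    def complexity(self) -> float:
        # Eq. (4.8): C = K + pi_C
        return self.descriptional_complexity() + self.pi_c


# Values quoted in Sec. 5.2 for the reduced Italian network.
italian = ReducedNetworkSummary(pi=21, pi_star=2, pi_c=3.5)
print(italian.descriptional_complexity(), italian.complexity())
```

Keeping K and C as separate methods mirrors the text: K measures only the description length, while C adds the computational cost carried by πC.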
Now we can finally define precisely the concept of reduced network:
Definition 23 (Reduced network) Given a simple ω ∈ Ω, we call its reduced network(s) {ω_i^R} the solutions of the following minimum problem
\[ \min_{A_\omega} K \tag{4.9} \]
where A_ω is the set of all networks that are equivalent to ω (in the sense of definition 19).
We will study in the next Chapter the complexities of some network models, and we finish this Chapter with an observation. If unreduced Triples remain in a reduced network, it means that they fall outside the regularity schemas recognizable by the reduction algorithm. In this sense they can be considered on the same level as noise, and we have an unambiguous criterion to distinguish regularities from irregularities in a numeral system, which is very useful for defining complexity. A description of these unreduced Triples does not deserve the degree of precision used for the description of the regular part, so we could adopt a statistical description for the irregular part. This amounts to a constant term, which stands for the description of the statistical ensemble from which these Triples are imagined to be extracted. In this way we obtain an interesting result: our complexity definition will be low for very regular systems and for random ones, and high only for intermediate systems with intricate rules.
Chapter 5
Complexity of numeral systems
In this Chapter we apply the transformation R to several numeral systems, finding their ω∗R (the final points of the R orbits) and their complexities. These results are based on the application of the greedy algorithm described in Sec. (4.2.6).
We will discuss in great detail the functioning of the greedy algorithm in Sec. (5.2) for the Italian and French natural language numeral systems, based on their network representations introduced in Sec. (2.3.2). Special attention will be given to the evolution under the transformation R of the degree function, on which the sequence of the Pivots depends. At the end of this section we will describe the ω∗R that we have found in the two cases, stressing their differences and the interpretation of their input Categories.
Then in the last section (5.3) we consider positional systems, whose network representation was introduced in (5.2), for all the bases B ≥ 2 and all the sizes. In this case we find an explicit formula for the complexity as a function of the base B and of the size N (the latter is the highest number that has a symbolic representation). The complexity features an interesting behaviour: for each fixed N it has a minimum for small (but not too small) bases, with a very slow dependency on N.
Since positional systems are an abstract model that approximates very well the developed natural language systems (meaning that the ways in which higher symbols are constructed from lower ones are similar), we think that this is a qualitative argument showing that the use of small (but not too small) bases in numeral systems all over the world is not only the result of historical accidents, but also responds to the objective need for simplicity of a cognitive system [CVA03].
5.1
Holistic and Unary
Let us set |ω| = N. The complexity of the Holistic system has only one positive contribution, from π; this is the length of the input Categories, which in this case coincides with the number of elementary symbols (See Sec. 2.3.3). This is equal to the system's size:
C (ωH ) = N.
For the Unary system π = 1, π⋆ = N − 1 (See Sec. (2.3.3)), and finally
\[ \pi_C = \frac{1}{N} \sum_{k=1}^{N-1} k = \frac{N-1}{2}. \]
So the complexity of the Unary system is asymptotically
\[ C(\omega_1) = \frac{3}{2} N. \]
Prefactors in complexity formulas are not very important; what is essential is the functional form. We observe that the complexities of both the Holistic and the Unary system grow linearly with the system's size. This is a rapid growth compared with more familiar systems, like positional ones, in which one has a logarithmic behaviour, as we will see in Sec. 5.3.
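The linear growth of both complexities can be checked with a few lines of Python. This is only a sketch under the assumptions stated in this section: for the Holistic system only π = N contributes, and for the Unary system π = 1, π⋆ = N − 1 and the average logical depth is taken to be πC = (N − 1)/2. The function names are hypothetical.

```python
def holistic_complexity(n: int) -> float:
    # Only the input Categories contribute: pi = n, no operations, no depth.
    return float(n)


def unary_complexity(n: int) -> float:
    pi = 1                 # a single elementary symbol
    pi_star = n - 1        # irreducible operations
    pi_c = (n - 1) / 2     # average logical depth (assumed reading of the text)
    return pi + pi_star + pi_c


for n in (10, 100, 1000):
    # The ratio C/N approaches 1 for Holistic and 3/2 for Unary.
    print(n, holistic_complexity(n) / n, unary_complexity(n) / n)
```

With these assumptions the exact Unary value is (3N − 1)/2, which differs from 3N/2 only by a constant, so the asymptotic statement is unaffected.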
5.2
Complexity of the Italian and French system
The aim of this section is to describe in detail the functioning of the greedy
reduction algorithm. In order to do that it is better to consider small networks (N = 100).
Italian
The starting point is the simple ωItalian that we described in Sec. 2.3.2 (we
will call it ω in the following). Its degree function is reported in Fig. 5.1.
Figure 5.1: The degree function kout (x), plotted against x from 0 to 100, of the simple ω associated to the Italian system. We note that the number 10 is the only Pivot, with kout = 17.
• The first Pivot is (as we could expect) the base 10 with kout = 17, and it is the unique Pivot at this point.
Then we choose an operation among {+, ×}; the greedy algorithm leaves the freedom to choose at random, for example +.
The similar Triples are nine: T1 : {11 = 10 + 1}, T2 : {12 = 10 + 2}, . . . , T9 : {19 = 10 + 9}.
The reduction of these Triples is denoted as R (10, +) (See Sec. 4.2.6).
The effect of R (10, +) is the creation of two new Categories: C0 = {0, 1, 2, . . . , 9} and C10^+ = {11, 12, . . . , 19} (See Fig. 5.4, quadrant 1); this is the ω^(1).
• At this point the Pivots are {2, 3, . . . , 10, 20, 30, . . . , 90} (See Fig. 5.2), and the greedy algorithm enters the first bifurcation (See Sec. 4.2.7). Suppose that, for example, 3 is chosen as the next Pivot, with the + operation. The transformation R (3, +) leads to the creation of two new Categories: Cd = {20, 30, . . . , 90} and C1 = {23, 33, . . . , 93} (See Fig. 5.4, quadrant 2). The Category C1 could seem strange, but there is nothing wrong in principle with the creation of such a Category.
• The Pivots are now {2, 4, 5, . . . , 10} (See Fig. 5.3); we will suppose that the following reductions are R (2, +) , . . . , R (9, +) (See Fig. 5.4, quadrant 3). Each of these reductions creates a new Category Ci and “recreates” Cd, which is not really created two or more times: a new outgoing link is added to the unique Cd every time.
• The other reductions are described in Figures 5.4 and 5.5.
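The first step of the procedure described in the list above (finding the Pivot and collecting its similar Triples) can be sketched in Python. Everything in this block is an assumption made for illustration: the network is a simplified reconstruction of an Italian-like system for the numbers 11 to 99 (teens as 10 + x, decades as x × 10, the rest as decade + x), and kout (x) is taken to be the number of Triples in which x appears as an input. This reconstruction is not the network of Sec. 2.3.2, but it reproduces the value kout = 17 for the unique first Pivot 10.

```python
from collections import Counter


def italian_like_triples():
    """Triples (a, op, b, result) for 11..99; an illustrative reconstruction."""
    triples = []
    for x in range(1, 10):                 # 11..19 = 10 + x
        triples.append((10, "+", x, 10 + x))
    for x in range(2, 10):                 # 20..90 = x * 10
        triples.append((x, "*", 10, x * 10))
    for decade in range(20, 100, 10):      # 21..99 = decade + x
        for x in range(1, 10):
            triples.append((decade, "+", x, decade + x))
    return triples


def out_degree(triples):
    """k_out(x): number of Triples in which x appears as an input."""
    k_out = Counter()
    for a, _op, b, _result in triples:
        k_out[a] += 1
        k_out[b] += 1
    return k_out


def pivots(k_out):
    """Elements with maximal k_out."""
    top = max(k_out.values())
    return sorted(x for x, k in k_out.items() if k == top)


def similar_triples(triples, pivot, op):
    """Triples that share the given pivot as an input and the given operation."""
    return [t for t in triples if t[1] == op and pivot in (t[0], t[2])]


triples = italian_like_triples()
k_out = out_degree(triples)
print(pivots(k_out), k_out[10])
print(len(similar_triples(triples, 10, "+")))
```

The nine Triples returned for the pair (10, +) are the ones that R (10, +) would merge into a single Triple between Categories; the merge itself is not implemented here.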
In the following figure we report the sequence of reductions and describe
the formation of the new Triples and Categories.
Figure 5.2: The degree function kout (x), plotted against x, of the network R (10, +) (ω) = ω^1. We note that there is a large number of maximal-kout (x) elements (Pivots). One of these is still 10 and its kout is diminished, because its similar Triples have been reduced together in one big Triple (See also Fig. 5.4). We have assigned to the newly formed Categories C0 and C10^+ the indices 100 and 101 respectively. The Category C0 acquires a kout = 1 and C10^+ remains at kout = 0 because they are respectively the input and output Categories of the newly formed Triple.
From the reduced network ω∗R that we obtain for the Italian system (Fig. 5.6) we can see that π = 21 and π⋆ = 2. Numerically we find that πC = 3.5 and consequently:
K = π + π⋆ = 23,
C = K + πC = 26.5.
Figure 5.3: The degree function kout (x), plotted against x, of the network R (3, +) (ω^1) = ω^2. We note that not all the elements which were Pivots in ω^1 behaved in the same way. Taking into account that 3 diminished its kout (x) through the reduction of its similar Triples, there are two groups: A = {2, 3, 4, . . . , 9} and B = {20, 30, . . . , 90}. The 10 forms a group apart (C) because all its remaining links are connected to a × operation.
Figure 5.4: The sequence of reductions 1−9 for the Italian system; the newly formed Categories and Triples at each step are represented.
Figure 5.5: The sequence of reductions 10−13 for the Italian system; the newly formed Categories and Triples at each step are represented.
French
The discussion in the case of the French system proceeds along the same lines, and we will not reproduce it here. We only report the sequence of reductions.
• Pivots = {60, 80}, R (60, +), Cv is created;
• Pivots = {80}, R (80, +), Cv raises its degree;
• Pivots = {10}, R (10, +), C0 is created;
• Pivots = {20}, R (20, +), C0 raises its degree;
• Pivots = {30, 40, 50}, R (30, +), R (40, +), R (50, +), C0 raises its degree;
• Pivots = {10}, R (10, ×), C0′ is created;
• Pivots = {C0 }, R (C0 , +), Cd′ is created;
• Pivots = {Cv }, R (Cv , +), Cs is created.
From the reduced network ω∗R for the French system (reported in Fig. 5.8) we can see that π = 44 and π⋆ = 4. Numerically we find that πC = 4.5 and consequently:
K = π + π⋆ = 48,
C = K + πC = 52.5.
Figure 5.6: The ω∗R for the Italian system. The input Categories are in complete agreement with the linguistic reality. C0 collects the numbers that have independent words; these are called “digits” because they are the analogue in natural language of the digits in the written systems. C0′ collects the digits that are multiplied by 10 (dieci). In Italian the number words for the elements in C0′ (v-enti, tr-enta, quar-anta) are constructed with the number words for the digits (due, tre, quattro, . . . ) and the suffixes -enti, -enta, -anta, marking the multiplication by 10. (We do not take into account the complex linguistic transformations that change the form of the digit words when the suffix is placed.)
5.3
Complexity of the positional systems
We consider the positional systems in their written form. The symbols 0 and 1 play a particular role, which differs from what happens in the natural language systems. In particular the 0 appears in several places in the ω network. This is an effect of the fact that the same numerical quantity can be represented in several possible ways in positional systems. For example 0 and 0000 are both zero, but their representations are different. In positional systems we must take this difference into account in our network because, as we have remarked, we are concerned with the logical organization of the symbols' representations. With this in mind we can realize that 0 is an elementary symbol, whereas 0000 is the result of the sequence of operations:
(0 × B^3) + (0 × B^2) + (0 × B^1) + (0 × B^0).
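The distinction between a symbol's value and its construction can be made concrete with a short Python sketch. The function name `positional_expansion` and the convention of counting one × per digit and one + per junction are assumptions introduced here for illustration only; they are not the thesis's definition of a Triple.

```python
def positional_expansion(digits: str, base: int):
    """Return (value, operation count) for a digit string in the given base."""
    if len(digits) == 1:
        # A single digit is an elementary symbol: no operations are needed.
        return int(digits, base), 0
    k = len(digits)
    value = 0
    for position, digit in enumerate(digits):
        exponent = k - 1 - position
        # One product digit x base^exponent for every position.
        value += int(digit, base) * base ** exponent
    # k products joined by k - 1 sums.
    operations = k + (k - 1)
    return value, operations


print(positional_expansion("0", 10))
print(positional_expansion("0000", 10))
```

Both strings evaluate to zero, but only the second one requires operations, which is the difference the network representation has to record.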
We can calculate explicitly the complexity for positional systems with any base B, for any size B^k. We do not enter into the details of how the reduction algorithm operates; we will only say that the first Pivot, as is clear, is B^(k−1).
The reduction of the similar Triples leads to the formation of two new Categories: the first (input) contains all elements from 0 to B^(k−1) − 1, and the output Category is formed by all elements of the form B^(k−1) + x, where x ∈ C0,k−1. The reduction process goes on iteratively along the usual lines and the final result is depicted in Fig. 5.10.
Figure 5.7: The degree function kout (x), plotted against x from 0 to 100, for the French system. There are two Pivots {60, 80} with kout = 19. The kout of the numbers 4 (quatre) and 20 (vingt) is higher than that of their “peers” (numbers that in the ω^R are in the same Category, see Fig. 5.8).
The ω∗R consists of a single connected component. The Triple with the × operation represents the formation of the numbers of the form x × B^l, where x is a digit and l is a positive exponent of the base. The other Triples describe the formation of numbers of the form x × B^l + y, where y ∈ {0, 1, 2, . . . , B^l − 1}.
There are only two input Categories:
C0 = {0, 1, 2, . . . , B − 1}
containing the digits, and
C∗ = {B, B^2, B^3, . . . , B^(k−1)}
containing the powers of the base.
Now we calculate the complexities K and C as functions of the base and of the size. Taking into account that k = log_B N, the length of the input Categories is
π = B + (k − 1) = B + log_B N − 1,
and the number of irreducible operations (or the number of Triples) of ω^R is
π⋆ = k = log_B N.
The descriptional complexity of the ω^R for positional systems with base B and size N = B^k is
K = B + 2 log_B N − 1.
(5.1)
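The counting behind Eq. (5.1) can be reproduced by building the two input Categories explicitly. The following Python sketch uses hypothetical function names and takes π⋆ = k directly from the text rather than deriving it from a reduction.

```python
def positional_input_categories(base: int, k: int):
    """Input Categories of the reduced positional network of size base**k."""
    c0 = list(range(base))                        # digits 0 .. B - 1
    c_star = [base ** l for l in range(1, k)]     # B, B^2, ..., B^(k-1)
    return c0, c_star


def descriptional_complexity(base: int, k: int) -> int:
    c0, c_star = positional_input_categories(base, k)
    pi = len(c0) + len(c_star)    # B + (k - 1)
    pi_star = k                   # number of Triples, as stated in the text
    return pi + pi_star           # K = B + 2k - 1


print(positional_input_categories(10, 2))
print(descriptional_complexity(10, 2))
```

For B = 10 and k = 2 (that is, N = 100) the two Categories have 10 and 1 elements, so K = 11 + 2 = 13, in agreement with K = B + 2 log_B N − 1.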
Figure 5.8: The ω^R for the French system. The Triple for the construction of the number 80 (quatre-vingts) is unreduced. A possible interpretation is that this construction breaks the regularity of a numeral system that otherwise would be substantially similar to the Italian system (See also Sec. 4.4 for our interpretation of unreduced Triples). The other Categories group together numbers that are constructed in a similar way (output Categories) or have similar roles (input Categories) in the system. For example Cv collects the numbers that are added to 60 and 80, so they have similar roles.
Figure 5.9: The degree function kout (x), plotted against x, of the simple ωDecimal for the positional base 10 system. The size is N = 10^5 but we set the range of the abscissa at [0, 10^4].
Figure 5.10: The ω∗R for base B positional systems of size N = B^k. We did not reproduce the details of the output Categories' inner structure, because it is described in the text. There are two input Categories: C0, which contains the digits 0, 1, . . . , B − 1, and C∗, which contains the positive powers of the base B, B^2, . . . , B^(k−1). In Ω_Hurford networks these Categories correspond exactly to the DIGIT and M grammatical categories.
The additional term (average logical depth, or πC) is simply
πC = 2 log_B N,
and the complexity is finally
C = B + 4 log_B N − 1.
(5.2)
For example, in the case of the decimal positional system with N = 10^2 we find C = 17, to be compared with, for example, the value C = 26.5 that we obtained for the Italian system (See Sec. 5.2).
Differentiating Eq. (5.2) with respect to B and setting the derivative equal to zero, we obtain the implicit equation that gives the optimal base as a function of the size:
\[ N = \exp\left( \frac{1}{4}\, B \,(\log B)^2 \right) \tag{5.3} \]
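The step leading to Eq. (5.3) is short: writing log_B N = ln N / ln B, the derivative of C = B + 4 ln N / ln B − 1 with respect to B is 1 − 4 ln N / (B (ln B)^2), which vanishes when ln N = B (ln B)^2 / 4. The Python sketch below evaluates both the closed form and the implicit relation; the function names and the upper bound `max_base` are hypothetical choices made for illustration.

```python
import math


def complexity_C(base: float, size: float) -> float:
    # Eq. (5.2): C = B + 4 log_B N - 1
    return base + 4 * math.log(size) / math.log(base) - 1


def size_for_optimal_base(base: float) -> float:
    # Eq. (5.3): the size N for which `base` is the stationary point of C.
    return math.exp(base * math.log(base) ** 2 / 4)


def best_integer_base(size: float, max_base: int = 64) -> int:
    # Brute-force scan over integer bases 2 .. max_base.
    return min(range(2, max_base + 1), key=lambda b: complexity_C(b, size))


print(complexity_C(10, 100))
print(best_integer_base(100))
print(size_for_optimal_base(6))
```

Scanning integer bases is a deliberately simple choice: C is defined for any real B > 1, but only integer bases correspond to actual positional systems.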
Figure 5.11: The complexity functionals K and C in a positional network of size N = 10^2, as a function of the base B.
Figure 5.12: The complexity functionals K and C in a positional network of size N = 10^4, as a function of the base B.
5.4
Conclusions
It seems that our reduction algorithm, at least in these simple cases, is able to find meaningful categories. In the Italian system, for example, 1 and 10 form a category apart, as do 4 and 20 in the French system. A category for the decades Cd is found in both the Italian and the French systems. For positional systems we find explicit formulas for the complexities K and C that share the interesting feature of exhibiting a range of approximately minimal solutions around “small but not too small” bases. In both cases these solutions depend on the size in the expected way. In natural language numeral systems a range of bases from 2 to 100 is covered. But the predominance of base 10 in developed systems is almost universal, and we think that this has to do with the fact that, in this range of bases, an optimization of cognitive resources is realized.
Figure 5.13: The complexity functionals K and C in a positional network of size N = 10^12, as a function of the base B.
Figure 5.14: The optimal base B as an implicit function of the size N with respect to the functionals K (blue) and C (red). As one expects, the optimal base adapts to the range of numbers that are involved.
Chapter 6
Perspectives and conclusions
In this Thesis we studied numeral systems as representatives of symbolic systems. We decided to abstract from their physical manifestations (sounds of the words and phrases, signs) in order to capture natural language and written numeral systems in the same model, and to concentrate on the logical organization of these symbols into a system, which is possible once we introduce a network model. At first we focused on measurements of the structure of these systems, learning a lot about their relevant features and about the quantities that can be useful to parametrize this space. The initial idea was to define the complexity of such systems by looking at topological and statistical properties, especially by comparing different models of numeral systems, inspired by natural language and written systems, with abstract models introduced as theoretical tools.
Then we moved to a completely different approach. In order to find a suitable definition of complexity it was possible to get inspiration from a well known compression algorithm, introduced by P. Grassberger for sequences of symbols. The central idea of this algorithm is to search the given sequence for a dominant (most frequent) pair of adjacent symbols, and to substitute this pair with a new symbol. The result is a new sequence over a different alphabet: the old one plus the newly born symbol. The transformation is repeated iteratively, and stops only when the descriptional length of the sequence, defined as the length of the actual sequence plus the length of a concise description of the new symbols, reaches a stationary regime. The second term was actually discarded, because it was very small compared with the first one.
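The pair-substitution idea summarized above can be sketched in a few lines of Python. This is a simplified illustration, not Grassberger's implementation: the stopping rule used here (stop when no pair occurs at least `min_count` times) is an assumption that replaces the stationarity criterion on the descriptional length, and the new symbols are named `<0>`, `<1>`, ... on the assumption that such strings do not occur in the input.

```python
from collections import Counter


def most_frequent_pair(seq):
    """Return (pair, count) for the most frequent pair of adjacent symbols."""
    counts = Counter(zip(seq, seq[1:]))
    return counts.most_common(1)[0] if counts else (None, 0)


def substitute_pair(seq, pair, new_symbol):
    """Replace non-overlapping occurrences of pair, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def pair_substitution(seq, min_count=2):
    """Iterate the substitution; return the reduced sequence and the rules."""
    seq, rules = list(seq), {}
    while True:
        pair, count = most_frequent_pair(seq)
        if pair is None or count < min_count:
            break
        new_symbol = f"<{len(rules)}>"
        rules[new_symbol] = pair
        seq = substitute_pair(seq, pair, new_symbol)
    return seq, rules


def expand(seq, rules):
    """Undo the substitutions, recovering the original sequence."""
    out = []
    for symbol in seq:
        if symbol in rules:
            out.extend(expand(rules[symbol], rules))
        else:
            out.append(symbol)
    return out


compressed, rules = pair_substitution("abababab")
print(compressed, rules)
print("".join(expand(compressed, rules)))
```

The pair (reduced sequence, rules) plays the same role as (π⋆, π) in the network setting: the first measures what is left after the reduction, the second the cost of describing the substitutions.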
Compared with previously existing approaches, the contribution of this Thesis is twofold. First of all we established a new mathematical framework that has proved very fruitful for the comprehension, at a systemic level, of the organization of numeral systems. This framework is quite flexible and we think that it can be adapted to other (simple) symbolic systems with a reinterpretation of the meaning of the operations and of the symbols. The second contribution consists in the introduction of a transformation (the reduction R) in a space of networks with non trivial topology, which maps the original network into an equivalent version, suitable to describe it in a concise and, in a certain sense, meaningful way.
Two main lines of research seem to be promising in the immediate future.
• One, more mathematical, consists in studying the reduction transformation R more deeply. Especially intriguing would be, from my point of view, to investigate the possible relationships of the functional K with information-theoretical measures such as Kolmogorov Complexity. Another interesting issue could be a more systematic investigation of the space Ω from the point of view of Statistical Mechanics, for example calculating the cardinality (volume) of the network ensemble E (ε, p) as a function of (ε, p), and of the surfaces defined in Ω by fixing some functional, like for example πC. We tried to move in this direction, especially in the early stages of this work, relying mainly on simulations, but without great success. However, recent developments in the Statistical Mechanics of networks show that such entropy calculations are possible and shed considerable light on the information-theoretical aspects of network ensembles [AB09].
• The other research line, for which we think that the tools developed in this work are useful, is the modelling of the evolution of numeral systems by means of interactions in a community of cognitive agents and, more in general, of the evolution of language, broadly interpreted [LS07]. In many biological, technological and social systems, the units of which the system is composed initially interact among themselves and with the environment in a sensorial and non-symbolic way, their communication system being neither predetermined nor fixed by a global entity [MN06] [NF04] [Ste06]. The communication system emerges spontaneously as a result of the interactions of the agents, and it could change continuously due to the mutations occurring in the agents, in their objectives as well as in the environment. An important question concerns how conventions are established, how communication arises, what kinds of communication systems are possible and what the prerequisites are for such an emergence to occur. In this perspective it is interesting to investigate the emergence of syntactic forms of agreement: compositionality, categories, syntactic or grammatical structures [CFL09].
Bibliography
[AB09] Kartik Anand and Ginestra Bianconi. Entropy measures for
networks: Toward an information theory of complex topologies. Physical Review E (Statistical, Nonlinear, and Soft Matter
Physics), 80(4), 2009.
[BCC+ 08] M. Ballerini, N. Cabibbo, R. Candelier, A. Cavagna, E. Cisbani,
I. Giardina, V. Lecomte, A. Orlandi, G. Parisi, A. Procaccini,
M. Viale, and V. Zdravkovic. Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proceedings of the National
Academy of Sciences, 105(4):1232–1237, January 2008.
[BCG06] Dario Benedetto, Emanuele Caglioti, and Davide Gabrielli. Non
sequential recursive pair substitution: Some rigorous results,
2006.
[Ben88] Charles H. Bennett. Logical depth and physical complexity. In
The Universal Turing Machine: A Half-Century Survey, pages
227–257, 1988.
[Bia08] Ginestra Bianconi. Entropy of randomized network ensembles.
Europhysics Letters, 81, 2008.
[BLM+ 06] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D. Hwang.
Complex networks: Structure and dynamics. Physics Reports,
424(4-5):175–308, February 2006.
[CFL09] C. Castellano, S. Fortunato, and V. Loreto. Statistical physics of
social dynamics. Review of Modern Physics, 81:591–646, 2009.
[Cho88] N. Chomsky. Language and Problems of Knowledge. The Managua Lectures. Cambridge, Mass., 1988.
[Cho02] Noam Chomsky. Syntactic Structures. Walter de Gruyter, 2nd
edition, December 2002.
[CRTVB07] Luciano da F. Costa, Francisco A. Rodrigues, Gonzalo Travieso,
and P.R. Villas Boas. Characterization of complex networks: A
survey of measurements. Advances In Physics, 56:167, 2007.
[CVA03] Nick Chater and Paul Vitányi. Simplicity: A unifying principle in cognitive science? In Trends in Cognitive Sciences, pages 7–19, 2003.
[DDLC98] S. Dehaene, G. Dehaene-Lambertz, and L. Cohen. Abstract representations of numbers in the animal and human brain. Trends
Neurosci, 21(8):355–361, August 1998.
[Dea97] Terrence Deacon. The Symbolic Species : The Co-evolution of
Language and the Brain. W.W.Norton, New York, 1997.
[Deh03] S. Dehaene. TRENDS in Cognitive Sciences, 7:145–147, April
2003.
[DM03] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of Networks:
From Biological Nets to the Internet and Www (Physics). Oxford University Press, March 2003.
[DMO05] S. Dorogovtsev, J. Mendes, and J. Oliveira. Frequency of occurrence of numbers in the world wide web, April 2005.
[Fec60] G. T. Fechner. Elements of psychophysics. New York: Holt,
Rinehart & Winston, 1860.
[Fre79] G. Frege. Begriffsschrift. Halle, 1879.
[GBPV08] G. Bianconi, A. C. C. Coolen, and C. J. Perez-Vicente. Entropies of complex networks with hierarchically constrained topologies. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 78, 2008.
[GJ79] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H.
Freeman & Co., New York, NY, USA, 1979.
[Gra02] Peter Grassberger. Data compression and entropy estimates by
non-sequential recursive pair substitution, 2002.
[GW79] G.H.Hardy and E.M. Wright. An Introduction to the Theory of
Numbers, 5th edn. Oxford University Press, 1979.
[Hak49] H. Haken. Information and Self-Organization: a macroscopic
approach to complex systems. Springer-Verlag, Berlin 1988,
1949.
[Hal91] Ken Hale. Miskito numerals, 1991.
[Ham06] H. Hammarström. Complexity in numeral systems with an investigation into pidgins, pidgincreoles and creoles. Language
Complexity: Typology, Contact, Change [Studies in Language
Companion Series]. Amsterdam: John Benjamins., 2006.
[Hur87] J. Hurford. Language and Number: the emergence of a cognitive
system. Basil Blackwell, Oxford, 1987.
[Hur99] J. Hurford. Artificially growing a numeral system. In Jadranka
Gvozdanovic, editor, Numeral Types and Changes Worldwide,
pages 7–41. 1999.
[Kac59] Marc Kac. Statistical Independence in Probability, Analysis and
Number Theory (Carus Mathematical Monographs, No. 12). Wiley, New York, 1959.
[LS07] Vittorio Loreto and Luc Steels. Social dynamics: Emergence of
language. Nature Physics, 3:758–760, November 2007.
[LV97] Ming Li and Paul Vitanyi. An Introduction to Kolmogorov
Complexity and Its Applications (Texts in Computer Science).
Springer, February 1997.
[MN06] D. Marocco and S. Nolfi. Origins of communication in evolving robots. In S. Nolfi, G. Baldassarre, R. Calabretta, J. Hallam, D. Marocco, O. Miglino, J-A Meyer, and D. Parisi, editors,
From animals to animats 9: Proceedings of the Ninth International Conference on Simulation of Adaptive Behaviour. LNAI.
Volume 4095. Springer Verlag, Berlin, Germany, 2006.
[New03] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, 2003.
[NF04] Stefano Nolfi and Dario Floreano. Evolutionary Robotics: The
Biology, Intelligence, and Technology of Self-Organizing Machines (Intelligent Robotics and Autonomous Agents). The MIT
Press, March 2004.
[OTI81] T Oyama, Kikuchi T., and S. Ichihara. Span of attention backward masking and reaction time. Percept. Psychophys, 29:106–
12, 1981.
[Par03] G. Parisi. Complexity and intelligence. In Lectures Notes
in Physics, editor, The Kolmogorov Legacy in Physics. Part II.
Algorithmic Complexity and Information Theory, pages 76–88.
Springer Berlin / Heidelberg, 2003.
[PI09] Manuela Piazza and Véronique Izard. How humans count: Numerosity and the parietal cortex. The Neuroscientist, 15(3):261–
273, June 2009.
[RB02] Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys., 74:47–97, June 2002.
[RMR08] Andrij A. Rovenchak, Ján Macutek, and Charles Riley. Distribution of complexities in the vai script. CoRR, abs/0810.0200,
2008.
[SB92] Denise Schmandt-Besserat. Before writing: from counting to
cuneiform. University of Texas Press, Austin, 1992.
[Ste06] Luc Steels. Experiments on the emergence of human communication. Trends in Cognitive Sciences, 10(8):347–349, August
2006.
[Tur36] A. M. Turing. On computable numbers, with an application to
the entscheidungsproblem. Proc. London Math. Soc., 2(42):230–
265, 1936.
[Wil12] The Mafulu. Mountain People of British New Guinea. 1912.
[WM80] Ebeling W. and Jiménez-Montano M.A. On grammars, complexity, and information measures of biological macromolecules.
Math. Biosc. 52:, 53-71., 1980.
[WR27] Alfred North Whitehead and Bertrand Russell. Principia Mathematica. Cambridge University Press, 1925–1927.