Information and Entropy in Neural
Networks and Interacting Systems
Fariel Shafee
A DISSERTATION PRESENTED TO
THE FACULTY OF PRINCETON UNIVERSITY
IN CANDIDACY FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
RECOMMENDED FOR ACCEPTANCE BY
THE DEPARTMENT OF PHYSICS
Adviser: Self
January 2009
© Copyright by Fariel Shafee, 2009.
All rights reserved.
Abstract
In this dissertation we present a study of certain characteristics of interacting
systems that are related to information. The first is periodicity, correlation
and other information-related properties of neural networks of integrate-and-fire
type. We also form quasiclassical and quantum generalizations of such
networks and identify the similarities and differences with the classical prototype.
We indicate why entropy may be an important concept for a neural
network and why a generalization of the definition of entropy may be required.
Like neural networks, large ensembles of similar units that interact
also need a generalization of classical information-theoretic concepts. We extend
the concept of Shannon entropy in a novel way, which may be relevant
when we have such interacting systems, and show how it differs from Shannon
entropy and other generalizations, such as Tsallis entropy. We indicate how
classical stochasticity may arise in interactions with an entangled environment
in a quantum system in terms of Shannon's and generalized entropies
and identify the differences. Such differences are also indicated in the use of
certain prior probability distributions to fit data as per Bayesian rules. We
also suggest possible quantum versions of pattern recognition, which is the
principal goal of information processing in most neural networks.
Contents

Abstract
List of Figures
List of Tables
Acknowledgements

1 Prolegomena

2 Integrate-and-fire Networks with Finite-width Action Potentials
  2.1 Introduction
  2.2 Integrate-and-fire models
  2.3 Convergence to periodicity in the finite width case
  2.4 Rate of convergence to limit cycle
  2.5 Nonleaking networks with finite width action potential
    2.5.1 Effect of pulse shape
    2.5.2 Convergence rate and the value of A
    2.5.3 Effect of region of initial excitation
    2.5.4 Synchronicity, bin-size and dynamic entropy
  2.6 Leaking networks with finite action potential
  2.7 Discussion

3 Neural Networks with Quantum Interactions
  3.1 Quantum Interaction between Nodes and Coherence Scale
  3.2 Technological Realization of Quantum ‘Neurons’ and ‘Action Potentials’
  3.3 Scales of Coherence
  3.4 Time scales of quantum gates and system evolutions
    3.4.1 Quasiclassical Net
    3.4.2 Quantum-Gated Net
    3.4.3 Completely Entangled Net
  3.5 Difference between our Quantum Models and Existing Models

4 Quasiclassical Neural Network Model
  4.1 Introduction
    4.1.1 Comparison with the Hopfield Model
  4.2 Quantum Transitions
  4.3 Some Analytically Soluble Cases
    4.3.1 Constant External V, No Interneuron Interaction
    4.3.2 Harmonic External V, No Interneuron Interaction
    4.3.3 Exponentially Damped External V, No Interneuron Interaction
    4.3.4 Damped Harmonic External Potential, No Interneuron Interaction
    4.3.5 No External Potential, Constant Interneuron Interaction
    4.3.6 Constant External Potential and Constant Interneuron Interaction
    4.3.7 No External Potential, Damped Harmonic Interneuron Interaction
  4.4 Quasiclassical Hopfield-Herz type Neural Network
  4.5 Input Dependence
  4.6 Results of Simulation
  4.7 Discussion
  4.8 Possible Applications

5 Quantum-gated Neural Networks
  5.1 Introduction
  5.2 Hierarchical Structure in Quantum Machines
  5.3 Brief Review of Elements of a Quantum AI Machine: Qubits and Quantum Gates
    5.3.1 Qubits
    5.3.2 Quantum Gates
  5.4 The Quantum Neural Network Model
  5.5 Results of Simulation
  5.6 Discussion

6 Fully-entangled Neural Networks Model
  6.1 Introduction
  6.2 An Entangled Quantum Network Model
  6.3 Periodic and Aperiodic Regimes
  6.4 Simulation Results and a Modified Gate
  6.5 Creation and Detection of Entangled States
  6.6 Discussion
  6.7 Patterns in Entangled Quantum States
    6.7.1 Qubit Pattern Generation/Representation
    6.7.2 Qubit Pattern Recognition
  6.8 Learning Quantum Patterns

7 Generalization of Entropy
  7.1 Introduction
  7.2 Entropy of a Neural Network
  7.3 Defining the New Entropy
  7.4 Applications of the New Entropy
  7.5 Probability Distribution for the New Entropy
  7.6 Probability, Lambert Function Properties and Constraints
  7.7 Numerical Comparison
  7.8 Free Energy
  7.9 Shannon and Tsallis Thermodynamics
  7.10 Thermodynamics of the New Entropy
  7.11 Application to a Simple System
  7.12 Summary of the Mathematical Properties of the New Entropy

8 Generalized Entropy and Entanglement
    8.0.1 Entangled and Pure States
  8.1 Stochasticity from Entanglement with Environment
  8.2 Entanglement, Entropy and Mutual Information
    8.2.1 Single System Interacting with Environment
    8.2.2 Entangled Systems interacting with the Environment
  8.3 Quantum Pattern Recognition and Mutual Information
  8.4 Entanglement and Mutual Information

9 Prior PDFs, Entropy and Broken Symmetry
  9.1 Priors and Moments of Entropy
  9.2 Comparison of Shannon and Our Entropy
  9.3 Results for Mean Entropy

10 Illation and Outlook

Bibliography
List of Figures

2.1 Square pulse with width w = 0.01, A = 0.96, I = 1.
2.2 Same as Fig. 2.1, but with a triangular pulse.
2.3 Square pulse, with A = 0.24, I = 1, w = 0.2.
2.4 Same as Fig. 2.3, but with a triangular A.P. pulse.
2.5 As Fig. 2.2, but with peripheral initial excitation.
2.6 As Fig. 2.4, but with peripheral initial excitation.
2.7 As Fig. 2.2, but with a time step 10 times greater.
2.8 As Fig. 2.2, but with leaking neurons with R = 10.
2.9 First firing of a net of leaking neurons, with a resistance just above the critical value of R = 1.
2.10 As above, showing successive peaks with growing synchronicity. The first firings shown in detail in the previous figure are the small clump at around t = 2.8.
3.1 (a) In the classical H-H network each node receives a current k from each nearest neighbor which has fired; (b) in our first model, a quasiclassical network, every node is a qubit-like object but sees its neighbors as decohered classical sources of potential; (c) in our second model, a quantum-gated one, every qubit interacts with its nearest neighbors seen as qubits, but the coherence length is the interqubit distance; (d) in our third model, a completely entangled network, all qubits are in coherence and the gates also maintain the coherence.
4.1 V0 = 0.2, width = 0.2, k = 0.2: typical pattern of the triggering of the neurons in the quasiclassical neural network. There is apparently no phase locking.
4.2 Cumulative number of triggerings against time. One can see a fairly regular linear behavior despite quantum stochasticity. This is for a single chosen neuron.
4.3 Same as Fig. 4.2, but for the whole system.
4.4 Transition from short-term behavior to asymptotic behavior with all peripheral nodes initially in state 1.
4.5 Same, for peripheral nodes initially in states |1⟩ and |0⟩ alternately.
4.6 Same, for peripheral nodes initially in random states of excitation.
5.1 A train of action potential pulses in a biological neural system.
5.2 Oscillations of the c part of a qubit for a non-cutoff model with ε = 0.01. This is the quantum amplitude for the state |1⟩. The modulus squared of this quantity gives the probability for a particular measurement to find the state to be |1⟩, though individual experiments may give either 0 or 1.
5.3 Oscillation of the c parts summed over all qubits in the network for c_thresh = 0.7, ε = 0.01. All boundaries excited initially.
5.4 Correlation between two qubits for the no-threshold case with ε = 0.01, ⟨10,10|20,21⟩, where the qubits are located by their (x, y) coordinates in the lattice.
5.5 Correlation ⟨10,10|20,21⟩ for the above case.
5.6 Variation of periodicity with ε for excitations from all sides.
6.1 Quantum network for Bak type training for pattern recognition. The intermediate and final qubits are shown integrated with OR gates to sum the contributions of the qubits connected behind. The circular gates are rotation gates. The curved feedback information paths that control the gates' rotations are shown only for two gates for clarity.
7.1 W0(z) is real along the real axis from −1/e to ∞; the value of W0 ranges from −1 to ∞; we do not show the W−1(z) branch, which is real from z = −1/e to z = 0, because it is not suitable for our entropy, as explained in the text.
7.2 Comparison of the pdf for the new entropy for values of q = 1, 1.1, 1.2 and 1.3. The solid line is for q = 1, i.e. the Gibbs exponential distribution, and the lines are in the order of q.
7.3 Comparison of pdfs for Tsallis nonextensive entropy (solid line) and the new entropy presented here, for q = 1.1.
7.4 The same as Fig. 7.3, but for a higher q = 1.3.
7.5 A for Shannon (top), Tsallis with ε = 0.1 (middle) and Tsallis with ε = 0.25 (bottom).
7.6 S for the same three entropy forms (from bottom to top: Shannon, Tsallis (0.1), Tsallis (0.25)).
7.7 U for the same three entropy forms (same order as for S).
7.8 C for the same three entropy forms (peaks bottom to top: Shannon, Tsallis (0.1), Tsallis (0.25)).
7.9 Comparison of A for Shannon (apart), Tsallis with ε = 0.1 and new entropy for ε = 0.05 (superposed).
7.10 Comparison of S for the same three forms of entropy (Shannon, Tsallis (0.1), new entropy (0.05)); Tsallis is just over the new entropy.
7.11 Comparison of U for the same three entropies. Shannon is separated; the other two overlap.
7.12 Specific heat C for the same three entropies. Again, only Shannon is separated.
8.1 I_AB as a function of the entanglement angle θ in A-B space and the entanglement angle θ′ with the environment, which is related to the stochasticity.
8.2 Difference of MI from our entropy with that from Shannon entropy at θ = π/4.
8.3 Same as Fig. 8.2, with θ′ = π/4.
8.4 MI difference between our entropy form and Shannon for q = 0.7.
8.5 Same as Fig. 8.4, but for q = 1.3.
8.6 Difference between MI from our entropy and Tsallis's with θ = π/4.
8.7 Same as Fig. 8.6, but with θ′ = π/4.
9.1 Ratio of expected value (first moment) of new entropy plotted against bin number K and prior exponent β for entropy parameter q = 0.5.
9.2 Same as Fig. 9.1, but for q = 1.0, i.e. Shannon entropy.
9.3 As previous two figures, but for q = 1.5.
9.4 Clearer view in a 2-dimensional plot, with K = 10. Red, green and blue lines are for q = 0.5, 1.0 and 1.5 respectively.
9.5 As Fig. 9.4, but for bin number K = 100.
9.6 As previous two figures, but for K = 1000.
9.7 Our entropy for a three-state system, with parameter q = 2.44, as two independent probabilities p1 and p2 are varied with the constraint p1 + p2 + p3 = 1. The expected maximum at the symmetry point p1 = p2 = p3 turns out to be a local maximum. The global maxima are not at the end points with one of the probabilities going up to unity and the others vanishing, which gives zero entropy as expected, but occur near such end points, as shown clearly in the next figure.
9.8 Two-dimensional version of the previous Fig. 9.7, with p1 = 1/3 fixed, so that only p2 varies. This shows a clearer picture of the local maximum at the symmetry point and the global maxima near the end points.
List of Tables

2.1 Time period for leaking networks
4.1 Strength of Quantum Potential and Average Period of Neurons (width = 0.2, V0 = 1): Best Fit
4.2 Variation of Period with Duration of Quantum Potential (v = 0.2, V0 = 1): Best Fit
Acknowledgements
I am grateful to Professor J.J. Hopfield for teaching me about neural
networks. I am indebted to Professor W. Bialek and Dr. I. Nemenman for
their help in learning about entropy and priors. I thank Professor H. Rabitz
and Dr. Ignacio Sola for many discussions that made me aware of the role of
quantum control in small systems. Lastly, I acknowledge with gratitude the
support of Professor C.G. Callan in finishing this work.
Chapter 1
Prolegomena
This work deals with interacting complex systems where information and entropy play key or important roles. Possibly the most complex system known
to us is the human brain. We now interpret the activities of the brain in terms
of neural networks, i.e. the firings of nodes (depolarization-repolarization cycles of nerve cells) that are connected in a fashion evoking different patterns
of oscillations in different regions, or sub-nets interlinked in a complex way
that keeps changing. Since the strength of sensation, or of reaction, is dependent on the frequency of the pulses, periodicity of oscillatory behavior
and its variation or absence are important attributes of an organic neural
network. It has been known for a long time that aggregate periodicity of the
neurons in the central nervous system (CNS) in various forms [1], such as
α, β, γ and δ waves can indicate the overall state of alertness of a human being. Artificial neural networks (ANN) may also try to mimic these features,
as in many cases such periodic operation may be more convenient than a simple sequential set
of one-time transformations. Short-term memory, in particular, is refreshed
periodically both in biological neural networks and in ANN. In current neuroscience research the spectral analysis of signals in correlated circuits is an
important theoretical tool in understanding the extent and circumstances of
the correlations [2, 3]. In the biological systems many processes have a rhythmic character, such as heartbeats, respiration, diurnal sleep/wake patterns,
contraction of muscle filaments, which may be assigned to periodic behaviors
of the controlling centers in the neural system [4, 5, 26, 6, 7, 8]. Despite
having its own oscillatory neuro-muscular circuits, the heart receives pulses
periodically [10, 7] according to the need. The diaphragm also contracts and
relaxes as the oxygen need felt by the CNS sends signals of the required periodicity [10, 9]. The actin-myosin protein molecules slide over each other in
the muscles at a rate determined [26] by motor nerve signals from the neural
system to suit the rate of contraction, which produces the required pulling
force. Short-term memory is not due to changes in firmware in the brain, but
due to periodic refreshing [6] as in the dynamic Random Access Memory (RAM)
of a computer. The enjoyment of music or the meaning of the spoken word
depends on periodic repetitions [8] of the sound patterns inside the CNS.
Periodic behavior in sections of a biological neural network is an essential
feature of intelligent life, and is a subject of interest in its own right. It may
also be a model for types of artificial devices to carry out similar tasks.
Many years ago Hopfield and Herz [11, 12] constructed an apparently
very simple model of a neural network, where neurons fire [14, 15] when
they receive a specified quantity of total charge from their nearest neighbors,
just as biological neurons also fire when they receive the requisite quanta of
chemical neurotransmitters across the synapses. This integrate-and-fire model was
an over-simplified version of the biological system, as it was bidirectional
and the connectivity weights were kept uniform for simplicity, in addition to
restricting the active contributors to nearest neighbors geometrically. However, the most interesting property of this simplified ANN was a uniform
periodicity of the firing of the nodes (here the discharge of capacitors) and
the attainment of phase-locking throughout the network asymptotically, creating correlations in phase.
Hence, despite its simple nature and departures from the biological systems, it is intrinsically interesting for studying the properties of periodicity
and correlations from the most basic assumptions. One can then build upon
this model to obtain greater complexities, more realistic biologically, or more
useful in ANN research. Among other idealizations of the original Hopfield-Herz (H-H) model was an action potential of zero width, i.e. a δ(t − t₀)
time dependence, which would be unrealistic both biologically, as well as in
ANN, as signals need a finite time to form to full strength and then decay. In
this dissertation, we have, therefore, carefully examined the effect of a finite
width of various shapes on the results obtained in [12], i.e. whether we get
periodicity and phase-locking, and if they are dependent on the width of the
action potential or its ANN analog.
To have sufficient device density, so that a great amount of information
can be retained or processed in a compact low power assembly, it appears
that ANN must eventually resort to a scale where the laws of quantum mechanics take over [16]. This may take place in stages, with the smallest
subunits first needing quantum treatment, with a quantum redefinition of
the classical “bit”, then possibly the application of quantum gates, some of
which already exist in reality, and finally in a completely coherent (in the
quantum sense) machine performing algorithms not permitted in the classical context. We have, in this dissertation, compared the prototypical H-H
model with this sequence of quasiclassical, quantum-gated and completely
quantum-entangled networks, mostly with respect to the periodic behaviors
and correlations. Chapters 2 through 6 deal with these results.
Though Kolmogorov complexity [13] (alternatively entropy) enters the
discussion of most nontrivial systems, including even the simple H-H network, with or without a finite width action potential as we shall show, at
finite temperature the presence of random collisions or interactions among
units in different states makes entropy a vital quantity in the determination
of the probabilistic distribution of states among the units, which is related
to the presence or lack of information due to the uncertainty associated with any
probabilistic arrangement. On the other hand, if the interactions are not
entirely random, but have a deterministic bias, then the measurement of
uncertainty needs to be modified. This was realized by the proponents of
alternative forms of entropy such as Renyi [17] or Tsallis [18]. The relation
between entropy and information was clarified by Shannon’s coding theorem
[16]. This involves the “phase space” volume of the bits transferred in a random stream following a probability distribution. However, with interaction,
we might anticipate a deformation of this volume. Using this approach, we
arrive at a new form of entropy where the bits occupy cells with a dimensionally modified volume. In this dissertation we present work where the most
likely probability distribution (for different energy states) corresponding to
such a deformed form of entropy is used, and show that an analytically exact
form is obtainable, making it an attractive expression to work with. However,
it is necessary to check if such a modification can lead to perceptible differences in the related statistical mechanics of the system for which it is used.
We shall present such comparisons in Chapter 7, where we shall also include
other forms of generalized entropies to note the similarities and differences
and indicate where our form may be more appropriate than others.
As our interest is in small information processing systems, or the information content of small systems, with quantum effects expected to be prominent, we
later present our work where we indicate how from a pure quantum state
involving the environment we can obtain the classical measures of stochasticity, i.e. the impurity factor in the quantum density matrix. We do this
for our form of the entropy and also other generalized forms, and Shannon
entropy. Chapter 8 deals with this aspect of our work. Mutual information,
quantum entanglement and stochasticity all become related in this formalism with different quantitative implications for different choices of the form
of entropy.
However, ultimately probability distributions can only be formed from
actual data. Some prior knowledge, possibly theoretical constraints or expectations, can help in the formulation of more accurate posterior distributions,
where the number of possible states is high and the data sparse. However,
it was found by us [19] that such a direct approach is often fraught with the
danger of insensitivity. We show in Chapter 9 how our form of entropy, by
introducing one more parameter, and asymmetric weighting that modifies
the conventional concept of a symmetric distribution for a maximal entropy,
may offer a useful path.
Chapter 2
Integrate-and-fire Networks
with Finite-width Action
Potentials
2.1
Introduction
Models of neural networks are important for understanding biological information processing and storage systems, and also because their relation to many
physical systems sheds new light on those nonbiological complexes. It is well-known that associative memory networks [11, 12] share many common features with spin glasses [20, 21]. Artificial neural networks may one day pave
the way for creating computers that are more akin to organic pattern recognition systems, and hence more efficient in tasks at present not addressable
by mechanistic serial processing.
Associative memory networks [22, 23, 24, 25, 26, 27, 28] are usually conceived as static patterns holding memories of inputs, but in recent years
there is growing interest in systems that hold memories of inputs in a dynamic fashion too, in the form of phase-locked oscillations. Several different
types of models have been presented [26, 29] with different assumptions about
their modus operandi. The differences may be in the ways the neurons are
triggered, in the ways they convey their information to other neurons, in the
fashion they are interconnected, and in the conjectures about the role of the
environment.
Among these models are included a set of relatively simple networks of
integrate-and-fire neurons conceived by Hopfield and Herz [12], whose work
shows that many interesting nontrivial predictions regarding the dynamic
behavior of such networks could be made from their simple properties. However, in their work, to maintain simplicity, they assumed that the action
potential is in the form of a delta function in time, i.e. it is instantaneous.
But in some of the models they present, when a neuron fires on attaining
threshold from part of the action potential (AP) from a neighbor, the remaining excess charge from the neighbor may remain stored and contributes
to the next firing of the receiver, which in a sense mimics the effects of a
finite width, without involving quantitative complications that an explicit
finite-width model would entail. Experiments confirm that the shapes of
the action potentials are indeed of finite width and vary in different regions
[30, 31, 32].
A more realistic approach, however, cannot avoid dealing with all the
consequences of the finite width of the AP, which is a biological fact and must
also be an inevitable feature of most physical realizations. A priori, these
may be in the form of changed periods, different convergence behavior to
limit cycles or modified rates of convergence, different paths to globalization,
or its absence altogether etc. Specialized cases have been studied including
time delays [33] or specifically shaped action potentials [34, 35, 36].
The width introduces its own time scale that can be expected to interact
with the time scale of the zero-width period obtained in the H-H models,
which in turn depends on the strength of coupling among the neurons and
the environmental current. It also introduces the possibility of a more complicated distribution of the action potential current to the receiving neuron,
affecting its attainment of threshold, i.e. its period and properties related to
convergence to limit cycles.
We have, therefore, investigated the consequences of introducing finite
widths into the integrate-and-fire models. Where possible we have tried to
use analytical methods. Where that turns out to be impossible, we have used
simulation. In the next section we first describe different types of integrate-and-fire models to establish the concepts and notation used in this work. In
section 2.3 we show that the convergence to periodic behavior for Class C
models can be analytically proved even in the finite width case. In section 2.4
we derive the rate of convergence to the limit cycle. In section 2.5 we present
our simulations with class C models using APs of various finite widths and
different shapes. In section 2.6 we do the same with the leaking model A.
Finally, in section 2.7 we present our conclusions.
2.2
Integrate-and-fire models
Integrate-and-fire neurons are usually considered in four different scenarios
[12]. In each case every neuron in the network receives some charge from other
neurons in the neighborhood, and also from an environmental current source.
This raises the electrostatic potential of the neuron. When the neuron has
accumulated sufficient charge and its potential exceeds a threshold, it fires,
i.e. it gives forth an AP to the other neurons, which too may be triggered in
due course. If all neurons attain the same periodicity, the phase difference
among them must become constant, i.e. the system becomes phase-locked. It
is also possible for some of the neurons to fire simultaneously (synchronicity)
if they reach the limit cycle together.
Let u_i(t) be the potential of the i-th neuron at time t, I_i(t) the external
current input into this neuron, R its leaking resistance and T_ij f_j(t) the action
current getting into neuron i from neuron j. We can then write the current
conservation equation as:

$$ C\,du_i/dt = -u_i(t)/R + \sum_j T_{ij} f_j(t) + I_i(t) \qquad (2.1) $$
For simplicity with no loss of generality we take the threshold for firing
to be u = 1, by renormalizing u to the dimensionless quantity u/u_threshold.
We can also divide the equation by C and absorb it in R and I, in which
case R actually represents CR, the decay time constant, and I represents
I/(C u_threshold), which has the dimension of inverse time. In other words, in
our notation R and 1/I will set two different time scales. In most of our
work we shall assign the value 1 to I to represent the scale of an external
clock against which all internal dynamics is timed.
In the H-H work the action pulse shape f (t) is a Dirac delta function, i.e.
it is a zero-width normalized pulse:
$$ f_j(t) = \delta(t - t_j) \qquad (2.2) $$

where t_j is the time of the last firing of neuron j.
The four models differ in the treatment of the leaking as represented by
R and in the way they handle the after-fire potential ui .
In Model A, R is finite and may be taken to be equal to 1. This establishes
the time scale CR. In this model if neuron i receives a synaptic charge in
excess of that required to take the potential to the threshold, then, after
firing, ui (t) resets to a value corresponding to the excess charge (we omit the
implied symbol for summation over neighbors j for clarity):
$$ u_i(t^+) = [u_i(t^-) + \Delta T_{ij} - 1] + (1-\Delta)T_{ij} = (1-\Delta)T_{ij} \qquad (2.3) $$

where

$$ \Delta T_{ij} = 1 - u_i(t^-) \qquad (2.4) $$
In Model B, the potential resets to zero after firing, irrespective of the
oncoming synaptic current and the previous state of the neuron. It too has
a leaking R = 1. Model C is the version of A that does not leak, i.e. R = ∞,
and Model D is the non-leaking version of B.
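As a concrete illustration of these four reset and leak rules, the following is a minimal Python sketch of one update step implied by Eqn. 2.1 (the function name, the Euler discretization and the synaptic-input argument syn are our own assumptions for illustration, not code from the original study):

```python
import numpy as np

def step_neurons(u, syn, I, dt, R=np.inf, reset="excess"):
    """One Euler step of Eqn. 2.1 for a vector of potentials u.

    R = np.inf gives the nonleaking Models C/D; finite R gives Models A/B.
    reset = "excess" keeps charge above threshold (Models A/C);
    reset = "zero" discards it (Models B/D). Threshold is u = 1.
    """
    u = u + dt * (-u / R + syn + I)   # leak + synaptic + external current
    fired = u >= 1.0                  # neurons reaching threshold fire
    if reset == "excess":
        u[fired] -= 1.0               # retain the excess charge (A/C)
    else:
        u[fired] = 0.0                # discard the excess charge (B/D)
    return u, fired
```

Here syn stands for the accumulated synaptic term from the currently firing neighbors, i.e. the sum over j of T_ij f_j(t) in Eqn. 2.1, and the "excess" reset implements the bookkeeping of Eqns. 2.3-2.4.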
Integrating the model C potential since the last firing at t₀ gives, assuming for simplicity a constant external current I:

$$ u_i(t) = u_i(t_0^+) + \sum_{j'} T_{ij'} + I(t - t_0) \qquad (2.5) $$

where the summation is over only those other neurons j′ that have fired
since t₀.
It is convenient to use a simple topology of the network by assuming only
nearest neighbor interactions. Then, after phase-locking has been established
with period P , we must have:
$$ u(t_0 + P) = 1 = \sum_j T_{ij} + IP \qquad (2.6) $$

So

$$ P = (1 - A)/I \qquad (2.7) $$

where for notational convenience, assuming constant T_ij, we get:

$$ A = \sum_j T_{ij} = Z\alpha \qquad (2.8) $$
Z being the co-ordination number of the neural lattice, i.e. the number
of nearest neighbors. In Model A it is not possible to integrate Eqn. 2.1,
because the integral depends on the specific times the contributions from the
different neighbors are received. This uncertainty results from the leakage,
which is proportional to the instantaneous potential.
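As a quick worked example of Eqns. 2.7 and 2.8: on a square lattice the co-ordination number is Z = 4, so T_ij = 0.24 gives A = 4 × 0.24 = 0.96, and with I = 1 the phase-locked period is P = (1 − 0.96)/1 = 0.04, which is exactly the period observed in the simulations of Section 2.5.1.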
2.3
Convergence to periodicity in the finite
width case
When the action potential occupies a finite width we have to use the pulse
form factor f with
$$ \int_{t_0}^{t_0+\Delta t} f(t)\,dt = 1 \qquad (2.9) $$

Here t₀ is the time of the beginning of a firing and ∆t is the width of the
action potential and the consequent synaptic current.
As in the case of zero width action potentials, we can show that in the
finite width case too the system moves to a limit cycle with phase locking
among the neurons. We shall do it for the nonleaking model C. We present
the proof in three parts.
I: The time between the beginning of two successive firings of the same
neuron cannot be less than P = (1 − A)/I, where the variables are as defined
in Eqn. 2.7 and Eqn. 2.8.
Proof: Let i be the neuron that has the smallest interval between two
firings (there may be other neurons with the same least interval). Let the
two firings begin at times t1 and t2 . Let us consider any neighbor j of i.
Since i has the least interval between successive firings, j cannot begin to
fire twice within this interval. Hence, i can receive at most fractional charges
from two firings of j within this interval, of which the beginning of the second
firing t′₂ can be within the interval, but not the earlier one t′₁. So

$$ t_2 - t_1 \le t'_2 - t'_1 \qquad (2.10) $$

i.e.

$$ t_2 - t'_2 \le t_1 - t'_1 \qquad (2.11) $$
The total contribution received from j by i within this interval is:
$$ \alpha_j = \int_0^{s_2 \Delta t} f(t)\,dt + \int_{s_1 \Delta t}^{\Delta t} f(t)\,dt \qquad (2.12) $$

where

$$ s_{1,2} = (t_{1,2} - t'_{1,2})/\Delta t \qquad (2.13) $$

As s₂ ≤ s₁, we see that

$$ \alpha_j \le \alpha \qquad (2.14) $$

ensuring that the period for i satisfies P_i ≥ P.
II: The system converges to a limit cycle for the finite width case:
Proof: The Lyapunov function E is defined by [12] E = −∑_i u_i. Then its
change after an interval P given by Eqn. 2.7 is

$$ E(t+P) - E(t) = \sum_i [-u_i(t+P)] - \sum_i [-u_i(t)] $$
$$ = \sum_i \Big(-IP - \sum_j T_{ij} \int_t^{t+P} f_j(t)\,dt\Big) + \sum_i \int_t^{t+P} f_i(t)\,dt $$
$$ = -(1-A)N + (1-A)\sum_i \int_t^{t+P} f_i(t)\,dt $$
$$ = -(1-A)\Big[N - \sum_i \int_t^{t+P} f_i(t)\,dt\Big] \qquad (2.15) $$

As no neuron can begin to fire twice in an interval of less than P, E(t + P) − E(t) is nonpositive, as in the case of zero width AP. Hence, like the zero width case, we have a convergence to a limit cycle.
III: If there is a periodicity, the period must be P = (1 − A)/I.
Proof: Firstly, by part I, it cannot be less than P, because no neuron can ever begin
to fire again within an interval less than P. Indeed, if we now assume that a
periodicity T has been established, i.e. all neurons have become phase locked
with equal phase difference in all periods, then, if t_i is the last time neuron i
began to fire:

$$ u_i(t_i + T) = 1 = \sum_j T_{ij} \int_t^{t+T} f_j(t)\,dt + IT = A + [(1-A)/P]\,T. $$

The last equality follows from the fact that if i receives a fraction of a
pulse from a neighbor j at the beginning of the cycle, it will receive the rest
at the last part of its cycle because of phase lock, i.e. $\int_t^{t+P} f_j(t)\,dt = 1$ for
all j. So T = P.
2.4
Rate of convergence to limit cycle
An interesting observation can be made for the speed of convergence to the
limit cycle with period P . Let us consider, for simplicity, finite pulses of
square shape. Then the contribution from a neighbor j to a neuron i will
be proportional to the fractional duration of the pulse received by i within
its (variable period) cycle. Again, for simplicity let us consider only one
neighbor. Then, if Pn is the period of the n-th cycle, and fn is the duration
of the pulse received by i, and w is the width of the pulse, then, with P given
by Eqn. 2.7,
$$ P_{n+1} = [1 - (A/w)(w - f_n) - A f_{n+1}/w]/I = P + \frac{A}{wI}\,[f_n - f_{n+1}] \qquad (2.16) $$
Now, writing
$$ \Delta P_n = P_n - P \qquad (2.17) $$

we get

$$ \Delta P_{n+1}/\Delta P_n = A/[Iw + A] \qquad (2.18) $$
where we have made use of the relation between the difference of successive pulse fractions f_n and the difference between successive periods P_n,
obvious for square pulses.
We can see that convergence to the limit cycle is a geometric sequence
and the rate of convergence depends on the A/(Iw +A) ratio, which of course
is dimensionless in our choice of notation. For fixed w a higher current would
bring in faster convergence. It may seem that for the zero width case we have
no convergence at all. However, in this singular case we cannot split the pulse
into two parts as in Eqn. 2.16, so that we must have fn = 1, and ∆Pn = 0
for all n, i.e. the system goes into phase lock after the completion of the first
set of firings. As we pointed out in the beginning, this relation is derived
assuming a single neighbor. If Z > 1, then in general the different neighbors
can contribute with different phase differences, which may keep changing
until phase lock. Eqn. 2.18 will then need to be modified to account for
these differences. If r of the Z neighbors contribute fully within a single
period, then we have:
$$ \Delta P_{n+1}/\Delta P_n = A'/(Iw + A') \qquad (2.19) $$

where we have used the symbol

$$ A' = (1 - r/Z)A \qquad (2.20) $$
Therefore, when all the neighbors contribute their full action potential
within a single cycle, the period becomes P . However, the inverse is not
true, i.e. it is possible to distribute the charge from a single AP to succeeding
cycles of the neighbor and the whole system can still be in total phase-lock
with period P , because the R.H.S. of Eqn. 2.19 becomes indeterminate. In
the case of pulses of arbitrary shapes it is easy to see that we get:
$$ \Delta P_{n+1} = \sum_j (\alpha_j/I) + \int_{P_n}^{P_{n+1}} f(t)\,dt \qquad (2.21) $$
where the sum runs over only those neighbors that do not yet deliver their
full synaptic current in a single cycle. This is an integral equation depending
on the shape function f (t). Despite the arbitrariness of f (t) we can see that
if ∆Pn+1 → ∆Pn , then both must approach zero, i.e. the sequence must
be convergent to the limit cycle. If ∆Pn+1 /∆Pn → r < 1, then too the
convergence is obvious. If as a first approximation we replace the arbitrary
pulse by a square one of the same height but a width giving unit area, then
we would expect a geometric convergence similar to Eqns. 2.18-2.20. The
details of the pulse shape may produce smaller perturbations on the rate of
convergence without affecting the general pattern.
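To make the geometric convergence of Eqn. 2.18 concrete, a minimal numerical sketch follows (the initial deviation from P is an arbitrary assumption made only for illustration):

```python
# Contraction of the period deviation per cycle, Eqn. 2.18
# (single-neighbor approximation, square pulse)
A, I, w = 0.96, 1.0, 0.01    # coupling sum, external current, pulse width
ratio = A / (I * w + A)      # dimensionless contraction ratio
dP = 1e-3                    # arbitrary initial deviation from P
for n in range(5):
    print(n, dP)
    dP *= ratio              # deviation shrinks geometrically each cycle
```

With A = 0.96 and w = 0.01 the ratio is about 0.99, consistent with the slow approach to the limit cycle seen for this case in Section 2.5.1, whereas A = 0.24 with w = 0.2 gives a ratio of about 0.55, i.e. nearly immediate convergence, as observed in Section 2.5.2.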
2.5
Nonleaking networks with finite width action potential
We simulate a square lattice with periodic boundary conditions to reduce
finite size effects. It has been noted that a 40 × 40 lattice is sufficient to
demonstrate all the important characteristics. At each step every neuron
is updated according to the charge it receives from its neighbors and
the external current during the time loop, and then, if its potential reaches
threshold, it fires. The effect of the finite width of the action potential can
be two-fold: (1) reduction of u over a number of time loops corresponding
to the width of the AP after a neuron begins to fire, and (2) the arrival of
the synaptic current Tij f (t) over a number of time loops according to the
shape of the packet f (t). We do not expect any role for ectopic (i.e. out
of place) pulses resulting from recharging to threshold before the previous
firing is completed, though the synaptic current arriving at a neuron while
it discharges will be retained for the next firing. Hence, in the nonleaking
models it is unnecessary to model the actual shape of the fall of u. The results
will be indistinguishable from those of a zero-width model in such models,
though the situation in leaking models will be quite different because the
leakage will depend on the falling potential. The distribution of the synaptic
current over multiple time loops, however, will carry the nontrivial differential
characteristics of finite width action potentials in all models. In this section
we consider only a nonleaking model. We have in our algorithm provision for
generating pulses of various shapes in the form of discretized f (ti ) over the
time loops during which the action pulse works. Arbitrary shapes can also
be given as input. However, we have only done the simulations with square
and isosceles triangular shapes for simplicity.
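For orientation, a minimal sketch of such a simulation loop for the nonleaking case with a square pulse on a periodic 40 × 40 lattice might look as follows (all parameter values, array names and the bookkeeping of pulse ages are our own assumptions, not the actual code behind the figures):

```python
import numpy as np

L, Tij, I, w, dt = 40, 0.24, 1.0, 0.01, 1e-4   # lattice side, coupling, current, width, step
u = np.random.rand(L, L)          # random initial potentials
age = np.full((L, L), np.inf)     # time since each neuron began its current firing

for loop in range(200000):
    firing = age < w              # neurons still emitting their square pulse
    # each firing neuron delivers Tij/w per unit time to its 4 nearest neighbors
    syn = (Tij / w) * sum(np.roll(firing, s, axis=ax)
                          for s, ax in [(1, 0), (-1, 0), (1, 1), (-1, 1)])
    u += dt * (syn + I)           # nonleaking Model C: no -u/R term
    age += dt
    new = (u >= 1.0) & ~firing    # threshold crossings begin a new pulse
    u[new] -= 1.0                 # keep the excess charge, as in Model C
    age[new] = 0.0                # open the pulse window for the new firing
```

Counting the firings recorded by such a loop in small time bins then yields plots of the kind shown in Figs. 2.1-2.7.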
2.5.1
Effect of pulse shape
In Fig. 2.1 and Fig. 2.2 we present our simulations with A = 0.96 (i.e.
Tij = 0.24), I = 1 and width = 0.01 (which is 1/4 of P ) using first a square
pulse (Figure 2.1), and then a triangular one (Figure 2.2). We note that
indeed in both cases the period 0.04 is equal to P [= (1 − A)/I], as proved
earlier. However, there is a slight difference between them in their rate of
convergence to the limit cycle; the square pulses almost converge after t = 4,
whereas for the triangular ones a similar level of convergence happens after
t = 6. This can be expected on account of the simplicity of the square
pulse allowing it to adapt itself more easily for phase-locking. Though only a
few complete periods are shown in the figures, convergence can be deduced,
despite the complexity of the pattern in each period. When successive periods
show the same complex pattern, since any time development depends on the
previous step only, it is bound to develop in the same way later too.
[Plot omitted: number of firings vs. time.]
Figure 2.1: Square pulse with width w = 0.01, A = 0.96, I = 1.
[Plot omitted: number of firings vs. time.]
Figure 2.2: Same as Fig. 2.1, but with a triangular pulse.
2.5.2
Convergence rate and the value of A
In Figure 2.3 and Figure 2.4 we show the effect of changing the value A,
keeping the same ratio of the width with the corresponding period P . Figure
2.3 is for a square pulse and Figure 2.4 is for a triangular one. Of course,
varying A changes the period. But the rate of convergence depends on the relative magnitudes of A, I and w,
according to Eqns. 2.18-2.21. A higher contribution from the neighboring neurons than from the constant external driver seems to delay convergence,
which may be expected on account of the more intricate interactions in phase
adaptation of the neighboring neurons than in their susceptibility for acting
cohesively with a common external driver. For A = 0.24, i.e. α = 0.06, we
get practically immediate convergence after all the neurons have fired for the
first time. This feature is common for both shapes of pulses.
[Plot omitted: number of firings vs. time.]
Figure 2.3: Square pulse, with A = 0.24, I = 1, w = 0.2.
[Plot omitted: number of firings vs. time.]
Figure 2.4: Same as Fig. 2.3, but with a triangular A.P. pulse.
2.5.3
Effect of region of initial excitation
The network can be made to receive an initial excitation at the periphery, or
throughout the network. In Figure 2.5 we repeat the experiment of Figure
2.2, with the same parameters, but with only the peripheral neurons initially
excited. We see that the general pattern remains the same, and cannot be
distinguished from the whole net excitation of Figure 2.2. However, in Figure
2.6 we have done the same with the smaller A(= 0.24) case, first studied with
whole net excitation in Figure 2.4. There is remarkable synchronicity in this
case, the great majority of the neurons firing together periodically. We can
understand what is happening here if we picture the initial development of
the system. All the inner neurons are initially uncharged and they do not
receive pulses from neighbors until their first firing at t = 1, which is caused
simultaneously by the common driver I. Henceforth, all these inner neurons
remain phase locked. By contrast for A = 0.96, the external current has little
role in charging up the neurons, and hence the neurons get phase-locked only
after their mutual interactions, making the situation similar to the whole net
excitation.
[Plot omitted: number of firings vs. time.]
Figure 2.5: As Fig. 2.2, but with peripheral initial excitation.
[Plot omitted: number of firings vs. time.]
Figure 2.6: As Fig. 2.4, but with peripheral initial excitation.
2.5.4
Synchronicity, bin-size and dynamic entropy
While the synchronicity just described, induced by the common external
agent, may indeed be genuine, the other synchronicities may be more suspect.
Since in simulation experiments the time loops must occur after a small but
finite duration, the procedure puts many events of different exact times in
the same bin. This is revealed if we change the bin size in the experiment.
Neurons that appear in the same bin seem to split up when the resolution is
increased. In Figure 2.7 we show the results of lumping 10 loops of Figure 2.2
into one loop. Apart from the proportionately higher bin counts, convergence
also appears to arrive faster, which of course is only an artifact.
[Plot omitted: number of firings vs. time.]
Figure 2.7: As Fig. 2.2, but with a time step 10 times greater.
Because of the built-in time scales related to charging rate, affinities with
nearest neighbors, and the width of the action pulses, it is not expected that
such neural networks would show self-similarity, i.e. a fractal structure. It
is obvious that if the time bin size is made small enough, the amplitude, i.e.
the number of firings per bin, would normally be reduced to a sequence
of units and zeros, though, because of the discrete nature of the updates
in a computer simulation, there may be occasional coincidences.
In a time series the Kolmogorov-Sinai (K-S) entropy [13] is often used as a
characteristic measure of the randomness of the series. Let the probability
of the system being in states {i_k} at times kT, k = 0, 1, ..., n in a time
sequence be p_{i_0,i_1,...,i_n}. Then we can define

$$ K_n = -\sum_{i_0,i_1,\ldots,i_n} p_{i_0,i_1,\ldots,i_n} \ln(p_{i_0,i_1,\ldots,i_n}) \qquad (2.22) $$

Now, K_{n+1} − K_n is a measure of the information needed to predict which
state the system will be in at time (n + 1)T, after having reached one of the
possible states at time nT. The Kolmogorov-Sinai entropy K is then defined
by the limit

$$ K = \lim_{n\to\infty,\,T\to 0} (K_{n+1} - K_n) \qquad (2.23) $$
Hence, a completely random sequence has the highest K-S entropy, as the
next step can be any state with equal probability; a periodic system with some
noise will have less; and for a completely periodic, i.e. phase-locked,
system the entropy is zero, as only a particular repetitive sequence
of states i_1, ..., i_n will occur after phase locking, with K_{n+1} = K_n, and the next
step will be completely predictable. In that case the entire trajectory with
the length of the periodic subsequence would stand for a single “dynamical
state”, which is fixed, with probability 1, leading to zero entropy.
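As an illustration of Eqns. 2.22-2.23, the block entropies K_n can be estimated from a recorded firing sequence with a naive plug-in estimate; the binning into a symbol sequence and the toy periodic record below are our own illustrative choices:

```python
from collections import Counter
import math

def block_entropy(symbols, n):
    """Plug-in estimate of K_n (Eqn. 2.22) from a symbol sequence."""
    blocks = [tuple(symbols[k:k + n]) for k in range(len(symbols) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# For a phase-locked (periodic) record K_{n+1} - K_n approaches zero,
# reflecting the vanishing K-S entropy argued above.
record = [1, 0, 0] * 40           # toy periodic firing record
print(block_entropy(record, 4) - block_entropy(record, 3))
```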
2.6
Leaking networks with finite action potential
In Figure 2.8 we have shown the results of simulating a neural network similar
to that of Figure 2.2, but with a finite value of the leaking resistance R.
Compared to Figure 2.2 we see a higher time period T = 0.047, though in
this case the convergence seems to have taken place much earlier at t = 1.7.
For R = 2, we get T = 0.069, and the limit cycle seems to appear only
around t = 3.5. The smaller resistance of course dampens the charging more
and this results in the higher time period. It also effectively reduces the
effect of the charging external current I and, as we have commented earlier,
this weakening of a common driver increases the role of the complex phase
adjustments of the neighbors, which in turn delays the convergence to the
limit cycle. If we integrate Eqn. 2.1, assuming like H-H that all the synaptic
current from the neighbors arrives at the beginning of the cycle, and that we
have narrow widths as a first approximation, but keeping R as a free time
scale, we get

$$ T = R \ln[(IR - A)/(IR - 1)] \qquad (2.24) $$

[Plot omitted: number of firings vs. time.]
Figure 2.8: As Fig. 2.2, but with leaking neurons with R = 10.
Again, this is derived assuming that each neuron gets the full synaptic
current from the neighbors at the very beginning of the cycle and a narrow
width as in H-H. We notice that though IR is a dimensionless quantity here,
the period T is controlled by the scale of R if IR is constant, i.e., if we vary
both I and R keeping IR constant, then
$$ T_1/T_2 = R_1/R_2 \qquad (2.25) $$
So, for I = 1, which we have used throughout as an external time scale reference, making R = 1 should give a singularity. This physically corresponds
to a situation where the leakage drains away all the charge it accumulates
from the environment and, hence, the neuron can never charge up to saturation to get the requisite potential to fire, as the contribution from the
neighbors is insufficient. Indeed when we tried to simulate a network with
I = 1 and R = 1, not a single neuron fired. However, if we change R to a
slightly higher value, e.g. R = 1.0001, other parameters remaining identical,
we see a very interesting phenomenon. The neurons now get sufficient accumulated charge from their initial random charges, the external current and
the synaptic current from neighbors to defeat the damping of the potential
due to leakage and reach firing threshold. However, as the damping is severe
near the critical value R=1, the differentiation of the initial phases becomes
relatively unimportant and many neurons move towards synchronicity. In
Figure 2.9 we see the first bunch of firings, with some dispersion, whereas in
Figure 2.10, which gives a more extended picture, the synchronicity appears
to become more established in successive firings. This is in contrast to the
case of weaker damping (Figure 2.8) described above, where periodicity establishes phase-locking, but not synchronicity. Eqn. 2.24 seems to be more
accurate for large R with small T than for small R near criticality (Table 2.1).
[Plot omitted: number of firings vs. time.]
Figure 2.9: First firing of a net of leaking neurons, with a resistance just above the critical value of R = 1.
[Plot omitted: number of firings vs. time.]
Figure 2.10: As above, showing successive peaks with growing synchronicity. The first firings shown in detail in the previous figure are the small clump at around t = 2.8.
Table 2.1: Time period for leaking networks

R                    1.2     1.5     2.0     5.0     10.0
T_formula            0.22    0.11    0.078   0.050   0.044
T_sim [w = 0.001]    0.095   0.078   0.068   0.051   0.048
T_sim [w = 0.002]    0.21    0.11    0.077   0.049   0.045

It is understandable that for large periods the discrepancy between Eqn. 2.24
and the results of the simulation increases. The neighboring neurons can then
distribute their synaptic contributions over a larger duration, and any signal
arriving late is attenuated by R less than those arriving at the beginning
of the cycle, which we have assumed. So, for large T , the effective A in
Eqn. 2.24 is higher than the real A, which decreases the period to some extent. We also note that if the width of the synaptic current pulse is changed
there may be a change in the period, because a wider pulse is more liable
to get attenuated by the damping R. The last row of the table presents the
results of simulating with w = 0.002, which gives almost perfect agreement
with Eqn. 2.24. For much smaller values of w, at low values of R, i.e. for
high leakage and damping, the agreement is less satisfactory, because the singularity at R = 1 seems to affect the narrower (smaller w) action potentials
more than the wider ones, which have a chance to spread out.
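The first row of Table 2.1 can be reproduced directly from Eqn. 2.24; a minimal check, using A = 0.96 and I = 1 as in the network of Fig. 2.8, is:

```python
import math

A, I = 0.96, 1.0
for R in [1.2, 1.5, 2.0, 5.0, 10.0]:
    T = R * math.log((I * R - A) / (I * R - 1.0))   # Eqn. 2.24
    print(f"R = {R:4.1f}   T_formula = {T:.3f}")
```

This reproduces, to rounding, the T_formula row of the table and makes the singularity explicit: as R approaches 1 from above with I = 1, the argument of the logarithm diverges and so does the period.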
2.7
Discussion
In this work we find that many of the results of an over-simple zero-width
integrate-and-fire neural network are valid for more realistic finite-width
models. We have proved analytically that even with a finite width, the system converges to a limit cycle with a period identical to the zero-width case.
However, the problem of convergence is more complicated for the case of finite
width AP spikes, and we have shown that probably only asymptotic convergence ensues as a geometric sequence, which depends on the parameters of
the model, i.e. the width, the external current and the synaptic coupling with
the neighbors. The shape of the pulse does not seem to be the determining
factor in controlling convergence, though it might have some perturbative
modifications. The question of synchronicity seems to be complicated by the
inevitable finiteness of the time steps in a simulation and opens up the possibility of investigating any fractal properties associated with such networks.
The radical difference between peripheral and whole net excitation also seems
interesting and worth further investigation. Stochastic and mean field approximations may give analytic results where exact methods may be limited,
as in the case of leaking neurons, though the simple formula Eqn. 2.24 for
the period, derived with drastic assumptions gives remarkably accurate predictions. The transition to sharp synchronicity in almost critically damped
networks may therefore be amenable to more precise analysis.
Chapter 3
Neural Networks with
Quantum Interactions
In the next three chapters we shall describe three different models of neural
networks where, in artificial devices, the classical action potential is replaced
by quantum interactions, and the “neurons” are quantized objects with states
represented by allowed eigenstates of an operator that correspond to a measurable quantity. This quantity may be the state of spin or polarization of
a particle or a microsystem behaving like a particle, and in the simplest instance, may be like a binary bit with two possibilities, but in general may
also be superpositions of eigenstates, until a measurement is made, when it
would collapse to either of the two possibilities.
In this brief chapter, which is a general introduction to the next three
chapters, we describe the differences of these three models from one another
and from the classical model in some detail, so that a comprehensive view of
the models and also our motivation for considering them may be seen.
It is also true that as per Moore’s law the density of electronic devices in a
chip doubles about every 18 months. The limit at present is about 10^8 transistors per chip, and the average size of a component unit is of the order of
100 nm. At the rate of technological refinement given in Moore’s law, within
the next 12 years we shall reach the size of an atom for the storage of a single
bit of information, and then quantum effects will become impossible to avoid
in our considerations of the working of such chips. Hence, it is a fair guess
that in the foreseeable future we have to concern ourselves with the quantum
properties of AI systems anyway, whether we are interested in utilizing the
advantages of quantum algorithms or not. This makes a comparison of different levels of quantum effects and their relation to legacy classical “hardware”
a subject of great interest. While making such comparisons it is necessary to
make sure that other features of the models remain more or less identical, so
that we determine the newly emerging quantal effects of greater coherence
with clear distinction. This limits us necessarily to relatively simple systems.
Our designs of some simple quasi-quantum and quantum artificial intelligence systems produce interesting results, suggesting the promise of practical devices being produced in the future using similar, if not exactly the same, schemes in the contexts we shall mention in each case. Our results also motivate further research, both experimental and theoretical, in these directions. The concepts and conclusions demonstrated in these models advocate future investment in designing more complicated devices along this line, and also support experimental research into the practical design of the parts used in these schemes. Most of these elementary parts, however, are shared by a number of other nano-devices and quantum computers, and hence these designs would act as further incentives for creating practical quantum computer parts such as logic gates.

For each model we produce theoretical formulations and simulations using our simple designs.
3.1
Quantum Interaction between Nodes and Coherence Scale
In the classical biological neural network the action potential proceeds like a soliton wave along the axon of the neuron and across synapses, causing neurotransmitters to transfer to the target cell, which, after accumulating sufficient charge from all such receptors, itself fires when the potential difference across the membrane crosses a threshold. The integrate-and-fire (I&F) model is a simplification of this process, giving the signals bi-directionality, which is not actually present in biological networks, and also connecting only geometrically nearest neighbors in a $\mathbb{Z}^n$ grid. Despite its simplicity, the classical I&F model is an excellent theoretical laboratory for testing many aspects of the dynamics of a neural network, including its periodicity.
When formulating a quantum version for artificial intelligence (AI), we can consider a hierarchy of allied models, all of which may be physically realizable, if not right now then at least later, because all obey the laws of physics. They are not approximations of one another in a hierarchical sequence. They are simply models with different properties, resulting from different degrees of coherence on the net, i.e. the spatial extent of maintained coherence, which is principally a technological problem. Hence, the sequence of models may indeed be related to the classes of AI devices which will arise in time in a hierarchical mechanism of increasing coherence. Nevertheless, they may also co-exist at the same time, as representations of different levels of performance or of different demands.

Our motivation for choosing these models is to compare the temporal development of these neural networks with different levels of coherence, and hence different combinations of classical and quantum properties. As we remarked in the introductory chapter, periodicity is an important characteristic of H-H type networks, and we showed in the previous chapter that it is retained in the classical model even if we introduce finite widths for the action potential, instead of taking them as zero-width idealizations. In the quantum processes we would again expect time scales to arise: one possibly depending on the strength of the coupling, one on the type of input, and one involving the details of the interaction at each node, which may be a quantum analogue of the shape and width of the classical action potential.
Our models represent essentially memory devices with forms of dynamic memory depending on the input, which at least to some extent dictates the mode of oscillation of the system, principally the global periodicity, in a complex, possibly changing pattern.

We shall see that the chief difference among the systems we deal with, compared to the classical H-H I&F model [11], is the scale of quantum coherence among the components.
3.2
Technological Realization of Quantum ‘Neurons’ and ‘Action Potentials’
Two-state physical objects are not difficult to find in quantum mechanics. A spin-1/2 particle in a magnetic field, an exciton with two energy states, or a photon with two polarizations can each have superposed mixed states of the two eigenstates and serve as a "qubit", to be defined later. Progress has been made with a number of systems, theoretically and experimentally:
i) NMR spins [16]. Quantum registers can be built of liquid hardware containing about $10^{18}$ molecules in a strong static magnetic field. A qubit is the spin of a nucleus in a molecule. The interaction between such nodes is facilitated by quantum gates that utilize resonant oscillating magnetic fields, or Rabi pulses, used in nuclear magnetic resonance (NMR) technology. The spin-spin Hamiltonian between neighboring atoms provides the theoretical basis of information exchange. Many quantum algorithms, including Grover's algorithm, the quantum Fourier transform and Shor's algorithm, have been executed experimentally using NMR and molecules containing three to seven qubits. However, it is not possible to go up in scale, on account of the difficulty of chemical synthesis beyond 6 to 7 spins, and because the measured signal drops exponentially with the number of qubits in a molecule. Hence, the idea of quantum computing with NMR has fallen out of favor.
ii) Quantum electrodynamics techniques [37]. Experiments have been performed in which a single atom interacts with a single mode or a few modes of the electromagnetic field inside a cavity, and the two states of a qubit can be represented either by the polarization states of a single photon or by two excited states of an atom. Such cavity QED techniques have allowed the implementation of one- and two-qubit gates and have been used to demonstrate entanglement between quantum states.
iii) Cold ions in a trap [38]. The quantum register is a string of ions confined by a combination of static and oscillating electric fields in a linear trap known as a Paul trap. A qubit is a single ion, with two long-lived states of the ion making up the two states. After a quantum computation is completed, the state of each ion can be measured using quantum-jump detection, in which each ion is bombarded with laser light whose polarization and frequency are adjusted so that it absorbs and then re-emits photons only if it is in the upper state.

The interactions between qubits result from the collective vibrational motion of the trapped string of ions. To implement the two-qubit C-NOT gate to be described later, Cirac and Zoller [40] suggested a scheme where the quantum state of the control qubit is mapped onto the vibrational state of the whole string (known as the bus qubit), and laser beams are focused on that ion. A gate operation can then be performed between the bus qubit and the target ion. The effect of a laser beam on the target qubit depends on the state of the bus qubit, which hence acts as the control. This state is then mapped back onto the control ion. The C-Z gate has been constructed by the Innsbruck group [39], using two $^{40}$Ca$^+$ ions in a linear trap which can be individually addressed using laser beams. A generic single-qubit state is encoded in a superposition of the ground state $S_{1/2}$ and the metastable state $D_{5/2}$, which has a lifetime of about 1 s.

In this type of system the sources of error are heating due to fluctuating electric fields, the ambient magnetic field, and laser frequency noise. The CNOT gate takes about 10 µs to operate, whereas the decoherence time scale is of the order of 1 ms [41].
iv) Superconducting systems [42]. In superconductors, Cooper pairs are confined to boxes of micron size. In a Josephson junction a Cooper-pair box is connected by a tunnel junction to a superconducting reservoir. The pairs enter the island one by one when a control gate electrode, capacitively coupled to the island, is varied. The island has discrete quantum states and, under appropriate experimental conditions, the two lowest energy states form a qubit. The manipulation of one-qubit states is possible: a resonant microwave pulse of duration t induces controlled Rabi oscillations between the states. The decoherence time scale is about 10 s for this circuit [43], which is more than sufficient to implement a single-qubit gate that takes as little as 2 ns. A two-qubit gate was recently operated using a pair of coupled superconducting qubits [44].
3.3
Scales of Coherence
In the second and third of our models we have used quantum gates to create the interaction between qubits. However, in the first model, to be described in the next chapter, we simply have a qubit-like two-state entity being acted on by nearest-neighbor classical sources of potential, similar to the classical sources of current in the H-H I&F model. We are presuming here that no node can see the other nodes as superpositions, but only as decohered classical objects. So in this model the scale of spatial coherence is minimal.

For the next two models we assume that two neighboring qubits interact through a quantum gate. Since each qubit can see its neighbor as a quantum superposition, the scale of coherence has increased here to at least the interqubit distance. But in the second model it stops at the lattice distance (Fig. 3.1), whereas in the third model the coherence extends throughout the lattice, i.e. the entire network. In the last case we can use the entire lattice as a quantum register of entangled qubits, and all the gates must then act jointly to maintain the coherence property, whereas in the second model each node can be updated separately, with its neighboring gates acting in unison.
3.4
Time scales of quantum gates and system evolutions
Even without the decoherence problems mentioned in Section 3.2, which are mostly stochastic, related to interaction with the environment and to noise of various kinds, and which can be reduced in better-performing future devices, a quantum gate has intrinsic time scales set by quantum laws. Toffoli and others [46] have obtained expressions for such time limits. It had earlier been shown [45] that the minimal time needed for the autonomous quantum evolution of a system from one state to another orthogonal to the initial state is given by
\[
\tau = \frac{h}{4E} \tag{3.1}
\]
where $E$ is the average energy $\langle E \rangle$ of the system.

[Figure 3.1: (a) In the classical H-H network each node receives a current k from each nearest neighbor which has fired; (b) in our first model, the quasiclassical network, every node is a qubit-like object but sees its neighbors as decohered classical sources of potential; (c) in our second model, a quantum-gated one, every qubit interacts with its nearest neighbors seen as qubits, but the coherence length is the interqubit distance; (d) in our third model, the completely entangled network, all qubits are in coherence and the gates also maintain the coherence.]
For a single gate operation with a flip between states with energies $E_1$ and $E_2$, the minimal time is given by
\[
\tau = \frac{h}{2|E_2 - E_1|} \tag{3.2}
\]
and for the operation of a flip followed by a phase change of $\theta$ by
\[
\tau = \frac{h}{4E}\left(1 + \frac{2\,(\theta \bmod \pi)}{\pi}\right) \tag{3.3}
\]
Numerically, for an ion trap involving Ca$^+$ ions, using laser light of wavelength 397 nm, the time is of the order of $10^{-16}$ s. This appears much smaller than the present-day stochastic limits, but may become a genuine limit which has to be addressed in fast quantum computing.
The situation is similar to that of the finite width of the action potential. Hence, when we simulate each of the models, we shall use techniques similar to those for the finite-width classical model. The updating of each node will be done in small finite time steps, so that the small continuous effects of the nodes on one another may be taken into account as the system develops in time.
3.4.1
Quasiclassical Net
Here the procedure is fairly straightforward. The primitive qubit is a recyclable two-state entity, and it feels the total potential from nearest-neighbor sources, as well as from a global field, similar to the universal current charging all neurons in the classical H-H model (Fig. 3.1b). Time-dependent perturbation theory is used to find its development in time in terms of a transition amplitude, and from that we get the probabilities of transition to the other state in a nondeterministic quantum manner. In a simulation we examine the development in terms of discretized time spans, with updated values of the transition amplitudes expressed in terms of the excited nearest neighbors for each qubit in turn, in an arbitrary sequence over the lattice not involving any time ordering. Then we get back to the first qubit again, and the process of updating is repeated after the set small time interval.
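A minimal Python sketch of this updating scheme may clarify the loop structure; the parameter names and the specific first-order amplitude rule are our own illustrative choices, not the dissertation's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
L, dt = 40, 1e-3          # lattice side and small time step (illustrative values)
V0, v = 1.0, 0.2          # global field and neighbor potential (illustrative values)

excited = np.zeros((L, L), dtype=bool)   # which qubit-like nodes are excited
amp = np.zeros((L, L))                   # accumulated first-order transition amplitude

def neighbors(i, j):
    """Nearest neighbors on a periodic square lattice."""
    return [((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L)]

for step in range(1000):
    # sweep the nodes in an arbitrary fixed sequence, with no time ordering
    for i in range(L):
        for j in range(L):
            V = V0 + v * sum(excited[n] for n in neighbors(i, j))
            amp[i, j] += V * dt                 # perturbative growth of the amplitude
            if rng.random() < amp[i, j] ** 2:   # stochastic transition with prob |amp|^2
                excited[i, j] = not excited[i, j]
                amp[i, j] = 0.0                 # amplitude restarts after a transition
```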
3.4.2
Quantum-Gated Net
In this case too we do not let the quantum gate act instantaneously, but divide the action of the gate into thin discretized time bins and consider the sequence of snapshots. Each node is considered one at a time, because of the lack of coherence beyond a nearest-neighbor distance. The gates act on the nodes one at a time, but all nearest neighbors of the qubit contribute simultaneously in each time-slab update. In actual operation a full gate action may involve a number of laser pulses, which themselves have a finite time structure, but we shall use a simplified uniform continuum in our model, which we discretize into uniform time slabs for simulation.
For any unitary operator $U$ representing a quantum gate we get the master evolution equation at time $t_i$, the time of updating node $i$:
\[
|i, t_i + \Delta t\rangle \prod_n |i+n, t_i + \Delta t\rangle = U_n(\Delta t)\,|i, t_i\rangle \prod_n |i+n, t_i\rangle \tag{3.4}
\]
where $i \in L \subset \mathbb{Z}^d$ is the vector position label of the qubit being updated in a $d$-dimensional cubic lattice, and $n$ indicates the relative vector position labels of the nearest neighbors of any qubit. The label $i$ is sequenced in time for updating within each cycle. The neighbors actually remain unchanged until it is their turn to update:
\[
|i+n, t_i + \Delta t\rangle = |i+n, t_{i+n}\rangle \tag{3.5}
\]
3.4.3
Completely Entangled Net
In this case we have a single transformation matrix representing the simultaneous updating of all the qubits at $i \in L \subset \mathbb{Z}^d$ at every small time step:
\[
\prod_{i \in L} |i, t + \Delta t\rangle = U_N(\Delta t) \prod_i |i, t\rangle \tag{3.6}
\]
The operator $U_N$ now involves all the qubits in the system (say numbering $N$), and is therefore an $N \times N$-dimensional matrix, but fortunately of a block tridiagonal form when arranged properly in terms of nearest neighbors.
3.5
Difference between our Quantum Models and Existing Models
Though quantum computing has been a subject of great interest for some time, there has been virtually no work involving quantum networks. Zak and Williams [47] have presented the only other model we know of, which involves a completely entangled qubit set of nodes, in their case a 3-node system rather than a lattice, the lattice being a common feature we have retained in all our models, classical and quantum. We highlight the similarities and differences between their model and our third network.

Both the previously presented model and our last model have an operator acting coherently on an entangled quantum system. However, there are some significant differences:
1. Their system has no input dependence. The operator acting on the quantum wave function defines the emerging patterns, depending on the mutual interactions of the nodes only. Different patterns may emerge as the operators keep repeating the interaction. However, the design of the operator alone defines these patterns, in a simple way. In our system, an arbitrary input signal changes the pattern in the system, which stores it in some form. Hence, the former model is more like a quantum EPROM (Erasable Programmable Read-Only Memory), and ours is like a quantum RAM.

2. The former does not consider continuous time evolution of the system. The operator acts discretely and instantaneously, and a new pattern emerges due to that interaction. In our system, the system interacts with the environment and receives an input state. The operator then makes the system evolve with time in a continuous manner, and partial gate operations at each step of updating produce a richer set of patterns, not possible with predefined complete gate operations.

3. Dynamical inputs can be injected into our network as memory.
Chapter 4
Quasiclassical Neural Network Model
4.1
Introduction
The introduction of novel quantum nano-devices and the idea of quantum computers raise the possibility of a new age of artificial intelligence systems based on quantum principles. Quantum factorization algorithms [16] suggest that these new intelligent devices may be able to "think" in a smarter way for certain purposes. The basic concepts of quantum systems, on the other hand, can also be used in macroscopic complex systems like social networks [48] or the financial market [49], where stochastic behaviors and the idea of choice may be observed. Meanwhile, researchers have recently been debating the possibility of quantum interactions in the human brain as well [50, 51, 52, 53]. Although this possibility remains an unresolved debate, some of the key ideas can be used to create artificial brains taking advantage of quantum interactions to show effects unseen in classical systems. The experimental production of quantum bits [16], especially those using semiconductor quantum dots [54, 55], makes it plausible that artificial quantum systems will be producible in the future. In that light, we attempt to produce artificial intelligent systems based on quantum principles that are able to store patterns or exhibit stochastic memory. In the present chapter we discuss a quasiclassical system [56]; in the following chapters we move to quantum-gated systems [57, 58].
4.1.1
Comparison with the Hopfield Model
It was shown that classical neural networks with a finite-width action potential [30, 31, 32], rather than ones with delta-function-type action potentials acting between neurons [12], may be a more realistic model of the actual human brain.

Hopfield and Herz found a simple relation between the contribution A from the neighbors and an external current I, giving the time period of the firing of the network when phase-lock is established:
\[
\tau = (1 - A)/I \tag{4.1}
\]
In the previous chapter we showed [59] that phase locking takes place for networks with an arbitrarily shaped action potential of finite width. The first part of the quantum artificial intelligence design is inspired by the original Hopfield-Herz model, but with the further modification that the neurons "fire" stochastically in a semi-classical way and are placed in a potential field rather than being immersed in a background current. The introduction of uncertainty in the firing introduces fuzziness into the model, which might be an interesting aspect of an artificial machine not yet produced classically.
4.2
Quantum Transitions
We first consider the simplest case of a quantum neuron placed under the influence of a time-dependent interaction. A net potential may arise at a neuron due to neighboring neurons, due to a mean-field interaction averaging the effect of all other neurons, or due to an external interaction.

For a state under the effect of such a potential,
\[
|t\rangle = U(t, t_0)\,|t_0\rangle \tag{4.2}
\]
where
\[
U(t, t_0) = \exp\left[-i \int_{t_0}^{t} dt\, H\right] \tag{4.3}
\]
Here the Hamiltonian $H$ consists of a static part $H_0$ and an interactive part $V(t)$ from all other sources. One of the few exactly soluble cases is that of the sinusoidal interaction in a two-state system with energies $E_1$ and $E_2$:
\[
V(t) = b\, e^{i\omega t}\, |1\rangle\langle 2| + \mathrm{h.c.} \tag{4.4}
\]
This gives Rabi's formula: if the system is initially in state $|1\rangle$, then at time $t$ the probability that it will be in state $|2\rangle$ is
\[
P_2(t) = \frac{b^2/\hbar^2}{b^2/\hbar^2 + (\omega - \omega_{21})^2/4}\, \sin^2\!\left[\left(b^2/\hbar^2 + (\omega - \omega_{21})^2/4\right)^{1/2} t\right] \tag{4.5}
\]
Here $\omega_{21} = (E_2 - E_1)/\hbar$. The probability of finding it in the original state is
\[
P_1(t) = 1 - P_2(t) \tag{4.6}
\]
Hence, sinusoidal oscillation can be observed. It is important to note that these closed, exact expressions are true only in a probabilistic sense, and the oscillations and time periods mentioned here and later are only averages over a large number of measurements of the state. Each of these measurements is individually a completely random event, as postulated by quantum mechanics.
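A few lines of Python suffice to evaluate Eqn. 4.5 and exhibit the oscillation; we set $\hbar = 1$, and the parameter values below are illustrative:

```python
import numpy as np

def rabi_p2(t, b, omega, omega21):
    """Probability of finding the two-state system in |2> at time t (Eqn. 4.5, hbar = 1)."""
    Omega2 = b**2 + (omega - omega21)**2 / 4.0   # squared generalized Rabi frequency
    return (b**2 / Omega2) * np.sin(np.sqrt(Omega2) * t)**2

t = np.linspace(0.0, 20.0, 2001)
p2 = rabi_p2(t, b=0.5, omega=1.1, omega21=1.0)   # slightly off resonance
print(p2.max())   # stays below 1 off resonance; reaches 1 exactly on resonance
```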
Generally, for a more complicated interaction, time-dependent perturbation theory should be used, for which the interaction picture is more convenient; the equation of motion takes the form
\[
i\hbar\, \frac{\partial |t\rangle_I}{\partial t} = V_I(t)\, |t\rangle_I \tag{4.7}
\]
Here the interaction-picture states and operators are related to Schrödinger states and operators by
\[
|t\rangle_I = e^{iH_0 t/\hbar}\, |t\rangle_S \tag{4.8}
\]
and
\[
V_I(t) = e^{iH_0 t/\hbar}\, V_S(t)\, e^{-iH_0 t/\hbar} \tag{4.9}
\]
where $H_0$ is the noninteracting part of the Hamiltonian.

The transition rate per unit time from state $|1\rangle$ to $|2\rangle$, to first order in $V$ and after dropping the $I$ subscript for simplicity, is
\[
\Gamma = \frac{1}{\hbar^2}\, \left| \int_0^t dt'\, e^{i\omega_{21} t'}\, \langle 2 | V(t') | 1 \rangle \right|^2 \Big/\; t. \tag{4.10}
\]
This expression produces Fermi's golden rule when $t \to \infty$.
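When the matrix element has no simple closed form, the amplitude integral in Eqn. 4.10 can be evaluated by straightforward quadrature; a minimal sketch with $\hbar = 1$ and an illustrative damped matrix element of our own choosing:

```python
import numpy as np

def transition_rate(t, omega21, V21, n=2000):
    """Transition rate of Eqn. 4.10 (hbar = 1) by trapezoidal quadrature."""
    tp = np.linspace(0.0, t, n)
    f = np.exp(1j * omega21 * tp) * V21(tp)              # integrand of the amplitude
    amp = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(tp))   # trapezoid rule
    return float(np.abs(amp) ** 2 / t)

# example: an exponentially damped matrix element <2|V(t')|1> = V0 exp(-a t')
print(transition_rate(t=5.0, omega21=0.1, V21=lambda tp: 0.2 * np.exp(-0.5 * tp)))
```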
4.3
Some Analytically Soluble Cases
We consider a network of quantum neurons, taking perturbative approximations [60] into account. We can often get analytic results for certain types of simple, but not unrealistic, interactions, which may be relevant to nanoelectronic devices. The potential, as before, may arise from any of the scenarios mentioned previously.
4.3.1
Constant External V, No Interneuron Interaction
The external potential V is turned on at $t = 0$ and remains constant at all subsequent times. Setting $\hbar = 1$ for simplicity, we get
\[
\Gamma_{21} = 4|V_{21}|^2\, \sin^2(\omega_{21} t/2)/(\omega_{21}^2\, t) \tag{4.11}
\]
Taking the average time before transition to be $\tau = 1/\Gamma_{21} = t$ above, we get a relation for it:
\[
\sin(\omega_{21}\tau/2) = \omega_{21}/(2|V_{21}|) \tag{4.12}
\]
This gives multiple values of $\tau$ for the same value of $V_{21}$. The lowest value, however, will stochastically dominate. If the energy difference between the two states of the neuron is small, i.e. $\omega_{21} \sim 0$, then the time for transition becomes inversely proportional to the interaction strength. This is seen in the case below (Eqn. 4.15).
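Eqn. 4.12 can be inverted branch by branch; a small numeric sketch listing the first few allowed transition times (the parameter values are illustrative):

```python
import numpy as np

def transition_times(omega21, V21, nbranch=3):
    """Real solutions tau of sin(omega21*tau/2) = omega21/(2|V21|) (Eqn. 4.12)."""
    x = omega21 / (2.0 * abs(V21))
    if abs(x) > 1.0:
        return []                      # no real solution if the coupling is too weak
    s = np.arcsin(x)
    taus = []
    for n in range(nbranch):           # the two solution families of sin(theta) = x
        taus += [2 * (s + 2 * np.pi * n) / omega21,
                 2 * (np.pi - s + 2 * np.pi * n) / omega21]
    return sorted(taus)

print(transition_times(omega21=0.2, V21=0.5))   # the lowest value dominates stochastically
```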
4.3.2
Harmonic External V, No Interneuron Interaction
Here we have $V = V_0 e^{-i\omega t} + \mathrm{h.c.}$, and hence the transition rate, after time integration from $t = 0$ to $t$, finally becomes
\[
\Gamma = 2|V_0|^2\, \frac{1 - \cos((\omega - \omega_{21})t)}{(\omega - \omega_{21})^2\, t} \tag{4.13}
\]
This gives, with $t = \tau = 1/\Gamma$,
\[
1 = 4|V_0|^2\, \frac{\sin^2((\omega - \omega_{21})\tau/2)}{(\omega - \omega_{21})^2} \tag{4.14}
\]
Again we see that, for the same interaction, multiple average transition times of the neurons are possible. This relation is the first-order approximation of the Rabi formula above, as expected. We note that we still get a resonance when $\omega = \omega_{21}$, with
\[
\tau = 1/|V_0| \tag{4.15}
\]
Interestingly, this value is independent of $\omega$ and $\omega_{21}$.
4.3.3
Exponentially Damped External V, No Interneuron Interaction
We now consider an external potential with an exponentially decaying form, with time constant $1/a$:
\[
V(t) = V_0 e^{-at} \tag{4.16}
\]
On integration we get
\[
1 = |V_0|^2\, [1 + e^{-2a\tau} - 2e^{-a\tau}\cos(\omega_{21}\tau)]/(\omega_{21}^2 + a^2) \tag{4.17}
\]
For a small energy difference between the two states, we find a simpler expression for $\tau$:
\[
1 = |V_0|^2 (1 - e^{-a\tau})^2/a^2 \tag{4.18}
\]
We again check that for $a \to 0$ we recover Eqn. 4.15.
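Indeed, Eqn. 4.18 can be solved in closed form. Taking the positive root,
\[
1 - e^{-a\tau} = \frac{a}{|V_0|} \quad\Longrightarrow\quad \tau = -\frac{1}{a}\,\ln\!\left(1 - \frac{a}{|V_0|}\right), \qquad a < |V_0|,
\]
which reduces to $\tau = 1/|V_0|$ as $a \to 0$, and which shows that no real transition time exists once the damping rate exceeds the coupling strength.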
4.3.4
Damped Harmonic External Potential, No Interneuron Interaction
In this more general case we have
\[
V(t) = V_0 e^{-i\omega t - at} \tag{4.19}
\]
which gives simply a shift of $\omega_{21}$ to $\omega_{21} - \omega$ in Eqn. 4.17:
\[
1 = |V_0|^2\, [1 + e^{-2a\tau} - 2e^{-a\tau}\cos((\omega_{21} - \omega)\tau)]/((\omega_{21} - \omega)^2 + a^2) \tag{4.20}
\]
A resonance situation is observed when $\omega_{21} = \omega$:
\[
1 = |V_0|^2\, [1 - e^{-a\tau}]^2/a^2 \tag{4.21}
\]
We note that for small $a$ we get $\tau = 1/|V_0|$, as in the case of constant V (Eqn. 4.15), as expected.
4.3.5
No External Potential, Constant Interneuron Interaction
In this case, though the interneuron potential $v$ may be constant, the turn-on times of the neighbors may be randomly different. We shall get only an effective potential, smaller than the full value $v$; in the long run, with averaging, this too will be a constant, say $f v$ with $f < 1$. We then get the formula Eqn. 4.15, with $V_0 \to f v$.
4.3.6
Constant External Potential and Constant Interneuron Interaction
We again get Eqn. 4.15 with $V_0 \to V_0 + f v$, where $f < 1$ if the interneuron interaction has a random time onset.
4.3.7
No External Potential, Damped Harmonic Interneuron Interaction
We assume the potential at any neuron due to its neighbors (four for a square-lattice network) to be given by
\[
v(t) = \sum_i v_0\, e^{i\omega t + i\delta_i - at} \tag{4.22}
\]
This is the most general form; with appropriate limits it can be converted to simple damped interactions, or even constant interactions. After integration and considerable algebra the transition rate is given by
\[
\Gamma = \frac{2|v_0|^2}{t}\, e^{-2at}\, \frac{\left(1 + e^{2at} - 2e^{at}\cos[(\omega - \omega')t]\right)\left(2 + \sum_{i \neq j} \cos[\delta_i - \delta_j]\right)}{a^2 + (\omega - \omega')^2} \tag{4.23}
\]
Here we have used the symbol $\omega' \equiv \omega_{21}$.

Resonance is seen again at $\omega = \omega'$. If there is phase-locking, the phase factors remain constant, and we get the relation for the average transition time from
\[
a^2 = K |v_0|^2\, e^{-2a\tau} (1 - e^{a\tau})^2 \tag{4.24}
\]
Here $K$ is a constant depending on the constant phases. For small values of $a$ we again get a relation like $\tau \sim 1/|v_0|$.
4.4
Quasiclassical Hopfield-Herz type Neural Network
We are now prepared to construct the biologically inspired model.

We consider a square lattice of quantum neurons that can communicate with their neighbors by means of quantum signals. The duration of the interaction is varied, in analogy with the action-potential currents of varying widths in the classical case. We also retain a constant background potential $V_0$, and then add the contributions $\int dt\, f(t_i)\, v$ from the $i$-th neighbor. The integration is spread over the duration of the quantum pulse, which is reset to zero at each triggering.

We keep the energy difference between the two states of the neurons small, i.e. $\omega_{21} \sim 0$. In this case we get
\[
1 = \sqrt{\Gamma\tau} = \frac{1}{\hbar}\left|\int_0^\tau dt\,\Bigl(V_0 + \sum_i f_i v\Bigr)\right| \tag{4.25}
\]
For small perturbations this would look like
\[
1 \approx k'\left(1 - \Bigl|\sum_i \int_0^\tau dt\, f_i v\Bigr|\right) \Big/\, \left|\int_0^\tau dt\, V_0\right| \tag{4.26}
\]
This is analogous to the classical formula [59, 12].

A certain difference between the classical and the quasiclassical model can be observed here. In the classical case it is possible to have $1 = A$ (in the standardized units used there) and get a zero period, i.e. an always saturated net. In the quantum case this is impossible, because the more exact expression Eqn. 4.25 cannot give a zero: in the quantum case transition or non-transition at any time is never a certainty.

Here we make the further assumption that the pulse width is a duration greater than the average period, and we have to take a proportionate amount of the contribution from the neighbors. This gives the relation
\[
\tau = k'/(V_0 + 4v/w) \tag{4.27}
\]
This is a simple relation for the average period and can be solved in terms of the system parameters: $V_0$, the width $w$ of the pulse, and the pulse size $v$.

In the quasiclassical case we have a randomness in the contribution from the neighbors, which was not present in the classical model. We introduce a parameter $q$ with the interaction $v$ between neighbors to account for this randomness. Now we get:
\[
\tau = k/(V_0 + 4qv/w) \tag{4.28}
\]
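Eqn. 4.28 is trivial to evaluate numerically; in the minimal sketch below the constants `k` and `q` are illustrative placeholders of roughly the right size, not the dissertation's fitted values, which are obtained from a single simulated data point as described later:

```python
def period(V0, v, w, k=0.077, q=0.28):
    """Average period from Eqn. 4.28; k and q are illustrative fitted constants."""
    return k / (V0 + 4.0 * q * v / w)

# sweep the neighbor coupling at fixed pulse width, as in Table 4.1 (V0 = 1, w = 0.2)
for v in (0.1, 0.2, 0.3, 0.4, 0.5, 1.0):
    print(v, round(period(V0=1.0, v=v, w=0.2), 3))
```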
4.5
Input Dependence
The memory and experience stored in the real brain arise from a person's interactions with the environment by means of "perception". A memory device too should be expected to respond to the environment and interact with it if it is to be of use. Data fed to the device can then be stored, updated and compared with further inputs. However, the internal topology and the fixed weights of the connections contribute to "preferring" one set of data over another, or to relating to a certain pattern. This might be similar to the aspects of a person's personality that are genetically fixed and appear as addictions or phobias in extreme cases [61].

We next observe how our quasiclassical artificial intelligence responds to the outside world as we vary parameters. The fuzziness and "average" periodicity associated with the quasiclassical stochastic behavior are unlikely to produce exact closed equations. However, we also try to see whether a simple formula like Eqn. 4.28 can agree with the simulation, and we examine how a memory from an experience is eventually lost.
4.6
Results of Simulation
We see from our simulation that:

(1) In the case of a 40×40 lattice with periodic boundary conditions, so that it looks like an infinite lattice, we get a constant average for the neurons over about 40,000 simulations with a given set of parameters but different initial states. Hence, in an average sense we do have periodicity (Fig. 4.1), and the system is not chaotic. If we follow the history of a single neuron, we see an almost linear relation between the cumulative number of firings and time (Fig. 4.2). If we consider the average sum over all nodes of the lattice, the stochasticity almost disappears and we see a practically straight line (Fig. 4.3).
[Figure 4.1: V0 = 0.2, width = 0.2, k = 0.2: typical pattern of the triggering of the neurons in the quasiclassical neural network. There is apparently no phase locking.]
[Figure 4.2: Cumulative number of triggerings against time for a single chosen neuron. One can see a fairly regular linear behavior despite quantum stochasticity.]
[Figure 4.3: Same as Fig. 4.2, but for the whole system.]

(2) As the strength of the signal from the neighbors increases, the time period decreases as well (Table 4.1). However, it does not appear to go to zero, unlike the classical case, as we had expected, because the quantum-mechanical transition rate cannot have a singularity in this case. Hence, we can make the signals arbitrarily strong for a quasiclassical network without worrying about singularities or negative periods. The parameter q was fitted to the first value, and then Eqn. 4.28 was used to predict all the other periods accurately.
(3) The pulse width is important here as well. If the pulse is spread out, the average period becomes bigger (Table 4.2). Once again we get excellent agreement of Eqn. 4.28 with the simulation results, with the same value of q as used in Table 4.1.
Table 4.1: Strength of Quantum Potential and Average Period of Neurons (width = 0.2, V0 = 1): Best Fit

    v      Tpred    Tsim
    0.1    0.049    0.050
    0.2    0.036    0.035
    0.3    0.028    0.028
    0.4    0.023    0.023
    0.5    0.020    0.020
    1.0    0.011    0.013

Table 4.2: Variation of Period with Duration of Quantum Potential (v = 0.2, V0 = 1): Best Fit

    width    Tpred    Tsim
    0.1      0.022    0.023
    0.2      0.035    0.035
    0.3      0.044    0.043
    0.5      0.054    0.054
    1.0      0.065    0.065

(4) The input dependence is observed after averaging over 100 simulation runs. We used the following types of inputs:
(a) All peripheral nodes in state $|1\rangle$ and all body nodes in state $|0\rangle$: We see (Fig. 4.4) a smooth transition from a state with an initial firing rate proportional to the number of initially excited nodes, which dies down quickly as the system forgets the input and lets the system parameters take over with a noisy pattern, despite the averaging over the runs. It is remarkable that the initial few cycles with the memory show virtually no noise.
[Figure 4.4: Transition from short-term behavior to asymptotic behavior with all peripheral nodes initially in state $|1\rangle$.]
(b) Peripheral nodes alternating between states $|1\rangle$ and $|0\rangle$, with body nodes in state $|0\rangle$ (Fig. 4.5): Here we start with a smaller number of firings, because the number of initially excited nodes is halved, and there are a few kinks in the initial cycles, most probably due to the conversion of the spatial asymmetry into a temporal one. The system moves to the common noisy asymptotic behavior after forgetting the input, as we had expected.
[Figure 4.5: Same as Fig. 4.4, but for peripheral nodes initially in states $|1\rangle$ and $|0\rangle$ alternately.]
(c) Peripheral nodes in random $|0\rangle$ and $|1\rangle$ states: Here too we see fairly prominent kinks (slightly smaller than for the alternating $|1\rangle \Leftrightarrow |0\rangle$ pattern of the previous case), arising from the noncoherent randomness of the input in the neighboring peripheral nodes, until the short-term memory disappears (Fig. 4.6).
(d) We make the whole lattice initially uniform in state $|0\rangle$. However, due to the external driving potential, the system soon develops into the noisy final state.
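For concreteness, the four input preparations (a)-(d) can be encoded as simple arrays; in this minimal sketch, 1 stands for $|1\rangle$, 0 for $|0\rangle$, and all names are our own:

```python
import numpy as np

L = 40
rng = np.random.default_rng(0)

def peripheral_mask(L):
    """Boolean mask selecting the boundary sites of an L x L lattice."""
    m = np.zeros((L, L), dtype=bool)
    m[0, :] = m[-1, :] = m[:, 0] = m[:, -1] = True
    return m

edge = peripheral_mask(L)
n_edge = int(edge.sum())                  # 4L - 4 boundary sites

input_a = np.where(edge, 1, 0)            # (a) all peripheral nodes in |1>
input_b = np.zeros((L, L), dtype=int)     # (b) edge alternating |1>, |0>
input_b[edge] = np.arange(n_edge) % 2
input_c = np.zeros((L, L), dtype=int)     # (c) edge in random states
input_c[edge] = rng.integers(0, 2, size=n_edge)
input_d = np.zeros((L, L), dtype=int)     # (d) uniform |0> everywhere
```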
[Figure 4.6: Same as Fig. 4.4, but for peripheral nodes initially in random states of excitation.]
4.7
Discussion
We have shown that many quasiclassical neural networks can be treated analytically for suitably chosen external and interneuron interactions. This may be relevant to nanoelectronic devices because of their harmonic and/or damped properties. In certain extreme cases, such as resonances and small damping, we obtain a very simple inverse relation between the time period and the effective interaction potential.
Our results, both analytic and simulated, indicate that a quasiclassical microscopic neural network, designed to mimic in several ways the classical integrate-and-fire neural network with classical action potentials, shows an average time period reminiscent of the classical model. However, the following differences are apparent in the quantum version: 1) a system with a zero time period (infinitely fast), as found in the Hopfield-Herz model, cannot be observed; 2) we can use arbitrarily high potentials to link the neurons.

We have also seen that the average system period depends on the system parameters in a simple way, as given in Eqn. 4.28. The choice of a single parameter predicts the period over a wide range of potential pulse strengths and durations. Hence, the quantum system shows a universality.
We have not introduced dissipation, which would be needed to consider Hebbian learning. The effects of dissipation in quantum neural networks have been studied by Altaisky [62] and Zak [47]. Such effects in the context of quantum computing are an important field of intense current study [63], because many of the advantages of quantum computing come from the entanglement of qubits. Decoherence is inevitable in our quasiclassical model, as the neurons are all allowed to fire independently, with no quantum coherence or entanglement. However, unlike these authors, we have considered large systems, where the role of adaptivity in the quantum context becomes too complex.

Other approaches to constructing and studying quantum neural nets can be found in various papers [64, 65, 66, 67, 68, 80], and comparisons of their work with ours can be found in some of them.
A very interesting possibility is that of a semiclassical neural network based on the classical motion of phononic solitons obtained from quantum considerations, e.g. Davydov solitons along a protein chain [81, 82, 83]. A nonlinear Schrödinger equation with an effective potential similar to Eqn. 4.22, with the right kind of space and time dependence of the parameter a, may produce such a soliton, though this cannot be solved perturbatively. These solitons in an α-helix assembly may act as carriers of logical data for distributed nodes in a network. The quantized signal would travel in a manner similar to a biological action potential along an axon.
4.8
Possible Applications
Although the concepts of quantum computers and related algorithms [69], together with the experimental creation of simple qubits, predict that fully quantum complex devices with qubits and gates will eventually be feasible to produce, experimental difficulties related to decoherence and noise prevent this possibility from coming into being in the very near future. However, quasiclassical devices may be created to explore the move towards a quantum world. The analytic results found using harmonic and/or damped interactions may have electromagnetic origin, and hence may be quite relevant in a network of nanoelectronic devices. The biologically inspired system considered here can serve as a short-term dynamic memory with a built-in mechanism for effacing all input dependence in the course of time. This is somewhat similar to the retina, which triggers once a certain threshold of intensity is reached [70] and then goes back to the initial state to receive a second input. In the human brain too, short-term memory exists together with long-term memory; while the fully entangled devices described later may act as memory holders, quasiclassical devices such as this one may act as temporary "experience" devices, where a certain experience, depending on a threshold, triggers an action before that "experience" is effaced.
The fuzziness introduced by the stochasticity is similar to a non-quantitative experience that lingers for a short time, which may be compared with emotions evoked by fuzzy ideas in the back of the mind. While any such mechanism in the human brain is still controversial, it is possible to mimic fuzzy short-term "feelings" triggering certain actions, based on the intensity of a perception, in a device that acts "by instinct". However, whether such concepts need quantum indeterminism, and cannot be explained by classical stochasticity, is a debatable question which we shall not go into.
Chapter 5
Quantum-gated Neural Networks
5.1
Introduction
In this chapter we move one step further towards the creation of quantum artificial intelligence by introducing qubits and gates. While we observed the effects of stochasticity when a quasiclassical model was introduced in the last chapter, the introduction of qubits and gates can produce transformations of quantum states instead of firings of neurons, so that a memory is represented by the states of qubits in a transformation space where the rules are set by the connecting quantum gates.
This is similar to a cellular automaton, with the gates dictating the rules of transformation of a qubit. The rules we program into the quantum-gated lattice will, however, still be inspired by the classical neural network model, so that this gated artificial intelligence is not yet a fully entangled device, but a classical neural network recreated with quantum technology.

In a classical neural network, the plasticity of the weights [15] with which the neurons are connected to one another contributes to the learning process. Though classical learning is a one-way irreversible process, with permanent changes made to the hardware of the device, in a quantum device the unitarity of transformations would make the "learned states" reversible. Hence, learning at the quantum level cannot be realized in the same manner as classical learning, and a collapse of the quantum system to a well-defined eigenstate might be necessary in order to have the results of the internal operations retrievable by coupled peripherals.

However, the parallel processing of information within a quantum device promises the possibility of handling data with speeds unattainable in classical machines.
5.2
Hierarchical Structure in Quantum Machines
In biological nervous systems, input data is first processed locally and sent to the central nervous system (CNS) in a form that best suits its analytic operations (see, e.g., [10]). This pre-processing helps partly to avoid overburdening the CNS, so that it can perform its operations with greater efficiency and less probability of chaotic interference. For example, in the eye, the primary signals from the retinal rods and cones are processed by bipolar, horizontal, amacrine and ganglion cells, which themselves form an intricate network within the eye, before the signal reaches the brain. In classical computers, most peripherals have their own "brains" for this preprocessing of data before sending it to other parts of the machine. Hence, it seems most probable that in quantum computing as well, the entire machine will have a hierarchical structure of smaller quantum networks. Each of these structures would perform more efficiently than the corresponding classical analogue because of its quantum nature. It would be extremely difficult to build a gigantic quantum machine as a single unit, since larger systems would undergo decoherence more easily and, hence, would be less manageable.
Research is also being carried out to design electronic devices mimicking living nervous systems, with sensory organs like the retina and the cochlea [71, 72, 73, 74] as electronic perception devices. Similarly, the motor-control subunits of living systems have also been copied electronically [75, 76].

Hence, we may assume that future quantum intelligent machines, used either in information processing or in decision-making, may have component subunits that operate quantum mechanically but are connected classically to exchange the processed information. Each of these components may be small enough to manage decoherence, and some of these peripheral units might be quantum networks.

In the biological nervous system, input data is converted into a train of pulses (Fig. 5.1). This is achieved by a neural system where the action potential is triggered as long as above-threshold input signals persist. Short-term memory of the CNS is also created in the form of dynamic pulses within a subsystem that is more strongly interconnected. This connectivity is either genetic, or due to the formation and strengthening of links arising from similar repetitive experiences.
[Figure 5.1: A train of action potential pulses in a biological neural system.]

In the next section we explain the nature and operation of the quantum gates. The quantum gates act on superposed quantum microstates called qubits, and are the quantum analogues of the logic gates used with electronic devices such as transistors. These gates have been experimentally produced [77] and their operations verified. We construct a network with such gates and qubits. This network can act as a component of future realistic quantum devices.
To fully exploit the notions of computing devices and the existing quantum algorithms, the connectivity of the nodes of quantum devices needs to be studied extensively.
Although quantum gates and qubits have been used to design quantum algorithms that calculate specific mathematical expressions, quantum networks have not yet been designed properly to utilize quantum mechanics for the purpose of learning and pattern recognition. Altaisky [62] has made some preliminary investigations into a single quantum perceptron. As was stated before, the irreversibility of the learning process is worth looking into when dealing with quantum machines, which can only undergo reversible unitary transformations. Attempts have been made [78, 79] to explain such a change via decoherence at the output, with reversal of the intermediate processes in a quantum computer. The transition to a certain eigenstate is inserted in an ad hoc manner for quantum neural networks, as in Altaisky's work and also in that of Zak et al. [47].

Here we will not get into the complexities of decoherence and of retrieving information from a quantum neural network. Rather, we concentrate on examining the input dependence of such a quantum-gated network, to observe how the qubits react and form patterns.

Some other authors [65, 66, 67, 68, 80] have tried to formulate the problem from different perspectives, and comparisons of our approach with theirs may be found in the cited references.

Our study will concern mainly the nature of dynamic memory in these networks, and how they evolve. However, we first review the basic concepts of qubits and gates, and then go into the mathematical modeling and simulation experiments.
5.3
Brief Review of Elements of a Quantum AI Machine: Qubits and Quantum Gates
5.3.1
Qubits
A quantum machine is markedly different from a classical one because of quantum superposition, which allows each quantum bit, or qubit, to exist as a superposition of multiple possible states. Each of these eigenstates in general represents a different value of the measurable quantity. If the quantity is measured in an experiment, the measured value must be one of the eigenvalues of the corresponding operator. However, quantum uncertainty makes it impossible to predict in advance which eigenvalue will be obtained in each particular experiment, although an ensemble would give stochastic classical values corresponding to the probability of each of the states. The coefficients used to construct the linear superposition of eigenstates, which are in general complex numbers, represent the quantum amplitudes of the particular eigenstates, and can be interpreted as square roots of the probabilities for the superposed quantum state to collapse into each of the many possibilities. Each measurement outcome is obtained only through interactions with classical devices that act as detectors, and is highly indeterministic, although the probability of each outcome depends on the squares of the probability amplitudes. Several models exist that try to bridge the link between the quantum and the classical world [84, 85, 86, 87]. Although the quantum measurement process remains an open problem in physics, experimental results indicate that the quantum world and the classical rules co-exist in different limits, even though the overlap between the two is still poorly understood. We keep philosophical, speculative or developing models outside the scope of this dissertation, and stress how quantum bits can be used to get classical results in an efficient way.
In a quantum bit with superposed states, if the unit is a particle with spin 1/2, then quantum mechanics allows a measurement to give only one of the two possibilities for the z-component: spin up, with $s_z = \frac{1}{2}$, or spin down, with $s_z = -\frac{1}{2}$, although the state of the system before the measurement may be a superposition. Symbolically,
\[
|\psi\rangle = c_0 \left|-\tfrac{1}{2}\right\rangle + c_1 \left|+\tfrac{1}{2}\right\rangle \tag{5.1}
\]
However, experiments indicate that polarized light, with its two spin projections or helicities, may at present be a better candidate for qubits than material particles of spin 1/2, such as the electron or the proton, because quantum gates already exist to operate on photons.
Until decoherence or a measurement takes place, collapsing the entire quantum network into one of the possible macrostates, the qubits are allowed to interact with one another and change the coefficients of the superposed states, so that the probability of the system collapsing into a specific microstate changes dynamically according to the rules programmed into the gates that update these coefficients.
5.3.2
Quantum Gates
Like classical logic gates, quantum gates give a deterministic output from a number of inputs. Classical gates such as AND, OR, NOT and XOR all have their quantum analogues. These gates effectively act on an output qubit to change its state based on the states of the input qubits. A gate may depend on more than one input, and must be precisely positioned so as not to introduce any bias towards any of the inputs.

Let us first consider a single-input quantum gate, the NOT gate. It can be represented by the equation
\[
U_{\mathrm{NOT}}\, (c_1 |1\rangle + c_0 |0\rangle) = c_1 |0\rangle + c_0 |1\rangle \tag{5.2}
\]
which corresponds to a spin flip for both components.

This gate performs a unitary transformation, in this case a rotation, in the Hilbert space of quantum vectors. Unitary operators keep the norm of a state vector unchanged: if we have the operator $U$ operating on the usual state vector $|\psi\rangle$ in the Hilbert space, and its Hermitian adjoint $U^\dagger$ is the corresponding operator in the adjoint Hilbert space, where the mirror state vector is $\langle\psi|$, then
\[
\langle\psi|\, U^\dagger U\, |\psi\rangle = \langle\psi|\psi\rangle \tag{5.3}
\]
implying that the operator $U$ and its Hermitian adjoint $U^\dagger$ are related by
\[
U U^\dagger = U^\dagger U = 1 \tag{5.4}
\]
Hence, the mirror of the operator $U$ operating in the adjoint state-vector space is simply its inverse operator, and the norm of the vector, given by the scalar product $\langle\psi|\psi\rangle$ of the adjoint vector and the original vector, stays invariant.
The conservation of the norm ensures the closure of the entire quantum system, so that even if the probability of a certain state changes, the total probability of finding the system in one of the possible eigenstates remains the same. This means that the system will definitely collapse to one of the possible eigenstates.
The novelty of quantum gates is that, unlike in the classical case, there are quantum gates [16] representing the "square root" of NOT, where the rotation goes only halfway through for the two eigenstates. There are also phase-transformation gates that selectively change the phase of one component of a superposition.

For classical gates there is an important theoretical result, which has also been applied in the construction of many circuits: all the different types of multi-bit gates, viz. AND, OR, NAND, NOR and XOR, can be constructed from combinations of only one universal gate, the NAND gate.

A similar theorem may hold for single-qubit or multi-qubit operations. The universal set of all unitary transformations may be represented by combinations of the controlled-NOT or cNOT gate, described below, and two single-qubit gates: one gate that changes the phase difference between the coefficients $c_1$ and $c_0$ in a particular ratio, and a Hadamard gate that mixes up the up and down components with different signs, as represented by the matrix:
\[
U_H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \tag{5.5}
\]
The controlled-NOT or cNOT gate flips a target qubit depending on whether a control qubit is in the on state or the off state. If the c-qubit is $|1\rangle$, it flips the target (NOT); if the c-qubit is in the $|0\rangle$ state, it does nothing. The c-qubit itself is never changed by the operation. Hence, the gate can be represented by a matrix in the product state space $H_c \otimes H_t$ of the control qubit and the target qubit, where the product eigenstates can be listed in vector form as $|0\rangle_c|0\rangle_t$, $|0\rangle_c|1\rangle_t$, $|1\rangle_c|0\rangle_t$ and $|1\rangle_c|1\rangle_t$. The matrix form of the cNOT operation may thus be given by
\[
U_{\mathrm{cNOT}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{pmatrix} \tag{5.6}
\]
Here the first two product states remain unchanged, as the control qubit is 0, while the next two product states are interchanged, with the c-qubit remaining unchanged at 1 but the t-qubit flipping.
For unitarity purposes a phase change during flipping is needed, as can be seen easily when real coefficients are used for the qubit superposition. If the control and target qubits are
\[
|c\rangle = a|0\rangle + b|1\rangle, \qquad |t\rangle = c|0\rangle + d|1\rangle \tag{5.7}
\]
then normalization of the states demands that
\[
|a|^2 + |b|^2 = 1, \qquad |c|^2 + |d|^2 = 1 \tag{5.8}
\]
After the cNOT operation we get
\[
|c'\rangle = |c\rangle, \qquad |t'\rangle = (ac + bd)|0\rangle + (ad - bc)|1\rangle \tag{5.9}
\]
and it can be checked, with real coefficients, that the normalization of the target qubit is preserved.
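These properties are easy to verify numerically; a short check (with names of our own) of the unitarity of the matrix in Eqn. 5.6 and of the preserved target normalization of Eqn. 5.9 for random real coefficients:

```python
import numpy as np

U_cnot = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, -1, 0]], dtype=float)
assert np.allclose(U_cnot.T @ U_cnot, np.eye(4))   # unitary (real matrix, so U^T = U^dagger)

rng = np.random.default_rng(1)
ctrl = rng.normal(size=2); ctrl /= np.linalg.norm(ctrl)   # control (a, b), normalized
targ = rng.normal(size=2); targ /= np.linalg.norm(targ)   # target (c, d), normalized
a, b = ctrl
c, d = targ
t_new = np.array([a * c + b * d, a * d - b * c])          # Eqn. 5.9
assert np.isclose(t_new @ t_new, 1.0)                     # target norm preserved
print("checks passed")
```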
Like a quantum computer, a quantum information-storage machine is most likely to be built with quantum gates, and there may be appropriate combinations of Hadamard, phase-change and cNOT gates to produce different types of processing, representing the unitary transformations required in the algorithm. It is possible that in the future the arrangement of gates may be automatically controlled by the problem, and hence a quantum neural network may be more faithful in mimicking a biological brain than a classical neural net. The only firmware in the latter are the EEPROMs (Electrically Erasable Programmable Read-Only Memory) used in applications such as the BIOS, and though the programming of EEPROMs is relatively straightforward, it is also very time-consuming. Additionally, in a quantum machine the signals acting on the gates are at the quantum level, and hence at the speed of light, leading to very fast data processing.

The high connectivity of neurons within the brain (where one neuron can reach 10,000 other neurons [88]) also suggests that quantum machines might be able to mimic biological neural nets better than electronic ones. Quantum gates acting on photon nodes may have multiple connectivity, on account of the possibility of using coherent extended wave functions, while in classical electronic gates connectivity is usually limited to a few inputs and outputs.
However, we refrain from speculating further here, and concentrate on the possibility of designing a simple system based on fixed links between the nodes. Our studies in the later sections will indicate the major differences a quantum network would have from a classical network. These results may later be used when more complex network systems are created.

In particular, we would like to examine how static input data is transformed into dynamic memory within a quantum network. This property is observed in biological networks, and such results obtained from quantum networks will indicate the feasibility of designing quantum networks as more faithful imitations of the living brain, since electronic neural networks have so far not been able to reproduce this characteristic. The so-called "dynamic memory" in the form of RAM retains data in static patterns as well, even though it is refreshed regularly. However, short-term biological memory [89], which is often used in quick decision-making, is dynamic and is expressed as an expanded pattern of oscillations in the network.

We now examine the behavior of a simple quantum network where the qubits are connected with quantum gates, as a step towards future complex quasi-biological quantum networks.
5.4
The Quantum Neural Network Model
As was seen previously, in the I&F model a neuron receives a current from its fired neighbors, and when its own potential exceeds the threshold it fires as well, feeding its own neighbors. Since in a quantum process all transitions of the neurons must be designated by unitary operators, in place of the firing of a neuron we have a less spectacular unitary transformation that simply performs a rotation of the state vector, or qubit.

This operation should in principle involve time too, and thus we write
\[
|t\rangle = U(t, t_0)\,|t_0\rangle \tag{5.10}
\]
to indicate the transformation of a neuron from time $t_0$ to time $t$. For small time changes it is possible to write
\[
U(t + dt, t) = U(t, t) - i\, dt\, H \tag{5.11}
\]
so that
\[
d|t\rangle = -i\, dt\, H\, |t\rangle \tag{5.12}
\]
to lowest order, with a Hermitian operator $H$, the Hamiltonian.
As stated earlier, in quantum computing a complete set of unitary operators may make use of Hadamard gates, phase-change gates or controlled-NOT (cNOT) gates. Entanglement between different nodes may be manipulated by using these gates, e.g. the cNOT or the Toffoli gate, which can function as an adder.
For simple biologically inspired quantum machines it is not necessary that the entire system be entangled. At the lowest non-trivial level it is possible to have pairwise entanglement. However, after successive operations the entanglement may spread to the entire network. This bears similarity to obtaining a dense matrix from the multiplication of a large number of sparse matrices with nonzero elements at different positions.
The basic postulates dictating the rules to be designed into the gates are as follows:

1. each qubit represents a neuron;

2. an excited neuron $|1\rangle$ will turn on a neighbor in the ground state $|0\rangle$, i.e. flip it to the state $|1\rangle$;

3. an excited state will make an excited neighbor "fire" and flip back to $|0\rangle$ [induced emission];

4. the excited state itself will go down to the ground state $|0\rangle$ in the process;

5. an unexcited neuron stays inert, with no effect on its neighbors or itself.

The design of the gates based on these "rules" can be done as follows. Postulates 2, 3 and 5 are implemented by adding cNOT gates to each "operating" neuron and its neighbors, so that the state of the operator serves as a condition for flipping a neighbor when needed. There are four neighbors for each neuron in the square lattice we consider; hence, in place of cNOT gates we shall need cNOT$^4$ gates, where one controller flips all four neighbors if it is in state $|1\rangle$ and does nothing if it is in state $|0\rangle$.

An AND gate connecting every neuron with a common $|0\rangle$ state after the cNOT$^4$ gate can satisfy postulate 4.
The cNOT gate can be represented by
\[
U = \begin{pmatrix} 1 & 0 \\ 0 & i\sigma_2 \end{pmatrix} \tag{5.13}
\]
for any particular neighbor, where $\sigma_2$ is the flipping Pauli matrix, with a phase change needed to preserve the normalization of the target qubit, as explained previously. Eqn. 5.13 represents a unitary operator. However, we connect the cNOT matrices using a weight factor $\epsilon$ to represent the strength with which a neuron can affect its neighbor. This weight factor differs from the weight factors in classical neural networks, which must sum up to a specific normalized value. Instead, this quantum-net weight acts simply as a measure of the strength with which the neurons are able to affect their neighbors.
Another Hermitian operator in the qubit space can represent the AND·|0⟩ operation:
$$U_0 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \qquad (5.14)$$
Here the sequence of states in the rows and columns is, as usual, |1⟩|1⟩, |1⟩|0⟩, |0⟩|1⟩ and |0⟩|0⟩, the first being the controlling state.
The AND operator used here is not unitary, although it is Hermitian, as it is a projection operator. It is the nonunitarity of the AND operator that is responsible for the collapse of the state to the ground state after it has reached a threshold. This is not the ideal quantum computer operation, where all gates should be unitary, but a hybrid of unitary connections among neurons and a nonunitary collapse. This mix provides two much-desired characteristics of the network: unitary rotations of the neurons in qubit space, making the device fast and efficient, and collapse to the ground state by the AND operator. This "collapse", however, simply restarts the rotational activity of a neuron when it is connected to an excited neighbor and, hence, is not quite the same as decoherence. Here, states do not simply collapse probabilistically to one of the components of the superposed state, deleting all memory; rather, transitions take place to a specified state, and the timing of the collapse holds information about when the threshold was reached, enabling a time-sequence pattern.
Finally, to express that at each node the controlling qubit remains unchanged by its own action, we write
$$\begin{pmatrix} c \\ s \end{pmatrix} \rightarrow \begin{pmatrix} c \\ s \end{pmatrix} \qquad (5.15)$$
and to formulate the change for a neighbor receiving a signal from it, i.e. operated on by the cNOT⁴ gates, we have
$$\begin{pmatrix} c \\ s \end{pmatrix} \rightarrow \begin{pmatrix} c \\ s \end{pmatrix} + \epsilon\, c' \begin{pmatrix} -s \\ c \end{pmatrix} \qquad (5.16)$$
where $c'$ is the $c$-amplitude of the controlling qubit. As the small-ε approximations of the unitary operators are not themselves unitary, it is necessary to renormalize each qubit at each step in the simulation.
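The per-sweep update implied by Eqns. 5.15 and 5.16 can be sketched as follows (a minimal illustration of ours, not the original simulation code; the array layout and the use of |c| in the threshold test are our assumptions):

```python
import numpy as np

def step(c, s, eps, c_thresh=None):
    """One update sweep of the quasi-quantum I&F lattice (Eqns. 5.15-5.16).

    c, s : 2-D arrays of the |1> and |0> amplitudes of each qubit.
    eps  : coupling strength (epsilon in the text).
    c_thresh : optional threshold; crossing it collapses a qubit to (0, 1),
               mimicking the nonunitary AND gate (postulate 4).
    """
    # Sum of controller c-amplitudes over the four lattice neighbors,
    # with periodic boundary conditions via np.roll.
    ctrl = sum(np.roll(c, shift, axis) for shift, axis in
               [(1, 0), (-1, 0), (1, 1), (-1, 1)])
    # Infinitesimal cNOT^4 rotation of each target qubit (Eqn. 5.16).
    c_new = c + eps * (-s) * ctrl
    s_new = s + eps * c * ctrl
    # The small-eps operator is not exactly unitary, so renormalize.
    norm = np.sqrt(c_new**2 + s_new**2)
    c_new, s_new = c_new / norm, s_new / norm
    if c_thresh is not None:
        # Threshold ("firing") variant; |c| is our reading of c > c_thresh
        # for amplitudes of either sign.
        fired = np.abs(c_new) > c_thresh
        c_new[fired], s_new[fired] = 0.0, 1.0
    return c_new, s_new
```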
5.5 Results of Simulation
For ease of comparison, we construct a 40×40 network of qubits with periodic boundary conditions, similar to the simulations we carried out for the classical case [59], so that it behaves in certain ways as an infinite lattice. Although it is very difficult to create an experimentally viable fully entangled large network because of issues like complexity, a theoretical study of such systems is unavoidable.

We use the peripheral neurons to feed data into the system and set the inside neurons either all random or all in the ground state (0, 1). The qubits are updated according to Eqns. 5.15 and 5.16.
A large number (40,000) of time steps are chosen for various ε values. This parameter, representing the strength of coupling, occurs together with dt and, hence, may also indicate the width of the pulse at each time step if we compare with the classical neural network model.
We have assumed that though the quantum gates may eventually flip a state, the process does not take place instantly, but proceeds through the usual continuous time development under an appropriate Hamiltonian, which can be broken up into small discrete transformations over small time intervals. This assumption is not only legitimate, but obligatory in the context of quantum dynamics, and highly desirable in satisfying our main
objective of comparing the quantum case to the classical.
The following observations were made for different runs. In the first model each neuron was allowed to go up to its top qubit value of (1, 0), with c = +1 or −1, before it "fired", i.e. came down. Interestingly, in this case no well-defined periodicity existed, either for any single neuron or for the correlation between nodes in the network (Figs. 5.2, 5.4). Though all neurons do indeed go through the (1, 0) to (0, 1) cycles, the oscillations appear aperiodic. An explanation could be that the exact equation of motion coupling the nearest-neighbor neurons is not soluble in terms of periodic functions.
However, it should be noted that, unlike a biological action-potential voltage, the time series plotted in Fig. 5.2 and Fig. 5.4 are not directly measurable physical quantities, but quantum amplitudes. As stated before, the modulus squared of the c-part of the qubit shown in the plots corresponds to the probability of observing the pure state |1⟩ at the given time step shown along the x-axis. Similarly, in Fig. 5.3 and Fig. 5.5 the quantum correlation amplitude is shown, whose modulus squared gives the varying probability of observing the indicated physical subsystems (qubits) in the same eigenstate as time progresses. A c-amplitude of +1 or −1 for a qubit component indicates unit probability, i.e. certainty, as the state is then
a pure |1⟩ eigenstate, and not a superposition of eigenstates. In that case, at the corresponding time steps, measurements on the system would retain the purity of the state. Measurements carried out at other times, with a c-amplitude whose modulus is less than unity, would have a |0⟩ component and, hence, may give either 0 or 1 when measured. However, after collapse, the subsystem becomes a pure state. Similarly, a correlation amplitude of either +1 or −1 indicates a perfect correlation (both qubits giving 0, or both 1, when measured together). On the other hand, a correlation amplitude of 0 indicates a perfect anti-correlation, the two qubits always giving opposite values on simultaneous measurement.
Figure 5.2: Oscillations of the c part of a qubit for a non-cutoff model with ε = 0.01. This is the quantum amplitude for the state |1⟩. The modulus squared of this quantity gives the probability for a particular measurement to find the state to be |1⟩, though individual experiments may give either 0 or 1.
Figure 5.3: Oscillation of the summed c parts of all qubits in the network for c_thresh = 0.7, ε = 0.01. All boundaries excited initially.
A slightly different version of the network model, more akin to the classical one, was also experimented with. Here a threshold for the excited part of the neuron was introduced. When crossed, this caused the qubit to jump to the ground state (0, 1), i.e. if c > c_thresh, then (c, s) makes a transition to the (0, 1) state. If the quantization axes of the nodes and of the AND gate are not the same, the corresponding rotation matrix would make a (c_s, s_s) qubit look like a (1, 0) to the gate, making it drop to the ground state prematurely.

More interesting results were found with this latter model. Periodic oscillations were seen in the system, with all neurons in the same phase. The threshold was set at 0.7, which is just below $1/\sqrt{2}$, corresponding to equal mixing of (1, 0) and (0, 1), and turned out to be the critical threshold that gives regular oscillations.
Mathematically, this critical behavior may exist because the cut-off effectively serves to truncate the complicated coupled behavior of the system at this value, reducing it to a simpler periodic system, just as the truncation of a transcendental function by a polynomial with a finite number of terms gives it simpler behavior. The function
$$F(t) = \cos(\theta + \sin(\epsilon\theta)) \qquad (5.17)$$
with $\theta = \omega t$, for example, assumes the periodic form
$$F(t) = \cos[(1 + \epsilon)\,\omega t] \qquad (5.18)$$
only for $\epsilon\theta \ll 1$, but has more complicated behavior when this condition is not satisfied. This phenomenon needs to be studied further.
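A quick numerical check of this truncation argument (our own illustrative sketch) compares Eqn. 5.17 with its small-εθ form, Eqn. 5.18:

```python
import numpy as np

eps, omega = 0.01, 1.0
t = np.linspace(0.0, 50.0, 5001)
theta = omega * t

exact = np.cos(theta + np.sin(eps * theta))   # Eqn. 5.17
approx = np.cos((1.0 + eps) * omega * t)      # Eqn. 5.18

# For eps*theta << 1 the two agree closely; the deviation grows once
# eps*theta becomes of order unity.
print(np.max(np.abs(exact - approx)[eps * theta < 0.1]))
print(np.max(np.abs(exact - approx)[eps * theta > 0.4]))
```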
However, another reason for this behavior might be that the rate at which the AND operator acted was different from the rate of the cNOT operator acting on the system, and the coupling of the two incongruous frequencies yielded a complex pattern.

We note that these oscillations are seen in the behavior of a single neuron
(Fig. 5.4), or the sum of all neurons of the system, or even in the correlation ⟨i|j⟩ between neuron |i⟩ and neuron |j⟩ (Fig. 5.5).
Figure 5.4: Correlation between two qubits for the no-threshold case with ε = 0.01, ⟨10,10|20,21⟩, where the qubits are located by their (x, y) coordinates in the lattice.
Another interesting observation is that, for large ε (> 0.7), if we put the initial signal only at two parallel sides of the square, oscillations do not appear, but a static asymptotic state is quickly reached, whereas if we put the signal on all four sides, we get periodic oscillations with a changed frequency. The lack of constraints in the orthogonal directions might be responsible for allowing the pattern to settle down to a static state in the first case. This is similar to the one-dimensional Ising model having a trivial phase transition. When signals arrive from both the x and y directions, the attractor for the system becomes dynamic, as it cannot find a stable equilibrium when
Figure 5.5: Correlation ⟨10,10|20,21⟩ for the above case.
it tries to adjust in both directions.
5.6 Discussion
In this chapter we have shown that a quantum neural network similar to the biological integrate-and-fire neuron network can be constructed with qubit nodes connected by cNOT and AND gates.

We have noted that when no threshold is imposed, the system converges to a dynamic state with no fixed period and no phase locking, apparently similar to a chaotic system, but with an average non-chaotic behavior.
Dynamic behavior emerges when a threshold is set for the qubits to "collapse" to the ground state. The period is almost inversely proportional to the coupling strength (Fig. 5.6), but becomes nonlinear for strong coupling. If the coupling is reasonably strong and the initial excitations are only in one direction, the system seems to converge rapidly to a static attractor. However, with excitations from both directions of the square lattice, dynamic oscillations are observed.

Figure 5.6: Variation of periodicity with ε for excitations from all sides.
The correlation between neurons, as measured by the overlap ⟨i|j⟩ of the two qubits, also shows periodic time dependence when the oscillations have fixed periods. Interestingly, we have found that although these models are constructed as simple lattices, they can hold dynamic memories of the input indefinitely.
The pattern generated may be more interesting when complex phases are introduced in the coupling, and also if a dynamic external agent affects the peripheral neurons, rather than just an initial input. Delay lines can also be placed between the neurons to introduce a time scale.
These results further point to the similarities between biological networks and these quantum nets: in the neural network of a biological entity, too, the complexity of the dynamical behavior shows almost chaotic patterns, but in specific stages of alertness, as is known for human beings, the network shows average periodicities such as the alpha, beta, gamma and delta waves.

Evolution always seeks the most efficient biological systems that can survive well, and has produced such a system, in which the dynamical signal can damp out. Adjusting the threshold allows us to mimic such behavior in our system as well. The inbuilt erasing mechanism may be advantageous for a quantum AI that can refresh its memory after one set is exchanged with peripheral devices. Hence, future engineering developments in AI that want to follow the biological path (see for example [90, 91] for electronic attempts) may find it useful to study quantum neural networks that are able to mimic the biological brain in many ways. Recent studies show [92] that epileptic seizures in the human CNS are preceded by a loss of the normally present chaotic firings, which indicates the possibility of large classical deterministic AI systems going into phase-locked freezing. Quantum networks can avoid this automatically on account of their stochastic nature.

The ability of bio-systems to handle ill-posed data in a more effective way also evokes the belief that quantum gated machines, with their inherent probabilistic nature, may achieve similar robustness in processing real-life situations with incomplete complex data.
Chapter 6
Fully-Entangled Neural Network Model
6.1 Introduction
In this chapter, we exploit the full advantage of quantum entanglement and try to construct an effective storage system based on the large capacity promised by quantum superposition. In the last chapter, we only presented pairwise entanglement in the first order. We now discuss some results with a more complex model of a completely entangled quantum network on a computable scale. We construct the simplest nontrivial quantum network connected with gates and perform simulation experiments. Networks constructed with ordinary c-NOT gates need considerably more computing time in simulation than ones where the modified c′-NOT form of the gates is used. We also show that the periodicity found in the case of the pairwise entangled quantum nets cannot be found in a fully entangled network. Finally, we show how ab initio periodicity may be introduced by hand to make this fully entangled network more similar to a biological memory device.
6.2 An Entangled Quantum Network Model
We start with an n×n lattice with the usual periodic boundary conditions, so that it can effectively mimic a bigger lattice. We now have N = n×n independent nodes, each of which is a qubit, e.g. a spin-1/2 object. These nodes are individually connected to their neighbors with c-NOT gates. This simple design allows us to study the general behavior of an entangled quantum network that is not built to handle a specific task. Hence, we will be studying how robust a quantum network connected with gates that allow full entanglement is when input data is fed to it.
The entire system at any time can be represented by the state
$$|\psi\rangle = \sum_I a_I\,|\psi_I\rangle \qquad (6.1)$$
The complete set of unentangled product basis states includes all possible combinations
$$|q_1 q_2 \ldots q_N\rangle \qquad (6.2)$$
where $N = n^2$.
Each of the qubits can be either in state |1⟩ or in state |0⟩ here. Initially we may have a pure state with only one a_I = 1 and all others zero. However, as the entanglement is allowed to proceed through the c-NOT gates between the nodes, all or a subclass of the states may be expected to become entangled. By choosing a non-factorizable superposition of the product states, we can also choose an entangled initial state.

The effect of each gate on every a_I needs to be considered carefully at each time step. The c-NOT gate will flip the controlled qubit if the controlling qubit is in state |1⟩, while doing nothing if the controller is in state |0⟩. The controller node is unchanged.

Each node is taken in turn, and the effect on all a_I is considered as the neighbors of this node act on it, in a procedure similar to that described in the earlier unentangled version.
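This bookkeeping can be sketched as follows (our own minimal illustration, using the exact discrete c-NOT as a permutation of the basis labels I, rather than the infinitesimal form introduced below):

```python
import numpy as np

N = 9                               # 3x3 lattice -> 2**N amplitudes a_I
a = np.zeros(2**N)
a[495] = 1.0                        # periphery excited (cf. Eqn. 6.6)

def apply_cnot(a, control, target):
    """Exact c-NOT between two qubits, acting on the amplitude vector:
    whenever the control bit of the basis label I is 1, the new amplitude
    comes from the label with the target bit flipped."""
    out = a.copy()
    for I in range(len(a)):
        if (I >> control) & 1:
            out[I] = a[I ^ (1 << target)]
    return out
```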
We first do the flipping in a continuous manner by choosing for each time step $dt$ the transition submatrix for a small change, as we had done in the previous case, and get
$$A = \begin{pmatrix} 1 & 0 \\ 0 & \exp(i\epsilon\sigma_1\,dt) \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & i\epsilon\,dt \\ 0 & 0 & i\epsilon\,dt & 1 \end{pmatrix} \qquad (6.3)$$
to first order in $dt$.
However, since the previous model did not consider complete entanglement of the net, but only local pairwise entanglement, the state space had only 2n² components (i.e. polynomial in n). In the case of complete entanglement, the possible state space becomes exponential. It is impossible to handle a large lattice like 40×40 when it is entirely entangled. Hence, in this work we have considered the smallest nontrivial lattice, i.e. a 3×3 lattice. As we have stated earlier, this is in some respects similar to an infinite lattice because of the periodic boundary conditions. Even if we consider a 4×4 lattice, we have 2¹⁶ product states to update at each step. This is not computationally economic with a classical computer.
We have first linearized the 2-d label of each qubit (i, j) to a single 1-d label by choosing a sequence to optimize computing time, and then we have constructed our label I (stated above) by simply taking the sum
$$I = \sum_i 2^i \qquad (6.4)$$
Here the sum is over only those qubits for which the state is |1⟩, and i is the linear sequential position label of the qubit, ranging from 0 to N − 1. This allows us to ascertain the state of any qubit at a particular position i with a single bitwise AND (&) operation and, hence, speeds up the simulation process. This also permits us to create any initial state for the simulation, pure or entangled, by choosing the right combination of I's.
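A small sketch of this integer-label bookkeeping (ours; the row-major position of the interior qubit is an assumption):

```python
N = 9  # 3x3 lattice, qubits labeled i = 0..N-1

def qubit_is_one(I, i):
    """Read qubit i from the packed basis-state label I (Eqn. 6.4)."""
    return (I >> i) & 1 == 1

# The all-periphery-excited state of the 3x3 lattice: every qubit except
# the single interior one (here assumed at row-major position 4) is |1>.
I = sum(2**i for i in range(N) if i != 4)
assert I == 495          # 111101111 in binary, cf. Eqn. 6.7

# Recovering a common sub-pattern of dominant states with bitwise AND,
# as used later in Eqn. 6.12: 11011 & 10011 & 11001 == 10001.
assert 0b11011 & 0b10011 & 0b11001 == 0b10001
```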
6.3 Periodic and Aperiodic Regimes
Before starting the simulation, we point out a behavior that can be anticipated from purely theoretical considerations for any quantum net on which a particular sequence of unitary operators acts repeatedly.

Lemma: It is not possible to move any system from an aperiodic regime to a periodic one by a repeated sequence of unitary operators.

Proof: The product of any given sequence of unitary operators U_i is equivalent to a single unitary operator, say U. Let |i⟩ be a vector in the orbit of U in the periodic regime. Now if we operate on |i⟩ by
$$U^\dagger = U^{-1} \qquad (6.5)$$
we get $U^{-1}|i\rangle = |j\rangle$. Here |j⟩ must be on the orbit. However, if the state was reached from the aperiodic regime, then we must also have, with the same inverse operation, a transition to a state in the aperiodic regime. This is impossible, because U and its inverse are both linear operators in usual quantum mechanics, and must give unique results whichever state they operate on. Hence, we cannot get any transition to a periodic system in our simulation.
6.4 Simulation Results and a Modified Gate
There is only one interior point in a 3×3 lattice. So, if we initially excite the whole periphery, at the beginning of the simulation we have
$$a_{495} = 1.0 \qquad (6.6)$$
as
$$495 = 111101111_2 \qquad (6.7)$$
We first deliberately used a nonunitary series of operations, with ε imaginary, in order to emphasize the importance of unitarity in these operations. In reality this may be possible in a semi-open system with some leakage to a classical thermal environment, which acts as a temporal damping factor. Our simulations indicate that even if we begin with a pure state, the nodes get entangled quickly, after only a few steps. However, after a sufficiently long time the system degenerates to a uniform state with all $a_i = 1/\sqrt{N}$.
This can be attributed to the operator A above, which tries to form a continuous c-NOT gate in place of the unitary discrete one:
$$C = \begin{pmatrix} 1 & 0 \\ 0 & \sigma_1 \end{pmatrix} \qquad (6.8)$$
which, using the imaginary parameter ε, becomes nonunitary. Hence, unlike in the case of a unitary c-NOT operator, the eigenvalues of this discrete version with an imaginary parameter do not all have modulus 1. Therefore, the eigenstate with the highest eigenvalue emerges, as is usually the case under repeated operations of a nonunitary operator. However, since the full matrix even for the 3×3 net must be 512×512, this cannot be checked easily computationally or analytically. It can be argued from symmetry that the highest eigenstate must be the symmetric one, though we are unaware of any mathematical theorem justifying this hypothesis.
To minimize computational expense, we shall adhere to real matrices. We next construct the infinitesimal form of a unitary matrix representation with a c′-NOT gate, defined as a quantum gate that reverses the phases of the flipped, infinitesimally changed coefficients:
$$U = \exp(iH\,dt) = 1 + i\epsilon\,dt \begin{pmatrix} 0 & 0 \\ 0 & \sigma_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \epsilon\,dt \\ 0 & 0 & -\epsilon\,dt & 1 \end{pmatrix} \qquad (6.9)$$
Here $H$ is a Hermitian Hamiltonian, making $U$ unitary.
We start with pure initial states and note the biggest components after 1000 time loops:
$$|27\rangle \rightarrow 0.71(|27\rangle) + 0.21(|11\rangle + |19\rangle + |25\rangle + |26\rangle) \qquad (6.10)$$
where we have omitted the smaller-order terms.
It can be noticed immediately that the initial state has retained its dominance even after 1000 time steps, and also that the next-to-leading states are separated from it by just a single 1 in a neighboring qubit.

If we choose another single state |17⟩ = |1 + 16⟩, i.e. the one with only a corner and the middle qubit of the lattice initially excited, the final state is (highest-amplitude terms):
$$|17\rangle \rightarrow 0.27(|27\rangle) + 0.23(|19\rangle + |25\rangle) \qquad (6.11)$$
It can be noted that although this time the original state has disappeared from the list of finally dominant states, the dominating terms do not show a clear winner, and with the AND operation on the bits of the dominating ones we get back the initial |17⟩!
$$(10001) = (11011)\,\&\,(10011)\,\&\,(11001) \qquad (6.12)$$

6.5 Creation and Detection of Entangled States
It was assumed in the simulation above that the system acquired an entangled state as input. Rabitz et al. [93] have presented a method of obtaining a superposition of states from a ground state in a molecular system. In a similar spirit, we show here how it may be possible to create arbitrary combinations of states in a general quantum network of our type. We consider a matrix, which we call an "extended unitary matrix", as given below:
$$U(R, x') = \begin{pmatrix} aR & b\,|x'\rangle\langle n+1| \\ 0 & |n+1\rangle\langle n+1| \end{pmatrix} \qquad (6.13)$$
The normalization condition is
$$|a|^2 + |b|^2 = 1 \qquad (6.14)$$
This operator matrix acts on the (n + 1)-dimensional basis, where the last vector |n + 1⟩ is an auxiliary vector not related to the n-dimensional entangled vector space. Then, given any state |x⟩, we get the normalized new state
$$|x''\rangle = aR|x\rangle + b|x'\rangle \qquad (6.15)$$
with R an n×n unitary operator that gives a vector orthogonal to the new vector |x′⟩ to be superposed. This generalized unitary operator separately maintains the length of the n-dimensional vector of the active system and of the single auxiliary vector in the extended (n + 1)-dimensional vector space, which may be an additional dummy component of the system.

As was stated before, a complete set of quantum gates can [94] simulate any unitary operator. The arguments can be trivially extended to produce our extended unitary operator with such gates too. Hence, at least in theory, a physical realization is not an insurmountable problem.
Despite being superpositions, the entangled states are pure states. So, in principle the detection of the entangled states is no more complicated than that of single states in the usual product basis of single spins, given that we rotate the basis to one that has the chosen vector as a basis vector. Grover's search procedure [95] can be used to detect the presence of any particular state. Alternatively, filtering matrices that perform the same operation directly in the original basis, obtained by adapting the sign-reversing and diffusion matrices of Grover with the appropriate linear transformations, can be determined for the superposed state. As originally proposed by Deutsch [96], the detection process in quantum computation is of course only stochastic.
6.6 Discussion
We have shown that networks with c′-NOT gates indeed get fully entangled in general, with a smearing of the excitation from the initially excited nodes to their neighbors, as expected. However, the memory of the input state seems non-trivial, depending on whether it was pure or entangled. The final state can, in any case, easily be back-projected to find the initial state. How low-level noise can be filtered out of a system like this is an interesting problem. A possible investigation could also include whether a quantum network can be used to filter separable and entangled states, in methods similar to or different from those proposed recently by Doherty et al. [97].
We have proved that, unlike a classical network, a quantum one cannot move to a region of dynamic phase-locked oscillations if the input is static. However, periodic dynamic behavior may be injected into the system at will by choosing the right operator, i.e. a suitable unitary operator that rotates some or all states with time. Choosing the appropriate connectivity among the nodes may achieve this, and remains to be studied in detail. Identical periods may be assigned to subclasses in order to use such a net for pattern identification over the huge database created by the entire set of separable and entangled states. However, the largest advantage of an entangled net is the huge storage capacity that comes with the idea of superposition.
6.7 Patterns in Entangled Quantum States

6.7.1 Qubit Pattern Generation/Representation
In the previous chapters, we have discussed quantum networks performing like artificial intelligence, responding to inputs or holding patterns. Here, we look into the concept of quantum patterns in more detail. Most of this discussion follows work presented elsewhere [98]. A single quantum vector is capable of representing any pattern expressed by means of a finite sequence of numbers. Therefore, pattern recognition depends on handling such state vectors. Quantum nodes are used in these new types of neural networks so that coherence is not lost and the information is retained. Hadamard gates [16], defined by the matrix
$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad (6.16)$$
can produce entanglement among pure states, with a resultant non-factorizable state space.

Acting on the 2-D (|0⟩, |1⟩) space, this gate alone can flip a |0⟩ state through an angle of π/4, and can rotate a |1⟩ state through 3π/4. For a more general qubit state,
$$H \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} \alpha + \beta \\ \alpha - \beta \end{pmatrix} \qquad (6.17)$$

A superposition of the tensor products of the |0⟩ and |1⟩ bits of the qubits, in factorizable or nonfactorizable (entangled) forms, can be produced by a series of Hadamard gates working in parallel. This would imply that these gates alone can create such a state from the |0000...⟩ state. Hence, pattern recognition in the form of qubits can be done by using these gates. It is the relevant Hadamard gates that dictate the exact patterns of the states. These states can be converted into data streams by means of photons.
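As a small illustration (our own sketch, not part of the original simulations), Hadamard gates working in parallel are Kronecker products of the matrix in Eqn. 6.16, and acting on |000⟩ they produce the uniform superposition of all product states:

```python
import numpy as np

H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)   # Eqn. 6.16

n = 3
H_n = H
for _ in range(n - 1):
    H_n = np.kron(H_n, H)                    # n Hadamards in parallel

zero_n = np.zeros(2**n)
zero_n[0] = 1.0                              # the state |000>

psi = H_n @ zero_n
print(psi)   # all 2**n amplitudes equal to 1/sqrt(8)
```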
We had previously [58] constructed a Hopfield-Herz type [12] neural network using cNOT-type gates. It was seen that even if the network started off with the state vector |0000...⟩ for the whole net, the qubits soon became entangled. Simple inputs can thus be transformed into complex patterns in these devices.
6.7.2 Qubit Pattern Recognition
Grover's algorithm allows a certain state to be detected in an entangled N-state system by a method much faster than the analogous classical search algorithm, which needs of order N/2 steps. This is a probabilistic success that can be made close to unit probability by increasing the number of operations, while keeping the number much less than the classical threshold.

We consider a simple problem involving a two-dimensional lattice. Each point (i, j) on the lattice may indicate a pixel. Two 2-qubit registers (four qubits in all) can express a 16(= 2⁴)-level intensity:
$$|p, i, j\rangle = c_{00}|0\rangle|0\rangle + \ldots + c_{33}|3\rangle|3\rangle \qquad (6.18)$$
Here |c_{ij}|² indicates the probability of the intensity level denoted by (i, j); 1 is the certainty of a level, and 0 is its total absence. We use k as an index to indicate the qubit label with respect to position, defining its range from 0 to log₂ N, where N is the total number of pixels. If we have a pattern formed by four adjoining pixels taken to be the vertices of a square, with intensities A, B, C and D in clockwise orientation, the quantum state can be defined as
$$|k, P\rangle = |k, A\rangle|l, B\rangle|m, C\rangle|n, D\rangle \qquad (6.19)$$
The letters k, l, m, n indicate the qubit serial index values for the four pixels.
By imposing translational symmetry on the pattern, we form the combination
$$|P\rangle = \mathcal{N} \sum_k |k, P\rangle \prod_{r \neq (k,l,m,n)} |r, X\rangle \qquad (6.20)$$
where $\mathcal{N}$ is the normalization constant and the product over the other qubits of the lattice is of the indifferent type:
$$|X\rangle = \tfrac{1}{2}\,[\,|1\rangle|1\rangle + |1\rangle|0\rangle + |0\rangle|1\rangle + |0\rangle|0\rangle\,] \qquad (6.21)$$
By imposing the further constraint of orientation invariance, the state vector is symmetrized with appropriate interchanges of k, l, m and n.

Hence, if the pattern recognition device uses the operator
$$O(P) = |1\rangle_D \langle P| \qquad (6.22)$$
then its operation on the test object will give the output state $|1\rangle_D$. A macroscopic detector can be coupled to this to detect the pattern.

If the pattern is rigid, with well-defined intensity levels, each corresponding to a unique set of qubits, the operator O will also be well-defined. However, if we want to categorize the objects into larger classes, e.g. into a range of levels, then the detection operator O must consist of linear superpositions of all the allowed states in that given range. This will give a nonzero scalar product for any object in that class.
It might be practically impossible to obtain a detector of the type $|1\rangle_D\langle P|$, and one might have to use algorithms similar to Grover's to search for the given pattern.

The robustness of the data becomes a moot point in that respect. In the last chapter, when we performed [58] simulation experiments on a small cNOT-gated network, it was seen that even after many thousands of interactions among the qubits, which produced new entanglements among states, the dominant states were still reminiscent of the original data.
6.8 Learning Quantum Patterns
Classical neural networks show Per Bak-type learning [104] and are fairly effective. We can train a network to recognize combinations of inputs as pre-assigned patterns. This is achieved by using an input, an output and an intermediate layer of neurons, and then by increasing the weights on neuron paths from the set of inputs to the output, while decreasing the weights on failed paths. All possible paths are connected with equal weights in a regular network; only a subset of paths is carefully chosen in a small-world network [99, 100], whereas the connectivity is random in a random network. If the number of input neurons is $n_i$, the number of output neurons is $n_o$ and the number of intermediate neurons is $n_m$, then to obtain a reliable recognition ability after a finite number of trials one needs
$$n_m \geq n_i \cdot n_o \qquad (6.23)$$
If qubits are used instead of classical neurons, and the connecting paths are replaced by quantum gates, a similar structure may be constructed. We show one possible form of a quantum analog of the Per Bak [104] training machine in Fig. 6.1; a sketch of the corresponding update rule is given below. The contributions of the qubits to the next stage are assumed to be proportional to the coefficient of the |1⟩ part of the qubit, e.g. c₁ of Eqn. 5.1. The gates are used to rotate the qubit contributions: the gates on the paths rotate the input from the previous stage incrementally towards the |1⟩ part when a reward is promised, whereas punishment for failures promotes rotation of the signals in the paths towards |0⟩. This arrangement should be able to train a quantum network in a manner similar to classical training. However, the larger capacity of the qubits may make the number of qubits needed smaller than in the classical case. We may also need to modify the relation among the nodes (Eqn. 8.47).
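A minimal sketch of such a reward/punishment training loop (entirely our illustration: the layer sizes, the clipped OR-gate summation, and the uniform update of all paths are simplifying assumptions; a faithful Bak-type rule would depress only the paths actually used in a failed trial):

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, n_m, n_o = 4, 8, 2        # input, intermediate, output qubits
dtheta = 0.05                  # incremental rotation per trial

# One rotation gate per path; a path passes on cos(theta), the |1>
# amplitude of its qubit (theta = 0 means rotated fully towards |1>).
theta_im = rng.uniform(0.0, np.pi / 2, (n_i, n_m))   # input -> middle
theta_mo = rng.uniform(0.0, np.pi / 2, (n_m, n_o))   # middle -> output

def forward(x):
    """Propagate |1>-amplitudes through the two gate layers; the OR-gate
    summation at each qubit is modeled crudely by clipping to [0, 1]."""
    mid = np.clip(x @ np.cos(theta_im), 0.0, 1.0)
    return np.clip(mid @ np.cos(theta_mo), 0.0, 1.0)

def reinforce(success):
    """Rotate the path gates towards |1> on reward, towards |0> on
    punishment, by a small increment dtheta."""
    global theta_im, theta_mo
    step = -dtheta if success else dtheta
    theta_im = np.clip(theta_im + step, 0.0, np.pi / 2)
    theta_mo = np.clip(theta_mo + step, 0.0, np.pi / 2)
```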
Figure 6.1: Quantum network for Bak-type training for pattern recognition. The intermediate and final qubits are shown integrated with OR gates to sum the contributions of the qubits connected behind them. The circular gates are rotation gates. The curved feedback information paths that control the gates' rotations are shown only for two gates, for clarity.

In a different context, quantum learning was explored [98] by utilizing in the quantum domain the nonlinear switching action found in classical learning. There, a sigmoid-type curve, as found in biological systems, was reproduced by co-operative quantum devices.
Chapter 7

Generalization of Entropy

7.1 Introduction
As we have seen above, neural networks are not necessarily deterministic. The input and the output are not always uniquely related. The loss of information may be inherent in the design of the system, or it may result from interactions with an environment that introduces random noise, making the input-output mapping probabilistic. This makes the concept of entropy relevant to a neural network system. However, the cross-talk may be random only in part, and there may be systemic biases. Hence, the Boltzmann-Gibbs type of statistical mechanics, based on randomly interacting systems, may not be the most relevant for dealing with the states of such ensembles.
In a system with a large number of constituents, entropy measures randomness: it is maximal when the system can be in the maximum possible number of states with equal probability, and minimal when the system is frozen into one state with no uncertainty. Although all forms of entropy share these two end-point definitions, the functional forms may vary in between [105]. Keeping energy or some other conserved quantity constant, the different entropies probe different possibilities for the probability distributions within the system. When the total entropy of a system can be expressed as the sum of the entropies of its subsystems, it is extensive; the most common example of such an entropy is Shannon's entropy. Renyi entropy [17] is also extensive, but is different from Shannon's.

Recently, Tsallis entropy [18, 106, 107] has attracted a lot of attention, not only for its conceptual and theoretical novelty, but also because it can be shown in specific physical cases [108, 109, 110, 111, 112, 113] to be the relevant form where interactions among the subunits of a system give rise to nonextensivity. Shannon entropy can be retrieved from Tsallis entropy when proper limits are taken, indicating the consistency of the concept.
In this chapter, we introduce the concept of entropy from yet another viewpoint. This too will bear resemblance to the Shannon form in the limit. In the first part of this chapter we derive the rationale for this entropy, and then we compare it with some known forms of entropy. In the end we find the probability distribution for this entropy, referred to from now on as the s-entropy. As we shall see, this new definition of entropy is closely related to rescaling the phase space.
The normalization of the probability distribution function of a system depends on its free energy and controls the macroscopic properties of the ensemble. Hence, our first step here is to find a method to obtain the free energy as a function of temperature, and then apply it to a simple physical system. We shall do this for Tsallis entropy and also for our newer form of entropy. Then we shall find the specific heat of the system and study how it changes as the parameters are tuned.
7.2 Entropy of a Neural Network
A completely deterministic network would have a one-to-one mapping between input states and output states. So we have, say, $N_i$ different inputs and an equal number of outputs. The output may be in the form of a periodicity (frequency) or a final static pattern in the network that is measurable. One can also have a continuous version of the one-to-one mapping in a functional form:
$$x_{in} \rightarrow y_{out} = F(x_{in}) \qquad (7.1)$$
with $F$ and $F^{-1}$ single-valued functions in the domain of interest. So we have the probabilities $p_{io} = 1$ for the corresponding input and output states, and $p_{if'} = 0,\ f' \neq o$, for non-matching pairs. Hence, the conventional Shannon entropy for each given input state $i$ is
$$S_i = -\sum_f p_{if} \log p_{if} = 0 \qquad (7.2)$$
Even if we have a probabilistic distribution of input signals, the weighted average over the $S_i$ would be zero, as it is zero for each $i$. An interesting point to note for a dynamical system is that the output state $o$ may be a function of time, yet deterministic if the time of measurement is known. If the time is unknown, or the changes are too fast for the measuring system to capture the state at a known time, then we shall have a probabilistic outcome and a nonzero entropy. In general our definition in Eqn. 7.2 may then be used with the appropriate pdf.

If the mapping by the network is many-to-one, the network performs pattern recognition, or a generalization, or classification. With dynamical evolution of a deterministic kind, we shall have clearly marked attractor basins formed by sets of input states $\{i\}$ from which the system converges to static final states $o(i)$, or changes continuously in time in a predictable fashion with $o = o(i, t)$, making the entropy zero again: $S_{\{i\}} = -\sum_{\{i\}o} p_{\{i\}o} \log p_{\{i\}o} = 0$.
However, if the design of the network is inadequate, e.g. if the intermediate processing neurons are insufficient in number, then irrespective of the number of training cycles the outcome will always have a degree of uncertainty [114], and we shall have a non-δ pdf, leading to a nonzero entropy:
$$S = -\sum_{if} p(f|i)\,p(i)\,\log[\,p(f|i)\,p(i)\,] \qquad (7.3)$$
with each $(if)$ pair forming a single pair-index, and the $p(i)$ normalized to unity. If the final state is indicated by a frequency ω for a dynamical network, we can rewrite this expression for the entropy, replacing $f$ by ω and integrating over the range of frequencies corresponding to a given input state $i$:
$$S_i = -\int d\omega\, p_i(\omega) \log p_i(\omega)$$
$$S = -\sum_i \int d\omega\, p(\omega|i)\, p_i\, \log[\,p(\omega|i)\, p_i\,] \qquad (7.4)$$
with appropriate weighting and normalization over the initial states.
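As a concrete illustration (our own sketch, with made-up numbers), the discrete entropy of Eqn. 7.3 for a small stochastic input-output mapping:

```python
import numpy as np

p_i = np.array([0.5, 0.5])              # prior over two input states
p_f_given_i = np.array([[0.9, 0.1],     # p(f|i): rows = inputs,
                        [0.2, 0.8]])    # columns = outputs

joint = p_f_given_i * p_i[:, None]      # p(i, f) = p(f|i) p(i)
S = -np.sum(joint * np.log(joint))      # Eqn. 7.3 (all entries nonzero here)
print(S)   # nonzero: the input-output mapping is not deterministic
```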
However, both in biological and in AI neural networks, the pattern of interference or the inadequacy of design may produce effects different from those of a perfectly random ensemble of modules, opening up the possibility of using nonextensive entropies, where the sum of the entropies of the subunits may differ from the entropy of the combined system because of the non-random nature of the interactions.
7.3 Defining the New Entropy
We formulate the problem using definitions from information theory, as used in Shannon's coding theorem. Let us consider a register of only one letter, and let $p_i$ be the set of probabilities for each of the N letters $A_i$ that can occupy this position. Though we are using the language of information theory here, it is trivially extensible to the states $i$ of a single member of an ensemble where the individual systems can be in any of N states with probabilities $p_i$.

We now consider a small deformation of the (single-cell) register to a new size, so that it can accommodate $q = 1 + \Delta q$ letters. The new probability that the whole phase space is occupied by the letter previously associated with $p_i$ is now $p_i^q$, by the corresponding AND operation. Hence, the probability that the new deformed cell is occupied by any one of the pure letters $A_i$ is
$$N(q) = \sum_i p_i^q \qquad (7.5)$$
For $q > 1$ this is less than the original total probability of unity for $q = 1$. The shortfall, which we denote by
$$M(q) = 1 - \sum_i p_i^q \qquad (7.6)$$
represents the total probability that the mixed cell contains a mixture of $A_i$ and some other $A_j$ fractionally, since the total probability that the cell is occupied by one or more (fractional included) letters must sum to one. Hence, the mixing probability $M(q)$ is introduced by the disorder created by increasing the cell scale from unity to $1 + \Delta q$.
The fractional values of cell numbers can be understood in the same spirit as the fractal (Hausdorff) dimensions of dynamical attractors [115, 116] and of complex systems. Studies of diffusion [117, 118, 119, 120] and percolation have been carried out for fluids in complex systems with effectively fractional dimensions, where the special geometric constraints translate into a change in the dimension of the corresponding space to an apparently non-intuitive fractional dimension. In the coding theory of optimal transmission of information we come across Huffman coding [16], where the optimal alphabet size may formally be a fraction, though it is changed to the nearest higher integer for practical purposes. We may therefore consider a fractional size of the register or, equivalently, an integral number of cells in the register with fractional-sized cells, to accommodate a given amount of information in probabilistic optimization. Replacing deterministic parameterization with probabilistic optimization becomes inevitable when classical Shannon information theory [16] is carried into quantum computing contexts, and, hence, our use of fractional cell sizes may be a classical precursor of the departure from stringent Shannon-type concepts.

Another variety of parameter-dependent entropy and probability distribution has also been studied in [121, 122].
The entropy for an alphabet of size $m$ can be defined from the information content of the register by
$$m^{S(q)\Delta q} = m^{\,M(q+\Delta q) - M(q)} \qquad (7.7)$$
Hence, the entropy indicates an effective change in the mixing probability due to an infinitesimal change in the cell size of the register. This gives
$$S(q) = dM(q)/dq \qquad (7.8)$$
or, equivalently,
$$S(q) = -\sum_i p_i^q \log p_i \qquad (7.9)$$
This is the mathematical definition of our form of entropy. Another equivalent form is given later in Eqn. 7.26.
This expression is analogous to, but different from, the Tsallis form of entropy, which is defined by the expression
$$S_T(q) = -\sum_i p_i\,(1 - p_i^{q-1})/(1 - q) \qquad (7.10)$$
which has an apparent singularity at $q = 1$, the Shannon limit. If entropy is expressed as the expectation value of the (generalized or ordinary) logarithm, the difference between the Tsallis expression and ours becomes clearer. If we use the generalized q-logarithm defined by
$$\mathrm{Log}_q(p_i) = (1 - p_i^{q-1})/(1 - q) \qquad (7.11)$$
Tsallis entropy can be defined more simply by
$$S_T(q) = -\langle \mathrm{Log}_q(p) \rangle \qquad (7.12)$$
Another form of entropy, similar in appearance to ours, was presented by Aczel and Daroczy (A-D) [123] and [124] (a summary of many different forms, including the A-D form, can be found in [125]):
$$S_{AD} = -\sum_i p_i^q \log p_i \Big/ \sum_i p_i^q \qquad (7.13)$$
This has an extra denominator term, so that the weights for $\log p_i$ are normalized.

Wang [126] has also defined yet another form, apparently similar to our entropy, but from a different physical viewpoint, using the condition
$$\sum_i p_i^q = 1 \qquad (7.14)$$
Defined in terms of the simple probability distribution, the expectation value is
$$\langle O \rangle = \sum_i p_i\,O_i \qquad (7.15)$$
We define the expectation value with respect to the deformed probability corresponding to the extended cell in our case, while keeping the usual logarithm:
$$S_s(q) = -\langle \log(p) \rangle_q \qquad (7.16)$$
with
$$\langle O \rangle_q = \sum_i p_i^q\,O_i \qquad (7.17)$$
Because of the denominator sum, the simplicity of the relation between the weights and $\log p_i$ is lost in the Aczel-Daroczy form. In the Wang form, the probabilities are simply redefined as
$$\tilde{p}_i = p_i^q \qquad (7.18)$$
giving
$$S_W = -(1/q)\,\langle \log \tilde{p} \rangle \qquad (7.19)$$
which is a re-scaled version of the usual Shannon form. The Wang form, thus, is extensive, unlike our form, where the deformed probabilities do not add up to unity, allowing for information leakage.

The function $\mathrm{Log}_q$ approaches the normal logarithm in the limit $q \rightarrow 1$, and, hence, Tsallis entropy then coincides with Shannon entropy; likewise, as $p_i^q \rightarrow p_i$, we too recover the normal Shannon entropy.
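A short numerical comparison of these forms (our own sketch) makes the common q → 1 limit explicit:

```python
import numpy as np

def shannon(p):
    return -np.sum(p * np.log(p))

def s_new(p, q):                       # Eqn. 7.9, the s-entropy
    return -np.sum(p**q * np.log(p))

def tsallis(p, q):                     # standard nonextensive form
    return (1.0 - np.sum(p**q)) / (q - 1.0)

def renyi(p, q):                       # Eqn. 7.20
    return np.log(np.sum(p**q)) / (1.0 - q)

p = np.array([0.5, 0.3, 0.2])
for q in (1.0001, 1.1, 1.3):
    print(q, s_new(p, q), tsallis(p, q), renyi(p, q), shannon(p))
# All three generalized forms approach the Shannon value as q -> 1.
```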
The Renyi entropy is defined by
$$S_R(q) = \log\Big(\sum_i p_i^q\Big)\Big/(1 - q) \qquad (7.20)$$
This, like Shannon entropy, is also extensive, i.e. simply additive for two subsystems, for any value of $q$. One needs [127] a slightly different formulation of the extensivity axiom to get Shannon entropy uniquely:
$$S_{1+2} = S_1 + \sum_i p_{1i}\,S_2(i), \qquad (7.21)$$
where $S_2(i)$ is the entropy of subsystem 2 given that subsystem 1 is in state $i$.
7.4 Applications of the New Entropy
As was stated briefly at the beginning of this chapter, entropy is intuitively associated with randomness, because it is a measure of the loss of information about a system, or of the indeterminacy of its exact state. This in turn depends on the probability distribution over the various states. As mentioned before, maximal uncertainty in state space is indicated by a uniform probability distribution function (pdf) over all states, whereas a Dirac/Kronecker delta pdf (for continuous or discrete states), with no uncertainty, has zero entropy. The Boltzmann form can be found from combinatorics:
$$S = k \log\Big[N! \Big/ \prod_i n_i!\Big] \qquad (7.22)$$
because the $n_i$ are simply $N p_i$ in equilibrium. Here $N$ is the total number of subsystems, and $n_i$ is the number of subsystems in the $i$-th state. One gets the Shannon form, given below, in terms of the $p_i$ themselves. Maximizing the entropy with the constraints $\sum p_i = 1$ and $\sum p_i E_i = U$ (with $E_i$ the energy of the $i$-th state, and $U$ the total energy, which is fixed) gives the exponential probability distribution
$$p_i = C \exp(-\beta E_i) \qquad (7.23)$$
The Lagrange multiplier constant β here can be identified as the inverse of the temperature.
We consider Shannon's coding theorem [16], where the letters $A_i$ of the alphabet used in a code have the probabilities $p_i$. It can be shown fairly easily that, for a stream of random letters coming out of the source with the given probabilities, in the long run the entropy is related to the probability of a given sequence by
$$P(\text{sequence}) = \prod_i p_i^{\,N p_i} = \exp[-N S] \qquad (7.24)$$
where $S$ is the entropy per unit and $N$ is the (large) number of letters in the sequence.
Now we see how our new entropy can be used in physical situations. As was stated earlier, the effective size of clusters $q_i$ defined for the $i$-th state in our new entropy may in general be a fraction, and if the interaction is weak, the average cluster size $q_i$ is just over unity. In liquid clusters, the typical subsystem in state $i$ may be an assembly of $r_i$ molecules, but this may change, due to environmental factors such as the pH value, to $s_i$, producing a rescaling value of $q_i = s_i/r_i$, greater or less than 1. In general we allow the $q_i$ parameter in our entropy to be different for each $i$. Since $p_i$ represents the probability of a single occurrence of the $i$-th state, i.e. for a cluster of size unity (which may consist of a typical number of subunits), the probability for the formation of a cluster of size $q_i$ is $p(q_i) = (p_i)^{q_i}$. We now consider the vector
$$v_i = p(q_i) = (p_i)^{q_i} \qquad (7.25)$$
This is n-dimensional, where n is the number of single-unit states available to the system components.
We now study the "phase space" defined by the $q_i$ coordinates. As we have said above, the deviations of these parameters from unity give the effective (possibly fractional, when an average is taken) cluster sizes in each of the states. A value smaller than unity would indicate a degeneration of the micro-system to a smaller one in a hierarchical fashion, partially so if it is a fraction. In other words, we consider clusters forming superclusters or being composed of subclusters, with a corresponding change of scale in terms of the most basic relevant unit. Elsewhere, we have discussed the interesting question of an oligo-parametric hierarchical structure of complex systems [128]. However, here we restrict ourselves to cluster hierarchy changes that do not qualitatively change the description of the system.

Hence, the divergence of the vector $v_i$ in the $q_i$ space gives a measure of the escape of systems from a given configuration of correlated clustering. Inversely, the negative of the divergence gives the net influx of systems into an infinitesimal cell with cluster sizes $q_i$. We have unfragmented and also non-clustered, i.e. uncorrelated, units at that hierarchy level if all the $q_i$ are unity.
It can be argued, first from the point of view of statistical mechanics, that this negative divergence, or influx of probability, may be interpreted as entropy:
$$S = -\sum_i \frac{\partial p_i^{q_i}}{\partial q_i} = -\sum_i p_i^{q_i} \log(p_i) \qquad (7.26)$$
The free energy is defined by
$$A = U - TS \qquad (7.27)$$
where $A$ is the free energy, $U$ is the internal energy, and $S$ is the entropy. $T$ is the temperature, a measure of the average random thermal energy per unit, where we choose the Boltzmann constant $k$ to be unity. Hence, $TS$ is a measure of the random influx of energy into the system due to the breaking/making of correlated clusters by random interactions in a large system. The subtracted quantity $A$ can thus be related to the free energy, or the "useful" energy. The usual thermodynamic phase-space factors involved in dealing with a macroscopic system are dropped as common factors in what follows.

We arrive at the same expression in terms of the Shannon coding theorem as well. Since, with $q_i$ average clustering, the $i$-th state occurs with probability $p_i^{q_i}$, a stream of emitted (clustered) units will correspond to the probability
$$\exp[-S] = \prod_i p_i^{\,p_i^{q_i}} \qquad (7.28)$$
which again gives us Eqn. 7.26.
7.5 Probability Distribution for the New Entropy
By maximizing the entropy with the constraints
$$\sum_i p_i - 1 = 0 \qquad (7.29)$$
and
$$\sum_i p_i E_i - U = 0 \qquad (7.30)$$
we can obtain the $p_i$ in terms of the energies of the states, or possibly of other criteria, in the usual way.
The constrained function
$$L = S + \beta\Big(\sum_i p_i E_i - U\Big) + \alpha\Big(\sum_i p_i - 1\Big) \qquad (7.31)$$
is optimized with respect to the $p_i$, giving
$$-\frac{q\,p_i^{-(q-1)}}{q-1}\,(\log p_i - 1) + 1 + \gamma\,p_i^{-(q-1)} = 0 \qquad (7.32)$$
where we have written for brevity $\gamma = \alpha + \beta E_i$. The simpler form
$$-c \log y + d\,y = 1 \qquad (7.33)$$
is used to relate the terms containing $p_i$ to the Lambert function. This converts into
$$-\frac{d}{c}\,y\,e^{-(d y/c)} = -\frac{d}{c}\,e^{1/c} \qquad (7.34)$$
which gives
$$y = -\frac{c}{d}\,W\!\Big(\!-\frac{d}{c}\,e^{1/c}\Big) \qquad (7.35)$$
and with our variables we get for $p_i$
$$p_i = \left[\frac{-q\,W(z)}{(\alpha + \beta E_i)(q-1)}\right]^{1/(1-q)} \qquad (7.36)$$
where
$$z = -e^{(q-1)/q}\,(\alpha + \beta E_i)(q-1)/q \qquad (7.37)$$
and $W(z)$ is the Lambert function, defined by [129]
$$z = W(z)\,e^{W(z)} \qquad (7.38)$$
The parameters α and β arise from the Lagrange multipliers for the two constraints and are related to the overall normalization and to the relative scale of energy, i.e. to temperature (1/(kT)), as in the Shannon case, where we get the Gibbs expression for $p_i$. In the Tsallis case this gives $p_i$ the well-known value
$$p_i = [\alpha + \beta(q-1)E_i]^{1/(1-q)} \qquad (7.39)$$
which is easily seen to reduce to the Shannon form for $q \rightarrow 1$. It can be shown (after some algebra) that our form, Eqn. 7.36, also reduces to the Shannon form for $q \rightarrow 1$.
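A small numerical sketch (ours; scipy.special.lambertw supplies the principal branch W₀ used in Eqns. 7.36-7.37) of the new pdf and its q → 1 behaviour:

```python
import numpy as np
from scipy.special import lambertw

def p_new(E, alpha, beta, q):
    """pdf of the new entropy, Eqns. 7.36-7.37 (principal branch W0)."""
    gamma = alpha + beta * E
    z = -np.exp((q - 1.0) / q) * gamma * (q - 1.0) / q
    W = lambertw(z, k=0).real
    return (-q * W / (gamma * (q - 1.0)))**(1.0 / (1.0 - q))

E = np.linspace(0.0, 4.0, 5)
p = p_new(E, alpha=1.0, beta=1.0, q=1.001)
# Near q = 1 the shape approaches the Gibbs exponential: the ratio
# p(E)/p(0) tends to exp(-beta*E), alpha only setting the normalization.
print(p / p[0])
print(np.exp(-1.0 * E))
```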
The nonextensivity property of Tsallis entropy is evident on expanding
$$S^T_{1+2} = -\sum_{ij} p_i p_j\,\big(1 - p_i^{q-1} p_j^{q-1}\big)\big/(1 - q) \qquad (7.40)$$
$$= S^T_1 + S^T_2 + (1 - q)\,S^T_1 S^T_2 \qquad (7.41)$$
However, Renyi entropy obeys the simple additive relation
$$S^R_{1+2} = S^R_1 + S^R_2 \qquad (7.42)$$
For the new entropy, we have
$$S^s_{1+2} = S^s_1 + S^s_2 - M_2(q)\,S^s_1 - M_1(q)\,S^s_2 \qquad (7.43)$$
where the $M_a$ are the mixing probabilities of states for subsystem $a$, as defined in Eqn. 7.6.
7.6 Probability, Lambert Function Properties and Constraints
The transcendental equation defining the Lambert function has an infinite number of Riemann sheets separated by cuts. These are related to the cut of the log function from −∞ to 0. The different branch values for the same $z$ are marked by a subscript $n$, with $n = 0$ corresponding to the principal value; $W_0(z)$ is real along the real $z$ axis from $-1/e$ to ∞ (Fig. 7.1).
Figure 7.1: $W_0(z)$ is real along the real axis from $-1/e$ to ∞; the value of $W_0$ ranges from −1 to ∞. We do not show the $W_{-1}(z)$ branch, which is real from $z = -1/e$ to $z = 0$, because it is not suitable for our entropy, as explained in the text.

Another branch, conventionally labeled $W_{-1}(z)$, also gives real values for real $z$ in the domain $-1/e < z < 0$, going down from −1 at $z = -1/e$ to −∞ at $z = 0$.
Four different regimes can be identified for the parameters:

(a) $\alpha + \beta E_i > 0$ and $q > 1$:

A real positive $p_i$ requires, from Eqn. 7.36 and Eqn. 7.37 (with $\epsilon \equiv q - 1$ and $\gamma_i \equiv \alpha + \beta E_i$), that $W(z_i) < 0$ and, hence,
$$-1/e < z_i < 0 \qquad (7.44)$$
Another constraint is that the $p_i$ are less than unity. This gives a cut-off value of $E_i$, given by
$$|W(z_i)|^{q/\epsilon} \geq \alpha + \beta E_i \qquad (7.45)$$
where $z_i$ again depends on $E_i$ as given in Eqn. 7.37, so that this is a transcendental equation.

(b) $\alpha + \beta E_i < 0$ and $q > 1$:

The reality of $p_i$ demands that $W(z_i) > 0$. Hence, initially $z_i$ need only be positive. But since the exponent $1/(1-q)$ is negative, the condition $p_i < 1$ gives as a lower cutoff of $z_i$ the value $\tilde{z}_i$, given by
$$W(\tilde{z}_i) = |\gamma_i|\,\epsilon/q \qquad (7.46)$$
This means that the negative $E_i$ have a highest possible value given by Eqn. 7.46, which is again a transcendental equation in $E_i$.
(c) $\alpha + \beta E_i > 0$ and $q < 1$:

The same reality constraint on $p_i$ implies in this case that $W(z_i) > 0$, so we have $z_i > 0$. Similarly, the condition $p_i < 1$ gives the cutoff $\tilde{z}_i$, defined by
$$W(\tilde{z}_i) = |\epsilon|\,\gamma_i/q \qquad (7.47)$$
So $E_i$ has a maximum value given by this transcendental constraint.

(d) $\alpha + \beta E_i < 0$ and $q < 1$:

Here $-1/e < z_i < 0$ initially, due to the reality of $p_i$, and the constraint $p_i < 1$ gives, as in the previous cases, a cutoff $\tilde{z}_i$, defined by
$$|W(\tilde{z}_i)| = |\epsilon|\,|\gamma_i|/q \qquad (7.48)$$
and one can solve for the cutoff $E_i$ from the other parameters numerically in specific problems with given sets of parameters.

The last branch of the Lambert function, $W_{-1}(z)$, is also real and negative for $-1/e < z < 0$, with values from −1 to −∞, but is not acceptable, as it does not give the limit $W_{-1}(z) \rightarrow 0$ as $z \rightarrow 0$, which is required to recover the Shannon limit for $q \rightarrow 1$.
As the probability function is of the exponential Boltzmann form for Shannon entropy, it is defined for any arbitrary value of the energy, because an exponential has no finite root. It can be seen that for finitely nonzero $q - 1$ the spectrum of energy states $E_i$ may be constrained when our entropy is used. The same holds for Tsallis entropy, where the functional dependence also has finite roots and power behavior.
7.7 Numerical Comparison

The variation of the probability function with E for different q values is shown in Fig. 7.2.
Figure 7.2: Comparison of the pdf for the new entropy for values of q = 1, 1.1, 1.2 and 1.3. The solid line is for q = 1, i.e. the Gibbs exponential distribution, and the other lines are in the order of q.
We note that the new pdf has a smaller curvature than the Shannon form. It drops increasingly rapidly at high energy values for higher q, and is quite different in shape and in magnitude from the Gibbs exponential distribution. A variation of even 10% from the standard value of q = 1 causes a discernible change in the pdf and should be easily observable in experimental contexts. The shape is almost linear at q = 1.3.
The comparison of the Tsallis pdf and the pdf for the new entropy is shown in Fig. 7.3 and Fig. 7.4 for the same values of q: 1.1 in the former and 1.3 in the latter. It can be noticed that for larger q values the new entropy yields much stiffer probability functions, which depart substantially from the Tsallis pdf's.

The pdf's for both the Tsallis form and our new form of entropy can be seen to hit the axis at finite values of the energy, making the support of the probability finite, unlike the exponential Shannon form, as we have discussed in the previous section.
7.8 Free Energy
We first assume that the entropy S is a sum of contributions from all the states:
$$S = \sum_i \phi(p_i) \qquad (7.49)$$
where φ is a generalized function, which will in general be different from the Shannon form.

Figure 7.3: Comparison of pdf's for Tsallis nonextensive entropy (solid line) and the new entropy presented here, for q = 1.1.

Figure 7.4: The same as Fig. 7.3, but for a higher q = 1.3.
By optimizing L (Eqn. 7.31), we get
$$\phi'(p_i) = \alpha + \beta E_i \qquad (7.50)$$
with the simple solution
$$\phi(p_i) = (\alpha + \beta E_i)\,p_i \qquad (7.51)$$
The constant of integration vanishes, because there can be no contribution to the entropy from a state that has zero probability.

We assume that the Helmholtz free energy A is defined by
$$\beta(U - A) = S \qquad (7.52)$$
If S is nonextensive and U is extensive, this makes A nonextensive as well.

Using Eqn. 7.49, Eqn. 7.51 and Eqn. 7.52, we get
$$A = -\alpha/\beta \qquad (7.53)$$
Hence, we get the relation for the pdf
$$p_i = \psi^{-1}(\beta(E_i - A)) \qquad (7.54)$$
with the definition
ψ(p) = φ(p)/p
(7.55)
Here, the assumption was made that the function ψ can be inverted.
This, however, may not always be the case for an arbitrary expression for
the entropy, at least not in a manageable closed form. A can be obtained
from the constraint equation
X
i
pi =
X
ψ −1 (β(Ei − A)) = 1
(7.56)
i
After A has been determined, it may be placed in Eqn. 7.54 to obtain the pdf p_i for each of the states, all properly normalized; one can then find U and its derivative, the specific heat C. For the simple system that we shall discuss later, pressure and volume, or their analogues, will not enter our considerations, and, hence, we have only one specific heat, C_v, with β now defined as the inverse scale of energy, the temperature T:

C = −β² ∂U/∂β    (7.57)
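This recipe is straightforward to carry out numerically. The sketch below is our own illustration, not part of the formal development: it solves the normalization constraint for A and reads off the pdf in the Shannon case, where ψ(p) = −log p inverts in closed form, so the result can be checked against the familiar A = −log(Q)/β of the next section. The four-level spectrum is an arbitrary assumption.

```python
# Sketch: solve Eqn. 7.56 for A, then read off the pdf via Eqn. 7.54.
# Shannon case: phi(p) = -p log p, so psi(p) = -log p and
# psi^{-1}(y) = exp(-y).  The spectrum below is an illustrative choice.
import numpy as np
from scipy.optimize import brentq

E = np.array([0.0, 1.0, 2.0, 3.0])   # toy energy spectrum (assumption)
beta = 1.0

def psi_inv(y):
    return np.exp(-y)                # Shannon inverse of psi

def norm_defect(A):
    return psi_inv(beta * (E - A)).sum() - 1.0   # Eqn. 7.56

A = brentq(norm_defect, -50.0, 50.0)
p = psi_inv(beta * (E - A))          # Eqn. 7.54, normalized by construction
Q = np.exp(-beta * E).sum()
print(A, -np.log(Q) / beta)          # agreement with A = -log(Q)/beta
print(p, p.sum())
```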
7.9 Shannon and Tsallis Thermodynamics
Using Eqn. 7.54 and Eqn. 7.55, the Shannon entropy immediately gives

p_i = e^{−α−βE_i} = e^{β(A−E_i)}    (7.58)

so that Eqn. 7.56 gives the familiar expression for A,

A = −log(Q)/β    (7.59)

where Q is the partition function

Q = Σ_i e^{−βE_i}    (7.60)
The separation of the A-dependent factor is allowed by the exponential form of p_i in Eqn. 7.58 and, hence, we get such a simple expression for A in the case of Shannon entropy, giving us normal extensive thermodynamics.
In the Tsallis case a common A-dependent factor can no longer be separated out, and we cannot find an expression for A in terms of the partition function in the usual way. Instead we need to solve the normalization equation Eqn. 7.56. This will give an infinite number of roots for a general value of ε ≡ q − 1, but for values of ε corresponding to reciprocals of integers, we shall have polynomial equations with a finite number of roots, which too may be complex in general. Later, we shall see, at least for the simple example considered at the end of this chapter, that a real and stable root can be found that approaches the Shannon value of A as ε → 0. This is because in that limit the log_q(p) function in the definition of the Tsallis entropy also coincides with the natural logarithm.
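As an illustration of this root finding, the sketch below (our own, under illustrative assumptions) brackets the physical root for a two-level system, with the Tsallis pdf read off from the form appearing later in Eqn. 7.65, and watches it approach the Shannon value of A as ε → 0.

```python
# Sketch: the Tsallis normalization for a two-level system (E_i = +-1),
# with the pdf taken as p_i = (1 + eps*beta*(E_i - A))**(-1/eps)
# (cf. Eqn. 7.65 below).  The bracket keeps the argument positive.
import numpy as np
from scipy.optimize import brentq

E = np.array([-1.0, 1.0])
beta = 1.0

def tsallis_p(A, eps):
    return (1.0 + eps * beta * (E - A)) ** (-1.0 / eps)

def solve_A(eps):
    hi = E.min() + 1.0 / (eps * beta) - 1e-9    # validity bound
    return brentq(lambda A: tsallis_p(A, eps).sum() - 1.0, -20.0, hi)

for eps in (0.25, 0.10, 0.01):
    print(eps, solve_A(eps))
print("Shannon:", -np.log(2.0 * np.cosh(beta)) / beta)   # eps -> 0 limit
```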
7.10 Thermodynamics of the New Entropy
We have to obtain A by solving the transcendental Eqn. 7.56 numerically in a manner similar to the Tsallis case. For the specific heat, we get

C = β² Σ_i E_i (E_i − A) e^{−W_i(1+1/ε)} / (1 + W_i)    (7.61)
For brevity, we have written W_i = W_0(εβ(E_i − A)) and have used the following identities related to the Lambert function ([129]):

W'(z) = W(z) / [z(1 + W(z))]    (7.62)

W(z)/z = e^{−W(z)}    (7.63)
As W(z) ∼ z for small z, it can be noticed that for small ε we effectively get the classical pdf and thermodynamics, as in the Tsallis case. Hence, the parameter ε is again a measure of the deviation from standard statistical mechanics due to nonextensivity. However, our nonextensivity differs functionally from the Tsallis form, and the values of ε in the two forms can only be compared in the limit of low β. The power series expansion of W(z) gives

W(z) = Σ_{n=1}^∞ (−n)^{n−1} z^n / n!    (7.64)
A comparison with the power series expansion of log(1 + z), writing the Tsallis p_i in a form similar to that of the new entropy,

p_i = e^{−log(1+ε_T β(E_i −A))/ε_T}    (7.65)

shows a cancelation between the parameter ε_n of the new entropy and the Tsallis parameter ε_T in the first order, so that both distributions approach the Shannon pdf, as we have already mentioned; but if second order equality is demanded, we get

ε_n = (1/2) ε_T    (7.66)

As the third order difference between W(z) and log(1 + z) is only z³/24, the difference between the Tsallis form and our form of entropy will be detectable only at rather low T, i.e. high β.
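A quick numerical check of this matching (our own illustration, with arbitrarily chosen parameters and evaluated at fixed A = 0 purely to compare the functional shapes, as in Figs. 7.3 and 7.4):

```python
# Sketch: the new-entropy pdf exp(-W(eps_n*y)/eps_n) against the Tsallis
# form (1 + eps_T*y)**(-1/eps_T) at the matched coupling eps_n = eps_T/2
# of Eqn. 7.66, with y = beta*(E - A).
import numpy as np
from scipy.special import lambertw

eps_T = 0.2
eps_n = eps_T / 2.0
y = np.linspace(0.0, 4.0, 9)

p_shannon = np.exp(-y)
p_tsallis = (1.0 + eps_T * y) ** (-1.0 / eps_T)
p_new = np.exp(-lambertw(eps_n * y).real / eps_n)

for row in zip(y, p_shannon, p_tsallis, p_new):
    print("%4.1f  %8.5f  %8.5f  %8.5f" % row)
# all three agree to first order in y; Tsallis and the new form agree
# to second order and separate only through the third-order z^3/24 term
```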
7.11 Application to a Simple System

We now consider the simplest nontrivial system, where only two energy eigenvalues ±E exist, as in a spin-1/2 system. The standard results ([130]) corresponding to Shannon entropy can be expected for a non-interacting system:

A = −log[2 cosh(βE)]/β    (7.67)

S = log[2 cosh(βE)] − βE tanh(βE)    (7.68)

U = −E tanh(βE)    (7.69)

C = (βE)² / cosh²(βE)    (7.70)
We shall take ε = 0.25 and 0.10 and then solve numerically for the Tsallis entropy. The values are shown in Figs. 7.5 - 7.8. We notice that Tsallis entropy gives very similar shapes for all the variables, and for ε = 0.10 we get a fit much nearer to the Shannon form than for ε = 0.25.
The typical Schottky form for a two-level system can be noticed in the specific heat.

Figure 7.5: A for Shannon (top), Tsallis with ε = 0.1 (middle) and Tsallis with ε = 0.25 (bottom).

Figure 7.6: S for the same three entropy forms (from bottom to top: Shannon, Tsallis (0.1), Tsallis (0.25)).
After finding A, we can replace p_− by 1 − p_+ for faster execution of the numerics, though the variables involve both p_+ and p_−.

Figure 7.7: U for the same three entropy forms (same order as for S).

Figure 7.8: C for the same three entropy forms (peaks bottom to top: Shannon, Tsallis (0.1), Tsallis (0.25)).
For our new entropy too, we first determine the numerical value of A from the normalization condition

e^{−W_+/ε} + e^{−W_−/ε} = 1    (7.71)

and then use this value to find U, S, and C. In Figs. 7.9 - 7.12 the values corresponding to ε_n = 0.05 can be seen, which is half the Tsallis parameter ε_T = 0.10 used, together with the Shannon values. We observe that only for the S curve do we have a perceptible difference between Tsallis entropy and our new entropy, at values of β near 1.
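A sketch of this two-level computation (our own illustration; the bracketing interval for the root is an assumption that holds for these parameter values):

```python
# Sketch: solve Eqn. 7.71 for A at a given temperature, then obtain U
# directly and C by numerical differentiation (Eqn. 7.57).
import numpy as np
from scipy.special import lambertw
from scipy.optimize import brentq

E, eps = 1.0, 0.05          # levels +-E, new-entropy parameter eps_n

def p_level(Ei, A, beta):
    W = lambertw(eps * beta * (Ei - A)).real
    return np.exp(-W / eps)

def solve_A(beta):
    f = lambda A: p_level(E, A, beta) + p_level(-E, A, beta) - 1.0
    return brentq(f, -8.0, -1e-6)

def U_of(beta):
    A = solve_A(beta)
    return E * (p_level(E, A, beta) - p_level(-E, A, beta))

beta, h = 1.0, 1e-4
C = -beta**2 * (U_of(beta + h) - U_of(beta - h)) / (2.0 * h)
print(solve_A(beta), U_of(beta), C)
```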
Figure 7.9: Comparison of A for Shannon (apart) with Tsallis for ε = 0.1 and the new entropy for ε = 0.05 (superposed).

Figure 7.10: Comparison of S for the same three forms of entropy (Shannon, Tsallis (0.1), new entropy (0.05)); the Tsallis curve lies just over the new entropy.

Figure 7.11: Comparison of U for the same three entropies. Shannon is separated; the other two overlap.
7.12 Summary of the Mathematical Properties of the New Entropy
From the discussions above, it is apparent that as the definition of entropy departs from the simple classical form, a class of mathematical functions of increased complexity arises in the probability distribution functions.

Figure 7.12: Specific heat C for the same three entropies. Again, only Shannon is separated.

The evolution can
be seen from the simple exponential of the Boltzmann form, to the generalized
q-logarithm of Tsallis, to our Lambert function, and they all reduce to the simple form in a limit, as is the case for most physical laws in various physical situations. It is still an open problem whether a hierarchical set structure can be given to such functions, with corresponding hierarchical limits of physical parameters.
Previously, the Lambert function has been used in various physical contexts ([112, 129]), such as Wien's displacement law, two-dimensional capacitor fields, enzyme kinetics, models of combustion, and the range and endurance of airplanes. It has also been used in the combinatorial problem of counting unrooted trees. Since combinatorics is close to the definition of entropy, the presence of the Lambert function in our entropy is not unnatural.
The form of the entropy may be taken as an indicator of the effective interaction among the constituent systems of the ensemble, the Shannon form being the limiting case of zero interaction, and Tsallis or our form being the results of different forms of interaction, with ε signifying a coupling constant. It is noteworthy that most thermodynamic functions we have considered here are not crucially dependent on the form of the entropy with adjusted coupling. However, the value of the entropy itself may vary significantly in the different formulations. This may act as an indicator to discriminate between the suitability of different definitions of entropy in different contexts.
Chapter 8

Generalized Entropy and Entanglement
In the entropy proposed in the last chapter, the definition is particularly simple, as it can be expressed as the divergence of a vector representing the modified probabilities for the different possible states, taking into account a rescaling as a consequence of correlations or clustering due to interactions between the microsystems.

Quantum entanglement of states is also a relevant issue in microscopic systems. The problem of quantum entanglement of two states in the picture of Tsallis type nonextensive entropy has been studied before [131, 132, 133]. The generalization of Shannon entropy to the very similar von Neumann entropy, using density operators in place of probability distributions [16], reveals common features of the stochastic and the quantum forms of uncertainties, and this treatment can be extended to Tsallis' form too.
Our purpose here is to present a combined study of stochasticity and quantum entanglement, so that the former emerges from the quantum picture in a natural way; we then intend to show that our new approach to defining entropy also allows us to obtain a measure of mutual information that involves stochasticity and entanglement together in a clear, comprehensible way. The fact that our new definition of entropy, which is conceptually very simple, also gives the probability distribution function in a closed form in terms of Lambert W functions [59] allows one to carry out many calculations with the same ease as for Tsallis entropy. In this work, however, the probability distribution will not be needed for explicit use.
In previous chapters, we discussed the dynamics of entangled quantum networks. However, we did not consider decoherence effects that would be inevitable when practically designing such quantum devices. Here, we study the effect of entangled states collapsing to form mixed states, and examine the phenomenon in terms of mutual information using different entropies.

First, we clarify a few definitions and mathematical identities that we will be using later.
8.0.1 Entangled and Pure States
A state vector is entangled when it cannot be expressed as the factorizable product of vectors in the Hilbert spaces of the two subsystems in the
combined Hilbert space H of two particles (or subsystems in HA and HB ).
Entanglement, hence, is actually a property related to projection in the subspaces. Thus, it cannot be expected to be measurable by properties in the
bigger space alone.
If we have the state

|ψ⟩ = Σ_{ij} C_ij |i⟩_A |j⟩_B    (8.1)
the density matrix for the product space is defined as

ρ_AB = C_ij C*_kl |ij⟩_AB ⟨kl|_AB    (8.2)
The partial density matrix for H_A can be found by taking the trace over the H_B part,

ρ_A = C C†    (8.3)
where the C's are now the coefficient matrices. An entangled state (for an explicit example) of two qubits (a "qubit", or quantum bit, being a quantum superposition of two possible states) may be expressed by the reduced 2×2 matrix from the {|0⟩|0⟩, |1⟩|1⟩} basis sub-set in terms of density matrices:

ρ = ( c²    γcs
      γcs   s²  )    (8.4)
with γ = 1 for the pure quantum (entangled) state

|ψ⟩ = c|0⟩|0⟩ + s|1⟩|1⟩    (8.5)
Here we have used the compact notation c = cos(θ), and s = sin(θ). This
entanglement occurs in the subspace of the product Hilbert space involving
only the two basis vectors |00i and |11i. We can obtain other entangled
combinations simply by relabeling the basis vectors. So, we shall use this as
the prototype.
We have an impure state with a classical stochastic component in the probability distribution for |γ| < 1, although we still have probability conservation, because Tr(ρ) = 1, which remains unchanged under any unitary transformation. Factorizability ("purity" [135]) can be measured by finding ζ of a quantum state, or its quantum non-entanglement, which remains invariant under changes of γ:

ζ = Tr_A[(Tr_B ρ_AB)²] = c⁴ + s⁴    (8.6)

Hence, we see that the maximum entanglement corresponds to ζ = 1/2 when θ = π/4, and the minimal entanglement corresponds to ζ = 1 (pure factorizable states) when θ = 0, π/2.
Classical stochasticity is represented by quantum impurity; it attains its maximum value when γ = 0 and is nonexistent when γ = 1, which corresponds to a pure entangled state.

It can be noted that ζ does not involve the stochasticity-related parameter γ at all, but remains the quantifier of the quantum entanglement.
Another equivalent and interesting way of quantifying entanglement may be the parameter

E_AB = 2(Tr[ρ_AB] − Tr[ρ_A] Tr[ρ_B]) = sin²(2θ)    (8.7)

which is more symmetric in the two subspaces and resembles a correlation function. This has a value of 0 for no entanglement when θ = 0, π/2, and the maximal entanglement value of 1 for θ = π/4, as desired. This definition of entanglement follows the idea of mutual information, though we have not used the entropy at this stage, but only the probabilities directly. It does not involve the stochasticity in terms of the purity parameter γ. We have used, in the relation above,

ρ_A = Tr_B[ρ_AB]    (8.8)
and similarly for ρ_B. In our specific case, for A or for B,

ρ_{A,B} = ( c²   0
            0    s² )    (8.9)

with Tr[ρ_{A,B}] = 1 ensured.
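These statements are easy to verify numerically. The sketch below (our own illustration) embeds the 2×2 block of Eqn. 8.4 in the full two-qubit space, performs the partial trace, and evaluates ζ; the result is independent of γ, as claimed.

```python
# Sketch: purity/non-entanglement zeta of Eqn. 8.6 from an explicit
# partial trace over B, in the basis |00>, |01>, |10>, |11>.
import numpy as np

def rho_AB(theta, gamma):
    c, s = np.cos(theta), np.sin(theta)
    rho = np.zeros((4, 4))
    rho[0, 0], rho[3, 3] = c**2, s**2         # |00><00|, |11><11|
    rho[0, 3] = rho[3, 0] = gamma * c * s     # gamma*c*s off-diagonals
    return rho

def partial_trace_B(rho):
    r = rho.reshape(2, 2, 2, 2)               # indices (A, B, A', B')
    return np.trace(r, axis1=1, axis2=3)

for theta in (0.0, np.pi / 8, np.pi / 4):
    for gamma in (1.0, 0.3):
        rho_A = partial_trace_B(rho_AB(theta, gamma))
        zeta = np.trace(rho_A @ rho_A)        # Eqn. 8.6: c^4 + s^4
        print(round(theta, 3), gamma, round(zeta, 4))
# zeta runs from 1 (factorizable, theta = 0) to 1/2 (maximal, theta = pi/4)
# and never depends on gamma
```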
8.1 Stochasticity from Entanglement with Environment
If the entangled state |Ψ_AB⟩ is coupled quantum mechanically to the environment state |Ψ_E⟩, then taking the trace over the environment states can give a measure of impurity. Let

|Ψ_ABE⟩ = Σ_{ijk} c_ijk |i⟩_A |j⟩_B |k⟩_E    (8.10)
The density matrix for the pure quantum system, thus, is

ρ_ABE = Σ_{ijk,lmn} c_ijk c*_lmn |ijk⟩⟨lmn|    (8.11)
and as we trace over the environment, we get

ρ_AB = Σ_{ij,lm} Σ_k c_ijk c*_lmk |ij⟩⟨lm|    (8.12)
For the entangled mixture of |00⟩ and |11⟩ in H_AB and the couplings

c_000 = c c'
c_001 = c s'
c_110 = s s'
c_111 = s c'    (8.13)

with c' = cos(θ') and s' = sin(θ'), the trace over the H_E states yields the density matrix

ρ_AB = ( c²        2csc's'
         2csc's'   s²      )    (8.14)
Hence, we introduce classical stochasticity by taking the trace over the environment space, with

γ = sin(2θ')    (8.15)
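A small numerical check of Eqns. 8.14 and 8.15 (our own illustration, using the couplings of Eqn. 8.13):

```python
# Sketch: build the pure three-party state, trace out the environment,
# and read gamma off the |00><11| element of rho_AB.
import numpy as np

def gamma_from_trace(theta, theta_p):
    c, s = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(theta_p), np.sin(theta_p)
    psi = np.zeros((2, 2, 2))                 # indices (A, B, E)
    psi[0, 0, 0], psi[0, 0, 1] = c * cp, c * sp   # c000, c001 of Eqn. 8.13
    psi[1, 1, 0], psi[1, 1, 1] = s * sp, s * cp   # c110, c111
    rho_AB = np.einsum('ijk,lmk->ijlm', psi, psi) # trace over E
    return rho_AB[0, 0, 1, 1] / (c * s)           # = 2 c's' = sin(2 theta')

print(gamma_from_trace(np.pi / 6, np.pi / 8))
print(np.sin(2 * np.pi / 8))                      # the same value
```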
8.2 Entanglement, Entropy and Mutual Information
8.2.1 Single System Interacting with Environment
We consider a single system A interacting with the environment E. The entanglement between the measured system and the environment is contained in the product space H_A ⊗ H_E, and hence the density operator for the combined system-environment space is as given in Eqn. 8.4, where γ = 1 indicates a pure entangled state; the environment-traced density is given by Eqn. 8.8, and ρ_A and ρ_E are equal. Mutual information may be construed as the entanglement, and defined by

E_AE = 2(Tr_AE[ρ_AE] − (Tr_A[ρ_A])²) = sin²(2θ')    (8.16)
with θ' the angle of entanglement as defined before. Hence, the coupling of the system to the environment is reflected by measurements on the system A, and the mutual information is contained in the parameters of the system itself. The calculations are done in terms of the von Neumann entropy, which is simply the quantum density matrix form of the Shannon entropy (with mixing in an orthogonal quantum basis, it becomes similar to Shannon entropy) [136],

S = −Tr[ρ log(ρ)]    (8.17)
From the Araki-Lieb relation [103], we know that

S_AE ≥ |S_A − S_E|    (8.18)

With S_AE = 0 in a pure quantum state, we must have S_A = S_E, and hence

I_AE = S_A + S_E − S_AE = 2S_A = −2Tr_A[ρ_A log(ρ_A)]    (8.19)

which also confirms the view that the system itself contains the mutual information in its parameters in such a case, as we found above.
If our new form of entropy [137] is used, with the hypothesis that the mutual information is still given by the same form but with the parameter q not equal to 1 (q = 1 being the Shannon case), then we get

I_AE = −2Tr_A[ρ_A^q log(ρ_A)] = −2c'^{2q} log(c'^{2q}) − 2s'^{2q} log(s'^{2q})    (8.20)

8.2.2 Entangled Systems Interacting with the Environment
We have already shown ρ_AB in Eqn. 8.14, with the 3-system entanglement shown in Eqn. 8.10 and the relatively simple choice of couplings in Eqn. 8.13. We may find the 3-system mutual information with a similar construction of ρ_AE, ρ_BE and ρ_ABE, defining the 3-system mutual information as

I_ABE(q) = −S_ABE(q) + S_AB(q) + S_BE(q) + S_AE(q) − S_A(q) − S_B(q) − S_E(q)    (8.21)
with S_ABE(q) = 0 for any q, for a single 3-system pure state.

We trace over the B space to get ρ_AE, which, using as basis |00⟩, |01⟩, |10⟩ and |11⟩ in the |AE⟩ product space, yields

ρ_AE = ( c²c'²    c²c's'   0        0
         c²c's'   c²s'²    0        0
         0        0        s²s'²    s²c's'
         0        0        s²c's'   s²c'²  )    (8.22)

and an identical matrix for ρ_BE.
Using the relevant eigenvalues, we finally get

I_ABE(q) = c'^{2q} log(c'^{2q}) + s'^{2q} log(s'^{2q}) − λ_+^q log(λ_+^q) − λ_−^q log(λ_−^q)    (8.23)

where the eigenvalues λ_+ and λ_− are for the ρ_AB matrix obtained after tracing over the E-space,

λ_{+,−} = (1/2)(1 ± √[1 − 4(1 − γ²)c²s²])    (8.24)

with γ given by Eqn. 8.15.
If we had started with a stochastic picture of an entangled impure A-B system, where 1 − γ represents the stochasticity, the mutual information would be

I_AB(q) = −S_AB(q) + S_A(q) + S_B(q) = λ_+^q log(λ_+^q) + λ_−^q log(λ_−^q) − 2c^{2q} log(c^{2q}) − 2s^{2q} log(s^{2q})    (8.25)
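The mutual information is simple to evaluate from these closed forms. The sketch below (our own illustration) takes the entropy of a spectrum {x_i} in the form S(q) = −Σ_i x_i^q log(x_i^q) appearing in Eqn. 8.25, which reduces to the von Neumann/Shannon value at q = 1:

```python
# Sketch: I_AB(q) of Eqn. 8.25 from the eigenvalues of Eqn. 8.24.
import numpy as np

def S(eigs, q):
    eigs = eigs[eigs > 1e-12]                 # 0 log 0 -> 0
    xq = eigs**q
    return -(xq * np.log(xq)).sum()

def I_AB(theta, theta_p, q):
    c2, s2 = np.cos(theta)**2, np.sin(theta)**2
    gamma = np.sin(2.0 * theta_p)             # Eqn. 8.15
    disc = np.sqrt(1.0 - 4.0 * (1.0 - gamma**2) * c2 * s2)
    lam = np.array([(1 + disc) / 2, (1 - disc) / 2])   # Eqn. 8.24
    rA = np.array([c2, s2])                   # spectrum of rho_A = rho_B
    return 2.0 * S(rA, q) - S(lam, q)         # Eqn. 8.25

print(I_AB(np.pi / 4, np.pi / 4, 1.0))        # Shannon MI, cf. Fig. 8.1
print(I_AB(np.pi / 4, np.pi / 4, 0.7))        # new entropy, cf. Fig. 8.4
```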
We first show the mutual information (MI) calculated according to the Shannon form of the entropy in Fig. 8.1. This is equivalent to our form at q = 1, as a function of the A−B entanglement angle θ and the entanglement angle θ' of the (A−B) system with the environment, which is related to the stochasticity γ as explained above. We note that the mutual information depends smoothly on the angle of entanglement with the environment θ'. Traditional entropy thus seems fairly insensitive to the details of the coupling with the environment when the mutual information between two systems is measured.
Figure 8.1: I_AB as a function of the entanglement angle θ in A-B space and the entanglement angle θ' with the environment, which is related to the stochasticity.
We show the deviations ∆I_AB of our MI from the Shannon MI as functions of q and the entanglement angles θ' and θ, respectively, in Fig. 8.2 and Fig. 8.3, keeping the other angle at π/4 in each case. There is symmetry around π/4. The variation is fairly smooth for fixed θ', i.e. fixed entanglement with the environment. However, if we keep the entanglement between A and B fixed near π/4, then the mutual information using our form of entropy changes sharply with θ' near the symmetry value θ' = π/4. It can be seen that this comes from one of the eigenvalues of the density matrix ρ_AB approaching zero for this mixing value; with q ≠ 1, there is then either a sharp peak or a dip compared to the Shannon entropy case, which has fixed q = 1.
It can be noted here that recently, in a study of the entropy of a chain of spins in a magnetic field [138], it has been found that both the usual von Neumann and Renyi forms of entropy yield nonzero and surprisingly simple closed expressions. Though this work does not mention entanglement explicitly, the correlation functions presented there, which determine the density matrix and, therefore, its diagonalized form needed for the entropy calculation, are actually manifestations of the entanglement among the spins and between the spins and the magnetic field. The chain has been split into two parts, similar to our A and B subsystems, and the external magnetic field acts like the environment we have introduced in this work. Though they carry out their extensive calculations at zero temperature, unlike our finite temperature treatment, the fact that they obtain nonzero S_A for the first L spins is apparently due to the segmentation of the pure state of the fully entangled quantum system and the consideration of part A only for the entropy calculation. This is effectively equivalent to summing over the states of part B and the entanglement with the environment, and produces entropy due to the corresponding loss of information about the state of the whole system. Hence, their results for this explicit model are consistent with our general result that classical stochasticity and entropy may be a reflection of the segmented consideration of bigger complete systems. The values of the entropy of different types, such as the canonical Shannon form or generalized forms such as the Renyi form, which goes to the Shannon form in the usual limit, like that of the related parameter we have mentioned for Tsallis entropy and for our new form of entropy in this work, reflect the extent of entanglement or interaction or, equivalently, correlation. In their work a length scale comes out of this segmentation, which appears to be similar to the angle of entanglement in our case. We do not get a phase transition as they do, because we have considered a simplified general finite system of only two or three component subsystems, not an infinite chain, and finite systems cannot show any phase transitions.
It is shown in Fig. 8.2 that little variation takes place with changing θ' for almost any q. Pronounced changes at small q (q < 1) are shown in Fig. 8.3 for different θ.

Figure 8.2: Difference of the MI from our entropy with that from Shannon entropy, at θ = π/4.

Figure 8.3: Same as Fig. 8.2, but with θ' = π/4.

In Fig. 8.4 and Fig. 8.5 the difference between our MI and the Shannon MI is shown as a function of θ and θ' simultaneously, keeping q fixed at 0.8 and at 1.2. At q = 1, we get no difference, as our entropy then coincides with the Shannon form. Again we notice that the mixing angle between A and B shows fairly smooth variation, but θ', or equivalently the stochasticity, causes a pronounced peak (for q < 1) or dip (for q > 1). We can conclude that our method of entropy calculation can indicate a greater role of the entanglement with the environment when this mixing is nearly equal for the A−B entangled states.

Figure 8.4: MI difference between our entropy form and Shannon for q = 0.7.

Figure 8.5: Same as Fig. 8.4 but for q = 1.3.
Figure 8.6: Difference between the MI from our entropy and Tsallis's, with θ = π/4.
We have previously [137] compared our form with results from Tsallis entropy in formulating a general thermodynamics, in view of the prevalent familiarity with the Tsallis form of nonextensive entropy. We showed that despite the conceptual and functional differences between Tsallis entropy and our new form, the results are very similar if we take the Tsallis q to be twice as far from unity (the Shannon equivalent value) as our value of q. In Fig. 8.6 and Fig. 8.7 the difference of our MI from that derived from Tsallis's entropy is shown. It can be observed again that the differences in both θ and θ' are relatively more significant for q values different from 1, for both angles near π/4, with peaks and dips similar to those in the comparison with the mutual information calculated with Shannon entropy.

Figure 8.7: Same as Fig. 8.6 but with θ' = π/4.
8.3 Quantum Pattern Recognition and Mutual Information
In previous chapters, we discussed how quantum networks can store memories of a quantum mechanical nature. In the last sections, we discussed quantum mutual information with respect to entanglement with the environment. In this section, we apply the ideas of density matrices and mutual information developed above to a more practical purpose, that of quantum pattern recognition. We first discuss the concept of quantum mutual information in more detail with respect to patterns and detectors, reviewing some of the concepts discussed in [98].
8.4 Entanglement and Mutual Information
When the test object (or its representation) is coupled completely to the pattern recognizing device, in the case of perfect recognition, quantum states must match one to one absolutely. Conversely, if there is no recognition, the states of the two will be uncorrelated. The states would then factorize in the product space of the two state spaces:

|system⟩ = |object⟩ ⊗ |detector⟩    (8.26)
Each state here can be a superposition of eigenstates:

|object⟩ = Σ_i c_i |i⟩_obj,    |detector⟩ = Σ_j d_j |j⟩_det    (8.27)
When there is no factorization, in the case of full entanglement,

|system⟩ = Σ_i c_i |i⟩_obj |i⟩_det    (8.28)
The two basis states belong to independent Hilbert spaces in this case; the same label i is used to show a one-to-one correspondence. When we have partial matching and, therefore, incomplete entanglement, we get

|system⟩ = Σ_{ij} c_ij |i⟩_obj |j⟩_det    (8.29)
For this specific composite system, the density matrix can be written as

ρ_sys = Σ_{ij} c_ij c*_ij |i⟩_obj |j⟩_det ⟨i|_obj ⟨j|_det    (8.30)
The correlation between the object pattern and the detector can be measured by

ζ_1 = Tr_sys[ρ_sys − ρ_obj ρ_det]    (8.31)

In this specific case, we calculate the partial traces

ρ_obj = Tr_det[ρ_sys]    (8.32)

ρ_det = Tr_obj[ρ_sys]    (8.33)
The measure ζ defined before translates to

ζ_2 = Tr_obj[ρ_obj ρ_obj]    (8.34)

for this specific case.
Ref. [103] shows how these two entanglement measures are related by

ζ_1 = 1 − ζ_2    (8.35)
The Shannon entropy can be defined in terms of the density function as

S_S = −Tr[ρ log(ρ)]    (8.36)
Our entangled system, with the state equation

|system⟩ = a|1⟩|1⟩ + b|0⟩|0⟩    (8.37)

has a density matrix of the general form

ρ = ( |a|²   ab*c
      a*bc   |b|² )    (8.38)

We use the normalization

Tr[ρ] = 1 = |a|² + |b|²    (8.39)
As we see, c does not appear in it. c measures the degree of impurity, with possible values ranging from 0 for completely impure states to |c| = 1 for a perfectly pure pair of states. When a pure state is considered, it has a probability of 1 associated with that specific state and a probability of 0 associated with any other state. Eqn. 8.36, then, gives

S_sys = 0    (8.40)
We then get from the Araki-Lieb inequality [103],

S_AB ≥ |S_A − S_B|    (8.41)

the relation

S_obj = S_det    (8.42)
The mutual information between the object and the detector, thus, is

I_obj−det = S_obj + S_det − S_sys = −2Tr[ρ_obj log(ρ_obj)]    (8.43)
The matrix of Eqn. 8.38, representing a two qubit coupling, gives

ζ_2 = |a|⁴ + |b|⁴    (8.44)

The mutual information, in this case, is

I_obj−det = 0    (8.45)

This value occurs because the trace operator does not depend on diagonalization, and when the states are pure (with |c| = 1), ρ can have only one eigenvalue equal to 1, making the others 0.
In most physical contexts, Shannon entropy is the most generally used form, since the Boltzmann probability distribution is verified experimentally in these cases. However, in recent years, other forms of entropy have been suggested, giving different probability distributions for different energies with modified functional forms. Some of these new forms of entropy [17, 18] were discussed before, and another entropic function was proposed by us ([137, 139]). Here we shall simply mention the mathematical form of the new entropy:
S_N = −Tr[dρ^q/dq]    (8.46)
This new definition of entropy gives the mutual information between the object and the detector as

I^N_obj−det = −2Tr[ρ^q log(ρ)]    (8.47)
One can simply take the sum over the eigenvalues by diagonalizing ρ. This diagonalization gives only one eigenvalue equal to 1 and the others equal to zero for a pure quantum state, making the mutual information zero. It is impurity, and not entanglement, that is measured by this mutual information, as expressed by the variable c in Eqn. 8.38, a deviation of |c| from 1 indicating departure from a pure quantum state. A mixture of pure quantum states at finite temperature produces this impurity, and it is only an impure system that can show a distinction between the different forms of entropy.
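The point can be seen in a few lines of numerics (our own illustration of Eqn. 8.46, evaluated via the eigenvalues of ρ, where −d(λ^q)/dq = −λ^q log λ):

```python
# Sketch: the new entropy of the matrix in Eqn. 8.38.  A pure state
# (|c| = 1) has eigenvalues {1, 0} and so S_N = 0 for every q; an impure
# state (|c| < 1) gives values that depend on q.
import numpy as np

def S_new(rho, q):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]                    # 0 log 0 -> 0
    return -(lam**q * np.log(lam)).sum()      # -Tr[d(rho^q)/dq]

a = b = 1.0 / np.sqrt(2.0)
for c in (1.0, 0.5, 0.0):                     # pure -> fully impure
    rho = np.array([[a * a, a * b * c],
                    [a * b * c, b * b]])
    print(c, [round(S_new(rho, q), 4) for q in (0.8, 1.0, 1.2)])
```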
The problem of mutual information due to impurity may become a relevant issue, since a pattern recognition system may contain such impure mixtures of quantum states. The calculations shown above may provide one more test of the appropriateness of different types of entropy in various situations. ζ was defined before in relation to the correlation between the states of the object and those of the detector. However, this coefficient does not give a maximal value for perfect correlation and zero for random association. For example, a value of b = 0 gives ζ_2 = 1 and ζ_1 = 0 in Eqn. 8.44. This would also be the situation in the case of factorized states. This can be explained by observing that as one qubit is taken away from the detector-pattern pair, the idea of pattern recognition becomes meaningless. ζ_1 = 1/2 is reached for a = b = 1/√2, i.e. when both patterns exist with the same frequency and can also be identified perfectly.
Chapter 9

Prior PDFs, Entropy and Broken Symmetry
The justification for choosing Shannon or any other more generalized entropy, such as that of Tsallis or Renyi, or the one we have presented here, lies eventually in the relevance or "good fit" such an entropy would produce for the data in a situation where the presence or lack of interactions among the members, or other considerations, suggest the need for a proper choice. However, data are always finite, and a probability distribution is the limit of relative frequencies with an infinite sample. One therefore faces the problem of estimating the best probability distribution function (PDF) from a finite sample [140]. This PDF may be subject to the constraint of a known entropy, in whatever way defined, as a functional of the PDF.
Mathematically, the problem of determining the best posterior PDF, given a rough prior PDF and data points, is expressed formally by Bayes' theorem. However, the constraint of constant entropy makes the functional integral impossible to handle even for a fairly simple prior, as found by Wolpert and Wolf [141] and by Bialek, Nemenman and Shafee [19]. The integrals involved were first considered in a general context in [141], and the question of priors was addressed in [140, 19]. It was discovered that, though the integral for the posterior was intractable, the moments of the entropy could be calculated with relative ease.

In [19] it has also been shown that for Dirichlet type priors [142]

P({p_i}) = Π_i p_i^β    (9.1)
in particular (which give nice analytic moments with exact integrals, and hence are hard to ignore), the Shannon entropy is fixed by the exponent β of the probabilities chosen for small data samples; hence, not much information is obtained for unusual distributions, such as that of Zipf, i.e. a prior has to be wisely guessed for any meaningful outcome. As a discrete set of bins has no metric, or even a useful topology that can be made use of in Occam-razor type smoothing, other tricks were suggested in that paper to overcome the insensitiveness of the entropy.
We have noted already that the PDF associated with our proposed entropy differs from that of the Shannon entropy by only a power of p_i, but this changes the symmetry of the integrations for the moments for the different terms for different bins. We shall therefore examine in this chapter whether the nature of the moments is sufficiently changed by our entropy to indicate cases where data can pick this entropy in preference to Shannon or other entropies.
9.1 Priors and Moments of Entropy
For completeness, we mention here the formalism developed by Wolpert and Wolf [141]. The uniform PDF is given by

P_unif({p_i}) = (1/Z_unif) δ(1 − Σ_{i=1}^K p_i)

Z_unif = ∫ dp_1 dp_2 ··· dp_K δ(1 − Σ_{i=1}^K p_i)    (9.2)

where the δ function is for the normalization of probabilities and Z_unif is the total volume occupied by all models. The integration domain V is bounded by each p_i in the range [0, 1]. Because of the normalization constraint, any specific p_i chosen from this distribution is not uniformly distributed, and "uniformity" means simply that all distributions that obey the normalization constraint are equally likely a priori.
We can find the probability of the model {p_i} with Bayes' rule as

P({p_i}|{n_i}) = P({n_i}|{p_i}) P_unif({p_i}) / P_unif({n_i})

P({n_i}|{p_i}) = Π_{i=1}^K (p_i)^{n_i}    (9.3)
Generalizing these ideas, we have considered priors with a power-law dependence on the probabilities,

P_β({p_i}) = (1/Z(β)) δ(1 − Σ_{i=1}^K p_i) Π_{i=1}^K p_i^{β−1}    (9.4)
It has been shown [19] that if the p_i's are generated in sequence [i = 1 → K] from the Beta-distribution

P(p_i) = B( p_i / (1 − Σ_{j<i} p_j); β, (K − i)β ),    B(x; a, b) = x^{a−1}(1 − x)^{b−1} / B(a, b)    (9.5)

this gives the probability of the whole sequence {p_i} as P_β({p_i}).
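A quick simulation (our own illustration) confirms that these sequential Beta draws reproduce the symmetric Dirichlet prior, which numpy can also sample directly:

```python
# Sketch: stick-breaking construction of Eqn. 9.5 versus a direct
# Dirichlet sample, compared through the mean Shannon entropy.
import numpy as np

rng = np.random.default_rng(0)

def sample_p(K, b):
    p, rest = np.empty(K), 1.0
    for i in range(K - 1):
        x = rng.beta(b, (K - i - 1) * b)      # B(x; beta, (K-i)beta)
        p[i] = rest * x
        rest -= p[i]
    p[K - 1] = rest
    return p

def shannon(p):
    p = p[p > 1e-12]
    return -(p * np.log(p)).sum()

K, b, n = 10, 0.5, 2000
seq = np.mean([shannon(sample_p(K, b)) for _ in range(n)])
direct = np.mean([shannon(rng.dirichlet(np.full(K, b))) for _ in range(n)])
print(seq, direct)        # agree within sampling error
```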
Random simulations of PDF's with different shapes (a few bins occupied, versus more spread out ones) show that the entropy depends largely on the parameter β of the prior; hence, sparse data has virtually no role in determining the shape of the output distribution. This would seem unsatisfactory, and some adjustments appear to be needed to get any useful information out. We shall not repeat here the methods and results of [19], which considers only Shannon entropy.
9.2 Comparison of Shannon and Our Entropy
In our case, with the entropy function given by Eqn. 7.26, we note that the moment determination involves not only a simple replacement of the individual factors involving n_u ∈ {n_i} [141] by n_u + q − 1 in the product integral, but a complete re-calculation of the moment, using the same techniques given in [141]. Apparently, the maximal value of entropy should correspond to the flattest distribution, i.e.

S_max = K^{(1−q)} log(K)    (9.6)

In the limit of very sparse data, i.e. n_i → 0, we eventually get the expression for the first moment, i.e. the expected entropy,
⟨S_1⟩/⟨S_0⟩ = K [Γ(β + q) Γ(βK)] / [Γ(β) Γ(βK + q)] ∆Φ_0(βK + q, β + q)    (9.7)
where for conciseness we have used the notation of ref. [141],

∆Φ_p(a, b) = Ψ^{(p−1)}(a) − Ψ^{(p−1)}(b)    (9.8)

Ψ^{(n)}(x) being the polygamma function of order n of the argument x. It can be checked easily that this expression reduces to that in ref. [19] when q = 1, i.e. when we use Shannon entropy.
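The expression is easy to evaluate numerically. The sketch below is our own illustration, with the assumption that ∆Φ_0(a, b) is the digamma difference Ψ^{(0)}(a) − Ψ^{(0)}(b), which makes the q = 1 case reduce to the known Shannon result ⟨S⟩ = Ψ^{(0)}(βK + 1) − Ψ^{(0)}(β + 1):

```python
# Sketch: the expected entropy of Eqn. 9.7, normalized by the formal
# maximum K**(1-q)*log(K) of Eqn. 9.6.
import numpy as np
from scipy.special import gammaln, digamma

def mean_S(K, b, q):
    prefac = K * np.exp(gammaln(b + q) - gammaln(b)
                        + gammaln(b * K) - gammaln(b * K + q))
    return prefac * (digamma(b * K + q) - digamma(b + q))

K, b = 1000, 0.1
print(mean_S(K, b, 1.0), digamma(b * K + 1) - digamma(b + 1))  # q = 1 check
for q in (0.5, 1.0, 1.5):
    print(q, mean_S(K, b, q) / (K**(1 - q) * np.log(K)))       # cf. Figs. 9.1-9.3
```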
9.3 Results for Mean Entropy
So, unlike the Shannon case, where q is fixed at unity, we now have a parameter q that may produce differences from the Shannon results. In Figs. 9.1 - 9.3 we show the variation of the ratio ⟨S_1⟩/S_max with variable bin number K. In ref. [19] we commented on how insensitive the Dirichlet prior [142] is when Shannon entropy is considered in the straightforward manner given in ref. [141]. In our generalized form of the entropy, we note that by changing the parameter specific to our form of the entropy, for q > 1, we get a peak for small β and large K values.
This peak allows us to choose uniform Dirichlet priors with an appropriate q value, which would nevertheless lead to asymmetry not possible with Shannon entropy. In other words, instead of through the priors, we can feed the information about the expected asymmetry of the PDF into the entropy, with no need to choose particular bins.

Figure 9.1: Ratio of the expected value (first moment) of the new entropy, plotted against bin number K and prior exponent β, for entropy parameter q = 0.5.

Figure 9.2: Same as Fig. 9.1, but for q = 1.0, i.e. Shannon entropy.

Figure 9.3: As the previous two figures, but for q = 1.5.

The nonextensivity of our entropy, coming possibly from interaction among the units, gives rise to situations where the entropy maxima
do not increase with the number of bins like log(K), but, being K^{(1−q)} log(K), may be extended or squeezed, according to the value of q being less than or greater than unity.
The interesting thing to note is that for q > 1 and large K, at small prior parameter β, the entropy peak exceeds the normally expected expression in Eqn. 9.6 with full K; the expected value of the entropy is thus seen to exceed the formal maximum. The clustering or repulsive effects change the measure of disorder from the Shannon type entropy. So the highest expected value of entropy may correspond not to a uniformly distributed population, but to one in which a smaller subset is populated. This means that for our entropy the most uniform distribution is not the least informative; the p^q weighting distorts the expected maximal entropy toward an uneven distribution. This result is in some ways similar to spontaneous symmetry breaking in field theory, where the variation of a parameter leads to broken-symmetry energy minima.
A neater view of these results can be seen in Figs. 9.4 - 9.6, with K values fixed.

Figure 9.4: Clearer view in a 2-dimensional plot, with K = 10. Red, green and blue lines are for q = 0.5, 1.0 and 1.5 respectively.
We have not obtained the second moment, i.e. the standard deviation or spread of the entropy distribution, because with our entropy and an arbitrary q the expressions cannot be obtained in the simple form of ref. [19]. We can, however, expect that the variation of the higher moments from the Shannon case will be less than for the first moment, because higher derivatives of the Γ functions are smoother. We shall assume the spreads are narrow enough to concentrate on the first moments only.

Figure 9.5: As Fig. 9.4, but for bin number K = 100.

Figure 9.6: As the previous two figures, but for K = 1000.
Apart from the PDF estimates above, this picture of broken symmetry for the maximal entropy when the parameter q > 1 is also manifest directly in an explicit calculation of the entropy using our prescription with a simple three-state system. The symmetric expected maximal entropy in this case should be

S_max = −3p^q log p    (9.9)

with p = 1/3. With two of the probabilities, p_1 and p_2, running free from 0 to 1 under the constraint p_1 + p_2 + p_3 = 1, the entropy is

S = −p_1^q log p_1 − p_2^q log p_2 − (1 − p_1 − p_2)^q log(1 − p_1 − p_2)    (9.10)

and we plot S/S_max in Figs. 9.7 and 9.8. For q = 2.44 we obtain the most interesting behavior, with a local maximum at the point of symmetry p_1 = p_2 = p_3 = 1/3 which is not the global maximum. For q ≤ 1 the symmetry point gives the global maximum.
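The slice shown in Fig. 9.8 can be reproduced with a few lines (our own illustration):

```python
# Sketch: Eqn. 9.10 with q = 2.44 along the slice p1 = 1/3, normalized
# by the symmetric value of Eqn. 9.9.  The ratio is 1 at p2 = 1/3 (a
# local maximum only) and slightly exceeds 1 near the ends of the slice.
import numpy as np

q = 2.44

def S(p1, p2):
    ps = np.array([p1, p2, 1.0 - p1 - p2])
    ps = ps[ps > 1e-12]
    return -(ps**q * np.log(ps)).sum()

S_max = -3.0 * (1.0 / 3.0)**q * np.log(1.0 / 3.0)   # Eqn. 9.9
for p2 in (0.01, 0.10, 1.0 / 3.0, 0.50, 0.65):
    print(p2, round(S(1.0 / 3.0, p2) / S_max, 4))
```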
Figure 9.7: Our entropy for a three-state system, with parameter q = 2.44, as two independent probabilities p_1 and p_2 are varied with the constraint p_1 + p_2 + p_3 = 1. The expected maximum at the symmetry point p_1 = p_2 = p_3 turns out to be a local maximum. The global maxima are not at the end points, with one of the probabilities going up to unity and the others vanishing, which gives zero entropy as expected, but occur near such end points, as shown clearly in the next figure.
Figure 9.8: Two-dimensional version of the previous Fig. 9.7, with p_1 = 1/3 fixed, so that only p_2 varies. This shows a clearer picture of the local maximum at the symmetry point and the global maxima near the end points.
Chapter 10

Illation and Outlook
We have presented in this dissertation generalizations and extensions of an important class of artificial neural networks, the integrate-and-fire type, taking them up to a fully quantized model with all nodes completely entangled with one another, through intermediate steps of a quasiclassical model and a quantum-gated but not fully entangled model. In this study our primary concern has been the periodic behavior of these networks, because periodicity is not only of vital concern in biological neural systems, but may also be important in the design of artificial neural networks. We have found that periodic behavior may be present in quantized versions, where it may decay away or be retained indefinitely. Even where the periodic behavior dies out, as in the completely entangled version, we have found that the input data may be recovered by a method of back-projection. We therefore envisage the possibility of quantum devices with both short-term memories (where some decoherence comes in) and long-term ones (the fully entangled case), with interesting and useful technological applications.
At finite temperature, noise and stochasticity come into play, along with the concept of entropy. In a neural network, where units interact in a well-defined manner, there may be a competition between order and disorder, i.e. a partial loss of information due to the finite temperature, due to noise, or due to intrinsic defects or incompleteness of design. However, the interference leading to stochasticity may have an element of bias in such systems due to non-random interactions. Hence, in neural networks and all such systems with specific interactions, the concept of entropy may need to be generalized. We have suggested a generalized form of entropy that is based on information-theoretic considerations. We have argued how the usual definition of entropy may be generalized to take into account possible deformations of the volumes of information registers, which may come about from interactions, leading to mutual information, both at the classical and quantum levels.
In a stochastic situation, actual probability distribution functions are constructed from a finite set of data, which cannot contain the full information about the PDF. One has to use Bayesian techniques to estimate such PDF's using prior estimates and the data. It was found before that the functional behaviors of Shannon entropy and the popular Dirichlet type priors conspire to make the direct application of the formalism very insensitive to the variation of parameters, requiring the adoption of more complicated procedures. In this work we have shown how our form of entropy may give the necessary leeway for finding good fits. This feature too makes our entropy more useful in the right context, where interactions effectively add more information, removing some of the uncertainty of flat priors. One interesting outcome of this investigation was the result that the highest entropy, as defined by us, may not correspond to the most uniform distribution, as the added information may break the symmetry in a natural way, leading to clusters or, equivalently, to the deformation of phase cells for the storage of information.
As the ultimate objective of information processing systems, including neural networks, and of information retaining systems usually involves pattern matching, we have looked into the possibility of applying biological type (allosteric) co-operative recognition units to enhance the probability of recognition in quantum systems. This type of approach may indeed be very relevant for small coherent quantum devices, because biological allosterism, which produces the more efficient sigmoid switching curves in enzymes, seems to be in some ways analogous to quantum entangled systems and nonlocality.
In any case, as in our study of H-H type neural networks, we believe that the transition from classical to quantum devices for greater efficiency, in time and storage space, will probably require an interplay of classical and quantum concepts.
Bibliography
[1] B.E. Swartz, Timeline of the history of EEG and associated fields. Electroencephalography and clinical Neurophysiology 106, 173-176 (1998).
[2] W.L. Miller and K.A. Sigvardt, Spectral analysis of oscillatory neural
circuits. Journal of Neuroscience Methods 80, 113–128 (1998).
[3] I. Sugihara, E.J. Lang and R. Llinás, Uniform olivocerebellar conduction time underlies Purkinje cell complex spike synchronicity in the rat cerebellum. J. Physiol. Lond. 470, 243–271 (1990).
[4] C.A. Del Negro, C.G. Wilson, R.J. Butera, H. Rigatto, S. Henrique and
C. Jeffrey, Periodicity, mixed-mode oscillations, and quasiperiodicity in
a rhythm-generating neural network. Biophys J. 82, 206-214 (2002).
[5] R. Refinetti, Circadian Physiology (CRC Press, 2005).
[6] M.R. Mehta, Role of rhythms in facilitating short-term memory. Neuron 6, 147–56 (2005).
[7] R.W. Kensler, Mammalian cardiac muscle thick filaments: Their periodicity and interactions with actin. Biophys. J. 82 , 1497–1508 (2002).
[8] C. Kaernbach and H. Schulze, Auditory sensory memory for random
waveforms in the Mongolian gerbil. Neuroscience Letters 329, 37–40
(2002).
[9] H. Onimaru, A. Arata and I. Homma, Neuronal mechanisms of respiratory rhythm generation: an approach using in vitro preparation. Jpn
J Physiol. 47, 385–403 (1997).
[10] W. Hoppe, W. Lohmann, H. Markl and H. Ziegler (ed.), Biophysics
(Springer, NY, 1983).
[11] J.J. Hopfield, Proc. Natl. Acad. Sci. USA 79, 2554 (1982).
[12] J.J. Hopfield and A.V.M. Herz, Proc. Natl. Acad. Sci. USA 92, 6655
(1995).
[13] O. Watanabe (Ed.), Kolmogorov Complexity and Computational Complexity (Springer, New York, 1992).
[14] A. L. Hodgkin and A. F. Huxley, Quantitative Description of Membrane Current and its Application to Conduction and Excitation in
Nerve. J. Physiol. 117, 500 (1952).
[15] C. Koch, Computation and the single neuron. Nature 385, 207 (1997).
[16] M. Nielsen and I. Chuang, Quantum computation and quantum information (Cambridge Univ. Press, NY, USA, 2000).
[17] A. Renyi, Probability Theory (Amsterdam: North-Holland, 1970).
[18] C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J.
Stat.Phys. 52, 479–487 (1988).
[19] I. Nemenman, F. Shafee and W. Bialek, Entropy and Inference, Revisited, in Adv. Neur. Info. Processing 14, eds. T.G. Dietterich, S. Becker
and Z. Ghahramani (MIT Press, Cambridge, 2002) pp. 471–478.
[20] P.W. Anderson, Mat. Res. Bull. 5, 549 (1970).
[21] D.L. Stein(ed.), Spin Glasses and Biology (World Scientific, Singapore, 1992).
[22] D.J. Amit, H. Gutfreund and H. Sompolinsky, Phys. Rev. Lett. 55,
1530 (1985).
[23] B. Derida, E. Gardner and A. Zippelius, Europhys. Lett. 4, 167
(1987).
[24] H. Gutfreund and M. Mezard, Phys. Rev. Lett. 61, 235 (1988).
[25] H. Sompolinsky and I. Kantner, Phy. Rev. Lett. 57, 2861 (1986).
[26] D. Horn, Physica A 200, 594 (1993).
[27] F. Zertuche, R. Lopes-Pena and H. Waelbroech, J. Phys. A 27, 5879
(1994).
[28] T.L.H. Watkin and D. Sherrington, J. Phys. A 24, 5427 (1991).
[29] T. Aonishi, Phase Transitions of an Oscillator Neural Network with a
Standard Hebb Learning Rule. Preprint cond-mat/9808121 (1998).
[30] D.A. McCormick, B.W. Connors, J.W. Lighthall and D.A. Prince,
Comparative electrophysiology of pyramidal and sparsely spiny stellate
neurons of the neocortex. J Neurophysiol. 54, 782-806 (1985).
[31] Z. Xiang, J.R. Huguenard, D.A. Prince, GABAA receptor-mediated
currents in interneurons and pyramidal cells of rat visual cortex. J.
Physiol. 1:506, 715–30 (1998).
[32] A.M. Thomson, D.C. West, J. Hahn, J. Deuchars, Single axon IPSPs
elicited in pyramidal cells by three classes of interneurones in slices of
rat neocortex. J. Physiol. 1:496 ( Pt 1), 81–102 (1996).
[33] J. Karbowski, N. Kopell, Multispikes and synchronization in a large
neural network with temporal delays. Neural Comput. 12(7), 1573–606
(2000).
[34] W. Gerstner, Rapid Phase Locking in Systems of Pulse-Coupled Oscillators with Delays. Phys. Rev. Lett. 76, 1755–1758 (1996).
[35] D. Golomb, G.B. Ermentrout, Continuous and lurching traveling
pulses in neuronal networks with delay and spatially decaying connectivity. Proc. Natl. Acad. Sci. USA. 9:96, 13480–5 (1999).
[36] S.M. Crook, G.B. Ermentrout, M.C. Vanier, J.M. Bower, The role
of axonal delay in the synchronization of networks of coupled cortical
oscillators. J. Comput. Neurosci. 4(2), 161–72 (1997).
[37] C. Monroe, Quantum Information Processing with Atoms and Photons.
Nature 416, 238–246 (2002).
[38] M. Riebe et al., Deterministic quantum teleportation with atoms. Nature 429, 734–737 (2004).
[39] M.D. Barrett et al., Nature, 429, 737 (2004).
[40] J.I. Cirac and P. Zoeller, Quantum computations with cold trapped
ions. Phys. Rev. Lett. 74, 4091–4094 (1995).
[41] D.Vion et al., Science 296, 886 (2002).
[42] T. Yamamoto, Yu. A. Pashkin, O. Astafiev, Y. Nakamura, J. S. Tsai,
Demonstration of conditional gate operation using superconducting
charge qubits. Nature, 425, 941–944 (2003).
[43] N. Gisin, G. Ribordy, W. Tittel, and H. Zbinden, Quantum cryptography. Rev.Mod. Phys. 74, 145–195 (2002).
[44] J.H. Plantenberg, P.C. de Groot, C.J.P.M. Harmans, J.E. Mooij,
Demonstration of controlled-NOT quantum gates on a pair of superconducting quantum bits. Nature 447, 836–839 (2007).
[45] N. Margolus and L. Levitin, The maximum speed of dynamical evolution. Physica D 120, 188–195 (1998).
[46] L.B. Levitin, T. Toffoli and Z. Walton, Operation time of quantum
gates, in Quantum Communication, Measurement, and Computing, J.
H. Schapiro and O. Hirota, ed. (Rinton, 2003) pp. 457–459. quant-ph/0210076 (2002).
[47] M. Zak and C.P. Williams, Quantum neural nets. Int. J. Theo. Phys.
37, 651–684 (1998).
[48] F. Shafee, A spin glass model of human logic systems, to appear
in Proc. Euor. Conf. Compl. Syst. 2005, arxiv.org:physics/0509065
(2005).
[49] B.E. Baaquie, Quantum Finance, (Cambridge Univ. Press, Cambridge,
UK, 2004).
[50] R. Penrose, The Emperor’s New Mind (Oxford Univ. Press, Oxford,
1989).
[51] R. Penrose, Shadows of the Mind (Oxford Univ. Press, Oxford, 1994).
[52] M. Tegmark, The Importance of Quantum Decoherence in Brain Processes. Phys. Rev. E 61, 4194–4206 (2000).
[53] S. Hagan, S. Hameroff, J. Tuszynski, Quantum computation in brain
microtubules? Decoherence and biological feasibility. Phys. Rev. E 65,
061901 (2002).
[54] J. R. Petta, A. C. Johnson, J. M. Taylor, E. A. Laird, A. Yacoby, M.
D. Lukin, C. M. Marcus, M. P. Hanson, and A. C. Gossard, Coherent
manipulation of coupled electron spins in semiconductor quantum dots.
Science 309, 2180-2184 (2005).
[55] J. M. Taylor, J. R. Petta, A. C. Johnson, A. Yacoby, C. M. Marcus,
M. D. Lukin, Relaxation, dephasing, and quantum control of electron
spins in double quantum dots. Phys. Rev. B 76, 035315 (2007).
[56] F. Shafee, Stochastic dynamics of networks with quasiclassical excitations. Stochastics and Dynamics 7, 403–416 (2007).
[57] F. Shafee, Neural networks with quantum gated nodes. Engineering
Applications of Artificial Intelligence 20, 429-437 (2007).
[58] F. Shafee, Information in entangled dynamic quantum networks. Microelectronics Journal, 37, 1321-1324 (2006).
[59] F. Shafee,
Neural networks with finite width action potentials.
Preprint: arxiv.org: cond-mat/0111151 (2001).
[60] J.J. Sakurai, Modern Quantum Mechanics (Addson-Wellesley, MA,
USA, 1994) p.316
[61] S.L. Rauch, M.R. Milad, S.P. Orr, B.T. Quinn, B. Fischl, R. Pitman,
Orbitofrontal thickness, retention of fear extinction, and extraversion.
Neuroreport. 28:16 1909-12 (2005).
[62] M.V. Altaisky, Preprint arxiv.org: quant-ph/0107012 (2001).
[63] D.A. Lidar, I.L. Chuang and K.B. Whaley, Decoherence-free subspaces for quantum computation.
Phys. Rev. Lett.81, 2594–2597
(1998).
[64] A.P. Kirilyuk, Dynamically Multivalued Self-Organisation and Probabilistic Structure Formation, Solid State Phenomena 97-98, 21–26
(2004).
[65] B. Ricks and D. Ventura, in Advances in Neural Information Processing
Systems 16: Neural Information Processing Systems, NIPS 2003, ed.
by Sebastian Thrun, Lawrence K. Saul and Bernhard Schlkopf, (MIT
Press, Cambridge, MA, USA, 2004).
[66] K.W. Cheng, Breaking RSA Code on the Quantum Computer, Thesis,
Kaohsiung University, Taiwan (2002).
[67] A. Fiorentino, Confronto fra reti neurali classiche e quantistiche, Thesis, Universita degli studi di Milano (2002).
[68] J. Faber, and G.A. Giraldi, Quantum models for artificial neural
networks, Technical rep. of LNCC National laboratory for Scientific
Computing, Brazil (2002).
[69] P.W. Shor, Polynomial-Time Algorithms for Prime Factorization and
Discrete Logarithms on a Quantum Computer. SIAM J. Comp 26,
1484–1509 (1997).
[70] M. Meister, M.J. Berry The neural code of the retina. Neuron 22,
43550 (1999).
[71] M. Mahowald and C. Mead, The silicon retina. Scientific American
264, 76 (1991).
[72] K. Boahen, A retinomorphic vision system. IEEE Micro 16, 30 (1996).
[73] R. F. Lyon and C. A. Mead, The cochlea. Analog VLSI and Neural
Systems (Addison Wesley Publishing Co., Reading MA, 1989) p.279
[74] R. Sarpeshkar, R. F. Lyon, and C.A. Mead, An analog VLSI cochlea
with new transconductance amplifiers and nonlinear gain control. Proceedings of the 1996 IEEE International Symposium on Circuits and
Systems, Atlanta, GA 3, 292 (1996).
[75] S. DeWeerth, L. Nielsen, C. Mead, and K. Astrom, A simple neuron
servo. IEEE Tran. Neural Networks 2, 248 (1991).
[76] T. Horiuchi, T. Morris, C. Koch, and S. DeWeerth, Analog VLSI circuits for attention-based visual tracking. Advances in Neural Information Processing Systems 9, (MIT Press, 1997) p.706
[77] F. De Martini, V. Bužek, F. Sciarrino and C. Sias, Experimental realization of the quantum universal NOT gate. Nature 419, 815 (2002).
[78] A.
Ekert, P.
Hayden and H. Inamori, Basic concepts in quantum
computation. quant-ph/0011013 (2000).
[79] J. Preskill, Quantum Computation. Lecture Notes for Physics 219,
(California Institute of Technology, Pasadena, CA, 1998).
[80] C. Altman, J. Pykacz, R. R. Zapatrin, Superpositional Quantum
network Topologies. International Journal of Theoretical Physics,43,
2029–2040 (2004).
[81] A.S. Davydov, Solitons in molecular systems. (Kluwer, Dordrecht,
1991).
[82] A. Xie, L. van der Meer, W. Hoff and R. H. Austin, Long-Lived Amide
I Vibrational Modes in Myoglobin. Phys. Rev. Lett. 84, 5435–5438
(2000).
[83] W. Fann, L. Rothberg, S. Benson, J. Madey, S. Etemad and R.H.
Austin, Dynamical Test of Davydov-Type Solitons in Acetanilide Using
a Picosecond Free-Electron Laser. Phys. Rev. Lett. 64, 607–610 (1990).
[84] F. Shafee, Quantum Images and the Measurement Process. Elec. J.
Theor. Phys. 4:14, 121–128 (2007).
[85] R. Omnes, A model of quantum reduction with decoherence. Phys.
Rev. D71, 065011 (2005).
[86] W.H. Zurek Decoherence, einselection, and the quantum origins of
the classical. Rev. Mod. Phys. 75 , 715 (2003).
[87] G. Sewell On the mathematical Structure of Quantum Measurement
Theory. Rep. Math. Phys. 56, 271 (2005).
[88] R. Douglas, Rules of thumb for neuronal circuits in the neocortex.
Notes for the Neuromorphic aVLSI Workshop, Telluride, CO (1994).
[89] E. Guigon and Y. Burnod, Short-term memory. The Handbook of Brain
Theory and Neural Networks M.A. Arbib ed. ( MIT Press, Cambridge,
MA, 1995) p.867
[90] C. Diorio, P. Hasler, B. A. Minch, and C. Mead, A complementary pair
of four-terminal silicon synapses. Analog Integrated Circuits and Signal
Processing 13, 153 (1997).
[91] C. Diorio, P. Hasler, B. A. Minch, and C. Mead, A single-transistor
silicon synapse. IEEE Trans. Electron Devices 43, 1972 (1996).
[92] L.D. Iasemidis and J.C. Sackellares, Chaos Theory Epilepsy. The Neuroscientist 2, 118 (1996).
[93] S.G. Schirmer, H. Rabitz et al, Quantum control using sequence of
simple control pulses. quant-ph/0105155 (2001).
[94] A. Barenco et al, Phys. Rev. A 52, 3457 (1995).
[95] L.K.
Grover, A fast quantum-mechanical algorithm for database
search. Proc. 28th Annul ACM Symposium on the Theory of Computing
(STOC’96) (ACM, Philadelphia, 1996) p.212
[96] D. Deutsch, Proc. Royal Soc. Lond. A 400, 97 (1985).
[97] A.C. Doherty et al, Distinguishing entangled and separable states.
quant-ph/0112007 (2001).
[98] F. Shafee, Aspects of quantum pattern recognition in Pattern Recognition Theory and Applications: ISBN: 1-60021-717-6 ed. E.A. Zoeller
(Nova Publishers, 2008).
[99] M.E. Newman, S.H. Strogatz, and D.J. Watts, Random Graphs with
Arbitrary Degree Distributions and Their Applications. Phys. Rev. E
64, 17 (2001).
[100] A. Barabasi and A. Reka, Emergence of scaling in random networks.
Science 286, 509-512 (1999).
[101] Z. Zhou, X. Zhou et al, Conditions for nondistortion interrogation of
quantum systems. Europhys. Lett. 58, 328 (2002).
[102] J. Monod, J. Wyman, & J.P. Changeux, J. Mol. Biol. 12, 88-118
(1965).
[103] H. Araki, and E.H. Lieb, Entropy inequalities. Comm. Math. Phys.
18, 160-170 (1970).
[104] D.R Chialvo and P. Bak, Learning from mistakes. Neuroscience 90,
1137-1148 (1990).
[105] P.T. Landsberg, Entropies galore! Brazilian Journal of Physics 29,
46–49 (1999).
[106] P. Grigolini, C. Tsallis and B.J. West, Classical and Quantum Complexity and Non-extensive Thermodynamics. Chaos, Fractals and Solitons 13, 367–370 (2001).
[107] A.R. Plastino, A. Plastino and C. Tsallis, The classical N-body problem
within a generalized statistical mechanics. J. Phys. A 27, 5707–5757
(1994).
[108] B.M. Boghosian, Phys. Rev. E 53, 4754 (1995).
[109] C. Anteneodo and C. Tsallis, Two-dimensional turbulence in pureelectron plasma: a nonextensive thermostatistical description. J. Mol.
Liq. 71, 255-267 (1997).
[110] V.H. Hamity and D.E. Barraco, Phys.Rev. Lett. 76, 4664 (1996).
[111] C. Beck, Nonextensive statistical mechanics and particle spectra, hepph/0004225 (2000).
[112] R.M. Corless, G.H. Gonnet,D.E.G. Hare, D.J. Jeffrey and D.E. Knuth,
On the LambertW function. Adv. Comput. Math. 5, 329-359 (1996).
[113] C. Wolf, Equation of state for photons admitting Tsallis statistics.
Fizika B 11, 1 (2002).
[114] E.J.W. Boers, H. Kuiper, B.L.M. Happel and I.G. Sprinkhuizen-Kuyper, Designing modular artificial neural networks. In H.A. Wijshoff (Ed.), Proceedings of Computing Science in The Netherlands (CSN'93), pp. 87–96 (1993).
[115] C. Tsallis, A.R. Plastino, and W.-M. Zheng, Chaos, Solitons and Fractals 8, 885 (1997).
[116] M.L. Lyra, and C. Tsallis, Phys. Rev. Lett., 80, 53 (1998).
[117] A.R. Plastino and A. Plastino, Non-extensive statistical mechanics and
generalized Fokker-Planck equation. Physica A 222, 347-354 (1995).
[118] C. Tsallis and D.J. Barkman, Phys. Rev. E 54, R2197 (1996).
[119] L. Borland, Phys. Rev. E 57, 6634 (1998).
[120] M. Buiatti, P. Grigolini and A. Montagnini, Phys. Rev. Lett. 82, 3383
(1998).
[121] G. Kaniadakis, Nonlinear kinetics underlying generalized statistics.
Physica A 296, 405–425 (2001).
[122] G. Kaniadakis, Statistical mechanics in the context of special relativity.
Phys. Rev. E 66, 056125 (2002).
[123] J. Aczel and Z. Daroczy, Charakterisierung der Entropien positiver
Ordnung und der Shannonschen Entropie. Acta Math. Acad. Sci. Hungary 14, 95–121 (1963).
[124] J. Aczel, and Z. Daroczy, Sur la caractrisation axiomatique des entropies d’ordre positif, y comprise l’entropie de Shannon. Comp. Rend.
Acad. Sci. (Paris) 257, 1581–1584 (1963).
[125] M.D. Esteban and D. Morales, A summary on entropy statistics. Kybernetika 31, 337–346. (1995).
[126] Q.A. Wang, Entropy 5, 3 (2003).
[127] A.I. Kinchin, Mathematical Foundations of Information Theory, (Dover
Publications, New York, 1957).
[128] F. Shafee, Oligo-parametric Hierarchical Structure of Complex Systems. NeuroQuantology Journal 5, 85–99 (2007).
[129] S.R. Valluri, R.M. Corless and D.J. Jeffrey, Some applications of the
Lambert W function to physics. Can. J. Physics 78, 823–831 (2000).
[130] R.K. Pathria, Statistical Mechanics, (Butterworth-Heinemann, Oxford,
UK) (1996) p.77
[131] A. Vidiella-Barranco, Entanglement and nonextensive statistics. Phys.
Lett A 260, 335–339 (1999).
[132] S. Abe, and A.K. Rajagopal, Quantum entanglement inferred by the
principle of maximum Tsallis entropy. Phys. Rev. A 60, 3461–3466
(1999).
[133] S. Abe, Nonadditive entropies and quantum entanglement. Physica A
306, 316 (2002).
[134] A. R. Calderbank and P. W. Shor, Good quantum error-correcting
codes exist. Phys. Rev. A 54, 1098–1105 (1996).
[135] F. Verstraete and M.M. Wolf, Entanglement versus Bell violations and
their behaviour under local filtering operations. Phys. Rev. Lett. 89,
170401 (2002).
[136] E. Merzbacher, Quantum Mechanics 3rd ed. (John Wiley, NY, 1998)
p. 368.
[137] F. Shafee, Lambert function and a new non-extensive form of entropy,
IMA Journal of Applied Mathematics 72, 785–800 (2007).
[138] B.-Q. Jin and V.E. Korepin, Quantum spin chains, Toeplitz determinants and the Fisher-Hartwig conjecture. J. Stat. Phys. 116, 79–95 (2004).
[139] F. Shafee, Generalized Entropy with Clustering and Quantum Entangled States. cond-mat/0410554 (accepted by Chaos, Solitons and Fractals) (2004).
[140] W. Bialek, C.G. Callan and S.P. Strong, Field theories for learning
probability distributions. Phys. Rev. Lett. 77, 4693–4697 (1996).
[141] D. Wolpert and D. Wolf, Estimating functions of probability distributions from a finite set of samples, Phys. Rev. E, 52, 6841–6854 (1995).
[142] E.T. Jaynes, Monkeys, Kangaroos, and N, University of Cambridge
Physics Dept. Report 1189 (1984).