Information and Entropy in Neural Networks and Interacting Systems Fariel Shafee A DISSERTATION PRESENTED TO THE FACULTY OF PRINCETON UNIVERSITY IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY RECOMMENDED FOR ACCEPTANCE BY THE DEPARTMENT OF PHYSICS Adviser: Self January 2009 c °Copyright by Fariel Shafee, 2009. All rights reserved. Abstract In this dissertation we present a study of certain characteristics of interacting systems that are related to information. The first is periodicity, correlation and other information-related properties of neural networks of integrate-andfire type. We also form quasiclassical and quantum generalizations of such networks and identify the similarities and differences with the classical prototype. We indicate why entropy may be an important concept for a neural network and why a generalization of the definition of entropy may be required. Like neural networks, large ensembles of similar units that interact also need a generalization of classical information-theoretic concepts. We extend the concept of Shannon entropy in a novel way, which may be relevant when we have such interacting systems, and show how it differs from Shannon entropy and other generalizations, such as Tsallis entropy. We indicate how classical stochasticity may arise in interactions with an entangled environment in a quantum system in terms of Shannon’s and generalized entropies and identify the differences. Such differences are also indicated in the use of certain prior probability distributions to fit data as per Bayesian rules. We also suggest possible quantum versions of pattern recognition, which is the principal goal of information processing in most neural networks. iii Contents Abstract iii List of Figures vii List of Tables xii Acknowledgements xiv 1 Prolegomena 1 2 Integrate-and-fire Networks with Finite-width Action Potentials 8 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Integrate-and-fire models . . . . . . . . . 
. . . . . . . . . . . . 11 2.3 Convergence to periodicity in the finite width case . . . . . . 15 2.4 Rate of convergence to limit cycle . . . . . . . . . . . . . . . . 18 2.5 Nonleaking networks with finite width action potential . . . . 22 2.6 8 2.5.1 Effect of pulse shape . . . . . . . . . . . . . . . . . . . 23 2.5.2 Convergence rate and the value of A . . . . . . . . . . 25 2.5.3 Effect of region of initial excitation . . . . . . . . . . . 26 2.5.4 Synchronicity, bin-size and dynamic entropy . . . . . . 28 Leaking networks with finite action potential . . . . . . . . . . 30 iv 2.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3 Neural Networks with Quantum Interactions 37 3.1 Quantum Interaction between Nodes and Coherence Scale . . 39 3.2 Technological Realization of Quantum ‘Neurons’ and ‘Action Potentials’ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.3 Scales of Coherence . . . . . . . . . . . . . . . . . . . . . . . . 45 3.4 Time scales of quantum gates and system evolutions . . . . . . 46 3.5 3.4.1 Quasiclassical Net . . . . . . . . . . . . . . . . . . . . . 49 3.4.2 Quantum-Gated Net . . . . . . . . . . . . . . . . . . . 49 3.4.3 Completely Entangled Net . . . . . . . . . . . . . . . . 50 Difference between our Quantum Models and Existing Models 4 Quasiclassical Neural Network Model 4.1 51 53 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.1.1 Comparison with the Hopfield Model . . . . . . . . . . 54 4.2 Quantum Transitions . . . . . . . . . . . . . . . . . . . . . . . 55 4.3 Some Analytically Soluble Cases . . . . . . . . . . . . . . . . . 58 4.3.1 Constant External V, No Interneuron Interaction . . . 59 4.3.2 Harmonic External V, No Interneuron Interaction . . . 60 4.3.3 Exponentially Damped External V, No Interneuron Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.3.4 Damped Harmonic External Potential, No Interneuron Interaction . . . . . . . . . . . . . . . . 
. . . . . . . . . 61 4.3.5 No External Potential, Constant Interneuron Interaction 62 4.3.6 Constant External Potential and Constant Interneuron Interaction . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.3.7 No External Potential, Damped Harmonic Interneuron Interaction 4.4 . . . . . . . . . . . . . . . . . . . . . . . . 63 Quasiclassical Hopfield-Herz type Neural Network . . . . . . . 64 v 4.5 Input Dependence . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.6 Results of Simulation . . . . . . . . . . . . . . . . . . . . . . . 67 4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.8 Possible Applications . . . . . . . . . . . . . . . . . . . . . . . 75 5 Quantum-gated Neural Networks 77 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.2 Hierarchical Structure in Quantum Machines . . . . . . . . . . 79 5.3 Brief Review of Elements of a Quantum AI Machine: Qubits and Quantum Gates . . . . . . . . . . . . . . . . . . . . . . . 83 5.3.1 Qubits . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.3.2 Quantum Gates . . . . . . . . . . . . . . . . . . . . . . 85 5.4 The Quantum Neural Network Model . . . . . . . . . . . . . . 92 5.5 Results of Simulation . . . . . . . . . . . . . . . . . . . . . . . 97 5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6 Fully-entangled Neural Networks Model 107 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 6.2 An Entangled Quantum Network Model . . . . . . . . . . . . 108 6.3 Periodic and Aperiodic Regimes . . . . . . . . . . . . . . . . . 111 6.4 Simulation Results and a Modified Gate . . . . . . . . . . . . 112 6.5 Creation and Detection of Entangled States . . . . . . . . . . 115 6.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 6.7 Patterns in Entangled Quantum States . . . . . . . . . . . . . 119 6.8 6.7.1 Qubit Pattern Generation/Representation . . . . . . . 
119 6.7.2 Qubit Pattern Recognition . . . . . . . . . . . . . . . 120 Learning Quantum Patterns . . . . . . . . . . . . . . . . . . . 123 7 Generalization of Entropy 7.1 126 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 vi 7.2 Entropy of a Neural Network . . . . . . . . . . . . . . . . . . 128 7.3 Defining the New Entropy . . . . . . . . . . . . . . . . . . . . 131 7.4 Applications of the New Entropy . . . . . . . . . . . . . . . . 137 7.5 Probability Distribution for the New Entropy . . . . . . . . . 142 7.6 Probability, Lambert Function Properties and Constraints . . 145 7.7 Numerical Comparison . . . . . . . . . . . . . . . . . . . . . . 149 7.8 Free Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 7.9 Shannon and Tsallis Thermodynamics . . . . . . . . . . . . . 154 7.10 Thermodynamics of the New Entropy . . . . . . . . . . . . . . 155 7.11 Application to a Simple System . . . . . . . . . . . . . . . . . 157 7.12 Summary of the Mathematical Properties of the New Entropy 161 8 Generalized Entropy and Entanglement 8.0.1 164 Entangled and Pure States . . . . . . . . . . . . . . . . 166 8.1 Stochasticity from Entanglement with Environment . . . . . . 169 8.2 Entanglement, Entropy and Mutual Information . . . . . . . . 171 8.2.1 Single System Interacting with Environment . . . . . . 171 8.2.2 Entangled Systems interacting with the Environment . 173 8.3 Quantum Pattern Recognition and Mutual Information . . . . 182 8.4 Entanglement and Mutual Information . . . . . . . . . . . . . 183 9 Prior PDFs, Entropy and Broken Symmetry 190 9.1 Priors and Moments of Entropy . . . . . . . . . . . . . . . . . 192 9.2 Comparison of Shannon and Our Entropy . . . . . . . . . . . 194 9.3 Results for Mean Entropy . . . . . . . . . . . . . . . . . . . . 195 10 Illation and Outlook 202 Bibliography 205 vii List of Figures 2.1 Square pulse with width w = 0.01, A = 0.96, I = 1. . . . . . . 24 2.2 Same as Fig. 2.1, but with a triangular pulse. . . 
. . . . . . . 24 2.3 Square pulse, with A = 0.24, I = 1, w = 0.2 . . . . . . . . . . 25 2.4 Same as Fig. 2.3, but with a triangular A.P. pulse. . . . . . . 26 2.5 As Fig. 2.2, but with peripheral initial excitation. . . . . . . . 27 2.6 As Fig. 2.4, but with peripheral initial excitation. . . . . . . . 27 2.7 As Fig. 2.2, but with a time step 10 times greater. . . . . . . . 28 2.8 As Fig. 2.2, but with leaking neurons with R = 10. . . . . . . 31 2.9 First firing of a net of leaking neurons, with a resistance just above the critical value of R = 1. . . . . . . . . . . . . . . . . 33 2.10 As above, showing successive peaks with growing synchronicity. The first firings shown in detail in the previous figure are the small clump at around t = 2.8. . . . . . . . . . . . . . . . 33 viii 3.1 (a) In classical H-H network each node receives a current k from each nearest neighbor which has fired; (b) In our first model of quasiclassical network, every node is a qubit-like object, but sees its neighbors as decohered classical sources of potential; (c) In our second model, a quantum gated one, every qubit interacts with its nearest neighbors seen as qubits, but the coherence length is the interqubit distance. (d) In our third model of completely entangled network all qubits are in coherence and the gates also maintain the coherence. . . . . . 47 4.1 V0 = 0.2, width = 0.2, k = 0.2: typical pattern of the triggering of the neurons in the quasiclassical neural network. There is apparently no phase locking. . . . . . . . . . . . . . . . . . . 68 4.2 Cumulative number of triggering against time. One can see a fairly regular linear behavior despite quantum stochasticity. This is for a single chosen neuron. . . . . . . . . . . . . . . . . 68 4.3 Same as Fig 4.2, but for the whole system. . . . . . . . . . . . 69 4.4 Transition from short term behavior to asymptotic behavior with all peripheral nodes initially in state 1. . . . . . . . . . . 
71 4.5 Same for peripheral nodes initially in states |1i and |0i alternately. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.6 Same for peripheral nodes initially in random states of excitation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5.1 A train of action potential pulses in a biological neural system. 81 5.2 Oscillations of the c part of a qubit for a non-cutoff model with ² = 0.01. This is the quantum amplitude for the state |1i. The modulus squared of this quantity gives the probability for a particular measurement to find the state to be |1i, though individual experiments may give either 0 or 1. . . . . . . . . . 99 ix 5.3 Oscillation of the c parts summed of all qubits in the network for cthresh = 0.7, ² = 0.01. All boundaries excited initially. . . . 100 5.4 Correlation between two qubits for no threshold case with ² = 0.01, h10, 10|20, 21i, where the qubits are located by their (x, y) coordinates in the lattice. . . . . . . . . . . . . . . . . . 102 5.5 Correlation h10, 10|20, 21i for the above case. . . . . . . . . . 103 5.6 Variation of periodicity with ² for excitations from all sides. . 104 6.1 Quantum network for Bak type training for pattern recognition. The intermediate and final qubits are shown integrated with OR gates to sum the contributions of the qubits connected behind. The circular gates are rotation gates. The curved feedback information paths that control the gates’ rotations are shown only for two gates for clarity. . . . . . . . . 125 7.1 W0 (z) is real along the real axis from −1/e to ∞; the value of W0 ranges from −1 to ∞; we do not show the W−1 (z) branch, which is real from z = −1/e to z = 0, because it is not suitable for our entropy, as explained in the text. . . . . . . . . . . . . 146 7.2 Comparison of the pdf for the new entropy for values of q = 1, 1.1, 1.2 and 1.3. The solid line is for q = 1, i.e. the Gibbs exponential distribution and the lines are in the order of q . . 
149 7.3 Comparison of pdf’s for Tsallis nonextensive entropy (solid line) and the new entropy presented here, for q = 1.1 . . . . . 151 7.4 The same as Fig. 7.3, but for a higher q = 1.3 . . . . . . . . . 151 7.5 A for Shannon (top), Tsallis with ² = 0.1 (middle) and Tsallis with ² = 0.25 (bottom) . . . . . . . . . . . . . . . . . . . . . . 158 7.6 S for same three entropy forms (from bottom to top Shannon, Tsallis (0.1), Tsallis (0.25)) . . . . . . . . . . . . . . . . . . . . 158 7.7 U for same three entropy forms (same order as for S) . . . . . 159 x 7.8 C for same three entropy forms (peaks bottom to top - Shannon, Tsallis (0.1), Tsallis (0.25)) . . . . . . . . . . . . . . . . . 159 7.9 Comparison of A for Shannon (apart), Tsallis with ² = 0.1 and new entropy for ² = 0.05 (superposed) . . . . . . . . . . . 160 7.10 Comparison of S for same three forms of entropy (Shannon, Tsallis (0.1), new entropy (0.05) - Tsallis is just over the new entropy). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 7.11 Comparison of U for the same three entropies. Shannon is separated, other two overlap. . . . . . . . . . . . . . . . . . . . 161 7.12 Specific heat C for same three entropies. Again, only Shannon is separated. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 8.1 IAB as a function of entanglement angle θ in A-B space and the entanglement angle θ0 with the environment, which is related to the stochasticity. . . . . . . . . . . . . . . . . . . . . . . . . 176 8.2 Difference of MI from our entropy with that from Shannon entropy at θ = π/4. . . . . . . . . . . . . . . . . . . . . . . . . 179 8.3 Same as Fig. 8.2 with θ0 = π/4 . . . . . . . . . . . . . . . . . 179 8.4 MI difference between our entropy form and Shannon for q = 0.7180 8.5 Same as Fig. 8.4 but for q = 1.3 . . . . . . . . . . . . . . . . 180 8.6 Difference between MI from our entropy and Tsallis’s with θ = π/4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
181 8.7 Same as Fig. 8.6 but with θ0 = π/4. . . . . . . . . . . . . . . 182 9.1 Ratio of expected value (first moment) of new entropy plotted against bin number K and prior exponent β for entropy parameter q = 0.5. . . . . . . . . . . . . . . . . . . . . . . . . 196 9.2 Same as Fig. 9.1, but for q = 1.0, i.e. Shannon entropy. . . . 196 9.3 As previous two figures, but for q = 1.5. . . . . . . . . . . . . 197 xi 9.4 Clearer view in 2-dimensional plot, with K = 10. Red, green and blue lines are for q = 0.5, 1.0 and 1.5 respectively . . . . . 198 9.5 As Fig. 9.4, but for bin number K = 100. . . . . . . . . . . 199 9.6 As previous two figures, but for K = 1000. . . . . . . . . . . . 199 9.7 Our entropy for a three-state system, with parameter q = 2.44, as two independent probabilities p1 and p2 are varied wit the constraint p1 + p2 + p3 = 1. The expected maximum at the symmetry point p1 = p2 = p3 turns out to be a local maximum. The global maxima are not at the end points with one of the probabilities going up to unity and the others vanishing, which gives zero entropy as expected, but occurs near such end points, as shown clearly in the next figure. . . . . . . . . . 201 9.8 Two-dimensional version of the previous Fig. 9.7, with p1 = 1/3 fixed, so that only p2 varies. This shows a clearer picture of local maximum at the symmetry point and global maxima near the end points. . . . . . . . . . . . . . . . . . . . . . . . . 201 xii List of Tables 2.1 Time period for leaking networks . . . . . . . . . . . . . . . . 34 4.1 Strength of Quantum Potential and Average Period of Neurons (width = 0.2, V0 = 1): Best Fit . . . . . . . . . . . . . . . . . 70 4.2 Variation of Period with Duration of Quantum Potential (v = 0.2, V0 = 1): Best Fit . . . . . . . . . . . . . . . . . . . . . . . 70 xiii Acknowledgements I am grateful to Professor J.J. Hopfield for teaching me about neural networks. I am indebted to Professor W. Bialek and Dr. I. 
Nemenman for their help in learning about entropy and priors. I thank Professor H. Rabitz and Dr. Ignacio Sola for many discussions that made me aware of the role of quantum control in small systems. Lastly, I acknowledge with gratitude the support of Professor C.G. Callan in finishing this work.

Chapter 1

Prolegomena

This work deals with interacting complex systems where information and entropy play key roles. Possibly the most complex system known to us is the human brain. We now interpret the activities of the brain in terms of neural networks, i.e. the firings of nodes (depolarization-repolarization cycles of nerve cells) that are connected in a fashion evoking different patterns of oscillations in different regions, or sub-nets interlinked in a complex way that keeps changing. Since the strength of sensation, or of reaction, depends on the frequency of the pulses, the periodicity of oscillatory behavior and its variation or absence are important attributes of an organic neural network. It has been known for a long time that the aggregate periodicity of the neurons in the central nervous system (CNS) in various forms [1], such as α, β, γ and δ waves, can indicate the overall state of alertness of a human being. Artificial neural networks (ANN) may also try to mimic these features, as in many cases this may be more convenient than a simple sequential set of one-time transformations. Short-term memory, in particular, is refreshed periodically both in biological neural networks and in ANN. In current neuroscience research the spectral analysis of signals in correlated circuits is an important theoretical tool in understanding the extent and circumstances of the correlations [2, 3]. In biological systems many processes have a rhythmic character, such as heartbeats, respiration, diurnal sleep/wake patterns and the contraction of muscle filaments, which may be assigned to periodic behaviors of the controlling centers in the neural system [4, 5, 26, 6, 7, 8].
Despite having its own oscillatory neuro-muscular circuits, the heart receives pulses periodically [10, 7] according to need. The diaphragm also contracts and relaxes as the oxygen need felt by the CNS sends signals of the required periodicity [10, 9]. The actin and myosin protein molecules in the muscles slide on each other at a rate determined [26] by motor nerve signals from the neural system to suit the rate of contraction, which produces the required pulling force. Short-term memory is not due to changes in firmware in the brain, but due to periodic refreshing [6], as in the dynamic Random Access Memory (RAM) of a computer. The enjoyment of music and the meaning of the spoken word depend on periodic repetitions [8] of the sound patterns inside the CNS. Periodic behavior in sections of a biological neural network is thus an essential feature of intelligent life, and is a subject of interest in its own right. It may also serve as a model for artificial devices that carry out similar tasks.

Many years ago Hopfield and Herz [11, 12] constructed an apparently very simple model of a neural network, in which neurons fire [14, 15] when they receive a specified quantity of total charge from their nearest neighbors, just as biological neurons fire when they receive the requisite quanta of chemical neurotransmitters across the synapses. This integrate-and-fire model was an over-simplified version of the biological system, as it was bidirectional and the connectivity weights were kept uniform for simplicity, in addition to restricting the active contributors to nearest neighbors geometrically. However, the most interesting property of this simplified ANN was a uniform periodicity of the firing of the nodes (here the discharge of capacitors) and the asymptotic attainment of phase-locking throughout the network, creating correlations in phase.
Hence, despite its simple nature and departures from the biological systems, the model is intrinsically interesting for studying the properties of periodicity and correlations from the most basic assumptions. One can then build upon this model to obtain greater complexity, more realistic biologically, or more useful in ANN research. Among other idealizations of the original Hopfield-Herz (H-H) model was an action potential of zero width, i.e. a δ(t − t0) time dependence, which would be unrealistic both biologically and in ANN, as signals need a finite time to form to full strength and then decay. In this dissertation we have therefore carefully examined the effect of a finite width of various shapes on the results obtained in [12], i.e. whether we get periodicity and phase-locking, and whether they depend on the width of the action potential or its ANN analog.

To have sufficient device density, so that a great amount of information can be retained or processed in a compact low-power assembly, it appears that ANN must eventually resort to a scale where the laws of quantum mechanics take over [16]. This may take place in stages: first the smallest subunits need quantum treatment, with a quantum redefinition of the classical "bit"; then possibly the application of quantum gates, some of which already exist in reality; and finally a completely coherent (in the quantum sense) machine performing algorithms not permitted in the classical context. We have, in this dissertation, compared the prototypical H-H model with this sequence of quasiclassical, quantum-gated and completely quantum-entangled networks, mostly with respect to periodic behaviors and correlations. Chapters 2 through 6 deal with these results.
Kolmogorov complexity [13] (alternatively, entropy) enters the discussion of most nontrivial systems, including even the simple H-H network, with or without a finite-width action potential, as we shall show. At finite temperature, the presence of random collisions or interactions among units in different states makes entropy a vital quantity in determining the probabilistic distribution of states among the units; this is related to the presence or lack of information due to the uncertainty associated with any probabilistic arrangement. On the other hand, if the interactions are not entirely random, but have a deterministic bias, then the measure of uncertainty needs to be modified. This was realized by the proponents of alternative forms of entropy such as Rényi [17] and Tsallis [18]. The relation between entropy and information was clarified by Shannon's coding theorem [16]. This involves the "phase space" volume of the bits transferred in a random stream following a probability distribution. With interaction, however, we might anticipate a deformation of this volume. Using this approach, we arrive at a new form of entropy in which the bits occupy cells with a dimensionally modified volume. In this dissertation we present work where the most likely probability distribution (for different energy states) corresponding to such a deformed form of entropy is used, and show that an analytically exact form is obtainable, making it an attractive expression to work with. However, it is necessary to check whether such a modification can lead to perceptible differences in the related statistical mechanics of the system for which it is used. We shall present such comparisons in Chapter 7, where we shall also include other forms of generalized entropies to note the similarities and differences, and indicate where our form may be more appropriate than others.
As our interest is in small information-processing systems, or the information content of small systems, in which quantum effects are prominent, we later present work indicating how, from a pure quantum state involving the environment, we can obtain the classical measures of stochasticity, i.e. the impurity factor in the quantum density matrix. We do this for our form of the entropy, for other generalized forms, and for Shannon entropy. Chapter 8 deals with this aspect of our work. Mutual information, quantum entanglement and stochasticity all become related in this formalism, with different quantitative implications for different choices of the form of entropy.

Ultimately, however, probability distributions can only be formed from actual data. Some prior knowledge, possibly theoretical constraints or expectations, can help in the formulation of more accurate posterior distributions when the number of possible states is high and the data sparse. However, we found [19] that such a direct approach is often fraught with the danger of insensitivity. We show in Chapter 9 how our form of entropy, by introducing one more parameter, and an asymmetric weighting that modifies the conventional concept of a symmetric distribution for maximal entropy, may offer a useful path.

Chapter 2

Integrate-and-fire Networks with Finite-width Action Potentials

2.1 Introduction

Models of neural networks are important for understanding biological information processing and storage systems, and also because their relation to many physical systems sheds new light on those nonbiological complexes. It is well-known that associative memory networks [11, 12] share many common features with spin glasses [20, 21]. Artificial neural networks may one day pave the way for creating computers that are more akin to organic pattern recognition systems, and hence more efficient in tasks at present not addressable by mechanistic serial processing.
Associative memory networks [22, 23, 24, 25, 26, 27, 28] are usually conceived as static patterns holding memories of inputs, but in recent years there has been growing interest in systems that hold memories of inputs in a dynamic fashion as well, in the form of phase-locked oscillations. Several different types of models have been presented [26, 29], with different assumptions about their modus operandi. The differences may lie in the ways the neurons are triggered, in the ways they convey their information to other neurons, in the fashion in which they are interconnected, and in other conjectures about the role of the environment.

Among these models is a set of relatively simple networks of integrate-and-fire neurons conceived by Hopfield and Herz [12], whose work shows that many interesting nontrivial predictions regarding the dynamic behavior of such networks can be made from their simple properties. However, in their work, to maintain simplicity, they assumed that the action potential is in the form of a delta function in time, i.e. that it is instantaneous. But in some of the models they present, when a neuron fires on attaining threshold from part of the action potential (AP) from a neighbor, the remaining excess charge from the neighbor may remain stored and contribute to the next firing of the receiver, which in a sense mimics the effects of a finite width, without involving the quantitative complications that an explicit finite-width model would entail.
A priori, these may be in the form of changed periods, different convergence behavior to limit cycles or modified rates of convergence, different paths to globalization, or its absence altogether etc. Specialized cases have been studied including time delays [33] or specifically shaped action potentials [34, 35, 36]. The width introduces its own time scale that can be expected to interact with the time scale of the zero-width period obtained in the H-H models, which in turn depends on the strength of coupling among the neurons and the environmental current. It also introduces the possibility of a more complicated distribution of the action potential current to the receiving neuron, affecting its attainment of threshold, i.e. its period and properties related to convergence to limit cycles. 10 We have, therefore, investigated the consequences of introducing finite widths into the integrate-and-fire models. Where possible we have tried to use analytical methods. Where that turns out to be impossible, we have used simulation. In the next section we first describe different types of integrateand-fire models to establish the concepts and notation used in this work. In section 2.3 we show that the convergence to periodic behavior for Class C models can be analytically proved even in the finite width case. In section 2.4 we present our simulations with class C models using APs of various finite widths and different shapes. In section 5 we do the same with the leaking model A. Finally, in section 6 we present our conclusions. 2.2 Integrate-and-fire models Integrate-and-fire neurons are usually considered in four different scenarios [12]. In each case every neuron in the network receives some charge from other neurons in the neighborhood, and also from an environmental current source. This raises the electrostatic potential of the neuron. When the neuron has accumulated sufficient charge and its potential exceeds a threshold, it fires, i.e. 
it gives forth an AP to the other neurons, which too may be triggered in due course. If all neurons attain the same periodicity, the phase difference among them must become constant, i.e. the system becomes phase-locked. It is also possible for some of the neurons to fire simultaneously (synchronicity) if they reach the limit cycle together.

Let u_i(t) be the potential of the i-th neuron at time t, I_i(t) the external current input into this neuron, R its leaking resistance and T_ij f_j(t) the action current getting into neuron i from neuron j. We can then write the current conservation equation as:

$$ C\,\frac{du_i}{dt} = -\frac{u_i(t)}{R} + \sum_j T_{ij} f_j(t) + I_i(t) \qquad (2.1) $$

For simplicity, with no loss of generality, we take the threshold for firing to be u = 1, by renormalizing u to the dimensionless quantity u/u_threshold. We can also divide the equation by C and absorb it in R and I, in which case R actually represents CR, the decay time constant, and I represents I/(C u_threshold), which has the dimension of inverse time. In other words, in our notation R and 1/I set two different time scales. In most of our work we shall assign the value 1 to I, to represent the scale of an external clock against which all internal dynamics is timed.

In the H-H work the action pulse shape f(t) is a Dirac delta function, i.e. a zero-width normalized pulse:

$$ f_j(t) = \delta(t - t_j) \qquad (2.2) $$

where t_j is the time of the last firing of neuron j.

The four models differ in the treatment of the leaking, as represented by R, and in the way they handle the after-fire potential u_i. In Model A, R is finite and may be taken to be equal to 1. This establishes the time scale CR.
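The single-neuron content of Eq. (2.1) is easy to verify numerically. The sketch below is not part of the dissertation; it simply Euler-integrates one leaking neuron (C absorbed into R and I as above, threshold 1, no neighbors) with the illustrative values R = 10, I = 1, and compares the time to threshold with the analytic solution of the linear ODE, u(t) = RI(1 − e^(−t/R)).

```python
import math

def time_to_threshold(R=10.0, I=1.0, dt=1e-5):
    """Euler-integrate du/dt = -u/R + I (Eq. 2.1 for one neuron with no
    neighbor input) until u crosses the renormalized threshold u = 1."""
    u, t = 0.0, 0.0
    while u < 1.0:
        u += dt * (-u / R + I)
        t += dt
    return t

t_num = time_to_threshold()
# Analytic crossing time: u(t) = R*I*(1 - exp(-t/R)) = 1
t_exact = -10.0 * math.log(1.0 - 1.0 / 10.0)
print(t_num, t_exact)  # both ≈ 1.054
```

Note that for RI ≤ 1 the potential saturates below threshold and the neuron never fires on external current alone, which is the "critical value of R" referred to in Fig. 2.9.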
In this model, if neuron i receives a synaptic charge in excess of that required to take the potential to the threshold, then, after firing, u_i(t) resets to a value corresponding to the excess charge (we omit the implied symbol for summation over neighbors j for clarity):

    u_i(t^+) = [u_i(t^-) + \Delta T_{ij} - 1] + (1 - \Delta) T_{ij} = (1 - \Delta) T_{ij}    (2.3)

where

    \Delta T_{ij} = 1 - u_i(t^-)    (2.4)

In Model B, the potential resets to zero after firing, irrespective of the oncoming synaptic current and the previous state of the neuron. It too has a leaking R = 1. Model C is the version of A that does not leak, i.e. R = \infty, and Model D is the non-leaking version of B.

One can integrate the model C potential since the last firing at t_0, which gives, assuming for simplicity a constant external current I:

    u_i(t) = u_i(t_0^+) + \sum_{j'} T_{ij'} + I(t - t_0)    (2.5)

where the summation is over only those other neurons j' that have fired since t_0. It is convenient to use a simple topology of the network by assuming only nearest neighbor interactions. Then, after phase-locking has been established with period P, we must have:

    u(t_0 + P) = 1 = \sum_j T_{ij} + IP    (2.6)

So

    P = (1 - A)/I    (2.7)

where for notational convenience, assuming constant T_{ij}, we get:

    A = \sum_j T_{ij} = Z\alpha    (2.8)

Z being the co-ordination number of the neural lattice, i.e. the number of nearest neighbors.

In Model A it is not possible to integrate Eqn. 2.1, because the integral depends on the specific times the contributions from the different neighbors are received. This uncertainty results from the leakage, which is proportional to the instantaneous potential.

2.3 Convergence to periodicity in the finite width case

When the action potential occupies a finite width we have to use the pulse form factor f with

    \int_{t_0}^{t_0 + \Delta t} f(t)\,dt = 1    (2.9)

Here t_0 is the time of the beginning of a firing and \Delta t is the width of the action potential and the consequent synaptic current.
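The period of Eqn. 2.7 can also be checked numerically before turning to the proof. The following is a minimal discrete-time sketch of the non-leaking model C with square finite-width pulses; the lattice size, time step and all helper names are our own choices, not the thesis code.

```python
import numpy as np

def simulate_model_c(L=10, A=0.96, I=1.0, w=0.01, dt=0.001, t_max=8.0, seed=0):
    """Non-leaking model C on an L x L lattice with periodic boundaries.

    Each firing emits a square pulse of width w that delivers a total charge
    alpha = A/4 to each of the Z = 4 nearest neighbors."""
    rng = np.random.default_rng(seed)
    alpha = A / 4.0
    u = rng.uniform(0.0, 1.0, (L, L))     # random initial potentials
    left = np.zeros((L, L))               # time remaining on each neuron's pulse
    spikes = []                           # firing times of the neuron at (0, 0)
    for step in range(int(t_max / dt)):
        active = (left > 0).astype(float)
        syn = (alpha / w) * (np.roll(active, 1, 0) + np.roll(active, -1, 0)
                             + np.roll(active, 1, 1) + np.roll(active, -1, 1))
        u += (I + syn) * dt               # model C: no leakage term
        left = np.maximum(left - dt, 0.0)
        fired = u >= 1.0
        u[fired] -= 1.0                   # model A/C reset: excess charge is kept
        left[fired] = w                   # launch a square pulse of width w
        if fired[0, 0]:
            spikes.append(step * dt)
    return spikes

spikes = simulate_model_c()
periods = np.diff(spikes[-6:])            # late inter-firing intervals
```

After the transients, the inter-firing intervals settle close to P = (1 - A)/I = 0.04, up to discretization error of order dt.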
As in the case of zero-width action potentials, we can show that in the finite width case too the system moves to a limit cycle with phase locking among the neurons. We shall do it for the non-leaking model C. We present the proof in three parts.

I: The time between the beginnings of two successive firings of the same neuron cannot be less than P = (1 - A)/I, where the variables are as defined in Eqn. 2.7 and Eqn. 2.8.

Proof: Let i be the neuron that has the smallest interval between two firings (there may be other neurons with the same least interval). Let the two firings begin at times t_1 and t_2. Let us consider any neighbor j of i. Since i has the least interval between successive firings, j cannot begin to fire twice within this interval. Hence, i can receive at most fractional charges from two firings of j within this interval, of which the beginning of the second firing t'_2 can be within the interval, but not the earlier one t'_1. So

    t_2 - t_1 \le t'_2 - t'_1    (2.10)

i.e.

    t_2 - t'_2 \le t_1 - t'_1    (2.11)

The total contribution received from j by i within this interval is:

    \alpha_j = \int_0^{s_2 \Delta t} f(t)\,dt + \int_{s_1 \Delta t}^{\Delta t} f(t)\,dt    (2.12)

where

    s_{1,2} = (t_{1,2} - t'_{1,2})/\Delta t    (2.13)

As s_2 \le s_1, we see that

    \alpha_j \le \alpha    (2.14)

ensuring that the period for i, P_i \ge P.

II: The system converges to a limit cycle in the finite width case.

Proof: The Lyapunov function E is defined by [12] E = -\sum_i u_i. Then its change after an interval P given by Eqn. 2.7 is

    E(t + P) - E(t) = \sum_i [-u_i(t + P)] - \sum_i [-u_i(t)]
                    = \sum_i \left( -IP - \sum_j T_{ij} \int_t^{t+P} f_j(t)\,dt + \int_t^{t+P} f_i(t)\,dt \right)
                    = -(1 - A)N + (1 - A) \sum_i \int_t^{t+P} f_i(t)\,dt
                    = -(1 - A) \left[ N - \sum_i \int_t^{t+P} f_i(t)\,dt \right]    (2.15)

As no neuron can begin to fire twice in an interval of less than P, E(t + P) - E(t) is nonpositive, as in the case of a zero-width AP. Hence, like the zero-width case, we have a convergence to a limit cycle.

III: If there is a periodicity, the period must be P = (1 - A)/I.
Proof: Firstly, by part I, it cannot be less than P, because no neuron can ever begin to fire again within an interval less than P. Indeed, if we now assume that a periodicity T has been established, i.e. all neurons have become phase-locked with equal phase difference in all periods, then, if t_i is the last time neuron i began to fire:

    u_i(t_i + T) = 1 = \sum_j T_{ij} \int_t^{t+T} f_j(t)\,dt + IT = A + [(1 - A)/P]\,T

The last equality follows from the fact that if i receives a fraction of a pulse from a neighbor j at the beginning of the cycle, it will receive the rest at the last part of its cycle because of phase lock, i.e. \int_t^{t+P} f_j(t)\,dt = 1 for all j. So T = P.

2.4 Rate of convergence to limit cycle

An interesting observation can be made about the speed of convergence to the limit cycle with period P. Let us consider, for simplicity, finite pulses of square shape. Then the contribution from a neighbor j to a neuron i will be proportional to the fractional duration of the pulse received by i within its (variable period) cycle. Again, for simplicity, let us consider only one neighbor. Then, if P_n is the period of the n-th cycle, f_n is the duration of the pulse received by i, and w is the width of the pulse, then, with P given by Eqn. 2.7,

    P_{n+1} = [1 - (A/w)(w - f_n) - A f_{n+1}/w]/I = P + [A/(wI)](f_n - f_{n+1})    (2.16)

Now, writing

    \Delta P_n = P_n - P    (2.17)

we get

    \Delta P_{n+1}/\Delta P_n = A/(Iw + A)    (2.18)

where we have made use of the relation between the difference of successive pulse fractions f_n and the difference between successive periods P_n, obvious for square pulses.

We can see that convergence to the limit cycle is a geometric sequence, and the rate of convergence depends on the ratio A/(Iw + A), which of course is dimensionless in our choice of notation. For fixed w a higher current would bring in faster convergence. It may seem that for the zero-width case we have no convergence at all. However, in this singular case we cannot split the pulse into two parts as in Eqn.
2.16, so that we must have f_n = 1 and \Delta P_n = 0 for all n, i.e. the system goes into phase lock after the completion of the first set of firings.

As we pointed out in the beginning, this relation is derived assuming a single neighbor. If Z > 1, then in general the different neighbors can contribute with different phase differences, which may keep changing until phase lock. Eqn. 2.18 will then need to be modified to account for these differences. If r of the Z neighbors contribute fully within a single period, then we have:

    \Delta P_{n+1}/\Delta P_n = A'/(Iw + A')    (2.19)

where we have used the symbol

    A' = (1 - r/Z) A    (2.20)

Therefore, when all the neighbors contribute their full action potential within a single cycle, the period becomes P. However, the inverse is not true, i.e. it is possible to distribute the charge from a single AP over succeeding cycles of the neighbor and the whole system can still be in total phase-lock with period P, because the R.H.S. of Eqn. 2.19 becomes indeterminate.

In the case of pulses of arbitrary shapes it is easy to see that we get:

    \Delta P_{n+1} = \sum_j (\alpha_j / I) + \int_{P_n}^{P_{n+1}} f(t)\,dt    (2.21)

where the sum runs over only those neighbors that do not yet deliver their full synaptic current in a single cycle. This is an integral equation depending on the shape function f(t). Despite the arbitrariness of f(t) we can see that if \Delta P_{n+1} \to \Delta P_n, then both must approach zero, i.e. the sequence must converge to the limit cycle. If \Delta P_{n+1}/\Delta P_n \to r < 1, then too the convergence is obvious. If as a first approximation we replace the arbitrary pulse by a square one of the same height but a width giving unit area, then we would expect a geometric convergence similar to Eqns. 2.18-2.20. The details of the pulse shape may produce smaller perturbations on the rate of convergence without affecting the general pattern.

2.5 Nonleaking networks with finite width action potential

We simulate a square lattice with periodic boundary conditions to reduce finite size effects.
It has been noted that a 40 × 40 lattice is sufficient to demonstrate all the important characteristics. At each step the neurons are updated according to the charge they receive from their neighbors and from the external current during the time loop, and then, if the potential reaches threshold, the neuron fires. The effect of the finite width of the action potential can be two-fold: (1) the reduction of u over a number of time loops corresponding to the width of the AP after a neuron begins to fire, and (2) the arrival of the synaptic current T_{ij} f(t) over a number of time loops according to the shape of the packet f(t). We do not expect any role for ectopic (i.e. out of place) pulses resulting from recharging to threshold before the previous firing is completed, though the synaptic current arriving at a neuron while it discharges will be retained for the next firing. Hence, in the nonleaking models it is unnecessary to model the actual shape of the fall of u. The results in such models will be indistinguishable from those of a zero-width model, though the situation in leaking models will be quite different, because the leakage will depend on the falling potential. The distribution of the synaptic current over multiple time loops, however, will carry the nontrivial differential characteristics of finite width action potentials in all models. In this section we consider only a nonleaking model.

Our algorithm has provision for generating pulses of various shapes, in the form of a discretized f(t_i) over the time loops during which the action pulse works. Arbitrary shapes can also be given as input. However, we have done the simulations only with square and isosceles triangular shapes, for simplicity.

2.5.1 Effect of pulse shape

In Fig. 2.1 and Fig. 2.2 we present our simulations with A = 0.96 (i.e. T_{ij} = 0.24), I = 1 and width w = 0.01 (which is 1/4 of P), using first a square pulse (Figure 2.1), and then a triangular one (Figure 2.2).
We note that indeed in both cases the period 0.04 is equal to P [= (1 - A)/I], as proved earlier. However, there is a slight difference between them in their rate of convergence to the limit cycle; the square pulses almost converge after t = 4, whereas for the triangular ones a similar level of convergence happens after t = 6. This can be expected on account of the simplicity of the square pulse allowing it to adapt itself more easily for phase-locking. Though only a few complete periods are shown in the figures, convergence can be deduced, despite the complexity of the pattern in each period. When successive periods show the same complex pattern, since any time development depends on the previous step only, it is bound to develop in the same way later too.

[Figure 2.1: Square pulse with width w = 0.01, A = 0.96, I = 1. Number of firings vs. time.]

[Figure 2.2: Same as Fig. 2.1, but with a triangular pulse.]
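The geometric rate of Eqn. 2.18 makes such comparisons quantitative. The sketch below (helper names and the tolerance eps are ours) evaluates the per-cycle contraction factor of the period error for the two parameter regimes used in the simulations:

```python
import math

def contraction_ratio(A, I=1.0, w=0.01):
    """Per-cycle shrink factor of the period error for a square pulse, Eqn. 2.18."""
    return A / (I * w + A)

def cycles_to_settle(A, I=1.0, w=0.01, eps=1e-3):
    """Cycles until the period error Delta P_n falls to eps of its initial value."""
    r = contraction_ratio(A, I, w)
    return math.ceil(math.log(eps) / math.log(r))

# strong coupling, as in Figs. 2.1-2.2: the ratio is close to 1, so convergence is slow
r_strong = contraction_ratio(A=0.96, w=0.01)
# weak coupling, as in Figs. 2.3-2.4: the ratio is much smaller, convergence nearly immediate
r_weak = contraction_ratio(A=0.24, w=0.2)
```

With A = 0.96 and w = 0.01 the ratio is 0.96/0.97, very close to 1, whereas with A = 0.24 and w = 0.2 it is 0.24/0.44, consistent with the much faster settling seen at the smaller A.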
2.5.2 Convergence rate and the value of A

In Figure 2.3 and Figure 2.4 we show the effect of changing the value of A, keeping the same ratio of the width to the corresponding period P. Figure 2.3 is for a square pulse and Figure 2.4 is for a triangular one. Of course, varying A changes the period. But the relative magnitudes of A, I and w vary according to Eqns. 2.18-2.21. A higher contribution from the neighboring neurons than from the constant external driver seems to delay convergence, which may be expected on account of the more intricate interactions in phase adaptation of the neighboring neurons than in their susceptibility to acting cohesively with a common external driver. For A = 0.24, i.e. \alpha = 0.06, we get practically immediate convergence after all the neurons have fired for the first time. This feature is common to both shapes of pulses.

[Figure 2.3: Square pulse, with A = 0.24, I = 1, w = 0.2. Number of firings vs. time.]

[Figure 2.4: Same as Fig. 2.3, but with a triangular A.P. pulse.]

2.5.3 Effect of region of initial excitation

The network can be made to receive an initial excitation at the periphery, or throughout the network. In Figure 2.5 we repeat the experiment of Figure 2.2, with the same parameters, but with only the peripheral neurons initially excited. We see that the general pattern remains the same, and cannot be distinguished from the whole-net excitation of Figure 2.2. However, in Figure 2.6 we have done the same with the smaller A (= 0.24) case, first studied with whole-net excitation in Figure 2.4. There is remarkable synchronicity in this case, the great majority of the neurons firing together periodically. We can understand what is happening here if we picture the initial development of the system. All the inner neurons are initially uncharged and they do not receive pulses from neighbors until their first firing at t = 1, which is caused simultaneously by the common driver I. Henceforth, all these inner neurons remain phase-locked. By contrast, for A = 0.96 the external current has little role in charging up the neurons, and hence the neurons get phase-locked only after their mutual interactions, making the situation similar to the whole-net excitation.

[Figure 2.5: As Fig. 2.2, but with peripheral initial excitation.]

[Figure 2.6: As Fig. 2.4, but with peripheral initial excitation.]

2.5.4 Synchronicity, bin-size and dynamic entropy

While the synchronicity just described, induced by the common external agent, may indeed be genuine, the other synchronicities may be more suspect.
Since in simulation experiments the time loops must occur after a small but finite duration, the procedure puts many events of different exact times in the same bin. This is revealed if we change the bin size in the experiment. Neurons that appear in the same bin seem to split up when the resolution is increased. In Figure 2.7 we show the results of lumping 10 loops of Figure 2.2 into one loop. Apart from the proportionately higher bin counts, convergence also appears to arrive faster, which of course is only an artifact.

[Figure 2.7: As Fig. 2.2, but with a time step 10 times greater.]

Because of the built-in time scales related to the charging rate, the affinities with nearest neighbors, and the width of the action pulses, it is not expected that such neural networks would show self-similarity, i.e. a fractal structure. It is obvious that if the time bin size is made small enough, the amplitude, i.e. the number of firings per bin, would normally be reduced to a sequence of units and zeros, though, because of the discrete nature of the updates in a computer simulation, there may be occasional coincidences.

In a time series the Kolmogorov-Sinai (K-S) entropy [13] is often used as a characteristic measure of the randomness of the series. Let p_{i_0, i_1, ..., i_n} be the probability of the system being in the states {i_k} at the times kT, k = 0, 1, ..., n, in a time sequence. Then we can define

    K_n = - \sum_{i_0, i_1, ..., i_n} p_{i_0, i_1, ..., i_n} \ln( p_{i_0, i_1, ..., i_n} )    (2.22)

Now, K_{n+1} - K_n is a measure of the information needed to predict which state the system will be in at time (n + 1)T, after having reached one of the possible states at time nT.
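The quantities K_n of Eqn. 2.22 can be estimated from a symbol sequence via empirical block probabilities. The sketch below (function and variable names are ours) contrasts a phase-locked (periodic) state sequence with a completely random one over four states:

```python
from collections import Counter
import math
import random

def block_entropy(seq, n):
    """Empirical estimate of K_n in Eqn. 2.22 from the length-n blocks of seq."""
    blocks = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    total = len(blocks)
    return -sum((c / total) * math.log(c / total)
                for c in Counter(blocks).values())

random.seed(1)
periodic = [0, 1, 2, 3] * 2500                        # phase-locked: repeating states
noisy = [random.randrange(4) for _ in range(10000)]   # completely random states

dK_periodic = block_entropy(periodic, 3) - block_entropy(periodic, 2)
dK_noisy = block_entropy(noisy, 3) - block_entropy(noisy, 2)
# periodic: the next state is fully predictable, so K_{n+1} - K_n is ~0
# random: K_{n+1} - K_n is ~ln 4, the maximal value for four equiprobable states
```

This reproduces the two limiting cases discussed below: zero entropy rate for the phase-locked sequence and the maximal rate for the random one.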
The Kolmogorov-Sinai entropy K is then defined by the limit

    K = \lim_{n \to \infty,\, T \to 0} (K_{n+1} - K_n)    (2.23)

Hence, a completely random sequence has the highest K-S entropy, as the next step can be any state with equal probability; a periodic system with some noise will have less; and for a completely periodic, i.e. phase-locked, system the entropy is zero, as only a particular repetitive sequence of states i_1, ..., i_n will occur after phase locking, with K_{n+1} = K_n, and the next step will be completely predictable. In that case the entire trajectory with the length of the periodic subsequence would stand for a single "dynamical state", which is fixed, with probability 1, leading to zero entropy.

2.6 Leaking networks with finite action potential

In Figure 2.8 we show the results of simulating a neural network similar to that of Figure 2.2, but with a finite value of the leaking resistance R. Compared to Figure 2.2 we see a higher time period T = 0.047, though in this case the convergence seems to have taken place much earlier, at t = 1.7. For R = 2, we get T = 0.069, and the limit cycle seems to appear only around t = 3.5. The smaller resistance of course dampens the charging more, and this results in the higher time period. It also effectively reduces the effect of the charging external current I and, as we have commented earlier, this weakening of a common driver increases the role of the complex phase adjustments of the neighbors, which in turn delays the convergence to the limit cycle.

[Figure 2.8: As Fig. 2.2, but with leaking neurons with R = 10.]

If we integrate Eqn. 2.1 assuming, like H-H, that all the synaptic current from the neighbors arrives at the beginning of the cycle, and that we have narrow widths as a first approximation, but keeping R as a free time scale, we get
    T = R \ln[(IR - A)/(IR - 1)]    (2.24)

Again, this is derived assuming that each neuron gets the full synaptic current from the neighbors at the very beginning of the cycle, and a narrow width, as in H-H. We notice that though IR is a dimensionless quantity here, the period T is controlled by the scale of R if IR is constant, i.e., if we vary both I and R keeping IR constant, then

    T_1/T_2 = R_1/R_2    (2.25)

So, for I = 1, which we have used throughout as an external time scale reference, making R = 1 should give a singularity. This physically corresponds to a situation where the leakage drains away all the charge the neuron accumulates from the environment and, hence, the neuron can never charge up to saturation to reach the requisite potential to fire, as the contribution from the neighbors is insufficient. Indeed, when we tried to simulate a network with I = 1 and R = 1, not a single neuron fired.

However, if we change R to a slightly higher value, e.g. R = 1.0001, other parameters remaining identical, we see a very interesting phenomenon. The neurons now get sufficient accumulated charge from their initial random charges, the external current and the synaptic current from neighbors to defeat the damping of the potential due to leakage and reach the firing threshold. However, as the damping is severe near the critical value R = 1, the differentiation of the initial phases becomes relatively unimportant and many neurons move towards synchronicity. In Figure 2.9 we see the first bunch of firings, with some dispersion, whereas in Figure 2.10, which gives a more extended picture, the synchronicity appears to become more established in successive firings. This is in contrast to the case of weaker damping (Figure 2.8) described above, where periodicity establishes phase-locking, but not synchronicity. Eqn. 2.24 seems to be more accurate for large R with small T than for small R near criticality (Table 2.1).
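Eqn. 2.24 is elementary to evaluate numerically; the sketch below (the function name is ours) reproduces the T_formula column of Table 2.1 for A = 0.96, I = 1:

```python
import math

def leaky_period(R, A=0.96, I=1.0):
    """Period from Eqn. 2.24: T = R ln[(I R - A)/(I R - 1)].

    Valid only for I R > 1; at I R = 1 the logarithm diverges, matching the
    simulation at I = 1, R = 1 in which not a single neuron fired."""
    return R * math.log((I * R - A) / (I * R - 1.0))

for R in (1.2, 1.5, 2.0, 5.0, 10.0):
    print(f"R = {R:5.1f}  T = {leaky_period(R):.3f}")
```

This gives 0.219, 0.115, 0.078, 0.050 and 0.044, i.e. the T_formula column of the table to its quoted precision.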
[Figure 2.9: First firing of a net of leaking neurons, with a resistance just above the critical value of R = 1.]

[Figure 2.10: As above, showing successive peaks with growing synchronicity. The first firings shown in detail in the previous figure are the small clump at around t = 2.8.]

It is understandable that for large periods the discrepancy between Eqn. 2.24 and the results of the simulation increases. The neighboring neurons can then distribute their synaptic contributions over a larger duration, and any signal arriving late is attenuated by R less than those arriving at the beginning of the cycle, which is what we have assumed. So, for large T, the effective A in Eqn. 2.24 is higher than the real A, which decreases the period to some extent.

Table 2.1: Time period for leaking networks

    R      T_formula   T_sim [w = 0.001]   T_sim [w = 0.002]
    1.2    0.22        0.095               0.21
    1.5    0.11        0.078               0.11
    2.0    0.078       0.068               0.077
    5.0    0.050       0.051               0.049
    10.0   0.044       0.048               0.045

We also note that if the width of the synaptic current pulse is changed there may be a change in the period, because a wider pulse is more liable to be attenuated by the damping R. The last column of the table presents the results of simulating with w = 0.002, which gives almost perfect agreement with Eqn. 2.24. For much smaller values of w, at low values of R, i.e. for high leakage and damping, the agreement is less satisfactory, because the singularity at R = 1 seems to affect the narrower (smaller w) action potentials more than the wider ones, which have a chance to spread out.

2.7 Discussion

In this work we find that many of the results of an over-simple zero-width integrate-and-fire neural network remain valid for more realistic finite-width models. We have proved analytically that even with a finite width, the system converges to a limit cycle with a period identical to the zero-width case.
However, the problem of convergence is more complicated for the case of finite-width AP spikes, and we have shown that probably only asymptotic convergence ensues, as a geometric sequence, which depends on the parameters of the model, i.e. the width, the external current and the synaptic coupling with the neighbors. The shape of the pulse does not seem to be the determining factor in controlling convergence, though it might produce some perturbative modifications. The question of synchronicity seems to be complicated by the inevitable finiteness of the time steps in a simulation, and opens up the possibility of investigating any fractal properties associated with such networks. The radical difference between peripheral and whole-net excitation also seems interesting and worth further investigation. Stochastic and mean field approximations may give analytic results where exact methods may be limited, as in the case of leaking neurons, though the simple formula Eqn. 2.24 for the period, derived with drastic assumptions, gives remarkably accurate predictions. The transition to sharp synchronicity in almost critically damped networks may therefore be amenable to more precise analysis.

Chapter 3
Neural Networks with Quantum Interactions

In the next three chapters we shall describe three different models of neural networks where, in artificial devices, the classical action potential is replaced by quantum interactions, and the "neurons" are quantized objects with states represented by allowed eigenstates of an operator that correspond to a measurable quantity. This quantity may be the state of spin or polarization of a particle or of a microsystem behaving like a particle, and in the simplest instance may be like a binary bit with two possibilities, but in general may also be a superposition of eigenstates until a measurement is made, when it would collapse to either of the two possibilities.
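As a toy numerical illustration of such a two-state object (purely illustrative, and not one of the three models to follow; the names are ours), a superposed qubit and its Born-rule collapse can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# an equal-weight superposition (1/sqrt(2))(|0> + i|1>), normalized
state = np.array([1.0, 1.0j]) / np.sqrt(2.0)

def measure(state, rng):
    """Born rule: pick an eigenstate with probability |amplitude|^2 and collapse."""
    p0 = abs(state[0]) ** 2
    outcome = 0 if rng.random() < p0 else 1
    collapsed = np.zeros(2, dtype=complex)
    collapsed[outcome] = 1.0   # post-measurement state is the chosen eigenstate
    return outcome, collapsed

# re-preparing and measuring many times: each outcome occurs about half the time
outcomes = [measure(state, rng)[0] for _ in range(10000)]
frac_one = sum(outcomes) / len(outcomes)
```

Until the measurement, the state carries both amplitudes; each measurement yields a definite bit and destroys the superposition, which is the behavior referred to above.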
In this brief chapter, which is a general introduction to the next three chapters, we describe the differences of these three models from one another and from the classical model in some detail, so that a comprehensive view of the models and also our motivation for considering them may be seen.

It is also true that, as per Moore's law, the density of electronic devices in a chip doubles about every 18 months. The limit at present is about 10^8 transistors per chip, and the average size of a component unit is of the order of 100 nm. At the rate of technological refinement given by Moore's law, within the next 12 years we shall reach the size of an atom for the storage of a single bit of information, and then quantum effects will become impossible to avoid in our considerations of the working of such chips. Hence, it is a fair guess that in the foreseeable future we shall have to concern ourselves with the quantum properties of AI systems anyway, whether we are interested in utilizing the advantages of quantum algorithms or not. This makes a comparison of different levels of quantum effects and their relation to legacy classical "hardware" a subject of great interest. While making such comparisons it is necessary to make sure that other features of the models remain more or less identical, so that we can determine the newly emerging quantal effects of greater coherence with clear distinction. This limits us necessarily to relatively simple systems.

Our designs of some simple quasi-quantum and quantum artificial intelligence systems produce interesting results, suggesting the promise of practical devices being produced by using similar, if not exactly the same, schemes in the future for the contexts we shall mention in each case. Our results also motivate further research, both experimental and theoretical, in these directions.
The concepts and conclusions shown in these models also advocate future investment in designing more complicated devices along this line, and also support experimental research in the practical design of parts used in these schemes. Most of these elementary parts, however, are shared by a number of other nano-devices and quantum computers, and hence these designs would act as further incentives for creating practical quantum computer parts such as logic gates. For each model we produce theoretical formulations and simulations using our simple designs.

3.1 Quantum Interaction between Nodes and Coherence Scale

In the classical biological neural network the action potential proceeds like a soliton wave across the axon of the neuron and across synapses, and causes neurotransmitters to transfer to the target cell, which, after accumulating sufficient charge from all such receptors, itself fires when the potential difference across the membrane crosses a threshold. The integrate-and-fire (I&F) model is a simplification of this process, giving the signals bi-directionality, which is not actually present in biological networks, and also connecting only geometrically nearest neighbors in a Z^n grid. Despite its simplicity, the classical I&F model is an excellent theoretical laboratory for testing many aspects of the dynamics of a neural network, including its periodicity.

When formulating a quantum version for artificial intelligence (AI), we can consider a hierarchy of allied models, all of which may be physically realizable, if not right now, at least later, because all obey the laws of physics. They are not approximations of one another in a hierarchical sequence. They are simply models with different properties, resulting from the different degrees of coherence on the net, i.e. the spatial extent of maintained coherence, which is principally a technological problem.
Hence, the sequence of models may indeed be related to the class of AI devices which will arise in time in a hierarchical mechanism of increasing coherence. Nevertheless, they may also co-exist at the same time, as representations for different levels of performance or for different demands. Our motivation for choosing these models is to compare the temporal development of these neural networks with different levels of coherence, and hence different combinations of classical and quantum properties. As we have remarked in the introductory chapter, periodicity is an important characteristic in the H-H type networks, and we have shown in the previous chapter that it is retained in the classical model even if we introduce finite widths of the action potential, instead of taking them as zero-width idealizations. In the quantum processes we would again expect time scales to come in: one possibly depending on the strength of the coupling, one on the type of input, and one involving the details of the interaction at each node, which may be a quantum analogue of the shape and width of the classical action potential. Our models represent essentially memory devices with forms of dynamic memory depending on the input, which to some extent at least dictates the mode of oscillation of the system, principally the global periodicity, in a complex, possibly changing pattern. We shall see that the chief difference in the systems we deal with, compared to the classical H-H I&F model [11], is the scale of quantum coherence among the components.

3.2 Technological Realization of Quantum 'Neurons' and 'Action Potentials'

Two-state physical objects are not difficult to find in quantum mechanics. A spin-1/2 real particle in a magnetic field, an exciton with two energy states, or a photon with two polarizations can all have superposed mixed states of the two eigenstates and serve as a "qubit", to be defined later.
Progress has been made with a number of systems, theoretically and experimentally:

i) NMR spins [16]

The quantum registers can be built of liquid hardware containing about 10^18 molecules in a strong static magnetic field. A qubit is the spin of a nucleus in a molecule. The interaction between such nodes is facilitated by quantum gates that utilize resonant oscillating magnetic fields, or Rabi pulses, used in nuclear magnetic resonance (NMR) technology. The spin-spin Hamiltonian between neighboring atoms provides the theoretical basis of information exchange. Many quantum algorithms, including Grover's algorithm, the quantum Fourier transform and Shor's algorithm, have been executed experimentally using NMR and molecules containing three to seven qubits. However, it is not possible to go up in scale, on account of the difficulty of chemical synthesis beyond 6 to 7 spins, and the measured signal drops exponentially with the number of qubits in a molecule. Hence, the idea of quantum computing with NMR has fallen out of favor.

ii) Quantum electrodynamics techniques [37]

Experiments have been performed in which a single atom interacts with a single mode or a few modes of the electromagnetic field inside a cavity, and the two states of a qubit can be represented either by the polarization states of a single photon, or by two excited states of an atom. Such QED techniques in a cavity have allowed the implementation of one- and two-qubit gates and have been used to demonstrate entanglement between quantum states.

iii) Cold ions in a trap [38]

The quantum register is a string of ions confined by a combination of static and oscillating electric fields in a linear trap known as a Paul trap. A qubit is a single ion, with two long-lived states of the ion making up the two states.
After a quantum computation is completed, the state of each ion can be measured using quantum jump detection, in which each ion is bombarded with laser light whose polarization and frequency are adjusted so that it absorbs and then re-emits photons only if it is in the upper state. The interactions between qubits result from the collective vibrational motion of the trapped string of ions. To implement the two-qubit C-NOT gate to be described later, Cirac and Zoller [40] suggested a scheme where the quantum state of the control qubit is mapped onto the vibrational state of the whole string (which is known as the bus-qubit), and laser beams are focused on that ion. A gate operation can then be performed between the bus-qubit and the target ion. The effect of a laser beam on the target qubit depends on the state of the bus-qubit, and hence it is the control. This state is then mapped back onto the control ion. The C-Z gate has been constructed by the Innsbruck group [39], using two 40Ca+ ions in a linear trap which can be individually addressed using laser beams. A generic single-qubit state is encoded in a superposition of the ground state S_{1/2} and the metastable state D_{5/2}, with a lifetime of about 1 s. In this type of system the sources of error are the heating due to fluctuating electric fields, the ambient magnetic field, and laser frequency noise. The C-NOT gate takes about 10 µs to operate, whereas the decoherence time scale is of the order of 1 ms [41].

iv) Superconducting systems [42]

In superconductors, the Cooper pairs are confined to boxes of micron size. In a Josephson junction a Cooper pair box is connected by a tunnel junction to a superconducting reservoir. The pairs enter the island one by one when a control gate electrode, capacitively coupled to the island, is varied. The island has discrete quantum states and, under appropriate experimental conditions, the two lowest energy states form a qubit.
The manipulation of one-qubit states is possible: a resonant microwave pulse of duration t induces controlled Rabi oscillations between the states. The decoherence time scale is about 10 s for this circuit [43], which is more than sufficient to implement a single-qubit gate, which takes as little as 2 ns. A two-qubit gate was recently operated using a pair of coupled superconducting qubits [44].

3.3 Scales of Coherence

Among our models, we have used quantum gates in the second and third to create the interaction between qubits. However, in the first model, to be described in the next chapter, we simply have a qubit-like two-state entity being acted on by nearest-neighbor classical sources of potential, similar to the classical sources of current in the H-H I&F model. We are presuming here that no node can see the other nodes as superpositions, but only as decohered classical objects. So, in this model the scale of spatial coherence is minimal. For the next two models we assume that two neighboring qubits interact through a quantum gate. Since each qubit can see its neighbor as a quantum superposition, the scale of coherence has increased here to the interqubit distance, at least. But in the second model it stops at the lattice distance (Fig. 3.1), whereas in the third model the coherence extends throughout the lattice, i.e. the entire network. In the last case we can use the entire lattice as a quantum register of entangled qubits. In this case all the gates must also act jointly to maintain the coherence property, whereas in the second model each node can be updated separately, with its neighboring gates acting in unison.
3.4 Time scales of quantum gates and system evolutions

Even without the decoherence problems mentioned in section 3.2, which are mostly stochastic, related to interaction with the environment and noise of various kinds, and which can be improved in better performing future devices, a quantum gate has intrinsic time scales set by quantum laws. Toffoli and others [46] have obtained expressions for such time limits. It had earlier been shown [45] that the minimal time needed for the autonomous quantum evolution of a system from one state to another one orthogonal to the initial state is given by

τ = h/(4E)   (3.1)

where E is the average energy of the system ⟨E⟩. For a single gate operation with a flip between states with energies E1 and E2, the minimal time is given as

τ = h/(2|E2 − E1|)   (3.2)

and for the operations of a flip followed by a phase change of θ it is

τ = h(1 + 2(θ mod π)/π)/(4E)   (3.3)

Figure 3.1: (a) In the classical H-H network each node receives a current k from each nearest neighbor which has fired; (b) in our first model of a quasiclassical network, every node is a qubit-like object, but sees its neighbors as decohered classical sources of potential; (c) in our second model, a quantum-gated one, every qubit interacts with its nearest neighbors seen as qubits, but the coherence length is the interqubit distance; (d) in our third model of a completely entangled network all qubits are in coherence and the gates also maintain the coherence.

Numerically, for an ion trap involving Ca+ ions, using laser light with wavelength 397 nm, the time is of the order of 10^−16 s. This appears much smaller than the present-day stochastic limits, but may become a genuine limit which has to be addressed in fast quantum computing. The situation is similar to the finite width of the action potential.
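As a sanity check on the order of magnitude quoted above, Eq. 3.2 can be evaluated for the 397 nm transition; this is an illustrative sketch using standard physical constants, not the dissertation's own computation:

```python
# Quick numerical check of the minimal gate time tau = h/(2|E2 - E1|)
# (Eq. 3.2) for the 397 nm Ca+ transition quoted in the text.
h = 6.626e-34        # Planck constant, J*s
c = 2.998e8          # speed of light, m/s
wavelength = 397e-9  # m

delta_E = h * c / wavelength   # energy splitting of the two levels, J
tau = h / (2 * delta_E)        # minimal flip time, s
print(f"tau = {tau:.2e} s")    # ~6.6e-16 s, i.e. of order 1e-16 s
```

Note that tau reduces to λ/(2c) here, so the estimate depends only on the wavelength.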
Hence, when we simulate each of the models, we shall use techniques similar to those for the finite-width classical model. The updating of each node will be done in small finite time steps, so that the small continuous effects on each other may be taken into account as the system develops in time.

3.4.1 Quasiclassical Net

Here the procedure is fairly straightforward. The primitive qubit is a recyclable two-state entity, and it feels the total potential from nearest-neighbor sources, as well as from a global field, similar to the universal current charging all neurons in the H-H classical model (Fig. 3.1b). Time dependent perturbation theory is used to find its development in time in terms of a transition amplitude, and from that we get the probabilities of transition to the other state in a nondeterministic quantum manner. In a simulation we examine the development in terms of discretized time spans, with updated values of the transition amplitudes expressed in terms of excited nearest neighbors for each qubit in turn, in an arbitrary sequence over the lattice not involving any time ordering. Then we get back to the first qubit again, and the process of updating is repeated after the set small time interval.

3.4.2 Quantum-Gated Net

In this case too we do not let the quantum gate act instantaneously, but divide the action of the gate into thin discretized time bins and consider the sequence of snapshots. Each node is considered one at a time, because of the lack of coherence beyond a nearest-neighbor distance. The gates act on the nodes one at a time, but all nearest neighbors of the qubit contribute simultaneously in each time-slab update. In actual operations a full gate action may involve a number of laser pulses, which themselves have a finite time structure, but we shall use a simplified uniform continuum in our model, which we shall discretize into uniform time slabs for simulation.
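The time-slab updating common to these schemes can be sketched for the simplest (quasiclassical) case. This is a hedged illustration, not the author's simulation code: the lattice size, time step, potential values, and the first-order amplitude-accumulation rule are all simplifying assumptions.

```python
import random

# Hedged sketch of discretized time-slab updating for the quasiclassical
# net: each two-state node accumulates a transition amplitude from its
# excited nearest neighbors plus a global field, and flips stochastically
# with probability |amplitude|^2.
L, DT, V0, V = 8, 0.01, 0.2, 0.2   # illustrative parameters
state = [[random.randint(0, 1) for _ in range(L)] for _ in range(L)]
amp = [[0.0] * L for _ in range(L)]

def neighbors(i, j):
    # four nearest neighbors on a periodic square lattice
    return [((i + di) % L, (j + dj) % L)
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]

for step in range(100):
    for i in range(L):                 # arbitrary fixed sweep order,
        for j in range(L):             # no time ordering implied
            drive = V0 + V * sum(state[ni][nj] for ni, nj in neighbors(i, j))
            amp[i][j] += drive * DT    # first-order perturbative growth
            if random.random() < min(1.0, amp[i][j] ** 2):
                state[i][j] ^= 1       # stochastic transition ("firing")
                amp[i][j] = 0.0        # amplitude reset after transition
```

The amplitude reset after each transition plays the role of the integrate-and-fire reset in the classical model.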
For any unitary operator U representing a quantum gate we get the master evolution equation at time t_i, the time of updating node i:

|i, t_i + Δt⟩ ∏_n |i + n, t_i + Δt⟩ = U_n(Δt) |i, t_i⟩ ∏_n |i + n, t_i⟩   (3.4)

where i ∈ L ⊂ Z^d is the vector position label of the qubit being updated in a d-dimensional cubic lattice, and n indicates the relative vector position labels of the nearest neighbors of any qubit. i is sequenced in time for updating within each cycle. The neighbors actually remain unchanged until it is their turn to update:

|i + n, t_i + Δt⟩ = |i + n, t_{i+n}⟩   (3.5)

3.4.3 Completely Entangled Net

In this case we have a single transformation matrix representing the simultaneous updating of all the qubits at i ∈ L ⊂ Z^d at every small time step:

∏_{i∈L} |i, t + Δt⟩ = U_N(Δt) ∏_i |i, t⟩   (3.6)

The operator U_N now involves all the qubits in the system (say numbering N), and is therefore an N×N-dimensional matrix, but fortunately of a block tridiagonal form when arranged properly in terms of nearest neighbors.

3.5 Difference between our Quantum Models and Existing Models

Though quantum computing has been a subject of great interest for some time, there has been virtually no work involving quantum networks. Zak and Williams [47] have presented the only other model we know of; it involves a completely entangled qubit set of nodes, in their case a 3-node system, and not a lattice, which we have retained as a common feature in all our models, classical and quantum. We highlight the similarities and differences between their model and our third network. Both their model and our last model have an operator acting coherently on an entangled quantum system. However, there are some significant differences:

1. Their system has no input dependence. The operator acting on the quantum wave function defines the patterns emerging, depending on the mutual interactions of the nodes only.
Different patterns may emerge as the operators keep repeating the interaction. However, the design of the operator alone defines these patterns, in a simple way. In our system, an arbitrary input signal changes the pattern in the system and stores it in some form. Hence, the former model is more like a quantum EPROM (Erasable Programmable Read-Only Memory) and ours is like a quantum RAM.

2. The former does not consider continuous time evolution of the system. The operator acts discretely and instantaneously, and a new pattern emerges due to that interaction. In our system, the system interacts with the environment and receives an input state. The operator then makes the system evolve with time in a continuous manner, and partial gate operations at each step of updating produce a richer set of patterns, not possible with the predefined complete gate operations.

3. Dynamical inputs as memory can be injected into our network.

Chapter 4
Quasiclassical Neural Network Model

4.1 Introduction

The introduction of novel nano quantum devices and the idea of quantum computers raise the possibility of new-age artificial intelligence systems based on quantum principles. Quantum factorization algorithms [16] suggest that these new intelligent devices may be able to “think” in a smarter way for certain purposes. The basic concepts of quantum systems, on the other hand, can also be used in macroscopic complex systems like social networks [48] or the financial market [49], where stochastic behaviors and the ideas of choice may be observed. Researchers have also recently been debating the possibility of quantum interactions in the human brain [50, 51, 52, 53]. Although the possibility of quantum interactions in the human brain remains an unresolved debate, some of the key ideas can be used to create artificial brains that take advantage of quantum interactions to show effects unseen in classical systems.
The experimental production of quantum bits [16], especially those using semiconductor quantum dots [54, 55], makes it plausible that artificial quantum systems will be producible in the future. In that light, we attempt to produce artificial intelligent systems based on quantum principles that are able to store patterns or exhibit stochastic memory. In the present chapter we discuss a quasiclassical system [56], and then we move to quantum-gated systems [57, 58].

4.1.1 Comparison with the Hopfield Model

It was shown that classical neural networks with a finite-width action potential [30, 31, 32], rather than ones with delta function type action potentials acting between neurons [12], may be a more realistic model of the actual human brain. Hopfield and Herz had found a simple relation between the contributions A from the neighbors, an external current I, and the time period of the firing of the network once phase-lock is established:

τ = (1 − A)/I   (4.1)

In the previous chapter, we have shown [59] that phase locking takes place for networks with an arbitrarily shaped action potential of finite width. The first part of the quantum artificial intelligence design is inspired by the original Hopfield-Herz model, but with further modification, so that the neurons “fire” stochastically in a semi-classical way and are placed in a potential field rather than being immersed in a background current. The introduction of uncertainty in the firing would introduce fuzziness in the model, which might be an interesting aspect of an artificial machine not yet produced classically.

4.2 Quantum Transitions

We first consider the simplest case of a quantum neuron placed under the influence of a time dependent interaction. A net potential may arise at a neuron due to neighboring neurons, or due to a mean field interaction averaging the effect of all other neurons, or due to an external interaction.
For a state under the effect of such a potential,

|t⟩ = U(t, t0)|t0⟩   (4.2)

where

U(t, t0) = exp[−i ∫_{t0}^{t} dt H]   (4.3)

Here the Hamiltonian H consists of a static part H0 and an interactive part V(t) from all other sources. One of the few exactly soluble cases is that of a sinusoidal interaction in a two-state system, with energies E1 and E2:

V(t) = b e^{iωt} |1⟩⟨2| + h.c.   (4.4)

This gives Rabi's formula; if the system is initially in state |1⟩, then at time t the probability that it will be in state |2⟩ is

P2(t) = [(b²/ħ²)/(b²/ħ² + (ω − ω21)²/4)] sin²[(b²/ħ² + (ω − ω21)²/4)^{1/2} t]   (4.5)

Here ω21 = (E2 − E1)/ħ. The probability of finding it in the original state is

P1(t) = 1 − P2(t)   (4.6)

Hence, sinusoidal oscillation can be observed. It is important to note that these closed, exact expressions are true only in a probabilistic sense, and the oscillations and time periods mentioned here and later are only averages over a large number of measurements of the state. Each of these measurements is individually a completely random event, as postulated by quantum mechanics.

Generally, for a more complicated interaction, time dependent perturbation theory should be used, for which the interaction picture is more convenient; the equation of motion takes the form

iħ ∂|t⟩_I/∂t = V_I(t)|t⟩_I   (4.7)

Here the interaction picture states and operators are related to Schrödinger states and operators by

|t⟩_I = e^{iH0 t/ħ} |t⟩_S   (4.8)

and

V_I(t) = e^{iH0 t/ħ} V_S(t) e^{−iH0 t/ħ}   (4.9)

H0 is the noninteracting part of the Hamiltonian. The transition rate per unit time from state |1⟩ to |2⟩, to first order in V, after dropping the I subscript for simplicity, is

Γ = (1/ħ²) |∫_0^t dt′ e^{iω21 t′} ⟨2|V(t′)|1⟩|² / t   (4.10)

This expression reproduces Fermi's golden rule when t → ∞.

4.3 Some Analytically Soluble Cases

We consider a network of quantum neurons, taking perturbative approximations [60] into account.
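Rabi's formula (Eq. 4.5) is easy to evaluate numerically; the following sketch (with ħ = 1, a convention also adopted later in the chapter, and illustrative parameter values) checks that the population transfer is complete on resonance:

```python
import math

# Rabi's formula (Eq. 4.5), with hbar = 1: probability of finding the
# two-state system in |2> at time t, starting from |1>.
def p2(t, b, omega, omega21):
    detune2 = (omega - omega21) ** 2 / 4.0
    rabi2 = b ** 2 + detune2          # squared generalized Rabi frequency
    return (b ** 2 / rabi2) * math.sin(math.sqrt(rabi2) * t) ** 2

# On resonance (omega = omega21) the oscillation is complete:
b = 0.5
t_half = math.pi / (2 * b)            # time of the first sin^2 maximum
print(p2(t_half, b, 1.0, 1.0))        # -> 1.0 (full transfer to |2>)
```

Off resonance the prefactor b²/(b² + (ω − ω21)²/4) caps the maximum transfer below 1, and P1(t) = 1 − P2(t) throughout.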
We can often get analytic results for certain types of simple, but not unrealistic, interactions. They may be relevant to nanoelectronic devices. The potential, as before, may arise in the several possible scenarios mentioned previously.

4.3.1 Constant External V, No Interneuron Interaction

The external potential V is turned on at t = 0 and remains constant at all subsequent times. Setting ħ = 1 for simplicity, we get

Γ21 = 4|V21|² sin²(ω21 t/2)/(ω21² t)   (4.11)

When we take the average time before transition to be τ = 1/Γ21 = t above, we get a relation for it:

sin(ω21 τ/2) = ω21/(2|V21|)   (4.12)

This gives multiple values of τ for the same value of V21. The lowest value, however, will stochastically dominate. If the energy difference between the two states of the neuron is small, i.e. ω21 ∼ 0, then the time for transition becomes inversely proportional to the interaction strength. This is seen in the case below (Eqn. 4.15).

4.3.2 Harmonic External V, No Interneuron Interaction

Here we have V = V0 e^{−iωt} + h.c., and hence the transition rate finally becomes, after time integration from t = 0 to t,

Γ = 2|V0|² [1 − cos((ω − ω21)t)] / [(ω − ω21)² t]   (4.13)

This gives, with t = τ = 1/Γ,

1 = 4|V0|² sin²((ω − ω21)τ/2) / (ω − ω21)²   (4.14)

Again, we see that for the same interaction, multiple average transition times of the neurons are possible. This relation is the first-order approximation of the Rabi formula above, as expected. We note that we still get a resonance when ω = ω21, with

τ = 1/|V0|   (4.15)

Interestingly, this value is independent of ω and ω21.
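Eq. 4.12 is transcendental and, as noted, has multiple solutions; a short numerical sketch (with illustrative parameter values, not taken from the dissertation) lists the lowest few transition times:

```python
import math

# Eq. 4.12, sin(omega21*tau/2) = omega21/(2|V21|), has infinitely many
# roots; the smallest tau dominates stochastically.
omega21, V21 = 0.5, 1.0
r = omega21 / (2 * abs(V21))          # must satisfy |r| <= 1
x0 = math.asin(r)                     # principal root of sin(x) = r
# Roots of sin(x) = r: x0 + 2*pi*n and (pi - x0) + 2*pi*n; tau = 2x/omega21.
taus = sorted(2 * x / omega21
              for n in range(3)
              for x in (x0 + 2 * math.pi * n, math.pi - x0 + 2 * math.pi * n))
print(taus[:3])                       # lowest average transition times
```

Note that solutions exist only when ω21 ≤ 2|V21|; in the ω21 → 0 limit the smallest root tends to 1/|V21|, consistent with Eq. 4.15.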
4.3.3 Exponentially Damped External V, No Interneuron Interaction

We now consider an external potential with an exponentially decaying form with time constant 1/a:

V(t) = V0 e^{−at}   (4.16)

On integration we get

1 = |V0|² [1 + e^{−2aτ} − 2e^{−aτ} cos(ω21 τ)] / (ω21² + a²)   (4.17)

For a small energy difference between the two states, we find a simpler expression for τ:

1 = |V0|² (1 − e^{−aτ})² / a²   (4.18)

We again check that for a → 0 we get Eqn. 4.15.

4.3.4 Damped Harmonic External Potential, No Interneuron Interaction

We have, in this more general case,

V(t) = V0 e^{−iωt−at}   (4.19)

which gives simply a shift of ω21 to ω21 − ω in Eqn. 4.17:

1 = |V0|² [1 + e^{−2aτ} − 2e^{−aτ} cos((ω21 − ω)τ)] / ((ω21 − ω)² + a²)   (4.20)

A resonance situation is observed when ω21 = ω:

1 = |V0|² [1 − e^{−aτ}]² / a²   (4.21)

We note that for small a we get τ = 1/|V0|, as in the case of constant V (Eqn. 4.15), as expected.

4.3.5 No External Potential, Constant Interneuron Interaction

In this case, though the interneuron potential v may be constant, the turning-on times of the neighbors may be randomly different. We shall get only an effective potential, smaller than the full value v, and in the long run, with averaging, this too will be a constant, say f v, with f < 1. We then get the formula Eqn. 4.15, with V0 → f v.

4.3.6 Constant External Potential and Constant Interneuron Interaction

We again get Eqn. 4.15 with V0 → V0 + f v, with f < 1, if the interneuron interaction has a random time onset.

4.3.7 No External Potential, Damped Harmonic Interneuron Interaction

We assume the potential due to its neighbors (four for a square lattice network) at any neuron to be given by

v(t) = Σ_i v0 e^{iωt + iδ_i − at}   (4.22)

This is the most general form. With appropriate limits, this can be converted to simple damped interactions or even constant interactions.
After integration and considerable algebra the transition rate is given by

Γ = (2|v0|²/t) e^{−2at} (1 + e^{2at} − 2e^{at} cos[(ω − ω′)t]) (2 + Σ_{i≠j} cos[δi − δj]) / (a² + (ω − ω′)²)   (4.23)

Here we have used the symbol ω′ ≡ ω21. Resonance is seen again at ω = ω′. If there is phase-locking, the phase factors remain constant, and we get the relation for the average transition time from

a² = K|v0|² e^{−2aτ} (1 − e^{aτ})²   (4.24)

Here K is a constant depending on the constant phases. For small values of a we again get a relation like τ ∼ 1/|v0|.

4.4 Quasiclassical Hopfield-Herz type Neural Network

We are now prepared to construct the biologically inspired model. We consider a square lattice of quantum neurons that can communicate with neighbors by means of quantum signals. The duration of interaction is varied in analogy with the action potential currents of varying widths in the classical case. We also retain a constant background potential V0, and then add the contributions ∫ dt f(t_i) v from the i-th neighbor. The integration is spread over the duration of the quantum pulse, which is set to zero at each triggering. We keep the energy difference between the two states of the neurons small, i.e. ω21 ∼ 0. In this case, we get

1 = √(Γτ) = (1/ħ) |∫_0^τ dt (V0 + Σ_i f_i v)|   (4.25)

For small perturbations this would look like

1 ≈ k′ (1 − |Σ_i ∫_0^τ dt f_i v|) / |∫_0^τ dt V0|   (4.26)

This is analogous to the classical formula [59, 12]. A certain difference between the classical and the quasiclassical model can be observed here. In the classical case it is possible to have 1 = A (in the standardized units used there) and get a zero period, i.e. an always saturated net. In this quantum case this is impossible, because the more exact expression Eqn. 4.25 cannot give a zero. This is because in the quantum case transition or non-transition at any time is never a certainty.
Here we make the further assumption that the pulse width is a duration greater than the average period, and we have to take a proportionate amount of contribution from the neighbors. This gives the relation

τ = k′/(V0 + 4v/w)   (4.27)

This is a simple relation for the average period, and can be solved in terms of the system parameters V0, the width w of the pulse, and the pulse size v. We have a randomness in the contribution from the neighbors in the quasiclassical case, which was not present in the classical model. We introduce a parameter q with the interaction v between neighbors to account for this randomness. Now we get:

τ = k/(V0 + 4qv/w)   (4.28)

4.5 Input Dependence

The memory and experience stored in the real brain arise from a person's interactions with the environment by means of “perception”. A memory device should likewise be expected to respond to the environment and interact with it, if it is to be of use. Data fed to the device can then be stored, updated, and compared with more inputs. However, the internal topology and fixed weights of the connections contribute to “preferring” one set of data over another, or relating to a certain pattern. This might be similar to the aspects of a person's personality that are genetically fixed and appear as addictions or phobias in extreme cases [61].

We next observe how our quasiclassical artificial intelligence responds to the outside world when we vary parameters. The fuzziness and “average” periodicity associated with the quasiclassical stochastic behavior is unlikely to produce exact closed equations. However, we also try to see if a simple formula like Eqn. 4.28 can give agreement with the simulation, and examine how a memory from an experience is eventually lost.
4.6 Results of Simulation

We see from our simulation that:

(1) In the case of a 40×40 lattice with periodic boundary conditions, so that it looks like an infinite lattice, we get a constant average for the neurons over about 40,000 simulations with a given set of parameters but different initial states. Hence, in an average sense we do have periodicity (Fig. 4.1) and the system is not chaotic. If we follow the history of a single neuron, we see an almost linear relation between the cumulative number of firings and time (Fig. 4.2). If we consider the average sum of all nodes of the lattice, the stochasticity almost disappears and we see a practically straight line (Fig. 4.3).

Figure 4.1: V0 = 0.2, width = 0.2, k = 0.2: typical pattern of the triggering of the neurons in the quasiclassical neural network. There is apparently no phase locking.

Figure 4.2: Cumulative number of triggerings against time. One can see a fairly regular linear behavior despite quantum stochasticity. This is for a single chosen neuron.

Figure 4.3: Same as Fig. 4.2, but for the whole system.

(2) As the strength of the signal from the neighbor increases, the time period decreases as well (Table 4.1). However, it does not appear to go to zero, unlike the classical case, as we had expected, because the quantum mechanical transition rate cannot have a singularity in this case. Hence, we can make the signals arbitrarily strong for a quasiclassical network without worrying about singularities or negative periods. The parameter q was fitted to the first value, and then Eqn. 4.28 was used to predict all the other periods accurately.

(3) The pulse width is important here as well. If the pulse is spread out, the average period becomes bigger (Table 4.2).
Once again, we get excellent agreement of Eqn. 4.28 with the simulation results, with the same value of q used in Table 4.1.

Table 4.1: Strength of Quantum Potential and Average Period of Neurons (width = 0.2, V0 = 1): Best Fit
v:      0.1    0.2    0.3    0.4    0.5    1.0
Tpred:  0.049  0.036  0.028  0.023  0.020  0.011
Tsim:   0.050  0.035  0.028  0.023  0.020  0.013

Table 4.2: Variation of Period with Duration of Quantum Potential (v = 0.2, V0 = 1): Best Fit
width:  0.1    0.2    0.3    0.5    1.0
Tpred:  0.022  0.035  0.044  0.054  0.065
Tsim:   0.023  0.035  0.043  0.054  0.065

(4) The input dependence is observed after averaging over 100 simulation runs. We used the following types of inputs:

(a) All peripheral nodes in state |1⟩ and all body nodes in state |0⟩: We see (Fig. 4.4) a smooth transition from a state with an initial firing rate proportional to the number of initially excited nodes, which dies down quickly; the system forgets the input and lets the system parameters take over with a noisy pattern, despite the averaging over the runs. It is remarkable that the initial few cycles with the memory show virtually no noise.

Figure 4.4: Transition from short-term behavior to asymptotic behavior with all peripheral nodes initially in state |1⟩.

(b) Peripheral nodes alternate between states |1⟩ and |0⟩; body nodes are in state |0⟩ (Fig. 4.5). Here we start with a smaller number of firings because of halving the initially excited nodes, and there are a few kinks in the initial cycles, most probably due to the conversion of the spatial lack of symmetry to temporal. The system moves to the common noisy asymptotic behavior after forgetting the input, as we had expected.

Figure 4.5: Same, for peripheral nodes initially in states |1⟩ and |0⟩ alternately.
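The quality of the fit of Eq. 4.28 to Tables 4.1 and 4.2 can be reproduced numerically. The dissertation fits only the parameter q; since k is not quoted, the sketch below fits both k and q to two entries of Table 4.1 and then predicts the rest, so the fitting procedure here is an assumption:

```python
# Hedged check of Eq. 4.28, tau = k/(V0 + 4*q*v/w), against Tables 4.1
# and 4.2 (V0 = 1, simulated periods copied from the tables).
V0 = 1.0
table41 = {0.1: 0.050, 0.2: 0.035, 0.3: 0.028, 0.4: 0.023,
           0.5: 0.020, 1.0: 0.013}                 # v -> T_sim (w = 0.2)
table42 = {0.1: 0.023, 0.2: 0.035, 0.3: 0.043,
           0.5: 0.054, 1.0: 0.065}                 # w -> T_sim (v = 0.2)

# Solve tau = k/(1 + 2*q*v/w) exactly at the two extreme predicted
# values of Table 4.1: (v, tau) = (0.1, 0.049) and (1.0, 0.011), w = 0.2.
t1, t2 = 0.049, 0.011
q = (t1 - t2) / (20 * t2 - 2 * t1)    # from the ratio of the two relations
k = t1 * (1 + 2 * q)

def tau(v, w):
    return k / (V0 + 4 * q * v / w)

for v, t_sim in table41.items():
    print(f"v={v}: pred {tau(v, 0.2):.3f}  sim {t_sim:.3f}")
for w, t_sim in table42.items():
    print(f"w={w}: pred {tau(0.2, w):.3f}  sim {t_sim:.3f}")
```

With these two fitted constants every other tabulated period in both tables is reproduced to within roughly 0.002, illustrating the single-parameter universality claimed in the text.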
(c) Peripheral nodes in random |0⟩ and |1⟩ states: Here too we see fairly prominent kinks (slightly less than in the oscillating |1⟩ ⇔ |0⟩ pattern of the previous case) from the interaction of the noncoherent randomness of input in the neighboring peripheral nodes, until the short-term memory disappears (Fig. 4.6).

(d) We make the whole lattice initially uniform in state |0⟩. However, due to the external driving potential, the system soon develops into the noisy final state.

Figure 4.6: Same, for peripheral nodes initially in random states of excitation.

4.7 Discussion

We have shown that many quasiclassical neural networks can be treated analytically for suitably chosen external and interneuron interactions. This may be relevant to nanoelectronic devices because of their harmonic and/or damped properties. In certain extreme cases, such as resonances and small damping, we obtain a very simple inverse relation between the time period and the effective interaction potential.

Our results, both analytic and simulated, indicate that a quasiclassical microscopic neural network, designed to mimic the classical integrate-and-fire neural network with classical action potentials, shows an average time period reminiscent of the classical model. However, the following differences are apparent for the quantum version: 1) A system with a zero time period (infinitely fast), found in the Hopfield-Herz model, cannot be observed. 2) We can use arbitrarily high potentials to link the neurons. We have also seen that the average system period depends on the system parameters as given in Eqn. 4.28 in a simple way. The choice of a single parameter predicts a wide range of potential pulse strengths and durations. Hence, the quantum system shows a universality.

We have not introduced dissipation to consider Hebbian learning.
The effects of dissipation in quantum neural networks have been studied by Altaisky [62] and Zak [47]. Such effects in the context of quantum computing are an important field of intense current study [63], because many of the advantages of quantum computing come from the entanglement of qubits. Decoherence is inevitable in our quasiclassical model, as the neurons are all allowed to fire independently, with no quantum coherence or entanglement. However, unlike these authors, we have considered large systems. Here the role of adaptivity in the quantum context becomes too complex. Other approaches for constructing and studying quantum neural nets can be found in various papers [64, 65, 66, 67, 68, 80], and comparisons of their work with ours can be found in some of them.

A very interesting possibility exists of a semiclassical neural network based on the classical motion of phononic solitons obtained from quantum considerations, e.g. Davydov solitons along a protein chain [81, 82, 83]. A nonlinear Schrödinger equation with an effective potential similar to Eqn. 4.22, with the right kind of space and time dependence of the parameter a, may produce such a soliton, though this cannot be solved perturbatively. These solitons in an α-helix assembly may act as carriers of logical data for distributed nodes in a network. The quantized signal would travel in a manner similar to a biological action potential along an axon.

4.8 Possible Applications

Although the concepts of quantum computers and related algorithms [69], together with the experimental creation of simple qubits, predict that fully quantum complex devices with qubits and gates will be feasible to produce in the future, experimental difficulties related to decoherence and noise prevent this possibility from coming into being in the very near future. However, quasiclassical devices may be created to explore the move towards a quantum world.
The analytic results found using harmonic and/or damped interactions may have an electromagnetic origin, and hence may be quite relevant in a network of nanoelectronic devices. The biologically inspired system considered here can serve as a short-term dynamic memory with a built-in mechanism for effacing all input dependence in the course of time. This is somewhat similar to the retina, which triggers once a certain threshold of intensity is reached [70] and then goes back to the initial state to receive a second input. In the human brain too, short-term memories exist together with long-term ones, and while the fully entangled devices described later may act as memory holders, quasiclassical devices such as this may act as temporary “experience” devices, where a certain experience depending on a threshold triggers an action before that “experience” is effaced. The fuzziness introduced by the stochasticity is similar to a non-quantitative experience that lingers for a short time, which may be compared with emotions evoked by fuzzy ideas in the back of the mind. While any such mechanism in the human brain is still controversial, it is possible to mimic fuzzy short-term “feelings” triggering certain actions based on the intensity of a perception in a device that acts “by instinct”. However, whether such concepts need quantum indeterminism, or can be explained by classical stochasticity, is a debatable question which we shall not go into.

Chapter 5
Quantum-gated Neural Networks

5.1 Introduction

In this chapter we move one step further towards the creation of quantum artificial intelligence by introducing qubits and gates.
While we observed the effects of stochasticity when a quasiclassical model was introduced in the last chapter, the introduction of qubits and gates can produce transformations of quantum states instead of firings of neurons, so that a memory is represented by the states of qubits in a transformation space where the rules are created by the connecting quantum gates. This is similar to a cellular automaton, with the gates dictating the rules of transformation of a qubit. The rules we program into the quantum-gated lattice will, however, still be inspired by the classical neural network model, so that this gated artificial intelligence is still not a fully entangled device, but a classical neural network recreated with quantum technology.

In a classical neural network, the plasticity of the weights [15] with which the neurons are connected to each other contributes to the learning process. Though classical learning is a one-way irreversible process, with permanent changes made to the hardware of the device, in a quantum device the unitarity of transformations would make the “learned states” reversible. Hence, learning at the quantum level cannot be realized in the same manner as classical learning, and a collapse of the quantum system to a well-defined eigenstate might be necessary in order to have the results of the internal operations retrievable by coupled peripherals. However, the parallel processing of information within a quantum device promises the possibility of handling data at speeds unattainable in classical machines.

5.2 Hierarchical Structure in Quantum Machines

In biological nervous systems, the input data is first processed locally and sent to the central system in a form that best suits the analytic operations of the central nervous system (CNS) (see, e.g., [10]). This pre-processing helps partly to avoid overburdening the CNS, so that it can perform its operations with greater efficiency and less probability of chaotic interference.
For example, in the eye, the primary signals from the retinal rods and cones are processed by bipolar, horizontal, amacrine and ganglion cells, which themselves form an intricate network within the eye, before the signal reaches the brain. In classical computers, most peripherals have their own “brains” for this preprocessing of data before sending it to other parts of the machine. Hence, it seems most probable that in quantum computing as well, the entire machine may have a hierarchical structure of smaller quantum networks. Each of these structures would perform more efficiently than the corresponding classical analogue because of its quantum nature. It would be extremely difficult to build a gigantic quantum machine as a single unit, since larger systems would undergo decoherence more easily and, hence, would be less manageable.

Research is also being carried out to design electronic devices mimicking living nervous systems, with sensory organs like the retina and the cochlea [71, 72, 73, 74] as electronic perception devices. Similarly, the motor-control subunits of living systems have also been copied electronically [75, 76]. Hence, we may assume that future quantum intelligent machines, used either in information processing or in decision-making, may have component subunits that operate quantum mechanically but are connected classically to exchange the processed information. Each of these components may be small enough to manage decoherence, and some of these peripheral units might be quantum networks.

In the biological nervous system, input data is converted into a train of pulses (Fig. 5.1). This is achieved by a neural system where the action potential is triggered as long as above-threshold input signals persist. Short-term memory of the CNS is also created in the form of dynamic pulses within a subsystem that is more strongly interconnected.
This connectivity is either genetic, or due to the formation and strengthening of links coming from similar repetitive experiences. In the next section, we explain the nature and operation of the quantum gates. The quantum gates act on superposed quantum microstates called qubits and are the quantum analogues of logic gates used with electronic devices such as transistors.

Figure 5.1: A train of action potential pulses in a biological neural system (voltage swinging from the −60 mV resting potential to +40 mV peaks, with a repolarization overshoot; pulse width about 2 ms).

These gates have been experimentally produced [77] and their operations verified. We construct a network with such gates and qubits. This network can act as a component of future realistic quantum devices. To fully exploit the notions of computer devices and the existing quantum algorithms, the connectivity of the nodes of quantum devices needs to be studied extensively. Although quantum gates and qubits have been used to design quantum algorithms to calculate specific mathematical expressions, quantum networks have not yet been designed properly to utilize quantum mechanics for the purpose of learning and pattern recognition. Altaisky [62] has made some preliminary investigations into a single quantum perceptron. As was stated before, the irreversibility of the learning process is worth looking into when dealing with quantum machines that can only undergo reversible unitary transformations. Attempts have been made [78, 79] to explain such a change with decoherence at the output with reversal of the intermediate processes in a quantum computer. The transition to a certain eigenstate is inserted in an ad hoc manner for quantum neural networks, as in Altaisky's work and also in that of Zak et al. [47]. Here, we will not get into the complexities of decoherence and retrieving information from a quantum neural network.
Rather, we concentrate on examining the input dependence of such a quantum gated network to observe how the qubits react and form patterns. Some other authors [65, 66, 67, 68, 80] have tried to formulate the problem from different perspectives, and a comparison of our approach with theirs may be found in the cited references. Our study will concern mainly the nature of dynamic memory in these networks, and how they evolve. However, we review the basic concepts of qubits and gates first and then go into the mathematical modeling and simulation experiments.

5.3 Brief Review of Elements of a Quantum AI Machine: Qubits and Quantum Gates

5.3.1 Qubits

A quantum machine is markedly different from a classical one because of quantum superposition, which allows each quantum bit or qubit to exist as a superposition of multiple possible states. Each of these eigenstates in general may represent a different value of the measurable quantity, i.e. if the quantity is measured in any experiment. In that case, the measured value must be one of the eigenvalues of the corresponding operator. However, quantum uncertainty makes it impossible to predict in advance which eigenvalue will be obtained in each particular experiment, although an ensemble would give stochastic classical values corresponding to the probability of each of the states. The coefficients used to construct the linear superposition of eigenstates, which are in general complex numbers, represent the quantum amplitudes of the particular eigenstates, and can be interpreted as the square roots of the probabilities for the superposed quantum state to collapse into one of the many possibilities. Each measurement is realized only through interactions with classical devices that act as detectors, and its result is highly indeterministic, although the probability of each outcome depends on the squared modulus of the corresponding probability amplitude.
Several models exist that try to bridge the link between the quantum and the classical world [84, 85, 86, 87]. However, although the quantum measurement process is an open problem in physics, experimental results indicate that the quantum world and the classical rules co-exist in different limits, although the overlap between the two is still poorly understood. We keep the philosophical/speculative or developing models outside the scope of this dissertation, and stress how quantum bits can be used to get classical results in an efficient way. In a quantum bit with superposed states, if the unit is a particle with spin 1/2, then quantum mechanics allows a measurement process to give only one of the two possibilities for the z-component: spin up, with $s_z = \frac{1}{2}$, or spin down, with $s_z = -\frac{1}{2}$, although the state of the system before the measurement may be a superposition. Symbolically,

$$|\psi\rangle = c_0 \left|-\tfrac{1}{2}\right\rangle + c_1 \left|+\tfrac{1}{2}\right\rangle \qquad (5.1)$$

However, experiments indicate that at present it seems more likely that polarized light, with two spin projections or helicities, may turn out to be a better candidate for qubits than material particles with spin 1/2 such as the electron or the proton, because quantum gates already exist to operate on photons. Until decoherence or measurements take place, collapsing the entire quantum network into one of the possible macrostates, the qubits are allowed to interact with one another and change the coefficients of the superposed states, so that the probability of the system collapsing into a specific microstate changes dynamically according to the rules programmed into the gates to update these coefficients.
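As an illustration of the amplitude-to-probability relation just described, the following short sketch (our own code, with arbitrary example amplitudes not taken from the dissertation) samples an ensemble of measurements on the qubit of Eq. 5.1:

```python
import numpy as np

# Illustrative sketch (the example amplitudes are our own choice): a
# spin-1/2 qubit |psi> = c0|-1/2> + c1|+1/2>, as in Eq. 5.1.  The squared
# modulus of each amplitude gives the probability of that outcome.
c0, c1 = 0.6, 0.8j                 # complex amplitudes, |c0|^2 + |c1|^2 = 1
p_down = abs(c0) ** 2              # probability of measuring s_z = -1/2
p_up = abs(c1) ** 2                # probability of measuring s_z = +1/2

# A large ensemble of identical measurements reproduces these
# probabilities as stochastic classical frequencies.
rng = np.random.default_rng(0)
outcomes = rng.choice([0, 1], size=100_000, p=[p_down, p_up])
print(p_down, p_up, outcomes.mean())
```

Each individual draw is indeterministic, but the ensemble frequency converges to the squared modulus of the amplitude, as the text states.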
The gate may depend on more than one input and must be precisely positioned so as not to introduce any bias towards any of the inputs. Let us first consider a single-input quantum gate, the NOT gate. It can be represented by the equation

$$U_{NOT}(c_1|1\rangle + c_0|0\rangle) = c_1|0\rangle + c_0|1\rangle \qquad (5.2)$$

which corresponds to a spin flip for both the components. This gate performs a unitary transformation, a rotation in this case, in the Hilbert space of quantum vectors. Unitary operators keep the norm of a state vector unchanged, i.e. if we have the operator $U$ operating on the usual state vector $|\psi\rangle$ in the Hilbert space, and its Hermitian adjoint $U^\dagger$ is the corresponding operator in the adjoint Hilbert space, where the mirror state vector is $\langle\psi|$, then

$$\langle\psi|U^\dagger U|\psi\rangle = \langle\psi|\psi\rangle \qquad (5.3)$$

implying that the operator $U$ and its Hermitian adjoint $U^\dagger$ are related by

$$U U^\dagger = U^\dagger U = 1 \qquad (5.4)$$

Hence, the mirror of the operator $U$ operating in the adjoint state vector space is simply its inverse operator, and the norm of the vector, given by the scalar product of the adjoint vector and the original vector, $\langle\psi|\psi\rangle$, stays invariant. The conservation of the norm ensures the closure of the entire quantum system, so that even as the probability of a certain state changes, the total probability of finding the system in one of the possible eigenstates remains the same. This means that the system will definitely collapse to one of the possible eigenstates. The novelty of quantum gates is that, unlike in the classical case, there are quantum gates [16] representing the "square root" of NOT, where the rotation is only halfway through for the two eigenstates. There are also phase transformation gates that selectively change the phase of one component of a superposition. In the case of classical gates, there is an important theoretical result, which has also been applied to construct many circuits. This states that all the different types of multi-bit gates, viz.
AND, OR, NAND, NOR and XOR, can be constructed from combinations of only one universal gate: the NAND gate. A similar theorem may hold for single-qubit or multi-qubit operations. The universal set of all unitary transformations may be represented by combinations of the controlled NOT or cNOT gate as described below, and two single-qubit gates: one gate that changes the phase difference between the coefficients $c_1$ and $c_0$ in a particular ratio, and a Hadamard gate that mixes up the up and down components with different signs, as represented by the matrix

$$U_H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad (5.5)$$

The controlled NOT or cNOT gate flips a target qubit depending on whether the control qubit is in the on state or the off state. If the c-qubit is $|1\rangle$, then it flips the target (NOT). If the c-qubit is in the $|0\rangle$ state, it does nothing. The c-qubit itself is never changed in the operation. Hence, the gate can be represented by a matrix in the product state space $H_c \otimes H_t$ of the control qubit and the target qubit, where the product eigenstates can be listed in vector form as $|0\rangle_c|0\rangle_t$, $|0\rangle_c|1\rangle_t$, $|1\rangle_c|0\rangle_t$ and $|1\rangle_c|1\rangle_t$. The matrix form of the cNOT operation, thus, may be given by

$$U_{cNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{pmatrix} \qquad (5.6)$$

Here the first two product states remain unchanged as the control qubit is 0, and the next two product states are interchanged, with the c-qubit remaining unchanged at 1 but the t-qubit flipping. For unitarity purposes, a phase change during flipping is needed, as can be seen easily when real coefficients are used for the qubit superposition. If the control and target qubits are

$$|c\rangle = a|0\rangle + b|1\rangle, \qquad |t\rangle = c|0\rangle + d|1\rangle \qquad (5.7)$$

then normalization of the states demands that

$$|a|^2 + |b|^2 = 1, \qquad |c|^2 + |d|^2 = 1 \qquad (5.8)$$

After the cNOT operation we get

$$|c'\rangle = |c\rangle, \qquad |t'\rangle = (ac + bd)|0\rangle + (ad - bc)|1\rangle \qquad (5.9)$$

and it can be checked with real coefficients that normalization of the target qubit is preserved.
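The gate matrices above can be verified numerically. The sketch below (our own code; the variable names are ours) checks the unitarity condition of Eq. 5.4 for the NOT, Hadamard and phase-modified cNOT matrices, and confirms that the cNOT of Eq. 5.6 preserves the norm of a normalized two-qubit product state:

```python
import numpy as np

# Gate matrices as given in Eqs. 5.2, 5.5 and 5.6 (basis order for the
# two-qubit gate: |0>c|0>t, |0>c|1>t, |1>c|0>t, |1>c|1>t).
NOT = np.array([[0, 1],
                [1, 0]], dtype=complex)
HADAMARD = np.array([[1, 1],
                     [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],          # cNOT with the phase change
                 [0, 1, 0, 0],          # needed for unitarity (Eq. 5.6)
                 [0, 0, 0, 1],
                 [0, 0, -1, 0]], dtype=complex)

# Unitarity check, Eq. 5.4: U U^dagger = 1.
for U in (NOT, HADAMARD, CNOT):
    assert np.allclose(U @ U.conj().T, np.eye(U.shape[0]))

# Norm preservation for a normalized product state (Eqs. 5.7-5.8),
# with real coefficients as in the text.
a, b = 0.6, 0.8        # control amplitudes
c, d = 0.8, 0.6        # target amplitudes
state = np.kron([a, b], [c, d])
out = CNOT @ state
print(np.linalg.norm(out))
```

Since the gate is unitary, the output norm equals the input norm, consistent with the normalization argument following Eq. 5.9.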
Like a quantum computer, a quantum information storage machine is most likely to be built with quantum gates, and there may be appropriate combinations of Hadamard, phase change and cNOT gates to produce different types of processing, representing the unitary transformations required in the algorithm. It is possible that in the future the arrangement of gates may be automatically controlled by the problem, and, hence, a quantum neural network may be more faithful in mimicking a biological brain than a classical neural net. The only firmware in the latter are the EEPROMs (Electrically Erasable Programmable Read-Only Memory) used in applications such as the BIOS, and though the programming of EEPROMs is relatively straightforward, it is also very time-consuming. Additionally, for the quantum machine, the signals acting on the gates are at the quantum level and hence travel at the speed of light, leading to very fast data processing. The high connectivity of neurons within the brain (where one neuron can reach up to 10,000 other neurons [88]) also suggests that quantum machines might be able to mimic biological neural nets better than electronic ones. Quantum gates acting on photon nodes may have multiple connectivity, on account of the possibility of using coherent extended wave functions, while in classical electronic gates, connectivity is usually limited to a few inputs and outputs. However, we refrain from speculating further here and concentrate on the possibility of designing a simple system based on fixed links between the nodes. Our studies in the later sections will indicate the major differences a quantum network would have from a classical network. These results may later be used when more complex network systems are created. In particular, we would like to examine how static input data is transformed into dynamic memory within a quantum network.
This property is observed in biological networks, and such results obtained from quantum networks will indicate the feasibility of designing quantum networks as more faithful imitations of the living brain, since electronic neural networks have so far not been able to reproduce this characteristic. The so-called "dynamic memory" in the form of RAM retains data in static patterns as well, even though it is refreshed regularly. However, short-term biological memory [89], which is often used in quick decision-making, is dynamic and is expressed as an expanded pattern of oscillations in the network. We now examine the behavior of a simple quantum network where the qubits are connected with quantum gates, as a step towards future complex quasibiological quantum networks.

5.4 The Quantum Neural Network Model

As was seen previously, in the I&F model a neuron receives a current from the fired neighbors, and when its own potential exceeds the threshold, it fires as well, feeding its own neighbors. Since in a quantum process all transitions of the neurons must be designated by unitary operators, in place of the firing of a neuron we have a less spectacular unitary transformation that simply performs a rotation of the state vector of the qubit. This operation should in principle involve time too, and thus we write

$$|t\rangle = U(t, t_0)|t_0\rangle \qquad (5.10)$$

to indicate the transformation of a neuron from time $t_0$ to time $t$. For small time changes it is possible to write

$$U(t + dt, t) = U(t, t) + i\,dt\,H \qquad (5.11)$$

so that in the lowest order, with a Hermitian operator $H$ (the Hamiltonian),

$$d|t\rangle = i\,dt\,H|t\rangle \qquad (5.12)$$

As stated earlier, in quantum computing a complete set of unitary operators may make use of Hadamard gates, phase change gates or controlled NOT ($cNOT$) gates. Entanglement between different nodes may be manipulated by using these gates, e.g. the $cNOT$ or the Toffoli gate, which can function as an adder.
For simple biologically inspired quantum machines, it is not necessary that the entire system be entangled. At the lowest non-trivial level, it is possible to have pairwise entanglement. However, after successive operations, the entanglement may spread to the entire network. This bears similarity to obtaining a dense matrix from the multiplication of a large number of sparse matrices with nonzero elements at different positions. The basic postulates dictating the rules to be designed into the gates are as follows:

1. each qubit represents a neuron;
2. an excited neuron |1⟩ will turn on a neighbor in a ground state |0⟩, i.e. flip it to the state |1⟩;
3. an excited state will make an excited neighbor "fire" and flip back to |0⟩ [induced emission];
4. the excited state itself will go down to the ground state |0⟩ in the process;
5. an unexcited neuron stays inert, with no effect on its neighbors or itself.

The design of the gates based on these rules can be done as follows. Postulates 2, 3 and 5 are satisfied by adding cNOT gates to each "operating" neuron and its neighbor, so that the state of the operator serves as a condition for flipping the neighbor when needed. There are four neighbors for each neuron in the square lattice we consider. Hence, in place of $cNOT$ gates we shall need $cNOT^4$ gates, where one controller flips all four neighbors if it is in state |1⟩ and does nothing if it is in state |0⟩. An AND gate connecting every neuron with a common |0⟩ state after the $cNOT^4$ gate can satisfy postulate 4. The cNOT gate can be represented by

$$U = \begin{pmatrix} 1 & 0 \\ 0 & i\sigma_2 \end{pmatrix} \qquad (5.13)$$

for any particular neighbor, where $\sigma_2$ is the flipping Pauli matrix with a phase change, needed to preserve normalization of the target qubit, as explained previously. Eqn. 5.13 represents a unitary operator. However, we connect the $cNOT$ matrices using a weight factor, $\epsilon$, to represent the strength with which a neuron can affect its neighbor.
This weight factor differs from the weight factors in classical neural networks, which must sum up to a specific normalized value. Instead, this quantum net weight acts as just a measure of the strength with which the neurons are able to affect their neighbors. Another Hermitian operator in the qubit space can represent the $.AND.|0\rangle$ operation:

$$U_0 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \qquad (5.14)$$

Here the sequence of states in the rows and columns is, as usual, $|1\rangle|1\rangle$, $|1\rangle|0\rangle$, $|0\rangle|1\rangle$ and $|0\rangle|0\rangle$, the first being the controlling state. The AND operator used here is not unitary, although it is Hermitian, as it is a projection operator. It is the nonunitarity of the AND operator that is responsible for the collapse of the state to the ground state after it has reached a threshold. This is not the ideal quantum computer operation, where all gates should be unitary, but is a hybrid of unitary connections among neurons and a nonunitary collapse. This mix satisfies two much-desired characteristics of this network: the introduction of unitary rotations of the neurons in the qubit space, making our device fast and efficient, and also the guarantee of collapse to the ground state by the AND operator. This "collapse", however, simply restarts the rotational activity of a neuron when connected to an excited neighbor, and, hence, is not quite the same as decoherence. Here, states do not simply collapse probabilistically to one of the components of the superposed states and delete all memory; rather, transitions take place to a specified state, and the timing of the collapse holds information about when that specific threshold was reached, enabling a time sequence pattern. Finally, to express that at each node the controlling qubit remains unchanged by its own action, we formulate

$$\begin{pmatrix} c \\ s \end{pmatrix} \to \begin{pmatrix} c \\ s \end{pmatrix} \qquad (5.15)$$

and to formulate the change for the neighbors receiving a signal from it, i.e.
operated on by the $cNOT^4$ gates, we have

$$\begin{pmatrix} c' \\ s' \end{pmatrix} \to \begin{pmatrix} c' \\ s' \end{pmatrix} + \epsilon \begin{pmatrix} -s'c \\ c'c \end{pmatrix} \qquad (5.16)$$

As the small-$\epsilon$ approximations of the unitary operators are not themselves unitary, it is necessary to renormalize each qubit at each step in the simulation.

5.5 Results of Simulation

For ease of comparison, we construct a 40×40 network of qubits with periodic boundary conditions, similar to the simulations we carried out for the classical case [59], so that it behaves in certain ways as an infinite lattice. Although it is very difficult to create an experimentally viable fully entangled large network because of issues like complexity, a theoretical study of such systems is unavoidable. We use the peripheral neurons to feed data into the system and make the inside neurons either all random or all zero [(0, 1)]. The qubits are updated according to Eqns. 5.15 and 5.16. A large number (40,000) of time steps is run for various $\epsilon$ values. This parameter, representing the strength of coupling, occurs together with $dt$, and, hence, may also indicate the width of the pulse at each time step if we compare with the classical neural network model. We have assumed that though the quantum gates may eventually flip a state, the process does not take place instantly, but proceeds through the usual continuous time development via an appropriate Hamiltonian, which can be broken up into smaller bits of discrete transformations with small time intervals. This assumption is not only legitimate, but obligatory in the context of quantum dynamics, and highly desirable in satisfying our main objective of comparing the quantum case to the classical. The following observations were made for different runs. In the first model each neuron was allowed to go up to its top qubit value of (1, 0), with c = +1 or −1, before it "fired", i.e. came down.
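A minimal simulation sketch of this update scheme follows (our own code; the function and variable names are illustrative assumptions, and the simultaneous sweep below is a simplification of the dissertation's actual program). It applies the neighbor rotation of Eq. 5.16 over the four lattice neighbors with periodic boundaries, renormalizes each qubit, and collapses above-threshold qubits to (0, 1):

```python
import numpy as np

def step(C, S, eps=0.01, c_thresh=0.7):
    """One sweep: Eq. 5.16 from all four neighbors, renormalization,
    and threshold collapse of a qubit (c, s) to the ground state (0, 1)."""
    dC = np.zeros_like(C)
    dS = np.zeros_like(S)
    for axis in (0, 1):
        for shift in (1, -1):
            c_nb = np.roll(C, shift, axis=axis)   # periodic boundaries
            dC += eps * c_nb * (-S)               # c' part of Eq. 5.16
            dS += eps * c_nb * C                  # s' part of Eq. 5.16
    C, S = C + dC, S + dS
    norm = np.sqrt(C ** 2 + S ** 2)               # renormalize each qubit,
    C, S = C / norm, S / norm                     # as the text requires
    fired = C > c_thresh                          # threshold "collapse"
    C[fired] = 0.0
    S[fired] = 1.0
    return C, S

n = 40
C = np.zeros((n, n))                              # interior qubits in (0, 1)
S = np.ones((n, n))
C[0, :] = C[-1, :] = C[:, 0] = C[:, -1] = 1.0     # excite the periphery
S[0, :] = S[-1, :] = S[:, 0] = S[:, -1] = 0.0
for _ in range(100):
    C, S = step(C, S)
print(C.sum())
```

The per-step renormalization compensates for the small-ε update not being exactly unitary, as noted after Eq. 5.16.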
It was observed, interestingly, that in this case no well-defined periodicity existed, either for any single neuron or for the correlation between nodes in the network (Figs. 5.2, 5.4). Though all neurons do indeed go through the (1, 0) to (0, 1) cycles, the oscillations appeared aperiodic. An explanation could be that the exact equation of motion coupling the nearest-neighbor neurons becomes insoluble in terms of periodic functions. However, it should be noted that, unlike a biological action potential voltage, the time series plotted in Fig. 5.2 and Fig. 5.4 are not directly measurable physical quantities, but quantum amplitudes. As stated before, the modulus squared of the c-part of the qubit shown in the plots corresponds to the probability of observing the pure state |1⟩ at the given time step shown along the x-axis. Similarly, in Fig. 5.3 and Fig. 5.5 the quantum correlation amplitude is shown, whose modulus squared gives the varying probability of observing the indicated physical subsystems (qubits) in the same eigenstate as time progresses. A c-amplitude of +1 or −1 for a qubit component indicates unit probability, i.e. certainty, as the state is then a pure |1⟩ eigenstate, and not a superposition of eigenstates. In that case, at the corresponding time steps, measurements on the system would retain the purity of the state. Measurements carried out at other times, with a c-amplitude whose modulus is less than unity, would have a |0⟩ component, and, hence, may give either 0 or 1 when measured. However, after collapse, the subsystem becomes a pure state. Similarly, a correlation amplitude of either 1 or −1 indicates a perfect correlation (both qubits giving 0 or both 1, when measured together). On the other hand, a correlation amplitude of 0 indicates a perfect anti-correlation, the two qubits always giving opposite single-neuron c values on simultaneous measurement.
Figure 5.2: Oscillations of the c part of a qubit for a non-cutoff model with ε = 0.01. This is the quantum amplitude for the state |1⟩. The modulus squared of this quantity gives the probability for a particular measurement to find the state to be |1⟩, though individual experiments may give either 0 or 1.

Figure 5.3: Oscillation of the summed c parts of all qubits in the network for c_thresh = 0.7, ε = 0.01. All boundaries excited initially.

A slightly different version of the model of the network, more akin to the classical one, was also experimented with. Here a threshold for the excited part of the neuron was introduced. When crossed, this caused the qubit to jump to the ground state (0, 1), i.e. if c > c_thresh, then (c, s) makes a transition to the (0, 1) state. If the quantization axes of the nodes and of the AND gate are not the same, the corresponding rotation matrix would make a (c_s, s_s) qubit look like a (1, 0) to the gate, making it drop to the ground state prematurely. More interesting results were found with this latter model. Periodic oscillations were seen in the system, with all neurons in the same phase. The threshold was made 0.7, which is just below 1/√2, corresponding to equal mixing of (1, 0) and (0, 1), and turned out to be the critical threshold that gives regular oscillations. Mathematically, this critical behavior may exist because the cut-off effectively serves to truncate the complicated coupled behavior of the system at this value, reducing it to a simpler periodic system, just as the truncation of a transcendental function by a polynomial with a finite number of terms provides it with a simpler behavior.
The function

$$F(t) = \cos(\theta + \sin(\epsilon\theta)) \qquad (5.17)$$

with $\theta = \omega t$, for example, assumes the periodic form

$$F(t) = \cos[(1 + \epsilon)\omega t] \qquad (5.18)$$

for $\epsilon\theta \ll 1$ only, but has a more complicated behavior when this condition is not satisfied. This phenomenon needs to be studied further. However, the other reason for this behavior might be that the rate at which the AND operator acted was different from the rate of the cNOT operator acting on the system, and the coupling of the two incongruous frequencies yielded a complex pattern. We note that these oscillations are seen in the behavior of a single neuron (Fig. 5.4), in the sum over all neurons of the system, and even in the correlation ⟨i|j⟩ between neuron |i⟩ and neuron |j⟩ (Fig. 5.5).

Figure 5.4: Correlation between two qubits for the no-threshold case with ε = 0.01, ⟨10, 10|20, 21⟩, where the qubits are located by their (x, y) coordinates in the lattice.

Another interesting observation is that, for large ε (> 0.7), if we put the initial signal only at two parallel sides of the square, oscillations do not appear, but a static asymptotic state is quickly reached, whereas if we put the signal on all four sides, we get periodic oscillations with changed frequency. The lack of constraints in the orthogonal directions might be responsible for allowing the pattern to settle down to a static state in the first case. This is similar to the one-dimensional Ising model having a trivial phase transition. When signals arrive from both the x- and y-directions, the attractor for the system becomes dynamic, as it cannot find a stable equilibrium when it tries to adjust in both directions.

Figure 5.5: Correlation ⟨10, 10|20, 21⟩ for the above case.
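The truncation argument of Eqs. 5.17 and 5.18 can be verified numerically. The short check below (our own sketch) shows that the two expressions agree closely while εθ ≪ 1 and diverge when that condition fails:

```python
import numpy as np

# Numerical check of Eqs. 5.17-5.18: F(t) = cos(theta + sin(eps*theta))
# reduces to the periodic cos((1+eps)*theta) only while eps*theta << 1.
eps = 0.01
theta_small = np.linspace(0.0, 1.0, 1000)     # eps*theta <= 0.01
theta_large = np.linspace(0.0, 500.0, 5000)   # eps*theta up to 5

def F(theta, eps):
    return np.cos(theta + np.sin(eps * theta))

def F_periodic(theta, eps):
    return np.cos((1.0 + eps) * theta)

err_small = np.max(np.abs(F(theta_small, eps) - F_periodic(theta_small, eps)))
err_large = np.max(np.abs(F(theta_large, eps) - F_periodic(theta_large, eps)))
print(err_small, err_large)   # tiny in the first regime, O(1) in the second
```

The discrepancy in the small-θ regime is bounded by |sin(εθ) − εθ| ≤ (εθ)³/6, so the truncated form is an excellent approximation there, while at large εθ the two curves differ by an amount of order one.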
5.6 Discussion

In this chapter we have shown that a quantum neural network similar to the biological integrate-and-fire neuron network can be constructed with qubit nodes connected by cNOT and AND gates. We have noted that when no threshold is imposed, the system converges to a dynamic state with no fixed period and no phase locking, apparently similar to a chaotic system, but with an average non-chaotic behavior. Dynamic behavior emerges when a threshold is set for the qubits to "collapse" to the ground state. The period is almost inversely proportional to the coupling strength (Fig. 5.6), but is nonlinear for strong coupling.

Figure 5.6: Variation of periodicity with ε for excitations from all sides.

If the coupling is reasonably strong and the initial excitations are only in one direction, the system seems to converge rapidly to a static attractor. However, with excitations from both directions of the square lattice, dynamic oscillations are observed. The correlation between neurons, as measured by the overlap of the two qubits ⟨i|j⟩, also shows periodic time dependence when the oscillations have fixed periods. Interestingly, we have found that although these models are constructed as simple lattices, they can hold dynamic memories of the input indefinitely. The pattern generated may be more interesting when complex phases are introduced in the coupling, and also if a dynamic external agent affects the peripheral neurons, rather than just an initial input. Delay lines can also be placed between the neurons to introduce a time scale.
These results further point to the similarities between biological networks and these quantum nets, as in the neural network of a biological entity too the complexity of the dynamical behavior shows almost chaotic patterns, while in specific stages of alertness, as is known for human beings, the network shows average periodicities such as the alpha, beta, gamma and delta waves. Evolution always seeks the most efficient biological systems that can survive well, and has produced such a system where the dynamical signal can damp out. Adjusting the threshold allows us to mimic such behavior in our system as well. The inbuilt erasing mechanism may be advantageous for a quantum AI that may be able to refresh its memory after one set is exchanged with peripheral devices. Hence, future engineering developments in AI that follow the biological path (see for example [90, 91] for electronic attempts) may find it useful to study quantum neural networks that are able to mimic the biological brain in many ways. Recent studies show [92] that epileptic seizures in the human CNS are preceded by a loss of the normally present chaotic firings, which indicates the possibility of large classical deterministic AI systems going into phase-lock freezing. Quantum networks can avoid this automatically on account of their stochastic nature. The ability of bio-systems to handle ill-posed data in a more effective way also evokes the belief that quantum gated machines, with their inherent probabilistic nature, may achieve similar robustness in processing real-life situations with incomplete complex data.

Chapter 6

Fully-entangled Neural Networks Model

6.1 Introduction

In this chapter, we exploit the full advantage of quantum entanglement and try to construct an effective storage system based on the large capacity promised by quantum superposition. In the last chapter, we only presented pairwise entanglement in the first order.
We now discuss some results with a more complex model of a completely entangled quantum network on a computable scale. We construct the simplest nontrivial quantum network connected with gates and perform simulation experiments. Networks constructed with ordinary cNOT gates need considerably more computing time in simulation than ones where the modified c′-NOT form of the gates is used. We also show that the periodicity found in the case of the pairwise entangled quantum nets cannot be found in a fully entangled network. Finally, we show how ab initio periodicity may be introduced by hand to make this fully entangled network more similar to a biological memory device.

6.2 An Entangled Quantum Network Model

We start with an n×n lattice with the usual periodic boundary conditions, so that it can effectively mimic a bigger lattice. We now have N = n×n independent nodes, each of which is a qubit, e.g. a spin-1/2 object. These nodes are individually connected to their neighbors with cNOT gates. This simple design allows us to study the general behavior of an entangled quantum network that is not built to handle a specific task. Hence, we will be studying how robust a quantum network connected with gates that allow full entanglement is when input data is fed to it. The entire system at any time can be represented by the state

$$|\psi\rangle = \sum_I a_I |\psi_I\rangle \qquad (6.1)$$

The complete set of unentangled product basis states includes all possible combinations

$$|q_1 q_2 \ldots q_N\rangle \qquad (6.2)$$

where $N = n^2$. Each of the qubits can be either in state |1⟩ or in state |0⟩ here. Initially we may have a pure state with only one $a_I = 1$, and all others zero. However, as the entanglement is allowed to proceed through the cNOT gates between the nodes, all or a subclass of states may be expected to become entangled. By choosing a non-factorizable superposition of the product states, we can also choose an entangled initial state.
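The contrast between the two state-space sizes can be made concrete with a trivial count (our own illustrative snippet): the pairwise model of the previous chapter tracks two amplitudes per node, while the fully entangled state of Eq. 6.1 needs one amplitude per product basis state of Eq. 6.2:

```python
# Our illustrative count of state-space sizes: the pairwise model stores
# 2 amplitudes per node (2*n*n numbers, polynomial in n), while full
# entanglement over the product basis of Eq. 6.2 needs 2**(n*n)
# amplitudes (exponential in n).
for n in (3, 4, 40):
    nodes = n * n
    pairwise = 2 * nodes
    entangled = 2 ** nodes
    print(n, pairwise, entangled)
```

This is why the fully entangled simulation is restricted to the 3×3 lattice (512 amplitudes), while even a 4×4 lattice already requires 2^16 amplitudes per step.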
The effect of each gate on every $a_I$ needs to be considered carefully at each time step. The cNOT gate will flip the controlled qubit if the controlling qubit is in state |1⟩, while doing nothing if the controller is in state |0⟩. The controller node is unchanged. Each node is taken in turn, and the effect on all $a_I$ as the neighbors of this node act on it is considered, in a procedure similar to that described in the earlier unentangled version. We first do the flipping in a continuous manner by choosing for each time step $dt$ the transition submatrix for a small change, as we had done in the previous case, and get

$$A = \begin{pmatrix} 1 & 0 \\ 0 & \exp(i\epsilon\sigma_1 dt) \end{pmatrix} \approx \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & i\epsilon dt \\ 0 & 0 & i\epsilon dt & 1 \end{pmatrix} \qquad (6.3)$$

However, since the previous model did not consider complete entanglement of the net, but only local pairwise entanglement, the state space had only $2n^2$ components (i.e. polynomial in n). In the case of complete entanglement, the state space becomes exponential. It is impossible to handle a large lattice like 40×40 when it is entirely entangled. Hence, in this work we have considered the smallest nontrivial lattice, i.e. a 3×3 lattice. As we have stated earlier, this is in some respects similar to an infinite lattice because of periodic boundary conditions. Even if we consider a 4×4 lattice, we have $2^{16}$ product states to update at each step, which is not computationally economic with a classical computer. We have first linearized the 2-d label of each qubit (i, j) to a single 1-d label by choosing a sequence that optimizes computing time, and then we have constructed our label I (stated above) by simply taking the sum

$$I = \sum_i 2^i \qquad (6.4)$$

Here the sum is over only those qubits for which the state is |1⟩, and $i$ is the linear sequential position label of the qubit, ranging from 0 to N. This allows us to ascertain the state of any qubit in a particular position $i$ with a single bitwise AND (&) operation and, hence, speeds up the simulation process.
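The labelling scheme of Eq. 6.4 can be sketched as follows (our own code; the helper names are illustrative assumptions):

```python
# Sketch of the integer labelling of Eq. 6.4 for the 3x3 lattice: the
# qubit at linear position i contributes 2**i to the label I when it is
# in state |1>, so a single bitwise AND tests any qubit within a basis
# state label.
n = 3
N = n * n

def label(excited_positions):
    """I = sum of 2**i over the positions of qubits in state |1>."""
    return sum(1 << i for i in excited_positions)

def qubit_is_one(I, i):
    """Test qubit i inside basis label I with one bitwise AND."""
    return bool(I & (1 << i))

# Exciting the whole periphery (every qubit except the single interior
# point, linear position 4 in row-major order) gives the label
# I = 495 = 111101111 in binary, as used in the next section.
periphery = [i for i in range(N) if i != 4]
I = label(periphery)
print(I, bin(I))
```

Any initial state, pure or entangled, can then be specified by choosing the appropriate combination of labels I and amplitudes.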
This also permits us to create any initial state for the simulation, pure or entangled, by choosing the right combination of $I$'s.

6.3 Periodic and Aperiodic Regimes

Before starting the simulation, we point out a behavior that can be anticipated from purely theoretical considerations for any quantum net on which a particular sequence of unitary operators acts repeatedly.

Lemma: It is not possible to move any system from an aperiodic regime to a periodic one by a repeated sequence of unitary operators.

Proof: The product of any given sequence of unitary operators $U_i$ is equivalent to a single unitary operator, say $U$. Let $|i\rangle$ be a vector on the orbit of $U$ in the periodic regime. Now we operate on $|i\rangle$ by

$$U^\dagger = U^{-1} \qquad (6.5)$$
$$U^{-1}|i\rangle = |j\rangle$$

Here $|j\rangle$ must be on the orbit. However, if the state was reached from the aperiodic regime, then we must also have, with the same inverse operation, a transition to a state in the aperiodic regime. This is impossible, because $U$ and its inverse are both linear operators in usual quantum mechanics and must give unique results whichever state they operate on. Hence, we cannot get any transition to a periodic regime in our simulation.

6.4 Simulation Results and a Modified Gate

There is only one interior point in a $3 \times 3$ lattice. So, if we initially excite the whole periphery, we have at the beginning of the simulation

$$a_{495} = 1.0 \qquad (6.6)$$

as

$$495 = 111101111_2 \qquad (6.7)$$

We first deliberately used a nonunitary series of operations, with $\epsilon$ imaginary, in order to emphasize the importance of unitarity in these operations. In reality this may be possible in a semi-open system with some leakage to a classical thermal environment, which acts as a temporal damping factor. Our simulations indicate that even if we begin with a pure state, the nodes get entangled after only a few steps. However, after a sufficiently long time the system degenerates to a uniform state with all $a_i = 1/\sqrt{N}$.
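The importance of unitarity can be illustrated on a single $2 \times 2$ block of the gate (our sketch, with illustrative values of $\epsilon$ and $dt$): a Hermitian generator gives a norm-preserving evolution, while an imaginary $\epsilon$ yields a nonunitary operator whose repeated application collapses any state onto its dominant eigenvector, here the symmetric (uniform) combination for this sign choice.

```python
import numpy as np
from scipy.linalg import expm

s1 = np.array([[0, 1], [1, 0]], dtype=complex)   # sigma_1
dt = 0.05

U = expm(1j * 0.1 * s1 * dt)    # real epsilon: unitary block
A = expm(0.1 * s1 * dt)         # imaginary epsilon (i*eps*dt real): nonunitary

psi = np.array([1.0, 0.0], dtype=complex)        # a pure basis state

# unitary evolution preserves the norm under arbitrarily many repetitions
phi = np.linalg.matrix_power(U, 5000) @ psi
print(np.isclose(np.linalg.norm(phi), 1.0))      # True

# nonunitary evolution collapses onto the dominant symmetric eigenvector
chi = np.linalg.matrix_power(A, 5000) @ psi
chi /= np.linalg.norm(chi)
uniform = np.array([1, 1]) / np.sqrt(2)
print(np.isclose(abs(uniform @ chi.conj()), 1.0))  # True
```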
This can be attributed to the operator $A$ above, which tries to form a continuous version of the discrete unitary c-NOT gate

$$C = \begin{pmatrix} 1 & 0 \\ 0 & \sigma_1 \end{pmatrix} \qquad (6.8)$$

but which, with the imaginary parameter $\epsilon$, becomes nonunitary. Hence, unlike in the case of a unitary c-NOT operator, the eigenvalues of this version do not all have modulus 1. Therefore, the eigenstate with the highest eigenvalue emerges, as is usually the case under repeated application of a nonunitary operator. However, since the full matrix even for the $3 \times 3$ net must be $512 \times 512$, this cannot be checked easily, computationally or analytically. It can be argued from symmetry that the highest eigenstate must be the symmetric one, though we are unaware of any mathematical theorem justifying this hypothesis.

To minimize computational expense, we shall adhere to real matrices. We next construct the infinitesimal form of a unitary matrix representation with a c′-NOT gate, defined as a quantum gate that reverses the phases of the flipped, infinitesimally changed coefficients:

$$U = \exp(iH\,dt) = 1 + i\epsilon\,dt \begin{pmatrix} 0 & 0 \\ 0 & \sigma_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \epsilon\,dt \\ 0 & 0 & -\epsilon\,dt & 1 \end{pmatrix} \qquad (6.9)$$

Here $H$ is a Hermitian Hamiltonian, making $U$ unitary. We start with pure initial states and note the biggest components after 1000 time loops:

$$|27\rangle \to 0.71\,|27\rangle + 0.21(|11\rangle + |19\rangle + |25\rangle + |26\rangle) \qquad (6.10)$$

where we have omitted the smaller-order terms. It can be noticed immediately that the initial state has retained its dominance even after 1000 time steps, and also that the next-to-leading states are separated from it by just a single 1 in a neighboring qubit. If we choose another single state $|17\rangle = |1 + 16\rangle$, i.e.
the one with only a corner and the middle qubit of the lattice initially excited, the final state (highest-amplitude terms) is

$$|17\rangle \to 0.27\,|27\rangle + 0.23(|19\rangle + |25\rangle) \qquad (6.11)$$

It can be noted that although this time the original state has disappeared from the list of finally dominant states, no single dominating term stands out clearly, and with the AND operation on the bits of the dominating ones we get back the initial $|17\rangle$:

$$(10001) = (11011)\,\&\,(10011)\,\&\,(11001) \qquad (6.12)$$

6.5 Creation and Detection of Entangled States

It was assumed in the simulation above that the system acquired an entangled state as input. Rabitz et al. [93] have presented a method of obtaining a superposition of states from a ground state in a molecular system. In a similar spirit, we show here how it may be possible to create arbitrary combinations of states in our model in a general quantum network. We consider a matrix, which we call an "extended unitary matrix", given below:

$$U(R, x') = \begin{pmatrix} aR & b\,|x'\rangle\langle n+1| \\ 0 & |n+1\rangle\langle n+1| \end{pmatrix} \qquad (6.13)$$

with the normalization

$$|a|^2 + |b|^2 = 1 \qquad (6.14)$$

This operator matrix acts on the $(n+1)$-dimensional basis whose last vector $|n+1\rangle$ is an auxiliary vector not related to the $n$-dimensional entangled vector space. Then, given any state $|x\rangle$, we get the normalized new state

$$|x''\rangle = aR|x\rangle + b|x'\rangle \qquad (6.15)$$

with $R$ an $n \times n$ unitary operator that gives a vector orthogonal to the new vector $|x'\rangle$ to be superposed. This generalized unitary operator separately maintains the length of the $n$-dimensional vector of the active system and of the single auxiliary vector in the extended $(n+1)$-dimensional vector space, which may be an additional dummy component of the system. As was stated before, a complete set of quantum gates can simulate any unitary operator [94]; the argument can be trivially extended to produce our extended unitary operator with such gates too. Hence, at least in theory, a physical realization is not an insurmountable problem.
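The normalization property of Eq. 6.15, that $|x''\rangle = aR|x\rangle + b|x'\rangle$ has unit length when $|a|^2 + |b|^2 = 1$ and $R|x\rangle$ is orthogonal to $|x'\rangle$, can be checked with an explicit $R$. The Householder construction below is our illustrative choice, not the dissertation's:

```python
import numpy as np

rng = np.random.default_rng(2)
n, a, b = 4, 0.6, 0.8                    # |a|^2 + |b|^2 = 1 (Eq. 6.14)

x = rng.normal(size=n);  x /= np.linalg.norm(x)
xp = rng.normal(size=n); xp /= np.linalg.norm(xp)   # |x'>, to be superposed

# unit vector y orthogonal to |x'>, obtained by Gram-Schmidt on x
y = x - (xp @ x) * xp
y /= np.linalg.norm(y)

# Householder reflection with R x = y; R is orthogonal, hence unitary
v = x - y
R = np.eye(n) - 2 * np.outer(v, v) / (v @ v)

xpp = a * (R @ x) + b * xp               # Eq. 6.15
print(np.allclose(R.T @ R, np.eye(n)))   # True: R is unitary
print(np.isclose(np.linalg.norm(xpp), 1.0))   # True: |x''> is normalized
```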
Despite being superpositions, the entangled states are pure states. So, in principle, the detection of the entangled states is no more complicated than that of single states in the usual product basis of single spins, provided that we rotate the basis to one that has the chosen vector as a basis vector. Grover's search procedure [95] can be used to detect the presence of any particular state. Alternatively, filtering matrices that perform the same operation directly in the original basis can be determined for the superposed state by adapting the sign-reversing and diffusion matrices of Grover with the appropriate linear transformations. As originally noted by Deutsch [96], the detection process in quantum computation is of course only stochastic.

6.6 Discussion

We have shown that networks with c′-NOT gates indeed get fully entangled in general, with a smearing of the excitation from the initially excited nodes to their neighbors, as expected. However, the memory of the input state is nontrivial, depending on whether it was pure or entangled; the final state can easily be back-projected to find the initial state. How noise can be filtered out of a system like this is an interesting problem. A possible investigation could also include whether a quantum network can be used to separate separable and entangled states in methods similar to or different from those proposed recently by Doherty et al. [97]. We have proved that, unlike a classical network, a quantum one cannot move to a region of dynamic phase-locked oscillations if the input is static. However, periodic dynamic behavior may be injected into the system at will by choosing the right operator, i.e. a suitable unitary operator that rotates some or all states with time. Choosing the appropriate connectivity among the nodes may achieve this, and remains to be studied in detail.
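Grover's sign-reversal and diffusion steps mentioned above can be sketched with dense vectors (a toy classical simulation of the quantum algorithm; the 4-qubit register size and the marked state are arbitrary choices of ours):

```python
import numpy as np

def grover_search(n_qubits, marked, iterations=None):
    """Toy dense-vector Grover search for one marked basis state."""
    N = 2 ** n_qubits
    if iterations is None:
        iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))  # optimal count
    psi = np.full(N, 1 / np.sqrt(N))       # uniform superposition
    for _ in range(iterations):
        psi[marked] *= -1                  # oracle: sign flip of marked state
        psi = 2 * psi.mean() - psi         # diffusion: inversion about the mean
    return psi

psi = grover_search(4, marked=11)
print(int(np.argmax(psi ** 2)))            # most probable measurement outcome
```

With 4 qubits the marked state dominates after only 3 iterations, versus scanning the 16 entries one by one classically; the detection remains stochastic, as noted above.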
Identical periods may be assigned to subclasses to use such a net for pattern identification over a huge database created by the entire set of separable and entangled states. However, the largest advantage of an entangled net is the huge storage capacity that comes with superposition.

6.7 Patterns in Entangled Quantum States

6.7.1 Qubit Pattern Generation/Representation

In the previous chapters, we discussed quantum networks performing like artificial intelligence, responding to inputs and holding patterns. Here we look into the concept of quantum patterns in more detail. Most of this discussion follows work presented elsewhere [98]. A single quantum vector is capable of representing any pattern expressed by means of a finite sequence of numbers. Therefore, pattern recognition depends on handling such state vectors. Quantum nodes are used in these new types of neural networks so that coherence is not lost and the information is retained. Hadamard gates [16], defined by the matrix

$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad (6.16)$$

can produce entanglement among pure states, with a resultant non-factorizable state space. Acting on the 2-d $(|0\rangle, |1\rangle)$ space, this gate alone can rotate a $|0\rangle$ state through an angle of $\pi/4$, and a $|1\rangle$ state through $3\pi/4$. For a more general qubit state,

$$H\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} \alpha + \beta \\ \alpha - \beta \end{pmatrix} \qquad (6.17)$$

A superposition of the tensor products of the $|0\rangle$ and $|1\rangle$ bits of the qubits, in factorizable or non-factorizable entangled forms, can thus be produced by a series of Hadamard gates working in parallel. This would imply that these gates alone can create any such state from the $|0000\ldots\rangle$ state; it is the choice of the relevant Hadamard gates that dictates the exact patterns of the states. These states can be converted into data streams by means of photons. We had previously [58] constructed a Hopfield-Herz type [12] neural network using c-NOT type gates.
It was seen that even if the network started off in the $|0000\ldots\rangle$ state vector for the whole net, the qubits soon became entangled. Simple inputs can thus be transformed into complex patterns in these devices.

6.7.2 Qubit Pattern Recognition

Grover's algorithm allows a certain state to be detected in an entangled $N$-state system much faster than by the analogous classical search algorithm, which needs $N/2$ steps on average. The success is probabilistic, but it can be made close to unit probability by increasing the number of operations while keeping it much below the classical threshold. We consider a simple problem involving a two-dimensional lattice. Each point $(i, j)$ on the lattice may indicate a pixel. A pair of two-qubit registers can express a $16\,(= 2^4)$-level intensity

$$|p, i, j\rangle = c_{00}|0\rangle|0\rangle + \ldots + c_{33}|3\rangle|3\rangle \qquad (6.18)$$

Here $|c_{ij}|^2$ indicates the probability of the intensity level denoted by $(i, j)$: 1 is the certainty of a level, and 0 its total absence. We use $k$ as an index to indicate the qubit label with respect to position, defining its range from $0$ to $\log_2 N$, where $N$ is the total number of pixels. If we have a pattern formed by four adjoining pixels taken to be the vertices of a square, with intensities $A$, $B$, $C$ and $D$ in clockwise orientation, the quantum state can be defined as

$$|k, P\rangle = |k, A\rangle|l, B\rangle|m, C\rangle|n, D\rangle \qquad (6.19)$$

The letters $k, l, m, n$ indicate the qubit serial index values for the four pixels. By imposing translational symmetry on the pattern, we form the combination

$$|P\rangle = \mathcal{N} \sum_k |k, P\rangle \prod_{r \neq (k,l,m,n)} |r, X\rangle \qquad (6.20)$$

where $\mathcal{N}$ is the normalization constant and the product over the other qubits of the lattice is of the indifferent type:

$$|X\rangle = \frac{1}{2}\left[\,|1\rangle|1\rangle + |1\rangle|0\rangle + |0\rangle|1\rangle + |0\rangle|0\rangle\,\right] \qquad (6.21)$$

By imposing a further constraint of orientation invariance, the state vector is symmetrized with appropriate interchanges of $k, l, m$ and $n$.
Hence, if the pattern recognition device uses the operator

$$O(P) = |1\rangle_D\langle P| \qquad (6.22)$$

then its operation on the test object will give the output state $|1\rangle_D$. A macroscopic detector can be coupled to this to detect the pattern. If the pattern is rigid, with well-defined intensity levels, each corresponding to a unique set of qubits, the operator $O$ will also be well-defined. However, if we want to categorize the objects in larger classes, e.g. into a range of levels, then the detection operator $O$ must consist of linear superpositions of all the allowed states in that given range. This will give a nonzero scalar product for any object in that class. It might be practically impossible to obtain a detector of the type $|1\rangle_D\langle P|$, and one might have to use algorithms similar to Grover's to search for the given pattern. The robustness of the data becomes a moot point in that respect. In the last chapter, when we performed [58] simulation experiments on a small c-NOT gated network, it was seen that even after many thousands of interactions among the qubits, which produced new entanglements among states, the dominant states were still reminiscent of the original data.

6.8 Learning Quantum Patterns

Classical neural networks show Per Bak type learning [104], and are fairly effective. We can train a network to recognize combinations of inputs as pre-assigned patterns. This is achieved by using an input, an output and an intermediate layer of neurons, and then by increasing the weights on neuron paths from the set of inputs to the output for successes, and decreasing the weights on failed paths. All possible paths are connected with equal weights in a regular network. Only a subset of paths is carefully chosen in a small-world network [99, 100], whereas the connectivity is random in a random network.
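The classical reward/punishment adjustment of path weights can be caricatured in a toy sketch (entirely our illustration, not the dissertation's network; two candidate paths compete, the stronger path is used, successful use is rewarded and failed use is punished):

```python
import numpy as np

rng = np.random.default_rng(1)
weights = np.array([0.5, 0.5])   # two candidate paths, initially equal
correct_path = 0                 # the path that leads to the right output
step = 0.05

for _ in range(200):
    # use the currently strongest path; tiny noise breaks the initial tie
    chosen = int(np.argmax(weights + 1e-3 * rng.standard_normal(2)))
    if chosen == correct_path:
        weights[chosen] += step                              # reward success
    else:
        weights[chosen] = max(weights[chosen] - step, 0.0)   # punish failure

print(weights[correct_path] > weights[1 - correct_path])     # True
```

After training, the correct path dominates; the quantum analog below replaces the weights by rotation angles applied by gates.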
If the number of input neurons is $n_i$, the number of output neurons is $n_o$, and that of intermediate neurons is $n_m$, then to obtain a reliable recognition ability after a finite number of trials one needs

$$n_m \geq n_i\, n_o \qquad (6.23)$$

If qubits are used instead of classical neurons, and the connecting paths are replaced by quantum gates, a similar structure may be constructed. We show one possible form of a quantum analog of the Per Bak [104] training machine in Fig. 6.1. The contributions of the qubits to the next stage are assumed to be proportional to the coefficient of the $|1\rangle$ part of the qubit, e.g. $c_1$ of Eqn. 5.1. The gates are used to rotate the qubit contributions. The gates on the paths rotate the input from the previous stage incrementally towards the $|1\rangle$ part when a reward is given, whereas punishment for failures rotates the signals on the paths towards $|0\rangle$. This arrangement should be able to train a quantum network in a manner similar to classical training. However, the larger capacity of the qubits may make the number of qubits needed smaller than in the classical case. We may also need to modify the relation among the nodes (Eqn. 8.47).

[Figure 6.1: Quantum network for Bak-type training for pattern recognition. The intermediate and final qubits are shown integrated with OR gates to sum the contributions of the qubits connected behind. The circular gates are rotation gates. The curved feedback information paths that control the gates' rotations are shown only for two gates, for clarity.]

In a different context, quantum learning was explored [98] by utilizing, in the quantum domain, the nonlinear switching action found in classical learning. There, a sigmoid-type curve, found in biological systems, was reproduced by cooperative quantum devices.

Chapter 7

Generalization of Entropy

7.1 Introduction

As we have seen above, neural networks are not necessarily deterministic.
The input and the output are not always uniquely related. The loss of information may be inherent in the design of the system, or it may result from interactions with an environment that introduces random noise, making the input-output mapping probabilistic. This makes the concept of entropy relevant to a neural network system. However, the cross-talk may be random only in part, and there may be systemic biases. Hence, the Boltzmann-Gibbs type of statistical mechanics, based on randomly interacting systems, may not be the most relevant for dealing with the states of such ensembles.

In a system with a large number of constituents, entropy measures randomness: it is maximal when the system can be in the maximum possible number of states with equal probability, and minimal when the system is frozen into one state with no uncertainty. Although all forms of entropy share these two end-point definitions, the functional forms may vary in between [105]. Keeping energy or some other conserved quantity constant, the different entropies probe different possibilities of probability distributions within the system. When the total entropy of a system can be expressed as the sum of the entropies of its subsystems, it is extensive; the most common example of such an entropy is Shannon's. Renyi entropy [17] is also extensive, but differs from Shannon's. Recently, Tsallis entropy [18, 106, 107] has attracted much attention, not only for its conceptual and theoretical novelty, but also because it can be shown in specific physical cases [108, 109, 110, 111, 112, 113] to be the relevant form where interactions among the subunits of a system give rise to nonextensivity. Shannon entropy can be retrieved from Tsallis entropy when the proper limits are taken, indicating the consistency of the concept.

In this chapter, we introduce the concept of entropy from yet another viewpoint. This too will bear resemblance to the Shannon form in the appropriate limit.
In the first part of this chapter, we derive the rationale for this entropy, and then we compare it with some known forms of entropy. Finally, we find the probability distribution associated with this entropy, referred to from now on as the s-entropy. As we shall see, this new definition of entropy is closely related to rescaling the phase space. The normalization of the probability distribution function of a system depends on its free energy and controls the macroscopic properties of the ensemble. Hence, our first step is to find a method to obtain the free energy as a function of temperature, and then to apply it to a simple physical system. We shall do this for Tsallis entropy and also for our newer form of entropy. Then we shall find the specific heat of the system and study how it changes as the parameters are tuned.

7.2 Entropy of a Neural Network

A completely deterministic network would have a one-to-one mapping between input states and output states. So we have, say, $N_i$ different inputs and an equal number of outputs. The output may be in the form of a periodicity (frequency) or a final static pattern in the network that is measurable. One can also have a continuous version of the one-to-one mapping in functional form:

$$x_{in} \to y_{out} = F(x_{in}) \qquad (7.1)$$

with $F$ and $F^{-1}$ single-valued functions in the domain of interest. So we have the probabilities $p_{io} = 1$ for corresponding input and output states, and $p_{if'} = 0,\ f' \neq o$ for non-matching pairs. Hence, the conventional Shannon entropy for each given input state $i$ is

$$S_i = -\sum_f p_{if} \log p_{if} = 0 \qquad (7.2)$$

Even if we have a probabilistic distribution of input signals, the weighted average over the $S_i$ would be zero, as it is zero for each $i$. An interesting point to note for a dynamical system is that the output state $o$ may be a function of time, but may be deterministic if the time of measurement is known.
If the time is unknown, or the changes are too fast for the measuring system to capture the state at a known time, then we shall have a probabilistic outcome and a nonzero entropy. In general, our definition in Eqn. 7.2 may then be used with the appropriate pdf. If the mapping by the network is many-to-one, the network performs pattern recognition, generalization, or classification. With dynamical evolution of a deterministic kind, we shall have clearly marked attractor basins formed by sets of input states $i$ from which the system converges to static final states $o(i)$, or changes continuously in time in a predictable fashion with $o = o(i, t)$, making the entropy zero again: $S_{\{i\}} = -\sum_{\{i\}o} p_{\{i\}o} \log p_{\{i\}o} = 0$. However, if the design of the network is inadequate, e.g. if the intermediate processing neurons are insufficient in number, the outcome will always have a degree of uncertainty irrespective of the number of training cycles [114], and we shall have a non-$\delta$ pdf, leading to a nonzero entropy:

$$S = -\sum_{if} p(f|i)p(i) \log[p(f|i)p(i)] \qquad (7.3)$$

with each $(if)$ pair forming a single pair-index, and the $p(i)$ normalized to unity. If the final state is indicated by a frequency $\omega$ for a dynamical network, we can rewrite this expression for the entropy, replacing $f$ by $\omega$ and using an integration over the range of frequencies corresponding to a given input state $i$:

$$S_i = -\int d\omega\, p_i(\omega) \log p_i(\omega)$$
$$S = -\sum_i \int d\omega\, p(\omega|i)\, p_i \log[p(\omega|i)\, p_i] \qquad (7.4)$$

with appropriate weighting and normalization over the initial states. However, both in biological and in AI neural networks, the pattern of interference or the inadequacy of design may produce effects different from those of a perfectly random ensemble of modules, opening up the possibility of using nonextensive entropies, where the sum of the entropies of the subunits may differ from the entropy of the combined system because of the non-random nature of the interactions.
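The two limits discussed above, zero entropy for a deterministic mapping (Eq. 7.2) and maximal entropy for a uniform pdf, can be checked in a minimal sketch:

```python
import numpy as np

def shannon(p):
    """Shannon entropy, with the 0 log 0 -> 0 convention."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# deterministic mapping: one output certain -> zero entropy (Eq. 7.2)
print(shannon([1.0, 0.0, 0.0, 0.0]))       # 0.0
# maximally uncertain output over 4 states -> log 4
print(shannon([0.25] * 4), np.log(4))
```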
7.3 Defining the New Entropy

We formulate the problem using definitions from information theory, as used in Shannon's coding theorem. Let us consider a register of only one letter, and let $p_i$ be the set of probabilities for each of the $N$ letters $A_i$ that can occupy this position. Though we are using the language of information theory here, this is trivially extensible to the states $i$ of a single member of an ensemble whose individual systems can be in any of $N$ states with probabilities $p_i$. We now consider a small deformation of the (single-cell) register to a new size, so that it can accommodate $q = 1 + \Delta q$ letters. The new probability that the whole phase space is occupied by the letter previously associated with $p_i$ is now $p_i^q$, by the corresponding AND operation. Hence, the probability that the new deformed cell is occupied by any of the pure letters $A_i$ is

$$N(q) = \sum_i p_i^q \qquad (7.5)$$

For $q > 1$ this is less than the original total probability of unity for $q = 1$. The shortfall, which we denote by

$$M(q) = 1 - \sum_i p_i^q \qquad (7.6)$$

represents the total probability that the mixed cell contains a mixture of $A_i$ and, fractionally, some other $A_j$, since the total probability that the cell is occupied by one or more (fractional included) letters must sum to one. Hence, the mixing probability $M(q)$ is introduced by the disorder created by increasing the cell scale from unity to $1 + \Delta q$. The fractional values of cell numbers can be understood in the same spirit as the fractal (Hausdorff) dimensions of dynamical attractors [115, 116] and of complex systems. Diffusion [117, 118, 119, 120] and percolation have been studied in complex systems with effectively fractional dimensions for fluids, where the special geometric constraints translate into a change of the dimension of the corresponding space to an apparently nonintuitive fractional value.
In coding theory we come across Huffman coding for the optimal transmission of information [16], where the optimum alphabet size may formally be a fraction, though it is changed to the nearest higher integer for practical purposes. We may therefore consider a fractional size of the register or, equivalently, an integral number of cells in the register with fractional-sized cells, to accommodate a given amount of information in probabilistic optimization. Replacing deterministic parameterization with probabilistic optimization becomes inevitable when classical Shannon information theory [16] is carried into quantum computing contexts, and, hence, our use of fractional cell sizes may be a classical precursor of the departure from stringent Shannon-type concepts. Another variety of parameter-dependent entropy and probability distribution has also been studied in [121, 122].

The entropy for an alphabet of size $m$ can be defined from the information content of the register by

$$m^{S(q)\Delta q} = m^{M(q+\Delta q) - M(q)} \qquad (7.7)$$

Hence, the entropy indicates the effective change in the mixing probability due to an infinitesimal change in the cell size of the register. This gives

$$S(q) = dM(q)/dq \qquad (7.8)$$

or, equivalently,

$$S(q) = -\sum_i p_i^q \log p_i \qquad (7.9)$$

This is the mathematical definition of our form of entropy; another equivalent form is given later in Eqn. 7.26. This expression is analogous to, but different from, the Tsallis form of entropy, which is defined by

$$S_T(q) = -\Big(1 - \sum_i p_i^q\Big)\Big/(1 - q) \qquad (7.10)$$

and has an apparent singularity at $q = 1$, the Shannon limit. If entropy is expressed as the expectation value of the (generalized or ordinary) logarithm, the difference between the Tsallis expression and ours becomes clearer.
If we use the generalized q-logarithm defined by

$$\mathrm{Log}_q(p_i) = (1 - p_i^{\,q-1})/(1 - q) \qquad (7.11)$$

Tsallis entropy can be defined more simply by

$$S_T(q) = -\langle \mathrm{Log}_q(p) \rangle \qquad (7.12)$$

Another form of entropy, similar in appearance to ours, was presented by Aczel and Daroczy (A-D) [123, 124] (a summary of many different forms, including the A-D form, can be found in [125]):

$$S_{AD} = -\sum_i p_i^q \log p_i \Big/ \sum_i p_i^q \qquad (7.13)$$

This has an extra denominator, so that the weights for $\log p_i$ are normalized. Wang [126] has also defined yet another form, apparently similar to our entropy but from a different physical viewpoint, using the condition

$$\sum_i p_i^q = 1 \qquad (7.14)$$

Defined in terms of the simple probability distribution, the expectation value is

$$\langle O \rangle = \sum_i p_i O_i \qquad (7.15)$$

In our case, we define the expectation value with respect to the deformed probability corresponding to the extended cell, while keeping the usual logarithm:

$$S_s(q) = -\langle \log(p) \rangle_q \qquad (7.16)$$

with

$$\langle O \rangle_q = \sum_i p_i^q O_i \qquad (7.17)$$

Because of the denominator sum, the simplicity of the relation between the weights and $\log p_i$ is lost in the Aczel-Daroczy form. In the Wang form, the probabilities are simply redefined as

$$\tilde{p}_i = p_i^q \qquad (7.18)$$

giving

$$S_W = -(1/q)\langle \log \tilde{p} \rangle \qquad (7.19)$$

which is a rescaled version of the usual Shannon form. The Wang form is thus extensive, unlike our form, where the deformed probabilities do not add up to unity, allowing for information leakage. The function $\mathrm{Log}_q$ approaches the normal logarithm in the limit $q \to 1$, so Tsallis entropy coincides with Shannon entropy there; likewise, as $p_i^q \to p_i$, we too recover the normal Shannon entropy. The Renyi entropy is defined by

$$S_R(q) = \log\Big(\sum_i p_i^q\Big)\Big/(1 - q) \qquad (7.20)$$

This, like Shannon entropy, is also extensive, i.e. simply additive for two subsystems, for any value of $q$.
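The defining relation $S(q) = dM(q)/dq$ and the collapse of all these forms to Shannon entropy as $q \to 1$ can be checked numerically (illustrative three-state distribution; Wang's form is omitted since it constrains the probabilities themselves):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])     # illustrative distribution

def M(q):
    """Mixing probability M(q) = 1 - sum_i p_i^q (Eq. 7.6)."""
    return 1 - np.sum(p ** q)

def S_new(q):
    """s-entropy S(q) = -sum_i p_i^q log p_i (Eq. 7.9)."""
    return -np.sum(p ** q * np.log(p))

# S(q) = dM(q)/dq (Eq. 7.8), checked with a central difference
q, h = 1.3, 1e-6
print(np.isclose((M(q + h) - M(q - h)) / (2 * h), S_new(q)))   # True

# all forms collapse to Shannon entropy as q -> 1
q = 1.0001
shannon = -np.sum(p * np.log(p))
tsallis = (1 - np.sum(p ** q)) / (q - 1)
renyi   = np.log(np.sum(p ** q)) / (1 - q)
aczel_d = -np.sum(p ** q * np.log(p)) / np.sum(p ** q)
for s in (tsallis, renyi, S_new(q), aczel_d):
    print(abs(s - shannon) < 1e-3)     # True for each
```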
One needs [127] a slightly different formulation of the extensivity axiom to obtain Shannon entropy uniquely:

$$S_{1+2} = S_1 + \sum_i p_{1i} S_2(i) \qquad (7.21)$$

where $S_2(i)$ is the entropy of subsystem 2 given that subsystem 1 is in state $i$.

7.4 Applications of the New Entropy

As was stated briefly at the beginning of this chapter, entropy is intuitively associated with randomness, because it is a measure of the loss of information about a system, or of the indeterminacy of its exact state. This in turn depends on the probability distribution over the various states. As mentioned before, maximal uncertainty in state space is indicated by a uniform probability distribution function (pdf) over all states, whereas a Dirac/Kronecker delta (continuous or discrete states) pdf, with no uncertainty, has zero entropy. The Boltzmann form can be found from combinatorics:

$$S = k \log\Big[N! \Big/ \prod_i n_i!\Big] \qquad (7.22)$$

because the $n_i$ are simply $N p_i$ in equilibrium. Here $N$ is the total number of subsystems, and $n_i$ is the number of subsystems in the $i$-th state. One gets the Shannon form given below in terms of the $p_i$ themselves. The exponential probability distribution is obtained by maximizing the entropy with the constraints $\sum_i p_i = 1$ and $\sum_i p_i E_i = U$ (with $E_i$ the energy of the $i$-th state, and $U$ the total energy, which is fixed):

$$p_i = C \exp(-\beta E_i) \qquad (7.23)$$

The Lagrange multiplier constant $\beta$ here can be identified as the inverse of the temperature. We now consider Shannon's coding theorem [16], where the letters $A_i$ of the alphabet used in a code have probabilities $p_i$. It can be shown fairly easily that, in the long run, a stream of random letters coming out of the source with the given probabilities relates the entropy to the probability of a given typical sequence:

$$P(\text{sequence}) = \prod_i p_i^{N p_i} = \exp[-N S] \qquad (7.24)$$

where $S$ is the entropy per letter and $N$ is the (large) number of letters in the sequence. We now see how our new entropy can be used in physical situations.
As was stated above, the effective size of the clusters $q_i$ defined for the $i$-th state in our new entropy may in general be a fraction, and if the interaction is weak, the average cluster size $q_i$ is just over unity. In liquid clusters, the typical subsystem in state $i$ may be an assembly of $r_i$ molecules, but this may change, due to environmental factors such as pH value, to $s_i$, producing a rescaling value $q_i = s_i/r_i$, greater or less than 1. In general, we allow the $q_i$ parameter in our entropy to be different for each $i$. Since $p_i$ represents the probability of a single occurrence of the $i$-th state, i.e. for a cluster of size unity (which may consist of a typical number of subunits), the probability of the formation of a cluster of size $q_i$ is $p(q_i) = (p_i)^{q_i}$. We now consider the vector

$$v_i = p(q_i) = (p_i)^{q_i} \qquad (7.25)$$

This is $n$-dimensional, where $n$ is the number of single-unit states available to the system components. We now study the "phase space" defined by the $q_i$ coordinates. As we have said above, the deviations of these parameters from unity give the effective (possibly fractional, when an average is taken) cluster sizes in each of the states. A value smaller than unity would indicate a degeneration of the micro-system to a smaller one in a hierarchical fashion, partially so if it is a fraction. In other words, we consider clusters forming superclusters or being composed of subclusters, with a corresponding change of scale in terms of the most basic relevant unit. Elsewhere, we have discussed the interesting question of an oligo-parametric hierarchical structure of complex systems [128]. Here, however, we restrict ourselves to cluster hierarchy changes that do not qualitatively change the description of the system. Hence, the divergence of the vector $v_i$ in the $q_i$ space gives a measure of the escape of systems from a given configuration of correlated clustering.
Conversely, the negative of the divergence gives the net influx of systems into an infinitesimal cell with cluster sizes $q_i$. We have unfragmented and also non-clustered, i.e. uncorrelated, units at that hierarchy level if all the $q_i$ are unity. It can be argued, first from the point of view of statistical mechanics, that this negative divergence, or influx of probability, may be interpreted as entropy:

$$S = -\sum_i \frac{\partial p_i^{q_i}}{\partial q_i} = -\sum_i p_i^{q_i} \log p_i \qquad (7.26)$$

The free energy is defined by

$$A = U - TS \qquad (7.27)$$

where $A$ is the free energy, $U$ the internal energy, $S$ the entropy, and $T$ the temperature, i.e. a measure of the average random thermal energy per unit, with the Boltzmann constant $k$ chosen to be unity. Hence, $TS$ is a measure of the random influx of energy into the system due to the breaking/making of correlated clusters by random interactions in a large system. The subtracted quantity $A$ can thus be related to the free, or "useful", energy. The usual thermodynamic phase-space factors in dealing with a macroscopic system are dropped as common factors in what follows. We arrive at the same expression in terms of the Shannon coding theorem as well: a stream of (clustered) units emitted will correspond to the probability

$$\exp[-S] = \prod_i p_i^{\,p_i^{q_i}} \qquad (7.28)$$

which also gives us Eqn. 7.26, since with $q_i$ average clustering the $i$-th state occurs with probability $p_i^{q_i}$.

7.5 Probability Distribution for the New Entropy

By maximizing the entropy with the constraints

$$\sum_i p_i - 1 = 0 \qquad (7.29)$$

and

$$\sum_i p_i E_i - U = 0 \qquad (7.30)$$

we can obtain the $p_i$ in terms of the energies of the states, or possibly also other criteria, in the usual way.
The constrained function

$$L = S - \beta\Big(\sum_i p_i E_i - U\Big) - \alpha\Big(\sum_i p_i - 1\Big) \qquad (7.31)$$

is optimized with respect to the $p_i$, giving the stationarity condition

$$q \log p_i + 1 + \gamma\, p_i^{-(q-1)} = 0 \qquad (7.32)$$

where we have used for brevity $\gamma = \alpha + \beta E_i$. The simpler form

$$-c \log y + d\, y = -1 \qquad (7.33)$$

with $y = p_i^{-(q-1)}$, $c = q/(q-1)$ and $d = \gamma$, is used to relate the terms with $p_i$ to the Lambert function. This converts into

$$-\frac{d}{c}\, y\, e^{-(dy/c)} = -\frac{d}{c}\, e^{1/c} \qquad (7.34)$$

which gives

$$y = -\frac{c}{d}\, W\Big(\!-\frac{d}{c}\, e^{1/c}\Big) \qquad (7.35)$$

and, with our variables, we get for $p_i$

$$p_i = \left[\frac{-q\, W(z)}{(\alpha + \beta E_i)(q-1)}\right]^{1/(1-q)} \qquad (7.36)$$

where

$$z = -e^{(q-1)/q}\,(\alpha + \beta E_i)(q-1)/q \qquad (7.37)$$

and $W(z)$ is the Lambert function, defined by [129]

$$z = W(z)\, e^{W(z)} \qquad (7.38)$$

The parameters $\alpha$ and $\beta$ come from the Lagrange multipliers for the two constraints and are related to the overall normalization and to the relative scale of energy, i.e. to the temperature ($1/(kT)$), as in the Shannon case, where we get the Gibbs expression for $p_i$. In the Tsallis case this procedure gives $p_i$ the well-known value

$$p_i = [\alpha + \beta(q-1)E_i]^{1/(1-q)} \qquad (7.39)$$

which is easily seen to reduce to the Shannon form for $q \to 1$. It can be shown (after some algebra) that our form, Eqn. 7.36, also reduces to the Shannon form for $q \to 1$.

The nonextensivity of Tsallis entropy is evident on expanding, for two independent subsystems,

$$S_{T\,1+2} = \sum_{ij} p_i p_j \big(1 - (p_i p_j)^{q-1}\big)\big/(q-1) \qquad (7.40)$$
$$= S_{T\,1} + S_{T\,2} + (1-q)\, S_{T\,1}\, S_{T\,2} \qquad (7.41)$$

However, Renyi entropy has the simple additive relation

$$S_{R\,1+2} = S_{R\,1} + S_{R\,2} \qquad (7.42)$$

For the new entropy, we have

$$S_{s\,1+2} = S_{s\,1} + S_{s\,2} - M_2(q)\, S_{s\,1} - M_1(q)\, S_{s\,2} \qquad (7.43)$$

where the $M_a$ are the mixing probabilities of states for subsystem $a$, as defined in Eqn. 7.6.

7.6 Probability, Lambert Function Properties and Constraints

The transcendental equation defining the Lambert function gives rise to an infinite number of Riemann sheets separated by cuts. These are related to the cut of the log function from $-\infty$ to $0$.
The different branch values for the same z are distinguished by a subscript n, with n = 0 the principal branch; W_0(z) is real along the real z axis from −1/e to ∞ (Fig. 7.1).

Figure 7.1: W_0(z) is real along the real axis from −1/e to ∞; the value of W_0 ranges from −1 to ∞. We do not show the W_{−1}(z) branch, which is real from z = −1/e to z = 0, because it is not suitable for our entropy, as explained in the text.

Another branch, conventionally labeled W_{−1}(z), also gives real values for real z in the domain −1/e < z < 0, going down from −1 at z = −1/e to −∞ at z = 0.

Four different regimes can be identified for the parameters (with \epsilon \equiv q-1 and \gamma_i \equiv \alpha + \beta E_i):

(a) \alpha + \beta E_i > 0 and q > 1: A real positive p_i requires, from Eqn. 7.36 and Eqn. 7.37, that W(z_i) < 0 and, hence,

-1/e < z_i < 0    (7.44)

Another constraint is that the p_i are less than unity. This gives a cut-off value of E_i through

(q/\epsilon)\,|W(z_i)| \ge \alpha + \beta E_i    (7.45)

where z_i again depends on E_i as given in Eqn. 7.37, so that this is a transcendental equation.

(b) \alpha + \beta E_i < 0 and q > 1: The reality of p_i demands that W(z_i) > 0; hence, initially z_i need only be positive. But since the exponent 1/(1−q) is negative, the condition p_i < 1 gives a lower cutoff \tilde z_i for z_i, given by

W(\tilde z_i) = |\gamma_i|\,|\epsilon|/q    (7.46)

This means that the negative E_i are bounded by Eqn. 7.46, which is again a transcendental equation in E_i.

(c) \alpha + \beta E_i > 0 and q < 1: The same reality constraint on p_i implies in this case that W(z_i) > 0, so we have z_i > 0. Similarly, the condition p_i < 1 gives the cutoff \tilde z_i defined by

W(\tilde z_i) = |\epsilon|\,\gamma_i/q    (7.47)

So E_i has a maximum value given by this transcendental constraint.
(d) \alpha + \beta E_i < 0 and q < 1: Here, −1/e < z_i < 0 initially, owing to the reality of p_i, and the constraint p_i < 1 gives, as in the previous cases, a cutoff \tilde z_i defined by

|W(\tilde z_i)| = |\epsilon|\,|\gamma_i|/q    (7.48)

and one can solve numerically for the cutoff in E_i from the other parameters in specific problems with given parameter sets.

The last branch of the Lambert function, W_{−1}(z), is also real and negative for −1/e < z < 0, with values from −1 to −∞, but it is not acceptable, as it does not give the limit W_{−1}(z) → 0 as z → 0, which is required to recover the Shannon limit for q → 1.

As the probability function is of the exponential Boltzmann form for Shannon entropy, it is defined for any arbitrary value of the energy, because an exponential has no finite root. For finitely nonzero q − 1, by contrast, the spectrum of energy states E_i may be constrained when our entropy is used. The same is true of the Tsallis entropy, where the functional dependence also has finite roots and a power behavior.

7.7 Numerical Comparison

The variation of the probability function with E for different q values is shown in Fig. 7.2.

Figure 7.2: Comparison of the pdf for the new entropy for q = 1, 1.1, 1.2 and 1.3, as functions of βE. The solid line is for q = 1, i.e. the Gibbs exponential distribution, and the other lines are in order of increasing q.

We note that the new pdf has a smaller curvature than the Shannon form. It drops increasingly rapidly at high energies as q increases, and is quite different in shape and in magnitude from the Gibbs exponential distribution. A variation of even 10% from the standard value q = 1 causes a discernible change in the pdf and should be easily observable in experimental contexts. The shape is almost linear at q = 1.3. The Tsallis pdf and the pdf for the new entropy are compared in Fig. 7.3 and Fig. 7.4 for the same values of q: 1.1 in the former and 1.3 in the latter.
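The finite support implied by the cutoff conditions of Sec. 7.6 can be located numerically. In regime (a) the formula for p_i stops giving real values once z_i drops below −1/e, the edge of the real branch W_0; a sketch with illustrative values α = β = 1 and q = 1.1:

```python
import numpy as np
from scipy.special import lambertw

alpha, beta, q = 1.0, 1.0, 1.1
eps = q - 1.0

def z_of(E):
    """z_i of Eqn. 7.37."""
    return -np.exp(eps / q) * (alpha + beta * E) * eps / q

# W_0 is real only for z >= -1/e, which bounds alpha + beta*E from above:
gamma_max = (q / eps) * np.exp(-1.0 - eps / q)
E_max = (gamma_max - alpha) / beta

w = lambertw(z_of(E_max - 0.01), 0).real     # still real just below the edge
p_edge = (-q * w / ((alpha + beta * (E_max - 0.01)) * eps)) ** (1.0 / (1.0 - q))
```

Just below the edge the pdf is already minute, which is the finite-support behavior visible in the plotted comparisons with the Gibbs exponential.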
For larger q values the new entropy yields much stiffer probability functions that depart substantially from the Tsallis pdf's. The pdf's for both the Tsallis form and our new form of entropy hit the axis at finite values of the energy, making the support of the probability finite, unlike the exponential Shannon form, as discussed in the previous section.

Figure 7.3: Comparison of pdf's for the Tsallis nonextensive entropy (solid line) and the new entropy presented here, for q = 1.1.

Figure 7.4: The same as Fig. 7.3, but for a higher q = 1.3.

7.8 Free Energy

We first assume that the entropy S is a sum of contributions from all the states:

S = \sum_i \phi(p_i)    (7.49)

where φ is a generalized function which will in general differ from the Shannon form. By optimizing L (Eqn. 7.31), we get

\phi'(p_i) = \alpha + \beta E_i    (7.50)

with the simple solution

\phi(p_i) = (\alpha + \beta E_i)\, p_i    (7.51)

The constant of integration vanishes because there can be no contribution to the entropy from a state that has zero probability. We assume that the Helmholtz free energy A is defined by

\beta(U - A) = S    (7.52)

If S is nonextensive and U is extensive, this makes A nonextensive as well. Using Eqn. 7.49, Eqn. 7.51 and Eqn. 7.52 we get

A = -\alpha/\beta    (7.53)

Hence, we get the relation for the pdf

p_i = \psi^{-1}(\beta(E_i - A))    (7.54)

with the definition

\psi(p) = \phi(p)/p    (7.55)

Here, the assumption was made that the function ψ can be inverted. This may not always be the case for an arbitrary expression for the entropy, at least not in a manageable closed form. A can be obtained from the constraint equation

\sum_i p_i = \sum_i \psi^{-1}(\beta(E_i - A)) = 1    (7.56)

After A has been determined, it may be substituted in Eqn. 7.54 to obtain the properly normalized pdf p_i for each of the states; U can then be found, and from its derivative the specific heat C.
For the simple system that we shall discuss later, pressure and volume, or their analogues, will not enter our considerations; hence, we have only one specific heat, C_v, with β now defined as the inverse scale of energy, the temperature T:

C = -\beta^2 \frac{\partial U}{\partial \beta}    (7.57)

7.9 Shannon and Tsallis Thermodynamics

Using Eqn. 7.54 and Eqn. 7.55, the Shannon entropy immediately gives

p_i = e^{-\alpha - \beta E_i} = e^{\beta(A - E_i)}    (7.58)

so that Eqn. 7.56 gives the familiar expression for A,

A = -\log(Q)/\beta    (7.59)

where Q is the partition function

Q = \sum_i e^{-\beta E_i}    (7.60)

The separation of the A-dependent factor is allowed by the exponential form of p_i in Eqn. 7.58, and hence we get such a simple expression for A in the case of Shannon entropy, giving us normal extensive thermodynamics.

In the Tsallis case a common A-dependent factor can no longer be separated out, and we cannot find an expression for A in terms of the partition function in the usual way. Instead we need to solve the normalization equation, Eqn. 7.56. This gives an infinite number of roots for a general value of \epsilon \equiv q - 1, but for values of ε that are reciprocals of integers we have polynomial equations with a finite number of roots, which too may be complex in general. Later we shall see, at least for the simple example considered at the end of this chapter, that a real and stable root can be found that approaches the Shannon value of A as ε → 0. This is because in that limit the \log_q(p) function in the definition of the Tsallis entropy also coincides with the natural logarithm.

7.10 Thermodynamics of the New Entropy

We have to obtain A by solving the transcendental Eqn. 7.56 numerically, in a manner similar to the Tsallis case. For the specific heat, we get

C = \beta^2 \sum_i \frac{E_i (E_i - A)\, e^{-W_i(1 + 1/\epsilon)}}{1 + W_i}    (7.61)

For brevity, we have written W_i = W_0(\epsilon\beta(E_i - A)) and have used the following identities related to the Lambert function ([129]).
W'(z) = \frac{W(z)}{z(1 + W(z))}    (7.62)

W(z)/z = e^{-W(z)}    (7.63)

As W(z) ∼ z for small z, for small ε we effectively recover the classical pdf and thermodynamics, as in the Tsallis case. Hence, the parameter ε is again a measure of the deviation from standard statistical mechanics due to nonextensivity. However, our nonextensivity differs functionally from the Tsallis form, and the values of ε in the two forms can be compared only in the limit of low β. The power series expansion of W(z) is

W(z) = \sum_{n=1}^{\infty} \frac{(-n)^{n-1} z^n}{n!}    (7.64)

Writing the Tsallis p_i in a form similar to that of the new entropy,

p_i = e^{-\log(1 + \epsilon_T \beta(E_i - A))/\epsilon_T}    (7.65)

allows a comparison with the power series expansion of log(1+z). The parameter \epsilon_n of the new entropy and the Tsallis parameter \epsilon_T cancel at first order, so that both distributions approach the Shannon pdf, as we have already mentioned; but if equality at second order is demanded, we get

\epsilon_n = \frac{1}{2}\,\epsilon_T    (7.66)

As the third-order difference between W(z) and log(1+z) is then only \frac{1}{24}z^3, the difference between the Tsallis form and our form of entropy will be detectable only at rather low T, i.e. high β.

7.11 Application to a Simple System

We now consider the simplest nontrivial system, with only two energy eigenvalues ±E, as in a spin-1/2 system. For a non-interacting system the standard results ([130]) corresponding to Shannon entropy are

A = -\log[2\cosh(\beta E)]/\beta    (7.67)

S = \log[2\cosh(\beta E)] - \beta E \tanh(\beta E)    (7.68)

U = -E \tanh(\beta E)    (7.69)

C = (\beta E)^2 / \cosh^2(\beta E)    (7.70)

We take ε = 0.25 and 0.10 and solve numerically for the Tsallis entropy. The values are shown in Figs. 7.5–7.8. Tsallis entropy gives very similar shapes for all the variables, and for ε = 0.10 we get a fit much nearer to the Shannon form than for ε = 0.25.
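The second-order matching \epsilon_n = \epsilon_T/2 of Eqn. 7.66 can be checked numerically by comparing the exponents −W(\epsilon_n x)/\epsilon_n and −log(1 + \epsilon_T x)/\epsilon_T of the two pdfs (a sketch where x stands for β(E − A)):

```python
import numpy as np
from scipy.special import lambertw

eps_T = 0.10
eps_n = eps_T / 2.0          # matching of Eqn. 7.66
x = 0.2                      # x stands for beta*(E - A)

expo_new = -lambertw(eps_n * x, 0).real / eps_n
expo_tsallis = -np.log(1.0 + eps_T * x) / eps_T

# With eps_n = eps_T/2 the exponents agree through second order; the
# residual is of order (eps_T*x)**3 / 24.  Using eps_n = eps_T instead
# leaves a first-nonvanishing mismatch at second order.
mismatch = abs(expo_new - expo_tsallis)
bad = abs(-lambertw(eps_T * x, 0).real / eps_T - expo_tsallis)
```

With these values the matched mismatch is several orders of magnitude below the unmatched one, confirming the factor of two.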
The typical Schottky form for a two-level system can be seen in the specific heat.

Figure 7.5: A for Shannon (top), Tsallis with ε = 0.1 (middle) and Tsallis with ε = 0.25 (bottom).

Figure 7.6: S for the same three entropy forms (from bottom to top: Shannon, Tsallis (0.1), Tsallis (0.25)).

Figure 7.7: U for the same three entropy forms (same order as for S).

Figure 7.8: C for the same three entropy forms (peaks bottom to top: Shannon, Tsallis (0.1), Tsallis (0.25)).

We can replace p_− by 1 − p_+ for faster execution of the numerics, though the variables involve both p_+ and p_−, after finding A.

For our new entropy, too, we first determine the numerical value of A from the normalization condition

e^{-W_+/\epsilon} + e^{-W_-/\epsilon} = 1    (7.71)

and then use this value to find U, S, and C. Figs. 7.9–7.12 show the values corresponding to \epsilon_n = 0.05, which is half the Tsallis parameter \epsilon_T = 0.10 used, together with the Shannon values. Only for the S curve is there a perceptible difference between Tsallis entropy and our new entropy, at values of β near 1.

Figure 7.9: Comparison of A for Shannon (apart), Tsallis with ε = 0.1 and the new entropy with ε = 0.05 (superposed).

Figure 7.10: Comparison of S for the same three forms of entropy (Shannon, Tsallis (0.1), new entropy (0.05)); the Tsallis curve lies just over the new entropy.

Figure 7.11: Comparison of U for the same three entropies. Shannon is separated; the other two overlap.
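The normalization condition Eqn. 7.71 is easily solved with a standard root finder. A sketch for the two-level system (illustrative values ε = 0.05 and β = E = 1; `brentq` brackets the root of the normalization):

```python
import numpy as np
from scipy.special import lambertw
from scipy.optimize import brentq

eps, beta, E = 0.05, 1.0, 1.0

def p_level(level_energy, A):
    """p_i = exp(-W_0(eps*beta*(E_i - A))/eps) for E_i = +/-E."""
    w = lambertw(eps * beta * (level_energy - A), 0).real
    return np.exp(-w / eps)

# Free energy A from the normalization p_+ + p_- = 1 (Eqn. 7.71)
A = brentq(lambda a: p_level(E, a) + p_level(-E, a) - 1.0, -3.0, 0.0)
p_plus, p_minus = p_level(E, A), p_level(-E, A)
U = E * p_plus - E * p_minus          # internal energy of the two levels

A_shannon = -np.log(2.0 * np.cosh(beta * E)) / beta   # Eqn. 7.67
```

For this small ε the value of A stays close to the Shannon result, consistent with the near-superposed curves of Figs. 7.9–7.12.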
7.12 Summary of the Mathematical Properties of the New Entropy

Figure 7.12: Specific heat C for the same three entropies. Again, only Shannon is separated.

From the discussions above, it is apparent that as the definition of entropy departs from the simple classical form, a class of mathematical functions of increasing complexity arises in the probability distribution functions. The evolution can be seen from the simple exponential of the Boltzmann form, to the generalized q-logarithm of Tsallis, to our Lambert function; they all reduce to the simple form in a limit, as is the case for most physical laws in various physical situations. It is still an open problem whether a hierarchical set structure can be given to such functions, with corresponding hierarchical limits of physical parameters.

The Lambert function has previously been used in various physical contexts ([112, 129]), such as Wien's displacement law, two-dimensional capacitor fields, enzyme kinetics, models of combustion, and the range and endurance of airplanes. It has also been used in the combinatorial problem of counting unrooted trees. Since combinatorics is close to the definition of entropy, the presence of the Lambert function in our entropy is not unnatural.

The form of the entropy may be taken as an indicator of the effective interaction among the constituent systems of the ensemble, the Shannon form being the limiting case of zero interaction, and the Tsallis form or ours being the results of different forms of interaction, with ε signifying a coupling constant. It is noteworthy that most thermodynamic functions we have considered here are not crucially dependent on the form of the entropy with adjusted coupling. However, the value of the entropy itself may vary significantly among the different formulations. This may act as an indicator to discriminate between the suitability of different definitions of entropy in different contexts.
Chapter 8

Generalized Entropy and Entanglement

The entropy proposed in the last chapter has a particularly simple definition: it can be expressed as the divergence of a vector representing the modified probabilities of the different possible states, taking into account a rescaling that arises from correlations or clustering due to interactions between the microsystems.

Quantum entanglement of states is also a relevant issue in microscopic systems. The problem of quantum entanglement of two states in the picture of Tsallis-type nonextensive entropy has been studied before [131, 132, 133]. The generalization of Shannon entropy to the very similar von Neumann entropy, using density operators in place of probability distributions [16], reveals common features of the stochastic and the quantum forms of uncertainty, and this treatment can be extended to the Tsallis form too.

Our purpose here is to present a combined study of stochasticity and quantum entanglement, so that the former emerges from the quantum picture in a natural way; we then show that our new approach to defining entropy also yields a measure of mutual information that involves stochasticity and entanglement together in a clear, comprehensible way. The fact that our new definition of entropy, which is conceptually very simple, also gives the probability distribution function in closed form in terms of Lambert W functions [59] allows one to carry out many calculations with the same ease as for Tsallis entropy. In this work, however, the probability distribution will not be needed explicitly.

In previous chapters we discussed the dynamics of entangled quantum networks. However, we did not consider decoherence effects, which would be inevitable in practically designing such quantum devices.
Here, we study the effect of entangled states collapsing to form mixed states and examine the phenomenon in terms of mutual information using different entropies. First, we clarify a few definitions and mathematical identities that will be used later.

8.0.1 Entangled and Pure States

A state vector is entangled when it cannot be expressed as a factorizable product of vectors in the Hilbert spaces of the two subsystems (H_A and H_B) within the combined Hilbert space H of the two particles (or subsystems). Entanglement, hence, is actually a property related to projection onto the subspaces; it cannot be expected to be measurable by properties of the bigger space alone.

If we have the state

|\psi\rangle = \sum_{ij} C_{ij}\, |i\rangle_A |j\rangle_B    (8.1)

the density matrix for the product space is defined as

\rho_{AB} = C_{ij} C^*_{kl}\, |ij\rangle_{AB} \langle kl|_{AB}    (8.2)

The partial density matrix for H_A can be found by taking the trace over the H_B part:

\rho_A = C C^\dagger    (8.3)

where the C's are now the coefficient matrices.

An entangled state of two qubits (a "qubit", or quantum bit, being a quantum superposition of two possible states) may be expressed, as an explicit example, by the reduced 2×2 density matrix in the {|0\rangle|0\rangle, |1\rangle|1\rangle} basis subset:

\rho = \begin{pmatrix} c^2 & \gamma c s \\ \gamma c s & s^2 \end{pmatrix}    (8.4)

with γ = 1 for the pure quantum (entangled) state

|\psi\rangle = c\,|0\rangle|0\rangle + s\,|1\rangle|1\rangle    (8.5)

Here we have used the compact notation c = cos(θ) and s = sin(θ). This entanglement occurs in the subspace of the product Hilbert space involving only the two basis vectors |00\rangle and |11\rangle. Other entangled combinations can be obtained simply by relabeling the basis vectors, so we shall use this as the prototype. For |γ| < 1 we have an impure state with a classical stochastic component in the probability distribution, although we still have probability conservation, because Tr(ρ) = 1, which remains unchanged under any unitary transformation.
Factorizability ("purity" [135]) of a quantum state, or its quantum non-entanglement, can be measured by ζ, which remains invariant under changes of γ:

\zeta = \mathrm{Tr}_A[(\mathrm{Tr}_B\, \rho_{AB})^2] = c^4 + s^4    (8.6)

Hence, maximum entanglement corresponds to ζ = 1/2 at θ = π/4, and minimal entanglement to ζ = 1 (pure factorizable states) at θ = 0, π/2. Classical stochasticity is represented by quantum impurity: it attains its maximum value when γ = 0 and is nonexistent when γ = 1, which corresponds to a pure entangled state. Note that ζ does not involve the stochasticity-related parameter γ at all, but remains a quantifier of the quantum entanglement alone.

Another equivalent and interesting way of quantifying entanglement is the parameter

E_{AB} = 2\big(\mathrm{Tr}[\rho_A]\,\mathrm{Tr}[\rho_B] - \mathrm{Tr}[\rho_A \rho_B]\big) = \sin^2(2\theta)    (8.7)

which is more symmetric in the two subspaces and resembles a correlation function. This has the value 0 for no entanglement, at θ = 0, π/2, and the maximal value 1 for θ = π/4, as desired. This definition of entanglement follows the idea of mutual information, though we have not used the entropy at this stage, only the probabilities directly. It does not involve the stochasticity expressed by the purity parameter γ. In the relation above we have used

\rho_A = \mathrm{Tr}_B[\rho_{AB}]    (8.8)

and similarly for \rho_B. In our specific case, for A or for B,

\rho_{A,B} = \begin{pmatrix} c^2 & 0 \\ 0 & s^2 \end{pmatrix}    (8.9)

with Tr[\rho_{A,B}] = 1 ensured.
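Both measures are straightforward to evaluate. A sketch (sample angle θ = π/6; numpy only) that checks Eqn. 8.6 and the correlation-like measure of Eqn. 8.7, including their independence of γ:

```python
import numpy as np

theta = np.pi / 6
c, s = np.cos(theta), np.sin(theta)

def rho_AB(gamma):
    """Density matrix of Eqn. 8.4 in the {|00>, |11>} subspace."""
    return np.array([[c**2, gamma * c * s],
                     [gamma * c * s, s**2]])

rho_A = np.diag([c**2, s**2])        # reduced matrix, Eqns. 8.8-8.9
rho_B = rho_A

zeta = np.trace(rho_A @ rho_A)       # Eqn. 8.6: c^4 + s^4
E_AB = 2 * (np.trace(rho_A) * np.trace(rho_B) - np.trace(rho_A @ rho_B))
```

Here ζ = c⁴ + s⁴ = 5/8 and E_AB = sin²(2θ) = 3/4; neither depends on γ, since both are built from the reduced matrices alone.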
8.1 Stochasticity from Entanglement with the Environment

If the entangled state |\Psi_{AB}\rangle is coupled quantum mechanically to the environment state |\Psi_E\rangle, taking the trace over the environment states gives a measure of impurity. Let

|\Psi_{ABE}\rangle = \sum_{ijk} c_{ijk}\, |i\rangle_A |j\rangle_B |k\rangle_E    (8.10)

The density matrix for the pure quantum system is then

\rho_{ABE} = \sum_{ijk,lmn} c_{ijk} c^*_{lmn}\, |ijk\rangle\langle lmn|    (8.11)

and as we trace over the environment, we get

\rho_{AB} = \sum_{ij,lm} \sum_k c_{ijk} c^*_{lmk}\, |ij\rangle\langle lm|    (8.12)

For the entangled mixture of |00\rangle and |11\rangle in H_{AB} and the couplings

c_{000} = c c', \quad c_{001} = c s', \quad c_{110} = s s', \quad c_{111} = s c'    (8.13)

with c' = cos(θ') and s' = sin(θ'), the trace over the H_E states yields the density matrix

\rho_{AB} = \begin{pmatrix} c^2 & 2 c s c' s' \\ 2 c s c' s' & s^2 \end{pmatrix}    (8.14)

Hence, we introduce classical stochasticity by tracing over the environment space, with

\gamma = \sin(2\theta')    (8.15)

8.2 Entanglement, Entropy and Mutual Information

8.2.1 Single System Interacting with the Environment

We consider a single system A interacting with the environment E. The entanglement between the measured system and the environment is contained in the product space H_A \otimes H_E, and hence the density operator for the combined system-environment space is as given in Eqn. 8.4, where γ = 1 indicates a pure entangled state, and the environment-traced density is given by Eqn. 8.8; ρ_A and ρ_E are equal. Mutual information may be construed as the entanglement, and defined by

E_{AE} = 2\big(\mathrm{Tr}[\rho_A]\,\mathrm{Tr}[\rho_E] - \mathrm{Tr}[\rho_A \rho_E]\big) = \sin^2(2\theta')    (8.16)

with θ' the angle of entanglement as defined before. Hence, the coupling of the system to the environment is reflected in measurements on the system A, and the mutual information is contained in the parameters of the system itself. The calculations are done in terms of the von Neumann entropy, which is simply the quantum density-matrix form of the Shannon entropy (with mixing in an orthogonal quantum basis, it becomes similar to the Shannon entropy) [136]:

S = -\mathrm{Tr}[\rho \log(\rho)]    (8.17)
From the Araki-Lieb relation [103], we know that

S_{AE} \ge |S_A - S_E|    (8.18)

With S_{AE} = 0 for a pure quantum state, we must have S_A = S_E, and hence

I_{AE} = S_A + S_E - S_{AE} = 2 S_A = -2\,\mathrm{Tr}_A[\rho_A \log(\rho_A)]    (8.19)

which also confirms the view that in such a case the system itself contains the mutual information in its parameters, as we found above. If our new form of entropy [137] is used, with the hypothesis that the mutual information is still given by the same form but with the parameter q not equal to 1 (q = 1 being the Shannon case), then we get

I_{AE} = -2\,\mathrm{Tr}_A[\rho_A^q \log(\rho_A)] = -2\, c'^{2q} \log(c'^{2q}) - 2\, s'^{2q} \log(s'^{2q})    (8.20)

8.2.2 Entangled Systems Interacting with the Environment

We have already shown \rho_{AB} in Eqn. 8.14, with the 3-system entanglement of Eqn. 8.10 and the relatively simple choice of couplings in Eqn. 8.13. We may find the 3-system mutual information with similar constructions of \rho_{AE}, \rho_{BE} and \rho_{ABE}, defining

I_{ABE}(q) = -S_{ABE}(q) + S_{AB}(q) + S_{BE}(q) + S_{AE}(q) - S_A(q) - S_B(q) - S_E(q)    (8.21)

with S_{ABE}(q) = 0 for any q, for a single 3-system pure state. We trace over the B space to get \rho_{AE}, which, using as basis |00\rangle, |01\rangle, |10\rangle and |11\rangle in the |AE\rangle product space, yields

\rho_{AE} = \begin{pmatrix} c^2 c'^2 & c^2 c' s' & 0 & 0 \\ c^2 c' s' & c^2 s'^2 & 0 & 0 \\ 0 & 0 & s^2 s'^2 & s^2 c' s' \\ 0 & 0 & s^2 c' s' & s^2 c'^2 \end{pmatrix}    (8.22)

and an identical matrix for \rho_{BE}. Using the relevant eigenvalues, we finally get

I_{ABE}(q) = c'^{2q} \log(c'^{2q}) + s'^{2q} \log(s'^{2q}) - \lambda_+^q \log(\lambda_+^q) - \lambda_-^q \log(\lambda_-^q)    (8.23)

where \lambda_+ and \lambda_- are the eigenvalues of the \rho_{AB} matrix obtained after tracing over the E-space:

\lambda_{+,-} = \frac{1}{2}\left(1 \pm \sqrt{1 - 4(1 - \gamma^2)\, c^2 s^2}\right)    (8.24)

with γ given by Eqn. 8.15.
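The partial traces and the eigenvalues λ± can be verified directly. A sketch (sample angles; indices ordered A, B, E) that builds |\Psi_{ABE}\rangle from the couplings of Eqn. 8.13, traces out E, and checks Eqn. 8.24:

```python
import numpy as np

theta, theta_p = np.pi / 5, np.pi / 7
c, s = np.cos(theta), np.sin(theta)
cp, sp = np.cos(theta_p), np.sin(theta_p)

# State vector in the A x B x E product basis, flat index = 4a + 2b + e
psi = np.zeros(8)
psi[0b000], psi[0b001] = c * cp, c * sp    # couplings of Eqn. 8.13
psi[0b110], psi[0b111] = s * sp, s * cp

rho_ABE = np.outer(psi, psi)
# Trace over the environment index e (Eqn. 8.12)
rho_AB = rho_ABE.reshape(2, 2, 2, 2, 2, 2).trace(axis1=2, axis2=5).reshape(4, 4)

gamma = np.sin(2 * theta_p)                # Eqn. 8.15
disc = np.sqrt(1 - 4 * (1 - gamma**2) * c**2 * s**2)
lam = np.array([(1 + disc) / 2, (1 - disc) / 2])   # Eqn. 8.24
```

The two nonzero eigenvalues of the traced matrix coincide with λ±, with the remaining two eigenvalues zero, as the {|00>, |11>} block structure requires.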
If we had started with a stochastic picture of an entangled impure A-B system, where 1 − γ represents the stochasticity, the mutual information would be

I_{AB}(q) = -S_{AB}(q) + S_A(q) + S_B(q) = \lambda_+^q \log(\lambda_+^q) + \lambda_-^q \log(\lambda_-^q) - 2\, c^{2q} \log(c^{2q}) - 2\, s^{2q} \log(s^{2q})    (8.25)

We first show, in Fig. 8.1, the mutual information (MI) calculated according to the Shannon form of the entropy, which is equivalent to our form at q = 1, as a function of the A-B entanglement angle θ and the entanglement angle θ' of the A-B system with the environment, which is related to the stochasticity γ as explained above. We note that the mutual information depends smoothly on the angle of entanglement with the environment θ'. It seems that traditional entropy is fairly insensitive to the details of coupling with the environment when the mutual information between two systems is measured.

Figure 8.1: I_{AB} as a function of the entanglement angle θ in A-B space and the entanglement angle θ' with the environment, which is related to the stochasticity.

We show the deviations ΔI_{AB} of our MI from the Shannon MI as functions of q and the entanglement angles θ' and θ in Fig. 8.2 and Fig. 8.3 respectively, keeping the other angle at π/4 in each case. There is symmetry around π/4. The variation is fairly smooth for fixed θ', i.e. fixed entanglement with the environment. However, if we keep the entanglement between A and B fixed near π/4, then the mutual information using our form of entropy changes sharply with θ' near the symmetry value θ' = π/4. This comes from one of the eigenvalues of the density matrix \rho_{AB} approaching zero for this mixing value; with q ≠ 1, there is either a sharp peak or a dip compared to the Shannon entropy case, which has fixed q = 1.
It may be noted here that in a recent study of the entropy of a chain of spins in a magnetic field [138], both the usual von Neumann and Renyi forms of entropy were found to yield nonzero and surprisingly simple closed expressions. Though that work does not mention entanglement explicitly, the correlation functions presented there, which determine the density matrix and therefore its diagonalized form needed for the entropy calculation, are actually manifestations of the entanglement among the spins and between the spins and the magnetic field. The chain is split into two parts, similar to our A and B subsystems, and the external magnetic field acts like the environment we have introduced in this work. Though their extensive calculations are carried out at zero temperature, unlike our finite-temperature treatment, the fact that they obtain a nonzero S_A for the first L spins is apparently due to the segmentation of the pure state of the fully entangled quantum system and the consideration of part A only for the entropy calculation. This is effectively equivalent to summing over the states of part B and the entanglement with the environment, and produces entropy through the corresponding loss of information about the state of the whole system. Hence, their results for this explicit model are consistent with our general result that classical stochasticity and entropy may be reflections of the segmented consideration of bigger complete systems. The values of the different types of entropy, whether the canonical Shannon form, a generalized form such as the Renyi form (which goes to the Shannon form in the usual limit, like the related parameter we have mentioned for Tsallis entropy), or our new form of entropy in this work, reflect the extent of entanglement or interaction or, equivalently, correlation. In their work a length scale comes out of this segmentation, which appears to be analogous to the angle of entanglement in our case.
We do not get a phase transition as they do, because we have considered a simplified general finite system of only two or three component subsystems, not an infinite chain, and finite systems cannot show phase transitions.

Fig. 8.2 shows that little variation takes place with changing θ' for almost any q. Pronounced changes at small q (q < 1) for different θ are shown in Fig. 8.3.

Figure 8.2: Difference of the MI from our entropy with that from Shannon entropy at θ = π/4.

Figure 8.3: Same as Fig. 8.2 but with θ' = π/4.

In Fig. 8.4 and Fig. 8.5 the difference between our MI and the Shannon MI is shown as a function of θ and θ' simultaneously, keeping q fixed at 0.8 and at 1.2. At q = 1 we get no difference, as our entropy then coincides with the Shannon form. Again we notice that the mixing angle between A and B shows fairly smooth variation, but θ', or equivalently the stochasticity, causes a pronounced peak (for q < 1) or dip (for q > 1). We can conclude that our method of entropy calculation can indicate a greater role of the entanglement with the environment when this mixing is nearly equal for the A-B entangled states.

Figure 8.4: MI difference between our entropy form and Shannon for q = 0.7.

Figure 8.5: Same as Fig. 8.4 but for q = 1.3.

Figure 8.6: Difference between the MI from our entropy and Tsallis's with θ = π/4.

We have previously [137] compared our form with results from Tsallis entropy in formulating a general thermodynamics, in view of the prevalent familiarity with the Tsallis form of nonextensive entropy.
We showed that despite the conceptual and functional differences between Tsallis entropy and our new form, the results are very similar if we take the Tsallis q to be twice as far from unity (the Shannon-equivalent value) as our value of q. In Fig. 8.6 and Fig. 8.7 the difference of our MI from that derived from Tsallis's entropy is shown. It can again be observed that, in both θ and θ', the differences are relatively more significant for q values different from 1, for both angles near π/4, with peaks and dips similar to those in the comparison with the mutual information calculated with Shannon entropy.

Figure 8.7: Same as Fig. 8.6 but with θ' = π/4.

8.3 Quantum Pattern Recognition and Mutual Information

In previous chapters we discussed how quantum networks can store memories of a quantum mechanical nature, and in the last sections we discussed quantum mutual information with respect to entanglement with the environment. In this section we apply the ideas of density matrices and mutual information developed in the earlier sections to a more practical purpose, namely quantum pattern recognition. We first discuss the concept of quantum mutual information in more detail with respect to patterns and detectors, reviewing some of the concepts discussed in [98].

8.4 Entanglement and Mutual Information

When the test object (or its representation) is coupled completely to the pattern-recognizing device, in the case of perfect recognition, the quantum states must match one to one absolutely. Conversely, if there is no recognition, the states of the two will be uncorrelated and will factorize in the product space of the two state spaces:

|\mathrm{system}\rangle = |\mathrm{object}\rangle \otimes |\mathrm{detector}\rangle    (8.26)

Each state here can be a superposition of eigenstates.
|\mathrm{object}\rangle = \sum_i c_i\, |i\rangle_{obj}, \qquad |\mathrm{detector}\rangle = \sum_j d_j\, |j\rangle_{det}    (8.27)

When there is no factorization, in the case of full entanglement,

|\mathrm{system}\rangle = \sum_i c_i\, |i\rangle_{obj} |i\rangle_{det}    (8.28)

The two sets of basis states belong to independent Hilbert spaces in this case; the same label i is used to show a one-to-one correspondence. When we have partial matching and, therefore, incomplete entanglement, we get

|\mathrm{system}\rangle = \sum_{ij} c_{ij}\, |i\rangle_{obj} |j\rangle_{det}    (8.29)

For this specific composite system, the density matrix can be written as

\rho_{sys} = \sum_{ij} c_{ij} c^*_{ij}\, |i\rangle_{obj} |j\rangle_{det} \langle i|_{obj} \langle j|_{det}    (8.30)

The correlation between the object pattern and the detector can be measured by

\zeta_1 = \mathrm{Tr}_{sys}[\rho_{sys} - \rho_{obj}\rho_{det}]    (8.31)

In this specific case, we calculate the partial traces

\rho_{obj} = \mathrm{Tr}_{det}[\rho_{sys}]    (8.32)

\rho_{det} = \mathrm{Tr}_{obj}[\rho_{sys}]    (8.33)

The measure ζ defined before translates, for this specific case, to

\zeta_2 = \mathrm{Tr}_{obj}[\rho_{obj}\,\rho_{obj}]    (8.34)

It is shown in [103] that these two entanglement measures are related by

\zeta_1 = 1 - \zeta_2    (8.35)

The Shannon entropy can be defined in terms of the density function as

S_S = -\mathrm{Tr}[\rho \log(\rho)]    (8.36)

Our entangled system, with the state

|\mathrm{system}\rangle = a\,|1\rangle|1\rangle + b\,|0\rangle|0\rangle    (8.37)

has a density matrix of the general form

\rho = \begin{pmatrix} |a|^2 & a b^* c \\ a^* b\, c & |b|^2 \end{pmatrix}    (8.38)

We use the normalization

\mathrm{Tr}[\rho] = 1 = |a|^2 + |b|^2    (8.39)

As we see, c does not appear in it. c measures the degree of impurity, ranging from 0 for a completely impure state to |c| = 1 for a perfectly pure pair of states. A pure state has a probability of 1 associated with one specific state and a probability of 0 associated with any other state. Eqn. 8.36 then gives

S_{sys} = 0    (8.40)

From the Araki-Lieb inequality [103],

S_{AB} \ge |S_A - S_B|    (8.41)

we then get the relation

S_{obj} = S_{det}    (8.42)

The mutual information between the object and the detector is thus

I_{obj-det} = -S_{sys} + S_{obj} + S_{det} = -2\,\mathrm{Tr}[\rho_{obj} \log(\rho_{obj})]    (8.43)

The matrix of Eqn.
8.38, representing a two-qubit coupling, gives

\zeta_2 = |a|^4 + |b|^4    (8.44)

The mutual information in this case is

I_{obj-det} = 0    (8.45)

This value occurs because the trace operator does not depend on diagonalization, and when the states are pure (with |c| = 1), ρ can have only one eigenvalue equal to 1, making the others 0.

In most physical contexts Shannon entropy is the most generally used form, since the Boltzmann probability distribution is verified experimentally in these cases. In recent years, however, other forms of entropy have been suggested, giving different probability distributions for different energies with modified functional forms. Some of these new forms of entropy [17, 18] were discussed before, and another entropic function was proposed by us ([137, 139]). Here we shall simply mention the mathematical form of the new entropy:

S_N = -\mathrm{Tr}[d\rho^q/dq]    (8.46)

This new definition of entropy gives the mutual information between the object and the detector as

I^N_{obj-det} = -2\,\mathrm{Tr}[\rho^q \log(\rho)]    (8.47)

One can simply take the sum over the eigenvalues by diagonalizing ρ. For a pure quantum state this diagonalization gives only one eigenvalue equal to 1 and the others equal to zero, making the mutual information zero. It is impurity, and not entanglement, that is measured by this mutual information, as expressed by the variable c in Eqn. 8.38; a deviation of |c| from 1 indicates departure from a pure quantum state. A mixture of pure quantum states at finite temperature produces this impurity, and only an impure system can distinguish between the different forms of entropy. Mutual information due to impurity may become a relevant issue, since a pattern recognition system may contain such impure mixtures of quantum states. The calculations shown above may provide one more test for the appropriateness of different types of entropy in various situations.
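The role of the impurity parameter c is easy to see numerically. A sketch (sample amplitudes a = 0.8 and b = 0.6, taken real for simplicity) evaluating the matrix of Eqn. 8.38, the measure of Eqn. 8.44, and the eigenvalue structure that makes the pure-state entropy vanish:

```python
import numpy as np

a, b = 0.8, 0.6                      # |a|^2 + |b|^2 = 1

def rho_sys(c_mix):
    """Density matrix of Eqn. 8.38 with real a, b and impurity parameter c."""
    return np.array([[a**2, a * b * c_mix],
                     [a * b * c_mix, b**2]])

def entropy(rho):
    """Von Neumann entropy, Eqn. 8.36, skipping zero eigenvalues."""
    lam = np.linalg.eigvalsh(rho)
    return -sum(l * np.log(l) for l in lam if l > 1e-12)

zeta2 = a**4 + b**4                  # Eqn. 8.44
S_pure = entropy(rho_sys(1.0))       # pure state: eigenvalues 1 and 0
S_mixed = entropy(rho_sys(0.5))      # impure state: nonzero entropy
```

The pure case (|c| = 1) has eigenvalues 1 and 0, so its entropy vanishes, while any |c| < 1 produces two nonzero eigenvalues and a finite entropy, which is where the different entropy forms start to differ.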
$\zeta$ was defined before in relation to the correlation between the states of the object and those of the detector. However, this coefficient does not give a maximal value for perfect correlation and zero for random association. For example, $b = 0$ gives $\zeta_2 = 1$, and hence $\zeta_1 = 0$, in Eqn. 8.44. This would also be the situation in the case of factorized states. This can be explained by observing that as one qubit is taken away from the detector-pattern pair, the idea of pattern recognition becomes meaningless. $\zeta_1 = 1/2$ is reached for $a = b = 1/\sqrt{2}$, i.e. when both patterns exist with the same frequency and can also be identified perfectly.

Chapter 9

Prior PDFs, Entropy and Broken Symmetry

The justification for choosing Shannon or any more generalized entropy, such as that of Tsallis or Renyi, or the one we have presented here, lies eventually in the relevance or "good fit" such an entropy would produce in the data, corresponding to a situation where the presence or absence of interactions among the members, or other considerations, suggest the need for a proper choice. However, data are always finite, and a probability distribution is the limit of relative frequencies with an infinite sample. One therefore faces the problem of estimating the best probability distribution function (PDF) from a finite sample [140]. This PDF may be subject to the constraint of a known entropy, in whatever way defined, as a functional of the PDF. Mathematically, the problem of determining the best posterior PDF, given a rough prior PDF and data points, is expressed formally by Bayes' theorem. However, the constraint of constant entropy makes the functional integral impossible to handle even for a fairly simple prior, as found by Wolpert and Wolf [141] and by Bialek, Nemenman and Shafee [19]. The integrals involved were first considered in a general context by [141], and the question of priors was addressed in [140, 19].
It was discovered that, though the integral for the posterior was intractable, the moments of the entropy could be calculated with relative ease. In [19] it has also been shown that for Dirichlet-type priors [142]

$P(\{p_i\}) = \prod_i p_i^{\beta}$   (9.1)

in particular (which give nice analytic moments with exact integrals, and hence are hard to ignore), the Shannon entropy is fixed by the exponent $\beta$ of the probabilities chosen for small data samples; hence not much information is obtained for unusual distributions, such as that of Zipf, i.e. a prior has to be wisely guessed for any meaningful outcome. Since a discrete set of bins has no metric, or even a useful topology, that can be exploited in Occam-razor-type smoothing, other tricks were suggested in that paper to overcome the insensitivity of the entropy.

We have noted already that the PDF associated with our proposed entropy differs from that of the Shannon entropy by only a power of $p_i$, but this changes the symmetry of the integrations for the moments for the different terms for different bins. We shall therefore examine in this chapter whether the nature of the moments is sufficiently changed by our entropy to indicate cases where data can pick this entropy in preference to Shannon or other entropies.

9.1 Priors and Moments of Entropy

For completeness, we mention here the formalism developed by Wolpert and Wolf [141]. The uniform PDF is given by

$P_{unif}(\{p_i\}) = \frac{1}{Z_{unif}}\, \delta\!\left(1 - \sum_{i=1}^{K} p_i\right)$

$Z_{unif} = \int_V dp_1\, dp_2 \cdots dp_K\; \delta\!\left(1 - \sum_{i=1}^{K} p_i\right)$   (9.2)

where the $\delta$ function enforces the normalization of the probabilities and $Z_{unif}$ is the total volume occupied by all models. The integration domain $V$ is bounded by each $p_i$ in the range $[0, 1]$. Because of the normalization constraint, any specific $p_i$ chosen from this distribution is not uniformly distributed, and "uniformity" means simply that all distributions that obey the normalization constraint are equally likely a priori.
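The last remark, that a single $p_i$ drawn from the "uniform" prior is itself far from uniform on $[0,1]$, is easy to see in a quick simulation (an illustrative sketch, not from the dissertation; numpy assumed). A uniform draw from the simplex is a Dirichlet distribution with all concentration parameters equal to 1, and each coordinate then has the marginal Beta$(1, K-1)$, piling up near 0 with mean $1/K$:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10
# uniform distribution on the (K-1)-simplex == Dirichlet(1, ..., 1)
samples = rng.dirichlet(np.ones(K), size=200_000)
p1 = samples[:, 0]                        # marginal of one coordinate

# mean is 1/K rather than 1/2, and most of the mass sits below 1/K:
# for Beta(1, K-1), P(p1 < x) = 1 - (1 - x)^(K-1)
print(p1.mean())
print((p1 < 1.0 / K).mean())              # close to 1 - (1 - 1/K)^(K-1)
```

The normalization constraint, not any nonuniform weighting, is what skews each marginal.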
We can find the probability of the model $\{p_i\}$ with Bayes' rule as

$P(\{p_i\}|\{n_i\}) = \frac{P(\{n_i\}|\{p_i\})\, P_{unif}(\{p_i\})}{P_{unif}(\{n_i\})}, \qquad P(\{n_i\}|\{p_i\}) = \prod_{i=1}^{K} (p_i)^{n_i}.$   (9.3)

Generalizing these ideas, we have considered priors with a power-law dependence on the probabilities,

$P_\beta(\{p_i\}) = \frac{1}{Z(\beta)}\, \delta\!\left(1 - \sum_{i=1}^{K} p_i\right) \prod_{i=1}^{K} p_i^{\beta - 1}.$   (9.4)

It has been shown [19] that if the $p_i$'s are generated in sequence ($i = 1 \to K$) from the Beta distribution

$P(p_i) = B\!\left(\frac{p_i}{1 - \sum_{j<i} p_j};\, \beta,\, (K - i)\beta\right), \qquad B(x; a, b) = \frac{x^{a-1}(1-x)^{b-1}}{B(a, b)},$   (9.5)

then the probability of the whole sequence $\{p_i\}$ is $P_\beta(\{p_i\})$.

Random simulation of PDFs with different shapes (a few bins occupied, versus more spread-out ones) shows that the entropy depends largely on the parameter $\beta$ of the prior, and hence sparse data has virtually no role in determining the shape of the output distribution. This would seem unsatisfactory, and some adjustments appear to be needed to get any useful information out. We shall not repeat here the methods and results of [19], which considers only Shannon entropy.

9.2 Comparison of Shannon and Our Entropy

In our case, with the entropy function given by Eqn. 7.26, we note that the calculation involves not simply a replacement of the individual factors involving $n_u \in \{n_i\}$ [141] by $n_u + q - 1$ in the product integral involved in the moment determination, but a complete re-calculation of the moment, using the same techniques given in [141]. The maximal value of the entropy should correspond to the flattest distribution, i.e.

$S_{max} = K^{(1-q)} \log(K)$   (9.6)

In the limit of very sparse data, i.e. $n_i \to 0$, we eventually get the expression for the first moment, i.e. the expected entropy,

$\langle S_1 \rangle / \langle S_0 \rangle = K\, \frac{\Gamma(\beta + q)}{\Gamma(\beta)}\, \frac{\Gamma(\beta K)}{\Gamma(\beta K + q)}\, \Delta\Phi_0(\beta K + q,\, \beta + q)$   (9.7)

where for conciseness we have used the notation of ref. [141],

$\Delta\Phi_p(a, b) = \Psi^{(p-1)}(a) - \Psi^{(p-1)}(b)$   (9.8)

$\Psi^{(n)}(x)$ being the polygamma function of order $n$ of the argument $x$.
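The statement that the prior essentially fixes the entropy can be reproduced with a short simulation of the kind reported in [19] (a sketch with assumed parameter values, not the original code; numpy assumed): drawing many distributions from $P_\beta$ and computing their Shannon entropies shows them clustering tightly around a $\beta$-dependent value instead of spanning the available range $[0, \log K]$:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 1000

def shannon_entropy_samples(beta, n=300):
    """Draw n distributions from the Dirichlet prior P_beta and return their entropies."""
    p = rng.dirichlet(beta * np.ones(K), size=n)
    safe = np.where(p > 0, p, 1.0)        # 0 * log 0 = 0
    return -(p * np.log(safe)).sum(axis=1)

s_low = shannon_entropy_samples(0.02)     # sparse prior: few bins dominate
s_high = shannon_entropy_samples(1.0)     # flat prior on the simplex

# the two priors pin the entropy at well-separated values, each with a narrow spread,
# so sparse data can hardly move the posterior away from the prior's preferred entropy
print(s_low.mean(), s_low.std())
print(s_high.mean(), s_high.std())
```

Both spreads are small compared with $\log K \approx 6.9$, which is precisely the insensitivity discussed above.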
It can be checked easily that this expression reduces to that in ref. [19] when $q = 1$, i.e. when we use Shannon entropy.

9.3 Results for Mean Entropy

We now have, unlike the Shannon case where $q$ is fixed at unity, a parameter $q$ that may produce a difference from the Shannon result. In Figs. 9.1 - 9.3 we show the variation of the ratio $\langle S_1 \rangle / S_{max}$ with variable bin number $K$. In ref. [19] we have commented on how insensitive the Dirichlet prior [142] is when Shannon entropy is considered in the straightforward manner given in ref. [141]. In our generalized form of the entropy, we note that by changing the parameter specific to our form of the entropy, for $q > 1$ we get a peak at small $\beta$ and large $K$ values. This peak allows us to choose uniform Dirichlet priors with an appropriate $q$ value that would nevertheless lead to asymmetry not possible with Shannon entropy.

Figure 9.1: Ratio of the expected value (first moment) of the new entropy plotted against bin number $K$ and prior exponent $\beta$ for entropy parameter $q = 0.5$.

Figure 9.2: Same as Fig. 9.1, but for $q = 1.0$, i.e. Shannon entropy.

In other words, instead of the priors, we can feed the information about the expected asymmetry of the PDF into the entropy, with no need to choose particular bins.

Figure 9.3: As the previous two figures, but for $q = 1.5$.

The nonextensivity of our entropy, coming possibly from interaction among the units, gives rise to situations where the entropy maxima do not increase with the number of bins like $\log(K)$; being $K^{(1-q)} \log(K)$, they may be extended or squeezed according to whether $q$ is less than or greater than unity. The interesting thing to note is that for $q > 1$ and large $K$, at small prior parameter $\beta$, the entropy peak exceeds the normally expected expression in Eqn.
9.6, with full $K$; so the expected value of the entropy is seen to exceed the formal maximum. The clustering or repulsive effects change the measure of disorder from the Shannon-type entropy. So the highest expected value of the entropy may correspond not to a uniformly distributed population, but to one in which a smaller subset is populated. This means that for our entropy the most uniform distribution is not the least informative; the $p^q$ weighting distorts it to an uneven distribution for the expected maximal entropy value. This result is in some ways similar to spontaneous symmetry breaking in field theory, where the variation of a parameter leads to broken-symmetry energy minima. A neater view of these results can be seen in Figs. 9.4 - 9.6, with the $K$ values fixed.

Figure 9.4: Clearer view in a 2-dimensional plot, with $K = 10$. Red, green and blue lines are for $q = 0.5$, $1.0$ and $1.5$ respectively.

Figure 9.5: As Fig. 9.4, but for bin number $K = 100$.

Figure 9.6: As the previous two figures, but for $K = 1000$.

We have not obtained the second moment, i.e. the standard deviation, or spread, of the entropy distribution, because with our entropy and an arbitrary $q$ the expressions cannot be obtained in the simple form of ref. [19]. We can, however, expect that the variation of the higher moments from the Shannon case will be less than that of the first moment, because higher derivatives of the $\Gamma$ functions are smoother. We shall assume the spreads are narrow enough to concentrate on the first moments only.

Apart from the PDF estimates above, this picture of broken symmetry for the maximal entropy when the parameter $q > 1$ is also manifest directly in an explicit calculation of the entropy, using our prescription, for a simple three-state system.
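The three-state calculation can be sketched numerically (an illustrative check, not the original code; numpy assumed). Scanning $S = -\sum_i p_i^q \log p_i$ along the slice $p_1 = 1/3$ shows that for $q = 2.44$ the symmetric point is beaten by configurations near the boundary, while for $q = 1$ (Shannon) the symmetric point remains the global maximum on the slice:

```python
import numpy as np

def S(p, q):
    """Generalized entropy -sum p_i^q log p_i; terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-(p[nz] ** q * np.log(p[nz])).sum())

q = 2.44
s_sym = S([1/3, 1/3, 1/3], q)             # symmetric point: 3 p^q log(1/p), p = 1/3

# slice with p1 = 1/3 fixed and p2 scanned, p3 = 2/3 - p2
p2_grid = np.linspace(0.001, 2/3 - 0.001, 2000)
s_slice = np.array([S([1/3, x, 2/3 - x], q) for x in p2_grid])

# for q = 2.44 the maximum on the slice slightly exceeds the symmetric value,
# i.e. the symmetry point is only a local maximum
print(s_slice.max() / s_sym)              # slightly greater than 1

# for q = 1 (Shannon) the symmetric point is the global maximum on the slice
s_shannon = np.array([S([1/3, x, 2/3 - x], 1.0) for x in p2_grid])
print(s_shannon.max() <= S([1/3, 1/3, 1/3], 1.0))
```

The excess over the symmetric value is tiny (a fraction of a percent), which is why the effect only shows up clearly on the magnified scale of a plot like Fig. 9.8 below.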
The symmetric expected maximal entropy in this case should be

$S_{max} = -3 p^q \log p$   (9.9)

with $p = 1/3$. With two of the probabilities, $p_1$ and $p_2$, running free from 0 to 1 under the constraint $p_1 + p_2 + p_3 = 1$, the entropy is

$S = -p_1^q \log p_1 - p_2^q \log p_2 - (1 - p_1 - p_2)^q \log(1 - p_1 - p_2)$   (9.10)

and we plot $S/S_{max}$ in Figs. 9.7 and 9.8. For $q = 2.44$ we obtain the most interesting behavior, with a local maximum at the point of symmetry $p_1 = p_2 = p_3 = 1/3$ which is not the global maximum. For $q \leq 1$ the symmetry point gives the global maximum.

Figure 9.7: Our entropy for a three-state system, with parameter $q = 2.44$, as the two independent probabilities $p_1$ and $p_2$ are varied with the constraint $p_1 + p_2 + p_3 = 1$. The expected maximum at the symmetry point $p_1 = p_2 = p_3$ turns out to be a local maximum. The global maxima are not at the end points, where one of the probabilities goes up to unity and the others vanish (which gives zero entropy, as expected), but occur near such end points, as shown clearly in the next figure.

Figure 9.8: Two-dimensional version of the previous Fig. 9.7, with $p_1 = 1/3$ fixed, so that only $p_2$ varies. This shows a clearer picture of the local maximum at the symmetry point and the global maxima near the end points.

Chapter 10

Illation and Outlook

We have presented in this dissertation generalizations and extensions of an important class of artificial neural networks, the integrate-and-fire type, taking them up to a fully quantized model with all nodes completely entangled with one another, through the intermediate steps of a quasiclassical model and a quantum-gated but not fully entangled model. In this study our primary concern has been the periodic behavior of these networks, because periodicity is not only of vital concern in biological neural systems, but may also be important in the design of artificial neural networks.
We have found that periodic behavior may be present in the quantized versions, where it may decay away or be retained indefinitely. Even where the periodic behavior dies out, as in the completely entangled version, we have found that the input data may be recovered by a method of back-projection. We therefore envisage the possibility of quantum devices with both short-term memories (where some decoherence comes in) and long-term ones (the fully entangled case), with interesting and useful technological applications.

At finite temperature, noise and stochasticity come into play, along with the concept of entropy. In a neural network, where units interact in a well-defined manner, there may be a competition between order and disorder, i.e. a partial loss of information due to the finite temperature, to noise, or to intrinsic defects or incompleteness of design. However, the interference leading to stochasticity may have an element of bias in such systems due to the non-random interactions. Hence, in neural networks and all such systems with specific interactions, the concept of entropy may need to be generalized. We have suggested a generalized form of entropy that is based on information-theoretic considerations. We have argued how the usual definition of entropy may be generalized to take into account possible deformations of the volumes of information registers, which may come about from interactions, leading to mutual information at both the classical and quantum levels.

In a stochastic situation, actual probability distribution functions are constructed from a finite set of data, which cannot contain the full information about the PDF. One has to use Bayesian techniques to estimate such PDFs using prior estimates and the data.
It was found before that the functional behaviors of Shannon entropy and the popular Dirichlet-type priors conspire to make the direct application of the formalism very insensitive to the variation of parameters, requiring the adoption of more complicated procedures. In this work we have shown how our form of entropy may give the necessary flexibility for finding good fits. This feature too makes our entropy more useful in the right context, where interactions effectively add more information, removing some of the uncertainty of flat priors. One interesting outcome of this investigation was the result that the highest entropy, as defined by us, may not correspond to the most uniform distribution, as the added information may break the symmetry in a natural way, leading to clusters or, equivalently, to the deformation of the phase cells for the storage of information.

As the ultimate objective of information-processing systems, including neural networks, or of information-retaining systems, usually involves pattern matching, we have looked into the possibility of applying biological-type (allosteric) co-operative recognition units to enhance the probability of recognition in quantum systems. This type of approach may indeed be very relevant for small coherent quantum devices, because biological allosterism, which produces the more efficient sigmoid switching curves in enzymes, seems to be in some ways analogous to quantum entangled systems and nonlocality. In any case, as in our study of H-H type neural networks, we believe that the transition from classical to quantum devices for greater efficiency, in time and storage space, will probably require an interplay of classical and quantum concepts.

Bibliography

[1] B.E. Swartz, Timeline of the history of EEG and associated fields. Electroencephalography and Clinical Neurophysiology 106, 173-176 (1998).

[2] W.L. Miller and K.A. Sigvardt, Spectral analysis of oscillatory neural circuits. Journal of Neuroscience Methods 80, 113-128 (1998).
[3] I. Sugihara, E.J. Lang, R. Llinas, Uniform olivocerebellar conduction time underlies Purkinje cell complex spike synchronicity in the rat cerebellum. J. Physiol. Lond. 470, 243-271 (1990).

[4] C.A. Del Negro, C.G. Wilson, R.J. Butera, H. Rigatto, S. Henrique and C. Jeffrey, Periodicity, mixed-mode oscillations, and quasiperiodicity in a rhythm-generating neural network. Biophys. J. 82, 206-214 (2002).

[5] R. Refinetti, Circadian Physiology (CRC Press, 2005).

[6] M.R. Mehta, Role of rhythms in facilitating short-term memory. Neuron 6, 147-56 (2005).

[7] R.W. Kensler, Mammalian cardiac muscle thick filaments: Their periodicity and interactions with actin. Biophys. J. 82, 1497-1508 (2002).

[8] C. Kaernbach and H. Schulze, Auditory sensory memory for random waveforms in the Mongolian gerbil. Neuroscience Letters 329, 37-40 (2002).

[9] H. Onimaru, A. Arata and I. Homma, Neuronal mechanisms of respiratory rhythm generation: an approach using in vitro preparation. Jpn. J. Physiol. 47, 385-403 (1997).

[10] W. Hoppe, W. Lohmann, H. Markl and H. Ziegler (eds.), Biophysics (Springer, NY, 1983).

[11] J.J. Hopfield, Proc. Natl. Acad. Sci. USA 79, 2554 (1982).

[12] J.J. Hopfield and A.V.M. Herz, Proc. Natl. Acad. Sci. USA 92, 6655 (1995).

[13] O. Watanabe (ed.), Kolmogorov Complexity and Computational Complexity (Springer, New York, 1992).

[14] A.L. Hodgkin and A.F. Huxley, Quantitative Description of Membrane Current and its Application to Conduction and Excitation in Nerve. J. Physiol. 117, 500 (1952).

[15] C. Koch, Computation and the single neuron. Nature 385, 207 (1997).

[16] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information (Cambridge Univ. Press, NY, USA, 2000).

[17] A. Renyi, Probability Theory (North-Holland, Amsterdam, 1970).

[18] C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 52, 479-487 (1988).

[19] I. Nemenman, F. Shafee and W. Bialek, Entropy and Inference, Revisited, in Adv. Neur. Info.
Processing 14, eds. T.G. Dietterich, S. Becker and Z. Ghahramani (MIT Press, Cambridge, 2002) pp. 471-478.

[20] P.W. Anderson, Mat. Res. Bull. 5, 549 (1970).

[21] D.L. Stein (ed.), Spin Glasses and Biology (World Scientific, Singapore, 1992).

[22] D.J. Amit, H. Gutfreund and H. Sompolinsky, Phys. Rev. Lett. 55, 1530 (1985).

[23] B. Derrida, E. Gardner and A. Zippelius, Europhys. Lett. 4, 167 (1987).

[24] H. Gutfreund and M. Mezard, Phys. Rev. Lett. 61, 235 (1988).

[25] H. Sompolinsky and I. Kanter, Phys. Rev. Lett. 57, 2861 (1986).

[26] D. Horn, Physica A 200, 594 (1993).

[27] F. Zertuche, R. Lopez-Pena and H. Waelbroeck, J. Phys. A 27, 5879 (1994).

[28] T.L.H. Watkin and D. Sherrington, J. Phys. A 24, 5427 (1991).

[29] T. Aonishi, Phase Transitions of an Oscillator Neural Network with a Standard Hebb Learning Rule. Preprint cond-mat/9808121 (1998).

[30] D.A. McCormick, B.W. Connors, J.W. Lighthall and D.A. Prince, Comparative electrophysiology of pyramidal and sparsely spiny stellate neurons of the neocortex. J. Neurophysiol. 54, 782-806 (1985).

[31] Z. Xiang, J.R. Huguenard, D.A. Prince, GABAA receptor-mediated currents in interneurons and pyramidal cells of rat visual cortex. J. Physiol. 506, 715-30 (1998).

[32] A.M. Thomson, D.C. West, J. Hahn, J. Deuchars, Single axon IPSPs elicited in pyramidal cells by three classes of interneurones in slices of rat neocortex. J. Physiol. 496 (Pt 1), 81-102 (1996).

[33] J. Karbowski, N. Kopell, Multispikes and synchronization in a large neural network with temporal delays. Neural Comput. 12(7), 1573-606 (2000).

[34] W. Gerstner, Rapid Phase Locking in Systems of Pulse-Coupled Oscillators with Delays. Phys. Rev. Lett. 76, 1755-1758 (1996).

[35] D. Golomb, G.B. Ermentrout, Continuous and lurching traveling pulses in neuronal networks with delay and spatially decaying connectivity. Proc. Natl. Acad. Sci. USA 96, 13480-5 (1999).

[36] S.M. Crook, G.B. Ermentrout, M.C. Vanier, J.M.
Bower, The role of axonal delay in the synchronization of networks of coupled cortical oscillators. J. Comput. Neurosci. 4(2), 161-72 (1997).

[37] C. Monroe, Quantum Information Processing with Atoms and Photons. Nature 416, 238-246 (2002).

[38] M. Riebe et al., Deterministic quantum teleportation with atoms. Nature 429, 734-737 (2004).

[39] M.D. Barrett et al., Nature 429, 737 (2004).

[40] J.I. Cirac and P. Zoller, Quantum computations with cold trapped ions. Phys. Rev. Lett. 74, 4091-4094 (1995).

[41] D. Vion et al., Science 296, 886 (2002).

[42] T. Yamamoto, Yu. A. Pashkin, O. Astafiev, Y. Nakamura, J.S. Tsai, Demonstration of conditional gate operation using superconducting charge qubits. Nature 425, 941-944 (2003).

[43] N. Gisin, G. Ribordy, W. Tittel, and H. Zbinden, Quantum cryptography. Rev. Mod. Phys. 74, 145-195 (2002).

[44] J.H. Plantenberg, P.C. de Groot, C.J.P.M. Harmans, J.E. Mooij, Demonstration of controlled-NOT quantum gates on a pair of superconducting quantum bits. Nature 447, 836-839 (2007).

[45] N. Margolus and L. Levitin, The maximum speed of dynamical evolution. Physica D 120, 188-195 (1998).

[46] L.B. Levitin, T. Toffoli and Z. Walton, Operation time of quantum gates, in Quantum Communication, Measurement, and Computing, J.H. Shapiro and O. Hirota, eds. (Rinton, 2003) pp. 457-459; quant-ph/0210076 (2002).

[47] M. Zak and C.P. Williams, Quantum neural nets. Int. J. Theo. Phys. 37, 651-684 (1998).

[48] F. Shafee, A spin glass model of human logic systems, to appear in Proc. Eur. Conf. Compl. Syst. 2005, arxiv.org:physics/0509065 (2005).

[49] B.E. Baaquie, Quantum Finance (Cambridge Univ. Press, Cambridge, UK, 2004).

[50] R. Penrose, The Emperor's New Mind (Oxford Univ. Press, Oxford, 1989).

[51] R. Penrose, Shadows of the Mind (Oxford Univ. Press, Oxford, 1994).

[52] M. Tegmark, The Importance of Quantum Decoherence in Brain Processes. Phys. Rev. E 61, 4194-4206 (2000).

[53] S. Hagan, S. Hameroff, J.
Tuszynski, Quantum computation in brain microtubules? Decoherence and biological feasibility. Phys. Rev. E 65, 061901 (2002).

[54] J.R. Petta, A.C. Johnson, J.M. Taylor, E.A. Laird, A. Yacoby, M.D. Lukin, C.M. Marcus, M.P. Hanson, and A.C. Gossard, Coherent manipulation of coupled electron spins in semiconductor quantum dots. Science 309, 2180-2184 (2005).

[55] J.M. Taylor, J.R. Petta, A.C. Johnson, A. Yacoby, C.M. Marcus, M.D. Lukin, Relaxation, dephasing, and quantum control of electron spins in double quantum dots. Phys. Rev. B 76, 035315 (2007).

[56] F. Shafee, Stochastic dynamics of networks with quasiclassical excitations. Stochastics and Dynamics 7, 403-416 (2007).

[57] F. Shafee, Neural networks with quantum gated nodes. Engineering Applications of Artificial Intelligence 20, 429-437 (2007).

[58] F. Shafee, Information in entangled dynamic quantum networks. Microelectronics Journal 37, 1321-1324 (2006).

[59] F. Shafee, Neural networks with finite width action potentials. Preprint arxiv.org: cond-mat/0111151 (2001).

[60] J.J. Sakurai, Modern Quantum Mechanics (Addison-Wesley, MA, USA, 1994) p. 316.

[61] S.L. Rauch, M.R. Milad, S.P. Orr, B.T. Quinn, B. Fischl, R. Pitman, Orbitofrontal thickness, retention of fear extinction, and extraversion. Neuroreport 16, 1909-12 (2005).

[62] M.V. Altaisky, Preprint arxiv.org: quant-ph/0107012 (2001).

[63] D.A. Lidar, I.L. Chuang and K.B. Whaley, Decoherence-free subspaces for quantum computation. Phys. Rev. Lett. 81, 2594-2597 (1998).

[64] A.P. Kirilyuk, Dynamically Multivalued Self-Organisation and Probabilistic Structure Formation. Solid State Phenomena 97-98, 21-26 (2004).

[65] B. Ricks and D. Ventura, in Advances in Neural Information Processing Systems 16: Neural Information Processing Systems, NIPS 2003, ed. by Sebastian Thrun, Lawrence K. Saul and Bernhard Scholkopf (MIT Press, Cambridge, MA, USA, 2004).

[66] K.W.
Cheng, Breaking RSA Code on the Quantum Computer, Thesis, Kaohsiung University, Taiwan (2002).

[67] A. Fiorentino, Confronto fra reti neurali classiche e quantistiche, Thesis, Universita degli Studi di Milano (2002).

[68] J. Faber and G.A. Giraldi, Quantum models for artificial neural networks, Technical report of LNCC, National Laboratory for Scientific Computing, Brazil (2002).

[69] P.W. Shor, Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer. SIAM J. Comp. 26, 1484-1509 (1997).

[70] M. Meister, M.J. Berry, The neural code of the retina. Neuron 22, 435-50 (1999).

[71] M. Mahowald and C. Mead, The silicon retina. Scientific American 264, 76 (1991).

[72] K. Boahen, A retinomorphic vision system. IEEE Micro 16, 30 (1996).

[73] R.F. Lyon and C.A. Mead, The cochlea. Analog VLSI and Neural Systems (Addison Wesley Publishing Co., Reading, MA, 1989) p. 279.

[74] R. Sarpeshkar, R.F. Lyon, and C.A. Mead, An analog VLSI cochlea with new transconductance amplifiers and nonlinear gain control. Proceedings of the 1996 IEEE International Symposium on Circuits and Systems, Atlanta, GA 3, 292 (1996).

[75] S. DeWeerth, L. Nielsen, C. Mead, and K. Astrom, A simple neuron servo. IEEE Trans. Neural Networks 2, 248 (1991).

[76] T. Horiuchi, T. Morris, C. Koch, and S. DeWeerth, Analog VLSI circuits for attention-based visual tracking. Advances in Neural Information Processing Systems 9 (MIT Press, 1997) p. 706.

[77] F. De Martini, V. Buzek, F. Sciarrino and C. Sias, Experimental realization of the quantum universal NOT gate. Nature 419, 815 (2002).

[78] A. Ekert, P. Hayden and H. Inamori, Basic concepts in quantum computation. quant-ph/0011013 (2000).

[79] J. Preskill, Quantum Computation. Lecture Notes for Physics 219 (California Institute of Technology, Pasadena, CA, 1998).

[80] C. Altman, J. Pykacz, R.R. Zapatrin, Superpositional Quantum Network Topologies. International Journal of Theoretical Physics 43, 2029-2040 (2004).

[81] A.S.
Davydov, Solitons in Molecular Systems (Kluwer, Dordrecht, 1991).

[82] A. Xie, L. van der Meer, W. Hoff and R.H. Austin, Long-Lived Amide I Vibrational Modes in Myoglobin. Phys. Rev. Lett. 84, 5435-5438 (2000).

[83] W. Fann, L. Rothberg, S. Benson, J. Madey, S. Etemad and R.H. Austin, Dynamical Test of Davydov-Type Solitons in Acetanilide Using a Picosecond Free-Electron Laser. Phys. Rev. Lett. 64, 607-610 (1990).

[84] F. Shafee, Quantum Images and the Measurement Process. Elec. J. Theor. Phys. 4:14, 121-128 (2007).

[85] R. Omnes, A model of quantum reduction with decoherence. Phys. Rev. D 71, 065011 (2005).

[86] W.H. Zurek, Decoherence, einselection, and the quantum origins of the classical. Rev. Mod. Phys. 75, 715 (2003).

[87] G. Sewell, On the Mathematical Structure of Quantum Measurement Theory. Rep. Math. Phys. 56, 271 (2005).

[88] R. Douglas, Rules of thumb for neuronal circuits in the neocortex. Notes for the Neuromorphic aVLSI Workshop, Telluride, CO (1994).

[89] E. Guigon and Y. Burnod, Short-term memory, in The Handbook of Brain Theory and Neural Networks, M.A. Arbib ed. (MIT Press, Cambridge, MA, 1995) p. 867.

[90] C. Diorio, P. Hasler, B.A. Minch, and C. Mead, A complementary pair of four-terminal silicon synapses. Analog Integrated Circuits and Signal Processing 13, 153 (1997).

[91] C. Diorio, P. Hasler, B.A. Minch, and C. Mead, A single-transistor silicon synapse. IEEE Trans. Electron Devices 43, 1972 (1996).

[92] L.D. Iasemidis and J.C. Sackellares, Chaos Theory and Epilepsy. The Neuroscientist 2, 118 (1996).

[93] S.G. Schirmer, H. Rabitz et al., Quantum control using sequences of simple control pulses. quant-ph/0105155 (2001).

[94] A. Barenco et al., Phys. Rev. A 52, 3457 (1995).

[95] L.K. Grover, A fast quantum-mechanical algorithm for database search. Proc. 28th Annual ACM Symposium on the Theory of Computing (STOC'96) (ACM, Philadelphia, 1996) p. 212.

[96] D. Deutsch, Proc. Royal Soc. Lond. A 400, 97 (1985).

[97] A.C.
Doherty et al., Distinguishing entangled and separable states. quant-ph/0112007 (2001).

[98] F. Shafee, Aspects of quantum pattern recognition, in Pattern Recognition Theory and Applications, ISBN 1-60021-717-6, ed. E.A. Zoeller (Nova Publishers, 2008).

[99] M.E. Newman, S.H. Strogatz, and D.J. Watts, Random Graphs with Arbitrary Degree Distributions and Their Applications. Phys. Rev. E 64, 17 (2001).

[100] A.-L. Barabasi and R. Albert, Emergence of scaling in random networks. Science 286, 509-512 (1999).

[101] Z. Zhou, X. Zhou et al., Conditions for nondistortion interrogation of quantum systems. Europhys. Lett. 58, 328 (2002).

[102] J. Monod, J. Wyman, and J.P. Changeux, J. Mol. Biol. 12, 88-118 (1965).

[103] H. Araki and E.H. Lieb, Entropy inequalities. Comm. Math. Phys. 18, 160-170 (1970).

[104] D.R. Chialvo and P. Bak, Learning from mistakes. Neuroscience 90, 1137-1148 (1990).

[105] P.T. Landsberg, Entropies galore! Brazilian Journal of Physics 29, 46-49 (1999).

[106] P. Grigolini, C. Tsallis and B.J. West, Classical and Quantum Complexity and Non-extensive Thermodynamics. Chaos, Solitons and Fractals 13, 367-370 (2001).

[107] A.R. Plastino, A. Plastino and C. Tsallis, The classical N-body problem within a generalized statistical mechanics. J. Phys. A 27, 5707-5757 (1994).

[108] B.M. Boghosian, Phys. Rev. E 53, 4754 (1995).

[109] C. Anteneodo and C. Tsallis, Two-dimensional turbulence in pure-electron plasma: a nonextensive thermostatistical description. J. Mol. Liq. 71, 255-267 (1997).

[110] V.H. Hamity and D.E. Barraco, Phys. Rev. Lett. 76, 4664 (1996).

[111] C. Beck, Nonextensive statistical mechanics and particle spectra. hep-ph/0004225 (2000).

[112] R.M. Corless, G.H. Gonnet, D.E.G. Hare, D.J. Jeffrey and D.E. Knuth, On the Lambert W function. Adv. Comput. Math. 5, 329-359 (1996).

[113] C. Wolf, Equation of state for photons admitting Tsallis statistics. Fizika B 11, 1 (2002).

[114] E.J.W. Boers, H. Kuiper, B.L.M. Happel and I.G.
Sprinkhuizen-Kuyper, Designing modular artificial neural networks, in H.A. Wijshoff (Ed.), Proceedings of Computing Science in The Netherlands (CSN'93), pp. 87-96 (1993).

[115] C. Tsallis, A.R. Plastino, and W.-M. Zheng, Chaos, Solitons and Fractals 8, 885 (1997).

[116] M.L. Lyra and C. Tsallis, Phys. Rev. Lett. 80, 53 (1998).

[117] A.R. Plastino and A. Plastino, Non-extensive statistical mechanics and generalized Fokker-Planck equation. Physica A 222, 347-354 (1995).

[118] C. Tsallis and D.J. Bukman, Phys. Rev. E 54, R2197 (1996).

[119] L. Borland, Phys. Rev. E 57, 6634 (1998).

[120] M. Buiatti, P. Grigolini and A. Montagnini, Phys. Rev. Lett. 82, 3383 (1998).

[121] G. Kaniadakis, Nonlinear kinetics underlying generalized statistics. Physica A 296, 405-425 (2001).

[122] G. Kaniadakis, Statistical mechanics in the context of special relativity. Phys. Rev. E 66, 056125 (2002).

[123] J. Aczel and Z. Daroczy, Charakterisierung der Entropien positiver Ordnung und der Shannonschen Entropie. Acta Math. Acad. Sci. Hungary 14, 95-121 (1963).

[124] J. Aczel and Z. Daroczy, Sur la caracterisation axiomatique des entropies d'ordre positif, y comprise l'entropie de Shannon. Comp. Rend. Acad. Sci. (Paris) 257, 1581-1584 (1963).

[125] M.D. Esteban and D. Morales, A summary on entropy statistics. Kybernetika 31, 337-346 (1995).

[126] Q.A. Wang, Entropy 5, 3 (2003).

[127] A.I. Khinchin, Mathematical Foundations of Information Theory (Dover Publications, New York, 1957).

[128] F. Shafee, Oligo-parametric Hierarchical Structure of Complex Systems. NeuroQuantology Journal 5, 85-99 (2007).

[129] S.R. Valluri, R.M. Corless and D.J. Jeffrey, Some applications of the Lambert W function to physics. Can. J. Physics 78, 823-831 (2000).

[130] R.K. Pathria, Statistical Mechanics (Butterworth-Heinemann, Oxford, UK, 1996) p. 77.

[131] A. Vidiella-Barranco, Entanglement and nonextensive statistics. Phys. Lett. A 260, 335-339 (1999).

[132] S. Abe and A.K.
Rajagopal, Quantum entanglement inferred by the principle of maximum Tsallis entropy. Phys. Rev. A 60, 3461-3466 (1999).

[133] S. Abe, Nonadditive entropies and quantum entanglement. Physica A 306, 316 (2002).

[134] A.R. Calderbank and P.W. Shor, Good quantum error-correcting codes exist. Phys. Rev. A 54, 1098-1105 (1996).

[135] F. Verstraete and M.M. Wolf, Entanglement versus Bell violations and their behaviour under local filtering operations. Phys. Rev. Lett. 89, 170401 (2002).

[136] E. Merzbacher, Quantum Mechanics, 3rd ed. (John Wiley, NY, 1998) p. 368.

[137] F. Shafee, Lambert function and a new non-extensive form of entropy. IMA Journal of Applied Mathematics 72, 785-800 (2007).

[138] B.-Q. Jin and V.E. Korepin, Quantum spin chains, Toeplitz determinants and the Fisher-Hartwig conjecture. J. Stat. Phys. 116, 79-95 (2004).

[139] F. Shafee, Generalized Entropy with Clustering and Quantum Entangled States. cond-mat/0410554 (accepted by Chaos, Solitons and Fractals) (2004).

[140] W. Bialek, C.G. Callan and S.P. Strong, Field theories for learning probability distributions. Phys. Rev. Lett. 77, 4693-4697 (1996).

[141] D. Wolpert and D. Wolf, Estimating functions of probability distributions from a finite set of samples. Phys. Rev. E 52, 6841-6854 (1995).

[142] E.T. Jaynes, Monkeys, Kangaroos, and N, University of Cambridge Physics Dept. Report 1189 (1984).