The Measurement of Free Energy by Monte-Carlo Computer Simulation

Graham R. Smith

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy to the University of Edinburgh, 1996.

Abstract

One of the most important problems in statistical mechanics is the measurement of free energies, these being the quantities that determine the direction of chemical reactions and, the concern of this thesis, the location of phase transitions. While Monte Carlo (MC) computer simulation is a well-established and invaluable aid in statistical mechanical calculations, it is well known that, in its most commonly-practised form (where samples are generated from the Boltzmann distribution), it fails if applied directly to the free energy problem. This failure occurs because the measurement of free energies requires a much more extensive exploration of the system's configuration space than do most statistical mechanical calculations: configurations which have a very low Boltzmann probability make a substantial contribution to the free energy, and the important regions of configuration space may be separated by potential barriers.

We begin the thesis with an introduction, and then give a review of the very substantial literature that the problem of the MC measurement of free energy has produced, explaining and classifying the various different approaches that have been adopted. We then proceed to present the results of our own investigations.

First, we investigate methods in which the configurations of the system are sampled from a distribution other than the Boltzmann distribution, concentrating in particular on a recently-developed technique known as the multicanonical ensemble. The principal difficulty in using the multicanonical ensemble is the difficulty of constructing it: implicit in it is at least partial knowledge of the very free energy that we are trying to measure, and so to produce it requires an iterative process. Therefore we study this iterative process, using Bayesian inference to extend the usual method of MC data analysis, and introducing a new MC method in which inferences are made based not on the macrostates visited by the simulation but on the transitions made between them. We present a detailed comparison between the multicanonical ensemble and the traditional method of free energy measurement, thermodynamic integration, and use the former to make a high-accuracy investigation of the critical magnetisation distribution of the 2d Ising model from the scaling region all the way to saturation. We also make some comments on the possibility of going beyond the multicanonical ensemble to `optimal' MC sampling.

Second, we investigate an isostructural solid-solid phase transition in a system consisting of hard spheres with a square-well attractive potential. Recent work, which we have confirmed, suggests that this transition exists when the range of the attraction is very small (width of attractive potential / hard core diameter of order 0.01). First we study this system using a method of free energy measurement in which the square-well potential is smoothly transformed into that of the Einstein solid. This enables a direct comparison of a multicanonical-like method with thermodynamic integration. Then we perform extensive simulations using a different, purely multicanonical approach, which enables the direct connection of the two coexisting phases.
It is found that the measurement of transition probabilities is again advantageous for the generation of the multicanonical ensemble, and can even be used to produce the final estimators.

Some of the work presented in this thesis has been published or accepted for publication. The references are:

G. R. Smith & A. D. Bruce, A Study of the Multicanonical Monte Carlo Method, J. Phys. A 28, 6623 (1995).
G. R. Smith & A. D. Bruce, Multicanonical Monte Carlo Study of a Structural Phase Transition, to be published in Europhys. Lett.
G. R. Smith & A. D. Bruce, Multicanonical Monte Carlo Study of Solid-Solid Phase Coexistence in a Model Colloid, to be published in Phys. Rev. E.

Declaration

This thesis has been composed by myself and it has not been submitted in any previous application for a degree. The work reported within was executed by me, unless otherwise stated.

March 1996

for Christina and Ken

Acknowledgements

I would like to thank the following people: Alastair Bruce for all his guidance, help and encouragement, and for never shouting at me, even when I richly deserved it; Stuart Pawley and Nigel Wilding for many useful and pleasant discussions; David Greig, Stuart Johnson, Stephen Bond and Stephen Ilett for carefully reading and commenting on the final draft of this thesis; Peter Bolhuis for making available the results of [171]; my flatmates; and all my other friends in Edinburgh and elsewhere. I also gratefully acknowledge the support of a SERC/EPSRC research studentship.

Contents

1 Introduction
  1.1 Thermodynamics, Statistical Mechanics, Free Energy and Phase Transitions
    1.1.1 Phase Transitions
    1.1.2 The Ising Model
    1.1.3 Statistical Mechanics
    1.1.4 Off-Lattice Systems
  1.2 Calculation in Statistical Mechanical Problems
    1.2.1 Analytic Methods
    1.2.2 Monte-Carlo Simulation
    1.2.3 Monte-Carlo Simulation at Phase Transitions
    1.2.4 Discussion

2 Review
  2.1 Integration-Perturbation Methods
    2.1.1 Thermodynamic Integration
    2.1.2 Multistage Sampling
    2.1.3 The Acceptance Ratio Method
    2.1.4 Mon's Finite-Size Method
    2.1.5 Widom's Particle-Insertion Method
    2.1.6 Histogram Methods
  2.2 Non-Canonical Methods
    2.2.1 Umbrella Sampling
    2.2.2 Multicanonical Ensemble
    2.2.3 The Expanded Ensemble
    2.2.4 Valleau's Density-Scaling Monte Carlo
    2.2.5 The Dynamical Ensemble
    2.2.6 Grand Canonical Monte-Carlo
    2.2.7 The Gibbs Ensemble
  2.3 Other Methods
    2.3.1 Coincidence Counting
    2.3.2 Local States Methods
    2.3.3 Rickman and Philpot's Methods
    2.3.4 The Partitioning Method of Bhanot et al.
  2.4 Discussion

3 Multicanonical and Related Methods
  3.1 Introduction
    3.1.1 The Multicanonical Distribution over Energy Macrostates
    3.1.2 An Alternative: The Ground State Method
    3.1.3 The Multicanonical Distribution over Magnetisation Macrostates
  3.2 Techniques for Obtaining and Using the Multicanonical Ensemble
    3.2.1 Methods Using Visited States
    3.2.2 Incorporating Prior Information
    3.2.3 Methods Using Transitions
    3.2.4 Finite-Size Scaling
    3.2.5 Using Transitions for Final Estimators: Parallelism and Equilibration
  3.3 Results
    3.3.1 Free Energy and Canonical Averages of the 2d Ising Model
    3.3.2 A Comparison Between the Multicanonical Ensemble and Thermodynamic Integration
    3.3.3 P(M) at β = β_c
  3.4 Beyond Multicanonical Sampling
    3.4.1 The Multicanonical and Expanded Ensembles
    3.4.2 The Random Walk Problem
    3.4.3 `Optimal' Sampling
    3.4.4 Use of the Transition Matrix: Prediction of the `Optimal' Distribution
  3.5 Discussion

4 A Study of an Isostructural Phase Transition
  4.1 Introduction
  4.2 Comparison of Thermodynamic Integration and the Expanded Ensemble: Use of an Einstein Solid Reference System
    4.2.1 Thermodynamic Integration
    4.2.2 Expanded Ensemble with Einstein Solid Reference System
    4.2.3 Other Issues
  4.3 Direct Method: Multicanonical Ensemble with Variable V
    4.3.1 The Multicanonical NpT-Ensemble and its Implementation
    4.3.2 The Pathological Nature of the Square-Well System
    4.3.3 Finding the Preweighting Function
    4.3.4 The Production Stage
    4.3.5 Canonical Averages
    4.3.6 Finite-Size Scaling and the Interfacial Region
    4.3.7 Mapping the Coexistence Curve
    4.3.8 The Physical Basis of the Phase Transition
  4.4 Discussion

5 Conclusion

A Exact Finite-Size Scaling Results for the Ising Model
B The Double-Tangent Construction
C Statistical Errors and Correlation Times
D Jackknife Estimators
E Details of the Square-Well Solid Simulation

Chapter 1

Introduction

We begin by giving necessary background to the work carried out in this thesis. We shall deal with the thermodynamical and statistical mechanical notions that underpin our understanding of phase transitions, in particular the ideas of entropy and free energy. We shall describe the role of computer simulation, especially Monte-Carlo simulation, and explain why the measurement of free energy presents particular challenges.

1.1 Thermodynamics, Statistical Mechanics, Free Energy and Phase Transitions

  Who could ever calculate the path of a molecule? How do we know that the creation of worlds is not determined by falling grains of sand?
  FROM Les Miserables, VICTOR HUGO

Thermodynamics and statistical mechanics are theories which describe the behaviour of a bulk material comprising large numbers of interacting particles distributed in space, for example the molecules of a solid, liquid or gas. Thermodynamics, as its name implies, is concerned with the bulk energy of the material, the work done on it and the heat flows in and out of it. It does not acknowledge explicitly the microscopic interactions of the particles that compose the bulk; indeed, the theory evolved from empirical observations and experiment at a time when the microscopic nature of materials was not understood at all. For this reason the development of the theory was itself a painful process [1], and it was not connected with the physical principles of mechanics until the work of Boltzmann and Gibbs at the end of the last century, and of Einstein at the start of this, led to the development of statistical mechanics. For a traditional exposition of thermodynamics see [2] or the first chapters of [3].

How might we attempt to use a model of the microscopic structure of a macroscopic sample of a material to calculate its properties? In a classical mechanical framework it is clear that, given the initial positions and velocities of all the particles that compose the system (i.e. the initial microstate of the system), and knowledge of the interactions, it is possible in principle to calculate the state at any later time from the known laws of dynamics; but it is equally clear that such a calculation will never be possible in practice, for two reasons: firstly, because the number of particles in a macroscopic system is O(10^23), which is simply too large for any existing or foreseeable computer, and secondly, because it is likely that the dynamics are in detail chaotic, that is to say, the later evolution depends with enormous sensitivity on the initial state. Nevertheless, it is apparent from our observations of real systems that this microscopic complexity does not result in macroscopic complexity.
Materials have well-defined bulk properties that depend only on the values of a few thermodynamic control parameters, such as pressure, temperature, magnetic field, etc., not on the continually varying positions and velocities of all their innumerable constituent particles. Moreover, these bulk properties seem to all but the most precise measurements not to vary with time (provided the control parameters remain fixed), even though there is continuous microscopic activity, and when suitably normalised they are the same for all macroscopic samples, large or small¹. This gives a hint that it might be possible to construct a theory which predicts the bulk properties without any details of the kinetics and which depends in only a simple way on the number of particles: an enormous simplification.

¹ As long as they are large enough that the number of particles near the boundary of the sample is small compared with the number in the bulk. The bulk properties of a material are normally proportional to the quantity of material present: they are extensive. When we refer to a quantity as a density, it means that we have divided an extensive quantity by the number of particles. The result is intensive, or effectively independent of the quantity of material. The control parameters are naturally intensive.

This is indeed what statistical mechanics achieves: on the basis of a knowledge only of the interactions between the constituent particles, it provides expressions for the bulk properties. We shall not give a detailed derivation of the formalism of statistical mechanics here, or connect it fully with thermodynamics, and results will often be stated without proof. A full exposition can be found in [4], and other useful references are [3, 5, 6, 7, 8].

1.1.1 Phase Transitions

We have just noted that the bulk properties of a material, such as the internal energy density or the magnetisation density (if the material is magnetic), depend on a few control parameters, like the temperature. Normally they vary smoothly with the control parameters, but there exist a few points where they may `jump' suddenly from one value to another, i.e. points where the material changes its properties dramatically. Where this happens we say that the material undergoes a phase transition. The property that `jumps' can be used as a so-called order parameter of the transition: by a suitable choice of origin we can make it zero in one phase and finite in the other, or of the same magnitude and opposite sign in the two phases (which is how it is naturally defined for the model we shall introduce in section 1.1.2).

Examples of phase transitions that are found in almost all simple atomic or molecular materials are the melting and boiling transitions. Here there are two obvious order parameters: one is the internal energy density, which changes by L_H - pΔV (where L_H is the latent heat of the transition, p the pressure and ΔV the change in volume), and the other is the specific volume v = V/N (where N is the number of particles in the system), which changes more than a thousandfold at the liquid-vapour transition of water at atmospheric pressure. Transitions of this kind, where the order parameter is discontinuous, are called first order phase transitions. Exactly at the transition values of the control parameters, volumes of the material in states characteristic of the two `sides' of the phase transition can exist in contact with one another: coexisting phases.
There is another class of phase transitions, known as second order or continuous transitions, in which there is no latent heat and the order parameter itself does not change discontinuously, but instead experiences large fluctuations, and its derivative with respect to one of the control parameters diverges. An example of this is the ferromagnetic transition of iron, in which the spontaneous magnetisation (which is the order parameter for the transition) disappears continuously at a temperature of 1043 K, while the susceptibility diverges. The point in the space of control parameters where a continuous phase transition occurs is called a critical point. Critical points occur as the limit of a series of first order transitions in which the change in the order parameter has been getting progressively smaller (though not all lines of first order phase transitions terminate in critical points; solid-liquid melting curves do not appear to do so). The unusual phenomena associated with critical points have been the subject of a huge amount of study in the past thirty years or so [9].

Clearly we would like to know which phase will be found at particular values of the control parameters, and, particularly, at what values of these control parameters phase transitions will occur. We would also like to be able to calculate the bulk mechanical and thermodynamic properties of the material in its various phases. Statistical mechanics provides answers to these questions in principle; however, we shall later see that the expressions that it enables us to write down, particularly those relating to the location of phase transitions, are difficult to evaluate accurately either analytically or computationally. The main concern of this thesis has been the investigation of various computational techniques designed to overcome these difficulties. We shall explain our intentions in more detail when we have provided more background to bring the particular problems associated with phase transitions more clearly into focus. Computer simulations are introduced in section 1.2 and reviewed in more detail in chapter 2. In the remainder of section 1.1 we shall introduce relevant aspects of statistical mechanics itself. We shall illustrate our explanation of statistical mechanics using one of the models that will be widely used in the investigations of this thesis, the Ising model.

1.1.2 The Ising Model

  First [the gypsies] brought the magnet. A corpulent gypsy... who introduced himself as Melquiades, made a showy public demonstration of what he himself described as the eighth marvel of the wise alchemists of Macedonia.
  FROM One Hundred Years of Solitude, GABRIEL GARCIA MARQUEZ

The Ising model is a simple model of a magnetic system, where the `particles' are classical `spins' on a lattice. It is defined by the characteristic that each spin s may exist in one of two states, called `up' and `down' or `+1' and `-1'. There is a coupling J which defines an interaction energy between the spins at sites i and j:

  E_ij = -J_ij s_i s_j

(The choice of sign is a convention.) Aside from this, the system may have any dimensionality, any type of lattice, and there may also be an external magnetic field. However, we shall be particularly concerned with the Ising model on an L × L 2-dimensional square lattice with the choice
  J_ij = 1 if i and j are nearest neighbours, and J_ij = 0 otherwise.

There is no loss of generality in choosing J = 1 since, as will become apparent in section 1.1.3, any other choice just results in a scaling of temperature. We shall impose periodic boundary conditions (P.B.C.), meaning that spins (i, L) and (i, 1) are neighbours for all i, as are spins (L, j) and (1, j) for all j. Thus all spins are equivalent and each interacts with its four nearest neighbours only. The coupling is positive, so that the spin-spin energy is lower when spins are aligned parallel and higher when they are antiparallel. The ground state is therefore a state where all spins are parallel, for which reason the J_ij > 0 model is known as the ferromagnetic Ising model. The total energy due to spin-spin interactions, which we shall call configurational energy and represent² by E, is therefore given by

  E(σ) = -Σ_{⟨ij⟩} s_i s_j

where Σ_{⟨ij⟩} denotes a sum over nearest-neighbour pairs and σ represents a particular arrangement of all the spins (also called a configuration or microstate). The number of spins in the system is N = L². The magnetisation M is simply

  M(σ) = Σ_i s_i

which interacts with an external magnetic field H to give an extra term -HM(σ) in the energy³. The total energy (often called the Hamiltonian) is therefore

  E_TOT(σ) = E(σ) - HM(σ)    (1.1)

² In the literature the symbols H, U and V are also frequently used for configurational energy.
³ The choice of sign is, once again, conventional: one may characterise the energy of a magnetic material either in terms of the energy stored in the field or the work done on the solid, and both conventions are in use. We have chosen to consider the second case.

Despite the simplicity of the Ising model, it exhibits a phase transition driven by the external field H: at temperatures below the critical temperature T_c this is a first-order transition, at T_c it becomes continuous, and for T > T_c it disappears. The order parameter is the magnetisation M or the magnetisation density m = M/N. This phase behaviour is qualitatively (and in some respects quantitatively) very similar to that of a real ferromagnet like iron. We have therefore to look at the phase behaviour as a function of the two variables H and T; however, it makes for slightly tidier notation if instead of T we use the inverse temperature β = 1/k_B T, so we shall do this from the start.

First consider the dependence of m on H for three different values of β (see figure 1.1).

[Figure 1.1. Schematic diagrams of magnetisation density vs. external magnetic field for the Ising model for three different values of inverse temperature: (1) β > β_c, (2) β = β_c, (3) β < β_c.]

The qualitative behaviour at high H is the same in all cases: m → sign(H). However, the behaviour as H → 0 depends on β. At high temperatures (or low β), as in graph 3 of figure 1.1, m → 0 as H → 0, while at high β (graph 1) there is a residual magnetisation left after the field has gone to zero: m → m_0(β) as H → 0⁺ and m → -m_0(β) as H → 0⁻. Therefore, if, at inverse temperature β > β_c, the field is taken from just below zero to just above it, the magnetisation `jumps' discontinuously by 2m_0(β). We have, therefore, a first order phase transition. The point of crossover between these two regimes occurs at β_c (graph 2). Here there is no residual magnetisation at H = 0 but ∂m/∂H|_{H=0} diverges, so there is a continuous phase transition and β_c is the inverse critical temperature.
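The definitions above translate directly into a few lines of code. The following minimal Python sketch (an illustration added here, not code from the thesis; the helper names are arbitrary) evaluates E(σ), M(σ) and the total energy of equation 1.1 for a 2d configuration with periodic boundary conditions:

    import numpy as np

    def ising_energy(spins):
        """Configurational energy E(sigma) = -sum over nearest-neighbour
        pairs of s_i*s_j on a square lattice with periodic boundaries.
        Each bond is counted once (right and down neighbours only)."""
        return (-np.sum(spins * np.roll(spins, -1, axis=0))
                - np.sum(spins * np.roll(spins, -1, axis=1)))

    def magnetisation(spins):
        """M(sigma) = sum_i s_i."""
        return np.sum(spins)

    def total_energy(spins, H=0.0):
        """E_TOT(sigma) = E(sigma) - H*M(sigma), equation (1.1)."""
        return ising_energy(spins) - H * magnetisation(spins)

    # A random 4 x 4 configuration as a quick demonstration.
    rng = np.random.default_rng(0)
    spins = rng.choice([-1, 1], size=(4, 4))
    print(ising_energy(spins), magnetisation(spins), total_energy(spins, 0.1))

Rolling the array by one site in each direction visits every nearest-neighbour bond exactly once, so an all-parallel ground state of an L × L lattice gives E = -2N, as expected.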
If the system is cooled from β < β_c to β > β_c at H = 0 then the system `chooses' either the +m_0 or the -m_0 state with equal probability, a phenomenon known as spontaneous symmetry breaking.

The usual way of summarising phase behaviour of this kind is by way of phase diagrams. The two most usual kinds are shown in figure 1.2.

[Figure 1.2. Schematic phase diagrams for the Ising model: (1) in the (m, β) plane, (2) in the (H, β) plane.]

We may plot m_0, the order parameter for the transition, against temperature or inverse temperature; this is done in the diagram on the left. Note that if m is free to vary then those points lying between +m_0 and -m_0 for β > β_c do not represent equilibrium points of the system; however, if we constrain the total magnetisation to be constant, then this region of the phase diagram contains states (which we call mixed states) which describe two-phase coexistence; the system under these conditions separates into large `domains', each with a magnetisation equal to +m_0 or -m_0, the domains having that volume which keeps the total magnetisation at its constrained value. The other kind of phase diagram is shown in the diagram on the right of figure 1.2. Here the axes are the two fields, H and β; the phase transitions then appear as a single line which in the Ising case lies along the β axis (H = 0) and ends at the critical point. This line is called the coexistence curve. All the mixed states also lie on this line, because they differ only in their value of the total magnetisation. In mathematical terms, the coexistence curve is the line on which m is not a single-valued function of H and β. In fact, the two phase diagrams are really projections of a surface in (m, H, β) space, on which each point represents the value of m produced by the fields H and β. This surface is drawn in figure 1.3. The (m, H) graphs in figure 1.1 are constant-β sections through it.

[Figure 1.3. The surface of state points in (m, H, β) space for the Ising model.]

We should remark here that both a computer-simulated Ising model and a real ferromagnet, particularly one made of a pure metal, are liable to show the phenomenon of hysteresis instead of the sudden first order phase transition at H = 0. If the system has a positive m and H is decreased and then made negative, m remains positive at first, and requires the applied field to be appreciably negative to drive it to its (negative) equilibrium value. The same thing happens if H is made positive and m is initially negative. This phenomenon, generally called metastability, occurs widely in nature. Metastability interferes with the measurement of the true location of phase transitions, both in experiments and in computer simulation; it will be a frequent concern of ours to try to develop simulation techniques that are as little affected by it as possible.

The two-dimensional Ising model with H = 0 is particularly interesting because many of its properties, including the phase behaviour, can be exactly calculated analytically, so it is an ideal testbed for computer simulation methods, which can then be applied to other systems. There exist exact solutions for the internal energy, the heat capacity and the free energy (which we define in the next section), both for finite systems [10] and in the N → ∞ limit [11]. In the latter case m_0 can be evaluated too. A detailed exposition of many of the exactly-known properties of the 2d Ising model can be found in [12].
A typical configuration (typical in the sense that its magnetisation lies near the peak of the probability density function (p.d.f.) of M) of the 64 × 64 2d Ising model at the critical point (H = 0, β_c = (1/2) ln(√2 + 1) = 0.440686...) is shown in figure 1.4. This system is just large enough for the self-similar structure of the critical clusters of different magnetisations to be becoming apparent; this behaviour is one of the most interesting features of the critical point, and is central to the renormalisation group theory of continuous phase transitions. Further detail and references can be found in section 1.2.1.

[Figure 1.4. A typical configuration of the critical 64² 2d Ising model. White corresponds to s_i = +1, black to s_i = -1.]

Now, in order to explain this phase behaviour of the Ising model we must introduce some statistical mechanics. The initial discussion is general, but we shall return to the Ising model as a specific example to clarify the discussion of phase transitions.

1.1.3 Statistical Mechanics

We shall begin by stating the equation which is the basis of statistical mechanics:

  P^can(σ) = exp(-β E_TOT(σ)) / Z    (1.2)

The probability distribution given by this equation is called the Boltzmann distribution. For a standard derivation of it, see [4] or [5]; for an alternative (and simpler) derivation using information theory, see [13]. We shall motivate it here by showing that it is consistent with, and illuminates connections between, the simple physical properties of bulk material that we described at the start of this chapter. We shall start by considering materials away from phase transitions, moving on to phase transitions later.

Equation 1.2 relates the total energy E_TOT(σ) to P^can(σ), the probability that the system in equilibrium will be observed in configuration σ. Note that only the energy of the configuration appears in this expression; there are no momentum terms and no dynamics. In the sense that a probability can be interpreted as a time-average, the dynamics can be regarded as having been averaged out, though this interpretation is not necessary to the truth of equation 1.2, as is discussed in [14]. The normalising factor Z, the partition function, is given by

  Z = Σ_{σ} exp(-β E_TOT(σ))    (1.3)

where Σ_{σ} denotes a sum over all the microstates of the system, which, for the 2d Ising model, means over all configurations of the spins. The logarithm of the partition function defines the Gibbs free energy:

  G(β, H) = -β⁻¹ ln Z    (1.4)

It is also a very important quantity; we shall find (section 1.1.3) that it is g = G/N that determines the location of phase transitions. The set of microstates available to the system, over which the sum runs, is called the ensemble; here we are considering an ensemble where the system's volume, temperature and number of particles are constant. The choice of ensemble will, in general, affect how many terms there are in E_TOT; in the Ising model canonical ensemble it is in general necessary to include the -HM term as well as the configurational energy E (though in most of the cases we shall consider, H will be zero). However, in an ensemble where M was constant, i.e. where we considered only those configurations with a particular magnetisation, the -HM term would be the same for all configurations and so irrelevant (cancelling out above and below in equation 1.2).

Now let us consider what predictions the theory makes for the values of the bulk properties (or observables) of the system.
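For a very small lattice the sums in equations 1.3 and 1.4 can be evaluated exactly by enumerating all 2^N configurations, which makes a useful check on any sampling method. A minimal sketch (illustrative only; brute-force enumeration is feasible only for L of about 4 or less):

    import itertools
    import numpy as np

    def total_energy(spins, H=0.0):
        """E_TOT = -sum_<ij> s_i s_j - H*M, as in equation (1.1)."""
        E = (-np.sum(spins * np.roll(spins, -1, axis=0))
             - np.sum(spins * np.roll(spins, -1, axis=1)))
        return E - H * np.sum(spins)

    def partition_function(L, beta, H=0.0):
        """Z of equation (1.3), by brute-force enumeration of all
        2**(L*L) spin configurations; feasible only for tiny L."""
        Z = 0.0
        for bits in itertools.product([-1, 1], repeat=L * L):
            spins = np.array(bits).reshape(L, L)
            Z += np.exp(-beta * total_energy(spins, H))
        return Z

    beta = 0.4
    Z = partition_function(3, beta)
    G = -np.log(Z) / beta   # Gibbs free energy, equation (1.4)
    print(Z, G)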
These observables, such as internal energy, are `canonical averages', i.e. averages over the Boltzmann distribution: we define an operator which acts on a configuration to give a number which is the value of the property for that configuration. Then the observed bulk property will be the average of the operator over the Boltzmann distribution of the configurations of the system. For a general operator O we have

  ⟨O⟩ = Σ_{σ} O(σ) P^can(σ) = (1/Z) Σ_{σ} O(σ) exp(-β E(σ))    (1.5)

so, for configurational energy (internal energy), we have

  ⟨E⟩ = (1/Z) Σ_{σ} E(σ) exp(-β E(σ))    (1.6)

If the interactions are fairly short-ranged, as for most molecular materials (and the Ising model), then we expect ⟨E⟩ ∼ N; this is in fact a general property of what are called `normal systems' in statistical mechanics [15], [16, chapter 2]. Important systems that are not normal include electrically charged and gravitationally bound ones.

Now let us consider the heat capacity C_H = ∂⟨E⟩/∂T. This is also known to be extensive for short-ranged interactions; if it were not, we could violate conservation of energy by cutting up a system, heating it, reassembling it and cooling it. However, by differentiating equation 1.6 it is easy to show [6, chapter 3] that δE, the r.m.s. size of fluctuations in ⟨E⟩, is given by δE ∼ √C_H. Thus δE ∼ √N, and so the fluctuations in the internal energy density e ≡ ⟨E⟩/N die away as 1/√N. For macroscopically-sized samples of material, the fluctuations will be too small to observe and ⟨E⟩ is effectively identical to E*, the most probable value of the energy. P^can(E) is thus very sharply peaked about its maximum (or mode) at E*; in the single-phase region there is only one mode. The large-N limit is also called the thermodynamic limit, since here the predictions of statistical mechanics become fully consistent with those of thermodynamics.

This is a convenient time to introduce the notion of the density of states, Ω(O), where O is an operator on the microstates as before. We define

  Ω(O) = Σ_{σ} δ(O - O(σ))

so that it is just the number of states for which the operator has the value O. The possible values of O are called the `O-macrostates' of the system. The total number of microstates is

  Ω_TOT = Σ_{σ} 1

which may or may not be finite (though it is in the Ising case). Generally Ω_TOT increases exponentially with system size: Ω_TOT ∼ exp(cN) for some constant c. Let us first look at the particularly simple case of O = E, so that we are considering the configurational energy macrostates. The Boltzmann distribution, where P^can(σ) is a function of E(σ) only, implies that here all microstates within a macrostate are equally probable, so the probability of an E-macrostate is

  P^can(E) = (1/Z) Ω(E) exp(-β E)    (1.7)

which may also be written as

  P^can(E) = (1/Z) exp[β(TS(E) - E)], or  P^can(E) = (1/Z) exp[-β F(E)]    (1.8)

where S(E) = k_B ln[Ω(E)] is the entropy and F(E) = E - TS(E) is the Helmholtz free energy, also sometimes called the free energy functional. These quantities are also extensive, and densities f = F/N and s = S/N are defined as usual. The F and S defined here are in fact identical in the L → ∞ limit to the quantities for which we use the same symbols in thermodynamics. The identity can be shown rigorously [3, chapters 15-17].

Equation 1.8 makes it clear that the probability of a macrostate is controlled by an interplay of its energy and entropy. At low temperatures (large β) the energy term dominates and the most probable macrostates are those of low energy.
At high temperatures (small β) the energy has less effect, and those macrostates which comprise the largest number of microstates, that is, have the largest entropy, are the most probable.

It is interesting to consider what the `typical' microstates of low-temperature and high-temperature macrostates are like. At low temperature, low energies are favoured, and these are best achieved, for attractive forces, by surrounding each particle with as many neighbours as possible, or, for the Ising model, by surrounding each spin with neighbours having the same orientation. These low-energy configurations are typically highly ordered and few in number: there are only a few close-packed crystal lattices, and only two ways of arranging the Ising model's spins so that they are all parallel. Conversely, at high temperatures, all microstates are equally probable, whatever their energy, so those macrostates having large numbers of microstates are favoured. However, consideration of producing configurations by placing particles or orienting spins randomly (which would generate all microstates with equal likelihood) shows that these configurations will almost always have rather high energy. Thus the notion of entropy as a measure of order begins to emerge. It is interesting to see how such an apparently unquantifiable idea becomes associated with the multiplicity of configurations of a particular energy. This happens because the nature of the force is such that low energy demands very ordered configurations, and these are geometrically restricted to be few in number, while the overwhelming majority of the much more numerous disordered configurations have high energy.

We can now rewrite the canonical averages as averages over the macrostates; for example,

  ⟨E⟩ = (1/Z) Σ_E Ω(E) E exp(-βE) = (1/Z) Σ_E E exp[-βF(E)]

F(E) and P^can(E) are shown in figure 1.5.

[Figure 1.5. A schematic diagram of the free energy functional F(E) and the macrostate probability P^can(E) ∼ exp(-βF(E)). Note that ⟨E⟩ ∼ N while the half-width δE ∼ √N.]

We showed above that the fractional fluctuations in E about E* die away like 1/√N, so P^can(E) is very sharply peaked about its maximum at E* (the study of the scaling of probability distribution functions and the related canonical averages with system size is known as finite-size scaling theory [17]). Since P^can(E) ∼ exp[-βF(E)], the behaviour of P^can(E) leads to the thermodynamic principle that a system seeks out the value of E that minimises F(E).

In fact, the extensivity of F(E) is another confirmation of the fact that the fluctuations in e die away as 1/√N; if we write P^can(e) in terms of the bulk free energy density f(e) = lim_{L→∞} L^{-d} F_L(eL^d) (writing eL^d for E), it is found, expanding f about its minimum at e*, that

  P_L^can(e) ∼ exp(-βL^d f(e)) ≈ exp(-βL^d f''(e*)(e - e*)²/2)

Thus δe ∼ L^{-d/2}, and so δE ∼ L^{d/2}; i.e. P^can(E) is a Gaussian with half-width ∼ L^{d/2}.

We can describe the Ising model's magnetisation in a similar way to that in which we described the energy:

  ⟨M⟩ = (1/Z) Σ_{σ} M(σ) exp[-β(E(σ) - HM(σ))]

As was the case with ⟨E⟩, it can be shown that, away from the critical point, consideration of the extensivity of ⟨M⟩ and of the magnetic susceptibility χ = ∂M/∂H leads to the conclusion that the fluctuations in M grow only as √N, so that the fluctuations in m = ML^{-d} about its maximum m* disappear as L → ∞. Thus ⟨M⟩ → M*, where M* is the most probable⁴ value of M.

⁴ At the critical point, however, we have already said that χ diverges. This implies that the fluctuations in M become large, and this is indeed the case: they are large both in absolute size (δM is almost extensive) and in spatial extent, extending over the entire system. In this case ⟨M⟩ does not tend to M*. See appendix A for a further discussion.

It will be noted that there is an obvious similarity in the behaviour of the fluctuations in E and M: in each case the fluctuation is related to the response function χ_Y = ∂Y/∂y, where Y is E or M and y is the field that couples to it, either T or H. Such behaviour is in fact extremely general; it is described by the fluctuation-dissipation theorem [16, chapter 8], which relates the fluctuation δY in a quantity Y to the response function χ_Y.
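Equations 1.7 and 1.8, and the fluctuation relation just described, can be verified directly on a small lattice: we accumulate the density of states Ω(E) by enumeration, form P^can(E) ∝ Ω(E) exp(-βE), and obtain ⟨E⟩ and the heat capacity from the macrostate probabilities. An illustrative sketch (added here for concreteness, with k_B = 1):

    from collections import Counter
    import itertools
    import numpy as np

    def ising_energy(spins):
        """E(sigma) = -sum_<ij> s_i s_j with periodic boundaries."""
        return (-np.sum(spins * np.roll(spins, -1, axis=0))
                - np.sum(spins * np.roll(spins, -1, axis=1)))

    L, beta = 3, 0.4

    # Omega(E): the number of microstates in each energy macrostate.
    omega = Counter()
    for bits in itertools.product([-1, 1], repeat=L * L):
        omega[int(ising_energy(np.array(bits).reshape(L, L)))] += 1

    E = np.array(sorted(omega))
    w = np.array([omega[int(e)] for e in E]) * np.exp(-beta * E)
    P = w / w.sum()                        # P_can(E), equation (1.7)
    E_mean = (P * E).sum()                 # <E>
    var_E = (P * E**2).sum() - E_mean**2   # (delta E)^2
    C = beta**2 * var_E                    # heat capacity from fluctuations
    print(E_mean, var_E, C)

Working with the handful of E-macrostates rather than the 2^N microstates is exactly the simplification that the density of states buys.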
It will be noted that there is an obvious similarity in the behaviour of the uctuations in E and M : in each case the uctuation is related to the response function Y = @Y=@y where Y is E or M and y is the eld that couples to it, either T or H . Such behaviour is in fact extremely general; it is described by the uctuation-dissipation theorem [16, chapter 8] which relates the uctuation Y in a quantity Y to the response function Y . The density of magnetisation states is now (M ) = X f g (M M ()) This time all microstates within an M -macrostate are not equally probable, because they may have dierent congurational energies. Therefore the appropriate free energy functional F (; M ) is now dened by At the critical point, however, we have already said that diverges. This implies that the uctuations in become large, and this is indeed the case: they are large both in absolute size (M is almost extensive) and in spatial extent, extending over the entire system. In this case < M >6! M . See appendix A for a futher discussion. 4 M CHAPTER 1. INTRODUCTION 15 exp[ F (; M )] = X f g (M M ()) exp[ E ()] (1.9) so that the probability of an M -macrostate becomes P can(; M ) = (1=Z ) X f g (M M ()) exp[ (E () HM ())] = (1=Z ) exp[ F (; M ) + HM ] and we can write < M >= (1=Z ) Z= and X M X M (1.10) M exp[ F (; M ) + HM )] exp[ F (; M ) + HM ] X G(; H ) = 1 ln exp[ F (; M ) + HM ] M Now let us extend the discussion from the single phase to phase transitions. Taking the 2d Ising model as a paradigm, we shall now use equation 1.10, along with the results for the behaviour of P can (; M ) as a function of system size, and physical arguments about the nature of the favoured congurations at various temperatures, to explain the appearance of the dramatic jumps in canonical averages that we know are characteristic of rst-order phase transitions. We now write FL (; M ) instead of F (; M ) to make clear its dependence on the nite-size of the system. Consider, therefore, the shapes of FL (; M ) for some nite L above and below c. For the 2d Ising model these can be shown [6, chapters 5-6], [17, chapter 11] to be as illustrated schematically in gure 1.6. Diagram (1) describes the situation at high temperatures. In this regime the probability of a macrostate is dominated by its multiplicity, and the eect of the (average) energy of the congurations is small. Thus the favoured macrostates are those around M = 0, which correspond to spins being chosen with random orientation. Thus PLcan (; M ) has a single maximum, and FL (; M ) has a single minimum at M (0) = 0, describing the single phase that exists there. The limiting bulk free energy density fb (; m) = limL!1L dFL (; mLd) is approached quite quickly as L increases and thus (when viewed on the right length scale) looks CHAPTER 1. INTRODUCTION 1 16 F 2 L F L M M Figure 1.6. Schematic diagram of the free energy functional FL(; M ) (1) for < c, (2) for > c . very similar to FL (; M ), as shown in diagram (1) of gures 1.6 and 1.7. 1 f 2 m f m Figure 1.7. Free energy density f (; m) in the limit N ! 1, (1) for < c, (2) for > c. The behaviour at low temperature is dierent, as shown in the second diagram of gure 1.6. FL (; M ) now has two minima at M which will describe the two phases of the system, and correspondingly fL(; m) has two minima at m. 
These come from the dominance of energy over entropy at low temperatures; low-energy configurations are favoured even though their multiplicity is low, and these configurations tend to have almost all their spins aligned either up or down, that is to say, they have finite magnetisation. The symmetry of F_L(β, M) follows from the symmetry between positive and negative magnetisation here. Correspondingly, P_L^can(β, M) must be symmetric with two maxima. However, the scaling this time is

  F_L(β, M) ∼ L^d       for M > M* and M < -M*
  F_L(β, M) ∼ L^{d-1}   for -M* < M < M*

The first line here is normal scaling behaviour; the microstates that dominate F_L(β, M) have a fairly uniform magnetisation. We shall digress for a while to explain the scaling in the region between the modes of P^can(β, M).

The Interfacial Region. The L^{d-1} scaling in the region between the two minima occurs because, although there are of course very many microstates where the magnetisation is roughly uniform throughout the configuration (these are the `typical' microstates at high temperature), these have relatively high energy and are strongly suppressed at low temperature. The best way the system can achieve low energy when M ≠ ±M* is to go into mixed states, states of phase coexistence, where configurations consist of regions of magnetisation density -m* and +m* separated by interfaces (these are thus also known as interfacial states). The overwhelming low-temperature contribution to F_L(β, M) comes from such microstates. These regions have a free energy density f_b(β, m*) which is typical of the bulk states, while the interfaces introduce an interfacial free energy f_s. f_s is a free energy because there is an internal energy necessary to create a particular interfacial area, and an entropic part related to the number of ways that such an interface can be positioned in the system in order that the phases of different magnetisations are present in the right proportions to produce the overall magnetisation M. For a finite-sized system in the interface region one can therefore write

  F(β, M) ≈ L^d f_b + c(M) L^{d-1} f_s    (1.11)

The second term is the effect of the interface. Its `area' in d space dimensions is ∼ L^{d-1}, while c(M) is a geometrical factor determined by the shape of the interface. As well as on M, c(M) also depends on the boundary conditions of the system. It is given by the Wulff construction [18].

The existence of the interface has interesting effects on the behaviour of the system. Its contribution to the total free energy goes to infinity with L, but more slowly than that of the bulk free energy. Therefore its fractional contribution to the total free energy is cL^{d-1}f_s / L^d f_b = cf_s / Lf_b → 0 as L → ∞, and so in the thermodynamic limit it has no effect on bulk properties, nor on the location of phase transitions. This is reflected in diagram (2) of figure 1.7, which looks qualitatively different from the corresponding diagram of figure 1.6. We see that the limiting f_b(β, m) as L → ∞ is constant between +m* and -m*, lacking the central maximum of F_L(β, M), which disappears as 1/L. Thus the limiting f_b(β, m) is convex, as required by thermodynamical arguments (see for example [3, chapter 8]). However, we would not be right to conclude that the interface has no effect at all on the system in the thermodynamic limit. At coexistence

  P^can(M_interf) / P^can(M*) ∝ exp(-β c L^{d-1} f_s) → 0  as L → ∞    (1.12)

so that the interfacial macrostates are exponentially suppressed compared to the pure states.
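To get a feel for the strength of the suppression in equation 1.12, the fragment below evaluates the factor exp(-βcL^{d-1}f_s) for a d = 2 system at a few linear sizes; the value βcf_s = 0.1 per unit interfacial `area' is invented purely for illustration:

    import numpy as np

    beta_c_fs = 0.1   # invented reduced interfacial free energy per unit `area'
    d = 2

    for L in [8, 16, 32, 64]:
        suppression = np.exp(-beta_c_fs * L**(d - 1))   # equation (1.12)
        print(L, suppression)
    # The interfacial weight falls off exponentially in L**(d-1): already at
    # L = 64 the ratio P(M_interf)/P(M*) is about exp(-6.4), roughly 1.7e-3.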
Therefore, in order to see the coexistence of phases in the Ising system it is necessary to constrain the total system at constant M, so that the system is forced into one of the mixed states. If we had looked only at f(m) to try to predict this behaviour, that is, if we had taken only the leading term (L^d f(m)) in the expansion of F_L(β, M) in L, we would have concluded, wrongly, that P^can(M_interf)/P^can(M*) → 1, implying that even in the thermodynamic limit a system with an unconstrained magnetisation should be able to pass freely between +m_0 and -m_0.

The presence of this large interfacial region of macrostates, whose probability goes to zero in the large system limit but whose free energy density approaches that of the pure phases, provides a large measure of explanation for certain non-equilibrium properties of statistical mechanical systems, in particular metastability. For example, we now see that to change the sign of M at the phase transition, given that the spins must be flipped piecemeal, requires the creation of an interface between regions of positive and negative magnetisations; that is, it implies the necessity of passing through the unlikely interfacial region. Similar considerations affect all first-order phase transitions, and are a bugbear of computer simulations of them (see sections 1.2.3 and 4.3). Of course, this explanation is not the whole story, because it takes no account of the dynamical mechanism by which the metastable state eventually decays, which is known to be via the nucleation of a droplet of some critical size [17, chapter 11], which then grows. We would not necessarily expect the order parameter M of the whole system to be a suitable quantity for studying this (although some very recent work [19] has in fact suggested that a surprisingly good description of the relaxation of metastable states may be obtained from consideration of M alone).

We now return to the main thread of the argument. We shall now write an expression for the probability of a phase, which is the sum of P^can(β, M) over those values of M characteristic of each phase, which we shall take as being M > 0 and M < 0 here. In fact, of course, because of the shape of P^can(β, M), only those M-values within O(L^{d/2}) of ±M* are really `characteristic' of the phase; but equally, very little error (and an error which is increasingly small as L increases) is made by including all the states with positive magnetisation in one phase and all with negative magnetisation in the other. Therefore, labelling the phases with A and B,

  P_A^can(β, H) = Σ_{M∈A} P^can(β, M, H) = Z_A(β, H) / Z(β, H)    (1.13)

where Z_A is defined as

  Z_A(β, H) = Σ_{M∈A} exp[-β F(β, M) + β HM]

and similarly for B (we retain the HM term for generality, though for the Ising model the transition occurs at H_coex = 0 because of the symmetry of F(β, M)). The relative probabilities of the two phases are P_A^can / P_B^can = Z_A / Z_B, and we can define restricted Gibbs free energies on a particular phase alone:

  G_A(β, H) = -β⁻¹ ln Z_A(β, H)

so that

  P_A^can / P_B^can = exp(-β[G_A(β, H) - G_B(β, H)])    (1.14)
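Given an estimate of F(β, M) on the grid of M-macrostates (from exact enumeration as above, or from the simulation methods studied in chapter 3), equations 1.13 and 1.14 amount to a few lines of arithmetic. An illustrative sketch, assuming the macrostate magnetisations and free energies are supplied as arrays mags and F:

    import numpy as np

    def phase_weights(mags, F, beta, H=0.0):
        """P_A, P_B and G_A - G_B from equations (1.13)-(1.14), taking
        phase A as M > 0 and phase B as M < 0 (M = 0 is ignored)."""
        x = -beta * F + beta * H * mags
        w = np.exp(x - x.max())          # a constant shift cancels in ratios
        ZA = w[mags > 0].sum()
        ZB = w[mags < 0].sum()
        PA = ZA / (ZA + ZB)
        dG = -np.log(ZA / ZB) / beta     # G_A(beta,H) - G_B(beta,H)
        return PA, 1.0 - PA, dG

Subtracting the maximum of the exponent before exponentiating is worthwhile in practice, since βF is extensive and would otherwise overflow for all but the smallest systems.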
However, the shape of P can (; M ) means that the quantities GA and GB can be related to the basic free energies G(; H ) and F (; M ): ZA(; Hcoex ) = ZB (; Hcoex ) = Z (; Hcoex )=2 so GA (; Hcoex ) = GB (; Hcoex ) = G(; Hcoex ) 1 ln 2 that is, gA (; Hcoex ) = gB (; Hcoex ) = g(; Hcoex ) + O(1=N ) so to within a correction which vanishes in the large N limit, gA and gB are also equal to the specic Gibbs function of the system considered as a whole. The narrowness of the peaks of P can (; M ) is also responsible for the increasingly abrupt change in < M > at the transition. Consider the case when the specic Gibbs functions of the two phases are not quite equal, because of a slight change H in the applied eld from its equilibrium value. Since only states very near to M contribute, we have PAcan=PBcan exp(2 HM ) (1.15) If > c , M > 0 and M < M > N . Therefore any macroscopic H will cause PAcan =PBcan ! 0 or 1 depending on its sign, or, put dierently, we can make PAcan =PBcan < by applying ln H = 2 < m >N CHAPTER 1. INTRODUCTION 21 Therefore we can see the `jump' in the order parameter emerge from the exponential dependence of the probability of a phase on the size of the system, though of course if H is very small there may be metastability problems. If < c , then < m >= 0 and the response to H is smooth. The fact that P can (; M ) is so sharp about its maxima also enables us to write the equilibrium condition in another way. If, under the action of eld H , the macrostate of maximum p probability in phase A is M (H ) with probability P can (M ), then PAcan NP (M ). Thus even if we constrain M = M we can dene g~A (; H ) (1=N )(F (; M ) HM ) = = (1=N ) ln[exp( F (; M ) + HM )] " p (1=N ) ln (c= N ) = gA (; H ) + O( lnNN ) X M 2A # exp( F (; M ) + HM ) where c in the 3rd line is a constant of order unity. Thus in the large N limit L d(F (; M (H )) HM (H )) ! gA(; H ) (1.16) So even a knowledge of F (; M (H )), determines the Gibbs free energy of the entire phase, if we also know M (H ). A practical method that uses this to determine the phase coexistence eld Hcoex (where this is unknown) is the double-tangent construction. This is described (in the context of an o-lattice system) in appendix B. As the temperature increases, the dominance of the energy over the entropy becomes weaker and m becomes smaller. Finally, at c , the rst order phase transition disappears. The scaling of FL (c ; M ) is somewhat unusual and is discussed in appendix A. It has the property that P can (0)=P can(M ) = constant ( 0:04), independent of L, but the two-peak structure remains, leading to large (almost extensive in the system size) uctuations in the order parameter M . CHAPTER 1. INTRODUCTION 22 1.1.4 O-Lattice Systems Let us now extend the discussion of phase behaviour from the Ising model to the more familiar melting and boiling transitions. Once again we model the material under consideration as a system of particles with some position-dependent potential energy acting between them, though the potential will usually be appreciably more complicated than in the Ising case. A clear dierence is that this time the position coordinates may take a continuous range of values; such systems are known as o-lattice systems, as opposed to lattice-based systems like the Ising model. The total number of microstates is therefore not denumerable. 
1.1.4 Off-Lattice Systems

Let us now extend the discussion of phase behaviour from the Ising model to the more familiar melting and boiling transitions. Once again we model the material under consideration as a system of particles with some position-dependent potential energy acting between them, though the potential will usually be appreciably more complicated than in the Ising case. A clear difference is that this time the position coordinates may take a continuous range of values; such systems are known as off-lattice systems, as opposed to lattice-based systems like the Ising model. The total number of microstates is therefore not denumerable. However, the same analysis carries over in slightly modified form: P^can(σ) and P^can(E) etc. become probability densities, and we write the partition function (here at constant volume) as a configurational integral:

  Z(β, V) = ∫_V exp(-β E(σ)) dσ    (1.17)

where E(σ) is the configurational particle-particle energy and the shorthand dσ refers to the Nd-dimensional integral over the coordinates of all N particles in a d-dimensional space. The canonical averages are also defined as integrals in the obvious way.

To examine the phase behaviour it proves best to analyse this model in an ensemble with V allowed to vary, controlled by an external pressure field p, since this corresponds to real phase coexistence, where the pressure is the same in the two phases. Therefore the total energy of a configuration is E_TOT = E(σ) + pV(σ), where pV is the work that the system has to do against the external pressure p to reach volume V. It then follows from 1.2 that the p.d.f. of σ and V is

  P^can(σ, V) = exp(-β pV) exp(-β E(σ)) / Z_p

with the partition function

  Z_p = ∫₀^∞ dV exp(-β pV) ∫_V dσ exp(-β E(σ)) = ∫₀^∞ exp(-β pV) Z(β, V) dV

where Z(β, V) is the constant-(β, V) partition function defined in equation 1.17. The logarithm of Z_p is related to a Gibbs free energy

  G(β, p) = -β⁻¹ ln Z_p    (1.18)

and as before we define a related intensive free energy density, g(β, p) = G(β, p)/N, and a free energy functional

  F(β, V) = -β⁻¹ ln Z(β, V)

so that

  exp[-β G(β, p)] = Σ_V exp[-β pV - β F(β, V)]

and the free energy of a phase is

  exp[-β G_A(β, p)] = Σ_{V∈A} exp[-β pV - β F(β, V)]

where A defines the set of volumes characteristic of a phase; and

  P^can(β, V) = exp(-β pV) exp(-β F(β, V)) / Z_p    (1.19)

We remark immediately that while F(β, V) will often have a double-well structure, it will not in general possess the symmetry of the Ising model's F(β, M), which will lead to phase transitions occurring at p ≠ 0, whereas all the Ising model's occur at H = 0.

To expand on this, let us begin by considering a typical solid-liquid-vapour phase diagram of a continuous system, as shown in figure 1.8, and comment on it using the formalism we set up in the previous section, comparing it with the Ising model's phase diagram (figure 1.2).

[Figure 1.8. Schematic (p, β) and (V, β) phase diagrams, with solid (s), liquid (l) and vapour/gas (v, g) regions and the coexistence regions s+l, s+v and l+v. CP is the critical point, TrP the triple point and TrL the triple line.]

By examination of the energy functions, we identify the external field energy terms -HM (Ising) and pV (continuous), and pair off the corresponding intensive variables H (Ising) with p (continuous), and the extensive variables M (Ising) with V (continuous). The analogy with the Ising model is clearest in the liquid-vapour part of the continuous system's phase diagram, so let us ignore the solid phase for now. The p-β and H-β diagrams both show a coexistence line which ends in a critical point; at high β there are two phases (liquid and vapour, or the +M and -M phases), while at β < β_c there is only one. The V-β and M-β diagrams both show a U-shaped coexistence region with its axis roughly along the temperature axis (the convention of drawing the temperature axis vertical for the continuous system and horizontal for the magnet obscures the similarity a little). Nevertheless, there are differences in detail: the magnetic system has a clear symmetry about H = 0 in its phase diagrams that the continuous system lacks.
As we have said, this is a result of the fact that there is a complete correspondence of microstates between the two phases of the magnet: for each microstate with positive magnetisation there exists another, its `photographic negative', with negative magnetisation. In the fluid there is no such symmetry between the liquid and vapour phases.

The statistical mechanical description is also very similar. Once again, the requirement that the response function ∂V/∂p should be extensive for each phase, while still being related to the fluctuations of the order parameter, leads to the result that P_L^can(p, V) becomes sharply peaked about its mode (or modes, very near coexistence), with a width such that δV/V ∼ N^{-1/2}. The free energy functional F(β, V) for the continuous system is qualitatively similar in shape to F(β, M) for the magnet. At high temperatures entropy dominates and there is only one fluid state, a gas, for which the energy is high, because the particles are widely separated, but so is the entropy, because each particle can explore the whole volume of the system. At low temperatures the energy term becomes more important and F_L(β, V) develops two regions which are locally convex, one centred on states with high volume (and thus high entropy but also high energy), which form the low-density vapour phase, and the other centred on states with low volume (and thus low entropy but also low energy), which form the high-density liquid phase. Nevertheless, the vapour-like states still have the lower F and thus, at p = 0, enormously higher weight. However, the convexity of F_L(β, V) means that by imposing a suitable finite p_coex it is possible to produce a P_L^can(β, V) that has two separated maxima such that

  Σ_{V∈A} P_L^can(p_coex, V) = Σ_{V∈B} P_L^can(p_coex, V)

clearly implying

  G_A(β, p_coex) = G_B(β, p_coex)

which is just the same as was derived for the Ising model. The narrowness of the modes means that analogues of equations 1.15 and 1.16 also apply. Equation 1.16 can be used to estimate p_coex by the double-tangent construction even if only a part of F(β, V) is available; see appendix B.

We digress briefly to mention that there has been some controversy as to where the analogue of the transition should be regarded as occurring in a finite-size system: whether it should be when the two peaks of P_L^can(V) have equal weights, or when they have equal heights. The controversy arises from the asymmetry between liquid and vapour, which causes the two peaks of P_L^can(V) to have different shapes; indeed, because of this asymmetry, the phase transition in this type of system is known as an asymmetric first-order phase transition. Both criteria are compatible with the expression just given for the transition point in the infinite volume limit: the values of the field variables required to produce both equal heights and equal weights approach the same limits. However, the difference is important when trying to identify the transition point of a finite system in a computer simulation, and both methods have been used: the authors of [20] favoured the equal-weight criterion, while those of [21] used equal height. The problem is discussed in [17] and [22]. Recently it has been established [23] that, for lattice-based systems, using equal weights will give estimators of the control parameters at coexistence that have smaller discrepancies from the infinite-volume limit. However, it is not clear that the analysis applies to off-lattice systems.
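The equal-weight criterion lends itself to a simple numerical search: given an estimate of F(β, V) on a grid of volumes, one tunes p until the two peaks of equation 1.19 carry equal probability. An illustrative sketch (the arrays vols and F, the dividing volume V_cut between the phases, and an initial bracket [p_lo, p_hi] are all assumed to be supplied by the user):

    import numpy as np

    def weight_imbalance(p, vols, F, beta, V_cut):
        """P_A - P_B for P(V) ~ exp(-beta*p*V - beta*F(V)), equation (1.19),
        with phase A taken as V < V_cut and phase B as V >= V_cut."""
        logw = -beta * p * vols - beta * F
        w = np.exp(logw - logw.max())
        w /= w.sum()
        return w[vols < V_cut].sum() - w[vols >= V_cut].sum()

    def find_p_coex(vols, F, beta, V_cut, p_lo, p_hi, tol=1e-10):
        """Bisect on p until the two phases carry equal weight. Raising p
        suppresses large volumes, so P_A - P_B increases monotonically with
        p; [p_lo, p_hi] must bracket the sign change."""
        while p_hi - p_lo > tol:
            p_mid = 0.5 * (p_lo + p_hi)
            if weight_imbalance(p_mid, vols, F, beta, V_cut) < 0:
                p_lo = p_mid
            else:
                p_hi = p_mid
        return 0.5 * (p_lo + p_hi)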
As before, the region of low probability between the peaks is dominated by states exhibiting phase coexistence. As for the Ising model, these states are unstable unless the order parameter is constrained. Equation 1.12, and the appropriate analogue of equation 1.11, also apply.

Let us now consider the solid-fluid phase transitions, which are a new feature of the continuous system's phase behaviour. In the solid phase the energy is near its minimum, since the highly ordered structure maximises the number of close neighbours that a particle can have, but the entropy is also low because each particle is held in the `cage' formed by its neighbours and can move only in a small volume around its lattice site. As can be seen from the p-β phase diagram, if we begin from low temperature and pressure then at first there is a solid-vapour coexistence curve, which has a junction with the liquid-vapour coexistence curve at the triple point (marked TrP in figure 1.8), where solid, liquid and vapour can all coexist (i.e. all have the same specific Gibbs free energy). After this a solid-liquid coexistence curve continues upward. It is generally called the solid-fluid coexistence curve once the pressure is higher than p_c, since beyond this point heating a dense fluid at constant p will not produce another phase transition to a less dense fluid.

An obvious qualitative difference between the solid-fluid transitions and the liquid-vapour transition is that the solid-fluid transitions do not end in critical points (at least, no experimental evidence for such behaviour has ever been found). This seems to be because the solid and fluid have qualitatively different structures: in the solid there is clear long-range crystalline order and particles are localised on their lattice sites, while in the liquid and vapour there is no such order and particles may wander throughout the volume of the system. It is hard to envisage a mechanism which could allow one type of structure to merge continuously into the other, as would have to happen at a critical point. It is interesting to note that the liquid and vapour do have qualitatively identical structures in this sense, even though the differences in bulk properties such as density between the phases may be very large, far larger than the difference in the same property between the solid and the liquid.

Though the absence of a critical point means that the order parameter description of the transition does not carry over in all its details from the fluid case, the usefulness of the concept means that one is often defined. The volume (or density) of the system will once again serve, since the solid and fluid always have different densities. Analysed like this, the solid-fluid phase transition looks very much like the liquid-vapour one: the probability P_L^can(p,V) at a suitable pressure again has an (asymmetric) double-peak structure, the transition occurs when the specific Gibbs function g is the same for both solid and fluid, and the region between the maxima is dominated by mixed phases. There are also other possible order parameters, more closely linked with the obvious difference in structure of the two phases, related to the average number of nearest neighbours or to the structure factor of the system [24].

1.2 Calculation in Statistical Mechanical Problems

All exact science is dominated by the idea of approximation.
BERTRAND RUSSELL

The formalism of statistical mechanics that we have described provides in principle for the solution of the problem we stated in section 1.1.1: that of finding which phase is stable at particular values of the control parameters, and in particular of finding where any phase transitions occur. We need a model for the potential, of course, but having this we have expressions for the partition function, canonical averages and the weights of phases in terms of this potential, and these tell us which phase is favoured at particular values of the control parameters. Knowing this we can in principle find those values of the control parameters where both phases have equal weights, and so can construct the phase diagram. However the evaluation of these expressions, consisting as they do of sums over all the configurations of the system, is in practice extremely difficult, and can only be carried out for a few particularly simple forms of E(σ), whether we are dealing with a continuous or a discrete space of configurations. The subject of this thesis is of course the use of computational methods in the evaluation of the necessary expressions, and we shall give a basic overview of computer simulation in sections 1.2.2 and 1.2.3, as an introduction to chapter 2, where we review in depth the various ways that the phase coexistence problem has been tackled computationally. But first let us look at analytic methods.

1.2.1 Analytic Methods

There do exist analytic methods for evaluating the partition function and canonical averages of statistical mechanics: there are some `exact' results and some approximate methods of general applicability. However, as we shall see, most of these approximate methods fail in the situation that is of interest to us, viz. the calculation of the location of phase boundaries.

Non-interacting systems, i.e. systems in which Z can be expressed as a product of single-particle partition functions, are generally soluble with ease, but they are not of much physical interest. A number of exactly soluble systems which do contain particle-particle interactions have been found. The 2-dimensional Ising model in zero field is one example [11]; various `embroidered' extensions of this can also be solved, see [12]. Some properties of the Potts model, a generalisation of the Ising model, can also be found analytically [25, 26]. The other exactly soluble models, for example the 6-vertex model, tend not to provide much further insight into `real' systems; for more details of their behaviour see [26] and [27]. The amount that is known about these systems varies depending on the model; for example, g(β,0) is known at all temperatures for the Ising model but only at the phase transition temperatures for the Potts models. All the exactly soluble models are lattice models having only one or two spatial dimensions, and they are solved by transfer matrix techniques [26, 27], which are not applicable to other systems. The difficulty with exact solution is that we must calculate how many configurations there are with a particular energy, density or magnetisation, but because the number of configurations is so vast, and the potential describing the interaction couples together the degrees of freedom of the particles⁵, this calculation quickly becomes a problem in combinatorics which is soluble analytically in only a few special cases.

⁵ All physically interesting potentials have at least pairwise interactions (E = Σ_{i<j} E_ij), and may also contain three-body, four-body, etc. terms.

Mean Field Theory is an attempt to deal with an interacting system by effectively reducing it to a non-interacting one.
We do this by considering a single particle and treating its interactions with all the other particles in the system by averaging them out so that they form a continuous `background' field. The `background' field depends on bulk properties that we do not know, but we can solve the problem by enforcing self-consistency: we calculate the properties of the single particle that are produced by a particular `background' field, and then demand that these properties would, if extended to all the other particles, be such that they would produce the desired background field. In [6, chapter 5] and [3, chapter 20] the 2d Ising model with H = 0 is treated in this way. The successes and deficiencies of the technique are immediately apparent: in one dimension it is qualitatively wrong, while in 2d and 3d it predicts qualitatively the right behaviour but its answers are wrong in detail (the critical temperature is too high). The reason for this lies in the fact that the effects of fluctuations and cooperative phenomena are ignored; the results of the technique are therefore best at low or high temperatures and worst at phase transitions, especially continuous ones. Moreover, there is no way of extending the theory systematically to obtain successively better results (although there are heuristic criteria applicable in particular circumstances). It is interesting that the performance of Mean Field theory gets better as the spatial dimensionality d of the problem increases; in four or more dimensions it is essentially exact for the Ising model. This occurs because the number of `nearest neighbours' of any one particle increases with d, and so the size of fluctuations as a fraction of the mean field diminishes.
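The self-consistency condition just described is easy to solve numerically. The sketch below (Python; the function name and starting point are our own choices) iterates the standard mean-field equation for the Ising magnetisation, m = tanh(βJzm), with z the number of nearest neighbours. For the 2d square lattice (z = 4) it gives k_B T_c = 4J, higher than the exact value of about 2.269J, illustrating the overestimate of the critical temperature mentioned above.

import numpy as np

def mean_field_magnetisation(beta, J=1.0, z=4, tol=1e-12):
    # Solve m = tanh(beta*J*z*m) by fixed-point iteration (z = 4 for the
    # 2d square lattice).  Returns the spontaneous magnetisation at H = 0.
    m = 1.0  # start from the fully ordered state
    for _ in range(100000):
        m_new = np.tanh(beta * J * z * m)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

# Mean field predicts beta_c = 1/(J*z) = 0.25, i.e. T_c = 4J, whereas
# Onsager's exact 2d result is T_c ~ 2.269J.
for T in (5.0, 4.5, 4.0, 3.5, 3.0):
    print(T, mean_field_magnetisation(1.0 / T))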
Series expansions [16, chapter 4], [28] result from perturbation techniques: the partition function is expanded as a series in some convenient parameter about a state where it is known exactly. For example, we can expand about T = 0 in T (low-temperature series), about β = 0 in β (high-temperature series), or about ρ = 0 in ρ. These can equally be regarded as expansions about the fully ordered state(s) or the entirely random states. Series expansions for simple energy functions like those of the Ising or Potts models have (with great effort) been continued to upwards of 40 terms [29]. With careful analysis (and using techniques which were developed using insight obtained from other methods) they can give useful results even around phase transitions, but convergence tends to be slow, and results are not better than those of computer simulation even for the longest series. This is unsurprising, because the expansion is about the temperatures or densities where a particular phase is at its most stable, which is inevitably distant from the phase transition point, where it is becoming unstable. The expansion parameter thus cannot be `small' and many terms will be required. For complex energy functions the expansion can generally be carried only to a few terms, and results are correspondingly poorer still.

The renormalisation group (RG) ([9, volume 6], [30, 31], and [6, chapter 5] for an introductory reference) is a method of evaluating the partition function in stages, summing over some of the degrees of freedom at each stage and then defining a `renormalised Hamiltonian' on the remaining ones that in some sense `looks like' the one that existed at the previous stage. This defines a mapping which, with appropriate approximations, leads to results for the free energy and canonical averages (see [30]). The process is in principle exact, but quickly becomes extremely complicated to carry out. It was developed to deal with the critical region, where the self-similar physical structure of the system when viewed on different length scales is echoed by the similar structures of the sequence of renormalised Hamiltonians. We note at this point that analytic RG calculation can be supplemented by a powerful numerical technique called the Monte Carlo Renormalisation Group [32].

Fluid systems can often usefully be treated by methods which aim at producing an equation of state, the equation relating p, β and ρ. One approach is approximate solution of the Ornstein-Zernike equation, which links the potential function to the radial distribution function g₁₂(r) (defined in section 4.3.5). From g₁₂(r) an equation of state may be derived. For a general account see [33]; for a particular method of approximate solution, the Percus-Yevick closure, see [34]. Results are often good for simple fluids like the hard-sphere fluid, but the best schemes are not systematically improvable; they are the result of heuristic combination of results obtained by using different approximations. Moreover, g₁₂(r) is very different in the liquid and vapour phases, and so an approximation scheme that describes one phase well is not good for the others. The consequence is that no single model predicts phase transitions; they must be treated by using a different model for each phase and estimating the free energy by integrating the equation of state (assuming we have a reference free energy for the liquid state from some other source). Another approximation scheme is the method of cluster expansions [35], which relates the potential function to functions known as f-functions that describe the interactions amongst successively larger groups of particles. The equation of state can then be expanded in terms of integrals involving f-functions. However, once again the method becomes very complicated when taken beyond the lowest orders.

1.2.2 Monte-Carlo Simulation

Given that the partition function and canonical averages cannot be calculated analytically for the vast majority of systems of interest, can they be calculated, or at least estimated, by computer? It is indeed the case that they can, and there are at least two broad approaches to the problem. In both cases a model of a system of particles interacting with the potential of interest is set up in the computer. Clearly the number of particles N that is found in macroscopic samples of a material, O(10²³), is out of the reach of simulation, but in fact it is frequently found that the macroscopic properties are observed at least qualitatively, and often quantitatively as well, for N = O(1000) and sometimes even for N = O(100) or less.
Moreover, techniques for the extrapolation of results from finite samples to the thermodynamic limit are now well developed (see [17] and appendix A), even where the finite size of the computer simulation has its greatest effect, that is, near continuous phase transitions.

There are two broad approaches to computer simulation in condensed matter physics: molecular dynamics (MD) and Monte-Carlo (MC) simulation. In molecular dynamics simulation, the force produced on each particle by the potential is calculated and the particles are moved in accordance with the Newtonian equations of motion. However, doing this immediately moves us away from the attractive simplicity of the absence of kinetics in the statistical mechanical picture of many-body systems. This picture is the foundation of the Monte-Carlo method, which will be our concern throughout the thesis. An introductory reference to molecular dynamics, which we shall not discuss any further, is [36]; more detail can be found in [33, 37, 38]. There are also various methods that attempt to combine the best features of the two, for example `hybrid Monte-Carlo' [39, 40] and constant-temperature molecular dynamics via the Nosé-Hoover thermostat [41].

The MC method is basically a technique for evaluating multidimensional integrals, in this case the integrals (or sums) that define the partition function and canonical averages. There are a great many general references to the method; see for example [42, 43, 44]. A specific reference to the problem of simulating fluid systems is [45], and phase transitions are discussed in [46] (mainly for lattice models) and [47, 48] (mainly for off-lattice systems).

For a system with a discrete configuration space, one might at first imagine that one could evaluate the partition function directly by simply summing exp[-βE(σ)] over all the configurations of the system (we neglect field terms for now; they do not affect the argument). However, because the number of possible configurations increases exponentially with increasing system size, it passes out of the range of even the fastest computer quite soon after it passes out of the range of calculation by hand. The same problem prevents the use of normal numerical integration routines to find ∫ exp[-βE(σ)] dσ in a system with a continuous configuration space. The dimensionality of the problem is Nd, where N is the number of particles and d is the dimensionality of space. A numerical integration routine would require the evaluation of the integrand exp[-βE(σ)] at a `grid' of points finely spaced enough for it to be smooth [49]. However, if the grid is such that there are m_p points along each coordinate axis, then the integrand must be evaluated m_p^(Nd) times, which once again grows exponentially with N. For example, in 3d with 10 particles and a grid with just 10 points along each axis, 10³⁰ evaluations would be required; this is already out of the range of computability, even though the choice of 10 points along each axis would be far too few if the interparticle potential were realistic and the system were fairly dense.
A finer grid would be required because the integrand would be zero almost all the time (whenever there was at least one `overlap' between the repulsive `hard cores' of the particles, corresponding to the strong repulsive forces produced by inner-shell electrons in real atoms and molecules), and for those few points on the grid where there were no overlaps the integrand would vary too sharply for a numerical integration routine to give an accurate picture of its behaviour.

Obviously, then, we cannot exhaustively enumerate the state space, so to proceed we must take a sample of configurations and use them to estimate the quantities of interest. This sample, while with modern computing power it may well contain millions of configurations, will nevertheless include only a tiny fraction of the configurations that exist. The simplest way to produce the sample would be to generate the configurations at random, so that all microstates are generated with equal probability. This is known as `simple sampling'. For example, we might make a configuration of particles by generating the x, y and z coordinates of each particle with a uniform probability on [0...L], where L is the length of the side of the box containing the particles; or we might make a configuration of Ising spins by choosing each spin randomly in the +1 or -1 orientation. Suppose we generate N_c configurations. Then

\[ Z \stackrel{e.b.}{=} \tilde{Z} = \frac{\Omega_{TOT}}{N_c}\sum_{i=1}^{N_c}\exp[-\beta E(\sigma_i)], \qquad \langle O \rangle \stackrel{e.b.}{=} \tilde{O} = \frac{\sum_{i=1}^{N_c} O(\sigma_i)\exp[-\beta E(\sigma_i)]}{\sum_{i=1}^{N_c}\exp[-\beta E(\sigma_i)]} \tag{1.20} \]

where `e.b.' means that the right-hand side is an estimator of the left, and Ω_TOT is the total number of microstates. It is indeed true that ⟨Z̃⟩ = Z and ⟨Õ⟩ = ⟨O⟩. However, they are not good estimators in most cases, because N_c must be extremely large, O(Ω_TOT), before it is at all likely that Z̃ ≈ Z and Õ ≈ ⟨O⟩. The problem is the same one that was described above when talking about numerical integration: the vast majority of the states in the sample are likely to have a very high energy (because they contain at least one overlap) and so a very small value of exp[-βE(σ)]. As we discussed in section 1.1.3, the expressions for Z and the canonical averages are dominated by these high-energy configurations only at very high temperatures; at more normal temperatures the configurations that are important have lower energy, but because there are so few of them compared to the high-energy ones, they are generated by the simple sampling technique only at extremely lengthy intervals (and intervals whose length increases exponentially with L^d).
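The estimators of equation 1.20 are easily demonstrated on a small Ising system, where Ω_TOT = 2^(L²). The following sketch (Python; all names are our own, and the system must be tiny and the temperature high for the estimates to be usable at all, for exactly the reasons just given) draws configurations uniformly at random and Boltzmann-weights them afterwards, here with the energy itself as the operator O.

import numpy as np

rng = np.random.default_rng(0)

def ising_energy(spins, J=1.0):
    # Nearest-neighbour energy of a 2d Ising configuration with periodic
    # boundaries: E = -J * sum over bonds of s_i * s_j.
    return -J * np.sum(spins * (np.roll(spins, 1, axis=0) +
                                np.roll(spins, 1, axis=1)))

def simple_sampling(L, beta, n_configs):
    # Estimators of equation 1.20: uniform random configurations,
    # weighted by exp(-beta * E) after the fact.
    n_states = 2.0 ** (L * L)          # Omega_TOT for the Ising model
    weights, energies = [], []
    for _ in range(n_configs):
        spins = rng.choice([-1, 1], size=(L, L))
        E = ising_energy(spins)
        weights.append(np.exp(-beta * E))
        energies.append(E)
    weights, energies = np.array(weights), np.array(energies)
    Z_est = n_states * weights.mean()
    E_est = np.sum(energies * weights) / np.sum(weights)
    return Z_est, E_est

print(simple_sampling(L=4, beta=0.2, n_configs=100000))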
We see that what is required is a biased sampling scheme that generates much more frequently those configurations that dominate Z and the canonical averages. This is known as importance sampling. Suppose we have an algorithm that generates microstates such that state σ_i appears with probability P_i = P(σ_i). Then we call the vector P_i, i = 1...Ω_TOT, the sampled distribution. (We shall employ the vector notation P for {P(σ)} here, and later, especially in chapter 3, for the set of macrostate probabilities.)

An obvious choice of sampled distribution that satisfies our requirements for most statistical mechanical problems (with the major exception of the phase coexistence problem) is the Boltzmann distribution itself: we call a simulation that samples this distribution a Boltzmann sampling simulation. However, it is not at first obvious how to do this, because we do not know the partition function Z. The problem is the same for any other choice of sampled distribution; we will in general have P(σ) = Y(σ)/Z, where the function Y (= exp(-βE) for Boltzmann sampling) is easily calculated for any configuration, but its normalisation Z = Σ_{σ} Y(σ) is unknown and hard to calculate because it involves a sum over all configurations. The known function Y, to which the unknown probability distribution is proportional, is called the measure of the probability distribution, an expression coming from the statistics literature [50, 51].

A solution, which notwithstanding its age is still the basis of most MC work done in condensed matter physics, is the Metropolis algorithm [52]. The central idea of this method is to generate a sequence of microstates by evolving a new trial microstate σ_j from the existing one σ_i by making some (usually small) random change to it. We arrange, in a way described below, to move to this new state with a probability P(σ_i → σ_j) = π_ij. If the transition is accepted, σ_i is replaced by σ_j, which is then itself used as the source of a new trial microstate. Otherwise we try again, generating another trial microstate from σ_i. π is called a stochastic matrix, meaning that it has the property that Σ_j π_ij = 1. Its size is Ω_TOT × Ω_TOT. Note that π_ij depends only on the present state i (σ_i) and the trial state j, not on any other state that the chain has passed through in getting to i. This property is called the Markov property, and the sequence of configurations produced is called a Markov chain.

π_ij can be thought of as describing the time evolution of the probability distribution of the microstates. Suppose we have P_i(0) at time t = 0 (if we know exactly that we are in state i₀ at this time, then P_i(0) = δ_{i,i₀}). Then at t = 1 we have

\[ P_i(1) = \sum_j P_j(0)\,\pi_{ji} \tag{1.21} \]

Of particular interest is the (left) eigenvector of π_ij with eigenvalue 1, which we write simply as P_i. It is the vector of equilibrium microstate probabilities:

\[ P_i = \sum_j P_j\,\pi_{ji} \tag{1.22} \]

P represents what is called the stationary probability distribution of the chain: the probability distribution of microstates that is invariant under the action of π, and so remains unchanged once it is reached. An extremely important property of a Markov chain is that it can be shown [50] that the sampled distribution converges to P as n → ∞ for any choice of initial state (i.e. any P_i(0)), as long as π is such that the chain is ergodic, meaning that any state can be reached from any other in finite time. In many situations the convergence is rapid. This means that we can use the Markov chain with transition matrix π as a way of producing microstates with probability distribution P. We must perform `equilibration', discarding the early configurations, which are unlikely to be typical of equilibrium, while the sampled distribution converges⁶ to P; after that, configurations will indeed be drawn with probability P.

Now we must consider how to choose π to produce the desired probability distribution P. There are many possible solutions, because there are Ω_TOT² components in π and only Ω_TOT constraints coming from the components of P. We normally choose to observe detailed balance, which means taking P_i π_ij = P_j π_ji with

\[ \pi_{ij} = \begin{cases} R_{ij} & P_i < P_j \\ R_{ij}\,P_j/P_i & \text{otherwise} \end{cases} \]

where R_ij is an arbitrary symmetric matrix; its significance is discussed below. That this choice satisfies equation 1.22 can be easily verified by substituting for π_ji.

⁶ It is not possible, except by observation, to say when equilibration has occurred; in practice this is often a problem where equilibration is slow, as occurs near phase transitions.
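The claims just made (stationarity of P under π, and detailed balance) can be checked directly on a toy state space. In the sketch below (Python; the four-state measure Y and the uniform proposal matrix R are arbitrary choices of ours), we build the Metropolis matrix and verify equation 1.22 numerically.

import numpy as np

def metropolis_matrix(Y, R):
    # Build pi_ij = R_ij * min(1, Y_j/Y_i) off the diagonal, with the
    # diagonal fixed by normalisation, for a small discrete state space
    # with measure Y and symmetric proposal matrix R.
    n = len(Y)
    pi = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                pi[i, j] = R[i, j] * min(1.0, Y[j] / Y[i])
        pi[i, i] = 1.0 - pi[i].sum()   # rejected moves stay at i
    return pi

# Toy example: 4 states with unnormalised weights Y and a uniform
# symmetric proposal.  The stationary distribution should be Y/sum(Y),
# even though the normalisation never enters the algorithm.
Y = np.array([1.0, 3.0, 0.5, 2.0])
R = np.full((4, 4), 1.0 / 3.0)
np.fill_diagonal(R, 0.0)
pi = metropolis_matrix(Y, R)

P = Y / Y.sum()
print(np.allclose(P @ pi, P))                   # equation 1.22
flux = P[:, None] * pi
print(np.allclose(flux, flux.T))                # detailed balance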
The power of the method comes from the fact that the components of P (the equilibrium probabilities) enter only as ratios, so that we have overcome the problem of the unknown partition function: it simply cancels out, and we need only the measure of the distribution, P_j/P_i = Y_j/Y_i. For example, to sample from the Boltzmann distribution we choose:

\[ \pi_{ij} = \begin{cases} R_{ij} & E_j < E_i \\ R_{ij}\exp[-\beta(E_j - E_i)] & \text{otherwise} \end{cases} \tag{1.23} \]

The matrix R_ij describes which other microstates are accessible from a given microstate, so it is determined by the particular choice of algorithm for generating the trial move. Another way of putting this is to say that R determines the `pseudodynamics' of the simulation (as we have seen, the physical dynamics of the system are not a part of an MC simulation). R is normally extremely sparse, because we choose the new configuration by making only a small modification to the present one: consider, for example, the single-spin-flip Metropolis algorithm, where microstates σ_i and σ_j differ only in the movement of a single particle or the flipping of a single spin. Most microstates are therefore inaccessible in one move. We are forced to make this choice because, if we allow a trial transition to any other configuration, we shall once again be sampling the states of highest entropy, so it is almost certain that the energy of the trial configuration will be very much higher than that of the starting one. The acceptance ratio r_a of Monte-Carlo moves therefore becomes exponentially small. We discuss the effect that this has below and in section 1.2.3.

We implement the algorithm in practice in two stages: first we select a trial move from σ_i to σ_j by a process for which R_ij is symmetric (such as a flip of a random spin or a random small displacement of a randomly chosen particle), then we evaluate λ = Y_j/Y_i. If λ is greater than one we accept the move; if it is less than one we accept it with probability λ. This is normally done by generating a pseudorandom variate X [49, chapter 7] on the interval [0...1] and accepting the move if X < λ.

Let us first consider a Boltzmann sampling simulation. Canonical averages are obtained very simply in this case:

\[ \langle O \rangle \stackrel{e.b.}{=} \bar{O} = \frac{1}{N_c}\sum_{i=1}^{N_c} O_i \]

where O_i is the result of applying the operator O to the ith microstate of the N_c generated (by this we mean that we generate N_c trial updates of the Markov chain; not all of them will be accepted, so sometimes the same microstate will reappear several times in succession). In practice, it is more efficient, particularly for a long chain, to store a histogram {C} of the number of times the chain visits each of the j = 1...N_m macrostates of every operator O of interest. Then

\[ \langle O \rangle \stackrel{e.b.}{=} \bar{O} = \frac{1}{N_c}\sum_{j=1}^{N_m} C_j O_j \]
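Before turning to the statistical errors, the pieces just described (a symmetric proposal, the acceptance test against a pseudorandom variate, and histogram accumulation over macrostates) can be gathered into a minimal single-spin-flip Metropolis sketch for the 2d Ising model (Python; the parameter values and names are illustrative only, and a production code would discard an equilibration period, as discussed above).

import numpy as np

rng = np.random.default_rng(1)

def metropolis_ising(L=16, beta=0.5, n_sweeps=2000):
    # Single-spin-flip Metropolis for the 2d Ising model (H = 0,
    # periodic boundaries), accumulating a histogram over
    # magnetisation macrostates M = -L^2 ... +L^2.
    spins = rng.choice([-1, 1], size=(L, L))
    M = int(spins.sum())
    hist = np.zeros(2 * L * L + 1)
    for sweep in range(n_sweeps):
        for _ in range(L * L):
            i, j = rng.integers(L, size=2)
            # energy change for flipping spin (i, j), J = 1
            nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j] +
                  spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2.0 * spins[i, j] * nn
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                spins[i, j] *= -1
                M += 2 * int(spins[i, j])
            hist[M + L * L] += 1
    return hist

hist = metropolis_ising()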
Because the MC method uses random numbers to generate a sample of the configurations of the simulated system, the estimates of thermodynamic quantities that it produces are associated with a statistical error. We consider this error in detail in appendix C; here we confine ourselves to quoting some of the important results, full derivations of which can be found in that appendix and in [53].

We quantify the statistical error in our estimate using the standard error of the mean, (ΔO)² = ⟨(Ō - ⟨O⟩)²⟩. If all configurations were uncorrelated, this would be simply related to the variance of O (var(O) = ⟨O²⟩ - ⟨O⟩²) thus:

\[ \langle(\Delta O)^2\rangle_{uncorr} = \mathrm{var}(O)/N_c \tag{1.24} \]

However, because the MC method generates a new configuration by making a small change to the existing one, adjacent configurations in the Markov chain are in practice always highly correlated. The consequence of this is that the standard error of the mean is larger than equation 1.24 suggests; we have

\[ \langle(\Delta O)^2\rangle_{corr} \approx \frac{1}{N_c}\,\mathrm{var}(O)\,(1 + 2\tau_O) \tag{1.25} \]

where τ_O is the correlation time of the observable O. This can be measured by expressing it in terms of correlation functions. We consider τ_O mainly in section 3.4.2, where we shall use it as an analytic prediction of the variance of estimators from various sampled distributions. However, to measure the standard error of Ō it is easier to use a direct method. To do this we block the configurations into m = 1...N_b blocks (O(10) is enough) and estimate Ō on each block [45, chapter 6]. Then we measure the mean and variance of the blocked means {Ō_m}, and use the simple formula

\[ \langle(\Delta O)^2\rangle = \mathrm{var}(\bar{O}_m)/N_b \]

since the blocks should be long enough for the block means to be uncorrelated (if they are not, then N_c is not large enough for good results anyway). A variant of this is to define jackknife estimators Ō_J^m on all the blocks of configurations except the mth, and then to find the mean and variance of these (see appendix D). We should note that, whatever algorithm we are working with, we must expect to have to update all the particles (or at least a constant fraction of them) to get uncorrelated configurations. This implies that the best we can do is to have τ_O, measured in single-update steps, grow as L^d.

It has been the norm in MC simulation to choose the sampled distribution to be the Boltzmann distribution, as described above. However, other sampled distributions can be used, and may in fact be superior, particularly for problems involving phase transitions (for which, as we have mentioned, the Boltzmann distribution does not perform well; in terms of the ideas of correlation times that we have just introduced, we would say that near phase transitions τ_O can be very large indeed). The investigation of alternative (non-Boltzmann) sampled distributions that are better matched to this problem has been the key concern of this thesis. In section 1.2.3 below we describe why Boltzmann sampling gets into trouble; chapters 3 and 4 contain the results of investigations of various non-Boltzmann sampled distributions.

1.2.3 Monte-Carlo Simulation at Phase Transitions

Let us consider the ways that statistical mechanics suggests we could try to find the values of the control parameters β and H (or p) that produce a phase transition in a particular system. Section 1.1.3 suggests two ways in which we could try to do this. We shall discuss each in turn and explain why conventional MC methods encounter difficulties in each case.

It was shown in section 1.1.3 that the phase behaviour of a system is reflected in the probability distribution of the order parameter, with phase coexistence occurring when the two phases have equal weight. So an obvious approach would be to simulate in an ensemble with a variable order parameter that embraces the two phases, and to measure its probability distribution directly, eventually finding the values of the control parameters where the two phases have equal weight.
This method can be considered as measuring the free energy difference between the two phases; we saw in equation 1.14 that P_A^can/P_B^can = exp[-β(G_A - G_B)], which implied that at coexistence g_A = g_B up to corrections of O(1/L^d). Note that it is not necessary to know the absolute free energies G_A and G_B themselves.

The other method would be simply to measure the absolute free energies G_A and G_B of the two phases separately, and then compare them. This is particularly attractive if there is a difficulty in crossing the phase boundary, as happens for example in the case of solid-fluid transitions: it takes a very long time for the solid to crystallise out of the fluid, and any crystal formed is likely to contain defects and grain boundaries. We shall show below how the absolute free energy of a phase can also be expressed as a canonical average. As a variation on this, we remark that equation 1.16 shows that it is not necessary to know F(β,V) (to use the off-lattice example) for all V ∈ A to estimate G_A(β,p), as long as V(p) is known. As we have already commented, this is the basis of the double-tangent construction, which is described in appendix B. However, we will more often use the calculation of G as a canonical average in this thesis.

Free Energy Differences

Let us examine the first method first, in the context of the measurement of P^can(β,M) by Boltzmann sampling for the 2d Ising model. Suppose we have H = 0 and β > β_c, so that we are actually at phase coexistence and should find that the two sides of the distribution have equal weight, that is, that there is no free energy difference between the two phases. However, let us imagine that we do not know that H = 0 is the phase coexistence field, only that it is a reasonably close approximation (so that P^can(β,M) does have two maxima), which we are trying to refine. The diagrams in figure 1.9 illustrate the problems faced by Boltzmann sampling Monte-Carlo. All show the underlying distribution P_L^can(β,M) (it has roughly this shape for all β > β_c) sampled by the simulation, and diagrams A2 and B2 also show the function MP_L^can(β,M), which gives the weight with which each part of the macrostate space contributes to the canonical average ⟨M⟩. We also show some possible data histograms of visited states produced after a short run (the (A) figures) and after a long run (the (B) figures). These histograms would give the estimate for the probability of each phase. The accuracy of our assessment of whether we are at phase coexistence, and of any canonical averages obtained, is clearly limited by τ_rw, the average time required to travel between the peaks. Thus, we see that after the short simulation we have not visited both sides of the distribution. Only after a long run have enough `tunnelling' events between the two sides of the distribution occurred to give us a good idea of their relative weight and to outline the shape of the whole of P^can(β,M). The accuracy obtained is limited not by the total number of configurations generated, which could be millions, but by the number of tunnelling events. The presence of such `bottlenecks' in configuration space can cause the results of even a very long simulation to be very poor.

Figure 1.9. P^can(β,M), MP^can(β,M) and possible data histograms for an Ising model at β > β_c: estimation of ⟨M⟩ from a short run (A1, A2) and from a long run (B1, B2).
In such a case it is said that the simulation suffers from ergodic problems. Note, however, that the shape of the distribution within each peak is obtained quite rapidly, so an estimate of ⟨|M|⟩_A within phase A from the short run would be quite accurate, even though the estimate of ⟨M⟩, which depends on both phases, is very poor.

We understand, then, that adequate sampling will only be obtained with a run much longer than τ_rw; but how long is this likely to be? The part of the sampled distribution that has the greatest effect on τ_rw is of course the region of low probability between the peaks, through which the simulation has to pass to get from one peak to the other. At criticality, τ_rw ~ L⁴, because the relative heights of the centre and peaks of the p.d.f. do not change with L. However, at β > β_c we are in the region of first-order phase transitions, and here, as we described in section 1.1.3, to pass through the region around M = 0 we must create an interface between the two phases, with a free energy cost ∝ L^(d-1) f_s. Thus with Boltzmann sampling we must wait on average for a time

\[ \tau_M \propto \exp(\beta f_s L^{d-1}) \tag{1.26} \]

before an energy fluctuation large enough to do this occurs. Unless L is very small or β is very close to β_c, this exponential slowing down is so severe that in any run of practicable length the simulation will remain effectively trapped in one phase and never tunnel to the other.

We took as a premise when starting this discussion that the field H was close enough to zero that P_L^can(β,M) has in reality roughly equal weights in the two phases (equation 1.15 shows how sensitive P_L^can(β,M) is to the applied field; for there to be roughly equal weights implies H = 0 ± O(L^(-d)) for the Ising model). In fact, of course, for most systems, particularly off-lattice models, H_coex is not determined by symmetry, and determining it is our major concern. But the analysis we have just given shows that with Boltzmann sampling the estimate of P_L^can(β,M) obtained at the coexistence point is indistinguishable from those obtained at points in the single-phase region. We can do no more than put a wide bracket on H_coex: H_coex is certainly less than a field H_h that drives a simulation started in the high-M phase into the low-M phase, where it is then observed to stay, but it is certainly more than a field H_l that allows a simulation started in the low-M phase to pass into the high-M phase, where it then stays. H_l and H_h must be at least far enough from H_coex that F_L(β,M) - H_l M and F_L(β,M) - H_h M are convex everywhere.

We should note that the shape of P_L^can(β,M) causes problems only because the structure of the matrix R_ij that determines the pseudodynamics is such that the simulation has to pass through the region of low probability between the peaks in order to get from one peak to the other. In this respect, the pseudodynamics of the Metropolis MC simulation resemble the dynamics of a real system, which must also evolve by small steps, and the consequences are similar [53]: the ergodic problems of the simulation correspond to the tendency of real systems to exhibit metastability (section 1.1.2). R_ij is, of course, under our control, but naive attempts to improve the algorithm do not succeed: as we explained in section 1.2.2, generating a new configuration by making a large random change to the existing one produces an exponentially small r_a, easily small enough to negate the effect of a larger average change in magnetisation.
For lattice models only, the tunnelling problems can in fact be overcome: there is a class of algorithms called cluster algorithms [55, 56] that are able to generate new configurations with a large ΔM without necessarily incurring a large positive ΔE. Hybrid MC [39, 40] is able to some extent to do the same thing. Whilst these methods are extremely effective, we have taken the simple Metropolis algorithm as a `given' in this thesis and concentrated instead on improving performance by the use of non-Boltzmann sampled distributions. With a cluster algorithm, Boltzmann sampling is an excellent strategy: the region between the peaks does not contribute much to the weight of a phase or to ⟨M⟩, and time spent there is time that cannot be devoted to sampling the peaks where most of the weight lies. However, with the Metropolis algorithm a very substantial improvement is obtainable by choosing a different sampled distribution that puts more weight between the peaks and so reduces τ_rw, even at the cost of a reduction in the sharpness of the definition of the peaks themselves (see section 4.3 and much of chapter 3). In fact, even after choosing a better sampled distribution, τ_rw remains large because of the width of P_L^can(β,M): the two important regions of macrostate space, the peaks at ±M̄, are separated by a distance that grows like L^d. In the case where the sampled distribution is roughly flat between the peaks and the same height as them, we still require on average the `random walk time'

\[ \tau_{rw} = \frac{1}{r_a}\left(\frac{2\bar{M}}{\delta M}\right)^2 \tag{1.27} \]

to travel between the peaks, where δM is the average change of magnetisation per accepted move. As we have seen, increasing δM fails to have the desired effect because of a dramatic fall in r_a; it is unusual in MC simulation to settle for an acceptance ratio of less than 1/3. Therefore τ_rw ~ L^(2d) ~ L⁴. How this relates to τ_O is discussed in section 3.4.2.

Measuring Absolute Free Energies

Faced with the difficulties described above, we may try to avoid the interface region entirely and measure the free energy of each phase separately. In this case we require absolute free energies, so that the two phases can be compared. These can in fact be derived from averages over the Boltzmann distribution of operators which are exponentials: for the 2d Ising model with H = 0, we find

\[ \langle\exp(\beta E)\rangle = \frac{1}{Z}\sum_\sigma 1 = \frac{2^{L^2}}{Z} \]

so that

\[ Z = \frac{2^{L^2}}{\langle\exp(\beta E)\rangle} \]

and

\[ G(\beta) = -\frac{1}{\beta}\left(L^2\ln 2 - \ln\langle\exp(\beta E)\rangle\right) \]

This is in fact G(β) of the entire system, not just of a single phase (because H = 0 is the coexistence line), but it nevertheless serves to illustrate the principle; and if we restricted the algorithm to generate only configurations with M ≥ 0 (say), the result would indeed be the free energy of that phase. However, attempting to implement this method with Boltzmann sampling is again very unsatisfactory. In this case, the problem is not that the sampled distribution produces ergodic problems, but simply that the average to be measured and the sampled distribution put their weight in different macrostates.
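A small numerical experiment makes the point. In the sketch below (Python; all names and parameter values are ours), we compute Z for a 3x3 Ising model exactly, by exhaustive enumeration, and compare it with the estimate Z = 2^(L²)/⟨exp(βE)⟩ from a Boltzmann sampling run. The scatter of the estimate comes from rare visits to high-energy states, which carry enormous values of exp(βE); for larger L or higher β the estimator fails completely, for the reasons discussed in the text.

import itertools
import numpy as np

rng = np.random.default_rng(2)
L, beta = 3, 0.6

def energy(spins):
    # 2d Ising energy with periodic boundaries, J = 1, H = 0.
    s = np.asarray(spins).reshape(L, L)
    return -np.sum(s * (np.roll(s, 1, axis=0) + np.roll(s, 1, axis=1)))

# Exact partition function by exhaustive enumeration (3x3 -> 512 states).
Z_exact = sum(np.exp(-beta * energy(c))
              for c in itertools.product([-1, 1], repeat=L * L))

# Boltzmann-sampling estimate of Z = 2^(L^2) / <exp(beta*E)>.
s = rng.choice([-1, 1], size=(L, L))
samples = []
for step in range(200000):
    i, j = rng.integers(L, size=2)
    nn = (s[(i + 1) % L, j] + s[(i - 1) % L, j] +
          s[i, (j + 1) % L] + s[i, (j - 1) % L])
    dE = 2.0 * s[i, j] * nn
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        s[i, j] *= -1
    if step > 20000:                      # crude equilibration
        samples.append(np.exp(beta * energy(s)))

Z_est = 2 ** (L * L) / np.mean(samples)
print(Z_exact, Z_est)   # the estimate is typically poor: exp(beta*E)
                        # puts its weight where the run rarely goes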
Consider figure 1.10, which shows schematically the situation in the case we have just described, for the sampling of energy macrostates in the 2d Ising model.

Figure 1.10. A schematic diagram of a Boltzmann sampling distribution over energy, and a suitable (O₁) and an unsuitable (O₂) operator to estimate from it.

The leftmost diagram shows a typical sampled distribution for a Boltzmann sampling simulation and a histogram that might be generated from it. This distribution is well suited to estimating ⟨O₁⟩, where O₁(E) and O₁(E)P^can(E) are shown in the upper right illustration. However, P^can(E) is unsuited to estimating ⟨O₂⟩ (lower right illustration), because O₂(E) increases so fast with E that P^can(E)O₂(E) has a lot of weight in one of the tails of P^can(E). exp(βE) is, of course, an operator of this kind. The error in the estimate of P^can(E) is larger in the tails because there are fewer counts, and the effect of this error on ⟨O₂⟩ is magnified by the multiplication by O₂. Finally, it normally happens, as shown in the diagram, that P^can(E) becomes so small for states far into the tail that no counts are recorded there, and so no contribution is made to ⟨O₂⟩ even though P^can(E)O₂(E) is still large. This is liable to happen whenever P^can(E) ≲ 1/N_c, which soon arises, since P^can(E) ~ exp(-cL^d) in the tail. Thus MC sampling cannot be used to evaluate averages like ⟨O₂⟩ if the largest values of P^can(E)O₂(E) are produced by the multiplication of a very small value of P^can(E) by a very large value of O₂(E). This problem can also be ameliorated by using a non-Boltzmann sampled distribution, one that extends over the region where O₂ exp(-βE) is large. We explain how an estimate of the canonical average ⟨O₂⟩ can be extracted from this in section 3.1. Indeed, the Ising problem that we have just described will be the test-bed for an investigation of this `single phase' method in chapter 3. In the Ising case we shall look at O₂(E) ≡ exp(βE); for an off-lattice model, for example a fluid in the NpT-ensemble, we would need to evaluate ⟨exp[β(p - p̂)V]⟩, where p̂ is small.

1.2.4 Discussion

We have introduced the theory of statistical mechanics and a method of computer simulation, the Monte-Carlo method, that is naturally related to it. We have described the problem of finding the location of phase transitions, and how it relates to the concept of free energy, and we have described two approaches by which the MC method could be used to tackle this problem. We have also explained why MC simulation in its most easily-applied (Boltzmann sampling) form fails for both of these approaches. Our task in this thesis will be to investigate ways in which the difficulties may be overcome by the use of MC simulations which sample from distributions other than the Boltzmann distribution. We shall look at methods for generating and applying these distributions in chapter 3, and shall produce some new results for the behaviour of the p.d.f. of the 2d Ising model's magnetisation. Then in chapter 4 we shall apply the method to investigate a system of topical interest, the square-well solid. But first we shall review the extremely extensive literature on the problem of free energy measurement.

Chapter 2

Review

And what there is to conquer
By strength and submission, has already been discovered
Once or twice, or several times.
FROM Four Quartets, T. S. ELIOT

As we saw in chapter 1, the problems of Monte-Carlo simulation of phase transitions, particularly first-order phase transitions, centre around the difficulty of measuring the appropriate free energy or free energy difference.
If we keep the order parameter of the transition constant, then we are faced with the measurement of a Helmholtz free energy, for which a Boltzmann sampling algorithm is not suitable because it does not sample the high-energy configurations. If we allow the order parameter to vary, then there is a large free energy barrier separating the two phases. Because the simulation's pseudodynamics constrain it to move in short steps through its configuration space, it takes an exponentially long time to cross this barrier.

A large amount of work has already been done on solving these problems. Progress up to 1986 is described in the review articles by Binder [46] and Frenkel [47], while [48] covers some developments, especially for dense liquids, up to 1989. A more recent short review can be found in [57]. Two recent methods, which have been the basis of the work done in this thesis, are the multicanonical ensemble [58] (or [59] for a review) and the expanded ensemble [60].

We shall now give a brief description of the important methods, going over again some of the ground covered in the reviews [46, 59, 47, 48], but also bringing in the newer methods. We avoid a detailed description of the multicanonical ensemble at this stage; such a description can be found in chapter 3. We shall divide the methods described into three categories, as follows:

1. Methods which find a free energy by expressing it in terms of some other quantity which is more readily evaluated in Boltzmann sampling Monte-Carlo simulation. We shall call these integration-perturbation methods. They are:

(a) Thermodynamic Integration
(b) Multistage Sampling
(c) Bennett's acceptance ratio method
(d) Mon's method
(e) Widom's particle insertion method

We shall also include in this section a description of a relevant related technique:

(f) Histogram methods

2. Methods which sample from a distribution other than the canonical Boltzmann distribution with a constant number of particles. We shall call these non-Boltzmann methods. They are:

(a) Umbrella sampling
(b) The `multicanonical ensemble' and its variants, originally due to Berg and Neuhaus; we describe work by those authors and others (Lee, Rummukainen, Celik, Hansmann and others)
(c) The expanded ensemble, due to Lyubartsev et al., called `simulated tempering' by Marinari and Parisi; we also describe related methods by Geyer and Neal
(d) Valleau's density-scaling
(e) The dynamical ensemble
(f) Grand canonical Monte-Carlo
(g) The Gibbs ensemble

3. Others. Mostly these are coincidence-counting methods, which try to measure the probability of an individual microstate. They are:

(a) Coincidence counting (Ma's method)
(b) Local states (Meirovich, Schlijper)
(c) Rickman and Philpot's function-fitting method
(d) The partitioning method of Bhanot et al.

We shall describe each method individually and discuss its advantages and disadvantages, before discussing and comparing them with one another.

2.1 Integration-Perturbation Methods

2.1.1 Thermodynamic Integration

Thermodynamic integration (TI) may perhaps be considered the standard method; certainly it is the easiest way to perform free energy calculations, because a Boltzmann sampling program that may already exist can often be used with little or no modification. A review of some of the many applications can be found in [47]. It has been the norm to use constant-NVT simulations in these calculations; all the examples given here assume that this is the case.
Applied in this fashion (and assuming that the order parameter is constant in this ensemble), the method does not allow the direct determination of the whole probability density function (p.d.f.) of the order parameter (V, say), but rather measures F(V) for a particular V. However, there is no reason why it cannot be implemented using constant-NpT simulations, in which case equations analogous to those below (equation 2.1 etc.) lead to G(p). In whatever form, the method relies on the fact that the derivatives of free energies may often be related to canonical averages that are easily measured by Boltzmann sampling, for example

\[ \left(\frac{\partial(\beta F)}{\partial\beta}\right)_V = \langle E \rangle_\beta \tag{2.1} \]

which implies

\[ \beta F(\beta) - \beta_0 F(\beta_0) = \int_{\beta_0}^{\beta} \langle E \rangle_{\beta'}\, d\beta' \tag{2.2} \]

To use this equation, we measure ⟨E⟩ by simulation at constant V for a series of β-values connecting β and β₀, closely spaced enough that the shape of ⟨E(β)⟩ is well determined. Then the integration is performed numerically, typically by using a 5- to 10-point Gauss-Legendre quadrature [61]. It is important that the path of integration should not pass through a first-order phase transition: if it does, then ⟨E(β)⟩ will itself be poorly determined because of metastability problems, and it will vary so fast with β that the determination of its shape will be difficult. As a result, the accuracy of the numerical integration will be drastically reduced.

β and β₀ are known, so F(β) is determined if we know F(β₀). The state at β₀ must therefore be chosen to be a reference state of known free energy. It usually corresponds to either a very high or a very low temperature. In the high-temperature limit, at β₀ = 0, we have Z = exp(-βF) = V^N for an N-particle fluid with a soft potential. If the system is a fluid with a hard core in its potential, then in this limit it reduces to the hard-sphere fluid, a system for which the free energy is known with good accuracy from analytic calculations [62]. At very low temperatures (high β₀), a solid with a smooth potential (to take a different example) approaches a harmonic solid. Examples of the use of equation 2.2 in calculation can be found in [63, 64, 65], which use a low-β reference state, and [66, 67, 68], which use a high-β one.

Another equation that is often used is

\[ \left(\frac{\partial F}{\partial V}\right)_T = -p \]

which leads to

\[ F(V) - F(V_0) = -\int_{V_0}^{V} p\, dV \tag{2.3} \]

and we measure p using (see [45])

\[ pV = N k_B T + \langle P \rangle \tag{2.4} \]

where

\[ P = -\frac{1}{3}\sum_i\sum_{j<i} r_{ij}\,\frac{\partial E(r_{ij})}{\partial r_{ij}} \]

This time the usual reference state is one of high V, for which the virial expansion may be used to obtain p(V₀) and thus F(V₀). Near the reference state equation 2.3 may be badly behaved; better numerical stability is obtained by using

\[ F(\rho) - F(\rho_0) = N\int_{\rho_0}^{\rho} \frac{p}{\rho'^2}\, d\rho' \tag{2.5} \]

where ρ is the density. TI based on equation 2.5 has been used so often in the literature that we shall not attempt to give an exhaustive list; two recent examples are [69, 70].
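The quadrature step of equation 2.2 is simple; in the sketch below (Python; measure_mean_energy is a hypothetical stand-in for a full Boltzmann sampling simulation returning ⟨E⟩ at a given β, and the node count is illustrative) each quadrature node corresponds to one simulation.

import numpy as np

def thermodynamic_integration(measure_mean_energy, beta0, beta1, n_points=10):
    # Estimate beta1*F(beta1) - beta0*F(beta0) = integral of <E> d(beta)
    # (equation 2.2) by n-point Gauss-Legendre quadrature.
    x, w = np.polynomial.legendre.leggauss(n_points)
    # map nodes from [-1, 1] to [beta0, beta1]
    betas = 0.5 * (beta1 - beta0) * x + 0.5 * (beta1 + beta0)
    weights = 0.5 * (beta1 - beta0) * w
    return sum(wi * measure_mean_energy(bi)
               for wi, bi in zip(weights, betas))

The reference free energy F(β₀) must then be supplied separately, as described above.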
Both the above are examples of what might be called `natural' TI; the integration path is of the type that might be followed in an experiment on a real system. However, a Monte Carlo (MC) simulation is more flexible than a laboratory calorimetry experiment, since the reference system may differ from the system of interest in the form of its configurational energy as well as in its control parameters, and indeed need not correspond to any `real' system at all. It is in fact common to use `artificial' methods where we take advantage of this greater freedom to change the details of the interaction between the particles. In this case we usually introduce a parameter λ which controls the change of the interaction, so that by varying it we can smoothly transform the system under investigation into the reference system for which the free energy is known exactly. If we write E = E(σ;λ), then from the definition of F

\[ \frac{\partial F(\lambda)}{\partial\lambda} = \frac{\int d\sigma\,[\partial E(\sigma;\lambda)/\partial\lambda]\exp[-\beta E(\sigma;\lambda)]}{\int d\sigma\,\exp[-\beta E(\sigma;\lambda)]} \tag{2.6} \]

\[ = \left\langle\frac{\partial E(\lambda)}{\partial\lambda}\right\rangle_\lambda \tag{2.7} \]

where ⟨·⟩_λ indicates that the canonical average is evaluated by a Boltzmann sampling simulation with configurational energy E(λ). We now have an equation of the form of equation 2.1; the desired free energy is obtained by casting it into the form of equation 2.2 or 2.3 and integrating, as in the `natural' examples.

A typical application is Frenkel and Ladd's method [71], designed for solid systems; we shall use a variant of this method in section 4.2. Here E(λ) = E₀ + λU, where U is the energy of an Einstein solid with its lattice sites at r_i^eq (the Einstein solid is a crystal composed of non-interacting point particles, each attached to its lattice site by a harmonic spring), so

\[ E(\lambda) = E_0 + \lambda\sum_{i=1}^{N}(r_i - r_i^{eq})^2 \]

In almost every application the extra interaction λU is added linearly, so that ⟨∂E(σ;λ)/∂λ⟩ is just ⟨U⟩. The desired free energy F(0) is found from

\[ F(\lambda_{max}) = F(0) + \int_0^{\lambda_{max}} \langle U \rangle_{\lambda'}\, d\lambda' \]

It is apparent that F(λ) is equal to F_ein, which is exactly known, only in the unmeasurable limit λ → ∞, but Frenkel and Ladd use a 1/λ series expansion of F(λ) - F_ein to correct their large-λ results. They carry out simulations on hard spheres to investigate the relative stabilities of the fcc and hcp structures, finding that fcc is marginally the more stable. Their results agree with those of [72].
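The λ-integration has the same structure as the quadrature above. A minimal sketch (Python; measure_mean_U is a hypothetical stand-in for a simulation with the springs switched on at strength λ returning ⟨U⟩_λ, F_einstein for the exactly known Einstein-solid free energy; the 1/λ correction mentioned above is omitted):

import numpy as np

def frenkel_ladd_free_energy(F_einstein, measure_mean_U, lam_max,
                             n_points=10):
    # F(0) = F(lam_max) - integral_0^lam_max <U>_lam d(lam), with
    # F(lam_max) approximated by the Einstein-solid free energy.
    x, w = np.polynomial.legendre.leggauss(n_points)
    lams = 0.5 * lam_max * (x + 1.0)
    weights = 0.5 * lam_max * w
    integral = sum(wi * measure_mean_U(li)
                   for wi, li in zip(weights, lams))
    return F_einstein - integral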
Another useful technique for measuring free energies of solids is the single-cell occupancy method of Hoover and Ree [72, 73]. Equation 2.5 is used for the integration and, as the essential component of the method, each particle is constrained to stay throughout within the Wigner-Seitz cell formed by its neighbours. This does not affect the solid phase, where diffusion of the particles is in any case prevented by their neighbours, but it stops the first-order melting transition that would otherwise occur when the density falls sufficiently. The reference system is thus a solid which is vastly expanded, so that the interaction of the particles is extremely small and the partition function can be calculated exactly (it is similar to the ideal gas, except that each particle has available to it only a fraction of the total volume). However, it should be mentioned that there is some evidence [66, 67] that a second-order or weak first-order phase transition does take place in spite of all efforts, and because of this extra computer time is needed in this region for equilibration and to capture the shape of ⟨E(V)⟩.

Advantages and Disadvantages

As we have said, a major advantage of the method is that it may require very little modification of an existing Boltzmann sampling routine, though an `artificial' TI method will usually need some alteration to insert the extra potential (λU above). The major disadvantage comes from the inability to handle phase transitions: when using TI in phase transition problems, we do not usually have even the option of measuring the free energy difference between the phases directly. The phases have to be treated separately, and each linked with its separate reference state.

Whether or not this causes any difficulty depends on the length of the integration path required, and on whether or not it is easy to find an integration path that does not cross a phase transition. It may be difficult to avoid such a path: we have seen that for solids this requires the use of a trick like the single-cell occupancy method, and the same problem may well arise in fluid problems, where integration from the dense fluid (liquid) phase to a very dilute state would cross the liquid-vapour line. One way to solve this problem would be to integrate around the critical point, though this seems not to have been tried for a simple fluid. In [74] the phase transition was prevented artificially by suppressing density fluctuations, but this clearly has the disadvantage that configurations with significant canonical weight may be suppressed.

The total time required by the method depends, then, on the number of simulation points required on the integration path, and so depends on the particular problem. If the path is long, or if F(V) has some regions where its higher derivatives are large, then many points will be needed, and if some of the simulation points have ergodic problems (i.e. if equilibration at constant V is a problem), then they will take a long time to obtain. The extreme example of this is a phase transition. Conversely, however, if F(V) is well behaved then TI will be very efficient and is likely to outperform most other methods listed here, not least because of the way it scales with system size. Many methods require the number of `simulation points' or the equivalent to increase at least as L^(d/2), because they require that adjacent simulation points have some likely configurations in common: the size of the `fluctuations' in a canonical ensemble increases as L^(d/2), while the size of the system (~ L^d) determines the separation of the reference state and the state of interest. With TI this is not so: the separation of simulation points depends only on the smoothness of the integrand and need not increase much with L at all (see section 2.1.3 for further comments). One example where a very smooth integrand has enabled a very large system to be investigated is [75], discussed further in section 2.1.4 below.

Estimating the error in measurements of free energy made by TI is not easy, and this must be counted a disadvantage of the method. An estimate can be obtained by the blocking procedure of section 1.2.2, or by looking at the finite-size scaling behaviour of the estimate (which is done in [71], leading to a claim of an accuracy of 0.1%). However, the total error thus obtained may be an underestimate of the true error, because it includes only the effect of random fluctuations; the effect of, for example, rapid variation in ⟨E(β)⟩ that is not well captured at the chosen spacing of simulation points is to introduce a systematic error which is not detectable simply by repeating the simulation with different random numbers. Errors of this kind are found in our investigations in section 3.3.2. In some systems ergodic problems may also affect the estimates of ⟨E⟩. It is also important, though time consuming, to equilibrate each simulation separately; it has been found [76] that failure to do this may cause hysteresis. A common way to reduce the amount of equilibration that must be done is to use the final configuration of one simulation as the starting configuration of the next.
In the case where TI is done in an ensemble in which the system has a constant order parameter, so the result is $F$ rather than $G$, it is easiest to find the coexistence curve by using the double-tangent construction, described in the context of a fluid simulation in appendix B (see also [47]). This avoids the necessity of mapping out the whole of $F$. The process is easier once one coexistence point has been found, because the Clausius-Clapeyron equation

\[
\left( \frac{{\rm d}p}{{\rm d}\beta} \right)_{\rm coex} = -\frac{\Delta H}{\beta\,\Delta V}
\]

may be used to predict how $p$ must change for a given change in $\beta$ to stay on the coexistence curve. Integrating the Clausius-Clapeyron equation using a more complex predictor-corrector method is known as Gibbs-Duhem integration [77].

2.1.2 Multistage Sampling

In this method (also known as the `overlapping distributions' method) the idea is to measure the free energy difference between two canonical ensembles, defined on the same configuration space but with different values of the field variables, by using the overlap of the p.d.f.'s of some macrostate. The method seems to have been used first by McDonald and Singer [78, 79]. Later implementations include Valleau and Card [80], and the method has in fact been fairly widely used [54, 64, 65]. To see how it works, consider distribution functions of the internal energy in two canonical ensembles at inverse temperatures $\beta_0$ and $\beta_1$. We have

\[
P_0^{\rm can}(E) = \Omega(E) \exp(-\beta_0 E)/Z_0
\]
and
\[
P_1^{\rm can}(E) = \Omega(E) \exp(-\beta_1 E)/Z_1 .
\]

Clearly, we can measure $P_1^{\rm can}$ and $P_0^{\rm can}$ from Boltzmann sampling MC simulation; let the estimators derived from histograms of visited states be $\tilde{P}_1^{\rm can}$ and $\tilde{P}_0^{\rm can}$. As we have said before, only the unknown value of $\Omega(E)$ prevents us from estimating $Z$. But we can eliminate it between the two equations:

\[
Z_1/Z_0 = [\tilde{P}_0^{\rm can}(E)/\tilde{P}_1^{\rm can}(E)] \exp(-(\beta_1 - \beta_0)E) .
\]

So we can now estimate $Z_1/Z_0 = \exp(\beta_0 F(\beta_0) - \beta_1 F(\beta_1))$, as long as the state $E$ is such that we obtain both $\tilde{P}_0^{\rm can}(E)$ and $\tilde{P}_1^{\rm can}(E)$; that is to say, as long as the measured probability distributions $\tilde{P}_1^{\rm can}$ and $\tilde{P}_0^{\rm can}$ overlap. If they do overlap, it obviously makes sense to use all the energy states in the overlap region, which gives us

\[
\beta_0 F(\beta_0) - \beta_1 F(\beta_1) = \ln \frac{\int_{\rm ov} \tilde{P}_0^{\rm can}(E) \exp(-(\beta_1 - \beta_0)E)\,{\rm d}E}{\int_{\rm ov} \tilde{P}_1^{\rm can}(E)\,{\rm d}E} \tag{2.8}
\]

If one of the states (at $\beta_1$, say) is a reference state of known free energy, equation 2.8 now gives us $F(\beta_0)$. A similar equation results from considering the p.d.f.'s of the magnetisation (or whatever the order parameter is) at different values of the field $H$, and using the overlap region to eliminate $\exp[-\beta F(\beta;M)]$ (compare the equations for $P_0^{\rm can}(E)$ and $P_1^{\rm can}(E)$ with equation 1.10). However, this simple implementation runs into trouble for more than small free energy differences between small systems, because the two measured p.d.f.'s will fail to overlap. This problem becomes more acute as the system size increases: as we discussed in section 1.1.3, the fraction of the possible energy states of the system which have a canonical probability significantly different from zero goes like $N^{-1/2}$, making overlap with the reference state harder to achieve. In [80] this problem is solved in a fairly obvious way by generating `bridging distributions': a series of simulations is performed using modified Boltzmann weighting $P_j(E) \propto \exp(-\beta_1 E/\gamma_j)$ for a set of coefficients $\{\gamma_j\}$ in the range $\beta_1/\beta_0 < \gamma_j < 1$ (i.e. effectively for a range of temperatures between $\beta_1$ and $\beta_0$).
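As a concrete illustration of equation 2.8, the following minimal sketch (ours, not from the cited papers) forms the single-stage estimate from raw energy histograms accumulated in the two simulations; the count threshold used to define the overlap region is an arbitrary illustrative choice. With bridging distributions, the same estimate is simply applied to each adjacent pair in turn, as described next:

\begin{verbatim}
import numpy as np

def multistage_delta(E_bins, h0, h1, beta0, beta1, n_min=10):
    """One stage of multistage sampling (eq. 2.8): combine raw energy
    histograms h0, h1 measured at beta0, beta1 over the bins in which
    both are adequately sampled, returning an estimate of
    beta0*F(beta0) - beta1*F(beta1)."""
    ov = (h0 > n_min) & (h1 > n_min)     # crude overlap criterion
    if not ov.any():
        raise ValueError("histograms do not overlap")
    p0 = h0 / h0.sum()                   # normalised estimators
    p1 = h1 / h1.sum()
    num = np.sum(p0[ov] * np.exp(-(beta1 - beta0) * E_bins[ov]))
    return np.log(num / np.sum(p1[ov]))
\end{verbatim}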
The set $\{\gamma_j\}$ is chosen so that adjacent distributions overlap and the `coldest' bridging distribution overlaps with the Boltzmann simulation at $\beta_0$ (say), while the `hottest' overlaps with the simulation at $\beta_1$. Equation 2.8 is applied repeatedly to eliminate all but $F(\beta_1)$ and $F(\beta_0)$ (though if the process is applied between $\beta_1$ and any intermediate temperature it can be used to find $F$ there too). This implementation is what gives the method the name `multistage sampling.' If the temperature is varied, as above, we commonly choose the reference state to be at infinite temperature, $\beta_1 = 0$, where the free energy is often known exactly, or to a good approximation, as described in section 2.1.1. In [80], Valleau and Card test the method on hard spheres with Coulomb forces and report results with a quoted 1% error in good agreement with results obtained by other methods. They also point out that the exponential bridging distributions used are not the most efficient, being sharply peaked, though of course they require hardly any modification of a normal Metropolis MC program to produce. It is interesting to note that, if the two distributions overlap almost entirely, then $\int_{\rm ov} \tilde{P}_1^{\rm can}(E)\,{\rm d}E \approx 1$ and we need sample only the ensemble at $\beta_0$, evaluating $\langle \exp[-(\beta_1 - \beta_0)E] \rangle_0$. This case corresponds to the evaluation of $\langle O \rangle$ for the `suitable operator' $O_1$ in figure 1.10. However, unless $(\beta_1 - \beta_0)$ is quite small, $\exp[-(\beta_1 - \beta_0)E]$ will be more like $O_2$ (from the same figure) and the estimator of $\langle O_2 \rangle = \langle \exp[-(\beta_1 - \beta_0)E] \rangle_0$ obtained will accordingly be bad, for the reasons described while discussing this figure. The normal multistage method thus offers a way of overcoming the problem of incompatibility of sampled distribution and operator (provided that the p.d.f.'s at $\beta_0$ and $\beta_1$ overlap to some extent), by sampling both ensembles. We can see intuitively that the estimator of $\langle O_2 \rangle$, which tends to be an underestimate, will be increased by dividing by $\int_{\rm ov} \tilde{P}_1^{\rm can}(E)\,{\rm d}E < 1$. A variation on this temperature-changing version of multistage sampling, called Monte Carlo recursion, has been developed by Li and Scheraga. They express the free energy difference as a sum of terms of the form $\ln \langle \exp[(\beta_i - \beta_{i+1})E] \rangle_i$ and use acceleration-of-convergence techniques to extrapolate it to $\beta = 0$. The method has been applied to the Lennard-Jones fluid with 32 particles [81], and to two models of liquid water [82]. In [82] a comparison of the method is made with TI and multistage sampling, and it is found that the efficiency is about the same; fairly good agreement with experiment is also obtained. A simpler version of the same technique, with temperature doubling at each stage, has recently been rediscovered [83]. The multistage method, as we have described it, is, like TI, only suitable for sampling along a path on which there are no first-order phase transitions, because at the phase transition (which, let us say, occurs at $\beta^*$) there is a discontinuous `jump' in $E$ which, if it is larger than the size of typical fluctuations, is likely to prevent overlap of the p.d.f.'s $P^{\rm can}(\beta^*_-, E)$ and $P^{\rm can}(\beta^*_+, E)$. Thus, like TI, it can only be used for finding the free energy of a single phase separately, not for measuring a free energy difference by sampling through the coexistence region. It is possible to overcome this problem by performing a series of simulations all at $\beta^*$, each with an artificial constraint on $E$ to keep it in some range of values narrow enough that all macrostates have appreciable Boltzmann probability, and overlapping with its neighbours.
Then by matching the probabilities in the overlap regions, it is possible to reconstruct the whole p.d.f. of $E$ across the transition region, though with the caveat that equilibration of configurations containing interfaces may be extremely slow. Some authors, however, would describe this kind of implementation as umbrella sampling (see below). The same problem arises with the p.d.f. of the order parameter, with $\beta^*$ replaced by the appropriate value of the external field, and it can be overcome, to a certain extent, in the same way.

Advantages and Disadvantages

The multistage sampling method has very similar advantages and disadvantages to TI, though it may be considered that in its concentration on probabilities it leads to free energies in a rather simpler and more transparent way. On the other hand, the method does demand overlap of the p.d.f.'s of adjacent simulations, which is not necessarily required for thermodynamic integration. Like TI, multistage sampling has the advantage that it requires little modification of existing Boltzmann sampling routines, but also like TI it cannot, in its simplest form, deal with first-order phase transitions on its path of integration. It is similarly vulnerable to poor estimation of the canonical p.d.f. caused by problems of slow equilibration or free energy barriers in any of the substages of the simulation, particularly those at low temperature. Whether the method gives us $F(\beta, V)$ or $G(\beta, p)$ depends on whether the order parameter is constant in the ensemble simulated or may vary. If it is constant then a double-tangent construction (appendix B) will be required to find the coexistence field, just as for TI.

2.1.3 The Acceptance Ratio Method

Introduced by Bennett in [84], this method has also been used by Hong et al. [85] and, in modified form, in [86] (see section 2.1.4 below). It is a similar method to multistage sampling, but extends it somewhat and addresses the problem of optimising the process of measuring $\Delta F$. We should also remark that [84] is a good general reference on the problem of measuring free energy; the author discusses what were the major methods of doing so when it was published, as well as contributing several new ideas. He confines himself to methods where the system of interest is connected with a reference system of known free energy, and, indeed, asserts that the free energy can in general only be found by using a reference state. In fact this is not the case (the methods of section 2.3 do not do so), but the vast majority of methods do use this technique. Bennett then goes on to discuss in detail the problem of finding the free energy difference between two canonical ensembles (where one is a reference system, this will clearly give the absolute free energy of the other). He also treats in detail the statistical problem of analysing the available MC data to extract the best estimate of the free energy, although he has to treat this problem using the assumption that the data points are uncorrelated. Let the two systems be denoted by suffixes 0 and 1. Then for any function $W(\sigma)$ of the coordinate variables we find

\[
\langle W \exp(-\beta E_0) \rangle_1 = \frac{1}{Z_1}\int W \exp(-\beta(E_0+E_1))\,{\rm d}\sigma,
\qquad
\langle W \exp(-\beta E_1) \rangle_0 = \frac{1}{Z_0}\int W \exp(-\beta(E_0+E_1))\,{\rm d}\sigma,
\]

so that

\[
\exp[-\beta(F_0 - F_1)] = \frac{Z_0}{Z_1} = \frac{\langle W \exp(-\beta E_0) \rangle_1}{\langle W \exp(-\beta E_1) \rangle_0} \tag{2.9}
\]

This is clearly reminiscent of equation 2.8. If we choose $W = \exp(\beta E_1)$, with $\beta E_1 = \beta_1 E$ and $\beta E_0 = \beta_0 E$ (so that the two energy functions represent the same system at two temperatures), then it reduces to the equation for a single stage of multistage sampling with almost complete overlap.
The acceptance ratio method itself is produced by the choice $W = \min\{\exp(\beta E_0), \exp(\beta E_1)\}$, in which case equation 2.9 becomes

\[
\frac{Z_0}{Z_1} = \frac{\langle M_e(\beta(E_0 - E_1)) \rangle_1}{\langle M_e(\beta(E_1 - E_0)) \rangle_0} \tag{2.10}
\]

where $M_e$ is the Metropolis function $M_e(x) = \min\{1, \exp(-x)\}$. So we see that we can estimate $Z_0/Z_1 = \exp[-\beta(F_0 - F_1)]$ by performing two simulations, and in each recording the average of the Metropolis function of the difference of the two energy functions. $\langle M_e(\beta(E_0 - E_1)) \rangle_1$ is the average of the probability for making a transition from ensemble 1 to ensemble 0, hence the name `acceptance ratio method.' However no transitions are in fact made: we sample each ensemble separately. This should be compared with the expanded ensemble described in section 2.2.3. Note also that the method is defined above for two systems with different potential energy functions but the same canonical variables $\{\sigma\}$. However, one system could actually have more variables than the other; the smaller system could be given dummy variables whose contribution could be factored out of the configurational integral. Bennett continues by making variational calculations (in which, however, the effect of correlations has to be ignored) to predict the conditions under which the method gives the results with the smallest variance. It proves necessary, if both $\langle W \exp(-\beta E_0) \rangle_1$ and $\langle W \exp(-\beta E_1) \rangle_0$ are to be measured accurately, that the two distributions of $E$ should overlap, and Bennett advocates shifting the origin of one of the potential functions to achieve this. The optimal amount of shift is just the free energy difference between the ensembles, which must be found iteratively. Note that it is not optimal to shift by $\langle E_1 \rangle_1 - \langle E_0 \rangle_0$ as might first seem to be the case, because a correction for the relative likelihood of other configurations in each ensemble is necessary (cf. section 2.2.3). Since transitions are not made between the 0 and 1 systems, it is also necessary to consider what fraction of the available time to spend in each. The result is that it is, in fact, almost always near-optimal to devote the same amount of time to each system. Finally, it can be shown that the choice for $W$ that minimises the variance of the estimate of the free energy difference is not $M_e(x)$ but the Fermi function, $f_{\rm fer}(x) = [1 + \exp(x)]^{-1}$, though the difference is small. If the distributions cannot be made to overlap, we must either generate bridging distributions as in multistage sampling, or try extrapolating them if they are smooth enough. Though it is not stated explicitly, the criterion for effective extrapolation seems to be the same one that determines the separation of the points in thermodynamic integration. Bennett's extrapolation will work well if the shape of $P^{\rm can}(E)$ is well described by its first few moments, which are related to $\langle E \rangle$ and ${\rm d}\langle E \rangle/{\rm d}\beta$, while TI over widely separated points also works if the higher derivatives of $F(\beta)$ are small, and these are related to the derivatives of $\langle E \rangle$ by equation 2.1. This seems to put overlapping-distribution methods back on an even footing with TI, though extrapolation is more complex to implement. The paper also contains a discussion of other methods: numerical integration, the `perturbation method' (which is what has since become known as the single-histogram method [87]), which is viewed as a limit of the acceptance ratio method where one ensemble is not sampled at all, and overlap methods like multistage sampling.
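As an illustration of equation 2.10, here is a minimal sketch (our own, and subject to the same uncorrelated-sample assumption that Bennett makes) of the two-run estimator. The inputs are assumed to be arrays of cross-energy differences evaluated on configurations stored from each of the two separate simulations:

\begin{verbatim}
import numpy as np

def metropolis_fn(x):
    """The Metropolis function Me(x) = min{1, exp(-x)}."""
    return np.minimum(1.0, np.exp(-x))

def acceptance_ratio_dF(dE_in_0, dE_in_1, beta):
    """Bennett's acceptance-ratio estimate (eq. 2.10) of F0 - F1.
    dE_in_0 = E1(sigma) - E0(sigma) over configurations of ensemble 0;
    dE_in_1 = E0(sigma) - E1(sigma) over configurations of ensemble 1."""
    num = np.mean(metropolis_fn(beta * dE_in_1))  # <Me(beta(E0-E1))>_1
    den = np.mean(metropolis_fn(beta * dE_in_0))  # <Me(beta(E1-E0))>_0
    # Z0/Z1 = num/den, so F0 - F1 = -(1/beta) ln(Z0/Z1)
    return -np.log(num / den) / beta
\end{verbatim}

Replacing metropolis_fn by the Fermi function $f_{\rm fer}(x) = [1+\exp(x)]^{-1}$ gives the minimum-variance variant mentioned above.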
In the course of this discussion Bennett also anticipates the multicanonical methods of Berg and Neuhaus (see section 2.2.2 and chapter 3), when he suggests the use of flat bridging distributions.

Advantages and Disadvantages

The acceptance ratio method and the other extensions and optimisations of the overlapping-distribution method that are considered here seem to improve it to an extent where it is again competitive with TI. However the analysis applies directly only to a two-stage process, and in a real application many stages will usually be necessary. To choose the spacing of the stages and optimise the method using Bennett's criteria is a complicated problem which would involve several `trial' runs, but fortunately it seems that the `efficiency maximum' is in practice quite wide and flat, and rough improvements (inserting an extra stage wherever the acceptance ratio cannot be measured properly, etc.) quickly give an answer very close to the optimum. Another problem which is not addressed in the paper is that around a first-order phase transition, ergodic problems may occur which cause the averages measured in each simulation separately to converge slowly. This problem is obscured in [84] because it is assumed throughout the analytic derivations that the configurations that are sampled in each simulation are uncorrelated. No reference is made to the possibility of reducing the correlation time itself. It seems that in some cases the ergodic problems might be overcome by actually performing the transitions whose probabilities are measured; this is the basis of the expanded ensemble method. This is a point we shall return to in the discussion section, and indeed often in later parts of the thesis.

2.1.4 Mon's Finite-Size Method

Mon's method [86] relies on MC sampling to find the difference in free energy density between a large and a small system. The free energy is then calculated for the small system by evaluating the partition function explicitly. The method (described as implemented for the 2d Ising model with $H = 0$, with an obvious generalisation to 3d) is this: consider the Ising model on a $2L \times 2L$ square lattice. Two different kinds of boundary conditions are considered: firstly, the normal periodic boundary conditions, for which the total energy function is $E_{2L}$, and secondly, boundary conditions which divide the $2L \times 2L$ lattice into four $L \times L$ lattices, each individually having periodic boundary conditions, for which the energy function (of the four lattices considered as one composite system) is $E_L$. It follows that

\[
\frac{Z_{E_L}}{Z_{E_{2L}}} = \frac{\sum_{\{\sigma\}} \exp(-\beta E_L)}{\sum_{\{\sigma\}} \exp(-\beta E_{2L})} \tag{2.11}
\]
\[
 = \frac{\sum_{\{\sigma\}} \exp(-\beta(E_L - E_{2L})) \exp(-\beta E_{2L})}{\sum_{\{\sigma\}} \exp(-\beta E_{2L})} \tag{2.12}
\]
\[
 = \langle \exp(-\beta(E_L - E_{2L})) \rangle_{2L} \tag{2.13}
\]

so that the difference in free energy density is

\[
g_{2L} - g_L = \frac{1}{\beta (2L)^d} \ln \langle \exp(-\beta(E_L - E_{2L})) \rangle_{2L} .
\]

The Gibbs free energy is found for the Ising model, since $M$ varies in the $NVT$-ensemble. The procedure can be iterated until $L$ is about 2 (in 3d) or 4 (in 2d), when $g_L$ can be found exactly; then so can $g$ for the larger lattices. Rather than attempt to measure the average of the exponential directly, Mon's method employs techniques derived from the acceptance ratio method.
For small systems both ensembles are simulated, with the transition probability between the two being measured (using the Fermi, not the Metropolis, function); it then follows that

\[
\frac{Z_{E_L}}{Z_{E_{2L}}} = \frac{\langle f_{\rm fer}(\beta(E_L - E_{2L})) \rangle_{2L}}{\langle f_{\rm fer}(\beta(E_{2L} - E_L)) \rangle_L}
\]

For large systems, especially in 3d, there is insufficient overlap between the ensembles even for this, and Mon uses multistage sampling, simulating in addition interpolating Hamiltonians $E(\lambda) = \lambda E_{2L} + (1 - \lambda)E_L$ for $\lambda$ in the range $[0 \ldots 1]$ and finding the free energy difference from

\[
\frac{Z_{E_L}}{Z_{E_{2L}}} = \frac{Z_{E_L}}{Z_{E(\lambda_1)}} \cdot \frac{Z_{E(\lambda_1)}}{Z_{E(\lambda_2)}} \cdots \frac{Z_{E(\lambda_n)}}{Z_{E_{2L}}}
\]

where each ratio on the RHS is found by measuring Fermi transition probabilities. Up to six stages were used in [86]. In this paper, $g_{2L} - g_L$ was measured for Ising models at $T_c$ for $L$ up to 12 (2d), 6 (3d simple cubic) and 5 (3d body-centred cubic). $g$ was obtained to about 0.2% accuracy with 50 000 MC passes per site for each Hamiltonian simulated. An investigation is also made of the predictions of finite-size scaling theory, that in $d$ dimensions

\[
g(T_c; H; L) = g_b + U_0(d) L^{-d} + O(L^{-d-1})
\]

(see appendix A). Since the method measures $g_{2L} - g_L$ directly, the contribution $g_b$ from the background analytic part of the free energy density is removed, and he was able to confirm the theory directly and measure the scaling amplitudes $U_0$ to a few percent. More recently, in [75], the same method has been applied to large Ising models on lattices up to $32^3$, and $U_0$ measured to extremely high accuracy. In this implementation, the interpolating ensembles $E(\lambda)$ are used, but with thermodynamic integration (in the version described by equation 2.6), rather than multistage sampling, used to find the free energy differences. This is the only case we have seen where the system size has been sufficiently large, and $\langle \partial E(\lambda)/\partial\lambda \rangle$ sufficiently smooth, that TI can be used with high accuracy in a situation where the energy histograms do not overlap. Sixteen TI stages were used for the $32^3$ system; many more would have been necessary with multistage sampling without Bennett's extrapolation.

Advantages and Disadvantages

This is another method that is most easily implemented for lattice models, and though it could also be applied to off-lattice systems this has not to our knowledge been attempted. Presumably, we would either consider subdividing the system until we reached one that contained only a single particle, or we would stop when the system became small enough for another method to be easily used to measure the free energy. However, especially in a dense fluid, the energy function of the large system considered divided up would most frequently be infinite, corresponding to the case where at least some particles overlap the `walls' partitioning the large system into the subsystems. This would drastically reduce the efficiency of the method; some kind of intermediate ensembles, with the walls put in gradually, would be required. The numerical effort involved depends on the dimensionality and nature of the problem. The idea of the method is attractive and the accuracy is high because it concentrates on measuring a correction term (the difference in free energy densities between the two systems) rather than the free energy directly.
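To make the two boundary conditions concrete, here is a minimal sketch (ours, for the 2d case, and using the direct exponential average of equation 2.13 rather than Mon's acceptance-ratio refinements) of the estimator of $g_{2L} - g_L$; samples is assumed to hold $\pm 1$ spin arrays drawn from the periodic $2L \times 2L$ ensemble:

\begin{verbatim}
import numpy as np

def energy_periodic(s, J=1.0):
    """Ising energy of configuration s (+/-1 entries) with periodic
    boundary conditions: each nearest-neighbour bond counted once."""
    return -J * (np.sum(s * np.roll(s, 1, axis=0)) +
                 np.sum(s * np.roll(s, 1, axis=1)))

def energy_blocks(s, L, J=1.0):
    """The same 2L x 2L configuration evaluated as four independent
    L x L lattices, each with its own periodic boundary conditions."""
    return sum(energy_periodic(s[i:i + L, j:j + L], J)
               for i in (0, L) for j in (0, L))

def mon_delta_g(samples, L, beta):
    """Estimate g_2L - g_L from configurations sampled in the
    2L-periodic ensemble (direct average; poor overlap in practice
    is what forces the multistage/Fermi-function machinery above)."""
    w = [np.exp(-beta * (energy_blocks(s, L) - energy_periodic(s)))
         for s in samples]
    return np.log(np.mean(w)) / (beta * (2 * L) ** 2)
\end{verbatim}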
While it is similar to other multistage sampling methods, we would expect that fewer intermediate stages would be necessary in this case because of the intelligent choice of the systems $E_L$ and $E_{2L}$: the difference between the two Hamiltonians will be small over most configurations, and will be caused mainly by the presence of the extra interface-like terms produced by evaluating the $E_L$ energy in the $E_{2L}$ ensemble. The size of these interfaces increases only as $L^{d-1}$, so this is also the scaling of the quantity in the exponent to be averaged. In a similar multistage technique with ensembles differing in temperature, for example, the analogous quantity would scale like $L^d$. The result is that fewer stages are needed for a particular system size than in a `normal' multistage method. It is also probably the best method for measuring the correction-to-scaling amplitude $U_0$.

2.1.5 Widom's Particle-Insertion Method

This method was introduced in [88]. Its goal is to measure the chemical potential $\mu$ ($= G/N$ for a system with only one kind of particle) by making trial insertions or removals of a particle. Consider the definition of $\mu$:

\[
\mu = \left( \frac{\partial F}{\partial N} \right)_{V,T}
\]

or, for a finite system,

\[
\mu = F(N+1) - F(N) = -\frac{1}{\beta} \ln \frac{Z(N+1)}{\Lambda^3 (N+1)\, Z(N)}
\]

Because the number of particles changes, it is necessary to include the effects of the kinetic energy, which produces the $\Lambda^3$ ($\Lambda$ is called the thermal wavelength: $\Lambda = h/\sqrt{2\pi M/\beta}$), and the indistinguishability of the particles, which produces the $(N+1) = (N+1)!/N!$. However we can remove the need to consider them explicitly by using $\mu_{\rm id} = \beta^{-1} \ln[(N+1)\Lambda^3/V]$ for the ideal gas, giving

\[
\mu_{\rm ex} = \mu - \mu_{\rm id} = -\frac{1}{\beta} \ln \frac{Z(N+1)}{V Z(N)} \tag{2.14}
\]
\[
 = -\frac{1}{\beta} \ln \frac{\int \exp[-\beta E(\sigma^N)] \exp[-\beta E'(\sigma', \sigma^N)] \,{\rm d}\sigma^N {\rm d}\sigma'}{V \int \exp[-\beta E(\sigma^N)] \,{\rm d}\sigma^N} \tag{2.15}
\]
\[
 = -\frac{1}{\beta} \ln \frac{1}{V} \int \langle \exp[-\beta E'(\sigma', \sigma^N)] \rangle_N \,{\rm d}\sigma' \tag{2.16}
\]

where we have cast the ratio of partition functions into the form of a canonical average (over an $N$-particle simulation) of the excess energy $E'(\sigma', \sigma^N)$ of the interaction between an $(N+1)$th particle, at $\sigma'$, and the other $N$. The procedure is thus to perform coordinate updates in a constant-$NVT$ simulation of $N$ particles as normal, but to alternate them with trial insertions of a `ghost' particle at randomly chosen locations, where we then evaluate $E'$ to estimate $\langle \exp(-\beta E') \rangle$. The `ghost' particle is not in fact inserted. A virtual trial removal of a particle can also give us $\mu$ in a similar way (see [89]). The method can be thought of in the framework of the acceptance ratio method: we are sampling the transition probability $\exp(-\beta E')$ between two canonical ensembles differing in the number of particles they contain. The expectation of the transition probability gives the free energy difference. If the transitions are actually made then we have grand canonical Monte Carlo.

Advantages and Disadvantages

The method works well for fluids at low density, but at high density the Boltzmann factor becomes very small, because almost any insertion move would result in the test particle overlapping some other, resulting in a very high energy. Once again the necessity of finding the average of an exponential reduces the efficiency. Similarly, removing a particle would leave a high-energy `cavity.' For solids the method does not work at all, because we cannot insert or remove a particle without disrupting the lattice. Various methods have been tried to improve the performance of the method at high densities.
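Before turning to those refinements, here is a minimal sketch (ours, not from [88]) of the basic insertion estimate of equation 2.16 for a pair potential. The arguments samples, box and pair_energy are hypothetical: stored $(N,3)$ coordinate arrays, the side of a cubic periodic box, and the pair potential as a function of squared separation:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def widom_mu_excess(samples, box, beta, pair_energy, n_trial=100):
    """Excess chemical potential by trial insertion of a ghost
    particle (eq. 2.16); the ghost is never actually inserted."""
    boltz = []
    for coords in samples:
        for _ in range(n_trial):
            ghost = rng.uniform(0.0, box, size=3)
            d = coords - ghost
            d -= box * np.rint(d / box)      # minimum-image convention
            dU = np.sum(pair_energy(np.sum(d * d, axis=1)))
            boltz.append(np.exp(-beta * dU))
    return -np.log(np.mean(boltz)) / beta
\end{verbatim}

The ghost insertions cost nothing beyond evaluating $E'$, which is why the method is so cheap at low density.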
Among these refinements, in [90] the inserted particle is moved around so that high-energy configurations are sampled better, while in [89] we actively search for the cavities where insertions remain possible. This approach is similar to the generalised form of umbrella sampling (see below), where certain configurations are generated preferentially.

2.1.6 Histogram Methods

Renewed interest in histogram methods was the result of papers by Ferrenberg and Swendsen [87, 91, 92]. They present their methods as a way of optimising the analysis of data obtained by conventional MC simulations, though it is also relevant to the free energy problem. Because the Boltzmann distribution has the same form at any temperature, we can use the MC estimate $\tilde{P}^{\rm can}_\beta(E)$ from a simulation done at temperature $\beta$ to estimate $\Omega(E)$, and so, by substituting in equation 1.7 and normalising, get

\[
\tilde{P}^{\rm can}_{\beta'}(E) = \frac{\tilde{P}^{\rm can}_\beta(E) \exp(-\Delta\beta E)}{\sum_{E_i} \tilde{P}^{\rm can}_\beta(E_i) \exp(-\Delta\beta E_i)} \tag{2.17}
\]

where $\Delta\beta = \beta' - \beta$. This is called histogram reweighting. Expectation values $\langle O \rangle$ follow as before, so we can calculate thermodynamic quantities away from the temperature of the simulation, but only slightly away from it, because the canonical distribution is very sharply peaked. As a result, once terms coming from the wings of the distribution become important the accuracy falls, just as in multistage sampling or the evaluation of an exponential average in section 1.2.3. In [87] the method is used to locate precisely the turning points in quantities which are functions of temperature, like the specific heat. Simulations are performed at a temperature near that of the specific-heat maximum (the exact temperature at which this maximum lies is unknown), then reweighted to obtain a much better estimate of the location of the maximum and the value of the specific heat there. If the whole p.d.f. at $\beta'$ can be accurately constructed, then the free energy difference $\beta G(\beta) - \beta' G(\beta')$ is also calculable (because to be able to find $P^{\rm can}(\beta', E)$ accurately is also what is required to find $\langle \exp(-\Delta\beta E) \rangle$). This use of the histogram method is thus the same as the `single-ensemble' versions of multistage sampling (section 2.1.2) that have already been mentioned. In a more recent paper [93], Rickman and Philpot have suggested that an analysis of the distribution function in terms of its cumulants provides an approximation which can be extrapolated with more confidence into the wings of the distribution. They show, using data from a simulation at one temperature, that various thermodynamic quantities (including free energy differences between the two temperatures) can be calculated by this method more accurately over a wider range of temperatures than by simple reweighting (this clearly connects with Bennett's extrapolation techniques). The method has been extended to be more useful in the context of free energy estimation in [91], where Ferrenberg and Swendsen extended it to `overlap' data from several simulations at different temperatures, obtaining iteratively soluble equations giving the partition function and its error at any temperature. The peaks in the error show where further simulations need to be done. This procedure is very similar to the overlapping done in multistage sampling, but the analysis showing where simulations should be done is new. A detailed discussion of the errors of estimators of free energies and other thermodynamic expectation values obtained by both single- and multiple-histogram methods can be found in [94].
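A minimal sketch of equation 2.17 (our own illustration, with $k_B = 1$ throughout) follows, together with the kind of scan by which [87] locates a specific-heat maximum; E_bins and hist are assumed to come from a canonical run at beta0:

\begin{verbatim}
import numpy as np

def reweight(E_bins, hist, beta0, beta1):
    """Single-histogram reweighting (eq. 2.17): estimate the canonical
    energy distribution at beta1 from a histogram measured at beta0.
    Reliable only while beta1 is close enough to beta0 that the wings
    of the measured histogram are well sampled."""
    w = hist * np.exp(-(beta1 - beta0) * E_bins)
    return w / w.sum()

def specific_heat_scan(E_bins, hist, beta0, betas):
    """C(beta) = beta^2 (<E^2> - <E>^2), scanned over nearby betas."""
    out = []
    for b in betas:
        p = reweight(E_bins, hist, beta0, b)
        m1 = np.sum(p * E_bins)
        m2 = np.sum(p * E_bins ** 2)
        out.append(b * b * (m2 - m1 * m1))
    return np.array(out)
\end{verbatim}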
Although histogram methods have become a standard technique as a result of the work of Ferrenberg and Swendsen, the idea of reweighting the histogram to give estimators of thermodynamic quantities at other temperatures was used by many previous authors (see the references earlier in this chapter). One early example is the work of McDonald and Singer in the late 1960s on the Lennard-Jones and 18-6 fluids ([78] and especially [79]). Histogram methods are also essential in many of the non-canonical methods of the next section. Indeed, the full power of the technique is perhaps best released by use of the multicanonical distribution. When histogram methods are applied to Boltzmann sampling simulations, the fundamental problem of the unsuitable shape of $P^{\rm can}(E)$ is still not solved, only alleviated.

2.2 Non-Canonical Methods

2.2.1 Umbrella Sampling

This name is used in the literature in several ways, to describe any or all of a number of different methods. In its most general sense it is the name given to what we have called `non-Boltzmann sampling.' As we saw in chapter 1, the Metropolis algorithm can be used to sample from any distribution. It can be shown (see section 3.1.1) that if we do sample from a non-Boltzmann distribution then canonical averages can be recovered by a suitable reweighting, provided that the chosen sampled distribution (some general $P(\sigma)$, not $P^{\rm can}(\sigma)$) puts appreciable weight in those states that dominate the canonical p.d.f. $P^{\rm can}(\sigma)$ at the temperature/field of interest. Now, as we have seen (section 1.2.3 and elsewhere), the canonical sampled distribution is incompatible with the estimation of the average of certain operators, in particular those that lead to the measurement of absolute free energies or free energy differences, because the configurations that dominate the maximum of $O(E)P^{\rm can}(E)$ (to take the example of energy macrostates) are generated with almost zero probability. If the sampled distribution is carefully chosen so that, as well as the states in $P^{\rm can}(E)$, it also puts weight in the states that dominate the maximum of $O(E)P^{\rm can}(E)$, then it can be used to overcome this incompatibility problem. This normally requires a sampled distribution that is wider over the $E$-macrostates than the canonical distribution, hence the whimsical name `umbrella sampling,' coined on the grounds that the wider sampled distribution is like an umbrella extending over $P^{\rm can}(E)$ and $OP^{\rm can}(E)$. Though it was not the first use of this technique, the seminal paper on this method seems to have been [95] (where the term `non-Boltzmann sampling' is also introduced). Some similar methods were employed earlier, for example [78] and especially [79], which used a kind of reweighting based on equation 3.5. These latter references also made the fundamental point that estimators produced by multiplying small histogram entries by large reweighting factors are no use in practice because of their high variance. Possibly because of the influence of [95], the name `umbrella sampling' is often used in a restricted sense where free energy measurement is the goal. However, as we have said, the name is also applied by some authors, for example Frenkel in [47], to any kind of non-Boltzmann sampling (in [6, chapter 6] the name `umbrella sampling' is applied to overlapping canonical p.d.f.'s with constraints, which we would call multistage sampling, but this usage seems to be rare). In this wider sense, the range of possible applications is almost endless: the method
can be used to generate any event that is rare in the canonical ensemble with a high enough frequency that its probability becomes measurable; for example, it has been used to investigate large fluctuations in an order parameter [45], and in [24] rough measurements of the free energy barriers to nucleation are made. If we think of these fluctuations as taking us across the free energy barrier between two phases, then we see that the problem of free energy measurement by direct connection of the two phases can also be approached this way: as we speculated in section 1.2.3, we could use a sampled distribution which has a higher-than-canonical probability of being in the two-phase region. Another possibility, used in fluid simulations, is to use a non-Boltzmann distribution parameterised by the distance of closest approach of the molecules. However, like umbrella sampling itself, these techniques are usually given their own names: the multicanonical ensemble and density scaling. They are both described separately below. A final point is that, although we have constrained the shape of an effective umbrella sampling distribution, we have not fixed it entirely. In fact, it seems to be an unresolved question which sampled distribution is the `best' for measuring a particular operator (in the sense that estimators produced from it have the lowest variance). The issue seems to have been addressed first by Fosdick [96] and more recently, in the context of the study of spin glasses, by Hesselbo and Stinchcombe [97], who recommend a distribution where $P(E) \propto [\int^E \Omega(E')\,{\rm d}E']^{-1}$. We shall return to this matter in section 3.4.

Advantages and Disadvantages

Umbrella sampling is a powerful method; once a good sampled distribution has been obtained we can reweight it to obtain not only free energies but a variety of canonical averages, for all values of the control parameters such that both $P^{\rm can}$ and $OP^{\rm can}$ put almost all their weight in the sampled region. Another very significant advantage is that any ergodicity problems in the low-temperature states may be largely overcome by the increased volume of phase space available to the system. It may, for example, move from states typical of a low temperature, where movement through configuration space is typically slow, up to states typical of a high temperature, where configurations change rapidly. Then it may `cool down' again and return to a region of configuration space far away from the one where it started. The whole process may take much less time than would be required to pass between the two regions by way of the low-temperature states alone. The most serious disadvantage of the method seems to be the difficulty of finding a suitable sampled distribution in the first place [47], the problem being, as we shall see in chapter 3, that some knowledge of the free energy that we are trying to measure is required. It has apparently been rare in the literature to achieve a sampled distribution more than a few times wider than the canonical, and so the method has frequently been combined with multistage/TI techniques and has mainly been used with small systems where the fractional fluctuations are larger. Indeed, in [46] it is stated that the umbrella sampling method cannot be applied to large systems because a suitable sampled distribution cannot be produced.
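Whatever width of sampled distribution is achievable, the reweighting step itself (treated formally in section 3.1.1) is straightforward. The following minimal sketch (ours) assumes the simulation stores, for each sample, an observable value, the energy, and the logarithm of the weight with which that configuration was sampled:

\begin{verbatim}
import numpy as np

def canonical_average(O_vals, E_vals, log_w, beta):
    """Recover a canonical average from a non-Boltzmann (umbrella)
    simulation: each sample is reweighted by exp(-beta*E)/w.  Valid
    only if the sampled distribution covers the canonically
    important states at this beta."""
    lr = -beta * E_vals - log_w
    lr -= lr.max()                  # guard against exp() overflow
    r = np.exp(lr)
    return np.sum(O_vals * r) / np.sum(r)
\end{verbatim}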
Various authors have, however, given methods for evolving a suitable sampled distribution of any shape (see [50] and references in section 2.2.2; see also [98], an early reference that seems to have gone largely unnoticed), and we have also addressed the problem at length in chapter 3.

2.2.2 Multicanonical Ensemble

This method is originally due to Berg and Neuhaus [58], whose work has prompted much recent interest and further study, some of which can be found reviewed in [99], [59] and [100]. The method is described in detail in chapter 3. It is really a rediscovery and reapplication of the ideas of non-Boltzmann sampling that were already present in [95]; a sampling distribution is generated so that the probability distribution of some chosen observable $X$ (typically $E$ or $M$) is roughly flat over some range of its macrostates (other workers have used sampled distributions with similar properties without calling them multicanonical; e.g. [24]). We shall characterise this distribution by its difference from the Boltzmann distribution, by using weights $\eta(\sigma)$ ($\eta(E(\sigma))$ or $\eta(M(\sigma))$, etc., as appropriate) so that the sampled distribution has measure $Y(\sigma) \propto \exp(-\beta E(\sigma)) \exp(\eta(\sigma))$. The results can then be reweighted (as outlined in section 2.2.1) to recover the desired results for the canonical distribution. In [58] the method was originally presented as a way of measuring interfacial free energy with high accuracy. However it has much wider applicability than this: it enables us to tackle the free energy problem both by the approach of simulating direct coexistence of the two phases and by measuring the absolute free energy of each phase separately. In their first paper [58], Berg and Neuhaus investigated the first-order phase transition of the 2d 10-state Potts model and measured the interfacial free energy $f_s$ with high accuracy. At the transition temperature (which is known exactly for the Potts model at $H = 0$, so there was no need to search for it), the canonical probability distribution of the internal energy is double-peaked, with one peak corresponding to a disordered phase and the other to the 10 equivalent ordered phases. The states in between, which correspond to mixed states containing interfaces, are exponentially suppressed (see section 1.1.3). Because the tunnelling times between the symmetry-broken phases in canonical Metropolis simulations are so long, a preweighting $\eta(E)$ is used to produce a sampled distribution which is roughly flat between the two peaks of $P^{\rm can}(E)$; the results are then reweighted to recover the canonical distribution. The accurate measurement of the probability of the mixed states leads to an estimate of the interfacial tension. With the probability of each energy (action) macrostate roughly constant, it was found that the ergodic time $\tau_{\rm rw}$, the time to tunnel between the symmetry-broken phases, scaled with system size as $\tau_{\rm rw} \sim L^{2.35d}$, comparable with the ideal $\tau_{\rm rw} \sim L^{2d}$ expected for a simple random walk, and an enormous improvement on the $\tau_{\rm rw} \sim \exp(2\beta L^{d-1} f_s)$ expected for a Boltzmann sampling simulation. Later applications and extensions of the method have included the following. Berg and Neuhaus, together with various co-workers, have measured interfacial tensions for several other systems by preweighting of the order parameter (magnetisation); they call this the multimagnetical ensemble, though we use `multicanonical' for all applications.
In a paper written with Hansmann [101] they simulate the 2d Ising model as a strict test of the method; these results also appear in [102], along with similar measurements for the 3d Ising model. Like the energy distribution of the 10-state Potts model, the magnetisation distribution of the critical Ising model has a double-peaked shape, and preweighting is used in a similar way to facilitate tunnelling and to measure the probability of the $M = 0$ states. A recent paper written with Billoire [103] performs similar measurements on the 10-state and 20-state 2d Potts models, and [104] presents data on the 2d 7-state Potts model and the 4d SU(3) lattice gauge theory. In all cases the interfacial tension is measured with much greater accuracy than had previously been achieved. The later papers contain further development of error analysis and comparison with finite-size scaling theories. Further work on lattice gauge theories has included the application of the method to the SU(2) [105] and SU(3) theories [106] and to QCD itself [107]. The multicanonical method has also been used in simulations of spin glasses (with energy preweighting), where particular advantage is taken of the algorithm's ability to move rapidly across the free-energy barriers that severely slow canonical simulations. Three papers investigate the Ising-like Edwards-Anderson spin glass. The first, by Berg and Celik [108], looks at the 2d model. The infinite-volume zero-temperature ground-state entropy and energy are estimated from finite-size data. They find that the problems of slowing-down with increasing system size are more severe than before, $\tau_{\rm rw} \sim L^{3.2d}$, though this is still much better than canonical simulations (exponential slowing down) and is better even than simulated annealing, for which $\tau_{\rm rw} \sim L^{4d}$. The others, by Berg, Celik and Hansmann [109, 110], look at the spin glass in 3d and present more complete results for energy density, entropy and heat capacity, evaluated at all temperatures (including $\beta = \infty$) by reweighting. They also show the order parameter distribution. Considering that energy, not the order parameter, was preweighted, it may seem strange that this distribution could be accurately obtained: however, the locations and heights of the peaks can be found because the multicanonical algorithm makes high-energy configurations accessible, where the order parameter has a single-peaked distribution, and then the system can `cool down' again into any of the modes of the order parameter (as was described in the `advantages and disadvantages' part of section 2.2.1). Thus, the system has the effective ability to tunnel through the free energy barriers separating the phases. The only significant difference is that, with energy preweighting, the order parameter distribution is not well determined in the regions of low canonical probability, and so no estimate of interfacial tension is obtained. Again, the simulation slows down with $\tau_{\rm rw} \sim L^{3.4d}$. In [111] the suitability of the algorithm is checked by applying it to the well-understood Van Hemmen spin glass. The use of the multicanonical algorithm to investigate spin glasses is also referred to in [112], a general paper on the multicanonical method and its application to multiple-minimum problems. Hansmann and Okamoto have applied the multicanonical method to the problem of protein folding. Like the spin glass, this is a multiple-minimum problem where ergodicity problems can prevent the location of the ground state.
In [113] they preweight the configurational energy of met-enkephalin, a simple protein consisting of 5 amino acids and containing (in the simplified model used) 19 continuously variable dihedral angles as parameters. They obtain the lowest-energy configuration, in agreement with results of simulated annealing, and by reweighting the measured probabilities of energies evaluate $\langle E \rangle$ and the heat capacity for a range of temperatures. This work is also referred to in [112]. In [114] they have found the lowest-energy states of three polypeptides, of length 10-20 residues, each containing only one type of amino acid. Rummukainen [115] has combined the multicanonical algorithm with microcanonical `demon' methods in a hybrid algorithm. By separating the demons, which form a heat bath, from the multicanonical part, he is able to apply fast cluster methods (and, potentially, parallelism) to them. This is advantageous because the changes of the preweighted variable in the multicanonical algorithm are inherently serial in nature (because $\eta$, being a function of a variable ($E$ or $M$) that depends on all the spins, couples together all the spins/particles of the system). To some extent this limitation is overcome by the use of the demons. The method is tested on the 2d 7-state Potts model, preweighting energy, and it is found that the ergodic time is much reduced below that of the simple multicanonical algorithm, $\tau_e \sim L^{1.8d}$. It is interesting to note that this is better even than `ideal' random-walk behaviour, and demonstrates the effect that algorithmic improvement can have. Interfacial tension is also measured, the results seemingly revealing inadequacies of the normal finite-size scaling ansatz. In work on multicanonical multigrid Monte Carlo [116, 117], Janke and Sauer combine the multicanonical ensemble with the multigrid method to reduce critical slowing down. They investigate the $\phi^4$ theory in both one and two dimensions, with particular reference to the autocorrelation time $\tau_O$ of observables in the multicanonical simulation, and its relation to the error bars of the estimators of canonical averages. As they point out, $\tau_O$ is not necessarily the same as $\tau_{\rm rw}$. They find that the behaviour depends on the nature of the potential barrier to be tunnelled through. If the barrier height does not depend on the system size, as is the case with the 1d $\phi^4$ theory, then the scaling of $\tau_O$ with $L$ is much the same for normal Metropolis and multicanonical Metropolis (though the multicanonical algorithm always has the lower $\tau_O$), and in both cases it is much reduced by using the multigrid method too. However, if, as will usually be the case in physically interesting dimensions, the barrier height increases with system size, the increase of $\tau_O$ with system size is much smaller for the multicanonical algorithms, and employing multigrid techniques too further reduces $\tau_O$ but keeps its scaling roughly constant, the opposite of what was found in 1d. They conclude that where the multigrid algorithm can be used, it should be used in combination with the multicanonical algorithm, since the two together can produce an order-of-magnitude improvement in performance over that obtainable with either alone. In a more recent paper [118], Janke and Kappler have combined the method with the Swendsen-Wang cluster algorithm (to produce what they call a `multibondic
cluster algorithm') and applied it to first-order phase transitions of the $q$-state Potts model, finding that the multicanonical autocorrelation time grows as the ideal $L^d$. We remark that Janke and Sauer's extension of error-propagation analysis to the multicanonical ensemble (a similar approach is adopted in [119]), so that the error bars of the estimators of canonical averages can be obtained from the correlation times of variables in the multicanonical simulations, produces expressions that are very complicated; in practice we favour the use of blocking. We present our own investigations of $\tau_O$ in section 3.4. Lee introduces what he calls `entropic sampling' in [120]; this is really no more than a multicanonical preweighting carried out on internal energy rather than magnetisation. However, he does give an algorithm for evolving the preweighting, and discretises the preweighting function at the level of the fundamental granularity of the problem, rather than making it piecewise linear within a series of bins, the choice of Berg and Neuhaus. He also avoids using their idea of an effective temperature, which we too feel is unnecessary. He presents some measurements on the 10-state Potts model and a small 3d Ising model, comparing numerical values for the coefficients in the high-temperature expansion of the partition function with exact results. Very recently, Lee, Novotny and Rikvold [19] have applied the multicanonical method and Markov chain theory to study the relaxation of metastable phases in the 2d Ising model. The multicanonical ensemble is applied as usual to achieve a flat sampled distribution over $M$; the observed matrix of transitions between macrostates is then used to predict first passage times and thus to study relaxation properties, in particular the binodal and spinodal decomposition of metastable phases. A modified algorithm is used to speed up the dynamics at high $|M|$. It turns out that the description of the relaxation in terms only of the global order parameter $M$ is surprisingly accurate. In [19] the macrostate transition matrix is not used in the generation of the multicanonical distribution, a technique that we have found to be useful (see section 3.2.3). In what is to our knowledge the only application to date to the problem of finding the phase diagram of an off-lattice system, Wilding in [121] has used the multicanonical technique to map out the coexistence curve of the 3d Lennard-Jones fluid. The liquid and vapour phases are connected directly by tunnelling across the interfacial region, in one of the clearest examples of the method's use in a free energy/phase coexistence problem. In [58, 102, 103, 108, 122] the problem of generating the parameters that produce the multicanonical distribution is addressed. In [58, 102, 103] finite-size scaling from a small simulation is used to produce the parameters of a larger one. In [108] an overlapping-distribution method rather like multistage sampling is used to give an initial estimate for the spin-glass problem, where finite-size scaling does not work. In [122] all the methods are reviewed and a slightly ad hoc method is given for combining the results of several previous refining stages to give the best estimate for the preweighting function.
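For concreteness, here is a minimal sketch (ours; the real machinery is the subject of chapter 3) of a single Metropolis update under the multicanonical measure $Y(\sigma) \propto \exp(-\beta E(\sigma) + \eta(E(\sigma)))$. The callables propose, energy and eta are assumed to be supplied, with eta already evolved by one of the iterative schemes cited above:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

def multicanonical_step(state, E, beta, eta, propose, energy):
    """One Metropolis update with sampled measure
    exp(-beta*E + eta(E)): the usual acceptance test acquires the
    extra preweighting term eta(E_new) - eta(E)."""
    trial = propose(state)
    E_new = energy(trial)
    d = -beta * (E_new - E) + eta(E_new) - eta(E)
    if d >= 0 or rng.random() < np.exp(d):
        return trial, E_new
    return state, E
\end{verbatim}

Because $\eta$ depends on the global macrostate, the acceptance test above is exactly what forbids the parallel updates discussed below.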
Advantages and Disadvantages

This method has the same key advantages and disadvantages as umbrella sampling, of which it is a special case; producing a good sampled distribution requires an iterative procedure, but once it is established a single simulation suffices to measure the quantity of interest, which in our case will be a free energy or free energy difference. The larger set of accessible states reduces error due to ergodic problems, which enables efficient simulation of systems like spin glasses, but means that the random walk time across the multicanonical region becomes long, since the multicanonical region is much wider than the peak of the sampled distribution of a Boltzmann sampling simulation. The effective autocorrelation time of an observable $O$, $\tau_O$, also becomes long, but it does not feed directly into the variance of estimators of canonical averages in the way that was described in appendix C, because of the effect of the reweighting process (see [116, 118]). We shall investigate the nature of multicanonical errors further in chapter 3. The ability to treat the two phases simultaneously in a single simulation (for lattice models and, in some circumstances, off-lattice models too) is also clearly an advantage, though we should point out that the direct linking of the two phases is not easily applicable to off-lattice systems where one phase is solid and the other fluid, because this would still produce severe ergodic problems due to the difficulty of `growing' crystals out of a fluid. Another potential disadvantage (which would also affect most umbrella sampling simulations) is the impossibility of performing parallel updates on different degrees of freedom if those updates would change the preweighted variable. In canonical simulations with short-range forces, particles or spins that do not interact either before or after the move may be updated in parallel. With the same forces but multicanonical sampling, only one particle or spin may be updated at a time, because the change in preweighting, which we must know to calculate the acceptance probability, depends on the global value of some macrostate variable and so would depend on whether or not parallel updates going on elsewhere in the system were accepted. As we have seen, some methods like [115] have already been developed to partially overcome this problem. We shall discuss it further in sections 3.2.5 and 3.4.1.

2.2.3 The Expanded Ensemble

This method, or methods similar to it, has been discovered independently several times. The way it is presented here follows the approach of Lyubartsev et al. [60]. In some ways it is a similar method to the multicanonical ensemble (section 2.2.2): the system is again encouraged to explore more of its possible energy states, and preweighting is again involved. We shall explain the connections more formally in section 3.4.1, though they will be obvious while reading this. First let us consider the temperature-expanded ensemble, which is the first version introduced in [60]. A system of charged hard spheres (the RPM electrolyte) is simulated, which can also make transitions between a number ($\sim 10$) of different temperatures $\{\beta_j\}$, so the total expanded ensemble partition function $Z$ is given by

\[
Z = \sum_j Z_j \exp(\alpha_j) \tag{2.18}
\]

where the $Z_j$ are the normal canonical partition functions at each temperature,

\[
Z_j = \sum_{\{\sigma\}} \exp[-\beta_j E(\sigma)] = \exp(-\beta_j G_j),
\]

the $G$'s are Gibbs free energies, and the coefficients $\{\alpha_j\}$ are chosen so that about the same time is spent in each $j$-state (we call the $j$-states `subensembles').
Transitions between the subensembles are implemented using the normal Metropolis algorithm: the same configuration is retained, and the move from ensemble $i$ to $j$ is accepted with probability $M_e(\Delta E_{\rm eff})$, where $\Delta E_{\rm eff} = (\beta_j - \beta_i)E(\sigma) - \alpha_j + \alpha_i$. The finding of a suitable set $\{\alpha_j\}$ is a non-trivial problem which must be approached iteratively, just like the evaluation of the preweighting coefficients of the multicanonical distribution. We expect to be at temperature $\beta_j$ with probability $P_j$ given by $P_j = Z_j \exp(\alpha_j)/Z$, so

\[
P_j/P_0 = \frac{Z_j \exp(\alpha_j)}{Z_0 \exp(\alpha_0)} = \exp(-(\beta_j G_j - \beta_0 G_0) + \alpha_j - \alpha_0) \tag{2.19}
\]

If we can arrange for the state labelled by zero to be a state of known free energy (a perfect gas, say, for $\beta_0 = 0$, or an Einstein solid), then we can find the absolute free energies. In the given references we estimate the $P_j$ from the histogram of visited states: $\tilde{P}_j = C_j / \sum_j C_j$. The coefficients $\{\alpha_j\}$ are required to stop the system spending almost all its time in the state of lowest $G$. The `ideal' form for the $\{\alpha_j\}$ is $\alpha_j = \beta_j G_j$, under which the probability of all subensembles is constant. We require $\alpha_j = \beta_j G_j + O(1)$ if their probabilities are to be accurately measured. In [60] known approximate values for $G$ are used to bootstrap the estimates of $\{\alpha_j\}$; the $G$'s are in general unknown (and are indeed the very quantities we seek). This should be compared with section 2.1.3. Although we have described the method as it was implemented for subensembles differing in temperature, it can easily be generalised to different values of other field variables (such as $H$) or to different forms of the interaction: in [60] itself, use of the temperature-expanded ensemble is only sufficient to transform the RPM electrolyte into the hard-sphere fluid (at $\beta = 0$), because the hard core in the potential remains. Therefore another expanded ensemble is used, in which the Hamiltonian is changed to move from hard spheres to a perfect gas via increasingly penetrable spheres. All the kinds of transformation of the energy function that have been applied in thermodynamic integration are also applicable here. In the investigation of the RPM electrolyte, Lyubartsev et al. quote an error of about 1% in the free energy and achieve good agreement with results of multistage sampling and theoretical calculations. This method is one of the few that can as easily be applied to off-lattice as to lattice-based systems. The same authors have extended the method to Molecular Dynamics simulation and applied it to the Lennard-Jones fluid and to a model of liquid water [123]. Marinari and Parisi [124] independently discovered the expanded ensemble, which they call `simulated tempering', and applied it to the random-field Ising model, which at low temperature has a p.d.f. of the order parameter with many modes separated by free energy barriers. They present the method as a development of the simulated annealing algorithm, in which the temperature is steadily reduced. Their concern is not with the absolute free energy of the system, or the free energy difference between `hot' and `cold' ensembles, but with the overcoming of ergodic problems in the lowest-temperature ensembles; the ability of the system to reach high temperatures, where it can travel easily across free energy barriers, means that, when it cools down again, it is likely to enter a different mode of the p.d.f. than the one it started from.
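To fix the mechanics common to all these implementations, here is a minimal sketch (ours) of the subensemble move at a fixed configuration, using the acceptance rule $M_e(\Delta E_{\rm eff})$ given above; betas and alphas hold the tabulated $\{\beta_j\}$ and the current estimate of $\{\alpha_j\}$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)

def tempering_move(j, E, betas, alphas):
    """Attempt a move j -> j +/- 1 in a temperature-expanded ensemble,
    keeping the configuration (of energy E) fixed.  Accepted with
    probability Me(dE_eff), where
    dE_eff = (b_new - b_old)*E - alpha_new + alpha_old."""
    k = j + rng.choice((-1, 1))
    if k < 0 or k >= len(betas):
        return j                     # reject moves off the chain ends
    d_eff = (betas[k] - betas[j]) * E - alphas[k] + alphas[j]
    if d_eff <= 0 or rng.random() < np.exp(-d_eff):
        return k
    return j
\end{verbatim}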
For Marinari and Parisi, the result is an effective $\tau_{\rm rw}$ for travelling between the modes of the low-temperature ensembles which is much less than in a single low-temperature simulation, even taking into account the extra effort in simulating the other subensembles. They measure $\langle E \rangle$ and $\langle M \rangle$ in the lowest-temperature ensemble and show how the rapid tunnelling is indeed facilitated. They compare simulated tempering with Metropolis and cluster-flipping algorithms: it far outperforms Metropolis and seems better even than the cluster algorithm, though data on $\tau_{\rm rw}$ comparable to that presented in the results of multicanonical simulations is not given. In [125], however, the tunnelling time between phases of opposite magnetisation is investigated for the 3d nearest-neighbour Ising model, and it is claimed that $\tau_{\rm rw} \sim L^{1.76(6)}$ with five temperature-states used for all system sizes, but differently spaced. Thus, while Lyubartsev's implementation corresponds to the measurement of an absolute free energy of a phase, Marinari and Parisi's shows how the method could also be used to connect two phases for the measurement of a free energy difference: we would use a variable order parameter $M$ within each subensemble and find coexistence by finding control parameters such that each mode of $P^{\rm can}(M)$ had equal weight in the particular subensemble that had the control parameters of interest. However, because the sampled distribution within each subensemble is unaltered, $P^{\rm can}(M)$ would only be measurably different from zero for both modes if $H$ were very close to $H_{\rm coex}$ (see equations 1.14 and 1.15). Other applications of the expanded ensemble/simulated tempering have included the following. The folding of simple models of proteins has been investigated in [126] and [127]. In the second paper, the global minima of polymer chains of 8-10 residues are investigated by two methods: one is a conventional temperature-expanded ensemble, while in the other the polymer sequence changes between ensembles. This latter approach gives some idea of the flexibility of the expanded ensemble. In this context we should also mention [128], where MC parameters are optimised during a run to achieve the fastest decorrelation in a protein-folding problem. The convergence process resembles the finding of $\{\alpha_j\}$ in an expanded ensemble simulation, though non-Boltzmann sampling is not used. The 3d Ising spin glass has been studied [129]. The method allows the efficient generation of a good sample of equilibrated configurations at temperatures below the glass transition. It is found that the predictions of replica symmetry breaking theory, rather than droplet theory, are supported. The method has been used for the measurement of the chemical potential (free energy) of a polymeric system. Used by Wilding et al. [130], an expanded ensemble approach is here combined with the particle-insertion method. The simple particle-insertion method fails because of the large size of the molecules, which results in a very small acceptance probability. Therefore a series of intermediate ensembles is introduced, in which the interaction of the `test' polymer molecule with the others is gradually switched on. The expanded ensemble technique is used to move between these ensembles. For long polymer chains, the method is more effective than the commonly-used configurational-bias Monte Carlo [131]. The method has been applied to the 2d Ising spin glass [132] and, very recently, to the U(1) lattice gauge theory [133], by Kerler and co-workers.
In the Ising case a temperature-expanded ensemble is used; for the gauge theory, a set of ensembles containing progressively more of a monopole term that transforms the transition from first to second order. We remark that in [130] and [133] issues relating to finding the coefficients $\{\eta\}$ are also investigated. The method of [130] uses a linear extrapolation technique, while [133] uses a method based on information in the observed transitions between subensembles. This resembles a method we developed independently, described in chapter 3, where we investigate the problem of finding suitable $\{\eta\}$'s. (Our investigations are performed using the multicanonical ensemble but would also apply to the expanded ensemble.)

One issue that is important in expanded ensemble calculations is the spacing of the subensembles. Let us consider the temperature expansion case as an example. Unlike the multicanonical ensemble, there is no `natural' granularity of the temperatures, so they must be chosen with a separation wide enough that the random walk time between the ends of the chain is not too long, but not so wide that the acceptance ratio $r_a$ falls excessively low. It is possible to make an approximate calculation of what the spacing should be (see [124]): we know $P(i \to j) = M_e(\Delta E_{eff})$ with $\Delta E_{eff} = (\beta_j - \beta_i)E(\sigma) - \eta_j + \eta_i$. Consider only transitions between adjacent states, so $j = i + 1$, and suppose we have the ideal set $\{\eta\}$. Then we may expand $\eta(\beta)$ as a Taylor series, which gives, writing $\delta = \beta_{i+1} - \beta_i$,

$$\eta_{i+1} = \eta_i + \frac{\partial \eta}{\partial \beta}\bigg|_i \delta + \frac{\partial^2 \eta}{\partial \beta^2}\bigg|_i \frac{\delta^2}{2} + O(\delta^3) = \eta_i + \langle E_i \rangle \delta - \frac{(C_H)_i}{2 k_B \beta_i^2}\,\delta^2 + O(\delta^3) \qquad (2.20)$$

where in the second line we have used $\eta_i = \beta_i G_i$ and expressed the derivatives of $\beta G$ in terms of canonical averages. Thus $\langle \Delta E_{eff} \rangle \approx \delta^2 (C_H)_i / 2 k_B \beta_i^2$. We demand $\langle \Delta E_{eff} \rangle = O(1)$ for a reasonable $r_a$, so we can use this expression to estimate a suitable $\delta$ (and thus $\beta_{i+1}$) given $\beta_i$ and a measurement of $(C_H)_i$, the heat capacity in the $i$th ensemble. Then equation 2.20 enables us to estimate $\eta_{i+1}$.

It is instructive to observe that, since $C_H \sim L^d$, we require $\delta = \beta_{i+1} - \beta_i \sim L^{-d/2}$. This is the expected size of the fractional fluctuations in the energy, and so we see that once again we require an `overlap' between adjacent subensembles: the same sort of scaling as is required for multistage sampling. However, use of the expanded ensemble, where configurations pass from one ensemble to the other, makes it clearer that what is really required is an overlap of the p.d.f.'s of the configurations of the adjacent subensembles. With a large difference in temperatures, the dominant configurations in one ensemble are not the dominant configurations in the other; indeed, they will scarcely have any weight at all. Attempting a transition in temperature without changing the configuration is liable to produce a `non-equilibrium' configuration in the new ensemble, which is then very unlikely to be accepted, just as, when making coordinate updates in the normal Metropolis algorithm, the configuration can be altered only slightly at a single step while maintaining a reasonable acceptance probability.
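As an illustration of this prescription, the following sketch builds a chain of inverse temperatures from measured heat capacities; `measure_heat_capacity` stands for a short exploratory run at the current temperature and is a hypothetical helper:

import math

def build_beta_chain(beta_0, beta_max, measure_heat_capacity, k_B=1.0, target=1.0):
    """Choose the subensemble spacing so that
    <dE_eff> ~ delta^2 (C_H)_i / (2 k_B beta_i^2) = target = O(1),
    following the expansion leading to equation 2.20."""
    betas = [beta_0]
    while betas[-1] < beta_max:
        C_H = measure_heat_capacity(betas[-1])
        delta = math.sqrt(2.0 * target * k_B * betas[-1]**2 / C_H)
        betas.append(betas[-1] + delta)
    return betas

Since $C_H \sim L^d$, the spacing produced this way automatically shrinks as $L^{-d/2}$, as required above.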
Methods Related to the Expanded Ensemble

Here we shall discuss two interesting methods which appear in the Statistics literature. They are similar enough to the Expanded Ensemble to be treated here, but also have important differences. They have not yet been applied to problems in physics, to our knowledge.

The first, due to Geyer and Thompson, is called Metropolis-Coupled Markov Chain Monte Carlo, $(MC)^3$ [50]. Transitions are made between subensembles defined just as for the expanded ensemble, but every one of the subensembles is active at a particular time, and the ensemble-changing moves consist of `swaps,' where the configuration $\sigma_i$ from ensemble $i$ (i.e. the ensemble with temperature $\beta_i$ or energy function $E_i$) is moved into ensemble $j$ and vice-versa. The swap is accepted with probability

$$r_a = \min\left(1, \frac{P_j(\sigma_i)\,P_i(\sigma_j)}{P_i(\sigma_i)\,P_j(\sigma_j)}\right)$$

With this choice, the stationary distribution $P_i$ of each ensemble is preserved. Note that we do not need coefficients $\eta$, because the method ensures that every ensemble is always active, so there is no problem with the simulation seeking out the ensemble where $G$ is the smallest and staying there. However, we are as usual constrained by the acceptance ratio $r_a$, which will become very small if the ensembles have a large free energy difference; indeed, $r_a$ has the form of the product of two expanded ensemble transition probabilities, and so will become unusably small rather faster. We should also note that we do not get absolute free energies directly, because the expanded partition function is now the product of the $Z$'s for each subensemble, not the sum. However, $(MC)^3$ does offer a way of moving rapidly between the modes of a multimodal probability distribution (Geyer presents it primarily as a way of doing this), and so the application allowing tunnelling between coexisting phases would still be possible.

In this respect the second method, `Tempered Transitions' [57], is similar: it too would give $\Delta G$ not directly, but by speeding up tunnelling between the modes of $P^{can}(M)$, where $M$ varies within a subensemble. However, here we seek to reduce the time taken for a random walk through the subensembles by forcing the system to follow a trajectory from the temperature of interest up to high temperatures and then back down, then performing a global accept/reject of the whole trajectory. However, it is found (at least in the trial problem that is investigated in [57]) that to achieve a good acceptance ratio we need $\sim N^2$ intermediate states, where the expanded ensemble requires only $N$, so that there is no reduction in time. Nevertheless, for some systems Tempered Transitions may be a better way of moving between the modes of $P^{can}(M)$, the details depending in a rather complex way on the shape of $P^{can}(M)$ in the intermediate ensembles.
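A sketch of the swap move: with Boltzmann subensembles $P_i(\sigma) \propto \exp(-\beta_i E(\sigma))$, the acceptance ratio above reduces to a single exponential. The bookkeeping of the full configurations is elided here, and only their energies are exchanged:

import math, random

def attempt_swap(E, betas, k):
    """Attempt an (MC)^3 swap between subensembles k and k+1.  E[k] is
    the energy of the configuration currently held by ensemble k; for
    Boltzmann subensembles the acceptance ratio r_a reduces to
    exp[(beta_{k+1} - beta_k)(E_{k+1} - E_k)]."""
    ln_ra = (betas[k + 1] - betas[k]) * (E[k + 1] - E[k])
    if ln_ra >= 0.0 or random.random() < math.exp(ln_ra):
        E[k], E[k + 1] = E[k + 1], E[k]   # exchange the two configurations
        return True
    return False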
Advantages and Disadvantages

Like the multicanonical ensemble, the expanded ensemble enables a wider exploration of configuration space, and gives access to free energies and free energy differences through a direct measurement of probability. It can be used either to measure the free energy of each phase separately, or to improve the exploration of configuration space at low temperatures by giving the simulation the ability to connect to high-temperature states where the p.d.f. of the order parameter no longer has a double peak (or, in the case of spin glasses, a multiple peak). This enables the simulation to bypass the potential barriers due to interfaces between phases, though searching in the space of the other control parameters will be required to ensure that the high-temperature phase does not always connect to only one of the low-temperature phases.

We would remark on the clear similarities between this method and the acceptance ratio method [84, 85] (see section 2.1.3). By recording $\langle M_e(\Delta E) \rangle$ rather than a histogram of visited states, the variance of the estimator of subensemble probability is likely to be reduced; the acceptance ratio method records not only whether a transition would be accepted or rejected but also, in effect, how much it would be rejected by. However, because no transitions are actually made, we lose the ability to connect with high-temperature subensembles to speed up the decorrelation of the system.

Just as for multicanonical sampling, the expansion of the accessible volume of configuration space leads to a long random walk time for its exploration. However, the analysis is somewhat simpler in this case, since (as we saw in equation 2.19) the quantities we are interested in here are expressed directly as the ratio of the measured probabilities of being at the two ends of the chain of subensembles. In section 3.4 we shall investigate how the length of the random walk affects the error of the method, addressing in particular an argument that accuracy can be improved by subdividing the chain of subensembles [130, 24].

The question of parallel updating once again arises, but here it is generally less of a restriction. Just as in section 2.2.2 it was found that, for the multicanonical ensemble, those updates that affect the preweighted variable must be carried out serially, so here the updates that change the subensemble must be carried out serially. However, in the expanded ensemble they could not be carried out any other way, since the energy function or temperature is naturally a global property. Within a particular subensemble, we may perform whatever parallel updates of the particles' coordinates would be possible in a Boltzmann sampling simulation.

2.2.4 Valleau's Density-Scaling Monte Carlo

This is another application of a non-Boltzmann sampling technique; it enables the accurate estimation of canonical averages of fluid systems over a wide range of a variable, in this case density, by sampling in a single simulation all relevant parts of configuration space. Free energy differences between the average canonical densities within the sampled range are also obtained. The method is described in [134]. It is argued that a good way of parameterising the sampled distribution (at least for a hard sphere fluid) is to choose the measure $Y(\sigma) = Y(s_{nn}(\sigma))$, where $s_{nn}$ is the distance between the pair of particles closest to each other in the simulation, i.e. $s_{nn} = \min_{ij}(s_{ij})$. Here $s^N$ are the reduced position vectors, $s_i = r_i/L$, where $r^N$ are the `real' positions and $L$ is the length of the side of the box. In the canonical ensemble we do not expect $r_{nn}$ to depend much on the density $\rho$, because the short-range repulsive forces are very strong and increase rapidly as the interparticle distance decreases; therefore $s_{nn} \sim \rho^{1/3}$. Suppose we are interested in the difference in free energy between ensembles at densities $\rho_1$ and $\rho_2$. By selecting a suitable form for $Y$, we can make sure that $s_{nn}$ covers the range from $s_{nn,1}$ to $s_{nn,2}$, where $s_{nn,1}$ is a typical value in the canonical ensemble at $\rho_1$ and similarly for $s_{nn,2}$ at $\rho_2$. We therefore look for $P^{DS}(s_{nn}) \approx$ constant over the range of interest. If $\rho_1$ and $\rho_2$ are representative of different phases, then the method offers a way to connect the phases directly, like multicanonical sampling with a variable order parameter. Also as in multicanonical sampling, canonical averages can be recovered by reweighting.
As well as the hard sphere fluid, the method is applied to the primitive model Coulombic fluid, which has a spherical hard core plus Coulomb forces; there are equal numbers of $+1$ and $-1$ charged ions. A slightly different sampled distribution is used here: in order to sample those configurations which are still important when weighted by $\exp(-\beta E(s^N))$, a distribution $Y(s^N) = w(s_{nn})\,\phi(s^N)$ is used, where $\phi$ is a function chosen to ensure that an appropriate range of energies is sampled.

For the hard sphere fluid, results gathered from only 4 DS simulations are presented that cover the range $0.4 < \rho\sigma^3 < 0.9$ (with $\sigma$ the hard-core diameter). Excess free energy, excess pressure and also the pair correlation function $g_{12}(r)$ are measured and found to be in very close agreement with analytic results. For the reduced primitive fluid, extensive results are given for both 1:1 and 2:2 electrolytes, covering excess free energy, internal energy, excess osmotic coefficient, mean ionic activity coefficient, and like and unlike pair correlation functions. The ranges of density are $0.05 < \rho\sigma^3 < 0.31$ for 1:1 and $0.02 < \rho\sigma^3 < 0.30$ for 2:2, both covered by 3 DS simulations. The number of particles is $N = 108$ throughout. Results match those of other canonical/grand canonical simulations, and seem better at high densities, where the latter can suffer ergodic problems. In a second paper [135], the `Coulombic Phase Transition', the low-temperature, low-density phase separation of a fluid of charged spheres, is studied.

Advantages and Disadvantages

Though it is not presented as such, this is a similar method to multicanonical sampling; $Y(s^N) = w(s_{nn})\,\phi(s^N)$ is used, and it seems that a sampled distribution of the form $Y^{DS}(s^N) = \exp(-\beta E(s^N))\exp(\eta(s_{nn}))$ could well be appropriate, i.e. a multicanonical distribution weighted not in $\rho$ but in $s_{nn}$. This shows the main innovation of the method to be the use of $s_{nn}$, rather than $\rho$ (or $V$) itself, as the parameter that controls the non-Boltzmann weight of a configuration and thus extends the range of sampled densities. It is not clear what the relative merits of the two approaches are, though one feels that there is useful physical insight in Valleau's identification of hard-core overlaps as a crucial factor controlling the variation of the density. Probably the most serious disadvantage of the method as presented is that Valleau gives no systematic procedure for evolving $Y^{DS}$, but uses physically motivated fitting functions containing $s_{nn}$, the hard core radius, $L$ and some fit parameters. From a series of short initial runs, these can be set to give a suitable sampled distribution, at least for the fairly small systems studied here. Aside from this, advantages and disadvantages seem to be as for the general umbrella sampling/multicanonical methods.

2.2.5 The Dynamical Ensemble

This method was introduced by Gerling and Huller in [136]. The system of interest (an $L^2$ Potts model in [136]) is regarded as being coupled not to an infinite heat bath, as is the case with the canonical ensemble, but to a finite bath (they consider a 2d ideal gas of $N$ particles, where $N = L^2$) whose density of states function is known. The combined system has constant energy $E_{TOT}$, so the probability that the Potts model has energy $E_p$ is

$$P^{dyn}(E_p) \propto \Omega_p(E_p)\,\Omega_{bath}(E_{TOT} - E_p) \qquad (2.21)$$
$$\propto \Omega_p(E_p)\,(E_{TOT} - E_p)^{(N-2)/2} \qquad (2.22)$$

Because $\Omega_{bath}$ is known, the bath need not be simulated directly; its effect can be included by sampling from a non-Boltzmann distribution of energy: the Metropolis algorithm can be used with
$$\frac{P^{dyn}(\sigma_1 \to \sigma_2)}{P^{dyn}(\sigma_2 \to \sigma_1)} = \left[\frac{E_{TOT} - E_p(\sigma_2)}{E_{TOT} - E_p(\sigma_1)}\right]^{(N-2)/2}$$

Thus it falls into the non-Boltzmann sampling framework, though its physical motivation is clearer when it is described as above. As for the canonical ensemble, only a small volume of phase space is sampled at a time, and so it is necessary to do several simulations at different values of $E_{TOT}$, measuring $\langle E_p \rangle^{dyn}$. Then, using all of these, the entropy $s(E_p)$ is fitted by least squares to

$$\langle E_p \rangle^{dyn} = \frac{\int E_p\, P^{dyn}(E_p)\, dE_p}{\int P^{dyn}(E_p)\, dE_p}$$

with

$$P^{dyn}(E_p) \propto \exp[N s(E_p)]\,(E_{TOT} - E_p)^{(N-2)/2}$$

After this, canonical averages and $Z$ may be found easily. Gerling and Huller make measurements on the 4-, 5- and 8-state Potts models near the transition point, finding that they can discern the change from a continuous to a first-order transition much more easily than by conventional methods. In [137] the method is applied to the $10^3$ 3d Ising model and the critical exponents are measured with fair accuracy even from this small system.

Advantages and Disadvantages

It is not immediately apparent from the above why the dynamical ensemble is better than the canonical; however, its sampled distribution does confer important advantages. Firstly, and most significantly, while $P^{can}(E)$ is doubly peaked at a first order phase transition, $P^{dyn}(E)$ remains singly peaked, removing the difficulty of tunnelling. Secondly, finite-size effects are much smaller, because the heat bath and the system of interest have similar sizes. Thirdly, we find canonical quantities ($\langle E \rangle^{can}$ and $Z$) in this method by a kind of Laplace transform of the fitted $s(E)$. This is in fact extremely numerically stable, so that good accuracy is obtained even when there are appreciable errors in $s(E)$. The reverse Laplace transform, which would give $s(E)$ (or $\Omega(E)$) from $\langle E \rangle^{can}$, is of course extremely unstable, which is another manifestation of the fact that free energy measurements require information about $\Omega(E)$ for a wider range of $E$'s than can be obtained from a Boltzmann sampling simulation at a single temperature. The only disadvantage of the method seems to be the necessity of doing a series of simulations; it is not limited to lattice-based systems. The method does, then, seem very attractive, though it has only very recently appeared and so has not been widely applied. It is not clear that its major advantage, the removal of the free energy barrier between the phases, would necessarily remain if it were applied to other systems (such as off-lattice systems); if it did, the method would certainly be extremely useful.
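A minimal sketch of the corresponding Metropolis test (the proposal of the spin update itself is omitted):

import random

def accept_dynamical(E1, E2, E_tot, N):
    """Dynamical-ensemble accept/reject: a move taking the system's
    energy from E1 to E2 is accepted with probability
    min(1, ((E_tot - E2)/(E_tot - E1))**((N - 2)/2)),
    the ratio of bath densities of states given above."""
    if E2 >= E_tot:
        return False          # the finite bath cannot supply this much energy
    ratio = ((E_tot - E2) / (E_tot - E1)) ** (0.5 * (N - 2))
    return ratio >= 1.0 or random.random() < ratio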
2.2.6 Grand Canonical Monte-Carlo

In a sense, this method is really a kind of Boltzmann sampling, but it is most convenient to treat it in this section, as the varying number of particles gives it special properties. It was introduced in [138], and is applicable to off-lattice fluid-like systems. Like the constant-$NpT$ ensemble, it allows the density of the system to vary; but this is achieved by varying the number of particles in the simulation box, $N$, rather than the box's volume. This is an attractive method because the Gibbs free energy becomes one of the input parameters ($\mu = G/N$ for particles of a single type) and so the need to measure it is removed. It is necessary to measure the pressure $p$, which may be done using equation 2.4. The partition function simulated is

$$Z_{\mu VT} = \sum_{N=1}^{\infty} (N!)^{-1} \lambda^{-3N} \exp(\beta\mu N) \int_V \exp(-\beta E(r^N))\, dr^N$$

where $\lambda$ is the thermal de Broglie wavelength, which we do by making the usual particle moves and also trying particle insertions/deletions; see [47, 48] or [45, Chapter 4] for a derivation of the required expressions for acceptance of these moves. As in Widom's method, the ideal gas equation of state may be used here, an approach that was first adopted in [139].

For the correct value of $\mu$ for the temperature of the simulation, the method should allow direct simulation of coexistence (the system should move back and forth between densities characteristic of the two coexisting phases). However, because the sampling is still from the Boltzmann distribution, albeit in the Grand Canonical Ensemble this time, the problem of the low probability of the interfacial states remains, and effectively prevents tunnelling between the two phases, just as we described for Boltzmann sampling MC with a variable order parameter in section 1.2.3. In practice, the method would be used more easily to treat each phase separately. The problem this time would be the equalisation of pressures between the two phases, rather than the equalisation of free energies. We would simulate one phase for a variety of densities by varying $\mu$ and measuring $p$ and $\rho$, then repeat the whole thing for the other phase, for the same set of values of $\mu$, looking for the value of $\mu$ at which the pressures equalise. Once a single coexistence point has been found, it would probably be easier to use Gibbs-Duhem integration, as we explained in section 2.1.1.

Advantages and Disadvantages

The most serious disadvantage is clearly that the problem of the interfacial region is not overcome; tunnelling between the two phases is still suppressed, so the two phases cannot be simulated together. However, given this, the method has the attraction that the need to measure free energy is replaced by the simpler problem (for potentials without singularities) of measuring the pressure. The use of the method is limited only by the density of the system: for dense fluids acceptance of particle insertions/removals becomes very low because of the likelihood of hard-core overlaps, and in the solid phase the method fails altogether, because the insertion or removal of a particle disrupts the crystal lattice. The $NpT$-ensemble is better for the simulation of solids. We should also note that finite-size effects are large (though well understood; see [140], where the method was applied to a critical 2d Lennard-Jones fluid) and the method seems to be unusually sensitive to the quality of the random numbers used. There is a clear similarity between this method and Widom's, which measures the acceptance ratio for a single particle insertion without actually performing it. Both methods work well for the same kinds of system (fluids of low to medium density). It should be noted that care must be taken to avoid a situation where the acceptance ratio of particle insertions/deletions appears to be high, but in fact a particle is simply being removed, leaving a vacancy, then replaced in the same vacancy. This is best avoided by making equilibration moves between the particle insertions/deletions.
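The standard acceptance probabilities for the insertion and deletion moves (derived in [47, 48]) take the following form; as a sketch, with the activity written $z = \exp(\beta\mu)/\lambda^3$ and the trial energy changes supplied by model-specific code:

import math, random

def accept_insertion(dE, N, V, z, beta):
    """Accept adding particle N+1 with probability
    min(1, z V / (N + 1) * exp(-beta dE))."""
    return random.random() < min(1.0, z * V / (N + 1) * math.exp(-beta * dE))

def accept_deletion(dE, N, V, z, beta):
    """Accept removing one of N particles with probability
    min(1, N / (z V) * exp(-beta dE))."""
    return N > 0 and random.random() < min(1.0, N / (z * V) * math.exp(-beta * dE))

The hard-core-overlap problem mentioned below shows up here directly: at high density almost every trial insertion has a very large dE, so the acceptance probability collapses.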
2.2.7 The Gibbs Ensemble

This method was introduced only recently [141, 142, 143, 144] but has already gained a great deal of popularity (see [145] and references therein). Its use is restricted to phase equilibria of fluid systems, but given this restriction it has the great advantage that both $p$ and $\mu$ are guaranteed to be the same in the two phases, so that a coexistence point on the phase diagram is found immediately, without the need to search laboriously for the values of the control parameters that produce equilibrium, as is the case with TI and most other methods.

In the Gibbs ensemble two phases are simulated simultaneously. They do not coexist in the same simulation volume, but are still in a sense kept in thermodynamic contact. This avoids the necessity of creating an interface between them, with the concomitant ergodic problems of crossing it. The way this is achieved is as follows: two simulation `boxes' are used, of volumes $V_1$ and $V_2$, and volume-changing MC moves of both boxes are made as for constant-$NpT$ simulation, but the total volume $V = V_1 + V_2$ is constrained to be constant. This ensures equality of pressure in the two phases. Similarly, the number of particles in each volume is not constant, but $N = N_1 + N_2$ is constant, so we move particles from one simulation box to the other. This ensures equality of chemical potential. As well as this we move particles around in each simulation box in normal MC fashion. The statistical mechanics of the system and the accept/reject procedure for the various kinds of moves are described in [144] and [48]. Equilibration normally causes one of the simulation boxes to come to contain the dense phase, and the other the rare phase, and the pressures and chemical potentials of the two boxes equalise. Note that although pressure and chemical potential are the same in the two phases, we do not immediately know what they are, because the net $pV$ and $\mu N$ terms for a volume-change or particle-swap move are zero for the two boxes together. They must be measured separately, the pressure by using equation 2.4 and $\mu$ by a version of Widom's method. See [144] for details.

Advantages and Disadvantages

The method is attractive because it is very easy to investigate the coexistence curve once the initial, rather complex, coding has been done. The method becomes difficult to apply in two regions. One is at the critical point, where the `dense phase' and `rare phase' change identities regularly and (more seriously) one cannot apply the finite-size scaling theory necessary to correct the results of Monte-Carlo studies of critical phenomena [140]. The other is at low temperatures, where one phase is very dense and the other is very rare. As well as requiring a large total volume, this makes the particle-swapping moves into and out of the dense phase difficult. We should mention that, as with Grand Canonical Ensemble simulations, it is easy to be misled into thinking that a good acceptance ratio of particle-swapping moves is being achieved when in fact the same particle is being moved back and forth between the same vacant sites. Nevertheless, the Gibbs ensemble is extremely attractive for most non-critical fluid-fluid phase equilibrium problems, for which it seems to have made previous methods obsolete.
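For orientation, the acceptance probabilities for the two characteristic Gibbs-ensemble moves take the following form (a sketch of the standard expressions of [144, 48]; dE1 and dE2 denote the energy changes in the two boxes, and the caller must ensure the trial volumes remain positive):

import math

def p_accept_transfer(N1, V1, N2, V2, dE1, dE2, beta):
    """Move one particle from box 1 to box 2; this move enforces
    equality of chemical potential between the boxes."""
    return min(1.0, (N1 * V2) / ((N2 + 1) * V1) * math.exp(-beta * (dE1 + dE2)))

def p_accept_volume(N1, V1, N2, V2, dV, dE1, dE2, beta):
    """Transfer volume dV from box 2 to box 1 at fixed V1 + V2; this
    move enforces equality of pressure."""
    return min(1.0, math.exp(-beta * (dE1 + dE2)
                             + N1 * math.log((V1 + dV) / V1)
                             + N2 * math.log((V2 - dV) / V2)))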
2.3 Other Methods

2.3.1 Coincidence Counting

This method was developed by Ma [146], and provides a way of estimating the density of states $\Omega(E)$ for all states with appreciable $P^{can}(E)$. To do this, we generate a set of $N_c$ configurations, of which $N(E)$ have energy $E$, and measure $n_x(E)$, the number of coincidences: the number of times we get the same microstate more than once.

Now suppose we generate one more configuration. If the probability of another coincidence is $P_E(hit)$, then clearly

$$\langle n_x(E) \rangle_{N(E)+1} = (n_x(E) + 1)\,P_E(hit) + n_x(E)\,P_E(miss)$$

If we may assume that the set of $N(E)$ points is randomly distributed among the $\Omega(E)$ microstates (which implies that we must not sample the trajectory at intervals shorter than the relaxation time), then $P_E(hit) = N(E)/\Omega(E)$ and $P_E(miss) = 1 - N(E)/\Omega(E)$. Substituting these we find

$$\langle n_x(E) \rangle_{N(E)+1} = n_x(E) + N(E)/\Omega(E)$$

and, averaging over values of $n_x(E)$ (the angle brackets now denote an average over all $N(E)$ steps, not just the previous one),

$$\langle n_x(E) \rangle_{N(E)+1} = \langle n_x(E) \rangle_{N(E)} + N(E)/\Omega(E)$$

Taking the ansatz

$$\langle n_x(E) \rangle_{N(E)} = \frac{N(E)(N(E) - 1)}{2\,\Omega(E)} \qquad (2.23)$$

we proceed by induction:

$$\langle n_x(E) \rangle_{N(E)+1} = \frac{N(E)(N(E) - 1)}{2\,\Omega(E)} + \frac{N(E)}{\Omega(E)} = \frac{(N(E) + 1)\,N(E)}{2\,\Omega(E)}$$

which has the correct form. It remains only to check the case $N(E) = 2$, for which we indeed find

$$\langle n_x(E) \rangle = 1 \cdot \frac{1}{\Omega(E)} + 0 \cdot \frac{\Omega(E) - 1}{\Omega(E)} = \frac{1}{\Omega(E)} = \frac{N(E)(N(E) - 1)}{2\,\Omega(E)}\bigg|_{N(E)=2}$$

completing the proof. To apply the method, we use our measured $n_x(E)$ to approximate $\langle n_x(E) \rangle$, and then equation 2.23 gives $\Omega(E)$. In [146] the estimates of $\Omega(E)$ are used to estimate the total entropy $S(\beta)$; it is straightforward to show from $G = \langle E \rangle - TS$ (with no external field), and the definitions of $G$ and $\langle E \rangle$ in equations 1.4 and 1.6 et seq., that this is given by

$$S(\beta) = -k_B \sum_E P^{can}(E) \ln\left(P^{can}(E)/\Omega(E)\right) \qquad (2.24)$$

In [146] the method was demonstrated for a small 1D Ising model, and it has since been applied to a lattice model of an entangled polymer [147].

Advantages and Disadvantages

To get a good estimate of $\Omega(E)$ we need $n_x(E) > 1$, implying $N(E) > \sqrt{\Omega(E)}$; however, $\Omega(E)$ grows so fast with increasing system size that this requirement soon becomes impossible to satisfy: $\Omega(E) \sim \exp(a L^d s(E)/k_B)$, so the required $N(E) \sim \exp(a L^d s(E)/2k_B)$. (Nevertheless, note that this is a much less stringent criterion than $N(E) > \Omega(E)$, which would be necessary if we attempted to find $\Omega(E)$ by measuring directly the probability of a particular microstate. In that case $N(E)_{direct} \sim \exp(a L^d s(E)/k_B)$: exponential growth at twice the rate.) Although it is ultimately limited by the exponential growth of $N(E)$, Ma's method may be good enough to enable us to measure the entropy of subsystems of the simulation that are large enough to be independent (i.e. their size exceeds the correlation length $\xi$), so that we can get the total entropy by combining the subentropies in a simple way: $S \approx \sum_A S_A + \sum_{A,B} S_{AB}$, where $S_{AB} = S_{A+B} - S_A - S_B$. It is clearly most suitable for systems where the increase in entropy with system size is comparatively slow (i.e. the prefactor $a$ in the exponential is quite small). The entangled polymer studied in [147] falls into this category, because the entanglements reduce the accessible volume of phase space.
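A sketch of the bookkeeping, assuming the sampled microstates are hashable (e.g. spin configurations stored as tuples) and are recorded at intervals longer than the relaxation time:

from collections import defaultdict

def estimate_omega(samples):
    """Coincidence counting: samples is a list of (E, microstate) pairs.
    n_x(E) counts coinciding pairs of samples at energy E; inverting
    equation 2.23 then gives Omega(E) ~ N(E)(N(E) - 1) / (2 n_x(E))."""
    N = defaultdict(int)       # N(E)
    seen = defaultdict(int)    # occurrences of each (E, microstate)
    n_x = defaultdict(int)     # coincidence counts
    for E, state in samples:
        n_x[E] += seen[(E, state)]   # the k-th copy of a state adds k-1 coincidences
        seen[(E, state)] += 1
        N[E] += 1
    return {E: N[E] * (N[E] - 1) / (2.0 * n_x[E]) for E in n_x if n_x[E] > 0}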
2.3.2 Local States Methods

First used by Meirovich [148], these have been developed further by Schlijper et al. [149]. We measure a local entropy $S_{lo}(L)$, defined as in eqn. 2.24 but with the sum taken over small sublattices (clusters) of linear size $L$ embedded in a larger simulation, typically $64 \times 64$ in 2d. $L$ is normally chosen to be 2, 3 or 4; increasing it improves accuracy but increases the computer time and memory required exponentially. Since the clusters are small, we can use eqn. 2.24 directly. The bulk entropy density $s$ is found by extrapolation of $L^{-d} S_{lo}(L)$ using the cluster variation method (CVM) [150]. The accuracy of the method depends on the rate of convergence of $L^{-d} S_{lo}(L)$ as $L$ increases; it is thus worst near the critical point. Meirovich calculated $S$ for the fcc Ising antiferromagnet; compared with integration [151], local states seems appreciably better in high magnetic field but shows no improvement in low field. Schlijper et al. have improved the method by supplementing the CVM calculations with a Markovian calculation of $s$ based on the difference of the cluster entropies of two stepped $(d-1)$-dimensional clusters, one having one more site at the step than the other. This second calculation provides an upper bound on $s$, whereas the CVM gives a lower bound; Schlijper et al. therefore take the average of the two as their best estimate of $s$, and half the difference as a rigorous bound on the intrinsic error. They present results for the 2d and 3d Ising models and the 3d 3-state Potts model and claim impressive accuracy.

Advantages and Disadvantages

Because of its reliance on the CVM for extrapolation, the method is applicable only to a restricted class of models: lattice models with translational invariance. This rules out large classes of interesting systems (e.g. fluids, spin glasses). The CVM is also not a transparent method, and requires considerable mathematical effort to apply. Having said this, high accuracy is available where the method can be applied, and the existence of both upper and lower bounds on the entropy is attractive.

2.3.3 Rickman and Philpot's Methods

In this method [152] the problem of measuring the free energy of a solid is tackled using the idea of measuring the probability of a single microstate, in this case the ground state, in which all the particles are fixed on their lattice sites. This probability is far too small to measure, so instead MC simulation is used to measure the probability $\langle \chi(\epsilon) \rangle$ that all the particles are inside spheres of radius $\epsilon$ about their lattice sites, and then to extrapolate to find $\langle \chi(0) \rangle$ by fitting to the $\langle \chi(\epsilon) \rangle$ data a function of the form $\chi_{fit} = [(2/\sqrt{\pi})\,\gamma(3/2, \epsilon^2)]^N$, where $N$ is the number of particles and $\gamma(x, y)$ the incomplete gamma function. They give results for the Lennard-Jones solid and claim an accuracy of 0.3%, although the agreement with results of integration is only 1%. Their method is attractive, but is rendered dubious by the very complex form of $\chi_{fit}$ (although they do have physical grounds for choosing such a fitting function).

More recently Rickman, working with Srolovitz, has devised another method [153] which also falls into this category. It works only for lattice models with discrete configuration spaces, and is described and tested for the 2d Ising model. The idea is to estimate $\Omega(E)$ by defining some large subset $\Omega'(E)$ which can be easily identified and enumerated by fairly simple combinatoric methods: for example, those configurations which consist of $n$ isolated spins in a `sea' of spins pointing in the opposite direction (so these have an energy $E = 8n$ with coupling $J = 1$). We then run a normal Boltzmann sampling algorithm and test whether or not each configuration falls into the set $\Omega'(E)$. The fraction that do, $x_0(E)$, gives us an estimate of $\Omega(E)$ from $x_0(E) = \Omega'(E)/\Omega(E)$. From a single $\Omega(E)$, the partition function for the temperature of simulation can be estimated; from a set of them, $Z(\beta)$ for a range of temperatures can be found. The method gives very high accuracy (0.02% from $10^8$ configurations) for the admittedly unarduous task of finding $G(\beta, H = 0)$ for the $6 \times 6$ Ising model. Lattices up to $10 \times 10$ are studied; the method is limited by the tendency of $x_0$ to become immeasurably small for any suitable set that is both enumerable and easily identified. Extension to other lattice models or to $F(M)$ is fairly obvious.
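A sketch of the measurement, with `in_subset` standing for the (hypothetical) combinatorially enumerable test, e.g. `all overturned spins are isolated':

from collections import defaultdict

def subset_fractions(configs, in_subset):
    """Estimate x0(E) for the Rickman-Srolovitz method from an ordinary
    Boltzmann sampling run: configs is a sequence of (E, sigma) pairs.
    Omega(E) is then estimated as Omega'(E) / x0(E), with Omega'(E)
    known by direct enumeration."""
    hits, total = defaultdict(int), defaultdict(int)
    for E, sigma in configs:
        total[E] += 1
        if in_subset(sigma):
            hits[E] += 1
    return {E: hits[E] / total[E] for E in total if hits[E] > 0}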
2.3.4 The Partitioning Method of Bhanot et al.

This method [154], like multistage sampling, concentrates on finding the density of states function $\Omega(E)$ and uses overlapping distributions to do so. We begin from a `simple sampling' MC algorithm (see section 1.2.2), which generates configurations with a p.d.f. proportional to $\Omega(E)$. This time, however, we partition the allowed energy spectrum into narrow, overlapping blocks. In one MC run we start the simulation in one of the blocks and constrain the energy to remain within it by rejecting all MC moves that would take us outside. The block is `narrow' in the sense that all energies within it appear with appreciable probability in the MC run, but it must also be wide enough that we can in principle reach any configuration in the block from any other. We then repeat the process for all the blocks. Within each block we have obtained the relative probabilities of the different energies, and because the blocks overlap we can combine results of adjacent blocks and normalise, eventually obtaining the absolute probability of any energy in an unconstrained simulation. Knowledge of $\Omega(E_0)$ or $\Omega_{TOT}$ then gives us the density of states function, and $F$ follows (as do all other thermodynamic quantities). There is a cumulative error in overlapping large numbers of probability distributions, but nevertheless Carter and Bhanot obtain high accuracy, comparable with any other method, in their simulations. The 3d Ising model is treated in reference [154], and the method has had several other applications, for example to the phase transition in the finite-temperature $SU(2)$ lattice gauge theory [155]. We note that this method is somewhat difficult to classify; we have put it with the `direct' methods because $\Omega(E)$ is measured directly, but we have already said that it resembles multistage sampling, while the use of constraints and the absence of Boltzmann weighting would justify classifying it with the non-Boltzmann methods.

Advantages and Disadvantages

The method is applicable to off-lattice systems and is simple conceptually and algorithmically: because we do not need to calculate Boltzmann factors, there is a substantial increase in the speed of a single update compared with multistage sampling. However, the necessity of choosing narrow blocks means that we deliberately worsen the ergodic problems with metastable states that we know occur around phase transitions. This is not a problem for the Ising model, for which it can be shown that ergodicity is guaranteed as long as the blocks contain at least four energy macrostates, but it might be expected to be so for more physically realistic models.
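A sketch of a single constrained run (with `propose_move` a hypothetical helper returning a trial configuration and its energy change):

from collections import defaultdict

def run_block(sigma, E, E_min, E_max, n_steps, propose_move):
    """Simple sampling within one energy block of the partitioning
    method: every trial move is accepted unless it would take the
    energy outside [E_min, E_max].  The visit histogram C(E) is then
    proportional to Omega(E) within the block; overlapping blocks are
    matched afterwards to fix the relative normalisations."""
    C = defaultdict(int)
    for _ in range(n_steps):
        trial_sigma, dE = propose_move(sigma)
        if E_min <= E + dE <= E_max:
            sigma, E = trial_sigma, E + dE
        C[E] += 1
    return C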
2.4 Discussion

We can see that a great many MC methods of measuring free energy have been tried. Figure 2.1 below is an attempt to group conceptually related methods, to simplify the task of seeing the connections between them. This figure requires some explanation, because even those methods that we have classified differently frequently have points of resemblance to one another. In fact a single classification is impossible, because the measurement of free energies and the finding of phase boundaries is a process which involves several different issues, not a single one, and one method may resemble a second method in the way it tackles one issue and a third in the way it tackles another. Realising this enables us to analyse and classify the methods more easily, and also to envisage possible improved methods which might combine the strong points of different approaches.

[Figure 2.1 appears here: a diagram grouping the methods under three headings: direct measurement of the probability of a state, integration-perturbation methods, and non-Boltzmann methods.]

Figure 2.1. A possible grouping of methods of measuring free energy. The three main groupings reflect the three groups into which methods have been classified in this chapter, while the dotted boxes show connections between methods in different groups; within the dotted boxes, the most closely related methods are arranged side by side.

First it is convenient to deal with those methods, discussed mostly in section 2.3, that stand out from the mainstream in that they attempt to tackle the problem of measuring a free energy in a direct way, using the fact that if we could measure the probability of the appearance of a single microstate, or of a group of degenerate microstates whose degeneracy we knew, we would know the partition function by $P^{can}(\sigma) = Z^{-1} \exp[-\beta E(\sigma)]$. These include Ma's coincidence counting method [146], the local states methods [148, 149], and Rickman and Philpot's method for estimating the probability of the ground state [152]. Because of the exponentially large number of microstates, these methods usually rely on ancillary extrapolation procedures, and in any case do not work well for large systems. However, we should mention that, at least for lattice models, the multicanonical method can be applied in a way that puts it in this group: we can measure the probability of a single macrostate in the multicanonical ensemble directly, without the need to extrapolate, and then calculate from this the corresponding probability in the canonical ensemble. If, therefore, we choose a macrostate that consists of only one or two microstates (for example, the ground state of the Ising model), then $Z$ can be calculated. We describe this variant of the method in more detail in section 3.1.2.

Now let us turn to the remainder of the methods, which are attempts to deal with one or both of the problems identified in section 1.2.3. The most obvious way to categorise them is by way of the algorithm employed (by which we really mean the sampled distribution here). First there are those methods that employ simply the $NVT$ (or sometimes $NpT$) canonical ensemble in largely unmodified form: thermodynamic integration, multistage sampling and the acceptance ratio method. These are put in the central `box' of figure 2.1. Because of the unsuitability of the Boltzmann distribution for direct free energy calculations, these methods rely on using several independent Boltzmann sampling simulations, which between them cover all the necessary states in configuration space.
These methods are most naturally applied to each phase separately, connecting the state of interest to a state of known free energy. They do not handle the direct connection of the two phases well, because the Boltzmann distribution for states in the interfacial region is slow to equilibrate and extremely sensitive to minute variations in the control parameters that push the system from one phase to another. Under some circumstances ergodic problems may also affect the estimates of $\langle E(\beta) \rangle$ in those simulations that are run at low temperatures, or of $\langle p(V) \rangle$ (see equation 2.4) in those simulations that are run at high density.

The differences between these methods themselves lie less in the nature of the Monte Carlo process (one could imagine them producing and using the same Markov chain of configurations) than in what they measure from the configurations generated and in the estimators of the free energies that they extract from the measurements. In TI a canonical average is measured for each stage, and then integrated to give a free energy. In multistage sampling a p.d.f. is measured and overlapped between the stages (or extrapolated and overlapped, if Bennett's extension is being used), while in the acceptance ratio method the transition probability between the ensembles of each stage is measured. This classification by estimators is the second way that methods of free energy measurement can be classified, and is to some extent separate from the classification by algorithm, though the choice of algorithm usually has implications for a sensible choice of estimators: we would not necessarily choose the spacing (in temperature or whatever) of the series of canonical simulations in the same way for TI and multistage sampling. All the methods are fairly easy to implement (TI perhaps the easiest) and for all of them the behaviour of Monte-Carlo errors is similar: the total error is obtained by combining estimated errors from within each ensemble, as described in section 2.1.1. We can also include Widom's particle insertion method and Mon's method in this group. Widom's method is like the acceptance ratio method between two canonical ensembles differing by one in the number of particles present, while Mon's method is like multistage sampling, with an unusual but intelligent choice of `difference' between the canonical ensembles which means, firstly, that the exponential whose average must be taken grows only slowly with system size, and secondly, that the finite-size corrections to the limiting behaviour of the free energy are obtained directly.

To continue with the classification by algorithm, the second group, put in the right-hand `box' of figure 2.1, can broadly be called non-canonical methods. Unlike the methods of the previous group, they allow direct connection of the two phases. In Grand Canonical Monte Carlo ($\mu VT$ ensemble) it is unnecessary to measure $\mu$, since it is an input, but the method does suffer from the presence of the interface, necessitating a search in $\mu$ to obtain equal values of $p$, which must be measured, in both phases. The Gibbs ensemble elegantly avoids the problem of sampling through the interface region and, as we have said, is probably the method of choice where it is applicable. At least for lattice models, the shape of the p.d.f. of the dynamical ensemble may also offer a way round the problem of low sampling probability in the interface region. And the two methods that will most concern us in this thesis also both fall into this group: the expanded ensemble and the multicanonical ensemble.
By expanding the volume of configuration space accessible to the simulation, they both enable the determination of free energies in a single simulation. They may be applied either to connect two phases directly, or to find the absolute free energy of each. The free energies are obtained from MC estimates of the probabilities of the macrostates/subensembles, reweighted to correct for the non-Boltzmann sampling. There is an obvious similarity between these two methods, and we can in fact put them both into the same framework, though we shall wait until section 3.4.1 to do this. These methods are relatively good at overcoming ergodic problems because they do not consist of a series of Boltzmann simulations, each confined to a narrow range of macrostates; they have the ability to reach macrostates of high energy or low density, where decorrelation of configurations is rapid. Of course, ergodic problems are to some extent an inevitable feature of Monte-Carlo simulation; one can never be sure that equilibration is complete and that there is not some `hidden' region of phase space that has not yet been found. Nevertheless, the situation here is obviously better than in the case of TI and the other methods of the first group. The total error of the process is also obtained more simply than for the multistage methods; the standard blocking methods that work for a single canonical ensemble can be used.

Though we have classified them above as using both different algorithms and different estimators, there is an obvious kinship between some of the `multistage' methods and some of the `non-Boltzmann' methods, as we have shown in the figure by connecting them with a dotted rectangle. In particular, the acceptance ratio method bears an obvious similarity to the expanded ensemble in the case where each of the `subensembles' of the latter corresponds to a separate stage of the acceptance ratio method: the difference is that whereas we merely measure the transition probability of a trial move in the acceptance ratio method, we actually perform the transition in the expanded ensemble. The issue is slightly obscured because the estimator of the probability of the `subensembles' is different in the two cases as well: $\langle M_e(\Delta E) \rangle$ in one case and the histogram of visits to the subensembles in the other. In the same way, grand canonical Monte-Carlo and the particle insertion method are related: the first method actually performs the transitions whose probability is recorded in the second.

The partial separation we have effected between the algorithm used and the estimators of free energy defined on the configurations produced enables one not only to understand more clearly the plethora of methods in the literature, but also to think of combinations of the two that have not previously been tried. For example, we could employ the expanded ensemble method, making transitions between the subensembles, but record $\langle M_e(\Delta E) \rangle$ and use that as the estimator of the probability of a state. Indeed, it is to be expected from Bennett's analysis [84] that the variance of this estimator would be slightly lower, because it contains information on how much each transition is accepted or rejected by; this information is discarded if only the histogram is used. Thus, by this method we would keep the advantages of using the expanded ensemble (reduction of ergodic problems) but combine them with some of the virtues of the multistage methods.
Or we could envisage measuring $\langle E \rangle^{can}$ in a multicanonical simulation, then evaluating free energies by integrating as in TI. This combination will be tried in section 3.3.1.

We should point out that almost all the methods we have discussed are united in the need to face the problem of exploring, in some way, a large volume of configuration space, whether they tackle it by doing one simulation or a series: whereas all configurations within a single phase are in a sense `similar,' this is not the case for the configurations in two different phases, or for the configurations in one of the phases of interest and those characteristic of the high-temperature/low-density limit which can serve as a reference system. Now, the Metropolis algorithm permits only a small change in configuration at each update step if the acceptance ratio is to remain reasonably high (in the expanded ensemble context this corresponds to a small change in temperature/energy function). Therefore, to sample both phases or to connect to the reference state, the configuration must be changed completely in kind by the accumulation of small perturbations. It is this that makes the free energy problem especially, and to some extent inescapably, demanding. Even though this transformation of the configuration is not explicitly carried out in methods like multistage sampling, the requirement of overlap of distributions means that an equivalent pathway must be opened up. This way of looking at the problem makes it clear why Mon's finite-size scaling method gives good results, at least for lattice models: the configurational difference between the system with energy functions $E_{2L}$ and $E_L$ is comparatively small. It also explains why TI may, in the right circumstances, give very good results, since it can avoid the need to pass in small steps between very different configurations (although it has its own disadvantages too).

Issues to be Investigated

In the remainder of the thesis we shall concentrate on investigations of the multicanonical ensemble (and, to a lesser extent, the related expanded ensemble). We do not see a clear difference between umbrella sampling and the multicanonical ensemble, which itself is similar to the expanded ensemble. The difference seems largely to have been in the class of problems to which they have been applied: condensed matter for umbrella sampling, lattice models and lattice gauge theory for the multicanonical ensemble. Thus, the multicanonical ensemble is in many ways more a rediscovery of the principles of umbrella sampling than a new development in its own right. Nevertheless, this rediscovery has clearly provoked a new wave of interest, prompted perhaps by a realisation that umbrella sampling ideas are not as limited as had been thought by the difficulty of finding a suitable sampled distribution [46]; this was, we believe, the most important reason why the original umbrella sampling was not more widely adopted. The advantages of the multicanonical approach are particularly clear in cases where there are very severe ergodic problems, as for spin glasses, but even for more general problems of free energy measurement we believe that the method is made attractive by the fact that only a single simulation needs to be performed and that the free energy is obtained more transparently, by direct measurements of probability rather than through an integration process. The most significant disadvantage remains the difficulty of finding a suitable sampled distribution.
Since generating the sampled distribution requires knowledge of the very free energies that we are trying to measure, it must be done by an iterative process. Though ad hoc methods that work reasonably well in practice have been introduced (see, e.g., [122]), further progress in this regard is required before the method can be used `off-the-shelf.' We shall make extensive investigations of iterative methods to do this in chapter 3, applying Bayesian methods to the problem and introducing a new method based on the use of a histogram of accepted transitions to construct estimators of the probabilities of the states. We shall look at the application of the method both to the single-phase and to the coexistence problems. The quantity that controls the multicanonical error might be expected to be the random walk time $\tau_{RW}$ over the wide range of accessible states. While this is to some extent true, we shall show that the relation of $\tau_{RW}$ to the error of the final free energy estimators is not what might be expected. We shall also compare the efficiency of the multicanonical and expanded ensembles with that of thermodynamic integration. While developing simulation methods we shall concentrate on applications to the 2d Ising model, but in chapter 4 we shall apply the techniques to the simulation of a solid-solid phase coexistence in a simple model of a colloid.

Chapter 3

Multicanonical and Related Methods

3.1 Introduction

In section 2.2.2 we gave a qualitative description of the multicanonical ensemble and reviewed its uses in the literature. To recap, the defining characteristic of multicanonical simulations is a sampled distribution which is more or less flat over at least a part of the space of macrostates of a chosen variable of the system (called the `preweighted variable'), which will be either internal energy $E$ or magnetisation $M$ here. Since its introduction in [58], the multicanonical ensemble's uses have included: the measurement of interfacial tensions in Ising models [58, 101, 102], Potts models [58, 103, 104, 115, 120] and lattice gauge theories [105, 106, 107]; application to the Edwards-Anderson spin glass to measure internal energy, entropy and heat capacity [108, 109, 110, 111]; the study of the 2d $\phi^4$ theory (combined with multigrid methods [116, 117]); and the study of protein tertiary structure [113, 114]. The method is reviewed by Kennedy [59] and Berg [99], and some recent algorithmic developments are reviewed by Janke [100].

In this chapter we shall mainly, though not exclusively, be concerned with the application of the multicanonical ensemble to the measurement of absolute free energies. We shall first describe how the multicanonical ensemble may be used to do this, and then (in section 3.2) we shall investigate ways of producing the required importance-sampling distribution, which is unknown a priori. This will lead us to a new method where inferences are made from the observed transitions made by the simulation, a method which will turn out to have other important uses. In section 3.3 we shall then present some new results for the behaviour of the critical p.d.f. of the magnetisation of the Ising model at high $M$, along with results for densities of states and canonical averages for the 2d Ising model and a comparison with thermodynamic integration.
Finally, we shall widen the discussion to include other non-Boltzmann sampling methods, such as those of Fosdick [96] and Hesselbo & Stinchcombe [97], and the expanded ensemble (section 2.2.3, [60]). We shall expand on the comments we have already made in chapter 2 about the similarity of this latter method to the multicanonical ensemble. We shall also present some new theory on the variance of estimators obtained from the multicanonical/expanded ensemble. As well as laying to rest some `folk theorems,' this will enable us to investigate the question of which non-Boltzmann importance-sampling scheme is optimal for the measurement of a particular quantity. All the investigations made in this chapter will be made using the 2d square nearest-neighbour Ising model with coupling constant $J = 1$, as described in section 1.1.2.

First, then, let us briefly describe how the multicanonical ensemble will be used in this chapter. We shall describe two ways of measuring absolute free energy by preweighting in energy, and we shall also describe magnetisation preweighting.

3.1.1 The Multicanonical Distribution over Energy Macrostates

As we have said before (section 1.2.3), the absolute free energy can in principle be calculated from

$$\langle \exp[\beta E(\sigma)] \rangle_{can} = \frac{\Omega_{TOT}}{Z} \qquad (3.1)$$

$-\ln Z$ equals $\beta F$ or $\beta G$ depending on whether the ensemble we are simulating has a constant or a variable order parameter. We shall be concerned here with the case where the order parameter (the magnetisation) is variable; $-\ln Z = \beta G$ for the Ising case. Now, Boltzmann sampling cannot be used to evaluate the free energy with equation 3.1, because it gives exponentially small weight to the high-energy configurations that dominate the expectation value in equation 3.1 (compare $O_2$ in figure 1.10 in section 1.2.2). We require, then, to give more weight to the high energy states.

To describe how to do this, we shall first give a general formulation of the use of non-Boltzmann sampling distributions, then specialise to the energy case. As we saw in chapter 1, the Metropolis algorithm can be used to sample from any distribution over the configuration space. Suppose we sample from a distribution with measure $Y(\sigma)$, which we can do by accepting trial transitions from $\sigma_1$ to $\sigma_2$ with probability $\min(1, Y(\sigma_2)/Y(\sigma_1))$. Then the expectation value of an operator $O(\sigma)$ is

$$\langle O \rangle_Y = \frac{\sum_{\{\sigma\}} O(\sigma)\, Y(\sigma)}{\sum_{\{\sigma\}} Y(\sigma)}$$

Now, even though we have sampled from the distribution with measure $Y$, we can also write expressions for averages with respect to another distribution with measure $W$. To see this, consider the ratio

$$\frac{\langle O W Y^{-1} \rangle_Y}{\langle W Y^{-1} \rangle_Y} = \frac{\sum_{\{\sigma\}} O(\sigma) W(\sigma) Y^{-1}(\sigma) Y(\sigma) \,/\, \sum_{\{\sigma\}} Y(\sigma)}{\sum_{\{\sigma\}} W(\sigma) Y^{-1}(\sigma) Y(\sigma) \,/\, \sum_{\{\sigma\}} Y(\sigma)} = \frac{\sum_{\{\sigma\}} O(\sigma) W(\sigma)}{\sum_{\{\sigma\}} W(\sigma)} = \langle O \rangle_W \qquad (3.2)$$

To find canonical averages ($\langle O \rangle_{can}$) when the configurations are sampled from a distribution with measure $Y$, we substitute $W = \exp(-\beta E)$, giving

$$\langle O \rangle_{can} = \frac{\langle O \exp(-\beta E)\, Y^{-1} \rangle_Y}{\langle \exp(-\beta E)\, Y^{-1} \rangle_Y} \qquad (3.3)$$

The notation is slightly simplified if we characterise the sampled distribution by its difference from the Boltzmann distribution, introducing a function $\eta(\sigma)$ which gives an extra weight proportional to $\exp[\eta(\sigma)]$ to each microstate. The sampled distribution thus has measure

$$Y(\sigma) = \exp[-\beta E(\sigma)]\,\exp[\eta(\sigma)] \qquad (3.4)$$

which gives

$$\langle O \rangle_{can} = \frac{\langle O \exp(-\eta) \rangle_Y}{\langle \exp(-\eta) \rangle_Y} \qquad (3.5)$$

An estimator of this from a finite sample of $N_c$ configurations is

$$\tilde{O} = \frac{\sum_{i=1}^{N_c} O(\sigma_i) \exp(-\eta(\sigma_i))}{\sum_{i=1}^{N_c} \exp(-\eta(\sigma_i))} \qquad (3.6)$$

where the configurations are assumed drawn from the distribution defined by equation 3.4.
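As a concrete (if schematic) rendering of equation 3.6, assuming a stream of measurements $(O(\sigma_i), \eta(\sigma_i))$ from a run with the sampled distribution of equation 3.4:

import math

def canonical_average(samples):
    """Ratio estimator of equation 3.6.  samples = [(O_i, eta_i), ...];
    the common factor exp(-eta_min) cancels between numerator and
    denominator and is removed only to guard against overflow."""
    eta_min = min(eta for _, eta in samples)
    num = sum(O * math.exp(eta_min - eta) for O, eta in samples)
    den = sum(math.exp(eta_min - eta) for O, eta in samples)
    return num / den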
Note that the right-hand side of equation 3.6 provides an estimator of $\langle O \rangle$ for any $\eta(\sigma)$, though only a few choices are usable in practice; for most, the sampling probability is such that the configurations that dominate the sums of either $O(\sigma)\exp(-\eta(\sigma))$ or $\exp(-\eta(\sigma))$ (or both) will be generated with infinitesimal probability. The possibility of applying equation 3.3 has been appreciated since the early days of computer simulation; see Fosdick's paper of 1963 [96], an early attempt at finding an `optimal' sampled distribution. It includes as special cases Boltzmann sampling (put $Y = \exp(-\beta E)$, when the denominator becomes a `minimum variance estimator', a constant for all configurations) and `simple sampling' ($Y = 1$). However, it has been little used for a long time, mainly because Boltzmann sampling is very successful for most choices of $O$ (internal energy, magnetisation) and is very simple conceptually. Though we shall come back to more general sampled distributions in section 3.4, until then we shall concern ourselves only with multicanonical distributions. All the equations for the estimators $\tilde{O}$ that we produce (equation 3.7 and so forth) are true for any values of $\{\eta\}$ in the limit of very long sampling time, but, for most choices, the failure to sample the important configurations frequently would give very poor estimators in the run-times accessible in practice.

We first consider the case of a multicanonical distribution with energy as the preweighted variable, so only the value of the energy macrostate is relevant in determining $\eta$: $\eta(\sigma) = \eta(E(\sigma))$. As we said when introducing this method in section 2.2.2, multicanonical sampling means that the sampled distribution $P^{xc}(E)$ (for energy preweighting; $P^{xc}(M)$ for magnetisation, etc.) of energies extends right up to very high energies and is roughly flat, as shown schematically in figure 3.1. Such a distribution is produced by a set of coefficients $\eta(E) = \eta^{xc}(E)$ (we shall use `xc' to signify `multicanonical' in mathematical expressions). Equation 3.6, rewritten as a sum over energy macrostates, and written specifically for the multicanonical ensemble, is

$$\tilde{O} = \frac{\sum_E \tilde{P}^{xc}(E)\, O(E) \exp[-\eta^{xc}(E)]}{\sum_E \tilde{P}^{xc}(E) \exp[-\eta^{xc}(E)]} \qquad (3.7)$$

where $\tilde{P}^{xc}(E)$ are estimators of $P^{xc}(E)$; the most obvious way to produce them is simply to use $\tilde{P}^{xc}(E) = C^{xc}(E)/\sum_E C^{xc}(E)$, where $C^{xc}(E)$ is the histogram of energy macrostates visited in the multicanonical run. However, there are other ways of estimating $P^{xc}(E)$.

[Figure 3.1. A schematic diagram of a typical histogram sampled from a multicanonical distribution, and the estimates of $P^{can}(E)$ and $P^{can}(E)\exp(\beta E)$ that may be recovered from it.]

Now, $P^{xc}(E) \exp[-\eta^{xc}(E)] \propto P^{can}(E)$, so the sum in the denominator is dominated by energies around $\langle E \rangle_\beta$, as indicated in the diagram. Conversely, for $O = \exp(\beta E)$, $P^{xc}(E)\, O(E) \exp[-\eta^{xc}(E)] \propto P^{can}(E) \exp(\beta E) \propto \Omega(E)$, so the sum in the numerator is dominated by the maximum of $\Omega(E)$, which occurs at high energy (at $E = 2L^2$ for the Ising model). However, since the multicanonical distribution extends over both regions of energy space, both sums are estimated to good accuracy. Indeed, for the multicanonical distribution shown in figure 3.1, the estimators of $\langle O \rangle$ will be accurate for all operators $O$ which depend only on $E(\sigma)$ (and its conjugate field $\beta$).
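In code, equation 3.7 with the simple histogram estimator reads as follows (a sketch; `C` is the visit histogram, `eta` maps each visited macrostate to $\eta^{xc}(E)$, and `O` is a function of $E$):

import math

def multicanonical_estimate(C, eta, O):
    """Equation 3.7 with P~xc(E) = C(E)/sum(C), the normalisation
    cancelling in the ratio.  Taking O = lambda E: math.exp(beta * E)
    estimates Omega_TOT/Z, as in equation 3.1."""
    shift = min(eta.values())      # common factor, cancels in the ratio
    num = sum(C[E] * O(E) * math.exp(shift - eta[E]) for E in C)
    den = sum(C[E] * math.exp(shift - eta[E]) for E in C)
    return num / den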
From the same data we get not only free energies but also internal energies, heat capacities etc., and not only at temperature β but also at all other temperatures β̂: to evaluate these we return to equation 3.3 and replace exp(−βE) by exp(−β̂E) and (if appropriate) O by Ô. This leads eventually to the following equation, analogous to equation 3.7:

\[ \tilde{O}_{\hat\beta} = \frac{\sum_E \tilde{P}_{xc}(E)\, \hat{O}(E)\, \exp[(\beta-\hat\beta)E]\, \exp[-\eta_{xc}(E)]}{\sum_E \tilde{P}_{xc}(E)\, \exp[(\beta-\hat\beta)E]\, \exp[-\eta_{xc}(E)]} \tag{3.8} \]

The denominator is now dominated by the maximum of P̂_can(E), the canonical distribution at β̂, while, for Ô ∝ exp(β̂E) (to estimate G(β̂)), the sum in the numerator is still dominated by the maximum of Ω(E). Thus, depending on Ô and β̂, terms from various parts of E-space will dominate the sums in equation 3.8, but since the multicanonical distribution is flat, we are sure to have sampled the relevant part of E-space. Even if the sampled distribution is only multicanonical over a part of E-space, these assertions will still be true as long as Ô and β̂ are such that all the appreciably large terms in the sums in equation 3.8 come from the multicanonical part. We should also note that the estimators Õ in equations 3.7 and 3.8 are ratio estimators, that is to say they are the ratios of sums, and as such are slightly biased. A way of removing this bias is to use double-jackknife bias-corrected estimators, which are described in appendix D. One other operator that we shall consider explicitly is the one that allows the measurement of free energy differences between β and β̂. This is O_G = exp[(β−β̂)E]: it follows straightforwardly from the definition of the Boltzmann distribution that

\[ \langle O_G \rangle_{\mathrm{can}} = Z(\hat\beta)/Z(\beta) = \exp[\beta G(\beta) - \hat\beta G(\hat\beta)] \]

Substitution of this operator into equation 3.7 and consideration of the numerator and denominator reveals that an accurate estimator will be obtained provided that the peaks of P_can(E; β) and P_can(E; β̂) are both in the multicanonical region. This operator is of interest because there are many systems, such as fluids, for which we cannot use equation 3.1 to calculate absolute free energies because Ω(E) increases without limit. However, we can still use ⟨O_G⟩ to estimate free energy differences, and if β̂ is such that G(β̂) is known exactly, or is calculable to high accuracy by some approximation scheme (e.g. the virial expansion), then the absolute free energy at β can be obtained this way. Indeed, the formula 3.1 can be seen as the β̂ = 0 (infinite temperature) limit of ⟨O_G⟩, using our knowledge that lim_{β→0} βG = −ln Ω_TOT, provided Ω_TOT is finite. As will be observed, for operators that lead to free energies or free energy differences, there are two widely separated regions of E-macrostate space that make important contributions, and the rest of the multicanonical distribution does not contribute directly, but is important only in so far as it permits tunnelling between them (necessary to find the relative weights of numerator and denominator in equations 3.7 and 3.8). Since in the multicanonical distribution every macrostate is equally probable, one would expect that at each Monte-Carlo step the probability of moving to a higher macrostate would be about the same as the probability of moving to a lower one. This suggests that the simulation should perform a random walk through macrostate space and therefore that the tunnelling time τ_e ∼ V² (the total number of macrostates N_m ∼ V, the volume of the system, and the separation of the peaks of P_can(E) and Ω(E) scales in this way too).
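Returning briefly to equation 3.8, its practical evaluation is a reweighting of the histogram; the sketch below (ours, with hypothetical names) recovers canonical averages at any β̂ from one multicanonical run at β:

```python
import numpy as np
from scipy.special import logsumexp

def canonical_at(E, C, eta, beta, beta_hat, O):
    """Equation 3.8: canonical average of an operator O(E) at inverse
    temperature beta_hat, estimated from a multicanonical run at beta."""
    E, C, eta = (np.asarray(a, dtype=float) for a in (E, C, eta))
    with np.errstate(divide='ignore'):
        lnw = np.log(C) + (beta - beta_hat) * E - eta
    p = np.exp(lnw - logsumexp(lnw))   # estimated P_can(E) at beta_hat
    return np.sum(O(E) * p)

# For example (names ours): internal energy and heat capacity at beta_hat
# U  = canonical_at(E, C, eta, beta, beta_hat, lambda e: e)
# Cv = beta_hat**2 * (canonical_at(E, C, eta, beta, beta_hat,
#                                  lambda e: e**2) - U**2)
```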
This random-walk scaling of the tunnelling time is very similar to the way that the multicanonical method reduces the tunnelling time τ_e between the two phases in a simulation of a first-order phase transition (see section 1.2.3); τ_e depends only on a power of the system size instead of increasing exponentially with it (see equation 1.26). Indeed, when the multicanonical ensemble was introduced [58] this aspect of the algorithm's performance was particularly emphasised: τ_e ∼ V^2.35 in [58]. However, it is not obvious that the flat multicanonical distribution is necessarily optimal, and we shall return to the question of exactly how much weight should be put in the region between the peaks in section 3.4.3. We should note that, while many of the applications of the multicanonical ensemble have been to first-order phase transitions [58, 101, 103, 104, 115, 116, 120], the measurement of absolute free energies by using knowledge of Ω_TOT is referred to only in [120], where it is used in a calculation of S(E) = (E − F(E))/T. The overall normalisation is also used in calculations of the degeneracy of the ground state of spin glasses [108, 109]. The difference in approach in the phase transition problem arises because most authors use the multicanonical method in the form where the free energy difference between two phases is measured by tunnelling through the interfacial region. Absolute free energies are not required to do this, provided that a way can be found to connect the two phases directly. Indeed, at coexistence the free energy difference is zero and so all that is required is to reconstruct P_can and show that the sums over the two phases are equal. This method is not appropriate for energy preweighting of the 2d Ising model because P_can(E) never develops a double-peak structure; however, it is appropriate for magnetisation preweighting. Finally let us return to the question of what η's we may regard as multicanonical. First, we repeat that the required set η_xc is unknown a priori; to produce a perfectly flat sampled distribution we would need η_xc(E) = βF(E) = −ln P_can(E) (up to an additive constant), where direct measurement of P_can(E) gives us an estimate that is at first indistinguishable from zero for most E-states. Thus, to produce the multicanonical distribution implies that we need at least a partial knowledge of the very quantities we wish to measure. In practice, then, we shall never be able to use a perfectly multicanonical distribution, but only an approximately flat distribution. In fact, all the advantages of the `ideal' multicanonical distribution remain as long as the distribution is `approximately flat,' so that we obtain good sampling in all states: while it is the case that sampling from a distribution that is only roughly flat will lead to larger expected errors than sampling from a completely flat one, the difference in the expected error bar is only a few percent between a completely flat distribution and one where we impose only that P_xc(E) = O(P_xc(E′)) ∀ E, E′ in the range of interest. We shall therefore regard such sampled distributions as multicanonical too. Where necessary, we shall use the notation η*(E) for the `ideal' multicanonical distribution in which every macrostate has exactly equal probability, to distinguish it from η_xc(E), which implies only one of many possible sampled distributions that are close enough to multicanonical to be used as such in practice. The condition on P_xc demands that η_xc(E) should differ from η*(E) by terms of order unity, i.e. a constant absolute error.
But we know that (at least away from criticality) η* = βF ≈ βfL^d, so the fractional accuracy with which η* must be known to produce a multicanonical distribution increases with increasing system size. Moreover, the set η is not fixed absolutely even by the requirement that it produce a particular sampled distribution; if η_1 gives a multicanonical distribution (or indeed any other sampled distribution) for a particular β, then so does η_2 where η_1(E) = η_2(E) + k ∀ E, k constant. We shall adopt the convention that k is to be chosen such that min_E(η(E)) = 0. Indeed, there is even less restriction than this; it will be noted that the parts of E-space that dominate the estimator of exp(β̂E) in equation 3.8 are independent of the temperature of the multicanonical simulation, β. This shows that β can be chosen more or less at will; if we have a multicanonical distribution produced by coefficients η(β, E) for one temperature, then the same sampled distribution would be produced by η(β′, E) = (β′ − β)E + η(β, E) at temperature β′. It would perhaps be simplest in practice to choose β = 0, though this is not what has generally been done in this thesis. We shall consider various iterative procedures for generating a suitable η_xc(E) in section 3.2.

3.1.2 An Alternative: The Ground State Method

Aside from the use of equation 3.1, there is another way that a multicanonical simulation can give access to absolute free energy: by enabling us to measure the canonical probability of a macrostate that contains a known number of microstates, such as the ground state E_0 (which is two-fold degenerate in the case of the 2d Ising model). We first calculate

\[ P_{\mathrm{can}}(E) = \frac{\tilde{P}_{xc}(E)\, \exp[-\eta_{xc}(E)]}{\sum_E \tilde{P}_{xc}(E)\, \exp[-\eta_{xc}(E)]} \tag{3.9} \]

and then use P_can(E_0) alone to determine the free energy:

\[ \beta G = -\ln Z = \beta E_0 + \ln P_{\mathrm{can}}(E_0) - \ln \Omega_0 \]

Thus we need to know P_can(E) both at E_0 and also for those macrostates near its maximum, which will dominate the normalisation (the denominator of equation 3.9). This implies that we require the multicanonical distribution to overlap completely with the region around ⟨E⟩_β and also to extend down to the ground state. This is in contrast to the previous method, where the multicanonical distribution had to extend upwards from the region around ⟨E⟩_β to overlap with the maximum of exp(βE) P_can(E) ∝ Ω(E). As before, free energies at other temperatures β̂ may be estimated provided that the multicanonical distribution extends to cover the peak of P̂_can(E). Whether this technique or the one of measuring ⟨exp(βE)⟩ is better depends on the algorithm and the temperature(s) of interest; to a first approximation, if ⟨E⟩ is near to E_0 (as will be the case for large β, i.e. low temperature) then the ground state method will be better; if it is near to the maximum of P_can(E) exp(βE) (as for high temperature) then the ⟨exp(βE)⟩ method will be better. However, the situation is complicated by the variation of acceptance ratio over the wide range of E that we are covering; for instance the Metropolis algorithm slows down very dramatically near the ground state of the Ising model (the acceptance ratio decreases like 1/L^d), owing to the difficulty it has in `finding' the spins that must be flipped to `steer' the system into the single microstate of the ground state out of the higher-energy states with their exponentially large number of microstates. However, other algorithms, like the n-fold way [156] and generalisations of it [157], can alleviate this problem.
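A sketch of the ground state method (ours; it assumes the multicanonical histogram spans from E_0 up to the canonical peak, and that the ground-state degeneracy Ω_0 is known, Ω_0 = 2 for the 2d Ising model):

```python
import numpy as np
from scipy.special import logsumexp

def beta_G_ground(E, C, eta, beta, ln_Omega_0=np.log(2.0)):
    """Section 3.1.2: recover P_can(E) from the multicanonical histogram
    (equation 3.9), then read off beta*G from the canonical weight of the
    ground state macrostate, whose degeneracy Omega_0 is known."""
    E, C, eta = (np.asarray(a, dtype=float) for a in (E, C, eta))
    with np.errstate(divide='ignore'):
        ln_pcan = np.log(C) - eta       # unnormalised ln P_can(E)
    ln_pcan -= logsumexp(ln_pcan)       # normalise (equation 3.9)
    i0 = np.argmin(E)                   # index of the ground state E_0
    return beta * E[i0] + ln_pcan[i0] - ln_Omega_0   # beta*G = -ln Z
```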
Once again, we shall comment further on this matter in section 3.4. In the literature, measurement of the ground state probability seems only to have been used to find the unknown ground state degeneracy of spin glasses [108, 109], not to give the overall normalisation, and thus the absolute free energy, in a case where the ground state degeneracy is known.

3.1.3 The Multicanonical Distribution over Magnetisation Macrostates

In the canonical ensemble (we show an external field for generality, though in the Ising case we shall be concerned only with H = 0)

\[ P_{\mathrm{can}}(M) = \exp(\beta H M) \Bigl[ \sum_{\{\sigma\}} \delta(M - M(\sigma)) \exp(-\beta E(\sigma)) \Bigr] \Big/ Z(\beta) \tag{3.10} \]
\[ = \exp(\beta H M) \exp(-\beta F(M)) / Z(\beta) \tag{3.11} \]

defining the free energy functional F(M). We discussed the form of P_can(M) in section 1.1.3. For H = 0, it has two peaks for β > β_c and one (at M = 0) for β < β_c. Except exactly at β_c, these are Gaussian and change shape as L → ∞ in such a way that their width, when expressed in terms of m = M/L^d, becomes vanishingly small. For β > β_c the states around M = 0 correspond to mixed-phase configurations. By introducing η_xc(M), so that

\[ P_{xc}(\sigma) \propto \exp[-\beta E(\sigma) + \eta_{xc}(M(\sigma))] \]

and

\[ P_{xc}(M) \propto \exp(\beta H M) \exp[-\beta F(M) + \eta_{xc}(M)] \]

we may produce a multicanonical distribution, flat over some range of M values. From measured multicanonical probabilities we can then recover the canonical distribution at a value of the applied external field Ĥ different from the value H prevailing during the simulation, by using

\[ P_{\mathrm{can}}(M, \hat{H}) = \frac{\tilde{P}_{xc}(M)\, \exp[\beta(\hat{H}-H)M]\, \exp[-\eta_{xc}(M)]}{\sum_M \tilde{P}_{xc}(M)\, \exp[\beta(\hat{H}-H)M]\, \exp[-\eta_{xc}(M)]} \tag{3.12} \]

where P̃_xc(M) may be estimated from C_xc(M), the histogram of visited magnetisation states in the multicanonical distribution, or by some other means. This equation may be used to tackle the free energy problem in the same ways as it was in the energy case. If the range of M that is multicanonical embraces those values typical of the two coexisting phases, then we may simulate coexistence directly; a good finite-size estimator of the infinite-volume first-order phase transition occurs where the two peaks of P_can(M, Ĥ) have equal weight, i.e.

\[ \sum_{M \in A} P_{\mathrm{can}}(M, \hat{H}) = \sum_{M \in B} P_{\mathrm{can}}(M, \hat{H}) \]

where A and B are the two phases. It is essential to determine P_can(M) indirectly, via the multicanonical ensemble, for two reasons: first, to allow tunnelling between the peaks, which is necessary to find their relative weight whatever the external field may be; and second, to allow reweighting to different values of Ĥ until we find the one that satisfies the equal-weights criterion. This method has been used to determine the phase coexistence curve of the Lennard-Jones fluid in [121], and we shall use it in section 4.3 of chapter 4. It is not needed for the Ising model, where the location of the coexistence line is determined by symmetry. It also allows accurate measurement of P_can(M) in the interface region, which enables us to measure interfacial tension (see the discussion of mixed states in section 1.1.3 and [58, 102, 101]). Equation 3.12 can also be used to measure the absolute free energy of a single phase without crossing the interface, if this is difficult for some reason (for example because, for an off-lattice system, it would necessitate growing a crystal out of a fluid). In the Ising case the absolute free energy G is most easily found by producing a multicanonical distribution that extends all the way to the fully saturated states at M = ±L^d.
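A sketch of the equal-weights procedure (ours, not the thesis's): reweight the multicanonical magnetisation histogram to a trial field via equation 3.12 and solve for the field shift at which the two phases carry equal weight. The identification of the phases with the signs of M, and the root bracket, are our own simplifying assumptions:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.optimize import brentq

def p_can(M, C, eta, beta, dH):
    """Equation 3.12: P_can(M, H_hat) reweighted from a run at field H,
    with dH = H_hat - H."""
    M, C, eta = (np.asarray(a, dtype=float) for a in (M, C, eta))
    with np.errstate(divide='ignore'):
        lnp = np.log(C) + beta * dH * M - eta
    return np.exp(lnp - logsumexp(lnp))

def coexistence_dH(M, C, eta, beta, bracket=(-0.1, 0.1)):
    """Find the field shift at which the two peaks have equal weight, a
    finite-size estimator of the first-order transition. Phases are taken
    as M < 0 and M > 0; the bracket must straddle the root."""
    M = np.asarray(M, dtype=float)
    def imbalance(dH):
        p = p_can(M, C, eta, beta, dH)
        return np.sum(p[M > 0]) - np.sum(p[M < 0])
    return brentq(imbalance, *bracket)
```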
Such a distribution, extending to saturation, has been produced, for the first time to our knowledge, in section 3.3.3. It enables us to obtain another, very accurate, estimate of G by the method described in section 3.1.2. We also use this distribution to study the scaling of P_can(M) at large M. The sampled distribution used in section 3.3.3 in fact extends over all magnetisation states, so that G for the entire system is obtained; however, the free energy of just one phase would be obtained if M were restricted to embrace just the values characteristic of that phase. This approach would be less useful for off-lattice systems (for example), where there is not a single saturated state at very large values of the order parameter (volume), while at very small values there is the close-packed crystal, which has infinitesimal canonical probability. For them a method analogous to the use of equation 3.8 in section 3.1.1 could be used. We shall keep within the Ising context to describe this method. By substituting into equation 3.11, it is easy to show that

\[ \langle \exp[\beta M(\hat{H} - H)] \rangle_{\mathrm{can}} = \exp[-\beta(G(\beta, \hat{H}) - G(\beta, H))] \]

(If the range of M is restricted, but there is appreciable canonical probability outside it, then G(β, H) should be replaced by G_A(β, H), the free energy of the phase.) Just as in section 3.1.1, to estimate this accurately we require the sampled distribution to overlap both P_can(M, H) and P_can(M, Ĥ), and so must use the multicanonical estimator 3.12. If the state at Ĥ is such that its free energy is known exactly or to a good approximation (e.g. using the virial expansion in the case of a dilute gas), then the absolute free energy at H follows.

3.2 Techniques for Obtaining and Using the Multicanonical Ensemble

To travel hopefully is a better thing than to arrive.
FROM El Dorado, ROBERT LOUIS STEVENSON

In this section we shall be concerned with techniques relating to the multicanonical ensemble: we shall discuss various iterative processes for the generation of the coefficients η_xc (we use the vector notation for succinctness and because what we are about to say applies to both η(E) and η(M)), and we shall also describe a method that may make implementation of the method efficient on parallel computers. We have devoted particular attention to the development of quick, efficient and reliable methods for finding a usable η_xc, because the absence of such methods seems to have been the principal obstacle [47, 46] to the wider application of previous non-Boltzmann sampling methods such as umbrella sampling ([95], section 2.2.1). We present and discuss the results we have obtained from simulations using the multicanonical ensemble in section 3.3. The usual approach to multicanonical simulation in the literature [58, 103, 108, 111, 120] has been to divide the application of the method into two parts: the finding of an approximately multicanonical distribution, which is done as fast as possible, followed by a lengthy `production run,' in which a much longer Markov chain is generated without changing the sampled distribution. Only the results of the `production run' are then used in equation 3.7 or its equivalents to estimate the quantities of interest, with error bars coming from (jackknife) blocking. We, too, shall divide the tasks up like this (the results of section 3.3 come from simulations implemented in this way), though at the end of this introductory section we shall discuss further why and if this division is really necessary.
First, then, let us discuss the generation of η_xc. In a real problem the sampled distribution never will be perfectly flat on macrostate space, because this would imply exact knowledge of the probabilities P_can ∝ exp(−βF), which, as this expression shows, are dependent on the very free energies that we are trying to measure. To produce the multicanonical distribution and to measure P_can (or F) are therefore the same problem, and to solve it requires an iterative procedure. We begin with some initial guesses η (which may well be η = 0, corresponding to a Boltzmann sampling algorithm), and generate a short Markov chain. Generally, this will sample only a small fraction of the macrostates we are interested in. In the sampled region we can make inferences about the underlying sampled distribution from the data. We then use these inferences to generate a new sampled distribution, which will be approximately multicanonical in the sampled region, while outside it we increase the sampling probability so that the next data histogram will be a little wider. Then we repeat the process, hopefully getting closer and closer to the multicanonical distribution. Let us formalise this a little. We wish to find η_xc approximating to η* for the N_m macrostates of a system. We shall denote the iteration number by a superscript n and the ith macrostate by a subscript i, i = 1 ... N_m, e.g. C_i^n for the nth histogram of visited states (very few expressions contain anything `raised-to-a-power,' so this is seldom ambiguous). The macrostates could be of energy or magnetisation. We are going to make inferences about P^n, the (unknown) `true' macrostate probabilities in the nth sampled distribution, generated by η^n, on the basis of `data' gathered by a Monte Carlo (MC) procedure constructed to sample from P^n. The data do not determine P^n exactly, because of the effect of noise and because, at least at first, many states are not sampled at all. The best way to treat this problem, which implicitly handles the problem of distinguishing the `signal,' due to P^n, from the `noise,' is to use Bayesian probability theory [14, 158, 159, 160, 161], where probabilities describe our state of knowledge about quantities, so that constant but unknown parameters like P^n may be assigned probabilities. In this case, what we obtain from the data is P(P^n), a probability density function of P^n. In the `frequentist' interpretation of probability theory, where probabilities have meaning only in so far as they express the expected frequency in a long series of trials, an expression like P(P^n) is not admissible. However, it is now fairly well established [14] that the Bayesian and frequentist formulations make almost exactly the same predictions where both are applicable, while there are many situations (and, as we shall see, this is one of them) where the Bayesian interpretation is the more powerful. To return to the main thrust of the argument: P(P^n) is determined by the data according to Bayes' theorem [160], which is, after the nth set of data has been gathered,

\[ P(P^n \mid \mathcal{H}, D^1 \ldots D^n) = \frac{P(P^n \mid \mathcal{H}, D^1 \ldots D^{n-1})\, P(D^n \mid \mathcal{H}, P^n, D^1 \ldots D^{n-1})}{\int P(P^n \mid \mathcal{H}, D^1 \ldots D^{n-1})\, P(D^n \mid P^n, \mathcal{H}, D^1 \ldots D^{n-1})\, d^{N_m} P^n} \tag{3.13} \]

Here H represents the knowledge, as expressed by equation 3.4 and its magnetic analogue, of how η^n is related to P^n. D^n represents the `data,' which consist of either the visited states or recorded transitions of the Markov chain and the set η^n that produced them.
P(P^n | H, D^1 ... D^{n−1}) is the prior probability distribution of P^n before the data D^n have been considered. P(D^n | P^n, H, D^1 ... D^{n−1}) is the likelihood function, the probability that the observed data are produced given that a particular set of values of the parameters P^n holds. To calculate the likelihood we must generally assume a model. P(P^n | H, D^1 ... D^n) is the posterior probability distribution of the parameters P^n including the effect of the data D^n. From the posterior p.d.f. we generate estimators P̃^n of the true P^n. The mean (though it may not in practice be calculable) is one obvious possible estimator; others are the mode and median. The width of the p.d.f. gives us a measure of the uncertainty in the estimator. We expect this width to be of the order of 1/√C_i, because the stochastic nature of MC sampling produces fluctuations in the histogram C^n of size O(√C_i), which cannot be distinguished from fluctuations of the same size due to the true structure of the sampled distribution. We then use the estimator, however defined, to generate the next sampled distribution. The obvious way to do this (though as we shall see there may be better alternatives) is to put

\[ \eta_i^{n+1} = \eta_i^n - \ln \tilde{P}_i^n + k \tag{3.14} \]

where k is an arbitrary constant, which we choose so that the minimum of η is zero. This corresponds to sampling from a distribution P^{n+1}, obtained by setting

\[ P_i^{n+1} \propto P_i^n / \tilde{P}_i^n \tag{3.15} \]

P^{n+1} would indeed be exactly multicanonical if P̃^n = P^n. In practice, we never reach this situation because of the random errors in the measurements, but we can expect P^{n+1} to be closer to the multicanonical distribution than was P^n; it is shown in [51] that this algorithm converges almost surely. The reader may be wondering why we have written Bayes' theorem for the sampled distribution P^n, when P_can is our real interest. We do this because it is P^n that determines the data D^n, so that the likelihood is `naturally' expressed as a function of P^n (as we shall see in section 3.2.1). It is of course possible to write an expression relating P(P_can) to P(P^n): just as in one dimension we would write P(x) = P(y(x)) |dy/dx|, so here we can write

\[ P(P^{\mathrm{can}} \mid \mathcal{H}, D^1 \ldots D^n) = P(P^n \mid \mathcal{H}, D^1 \ldots D^n)\, \Bigl| \frac{\partial P_i^n}{\partial P_j^{\mathrm{can}}} \Bigr| \tag{3.16} \]

where

\[ P_i^n(P^{\mathrm{can}}) = \frac{P_i^{\mathrm{can}} \exp(\eta_i^n)}{\sum_{k=1}^{N_m} P_k^{\mathrm{can}} \exp(\eta_k^n)} \]

However, the fact that all the P_k^can enter the expression for each P_i^n, most of them in the denominator, makes the transformation 3.16 algebraically very complex, even if we make simplifying approximations like using a uniform prior and a simple model of the likelihood function (for example one neglecting correlations). Indeed, given that N_m, the dimensionality of the problem, usually is at least O(100), the RHS of equation 3.16 cannot be handled either analytically or numerically, and so the transformation cannot be performed exactly (though see section 3.2.2 below). Certainly it is impossible to integrate the function to find expectation values of P_can, etc. Instead we are forced to make our inferences about the sampled distribution P^n, obtaining estimators P̃^n, and then to use

\[ \tilde{P}_i^{\mathrm{can}}(\tilde{P}^n) = \frac{\tilde{P}_i^n \exp(-\eta_i^n)}{\sum_{k=1}^{N_m} \tilde{P}_k^n \exp(-\eta_k^n)} \]

which is the N_m-dimensional analogue of using x(⟨y⟩) as an estimator of ⟨x⟩. Moreover, the same difficulty in making a transformation from one set of P's to another affects us even if we confine ourselves to inferences about P^n.
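The update of equations 3.14 and 3.15 is simple enough to state in a couple of lines of code (a minimal sketch, ours, using the min(η) = 0 convention adopted in the text):

```python
import numpy as np

def update_eta(eta, P_tilde):
    """Equations 3.14 and 3.15: eta_i^{n+1} = eta_i^n - ln(P~_i^n) + k,
    with the constant k chosen so that min_i(eta_i) = 0."""
    eta_new = np.asarray(eta, dtype=float) - np.log(np.asarray(P_tilde, dtype=float))
    return eta_new - eta_new.min()
```

Fed with any reasonable estimator P̃^n of the current sampled distribution, repeated application of this update drives the sampled distribution towards flatness.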
On the first iteration (n = 1) we may have no information about the sampled distribution (which is P_can in this case). Therefore it is appropriate to choose a uniform prior, P(P_can | H) = constant, in which case the posterior will depend only on the likelihood. On every subsequent iteration, however, we have prior information about P^n which comes from the posterior of the previous sampled distribution P^{n−1}. But to get between P(P^{n−1}) and P(P^n) involves just the same transformation as equation 3.16; in terms of the variables P^{n−1} and P^n it is

\[ P(P^n \mid \mathcal{H}, D^1 \ldots D^{n-1}) = P(P^{n-1} \mid \mathcal{H}, D^1 \ldots D^{n-1})\, \Bigl| \frac{\partial P_i^{n-1}}{\partial P_j^n} \Bigr| \tag{3.17} \]

where

\[ P_i^{n-1}(P^n) = \frac{\tilde{P}_i^{n-1} P_i^n}{\sum_{k=1}^{N_m} \tilde{P}_k^{n-1} P_k^n} \]

Once again the algebraic complexity of this expression prevents its being calculated, and we seem to find ourselves forced to use a uniform prior on P^n at each stage. We thereby discard much of the information from the previous iterations, which influence the current P(P^n) only by the choice of P^n itself (through η^n). With a uniform prior at each stage, Bayes' theorem as given in equation 3.13 reduces to

\[ P(P^n \mid \mathcal{H}, D^n) \propto P(D^n \mid \mathcal{H}, P^n) \tag{3.18} \]

(Note that no approximation is involved in rewriting the likelihood function just as P(D^n | H, P^n): D^1 ... D^{n−1} are irrelevant to D^n given P^n.) What disadvantages result from having to dispense with the informative prior? At least initially (n small), the posterior changes rapidly with n and the new likelihood will be much narrower than the prior over most of the macrostate space (as we start to sample regions we previously had to guess about). In this case it makes little difference to approximate the prior as uniform and base the inference only on the likelihood function. However, as we converge towards the multicanonical distribution, the sampled distribution changes little between iterations. Thus, if we keep N_c constant, the prior is as narrow as or narrower than the likelihood, so we throw away a lot of information by disregarding it, and convergence stops once the difference between the sampled distribution and the `true' multicanonical one has come down to the order of the random fluctuations in the likelihood, which are inevitably incurred in the simulation process. Indeed, if N_c is too small, a large fluctuation may throw us a long way away from P_xc. Recently [122] an ad hoc refinement of this method has been proposed, which uses the histograms of all previous iterations, each contributing with a weight inversely dependent on the size of local fluctuations. The easiest way to skirt this problem, though, is by increasing the length N_c^n of the Markov chain generated at each iteration of the convergence process, at first in order to compensate for the increasing number of sampled macrostates, and later to keep convergence smooth and to minimise the effect of the waste of previous information. The eventual move to the `production run' can be seen as the final limit of this. On the other hand, it is clearly true that increasing N_c a lot when we are still some way from a multicanonical distribution wastes computer time that we would like to devote to the production run. There is thus a scheduling problem of deciding on a suitable initial N_c, deciding when and by how much to increase it later, and finally deciding on when to move to the `production run'.
This is usually found to need some initial experimenting, and even when a scheme is found that does seem to converge smoothly, a certain amount of human monitoring is required, though quite a lot of progress has been made [122] on automating the procedure. By using the Bayesian framework we have started to set up above, we have made some significant progress in incorporating prior information to stabilise the algorithm and in putting the ad hoc `finding' methods that are employed on a firmer footing. This will be described in sections 3.2.1 and 3.2.2 in the context of inferences made using the observed visited macrostates of the chain as data. However, with this visited-states method, it may be the case that the `finding' stage inescapably gets increasingly lengthy, for example for large system sizes, so that it consumes a large part of the total available computer time. This problem can only really be alleviated by the choice of a more efficient `finding' algorithm, and we have also contributed here through the development of a method, to our knowledge new, that converges very rapidly to something close to the multicanonical distribution by making inferences based on the observed transitions between macrostates made by the system (section 3.2.3). Before embarking on a description of these techniques, we shall return briefly to the question of whether the division between `finding' and `using' the η's is really necessary. This somewhat inelegant strategy is, in fact, forced upon us by a failure of P̃^n to be a good estimator of P^n outside the sampled region, and by the algebraic difficulties we have in handling expressions like equations 3.16 or 3.17. If these expressions were tractable, and could be integrated, then the whole simulation would become a single continuous process of narrowing P(P_can). Then we would finally need some way of transforming the p.d.f. of P_can into a p.d.f. of the required estimators ⟨O⟩ = Σ_i O_i P_i^can, from which we could find a mean and standard error (the standard error is particularly problematic as it requires treatment of the correlations between the estimates of the components of P_can). The methods we describe in section 3.2.1 go some way towards a solution of the first problem, but we do not really tackle the second, though other recent work has addressed closely related matters [116, 119]. Significant further work is required before the division between the `finding' stage and the `production run' can be removed.

3.2.1 Methods Using Visited States

In this section we will consider inferences made from what is perhaps the `obvious' choice of `data': the visited macrostates of the Markov chain (we shall thus call this the visited-states, or VS, method). Suppose that N_c configurations are sampled from the Markov chain, with sampling occurring at wide intervals so that successive configurations are effectively uncorrelated. The likelihood function for the data, the histogram of visited states C_i^n, will then be multinomial (multivariate binomial) with N_m − 1 independent variables P^n (the N_m th is determined by the normalisation). If we keep the assumption of a uniform prior, then by Bayes' theorem (in the form of equation 3.18)

\[ P(P^n \mid \mathcal{H}, D^1 \ldots D^n) = \frac{\prod_{i=1}^{N_m-1} (P_i^n)^{C_i^n} \Bigl(1 - \sum_{k=1}^{N_m-1} P_k^n\Bigr)^{C_{N_m}^n}}{\int_R \prod_{i=1}^{N_m-1} (P_i^n)^{C_i^n} \Bigl(1 - \sum_{k=1}^{N_m-1} P_k^n\Bigr)^{C_{N_m}^n} d^{N_m-1} P^n} \tag{3.19} \]

subject to Σ_i C_i = N_c, and where the domain of integration R is such that Σ_{i=1}^{N_m−1} P_i^n ≤ 1.
The combinatoric factors that would be present in the multinomial likelihood have disappeared in the normalisation. Several possible estimators of P^n can be used. The simplest is the maximum likelihood estimator (MLE), the set of values of P^n most likely to have produced the given data. It is defined by the equations

\[ \frac{\partial P(D^n \mid P^n)}{\partial P_i^n} \bigg|_{\tilde{P}^n_{\mathrm{MLE}}} = 0 \]

These are easily solved for the multinomial distribution, using a Lagrange multiplier for the normalisation; the result is the intuitively obvious one, P̃^n_MLE = C^n/N_c, leading to

\[ \eta^{n+1} = \eta^n - \ln C^n + k \]

This updating scheme is used in [98], [120]; the frequentist interpretation of probability used in these references leads naturally to maximum likelihood estimators (the unknown parameters can only be considered as having one `true' value, which is most naturally taken to be the one which is most likely given the data). Clearly the scheme does not work where C_i^n = 0, which happens extremely frequently in the early iterations (there are many macrostates that the Boltzmann sampling algorithm does not visit). For these states it would imply η_i^{n+1} = ∞. We could try to fix this by defining a norm ||η^{n+1} − η^n|| and setting a bound on the magnitude that it may have (suggested in [51]), or by the method, adopted by Lee [120], of leaving η_i unaltered (except for the arbitrary constant k) if C_i = 0, i.e. behaving as if C_i = 1 in this case. However, with the multinomial approximation for the likelihood and using Bayesian inference, it is possible to do better than this: we can evaluate the expectation values P̃^n_AV = ⟨P^n⟩:

\[ \tilde{P}^n_{\mathrm{AV},j} = \frac{\int_R P_j^n \prod_{i=1}^{N_m-1} (P_i^n)^{C_i^n} \Bigl(1 - \sum_{k=1}^{N_m-1} P_k^n\Bigr)^{C_{N_m}^n} d^{N_m-1} P^n}{\int_R \prod_{i=1}^{N_m-1} (P_i^n)^{C_i^n} \Bigl(1 - \sum_{k=1}^{N_m-1} P_k^n\Bigr)^{C_{N_m}^n} d^{N_m-1} P^n} \tag{3.20} \]
\[ = (C_j + 1)/(N_c + N_m) \tag{3.21} \]

which leads to η_i^{n+1} = η_i^n − ln(C_i^n + 1) + k; we note that for C_i^n = 0 this gives the same updating scheme as was introduced arbitrarily by Lee. For other states it gives slightly different estimators, but the difference is negligible if C_i^n is large. The effect of this updating, whether using AV or (appropriately fixed) MLE estimators, is to decrease the probability in the sampled region (by a factor of C_i + 1) and thus, since the probabilities must add to one, to increase it by a uniform factor in the non-sampled region. However, the true P^n almost certainly decreases, at least for a while, as we move away from the sampled region, though we do not know purely from C^n whether it decreases monotonically or has other local maxima elsewhere. The extent of this decrease is generally many orders of magnitude, so in the non-sampled region the true value of P^n is much lower than the estimate P̃^n_AV,j except at the very edge of C^n, since P̃^n_AV,j effectively assigns one count to non-sampled states. The result of this is that on the next iteration the sampled region widens slightly, becoming roughly multicanonical (flat) over that part that was sampled before, and extending a little further into the wings. The (n+1)th histogram tends to have large fluctuations in the states that were at the edge of the nth, because the poor statistics at the edges of C^n tend to produce an η which is inaccurate here. However, these get smoothed away on subsequent iterations. The convergence is fairly smooth (as long as we increase N_c to compensate for the increasing number of sampled macrostates).
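A minimal sketch of one VS iteration with the Bayesian mean estimator (ours; `run_chain` and `converged` are hypothetical placeholders for the simulation and the stopping test):

```python
import numpy as np

def vs_update(eta, counts):
    """Visited-states update using equation 3.21, P~_j = (C_j+1)/(N_c+N_m),
    i.e. eta_i^{n+1} = eta_i^n - ln(C_i^n + 1) + k. Unvisited states
    (C_i = 0) need no special treatment."""
    eta_new = np.asarray(eta, dtype=float) - np.log(np.asarray(counts) + 1.0)
    return eta_new - eta_new.min()        # convention: min(eta) = 0

# Skeleton of the 'finding' stage (run_chain and converged are hypothetical):
# eta = np.zeros(N_m)
# while not converged(eta):
#     counts = run_chain(eta, N_c)   # histogram C^n of visited macrostates
#     eta = vs_update(eta, counts)
#     N_c = int(1.5 * N_c)           # grow the chain as more states are sampled
```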
We should contrast this smooth widening of the sampled region with the behaviour expected if the true value of P^n in the wings is larger than the estimate. We have found that this is likely to happen if, despite having very little evidence about P^n far away from the sampled region, we attempt to get faster convergence by fitting some function to the part of η^{n+1} that comes from the sampled region and extrapolating it. In that case P^{n+1} may put a great deal, possibly almost all, of its weight in the wings, and C^{n+1} may become separated from C^n. Convergence then becomes irregular and awkward, with the latest η frequently needing to be discarded and a return made to earlier ones (though linear extrapolations seem to be relatively safe in this regard). We shall discuss this further at the start of section 3.2.3.

3.2.2 Incorporating Prior Information

Let us return to the idea of incorporating the prior p.d.f. As we discussed above, using a uniform prior for P^n at each stage does not accurately reflect the confidence we have in P^n as a result of the earlier iterations; indeed we do not have any real measure of the error in P^n or η^n. We could evaluate the variance of P^n using an expression similar to equation 3.21, but since this implicitly neglects correlations, which affect the variance much more than they do the mean, it is unlikely to be accurate. We have tried to build in at least a measure of the `ideal' iterative scheme in which P(P_can) is narrowed continually. To do this we have avoided the difficulties of changing variables (see equation 3.16 et seq.) by using a different data-driven (but still Bayesian) strategy to estimate the p.d.f. of the transformed variable of interest. Rather than looking at P^n, or even P_can, let us consider P(η*) (we shall justify this choice below), where η*, introduced in section 3.1.1, is that set of preweighting coefficients that would give an exactly multicanonical distribution:

\[ P \propto \exp(-\beta F + \eta^*) = \text{constant} \]

i.e.

\[ \eta^* = \beta F + \text{constant} = -\ln P_{\mathrm{can}} + \text{constant} \]

We do not use the notation η_xc, since this also embraces all the `nearly multicanonical' distributions. Bayes' theorem now becomes

\[ P(\eta^* \mid \mathcal{H}, D^1 \ldots D^n) = \frac{P(\eta^* \mid \mathcal{H}, D^1 \ldots D^{n-1})\, P(D^n \mid \mathcal{H}, \eta^*, D^1 \ldots D^{n-1})}{\int P(\eta^* \mid \mathcal{H}, D^1 \ldots D^{n-1})\, P(D^n \mid \eta^*, \mathcal{H}, D^1 \ldots D^{n-1})\, d^{N_m} \eta^*} \tag{3.22} \]

We now choose to model the p.d.f. of η* as a multivariate Normal distribution with the covariance terms set to zero¹, i.e., after the collection of data D^n,

\[ P(\eta_i^* \mid \mathcal{H}, D^1 \ldots D^n) \propto \exp[-(\eta_i^* - \eta_i^{n+1})^2 / (2(\sigma_i^{n+1})^2)] \]

It is our desire to use a Normal model that has led us, on phenomenological grounds, to consider η* rather than P_can; the latter will have an asymmetric p.d.f. (because of the constraint P_i^can > 0 ∀ i) which will not be well approximated by a Gaussian. The same applies to P^n. We expect that P(η*) will be better behaved. We have (n+1) superscripts on the parameters η^{n+1} and σ^{n+1} because η^{n+1}, being the mean of P(η* | H, D^1 ... D^n), is the set of weight factors that we shall use to generate the (n+1)th sampled distribution. This parameterisation takes care of the prior and posterior in equation 3.22; however the likelihood function still remains. We cannot transform a multinomial likelihood (expressed naturally in terms of P^n) without encountering the difficulties of equation 3.16. To proceed, then, we estimate the p.d.f.
of η* by jackknife blocking the data histogram at each iteration, finding the expectation value of P^n for each block, then transforming these expectation values to give a series of estimators of the variable of interest, η*, whose distribution outlines the shape of P(η*).

¹ This is an ad hoc choice, made so that the problem is computationally more tractable; it will cause us some difficulties below.

To see in detail how this works, we first note that P(D^n | H, η*, D^1 ... D^{n−1}) = P(D^n | H, η*) because (as we stated when discussing equation 3.18) D^1 ... D^{n−1} are irrelevant to D^n given η*. Thus, we can apply Bayes' theorem once more:

\[ P(\eta^* \mid \mathcal{H}, D^n) = P(D^n \mid \mathcal{H}, \eta^*)\, P(\eta^* \mid \mathcal{H}) / P(D^n \mid \mathcal{H}) \]

Evidently we may take P(η* | H) to be uniform, which gives

\[ P(D^n \mid \mathcal{H}, \eta^*) \propto P(\eta^* \mid \mathcal{H}, D^n) \]

We model P(η* | H, D^n) by another Normal distribution, parameterised by η̂^n and σ̂^n. We estimate the parameters simply by jackknife blocking the recorded histogram into m = 1 ... N_J, N_J = O(10) pieces, generating an η^{n,m} from each block, and measuring their mean η̂^n and variance (σ̂^n)². Thus we avoid the change-of-variable problem. Putting all this into equation 3.22, we arrive at

\[ \exp[-(\eta_i^* - \eta_i^{n+1})^2/(2(\sigma_i^{n+1})^2)] \propto \exp[-(\eta_i^* - \eta_i^n)^2/(2(\sigma_i^n)^2)]\, \exp[-(\eta_i^* - \hat{\eta}_i^n)^2/(2(\hat{\sigma}_i^n)^2)] \]

which implies

\[ (\sigma_i^{n+1})^{-2} = (\sigma_i^n)^{-2} + (\hat{\sigma}_i^n)^{-2} \tag{3.23} \]

and

\[ \eta_i^{n+1} = (\sigma_i^{n+1})^2 \bigl[(\sigma_i^n)^{-2} \eta_i^n + (\hat{\sigma}_i^n)^{-2} \hat{\eta}_i^n\bigr] \tag{3.24} \]

Thus, η^{n+1} is an average of its previous value, η^n, and the simple estimate of its new value obtained from equation 3.14, η̂^n, with the two combined according to their estimated variances. The variance itself is always reduced, as we would expect since we are adding new information. This is a more systematic way of trying to build in the results of previous iterations than one based simply on the magnitude of C_i^n, as is given in [122], and it is likely to be more accurate because the effect of correlations in the sampling process is implicitly included. Note, moreover, that the basic idea (that of creating a `Monte-Carlo sample' from the unknown p.d.f.) is not bound to any particular algorithm for sampling P^n or to any model for the likelihood, so we can apply it to other methods too. Having said this, we have also found that there are several caveats attached to its use in practice, as we shall now describe. First, in order to bootstrap the technique we assume σ¹ is large, so that η² depends only on D¹. We also have to be careful with our policy of having the normalisation min(η) = 0. This normalisation should not be applied to each η^{n,m} individually: if min(η^{n,m}) falls at the same macrostate each time, we would otherwise estimate a variance of zero for this state. However it is best to set min(η̂^n) to zero before combining it with η^n, so that over those regions that both sampled, the two η's will be approximately equal. This reduces the effect of fluctuations in σ̂^n (see below). Moreover, the estimate of σ̂^n is itself subject to fairly large random errors, which need careful treatment or they will spoil the estimator. Suppose that over some range of states η̂^n and η^n are separated fairly widely. Then, as we see from equation 3.24, random fluctuations in σ̂^n (and indeed in σ^n) will pull the estimator η^{n+1} back and forth between η^n and η̂^n. We can thus end up with an η^{n+1} that is far less smooth than either η^n or η̂^n. This is presumably a consequence of neglecting the covariance terms in the Normal model of the p.d.f.
of η*; including them would serve to force some smoothness on the function as a whole. However, to do this would make the updating process much more complicated and time-consuming, because equations 3.24 and 3.23 would become matrix equations involving matrix inversion. Therefore in practice we have adopted an ad hoc solution to the problem. First, we smooth σ̂^n by locally averaging it. This is found to alleviate the problem in the region where both the (n−1)th and nth iterations produced counts. However, we must also treat those regions where neither iteration produced counts, and where the (n−1)th iteration did not but the nth did. At first, we tried simply assuming some arbitrary large variance in unsampled states (given that we adopt the technique of averaging the η^{n,m} without setting the minimum to zero, we produce an estimated σ̂_i^n of zero in the unsampled states, which clearly needs some fixing). This was found to work well in the `newly sampled' region, where the (n−1)th iteration does not produce counts but the nth does, producing an η^{n+1} which depends almost entirely on η̂^n, as we would desire. However, in the region which is still unsampled, this scheme leads to η^{n+1} = (η^n + η̂^n)/2, since σ_i^n = σ̂_i^n. We thus found that η^{n+1} tended to increase up to the edge of C^n, and then fall back to a lower value again. This severely slowed down the desired spreading of the sampled region. The problem was solved by recording the location of the edge of C^n at each stage, and putting η^{n+1} = η̂^n in both the newly-sampled and unsampled regions (i.e. anywhere past the edge of C^{n−1}). Although it is found that using the average over m of η^{n,m} as η̂^n gives perfectly adequate convergence, it is found that, with the same total time devoted to each iteration, it spreads into the non-sampled region a little more slowly than does the `naive' scheme of simply using a uniform prior at each stage and updating with equations 3.14 and 3.21 alone. This is because min(C_i + 1) = 1 is a larger fraction of the total counts in one jackknifed histogram than it is of the total counts in the single histogram of the naive method: C^{n,m} contains (N_J − 1) N_AV counts, while C^n contains N′_AV = N_J N_AV. The maximum change in η that may occur is therefore a little larger in the naive case. To achieve the same spreading rate we could spend N_J/(N_J − 1) more time on the method with prior, but in fact we choose to use η̂^n ≡ η^{n,0}, which is the estimator defined on all the pooled data from the nth iteration (and thus identical to the naive estimator). The effect of the use of the prior is thus only on the way that η̂^n is combined with previous estimators of the weighting function. To summarise, then, the method as we have applied it is (a minimal code sketch follows the list):

0. Start with η¹ = 0, and estimate σ¹ to be some large constant.
1. Record m = 1 ... N_J = O(10) histograms C_i^{n,m}.
2. For each one, set η_i^{n,m} = η_i^n − ln(C_i^{n,m} + 1).
3. Calculate σ̂^n from the η^{n,m}.
4. Calculate η̂^n ≡ η^{n,0}, defined on all the data from the nth iteration.
5. Calculate η^{n+1} and σ^{n+1} using equations 3.24 and 3.23, with the caveats mentioned above (use only η̂^n in the regions where only iteration n samples, or where no iteration has yet sampled).
6. If the distribution does not extend over all the macrostates of interest, then return to 1; otherwise stop.
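The sketch below (ours; `run_block(eta)` is a hypothetical user-supplied routine returning one block's histogram) implements steps 0-6 under simplifying assumptions: the block spread is used directly for σ̂ rather than a full jackknife, the local smoothing of σ̂ is omitted, and the edge caveat is handled crudely by using η̂ alone wherever the previous iteration recorded no counts:

```python
import numpy as np

def combine(eta_prev, sig_prev, eta_hat, sig_hat):
    """Equations 3.23 and 3.24: inverse-variance combination of the
    previous weights with the new estimate."""
    sig2 = 1.0 / (sig_prev**-2 + sig_hat**-2)                        # eq. 3.23
    eta = sig2 * (eta_prev * sig_prev**-2 + eta_hat * sig_hat**-2)   # eq. 3.24
    return eta, np.sqrt(sig2)

def finding_stage(run_block, N_m, N_J=10, sigma_init=1e6):
    eta = np.zeros(N_m)                                   # step 0
    sigma = np.full(N_m, sigma_init)
    seen_before = np.zeros(N_m, dtype=bool)
    while True:
        blocks = np.array([run_block(eta) for _ in range(N_J)])   # step 1
        eta_m = eta - np.log(blocks + 1.0)                        # step 2
        sig_hat = np.maximum(eta_m.std(axis=0), 1e-12)            # step 3
        pooled = blocks.sum(axis=0)
        eta_hat = eta - np.log(pooled + 1.0)                      # step 4
        eta_hat -= eta_hat.min()
        eta_new, sigma = combine(eta, sigma, eta_hat, sig_hat)    # step 5
        eta_new[~seen_before] = eta_hat[~seen_before]             # edge caveat
        eta = eta_new - eta_new.min()
        seen_before = pooled > 0
        if seen_before.all():                                     # step 6
            return eta, sigma
```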
To illustrate this iterative scheme we shall examine the energy preweighting of the L = 8 and L = 16 Ising model, starting from a canonical simulation at β = 0.55. We wish to extend the sampled distribution up to the states characteristic of β = 0, to use the method of section 3.1.1 to find G. This is only for purposes of demonstration, since with inverse temperature β = 0.55 the ground state already has significant weight for these small systems (indeed it is the most probable macrostate for L = 8), so G(0.55) could be found directly. To begin with, let us consider the L = 8 system in the case where N_J = 6 histograms are gathered at each iteration, and we choose N_c such that N_AV = 50, where N_AV = N_c/L^d is the number of counts expected per sub-histogram per energy state in the multicanonical distribution, sampling once per lattice sweep. In figure 3.2 we show the first three histograms (in fact the average over the six produced at each stage) and the weighting functions η^{2,3,4} that they produce (η¹ = 0). In the figures, we plot the estimates of η produced at the end of every iteration, labelled with their iteration number. We show only the bottom half of the energy range, up to the state E = 0, which corresponds to the mean of P_can^{β=0}(E). At higher energies, we use η(E) = η(E = 0) + βE, which ensures that the multicanonical distribution matches the shape of P_can^{β=0}(E) but does not extend to very high energy states, which have significant canonical weight only for antiferromagnetic spin-spin interactions. As we see, the first histogram extends up to E ≈ −64. η is quite well determined immediately over most of this range (there is very little difference between η² and subsequent η's). It becomes rather less accurate around E = −64 (because very few visits to these states were recorded, so the fractional fluctuations are larger), and at higher energies is changed from its original value only by the constant we add to keep min(η) = 0. The second histogram is then roughly flat, as we expect for a multicanonical sampled distribution, up to about E = −64; then it falls to zero at E ≈ −32. There are some fluctuations around E = −64, with some states having appreciably more than their multicanonical probability, due to the fluctuations in the tail of the previous histogram having passed into the weighting function. As we might expect, η³ therefore extends up to E = −32 before going flat; it is now extremely close to the true η for −128 < E < −64, where the second histogram obtained good statistics in all states, and fairly close to it up to E = −32, though once again with larger fluctuations where few counts were recorded. The third histogram, C³, is then more or less flat up to E = −32, the point where C² cut off, and then extends up to E = −16, but again the last few states have poor statistics, so η⁴ is not well determined for them. The process can clearly be repeated, extending η to higher and higher energies, until η^n has stopped changing apart from small random fluctuations (which is the point where we would move to a longer run with a fixed η in the `finding/production' scheme).

Figure 3.2. Initial convergence (first three iterations) of the weighting function η^n using the visited-states method for Ising β = 0.55, L = 8. The upper panel shows the histograms C produced at each stage and the lower the resulting η's.

We show the later progress of the iterative scheme (iterations 1-7, 10, 15 and 20) in figure 3.3.
It is apparent that, at least on the scale of the whole figure, convergence has occurred by the fifth iteration (we have produced a usable η_xc). Examination of the inset shows that the fluctuations in η are small after this, though it is not clear that convergence continues. Now let us examine how using the Normal model of η* with an informative prior compares with simply using a uniform prior at each iteration, updating η^n using equation 3.21 alone. In figure 3.4 we show the results of such a `naive' run. We used N_AV = 300 at each stage, so that the same total time would be spent on each iteration as was the case before.

Figure 3.3. Main figure: convergence of the weighting function η^n using the VS method for Ising β = 0.55, L = 8, N_AV = 50, 6 blocks per iteration, iterations 1-20. Inset: detail of −50 < E < −45.

It is apparent from the main figures that there is nothing to choose between the two methods as regards their speed of convergence (the speed with which they move into the unsampled region); this is as we would expect, because the `tweaks' we have given the informative-prior method have rendered it almost identical to the naive method for these states. The difference only begins to become apparent when examining the insets. If a uniform prior is used, fluctuations in η in the sampled region are larger and persist, rather than dying away as they do in figure 3.3. This is shown more clearly in figure 3.5, where we have plotted the difference between η^n and η^final for the E = −8 macrostate. Thus, our method of incorporating prior information yields improvements in the smoothness of convergence, and goes part of the way to removing the necessity of switching to a `production' run, since the error in η becomes continually smaller even though we continue updating it, implying a convergence of our knowledge of βF. We should note that the Normal model is only accurate when we are already close to the multicanonical limit: it dramatically underestimates σ in the early iterations, because the fluctuations in η^{n,m} are reduced in size by the updating using ln(C_i + 1) and do not reflect the real uncertainty in η in the non-sampled region (though this is unimportant for updating, since we do not make use of σ in this region).

Figure 3.4. Main figure: convergence of the weighting function η^n using the VS method for Ising β = 0.55, L = 8, N_AV = 300, 1 block per iteration, uniform prior used in updating. Iterations 1-20. Inset: detail of −50 < E < −45.

A similar pattern emerges in the L = 16 case. The sampled distribution widens gradually and fairly smoothly, extending to higher and higher energies, until the multicanonical distribution is reached. Once again, we present results for iterative schemes using both a Normal model for η* (figure 3.6) and a uniform prior (figure 3.7). In the latter case fluctuations in η^n persist, but if the prior is incorporated they die away once we are close to the multicanonical limit, just as in the L = 8 case. This is shown by the insets in figures 3.6 and 3.7, and (more clearly) by the approach of η(E = −112) to its final value, plotted in figure 3.8. However, the most serious disadvantage of the visited-states method, its slow convergence for all save very small systems, is now becoming apparent.
The most that any η_i can change by in any one iteration is Δη_max = max_i(η_i^{n+1} − η_i^n) = ln(max_i(C_i)); thus the greatest possible change is Δη_max = ln(N_c), and more typically Δη_max ≈ ln(N_AV). This is not a large change, considering that η* generally scales like L^d, and the method becomes tedious even for simple systems like the Ising model when the range of η* to be covered is ≳ 100. The use of an informative prior does not help, because the problem is the extension of η^n into the unsampled region, where we do not have any prior information to incorporate. We see that with L = 16 we require 15 iterations to get as close to η* as we were after 4 iterations of the L = 8 system, while we abandoned an application of the method to an L = 32 system, which was not near convergence even after running for several days on a workstation.

Figure 3.5. η^final − η^n for E = −8 using the VS method for Ising β = 0.55, L = 8, N_AV = 50, 1 and 6 blocks per iteration, uniform (line) and Normal (triangles with error bars) priors used in updating. Inset: detail of later iterations.

We shall now make some remarks on the scaling of the method with system size, and on how we should choose N_AV. The analysis above shows that in 2 iterations the maximum change in η we expect to produce is 2 ln N_AV, while if we use the same time to do just one iteration, then we shall produce a Δη_max of only ln 2N_AV. Thus, if this were the only consideration, the fastest convergence would be produced with N_AV = 2. However, this neglects the effect of random errors, which we must be able to distinguish with sufficient accuracy from fluctuations in the histogram due to the true structure of the underlying sampled distribution. Even for the uncorrelated case, the random errors will generally be of size √N_AV. We thus need as a bare minimum an N_AV which is large enough that N_AV^{−1/2} ≪ 1, which explains our choice of N_AV = 50. In fact, we find that convergence is just about maintained, for the very small L = 8 system and using prior information, with N_AV = 10, but N_AV = 2 is impossibly small. For larger systems the lower limit on N_AV is enforced by the requirement that the simulation must make several random walks over all the macrostates accessible to it, even though in a single spin update it is only able to move from E to E ± 1 or E ± 2. The number of accessible macrostates is of the order of L^d, at least when we have moved some way towards the multicanonical distribution, so a simple random walk argument implies N_c ∼ L^{2d} and therefore N_AV ∼ L^d. Moreover, since η* ∼ L^d, we shall require O(L^d / d ln L) iterations. Neglecting logarithmic corrections, the total time for this method to converge thus scales like L^{3d} = L^6 for the 2d Ising model.

Figure 3.6. Main figure: convergence of the weighting function η^n using the VS method for Ising β = 0.55, L = 16, N_AV = 50, 6 blocks per iteration, iterations 1-22. Inset: detail of −160 < E < −120.

To summarise, then, this simple method of producing the sampled distribution provides `slow but sure' convergence, which is suitable for small systems where the weighting function varies over only a few tens of orders of magnitude.
Our Bayesian analysis of the problem has clarified the procedure of updating the sampled distribution, and enables us to combine information from different iterations in the sampled region, but does not serve to speed up the slow convergence of η^n to η*. It is interesting to note, particularly in relation to the slowness of convergence, that our equation 3.21 was first derived by Laplace in 1775. It has apparently become `disreputable' (for which reason, perhaps, it is not to be found in [158, 159, 14]), precisely because of the high probability it assigns to events that are known to be possible but not observed (which leads in our application to the slow spreading into the non-sampled region). In certain applications this can lead to counter-intuitive predictions. We became aware of this (after completing the work of this chapter) through [162], in which a different, though still Bayesian, formulation of the problem (starting with a uniform prior on all `strings' of configurations of length N_c rather than on the unknown state probabilities) is used to arrive at a result which resolves some of the counter-intuitive cases, and performs demonstrably better in a test on data compression. This result is identical to equation 3.21 in the case where all macrostates are visited, but generally gives a much smaller probability to unvisited states; in our case we would assign P̃^n(E) = O(1/N_c²) where C(E) = 0. This would lead to a maximum change in η_i of ln(N_c²) = 2 ln(N_c), so convergence (if it remained uniform) would require about half as many iterations. However, it also appears that this choice might well lead to an overestimate of P^{n+1} in the states just past the edges of the histogram C^n, with consequent slowing of convergence. In any case, the poor scaling with L of the time for convergence would remain the same. Another method is still required for all but small systems.

Figure 3.7. Main figure: convergence of the weighting function η^n using the VS method for Ising β = 0.55, L = 16, N_AV = 300, 1 block per iteration, uniform prior used in updating. Iterations 1-22. Inset: detail of −160 < E < −120.

Figure 3.8. η^final − η^n for E = −112 using the VS method for Ising β = 0.55, L = 16, N_AV = 50, 1 and 6 blocks per iteration, uniform (dashed line) and Normal (triangles with error bars) priors used in updating. Inset: detail of later iterations.

3.2.3 Methods Using Transitions

It is apparent that the major problem with the above method of evolving the multicanonical distribution is that large areas away from the central peak of the Boltzmann distribution are initially not sampled at all, and we cannot make reliable inferences about them; for example, we have seen that assuming a multinomial form for the likelihood and evaluating ⟨P^n⟩ leads to an assignment of a constant probability in the non-sampled region, when it is clear physically that P^n will decrease for at least some distance away from the sampled peak. (There may be other peaks lying some distance away.) Convergence is thus rather slow.
We can try to increase the speed of convergence by fitting a function to P̃^n or to η^{n+1} in the sampled region and extrapolating it into the wings of the distribution. In this way we make some use of our knowledge about the likely shape of P^n that is unused by the VS method of section 3.2.1. However, as we discussed there, the distance of extrapolation cannot be made very large because of the danger of heavily underestimating P^n in the unsampled region. If this happens, the next sampled distribution may then put almost all its weight in this region, and another extrapolation, if made with a uniform prior and extending into the originally-sampled region, will then result in the loss of all the information that we have built up there. Convergence therefore becomes irregular and awkward, with the latest η frequently needing to be discarded and a return made to earlier ones. Linear extrapolation is fairly safe, because P^can will usually have a negative second derivative and so will be smaller than P̃^extrap (thus η^{n+1} < η). It still needs to be combined with constraints on the distance of extrapolation, though, to avoid problems with subsidiary maxima in other regions of macrostate space (where P^can's second derivative is not negative). There is still some danger of overestimating η, because the gradient must be estimated from the points near the edge of the sampled region, where statistics will inevitably be poor. The prescription suggested in [108] is to choose some cut-off state near, but not too near, the bottom of the sampled histogram and extrapolate from there, so that we are fairly sure that P^{n+1} will not be too large. The choice of the cut-off can either be made by hand [108] or automatically from the size of the histogram, combined with a strategy for 'retreating' from a bad extrapolation [122]. Linear extrapolation with various ad hoc constraints has been successfully applied by several authors [108, 98, 130] and found to improve appreciably the speed with which methods based on visited states extend the region that they sample.

Despite the good performance of some extrapolation methods, we shall here describe the results of a different approach to the problem. It would be very appealing if, instead of simply trying to make better inferences about states we have not sampled, we could sample all parts of the macrostate space immediately. We have developed a method to do this by using inference based on the recorded transitions of the Markov chain, rather than on the visited states. We shall call this the transition probability (TP) method. To our knowledge this method is new (it is not to be confused with Bennett's acceptance ratio method [84]), though in very recent work on an expanded ensemble-like simulation by Kerler et al. [133] it is also recognised that the observed transitions offer useful information about the sampled distribution.

Initially the system is prepared in a macrostate with a low canonical probability, such as the ground state. Unless we have already reached a multicanonical distribution, this state has extremely low probability. When we make MC updates, the system therefore moves away from this state until it is moving randomly among its equilibrium macrostates for the present sampled distribution. The process resembles the equilibration of a normal simulation. At each MC step we record in a histogram C^n_ij the transition performed, between energy macrostate i before the step and macrostate j after it. (Rejected trial moves and accepted moves that do not change the macrostate are recorded alike in C^n_ii.)
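A minimal sketch of this bookkeeping follows, with a toy nearest-neighbour dynamics standing in for the real spin-flip update (the function names and the biased walk are illustrative choices of our own, not the simulation of the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def record_transitions(n_states, n_steps, step):
    """Accumulate the transition histogram C[i, j] along a Markov chain.

    step(i) performs one MC update and returns the new macrostate;
    rejected moves return i itself and are recorded on the diagonal C[i, i].
    """
    C = np.zeros((n_states, n_states), dtype=np.int64)
    i = 0                                # release from the 'unlikely' end state
    for _ in range(n_steps):
        j = step(i)
        C[i, j] += 1
        i = j
    return C

# stand-in dynamics: a biased nearest-neighbour walk on 65 macrostates,
# mimicking a release that drifts towards the equilibrium states
def step(i, n=65, p_up=0.6):
    j = i + 1 if rng.random() < p_up else i - 1
    return min(max(j, 0), n - 1)         # reflecting walls -> diagonal counts

C = record_transitions(65, 100_000, step)
```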
We then repeat the process, if necessary, for a release from an unlikely state at the other end of the macrostate space. Then the entire procedure is repeated until the array of recorded transitions is reasonably full. We used an ad hoc criterion based on a parameter N_TP to decide this: C^n is deemed full when C^n_{i,i+1} + C^n_{i+1,i} ≥ N_TP for all i.

Now suppose the transition probabilities between macrostates in the sampled distribution are σ^n_ij, i.e. σ^n_ij ≡ P(i → j | i; P^n). We can use C^n_ij to give an estimator of this. The maximum likelihood estimator is σ̃^n_MLE,ij = C^n_ij / Σ_j C^n_ij, which as before needs fixing if C^n_ij = 0 for macrostates between which transitions are allowed. We preferred to use σ̃^n_AV,ij, for which (by a calculation similar to that in equation 3.21, based on a uniform prior on σ^n_ij for allowed transitions) it can be shown [163] that

\[ \tilde\sigma^n_{AV,ij} = \frac{C^n_{ij} + 1}{\sum_j \left(C^n_{ij} + 1\right)}. \tag{3.25} \]

This expression requires no fixing for the case C^n_ij = 0, though like 3.21 it is obtained by assuming a uniform prior for σ^n_ij and so does not contain information from earlier iterations. Before we proceed to obtain an estimator of P^n itself, we shall discuss the circumstances under which it is legitimate to consider the matrix of transitions between macrostates as describing a Markov process, and how it can be related to the matrix of transitions between microstates, which is the true determiner of the microscopic dynamics of the system and thus, ultimately, of the dynamics of the transitions between macrostates.

Transitions Between Macrostates as a Markov Process

Let us label the microstates with r, s and the macrostates with i, j, so we write P^n_r and P^n_i (≡ P^n) to mean the equilibrium state probabilities under the prevailing sampled distribution². Thus

\[ P^n_r = (1/Z) \exp[-\beta E(r) + \eta^n(r)] \quad \text{and} \quad P^n_i = (1/Z) \exp[-\beta F_i + \eta^n_i]. \]

The particular set of microstates in macrostate i we shall write as r ∈ i. We assume that the macrostates partition the microstates exhaustively and uniquely. Then the transition matrix for the macrostates is, at time t,

\[ \sigma^n_{ij}(t) = P(i \to j \mid i; \eta^n)(t) = \sum_{r \in i} \sum_{s \in j} P(r \to s \mid r; \eta^n)\, P^n(r|i)(t) = \sum_{r \in i} \sum_{s \in j} P^n(r|i)(t)\, \sigma^n_{rs}, \]

where σ^n_rs is the transition matrix for the microstates, which is not time-dependent. We would like σ^n_ij to define a simple (non-time-dependent) Markov process. In general, this will only happen if Σ_{s∈j} σ^n_rs = constant ∀ r ∈ i. In that case it does not matter what P^n(r|i)(t) is, and we have what is called a mergeable process [164, chapter 1.4]: the microstructure of each macrostate is completely irrelevant to the behaviour of the macrostates. In our case this condition will not be satisfied. However, let us suppose that the underlying process is in a sort of 'local equilibrium', in the sense that P^n(r|i) is constant with time at its equilibrium value (given by the Boltzmann distribution, since η does not affect the relative equilibrium probabilities within each macrostate). Then we can regard σ^n_ij as defining a Markov process, and we get simply

\[ P^n(r|i)(t) \approx P^n(r|i) = P^n_r / P^n_i \tag{3.26} \]

and

\[ \sigma^n_{ij} = \sum_{r \in i} \sum_{s \in j} P^n(r|i)\, \sigma^n_{rs}. \tag{3.27} \]

We shall discuss the extent to which equation 3.26 is satisfied later.

² Note that this is different from the notation used in section 1.2.2, where i and j were used for microstates.
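In code, the estimator of equation 3.25 is essentially one line; the sketch below applies the '+1' to every entry for brevity, whereas in practice it is applied only to the transitions allowed by the update rule:

```python
import numpy as np

def sigma_av(C):
    """Posterior-mean estimator of the macrostate transition matrix,
    equation (3.25): sigma_ij = (C_ij + 1) / sum_j (C_ij + 1)."""
    C = np.asarray(C, dtype=float) + 1.0
    return C / C.sum(axis=1, keepdims=True)

sigma = sigma_av([[5, 2, 0],
                  [1, 8, 3],
                  [0, 2, 9]])
assert np.allclose(sigma.sum(axis=1), 1.0)   # rows are probability vectors
```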
Assuming equation 3.26 for now, and applying to it the detailed balance condition that σ^n_rs is known to satisfy, we then obtain

\[ \sigma^n_{ij} = \sum_{r \in i} \sum_{s \in j} P^n(r|i)\, \frac{P^n_s}{P^n_r}\, \sigma^n_{sr} = \frac{1}{P^n_i} \sum_{r \in i} \sum_{s \in j} P^n_s\, \sigma^n_{sr} = \frac{P^n_j}{P^n_i}\, \sigma^n_{ji}, \tag{3.28} \]

where in the last step we have used

\[ \sigma^n_{ji} = \sum_{s \in j} P^n(s|j) \sum_{r \in i} \sigma^n_{sr} = \frac{1}{P^n_j} \sum_{r \in i} \sum_{s \in j} P^n_s\, \sigma^n_{sr}. \]

So if σ^n_rs obeys detailed balance, then so does σ^n_ij. But this then necessarily means that the equation P^n_j = Σ_i P^n_i σ^n_ij is satisfied, so that the equilibrium distribution P^n is the left eigenvector of the transition matrix σ^n. Thus we have proven the following. Firstly, if the probability distribution of microstates within each macrostate is constant with time, transitions between macrostates can be regarded as themselves defining a Markov process, determined by the transitions between the microstates (which we shall call the underlying Markov process). Secondly, if the above holds and the probability distribution of microstates within each macrostate is the same as it would be in the equilibrium distribution P^n_r, then the Markov process of the transitions between macrostates has P^n_i as its stationary distribution. We have checked these results explicitly for small matrices, binning some of the states and confirming that the equilibrium state probabilities change in the way expected. We may define the macrostates as any exhaustive and unique partitioning of the microstates.

As we shall show, this result has many useful consequences: we may use the theory of Markov processes [164] to relate the matrix σ^n to, for example, the mean and variance of the number of visits to each state in a run of a particular length. Important results relating to the expected error of sampling from the Markov chain then follow (see section 3.4). This analysis cannot be applied directly to the transition matrix for the microstates, because it is too large, but the effective transition matrix for the macrostates will usually be of a manageable size: for the energy macrostates of the Ising model, with single-spin-flip Metropolis, its 'natural' form is pentadiagonal, of size ∼ L^d × L^d, and it can be reduced further in size by binning the macrostates, which also makes it tridiagonal. In contrast, the microstate matrix has size 2^{L^d} × 2^{L^d}.

What of the validity of the approximation made in equation 3.26, that P^n(r|i)(t) may be replaced by its equilibrium (Boltzmann) value? To the extent that the macroscopic variables (the macrostate labels) are the slowest to evolve, one may expect this approximation to be reasonable, and to improve as P^xc is approached, where the simulation moves in an increasingly diffusive, less directional fashion which gives time for relaxation of P^n(r|i). We have found good evidence (see section 3.2.5) that the approximation is indeed essentially exact in the multicanonical limit. Moreover, we emphasise that, whereas equation 3.21 was obtained by using a model that is in fact known to be wrong (a multinomial distribution of counts, each bin independent of the others), equation 3.25 assumes only that each transition is independent of the preceding ones. This is indeed true; it is just the definition of a Markov process, and we have just shown that the real simulation in equilibrium is described by a Markov process with transition matrix σ^n_ij.
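The small-matrix check mentioned above is easy to reproduce. The following sketch (a toy of our own construction) builds a reversible six-microstate Metropolis chain with an arbitrary target distribution, bins it into two macrostates using the equilibrium weights of equations 3.26 and 3.27, and confirms that the binned probabilities are the left eigenvector of the binned matrix:

```python
import numpy as np

rng = np.random.default_rng(2)

# a small reversible chain: Metropolis dynamics on 6 microstates
p = rng.random(6); p /= p.sum()                  # target microstate probabilities
S = np.zeros((6, 6))
for r in range(6):
    for s in range(6):
        if r != s:
            S[r, s] = min(1.0, p[s] / p[r]) / 5  # uniform proposal to the others
    S[r, r] = 1.0 - S[r].sum()                   # rejected moves stay put

# bin microstates {0,1,2} -> macrostate 0 and {3,4,5} -> macrostate 1,
# weighting rows by the equilibrium probability within each bin (eqs 3.26, 3.27)
bins = [np.array([0, 1, 2]), np.array([3, 4, 5])]
P = np.array([p[b].sum() for b in bins])
sigma = np.zeros((2, 2))
for i, bi in enumerate(bins):
    for j, bj in enumerate(bins):
        sigma[i, j] = np.sum((p[bi] / P[i])[:, None] * S[np.ix_(bi, bj)])

# the binned probabilities are the left eigenvector of the binned matrix
assert np.allclose(P @ sigma, P)
```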
It might be argued that the approximation will be less good in the early iterations, where the system will, for at least part of the time, be moving rapidly from a release state with a very low P^can to more probable states. However, we have found in practice, as we shall describe below, that even the first iteration often gives a surprisingly good estimate of P^n.

Now, to return to the task of estimating P^n: it is clear that, having used equation 3.25 to find σ̃^n_AV,ij, we may use P̃^n_E, the λ = 1 eigenvector of its transpose, as an estimator of P^n. Although the transition matrix σ^n is N_m × N_m, it may be (indeed, should be) chosen sparse, or even tridiagonal, by binning the macrostates and/or choosing the matrix R (introduced in section 1.2.2) to prohibit transitions between widely separated energies (which are in any case very unlikely to be accepted). If this is done, it takes only O(N_m) operations to find the eigenvector. Indeed, if σ^n is tridiagonal, it is trivial to find P̃^n_E using equation 3.28: we neglect normalisation initially and take P̃^n_{E,1} = 1; then we use P̃^n_{E,i+1} = P̃^n_{E,i} σ̃^n_{i,i+1}/σ̃^n_{i+1,i} to generate all the others successively, and finally impose Σ_i P̃^n_{E,i} = 1. In the first one or two iterations, when P^n still differs substantially from P^xc, it may be necessary to work with the logarithms of P̃^n to prevent arithmetic overflow, and it is necessary to generate P̃^n in the direction in which it is increasing to prevent the build-up of rounding errors. Thus we should start from the release states, which were chosen because of their low P^can, and iterate towards the equilibrium states.

As in section 3.2.1, it is possible to incorporate prior information from previous iterations, combining the latest estimate with the previous one using the variance as a weight. However, we have found that this slows down the very rapid initial convergence that is the main advantage of this method, and is only of advantage near to the multicanonical limit, where, as we shall see, the visited-states method is probably preferable. Therefore we update the preweighting coefficients using the simple expression η^{n+1} = η^n − ln P̃^n_E + k of equation 3.14. The procedure is thus:

0. start with η^1 = 0;
1. record the histogram C^n_ij:
(a) release the simulation from an 'unlikely' macrostate;
(b) perform several thousand spin updates;
(c) go to 1b until the simulation has moved to equilibrium, or until it is moving only through macrostates that have all been visited enough times;
(d) go to 1a, choosing a different release state if necessary, until all macrostates have been visited enough times;
2. estimate the transition matrix using equation 3.25;
3. estimate the eigenvector P̃^n_E;
4. set η^{n+1} = η^n − ln P̃^n_E + k;
5. if the procedure has not converged, go to 1; otherwise stop.

We have tested this method on the 2d Ising model with preweighting both of energy and of magnetisation; we describe the results for energy first. We used two release states, the ground state at E = −2L^2 (spins all up or all down with equal probability) and the infinite-temperature states around E = 0 (where we simply generated a starting state with each spin randomly up or down). Simulations launched from these states covered complementary parts of macrostate space, those coming from E = −2L^2 approaching the 'finishing' states (around ⟨E⟩ for small η^n) from below and those from E ≈ 0 approaching them from above.
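A compact sketch of steps 2–4 for a tridiagonal chain, using the recursion from equation 3.28 and working in logarithms as described above (the toy histogram is purely illustrative; in a real run it would come from the release simulations):

```python
import numpy as np

def tp_weights(eta, C):
    """One transition-probability (TP) update for a tridiagonal chain.

    The stationary vector is built from equation (3.28),
    P_{i+1}/P_i = sigma_{i,i+1}/sigma_{i+1,i}, starting at the release
    state and working in logs to avoid overflow; the weights are then
    updated as eta^{n+1} = eta^n - ln P + k (equation 3.14).
    """
    sigma = (C + 1.0) / (C + 1.0).sum(axis=1, keepdims=True)   # eq. (3.25)
    log_p = np.zeros(len(eta))
    for i in range(len(eta) - 1):
        log_p[i + 1] = log_p[i] + np.log(sigma[i, i + 1] / sigma[i + 1, i])
    log_p -= np.log(np.sum(np.exp(log_p - log_p.max()))) + log_p.max()
    eta_new = eta - log_p
    return eta_new - eta_new.min()

# toy histogram on 5 macrostates (nearest-neighbour transitions only)
C = np.array([[40, 60,  0,  0,  0],
              [20, 50, 30,  0,  0],
              [ 0, 25, 40, 35,  0],
              [ 0,  0, 20, 60, 20],
              [ 0,  0,  0, 10, 90]], dtype=float)
eta = tp_weights(np.zeros(5), C)
```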
The iterative scheme outlined above was therefore modified to run with step 1b alternated for the two simulations, so that we could use the crossing-over of the simulations as a criterion of their having 'moved to equilibrium.' To keep the matrix σ̃ tridiagonal, we blocked the E-macrostates so that the width of each block was ΔE = 8 (each blocked macrostate except the lowest thus containing two of the underlying macrostates). The parameter N_TP was set to 600 for L = 16 and 1200 for L = 32, which meant that each iteration took a rather shorter time than a single iteration of the visited-states method (since we are now counting spin flips, not lattice sweeps). We should note that the VS method performs more than twice as many spin updates per second as the TP method does, because in the TP method the data histograms must be updated at every spin flip, the spins for updating must be chosen at random, and we must check for finishing and re-initialise the lattice more frequently. In the visited-states (VS) method, the fundamental update step is a complete lattice sweep rather than a single spin flip, and so there is less bookkeeping and two calls to the random number generator per flip are saved. For this reason the TP method would not, in normal circumstances (cf. section 3.2.5), be a candidate for use in the 'production' stage of an Ising simulation, even if it were not for its other disadvantages (see below).

The results for the convergence of η^n are shown in figures 3.9 and 3.10. It is apparent that the shape of η is outlined, albeit with quite a lot of noise, right from the first iteration using this method. The superiority, at least in the early iterations, that this method has over the VS method is demonstrated more clearly in figures 3.11 and 3.12, where the difference between η^n(E = 0) and η^final(E = 0) is plotted for both methods (η^final is established using a long visited-states run for L = 16, and finite-size scaling [see section 3.2.4] followed by a long visited-states run for L = 32).

Figure 3.9. Convergence of the weighting function η^n, n = 2–20, using the TP method for Ising β = 0.55, L = 16.

Moreover, the advantage of using the transition probabilities clearly increases with increasing system size; for L = 32 it produces a usable weighting function after about fifteen iterations (about 1 hour), while extrapolation of the VS results suggests that it would take at least ten times as long (probably appreciably more, since in the few early iterations that were performed, far fewer than L^d macrostates were sampled).

We can see why the TP method converges so much faster by considering the maximum change in η^n that we can expect to produce in one iteration. Suppose we make N_R releases from one of the unlikely starting states in the course of one iteration. Then the maximum difference in the estimated probability of two adjacent states, i and i + 1, would arise if every one of the simulations followed a trajectory that took it from i to i + 1 and then on to i + 2, etc., never returning to i. In this case we would have C_{i,i+1} = N_R and C_{i+1,i} = 0, and we would estimate

\[ \frac{\tilde P^n_{i+1}}{\tilde P^n_i} = \frac{\tilde\sigma^n_{i,i+1}}{\tilde\sigma^n_{i+1,i}} = N_R + 1, \]

which would mean that η^{n+1}_{i+1} − η^{n+1}_i = ln(N_R + 1).

Figure 3.10. Convergence of the weighting function η^n, n = 2, 3, ..., 12, 14, 16, 18, 20, 25, 30, 35, using the TP method for Ising β = 0.55, L = 32.
But a change in the difference of η of this magnitude can now be produced between every pair of adjacent states in the chain, so that the total available is Δη_max = N_m ln(N_R + 1) ∼ L^d ln(N_R + 1). Thus even for a fairly small N_R, 10–100 say, the method is able, at least in theory, to converge on the first iteration no matter what the system size. The time taken per iteration should also increase only like N_m ∼ L^d.

In practice the method does not converge on the first iteration, and there is clearly a small residual bias remaining even after many iterations: the weight necessary to reach E = 0 is overestimated. We shall discuss these two problems in turn. The first problem is largely due to the blocking of the macrostates, which compromises the assumption underlying equation 3.26, that a local equilibrium is maintained within each (blocked) macrostate. For this to be true, all degrees of freedom within the macrostate must relax on a faster time scale than that characterising the transitions between the macrostates, which is clearly not the case, since the blocked macrostates now contain different values of the energy. In fact, it is not hard to show that fewer transitions occur in the direction (through macrostate space) in which P^n is increasing, and more occur in the opposite direction, than would be the case if local equilibrium were established within each blocked macrostate. This result follows from considering the fact that transitions that cross the boundary between blocked macrostates are more likely to come from underlying macrostates near the boundary. The upshot is that P̃^n_E continually underestimates the change in η^n required to reach η.

Figure 3.11. L = 16: convergence of the weight η^n(0) as a function of iteration number for both the TP method and the VS method. The ordinate shows the difference between η^n and η^final, where η^final is the limiting behaviour of the VS method.

This explains a large part of the behaviour of the method, though in the early iterations it should be noted that there is a particularly large underestimate of the weight required in states around E/L^d ≈ −1.4. We attribute this to the complex behaviour of the system near the critical point, where the typical canonical configurations have more or less these energies. In the early iterations, when the system moves ballistically down to these states from E = 0, there is not enough time for relaxation of the large clusters typical of criticality. In later iterations, when the motion is more diffusive, there is time for this relaxation, equation 3.26 is better satisfied, and the anomaly in η^n disappears.

We do not have such a full understanding³ of the bias in the limiting behaviour of P̃^n_E; it may simply be a result of the fact that

\[ \left\langle \tilde P^n_E(\tilde\sigma^n) \right\rangle \neq P^n, \tag{3.29} \]

in which case better performance would be obtained by making each iteration longer, which would reduce the bias. For the very long runs in section 3.2.5, for example, the bias is not a problem.

³ Though the analysis of section 4.3.3 may well be applicable here too.

Figure 3.12. L = 32: convergence of the weight η^n(0) as a function of iteration number for both the TP method and the VS method.
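To put rough numbers on the comparison (ours, for illustration only: we take N_m ≈ 65 blocked macrostates for the L = 16 energy chain with ΔE = 8 blocking, and N_R = 10 releases),

\[ \Delta\eta^{VS}_{\max} \approx \ln N_{AV} = \ln 50 \approx 3.9, \qquad \Delta\eta^{TP}_{\max} \approx N_m \ln(N_R + 1) \approx 65 \ln 11 \approx 156, \]

so a single TP iteration can in principle span the range of η of order 10^2 that figure 3.9 shows is required, where the VS method needs dozens of iterations.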
As a test of this conjecture, we show in figure 3.13 the difference η^final(0) − η^n(0) for an L = 16 simulation in which the 'fullness criterion' N_TP is doubled at each step, starting at N_TP = 1200 and increasing to N_TP = 153600 by the 8th iteration. We begin not with η = 0 but with η^8 from the run shown in figure 3.9, for which η(0) has not yet exceeded its limiting VS value. We combine the increase in N_TP with jackknife bias-correction (see [191] and appendix D), which however assumes that the bias goes as 1/N_TP. It is apparent that the bias is indeed much reduced by increasing N_TP, but it is clearly much more persistent than 1/N_TP. However, using very large N_TP, while it may remove the bias, also clearly removes the TP method's main advantage, that of rapid convergence. Unless (as in sections 3.2.5 or 4.3) there is some particular reason for sticking with measurements of transitions, we would recommend switching to visited states when η^n is changing between iterations by less than could be obtained by visited states in the same time. If the final η^n does not then immediately produce a viable multicanonical distribution (the bias is scarcely more than an order of magnitude), only one or two iterations of the VS method will be required to reach it.

Figure 3.13. L = 16: convergence of the weight η^n(0) as a function of iteration number for the TP method with increasing N_TP and bias correction.

If the distribution P^can has more than one peak, then the method needs slight modification. For example, consider magnetisation preweighting for an Ising model with β ≥ β_c. If we are to sample all of the macrostate space by releasing the system from a state of low probability and letting it move to one of high probability, then we need a release point at M = 0 as well as at M = ±|M_MAX|; otherwise, states around M = 0 will rarely if ever be visited. We could initially impose the constraint M = 0 and generate equilibrium configurations with this constraint before allowing M to change, but here we use random M = 0 configurations. In this case we have adopted a two-stage process. The first stage serves to outline the structure of the macrostate space and bootstraps the second stage, which completes the weight assignment. In the first stage, we perform a sequence of simulations launched, successively, from one of the three initial states; then, once this has converged, we refine the weights using transition probability data gathered using only the two ordered microstates as starting states. We do this extra refinement because it is not generally to be expected that the limiting set of weights produced with three launch states will be multicanonical, because the conditions presupposed in equation 3.26 are not fulfilled. A typical M = 0 microstate (at β = β_c) comprises large clusters of spins of the same orientation, which take a long time to evolve. As a result, the information the algorithm gleans about the M = 0 macrostate is biased by the systematic launch from a microstate which, being 'random', has an energy significantly higher than those typical of M = 0. For this reason the algorithm would be expected to overestimate the weight to be attached to the M = 0 macrostate in the first stage.
But this then ensures that in the second stage, simulations released from the M = ±L^d states will be able to reach the M = 0 states, which they would not with the canonical weighting. In general the method will require (N_p + 1) stages, where N_p is the number of maxima in P^can (the number of phases, if each maximum has the same weight). This scheme is shown in operation in figure 3.14 for L = 32 at β = β_c; we show only half of the macrostate space to reduce cluttering.

Figure 3.14. η for Ising with L = 32 at β = β_c = 0.440686..., inferred from one iteration of the VS method, one or five iterations of the TP method, and naive (∼ L^d) finite-size scaling of η_xc(L = 16). The solid line shows the limit established from long VS runs (performed to gather data for section 3.3.3).

It is apparent that, notwithstanding our concern that the result with three release points would be biased, the estimate of η produced is found to converge on the first iteration. The faster convergence compared with the application to energy presumably results from P^can(M) being wide and flat in the central region, so that the relaxation of M is naturally slow even for the first iteration. The relaxation time of the energy is much faster, so within only a short time equation 3.26 is approximately satisfied. The effect of the random M = 0 launch point is therefore small. It is also significant that the matrix σ^n_ij is naturally tridiagonal, so that it is unnecessary to block it. The estimate η^2 of η is in fact good enough to be used immediately in a multicanonical 'production run,' in contrast to the result of a visited-states run of the same length, or to naive (∼ L^d) finite-size scaling of η_xc(L = 16), which are also shown in figure 3.14. We also show the effect of proceeding to the second stage of refinement, using only the microstates at M = ±L^d as release points (this is marked as iteration 5 in figure 3.14, and was the 3rd iteration conducted with only two release points). In this case, because the first stage has performed so well, only a marginal further improvement is obtained, and, as with energy, there seems to be a small residual bias.

To summarise, then, we have found that the TP method provides very much faster initial convergence to the multicanonical distribution than the visited-states (VS) method of section 3.2.1; we have demonstrated its efficacy for fairly large systems (L = 16 and L = 32 for energy; L = 32 for magnetisation), where variations in canonical probability of more than one hundred decades must be covered. If the transition matrix has a suitable structure, convergence can be achieved on the first iteration; however, if this is not the case, the final convergence may be poorer than that of the VS method, and there may be a residual bias. In practice, at least for the Ising model, it is probably better to switch to the VS method for final refining when η is changing only a little between iterations.

3.2.4 Finite-Size Scaling

It will be noticed that the shape of the final η_xc(E) generated in the previous sections is very similar for different system sizes, merely being scaled by the system size L^d. This is a manifestation of the extensivity of the canonical averages and free energy away from criticality (section 1.1.3).
As a consequence, η_xc for a small system, once generated (and the methods we have examined above make it quite easy to do this), can be used to predict η_xc for a large system: we fit a function (such as a spline or Chebyshev polynomial) to the small-system η_xc, then scale and interpolate it (a brief sketch of this scaling step is given below). The predicted η can be refined, if necessary, to a multicanonical form, again by using one of the two previous methods. The refinement thus corresponds to measuring correction-to-scaling terms. In the Bayesian framework, the use of finite-size scaling (FSS) corresponds simply to beginning with 'prior information' about the sampled distribution which comes from smaller systems, reflected in a P(η|H) which has its mean at the finite-size scaling estimate and a width chosen to reflect (or to underestimate) the expected magnitude of the unknown correction terms. This has been found to work extremely well away from criticality; for the 32^2 Ising model at β = 0.55, scaling the 16^2 η_xc(E) gives an estimate accurate to within 2%, which then requires only a few iterations of the visited-states method to converge to a fully multicanonical form.

This convergence is shown in figure 3.15. The subfigures that compose this figure are to be read in the order that text would be, from left to right and top to bottom. At the top left is the initial estimate of η^1 (L = 32), produced by finite-size scaling of η_xc(L = 16). Next to this on the right is the histogram C^1 produced in a MC run sampling with η^1. We then used this histogram to produce η^2. The difference between η^1 and η^2 is very small on the scale of η, so we show the difference between them, Δη^1 = η^2 − η^1, rather than η^2 itself (2nd row, left). Sampling with η^2 then produced the second histogram C^2 (2nd row, right), and the remaining two rows of the figure show the equivalent data for iterations 3 and 4. It is apparent that spreading of the sampled distribution occurs just as in section 3.2.1; fractionally large fluctuations occur around the edge of C^1 where it goes to zero, which translate into a rather 'jagged' C^2, but the irregularities are then smoothed away in subsequent iterations. As in the previous investigation of the VS method, we do not extend sampling to those states below the peak of P^can(β = 0.55; E). The histogram broadens rather faster than in section 3.2.1, because η^1 has approximately the right shape even in the region that is not sampled in C^1. Thus, a small uniform increase of the probability of all the macrostates in this region renders many more of them accessible than would be the case if η^1 were constant there. It will be noted that in this set of runs we adopted a slightly different way of dealing with η^n above E = 0, simply cutting it off at a constant value rather than letting it increase as η(E > 0) = η(E = 0) + βE. The result is that the states above E = 0 are scarcely sampled, so when using these results to calculate free energies (as we do in sections 3.3.1 and 3.3.2) we use the symmetry of the Ising density of states about E = 0 to reconstruct P^can(E) for E > 0.

However, there are situations where the simple FSS described above cannot be applied. One is P^can(M) at β_c, where a simple FSS scaling does not correctly predict the shape of η (see figure 3.14). As we shall see in section 3.3 below, in some ranges of M-values η(M; β_c) scales like L^d, but in others it obeys different laws.
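Setting the critical cases aside for a moment, the off-critical scaling step itself is straightforward. The sketch below (assuming SciPy is available; the toy η for L = 16 is a smooth stand-in of our own, not measured data) fits a spline to η as a function of the energy density and rescales by the volume ratio:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fss_estimate(E_small, eta_small, L_small, L_large, d=2):
    """Naive finite-size-scaling estimate of eta for a larger lattice.

    Fits a spline to eta_xc of the small system as a function of the
    energy density e = E / L^d, evaluates it on the energy grid of the
    large system, and scales by the volume ratio (extensive scaling).
    """
    e = np.asarray(E_small) / L_small**d          # energy per spin
    spline = CubicSpline(e, eta_small)
    e_large = np.linspace(e.min(), e.max(), len(E_small))
    return e_large * L_large**d, (L_large / L_small)**d * spline(e_large)

# toy eta for L = 16 (a smooth stand-in only, for illustration)
E16 = np.arange(-512, 1, 8)
eta16 = 0.55 * (E16 - E16.min()) - 40 * np.log1p((E16 - E16.min()) / 64.0)
E32, eta32 = fss_estimate(E16, eta16, 16, 32)
```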
With a knowledge of the correct critical exponents, a critical P^can(M) at β_c could be scaled correctly; however, for the important case of the simulation of spin glasses (see [108]) there exists no FSS theory to predict the scaling of η. In such cases the TP method⁴ or linear extrapolation [108] would be preferable.

⁴ Though we have not yet tried to apply this method to spin glasses.

Figure 3.15. Refinement of a FSS estimate of η using VS. Top left: estimate for η^1 (L = 32), produced by finite-size scaling of η_xc(L = 16). Below on left: Δη^{n−1} = η^n − η^{n−1} for iterations 2 (top) to 4 (bottom); on right, from top to bottom, the histograms C^n for iterations 1–4.

3.2.5 Using Transitions for Final Estimators: Parallelism and Equilibration

We now present some new results which show how the multicanonical method could be implemented efficiently on a parallel computer, and how in some circumstances we can do away with the necessity of performing full equilibration of the simulation, in the sense in which it is usually meant. We may reasonably speculate that any algorithm which is to be widely used in MC simulation in the future will have to be amenable to running on parallel computers. Unfortunately, the multicanonical method cannot be parallelised in the same way as Boltzmann sampling often can be: by geometrical decomposition, in which each processor of the parallel computer looks after a subvolume of the whole simulation. Geometrical decomposition works for Boltzmann sampling when the forces between particles are short-ranged, so that the calculation of ΔE (to be used in equation 1.2.2) for each trial particle move is a local operation. Particles which are sufficiently widely separated that they cannot interact before or after any possible moves may then be updated in parallel. However, with multicanonical sampling this is no longer possible: the transition probability depends on Δη, where η is a function of the total energy or magnetisation of the system. Therefore, if we generate several trial moves in different regions of the simulation, the transition probability for each will depend on the final macrostate, which we do not know a priori, since it will depend on how many of the other moves are accepted⁵.

However, some kinds of parallelisation are still possible. First, it is permissible to generate several trial moves with geometric decomposition and then to perform just one, chosen at random from those that would be accepted. This would produce a speeding-up in a situation where the acceptance ratio was very low, but not otherwise. It is also permissible to update in parallel with geometric decomposition if the moves that we generate are chosen to keep the value of the preweighted variable, and thus η, constant; this would correspond to Kawasaki dynamics on the Ising model with η = η(M). This kind of parallelisation was used (in combination with primitive parallelism; see below) for the simulations of chapter 4.
Moves that change η must be performed serially, and so the total speed-up with parallelisation is unlikely to be very large, since the preweighted variable, which varies over a wide range in a multicanonical simulation and thus has the slowest fluctuations, still has to be explored in serial.

⁵ Very recently, and in a different context, a 'fuzzy' MC method has been introduced [165] which, we speculate, may enable errors introduced by parallel multicanonical updating to be corrected. This is one area of possible future investigation.

This kind of parallelisation is better suited to the expanded ensemble (see section 3.4.1 for a discussion of the kinship between this and the multicanonical ensemble), where the number of η-weighted states is generally quite small. Finally, we may simply perform 'primitive parallelism,' where N_r copies, or replicas, of the simulation are run in parallel, one on each processor. The results of all the replicas are then averaged to give estimators whose statistical errors are lower by a factor √N_r. It is this kind of parallelisation that we shall now discuss in more detail, showing how the multicanonical ensemble is particularly suited to it.

Imagine performing a simulation with primitive parallelisation where both N_r and L^d, the system size, are quite large. As we discussed in section 1.2.2, a MC simulation usually needs an equilibration period before unbiased results are produced (i.e. before configurations are generated with probability P^n, where P^n is the eigenvector of the microstate transition matrix σ^n_rs that we have chosen with the goal of sampling P^n). In the case we are discussing, as N_r and L^d increase, the equilibration time becomes a larger and larger fraction of the total simulation time: once equilibrium is reached, a short run suffices (because N_r is large), but getting to equilibrium takes a long time (because L^d is large). This problem afflicted us quite severely in the simulations of section 4.2. The multicanonical ensemble might seem to offer a way of solving this problem, because macrostates that require no equilibration (the ground state) or that equilibrate very fast (infinite temperature) have a high probability under it. Can we therefore simply start every simulation in one of these states and begin collecting data immediately, with little or no equilibration? If conventional estimators are used, the answer is no; but we shall show that by using the eigenvector estimators we have just introduced, we may indeed do just that.

The problem with using conventional visited-states estimators (P̃^n_i ∝ C^n_i) is that states near to the starting state (and the finishing state too, in fact) receive too much weight. Suppose we have a perfect multicanonical distribution in magnetisation, P(M) = const. Let us start each of our simulations in the negative-magnetisation ground state and let it evolve until it has done just a few random walks over the whole range of M (this will take quite enough time, since we are assuming that L^d is large). The expected distribution of C(M) is then as shown (qualitatively) in figure 3.16; even though the underlying P(M) is flat, there is a concentration of probability near the starting state, and this only disappears in the C(M) → ∞ limit. What we have is basically a diffusion problem: the probability of being in state M at time t, P(M, t|M₀, 0), starts
as a delta function at M = M₀, t = 0 and slowly spreads out over the allowed states, finally becoming uniform only as time goes to infinity. The expected C(M) is proportional to the sum over time of P(M, t|M₀, 0), and the acceptance ratio in each state plays the role of a diffusion coefficient. We should note that this problem is always present in MC simulation, but its effect is usually very small; the number of sampled macrostates in a Boltzmann sampling simulation is small and the Markov chain is long, so the slight bias toward the starting state, which dies away as 1/N_c, is swamped by the random errors of order 1/√N_c. It occurs even in a simulation which is well 'equilibrated', by which we mean that the memory of any transient initial state, which may have had a very low probability under the prevailing sampled distribution, has died away, and (if the simulation were a 'black box' that we could not look into) we would ascribe to every microstate and macrostate its equilibrium probability. But of course in fact the simulation is in exactly one state with probability one when we begin sampling. To ignore this fact has little effect in a Boltzmann sampling simulation, but it would lead to serious inaccuracy here, where the number of sampled macrostates is large and the Markov chain is comparatively short.

Figure 3.16. Expected histogram of visited states for a diffusive system with 257 states (with a separation of 2). It is assumed that we begin in the state at −256 and do 10^5 steps, moving only to the two adjacent states with equal probability. This figure was generated by solving the diffusion equation for a system with a constant diffusion coefficient and bears only a qualitative relationship to the Ising problem.

We can largely overcome this problem if we record the transition histogram C_ij and use equation 3.28 to give eigenvector estimators of the equilibrium state probabilities; as we have already seen when we were using the method to find the multicanonical distribution, this largely removes the bias of the starting and finishing states. The actual number of visits to each state should not affect our estimators of P^n, except in so far as statistics will be poorer for the states that are more rarely visited. Nevertheless, there are two possible sources of bias in the result, one coming from equation 3.29, and the other from the fact that the use of eigenvector estimators does depend on P(r|i) having its Boltzmann value (see section 3.2.3). The bias coming from equation 3.29 should get smaller as the number of counts gets bigger. For this reason, we pool the results of all the runs to define the eigenvector estimator, and use jackknife blocking to give the error. As regards the other source of error, it is not, in fact, the case that P(r|i) necessarily has its Boltzmann value, even given that it did at t = 0, but we argue on physical grounds that in the multicanonical ensemble, where movement through the macrostates is always diffusive rather than directional, the macroscopic variables will evolve much more slowly than the microscopic, so giving time for a local equilibrium within each macrostate to re-establish itself, which must be the Boltzmann equilibrium, since the microstate transition rules are chosen to produce this.
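The qualitative behaviour of figure 3.16 is easily reproduced by a direct random-walk simulation rather than by solving the diffusion equation; a minimal sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random walk on 257 states (M = -256, -254, ..., +256), started at M = -256
# as in figure 3.16; the accumulated visit histogram stays biased towards
# the release state even though the stationary distribution is flat.
n_states, n_steps = 257, 100_000
counts = np.zeros(n_states)
i = 0
for _ in range(n_steps):
    counts[i] += 1
    j = i + rng.choice((-1, 1))
    if 0 <= j < n_states:                # reflecting boundaries
        i = j

# visit frequency near the start vs the middle vs the flat value 1/257
print(counts[0] / n_steps, counts[n_states // 2] / n_steps, 1 / n_states)
```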
This local-equilibrium assertion is confirmed implicitly by the results for P^xc(M), and more directly by measurements we have made of spin-spin correlation functions. We show some results for the L = 16 and L = 32 2d Ising model in figures 3.17 and 3.18. We use magnetisation preweighting at β_c, with an η(M) that is symmetric about M = 0, so we know that the underlying distribution P^xc(M) should be symmetric even if it is not quite flat. We perform 500 runs, starting in the M = −L^d ground state. These results were in fact generated using a serial HP-700 computer, but the important point is that they could have been generated in parallel, since each run started from the same state and was independent of all the others. For L = 16 each run was stopped when the simulation returned to that state, after having in the intervening period visited the M = +L^d ground state; for L = 32 we simply performed a constant number of updates (3 × 10^7 flip attempts) for each run, so the finishing states were distributed over all the macrostates. Then the transition histograms from all the runs were pooled for the estimation of P^xc(M). It is apparent that using C(M) (solid line) to give the estimator of P^xc(M) would produce a systematic error, but that this bias is removed, to within the random error, by using estimators from the transition matrix. The slightly biased jackknife estimator from the pooled transitions is called evecJ (open triangles), and the double-jackknife bias-corrected estimator is evecJJ (filled triangles). The error bars are shown only on evecJJ but are about the same size on evecJ. On the L = 16 graph we also show an evecJ estimator calculated just from the first 100 runs (circles); there is an obvious bias present here, although it is in fact still no larger than the random error, which is, of course, also larger this time. For L = 16 there is very little difference between evecJ and evecJJ, but for L = 32 it does appear that there is a systematic error in evecJ which is eliminated by the bias-correction. In a real implementation, P^can(M) would of course be recovered from P^xc(M) for the determination of free energies and canonical averages, though we have not done this here.

Figure 3.17. Normal and eigenvector estimates for P^xc(M) at β_c; L = 16.

We chose to study P(M) at β_c to show that the method of doing many short runs can cope with the large, slowly-evolving clusters of criticality without introducing a bias. We do not expect that it should: if we allow two random walks over the whole range of M (repeated enough times in parallel), then we expect to generate all the possible cluster structures.

Figure 3.18. Normal and eigenvector estimates for P^xc(M) at β_c; L = 32.

There can be no structure whose generation requires a complex re-entrant trajectory with several visits to each end of the chain of states, because the process is Markovian, and so starting once from the non-degenerate end state is equivalent, in terms of the probabilities of what happens
We have also investigated local spin-spin correlation functions to corroborate the claim that P (ji) has its Boltzmann value to good accuracy. Let the spins be represented by s, and let r be a vector from one lattice site to another. Then (r; M ), the correlation function with magnetisation M , is dened by 2 (r; M ) = < s<(0)ss2(>r) >< s<>s2 > > M M 2 L 2d = < s(0)s1(r) M 2 L 2d given that P r0 s(r0 ) = M where r0 runs over the lattice sites. To estimate < s(0)s(r) >M we calculate X &~ L d s(r0 )s(r0 + r) r0 CHAPTER 3. MULTICANONICAL AND RELATED METHODS 150 and then our estimator of < s(0)s(r) >M is the average of &~ over those congurations which have magnetisation M . We consider r only along the rows and columns of the Ising lattice, and average over equivalent directions, so that r becomes a scalar: r = 1; 2; L=2 in units of the lattice spacing. We have measured (r; M ) for the 322 2d Ising model under two conditions; rst for a series of 50 runs like those that produced the eigenvector estimator 3.18, and second for a single long run containing the same number of congurations in total, starting in a random conguration (therefore probably near M = 0) and allowed to equilibrate for 5 107 spin ips (twice as long as required for a random walk from M = Ld to M = +Ld and back) before gathering data. The rst set of conditions thus gives (r; M ) as produced by the eigenvector estimator, and the second gives it under multicanonical equilibrium conditions. The results are shown in gure 3.19; it is apparent that the two correlation functions are identical to within experimental error. In particular, it should be noted that there is no evidence of M ! M asymmetry in the `eigenvector' runs, even though the visited states histogram C is decidedly asymmetric. As regards the shape of the correlation functions themselves, we observe that, as we would expect, (r; M ) decreases with increasing r and is close to zero for large jM j. If r is small it then increases to a maximum at M = 0, but for r L=2, it is negative around M = 0, a consequence of the tendency of the system to exist in large clusters of opposite magnetisations. To summarise, then, we have shown that we can remove the bias of the starting state in MC simulation by the use of the transition matrix to measure macrostate probabilities combined with the multicanonical ensemble's ability to reach non-degenerate macrostates (so we do not have to worry about the probability distribution of microstates within the starting macrostate). This opens up the possibility of doing simulation without full equilibration of the preweighted variable, and thus of massively parallel implementations of the multicanonical ensemble in which each processor does only a short run on a large system. We do not, however, recommend using this method for simulation on serial computers, because, as we said in section 3.2.3, the TP method performs fewer spin updates/sec than does the normal (VS) method. If there is no problem with bias, as is the case with a serial implementation where a single long Markov chain can be produced, then any extra speed is clearly advantageous. It is also obviously better not to have to do bias-correction if it can be avoided. CHAPTER 3. MULTICANONICAL AND RELATED METHODS 151 0.8 0.6 γ (r,M) 0.4 r=1 r=2 r=3 r=5 r=7 r=10 r=15 0.2 0.0 -0.2 -1024 -512 0 512 1024 512 1024 M 0.8 0.6 γ (r,M) 0.4 r=1 r=2 r=3 r=5 r=7 r=10 r=15 0.2 0.0 -0.2 -1024 -512 0 M Figure 3.19. Estimators of the correlation function (r; M ) for L = 32 Ising. 
If the system is so large that only a few random walks over the preweighted variable can be produced with the serial machine, then no accurate estimate of P^n can be made either by the VS or by the TP method.

3.3 Results

In this section we shall present estimates of the free energy and canonical averages of the 2d Ising model, which will allow a comparison between the multicanonical ensemble (with η = η(E)), thermodynamic integration (see section 2.1.1) and exact results. Then we shall present some new results where we use multicanonical sampling to check analytic predictions about the form of P^can(M) for the 2d Ising model at β_c.

3.3.1 Free Energy and Canonical Averages of the 2d Ising Model

Measurement of free energy is of course our central concern in this thesis, so we shall begin with measurements of g(β) = G(β)/L^d made using the multicanonical ensemble for the L = 16 and L = 32 2d Ising model. The 16^2 distribution was obtained by the TP method; the 32^2 by finite-size scaling followed by refining with the VS method. Once a suitable η_xc was found, VS estimators were used for all production runs. The production runs comprised a total of 2 × 10^6 (L = 16) and 11 × 10^6 (L = 32) lattice sweeps, generated in ten blocks, with jackknife blocking used to estimate errors for all results. The early iterations, which contribute only to finding the multicanonical distribution, took an extra 5 × 10^5 (L = 16) and 3 × 10^6 (L = 32) lattice sweeps. Free energy results are shown in figures 3.20 and 3.21 respectively, along with the exact finite-size results from [10] (solid line). The error bars are much smaller than the data points (triangles). The multicanonical distributions that we used for these measurements were 'designed' for the measurement of g at β = 0.55 by using equation 3.7 as described in section 3.1.1, so we did not make any particular effort to extend the multicanonical sampling to energies lower than the peak of P^can(β = 0.55; E) or higher than the peak of P^can(β = 0; E). We would therefore expect to be able to determine g(β) accurately for 0 < β < 0.55, and this is indeed found to be the case. For L = 16, P^can(β = 0.55; E) gives appreciable weight to the ground state E₀, so we can in fact calculate g(β) for any higher value of β too; we show g up to β = 1.0 in the figure. For L = 32 this was not possible, but we were able to find g(β) for β up to 0.6. The (1 s.d.) error bars were in all cases smaller than 0.0003 for L = 16, and smaller than 0.0002 for L = 32, while g itself varied between −15 and −2 over the range of β investigated; the scale of the inset in figure 3.21 gives some idea of the accuracy obtained. Figures 3.26 and 3.27 in section 3.3.2, where the multicanonical results are compared with thermodynamic integration, show the difference between the MC estimates of g and the exact value.

Figure 3.20. Free energy g of the Ising model, L = 16, 2 million sweeps.

By reweighting with exp(−η_xc) and normalising to recover the canonical distribution, we can also measure ⟨E⟩ and the specific heat capacity

\[ C_H = -\beta^2 \frac{\partial \langle E \rangle}{\partial \beta} = \beta^2 \left( \langle E^2 \rangle - \langle E \rangle^2 \right). \]

Again, we can also calculate these exactly using results from [10].
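A sketch of this reweighting step (the notation, stand-in samples and weight function are illustrative assumptions of our own; it presumes the run was performed at β_sim with weights η(E)):

```python
import numpy as np

def canonical_averages(E, eta_of_E, beta_sim, beta):
    """Recover canonical averages at inverse temperature beta from a
    multicanonical run performed at beta_sim with weights eta(E).

    Samples were generated with probability ~ exp(-beta_sim*E + eta(E)),
    so the canonical reweighting factor is exp((beta_sim - beta)*E - eta(E)).
    """
    logw = (beta_sim - beta) * E - eta_of_E
    w = np.exp(logw - logw.max())              # guard against overflow
    w /= w.sum()
    E_mean = np.sum(w * E)
    c_H = beta**2 * (np.sum(w * E**2) - E_mean**2)   # k_B = 1
    return E_mean, c_H

# toy usage: pretend-flat samples at beta_sim = 0.55 with a made-up eta;
# a real eta would come from the weight-finding stage described above
rng = np.random.default_rng(5)
E = rng.uniform(-512.0, 0.0, size=200_000)
eta = 0.55 * (E - E.min())                     # stand-in weight, not measured
print(canonical_averages(E, eta, beta_sim=0.55, beta=0.40))
```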
The exact results for ⟨e⟩ = ⟨E⟩/L^d are shown in figure 3.22, and for c_H = C_H/L^d in figure 3.23. The anomalous behaviour in the critical region, associated with the continuous phase transition in the L → ∞ limit, is clearly visible: the gradient of the internal energy becomes steeper as L increases, this manifesting itself also in the increasing height of the 'peak' in the specific heat capacity. In figures 3.24 and 3.25 we show the corresponding results from a single multicanonical ensemble simulation of the L = 32 system. It is apparent that agreement with the exact results (solid lines) is very good. In the main figures, which show the full range of β, the errors are much smaller than the symbols; to see deviations from the exact results we must magnify smaller regions. The largest fractional error in ⟨e⟩ is about 0.1%, occurring at β = 0.44 (very near the critical point, where dE/dβ is large). Away from criticality the typical error is more like 0.01%. For the specific heat, the largest fractional error, again occurring at the critical point, is 0.5%, while elsewhere it is typically about 0.1%. We do not show results for L = 16, which are very similar.

Figure 3.21. Free energy g of the Ising model, L = 32, 11 million sweeps. Inset: detail of the vicinity of β = 0.56; vertical scale expanded more than 3000 times.

3.3.2 A Comparison Between the Multicanonical Ensemble and Thermodynamic Integration

We have also performed simulations using thermodynamic integration to determine g(β). As we described in section 2.1.1, the derivatives of free energies can be related to more accessible canonical averages; in this case we have used

\[ \frac{\partial (\beta g)}{\partial \beta} = \langle e \rangle. \]

We therefore made measurements of ⟨e⟩ using Boltzmann sampling simulations for 11 evenly spaced values of β between β = 0.05 and β = 0.55, investigated in that order. An interpolating spline was fitted to the data points and integrated numerically with respect to β; g(β) was then found by using lim_{β→0}(βg) = −ln 2. The lengths of the runs were chosen so that 2.5 × 10^5 (L = 16) and 1 × 10^6 (L = 32) sweeps were performed at each temperature, enabling a more or less direct comparison with the multicanonical ensemble. Each simulation was started using the finishing configuration from the previous one, to reduce the equilibration required, which was confined to 5 × 10^4 lattice sweeps.

Figure 3.22. Exact internal energy per spin of L = 16 and L = 32 Ising.

The results are shown in figures 3.26 (L = 16) and 3.27 (L = 32). We show both the results from thermodynamic integration (circles) and those from the multicanonical ensemble (triangles). So that the accuracy obtainable is clearly visible, we have plotted the difference between g(β) from simulation and the exact g(β) from [10]. For L = 16 at all temperatures, and for L = 32 at small β, the two methods yield comparable accuracy (although the multicanonical results are a little better), and the error bars include the line Δg = 0, as they should. However, for L = 32 a large deviation from the exact result appears in the thermodynamic integration points at about β = 0.35.
This is not a random error (the error bars, which represent the measured 'spread' of the estimator from blocking, are approximately the same size as those on the multicanonical data) but is instead a systematic error caused by the presence of a phase transition on the path of integration. This is a problem that often severely reduces the accuracy of simulations that use thermodynamic integration. Here, the infinite-volume Ising model has a continuous phase transition at β_c = 0.440686..., and for the L = 32 system ⟨e⟩ changes so rapidly with β around this point that the data points are inadequate to determine its shape, producing the systematic error in g. The shape of the deviation (first positive, then negative) suggests that the 'corners' of a sharp sigmoid curve are being smoothed away. To reduce this error we would have to space the integration points differently, clustering them around the phase transition point.

Figure 3.23. Exact specific heat capacity of L = 16 and L = 32 Ising.

By contrast, no such special care is required in the application of the multicanonical ensemble. It is untroubled by the Ising phase transition, and indeed can even be used to sample through a first-order phase transition (see [121] and section 4.3). The error bars get larger because of the large critical fluctuations, but they still contain the line Δg = 0. The multicanonical error bars, unlike those on the thermodynamic integration points, thus still provide a trustworthy confidence limit on the accuracy of the results.

We speculated in section 2.4 that consideration of the algorithm used and of the estimators of free energy defined on the configurations produced could be partially separated. To investigate this we have tried a 'multicanonical-integration' hybrid, where g is estimated by first finding the internal energy ⟨e⟩ from the results of the multicanonical ensemble, and then integrating it with respect to β. The variation of ⟨e⟩ through the critical region is tracked well this time, so we would not expect systematic errors. In fact, we find that exactly the same estimates of g are produced this way as by the direct method, right down to the seventh significant figure. There is, therefore, no advantage to the procedure.

It has been suggested [47] that thermodynamic integration along a path that avoids a phase transition outperforms the multicanonical ensemble (or related methods like overlapping distributions) for large L, because it does not require that all macrostates be sampled. We see no evidence of this, but we have not examined sufficiently large system sizes. For the L = 32 thermodynamic integration the system was still sufficiently small that P^can(E) at each simulation point overlapped significantly with its neighbours. This suggests that we would need to be dealing with extremely large systems before this became an issue, and it is precisely in large systems that the behaviour around phase transitions is too singular for thermodynamic integration to cope with.

Figure 3.24. The specific internal energy ⟨e⟩ for the L = 32 Ising model. Solid line: exact results. Points: MC results.
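For concreteness, the integration step described above can be sketched as follows (assuming SciPy is available; the input ⟨e⟩(β) curve is a smooth stand-in of our own, not the measured Ising data; the spline's extrapolation covers the short stretch below the first measured β, which the text handles via the exact β → 0 limit):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def g_from_integration(betas, e_means):
    """Thermodynamic integration of d(beta*g)/d(beta) = <e>.

    Fits a spline to the measured <e>(beta) points, integrates it from
    beta = 0 (where beta*g = -ln 2 for the Ising model) to each beta,
    and returns g(beta) at the measurement points.
    """
    spline = CubicSpline(betas, e_means)
    bg = np.array([-np.log(2.0) + spline.integrate(0.0, b) for b in betas])
    return bg / betas

# illustrative input only: a smooth stand-in for measured <e>(beta)
betas = np.linspace(0.05, 0.55, 11)
e_means = -2.0 * np.tanh(4.0 * betas)       # not real Ising data
print(g_from_integration(betas, e_means))
```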
To summarise, then: we have found that multicanonical sampling performs at least as well as thermodynamic integration in a single-phase region, and is much better able to deal with phase transitions. The question of the superiority of thermodynamic integration in very large systems has not been resolved, but it is clear that it could only apply to large systems away from phase transitions, where the internal energy varies very smoothly.

Figure 3.25. The specific heat capacity $c_H$ for the $L = 32$ Ising model. Solid line: exact results. Points: MC results.

3.3.3 $P(M)$ at $\beta = \beta_c$

As we have already discussed (section 1.1.3), the p.d.f. of the magnetisation of the Ising model has an unusual form at the critical point, related to the critical scaling of the free energy $G$ with $L$ (see appendix A). It is well known [166] that at $\beta = \beta_c$, with no external field and for large $L$,
\[ P_L^{can}(m)\,dm \simeq p^*(x)\,dx \]
where $m = M/L^d$, $x \equiv m/\bar{m}$ and $\bar{m} \equiv \langle M^2 \rangle_L^{1/2} L^{-d} \sim L^{-d/(1+\delta)}$. The function $p^*(x)$ (which is in general non-Gaussian) is unique to a particular universality class (a universality class is a collection of possibly highly disparate systems, united by spatial dimensionality and certain gross features of their interactions, which are found to have very similar critical behaviour), and $\delta$ is the equation-of-state exponent: $\delta = 15$ for the 2d Ising universality class [11]. $p^*(x)$ for this universality class is shown in figure 3.28.

Figure 3.26. Difference between the exact free energy and various MC estimates, $L = 16$.

From rigorous results [12] and scaling arguments [167] we may make the following ansatz for $p^*(x)$ at large $x$:
\[ p^*(x) \simeq \tilde{p}^*(x) \equiv p_1^\infty x^{(\delta - 1)/2} \exp(-a_1^\infty x^{\delta + 1}) \qquad (3.30) \]
i.e. for the 2d Ising universality class,
\[ \tilde{p}^*(x) \simeq p_1^\infty x^7 \exp(-a_1^\infty x^{16}) \]
The form of this function (with $p_1^\infty = 0.59$ and $a_1^\infty = 0.027$) is shown in figure 3.29; it is clearly in at least qualitative agreement with the measured $p^*(x)$ (figure 3.28) for $x > 1.0$. The form of the prefactor ($x^7$) is in accord with a recent theory [168] that relates $p^*(x)$ to the stable distributions of probability theory. However, this theory also suggests the existence of further non-universal contributions to $p^*(x)$ which would fall off as a power of $x$ at large $x$, and so be asymptotically dominant.

Figure 3.27. Difference between the exact free energy and various MC estimates, $L = 32$.

To see whether these non-universal corrections exist we need to compare the ansatz 3.30 quantitatively with an accurate measurement of $p^*(x)$ (i.e. of $P^{can}(M; \beta_c)$) in the large-$x$ regime. This measurement cannot be performed by Boltzmann sampling; whatever the exact form of $p^*(x)$ may be, it clearly falls off very fast for large $x$, and it will be impossible to measure it accurately above $x \approx 1.4$. The ground state, by comparison, lies at $x_{max} \sim L^{d/(1+\delta)}$, which tends (albeit very slowly) to infinity in the thermodynamic limit, and for $L = 64$ is at 1.6132. We therefore used a multicanonical simulation with $\eta = \eta(M)$, arranged to be flat over the entire range $M = -L^d$ to $M = +L^d$.
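A minimal sketch of such a magnetisation-preweighted single-spin-flip simulation is given below (Python; the weight array eta and the histogram counts, both of length $L^2 + 1$ and indexed by $(M + L^2)/2$ since $M$ changes in steps of 2, are conventions of this illustration, not the thesis code; the sampled distribution is taken to be proportional to $\exp(-\beta E + \eta(M))$, consistent with the partition function given in section 3.4.1):

```python
import numpy as np

rng = np.random.default_rng(1)

def multicanonical_sweep(spins, beta, eta, counts):
    """One lattice sweep of single-spin-flip Metropolis with a
    magnetisation preweighting eta(M); acceptance probability is
    min{1, exp(-beta*dE + eta(M') - eta(M))}."""
    L = spins.shape[0]
    M = int(spins.sum())
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        s = spins[i, j]
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j] +
              spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * s * nn                       # energy change if s flips
        Mnew = M - 2 * s
        d_eta = eta[(Mnew + L * L) // 2] - eta[(M + L * L) // 2]
        if np.log(rng.random()) < -beta * dE + d_eta:
            spins[i, j] = -s
            M = Mnew
        counts[(M + L * L) // 2] += 1           # histogram of visited M
    return M
```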
The usual reweighting then enables us to recover $P^{can}(M)$, and thus $p^*(x)$, accurately measured over the whole range of $M$. The inset in figure 3.28 shows the canonical probabilities of the ground state and first excited state of the $64^2$ system measured this way. To be certain we were in a regime where corrections-to-scaling were small, we investigated quite large system sizes, $L = 32$ and $L = 64$; the latter is about as large as is feasible using single-spin-flip Metropolis, whose acceptance ratio falls off like $L^{-d}$ near the ground state, for reasons we explained in section 3.1.2.

Figure 3.28. The critical probability density function $p^*(x)$ of the scaled order parameter $x$ for the 2d Ising universality class, determined by MC simulation. $p^*(x)$ is symmetrical about $x = 0$. The inset shows the canonical probability of the ground state and first excited states of the $64^2$ 2d Ising model, which lie in the extreme tail of $p^*(x)$ and have been measured by multicanonical simulation as described in the text.

Because the exact scaling form of $P^{can}(M; \beta_c)$ was unknown, or rather was part of what we wanted to investigate, we used the TP method to generate $\eta_{xc}$ initially, moving to the visited-states (VS) method for final refinement and for the production stage. For $L = 32$ we did 10 iterations of the TP method, the first 7 taking 30 minutes each on an HP-700 workstation, the last 3 taking 3 hours. Then we allowed the simulation to proceed using the VS method, with a gradually increasing (automatically increased) $N_{AV}$. For $L = 64$ we did 8 iterations of the TP method, generating about $10^9$ spin flips (5 h) for each, then switched to the VS method. The details of the implementation of the VS method are given in table 3.1. The TP program used was an early version which converged much more slowly than the one that produced the results of section 3.2.3 for $\eta(M)$ for the $32^2$ Ising model; it had not in fact quite converged when we moved to VS estimators. During both the final refinement and production stages, we allowed continued updating of $\eta_n$, using the method of section 3.2.1 to incorporate prior information. Finally we re-analysed the results, combining $\eta$'s according to the prescription of equations 3.24 and 3.23, but using only the last seven ($L = 32$) and ten ($L = 64$) iterations, for which all sampled distributions had been approximately multicanonical.

Figure 3.29. The ansatz $\tilde{p}^* \equiv p_1^\infty x^7 \exp(-a_1^\infty x^{16})$ with $p_1^\infty = 0.59$ and $a_1^\infty = 0.027$.

L    N_AV    # iterations    # lattice sweeps/iter    time/iter
32   100     5               6.4 x 10^5               1.5 h
32   200     5               1.2 x 10^6               3 h
32   400     2               2.4 x 10^6               6 h
64   7       5               1.6 x 10^5               1.5 h
64   14      5               3.4 x 10^5               3 h
64   28      7               6.8 x 10^5               5 h
64   56      3               1.4 x 10^6               12 h

Table 3.1. Details of the visited-states sampling for $L = 32$ and $L = 64$ with magnetisation preweighting.

Thus we avoided the bias that would have resulted from using the early estimators $\eta_n$, which were not multicanonical, while still avoiding at run-time the division of the process into `finding-$\eta$' and `production' phases; or at least we were able to decide a posteriori where the `production' phase was to begin. This produces a function $\eta_{best}(M)$; the corresponding best $\tilde{P}^{can}(M)$ is recovered using $\tilde{P}^{can}(M) \propto \exp(-\eta_{best}(M))$.
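Because the recovered $\tilde{P}^{can}(M)$ spans many decades, the reweighting itself is best done in the log domain. The following sketch (Python; array names are illustrative) recovers $\ln P^{can}(M)$ from the histogram of a run with weights $\eta(M)$, using the same convention as above, namely that the sampled distribution is proportional to $P^{can}(M)\exp(\eta(M))$:

```python
import numpy as np

def canonical_log_pdf(eta, counts):
    """Recover ln P_can(M) from a preweighted run: since the sampled
    distribution is ~ P_can(M) exp(eta(M)), we have
    P_can(M) ~ counts(M) exp(-eta(M)).  All arithmetic is in logs,
    as P_can here varies over tens of decades."""
    ln_p = np.full_like(eta, -np.inf)
    nz = counts > 0
    ln_p[nz] = np.log(counts[nz]) - eta[nz]
    ln_p -= np.logaddexp.reduce(ln_p[nz])   # normalise: sum exp(ln_p) = 1
    return ln_p
```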
The final estimate of $\eta(M)$ ranged over $0$-$44$ for $L = 32$ and $0$-$191$ for $L = 64$, the latter corresponding to 83 decades of variation in $\tilde{P}^{can}(M)$. This $\tilde{P}^{can}(M)$ was used to produce the graph of $p^*(x)$ in figure 3.28. From $\eta(M_0 = L^d)$ we can estimate $g(\beta_c)$ as described in section 3.1.3. We find, using the difference between $\eta(L^d)$ and $\eta(-L^d)$ to estimate the error, the remarkably accurate estimates:
\[ g(\beta_c; L = 32) = -2.11115(4) \quad (\text{cf. } g(\beta_c; L = 32) = -2.11107 \text{ from [10]}) \]
\[ g(\beta_c; L = 64) = -2.10999(2) \quad (\text{cf. } g(\beta_c; L = 64) = -2.11001 \text{ from [10]}) \]
These demonstrate the accuracy with which the ground state probabilities (as shown in the inset of figure 3.28) have been measured.

Now let us consider the predictions of the scaling ansatz for $p^*(x)$. Figure 3.30 shows $q^*(x) \equiv -\ln[x^{-7} p^*(x)]$, estimated from the multicanonical results, plotted against $x^{16}$ for $x > 1$. According to the ansatz 3.30, we would expect this to be linear with gradient $a_1^\infty$. In fact, there is a linear regime for medium-sized $x$, with a deviation at both ends (though the deviation near the origin cannot be seen on this graph because of the scale). At small $x$ we find that $p^*(x)$ is larger than expected from the ansatz 3.30, while at large $x$ it is smaller. The low-$x$ deviation comes from the approach to the zero-magnetisation states, to which the ansatz ascribes zero probability, while the high-$x$ one is a non-universal finite-size effect, caused by the underlying microscopic structure of the system becoming apparent as we approach the ground state. The relative weights of the ground state and first excited state of the Ising model, for example, must be $1 : L^d \exp(-8\beta_c)$, in contradiction to the scaling ansatz, while some different expression will apply to the p.d.f. of, for example, the density of the 2d Lennard-Jones fluid. However, in line with $x_{max}$ tending to infinity with $L^{d/(1+\delta)}$, the breakdown of the scaling ansatz occurs at larger $x$ for the larger ($L = 64$) system.

Figure 3.30 also shows that we can discount (at least for the Ising model, for which, it must be admitted, non-universal correction terms are often found to have zero amplitude because of the high symmetry [169]) the suggestion that the asymptotic behaviour of $p^*(x)$ is a power-law decay. If it were, then we would expect that at large enough $x$, $q^*(x) \sim \ln x$, which would deviate `downwards' (i.e. with a negative second derivative) from the prediction. Instead, the observed deviation from the ansatz has the opposite sense.

Figure 3.30. The function $q^*(x) \equiv -\ln[x^{-7} p^*(x)]$ plotted against $x^{16}$ ($x > 1$) for $L = 32$ and $L = 64$. We expect this relationship to be linear, with gradient $a_1^\infty$, in the region where the ansatz 3.30 applies.

In defining $q^*(x)$ we included a factor $x^{-7}$ to cancel the expected $x^7$ in the ansatz for $p^*(x)$. However, we would also like to demonstrate that the polynomial prefactor in $p^*(x)$ is indeed $x^7$. This is not easy to do, because the effect of the polynomial is of course dominated by the very strong exponential decay. Nevertheless, we have tried fitting functions of the form $p_1 x^{\psi} \exp(-a_1 x^{16})$ to $p^*(x)$ (determined by MC) over a series of `windows' of $x$-values, choosing the values of $p_1$, $\psi$ and $a_1$ to give the best fit. The results for the best-fit $\psi / [(\delta - 1)/2] = \psi / 7$, plotted against the central $x$ of the `window', are shown in figure 3.31. It is apparent that there is reasonable agreement between the measured $\psi$ and $(\delta - 1)/2$ for a range of $x$-values around $x \approx 1.2$ (i.e. before the exponential decay becomes too strong), and that the width of this range is larger for the larger system ($L = 64$; triangles on the figure).

Figure 3.31. Results of window-fitting the exponent $\psi$. Squares: $L = 32$; triangles: $L = 64$. The lines are guides to the eye only.
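Since $\ln p^*(x) = \ln p_1 + \psi \ln x - a_1 x^{16}$ is linear in the parameters $(\ln p_1, \psi, a_1)$, each window fit reduces to a small linear least-squares problem. The sketch below shows one way such window fitting could be set up (Python; the window half-width and the minimum of four points per window are arbitrary choices of this illustration, not necessarily those used for figure 3.31):

```python
import numpy as np

def window_fit_psi(x, ln_p, half_width=0.05):
    """Fit ln p*(x) = ln p1 + psi*ln(x) - a1*x**16 over sliding windows
    of x-values and record the best-fit exponent psi for each window."""
    centres, psis = [], []
    for x0 in np.arange(1.0, x.max() - half_width, half_width):
        w = (x > x0 - half_width) & (x < x0 + half_width) & np.isfinite(ln_p)
        if w.sum() < 4:
            continue
        A = np.column_stack([np.ones(w.sum()), np.log(x[w]), -x[w] ** 16])
        coef, *_ = np.linalg.lstsq(A, ln_p[w], rcond=None)
        centres.append(x0)
        psis.append(coef[1])              # coef = (ln p1, psi, a1)
    return np.array(centres), np.array(psis)
```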
It is another demonstration of the accuracy obtainable with multicanonical simulations that we have been able to pick out a polynomial prefactor from data with an overall exponential behaviour.

Finally in this section, we shall use $p^*(x)$ at large $x$ to determine the scaling amplitude $U_0$, where $U_0$ is the constant term in the expansion in powers of $L$ of $G(\beta_c; H = 0; L)$ (see appendix A). To carry out this calculation we introduce the function
\[ F^*(y) \equiv \ln \int dx\, p^*(x) \exp(yx) \qquad (3.31) \]
where $y \equiv \beta H \bar{m} L^d$ is a scaled version of the external field. The integrand in $F^*$ is thus a scaled, unnormalised version of $P^{can}(\beta_c; M, H)$. Now, if we use the scaling ansatz $\tilde{p}^*(x)$ (equation 3.30) for $p^*(x)$, it can be shown [167] using steepest-descents arguments that for large enough values of $y$
\[ F^*(y) = b_1 y^{1 + 1/\delta} + U_0 \qquad (3.32) \]
where we have made the identification
\[ U_0 \equiv \frac{1}{2} \ln\!\left[ \frac{2\pi (p_1^\infty)^2}{a_1^\infty\, \delta(\delta + 1)} \right] \]
in terms of the quantities in equation 3.30. We are now in a position to determine $U_0$: we determine $F^*(y)$ through numerical integration of equation 3.31, using the multicanonical results for $p^*(x)$, then plot $F^*(y)$ against $y^{1 + 1/\delta}$, obtaining an estimate for $U_0$ by extrapolating the linear form back to $y = 0$. In figure 3.32 we show the graph of $F^*$, the main figure showing just the region near the origin and the inset showing the whole range of $y$, up to $y^{1 + 1/\delta} = 175$, for which the fully saturated $M = +L^d$ state was the most probable. Figure 3.33 shows the effective $U_0$, the ordinate intercept obtained by fitting the data lying within a window of $y$-values, plotted against the central value of $y^{1 + 1/\delta}$ in the window.

Figure 3.32. The function $F^*(y)$ determined from multicanonical measurements, plotted against $y^{1 + 1/\delta}$. The main figure shows the behaviour near the origin, while the inset shows the behaviour right up to $y^{1 + 1/\delta} = 175$.

It is apparent that convergence to the large-$y$ form of equation 3.32 occurs rapidly, taking place in the interval in which $\beta H L^{d\delta/(1+\delta)} \sim 1$, the size of field required to drive the system out of the critical region to a region where the steepest-descents arguments are valid. Within the linear region the estimates of $U_0$ are in good agreement with the exact result $U_0 = -\ln 2 - \ln[(2^{1/4} + 2^{-1/2})/2] = -0.639912$ from [11], although much the best estimates of $U_0$ come from fairly small $y$-values (see the inset of figure 3.33; at its minimum, we obtain $U_0 = -0.6398(1)$). There are three reasons for this behaviour. One is that the process of extrapolating back to $y = 0$ for a window of $y$-values that far from the origin magnifies the error in the intercept. The second is that the width of $p^*(x)\exp(yx)$ gets smaller as the effective field $y$ increases, reflecting the behaviour of the susceptibility; thus at large $y$, $F^*(y)$ is dominated by only a few points of $p^*(x)$, and its random errors are therefore larger. Third, and probably most importantly, at large $y$ the states that dominate $F^*(y)$ come from the extreme tail of $p^*(x)$, where we have just shown (see figure 3.30) that the finite-size $p^*(x)$ deviates appreciably from the ansatz $\tilde{p}^*(x)$. Thus we would expect a small systematic change in the effective $U_0$.
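The numerical side of this calculation is straightforward. The following sketch (Python; it assumes $p^*(x)$ tabulated on a grid of $x \ge 0$, the negative-$x$ half contributing negligibly at the $y$-values of interest, and uses an arbitrary six-point fitting window) computes $F^*(y)$ by trapezoidal quadrature and extracts the effective intercepts:

```python
import numpy as np

def u0_effective(x, p_star, delta=15.0, y_grid=None):
    """Compute F*(y) = ln integral dx p*(x) exp(y x) by quadrature, then
    fit F* = b1*y**(1+1/delta) + U0 over sliding windows and return the
    ordinate intercepts U0_eff (cf. figure 3.33)."""
    if y_grid is None:
        y_grid = np.linspace(1.0, 30.0, 60)
    def trapz(f):                         # simple trapezoidal rule
        return np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0
    ln_f = np.array([np.log(trapz(p_star * np.exp(y * x))) for y in y_grid])
    t = y_grid ** (1.0 + 1.0 / delta)     # the natural abscissa
    u0 = []
    for k in range(len(t) - 5):           # sliding six-point windows
        slope, intercept = np.polyfit(t[k:k + 6], ln_f[k:k + 6], 1)
        u0.append(intercept)
    return t[:len(u0)], np.array(u0)
```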
Figure 3.33. The effective value of $U_0$, given by the ordinate intercept of the linear fit to $F^*(y)$ within a window of $y$-values, plotted against the central value of $y^{1 + 1/\delta}$ in the window.

We should compare these results with those from [167], where $U_0$ was estimated in the same way, but using results from Boltzmann sampling MC simulations. As a result of the fact that $p^*(x)$ was not well determined for large $x$, the fitting was limited to $y < 10$, though as we have seen this is in fact all that is required for a good estimate of $U_0$. A brief discussion of the physical significance of $U_0$ can be found in appendix A.

3.4 Beyond Multicanonical Sampling

`It must be admitted that she has some beautiful notes in her voice. What a pity it is that they do not mean anything, or do any practical good!'
FROM The Nightingale and the Rose, OSCAR WILDE

We shall now expand the scope of our discussion of the multicanonical ensemble, putting it in a more general framework of other importance-sampling distributions, including the expanded ensemble [60, 124] (see section 2.2.3). While discussing the expanded ensemble, we shall present new results on the scaling of its expected MC error. Then, in section 3.4.3, we shall use similar analysis to predict, for the first time, the expected error of an estimator for various non-Boltzmann distributions, including the multicanonical, and we shall identify a near-optimal sampled distribution for a particular quantity $O$ and algorithm; sampled distributions of any desired shape may be produced by a simple generalisation of the methods of section 3.2. We shall then check our predictions by explicit MC measurement of the variance of $\tilde{O}$.

3.4.1 The Multicanonical and Expanded Ensembles

We shall now make more explicit the connection between the multicanonical ensemble and the expanded ensemble that we first mentioned in the discussion of them in chapter 2. The apparent differences between the two, in the way they have been formulated up to now, come from the choice of ensemble, and we can put the two into the same framework as the expanded ensemble by considering what the multicanonical ensemble would be like if applied to ensembles other than the $NVT$ ensemble, or what the expanded ensemble would be like if it were not `made from' $NVT$ ensembles. In the same way, it was necessary in chapter 1 to consider the Ising ferromagnet in the $NVT$ ensemble and the fluid system in the $NpT$ ensemble in order for the similarity between them to become apparent.

Consider, for example, a multicanonical system with the coefficients depending on the magnetisation, $\eta = \eta(M)$. The multicanonical partition function is
\[ Z = \sum_{\{\sigma\}} \exp(-\beta E(\sigma)) \exp[\eta(M(\sigma))] \]
We can cast this into the form of the expanded ensemble (cf. equation 2.18) by writing
\[ Z = \sum_{M} Z'(M) \exp(\eta(M)) \]
where
\[ Z'(M) = \sum_{\{\sigma\}} \delta_{M(\sigma), M} \exp(-\beta E(\sigma)) \equiv \exp(-\beta F(M)) \]
So from this perspective the multicanonical ensemble appears as an expanded ensemble composed of fixed-$M$ ensembles. This example is perhaps a little awkward, since we are not accustomed to thinking in terms of the fixed-$M$ ensemble.
However, in just the same way, an expanded ensemble where each subensemble is microcanonical is exactly equivalent to the multicanonical ensemble with $E$ the preweighted variable. Another example is a non-Boltzmann sampling $NpT$-simulation with an $\eta(V)$ designed to increase exploration of the volume macrostates, which can naturally be regarded both as an expanded ensemble generated by putting together canonical ensembles differing in $V$, and as a multicanonical $NpT$-ensemble with $\eta$ providing a non-canonical weighting to the different volumes. We shall use just such a simulation in chapter 4. A grand canonical ensemble (canonical ensembles weighted by $\mu N$, with an extra $\eta(N)$) can be considered in the same way.

The expanded ensemble is thus the more general concept; a multicanonical ensemble can always be described in terms of an expanded ensemble. Sometimes, then, it is not even clear what a particular sampling scheme should be called, though `multicanonical' seems to be established for $\eta(E)$ and $\eta(M)$, `expanded ensemble' for the system with subensembles having different energy functions, and `expanded ensemble' or `simulated tempering' for the system with subensembles having different temperatures. Because there are no Gibbsian ensembles analogous to the systems with variable temperatures or energy functions, they can only really be thought of as expanded ensembles; the issue is the status of the multicanonical ensemble. We suggest a classification on the basis of the behaviour of the order parameter, rather than the nature of the subensembles: thus, an expanded ensemble's $\eta$'s would always be related to a $G(x)$, where $x$ is an intensive (field) variable, for example, while the multicanonical ensemble's would be related to an $F(X)$, where $X$ is an extensive (mechanical) variable, such as $E$ or $M$. With this nomenclature, the ensemble with fixed $N$, $p$ and $T$ and an $\eta(V)$ chosen to make $P(V)$ flat over some range of $V$ would be called multicanonical. This classification has the advantage of generally reflecting the way that the quantities of interest are extracted from the simulation: in multicanonical simulations, the required free energies are found by reweighting and summing over all values of the preweighted variable (e.g. equation 3.7), while with the expanded ensemble they come from the probabilities of the `end states' of the chain of subensembles only (e.g. equation 2.19).

Because of the similarity between them, all the methods described in this chapter for finding $\eta$ for the multicanonical ensemble are also applicable to the expanded ensemble, provided that the notions of microstates and macrostates are redefined slightly: the `microstates' become the joint set $\{\{\sigma\}, \{\beta\}\}$ of coordinates and temperatures (taking the temperature-expanded ensemble as an example), and the `macrostates' become the canonical subensembles of which the expanded ensemble is composed. In particular, the transitions between the subensembles still form a Markov process, and the TP method of section 3.2.3 can be applied, but with the proviso that some care must be taken in the choice of starting state: since we cannot choose a starting state that contains only one microstate (every state of the expanded ensemble contains all the microstates of its constituent canonical ensembles), we must make sure that the simulation has had time to equilibrate within the starting state before we permit it to move to the other subensembles.
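For concreteness, a single attempted subensemble transition of a temperature-expanded ensemble (simulated tempering) might look like the following sketch (Python; the weights eta play the same role as the $\eta$'s above, and this is an illustration rather than any specific implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

def attempt_beta_move(k, E, betas, eta):
    """One attempted subensemble transition of simulated tempering:
    propose moving from beta_k to a neighbouring beta_m at fixed
    configuration energy E, and accept with probability
    min{1, exp[-(beta_m - beta_k) E + eta_m - eta_k]}."""
    m = k + rng.choice([-1, 1])
    if m < 0 or m >= len(betas):
        return k                                   # stay put at the ends
    ln_acc = -(betas[m] - betas[k]) * E + eta[m] - eta[k]
    return m if np.log(rng.random()) < ln_acc else k
```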
We should also note that parallelism of the geometric-decomposition kind is more naturally applied to an expanded ensemble than to a multicanonical ensemble. Any parallel coordinate updating that could be applied to the canonical `subensemble' if it were a simulation in its own right can clearly also be applied in the expanded-ensemble simulation, and it would never occur to us to change the value of the `field' property that varies between the subensembles (temperature or whatever) in one part of the simulation volume but not another. In the multicanonical ensemble, conversely, the need to keep the preweighted variable constant during any parallel updating may well affect our choice of the kind of particle moves or spin flips that are done. Nevertheless, the similarity between the two methods remains in so far as the `preweighted variable,' whether it be the energy/order parameter or the temperature/Hamiltonian, must be explored serially.

3.4.2 The Random Walk Problem

The description of the dynamics of the expanded ensemble as a Markov process enables us to derive an important new result about the expected error of expanded ensemble simulations. In these simulations, the quantity we wish to measure is simply the ratio of the probabilities of two of the subensembles in the chain, usually the two at the ends (this makes the analysis here easier than for the multicanonical ensemble). For good accuracy, the system must visit both ends of the chain several times, and so the accuracy is ultimately limited by the time taken to perform a one-dimensional random walk over the $N_m$ subensembles in the chain (assumed not to be very small). This time is $\tau_{rw}$, with $\tau_{rw} \sim N_m^2$.

We now outline an argument suggested by this fact, but which we think is fallacious, that expanded-ensemble-type calculations can have their accuracy improved if the chain of subensembles is divided up into pieces and the sampling performed in each piece separately. We then go on to give what we think is the correct argument.

First, then, the fallacious argument. Suppose that the underlying probability distribution is such that each subensemble has equal probability, but that we are `pretending' that we do not know this and are trying to measure $P_i$. This situation will in fact be realised to a very good approximation in the `production' stage of real applications, and the small deviations from a constant $P_i$, while being exactly what we want to measure, will have no effect on the random walk arguments that we are about to give. Suppose we generate $N_c$ counts in total in our histogram of the occupancies of the $N_m$ subensembles. If there were no correlations, then the number of configurations going into the $i$th subensemble, $C_i$, would have a binomial distribution with mean $\bar{C}_i = N_c / N_m$ and variance $N_c (1/N_m)(1 - 1/N_m) \approx N_c / N_m$. Thus we would estimate $P_i$ by $C_i / N_c$ (or $(C_i + 1)/(N_c + N_m)$), with expected error
\[ \frac{\delta C}{\bar{C}} \approx \sqrt{\frac{N_m}{N_c}} \]
In practice, adjacent configurations are highly correlated. Now, as we have seen, even though the Markov chain we are simulating is really a microscopic one between the microstates of the system, we can treat the process that describes transitions between subensembles as an effective Markov chain in its own right, and, given that equation 3.26 is satisfied, it obeys its own detailed balance condition that we can use to estimate the stationary probability distribution.
Because the underlying Markov process is highly correlated, so is the effective process: each ensemble will usually only make transitions to its neighbours (indeed we shall usually restrict attempted moves to the neighbours, to save wasting effort in attempting transitions that are almost certain to be rejected). It is now tempting to apply equation 1.25, equating the effective correlation time (which multiplies the variance of the estimators obtained) with the random walk time $\tau_{rw} \sim N_m^2 \gg 1$. This would give, for the correlated case,
\[ \frac{\delta C}{\bar{C}} \approx \sqrt{\frac{N_m^3}{N_c}} \qquad (3.33) \]
This would also be the scaling of the fractional error in the ratio $r = C_1 / C_{N_m}$, the estimate of $P_1 / P_{N_m}$, assuming that the ends are far enough apart for the errors to be independent:
\[ \frac{\delta r}{r} = \sqrt{\frac{\mathrm{var}(C_1)}{\bar{C}_1^2} + \frac{\mathrm{var}(C_{N_m})}{\bar{C}_{N_m}^2}} \qquad (3.34) \]
This result implies that we could achieve a better accuracy by dividing up the $N_m$-state expanded ensemble into $b$ groups, with one state (or a few states) overlapping between adjacent groups. Neglecting the overlaps, the number of states in each group would be $m \approx N_m / b$, and we would devote time $N_c / b$ to each. The estimator of $r$ would be
\[ r = \prod_{j=1}^{b} C_1^{(j)} / C_m^{(j)} \]
and then equations 3.33 and 3.34 would give
\[ \frac{\delta r}{r} \approx \sqrt{b} \sqrt{\frac{(N_m/b)^3}{N_c/b}} = \sqrt{\frac{N_m^3}{b N_c}} \qquad (3.35) \]
implying that the error decreases as $b \to N_m$, i.e. as the expanded ensemble is divided up into smaller and smaller pieces. We expect that in practice errors would eventually begin to increase again, because if the groups were very small a large fraction of transitions would have to be rejected as taking us outside the group (and possibly because correlation times within some subensembles may become longer if they cannot connect with others at higher temperatures), but it is not clear what the best value of $b$ to choose would be; we would have to investigate each $b$ separately and measure the error directly. The core of this argument, that the correlation time is the same as the random walk time, can be found in one form or another in [6, 24, 130].

Now let us do a rather more careful analysis of the effective Markov chain, using results from [164, chapter 5]. Define the state occupancies $C_i(N_c | j)$, the number of times that state $i$ is recorded in $N_c$ steps of the chain given that it starts in state $j$. It can be shown that in the large-$N_c$ limit ($N_c \gg N_m^2$) the effect of the starting state $j$ disappears: as before,
\[ \bar{C}_i(N_c | j) \approx \bar{C}_i(N_c) = N_c P_i = N_c / N_m \qquad (3.36) \]
and
\[ \mathrm{var}(C_i(N_c | j)) \approx N_c P_i \left[ P_i + 2 t^g_{ii}(\infty) - 1 \right] \qquad (3.37) \]
where the $t^g_{ii}(\infty)$ are components of the transient sum matrix $T^g(\infty)$, which is given by
\[ T^g(\infty) = [I - \Gamma + \Gamma^\infty]^{-1} - \Gamma^\infty \qquad (3.38) \]
It appears that we need to do a full matrix inversion to find $T^g(\infty)$, which will cost $\sim N_m^3$ operations, but in fact we can do it in only $O(N_m^2)$ operations using a trick (see [49, chapter 2]) which depends on the fact that $\Gamma$ is sparse and $\Gamma^\infty$ has all its rows equal. This makes the calculation of $\mathrm{var}(C)$ economical even for large $N_m$; the $N_m = 1000$ case requires less than one minute on a workstation.

Let us test these predictions using a simple Markov model, where there are no `microstates' within each state of the chain. We make the simple choice
\[ \Gamma_{ij} = \begin{cases} 2/3 & \text{for } (i = 1, j = 1) \text{ and } (i = N_m, j = N_m) \\ 1/3 & \text{for } (i = 1, j = 2) \text{ and } (i = N_m, j = N_m - 1) \\ 1/3 & \text{for } 2 \le i \le N_m - 1 \text{ and } j = i - 1,\, i,\, i + 1 \\ 0 & \text{otherwise} \end{cases} \]
This has eigenvector $P_i = 1/N_m$, as required.
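The calculation just described is easily set up numerically. The sketch below (Python; it uses a dense matrix inverse for clarity rather than the $O(N_m^2)$ trick) builds this $\Gamma$ and evaluates equations 3.36-3.38; for example, count_statistics(reflecting_walk(100), 10**6) returns the predicted means and variances of the occupancies:

```python
import numpy as np

def count_statistics(gamma, n_c):
    """Equations 3.36-3.38: given a transition matrix `gamma` (rows
    summing to 1), return the mean and variance of the occupancies C_i
    over a run of n_c >> N_m**2 steps, via the transient sum matrix
    T_g = [I - gamma + gamma_inf]^{-1} - gamma_inf."""
    n = gamma.shape[0]
    evals, evecs = np.linalg.eig(gamma.T)              # stationary dist. =
    p = np.real(evecs[:, np.argmax(np.real(evals))])   # left eigenvector
    p /= p.sum()                                       # with eigenvalue 1
    gamma_inf = np.tile(p, (n, 1))                     # all rows equal to p
    t_g = np.linalg.inv(np.eye(n) - gamma + gamma_inf) - gamma_inf
    var_c = n_c * p * (p + 2.0 * np.diag(t_g) - 1.0)
    return n_c * p, var_c

def reflecting_walk(n):
    """The simple random-walk model defined above (uniform stationary
    distribution, P_i = 1/N_m)."""
    g = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            if 0 <= j < n:
                g[i, j] = 1.0 / 3.0
    g[0, 0] = g[-1, -1] = 2.0 / 3.0                    # reflecting ends
    return g
```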
In figure 3.34 we show the behaviour of $\mathrm{var}(C_i)/N_c$ and $\sqrt{N_c\, \mathrm{var}(C_i)}/\bar{C}_i$ as a function of $N_m$ for $i = 1$ and $i = N_m/2$. It is clear that, except at very small $N_m$, $\mathrm{var}(C)$ is roughly constant rather than increasing like $N_m$, and $\sqrt{\mathrm{var}(C)}/\bar{C}$ increases linearly with $N_m$, rather than going like $N_m^{3/2}$.

Figure 3.34. Behaviour of the variance (left) and fractional error (right) of the number of visits $C$ to states $1$ and $N_m/2$ in a 1D random walk over $N_m$ states, with reflecting boundaries.

$\delta r / r$ will behave similarly, except that it will show an even smaller deviation from linearity, because $C_1$ and $C_{N_m}$ are strongly anti-correlated at small $N_m$, whereas they are roughly independent for large $N_m$. If we now consider subdividing the range $N_m$ into $b$ pieces, we find
\[ \frac{\delta r}{r} \approx \sqrt{b}\, \frac{N_m/b}{\sqrt{N_c/b}} = \frac{N_m}{\sqrt{N_c}} \qquad (3.39) \]
so this time the variance of the estimator $r$ is not decreased by subdividing $N_m$, and therefore in an expanded ensemble calculation we may as well always do a single random walk over the whole range, to get whatever benefits of improved ergodicity are available. We shall present results in chapter 4 that demonstrate that this behaviour is indeed what is found in a real expanded ensemble calculation.

Where, then, does the mistake lie in the fallacious argument we gave first? Equation 1.25 is certainly applicable, since it is based on very general principles of time-series analysis. The error comes, in fact, from assuming that the effective correlation time $\tau_O$ in equation 1.25 is the same as, or of the same order as, $\tau_{rw}$. The operators we need to consider are simply
\[ O_i(\sigma) = \begin{cases} 1 & \text{if } \sigma \in i \\ 0 & \text{otherwise} \end{cases} \qquad (3.40) \]
so that $N_c \langle O_i \rangle = \langle C_i \rangle$. It is shown in appendix C (equations C.1 and C.2) that
\[ \tau_{O_i} = \sum_{t=1}^{\infty} \rho_i(t) \]
where $t$ labels time in units of the basic update and the correlation functions $\rho_i(t)$ are
\[ \rho_i(t) = \frac{\langle O_i(0) O_i(t) \rangle - \langle O_i \rangle^2}{\langle O_i^2 \rangle - \langle O_i \rangle^2} \]
so, using equation 3.40, it follows that here the $\rho_i(t)$ are
\[ \rho_i(t) = \frac{P(i, t | i, 0)(1/N_m) - (1/N_m)^2}{(1/N_m) - (1/N_m)^2} \]
where $P(i, t | i, 0)$ is the probability that the simulation is found in state (subensemble) $i$ at time $t$ given that it was there at time 0. Summing the $\rho$'s can be done analytically, using results from [170], for the case of a random walk with periodic boundary conditions: the result is $\tau_{O_i} = N_m/12$. This is clearly also the result for the central state $i = N_m/2$ of a random walk with reflecting boundaries, which is what we have in an expanded ensemble simulation. For the other states of the random walk with reflecting boundaries, $\rho_i(t)$ can be summed numerically, and this reveals the same dependence on $N_m$, though $\tau_{O_i}$ is larger. The key result, then, is that $\tau_{O_i} \sim N_m$; when substituted into equation 1.25, this gives the same form for the scaling of the error in $r$ as comes from our analysis of $\mathrm{var}(C)$ in equation 3.37. There is thus no contradiction between these two expressions, and by equating the expressions for the variance we find
\[ \tau_{O_i} = \frac{t^g_{ii}(\infty)}{1 - P_i} - 1 \]
thus relating $T^g(\infty)$ to the more familiar $\tau_{O_i}$ (though this equation is true only for the simple delta-function form for $O$ that is given in equation 3.40). It is certainly rather surprising that $\tau_{O_i}$ should increase more slowly than $\tau_{rw}$,
but we may rationalise this by noting that the initial decay of the correlation functions $\rho_i(t)$ depends on the average time to diffuse away from the starting state, which is essentially a local property and so is scarcely affected at all by $N_m$. It is only the long-time tail of the decay, where the system is returning to its starting state after wandering far away, that depends strongly on $N_m$. The interplay of these two effects gives rise to the observed behaviour. We should also emphasise that the result $\tau_{O_i} \sim N_m$ does depend on the condition that the run-time of the simulation should be $\gg \tau_{rw} \sim N_m^2$.

3.4.3 `Optimal' Sampling

We now return to more general questions of MC importance sampling and ask what sampled distribution $P(\sigma)$ is optimal, given an algorithm, for the measurement of $\langle O \rangle^{can}$, the canonical average of $O$, an operator on the configurations. An estimator of $\langle O \rangle^{can}$ can be found using equation 3.7 for any sampled distribution $P(\sigma)$, though in general a choice that is not tuned in some way to $O$ and the Boltzmann distribution will not give a useful estimator in a time that is not exponentially long. Clearly it is desirable that the standard error of the estimator obtained should be as small as possible, so that computer time can be used as efficiently as possible; this is what we mean by `optimal'. In what follows we shall concentrate on operators on $E$-macrostates, in particular the `free energy operator' $O = \exp(\beta E)$, and we shall parameterise $P$ as usual by a set of weights $\eta$. We shall use $\tilde{O}(\eta)$ to mean the ratio estimator of $\langle O \rangle^{can}$ coming from the sampled distribution defined by $\eta$.

Two concepts introduced by Hesselbo and Stinchcombe [97] are useful here for explaining the requirements of optimal sampling. They serve to make more concrete ideas that we have already discussed or alluded to. These concepts are ergodicity, which is measured by $\tau_O(\eta)$, and pertinence, measured by $N_s(\eta; I)$, which is the average number of independent samples that are required to obtain the information $I$ that is sought (so here $I = \langle O \rangle$). The total time required for the problem is thus $\tau_O(\eta) N_s(\eta; \langle O \rangle)$, and this should be minimised as a function of the weights $\eta$.

Of the two, the pertinence is in fact the easier quantity to handle, and the problem of finding the sampled distribution with the best (lowest) pertinence was solved analytically by Fosdick [96] by minimising $\langle O^2 \rangle - \langle O \rangle^2$; the result, which is unique given $O$ and the system, is
\[ P^{fs}(\sigma) \propto \left| \frac{O(\sigma)}{\langle O \rangle^{can}} - 1 \right| \exp[-\beta E(\sigma)] \qquad (3.41) \]
or, for $O(\sigma) = O(E(\sigma))$,
\[ P^{fs}(E) \propto \Omega(E) \left| \frac{O(E)}{\langle O \rangle^{can}} - 1 \right| \exp(-\beta E) \]
which in terms of $\eta$ corresponds to
\[ \eta^{fs}(E) = \ln \left| \frac{O(E)}{\langle O \rangle^{can}} - 1 \right| \]
This distribution seems never to have been used in practice; its implementation is complicated by the appearance of $\langle O \rangle^{can}$ itself in the expression, and its ergodicity turns out to be very poor, for reasons we shall describe below.

The ergodicity is rather harder to deal with. It depends not only on $O$ and the system, but also on the algorithm, so general solutions like equation 3.41 cannot be given. Moreover it is usually not analytically tractable, and, while $\tau_O(\eta)$ can be measured by simulation for a particular $\eta$, this does not (at least if standard techniques are used) tell us about $\tau_O(\hat{\eta})$ for any other sampled distribution.
Thus, while we could envisage finding the minimum-variance $P(\sigma)$ by this method, treating $\tau_O$ or $\mathrm{var}(\tilde{O})$ as a function of the $\eta$'s, to be minimised with respect to them, this would be extremely time-consuming, because there are $N_m$ (more or less) independent variables in $\eta$ and each `function evaluation' requires an entire MC simulation. Such a procedure is likely to waste more computer time than we could hope to recoup from more efficient sampling.

We shall now go on to discuss issues related to optimal sampling in greater depth, using the notions of pertinence and ergodicity where appropriate. We shall show how an expression for the sampled distribution that is very similar to equation 3.41 follows from consideration of the structure of the ratio estimator $\tilde{O}(\eta)$, and discuss the ergodicity of $P^{fs}$ and other sampled distributions. Then, in section 3.4.4, we shall give new theory showing how measurement of the macrostate transition matrix for one sampled distribution (the multicanonical is best for this purpose) enables us to estimate it for any other. From this we can make an approximate calculation of the error in the ratio estimator for any other sampled distribution, implicitly including the effect of correlations, without needing to calculate $\tau_O(\hat{\eta})$ itself.

Pertinence

As we first said in section 3.1.1, the numerator and denominator of equation 3.7 are dominated by energies around the peaks of $P^{can}(E) O(E)$ and $P^{can}(E)$ respectively, as shown (for $O \equiv \exp(\beta E)$) in figure 3.1. The states that lie between these peaks contribute hardly anything to either the numerator or denominator of the equation; the weight they are given by the multicanonical distribution serves only to enable the system to tunnel between the two. This is irrelevant to pertinence, which is related only to the information provided by independent samples, so it is clear that the pertinence can be increased by downweighting these states. Indeed, within the peaks themselves the contribution of a particular macrostate to the integral is proportional to its value of $P^{can}(E) O(E)$ or $P^{can}(E)$, so the maximally-pertinent $P(E)$ should have a shape that follows the shapes of these peaks. All that then remains is to determine the relative weights to be assigned to sampling the two peaks; and it follows from simple error propagation that the fractional error of the ratio estimator is minimised when the fractional errors of its numerator and denominator are equal. Thus, without any detailed calculation, we arrive at the ansatz
\[ P^{\#}(E) \propto \frac{P^{can}(E) O(E)}{\int_E P^{can}(E') O(E')\, dE'} + \frac{P^{can}(E)}{\int_E P^{can}(E')\, dE'} = \left[ \frac{O(E)}{\langle O \rangle^{can}} + 1 \right] P^{can}(E) \]
i.e.
\[ \eta^{\#}(E) = \ln \left[ \frac{O(E)}{\langle O \rangle^{can}} + 1 \right] \qquad (3.42) \]
These equations differ from the analogous equation 3.41 only in a single sign. For $O \equiv \exp(\beta E)$, $P^{fs}(E)$ and $P^{\#}(E)$ are almost identical; $P^{\#}(E)$ is shown (for the $16^2$ Ising model with $\beta = 0.48$) in figure 3.35. Only in $\ln(P(E))$ would any difference be apparent: $\ln(P^{fs}(E)) \to -\infty$ at the energy where $O(E) = \langle O \rangle$ ($E \approx -160$), while, as can be seen from the inset, $P^{\#}(E)$ is in fact finite there, though very small. Thus the high pertinence of $P^{fs}(E)$ is justified intuitively. For other operators, $P^{fs}(E)$ and $P^{\#}(E)$ are not so similar.

Figure 3.35. Main figure: the sampled distribution $P^{\#}(E)$ for the $16^2$ Ising model with $\beta = 0.48$. Inset: the same, but with a logarithmic vertical scale.
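Given an estimate of $P^{can}(E)$ (for example from a previous multicanonical run), the weights of equation 3.42 are simple to construct. A sketch, for the free-energy operator $O(E) = \exp(\beta E)$ and working in the log domain throughout (Python; the input ln_p_can is assumed known up to an additive constant):

```python
import numpy as np

def eta_sharp(ln_p_can, energies, beta):
    """Equation 3.42 for O(E) = exp(beta*E):
    eta#(E) = ln[O(E)/<O>_can + 1], evaluated stably in logs."""
    ln_p = ln_p_can - np.logaddexp.reduce(ln_p_can)    # normalise P_can
    ln_O = beta * energies
    ln_O_avg = np.logaddexp.reduce(ln_p + ln_O)        # ln <O>_can
    # ln[exp(ln_O - ln_O_avg) + 1], computed without overflow:
    return np.logaddexp(ln_O - ln_O_avg, 0.0)
```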
If $O$ is not a particularly rapidly increasing function of $E$, then the peaks of $P^{can}(E) O(E)$ and $P^{can}(E)$ will overlap. $P^{\#}(E)$, which is a weighted sum of the two peaks, then becomes a single-peaked function of $E$, while $P^{fs}(E)$, which always gives zero weight to the state where $O(E) = \langle O \rangle$, retains its double-peaked structure. For $O = E$, this means that the configurations at $\langle E \rangle$ are not sampled, an extreme contrast to Boltzmann sampling, where these configurations are among the likeliest to occur.

Let us now comment on the multicanonical ensemble's pertinence. There is in fact no canonical average for which it has the best pertinence; rather, it is the best if one is interested in knowing $\Omega(E)$ for all $E$ with constant fractional accuracy. Nevertheless, while its pertinence is not optimal, because of the weight given to states between the peaks (and possibly above and below them), it has, in the language of [97], good worst-case pertinence (and also reasonable ergodicity, a point we shall return to). By this we mean that, whatever operator $O$ we choose, the multicanonical ensemble will have sampled that part of macrostate space, and so at least a tolerably accurate estimator of $\langle O \rangle$ will be obtained. The observables that it estimates worst are those that depend only on a narrow region of macrostate space, for example $\langle E \rangle^{can}$; for these, the effort that the multicanonical simulation puts into sampling all the other macrostates is wasted. But even in the worst case the multicanonical distribution could never need more than $N_m$ times more independent samples than the optimal distribution (this in the case that the spectrum of the observable was so narrow that it depended only on one macrostate). This is in contrast to Boltzmann sampling, and indeed to the `optimised' sampled distributions $P^{fs}(E)$ and $P^{\#}(E)$. If we wish to find the expectation of a different operator, they will in general put an exponentially small fraction of their weight in the region of macrostate space which dominates the ensemble average, and so would require $O(\exp(L^d))$ times more samples. Multicanonical sampling can never have this problem, because it puts an equal amount of weight in every region of macrostate space.

According to [97], the sampled distribution with the best worst-case pertinence is the $1/k$ ensemble, for which $P^{1/k}(E) \propto \left( \int^{E} \Omega(E')\, dE' \right)^{-1}$ (see also section 2.2.1). This gives rather more weight to low-energy states than does the multicanonical ensemble. However, the scaling of the pertinence (a factor $\ln(\Omega_{TOT}) \sim N_m$ worse than an ideal estimator) is the same as that of the multicanonical distribution, so any improvement is presumably only in the size of the prefactor.

Ergodicity

We shall now discuss in qualitative terms the ergodicities of these various sampled distributions when implemented with an algorithm, like single-spin-flip Metropolis, that can make transitions only over a very short distance in macrostate space. For the case $O = \exp(\beta E)$, it is apparent that the distributions $P^{fs}$ and $P^{\#}$ in fact have very poor ergodicity: the sampling probability is exponentially small between the two peaks for $P^{\#}$, and zero for $P^{fs}$, which makes tunnelling between them extremely slow. Thus an MC simulation of normal length is in fact likely to spend all its time in the region of one of the peaks of $P^{fs}$ and not to sample the other at all.
$\tilde{O}$ from equation 3.7 then has effectively infinite error: nothing has been gained over Boltzmann sampling (where the similarly enormous error is due to lack of pertinence). Thus it becomes apparent that the demands of pertinence and ergodicity may well be mutually contradictory: if, to improve the pertinence of the sampled distribution, the states between the peaks are downweighted, the ergodicity will suffer, and the net effect may be to degrade overall performance. For operators such as $O \equiv E$, $P^{fs}$ would still suffer from severe ergodic problems, while $P^{\#}$ would become satisfactory. However, it can scarcely be claimed that non-Boltzmann sampling is necessary to estimate $\langle E \rangle$.

The multicanonical ensemble, being flat over all accessible macrostates, has no such self-inflicted ergodicity problems, though with the Metropolis algorithm the step size is not large and the acceptance ratio becomes low near the ground state (see section 3.1.2). Thus the tunnelling time between the regions that are important in the ratio estimator is $\tau_{rw} \sim N_m^2$. The $1/k$ ensemble has similarly robust ergodicity, with $\tau_{rw}$ scaling in the same way. It is shown in [97] that the acceptance ratio of the $1/k$ ensemble may be better than that of the multicanonical ensemble, so giving slightly better ergodicity.

It is impossible to make this discussion more quantitative while, as we have said, analytical calculation of $\tau_O(\eta)$ is impossible and measuring it by MC from visited states for more than a few sampled distributions would be prohibitively expensive. Thus, while we might imagine that the true optimal sampled distribution for $\exp(\beta E)$ would be (say) similar to the multicanonical but giving less weight to the states between the peaks (though more than that assigned by $P^{\#}$ or $P^{fs}$), we cannot say exactly what the trade-off between ergodicity and pertinence should be.

3.4.4 Use of the Transition Matrix: Prediction of the `Optimal' Distribution

We shall now show how, at least for the $O = O(E)$ problem, the difficulties caused by our inability to calculate $\tau_O(\eta)$ may be skirted by using macrostate transition information. Suppose we carry out a simulation with some weighting $\eta$ and estimate the macrostate transition matrix $\Gamma_{ij}(\eta)$ using equation 3.25. Now, for any microstates $r$, $s$ in $i$ and $j$ respectively, we have
\[ \Gamma_{rs}(\eta) = R_{rs} \min(1, \exp[-\beta(E_s - E_r) + \eta_s - \eta_r]) \]
where the matrix $R$ (defined in section 1.2.2) describes which transitions are allowed. Similarly, for some other set of weights $\hat{\eta}$,
\[ \Gamma_{rs}(\hat{\eta}) = R_{rs} \min(1, \exp[-\beta(E_s - E_r) + \hat{\eta}_s - \hat{\eta}_r]) \]
so, returning to equation 3.27, we find that, for $j \ne i$, $\Gamma_{ij}(\hat{\eta})$ becomes
\[ \begin{aligned}
\Gamma_{ij}(\hat{\eta}) &= \sum_{r \in i} P(r|i) \sum_{s \in j} \Gamma_{rs}(\hat{\eta}) \\
&= \sum_{r \in i} P(r|i) \sum_{s \in j} \Gamma_{rs}(\eta)\, \frac{\min(1, \exp[-\beta(E_s - E_r) + \hat{\eta}_s - \hat{\eta}_r])}{\min(1, \exp[-\beta(E_s - E_r) + \eta_s - \eta_r])} \\
&= \frac{\min(1, \exp[-\beta(E_j - E_i) + \hat{\eta}_j - \hat{\eta}_i])}{\min(1, \exp[-\beta(E_j - E_i) + \eta_j - \eta_i])} \sum_{r \in i} P(r|i) \sum_{s \in j} \Gamma_{rs}(\eta) \\
&= \frac{\min(1, \exp[-\beta(E_j - E_i) + \hat{\eta}_j - \hat{\eta}_i])}{\min(1, \exp[-\beta(E_j - E_i) + \eta_j - \eta_i])}\, \Gamma_{ij}(\eta)
\end{aligned} \qquad (3.43) \]
$\Gamma_{ii}(\hat{\eta})$ then follows from
\[ \Gamma_{ii}(\hat{\eta}) = 1 - \sum_{j \ne i} \Gamma_{ij}(\hat{\eta}) \qquad (3.44) \]
Equations 3.43 and 3.44 show that we can calculate the macrostate transition matrix for any desired weighting $\hat{\eta}$ if we have measured it for a single weighting. So that $\Gamma_{ij}(\eta)$ is accurately determined for all macrostates, it is obviously a good choice to use $\eta = \eta_{xc}$.
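A direct transcription of equations 3.43 and 3.44 is given below (Python sketch; gamma is the measured macrostate transition matrix, and the min(0, ...) inside the exponential is just an overflow-safe form of $\min(1, e^{x})$):

```python
import numpy as np

def reweight_gamma(gamma, E, beta, eta_old, eta_new):
    """Transform a macrostate transition matrix measured with weights
    eta_old into the one for weights eta_new (eqs. 3.43, 3.44).  E[i] is
    the energy of macrostate i; each macrostate is assumed single-valued
    in E and eta, as the derivation in the text requires."""
    n = gamma.shape[0]
    g_new = np.zeros_like(gamma)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dE = E[j] - E[i]
            num = np.exp(min(0.0, -beta * dE + eta_new[j] - eta_new[i]))
            den = np.exp(min(0.0, -beta * dE + eta_old[j] - eta_old[i]))
            g_new[i, j] = gamma[i, j] * num / den      # equation 3.43
    for i in range(n):
        g_new[i, i] = 1.0 - g_new[i].sum()             # equation 3.44
    return g_new
```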
Note that, in the derivation of equation 3.43, in order to be able to take the ratio of the Metropolis functions outside the sums over the microstates, it is necessary that $E_r$ and $\eta_r$ should be the same for all $r \in i$, and similarly for $s \in j$; thus the derivation we have just given is valid only for energy macrostates. From equation 3.28, the detailed balance condition on macrostate transitions, it follows that for any system where the TP method is valid we can write
\[ \frac{\Gamma_{ij}(\hat{\eta})}{\Gamma_{ji}(\hat{\eta})} = \frac{\Gamma_{ij}(\eta)}{\Gamma_{ji}(\eta)}\, \frac{\exp[\hat{\eta}_j - \hat{\eta}_i]}{\exp[\eta_j - \eta_i]} \]
but it is not clear how the normalisation of the individual terms in $\Gamma(\hat{\eta})$ follows from this.

Now we wish to move from this to an estimate of $\mathrm{var}(\tilde{O}(\hat{\eta}))$, the variance of the ratio estimator derived from sampling with $\hat{\eta}$. Rather than trying to calculate $\rho_O(t)$ and $\tau_O$, we shall use an approximate method, casting the problem into the same form as an expanded ensemble calculation and bringing to bear the machinery of [164], just as in section 3.4.2. This will give a useful qualitative estimate of the error of a particular sampled distribution. We argue that the dominant source of error in $\tilde{O}(\hat{\eta})$ is the tunnelling back and forth between the states that dominate the peaks of the numerator and denominator of the ratio estimator, not the sampling within each peak. That is to say, the problem is the estimation of the relative weights of the peaks of numerator and denominator, not the shape of each peak individually. So the fractional error in the estimate of the numerator is
\[ \frac{\delta\left[ \sum_E C(E)\, O(E) \exp[-\hat{\eta}(E)] \right]}{\sum_E C(E)\, O(E) \exp[-\hat{\eta}(E)]} \approx \sqrt{\frac{\tau_{in}}{N_p}}\, \frac{\delta C(E_0)}{C(E_0)} \]
where $E_0$ is some state in the peak of $O(E) P^{can}(E)$ (the mode, say), $\tau_{in}$ is some sort of correlation time for sampling within the peak, and $N_p$ is the width of the peak. We argue that $\tau_{in}$ will be similar for all (sensible) sampled distributions that we may choose, and so will not affect the comparison of different $\hat{\eta}$'s. Therefore, we shall need only to calculate $\sqrt{\mathrm{var}(C(E_0))}/\bar{C}(E_0)$. The same arguments can clearly be applied to the denominator (with the mode at $E_1$, say) and lead to
\[ \frac{\delta \tilde{O}(\hat{\eta})}{\tilde{O}(\hat{\eta})} \propto \sqrt{\frac{\mathrm{var}(C(E_0))}{\bar{C}^2(E_0)} + \frac{\mathrm{var}(C(E_1))}{\bar{C}^2(E_1)}} \qquad (3.45) \]
This is now of the same form as equation 3.34 for the estimation of $\delta r / r$, and we can once again use equations 3.36, 3.37 and 3.38 to calculate the right-hand side from the transition matrix $\Gamma(\hat{\eta})$. Thus our procedure (for the specific case of the 2d Ising model) is the following:

0. Measure the macrostate transition matrix $\Gamma(\eta_{xc})$ (pentadiagonal) by MC.
1. Calculate $\Gamma(\hat{\eta})$ (pentadiagonal) using equation 3.43.
2. Block $\Gamma(\hat{\eta})$ into a tridiagonal form $\tilde{\Gamma}(\hat{\eta})$. By blocking at this stage, we can correctly take into account the variation of the probability of the underlying $E$-macrostates within each blocked macrostate, and so avoid the problems discussed in section 3.2.3.
3. Calculate $\bar{C}$ and $\mathrm{var}(C)$ using equations 3.36, 3.37 and 3.38, and thus estimate the error of $\tilde{O}(\hat{\eta})$.
4. Go to 1 and repeat until the optimal $\hat{\eta}$ is found.

Though this process is vastly faster than performing a full MC simulation for each $\hat{\eta}$, it is still not so fast as to make a full multidimensional minimisation over $\hat{\eta}$ feasible; there would in any case be little point in this, because of the approximations that have been made. Instead we should choose to examine only certain likely forms of the sampled distribution.
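Steps 0-4 can be assembled from the earlier sketches; the skeleton below (Python) reuses reweight_gamma() and count_statistics() from above, but omits the blocking of step 2 for brevity, so it should be read as an outline of the procedure rather than a faithful implementation:

```python
import numpy as np

def predict_ratio_error(gamma_xc, E, beta, eta_xc, eta_hat, n_c, i0, i1):
    """Predict the (relative) error of the ratio estimator for weights
    eta_hat, given a transition matrix measured at eta_xc.  i0 and i1
    index the macrostates at the modes E0 and E1 of O(E)P_can(E) and
    P_can(E) respectively."""
    g_hat = reweight_gamma(gamma_xc, E, beta, eta_xc, eta_hat)
    # (step 2, blocking g_hat to tridiagonal form, omitted here)
    c_bar, var_c = count_statistics(g_hat, n_c)    # equations 3.36-3.38
    # equation 3.45, up to a constant factor common to all eta_hat:
    return np.sqrt(var_c[i0] / c_bar[i0] ** 2 + var_c[i1] / c_bar[i1] ** 2)
```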
To test the predictions of the above theory, we have performed simulations using various sampled distributions for the $16^2$ Ising model with $\beta = 0.48$. Clearly it would be desirable to investigate larger system sizes and different temperatures, but pressure of time has prevented this. For each sampled distribution investigated, we initially predict the error of the estimator of $g(\beta = 0.48)$ using equation 3.45, then perform the simulation and measure the error using jackknife blocking of the histograms. We also measure the random walk time $\tau_{rw}$ between the peaks of the distribution, and compare it with a prediction made using the mean first-passage time $\tau_{ij}$, the average time taken to reach state $j$ for the first time starting at state $i$. Like $\mathrm{var}(C_i)$, $\tau_{ij}$ can be calculated from $\tilde{\Gamma}(\hat{\eta})$ (see [164]).

As candidates for the `optimal' sampled distribution, we have chosen to examine distributions interpolating between the best-pertinence $P^{fs}$ distribution and one that is very similar to the multicanonical. These sampled distributions follow the shapes of $P^{can}$ and $OP^{can}$ in the peaks, which are arranged to have equal weight (the canonical distribution being that at $\beta = 0.48$, and $O \equiv \exp(\beta E)$), but also add in a constant weight between them. This weight is parameterised by $w$: for $w \le 1$, $P(w, E)$ between the peaks is linear, passing through the points whose $y$-coordinates are $w$ times the maximum heights of the peaks, and whose $x$-coordinates are the $E$-values at the maxima. As $w$ increases, we expect the pertinence to get worse, because less time is spent on the peaks, but the ergodicity to improve as $\tau_{rw}$ decreases. The distribution with $w = 1$ is very close to multicanonical, differing from it only because the insistence that both peaks have the same weight means that they must have slightly different heights, producing a sampled distribution which is not flat but has a slight slope. We also investigate sampled distributions that put the majority of their weight in the region between the peaks, since we anticipate that this will further reduce $\tau_{rw}$. These are also parameterised by $w$, with $w > 1$, but here $P(w, E)$ is chosen such that $\ln P(w, E)$ between the peaks is parabolic, passing through the maxima of $\ln P^{can}$ and $\ln(OP^{can})$ and rising, half way between them, to $w$ times their average height. To make this clear, we show the sampled distributions that have been used in figure 3.36. To generate them requires only a few seconds.

Figure 3.36. The sampled distributions used to test the predictions of the error of $g(\beta = 0.48)$, labelled by the value of the parameter $w$ that determines how much weight is put in the region between the two peaks of $P^{can}$ and $OP^{can}$.

First, before comparing the predicted and actual error, we show in figure 3.37 the MC results for $\tilde{g}(w) - g_{exact}$ for the various sampled distributions. We do this to demonstrate that the MC results are consistent with the exact answer, taking errors into account. $\tilde{g}(w)$ is the measured estimator of $g(\beta = 0.48)$, and $g_{exact}$ is its exact value for the $16^2$ Ising model, taken from [10]: $g_{exact} = -2.0713203$. A total of $N_c = 2 \times 10^6$ lattice sweeps was performed for each sampled distribution, in ten blocks. The error bars come from jackknife blocking.

Figure 3.37. The difference between the measured estimator $\tilde{g}$ and its exact value $g_{exact} = -2.0713203$ for the $16^2$ Ising model at $\beta = 0.48$, for six different sampled distributions parameterised by $w$. $N_c = 2 \times 10^6$ lattice sweeps were performed for each sampled distribution. The error bars come from jackknife blocking.
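For reference, a simplified jackknife sketch is shown below (Python; it is applied here to per-block estimates of $g$ for brevity, whereas blocking of the histograms themselves would recompute $g$ from the pooled histogram with each block left out in turn):

```python
import numpy as np

def jackknife(block_estimates):
    """Jackknife error bar from n per-block estimates: form the n
    leave-one-out averages and apply the (n-1)/n variance inflation."""
    x = np.asarray(block_estimates, dtype=float)
    n = len(x)
    loo = (x.sum() - x) / (n - 1)              # leave-one-out means
    err = np.sqrt((n - 1) / n * ((loo - loo.mean()) ** 2).sum())
    return loo.mean(), err
```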
Now we proceed to the results that are our real interest, the variance of the estimators of $g$. We show in figure 3.38 the size of the error bars on $g(\beta = 0.48)$, both predicted and measured, as a function of $w$. Because equation 3.45 is only a proportionality, depending on unknown parameters, we show not the absolute error but the error for each $w$ divided by the error (predicted or measured as appropriate) from the first point, which is at $w = 0.01$. (This division also corrects for the fact that the operator $O$ is directly related to $Z$, and thus to $\exp(g)$, not to $g$ itself.) We call this quantity $\delta g / \delta g_1$. We also show (figure 3.39) the predicted (from $\tau_{ij}$) and measured values of $\tau_{rw}$ between the modes of $P^{can}$ and $OP^{can}$ for the various sampled distributions.

Agreement between prediction and experiment is generally very good: the estimate of $\tau$ is approximately right throughout the range of $w$, while the estimate of the error bar $\delta g$ is very good for $w < 1$, though the $w = 1$ (near-multicanonical) distribution has a rather smaller error bar than predicted, while the error for large $w$ is also rather lower than expected. We attribute some of this discrepancy (the low value of $\delta g(w = 1)$, the fact that $\delta g(w = 4) > \delta g(w = 32)$) to error in our estimate of $\delta g$ itself, which we have not quantified; but for large $w$ we may be overestimating $\delta g$ because more weight than is taken into account by equation 3.45 goes into the peaks of $P(w)$. This occurs because we assumed, in deriving that equation, that the shape of $P(w)$ matches that of $P^{can}$ and $OP^{can}$ in the peaks; but in fact, because the `extra' weight that is put into the region between the peaks is matched on to them at their maxima, for half the macrostates in the peaks $P(w)$ is substantially larger than it would be if it followed $P^{can}$ and $OP^{can}$. We correctly predicted the distribution that gave the smallest $\delta g$, which was the near-multicanonical one.

The most striking result of the investigation is probably the surprisingly small effect that variation in the sampled distribution has on the estimator of $g$.
For the method using CHAPTER 3. MULTICANONICAL AND RELATED METHODS 187 1.2 MC data prediction 1.0 δg/δg1 0.8 0.6 0.4 0.2 0.0 -3 10 -2 10 -1 0 10 10 1 10 2 10 w Figure 3.38. Predicted (line) and measured (points) error bars on g, as a function of the parameter w, shown as a fraction of their size at w = 0:01. Values for the 162 Ising model at = 0:48, MC data gathered from 2 106 lattice sweeps per value of w, with jackknife blocking. the probability of the ground state, the 1=k distribution of [97], which was designed for a very similar problem, should be considered as well as the multicanonical distribution. The reason that scarcely any improvement on multicanonical sampling is possible using the single-spin-ip Metropolis algorithm, is that the algorithm is limited by the time taken to move between the two regions that dominate the ratio estimator. Any large improvement would require the use of a dierent algorithm (cluster ipping, `demon' etc). It seems likely that, as the correlation time decreases, the `optimal' sampled distribution will become less like the multicanonical and more like Fosdick's (because, in the limit of uncorrelated congurations, when pertinence alone need be considered, Fosdick's prescription does give the sampled distribution of lowest variance). Thus, we anticipate that, for such an algorithm, the the use of the methods described here could well lead to the prediction of a sampled distribution with a substantially lower variance than the multicanonical, even though it did not for the simple Metropolis. 3.5 Discussion We have made a thorough investigation of issues related to the production and use of the multicanonical distribution in the free energy measurement problem (and elsewhere), and we CHAPTER 3. MULTICANONICAL AND RELATED METHODS 188 6 10 MC data prediction 5 τ 10 4 10 3 10 -3 10 -2 10 -1 0 10 10 1 10 2 10 w Figure 3.39. Predicted (line) and measured (points) values for rw , the random walk time between the modes of P can and OP can , as a function of the parameter w. Values for the 162 Ising model at = 0:48, MC data gathered from 2 106 lattice sweeps per value of w, with jackknife blocking. have also investigated its relationship to the expanded ensemble method method and to other non-Boltzmann importance sampling methods. Though the 2d single-spin-ip Ising model has been used throughout, most of our results, particularly in so far as they reect new methods, are not limited to this system and would be expected to be of wider applicability: we shall use them to study a phase transition in an o-lattice system in the next chapter. In our investigation of the generation of the multicanonical distribution we have studied distributions preweighted both in the internal energy and in the magnetisation (this latter case corresponding to the problem of measuring the p.d.f. of the order parameter over a range embracing both the phases; the former corresponding more to the application of the method to nd the absolute free energy of a single phase). We looked at several ways of producing a suitable sampled distribution as rapidly as possible. First we examined perhaps the most obvious method, based on the visited-states of the MC algorithm. 
While the absolute performance of this method was inferior to that of the other methods, because of its slowness in spreading out from the region sampled by the `initial' (Boltzmann sampling) algorithm, it did serve to show the usefulness of Bayesian methods in this problem: Bayes' theorem is the natural way of distinguishing fluctuations in the observed data that are really due to the underlying structure of the sampled distribution from fluctuations that are simply the product of the stochastic nature of MC sampling. However, as we saw, the full prior-posterior formulation of the problem leads to the appearance of integrals of some complexity. We used a simple Normal model to treat these approximately, though this extra effort did not help the rate of convergence to the multicanonical distribution. In general, in fact, we should bear in mind that, in regions where data are sparse, it is better (as well as easier) to use further MC sampling from an updated distribution to confirm and refine the results of a simple, approximate estimate of the underlying sampled distribution than to devote the computer time to a lengthy Bayesian analysis of the sparse data.

Secondly, we introduced a new method, the Transition Probability method, to try to overcome the slowness of convergence of the VS method by sampling all the macrostates immediately. We found this method to be of use both for energy and for magnetisation preweighting, though it performed better for magnetisation: convergence was faster (in fact almost immediate) and there seemed to be little or no residual bias, in contrast to the energy case. Indeed, it was possible to use the TP method not just to generate the multicanonical distribution, but to obtain accurate final estimators of the canonical p.d.f. This promises to alleviate to some extent the problems caused by the fact that the multicanonical algorithm cannot be fully parallelised, by permitting the length of a multicanonical run to be shorter than it would have to be if VS estimators were used, while still giving an unbiased estimator. This makes it easier to use massive parallelism. To anticipate future results for a moment, we shall use the TP estimator and massive parallelism in this way in the next chapter, and we shall also do further work which will shed more light on the conditions required for the approximation of equation 3.26 to be satisfied.

In section 3.3 we have confirmed that the Gibbs free energy g and the canonical averages for the specific internal energy e and specific heat capacity c_H are correctly produced for all values of the inverse temperature β from a single simulation. We have compared the multicanonical ensemble with thermodynamic integration for the problem of measuring absolute free energy, showing that the size of the random errors for the two is about the same, but that the multicanonical method copes far better with a singularity (the continuous phase transition at H = 0, β = β_c = 0.440686…) on the path of integration. Though it does not relate directly to the problem of free energy estimation, we have also used the ability of the multicanonical method to make accessible states of very low canonical probability to produce new results on the scaling form of P(β_c, M). In the last section we have addressed rather more general questions about importance sampling.
First, we have put the multicanonical and expanded ensembles into the same framework, and have shown how the theory we developed for use with the TP estimators of the multicanonical ensemble is also useful in the context of the expanded ensemble, where it enables us to show that simulation within subgroups of the subensembles of which the expanded ensemble consists does not confer any advantage. Second, because we may sometimes be interested only in a single canonical average at a single temperature, we have addressed the question of `optimal sampling.' We have shown that from TP measurements on a multicanonical distribution we may calculate approximately the expected variance of the estimator from any sampled distribution. We have in fact been able to make only preliminary investigations, looking only at O = exp(βE) for the 16² Ising model and investigating various candidate distributions explicitly, to confirm that our predictions of their variance are approximately correct. For this observable and the single-spin-flip Metropolis algorithm, the multicanonical distribution turns out to be very near the best that can be used. There are of course observables for which it is far from optimal, the internal energy at any particular temperature being one. But there are none for which it is bad in the way that Boltzmann sampling is: it can require at most O(L^d) more sampling time than an `optimal' distribution, whereas Boltzmann sampling can require a time that is longer by a factor that is exponential in the system size.

Chapter 4

A Study of an Isostructural Phase Transition

4.1 Introduction

Two partially intertwined threads run through this chapter. The first is the continuing study of the techniques relating to the generation and use of the multicanonical/expanded ensemble, and the comparison of its efficiency with that of thermodynamic integration (TI). We shall investigate these matters in section 4.2 by applying TI and the expanded ensemble to the square-well solid, just as in section 3.3.2 we applied TI and the multicanonical ensemble to the Ising model. We shall also, in section 4.3, continue the exploration of the use of the `transition probability' estimators introduced in the last chapter (section 3.2.3). The second thread is the examination of the square-well solid as a system of physical interest in its own right. We shall in particular confirm and elaborate upon recent results that suggest that this system displays an isostructural solid-solid phase transition [69, 171, 172, 173, 174]. Thus, at some times the focus will be on the way that the results obtained by various simulation techniques compare with one another, whereas at others it will simply be on the results themselves and their physical meaning.

The Square-Well Solid and Related Systems

We shall now introduce the system that we shall investigate in this chapter and describe the aspects of its physical behaviour that we shall study. In order to motivate the choice of the square-well solid, and to place our work in a wider context, we shall also briefly describe related theoretical and experimental work, in particular on colloidal systems.

Consider a system of N particles moving in a continuous three-dimensional space and interacting with one another via a simple spherically-symmetric pair potential¹ E(r_ij), where r_ij is the separation of the centres of two particles i and j.
The potential we shall be using consists of a hard-core repulsion, defining the diameter σ of the particles, and a short-range attractive force:

    E(r_ij) = ∞      if r_ij ≤ σ
    E(r_ij) = −E₀    if σ < r_ij < (1 + λ)σ      (4.1)
    E(r_ij) = 0      if r_ij ≥ (1 + λ)σ

as shown schematically in figure 4.1.

Figure 4.1. Schematic diagram of the pair potential of the square-well solid; the width of the well is exaggerated compared with its actual width in the simulations of this chapter.

The width of the attractive well is a fraction λ of the hard-core repulsion distance σ; though λ may be varied at will, we shall in fact always use the value λ = 0.01. We choose to measure inverse temperature β in units of E₀⁻¹ (E₀ is the depth of the potential well), so that E(r_ij) = −1 for r_ij in the well.

¹ V(r_ij) is more commonly used to represent this, but we reserve V for volume and keep E for internal energy, as in previous chapters.

We shall deal exclusively in this chapter with densities where the system is solid, and we shall examine only face-centred cubic (fcc) crystals containing N particles, where N = 4m³ for integer m (in practice, m = 2, 3, 4). The total energy of the configuration is the sum over all pairs of particles of E(r_ij):

    E(r^N) = Σ_{i, j<i} E(r_ij)

We shall quantify the density mainly by the volume per particle v = V/N = 1/ρ, which we shall call the specific volume. We measure lengths in units of σ, so that for close-packed spheres v = 1/√2. We shall be concerned with specific volumes in the range 0.72 < v < 0.82. For comparison, the fcc crystal of hard spheres is stable up to v = 0.96, where it melts into a fluid with v = 1.06 [73].

The form of the potential defined in equation 4.1 does not bear more than a very rough resemblance in shape to the pair potentials usually employed in modelling the interactions between atoms, such as the familiar Lennard-Jones potential E_LJ(r_ij) = 4E₀[(r₀/r_ij)¹² − (r₀/r_ij)⁶], which is softer, everywhere differentiable, and has a much wider attractive well relative to the repulsive part, with a long-range attractive tail. However, while extremely idealised, it does bear a closer resemblance to the effective potential that may be induced between the particles in colloidal systems.

Colloids, which occur frequently in nature, especially in biological systems, consist of particles of one material suspended in a medium of another. The diameter of the particles is between 2 nm and 1 μm, and both colloid particles and medium may be either solid, liquid or gas, though the overwhelming majority of scientific studies have been on colloids consisting of solid particles in a liquid medium. A detailed review of the properties of colloidal suspensions can be found in [175]; here we are mainly concerned with the equilibrium phase behaviour of monodisperse colloids. By suitable stabilisation we can obtain colloids where the particles behave like hard spheres; then, by adding a polymer to the solution, we can induce an attractive force between the colloid particles. This force is strictly a many-body entropic effect: when the colloid particles are close together, the number of accessible polymer configurations is greater than when they are separated.
However, the many-body part of this effective interaction comes from excluded polymer configurations that would intersect three or more colloid particles, so if R_g, the radius of gyration of the polymer coils, is small (as will always be the case here), then the many-body part is also small and the force can be well described quantitatively by an effective pair potential called a `depletion potential.' The depletion potential can be thought of as an osmotic effect: if two colloid particles approach closer than R_g, the polymer between them is squeezed out, and so they experience a net osmotic pressure from the rest of the polymer that serves to drive them together. The range of this attractive force is thus controlled by R_g, and its overall depth by the concentration of polymer. Its strength as a function of r_ij can be most easily handled by treating the polymer coils as if they behaved like hard spheres of radius R_g in their interaction with the colloid particles, so that the depth of the potential is simply proportional to the volume of the `depletion region,' the region from which the polymer is excluded. The resulting effective pair potential is called the Asakura-Oosawa potential [176]. Even though the shape of the depletion potential is still appreciably different from a square well, there is now a much greater degree of similarity: there is a hard core and an attractive well of finite width, whose width and depth may be varied freely; in particular the well width may be chosen to be much less than the hard-core radius. We mention in passing that in colloid science the usual choice of parameter to quantify the density is the volume fraction φ, the fraction of the volume of the system occupied by the hard cores: φ = π/(6v) in our units.

We were motivated to look at the solid phase with λ ≪ 1 by recent theoretical [172, 173, 174] and computational [69, 171] results that suggest that systems with very short-ranged attractive potentials may exhibit an isostructural solid-solid phase transition between a `dense' and an `expanded' phase, with a phase diagram like that in figure 4.2. The qualitative structures² of the dense and expanded phases are shown in figure 4.3. The inner circles represent the hard cores of the potential, of diameter σ; the outer circles represent the attractive wells, of width λσ. Examination of figure 4.1 then indicates that the interaction of a pair of particles is −E₀ if the outer circle of one particle touches or cuts the inner circle of the other. Thus, in the dense phase (top) each particle has a low energy, but the crystal is tightly packed, so the entropy is also low; while in the expanded phase (below), each particle is in the potential well of only two or three of its neighbours on average (but they are still close enough to form a cage to hold it on its lattice site). The energy is therefore much nearer to zero, though the free volume in the crystal is also much larger, producing a compensating increase in entropy.

² In drawing this figure, we are to some extent anticipating results we shall not find until section 4.3; see in particular section 4.3.8.

Figure 4.2. A schematic illustration of the phase diagram of a system with a very short-ranged attractive potential, according to [172]. Between the triple line (TrL) and the critical point (CP), two solid phases, labelled S1 and S2, may coexist.
The horizontal lines are tie-lines: if the system is prepared with a density in the tie-lined region, it exists as a mixture of two phases with densities given by the points at the ends of the lines. F labels a fluid phase. The dotted box contains the region that is studied in this chapter.

We have not developed a method to measure the free energy of a fluid or to connect it reversibly to a solid. Therefore we shall investigate only the part of the phase diagram shown inside the dotted region in figure 4.2. We have no way of knowing where the triple line is, so it should be borne in mind that part (or possibly even all) of any coexistence curve we obtain may be metastable: it may be energetically favourable for the expanded solid to decompose into the dense solid and the fluid.

As well as the presence of two solid phases, another unfamiliar feature of figure 4.2 is the presence of only one fluid phase, where normally we would expect two, a liquid and a gas, becoming indistinguishable at a liquid-gas critical point. It is interesting to digress for a moment to put this into the context of the more general theory developed in [172, 173, 174]. According to this theory, as the range of the attractive part of the potential is altered, we produce in sequence the three types of phase diagram shown in figure 4.4. This figure is based on results obtained analytically³ for the square-well system in [173].

Figure 4.3. Schematic (2d) diagram of the dense (top) and expanded solid phases. Each pair of circles represents a particle: the inner circle represents the hard core of the potential, of diameter σ; the outer circle represents the attractive well, of width λσ.

The figure on the left represents the familiar solid-liquid-gas phase diagram that is ubiquitous in nature; for example, it describes the phase behaviour of all simple atomic and molecular systems. However, here we see that the condition for its existence is that λ > 0.25 [173]. If λ is smaller than this, as in the central figure, then the liquid-gas critical point disappears and the phase diagram contains only one solid and one fluid phase. Finally, if λ is extremely small, λ < 0.06 [171], as in the figure on the right, we obtain the phase diagram with one fluid and two solid phases. There is thus a pleasing symmetry between long-range and very short-range potentials. We shall describe the physical reasons for the existence of the solid-solid phase transition for small λ in section 4.3.8.

³ The calculations were based on the use of a variational principle to obtain an estimate of (or, strictly, an upper bound on) the free energy difference between the square-well system and an appropriate reference system.

Figure 4.4. Schematic phase diagrams of the square-well and related systems for λ > 0.25 (left), 0.06 < λ < 0.25 (centre) and λ < 0.06 (right), based on those calculated in [173]. The limiting value λ = 0.25 for the disappearance of the liquid phase is taken from [173], while λ = 0.06 for the appearance of the expanded solid is taken from [171]; in [173] it is suggested that the expanded solid appears only for λ < 0.015. S = solid, F = fluid, L = liquid, G = gas.

However, we must point out that, aside from analytic calculations, the solid-solid phase coexistence has been observed by only one group of workers, in computer simulations of the square-well solid [69, 171]; it has not yet been seen experimentally.
Experiments looking for it have been carried out on colloids, with inconclusive results caused by practical difficulties in working with such dense systems and with polymers of very small R_g. It is interesting to note that isostructural solid-solid phase transitions, each with a coexistence line ending in a critical point, have been observed experimentally in the heavy metals cerium and caesium [177]. However, these phase transitions are not produced by very short-ranged interatomic potentials; the (effective) potentials between atoms of these metals are of the `usual' long-ranged type that would be expected to produce solid-liquid-gas phase diagrams. They are thought to be caused by quantum effects (possibly involving localisation/delocalisation of f electrons, though there is no agreement on the detailed mechanism).

Even the existence in real systems of phase diagrams of the second type (solid plus one fluid phase) has been demonstrated only quite recently, in colloidal systems [178]; it is found experimentally that the liquid disappears when R_g/σ < 0.3, a value confirmed by analytic calculations on the Asakura-Oosawa model [179]. Computer simulations have been performed on C60 with a model potential produced by smearing and summing Lennard-Jones-type interactions [70], and on hard spheres with an additional Yukawa attraction [180]. The results imply that the phase diagram of C60 is of this type, almost uniquely for a fairly simple molecular material, and that the Yukawa system also has a phase diagram of this two-phase kind if the exponential decay is strong enough. The phase diagram of C60 has not yet been fully established experimentally. The fact that potentials of different detailed shapes all produce phase diagrams of this kind lends credence to the idea that the detailed shape of the potential is not important in determining what kind of phase diagram occurs, only gross features like the relative ranges of the attractive and repulsive parts. This fact, combined with a desire to be able to check our results explicitly against [69, 171, 172, 173, 174], influenced us in our decision to use the square-well potential instead of a more physically realistic model like the Asakura-Oosawa potential. We also originally intended to study a range of different values of λ, and all parts of the phase diagram, when it would have been advantageous to have a potential whose range is unambiguously defined. However, pressure of time prevented more than a part of this ambitious project from being completed.

Details of the Simulation

We have used three different approaches to the measurement of free energy in this chapter, each of which requires its own computer program, different in detail from the others. However, there is a large `core' part of the program that they all have in common, which deals with the basic problem of simulating the motion of the particles of the N-particle square-well solid. This `core' program, and its implementation on the Connection Machine CM-200, is described in appendix E; it is shown there that the most efficient way to map the problem onto the machine is to use a mixture of geometric decomposition and primitive parallelism, running N_r = O(1000) independent `copies,' or replicas, of each simulation in parallel. Each replica is quite small, containing 32-256 particles. The individual details of the modifications required to implement each method of free-energy measurement will be described in the section devoted to that method.
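As a point of reference for the methods that follow, the sketch below shows the basic ingredients in a serial, illustrative form: the square-well pair potential of equation 4.1 and the total energy of a configuration under periodic boundary conditions. It is emphatically not the parallel CM-200 program of appendix E; the parameter values and function names are ours.

```python
import numpy as np

SIGMA = 1.0     # hard-core diameter (our unit of length)
LAMBDA = 0.01   # well width as a fraction of sigma
E0 = 1.0        # well depth (our unit of energy)

def pair_energy(r):
    """Square-well pair potential, equation 4.1."""
    if r <= SIGMA:
        return np.inf               # hard-core overlap
    if r < (1.0 + LAMBDA) * SIGMA:
        return -E0                  # inside the attractive well
    return 0.0

def total_energy(pos, L):
    """Sum of E(r_ij) over all pairs, with the minimum-image
    convention in a cubic box of side L."""
    N = len(pos)
    E = 0.0
    for i in range(N):
        for j in range(i):
            d = pos[i] - pos[j]
            d -= L * np.round(d / L)    # minimum image
            E += pair_energy(np.sqrt(d @ d))
    return E
```

An O(N²) double loop such as this is adequate for the very small replicas used here (32-256 particles); the production code instead exploits geometric decomposition, as described in appendix E.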
4.2 Comparison of Thermodynamic Integration and the Expanded Ensemble: Use of an Einstein Solid Reference System

In this section we shall mainly be concerned with the relative efficiency of these two methods; we shall obtain comparatively little information about the physics of the square-well system, locating a single pair of coexistence points only. The method of free-energy measurement that is used is the smooth transformation of the square-well solid, by way of a series of interpolating systems, into a system for which the free energy can be calculated exactly: in this case the harmonic Einstein solid at the same temperature and density. This technique was first used in [71], and is discussed in section 2.1.1. We shall measure the free energy both by using TI (as in [71]) and by using the states along the thermodynamic integration path to make an expanded ensemble (a case such as this, where several Hamiltonians are defined on the same phase space, is more naturally described as an `expanded ensemble' than a `multicanonical ensemble'). We note that this technique of transforming the energy function was used (combined with TI) in the determination of the phase diagrams of the 3d square-well system in [69, 171]; however, the reference system used there was the corresponding hard-sphere system, whose free energy was taken as being already known absolutely from previous simulations and/or theoretical equations of state for each phase. The equation of state used for the solid phase was due to Hall [181].

It is therefore necessary to modify the core simulation routine described in appendix E. The potential function used is no longer simply

    E_SW = Σ_{i=1}^{N} Σ_{j<i} E(r_ij)

where E(r_ij) is as defined in equation 4.1; in making particle moves we now use the energy

    E(α) = αE_SW + (1 − α)E_ES

where E_ES is the potential energy of the harmonic Einstein solid:

    E_ES = k_s Σ_{i=1}^{N} (r_i − r_i⁰)²

where the set {r_i⁰} are the lattice sites of the Einstein solid (which are of course arranged here to correspond to the fcc lattice sites) and k_s is a spring constant. Each particle thus feels an additional harmonic attraction to its lattice site. By varying α in the range 0 ≤ α ≤ 1 we thus interpolate between the pure Einstein solid (α = 0) and the pure square-well solid (α = 1). In the expanded ensemble implementation we allow transitions between different values of α; in thermodynamic integration, which we shall now describe, we simply perform a series of independent simulations at different values of α.

4.2.1 Thermodynamic Integration

The principle underlying this method is to use the equation

    ∂F(α)/∂α = −(1/β) ∂ ln Z(α)/∂α
             = ∫ d^N r (∂E(α)/∂α) exp[−βE(α)] / ∫ d^N r exp[−βE(α)]
             = ⟨E_SW⟩_α − ⟨E_ES⟩_α

where ⟨·⟩_α denotes a canonical average measured in the ensemble defined by E(α). Thus it follows that

    F_SW = F_ES + ∫₀¹ [⟨E_SW⟩_α − ⟨E_ES⟩_α] dα      (4.2)

and at intermediate points along the path we define

    F(α) = F_ES + ∫₀^α [⟨E_SW⟩_α′ − ⟨E_ES⟩_α′] dα′      (4.3)

F_ES can be calculated exactly: F_ES = β⁻¹(3N/2) ln(βk_s/π). We shall estimate F_SW by measuring ⟨E_SW⟩_α − ⟨E_ES⟩_α for a series of values of α at a fixed temperature and volume, fitting a function to the data points, and evaluating the integral numerically (a sketch of this fitting-and-integration step is given below). We choose the Einstein spring constant k_s to be sufficiently large that the particles are prevented from moving far off their lattice sites even in the α = 0 ensemble, so that there is not an excessive variation in the `typical configurations' of the system as α varies.
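As flagged above, the numerical side of equation 4.2 amounts to a spline fit followed by quadrature. The following sketch (SciPy assumed) shows the idea; the data arrays are placeholders, not our measured values of ⟨E_SW⟩_α − ⟨E_ES⟩_α.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.integrate import quad

# Placeholder MC data: values of alpha and the per-particle integrand
# (1/N)[<E_SW> - <E_ES>] (after any analytic correction has been added).
alpha = np.array([0.05, 0.25, 0.50, 0.75, 0.95, 0.995, 1.0])
integrand = np.array([-6.0, -6.3, -6.8, -7.5, -9.0, -9.6, -9.8])  # illustrative

spline = CubicSpline(alpha, integrand)           # interpolating cubic spline
integral, quad_err = quad(spline, alpha[0], alpha[-1])

# f_SW = f_ES + (expanded-ensemble link down to alpha = 0)
#             + the integral just computed, as in equation 4.2.
print(integral)
```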
By choosing k_s in this way, we hope to keep the total free energy difference between α = 0 and α = 1 fairly small. We took the criterion that the particles should stay close to the lattice sites as implying the following: let P(|r − r⁰|) dr be the probability that a particle, whose lattice site is at r⁰, is found in a shell of radius |r − r⁰| and thickness dr about the lattice site. Then, with Einstein solid interactions only, it follows that

    P(|r − r⁰|) dr = (16/π)^{1/2} (βk_s)^{3/2} (r − r⁰)² exp[−βk_s(r − r⁰)²] dr

So we demand that

    [P(|r − r⁰| < R*)]^N > 0.5      (4.4)

where R* is the radius of the sphere within which the centre of a particular particle can move without approaching within σ of neighbouring lattice sites, at the prevailing density. This criterion is thus designed to ensure that, in the simulation as a whole, all the particles are further than σ from their neighbours (so that there are no hard-core overlaps) at least half the time.

However, difficulties arise in the α = 0 and α = 1 states that prevent us from implementing this strategy exactly as described above. First, in the α = 0 (pure Einstein solid) ensemble, however large we make k_s, there is a finite probability of generating a configuration in which at least one pair of particles is closer together than σ. Thus ⟨E_SW⟩_α is infinite exactly at α = 0, though it is finite and well behaved everywhere else. This is a consequence of the hard core in the potential, which remains in E(α) for all α > 0 without diminishing in size at all, disappearing only precisely at α = 0. In [71] this problem is handled by expanding F(α) analytically around α = 0, but (as we shall explain in section 4.2.2) it can also be treated easily with the expanded ensemble⁴. Since we have an expanded ensemble simulation available, we use it rather than TI to connect α = 0 with the next state (we have used α = 0.01 and α = 0.05). The second difficulty, arising as α → 1, must be dealt with rather more carefully.

⁴ The possibility of doing this was also mentioned in [71].

The Centre of Mass Problem

This problem is a consequence of the fact that E_SW is invariant under translations of the centre of mass of the whole system, whereas E_ES is not. As a result, as α → 1, the centre of mass has an increasingly large volume accessible to it and ⟨E_ES⟩_α increases. At α = 1, the probability density of the position of the centre of mass becomes uniform over the simulation box (which has side length L), corresponding to a mean square displacement of the order of L² and, for the large values of k_s that we are using, an extremely large value of ⟨E_ES⟩_α. This would make the evaluation of the integral in equation 4.2 very difficult: not much easier, in fact, than if the integrand had an integrable singularity. Many simulation points would be needed for α close to one, where ⟨E_ES⟩_α is large and rapidly varying, and these simulations would have to be extremely long, since we would have to sample for long enough to allow the centre of mass to wander through the whole simulation volume.

If the particle moves are made serially, it is fairly easy to solve this problem by enforcing the constraint that the centre of mass of the particles should remain at all times at the position of the centre of mass of the lattice sites, preventing the Einstein energy from becoming excessively large. This is done by accompanying every trial displacement Δr of a particular particle by a displacement of all the particles by −Δr/N.
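Parenthetically, criterion 4.4 lends itself to a simple numerical solution for the smallest acceptable k_s: the shell probability integrates to a closed form (the standard radial Gaussian cumulative distribution), and the criterion is monotone in k_s. A sketch follows (SciPy assumed; β, N and R* here are illustrative, and R* would in practice be computed from the lattice geometry at the prevailing density):

```python
from math import erf, exp, pi, sqrt
from scipy.optimize import brentq

def p_inside(ks, beta, R):
    """P(|r - r0| < R) for one Einstein particle: the integral of
    (16/pi)^(1/2) (beta*ks)^(3/2) r^2 exp(-beta*ks*r^2) from 0 to R."""
    a = sqrt(beta * ks) * R
    return erf(a) - (2.0 / sqrt(pi)) * a * exp(-a * a)

def minimal_ks(beta, N, R_star):
    """Smallest ks satisfying [P(|r - r0| < R*)]^N > 0.5 (equation 4.4)."""
    target = 0.5 ** (1.0 / N)        # required single-particle probability
    f = lambda ks: p_inside(ks, beta, R_star) - target
    return brentq(f, 1.0, 1e9)       # P is monotone increasing in ks

# Illustrative numbers only (not the thesis's exact R*):
print(minimal_ks(beta=10.0 / 13.0, N=32, R_star=0.03))
```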
The square-well energy is invariant under the rigid translation just described; the Einstein energy is not, but it does not have to be recalculated by looping over all the particles: it can easily be shown that the rigid translation simply produces a reduction of k_s(Δr)²/N in the total Einstein energy. The Metropolis test can therefore still be carried out on the basis of local interactions only. Small (analytically known) corrections have to be made to the expressions for the free energies of both the Einstein solid and the square-well solid, reflecting the extra constraints on the system. Unfortunately this technique (which is the one used in [71]) cannot be applied to simulations where particle moves are made in parallel: it is not clear how much the centre of mass will move until all the trial moves are accepted or rejected, but the acceptance probability depends on the energy, which itself depends on the motion of the centre of mass.

In fact, all the simulations performed in this subsection were carried out using primitive parallelism only, i.e. with serial updating of particle coordinates within each simulation. Thus the centre of mass could have been kept fixed, and with the benefit of hindsight this would have made both the simulations and the analysis easier to perform. Nevertheless, we used in practice the following method, which is applicable to a system with parallel updating.

For simulations carried out at α = 1 only, we keep the centre of mass fixed, so that ⟨E_ES⟩_α remains quite small. It is permissible to do this here even with parallel updating because E(α = 1) ≡ E_SW, so the Einstein energy does not appear in the Metropolis test controlling the acceptance of particle displacements⁵. For other values of α, we allow centre of mass motion and accept the growth in ⟨E_ES⟩_α. However, rather than attempting to use equation 4.2 directly, we choose a function q(α), described below, which shows the same near-divergence at α = 1 as does ⟨E_ES⟩_α, but which is analytically tractable. Then we split up the integral in 4.2 into two pieces:

    ∫₀¹ [⟨E_SW⟩_α − ⟨E_ES⟩_α] dα = ∫₀¹ [⟨E_SW⟩_α − ⟨E_ES⟩_α + q(α)] dα − ∫₀¹ q(α) dα

⁵ We do this by calculating Δr_CM (by summing Δr over all accepted transitions at the end of every lattice sweep), then adding Δr_CM/N to all the r_i⁰. We move the centre of mass of the Einstein lattice to follow that of the simulation, rather than the other way round, because otherwise it is found that rounding errors in adding −Δr/N to the particle positions can produce spurious hard-core overlaps.

The second integral on the RHS can be done analytically or numerically, while the first is now much better behaved, because the two near-divergences cancel. In particular, we may extrapolate the integrand with confidence from the largest value of α that was tractable to α = 1. Our procedure is therefore to fit a function (we used an interpolating cubic spline) to ⟨E_SW⟩_α − ⟨E_ES⟩_α + q(α), extrapolate to α = 1, and integrate numerically. The function q(α) that we use is just

    q(α) ≡ ⟨E_ES^(N)⟩_{Nk_s, (1−α)β}

where by the notation on the right-hand side we mean the average energy of a single Einstein particle with spring constant Nk_s, evaluated in an ensemble with effective coupling (1 − α)βNk_s.
Thus, q(α) describes to a good approximation the energy due to the motion of the centre of mass of the Einstein lattice, which moves almost exactly like a single particle with spring constant Nk_s (the hard cores in the square-well potential keep the lattice rigid and so keep all the springs very nearly parallel). Written out more fully, we have

    q(α) = 3 ∫_{−L/2}^{L/2} Nk_s x² exp[−βNk_s(1 − α)x²] dx / ∫_{−L/2}^{L/2} exp[−βNk_s(1 − α)x²] dx      (4.5)

         = [3/(2β(1 − α))] (1 − √(c(α)) L exp[−c(α)L²/4] / {√π erf[√(c(α)) L/2]})      (4.6)

where in the last line we have written c(α) ≡ βNk_s(1 − α). There is no analytic closed form for ∫ q(α) dα, but it can easily be integrated numerically to whatever precision is required.

In general ⟨E_ES⟩_α ≠ ⟨E_ES^(N)⟩_{Nk_s, (1−α)β}, though the MC data show that there is approximate equality. However, to be confident that we can extrapolate the integrand to α = 1, we require that the approximate equality should continue to hold even for α very close to one, where we do not have MC data to confirm it. In fact, we expect the approximation to improve as α → 1, and to become exact in the limit α = 1, for the following reasons. First, the probability density of the centre of mass becomes uniform over [−L/2, L/2]³, and so the hard cores have no effect on the expectation value of E_ES^(N). Second, as can be shown by explicit calculation, the expectation of the energy of the single particle with spring constant Nk_s is equal in this limit to the energy of N independent particles each with spring constant k_s. As a consequence, q(1) captures all of the Einstein solid contribution:

    ⟨E_ES⟩_{α=1} = ⟨E_ES^(N)⟩_{Nk_s, (1−α)β} |_{α=1} = Nk_s L²/4

Having shown how the difficulties at α = 1 and α = 0 may be overcome, we shall now go on to present the results.

Results

We examine the points β = 10/13, v = 0.752 and β = 10/13, v = 0.72202, which are chosen on the basis of results given in [69] as being on or very close to the solid-solid coexistence curve. We therefore entirely avoid (for now) the difficult problem of locating the coexistence curve in the first place. We remind the reader that our units are such that E and F are in units of E₀, β is in units of E₀⁻¹ and v is in units of σ³. Thus, the pressure p is in units of E₀σ⁻³ and the spring constant k_s is in units of E₀σ⁻².

We chose to examine the small system N = 32 (i.e. 2³ unit cells) and to use only primitive parallelism with N_r = 4096, so the simulation parameters (defined in appendix E) are NPR3=16, NEDGE=1 and LPV=2. The values of α at which simulations were conducted were chosen simply `by eye': first a few simulations were performed at α = 0.25, 0.50, 0.75 and 0.95, and then other simulation points were chosen on the basis of an assessment of where ⟨E_SW⟩_α − ⟨E_ES⟩_α was changing most rapidly. The largest value of α used with the centre of mass free was α = 0.995; we also performed a simulation at α = 1, but with the centre of mass constrained. For each simulation point, estimates of ⟨E_SW⟩_α − ⟨E_ES⟩_α were output every 500-1000 lattice sweeps, and examination of the behaviour of these block averages was used to determine when convergence had occurred; the `best estimate' of ⟨E_SW⟩_α − ⟨E_ES⟩_α was then obtained by averaging over subsequent blocks. When changing α, the final configuration from a run at a nearby value of α was used as the starting configuration to reduce the equilibration time, though this still became lengthy as α → 1.
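Before turning to the numbers, we note that the correction q(α) of equation 4.6 and its integral really are cheap to evaluate; a minimal sketch follows (SciPy assumed; the parameter values are those quoted below for v = 0.752 but are included for illustration only, and the break points passed to the quadrature simply help it resolve the steep growth near α = 1).

```python
from math import erf, exp, pi, sqrt
from scipy.integrate import quad

def q(alpha, beta, N, ks, L):
    """Centre-of-mass correction q(alpha), equation 4.6."""
    c = beta * N * ks * (1.0 - alpha)            # c(alpha)
    x = sqrt(c) * L / 2.0
    bracket = 1.0 - sqrt(c) * L * exp(-c * L * L / 4.0) / (sqrt(pi) * erf(x))
    return 3.0 / (2.0 * beta * (1.0 - alpha)) * bracket

beta, N, ks = 10.0 / 13.0, 32, 54717.0
L = (N * 0.752) ** (1.0 / 3.0)                   # L^3 = N v

# Stop just short of alpha = 1 (q is finite there but the formula is 0/0);
# the omitted sliver contributes negligibly to the integral.
integral, _ = quad(lambda a: q(a, beta, N, ks, L), 0.05, 1.0 - 1e-9,
                   points=[1.0 - 1e-2, 1.0 - 1e-4, 1.0 - 1e-6], limit=200)
print(integral / N)     # per-particle value of the q integral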
Typically 10⁴ lattice sweeps were performed for each α, of which about 8000 were used for estimation; for α = 0.99 (for v = 0.72202) and α = 0.995, 2×10⁴ sweeps were generated and (1-1.5)×10⁴ used. The simulations achieved a speed of about 7000 sweeps/hour. Altogether, for v = 0.752, 1.42×10⁵ lattice sweeps, representing about 20 hours' processing time, were performed at 13 values of α, and 1.07×10⁵ of them were used in evaluating ⟨E_SW⟩_α − ⟨E_ES⟩_α. For v = 0.72202, 1.55×10⁵ lattice sweeps were performed at 15 values of α, and 1.35×10⁵ of them were used. For both densities, these figures include expanded ensemble simulations of length 5000 sweeps that were used to connect to the state α = 0. We consider in retrospect that more accurate results would probably have been obtained if more values of α had been considered and less time spent on each one, even though this would have increased the fraction of time spent on equilibration.

Let us first consider v = 0.752. The MC results are shown in figure 4.5. The data points show the raw MC results for (1/N)[⟨E_SW⟩_α − ⟨E_ES⟩_α] with their error bars; the solid line is the spline fit to (1/N)[⟨E_SW⟩_α − ⟨E_ES⟩_α + q(α)]. We should note that q(1) = 114006, about 3500 per particle on the scale of the figure, so that (1/N)[⟨E_SW⟩_α − ⟨E_ES⟩_α] would reach about −3500 at α = 1. This makes it strikingly clear that to perform the thermodynamic integration accurately without the analytic correction would be a near impossibility.

Figure 4.5. v = 0.752. Diamonds: (1/N)[⟨E_SW⟩_α − ⟨E_ES⟩_α] measured by MC; solid line: spline fit to (1/N)[⟨E_SW⟩_α − ⟨E_ES⟩_α + q(α)]. The main figure shows the whole range of α, while the inset shows the region around α = 1. We emphasise that the point exactly at α = 1 is generated with the centre of mass constrained to be stationary, while it is free for all the other values of α; see the discussion of this problem in section 4.2.1.

We find

    (1/N) ∫_{0.05}^{1} [⟨E_SW⟩_α − ⟨E_ES⟩_α + q(α)] dα = −6.880(3)
    (1/N) ∫_{0.05}^{1} q(α) dα = 0.916356…

so

    (1/N) ∫_{0.05}^{1} [⟨E_SW⟩_α − ⟨E_ES⟩_α] dα = −7.796(3)

And from the single expanded ensemble simulation used to link α = 0 and α = 0.05, (F_{α=0.05} − F_ES)/N = −0.1049. The Einstein solid's spring constant, generated according to the criterion expressed in equation 4.4, was k_s = 54717, which leads to f_ES = 18.5305. Thus

    f_SW = f_ES + (1/N) ∫₀¹ [⟨E_SW⟩_α − ⟨E_ES⟩_α] dα
         = f_ES + (F_{α=0.05} − F_ES)/N + (1/N) ∫_{0.05}^{1} [⟨E_SW⟩_α − ⟨E_ES⟩_α] dα
         = 18.5305 − 0.1049 − 7.796(3)
         = 10.629(3)

The corresponding results for v = 0.72202 are shown in figure 4.6.

Figure 4.6. v = 0.72202. Diamonds: (1/N)[⟨E_SW⟩_α − ⟨E_ES⟩_α] measured by MC; solid line: spline fit to (1/N)[⟨E_SW⟩_α − ⟨E_ES⟩_α + q(α)]. The main figure shows the whole range of α, while the inset shows the region around α = 1.

We find

    (1/N) ∫_{0.01}^{1} [⟨E_SW⟩_α − ⟨E_ES⟩_α + q(α)] dα = −9.562(4)
    (1/N) ∫_{0.01}^{1} q(α) dα = 1.0469854…

so

    (1/N) ∫_{0.01}^{1} [⟨E_SW⟩_α − ⟨E_ES⟩_α] dα = −10.609(4)

And (F_{α=0.01} − F_ES)/N = −0.07724 from the expanded ensemble simulation.
This time we have k_s = 460170 (it must be stronger in this denser solid to maintain the criterion 4.4, i.e. to prevent hard-core overlaps), leading to f_ES = 22.6829. Thus

    f_SW = f_ES + (F_{α=0.01} − F_ES)/N + (1/N) ∫_{0.01}^{1} [⟨E_SW⟩_α − ⟨E_ES⟩_α] dα
         = 22.6829 − 0.0772 − 10.609(3)
         = 11.997(3)

The errors in both cases were estimated from the size of the error bars on the MC data points. The error is dominated by points near α = 1, so only the effect of these points on the estimate of f was considered.

We have thus obtained estimates of the Helmholtz free energy f for the two densities under consideration; however, this is not sufficient to establish that these are indeed the densities of the coexisting phases at β = 10/13, since the condition for coexistence is of course that the Gibbs free energies of the two phases should be equal, and g differs from f by a pV term⁶ that is still unknown, because the pressure p is unknown. For many systems, p can be estimated from a constant-V simulation as a canonical average ⟨p(V)⟩ [45], but the expression contains the average interparticle force, which for the square-well system is a pair of delta functions and so inaccessible. To see what the magnitude of the pV term is, therefore, it would be necessary to do more simulations for different specific volumes around those already investigated, to establish the shape of f(v) in these two regions. Then the coexistence pressure and the densities of the coexisting phases could be established with the double-tangent construction [47]. We do not attempt this here; but, to demonstrate that good accuracy has been achieved in the integration estimates f_TI of the absolute Helmholtz free energy, we show in figure 4.7 f_TI at v = 0.752 and v = 0.72202, together with f^xc(v) calculated from the results of section 4.3. This latter method does not yield absolute free energies, so the vertical scale has been fixed by constraining the two estimates to be equal at v = 0.752. The desired consistency check is thus obtained by the good agreement at v = 0.72202.

⁶ And 1/N corrections.

Figure 4.7. The absolute Helmholtz free energy f as a function of specific volume v. The dashed curve is obtained by the multicanonical NpT-ensemble of section 4.3, the absolute additive constant being established using the absolute free energy, calculated in this section, for v = 0.752. These absolute free energies f_TI are marked by circles. The solid line is the double tangent to the dashed multicanonical curve.

We also show the double tangent to the multicanonical f(v); the points of tangency estimate the specific volumes of the coexisting phases, and its gradient estimates the coexistence pressure. The results are p_coex = 44.69, ⟨v⟩_dense = 0.72019 and ⟨v⟩_expanded = 0.759032. These are appreciably different from v = 0.72202 and v = 0.752, which were chosen, it will be recalled, because they were the estimates for ⟨v⟩_coex given in [69]. However, we shall find in section 4.3 that these discrepancies in p_coex and ⟨v⟩ can be largely attributed to finite-size effects in this small N = 32 system; in [69] the system size was N = 108.

4.2.2 Expanded Ensemble with Einstein Solid Reference System

It is also possible to use an expanded ensemble approach to measure the free energy difference between the square-well and Einstein solids.
The intermediate systems are defined by a potential energy function that is once again a linear combination of the square-well and Einstein potential energies, just as it was in the TI approach, so that it is possible to make a direct comparison between the two methods. We shall also briefly investigate two other important issues in the use of the expanded ensemble: we shall look at the effect of the number of intermediate states used, and we shall explicitly confirm the result of section 3.4.2 that the subdivision of the intermediate states into groups to be simulated separately does not affect the overall accuracy.

The simulation itself must be adapted to produce the expanded ensemble. First define

    Z(α_i) = Σ_{states} exp[−βE(α_i)]

for some suitable set {α_i}, i = 1…N_m. Now, as we know from section 2.2.3, by allowing transitions between the `subensembles' (according to rules defined below), we may construct an expanded ensemble with partition function

    Z = Σ_{i=1}^{N_m} Z(α_i) exp(η_i)

which, given that α_1 = 0 (so that F(α_1) ≡ F_ES), leads to

    F(α_i) − F_ES = β⁻¹[(η_i − η_1) − ln(P_i/P_1)]      (4.7)

In particular, since α_{N_m} = 1 implies F(α_{N_m}) ≡ F_SW,

    F_SW − F_ES = β⁻¹[(η_{N_m} − η_1) − ln(P_{N_m}/P_1)]      (4.8)

We use the MC simulation to measure the last term, ln(P_{N_m}/P_1), arranging for the set {η_i} to be such that P_i ≈ P_j for all i, j. The methods of chapter 3 could have been adapted⁷ to find a suitable η, but since it is easy to obtain estimates of F(α) from the TI results of section 4.2.1, we chose to use these right from the start rather than to make an independent estimate of η starting from α = 0.

⁷ In the next section we do adapt them to find η^xc(V) for the multicanonical NpT-ensemble.

We use the Metropolis algorithm both to make particle moves within each ensemble and to make transitions between the different ensembles. To calculate the Metropolis function for ensemble-changing moves is computationally extremely cheap, since we do not move the particles while changing ensemble, and so need only multiply E_SW and E_ES by the appropriate values of α and (1 − α) before and after. However, there is some cost in communication on the Connection Machine, since clearly the prevailing value of α must change in all parts of a simulation at once, so all the subvolumes of each simulation must be considered together. There is thus some reshaping and summing of arrays to be done before the move, and broadcasting of the results afterwards, which requires the use of general communication routines. This restriction corresponds to the `rule' discussed in section 3.2.5 that any attempted updates in a single simulation of any coordinates or parameters must be made serially if they change α. However, we do of course make the trial ensemble-changing moves in parallel for all the N_r independent simulations, the prevailing value of α in each simulation being independent of its value in the others.

Before presenting the results, we shall comment again on the end states α = 0 and α = 1. First, we should note that with the expanded ensemble we have no problem reaching the pure Einstein solid at α = 0. Transitions into α = 0 are not in any way special; and, while it is the case that some attempted transitions out of α = 0 will find that the trial energy is infinite, because they come from an Einstein solid configuration in which there would be an overlap of the hard cores of the square-well potential, this simply results in a transition probability of exp(−∞) = 0.
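In concrete terms, an ensemble-changing move needs only the current values of E_SW and E_ES, since E(α) = αE_SW + (1 − α)E_ES. The sketch below (variable names are ours, not those of the thesis code) shows the acceptance test, including the behaviour just described when the trial energy is infinite:

```python
import numpy as np

def try_alpha_move(i, j, E_SW, E_ES, alphas, etas, beta, rng):
    """Metropolis test for a subensemble change alpha_i -> alpha_j at
    fixed particle coordinates.  E(alpha) = alpha*E_SW + (1-alpha)*E_ES,
    so the energy change is (alpha_j - alpha_i)*(E_SW - E_ES)."""
    dE = (alphas[j] - alphas[i]) * (E_SW - E_ES)
    ln_acc = -beta * dE + etas[j] - etas[i]
    # If the configuration has a hard-core overlap (E_SW = +inf) and we
    # try to leave alpha = 0, then dE = +inf, ln_acc = -inf, and the move
    # is rejected with certainty: exp(-inf) = 0, as described above.
    if np.log(rng.random()) < min(0.0, ln_acc):
        return j
    return i
```

Because the configuration is untouched, the cost of such a move is dominated by bookkeeping (and, on the Connection Machine, by the communication described above), not by energy evaluation.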
This situation is thus handled transparently by the algorithm, and we need only ensure that the spring constant k_s is large enough that a reasonable number of configurations with no overlaps are generated. This is in contrast to TI, where the presence of any states with hard-core overlaps in the α = 0 ensemble prevents the evaluation of ⟨E_SW⟩_α − ⟨E_ES⟩_α. As we said before, we used a simple two-state expanded ensemble to handle this in the previous subsection. In that simple case, adequate sampling of both states was obtained without any preweighting, that is to say, with η_1 = η_2 = 0.

Once again, however, the wandering of the centre of mass of the simulation as α → 1 presents difficulties. Though we believe it would be possible to find some way to fix this problem within the expanded ensemble formalism, we have simply avoided it in practice by stopping at α = 0.99 instead of extending the simulations right up to α = 1. Thus, the free energies that we obtain are of interest only in so far as they compare with corresponding measurements made with TI, and in so far as they enable us to investigate further some other questions relating to the expanded ensemble/multicanonical ensemble method itself.

Comparison with Thermodynamic Integration

Using the estimates obtained from TI, we constructed a weighting function η^xc for the system with specific volume v = 0.752. The size of the system and all the simulation parameters are the same as for TI. We used N_m = 20 subensembles (results for other choices of N_m are given below), and chose the α_i so that F(α_i) − F(α_{i+1}) ≈ constant. A total of 5×10⁴ lattice sweeps of the expanded ensemble were performed, with all simulations starting in the α = 0 subensemble; 3.5×10⁴ sweeps were used for estimation of the free energy differences, once it was clear from examination of the visited states that equilibration had occurred⁸. We then estimate f(α_i) from equation 4.7, with P_i determined using simple visited-states estimators. The error bars come from jackknife blocking.

⁸ The lengthy equilibration time, already a problem here, becomes so long in section 4.3 that special techniques must be introduced to overcome it.

To compare the results of TI and the expanded ensemble, we calculate f(α), the free energy at various points along the path (f(α) = F(α)/N, where F(α) is defined in equation 4.3). Graphs of the two estimates of f(α) are indistinguishable, so we instead plot the difference between the two (see figure 4.8). The sizes of the random errors are comparable, with those on the TI points slightly smaller, as one would expect given that rather more time was devoted to it than to the expanded ensemble. It is apparent that, as in section 3.3.2, the thermodynamic integration and expanded ensemble estimates differ by substantially more than the random error in certain parts of the integration range. In the Ising case, the availability of the exact results meant that we could attribute the discrepancy entirely to systematic errors in the TI points due to a phase transition on the path of integration. That is not the case here; lacking exact results, we can only speculate about which result is the more accurate. However, it seems more likely that the fault once again lies with thermodynamic integration, and is again caused by choosing the simulation points too far apart in a region where the integrand is changing rapidly; here that is the region near α = 1. We have already commented that the simulation points perhaps should have been chosen closer together.
Figure 4.8. The difference between estimates of f(α), the free energy per particle along the path transforming the Einstein solid into the square-well solid. f_TI from thermodynamic integration is taken as the reference, so all its points (circles) lie on the horizontal axis; only the error bars are of interest. The other points (triangles) show f_EE − f_TI, where f_EE is the estimate of the free energy from the expanded ensemble.

Nevertheless, no strong conclusion on the relative merits of the two methods can be drawn from these data. As in the previous section, though we have determined F_SW with high accuracy, we cannot say anything about phase coexistence, because we do not know the shape of f(v) or the pressure. To map out f(v), the procedure could be repeated for other specific volumes, or the results of the NpT simulation could be used, as they were before.

4.2.3 Other Issues

Let us return to questions related to the most efficient use of the expanded ensemble method. First, how many values of α is it best to choose? With the Ising model there is a natural `granularity' to the problem: that of the discrete macrostates. Here that is not the case, so we have investigated the effect that changing N_m has on the accuracy of the results. Second, we have used the simulation to confirm the result, derived analytically for a simple Markov process in section 3.4.2, that the accuracy of the estimate of (P_{N_m}/P_1) is not improved by dividing up the N_m states into overlapping subgroups.

    N_m    Δf_EE         r_a      τ_RW
    5      −6.84(4)      0.009    22 000
    10     −6.946(16)    0.21      1 900
    20     −6.957(12)    0.54      2 200

Table 4.1. Estimates of the free energy difference Δf_EE, the average acceptance ratio r_a of α-changing moves, and the random walk time τ_RW (in lattice sweeps) for the square-well solid with various N_m.

First, then, let us consider the effect on the accuracy of the estimate of Δf_EE ≡ f(α = 0.99) − f(α = 0.0) of dividing up the range of α into 20, 10 and 5 parts. The values of α to be used are generated from the TI results, arranged so that F_{i+1} − F_i is constant for all states i (though equation 2.20 for the prediction of the spacing of states, given in section 2.2.3, suggests that this may not have been the best policy). The probabilities of the subensembles are estimated with the visited-states (VS) method. The results are shown in table 4.1. The data for each value of N_m were produced from 6 blocks of 670 `sweeps' each, with two such blocks discarded for equilibration and all the simulations started in the α = 0 state. One sweep here comprises one attempted update of all the particles and one attempted change of the prevailing α for each replica. The value of τ_RW is a rough estimate only, calculated from the fraction of the N_r = 4096 simulations that did in fact make a random walk during the course of one block.

The results do demonstrate clearly that reducing N_m to a very small value (5 here) impairs overall accuracy by reducing the acceptance ratio r_a. They also suggest that there is quite a wide range of N_m that gives an acceptable accuracy; both N_m = 10 and N_m = 20 would be usable. The random walk time is about the same for these two; the larger acceptance ratio of the N_m = 20 simulation compensates for the greater distance to be covered.
We should note that the results obtained here for Δf_EE, when compared with the results of the longer runs that we are about to present, suggest a systematic underestimate of the magnitude of Δf_EE. This is probably attributable to insufficient equilibration time, given that all the replicas are launched from a single state, the α = 0 state. This explanation is reinforced by the particularly poor agreement of the N_m = 5 results, where equilibration over α was the slowest, because of the low acceptance ratio.

Now let us concentrate on the case where a total of 20 states are used in the range 0 < α < 0.99, and investigate the effect of subdividing the range into b subgroups, with each replica simulation restricted to one subgroup, and with one state in common between adjacent subgroups.

    b    N_m^j                        Δf_EE
    1    20                           −6.9715(10)
    2    10, 11                       −6.9680(9)
    4    6, 6, 5, 6                   −6.9717(12)
    9    4, 3, 3, 3, 3, 3, 3, 3, 3    −6.9694(9)

Table 4.2. The behaviour of the expanded ensemble estimator Δf_EE upon subdivision of the range of the expanded ensemble. N_m^j represents the number of subensembles in the j-th subgroup (j = 1…b), and so shows how the subensembles are allocated to the subgroups. The estimates of the error are scaled by the square root of the total number of sweeps performed.

Once again the simulations are performed in blocks, here 5-10 blocks of 670-200 sweeps each, and calculations are performed on each block so that the standard error of the final estimators can be determined. The measured standard errors have been corrected for the variation in the total number of lattice sweeps performed. Probabilities are measured with VS estimators, where the probability P̃_i of a state is estimated from the number of visits to it, C_i; so there is an increasing need to discard early iterations as b decreases, as the time for equilibration over the subensembles lengthens. For the b = 1 case it is necessary to discard the first four of thirteen blocks. Use of transition probability estimators with a uniform initial distribution of replicas over the subensembles (see section 4.3) might have reduced this problem, though equilibration within the high-α ensembles would still have been difficult. The results are shown in table 4.2.

The most significant results are the measured sizes of the error bars on the estimates of Δf_EE, which seem unaffected by the process of dividing the range. Certainly our results allow us confidently to exclude the `fallacious argument' of section 3.4.2, which would suggest that Δf_EE should be estimated with three times the accuracy of a single expanded-ensemble run when the range is divided into nine parts. The data thus provide reasonably strong evidence in support of the argument of that section: that the error of (P_1/P_{N_m}) is independent of b.

4.3 Direct Method: Multicanonical Ensemble with Variable V

We now wish to estimate the phase transition pressure and the volumes of the coexisting phases for the square-well solid, using a direct method that eliminates the need for a reference system. We shall do this by applying the multicanonical ensemble that was introduced in chapter 3. We shall study finite-sized systems by generating and using a flat sampled distribution extending over a wide range of macrostates, which in this case are macrostates of V and must embrace the volumes of the coexisting phases.
This distribution can then be reweighted to obtain estimates of F(β, v) and the canonical probability density P_N^can(p, v) for a range of values of p. From this we can find the coexistence pressure p_coex and canonical averages. Away from coexistence, P_N^can(p, v) consists of a single Gaussian peak, but near p_coex it develops a double-peaked structure, as shown schematically in figure 4.9. Each peak corresponds to one of the coexisting bulk phases. It is obvious that the natural finite-size estimator of p_coex is that for which the two peaks have equal weight, that is to say P_N^can(phase A) = P_N^can(phase B) = 1/2, where P_N^can(phase j) = ∫_{v ∈ phase j} P_N^can(p_coex, v) dv. We shall discuss in section 4.3.6 below the way that this and other finite-size estimators of p_coex approach the infinite-volume limit. We note that, except near the critical point or for very small systems, there is a region of very low canonical probability between the two peaks, which will be enhanced in the multicanonical sampling.

Figure 4.9. Schematic diagram of a typical canonical probability density P^can(v) for the square-well solid at phase coexistence.

We shall present and comment upon the results that are produced in section 4.3.5; however, since there are some important differences between the way that the multicanonical ensemble is generated and used here and how it was used in chapter 3, we shall first devote some time to explaining and justifying the procedure adopted in this chapter. We shall first describe how we have implemented the variable-volume multicanonical ensemble; then we shall explain why it is that, for this system (and this computer), τ_RW, the time required for even a single random walk over all the volume macrostates of the square-well system, becomes prohibitively long. Then we shall describe in detail the procedure for estimating the sampled distribution in spite of this, showing in particular that the transition probability (TP) estimators introduced in section 3.2.3 here outperform visited-states (VS) estimators in all stages of the simulation process. Thus, there are two main parts to this section: in the first we describe the technique we use to produce the results; in the second we concentrate on the physical behaviour of the square-well system.

4.3.1 The Multicanonical NpT-Ensemble and its Implementation

The appropriate Gibbsian ensemble for describing this system is defined by the partition function

    Z_N(p) = ∫₀^∞ dV ∫_V d^N r exp[−β(pV + E(r^N))]

with associated probability densities

    P_N^can(r^N, V) = (1/Z_N(p)) exp[−β(pV + E(r^N))]      (4.9)

and

    P_N^can(V) = (1/Z_N(p)) ∫_V d^N r exp[−β(pV + E(r^N))]

It is quite easy to construct a Monte Carlo scheme for sampling from this distribution (constant-NpT MC); however, as described in section 1.2.3, the barrier of low probability between the phases in this case means that a very lengthy simulation would be needed to estimate the relative probabilities of the two phases, even if they were of roughly the same order of magnitude. Otherwise, the best that can be done is probably to put a wide bracket on p_coex: p_coex is certainly less than a pressure p_h that drives a simulation started in the rare phase into the dense phase, where it is then observed to stay, but it is certainly more than a pressure p_l that allows a simulation started in the dense phase to pass into the rare phase, where it then stays.
Instead, we choose a multicanonical approach with the order parameter V preweighted by η_xc(V), chosen such that the sampled distribution is approximately flat over some range of V, from V_α to V_ω say. The multicanonical p.d.f.s are thus

    P_N^xc(r^N, V) = (1/Z_N^xc) exp[η_xc(V) - βE(r^N)]

and

    P_N^xc(V) = (1/Z_N^xc) ∫_V d^N r exp[η_xc(V) - βE(r^N)]

and we recover the canonical probability using

    P_N^can(p, V) ∝ exp[-βpV - η_xc(V)] P_N^xc(V)

which we normalise using

    ∫_{V_α}^{V_ω} P_N^can(p, V) dV ≈ ∫_0^∞ P_N^can(p, V) dV = 1

This gives a good estimate of P^can for any p, provided that p, V_α and V_ω are such that equation 4.13 is true (i.e., provided that the canonical p.d.f. has effectively all its weight in the multicanonical sampling range). Thus it is not necessary to know p_coex a priori, only to have a rough idea of the volumes of the coexisting phases so that V_α and V_ω may be chosen to bracket them [9].

[9] Even if we do not define this interval correctly, the method is robust; we either find a single-peaked P^can(p_coex, V) straddling our estimate of the location of the phase boundary, or a P^can(p_coex, V) that increases up to V_1 or V_Nm and then cuts off. In either case the necessity of widening the interval is clear, and, at least in the second case, it is clear how this should be done.

Computational Details

We start from the `core' CM program described in appendix E. We mention at this point the choice of the maximum displacement Δx of particle moves in the configurational updates. The same Δx was used for all the volumes and it was varied only a little between simulations done at different temperatures; its value was Δx = 0.006σ (about half the width of the potential well). With this choice, the acceptance probability varied between about 0.25 in the states with lowest V and about 0.9 in those with highest V. However, to implement the multicanonical NpT ensemble we need to make volume-changing moves as well as coordinate updates. The volume changes are realised by making uniform contractions or dilations of the box that leave the relative positions of the particles unchanged. To avoid the necessity of updating the particles' position coordinates whenever a volume change is accepted, we work with scaled coordinates s = r/L, where L^3 = V. As a consequence, the potential energy function becomes a function of V: E'(V, s^N) = Σ_{i, j<i} E'(V, s_ij), where

    E'(V, s_ij) = ∞     if s_ij ≤ σ/L
                = -E_0  if σ/L < s_ij < (1+λ)σ/L
                = 0     if s_ij ≥ (1+λ)σ/L

i.e. there is a volume-dependent effective hard-core diameter σ' = σ/L. Particle moves are made (in parallel within the same simulation, where possible) using the usual Metropolis rule, which is here

    P(s^N -> s'^N) = min(1, exp[-β(E'(V, s'^N) - E'(V, s^N))])    (4.10)

The volume changes are made by discretising the range of V into a discrete [10] set {V_i}, i = 1 ... N_m. The Metropolis rule is used in the modified form

    P(V_i -> V_j) = min(1, exp[N ln(V_j/V_i) + η_xc(V_j) - η_xc(V_i) - β(E'(V_j, s^N) - E'(V_i, s^N))])    (4.11)

where the (scaled) coordinates are left unchanged and j is restricted to be i+1 or i-1, chosen with equal probability (trial moves that would take us outside the chosen range of V are immediately rejected). The N ln(V_j/V_i) term reflects the Jacobian of the transformation from r to s coordinates in the partition function Z_N^xc.
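To make these update rules concrete, here is a minimal serial sketch (ours, not the CM program of appendix E; periodic images are omitted for brevity, and the parameter names are illustrative) of the scaled-coordinate square-well energy and the volume-move acceptance of equation 4.11:

    import math
    import random

    def squarewell_energy(s, V, sigma=1.0, lam=0.01, E0=1.0):
        """E'(V, s^N) of the text: pair energies in scaled coordinates
        s = r/L with L**3 = V, so the effective hard core is sigma/L."""
        L = V ** (1.0 / 3.0)
        E = 0.0
        for i in range(len(s)):
            for j in range(i):
                sij = math.dist(s[i], s[j])    # no periodic images, for brevity
                if sij <= sigma / L:
                    return math.inf            # hard-core overlap
                if sij < (1.0 + lam) * sigma / L:
                    E -= E0                    # pair inside the attractive well
        return E

    def volume_move(s, i, V, eta, beta, N):
        """One attempted V_i -> V_{i+1} or V_{i-1} transition (equation 4.11);
        the scaled coordinates s are left unchanged whatever the outcome."""
        j = i + random.choice((-1, 1))
        if j < 0 or j >= len(V):
            return i                           # outside the chosen range: reject
        dlnJ = N * math.log(V[j] / V[i])       # Jacobian of the r -> s change
        dE = squarewell_energy(s, V[j]) - squarewell_energy(s, V[i])
        if math.log(random.random()) < dlnJ + eta[j] - eta[i] - beta * dE:
            return j
        return i

A contraction that produces an overlap gives dE = infinity and is rejected automatically, which is the mechanism behind the low acceptance ratios discussed in section 4.3.2 below.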
Even though the configuration does not change, we must still recalculate all the particle-particle interactions, since the effective hard-core diameter σ' does alter, and then sum E'(V, s^N) over all the subvolumes of each simulation. This requires general communication on the CM, and so is quite an expensive process. Our procedure is to perform `sweeps' consisting (usually) of one attempted update of the positions of all the particles and one attempted volume change. One `iteration' then consists of N_s sweeps, after which we update η. We shall discuss the effect of the relative frequency of coordinate updates and volume changes in section 4.3.3 below.

[10] We do not believe that it is essential to discretise V; it could be left continuous and the transitions grouped into a histogram. However, the TP method would then encounter the same difficulties that we found in section 3.2.3, coming from imperfect equilibration within each V-state; in particular, underestimation of the eigenvector of the sampled distribution would result, though as we shall see, this problem may arise anyway.

The ensemble produced by this update procedure has partition function

    Z^_N^xc = Σ_{i=1}^{N_m} V_i^N exp(η_xc(V_i)) ∫_0^1 d^N s exp[-βE'(V_i, s^N)]

and we measure

    P^_N^xc(V_i) = (1/Z^_N^xc) V_i^N exp(η_xc(V_i)) ∫_0^1 d^N s exp[-βE'(V_i, s^N)]

and then reconstruct the canonical ensemble by

    ∫_{V_q}^{V_r} P_N^can(p, V) dV ∝ Σ'_i ΔV_i exp[-βpV_i - η_xc(V_i)] P^_N^xc(V_i)    (4.12)

where ΔV_i = V_{i+1} - V_i and the prime signifies that the sum is restricted to the range V_q < V_i < V_r. As before, normalisation follows from

    Σ_{i=1}^{N_m} ΔV_i exp[-βpV_i - η_xc(V_i)] P^_N^xc(V_i) ≈ ∫_{V_α}^{V_ω} P_N^can(p, V) dV ≈ 1    (4.13)
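Equations 4.12 and 4.13 amount to a reweighting of the measured multicanonical histogram. A sketch (ours; the function and argument names are our own) that works in logarithms, since the weights span on the order of a hundred decades:

    import numpy as np

    def canonical_pdf(V, dV, eta, P_xc, p, beta):
        """Reweight a measured multicanonical distribution P_xc(V_i) to the
        canonical NpT distribution at pressure p (equations 4.12 and 4.13)."""
        logw = np.log(P_xc) - beta * p * V - eta   # log of unnormalised weights
        logw -= logw.max()                         # guard against under/overflow
        w = dV * np.exp(logw)
        return w / w.sum()                         # normalised so sum_i P_i = 1

Here eta holds η_xc(V_i) and P_xc could be a TP eigenvector estimate; the subtraction of the maximum is a standard log-sum-exp guard rather than anything specific to this problem.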
4.3.2 The Pathological Nature of the Square-Well System

We have not yet said how the set {V_i} is to be chosen; as in an expanded ensemble calculation, there is no `natural granularity' to the macrostate space, so we should choose the number and spacing of the states to give the best performance. As we outlined in section 4.2.3, there is a trade-off between r_a, the acceptance ratio of volume-changing moves, and the number of (accepted) steps required to cross the macrostate space, and near-optimality is obtained for a fairly wide range of possible choices, as long as the acceptance ratio is kept fairly high (around 0.5). A suitable way of choosing the states of an expanded ensemble simulation is described in section 2.2.3. However, the square-well system presents unusual difficulties in this regard: because of the hard core in the potential, any trial volume change that produces even a single hard-core overlap will be rejected, which means that only small volume changes have a reasonable chance of being accepted. Thus we find that N_m must increase rapidly with N, and must be made quite large even for a small system, to avoid a very low acceptance rate. Moreover, the problem is more acute at small specific volume, where the particles are more closely crowded together, so the spacing of the states should vary with V to achieve a roughly constant acceptance ratio. In practice, for N = 32 we experimented to find spacings that gave an acceptance ratio of about 0.1-0.5. We used only two values of ΔV, the smaller at low volumes and the larger (twice as big) at high volumes. For N = 108 and N = 256 we used a rough ansatz for the dependence of r_a on V and ΔV to generate a suitable set {V_i}, with N_m determined by the desired starting and finishing volumes.

This method turned out to be quite effective in keeping r_a constant; we find that for the N = 108 system at β = 1.0 it varies between 0.3 and 0.7, while for the N = 256 system at the same temperature it varies between 0.3 and 0.6 (it is higher at the low-volume end, so we are slightly overcompensating for the increased difficulty of accepting volume transitions there). For comparison, r_a varies between 0.4 and 0.05 for the N = 32 system with its two values of ΔV. More important is the scaling of N_m with N. The procedure just described, which as we have said is found to keep r_a roughly constant, produces N_m = 281, 1292, 2949 for β = 1.0 and N = 32, 108, 256, so that it seems that N_m ~ N (which can also be predicted by approximate scaling arguments). In a normal expanded ensemble calculation, by contrast, we would expect that N_m would not need to be so large even for the smallest N, and that to keep r_a constant would require only N_m ~ N^(1/2), since the scaling of the macrostate space to be covered (~ N) would be partially cancelled by the ~ N^(1/2) scaling of the size of typical fluctuations (see section 2.2.3 and [124, 125]). It is this necessity of using large numbers of volume macrostates, combined with the rather slow speed per replica simulation of the Connection Machine, that produces the very long random walk time τ_rw. As we shall show in section 4.3.4, this forces us to modify the way that the multicanonical ensemble is used, extending the use of the TP estimators to all stages of the simulation process. We shall now go on to explain how an approximately multicanonical distribution is generated (section 4.3.3), and then (section 4.3.4) how it is used to produce the desired final estimators of canonical averages and quantities related to the phase transition. The procedure is, once again, an iterative one, with the iterations numbered by n as in chapter 3.

4.3.3 Finding the Preweighting Function

Here we shall deal with the `finding-η' stage of the multicanonical distribution, where η^n converges uniformly towards its `ideal' value η_xc. In section 3.2.3 we have already established the utility of using TP estimators in this stage of the process, so we shall not justify their use further. However, we shall find that the way that coordinate updates and volume changes are separated in the square-well system enables us to gain useful further insight into why the convergence process takes place as it does.

Let us show how the finding process works by considering it in action for an N = 32 system. We start with all the particles on their lattice sites, and for each simulation choose the volume macrostate index i with uniform probability in 1 ... N_m. (Thus we start with the N_r replica simulations distributed fairly uniformly through macrostate space, rather than launching them all from only a few macrostates, as in section 3.2.3.) We then perform about 1000 equilibration sweeps through the lattice, updating the particles' positions but not yet attempting the volume-changing moves. The purpose of this is to allow the Markov chains to reach equilibrium at constant volume, so that P(s^N|v), analogous to the P(.|i) of section 3.2.3, has its canonical form. Because the random walk time is so long for this system, it is much quicker and easier to establish equilibrium this way than it would be by allowing the replica simulations to spread out over all these volume states from only a few release macrostates.
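A serial sketch of this start-up (ours; it reuses the hypothetical squarewell_energy of the section 4.3.1 sketch, and the displacement and sweep counts are illustrative):

    import math
    import random

    def particle_sweep(s, V, beta, dx=0.002):
        """One Metropolis sweep of single-particle moves (equation 4.10),
        with naive full-energy recomputation for clarity rather than speed."""
        for k in range(len(s)):
            old = s[k]
            E_old = squarewell_energy(s, V)
            s[k] = [c + random.uniform(-dx, dx) for c in old]
            E_new = squarewell_energy(s, V)
            if E_new > E_old and random.random() >= math.exp(-beta * (E_new - E_old)):
                s[k] = old                       # reject: restore old position

    def init_replicas(N_r, N_m, lattice_sites, V, beta, equil_sweeps=1000):
        """Uniform start over the volume macrostates, then constant-volume
        relaxation so that P(s^N | v) approaches its canonical form before
        any volume-changing moves are attempted."""
        reps = []
        for _ in range(N_r):
            i = random.randrange(N_m)            # uniform macrostate index
            s = [list(site) for site in lattice_sites]
            for _ in range(equil_sweeps):
                particle_sweep(s, V[i], beta)
            reps.append((i, s))
        return reps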
Then we allow volume changes as well, and gather histograms of volume transitions over short iterations with N_s = 250. This time is much shorter than τ_rw, so each replica will explore only its local macrostates; however, we can find estimators for the whole macrostate space by pooling all the transitions of all N_r replicas into a histogram C_ij at the end of each iteration (C_ij^n at the end of the nth iteration). Then we use C_ij^n to estimate the TP matrix σ_ij^n (using equation 3.25), and update η^n using the simple scheme

    η^(n+1) = η^n - ln P~^n + constant    (4.14)

where the estimator P~^n is the eigenvector of the estimated transition matrix σ~_ij^n. This is the same scheme that was used in section 3.2.3. In so far as P~^n is a good estimator, we would expect η^(n+1) to be multicanonical. In fact, as we shall see, η^n tends to converge to a limit over the course of several iterations, just as in section 3.2.3. As always, the accuracy of the TP estimators relies on equation 3.26 being satisfied, which means here that we must be able to achieve equilibration at constant volume, even for those volumes that lie between the two equilibrium phases, in the initial constant-volume equilibration phase, and then we must be able to re-establish it by coordinate updates after each accepted volume change. We shall present strong numerical evidence below to show that this approximation works well while converging to η_xc, and shall show that it is essentially exact for the multicanonical distribution. However, it does mean that our method would not be applicable to a system like a spin glass, where equilibration is very difficult except in a particular set of macrostates. To simulate a spin glass that required the same computational effort per update as the square-well system would probably require a substantially more powerful computer than the CM.

We now comment briefly on the choice of N_s. It is invariably the case that the initial P^can(V), and the early (small-n) iterates P^n(V), have almost no weight in the low-volume states, so that, notwithstanding the slow evolution of V, these states tend to become unoccupied as the simulation proceeds. This implies that η^n should be updated rapidly on the basis of relatively few lattice sweeps per simulation, so that the simulations do not have time to move far, and it is not necessary to re-initialise and re-equilibrate them to access the low-volume states once again [11]. There is clearly a trade-off to be made between this and the requirement that enough transitions should be recorded for C_ij to estimate σ_ij accurately (an effective `fullness criterion', cf. N_TP in section 3.2.3). We shall return to this matter, and to the effect of the amount of configurational updating performed between V-transitions, a little later.

[11] While it is true that simulations that had left the lowest states would eventually occupy them again, once a multicanonical distribution over that part of macrostate space had been established, the time to re-occupy these states (under a random walk) is much longer than the time to leave them (under a directed walk).
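In code, the update of equation 4.14 is compact. A sketch (ours; the row normalisation below stands in for the role played by equation 3.25, and a detailed-balance solve would be a more robust choice for a strictly one-dimensional chain):

    import numpy as np

    def estimate_sigma(C):
        """Row-normalise pooled transition counts C[i, j] into an estimated
        transition matrix."""
        rows = C.sum(axis=1, keepdims=True)
        return np.divide(C, rows, out=np.zeros(C.shape), where=rows > 0)

    def stationary(sigma):
        """P~: eigenvector of the estimated transition matrix belonging to
        the largest eigenvalue (equal to 1 for a proper stochastic matrix)."""
        w, vecs = np.linalg.eig(sigma.T)
        P = np.abs(np.real(vecs[:, np.argmax(np.real(w))]))
        return P / P.sum()

    def update_eta(eta, C):
        """eta^{n+1} = eta^n - ln P~^n + constant (equation 4.14)."""
        P = stationary(estimate_sigma(C))
        new = eta - np.log(np.maximum(P, 1e-300))  # guard states with no visits
        return new - new.min()                     # the additive constant is free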
In figure 4.10 we show the results of the `finding-η' process for the N = 32 system with β = 10/11, N_m = 237, N_r = 4096 and N_s = 250 (the results are shown as a function of v = V/N; this will be our policy from now on, as it allows results from different system sizes to be more easily compared). It is apparent that η, shown in the large upper figure and its inset, converges to η_xc (covering 36 decades of probability) within 8 or 9 iterations, which require only about 25 minutes to perform on this small system. The lower figures show the VS histograms; they are not used for updating but are shown to give an indication of the distribution of the positions of the simulations at each stage of the iterative procedure.

An initial tendency to `equilibrate' by moving to high-volume states, which have high equilibrium probability for η = 0, is quickly reversed (though the lowest-volume states do briefly become unoccupied in the 2nd iteration). We emphasise that the histograms do not reflect in any real sense the underlying sampled distribution; the simulations do not move far enough (an average of only 1/30th of the width of the macrostate space) during each iteration for the effect of the starting state to disappear. Thus, here, where the starting states are uniformly spread out, the histograms give the impression of a sampled distribution that is more uniform than is really the case; wherever in the macrostate space the simulations were initially clustered, the histograms would seem to indicate that that region was the most probable, irrespective of its real equilibrium probability.

To show the inadequacy of VS estimators (see section 3.2.1) at this stage of the process, we show in figure 4.11 the η^2 generated by the TP method above, together with two VS estimators of it. One (`Visited States (i)') shows the effect of trying to use the VS histogram {C_i}^1 from the same iteration, where the replicas were originally spread uniformly. The other (`Visited States (ii)') derives from a different simulation, but of the same length, where the simulations are started more or less with their equilibrium canonical distribution over the volumes (so that they can reach it in the same equilibration time as was given to the TP simulation [12]). This estimator is thus the best that one could expect to do by visited states (without extrapolation). It is apparent that the TP estimator gives by far the best estimate of the true shape of η.

Most of the larger systems (N = 108 and N = 256) are treated in practice by first using simple L^d FSS (see section 3.2.4) of η_xc from a smaller simulation at the same temperature to give an initial estimate of η, which is then refined with the transition method as before. The FSS estimator is normally found to be very close to η_xc (with the discrepancy getting larger as β decreases), so only one or two iterations are required before only random fluctuations in η are observed, signalling the end of the `finding-η' stage, and it is possible to move to the `production' stage. However, to demonstrate that FSS is not essential, we also show the process of finding η_xc from a zero start for N = 108 (with N_m = 1292, β = 10/11, N_r = 4096 and N_s = 400) in figure 4.12.

[12] The equilibration time from a start where the simulations were all in the state of highest specific volume would be more than ten times as long.

Figure 4.10. Top figure: the convergence of the preweighting η for β = 10/11 with N = 32. We show η^2 to η^8 (η^1 = 0), with N_s = 250.
We also show the preweighting function η^13, produced after two more iterations of 250 sweeps and three of 2500. In the inset, a detail of the low-volume end is shown. The figures below show the histograms of visited states {C_i^n} (pooled for all 4096 simulations) for iterations 1 (top left), 2 (top right), 5 (bottom left) and 13 (bottom right). It is apparent that the initial tendency to move out of the low-volume states, which have low equilibrium probability for η = 0, is reversed by iteration 5, and the distribution of simulations through the macrostate space is once again approximately uniform for the longer iterations like iteration 13.

Figure 4.11. Estimators of η^2 evaluated using the TP estimator and two VS estimators, with N_s = 250 for all. The first VS estimator is produced by using the histogram C^1 from the same iteration that gave the TP estimator; the second derives from a simulation where the replicas are initialised with their equilibrium canonical distribution.

Figure 4.12. The convergence of the preweighting η for β = 10/11 with N = 108, N_s = 400 and N_r = 4096. We show η^2 to η^5 (η^1 = 0). We also show the eventual final preweighting function η_xc, produced after 13 iterations of 2000 sweeps but starting from a FSS estimator. In the inset, a detail of the low-volume end is shown. Visited-states histograms are not shown, but are similar to those in figure 4.10.

Once again the process of approaching to within a few percent of the multicanonical distribution is quite rapid; to generate the estimators in figure 4.12 requires only 4 hours of processing time, which is about 30% of the time spent on the production stage. We have not made a detailed study of the effect of the length of each iteration on the speed and stability of the algorithm during the `finding' stage. However, the following approximate argument may be used (and was used in the generation of the data in figure 4.12) to predict a value of N_s that is found empirically to be more than adequate for stability, while still maintaining a distribution of the simulations that is at all times fairly uniform over {V_i}. Let the average number of visits to each macrostate per iteration (summed over all the replicas) be N_v. Approximate the transition matrix by σ_{i,i+1} = a for all i and σ_{i,i-1} = c for all i. Let R = P_Nm/P_1. Then R ≈ (a/c)^Nm, so, assuming the transitions (and the resulting estimates of σ_ij) are all independent and using simple error propagation, we find

    ΔR/R ≈ N_m Δ(a/c)/(a/c)

Δ(a/c)/(a/c) is controlled by N_v and (largely) by the smaller of a and c; taking this to be c, we have

    Δ(a/c)/(a/c) ≈ 1/sqrt(c N_v)

Now N_v ≈ N_s N_r / N_m, and for stability we may demand ΔR/R = O(1), which leads to

    N_s ≈ N_m^2 / (c N_r)    (4.15)

In fact, the results for various test runs imply that the algorithm is robust down to a value of N_s rather smaller even than this; the value of N_s = 400 used for N = 108 above was arrived at by using equation 4.15 with c = 1, and the algorithm still converged to η_xc, though with much more noise, with N_s = 150 and N_r = 1000. Certainly the 250 sweeps allowed for N = 32 was far more than required.
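As a check on the arithmetic, equation 4.15 with c = 1 does reproduce the N_s used for the N = 108 run (a sketch; ours):

    def stability_sweeps(N_m, N_r, c=1.0):
        """Rough sweeps per iteration needed for Delta R / R = O(1), eq. 4.15."""
        return N_m ** 2 / (c * N_r)

    print(stability_sweeps(N_m=1292, N_r=4096))   # ~408, cf. the N_s = 400 used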
Equation 4.15 has certain intuitively appealing properties; one would expect that for a single serial computation, the length of time required to estimate R to a minimal accuracy would be approximately equal to τ_rw, and so increase as N_m^2; the result 4.15 then shows that, on a parallel computer, this time is reduced by a factor of N_r, the number of replicas run in parallel [13]. We should note that there is no contradiction between this result and the results of section 3.4.2, which apply to a slightly different system. There, we were considering a single random walker, and the necessary assumption was made initially that the total run-time N_s was much greater than τ_rw ~ N_m^2.

[13] It also implies that the accuracy in R obtained is independent of whether or not the range of macrostates is subdivided.

The most striking difference between figures 4.10 and 4.12 is that the N = 108 simulation actually converges faster than the N = 32, despite having the smaller N_s. As we shall confirm below, this is in fact the result of better equilibration between volume-changing moves; the rate of convergence does not depend on N_s to any measurable extent. As in section 3.2.3, the TP method gives an accurate estimator of P^n(v) to the extent that equation 3.26 is true, i.e. to the extent that P^n(s^N|v) has and maintains its canonical value. Since the volume changes preserve the configuration, and P^n(s^N|v) varies with v, this must depend on the amount of configurational updating done per sweep. In the N = 108 simulation, all the particles' coordinates were updated between attempted volume-changing moves, but, because of the way that the N = 32 simulation was mapped onto the CM, this was in fact not the case there. In the limit of perfect equilibration, σ^n_ij/σ^n_ji would estimate P^n_j/P^n_i exactly whatever the sampled distribution, so immediate convergence would be observed; that is to say, η^2 would already be multicanonical (apart from the effect of random fluctuations). The underestimate of the difference between the present sampled distribution and the multicanonical distribution is the result of incomplete equilibration, just as it was for the Ising energy in section 3.2.3.

There are two issues arising from this that must be checked. First, and more importantly for the final accuracy of the results, it is necessary to establish whether the same η_xc is the fixed point of the iterative process (apart from random fluctuations), whether or not equilibration is good. To check this, we have run simulations each starting with the eventual limiting η_xc obtained for N = 32, β = 10/11, with imperfect equilibration between volume changes. All the simulations perform the same number of volume updates, but differ in the number N_eq of coordinate updates (of all the particles) that they attempt between them. Several blocks of data are generated for each N_eq (and without updating η_xc), so that error bars can be obtained to see if there is any evidence that η_xc(N_eq) alters systematically away from η_xc. It is found that there is no discernible N_eq-dependent change in η_xc even when several volume changes are made for every coordinate update. Thus we expect that, whatever N_eq, the algorithm will still eventually converge to the correct multicanonical limit, if it converges at all. We shall comment again on the special status of the multicanonical distribution a little later.
Having reassured ourselves that the multicanonical limit is correctly found, we now show in figure 4.13 the effect of varying N_eq on the early stages of convergence of η^n (for the same N = 32 system, again with the same number of volume updates), to confirm that this is indeed the cause of the different rates of convergence in figures 4.10 and 4.12. The iterative process is started from η = 0, and 4 iterations are performed, with η updated after each iteration this time; all the simulations had N_s = 50, but N_eq varies between 0 and 8. This time increasing the amount of equilibration performed during each iteration does have a clear effect: it increases the speed of convergence to the multicanonical limit. This occurs because the extent to which equation 3.26 is satisfied controls the extent to which equation 3.28 provides a good estimate of P^n(v). That equation 3.26 is not satisfied immediately is reflected in the fact that convergence to η_xc is not immediate. However, this does not explain why it is that the eigenvector of σ^n continually underestimates the change required to reach η_xc. To understand this, it is necessary to consider what is occurring physically in the simulations. Initially, they clearly tend to drift to higher-volume states. Moves to higher volumes can be made freely, since the configuration is preserved, so there can be no hard-core overlaps and there is only the energy cost to consider, while the reverse moves to lower volume are strongly suppressed by the likelihood of a hard-core overlap. However, as N_eq is reduced, the moves to higher volume are largely unaffected, while the moves to lower volume become more likely. This occurs because they are likely to be simply reversals of a move that came from the lower-volume state on the previous sweep; to the extent that equilibration is imperfect, the configuration is preserved from that sweep, and so is less likely to contain a hard-core overlap than one that truly reflects P^n(s^N|v). Thus, the ratio σ~^n_ij/σ~^n_ji is nearer to unity than the true σ^n_ij/σ^n_ji, the magnitude of the eigenvector is underestimated, and so is η^(n+1).

Figure 4.13. Convergence of η^n for n = 2 ... 5 (η^1 = 0), N = 32, N_s = 50, N_r = 4096, with one panel for each of N_eq = 0, 1/2, 1, 2, 4, 8. Also shown is a suitable η_xc, which in fact is η^13 from figure 4.10.

Though we have not studied in detail the behaviour of the size of the systematic underestimate as a function of the `distance to the multicanonical limit', the results of figure 4.13 (and figures 4.10 and 4.12) seem to indicate empirically that the fractional extent of the underestimate remains about the same at each iteration (or perhaps decreases slightly, as the more diffusive movement of the simulations through macrostate space gives greater time for equilibration within a set of macrostates).
However, this constant fractional error in the eigenvector corresponds to a decreasing absolute error, and thus a geometric convergence towards η_xc, whatever N_eq may be (we have already shown that simulations conducted in what we believe to be the multicanonical limit, but with various N_eq, do not drift away from it, implying that the same η_xc is the limit of the generation process for all N_eq). Once we have arrived at a situation where further iterations produce only fluctuations in η^n around η_xc, we move to the production stage.

Although it applies rather more to the next section (section 4.3.4) than to this one, another particularly important result that becomes apparent from this investigation of the effect of N_eq is why it is important to generate and use the multicanonical distribution in particular. It might at first seem that, because TP estimators can generate an estimate of a sampled distribution that varies over many orders of magnitude, multicanonical sampling is not necessary: any sampled distribution would appear to be adequate, even the original canonical distribution. It seems that we require only that adjacent macrostates are similar enough in equilibrium probability for transitions in both directions between them to occur. The investigation of the effect of N_eq shows why this is not adequate: the fact that the η^(n+1) generated at the end of any stage n is not generally equal to η_xc shows that the estimator P~^n is not generally equal to the real underlying P^n, the difference being due to incomplete equilibration at constant volume. Thus any estimates of P^can or canonical averages made on the basis of P~^n would be heavily biased, unless N_eq were extremely large. It is only in the multicanonical limit, where P~^n ≈ P^n ≈ constant, that P~^n becomes more or less independent of N_eq, and the arrival in this limit is signalled by the convergence of η^n. Thus we must reach this limit to be able to reconstruct P^can accurately.

Why should it be that the multicanonical sampled distribution alone does not depend on N_eq? To understand this it is necessary to return to the equation (cf. equation 1.21)

    P_s(t=1) = Σ_r P_r(t=0) σ^n_rs

that describes the evolution of the p.d.f. of the microstates r and s (which here are the joint set of coordinates and volumes, {s^N, v}). As we know from section 1.2.2, this converges for large t to the equilibrium distribution satisfying P^n_s = Σ_r P^n_r σ^n_rs. We have in fact two update matrices σ^n, one (equation 4.10) describing coordinate updates and the other (equation 4.11) v-updates. They both preserve P^n_s once it is established. Now, to be sampling from P^n_s implies both that local equilibrium P^n(s|i), i.e. P^n(s^N|v), should be established and that the distribution over the macrostates P^n_i should have its equilibrium value. In early iterations, the first condition is satisfied at t = 0 (because before starting we relax all the replicas with coordinate updates) but the second is not, because the replicas are distributed uniformly over macrostate space while the equilibrium P^n_i is far from uniform. Thus, when we update with a v-transition, P_s(t=1) ≠ P_s(t=0) ≠ P^n_s, and P(s|i, t=1) loses its equilibrium value. It must be relaxed again with coordinate updates if equation 3.26 is to be satisfied. However, in the multicanonical limit the equilibrium P^n_i is itself uniform, so that P_i(t=0) = P^n_i, because we chose the distribution of the replicas to be uniform.
The result, given that P(s|i, t=0) = P^n(s|i), is that the underlying Markov process is in equilibrium right from the start (P_r(t=0) = P^n_r) and so stays in equilibrium at all later times, even under the action of v-updates alone; N_eq is irrelevant. The special status of multicanonical sampling, then, is due to the fact that the equilibrium P^n_i accurately reflects the distribution of the replicas [14].

[14] If we chose some other (non-uniform) distribution of replicas, then of course the `special' sampled distribution would be the one that reflected that distribution.

It should be noted that the investigation of N_eq that has just been carried out is only possible because the configurational updates and the volume changes are completely separate here, and the one can be performed without affecting the other. In the Ising case, though similar effects are clearly present (see section 3.2.3), the configurational updates (spin flips) are also the means by which the macrostate is changed, so an investigation which relies on disentangling the two would be much harder. Finally, we remark that it is not clear from the results of figure 4.13 what the best (computationally cheapest) strategy is in practice in the `finding-η' stage. Increasing N_eq decreases the number of iterations required, but of course the total computer time per iteration increases linearly with N_eq (from 2.5 mins/iteration for N_eq = 1 to 10 mins/iteration for N_eq = 8). The best strategy is probably intermediate between these two, though we do not expect any improvement to be very great, and we have not investigated it in detail.

4.3.4 The Production Stage

As we have seen, imperfect equilibration at constant volume leads to an initial stage of simulation during which η^n is not close to η_xc, but does converge to it (η^(n+1) - η^n is uniformly positive). The corollary of this is that the arrival of η^n close to η_xc is signalled by the move to a situation where η^n undergoes only random fluctuations between iterations. We have also seen that it is necessary to reach this regime before P~^n is a trustworthy estimator of P^n. At this point we move to the production stage.

In the applications of the multicanonical method in chapter 3, it was found best to use VS estimators at this stage, whatever method had been used to establish η_xc: with TP estimators, there was a reduction in speed and, more importantly, a systematic bias (at least for energy preweighting). Therefore, to motivate and justify our use of TP estimators for the square-well system, we shall show analytically, and confirm by direct simulation, that the VS method is unusable because of the extremely long random walk time. We shall also show that any bias is very small here on the scale of the random error [15].

The VS estimator only provides a good estimator of the probability if the run-time of each replica simulation is much greater than the ergodic time (at least equal to the random walk time in this case), so that the replica simulations have `forgotten' all information about their starting states, which would otherwise heavily bias the result (see the discussion in section 3.2.5). Let us see what this would entail for the simulation with N = 108, for which it is found that N_m = 1292 and the average acceptance ratio of volume transitions is about 1/2. Simple random walk arguments imply that about 2 x 10^6 volume updates, i.e. sweeps, would be required for a single simulation to traverse the whole range of macrostates.
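The arithmetic behind this estimate, under the usual diffusive-scaling assumption, is easily reproduced (a sketch; ours, and the quoted 2 x 10^6 is the same order of magnitude):

    def crossing_sweeps(N_m, r_a):
        """~N_m**2 accepted volume moves to diffuse across N_m macrostates,
        i.e. ~N_m**2 / r_a attempted moves (one attempt per sweep)."""
        return N_m ** 2 / r_a

    print(f"{crossing_sweeps(1292, 0.5):.1e} attempted sweeps")   # ~3.3e6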
By contrast, the highest speed we can achieve on the CM is 6000 sweeps per simulation per hour (for N = 108, N_r = 64), implying the need for over 300 hours even for one random walk per simulation [16]. In fact, about 6000 sweeps are needed for a simulation to perform a directed walk from one end of the macrostate space to the other in the η = 0 case, when these states differ in probability by more than 100 orders of magnitude. Thus we see that, using visited-states estimators, it is impossible to take advantage of the full power of the parallel computer to run many replicas of the simulation, since the initial distribution of the simulations over the macrostate space will persist throughout the simulation.

[15] The extra bookkeeping of the transition probability method is negligible for the square-well system.

[16] We should note that this is far from the greatest value of N_c = N_s N_r that can be obtained; with N_r = 64, N_c = 374000 sweeps/hour, while with N_r = 4096, 500 sweeps/simulation/hour can be attained, so N_c = 2.5 x 10^6/hour.

Of course, given that we keep the uniform distribution of replicas over the macrostates that was used in the `finding' stage, then once we are close to the ideal multicanonical distribution the VS histograms do appear to reflect the sampled distribution, as we remarked in the previous section. However, this is only a result of the fact that the distribution of simulations through the volume macrostates is itself chosen to be nearly uniform. Any further information that we might appear to gain about the sampled distribution from using such VS estimators would be illusory.

The inadequacy of VS estimators means that we have no choice but to stay with the TP estimators used in the `finding' stage. As we have seen, TP estimators implicitly include and correct for the effect of the starting state, so the lack of equilibration over all the volume macrostates is not a problem. In chapter 3 it was found that TP estimators were sometimes biased; however, after the analysis of section 4.3.3 showing that here the underlying Markov process should always be in equilibrium, we would not expect any problem with this. Nevertheless, to be doubly certain, we shall present substantial evidence below to show that there is no bias. As further corroborating evidence, we recorded VS estimators throughout the long simulations that were used to generate the results in sections 4.3.5 and 4.3.7, and none of them ever showed any indication of systematic drift of the replicas through macrostate space, showing that any bias is certainly smaller than can be detected by VS in a run of practical length.

We shall now describe how the TP estimators are used in practice to generate the results. Rather than keeping η_xc constant in the production stage, as was done in generating the results in section 3.3, it is updated after each iteration using equation 4.14, just as in the `finding-η' stage. The only difference is that N_s is normally longer, of the order of a few thousand sweeps per iteration rather than a few hundred. This simple scheme is used since we found in chapter 3 that the more complex updating scheme described by equations 3.24 and 3.23 seems to yield only a marginal improvement. We can now recover an estimator of the canonical probability for each iteration using equation 4.12.
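The equal-weights search that the next paragraph describes is a one-dimensional bisection on the pressure. A sketch (ours), reusing the hypothetical canonical_pdf of the earlier reweighting sketch and assuming the two peaks lie in opposite halves of the volume range:

    import numpy as np

    def weight_imbalance(p, V, dV, eta, P_xc, beta):
        """P(dense phase) - P(expanded phase) at pressure p, splitting the
        reweighted p.d.f. at its interpeak minimum."""
        P = canonical_pdf(V, dV, eta, P_xc, p, beta)
        half = len(P) // 2
        i0 = int(np.argmax(P[:half]))             # dense (low-v) peak
        i1 = half + int(np.argmax(P[half:]))      # expanded (high-v) peak
        cut = i0 + int(np.argmin(P[i0:i1]))       # lowest point between them
        return P[:cut].sum() - P[cut:].sum()

    def equal_weights_pressure(V, dV, eta, P_xc, beta, p_lo, p_hi, tol=1e-6):
        """Bisect on p: raising p favours the dense phase, so the imbalance
        is an increasing function of p."""
        while p_hi - p_lo > tol:
            p_mid = 0.5 * (p_lo + p_hi)
            if weight_imbalance(p_mid, V, dV, eta, P_xc, beta) > 0.0:
                p_hi = p_mid
            else:
                p_lo = p_mid
        return 0.5 * (p_lo + p_hi)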
The procedure for finding the coexistence point and generating canonical averages is then to find a set {p^n} by identifying, for each iteration, p^n as that pressure which gives a double-peaked P^can(v) with equal weight in the two peaks. Next, the members of the set are averaged to give a mean p_coex and an error bar. Finally, {P^(n,can)(p_coex, v)} are re-evaluated using the same best estimate p_coex for every iteration. Properties like the densities and compressibilities of the coexisting phases then follow by calculating <v> and <(v - <v>)^2> for each phase on each member of {P^(n,can)(p_coex, v)} and averaging, while averaging the members of the set {P^(n,can)(p_coex, v)} itself gives a best estimate of the p.d.f. of v at coexistence. All the above can be done (if desired) for pressures away from p_coex, provided that equation 4.13 is satisfied. In all the simulations performed here, all the iterations used were the same length, though it is not essential that this should be the case.

The one disadvantage of this continual updating of η_xc is that we lose the ability to use jackknife bias-corrected estimators. To see whether bias-correction would yield significantly different estimators, we have performed a test run with constant η_xc to allow jackknife bias-corrected TP estimators to be calculated (once again the η^13 from the N_r = 1000 TP simulation was used). The parameters of the simulation were N = 108, β = 10/11, N_s = 1000, N_eq = 1, N_r = 4096, and eight blocks of data were gathered. The results, for the coexistence pressure evaluated using the equal-weights criterion, are shown in table 4.3.

  Type of Estimator                    | p_coex
  simple average TP                    | 30.16(8)
  single jackknife TP                  | 30.16(8)
  double jackknife bias-corrected TP   | 30.15(7)

Table 4.3. Various estimators of the coexistence pressure for N = 108, β = 10/11, N_s = 1000, N_r = 4096, with constant η_xc. All estimators come from transition probabilities.

To evaluate p~_coex, the canonical probability distribution is reconstructed, using equation 4.12, with three different kinds of TP estimator for P^xc(v). The first (`simple average') is defined on each transition histogram C^n_ij separately; to produce the second (`single jackknife'), all the transition histograms except the nth are pooled to make the nth estimator. The third is a double-jackknife bias-corrected estimator (see appendix D). It is clear that, within the error bars, the three estimators are identical. The procedure of continually updating η^n is thus justified, and so is the analysis of section 4.3.3 showing that equilibrium is always exactly maintained (and so that there should be no bias), given that the actual distribution of the replica simulations reflects their equilibrium distribution. It thus seems likely that the bias in chapter 3 had this source, because no effort was made there to keep the visited-states histograms flat; simulations were purposely released from only a few starting macrostates.

Having justified the use of TP estimators in all stages of the simulation process, including the obtaining of canonical averages, we shall now comment on the accuracy of the estimators obtained. The absolute magnitude of the uncertainty in η_xc is O(1) for the larger systems (i.e. Δη_xc = O(1)), and thus, because P~^can ~ exp(η_xc), the fractional error in the reconstructed canonical probability is also O(1), that is, (P~^can - P^can)/P^can = O(1), at least for some states.
This can be confirmed by inspection of the results of section 4.3.5, in particular figure 4.14. This uncertainty is large compared with what was achieved in chapter 3; we are contenting ourselves with a lower level of accuracy in our knowledge of the sampled distribution, and thus of the canonical distribution and canonical averages. However, the error bars alone do not quite tell the whole story. Very little of the error is attributable to local fluctuations; the shapes of the peaks of P^can are individually well-outlined, so averages calculated over the peaks separately (i.e. calculated for a single phase), such as the average specific volume <v> and the isothermal compressibility κ_T, for a particular input pressure, do not have such large errors as one might expect from the O(1) error in P~^can. Indeed, comparison of figures 4.14 and 4.15 shows that the error bars are smaller away from coexistence, where only one phase need be considered. By far the larger part of the error in P~^can(p_coex) comes from uncertainty in the relative weights of the two phases: the interphase distance (in V-space) is <V>_expanded - <V>_dense = N(<v>_expanded - <v>_dense), while the width of the p.d.f. of a single phase is ~ N^(1/2), as we know from statistical mechanical arguments (see section 1.1) and shall confirm in section 4.3.6. Thus, for all but very small systems, or systems very near to criticality, the interphase distance is larger and so has the greater effect on the error, because the local uncertainties in P~^can(p_coex) accumulate over the large distance in volume that separates the phases. In addition, the error in the intensive parameter p_coex (or, equivalently, in the difference in free energy density) is smaller than might be expected. This occurs because p_coex affects the relative weights of the phases via a term βp(<V>_expanded - <V>_dense); writing W for the logarithm of the ratio of the phase weights, ΔW ≈ βΔp N(<v>_expanded - <v>_dense), so, because <V> is extensive, an O(1) error in the relative weights corresponds only to an O(1/N) error in p_coex. We anticipate that for still larger systems accurate results would be obtained even if the error in the sampled p.d.f. were larger than O(1), that is to say, even before the establishment of what we have defined as a multicanonical distribution. That the error bars on p_coex are indeed small even for N = 256 is shown in figure 4.22 below. In fact, as we shall see, notwithstanding the O(1) error, our results are at least as good as the results already published for this system [69, 171].

The Multicanonical Method Compared with Literature Thermodynamic Integration

It is apparent that by using the transition method, possibly combined with FSS, we can measure the entire F(v) of the square-well solid from knowledge only of a volume interval that we expect to contain the coexisting phases. We should contrast this with the thermodynamic integration procedure used in [69, 171], where a reference system must be used for each F(v) point. When enough of F(v) is mapped out, p_coex and <v> are located with the double-tangent construction [47].
The reference system used in [69, 171] is the hard-sphere solid, for which a good equation of state is known [181]; this is therefore not a computationally expensive procedure, since the potentials of the two systems are similar and so only a short integration path is required, though of course for other systems such a convenient reference system might not exist, and there is some evidence even here that there are problems with it near the critical point (see section 4.3.7). In any case, compared with the multicanonical method, extra complexity is introduced by the use of the reference systems, while we still need an a priori guess of the volumes where we believe the phase coexistence lies. Moreover, the double-tangent construction is equivalent to finding a p'_coex that produces equal heights of the peaks of P^can(p, v); the equal-weights and equal-heights criteria both have the same large-system limit, but for small systems the equal-weights criterion is the more natural. We shall discuss in section 4.3.6 which gives the better estimate of the infinite-volume transition point.

4.3.5 Canonical Averages

We now present some results for various canonical averages, evaluated using this version of the multicanonical method. The averages are calculated both at the finite-size coexistence point and for a substantial range of pressure around coexistence. We choose as an example the N = 108 system at β = 10/11, though similar calculations can be made for all the other system sizes and temperatures investigated. We used N_r = 1000 and N_m = 1292. First, in figures 4.14 and 4.15 we show how P^can(v) can be reconstructed for various pressures from the multicanonical results. Figure 4.14 and its inset show how P^can(p_coex, v) can be accurately measured over a range of (here) more than 10 decades. Once again we emphasise that, even if we had a serial computer as fast as the Connection Machine, and so were not so hampered by the long random walk time, Boltzmann sampling would fail on a problem like this, where two modes are separated by a region of very low probability.

Figure 4.14. Main figure: P^can(v) at high (p = 70), low (p = 25) and intermediate (p = 30.2) pressures, for β = 10/11 and N = 108. p = 30.2(1) is the finite-size estimate of the coexistence pressure, where the two phases have equal weights; note that the probability density is much smaller in the rare phase (on the right), because of its higher compressibility. p = 25 is the lowest pressure which can be reliably investigated, because at lower pressures P^can has appreciable weight outside the investigated range of volume. The inset shows P^can, with typical error bars, in the region between the peaks. P^can was smoothed using a moving average over a window of 50 volume states.

Then in figure 4.16 we show <v>, the average volume per particle, evaluated from these distributions, as a function of pressure. <v> is calculated by averaging only over the phase that is favoured at the pressure under consideration (this should be compared with figure 4.21 below). The discontinuity in <v> at p ≈ 30, showing the presence of a first-order phase transition, is clearly visible. The estimates of <v> at p_coex are also shown as data points on this figure.
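Given the reconstructed P^can(v), single-phase averages are simple restricted moments. A sketch (ours; the factor β in the fluctuation formula is our reading of the printed expression, whose symbols were lost in typesetting):

    import numpy as np

    def phase_averages(v, P, cut, beta, N):
        """<v> and kappa_T of a single phase: restrict the normalised
        canonical p.d.f. P to one side of the interpeak minimum (index cut),
        renormalise, and take moments; kappa_T comes from the
        volume-fluctuation formula."""
        Pp = P[:cut] / P[:cut].sum()     # dense phase; use P[cut:] for the other
        vp = v[:cut]
        v1 = float(Pp @ vp)              # <v>
        v2 = float(Pp @ vp ** 2)         # <v^2>
        return v1, beta * N * (v2 - v1 ** 2) / v1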
Figure 4.17 shows the isothermal compressibility κ_T = (-1/v)∂v/∂p = (βN/<v>)(<v^2> - <v>^2) as a function of p, with the averages once again calculated only over the favoured phase. Once again the effect of the phase transition is clearly apparent; the very different values of κ_T reflect the different structures of the two phases (see section 4.3.8). We comment upon the finite-size scaling of <v> in section 4.3.6.

We should also confirm that the system is indeed solid for all densities studied. To do this, we show in figure 4.18 the pair correlation function g12(r) = P12(r)/4πr^2 for N = 108, β = 10/11, where P12(r)dr is the probability that a particle has a neighbour in a shell of radius r and thickness dr centred on its present position (a sketch of the corresponding histogram estimator is given after figure 4.15's caption below). The full line shows averages gathered using those simulations that were in the dense phase (in fact the average was gathered over a range of densities rather wider than the peak of P^can(p_coex, v), and was formed without weighting by P^can(p_coex, v), so we would expect it to be slightly different in detail from the true g12(r) at constant NpT, but qualitatively the same); the dashed line shows the corresponding results for the expanded phase. It is clear that the particles in the expanded phase have substantially more freedom of movement; nevertheless, the fact that g12 drops to zero between the peaks shows that the particles remain localised on their lattice sites, that is, that both phases are solid. The ratio of the locations of the nth and mth peaks (for n, m = 1, 2, 3, ...) is sqrt(n/m), as expected for an fcc lattice.

In figure 4.19 we show a detail of g12 at β = 10/11 around r = σ. This figure clearly shows that g12 has discontinuities at r = σ and r = (1+λ)σ, matching the discontinuities in the potential. The discontinuity at r = σ is simply a consequence of the hard core in the potential, which prevents any particles approaching more closely than σ. The presence of the other can be rationalised by considering g~12 for two isolated particles. Since g~12(r) ~ P(r)/r^2, we would expect that, as δ -> 0,

    g~12((1+λ-δ)σ)/g~12((1+λ+δ)σ) -> exp(βE_0)    (4.16)

In the solid, g12 is of course modulated by the presence of all the other particles, but there is no reason to expect their contribution to produce a discontinuity at r = (1+λ)σ, and any other behaviour will leave equation 4.16 unaffected. The presence of this discontinuity is thus explained, and from the figure we can estimate its magnitude to be

    g12((1+λ-δ)σ)/g12((1+λ+δ)σ) = 71.5(5)/29.0(3) = 2.47(4)  (dense phase)
                                = 32.0(3)/13.5(3) = 2.42(6)  (expanded phase)

in good agreement with exp(10/11) = 2.48.

Figure 4.15. A more detailed view of the peaks of P^can(p, v) for the same β = 10/11 and N = 108 system featured in figure 4.14, but at a range of different pressures. P^can was smoothed using a moving average over a window of 50 volume states, and some typical error bars are shown. The upper diagram shows the peak corresponding to the dense phase, while the lower diagram shows the peak corresponding to the expanded phase. Note that, in this diagram, it is just discernible for p = 29 that there is some weight at volumes corresponding to the dense phase.
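A standard histogram estimator for g12(r), as promised above (a sketch; ours, ignoring periodic images and the constant-NpT weighting caveat noted in the text):

    import math
    import numpy as np

    def g12_estimate(configs, r_max, nbins):
        """Accumulate P12(r) over stored configurations (each an (N, 3) array
        of unscaled coordinates) and divide by the shell volume 4 pi r^2 dr."""
        counts = np.zeros(nbins)
        dr = r_max / nbins
        for x in configs:
            d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
            pairs = d[np.triu_indices(len(x), k=1)]   # each pair counted once
            h, _ = np.histogram(pairs, bins=nbins, range=(0.0, r_max))
            counts += h
        r = (np.arange(nbins) + 0.5) * dr
        return r, counts / (len(configs) * 4.0 * math.pi * r ** 2 * dr)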
Figure 4.16. <v> of the favoured phase as a function of pressure for β = 10/11 and N = 108. Typical error bars are also shown. The estimates of <v> for each phase separately at coexistence are also shown (triangles).

Figure 4.17. Isothermal compressibility κ_T of the favoured phase as a function of pressure for β = 10/11 and N = 108. Because of the large variation in κ_T between the two phases, a logarithmic vertical scale is used. The compressibility is evaluated using κ_T = (βN/<v>)(<v^2> - <v>^2). Typical error bars are also shown.

Figure 4.18. g12(r) for N = 108, β = 10/11. Full line: average over the dense phase; dashed line: average over the expanded (rare) phase.

Figure 4.19. g12(r) for N = 108, β = 10/11; detail of the region r ≈ σ. Full line: average over the dense phase; dashed line: average over the expanded (rare) phase.

4.3.6 Finite-Size Scaling and the Interfacial Region

We shall now look at the finite-size behaviour of P^can(p_coex, v) and the various estimators obtained from it. We shall use both N and L, the side length of the simulation volume, to measure the system size. It is instructive to compare P^can(v) for various system sizes, to show explicitly the decreasing size of the fractional fluctuations with increasing N expected from elementary statistical mechanics (section 1.1). This is done for β = 1.0 and N = 32, 108 and 256 in figure 4.20. As well as the expected narrowing of the peaks, it is also apparent that the finite-size estimate of <v> at the finite-size coexistence point depends quite strongly on N, and an extrapolation must be used in making an estimate of <v> in the N -> ∞ limit.

Figure 4.20. P^can(v) for β = 1.0, N = 32, 108 and 256. The peaks narrow as N increases, and there is also some variation of <v> with N. The inset shows the expanded phase in greater detail.

The narrowing of P^can(v), and its movement to lower volumes, are clearly visible. In fact, in addition to the expected qualitative behaviour of P^can(v), there is also quantitative agreement with the prediction Δv/<v> ~ 1/sqrt(N); for example, the compressibility κ_T of the expanded phase, evaluated for the three system sizes at a pressure just below the equal-weights transition point, is κ_T = 8.6(2) x 10^-3 for N = 32, κ_T = 8.5(5) x 10^-3 for N = 108 and κ_T = 8.2(7) x 10^-3 for N = 256. The approximate equality of the estimates of κ_T shows that var(v) ~ 1/N, i.e. Δv/<v> ~ 1/sqrt(N).

Figures 4.21 and 4.22 show more clearly the finite-size scaling of estimators, in particular those related to the transition point (again for β = 1.0). Figure 4.21 shows <v>_N as a function of p, where <v> is now averaged over all volume states (cf. figure 4.16). The narrowing of the pressure region over which the transition takes place is clearly visible, as is its movement to higher pressure.
The diagram also shows thermodynamic integration results for an N = 108 system, taken from [171]. There is clearly good agreement between the two sets of results.

Figure 4.21. <v>_N (evaluated by averaging over all volume states) as a function of p for N = 32, 108, 256 and β = 1.0. Also shown are thermodynamic integration results for N = 108, from [171].

Figure 4.22 shows the finite-size scaling of the estimate of the transition pressure p_coex, evaluated both by equal-heights and by equal-weights criteria. Once again, the results seem consistent with the estimate given in [171]. Both estimators seem to experience a 1/N finite-size correction with respect to the infinite-volume limit, though the correction is rather smaller in the equal-weights case than in the equal-heights case. For this reason, and because it is the more natural finite-size estimator, we have generally preferred to use the equal-weights criterion. The least-squares fits to both sets of data (dashed lines) are equally good and both have the same intercepts, within error. The ordinate intercepts, which are the best estimates of the infinite-volume transition point, are both at p_coex = 22.78(6).

Our findings are interesting in the light of the theoretical prediction, made for lattice models with periodic boundary conditions in [23, 182, 183], that the equal-heights estimator of the transition `field' (temperature or magnetic field) should indeed have 1/N corrections, but the equal-weights estimator should have only exponentially small corrections. It might be thought that these arguments should apply to off-lattice systems too, but our results suggest that neither estimator has exponentially small corrections. This may be due to the fact that the ensemble under consideration here, with its variable volume, is different from the constant-volume ensemble of the lattice model.

Figure 4.22. p_coex as a function of 1/N, evaluated using equal-weights and equal-heights criteria for β = 1.0. Also shown are thermodynamic integration results for N = 108, from [171], and least-squares fits to the data (dashed lines), both of which have their ordinate intercept at p = 22.78(6).

Another interesting application of finite-size scaling theory is to the canonical probability P^can(p_coex, v) in the region between its two peaks. Let us define

    r_p = ln[P^can(v_0)/P^can(v_l)]    (4.17)

where P^can(v_0) is the probability density at one of the peaks of P^can at coexistence and P^can(v_l) is the p.d.f. at its lowest point between them (v_0 and v_l vary slightly with system size). We show the measured behaviour of ln(r_p) against ln(L) for β = 1.0 in figure 4.23; the graph has a gradient of 2.91(1).

Figure 4.23. ln(r_p) against ln(L) for β = 1.0 and system size L = 2, 3, 4 (N = 32, 108, 256); r_p = ln(P^can(v_0)/P^can(v_l)) by simulation.

Now, the statistical mechanics of phase coexistence (see section 1.1) predicts that the dominant configurations around v_l should consist of regions typical of each of the stable phases, separated by the phase boundaries.
Another interesting application of finite-size scaling theory is to the canonical probability P^can(p_coex; v) in the region between its two peaks. Let us define

r_p ≡ ln [ P^can(v₀) / P^can(v_l) ]    (4.17)

where P^can(v₀) is the probability density at one of the peaks of P^can(p_coex; v) and P^can(v_l) is the p.d.f. at its lowest point between them (v₀ and v_l vary slightly with system size). We show the measured behaviour of ln(r_p) against ln(L) for β = 1.0 in figure 4.23; the graph has a gradient of 2.91(1).

Figure 4.23. ln(r_p) against ln(L) for β = 1.0 and system size L = 2, 3, 4 (N = 32, 108, 256); r_p = ln(P^can(v₀)/P^can(v_l)) by simulation.

Now, the statistical mechanics of phase coexistence (see section 1.1) predicts that the dominant configurations around v_l should consist of regions typical of each of the stable phases, separated by the phase boundaries. Accordingly, the free energy in this region has the form

F(v) ≈ L^d f_b(v₀) + a₀(v) L^{d-1} f_s + …

where f_b(v₀) is the free energy of the bulk, a₀ depends on the geometry and f_s is a surface free energy. Thus, since P^can(v) ∼ exp(−βF(v)), we would expect r_p ∼ L^{d-1} = L². The discrepancy between this prediction and the result of the simulation is, we believe, a consequence of the particular simulation method we have chosen: periodic boundary conditions are used and changes in volume are made by a uniform scaling of the particles' positions, which leaves the shape (cubic) of the simulation volume unchanged. If one tries to generate a mixed-phase configuration in such a cubic box, one finds that it cannot easily be made to fit; the size of the box is determined by the largest length in the region of expanded phase, and it cannot contract around the regions of dense phase. Some planes of particles in the dense phase are therefore separated, effectively breaking up the uniform structure of the phase. In a simulation of a fluid, particles would move between the phases to fill up the `gaps,' but that does not occur here since the particles are held on their lattice sites, and, even if they could move, there would be commensurability problems with the simulation box, whose side length is only a few times the lattice spacing. Thus, the simulation fails to open up what is presumably, for a real system, the dominant configuration-space pathway between the two phases, because of the suppression of configurations containing interfaces. Instead, the most probable configurations in the interphase region have a uniform structure that is commensurate with the shape of the simulation box. This implies

F̂(v) ≈ L^d f̂(v) + …

where f̂(v) > f(v₀) is the bulk free energy at intermediate volume. This implies r_p ∼ L^d = L³, in much better agreement with what is found. Inspection of the distribution of free volume in simulations around v_l confirms that the simulations here seem to consist of a single phase only.

This means, then, that the simulation cannot be used in its present form for the evaluation of interfacial free energies. We should note, however, that all our previous results about the coexistence pressure or the volumes of the stable phases remain valid: these phases are of course homogeneous in structure and so present no difficulty to the simulation, while to determine their relative weight (to calculate the coexistence pressure) simply demands the presence of some reversible configuration-space path connecting them. Whether this path is the one followed in the `real' system is immaterial, given that the states along the path have negligible weight themselves, both in the canonical ensemble and the ensemble which is accessible to simulation.

4.3.7 Mapping the Coexistence Curve

Using a series of multicanonical simulations of different finite sizes and different temperatures, we have mapped out the solid-solid coexistence curve of the square-well system with λ = 0.01 between β = 1.0 and the critical point, which appears to lie at β ≈ 0.6. Simulations were carried out for N = 32, N = 108 and (for β = 1.0 only) N = 256, and for each simulation the canonical p.d.f. of v was reconstructed and the equal-weights criterion used to identify p_coex. While in sections 4.3.5 and 4.3.6 it was always easy to distinguish the regions of macrostate space associated with the two phases, because the temperature was low, this becomes progressively more difficult as the critical region is approached.
The canonical probability of the region between the two modes increases and finally the modes merge together. This is shown happening for N = 108 in figure 4.24; the p.d.f. has become unimodal by the time β = 10/17. For N = 32 the same process occurs at lower temperature.

Figure 4.24. P^can(v) for N = 108 and a range of β values, at coexistence. N_s = 540 (β = 10/17) to 1292 (β = 1.0). The inset shows the dense phase on an expanded scale.

In implementing the equal-weights criterion, we have used an arbitrary division of the range of v at or near the point where P^can is minimal, even though a fitting of two overlapping Gaussians might perhaps be better. This is not expected to have a great effect on the assignment of p_coex, which as we have already seen is little affected even by the choice of equal-heights or equal-weights criterion. However, it does mean that the estimates of ⟨v⟩ tend to move away from the modes of the canonical P^can(v). Therefore, once the temperature exceeds a certain value, chosen by inspection, we move to using the location of the modes as finite-size estimators. The `best estimates' of the infinite-volume limits of p_coex and the specific volumes ⟨v⟩ are then calculated by extrapolating the finite-size data against 1/N. We saw in section 4.3.5 that this procedure works well for low temperatures; it is applied here even to near-critical temperatures because to treat the critical region properly, to obtain estimates of βc and critical exponents, it is necessary to make highly accurate measurements of the joint p.d.f. of the order parameter and energy [121, 140], which we have not had time to do. The phase diagrams are thus not expected to be particularly accurate in the critical region.
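To make the equal-weights prescription of section 4.3.7 concrete, the following Python sketch reweights a canonical volume histogram measured at pressure p0 to nearby pressures, using P(v; p) ∝ P(v; p0) exp(−βN(p − p0)v), and bisects for the pressure at which the two phases carry equal probability. It is our own illustration: the function names and interface are assumptions, the splitting volume v_split plays the role of the arbitrary division described above, and for numerical safety p should be kept close to p0 (or the weights handled in log space).

    import numpy as np

    def reweighted(P0, v, beta, N, p0, p):
        """Histogram reweighting in pressure: P(v; p) is proportional to
        P(v; p0) * exp(-beta*N*(p - p0)*v), renormalised on the v grid."""
        w = P0 * np.exp(-beta * N * (p - p0) * v)
        return w / w.sum()

    def equal_weights_pressure(P0, v, beta, N, p0, v_split,
                               p_lo, p_hi, tol=1e-10):
        """Bisect for the pressure at which the dense phase (v <= v_split)
        and the expanded phase (v > v_split) have equal total weight."""
        def dense_weight(p):
            return reweighted(P0, v, beta, N, p0, p)[v <= v_split].sum()
        # increasing p favours small volumes, so dense_weight increases
        # monotonically with p and a simple bisection suffices
        while p_hi - p_lo > tol:
            p_mid = 0.5 * (p_lo + p_hi)
            if dense_weight(p_mid) < 0.5:
                p_lo = p_mid
            else:
                p_hi = p_mid
        return 0.5 * (p_lo + p_hi)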
The reader might be concerned that the suppression of configurations containing interfaces, as described in section 4.3.6, might affect the estimates of p_coex and ⟨v⟩ at higher temperatures, in the region where for a finite-size system the interphase configurations do have significant weight. However, these configurations only acquire significant weight once the correlation length, or the width of typical interfaces, has exceeded the size of the system [17], so the dominant configurations are once more homogeneous in structure. In particular, we do not foresee incommensurability problems affecting simulations in the critical region, and we believe that an accurate treatment using the methods of [121, 140] would be a fairly straightforward extension of the work performed here.

The multicanonical distributions were produced first for the N = 32 systems, sometimes starting from scratch, as described in section 4.3.3, but sometimes using a pre-existing multicanonical weight function from a simulation at a different temperature as a first estimate. The weight functions for the larger systems were produced using finite-size scaling followed by refinement. The parameters used in the various simulations are shown in table 4.4.

  N    | N_m (beta range)          | N_r  | N_s (sweeps/iteration) | speed (sweeps/hour) | N_EDGE | N_eq
  32   | 159-281 (β = 10/15-1.0)   | 4096 | 3500                   | 1800                | 1      | < 1
  108  | 540-1292 (β = 10/17-1.0)  | 4096 | 2000                   | 500                 | 3      | 1
  256  | 2949 (β = 1.0)            | 1728 | 2000                   | 500                 | 4      | 1

Table 4.4. Parameters for the various `production' simulations used in the evaluation of the phase diagram. P^can and derived quantities were generated by averaging over about 3-6 iterations.

The results are shown below; figure 4.25 is the p-β phase coexistence curve, while figure 4.26 is the β-v phase diagram. We also show results from [171] for comparison (dashed lines). It is apparent that there is quite good agreement between the two estimates of the p-β coexistence curve. Any discrepancies are at most of the order of 1% and most lie within the error bars on the multicanonical points; some do not, but since we have no error bars on the thermodynamic integration data this need cause no concern. However, it should be noted that the discrepancies that do exist still correspond to O(1) differences in the relative probabilities of the two phases for an N = 108 system.

Figure 4.25. The p-β solid-solid coexistence curve for the square-well solid with λ = 0.01. The data points are produced by extrapolating N = 32 and N = 108 (and, for β = 1, N = 256) data against 1/N; the equal-weights criterion is used to find the finite-size estimators of p_coex. The dashed line shows thermodynamic integration results for an N = 108 system, taken from [171].

The agreement in the location of the phase boundary in the β-v solid-solid phase diagram is also fairly good, though there is a small but clear systematic disagreement in the specific volume of the expanded solid: at low β, the integration method consistently ascribes to it a higher v than the multicanonical. There are several possible explanations for this, none of which is completely satisfactory. It may be due in part to the difference between the equal-heights and equal-weights criteria: equal-heights (implicit in the double-tangent construction of TI) would tend to produce the larger estimate of ⟨v⟩. However, this cannot be the whole story because equal-heights would also produce a lower estimate of p_coex, while figures 4.25 and 4.22 show that, if anything, the opposite seems to be true. Another possible cause is the finite-size scaling movement of the peak of P^can(p_coex; v) to lower v (visible in figure 4.20), but again this does not seem a large enough effect to account for all the discrepancy; the `raw' multicanonical results for N = 108 still lie much closer to the extrapolated data points than to the thermodynamic integration curve. The third, and possibly most likely, explanation is that the hard-sphere solid used as the reference system in [171] is unsatisfactory near to the solid-solid critical point. One might expect this to happen because the hard-sphere solid does not itself have a critical point, and so `typical' configurations of the two systems are far less similar than they are at higher β, making the integration path longer and more awkward. The effect is in fact like that of having a phase transition on the integration path itself.

Figure 4.26. The β-v solid-solid phase diagram for the square-well solid with λ = 0.01. The data points are produced by extrapolating N = 32 and N = 108 (and, for β = 1, N = 256) data against 1/N. The dashed line shows thermodynamic integration results for an N = 108 system, taken from [171].
In any case, whatever the cause of the low-β difference in the estimates of v, one of its consequences is that the multicanonical data suggest that the critical temperature is probably lower (i.e. βc is higher) than stated in [171]. βc for the integration method is at β̃c^TI ≈ 0.576, while figure 4.26 implies instead that β̃c^xc ≈ 0.60(1), though in the absence of a proper critical FSS analysis we would not assert even this result with very much confidence.

4.3.8 The Physical Basis of the Phase Transition

So far in our investigations we have concentrated only on describing the solid-solid transition that is observed, without attempting to give a physical explanation of why it should occur at all. A full understanding would require the study of different potential ranges, and of the fluid phase too, so we shall give only a brief semiqualitative description, concentrating mainly on the N = 108 system at β = 1 and β = 10/16.

A first-order phase transition is always associated with a finite-size p.d.f. of the order parameter (P_N^can(v) here) that has a double-peaked structure, the peaks being associated with the coexisting phases in the thermodynamic limit. Since the logarithm is a monotonically increasing function, this means that

ln P_N^can(p_coex; v) = −Nβ(f(v) + p_coex v) + constant

has the same double-peaked shape. Now this shape means that the derivative ∂² ln P/∂v² must pass from −ve to +ve and back to −ve. But ∂²(−Nβ p v)/∂v² = 0, so it is the other term on the RHS that must produce the curvature: ∂²f(v)/∂v² must go from +ve to −ve to +ve. We now express f(v) itself as the sum of an energy and an entropy term:

f(v) = e(v) − s(v)/β

where e(v) = E(v)/N is an average internal energy density, and so is the means by which the interparticle potential exerts its influence, and s(v) is an entropy density (we remind the reader that in the present units k_B = 1). The multicanonical simulations give us f(v), and e(v) was also measured at the same time, internal energies being easily accessible in any sensible MC sampling scheme. s(v), which we expect to be mainly geometric in origin, is thus easily calculated. We show s(v) and e(v), together with f(v) + p_coex v = −ln P_N^can/(Nβ), in figure 4.27. The functions s(v) and f(v) + p_coex v are arbitrarily shifted vertically so that they equal zero at the lowest v for which measurements were made. We show f(v) + p_coex v rather than f(v) itself so that the behaviour of the curvature is made more apparent by the removal of the overall `trend.' The +ve to −ve to +ve pattern of ∂²f(v)/∂v² is visible, though the magnitude of the 2nd derivative is clearly small except at low v.
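The decomposition just described is simple enough to state as code. The sketch below is our own illustration (array names are arbitrary): it recovers the entropy density from the measured f(v) and e(v) via s(v) = β(e(v) − f(v)), and estimates the curvatures whose signs control the phase behaviour.

    import numpy as np

    def entropy_density(f, e, beta):
        """s(v) = beta * (e(v) - f(v)), obtained by rearranging
        f(v) = e(v) - s(v)/beta (units with k_B = 1). f and e are
        arrays tabulated on a common grid of v."""
        return beta * (e - f)

    def curvature(y, v):
        """d^2 y / dv^2 by repeated central differences; the sign
        pattern (+, -, +) of the curvature of f(v) is what produces
        the two phases and the probability minimum between them."""
        return np.gradient(np.gradient(y, v), v)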
Before proceeding, we shall comment on the behaviour of the functions outside the range of v that is shown in the figure (i.e. the range that was investigated by simulation). We must have e(v) → −6 as v → v_CP = 1/√2, since then all particles interact with all twelve of their nearest neighbours, and e(v) and all its v-derivatives → 0 as v → ∞ and the particles become very widely separated. However, s(v) → −∞ as v → v_CP, because the volume of phase space accessible to the particles → 0. Therefore f(v) → +∞ in the same limit. As v → ∞ we expect s(v) → ∞ and ∂s(v)/∂v to remain finite. We should note that for some finite v rather larger than shown in the figure, the particles would no longer be held on their lattice sites by their neighbours even at zero temperature; that is, the solid would no longer be mechanically stable. As a rule of thumb, this occurs for fcc crystals when they have expanded to (1.07)³ of their close-packed volume [184], which corresponds to v ≈ 0.866. At this point, we would expect s(v) to increase very rapidly, or even discontinuously, as the particles become delocalised.

Figure 4.27. The internal energy density e(v), entropy density s(v) and free energy functional f(v) + p_coex v for N = 108 at β = 1.0 (full lines with symbols) and β = 10/16 (dashed lines). An arbitrary constant is added to the entropy and free energy so that they are zero at the lowest volumes investigated. The v-axis begins at v_CP = 1/√2.

Let us first consider the β = 1.0 data only. Comparison with figure 4.20 confirms that the dense solid (at v ≈ 0.72) has low entropy but is stabilised by low internal energy, while the expanded solid (at v ≈ 0.80) has higher internal energy but also higher entropy, in line with figure 4.3. As we move from low v to high, the internal energy increases, first slowly (e remains very close to −6 for 1/√2 < v < 0.717), then more quickly as the interparticle separation increases to the point where the particles move in large numbers out of their neighbours' potential wells (producing the strongly positive curvature around v = 0.72). The rate of loss of energy then slows, giving a strongly −ve curvature around v = 0.73-0.74, and after this it increasingly levels off and flattens out, as further increases in v make less difference to the number of neighbours with which most particles interact. The rapid loss of internal energy even though the density remains high will be crucial in understanding the phase behaviour. The point of inflection seems to be quite close to the point where the internal energy would drop to zero with a uniform dilation of a perfect fcc crystal, which is at v = (1.01)³/√2 ≈ 0.7285.

Apart from the expected divergence at v = v_CP, the shape of s(v) is quite similar, though its high-volume gradient is non-zero and its low-volume curvature (both +ve and −ve) is smaller in magnitude. We attribute s(v) largely to free volume in the crystal, so initially as we move away from close packing it increases rapidly, then more slowly and with a −ve curvature. The region of +ve curvature around v = 0.72 cannot be due simply to free volume and must relate to the unknown behaviour of P(s_N | v).

The difference of the curvatures of these two functions gives the curvature of f(v), which is the determiner of the phase behaviour. At low v, the stronger curvature of e(v) is dominant, first +ve, producing the dense phase, then −ve, producing the minimum of the canonical probability. At high v, e(v) flattens out and the curvature of s(v) becomes the greater, producing a weak net +ve curvature of f(v) which stabilises the expanded phase. The smallness of the 2nd derivative of f(v) here is responsible for the large compressibility of the expanded solid: the `restoring force' resisting compression and dilation is essentially an entropic effect.

At β = 10/16, s(v) and e(v) have changed only a little from β = 1.0; the largely geometrical factors that produce their shape are only slightly altered by the different relative probabilities of the various configurations within each v-macrostate.
The main difference in the phase diagram is produced not by these slight differences but by the fact that the inverse temperature enters only through the entropy term s(v)/β, whose effect is therefore greater at this higher temperature, and not through the energy term. Thus the low-volume dominance of ∂²e(v)/∂v² is weaker, while the high-v dominance of ∂²s(v)/∂v² is stronger and occurs for smaller v. Hence we expect the two phases to move together, the dense phase to become more compressible and the expanded phase less so, and the depth of the probability minimum between them to fall, just as is observed in the simulations (see figure 4.24). Indeed, though we have not carried out the calculations, we believe that the whole β-v phase diagram could be recovered qualitatively from the results of β = 1.0 alone. At β = 10/16 itself, the two phases have effectively fused together, as shown by the single minimum of f(β = 10/16; v).

Conversely, the opposite would be true for coexistence at β > 1. The dominance of the curvature of e(v) would be greater and the expanded phase would move to higher volume, eventually reaching those volumes (v ≈ 0.866, as we said before) where mechanical instability sets in. However it is very likely that, even before this, the expanded solid would become thermodynamically unstable in comparison with the fluid. The fluid is always favoured entropically compared with the expanded solid, and the difference in ⟨e⟩ cannot be very great since it is already near zero for the expanded solid, so for small β the fluid is suppressed, we believe, largely by the p_coex v term: ⟨v⟩_f for the fluid is much larger than ⟨v⟩_xs for the expanded solid and p_coex is high. However, as β increases, p_coex falls and ⟨v⟩_xs increases, so p_coex(⟨v⟩_f − ⟨v⟩_xs), which measures the net effect of this term, becomes smaller. Eventually, we presume, the fluid will become the favoured phase and we will have moved above the triple line on the phase diagram in figure 4.2.

We can also speculate on the effect that increasing λ has. e(v) will maintain a similar shape while being scaled in the v-direction, so its regions of high +ve and −ve curvature, which produce the dense phase and the interphase region, as well as the region of weak −ve curvature that allows the +ve curvature of s(v) to produce the expanded phase, will all move to higher volumes. Once λ is large enough, the expanded phase becomes unstable at all temperatures (either because the region of +ve curvature of f(v) moves out of the range of solid mechanical stability, or because p_coex(⟨v⟩_f − ⟨v⟩_xs) becomes small enough that the fluid always has the lower g) and a phase diagram containing only two phases results, as in the central diagram of figure 4.4. This is consistent with the results of [171].

Finally, we note that the shape of e(v) should not be strongly affected by variations in the shape of the interparticle potential, as long as the short range is preserved, while s(v), being primarily geometric, should also be similar. The above arguments thus remain valid for more realistic shapes of E(r_ij), supporting the assertion in the introduction that the square-well system should have a similar phase diagram to any other real or simulated system with a sufficiently short-ranged potential.

4.4 Discussion

We shall now comment in turn on the two main subdivisions of this chapter: the investigations made using the Einstein solid reference system and those made using multicanonical sampling to directly connect coexisting phases.
As regards the comparison of thermodynamic integration and the expanded ensemble using the Einstein solid (section 4.2), it is clear that in efficiency of sampling (as quantified by the size of error bars for a particular computational effort), the expanded ensemble method is competitive with TI. Moreover, there is perhaps some evidence (see figure 4.8) that the expanded ensemble's superior ability to deal with near-singular points gives it an advantage over TI at both ends of the switching path: at the zero-coupling end, where it handles the problem of the rare hard-core overlaps implicitly, and near full coupling, where the integrand is rapidly-changing. However, because we used the TI results to bootstrap the expanded ensemble, and because we did not develop a method to enable the expanded ensemble to reach full coupling (though we believe such a development is possible), the comparison of the two methods is not strictly fair, and we certainly cannot claim to have proven the expanded ensemble's superiority. We would add that the awkwardness experienced in transforming the square-well solid into the Einstein solid (described in section 4.2.1 in the context of TI, though it also affects the expanded ensemble) is an argument in favour of avoiding the use of reference systems, or of using only those that strongly resemble the system of interest.

We have also investigated matters pertaining to the use of the expanded ensemble alone. In section 4.2.3 we have shown that the subdivision and overlapping of a chain of interpolating ensembles does not influence the accuracy with which the relative probabilities of the ends are measured (see table 4.2), confirming the result derived in section 3.4.2. We have also presented some results on choosing the spacing of states in expanded ensemble calculations, though we have done only a little work on this important matter. Our results (see table 4.1) do show the expected trade-off in random walk time between the number of states and the acceptance ratio, leading to a fairly wide efficiency maximum, though we are not able to make the quantitative prediction of the acceptance ratio that might lead to an optimisation strategy. We might speculate (though it is not a matter we have investigated) that the notion of the maintenance of equilibrium within macrostates/subensembles (see section 4.3.3) may be relevant here: the larger the separation of subensembles, the less representative of equilibrium within each one will be the configurations that move into them from adjacent subensembles. Thus we would expect that the amount of equilibration needed between attempted subensemble transitions would increase, which would tend to favour the use of a close spacing of subensembles. We suggest that inadequate equilibration may be the reason that spurious results have been reported in some expanded ensemble-like techniques (such as Grand Canonical MC, where the process of inserting or removing a particle is naturally discrete and in a dense system may produce what is effectively a wide spacing of subensembles).

Now let us discuss the investigations made using the multicanonical NpT-ensemble (section 4.3). We have studied the square-well solid with λ = 0.01 and have obtained detailed information about f(v) for various β over a range of volumes including the coexisting phases, thus enabling the construction of the phase diagram (figures 4.25 and 4.26).
The results are largely consistent with those obtained by TI in the literature [69, 171], and we have grounds to think that where there are discrepancies our results are the better ones. The full measurement of f(v) and e(v) also provides some physical insight into how the short range of the potential leads to the solid-solid phase transition (see section 4.3.8). The transition is possible because the energy function e(v) has the following features, all within a narrow range of v not much above close-packing: first it has positive curvature, producing the dense solid, then negative curvature, producing a minimum of canonical probability, and finally it is nearly flat (the system having lost most of its internal energy), so that the curvature of the entropy s(v) is able to stabilise the expanded solid. Because all this occurs for small specific volumes, thanks to the short range of the potential, every particle remains trapped in the cage of its nearest neighbours and both phases are solid. However, a full understanding of the physics of the transition would require treatment of the fluid too.

In order to apply the multicanonical ensemble to this problem, we have also had to extend it, because the very long random walk time τ_rw in this problem prevents a straightforward implementation. We have solved this problem by increasing the use of transition probability estimators to enable efficient parallelisation using many replica simulations. We foresee this improvement being widely applicable, since it largely overcomes the problems caused by the inherent serialism of the multicanonical method itself. In section 4.3.3 we show that efficient convergence to the multicanonical distribution is produced by TP estimators, possibly in conjunction with FSS, and by continuing with the use of TP estimators in the `production' stage (a procedure extensively justified in section 4.3.4) we have arrived at an iterative scheme that achieves to a very large extent the ideal of a combination of the weight-finding and production stages.

Thanks to the separation in this simulation of updates that alter the preweighted variable but preserve the configuration, and those that do the opposite, we have also, as described in section 4.3.3, gained new insight into the importance of the preservation of equilibrium at constant v. Failure to equilibrate completely is the reason why convergence to the multicanonical distribution is not immediate, and, because the estimator of the sampled distribution P^n is least affected by incomplete equilibration when P^n is multicanonical (strictly, when P^n reflects the imposed distribution of the replica simulations), it is also the reason why the multicanonical distribution should still be used in the production stage. The simulations of chapter 3 did not provide this insight because the same update procedure both made moves between different macrostates of the preweighted variable and equilibrated the microstates within them.

For the square-well solid we have been able to tackle systems where τ_rw is much longer than the time devoted to each replica simulation; though because each replica now traverses only a part of the macrostate space, this is achieved at the cost of losing some of the improved ergodicity that can usually be claimed for the multicanonical ensemble. To simulate systems where ergodic problems are more severe, like spin glasses, this procedure would not be adequate; connection to those regions of macrostate space where decorrelation is fast would then be essential.
Indeed the best procedure might well be to launch all replicas from these states instead of spreading them uniformly. The procedure would then resemble that of section 3.2.5, and, as was the case there, the time devoted to each replica would have to be O(τ_rw). This is still much less than would be required using VS estimators, because the TP estimators would take account of the starting distribution of the replica simulations. To be sure that there was no bias would require that the VS histogram stayed flat notwithstanding the biased launch points (see section 4.3.3). It might in practice be found, as in section 3.2.5, that any trend in the histogram had little effect. Alternatively, we might try modifying the method; we speculate that making the VS histogram flat by including only some of the transitions of certain replicas in C_ij may have the desired effect.

It will be noted that, in the form in which it is applied here, the multicanonical ensemble bears some resemblance to the `multistage sampling' approach (section 2.1.2), where each simulation would be constrained to walk (possibly multicanonically) within a narrow section of the full range of macrostates, overlapping with its neighbours. From a VS histogram, the p.d.f. of V within each section would then be estimated, and by imposing continuity between the sections, it could be reconstructed for the whole range of macrostates. In the multicanonical approach, there are no constraints on the movement of the replica simulations, but they do behave in a similar way in practice because τ_rw is so long. Nevertheless, the multicanonical approach retains several advantages. To use multistage sampling, we must decide a priori how to divide up the range of macrostates: how wide each section should be and how much it should overlap with its neighbours. We must also decide how to match the results from the various histograms, using just the overlapping states or using a function fitted to the whole histogram. The use of the single pooled histogram with TP estimators handles all of this transparently. Moreover, to allow full equilibration (in the VS sense) over the range of macrostates in each section of the multistage sampling simulation would require that each section should contain substantially fewer macrostates even than are explored by one of the multicanonical replicas in the course of its run. Thus, even though the multicanonical method does not have very good ergodic properties here, the multistage sampling approach would be even worse in this regard. It is also true that the multistage sampling approach would have the lower acceptance ratio of volume-changing moves, because attempted transitions that would take a replica out of its allowed section of the macrostate space must be rejected. Finally, we do not think that the time spent in generating the multicanonical distribution (which is in any case typically only about a quarter of the total time spent) could be saved by using multistage sampling. It is true that the sampled distribution can be canonical within each section, but this makes the overlapping process more difficult because fewer counts are recorded at one end of the range of states, so that fluctuations have a greater effect.
It is also possible, though we do not know for certain, that the same problems that were found in using TP to estimate P^n in the early iterations of the multicanonical method (discussed at the end of section 4.3.3) would recur. It is of course possible to use a sampled distribution in the multistage sampling method that is multicanonical within each section, but this then requires the same sort of generation process as does a distribution that is multicanonical over all the macrostates.

Let us finally make some comments on the efficiency of this multicanonical procedure as compared to the thermodynamic integration method used in [69, 171] (it is not possible to make an absolutely fair comparison with TI because to integrate along a path of variable V (see equation 2.3) would require the measurement of ⟨p(V)⟩, which, as we explained in section 4.2.1, is inaccessible for this system). Though we do not, of course, know how much time was required for the simulations of [69, 171], the fact that several different ranges of potential were investigated, and the fluid phase was included too, suggests that the thermodynamic integration done there is appreciably faster. This is surely a consequence of the use of a reference system which is `similar' to the square-well solid, whereas the multicanonical method is an ab initio method. The accuracy of these TI results is thus dependent on the accuracy of the equation of state used for hard spheres, but any slight errors induced by this are probably outweighed here by being able to use a very short integration path (except perhaps near the critical point; see section 4.3.7). In a case where there was no suitable reference system, we do not believe TI would be in any way superior. Even here, applied to a system to which it is not particularly well-suited because of the hard core in its potential (see section 4.3.2), the multicanonical ensemble has some advantages: it seems to be more accurate near the critical point (it could certainly be extended to yield accurate information about the critical region) and it gives estimates of canonical averages away from the phase transition (see section 4.3.5). We also consider it to be a more transparent way of obtaining information about f(v), because it uses the interpretation of f(v) as the logarithm of a probability to relate it directly to the probabilities that are measured in MC simulation.

Chapter 5

Conclusion

Look at the end of work, contrast
The petty done, the undone vast.
ROBERT BROWNING

The main problem we have addressed in this thesis is perhaps the generic problem of statistical mechanics: the identification of which phase is stable at particular given values of the thermodynamic control parameters, leading to the construction of the phase diagram. As we described in chapter 1, there are two fundamental ways that MC simulation can be used to approach this problem. Firstly, each phase can be tackled separately, which requires a measurement of its absolute free energy. This is most usually done by connecting it to a reference system in some way. Secondly, the free energy difference between the two phases (closely related to their relative probabilities) can be measured directly by allowing them to interconvert. This removes the necessity of measuring the absolute free energy, but some way must be found to overcome the free energy barriers that separate the two phases.
In our own investigations, we have concentrated on the ways that non-Boltzmann sampling (particularly the multicanonical ensemble) can be used to tackle the phase coexistence problem. This type of extended sampling can be used in either of the ways described in the first paragraph: it can be used to connect to a `reference state' at infinite or zero temperature to measure absolute free energy (as in much of chapter 3), or it can be used to overcome the free energy barrier between two phases to measure their relative probability directly (as in section 4.3).

An idea that turned out to have particularly wide applicability was to use the information contained in transition probabilities between macrostates. The transition probability method was originally developed as a way of rapidly producing a distribution approximating to the multicanonical (section 3.2.3). While very effective for doing this, it also provides a way of removing the bias due to the starting state in an MC run which is not long in comparison with τ_rw (see sections 3.2.5 and, particularly, 4.3). This greatly facilitates the implementation of the method on parallel computers, and may be more generally useful. Consideration of transition probabilities also prompted us to do the analytic work on Markov chains (section 3.4.2) which led to the result (confirmed by simulation in section 4.2.3) that the accuracy of a multicanonical/expanded ensemble calculation is not improved by subdividing the macrostates, and which also enabled us to predict the expected variance of estimators from various sampled distributions. While the resulting prediction of an `optimal' sampled distribution (section 3.4.4) is probably mainly of curiosity value, variance calculations were useful in checking the validity of TP estimators in section 4.3.4.

Much of the work done was purely on the development of the method, but physically interesting results were obtained, in section 3.3.3 on the high-M scaling of the 2d Ising order parameter, and, particularly, in section 4.3, where aspects of the solid-solid phase transition in the square-well solid have been investigated with greater accuracy than before. This work thus provides a contribution to the growing body of evidence describing variation in the nature of phase diagrams as qualitative features of the interparticle potential vary.

Is it likely that the multicanonical/expanded ensemble will ever replace thermodynamic integration as the `standard' method of free energy measurement? There seems to be no reason why not in principle: though there may be some cases where TI's efficiency is higher (one is mentioned in section 2.1.4), whenever we have compared the two over the same path, as in sections 3.3.2 or 4.2, we have found the multicanonical approach to be at least comparable in accuracy (something we would expect in the light of the results of section 3.4.2 on subdivision of the range), while it can deal better with phase transitions or ergodic problems. It also has the advantage of relating the desired free energies more directly to the probabilities that are measured in MC simulation. Its most serious drawback is still the difficulty of constructing the required sampled distribution, but we can now begin to see that this disadvantage can be largely overcome, using FSS, extrapolation, or the TP method. The TP method also helps in the efficient parallelisation of the method, a matter which is bound to become increasingly important in the future.
Therefore, we expect increasing use of multicanonical/expanded ensemble methods, particularly for systems like spin glasses or dense fluids, where thermodynamic integration experiences difficulties because of ergodicity problems or phase transitions on the integration path, or for the measurement of surface tension or the probability of large fluctuations, where thermodynamic integration is not appropriate at all. For many other systems (e.g. fluids at fairly low densities), however, it seems unlikely that thermodynamic integration will be entirely displaced, because the multicanonical ensemble does not offer any great advantage to compensate for the greater effort needed to code it: thermodynamic integration can be done with only a little modification of Boltzmann sampling simulations. We should also mention here that, while we have not investigated them ourselves, several other new methods discussed in the review section (chapter 2) were either extremely efficient or seemed to hold promise, particularly the Gibbs ensemble (see section 2.2.7), the dynamical ensemble (section 2.2.5) and the (MC)³ and tempered transitions methods (discussed in section 2.2.3).

Whatever sampling scheme is used, it seems likely that the free energy problem will always remain `hard,' simply because of the distance in configuration space that must be sampled over. When sampling within a single phase, all likely configurations are broadly similar, but in the free energy problem, whether approached by connecting with an infinite or zero temperature reference state, or by tunnelling between two coexisting phases, the simulation must move between configurations which are qualitatively very different in structure. Because the Metropolis algorithm works by accumulating small changes to a starting configuration to produce others, it inevitably takes a long time to build up large differences. Thus (as we found in section 3.4.3), once we have removed the free energy barriers that produce a tunnelling time that is exponential in the system size, further changes to the sampled distribution produce no more than a marginal improvement in the efficiency of sampling. To make further large improvements will require the use of algorithms that can take larger steps through configuration space than the Metropolis algorithm. Though we have not done any work on them in this thesis, some such algorithms are already being developed [116, 118, 115]. We think it likely that the hybrid MC technique [39] could also be fruitfully combined with multicanonical sampling.

Finally, let us speculate on some interesting problems that might in the future be tackled with the multicanonical/expanded ensemble method. Multiple-minima problems, for example the simulation of spin glasses and protein folding, are examples of applications which demand the good ergodicity properties of multicanonical-like methods, and some work has already been done on these. We also consider that the development of the technique's ability to sample across phase boundaries would be interesting and productive. This has already been done for isostructural solid-solid (chapter 4) and fluid-fluid [121] transitions; we consider it possible that a way could also be found to sample reversibly across a solid-fluid phase boundary, perhaps using the order parameter introduced in [24], which measures `crystallinity,' as the preweighted variable. This would be, to our knowledge, the first simulation of reversible melting and freezing.
As well as enabling the simulation to be done more elegantly, without needing separate reference systems for the solid and liquid, the pseudodynamics of such a simulation could give insight into the process of nucleation in real systems, which is itself a subject of considerable interest.

Appendix A

Exact Finite-Size Scaling Results for the Ising Model

Let the Gibbs free energy density in the infinite-volume limit be g_b. We shall examine the behaviour of the finite-size estimator g(L) = G(L)/L². For the critical temperature only, we shall also discuss F(M; L). Now, the effect that finite size has on a system's properties depends on its boundary conditions, and on where in its phase diagram it is located: in a single-phase region, at the coexistence of two or more phases, or near the critical point. (For the 2d Ising model, the critical point occurs where the coupling β = βc = (1/2) ln(√2 + 1) ≈ 0.440686.) We shall consider first the case where the system has periodic boundary conditions (P.B.C.).

High Temperature (Small β)

For many systems with short-ranged forces, in the single-phase region, with P.B.C., g(L) has only exponentially small corrections [185, 186, 183], so

g(L) = g_b + O(L^{-2} e^{-L/L₀})    (A.1)

The source of the correction is the interaction of a particular spin or particle with its `image' in the periodic boundary; in the absence of long-range forces this interaction has to be mediated through O(L) spin-spin interactions, and away from criticality correlations decay exponentially [12]. The constant L₀ will therefore be a function of the inverse temperature β, having similar behaviour to the correlation length ξ.

Low Temperature (Large β)

In a single-phase region, there are once again only exponential corrections [183]. However, for the 2d Ising model, the line H = 0 for β > βc is the line of coexistence of two phases of opposite magnetisation. Therefore we must now consider the case of phase coexistence. At the coexistence point of q + 1 phases, with P.B.C. and away from criticality, the free energy has been shown [182] to have the following form for a large class of lattice models:

G(L) = −(1/β) ln [ Σ_{m=0}^{q} exp(−β L^d g_m(β)) ] + O(L^d e^{-L/L₀})    (A.2)

where g_m is a `metastable free energy' of phase m: g_m = g_b if m is stable, otherwise g_m > g_b. For the 2d Ising model, there is an exact symmetry between the two phases, so both terms in the sum in equation A.2 are the same. This leads to

g(L) = g_b − β^{-1} L^{-2} ln 2 + O(e^{-L/L₀})

The Critical Region (β = βc)

Before discussing the scaling of g(L), we shall discuss the scaling of the Helmholtz free energy F(M; L) and thus of P_L^can(M) at this temperature. We have seen that at high temperatures F_L(β, M) has a single minimum at M = 0 and at low temperatures two minima at M = ±M*. In both cases as L increases the fractional width of the peaks of P_L^can(M) decreases, so that at high temperatures P_L^can(β, M) is concentrated progressively in the centre of the distribution, and at low temperatures in its wings. The crossover between these two types of scaling occurs at the critical point, β = βc, where P_L^can(β, M) has the property that the relative weights of centre and wings remain constant, in the sense that P^can(0)/P^can(M*) = constant (≈ 0.04 for the 2d Ising model), independent of L.
The scaling of P_L^can(βc, M) is quite different from its scaling away from βc: P_L^can(βc, M) dM = p*(x) dx, where x = M L^{-d} L^{d/(1+δ)} and where p*(x) is a universal function, the same for all the systems in a particular universality class. For the universality class that includes the 2d Ising model, p*(x) has a double-peak structure and δ = 15. Thus the modes of P_L^can(βc, M) lie at ±M*, where M* ∼ L^{15/8}, which is to say that the modes of P_L^can(βc, m) lie at m* ∼ L^{-1/8}. This, together with the constant value of P^can(0)/P^can(m*), which is easily large enough for easy tunnelling between the two peaks, produces the large (almost extensive in the system size) fluctuations in the order parameter M that are typical of the critical point.

Now we return to consideration of the Gibbs free energy. Near the critical point with P.B.C., it is well established [17, chapter 1] that the free energy g(L) can be decomposed into a singular part g_s, which is zero outside the critical region, and a non-singular `background' part g_b:

g(t, H; L) = g_s(t, H; L) + g_b(t; L)

where t = (T − Tc)/Tc. It follows from renormalisation [187, 17] or other arguments that g_s is given by

β g_s(t, H; L) = L^{-d} Y(a t L^{1/ν}, b H L^{Δ/ν})    (A.3)

where ν is the critical exponent describing the divergence of the correlation length (ν = 1 for the 2d Ising model), Δ is the gap exponent, and Y is a universal function that goes quickly to zero as its arguments increase. It is also found [17, chapter 1] that the dependence of g_b on L is much weaker than that of g_s, and can be neglected, from which it follows that at the critical point itself (t = H = 0) we have

g(L) ≈ g_b + β^{-1} L^{-2} U₀

where U₀ = Y(0, 0) is universal. The next term in this series is non-universal, but might be accessible by simulation methods that concentrate on finite-size scaling; the result [17, pp. 9-15] is in general:

g(L) ≈ g_b + β^{-1} [ L^{-2} U₀ + a₁ L^{-d-θ/ν} ]

where θ (non-universal) is the critical exponent belonging to the first (largest) `irrelevant' scaling field u, and a₁ = u ∂Y/∂u(0, 0) is the corresponding amplitude (also non-universal). In general θ need not be integral, but in the 2d Ising case it is: analysis in Ferdinand and Fisher [10, eqn 3.39] shows that

g(L) = g_b + β^{-1} L^{-2} U₀ + O(L^{-4} ln³ L)    (A.4)

So θ = 2. The amplitude U₀ is also known exactly for the 2d Ising universality class: it is given by [17, 10] U₀ = −ln(2^{1/4} + 2^{-1/2}) ≈ −0.6399. It is instructive to compare this with the amplitude of the L^{-2} correction for β > βc, which was simply −ln 2 ≈ −0.69315, with the 2 coming from the number of coexisting phases (see equation A.2 and the discussion thereof). However, 2^{1/4} + 2^{-1/2} ≈ 1.896 < 2, so in some sense the critical point is behaving like the coexistence point of slightly less than two phases. This is in a way physically reasonable, since at criticality large fluctuations continually take the system back and forth between configurations which are themselves similar to those in the truly two-phase region.

Other Boundary Conditions

Finally let us consider the case where we have fixed boundaries. Away from criticality and in the single-phase region (where there were previously only exponential corrections) we now find [17, 185, p20]

g(L) = g_b + L^{-1} g^{(s)} + L^{-2} g^{(e)} + … + L^{-d} g^{(c)} + O(e^{-L/ξ})    (A.5)

where g^{(s)} is the surface free energy due to the presence of (d−1)-dimensional interfaces in the system, g^{(e)} is due to (d−2)-dimensional edges, and so on.
At the critical point this type of scaling and that described by equation A.3 are superimposed [17, p21 et seq.]; the non-singular part of the free energy g_b has an expansion similar to that of equation A.5, and the singular part g_s has one similar to that of equation A.3, though some modification to this scaling form may be necessary [17, p23].

Appendix B

The Double-Tangent Construction

It was shown in section 1.1.3 (equation 1.16) that, as a consequence of the scaling of the probability distribution of the order parameter, which becomes increasingly sharply peaked about its maximum or maxima at M*(H) or V*(p) as system size increases, we can use the minimum of F_L(β, M) − HM or F_L(β, V) + pV to give an estimator of g with an error that disappears as 1/L^d. This suggests the following method for finding the coexistence value of the field in cases where the simulation is most easily performed with a constant order parameter. We shall thus describe the method for the off-lattice system, where this is more usually the case, and where p_coex is not determined by symmetry.

Suppose we have some method that can measure the absolute Helmholtz free energy F_L(β, V). This should be used to find F_L(β, V) for various V around V* of one phase, then the entire process should be repeated for some values of V characteristic of the other phase. From these measurements it is then possible to construct F_L(β, V) + pV for various values of p to find p_coex. This is most easily done by the double-tangent construction. We begin from ln P^can(β, p; V) = −β(F_L(β, V) + pV) + constant. Equation 1.16 shows that, to good accuracy, phase coexistence is found when ln P^can(β, p; V) has two maxima of equal heights, at V_A in phase A, say, and V_B in phase B. This means that

F_L(β, V_A) + p_coex V_A = F_L(β, V_B) + p_coex V_B

i.e.

F_L(β, V_A) = F_L(β, V_B) − p_coex (V_A − V_B)    (B.1)

which has the form y = y₀ + m(x − x₀). V_A and V_B themselves are found by solving

∂ ln P^can / ∂V = 0

i.e., at coexistence,

∂F_L/∂V |_{V_A} = ∂F_L/∂V |_{V_B} = −p_coex    (B.2)

Consideration of equations B.1 and B.2 together shows that p_coex is given by the negative of the gradient of the common tangent to the two branches of F_L(β, V). The points of tangency give V_A and V_B. The construction is shown schematically in figure B.1.

Figure B.1. A schematic illustration of the double-tangent construction for the off-lattice pV case to find p_coex = −(gradient of tangent). The solid lines represent the parts of F_L(β, V) that will typically be measured; the dotted line shows the part that is typically not measured.

We note that this method has the advantage that it is not necessary to map out the whole of F_L(β, V); it suffices to have those parts around the eventual minima in F_L(β, V) + pV, and though of course we will not be sure a priori of their exact location, we are likely to have quite a good idea. However, the method has the disadvantage that the absolute free energies of the two phases are required, since the relative positions of the two branches of F_L(β, V) must be known before the double-tangent construction can be applied, and it uses the equal-heights criterion for phase coexistence when it is now thought that equal-weights usually has smaller finite-size error [23, 182, 183].
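Numerically, the construction reduces to a one-dimensional root search: by equations B.1 and B.2, at p = p_coex the minima of F_L(β, V) + pV over the two measured branches are equal. A minimal Python sketch follows, assuming the two branches have been tabulated on grids of V; the function names and interface are ours.

    import numpy as np

    def coexistence_pressure(VA, FA, VB, FB, p_lo, p_hi, tol=1e-10):
        """Find p such that min_V[F_A(V) + p*V] = min_V[F_B(V) + p*V],
        which is the double-tangent condition of equations B.1 and B.2.
        (VA, FA) and (VB, FB) tabulate the two measured branches of
        F_L(beta, V); p_lo and p_hi must bracket the coexistence value."""
        def gap(p):
            return np.min(FA + p * VA) - np.min(FB + p * VB)
        # gap(p) is monotonic in p (its derivative is V_A* - V_B*, of
        # fixed sign), so simple bisection locates its zero
        g_lo = gap(p_lo)
        while p_hi - p_lo > tol:
            p_mid = 0.5 * (p_lo + p_hi)
            if gap(p_mid) * g_lo > 0.0:
                p_lo, g_lo = p_mid, gap(p_mid)
            else:
                p_hi = p_mid
        return 0.5 * (p_lo + p_hi)

    # The tangency volumes V_A and V_B are then the argmin positions of
    # F + p_coex * V on each branch.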
Appendix C

Statistical Errors and Correlation Times

Because the Monte Carlo method uses random numbers to generate a sample of the configurations of the simulated system, the estimates of thermodynamic quantities that it produces are associated with a statistical error. We shall now describe the behaviour of this error, taking the example of a Boltzmann-sampling simulation. Though the concepts introduced are general, some of the results are specific to this case (e.g. equation C.5). The analogues of equation C.5 for multicanonical sampling are extremely complex; see [116, 119].

Suppose we generate a set {σ_i}, i = 1…N_c, of configurations from a Markov chain whose stationary probability distribution is P(σ). We can define some operator O (internal energy, magnetisation...) on each one, giving the set {O_i}. The sample mean Ō is an estimator of the expectation value ⟨O⟩ = Σ_{σ} O(σ)P(σ), which, for a Boltzmann sampling algorithm, is the canonical average of O. Define the quantity (ΔŌ)² = (Ō − ⟨O⟩)²: √⟨(ΔŌ)²⟩ is a measure of the statistical error in the estimate Ō of ⟨O⟩; it is called the standard error of the mean (SEM). Averaging over all possible data sets of the N_c observations, we find

⟨(ΔŌ)²⟩ = ⟨Ō²⟩ − ⟨O⟩²
        = (1/N_c²) Σ_{i=1}^{N_c} Σ_{j=1}^{N_c} ⟨O_i O_j⟩ − ⟨O⟩²
        = (1/N_c²) [ N_c ⟨O²⟩ + 2 Σ_{i=1}^{N_c−1} Σ_{j=i+1}^{N_c} ⟨O_i O_j⟩ ] − ⟨O⟩²
        = (1/N_c)(⟨O²⟩ − ⟨O⟩²) [ 1 + 2 Σ_{i=2}^{N_c} (1 − (i−1)/N_c) (⟨O_1 O_i⟩ − ⟨O⟩²)/(⟨O²⟩ − ⟨O⟩²) ]    (C.1)

where we have used ⟨O_i²⟩ = ⟨O²⟩ and assumed that we have no preference for a particular starting state, so ⟨O_i O_j⟩ depends only on i − j. We shall now simplify the notation by writing var(O) = ⟨O²⟩ − ⟨O⟩² and

ρ_i = (⟨O_1 O_i⟩ − ⟨O⟩²)/(⟨O²⟩ − ⟨O⟩²)    (C.2)

To estimate var(O) in practice we use the sample variance s² = (1/N_c) Σ_{i=1}^{N_c} (O_i − Ō)². By expanding ⟨Ō²⟩ as in the derivation of equation C.1, it can be shown that ⟨s²⟩ = (N_c − 1) var(O)/N_c. If adjacent configurations are uncorrelated, all the ρ_i (i ≥ 2) are zero and equation C.1 reduces to

⟨(ΔŌ)²⟩ = var(O)/N_c    (C.3)

(see any statistics textbook, for example [188, chapter 14], for another derivation of this). However, because the Monte Carlo method generates a new configuration by making a small change to the existing one, adjacent configurations in the Markov chain are in practice always highly correlated; that is to say, the ρ_i remain appreciable until i is quite big. How fast the correlations decay depends on how well the system can explore its configuration space, which depends in turn both on the algorithm, which determines the matrix R of allowed transitions, and on the sampled distribution, which may make certain transitions very unlikely even if R_ij ≠ 0. In order to quantify the effect of correlations, we define the correlation time τ_O of the observable O by

τ_O = Σ_{i=2}^{∞} ρ_i    (C.4)

which will be different for different observables in a single simulation. τ_O is measured in units of the time for a single MC update. Now let us assume that N_c is big enough that N_c/τ_O ≫ 1 and that (1 − (i−1)/N_c) can be put equal to one for all terms where ρ_i is not negligibly small; if this is not true then it implies that the total sampling time is only of the order of τ_O, and the results will in that case be irredeemable by any amount of variance analysis (we are thinking here of configurations generated by a single Markov process; this statement is not necessarily true if many independent simulations are run together in parallel, as in section 4.3).
Putting all this into equation C.1 gives [53]:

⟨(ΔŌ)²⟩ ≈ (1/N_c) var(O) (1 + 2τ_O)    (C.5)

It is normally found that the ρ_i decay exponentially, so ρ_i ≈ exp(−(i−1)/τ_O). In that case a slightly better approximation than equation C.5 is [50]

⟨(ΔŌ)²⟩ ≈ (1/N_c) var(O) (1 + ρ)/(1 − ρ)

where ρ = exp(−1/τ_O). Other improved approximations, of an accuracy not normally required, are discussed in [189]. In any case, it is clear that if the correlation time τ_O is large it will dominate the error, and in fact it may be a waste of effort to record O(σ) for every configuration of the Markov chain. If we sample at regular intervals of k updates, then

⟨(ΔŌ)²⟩_k = (k/N_c) var(O) (1 + ρ^k)/(1 − ρ^k)    (C.6)

which stays within a few percent of its minimum value until k ≈ τ_O, and then increases linearly with k. This tells us that there is no advantage in collecting samples at intervals more frequent than τ_O. However, in practice doing so does little harm, since recording O(σ) and doing the analysis usually require negligible time compared with the generation of the configurations themselves.

There are two valid approaches to the estimation of ⟨(ΔŌ)²⟩ in practice. One is to measure the correlation functions ρ_i and then to estimate τ_O by summing them. It is found to be essential to cut off the summation at some point, for example when for the first time a negative term in the series is encountered, since the estimates of ρ_i at large i are very `noisy' and may seriously distort the answer [53, 50], [57, section 6.3]. We can then use equations C.5 or C.6 directly. Alternatively, we can simply try to measure the standard error of Ō directly. To do this we block the configurations into m = 1…N_b blocks (O(10) is enough) and estimate Ō on each block [45, chapter 6]. Then we measure the mean and variance of the blocked means {Ō_m}, and use the simple formula ⟨(ΔŌ)²⟩ = var(Ō_m)/N_b, since the blocks should be long enough for the block means to be uncorrelated (if they are not, then N_c is not large enough for good results anyway). A variant of this is to define estimators Ō_J^m on all the blocks of configurations except the mth, and then to find the mean and variance of these (see appendix D). It is the blocking approach that we have generally used to measure ⟨(ΔŌ)²⟩ in this thesis; however, we shall still consider τ_O on occasion, particularly in section 3.4, where we shall use the fact that it can be expressed in terms of correlation functions to make an analytic prediction of the variance of estimators from various sampled distributions.

We should note that, whatever algorithm we are working with, we must expect to have to update all the particles (or at least a constant fraction of them) to get uncorrelated configurations. This implies that the best we can do¹ is have τ_O ∼ L^d. If we are interested in calculating observables like ⟨E⟩ within a single phase, τ_O normally has a behaviour not dissimilar to the ideal, and accurate answers can be obtained without too much effort by simple algorithms like the single-spin-flip Metropolis, sampling the Boltzmann distribution. However, if we are trying to measure the free energies associated with phase transitions then τ_O can be very large indeed.

¹ Strictly it is the amount of computing power that goes like L^d. If we have a parallel computer then we may apparently do better, since for small L some processors may be unoccupied, and we can bring them in as we increase L, thus apparently keeping τ_O constant as L increases. Once all the processors are occupied the given scaling law applies.
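Both error-estimation routes described above are easy to implement. The following Python sketch is our own illustration: it computes the integrated correlation time with the first-negative-term cutoff, and the blocked estimate of the SEM.

    import numpy as np

    def correlation_time(O):
        """tau_O = sum of the normalised correlation functions rho_i
        (equation C.4), cutting off the sum at the first negative term,
        as recommended in the text."""
        O = np.asarray(O, dtype=float)
        dO = O - O.mean()
        var = dO @ dO / len(O)
        tau = 0.0
        for lag in range(1, len(O) // 2):
            rho = (dO[:-lag] @ dO[lag:]) / ((len(O) - lag) * var)
            if rho < 0.0:
                break
            tau += rho
        return tau

    def blocked_sem(O, n_blocks=10):
        """SEM by blocking: the variance of the N_b block means divided
        by N_b, as in the formula <(dO)^2> = var(O_m)/N_b above."""
        means = np.array([b.mean() for b in
                          np.array_split(np.asarray(O, dtype=float),
                                         n_blocks)])
        return np.sqrt(means.var(ddof=1) / n_blocks)

    # Equation C.5 can then be checked directly: blocked_sem(O)**2
    # should be close to var(O) * (1 + 2*tau) / len(O).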
Appendix D

Jackknife Estimators

The estimators that we produce in multicanonical simulations, like $\tilde{O}$ in equations 3.6 and 3.7, are ratio estimators, that is to say, they are ratios of sums, and they are in fact slightly biased: $\langle \tilde{O} \rangle_{xc} \neq \langle O \rangle_{can}$. It can be shown (see [190, p. 80]) that

$$\langle \tilde{O} \rangle_{xc} - \langle O \rangle_{can} = -\,\frac{\mathrm{cov}\big( \tilde{O},\ \sum_E C(E) \exp(-\beta E)[1/P^{xc}(E)] \big)}{\big\langle \sum_E C(E) \exp(-\beta E)[1/P^{xc}(E)] \big\rangle_{xc}}$$

This bias will not be zero unless $\tilde{O}$ and $\sum_E C(E) \exp(-\beta E)[1/P^{xc}(E)]$ are uncorrelated; typically it is of order $1/N_c$. The same is true of other biased estimators; for instance, the estimator of free energy we shall use below is the logarithm of a ratio estimator. It should be noted, however, that we expect the standard deviation of $\tilde{O}$ to go like $1/\sqrt{N_c}$, and we can usually safely regard the bias as negligible in comparison with this. However, to be sure that the bias is negligible we have in this chapter generally used double-jackknife bias-corrected estimators [191] for our estimates of canonical averages and their error bars.

A jackknife estimator is defined in the same way as a normal estimator but on a subset of the data. We divide up the Markov chain into $b$ sets of $N_b$ configurations, so that we have $b$ histograms, $C^j(E)$, $j = 1 \ldots b$, with $b N_b = N_c$. Then the $j$th jackknife estimator $\tilde{O}^J_j$ is defined like $\tilde{O}$ but on the pooled data from all the $b$ histograms except the $j$th. We can define the mean of these, $\tilde{O}^J_{AV} = \frac{1}{b} \sum_{j=1}^{b} \tilde{O}^J_j$, while the standard error of the mean (the error bar) is given by $s^2_J = \frac{b-1}{b} \sum_{j=1}^{b} (\tilde{O}^J_j - \tilde{O}^J_{AV})^2$. (Simple substitution of $O(E) = E$ shows that this reduces to equation 1.24, the normal expression for the standard error in the mean in the unbiased case.) These single-jackknife estimators provide an estimate of variance that is somewhat more robust (less affected by a small sample size) than the usual blocking. They can also be used to produce an estimator which has a reduced bias. We assume that the bias of $\tilde{O}$ is $c_1/(b N_b) + c_2/(b N_b)^2 + \cdots$; then the bias of each of the $\tilde{O}^J_j$ is $c_1/((b-1) N_b) + c_2/((b-1) N_b)^2 + \cdots$. As can be seen by substitution, the estimator $\tilde{O}^{JC} = b\tilde{O} - (b-1)\tilde{O}^J_{AV}$ is then unbiased to order $1/N_c$. However, we no longer have an estimate of the standard error of this new estimator. To obtain both we can extend the approach, defining double-jackknife estimators: $\tilde{O}^{JJ}_{jk}$ is defined on the data with both the $j$th and $k$th blocks of configurations omitted ($j \neq k$). Then

$$\tilde{O}^{JJC}_j = (b-1)\,\tilde{O}^J_j - \frac{b-2}{b-1} \sum_{k \neq j} \tilde{O}^{JJ}_{jk}$$

are a set of double-jackknife bias-corrected estimators, and we can calculate their mean and variance as for the $\tilde{O}^J_j$ above. For a fuller explanation of the use of jackknife estimators see [190], and for an account of their use in multicanonical simulations see [191].

Appendix E

Details of the Square-Well Solid Simulation

We wish to carry out a simulation of a 3d fcc square-well solid in a constant volume. The primitive unit cell of the fcc lattice consists of four particles arranged in a tetrahedron at (0, 0, 0), (0, 1/2, 1/2), (1/2, 0, 1/2) and (1/2, 1/2, 0), where the vectors are in units where the side of the cubic unit cell has unit length. For convenience we wish to simulate in a cubic volume and to apply periodic boundary conditions to remove the effects of surfaces, edges and corners (see appendix A).
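To make this geometry concrete, the following short sketch (ours, in Python, purely illustrative; the production code, described below, ran on the Connection Machine) tiles the four-particle basis into a cube of $n \times n \times n$ unit cells and reproduces the particle counts quoted below:

import numpy as np

def fcc_lattice(n):
    """Coordinates of the 4*n^3 particles of an n x n x n cube of fcc
    unit cells, in units where the cubic unit cell has unit side."""
    basis = np.array([[0.0, 0.0, 0.0],      # the four-particle basis
                      [0.0, 0.5, 0.5],
                      [0.5, 0.0, 0.5],
                      [0.5, 0.5, 0.0]])
    origins = np.array([(i, j, k) for i in range(n)
                                  for j in range(n)
                                  for k in range(n)], dtype=float)
    # Every basis vector shifted to every cell origin.
    return (origins[:, None, :] + basis[None, :, :]).reshape(-1, 3)

for n in (1, 2, 3, 4, 5):
    print(n, len(fcc_lattice(n)))           # 4, 32, 108, 256, 500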
Suitably-sized assemblies of particles then consist of $n^3$ unit cells arranged in a cube, with $n = 1, 2, 3, \ldots$; the first few such systems thus contain 4, 32, 108, 256, 500, ... particles. To make particle moves we shall use the usual Monte-Carlo procedure of generating trial random displacements of randomly-chosen particles and accepting or rejecting them using the Metropolis method.

As regards the choice of system size, there is clearly a trade-off between ease of simulation and the accuracy in principle achievable in a simulation with unlimited run-time. Because phase transitions are rounded off and shifted in a finite volume [17], it is ideally desirable to simulate large systems, to get closer to the infinite-volume limit; however, larger systems clearly demand more computer time, and the time required for equilibration and for sampling all accessible parts of the phase space quickly becomes excessive. This is particularly true of simulations like these, where we are interested in free energies: we have seen (section 3.3.2) that thermodynamic integration requires increasingly many simulation points around a phase transition, because of the rapid variation of the integrand, while in a multicanonical/expanded ensemble simulation the range of macrostates that must be covered to take in both phases is itself extensive. It turns out in fact that the multicanonical ensemble with a hard core potential is particularly demanding: see section 4.3.2. Therefore we have in practice chosen to work with quite small systems of 32-256 particles and, at least in section 4.3, to use finite-size scaling to extrapolate the results [17, 23, 183, 182].

This choice then presents some problems in relation to our chosen computer, the Connection Machine CM-200. The CM consists of 16k processors grouped into 512 processing nodes. Arrays are spread across this machine and corresponding elements are operated on in parallel. Thus the obvious way of mapping the square-well solid onto the CM would be geometric decomposition: break up the simulation volume and assign a region to each processor. Non-interacting particles may be updated in parallel¹, which in this case, where the interparticle potential is (extremely) short-ranged, would require information only from within each processor and, in some cases, from nearest-neighbour processors. However, if the array is too small, then some processors are assigned no data and are deactivated; clearly this is an inefficient use of the machine. The minimum size of array that uses all the processors depends on the geometry, and in our case turns out to have $16^3 = 4$k elements. Therefore, if we simulated a single system with geometric decomposition, it would have to contain at least 4k particles, which is much too large to be dealt with easily. We might, then, think instead of using just primitive parallelism, where we would simulate 4k independent replicas of a single smaller simulation, with each simulation being completely local to a processing node and all updates within each simulation being serial. This strategy, which has the additional advantage of eliminating interprocessor communication in particle moves, is in fact the way that the simulations of 32 particles have been carried out. However, the machine does not have enough memory to treat systems of more than 108 particles this way. To deal with them, we need a mixture of primitive parallelism and geometric decomposition.

¹ We emphasise that we are here considering only updating the positions of the particles, and in no simulation performed here does this result in a change of the preweighted variable. As we saw in section 3.2.5, MC updates in which the preweighted variable may change introduce an effective coupling between the particles and prevent parallel updating.
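Before describing the parallel layout, it may help to sketch the elementary move itself. The following fragment (ours, in Python, serial and purely illustrative rather than the CM code; $\sigma$ denotes the hard-core diameter, $\delta$ the relative well width and $\beta$ the inverse temperature) shows a single Metropolis trial displacement for the square-well potential:

import numpy as np

def pair_energy(r2, sigma=1.0, delta=0.01, eps=1.0):
    """Square-well pair energy from a squared separation r2: infinite
    inside the hard core, -eps inside the narrow attractive well."""
    if r2 < sigma ** 2:
        return np.inf                       # hard-core overlap
    if r2 < (sigma * (1.0 + delta)) ** 2:
        return -eps                         # inside the well
    return 0.0

def try_move(pos, i, beta, dx, box, rng):
    """Metropolis trial move of particle i: a random displacement
    within a sphere of radius dx, with periodic boundaries."""
    while True:                             # random point in a unit sphere
        u = rng.uniform(-1.0, 1.0, 3)
        if np.dot(u, u) <= 1.0:
            break
    new = (pos[i] + dx * u) % box

    def energy(p):
        e = 0.0
        for j in range(len(pos)):
            if j != i:
                d = pos[j] - p
                d -= box * np.round(d / box)    # minimum image
                e += pair_energy(np.dot(d, d))
        return e

    dE = energy(new) - energy(pos[i])
    if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
        pos[i] = new                        # accept
        return True
    return False                            # reject

In the production runs the displacement radius would be tuned, as described below, so that about half of such trial moves are accepted.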
The way we have implemented this mixture of primitive parallelism and geometric decomposition in practice is shown in figure E.1. The large cube shows the way that the parallel dimensions of the array that holds the particles' coordinates are laid out; it can be thought of as showing the layout of parallel `virtual processors' in a 3d grid. The small shaded cubes show the array elements that belong to a single simulation; the particles within them exist in a single physically continuous volume, even though they are separated in the coordinate array.

Figure E.1. The layout of a single simulation volume within a $16 \times 16 \times 16$ parallel array holding the particles' positions. Left: each simulation divided into eight subvolumes, 512 simulations run in parallel. Right: each simulation divided into 64 subvolumes, 64 simulations run in parallel.

In the diagram on the left each simulation is divided into $2^3 = 8$ `subvolumes' and distributed over eight `virtual processors.' The total number of simulations run in parallel is $(16/2)^3 = 512$. With eight unit cells (32 particles) per subvolume, we would then have 256 particles per simulation in total. Similarly, in the diagram on the right each simulation is divided into 64 subvolumes, and 64 of them are run in parallel. With 32 particles per subvolume this would imply 2048 particles in total, which is excessive for our purposes, but it nevertheless illustrates how the system may be scaled. The numbers of unit cells per subvolume and of subvolumes per simulation (either of which may of course be equal to one) and the total size of the coordinate array are controlled by parameters at compile time, and changing them does not require any rewriting of code. The relevant parameters are:

NPR3, the edge length of the array that holds the coordinates (so the diagrams in figure E.1 both have NPR3=16);

NEDGE, the number of subvolumes along each edge of each simulation (so the diagrams in figure E.1 have NEDGE=2 (left) and NEDGE=4 (right));

LPV, the number of unit cells along the edge of a subvolume.

Related quantities are:

$N$, the number of particles per simulation, given by N = 4(NEDGE*LPV)^3;

$N_s$, the number of independent replica simulations run in parallel, given by NSIM = (NPR3/NEDGE)^3.

The reason that the subvolumes of each simulation are split up across the array is that by doing this each subvolume can access the coordinates of particles in neighbouring subvolumes in a single periodic shift operation, in which all data move the same distance. For example, single shifts of NPR3/NEDGE are used to access the coordinates of particles in subvolumes that share faces; repeated shifts at right-angles are required to get at neighbours that share edges or corners. Particles in subvolumes that are not nearest neighbours are too far apart to interact before or after a particle move, so we do not need to check them. If the subvolumes were grouped together, slower messages would have to be sent using the general communications router, because the periodic boundary conditions of each simulation would not match those of the array as a whole.
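Since $N$ and $N_s$ are fixed by this compile-time arithmetic, a few illustrative lines of Python (ours; the thesis code itself set these quantities through compile-time parameters) serve as a check of the two layouts of figure E.1:

def layout(npr3, nedge, lpv):
    """Particles per simulation and replicas run in parallel, for a
    coordinate-array edge NPR3, NEDGE subvolumes per simulation edge
    and LPV unit cells per subvolume edge (cf. figure E.1)."""
    n_particles = 4 * (nedge * lpv) ** 3    # N = 4(NEDGE*LPV)^3
    n_sims = (npr3 // nedge) ** 3           # NSIM = (NPR3/NEDGE)^3
    return n_particles, n_sims

print(layout(16, 2, 2))    # left diagram:  (256, 512)
print(layout(16, 4, 2))    # right diagram: (2048, 64)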
Having described how each simulation is (or can be) split up into subvolumes, it remains to describe how the particles are treated within each subvolume. All particles within a subvolume are local to a processing node and are indexed using extra dimensions, declared :serial, in the coordinate array. The normal way to treat this problem is to keep a neighbour list [45], which, for each particle, records all the other particles that it may interact with. However, referencing and updating the neighbour lists, which are in general different for each particle, requires indirect addressing (indices on indices), and this generates slow communication code on the CM even when the particles are within the same processing node. For this reason we did not use neighbour lists. In fact, because we are dealing only with solids, each particle always stays near its lattice site and so would have had the same, unchanging neighbour list of its twelve nearest neighbours. Given this, indirect addressing could have been avoided; however, we opted at the design stage for a method that could be applied to fluids with little modification, since it was then our intention to investigate them as well. This led us to the following method: each simulation subvolume is further subdivided into eight octants, and these are cycled through in a fixed order, with a particle in each being picked at random for a trial displacement (the displacement is chosen at random within a small sphere of radius $\Delta x$; $\Delta x$ itself is chosen to give an acceptance ratio of particle moves of 1/2). Provided that the subvolume is big enough, this ensures that particles in corresponding octants of different subvolumes of the same simulation cannot interact before or after they are moved, and so can be updated in parallel; we require, for the general case where the particles need not be on their lattice sites, that the side length $L$ of the subvolume should satisfy

$$L/2 \geq \sigma(1 + \delta) + 2\Delta x$$

where $\sigma$ is the hard-core diameter and $\delta$ the relative width of the attractive well. For the square-well solid, each particle move requires the calculation of a minimum of 24 interactions (12 with the nearest neighbours before the move and 12 with them after it). However, we do not in general know which particles the nearest neighbours are, because there are no neighbour lists, and so we must test all possible candidates. This can cause substantial inefficiency; for example, with NEDGE=1 and LPV=2 ($N = 32$ and primitive parallelism only) we must test all the other particles, that is to say, we must calculate $2 \times 31$ interactions for each particle move. For NEDGE=1 and LPV=3 ($N = 108$) and NEDGE=2 and LPV=2 ($N = 256$) this rises to $2 \times 107$, and in the second case $2 \times 76$ of the interactions require interprocessor communication. The best performance for the solid is in fact obtained by using LPV=1, with NPR3 increased to keep NSIM constant. Each `virtual processor' then contains only four particles (so half the octants can be skipped over), and the process of checking within the processor volume and within neighbouring octants finds just the twelve nearest neighbours, as required, and no others. Choosing LPV=1 at solid densities violates the general condition on $L$ given above but, since the forces are short-ranged and the lattice prevents large particle movements, it is still the case that interacting particles are never updated simultaneously. Because we no longer waste time calculating interactions that are always zero, this procedure is slightly faster (i.e.
does a slightly greater total number of particle updates per second) than pure primitive parallelism for $N = 32$, even though interprocessor communication is now involved. For $N = 108$ it is about five times faster than primitive parallelism alone.

Bibliography

[1] C. Truesdell, The Tragicomical History of Thermodynamics 1822-1854, Springer-Verlag, Berlin (1980).
[2] A. B. Pippard, Elements of Classical Thermodynamics, Cambridge University Press, Cambridge (1957).
[3] H. B. Callen, Thermodynamics and an Introduction to Thermostatistics, John Wiley & Sons, New York (1985).
[4] K. Huang, Statistical Mechanics, John Wiley & Sons, New York (1963).
[5] R. P. Feynman, Statistical Mechanics: A Set of Lectures, W. A. Benjamin Inc., Reading, MA (1972).
[6] D. Chandler, An Introduction to Modern Statistical Mechanics, Oxford University Press, Oxford (1987).
[7] J. R. Waldram, The Theory of Thermodynamics, Cambridge University Press, Cambridge (1985).
[8] S.-K. Ma, Statistical Mechanics, World Scientific, Singapore (1985).
[9] Phase Transitions and Critical Phenomena, ed. C. Domb & M. S. Green, Academic Press, London (1975).
[10] A. E. Ferdinand & M. E. Fisher, Phys. Rev. 185, 833 (1969); B. Kaufman, Phys. Rev. 76, 1232 (1949).
[11] L. Onsager, Phys. Rev. 65, 117 (1944).
[12] B. M. McCoy & T. T. Wu, The Two-Dimensional Ising Model, Harvard University Press, Cambridge, MA (1973).
[13] R. Baierlein, Atoms and Information Theory, W. H. Freeman & Co. (1971).
[14] E. T. Jaynes, in Maximum Entropy and Bayesian Methods, ed. P. F. Fougere, Kluwer Academic Publishers, Dordrecht (1992).
[15] J. L. Lebowitz & E. H. Lieb, Phys. Rev. Lett. 22, 631 (1969).
[16] M. Plischke & B. Bergersen, Equilibrium Statistical Physics, Prentice Hall, New Jersey (1989).
[17] Finite Size Scaling and Numerical Simulation of Statistical Systems, ed. V. Privman, World Scientific Publishing, Singapore (1990).
[18] D. P. Woodruff, The Solid-Liquid Interface, Cambridge University Press, London (1973).
[19] Jooyoung Lee, M. A. Novotny & P. A. Rikvold, Phys. Rev. E 52, 356 (1995).
[20] K. Binder & D. P. Landau, Phys. Rev. B 30, 1477 (1984).
[21] Murty S. S. Challa, D. P. Landau & K. Binder, Phys. Rev. B 34, 1841 (1986).
[22] J. Lee & J. M. Kosterlitz, Phys. Rev. Lett. 65, 137 (1990).
[23] C. Borgs & R. Kotecky, Phys. Rev. Lett. 68, 1734 (1992).
[24] J. S. van Duijneveldt & D. Frenkel, J. Chem. Phys. 96, 4655 (1992).
[25] E. Buffenoir & S. Wallon, J. Phys. A 26, 3045 (1993).
[26] P. Martin, Potts Models and Related Problems in Statistical Mechanics, World Scientific Publishing Co., Singapore (1991).
[27] R. J. Baxter, Exactly Solved Models in Statistical Mechanics, Academic Press, London (1982).
[28] J. A. Barker & D. Henderson, Rev. Mod. Phys. 48, 587 (1976).
[29] A. J. Guttmann & I. G. Enting, J. Phys. A 21, L165 (1988).
[30] M. E. Fisher, Rev. Mod. Phys. 46, 597 (1974).
[31] D. J. Amit, Field Theory, the Renormalisation Group and Critical Phenomena, McGraw-Hill, New York (1978).
[32] G. S. Pawley, R. H. Swendsen, D. J. Wallace & K. G. Wilson, Phys. Rev. B 29, 4030 (1984).
[33] J. P. Hansen & I. R. McDonald, Theory of Simple Liquids (2nd edition), Academic Press, London (1986).
[34] J. K. Percus & G. J. Yevick, Phys. Rev. 110, 1 (1958).
[35] J. E. Mayer & M. G. Mayer, Statistical Mechanics, McGraw-Hill, New York (1940).
[36] M. Parrinello & A. Rahman, Phys. Rev. Lett. 45, 1196 (1980).
[37] J. J. Erpenbeck & W. W. Wood, in Statistical Mechanics Vol. 6b, ed. B. J. Berne, Plenum Press, New York (1977).
[38] J. Kushick & B. J.
Berne, in Statistical Mechanics Vol. 6b, ed. B. J. Berne, Plenum Press, New York (1977).
[39] S. Duane, A. D. Kennedy, B. J. Pendleton & D. Roweth, Phys. Lett. B 195, 216 (1987).
[40] B. Mehlig, D. W. Heermann & B. M. Forrest, Phys. Rev. B 45, 679 (1992).
[41] S. Nose, J. Phys.: Cond. Mat. 2, SA115 (1990).
[42] The Monte Carlo Method in Condensed Matter Physics, ed. K. Binder, Springer-Verlag, Berlin (1992).
[43] K. Binder & D. W. Heermann, Monte Carlo Simulation in Statistical Physics: An Introduction, Springer-Verlag, Berlin (1986).
[44] O. G. Mouritsen, Computer Studies of Phase Transitions and Critical Phenomena, Springer-Verlag, Berlin (1985).
[45] M. P. Allen & D. J. Tildesley, Computer Simulation of Liquids, Clarendon Press, Oxford (1987).
[46] K. Binder, J. Comp. Phys. 59, 1 (1985).
[47] D. Frenkel, Free Energy Computation and First-Order Phase Transitions, in Molecular-Dynamics Simulation of Statistical-Mechanical Systems, ed. G. Ciccotti & W. G. Hoover, North-Holland, Amsterdam (1986).
[48] D. Frenkel, Monte Carlo Simulations, in Computer Modelling of Fluids, Polymers and Solids, ed. C. R. A. Catlow, C. S. Parker & M. P. Allen, Kluwer Academic Publishers, Dordrecht (1990).
[49] W. H. Press, B. P. Flannery, S. A. Teukolsky & W. T. Vetterling, Numerical Recipes, Cambridge University Press, Cambridge (1989).
[50] C. J. Geyer, in Computer Science and Statistics: Proceedings of the 23rd Symposium on the Interface, 156 (1991).
[51] C. J. Geyer & E. A. Thompson, J. R. Statist. Soc. B 54, 657 (1992).
[52] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller & E. Teller, J. Chem. Phys. 21, 1087 (1953).
[53] H. Muller-Krumbhaar & K. Binder, J. Stat. Phys. 8, 1 (1973).
[54] K. S. Shing & K. E. Gubbins, Mol. Phys. 49, 129 (1982).
[55] R. H. Swendsen & J.-S. Wang, Phys. Rev. Lett. 58, 86 (1987).
[56] U. Wolff, Phys. Rev. Lett. 62, 361 (1989).
[57] R. M. Neal, Probabilistic Inference Using Markov Chain Monte-Carlo Methods, Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto (1993).
[58] B. A. Berg & T. Neuhaus, Phys. Lett. B 267, 249 (1991); Phys. Rev. Lett. 68, 9 (1992).
[59] A. D. Kennedy, review article, Nucl. Phys. B S30, 96 (1993).
[60] A. P. Lyubartsev, A. A. Martsinovski, S. V. Shevkunov & P. N. Vorontsov-Velyaminov, J. Chem. Phys. 96, 1776 (1992).
[61] M. Abramowitz & I. A. Stegun, Handbook of Mathematical Functions, Dover, New York (1970).
[62] N. F. Carnahan & K. E. Starling, J. Chem. Phys. 51, 635 (1969).
[63] B. L. Holian, G. K. Straub, R. E. Swanson & D. C. Wallace, Phys. Rev. B 27, 2783 (1983).
[64] G. N. Patey & J. P. Valleau, Chem. Phys. Lett. 21, 297 (1973).
[65] G. N. Patey, G. M. Torrie & J. P. Valleau, J. Chem. Phys. 71, 96 (1979).
[66] W. G. Hoover, M. Ross, D. Henderson, J. A. Barker & B. C. Brown, J. Chem. Phys. 52, 4931 (1970).
[67] W. G. Hoover, S. G. Gray & K. W. Johnson, J. Chem. Phys. 55, 1128 (1971).
[68] R. E. Swanson, G. K. Straub, B. L. Holian & D. C. Wallace, Phys. Rev. B 27, 2783 (1983).
[69] P. Bolhuis & D. Frenkel, Phys. Rev. Lett. 72, 2211 (1994).
[70] M. Hagen, E. J. Meijer, G. C. A. M. Mooij, D. Frenkel & H. N. W. Lekkerkerker, Nature 365, 425 (1993).
[71] D. Frenkel & A. J. C. Ladd, J. Chem. Phys. 81, 3188 (1984).
[72] W. G. Hoover & F. H. Ree, J. Chem. Phys. 47, 4873 (1967).
[73] W. G. Hoover & F. H. Ree, J. Chem. Phys. 49, 3609 (1968).
[74] J. P. Hansen & L. Verlet, Phys. Rev. 184, 151 (1969).
[75] K. K. Mon, Phys. Rev. B 39, 467 (1989).
[76] T. P. Straatsma, H. J. C. Berendsen & J. P.
M. Postma, J. Chem. Phys. 85, 6720 (1986).
[77] D. A. Kofke, Mol. Phys. 78, 1331 (1993).
[78] McDonald & Singer, J. Chem. Phys. 47, 4766 (1967).
[79] McDonald & Singer, J. Chem. Phys. 50, 2303 (1969).
[80] J. P. Valleau & D. N. Card, J. Chem. Phys. 57, 5457 (1972).
[81] Z. Li & H. A. Scheraga, J. Phys. Chem. 92, 2633 (1988).
[82] Z. Li & H. A. Scheraga, Chem. Phys. Lett. 154, 516 (1989).
[83] B. S. Whatson & K.-W. Chao, J. Chem. Phys. 96, 9046 (1992).
[84] C. H. Bennett, J. Comp. Phys. 22, 245 (1976).
[85] S. D. Hong, B. J. Yoon & M. S. Jhon, Chem. Phys. Lett. 188, 299 (1992).
[86] K. K. Mon, Phys. Rev. Lett. 54, 2671 (1985).
[87] A. M. Ferrenberg & R. H. Swendsen, Phys. Rev. Lett. 61, 2635 (1988); Erratum: ibid. 63, 1658 (1989).
[88] B. Widom, J. Chem. Phys. 39, 2808 (1963).
[89] K. S. Shing & K. E. Gubbins, Mol. Phys. 46, 1109 (1982); K. S. Shing & K. E. Gubbins, Mol. Phys. 49, 1121 (1983).
[90] S. K. Kumar, J. Chem. Phys. 97, 3551 (1992).
[91] A. M. Ferrenberg & R. H. Swendsen, Phys. Rev. Lett. 63, 1195 (1989).
[92] A. M. Ferrenberg, in Computer Simulation Studies in Condensed Matter Physics III, ed. D. P. Landau, K. K. Mon & H.-B. Schuttler, Springer-Verlag, Berlin, Heidelberg (1991).
[93] J. M. Rickman & S. R. Philpot, Phys. Rev. Lett. 66, 349 (1991).
[94] E. P. Munger & M. A. Novotny, Phys. Rev. B 43, 5773 (1991).
[95] G. M. Torrie & J. P. Valleau, Chem. Phys. Lett. 28, 578 (1974).
[96] L. D. Fosdick, Methods Comput. Phys. 1, 245 (1963).
[97] B. Hesselbo & R. B. Stinchcombe, Phys. Rev. Lett. 74, 2151 (1995).
[98] M. Mezei, J. Comp. Phys. 68, 237 (1987).
[99] B. A. Berg, Int. J. Mod. Phys. C 4, 249 (1993).
[100] W. Janke, in Computer Simulations in Condensed Matter Physics VII, ed. D. P. Landau, K. K. Mon & H.-B. Schuttler, Springer-Verlag, Berlin (1994).
[101] B. A. Berg, U. H. E. Hansmann & T. Neuhaus, Phys. Rev. B 47, 497 (1993) (Brief Reports).
[102] B. A. Berg, U. H. E. Hansmann & T. Neuhaus, Z. Phys. B 90, 229 (1993).
[103] A. Billoire, T. Neuhaus & B. A. Berg, Nucl. Phys. B 413, 795 (1994).
[104] W. Janke, B. A. Berg & M. Katoot, Nucl. Phys. B 382, 649 (1992).
[105] W. Beirl, B. A. Berg, B. Krishnan, H. Markum & J. Reidler, Nucl. Phys. B S42, 707 (1995); B. A. Berg & B. Krishnan, Phys. Lett. B 318, 59 (1993).
[106] B. Grossman, M. L. Laursen, T. Trappenberg & U. J. Wiese, Phys. Lett. B 293, 175 (1992).
[107] B. Grossman & M. L. Laursen, Nucl. Phys. B 408, 637 (1993).
[108] B. A. Berg & T. Celik, Phys. Rev. Lett. 69, 2292 (1992); Int. J. Mod. Phys. C 3, 1251 (1992).
[109] B. A. Berg, T. Celik & U. H. E. Hansmann, Europhys. Lett. 22, 63 (1993).
[110] B. A. Berg, U. H. E. Hansmann & T. Celik, Nucl. Phys. B S42, 905 (1995).
[111] T. Celik, U. H. E. Hansmann & M. Katoot, J. Stat. Phys. 73, 775 (1993).
[112] B. A. Berg, Nature 361, 708 (1993).
[113] U. H. E. Hansmann & Y. Okamoto, J. Comp. Chem. 14, 1333 (1993).
[114] B. A. Berg, U. H. E. Hansmann & Y. Okamoto, J. Phys. Chem. 99, 2236 (1995); U. H. E. Hansmann & Y. Okamoto, preprint ETH-IPS-95-06 NWU-1/95, March 1995.
[115] K. Rummukainen, Nucl. Phys. B 390, 621 (1993).
[116] W. Janke & T. Sauer, Phys. Rev. E 49, 3475 (1994); Nucl. Phys. B S34, 771 (1994).
[117] W. Janke & T. Sauer, J. Stat. Phys. 78, 759 (1995).
[118] W. Janke & S. Kappler, Nucl. Phys. B S42, 876 (1995); Phys. Rev. Lett. 74, 212 (1995).
[119] A. M. Ferrenberg, D. P. Landau & R. H. Swendsen, Phys. Rev. E 51, 5092 (1995).
[120] Jooyoung Lee, Phys. Rev. Lett. 71, 211 (1993); Erratum: ibid. 71, 2352 (1993).
[121] N. B. Wilding, Phys. Rev.
E 52, 602 (1995).
[122] B. A. Berg, preprint, available as paper 9503019 from the archive at http://xxx.lanl.gov/hep-lat.
[123] A. P. Lyubartsev, A. Laaksonen & P. N. Vorontsov-Velyaminov, Mol. Phys. 82, 455 (1994).
[124] E. Marinari & G. Parisi, Europhys. Lett. 19, 451 (1992).
[125] L. A. Fernandez, E. Marinari & J. J. Ruiz-Lorenzo, unpublished.
[126] G. Iori, E. Marinari & G. Parisi, Europhys. Lett. 25, 491 (1994); Int. J. Mod. Phys. C 4, 1333 (1993).
[127] A. Irback & F. Potthast, preprint LU TP 95-10.
[128] D. Bouzida, S. K. Kumar & R. H. Swendsen, Phys. Rev. A 45, 8894 (1992).
[129] E. Marinari, G. Parisi, J. Ruiz-Lorenzo & F. Ritort, preprint, available as paper 9508036 from the archive at http://babbage.sissa.it/cond-mat, submitted to Phys. Rev. Lett.
[130] N. B. Wilding, M. Muller & K. Binder, J. Chem. Phys. 101, 4324 (1994).
[131] G. C. A. M. Mooij, D. Frenkel & B. Smit, J. Phys.: Cond. Mat. 4, L255 (1992).
[132] W. Kerler & P. Rehberg, Phys. Rev. E 50, 4220 (1994).
[133] W. Kerler, C. Rebbi & A. Weber, Nucl. Phys. B S42, 678 (1995); same authors, preprint BUHEP-95-10, available as paper 9503021 from the archive at http://xxx.lanl.gov/hep-lat.
[134] J. P. Valleau, J. Comp. Phys. 96, 193 (1991).
[135] J. P. Valleau, J. Chem. Phys. 95, 584 (1991).
[136] R. W. Gerling & A. Huller, Z. Phys. B 90, 207 (1993).
[137] M. Promberger & A. Huller, Z. Phys. B 97, 341 (1995).
[138] G. E. Norman & V. S. Filinov, High Temp. Res. USSR 7, 216 (1969).
[139] D. J. Adams, Mol. Phys. 29, 307 (1975).
[140] N. B. Wilding & A. D. Bruce, J. Phys.: Cond. Mat. 4, 3087 (1992).
[141] A. Z. Panagiotopoulos, Mol. Phys. 61, 813 (1987).
[142] A. Z. Panagiotopoulos, N. Quirke, M. Stapleton & D. J. Tildesley, Mol. Phys. 63, 527 (1988).
[143] B. Smit, P. De Smedt & D. Frenkel, Mol. Phys. 68, 931 (1989).
[144] B. Smit, in Computer Simulation in Chemical Physics, ed. M. P. Allen & D. J. Tildesley, Kluwer Academic Publishers, Dordrecht (1992).
[145] A. Z. Panagiotopoulos, Mol. Simulation 9, 1 (1992).
[146] S.-K. Ma, J. Stat. Phys. 26, 221 (1981).
[147] H. M. Huang, S.-K. Ma & Y. M. Shih, Solid State Communs. 51, 147 (1984).
[148] H. Meirovich, J. Phys. A 16, 831 (1983).
[149] A. G. Schlijper, A. R. D. Van Bergen & B. Smit, Phys. Rev. A 41, 1175 (1990).
[150] R. Kikuchi, Phys. Rev. 81, 988 (1951), reprinted in Phase Transitions and Critical Phenomena Vol. 2, ed. C. Domb & M. S. Green, Academic Press, London (1975).
[151] K. Binder, Z. Phys. B 45, 61 (1981).
[152] J. M. Rickman & S. R. Philpot, J. Chem. Phys. 95, 7562 (1991).
[153] J. M. Rickman & D. J. Srolovitz, J. Chem. Phys. 99, 7993 (1993).
[154] G. Bhanot, S. Black, P. Carter & R. Salvador, Phys. Lett. B 183, 331 (1987); G. Bhanot, R. Salvador, S. Black, P. Carter & R. Toral, Phys. Rev. Lett. 59, 803 (1987).
[155] K. M. Bitar, Nucl. Phys. B 300, 61 (1988).
[156] A. B. Bortz, M. H. Kalos & J. L. Lebowitz, J. Comp. Phys. 17, 10 (1975).
[157] M. A. Novotny, Phys. Rev. Lett. 74, 1 (1995); Erratum: ibid. 75, 1424 (1995).
[158] G. L. Bretthorst, in Maximum Entropy and Bayesian Methods, ed. P. F. Fougere, Kluwer Academic Publishers, Dordrecht (1992).
[159] T. J. Loredo, in Maximum Entropy and Bayesian Methods, ed. P. F. Fougere, Kluwer Academic Publishers, Dordrecht (1992).
[160] T. Bayes, reprinted in Biometrika 45, 293 (1958).
[161] H. Jeffreys, The Theory of Probability, Clarendon Press, Oxford (1939); later editions 1948, 1961.
[162] E. S. Ristad, preprint CS-TR-495-95, available as paper 9508012 from the archive at http://xxx.lanl.gov/cmp-lg.
[163] J. J. Martin, Bayesian Decision Problems and Markov Chains, Wiley, New York (1967).
[164] R. A. Howard, Dynamic Probabilistic Systems Vol. 1: Markov Models, Wiley, New York (1971).
[165] M. Krajci & J. Hafner, Phys. Rev. Lett. 74, 5100 (1995).
[166] D. Nicolaides & A. D. Bruce, J. Phys. A 21, 233 (1988).
[167] A. D. Bruce, submitted to J. Phys. E (1995).
[168] R. Hilfer, Z. Phys. B 96, 63 (1994).
[169] A. Aharony & M. E. Fisher, Phys. Rev. B 27, 4394 (1983).
[170] E. W. Montroll, in Proc. Symp. Applied Maths. 16, 193 (1964).
[171] P. Bolhuis, M. Hagen & D. Frenkel, Phys. Rev. E 50, 4880 (1994).
[172] C. F. Tejero, A. Daanoun, H. N. W. Lekkerkerker & M. Baus, Phys. Rev. Lett. 73, 752 (1994).
[173] A. Daanoun, C. F. Tejero & M. Baus, Phys. Rev. E 50, 2913 (1994).
[174] C. F. Tejero, A. Daanoun, H. N. W. Lekkerkerker & M. Baus, Phys. Rev. E 51, 558 (1995).
[175] P. N. Pusey, in Les Houches, Session LI, 1989: Liquides, Cristallisation et Transition Vitreuse / Liquids, Freezing and Glass Transition, ed. J. P. Hansen, D. Levesque & J. Zinn-Justin, Elsevier Science Publishers B.V. (1992).
[176] S. Asakura & F. Oosawa, J. Polymer Sci. 33, 183 (1958).
[177] D. A. Young, Phase Diagrams of the Elements, University of California Press (1991).
[178] P. R. Sperry, J. Coll. Interface Sci. 99, 97 (1984).
[179] H. N. W. Lekkerkerker, W. C.-K. Poon, P. N. Pusey, A. Stroobants & P. B. Warren, Europhys. Lett. 20, 559 (1992).
[180] M. Hagen & D. Frenkel, J. Chem. Phys. 101, 4093 (1994).
[181] R. Hall, J. Chem. Phys. 57, 2252 (1972).
[182] C. Borgs & R. Kotecky, J. Stat. Phys. 61, 79 (1990).
[183] C. Borgs & W. Janke, Phys. Rev. Lett. 68, 1738 (1992).
[184] A. R. Ubbelohde, The Molten State of Matter, Wiley, New York (1978).
[185] M. N. Barber, in Phase Transitions and Critical Phenomena, Vol. 8, p. 145, ed. C. Domb & J. L. Lebowitz, Academic Press, New York (1983), and references therein.
[186] V. Privman & J. Rudnick, J. Stat. Phys. 60, 551 (1990).
[187] V. Privman & M. E. Fisher, Phys. Rev. B 30, 322 (1984).
[188] H. Cramer, The Elements of Probability Theory, Wiley, New York (1955).
[189] M. Kikuchi, N. Ito & Y. Okabe, in Computer Simulations in Condensed Matter Physics VII, ed. D. P. Landau, K. K. Mon & H.-B. Schuttler, Springer-Verlag, Berlin (1994).
[190] H. L. Gray & W. R. Schucany, The Generalized Jackknife Statistic, M. Dekker, New York (1972).
[191] B. A. Berg, Comp. Phys. Communs. 69, 7 (1992).