The Measurement of Free Energy by
Monte-Carlo Computer Simulation
Graham R. Smith
A thesis submitted in fulfilment of the requirements
for the degree of Doctor of Philosophy
to the
University of Edinburgh
1996
Abstract
One of the most important problems in statistical mechanics is the measurement of free energies, these being the quantities that determine the direction of chemical reactions and (the concern of this thesis) the location of phase transitions. While Monte Carlo (MC) computer simulation is a well-established and invaluable aid in statistical mechanical calculations, it is well known that, in its most commonly-practised form (where samples are generated from the Boltzmann distribution), it fails if applied directly to the free energy problem. This failure occurs because the measurement of free energies requires a much more extensive exploration of the system's configuration space than do most statistical mechanical calculations: configurations which have a very low Boltzmann probability make a substantial contribution to the free energy, and the important regions of configuration space may be separated by potential barriers.
We begin the thesis with an introduction, and then give a review of the very substantial
literature that the problem of the MC measurement of free energy has produced, explaining
and classifying the various different approaches that have been adopted. We then proceed to
present the results of our own investigations.
First, we investigate methods in which the configurations of the system are sampled from a distribution other than the Boltzmann distribution, concentrating in particular on a recently-developed technique known as the multicanonical ensemble. The principal difficulty in using the multicanonical ensemble is the difficulty of constructing it: implicit in it is at least partial
knowledge of the very free energy that we are trying to measure, and so to produce it requires an
iterative process. Therefore we study this iterative process, using Bayesian inference to extend
the usual method of MC data analysis, and introducing a new MC method in which inferences
are made based not on the macrostates visited by the simulation but on the transitions made
between them. We present a detailed comparison between the multicanonical ensemble and
the traditional method of free energy measurement, thermodynamic integration, and use the
former to make a high-accuracy investigation of the critical magnetisation distribution of the
2d Ising model from the scaling region all the way to saturation. We also make some comments
on the possibility of going beyond the multicanonical ensemble to `optimal' MC sampling.
Second, we investigate an isostructural solid-solid phase transition in a system consisting of hard spheres with a square-well attractive potential. Recent work, which we have confirmed, suggests that this transition exists when the range of the attraction is very small (width of attractive potential / hard-core diameter of about 0.01). First we study this system using a method of free energy measurement in which the square-well potential is smoothly transformed into that of the Einstein solid. This enables a direct comparison of a multicanonical-like method with thermodynamic integration. Then we perform extensive simulations using a different, purely multicanonical approach, which enables the direct connection of the two coexisting phases. It is found that the measurement of transition probabilities is again advantageous for the generation of the multicanonical ensemble, and can even be used to produce the final estimators.
Some of the work presented in this thesis has been published or accepted for publication:
the references are
G. R. Smith & A. D. Bruce, A Study of the Multicanonical Monte Carlo Method, J. Phys. A 28, 6623 (1995).
G. R. Smith & A. D. Bruce, Multicanonical Monte Carlo Study of a Structural Phase Transition, to be published in Europhys. Lett.
G. R. Smith & A. D. Bruce, Multicanonical Monte Carlo Study of Solid-Solid Phase Coexistence in a Model Colloid, to be published in Phys. Rev. E.
Declaration
This thesis has been composed by myself and it has not been submitted in any previous application for a degree. The work reported within was executed by me, unless otherwise stated.
March 1996
for Christina and Ken
Acknowledgements
I would like to thank the following people: Alastair Bruce for all his guidance, help and encouragement, and for never shouting at me, even when I richly deserved it; Stuart Pawley and
Nigel Wilding for many useful and pleasant discussions; David Greig, Stuart Johnson, Stephen
Bond and Stephen Ilett for carefully reading and commenting on the final draft of this thesis; Peter Bolhuis for making available the results of [171]; my flatmates; and all my other friends
in Edinburgh and elsewhere.
I also gratefully acknowledge the support of a SERC/EPSRC research studentship.
Contents
1 Introduction . . . 1
  1.1 Thermodynamics, Statistical Mechanics, Free Energy and Phase Transitions . . . 1
    1.1.1 Phase Transitions . . . 3
    1.1.2 The Ising Model . . . 4
    1.1.3 Statistical Mechanics . . . 9
    1.1.4 Off-Lattice Systems . . . 22
  1.2 Calculation in Statistical Mechanical Problems . . . 27
    1.2.1 Analytic Methods . . . 27
    1.2.2 Monte-Carlo Simulation . . . 30
    1.2.3 Monte-Carlo Simulation at Phase Transitions . . . 37
    1.2.4 Discussion . . . 43

2 Review . . . 44
  2.1 Integration-Perturbation Methods . . . 46
    2.1.1 Thermodynamic Integration . . . 46
    2.1.2 Multistage Sampling . . . 51
    2.1.3 The Acceptance Ratio Method . . . 54
    2.1.4 Mon's Finite-Size Method . . . 57
    2.1.5 Widom's Particle-Insertion Method . . . 59
    2.1.6 Histogram Methods . . . 61
  2.2 Non-Canonical Methods . . . 63
    2.2.1 Umbrella Sampling . . . 63
    2.2.2 Multicanonical Ensemble . . . 65
    2.2.3 The Expanded Ensemble . . . 71
    2.2.4 Valleau's Density-Scaling Monte Carlo . . . 77
    2.2.5 The Dynamical Ensemble . . . 79
    2.2.6 Grand Canonical Monte-Carlo . . . 81
    2.2.7 The Gibbs Ensemble . . . 82
  2.3 Other Methods . . . 84
    2.3.1 Coincidence Counting . . . 84
    2.3.2 Local States Methods . . . 85
    2.3.3 Rickman and Philpot's Methods . . . 86
    2.3.4 The Partitioning Method of Bhanot et al. . . . 87
  2.4 Discussion . . . 88

3 Multicanonical and Related Methods . . . 95
  3.1 Introduction . . . 95
    3.1.1 The Multicanonical Distribution over Energy Macrostates . . . 96
    3.1.2 An Alternative: The Ground State Method . . . 102
    3.1.3 The Multicanonical Distribution over Magnetisation Macrostates . . . 104
  3.2 Techniques for Obtaining and Using the Multicanonical Ensemble . . . 106
    3.2.1 Methods Using Visited States . . . 112
    3.2.2 Incorporating Prior Information . . . 114
    3.2.3 Methods Using Transitions . . . 127
    3.2.4 Finite-Size Scaling . . . 140
    3.2.5 Using Transitions for Final Estimators: Parallelism and Equilibration . . . 144
  3.3 Results . . . 152
    3.3.1 Free Energy and Canonical Averages of the 2d Ising Model . . . 152
    3.3.2 A Comparison Between the Multicanonical Ensemble and Thermodynamic Integration . . . 154
    3.3.3 P(M) at β = β_c . . . 158
  3.4 Beyond Multicanonical Sampling . . . 168
    3.4.1 The Multicanonical and Expanded Ensembles . . . 168
    3.4.2 The Random Walk Problem . . . 171
    3.4.3 `Optimal' Sampling . . . 176
    3.4.4 Use of the Transition Matrix: Prediction of the `Optimal' Distribution . . . 181
  3.5 Discussion . . . 187

4 A Study of an Isostructural Phase Transition . . . 191
  4.1 Introduction . . . 191
  4.2 Comparison of Thermodynamic Integration and the Expanded Ensemble: Use of an Einstein Solid Reference System . . . 198
    4.2.1 Thermodynamic Integration . . . 200
    4.2.2 Expanded Ensemble with Einstein Solid Reference System . . . 209
    4.2.3 Other Issues . . . 212
  4.3 Direct Method: Multicanonical Ensemble with Variable V . . . 214
    4.3.1 The Multicanonical NpT-Ensemble and its Implementation . . . 216
    4.3.2 The Pathological Nature of the Square-Well System . . . 219
    4.3.3 Finding the Preweighting Function . . . 220
    4.3.4 The Production Stage . . . 232
    4.3.5 Canonical Averages . . . 236
    4.3.6 Finite-Size Scaling and the Interfacial Region . . . 242
    4.3.7 Mapping the Coexistence Curve . . . 246
    4.3.8 The Physical Basis of the Phase Transition . . . 250
  4.4 Discussion . . . 254

5 Conclusion . . . 260

A Exact Finite-Size Scaling Results for the Ising Model . . . 264
B The Double-Tangent Construction . . . 268
C Statistical Errors and Correlation Times . . . 271
D Jackknife Estimators . . . 275
E Details of the Square-Well Solid Simulation . . . 277
Chapter 1
Introduction
We begin by giving necessary background to the work carried out in this thesis. We shall deal
with the thermodynamical and statistical mechanical notions that underpin our understanding
of phase transitions, in particular, the ideas of entropy and free energy. We shall describe the role
of computer simulation, especially Monte-Carlo simulation, and explain why the measurement
of free energy presents particular challenges.
1.1 Thermodynamics, Statistical Mechanics, Free Energy
and Phase Transitions
Who could ever calculate the path of a molecule? How do we know that the
creation of worlds are not determined by falling grains of sand?
from Les Misérables, Victor Hugo
Thermodynamics and statistical mechanics are theories which describe the behaviour of a
bulk material comprising large numbers of interacting particles distributed in space, for example
the molecules of a solid, liquid or gas.
Thermodynamics, as its name implies, is concerned with the bulk energy of the material, work done on it and heat flows in and out of it. It does not acknowledge explicitly the microscopic interactions of the particles that compose the bulk; indeed the theory evolved from empirical observations and experiment at a time when the microscopic nature of materials was not understood at all. For this reason the development of the theory was itself a painful process [1], and it was not connected with the physical principles of mechanics until the work of Boltzmann and Gibbs at the end of the last century, and of Einstein at the start of this, led to the development of statistical mechanics. For a traditional exposition of thermodynamics see [2] or the first chapters of [3].
How might we attempt to use a model of the microscopic structure of a macroscopic sample
of a material to calculate its properties? In a Classical Mechanical framework, it is clear that,
given the initial positions and velocities of all the particles that compose the system (i.e. the
initial microstate of the system), and knowledge of the interactions, it is possible in principle
to calculate the state at any later time from the known laws of dynamics; but it is equally clear
that such a calculation will never be possible in practice, for two reasons: firstly, because the number of particles in a macroscopic system is O(10^23), which is simply too large for any existing or foreseeable computer, and secondly, because it is likely that the dynamics are in detail chaotic, that is to say, the later evolution depends with enormous sensitivity on the initial state.
Nevertheless, it is apparent from our observations of real systems that this microscopic complexity does not result in macroscopic complexity. Materials have well-defined bulk properties that depend only on the values of a few thermodynamic control parameters, such as pressure, temperature, magnetic field, etc., not on the continually varying positions and velocities of all their innumerable constituent particles. Moreover, these bulk properties seem to all but the most precise measurements not to vary with time (provided the control parameters remain fixed), even though there is continuous microscopic activity, and when suitably normalised they are the same for all macroscopic samples, large or small(1). This gives a hint that it might be possible to construct a theory which predicts the bulk properties without any details of the kinetics and which depends in only a simple way on the number of particles, an enormous simplification. This is indeed what statistical mechanics achieves: on the basis of a knowledge only of the interactions between the constituent particles, it provides expressions for the bulk properties. We shall not give a detailed derivation of the formalism of statistical mechanics here, or connect it fully with thermodynamics, and results will often be stated without proof.

(1) As long as they are large enough that the number of particles near the boundary of the sample is small compared with the number in the bulk. The bulk properties of a material are normally proportional to the quantity of material present: they are extensive. When we refer to a quantity as a density, it means that we have divided an extensive quantity by the number of particles. The result is intensive, or effectively independent of the quantity of material. The control parameters are naturally intensive.
A full exposition can be found in [4] and other useful references are [3, 5, 6, 7, 8].
1.1.1 Phase Transitions
We have just noted that the bulk properties of a material, such as the internal energy density,
or the magnetisation density (if the material is magnetic) depend on a few control parameters,
like the temperature. Normally they vary smoothly with the control parameters, but there
exist a few points where they may `jump' suddenly from one value to another, i.e. points
where the material changes its properties dramatically. Where this happens we say that the
material undergoes a phase transition. The property that `jumps' can be used as a so-called
order parameter of the transition: by a suitable choice of origin we can make it zero in one phase and finite in the other, or of the same magnitude and opposite sign in the two phases (which is how it is naturally defined for the model we shall introduce in section 1.1.2). Examples
of phase transitions that are found in almost all simple atomic or molecular materials are the melting and boiling transitions. Here there are two obvious order parameters: one is the internal energy density, which changes by L_H − pΔV (where L_H is the latent heat of the transition, p the pressure and ΔV the change in volume), and the other is the specific volume v = V/N (where N is the number of particles in the system), which changes more than a thousandfold at the liquid-vapour transition of water at atmospheric pressure. Transitions of this kind, where the order parameter is discontinuous, are called first order phase transitions. Exactly at the transition values of the control parameters, volumes of the material in states characteristic of the two `sides' of the phase transition can exist in contact with one another: coexisting phases.
There is another class of phase transitions, known as second order or continuous transitions, in which there is no latent heat and the order parameter itself does not change discontinuously, but instead experiences large fluctuations, and its derivative with respect to one of the control parameters diverges. An example of this is the ferromagnetic transition of iron, in which the spontaneous magnetisation (which is the order parameter for the transition) disappears continuously at a temperature of 1043 K, while the susceptibility diverges. The point in the space of control parameters where a continuous phase transition occurs is called a critical point. Critical points occur as the limit of a series of first order transitions in which the change in the order parameter has been getting progressively smaller (though not all lines of first order phase transitions terminate in critical points; solid-liquid melting curves do not appear to do so). The unusual phenomena associated with critical points have been the subject of a huge amount of
study in the past thirty years or so [9].
Clearly we would like to know which phase will be found at particular values of the control
parameters, and, particularly, at what values of these control parameters phase transitions will
occur. We would also like to be able to calculate the bulk mechanical and thermodynamic
properties of the material in its various phases. Statistical mechanics provides answers to these
questions in principle; however we shall later see that the expressions that it enables us to write down, particularly those relating to the location of phase transitions, are difficult to evaluate accurately either analytically or computationally. The main concern of this thesis has been the investigation of various computational techniques designed to overcome these difficulties. We
shall explain our intentions in more detail when we have provided more background to bring
the particular problems associated with phase transitions more clearly into focus. Computer
simulations are introduced in section 1.2 and reviewed in more detail in chapter 2. In the
remainder of section 1.1 we shall introduce relevant aspects of statistical mechanics itself. We
shall illustrate our explanation of statistical mechanics using one of the models that will be
widely used in the investigations of this thesis, the Ising model.
1.1.2 The Ising Model
First [the gypsies] brought the magnet. A corpulent gypsy... who introduced himself as Melquíades, made a showy public demonstration of what he himself described as the eighth marvel of the wise alchemists of Macedonia.
from One Hundred Years of Solitude, Gabriel García Márquez
The Ising model is a simple model of a magnetic system, where the `particles' are classical `spins' on a lattice. It is defined by the characteristic that each spin s may exist in one of two states, called `up' and `down' or `+1' and `−1'. There is a coupling J which defines an interaction energy between the spins at sites i and j: E_ij = −J_ij s_i s_j. (The choice of sign is a convention.) Aside from this, the system may have any dimensionality, any type of lattice and there may also be an external magnetic field. However we shall be particularly concerned with the Ising model on an L × L 2-dimensional square lattice with the choice

J_ij = 1 if i and j are nearest neighbours, and 0 otherwise.
There is no loss of generality in choosing J = 1 since, as will become apparent in section 1.1.3, any other choice just results in a scaling of temperature. We shall impose periodic boundary conditions (P.B.C.), meaning that spins (i, L) and (i, 1) are neighbours for all i, as are spins (L, j) and (1, j) for all j. Thus all spins are equivalent and each interacts with its four nearest neighbours only. The coupling is positive, so that the spin-spin energy is lower when spins are aligned parallel and higher when they are antiparallel. The ground state is therefore a state where all spins are parallel, for which reason the J_ij > 0 model is known as the ferromagnetic Ising model. The total energy due to spin-spin interactions, which we shall call configurational energy and represent by E (in the literature the symbols H, U and V are also frequently used for configurational energy), is therefore given by

E(σ) = −Σ_<ij> s_i s_j

where Σ_<ij> denotes a sum over nearest-neighbour pairs and σ represents a particular arrangement of all the spins (also called a configuration or microstate). The number of spins in the system is N = L². The magnetisation M is simply

M(σ) = Σ_i s_i

which interacts with an external magnetic field H to give an extra term −HM(σ) in the energy. (The choice of sign is, once again, conventional: one may characterise the energy of a magnetic material either in terms of the energy stored in the field or the work done on the solid, and both conventions are in use. We have chosen to consider the second case.)

The total energy (often called the Hamiltonian) is therefore

E_TOT(σ) = E(σ) − HM(σ)    (1.1)
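To make these definitions concrete, here is a minimal Python sketch (our illustration, not part of the thesis; the array layout and the use of np.roll to implement the periodic boundary conditions are our own choices):

import numpy as np

def energy(spins):
    # E(sigma) = -sum_<ij> s_i s_j: pairing each spin with its right and
    # down neighbours counts every nearest-neighbour bond exactly once;
    # np.roll supplies the periodic boundary conditions.
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return -int((spins * right).sum() + (spins * down).sum())

def magnetisation(spins):
    # M(sigma) = sum_i s_i
    return int(spins.sum())

L = 8
rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(L, L))   # a random, high-temperature-like microstate
print(energy(spins), magnetisation(spins))
print(energy(np.ones((L, L), dtype=int)))  # a ground state: E = -2N = -128 for L = 8

With H = 0 the Hamiltonian of equation 1.1 reduces to the configurational energy computed here; an external field would contribute the further term −H * magnetisation(spins).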
Despite the simplicity of the Ising model, it exhibits a phase transition driven by the external field H: at temperatures below the critical temperature Tc this is a first-order transition, at Tc it becomes continuous and for T > Tc it disappears. The order parameter is the magnetisation M or magnetisation density m = M/N. This phase behaviour is qualitatively (and in some respects quantitatively) very similar to that of a real ferromagnet like iron. We have therefore to look at the phase behaviour as a function of the two variables H and T; however it makes for slightly tidier notation if instead of T we use the inverse temperature β = 1/k_B T, so we shall do this from the start. First consider the dependence of m on H for three different values of β (see figure 1.1).
Figure 1.1. Schematic diagrams of magnetisation density m vs. external magnetic field H for the Ising model for three different values of inverse temperature: (1) β > β_c, (2) β = β_c, (3) β < β_c.
The qualitative behaviour at high H is the same in all cases: m → sign(H). However the behaviour as H → 0 depends on β. At high temperatures (or low β), as in graph 3 of figure 1.1, m → 0 as H → 0, while at high β (graph 1) there is a residual magnetisation left after the field has gone to zero: m → m_0(β) as H → 0⁺ and m → −m_0(β) as H → 0⁻. Therefore, if at inverse temperature β > β_c the field is taken from just below zero to just above it, the magnetisation `jumps' discontinuously by 2m_0(β). We have, therefore, a first order phase transition. The point of crossover between these two regimes occurs at β_c (graph 2). Here there is no residual magnetisation at H = 0 but ∂m/∂H at H = 0 diverges, so there is a continuous phase transition and β_c is the inverse critical temperature. If the system is cooled from β < β_c to β > β_c at H = 0 then the system `chooses' either the +m_0 or −m_0 state with equal probability: a phenomenon known as spontaneous symmetry breaking.
The usual way of summarising phase behaviour of this kind is by way of phase diagrams.
Figure 1.2. Schematic (1) (β, m) and (2) (β, H) phase diagrams for the Ising model.
The two most usual kinds are shown in figure 1.2. We may plot m_0, the order parameter for the transition, against temperature or inverse temperature; this is done in the diagram on the left. Note that if m is free to vary then those points lying between +m_0 and −m_0 for β > β_c do not represent equilibrium points of the system; however if we constrain the total magnetisation to be constant, then this region of the phase diagram contains states (which we call mixed states) which describe two-phase coexistence; the system under these conditions separates into large `domains' each with a magnetisation equal to +m_0 or −m_0, the domains having that volume which keeps the total magnetisation at its constrained value. The other kind of phase diagram is shown in the diagram on the right of figure 1.2. Here the axes are the two fields, H and β; the phase transitions then appear as a single line which in the Ising case lies along the β axis (H = 0) and ends at the critical point. This line is called the coexistence curve. All the mixed states also lie on this line, because they differ only in their value of the total magnetisation. In mathematical terms, the coexistence curve is the line on which m is not a single-valued function of H and β. In fact, the two phase diagrams are really projections of a surface in (m, H, β) space, on which each point represents the value of m produced by the fields H and β. This surface is drawn in figure 1.3. The (m, H) graphs in figure 1.1 are constant-β sections through it.
We should remark here that both a computer-simulated Ising model and a real ferromagnet,
particularly one made of a pure metal, are liable to show the phenomenon of hysteresis instead
of the sudden first order phase transition at H = 0.

Figure 1.3. The surface of state points in (m, H, β) space for the Ising model.

If the system has a positive m and H
is decreased and then made negative, m remains positive at first, and requires the applied field to be appreciably negative to drive it to its (negative) equilibrium value. The same thing happens if H is made positive and m is initially negative. This phenomenon, generally called metastability, occurs widely in nature. Metastability interferes with the measurement of the true location of phase transitions, both in experiments and in computer simulation; it will be a frequent concern of ours to try to develop simulation techniques that are as little affected by it as possible.
The two-dimensional Ising model with H = 0 is particularly interesting because many of its properties, including the phase behaviour, can be exactly calculated analytically, so it is an ideal testbed for computer simulation methods which can then be applied to other systems. There exist exact solutions for the internal energy, the heat capacity and the free energy (which we define in the next section), both for finite systems [10] and in the N → ∞ limit [11]. In the latter case m_0 can be evaluated too. A detailed exposition of many of the exactly-known properties of the 2d Ising model can be found in [12]. A typical configuration (typical in the sense that its magnetisation lies near the peak of the probability density function (p.d.f.) of M) of the 64 × 64 2d Ising model at the critical point (H = 0, β_c = (1/2) ln(√2 + 1) = 0.440686...) is shown in figure 1.4. This system is just large enough for the self-similar structure of the critical clusters of different magnetisations to be becoming apparent; this behaviour is one of the most interesting features of the critical point, and is central to the renormalisation group
theory of continuous phase transitions. Further detail and references can be found in section 1.2.1.

Figure 1.4. A typical configuration of the critical 64² 2d Ising model. White corresponds to s_i = +1, black to s_i = −1.
Now in order to explain this phase behaviour of the Ising model we must introduce some
statistical mechanics. The initial discussion is general, but we shall return to the Ising model
as a specific example to clarify the discussion of phase transitions.
1.1.3 Statistical Mechanics
We shall begin by stating the equation which is the basis of statistical mechanics:
P^can(σ) = exp(−βE_TOT(σ)) / Z    (1.2)
The probability distribution given by this equation is called the Boltzmann Distribution.
For a standard derivation of it, see [4] or [5]; for an alternative (and simpler) derivation using
information theory, see [13]. We shall motivate it here by showing that it is consistent with,
and illuminates connections between, the simple physical properties of bulk material that we
described at the start of this chapter. We shall start by considering materials away from phase
transitions, moving on to phase transitions later.
Equation 1.2 relates the total energy E_TOT(σ) to P^can(σ), the probability that the system in equilibrium will be observed in configuration σ. Note that only the energy of the configuration appears in this expression; there are no momentum terms and no dynamics. In the sense that a probability can be interpreted as a time-average, the dynamics can be regarded as having been averaged out, though this interpretation is not necessary to the truth of equation 1.2, as is discussed in [14].
The normalising factor Z, the partition function, is given by

Z = Σ_{σ} exp(−βE_TOT(σ))    (1.3)

where Σ_{σ} is a sum over all the microstates of the system, which, for the 2d Ising model, means over all configurations of the spins. The logarithm of the partition function defines the Gibbs free energy:

G(β, H) = −(1/β) ln Z    (1.4)
It is also a very important quantity; we shall find (section 1.1.3) that it is g = G/N that determines the location of phase transitions. The set of microstates available to the system, over which the sum runs, is called the ensemble; here we are considering an ensemble where the system's volume, temperature and number of particles are constant. The choice of ensemble will, in general, affect how many terms there are in E_TOT; in the Ising model canonical ensemble it is in general necessary to include the −HM term as well as the configurational energy E (though in most of the cases we shall consider H will be zero). However in an ensemble where M was constant, i.e. we considered only those configurations with a particular magnetisation, the −HM term would be the same for all configurations and so irrelevant (cancelling out above and below in equation 1.2).
Now let us consider what predictions the theory makes for the values of the bulk properties
(or observables) of the system. These observables, such as internal energy, are `canonical
averages,' averages over the Boltzmann distribution: we define an operator which acts on a configuration to give a number which is the value of the property for that configuration. Then the observed bulk property will be the average of the operator over the Boltzmann distribution of the configurations of the system. For a general operator O we have

<O> = Σ_{σ} O(σ) P^can(σ) = (1/Z) Σ_{σ} O(σ) exp(−βE(σ))    (1.5)

so, for configurational energy (internal energy) we have

<E> = (1/Z) Σ_{σ} E(σ) exp(−βE(σ))    (1.6)
If the interactions are fairly short-ranged, as for most molecular materials (and the Ising model), then we expect <E> ∝ N; this is in fact a general property of what are called `normal systems' in statistical mechanics [15], [16, chapter 2]. Important systems that are not normal include electrically charged and gravitationally bound ones. Now let us consider the heat capacity C_H = ∂<E>/∂T. This is also known to be extensive for short-ranged interactions; if it were not we could violate conservation of energy by cutting up a system, heating it, reassembling it and cooling it. However, by differentiating equation 1.6 it is easy to show [6, chapter 3] that δE, the r.m.s. size of fluctuations in <E>, is given by δE ∝ √C_H. Thus δE ∝ √N, and so the fluctuations in the internal energy density e ≡ <E>/N die away as 1/√N. For macroscopically-sized samples of material, the fluctuations will be too small to observe and <E> is effectively identical to E*, the most probable value of the energy. P^can(E) is thus very sharply peaked about its maximum (or mode) at E*; in the single-phase region there is only one mode. The large-N limit is also called the thermodynamic limit, since here the predictions of statistical mechanics become fully consistent with those of thermodynamics.
This is a convenient time to introduce the notion of density of states, Ω(O), where O is an operator on the microstates as before. We define

Ω(O) = Σ_{σ} δ(O − O(σ))

so that it is just the number of states for which the operator has the value O. The possible values of O are called the `O-macrostates' of the system. The total number of microstates is

Ω_TOT = Σ_{σ} 1

which may or may not be finite (though it is in the Ising case). Generally Ω_TOT increases exponentially with system size: Ω_TOT ∝ exp(cN) for some constant c. Let us first look at the
particularly simple case of O = E, so we are considering the configurational energy macrostates. The Boltzmann distribution, where P^can(σ) is a function of E(σ) only, implies that here all microstates within a macrostate are equally probable, so the probability of an E-macrostate is

P^can(E) = (1/Z) Ω(E) exp(−βE)    (1.7)

which may also be written as

P^can(E) = (1/Z) exp[β(TS(E) − E)]   or   P^can(E) = (1/Z) exp[−βF(E)]    (1.8)

where S(E) = k_B ln[Ω(E)] is the entropy and F(E) = E − TS(E) is the Helmholtz free energy, also sometimes called the free energy functional. These quantities are also extensive, and densities f = F/N and s = S/N are defined as usual. F and S defined here are in fact identical in the L → ∞ limit to the quantities for which we use the same symbols in thermodynamics. The identity can be shown rigorously [3, chapters 15-17].
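These definitions can be made tangible by brute-force enumeration of a very small system. The sketch below (our illustration, not the thesis's; a 3 × 3 lattice is chosen because it has only 2^9 = 512 microstates) builds Ω(E) and from it the sums of equations 1.3, 1.6 and 1.7:

import itertools
from collections import Counter
import numpy as np

L, beta = 3, 0.5
omega = Counter()                  # Omega(E): number of microstates in each E-macrostate
for bits in itertools.product([-1, 1], repeat=L * L):
    s = np.array(bits).reshape(L, L)
    E = -int((s * np.roll(s, -1, 0)).sum() + (s * np.roll(s, -1, 1)).sum())
    omega[E] += 1

Es = np.array(sorted(omega))
g = np.array([omega[E] for E in Es], dtype=float)
w = g * np.exp(-beta * Es)         # Omega(E) exp(-beta E), the numerator of eq. 1.7
Z = w.sum()                        # partition function, eq. 1.3
P = w / Z                          # P^can(E)
S = np.log(g)                      # S(E) = k_B ln Omega(E), with k_B = 1
F = Es - S / beta                  # F(E) = E - T S(E)
print(Z, np.dot(Es, P))            # Z and the canonical average <E>, eq. 1.6

Of course, such direct enumeration is exactly what becomes impossible for systems of interesting size; that is the difficulty that the Monte Carlo methods studied in this thesis are designed to circumvent.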
Equation 1.8 makes it clear that the probability of a macrostate is controlled by an interplay of its energy and entropy. At low temperatures (large β) the energy term dominates and the most probable macrostates are those of low energy. At high temperatures (small β) the energy has less effect and those macrostates which comprise the largest number of microstates, that is, have the largest entropy, are the most probable. It is interesting to consider what the `typical' microstates of low-temperature and high-temperature macrostates are like. At low temperature, low energies are favoured, and these are best achieved, for attractive forces, by surrounding each particle with as many neighbours as possible, or, for the Ising model, by surrounding each spin with neighbours having the same orientation. These low-energy configurations are typically highly ordered and few in number: there are only a few close-packed crystal lattices, and only two ways of arranging the Ising model's spins so that they are all parallel. Conversely, at high temperatures, all microstates are equally probable, whatever their energy, so those macrostates having large numbers of microstates are favoured. However, consideration of producing configurations by placing particles or orienting spins randomly (which would generate all microstates with equal likelihood) shows that these configurations will almost always have rather high energy. Thus the notion of entropy as a measure of order begins to emerge. It is interesting to see how such an apparently unquantifiable idea becomes associated with the multiplicity of configurations of a particular energy. This happens because the nature of the force is such that low energy demands very ordered configurations, and these are geometrically restricted to be few in number, while the overwhelming majority of the much more numerous disordered configurations have high energy.
We can now rewrite the canonical averages as averages over the macrostates; for example

<E> = (1/Z) Σ_E E Ω(E) exp(−βE) = (1/Z) Σ_E E exp(−βF(E))

F(E) and P^can(E) are shown in figure 1.5. We showed above that the fractional fluctuations in E about E* die away like 1/√N, so P^can(E) is very sharply peaked about its maximum at E* (the study of the scaling of probability distribution functions and the related canonical averages with system size is known as finite-size scaling theory [17]). Since P^can(E) ∝ exp[−βF(E)], the behaviour of P^can(E) leads to the thermodynamic principle that a system seeks out the value of E that minimises F(E).
In fact, the extensivity of F(E) is another confirmation of the fact that the width of P^can(e) scales as 1/√N; if we write P^can(e) in terms of the bulk free energy density f(e) = lim_{L→∞} L^{−d} F_L(eL^d) (writing eL^d for E), it is found that

P_L^can(e) ∝ exp(−βL^d f(e)) ≈ exp(−(e − e*)²/2σ_e²)

Thus σ_e ∝ L^{−d/2}, and so σ_E ∝ L^{d/2}, i.e. P^can(E) is a Gaussian with half-width ∝ L^{d/2}.
We can describe the Ising model's magnetisation in a similar way to that in which we described the energy:

<M> = (1/Z) Σ_{σ} M(σ) exp[−β(E(σ) − HM(σ))]
Figure 1.5. A schematic diagram of the free energy functional F(E) and the macrostate probability P^can(E) ∝ exp(−βF(E)). Note that <E> ∝ N while the half-width δE ∝ √N.

As was the case with <E>, it can be shown that, away from the critical point, consideration
of the extensivity of <M> and of the magnetic susceptibility χ = ∂M/∂H leads to the conclusion that the fluctuations in M grow only as √N, so the fluctuations in m = ML^{−d} about its maximum m* disappear as L → ∞. Thus <M> → M*, where M* is the most probable value of M. (At the critical point, however, we have already said that χ diverges. This implies that the fluctuations in M become large, and this is indeed the case: they are large both in absolute size, δM being almost extensive, and in spatial extent, extending over the entire system. In this case <M> does not tend to M*. See appendix A for a further discussion.) It will be noted that there is an obvious similarity in the behaviour of the fluctuations in E and M: in each case the fluctuation is related to the response function χ_Y = ∂Y/∂y, where Y is E or M and y is the field that couples to it, either T or H. Such behaviour is in fact extremely general; it is described by the fluctuation-dissipation theorem [16, chapter 8] which relates the fluctuation δY in a quantity Y to the response function χ_Y.

The density of magnetisation states is now

Ω(M) = Σ_{σ} δ(M − M(σ))

This time all microstates within an M-macrostate are not equally probable, because they may have different configurational energies. Therefore the appropriate free energy functional F(β, M) is now defined by
exp[−βF(β, M)] = Σ_{σ} δ(M − M(σ)) exp[−βE(σ)]    (1.9)
so that the probability of an M-macrostate becomes

P^can(β, M) = (1/Z) Σ_{σ} δ(M − M(σ)) exp[−β(E(σ) − HM(σ))] = (1/Z) exp[−βF(β, M) + βHM]    (1.10)

and we can write

<M> = (1/Z) Σ_M M exp[−βF(β, M) + βHM]

Z = Σ_M exp[−βF(β, M) + βHM]

and

G(β, H) = −(1/β) ln Σ_M exp[−βF(β, M) + βHM]
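In the same spirit as the enumeration sketch above (again our own illustration, feasible only because the lattice is tiny), F(β, M) and P^can(β, M) at H = 0 follow from binning the Boltzmann weight by magnetisation macrostate, as in equation 1.9:

import itertools
import numpy as np

L, beta = 3, 0.7                   # beta well above beta_c ~ 0.44
boltz = {}                         # M -> sum over the M-macrostate of exp(-beta E)
for bits in itertools.product([-1, 1], repeat=L * L):
    s = np.array(bits).reshape(L, L)
    E = -int((s * np.roll(s, -1, 0)).sum() + (s * np.roll(s, -1, 1)).sum())
    M = int(s.sum())
    boltz[M] = boltz.get(M, 0.0) + np.exp(-beta * E)

Z = sum(boltz.values())            # Z at H = 0
for M in sorted(boltz):
    F_M = -np.log(boltz[M]) / beta # F(beta, M), eq. 1.9
    print(M, round(F_M, 3), boltz[M] / Z)   # P^can(beta, M), eq. 1.10 with H = 0

Even at this tiny size, the printed P^can(β, M) is heavily concentrated at the extreme macrostates M = ±9, with the mixed macrostates in between suppressed: a first glimpse of the double-peaked structure discussed next.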
Now let us extend the discussion from the single phase to phase transitions. Taking the 2d Ising model as a paradigm, we shall now use equation 1.10, along with the results for the behaviour of P^can(β, M) as a function of system size, and physical arguments about the nature of the favoured configurations at various temperatures, to explain the appearance of the dramatic jumps in canonical averages that we know are characteristic of first-order phase transitions.

We now write F_L(β, M) instead of F(β, M) to make clear its dependence on the finite size of the system. Consider, therefore, the shapes of F_L(β, M) for some finite L above and below β_c. For the 2d Ising model these can be shown [6, chapters 5-6], [17, chapter 11] to be as illustrated schematically in figure 1.6.

Diagram (1) describes the situation at high temperatures. In this regime the probability of a macrostate is dominated by its multiplicity, and the effect of the (average) energy of the configurations is small. Thus the favoured macrostates are those around M = 0, which correspond to spins being chosen with random orientation. Thus P_L^can(β, M) has a single maximum, and F_L(β, M) has a single minimum at M* = 0, describing the single phase that exists there. The limiting bulk free energy density f_b(β, m) = lim_{L→∞} L^{−d} F_L(β, mL^d) is approached quite quickly as L increases and thus (when viewed on the right length scale) looks
very similar to F_L(β, M), as shown in diagram (1) of figures 1.6 and 1.7.

Figure 1.6. Schematic diagram of the free energy functional F_L(β, M): (1) for β < β_c, (2) for β > β_c.

Figure 1.7. Free energy density f(β, m) in the limit N → ∞: (1) for β < β_c, (2) for β > β_c.
The behaviour at low temperature is different, as shown in the second diagram of figure 1.6. F_L(β, M) now has two minima at ±M* which will describe the two phases of the system, and correspondingly f_L(β, m) has two minima at ±m*. These come from the dominance of energy over entropy at low temperatures; low energy configurations are favoured even though their multiplicity is low, and these configurations tend to have almost all their spins aligned either up or down, that is to say, they have finite magnetisation. The symmetry of F_L(β, M) follows from the symmetry between positive and negative magnetisation here. Correspondingly, P_L^can(β, M) must be symmetric with two maxima.

However, the scaling this time is

F_L(β, M) ∝ L^d for M > M* and M < −M*
F_L(β, M) ∝ L^{d−1} for −M* < M < M*

The first line here is normal scaling behaviour; the microstates that dominate F_L(β, M) have a fairly uniform magnetisation. We shall digress for a while to explain the scaling in the region between the modes of P^can(β, M).
The Interfacial Region.
The L^{d−1} scaling in the region between the two minima occurs because, although there are of course very many microstates where the magnetisation is roughly uniform throughout the configuration (these are the `typical' microstates at high temperature), these have relatively high energy and are strongly suppressed at low temperature. The best way the system can achieve low energy when M ≠ M* is to go into mixed states, states of phase coexistence, where configurations consist of regions of magnetisation density −m* and +m* separated by interfaces (these are thus also known as interfacial states). The overwhelming low-temperature contribution to F_L(β, M) comes from such microstates. These regions have a free energy density f_b(β, m*) which is typical of the bulk states while the interfaces introduce an interfacial free energy f_s. f_s is a free energy because there is an internal energy necessary to create a particular interfacial area and an entropic part related to the number of ways that such an interface can be positioned in the system in order that the phases of different magnetisations are present in the right proportions to produce the overall magnetisation M.

For a finite-sized system in the interface region one can therefore write

F(β, M) ≈ L^d f_b + c(M) L^{d−1} f_s    (1.11)

The second term is the effect of the interface. Its `area' in d space dimensions is L^{d−1} while c(M) is a geometrical factor determined by the shape of the interface. As well as M, c(M) also depends on the boundary conditions of the system. It is given by the Wulff construction [18].
The existence of the interface has interesting effects on the behaviour of the system. Its contribution to the total free energy goes to infinity with L, but more slowly than that of the bulk free energy. Therefore its fractional contribution to the total free energy is cL^{d−1}f_s / L^d f_b = cf_s/(Lf_b) → 0 as L → ∞, and so in the thermodynamic limit it has no effect on bulk properties, nor on the location of phase transitions. This is reflected in diagram (2) of figure 1.7, which looks qualitatively different from the corresponding diagram of figure 1.6. We see that the limiting f_b(β, m) as L → ∞ is constant between +m* and −m*, lacking the central maximum of F_L(β, M), which disappears as 1/L. Thus the limiting f_b(β, m) is convex, as required by thermodynamical arguments (see for example [3, chapter 8]). However we would not be right to conclude that the interface has no effect at all on the system in the thermodynamic limit. At coexistence

P^can(M_interf)/P^can(M*) ∝ exp(−βcL^{d−1}f_s) → 0 as L → ∞    (1.12)
so that the interfacial macrostates are exponentially suppressed compared to the pure states. Therefore in order to see the coexistence of phases in the Ising system it is necessary to constrain the total system at constant M, so that the system is forced into one of the mixed states. If we had looked only at f(m) to try to predict this behaviour, that is, if we had taken only the leading term (L^d f(m)) in the expansion of F_L(β, M) in L, we would have concluded, wrongly, that P^can(M_interf)/P^can(M*) → 1, implying that even in the thermodynamic limit a system with an unconstrained magnetisation should be able to pass freely between +m_0 and −m_0. The presence of this large interfacial region of macrostates whose probability goes to zero in the large system limit, but whose free energy density approaches that of the pure phases, provides a large measure of explanation for certain non-equilibrium properties of statistical mechanical systems; in particular, metastability. For example, we now see that to change the sign of M at the phase transition, given that the spins must be flipped piecemeal, requires the creation of an interface between regions of positive and negative magnetisation; that is, it implies the necessity of passing through the unlikely interfacial region. Similar considerations affect all first-order phase transitions, and are a bugbear of computer simulations of them (see sections 1.2.3 and 4.3).
Of course, this explanation is not the whole story, because it takes no account of the dynamical mechanism by which the metastable state eventually decays, which is known to be via the nucleation of a droplet of some critical size [17, chapter 11], which then grows. We would not necessarily expect the order parameter M of the whole system to be a suitable quantity for studying this (although some very recent work [19] has in fact suggested that a surprisingly good description of the relaxation of metastable states may be obtained from consideration of M alone).
We now return to the main thread of the argument. We shall now write an expression for the probability of a phase, which is the sum of P^can(β, M) over those values of M characteristic of each phase, which we shall take as being M > 0 and M < 0 here. In fact, of course, because of the shape of P^can(β, M), only those M-values within O(L^{d/2}) of ±M* are really `characteristic' of the phase; but equally very little error (and an error which is increasingly small as L increases) is made by including all the states with positive magnetisation in one phase and all with negative magnetisation in the other. Therefore, labelling the phases with A and B,

P_A^can(β, H) = Σ_{M∈A} P^can(β, M, H) = Z_A(β, H)/Z(β, H)    (1.13)

where Z_A is defined as

Z_A(β, H) = Σ_{M∈A} exp[−βF(β, M) + βHM]

and similarly for B (we retain the HM term for generality though for the Ising model the transition occurs at H_coex = 0 because of the symmetry of F(β, M)). The relative probabilities of the two phases are

P_A^can/P_B^can = Z_A/Z_B

and we can define restricted Gibbs free energies on a particular phase alone:

G_A(β, H) = −(1/β) ln Z_A(β, H)

so that

P_A^can/P_B^can = exp(−β[G_A(β, H) − G_B(β, H)])    (1.14)
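Equation 1.14 is what makes a measured macrostate distribution so valuable. A minimal sketch (ours, with a hypothetical stand-in for P^can(β, M); in practice this would be a normalised histogram from a simulation) shows that the free energy difference of the phases is just the log-ratio of the two phase weights:

import numpy as np

beta = 0.7
M = np.arange(-9, 10, 2)                    # the M-macrostates of a 3 x 3 lattice
P = np.exp(-0.05 * (np.abs(M) - 9.0) ** 2)  # hypothetical double-peaked P(M)
P /= P.sum()

P_A = P[M > 0].sum()                        # phase A: M > 0
P_B = P[M < 0].sum()                        # phase B: M < 0
dG = -np.log(P_A / P_B) / beta              # G_A - G_B, by eq. 1.14
print(dG)                                   # zero here, since the stand-in P(M) is symmetric

The practical difficulty, to which much of this thesis is devoted, is that a Boltzmann-sampling simulation almost never visits the improbable macrostates between the peaks, so the relative normalisation of the two peaks of P(M), and hence this free energy difference, is precisely what it fails to measure.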
At phase coexistence, P_A^can(β, H_coex) = P_B^can(β, H_coex) = 1/2, so

Z_A(β, H_coex) = Z_B(β, H_coex)

and

G_A(β, H_coex) = G_B(β, H_coex)

demonstrating the fundamental result that the condition for a phase transition is that the specific Gibbs functions of the two phases should be equal. However, the shape of P^can(β, M) means that the quantities G_A and G_B can be related to the basic free energies G(β, H) and F(β, M):

Z_A(β, H_coex) = Z_B(β, H_coex) = Z(β, H_coex)/2

so

G_A(β, H_coex) = G_B(β, H_coex) = G(β, H_coex) + (1/β) ln 2

that is,

g_A(β, H_coex) = g_B(β, H_coex) = g(β, H_coex) + O(1/N)

so to within a correction which vanishes in the large N limit, g_A and g_B are also equal to the specific Gibbs function of the system considered as a whole.
The narrowness of the peaks of P^can(β, M) is also responsible for the increasingly abrupt change in <M> at the transition. Consider the case when the specific Gibbs functions of the two phases are not quite equal, because of a slight change δH in the applied field from its equilibrium value. Since only states very near to ±M* contribute, we have

P_A^can/P_B^can ≈ exp(2βδH M*)    (1.15)

If β > β_c, then M* > 0 and M* ≈ <M> ∝ N. Therefore any macroscopic δH will cause P_A^can/P_B^can → 0 or ∞ depending on its sign, or, put differently, we can make P_A^can/P_B^can < ε by applying

δH = ln ε / (2β<m>N)

Therefore we can see the `jump' in the order parameter emerge from the exponential dependence of the probability of a phase on the size of the system, though of course if δH is very small there may be metastability problems. If β < β_c, then <m> = 0 and the response to δH is smooth.
The fact that P^can(β, M) is so sharp about its maxima also enables us to write the equilibrium condition in another way. If, under the action of field H, the macrostate of maximum probability in phase A is M*(H), with probability P^can(M*), then P_A^can ≈ √N P^can(M*). Thus even if we constrain M = M* we can define

g̃_A(β, H) ≡ (1/N)(F(β, M*) − HM*)
          = −(1/βN) ln[exp(−βF(β, M*) + βHM*)]
          ≈ −(1/βN) ln[(c/√N) Σ_{M∈A} exp(−βF(β, M) + βHM)]
          = g_A(β, H) + O(ln N / N)

where c in the third line is a constant of order unity. Thus in the large N limit

L^{−d}(F(β, M*(H)) − HM*(H)) → g_A(β, H)    (1.16)

So even a knowledge of F(β, M*(H)) determines the Gibbs free energy of the entire phase, if we also know M*(H). A practical method that uses this to determine the phase coexistence field H_coex (where this is unknown) is the double-tangent construction. This is described (in the context of an off-lattice system) in appendix B.
As the temperature increases, the dominance of the energy over the entropy becomes weaker and m* becomes smaller. Finally, at β_c, the first order phase transition disappears. The scaling of F_L(β_c, M) is somewhat unusual and is discussed in appendix A. It has the property that P^can(0)/P^can(M*) = constant (≈ 0.04), independent of L, but the two-peak structure remains, leading to large (almost extensive in the system size) fluctuations in the order parameter M.
1.1.4 Off-Lattice Systems
Let us now extend the discussion of phase behaviour from the Ising model to the more familiar melting and boiling transitions. Once again we model the material under consideration as a system of particles with some position-dependent potential energy acting between them, though the potential will usually be appreciably more complicated than in the Ising case. A clear difference is that this time the position coordinates may take a continuous range of values; such systems are known as off-lattice systems, as opposed to lattice-based systems like the Ising model. The total number of microstates is therefore not denumerable. However the same analysis carries over in slightly modified form: P^can(σ) and P^can(E) etc. become probability densities, and we write the partition function (here at constant volume) as a configurational integral:

Z(β, V) = ∫_V exp(−βE(σ)) dσ    (1.17)

where E(σ) is the configurational particle-particle energy and the shorthand ∫dσ refers to the Nd-dimensional integral over the coordinates of all N particles in a d-dimensional space. The canonical averages are also defined as integrals in the obvious way.
To examine the phase behaviour it proves best to analyse this model in an ensemble with V allowed to vary, controlled by an external pressure field p, since this corresponds to real phase coexistence where the pressure is the same in the two phases. Therefore the total energy of a configuration is E_TOT = E(σ) + pV, where pV is the work that the system has to do against the external pressure p to reach volume V. It then follows from 1.2 that the p.d.f. of σ and V is

P^can(σ, V) = exp(−βpV) exp(−βE(σ)) / Z_p

with the partition function

Z_p = ∫_0^∞ dV exp(−βpV) ∫_V dσ exp(−βE(σ)) = ∫_0^∞ exp(−βpV) Z(β, V) dV

where Z(β, V) is the constant-(β, V) partition function defined in equation 1.17. The logarithm of Z_p is related to a Gibbs free energy

G(β, p) = −(1/β) ln Z_p    (1.18)

and as before we define a related intensive free energy density, g(β, p) = G(β, p)/N, and a free energy functional

F(β, V) = −(1/β) ln Z(β, V)

so that

exp[−βG(β, p)] = Σ_V exp[−βpV − βF(β, V)]

and the free energy of a phase is

exp[−βG_A(β, p)] = Σ_{V∈A} exp[−βpV − βF(β, V)]

where A defines the set of volumes characteristic of a phase; and

P^can(β, V) = exp(−βpV) exp(−βF(β, V)) / Z_p    (1.19)

We remark immediately that while F(β, V) will often have a double-well structure it will not in general possess the symmetry of the Ising model's F(β, M), which will lead to phase transitions occurring at p ≠ 0, whereas all the Ising model's occur at H = 0.
To expand on this, let us begin by considering a typical solid-liquid-vapour phase diagram of a continuous system, as shown in figure 1.8, and comment on it using the formalism we set up in the previous section, comparing it with the Ising model's phase diagram (figure 1.2). By examination of the energy functions, we identify the external field energy terms −HM (Ising) and pV (continuous) and pair off the corresponding intensive variables H (Ising) with p (continuous), and the extensive variables M (Ising) with V (continuous).

The analogy with the Ising model is clearest in the liquid-vapour part of the continuous system's phase diagram, so let us ignore the solid phase for now. The (β, p) and (β, H) diagrams both show a coexistence line which ends in a critical point; at high β there are two phases (liquid and vapour, or the +M* and −M* phases), while at β < β_c there is only one. The (β, V) and (β, M) diagrams both show a U-shaped coexistence region with its axis roughly along the temperature axis.
Figure 1.8. Schematic (β, p) and (β, V) phase diagrams. CP is the critical point, TrP the triple point and TrL the triple line.
(The convention of drawing the temperature axis vertical for the continuous system and horizontal for the magnet obscures the similarity a little.) Nevertheless there are differences in detail: the magnetic system has a clear symmetry about H = 0 in its phase diagrams that the continuous system lacks. As we have said, this is a result of the fact that there is a complete correspondence of microstates between the two phases of the magnet; for each microstate with positive magnetisation there exists another, its `photographic negative', with negative magnetisation. In the fluid there is no such symmetry between the liquid and vapour phases.

The statistical mechanical description is also very similar. Once again, the requirement that the response function ∂V/∂p should be extensive for each phase, while still being related to the fluctuations of the order parameter, leads to the result that P_L^can(p, V) becomes sharply peaked about its mode (or modes, very near coexistence), with a width such that δV/V ∝ N^{−1/2}. The free energy functional F(β, V) for the continuous system is qualitatively similar in shape to F(β, M) for the magnet. At high temperatures entropy dominates and there is only one fluid state, a gas, for which the energy is high, because the particles are widely separated, but so is the entropy, because each particle can explore the whole volume of the system. At low temperatures the energy term becomes more important and F_L(β, V) develops two regions which are locally convex, one centred on states with high volume (and thus high entropy but also high energy) which form the low-density vapour phase and the other centred on states with low volume (and thus low entropy but also low energy) which form the high-density liquid phase.
CHAPTER 1. INTRODUCTION
25
Nevertheless, the vapour-like states still have the lower F and thus, at p = 0, enormously higher weight. However, the convexity of F_L(β, V) means that by imposing a suitable finite p_coex it is possible to produce a P_L^can(β, V) that has two separated maxima such that

Σ_{V∈A} P_L^can(p_coex, V) = Σ_{V∈B} P_L^can(p_coex, V)

clearly implying

G_A(β, p_coex) = G_B(β, p_coex)
which is just the same as was derived for the Ising model. The narrowness of the modes means that analogues of equations 1.15 and 1.16 also apply. Equation 1.16 can be used to estimate p_coex by the double-tangent construction even if only a part of F(β, V) is available; see appendix B.
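The equal-weights condition can likewise be turned into a root-finding problem for p_coex. A minimal sketch (ours; it reuses the toy F(β, V) of the previous sketch, and the bracketing interval passed to brentq is an assumption) is:

```python
# Hedged sketch: adjust p until the two peaks of P_can(p, V) carry
# equal weight, which is the coexistence condition derived above.
import numpy as np
from scipy.special import logsumexp
from scipy.optimize import brentq

beta = 1.0
V = np.linspace(50.0, 200.0, 1000)
F = 1.0e-6 * (V - 80.0)**2 * (V - 160.0)**2     # same toy F(beta, V) as before
in_A = V < 120.0                                # macrostates assigned to phase A

def weight_imbalance(p):
    log_w = -beta * p * V - beta * F
    # ln W_A - ln W_B: zero when the two phases have equal weight
    return logsumexp(log_w[in_A]) - logsumexp(log_w[~in_A])

p_coex = brentq(weight_imbalance, -1.0, 1.0)    # bracket assumed to straddle the root
print("estimated coexistence pressure:", p_coex)
```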
We digress briefly to mention that there has been some controversy as to where the analogue of the transition should be regarded as occurring in a finite-size system: whether it should be when the two peaks of P_L^can(V) have equal weights, or when they have equal heights. The controversy arises from the asymmetry between liquid and vapour, which causes the two peaks of P_L^can(V) to have different shapes; indeed, because of this asymmetry, the phase transition in this type of system is known as an asymmetric first-order phase transition. Both criteria are compatible with the expression just given for the transition point in the infinite volume limit: the values of the field variables required to produce both equal heights and equal weights approach the same limits. However, the difference is important when trying to identify the transition point of a finite system in a computer simulation, and both methods have been used: the authors of [20] favoured the equal-weight criterion while those of [21] used equal height. The problem is discussed in [17] and [22]. Recently it has been established [23] that, for lattice-based systems, using equal weights will give estimators of the control parameters at coexistence that have smaller discrepancies from the infinite-volume limit. However, it is not clear that the analysis applies to off-lattice systems.
As before, the region of low probability between the peaks is dominated by states exhibiting phase coexistence. As for the Ising model, these states are unstable unless the order parameter is constrained. Equation 1.12, and the appropriate analogue of equation 1.11, also apply.
Let us now consider the solid-fluid phase transitions, which are a new feature of the continuous system's phase behaviour. In the solid phase the energy is near its minimum, since the
highly ordered structure maximises the number of close neighbours that a particle can have, but the entropy is also low because each particle is held in the `cage' formed by its neighbours and can move only in a small volume around its lattice site. As can be seen from the p–T phase diagram, if we begin from low temperature and pressure then at first there is a solid-vapour coexistence curve, which has a junction with the liquid-vapour coexistence curve at the triple point (marked TrP in figure 1.8) where solid, liquid and vapour can all coexist (i.e. all have the same specific Gibbs free energy). After this a solid-liquid coexistence curve continues upward. It is generally called the solid-gas coexistence curve once the pressure is higher than p_c, since beyond this point heating a dense fluid at constant p will not produce another phase transition to a less dense fluid.
An obvious qualitative difference between the solid-fluid transitions and the liquid-vapour transition is that the solid-fluid transitions do not end in critical points (at least, no experimental evidence for such behaviour has ever been found). This seems to be because the solid and fluid have qualitatively different structures: in the solid there is clear long-range crystalline order and particles are localised on their lattice sites, while in the liquid and vapour there is no such order and particles may wander throughout the volume of the system. It is hard to envisage a mechanism which could allow one type of structure to merge continuously into the other, as would have to happen at a critical point. It is interesting to note that the liquid and vapour do have qualitatively identical structures in this sense, even though the differences in bulk properties such as density between the phases may be very large, far larger than the difference in the same property between the solid and the liquid.
Though the absence of a critical point means that the order parameter description of the transition does not carry over in all its details from the fluid case, the usefulness of the concept means that one is often defined. The volume (or density) of the system will once again serve, since the solid and fluid always have different densities. Analysed like this, the solid-fluid phase transition looks very much like the liquid-vapour one: the probability P_L^can(p, V) at a suitable pressure again has an (asymmetric) double-peak structure, the transition occurs when the specific Gibbs function g is the same for both solid and fluid, and the region between the maxima is dominated by mixed phases. There are also other possible order parameters that are more closely linked with the obvious difference in structure of the two phases, related to the average number of nearest neighbours or to the structure factor of the system [24].
1.2 Calculation in Statistical Mechanical Problems
All exact science is dominated by the idea of approximation.
BERTRAND RUSSELL
The formalism of statistical mechanics that we have described provides in principle for the solution of the problem we stated in section 1.1.1: that of finding which phase is stable at particular values of the control parameters, and in particular of finding where any phase transitions occur. We need a model for the potential, of course, but having this we have expressions for the partition function, canonical averages and the weights of phases in terms of this potential, and these tell us which phase is favoured at particular values of the control parameters. Knowing this we can in principle find those values of the control parameters where both phases have equal weights, and so can construct the phase diagram. However, evaluation of these expressions, consisting as they do of sums over all the configurations of the system, is in practice extremely difficult, and can only be carried out for a few particularly simple forms of E(σ), whether we are dealing with a continuous or a discrete space of configurations. The subject of this thesis is of course the use of computational methods in the evaluation of the necessary expressions, and we shall give a basic overview of computer simulation in sections 1.2.2 and 1.2.3, as an introduction to chapter 2, where we review in depth the various ways that the phase coexistence problem has been tackled computationally. But first let us look at analytic methods.
1.2.1 Analytic Methods
There do exist analytic methods for evaluating the partition function and canonical averages of statistical mechanics: there are some `exact' results and some approximate methods of general applicability. However, as we shall see, most of these approximate methods fail in the situation that is of interest to us, viz. the calculation of the location of phase boundaries.
Non-interacting systems, i.e. systems in which Z can be expressed as a product of single-particle partition functions, are generally soluble with ease, but they are not of much physical interest. A number of exactly soluble systems which do contain particle-particle interactions have been found. The 2-dimensional Ising model in zero field is one example [11]; various `embroidered' extensions of this can also be solved, see [12]. Some properties of the Potts
model, a generalisation of the Ising model, can also be found analytically [25, 26]. The other exactly soluble models, for example the 6-vertex model, tend not to provide much further insight into `real' systems; for more details of their behaviour see [26] and [27]. The amount that is known about these systems varies depending on the model; for example g(β, 0) is known at all temperatures for the Ising model but is only known for the Potts models at their phase transition temperatures. All the exactly soluble models are lattice models having only one or two spatial dimensions, and they are solved by transfer matrix techniques [26, 27], which are not applicable to other systems. The difficulty with exact solution is that we must calculate how many configurations there are with a particular energy, density or magnetisation, but because the number of configurations is so vast, and the potential describing the interaction couples together the degrees of freedom of the particles^5, this calculation quickly becomes a problem in combinatorics which is soluble analytically in only a few special cases.
Mean Field Theory is an attempt to deal with an interacting system by effectively reducing it to a non-interacting one. We do this by considering a single particle and treating its interactions with all the other particles in the system by averaging them out so that they form a continuous `background' field. The `background' field depends on bulk properties that we do not know, but we can solve the problem by enforcing self-consistency: we calculate the properties of the single particle that are produced by a particular `background' field, and then demand that these properties would, if extended to all the other particles, be such that they would produce the desired background field. In [6, chapter 5] and [3, chapter 20] the 2d Ising model with H = 0 is treated in this way. The successes and deficiencies of the technique are immediately apparent: in one dimension it is qualitatively wrong, while in 2d and 3d it predicts qualitatively the right behaviour but its answers are wrong in detail (the critical temperature is too high). The reason for this lies in the fact that the effects of fluctuations and cooperative phenomena are ignored; the results of the technique are therefore best at low or high temperatures and worst at phase transitions, especially continuous ones. Moreover, there is no way of extending the theory systematically to obtain successively better results (although there are heuristic criteria applicable in particular circumstances). It is interesting that the performance of Mean-Field theory gets better as the spatial dimensionality d of the problem increases; in four or more dimensions it is essentially exact for the Ising model. This occurs because the number
5 All physically interesting potentials have at least pairwise interactions (E = Σ_{i<j} E_ij) and may also contain three-body, four-body, etc. terms.
of `nearest neighbours' of any one particle in particular increases with d, and so the size of fluctuations as a fraction of the mean field diminishes.
Series expansions [16, chapter 4], [28] result from perturbation techniques; the partition function is expanded as a series in some convenient parameter about a state where it is known exactly. For example we can expand about T = 0 in T (low temperature series), or about β = 0 in β (high temperature series), or about ρ = 0 in ρ. These can equally be regarded as expansions about the fully ordered state(s), or the entirely random states. Series expansions for simple energy functions like those of the Ising or Potts models have (with great effort) been continued to upwards of 40 terms [29]. With careful analysis (and using techniques which were developed using insight obtained from other methods) they can give useful results even around phase transitions, but convergence tends to be slow, and results are not better than those of computer simulation even for the longest series. This is unsurprising, because the expansion is about the temperatures/densities where a particular phase is at its most stable, which is inevitably distant from the phase transition point, where it is becoming unstable. The expansion parameter thus cannot be `small' and many terms will be required. For complex energy functions the expansion can only generally be carried to a few terms and results are correspondingly poorer still.
The renormalisation group (RG) ([9, volume 6], [30, 31], and [6, chapter 5] for an introductory reference) is a method of evaluating the partition function in stages, summing over some of the degrees of freedom at each stage and then defining a `renormalised Hamiltonian' on the remaining ones that in some sense `looks like' the one that existed at the previous stage. This defines a mapping which, with appropriate approximations, leads to results for the free energy and canonical averages (see [30]). The process is in principle exact, but quickly becomes extremely complicated to carry out. It was developed to deal with the critical region, where the self-similar physical structure of the system when viewed on different length scales is echoed by the similar structures of the sequence of renormalised Hamiltonians. We note at this point that analytic RG calculation can be supplemented by a powerful numerical technique called the Monte Carlo Renormalisation Group [32].
Fluid systems can often usefully be treated by methods which aim at producing an equation of state, the equation relating p, β and ρ. One approach is approximate solution of the Ornstein-Zernike equation, which links the potential function to the radial distribution function g_12(r) (defined in section 4.3.5). From g_12(r) an equation of state may be derived. For
a general account see [33]; for a particular method of approximate solution, the Percus-Yevick closure, see [34]. Results are often good for simple fluids like the hard-sphere fluid, but the best schemes are not systematically improved but are the result of heuristic combination of results obtained by using different approximations. Moreover, g_12(r) is very different in the liquid and vapour phases, and so an approximation scheme that describes one phase well is not good for the others. The consequence is that no single model predicts phase transitions; they must be treated by using a different model for each phase and estimating the free energy by integrating the equation of state (assuming we have a reference free energy for the liquid state from some other source). Another approximation scheme is the method of cluster expansions [35], which relates the potential function to functions known as f-functions that describe the interactions amongst successively larger groups of particles. The equation of state can then be expanded in terms of integrals involving f-functions. However once again the method becomes very complicated when taken beyond the lowest orders.
1.2.2 Monte-Carlo Simulation
Given that the partition function and canonical averages cannot be calculated analytically for the vast majority of systems of interest, can they be calculated, or at least estimated, by computer? It is indeed the case that they can, and there are at least two broad approaches to the problem. In both cases a model of a system of particles interacting with the potential of interest is set up in the computer. Clearly the number of particles N that is found in macroscopic samples of a material, O(10^23), is out of the reach of simulation, but in fact it is frequently found that the macroscopic properties are observed at least qualitatively, and often quantitatively as well, for N = O(1000) and sometimes even for N = O(100) or less. Moreover, techniques for the extrapolation of results from finite samples to the thermodynamic limit are now well developed (see [17] and appendix A), even where the finite size of the computer simulation has its greatest effect, that is, near continuous phase transitions.
There are two broad approaches to computer simulation in condensed matter physics: molecular dynamics (MD) and Monte-Carlo (MC) simulation. In molecular dynamics simulation, the force produced on each particle by the potential is calculated and the particles are moved in accordance with the Newtonian equations of motion. However, doing this immediately moves us away from the attractive simplicity of the absence of kinetics in the statistical mechanical picture of many-body systems. This picture is the foundation of the Monte-Carlo method, which
will be our concern throughout the thesis. An introductory reference to molecular dynamics, which we shall not discuss any more, is [36]; more detail can be found in [33, 37, 38]. There are also various methods that attempt to combine the best features of the two, for example `hybrid Monte-Carlo' [39, 40] and constant-temperature molecular dynamics, the `Nosé-Hoover thermostat' [41].
The MC method is basically a technique for evaluating multidimensional integrals; in this case the integrals (or sums) that define the partition function and canonical averages. There are a great many general references to the method; see for example [42, 43, 44]. A specific reference to the problem of simulating fluid systems is [45], and phase transitions are discussed in [46] (mainly for lattice models) and [47, 48] (mainly for off-lattice systems).
For a system with a discrete configuration space, one might at first imagine that one could evaluate the partition function directly by simply summing exp(−βE(σ)) over all the configurations of the system (we neglect field terms for now; they do not affect the argument). However, because the number of possible configurations increases exponentially with increasing system size, it passes out of the range of even the fastest computer quite soon after it passes out of the range of calculation by hand. The same problem prevents the use of normal numerical integration routines to find ∫ exp(−βE(σ)) dσ in a system with a continuous configuration space. The dimensionality of the problem is Nd, where N is the number of particles and d is the dimensionality of space. A numerical integration routine would require the evaluation of the integrand exp(−βE(σ)) at a `grid' of points finely spaced enough for it to be smooth [49]. However, if the grid is such that there are m_p points along each coordinate axis, then the integrand must be evaluated m_p^{Nd} times, which once again grows exponentially with N. For example, in 3d with 10 particles and a grid with just 10 points along each axis, 10^30 evaluations would be required; this is already out of the range of computability even though the choice of 10 points along each axis would be far too small if the interparticle potential were realistic and the system were fairly dense. A finer grid would be required because the integrand would be zero almost all the time, whenever there was at least one `overlap' between the repulsive `hard cores' of the particles (corresponding to the strong repulsive forces produced by inner-shell electrons in real atoms and molecules), and for those few points on the grid where there were no overlaps the integrand would vary too sharply for a numerical integration routine to give an accurate picture of its behaviour.
Obviously, then, we cannot exhaustively enumerate the state space, so to proceed we must
take a sample of configurations and use them to estimate the quantities of interest. This sample, while it may well with modern computing power contain millions of configurations, will nevertheless include only a tiny fraction of the configurations that exist.
The simplest way to produce the sample would be to generate the configurations at random, so that all microstates are generated with equal probability. This is known as `Simple Sampling'. For example, we might make a configuration of particles by generating the x, y and z coordinates of each particle with a uniform probability on [0 ... L], where L is the length of the side of the box containing the particles; or we might make a configuration of Ising spins by choosing each spin randomly in the +1 or −1 orientation. Suppose we generate N_c configurations. Then

Z =_{e.b.} Z̃ = (Ω_TOT/N_c) Σ_{i=1}^{N_c} exp(−βE(σ_i))

⟨O⟩ =_{e.b.} Õ = [Σ_{i=1}^{N_c} O(σ_i) exp(−βE(σ_i))] / [Σ_{i=1}^{N_c} exp(−βE(σ_i))]     (1.20)
where `=_{e.b.}' means that the right hand side is an estimator of the left. It is indeed true that ⟨Z̃⟩ = Z and ⟨Õ⟩ = ⟨O⟩. However, they are not good estimators in most cases, because N_c must be extremely large, O(Ω_TOT), before it is at all likely that Z̃ ≈ Z and Õ ≈ ⟨O⟩. The problem is the same one that was described above when talking about numerical integration: the vast majority of the states in the sample are likely to have a very high energy (because they contain at least one overlap) and so a very small value of exp[−βE(σ)]. As we discussed in section 1.1.3, the expressions for Z and the canonical averages are dominated by these high energy configurations only at very high temperatures; at more normal temperatures the configurations that are important have lower energy, but because there are so few of them compared to the high energy ones they are generated by the simple sampling technique only at extremely lengthy intervals (and intervals whose length increases exponentially with L^d).
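The failure of simple sampling is easy to demonstrate numerically. The sketch below (ours; the tiny lattice, the seed and all names are illustrative) implements the estimators of equation 1.20 for a small 2d Ising model; even at L = 4 the weights span many orders of magnitude, and for larger L the sums are dominated by a handful of rarely-generated configurations.

```python
# Hedged sketch of `Simple Sampling' (equation 1.20) for a tiny
# 2d Ising model with uniform random microstates.
import numpy as np

rng = np.random.default_rng(0)
L, beta, N_c = 4, 0.6, 100000

def energy(s):
    # nearest-neighbour Ising energy, periodic boundaries, J = 1
    return -(np.sum(s * np.roll(s, 1, axis=0)) + np.sum(s * np.roll(s, 1, axis=1)))

E = np.empty(N_c)
M = np.empty(N_c)
for i in range(N_c):
    s = rng.choice([-1, 1], size=(L, L))        # uniform random microstate
    E[i], M[i] = energy(s), s.sum()

w = np.exp(-beta * (E - E.min()))               # weights, shifted to avoid overflow
Z_tilde = 2.0**(L * L) * np.mean(w)             # estimates Z * exp(beta * E.min())
O_tilde = np.sum(np.abs(M) * w) / np.sum(w)     # <|M|> estimator of equation 1.20
print(Z_tilde, O_tilde)
```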
We see that what is required is a biased sampling scheme that generates much more frequently those configurations that dominate Z and the canonical averages. This is known as importance sampling. Suppose we have an algorithm that generates microstates such that state σ_i appears with probability P_i = P(σ_i). Then we call the vector P_i, i = 1 ... Ω_TOT, the sampled distribution. (We shall employ a vector notation P for {P(σ)} here and later (especially in chapter 3), and for the set of macrostate probabilities.)
An obvious choice of sampled distribution that satisfies our requirements for most statistical mechanical problems (with the major exception of the phase coexistence problem) is the Boltzmann distribution itself: we call a simulation that samples this distribution a Boltzmann sampling simulation. However, it is not at first obvious how to do this, because we do not know the partition function Z. The problem is the same for any other choice of sampled distribution; we will in general have

P(σ) = Y(σ)/Z

where the function Y (∝ exp(−βE) for Boltzmann sampling) is easily specified and calculated for any configuration, but its normalisation Z = Σ_{σ} Y(σ) is unknown and hard to calculate because it involves a sum over all configurations. The known function Y to which the unknown probability distribution is proportional is called the measure of the probability distribution, an expression coming from the statistics literature [50, 51].
A solution, which notwithstanding its age is still the basis of most MC work done in condensed matter physics, is the Metropolis algorithm [52]. The central idea of this method is to generate a sequence of microstates by evolving a new trial microstate σ_j from the existing one σ_i by making some (usually small) random change to it. Now we arrange, in a way described below, to move to this new state with a probability P(σ_i → σ_j) = π_ij. If the transition is accepted, σ_i is replaced by σ_j, which is then itself used as the source of a new trial microstate. Otherwise we try again, generating another trial microstate from σ_i.
π_ij is called a stochastic matrix, meaning that it has the property that Σ_j π_ij = 1. Its size is Ω_TOT × Ω_TOT. Note that π_ij depends only on the present state i (σ_i) and the trial state j, not on any other state that the chain has passed through in getting to i. This property is called the Markov property and the sequence of configurations produced is called a Markov chain. π_ij can be thought of as describing the time evolution of the probability distribution of the microstates. Suppose we have P_i(0) at time t = 0 (if we know exactly that we are in state i_0 at this time then P_i(0) = δ(i − i_0)). Then at t = 1 we have

P_i(1) = Σ_j P_j(0) π_ji     (1.21)
Of particular interest is the (left) eigenvector of π_ij with eigenvalue 1, which we write simply
as P_i. It is the vector of equilibrium microstate probabilities:

P_i = Σ_j P_j π_ji     (1.22)
P represents what is called the stationary probability distribution of the chain: the probability distribution of microstates that is invariant under the action of π, and so remains unchanged once it is reached. An extremely important property of a Markov chain is that it can be shown [50] that the sampled distribution converges to P as n → ∞ for any choice of initial state (i.e. any P_i(0)) as long as π is such that the chain is ergodic, meaning that any state can be reached from any other in finite time. In many situations the convergence is rapid. This means that we can use the Markov chain with transition matrix π as a way of producing microstates with probability distribution P. We must perform `equilibration', discarding the early configurations, which are unlikely to be typical of equilibrium, while the sampled distribution converges^6 to P; then after that, configurations will indeed be drawn with probability P.
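A toy illustration of this convergence (ours; the 3-state stochastic matrix is arbitrary, but ergodic) is:

```python
# Hedged sketch: repeated application of equation 1.21 drives any
# initial distribution to the stationary one of equation 1.22.
import numpy as np

pi = np.array([[0.9, 0.1, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.2, 0.8]])        # rows sum to 1; chain is ergodic

P = np.array([1.0, 0.0, 0.0])           # P(0): start with certainty in state 0
for _ in range(1000):
    P = P @ pi                          # P_i(t+1) = sum_j P_j(t) pi_ji
print(P)                                # stationary distribution, independent of P(0)
```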
Now we must consider how to choose π to produce the desired probability distribution P. There are many possible solutions, because there are Ω_TOT^2 components in π and only Ω_TOT constraints coming from the components of P. We normally choose to observe detailed balance, which means taking

P_i π_ij = P_j π_ji

with

π_ij = { R_ij             if P_i < P_j
       { R_ij P_j/P_i     otherwise
where R_ij is an arbitrary symmetric matrix; its significance is discussed below. That this choice satisfies equation 1.22 can be easily verified by substituting for π_ji. The power of the method comes from the fact that the components of P (the equilibrium probabilities) enter only as ratios, so that we have overcome the problem of the unknown partition function: it simply cancels out, and we need only the measure of the distribution: P_j/P_i = Y_j/Y_i. For example, to sample from the Boltzmann distribution we choose:
6 It is not possible except by observation to say when equilibration has occurred; in practice this is often a problem where equilibration is slow, as occurs near phase transitions.
π_ij = { R_ij                         if E_j < E_i
       { R_ij exp[−β(E_j − E_i)]      otherwise     (1.23)
The matrix R_ij describes which other microstates are accessible from a given microstate, so it is determined by the particular choice of algorithm for generating the trial move. Another way of putting this is to say that R determines the `pseudodynamics' of the simulation (as we have seen, the physical dynamics of the system are not a part of a MC simulation). R is normally extremely sparse, because we choose the new configuration by making only a small modification to the present one: for example the single-spin-flip Metropolis algorithm, where microstates σ_i and σ_j differ only in the movement of a single particle or the flipping of a single spin. Most microstates are therefore inaccessible in one move. We are forced to make this choice because, if we allow a trial transition to any other configuration, we shall once again be sampling the states of highest entropy, so it is almost certain that the energy of the trial configuration will be very much higher than that of the starting one. Therefore, the acceptance ratio of Monte-Carlo moves, r_a, becomes exponentially small. We discuss the effect that this has below and in section 1.2.3.
We implement the algorithm in practice in two stages: first we select a trial move from σ_i to σ_j by a process for which R_ij is symmetric (such as a flip of a random spin or a random small displacement of a randomly chosen particle), then we evaluate the ratio Y_j/Y_i. If this ratio is greater than one we accept the move; if it is less than one we accept it with probability Y_j/Y_i. This is normally done by generating a pseudorandom variate X [49, chapter 7] on the interval [0 ... 1] and accepting the move if X < Y_j/Y_i.
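A minimal sketch of this two-stage procedure, for single-spin-flip Boltzmann sampling of the 2d Ising model, might read as follows (the code is ours; the lattice size, β and seed are illustrative):

```python
# Hedged sketch of the two-stage Metropolis update for the 2d Ising model.
import numpy as np

rng = np.random.default_rng(1)
L, beta = 16, 0.5
s = rng.choice([-1, 1], size=(L, L))            # random initial configuration

def metropolis_sweep(s):
    for _ in range(L * L):
        # stage 1: trial move -- flip one randomly chosen spin (R_ij symmetric)
        i, j = rng.integers(0, L, size=2)
        nn = s[(i + 1) % L, j] + s[(i - 1) % L, j] + s[i, (j + 1) % L] + s[i, (j - 1) % L]
        dE = 2.0 * s[i, j] * nn                 # energy change if the spin is flipped
        # stage 2: accept if Y_j/Y_i = exp(-beta*dE) exceeds a uniform variate X
        if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
            s[i, j] = -s[i, j]

for sweep in range(1000):                       # equilibrate, then measure
    metropolis_sweep(s)
print("magnetisation per spin:", s.mean())
```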
Let us first consider a Boltzmann sampling simulation. Canonical averages are obtained very simply in this case:

⟨O⟩ =_{e.b.} Ō = (1/N_c) Σ_{i=1}^{N_c} O_i

O_i is the result of applying the operator O to the i-th microstate of the N_c generated (by this we mean that we generate N_c trial updates of the Markov chain; not all of them will be accepted, so sometimes the same microstate will reappear several times in succession). In practice, it is more efficient, particularly for a long chain, to store a histogram {C} of the number of times
the chain visits each of the j = 1 ... N_m macrostates of every operator O of interest. Then

⟨O⟩ =_{e.b.} Ō = (1/N_c) Σ_{j=1}^{N_m} C_j O_j
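In code, the histogram bookkeeping amounts to the following sketch (ours; a random stand-in replaces the Markov chain, since the point here is only the accumulation of {C} and the final weighted sum):

```python
# Hedged sketch of the histogram form of the estimator.
import numpy as np

E_levels = np.arange(-512, 513, 4)          # energy macrostates of a 16x16 Ising model
C = np.zeros(E_levels.size, dtype=int)      # visited-states histogram {C}

rng = np.random.default_rng(2)
chain = rng.choice(E_levels[:40], size=10000)   # stand-in for the visited energies
for E in chain:
    C[np.searchsorted(E_levels, E)] += 1    # one count per Markov-chain update

O_of_E = E_levels.astype(float)             # e.g. the operator O = E itself
O_bar = (C @ O_of_E) / C.sum()              # O_bar = (1/N_c) sum_j C_j O_j
print(O_bar)
```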
Because the MC method uses random numbers to generate a sample of the configurations of the simulated system, the estimates of thermodynamic quantities that it produces are associated with a statistical error. We consider this error in detail in appendix C. Here we confine ourselves to quoting some of the important results: full derivations of them can be found in this appendix and [53].
We quantify the statistical error in our estimate using the Standard Error of the Mean

(ΔO)^2 = ⟨(Ō − ⟨O⟩)^2⟩

If all configurations were uncorrelated, this would be simply related to the variance of O (var(O) = ⟨O^2⟩ − ⟨O⟩^2) thus

⟨(ΔO)^2⟩_uncorr = var(O)/N_c     (1.24)
However, because the MC method generates a new configuration by making a small change to the existing one, adjacent configurations in the Markov chain are in practice always highly correlated. The consequence of this is that the standard error of the mean is larger than equation 1.24 suggests; we have

⟨(ΔO)^2⟩_corr ≈ (1/N_c) var(O)(1 + 2τ_O)     (1.25)
where τ_O is the correlation time of the observable O. This can be measured by expressing it in terms of correlation functions. We consider τ_O mainly in section 3.4.2, where we shall use it as an analytic prediction of the variance of estimators from various sampled distributions. However, to measure the standard error of Ō it is easier to use a direct method. To do this we block the configurations into m = 1 ... N_b blocks (O(10) is enough) and estimate Ō on each block [45, chapter 6]. Then we measure the mean and variance of the blocked means {Ō_m}, and use the simple formula

⟨(ΔO)^2⟩ = var(Ō_m)/N_b
since the blocks should be long enough for the block means to be uncorrelated (if they are not, then N_c is not large enough for good results anyway). A variant of this is to define jackknife estimators Ō_J^m on all the blocks of configurations except the m-th, and then to find the mean and variance of these (see appendix D).
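A direct transcription of this blocking procedure (ours; the synthetic uncorrelated series is there only to make the sketch runnable) is:

```python
# Hedged sketch of the blocking estimate of the standard error of the mean.
import numpy as np

def blocked_error(series, N_b=10):
    blocks = np.array_split(np.asarray(series), N_b)
    block_means = np.array([b.mean() for b in blocks])
    # <(Delta O)^2> = var(block means)/N_b, valid when blocks are long
    # enough for their means to be uncorrelated
    return block_means.mean(), np.sqrt(block_means.var(ddof=1) / N_b)

series = np.random.default_rng(3).normal(size=100000)   # stand-in time series of O
mean, err = blocked_error(series)
print(mean, "+/-", err)
```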
We should note that, whatever algorithm we are working with, we must expect to have to update all the particles (or at least a constant fraction of them) to get uncorrelated configurations. This implies that the best we can do is to have τ_O/δt ∼ L^d, where δt is the time for a single update.
It has been the norm in MC simulation to choose the sampled distribution to be the Boltzmann distribution, as described above. However, other sampled distributions can be used, and may in fact be superior, particularly for problems involving phase transitions (for which, as we have mentioned, the Boltzmann distribution does not perform well; in terms of the ideas of correlation times that we have just introduced, we would say that near phase transitions τ_O can be very large indeed). The investigation of alternative (non-Boltzmann) sampled distributions that are better matched to this problem has been the key concern in this thesis. In section 1.2.3 below we describe why Boltzmann sampling gets into trouble; chapters 3 and 4 contain results of investigations of various non-Boltzmann sampled distributions.
1.2.3 Monte-Carlo Simulation at Phase Transitions
Let us consider the ways that statistical mechanics suggests we could try to find the values of the control parameters β and H (or p) that produce a phase transition in a particular system. Section 1.1.3 suggests two ways in which we could try to do this. We shall discuss each in turn and explain why conventional MC methods encounter difficulties in each case. It was shown in section 1.1.3 that the phase behaviour of a system is reflected in the probability distribution of the order parameter, with phase coexistence occurring when the two phases have equal weight. So an obvious approach would be to simulate in an ensemble with a variable order parameter that embraces the two phases, and measure its probability distribution directly, eventually finding the values of the control parameters where the two phases have equal weight. This method can be considered as measuring the free energy difference between the two phases; we saw in equation 1.14 that P_A^can/P_B^can = exp[−β(G_A − G_B)], which implied that at coexistence g_A = g_B = g + O(1/L^d). Note that it is not necessary to know the absolute free energies G_A and G_B themselves.
The other method would be simply to measure the absolute free energies G_A and G_B of the two phases separately, and then compare them. This is particularly attractive if there is a difficulty in crossing the phase boundary, as happens for example in the case of solid-fluid transitions: it requires a very long time for the solid to crystallise out of the fluid, and any crystal formed is likely to contain defects and grain boundaries. We shall show below how the absolute free energy of a phase can also be expressed as a canonical average. As a variation on this, we remark that equation 1.16 shows that it is not necessary to know F(β, V) (to use the off-lattice example) for all V ∈ A to estimate G_A(β, p), as long as V(p) is known. As we have already commented, this is the basis of the double-tangent construction which is described in appendix B. However, we will more often use the calculation of G as a canonical average in this thesis.
Free Energy Differences
Let us examine the first method first, in the context of measurement of P^can(β, M) by Boltzmann sampling for the 2d Ising model. Suppose we have H = 0 and β > β_c, so we are actually at phase coexistence and should find that the two sides of the distribution have equal weight, that is, that there is no free energy difference between the two phases. However, let us imagine that we do not know that H = 0 is the phase coexistence field, only that it is a reasonably close approximation (so that P^can(β, M) does have two maxima), which we are trying to refine.
The diagrams in figure 1.9 illustrate the problems faced by Boltzmann sampling Monte-Carlo. All show the underlying distribution P_L^can(β, M) (it has roughly this shape for all β > β_c) sampled by the simulation, and diagrams A2 and B2 also show the function MP_L^can(β, M), which gives the weight with which each part of the macrostate space contributes to the canonical average ⟨M⟩. We also show some possible data histograms of visited states produced after a short run (the (A) figures) and after a long run (the (B) figures). These histograms would give the estimate for the probability of each phase. The accuracy of our assessment of whether we are at phase coexistence, and of any canonical averages obtained, is clearly limited by τ_rw, the average time required to travel between the peaks. Thus, we see that after the short simulation we have not visited both sides of the distribution. Only after a long run have enough `tunnelling' events between the two sides of the distribution occurred to give us a good idea of their relative weight and to outline the shape of the whole of P^can(β, M). The accuracy obtained is limited not by the total number of configurations generated, which could be millions, but by the number of tunnelling events. The presence of such `bottlenecks' in configuration space can cause the results
[Figure 1.9 appears here: four panels. A1 shows P^can(M) together with a possible data histogram from a short run; A2 shows MP^can(M) and the estimation of ⟨M⟩ from a short run; B1 and B2 show the same quantities for a long run.]
Figure 1.9. P^can(β, M), MP^can(β, M) and possible data histograms for an Ising model at β > β_c.
of even a very long simulation to be very poor. In such a case it is said that the simulation suffers from ergodic problems. Note however that the shape of the distribution within each peak is obtained quite rapidly, so an estimate of ⟨|M|⟩_A within phase A from the short run would be quite accurate, even though the estimate of ⟨M⟩, which depends on both phases, is very poor.
We understand, then, that adequate sampling will only be obtained with a run much longer than τ_rw; but how long is this likely to be? The part of the sampled distribution that has the greatest effect on τ_rw is of course the region of low probability between the peaks, through which the simulation has to pass to get from one peak to the other. At criticality, τ_rw ∼ L^4, because the relative heights of the centre and peaks of the p.d.f. do not change with L. However, at β > β_c we are in the region of first-order phase transitions, and here, as we described in section 1.1.3, to pass through the region around M = 0 we must create an interface between the two phases, with a free energy cost ∝ L^{d−1} f_s. Thus with Boltzmann sampling we must wait on average for a time
τ_M ∝ exp(βf_s L^{d−1})     (1.26)
before a fluctuation in the energy that is large enough to do this occurs. Unless L is very small or β is very close to β_c, this exponential slowing down is so severe that in any run of practicable length the simulation will remain effectively trapped in one phase and never tunnel to the other.
We took as a premise when starting this discussion that the field H was close enough to zero that P_L^can(β, M) has in reality roughly equal weights in the two phases (equation 1.15 shows how sensitive P_L^can(β, M) is to the applied field; for there to be roughly equal weights implies H = 0 ± O(L^{−d}) for the Ising model). In fact, of course, for most systems, particularly off-lattice models, H_coex is not determined by symmetry, and determining it is our major concern. But the analysis we have just given shows that with Boltzmann sampling the estimate of P_L^can(β, M) obtained at the coexistence point is indistinguishable from points in the single-phase region.
We can do no more than put a wide bracket on H_coex: H_coex is certainly less than a field H_h that drives a simulation started in the low-M phase into the high-M phase, where it is then observed to stay, but it is certainly more than a field H_l that allows a simulation started in the high-M phase to pass into the low-M phase, where it then stays. H_l and H_h must be at least far enough from H_coex that F_L(β, M) − H_l M and F_L(β, M) − H_h M are convex everywhere.
We should note that the shape of P_L^can(β, M) causes problems only because the structure of the matrix R_ij that determines the pseudodynamics is such that the simulation has to pass through the region of low probability between the peaks in order to get from one peak to the other. In this respect, the pseudodynamics of the Metropolis MC simulation resemble the dynamics of a real system, which must also evolve by small steps, and the consequences are similar [53]: the ergodic problems of the simulation correspond to the tendency of real systems to exhibit metastability (section 1.1.2). R_ij is, of course, under our control, but naive attempts to improve the algorithm do not succeed: as we explained in section 1.2.2, generating a new configuration by making a large random change to the existing one produces an exponentially small r_a, easily small enough to negate the effect of a larger average change in magnetisation.
For lattice models only, the tunnelling problems can in fact be overcome: there is a class of algorithms called cluster algorithms [55, 56] that are able to generate new configurations with a large ΔM without necessarily incurring a large positive ΔE. Hybrid MC [39, 40] is able to some extent to do the same thing. Whilst these methods are extremely effective, we have
taken the simple Metropolis algorithm as a `given' in this thesis and concentrated instead on improving performance by the use of non-Boltzmann sampled distributions. With a cluster algorithm, Boltzmann sampling is an excellent strategy: the region between the peaks does not contribute much to the weight of a phase or to ⟨M⟩, and time spent there is time that cannot be devoted to sampling the peaks where most of the weight lies. However, with the Metropolis algorithm a very substantial improvement is obtainable by choosing a different sampled distribution that puts more weight between the peaks and so reduces τ_rw, even at the cost of a reduction in the sharpness of the definition of the peaks themselves (see section 4.3 and much of chapter 3). In fact, even after choosing a better sampled distribution, τ_rw remains large, because of the width of P_L^can(β, M); the two important regions of macrostate space, the peaks at ±M̄, are separated by a distance that grows like L^d. In the case where the sampled distribution is roughly flat between the peaks and the same height as them, we still require on average the `random walk time'
τ_rw = (1/r_a) (2M̄/δM)^2     (1.27)
to travel between the peaks, where δM is the average change of magnetisation per accepted move. As we have seen, increasing δM fails to have the desired effect because of a dramatic fall in r_a; it is unusual in MC simulation to settle for an acceptance ratio of less than 1/3. Therefore τ_rw ∼ L^{2d} (= L^4 in 2d). How this relates to τ_O is discussed in section 3.4.2.
Measuring Absolute Free Energies
Faced with the difficulties described above, we may try to avoid the interface region entirely and measure the free energy of each phase separately. In this case we require absolute free energies, so that the two phases can be compared. These can in fact be derived from averages over the Boltzmann distribution of operators which are exponentials: for the Ising model with H = 0, we find
⟨exp(βE)⟩ = (1/Z) Σ_σ exp(βE(σ)) exp(−βE(σ)) = 2^{L^2}/Z

so that

Z = 2^{L^2}/⟨exp(βE)⟩

and

G(β) = −β^{−1}(L^2 ln 2 − ln⟨exp(βE)⟩)
This is in fact G(β) of the entire system, not just of a single phase (because H = 0 is the coexistence line), but it nevertheless serves to illustrate the principle; and if we restricted the algorithm to generate only configurations with M ≥ 0 (say), the result would indeed be the free energy of that phase. However, attempting to implement this method with Boltzmann sampling is again very unsatisfactory. In this case, the problem is not that the sampled distribution produces ergodic problems but simply that the average to be measured and the sampled distribution put their weight in different macrostates. Consider figure 1.10, which shows schematically the situation in the case we have just described, for the sampling of energy macrostates in the 2d Ising model:
[Figure 1.10 appears here: three panels. The left panel shows P^can(E) and a possible histogram; the upper right panel shows O_1(E) and P^can(E)O_1(E) (estimation of ⟨O_1(E)⟩); the lower right panel shows O_2(E) and P^can(E)O_2(E) (estimation of ⟨O_2(E)⟩).]
Figure 1.10. A schematic diagram of a Boltzmann sampling distribution over energy and a suitable (O_1) and an unsuitable (O_2) operator to estimate from it.
The leftmost diagram shows a typical sampled distribution for a Boltzmann sampling simulation and a histogram that might be generated from it. This distribution is well suited to estimating ⟨O_1⟩, where O_1(E) and O_1(E)P^can(E) are shown in the upper right illustration. However, P^can(E) is unsuited to estimating ⟨O_2⟩ (lower right illustration) because O_2(E)
increases so fast with E that P^can(E)O_2(E) has a lot of weight in one of the tails of P^can(E). exp(βE) is, of course, an operator of this kind.
The error in the estimate of P^can(E) is larger in the tails because there are fewer counts, and the effect of this error on ⟨O_2⟩ is magnified by multiplying by O_2. Finally, it normally happens, as shown in the diagram, that P^can(E) becomes so small for states far into the tail that no counts are recorded there, and so no contribution is made to ⟨O_2⟩ even though P^can(E)O_2(E) is still large. This is liable to happen whenever P^can(E) ∼ 1/N_c, which soon arises since P^can(E) ∼ exp(−cL^d) in the tail. Thus MC sampling cannot be used to evaluate averages like ⟨O_2⟩ if the largest values of P^can(E)O_2(E) are produced by the multiplication of a very small value of P^can(E) by a very large value of O_2(E).
This problem can also be ameliorated by using a non-Boltzmann sampled distribution, one that extends over the region where O_2 exp(−βE) is large. We explain how an estimate of the canonical average ⟨O_2⟩ can be extracted from this in section 3.1. Indeed, the Ising problem that we have just described will be the test-bed for an investigation of this `single phase' method in chapter 3. In the Ising case we shall look at O_2(E) ∝ exp(βE); for an off-lattice model, for example a fluid in the NpT-ensemble, we would need to evaluate ⟨exp[β(p − p̂)V]⟩, where p̂ is small.
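The severity of this problem is easily seen in a toy model where the answer is known exactly. Taking P^can(E) to be Gaussian (our choice, purely for illustration), ⟨exp(βE)⟩ has a closed form, and sampling from P^can(E) underestimates it badly:

```python
# Hedged toy illustration of the tail problem for tail-dominated averages.
import numpy as np

rng = np.random.default_rng(4)
beta, mu, sigma = 1.0, 0.0, 5.0
E = rng.normal(mu, sigma, size=100000)          # samples from the toy P_can(E)

estimate = np.mean(np.exp(beta * E))            # naive sampling estimate
exact = np.exp(beta * mu + 0.5 * (beta * sigma)**2)   # exact lognormal mean
print(estimate / exact)   # typically well below 1: the decisive tail is never sampled
```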
1.2.4 Discussion
We have introduced the theory of statistical mechanics and a method of computer simulation, the Monte-Carlo method, that is naturally related to it. We have described the problem of finding the location of phase transitions, and how it relates to the concept of free energy, and we have described two approaches by which the MC method could be used to tackle this problem. We have also explained why MC simulation in its most easily-applied (Boltzmann sampling) form fails for both of these approaches. Our task in this thesis will be to investigate ways in which the difficulties may be overcome by the use of MC simulations which sample from distributions other than the Boltzmann distribution. We shall look at methods for generating and applying these distributions in chapter 3, and shall produce some new results for the behaviour of the 2d Ising model, in particular the p.d.f. of its magnetisation. Then in chapter 4 we shall apply the method to investigate a system of topical interest, the square-well solid. But first we shall review the extremely extensive literature on the problem of free energy measurement.
Chapter 2
Review
And what there is to conquer
By strength and submission, has already been discovered
Once or twice, or several times.
FROM Four Quartets
T S ELIOT
As we saw in chapter 1, the problems of Monte-Carlo simulation of phase transitions, particularly first-order phase transitions, centre around the difficulty of measuring the appropriate free energy or free energy difference. If we keep the order parameter of the transition constant, then we are faced with measurement of a Helmholtz free energy, for which a Boltzmann sampling algorithm is not suitable because it does not sample the high-energy configurations. If we allow the order parameter to vary, then there is a large free energy barrier separating the two phases. Because the simulation's pseudodynamics constrains it to move in short steps through its configuration space, it takes an exponentially long time to cross this barrier.
A large amount of work has already been done on solving these problems. Progress up to 1986 is described in the review articles by Binder [46] and Frenkel [47], while [48] covers some developments, especially for dense liquids, up to 1989. A more recent short review can be found in [57]. Two recent methods, which have been the basis of the work done in this thesis, are the multicanonical ensemble [58] (or [59] for a review) and the expanded ensemble [60].
We shall now give a brief description of the important methods, going over again some of the ground covered in the reviews [46, 59, 47, 48], but also bringing in the newer methods. We
avoid a detailed description of the multicanonical ensemble at this stage; such a description can
be found in chapter 3. We shall divide the methods described into three categories as follows:
1. Methods which find a free energy by expressing it in terms of some other quantity which is more readily evaluated in Boltzmann sampling Monte-Carlo simulation. We shall call these integration-perturbation methods. They are
(a) Thermodynamic Integration
(b) Multistage Sampling
(c) Bennett's acceptance ratio method
(d) Mon's Method
(e) Widom's particle insertion method
We shall also include in this section a description of a relevant related technique:
(f) Histogram methods
2. Methods which sample from a distribution other than the canonical Boltzmann distribution with a constant number of particles. We shall call these non-Boltzmann methods. They are
(a) Umbrella Sampling
(b) The `Multicanonical Ensemble' and its variants. Originally due to Berg and Neuhaus, we describe work by those authors and others (Lee, Rummukainen, Celik, Hansmann and others).
(c) The Expanded Ensemble, due to Lyubartsev et al., called `Simulated Tempering' by Marinari and Parisi. We also describe related methods by Geyer and Neal.
(d) Valleau's Density-Scaling
(e) the Dynamical Ensemble
(f) Grand Canonical Monte-Carlo
(g) the Gibbs Ensemble
3. Others. Mostly these are coincidence counting methods, which try to measure the probability of an individual microstate. They are
(a) Coincidence Counting (Ma's Method)
(b) Local States (Meirovich, Schlijper)
(c) Rickman and Philpot's function-fitting method.
(d) The partitioning method of Bhanot et al.
We shall describe each method individually and discuss its advantages and disadvantages, before discussing and comparing them with one another.
2.1 Integration-Perturbation Methods
2.1.1 Thermodynamic Integration
Thermodynamic integration (TI) may perhaps be considered the standard method; certainly it is the easiest way to perform free energy calculations, because a Boltzmann sampling program that may already exist can often be used with little or no modification. A review of some of the many applications can be found in [47].
It has been the norm to use constant-NVT simulations in these calculations; all the examples given here assume that this is the case. Applied in this fashion (and assuming that the order parameter is constant in this ensemble), the method does not allow the direct determination of the whole probability density function (p.d.f.) of the order parameter (V say), but rather measures F(V) for a particular V. However, there is no reason why it cannot be implemented using constant-NpT simulations, in which case equations analogous to those below (equation 2.1 etc.) lead to G(p). In whatever form, the method relies on the fact that the derivatives of free energies may often be related to canonical averages that are easily measured by Boltzmann sampling, for example

∂(βF)/∂β |_V = ⟨E⟩     (2.1)

which implies

βF(β) − β_0 F(β_0) = ∫_{β_0}^{β} ⟨E⟩ dβ'     (2.2)
(2.2)
To use this equation, we measure < E > by simulation at constant V for a series of
-values connecting and 0 , closely spaced enough that the shape of < E ( ) > is welldetermined. Then the integration is performed numerically, typically by using a 5- to 10-point
Gauss-Legendre quadrature [61]. It is important that the path of integration should not pass through a first-order phase transition; if it does, then ⟨E(β)⟩ will itself be poorly determined because of metastability problems, and it will vary so fast with β that the determination of its shape will be difficult. As a result, the accuracy of the numerical integration will be drastically reduced. β and β_0 are known, so F(β) is determined if we know F(β_0). This state at β_0 must therefore be chosen to be a reference state of known free energy. It usually corresponds to either a very high or very low temperature. In the high temperature limit, at β_0 = 0, we have Z = exp(−βF) = V^N for an N-particle fluid with a soft potential. If the system is a fluid with a hard core in its potential, then the system reduces to the hard sphere fluid, a system for which the free energy is known with good accuracy from analytic calculations [62]. At very low temperatures (high β_0), a solid with a smooth potential (to take a different example) approaches a harmonic solid. Examples of the use of equation 2.2 in calculation can be found in [63, 64, 65], which use a low-β reference state, and [66, 67, 68], which use a high-β one.
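Schematically, the whole procedure amounts to the sketch below (ours). Here measure_mean_E stands for the Boltzmann sampling simulation at constant V, replaced by a toy closed form so that the sketch runs, and the reference value β_0 F(β_0) is set to zero for illustration.

```python
# Hedged sketch of thermodynamic integration, equation 2.2, with a
# 10-point Gauss-Legendre quadrature.
import numpy as np

def measure_mean_E(beta):
    # stand-in for a Boltzmann sampling estimate of <E> at this beta
    return -100.0 * np.tanh(beta)

beta0, beta1 = 0.1, 1.0                         # reference and target inverse temperatures
x, w = np.polynomial.legendre.leggauss(10)      # nodes and weights on [-1, 1]
b = 0.5 * (beta1 - beta0) * x + 0.5 * (beta1 + beta0)
E_vals = np.array([measure_mean_E(bi) for bi in b])
integral = 0.5 * (beta1 - beta0) * np.sum(w * E_vals)   # integral of <E> dbeta

beta0_F0 = 0.0                                  # beta0*F(beta0) from the reference state
F1 = (beta0_F0 + integral) / beta1              # equation 2.2 rearranged for F(beta1)
print("F(beta1) =", F1)
```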
Another equation that is often used is

(∂F/∂V)_T = −p

which leads to

F(V) − F(V_0) = −∫_{V_0}^{V} p dV     (2.3)

and we measure p using (see [45])

p = Nk_B T/V + ⟨P⟩     (2.4)

where

P = −(1/3V) Σ_i Σ_{j<i} r_ij ∂E(r_ij)/∂r_ij
This time the usual reference state is one of high V, for which the virial expansion may be used to obtain p(V_0) and thus F(V_0). Near the reference state equation 2.3 may be badly behaved; better numerical stability is obtained by using

F(ρ) − F(ρ_0) = N ∫_{ρ_0}^{ρ} (p/ρ'^2) dρ'     (2.5)

where ρ is the density.
TI based on equation 2.5 has been used so often in the literature that we shall not attempt to give an exhaustive list; two recent examples are [69, 70].
Both the above are examples of what might be called `natural' TI; the integration path is of the type that might be followed in an experiment on a real system. However, a Monte Carlo (MC) simulation is more flexible than a laboratory calorimetry experiment, since the reference system may differ from the system of interest in the form of its configurational energy as well as in its control parameters, and indeed need not correspond to any `real' system at all. It is in fact common to use `artificial' methods where we take advantage of this greater freedom to change the details of interaction between the particles. In this case we usually introduce a parameter λ which controls the change of the interaction, so that by varying it we can smoothly transform the system under investigation into the reference system for which the free energy is known exactly. If we write E = E(σ; λ) then from the definition of F

∂F(λ)/∂λ = ∫ dσ (∂E(σ; λ)/∂λ) exp[−βE(σ; λ)] / ∫ dσ exp[−βE(σ; λ)]     (2.6)
         = ⟨∂E(λ)/∂λ⟩_λ     (2.7)

where ⟨·⟩_λ indicates that the canonical average is evaluated by a Boltzmann sampling simulation with configurational energy E(λ). We now have an equation of the form of equation 2.1; the desired free energy is obtained by casting it into the form of equation 2.2 or 2.3 and integrating, as in the `natural' examples.
A typical application is Frenkel and Ladd's method [71], designed for solid systems. We shall use a variant of this method in section 4.2. Here

E(λ) = E_0 + λU

where U is the energy of an Einstein solid with its lattice sites at r_i^eq (the Einstein solid is a crystal composed of non-interacting point particles, each attached to its lattice site by a harmonic spring), so

E(λ) = E_0 + λ Σ_{i=1}^{N} (r_i − r_i^eq)^2
In almost every application the extra interaction U is added linearly, so that ⟨∂E(σ; λ)/∂λ⟩ is just ⟨U⟩. The desired free energy F(0) is found from

F(λ_max) = F(0) + ∫_{0}^{λ_max} ⟨U⟩_{λ'} dλ'

It is apparent that F(λ) is equal to F_ein, which is exactly known, only in the unmeasurable limit λ → ∞, but Frenkel and Ladd use a 1/λ series expansion of F(λ) − F_ein to correct their large-λ results. They carry out simulations on hard spheres to investigate the relative stabilities of fcc and hcp hard spheres, finding that fcc is marginally the more stable structure. Their results agree with those of [72].
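A sketch of the λ-integration (ours) is given below; measure_mean_U stands for a simulation at coupling λ, and the change of variable to ln λ, which tames the 1/λ tail, is a common practical device rather than necessarily the choice made in [71].

```python
# Hedged sketch of the lambda-integration F(lam_max) - F(0) = int <U> dlam.
import numpy as np

def measure_mean_U(lam):
    # stand-in for <U>_lambda from a simulation with energy E_0 + lambda*U;
    # for a harmonic degree of freedom <U> falls off like 1/lambda
    return 1.0 / (2.0 * lam + 0.1)

lam_max = 1.0e3
u = np.linspace(np.log(1.0e-3), np.log(lam_max), 200)   # u = ln(lambda)
lam = np.exp(u)
integrand = lam * np.array([measure_mean_U(l) for l in lam])  # <U> dlam = lam*<U> du
dF = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u))  # trapezoidal rule
print("F(lam_max) - F(0) ~", dF)
```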
Another useful technique for measuring free energies of solids is the single-cell occupancy method of Hoover and Ree [72, 73]. Equation 2.5 is used for the integration, and (this being the essential component of the method) each particle is constrained to stay throughout within the Wigner-Seitz cell formed by its neighbours. This does not affect the solid phase, where diffusion of the particles is in any case prevented by their neighbours, but stops the first order melting transition that would otherwise occur when the density falls sufficiently. The reference system is thus a solid which is vastly expanded, so that the interaction of the particles is extremely small and the partition function can be calculated exactly (it is similar to the ideal gas, except that each particle has available to it only a fraction of the total volume). However, it should be mentioned that there is some evidence [66, 67] that a second-order or weak first-order phase transition does take place in spite of all efforts, and because of this extra computer time is needed in this region for equilibration and to capture the shape of ⟨E(V)⟩.
Advantages and Disadvantages
As we have said, a major advantage of the method is that it may require very little modification of an existing Boltzmann sampling routine to use it, though an `artificial' TI method will usually need some alteration to insert the extra potential (U above). The major disadvantage comes from the inability to handle phase transitions: when using TI in phase transition problems, we do not usually have even the option of measuring the free energy difference between the phases directly. They have to be treated separately, and each linked with its separate reference state. Whether or not this causes any difficulty depends on the length of the integration path required, and whether or not it is easy to find an integration path that does not cross a phase
transition. It may be dicult to avoid such a path: we have seen that for solids this requires
the use of a trick like the single-cell occupancy method, and the same problem may well arise
in uid problems, where integration from the dense uid (liquid) phase to a very dilute state
would cross the liquid-vapour line. One way to solve this problem would be to integrate around
the critical point, though this seems not to have been tried for a simple uid. In [74] the phase
transition was prevented articially by suppressing density uctuations, but this clearly has the
disadvantage that congurations with signicant canonical weight may be suppressed.
The total time required by the method depends, then, on the number of simulation points required on the integration path, and so depends on the particular problem. If the path is long, or if F(V) has some regions where its higher derivatives are large, then many points will be needed, and if some of the simulation points have ergodic problems (i.e. if equilibration at constant V is a problem), then they will take a long time to obtain. The extreme example of this is a phase transition. Conversely, however, if F(V) is well-behaved then TI will be very efficient and is likely to outperform most other methods listed here, not least because of the way it scales with system size. Many methods require the number of `simulation points' or the equivalent to increase at least as L^{d/2}, because they require that adjacent simulation points have some likely configurations in common, and the size of the `fluctuations' in a canonical ensemble increases as L^{d/2} while the size of the system (∝ L^d) determines the separation of the reference state and the state of interest. With TI this is not so: the separation of simulation points depends only on the smoothness of the integrand and need not increase much with L at all (see section 2.1.3 for further comments). One example where a very smooth integrand has enabled a very large system to be investigated is [75], discussed further in section 2.1.4 below.
Estimating the error in measurements of free energy made by TI is not easy to do, and must be counted a disadvantage of the method. An estimate can be obtained by the blocking procedure of section 1.2.2, or by looking at the finite-size scaling behaviour of the estimate (which is done in [71], leading to a claim of an accuracy of 0.1%). However the total error thus obtained may be an underestimate of the true error, because it includes only the effect of random fluctuations, while the effect of, for example, rapid variation in ⟨E(λ)⟩ which is not well captured by the chosen spacing of simulation points is to put in a systematic error which is not detectable simply by repeating the simulation with different random numbers. Errors of this kind are found in our investigations in section 3.3.2. In some systems ergodic problems may also affect the estimates of ⟨E⟩. It is also important, though time consuming, to equilibrate
each simulation separately; it has been found [76] that failure to do this may cause hysteresis.
A common way to reduce the amount of equilibration that must be done is to use the final configuration of one simulation as the starting configuration of the next.
In the case where TI is done in an ensemble in which the system has a constant order parameter, so the result is F rather than G, it is easiest to find the coexistence curve by using the double-tangent construction, described in the context of a fluid simulation in appendix B (see also [47]). This avoids the necessity of mapping out the whole of F. The process is easier once one coexistence point has been found, because the Clausius-Clapeyron equation

(dp/dβ)_coex = −ΔH/(βΔV)

may be used to predict how p must change for a given change in β to stay on the coexistence curve. Integrating the Clausius-Clapeyron equation using a more complex predictor-corrector method is known as Gibbs-Duhem integration [77].
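A minimal sketch of such a predictor-corrector trace along the coexistence line is given below, assuming the Clausius-Clapeyron form written above; `measure_dH_dV` is a hypothetical stand-in for simulations of the two coexisting phases that return (ΔH, ΔV) at a given state point.

    # Sketch of Gibbs-Duhem integration: step along the coexistence line
    # using an Euler predictor and a trapezoidal corrector.
    def gibbs_duhem(measure_dH_dV, beta0, p0, dbeta, n_steps):
        path = [(beta0, p0)]
        beta, p = beta0, p0
        dH, dV = measure_dH_dV(beta, p)
        slope = -dH / (beta * dV)              # dp/dbeta on the coexistence line
        for _ in range(n_steps):
            p_pred = p + slope * dbeta         # predictor: Euler step in p
            beta_new = beta + dbeta
            dH2, dV2 = measure_dH_dV(beta_new, p_pred)
            slope2 = -dH2 / (beta_new * dV2)   # corrector: slope at predicted point
            p = p + 0.5 * (slope + slope2) * dbeta
            beta, slope = beta_new, slope2
            path.append((beta, p))
        return path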
2.1.2 Multistage Sampling
In this method (also known as the `overlapping distributions' method) the idea is to measure the free energy difference between two canonical ensembles, defined on the same configuration space but with different values of the field variables, by using the overlap of the p.d.f.'s of some macrostate. The method seems to have been used first by McDonald and Singer [78, 79]. Later implementations include Valleau and Card [80], and the method has in fact been fairly widely used [54, 64, 65]. To see how it works, consider distribution functions of the internal energy in two canonical ensembles at temperatures β_0 and β_1. We have

P_0^can(E) = Ω(E) exp(−β_0 E)/Z_0

and

P_1^can(E) = Ω(E) exp(−β_1 E)/Z_1

Clearly, we can measure P_1^can and P_0^can from Boltzmann sampling MC simulation; let the estimators derived from histograms of visited states be P̃_1^can and P̃_0^can. As we have said before, only the unknown value of Ω(E) prevents us from estimating Z. But we can eliminate it between
the two equations:

Z_1/Z_0 = [P̃_0^can(E)/P̃_1^can(E)] exp(−(β_1 − β_0)E)
So we can now estimate Z_1/Z_0 = exp(β_0 F(β_0) − β_1 F(β_1)), as long as the state E is such that we obtain both P̃_0^can(E) and P̃_1^can(E), that is to say, as long as the measured probability distributions P̃_1^can and P̃_0^can overlap. If they do overlap, it obviously makes sense to use all the energy states in the overlap region, which gives us

β_0 F(β_0) − β_1 F(β_1) = ln [ ∫_ov P̃_0^can(E) exp(−(β_1 − β_0)E) dE / ∫_ov P̃_1^can(E) dE ]    (2.8)

If one of the states (at β_1, say) is a reference state of known free energy, equation 2.8 now gives us F(β_0).
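As a concrete illustration, the sketch below evaluates a single stage of equation 2.8 from two energy histograms; `energies`, `hist0` and `hist1` are hypothetical inputs (bin centres and visit counts from Boltzmann-sampling runs at β_0 and β_1).

    # Sketch of one multistage-sampling stage (equation 2.8): estimate
    # beta0*F(beta0) - beta1*F(beta1) from two overlapping energy histograms.
    import numpy as np

    def stage_free_energy(energies, hist0, hist1, beta0, beta1):
        p0 = hist0 / hist0.sum()          # normalised estimator of P0can(E)
        p1 = hist1 / hist1.sum()          # normalised estimator of P1can(E)
        overlap = (p0 > 0) & (p1 > 0)     # energy states visited by both runs
        num = np.sum(p0[overlap] * np.exp(-(beta1 - beta0) * energies[overlap]))
        den = np.sum(p1[overlap])
        return np.log(num / den)          # = beta0*F(beta0) - beta1*F(beta1)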
A similar equation results from considering the p.d.f.'s of the magnetisation (or whatever the order parameter is) at different values of the field H, and using the overlap region to eliminate exp(−βF(β, M)) (compare the equations for P_0^can(E) and P_1^can(E) with equation 1.10).
However, this simple implementation runs into trouble for more than small free energy differences between small systems, because the two measured p.d.f.'s will fail to overlap. This problem becomes more acute as the system size increases: as we discussed in section 1.1.3, the fraction of the possible energy states of the system which have a canonical probability significantly different from zero goes like N^{−1/2}, making overlap with the reference state harder to achieve. In [80] this problem is solved in a fairly obvious way by generating `bridging distributions': a series of simulations is performed using modified Boltzmann weighting P_j(E) ∝ exp(−β_1 E/λ_j) for a set of coefficients {λ_j} in the range β_1/β_0 < λ_j < 1 (i.e. effectively for a range of temperatures between β_1 and β_0). The set {λ_j} is chosen so that adjacent distributions overlap and the `coldest' bridging distribution overlaps with the Boltzmann simulation at β_0 (say) while the `hottest' overlaps with the simulation at β_1. Equation 2.8 is applied repeatedly to eliminate all but F(β_1) and F(β_0) (though if the process is applied between β_1 and any intermediate temperature it can be used to find F there too). This implementation is what gives the method the name `multistage sampling.' If the temperature is varied, as above, we commonly choose the reference state to be at infinite temperature, β_1 = 0, when the free energy is often known exactly, or to a good approximation, as described in section 2.1.1.
In [80], Valleau and Card test the method on hard spheres with Coulomb forces and report results with a quoted 1% error in good agreement with results obtained by other methods. They also point out that the exponential bridging distributions used are not the most efficient, being sharply peaked, though of course they require hardly any modification of a normal Metropolis MC program to produce.
It is interesting to note that, if the two distributions overlap almost entirely, then ∫_ov P̃_1^can(E) dE ≈ 1 and we need sample only the ensemble at β_0, evaluating ⟨exp(−(β_1 − β_0)E)⟩_0. This case corresponds to the evaluation of ⟨O⟩ for the `suitable operator' O_1 in figure 1.10. However, unless (β_1 − β_0) is quite small, exp(−(β_1 − β_0)E) will be more like O_2 (from the same figure) and the estimator of ⟨O_2⟩ = ⟨exp(−(β_1 − β_0)E)⟩_0 obtained will accordingly be bad for the reasons described while discussing this figure. The normal multistage method thus offers a way of overcoming the problem of incompatibility of sampled distribution and operator (provided that the p.d.f.'s at β_0 and β_1 overlap to some extent), by sampling both ensembles. We can see intuitively that the estimator of ⟨O_2⟩, which tends to be an underestimate, will be increased by dividing by ∫_ov P̃_1^can(E) dE < 1.
A variation on this temperature-changing version of multistage sampling, called Monte Carlo recursion, has been developed by Li and Scheraga. They express the free energy difference as a sum of terms of the form ln⟨exp((β_i − β_{i+1})E)⟩_i and use acceleration-of-convergence techniques to extrapolate it to β = 0. The method has been applied to the Lennard-Jones fluid with 32 particles [81], and to two models of liquid water [82]. In [82] a comparison of the method is made with TI and multistage sampling, and it is found that the efficiency is about the same. In [82] fairly good agreement with experiment is obtained. A simpler version of the same technique, with temperature doubling at each stage, has recently been rediscovered [83].
The multistage method, as we have described it, is, like TI, only suitable for sampling along a path on which there are no first-order phase transitions, because at the phase transition (which, let us say, occurs at β*) there is a discontinuous `jump' in E which, if it is larger than the size of typical fluctuations, is likely to prevent overlap of the p.d.f.'s P^can(β*_−; E) and P^can(β*_+; E). Thus, like TI, it can only be used for finding the free energy of a single phase separately, not for measuring a free energy difference by sampling through the coexistence region. It is possible to overcome this problem by performing a series of simulations all at β*, each with an artificial constraint on E to keep it in some range of values narrow enough that all macrostates have appreciable Boltzmann probability, and overlapping with its neighbours. Then by matching the probabilities in the overlap regions, it is possible to reconstruct the whole p.d.f. of E across the transition region, though with the caveat that equilibration of
configurations containing interfaces may be extremely slow. Some authors, however, would describe this kind of implementation as umbrella sampling (see below). The same problem arises with the p.d.f. of the order parameter, with β* replaced by the appropriate value of the external field, and it can be overcome, to a certain extent, in the same way.
Advantages and Disadvantages
The multistage sampling method has very similar advantages and disadvantages to TI, though it may be considered that in its concentration on probabilities it leads to free energies in a rather simpler and more transparent way. On the other hand, the method does demand overlap of the p.d.f.'s of adjacent simulations, which, as we have seen, is not necessarily the case for thermodynamic integration. Like TI, multistage sampling has the advantage that it requires little modification of existing Boltzmann sampling routines, but also like TI it cannot, in its simplest form, deal with first-order phase transitions on its path of integration. It is similarly vulnerable to poor estimation of the canonical p.d.f. caused by problems of slow equilibration or free energy barriers in any of the substages of the simulation, particularly those at low temperature.
Whether the method gives us F(β, V) or G(β, p) depends on whether the order parameter is constant in the ensemble simulated or may vary. If it is constant then a double-tangent construction (appendix B) will be required to find the coexistence field, just as for TI.
2.1.3 The Acceptance Ratio Method
Introduced by Bennett in [84], this method has also been used by Hong et al. [85] and, in modified form, in [86] (see section 2.1.4 below). It is a similar method to multistage sampling, but extends it somewhat and addresses problems of optimising the process of measuring F. We should also remark that [84] is also a good general reference to the problem of measuring free energy; the author discusses what were the major methods of doing so when it was published as well as contributing several new ideas. He confines himself to methods where the system of interest is connected with a reference system of known free energy, and, indeed, asserts that the free energy can in general only be found by using a reference state. In fact this is not the case (the methods of section 2.3 do not do so), but the vast majority of methods do use this technique.
Bennett then goes on to discuss in detail the problem of finding the free energy difference between two canonical ensembles (where one is a reference system, this will clearly give the absolute free energy of the other). He also treats in detail the statistical problem of analysing the available MC data to extract the best estimate of the free energy, although he has to treat this problem using the assumption that the data points are uncorrelated.
Let the two systems be denoted by suffixes 0 and 1. Then for any function W(σ) of the coordinate variables we find

exp(−β(F_0 − F_1)) = Z_0/Z_1 = [∫ W exp(−β(E_0 + E_1)) dσ / Z_1] / [∫ W exp(−β(E_1 + E_0)) dσ / Z_0] = ⟨W exp(−βE_0)⟩_1 / ⟨W exp(−βE_1)⟩_0    (2.9)
This is clearly reminiscent of equation 2.8. If we choose W = exp(βE_1) with E_1 = β_1 E_0/β, then it reduces to the equation for a single stage of multistage sampling with almost complete overlap. The acceptance ratio method itself is produced by the choice W = min{exp(βE_0), exp(βE_1)}, in which case equation 2.9 becomes

Z_0/Z_1 = ⟨Me(β(E_0 − E_1))⟩_1 / ⟨Me(β(E_1 − E_0))⟩_0    (2.10)

where Me is the Metropolis function Me(x) = min{1, exp(−x)}.
So we see that we can estimate Z_0/Z_1 = exp(−β(F_0 − F_1)) by performing two simulations, and in each recording the average of the Metropolis function of the difference of the two energy functions. ⟨Me(β(E_0 − E_1))⟩_1 is the average of the probability for making a transition from ensemble 1 to ensemble 0, hence the name `acceptance ratio method'. However no transitions are in fact made: we sample each ensemble separately. This should be compared with the expanded ensemble described in section 2.2.3.
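The estimator of equation 2.10 is simple to assemble once the two runs are done; the sketch below assumes `configs0` and `configs1` are samples already drawn from the two ensembles, with callables `E0` and `E1` for the two energy functions (all hypothetical names).

    # Sketch of the acceptance ratio estimator (equation 2.10).
    import numpy as np

    def metropolis_fn(x):
        """Me(x) = min(1, exp(-x))."""
        return np.minimum(1.0, np.exp(-x))

    def acceptance_ratio_estimate(E0, E1, configs0, configs1, beta):
        # Average 'virtual' acceptance probability for a 1 -> 0 switch,
        # measured over the ensemble-1 sample, and vice versa.
        num = np.mean([metropolis_fn(beta * (E0(c) - E1(c))) for c in configs1])
        den = np.mean([metropolis_fn(beta * (E1(c) - E0(c))) for c in configs0])
        return num / den   # estimate of Z0/Z1 = exp(-beta*(F0 - F1))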
Note also that the method is defined above for two systems with different potential energy functions but the same canonical variables {σ}. However, one system could actually have more variables than the other; the smaller system could be given dummy variables whose contribution could be factored out of the configurational integral.
Bennett continues by making variational calculations (in which, however, the effect of correlations has to be ignored) to predict the conditions under which the method gives the results with the smallest variance. It proves necessary, if both ⟨W exp(−βE_0)⟩_1 and ⟨W exp(−βE_1)⟩_0 are to be measured accurately, that the two distributions of ΔE = E_1 − E_0 should overlap, and Bennett advocates shifting the origin of one of the potential functions to achieve this. The optimal amount of shift is just the free energy difference between the ensembles, which must be found iteratively. Note that it is not optimal to shift by ⟨E_1⟩_1 − ⟨E_0⟩_0 as might first seem to be the case, because a correction for the relative likelihood of other configurations in each ensemble is necessary (cf. section 2.2.3).
Since transitions are not made between the 0 and 1 systems, it is also necessary to consider what fraction of the available time to spend in each. The result is that it is, in fact, almost always near-optimal to devote the same amount of time to each system. Finally, it can be shown that the choice for W that minimises the variance of the estimate of the free energy difference is not Me(x), but the Fermi function, f_fer(x) = (1 + exp(x))^{−1}, though the difference is small.
If the distributions cannot be made to overlap, we must either generate bridging distributions as in multistage sampling, or try extrapolating them if they are smooth enough. Though it is not stated explicitly, the criterion for effective extrapolation seems to be the same one that determines the separation of the points in thermodynamic integration. Bennett's extrapolation will work well if the shape of P^can(E) is well described by its first few moments, which are related to ⟨E⟩ and d⟨E⟩/dβ, while TI over widely separated points also works if F(β)'s higher derivatives are small, and these are related to the derivatives of ⟨E⟩ by equation 2.1. This seems to put overlapping-distribution methods back on an even footing with TI, though extrapolation is more complex to implement.
The paper also contains a discussion of other methods: numerical integration, the `perturbation method' (which is what has since become known as the single-histogram method [87]), which is viewed as a limit of the acceptance ratio method where one ensemble is not sampled at all, and overlap methods like multistage sampling. In this discussion, the multicanonical methods of Berg and Neuhaus (see section 2.2.2 and chapter 3) are also anticipated, when the use of flat bridging distributions is suggested.
Advantages and Disadvantages
The acceptance ratio method and the other extensions and optimisations of the overlapping-distribution method that are considered here seem to improve it to an extent where it is again competitive with TI. However the analysis applies directly only to a two-stage process, and in a real application many stages will usually be necessary. To choose the spacing of the stages and optimise the method using Bennett's criteria is a complicated problem which would involve several `trial' runs, but fortunately it seems that the `efficiency maximum' is in practice quite wide and flat, and rough improvements (inserting an extra stage wherever the acceptance ratio cannot be measured properly, etc.) quickly give an answer very close to the optimum.
Another problem which is not addressed in the paper is that around a first-order phase transition, ergodic problems may occur which cause the averages measured in each simulation separately to converge slowly. This problem is obscured in [84] because it is assumed throughout the analytic derivations that the configurations that are sampled in each simulation are uncorrelated. No reference is made to the possibility of reducing the correlation time itself. It seems that in some cases the ergodic problems might be overcome by actually performing the transitions whose probabilities are measured; this is the basis of the expanded ensemble method. This is a point we shall return to in the discussion section, and indeed often in later parts of the thesis.
2.1.4 Mon's Finite-Size Method
Mon's method [86] relies on MC sampling to find the difference in free energy density between a large and a small system. The free energy is then calculated for the small system by evaluating the partition function explicitly.

The method (described as implemented for the 2d Ising model with H = 0, with an obvious generalisation to 3d) is this: consider the Ising model on a 2L × 2L square lattice. Two different kinds of boundary conditions are considered: firstly, the normal periodic boundary conditions, for which the total energy function is E_2L, and secondly, boundary conditions which divide the 2L × 2L lattice into four L × L lattices, each individually having periodic boundary conditions, for which the energy function (of the four lattices considered as one composite system) is E_L. It follows that
Z_{E_L}/Z_{E_2L} = Σ_{σ} exp(−βE_L) / Σ_{σ} exp(−βE_2L)    (2.11)
= Σ_{σ} exp(−β(E_L − E_2L)) exp(−βE_2L) / Σ_{σ} exp(−βE_2L)    (2.12)
= ⟨exp(−β(E_L − E_2L))⟩_2L    (2.13)
so that the difference in free energy density is

g_2L − g_L = (1/(β(2L)^d)) ln ⟨exp(−β(E_L − E_2L))⟩_2L

The Gibbs free energy is found for the Ising model, since M varies in the NVT-ensemble. The procedure can be iterated until L is about 2 (in 3d) or 4 (in 2d), when g_L can be found exactly. Then so can g for the larger lattices.
of the exponential directly, Mon's method employs techniques derived from the acceptance
ratio method. For small systems both ensembles are simulated, with the transition probability
between the two being measured(using the Fermi, not the Metropolis function); then it is found
that
ZEL < ffer ( (E L E2L )) >2L
ZE L = < ffer ( (E2L E L )) >L
2
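In outline, and with hypothetical inputs (`sample_2L` and `sample_L` holding configurations already generated under the two energy functions, and callables `E_L` and `E_2L`), the Fermi-function estimator might look like this:

    # Sketch of Mon's estimator for Z_{E_L}/Z_{E_2L} via the Fermi function.
    import numpy as np

    def fermi(x):
        return 1.0 / (1.0 + np.exp(x))

    def partition_ratio(E_L, E_2L, sample_2L, sample_L, beta):
        num = np.mean([fermi(beta * (E_L(s) - E_2L(s))) for s in sample_2L])
        den = np.mean([fermi(beta * (E_2L(s) - E_L(s))) for s in sample_L])
        return num / den   # estimate of Z_{E_L} / Z_{E_2L}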
For large systems, especially in 3d, there is insufficient overlap between the ensembles even for this, and Mon uses multistage sampling, simulating in addition interpolating Hamiltonians E(α) = αE_2L + (1 − α)E_L for α in the range [0 ... 1] and finding the free energy difference by

Z_{E_L}/Z_{E_2L} = (Z_{E_L}/Z_{E(α_1)}) (Z_{E(α_1)}/Z_{E(α_2)}) ⋯ (Z_{E(α_n)}/Z_{E_2L})
where each ratio on the RHS is found by measuring Fermi transition probabilities. Up to six stages were used in [86]. In this paper, g_2L − g_L was measured for Ising models at T_c for L up to 12 (2d), 6 (3d simple cubic) and 5 (3d body-centred cubic). g was obtained to about 0.2% accuracy with 50 000 MC passes per site for each Hamiltonian simulated. An investigation is also made of the predictions of finite-size scaling theory, that in d dimensions g(T_c, H, L) = g_b + U_0(d)L^{−d} + O(L^{−d−1}) (see appendix A). Since the method measures g_2L − g_L directly, the contribution g_b from the background analytic part of the free energy density is removed, and he was able to confirm the theory directly and measure the scaling amplitudes U_0 to a few percent.
More recently, in [75], the same method has been applied to large Ising models on lattices up to 32³, and U_0 measured to extremely high accuracy. In this implementation, the interpolating ensembles E(α) are used, but with thermodynamic integration (in the version described by equation 2.6), rather than multistage sampling, used to find the free energy differences. This is the only case we have seen where the system size has been sufficiently large, and ⟨∂E(α)/∂α⟩ sufficiently smooth, that TI can be used with high accuracy in a situation when the energy histograms do not overlap. 16 TI stages were used for the 32³ system; many more would have been necessary with multistage sampling without Bennett's extrapolation.
Advantages and Disadvantages
This is another method that is most easily implemented for lattice models, and though it could also be applied to off-lattice systems this has not to our knowledge been attempted. Presumably, we would either consider subdividing the system until we reached one that contained only a single particle, or we would stop when the system became small enough for another method to be easily used to measure the free energy. However, especially in a dense fluid, the energy function of the large system considered divided up would most frequently be infinite, corresponding to the case where at least some particles overlap the `walls' partitioning the large system into the subsystems. This would drastically reduce the efficiency of the method; some kind of intermediate ensembles, with the walls put in gradually, would be required.
The numerical effort involved depends on the dimensionality and nature of the problem. The idea of the method is attractive and the accuracy is high because it concentrates on measuring a correction term (the difference in free energy densities between the two systems) rather than the free energy directly. While it is similar to other multistage sampling methods, we would expect that fewer intermediate stages would be necessary in this case because of the intelligent choice of systems E_L and E_2L: the difference between the two Hamiltonians will be small over most configurations, and will be caused mainly by the presence of the extra interface-like terms produced by evaluating the E_L energy in the E_2L ensemble. The size of these interfaces increases only as L^{d−1}, so this is also the scaling of the quantity in the exponent to be averaged. In a similar multistage technique with ensembles differing in temperature, for example, the analogous quantity would scale like L^d. The result is that fewer stages are needed for a particular system size than in a `normal' multistage method. It is also probably the best method for measuring the correction-to-scaling amplitude U_0.
2.1.5 Widom's Particle-Insertion Method
This method was introduced in [88]. Its goal is to measure the chemical potential μ (= G/N for a system with only one kind of particle) by making trial insertions or removals of a particle.
Consider the definition of μ:

μ = (∂F/∂N)_{V,T}

or, for a finite system,

μ = F(N+1) − F(N) = −β^{−1} ln [ Z(N+1) / (Λ³(N+1) Z(N)) ]

Because the number of particles changes, it is necessary to include the effects of the kinetic energy, which produces the Λ³ (Λ is called the thermal wavelength: Λ = h/√(2πM/β)), and the indistinguishability of the particles, which produces the N+1 = (N+1)!/N!. However we can remove the need to consider them explicitly by using μ_id = β^{−1} ln[(N+1)Λ³/V] for the ideal gas, giving

μ_ex = μ − μ_id = −β^{−1} ln [ Z(N+1) / (V Z(N)) ]    (2.14)
= −β^{−1} ln [ ∫ exp(−βE(σ^N)) exp(−βE′(σ′; σ^N)) dσ′ dσ^N / ( V ∫ exp(−βE(σ^N)) dσ^N ) ]    (2.15)
= −β^{−1} ln [ (1/V) ∫ ⟨exp(−βE′(σ′; σ^N))⟩_N dσ′ ]    (2.16)
where we have cast the ratio of partition functions into the form of a canonical average (over an N-particle simulation) of the excess energy E′(σ′; σ^N) of the interaction between an (N+1)th particle, σ′, and the other N. The procedure is thus to perform coordinate updates in a constant-NVT simulation of N particles as normal, but to alternate them with trial insertions of a `ghost' particle at randomly chosen locations, where we then evaluate E′ to estimate ⟨exp(−βE′)⟩. The `ghost' particle is not in fact inserted. A virtual trial removal of a particle can also give us μ in a similar way (see [89]).
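A minimal sketch of the insertion loop for a pair-potential fluid follows; `coords` (an N × 3 array), `box`, and `pair_energy` (pair energy as a function of squared separation) are hypothetical inputs, and `rng` is a NumPy random generator.

    # Sketch of Widom test-particle insertion in a constant-NVT simulation.
    import numpy as np

    def widom_insertion(coords, box, pair_energy, beta, n_trials, rng):
        acc = 0.0
        for _ in range(n_trials):
            ghost = rng.uniform(0.0, box, size=3)   # random trial position
            d = coords - ghost
            d -= box * np.round(d / box)            # minimum-image convention
            r2 = np.sum(d * d, axis=1)
            e_excess = np.sum(pair_energy(r2))      # E': ghost vs all N particles
            acc += np.exp(-beta * e_excess)         # ghost is never actually kept
        return acc / n_trials                       # mu_ex = -ln(acc/n)/beta

A typical call would pass rng = np.random.default_rng() and interleave `widom_insertion` with the ordinary coordinate updates of the N-particle run.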
The method can be thought of in the framework of the acceptance ratio method. We are sampling the transition probability exp(−βE′) between two canonical ensembles differing in the number of particles they contain. The expectation of the transition probability gives the free energy difference. If the transitions are actually made then we have grand canonical Monte-Carlo.
Advantages and Disadvantages
The method works well for fluids at low density, but at high density the Boltzmann factor becomes very small because almost any insertion move would result in the test particle overlapping some other, resulting in a very high energy. Once again the necessity of finding the average of an exponential reduces the efficiency. Similarly, removing a particle would leave a high-energy `cavity.' For solids the method does not work at all, because we cannot insert or remove a particle without disrupting the lattice.

Various methods have been tried to improve the performance of the method at high densities. For example in [90] the inserted particle is moved around so that high-energy configurations are sampled better, while in [89] we actively search for the cavities where insertions remain possible. This method is similar to the generalised form of umbrella sampling (see below), where certain configurations are generated preferentially.
2.1.6 Histogram Methods
Renewed interest in histogram methods was the result of papers by Ferrenberg and Swendsen [87, 91, 92]. They present their methods as a way of optimising the analysis of data obtained by conventional MC simulations, though it is also relevant to the free energy problem. Because the Boltzmann distribution has the same form at any temperature, we can use the MC estimate P̃_β^can(E) from a simulation done at temperature β to estimate Ω(E) and so, by substituting in equation 1.7 and normalising, get

P̃_{β′}^can(E) = P̃_β^can(E) exp(−ΔβE) / Σ_E P̃_β^can(E) exp(−ΔβE)    (2.17)

where Δβ = β′ − β. This is called histogram reweighting. Expectation values ⟨O⟩_{β′} follow as before, so we can calculate TD quantities away from the temperature of the simulation, but only slightly away from it, because the canonical distribution is very sharply peaked. As a result, once terms coming from the wings of the distribution become important the accuracy falls, just as in multistage sampling or the evaluation of an exponential average in section 1.2.3.
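As an illustration, the following sketch reweights an energy histogram from β to a nearby β′ via equation 2.17 (hypothetical inputs: `energies` holds the bin centres, `counts` the visit counts); working with logarithms guards against overflow for large systems.

    # Sketch of single-histogram reweighting (equation 2.17).
    import numpy as np

    def reweight(energies, counts, beta, beta_prime):
        dbeta = beta_prime - beta
        # Unvisited bins are given a vanishing weight via the tiny floor value.
        logw = np.log(np.where(counts > 0, counts, 1e-300)) - dbeta * energies
        logw -= logw.max()               # stabilise the exponentials
        p = np.exp(logw)
        p /= p.sum()                     # estimate of Pcan(E) at beta_prime
        mean_E = np.sum(p * energies)    # e.g. <E> at the new temperature
        return p, mean_E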
In [87] the method is used to locate precisely the turning points in quantities which are functions of temperature, like the specific heat. Simulations are performed at a temperature near that of the specific heat maximum (the exact temperature at which this maximum lies is unknown), then reweighted to obtain a much better estimate of the location of the maximum and the value of the specific heat there. If the whole p.d.f. at β′ can be accurately constructed, then the free energy difference βG(β) − β′G(β′) is also calculable (because to be able to find P^can(β′; E) accurately is also what is required to find ⟨exp(−(β′ − β)E)⟩). This use of the histogram method is thus the same as the `single-ensemble' versions of multistage sampling (section 2.1.2) that have already been mentioned. In a more recent paper [93] Rickman and Philpot have suggested that an analysis of the distribution function in terms of its cumulants provides an approximation which can be extrapolated with more confidence into the wings of the distribution. They show, using data from a simulation at one temperature, that various thermodynamic quantities (including free energy differences between the two temperatures) can be calculated by this method more accurately over a wider range of temperatures than by simple reweighting (this clearly connects with Bennett's extrapolation techniques).
The method has been extended to be more useful in the context of free energy estimation in [91], where Ferrenberg and Swendsen extended it to `overlap' data from several simulations at different temperatures, obtaining iteratively soluble equations giving the partition function and its error at any temperature. The peaks in the error show where further simulations need to be done. This procedure is very similar to the overlapping done in multistage sampling, but the analysis showing where simulations should be done is new. A detailed discussion of the errors of estimators of free energies and other thermodynamic expectation values obtained by both single- and multiple-histogram methods can be found in [94].
Although histogram methods have become a standard technique as a result of the work of Ferrenberg and Swendsen, the idea of reweighting the histogram to give estimators of thermodynamic quantities at other temperatures was used by many previous authors, as noted earlier in this chapter. One early example is the work of McDonald and Singer in the late 1960s on the Lennard-Jones and 18-6 fluids ([78] and especially [79]). Histogram methods are also essential in many of the non-canonical methods of the next section. Indeed, the full power of the technique is perhaps best released by use of the multicanonical distribution. When histogram methods are applied to Boltzmann sampling simulations, the fundamental problem of the unsuitable shape of P^can(E) is still not solved, only alleviated.
2.2 Non-Canonical Methods
2.2.1 Umbrella Sampling
This name is used in the literature in several ways to describe any or all of a number of different methods. In its most general sense it is the name given to what we have called `non-Boltzmann sampling'. As we saw in chapter 1, the Metropolis algorithm can be used to sample from any distribution. It can be shown (see section 3.1.1) that if we do sample from a non-Boltzmann distribution then canonical averages can be recovered by a suitable reweighting, provided that the chosen sampled distribution (some general P(σ), not P^can(σ)) puts appreciable weight in those states that dominate the canonical p.d.f. P^can(σ) at the temperature/field of interest.

Now, as we have seen (section 1.2.3 and elsewhere), the canonical sampled distribution is incompatible with the estimation of the average of certain operators, in particular those that lead to the measurement of absolute free energies or free energy differences, because the configurations that dominate the maximum of O(E)P^can(E) (to take the example of energy macrostates) are generated with almost zero probability. If the sampled distribution is carefully chosen so that, as well as the states in P^can(E), it also puts weight in the states that dominate the maximum of O(E)P^can(E), then it can be used to overcome this incompatibility problem. This normally requires a sampled distribution that is wider over the E-macrostates than the canonical distribution, hence the whimsical name `umbrella sampling,' coined on the grounds that the wider sampled distribution is like an umbrella extending over P^can(E) and OP^can(E).
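The reweighting step itself is straightforward. A minimal sketch follows, assuming (hypothetically) that each recorded sample carries the value of the observable together with `log_bias`, the amount by which ln P(σ) was boosted over the Boltzmann form for that configuration:

    # Sketch of recovering a canonical average <O> from umbrella-sampled data.
    import numpy as np

    def canonical_average(O_vals, log_bias):
        O_vals = np.asarray(O_vals)
        log_w = -np.asarray(log_bias)          # canonical weight of each sample
        log_w -= log_w.max()                   # stabilise the exponentials
        w = np.exp(log_w)
        return np.sum(w * O_vals) / np.sum(w)  # <O>_can = <O/bias> / <1/bias>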
Though it was not the first use of this technique, the seminal paper on this method seems to have been [95] (where the term `non-Boltzmann sampling' is also introduced). Some similar methods were employed earlier, for example [78] and especially [79], which used a kind of reweighting based on equation 3.5. These latter references also made the fundamental point that estimators produced by multiplying small histogram entries by large reweighting factors are no use in practice because of their high variance.
Possibly because of the influence of [95], the name `umbrella sampling' is often used in a restricted sense where free energy measurement is the goal. However, as we have said, the name is also applied by some authors, for example Frenkel in [47], to any kind of non-Boltzmann sampling (in [6, chapter 6] the name `umbrella sampling' is applied to overlapping canonical p.d.f.'s with constraints, which we would call multistage sampling, but this usage seems to be rare). In this wider sense, the range of possible applications is almost endless: the method
can be used to generate any event that is rare in the canonical ensemble with a high enough frequency that its probability becomes measurable; for example, it has been used to investigate large fluctuations in an order parameter [45], and in [24] rough measurements of the free energy barriers to nucleation are made. If we think of these fluctuations as taking us across the free energy barrier between two phases, then we see that the problem of free energy measurement by direct connection of the two phases can also be approached this way: as we speculated in section 1.2.3, we could use a sampled distribution which has a higher-than-canonical probability of being in the two-phase region. Another possibility, used in fluid simulations, is to use a non-Boltzmann distribution parameterised by the distance of closest approach of the molecules. However, like umbrella sampling itself, these techniques are usually given their own names: the multicanonical ensemble and density scaling. They are both described separately below.
A final point is that, although we have constrained the shape of an effective umbrella sampling distribution, we have not fixed it entirely. In fact, it seems to be an unresolved question which sampled distribution is the `best' for measuring a particular operator (in the sense that estimators produced from it have the lowest variance). The issue seems to have been addressed first by Fosdick [96] and more recently, in the context of the study of spin-glasses, by Hesselbo and Stinchcombe [97], who recommend a distribution where P(E) ∝ (∫^E Ω(E′) dE′)^{−1}. We shall return to this matter in section 3.4.
Advantages and Disadvantages
Umbrella sampling is a powerful method; once a good sampled distribution has been obtained we can reweight it to obtain not only free energies but a variety of canonical averages, for all values of the control parameters such that both P^can and OP^can put almost all their weight in the sampled region.

Another very significant advantage is that any ergodicity problems in the low temperature states may be largely overcome by the increased volume of phase space available to the system. It may, for example, move from states typical of a low temperature, where movement through configuration space is typically slow, up to states typical of a high temperature, where configurations change rapidly. Then it may `cool down' again and return to a region of configuration space far away from the one where it started. The whole process may take much less time than would be required to pass between the two regions by way of the low-temperature states alone.
The most serious disadvantage of the method seems to be the difficulty of finding a suitable sampled distribution in the first place [47], the problem being, as we shall see in chapter 3, that some knowledge of the free energy that we are trying to measure is required. It has apparently been rare in the literature to achieve a sampled distribution more than a few times wider than the canonical, and so the method has frequently been combined with multistage/TI techniques and has mainly been used with small systems where the fractional fluctuations are larger. Indeed, in [46] it is stated that the umbrella sampling method cannot be applied to large systems because a suitable sampled distribution cannot be produced. However various authors have given methods for evolving a suitable sampled distribution of any shape (see [50] and references in section 2.2.2; see also [98], an early reference that seemed to go largely unnoticed), and we have also addressed the problem at length in chapter 3.
2.2.2 Multicanonical Ensemble
This method is originally due to Berg and Neuhaus [58], whose work has prompted much recent interest and further study, some of which can be found reviewed in [99], [59] and [100]. The method is described in detail in chapter 3. It is really a rediscovery and reapplication of the ideas of non-Boltzmann sampling that were already present in [95]; a sampling distribution is generated so that the probability distribution of some chosen observable X (typically E or M) is roughly flat over some range of its macrostates (other workers have used sampled distributions with similar properties without calling them multicanonical; e.g. [24]). We shall characterise this distribution by its difference from the Boltzmann distribution, by using weights η(σ) (η(E(σ)) or η(M(σ)), etc., as appropriate) so that the sampled distribution has measure Y(σ) ∝ exp(−βE(σ)) exp(η(σ)). The results can then be reweighted (as outlined in section 2.2.1) to recover the desired results for the canonical distribution.
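In a Metropolis implementation the only change from Boltzmann sampling is that the acceptance test uses the measure Y(σ). A minimal single-spin-flip sketch for an Ising-like model follows; `local_dE` (the energy change of flipping spin i) and the weight function `eta` are hypothetical helpers.

    # Sketch of one multicanonical Metropolis update: sample from
    # Y(sigma) ∝ exp(-beta*E + eta(E)) instead of the Boltzmann measure.
    import numpy as np

    def multicanonical_step(spins, E, local_dE, eta, beta, rng):
        i = rng.integers(spins.size)            # propose a single-spin flip
        dE = local_dE(spins, i)
        # Log of the acceptance ratio of the sampled measure:
        log_ratio = -beta * dE + eta(E + dE) - eta(E)
        if np.log(rng.random()) < min(0.0, log_ratio):
            spins[i] *= -1                      # accept the flip
            E += dE
        return E                                # current energy macrostate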
In [58] the method was originally presented as a way of measuring interfacial free energy with high accuracy. However it has much wider applicability than this: it enables us to tackle the free energy problem either by the approach of simulating direct coexistence of the two phases or by measuring the absolute free energy of each phase separately.
In their first paper [58] Berg and Neuhaus investigated the first-order phase transition of the 2d 10-state Potts model and measured the interfacial free energy f_s with high accuracy. At the transition temperature (which is known exactly for the Potts model at H = 0, so there was no need to search for it), the canonical probability distribution of the internal energy is double-peaked, with one peak corresponding to a disordered phase and the other to the 10 equivalent ordered phases. The states in between, which correspond to mixed states containing interfaces, are exponentially suppressed (see section 1.1.3). Because the tunnelling time between the symmetry-broken phases in canonical Metropolis simulations is so long, a preweighting η(E) is used to produce a sampled distribution which is roughly flat between the two peaks of P^can(E); the results are then reweighted to recover the canonical distribution. The accurate measurement of the probability of the mixed states leads to an estimate of the interfacial tension. With the probability of each energy (action) macrostate roughly constant, it was found that the ergodic time τ_rw, the time to tunnel between the symmetry-broken phases, scaled with system size as τ_rw ∼ L^{2.35d}, comparable with the ideal τ_rw ∼ L^{2d} expected for a simple random walk, and an enormous improvement on the τ_rw ∼ exp(2L^{d−1}f_s) expected for a Boltzmann sampling simulation.
Later applications and extensions of the method have included:
Berg and Neuhaus, together with various co-workers, have measured interfacial tensions for several other systems by preweighting of the order parameter (magnetisation); they call this the multimagnetical ensemble, though we use `multicanonical' for all applications. In a paper written with Hansmann [101] they simulate the 2d Ising model as a strict test of the method; these results also appear in [102] along with similar measurements for the 3d Ising model. Like the energy distribution of the 10-state Potts model, the magnetisation distribution of the critical Ising model has a double-peaked shape, and preweighting is used in a similar way to facilitate tunnelling and to measure the probability of the M = 0 states. A recent paper written with Billoire [103] performs similar measurements on the 10-state and 20-state 2d Potts models, and [104] presents data on the 2d 7-state Potts model and the 4d SU(3) lattice gauge theory. In all cases the interfacial tension is measured with much greater accuracy than had previously been achieved. The later papers contain further development of error analysis and comparison with finite-size scaling theories.
Further work on lattice gauge theories has included the application of the method to the SU(2) [105] and SU(3) theories [106] and to QCD itself [107].
The multicanonical method has also been used in simulations of spin-glasses (with energy preweighting), where particular advantage is taken of the algorithm's ability to move rapidly across the free-energy barriers that severely slow canonical simulations. Three papers investigate the Ising-like Edwards-Anderson spin glass. The first, by Berg and Celik [108], looks at the 2d model. The infinite-volume zero-temperature ground state
entropy and energy are estimated from finite-size data. They find that the problems of slowing-down with increasing system size are more severe than before, τ_rw ∼ L^{3.2d}, though this is still much better than canonical simulations (exponential slowing down) and is better even than simulated annealing, for which τ_rw ∼ L^{4d}. The others, by Berg, Celik and Hansmann [109, 110], look at the spin-glass in 3d and present more complete results for energy density, entropy and heat capacity, evaluated at all temperatures (including β = ∞) by reweighting. They also show the order parameter distribution. Considering that energy, not the order parameter, was preweighted, it may seem strange that this distribution could be accurately obtained: however the locations and heights of the peaks can be found because the multicanonical algorithm makes high energy configurations accessible, where the order parameter has a single-peaked distribution, and then the system can `cool down' again in any of the modes of the order parameter (as was described in the `advantages and disadvantages' part of section 2.2.1). Thus, the system has the effective ability to tunnel through the free energy barriers separating the phases. The only significant difference is that, with energy preweighting, the order parameter distribution is not well determined in the regions of low canonical probability, and so no estimate of interfacial tension is obtained. Again, the simulation slows down with τ_rw ∼ L^{3.4d}. In [111] the suitability of the algorithm is checked by applying it to the well-understood Van Hemmen spin glass. The use of the multicanonical algorithm to investigate spin-glasses is also referred to in [112], a general paper on the multicanonical method and its application to multiple-minimum problems.
Hansmann and Okamoto have applied the multicanonical method to the problem of protein-folding. Like the spin-glass, this is a multiple-minimum problem where ergodicity problems can prevent the location of the ground-state. In [113] they preweight the configurational energy of met-enkephalin, a simple protein consisting of 5 amino acids and containing (in the simplified model used) 19 continuously variable dihedral angles as parameters. They obtain the lowest-energy configuration, in agreement with results of simulated annealing, and by reweighting the measured probabilities of energies evaluate ⟨E⟩ and the heat capacity for a range of temperatures. This work is also referred to in [112]. In [114] they have found the lowest energy states of three polypeptides, of length 10-20 residues, each containing only one type of amino acid.
Rummukainen [115] has combined the multicanonical algorithm with microcanonical `demon' methods in a hybrid algorithm. By separating the demons, which form a heat bath, from the multicanonical part, he is able to apply fast cluster methods (and, potentially, parallelism) to them. This is advantageous because the changes of the preweighted variable in the multicanonical algorithm are inherently serial in nature (because η, being a function of a variable (E or M) that depends on all the spins, couples together all the spins/particles of the system). To some extent this limitation is overcome by the use of the demons. The method is tested on the 2d 7-state Potts model, preweighting energy, and it is found that the ergodic time is much reduced below that of the simple multicanonical algorithm, τ_e ∼ L^{1.8d}. It is interesting to note that this is better even than `ideal' random-walk behaviour, and demonstrates the effect that algorithmic improvement can have. Interfacial tension is also measured, the results seemingly revealing inadequacies of the normal finite-size scaling ansatz.
In work on multicanonical multigrid Monte Carlo [116, 117] Janke and Sauer combine the multicanonical ensemble with the multigrid method to reduce critical slowing down. They investigate the φ⁴ theory in both one and two dimensions, with particular reference to the autocorrelation time τ_O of observables in the multicanonical simulation, and its relation to the error bars of the estimators of canonical averages. As they point out, τ_O is not necessarily the same as τ_rw. They find that the behaviour depends on the nature of the potential barrier they are trying to tunnel through. If the barrier height does not depend on the system size, as is the case with the 1d φ⁴ theory, then the scaling of τ_O with L is much the same for normal Metropolis and multicanonical Metropolis (though the multicanonical algorithm always has the lower τ_O), and in both cases it is much reduced by using the multigrid method too. However, if, as will usually be the case in physically interesting dimensions, the barrier height increases with system size, the increase of τ_O with system size is much smaller for the multicanonical algorithms, and employing multigrid techniques too further reduces τ_O but keeps its scaling roughly constant: the opposite of what was found in 1d. They conclude that where the multigrid algorithm can be used, it should be used in combination with the multicanonical algorithm, since the two together can produce an order-of-magnitude improvement in performance over that obtainable with either alone. In a more recent paper [118], Janke and Kappler have combined the method with the Swendsen-Wang cluster algorithm (to produce what they call a `multibondic
cluster algorithm') and applied it to first-order phase transitions of the q-state Potts model, finding that the multicanonical autocorrelation time grows as the ideal L^d.

We remark that Janke and Sauer's extension of error-propagation analysis to the multicanonical ensemble (and a similar approach adopted in [119]), so that the error bars of the estimators of canonical averages can be obtained from the correlation times of variables in the multicanonical simulations, produces expressions that are very complicated. In practice we favour the use of blocking. We present our own investigations of τ_O in section 3.4.
Lee introduces what he calls `entropic sampling' in [120]; this is really no more than a multicanonical preweighting carried out on internal energy rather than magnetisation. However he does give an algorithm for evolving the preweighting, and discretises the preweighting function at the level of the fundamental granularity of the problem rather than making it piecewise linear within a series of bins, the choice of Berg and Neuhaus. He also avoids using their idea of an effective temperature, which we too feel is unnecessary. He presents some measurements on the 10-state Potts model and a small 3d Ising model, comparing numerical values for the coefficients in the high-temperature expansion of the partition function with exact results. Very recently Lee, Novotny and Rikvold [19] have applied the multicanonical method and Markov chain theory to study the relaxation of metastable phases in the 2d Ising model. The multicanonical ensemble is applied as usual to achieve a flat sampled distribution over M, then the observed matrix of transitions between macrostates is used to predict first passage times and thus to study relaxation properties, in particular the binodal and spinodal decomposition of metastable phases. A modified algorithm is used to speed up dynamics at high |M|. It turns out that the description of the relaxation in terms only of the global order parameter M is surprisingly accurate. In [19] the macrostate transition matrix is not used in the generation of the multicanonical distribution, a technique that we have found to be useful (see section 3.2.3).
In what is to our knowledge the only application to date to the problem of finding the phase diagram of an off-lattice system, Wilding in [121] has used the multicanonical technique to map out the coexistence curve for the 3d Lennard-Jones fluid. The liquid and vapour phases are connected directly by tunnelling across the interfacial region, in one of the clearest examples of the method's use in a free energy/phase coexistence problem.
In [58, 102, 103, 108, 122] the problem of the generation of the parameters that produce the multicanonical distribution is addressed. In [58, 102, 103] finite-size scaling from a small simulation is used to produce the parameters of a larger one. In [108] an overlapping-distribution method rather like multistage sampling is used to give an initial estimate for the spin-glass problem, where finite-size scaling does not work. In [122] all the methods are reviewed and a slightly ad hoc method is given for combining the results of several previous refining stages to give the best estimate for the reweighting function.
Advantages and Disadvantages
This method has the same key advantages and disadvantages as umbrella sampling, of which it is a special case; producing a good sampled distribution requires an iterative procedure, but once it is established a single simulation suffices to measure the quantity of interest, which in our case will be a free energy or free energy difference. The larger set of accessible states reduces error due to ergodic problems, which enables efficient simulation of systems like spin glasses, but means that the random walk time across the multicanonical region becomes long, since the multicanonical region is much wider than the peak of the sampled distribution of a Boltzmann sampling simulation. The effective autocorrelation time τ_O of an observable O also becomes long, but it does not feed directly into the variance of estimators of canonical averages in the same way that was described in appendix C, because of the effect of the reweighting process (see [116, 118]). We shall investigate the nature of multicanonical errors more in chapter 3.
The ability to treat the two phases simultaneously in a single simulation (for lattice models and, in some circumstances, off-lattice models too) is also clearly an advantage, though we should point out that the direct linking of the two phases is not easily applicable to off-lattice systems where one phase is solid and the other fluid, because this would still produce severe ergodic problems due to the difficulty of `growing' crystals out of a fluid.
Another potential disadvantage (which would also affect most umbrella sampling simulations) is the impossibility of performing parallel updates on different degrees of freedom if those updates would change the preweighted variable. In canonical simulations with short-range forces, particles or spins that do not interact either before or after the move may be updated in parallel. With the same forces but multicanonical sampling, only one particle or spin may be updated at a time, because the change in preweighting, which we must know to calculate the acceptance probability, depends on the global value of some macrostate variable and so would depend on whether or not parallel updates going on elsewhere in the system were accepted. As we have seen, some methods like [115] have already been developed to partially overcome this problem. We shall discuss it further in sections 3.2.5 and 3.4.1.
2.2.3 The Expanded Ensemble
This method, or methods similar to it, has also been discovered independently several times. The way it is presented here follows the approach of Lyubartsev et al. [60]. In some ways, it is a similar method to the multicanonical ensemble (section 2.2.2): the system is again encouraged to explore more of its possible energy states and preweighting is again involved. We shall explain the connections more formally in section 3.4.1, though they will be obvious while reading this.

First let us consider the temperature expanded ensemble, which is the first version introduced in [60]. A system of charged hard spheres (the RPM electrolyte) is simulated, which can also make transitions between a number (∼ 10) of different temperatures {β_j}, so the total expanded ensemble partition function Z is given by
Z = Σ_j Z_j exp(η_j)    (2.18)

where the Z_j are the normal canonical partition functions at each temperature

Z_j = Σ_{σ} exp(−β_j E(σ)) = exp(−β_j G_j)
and the G's are Gibbs free energies and the coefficients {η} are chosen so that about the same time is spent in each j-state (we call the j-states `subensembles'). Transitions between the subensembles are implemented using the normal Metropolis algorithm: the same configuration is retained, and the move from ensemble i to j is accepted with probability Me(ΔE_eff) where ΔE_eff = (β_j − β_i)E(σ) − η_j + η_i. The finding of a suitable set {η} is a non-trivial problem which must be approached iteratively, just like the evaluation of the preweighting coefficients of the multicanonical distribution.
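A sketch of the subensemble-swap move follows; `betas` and `eta` are hypothetical arrays of the inverse temperatures and weights, and the configuration (represented only through its energy E) is held fixed during the move.

    # Sketch of one expanded-ensemble temperature-swap move: attempt a jump
    # from subensemble i to a random neighbour j, keeping the configuration.
    import numpy as np

    def subensemble_swap(i, E, betas, eta, rng):
        j = i + rng.choice([-1, 1])           # propose a neighbouring subensemble
        if j < 0 or j >= len(betas):
            return i                          # no neighbour on that side
        dE_eff = (betas[j] - betas[i]) * E - eta[j] + eta[i]
        if dE_eff <= 0.0 or rng.random() < np.exp(-dE_eff):
            return j                          # jump accepted; configuration kept
        return i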
We expect to be at temperature β_j with probability P_j given by:

P_j = Z_j exp(η_j)/Z

so

P_j/P_0 = [Z_j exp(η_j)] / [Z_0 exp(η_0)] = exp(−(β_j G_j − β_0 G_0) + η_j − η_0)    (2.19)
If we can arrange for the state labelled by zero to be a state of known free energy (a perfect gas, say, for β_0 = 0, or an Einstein solid), then we can find the absolute free energies. In the given references we estimate the P_j from the histogram of visited states: P̃_j = C_j / Σ_j C_j. The coefficients {η} are required to stop the system spending almost all its time in the state of lowest G. The `ideal' form for the {η} is η_j = β_j G_j, under which the probability of all subensembles is constant. We require η_j = β_j G_j + O(1) if their probabilities are to be accurately measured. In [60] known approximate values for G are used to bootstrap the estimates of {η}; the G's are in general unknown (and are indeed the very quantities we seek). This should be compared to section 2.1.3.
Although we have described the method as it was implemented for subensembles differing in temperature, it can easily be generalised to different values of other field variables (such as H) or to different forms of the interaction: in [60] itself, use of the temperature-expanded ensemble is only sufficient to transform the RPM electrolyte into the hard-sphere fluid (at β = 0), because the hard core in the potential remains. Therefore, another expanded ensemble is used where the Hamiltonian is changed to move from hard spheres to a perfect gas via increasingly penetrable spheres. All the kinds of transformation of the energy function that have been applied in thermodynamic integration are also applicable here.
In the investigation of the RPM electrolyte, Lyubartsev et al. quote an error of about 1% in the free energy and achieve good agreement with results of multistage sampling and theoretical calculations. This method is one of the few that can as easily be applied to off-lattice as to lattice-based systems. The same authors have extended the method to Molecular Dynamics simulation and applied it to the Lennard-Jones fluid and to a model of liquid water [123].
Marinari and Parisi [124] independently discovered the expanded ensemble, which they call `simulated tempering', and applied it to the random-field Ising model, which at low temperature has a p.d.f. of the order parameter with many modes separated by free energy barriers. They present the method as a development of the simulated annealing algorithm, in which the temperature is steadily reduced. Their concern is not with the absolute free energy of the system, or the free energy difference between `hot' and `cold' ensembles, but with the overcoming of ergodic problems in the lowest-temperature ensembles; the ability of the system to reach high temperatures, where it can travel easily across free energy barriers, means that, when it cools down again, it is likely to enter a different mode of the p.d.f. than the one it started from. This results in an effective τ_rw for travelling between the modes of the low-temperature ensembles which is much less than in a single low-temperature simulation, even
taking into account the extra eort in simulating the other subensembles. They measure < E >
and < M > in the lowest-temperature ensemble and show how the rapid tunnelling is indeed
facilitated. They compare simulated tempering with Metropolis and cluster-ipping algorithms:
it far outperforms Metropolis and seems better even than the cluster algorithm, though data
on rw comparable to that presented in the results of multicanonical simulations is not given.
In [125], however, the tunnelling time between between phases of opposite magnetisation is
investigated for the 3d nearest-neighbour Ising model, and it is claimed that rw L1:76(6)
with ve temperature-states used for all system sizes, but dierently spaced.
Thus, while Lyubartsev's implementation corresponds to the measurement of an absolute free energy of a phase, Marinari and Parisi's shows how the method could also be used to connect two phases for the measurement of a free energy difference: we would use a variable order parameter M within each subensemble and find coexistence by finding control parameters such that each mode of P^can(M) had equal weight in the particular subensemble that had the control parameters of interest. However, because the sampled distribution within each subensemble is unaltered, P^can(M) would only be measurably different from zero for both modes if H were very close to H_coex (see equations 1.14 and 1.15).
Other applications of the multicanonical ensemble/simulated tempering have included the following:

The folding of simple models of proteins has been investigated in [126] and [127]. In the second paper, the global minima of polymer chains of 8-10 residues are investigated by two methods: one is a conventional temperature-expanded ensemble, while in the other the polymer sequence changes between ensembles. This latter approach gives some idea of the flexibility of the expanded ensemble. In this context we should also mention [128], where MC parameters are optimised during a run to achieve the fastest decorrelation in a protein-folding problem. The convergence process resembles the finding of {η} in an expanded ensemble simulation, though non-Boltzmann sampling is not used.
The 3d Ising spin glass has been studied [129]. The method allows the efficient generation of a good sample of equilibrated configurations at temperatures below the glass transition. It is found that the predictions of replica symmetry breaking theory, rather than droplet theory, are supported.
The method has been used for the measurement of the chemical potential (free energy) of a polymeric system. Used by Wilding et al. [130], an expanded ensemble approach is here combined with the particle-insertion method. The simple particle insertion method fails because of the large size of the molecules, which results in a very small acceptance probability. Therefore a series of intermediate ensembles is introduced in which the interaction of the `test' polymer molecule with the others is gradually switched on. The expanded ensemble technique is used to move between these ensembles. For long polymer chains, the method is more effective than the commonly-used configurational-bias Monte Carlo [131].
The method has been applied to the 2D Ising spin glass [132] and, very recently, to the U(1) lattice gauge theory [133], by Kerler and co-workers. In the Ising case a temperature-expanded ensemble is used; for the gauge theory, a set of ensembles containing progressively more of a monopole term that transforms the transition from first order to second order.
We remark that, in [130] and [133], issues relating to the finding of the coefficients {η} are also investigated. The method of [130] uses a linear extrapolation technique, while in [133] a method using information in the observed transitions between subensembles is used. This resembles a method we developed independently, which is described in chapter 3, where we investigate the problem of finding suitable {η}'s. (Our investigations are performed using the multicanonical ensemble but would also apply to the expanded ensemble.)
One issue that is important in expanded ensemble calculations is the spacing of the subensembles. Let us consider the temperature expansion case as an example. Unlike the multicanonical ensemble, there is no `natural' granularity of the temperatures, so they must be chosen with a separation wide enough that the random walk time between the ends of the chain is not too long, but not so wide that the acceptance ratio r_a falls excessively low. It is possible to make an approximate calculation of what the spacing should be (see [124]): we know P(β_i → β_j) = M(ΔE_eff), with ΔE_eff = (β_j − β_i)E(σ) − η_j + η_i. Consider only transitions between adjacent states, so j = i + 1, and suppose we have the ideal set {η}. Then we may expand η(β) as a Taylor series, which gives, writing Δβ = β_{i+1} − β_i,
η_{i+1} = η_i + (∂η/∂β)_i Δβ + (Δβ²/2)(∂²η/∂β²)_i + O(Δβ³)
        = η_i + ⟨E_i⟩Δβ − (Δβ²/(2k_B β_i²))(C_H)_i + O(Δβ³)    (2.20)
where in the second line we have used η_i = β_i G_i and expressed the derivatives of G_i in terms of canonical averages.

Thus ⟨ΔE_eff⟩ ≈ Δβ²(C_H)_i/(2k_B β_i²). We demand ⟨ΔE_eff⟩ = O(1) for a reasonable r_a, so we can use this expression to estimate a suitable Δβ (and thus β_{i+1}) given β_i and a measurement of (C_H)_i, the heat capacity in the ith ensemble. Then equation 2.20 enables us to estimate η_{i+1}. It is instructive to observe that, since C_H ∼ L^d, we require Δβ = β_{i+1} − β_i ∼ L^{−d/2}. This is the expected size of the fractional fluctuations in the energy, and so we see that once again we require an `overlap' between adjacent subensembles. It is the same sort of scaling as is required for multistage sampling. However, use of the expanded ensemble, where configurations pass from one ensemble to the other, makes it clearer that what is really required is an overlap of the p.d.f.'s of the configurations of the adjacent subensembles. With a large difference in temperatures, the dominant configurations in one ensemble are not the dominant configurations in the other; they will scarcely have any weight at all. Attempting a transition in temperature without changing the configuration is liable to produce a `non-equilibrium' configuration in the new ensemble, which is then very unlikely to be accepted, just as, when making coordinate updates in the normal Metropolis algorithm, the configuration can be altered only slightly at a single step while maintaining a reasonable acceptance probability.
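This prescription is easily coded; the following sketch is ours, with target standing for the O(1) value demanded of ⟨ΔE_eff⟩:

    import math

    def next_beta(beta_i, C_H_i, k_B=1.0, target=1.0):
        """Choose the next inverse temperature from equation 2.20:
        <dE_eff> ~ dbeta**2 * C_H / (2 k_B beta**2), so demanding
        <dE_eff> = target = O(1) fixes the spacing.  With C_H ~ L**d this
        gives dbeta ~ L**(-d/2), as noted in the text."""
        dbeta = beta_i * math.sqrt(2.0 * k_B * target / C_H_i)
        return beta_i + dbeta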
Methods Related to the Expanded Ensemble
Here we shall discuss two interesting methods which appear in the Statistics literature. They
are similar enough to the Expanded Ensemble to be treated here but also have important differences. They have not yet been applied to problems in physics, to our knowledge.
The first, due to Geyer and Thompson, is called Metropolis-Coupled Markov Chain Monte Carlo, (MC)³ [50]. Transitions are made between subensembles defined just as for the expanded ensemble, but every one of the subensembles is active at a particular time, and the ensemble-changing moves consist of `swaps', where the configuration σ_i from ensemble i (i.e. the ensemble with temperature β_i or energy function E_i) is moved into ensemble j and vice-versa. The swap is accepted with probability

r_a = min[1, P_j(σ_i)P_i(σ_j) / (P_i(σ_i)P_j(σ_j))]
With this choice, the stationary distribution P_i of each ensemble is preserved. Note that we do not need coefficients {η}, because the method ensures that every ensemble is always active, so there is no problem with the simulation seeking out the ensemble where G is the smallest and staying there. However, we are as usual constrained by the acceptance ratio r_a, which will become very small if the ensembles have a large free energy difference; indeed, r_a has the form of the product of two expanded ensemble transition probabilities, and so will become unusably small rather faster. We should also note that we do not get absolute free energies directly, because the expanded partition function is now the product of the Z's for each subensemble, not the sum. However, (MC)³ does offer a way of moving rapidly between the modes of a multimodal probability distribution (Geyer presents it primarily as a way of doing this), and so the application allowing tunnelling between coexisting phases would still be possible.
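For the temperature case, where P_i(σ) ∝ exp(−β_i E(σ)), the swap test reduces to a single line; the sketch below is our illustration:

    import math, random

    def accept_swap(E_i, E_j, beta_i, beta_j):
        """Accept/reject a configuration swap between the subensembles at
        beta_i and beta_j.  With P_i(s) ~ exp(-beta_i * E(s)), the ratio
        P_j(s_i) P_i(s_j) / (P_i(s_i) P_j(s_j)) reduces to
        exp[(beta_j - beta_i) * (E_j - E_i)]."""
        ln_ra = (beta_j - beta_i) * (E_j - E_i)
        return ln_ra >= 0.0 or random.random() < math.exp(ln_ra)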
In this respect the second method, `Tempered Transitions' [57], is similar: it too would give the free energy difference not directly, but by speeding up tunnelling between the modes of P^can(M), where M varies within a subensemble. However, here we seek to reduce the time taken for a random walk through the subensembles by forcing the system to follow a trajectory from the temperature of interest, up to high temperatures and then back down, then performing a global accept/reject of the whole trajectory. However, it is found (at least in the trial problem that is investigated in [57]) that to achieve a good acceptance ratio we need ∼ N² intermediate states, where the expanded ensemble requires only N, so that there is no reduction in time. Nevertheless, for some systems Tempered Transitions may be a better way of moving between the modes of P^can(M), the details depending in a rather complex way on the shape of P^can(M) in the intermediate ensembles.
Advantages and Disadvantages
Like the multicanonical ensemble, the expanded ensemble enables a wider exploration of configuration space, and gives access to free energies and free energy differences through a direct measurement of probability. It can be used either to measure the free energy of each phase separately, or to improve the exploration of the configuration space at low temperatures by giving the simulation the ability to connect to high-temperature states where the p.d.f. of the order parameter no longer has a double peak (or, in the case of spin-glasses, a multiple peak). This enables the simulation to bypass the potential barriers due to interfaces between phases, though searching in the space of the other control parameters will be required to ensure that the high-temperature phase does not always connect to only one of the low-temperature phases.
We would remark on the clear similarities between this method and the acceptance ratio method [84, 85] (see section 2.1.3). By recording ⟨M(βΔE)⟩ rather than a histogram of visited states, it is likely that the variance of the estimator of subensemble probability is reduced; the acceptance ratio method records not only whether a transition would be accepted or rejected but also, in effect, how much it would be rejected by. However, because no transitions are made, we lose the ability to connect with high-temperature subensembles to speed up the decorrelation of the system.
Just as for multicanonical sampling, the expansion of the accessible volume of configuration space leads to a long random walk time for its exploration. However, the analysis is somewhat simpler in this case, since (as we saw in equation 2.19) the quantities we are interested in here are expressed directly as the ratio of the measured probabilities of being at the two ends of the chain of subensembles. In section 3.4, we shall investigate how the length of the random walk affects the error of the method, addressing in particular an argument that accuracy can be improved by subdividing the chain of subensembles [130, 24].
The question of parallel updating once again arises, but here it is generally less of a restriction. Just as in section 2.2.2 it was found that, for the multicanonical ensemble, those updates that affect the preweighted variable must be carried out serially, so here the updates that change the subensemble must be carried out serially. However, in the expanded ensemble, they could not be carried out any other way, since the energy function or temperature is naturally a global property. Within a particular subensemble, we may perform whatever parallel updates of the particles' coordinates would be possible in a Boltzmann sampling simulation.
2.2.4 Valleau's Density-Scaling Monte Carlo
This is another application of a non-Boltzmann sampling technique, which enables the accurate estimation of canonical averages of fluid systems over a wide range of a variable, in this case density, by sampling in a single simulation all relevant parts of configuration space. Free energy differences between the average canonical densities within the sampled range are also obtained.
The method is described in [134]. It is argued that a good way of parameterising the sampled distribution (at least for a hard sphere fluid) is to choose the measure Y(σ) = Y(s_nn(σ)), where s_nn is the distance between the pair of particles closest to each other in the simulation, i.e. s_nn = min_{ij}(s_ij). The s^N are the reduced position vectors, so s_i = r_i/L, where the r^N are the `real' positions and L is the length of the side of the box. In the canonical ensemble we do not expect r_nn to depend much on the density ρ, because of the short-range repulsive forces, which are very strong and increase rapidly as interparticle distance decreases, and therefore s_nn ∼ ρ^{−1/3}. Suppose we are interested in the difference in free energy between ensembles at ρ_1 and ρ_2. By selecting a suitable form for Y, we can make sure that s_nn covers the range from s̄_nn,1 to s̄_nn,2, where s̄_nn,1 is a typical value in the canonical ensemble at ρ_1 and similarly for s̄_nn,2 at ρ_2. We therefore look for P^DS(s_nn) ≈ constant over the range of interest. If ρ_1 and ρ_2 are representative of different phases, then the method is offering a way to connect the phases directly, like multicanonical sampling with a variable order parameter. Also like multicanonical sampling, canonical averages can be recovered by reweighting.
As well as the hard sphere fluid, the method is applied to the primitive model Coulombic fluid, which has a spherical hard core plus Coulomb forces; there are equal numbers of +1 and −1 charged ions. A slightly different sampled distribution is used here: in order to sample those configurations which are still important when weighted by exp(−βE(s^N)), a distribution Y(s^N) = w(s_nn)φ(s^N) is used, where φ is a function chosen to ensure that an appropriate range of energies is sampled.
For the hard sphere fluid, results are presented, gathered from only 4 DS simulations, that cover the range 0.4 < ρσ³ < 0.9. Excess free energy, excess pressure and also the pair correlation function g₁₂(r) are measured and found to be in very close agreement with analytic results. For the reduced primitive fluid, extensive results are given for both 1:1 and 2:2 electrolytes, covering excess free energy, internal energy, excess osmotic coefficient, mean ionic activity coefficient and like and unlike pair correlation functions. The ranges of density are 0.05 < ρσ³ < 0.31 for 1:1 and 0.02 < ρσ³ < 0.30 for 2:2, both ranges covered by 3 DS simulations. The number of particles is N = 108 throughout. Results match those of other canonical/grand canonical simulations, and seem better at high densities, where those can suffer ergodic problems.

In a second paper [135], the `Coulombic Phase Transition', the low-temperature low-density phase separation of a fluid of charged spheres, is studied.
Advantages and Disadvantages
Though it is not presented as such, this is a similar method to multicanonical sampling; a sampled distribution Y(s^N) = w(s_nn)φ(s^N) is used, and it seems that a distribution of the form

Y^DS(s^N) = exp(−βE(s^N)) exp(η(s_nn))

could well be appropriate, i.e. a multicanonical distribution weighted not in E but in s_nn. This shows the main innovation of the method to be the use of s_nn rather than ρ (or V) itself as the parameter that controls the non-Boltzmann weight of a configuration and thus extends the range of sampled densities. It is not clear what the relative merits of the two approaches are, though one feels that there is useful physical insight in Valleau's identification of hard-core overlaps as a crucial factor controlling the variation of the density.

Probably the most serious disadvantage of the method as presented is that Valleau gives no systematic procedure for evolving Y^DS, but uses physically motivated fitting functions containing s_nn, the hard core radius, L and some fit parameters. From a series of short initial runs, these can be set to give a suitable sampled distribution, at least for the fairly small systems studied here. Aside from this, advantages and disadvantages seem to be as for the general umbrella sampling/multicanonical methods.
2.2.5 The Dynamical Ensemble
This method was introduced by Gerling and Hüller in [136]. The system of interest (an L² Potts model in [136]) is regarded as being coupled not to an infinite heat bath, as is the case with the canonical ensemble, but to a finite bath (they consider a 2d ideal gas of N particles, where N = L²) whose density of states function is known. The combined system has constant energy E_TOT, so the probability that the Potts model has energy E_p is

P^dyn(E_p) ∝ Ω_p(E_p) Ω_bath(E_TOT − E_p)    (2.21)
          ∝ Ω_p(E_p) (E_TOT − E_p)^{(N−2)/2}    (2.22)
Because Ω_bath is known, the bath need not be simulated directly, but its effect can be included by sampling from a non-Boltzmann distribution of energy: the Metropolis algorithm can be used with

P^dyn(σ₁ → σ₂)/P^dyn(σ₂ → σ₁) = [(E_TOT − E_p(σ₂))/(E_TOT − E_p(σ₁))]^{(N−2)/2}

Thus it falls into the non-Boltzmann sampling framework, though its physical motivation is clearer when it is described as above.
As for the canonical ensemble, only a small volume of phase space is sampled at a time, and so it is necessary to do several simulations at different values of E_TOT, measuring ⟨E_p⟩^dyn. Then, using all of these, the entropy s(E_p) is fitted by least squares to

⟨E_p⟩^dyn = ∫ E_p P^dyn(E_p) dE_p / ∫ P^dyn(E_p) dE_p

with

P^dyn(E_p) ∝ exp[N s(E_p)] (E_TOT − E_p)^{(N−2)/2}

After this, canonical averages and Z may be found easily.
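The resulting accept/reject test is easily written down; the following sketch (ours) makes the parallel with the ordinary Metropolis algorithm explicit:

    import random

    def accept_dyn(E1, E2, E_tot, N):
        """Metropolis test for the dynamical ensemble.  The finite bath is
        never simulated; it enters only through the weight
        (E_tot - E_p)**((N - 2)/2) for the system energy E_p."""
        if E2 >= E_tot:
            return False                # bath cannot supply this much energy
        ratio = ((E_tot - E2) / (E_tot - E1)) ** (0.5 * (N - 2))
        return ratio >= 1.0 or random.random() < ratio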
Gerling and Hüller make measurements on the 4, 5 and 8-state Potts models near the transition point, finding that they can discern the change from a continuous to a first-order transition much more easily than by conventional methods. In [137] the method is applied to the 10³ 3d Ising model, and the critical exponents are measured with fair accuracy even from this small system.
Advantages and Disadvantages
It is not immediately apparent from the above why the dynamical ensemble is better than the canonical; however, its sampled distribution does confer important advantages. Firstly, and most significantly, while P^can(E) is doubly-peaked at a first order phase transition, P^dyn(E) remains singly peaked, so removing the difficulty of tunnelling. Secondly, finite-size effects are much smaller, because the heat bath and the system of interest have similar sizes. Thirdly, we find canonical quantities (⟨E⟩^can and Z) in this method by a kind of Laplace transform of the fitted s(E). This is in fact extremely numerically stable, so that good accuracy is obtained even when there are appreciable errors in s(E). The reverse Laplace transform, which would give s(E) (or Ω(E)) from ⟨E⟩^can, is of course extremely unstable, which is another manifestation of the fact that free energy measurements require information about Ω(E) for a wider range of E's than can be obtained from a Boltzmann sampling simulation at one temperature. The only disadvantage of the method seems to be the necessity of doing a series of simulations; it is not limited to lattice-based systems.

The method does, then, seem to be very attractive, though it has only very recently appeared and so has not been widely applied. It is not clear that its major advantage, the removal of the free energy barrier between the phases, would necessarily remain if it were applied to other systems (such as off-lattice systems); if it did, the method would certainly be extremely useful.
2.2.6 Grand Canonical Monte-Carlo
In a sense, this method is really a kind of Boltzmann sampling, but it is most convenient to treat it in this section, as the varying number of particles gives it special properties. It was introduced in [138], and is applicable to off-lattice fluid-like systems. Like the constant-NpT ensemble, it allows the density of the system to vary; but this is achieved by varying the number of particles in the simulation box, N, rather than the box's volume. This is an attractive method because the Gibbs free energy becomes one of the input parameters (μ = G/N for particles of a single type) and so the need to measure it is removed. It is necessary to measure the pressure p, which may be done using equation 2.4.

The partition function simulated is

Z_μVT = Σ_{N=1}^{∞} (N!)^{−1} λ^{−3N} exp(βμN) ∫_V exp(−βE(r^N)) dr^N
which we do by making the usual particle moves and also trying particle insertions/deletions; see [47, 48] or [45, Chapter 4] for a derivation of the required expressions for acceptance of these moves. As for Widom's method, we may replace μ by its excess part by using the ideal gas equation of state, an approach that was first adopted in [139].
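For concreteness, the standard acceptance tests (derived in the references just cited) take the following form; the Python rendering and its variable names are ours, with lam3 denoting λ³, the cube of the thermal de Broglie wavelength:

    import math, random

    def accept_insertion(N, V, dU, mu, beta, lam3):
        """Trial insertion in the muVT ensemble: accept with probability
        min(1, V / (lam3 (N+1)) * exp(beta (mu - dU))), where dU is the
        energy change of the insertion."""
        arg = V / (lam3 * (N + 1)) * math.exp(beta * (mu - dU))
        return random.random() < min(1.0, arg)

    def accept_deletion(N, V, dU, mu, beta, lam3):
        """Trial deletion (dU is the energy change on removal): accept with
        probability min(1, lam3 N / V * exp(-beta (mu + dU)))."""
        arg = lam3 * N / V * math.exp(-beta * (mu + dU))
        return random.random() < min(1.0, arg)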
For the correct value of μ for the temperature of the simulation, the method should allow direct simulation of coexistence (the system should move back and forth between densities characteristic of the two coexisting phases). However, because the sampling is still from the Boltzmann distribution, albeit in the Grand Canonical Ensemble this time, the problem of the low probability of the interfacial states remains, and effectively prevents tunnelling between the two phases, just as we described for Boltzmann sampling MC with a variable order parameter in section 1.2.3. In practice, the method would be used more easily to treat each phase separately. The problem this time would be the equalisation of pressures between the two phases, rather than the equalisation of free energies. We would simulate one phase for a variety of densities, by varying μ and measuring p and ρ, then repeat the whole thing for the other phase, for the same set of values of μ, looking for the pair of state points, one in each phase, at which the pressures equalise. Once a single coexistence point has been found, it would probably be easier to use Gibbs-Duhem integration, as we explained in section 2.1.1.
Advantages and Disadvantages
The most serious disadvantage is clearly that the problem of the interfacial region is not overcome; tunnelling between the two phases is still suppressed, so the two phases cannot be simulated together. However, given this, the method has the attraction that the need to measure free energy is replaced by the simpler problem (for potentials without singularities) of measuring the pressure.

The use of the method is limited only by the density of the system; for dense fluids, acceptance of particle insertions/removals becomes very low because of the likelihood of hard-core overlaps, and in the solid phase the method fails altogether, because the insertion or removal of a particle disrupts the crystal lattice. The NpT-ensemble is better for the simulation of solids. We should also note that finite-size effects are large (though well-understood; see [140], where the method was applied to a critical 2d Lennard-Jones fluid) and the method seems to be unusually sensitive to the quality of the random numbers used.
There is a clear similarity between this method and Widom's, which measures the acceptance ratio for a single particle insertion without actually performing it. Both methods work well for the same kinds of system (fluids of low to medium density). It should be noted that care must be taken to avoid a situation where the acceptance ratio of particle insertions/deletions appears to be high, but in fact a particle is simply being removed, leaving a vacancy, then replaced in the same place. This is best avoided by equilibration moves between the particle insertions/deletions.
2.2.7 The Gibbs Ensemble
This method was introduced only recently [141, 142, 143, 144] but has already gained a great deal of popularity (see [145] and references therein). Its use is restricted to phase equilibria of fluid systems, but given this restriction it has the great advantage that both p and μ are guaranteed to be the same in the two phases, so that a coexistence point on the phase diagram is found immediately, without the need to search laboriously for the values of the control parameters that produce equilibrium, as is the case with TI and most other methods.
In the Gibbs ensemble two phases are simulated simultaneously. They do not coexist in the same simulation volume, but are still in a sense kept in thermodynamic contact. This avoids the necessity of creating an interface between them, with the concomitant ergodic problems of crossing it. The way this is achieved is as follows: two simulation `boxes' are used, of volumes V₁ and V₂, and volume-changing MC moves of both boxes are made as for constant-NpT simulation, but the total volume V = V₁ + V₂ is constrained to be constant. This ensures equality of pressure in the two phases. Similarly, the number of particles in each volume is not constant, but N = N₁ + N₂ is constant, so we move particles from one simulation box to the other. This ensures equality of chemical potential. As well as this, we move particles around in each simulation box in normal MC fashion. The statistical mechanics of the system and the accept/reject procedure for the various kinds of moves are described in [144] and [48]. Equilibration normally causes one of the simulation boxes to come to contain the dense phase, and the other the rare phase, and the pressures and chemical potentials of the two boxes equalise. Note that although pressure and chemical potential are the same in the two phases, we do not immediately know what they are, because the net pΔV and μΔN terms for a volume-change or particle-swap move are zero for the two boxes together. They must be measured separately, the pressure by using equation 2.4 and μ by a version of Widom's method. See [144] for details.
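As an illustration of the particle-swap move, the standard acceptance rule for transferring a particle from box 1 to box 2 is sketched below (the rendering is ours; see [144] for the derivation):

    import math, random

    def accept_transfer(N1, V1, N2, V2, dU, beta):
        """Transfer one particle from box 1 to box 2; dU is the total energy
        change (removal from box 1 plus insertion into box 2).  Acceptance:
        min(1, N1 V2 / ((N2 + 1) V1) * exp(-beta dU))."""
        arg = (N1 * V2) / ((N2 + 1) * V1) * math.exp(-beta * dU)
        return random.random() < min(1.0, arg)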
Advantages and Disadvantages
The method is attractive because it is very easy to investigate the coexistence curve once the initial, rather complex, coding has been done. The method becomes difficult to apply in two regions. One is at the critical point, where the `dense phase' and `rare phase' change identities regularly and (more seriously) one cannot apply the finite-size scaling theory necessary to correct the results of Monte-Carlo studies of critical phenomena [140]. The other is at low temperatures, where one phase is very dense and the other is very rare. As well as requiring a large total volume, this makes the particle-swapping moves into and out of the dense phase difficult. We should mention that, as with Grand Canonical Ensemble simulations, it is easy to be misled into thinking that a good acceptance ratio of particle-swapping moves is being achieved when in fact the same particle is being moved back and forth between the same vacant sites. Nevertheless, the Gibbs ensemble is extremely attractive for most non-critical fluid-fluid phase equilibrium problems, for which it seems to have made previous methods obsolete.
2.3 Other Methods
2.3.1 Coincidence Counting
This method was developed by Ma [146], and provides a way of estimating the density of states Ω(E) for all states with appreciable P^can(E). To do this, we generate a set of N_c configurations, of which N(E) have energy E, and measure n_x(E), the number of coincidences, i.e. the number of times we get the same microstate more than once. Now suppose we generate one more configuration. If the probability of another coincidence is P_E(hit), then clearly
⟨n_x(E)⟩_{N(E)+1} = (n_x(E) + 1) P_E(hit) + n_x(E) P_E(miss)

If we may assume that the set of N(E) points is randomly distributed in the set of Ω(E) states (which implies that we must not sample the trajectory at intervals shorter than the relaxation time), then P_E(hit) = N(E)/Ω(E) and P_E(miss) = 1 − N(E)/Ω(E). Substituting these we find

⟨n_x(E)⟩_{N(E)+1} = n_x(E) + N(E)/Ω(E)

and, averaging over values of n_x(E) (the angle brackets now denote an average over all N(E) steps, not just the previous one),

⟨n_x(E)⟩_{N(E)+1} = ⟨n_x(E)⟩_{N(E)} + N(E)/Ω(E)

Taking the ansatz

⟨n_x(E)⟩_{N(E)} = N(E)(N(E) − 1)/2Ω(E)    (2.23)

we proceed by induction:

⟨n_x(E)⟩_{N(E)+1} = N(E)(N(E) − 1)/2Ω(E) + N(E)/Ω(E) = (N(E) + 1)N(E)/2Ω(E)

which has the correct form. It remains only to check the case N(E) = 2, for which we indeed find

⟨n_x(E)⟩ = 1 · (1/Ω(E)) + 0 · ((Ω(E) − 1)/Ω(E)) = 1/Ω(E) = N(E)(N(E) − 1)/2Ω(E) |_{N(E)=2}
completing the proof. To apply the method, we use our measured n_x(E) to approximate ⟨n_x(E)⟩, and then equation 2.23 gives Ω(E). In [146], the estimates of Ω(E) are used to estimate the total entropy S(β); it is straightforward to show from G = E − TS (with no external field), and the definitions of G and ⟨E⟩ in equations 1.4 and 1.6 et seq., that this is given by

S(β) = −k_B Σ_E P^can(E) ln(P^can(E)/Ω(E))    (2.24)

In [146] the method was demonstrated for a small 1d Ising model, and it has since been applied to a lattice model of an entangled polymer [147].
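In practice the estimator that follows from equation 2.23 is Ω(E) ≈ N(E)(N(E) − 1)/2n_x(E); a minimal sketch (ours), counting n_x(E) as the number of coincident pairs consistently with the derivation above:

    from collections import Counter

    def omega_estimates(samples_by_energy):
        """Ma's coincidence-counting estimate.  samples_by_energy maps an
        energy E to the list of (hashable) microstates recorded at that E;
        n_x(E) is the number of coincident pairs, and equation 2.23 gives
        Omega(E) ~ N(E) (N(E) - 1) / (2 n_x(E))."""
        omega = {}
        for E, configs in samples_by_energy.items():
            n = len(configs)
            nx = sum(c * (c - 1) // 2 for c in Counter(configs).values())
            if nx > 0:                  # no estimate without a coincidence
                omega[E] = n * (n - 1) / (2.0 * nx)
        return omega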
Advantages and Disadvantages
To get a good estimate of Ω(E) we need n_x(E) > 1, implying N(E) > √Ω(E); however, Ω(E) grows so fast with increasing system size that this requirement soon becomes impossible to satisfy: Ω(E) ∼ exp(aL^d s(E)/k_B), so the required N(E) ∼ exp(aL^d s(E)/2k_B). (Nevertheless, note that this is a much less stringent criterion than N(E) > Ω(E), which would be necessary if we attempted to find Ω(E) by measuring directly the probability of a particular microstate; in this case N(E)_direct ∼ exp(aL^d s(E)/k_B), exponential growth at twice the rate.) Although it is ultimately limited by the exponential growth of N(E), Ma's method may be good enough to enable us to measure the entropy of subsystems of the simulation that are large enough to be independent (i.e. their size exceeds the correlation length ξ), so that we can get the total entropy by combining the subentropies in a simple way: S ≈ Σ_A S_A + Σ_{A,B} ΔS_AB, where ΔS_AB = S_{A+B} − S_A − S_B. It is clearly most suitable for systems where the increase in entropy with system size is comparatively slow (i.e. the prefactor a in the exponential is quite small). The entangled polymer studied in [147] falls into this category because the entanglements reduce the accessible volume of phase space.
2.3.2 Local States Methods
First used by Meirovitch [148], these have been developed further by Schlijper et al. [149]. We measure a local entropy S_lo(L) defined as in eqn. 2.24, but with the sum taken over small sublattices (clusters) of linear size L embedded in a larger simulation, typically 64 × 64 in 2d. L is normally chosen to be 2, 3 or 4; increasing it improves accuracy but increases the computer time and memory required exponentially. Since the clusters are small, we can use eqn. 2.24 directly. The bulk entropy density s is found by extrapolation of L^{−d}S_lo(L) using the cluster variation method (CVM) [150]. The accuracy of the method depends on the rate of convergence of L^{−d}S_lo(L) as L increases; it is thus worst near the critical point. Meirovitch calculated S for the fcc Ising antiferromagnet, and compared with integration [151]; local states seems appreciably better in high magnetic field but shows no improvement in low field.
Schlijper et al. have improved the method by supplementing the CVM calculations with a Markovian calculation of s, based on the difference of the cluster entropies of two stepped (d−1)-dimensional clusters, one having one more site at the step than the other. This second calculation provides an upper bound on s, whereas the CVM gives a lower bound; Schlijper et al. therefore take the average of the two as their best estimate of s, and half the difference as a rigorous bound for the intrinsic error. They present results for the 2d and 3d Ising models and the 3d 3-state Potts model and claim an impressive accuracy.
Advantages and Disadvantages
Because of its reliance on the CVM for extrapolation, the method is applicable only to a restricted class of models: lattice models with translational invariance. This rules out large classes of interesting systems (e.g. fluids, spin glasses). The CVM is also not a transparent method and requires considerable mathematical effort to apply. Having said this, high accuracy is available where the method can be applied, and the existence of both upper and lower bounds on the entropy is attractive.
2.3.3 Rickman and Philpot's Methods
In this method [152], the problem of the measurement of the free energy of a solid is tackled using the idea of measuring the probability of a single microstate, in this case the ground state, in which all the particles are fixed on their lattice sites. This probability is far too small to measure, so instead MC simulation is used to measure the probability ⟨χ(α)⟩ that all the particles are inside spheres of radius α about their lattice sites, and then extrapolate to find ⟨χ(0)⟩ by fitting to the ⟨χ(α)⟩ data a function of the form χ_fit = [(2/√π) γ(3/2, α²)]^N, where N is the number of particles and γ(x, y) the incomplete gamma function. They give results for the Lennard-Jones solid and claim an accuracy of 0.3%, although the agreement with results of integration is only 1%. Their method is attractive but is rendered dubious by the very complex form of χ_fit (although they do have physical grounds for choosing such a fitting function).
More recently, Rickman, working with Srolovitz, has devised another method [153], which also falls into this category. It works only for lattice models with discrete configuration spaces, and is described and tested for the 2d Ising model. The idea is to estimate Ω(E) by defining some large subset Ω_s(E) which can be easily identified and enumerated by fairly simple combinatoric methods: for example, those configurations which consist of n isolated spins in a `sea' of spins pointing in the opposite direction (so these have an energy E = 8n above the ground state, with coupling J = 1). We then run a normal Boltzmann sampling algorithm and test whether or not each configuration falls into the set Ω_s(E). The fraction that do, Ω′(E), gives us an estimate of Ω(E) from Ω′(E) = Ω_s(E)/Ω(E). From a single Ω(E), the partition function for the temperature of simulation can be estimated; from a set of them, Z(β) for a range of temperatures can be found.

The method gives very high accuracy (0.02% from 10⁸ configurations) for the admittedly unarduous task of finding G(β, H = 0) for the 6 × 6 Ising model. Lattices up to 10 × 10 are studied; the method is limited by the tendency of Ω′ to become immeasurably small for any suitable set that is both enumerable and can be easily identified. Extension to other lattice models or to F(M) is fairly obvious.
2.3.4 The Partitioning Method of Bhanot et al.
This method [154], like multistage sampling, concentrates on finding the density of states function Ω(E), and uses overlapping distributions to do so. We begin from a `simple sampling' MC algorithm (see section 1.2.2), which generates configurations with a p.d.f. proportional to Ω(E). This time, however, we partition the allowed energy spectrum into narrow, overlapping blocks. In one MC run we start the simulation in one of the blocks and constrain the energy to remain within it by rejecting all MC moves that would take us outside. The block is `narrow' in the sense that all energies within it appear with appreciable probability in the MC run, but it must also be wide enough that we can in principle reach any configuration in the block from any other. We then repeat the process for all the blocks. Within each block we have obtained the relative probabilities of the different energies, and because the blocks overlap we can combine results of adjacent blocks and normalise, eventually obtaining the absolute probability of any energy in an unconstrained simulation. Knowledge of Ω(E₀) or Ω_TOT then gives us the density of states function, and F follows (as do all other thermodynamic quantities). There is a cumulative error in overlapping large numbers of probability distributions, but nevertheless Carter and Bhanot obtain high accuracy, comparable with any other method, in their simulations. The 3d Ising model is treated in reference [154] and the method has had several other applications, for example to the phase transition in the finite-temperature SU(2) lattice gauge theory [155].
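The energy-constrained update itself is very simple; the sketch below (ours) emphasises that no Boltzmann factor is ever computed:

    def block_step(state, E, lo, hi, propose):
        """One 'simple sampling' update constrained to the energy block
        [lo, hi]: a trial move is rejected only if it would carry the energy
        outside the block.  propose(state) must return (trial_state, dE)
        for an unweighted trial move."""
        trial, dE = propose(state)
        if lo <= E + dE <= hi:
            return trial, E + dE        # accepted: bin the new energy
        return state, E                 # rejected: bin the old energy again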
We note that this method is somewhat difficult to classify; we have put it with the `direct' methods because Ω(E) is measured directly, but we have already said that it resembles multistage sampling, while the use of constraints and the absence of Boltzmann weighting would justify classifying it in the non-Boltzmann methods.
Advantages and Disadvantages
The method is applicable to off-lattice systems and is simple conceptually and algorithmically: because we do not need to calculate Boltzmann factors, there is a substantial increase in the speed of a single update compared with multistage sampling. However, the necessity of choosing narrow blocks means that we deliberately make worse the ergodic problems with metastable states that we know occur around phase transitions. This is not a problem with the Ising model, for which it can be shown that as long as the blocks contain at least four energy macrostates ergodicity is guaranteed, but it might be expected to be so with more physically realistic models.
2.4 Discussion
We can see that a great many MC methods of measuring free energy have been tried. Figure
2.1 below is an attempt to group conceptually related methods, to simplify the task of seeing
the connections between them.
This figure requires some explanation, because even those methods that we have classified differently frequently have points of resemblance to one another. In fact a single classification is impossible, because the measurement of free energies and the finding of phase boundaries is a process which involves several different issues, not a single one, and one method may resemble a second method in the way it tackles one issue and a third in the way it tackles another. Realising this enables us to analyse and classify the methods more easily, and also to envisage possible improved methods which might combine the strong points of different approaches.
[Figure 2.1: a diagram grouping the methods under the heading `Free Energy & Entropy'. Direct measurement of the probability of a state: coincidence counting; local states; Rickman & Philpot's methods; partitioning. Overlapping canonical ensembles (integration-perturbation methods): thermodynamic integration; histogram methods; multistage sampling; acceptance ratio method; Mon's method; particle insertion. Non-Boltzmann methods: umbrella sampling; multicanonical ensemble; density-scaling; dynamical ensemble; expanded ensemble; grand canonical Monte Carlo; Gibbs ensemble.]

Figure 2.1. A possible grouping of methods of measuring free energy. The three main groupings reflect the three groups into which methods have been classified in this chapter, while the dotted boxes show connections between methods in different groups; within the dotted boxes, the most closely related methods are arranged side by side.
First it is convenient to deal with those methods, discussed mostly in section 2.3, that stand out from the mainstream in that they attempt to tackle the problem of measuring a free energy in a direct way, using the fact that if we could measure the probability of the appearance of a single microstate, or a group of degenerate microstates whose degeneracy we knew, we would know the partition function by P^can(σ) = Z^{−1} exp[−βE(σ)]. These include Ma's coincidence counting method [146], the local states methods [148, 149], and Rickman and Philpot's method for estimating the probability of the ground state [152]. Because of the exponentially large number of microstates, these methods usually rely on ancillary extrapolation procedures and in any case do not work well for large systems. However, we should mention that, at least for lattice models, the multicanonical method can be applied in a way that puts it in this group: we can measure the probability of a single macrostate in the multicanonical ensemble directly, without the need to extrapolate, and then calculate from this the corresponding probability in the canonical ensemble. If, therefore, we choose a macrostate that consists of only one or two microstates (for example, the ground state of the Ising model), then Z can be calculated. We describe this variant of the method in more detail in section 3.1.2.
Now let us turn to the remainder of the methods, which are attempts to deal with one or both of the problems identified in section 1.2.3. The most obvious way to categorise them is by way of the algorithm employed (by which we really mean the sampled distribution here). First there are those methods that employ simply the NVT (or sometimes NpT) canonical ensemble in largely unmodified form: thermodynamic integration, multistage sampling and the acceptance ratio method. These are put in the central `box' of figure 2.1. Because of the unsuitability of the Boltzmann distribution for direct free energy calculations, these methods rely on using several independent Boltzmann sampling simulations, which between them cover all the necessary states in configuration space. These methods are most naturally applied to each phase separately, connecting the state of interest to a state of known free energy. They do not handle the direct connection of the two phases well, because the Boltzmann distribution for states in the interfacial region is slow to equilibrate and extremely sensitive to minute variations in the control parameters that push the system from one phase to another. Under some circumstances, ergodic problems may also affect the estimates of ⟨E(β)⟩ in those simulations that are run at low temperatures, or ⟨p(V)⟩ (see equation 2.4) in those simulations that are run at high density.
The differences between these methods themselves lie less in the nature of the Monte Carlo process (one could imagine them producing and using the same Markov chain of configurations) than in what they measure from the configurations generated and the estimators of the free energies that they extract from the measurements. In TI a canonical average is measured for each stage, which is integrated to give a free energy. In multistage sampling a p.d.f. is measured and overlapped between the stages (or extrapolated and overlapped if Bennett's extension is being used), while in the acceptance ratio method the transition probability between the ensembles of each stage is measured. This classification by estimators is the second way that methods of free energy measurement can be classified, and is to some extent separate from the classification by algorithm, though the choice of algorithm usually has implications for a sensible choice of estimators: we would not necessarily choose the spacing (in temperature or whatever) of the series of canonical simulations in the same way for TI and multistage sampling. All the methods are fairly easy to implement (TI perhaps the easiest) and for all of them the behaviour of Monte-Carlo errors is similar: the total error is obtained by combining estimated errors from within each ensemble, as described in section 2.1.1. We can also include Widom's particle insertion method and Mon's method in this group. Widom's method is like the acceptance ratio method between two canonical ensembles differing by one in the number of particles present, while Mon's method is like multistage sampling, with an unusual but intelligent choice of `difference' between the canonical ensembles that means, firstly, that the exponential whose average must be taken grows only slowly with system size, and secondly, that the finite-size corrections to the limiting behaviour of the free energy are obtained directly.
To continue with the classification by algorithm, the second group, put in the right-hand `box' of figure 2.1, can broadly be called non-canonical methods. Unlike the methods of the previous group, they allow direct connection of the two phases. In Grand Canonical Monte Carlo (μVT ensemble) it is unnecessary to measure μ, since it is an input, but the method does suffer from the presence of the interface, necessitating a search in μ to obtain equal values of p, which must be measured, in both phases. The Gibbs ensemble elegantly avoids the problem of sampling through the interface region and, as we have said, is probably the method of choice where it is applicable. At least for lattice models, the shape of the p.d.f. of the dynamical ensemble may also offer a way round the problem of low sampling probability in the interface region. And the two methods that will most concern us in this thesis also both fall into this group: the expanded ensemble and the multicanonical ensemble. By expanding the volume of configuration space accessible to the simulation, they both enable the determination of free energies in a single simulation. They may be applied either to connect two phases directly, or to find the absolute free energy of each. The free energies are obtained from MC estimates of the probabilities of the macrostates/subensembles, reweighted to correct for the non-Boltzmann sampling. There is an obvious similarity between these two methods, and we can in fact put them both into the same framework, though we shall wait until section 3.4.1 to do this.
These methods are relatively good at overcoming ergodic problems, because they do not consist of a series of Boltzmann simulations, each confined to a narrow range of macrostates; they have the ability to reach macrostates of high energy or low density, where decorrelation of configurations is rapid. Of course, ergodic problems are to some extent an inevitable feature of Monte-Carlo simulation; one can never be sure that equilibration is complete and that there is not some `hidden' region of phase space that has not yet been found. Nevertheless, the situation here is obviously better than in the case of TI and the other methods of the first group. The total error of the process is also obtained more simply than for the multistage methods; the standard blocking methods that work for a single canonical ensemble can be used.
Though we have classified them above as using both different algorithms and different estimators, there is an obvious kinship between some of the `multistage' methods and some of the `non-Boltzmann' methods, as we have shown in the figure by connecting them with a dotted rectangle; in particular, the acceptance ratio method bears an obvious similarity to the expanded ensemble in the case where each of the `subensembles' of the latter corresponds to a separate stage of the acceptance ratio method: the difference is that whereas we merely measure the transition probability of a trial move in the acceptance ratio method, we actually perform the transition in the expanded ensemble. The issue is slightly obscured because the estimator of the probability of the `subensembles' is different in the two cases as well: ⟨M(βΔE)⟩ in one case and the histogram of visits to the subensemble in the other. In the same way, grand canonical Monte-Carlo and the particle insertion method are related: the first method actually performs the transitions whose probability is recorded in the second.
The partial separation we have effected between the algorithm used and the estimators of free energy defined on the configurations produced enables one not only to understand more clearly the plethora of methods in the literature but also to think of combinations of the two that have not previously been tried. For example, we could employ the expanded ensemble method, making transitions between the subensembles, but record ⟨M(βΔE)⟩ and use that as the estimator of the probability of a state. Indeed, it is to be expected from Bennett's analysis [84] that the variance of this estimator would be slightly lower, because it contains information on how much each transition is accepted or rejected by; this information is discarded by using only the histogram. Thus, by this method we would keep the advantages of using the expanded ensemble (reduction of ergodic problems) but combine them with some of the virtues of the multistage methods. Or we could envisage measuring ⟨E⟩^can in a multicanonical simulation, then evaluating free energies by integrating as in TI. This combination will be tried in section 3.3.1.
We should point out that almost all the methods we have discussed are united in the need to face the problem of exploring in some way a large volume of configuration space, whether they tackle it by doing one simulation or a series of them: whereas all configurations within a single phase are in a sense `similar', this is not the case with the configurations in two different phases, or the configurations in one of the phases of interest and those characteristic of the high temperature/low density limit which can serve as a reference system. Now, the Metropolis algorithm permits only a small change in configuration at each update step if the acceptance ratio is to remain reasonably high (in the expanded ensemble context this corresponds to a small change in temperature/energy function). Therefore, to sample both phases or to connect to the reference state, the configuration must be changed completely in kind by the accumulation of small perturbations. It is this that makes the free energy problem especially, and to some extent inescapably, demanding. Even though this transformation of the configuration is not explicitly carried out in methods like multistage sampling, the requirement of overlap of distributions means that an equivalent pathway must be opened up. This way of looking at the problem makes it clear why Mon's finite-size scaling method gives good results, at least for lattice models; the configurational difference between the system with energy functions E_{2L} and E_L is comparatively small. It also explains why TI may in the right circumstances give very good results, since it can avoid the need to pass in small steps between very different configurations (although it has its own disadvantages too).
Issues to be Investigated
In the remainder of the thesis we shall concentrate on investigations of the multicanonical ensemble (and, to a lesser extent, the related expanded ensemble). We do not see a clear difference between umbrella sampling and the multicanonical ensemble, which itself is similar to the expanded ensemble. The difference seems largely to have been in the class of problems to which they have been applied: condensed matter for umbrella sampling, lattice models and lattice gauge theory for the multicanonical ensemble. Thus, the multicanonical ensemble is in many ways more a rediscovery of the principles of umbrella sampling than a new development in its own right. Nevertheless, this rediscovery has clearly provoked a new wave of interest, prompted perhaps by a realisation that umbrella sampling ideas are not as limited as had been thought by the difficulty of finding a suitable sampled distribution [46]; this was, we believe, the most important reason that the original umbrella sampling was not more widely adopted. The advantages of the multicanonical approach are particularly clear in cases where there are very severe ergodic problems, like spin glasses, but even for more general problems of free energy measurement we believe that the method is made attractive by the fact that only a single simulation needs to be performed and the free energy is obtained more transparently, by direct measurements of probability rather than through an integration process.
The most significant disadvantage remains the difficulty of finding a suitable sampled distribution. Since generating the sampled distribution requires knowledge of the very free energies that we are trying to measure, it must be done by an iterative process. Though ad hoc methods that work reasonably well in practice have been introduced (see, e.g., [122]), further progress in this regard is required before the method can be used `off-the-shelf'. We shall make extensive investigations of iterative methods to do this in chapter 3, applying Bayesian methods to the problem and introducing a new method based on the use of a histogram of accepted transitions to construct estimators of the probabilities of the states. We shall look at the application of the method both to the single-phase and to the coexistence problems.
The quantity that controls the multicanonical error might be expected to be the random walk time τ_RW over the wide range of accessible states. While this is to some extent true, we shall show that the relation of τ_RW to the error of the final free energy estimators is not what might be expected. We shall also compare the efficiency of the multicanonical and expanded ensembles with that of thermodynamic integration. While developing simulation methods we shall concentrate on applications to the 2d Ising model, but in chapter 4 we shall apply the techniques to the simulation of a solid-solid phase coexistence in a simple model of a colloid.
Chapter 3
Multicanonical and Related Methods
3.1 Introduction
In section 2.2.2 we gave a qualitative description of the multicanonical ensemble and reviewed its uses in the literature. To recap, the defining characteristic of multicanonical simulations is a sampled distribution which is more or less flat over at least a part of the space of macrostates of a chosen variable (called the `preweighted variable') of the system, which will be either internal energy E or magnetisation M here.
Since its introduction in [58], the multicanonical ensemble's uses have included: the measurement of interfacial tensions in Ising models [58, 101, 102], Potts models [58, 103, 104, 115, 120] and lattice gauge theories [105, 106, 107]; application to the Edwards-Anderson spin glass to measure internal energy, entropy and heat capacity [108, 109, 110, 111]; the study of the 2d φ⁴ theory (combined with multigrid methods [116, 117]); and the study of protein tertiary structure [113, 114]. The method is reviewed by Kennedy [59] and Berg [99], and some recent algorithmic developments are reviewed by Janke [100].
In this chapter, we shall mainly, though not exclusively, be concerned with the application of the multicanonical ensemble to the measurement of absolute free energies. We shall first describe how the multicanonical ensemble may be used to do this, and then (in section 3.2) we shall investigate ways of producing the required importance-sampling distribution, which is unknown a priori. This will lead us to a new method where inferences are made from the observed transitions made by the simulation, a method which will turn out to have other important uses.
In section 3.3 we shall then present some new results for the behaviour of the critical p.d.f. of the magnetisation of the Ising model at high M, along with results for densities of states and canonical averages for the 2d Ising model, and a comparison with thermodynamic integration. Finally, we shall widen the discussion to include other non-Boltzmann sampling methods, such as those of Fosdick [96], Hesselbo & Stinchcombe [97], and the expanded ensemble (section 2.2.3, [60]). We shall expand on the comments we have already made in chapter 2 about the similarity of this latter method to the multicanonical ensemble. We shall also present some new theory on the variance of estimators obtained from the multicanonical/expanded ensemble. As well as laying to rest some `folk theorems', this will enable us to investigate the question of which non-Boltzmann importance-sampling scheme is optimal for the measurement of a particular quantity. All the investigations made in this chapter will be made using the 2d square nearest-neighbour Ising model with coupling constant J = 1, as described in section 1.1.2.

First, then, let us briefly describe how the multicanonical ensemble will be used in this chapter. We shall describe two ways of measuring absolute free energy by preweighting in energy, and we shall also describe magnetisation preweighting.
3.1.1 The Multicanonical Distribution over Energy Macrostates
As we have said before (section 1.2.3), absolute free energy can in principle be calculated from

    < exp[βE(σ)] >_can = Ω_TOT / Z    (3.1)

where Ω_TOT is the total number of microstates. −ln Z equals βF or βG depending on whether the ensemble we are simulating has a constant or variable order parameter. We shall be concerned here with the case where the order parameter (the magnetisation) is variable; −ln Z = βG for the Ising model. Now, Boltzmann sampling cannot be used to evaluate the free energy with equation 3.1 because it gives exponentially small weight to the high-energy configurations that dominate the expectation value in equation 3.1 (compare O₂ in figure 1.10 in section 1.2.2).
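For reference, the identity in equation 3.1 follows in one line from the definition of the canonical average:

```latex
\left\langle e^{\beta E} \right\rangle_{\mathrm{can}}
  = \frac{1}{Z} \sum_{\{\sigma\}} e^{\beta E(\sigma)}\, e^{-\beta E(\sigma)}
  = \frac{\Omega_{TOT}}{Z},
  \qquad Z = \sum_{\{\sigma\}} e^{-\beta E(\sigma)} .
```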
We require, then, to give more weight to the high-energy states. To describe how to do this, we shall first give a general formulation of the use of non-Boltzmann sampling distributions, then specialise to the energy case. As we saw in chapter 1, the Metropolis algorithm can be used to sample from any distribution over the configuration space. Suppose we sample from a distribution with measure Y(σ), which we can do by accepting trial transitions from σ₁ to σ₂ with probability min(1, Y(σ₂)/Y(σ₁)). Then the expectation value of an operator O(σ) is

    < O >_Y = Σ_{σ} O(σ)Y(σ) / Σ_{σ} Y(σ)
Now, even though we have sampled from the distribution with measure Y, we can also write expressions for averages with respect to another distribution with measure W. To see this, consider the ratio

    < O W Y⁻¹ >_Y / < W Y⁻¹ >_Y
      = [Σ_{σ} O(σ)W(σ)Y⁻¹(σ)Y(σ) / Σ_{σ} Y(σ)] / [Σ_{σ} W(σ)Y⁻¹(σ)Y(σ) / Σ_{σ} Y(σ)]
      = Σ_{σ} O(σ)W(σ) / Σ_{σ} W(σ)
      = < O >_W    (3.2)
To find canonical averages (< O >_can) when the configurations are sampled from a distribution with measure Y ∝ P(σ), we substitute W = exp(−βE), giving

    < O >_can = < O exp(−βE) Y⁻¹ >_Y / < exp(−βE) Y⁻¹ >_Y    (3.3)

The notation is slightly simplified if we characterise the sampled distribution by its difference from the Boltzmann distribution, introducing a function η(σ), which gives an extra weight proportional to exp[η(σ)] to each microstate. The sampled distribution thus has measure

    Y(σ) = exp[−βE(σ)] exp[η(σ)]    (3.4)

which gives

    < O >_can = < O exp(−η) >_Y / < exp(−η) >_Y    (3.5)

An estimator Õ of this from a finite sample of N_c configurations is

    Õ = Σ_{i=1}^{N_c} O(σᵢ) exp(−η(σᵢ)) / Σ_{i=1}^{N_c} exp(−η(σᵢ))    (3.6)

where the configurations are assumed drawn from the distribution defined by equation 3.4.
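As a concrete illustration, a minimal sketch of this estimator in Python (the function and array names are our own; O_samples and eta_samples are assumed to hold O(σᵢ) and η(σᵢ) for each of the N_c sampled configurations):

```python
import numpy as np

def canonical_average(O_samples, eta_samples):
    """Estimator O-tilde of equation 3.6: recover the canonical average of O
    from configurations sampled with measure exp(-beta*E + eta)."""
    # Factor out the largest reweighting factor so the exponentials cannot overflow.
    w = np.exp(eta_samples.min() - eta_samples)   # proportional to exp(-eta_i)
    return np.sum(O_samples * w) / np.sum(w)
```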
Note that the right-hand side of equation 3.6 provides an estimator of < O > for any η(σ), though only a few choices are usable in practice; for most, the sampling probability is such that the configurations that dominate the sums of either O(σ) exp(−η(σ)) or exp(−η(σ)) (or both) will be generated with infinitesimal probability. The possibility of applying equation 3.3 has been appreciated since the early days of computer simulation; see Fosdick's paper of 1963 [96], an early attempt at finding an `optimal' sampled distribution. It includes as special cases Boltzmann sampling (put Y = exp(−βE), when the denominator becomes a `minimum variance estimator,' a constant for all configurations) and `simple sampling' (Y = 1). However it has been little used for a long time, mainly because Boltzmann sampling is very successful for most choices of O (internal energy, magnetisation) and is very simple conceptually.
Though we shall come back to more general sampled distributions in section 3.4, we shall until then concern ourselves only with multicanonical distributions. All the equations for the estimators Õ that we produce (equation 3.7 and so forth) are true for any values of {η} in the limit of very long sampling time, but, because of the failure to sample the important configurations frequently, would give very poor estimators in the run-times accessible in practice. We first consider the case of a multicanonical distribution with energy the preweighted variable, so only the value of the energy macrostate is relevant in determining η: η(σ) = η(E(σ)). As we said when introducing this method in section 2.2.2, multicanonical sampling means that the sampled distribution P^xc(E) (for energy preweighting; P^xc(M) for magnetisation, etc.) of energies extends right up to very high energies and is roughly flat, as shown schematically in figure 3.1. Such a distribution is produced by a set of coefficients η(E) = η^xc(E) (we shall use `xc' to signify `multicanonical' in mathematical expressions).
Equation 3.6, rewritten as a sum over energy macrostates, and written specifically for the multicanonical ensemble, is

    Õ = Σ_E P̃^xc(E) O(E) exp[−η^xc(E)] / Σ_E P̃^xc(E) exp[−η^xc(E)]    (3.7)

where P̃^xc(E) are estimators of P^xc(E); the most obvious way to produce them is simply to use P̃^xc(E) = C^xc(E) / Σ_E C^xc(E), where C^xc(E) is the histogram of energy macrostates visited in the multicanonical run. However, there are other ways of estimating P^xc(E). Now, P^xc(E) exp[−η^xc(E)] ∝ P^can(E), so the sum in the denominator is dominated by energies
Figure 3.1. A schematic diagram of a typical histogram sampled from a multicanonical distribution, and the estimates of P^can(E) and P^can(E) exp(βE) that may be recovered from it.
around < E >_β, as indicated in the diagram. Conversely, for O ∝ exp(βE),

    P^xc(E) O(E) exp[−η^xc(E)] ∝ P^can(E) exp(βE) ∝ Ω(E)

so the sum in the numerator is dominated by the maximum of Ω(E), which occurs at high energy (at E = 0 for the 2d Ising model, midway between the extreme energies ±2L²). However, since the multicanonical distribution extends over both regions of energy space, both sums are estimated to good accuracy. Indeed, for the multicanonical distribution shown in figure 3.1, the estimators of < O > will be accurate for all operators O which depend only on E(σ) (and its conjugate field β). We get not only free energies but also internal energies, heat capacities etc., and not only at temperature β but also at all other temperatures β̂: to evaluate these we return to equation 3.3 and replace exp(−βE) by exp(−β̂E) and (if appropriate) O by O_β̂. This leads eventually to the following equation, analogous to equation 3.7:

    Õ_β̂ = Σ_E P̃^xc(E) O_β̂(E) exp[(β − β̂)E] exp[−η^xc(E)] / Σ_E P̃^xc(E) exp[(β − β̂)E] exp[−η^xc(E)]    (3.8)
The denominator is now dominated by the maximum of P^can_β̂(E), while, for O ∝ exp(β̂E) (to estimate G(β̂)), the sum in the numerator is still dominated by the maximum of Ω(E). Thus, depending on O and β̂, terms from various parts of E-space will dominate the sums in equation 3.8, but since the multicanonical distribution is flat, we are sure to have sampled the relevant part of E-space. Even if the sampled distribution is only multicanonical over a part of E-space, these assertions will still be true as long as O and β̂ are such that all the appreciably large terms in the sums in equation 3.8 come from the multicanonical part. We should also note that the estimators Õ in equations 3.7 and 3.8 are ratio estimators, that is to say they are ratios of sums, and as such are slightly biased. A way of removing this bias is to use double-jackknife bias-corrected estimators, which are described in appendix D.
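To make the reweighting concrete, here is a minimal sketch of the estimator 3.8 (Python; the names are our own, and P_xc is assumed to hold the measured multicanonical macrostate probabilities over the energies E):

```python
import numpy as np

def reweighted_average(E, P_xc, eta_xc, beta, beta_hat, O_hat):
    """Estimator O-tilde_beta-hat of equation 3.8: the canonical average of
    O_hat(E) at inverse temperature beta_hat, from a multicanonical run at beta."""
    log_w = (beta - beta_hat) * E - eta_xc
    w = P_xc * np.exp(log_w - log_w.max())   # shift the exponent for stability
    return np.sum(O_hat(E) * w) / np.sum(w)

# For example, the internal energy at beta_hat:
#   U_hat = reweighted_average(E, P_xc, eta_xc, 0.55, 0.50, lambda e: e)
```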
One other operator that we shall consider explicitly is the one that allows the measurement of free energy differences between β and β̂. This is O_G = exp[(β − β̂)E]: it follows straightforwardly from the definition of the Boltzmann distribution that

    < O_G >_can = Z(β̂)/Z(β)
                = exp[βG(β) − β̂G(β̂)]

Substitution of this operator into equation 3.7 and consideration of the numerator and denominator reveals that an accurate estimator will be obtained provided that the peaks of P^can_β(E) and P^can_β̂(E) are both in the multicanonical region. This operator is of interest because there are many systems, such as fluids, for which we cannot use equation 3.1 to calculate absolute free energies because Ω(E) increases without limit. However, we can still use < O_G > to estimate free energy differences, and if β̂ is such that β̂G(β̂) is known exactly, or is calculable to high accuracy by some approximation scheme (e.g. the virial expansion), then the absolute free energy at β can be obtained this way. Indeed, the formula 3.1 can be seen as the β̂ = 0 (infinite temperature) limit of < O_G >, using our knowledge that lim_{β̂→0} β̂G(β̂) = −ln Ω_TOT, provided Ω_TOT is finite.
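Explicitly, since Z(0) = Ω_TOT, this limit is just

```latex
\langle O_G \rangle_{\mathrm{can}}
  = \frac{Z(\hat\beta)}{Z(\beta)}
  \;\longrightarrow\;
  \frac{Z(0)}{Z(\beta)}
  = \frac{\Omega_{TOT}}{Z(\beta)}
  \qquad (\hat\beta \to 0),
```

which is precisely the right-hand side of equation 3.1.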
As will be observed, for operators that lead to free energies or free energy differences, there are two widely separated regions of E-macrostate space that make important contributions, and the rest of the multicanonical distribution does not contribute directly, but is important only in so far as it permits tunnelling between them (necessary to find the relative weights of numerator and denominator in equations 3.7 and 3.8). Since in the multicanonical distribution every macrostate is equally probable, one would expect that at each Monte-Carlo step the probability of moving to a higher macrostate would be about the same as the probability of moving to a lower one. This suggests that the simulation should perform a random walk through macrostate space, and therefore that the tunnelling time τ_e ∼ V² (the total number of macrostates N_m ∼ V, the volume of the system, and the separation of the peaks of P^can(E) and Ω(E) scales in this way too). This is very similar to the way that the multicanonical method reduces the tunnelling time τ_e between the two phases in a simulation of a first-order phase transition (see section 1.2.3); τ_e depends only on a power of the system size instead of increasing exponentially with it (see equation 1.26). Indeed, when the multicanonical ensemble was introduced [58] this aspect of the algorithm's performance was particularly emphasised: τ_e ∼ V^2.35 in [58]. However, it is not obvious that the flat multicanonical distribution is necessarily optimal, and we shall return to the question of exactly how much weight should be put in the region between the peaks in section 3.4.3.
We should note that, while many of the applications of the multicanonical ensemble have been to first-order phase transitions [58, 101, 103, 104, 115, 116, 120], the measurement of absolute free energies by using knowledge of Ω_TOT is referred to only in [120], where it is used in a calculation of S(E) = (E − F(E))/T. The overall normalisation is also used in calculations of the degeneracy of the ground state of spin glasses [108, 109]. The difference in approach in the phase transition problem arises because most authors use the multicanonical method in the form where the free energy difference between two phases is measured by tunnelling through the interfacial region. Absolute free energies are not required to do this, provided that a way can be found to connect the two phases directly. Indeed, at coexistence the free energy difference is zero, and so all that is required is to reconstruct P^can and show that the sums over the two phases are equal. This method is not appropriate for energy preweighting of the 2d Ising model, because P^can(E) never develops a double-peak structure; however, it is appropriate for magnetisation preweighting.
Finally let us return to the question of what η's we may regard as multicanonical. First, we repeat that the required set η^xc is unknown a priori; to produce a perfectly flat sampled distribution we would need η^xc(E) = βF(E) = −ln P^can(E), where direct measurement of P^can(E) gives us an estimate that is at first indistinguishable from zero for most E-states. Thus, to produce the multicanonical distribution implies that we need at least a partial knowledge of the very quantities we wish to measure. In practice, then, we shall never be able to
use a perfectly multicanonical distribution, but only an approximately flat distribution. In fact, all the advantages of the `ideal' multicanonical distribution remain as long as the distribution is `approximately flat,' so that we obtain good sampling in all states: while it is the case that sampling from a distribution that is only roughly flat will lead to larger expected errors than sampling from a completely flat one, the difference in the expected error bar is only a few percent between a completely flat distribution and one where we impose only that P^xc(E) = O(P^xc(E′)) ∀ E, E′ in the range of interest. We shall therefore regard such sampled distributions as multicanonical too. Where necessary, we shall use the notation η̄(E) for the `ideal' multicanonical distribution in which every macrostate has exactly equal probability, to distinguish it from η^xc(E), which implies only one of many possible sampled distributions that are close enough to multicanonical to be used as such in practice. The condition on P^xc demands that η^xc(E) should differ from η̄(E) by terms of order unity, i.e. a constant absolute error. But we know that (at least away from criticality) η̄ = βF ∼ βfL^d, so the fractional accuracy with which η̄ must be known to produce a multicanonical distribution increases with increasing system size. Moreover, the set η is not fixed absolutely even by the requirement that it produce a particular sampled distribution; if η₁ gives a multicanonical distribution (or indeed any other sampled distribution) for a particular β, then so does η₂ where η₁(E) = η₂(E) + k ∀ E, k constant. We shall adopt the convention that k is to be chosen such that min_E(η(E)) = 0. Indeed, there is even less restriction than this; it will be noted that the parts of E-space that dominate the estimator of exp(β̂E) in equation 3.8 are independent of the temperature of the multicanonical simulation, β. This shows that β can be chosen more or less at will; if we have a multicanonical distribution produced by coefficients η(β, E) for one temperature, then the same sampled distribution would be produced by η(β′, E) = (β′ − β)E + η(β, E) at temperature β′. It would perhaps be simplest in practice to choose β = 0, though this is not what has generally been done in this thesis. We shall consider various iterative procedures for generating a suitable η^xc(E) in section 3.2.
3.1.2 An Alternative: The Ground State Method
Aside from the use of equation 3.1, there is another way that a multicanonical simulation can give access to absolute free energy: by enabling us to measure the canonical probability of a macrostate that contains a known number of microstates, such as the ground state E₀ (which is two-fold degenerate in the case of the 2d Ising model). We first calculate
    P^can(E) = P^xc(E) exp[−η^xc(E)] / Σ_E P^xc(E) exp[−η^xc(E)]    (3.9)
and then use P^can(E₀) alone to determine the free energy:

    βG = −ln Z = βE₀ + ln P^can(E₀) − ln Ω₀

where Ω₀ is the ground state degeneracy. Thus we need to know P^can(E) both at E₀ and also for those macrostates near its maximum, which will dominate the normalisation (the denominator of equation 3.9). This implies that we require the multicanonical distribution to overlap completely with the region around < E >_β and also to extend down to the ground state. This is in contrast to the previous method, where the multicanonical distribution had to extend upwards from the region around < E >_β to overlap with the maximum of exp(βE)P^can(E) ∝ Ω(E). As before, free energies at other temperatures β̂ may be estimated provided that the multicanonical distribution extends to cover the peak of P^can_β̂(E).
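A minimal sketch of this route to the free energy (Python; the names are ours, Ω₀ = 2 for the 2d Ising model, and the histogram C_xc is assumed non-zero on every macrostate between the ground state and the canonical peak):

```python
import numpy as np

def beta_G_ground_state(E, C_xc, eta_xc, beta, Omega0=2.0):
    """Ground-state method of section 3.1.2: reconstruct P_can(E) via
    equation 3.9, then beta*G = beta*E0 + ln P_can(E0) - ln Omega0."""
    log_p = np.log(C_xc) - eta_xc              # unnormalised ln P_can(E)
    log_p -= np.logaddexp.reduce(log_p)        # normalise so sum_E P_can(E) = 1
    i0 = np.argmin(E)                          # index of the ground-state macrostate
    return beta * E[i0] + log_p[i0] - np.log(Omega0)
```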
Whether this technique or the one of measuring < exp(βE) > is better depends on the algorithm and the temperature(s) of interest; to a first approximation, if < E >_β is near to E₀ (as will be the case for large β, i.e. low temperature) then the ground state method will be better; if it is near to the maximum of P^can(E) exp(βE) (as for high temperature) then the < exp(βE) > method will be better. However, the situation is complicated by the variation of the acceptance ratio over the wide range of E that we are covering; for instance the Metropolis algorithm slows down very dramatically near the ground state of the Ising model (the acceptance ratio decreases like 1/L^d), owing to the difficulty it has in `finding' the spins that must be flipped to `steer' the system into the single microstate of the ground state out of the higher-energy states with their exponentially large number of microstates. However, other algorithms, like the n-fold way [156] and generalisations of it [157], can alleviate this problem. Once again, we shall comment further on this matter in section 3.4.
In the literature, measurement of the ground state probability seems only to have been used to find the unknown ground state degeneracy of spin glasses [108, 109], not to give the overall normalisation, and thus the absolute free energy, in a case where the ground state degeneracy is known.
3.1.3 The Multicanonical Distribution over Magnetisation Macrostates
In the canonical ensemble (we show an external field for generality, though in the Ising case we shall be concerned only with H = 0)

    P^can(M) = exp(βHM) [Σ_{σ} δ(M − M(σ)) exp(−βE(σ))] / Z(β)    (3.10)
             = exp(βHM) exp(−βF(M)) / Z(β)    (3.11)

defining the free energy functional F(M).
We discussed the form of P^can(M) in section 1.1.3. For H = 0, it has two peaks for β > β_c and one (at M = 0) for β < β_c. Except exactly at β_c, these are Gaussian and change shape as L → ∞ in such a way that their width, when expressed in terms of m = M/L^d, becomes vanishingly small. For β > β_c the states around M = 0 correspond to mixed-phase configurations.
By introducing η^xc(M), so that

    P^xc(σ) ∝ exp[−βE(σ) + η^xc(M(σ))]

and

    P^xc(M) ∝ exp(βHM) exp[−βF(M) + η^xc(M)]

we may produce a multicanonical distribution, flat over some range of M values. From measured multicanonical probabilities we can then recover the canonical distribution at a value of the applied external field Ĥ different from the value H prevailing during the simulation, by using

    P^can(M; Ĥ) = P̃^xc(M) exp[β(Ĥ − H)M] exp[−η^xc(M)] / Σ_M P̃^xc(M) exp[β(Ĥ − H)M] exp[−η^xc(M)]    (3.12)
where P̃^xc(M) may be estimated from C^xc(M), the histogram of visited magnetisation states in the multicanonical distribution, or by some other means. This equation may be used to tackle the free energy problem in the same ways as it was in the energy case. If the range of M that is multicanonical embraces those values typical of the two coexisting phases, then we may simulate coexistence directly; a good finite-size estimator of the infinite-volume first-order phase transition occurs where the two peaks of P^can(M; Ĥ) have equal weight, i.e.

    Σ_{M∈A} P^can(M; Ĥ) = Σ_{M∈B} P^can(M; Ĥ)

where A and B are the two phases. It is essential to determine P^can(M) indirectly, via the multicanonical ensemble, for two reasons: first, to allow tunnelling between the peaks, which is necessary to find their relative weight whatever the external field may be; and second, to allow reweighting to different values of Ĥ until we find the one that satisfies the equal-weights criterion. This method has been used to determine the phase coexistence curve of the Lennard-Jones fluid in [121], and we shall use it in section 4.3 of chapter 4. It is not needed for the Ising model, where the location of the coexistence line is determined by symmetry. It also allows accurate measurement of P^can(M) in the interface region, which enables us to measure interfacial tension (see the discussion of mixed states in section 1.1.3 and [58, 102, 101]).
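As an illustration, a sketch of the equal-weights search built on equation 3.12 (Python; the names and the bracket are our own, and we label the two phases by the sign of M, which is appropriate for an Ising-like order parameter):

```python
import numpy as np
from scipy.optimize import brentq

def P_can_in_field(M, P_xc, eta_xc, beta, H, H_hat):
    """Canonical P(M) at field H_hat from a multicanonical run at field H (eq. 3.12)."""
    log_w = beta * (H_hat - H) * M - eta_xc
    w = P_xc * np.exp(log_w - log_w.max())
    return w / w.sum()

def coexistence_field(M, P_xc, eta_xc, beta, H, bracket=(-0.1, 0.1)):
    """Locate the H_hat at which the two peaks of P_can(M; H_hat) carry equal weight."""
    def weight_diff(H_hat):
        p = P_can_in_field(M, P_xc, eta_xc, beta, H, H_hat)
        return p[M > 0].sum() - p[M < 0].sum()
    return brentq(weight_diff, *bracket)      # root of the weight difference
```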
Equation 3.12 can also be used to measure the absolute free energy of a single phase without crossing the interface, if this is difficult for some reason (for example because, for an off-lattice system, it would necessitate growing a crystal out of a fluid). In the Ising case the absolute free energy G is most easily found by producing a multicanonical distribution that extends all the way to the fully saturated states at M = ±L^d. This has been done, for the first time to our knowledge, in section 3.3.3. It enables us to obtain another, very accurate, estimate of G by the method described in section 3.1.2. We also use this distribution to study the scaling of P^can(M) at large M. The sampled distribution used in section 3.3.3 in fact extends over all magnetisation states, so that G for the entire system is obtained; however, the free energy of just one phase would be obtained if M were restricted to embrace just the values characteristic of the phase. This approach would be less useful for off-lattice systems (for example) where there is not a single saturated state at very large values of the order parameter (volume), while at very small values there is the close-packed crystal, which has infinitesimal canonical probability. For them a method analogous to the use of equation 3.8 in section 3.1.1 could be used. We shall keep within the Ising context to describe this method. By substituting into equation 3.11, it is easy to show that

    < exp[βM(Ĥ − H)] >_can = exp[−β(G(β, Ĥ) − G(β, H))]
(If the range of M is restricted, but there is appreciable canonical probability outside it, then G(β, H) should be replaced by G_A(β, H), the free energy of the phase.) Just as in section 3.1.1, to estimate this accurately we require the sampled distribution to overlap both P^can(M; H) and P^can(M; Ĥ), and so must use the multicanonical estimator 3.12. If the state at Ĥ is such that its free energy is known exactly or to a good approximation (e.g. using the virial expansion in the case of a dilute gas), then the absolute free energy at H follows.
3.2 Techniques for Obtaining and Using the Multicanonical Ensemble

    To travel hopefully is a better thing than to arrive.
    ROBERT LOUIS STEVENSON, from El Dorado
In this section we shall be concerned with techniques relating to the multicanonical ensemble: we shall discuss various iterative processes for the generation of the coefficients η^xc (we use the vector notation η for succinctness, and because what we are about to say applies to both η(E) and η(M)), and we shall also describe a method that may make implementation of the method efficient on parallel computers. We have devoted particular attention to the development of quick, efficient and reliable methods for finding a usable η^xc, because the absence of such methods seems to have been the principal obstacle [47, 46] to the wider application of previous non-Boltzmann sampling methods such as umbrella sampling ([95], section 2.2.1). We present and discuss the results we have obtained from simulations using the multicanonical ensemble in section 3.3.
The usual approach to multicanonical simulation in the literature [58, 103, 108, 111, 120] has been to divide the application of the method into two parts: the finding of an approximately multicanonical distribution, which is done as fast as possible, followed by a lengthy `production run,' in which a much longer Markov chain is generated without changing the sampled distribution. Only the results of the `production run' are then used in equation 3.7 or its equivalents to estimate the quantities of interest, with error bars coming from (jackknife) blocking. We, too, shall divide the tasks up like this (the results of section 3.3 come from simulations implemented in this way), though at the end of this introductory section we shall discuss further why, and indeed whether, this division is really necessary.
First, then, let us discuss the generation of η^xc. In a real problem the sampled distribution never will be perfectly flat on macrostate space, because this would imply exact knowledge of the probabilities P^can ∝ exp(−βF), which, as this expression shows, are dependent on the very free energies that we are trying to measure. To produce the multicanonical distribution and to measure P^can (or F) are therefore the same problem, and to solve it requires an iterative procedure. We begin with some initial guesses η (which may well be η = 0, corresponding to a Boltzmann sampling algorithm), and generate a short Markov chain. Generally, this will sample only a small fraction of the macrostates we are interested in. In the sampled region we can make inferences about the underlying sampled distribution from the data. We then use these inferences to generate a new sampled distribution, which will be approximately multicanonical in the sampled region, while outside we increase the sampling probability so that the next data histogram will be a little wider. Then we repeat the process, hopefully getting closer and closer to the multicanonical distribution.
Let us formalise this a little. We wish to find η^xc approximating to η̄ for the N_m macrostates of a system. We shall denote the iteration number by a superscript n and the ith macrostate by a subscript i, i = 1…N_m, e.g. C_i^n for the nth histogram of visited states (very few expressions contain anything `raised-to-a-power,' so this is seldom ambiguous). The macrostates could be of energy or magnetisation. We are going to make inferences about P^n, the (unknown) `true' macrostate probabilities in the nth sampled distribution, generated by η^n, on the basis of `data' gathered by a Monte Carlo (MC) procedure constructed to sample from P^n. The data do not determine P^n exactly, because of the effect of noise and because, at least at first, many states are not sampled at all. The best way to treat this problem, which implicitly handles the problem of distinguishing the `signal,' due to P^n, from the `noise,' is to use Bayesian Probability Theory [14, 158, 159, 160, 161], where probabilities describe our state of knowledge about quantities, so that constant but unknown parameters like P^n may be assigned probabilities. In this case, what we obtain from the data is P(P^n), a probability density function of P^n. In the `frequentist' interpretation of probability theory, where probabilities have meaning only in so far as they express the expected frequency in a long series of trials, an expression like P(P^n) is not admissible. However, it is now fairly well established [14] that the Bayesian and frequentist formulations make almost exactly the same predictions where both are applicable,
while there are many situations (and, as we shall see, this is one of them) where the Bayesian interpretation is the more powerful.
To return to the main thrust of the argument: P(P^n) is determined by the data according to Bayes' Theorem [160], which is, after the nth set of data has been gathered,

    P(P^n | H, D¹…D^n) = P(P^n | H, D¹…D^{n−1}) P(D^n | H, P^n, D¹…D^{n−1}) / ∫ P(P^n | H, D¹…D^{n−1}) P(D^n | P^n, H, D¹…D^{n−1}) d^{N_m}P^n    (3.13)

Here

H represents the knowledge, as expressed by equation 3.4 and its magnetic analogue, of how η^n is related to P^n.

D^n represents the `data', which consist of either the visited states or recorded transitions of the Markov chain and the set η^n that produced them.

P(P^n | H, D¹…D^{n−1}) is the prior probability distribution of P^n before the data D^n have been considered.

P(D^n | P^n, H, D¹…D^{n−1}) is the likelihood function, the probability that the observed data are produced given that a particular set of values of the parameters P^n holds. To calculate the likelihood we must generally assume a model.

P(P^n | H, D¹…D^n) is the posterior probability distribution of the parameters P^n including the effect of the data D^n.
From the posterior p.d.f. we generate estimators P̃^n of the true P^n. The mean (though it may not in practice be calculable) is one obvious possible estimator; others are the mode and median. The width of the p.d.f. gives us a measure of the uncertainty in the estimator. We expect this width to be of the order of 1/√C_i, because the stochastic nature of MC sampling produces fluctuations in the histogram C^n of size O(√C_i), which cannot be distinguished from fluctuations of the same size due to the true structure of the sampled distribution.

We then use the estimator, however defined, to generate the next sampled distribution. The obvious way to do this (though as we shall see there may be better alternatives) is to put

    η_i^{n+1} = η_i^n − ln P̃_i^n + k    (3.14)

where k is an arbitrary constant, which we choose so that the minimum of η is zero. This corresponds to sampling from a distribution P^{n+1}, obtained by setting
    P_i^{n+1} ∝ P_i^n / P̃_i^n    (3.15)

P^{n+1} would indeed be exactly multicanonical if P̃^n = P^n. In practice, we never reach this situation because of the random errors in the measurements, but we can expect P^{n+1} to be closer to the multicanonical distribution than was P^n; it is shown in [51] that this algorithm converges almost surely.
The reader may be wondering why we have written Bayes' theorem for the sampled distribution P^n, when P^can is our real interest. We do this because it is P^n that determines the data D^n, so that the likelihood is `naturally' expressed as a function of P^n (as we shall see in section 3.2.1). It is of course possible to write an expression relating P(P^can) to P(P^n): just as in one dimension we would write

    P(x) = P(y(x)) |dy/dx|

so here we can write

    P(P^can | H, D¹…D^n) = P(P^n | H, D¹…D^n) mod |∂P_i^n / ∂P_j^can|    (3.16)
where

    P_i^n(P^can) = P_i^can exp(η_i^n) / Σ_{k=1}^{N_m} P_k^can exp(η_k^n)
However, the fact that all the P_k^can enter the expression for each P_i^n, most of them in the denominator, makes the transformation 3.16 algebraically very complex, even if we make simplifying approximations like using a uniform prior and a simple model of the likelihood function (for example one neglecting correlations). Indeed, given that N_m, the dimensionality of the problem, is usually at least O(100), the RHS of equation 3.16 cannot be handled either analytically or numerically, and so the transformation cannot be performed exactly (though see section 3.2.2 below). Certainly it is impossible to integrate the function to find expectation values of P^can, etc. Instead we are forced to make our inferences about the sampled distribution P^n, obtaining estimators P̃^n, and then to use

    P̃_i^can(P̃^n) = P̃_i^n exp(−η_i^n) / Σ_{k=1}^{N_m} P̃_k^n exp(−η_k^n)
which is the N_m-dimensional analogue of using x(< y >) as an estimator of < x >.
Moreover, the same difficulty in making a transformation from one set of P's to another affects us even if we confine ourselves to inferences about P^n. On the first iteration (n = 1) we may have no information about the sampled distribution (which is P^can in this case). Therefore it is appropriate to choose a uniform prior, P(P^can | H) = constant, in which case the posterior will depend only on the likelihood. On every subsequent iteration, however, we have prior information about P^n which comes from the posterior of the previous sampled distribution P^{n−1}. But to get between P(P^{n−1}) and P(P^n) involves just the same transformation as equation 3.16; in terms of the variables P^{n−1} and P^n it is

    P(P^n | H, D¹…D^{n−1}) = P(P^{n−1} | H, D¹…D^{n−1}) mod |∂P_i^{n−1} / ∂P_j^n|    (3.17)

where

    P_i^{n−1}(P^n) = P̃_i^{n−1} P_i^n / Σ_{k=1}^{N_m} P̃_k^{n−1} P_k^n

Once again the algebraic complexity of this expression prevents its being calculated, and we seem to find ourselves forced to use a uniform prior on P^n at each stage. We thereby discard much of the information from the previous iterations, which influence the current P(P^n) only through the choice of P^n itself (through η^n). With a uniform prior at each stage, Bayes' Theorem as given in equation 3.13 reduces to

    P(P^n | H, D^n) ∝ P(D^n | H, P^n)    (3.18)

(Note that no approximation is involved in rewriting the likelihood function just as P(D^n | H, P^n): D¹…D^{n−1} are irrelevant to D^n given P^n.)
What disadvantages result from having to dispense with the informative prior? At least initially (n small) the posterior changes rapidly with n and the new likelihood will be much narrower than the prior over most of the macrostate space (as we start to sample regions we previously had to guess about). In this case it makes little difference to approximate the prior as uniform and base the inference only on the likelihood function. However, as we converge towards the multicanonical distribution, the sampled distribution changes little between iterations. Thus, if we keep N_c constant, the prior is as narrow as or narrower than the likelihood,
so we throw away a lot of information by disregarding it, and convergence stops once the difference between the sampled distribution and the `true' multicanonical one has come down to the order of the random fluctuations in the likelihood, which are inevitably incurred in the simulation process. Indeed, if N_c is too small a large fluctuation may throw us a long way away from P^xc. Recently [122] an ad hoc refinement of this method has been proposed, which uses the histograms of all previous iterations, each contributing with a weight inversely dependent on the size of local fluctuations. The easiest way to skirt this problem, though, is by increasing the length N_c^n of the Markov chain generated at each iteration of the convergence process, at first in order to compensate for the increasing number of sampled macrostates, and later to keep smooth convergence and to minimise the effect of the waste of previous information. The eventual move to the `production run' can be seen as the final limit of this. On the other hand, it is clearly true that increasing N_c a lot when we are still some way from a multicanonical distribution wastes computer time that we would like to devote to the production run. There is thus a scheduling problem of deciding on a suitable initial N_c, deciding when and by how much to increase it later, and finally deciding on when to move to the `production run'. This is usually found to need some initial experimenting, and even when a scheme is found that does seem to converge smoothly a certain amount of human monitoring is required, though quite a lot of progress has been made [122] on automating the procedure.
By using the Bayesian framework we have started to set up above, we have made some significant progress in incorporating prior information to stabilise the algorithm and in putting the ad hoc `finding' methods that are employed on a firmer footing. This will be described in sections 3.2.1 and 3.2.2 in the context of inferences made using the observed visited macrostates of the chain as data. However, with this visited-states method, it may be the case that the `finding' stage inescapably gets increasingly lengthy, for example for large system sizes, so that it consumes a large part of the total available computer time. This problem can only really be alleviated by the choice of a more efficient `finding' algorithm, and we have also contributed here through the development of a method, to our knowledge new, that converges very rapidly to something close to the multicanonical distribution by making inferences based on the observed transitions between macrostates made by the system (section 3.2.3).
Before embarking on a description of these techniques, we shall return briefly to the question of whether the division between `finding' and `using' the η's is really necessary. This somewhat inelegant strategy is, in fact, forced upon us by the failure of P̃^n to be a good estimator of P^n outside the sampled region, and by the algebraic difficulties we have in handling expressions like equations 3.16 or 3.17. If these expressions were tractable, and could be integrated, then the whole simulation would become a single continuous process of narrowing P(P^can). Then we would finally need some way of transforming the p.d.f. of P^can into a p.d.f. of the required estimators < O > = Σ_i O_i P_i^can, from which we could find a mean and standard error (the standard error is particularly problematic as it requires treatment of the correlations between the estimates of the components of P^can). The methods we describe in section 3.2.1 go some way towards a solution of the first problem, but we do not really tackle the second, though other recent work has addressed closely related matters [116, 119]. Significant further work is required before the division between the `finding' stage and the `production run' can be removed.
3.2.1 Methods Using Visited States
In this section we will consider inferences made from what is perhaps the `obvious' choice of `data,' the visited macrostates of the Markov chain (we shall thus call this the visited-states, or VS, method). Suppose that N_c configurations are sampled from the Markov chain, with sampling occurring at wide intervals so that successive configurations are effectively uncorrelated. The likelihood function for the data, the histogram of visited states C_i^n, will then be multinomial (multivariate binomial) with N_m − 1 independent variables P^n (the N_m th is determined by the normalisation). If we keep the assumption of a uniform prior, then by Bayes' Theorem (in the form of equation 3.18)

    P(P^n | H, D¹…D^n) = Π_{i=1}^{N_m−1} (P_i^n)^{C_i^n} (1 − Σ_{k=1}^{N_m−1} P_k^n)^{C_{N_m}^n} / ∫_R Π_{i=1}^{N_m−1} (P_i^n)^{C_i^n} (1 − Σ_{k=1}^{N_m−1} P_k^n)^{C_{N_m}^n} d^{N_m−1}P^n    (3.19)

subject to Σ_i C_i = N_c, and where the domain of integration R is such that Σ_{i=1}^{N_m−1} P_i^n ≤ 1. The combinatoric factors that would be present in the multinomial likelihood have disappeared in the normalisation.
Several possible estimators of P^n can be used. The simplest is the maximum likelihood estimator (MLE), the set of values of P^n most likely to have produced the given data. It is defined by the equations

    ∂P(D^n | P^n)/∂P_i^n |_{P^n = P̃^n_MLE} = 0

These are easily solved for the multinomial distribution, using a Lagrange multiplier for the normalisation; the result is the intuitively obvious one P̃^n_MLE = C^n/N_c, leading to

    η^{n+1} = η^n − ln C^n + k
This updating scheme is used in [98], [120]; the frequentist interpretation of probability used in these references leads naturally to maximum likelihood estimators (the unknown parameters can only be considered as having one `true' value, which is most naturally taken to be the one which is most likely given the data). Clearly the scheme does not work where C_i^n = 0, which happens extremely frequently in the early iterations (there are many macrostates that the Boltzmann sampling algorithm does not visit). For these states it would imply η_i^{n+1} = ∞. We could try to fix this by defining a norm ‖η^{n+1} − η^n‖ and setting a bound on the magnitude that it may have (suggested in [51]), or by the method, adopted by Lee [120], of leaving η_i unaltered (except for the arbitrary constant k) if C_i = 0, i.e. behaving as if C_i = 1 in this case.
However, with the multinomial approximation for the likelihood and using Bayesian inference, it is possible to do better than this: we can evaluate the expectation values P̃^n_AV = < P^n >:

    P̃^n_{AV,j} = ∫_R P_j^n Π_{i=1}^{N_m−1} (P_i^n)^{C_i^n} (1 − Σ_{k=1}^{N_m−1} P_k^n)^{C_{N_m}^n} d^{N_m−1}P^n / ∫_R Π_{i=1}^{N_m−1} (P_i^n)^{C_i^n} (1 − Σ_{k=1}^{N_m−1} P_k^n)^{C_{N_m}^n} d^{N_m−1}P^n    (3.20)

                = (C_j + 1)/(N_c + N_m)    (3.21)
which leads to η_i^{n+1} = η_i^n − ln(C_i^n + 1) + k; we note that for C_i^n = 0 this gives the same updating scheme as was introduced arbitrarily by Lee. For other states it gives slightly different estimators, but the difference is negligible if C_i^n is large. The effect of this updating, whether using AV or (appropriately fixed) MLE estimators, is to decrease the probability in the sampled region (by a factor of C_i + 1) and thus, since the probabilities must add to one, to increase it by a uniform factor in the non-sampled region. However, the true P^n almost certainly decreases, at least for a while, as we move away from the sampled region, though we do not know purely from C^n whether it decreases monotonically or has other local maxima elsewhere. The extent of this decrease is generally many orders of magnitude, so in the non-sampled region the true value of P^n is much lower than the estimate P̃^n_{AV,j}, except at the very edge of C^n, since P̃^n_{AV,j} effectively assigns one count to non-sampled states. The result of this is that on the next iteration the
sampled region widens slightly, becoming roughly multicanonical (flat) over that part that was sampled before, and extending a little further into the wings. The (n+1)th histogram tends to have large fluctuations in the states that were at the edge of the nth, because the poor statistics at the edges of C^n tend to produce an η which is inaccurate here. However, these get smoothed away on subsequent iterations. The convergence is fairly smooth (as long as we increase N_c to compensate for the increasing number of sampled macrostates).
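In code, one pass of this `naive' updating is only a few lines (a sketch; the function name is ours):

```python
import numpy as np

def update_eta_naive(eta, C):
    """Visited-states update using the AV estimator (equations 3.14 and 3.21):
    eta_new_i = eta_i - ln(C_i + 1) + k, the constant parts of equation 3.21
    being absorbed into k, which is chosen so that min(eta_new) = 0."""
    eta_new = eta - np.log(C + 1.0)
    return eta_new - eta_new.min()
```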
We should contrast this behaviour with that expected if the true value of P^n in the wings is larger than the estimate. We have found that this is likely to happen if, despite having very little evidence about P^n far away from the sampled region, we attempt to get faster convergence by fitting some function to the part of η^{n+1} that comes from the sampled region and extrapolating it. In that case P^{n+1} may put a great deal, possibly almost all, of its weight in the wings, and C^{n+1} may become separated from C^n. Convergence then becomes irregular and awkward, with the latest η frequently needing to be discarded and a return made to earlier ones (though linear extrapolations seem to be relatively safe in this regard). We shall discuss this further at the start of section 3.2.3.
3.2.2 Incorporating Prior Information
Let us return to the idea of incorporating the prior p.d.f. As we discussed above, using a uniform prior for P^n at each stage does not accurately reflect the confidence we have in P^n as a result of the earlier iterations; indeed we do not have any real measure of the error in P^n or η^n. We could evaluate the variance of P^{n+1} using an expression similar to equation 3.21, but since this implicitly neglects correlations, which affect the variance much more than they do the mean, it is unlikely to be accurate.

We have tried to build in at least a measure of the `ideal' iterative scheme in which P(P^can) is narrowed continually. To do this we have avoided the difficulties of changing variables (see equation 3.16 et seq.) by using a different data-driven (but still Bayesian) strategy to estimate the p.d.f. of the transformed variable of interest.

Rather than looking at P^n, or even P^can, let us consider P(η̄) (we shall justify this choice below), where η̄, introduced in section 3.1.1, is that set of preweighting coefficients that would give an exactly multicanonical distribution:

    P ∝ exp(−βF + η̄) = constant
i.e.

    η̄ = βF + constant = −ln(P^can) + constant

We do not use the notation η^xc, since this also embraces all the `nearly multicanonical' distributions.
Bayes' Theorem now becomes

    P(η̄ | H, D¹…D^n) = P(η̄ | H, D¹…D^{n−1}) P(D^n | H, η̄, D¹…D^{n−1}) / ∫ P(η̄ | H, D¹…D^{n−1}) P(D^n | η̄, H, D¹…D^{n−1}) d^{N_m}η̄    (3.22)

We now choose to model the p.d.f. of η̄ as a multivariate Normal distribution with the covariance terms set to zero¹, i.e., after the collection of data D^n,

    P(η̄_i | H, D¹…D^n) ∝ exp[−(η̄_i − η_i^{n+1})² / (2(σ_i^{n+1})²)]
It is our desire to use a Normal model that has led us, on phenomenological grounds, to consider η̄ rather than P^can; the latter will have an asymmetric p.d.f. (because of the constraint P_i^can > 0 ∀i) which will not be well approximated by a Gaussian. The same applies to P^n. We expect that P(η̄) will be better behaved. We have (n+1) superscripts on the parameters η^{n+1} and σ^{n+1} because η^{n+1}, being the mean of P(η̄ | H, D¹…D^n), is the set of weight factors that we shall use to generate the (n+1)th sampled distribution.
This parameterisation takes care of the prior and posterior in equation 3.22; however the likelihood function still remains. We cannot transform a multinomial likelihood (expressed naturally in terms of P^n) without encountering the difficulties of equation 3.16. To proceed, then, we estimate the p.d.f. of η̄ by jackknife blocking the data histogram at each iteration, finding the expectation value of P^n for each block, then transforming these expectation values to give a series of estimators of the variable of interest, η̄, whose distribution outlines the shape of P(η̄).

¹ This is an ad hoc choice made to render the problem computationally more tractable; it will cause us some difficulties below.
To see in detail how this works, we first note that

    P(D^n | H, η̄, D¹…D^{n−1}) = P(D^n | H, η̄)

because (as we stated when discussing equation 3.18) D¹…D^{n−1} are irrelevant to D^n given η̄. Thus, we can apply Bayes' theorem once more:

    P(η̄ | H, D^n) = P(D^n | H, η̄) P(η̄ | H) / P(D^n | H)

Evidently we may take P(η̄ | H) to be uniform, which gives

    P(D^n | H, η̄) ∝ P(η̄ | H, D^n)

We model P(η̄ | H, D^n) by another Normal distribution, parameterised by η̂^n and σ̂^n. We estimate the parameters simply by jackknife blocking the recorded histogram into m = 1…N_J, N_J = O(10) pieces, generating an η̄^{n,m} from each block, and measuring their mean η̂^n and variance (σ̂^n)². Thus we avoid the change-of-variable problem.

Putting all this into equation 3.22, we arrive at

    exp[−(η̄_i − η_i^{n+1})² / (2(σ_i^{n+1})²)] ∝ exp[−(η̄_i − η_i^n)² / (2(σ_i^n)²)] exp[−(η̄_i − η̂_i^n)² / (2(σ̂_i^n)²)]

which implies

    (σ_i^{n+1})⁻² = (σ_i^n)⁻² + (σ̂_i^n)⁻²    (3.23)

and

    η_i^{n+1} = (σ_i^{n+1})² [(σ_i^n)⁻² η_i^n + (σ̂_i^n)⁻² η̂_i^n]    (3.24)

Thus, η^{n+1} is an average of its previous value, η^n, and the simple estimate of its new value obtained from equation 3.14, η̂^n, with the two combined according to their estimated variance. The variance itself is always reduced, as we would expect since we are adding new information. This is a more systematic way of trying to build in the results of previous iterations than one based simply on the magnitude of C_i^n, as is given in [122], and it is likely to be more accurate because the effect of correlations in the sampling process is implicitly included. Note, moreover,
that the basic idea (that of creating a `Monte-Carlo sample' from the unknown p.d.f.) is not bound to any particular algorithm for sampling P^n or to any model for the likelihood, so we can apply it to other methods too.

Having said this, we have also found that there are several caveats attached to its use in practice, as we shall now describe. First, in order to bootstrap the technique we assume σ¹ is large, so that η² depends only on D¹. We also have to be careful with our policy of having the normalisation min(η) = 0. This normalisation should not be applied to each η̄^{n,m} individually, since if min(η̄^{n,m}) falls at the same macrostate each time, we would otherwise estimate a variance of zero for this state. However it is best to set min(η̂^n) to zero before combining it with η^n, so that over those regions that both sampled, the two η's will be approximately equal. This reduces the effect of fluctuations in σ̂^n (see below).

Moreover, the estimate of σ̂^n is itself subject to fairly large random errors, which need careful treatment or they will spoil the estimator. Suppose that over some range of states η̂^n and η^n are separated fairly widely. Then, as we see from equation 3.24, random fluctuations in σ̂^n (and indeed in σ^n) will pull the estimator η^{n+1} back and forth between η^n and η̂^n. We can thus end up with an η^{n+1} that is far less smooth than either η^n or η̂^n. This is presumably a consequence of neglecting the covariance terms in the Normal model of the p.d.f. of η̄; including them would serve to force some smoothness on the function as a whole. However, to do this would make the updating process much more complicated and time-consuming, because equations 3.24 and 3.23 would become matrix equations involving matrix inversion. Therefore in practice we have adopted an ad hoc solution to the problem. First, we smooth σ̂^n by locally averaging it. This is found to alleviate the problem in the region where both the (n−1)th and nth iterations produced counts. However, we must also treat those regions where neither iteration produced counts, and where the (n−1)th iteration did not but the nth did. At first, we tried simply assuming some arbitrary large variance in unsampled states (given that we adopt the technique of averaging the η̄^{n,m} without setting the minimum to zero, we produce an estimated σ̂_i^n of zero in the unsampled states, which clearly needs some fixing). This was found to work well in the `newly sampled' region, where the (n−1)th iteration does not produce counts but the nth does, producing an η^{n+1} which depends almost entirely on η̂^n, as we would desire. However, in the region which is still unsampled, this scheme leads to η^{n+1} = (η^n + η̂^n)/2, since σ_i^n = σ̂_i^n. We thus found that η^{n+1} tended to increase up to the edge of C^n, and then fall back to a lower value again. This severely slowed down the desired spreading of the sampled region. The problem was solved by recording the location of the edge of C^n at each stage, and putting η^{n+1} = η̂^n in both the newly-sampled and unsampled regions (i.e. anywhere past the edge of C^{n−1}).

Although it is found that using the average over m of η̄^{n,m} as η̂^n gives perfectly adequate convergence, it is found that, with the same total time devoted to each iteration, it spreads into the non-sampled region a little more slowly than does the `naive' scheme of simply using a uniform prior at each stage and updating with equations 3.14 and 3.21 alone. This is because min(C_i + 1) = 1 is a larger fraction of the total counts in one jackknifed histogram than it is of the total counts in the single histogram of the naive method: C^{n,m} contains (N_J − 1) N_AV counts while C^n contains N′_AV = N_J N_AV. The maximum change in η that may occur is therefore a little larger in the naive case. To achieve the same spreading rate we could spend N_J/(N_J − 1) more time on the method with prior, but in fact we choose to use η̂^n ≡ η̄^{n,0}, which is the estimator defined on all the pooled data from the nth iteration (and thus identical to the naive estimator). The effect of the use of the prior is thus only on the way that η̂^n is combined with previous estimators of the weighting function.
To summarise, then, the method as we have applied it is:

0. Start with η¹ = 0, and estimate σ¹ to be some large constant.

1. Record m = 1…N_J = O(10) histograms C_i^{n,m}.

2. For each one, set η̄_i^{n,m} = η_i^n − ln(C_i^{n,m} + 1).

3. Calculate σ̂^n from the η̄^{n,m}.

4. Calculate η̂^n ≡ η̄^{n,0}, defined on all the data from the nth iteration.

5. Calculate η^{n+1} and σ^{n+1} using equations 3.24 and 3.23, with the caveats mentioned above (use only η̂^n in the newly-sampled and still-unsampled regions).

6. If the distribution does not extend over all the macrostates of interest, then return to 1; otherwise stop (see the sketch below).
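As a concrete illustration of steps 1-5, here is a minimal sketch of one pass of the loop (Python; the helper run_blocks, the array shapes, the five-state smoothing window and the variance floor are our own assumptions rather than part of the original scheme):

```python
import numpy as np

def vs_iteration(eta, sigma, sampled_before, run_blocks, NJ=6):
    """One iteration of the visited-states scheme with the Normal model.
    run_blocks(eta, NJ) is assumed to return an (NJ, Nm) array of sub-histograms
    recorded while sampling with weights exp(-beta*E + eta)."""
    C = run_blocks(eta, NJ)
    C_tot = C.sum(axis=0)

    # Steps 2-3: leave-one-out (jackknife) estimates eta^{n,m} and their spread
    etabar = eta - np.log(C_tot - C + 1.0)
    dev = etabar - etabar.mean(axis=0)
    sigma_hat = np.sqrt((NJ - 1.0) / NJ * (dev ** 2).sum(axis=0))
    sigma_hat = np.convolve(sigma_hat, np.ones(5) / 5.0, mode="same")  # local smoothing
    sigma_hat = np.maximum(sigma_hat, 1e-12)   # guard against an estimated variance of zero

    # Step 4: pooled estimator eta-hat^n = eta-bar^{n,0}, normalised to min = 0
    eta_hat = eta - np.log(C_tot + 1.0)
    eta_hat -= eta_hat.min()

    # Step 5: combine with the prior according to equations 3.23 and 3.24
    var_new = 1.0 / (sigma ** -2.0 + sigma_hat ** -2.0)
    eta_new = var_new * (eta * sigma ** -2.0 + eta_hat * sigma_hat ** -2.0)

    # Past the edge of the previously sampled region, trust eta_hat alone
    eta_new[~sampled_before] = eta_hat[~sampled_before]
    return eta_new - eta_new.min(), np.sqrt(var_new), sampled_before | (C_tot > 0)
```

Step 0 corresponds to calling this with η = 0, a large constant σ and an all-False mask, and step 6 to looping until the mask covers all the macrostates of interest; the edge-of-C^{n−1} bookkeeping described above is compressed here into the single boolean mask.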
To illustrate this iterative scheme we shall examine the energy preweighting of the L = 8 and L = 16 Ising model, starting from a canonical simulation at β = 0.55. We wish to extend the sampled distribution up to the states characteristic of β = 0, to use the method of section 3.1.1 to find G. This is only for purposes of demonstration, since with inverse temperature β = 0.55 the ground state already has significant weight for these small systems, indeed is the most probable macrostate for L = 8, so G(0.55) could be found directly.

To begin with, let us consider the L = 8 system in the case where N_J = 6 histograms are gathered at each iteration, and we choose N_c such that N_AV = 50, where N_AV = N_c/L^d is the number of counts expected per sub-histogram per energy state in the multicanonical distribution, sampling once per lattice sweep. In figure 3.2 we show the first three histograms (in fact the average over the six produced at each stage) and the weighting functions η^{2,3,4} that they produce (η¹ = 0).

In the figures, we plot the estimates of η̄ produced at the end of every iteration, labelled with their iteration number. We show only the bottom half of the energy range, up to the state E = 0, which corresponds to the mean of P^can_{β=0}(E). At higher energies, we use η(E) = η(E = 0) + βE, which ensures that the multicanonical distribution matches the shape of P^can_{β=0}(E) but does not extend to very high energy states, which have significant canonical weight only for antiferromagnetic spin-spin interactions.
As we see, the first histogram extends up to E ≈ −64. η̄ is quite well determined immediately over most of this range (there is very little difference between η² and subsequent η's). It becomes rather less accurate around E = −64 (because very few visits to these states were recorded, so the fractional fluctuations are larger), and at higher energies η is changed from its original value only by the constant we add to keep min(η) = 0. The second histogram is then roughly flat, as we expect for a multicanonical sampled distribution, up to about E = −64, then it falls to zero at E ≈ −32. There are some fluctuations around E = −64, with some states having appreciably more than their multicanonical probability, due to the fluctuations in the tail of the previous histogram having passed into the weighting function. As we might expect, η³ therefore extends up to E = −32 before going flat; it is now extremely close to the true η̄ for −128 < E < −64, where the second histogram obtained good statistics in all states, and fairly close to it up to E = −32, though once again with larger fluctuations where few counts were recorded. The third histogram, C³, is then more or less flat up to E = −32, the point where C² cut off, and then extends up to E ≈ −16, but again the last few states have poor statistics, so η⁴ is not well-determined for them. The process can clearly be repeated, extending to higher and higher energies, until η^n has stopped changing apart from small random fluctuations (which is the point where we would move to a longer run with a fixed η in the `finding/production' scheme).
Figure 3.2. Initial convergence (first three iterations) of the weighting function η^n using the visited-states method for Ising β = 0.55, L = 8. The upper figure shows the histograms C produced at each stage and the lower the resulting η's.
We show the later progress of the iterative scheme (iterations 1-7, 10, 15 and 20) in figure 3.3. It is apparent that, at least on the scale of the whole figure, convergence has occurred by the fifth iteration (we have produced a usable η^xc). Examination of the inset shows that the fluctuations in η are small after this, though it is not clear that convergence continues.
Now let us examine how using the Normal model of η̄ with an informative prior compares with simply using a uniform prior at each iteration, updating η^n using equation 3.21 alone. In figure 3.4 we show the results of such a `naive' run. We used N_AV = 300 at each stage so that the same total time would be spent on each iteration as was the case before.
Figure 3.3. Main figure: convergence of the weighting function η^n using the VS method for Ising β = 0.55, L = 8, N_AV = 50, 6 blocks per iteration, iterations 1-20. Inset: detail of −50 < E < −45.
It is apparent from the main figures that there is nothing to choose between the two methods as regards their speed of convergence, that is, the speed with which they move into the unsampled region; this is as we would expect, because the `tweaks' we have given the informative-prior method have rendered it almost identical to the naive method for these states. The difference only begins to become apparent when examining the insets. If a uniform prior is used, fluctuations in η in the sampled region are larger and persist, rather than dying away as they do in figure 3.3. This is shown more clearly in figure 3.5, where we have plotted the difference between η^n and η^final for the E = −8 macrostate. Thus, our method of incorporating prior information yields improvements in the smoothness of convergence and goes part of the way to removing the necessity of switching to a `production' run, since the error in η becomes continually smaller even though we continue updating it, implying a convergence of our knowledge of βF. We should note that the normal model is only accurate when we are already close to the multicanonical limit: it dramatically underestimates σ in the early iterations, because the fluctuations in η̄^{n,m} are reduced in size by the updating using ln(C_i + 1) and do not reflect the real uncertainty in η̄ in the non-sampled region (though this is unimportant for updating, since we do not make use of η̄ in this region).
Figure 3.4. Main figure: convergence of the weighting function η^n using the VS method for Ising β = 0.55, L = 8, N_AV = 300, 1 block per iteration, uniform prior used in updating. Iterations 1-20. Inset: detail of −50 < E < −45.
updating, since we do not make use of in this region).
A similar pattern emerges in the L = 16 case. The sampled distribution widens gradually and fairly smoothly, extending to higher and higher energies, until the multicanonical distribution is reached. Once again, we present results for iterative schemes using both a normal model for η (figure 3.6) and a uniform prior (figure 3.7). In the latter case fluctuations in η^n persist, but if the prior is incorporated they die away once we are close to the multicanonical limit, just as in the L = 8 case. This is shown by the insets in figures 3.6 and 3.7, and (more clearly) by the approach of η(E = −112) to its final value, plotted in figure 3.8. However, the most serious disadvantage of the visited-states method, its slow convergence for all save very small systems, is now becoming apparent. The most that any η_i can change by in any one iteration is Δη_max = max_i(η_i^{n+1} − η_i^n) = ln(max_i(C_i)); thus the greatest possible change is Δη_max = ln(N_c), and more typically Δη_max ≈ ln(N_AV). This is not a large change considering that η generally scales like L^d, and the method becomes tedious even for simple systems like the Ising model when the range of η to be covered is greater than about 100. The use of an informative prior does not help, because the problem is the extension of η^n into the unsampled region, where we do not have any prior information to incorporate. We see that with L = 16 we require 15 iterations to get as close to the limiting η as we were after 4 iterations of the L = 8 system, while we abandoned an application of the method to an L = 32 system, which was not near convergence even after running for several days on a workstation.
Figure 3.5. η^final − η^n for E = −8 using the VS method for the Ising model at β = 0.55, L = 8; N_AV = 300 with 1 block and N_AV = 50 with 6 blocks per iteration, uniform (line) and normal (triangles with error bars) priors used in updating. Inset: detail of later iterations.
We shall now make some remarks on the scaling of the method with system size, and on how we should choose N_AV. The analysis above shows that in 2 iterations the maximum change in η we expect to produce is 2 ln N_AV, while if we use the same time to do just one iteration, then we shall produce a Δη_max of only ln 2N_AV. Thus, if this were the only consideration, the fastest convergence would be produced with N_AV = 2. However, this neglects the effect of random errors, which we must be able to distinguish with sufficient accuracy from fluctuations in the histogram due to the true structure of the underlying sampled distribution. Even for the uncorrelated case, the random errors in the counts will generally be of size √N_AV. We thus need as a bare minimum an N_AV which is large enough that N_AV^{−1/2} ≪ 1, which explains our choice of N_AV = 50. In fact, we find that convergence is just about maintained, for the very small L = 8 system and using prior information, with N_AV = 10, but N_AV = 2 is impossibly small. For larger systems the lower limit on N_AV is enforced by the requirement that the simulation must make several random walks over all the macrostates accessible to it, even though in a single spin update it is only able to move from E to E ± 1 or E ± 2 (in units of the macrostate spacing). The number of accessible macrostates is of the order of L^d, at least when we have moved some way towards the multicanonical distribution, so a simple random-walk argument implies N_c ∼ L^{2d} and therefore N_AV ∼ L^d. Moreover, since η ∼ L^d, we shall require O(L^d / (d ln L)) iterations. Neglecting logarithmic corrections, the total time for this method to converge thus scales like L^{3d} = L^6 for the 2d Ising model.
Figure 3.6. Main figure: convergence of the weighting function η^n using the VS method for the Ising model at β = 0.55, L = 16, N_AV = 50, 6 blocks per iteration, iterations 1–22. Inset: detail of −160 < E < −120.
To summarise, then, this simple method of producing the sampled distribution provides `slow but sure' convergence which is suitable for small systems, where the weighting function varies over only a few tens of orders of magnitude. Our Bayesian analysis of the problem has clarified the procedure of updating the sampled distribution, and enables us to combine information from different iterations in the sampled region, but does not serve to speed up the slow convergence of η^n towards the multicanonical limit.
It is interesting to note, particularly in relation to the slowness of convergence, that our equation 3.21 was first derived by Laplace in 1775. It has apparently become `disreputable' (for which reason, perhaps, it is not to be found in [158, 159, 14]), precisely because of the high probability it assigns to events that are known to be possible but not observed (which leads in our application to the slow spreading into the non-sampled region). In certain applications this can lead to counter-intuitive predictions.
Figure 3.7. Main figure: convergence of the weighting function η^n using the VS method for the Ising model at β = 0.55, L = 16, N_AV = 300, 1 block per iteration, uniform prior used in updating. Iterations 1–22. Inset: detail of −160 < E < −120.
We became aware of this (after completing the work of this chapter) through [162], in which a different, though still Bayesian, formulation of the problem (starting with a uniform prior on all `strings' of configurations of length N_c rather than on the unknown state probabilities) is used to arrive at a result which resolves some of the counter-intuitive cases, and performs demonstrably better in a test on data compression. This result is identical to equation 3.21 in the case where all macrostates are visited, but generally gives a much smaller probability to unvisited states; in our case we would assign P̃^n(E) = O(1/N_c²) where C(E) = 0. This would lead to a maximum change in η_i of ln(N_c²) = 2 ln(N_c), so convergence (if it remained uniform) would require about half as many iterations. However, it also appears that this choice might well lead to an overestimate of P^{n+1} in the states just past the edges of the histogram C^n, with consequent slowing of convergence. In any case, the poor scaling with L of the time for convergence would remain the same. Another method is still required for all but small systems.
Figure 3.8. η^final − η^n for E = −112 using the VS method for the Ising model at β = 0.55, L = 16; N_AV = 300 with 1 block and N_AV = 50 with 6 blocks per iteration, uniform (dashed line) and normal (triangles with error bars) priors used in updating. Inset: detail of later iterations.
3.2.3 Methods Using Transitions
It is apparent that the major problem with the above method of evolving the multicanonical distribution is that large areas away from the central peak of the Boltzmann distribution are initially not sampled at all and we cannot make reliable inferences about them; for example, we have seen that assuming a multinomial form for the likelihood and evaluating ⟨P^n⟩ leads to an assignment of a constant probability in the non-sampled region, when it is clear physically that P^n will decrease for at least some distance away from the sampled peak. (There may be other peaks lying some distance away.) Convergence is thus rather slow.
We can try to increase the speed of convergence by fitting a function to P̃^n or to η^{n+1} from the sampled region and extrapolating it into the wings of the distribution. This way we make some use of our knowledge about the likely shape of P^n that is unused by the VS method of section 3.2.1. However, as we discussed there, the distance of extrapolation cannot be made very large because of the danger of heavily underestimating P^n in the unsampled region. If this happens, the next sampled distribution may then put almost all its weight in this region, and another extrapolation, if made with a uniform prior and extending into the originally-sampled region, will then result in the loss of all the information that we have built up there. Convergence therefore becomes irregular and awkward, with the latest η frequently needing to be discarded and a return made to earlier ones.
Linear extrapolation is fairly safe, because ln P^can will usually have a negative second derivative and so will be smaller than the linear extrapolation P̃_extrap (thus η^{n+1} errs on the small side). It still needs to be combined with constraints on the distance of extrapolation, though, to avoid problems with subsidiary maxima in other regions of macrostate space (where P^can's second derivative is not negative). There is still some danger of overestimating η, because the gradient must be estimated from the points near the edge of the sampled region, where statistics will inevitably be poor. The prescription suggested in [108] is to choose some cut-off state near, but not too near, the bottom of the sampled histogram and extrapolate from there, so that we are fairly sure that P^{n+1} will not be too large. The choice of the cut-off can either be made by hand [108] or automatically from the size of the histogram, combined with a strategy for `retreating' from a bad extrapolation [122]. Linear extrapolation with various ad hoc constraints has been successfully applied by several authors [108, 98, 130] and found to improve appreciably the speed with which methods based on visited states extend the region that they sample.
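As an illustration of the prescription just described, the following is a minimal sketch, with a hand-chosen cut-off and fitting window as assumptions, of linear extrapolation of η into the unsampled wing:

```python
import numpy as np

def extrapolate_eta(E, eta, edge, window=5):
    """Linearly extrapolate eta(E) beyond a cut-off index `edge` into the
    unsampled wing, using a least-squares line fitted to the `window`
    states just inside the cut-off (chosen near, but not too near, the
    bottom of the sampled histogram)."""
    fit = np.polyfit(E[edge - window:edge], eta[edge - window:edge], 1)
    eta_out = eta.copy()
    eta_out[edge:] = np.polyval(fit, E[edge:])
    return eta_out
```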
Despite the good performance of some extrapolation methods, we shall here describe the results of a different approach to the problem. It would be very appealing if, instead of simply trying to make better inferences about states we have not sampled, we could sample all parts of the macrostate space immediately. We have developed a method to do this by using inference based on the recorded transitions of the Markov chain, rather than on the visited states. We shall call this the transition probability (TP) method.
To our knowledge this method is new (it is not to be confused with Bennett's acceptance ratio method [84]), though in very recent work on an expanded-ensemble-like simulation by Kerler et al. [133] it is also recognised that the observed transitions offer useful information about the sampled distribution.
Initially the system is prepared in a macrostate with a low canonical probability, such as the ground state. Unless we have already reached a multicanonical distribution, this state has extremely low probability. When we make MC updates, the system therefore moves away from this state until it is moving randomly among its equilibrium macrostates for the present sampled distribution. The process resembles the equilibration of a normal simulation. At each MC step we record in a histogram C^n_{ij} the transition performed between energy macrostate i before the step and the macrostate j after it. (Rejected trial moves and accepted moves that do not change the macrostate are recorded alike in C^n_{ii}.) We then repeat the process, if necessary, for a release from an unlikely state at the other end of the macrostate space. Then the entire procedure is repeated until the array of recorded transitions is reasonably full. We used an ad hoc criterion based on a parameter N_TP to decide this: C^n is deemed full when C^n_{i,i+1} + C^n_{i+1,i} ≥ N_TP for all i. Now suppose the transition probabilities between macrostates in the sampled distribution are σ^n_{ij}, i.e. σ^n_{ij} ≡ P(i → j | i; P^n). We can use C^n_{ij} to give an estimator of this. The maximum likelihood estimator is σ̃^n_{MLE,ij} = C^n_{ij} / Σ_j C^n_{ij}, which as before needs fixing if C^n_{ij} = 0 for macrostates between which transitions are allowed. We preferred to use σ̃^n_{AV,ij}, for which (by a calculation similar to that in equation 3.21, based on a uniform prior on σ^n_{ij} for allowed transitions) it can be shown [163] that

σ̃^n_{AV,ij} = (C^n_{ij} + 1) / Σ_j (C^n_{ij} + 1)     (3.25)

This expression requires no fixing for the case C^n_{ij} = 0, though like 3.21 it is obtained by assuming a uniform prior for σ^n_{ij} and so does not contain information from earlier iterations.
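In code, the estimator of equation 3.25 is essentially one line; this sketch (the boolean mask `allowed`, marking which transitions the move set permits, is our addition) applies the uniform-prior +1 only to allowed transitions:

```python
import numpy as np

def tp_estimator(C, allowed):
    """Equation 3.25: sigma_ij = (C_ij + 1) / sum_j (C_ij + 1), with the
    uniform-prior +1 applied only to allowed transitions; forbidden
    entries remain exactly zero."""
    num = (np.asarray(C, dtype=float) + 1.0) * allowed
    return num / num.sum(axis=1, keepdims=True)
```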
Before we proceed to obtain an estimator of P^n itself, we shall discuss the circumstances under which it is legitimate to consider the matrix of transitions between macrostates as describing a Markov process, and how it can be related to the matrix of transitions between microstates, which is the true determiner of the microscopic dynamics of the system and thus, ultimately, of the dynamics of the transitions between macrostates.
Transitions Between Macrostates as a Markov Process
Let us label the microstates with r, s and the macrostates with i, j, so we write P^n_r and P^n_i to mean the equilibrium state probabilities under the prevailing sampled distribution (note that this differs from the notation used in section 1.2.2, where i and j were used for microstates). Thus

P^n_r = (1/Z) exp[−βE(r) + η^n(r)]

and

P^n_i = (1/Z) exp[−βF_i + η^n_i]

The particular set of microstates in macrostate i we shall write as r ∈ i. We assume that the macrostates partition the microstates exhaustively and uniquely.
Then the transition matrix for the macrostates is, at time t,

σ^n_{ij}(t) = P(i → j | i; η^n)(t)
           = Σ_{r∈i} Σ_{s∈j} P(r → s | r; η^n) P^n(r|i)(t)
           = Σ_{r∈i} Σ_{s∈j} P^n(r|i)(t) σ^n_{rs}

where σ^n_{rs} is the transition matrix for the microstates, which is not time-dependent.
We would like σ^n_{ij} to define a simple (non-time-dependent) Markov process. In general, this will only happen if Σ_{s∈j} σ^n_{rs} = constant for all r ∈ i. In that case it does not matter what P^n(r|i)(t) is, and we have what is called a mergeable process [164, chapter 1.4]: the microstructure of each macrostate is completely irrelevant to the behaviour of the macrostates. In our case this condition will not be satisfied. However, let us suppose that the underlying process is in a sort of `local equilibrium' in the sense that P^n(r|i) is constant with time at its equilibrium value (given by the Boltzmann distribution, since η does not affect the relative equilibrium probabilities within each macrostate). Then we can regard σ^n_{ij} as defining a Markov process, and we get simply

P^n(r|i)(t) ≈ P^n(r|i) = P^n_r / P^n_i     (3.26)

and

σ^n_{ij} = Σ_{r∈i} Σ_{s∈j} P^n(r|i) σ^n_{rs}     (3.27)
We shall discuss the extent to which equation 3.26 is satisfied later. Assuming it for now, and applying the detailed balance condition that σ^n_{rs} is known to satisfy, we then obtain

σ^n_{ij} = Σ_{r∈i} Σ_{s∈j} P^n(r|i) (P^n_s / P^n_r) σ^n_{sr}
        = (1/P^n_i) Σ_{r∈i} Σ_{s∈j} P^n_s σ^n_{sr}
        = (P^n_j / P^n_i) σ^n_{ji}     (3.28)

where in the last line we have used

σ^n_{ji} = Σ_{s∈j} P^n(s|j) Σ_{r∈i} σ^n_{sr} = (1/P^n_j) Σ_{r∈i} Σ_{s∈j} P^n_s σ^n_{sr}

So if σ^n_{rs} obeys detailed balance, then so does σ^n_{ij}. But this then necessarily means that the equation P^n_i = Σ_j P^n_j σ^n_{ji} is satisfied, so that the equilibrium distribution P^n is the left eigenvector of the transition matrix σ^n.
Thus we have proven the following. Firstly, if the probability distribution of microstates within each macrostate is constant with time, transitions between macrostates can be regarded as themselves defining a Markov process, determined by the transitions between the microstates (which we shall call the underlying Markov process). Secondly, if the above holds and the probability distribution of microstates within each macrostate is the same as it would be in the equilibrium distribution P^n_r, then the Markov process of the transitions between macrostates has P^n_i as its stationary distribution. We have checked these results explicitly for small matrices, binning some of the states and confirming that the equilibrium state probabilities change in the way expected.
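That check is easy to reproduce; the following toy version (all numbers hypothetical) builds a four-microstate matrix satisfying detailed balance, bins it into two macrostates as in equation 3.27, and confirms that the stationary vector of the binned matrix is the binned probability:

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])          # target microstate probabilities
sigma = np.zeros((4, 4))
for r in range(4):
    for s in range(4):
        if r != s:
            sigma[r, s] = min(1.0, p[s] / p[r]) / 4.0   # Metropolis-like moves
    sigma[r, r] = 1.0 - sigma[r].sum()       # rejected moves stay put

bins = [(0, 1), (2, 3)]                      # macrostate = set of microstates
P_macro = np.array([p[list(b)].sum() for b in bins])
sigma_macro = np.array([[sum(p[r] / P_macro[i] * sigma[r, s]
                             for r in bins[i] for s in bins[j])
                         for j in range(2)] for i in range(2)])  # eq. 3.27

w, v = np.linalg.eig(sigma_macro.T)          # stationary (left) eigenvector
pi = np.real(v[:, np.argmax(np.real(w))]); pi /= pi.sum()
print(pi, P_macro)                           # the two should agree
```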
We may define the macrostates as any exhaustive and unique partitioning of the microstates. As we shall show, this result has many useful consequences: we may use the theory of Markov processes [164] to relate the matrix σ^n to, for example, the mean and variance of the number of visits to each state in a run of a particular length. Important results relating to the expected error of sampling from the Markov chain then follow (see section 3.4). This analysis cannot be applied directly to the transition matrix for the microstates, because it is too large, but the effective transition matrix for the macrostates will usually be of a manageable size: for the energy macrostates of the Ising model, with single-spin-flip Metropolis, its `natural' form is pentadiagonal, of size L^d × L^d, and it can be reduced further in size by binning the macrostates, which also makes it tridiagonal. In contrast, the microstate matrix has size 2^{L^d} × 2^{L^d}.
What of the validity of the approximation made in equation 3.26, that P^n(r|i)(t) may be replaced by its equilibrium (Boltzmann) value? To the extent that the macroscopic variables (the macrostate labels) are the slowest to evolve, one may expect this approximation to be reasonable, and to improve as P^xc is approached, where the simulation moves in an increasingly diffusive, less directional fashion which gives time for relaxation of P^n(r|i). We have found good evidence (see section 3.2.5) that the approximation is indeed essentially exact in the multicanonical limit. Moreover, we emphasise that, whereas equation 3.21 was obtained by using a model that is in fact known to be wrong (a multinomial distribution of counts, each bin independent of the others), equation 3.25 assumes only that each transition is independent of the preceding ones. This is indeed true; it is just the definition of a Markov process, and we have just shown that the real simulation in equilibrium is described by a Markov process with transition matrix σ^n_{ij}.
It might be argued that the approximation will be less good in the early iterations, where the system will for at least part of the time be moving rapidly from a release state with a very low P^can to more probable states. However, we have found in practice, as we shall describe below, that even the first iteration often gives a surprisingly good estimate of P^n.
Now, to return to the task of estimating P^n: it is clear that, having used equation 3.25 to find σ̃^n_{AV,ij}, we may use P̃^n_E, the λ = 1 eigenvector of its transpose, as an estimator of P^n. Although the transition matrix σ^n is N_m × N_m, it may be (indeed, should be) chosen sparse, or even tridiagonal, by binning the macrostates and/or choosing the matrix R (introduced in section 1.2.2) to prohibit transitions between widely separated energies (which are in any case very unlikely to be accepted). If this is done, it takes only O(N_m) operations to find the eigenvector. Indeed, if σ^n is tridiagonal, it is trivial to find P̃^n_E using equation 3.28: we neglect normalisation initially and take P̃^n_{E,1} = 1. Then we use P̃^n_{E,i+1} = P̃^n_{E,i} σ̃^n_{i,i+1} / σ̃^n_{i+1,i} to generate all the others successively, and finally impose Σ_i P̃^n_{E,i} = 1. In the first one or two iterations, when P^n still differs substantially from P^xc, it may be necessary to work with the logarithms of P^n to prevent arithmetic overflow, and it is necessary to generate P^n in the direction in which it is increasing to prevent the build-up of rounding errors. Thus we should start from the release states, which were chosen because of their low P^can, and iterate towards the equilibrium states.
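A sketch of this successive-ratio construction for a tridiagonal σ̃, working in logarithms as just described (function and variable names are ours):

```python
import numpy as np

def eigenvector_estimate(sigma):
    """Estimate the stationary vector of a tridiagonal macrostate
    transition matrix via P_{i+1} = P_i * sigma[i,i+1] / sigma[i+1,i]
    (equation 3.28), accumulating logarithms to prevent overflow and
    normalising only at the end."""
    n = sigma.shape[0]
    log_p = np.zeros(n)
    for i in range(n - 1):
        log_p[i + 1] = log_p[i] + np.log(sigma[i, i + 1] / sigma[i + 1, i])
    p = np.exp(log_p - log_p.max())
    return p / p.sum()
```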
As in section 3.2.1, it is possible to incorporate prior information from previous iterations, combining the latest estimate with the previous one using the variance as a weight. However, we have found that this slows down the very rapid initial convergence that is the main advantage of this method, and is only of advantage near to the multicanonical limit, where, as we shall see, the visited-states method is probably preferable. Therefore we update the preweighting coefficients using the simple expression η^{n+1} = η^n − ln P̃^n_E + k of equation 3.14.
The procedure is thus (a minimal code sketch follows the list):
0. Start with η^1 = 0.
1. Record the histogram C^n_{ij}:
 (a) release the simulation from an `unlikely' macrostate;
 (b) perform several thousand spin updates;
 (c) go to 1b until the simulation has moved to equilibrium, or until it is moving only through macrostates that have all been visited enough times;
 (d) go to 1a, choosing a different release state if necessary, until all macrostates have been visited enough times.
2. Estimate the transition matrix using equation 3.25.
3. Estimate the eigenvector P̃^n_E.
4. Set η^{n+1} = η^n − ln P̃^n_E + k.
5. If the procedure has not converged, go to 1; otherwise stop.
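Here is that loop in outline, reusing the eigenvector routine sketched above; `run_release` is a hypothetical driver for step 1:

```python
import numpy as np

def tp_method(n_iters, n_states, run_release, N_TP=600):
    """Sketch of the TP iteration (steps 0-5 above).  `run_release` is a
    hypothetical driver that releases the system from unlikely macrostates
    and performs MC updates with weights eta until the fullness criterion
    C[i,i+1] + C[i+1,i] >= N_TP holds for all i, returning C."""
    eta = np.zeros(n_states)                                       # step 0
    for n in range(n_iters):
        C = run_release(eta, N_TP)                                 # step 1
        sigma = (C + 1.0) / (C + 1.0).sum(axis=1, keepdims=True)   # step 2, eq. 3.25
        p = eigenvector_estimate(sigma)                            # step 3 (sketch above)
        eta = eta - np.log(p)                                      # step 4, constant k absorbed
        eta -= eta.max()
    return eta
```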
We have tested this method on the 2d Ising model with preweighting both of energy and of magnetisation; we describe the results for energy first. We used two release states, the ground state at E = −2L² (spins all up or all down with equal probability) and the infinite-temperature states around E = 0 (where we simply generated a starting state with each spin randomly up or down). Simulations launched from these states covered complementary parts of macrostate space, those coming from E = −2L² approaching the `finishing' states (around ⟨E⟩ for small η^n) from below and those from E ≃ 0 approaching them from above. The iterative scheme outlined above was therefore modified so that step 1b alternated between the two simulations, so that we could use the crossing-over of the simulations as a criterion of their having `moved to equilibrium'.
To keep the matrix σ̃ tridiagonal, we blocked the E-macrostates so that the width of each block was ΔE = 8 (each blocked macrostate except the lowest thus containing two of the underlying macrostates). The parameter N_TP was set to 600 for L = 16 and 1200 for L = 32, which meant that each iteration took a rather shorter time than a single iteration of the visited-states method (since we are now counting spin flips, not lattice sweeps). We should note that the VS method performs more than twice as many spin updates per second as the TP method does, because in the TP method the data histograms must be updated at every spin flip, the spins for updating must be chosen at random, and we must check for finishing and re-initialise the lattice more frequently. In the visited-states (VS) method, the fundamental update step is a complete lattice sweep rather than a single spin flip, and so there is less bookkeeping and two calls to the random number generator per flip are saved. For this reason the TP method would not, in normal circumstances (cf. section 3.2.5), be a candidate for use in the `production' stage of an Ising simulation, even if it were not for its other disadvantages (see below).
The results for the convergence of η^n are shown in figures 3.9 and 3.10.
It is apparent that the shape of η is outlined, albeit with quite a lot of noise, right from the first iteration using this method. The superiority, at least in the early iterations, that this method has over the VS method is demonstrated more clearly in figures 3.11 and 3.12, where the difference between η^n(E = 0) and η^final(E = 0) is plotted for both methods (η^final is established using a long visited-states run for L = 16, and by finite-size scaling [see section 3.2.4] followed by a long visited-states run for L = 32).
134
120.0
100.0
80.0
η
5
4
60.0
3
40.0
2
20.0
0.0
-512
-384
-256
-128
0
E
Figure 3.9. Convergence of the weighting function n, n = 2 20, using the TP method for
Ising = 0:55, L = 16.
is established using a long visited-states run for L = 16 and nite-size scaling [see section
3.2.4] followed by a long visited-states run for L = 32). Moreover, the advantage of using the
Moreover, the advantage of using the transition probabilities clearly increases with increasing system size; for L = 32 it produces a usable weighting function after about fifteen iterations (about 1 hour), while extrapolation of the VS results suggests that it would take at least ten times as long (probably appreciably more, since in the few early iterations that were performed far fewer than L^d macrostates were sampled).
We can see why the TP method converges so much faster by considering the maximum change in η^n that we can expect to produce in one iteration. Suppose we make N_R releases from one of the unlikely starting states in the course of one iteration. Then the maximum difference in the estimated probability of two adjacent states, i and i + 1, would arise if every one of the simulations followed a trajectory that took it from i to i + 1 and then on to i + 2, etc., never returning to i. In this case we would have C_{i,i+1} = N_R, C_{i+1,i} = 0, and we would estimate

σ̃^n_{i,i+1} / σ̃^n_{i+1,i} = P̃^n_{i+1} / P̃^n_i = N_R + 1

which would mean that η^{n+1}_{i+1} − η^{n+1}_i = ln(N_R + 1).
Figure 3.10. Convergence of the weighting function η^n, n = 2, 3–12, 14, 16, 18, 20, 25, 30, 35, using the TP method for the Ising model at β = 0.55, L = 32.
But a change in the difference of η of this magnitude can now be produced between every pair of states in the chain, so that the total change available is Δη_max = N_m ln(N_R + 1) ∼ L^d ln(N_R + 1). Thus even for a fairly small N_R, 10–100 say, the method is able, at least in theory, to converge on the first iteration no matter what the system size. The time taken per iteration should also increase only like N_m ∼ L^d.
In practice the method does not converge on the first iteration, and there is clearly a small residual bias remaining even after many iterations: the weight necessary to reach E = 0 is overestimated. We shall discuss these two problems in turn.
The first problem is largely due to the blocking of the macrostates, which compromises the assumption underlying equation 3.26, that a local equilibrium is maintained within each (blocked) macrostate. For this to be true, all degrees of freedom within the macrostate must relax on a faster time scale than that characterising the transitions between the macrostates, which is clearly not the case, since the blocked macrostates now contain different values of the energy. In fact, it is not hard to show that fewer transitions occur in the direction (through macrostate space) in which P^n is increasing, and more occur in the opposite direction, than would be the case if local equilibrium were established within each blocked macrostate.
Figure 3.11. L = 16: convergence of the weight η^n(0) as a function of iteration number for both the TP method and the VS method. The ordinate shows the difference between η^n and η^final, where η^final is the limiting behaviour of the VS method.
This result follows from the fact that transitions that cross the boundary between blocked macrostates are more likely to come from underlying macrostates near the boundary. The upshot is that P̃^n_E continually underestimates the change in η^n required to reach the multicanonical limit. This explains a large part of the behaviour of the method, though in the early iterations it should be noted that there is a particularly large underestimate of the weight required in states around E/L^d ≃ −1.4. We attribute this to the complex behaviour of the system near the critical point, where the typical canonical configurations have more or less these energies. In the early iterations, when the system moves ballistically down to these states from E = 0, there is not enough time for relaxation of the large clusters typical of criticality. In later iterations, when the motion is more diffusive, there is time for this relaxation, equation 3.26 is better satisfied and the anomaly in η^n disappears.
Figure 3.12. L = 32: convergence of the weight η^n(0) as a function of iteration number for both the TP method and the VS method.

We do not have such a full understanding of the bias in the limiting behaviour of P̃^n_E (though the analysis of section 4.3.3 may well be applicable here too); it may simply be a result of the fact that

⟨P̃^n_E(σ̃^n)⟩ ≠ P^n     (3.29)
in which case better performance would be obtained by making each iteration longer, which would reduce the bias. For the very long runs in section 3.2.5, for example, the bias is not a problem. As a test of this, we show in figure 3.13 the difference η^final(0) − η^n(0) for an L = 16 simulation where the `fullness criterion' N_TP is doubled at each step, starting at N_TP = 1200 and increasing to N_TP = 153600 by the 8th iteration. We begin not with η = 0 but with η^8 from the run shown in figure 3.9, for which η(0) has not yet exceeded its limiting VS value. We combine the increase in N_TP with jackknife bias-correction (see [191] and appendix D), which however assumes that the bias goes as 1/N_TP.
It is apparent that the bias is indeed much reduced by increasing N_TP, but it is clearly much more persistent than 1/N_TP. However, using very large N_TP, while it may remove the bias, also clearly removes the TP method's main advantage, that of rapid convergence. Unless (as in sections 3.2.5 or 4.3) there is some particular reason for sticking with measurements of transitions, we would recommend switching to visited states when η^n is changing between iterations by less than could be obtained by visited states in the same time.
Figure 3.13. L = 16: convergence of the weight η^n(0) as a function of iteration number for the TP method with increasing N_TP and bias correction.
If the final η^n does not then immediately produce a viable multicanonical distribution (the bias is scarcely more than an order of magnitude), only one or two iterations of the VS method will be required to reach it.
If the distribution P^can has more than one peak, then the method needs slight modification. For example, consider magnetisation preweighting for an Ising model with β ≥ β_c. If we are to sample all of the macrostate space by releasing the system from a state of low probability and letting it move to one of high probability, then we need a release point at M = 0 as well as at M = ±|M_MAX|; otherwise, states around M = 0 will rarely if ever be visited. We could initially impose the constraint M = 0 and generate equilibrium configurations with this constraint before allowing M to change, but here we use random M = 0 configurations. In this case we have adopted a two-stage process. The first stage serves to outline the structure of the macrostate space and bootstraps the second stage, which completes the weight assignment.
In the first stage we perform a sequence of simulations launched, successively, from one of the three initial states; then, once this has converged, we refine the weights using transition-probability data gathered using only the two ordered microstates as starting states.
We do this extra refinement because it is not generally to be expected that the limiting set of weights produced with three launch states will be multicanonical, because the conditions presupposed in equation 3.26 are not fulfilled. A typical M = 0 microstate (at β = β_c) comprises large clusters of spins of the same orientation, which take a long time to evolve. As a result, the information the algorithm gleans about the M = 0 macrostate is biased by the systematic launch from a microstate which, being `random', has an energy significantly higher than those typical of M = 0. For this reason the algorithm would be expected to overestimate the weight to be attached to the M = 0 macrostate in the first stage. But this then ensures that in the second stage simulations released from the M = ±L^d states will be able to reach the M = 0 states, which they would not with the canonical weighting. In general the method will require (N_p + 1) stages, where N_p is the number of maxima in P^can (the number of phases, if each maximum has the same weight). This scheme is shown in operation in figure 3.14 for L = 32 at β = β_c; we show only half of the macrostate space to reduce cluttering.
Figure 3.14. η for the Ising model with L = 32 at β = β_c = 0.440686..., inferred from one iteration of the VS method, from one or five iterations of the TP method, and from naïve (∼ L^d) finite-size scaling of η^xc(L = 16). The solid line shows the limit established from long VS runs (performed to gather data for section 3.3.3).
It is apparent that, notwithstanding our concern that the result with three release points would be biased, the estimate of η produced is found to converge on the first iteration. The faster convergence compared with the application to energy presumably results from P^can(M) being wide and flat in the central region, so that the relaxation of M is naturally slow even for the first iteration. The relaxation time of the energy is much faster, so within only a short time equation 3.26 is approximately satisfied. The effect of the random M = 0 launch point is therefore small. It is also significant that the matrix σ^n_{ij} is naturally tridiagonal, so that it is unnecessary to block it. The estimate η² of η is in fact good enough to be used immediately in a multicanonical `production run', in contrast to the result of a visited-states run of the same length, or to naïve (∼ L^d) finite-size scaling of η^xc(L = 16), which are also shown in figure 3.14.
We also show the effect of proceeding to the second stage of refinement, using only the microstates at M = ±L^d as release points (this is marked as iteration 5 in figure 3.14, and was the 3rd iteration conducted with only two release points). In this case, because the first stage has performed so well, only a marginal further improvement is obtained, and, as with energy, there seems to be a small residual bias.
To summarise, then, we have found that the TP method provides very much faster initial convergence to the multicanonical distribution than the visited-states (VS) method of section 3.2.1; we have demonstrated its efficacy for fairly large systems (L = 16 and L = 32 for energy; L = 32 for magnetisation), where variations in canonical probability of more than one hundred decades must be covered. If the transition matrix has a suitable structure, convergence can be achieved on the first iteration; however, if this is not the case, the final convergence may be poorer than that of the VS method, and there may be a residual bias. In practice, at least for the Ising model, it is probably better to switch to the VS method for final refining when η is changing only a little between iterations.
3.2.4 Finite-Size Scaling
It will be noticed that the shape of the final η^xc(E) generated in the previous sections is very similar for different system sizes, merely being scaled by the system size L^d. This is a manifestation of the extensivity of the canonical averages and free energy away from criticality (section 1.1.3). As a consequence, η^xc for a small system, once generated (and the methods we have examined above make it quite easy to do this), can be used to predict η^xc for a large system: we fit a function (such as a spline or Chebyshev polynomial) to the small-system η^xc, then scale and interpolate it. The predicted η can be refined, if necessary, to a multicanonical form, again by using one of the two previous methods. The refinement thus corresponds to measuring correction-to-scaling terms. In the Bayesian framework the use of finite-size scaling (FSS) corresponds simply to beginning with `prior information' about the sampled distribution which comes from smaller systems, reflected in a P(η|H) which has its mean at the finite-size scaling estimate and a width chosen to reflect (or to underestimate) the expected magnitude of the unknown correction terms.
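A minimal sketch of the prediction step for the 2d case (d = 2), assuming a cubic-spline fit; the function name and gridding are ours:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fss_predict_eta(E_small, eta_small, L_small, L_large):
    """Predict eta^xc(E) for a large system by scaling a small-system
    result: fit a spline to eta(E), then scale both the energy axis and
    eta itself by the volume ratio (L_large / L_small)**2 for d = 2."""
    ratio = (L_large / L_small) ** 2
    spline = CubicSpline(E_small, eta_small)
    E_large = np.linspace(E_small[0] * ratio, E_small[-1] * ratio,
                          int(ratio) * len(E_small))
    return E_large, ratio * spline(E_large / ratio)
```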
This has been found to work extremely well away from criticality; for the 32² Ising model at β = 0.55, scaling the 16² η^xc(E) gives an estimate accurate to within 2%, which then requires only a few iterations of the visited-states method to converge to a fully multicanonical form.
This convergence is shown in figure 3.15. The subfigures that compose this figure are to be read in the order that text would be, from left to right and top to bottom. At the top left is the initial estimate η^1(L = 32), produced by finite-size scaling of η^xc(L = 16). Next to this on the right is the histogram C^1 produced in a MC run sampling with η^1. We then used this histogram to produce η^2. The difference between η^1 and η^2 is very small on the scale of η, so we show the difference between them, Δη^1 = η^2 − η^1, rather than η^2 itself (2nd row, left). Sampling with η^2 then produced the second histogram C^2 (2nd row, right), and the remaining two rows of the figure show the equivalent data for iterations 3 and 4.
It is apparent that spreading of the sampled distribution occurs just as in section 3.2.1; fractionally large fluctuations occur around the edge of C^1 where it goes to zero, which translate into a rather `jagged' C^2, but the irregularities are then smoothed away in subsequent iterations. As in the previous investigation of the VS method, we do not extend sampling to those states below the peak of P^can at β = 0.55. The histogram broadens rather faster than in section 3.2.1, because η^1 has approximately the right shape even in the region that is not sampled in C^1. Thus, a small uniform increase of the probability of all the macrostates in this region renders many more of them accessible than would be the case if η^1 were constant there.
It will be noted that in this set of runs we adopted a slightly different way of dealing with η^n above E = 0, simply cutting it off at a constant value rather than letting it increase as η(E > 0) = η(E = 0) + βE. The result is that the states above E = 0 are scarcely sampled, so when using these results to calculate free energies (as we do in sections 3.3.1 and 3.3.2) we use the symmetry of the Ising density of states about E = 0 to reconstruct P^can(E) for E > 0.
However, there are situations where the simple FSS described above cannot be applied. One is P^can(M) at β_c, where a simple FSS scaling does not correctly predict the shape of η (see figure 3.14). As we shall see in section 3.3 below, in some ranges of M-values η(M, β_c) scales like L^d, but in others it obeys different laws. With a knowledge of the correct critical exponents, a critical P^can(M) at β_c could be scaled correctly; however, for the important case of the simulation of spin glasses (see [108]) there exists no FSS theory to predict the scaling of η. In such cases the TP method (though we have not yet tried to apply it to spin glasses) or linear extrapolation [108] would be preferable.
Figure 3.15. Refinement of a FSS estimate of η using VS. Top left: estimate η^1(L = 32), produced by finite-size scaling of η^xc(L = 16). Below on the left: Δη^{n−1} = η^n − η^{n−1} for iterations 2 (top) to 4 (bottom); on the right, from top to bottom, histograms for iterations 1–4.
3.2.5 Using Transitions for Final Estimators: Parallelism and Equilibration
We now present some new results which show how the multicanonical method could be implemented efficiently on a parallel computer, and how in some circumstances we can do away with the necessity of performing full equilibration of the simulation, in the sense in which it is usually meant.
We may reasonably speculate that any algorithm which is to be widely used in MC simulation in the future will have to be amenable to running on parallel computers. Unfortunately, the multicanonical method cannot be parallelised in the same way as Boltzmann sampling often can be: by geometrical decomposition, in which each processor of the parallel computer looks after a subvolume of the whole simulation. Geometrical decomposition works for Boltzmann sampling when the forces between particles are short-ranged, so that the calculation of ΔE (to be used in equation 1.2.2) for each trial particle move is a local operation. Particles which are sufficiently widely separated that they cannot interact before or after any possible moves may then be updated in parallel. However, with multicanonical sampling this is no longer possible: the transition probability depends on η, where η is a function of the total energy or magnetisation of the system. Therefore, if we generate several trial moves in different regions of the simulation, the transition probability for each will depend on the final macrostate, which we do not know a priori, since it will depend on how many of the other moves are accepted⁵.
However, some kinds of parallelisation are still possible. First, it is permissible to generate several trial moves with geometric decomposition and then to perform just one of them, chosen at random from those that would be accepted. This would produce a speed-up in a situation where the acceptance ratio was very low, but not otherwise. It is also permissible to update in parallel with geometric decomposition if the moves that we generate are chosen to keep the value of the preweighted variable, and thus η, constant; this would correspond to Kawasaki dynamics on the Ising model with η = η(M). This kind of parallelisation was used (in combination with primitive parallelism, see below) for the simulations of chapter 4. Moves that change η must be performed serially, and so the total speed-up with parallelisation is unlikely to be very large, since the preweighted variable, which varies over a wide range in a multicanonical simulation and thus has the slowest fluctuations, still has to be explored in serial. This kind of parallelisation is better suited to the expanded ensemble (see section 3.4.1 for a discussion of the kinship between this and the multicanonical ensemble), where the number of η-weighted states is generally quite small. Finally, we may simply perform `primitive parallelism', where N_r copies, or replicas, of the simulation are run in parallel, one on each processor. The results of all the replicas are then averaged to give estimators whose statistical error is lower by a factor √N_r. It is this kind of parallelisation that we shall now discuss in more detail, showing how the multicanonical ensemble is particularly suited to it.

⁵Very recently, and in a different context, a `fuzzy' MC method has been introduced [165] which, we speculate, may enable errors introduced by parallel multicanonical updating to be corrected. This is one area of possible future investigation.
Imagine performing a simulation with primitive parallelisation where both N_r and L^d, the system size, are quite large. As we discussed in section 1.2.2, a MC simulation usually needs an equilibration period before unbiased results are produced (i.e. before configurations are generated with probability P^n(r), the eigenvector of the microstate transition matrix σ^n_{rs} that we have chosen with the goal of sampling P^n). In the case we are discussing, as N_r and L^d increase, the equilibration time becomes a larger and larger fraction of the total simulation time: once equilibrium is reached, a short run suffices (because N_r is large), but getting to equilibrium takes a long time (because L^d is large). This problem afflicted us quite severely in the simulations of section 4.2.
The multicanonical ensemble might seem to offer a way of solving this problem, because macrostates that require no equilibration (the ground state) or equilibrate very fast (infinite temperature) have a high probability under it. Can we therefore simply start every simulation in one of these states and begin collecting data immediately, with little or no equilibration? If conventional estimators are used the answer is no, but we shall show that by using the eigenvector estimators we have just introduced we may indeed do just that.
The problem with using conventional visited-states estimators (P̃^n_i ∝ C^n_i) is that states near to the starting state (and the finishing state too, in fact) receive too much weight. Suppose we have a perfect multicanonical distribution in magnetisation, P(M) = const. Let us start each of our simulations in the negative-magnetisation ground state and let it evolve until it has done just a few random walks over the whole range of M (this will take quite enough time, since we are assuming that L^d is large). The expected distribution of C(M) is then as shown (qualitatively) in figure 3.16; even though the underlying P(M) is flat, there is a concentration of probability near the starting state, and this only disappears in the C(M) → ∞ limit. What we have is basically a diffusion problem: the probability of being in state M at time t, P(M, t | M₀, 0), starts as a delta function at M = M₀, t = 0 and slowly spreads out over the allowed states, finally becoming uniform only as time goes to infinity. The expected C(M) is proportional to the sum over time of P(M, t | M₀, 0), and the acceptance ratio in each state plays the role of a diffusion coefficient.
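Figure 3.16 is easy to reproduce qualitatively; this sketch assumes a pure unbiased random walk with reflecting ends (only a caricature of the Ising dynamics) and sums the occupation probabilities over time:

```python
import numpy as np

# Qualitative reproduction of figure 3.16: an unbiased random walk on 257
# states started at one end; the expected visited-states histogram is the
# time-sum of the occupation probabilities.
n_states, n_steps = 257, 10**5
prob = np.zeros(n_states)
prob[0] = 1.0                        # release from the end state (M = -256)
hist = np.zeros(n_states)
for _ in range(n_steps):
    from_above = np.roll(prob, -1); from_above[-1] = 0.0  # neighbour moves down
    from_below = np.roll(prob, 1); from_below[0] = 0.0    # neighbour moves up
    stay = np.zeros(n_states)
    stay[0] = 0.5 * prob[0]          # attempted moves off the ends are rejected
    stay[-1] = 0.5 * prob[-1]
    prob = 0.5 * from_above + 0.5 * from_below + stay
    hist += prob
hist /= n_steps                      # compare with <C(M)>/N_c in figure 3.16
```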
We should note that this problem is always present in MC simulation, but its effect is usually very small; the number of sampled macrostates in a Boltzmann sampling simulation is small and the Markov chain is long, so the slight bias toward the starting state, which dies away as 1/N_c, is swamped by the random errors of order 1/√N_c. It occurs even in a simulation which is well `equilibrated', by which we mean that the memory of any transient initial state, which may have had a very low probability under the prevailing sampled distribution, has died away, so that (if the simulation were a `black box' we could not look into) we would ascribe to every microstate and macrostate its equilibrium probability. But of course in fact the simulation is in exactly one state with probability one when we begin sampling. To ignore this fact has little effect in a Boltzmann sampling simulation, but it would lead to serious inaccuracy here, where the number of sampled macrostates is large and the Markov chain is comparatively short.
Figure 3.16. Expected histogram of visited states for a diffusive system with 257 states (with a separation of 2). It is assumed that we begin in the state at −256 and do 10⁵ steps, moving only to the two adjacent states with equal probability. This figure was generated by solving the diffusion equation for a system with a constant diffusion coefficient and bears only a qualitative relationship to the Ising problem.
We can largely overcome this problem if we record the transition histogram C_{ij} and use equation 3.28 to give eigenvector estimators of the equilibrium state probabilities; as we have already seen when we were using the method to find the multicanonical distribution, this largely removes the bias of the starting and finishing states. The actual number of visits to each state should not affect our estimators of P^n, except in so far as statistics will be poorer for the states that are more rarely visited. Nevertheless, there are two possible sources of bias in the result, one coming from equation 3.29, and the other from the fact that the use of eigenvector estimators does depend on P(r|i) having its Boltzmann value (see section 3.2.3). The bias coming from equation 3.29 should get smaller as the number of counts gets bigger. For this reason, we pool the results of all the runs to define the eigenvector estimator, and use jackknife blocking to give the error. As regards the other source of error, it is not, in fact, the case that P(r|i) necessarily has its Boltzmann value, even given that it did at t = 0, but we argue on physical grounds that in the multicanonical ensemble, where movement through the macrostates is always diffusive rather than directional, the macroscopic variables will evolve much more slowly than the microscopic, so giving time for a local equilibrium within each macrostate to re-establish itself, which must be the Boltzmann equilibrium, since the microstate transition rules are chosen to produce this. This assertion is confirmed implicitly by the results for P^xc(M), and more directly by measurements we have made of spin-spin correlation functions.
We show some results for the L = 16 and L = 32 2d Ising model in figures 3.17 and 3.18. We use magnetisation preweighting at β_c, with an η(M) that is symmetric about M = 0, so we know that the underlying distribution P^xc(M) should be symmetric even if it is not quite flat. We perform 500 runs, starting in the M = −L^d ground state. These results were in fact generated using a serial HP-700 computer, but the important point is that they could have been generated in parallel, since each run started from the same state and was independent of all the others. For L = 16 each run was stopped when the simulation returned to that state, after having in the intervening period visited the M = +L^d ground state; for L = 32 we simply performed a constant number of updates (3 × 10⁷ flip attempts) for each run, so the finishing states were distributed over all the macrostates. Then the transition histograms from all the runs were pooled for the estimation of P^xc(M). It is apparent that using C(M) (solid line) to give the estimator of P^xc(M) would produce a systematic error, but that this bias is removed to within the random error by using estimators from the transition matrix. The slightly biased jackknife estimator from the pooled transitions is called evecJ (open triangles) and the double-jackknife bias-corrected estimator is evecJJ (filled triangles). The error bars are shown only on evecJJ, but are about the same size on evecJ.
On the L = 16 graph we also show an evecJ estimator calculated just from the first 100 runs (circles); there is an obvious bias present here, although it is in fact still no larger than the random error, which is, of course, also larger this time. For L = 16 there is very little difference between evecJ and evecJJ, but for L = 32 it does appear that there is a systematic error in evecJ which is eliminated by the bias-correction. In a real implementation, P^can(M) would of course be recovered from P^xc(M) for the determination of free energies and canonical averages, though we have not done this here.
Figure 3.17. Normal and eigenvector estimates for P^xc(M) at β_c; L = 16.
We chose to study P(M) at β_c to show that the method of doing many short runs can cope with the large, slowly-evolving clusters of criticality without introducing a bias. We do not expect that it should: if we allow two random walks over the whole range of M (repeated enough times in parallel), then we expect to generate all the possible cluster structures. There can be no structure whose generation requires a complex re-entrant trajectory with several visits to each end of the chain of states, because the process is Markovian, and so starting once from the non-degenerate end state is equivalent, in terms of the probabilities of what happens after, to having been there any number of times before.
Figure 3.18. Normal and eigenvector estimates for P^xc(M) at β_c; L = 32.
We have also investigated local spin-spin correlation functions to corroborate the claim that P(r|i) has its Boltzmann value to good accuracy. Let the spins be represented by s, and let r be a vector from one lattice site to another. Then γ(r, M), the correlation function at magnetisation M, is defined by

γ(r, M) = (⟨s(0)s(r)⟩_M − ⟨s⟩²_M) / (⟨s²⟩_M − ⟨s⟩²_M) = (⟨s(0)s(r)⟩_M − M²L^{−2d}) / (1 − M²L^{−2d})

given that Σ_{r′} s(r′) = M, where r′ runs over the lattice sites. To estimate ⟨s(0)s(r)⟩_M we calculate

ζ̃ ≡ L^{−d} Σ_{r′} s(r′) s(r′ + r)

and then our estimator of ⟨s(0)s(r)⟩_M is the average of ζ̃ over those configurations which have magnetisation M. We consider r only along the rows and columns of the Ising lattice, and average over equivalent directions, so that r becomes a scalar: r = 1, 2, ..., L/2 in units of the lattice spacing.
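A minimal sketch of this estimator for a 2d spin array (assuming periodic boundaries, so that np.roll implements the lattice shift; the normalising function is ours):

```python
import numpy as np

def zeta_estimator(spins, r):
    """Lattice average L^-d sum_r' s(r')s(r'+r) for one configuration,
    averaged over the row and column directions; `spins` is a 2d array
    of +/-1 with periodic boundaries."""
    horiz = np.mean(spins * np.roll(spins, r, axis=1))
    vert = np.mean(spins * np.roll(spins, r, axis=0))
    return 0.5 * (horiz + vert)

def gamma(mean_ss, M, L, d=2):
    """gamma(r, M) from the average of zeta over configurations with
    magnetisation M, using the normalisation in the definition above."""
    m2 = (M / L**d) ** 2
    return (mean_ss - m2) / (1.0 - m2)
```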
We have measured γ(r, M) for the 32² 2d Ising model under two conditions: first, for a series of 50 runs like those that produced the eigenvector estimator of figure 3.18, and second, for a single long run containing the same number of configurations in total, starting in a random configuration (therefore probably near M = 0) and allowed to equilibrate for 5 × 10⁷ spin flips (twice as long as required for a random walk from M = −L^d to M = +L^d and back) before gathering data. The first set of conditions thus gives γ(r, M) as produced by the eigenvector estimator, and the second gives it under multicanonical equilibrium conditions. The results are shown in figure 3.19; it is apparent that the two correlation functions are identical to within experimental error. In particular, it should be noted that there is no evidence of M → −M asymmetry in the `eigenvector' runs, even though the visited-states histogram C is decidedly asymmetric. As regards the shape of the correlation functions themselves, we observe that, as we would expect, γ(r, M) decreases with increasing r and is close to zero for large |M|. If r is small it then increases to a maximum at M = 0, but for r ≈ L/2 it is negative around M = 0, a consequence of the tendency of the system to exist in large clusters of opposite magnetisations.
To summarise, then, we have shown that we can remove the bias of the starting state in MC simulation by the use of the transition matrix to measure macrostate probabilities, combined with the multicanonical ensemble's ability to reach non-degenerate macrostates (so we do not have to worry about the probability distribution of microstates within the starting macrostate). This opens up the possibility of doing simulation without full equilibration of the preweighted variable, and thus of massively parallel implementations of the multicanonical ensemble in which each processor does only a short run on a large system.
We do not, however, recommend using this method for simulation on serial computers, because, as we said in section 3.2.3, the TP method performs fewer spin updates per second than does the normal (VS) method. If there is no problem with bias, as is the case with a serial implementation where a single long Markov chain can be produced, then any extra speed is clearly advantageous. It is also obviously better not to have to do bias-correction if it can be avoided.
Figure 3.19. Estimators of the correlation function γ(r, M) for the L = 32 Ising model. Error bars are not shown, but their size is comparable with the scatter of the symbols. Upper diagram: results from a series of short multicanonical runs starting in the M = −L^d ground state; lower diagram: results from a single long multicanonical run after equilibration.
If the system is so large that only a few random walks over the preweighted variable can be produced with the serial machine, then no accurate estimate of P^n can be made by either the VS or the TP method.
3.3 Results
In this section we shall present estimates of the free energy and canonical averages of the 2d Ising model, which will allow a comparison between the multicanonical ensemble (with η = η(E)), thermodynamic integration (see section 2.1.1) and exact results. Then we shall present some new results where we use multicanonical sampling to check analytic predictions about the form of P^can(M) for the 2d Ising model at β_c.
3.3.1 Free Energy and Canonical Averages of the 2d Ising Model

Measurement of free energy is of course our central concern in this thesis, so we shall begin with measurements of g(β) ≡ G(β)/L^d made using the multicanonical ensemble for the L = 16 and L = 32 2d Ising model. The 16² distribution was obtained by the TP method; the 32² by finite-size scaling followed by refining with the VS method. Once a suitable η_xc was found, VS estimators were used for all production runs. The production runs comprised a total of 2 × 10⁶ (L = 16) and 11 × 10⁶ (L = 32) lattice sweeps, generated in ten blocks, with jackknife blocking used to estimate errors for all results. The early iterations, which contribute only to finding the multicanonical distribution, took an extra 5 × 10⁵ (L = 16) and 3 × 10⁶ (L = 32) lattice sweeps.
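For concreteness, the error analysis used throughout is the standard delete-one jackknife over the ten blocks. The sketch below (in Python; it is our illustration, not the thesis code) shows the scheme for an estimator evaluated block by block; for a nonlinear estimator such as g one would instead recompute the estimator on the pooled histograms of the remaining blocks.

    import numpy as np

    def jackknife_error(block_values):
        """Delete-one jackknife over data blocks (here, ten blocks of sweeps).

        block_values: the estimator evaluated on each block separately.
        Returns (jackknife mean, jackknife standard error).
        """
        x = np.asarray(block_values, dtype=float)
        n = len(x)
        # Leave-one-out estimates; for a simple mean this is (sum - x_i)/(n-1).
        loo = (x.sum() - x) / (n - 1)
        mean = loo.mean()
        # Jackknife variance: (n-1)/n times the sum of squared deviations.
        err = np.sqrt((n - 1) / n * np.sum((loo - mean) ** 2))
        return mean, err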
Free energy results are shown in figures 3.20 and 3.21 respectively, along with the exact finite-size results from [10] (solid line). The error bars are much smaller than the data points (triangles). The multicanonical distributions that we used for these measurements were `designed' for the measurement of g at β = 0.55 by using equation 3.7 as described in section 3.1.1, so we did not make any particular effort to extend the multicanonical sampling to energies lower than the peak of P^can(β = 0.55; E) or higher than the peak of P^can(β = 0; E). We would therefore expect to be able to determine g(β) accurately for 0 < β < 0.55, and this is indeed found to be the case. For L = 16, P^can(β = 0.55; E) gives appreciable weight to the ground state E_0, so we can in fact calculate g(β) for any higher value of β too; we show g up to β = 1.0 in the figure. For L = 32 this was not possible, but we were able to find g(β) for β up to 0.6. The (1 s.d.) error bars were in all cases smaller than 0.0003 for L = 16, and smaller than 0.0002 for L = 32, while g itself varied between −15 and −2 over the range of β investigated; the scale of the inset in figure 3.21 gives some idea of the accuracy obtained. Figures 3.26 and 3.27 in section 3.3.2, where the multicanonical results are compared with thermodynamic integration,
show the difference between the MC estimates of g and the exact value.
Figure 3.20. Free energy g of the L = 16 Ising model; 2 million sweeps.
By reweighting with exp(−βE − η_xc(E)) and normalising to recover the canonical distribution, we can also measure <E> and the specific heat capacity

C_H = −β² ∂<E>/∂β = β² (<E²> − <E>²)

Again, we can calculate these exactly using results from [10]. The exact results for <e> ≡ <E>/L^d are shown in figure 3.22 and for c_H ≡ C_H/L^d in figure 3.23. The anomalous behaviour in the critical region, associated with the continuous phase transition in the L → ∞ limit, is clearly visible: the gradient of the internal energy becomes steeper as L increases, this manifesting itself also in the increasing height of the `peak' in the specific heat capacity. In figures 3.24 and 3.25 we show the corresponding results from a single multicanonical ensemble simulation of the L = 32 system. It is apparent that agreement with the exact results (solid lines) is very good. In the main figures, which show the full range of β, the errors are much smaller than the symbols; to see deviations from the exact results we must magnify smaller regions. The largest fractional error in <e> is about 0.1%, occurring at β = 0.44 (very near the critical point, where dE/dβ is large). Away from criticality the typical error is more like 0.01%. For the specific heat the largest fractional error, again occurring at the critical
Figure 3.21. Free energy g of the L = 32 Ising model; 11 million sweeps. Inset: detail of the vicinity of β = 0.56; vertical scale expanded more than 3000 times.
point, is 0.5%, while elsewhere it is typically about 0.1%. We do not show results for L = 16, which are very similar.
3.3.2 A Comparison Between the Multicanonical Ensemble and Thermodynamic Integration
We have also performed simulations using thermodynamic integration to determine g(β). As we described in section 2.1.1, the derivatives of free energies can be related to more accessible canonical averages; in this case we have used

∂(βg)/∂β = <e>

We therefore made measurements of <e> using Boltzmann sampling simulations for 11 evenly spaced values of β between β = 0.05 and β = 0.55, investigated in that order. An interpolating spline was fitted to the data points and integrated numerically with respect to β. g(β) was then found by using lim_{β→0}(βg) = −ln 2. The lengths of the runs were chosen so that 2.5 × 10⁵ (L = 16) and 1 × 10⁶ (L = 32) sweeps were performed at each temperature, enabling a more or less direct comparison with the multicanonical ensemble. Each simulation was started using the finishing configuration from the previous one, to reduce the equilibration
Figure 3.22. Exact internal energy per spin of the L = 16 and L = 32 Ising models.
required, which was confined to 5 × 10⁴ lattice sweeps.
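The integration itself is elementary; the following sketch is ours, and the spline type and the anchoring of the integral at β = 0 (where <e> = 0 exactly and βg → −ln 2) are assumptions about details not spelled out above.

    import numpy as np
    from scipy.interpolate import CubicSpline

    def g_from_thermodynamic_integration(betas, e_means):
        """beta*g(beta) = -ln 2 + integral_0^beta <e>(b) db.

        betas, e_means : the Boltzmann-sampling measurements of <e>.
        Returns g evaluated at each measurement point.
        """
        # Append the exactly known infinite-temperature point <e>(0) = 0.
        b = np.concatenate(([0.0], np.asarray(betas)))
        e = np.concatenate(([0.0], np.asarray(e_means)))
        spline = CubicSpline(b, e)
        beta_g = np.array([-np.log(2.0) + spline.integrate(0.0, bb)
                           for bb in betas])
        return beta_g / np.asarray(betas)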
The results are shown in figures 3.26 (L = 16) and 3.27 (L = 32). We show both the results from thermodynamic integration (circles) and those from the multicanonical ensemble (triangles). So that the accuracy obtainable is clearly visible, we have plotted the difference between g(β) from simulation and the exact g(β) from [10]. For L = 16 at all temperatures and L = 32 at small β, the two methods yield comparable accuracy (although the multicanonical results are a little better), and the error bars include the line g_MC − g_exact = 0 as they should. However, for L = 32 a large deviation from the exact result appears in the thermodynamic integration points at about β = 0.35. This is not a random error (the error bars, which represent the measured `spread' of the estimator from blocking, are approximately the same size as those on the multicanonical data) but is instead a systematic error caused by the presence of a phase transition on the path of integration. This is a problem that often severely reduces the accuracy of simulations that use thermodynamic integration. Here, the infinite-volume Ising model has a continuous phase transition at β_c = 0.440686…, and for the L = 32 system <e> changes so rapidly with β around this point that the data points are inadequate to determine its shape, producing the systematic error in g. The shape of the deviation (first positive, then negative) suggests that the `corners' of a sharp sigmoid curve are being smoothed away. To reduce this error we would have to space the integration points differently, clustering them around the
Figure 3.23. Exact specific heat capacity of the L = 16 and L = 32 Ising models.
phase transition point. By contrast, no such special care is required in the application of the multicanonical ensemble. It is untroubled by the Ising phase transition, and indeed can even be used to sample through a first-order phase transition (see [121] and section 4.3). The error bars get larger because of the large critical fluctuations, but they still contain the line g_MC − g_exact = 0. The multicanonical error bars, unlike those on the thermodynamic integration points, thus still provide a trustworthy confidence limit on the accuracy of the results.
We speculated in section 2.4 that consideration of the algorithm used and the estimators of free energy defined on the configurations produced could be partially separated. To investigate this we have tried a `multicanonical-integration' hybrid, where g is estimated by first finding the internal energy <e> from the results of the multicanonical ensemble, then integrating it with respect to β. The variation of <e> through the critical region is tracked well this time, so we would not expect systematic errors. In fact, we find that exactly the same estimators are produced this way as by direct estimation of <exp(βe)>, right down to the seventh significant figure. There is, therefore, no advantage to the procedure.
It has been suggested [47] that thermodynamic integration along a path that avoids a phase transition outperforms the multicanonical ensemble (or related methods like overlapping distributions) for large L because it does not require that all macrostates be sampled. We see no evidence of this, but we have not examined sufficiently large system sizes. For the L = 32
Figure 3.24. The specific internal energy <e> for the L = 32 Ising model. Solid line: exact results. Points: MC results.
thermodynamic integration, the system was still sufficiently small that P^can(E) at each simulation point overlapped significantly with its neighbours. This suggests that we would need to be dealing with extremely large systems before this became an issue, and it is precisely in large systems that the behaviour around phase transitions is too singular for thermodynamic integration to cope with.
To summarise, then, we have found that multicanonical sampling performs at least as well as thermodynamic integration in a single-phase region, and is much better able to deal with phase transitions. The question of the superiority of thermodynamic integration in very large systems has not been resolved, but it is clear that it could apply only to large systems away from phase transitions, where the internal energy is very smoothly varying.
Figure 3.25. The specific heat capacity c_H for the L = 32 Ising model. Solid line: exact results. Points: MC results.
3.3.3 P(M) at β = β_c
As we have already discussed (section 1.1.3), the p.d.f. of the magnetisation of the Ising model has an unusual form at the critical point, related to the critical scaling of the free energy G with L (see appendix A). It is well known [166] that at β = β_c with no external field and for large L,

P_L^can(m) dm ≃ p*(x) dx

where m = M/L^d, x ≡ m/m̄ and

m̄ ≡ <m²>_L^{1/2} ∼ L^{−d/(1+δ)}

where p*(x) (which is in general non-Gaussian) is unique to a particular universality class (a universality class is a collection of possibly highly disparate systems, united by spatial dimensionality and certain gross features of their interactions, which are found to have very similar
Figure 3.26. Difference between the exact free energy and various MC estimates, L = 16.
critical behaviour), and δ is the equation-of-state exponent: δ = 15 for the 2d Ising universality class [11]. p*(x) for this universality class is shown in figure 3.28.
From rigorous results [12] and scaling arguments [167] we may make the following ansatz for p*(x) at large x:

p*(x) ≃ p̃*(x) ≡ p₁ x^{(δ−1)/2} exp(−a₁ x^{δ+1})    (3.30)

i.e. for the 2d Ising universality class,

p̃*(x) ≃ p₁ x⁷ exp(−a₁ x¹⁶)

The form of this function (with p₁ = 0.59 and a₁ = 0.027) is shown in figure 3.29; it is clearly in at least qualitative agreement with the measured p*(x) (figure 3.28) for x > 1.0.
The form of the prefactor (x⁷) is in accord with a recent theory [168] that relates p*(x) to the stable distributions of probability theory. However, this theory also suggests the existence of further non-universal contributions to p*(x) which would fall off as a power of x at large x
Figure 3.27. Difference between the exact free energy and various MC estimates, L = 32.
and so be asymptotically dominant. To see whether these non-universal corrections exist we need to compare the ansatz 3.30 quantitatively with an accurate measurement of p*(x) (i.e. P^can(M; β_c)) in the large-x regime. This measurement cannot be performed by Boltzmann sampling; whatever the exact form of p*(x) may be, it clearly falls off very fast for large x, and it will be impossible to measure it accurately above x ≈ 1.4. The ground state, by comparison, lies at x_max ∼ L^{d/(1+δ)}, which tends (albeit very slowly) to infinity in the thermodynamic limit and for L = 64 is at 1.6132.

We therefore used a multicanonical simulation with η = η(M), arranged to be flat over the entire range M = −L^d to M = +L^d. The usual reweighting then enables us to recover P^can(M), and thus p*(x), accurately measured over the whole range of M. The inset in figure 3.28 shows the canonical probabilities of the ground state and first excited state of the 64² system measured this way. To be certain we were in a regime where corrections-to-scaling were small, we investigated quite large system sizes, L = 32 and L = 64; the latter is about as large as is feasible using single-spin-flip Metropolis, whose acceptance ratio falls off like L^{−d} near
Figure 3.28. The critical probability density function p*(x) of the scaled order parameter x for the 2d Ising universality class, determined by MC simulation. p*(x) is symmetrical about x = 0. The inset shows the canonical probability of the ground state and first excited states of the 64² 2d Ising model, which lie in the extreme tail of p*(x) and have been measured by multicanonical simulation as described in the text.
the ground state, for reasons we explained in section 3.1.2. Because the exact scaling form of P^can(M; β_c) was unknown, or rather was part of what we wanted to investigate, we used the TP method to generate η_xc initially, moving to the visited-states (VS) method for final refinement and for the production stage.

For L = 32 we did 10 iterations of the TP method, the first 7 taking 30 minutes each on an HP-700 workstation, the last 3 taking 3 hours. Then we allowed the simulation to proceed using the VS method, with a gradually increasing (automatically increased) N_AV. For L = 64 we did 8 iterations of the TP method, generating about 10⁹ spin flips (5 h) for each, then switched to the VS method. The details of the implementation of the VS method are in table 3.1. The TP program used was an early version which converged much more slowly than the one that produced the results of section 3.2.3 for η(M) for the 32² Ising model; it had not in fact quite converged when we moved to VS estimators.
During both the final refinement and production stages, we allowed continued updating of η_n, using the method of section 3.2.1 to incorporate prior information. Finally we re-analysed the results, combining η's according to the prescription of equations 3.24 and 3.23 but using
Figure 3.29. The ansatz p̃* ≡ p₁x⁷ exp(−a₁x¹⁶) with p₁ = 0.59 and a₁ = 0.027.
L    N_AV   # iterations   # lattice sweeps/iter   time/iter
32   100    5              6.4 × 10⁵               1.5 h
32   200    5              1.2 × 10⁶               3 h
32   400    2              2.4 × 10⁶               6 h
64   7      5              1.6 × 10⁵               1.5 h
64   14     5              3.4 × 10⁵               3 h
64   28     7              6.8 × 10⁵               5 h
64   56     3              1.4 × 10⁶               12 h

Table 3.1. Details of the visited-states sampling for L = 32 and L = 64 with magnetisation preweighting.
only the last seven (L = 32) and ten (L = 64) iterations, for which all sampled distributions had been approximately multicanonical. Thus we avoided the bias that would have resulted from using the early estimators η_n, which were not multicanonical, while still avoiding at run-time the division of the process into `finding-η' and `production' phases; or at least we were able to decide a posteriori where the `production' phase was to begin. This produces a function η_best(M); the corresponding best P̃^can(M) is recovered using P̃^can(M) ∝ exp(−η_best(M)). The final estimate of η(M) ranged over 0 to 44 for L = 32 and 0 to 191 for L = 64, corresponding to 83 decades of variation in P̃^can(M). This P̃^can(M) was used to produce the graph of p*(x) in figure 3.28. From η(M₀ = −L^d) we can estimate g(β_c) as described in section 3.1.3. We find, using the difference between η(L^d) and η(−L^d) to estimate the error, the remarkably
accurate estimates:

g(β_c; L = 32) = −2.11115(4) (cf. g(β_c; L = 32) = −2.11107 from [10])
g(β_c; L = 64) = −2.10999(2) (cf. g(β_c; L = 64) = −2.11001 from [10])
These demonstrate the accuracy with which the ground-state probabilities (as shown in the inset in figure 3.28) have been measured. Now let us consider the predictions of the scaling ansatz for p*(x). Figure 3.30 shows q*(x) ≡ ln[x⁻⁷p*(x)], estimated from the multicanonical results, plotted against x¹⁶ for x > 1. According to the ansatz 3.30 we would expect this to be linear with gradient −a₁. In fact, there is a linear regime for medium-sized x, with a deviation at both ends (though the deviation near the origin cannot be seen on this graph because of the scale). At small x we find that p*(x) is larger than expected from the ansatz 3.30, while at large x it is smaller. The low-x deviation comes from the approach to zero-magnetisation states, to which the ansatz ascribes zero probability, while the high-x one is a non-universal finite-size effect caused by the underlying microscopic structure of the system becoming apparent as we approach the ground state. The relative weights of the ground state and first excited state for the Ising model, for example, must be 1 : L^d exp(−8β_c), in contradiction to the scaling ansatz, while some different expression will apply to the p.d.f. of, for example, the density of the 2d Lennard-Jones fluid. However, in line with x_max tending to infinity as L^{d/(1+δ)}, the breakdown of the scaling ansatz occurs at larger x for the larger (L = 64) system.
Figure 3.30 also shows that we can discount⁶ the suggestion that the asymptotic behaviour of p*(x) is a power-law decay. If it were, then we would expect that at large enough x, q*(x) ∼ ln x, which would deviate `downwards' (i.e. with a negative second derivative) from the prediction. Instead, the observed deviation from the ansatz has the opposite sense.
In defining q*(x) we included a factor x⁻⁷ to cancel the expected x⁷ in the ansatz for p*(x). However, we would also like to demonstrate that the polynomial prefactor in p*(x) is indeed x⁷. This is not easy to do because the effect of the polynomial is of course dominated by the very strong exponential decay. Nevertheless, we have tried fitting functions of the form

p₁ x^ψ exp(−a₁ x¹⁶)

to p*(x) (determined by MC) over a series of `windows' of x-values, choosing the values of ψ
⁶ At least for the Ising model, for which, it must be admitted, non-universal correction terms are often found to have zero amplitude because of the high symmetry [169].
Figure 3.30. The function q*(x) ≡ ln[x⁻⁷p*(x)] plotted against x¹⁶ (x > 1) for L = 32 and L = 64. We expect this relationship to be linear with gradient −a₁ in the region where the ansatz 3.30 applies.
and a₁ to give the best fit. The results for the best-fit ψ/[(δ−1)/2] = ψ/7, plotted against the central x of the `window,' are shown in figure 3.31. It is apparent that there is reasonable agreement between the measured ψ and (δ−1)/2 for a range of x-values around x ≈ 1.2 (i.e. before the exponential decay becomes too strong), and that the width of this range is larger for the larger system (L = 64; triangles on figure). It is another demonstration of the accuracy obtainable with multicanonical simulations that we have been able to pick out a polynomial prefactor from data with an overall exponential behaviour.
Finally in this section, we shall use p*(x) at large x to determine the scaling amplitude U₀, where U₀ is the constant term in the expansion in powers of L of βG(β_c; H = 0; L) (see appendix A). To carry out this calculation we introduce the function

F*(y) ≡ ln ∫ dx p*(x) exp(yx)    (3.31)

where y ≡ βH m̄ L^d is a scaled version of the external field H. The integrand in F* is thus a scaled, unnormalised version of P^can(β_c; M; H). Now, if we use the scaling ansatz p̃*(x) (equation 3.30) for p*(x), it can be shown [167] using steepest-descents arguments that for
Figure 3.31. Results of window-fitting the exponent ψ. Squares: L = 32; triangles: L = 64. The lines are guides to the eye only.
large enough values of y

F*(y) = b₁ y^{1+1/δ} + U₀    (3.32)

where we have made the identification

U₀ ≡ ½ ln[2π p₁² / (a₁ δ(δ + 1))]

in terms of quantities in equation 3.30.
We are now in a position to determine U₀: we determine F*(y) through numerical integration of equation 3.31 using multicanonical results for p*(x), then plot F*(y) against y^{1+1/δ}, obtaining an estimate for U₀ by extrapolating the linear form back to y = 0. In figure 3.32 we show the graph of F*, the main figure showing just the region near the origin and the inset showing the whole range of y up to y^{1+1/δ} = 175, for which the fully saturated M = +L^d state was the most probable. Figure 3.33 shows the effective U₀, the ordinate intercept obtained by fitting the data lying within a window of y-values, plotted against the central value of y^{1+1/δ} in the window.
Figure 3.32. The function F*(y) determined from multicanonical measurements, plotted against y^{1+1/δ}. The main figure shows the behaviour near the origin, while the inset shows the behaviour right up to y^{1+1/δ} = 175.
It is apparent that convergence to the large-y form of equation 3.32 occurs rapidly, occurring in the interval in which βH L^{dδ/(1+δ)} ≲ 1, this being the size of field required to drive the system out of the critical region to a region where the steepest-descents arguments are valid. Within the linear region the estimates of U₀ are in good agreement with the exact result U₀ = −ln 2 − ln[(2^{1/4} + 2^{−1/2})/2] = −0.639912 from [11], although much the best estimates of U₀ come from fairly small y-values (see the inset of figure 3.33; at its minimum, we obtain U₀ = −0.6398(1)). There are three reasons for this behaviour: one is that the process of extrapolating back to y = 0 from a window of y-values that far from the origin magnifies the error in the intercept; the second is that the width of p*(x)exp(yx) gets smaller as the effective field y increases, reflecting the behaviour of the susceptibility. Thus at large y, F*(y) is dominated by only a few points of p*(x), and its random errors are therefore larger. Third, and probably most importantly, at large y the states that dominate F*(y) come from the extreme tail of p*(x), where we have just shown (see figure 3.30) that the finite-size p*(x) deviates appreciably from the ansatz p̃*(x). Thus we would expect a small systematic change in the effective U₀.
Figure 3.33. The effective value of U₀ given by the ordinate intercept of the linear fit to F*(y) within a window of y-values, plotted against the central value of y^{1+1/δ} in the window.
We should compare these results with those from [167], where U₀ was estimated in the same way, but using results from Boltzmann sampling MC simulations. Because p*(x) was not well determined there for large x, the fitting was limited to y ≲ 10, though as we have seen this is in fact all that is required for a good estimate of U₀.

A brief discussion of the physical significance of U₀ can be found in appendix A.
3.4 Beyond Multicanonical Sampling
`It must be admitted that she has some beautiful notes in her voice. What a pity
it is that they do not mean anything, or do any practical good!'
FROM The Nightingale and the Rose
OSCAR WILDE
We shall now expand the scope of our discussion of the multicanonical ensemble, putting it in a more general framework of other importance sampling distributions, including the expanded ensemble [60, 124] (see section 2.2.3). While discussing the expanded ensemble, we shall present new results on the scaling of its expected MC error. Then, in section 3.4.3, we shall use similar analysis to predict for the first time the expected error of an estimator for various non-Boltzmann distributions, including the multicanonical, and we shall identify a near-optimal sampled distribution for a particular quantity O and algorithm; sampled distributions of any desired shape may be produced by a simple generalisation of the methods of section 3.2. We shall then check our predictions by explicit MC measurement of the variance of Õ.
3.4.1 The Multicanonical and Expanded Ensembles

We shall now make more explicit the connection between the multicanonical ensemble and the expanded ensemble that we first mentioned in the discussion of them in chapter 2. The apparent differences between the two, in the way they have been formulated up to now, come from the choice of ensemble, and we can put the two into the same framework as the expanded ensemble by considering what the multicanonical ensemble would be like if applied to ensembles other than the NVT ensemble, or what the expanded ensemble would be like if it were not `made from' NVT ensembles. In the same way, it was necessary in chapter 1 to consider the Ising ferromagnet in the NVT ensemble and the fluid system in the NpT ensemble in order for the similarity between them to become apparent.
Consider, for example, a multicanonical system with the coefficients η depending on the magnetisation, η = η(M). The multicanonical partition function is:

Z = Σ_{σ} exp(−βE(σ)) exp[η(M(σ))]

We can cast this into the form of the expanded ensemble (cf. equation 2.18) by writing:

Z = Σ_M Z′(M) exp(η(M))

where

Z′(M) = Σ_{σ} δ(M(σ) − M) exp(−βE(σ)) = exp(−βF(M))

So from this perspective the multicanonical ensemble appears as an expanded ensemble composed of fixed-M ensembles.
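The identity is easily checked by brute force on a system small enough to enumerate; the following sketch (ours, with an arbitrary test weighting η(M)) verifies it for a tiny periodic Ising model.

    import itertools, math
    from collections import defaultdict

    def check_expanded_form(L=2, beta=0.4, eta=lambda M: 0.01 * M):
        """Check Z = sum_M Z'(M) exp(eta(M)) by brute-force enumeration
        of a tiny periodic Ising model (eta is an arbitrary weighting)."""
        spins = list(itertools.product([-1, 1], repeat=L * L))

        def energy(s):
            # nearest-neighbour coupling via right/down neighbours, periodic
            E = 0
            for i in range(L):
                for j in range(L):
                    E -= s[i * L + j] * (s[((i + 1) % L) * L + j]
                                         + s[i * L + (j + 1) % L])
            return E

        Z_direct = sum(math.exp(-beta * energy(s) + eta(sum(s)))
                       for s in spins)
        Zprime = defaultdict(float)          # Z'(M): fixed-M partition sums
        for s in spins:
            Zprime[sum(s)] += math.exp(-beta * energy(s))
        Z_expanded = sum(z * math.exp(eta(M)) for M, z in Zprime.items())
        assert abs(Z_direct - Z_expanded) < 1e-9 * Z_direct
        return Z_direct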
This example is perhaps a little awkward, since we are not accustomed to thinking in terms of the fixed-M ensemble. However, in just the same way, an expanded ensemble where each subensemble is microcanonical is exactly equivalent to the multicanonical ensemble with E the preweighted variable. Another example is a non-Boltzmann sampling NpT-simulation with an η(V) designed to increase exploration of the volume macrostates, which can naturally be regarded both as an expanded ensemble generated by putting together canonical ensembles differing in V and as a multicanonical NpT-ensemble with η providing a non-canonical weighting of the different volumes. We shall use just such a simulation in chapter 4. A grand canonical ensemble (canonical ensembles weighted by βμN, with an extra η(N)) can be considered in the same way. The expanded ensemble is thus the more general concept; a multicanonical ensemble can always be described in terms of an expanded ensemble.
Sometimes, then, it is not even clear what a particular sampling scheme should be called, though `multicanonical' seems to be established for η(E) and η(M), `expanded ensemble' for the system with subensembles having different energy functions, and `expanded ensemble' or `simulated tempering' for the system with subensembles having different temperatures. Because there are no Gibbsian ensembles analogous to the systems with variable temperatures or energy functions, they can only really be thought of as expanded ensembles; the issue is the status of the multicanonical ensemble. We suggest a classification on the basis of the behaviour of the order parameter, rather than the nature of the subensembles: thus, an expanded ensemble's η's would always be related to a G(x), where x is an intensive (field) variable, for example, while the multicanonical ensemble's would be related to F(X), where X is an extensive (mechanical) variable, such as E or M. With this nomenclature, the ensemble with fixed N, p and T and η(V) to make P(V) flat over some range of V would be called multicanonical. This classification has the advantage of generally reflecting the way that the quantities of interest are extracted from the simulation. In multicanonical simulations, the required free energies are found by reweighting and summing over all values of the preweighted variable (e.g. equation 3.7), while with the expanded ensemble they come from the probabilities of the `end states' of the chain of subensembles only (e.g. equation 2.19).
Because of the similarity between them, all the methods described in this chapter for finding η for the multicanonical ensemble are also applicable to the expanded ensemble, provided that the notions of microstates and macrostates are redefined slightly: the `microstates' become the joint set {{σ}, {β}} of coordinates and temperatures (taking the temperature-expanded ensemble as an example) and the `macrostates' become the canonical subensembles of which the expanded ensemble is composed. In particular, the transitions between the subensembles still form a Markov process and the TP method of section 3.2.3 can be applied, but with the proviso that some care must be taken in the choice of starting state; since we cannot choose a starting state that contains only one microstate (every state of the expanded ensemble contains all the microstates of its constituent canonical ensembles) we must make sure that the simulation has had time to equilibrate within the starting state before we permit it to move to the other subensembles.
We should also note that parallelism of the geometric decomposition kind is more naturally applied to an expanded ensemble than to a multicanonical ensemble. Any parallel coordinate updating that could be applied to the canonical `subensemble' if it were a simulation in its own right can clearly also be applied in the expanded-ensemble simulation, and it would never occur to us to change the value of the `field' property that varies between the subensembles (temperature or whatever) in one part of the simulation volume but not another. In the multicanonical ensemble, conversely, the need to keep the preweighted variable constant during any parallel updating may well affect our choice of the kind of particle moves or spin flips that are done. Nevertheless, the similarity between the two methods remains in so far as the `preweighted variable,' whether it be the energy/order parameter or the temperature/Hamiltonian, must be explored serially.
3.4.2 The Random Walk Problem

The description of the dynamics of the expanded ensemble as a Markov process enables us to derive an important new result about the expected error of expanded ensemble simulations. In these simulations, the quantity we wish to measure is simply the ratio of the probabilities of two of the subensembles in the chain, usually the two at the ends (this makes the analysis here easier than for the multicanonical ensemble). For good accuracy, the system must visit both ends of the chain several times, and so the accuracy is ultimately limited by the time to do a one-dimensional random walk over N_m, the number of subensembles in the chain (assumed not to be very small). This time is τ_rw, with τ_rw ∼ N_m². We now outline an argument suggested by this fact, but which we think is fallacious, that expanded-ensemble-type calculations can have their accuracy improved if the chain of subensembles is divided up into pieces and the sampling performed in each piece separately. We then go on to give what we think is the correct argument.
First, then, the fallacious argument:

Suppose that the underlying probability distribution is such that each subensemble has equal probability, but that we are `pretending' that we do not know this and are trying to measure P_i. This situation will in fact be realised to a very good approximation in the `production' stage of real applications, and the small deviations from a constant P_i, while being exactly what we want to measure, will have no effect on the random walk arguments that we are about to give.

Suppose we generate N_c counts in total in our histogram of the occupancies of the N_m subensembles. If there were no correlations, then the number of configurations going into the ith subensemble would be C_i, which would have a binomial distribution with mean C̄_i = N_c/N_m and variance N_c(1/N_m)(1 − 1/N_m) ≈ N_c/N_m. Thus we would estimate P_i by P_i ≃ C_i/N_c (or (C_i + 1)/(N_c + N_m)) with expected error

δC/C̄ ≈ √(N_m/N_c)
In practice, adjacent configurations are highly correlated. Now, as we have seen, even though the Markov chain we are simulating is really a microscopic one between the microstates of the system, we can treat the process that describes transitions between subensembles as an effective Markov chain in its own right, and, given that equation 3.26 is satisfied, it obeys its own detailed balance condition that we can use to estimate the stationary probability distribution. Because the underlying Markov process is highly correlated, so is the effective process: each ensemble will usually make transitions only to its neighbours (indeed we shall usually restrict attempted moves to the neighbours, to save wasting effort in attempting transitions that are almost certain to be rejected). It is now tempting to apply equation 1.25, equating the effective correlation time (that multiplies the variance of the estimators obtained) with the random walk time τ_rw ∼ N_m². This would give for the correlated case

δC/C̄ ≈ √(N_m³/N_c)    (3.33)
This would also be the scaling of the fractional error in the ratio r ≡ C_1/C_{N_m}, the estimate of P_1/P_{N_m}, assuming that the ends are far enough apart for the errors to be independent:

δr/r = √( var(C_1)/C̄_1² + var(C_{N_m})/C̄²_{N_m} )    (3.34)
This result implies that we could achieve a better accuracy by dividing up the N_m-state expanded ensemble into b groups with one state (or a few states) overlapping between adjacent groups. Neglecting the overlaps, the number of states in each group would be m ≡ N_m/b, and we would devote time N_c/b to each. The estimator of r would be

r ≃ ∏_{j=1}^{b} C_1^{(j)}/C_m^{(j)}

and then equations 3.33 and 3.34 would give

δr/r ≈ √b √( (N_m/b)³ / (N_c/b) ) = √( N_m³/(b N_c) )    (3.35)
implying that the error decreases as b → N_m, i.e. as the expanded ensemble is divided up into smaller and smaller pieces. We expect that in practice errors would eventually begin to increase again, because if the groups were very small a large fraction of transitions would have to be rejected as taking us outside the group (and possibly because correlation times within some subensembles may become longer if they cannot connect with others at higher temperatures), but it is not clear what the best value of b to choose would be; we would have to investigate each b separately and measure the error directly. The core of this argument, that the correlation time is the same as the random walk time, can be found in one form or another in [6, 24, 130].
Now let us do a rather more careful analysis of the effective Markov chain, using results from [164, chapter 5]. Define state occupancies C_i(N_c|j), the number of times that state i is recorded in N_c steps of the chain given that it starts in state j. It can be shown that in the large-N_c limit (N_c ≫ N_m²) the effect of the starting state j disappears:

C̄_i(N_c|j) ≈ C̄_i(N_c) = N_c P_i = N_c/N_m    (3.36)

as before, and

var(C_i(N_c|j)) ≈ N_c P_i [P_i + 2t^g_{ii}(∞) − 1]    (3.37)

where the t^g_{ii}(∞) are components of the transient sum matrix T_g(∞), which is given by

T_g(∞) = [I − Γ + Γ^∞]⁻¹ − Γ^∞    (3.38)

where Γ is the transition matrix of the effective chain and Γ^∞ ≡ lim_{t→∞} Γ^t. It appears that we need to do a full matrix inversion to find T_g(∞), which will cost O(N_m³) operations, but in fact we can do it in only O(N_m²) operations using a trick (see [49, chapter 2]) which depends on the fact that Γ is sparse and Γ^∞ has all its rows equal. This makes the calculation of var(C) economical even for large N_m; the N_m = 1000 case requires less than one minute on a workstation.
Let us test these predictions using a simple Markov model, where there are no `microstates' within each state of the chain. We make the simple choice

Γ_ij = 2/3   for (i = 1, j = 1) and (i = N_m, j = N_m)
Γ_ij = 1/3   for (i = 1, j = 2) and (i = N_m, j = N_m − 1)
Γ_ij = 1/3   for 2 ≤ i ≤ N_m − 1 and j = i − 1, i, i + 1
Γ_ij = 0     otherwise

This has eigenvector P_i = 1/N_m as required. In figure 3.34 we show the behaviour of N_c⁻¹var(C_i) and √(N_c var(C_i))/C̄_i as a function of N_m for i = 1 and i = N_m/2.
It is clear that, except at very small N_m, var(C) is roughly constant rather than increasing
Figure 3.34. Behaviour of the variance (left) and fractional error (right) of the number of visits C to states 1 and N_m/2 in a 1d random walk over N_m states, with reflecting boundaries.
like N_m, and √(var(C))/C̄ increases linearly with N_m, rather than going like N_m^{3/2}. δr/r will behave similarly, except that it will show an even smaller deviation from linearity, because C_1 and C_{N_m} are strongly anti-correlated at small N_m, whereas they are roughly independent for large N_m. If we now consider subdividing the range N_m into b pieces, we find

δr/r ≈ √b (N_m/b)/√(N_c/b) ∼ N_m/√N_c    (3.39)

so this time the variance of the estimator r is not decreased by subdividing N_m, and therefore in an expanded ensemble calculation we may as well always do a single random walk over the whole range, to get whatever benefits of improved ergodicity are available. We shall present results in chapter 4 that demonstrate that this behaviour is indeed what is found in a real expanded ensemble calculation.
Where, then, does the mistake lie in the fallacious argument we gave first? Equation 1.25 is certainly applicable, since it is based on very general principles of time-series analysis. The error comes, in fact, from assuming that the effective correlation time τ_O in equation 1.25 is the same as, or of the same order as, τ_rw. The operators we need to consider are simply

O_i(σ) = 1 if σ ∈ i, and 0 otherwise    (3.40)

so that N_c<O_i> = <C_i>.
It is shown in appendix C (equations C.1 and C.2) that

τ_{O_i} = Σ_{t=1}^{∞} γ_i(t)

where t labels time in units of the basic update and the correlation functions γ_i(t) are

γ_i(t) = (<O_i(0)O_i(t)> − <O_i>²) / (<O_i²> − <O_i>²)

so using equation 3.40 it follows that here the γ_i(t) are

γ_i(t) = (P(i, t|i, 0)(1/N_m) − (1/N_m)²) / ((1/N_m) − (1/N_m)²)

where P(i, t|i, 0) is the probability that the simulation is found in state (subensemble) i at time t given that it was there at time 0. Summing the γ's can be done analytically, using results from [170], for the case of a random walk with periodic boundary conditions: the result is τ_{O_i} = N_m/12. This is clearly also the result for the central state i = N_m/2 for a random walk with reflecting boundaries, which is what we have in an expanded ensemble simulation. For the other states of the random walk with reflecting boundaries, γ_i(t) can be summed numerically, and this reveals the same dependency on N_m, though τ_{O_i} is larger.
The key result, then, is that τ_{O_i} ∼ N_m; when substituted into equation 1.25, this gives the same form for the scaling of the error in r as comes from our analysis of var(C) in equation 3.37. There is thus no contradiction between these two expressions, and, equating the expressions for the variance, we find

τ_{O_i} = t^g_{ii}(∞)/(1 − P_i) − 1

thus relating T_g(∞) to the more familiar τ_{O_i} (though this equation is true only for the simple delta-function form for O_i that is given in equation 3.40).
It is certainly rather surprising that τ_{O_i} should increase more slowly than τ_rw, but we may rationalise this by noting that the initial decay of the correlation functions γ_i(t) depends on the average time to diffuse away from the starting state, which is essentially a local property and so is scarcely affected at all by N_m. It is only the long-time tail of the decay, where the system is returning to its starting state after wandering far away, that depends strongly on N_m. The interplay of these two effects gives rise to the observed behaviour. We should also emphasise that the result τ_{O_i} ∼ N_m does depend on the condition that the run-time of the simulation should be ≫ τ_rw ∼ N_m².
3.4.3 `Optimal' Sampling

We now return to more general questions of MC importance sampling and ask what sampled distribution P(σ) is optimal, given an algorithm, for the measurement of <O>_can, the canonical average of O, an operator on the configurations. An estimator of <O>_can can be found using equation 3.7 for any sampled distribution P(σ), though in general a choice that is not tuned in some way to O and the Boltzmann distribution will not give a useful estimator in a time that is not exponentially long. Clearly it is desirable that the standard error of the estimator obtained should be as small as possible, so that computer time can be used as efficiently as possible; this is what we mean by `optimal.' In what follows we shall concentrate on operators on E-macrostates, in particular the `free energy operator' O = exp(βE), and we shall parameterise P as usual by a set of weights η. We shall use Õ(η) to mean the ratio estimator of <O>_can coming from the sampled distribution defined by η.
Two concepts introduced by Hesselbo and Stinchcombe [97] are useful here for explaining the requirements of optimal sampling. They serve to make more concrete ideas that we have already discussed or alluded to. These concepts are ergodicity, which is measured by τ_O(η), and pertinence, measured by N_s(η, I), which is the average number of independent samples that are required to obtain the information I that is sought (so here I = <O>). The total time required for the problem is thus τ_O(η)N_s(η, <O>), and this should be minimised as a function of the weights η. Of the two, the pertinence is in fact the easier quantity to handle, and the problem of finding the sampled distribution with the best (lowest) pertinence was solved analytically by Fosdick [96] by minimising <Õ²> − <Õ>²; the result, which is unique given O and the system, is

P^fs(σ) ∝ |O(σ)/<O>_can − 1| exp[−βE(σ)]    (3.41)
or, for O(σ) = O(E(σ)),

P^fs(E) ∝ Ω(E) |O(E)/<O>_can − 1| exp(−βE)

which in terms of η corresponds to

η^fs(E) = ln |O(E)/<O>_can − 1|

This distribution seems never to have been used in practice; its implementation is complicated by the appearance of <O>_can itself in the expression, and its ergodicity turns out to be very poor, for reasons we shall describe below.
The ergodicity is rather harder to deal with. It depends not only on O and the system, but also on the algorithm, so general solutions like equation 3.41 cannot be given. Moreover it is usually not analytically tractable, and, while τ_O(η) can be measured by simulation for a particular η, this does not (at least if standard techniques are used) tell us about τ_O(η̂) for any other sampled distribution. Thus, while we could envisage finding the minimum-variance P(σ) by this method, treating τ_O or var(Õ) as a function of the η's, to be minimised with respect to them, this would be extremely time-consuming because there are N_m (more or less) independent variables in η and each `function evaluation' requires an entire MC simulation. Such a procedure is likely to waste more computer time than we could hope to recoup from more efficient sampling.

We shall now go on to discuss issues related to optimal sampling in greater depth, using the notions of pertinence and ergodicity where appropriate. We shall show how an expression for the sampled distribution that is very similar to equation 3.41 follows from consideration of the structure of the ratio estimator Õ(η), and discuss the ergodicity of P^fs and other sampled distributions. Then in section 3.4.4 we shall give new theory showing how measurement of the macrostate transition matrix for one sampled distribution (the multicanonical is best for this purpose) enables us to estimate it for any other. From this we can make an approximate calculation of the error in the ratio estimator for any other sampled distribution, implicitly including the effect of correlations, without needing to calculate τ_O(η̂) itself.
Pertinence
As we first said in section 3.1.1, the numerator and denominator of equation 3.7 are dominated by energies around the peaks of P^can(E)O(E) and P^can(E) respectively, as shown (for O ≡ exp(βE)) in figure 3.1. The states that lie between these peaks contribute hardly anything to either the numerator or denominator of the equation; the weight they are given by the multicanonical distribution serves only to enable the system to tunnel between the two. This is irrelevant to pertinence, which is related only to the information provided by independent samples, so it is clear that the pertinence can be increased by downweighting these states. Indeed, within the peaks themselves the contribution of a particular macrostate to the integral is proportional to its value of P^can(E)O(E) or P^can(E), so the maximally-pertinent P(E) should have a shape that follows the shapes of these peaks. All that then remains is to determine the relative weights to be assigned to sampling the two peaks; and it follows from simple error propagation that the fractional error of the ratio estimator is minimised when the fractional errors of its numerator and denominator are equal. Thus, without any detailed calculation, we arrive at the ansatz

P^#(E) ∝ P^can(E)O(E) / ∫_E P^can(E)O(E)dE + P^can(E) / ∫_E P^can(E)dE

  = [O(E)/<O>_can + 1] P^can(E)

i.e.

η^#(E) = ln[O(E)/<O>_can + 1]    (3.42)
(3.42)
These equations dier from the analogous equation 3.41 only in a single sign. For O exp(E ), P fs (E ) and P # (E ) are almost identical; P #(E ) is shown (for the 162 Ising model with
= 0:48) in gure 3.35. Only in ln(P (E )) would any dierence be apparent; ln(P fs (E )) ! 1
at
O(E ) =< O > 160, while as can be seen from the inset, P #(E ) it is in fact nite there,
though very small. Thus the high pertinence of P fs (E ) is justied intuitively.
For other operators, P^fs(E) and P^#(E) are not so similar. If O is not a particularly rapidly
Figure 3.35. Main figure: the sampled distribution P^#(E) for the 16² Ising model with β = 0.48. Inset: the same, but with a logarithmic vertical scale.
increasing function of E, then the peaks of P^can(E)O(E) and P^can(E) will overlap. P^#(E), which is a weighted sum of the two peaks, then becomes a single-peaked function of E, while P^fs(E), which always gives zero weight to the state where O(E) = <O>, retains its double-peaked structure. For O = E, this means that the configurations at <E> are not sampled, an extreme contrast to Boltzmann sampling, where these configurations are among the likeliest to occur.
Let us now comment on the multicanonical ensemble's pertinence. There is in fact no canonical average for which it has the best pertinence; rather, it is the best if one is interested in knowing Ω(E) for all E with constant fractional accuracy. Nevertheless, while its pertinence is not optimal, because of the weight given to states between the peaks (and possibly above and below them), it has, in the language of [97], good worst-case pertinence (and also reasonable ergodicity, a point we shall return to). By this we mean that, whatever operator O we choose, the multicanonical ensemble will have sampled the relevant part of macrostate space, and so at least a tolerably accurate estimator of <O> will be obtained. The observables that it estimates worst are those that depend only on a narrow region of macrostate space, for example <E>_can; for these, the effort that the multicanonical simulation puts into sampling all the other macrostates is wasted. But even in the worst case the multicanonical distribution could never need more than N_m times more independent samples than the optimal distribution (this in the case that the spectrum of the observable was so narrow that it depended on only one macrostate). This is
in contrast to Boltzmann sampling, and indeed to the `optimised' sampled distributions P^fs(E) and P^#(E). If we wish to find the expectation of a different operator, they will in general put an exponentially small fraction of their weight in the region of macrostate space which dominates the ensemble average, and so would require O(exp(L^d)) times more samples. Multicanonical sampling can never have this problem because it puts an equal amount of weight in every region of macrostate space.

According to [97], the sampled distribution with the best worst-case pertinence is the 1/k ensemble, for which P^{1/k}(E) ∝ (∫_{E'≤E} Ω(E')dE')⁻¹ (see also section 2.2.1). This gives rather more weight to low-energy states than does the multicanonical ensemble. However the scaling of the pertinence (a factor ln(Ω_TOT) ∼ N_m worse than an ideal estimator) is the same as that of the multicanonical distribution, so any improvement is presumably only in the size of the prefactor.
Ergodicity

We shall now discuss in qualitative terms the ergodicities of these various sampled distributions when implemented with an algorithm, like single-spin-flip Metropolis, that can make transitions only over very short distances in macrostate space. For the case O = exp(βE), it is apparent that the distributions P^fs and P^# in fact have very poor ergodicity: the sampling probability is exponentially small between the two peaks for P^#, and zero for P^fs, which makes tunnelling between them extremely slow. Thus, an MC simulation of normal length is in fact likely to spend all its time in the region of one of the peaks of P^fs and not to sample the other at all. Õ from equation 3.7 then has effectively infinite error: nothing has been gained over Boltzmann sampling (where the similarly enormous error is due to lack of pertinence). Thus it becomes apparent that the demands of pertinence and ergodicity may well be mutually contradictory; if, to improve the pertinence of the sampled distribution, the states between the peaks are downweighted, the ergodicity will suffer and the net effect may be to degrade overall performance. For operators such as O ≡ E, P^fs would still suffer from severe ergodic problems, while P^# would become satisfactory. However, it can scarcely be claimed that non-Boltzmann sampling is necessary to estimate <E>.

The multicanonical ensemble, being flat over all accessible macrostates, has no such self-inflicted ergodicity problems, though with the Metropolis algorithm the step size is not large and the acceptance ratio becomes low near the ground state (see section 3.1.2). Thus the
tunnelling time between the regions that are important in the ratio estimator is τ_rw ∼ N_m². The 1/k ensemble has similarly robust ergodicity, with τ_rw scaling in the same way. It is shown in [97] that the acceptance ratio of the 1/k ensemble may be better than that of the multicanonical ensemble, so giving slightly better ergodicity.

It is impossible to make this discussion more quantitative while, as we have said, analytical calculation of τ_O(η) is impossible and measuring it by MC from visited states for more than a few sampled distributions would be prohibitively expensive. Thus, while we might imagine that the true optimal sampled distribution for exp(βE) would be (say) similar to the multicanonical but giving less weight to the states between the peaks (though more than that assigned by P^# or P^fs), we cannot say exactly what the trade-off between ergodicity and pertinence should be.
3.4.4 Use of the Transition Matrix: Prediction of the `Optimal' Distribution
We shall now show how, at least for the O = O(E) problem, the difficulties caused by our inability to calculate τ_O(η) may be skirted by using macrostate transition information. Suppose we carry out a simulation with some weighting η and estimate the macrostate transition matrix Γ_ij(η) using equation 3.25. Now, for any microstates r, s in macrostates i and j respectively, we have

Γ_rs(η) = R_rs min(1, exp[−β(E_s − E_r) + η_s − η_r])

where the matrix R (defined in section 1.2.2) describes which transitions are allowed. Similarly, for some other set of weights η̂,

Γ_rs(η̂) = R_rs min(1, exp[−β(E_s − E_r) + η̂_s − η̂_r])
so, returning to equation 3.27, we find that, for j ≠ i, Γ_ij(η̂) becomes

Γ_ij(η̂) = Σ_{r∈i} P(r|i) Σ_{s∈j} Γ_rs(η̂)

  = Σ_{r∈i} P(r|i) Σ_{s∈j} Γ_rs(η) min(1, exp[−β(E_s − E_r) + η̂_s − η̂_r]) / min(1, exp[−β(E_s − E_r) + η_s − η_r])

  = [min(1, exp[−β(E_j − E_i) + η̂_j − η̂_i]) / min(1, exp[−β(E_j − E_i) + η_j − η_i])] Σ_{r∈i} P(r|i) Σ_{s∈j} Γ_rs(η)

  = [min(1, exp[−β(E_j − E_i) + η̂_j − η̂_i]) / min(1, exp[−β(E_j − E_i) + η_j − η_i])] Γ_ij(η)    (3.43)
Γ_ii(η̂) follows from

Γ_ii(η̂) = 1 − Σ_{j≠i} Γ_ij(η̂)    (3.44)
Equations 3.43 and 3.44 show that we can calculate the macrostate transition matrix for any desired weighting η̂ if we have measured it for a single weighting. So that Γ_ij(η) is accurately determined for all macrostates, it is obviously a good choice to use η = η_xc.
Note that, in the derivation of equation 3.43, in order to be able to take the ratio of the Metropolis functions outside the sums over the microstates, it is necessary that E_r and η_r should be the same for all r ∈ i, and similarly for s ∈ j; thus the derivation we have just given is valid only for energy macrostates. From equation 3.28, the detailed balance condition on macrostate transitions, it follows that for any system where the TP method is valid we can write

Γ_ij(η̂)/Γ_ji(η̂) = [Γ_ij(η)/Γ_ji(η)] exp[η̂_j − η̂_i]/exp[η_j − η_i]

but it is not clear how the normalisation of the individual terms in Γ(η̂) follows from this.
Now we wish to move from this to an estimate of var(Õ(η̂)), where Õ(η̂) is the ratio estimator derived from sampling with η̂. Rather than trying to calculate γ_O(t) and τ_O, we shall use an approximate method, casting the problem into the same form as an expanded ensemble calculation and bringing to bear the machinery of [164] just as in section 3.4.2. This will give a useful qualitative estimate of the error of a particular sampled distribution.

We argue that the dominant source of error in Õ(η̂) is the tunnelling back and forth between the states that dominate the peaks of the numerator and denominator of the ratio estimator, not the sampling within each peak. That is to say that the problem is the estimation of the relative weights of the peaks of numerator and denominator, not the shape of each peak individually. So the fractional error in the estimate of the numerator is

δ[Σ_E C(E)O(E)exp(−βE − η̂(E))] / Σ_E C(E)O(E)exp(−βE − η̂(E)) ≈ √(τ_in/N_p) δC(E₀)/C̄(E₀)
where E₀ is some state in the peak of O(E)P^can(E) (the mode, say), τ_in is some sort of correlation time for sampling within the peak, and N_p is the width of the peak. We argue that τ_in will be similar for all (sensible) sampled distributions that we may choose, and so will not affect the comparison of different η̂'s. Therefore, we shall need only to calculate √(var(C(E₀)))/C̄(E₀). The same arguments can clearly be applied to the denominator (with the mode at E₁ say) and lead to

δÕ(η̂)/Õ(η̂) ≈ √( var(C(E₀))/C̄²(E₀) + var(C(E₁))/C̄²(E₁) )    (3.45)

This is now of the same form as equation 3.34 for the estimation of δr/r, and we can once again use equations 3.36, 3.37 and 3.38 to calculate the right-hand side from the transition matrix Γ(η̂).
Thus our procedure (for the specific case of the 2d Ising model) is the following; a sketch combining steps 1 to 3 is given after the list:

0. Measure the macrostate transition matrix Γ(η_xc) (pentadiagonal) by MC.

1. Calculate Γ(η̂) (pentadiagonal) using equation 3.43.

2. Block into a tridiagonal form Γ̃(η̂). By blocking at this stage, we can correctly take into account the variation of the probability of the underlying E-macrostates within each blocked macrostate, and so avoid the problems discussed in section 3.2.3.

3. Calculate C̄ and var(C) using equations 3.36, 3.37 and 3.38, and thus estimate the error of Õ(η̂).

4. Go to 1 and repeat until the optimal η̂ is found.
Though this process is vastly faster than performing a full MC simulation for each η̂, it is still not so fast as to make a full multidimensional minimisation over η̂ feasible; there would in any case be little point in this, because of the approximations that have been made. Instead we choose to examine only certain likely forms of the sampled distribution. To test the predictions of the above theory, we have performed simulations using various sampled distributions for the 16² Ising model with β = 0.48. Clearly it would be desirable to investigate larger system sizes and different temperatures, but pressure of time has prevented this. For each sampled distribution investigated, we first predict the error of the estimator of g(β = 0.48) using equation 3.45, then perform the simulation and measure the error using jackknife blocking of the histograms. We also measure the random walk time τ_rw between the peaks of the distribution, and compare it with a prediction made using the mean first passage time τ_ij, the average time taken to reach state j for the first time starting at state i. Like var(C_i), τ_ij can be calculated from σ̃(η̂) (see [164]).
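For reference, the standard absorbing-chain identity gives τ_ij directly from a transition matrix; the following is a minimal sketch (ours, in Python with numpy, not the machinery of [164]):

```python
import numpy as np

def mfpt_to(P, target):
    """Mean first passage times tau_{i,target} for a row-stochastic
    macrostate transition matrix P: delete the target row and column
    and solve (I - Q) tau = 1."""
    n = P.shape[0]
    keep = np.arange(n) != target
    Q = P[np.ix_(keep, keep)]
    tau = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    out = np.zeros(n)                  # tau is zero at the target itself
    out[keep] = tau
    return out
```

The predicted τ_rw is then read off as the passage time between the two peak macrostates.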
As candidates for the `optimal' sampled distribution, we have chosen to examine distributions interpolating between the best-pertinence P^fs distribution and one that is very similar to the multicanonical. These sampled distributions follow the shapes of P^can and OP^can in the peaks, which are arranged to have equal weight (where the canonical distribution is that at β = 0.48, and O = exp(βE)), but also add in a constant weight between them. This weight is parameterised by w; for w ≤ 1, P(w, E) between the peaks is linear, passing through the points whose y-coordinates are w times the maximum heights of the peaks, and whose x-coordinates are the E-values at the maxima. As w increases, we expect the pertinence to get worse, because less time is spent on the peaks, but the ergodicity, as measured by τ_rw, to improve. The distribution with w = 1 is very close to multicanonical, differing from it only because the insistence that both peaks have the same weight means they must have slightly different heights, producing a sampled distribution which is not flat but has a slight slope. We also investigate sampled distributions that put the majority of their weight in the region between the peaks, since we anticipate this will further reduce τ_rw. These are also parameterised by w, with w > 1, but here P(w, E) is chosen such that ln P(w, E) between the peaks is parabolic, passing through the maxima of ln P^can and ln(OP^can) and rising half way between them to w times their average height. To make this clear, we show the sampled distributions that have been used in figure 3.36. To generate them requires only a few seconds.
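As an illustration of the w ≤ 1 construction just described, the sketch below (ours; the function name is illustrative, E is assumed sorted with the peak of P^can at lower E than that of OP^can, and only the linear-bridge branch is implemented) builds P(w, E) from tabulated P^can and O:

```python
import numpy as np

def bridge_distribution(E, Pcan, O, w):
    """Follow Pcan and O*Pcan on their own peaks (scaled to equal total
    weight) and join the two maxima by a straight line through heights
    w times the peak maxima (the w <= 1 family of the text)."""
    A = Pcan / Pcan.sum()
    B = O * Pcan
    B = B / B.sum()                        # equal weight in the two peaks
    i0, i1 = int(np.argmax(A)), int(np.argmax(B))   # assumes i0 < i1
    bridge = np.linspace(w * A[i0], w * B[i1], i1 - i0 + 1)[1:-1]
    P = np.concatenate([A[:i0 + 1], bridge, B[i1:]])
    return P / P.sum()
```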
First, before comparing the predicted and actual error, we show in figure 3.37 the MC results for g̃(w) − g_exact for the various sampled distributions. We do this to demonstrate that the MC results are consistent with the exact answer, taking errors into account. g̃(w) is the measured estimator of g(β = 0.48) and g_exact is its exact value for the 16² Ising model, taken from [10]: g_exact = 2.0713203. A total of N_c = 2 × 10⁶ lattice sweeps were performed for each sampled distribution, performed in ten blocks. The error bars come from jackknife blocking.
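The jackknife recipe used for these error bars is standard; a minimal sketch (ours) of the delete-one-block version, applied here to per-block estimates rather than to the full histograms, is:

```python
import numpy as np

def jackknife_error(block_values):
    """Delete-one jackknife over nb blocks: return the jackknife mean
    and error bar of the blocked estimator."""
    x = np.asarray(block_values, dtype=float)
    nb = len(x)
    loo = (x.sum() - x) / (nb - 1)          # leave-one-out averages
    mean = loo.mean()
    err = np.sqrt((nb - 1) / nb * np.sum((loo - mean) ** 2))
    return mean, err
```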
Now we proceed to the results that are our real interest, the variance of the estimators of g. We show in figure 3.38 the size of the error bars on g(β = 0.48), both predicted and measured, as a function of w. Because equation 3.45 is only a proportionality, depending on unknown parameters, we show not the absolute error, but the error for each w divided⁷ by the error

[7] This also corrects for the fact that the operator O is directly related to Z and thus to exp(g), not g itself.
Figure 3.36. The sampled distributions P(w, E) used to test the predictions of the error of g(β = 0.48), for w = 0.01, 0.03, 0.4, 1, 4 and 32, labelled by the value of the parameter w that determines how much weight is put in the region between the two peaks P^can and OP^can. (Plot of P(w, E) against E over the range −512 ≤ E ≤ 128.)
(predicted or measured as appropriate) from the first point, which is at w = 0.01. We call this quantity δg/δg_1. We also show (figure 3.39) the predicted (from τ_ij) and measured values of τ_rw between the modes of P^can and OP^can for the various sampled distributions.

Agreement between prediction and experiment is generally very good; the estimate of τ is approximately right throughout the range of w, while the estimate of the error bar δg is very good for w < 1, though the w = 1 (near-multicanonical) distribution has a rather smaller error bar than predicted, while the error for large w is also rather lower than expected. We attribute some of this discrepancy (the low value of δg(w = 1), the fact that δg(w = 4) > δg(w = 32)) to error in our estimate of δg itself, which we have not quantified; but for large w we may be overestimating δg because more weight than is taken into account by equation 3.45 goes into the peaks of P(w). This occurs because we assumed in deriving that equation that the shape of P(w) matches that of P^can and OP^can in the peaks, but in fact, because the `extra' weight that is put into the region between the peaks is matched on to them at their maxima, for half the macrostates in the peaks P(w) is substantially larger than it would be if it followed P^can and OP^can. We correctly predicted the distribution that gave the smallest δg, which was the near-multicanonical one.

The most striking result of the investigation is probably the surprisingly small effect that
Figure 3.37. The difference between the measured estimator g̃ and its exact value g_exact = 2.0713203 for the 16² Ising model at β = 0.48, for six different sampled distributions parameterised by w (plotted against w on a logarithmic scale from 10⁻³ to 10²). N_c = 2 × 10⁶ lattice sweeps were performed for each sampled distribution. The error bars come from jackknife blocking.
variation in the sampled distribution has on the estimator of g. It is clear that a wide range of near-multicanonical distributions can be used without any significant effect on the size of the error bar. It is also apparent that, for the single-spin-flip Metropolis algorithm, no significant improvement on the multicanonical distribution seems possible. One's intuition, which perhaps leads one to favour distributions with high pertinence, can be misled because of the need for ergodicity too: the error of the highly-pertinent w = 0.01 estimator is appreciably larger than the others, because its long random walk time means that a poorer estimate of the relative weights of numerator and denominator is obtained. The w = 32 estimator is better because its ergodicity remains good while at least some appreciable weight is put in the regions that dominate the ratio estimators.

It seems, then, that the error of a particular sampled distribution can be predicted with reasonable accuracy. Even though no large reduction in error is possible here, we could use the method to decide between the methods of sections 3.1.1 (connect to infinite-temperature states) and 3.1.2 (connect to the ground state). Given the transition matrix of a multicanonical distribution extending from the ground state right up to the β = 0 states, we should be able to calculate the expected error of g(β) for any β by the two methods. For the method using
Figure 3.38. Predicted (line) and measured (points) error bars on g, as a function of the parameter w (10⁻³ to 10², logarithmic scale), shown as the fraction δg/δg_1 of their size at w = 0.01. Values for the 16² Ising model at β = 0.48; MC data gathered from 2 × 10⁶ lattice sweeps per value of w, with jackknife blocking.
the probability of the ground state, the 1/k distribution of [97], which was designed for a very similar problem, should be considered as well as the multicanonical distribution.

The reason that scarcely any improvement on multicanonical sampling is possible using the single-spin-flip Metropolis algorithm is that the algorithm is limited by the time taken to move between the two regions that dominate the ratio estimator. Any large improvement would require the use of a different algorithm (cluster flipping, `demon' etc.). It seems likely that, as the correlation time decreases, the `optimal' sampled distribution will become less like the multicanonical and more like Fosdick's (because, in the limit of uncorrelated configurations, when pertinence alone need be considered, Fosdick's prescription does give the sampled distribution of lowest variance). Thus, we anticipate that, for such an algorithm, the use of the methods described here could well lead to the prediction of a sampled distribution with a substantially lower variance than the multicanonical, even though it did not for the simple Metropolis.
3.5 Discussion
We have made a thorough investigation of issues related to the production and use of the
multicanonical distribution in the free energy measurement problem (and elsewhere), and we
Figure 3.39. Predicted (line) and measured (points) values of τ_rw, the random walk time between the modes of P^can and OP^can, as a function of the parameter w (τ on a logarithmic scale from about 10³ to 10⁶). Values for the 16² Ising model at β = 0.48; MC data gathered from 2 × 10⁶ lattice sweeps per value of w, with jackknife blocking.
have also investigated its relationship to the expanded ensemble method and to other non-Boltzmann importance sampling methods. Though the 2d single-spin-flip Ising model has been used throughout, most of our results, particularly in so far as they reflect new methods, are not limited to this system and would be expected to be of wider applicability: we shall use them to study a phase transition in an off-lattice system in the next chapter.

In our investigation of the generation of the multicanonical distribution we have studied distributions preweighted both in the internal energy and in the magnetisation (the latter case corresponding to the problem of measuring the p.d.f. of the order parameter over a range embracing both phases; the former corresponding more to the application of the method to find the absolute free energy of a single phase). We looked at several ways of producing a suitable sampled distribution as rapidly as possible. First we examined perhaps the most obvious method, based on the visited states of the MC algorithm. While the absolute performance of this method was inferior to that of the other methods, because of its slowness in spreading out from the region sampled by the `initial' (Boltzmann sampling) algorithm, it did serve to show the usefulness of Bayesian methods in this problem: Bayes' theorem is the natural way of distinguishing fluctuations in the observed data that are really due to the underlying
structure of the sampled distribution from fluctuations that are simply the product of the stochastic nature of MC sampling. However, as we saw, the full prior-posterior formulation of the problem leads to the appearance of integrals of some complexity. We used a simple Normal model to treat these approximately, though this extra effort did not help the rate of convergence to the multicanonical distribution. In general, in fact, we should always bear in mind that it is better, as well as easier, in regions where data are sparse, to use further MC sampling from an updated distribution to confirm and refine the results of a simple, approximate estimation of the underlying sampled distribution, rather than to devote the computer time to a lengthy Bayesian analysis of the sparse data.

Secondly, we introduced a new method, the Transition Probability method, to try to overcome the slowness of convergence of the VS method by sampling all the macrostates immediately. We found this method to be of use both for energy and magnetisation preweighting, though it performed better for magnetisation: convergence was faster (in fact almost immediate) and there seemed to be little or no residual bias, in contrast to the energy case. Indeed, it was possible to use the TP method not just to generate the multicanonical distribution, but to obtain accurate final estimators of the canonical p.d.f. This promises to alleviate to some extent the problems caused by the fact that the multicanonical algorithm cannot be fully parallelised, by permitting the length of a multicanonical run to be shorter than it would have to be if VS estimators were used, while still giving an unbiased estimator. This makes it easier to use massive parallelism. To anticipate future results for a moment, we shall use the TP estimator and massive parallelism in this way in the next chapter, and we shall also do further work which will shed more light on the conditions required for the approximation of equation 3.26 to be satisfied.
In section 3.3 we have confirmed that the Gibbs free energy g and the canonical averages of the specific internal energy e and specific heat capacity c_H are correctly produced for all values of the inverse temperature from a single simulation. We have compared the multicanonical ensemble with thermodynamic integration for the problem of measuring absolute free energy, showing that the size of random errors for the two is about the same, but that the multicanonical method copes far better with a singularity (the continuous phase transition at H = 0, β = β_c = 0.440686...) on the path of integration. Though it does not relate directly to the problem of free energy estimation, we have also used the ability of the multicanonical method to make accessible states of very low canonical probability to produce new results on the scaling form
of P(β_c, M).

In the last section we have addressed rather more general questions about importance sampling. First, we have put the multicanonical and expanded ensembles into the same framework, and have shown how the theory we developed for use with the TP estimators of the multicanonical ensemble is also useful in the context of the expanded ensemble, where it enables us to show that simulation within subgroups of the subensembles of which the expanded ensemble consists does not confer any advantage. Second, because we may sometimes be interested only in a single canonical average at a single temperature, we have addressed the question of `optimal sampling.' We have shown that from TP measurements on a multicanonical distribution we may calculate approximately the expected variance of the estimator from any sampled distribution. We have in fact been able to make only preliminary investigations, looking only at O = exp(βE) for the 16² Ising model and investigating various candidate distributions explicitly, to confirm that our predictions of their variance are approximately correct. For this observable and the single-spin-flip Metropolis algorithm, the multicanonical distribution turns out to be very near the best that can be used. There are of course observables for which it is far from optimal, internal energy at any particular temperature being one. But there are none for which it is bad in the way that Boltzmann sampling is: the multicanonical distribution can require at most O(L^d) more sampling time than an `optimal' distribution, whereas Boltzmann sampling can require a time that is longer by a factor that is exponential in the system size.
Chapter 4
A Study of an Isostructural Phase Transition
4.1 Introduction
Two partially intertwined threads run through this chapter. The first is the continuing study of the techniques relating to the generation and use of the multicanonical/expanded ensemble and comparison of its efficiency with that of thermodynamic integration (TI). We shall investigate these matters in section 4.2 by applying TI and the expanded ensemble to the square-well solid, just as in section 3.3.2 we applied TI and the multicanonical ensemble to the Ising model. We shall also, in section 4.3, continue the exploration of the use of the `transition probability' estimators introduced in the last chapter (section 3.2.3).

The second thread is the examination of the square-well solid as a system of physical interest in its own right. We shall in particular confirm and elaborate upon recent results that suggest that this system displays an isostructural solid-solid phase transition [69, 171, 172, 173, 174]. Thus, at some times the focus will be on the way that the results obtained by various simulation techniques compare with one another, whereas at others it will simply be on the results themselves and their physical meaning.
The Square-Well Solid and Related Systems

We shall now introduce the system that we shall investigate in this chapter and describe the aspects of its physical behaviour that we shall study. In order to motivate the choice of the square-well solid, and to place our work in a wider context, we shall also briefly describe related theoretical and experimental work, in particular on colloidal systems.

Consider a system of N particles moving in a continuous three-dimensional space and interacting with one another via a simple spherically-symmetric pair-potential¹ E(r_ij), where r_ij is the separation of the centres of two particles i and j. The potential we shall be using consists of a hard-core repulsion, defining the diameter σ of the particles, and a short-range attractive force:

E(r_{ij}) = \begin{cases} \infty & \text{if } r_{ij} \le \sigma \\ -E_0 & \text{if } \sigma < r_{ij} < (1+\lambda)\sigma \\ 0 & \text{if } r_{ij} \ge (1+\lambda)\sigma \end{cases}   (4.1)

as shown schematically in figure 4.1.
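For concreteness, equation 4.1 transcribes directly into code; the following Python sketch is ours, with illustrative parameter names:

```python
import numpy as np

def pair_energy(rij, sigma=1.0, lam=0.01, E0=1.0):
    """Square-well pair potential of equation 4.1: hard core of diameter
    sigma, attractive well of depth E0 and relative width lam."""
    if rij <= sigma:
        return np.inf        # hard-core overlap
    if rij < (1.0 + lam) * sigma:
        return -E0           # inside the attractive well
    return 0.0
```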
Figure 4.1. Schematic diagram of the pair potential of the square-well solid: the energy E is plotted against the separation r, with a hard core (E → ∞) for r ≤ σ and an attractive well of depth E_0 and width λσ beyond it; the width of the well is exaggerated compared with its actual width in the simulations of this chapter.
The width of the attractive well is a fraction λ of the hard-core repulsion distance σ; though λ may be varied at will, we shall in fact always use the value λ = 0.01. We choose to measure inverse temperature β in units of E_0^{-1} (E_0 is the depth of the potential well), so that βE(r_ij) = −β for r_ij in the well. We shall deal exclusively in this chapter with densities where the system is solid, and we shall examine only face-centred cubic (fcc) crystals containing N particles, where N = 4m³ for integer m (in practice, m = 2, 3, 4). The total energy of the configuration is the sum over all pairs of particles of E(r_ij):

E(\mathbf{r}^N) = \sum_{i,\, j<i} E(r_{ij})

We shall quantify the density mainly by the volume per particle v = V/N = 1/ρ, which we shall call the specific volume. We measure lengths in units of σ, so that for close-packed spheres v = 1/√2.

We shall be concerned with specific volumes in the range 0.72 < v < 0.82. For comparison, the fcc crystal of hard spheres is stable up to v = 0.96, where it melts into a fluid with v = 1.06 [73].

[1] V(r_ij) is more commonly used to represent this, but we reserve V for volume and keep E for internal energy, as in previous chapters.
The form of the potential defined in equation 4.1 does not bear more than a very rough resemblance in shape to the pair potentials usually employed in modelling the interactions between atoms, such as the familiar Lennard-Jones potential E_LJ(r_ij) = 4E_0[(r_ij/r_0)^{-12} − (r_ij/r_0)^{-6}], which is softer, everywhere differentiable, and has a much wider attractive well relative to the repulsive part, with a long-range attractive tail. However, while extremely idealised, the square-well potential does bear a closer resemblance to the effective potential that may be induced between the particles in colloidal systems.
Colloids, which occur frequently in nature, especially in biological systems, consist of particles of one material suspended in a medium of another. The diameter of the particles is between 2 nm and 1 µm, and both colloid particles and medium may be either solid, liquid or gas, though the overwhelming majority of scientific studies have been on colloids consisting of solid particles in a liquid medium. A detailed review of the properties of colloidal suspensions can be found in [175]; here we are mainly concerned with the equilibrium phase behaviour of monodisperse colloids. By suitable stabilisation we can obtain colloids where the particles behave like hard spheres; then by adding a polymer to the solution we can induce an attractive force between the colloid particles. This force is strictly a many-body entropic effect: when the colloid particles are close together, the number of accessible polymer configurations is greater than when they are separated. However, the many-body part of this effective interaction comes from excluded polymer configurations that would intersect three or more colloid particles, so if R_g, the radius of gyration of the polymer coils, is small (as will always be the case here), then the many-body part is also small and the force can be well described quantitatively by an effective pair-potential called a `depletion potential.' The depletion potential can be thought of as an osmotic effect: if two colloid particles approach closer than R_g, the polymer between them is squeezed out and so they experience a net osmotic pressure from the rest of the polymer that serves to drive them together. The range of this attractive force is thus controlled by R_g, and its overall depth by the concentration of polymer. Its strength as a function of r_ij can be most easily handled by treating the polymer coils as if they behaved like hard spheres of radius R_g in their interaction with the colloid particles, so that the depth of the potential is simply proportional to the volume of the `depletion region,' the region from which the polymer is excluded. The resulting effective pair-potential is called the Asakura-Oosawa potential [176]. Even though the shape of the depletion potential is still appreciably different from a square well, there is now a much greater degree of similarity: there is a hard core and an attractive well of finite width whose width and depth may be varied freely; in particular the well width may be chosen to be much less than the hard-core radius. We mention in passing that in colloid science the usual choice of parameter to quantify the density is the volume fraction φ, the fraction of the volume of the system occupied by the hard cores: φ = π/(6v) in our units.
We were motivated to look at the solid phase with λ ≪ 1 by recent theoretical [172, 173, 174] and computational [69, 171] results that suggest that systems with very short-ranged attractive potentials may exhibit an isostructural solid-solid phase transition between a `dense' and an `expanded' phase, with a phase diagram like that in figure 4.2. The qualitative structures² of the dense and expanded phases are shown in figure 4.3. The inner circles represent the hard cores of the potential, of diameter σ; the outer circles represent the attractive wells, of width λσ. Examination of figure 4.1 then indicates that the interaction of a pair of particles is −E_0 if the outer circle of one particle touches or cuts the inner circle of the other. Thus, in the dense phase (top) each particle has a low energy, but the crystal is tightly packed, so the entropy is also low; while in the expanded phase (below), each particle is in the potential well of only two or three of its neighbours on average (but they are still close enough to form a cage to hold it on its lattice site). The energy is therefore much nearer to zero, though the free volume in the crystal is also much larger, producing a compensating increase in entropy.

[2] In drawing this figure, we are to some extent anticipating results we will not find until section 4.3; see in particular section 4.3.8.
Figure 4.2. A schematic illustration of the phase diagram (β against v) of a system with a very short-ranged attractive potential, according to [172]. Between the triple line (TrL) and the critical point (CP), two solid phases, labelled S1 and S2, may coexist. The horizontal lines are tie-lines; if the system is prepared with a density in the tie-lined regions, it exists as a mixture of two phases with densities given by the points at the ends of the lines. F labels a fluid phase. The dotted box contains the region that is studied in this chapter.
We have not developed a method to measure the free energy of a fluid or to connect it reversibly to a solid. Therefore we shall investigate only the part of the phase diagram shown inside the dotted region in figure 4.2. We have no way of knowing where the triple line is, so it should be borne in mind that part (or possibly even all) of any coexistence curve we obtain may be metastable: it may be energetically favourable for the expanded solid to decompose into the dense solid and the fluid.

As well as the presence of two solid phases, another unfamiliar feature of figure 4.2 is the presence of only one fluid phase, where normally we would expect two, a liquid and a gas, becoming indistinguishable at a liquid-gas critical point. It is interesting to digress for a moment to put this into the context of the more general theory developed in [172, 173, 174]. According to this theory, as the range of the attractive part of the potential is altered, we produce in
Figure 4.3. Schematic (2d) diagram of the dense (top) and expanded (bottom) solid phases. Each pair of circles represents a particle: the inner circle represents the hard core of the potential, of diameter σ; the outer circle represents the attractive well, of width λσ.
sequence the three types of phase diagram shown in figure 4.4. This figure is based on results obtained analytically³ for the square-well system in [173].

The figure on the left represents the familiar solid-liquid-gas phase diagram that is ubiquitous in nature; for example, it describes the phase behaviour of all simple atomic and molecular systems. However, here we see that the condition for its existence is that λ > 0.25 [173]. If λ is smaller than this, as in the central figure, then the liquid-gas critical point disappears and the phase diagram contains only one solid and one fluid phase. Finally, if λ is extremely small, λ < 0.06 [171], as in the figure on the right, we obtain the phase diagram with one fluid and two solid phases. There is thus a pleasing symmetry between long-range and very short-range potentials. We shall describe the physical reasons for the existence of the solid-solid phase transition for small λ in section 4.3.8.
However, we must point out that, aside from analytic calculations, the solid-solid phase
[3] The calculations were based on the use of a variational principle to obtain an estimate of (or, strictly, an upper bound on) the free energy difference between the square-well system and an appropriate reference system.
Figure 4.4. Schematic phase diagrams (β against v) of the square-well and related systems for λ > 0.25 (left), 0.06 < λ < 0.25 (centre) and λ < 0.06 (right), based on those calculated in [173]. The limiting value λ = 0.25 for the disappearance of the liquid phase is taken from [173], while λ = 0.06 for the appearance of the expanded solid is taken from [171]; in [173] it is suggested that the expanded solid appears only for λ < 0.015. S = solid, F = fluid, L = liquid, G = gas.
coexistence has been observed by only one group of workers, in computer simulations of the square-well solid [69, 171]; it has not yet been seen experimentally. Experiments looking for it have been carried out on colloids, with inconclusive results caused by practical difficulties in working with such dense systems and with polymers with a very small R_g. It is interesting to note that isostructural solid-solid phase transitions, each with a coexistence line ending in a critical point, have been observed experimentally in the heavy metals cerium and caesium [177]. However, these phase transitions are not produced by very short-ranged interatomic potentials; the (effective) potentials between atoms of these metals are of the `usual' long-ranged type that would be expected to produce solid-liquid-gas phase diagrams. They are thought to be caused by quantum effects (possibly involving localisation/delocalisation of f electrons, though there is no agreement on the detailed mechanism).
Even the existence in real systems of phase diagrams of the second type (solid plus one fluid phase) has been demonstrated only quite recently, in colloidal systems [178]; it is found experimentally that the liquid disappears when R_g/σ < 0.3, a value confirmed by analytic calculations on the Asakura-Oosawa model [179]. Computer simulations have been performed on C60 with a model potential produced by smearing and summing Lennard-Jones-type interactions [70], and on hard spheres with an additional Yukawa attraction [180]. The results imply that the phase diagram of C60 is of this type, almost uniquely for a fairly simple molecular material, and that the Yukawa system also has a two-phase phase diagram if the exponential decay is strong enough. The phase diagram of C60 has not yet been fully established experimentally.
The fact that potentials of different detailed shapes all produce phase diagrams of this kind lends credence to the idea that the detailed shape of the potential is not important in determining what kind of phase diagram occurs, only gross features like the relative ranges of the attractive and repulsive parts. This fact, combined with a desire to be able to check our results explicitly against [69, 171, 172, 173, 174], influenced us in our decision to use the square-well potential instead of a more physically realistic model like the Asakura-Oosawa potential. We also originally intended to study a range of different values of λ, and all parts of the phase diagram, when it would have been advantageous to have a potential whose range is unambiguously defined. However, pressure of time prevented more than a part of this ambitious project from being completed.
Details of the Simulation

We have used three different approaches to the measurement of free energy in this chapter, each of which requires its own computer program, different in detail from the others. However, there is a large `core' part of the program that they all have in common, which deals with the basic problem of simulating the motion of the particles of the N-particle square-well solid. This `core' program, and its implementation on the Connection Machine CM-200, is described in appendix E; it is shown there that the most efficient way to map the problem onto the machine is to use a mixture of geometrical decomposition and primitive parallelism, running N_r = O(1000) independent `copies,' or replicas, of each simulation in parallel. Each replica is quite small, containing 32 to 256 particles. The individual details of the modifications required to implement each method of free-energy measurement will be described in the section devoted to that method.
4.2 Comparison of Thermodynamic Integration and the Expanded Ensemble: Use of an Einstein Solid Reference System

In this section we shall mainly be concerned with the relative efficiency of these two methods; we shall obtain comparatively little information about the physics of the square-well system, locating a single pair of coexistence points only. The method of free-energy measurement that is used is the smooth transformation of the square-well solid, by way of a series of interpolating systems, into a system for which the free energy can be calculated exactly: in this case the harmonic Einstein solid at the same temperature and density. This technique was first used in [71], and is discussed in section 2.1.1. We shall measure the free energy both by using TI (as in [71]) and by using the states along the thermodynamic integration path to make an expanded ensemble (a case such as this, where several Hamiltonians are defined on the same phase space, is more naturally described as an `expanded ensemble' than a `multicanonical ensemble'). We note that this technique of transforming the energy function was used (combined with TI) in the determination of the phase diagrams of the 3d square-well system in [69, 171]; however, the reference system used there was the corresponding hard-sphere system, whose free energy was taken as being already known absolutely from previous simulations and/or theoretical equations of state for each phase. The equation of state used for the solid phase was due to Hall [181].
It is therefore necessary to modify the core simulation routine described in appendix E. The potential function used is no longer simply

E_{SW} = \sum_{i=1}^{N}\sum_{j<i} E(r_{ij})

where E(r_ij) is as defined in equation 4.1; in making particle moves we now use the energy

E(\alpha) = \alpha E_{SW} + (1-\alpha) E_{ES}

where E_ES is the potential energy of the harmonic Einstein solid:

E_{ES} = k_s \sum_{i=1}^{N} (\mathbf{r}_i - \mathbf{r}_i^0)^2

where the set {r_i^0} are the lattice sites of the Einstein solid (which are of course arranged here to correspond to the fcc lattice sites) and k_s is a spring constant. Each particle thus feels an additional harmonic attraction to its lattice site. By varying α in the range 0 ≤ α ≤ 1 we thus interpolate between the pure Einstein solid (α = 0) and the pure square-well solid (α = 1). In the expanded ensemble implementation we allow transitions between different values of α; in thermodynamic integration, which we shall now describe, we simply perform a series of independent simulations at different values of α.
4.2.1 Thermodynamic Integration

The principle underlying this method is to use the equation

\frac{\partial F(\alpha)}{\partial\alpha} = -\frac{1}{\beta}\frac{\partial}{\partial\alpha}\ln\!\int d^N r\, \exp(-\beta E(\alpha)) = \frac{\int d^N r\,(\partial E(\alpha)/\partial\alpha)\exp(-\beta E(\alpha))}{\int d^N r\,\exp(-\beta E(\alpha))} = \langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha

where ⟨·⟩_α denotes a canonical average measured in the ensemble defined by E(α). Thus it follows that

F_{SW} = F_{ES} + \int_0^1 \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha\right] d\alpha   (4.2)

and at intermediate points along the path we define

F(\alpha) = F_{ES} + \int_0^\alpha \left[\langle E_{SW}\rangle_{\alpha'} - \langle E_{ES}\rangle_{\alpha'}\right] d\alpha'   (4.3)

F_ES can be calculated exactly: F_{ES} = \beta^{-1}(3N/2)\ln(\beta k_s/\pi).
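In outline, the TI estimate then amounts to the following sketch (ours, using scipy; the arrays of simulated α values and measured energy differences are placeholders, and the α = 0 end is supplied separately, as described below):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def f_sw_ti(alpha_pts, dE_per_N, f_es):
    """TI estimate of f_SW = F_SW/N (equation 4.2): fit a cubic spline to
    the measured (1/N)[<E_SW> - <E_ES>] values and integrate over alpha.
    In the text the segment from alpha = 0 to the first simulated point
    is supplied separately by a small expanded-ensemble run."""
    spline = CubicSpline(alpha_pts, dE_per_N)
    return f_es + spline.integrate(alpha_pts[0], alpha_pts[-1])
```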
We shall estimate F_SW by measuring ⟨E_SW⟩_α − ⟨E_ES⟩_α for a series of values of α at a fixed temperature and volume, fitting a function to the data points, and evaluating the integral numerically. We choose the Einstein spring constant k_s to be sufficiently large that the particles are prevented from moving far off their lattice sites even in the α = 0 ensemble, so that there is not an excessive variation in the `typical configurations' of the system as α varies. By doing this, we hope to keep the total free energy difference between α = 0 and α = 1 fairly small. We took the criterion that the particles should stay close to the lattice sites as implying the following: let P(|r − r_0|)dr be the probability that a particle, whose lattice site is at r_0, is found in a shell of radius |r − r_0| and thickness dr about the lattice site. Then, with Einstein solid interactions only, it follows that

P(|\mathbf{r} - \mathbf{r}_0|)\,dr = (16/\pi)^{1/2}\,(\beta k_s)^{3/2}\,(r - r_0)^2 \exp[-\beta k_s (r - r_0)^2]\, dr

So we demand that

[P(|\mathbf{r} - \mathbf{r}_0| < R_\sigma)]^N > 0.5   (4.4)

where R_σ is the radius of the sphere within which the centre of a particular particle can move without approaching within σ of neighbouring lattice sites, at the prevailing density. This criterion is thus designed to ensure that in the simulation as a whole all the particles are further than σ from their neighbours (and so the configurations contain no hard-core overlaps) at least half the time.
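Integrating the shell distribution gives P(|r − r_0| < R) = erf(x) − (2/√π) x e^{−x²}, with x = √(βk_s) R, so criterion 4.4 can be solved numerically for the smallest admissible k_s. A sketch (ours; R must be supplied from the prevailing density):

```python
import numpy as np
from scipy.special import erf
from scipy.optimize import brentq

def p_inside(ks, beta, R):
    """P(|r - r0| < R) for a single Einstein particle."""
    x = np.sqrt(beta * ks) * R
    return erf(x) - 2.0 * x * np.exp(-x * x) / np.sqrt(np.pi)

def smallest_ks(beta, R, N, ks_hi=1e12):
    """Smallest spring constant satisfying [P(|r - r0| < R)]^N > 0.5."""
    g = lambda ks: N * np.log(p_inside(ks, beta, R)) - np.log(0.5)
    return brentq(g, 1.0, ks_hi)
```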
However, difficulties arise in the α = 0 and α = 1 states that prevent us from implementing this strategy exactly as described above. First, in the α = 0 (pure Einstein solid) ensemble, however large we make k_s, there is a finite probability of generating a configuration where at least one pair of particles is closer together than σ. Thus ⟨E_SW⟩ is infinite exactly at α = 0, though it is finite and well-behaved everywhere else. This is a consequence of the hard core in the potential, which remains in E(α) for all finite α without diminishing in size at all, only disappearing precisely at α = 0. In [71] this problem is handled by expanding F(α) analytically around α = 0, but (as we shall explain in section 4.2.2) it can also be treated easily with the expanded ensemble⁴. Since we have an expanded ensemble simulation available, we use it rather than TI to connect α = 0 with the next state (we have used α = 0.01 and α = 0.05). The second difficulty, arising as α → 1, must be dealt with rather more carefully.
The Centre of Mass Problem

This problem is a consequence of the fact that E_SW is invariant under translations of the centre of mass of the whole system, whereas E_ES is not. As a result, as α → 1, the centre of mass has an increasingly large volume accessible to it and ⟨E_ES⟩ increases. At α = 1, the probability density of the position of the centre of mass becomes uniform over the simulation box (which has side length L), corresponding to a mean square displacement of the order of L² and, for the large values of k_s that we are using, an extremely large value of ⟨E_ES⟩. This would make the evaluation of the integral in equation 4.2 very difficult: not much easier, in fact, than if the integrand had an integrable singularity. Many simulation points would be needed for α close to one, where ⟨E_ES⟩ is large and rapidly-varying, and these simulations would have to be extremely long, since we would have to sample for long enough to allow the centre of mass to wander through the whole simulation volume.

If the particle moves are made serially, it is fairly easy to solve this problem by enforcing the constraint that the centre of mass of the particles should remain at all times at the position of

[4] The possibility of doing this was also mentioned in [71].
the centre of mass of the lattice sites, preventing the Einstein energy from becoming excessively large. This is done by accompanying every trial displacement Δr of a particular particle by a displacement of all the particles by −Δr/N. The square-well energy is invariant under this translation; the Einstein energy is not, but does not have to be recalculated by looping over all the particles: it can easily be shown that this rigid translation of all the particles simply produces a reduction of k_s Δr²/N in the total Einstein energy. The Metropolis test can therefore still be carried out on the basis of local interactions only. Small (analytically-known) corrections have to be made to the expressions for the free energies of both the Einstein solid and the square-well solid, reflecting the extra constraints on the system.
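In code, the energy bookkeeping for this centre-of-mass-preserving move looks like the following sketch (ours; u_i denotes the current displacement of particle i from its lattice site, and the centre of mass is assumed to coincide with that of the lattice before the move):

```python
import numpy as np

def delta_E_es(ks, u_i, dr, N):
    """Einstein-energy change when particle i is displaced by dr and all
    particles are then shifted by -dr/N to keep the centre of mass fixed:
    the naive single-particle change minus the rigid-translation
    reduction ks*|dr|^2/N."""
    naive = ks * (np.dot(u_i + dr, u_i + dr) - np.dot(u_i, u_i))
    return naive - ks * np.dot(dr, dr) / N
```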
Unfortunately this technique (which is the one used in [71]) cannot be applied to simulations where particle moves are made in parallel: it is not clear how much the centre of mass will move until all the trial moves are accepted or rejected, but the acceptance probability depends on the energy, which itself depends on the motion of the centre of mass. In fact, all the simulations performed in this subsection were carried out using primitive parallelism only, i.e. with serial updating of particle coordinates within each simulation. Thus the centre of mass could have been kept fixed, and with the benefit of hindsight this would have made both the simulations and the analysis easier to perform. Nevertheless, we used in practice the following method, which is applicable to a system with parallel updating.

For simulations carried out at α = 1 only, we keep the centre of mass fixed, so that ⟨E_ES⟩ remains quite small. It is permissible to do this here even with parallel updating because E(α = 1) ≡ E_SW, so the Einstein energy does not appear in the Metropolis test controlling the acceptance of particle displacements⁵. For other values of α, we allow centre-of-mass motion and accept the growth in ⟨E_ES⟩. However, rather than attempting to use equation 4.2 directly, we choose a function q(α), described below, which shows the same near-divergence at α = 1 as does −⟨E_ES⟩_α, but which is analytically tractable. Then we split up the integral in 4.2 into two pieces:

\int_0^1 \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha\right] d\alpha = \int_0^1 \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha - q(\alpha)\right] d\alpha + \int_0^1 q(\alpha)\, d\alpha

[5] We do this by calculating Δr_CM (by summing Δr over all accepted transitions at the end of every lattice sweep), then adding Δr_CM/N to all the r_i^0. We move the centre of mass of the Einstein lattice to follow that of the simulation, rather than the other way round, because otherwise it is found that rounding errors in adding Δr/N to the particle positions can produce spurious hard-core overlaps.
The second integral on the RHS can be done analytically or numerically, while the first is now much better behaved because the two near-divergences cancel. In particular, we may extrapolate the integrand with confidence from the largest value of α that was tractable to α = 1. Our procedure is therefore to fit a function (we used an interpolating cubic spline) to ⟨E_SW⟩_α − ⟨E_ES⟩_α − q(α), extrapolate to α = 1 and integrate numerically.

The function q(α) that we use is just

q(\alpha) \equiv -\langle E_{ES}^{(N)}\rangle_{N;\alpha}

where by the notation on the right-hand side we mean the average energy of a single Einstein particle with spring constant Nk_s, evaluated in an ensemble with effective coupling (1 − α)Nk_s. Thus, −q(α) describes to a good approximation the energy due to the motion of the centre of mass of the Einstein lattice, which moves almost exactly like a single particle with spring constant Nk_s (the hard cores in the square-well potential keep the lattice rigid and so keep all the springs very nearly parallel).

Written out more fully, we have

q(\alpha) = -\frac{3\int_{-L/2}^{L/2} N k_s x^2 \exp[-\beta N k_s (1-\alpha) x^2]\, dx}{\int_{-L/2}^{L/2} \exp[-\beta N k_s (1-\alpha) x^2]\, dx}   (4.5)

= -\frac{3}{2\beta(1-\alpha)}\left(1 - \frac{\sqrt{c(\alpha)}\, L\, \exp[-c(\alpha)L^2/4]}{\sqrt{\pi}\, \mathrm{erf}(\sqrt{c(\alpha)}\, L/2)}\right)   (4.6)

where in the last line we have written c(α) ≡ βNk_s(1 − α). There is no analytic closed form for ∫q(α)dα, but it can easily be integrated numerically to whatever precision is required.
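A sketch of equation 4.6 and of the numerical quadrature (ours; it uses the sign convention adopted above, under which q(α) → −Nk_sL²/4 as α → 1, and the bracket suffers cancellation there, so some care with precision is needed):

```python
import numpy as np
from scipy.special import erf
from scipy.integrate import quad

def q(alpha, beta, N, ks, L):
    """Equation 4.6, with c(alpha) = beta*N*ks*(1 - alpha)."""
    c = beta * N * ks * (1.0 - alpha)
    x = np.sqrt(c) * L / 2.0
    bracket = 1.0 - np.sqrt(c) * L * np.exp(-x * x) / (np.sqrt(np.pi) * erf(x))
    return -1.5 / (beta * (1.0 - alpha)) * bracket

def integral_q(a0, beta, N, ks, L):
    """(1/N) * integral of q(alpha) from a0 up to 1; the subdivision limit
    is raised because the integrand is sharply peaked near alpha = 1."""
    val, _ = quad(lambda a: q(a, beta, N, ks, L) / N, a0, 1.0, limit=200)
    return val
```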
In general ⟨E_ES⟩_α ≠ −q(α), though the MC data show that there is approximate equality. However, to be confident that we can extrapolate the integrand to α = 1, we require that the approximate equality should continue to hold even for α very close to one, where we do not have MC data to confirm it. In fact, we expect the approximation to improve as α → 1, and to become exact in the limit α = 1, for the following reasons:

First, the probability density of the centre of mass becomes uniform over [−L/2, L/2]³, and so the hard cores have no effect on the expectation value of E_ES^{(N)}.

Second, as can be shown by explicit calculation, the expectation of the energy of the single particle with spring constant Nk_s is equal in this limit to the energy of N independent particles each with spring constant k_s.
As a consequence, q(1) captures all of the Einstein solid contribution:

\langle E_{ES}\rangle_{\alpha=1} = \langle E_{ES}^{(N)}\rangle_{N;\alpha=1} = N k_s L^2/4 = -q(1)

Having shown how the difficulties at α = 1 and α = 0 may be overcome, we shall now go on to present the results.
Results

We examine the points β = 10/13, v = 0.752 and β = 10/13, v = 0.72202, which are chosen on the basis of results given in [69] as being on or very close to the solid-solid coexistence curve. We therefore entirely avoid (for now) the difficult problem of locating the coexistence curve in the first place. We remind the reader that our units are such that E and F are in units of E_0, β is in units of E_0^{-1} and v is in units of σ³. Thus, the pressure p is in units of E_0 σ^{-3} and the spring constant k_s is in units of E_0 σ^{-2}.

We chose to examine the small system N = 32 (i.e. 2³ unit cells) and to use only primitive parallelism with N_r = 4096, so the simulation parameters (defined in appendix E) are NPR3=16, NEDGE=1 and LPV=2.

The values of α at which simulations were conducted were chosen simply `by eye': first a few simulations were performed at α = 0.25, 0.50, 0.75 and 0.95, and then other simulation points were chosen on the basis of an assessment of where ⟨E_SW⟩ − ⟨E_ES⟩ was changing most rapidly. The largest value of α used with the centre of mass free was α = 0.995; we also performed a simulation at α = 1 but with the centre of mass constrained. For each simulation point, estimates of ⟨E_SW⟩ − ⟨E_ES⟩ were output every 500 to 1000 lattice sweeps, and examination of the behaviour of these block averages was used to determine when convergence had occurred: the `best estimate' of ⟨E_SW⟩ − ⟨E_ES⟩ was then obtained by averaging over subsequent blocks. When changing α, the final configuration from a run at a nearby value of α was used as the starting configuration to reduce the equilibration time, though this still became lengthy as α → 1. Typically 10⁴ lattice sweeps were performed for each α, of which about 8000 were used for estimation; for α = 0.99 (for v = 0.72202) and α = 0.995, 2 × 10⁴ sweeps were generated and (1 to 1.5) × 10⁴ used. The simulations achieved a speed of about 7000 sweeps/hour.

Altogether, for v = 0.752, 1.42 × 10⁵ lattice sweeps, representing about 20 hours' processing time, were performed at 13 values of α, and 1.07 × 10⁵ of them were used in evaluating ⟨E_SW⟩ − ⟨E_ES⟩. For v = 0.72202, 1.55 × 10⁵ lattice sweeps were performed at 15 values of α and 1.35 × 10⁵ of them were used. For both densities, these figures include expanded ensemble simulations of length 5000 sweeps that were used to connect to the state α = 0. We consider in retrospect that more accurate results would probably have been obtained if more values of α had been considered and less time spent on each one, even though this would have increased the fraction of time spent on equilibration.

Let us first consider v = 0.752. The MC results are shown in figure 4.5. The data points show the raw MC results for (1/N)[⟨E_SW⟩ − ⟨E_ES⟩] with their error bars; the solid line is the spline fit to (1/N)[⟨E_SW⟩ − ⟨E_ES⟩ − q(α)]. We should note that q(1) = −114006 on the same scale as the figure; so (1/N)[⟨E_SW⟩ − ⟨E_ES⟩] would reach about −3500 at α = 1. This makes it strikingly clear that performing the thermodynamic integration accurately without the analytic correction would be a near impossibility.
Figure 4.5. v = 0.752. Diamonds: (1/N)[⟨E_SW⟩ − ⟨E_ES⟩] measured by MC; solid line: spline fit to (1/N)[⟨E_SW⟩ − ⟨E_ES⟩ − q(α)]. The main figure shows the whole range of α while the inset shows the region 0.95 ≤ α ≤ 1. We emphasise that the point exactly at α = 1 is generated with the centre of mass constrained to be stationary, while it is free for all the other values of α; see the discussion of this problem in section 4.2.1.
We find

(1/N)\int_{0.05}^{1} \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha - q(\alpha)\right] d\alpha = -6.880(3)

(1/N)\int_{0.05}^{1} q(\alpha)\, d\alpha = -0.916356\ldots

so

(1/N)\int_{0.05}^{1} \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha\right] d\alpha = -7.796(3)

And from the single expanded ensemble simulation used to link α = 0 and α = 0.05, (F_{α=0.05} − F_ES)/N = −0.1049. The Einstein solid's spring constant, generated according to the criterion expressed in equation 4.4, was k_s = 54717, which leads to f_ES = 18.5305. Thus

f_{SW} = f_{ES} + (1/N)\int_0^1 \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha\right] d\alpha
= f_{ES} + (F_{\alpha=0.05} - F_{ES})/N + (1/N)\int_{0.05}^{1} \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha\right] d\alpha
= 18.5305 - 0.1049 - 7.796(3)
= 10.629(3)

The corresponding results for v = 0.72202 are shown in figure 4.6.
Figure 4.6. v = 0.72202. Diamonds: (1/N)[⟨E_SW⟩ − ⟨E_ES⟩] measured by MC; solid line: spline fit to (1/N)[⟨E_SW⟩ − ⟨E_ES⟩ − q(α)]. The main figure shows the whole range of α while the inset shows the region 0.95 ≤ α ≤ 1.
We find

(1/N)\int_{0.01}^{1} \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha - q(\alpha)\right] d\alpha = -9.562(4)

(1/N)\int_{0.01}^{1} q(\alpha)\, d\alpha = -1.0469854\ldots

so

(1/N)\int_{0.01}^{1} \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha\right] d\alpha = -10.609(4)

And (F_{α=0.01} − F_ES)/N = −0.07724 from the expanded ensemble simulation. This time we have k_s = 460170 (it must be stronger in this denser solid to maintain the criterion 4.4, i.e. to prevent hard-core overlaps), leading to f_ES = 22.6829. Thus

f_{SW} = f_{ES} + (1/N)\int_0^1 \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha\right] d\alpha
= f_{ES} + (F_{\alpha=0.01} - F_{ES})/N + (1/N)\int_{0.01}^{1} \left[\langle E_{SW}\rangle_\alpha - \langle E_{ES}\rangle_\alpha\right] d\alpha
= 22.6829 - 0.0772 - 10.609(3)
= 11.997(3)
The errors in both cases were estimated from the size of the error bars on the MC data points. The error is dominated by points near α = 1, so only the effect of these points on the estimate of f was considered. We have thus obtained estimates of the Helmholtz free energy f for the two densities under consideration; however, this is not sufficient to establish that these are indeed the densities of the coexisting phases at β = 10/13, since the condition for coexistence is of course that the Gibbs free energies of the two phases should be equal, and g differs from f by a pV term⁶ that is still unknown, because the pressure p is unknown. For many systems, p can be estimated from a constant-V simulation as a canonical average ⟨p(V)⟩ [45], but the expression contains the average interparticle force, which for the square-well system is a pair of delta-functions and so inaccessible. To see what the magnitude of the pV term is, therefore, it would be necessary to do more simulations for different specific volumes around those already investigated, to establish the shape of f(v) in these two regions. Then the coexistence pressure and the densities of the coexisting phases could be established with the double-tangent construction [47] (sketched in code below). We do not attempt this here, but to demonstrate that good accuracy has been achieved in the integration estimates f_TI of the absolute Helmholtz free energy, we show in figure 4.7 f_TI at v = 0.752 and v = 0.72202 together with f^xc(v) calculated from the results of section 4.3. This method does not yield absolute free energies, so the vertical scale has been fixed by constraining the two estimates to be equal at v = 0.752. The desired consistency check is thus obtained by the good agreement at v = 0.72202.

[6] and 1/N corrections
Figure 4.7. The absolute Helmholtz free energy f as a function of specific volume v. The dashed curve is obtained with the multicanonical NpT-ensemble of section 4.3, the absolute additive constant being established using the absolute free energy, calculated in this section, for v = 0.752. The absolute free energies f_TI are marked by circles. The solid line is the double tangent to the dashed multicanonical curve. (The inset shows the region around v = 0.722.)
We also show the double tangent to the multicanonical f(v); the points of tangency estimate the specific volumes of the coexisting phases and its gradient estimates the coexistence pressure. The results are p_coex = 44.69, ⟨v⟩_dense = 0.72019 and ⟨v⟩_expanded = 0.759032. These are appreciably different from v = 0.72202 and v = 0.752, which were chosen, it will be recalled, because they were the estimates for ⟨v⟩_coex given in [69]. However, we shall find in section 4.3 that these discrepancies in p_coex and ⟨v⟩ can be largely attributed to finite-size effects in this small N = 32 system; in [69] the system size was N = 108.
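For reference, the double-tangent construction used here can be carried out on tabulated f(v) as in the following sketch (ours; the initial guess simply uses the coexistence volumes quoted above, and the pressure is read off as p = −∂f/∂v along the tangent):

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import fsolve

def double_tangent(v, f, guess=(0.7202, 0.7590)):
    """Find v1, v2 with f'(v1) = f'(v2) = (f(v2) - f(v1))/(v2 - v1);
    returns the coexisting specific volumes and the coexistence
    pressure p = -f'(v1)."""
    cs = CubicSpline(v, f)
    df = cs.derivative()
    def eqs(x):
        v1, v2 = x
        slope = (cs(v2) - cs(v1)) / (v2 - v1)
        return [df(v1) - slope, df(v2) - slope]
    v1, v2 = fsolve(eqs, guess)
    return v1, v2, -float(df(v1))
```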
4.2.2 Expanded Ensemble with Einstein Solid Reference System

It is also possible to use an expanded ensemble approach to measure the free energy difference between the square-well and Einstein solids.

The intermediate systems are defined by a potential energy function that is once again a linear combination of the square-well and Einstein potential energies, just as it was in the TI approach, so that it is possible to make a direct comparison between the two methods. We shall also briefly investigate two other important issues in the use of the expanded ensemble: we shall look at the effect of the number of intermediate states used, and we shall explicitly confirm the result of section 3.4.2 that the subdivision of the intermediate states into groups to be simulated separately does not affect the overall accuracy.
The simulation itself must be adapted to produce the expanded ensemble. First define

Z(\alpha_i) = \sum_{\{\mathbf{r}\}} \exp[-\beta E(\alpha_i)]

for some suitable set {α_i}, i = 1 ... N_m. Now, as we know from section 2.2.3, by allowing transitions between the `subensembles' (according to rules defined below), we may construct an expanded ensemble with partition function

Z = \sum_{i=1}^{N_m} Z(\alpha_i) \exp(\eta_i)

which, given that α_1 = 0 (so that F(α_1) ≡ F_ES), leads to

F(\alpha_i) - F_{ES} = \beta^{-1}\left[(\eta_i - \eta_1) - \ln(P_i/P_1)\right]   (4.7)

In particular, since α_{N_m} = 1, so that F(α_{N_m}) ≡ F_SW,

F_{SW} - F_{ES} = \beta^{-1}\left[(\eta_{N_m} - \eta_1) - \ln(P_{N_m}/P_1)\right]   (4.8)
We use the MC simulation to measure the last term, ln(P_{N_m}/P_1), arranging for the set η to be such that P_i ≈ P_j ∀ i, j. The methods of chapter 3 could have been adapted⁷ to find a suitable η, but since it is easy to obtain estimates of F(α) from the TI results of section 4.2.1 we chose to use these right from the start rather than to make an independent estimate of η starting from α = 0.

[7] In the next section we do adapt them to find η^xc(V) for the multicanonical NpT-ensemble.
We use the Metropolis algorithm both to make particle moves within each ensemble and to make transitions between the different ensembles. To calculate the Metropolis function for ensemble-changing moves is computationally extremely cheap, since we do not move the particles while changing ensemble and so need only multiply E_SW and E_ES by the appropriate values of α and (1 − α) before and after. However, there is some cost in communication on the Connection Machine, since clearly the prevailing value of α must change in all parts of a simulation at once, so all the subvolumes of each simulation must be considered together. There is thus some reshaping and summing of arrays to be done before the move, and broadcasting of the results afterwards, which requires the use of general communication routines. This restriction corresponds to the `rule' discussed in section 3.2.5 that any attempted updates in a single simulation of any coordinates or parameters must be made serially if they change α. However, we do of course make the trial ensemble-changing moves in parallel for all the N_r independent simulations, the prevailing value of α in each simulation being independent of its value in the others.
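A sketch of the subensemble-changing move and of the visited-states estimator of equation 4.7 (ours; rng is any numpy random generator, and the η array is indexed by subensemble):

```python
import numpy as np

def accept_switch(E_sw, E_es, alphas, eta, i, j, beta, rng):
    """Metropolis test for the move alpha_i -> alpha_j at fixed particle
    coordinates; only the two scalars E_SW and E_ES are needed, because
    E(alpha) = alpha*E_SW + (1 - alpha)*E_ES."""
    dE = (alphas[j] - alphas[i]) * (E_sw - E_es)
    log_acc = min(-beta * dE + eta[j] - eta[i], 0.0)
    return np.log(rng.random()) < log_acc

def free_energies(visits, eta, beta):
    """Equation 4.7: F(alpha_i) - F_ES from the visit histogram."""
    P = visits / visits.sum()
    return ((eta - eta[0]) - np.log(P / P[0])) / beta
```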
Before presenting the results, we shall comment again on the end states α = 0 and α = 1. First, we should note that with the expanded ensemble we have no problem reaching the pure Einstein solid at α = 0. Transitions into α = 0 are not in any way special, and, while it is the case that some attempted transitions out of α = 0 will find that the trial energy is infinite, because they come from an Einstein solid configuration where there would be an overlap of the hard cores of the square-well potential, this simply results in a transition probability of exp(−∞) = 0. Thus, this situation is handled transparently by the algorithm, and we need only ensure that the spring constant k_s is large enough that a reasonable number of configurations with no overlaps are generated. This is in contrast to TI, where the presence of any states with hard-core overlaps in the α = 0 ensemble prevents the evaluation of ⟨E_SW⟩ − ⟨E_ES⟩. As we said before, we used a simple two-state expanded ensemble to handle this in the previous subsection. In that simple case, adequate sampling of both states was obtained without any preweighting, that is to say, with η_1 = η_2 = 0.

Once again, however, the wandering of the centre of mass of the simulation as α → 1 presents difficulties. Though we believe it would be possible to find some way to fix this problem within the expanded ensemble formalism, we have simply avoided it in practice by stopping at α = 0.99 instead of extending the simulations right up to α = 1. Thus, the free energies that we obtain are of interest only in so far as they compare with corresponding measurements made with TI, and in so far as they enable us to further investigate some other questions relating to the expanded ensemble/multicanonical ensemble method itself.
Comparison with Thermodynamic Integration
Using the estimates obtained from TI, we constructed a weighting function η_xc for the system with specific volume v = 0.752. The size of the system and all the simulation parameters are the same as for TI. We used N_m = 20 subensembles (results for other choices of N_m are given below), and chose the α_i so that η(α_i) − η(α_{i+1}) ≈ constant.

A total of 5 × 10⁴ lattice sweeps of the expanded ensemble were performed, with all simulations starting in the α = 0 subensemble. 3.5 × 10⁴ sweeps were used for estimation of the free energy differences, beginning when it was clear from examination of the visited states that equilibration had occurred⁸. We then estimate f(α_i) from equation 4.7, with P_i determined using simple visited-states estimators. The error bars come from jackknife blocking.
To compare the results of TI and the expanded ensemble, we calculate f(α), the free energy at various points along the path (f(α) = F(α)/N, where F(α) is defined in equation 4.3). Graphs of the two estimates of f(α) are indistinguishable, so we instead plot the difference between the two (see figure 4.8). The sizes of the random errors are comparable, with those on the TI points slightly smaller, as one would expect given that rather more time was devoted to it than to the expanded ensemble.
It is apparent that, as in section 3.3.2, the thermodynamic integration and expanded ensemble estimates differ by substantially more than the random error in certain parts of the integration range. In the Ising case, the availability of the exact results meant that we could attribute the discrepancy entirely to systematic errors in the TI points due to a phase transition on the path of integration. That is not the case here; lacking exact results, we can only speculate about which result is the more accurate. However, it seems more likely that the fault once again lies with thermodynamic integration, and is again caused by choosing the simulation points too far apart in a region where the integrand is changing rapidly; here that is the region near α = 1. We have already commented that the simulation points perhaps should have been chosen closer
⁸ The lengthy equilibration time, already a problem here, becomes so long in section 4.3 that special techniques must be introduced to overcome it.
[Figure 4.8 appears here: f_EE − f_TI (triangles) and the `Thermodynamic Integration (reference)' points (circles on the horizontal axis) plotted against α from 0.0 to 1.0; the vertical axis, labelled `free energy difference', runs from −0.005 to 0.010.]
Figure 4.8. The difference between estimates of f(α), the free energy per particle along the path transforming the Einstein solid into the square-well solid. f_TI from thermodynamic integration is taken as the reference, so all its points (circles) lie on the horizontal axis; only the error bars are of interest. The other points (triangles) show f_EE − f_TI, where f_EE is the estimate of the free energy from the expanded ensemble.
together. Nevertheless, no strong conclusion on the relative merits of the two methods can be
drawn from these data.
As in the previous section, though we have determined F_SW with high accuracy, we cannot say anything about phase coexistence, because we do not know the shape of f(v) or the pressure. To map out f(v), the procedure could be repeated for other specific volumes, or the results of the NpT simulation could be used, as they were before.
4.2.3 Other Issues
Let us return to questions related to the most efficient use of the expanded ensemble method. First, how many values of α is it best to choose? With the Ising model there is a natural `granularity' to the problem: that of the discrete macrostates. Here that is not the case, so we have investigated the effect that changing N_m has on the accuracy of the results. Second, we have used the simulation to confirm the result, derived analytically for a simple Markov process in section 3.4.2, that the accuracy of the estimate of (P_{N_m}/P_1) is not improved by dividing up the N_m states into overlapping subgroups.
N_m     f_EE          r_a      τ_RW
 5      -6.84(4)      0.009    22 000
10      -6.946(16)    0.21      1 900
20      -6.957(12)    0.54      2 200

Table 4.1. Estimates of the free energy difference f_EE, the average acceptance ratio r_a of α-changing moves, and the random walk time τ_RW (in lattice sweeps) for the square-well solid with various N_m.
First, then, let us consider the effect on the accuracy of the estimate of f_EE ≡ f(α = 0.99) − f(α = 0.0) of dividing up the range of α into 20, 10 and 5 parts. The values of η to be used are generated from the TI results, arranged so that β(F_{i+1} − F_i) is constant for all states i (though equation 2.20 for the prediction of the spacing of states, given in section 2.2.3, suggests that this may not have been the best policy). The probabilities of the subensembles are estimated with the visited-states (VS) method.
The results are shown in table 4.1. The data for each value of N_m were produced from 6 blocks of 670 `sweeps' each, with two such blocks discarded for equilibration and all the simulations started in the α = 0 state. One sweep here comprises one attempted update of all the particles and one attempted change of the prevailing α for each replica. The value of τ_RW is a rough estimate only, calculated from the fraction of the N_r = 4096 simulations that in fact did make a random walk during the course of one block.
The results do demonstrate clearly that reducing N_m to a very small value (5 here) impairs overall accuracy by reducing the acceptance ratio r_a. They also suggest that there is quite a wide range of N_m that gives an acceptable accuracy; both N_m = 10 and N_m = 20 would be usable. The random walk time is about the same for these two; the larger acceptance ratio of the N_m = 20 simulation compensates for the greater distance to be covered.

We should note that the results obtained here for f_EE, when compared with the results of the longer runs that we are about to present, suggest a systematic underestimate of the magnitude of f_EE. This is probably attributable to insufficient equilibration time, given that all the replicas are launched from a single state, the α = 0 state. This explanation is reinforced by the particularly poor agreement of the N_m = 5 results, where equilibration over α was the slowest because of the low acceptance ratio.
Now let us concentrate on the case where a total of 20 states are used in the range 0 ≤ α ≤ 0.99, and investigate the effect of subdividing the range into b subgroups, with each replica simulation restricted to one subgroup, and with one state in common between adjacent subgroups.
b    N_mj                   f_EE
1    20                     -6.9715(10)
2    10,11                  -6.9680(9)
4    6,6,5,6                -6.9717(12)
9    4,3,3,3,3,3,3,3,3      -6.9694(9)

Table 4.2. The behaviour of the expanded ensemble estimator f_EE upon subdivision of the range of the expanded ensemble. N_mj represents the number of subensembles in the jth subgroup (j = 1...b) and so shows how the subensembles are allocated to the subgroups. The estimates of the error are scaled by the square root of the total number of sweeps performed.
Once again the simulations are performed in blocks, here 5–10 blocks of 670–2000 sweeps each, and calculations are performed on each block so that the standard error of the final estimators can be determined. The measured standard errors have been corrected for the variation in the total number of lattice sweeps performed. Probabilities are measured with VS estimators, where the probability P̃_i of a state is estimated from the number of visits C_i made to it, so there is an increasing need to discard early iterations as b decreases, as the time for equilibration over the subensembles lengthens; for the b = 1 case it is necessary to discard the first four of thirteen blocks. Use of transition probability estimators with a uniform initial distribution of replicas over the subensembles (see section 4.3) might have reduced this problem, though equilibration within the high-α ensembles would still have been difficult. The results are shown in table 4.2.
The most significant results are the measured sizes of the error bars on the estimates of f_EE, which seem unaffected by the process of dividing the range. Certainly our results allow us confidently to exclude the `fallacious argument' of section 3.4.2, which would suggest that f_EE should be estimated with three times the accuracy of a single expanded-ensemble run when the range is divided into nine parts. The data thus provide reasonably strong evidence in support of the argument of that section, that the accuracy of the estimate of (P_1/P_{N_m}) is independent of b.
4.3 Direct Method: Multicanonical Ensemble with Variable V
We now wish to estimate the phase transition pressure and the volumes of the coexisting phases for the square-well solid, using a direct method that eliminates the need for a reference system. We shall do this by applying the multicanonical ensemble that was introduced in chapter 3. We shall study finite-sized systems by generating and using a flat sampled distribution extending
over a wide range of macrostates, which in this case are macrostates of V and must embrace the volumes of the coexisting phases. This distribution can then be reweighted to obtain estimates of F(β, v) and the canonical probability density P_N^can(p, v) for a range of values of p. From this we can find the coexistence pressure p_coex and canonical averages.

Away from coexistence, P_N^can(p, v) consists of a single Gaussian peak, but near p_coex it develops a double-peaked structure, as shown schematically in figure 4.9. Each peak corresponds to one of the coexisting bulk phases. It is obvious that the natural finite-size estimator of p_coex is that for which the two peaks have equal weight, that is to say P_N^can(phase A) = P_N^can(phase B) = 1/2, where

P_N^{can}(\text{phase } j) = \int_{v \in \text{phase } j} P_N^{can}(p_{coex}, v) \, dv

We shall discuss in section 4.3.6 below the way that this and other finite-size estimators of p_coex approach the infinite-volume limit. We note that, except near the critical point or for very small systems, there is a region of very low canonical probability between the two peaks, which will be enhanced in the multicanonical sampling.
[Figure 4.9 appears here: a sketch of P(v) against v showing two peaks separated by a deep minimum.]

Figure 4.9. Schematic diagram of a typical canonical probability density P^can(v) for the square-well solid at phase coexistence.
We shall present and comment upon the results that are produced in section 4.3.5; however, since there are some important differences between the way that the multicanonical ensemble is generated and used here and the way it was used in chapter 3, we shall first devote some time to explaining and justifying the procedure adopted in this chapter. We shall first describe how we have implemented the variable-volume multicanonical ensemble; then we shall explain why it is
that, for this system (and this computer), τ_RW, the time required for even a single random walk over all the volume macrostates of the square-well system, becomes prohibitively long. Then we shall describe in detail the procedure for estimating the sampled distribution in spite of this, showing in particular that the transition probability (TP) estimators introduced in section 3.2.3 here outperform visited-states (VS) estimators in all stages of the simulation process. Thus, there are two main parts to this section: in the first we describe the technique we use to produce the results; in the second we concentrate on the physical behaviour of the square-well system.
4.3.1 The Multicanonical NpT-Ensemble and its Implementation
The appropriate Gibbsian ensemble for describing this system is defined by the partition function

Z_N(p) = \int_0^\infty dV \int_V d^N r \, \exp[-\beta(pV + E(r^N))]    (4.9)

with associated probability densities

P_N^{can}(r^N, V) = (1/Z_N(p)) \exp[-\beta(pV + E(r^N))]

and

P_N^{can}(V) = (1/Z_N(p)) \int_V d^N r \, \exp[-\beta(pV + E(r^N))]
It is quite easy to construct a Monte-Carlo scheme for sampling from this distribution (constant-NpT MC); however, as described in section 1.2.3, the barrier of low probability between the phases in this case means that a very lengthy simulation would be needed to estimate the relative probabilities of the two phases, even if they were of roughly the same order of magnitude. Otherwise, the best that can be done is probably to put a wide bracket on p_coex: p_coex is certainly less than a pressure p_h that drives a simulation started in the rare phase into the dense phase, where it is then observed to stay, but it is certainly more than a pressure p_l that allows a simulation started in the dense phase to pass into the rare phase, where it then stays.
Instead, we chose a multicanonical approach with the order parameter V preweighted by η_xc(V) such that the sampled distribution is approximately flat over some range of V, from V_α to V_ω say. The multicanonical p.d.f.s are thus

P_N^{xc}(r^N, V) = (1/Z_N^{xc}) \exp[\eta_{xc}(V) - \beta E(r^N)]

and

P_N^{xc}(V) = (1/Z_N^{xc}) \int_V d^N r \, \exp[\eta_{xc}(V) - \beta E(r^N)]

and we recover the canonical probability using

P_N^{can}(p, V) \propto \exp[-\beta p V - \eta_{xc}(V)] \, P_N^{xc}(V)

which we normalise using

\int_{V_\alpha}^{V_\omega} P_N^{can}(p, V) \, dV \approx \int_0^\infty P_N^{can}(p, V) \, dV = 1
This gives a good estimate of P^can for any p, provided that p, V_α and V_ω are such that equation 4.13 is true (i.e., provided that the canonical p.d.f. has effectively all its weight in the multicanonical sampling range). Thus it is not necessary to know p_coex a priori, only to have a rough idea of the volumes of the coexisting phases so that V_α and V_ω may be chosen to bracket them⁹.
Computational Details

We start from the `core' CM program described in appendix E. We mention at this point the choice of the maximum displacement Δx of particle moves in the configurational updates. The same Δx was used for all the volumes, and it was varied only a little between simulations done at different temperatures; its value was Δx = 0.006σ (about half the width of the potential well). With this choice, the acceptance probability varied between about 0.25 in the states with lowest V and about 0.9 in those with highest V. However, to implement the multicanonical NpT-ensemble we need to make volume-changing moves as well as coordinate updates. The volume changes are realised by making uniform contractions or dilations of the box that leave the relative positions of the particles unchanged. To avoid the necessity of updating the particles' position coordinates whenever a volume change is accepted, we work with scaled coordinates s = r/L, where L³ = V. As a consequence, the potential energy function becomes a function of V:

E'(V, s^N) = \sum_{i, j<i} E'(V, s_{ij})

where

E'(V, s_{ij}) = \begin{cases} \infty & \text{if } s_{ij} \le \sigma/L \\ -E_0 & \text{if } \sigma/L < s_{ij} < (1+\delta)\sigma/L \\ 0 & \text{if } s_{ij} \ge (1+\delta)\sigma/L \end{cases}

i.e. there is a volume-dependent effective hard-core diameter σ' = σ/L.

⁹ Even if we do not define this interval correctly, the method is robust; we either find a single-peaked P^can(p_coex, V) straddling our estimate of the location of the phase boundary, or a P^can(p_coex, V) that increases up to V_1 or V_{N_m} and then cuts off. In either case the necessity of widening the interval is clear, and, at least in the second case, it is clear how this should be done.
Particle moves are made (in parallel within the same simulation, where possible) using the usual Metropolis rule, which is here:

P(s^N \to s'^N) = \min(1, \exp[-\beta(E'(V, s'^N) - E'(V, s^N))])    (4.10)

The volume changes are made by discretising the range of V into a set {V_i}, i = 1...N_m.¹⁰ The Metropolis rule is used in the modified form

P(V_i \to V_j) = \min(1, \exp[N \ln(V_j/V_i) + \eta_{xc}(V_j) - \eta_{xc}(V_i) - \beta(E'(V_j, s^N) - E'(V_i, s^N))])    (4.11)

where the (scaled) coordinates are left unchanged and j is restricted to be i + 1 or i − 1, chosen with equal probability (trial moves that would take us outside the chosen range of V
are immediately rejected). The N ln(V_j/V_i) term reflects the Jacobian of the transformation from r to s coordinates in the partition function Z_N^xc. Even though the configuration does not change, we must still recalculate all the particle-particle interactions, since the effective hard-core diameter σ' does alter, and then sum E'(V, s^N) over all the subvolumes of each simulation. This requires general communication on the CM, and so is quite an expensive process. Our procedure is to perform `sweeps' consisting (usually) of one attempted update of the positions of all the particles and one attempted volume change. One `iteration' then consists of N_s sweeps, after which we update η. We shall discuss the effect of the relative frequency of coordinate updates and volume changes in section 4.3.3 below.
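A sketch of the volume-changing move of equation 4.11. The function names and the E′ callable are illustrative scaffolding of ours; the infinite-energy branch shows how a hard-core overlap gives exp(−∞) = 0 and an automatic rejection:

```python
import numpy as np

rng = np.random.default_rng()

def try_volume_move(i, V, eta, s, N, beta, E_prime):
    """One attempted volume transition V_i -> V_j (equation 4.11).
    j = i+1 or i-1 with equal probability; proposals off the grid are
    rejected.  E_prime(V, s) returns the total square-well energy of the
    scaled configuration s at volume V (np.inf on a hard-core overlap)."""
    j = i + rng.choice([-1, 1])
    if j < 0 or j >= len(V):
        return i                        # outside the chosen range: reject
    dE = E_prime(V[j], s) - E_prime(V[i], s)
    if np.isinf(dE):                    # overlap at V_j: exp(-inf) = 0
        return i
    ln_acc = N * np.log(V[j] / V[i]) + eta[j] - eta[i] - beta * dE
    return j if np.log(rng.random()) < ln_acc else i
```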
¹⁰ We do not believe that it is essential to discretise V; it could be left continuous and the transitions grouped into a histogram. However, the TP method would then encounter the same difficulties that we found in section 3.2.3, coming from imperfect equilibration within each V-state; in particular, underestimation of the eigenvector of the sampled distribution would result (though, as we shall see, this problem may arise anyway).
The ensemble produced by this update procedure has partition function

\hat{Z}_N^{xc} = \sum_{i=1}^{N_m} V_i^N \exp(\eta_{xc}(V_i)) \int_0^1 d^N s \, \exp[-\beta E'(V_i, s^N)]

and we measure

\hat{P}_N^{xc}(V_i) = (1/\hat{Z}_N^{xc}) \, V_i^N \exp(\eta_{xc}(V_i)) \int_0^1 d^N s \, \exp[-\beta E'(V_i, s^N)]

and then reconstruct the canonical ensemble by

\int_{V_q}^{V_r} P_N^{can}(p, V) \, dV \propto \sum_i{}' \, \Delta V_i \exp[-\beta p V_i - \eta_{xc}(V_i)] \, \hat{P}_N^{xc}(V_i)    (4.12)

where ΔV_i = V_{i+1} − V_i and Σ′ signifies that the sum is restricted to the range V_q < V_i < V_r. As before, normalisation follows from

\sum_{i=1}^{N_m} \Delta V_i \exp[-\beta p V_i - \eta_{xc}(V_i)] \, \hat{P}_N^{xc}(V_i) \approx \int_{V_\alpha}^{V_\omega} P_N^{can}(p, V) \, dV \approx 1    (4.13)
4.3.2 The Pathological Nature of the Square-Well System

We have not yet said how the set {V_i} is to be chosen; as in an expanded ensemble calculation, there is no `natural granularity' to the macrostate space, so we should choose the number and spacing of the states to give the best performance. As we outlined in section 4.2.3, there is a trade-off between r_a, the acceptance ratio of volume-changing moves, and the number of (accepted) steps required to cross the macrostate space, and near-optimality is obtained for a fairly wide range of possible choices, as long as the acceptance ratio is kept fairly high (≳ 0.5). A suitable way of choosing the states of an expanded ensemble simulation is described in section 2.2.3. However, the square-well system presents unusual difficulties in this regard: because of the hard core in the potential, any trial volume change that produces even a single hard-core overlap will be rejected, which means that only small volume changes have a reasonable chance of being accepted. Thus we find that N_m must increase rapidly with N, and must be made quite large even for a small system, to avoid a very low acceptance rate. Moreover, the problem is more acute at small specific volume, where the particles are more closely crowded together, so the spacing of the states should vary with V to achieve a roughly constant acceptance ratio. In practice, for N = 32 we experimented to find spacings that gave an acceptance ratio of about
0.1–0.5. We used only two values of ΔV, the smaller at low volumes and the larger (twice as big) at high. For N = 108 and N = 256 we used a rough ansatz for the dependence of r_a on V and ΔV to generate a suitable set {V_i}, with N_m determined by the desired starting and finishing volumes. This method turned out to be quite effective in keeping r_a constant: we find that for the N = 108 system at β = 1.0 it varies between 0.3 and 0.7, while for the N = 256 system at the same temperature it varies between 0.3 and 0.6 (it is higher at the low-volume end, so we are slightly overcompensating for the increased difficulty of accepting volume transitions there). For comparison, r_a varies between 0.4 and 0.05 for the N = 32 system with its two different values of ΔV.
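The ansatz itself is not reproduced in the text; purely as an illustration of the construction, a grid can be grown from any assumed spacing function ΔV(V):

```python
import numpy as np

def volume_grid(v_start, v_stop, N, dv_of_v):
    """Generate the set {V_i} by integrating a spacing ansatz
    dv_of_v(V) -> step Delta V at volume V, chosen so that the acceptance
    ratio of volume moves stays roughly constant.  dv_of_v is a
    placeholder; the actual ansatz used in the thesis is not given here."""
    V = [v_start * N]
    while V[-1] < v_stop * N:
        V.append(V[-1] + dv_of_v(V[-1]))
    return np.array(V)

# illustrative ansatz: the step grows with V (tighter grid when dense)
grid = volume_grid(0.70, 0.85, N=108, dv_of_v=lambda V: 2.5e-4 * V)
print(len(grid))   # N_m implied by the chosen spacing and endpoints
```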
More important is the scaling of N_m with N. The procedure just described, which as we have said is found to keep r_a roughly constant, produces N_m = 281, 1292, 2949 for β = 1.0 and N = 32, 108, 256, so that it seems that N_m ∝ N (which can also be predicted by approximate scaling arguments). In a normal expanded ensemble calculation, by contrast, we would expect that N_m would not need to be so large even for the smallest N, and that to keep r_a constant would require only N_m ∝ N^{1/2}, since the scaling of the macrostate space to be covered (∝ N) would be partially cancelled by the N^{1/2} scaling of the size of typical fluctuations (see section 2.2.3 and [124, 125]). It is this necessity of using large numbers of volume macrostates, combined with the rather slow speed per replica simulation of the Connection Machine, that produces the very long random walk time τ_RW. As we shall show in section 4.3.4, this forces us to modify the way that the multicanonical ensemble is used, extending the use of the TP estimators to all stages of the simulation process.
We shall now go on to explain how an approximately multicanonical distribution is generated (section 4.3.3), and then (section 4.3.4) how it is used to produce the desired final estimators of canonical averages and quantities related to the phase transition. The procedure is, once again, an iterative one, with the iterations numbered by n as in chapter 3.
4.3.3 Finding the Preweighting Function

Here we shall deal with the `finding' stage of the multicanonical distribution, where η^n converges uniformly towards its `ideal' value η^∞. In section 3.2.3 we have already established the utility of using TP estimators in this stage of the process, so we shall not justify their use further. However, we shall find that the way that coordinate updates and volume changes are separated in the square-well system enables us to gain useful further insight into why the convergence process takes place as it does.
Let us show how the finding process works by considering it in action for an N = 32 system. We start with all the particles on their lattice sites, and for each simulation choose the volume macrostate index i with uniform probability in 1...N_m. (Thus we start with the N_r replica simulations distributed fairly uniformly through macrostate space, rather than launching them all from only a few macrostates, as in section 3.2.3.) We then perform about 1000 equilibration sweeps through the lattice, updating the particles' positions but not yet attempting the volume-changing moves. The purpose of this is to allow the Markov chains to reach equilibrium at constant volume, so that P(s^N|v), analogous to P(σ|i) from section 3.2.3, has its canonical form. Because the random walk time is so long for this system, it is much quicker and easier to establish equilibrium this way than it would be by allowing the replica simulations to spread out over all these volume states from only a few release macrostates. Then we allow volume changes as well, and gather histograms of volume transitions over short iterations with N_s = 250. This time is much shorter than τ_RW, so each replica will explore only its local macrostates; however, we can find estimators for the whole macrostate space by pooling all the transitions of all N_r replicas into a histogram C_ij at the end of each iteration (C_ij^n at the end of the nth iteration). Then we use C_ij^n to estimate the TP matrix σ_ij^n (using equation 3.25), and update η^n using the simple scheme

\eta^{n+1} = \eta^n - \ln \tilde{P}^n + \text{constant}    (4.14)

where the estimator P̃^n is the eigenvector of the estimated transition matrix σ̃_ij^n. This is the same scheme that was used in section 3.2.3. In so far as P̃^n is a good estimator, we would expect η^{n+1} to be multicanonical.
In fact, as we shall see, η^n tends to converge to a limit over the course of several iterations, just as in section 3.2.3. As always, the accuracy of the TP estimators relies on equation 3.26 being satisfied, which means here that we must be able to achieve equilibration at constant volume, even for those volumes that lie between the two equilibrium phases, in the initial constant-volume equilibration phase, and that we must then be able to re-establish it by coordinate updates after each accepted volume change. We shall present strong numerical evidence below to show that this approximation works well while converging to η^∞, and shall show that it is essentially exact for the multicanonical distribution. However, it does mean that our method would not be applicable to a system like a spin glass, where equilibration is very difficult except
in a particular set of macrostates. To simulate a spin glass that required the same computational effort per update as the square-well system would probably require a substantially more powerful computer than the CM.
We now comment briefly on the choice of N_s. It is invariably the case that the initial P^can(V), and the early (small-n) iterates P^n(V), have almost no weight in the low-volume states, so that, notwithstanding the slow evolution of V, these states tend to become unoccupied as the simulation proceeds. This implies that η^n should be updated rapidly on the basis of relatively few lattice sweeps per simulation, so that the simulations do not have time to move far, and it is not necessary to re-initialise and re-equilibrate them to access the low-volume states once again¹¹. There is clearly a trade-off to be made between this and the requirement that enough transitions should be recorded for C_ij to estimate σ_ij accurately (an effective `fullness criterion', cf. N_TP in section 3.2.3). We shall return to this matter, and to the effect of the amount of configurational updating performed between V-transitions, a little later.
In figure 4.10 we show the results of the `finding-η' process for the N = 32 system with β = 10/11, N_m = 237, N_r = 4096 and N_s = 250 (the results are shown as a function of v = V/N; this will be our policy from now on, as it allows results from different system sizes to be more easily compared). It is apparent that η, shown in the large upper figure and its inset, converges to η_xc (covering 36 decades of probability) within 8 or 9 iterations, which require only about 25 minutes to perform on this small system. The lower figures show the VS histograms; they are not used for updating but are shown to give an indication of the distribution of the positions of the simulations at each stage of the iterative procedure. An initial tendency to `equilibrate' by moving to high-volume states, which have high equilibrium probability for η ≈ 0, is quickly reversed (though the lowest-volume states do briefly become unoccupied in the 2nd iteration). We emphasise that the histograms do not reflect in any real sense the underlying sampled distribution; the simulations do not move far enough (an average of only 1/30th of the width of the macrostate space) during each iteration for the effect of the starting state to disappear. Thus, here, where the starting states are uniformly spread out, the histograms give the impression of a sampled distribution that is more uniform than is really the case; wherever in the macrostate space they were initially clustered, they would seem to indicate that that region was the most probable, irrespective of its real equilibrium probability.
¹¹ While it is true that simulations that had left the lowest states would eventually occupy them again, once a multicanonical distribution over that part of macrostate space had been established, the time to re-occupy these states (under a random walk) is much longer than the time to leave them (under a directed walk).
To show the inadequacy of VS estimators (see section 3.2.1) at this stage of the process, we show in figure 4.11 the η² generated by the TP method above, together with two VS estimators of it. One (`Visited States (i)') shows the effect of trying to use the VS histogram {C_i¹} from the same iteration, where the replicas were originally spread uniformly. The other (`Visited States (ii)') derives from a different simulation, but of the same length, where the simulations are started more or less with their equilibrium canonical distribution over the volumes (so that they can reach it in the same equilibration time as was given to the TP simulation¹²). This estimator is thus the best that one could expect to do by visited states (without extrapolation). It is apparent that the TP estimator gives by far the best estimate of the true shape of η.
Most of the larger systems (N = 108 and N = 256) are treated in practice by first using simple L^d FSS (see section 3.2.4) of η_xc from a smaller simulation at the same temperature to give an initial estimate of η, which is then refined with the transition method as before. The FSS estimator is normally found to be very close to η_xc (with the discrepancy getting larger as β decreases), so only one or two iterations are required before only random fluctuations in η are observed, signalling the end of the `finding-η' stage, and it is possible to move to the `production' stage.
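One plausible reading of this `simple L^d FSS' (our assumption, not a statement from the text) is that η_xc is extensive at fixed specific volume v, so the small-system η per particle can be interpolated and rescaled:

```python
import numpy as np

def eta_fss_estimate(v_small, eta_small, N_small, v_large, N_large):
    """Initial estimate of eta_xc for a larger system by simple L^d
    finite-size scaling: at fixed specific volume v, eta is assumed
    extensive, so the small-system eta per particle is interpolated onto
    the new v-grid and multiplied by N_large."""
    eta_per_particle = eta_small / N_small
    return N_large * np.interp(v_large, v_small, eta_per_particle)
```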
However, to demonstrate that FSS is not essential, we also show in figure 4.12 the process of finding η_xc from a zero start for N = 108 (with N_m = 1292, β = 10/11, N_r = 4096 and N_s = 400).
¹² The equilibration time from a start where the simulations were all in the state of highest specific volume would be more than ten times as long.
[Figure 4.10 appears here. Upper panel: the iterates η² to η⁸ and η¹³ plotted against v (0.70–0.85), with an inset detailing the low-volume end. Lower panels: visited-states histograms C against v for four iterations.]
Figure 4.10. Top figure: the convergence of the preweighting for β = 10/11 with N = 32. We show η² to η⁸ (η¹ = 0), with N_s = 250. We also show the preweighting function η¹³, produced after two more iterations of 250 sweeps and three of 2500. In the inset, a detail of the low-volume end is shown. The figures below show the histograms of VS {C_i^n} (pooled for all 4096 simulations) for iterations 1 (top left), 2 (top right), 5 (bottom left) and 13 (bottom right). It is apparent that the initial tendency to move out of the low-volume states, which have low equilibrium probability for η ≈ 0, is reversed by iteration 5, and the distribution of simulations through the macrostate space is once again approximately uniform for the longer iterations like iteration 13.
[Figure 4.11 appears here: three estimates of η² plotted against v, labelled `Transition Probability', `Visited States (i)' and `Visited States (ii)'.]
Figure 4.11. Estimators of η² evaluated using the TP estimator and two VS estimators; N_s = 250 for all. The first VS estimator is produced by using the histogram C¹ from the same iteration that gave the TP estimator; the second derives from a simulation where the replicas are initialised with their equilibrium canonical distribution.
Once again the process of approaching to within a few per cent of the multicanonical distribution is quite rapid; generating the estimators in figure 4.12 required only 4 hours of processing time, which is about 30% of the time spent on the production stage.
We have not made a detailed study of the effect of the length of each iteration on the speed and stability of the algorithm during the `finding' stage. However, the following approximate argument may be used (and was used in the generation of the data in figure 4.12) to predict a value of N_s that is found empirically to be more than adequate for stability, while still maintaining a distribution of the simulations that is at all times fairly uniform over {V_i}. Let the average number of visits to each macrostate per iteration (summed over all the replicas) be N_v. Approximate the transition matrix by σ_{i,i+1} = a ∀i and σ_{i,i−1} = c ∀i, and let R = P_{N_m}/P_1. Then

R \approx (a/c)^{N_m}

so, assuming the transitions (and the resulting estimates of σ_ij) are all independent and using
simple error-propagation, we find

\frac{\delta R}{R} \approx N_m \frac{\delta(a/c)}{(a/c)}

δ(a/c)/(a/c) is controlled by N_v and (largely) by the smaller of a and c; taking this to be c, we have

\frac{\delta(a/c)}{(a/c)} \approx \frac{1}{\sqrt{c N_v}}

Now N_v N_m = N_s N_r, and for stability we may demand δR/R = O(1), which leads to

N_s \approx \frac{N_m^2}{c N_r}    (4.15)

[Figure 4.12 appears here: η² to η⁵ and η_xc plotted against v, with an inset detailing the low-volume end.]

Figure 4.12. The convergence of the preweighting for β = 10/11 with N = 108, N_s = 400 and N_r = 4096. We show η² to η⁵ (η¹ = 0). We also show the eventual final preweighting function η_xc, produced after 13 iterations of 2000 sweeps but starting from an FSS estimator. In the inset, a detail of the low-volume end is shown. Visited-states histograms are not shown, but are similar to those in figure 4.10.
In fact, the results for various test runs imply that the algorithm is robust down to a value of N_s rather smaller even than this: the value of N_s = 400 used for N = 108 above was arrived at by using equation 4.15 with c = 1, and the algorithm still converged to η^∞, though with
much more noise, with N_s = 150 and N_r = 1000. Certainly the 250 sweeps allowed for N = 32 were far more than required. Equation 4.15 has certain intuitively appealing properties: one would expect that, for a single serial computation, the length of time required to estimate R to a minimal accuracy would be approximately equal to τ_RW, and so increase as N_m²; the result 4.15 then shows that, on a parallel computer, this time is reduced by a factor of N_r, the number of replicas run in parallel¹³. We should note that there is no contradiction between this result and the results of section 3.4.2, which apply to a slightly different system. There, we were considering a single random walker, and the necessary assumption was made initially that the total run-time N_s was much greater than τ_RW ≈ N_m².
The most striking difference between figures 4.10 and 4.12 is that the N = 108 simulation actually converges faster than the N = 32, despite having the smaller N_s. As we shall confirm below, this is in fact the result of better equilibration between volume-changing moves; the rate of convergence does not depend on N_s to any measurable extent. As in section 3.2.3, the TP method gives an accurate estimator of P^n(v) to the extent that equation 3.26 is true, i.e. to the extent that P^n(s^N|v) has and maintains its canonical value. Since the volume changes preserve the configuration, and P^n(s^N|v) varies with v, this must be dependent on the amount of configurational updating done per sweep. In the N = 108 simulation, all the particles' coordinates were updated between attempted volume-changing moves, but, because of the way that the N = 32 simulation was mapped onto the CM, this was in fact not the case there. In the limit of perfect equilibration, σ^n_ij/σ^n_ji should estimate P^n_j/P^n_i exactly whatever the sampled distribution, so immediate convergence would be observed; that is to say, η² would already be multicanonical (apart from the effect of random fluctuations). The underestimate of the difference between the present sampled distribution and the multicanonical distribution is the result of incomplete equilibration, just as it was for the Ising energy in section 3.2.3. There are two issues arising from this that must be checked. First, and more importantly for the final accuracy of the results, it is necessary to establish whether the same η^∞ is the fixed point of the iterative process (apart from random fluctuations), whether or not equilibration is good. To check this, we have run simulations each starting with the eventual limiting η_xc obtained for N = 32, β = 10/11, with imperfect equilibration between volume changes. All the simulations perform the same number of volume updates, but differ in the number N_eq of coordinate updates (of
¹³ It also implies that the accuracy in R obtained is independent of whether or not the range of macrostates is subdivided.
all the particles) that they attempt between them. Several blocks of data are generated for each N_eq (and without updating η_xc), so that error bars can be obtained to see if there is any evidence that η_xc(N_eq) alters systematically away from η^∞. It is found that there is no discernible N_eq-dependent change in η_xc even when several volume changes are made for every coordinate update. Thus we expect that, whatever N_eq, the algorithm will still eventually converge to the correct multicanonical limit, if it converges at all. We shall comment again on the special status of the multicanonical distribution a little later.
Having reassured ourselves that the multicanonical limit is correctly found, we now show in figure 4.13 the effect of varying N_eq on the early stages of convergence of η^n (for the same N = 32 system, again with the same number of volume updates), to confirm that this is indeed the cause of the different rates of convergence in figures 4.10 and 4.12. The iterative process is started from η = 0, and 4 iterations are performed, with η updated after each iteration this time; all the simulations had N_s = 50, but N_eq varies between 0 and 8.
This time, increasing the amount of equilibration performed during each iteration does have a clear effect: it increases the speed of convergence to the multicanonical limit. This occurs because the extent to which equation 3.26 is satisfied controls the extent to which equation 3.28 provides a good estimate of P^n(v). That equation 3.26 is not satisfied immediately is reflected in the fact that convergence to η^∞ is not immediate.
However, this does not explain why the eigenvector of σ^n continually underestimates the change required to reach η^∞. To understand this, it is necessary to consider what is occurring physically in the simulations. Initially, they clearly tend to drift to higher-volume states. Moves to higher volumes can be made freely, since the configuration is preserved, so there can be no hard-core overlaps and there is only the energy cost to consider, while the reverse moves to lower volume are strongly suppressed by the likelihood of a hard-core overlap. However, as N_eq is reduced, the moves to higher volume are largely unaffected, while the moves to lower volume become more likely. This occurs because they are likely to be simply reversals of a move that came from the lower-volume state on the previous sweep; to the extent that equilibration is imperfect, the configuration is preserved from that sweep, and so is less likely to contain a hard-core overlap than one that truly reflects P^n(s^N|v). Thus, the ratio σ̃^n_ij/σ̃^n_ji is nearer to unity than the true σ^n_ij/σ^n_ji, the magnitude of the eigenvector is underestimated, and so is η^{n+1}.
[Figure 4.13 appears here: six panels, one for each value of N_eq, each showing η² to η⁵ and η_xc plotted against v.]

Figure 4.13. Convergence of η^n for n = 2...5 (η¹ = 0), for N = 32, N_s = 50, N_r = 4096 and N_eq = 0, 1/2, 1, 2, 4, 8. Also shown is a suitable η_xc, which in fact is η¹³ from figure 4.10.
Though we have not studied in detail the behaviour of the size of the systematic underestimate as a function of the `distance to the multicanonical limit', η^n − η^∞, the results of figure 4.13 (and figures 4.10 and 4.12) seem to indicate empirically that the fractional extent of the underestimate remains about the same at each iteration (or perhaps decreases slightly, as the more diffusive movement of the simulations through macrostate space gives greater time for equilibration within a set of macrostates). However, this constant fractional error in the eigenvector corresponds to a decreasing absolute error, and thus a geometric convergence towards η^∞, whatever N_eq may be (we have already shown that simulations conducted in what we believe to be the multicanonical limit but with various N_eq do not drift away from it, so implying that the same η^∞ is the limit of the generation process for all N_eq). Once we have arrived at a situation where further iterations produce only fluctuations in η^n around η^∞, we move to the production stage.
Although it applies rather more to the next section (section 4.3.4) than to this one, another particularly important result that becomes apparent from this investigation of the effect of N_eq is why it is important to generate and use the multicanonical distribution in particular. It might at first seem that, because TP estimators can generate an estimate of a sampled distribution that varies over many orders of magnitude, multicanonical sampling is not necessary: any sampled distribution would appear to be adequate, even the original canonical distribution. It seems that we require only that adjacent macrostates are similar enough in equilibrium probability for transitions in both directions between them to occur. The investigation of the effect of N_eq shows why this is not adequate: the fact that the η^{n+1} generated at the end of any stage n is not generally equal to η^∞ shows that the estimator P̃^n is not generally equal to the real underlying P^n, the difference being due to incomplete equilibration at constant volume. Thus any estimates of P^can or canonical averages made on the basis of P̃^n would be heavily biased, unless N_eq were extremely large. It is only in the multicanonical limit, where P̃^n ≈ P^n (≈ constant), that P̃^n becomes more or less independent of N_eq, and the arrival in this limit is signalled by the convergence of η^n. Thus we must reach this limit to be able to accurately reconstruct P^can.
Why should it be that P^∞ alone does not depend on N_eq? To understand this it is necessary to return to the equation (cf. equation 1.21)

P_s(t=1) = \sum_r P_r(t=0) \, \sigma^n_{rs}
that describes the evolution of the p.d.f. of the microstates r and s (which here are the joint set of coordinates and volumes, {s^N, v}). As we know from section 1.2.2, this converges for large t to the equilibrium distribution P_s^n = Σ_r P_r^n σ^n_rs. We have in fact two update matrices σ^n, one (equation 4.10) describing coordinate updates and the other (equation 4.11) v-updates. They both preserve P_s^n once it is established. Now to be sampling from P_s^n implies both that local equilibrium P^n(s|i) ≡ P^n(s^N|v) should be established and that the distribution over the macrostates P_i^n should have its equilibrium value. In early iterations, the first condition is satisfied at t = 0 (because before starting we relax all the replicas with coordinate updates) but the second is not, because the replicas are distributed uniformly over macrostate space while the equilibrium P_i^n is far from uniform. Thus, when we update with a v-transition, P_s(t = 1) ≠ P_s(t = 0) ≠ P_s^n, and P(s|i, t = 1) loses its equilibrium value. It must be relaxed again with coordinate updates if equation 3.26 is to be satisfied. However, if we have P_s^n = P_s^∞, then P_i(t = 0) = P_i^n, because we chose the distribution of the replicas to be uniform. The result, given that P(s|i, t = 0) = P^n(s|i), is that the underlying Markov process is in equilibrium right from the start (P_r(t = 0) = P_r^n) and so stays in equilibrium at all later times, even under the action of v-updates alone; N_eq is irrelevant. The special status of multicanonical sampling, then, is due to the fact that the equilibrium P_i^∞ accurately reflects the distribution of the replicas¹⁴.
It should be noted that the investigation of N_eq that has just been carried out is only possible because the configurational updates and the volume changes are completely separate here, and the one can be performed without affecting the other. In the Ising case, though similar effects are clearly present (see section 3.2.3), the configurational updates (spin flips) are also the means by which the macrostate is changed, so an investigation which relies on disentangling the two would be much harder.
Finally, we remark that it is not clear from the results of figure 4.13 what the best (computationally cheapest) strategy is in practice in the `finding-η' stage. Increasing N_eq decreases the number of iterations required, but of course the total computer time per iteration increases linearly with N_eq (from 2½ mins/iteration for N_eq = 1 to 10 mins/iteration for N_eq = 8). The best strategy is probably intermediate between these two, though we do not expect any improvement to be very great, and we have not investigated it in detail.
¹⁴ If we chose some other (non-uniform) distribution of replicas, then of course the `special' sampled distribution would be the one that reflected that distribution.
4.3.4 The Production Stage

As we have seen, imperfect equilibration at constant volume leads to an initial stage of the simulation during which η^n is not close to η^∞ but does converge towards it (η^{n+1} − η^n is uniformly positive). The corollary of this is that the arrival of η^n close to η^∞ is signalled by the move to a situation where η^n undergoes only random fluctuations between iterations. We have also seen that it is necessary to reach this regime before P̃^n is a trustworthy estimator of P^n. At this point we move to the production stage.

In the applications of the multicanonical method in chapter 3, it was found best to use VS estimators at this stage, whatever method had been used to establish η_xc: with TP estimators, there was a reduction in speed and, more importantly, a systematic bias (at least for energy preweighting). Therefore, to motivate and justify our use of TP estimators for the square-well system, we shall show analytically, and confirm by direct simulation, that the VS method is unusable here because of the extremely long random walk time. We shall also show that any bias is very small here on the scale of the random error¹⁵.
The VS estimator only provides a good estimate of the probability if the run-time of each replica simulation is much greater than the ergodic time (at least equal to the random walk time in this case), so that the replica simulations have `forgotten' all information about their starting states, which would otherwise heavily bias the result (see the discussion in section 3.2.5). Let us see what this would entail for the simulation with N = 108, for which it is found that N_m = 1292 and the average acceptance ratio of volume transitions is about 1/2. Simple random-walk arguments imply that about 2 × 10⁶ volume updates, i.e. sweeps, would be required for a single simulation to traverse the whole range of macrostates. By contrast, the highest speed we can achieve on the CM is 6000 sweeps per simulation per hour (for N = 108, N_r = 64), implying the need for over 300 hours even for one random walk per simulation¹⁶. In fact, about 6000 sweeps are needed for a simulation to perform a directed walk from one end of the macrostate space to the other in the η = 0 case, when these states differ in probability by more than 100 orders of magnitude. Thus, we see that, using visited-states estimators, it is impossible to take advantage of the full power of the parallel computer to run many replicas of the simulation, since the initial distribution of the simulations over the macrostate space
¹⁵ The extra bookkeeping of the transition probability method is negligible for the square-well system.
¹⁶ We should note that this is far from the greatest value of N_c = N_s N_r that can be obtained; with N_r = 64, N_c = 374 000 sweeps/hour, while with N_r = 4096, 500 sweeps/simulation/hour can be attained, so N_c ≈ 2 × 10⁶/hour.
will persist throughout the simulation. Of course, given that we keep the uniform distribution of replicas over the macrostates that was used in the `finding' stage, then once we are close to the ideal multicanonical distribution the VS histograms do appear to reflect the sampled distribution, as we remarked in the previous section. However, this is only a result of the fact that the distribution of simulations through the volume macrostates is itself chosen to be nearly uniform. Any further information that we might appear to gain about the sampled distribution from using such VS estimators would be illusory.
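The arithmetic behind the estimate above, as an order-of-magnitude check:

```python
# A symmetric random walk over N_m = 1292 states needs of order
# N_m**2 ~ 1.7e6 accepted moves, i.e. roughly the quoted 2e6 sweeps;
# at 6000 sweeps/simulation/hour that is over 300 hours per traversal.
N_m, sweeps_per_hour = 1292, 6000
sweeps = 2e6                              # the text's figure, ~N_m**2
print(N_m**2, sweeps / sweeps_per_hour)   # 1669264, ~333 hours
```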
The inadequacy of VS estimators means that we have no choice but to stay with the TP estimators used in the `finding' stage. As we have seen, TP estimators implicitly include and correct for the effect of the starting state, so the lack of equilibration over all the volume macrostates is not a problem. In chapter 3 it was found that TP estimators were sometimes biased; however, after the analysis of section 4.3.3 showing that here the underlying Markov process should always be in equilibrium, we would not expect any problem with this. Nevertheless, to be doubly certain, we shall present substantial evidence below to show that there is no bias. As further corroborating evidence, we recorded VS estimators throughout the long simulations that were used to generate the results in sections 4.3.5 and 4.3.7, and none of them ever showed any indication of systematic drift of the replicas through macrostate space, showing that any bias is certainly smaller than can be detected by VS in a run of practical length.
We shall now describe how the TP estimators are used in practice to generate the results. Rather than keeping η_xc constant in the production stage, as was done in generating the results in section 3.3, it is updated after each iteration using equation 4.14, just as in the `finding-η' stage. The only difference is that N_s is normally longer, of the order of a few thousand sweeps per iteration rather than a few hundred. This simple scheme is used since we found in chapter 3 that the more complex update scheme described by equations 3.24 and 3.23 seems to yield only a marginal improvement. We can now recover an estimator of the canonical probability for each iteration using equation 4.12.

The procedure for finding the coexistence point and generating canonical averages is then to find a set {p^n} by identifying, for each iteration, p^n as that pressure which gives a double-peaked P^can(v) with equal weight in the two peaks. Next, the members of the set are averaged to give a mean p_coex = ⟨p^n⟩ and an error bar. Finally, {P^{n,can}(p_coex, v)} are re-evaluated using the same best estimate p_coex for every iteration. Properties like the densities and compressibilities of the coexisting phases then follow by calculating ⟨v⟩ and ⟨(v − ⟨v⟩)²⟩ for each phase
on each member of {P^{n,can}(p_coex, v)} and averaging, while averaging the members of the set {P^{n,can}(p_coex, v)} itself gives a best estimate of the p.d.f. of v at coexistence.

All the above can be done (if desired) for pressures away from p_coex, provided that equation 4.13 is satisfied. In all the simulations performed here, all the iterations used were the same length, though it is not essential that this should be the case.

Type of Estimator                        p_coex
simple average TP                        30.16(8)
single jackknife TP                      30.16(8)
double jackknife bias-corrected TP       30.15(7)

Table 4.3. Various estimators of the coexistence pressure for N = 108, β = 10/11, N_s = 1000, N_r = 4096, with constant η_xc. All estimators come from transition probabilities.
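A sketch of the equal-weights criterion used to extract p^n (an illustration of ours, not the analysis code): reweight the multicanonical histogram to pressure p as in equation 4.12, split the two peaks at the interior probability minimum, and find the root of the weight difference, here with scipy's brentq. An equally spaced volume grid and one peak in each half of the grid are assumed:

```python
import numpy as np
from scipy.optimize import brentq

def weight_difference(p, V, P_xc, eta, beta):
    """W(low-V phase) - W(high-V phase) of the canonical p.d.f. at
    pressure p, reweighted from the multicanonical histogram."""
    log_w = -beta * p * V - eta + np.log(P_xc)
    P = np.exp(log_w - log_w.max())
    P /= P.sum()
    half = len(P) // 2
    i_lo = np.argmax(P[:half])                 # dense-phase peak
    i_hi = half + np.argmax(P[half:])          # expanded-phase peak
    split = i_lo + np.argmin(P[i_lo:i_hi])     # minimum between the peaks
    return P[:split].sum() - P[split:].sum()

def p_coexistence(V, P_xc, eta, beta, p_lo, p_hi):
    """Equal-weights finite-size estimator of p_coex; the bracket
    [p_lo, p_hi] must straddle the coexistence pressure."""
    return brentq(weight_difference, p_lo, p_hi, args=(V, P_xc, eta, beta))
```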
The one disadvantage of this continual updating of η_xc is that we lose the ability to use jackknife bias-corrected estimators. To see whether bias-correction would yield significantly different estimators, we have performed a test run with constant η_xc to allow jackknife bias-corrected TP estimators to be calculated¹⁷. The parameters of the simulation were N = 108, β = 10/11, N_s = 1000, N_eq = 1, N_r = 4096, and eight blocks of data were gathered. The results, for the coexistence pressure evaluated using the equal-weights criterion, are shown in table 4.3. To evaluate p̃_coex, the canonical probability distribution is reconstructed, using equation 4.12, with three different kinds of TP estimator for P^xc(v). The first (`simple average') is defined on each transition histogram C^n_ij separately; to produce the second (`single jackknife'), all the transition histograms except the nth are pooled to make the nth estimator. The third is a double-jackknife bias-corrected estimator (see appendix D). It is clear that, within the error bars, the three estimators are identical. The procedure of continually updating η^n is thus justified, and so is the analysis of section 4.3.3 showing that equilibrium is always exactly maintained (and so that there should be no bias), given that the actual distribution of the replica simulations reflects their equilibrium distribution. It thus seems likely that the bias in chapter 3 had this source, because no effort was made there to keep the visited-states histograms flat; simulations were purposely released from only a few starting macrostates.

¹⁷ Once again, η¹³ from the N_r = 1000 TP simulation was used.
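For concreteness, a minimal sketch of the single-jackknife construction just described, in which the nth estimate pools every block except the nth; the `estimate' callable (e.g. a function returning p_coex from a pooled histogram) is a placeholder:

```python
import numpy as np

def single_jackknife(C_blocks, estimate):
    """Single-jackknife mean and error from per-block transition
    histograms C_blocks[k]; `estimate` maps a pooled histogram to the
    scalar of interest."""
    C_blocks = np.asarray(C_blocks)
    n = len(C_blocks)
    total = C_blocks.sum(axis=0)
    vals = np.array([estimate(total - C_blocks[k]) for k in range(n)])
    mean = vals.mean()
    err = np.sqrt((n - 1) / n * ((vals - mean) ** 2).sum())
    return mean, err
```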
Having justified the use of TP estimators in all stages of the simulation process, including the obtaining of canonical averages, we shall now comment on the accuracy of the estimators obtained. The absolute magnitude of the uncertainty in η_xc is O(1) for the larger systems (i.e.
δη_xc = O(1)), and thus, because P̃^can ∝ exp(η_xc), the fractional error in the reconstructed canonical probability is also O(1); that is, (P̃^can − P^can)/P^can = O(1), at least for some states. This can be confirmed by inspection of the results of section 4.3.5, in particular figure 4.14. This uncertainty is large compared with what was achieved in chapter 3; we are contenting ourselves with a lower level of accuracy in our knowledge of the sampled distribution, and thus of the canonical distribution and canonical averages. However, the error bars alone do not quite tell the whole story. Very little of the error is attributable to local fluctuations; the shapes of the peaks of P^can are individually well-outlined, so averages calculated over the peaks separately (i.e. calculated for a single phase), such as the average specific volume ⟨v⟩ and the isothermal compressibility κ_T for a particular input pressure, do not have such large errors as one might expect from the O(1) error in P̃^can. Indeed, comparison of figures 4.14 and 4.15 shows that the error bars are smaller away from coexistence, when only one phase need be considered. By far the larger part of the error in P̃^can(p_coex) comes from uncertainty in the relative weights of the two phases: the interphase distance (in V-space) is ⟨V⟩_expanded − ⟨V⟩_dense = N(⟨v⟩_expanded − ⟨v⟩_dense), while the width of the p.d.f. of a single phase scales as N^{1/2}, as we know from statistical mechanical arguments (see section 1.1) and shall confirm in section 4.3.6. Thus, for all but very small systems, or systems very near to criticality, the interphase distance is larger and so has the greater effect on the error, because the local uncertainties in P̃^can(p_coex) accumulate over the large distance in volume that separates the phases. In addition, the error in the intensive parameter p_coex (or, equivalently, in the difference in free energy density) is smaller than might be expected. This occurs because p_coex affects the relative weights of the phases via a term βp(⟨V⟩_expanded − ⟨V⟩_dense); thus, because ⟨V⟩ is extensive, an O(1) error in the relative weights corresponds only to an O(1/N) error in p_coex. We anticipate that for still larger systems accurate results would be obtained even if the error in the sampled p.d.f. were larger than O(1), that is to say, even before the establishing of what we have defined as a multicanonical distribution.

That the error bars on p_coex are indeed small even for N = 256 is shown in figure 4.22 below. In fact, as we shall see, notwithstanding the O(1) error, our results are at least as good as the results already published for this system [69, 171].
The Multicanonical Method Compared with Literature Thermodynamic Integration
It is apparent that by using the transition method, possibly combined with FSS, we can measure
the entire F(v) of the square-well solid from knowledge only of a volume interval that we expect
to contain the coexisting phases. We should contrast this with the thermodynamic integration
procedure used in [69, 171], where a reference system must be used for each F(v) point. When
enough of F(v) is mapped out, p_coex and ⟨v⟩ are located with the double-tangent construction
[47]. The reference system used in [69, 171] is the hard-sphere solid, for which a good equation
of state is known [181]; this is therefore not a computationally expensive procedure, since the
potentials of the two systems are similar and so only a short integration path is required, though
of course for other systems such a convenient reference system might not exist, and there is
some evidence even here that there are problems with it near the critical point (see section
4.3.7). In any case, compared with the multicanonical method, extra complexity is introduced
by the use of the reference systems, while we still need an a priori guess of the volumes where
we believe the phase coexistence lies. Moreover, the double-tangent construction is equivalent
to finding the p'_coex that produces equal heights of the peaks of P^can(p; v); the equal-weights and
equal-heights criteria have the same large-system limit, but for small systems the equal-weights
criterion is the more natural. We shall discuss in section 4.3.6 which gives the better
estimate of the infinite-volume transition point.
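A sketch of the equal-weights determination (a toy version, not the code used for this work: the analytic double-well f(v) below merely stands in for a measured multicanonical free energy, and v_split sits at the probability minimum between the peaks):

    import numpy as np

    beta, N, p_ref = 10.0 / 11.0, 108, 30.0
    v = np.linspace(0.70, 0.84, 4001)
    # Hypothetical double-well free energy density; a measured f(v) would be used here.
    f = 8.0e4 * (v - 0.72) ** 2 * (v - 0.80) ** 2 - p_ref * v
    lnP_ref = -beta * N * (f + p_ref * v)   # ln P^can(p_ref; v), up to a constant

    def log_weight_ratio(p, v_split=0.76):
        # Reweight: P^can(p; v) is proportional to P^can(p_ref; v) exp(-beta*N*(p - p_ref)*v).
        lnP = lnP_ref - beta * N * (p - p_ref) * v
        P = np.exp(lnP - lnP.max())
        dense = np.trapz(P[v <= v_split], v[v <= v_split])
        expanded = np.trapz(P[v > v_split], v[v > v_split])
        return np.log(dense / expanded)

    lo, hi = 25.0, 35.0   # bracket: raising p favours the dense phase, so the ratio rises with p
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if log_weight_ratio(mid) > 0 else (mid, hi)
    print(f"equal-weights estimate: p_coex ~ {0.5 * (lo + hi):.2f}")  # ~30 for this toy f(v)

Bisection suffices because the log weight ratio is monotonic in p; the equal-heights variant would instead compare the two peak maxima of P^can.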
4.3.5 Canonical Averages
We now present some results for various canonical averages, evaluated using this version of
the multicanonical method. The averages are calculated both at the finite-size coexistence
point and for a substantial range of pressure around coexistence. We choose as an example
the N = 108 system at β = 10/11, though similar calculations can be made for all the other
system sizes and temperatures investigated. We used N_r = 1000 and N_m = 1292. First, in
figures 4.14 and 4.15 we show how P^can(v) can be reconstructed for various pressures from
the multicanonical results. Figure 4.14 and its inset show how P^can(p_coex; v) can be accurately
measured over a range of (here) more than 10 decades. Once again we emphasise that, even if
we had a serial computer as fast as the Connection Machine, and so were not so hampered by
the long random walk time, Boltzmann sampling would fail on a problem like this, where two
modes are separated by a region of very low probability.
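The reconstruction step can be summarised schematically (a sketch under assumed conventions, not the code used for this work: eta plays the role of the preweighting function xc, H is the pooled visited-states histogram, and the moving-average smoothing used for figure 4.14 is included):

    import numpy as np

    def canonical_pdf(v, H, eta, p, beta, N):
        # P^can(p; v) is proportional to H(v) * exp(-eta(v) - beta*N*p*v), where H is
        # the visited-states histogram accumulated under the preweighting eta(v).
        lnP = np.log(np.maximum(H, 1e-300)) - eta - beta * N * p * v
        P = np.exp(lnP - lnP.max())       # subtract the maximum to avoid overflow
        return P / np.trapz(P, v)         # normalise on the grid of volume states

    def smooth(P, window=50):
        # Moving average over a window of volume states, as used for figure 4.14.
        return np.convolve(P, np.ones(window) / window, mode="same")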
[Figure 4.14 appears here: P(v) against v for p = 70, p = 30.2 and p = 25, with an inset showing P(v) in the region between the peaks.]
Figure 4.14. Main figure: P^can(v) at high (p = 70), low (p = 25) and intermediate (p = 30.2)
pressures, for β = 10/11 and N = 108. p = 30.2(1) is the finite-size estimate of the coexistence
pressure, where the two phases have equal weights; note that the probability density is much
smaller in the rare phase (on the right), because of its higher compressibility. p = 25 is the lowest
pressure which can be reliably investigated, because at lower pressures P^can has appreciable
weight outside the investigated range of volume. The inset shows P^can, with typical error bars,
in the region between the peaks. P^can was smoothed using a moving average over a window of
50 volume states.
Then in figure 4.16 we show ⟨v⟩, the average volume per particle, evaluated from these
distributions, as a function of pressure. ⟨v⟩ is calculated by averaging only over the phase
that is favoured at the pressure under consideration (this should be compared with figure
4.21 below). The discontinuity in ⟨v⟩ at p ≈ 30, showing the presence of a first-order
phase transition, is clearly visible. The estimates of ⟨v⟩ at p_coex are also shown as data
points on this figure. Figure 4.17 shows the isothermal compressibility κ_T = -(1/v) ∂v/∂p
= (βN/⟨v⟩)(⟨v^2⟩ - ⟨v⟩^2) as a function of p, with the averages once again calculated
only over the favoured phase. Once again the effect of the phase transition is clearly apparent;
the very different values of κ_T reflect the different structures of the two phases (see section
4.3.8). We comment upon the finite-size scaling of ⟨v⟩ in section 4.3.6.
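A minimal sketch of how these single-phase averages follow from a reconstructed P^can(p; v) (an illustration, not the thesis code; v_split denotes the probability minimum separating the peaks, and the fluctuation formula assumes the reduced units of this chapter, k_B = 1):

    import numpy as np

    def phase_averages(v, P, v_split, dense, beta, N):
        # Integrate over one peak only, as for figures 4.16 and 4.17.
        m = (v <= v_split) if dense else (v > v_split)
        w = np.trapz(P[m], v[m])                    # canonical weight of the phase
        v1 = np.trapz(v[m] * P[m], v[m]) / w        # <v> over that phase
        v2 = np.trapz(v[m] ** 2 * P[m], v[m]) / w   # <v^2> over that phase
        kappa_T = beta * N * (v2 - v1 ** 2) / v1    # kappa_T = (beta*N/<v>)(<v^2> - <v>^2)
        return v1, kappa_T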
We should also confirm that the system is indeed solid for all densities studied. To do this, we
show in figure 4.18 the pair correlation function g12(r) ≡ P12(r)/(4πr^2) for N = 108, β = 10/11,
where P12(r)dr is the probability that a particle has a neighbour in a shell of radius r and
thickness dr centred on its present position. The full line shows averages gathered using those
simulations that were in the dense phase (in fact the average was gathered for a range of
densities rather wider than the peak of P^can(p_coex; v), and was formed without weighting by
P^can(p_coex; v), so we would expect it to be slightly different in detail from the true g12(r) at
constant NpT, but qualitatively the same); the dashed line shows the corresponding results
for the expanded phase. It is clear that the particles in the expanded phase have substantially
more freedom of movement; nevertheless, the fact that g12 drops to zero between the peaks
shows that the particles remain localised on their lattice sites, that is, that both phases are
solid. The ratio of the locations of the nth and mth peaks (for n, m = 1, 2, 3, ...) is √(n/m), as
expected for a fcc lattice.
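Accumulating g12(r) from stored configurations is straightforward; the sketch below (an illustration assuming a hypothetical (n, 3) array `positions` of particle coordinates in a cubic periodic box of side L, for a single configuration) shows the normalisation by the shell area 4πr^2:

    import numpy as np

    def pair_histogram(positions, L, r_max, nbins=400):
        edges = np.linspace(0.0, r_max, nbins + 1)
        hist = np.zeros(nbins)
        n = len(positions)
        for i in range(n - 1):
            d = positions[i + 1:] - positions[i]
            d -= L * np.round(d / L)                  # minimum-image convention
            r = np.sqrt((d ** 2).sum(axis=1))
            hist += np.histogram(r, bins=edges)[0]
        r_mid = 0.5 * (edges[1:] + edges[:-1])
        dr = edges[1] - edges[0]
        # Neighbour count per particle per unit r, divided by the shell area; in
        # practice this would be averaged over the stored configurations of a phase.
        return r_mid, 2.0 * hist / (n * dr * 4.0 * np.pi * r_mid ** 2)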
In figure 4.19 we show a detail of g12 at β = 10/11 around r = σ. This figure clearly
shows that g12 has discontinuities at r = σ and r = σ(1 + λ), matching the discontinuities in
the potential. The discontinuity at r = σ is simply a consequence of the hard core in the
potential, which prevents any particles approaching more closely than σ. The presence of the
other can be rationalised by considering g̃12 for two isolated particles. Since g̃12(r) ∝ P(r)/r^2,
we would expect that, as δr → 0,

    g̃12(σ(1 + λ))/g̃12(σ(1 + λ + δr)) → exp(βE0)    (4.16)

In the solid, g12 is of course modulated by the presence of all the other particles, but there
is no reason to expect their contribution to produce a discontinuity at r = σ(1 + λ), and any
other behaviour will leave equation 4.16 unaffected. The presence of this discontinuity is thus
explained, and from the figure we can estimate its magnitude to be

    g12(σ(1 + λ))/g12(σ(1 + λ + δr)) = 71.5(5)/29.0(3) = 2.47(4) (dense phase)
                                     = 32.0(3)/13.5(3) = 2.42(6) (expanded phase)

in good agreement with exp(10/11) = 2.48.
[Figure 4.15 appears here: upper panel, P(v) against v near the dense-phase peak for p = 70, 50, 35, 30.5, 30.25, 30, 29.75 and 29.5; lower panel, P(v) against v near the expanded-phase peak for p = 29, 28, 27, 26 and 25.]
Figure 4.15. A more detailed view of the peaks of P^can(p; v) for the same β = 10/11 and
N = 108 system featured in figure 4.14 but at a range of different pressures. P^can was smoothed
using a moving average over a window of 50 volume states and some typical error bars are shown.
The upper diagram shows the peak corresponding to the dense phase, while the lower diagram
shows the peak corresponding to the expanded phase. Note that, in this diagram, it is just
discernible for p = 29 that there is some weight at volumes corresponding to the dense phase.
[Figure 4.16 appears here: ⟨v⟩ against p.]
Figure 4.16. ⟨v⟩ of the favoured phase as a function of pressure for β = 10/11 and
N = 108. Typical error bars are also shown. The estimates of ⟨v⟩ for each phase separately
at coexistence are also shown (triangles).
[Figure 4.17 appears here: κ_T against p on a logarithmic vertical scale.]
Figure 4.17. Isothermal compressibility κ_T of the favoured phase as a function of pressure
for β = 10/11 and N = 108. Because of the large variation in κ_T between the
two phases, a logarithmic vertical scale is used. The compressibility is evaluated using
κ_T = (βN/⟨v⟩)(⟨v^2⟩ - ⟨v⟩^2). Typical error bars are also shown.
[Figure 4.18 appears here: g12 against r/σ for the dense and expanded phases.]
Figure 4.18. g12(r) for N = 108, β = 10/11. Full line: average over the dense phase;
dashed line: average over the expanded (rare) phase.
[Figure 4.19 appears here: detail of g12 against r/σ in the range 0.98-1.10.]
Figure 4.19. g12(r) for N = 108, β = 10/11; detail of the region r ≈ σ. Full line: average over
the dense phase; dashed line: average over the expanded (rare) phase.
4.3.6 Finite-Size Scaling and the Interfacial Region
We shall now look at the finite-size behaviour of P^can(p_coex; v) and the various estimators
obtained from it. We shall use both N and L, the side length of the simulation volume, to
measure the system size. It is instructive to compare P^can(v) for various system sizes, to show
explicitly the decreasing size of the fractional fluctuations with increasing N expected from
elementary statistical mechanics (section 1.1). This is done for β = 1.0 and N = 32, 108 and
256 in figure 4.20. As well as the expected narrowing of the peaks, it is also apparent that the
finite-size estimate of ⟨v⟩ at the finite-size coexistence point depends quite strongly on N,
and an extrapolation must be used in making an estimate of ⟨v⟩ in the N → ∞ limit.
[Figure 4.20 appears here: P(v) against v for N = 32, 108 and 256, with an inset showing the expanded-phase peak.]
Figure 4.20. P^can(v) for β = 1.0 and N = 32, 108 and 256. The peaks narrow as N increases,
and there is also some variation of ⟨v⟩ with N. The inset shows the expanded phase in
greater detail.
The inset shows the expanded phase in more detail. The narrowing of P^can(v), and its
movement to lower volumes, are clearly visible. In fact, in addition to the expected qualitative
behaviour of P^can(v), there is also quantitative agreement with the prediction σ_v/⟨v⟩ ∝ 1/√N;
for example, the compressibility κ_T of the expanded phase, evaluated for the three system sizes
at a pressure just below the equal-weights transition point, is κ_T = 8.6(2) × 10^-3 for N = 32,
κ_T = 8.5(5) × 10^-3 for N = 108 and κ_T = 8.2(7) × 10^-3 for N = 256. The approximate equality
of the estimates of κ_T shows that var(v)/⟨v⟩ ∝ 1/N, i.e. σ_v/⟨v⟩ ∝ 1/√N.
Figures 4.21 and 4.22 show more clearly the finite-size scaling of estimators, in particular
those related to the transition point (again for β = 1.0). Figure 4.21 shows ⟨v⟩_N as a function
of p, where ⟨v⟩ is now averaged over all volume states (cf. figure 4.16). The narrowing of
the pressure region over which the transition takes place is clearly visible, as is its movement
to higher pressure. The diagram also shows thermodynamic integration results for an N = 108
system, taken from [171]. There is clearly good agreement between the two sets of results.
[Figure 4.21 appears here: ⟨v⟩ against p for N = 32, 108 and 256, with thermodynamic integration results for N = 108.]
Figure 4.21. ⟨v⟩_N (evaluated by averaging over all volume states) as a function of p for
N = 32, 108, 256 and β = 1.0. Also shown are thermodynamic integration results for N = 108,
from [171].
Figure 4.22 shows the finite-size scaling of the estimate of the transition pressure p_coex,
evaluated both by equal-heights and equal-weights criteria. Once again, the results seem consistent
with the estimate given in [171]. Both estimators seem to experience a 1/N finite-size
correction with respect to the infinite-volume limit, though the correction is rather smaller
in the equal-weights case than the equal-heights. For this reason, and because it is the more
natural finite-size estimator, we have generally preferred to use the equal-weights criterion.
The least-squares fits to both sets of data (dashed lines) are equally good and both have the
same intercepts, within error. The ordinate intercepts, which are the best estimates of the
infinite-volume transition point, are both at p_coex = 22.78(6).
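The extrapolation itself is elementary; a sketch (with placeholder numbers standing in for the measured finite-size estimates, not the thesis data):

    import numpy as np

    # Placeholder finite-size estimates (NOT the thesis data): the equal-weights
    # p_coex for N = 32, 108, 256 would be read in from the reconstructed p.d.f.s.
    N = np.array([32.0, 108.0, 256.0])
    p_ew = np.array([20.1, 22.0, 22.5])
    slope, intercept = np.polyfit(1.0 / N, p_ew, 1)   # least-squares fit in 1/N
    print(f"infinite-volume estimate: p_coex ~ {intercept:.2f}")  # ~22.8 for these inputs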
Our findings are interesting in the light of the theoretical prediction, made for lattice models
with periodic boundary conditions in [23, 182, 183], that the equal-heights estimator of the
transition `field' (temperature or magnetic field) should indeed have 1/N corrections, but the
equal-weights estimator should have only exponentially small corrections. It might be thought
that these arguments should apply to off-lattice systems too, but our results suggest that neither
estimator has exponentially small corrections. This may be due to the fact that the ensemble
under consideration here, with its variable volume, is different to the constant-volume ensemble
of the lattice model.
[Figure 4.22 appears here: p_coex against 1/N for the equal-heights and equal-weights criteria, with the thermodynamic integration estimate.]
Figure 4.22. p_coex as a function of 1/N, evaluated using equal-weights and equal-heights
criteria for β = 1.0. Also shown are thermodynamic integration results for N = 108, from [171],
and least-squares fits to the data (dashed lines), both of which have their ordinate intercept at
p = 22.78(6).
Another interesting application of finite-size scaling theory is to the canonical probability
P^can(p_coex; v) in the region between its two peaks. Let us define

    r_p ≡ ln [P^can(v_0)/P^can(v_l)]    (4.17)

where P^can(v_0) is the probability density at one of the peaks of P^can(p_coex; v) and P^can(v_l) is the
p.d.f. at its lowest point between them (v_0 and v_l vary slightly with system size).
We show the measured behaviour of ln(r_p) against ln(L) for β = 1.0 in figure 4.23; the graph
has a gradient of 2.91(1).

[Figure 4.23 appears here: ln(r_p) against ln(L).]

Figure 4.23. ln(r_p) against ln(L) for β = 1.0 and system sizes L = 2, 3, 4 (N = 32, 108, 256);
r_p = ln(P^can(v_0)/P^can(v_l)), measured by simulation.

Now, the statistical mechanics of phase coexistence (see section 1.1) predicts that the dominant
configurations around v_l should consist of regions typical of each of the stable phases,
separated by the phase boundaries. Accordingly, the free energy in this region has the form

    F(v) ≈ L^d f_b(v_0) + a_0(v) L^{d-1} f_s + ...

where f_b(v_0) is the free energy of the bulk, a_0 depends on the geometry and f_s is a surface
free energy. Thus, since P^can(v) ∝ exp(-βF(v)), we would expect r_p ∝ L^{d-1} = L^2. The
discrepancy between this prediction and the result of the simulation is, we believe, a consequence
of the particular simulation method we have chosen: periodic boundary conditions are used, and
changes in volume are made by a uniform scaling of the particles' positions, which leaves the
shape (cubic) of the simulation volume unchanged. If one tries to generate a mixed-phase
configuration in such a cubic box, one finds that it cannot easily be made to fit; the size of the
box is determined by the largest length in the region of expanded phase, and it cannot contract
around the regions of dense phase. Some planes of particles in the dense phase are therefore
separated, effectively breaking up the uniform structure of the phase. In a simulation of a fluid,
particles would move between the phases to fill up the `gaps,' but that does not occur here,
since the particles are held on their lattice sites, and, even if they could move, there would be
commensurability problems with the simulation box, whose side length is only a few times the
lattice spacing. Thus, the simulation fails to open up what is presumably, for a real system,
the dominant configuration-space pathway between the two phases, because of the suppression
of configurations containing interfaces. Instead, the most probable configurations in the interphase
region have a uniform structure that is commensurate with the shape of the simulation
box.
This implies

    F̂(v) ≈ L^d f̂(v) + ...

where f̂(v) > f(v_0) is the bulk free energy at intermediate volume. This implies r_p ∝ L^d = L^3,
in much better agreement with what is found. Inspection of the distribution of free volume in
simulations around v_l confirms that the simulations here do indeed seem to consist of a single phase only.
This means, then, that the simulation cannot be used in its present form for the evaluation
of interfacial free energies. We should note, however, that all our previous results about the
coexistence pressure or the volumes of the stable phases remain valid: these phases are of course
homogeneous in structure and so present no difficulty to the simulation, while to determine their
relative weight (to calculate the coexistence pressure) simply demands the presence of some
reversible configuration-space path connecting them. Whether this path is the one followed
in the `real' system is immaterial, given that the states along the path have negligible weight
themselves, both in the canonical ensemble and in the ensemble which is accessible to simulation.
4.3.7 Mapping the Coexistence Curve
Using a series of multicanonical simulations of different finite sizes and at different temperatures,
we have mapped out the solid-solid coexistence curve of the square-well system with
λ = 0.01 between β = 1.0 and the critical point, which appears to lie at β ≈ 0.6. Simulations
were carried out for N = 32, N = 108 and (for β = 1.0 only) N = 256, and for each simulation
the canonical p.d.f. of v was reconstructed and the equal-weights criterion used to identify
p_coex.

While in sections 4.3.5 and 4.3.6 it was always easy to distinguish the regions of macrostate
space associated with the two phases, because the temperature was low, this becomes
progressively more difficult as the critical region is approached. The canonical probability of the
region between the two modes increases and finally the modes merge together. This is shown
happening for N = 108 in figure 4.24; the p.d.f. has become unimodal by the time β = 10/17.
For N = 32 the same process occurs at lower temperature.
[Figure 4.24 appears here: P(v) against v at coexistence for β = 1.0, 10/11, 10/12, 10/13, 10/14, 10/15, 10/16 and 10/17, with an inset of the dense-phase peak.]
Figure 4.24. P^can(v) for N = 108 and a range of β values, at coexistence. N_m = 540 (β =
10/17) to 1292 (β = 1.0). The inset shows the dense phase on an expanded scale.
In implementing the equal-weights criterion, we have used an arbitrary division of the range of
v at or near the point where P^can is minimal, even though a fitting of two overlapping Gaussians
might perhaps be better. This is not expected to have a great effect on the assignment of p_coex,
which as we have already seen is little affected even by the choice of equal-heights or equal-weights
criterion. However, it does mean that the estimates of ⟨v⟩ tend to move away from
the modes of the canonical P^can(v). Therefore, once the temperature exceeds a certain value,
chosen by inspection, we move to using the locations of the modes as finite-size estimators. The
`best estimates' of the infinite-volume limits of p_coex and the specific volumes ⟨v⟩ are then
calculated by extrapolating the finite-size data against 1/N.

We saw in section 4.3.5 that this procedure works well for low temperatures: it is applied
here even to near-critical temperatures because to treat the critical region properly, to obtain
estimates of β_c and critical exponents, it is necessary to make highly accurate measurements of
the joint p.d.f. of the order parameter and energy [121, 140], which we have not had time to
do. The phase diagrams are thus not expected to be particularly accurate in the critical region.
    N   | N_m                      | N_r  | N_s (sweeps/iteration) | speed (sweeps/hour) | N_EDGE | N_eq
    32  | 159-281 (β = 10/15-1.0)  | 4096 | 3500                   | 1800                | 1      | <1
    108 | 540-1292 (β = 10/17-1.0) | 4096 | 2000                   | 500                 | 3      | 1
    256 | 2949 (β = 1.0)           | 1728 | 2000                   | 500                 | 4      | 1

Table 4.4. Parameters for the various `production' simulations used in the evaluation of
the phase diagram. P^can and derived quantities were generated by averaging over about 3-6
iterations.
The reader might be concerned that the suppression of configurations containing interfaces,
as described in section 4.3.6, might affect the estimates of p_coex and ⟨v⟩ at higher
temperatures, in the region where, for a finite-size system, the interphase configurations do have
significant weight. However, these configurations only acquire significant weight once the
correlation length, or the width of typical interfaces, has exceeded the size of the system [17], so
the dominant configurations are once more homogeneous in structure. In particular, we do not
foresee incommensurability problems affecting simulations in the critical region, and we believe
that an accurate treatment using the methods of [121, 140] would be a fairly straightforward
extension of the work performed here.
The multicanonical distributions were produced first for the N = 32 systems, sometimes
starting from xc = 0, as described in section 4.3.3, but sometimes using a pre-existing xc from
a simulation at a different temperature as a first estimate. The xc for the larger systems were
produced using finite-size scaling followed by refinement. The parameters used in the various
simulations are shown in table 4.4.
The results are shown below; figure 4.25 is the p-β phase coexistence curve, while figure
4.26 is the β-v phase diagram. We also show results from [171] for comparison (dashed lines).
It is apparent that there is quite good agreement between the two estimates of the p-β
coexistence curve. Any discrepancies are at most of the order of 1%, and most lie within the
error bars on the multicanonical points; some do not, but since we have no error bars on the
thermodynamic integration data this need cause no concern. However, it should be noted that
the discrepancies that do exist still correspond to O(1) differences in the relative probabilities
of the two phases for an N = 108 system.

The agreement in the location of the phase boundary in the β-v solid-solid phase diagram
is also fairly good, though there is a small but clear systematic disagreement in the specific
volume of the expanded solid: at low β, the integration method consistently ascribes to it a
higher v than the multicanonical. There are several possible explanations for this, none of which
is completely satisfactory.
[Figure 4.25 appears here: p against β for thermodynamic integration and the extrapolated multicanonical estimates.]
Figure 4.25. The p-β solid-solid coexistence curve for the square-well solid with λ = 0.01.
The data points are produced by extrapolating N = 32 and N = 108 (and, for β = 1, N = 256)
data against 1/N; the equal-weights criterion is used to find the finite-size estimators of p_coex.
The dashed line shows thermodynamic integration results for an N = 108 system, taken from
[171].
It may be due in part to the difference between the equal-heights and
equal-weights criteria: equal-heights (implicit in the double-tangent construction of TI) would
tend to produce the larger estimate of ⟨v⟩. However, this cannot be the whole story, because
equal-heights would also produce a lower estimate of p_coex, while figures 4.25 and 4.22 show
that, if anything, the opposite seems to be true. Another possible cause is the finite-size
movement of the peak of P^can(p_coex; v) to lower v (visible in figure 4.20), but again this does not
seem a large enough effect to account for all the discrepancy; the `raw' multicanonical results
for N = 108 still lie much closer to the extrapolated data points than to the thermodynamic
integration curve. The third, and possibly most likely, explanation is that the hard-sphere solid
used as the reference system in [171] is unsatisfactory near to the solid-solid critical point. One
might expect this to happen because the hard-sphere solid does not itself have a critical point,
and so `typical' configurations of the two systems are far less similar than they are at higher β,
making the integration path longer and more awkward. The effect is in fact like that of having
a phase transition on the integration path itself.
[Figure 4.26 appears here: β against v for thermodynamic integration and the extrapolated multicanonical estimates.]
Figure 4.26. The β-v solid-solid phase diagram for the square-well solid with λ = 0.01. The
data points are produced by extrapolating N = 32 and N = 108 (and, for β = 1, N = 256)
data against 1/N. The dashed line shows thermodynamic integration results for an N = 108
system, taken from [171].
In any case, whatever the cause of the low-β difference in the estimates of v, one of its consequences is that the multicanonical data suggest
that the critical temperature is probably lower (i.e. β_c is higher) than stated in [171]. The
integration method gives β_c^TI ≈ 0.576, while figure 4.26 implies instead that β_c^xc ≈ 0.60(1),
though in the absence of a proper critical FSS analysis we would not assert even this result
with very much confidence.
4.3.8 The Physical Basis of the Phase Transition
So far in our investigations we have concentrated only on describing the solid-solid transition
that is observed, without attempting to give a physical explanation of why it should occur at
all. A full understanding would require the study of different potential ranges, and of the fluid
phase too, so we shall give only a brief semiqualitative description, concentrating mainly on the
N = 108 system at β = 1 and β = 10/16.

A first-order phase transition is always associated with a finite-size p.d.f. of the order
parameter (P_N^can(v) here) that has a double-peaked structure, the peaks being associated with the
coexisting phases in the thermodynamic limit. Since the logarithm is a monotonically increasing
function, this means that

    ln P_N^can(p_coex; v) = -Nβ(f(v) + p_coex v) + constant

has the same double-peaked shape. Now this shape means that the derivative ∂^2 ln P/∂v^2 must
pass from -ve to +ve and back to -ve. But ∂^2(-Nβ p_coex v)/∂v^2 = 0, so it is the other term on the
RHS that must produce the curvature: ∂^2 f(v)/∂v^2 must go from +ve to -ve to +ve. We now
express f(v) itself as the sum of an energy and an entropy term:

    f(v) = e(v) - s(v)/β

where e(v) = E(v)/N is an average internal energy density, and so is the means by which
the interparticle potential exerts its influence, and s(v) is an entropy density (we remind the
reader that in the present units k_B = 1). The multicanonical simulations give us f(v), and e(v)
was also measured at the same time, internal energies being easily accessible in any sensible
MC sampling scheme. s(v), which we expect to be mainly geometric in origin, is thus easily
calculated. We show s(v) and e(v), together with f(v) + p_coex v (= -ln P_N^can/(Nβ), up to a constant), in figure
4.27. The functions s(v) and f(v) + p_coex v are arbitrarily shifted vertically so that they equal
zero at the lowest v for which measurements were made. We show f(v) + p_coex v rather than
f(v) itself so that the behaviour of the curvature is made more apparent by the removal of the
overall `trend.' The +ve to -ve to +ve pattern of ∂^2 f(v)/∂v^2 is visible, though the magnitude
of the 2nd derivative is clearly small except at low v.
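Since figure 4.27 is built directly from these two measured functions, the decomposition can be sketched in a few lines (a minimal version; f and e are assumed to be arrays of the measured free energy and energy densities on a common grid of specific volumes v):

    import numpy as np

    def entropy_decomposition(v, f, e, beta, p_coex):
        # With k_B = 1 and f(v) = e(v) - s(v)/beta, the entropy density follows
        # directly from the measured free energy and energy densities.
        s = beta * (e - f)
        tilted = f + p_coex * v       # f(v) + p_coex*v, as plotted in figure 4.27
        s = s - s[0]                  # arbitrary shift: zero at the lowest volume
        tilted = tilted - tilted[0]
        return s, tilted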
Before proceeding, we shall comment on the behaviour of the functions outside the range of
v that is shown in the figure (i.e. the range that was investigated by simulation). We must
have e(v) → -6 as v → v_CP = 1/√2, since then all particles interact with all twelve of their
nearest neighbours, while e(v) and all its v-derivatives → 0 as v → ∞ and the particles become
very widely separated. However, s(v) → -∞ as v → v_CP, because the volume of phase space
accessible to the particles → 0. Therefore f(v) → +∞ in the same limit. As v → ∞ we expect
s(v) → ∞ and ∂s(v)/∂v to remain finite. We should note that, for some finite v rather larger
than shown in the figure, the particles would no longer be held on their lattice sites by their
neighbours even at zero temperature; that is, the solid would no longer be mechanically stable.
As a rule of thumb, this occurs for fcc crystals when they have expanded to (1.07)^3 of their
close-packed volume [184], which corresponds to v ≈ 0.866. At this point, we would expect s(v)
to increase very rapidly, or even discontinuously, as the particles become delocalised.
[Figure 4.27 appears here: s(v), e(v) and f(v) + p_coex v against v for β = 1.0 and β = 10/16.]
Figure 4.27. The internal energy density e(v), entropy density s(v) and free energy functional
f(v) + p_coex v for N = 108 at β = 1.0 (full lines with symbols) and β = 10/16 (dashed lines). An
arbitrary constant is added to the entropy and free energy so that they are zero at the lowest
volumes investigated. The v-axis begins at v_CP = 1/√2.
Let us first consider the β = 1.0 data only. Comparison with figure 4.20 confirms that the
dense solid (at v ≈ 0.72) has low entropy but is stabilised by low internal energy, while the
expanded solid (at v ≈ 0.80) has higher internal energy but also higher entropy, in line with
figure 4.3. As we move from low v to high, the internal energy increases, first slowly (e remains
very close to -6 for 1/√2 < v < 0.717), then more quickly as the interparticle separation
grows to the point where the particles move in large numbers out of their neighbours' potential
wells (producing the strongly positive curvature around v = 0.72). The rate of loss of energy
then slows, giving a strongly -ve curvature around v = 0.73-0.74, and after this it increasingly
levels off and flattens out, as further increases in v make less difference to the number of
neighbours with which most particles interact. The rapid loss of internal energy even though
the density remains high will be crucial in understanding the phase behaviour. The point of
inflection seems to be quite close to the point where the internal energy would drop to zero
under a uniform dilation of a perfect fcc crystal, which is at v = (1.01)^3/√2 ≈ 0.7285. Apart
from the expected divergence at v = v_CP, the shape of s(v) is quite similar, though its high-volume
gradient is non-zero and its low-volume curvature (both +ve and -ve) is smaller in
magnitude. We attribute s(v) largely to free volume in the crystal, so initially, as we move away
from close packing, it increases rapidly, then more slowly and with a -ve curvature. The region
of +ve curvature around v = 0.72 cannot be due simply to free volume and must relate to the
unknown behaviour of P(sN | v).
The difference of the curvatures of these two functions gives the curvature of f(v), which is
the determiner of the phase behaviour. At low v, the stronger curvature of e(v) is dominant, first
+ve, producing the dense phase, then -ve, producing the minimum of the canonical probability.
At high v, e(v) flattens out and the curvature of s(v) becomes the greater, producing a weak net
+ve curvature of f(v) which stabilises the expanded phase. The smallness of the 2nd derivative
of f(v) here is responsible for the large compressibility of the expanded solid: the `restoring
force' resisting compression and dilation is essentially an entropic effect.
At β = 10/16, s(v) and e(v) have changed only a little from β = 1.0; the largely geometrical
factors that produce their shapes are only slightly altered by the different relative probabilities of
the various configurations within each v-macrostate. The main difference in the phase diagram
is produced not by these slight differences but by the fact that the inverse temperature affects
only the entropy term, making its effect greater, and not the energy term. Thus the low-volume
dominance of ∂^2 e(v)/∂v^2 is weaker, while the high-v dominance of ∂^2 s(v)/∂v^2 is stronger and
occurs for smaller v. Hence we expect the two phases to move together, the dense phase to
become more compressible and the expanded phase less so, and the depth of the probability
minimum between them to fall, just as is observed in the simulations (see figure 4.24). Indeed,
though we have not carried out the calculations, we believe that the whole β-v phase diagram
could be recovered qualitatively from the results at β = 1.0 alone. At β = 10/16 itself, the two
phases have effectively fused together, as shown by the single minimum of f(β = 10/16; v).

Conversely, the opposite would be true for coexistence at β > 1. The dominance of the
curvature of e(v) would be greater and the expanded phase would move to higher volume, eventually
reaching those volumes (v ≈ 0.866, as we said before) where mechanical instability sets in. However,
it is very likely that, even before this, the expanded solid would become thermodynamically
unstable in comparison with the fluid. The fluid is always favoured entropically compared with
the expanded solid, and the difference in ⟨e⟩ cannot be very great, since ⟨e⟩ is already near zero
for the expanded solid; so for small β the fluid is suppressed, we believe, largely by the p_coex v
term: ⟨v⟩_f for the fluid is much larger than ⟨v⟩_xs for the expanded solid, and p_coex is high.
However, as β increases, p_coex falls and ⟨v⟩_xs increases, so p_coex(⟨v⟩_f - ⟨v⟩_xs), which
measures the net effect of this term, becomes smaller. Eventually, we presume, the fluid will
become the favoured phase and we will have moved above the triple line on the phase diagram
in figure 4.2.
We can also speculate on the effect that increasing λ has. e(v) will maintain a similar shape
while being scaled in the v-direction, so its regions of high +ve and -ve curvature, which produce
the dense phase and the interphase region, as well as the region of weak -ve curvature that
allows the curvature of s(v) to produce the expanded phase, will all move to higher
volumes. Once λ is large enough, the expanded phase becomes unstable at all temperatures
(either because the region of +ve curvature of f(v) moves out of the range of solid mechanical
stability, or because p_coex(⟨v⟩_f - ⟨v⟩_xs) becomes small enough that the fluid always has the
lower g), and a phase diagram containing only two phases results, as in the central diagram of
figure 4.4. This is consistent with the results of [171].
Finally, we note that the shape of e(v) should not be strongly affected by variations in the
shape of the interparticle potential, as long as the short range is preserved, while s(v), being
primarily geometric, should also be similar. The above arguments thus remain valid for more
realistic shapes of E(r_ij), supporting the assertion in the introduction that the square-well
system should have a similar phase diagram to any other real or simulated system with a
sufficiently short-ranged potential.
4.4 Discussion
We shall now comment in turn on the two main subdivisions of this chapter: the investigations
made using the Einstein solid reference system, and those made using multicanonical sampling
to connect the coexisting phases directly.
As regards the comparison of thermodynamic integration and the expanded ensemble using
the Einstein solid (section 4.2), it is clear that in efficiency of sampling (as quantified by the
size of error bars for a particular computational effort) the expanded-ensemble method is competitive
with TI. Moreover, there is perhaps some evidence (see figure 4.8) that the expanded ensemble's
superior ability to deal with near-singular points gives it an advantage over TI both at one
endpoint of the switching path, where it handles the problem of the rare hard-core overlaps implicitly,
and near the other, where the integrand is rapidly changing. However, because we used the TI results
to bootstrap the expanded ensemble, and because we did not develop a method to enable the expanded
ensemble to reach the far endpoint of the path (though we believe such a development is possible),
the comparison of the two methods is not strictly fair, and we certainly cannot claim to have
proven the expanded ensemble's superiority. We would add that the awkwardness experienced
in transforming the square-well solid into the Einstein solid (described as part of section 4.2.1
for TI, though it also affects the expanded ensemble) is an argument in favour of avoiding the
use of reference systems, or of using only those that strongly resemble the system of interest.
We have also investigated matters pertaining to the use of the expanded ensemble alone. In
section 4.2.3 we showed that the subdivision and overlapping of a chain of interpolating
ensembles does not influence the accuracy with which the relative probabilities of the ends are
measured (see table 4.2), confirming the result derived in section 3.4.2. We have also presented
some results on choosing the spacing of states in expanded ensemble calculations, though we
have done only a little work on this important matter. Our results (see table 4.1) do show the
expected trade-off in random walk time between the number of states and the acceptance ratio,
leading to a fairly wide efficiency maximum, though we are not able to make the quantitative
prediction of the acceptance ratio that might lead to an optimisation strategy. We might
speculate (though it is not a matter we have investigated) that the notion of the maintenance
of equilibrium within macrostates/subensembles (see section 4.3.3) may be relevant here: the
larger the separation of subensembles, the less representative of equilibrium within each one will
be the configurations that move into them from adjacent subensembles. Thus we would expect
that the amount of equilibration needed between attempted subensemble transitions would
increase, which would tend to favour the use of a close spacing of subensembles. We suggest
that inadequate equilibration may be the reason that spurious results have been reported for
some expanded ensemble-like techniques (such as grand canonical MC, where the process of
inserting or removing a particle is naturally discrete and in a dense system may produce what
is effectively a wide spacing of subensembles).
Now let us discuss the investigations made using the multicanonical NpT-ensemble (section
4.3). We have studied the square-well solid with λ = 0.01 and have obtained detailed information
about f(v) for various β over a range of volumes including the coexisting phases, thus
enabling the construction of the phase diagram (figures 4.25 and 4.26). The results are largely
consistent with those obtained by TI in the literature [69, 171], and we have grounds to think
that where there are discrepancies our results are the better ones. The full measurement of
f(v) and e(v) also provides some physical insight into how the short range of the potential
leads to the solid-solid phase transition (see section 4.3.8). The transition is possible because
the energy function e(v) has the following features, all within a narrow range of v not much
above close packing: first it has positive curvature, producing the dense solid; then negative
curvature, producing a minimum of canonical probability; and finally it is nearly flat (the system
having lost most of its internal energy), so that the curvature of the entropy s(v) is able to
stabilise the expanded solid. Because all this occurs at small specific volumes, thanks to the
short range of the potential, every particle remains trapped in the cage of its nearest neighbours
and both phases are solid. However, a full understanding of the physics of the transition would
require treatment of the fluid too.
In order to apply the multicanonical ensemble to this problem, we have also had to extend
it, because the very long random walk time τ_RW in this problem prevents a straightforward
implementation. We have solved this problem by increasing the use of transition probability
estimators to enable efficient parallelisation using many replica simulations. We foresee this
improvement being widely applicable, since it largely overcomes the problems caused by the
inherent serialism of the multicanonical method itself. In section 4.3.3 we show that efficient
convergence to the multicanonical distribution is produced by TP estimators, possibly in conjunction
with FSS, and by continuing with the use of TP estimators in the `production' stage
(a procedure extensively justified in section 4.3.4) we have arrived at an iterative scheme that
achieves to a very large extent the ideal of a combination of the `finding-xc' and `production'
stages.
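As an illustration of the flavour of such an estimator (a deliberately simplified sketch under assumed conventions; the estimator actually developed in this thesis is derived in chapters 3 and 4), one can pool the transition counts between macrostates from all replicas, normalise them into an empirical transition matrix, and take its stationary distribution as the estimate of the sampled distribution P^n:

    import numpy as np

    def tp_estimate(C, tol=1e-12, max_iter=100000):
        # C[i, j]: pooled count of observed transitions from macrostate i to j.
        rows = C.sum(axis=1, keepdims=True)
        T = np.divide(C, rows, out=np.zeros_like(C, dtype=float), where=rows > 0)
        T += np.diag(1.0 - T.sum(axis=1))     # self-loops keep each row stochastic
        pi = np.full(len(C), 1.0 / len(C))
        for _ in range(max_iter):             # power iteration for pi = pi T
            new = pi @ T
            new /= new.sum()
            if np.abs(new - pi).max() < tol:
                break
            pi = new
        return pi                             # estimate of the sampled distribution

Because the estimate depends only on the observed transitions, not on how often each macrostate happened to be visited, it is insensitive to the replicas' starting distribution, which is the property exploited in the parallel implementation.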
Thanks to the separation in this simulation of updates that alter the preweighted variable
but preserve the configuration, and those that do the opposite, we have also, as described in
section 4.3.3, gained new insight into the importance of the preservation of equilibrium at
constant v. Failure to equilibrate completely is the reason why convergence to the multicanonical
distribution is not immediate, and, because the estimator of the sampled distribution P^n is
least affected by incomplete equilibration when P^n is multicanonical[18], it is also the reason why
the multicanonical distribution should still be used in the production stage. The simulations of
chapter 3 did not provide this insight because the same update procedure both made moves between
different macrostates of the preweighted variable and equilibrated the microstates within
them.

[18] Strictly, when P^n reflects the (imposed) distribution of the replica simulations.
For the square-well solid we have been able to tackle systems where τ_RW is much longer than
the time devoted to each replica simulation, though, because each replica now traverses only a
part of the macrostate space, this is achieved at the cost of losing some of the improved ergodicity
that can usually be claimed for the multicanonical ensemble. To simulate systems where ergodic
problems are more severe, like spin glasses, this procedure would not be adequate; connection
to those regions of macrostate space where decorrelation is fast would then be essential. Indeed,
the best procedure might well be to launch all replicas from these states instead of spreading
them uniformly. The procedure would then resemble that of section 3.2.5, and, as was the case
there, the time devoted to each replica would have to be O(τ_RW). This is still much less than
would be required using VS estimators, because the TP estimators would take account of the
starting distribution of the replica simulations. To be sure that there was no bias would require
that the VS histogram stayed flat notwithstanding the biased launch points (see section 4.3.3).
It might in practice be found, as in section 3.2.5, that any trend in the histogram had little
effect. Alternatively, we might try modifying the method; we speculate that making the VS
histogram flat by including only some of the transitions of certain replicas in C_ij may have the
desired effect.
It will be noted that, in the form in which it is applied here, the multicanonical ensemble bears
some resemblance to the `multistage sampling' approach (section 2.1.2), where each simulation
would be constrained to walk (possibly multicanonically) within a narrow section of the full
range of macrostates, overlapping with its neighbours. From a VS histogram, the p.d.f. of V
within each section would then be estimated, and, by imposing continuity between the sections,
it could be reconstructed for the whole range of macrostates. In the multicanonical approach,
there are no constraints on the movement of the replica simulations, but they do behave in
a similar way in practice because τ_RW is so long. Nevertheless, the multicanonical approach
retains several advantages. To use multistage sampling, we must decide a priori how to divide
up the range of macrostates: how wide each section should be and how much it should overlap
with its neighbours. We must also decide how to match the results from the various histograms,
using just the overlapping states or using a function fitted to the whole histogram. The use of
the single pooled histogram with TP estimators handles all of this transparently. Moreover, to
allow full equilibration (in the VS sense) over the range of macrostates in each section of the
multistage sampling simulation would require that each section should contain substantially
fewer macrostates even than are explored by one of the multicanonical replicas in the course
of its run. Thus, even though the multicanonical method does not have very good ergodic
properties here, the multistage sampling approach would be even worse in this regard. It
is also true that the multistage sampling approach would have the lower acceptance ratio of
volume-changing moves, because attempted transitions that would take a replica out of its
allowed section of the macrostate space must be rejected. Finally, we do not think that the
time spent in generating the multicanonical distribution (which is in any case typically only
about a quarter of the total time spent) could be saved by using multistage sampling. It is
true that the sampled distribution can be canonical within each section, but this makes the
overlapping process more difficult, because fewer counts are recorded at one end of the range
of states, so that fluctuations have a greater effect. It is also possible, though we do not know
for certain, that the same problems that were found in using TP to estimate P^n in the early
iterations of the multicanonical method (discussed at the end of section 4.3.3) would recur. It
is of course possible to use a sampled distribution in the multistage sampling method that is
multicanonical within each section, but this then requires the same sort of generation process
as does a distribution that is multicanonical over all the macrostates.
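For contrast, the histogram-matching step that multistage sampling would need can be sketched as follows (a toy version: each window is a hypothetical pair of arrays of macrostate indices and ln-probability estimates, and continuity is imposed on the overlap between successive windows):

    import numpy as np

    def stitch(windows):
        # windows: list of (indices, ln_p) pairs; successive windows must overlap.
        full = dict(zip(windows[0][0], windows[0][1]))
        for idx, lnp in windows[1:]:
            local = dict(zip(idx, lnp))
            overlap = [k for k in idx if k in full]
            shift = np.mean([full[k] - local[k] for k in overlap])  # impose continuity
            for k in idx:
                full.setdefault(k, local[k] + shift)
        return full   # ln p.d.f. over the whole range, up to one overall constant

The sensitivity of `shift` to fluctuations in the sparsely-populated overlap states is exactly the difficulty that the pooled TP histogram avoids.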
Let us finally make some comments on the efficiency of this multicanonical procedure as
compared to the thermodynamic integration method used in [69, 171] (it is not possible to
make an absolutely fair comparison with TI, because to integrate along a path of variable
V (see equation 2.3) would require the measurement of ⟨p(V)⟩, which, as we explained in
section 4.2.1, is inaccessible for this system). Though we do not, of course, know how much time
was required for the simulations of [69, 171], the fact that several different ranges of potential
were investigated, and the fluid phase was included too, suggests that the thermodynamic
integration done there is appreciably faster. This is surely a consequence of the use of a
reference system which is `similar' to the square-well solid, whereas the multicanonical method
is an ab initio method. The accuracy of these TI results is thus dependent on the accuracy of
the equation of state used for hard spheres, but any slight errors induced by this are probably
outweighed here by being able to use a very short integration path (except perhaps near the
critical point; see section 4.3.7). In a case where there was no suitable reference system, we
do not believe TI would be in any way superior. Even here, applied to a system to which it
is not particularly well-suited because of the hard core in its potential (see section 4.3.2), the
multicanonical ensemble has some advantages: it seems to be more accurate near the critical
point (it could certainly be extended to yield accurate information about the critical region),
and it gives estimates of canonical averages away from the phase transition (see section 4.3.5).
We also consider it to be a more transparent way of obtaining information about f(v), because
it uses the interpretation of f(v) as the logarithm of a probability to relate it directly to the
probabilities that are measured in MC simulation.
Chapter 5
Conclusion
Look at the end of work, contrast
The petty done, the undone vast.
ROBERT BROWNING
The main problem we have addressed in this thesis is perhaps the generic problem of statistical
mechanics: the identification of which phase is stable at particular given values of the
thermodynamic control parameters, leading to the construction of the phase diagram. As we
described in chapter 1, there are two fundamental ways that MC simulation can be used to
approach this problem. Firstly, each phase can be tackled separately, which requires a measurement
of its absolute free energy; this is most usually done by connecting it to a reference
system in some way. Secondly, the free energy difference between the two phases (closely related
to their relative probabilities) can be measured directly by allowing them to interconvert. This
removes the necessity of measuring the absolute free energy, but some way must be found to
overcome the free energy barriers that separate the two phases.
In our own investigations, we have concentrated on the ways that non-Boltzmann sampling
(particularly the multicanonical ensemble) can be used to tackle the phase coexistence problem.
This type of extended sampling can be used in either of the ways described in the first paragraph:
it can be used to connect to a `reference state' at infinite or zero temperature to measure
absolute free energy (as in much of chapter 3), or it can be used to overcome the free energy
barrier between two phases to measure their relative probability directly (as in section 4.3).
An idea that turned out to have particularly wide applicability was to use the information
contained in transition probabilities between macrostates. The transition probability method
was originally developed as a way of rapidly producing a distribution approximating to
the multicanonical (section 3.2.3). While very effective for doing this, it also provides a way
of removing the bias due to the starting state in an MC run which is not long in comparison
with τ_RW (see sections 3.2.5 and, particularly, 4.3). This greatly facilitates the implementation
of the method on parallel computers, and may be more generally useful. Consideration of
transition probabilities also prompted us to do the analytic work on Markov chains (section
3.4.2) which led to the result (confirmed by simulation in section 4.2.3) that the accuracy of a
multicanonical/expanded ensemble calculation is not improved by subdividing the macrostates,
and which also enabled us to predict the expected variance of estimators from various sampled
distributions. While the resulting prediction of an `optimal' sampled distribution (section 3.4.4)
is probably mainly of curiosity value, the variance calculations were useful in checking the validity
of TP estimators in section 4.3.4. Much of the work done was purely on the development of
the method, but physically interesting results were obtained: in section 3.3.3, on the high-M
scaling of the 2d Ising order parameter, and, particularly, in section 4.3, where aspects of the
solid-solid phase transition in the square-well solid have been investigated with greater accuracy
than before. This work thus provides a contribution to the growing body of evidence describing
variation in the nature of phase diagrams as qualitative features of the interparticle potential
vary.
Is it likely that the multicanonical/expanded ensemble will ever replace thermodynamic
integration as the `standard' method of free energy measurement? There seems to be no reason
why not in principle: though there may be some cases where TI's efficiency is higher (one is
mentioned in section 2.1.4), whenever we have compared the two over the same path, as in
sections 3.3.2 or 4.2, we have found the multicanonical approach to be at least comparable in
accuracy (something we would expect in the light of the results of section 3.4.2 on subdivision
of the range), while it can deal better with phase transitions or ergodic problems. It also has
the advantage of relating the desired free energies more directly to the probabilities that are
measured in MC simulation. Its most serious drawback is still the difficulty of constructing
the required sampled distribution, but we can now begin to see that this disadvantage can be
largely overcome, using FSS, extrapolation, or the TP method. The TP method also helps
in the efficient parallelisation of the method, a matter which is bound to become increasingly
important in the future. Therefore, we expect increasing use of multicanonical/expanded ensemble
methods, particularly for systems like spin glasses or dense fluids, where thermodynamic
integration experiences difficulties because of ergodicity problems or phase transitions on the
integration path, or for the measurement of surface tension or the probability of large fluctuations,
where thermodynamic integration is not appropriate at all. For many other systems (e.g.
fluids at fairly low densities), however, it seems unlikely that thermodynamic integration will
be entirely displaced, because the multicanonical ensemble does not offer any great advantage
to compensate for the greater effort needed to code it; thermodynamic integration can be done
with only a little modification of Boltzmann sampling simulations. We should also mention
here that, while we have not investigated them ourselves, several other new methods discussed
in the review section (chapter 2) were either extremely efficient or seemed to hold promise,
particularly the Gibbs ensemble (see section 2.2.7), the dynamical ensemble (section 2.2.5) and
the (MC)^3 and tempered transitions methods (discussed in section 2.2.3).
Whatever sampling scheme is used, it seems likely that the free energy problem will always
remain `hard,' simply because of the distance in configuration space that must be sampled over.
When sampling within a single phase, all likely configurations are broadly similar, but in the
free energy problem, whether approached by connecting with an infinite or zero temperature
reference state, or by tunneling between two coexisting phases, the simulation must move between
configurations which are qualitatively very different in structure. Because the Metropolis
algorithm works by accumulating small changes to a starting configuration to produce others,
it inevitably takes a long time to build up large differences. Thus (as we found in section 3.4.3),
once we have removed the free energy barriers that produce a tunneling time that is exponential
in the system size, further changes to the sampled distribution produce no more than a
marginal improvement in the efficiency of sampling. To make further large improvements will
require the use of algorithms that can take larger steps through configuration space than the
Metropolis algorithm. Though we have not done any work on them in this thesis, some such
algorithms are already being developed [116, 118, 115]. We think it is likely that the hybrid
MC technique [39] could also be fruitfully combined with multicanonical sampling.
Finally, let us speculate on some interesting problems that might in the future be tackled
with the multicanonical/expanded ensemble method. Multiple-minima problems, for example
the simulation of spin glasses and protein folding, are examples of applications which demand
the good ergodicity properties of multicanonical-like methods, and some work has already been
done on these. We also consider that the development of the technique's ability to sample
across phase boundaries would be interesting and productive. This has already been done for
isostructural solid-solid (chapter 4) and fluid-fluid [121] transitions; we consider it possible that
a way could also be found to sample reversibly across a solid-fluid phase boundary, perhaps
using the order parameter introduced in [24], that measures `crystallinity,' as the preweighted
variable. This would be, to our knowledge, the first simulation of reversible melting and freezing.
As well as enabling the simulation to be done more elegantly, without needing separate reference
systems for the solid and liquid, the pseudodynamics of such a simulation could give insight
into the process of nucleation in real systems, which is itself a subject of considerable interest.
Appendix A
Exact Finite-Size Scaling Results
for the Ising Model
Let the Gibbs free energy density in the infinite-volume limit be g_b. We shall examine the
behaviour of the finite-size estimator g(L) = G(L)/L^2. For the critical temperature only, we
shall also discuss F(M; L). Now, the effect that finite size has on a system's properties depends
on its boundary conditions, and on where in its phase diagram it is located: in a single-phase
region, at the coexistence of two or more phases, or near the critical point. (For the 2d Ising
model, the critical point occurs where the coupling β = β_c = (1/2) ln(√2 + 1) ≈ 0.440686.) We
shall consider first the case where the system has periodic boundary conditions (P.B.C.).
High Temperature (Small $\beta$)
For many systems with short-ranged forces, in the single-phase region, with P.B.C., $g(L)$ has
only exponentially small corrections [185, 186, 183], so

$$g(L) = g_b + O(L^{-2} e^{-L/L_0}) \qquad (A.1)$$

The source of the correction is the interaction of a particular spin or particle with its `image' in
the periodic boundary; in the absence of long-range forces this interaction has to be mediated
through $O(L)$ spin-spin interactions, and away from criticality correlations decay exponentially
[12]. The constant $L_0$ will therefore be a function of the inverse temperature $\beta$, having similar
behaviour to the correlation length $\xi$.
Low Temperature (Large $\beta$)
In a single-phase region, there are once again only exponential corrections [183]. However,
for the 2d Ising model, the line $H = 0$ for $\beta > \beta_c$ is the line of coexistence of two phases of
opposite magnetisation. Therefore we must now consider the case of phase coexistence. At the
coexistence point of $q + 1$ phases, with P.B.C. and away from criticality, the free energy has
been shown [182] to have the following form for a large class of lattice models:

$$G(L) = -\beta^{-1}\ln\left[\sum_{m=0}^{q}\exp\big(-\beta L^d g_m(\beta)\big)\right] + O(L^d e^{-L/L_0}) \qquad (A.2)$$

where $g_m$ is a `metastable free energy' of phase $m$: $g_m = g_b$ if $m$ is stable, otherwise $g_m > g_b$.
For the 2d Ising model, there is an exact symmetry between the two phases, so both terms in
the sum in equation A.2 are the same. This leads to

$$g(L) = g_b - \beta^{-1}L^{-2}\ln 2 + O(e^{-L/L_0})$$
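Explicitly, setting $q + 1 = 2$ and $g_0 = g_1 = g_b$ in equation A.2 (with $d = 2$) gives

$$G(L) = -\beta^{-1}\ln\left[2\,e^{-\beta L^2 g_b}\right] + O(L^2 e^{-L/L_0}) = L^2 g_b - \beta^{-1}\ln 2 + O(L^2 e^{-L/L_0}),$$

and dividing by $L^2$ gives the quoted result: the coexistence of the two symmetry-related phases contributes a $-\beta^{-1}L^{-2}\ln 2$ correction to $g(L)$.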
The Critical Region ($\beta = \beta_c$)
Before discussing the scaling of $g(L)$, we shall discuss the scaling of the Helmholtz free energy
$F(M; L)$ and thus of $P_L^{\rm can}(M)$ at this temperature. We have seen that at high temperatures
$F_L(\beta; M)$ has a single minimum at $M = 0$ and at low temperatures two minima at
$M = \pm M^*$. In both cases as $L$ increases the fractional width of the peaks of $P_L^{\rm can}(M)$ decreases,
so that at high temperatures $P_L^{\rm can}(\beta; M)$ is concentrated progressively in the centre of the
distribution, and at low temperatures in its wings. The crossover between these two types
of scaling occurs at the critical point, $\beta = \beta_c$, where $P_L^{\rm can}(\beta; M)$ has the property that the
relative weights of centre and wings remain constant, in the sense that $P^{\rm can}(0)/P^{\rm can}(M^*) =
\text{constant}$ ($\approx 0.04$ for the 2d Ising model), independent of $L$. The scaling of $P_L^{\rm can}(\beta_c; M)$ is quite
different from its scaling away from $\beta_c$: $P_L^{\rm can}(\beta_c; M)\,dM = p^*(x)\,dx$, where $x = ML^{-d}L^{d/(1+\delta)}$
and where $p^*(x)$ is a universal function, the same for all the systems in a particular
universality class. For the universality class that includes the 2d Ising model, $p^*(x)$ has a
double-peak structure and $\delta = 15$. Thus the modes of $P_L^{\rm can}(\beta_c; M)$ lie at $\pm M^*$ where $M^* \sim L^{15/8}$,
which is to say that the modes of $P_L^{\rm can}(\beta_c; m)$ lie at $\pm m^* \sim L^{-1/8}$. This, together with the
constant value of $P^{\rm can}(0)/P^{\rm can}(m^*)$, which is easily large enough for easy tunneling between
the two peaks, produces the large (almost extensive in the system size) fluctuations in the order
parameter $M$ that are typical of the critical point.
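The location of the modes quoted above follows directly from the change of variables: the peaks of $p^*(x)$ sit at fixed positions $\pm x^*$, so

$$x = ML^{-d}L^{d/(1+\delta)} = ML^{-d\delta/(1+\delta)} \quad\Rightarrow\quad M^* \sim L^{d\delta/(1+\delta)} = L^{2\times 15/16} = L^{15/8}$$

for $d = 2$, $\delta = 15$, and hence $m^* = M^*/L^2 \sim L^{-1/8}$.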
Now we return to consideration of the Gibbs free energy. Near the critical point with P.B.C.,
it is well established [17, chapter 1] that the free energy $g(L)$ can be decomposed into a singular
part $g_s$, which is zero outside the critical region, and a non-singular `background' part $g_b$:

$$g(t, H; L) = g_s(t, H; L) + g_b(t; L)$$

where $t = (T - T_c)/T_c$. It follows from renormalisation [187, 17] or other arguments that $g_s$ is
given by

$$g_s(t, H; L) = L^{-d}\,Y\big(atL^{1/\nu},\ bHL^{\Delta/\nu}\big) \qquad (A.3)$$
where $\nu$ is the critical exponent describing the divergence of the correlation length ($\nu = 1$ for
the 2d Ising model), and $Y$ is a universal function that goes quickly to zero as its arguments
increase. It is also found [17, chapter 1] that the dependence of $g_b$ on $L$ is much weaker than that
of $g_s$, and can be neglected, from which it follows that at the critical point itself ($t = H = 0$)
we have

$$g(L) \approx g_b + \beta^{-1}L^{-2}U_0$$

where $U_0 = Y(0, 0)$ is universal. The next term in this series is non-universal, but might be
accessible by simulation methods that concentrate on finite-size scaling; the result [17, pp. 9-15]
is in general

$$g(L) \approx g_b + \beta^{-1}L^{-2}U_0 + a_1 L^{-d-\theta/\nu}$$

where $\theta$ (non-universal) is the critical exponent belonging to the first (largest) `irrelevant'
scaling field $u$, and $a_1 = u\,dY/du(0, 0)$ is the corresponding amplitude (also non-universal). In
general $\theta$ need not be integral, but in the 2d Ising case it is: analysis in Ferdinand and Fisher
[10, eqn 3.39] shows that

$$g(L) = g_b + \beta^{-1}L^{-2}U_0 + O(L^{-4}\ln^3 L) \qquad (A.4)$$

So $\theta = 2$. The amplitude $U_0$ is also known exactly for the 2d Ising universality class: it is given
by [17, 10] $U_0 = -\ln(2^{1/4} + 2^{-1/2}) \approx -0.6399$.
It is instructive to compare this with the amplitude of the $L^{-2}$ correction for $\beta > \beta_c$, which
was simply $-\ln 2 \approx -0.69315$, with the 2 coming from the number of coexisting phases (see
equation A.2 and the discussion thereof). However, $2^{1/4} + 2^{-1/2} \approx 1.896 < 2$, so in some sense the
critical point is behaving like the coexistence point of slightly less than two phases. This is in a
way physically reasonable, since at criticality large fluctuations continually take the system back
and forth between configurations which are themselves similar to those in the truly two-phase
region.
Other Boundary Conditions
Finally let us consider the case where we have fixed boundaries. Away from criticality and in
the single-phase region (where there were previously only exponential corrections) we now
find [17, 185, p. 20]

$$g(L) = g_b + L^{-1}g^{(s)} + L^{-2}g^{(e)} + \cdots + L^{-d}g^{(c)} + O(e^{-L/\xi}) \qquad (A.5)$$

where $g^{(s)}$ is the surface free energy due to the presence of $(d-1)$-dimensional interfaces in
the system, $g^{(e)}$ is due to $(d-2)$-dimensional edges, and so on.
At the critical point this type of scaling and that described by equation A.3 are superimposed
[17, p. 21 et seq.]; the non-singular part of the free energy $g_b$ has an expansion similar to that
of equation A.5, and the singular part $g_s$ has one similar to that of equation A.3, though some
modification to this scaling form may be necessary [17, p. 23].
Appendix B
The Double-Tangent
Construction
It was shown in section 1.1.3 (equation 1.16) that, as a consequence of the scaling of the
probability distribution of the order parameter, which becomes increasingly sharply peaked
about its maximum or maxima at $M^*(H)$ or $V^*(p)$ as system size increases, we can use the
minimum of $F_L(\beta; M) - HM$ or $F_L(\beta; V) + pV$ to give an estimator of $g$ with an error that
disappears as $1/L^d$.
This suggests the following method for finding the coexistence value of the field in cases where
the simulation is most easily performed with a constant order parameter. We shall thus describe
the method for the off-lattice system, where this is more usually the case, and where $p_{\rm coex}$ is
not determined by symmetry. Suppose we have some method that can measure the absolute
Helmholtz free energy $F_L(\beta; V)$. This should be used to find $F_L(\beta; V)$ for various $V$ around $V^*$
of one phase; then the entire process should be repeated for some values of $V$ characteristic of
the other phase. From these measurements it is then possible to construct $F_L(\beta; V) + pV$ for
various values of $p$ to find $p_{\rm coex}$. This is most easily done by the double-tangent construction.

We begin from $\ln P^{\rm can}(\beta, p; V) = -\beta(F_L(\beta; V) + pV) + \text{constant}$. Equation 1.16 shows that,
to good accuracy, phase coexistence is found when $\ln P^{\rm can}(\beta, p; V)$ has two maxima of equal
heights, at $V_A$ in phase A, say, and $V_B$ in phase B. This means that

$$F_L(\beta; V_A) + p_{\rm coex}V_A = F_L(\beta; V_B) + p_{\rm coex}V_B$$
i.e.

$$F_L(\beta; V_A) = F_L(\beta; V_B) - p_{\rm coex}(V_A - V_B) \qquad (B.1)$$

which has the form $y = y_0 + m(x - x_0)$.
$V_A$ and $V_B$ themselves are found by solving

$$\frac{\partial \ln P^{\rm can}}{\partial V} = 0,$$

i.e., at coexistence,

$$\left.\frac{\partial F_L}{\partial V}\right|_{V_A} = \left.\frac{\partial F_L}{\partial V}\right|_{V_B} = -p_{\rm coex} \qquad (B.2)$$

Consideration of equations B.1 and B.2 together shows that $p_{\rm coex}$ is given by the negative of
the gradient of the common tangent to the two branches of $F_L(\beta; V)$. The points of tangency
give $V_A$ and $V_B$. The construction is shown schematically in figure B.1.
Figure B.1. A schematic illustration of the double-tangent construction for the off-lattice $pV$
case, to find $p_{\rm coex} = -(\text{gradient of the common tangent})$. The solid lines represent the parts of $F_L(\beta; V)$ that
will typically be measured; the dotted line shows the part that is typically not measured.
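To make the construction concrete: the derivative with respect to $p$ of $\min_V[F_L(\beta; V) + pV]$ on a branch is just the volume at the minimum, so the difference between the two branch minima varies monotonically with $p$, and $p_{\rm coex}$ can be found by bisection on the equal-heights condition. A minimal sketch in Python, assuming the two measured branches are supplied as arrays (V_A, F_A, V_B, F_B are placeholders for the measured data; this illustrates the construction rather than reproducing the code used for this work):

```python
import numpy as np

def p_coex(V_A, F_A, V_B, F_B, p_lo, p_hi, tol=1e-10):
    """Bisect on p for the equal-heights condition: at coexistence,
    min_V [F_A(V) + p V] = min_V [F_B(V) + p V] (cf. equation B.1)."""
    def gap(p):
        return np.min(F_A + p * V_A) - np.min(F_B + p * V_B)
    g_lo = gap(p_lo)
    assert g_lo * gap(p_hi) < 0, "bracket must straddle p_coex"
    while p_hi - p_lo > tol:
        p_mid = 0.5 * (p_lo + p_hi)
        if gap(p_mid) * g_lo > 0:
            p_lo = p_mid      # same sign as the lower end: move it up
        else:
            p_hi = p_mid
    return 0.5 * (p_lo + p_hi)
```

The points of tangency then follow from the same data: at $p = p_{\rm coex}$, V_A[np.argmin(F_A + p * V_A)] estimates $V_A^*$, and similarly for $V_B^*$.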
We note that this method has the advantage that it is not necessary to map out the whole
of $F_L(\beta; V)$; it suffices to have those parts around the eventual minima in $F_L(\beta; V) + pV$, and
though of course we will not be sure a priori of their exact location, we are likely to have quite a
good idea. However, the method has the disadvantage that the absolute free energies of the two
phases are required, since the relative positions of the two branches of $F_L(\beta; V)$ must be known
before the double-tangent construction can be applied; and it uses the equal-heights criterion for
phase coexistence, when it is now thought that the equal-weights criterion usually has smaller
finite-size error [23, 182, 183].
Appendix C
Statistical Errors and Correlation
Times
Because the Monte-Carlo method uses random numbers to generate a sample of the configurations
of the simulated system, the estimates of thermodynamic quantities that it produces are
associated with a statistical error. We shall now describe the behaviour of this error, taking
the example of a Boltzmann-sampling simulation. Though the concepts introduced are general,
some of the results are specific to this case (e.g. equation C.5). The analogues of equation C.5 for
multicanonical sampling are extremely complex; see [116, 119].
Suppose we generate a set $\{\sigma_i\}$, $i = 1 \ldots N_c$, of configurations from a Markov chain whose
stationary probability distribution is $P(\sigma)$. We can define some operator $O$ (internal energy,
magnetisation...) on each one, giving the set $\{O_i\}$. The sample mean $\bar{O}$ is an estimator of the
expectation value $\langle O \rangle = \sum_{\{\sigma\}} O(\sigma)P(\sigma)$, which, for a Boltzmann-sampling algorithm, is the
canonical average of $O$.

Define the quantity

$$(\Delta\bar{O})^2 = (\bar{O} - \langle O \rangle)^2.$$

$(\Delta\bar{O})^2$ is a measure of the statistical error in the estimate $\bar{O}$ of $\langle O \rangle$; it is called the Standard
Error of the Mean (SEM).
Averaging over all possible data sets of the $N_c$ observations, we find

$$\langle(\Delta\bar{O})^2\rangle = \langle\bar{O}^2\rangle - \langle O\rangle^2$$
$$= \frac{1}{N_c^2}\Big\langle\sum_{i=1}^{N_c}\sum_{j=1}^{N_c}O_iO_j\Big\rangle - \langle O\rangle^2$$
$$= \frac{1}{N_c^2}\Big\langle N_c\,\overline{O^2} + 2\sum_{i=1}^{N_c-1}\sum_{j=i+1}^{N_c}O_iO_j\Big\rangle - \langle O\rangle^2$$
$$= \frac{1}{N_c}\big(\langle O^2\rangle - \langle O\rangle^2\big)\left[1 + 2\sum_{i=1}^{N_c-1}\Big(1-\frac{i}{N_c}\Big)\frac{\langle O_1O_{1+i}\rangle - \langle O\rangle^2}{\langle O^2\rangle - \langle O\rangle^2}\right] \qquad (C.1)$$

where we have used $\langle\overline{O^2}\rangle = \langle O^2\rangle$ and assumed that we have no preference for a particular
starting state, so $\langle O_iO_j\rangle$ depends only on $|i-j|$. We shall now simplify the notation by writing
$$\mathrm{var}(O) = \langle O^2\rangle - \langle O\rangle^2$$

and

$$\rho_i = \frac{\langle O_1O_{1+i}\rangle - \langle O\rangle^2}{\langle O^2\rangle - \langle O\rangle^2} \qquad (C.2)$$
To estimate $\mathrm{var}(O)$ in practice we use the sample variance

$$s^2 = \frac{1}{N_c}\sum_{i=1}^{N_c}(O_i - \bar{O})^2.$$

By expanding $\langle\bar{O}^2\rangle$ as in the derivation of equation C.1, it can be shown that $\langle s^2\rangle =
(N_c - 1)\,\mathrm{var}(O)/N_c$.
If adjacent configurations are uncorrelated, all the $\rho_i$ are zero and equation C.1 reduces to

$$\langle(\Delta\bar{O})^2\rangle = \mathrm{var}(O)/N_c \qquad (C.3)$$
(see any statistics textbook, for example [188, chapter 14], for another derivation of this). However,
because the Monte-Carlo method generates a new configuration by making a small change
to the existing one, adjacent configurations in the Markov chain are in practice always highly
correlated; that is to say, the $\rho_i$ remain appreciable until $i$ is quite big. How fast the correlations
decay depends on how well the system can explore its configuration space, which depends in
turn both on the algorithm, which determines the matrix $R$ of allowed transitions, and on the
sampled distribution, which may make certain transitions very unlikely even if $R_{ij} \neq 0$. In
order to quantify the effect of correlations, we define the correlation time $\tau_O$ of the observable
$O$ by

$$\tau_O = \sum_{i=1}^{\infty}\rho_i \qquad (C.4)$$
which will be different for different observables in a single simulation. $\tau_O$ is measured in units
of the time for a single MC update. Now let us assume that $N_c$ is big enough that $N_c/\tau_O \gg 1$ and
that $(1 - i/N_c)$ can be put equal to one for all terms where $\rho_i$ is not negligibly small; if
this is not true then it implies that the total sampling time is only of the order of $\tau_O$, and the
results will in that case be irredeemable by any amount of variance analysis (we are thinking
here of configurations generated by a single Markov process; this statement is not necessarily
true if many independent simulations are run together in parallel, as in section 4.3). Putting
all this into equation C.1 gives [53]

$$\langle(\Delta\bar{O})^2\rangle \approx (1/N_c)\,\mathrm{var}(O)\,(1 + 2\tau_O) \qquad (C.5)$$
It is normally found that the $\rho_i$ decay exponentially, so

$$\rho_i \approx \rho_1\exp(-(i-1)/\tau_O).$$

In that case a slightly better approximation than equation C.5 is [50]

$$\langle(\Delta\bar{O})^2\rangle \approx (1/N_c)\,\mathrm{var}(O)\left(1 + \frac{2\lambda}{1-\lambda}\right)$$

where $\lambda = \exp(-1/\tau_O)$. Other improved approximations, of an accuracy not normally required,
are discussed in [189].
In any case, it is clear that if the correlation time $\tau_O$ is large it will dominate the error, and
in fact it may be a waste of effort to record $O(\sigma)$ for every configuration of the Markov chain.
If we sample at regular intervals of $k$ updates, then

$$\langle(\Delta\bar{O})^2\rangle = (k/N_c)\,\mathrm{var}(O)\left(1 + \frac{2\lambda^k}{1-\lambda^k}\right) \qquad (C.6)$$

which stays within a few percent of its minimum value until $k \approx \tau_O$, and then increases
linearly with $k$ with gradient 1. This tells us that there is no advantage in collecting samples at
intervals more frequent than $\tau_O$. However, in practice, sampling more frequently does little harm,
since recording $O(\sigma)$ and doing the analysis usually require negligible time compared with the
generation of the configurations themselves.
There are two valid approaches to the estimation of $\langle(\Delta\bar{O})^2\rangle$ in practice. One is to measure
the correlation functions $\rho_i$ and then to estimate $\tau_O$ by summing them. It is found to be
essential to cut off the summation at some point, for example when for the first time a negative
term in the series is encountered, since the estimates of $\rho_i$ at large $i$ are very `noisy' and may
seriously distort the answer [53, 50], [57, section 6.3]. We can then use equations C.5 or C.6
directly. Alternatively, we can simply try to measure the standard error of $\bar{O}$ directly. To do
this we block the configurations into $m = 1 \ldots N_b$ blocks ($N_b = O(10)$ is enough) and estimate $\bar{O}$
on each block [45, chapter 6]. Then we measure the mean and variance of the blocked means
$\{\bar{O}_m\}$, and use the simple formula

$$\langle(\Delta\bar{O})^2\rangle = \mathrm{var}(\bar{O}_m)/N_b$$

since the blocks should be long enough for the block means to be uncorrelated (if they are not,
then $N_c$ is not large enough for good results anyway). A variant of this is to define estimators
$\bar{O}^J_m$ on all the blocks of configurations except the $m$th, and then to find the mean and variance
of these (see appendix D). It is the blocking approach that we have generally used to measure
$(\Delta\bar{O})^2$ in this thesis; however, we shall still consider $\tau_O$ on occasion, particularly in section 3.4,
where we shall use the fact that it can be expressed in terms of correlation functions to make
an analytic prediction of the variance of estimators from various sampled distributions.
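As an illustration of these two approaches, the following Python sketch implements both the correlation-function sum, with the cutoff at the first negative estimate described above, and the blocking formula (O is a hypothetical time series of measurements; this is not the analysis code used for the work reported here):

```python
import numpy as np

def sem_correlations(O):
    """SEM from the integrated correlation time, equations C.4 and C.5:
    sum the normalised correlation functions rho_i, cutting the sum off
    at the first negative estimate (they are noisy at large lag)."""
    O = np.asarray(O, dtype=float)
    N = len(O)
    dO = O - O.mean()
    var = dO @ dO / N
    tau = 0.0
    for lag in range(1, N // 2):
        rho = dO[:-lag] @ dO[lag:] / ((N - lag) * var)
        if rho < 0:
            break                  # cut-off: estimates now pure noise
        tau += rho
    return np.sqrt(var / N * (1.0 + 2.0 * tau))

def sem_blocking(O, n_blocks=10):
    """SEM from the variance of the block means, as in the text."""
    O = np.asarray(O, dtype=float)
    means = O[: len(O) // n_blocks * n_blocks].reshape(n_blocks, -1).mean(axis=1)
    return np.sqrt(means.var(ddof=1) / n_blocks)
```

For well-sampled data the two estimates should agree; a large discrepancy is itself a sign that $N_c$ is not much greater than $\tau_O$.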
We should note that, whatever algorithm we are working with, we must expect to have to update
all the particles (or at least a constant fraction of them) to get uncorrelated configurations.
This implies that the best we can do¹ is to have $\tau_O \sim L^d$, with $\tau_O$ measured in units of the
time for a single update.
If we are interested in calculating observables like $\langle E \rangle$ within a single phase, $\tau_O$ normally
has behaviour not dissimilar to this ideal, and accurate answers can be obtained without too
much effort by simple algorithms like single-spin-flip Metropolis, sampling the Boltzmann
distribution. However, if we are trying to measure the free energies associated with phase
transitions then $\tau_O$ can be very large indeed.
¹ Strictly it is the amount of computing power that goes like $L^d$. If we have a parallel computer then we may
apparently do better, since for small $L$ some processors may be unoccupied, and we can bring them in as we
increase $L$, thus apparently keeping $\tau_O$ constant as $L$ increases. Once all the processors are occupied the given
scaling law applies.
Appendix D
Jackknife Estimators
The estimators that we produce in multicanonical simulations, like $\tilde{O}$ in equations 3.6 and 3.7,
are ratio estimators, that is to say, they are ratios of sums, and are in fact slightly biased:
$\langle\tilde{O}\rangle_{\rm xc} \neq \langle O\rangle_{\rm can}$. It can be shown (see [190, p. 80]) that

$$\langle\tilde{O}\rangle_{\rm xc} - \langle O\rangle_{\rm can} = -\,\frac{\mathrm{cov}\big(\tilde{O},\ \sum_E C(E)\exp(-\beta E)[1/P^{\rm xc}(E)]\big)}{\big\langle\sum_E C(E)\exp(-\beta E)[1/P^{\rm xc}(E)]\big\rangle_{\rm xc}}$$

This bias will not be zero unless $\tilde{O}$ and $\sum_E C(E)\exp(-\beta E)[1/P^{\rm xc}(E)]$ are uncorrelated;
typically it is of order $1/N_c$. The same is true of other biased estimators; for instance, the
estimator of free energy we shall use below is the logarithm of a ratio estimator.
It should be noted, however, that we expect the standard deviation of $\tilde{O}$ to go like $1/\sqrt{N_c}$,
and we can usually safely regard the bias as negligible in comparison with this. Nevertheless,
to be sure that the bias is negligible we have generally used double-jackknife
bias-corrected estimators [191] for our estimates of canonical averages and their error bars.
A jackknife estimator is defined in the same way as a normal estimator but on a subset of
the data. We divide up the Markov chain into $b$ sets of $N_b$ configurations, so that we have
$b$ histograms, $C^j(E)$, $j = 1 \ldots b$, with $bN_b = N_c$. Then the $j$th jackknife estimator $\tilde{O}^J_j$ is
defined like $\tilde{O}$ but on the pooled data from all the $b$ histograms except the $j$th.

We can define the mean of these, $\tilde{O}^J_{\rm AV} = (1/b)\sum_{j=1}^{b}\tilde{O}^J_j$, while the standard error of the mean (the
error bar) is given by

$$s_J^2 = \frac{b-1}{b}\sum_{j=1}^{b}\big(\tilde{O}^J_j - \tilde{O}^J_{\rm AV}\big)^2.$$

(Simple substitution of $O(E) = E$ shows
that this reduces to equation 1.24, the normal expression for the standard error in the mean
in the unbiased case.) These single-jackknife estimators provide an estimate of variance that
is somewhat more robust (less affected by a small sample size) than the usual blocking. They
can also be used to produce an estimator which has a reduced bias. We assume that the bias of
$\tilde{O}$ is $c_1/bN_b + c_2/(bN_b)^2 + \cdots$; then the bias of each of the $\tilde{O}^J_j$ is $c_1/(b-1)N_b + c_2/((b-1)N_b)^2 + \cdots$.
As can be seen by substitution, the estimator $\tilde{O}^{JC} = b\tilde{O} - (b-1)\tilde{O}^J_{\rm AV}$ is then unbiased
to order $1/N_c$. However, we no longer have an estimate of the standard error of this new
estimator. To obtain both we can extend the approach, defining double-jackknife estimators:
$\tilde{O}^{JJ}_{jk}$ is defined on the data with both the $j$th and $k$th blocks of configurations omitted ($j \neq k$).
Then

$$\tilde{O}^{JJC}_j = (b-1)\tilde{O}^J_j - \frac{b-2}{b-1}\sum_{k\neq j}\tilde{O}^{JJ}_{jk}$$

are a set of double-jackknife bias-corrected
estimators, and we can calculate their mean and variance as for the $\tilde{O}^J_j$ above. For a
fuller explanation of the use of jackknife estimators see [190], and for an account of their use in
multicanonical simulations see [191].
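A minimal sketch of this machinery in Python, for a generic (e.g. ratio) estimator passed in as a function; the double-jackknife estimators $\tilde{O}^{JJ}_{jk}$ follow the same pattern with pairs of blocks omitted (names here are illustrative, not the code used for this work):

```python
import numpy as np

def jackknife(blocks, estimator):
    """Single-jackknife estimates: apply `estimator` to the pooled data
    with one block left out each time; return the estimates, their mean,
    and the error bar s_J defined in the text."""
    b = len(blocks)
    OJ = np.array([estimator(np.concatenate(blocks[:j] + blocks[j + 1:]))
                   for j in range(b)])
    O_av = OJ.mean()
    s_J = np.sqrt((b - 1) / b * np.sum((OJ - O_av) ** 2))
    return OJ, O_av, s_J

def bias_corrected(blocks, estimator):
    """First-order bias-corrected estimator O^JC = b*O - (b-1)*O^J_AV.
    (Its own error bar strictly requires the double jackknife.)"""
    b = len(blocks)
    O_full = estimator(np.concatenate(blocks))
    _, O_av, s_J = jackknife(blocks, estimator)
    return b * O_full - (b - 1) * O_av, s_J
```

Here blocks is a list of $b$ arrays of raw data (or histograms), one per block of the Markov chain.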
Appendix E
Details of the Square-Well Solid
Simulation
We wish to carry out a simulation of a 3d fcc square-well solid at constant volume. The
cubic unit cell of the fcc lattice contains four particles, arranged in a tetrahedron at
$(0,0,0)$, $(0,1/2,1/2)$, $(1/2,0,1/2)$ and $(1/2,1/2,0)$, where the vectors are in units where the
side of the cubic unit cell has unit length. For convenience we wish to simulate in a cubic
volume and to apply periodic boundary conditions to remove the effects of surfaces, edges and
corners (see appendix A). Suitably-sized assemblies of particles then consist of $n^3$ unit cells arranged
in a cube, with $n = 1, 2, 3, \ldots$; the first few such systems thus contain $4, 32, 108, 256, 500, \ldots$
particles. To make particle moves we shall use the usual Monte-Carlo procedure of generating
trial random displacements of randomly-chosen particles and accepting or rejecting them using
the Metropolis method.
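For illustration, the lattice construction is a few lines in any language; a minimal Python sketch (the production code, described below, was written for the CM):

```python
import numpy as np

def fcc_lattice(n):
    """Sites of n^3 fcc unit cells (4 particles each) arranged in a cube,
    in units of the cubic cell side: 4*n^3 particles in total."""
    basis = np.array([[0, 0, 0], [0, .5, .5], [.5, 0, .5], [.5, .5, 0]])
    cells = np.array([[i, j, k] for i in range(n)
                                for j in range(n)
                                for k in range(n)])
    return (cells[:, None, :] + basis[None, :, :]).reshape(-1, 3)

# the system sizes quoted above, n = 1..5
assert [len(fcc_lattice(n)) for n in range(1, 6)] == [4, 32, 108, 256, 500]
```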
As regards the choice of system size, there is clearly a trade-off between ease of simulation
and the accuracy in principle achievable in a simulation with unlimited run-time. Because
phase transitions are rounded off and shifted in a finite volume [17], it is ideally desirable to
simulate large systems, to get closer to the infinite-volume limit; however, larger systems clearly
demand more computer time, and the time required for equilibration and to sample all accessible
parts of the phase space quickly becomes excessive. This is particularly true of simulations like
these, where we are interested in free energies; we have seen (section 3.3.2) that thermodynamic
integration requires increasingly many simulation points around a phase transition, because of
rapid variation of the integrand, while in a multicanonical/expanded ensemble simulation the
range of macrostates that must be covered to take in both phases is itself extensive. It turns out
in fact that the multicanonical ensemble with a hard-core potential is particularly demanding:
see section 4.3.2. Therefore we have in practice chosen to work with quite small systems of 32-256
particles, and (at least in section 4.3) to use finite-size scaling to extrapolate the results
[17, 23, 183, 182].
This choice then presents some problems in relation to our chosen computer, the Connection
Machine CM-200. The CM consists of 16k processors grouped into 512 processing nodes. Arrays
are spread across this machine and corresponding elements are operated on in parallel. Thus the
obvious way of mapping the square-well solid onto the CM would be geometric decomposition:
break up the simulation volume and assign a region to each processor. Non-interacting particles
may be updated in parallel¹, which in this case, where the interparticle potential is (extremely)
short-ranged, would require information only from within each processor and, in some cases,
from nearest-neighbour processors. However, if the array is too small, then some processors
are assigned no data and are deactivated; clearly this is an inefficient use of the machine. The
minimum size of array that uses all the processors depends on the geometry, and in our case
turns out to have $16^3 = 4$k elements. Therefore, if we simulated a single system with geometric
decomposition, it would have to contain at least 4k particles, which is much too large to be
dealt with easily. We might, then, think instead of using just primitive parallelism, where we
would simulate 4k independent replicas of a single smaller simulation, with each simulation
being completely local to a processing node and all updates within each simulation being serial.
This strategy, which has the additional advantage of eliminating interprocessor communication
in particle moves, is in fact the way that the simulations of 32 particles have been carried
out. However, the machine does not have enough memory to treat systems of more than 108
particles this way. To deal with them, we need a mixture of primitive parallelism and geometric
decomposition. The way we have implemented this in practice is shown in figure E.1. The large
cube shows the way that the parallel dimensions of the array that holds the particles' coordinates
are laid out; it can be thought of as showing the layout of parallel `virtual processors' in a 3d
grid. The small shaded cubes show the array elements that belong to a single simulation;
¹ We emphasise that we are here considering only updating the positions of the particles, and in no simulation
performed here does this result in a change of the variable that is preweighted. As we saw in section 3.2.5,
MC updates in which the preweighted variable may change introduce an effective coupling between the particles
and prevent parallel updating.
the particles within them exist in a single physically continuous volume, even though they are
separated in the coordinate array.
Figure E.1. The layout of a single simulation volume within a $16 \times 16 \times 16$ parallel array holding
the particles' positions. Left: each simulation divided into eight subvolumes, 512 simulations
run in parallel. Right: each simulation divided into 64 subvolumes, 64 simulations run in
parallel.
In the diagram on the left each simulation is divided into $2^3 = 8$ `subvolumes,' and distributed
over eight `virtual processors.' The total number of simulations run in parallel is $(16/2)^3 = 512$.
With eight unit cells (32 particles) per subvolume, we would then have 256 particles per
simulation in total. Similarly, in the diagram on the right each simulation is divided into 64 subvolumes and
64 of them are run in parallel. With 32 particles per subvolume this would imply 2048 particles
in total, which is excessive for our purposes, but nevertheless illustrates how the system may be
scaled. The numbers of unit cells per subvolume and subvolumes per simulation (which may of course
be equal to one) and the total size of the coordinate array are controlled by parameters at
compile time, and to change them does not require rewriting of code. The relevant parameters
are
are
NPR3, the edge length of the array that holds the coordinates (so the diagrams in gure E.1
both have NPR3=16).
NEDGE, the number of subvolumes along each edge of each simulation (so the diagrams in
gure E.1 have NEDGE=2 (left) and NEDGE=4).
LPV, the number of unit cells along the edge of a subvolume.
Related quantities are
N , the number of particles/simulation, given by N =4(NEDGE*LPV)3.
APPENDIX E. DETAILS OF THE SQUARE-WELL SOLID SIMULATION
NSIM
280
Ns , the number of independent replica simulations run in parallel, given by NSIM =
(NPR3/NEDGE)3.
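The bookkeeping implied by these definitions is worth making explicit; a minimal sketch that reproduces the two decompositions of figure E.1:

```python
def geometry(NPR3, NEDGE, LPV):
    """Derived quantities for the decomposition described above."""
    N = 4 * (NEDGE * LPV) ** 3       # particles per simulation
    NSIM = (NPR3 // NEDGE) ** 3      # independent replicas in parallel
    return N, NSIM

assert geometry(16, 2, 2) == (256, 512)   # figure E.1, left
assert geometry(16, 4, 2) == (2048, 64)   # figure E.1, right
```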
The reason that the subvolumes of each simulation are split up across the array is that by
doing this each subvolume can access the coordinates of particles in neighbouring subvolumes
in a single periodic shift operation in which all data move the same distance. For example,
single shifts of NPR3/NEDGE are used to access the coordinates of particles in subvolumes that
share faces; repeated shifts at right angles are required to get at neighbours that share edges or
corners. Particles in subvolumes that are not nearest neighbours are too far apart to interact
before or after a particle move, so we do not need to check them. If the subvolumes were
grouped together, slower messages would have to be sent using the general communications
router, because the periodic boundary conditions of each simulation would not match those of
the array as a whole.
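In a serial setting the effect of such a shift can be mimicked by a periodic roll of the whole coordinate array; a sketch (the array shape is illustrative only, not the actual CM data layout):

```python
import numpy as np

NPR3, NEDGE = 16, 2
stride = NPR3 // NEDGE

# coords[x, y, z, p, :] holds the position of particle p in the virtual
# processor at grid site (x, y, z); one periodic shift of `stride` along
# an axis fetches, for every subvolume at once, the data held by its
# face neighbour within the same simulation.
coords = np.zeros((NPR3, NPR3, NPR3, 32, 3))
face = np.roll(coords, -stride, axis=0)
edge = np.roll(face, -stride, axis=1)    # repeated shifts at right angles
```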
Having described how each simulation is (or can be) split up into subvolumes, it remains to
describe how the particles are treated within each subvolume. All particles within a subvolume
are local to a processing node and are indexed using extra dimensions, declared :serial, in
the coordinate array. The normal way to treat this problem is to keep a neighbour list [45],
which, for each particle, records all the other particles that it may interact with. However,
to reference and update the neighbour lists, which are in general different for each particle,
requires indirect addressing (indices on indices), and this generates slow communication code
on the CM even when the particles are within the same processing node. For this reason we
did not use neighbour lists.
In fact, because we are dealing only with solids, each particle always stays near its lattice site
and so would have had the same, unchanging neighbour list of its twelve nearest neighbours.
Given this, indirect addressing could have been avoided; however, we opted at the design stage for
a method that could be applied to fluids with little modification, since it was then our intention
to investigate them as well. This led us to the following method: each simulation subvolume
is further subdivided into eight octants, and these are cycled through in a fixed order, with a
particle in each being picked at random for a trial displacement (the displacement is chosen at
random within a small sphere of radius $\Delta x$; $\Delta x$ itself is chosen to give an acceptance ratio of
particle moves of about 1/2). Provided that the subvolume is big enough, this ensures that particles
in corresponding octants of different subvolumes of the same simulation cannot interact before
or after they are moved, and so can be updated in parallel; we require, for the general case
where the particles need not be on their lattice sites, that the side length $L$ of the subvolume
should satisfy

$$L/2 \geq (1 + \epsilon) + 2\Delta x$$

(lengths in units of the hard-core diameter, $\epsilon$ being the width of the square well).
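A sketch of the two ingredients of the move generation just described: the uniform trial displacement within a sphere of radius $\Delta x$, and a feedback rule for steering $\Delta x$ towards the target acceptance ratio (the specific tuning rule is our illustration; the text above fixes only the target of 1/2):

```python
import numpy as np

rng = np.random.default_rng()

def random_displacement(dx):
    """Uniform random vector inside a sphere of radius dx (rejection)."""
    while True:
        v = rng.uniform(-dx, dx, size=3)
        if v @ v <= dx * dx:
            return v

def tune_dx(dx, acceptance, target=0.5, gain=0.1):
    """Nudge the trial-move radius towards the target acceptance ratio,
    measured over the preceding batch of moves."""
    return dx * np.exp(gain * (acceptance - target))
```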
For the square-well solid, each particle move requires the calculation of a minimum of 24 interactions
(12 with the nearest neighbours before the move and 12 with them after it). However,
we do not in general know which the nearest neighbours are, because there are no neighbour
lists, and so we must test all possible candidates. This can cause substantial inefficiency; for
example, with NEDGE=1 and LPV=2 ($N = 32$ and primitive parallelism only) we must test all the
other particles, that is to say, we must calculate $2 \times 31$ interactions for each particle move. For
NEDGE=1 and LPV=3 ($N = 108$) and NEDGE=2 and LPV=2 ($N = 256$) this rises to $2 \times 107$, and in
the second case $2 \times 76$ of the interactions require interprocessor communication.
The best performance for the solid is in fact obtained by using LPV=1, with NPR3 increased
to keep NSIM constant. Each `virtual processor' then contains only four particles (so half
the octants can be skipped over), and the process of checking within the processor volume
and within neighbouring octants finds just the twelve nearest neighbours, as required, and no
others. Choosing LPV=1 at solid densities violates the general equation for $L$ given above but,
since the forces are short-ranged and the lattice prevents large particle movements, it is still the
case that interacting particles are never updated simultaneously. Because we no longer waste
time calculating interactions that are always zero, this procedure is slightly faster (i.e. does a
slightly greater total number of particle updates per second) than pure primitive parallelism for
$N = 32$, even though interprocessor communication is now involved. For $N = 108$ it is about
five times faster than primitive parallelism alone.
Bibliography
[1] C. Truesdell, The Tragicomical History of Thermodynamics 1822-1854, Springer-Verlag,
Berlin (1980).
[2] A. B. Pippard, Elements of Classical Thermodynamics, Cambridge University Press, Cambridge (1957).
[3] H. B. Callen, Thermodynamics and an Introduction to Thermostatics, John Wiley & Sons,
New York (1985).
[4] K. Huang, Statistical Mechanics, John Wiley & Sons, New York (1963).
[5] R. P. Feynman, Statistical Mechanics: A Set of Lectures, W. A. Benjamin Inc., Reading,
MA (1972).
[6] D. Chandler, An Introduction to Modern Statistical Mechanics, Oxford University Press,
Oxford (1987).
[7] J. R. Waldram, The Theory of Thermodynamics, Cambridge University Press, Cambridge
(1985).
[8] S.-K. Ma, Statistical Mechanics, World Scientific, Singapore (1985).
[9] Phase Transitions and Critical Phenomena, ed. C. Domb & M. S. Green, Academic Press,
London (1975).
[10] A. E. Ferdinand & M. E. Fisher, Phys. Rev. 185, 833 (1969); B. Kaufman, Phys. Rev. 76,
1232 (1949).
[11] L. Onsager, Phys. Rev. 65, 117 (1944).
[12] B. M. McCoy & T. T. Wu, The Two-Dimensional Ising Model, Harvard University Press,
Cambridge, MA (1973).
[13] R. Baierlein, Atoms and Information Theory, W. H. Freeman & Co. (1971).
[14] E. T. Jaynes, in Maximum Entropy and Bayesian Methods, ed. P. F. Fougere, Kluwer
Academic Publishers, Dordrecht (1992).
[15] J. L. Lebowitz & E. H. Lieb, Phys. Rev. Lett. 22, 631 (1969)
[16] M. Plischke & B. Bergersen, Equilibrium Statistical Physics, Prentice Hall, New Jersey
(1989).
[17] Finite Size Scaling and Numerical Simulation of Statistical Systems, ed. V. Privman, World
Scientific Publishing, Singapore (1990).
[18] D. P. Woodruff, The Solid-Liquid Interface, Cambridge University Press, London (1973).
[19] Jooyoung Lee, M. A. Novotny & P. A. Rikvold, Phys. Rev. E 52, 356 (1995).
[20] K. Binder & D. P. Landau, Phys. Rev. B 30, 1477 (1984).
[21] Murty S. S. Challa, D. P. Landau & K. Binder, Phys. Rev. B 34, 1841 (1986).
[22] J. Lee & J. M. Kosterlitz, Phys. Rev. Lett. 65, 137 (1990)
[23] C. Borgs & R. Kotecky, Phys. Rev. Lett. 68, 1734 (1992).
[24] J. S. van Duijneveldt & D. Frenkel, J. Chem. Phys. 96, 4655 (1992).
[25] E. Buffenoir & S. Wallon, J. Phys. A 26, 3045 (1993).
[26] P. Martin, Potts Models and Related Problems in Statistical Mechanics, World Scientific
Publishing Co., Singapore (1991).
[27] R. J. Baxter, Exactly Solved Models in Statistical Mechanics, Academic Press, London
(1982).
[28] J. A. Barker & D. Henderson, Rev. Mod. Phys 48, 587 (1976).
[29] A. J. Gutmann & I. G. Enting, J. Phys. A 21, L165 (1988).
[30] M. E. Fisher, Rev. Mod. Phys 46, 597 (1974).
[31] D. J. Amit, Field Theory, the Renormalisation Group and Critical Phenomena, McGraw-Hill,
New York (1978).
[32] G. S. Pawley, R. H. Swendsen, D. J. Wallace & K. G. Wilson, Phys. Rev. B 29, 4030
(1984).
[33] J. P. Hansen & I. R. McDonald, Theory of Simple Liquids (2nd edition), Academic Press,
London (1986).
[34] J. K. Percus & G. J. Yevick, Phys. Rev. 110, 1 (1958).
[35] J. E. Mayer & M. G. Mayer, Statistical Mechanics, McGraw-Hill, New York (1940).
[36] M. Parrinello & A. Rahman, Phys. Rev. Lett. 45, 1196 (1980).
[37] J. J. Erpenbeck & W. W. Wood, in Statistical Mechanics Vol. 6b, ed. B. J. Berne, Plenum
Press, New York (1977).
[38] J. Kushick & B. J. Berne, in Statistical Mechanics Vol. 6b, ed. B. J. Berne, Plenum Press,
New York (1977).
[39] S. Duane, A. D. Kennedy, B. J. Pendleton & D. Roweth, Phys. Lett. B 195, 216 (1987).
[40] B. Mehlig, D. W. Heermann & B. M. Forrest, Phys. Rev. B 45, 679 (1992).
[41] S. Nosé, J. Phys. Cond. Mat. 2, SA115 (1990).
[42] The Monte Carlo Method in Condensed Matter Physics, ed. K. Binder, Springer-Verlag,
Berlin (1992).
[43] K. Binder & D. W. Heermann, Monte Carlo Simulation in Statistical Physics: An Introduction, Springer-Verlag, Berlin (1986).
[44] O. G. Mouritsen, Computer Studies of Phase Transitions and Critical Phenomena,
Springer-Verlag, Berlin (1985).
[45] M. P. Allen & D. J. Tildesley, Computer Simulation of Liquids, Clarendon Press, Oxford
(1987).
[46] K. Binder, J. Comp. Phys. 59, 1 (1985).
[47] D. Frenkel, Free Energy Computation and First-Order Phase Transitions, in Molecular-Dynamics Simulation of Statistical-Mechanical Systems, ed. G. Ciccotti & W. G. Hoover,
North-Holland, Amsterdam (1986).
[48] D. Frenkel, Monte Carlo Simulations, in Computer Modelling of Fluids, Polymers and
Solids, ed. C. R. A. Catlow, C. S. Parker & M. P. Allen, Kluwer Academic Publishers,
Dordrecht (1990).
[49] W. H. Press, B. P. Flannery, S. A. Teukolsky & W. T. Vetterling, Numerical Recipes,
Cambridge University Press, Cambridge (1989).
[50] C. J. Geyer, in Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, 156 (1991).
[51] C. J. Geyer & E. A. Thompson, J. R. Statist. Soc. B 54, 657 (1992).
[52] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller & E. Teller, J. Chem.
Phys. 21, 1087 (1953).
[53] H. Müller-Krumbhaar & K. Binder, J. Stat. Phys. 8, 1 (1973).
[54] K. S. Shing & K. E. Gubbins, Mol. Phys. 49, 129 (1982).
[55] R. H. Swendsen & J.-S. Wang, Phys. Rev. Lett. 58, 86 (1987).
[56] U. Wolff, Phys. Rev. Lett. 62, 361 (1989).
[57] R. M. Neal, Probabilistic Inference Using Markov Chain Monte-Carlo Methods, Technical
Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto (1993).
[58] B. A. Berg & T. Neuhaus, Phys. Lett. B 267, 249 (1991); Phys. Rev. Lett. 68, 9 (1992).
[59] A. D. Kennedy, review article, Nucl. Phys. B S30, 96 (1993).
[60] A. P. Lyubartsev, A. A. Martsinovski, S. V. Shevkunov & P. N. Vorontsov-Velyaminov,
J. Chem. Phys. 96, 1776 (1992).
[61] M. Abramowitz & I. A. Stegun, Handbook of Mathematical Functions, Dover, New York,
N. Y. (1970).
[62] N. F. Carnahan & K. E. Starling, J. Chem. Phys. 51, 635 (1969).
[63] B. L. Holian, G. K. Straub, R. E. Swanson & D. C. Wallace, Phys. Rev. B 27, 2783 (1983).
[64] G. N. Patey & J. P. Valleau, Chem. Phys. Lett. 21, 297 (1973).
[65] G. N. Patey, G. M. Torrie & J. P. Valleau, J. Chem. Phys. 71, 96 (1979).
[66] W. G. Hoover, M. Ross, D. Henderson, J. A. Barker & B. C. Brown, J. Chem. Phys. 52,
4931 (1970).
[67] W. G. Hoover, S. G. Gray, & K. W. Johnson, J. Chem. Phys. 55, 1128 (1971).
[68] R. E. Swanson, G. K. Straub, B. L. Holian & D. C. Wallace, Phys. Rev. B 27, 2783 (1983).
[69] P. Bolhuis & D. Frenkel, Phys. Rev. Lett. 72, 2211 (1994).
[70] M. Hagen, E. J. Meijer, G. C. A. M. Mooij, D. Frenkel & H. N. W. Lekkerkerker, Nature
365, 425 (1993).
[71] D. Frenkel & A. J. C. Ladd, J. Chem. Phys. 81, 3188 (1984).
[72] W. G. Hoover & F. H. Ree, J. Chem. Phys. 47, 4873 (1967).
[73] W. G. Hoover & F. H. Ree, J. Chem. Phys. 49, 3609 (1968).
[74] J. P. Hansen & L. Verlet, Phys. Rev. 184, 151 (1969).
[75] K. K. Mon, Phys. Rev. B 39, 467 (1989).
[76] T. P. Straatsma, H. J. C. Berendsen & J. P. M. Postma, J. Chem. Phys. 85, 6720 (1986).
[77] D. A. Kofke, Mol. Phys. 78, 1331 (1993).
[78] I. R. McDonald & K. Singer, J. Chem. Phys. 47, 4766 (1967).
[79] I. R. McDonald & K. Singer, J. Chem. Phys. 50, 2303 (1969).
[80] J. P. Valleau & D. N. Card, J. Chem. Phys. 57, 5457 (1972).
[81] Z. Li & H. A. Scheraga, J. Phys. Chem. 92, 2633 (1988).
[82] Z. Li & H. A. Scheraga, Chem. Phys. Lett. 154, 516 (1989).
[83] B. S. Whatson & K.-W. Chao, J. Chem. Phys. 96, 9046 (1992).
[84] C. H. Bennett, J. Comp. Phys. 22, 245 (1976).
[85] S. D. Hong, B. J. Yoon & M. S. Jhon, Chem. Phys. Lett. 188, 299 (1992).
[86] K. K. Mon, Phys. Rev. Lett. 54, 2671 (1985).
[87] A. M. Ferrenberg & R. H. Swendsen, Phys. Rev. Lett. 61, 2635 (1988); Erratum: ibid 63,
1658 (1989).
[88] B. Widom, J. Chem. Phys. 39, 2808 (1963).
[89] K. S. Shing & K. E. Gubbins, Mol. Phys. 46, 1109 (1982); K. S. Shing & K. E. Gubbins,
Mol. Phys. 49, 1121 (1983)
[90] S. K. Kumar J. Chem. Phys. 97, 3551 (1992).
[91] A. M. Ferrenberg & R. H. Swendsen, Phys. Rev. Lett. 63, 1195 (1989).
[92] A. M. Ferrenberg, in Computer Simulation Studies in Condensed Matter Physics III, ed.
D. P. Landau, K. K. Mon & H.-B. Schüttler, Springer-Verlag, Berlin, Heidelberg (1991).
[93] J. M. Rickman & S. R. Philpot, Phys. Rev. Lett. 66, 349 (1991).
[94] E. P. Munger & M. A. Novotny, Phys. Rev. B 43, 5773 (1991).
[95] G. M. Torrie & J. P. Valleau, Chem. Phys. Lett. 28, 578 (1974).
[96] L. D. Fosdick, Methods Comput. Phys. 1, 245 (1963).
[97] B. Hesselbo & R. B. Stinchcombe, Phys. Rev. Lett. 74, 2151 (1995).
[98] M. Mezei, J. Comp. Phys. 68, 237 (1987).
[99] B. A. Berg, Int. J. Mod. Phys. C 4, 249 (1993).
[100] W. Janke, in Computer Simulations in Condensed Matter Physics VII, ed. D. P. Landau,
K. K. Mon & H.-B. Schüttler, Springer-Verlag, Berlin (1994).
[101] B. A. Berg, U. H. E. Hansmann & T. Neuhaus, Phys. Rev B 47, 497 (brief reports)
(1993).
[102] B. A. Berg, U. H. E. Hansmann & T. Neuhaus, Z. Phys. B 90, 229 (1993).
[103] A. Billoire, T. Neuhaus & B. A. Berg, Nucl. Phys. B 413, 795 (1994).
[104] W. Janke, B. A. Berg & M. Katoot, Nucl. Phys. B 382, 649 (1992).
[105] W. Beirl, B. A. Berg, B. Krishnan, H. Markum & J. Reidler, Nucl. Phys. B S42, 707
(1995); B. A. Berg & B. Krishnan, Phys. Lett. B 318, 59 (1993).
[106] B. Grossman, M. L. Laursen, T. Trappenberg & U. J. Wiese, Phys. Lett. B 293, 175
(1992).
[107] B. Grossman & M. L. Laursen, Nucl. Phys. B 408, 637 (1993).
[108] B. A. Berg & T. Celik, Phys. Rev. Lett. 69, 2292 (1992); Int. J. Mod. Phys. C 3, 1251
(1992).
[109] B. A. Berg, T. Celik & U. H. E. Hansmann, Europhys. Lett. 22, 63 (1993).
[110] B. A. Berg, U. H. E. Hansmann & T. Celik, Nucl. Phys. B S42, 905 (1995).
[111] T. Celik, U. H. E. Hansmann & M. Katoot, J. Stat. Phys. 73, 775 (1993).
[112] B. A. Berg, Nature 361, 708 (1993).
[113] U. H. E. Hansmann & Y. Okamoto, J. Comp. Chem. 14, 1333 (1993).
[114] B. A. Berg, U. H. E. Hansmann & Y. Okamoto, J. Phys. Chem. 99, 2236 (1995);
U. H. E. Hansmann & Y. Okamoto, preprint ETH-IPS-95-06 NWU-1/95, March 1995.
[115] K. Rummukainen, Nucl. Phys. B 390, 621 (1993).
[116] W. Janke & T. Sauer, Phys. Rev. E 49 3475 (1994); Nucl. Phys. B S34, 771 (1994).
[117] W. Janke & T. Sauer, J. Stat. Phys. 78, 759 (1995).
[118] W. Janke & S. Kappler, Nucl. Phys. B S42, 876 (1995); Phys. Rev. Lett. 74 212 (1995).
[119] A. M. Ferrenberg, D. P. Landau & R. H. Swendsen, Phys. Rev. E, 51, 5092 (1995).
[120] Jooyoung Lee, Phys. Rev. Lett. 71, 211 (1993); Erratum ibid 71, 2352 (1993).
[121] N. B. Wilding, Phys. Rev. E 52, 602 (1995).
[122] B. A. Berg, preprint, available as paper 9503019 from the archive at
http://xxx.lanl.gov/hep-lat.
[123] A. P. Lyubartsev, A. Laaksonen & P. N. Vorontsov-Velyaminov, Mol. Phys. 82, 455
(1994).
[124] E. Marinari & G. Parisi, Europhysics Lett. 19, 451 (1992).
[125] L. A. Fernandez, E. Marinari & J. J. Ruiz-Lorenzo, unpublished.
[126] G. Iori, E. Marinari & G. Parisi, Europhysics Lett. 25, 491 (1994), Int. J. Mod. Phys. C
4, 1333 (1993)
[127] A. Irbäck & F. Potthast, preprint LU TP 95-10.
[128] D. Bouzida, S. K. Kumar & R. H. Swendsen, Phys. Rev. A 45, 8894 (1992).
[129] E. Marinari, G. Parisi, J. Ruiz-Lorenzo & F. Ritort, preprint, available as paper 9508036
from the archive at http://babbage.sissa.it/cond-mat, submitted to Phys. Rev. Lett.
[130] N. B. Wilding, M. Müller & K. Binder, J. Chem. Phys. 101, 4324 (1994).
[131] G. C. A. M. Mooij, D. Frenkel & B. Smit, J. Phys.: Cond. Mat. 4, L255 (1992).
[132] W. Kerler & P. Rehberg, Phys. Rev. E 50, 4220 (1994).
[133] W. Kerler, C. Rebbi & A. Weber, Nucl. Phys. B S42, 678 (1995); same
authors, preprint BUHEP-95-10, available as paper 9503021 from the archive at
http://xxx.lanl.gov/hep-lat.
[134] J. P. Valleau, J. Comp. Phys. 22, 193 (1991).
[135] J. P. Valleau, J. Chem. Phys. 95, 584 (1991).
[136] R. W. Gerling & A. Hüller, Z. Phys. B 90, 207 (1993).
[137] M. Promberger & A. Hüller, Z. Phys. B 97, 341 (1995).
[138] G. E. Norman & V. S. Filinov, High Temp. Res. USSR 7, 216 (1969).
[139] D. J. Adams, Mol. Phys. 29, 307 (1975).
[140] N. B. Wilding & A. D. Bruce, J. Phys. Condens. Matter 4, 3087 (1992).
[141] A. Z. Panagiotopoulos, Mol. Phys. 61, 813 (1987).
[142] A. Z. Panagiotopoulos, N. Quirke, M. Stapleton & D. J. Tildesley, Mol. Phys. 63, 527
(1988).
[143] B. Smit, P. De Smedt & D. Frenkel, Mol. Phys. 68, 931 (1989).
[144] B. Smit, in Computer Simulation in Chemical Physics, ed. M. P. Allen & D. J. Tildesley,
Kluwer Academic Publishers, Dordrecht (1992).
[145] A. Z. Panagiotopoulos, Mol. Simulation 9, 1 (1992).
[146] S.-K. Ma, J. Stat. Phys. 26, 221 (1981).
[147] H. M. Huang, S.-K. Ma & Y. M. Shih, Solid State Communs. 51, 147 (1984).
[148] H. Meirovich, J. Phys. A 16, 831 (1983).
[149] A. G. Schlijper, A. R. D. Van Bergen & B. Smit, Phys. Rev. A 41, 1175 (1990).
[150] R. Kikuchi, Phys. Rev. 81, 988 (1951), reprinted in Phase Transitions and Critical Phenomena
Vol. 2, ed. C. Domb & M. S. Green, Academic Press, London (1975).
[151] K. Binder, Z. Phys. B 45, 61 (1981).
[152] J. M. Rickman & S. R. Philpot, J. Chem. Phys. 95, 7562 (1991).
[153] J. M. Rickman & D. J. Srolovitz, J. Chem. Phys. 99, 7993 (1993).
[154] G. Bhanot, S. Black, P. Carter & R. Salvador, Phys. Lett. B 183, 331 (1987); G. Bhanot,
R. Salvador, S. Black, P. Carter & R. Toral, Phys. Rev. Lett. 59, 803 (1987).
[155] K. M. Bitar, Nucl. Phys. B 300, 61 (1988).
[156] A. B. Bortz, M. H. Kalos & J. L. Lebowitz, J. Comp. Phys. 17, 10 (1975).
[157] M. A. Novotny, Phys. Rev. Lett. 74, 1 (1995); Erratum: ibid 75, 1424 (1995).
[158] G. L. Bretthorst, in Maximum Entropy and Bayesian Methods, ed. P. F. Fougere, Kluwer
Academic Publishers, Dordrecht (1992).
[159] T. J. Loredo, in Maximum Entropy and Bayesian Methods, ed. P. F. Fougere, Kluwer
Academic Publishers, Dordrecht (1992).
[160] T. Bayes, reprinted in Biometrika 45, 293 (1958).
[161] H. Jeffreys, The Theory of Probability, Clarendon Press, Oxford (1939); later editions
1948, 1961.
[162] E. S. Ristad, preprint CS-TR-495-95; available as paper 9508012 from the archive at
http://xxx.lanl.gov/cmp-lg.
[163] J. J. Martin, Bayesian Decision Problems and Markov Chains, Wiley, New York (1967).
[164] R. A. Howard, Dynamic Probabilistic Systems Vol. 1: Markov Models, Wiley, New York
(1971).
[165] M. Krajčí & J. Hafner, Phys. Rev. Lett. 74, 5100 (1995).
[166] D. Nicolaides & A. D. Bruce, J. Phys. A 21, 233 (1988).
[167] A. D. Bruce, submitted to J. Phys. E (1995).
[168] R. Hilfer, Z. Phys. B 96, 63 (1994).
[169] A. Aharony & M. E. Fisher, Phys. Rev. B 27, 4394 (1983).
[170] E. W. Montroll, in Proc. Symp. Applied Maths. 16, 193 (1964).
[171] P. Bolhuis, M. Hagen & D. Frenkel, Phys. Rev. E 50, 4880 (1994).
[172] C. F. Tejero, A. Daanoun, H. N. W. Lekkerkerker & M. Baus, Phys. Rev. Lett. 73, 752
(1994).
[173] A. Daanoun, C. F. Tejero & M. Baus, Phys. Rev. E 50, 2913 (1994).
[174] C. F. Tejero, A. Daanoun, H. N. W. Lekkerkerker & M. Baus, Phys. Rev. E 51, 558 (1995).
[175] P. N. Pusey, in Les Houches, Session LI, 1989: Liquides, Cristallisation et Transition
Vitreuse / Liquids, Freezing and Glass Transition, ed. J. P. Hansen, D. Levesque & J. Zinn-Justin, Elsevier Science Publishers, B. V. (1992).
[176] S. Asakura & F. Oosawa, J. Polymer. Sci. 33, 183 (1958).
[177] D. A. Young, Phase Diagrams of the Elements, University of California Press (1991).
[178] P. R. Sperry, J. Coll. Interface Sci. 99, 97 (1984).
[179] H. N. W. Lekkerkerker, W. C.-K. Poon, P. N. Pusey, A. Stroobants & P. B. Warren,
Europhysics Lett. 20, 559 (1992).
[180] M. Hagen and D. Frenkel, J. Chem. Phys. 101, 4093 (1994).
[181] R. Hall, J. Chem. Phys. 57, 2252 (1972).
[182] C. Borgs & R. Kotecky, J. Stat. Phys. 61, 79 (1990).
[183] C. Borgs & W. Janke, Phys. Rev. Lett. 68, 1738 (1992).
[184] A. R. Ubbelohde, The Molten State of Matter, Wiley, New York (1978).
[185] M. N. Barber, in: Phase Transitions and Critical Phenomena, Vol. 8, p. 145, ed. C. Domb
& J. L. Lebowitz, Academic Press, New York (1983), and references therein.
[186] V. Privman & J. Rudnick, J. Stat. Phys. 60, 551 (1990).
[187] V. Privman & M. E. Fisher, Phys. Rev. B 30, 322 (1984).
[188] H. Cramér, The Elements of Probability Theory, Wiley, New York (1955).
[189] M. Kikuchi, N. Ito & Y. Okabe, in Computer Simulations in Condensed Matter Physics
VII, ed. D. P. Landau, K. K. Mon & H.-B. Schüttler, Springer-Verlag, Berlin (1994).
[190] H. L. Gray & W. R. Schucany, The Generalized Jackknife Statistic, M. Dekker, New York
(1972).
[191] B. A. Berg, Comp. Phys. Communs. 69, 7 (1992).