Phase Space
• Phase Space: a concept of classical Statistical Mechanics
• Each Phase Space dimension corresponds to a particle degree of freedom
• 3 dimensions correspond to Position in (real) space: x, y, z
• 3 dimensions correspond to Momentum: px, py, pz
(or Energy and direction: E, θ, φ)
• More dimensions may be envisaged, corresponding to other possible
degrees of freedom, such as quantum numbers: spin, etc.
• Each particle is represented by a point in phase space
• Time can also be considered as a coordinate, or it can be considered
as an independent variable: the variation of the other phase space coordinates as a function of time (the trajectory of a phase space point)
constitutes a particle “history”
The Boltzmann Equation
• All particle transport calculations are (explicit or implicit) attempts to
solve the Boltzmann Equation
• It is a balance equation in phase space: at any phase space point, the
increment of particle phase space density is equal to the sum of all
“production terms” minus the sum of all “destruction terms”
• Production: Sources, “Inscattering”, Particle Production, Decay
• Destruction: Absorption, “Outscattering”, Decay
• We can look for solutions of different types: at a number of (real
or phase) space points, averages over (real or phase) space regions,
projected on selected phase space hyperplanes, stationary or time-dependent
Mean of a distribution (I)
• In one dimension:
Given a variable x, distributed according to a function f (x), the mean or average of
another function of the same variable A(x) over an interval [a, b] is given by:
$$\bar{A} = \frac{\int_a^b A(x)\, f(x)\, dx}{\int_a^b f(x)\, dx}$$

Or, introducing the normalized distribution f′:

$$f'(x) = \frac{f(x)}{\int_a^b f(x)\, dx} \qquad\qquad \bar{A} = \int_a^b A(x)\, f'(x)\, dx$$

A particular case is that of A(x) = x:

$$\bar{x} = \int_a^b x\, f'(x)\, dx$$
Mean of a distribution (II)
• In several dimensions:
Given n variables x, y, z, . . . , distributed according to the (normalized) functions
f′(x), g′(y), h′(z) . . . , the mean or average of a function of those variables
A(x, y, z . . .) over an n-dimensional domain D is given by:
$$\bar{A} = \int_x \int_y \int_z \cdots A(x, y, z, \ldots)\, f'(x)\, g'(y)\, h'(z) \cdots\, dx\, dy\, dz \cdots$$
Often impossible to calculate with traditional methods, but we can sample N values of
A with probability f′ · g′ · h′ · · · and divide the sum of the sampled values by N:
Central Limit Theorem:

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} A(x_i, y_i, z_i, \ldots) = \bar{A}$$

where the points (x_i, y_i, z_i, . . .) are sampled with probability f′(x) g′(y) h′(z) . . .
Each term of the sum is distributed like A (Analog Monte Carlo). Integration, but also simulation!
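As a concrete illustration (not part of the original slides), a minimal Python sketch of this analog estimator in one dimension, with an assumed A(x) and f′(x):

```python
import random
import math

# Minimal sketch of analog Monte Carlo integration (assumed example):
# estimate the mean of A(x) = x^2 under the normalized distribution
# f'(x) = 2x on [0, 1].  Exact answer: integral of x^2 * 2x dx = 0.5.

def sample_fprime():
    # Inverse-CDF sampling: F(x) = x^2, so X = sqrt(xi)
    return math.sqrt(random.random())

def A(x):
    return x * x

N = 100_000
estimate = sum(A(sample_fprime()) for _ in range(N)) / N
print(f"MC estimate: {estimate:.4f} (exact: 0.5)")
```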
Analog Monte Carlo
• In an analog Monte Carlo calculation (an “honest” simulation), not only does the
mean of the contributions converge to the mean of the real distribution,
but so do the variance and all higher-order moments:
$$\lim_{N \to \infty} \left[ \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^n \right]^{1/n} = \sigma_n$$
and fluctuations and correlations are faithfully reproduced.
Integration efficiency
• Traditional numerical integration methods (Simpson, etc.) converge to
the true value as N^(−1/n), where N = number of “points” (intervals) and
n = number of dimensions
• Monte Carlo converges instead as 1/√N
Number of dimensions   Traditional methods   Monte Carlo   Remark
n = 1                  1/N                   1/√N          MC not convenient
n = 2                  1/√N                  1/√N          about equivalent
n > 2                  N^(−1/n)              1/√N          MC converges faster
Sampling from a distribution
Sampling from a uniform distribution
• All computers provide a pseudo-random number generator (or even several of them).
In most computer languages (e.g., Fortran 90, C) a PRNG is even available as an
intrinsic routine
• All such PRNGs return a sequence of numbers uniformly distributed between 0 and 1.
Generally 0 is included and 1 is not. It is easy to change from ξ ∈ [0, 1[ to ξ ∈ ]0, 1]
by taking ξ′ = 1 − ξ (for instance to avoid an overflow due to division by ξ)
• Uniform and Random are independent concepts. For MC, sometimes uniformity is
more important than randomness (but not if correlations are important!)
• But it is important also that sampled histories be uncorrelated with each other
• A simple example of sampling from a uniform distribution: a linear source along the
z axis, between z1 and z2:
Z = z1 + (z2 − z1) ξ
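In code, the sampling is a single line (a Python sketch with assumed endpoints z1, z2):

```python
import random

z1, z2 = 0.0, 10.0       # assumed endpoints of the linear source
xi = random.random()     # uniform in [0, 1)
Z = z1 + (z2 - z1) * xi  # uniformly sampled position along the source
```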
Sampling from a distribution
Sampling from a generic distribution
• Using one random number
• Integrate the distribution function, analytically or numerically, and normalize to 1 to
obtain the normalized cumulative distribution:
$$F(x) = \frac{\int_{x_{min}}^{x} f(x)\, dx}{\int_{x_{min}}^{x_{max}} f(x)\, dx}$$
• Sample a uniform pseudo-random number ξ
• Get the desired result by finding the inverse value X = F⁻¹(ξ), analytically or most
often by interpolation (table look-up)
• Using two random numbers (rejection technique)
• Choose a constant value C ≥ f(x) for all x in [xmin, xmax]
• Sample two random numbers ξ1 and ξ2, and map ξ1 onto the interval: x = xmin + ξ1 (xmax − xmin)
• If ξ2 < f(x)/C, accept X = x; otherwise re-sample ξ1 and ξ2 (both techniques are sketched below)
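A minimal Python sketch of both techniques, for an assumed distribution f(x) = e^(−x) on [0, 5] (illustration, not FLUKA code):

```python
import random
import math

XMIN, XMAX = 0.0, 5.0   # assumed interval

def f(x):
    return math.exp(-x)

# (1) Inverse of the normalized cumulative distribution:
#     F(x) = (1 - exp(-x)) / (1 - exp(-XMAX)),  X = F^-1(xi)
def sample_inverse_cdf():
    xi = random.random()
    return -math.log(1.0 - xi * (1.0 - math.exp(-XMAX)))

# (2) Rejection technique, with C >= f(x) everywhere (here C = f(0) = 1)
C = 1.0

def sample_rejection():
    while True:
        x = XMIN + random.random() * (XMAX - XMIN)
        if random.random() < f(x) / C:
            return x
```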
Analog vs. Biased
Analog Monte Carlo
• samples from actual physical phase space distributions
• predicts average quantities and all statistical moments of any order
• preserves correlations (provided the physics is correct, of course!)
• reproduces fluctuations (provided. . . see above)
• is (almost) safe and (sometimes) can be used as a “black box”
But:
• is inefficient and converges very slowly
• fails to predict important contributions due to rare events
Analog vs. Biased
Biased Monte Carlo
• samples from artificial distributions, and applies a weight to the particles to correct for the bias
• predicts average quantities, but not the higher moments (on the contrary, its goal is to
minimize the second moment!)
• same mean with smaller variance =⇒ faster convergence
• sometimes makes it possible to obtain acceptable statistics where an analog Monte Carlo would take
years of CPU time to converge
But:
• cannot reproduce correlations and fluctuations
• with a few exceptions, requires physical judgment, experience and a good understanding of the
problem (it is not a black box!)
• in general, a user does not get the definitive result after the first run, but needs to do a series of
test runs in order to optimize the biasing parameters
• =⇒ balance between user’s time and CPU time
Reduce variance or CPU time?
Computer cost
A Figure of Merit:
Computer cost of an estimator = σ² × t
(σ² = variance, t = CPU time per primary particle)
• some biasing techniques aim at reducing σ², others at reducing t
• often reducing σ² increases t, and vice versa
• therefore, minimizing σ² × t means reducing σ² at a faster rate than t increases,
or vice versa
• =⇒ the choice depends on the problem, and sometimes a combination of several
techniques is most effective
• bad judgement, or excessive “forcing” on one of the two variables, can have
catastrophic consequences on the other one, making computer cost “explode”
Two different worlds (I)
Physics codes
• In general, physicists are more interested in single events and in
correlations within the same event than in averages over thousands of
events
• therefore, Monte Carlo codes used in physics research have been
mostly fully analog so far
• but there are cases (punchthrough, detector background studies, very
high energy showers with large multiplicities) where an analog
simulation can require prohibitive computer times
Two different worlds (II)
Engineering codes
• engineers and applied physicists working in the nuclear energy field
need to simulate endless neutron histories in scattering materials
• all the low energy neutron transport codes have always used variance
reduction techniques, starting with the first code of Fermi and von
Neumann
• but some studies of nuclear physics detectors would benefit from
simulations which could predict coincidences and anticoincidences
Two different worlds (III)
Simulation vs. Integration
• In high-energy physics and in medical physics, people are used to thinking about Monte Carlo in
terms of simulation:
since all moments of phase space distributions are faithfully reproduced by analog Monte Carlo,
results look “like real life”
• For nuclear engineers, Monte Carlo is a technique to integrate the Boltzmann equation:
what matters is the expectation value of a finite number of quantities of interest, which is
obtained by integrating over phase space (actually integration takes place over the individual
particle paths in phase space, i.e., the histories =⇒ ergodic theorem)
• It is important to realize that the statistical weighting techniques used in biased Monte Carlo are
mathematically rigorous:
for N → ∞, all calculated averages tend to the corresponding physical expectation values
(although the speed of convergence may be very different in different regions of phase space)
Biasing is neither tuning nor tricks, it’s sound applied mathematics!!!
Importance biasing
(FLUKA option: BIASING)
• Importance biasing combines two techniques:
• Surface Splitting (also called Geometry Splitting), which reduces
σ but increases t
• Russian Roulette (which does the opposite)
• it is the simplest, “safest” and easiest to use of all biasing
techniques
• the user assigns a relative importance to each geometry region (the
actual absolute value doesn’t matter), based on:
1. expected fluence attenuation with respect to other regions
2. probability of contribution to score by particles entering the region
Importance biasing
Surface Splitting
• If a particle crosses a region boundary, coming from a region of
importance I1 and entering a region of higher importance I2 > I1 :
• the particle is replaced on average by n = I2/I1 identical
particles with the same characteristics
• the weight of each “daughter” is multiplied by I1/I2
• If I2/I1 is too large, excessive splitting may occur in codes which
do not provide an appropriate protection
• An internal limit in FLUKA prevents excessive splitting when I2/I1 is
too large (> 5), a problem found in many biased codes
Importance biasing
Russian Roulette
• RR acts in the opposite direction: if a particle crosses a boundary
from a region of importance I1 to one of lower importance I2 < I1,
the particle is submitted to a random survival test.
• with a chance I2/I1 the particle survives with its weight increased
by a factor I1/I2
• with a chance (1 − I2/I1) the particle is killed
Importance biasing is commonly used to maintain a constant particle
population, compensating for attenuation due to absorption or distance.
In FLUKA it can be tuned per type of particle
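A schematic Python sketch of the combined splitting/RR rule at a boundary crossing (an illustration of the rules above, not the FLUKA implementation; particles are represented by their weights only):

```python
import random

def cross_boundary(weight, I1, I2):
    """Return the list of weights of the particles that continue."""
    ratio = I2 / I1
    if ratio > 1.0:
        # Surface splitting: replace the particle by n = I2/I1 copies
        # on average, each with weight multiplied by I1/I2.
        n = int(ratio)
        if random.random() < ratio - n:   # handle non-integer ratios
            n += 1
        return [weight * I1 / I2] * n
    else:
        # Russian Roulette: survive with probability I2/I1, with the
        # weight increased by I1/I2; otherwise the particle is killed.
        if random.random() < ratio:
            return [weight * I1 / I2]
        return []
```

In both branches the expected total weight is conserved, which is what makes the game statistically fair.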
Importance biasing
Note: In FLUKA, for technical reasons, importances are internally stored
as integers. Therefore:
• importances can only take values between 0.0001 and 100000.
• an input value 0.00015 is read as 0.0001, 0.00234 is read as 0.0023,
etc.
There is also a user routine USIMBS which allows the user to assign importances
not only at boundaries, but at each step, according to any logic desired
(as a function of position, direction, energy. . . ).
Very powerful, but very time-consuming (it is called at each step!). The
user must balance the time gained by biasing with that wasted by calls.
Importance biasing: problems
• Although importance biasing is relatively easy and safe to use, there are a few cases
where caution is recommended
+-----+---------------------------+
|     |           I=8           D |
|     +---------------------------+
|     |           I=4           C |
| I=? +---------------------------+
|     |           I=2           B |
|     +---------------------------+
|  E  |           I=1           A |
+-----+---------------------------+
Which importance shall we assign to region E?
Whatever value we choose, we will get inefficient
splitting/RR at some of the boundaries
• Another case is that of splitting in vacuum (or air). Splitting daughters are strongly
correlated: one must make sure that their further histories are differentiated enough
to “forget” their correlation (difficult to achieve in ducts and mazes)
• The above applies in part also to muons (the differentiation provided by multiple
scattering and by dE/dx fluctuations is not always sufficient)
Weight Window (I)
(FLUKA options: WW-FACTOr, WW-THRESh, WW-PROFIle)
• the WW technique too is a combination of splitting and RR, but it is based on the
absolute value of the weight of each individual particle, rather than on relative
region importance
• the user sets an upper and a lower weight limit, generally as a function of region,
energy, and particle
• particles having a weight larger than the upper limit are split, those with weight
smaller than the lower limit are submitted to RR =⇒ killed or put back “inside the
window”
• WW is a more powerful biasing tool than Importance Biasing, but it requires also
more experience and patience to set it up correctly.
“It is more an art than a science” (Quote from the MCNP manual)
• use of the WW is essential whenever other biasing techniques generate large weight
fluctuations in a given phase space region.
Weight Window (II)
• Killing a particle with a very low weight (with respect to the average for a given
phase space region) decreases t but has very little effect on the score (and therefore
on σ)
• Splitting a particle with a very large weight
• increases t (in proportion to the number of additional particles to be followed)
• but at the same time reduces σ by avoiding large fluctuations in the
contributions to scoring
• The global effect is to reduce σ²t
• Too wide a window is of course ineffective, but too narrow a window should also be
avoided; otherwise too much CPU time is spent in repeated splitting/RR.
A typical ratio between the upper and the lower edge of the window is about 10. It
is also possible to do RR without splitting (setting the upper window edge = ∞) or
splitting without RR (setting the lower edge = 0); the logic is sketched below
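A minimal sketch of the weight-window logic, with assumed window limits (illustrative Python, not the FLUKA code):

```python
import random

# Assumed window edges, with the typical upper/lower ratio of about 10
W_LOW, W_HIGH = 1.0, 10.0

def apply_weight_window(weight):
    """Return the list of weights of the particles that continue."""
    if weight > W_HIGH:
        # Split so that each daughter falls back inside the window.
        n = int(weight / W_HIGH) + 1
        return [weight / n] * n
    if weight < W_LOW:
        # Russian Roulette: survive with probability weight/W_LOW and
        # put the survivor back "inside the window" at W_LOW.
        if random.random() < weight / W_LOW:
            return [W_LOW]
        return []
    return [weight]
```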
The FLUKA Weight Window
[Figure: diagram of the FLUKA weight window scheme, not reproduced in this transcript]
Leading Particle Biasing (I)
(FLUKA option: EMF-BIAS)
• Leading particle biasing is available only for e+, e− and photons
• it is generally used to avoid the geometrical increase with energy of
the number of particles in an electromagnetic shower
• it is characteristic of EM interactions that
2 particles are present in the final state (at least in the approximation
made by most MC codes)
• if LPB is activated, only one is randomly retained and its weight is
adjusted so as to conserve weight × probability
• the most energetic of the two particles is kept with higher probability
(as the one which is more efficient in propagating the shower)
The FLUKA implementation, unlike that of other codes, takes into account the
indirectly enhanced penetration potential of positrons due to the emission of
annihilation photons
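Schematically, for a two-body final state (a Python sketch; the energy-proportional retention probability is an assumption for illustration, not FLUKA's exact rule):

```python
import random

def leading_particle_bias(E1, E2, weight):
    """Keep one of the two secondaries; conserve weight x probability."""
    p1 = E1 / (E1 + E2)          # more energetic -> more likely retained
    if random.random() < p1:
        return E1, weight / p1   # weight adjusted to stay unbiased
    return E2, weight / (1.0 - p1)
```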
Leading Particle Biasing (II)
• LPB is very effective at reducing t, but increases σ by introducing
large weight fluctuations (a few low energy particles carry a large
weight, while many energetic ones have a small weight)
• therefore, it should nearly always be backed up by WW
• Very useful for shielding calculations at both electron and proton
accelerators!
In FLUKA, LPB can be finely tuned by region, by physical effect
(Compton, bremsstrahlung, etc.) and by particle energy (LPB applied
only below a given energy threshold)
Multiplicity tuning
(FLUKA option: BIASING)
• Multiplicity tuning is unique to FLUKA (or it was until not long ago. . . ), and it is
meant to be for hadrons what LPB is for electrons and photons
• a hadronic nuclear interaction at LHC energies can end in hundreds of secondaries
⇒ to simulate a whole hadronic cascade in bulk matter may take a lot of CPU time
• except for the leading particle, many secondaries are of the same type and have
similar energies and other characteristics
• therefore, it is possible to discard a predetermined average fraction of them,
provided the weight of those which are kept and transported is adjusted so that the
total weight is conserved (but the leading particle must not be discarded)
• the user can tune the average multiplicity in different regions of space by setting a
region-dependent reduction factor (in fact it can even be > 1! But this
possibility is seldom used)
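In sketch form (illustrative Python, with an assumed list-of-weights representation; not the FLUKA algorithm):

```python
import random

def tune_multiplicity(secondaries, leading, r):
    """Keep each non-leading secondary (given by its weight) with
    probability r, dividing its weight by r so that the total weight
    is conserved on average.  The leading particle is never discarded."""
    kept = [leading]
    for w in secondaries:
        if random.random() < r:
            kept.append(w / r)
    return kept
```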
Biased downscattering
Only for experts!
(FLUKA option: LOW-DOWN)
• Especially suited to the multi-group treatment of low-energy neutron transport
• the group-to-group transfer probabilities used in low-energy neutron transport (the
so-called downscattering matrix) can be biased in order to artificially accelerate or
slow down the moderation process
• very difficult to use: the user must be able to visualize the neutron path in phase
space and especially the path projection on the energy axis
• but very efficient for medium-heavy materials when thermal or epithermal neutrons
play an important role
• in inexpert hands it can give results which only apparently converge fast, with large
rare spikes over what appears to be a smooth, statistically sound background
(uniform undersampling, i.e., systematically sampling too rarely in the tail of a
distribution, gives the illusion of a small variance!)
Non-analog absorption (I)
Another biasing technique for low energy neutron transport
(FLUKA option: LOW-BIAS)
• also called survival biasing
• implemented in most low-energy neutron transport codes, where the
user must choose between two options:
• at each neutron collision, either analog scattering or absorption is chosen
according to the actual physical probabilities σs/σT and
(1 − σs/σT ), respectively
• systematic survival with a weight reduced by a factor σs/σT
• in FLUKA there is also a third possible choice: the user can force the
neutron absorption probability to take an arbitrary value, preassigned
on a region-by-region basis and as a function of energy
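The three choices in sketch form (illustrative Python; ps = σs/σT is an assumed input):

```python
import random

def collide_analog(weight, ps):
    """Analog choice: scatter with probability ps, absorb otherwise."""
    return weight if random.random() < ps else 0.0

def collide_survival_biased(weight, ps):
    """Systematic survival, with the weight reduced by sigma_s/sigma_T."""
    return weight * ps

def collide_forced(weight, ps, p_surv):
    """Third (FLUKA-style) choice, sketched: force an arbitrary survival
    probability p_surv, correcting the weight to keep the game unbiased."""
    return weight * ps / p_surv if random.random() < p_surv else 0.0
```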
Non-analog absorption (II)
When and how to use it?
• a small survival probability is often assigned to thermal neutrons to
limit the number of scatterings in non-absorbing media
• also very useful in materials with unusual scattering properties (e.g.,
iron)
• survival probabilities too small with respect to the physical one
σs/σT may introduce large weight fluctuations due to the very
different number of collisions suffered by individual neutrons: in these
cases, a Weight Window should be applied (never set the absorption
probability more than a few times larger than the physical one!!)
Biasing mean free paths
Decay length
(FLUKA option: LAM-BIAS)
• the mean life/average decay length of unstable particles can be
artificially shortened
• it is also possible to increase the generation rate of decay products
without the parent particle actually disappearing
• typically used to enhance statistics of muon or neutrino production
• the kinematics of decay can also be biased (decay angle)
Biasing mean free paths
Interaction length
(FLUKA option: LAM-BIAS)
• in a similar way, the hadron or photon mean free path for nuclear
inelastic interactions can be artificially decreased by a predefined
particle- or material-dependent factor (much more sound than the
“forced interaction” implemented in some old codes!)
• this option is useful for instance to increase the probability of beam
interaction in a very thin target or in a material of very low density
• also necessary to simulate photonuclear reactions with acceptable
statistics, the photonuclear cross section being much smaller than
that for EM processes
User-written biasing (I)
Do it yourself. . .
(FLUKA user routines: source, ubsset)
• the user routine ubsset.f allows the user to generate many biasing
parameters from simple algorithms instead of entering each input
value by hand
• the user can also prepare a biased source
• the case most often encountered is that of a biased source energy
spectrum, in which some energies are preferentially sampled
• but it is also common to bias the source spatial and/or
angular distribution
• of course, in such cases it is the user’s responsibility to ensure that
weights are adjusted in a statistically correct way
User-written biasing (II)
Do it yourself. . .
Some tools available for user code:
• The FLUKA PRNG is provided by function FLRNDM(DUMMY). It returns a double
precision random number uniformly distributed in the [0, 1[ interval
• Subroutine SFECFE(SFE,CFE) returns a random azimuthal sine and cosine
• Subroutine RACO(TXX,TYY,TZZ) returns the cosines of a vector of random
orientation
• Subroutine SFLOOD(XXX,YYY,ZZZ,UXX,VXX,WXX) returns a random position and
direction on the surface of a sphere of unit radius (generating a uniform and
isotropic field inside the sphere)
• Subroutine FLNRRN(RGAUSS) returns a random number normally distributed with
unit variance
User-written biasing (III)
Sampling from a biased distribution
• The sampling technique based on the cumulative distribution can be extended to modify the probability of sampling in different parts of the interval (importance sampling)
• We replace f (x) by a function g(x) = f (x) h(x), where h(x) is any appropriate weight function
• We normalize g(x) in the same way as f (x) in the normal sampling
$$G(x) = \frac{\int_{x_{min}}^{x} g(x)\, dx}{\int_{x_{min}}^{x_{max}} g(x)\, dx} = \frac{\int_{x_{min}}^{x} f(x)\, h(x)\, dx}{B}$$

and we also need the integral of f(x) over the whole interval:

$$A = \int_{x_{min}}^{x_{max}} f(x)\, dx$$
• Now sample from the biased cumulative normalized distribution G instead of the original unbiased
F: take a uniform pseudo-random number ξ, and get the sampled value X by inverting G(x): X = G⁻¹(ξ)
• The particle is assigned a weight

$$W = \frac{B}{A}\,\frac{1}{h(X)}$$
User-written biasing (IV)
Sampling from a biased distribution: a simple example
• Uniform linear source along z axis, between z1 and z2
• The non-biased case was:

$$f(z) = 1 \qquad A = z_2 - z_1 \qquad F(z) = \int_{z_1}^{z} \frac{dz}{A} = \frac{z - z_1}{A}$$

$$Z = z_1 + A\,\xi = z_1 + (z_2 - z_1)\,\xi$$

• We decide to bias with a linear function, in order to sample preferentially larger values of z:

$$h(z) = z \qquad g(z) = h(z)\, f(z) = z \qquad B = \frac{z_2^2 - z_1^2}{2}$$

$$G(z) = \int_{z_1}^{z} \frac{z\, dz}{B} = \frac{z^2 - z_1^2}{z_2^2 - z_1^2} \qquad Z = \sqrt{z_1^2 + \xi\,(z_2^2 - z_1^2)}$$

$$W = \frac{B}{A}\,\frac{1}{h(Z)} = \frac{z_2^2 - z_1^2}{2\,(z_2 - z_1)\,Z} = \frac{z_2 + z_1}{2Z}$$
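The same example as a short Python sketch (illustrative), including a numerical check that the weighted mean stays unbiased:

```python
import random
import math

z1, z2 = 1.0, 5.0   # assumed source endpoints

def sample_biased():
    xi = random.random()
    Z = math.sqrt(z1**2 + xi * (z2**2 - z1**2))   # Z = G^-1(xi)
    W = (z2 + z1) / (2.0 * Z)                     # compensating weight
    return Z, W

# Unbiased check: the weighted mean of Z must converge to (z1 + z2)/2
N = 100_000
mean = sum(Z * W for Z, W in (sample_biased() for _ in range(N))) / N
print(f"weighted mean: {mean:.3f} (expected {(z1 + z2) / 2:.3f})")
```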
User-written biasing (V)
Uniform sampling from a steep distribution
• A special case of importance sampling is when the biasing function chosen is the inverse of the
unbiased distribution function:
$$h(x) = \frac{1}{f(x)} \qquad g(x) = f(x)\, h(x) = 1$$

$$G(x) = \frac{\int_{x_{min}}^{x} g(x)\, dx}{\int_{x_{min}}^{x_{max}} g(x)\, dx} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

• In this case X = G⁻¹(ξ) = x_min + ξ (x_max − x_min), and the sampled particle is assigned a weight

$$W = \frac{B}{A}\,\frac{1}{h(X)} = \frac{f(X)\,(x_{max} - x_{min})}{A}$$
• Because X is sampled with the same probability over all possible values of x, independently of the
value f (X) of the function, this technique is used to ensure that sampling is done uniformly over
the whole interval, even though f (x) might have very small values somewhere
• For instance it may be important to avoid undersampling in the high-energy tail of a spectrum,
steeply falling with energy but more penetrating, such as that of cosmic rays or synchrotron radiation
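A corresponding sketch for an assumed steeply falling spectrum f(E) = e^(−E) (illustrative Python):

```python
import random
import math

# Uniform sampling from a steep distribution f(E) = exp(-E) on [0, 20]:
# sample E uniformly, carry the weight f(E) * (EMAX - EMIN) / A.
EMIN, EMAX = 0.0, 20.0
A = 1.0 - math.exp(-EMAX)   # integral of f over [EMIN, EMAX]

def sample_uniform_weighted():
    E = EMIN + random.random() * (EMAX - EMIN)
    W = math.exp(-E) * (EMAX - EMIN) / A
    return E, W
```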