IEEE JOURNAL ON SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 7, ISSUE 3, JUNE 2013, PP 376-389.
“Statistics 102” for Multisource-Multitarget
Detection and Tracking
Ronald Mahler
Abstract— This tutorial paper summarizes the motivations,
concepts and techniques of finite-set statistics (FISST), a system-level, “top-down,” direct generalization of ordinary single-sensor,
single-target engineering statistics to the realm of multisensor,
multitarget detection and tracking. Finite-set statistics provides
powerful new conceptual and computational methods for dealing
with multisensor-multitarget detection and tracking problems.
The paper describes how “multitarget integro-differential calculus” is used to extend conventional single-sensor, single-target
formal Bayesian motion and measurement modeling to general
tracking problems. Given such models, the paper describes
the Bayes-optimal approach to multisensor-multitarget detection
and tracking: the multisensor-multitarget recursive Bayes filter.
Finally, it describes how multitarget calculus is used to derive
principled statistical approximations of this optimal filter, such
as PHD filters, CPHD filters, and multi-Bernoulli filters.
Index Terms— multitarget tracking, multitarget detection, data
fusion, finite-set statistics, FISST, random sets.
I. INTRODUCTION
This paper is a sequel to, and update of, a tutorial published
in 2004 for the IEEE Aerospace and Electronic Systems
Magazine [21]. That tutorial described some central ideas of
finite-set statistics (FISST). Finite-set statistics is a systematic,
unified approach to multisensor-multitarget detection, tracking,
and information fusion. It has been the subject of considerable
worldwide research interest during the last decade, including
more than 600 research publications by researchers in more
than a dozen nations. I attribute this interest to the fact that
finite-set statistics:
• is based on explicit, comprehensive, unified statistical
models of multisensor-multitarget systems;
• unifies two disparate goals of multitarget tracking (target
detection and state estimation) into a single, seamless,
Bayes-optimal procedure;
• results in new multitarget tracking algorithms (PHD filters, CPHD filters, multi-Bernoulli filters, etc.) that do
not require measurement-to-track association, while still
achieving tracking performance (localization accuracy,
speed) comparable to or better than conventional multitarget tracking algorithms;
• results in promising generalized CPHD and multi-Bernoulli filters that can operate in unknown, dynamically
changing clutter and detection backgrounds; and
• more generally, has been a fertile source of fundamentally
new approaches in multisource-multitarget tracking and
information fusion.
Copyright (c) 2013 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to [email protected].
R. Mahler is with Lockheed Martin Advanced Technology Laboratories, Eagan,
MN. E-mail: [email protected].
The emphasis of the earlier tutorial was on the answers to
three questions:
• How does one Bayes-optimally¹ detect and track multiple
noncooperative targets using multiple, imperfect sensors?
• How does one correctly model multisensor-multitarget
systems so that Bayes-optimality is possible?
• How does one accomplish this using a “Statistics 101”-like formalism that is specifically designed for solving
multitarget tracking and data fusion problems?
The answer to the first question, the multisource-multitarget Bayes recursive filter, is computationally intractable in all but the simplest problems. The answers to the
second and third questions—multitarget formal Bayes modeling and multitarget integro-differential calculus, respectively—
were addressed only at a very high level. Thus this paper
begins where the previous one left off, with emphasis on
answers to the following, consequent questions:
• How does one actually construct faithful Bayesian models
of multisensor-multitarget systems?
• How does one approximate the optimal multisource-multitarget Bayes recursive filter in a principled statistical
manner—meaning that the underlying models and their
relationships are preserved as faithfully as possible?
• What mathematical machinery—what specific “multitarget Statistics 101” methodology—makes this possible?
The earlier tutorial paper was written at a very elementary
level (it was presumed that even Bayes’ rule might be an
unfamiliar concept). This paper continues along the same
path, but does presume some basic knowledge. This includes undergraduate probability and calculus, motion and
measurement models, probability density functions, likelihood
functions, Markov transition densities, and so on.
It should also be emphasized that the paper is a tutorial
introduction to, not a survey of, finite-set statistics. It includes
pointers to some of the most significant developments, but
these are by no means exhaustive.
The paper is organized as follows. Section II describes the
engineering philosophy that motivates finite-set statistics. Section III presents a review of “Statistics 101” for single-sensor,
single-target systems, focusing on the single-sensor, single-target recursive Bayes filter. Section IV summarizes its generalization to “multisensor-multitarget Statistics 101,” focusing
on motion and measurement modeling and the multisensor-multitarget recursive Bayes filter. Section V provides an
overview of the primary approximations of this filter, the PHD,
CPHD, and multi-Bernoulli filters. Section VI summarizes the
principled statistical approximation methodology that leads to
these filters. Conclusions can be found in Section VII.
1 By “Bayes-optimal,” I mean that target state(s) are determined by a state
estimator, applied to the posterior distribution of a Bayes filter, that minimizes
the Bayes risk corresponding to some cost function (see [20], p. 63).
II. THE PHILOSOPHY OF FINITE-SET STATISTICS
Multisensor, multitarget systems introduce a major complication that is absent from single-sensor, single-target problems: they comprise randomly varying numbers of
randomly varying objects of various kinds. These include
varying numbers of targets; varying numbers of sensors with
varying numbers of sensor measurements collected by each
sensor; and varying numbers of sensor-carrying platforms. A
rigorous mathematical foundation for stochastic multiobject
problems—point process theory [4], [37]—has been in existence for a half-century. However, this theory has traditionally
been formulated with the requirements of mathematicians
rather than tracking and data fusion engineers in mind. The
formulation usually preferred by mathematicians—random
counting-measures—is inherently abstract and complex (especially in regard to probabilistic foundations) and not easily
assimilable with engineering physical intuition.
A. Motivations and Objectives
The fundamental motivation for finite-set statistics is:
• tracking and information fusion R&D engineers should
not have to be virtuoso experts in point process theory in
order to produce meaningful engineering innovations.
As was emphasized in [21], engineering statistics is a tool
and not an end in itself. It must have two qualities:
• Trustworthiness: Constructed upon a systematic, reliable
mathematical foundation, to which we can appeal when
the going gets rough.
• Fire and forget: This foundation can be safely neglected
in most situations, leaving a serviceable mathematical
machinery in its place.
These two qualities are inherently in tension. If foundations
are so mathematically complex that they cannot be taken
for granted in most engineering situations, then they are
shackles and not foundations. But if they are so simple that
they repeatedly result in engineering blunders, then they are
simplistic rather than simple.
This inherent gap between trustworthiness and engineering
pragmatism is what finite-set statistics attempts to bridge. Four
objectives are paramount:
• Directly generalize familiar single-sensor, single-target
Bayesian “Statistics 101” concepts to the multisource-multitarget realm.
• Avoid all avoidable abstractions.
• As much as possible, replace theorem-proving with “mechanical,” “turn-the-crank,” purely algebraic procedures.
• Nevertheless retain all mathematical power necessary for
effective engineering problem-solving.
B. Overview
It is worthwhile to begin by first comparing the FISST
“random finite set” (RFS) paradigm with the ubiquitous conventional paradigm: measurement-to-track association (MTA).
1) The “Standard” Measurement Model: The most familiar tracking algorithms presume the following measurement
model, one that has its origins in radar tracking. A radar
amplitude-signature for a given range-bin $r$, azimuth $\alpha$, and
elevation $\theta$ is subjected to a thresholding procedure such as
CFAR (constant false alarm rate). If the amplitude exceeds
the threshold, there are two possible reasons for the existence
of this “blip.” First, it was caused by an actual target, in
which case a “target detection” has occurred at $\mathbf{z} = (r, \alpha, \theta)$.
Second, it was caused by a momentary surge of background
noise—in which case a “false detection” or “false alarm” has
occurred at z. A third possibility—that a target was present
but was not detected—is referred to as a “missed detection.”
For target-generated detections, the “small target” model is
presumed. Targets are distant enough (relative to the radar’s
resolution capability) that a single target generates a single
detection. But they are also near enough that only a single
target is responsible for any detection.
2) Measurement-to-Track Association (MTA): Because of
the small-target assumption, a bottom-up, “divide and conquer” strategy can be applied to the multitarget detection and tracking problem ([20], pp. 321-335). Suppose
that, at time $t_k$, we are in possession of $n$ “tracks”

$$(\ell_1, \mathbf{x}_1^{k|k}, P_1^{k|k}), \ldots, (\ell_n, \mathbf{x}_n^{k|k}, P_n^{k|k}),$$

i.e., $n$ possible targets where, for the $i$th track, $\mathbf{x}_i^{k|k}$ is its state (position, velocity),
$P_i^{k|k}$ its error covariance matrix, and $\ell_i$ its “track label.” The
Gaussian distribution $f_i(\mathbf{x}) = N_{P_i^{k|k}}(\mathbf{x} - \mathbf{x}_i^{k|k})$ is the “track
density” of the $i$th track.
Next, suppose that at time $t_{k+1}$ we collect $m$ detections
$Z = \{\mathbf{z}_1, \ldots, \mathbf{z}_m\}$. Typically, $m > n$ because of false alarms. The prediction step of an extended
Kalman filter (EKF) is used to construct predicted tracks

$$(\ell_1, \mathbf{x}_1^{k+1|k}, P_1^{k+1|k}), \ldots, (\ell_n, \mathbf{x}_n^{k+1|k}, P_n^{k+1|k}).$$

We can then construct the following hypothesis $\theta$: for each $i$, the target
$(\ell_i, \mathbf{x}_i^{k+1|k}, P_i^{k+1|k})$ generated the detection $\mathbf{z}_{\theta(i)}$; or, alternatively, generated no detection. The excess measurements
$\{\mathbf{z}_1, \ldots, \mathbf{z}_m\} - \{\mathbf{z}_{\theta(1)}, \ldots, \mathbf{z}_{\theta(n)}\}$ are interpreted as false alarms
or as having been generated by previously undetected targets.
The hypothesis $\theta$ is an MTA. Taking all possibilities
into account, we end up with a list $\theta_1^{k+1|k+1}, \ldots, \theta_\nu^{k+1|k+1}$
of MTAs. For each $\theta_j^{k+1|k+1}$, we can apply the update
step of an EKF to use $\mathbf{z}_{\theta(i)}$ to construct a revised track
$(\ell_{\theta(i)}, \mathbf{x}_{\theta(i)}^{k+1|k+1}, P_{\theta(i)}^{k+1|k+1})$.
Multi-hypothesis trackers (MHTs) are currently the dominant tracking algorithms based on MTA.
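To give a feel for why the hypothesis list grows so quickly, the following is a minimal Python sketch that enumerates every association hypothesis for a given number of predicted tracks and detections. It assumes only the small-target rules stated above (each track generates at most one detection, each detection comes from at most one track); the function names `enumerate_associations` and `count_associations` are hypothetical, and the sketch ignores gating, scoring, and pruning, which any practical MHT would add.

```python
from itertools import combinations, permutations
from math import comb, factorial


def enumerate_associations(n_tracks, m_detections):
    """Yield every hypothesis theta as a tuple of length n_tracks.

    theta[i] is the index of the detection assigned to track i,
    or None if track i generated no detection (a missed detection).
    Detections not used by any track are the "excess" measurements.
    """
    for k in range(min(n_tracks, m_detections) + 1):
        # choose which k tracks are detected ...
        for detected in combinations(range(n_tracks), k):
            # ... and which distinct detections they produced, in order
            for dets in permutations(range(m_detections), k):
                theta = [None] * n_tracks
                for trk, det in zip(detected, dets):
                    theta[trk] = det
                yield tuple(theta)


def count_associations(n_tracks, m_detections):
    """Closed-form count: sum over k of C(n,k) * C(m,k) * k!."""
    return sum(
        comb(n_tracks, k) * comb(m_detections, k) * factorial(k)
        for k in range(min(n_tracks, m_detections) + 1)
    )


hypotheses = list(enumerate_associations(2, 3))
for n, m in [(2, 3), (5, 8), (10, 15)]:
    print(n, "tracks,", m, "detections:", count_associations(n, m), "hypotheses")
```

Even without false-alarm modeling, the count grows combinatorially in the number of tracks and detections, which is the practical burden that association-based trackers manage through pruning.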
3) Association-Free Multitarget Detection and Tracking: In contrast to MTA, FISST employs a top-down paradigm
grounded in point process theory, specifically in the theory
of random finite sets (RFS’s). In place of the hypothesis-list
$\theta_1^{k+1|k+1}, \ldots, \theta_\nu^{k+1|k+1}$, one has a probability distribution
$f_{k|k}(X|Z^{(k)})$ on the finite-set variable $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$
with $n \geq 0$, where $Z^{(k)} : Z_1, \ldots, Z_k$ is the time-history
of measurement-sets at time $t_k$. Instead of the standard
measurement model just described, one constructs from it a
multitarget likelihood function $L_Z(X) = f_{k+1}(Z|X)$. The
value $L_Z(X)$ is the likelihood that a measurement-set $Z$
will be generated, if targets with state-set $X$ are present.
Given this, a multitarget version of the recursive Bayes filter
(Section IV-C) is applied instead of the MTA procedure. Since
this Bayes filter will in general be computationally intractable,
it must be approximated, resulting in the PHD, CPHD, multi-Bernoulli and other filters (Section V).
The following point should be emphasized: RFS algorithms
are capable of “true tracking.” It is often, to the contrary,
asserted that RFS algorithms are inherently incapable of constructing time-sequences of labeled tracks, because finite sets
are order-independent. This is not the case. As was explained
in [20], pp. 505-508, target states have, in general, the form
$\mathbf{x} = (\ell, x)$, where $\ell$ is an identifying label unique to each
track. Given this, the multitarget Bayes filter—as well as any
RFS approximation of it, including PHD and CPHD filters—
can maintain temporally-connected tracks.
In particular, Vo and Vo have used this approach to devise
an exact, closed-form, computationally tractable solution to
the multitarget recursive Bayes filter [42], [43]. Because
the solution is exact, this filter’s track-management scheme
is provably Bayes-optimal.
4) “Nonstandard” Measurement Models: However ubiquitous the standard model may be, it is actually an
approximation—the result of applying a detection process to a
sensor signature. RFS models and filters are being developed
for “nonstandard” sensor sources that supply “raw” signature
information. Perhaps the two most interesting instances are:
• RFS models and multi-Bernoulli track-before-detect
(TBD) filters for pixelized image data [11], [12]. These
filters have been shown to outperform the previously-best
TBD filter, the histogram-PMHT filter.
• RFS models and CPHD filters for superpositional sensors
[24], [35], [47]. These filters have been shown to
significantly outperform a conventional MCMC (Markov
chain Monte Carlo) approach, while also being much
faster.
III. SINGLE-SENSOR, SINGLE-TARGET SYSTEMS
The purpose of this section is to summarize the basic
elements of the conventional “Statistics 101” toolbox.
A. Single-Sensor, Single-Target Recursive Bayes Filter
The primary tool is the recursive Bayes filter, the foundation for optimal single-sensor, single-target tracking. At
various times $t_1, \ldots, t_k$, a single sensor with unity probability
of detection and no clutter interrogates a single noncooperative target. The time-sequence of collected measurements is
$Z^k : \mathbf{z}_1, \ldots, \mathbf{z}_k$ and the state of the target (the information
about it that we wish to know: position, velocity, type, etc.) is
$\mathbf{x}$. The Bayes filter propagates a Bayes posterior distribution
through time:

$$\cdots \longrightarrow f_{k|k}(\mathbf{x}|Z^k) \xrightarrow{\text{predictor}} f_{k+1|k}(\mathbf{x}|Z^k) \xrightarrow{\text{corrector}} f_{k+1|k+1}(\mathbf{x}|Z^{k+1}) \longrightarrow \cdots$$

with an estimator step that maps $f_{k+1|k+1}(\mathbf{x}|Z^{k+1})$ to the state estimate $\hat{\mathbf{x}}_{k+1|k+1}$.
Here, $f_{k|k}(\mathbf{x}|Z^k)$ is the probability (density) that the target
has state $\mathbf{x}$, given the accumulated information $Z^k$. The
predictor (time-update) step accounts for the increase in uncertainty in the target state between measurement collections.
The corrector (measurement-update) step permits fusion of the
newest measurement $\mathbf{z}_{k+1}$ with previous measurements $Z^k$.
These steps are defined by the time-prediction integral

$$f_{k+1|k}(\mathbf{x}|Z^k) = \int f_{k+1|k}(\mathbf{x}|\mathbf{x}') \cdot f_{k|k}(\mathbf{x}'|Z^k)\,d\mathbf{x}' \tag{1}$$

and Bayes’ rule

$$f_{k+1|k+1}(\mathbf{x}|Z^{k+1}) = \frac{f_{k+1}(\mathbf{z}_{k+1}|\mathbf{x}) \cdot f_{k+1|k}(\mathbf{x}|Z^k)}{f_{k+1}(\mathbf{z}_{k+1}|Z^k)} \tag{2}$$

where

$$f_{k+1}(\mathbf{z}_{k+1}|Z^k) = \int f_{k+1}(\mathbf{z}_{k+1}|\mathbf{x}) \cdot f_{k+1|k}(\mathbf{x}|Z^k)\,d\mathbf{x} \tag{3}$$

is the Bayes normalization factor. The estimator step consists
of a Bayes-optimal state estimator, such as the maximum a
posteriori (MAP) estimator:

$$\hat{\mathbf{x}}^{\mathrm{MAP}}_{k+1|k+1} = \arg\sup_{\mathbf{x}} f_{k+1|k+1}(\mathbf{x}|Z^{k+1}) \tag{4}$$

(“Bayes-optimal” means that the estimator minimizes the
Bayes risk corresponding to some cost function [20], p. 63.)
The Bayes filter formulas require knowledge of two a
priori density functions: the target Markov transition density
$f_{k+1|k}(\mathbf{x}|\mathbf{x}')$ and the sensor likelihood function $f_{k+1}(\mathbf{z}|\mathbf{x})$.
The former, $f_{k+1|k}(\mathbf{x}|\mathbf{x}')$, is the probability (density) that
the target will have state $\mathbf{x}$ at time $t_{k+1}$ if it had state $\mathbf{x}'$
at time $t_k$. The latter, $f_{k+1}(\mathbf{z}|\mathbf{x})$, is the probability (density)
that the sensor will collect measurement $\mathbf{z}$ at time $t_{k+1}$
if a target with state $\mathbf{x}$ is present. By “true” formulas for
$f_{k+1|k}(\mathbf{x}|\mathbf{x}')$ and $f_{k+1}(\mathbf{z}|\mathbf{x})$ is meant:
• $f_{k+1|k}(\mathbf{x}|\mathbf{x}')$ and $f_{k+1}(\mathbf{z}|\mathbf{x})$ faithfully incorporate the
motion and measurement models; and
• no extraneous information has inadvertently been introduced into them.
B. Moment Approximations of the Bayes Filter
Historically, the Bayes filter has typically been implemented
using moment approximations. Let $f(\mathbf{x}) = N_{P_0}(\mathbf{x} - \mathbf{x}_0)$
denote a Gaussian distribution with mean $\mathbf{x}_0$ (first-order
moment of $f(\mathbf{x})$) and covariance matrix $P_0$ (a second-order
moment of $f(\mathbf{x})$). Assume that signal-to-noise ratio (SNR) is
large enough that the track distributions can be approximately
characterized by their first- and second-order moments:

$$f_{k|k}(\mathbf{x}|Z^k) \cong N_{P_{k|k}}(\mathbf{x} - \mathbf{x}_{k|k}) \tag{5}$$

$$f_{k+1|k}(\mathbf{x}|Z^k) \cong N_{P_{k+1|k}}(\mathbf{x} - \mathbf{x}_{k+1|k}) \tag{6}$$
Then the Bayes filter can be replaced by a filter, the extended
Kalman filter (EKF), that propagates the first- and second-order moments:

$$\cdots \to \begin{matrix} \mathbf{x}_{k|k} \\ P_{k|k} \end{matrix} \to \begin{matrix} \mathbf{x}_{k+1|k} \\ P_{k+1|k} \end{matrix} \to \begin{matrix} \mathbf{x}_{k+1|k+1} \\ P_{k+1|k+1} \end{matrix} \to \cdots$$

Similarly, assume that SNR is large enough that the track
distributions can be approximately characterized by their first-order moments:

$$f_{k|k}(\mathbf{x}|Z^k) \cong N_{P}(\mathbf{x} - \mathbf{x}_{k|k}) \tag{7}$$

$$f_{k+1|k}(\mathbf{x}|Z^k) \cong N_{P}(\mathbf{x} - \mathbf{x}_{k+1|k}) \tag{8}$$

for some fixed covariance $P$. Then the Bayes filter can
be replaced by a filter (for example, an $\alpha$-$\beta$ filter) that
propagates only the first-order moment:

$$\cdots \to \mathbf{x}_{k|k} \to \mathbf{x}_{k+1|k} \to \mathbf{x}_{k+1|k+1} \to \cdots$$

C. Single-Target Motion Modeling
Target modeling is schematically summarized in Figure
1. At the top, interim target motion is mathematized as a
statistical motion model. The function $\mathbf{x} = \varphi_k(\mathbf{x}')$ states
that the target will have state $\mathbf{x}$ at time $t_{k+1}$ if it had state
$\mathbf{x}'$ at time $t_k$. Since this equation is typically just a guess,
it is randomly perturbed by the motion noise (“plant noise”)
$\mathbf{W}_k$ with probability distribution $f_{\mathbf{W}_k}(\mathbf{x})$. The information
contained in this model is equivalent to that contained in the
next line, the probability mass function (p.m.f.)

$$p_{k+1|k}(S|\mathbf{x}') = \Pr(\mathbf{X}_{k+1|k} \in S \,|\, \mathbf{x}') \tag{9}$$

The p.m.f. is equivalent to the probability density function
(p.d.f.) $f_{k+1|k}(\mathbf{x}|\mathbf{x}')$:

$$f_{k+1|k}(\mathbf{x}|\mathbf{x}') = f_{\mathbf{W}_k}(\mathbf{x} - \varphi_k(\mathbf{x}')) \tag{10}$$

This formula is a standard result easily found in standard
textbooks. It is a consequence of the following equation:

$$p_{k+1|k}(S|\mathbf{x}') = \int_S f_{k+1|k}(\mathbf{x}|\mathbf{x}')\,d\mathbf{x} \tag{11}$$

The validity of Eq. (11) ensures that Eq. (10) is “true” because
it means that $p_{k+1|k}(S|\mathbf{x}')$ and $f_{k+1|k}(\mathbf{x}|\mathbf{x}')$ are entirely
equivalent statistical descriptors of $\mathbf{X}_{k+1|k}$.
Figure 1: Single-Sensor, Single-Target Motion Modeling. [Flow diagram: motion model $\mathbf{X}_{k+1|k} = \varphi_k(\mathbf{x}') + \mathbf{W}_k$ (predicted target = deterministic term + plant noise) ⇔ probability-mass function $p_{k+1|k}(S|\mathbf{x}') = \Pr(\mathbf{X}_{k+1|k} \in S|\mathbf{x}')$ ⇔ (single-object calculus) “true” Markov density $f_{k+1|k}(\mathbf{x}|\mathbf{x}')$ ⇒ (single-target prediction integral) Bayes filter predictor $f_{k|k}(\mathbf{x}|Z^k) \to f_{k+1|k}(\mathbf{x}|Z^k)$.]
D. Single-Sensor, Single-Target Measurement Modeling
Sensor modeling is schematically summarized in Figure 2.
We begin with a statistical measurement model. The function
$\mathbf{z} = \eta_{k+1}(\mathbf{x})$ states that the sensor will collect measurement
$\mathbf{z}$ at time $t_{k+1}$ if a target with state $\mathbf{x}$ is present. Because
of sensor noise, this formula must be randomly perturbed
by a noise-vector $\mathbf{V}_{k+1}$ with distribution $f_{\mathbf{V}_{k+1}}(\mathbf{z})$. The
information in this model is equivalent to that in the p.m.f.

$$p_{k+1}(T|\mathbf{x}) = \Pr(\mathbf{Z}_{k+1} \in T \,|\, \mathbf{x}) \tag{12}$$

This p.m.f., and thus the original measurement model, is
equivalent to the p.d.f. $f_{k+1}(\mathbf{z}|\mathbf{x})$:

$$f_{k+1}(\mathbf{z}|\mathbf{x}) = f_{\mathbf{V}_{k+1}}(\mathbf{z} - \eta_{k+1}(\mathbf{x})) \tag{13}$$

The fact that this formula is “true” is assured by the equation

$$p_{k+1}(T|\mathbf{x}) = \int_T f_{k+1}(\mathbf{z}|\mathbf{x})\,d\mathbf{z} \tag{14}$$

Figure 2: Single-Sensor, Single-Target Measurement Modeling. [Flow diagram: measurement model $\mathbf{Z}_{k+1} = \eta_{k+1}(\mathbf{x}) + \mathbf{V}_{k+1}$ (measurement = deterministic term + sensor noise) ⇔ probability-mass function $p_{k+1}(T|\mathbf{x}) = \Pr(\mathbf{Z}_{k+1} \in T|\mathbf{x})$ ⇔ (single-object calculus) “true” likelihood function $L_{\mathbf{z}}(\mathbf{x}) = f_{k+1}(\mathbf{z}|\mathbf{x})$ ⇒ (single-target Bayes’ rule) Bayes filter corrector $f_{k+1|k}(\mathbf{x}|Z^k) \to f_{k+1|k+1}(\mathbf{x}|Z^{k+1})$.]
E. Single-Target, Multisensor Data Fusion
Suppose that we have two sensors, with (as in the single-sensor case) unity probability of detection and no clutter.
Measurement-collection times are identical (synchronous), so
that the sensors collect measurement-streams $\overset{1}{Z}{}^k$ and $\overset{2}{Z}{}^k$
with $\overset{1}{\mathbf{z}}_k$ and $\overset{2}{\mathbf{z}}_k$ collected simultaneously at time $t_k$ for any
$k = 1, 2, \ldots$. Let the respective likelihood functions be

$$\overset{1}{L}_{\overset{1}{\mathbf{z}}}(\mathbf{x}) \overset{\text{abbr.}}{=} \overset{1}{f}_k(\overset{1}{\mathbf{z}}|\mathbf{x}), \qquad \overset{2}{L}_{\overset{2}{\mathbf{z}}}(\mathbf{x}) \overset{\text{abbr.}}{=} \overset{2}{f}_k(\overset{2}{\mathbf{z}}|\mathbf{x}) \tag{15}$$

If the sensors are independent, then their joint likelihood
function has the form

$$\overset{12}{L}_{\overset{1}{\mathbf{z}},\overset{2}{\mathbf{z}}}(\mathbf{x}) = \overset{1}{L}_{\overset{1}{\mathbf{z}}}(\mathbf{x}) \cdot \overset{2}{L}_{\overset{2}{\mathbf{z}}}(\mathbf{x}) \tag{16}$$

Measurements are optimally fused using Bayes’ rule:

$$f_{k|k}(\mathbf{x}|\overset{1}{Z}{}^k, \overset{2}{Z}{}^k) = \frac{\overset{12}{L}_{\overset{1}{\mathbf{z}}_k,\overset{2}{\mathbf{z}}_k}(\mathbf{x}) \cdot f_{k|k-1}(\mathbf{x}|\overset{1}{Z}{}^{k-1}, \overset{2}{Z}{}^{k-1})}{f_k(\overset{1}{\mathbf{z}}_k, \overset{2}{\mathbf{z}}_k \,|\, \overset{1}{Z}{}^{k-1}, \overset{2}{Z}{}^{k-1})} \tag{17}$$

where

$$f_k(\overset{1}{\mathbf{z}}_k, \overset{2}{\mathbf{z}}_k \,|\, \overset{1}{Z}{}^{k-1}, \overset{2}{Z}{}^{k-1}) = \int \overset{12}{L}_{\overset{1}{\mathbf{z}}_k,\overset{2}{\mathbf{z}}_k}(\mathbf{x}) \cdot f_{k|k-1}(\mathbf{x}|\overset{1}{Z}{}^{k-1}, \overset{2}{Z}{}^{k-1})\,d\mathbf{x} \tag{18}$$
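The predictor, corrector, and product-likelihood fusion steps of Eqs. (1)-(3) and (16)-(17) can be illustrated numerically. What follows is a minimal sketch, not the paper's implementation: it assumes a scalar state on a fixed grid, an additive-Gaussian random-walk motion model, and two independent sensors that measure the state directly with Gaussian noise. All names and numerical values (`predict`, `correct`, the variances, the measurements) are hypothetical.

```python
import numpy as np


def gaussian(u, var):
    """Zero-mean Gaussian density N_var(u), evaluated elementwise."""
    return np.exp(-0.5 * u**2 / var) / np.sqrt(2.0 * np.pi * var)


# Hypothetical 1-D state grid; integrals become Riemann sums with step dx.
x = np.linspace(-20.0, 20.0, 801)
dx = x[1] - x[0]


def predict(posterior, plant_var):
    """Eq. (1): integrate the Markov density against the prior posterior.

    Motion model (assumed): X_{k+1|k} = x' + W_k, so by Eq. (10) the
    Markov density is f(x|x') = N_plant_var(x - x').
    """
    markov = gaussian(x[:, None] - x[None, :], plant_var)  # rows: x, cols: x'
    return markov @ posterior * dx


def correct(predicted, *likelihoods):
    """Eqs. (2)-(3), using the product of sensor likelihoods, Eq. (16)."""
    joint = np.ones_like(predicted)
    for lik in likelihoods:
        joint = joint * lik
    unnormalized = joint * predicted
    return unnormalized / (unnormalized.sum() * dx)  # Bayes normalization


def variance(density):
    mean = (x * density).sum() * dx
    return ((x - mean) ** 2 * density).sum() * dx


prior = gaussian(x - 0.0, 4.0)             # f_{k|k}, variance 4
pred = predict(prior, plant_var=1.0)       # f_{k+1|k}
lik1 = gaussian(1.0 - x, 2.0)              # sensor 1 collected z = 1.0
lik2 = gaussian(1.4 - x, 2.0)              # sensor 2 collected z = 1.4
post_one = correct(pred, lik1)             # single-sensor update
post_two = correct(pred, lik1, lik2)       # two-sensor product fusion

print("predicted variance:", variance(pred))
print("one-sensor posterior variance:", variance(post_one))
print("two-sensor posterior variance:", variance(post_two))
```

In this linear-Gaussian setting the grid results can be compared against the closed-form Kalman values, which makes the sketch a convenient sanity check: prediction inflates the variance, and each additional independent sensor fused by the product rule reduces it.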
F. Data Fusion Via Averaging?
The Bayes-optimal fusion approach, Eq. (16), employs a
product likelihood. Consider the counterproposal in [38] that
one should, instead, use an average:

$$\overset{12}{L}_{\overset{1}{\mathbf{z}},\overset{2}{\mathbf{z}}}(\mathbf{x}) = \frac{1}{2}\left(\overset{1}{L}_{\overset{1}{\mathbf{z}}}(\mathbf{x}) + \overset{2}{L}_{\overset{2}{\mathbf{z}}}(\mathbf{x})\right) \tag{19}$$

One reviewer of this paper objected that this method hardly
merits mention, because “data fusion averaging is not a sensible approach.” Readers’ patience is nevertheless requested,
because it is being explicitly or implicitly promoted by
rather powerful individuals. Eq. (19) is problematic because
whereas product-likelihoods inherently improve target localization, average-likelihoods inherently worsen it. Consider
the following simple example: two bearing-only sensors in
the plane with respective Gaussian likelihood functions

$$\overset{1}{L}_{z}(x, y) = N_{\sigma^2}(z - x), \qquad \overset{2}{L}_{z}(x, y) = N_{\sigma^2}(z - y) \tag{20}$$

That is, the sensors are oriented so as to triangulate the position
of a target located at $(x, y)$. For conceptual clarity, let the
prior distribution be

$$f_{k+1|k}(x, y) = N_{\sigma_0^2}(x - x_0) \cdot N_{\sigma_0^2}(y - y_0) \tag{21}$$

where $\sigma_0^2$ is arbitrarily large, so that $f_{k+1|k}(x, y)$ is
effectively uniform. Let $z_1, z_2$ be the measurements collected
by the sensors. Then Bayes’ rule yields

$$f^{\mathrm{Bayes}}_{k+1|k+1}(x, y) \cong N_{\sigma^2}(x - z_1) \cdot N_{\sigma^2}(y - z_2) \tag{22}$$

This results in a triangulated localization at $(z_1, z_2)$ with
variance $\cong 2\sigma^2$. But with the average likelihood,

$$f^{\mathrm{av}}_{k+1|k+1}(x, y) \cong \frac{1}{2}\left[ N_{\sigma^2}(x - z_1) \cdot N_{\sigma_0^2}(y - y_0) + N_{\sigma_0^2}(x - x_0) \cdot N_{\sigma^2}(y - z_2) \right] \tag{23}$$

This distribution has four “tails” whose lengths increase with
the size of its variance, which is $\cong \sigma_0^2 \to \infty$.
Now apply additional bearing-only sensors, all with orientations different from the first two and each other. The variance
increases with the number of averaged sensors, whereas it
greatly decreases if Bayes’ rule is used instead.
As we shall see in Section V-C, a generalization of Eq. (19)
is what leads to the very poor performance of the average-based multisensor PHD filter proposed in [38].
G. Constructing Markov Densities and Likelihoods
Eqs. (10,13) do not tell us how to construct explicit formulas
for $f_{k+1|k}(\mathbf{x}|\mathbf{x}')$ and $f_{k+1}(\mathbf{z}|\mathbf{x})$ from explicit formulas for
$p_{k+1|k}(S|\mathbf{x}')$ and $p_{k+1}(T|\mathbf{x})$.
This can be accomplished as follows. Let $E_{\mathbf{x}}$ be an
arbitrarily small region around $\mathbf{x}$ of size $|E_{\mathbf{x}}|$. Then

$$p_{k+1|k}(E_{\mathbf{x}}|\mathbf{x}') = \int_{E_{\mathbf{x}}} f_{k+1|k}(\mathbf{y}|\mathbf{x}')\,d\mathbf{y} \tag{24}$$

$$\cong f_{k+1|k}(\mathbf{x}|\mathbf{x}') \cdot |E_{\mathbf{x}}| \tag{25}$$

and therefore

$$f_{k+1|k}(\mathbf{x}|\mathbf{x}') = \lim_{|E_{\mathbf{x}}| \searrow 0} \frac{p_{k+1|k}(E_{\mathbf{x}}|\mathbf{x}')}{|E_{\mathbf{x}}|} \tag{26}$$

This, the Lebesgue differentiation theorem, provides one
(but not the only) way of deriving $f_{k+1|k}(\mathbf{x}|\mathbf{x}')$
from
$p_{k+1|k}(S|\mathbf{x}')$ using a constructive Radon-Nikodým derivative
([9], pp. 144-150). For the model in Figure 1 we have

$$p_{k+1|k}(E_{\mathbf{x}}|\mathbf{x}') = \Pr(\varphi_k(\mathbf{x}') + \mathbf{W}_k \in E_{\mathbf{x}}) \tag{27}$$

$$\cong f_{\mathbf{W}_k}(\mathbf{x} - \varphi_k(\mathbf{x}')) \cdot |E_{\mathbf{x}}| \tag{28}$$

from which Eq. (10) follows.
IV. MULTISENSOR-MULTITARGET SYSTEMS
The purpose of this section is to summarize the basic
elements of “multisensor-multitarget Statistics 101.”
A. Random Finite Sets (RFS’s)
This section introduces the concept of an RFS as the
multitarget analog of a random vector.
1) Random Single-Target States: The state $\mathbf{x}$ of a single-target system may (as an example) have the form $\mathbf{x} =
(x, y, z, v_x, v_y, v_z, c)$ where $x, y, z$ are position variables,
$v_x, v_y, v_z$ are velocity variables, and $c \in C$ is a discrete
identity variable (which could be, for instance, a track label).
In a Bayesian approach, the state at time $t_k$ must be a random
state $\mathbf{X}_{k|k}$. The precise mathematical definition of a random
state $\mathbf{X}_{k|k}$ requires that it actually be a “measurable mapping”
from a “probability space” to the state space. In turn, the
state space must be equipped with a “topology,” typically (but
not always) a Euclidean topology. While such details are
mathematically necessary, for engineering purposes they can
usually be taken for granted.
2) Random Multitarget States: The state of a multitarget
system, on the other hand, is most accurately represented as
a finite set of the form $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$. Here, not only
the individual target states $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are random but also
their number (cardinality) $n$. This includes the possibility
$n = 0$ (no targets are present), in which case we write $X = \emptyset$
(the null set). The finite-set representation is most natural
because, given that each target already has its own unique
identity as indicated by a variable such as $c$, from a physical
point of view the targets have no inherent order. Thus in a
Bayesian formulation, a state-set is actually a random state-set
$\Xi_{k|k}$: it is a random finite set (RFS).
Similar comments apply to the measurements collected
from the targets by a sensor. These also usually have no
inherent physical ordering. They thus have the form $Z =
\{\mathbf{z}_1, \ldots, \mathbf{z}_m\}$, where not only the individual measurements
$\mathbf{z}_1, \ldots, \mathbf{z}_m$ are random, but also their number $m \geq 0$. Thus
in a Bayesian development a measurement-set is actually a
random measurement-set $\Sigma_{k+1}$, an RFS.
The RFS representation is more “engineering friendly” than
the random-measure representation of standard point process
theory. A finite set $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ is easily visualizable
as a point pattern—for example, in the plane or in three
dimensions. Similarly, an RFS is easily visualizable as a
random point pattern. An everyday example of an RFS: the
stars in a night sky, with many stars winking in and out and/or
slightly varying in their apparent position ([20], pp. 349-356).
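The “random point pattern” picture can be made concrete with a short simulation. This is a minimal sketch under assumptions that are not part of the text: the cardinality is drawn from a Poisson distribution and the points are placed uniformly in a square, purely for illustration. The function name `sample_rfs` and all parameter values are hypothetical.

```python
import numpy as np


def sample_rfs(rng, mean_count, low, high):
    """Draw one realization of a random finite set in the plane.

    Both the number of points and their positions are random.
    Returning a frozenset makes the order-independence explicit.
    """
    n = rng.poisson(mean_count)                     # random cardinality, may be 0
    points = rng.uniform(low, high, size=(n, 2))    # random positions
    return frozenset((float(px), float(py)) for px, py in points)


rng = np.random.default_rng(0)
draws = [sample_rfs(rng, mean_count=3.0, low=0.0, high=10.0) for _ in range(2000)]
cardinalities = [len(X) for X in draws]

print("first realization:", sorted(draws[0]))
print("average number of points:", np.mean(cardinalities))
print("number of empty realizations:", sum(1 for X in draws if len(X) == 0))
```

Each call yields a different finite set, sometimes empty, which is the behavior the night-sky analogy describes: points wink in and out and their positions vary from draw to draw.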
3) “Fire-and-Forget” Foundations of RFS’s: If we are to
have a trustworthy mathematical foundation for multisensor-multitarget systems, we must precisely define RFS’s. This
forces us to define topologies on so-called hyperspaces, that
is, spaces $\mathfrak{X}^\infty$ whose “points” are subsets (in our case
finite subsets) of some other space $\mathfrak{X}$. Two hyperspaces are
of engineering interest: the hyperspace of finite subsets of
the state space, and the hyperspace of finite subsets of the
measurement space.
If we employed the random-measure formulation of point
process theory, we would be forced to work with abstract probability measures defined on measurable subsets of an abstract
space whose “points” are counting measures. In an arbitrary
RFS formulation, this would be replaced by equally abstract
probability measures $p^\infty_{k|k}(O) = \Pr(\Xi_{k|k} \in O)$ defined on
measurable subsets $O$ of $\mathfrak{X}^\infty$, with any “point” of $O$ being a
finite subset of $\mathfrak{X}$. Luckily, a simpler “stochastic geometry”
formulation [37] is available. Its hyperspace topology, the
Fell-Matheron topology, allows $p^\infty_{k|k}(O)$ to be equivalently replaced by the multitarget analog of a conventional
probability-mass function $p_{k|k}(S) = \Pr(\mathbf{X}_{k|k} \in S)$.² This is
the belief-mass function (b.m.f.)

$$\beta_{k|k}(S) = \Pr(\Xi_{k|k} \subseteq S) \tag{29}$$

which is defined on (closed) subsets $S$ of $\mathfrak{X}$.
4) “Fire-and-Forget” Foundations of Multitarget Tracking: The upshot of Eq. (29) is that, in finite-set statistics, it
is usually possible to entirely avoid abstractions such as
topologies, measurable mappings, and the “randomness” of
finite sets in the formal sense.
More generally, finite-set statistics is intentionally formulated as a stripped-down version of point process theory—one
in which we attempt to avoid all avoidable abstractions. As
an illustration, concepts such as “thinning” and “marking” are
basic to purely mathematical treatments of the subject. But
in multitarget detection and tracking, these concepts appear
only in a few, concrete contexts that can be adequately
addressed at a purely engineering level. Missed detections
and disappearing targets can both be described as forms of
thinning; and target identity as a form of marking. But from an
engineering perspective, does the imposition of such concepts
represent an increase of content—or of pedantry?
2 The reason is as follows. Consider the set function $\rho_{k|k}(S) = 1 -
\beta_{k|k}(S^c) = \Pr(\Xi_{k|k} \cap S \neq \emptyset)$. Then the Choquet-Matheron capacity
theorem states that the additive measure $p^\infty_{k|k}(O)$ is equivalent to the
nonadditive measure $\rho_{k|k}(S)$, in the sense that both completely characterize
the probabilistic behavior of $\Xi_{k|k}$ (see [9], p. 6 or [20], p. 713).
As a second illustration, in FISST density functions are
systematically used in place of measures, except when this
is not possible. Thus the Dirac delta function is employed
even though it produces engineering-heuristic abbreviations of
rigorous expressions (as in Eq. (81), for example).
B. Probability Distributions of RFS’s
Just as a random state-vector $\mathbf{X}_{k|k}$ has a probability density
$f_{k|k}(\mathbf{x}) = f_{\mathbf{X}_{k|k}}(\mathbf{x})$, so an RFS has a multitarget probability
density function (m.p.d.f.)

$$f_{k|k}(X) = f_{\Xi_{k|k}}(X) \tag{30}$$

Its form varies with the number of targets:

$$f_{k|k}(X) = \begin{cases} f_{k|k}(\emptyset) & \text{if } X = \emptyset \\ f_{k|k}(\{\mathbf{x}_1\}) & \text{if } X = \{\mathbf{x}_1\} \\ f_{k|k}(\{\mathbf{x}_1, \mathbf{x}_2\}) & \text{if } |X| = 2,\ X = \{\mathbf{x}_1, \mathbf{x}_2\} \\ \quad\vdots & \quad\vdots \end{cases} \tag{31}$$

where $|X|$ denotes the number of elements in $X$. Also, its
units of measurement vary with target number: if $u$ are the
units of $\mathbf{x}$, then the units of $f_{k|k}(X)$ are $u^{-|X|}$.
In general, a function $f(X)$ that satisfies the same property
with respect to units is a multitarget density function. Its set
integral in the region $S$ is defined to be

$$\int_S f(X)\,\delta X = f(\emptyset) + \sum_{n \geq 1} \frac{1}{n!} \int_{\underbrace{S \times \cdots \times S}_{n \text{ times}}} f(\{\mathbf{x}_1, \ldots, \mathbf{x}_n\})\,d\mathbf{x}_1 \cdots d\mathbf{x}_n \tag{32}$$

where, as a convention, define $f(\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}) = 0$ whenever $|\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}| \neq n$.
The probability that there are $n$ elements in $\Xi_{k|k}$ is

$$p_{k|k}(n) = \int_{|X| = n} f_{k|k}(X)\,\delta X \tag{33}$$

$$= \frac{1}{n!} \int f_{k|k}(\{\mathbf{x}_1, \ldots, \mathbf{x}_n\})\,d\mathbf{x}_1 \cdots d\mathbf{x}_n. \tag{34}$$

Thus $p_{k|k}(n)$ for $n \geq 0$ is a probability distribution on the
number of targets: the cardinality distribution of $\Xi_{k|k}$.
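The role of the $1/n!$ factor in Eqs. (32)-(34) can be checked by brute force. This is a minimal sketch under stated assumptions: a scalar state on a coarse grid, and a multitarget density of i.i.d. cluster form, $f(\{x_1, \ldots, x_n\}) = n!\,p(n)\,s(x_1) \cdots s(x_n)$, which is chosen here only because its cardinality distribution is known in advance. The cardinality values in `card`, the spatial density `s`, and the function names are hypothetical.

```python
import itertools
import math

import numpy as np

# Coarse 1-D grid; the region S is the whole grid.
grid = np.linspace(-5.0, 5.0, 41)
dx = grid[1] - grid[0]

s = np.exp(-0.5 * grid**2)
s /= s.sum() * dx                      # spatial density s(x), normalized on the grid
card = [0.1, 0.4, 0.3, 0.2]            # assumed cardinality distribution p(n), n = 0..3


def f(indices):
    """Multitarget density at the set whose elements are grid[indices].

    i.i.d. cluster form (an assumption of this sketch):
    f({x1..xn}) = n! * p(n) * s(x1) * ... * s(xn), with units of 1/u^n.
    """
    n = len(indices)
    if n == 0:
        return card[0]
    if n >= len(card):
        return 0.0
    return math.factorial(n) * card[n] * float(np.prod(s[list(indices)]))


def cardinality_prob(n):
    """Eq. (34): (1/n!) times the n-fold integral of f, as a Riemann sum.

    Repeated grid indices are included; in the continuum they form a
    set of measure zero, so the convention f = 0 there does not matter.
    """
    if n == 0:
        return f(())
    total = sum(f(idx) for idx in itertools.product(range(len(grid)), repeat=n))
    return total * dx**n / math.factorial(n)


p = [cardinality_prob(n) for n in range(len(card))]
set_integral = sum(p)                  # Eq. (32) over the whole space

print("recovered cardinality distribution:", p)
print("set integral of f:", set_integral)
```

The $n!$ in the density accounts for the $n!$ orderings of the same finite set that the ordinary $n$-fold integral counts separately; dividing by $n!$ in the set integral removes that overcounting, so the cardinality probabilities sum to one.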
C. Multisensor-Multitarget Recursive Bayes Filter
This filter is the theoretical foundation for optimal multisensor, multitarget tracking. At times $t_1, \ldots, t_k$, one or
more of $s$ sensors interrogate an unknown number of unknown, noncooperative targets. The time-sequence of collected
measurement-sets is $Z^{(k)} : Z_1, \ldots, Z_k$. The multisensor-multitarget Bayes filter propagates a multitarget Bayes posterior distribution through time:

$$\cdots \longrightarrow f_{k|k}(X|Z^{(k)}) \xrightarrow{\text{predictor}} f_{k+1|k}(X|Z^{(k)}) \xrightarrow{\text{corrector}} f_{k+1|k+1}(X|Z^{(k+1)}) \longrightarrow \cdots$$

with a multitarget estimator step that maps $f_{k+1|k+1}(X|Z^{(k+1)})$ to the state-set estimate $\hat{X}_{k+1|k+1}$.
These steps are defined by multitarget analogs of the time-prediction integral
$$f_{k+1|k}(X|Z^{(k)}) = \int f_{k+1|k}(X|X') \cdot f_{k|k}(X'|Z^{(k)})\,\delta X' \qquad (35)$$
and of Bayes' rule
$$f_{k+1|k+1}(X|Z^{(k+1)}) = \frac{f_{k+1}(Z_{k+1}|X) \cdot f_{k+1|k}(X|Z^{(k)})}{f_{k+1}(Z_{k+1}|Z^{(k)})} \qquad (36)$$
where
$$f_{k+1}(Z_{k+1}|Z^{(k)}) = \int f_{k+1}(Z_{k+1}|X) \cdot f_{k+1|k}(X|Z^{(k)})\,\delta X. \qquad (37)$$
The estimator step consists of a multitarget Bayes-optimal state estimator. As was explained in [21], multitarget versions of the maximum a posteriori (MAP) and expected a posteriori (EAP) estimators do not exist. Rather, one must use alternative estimators ([20], pp. 494-508).
As in the single-sensor, single-target case, the multisensor-multitarget Bayes filter requires two a priori density functions: the multitarget Markov transition density $f_{k+1|k}(X|X')$ and the multisensor-multitarget likelihood function $f_{k+1}(Z|X)$. Here, $f_{k+1|k}(X|X')$ is the probability (density) that the targets will have state-set $X$ at time $t_{k+1}$ if they had state-set $X'$ at time $t_k$. Also, $f_{k+1}(Z|X)$ is the probability (density) that the sensors will jointly collect measurement-set $Z$ at time $t_{k+1}$ if targets with state-set $X$ are present.
In the single-sensor, single-target case, the formulas for the Markov density and likelihood function, Eqs. (10,13), are typically not derived but, rather, simply looked up. In multisensor-multitarget problems, this is not possible because no standard references exist. Thus one must ask:
• How does one construct statistical multitarget motion models for any given application? In particular, how does one model phenomena such as target disappearance and target appearance?
• How does one construct statistical multisensor-multitarget measurement models for any particular set of sensors? In particular, how does one model phenomena such as sensor fields of view and clutter?
• Given such models, how does one construct formulas for the “true” multitarget Markov density and the “true” multisensor-multitarget likelihood function? That is, how does one know that they are not heuristic contrivances, or that no extraneous information has inadvertently been introduced?
D. Multitarget Motion Modeling
This is summarized in Figure 3. At the top, interim target motions are represented as an RFS motion model.
As an example, consider the most commonly assumed multitarget motion model, the “standard” such model. At time $t_k$ suppose that the target state-set is $X' = \{\mathbf{x}'_1, \ldots, \mathbf{x}'_{n'}\}$ with $|X'| = n'$. At time $t_{k+1}$, each of the targets either persists or disappears. Let $p_S(\mathbf{x}')$ be the probability that $\mathbf{x}'$ will persist into time $t_{k+1}$ and transition to some other (random) state $\mathbf{X}$. Then $\mathbf{x}'$ will transition to
$$T(\mathbf{x}') = \begin{cases} \emptyset & \text{if the target disappears (prob. } 1 - p_S(\mathbf{x}')\text{)} \\ \{\mathbf{X}\} & \text{if the target persists} \end{cases} \qquad (38)$$
Suppose that $B$ is the set of targets that newly appear at time $t_{k+1}$. Then the RFS motion model has the form
$$\Xi_{k+1|k} = T(\mathbf{x}'_1) \cup \ldots \cup T(\mathbf{x}'_{n'}) \cup B. \qquad (39)$$
The set-theoretic union symbol ‘∪’ indicates that at time $t_{k+1}$, targets will be either persisting targets or new targets. It is assumed that $T(\mathbf{x}'_1), \ldots, T(\mathbf{x}'_{n'}), B$ are independent.
The information contained in this model is equivalent to that contained in the next line of Figure 3, the b.m.f. $\beta_{k+1|k}(S|X')$. Because of independence, it is
$$\beta_{k+1|k}(S|X') = \beta_{T(\mathbf{x}'_1)}(S) \cdots \beta_{T(\mathbf{x}'_{n'})}(S) \cdot \beta_B(S) \qquad (40)$$
where the b.m.f. of $T(\mathbf{x}')$ is
$$\beta_{T(\mathbf{x}')}(S) = \Pr(T(\mathbf{x}') \subseteq S) \qquad (41)$$
$$= \Pr(T(\mathbf{x}') = \emptyset) + \Pr(T(\mathbf{x}') \ne \emptyset,\ T(\mathbf{x}') \subseteq S) \qquad (42)$$
$$= 1 - p_S(\mathbf{x}') + p_S(\mathbf{x}') \int_S f_{k+1|k}(\mathbf{x}|\mathbf{x}')\,d\mathbf{x}. \qquad (43)$$
Also, the b.m.f. of $B$ is normally chosen to be Poisson:
$$\beta_B(S) = \exp\left(-N^B_{k+1|k} + N^B_{k+1|k} \int_S s^B_{k+1|k}(\mathbf{x})\,d\mathbf{x}\right). \qquad (44)$$
Here $N^B_{k+1|k}$ is the expected number of appearing targets; and $s^B_{k+1|k}(\mathbf{x})$, the “spatial distribution,” is the probability (density) that an appearing target will have state $\mathbf{x}$.
A central aspect of finite-set statistics is a set of procedures for deriving the formula for the multitarget Markov density $f_{k+1|k}(X|X')$ from the formula for $\beta_{k+1|k}(S|X')$. These two statistical descriptors are related by the equation
$$\beta_{k+1|k}(S|X') = \int_S f_{k+1|k}(X|X')\,\delta X \qquad (45)$$
for all $S$. The validity of this equation ensures that the formula for $f_{k+1|k}(X|X')$ is “true.” This is because Eq. (45) states that $\beta_{k+1|k}(S|X')$ and $f_{k+1|k}(X|X')$ are equivalent.
The explicit formula for $f_{k+1|k}(X|X')$ is too complicated to state here; see [20], pp. 466-473.
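The standard motion model of Eqs. (38)-(39) is straightforward to simulate. The following is a minimal sketch under stated assumptions: states are plain Python objects, and the callables `p_survive`, `transition`, and `birth_sampler` are hypothetical stand-ins for the survival probability, the single-target Markov density, and the spatial distribution of appearing targets.

```python
import numpy as np

def sample_motion_model(prior_states, p_survive, transition, birth_rate, birth_sampler, rng):
    """Draw one realization of the predicted RFS as the union of survivors and births."""
    # Each prior target independently persists with probability p_survive(x'),
    # and if it persists it moves according to the single-target transition.
    survivors = [transition(x, rng) for x in prior_states if rng.random() < p_survive(x)]
    # Appearing targets: Poisson-distributed number, i.i.d. states.
    n_births = rng.poisson(birth_rate)
    births = [birth_sampler(rng) for _ in range(n_births)]
    return survivors + births

rng = np.random.default_rng(0)
predicted = sample_motion_model(
    prior_states=[0.0, 1.0, 2.0],
    p_survive=lambda x: 0.95,
    transition=lambda x, r: x + r.normal(0.0, 0.1),  # random-walk motion (assumption)
    birth_rate=0.5,
    birth_sampler=lambda r: r.uniform(-10.0, 10.0),
    rng=rng,
)
```

Returning a list rather than a Python `set` is a deliberate choice here: continuous states are distinct with probability one, and a list avoids hashing floating-point values.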
[Figure 3 (flow diagram): multitarget motion model $\Xi_{k+1|k} = T(X') \cup B$ (predicted targets = surviving ∪ appearing) ⇒ belief-mass function $\beta_{k+1|k}(S|X') = \Pr(\Xi_{k+1|k} \subseteq S|X')$ ⇔ (multiobject calculus) “true” multitarget Markov density $f_{k+1|k}(X|X')$ ⇒ multitarget Bayes filter predictor (multitarget prediction integral), $f_{k|k}(X|Z^{(k)}) \to f_{k+1|k}(X|Z^{(k)})$.]
Figure 3: Multisensor-Multitarget Motion Modeling
E. Multisensor-Multitarget Measurement Modeling
This is summarized in Figure 4. We begin with an RFS multisensor-multitarget measurement model.
Consider the “standard” such model. Suppose that the state-set for the targets at time $t_{k+1}$ is $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ with $|X| = n$. It is assumed that each target generates at most a single measurement, and that any measurement is generated by at most a single target. Let $p_D(\mathbf{x}_i)$ be the probability that the target $\mathbf{x}_i$ generates a measurement. Then the set of measurements $\Upsilon_{k+1}(\mathbf{x}_i)$ generated by the target with state $\mathbf{x}_i$ will have at most a single element:
$$\Upsilon_{k+1}(\mathbf{x}_i) = \begin{cases} \emptyset & \text{if undetected (prob. } 1 - p_D(\mathbf{x}_i)\text{)} \\ \{\mathbf{Z}\} & \text{if detected} \end{cases} \qquad (46)$$
Suppose that $C_{k+1}$ is the set of measurements that are generated by no target, i.e., the clutter measurements. Then the RFS measurement model has the form
$$\Sigma_{k+1} = \Upsilon_{k+1}(\mathbf{x}_1) \cup \ldots \cup \Upsilon_{k+1}(\mathbf{x}_n) \cup C_{k+1}. \qquad (47)$$
The symbol ‘∪’ indicates that measurements consist of target-generated measurements or clutter measurements. It is assumed that $\Upsilon_{k+1}(\mathbf{x}_1), \ldots, \Upsilon_{k+1}(\mathbf{x}_n), C_{k+1}$ are statistically independent.
Because of independence, the b.m.f. of the RFS model is
$$\beta_{k+1}(T|X) = \beta_{\Upsilon_{k+1}(\mathbf{x}_1)}(T) \cdots \beta_{\Upsilon_{k+1}(\mathbf{x}_n)}(T) \cdot \beta_{C_{k+1}}(T) \qquad (48)$$
where the b.m.f. of $\Upsilon_{k+1}(\mathbf{x}_i)$ is
$$\beta_{\Upsilon_{k+1}(\mathbf{x}_i)}(T) = 1 - p_D(\mathbf{x}_i) + p_D(\mathbf{x}_i) \int_T f_{k+1}(\mathbf{z}|\mathbf{x}_i)\,d\mathbf{z} \qquad (49)$$
and the b.m.f. of $C_{k+1}$ is usually chosen to be Poisson:
$$\beta_{C_{k+1}}(T) = \exp\left(-\lambda_{k+1} + \lambda_{k+1} \int_T c_{k+1}(\mathbf{z})\,d\mathbf{z}\right). \qquad (50)$$
Here $c_{k+1}(\mathbf{z})$ is the spatial distribution of clutter measurements, and $\lambda_{k+1}$ is the expected number of clutter measurements at time $t_{k+1}$ (the “clutter rate”).
The multisensor-multitarget likelihood function $f_{k+1}(Z|X)$ is related to the belief-mass function by the equation
$$\beta_{k+1}(T|X) = \int_T f_{k+1}(Z|X)\,\delta Z. \qquad (51)$$
The validity of this equation ensures that the formula for $f_{k+1}(Z|X)$ is “true,” because it shows that $\beta_{k+1}(T|X)$ and $f_{k+1}(Z|X)$ are equivalent.
The explicit formula for $f_{k+1}(Z|X)$ is too complicated to reproduce here; see [20], pp. 408-421.
[Figure 4 (flow diagram): multitarget measurement model $\Sigma_{k+1} = \Upsilon_{k+1}(X) \cup C_{k+1}$ (measurements = target-generated ∪ clutter) ⇒ belief-mass function $\beta_{k+1}(T|X) = \Pr(\Sigma_{k+1} \subseteq T|X)$ ⇔ (multiobject calculus) “true” multitarget likelihood function $f_{k+1}(Z|X)$ ⇒ multitarget Bayes filter corrector (multitarget Bayes' rule), $f_{k+1|k}(X|Z^{(k)}) \to f_{k+1|k+1}(X|Z^{(k+1)})$.]
Figure 4: Multisensor-Multitarget Measurement Modeling
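F. Multitarget Calculus for Modeling
Eqs. (45,51) do not tell us how to construct formulas for $f_{k+1|k}(X|X')$ and $f_{k+1}(Z|X)$ from formulas for $\beta_{k+1|k}(S|X')$ and $\beta_{k+1}(T|X)$. This is the purpose of multitarget calculus, which generalizes the reasoning used in Section III-G.
Let $E_{\mathbf{x}}$ and $E'_{\mathbf{x}}$ be arbitrarily small regions surrounding $\mathbf{x}$ and, for any closed subset $S$, let $S'_{\mathbf{x}} \overset{\text{def.}}{=} S - E'_{\mathbf{x}}$, where ‘−’ indicates set-theoretic difference. Then $S'_{\mathbf{x}}$ and $E_{\mathbf{x}}$ are disjoint and so
$$p_{k+1|k}(S'_{\mathbf{x}} \cup E_{\mathbf{x}}|\mathbf{x}') - p_{k+1|k}(S'_{\mathbf{x}}|\mathbf{x}') = p_{k+1|k}(E_{\mathbf{x}}|\mathbf{x}') \qquad (52)$$
and so from Eq. (26),
$$f_{k+1|k}(\mathbf{x}|\mathbf{x}') = \lim_{|E'_{\mathbf{x}}| \searrow 0}\ \lim_{|E_{\mathbf{x}}| \searrow 0} \frac{p_{k+1|k}(S'_{\mathbf{x}} \cup E_{\mathbf{x}}|\mathbf{x}') - p_{k+1|k}(S'_{\mathbf{x}}|\mathbf{x}')}{|E_{\mathbf{x}}|}. \qquad (53)$$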
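1) Set Derivatives: For any real-valued set function $\beta(S)$ define the generalized Radon-Nikodým derivative ([9], pp. 144-150):
$$\frac{\delta \beta}{\delta \mathbf{x}}(S) = \lim_{|E'_{\mathbf{x}}| \searrow 0}\ \lim_{|E_{\mathbf{x}}| \searrow 0} \frac{\beta(S'_{\mathbf{x}} \cup E_{\mathbf{x}}) - \beta(S'_{\mathbf{x}})}{|E_{\mathbf{x}}|}. \qquad (54)$$
Extend this definition as follows. For any $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ with $|X| = n \ge 1$, define
$$\frac{\delta \beta}{\delta X}(S) = \frac{\delta^n \beta}{\delta \mathbf{x}_1 \cdots \delta \mathbf{x}_n}(S) \qquad (55)$$
and, if $X = \emptyset$, define
$$\frac{\delta \beta}{\delta \emptyset}(S) = \beta(S). \qquad (56)$$
Then Eqs. (54-56) define the set derivative of $\beta(S)$ with respect to $X$ ([9], pp. 150-151).
The set derivative is the inverse operation of the set integral:
$$\beta(S) = \int_S \frac{\delta \beta}{\delta X}(\emptyset)\,\delta X \qquad (57)$$
$$\left[\frac{\delta}{\delta X}\left(\int_S f(Y)\,\delta Y\right)\right]_{S = \emptyset} = f(X). \qquad (58)$$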
2) Construction of “True” Markov Densities and Likelihood Functions: It is possible to derive “turn-the-crank” rules of differentiation for set derivatives: sum rules, power rules, product rules, chain rules, etc. ([20], pp. 383-395). These rules permit the explicit construction of formulas for multitarget Markov densities and multitarget likelihood functions. This is because of the following two formulas, which are direct consequences of Eqs. (57,58):
$$f_{k+1|k}(X|X') = \left[\frac{\delta \beta_{k+1|k}}{\delta X}(S|X')\right]_{S = \emptyset} \qquad (59)$$
$$f_{k+1}(Z|X) = \left[\frac{\delta \beta_{k+1}}{\delta Z}(T|X)\right]_{T = \emptyset} \qquad (60)$$
V. PRINCIPLED APPROXIMATE MULTITARGET FILTERS
The multisensor-multitarget recursive Bayes filter of Section IV-C is usually computationally intractable. How can we approximate it in a manner that preserves, as faithfully as possible, the underlying models and their interrelationships? This question is answered by assuming that the m.p.d.f.’s $f_{k|k}(X|Z^{(k)})$ and/or $f_{k+1|k}(X|Z^{(k)})$ have a particular simplified form, one that permits approximate closed-form solution of the multitarget Bayes filter. Three types of approximation have been extensively investigated in the literature thus far: Poisson, independent identically distributed cluster (i.i.d.c.), and multi-Bernoulli. They result in, respectively, PHD filters, CPHD filters, and multi-Bernoulli filters. The purpose of this section is to summarize these filters, as well as their extensions to unknown clutter and detection profiles.
A. Probability Hypothesis Density (PHD) Filters
Recall from Section III-B that α-β filters are based on first-moment approximation of the single-sensor, single-target Bayes filter as in Eqs. (7,8). By analogy, we assume that SNR is large enough that the multitarget Bayes filter can be approximated by its first-order statistical moments. But first one must ask: What is the first-order moment of a multitarget probability distribution?
1) Probability Hypothesis Density Functions: The naïve definition of the first-order moment (expected value) of a multitarget distribution would be
$$\bar{X}_{k|k} = \int X \cdot f_{k|k}(X|Z^{(k)})\,\delta X. \qquad (61)$$
However, it is mathematically undefined for the simple reason that addition and subtraction $X \pm X'$ of finite sets $X, X'$ is undefined. Thus instead we must employ an alternative strategy: replace $X$ by some doppelgänger $\delta_X$ for which addition and subtraction is definable. In point process theory, one (intuitively speaking) chooses $\delta_X$ to be
$$\delta_X(\mathbf{x}) = \sum_{\mathbf{y} \in X} \delta_{\mathbf{y}}(\mathbf{x}) \qquad (62)$$
where $\delta_{\mathbf{y}}(\mathbf{x})$ is the Dirac delta function concentrated at $\mathbf{y}$. In this case Eq. (61) is replaced by ([20], pp. 581-582)
$$D_{k|k}(\mathbf{x}|Z^{(k)}) = \int \delta_X(\mathbf{x}) \cdot f_{k|k}(X|Z^{(k)})\,\delta X \qquad (63)$$
$$= \int f_{k|k}(\{\mathbf{x}\} \cup W|Z^{(k)})\,\delta W. \qquad (64)$$
This function is called a probability hypothesis density (PHD) or first-moment density function (see footnote 3). It is completely characterized by the following property:
$$\int_S D_{k|k}(\mathbf{x}|Z^{(k)})\,d\mathbf{x} = \text{expected no. of targets in } S. \qquad (65)$$
Thus the number $D_{k|k}(\mathbf{x}|Z^{(k)})$ can be understood as the track density at $\mathbf{x}$. A target with state $\mathbf{x}$ is more likely to be present in the scene when $D_{k|k}(\mathbf{x}|Z^{(k)})$ is large than when it is small.
Consequently, it is possible to use $D_{k|k}(\mathbf{x}|Z^{(k)})$ to estimate the number and states of the targets. Let
$$N_{k|k} = \int D_{k|k}(\mathbf{x}|Z^{(k)})\,d\mathbf{x} \qquad (66)$$
be the total expected number of targets in the scene. Round $N_{k|k}$ off to the nearest integer $\nu$, and then determine those values $\mathbf{x}_1, \ldots, \mathbf{x}_\nu$ of $\mathbf{x}$ that correspond to the $\nu$ highest “peaks” of $D_{k|k}(\mathbf{x}|Z^{(k)})$. Then $\hat{X}_{k|k} = \{\mathbf{x}_1, \ldots, \mathbf{x}_\nu\}$ is an estimate of the number of targets and their states.
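The estimation recipe just described (integrate, round, pick the highest peaks) can be sketched for a PHD sampled on a uniform one-dimensional grid. This is an illustrative sketch only: the function name, the grid representation, and the simple local-maximum test are assumptions, and practical implementations usually extract peaks from Gaussian-mixture components or clustered particles instead.

```python
import numpy as np

def extract_states(grid, phd):
    """Estimate target number and states from a gridded 1-D PHD, following Eq. (66)."""
    dx = grid[1] - grid[0]
    n_expected = float(np.sum(phd) * dx)   # Eq. (66), Riemann-sum approximation
    n_targets = int(round(n_expected))     # note: Python rounds exact halves to the even integer
    # Interior grid points that are local maxima of the PHD.
    interior = np.arange(1, len(grid) - 1)
    is_peak = (phd[interior] > phd[interior - 1]) & (phd[interior] >= phd[interior + 1])
    peaks = interior[is_peak]
    # Keep the n_targets highest peaks.
    best = peaks[np.argsort(phd[peaks])[::-1][:n_targets]]
    return n_expected, sorted(grid[best].tolist())

grid = np.linspace(-10.0, 10.0, 2001)
bump = lambda mean: np.exp(-0.5 * ((grid - mean) / 0.5) ** 2) / (0.5 * np.sqrt(2.0 * np.pi))
n_expected, states = extract_states(grid, bump(-2.0) + bump(3.0))
```

In this synthetic case the PHD integrates to about 2 and the two extracted states sit at the two bump centers.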
2) Example of a PHD: Suppose that, in a one-dimensional scenario, the multitarget distribution (m.p.d.f.) corresponds to two targets located at $x = a_1$ and $x = a_2$:
$$f_{k|k}(\{x, y\}) = N_{\sigma^2}(x - a_1) \cdot N_{\sigma^2}(y - a_2) + N_{\sigma^2}(x - a_2) \cdot N_{\sigma^2}(y - a_1) \qquad (67)$$
Then the corresponding PHD can be shown to be:
$$D_{k|k}(x) = N_{\sigma^2}(x - a_1) + N_{\sigma^2}(x - a_2) \qquad (68)$$
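The step from Eq. (67) to Eq. (68) can be verified numerically from Eq. (64). Because the density in Eq. (67) is nonzero only for two-element sets, the set integral in Eq. (64) reduces to an ordinary integral over the second target's state. The sketch below assumes that the density in Eq. (67) is a sum of products of zero-mean Gaussians with a common variance; the target locations, the variance value, and the helper names are illustrative.

```python
import numpy as np

def gauss(x, mean, var):
    """One-dimensional Gaussian density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def two_target_density(x, y, a1, a2, var):
    """Eq. (67): symmetric density of the two-element set {x, y}."""
    return gauss(x, a1, var) * gauss(y, a2, var) + gauss(x, a2, var) * gauss(y, a1, var)

def phd_from_density(x, grid, a1, a2, var):
    """Eq. (64): only |W| = 1 contributes, with weight 1/1!, so integrate over y."""
    dy = grid[1] - grid[0]
    return float(np.sum(two_target_density(x, grid, a1, a2, var)) * dy)

grid = np.linspace(-15.0, 15.0, 3001)
a1, a2, var = -1.0, 2.0, 0.25
numeric = [phd_from_density(x, grid, a1, a2, var) for x in (-1.0, 0.5, 2.0)]
closed_form = [float(gauss(x, a1, var) + gauss(x, a2, var)) for x in (-1.0, 0.5, 2.0)]
```

The two lists agree to numerical precision, and the closed-form PHD integrates to 2, the number of targets.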
3) PHD Filters in the General Sense: The m.p.d.f. of a Poisson process has the form
$$f(X) = e^{-N} \cdot \prod_{\mathbf{x} \in X} D(\mathbf{x}) \qquad (69)$$
where $D(\mathbf{x})$ is a density function with integral $N = \int D(\mathbf{x})\,d\mathbf{x}$. In analogy with constant-gain Kalman filters, Eqs. (7,8), assume that the multitarget densities in the multitarget Bayes filter are all approximately Poisson. Then this filter can be approximately replaced by a first-order moment filter of the form
$$\cdots \longrightarrow D_{k|k}(\mathbf{x}|Z^{(k)}) \xrightarrow{\text{predictor}} D_{k+1|k}(\mathbf{x}|Z^{(k)}) \xrightarrow{\text{corrector}} D_{k+1|k+1}(\mathbf{x}|Z^{(k+1)}) \longrightarrow \cdots$$
Such a filter is called a PHD filter in the general sense.
3 PHDs are also known as intensity functions. I avoid this terminology because of the potential for confusion with the many alternative meanings of “intensity” in tracking and information fusion applications. “PHD” is a historical usage [18].
4) The “Classical” PHD Filter: The “classical” PHD filter is a PHD filter with these specific modeling assumptions [18], [20]: (1) a single sensor; (2) all target motions are independent; (3) measurements are conditionally independent given the target states; (4) the clutter process is Poisson in the sense of Eq. (50) and is independent of other measurements; (5) target-generated measurements are Bernoulli in the sense of Eq. (46); (6) the surviving-target process is Bernoulli in the sense of Eq. (38), and independent of persisting targets; and (7) the appearing-target process is Poisson in the sense of Eq. (44).
Using the methodology in Section VI, it can be shown that the exact (see footnote 4) time-update equation for the classical PHD filter is ([18], [20], Chapter 16; see footnote 5):
$$D_{k+1|k}(\mathbf{x}|Z^{(k)}) = b_{k+1|k}(\mathbf{x}) + \int p_S(\mathbf{x}') \cdot f_{k+1|k}(\mathbf{x}|\mathbf{x}') \cdot D_{k|k}(\mathbf{x}'|Z^{(k)})\,d\mathbf{x}' \qquad (70)$$
where $b_{k+1|k}(\mathbf{x})$ is the PHD of the appearing-target process, and, if $Z_{k+1}$ is the currently-collected measurement-set, that the approximate measurement-update equation is
$$D_{k+1|k+1}(\mathbf{x}|Z^{(k+1)}) \cong L_{Z_{k+1}}(\mathbf{x}) \cdot D_{k+1|k}(\mathbf{x}|Z^{(k)}). \qquad (71)$$
Here the PHD “pseudolikelihood” is
$$L_{Z_{k+1}}(\mathbf{x}) = 1 - p_D(\mathbf{x}) + \sum_{\mathbf{z} \in Z_{k+1}} \frac{p_D(\mathbf{x}) \cdot L_{\mathbf{z}}(\mathbf{x})}{\lambda_{k+1} c_{k+1}(\mathbf{z}) + \tau_{k+1}(\mathbf{z})} \qquad (72)$$
where $L_{\mathbf{z}}(\mathbf{x})$ abbreviates the single-target likelihood $f_{k+1}(\mathbf{z}|\mathbf{x})$, $c_{k+1}(\mathbf{z})$ is the clutter spatial distribution, and $\lambda_{k+1}$ is the clutter rate. Also,
$$\tau_{k+1}(\mathbf{z}) = \int p_D(\mathbf{x}) \cdot L_{\mathbf{z}}(\mathbf{x}) \cdot D_{k+1|k}(\mathbf{x}|Z^{(k)})\,d\mathbf{x}. \qquad (73)$$
The classical PHD filter has attractive computational characteristics: its order of complexity is $O(mn)$, where $m$ is the current number of measurements and $n$ is the current number of tracks. The major limitation of the PHD filter is that $N_{k|k}$ is not a stable instantaneous estimate of target number, since its variance is typically large. Thus in practice, $N_{k|k}$ must be averaged over a time-window in order to get stable target-number estimates.
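To make Eqs. (70)-(73) concrete, here is a minimal grid-based sketch of one predictor and one corrector step in one dimension. It is an illustration under simplifying assumptions, not the Gaussian-mixture or particle implementations used in practice: the state space is a uniform grid, the Markov density is a Gaussian random walk, the likelihood is Gaussian, and the survival and detection probabilities are constants. All function and parameter names are hypothetical.

```python
import numpy as np

def gaussian(x, mean, std):
    """Gaussian density, used here for both the motion and the measurement model."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def phd_predict(grid, phd, p_survive, motion_std, birth_phd):
    """Eq. (70) on a uniform 1-D grid, assuming a random-walk Markov density."""
    dx = grid[1] - grid[0]
    # transition[i, j] approximates the Markov density f(x_i | x'_j).
    transition = gaussian(grid[:, None], grid[None, :], motion_std)
    return birth_phd + transition @ (p_survive * phd) * dx

def phd_update(grid, phd_pred, measurements, p_detect, meas_std, clutter_rate, clutter_density):
    """Eqs. (71)-(73) on the same grid, with a constant probability of detection."""
    dx = grid[1] - grid[0]
    pseudolikelihood = np.full_like(grid, 1.0 - p_detect)
    for z in measurements:
        likelihood = gaussian(z, grid, meas_std)               # L_z(x) on the grid
        tau = np.sum(p_detect * likelihood * phd_pred) * dx    # Eq. (73)
        pseudolikelihood += p_detect * likelihood / (clutter_rate * clutter_density(z) + tau)
    return pseudolikelihood * phd_pred                         # Eq. (71)

grid = np.linspace(-20.0, 20.0, 801)
dx = grid[1] - grid[0]
prior = gaussian(grid, 0.0, 1.0)          # one expected target near the origin
birth = 0.2 * gaussian(grid, 5.0, 2.0)    # 0.2 expected appearing targets
predicted = phd_predict(grid, prior, 0.9, 1.0, birth)
posterior = phd_update(grid, predicted, [0.3], 0.9, 0.5, 2.0, lambda z: 1.0 / 40.0)
```

Integrating `predicted` over the grid gives approximately 0.2 + 0.9 = 1.1 expected targets, which reflects the structure of Eq. (70): the birth term plus the surviving fraction of the prior mass.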
B. Cardinalized PHD (CPHD) Filters
The CPHD filter is a generalization of the PHD filter. It has low-variance instantaneous estimates of target number and better tracking performance, but at the cost of greater computational complexity. Its computational order is $O(m^3 n)$, although this can be reduced to $O(m^2 n)$ using numerical balancing techniques [10].
4 There seems to be a misconception in some quarters that Eq. (70) is approximate, i.e., requires the assumption that the prior multitarget distribution $f_{k|k}(X|Z^{(k)})$ be Poisson. This is not true.
5 For clarity, the “target-spawning” model is neglected.
1) CPHD Filters in the General Sense: The cardinality distribution $p_{k|k}(n|Z^{(k)})$ of a multitarget track density $f_{k|k}(X|Z^{(k)})$ was defined in Eq. (33). For each $n$, $p_{k|k}(n|Z^{(k)})$ is the probability that there are $n$ targets in the scene. The multitarget density of an independently identically distributed cluster (i.i.d.c.) process has the form
$$f(X) = |X|! \cdot p(|X|) \cdot \prod_{\mathbf{x} \in X} s(\mathbf{x}) \qquad (74)$$
where $p(n)$ is a cardinality distribution and $s(\mathbf{x})$ is a probability density (a “spatial distribution”). Assume that the multitarget distributions in the multitarget Bayes filter are approximately i.i.d.c. Then this filter can be approximately replaced by a higher-order moment filter of the form
$$\cdots \longrightarrow \begin{cases} D_{k|k}(\mathbf{x}|Z^{(k)}) \\ p_{k|k}(n|Z^{(k)}) \end{cases} \xrightarrow{\text{predictor}} \begin{cases} D_{k+1|k}(\mathbf{x}|Z^{(k)}) \\ p_{k+1|k}(n|Z^{(k)}) \end{cases} \xrightarrow{\text{corrector}} \begin{cases} D_{k+1|k+1}(\mathbf{x}|Z^{(k+1)}) \\ p_{k+1|k+1}(n|Z^{(k+1)}) \end{cases} \longrightarrow \cdots$$
Any such filter is a CPHD filter in the general sense.
2) The “Classical” CPHD Filter: The “classical” CPHD filter is a CPHD filter with these specific modeling assumptions [19], [20]: (1) single sensor; (2) independent target motions; (3) conditionally independent measurements; (4) the clutter process is i.i.d.c. and independent of other measurements; (5) target-generated measurements are Bernoulli in the sense of Eq. (46); (6) surviving targets are Bernoulli in the sense of Eq. (38), and independent of persisting targets; and (7) the appearing-target process is i.i.d.c.
Given these assumptions and the methodology in Section VI, one can derive the time- and measurement-update equations for the classical CPHD filter. These equations are beyond the scope of this paper (see [20], Chapter 16).
The classical PHD and CPHD filters are most commonly implemented using either Gaussian mixture techniques (assuming moderate motion and/or measurement nonlinearities) or particle methods (for stronger nonlinearities). See [20], Chapter 16, for more details.
The classical PHD and CPHD filters have also been employed in hundreds of research papers addressing dozens of applications, far too many to address here. A few diverse examples: ground-target tracking using GMTI (ground moving target indicator) radar [41]; passive-RF air-target tracking [40]; satellite-borne optical satellite tracking [6]; audio speaker tracking [13]; underwater monostatic-active sonar [1], [2]; and underwater multistatic active sonar [8].
C. Multisensor Classical PHD/CPHD Filters
The classical PHD and CPHD filters can be extended to the
multisensor case. However, these extensions are nontrivial and
require special theoretical analysis. The exact approach [3],
[17] is combinatorial, and therefore is appropriate only for a
small number of sensors. Moratuwage, Vo, and Danwei Wang
have successfully employed the exact multisensor PHD filter
in a robotics SLAM (simultaneous localization and mapping)
application [30].
The most common approach, the iterated-corrector method, is heuristic. The PHD filter or CPHD filter corrector equation is applied successively for each sensor. This approach depends on sensor order, but performs well when the sensors’ probabilities of detection are not too dissimilar. Otherwise, larger-$p_D$ sensors should be applied before smaller-$p_D$ sensors [34].
The parallel combination approximate multisensor (PCAM) approach [14] does not depend on sensor order, is computationally tractable, and results in good tracking performance.
Another approach has been proposed [38], in which the PHD pseudolikelihoods of Eq. (72) are averaged. Given the discussion in Section III-F, this is an obviously problematic approach that should result in poor target localization, which indeed it does.
Nagappa and Clark have conducted simulations comparing the approaches just summarized. In a first set of three-sensor simulations, two sensors had $p_D = 0.95$ and the third $p_D = 0.9$. In decreasing order of performance: PCAM-CPHD, PCAM-PHD, iterated-corrector CPHD, iterated-corrector PHD, averaged-pseudolikelihood PHD. The performance of the averaged-pseudolikelihood PHD filter was particularly bad, with the iterated-corrector PHD filter having intermediate performance. Similar results were observed when the probability of detection of the third sensor was decreased to $p_D = 0.85$ and again to $p_D = 0.7$.
D. Multi-Bernoulli Filters
Suppose that we have $\nu$ target tracks, and that each track has a track distribution $s_i(\mathbf{x})$ and a probability of existence $q_i$, for $i = 1, \ldots, \nu$. Then the multitarget density of a multi-Bernoulli process has the form, for $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ with $|X| = n$,
$$f(X) = K \sum_{1 \le i_1 \ne \ldots \ne i_n \le \nu}\ \prod_{j=1}^{n} \frac{q_{i_j} \cdot s_{i_j}(\mathbf{x}_j)}{1 - q_{i_j}} \qquad (75)$$
where
$$K = \prod_{i=1}^{\nu} (1 - q_i). \qquad (76)$$
Assume that the multitarget distributions in the multitarget Bayes filter are approximately multi-Bernoulli. Then it can be approximately replaced by a multi-Bernoulli filter in the general sense:
$$\cdots \to \{q_i^{k|k}, s_i^{k|k}(\mathbf{x})\} \to \{q_i^{k+1|k}, s_i^{k+1|k}(\mathbf{x})\} \to \{q_i^{k+1|k+1}, s_i^{k+1|k+1}(\mathbf{x})\} \to \cdots$$
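One consequence of Eqs. (75)-(76) is easy to compute. Substituting Eq. (75) into Eq. (34) and integrating out the track distributions gives a cardinality distribution p(n) equal to the coefficient of t^n in the polynomial formed by multiplying the factors (1 - q_i) + q_i t over all tracks. The sketch below evaluates that polynomial by repeated convolution; the function name is hypothetical, and the polynomial form is a reformulation made for this example rather than a formula quoted from the paper.

```python
import numpy as np

def multi_bernoulli_cardinality(existence_probs):
    """Return [p(0), ..., p(nu)] for tracks with the given probabilities of existence."""
    coeffs = np.array([1.0])
    for q in existence_probs:
        # Multiply the running polynomial by ((1 - q) + q * t).
        coeffs = np.convolve(coeffs, [1.0 - q, q])
    return coeffs

q = [0.9, 0.5, 0.2]
p = multi_bernoulli_cardinality(q)
expected_number = float(np.dot(np.arange(len(p)), p))  # equals sum(q)
```

The convolution form avoids dividing by 1 - q_i, so it remains valid when some track has probability of existence equal to 1.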
The first such filter was proposed in [20], Chapter 17 but, because of an ill-conceived linearization step, exhibited a pronounced bias in the target-number estimate. Vo, Vo, and Cantoni corrected this bias with their cardinality-balanced multitarget multi-Bernoulli (CBMeMBer) filter [44]. The CBMeMBer filter appears to be well-suited for applications in which motion and/or measurement nonlinearities are strong and therefore sequential Monte Carlo (a.k.a. particle) implementation techniques must be used [44]. Dunne and Kirubarajan have devised a jump-Markov version of this filter [5], and Shanhung Wong, Vo, Vo, and Hoseinnezhad have applied it to road-constrained ground-target tracking [36].
Vo, Vo, Pham, and Suter subsequently devised a multi-Bernoulli track-before-detect (TBD) filter for tracking in pixelized image data, assuming that targets have physical extent and thus cannot overlap [47]. As already noted, this filter outperforms the previously best-known TBD filter, the histogram-PMHT filter. It has been successfully applied to challenging real videos, e.g., hockey and soccer matches [12], [11].
E. “Background-Agnostic” PHD and CPHD Filters
The PHD, CPHD, and multi-Bernoulli filters all require a priori models of both the clutter (in the form of a clutter spatial distribution $c_{k+1}(\mathbf{z})$ and clutter rate $\lambda_{k+1}$) and the detection profile (in the form of a state-dependent probability of detection $p_D(\mathbf{x})$). A series of “second-generation” PHD and CPHD filters do not require a priori models.
Any PHD, CPHD, or multi-Bernoulli filter can be transformed into a filter that can operate when the probability of detection is unknown and/or dynamic [25]. The basic idea is simple: replace the target state $\mathbf{x}$ by an augmented target state $\mathring{\mathbf{x}} = (a, \mathbf{x})$, where $0 \le a \le 1$ is the unknown probability of detection associated with a target with state $\mathbf{x}$. The resulting “probability of detection agnostic” (PDAG) PHD, CPHD, and multi-Bernoulli filters have the same form as before, except that the PHDs and spatial distributions have the form $\mathring{D}_{k|k}(a, \mathbf{x})$ and $\mathring{s}_{k|k}(a, \mathbf{x})$. The PDAG-CPHD filter has been successfully implemented in [28].
Any PHD, CPHD, or multi-Bernoulli filter can be transformed into one that can operate when the clutter background is unknown and/or dynamic [26]. One simply replaces the target state space by a state space that includes both targets and “clutter generators.” The clutter generators are assumed to be target-like in that their measurement-generation process is Bernoulli in the sense of Eq. (46). In this case, any state $\mathring{\mathbf{x}}$ of the joint target-clutter system can have two forms: $\mathring{\mathbf{x}} = \mathbf{x}$ or $\mathring{\mathbf{x}} = \mathbf{c}$, where $\mathbf{c}$ is the state of a clutter generator.
One defines suitable extensions $\mathring{L}_{\mathbf{z}}(\mathring{\mathbf{x}})$ and $\mathring{f}_{k+1|k}(\mathring{\mathbf{x}}|\mathring{\mathbf{x}}')$ of the likelihood function and Markov density. Targets must not transition to clutter generators, and vice-versa:
$$\mathring{f}_{k+1|k}(\mathbf{x}|\mathbf{c}') = 0, \quad \mathring{f}_{k+1|k}(\mathbf{c}|\mathbf{x}') = 0 \qquad (77)$$
Otherwise, target statistics and clutter statistics would be inherently intermixed, thus making it more difficult to distinguish targets from clutter. The filtering equations for “clutter agnostic” (CAG) versions of the PHD, CPHD, and multi-Bernoulli filters can then be derived using simple algebra.
A version of the CAG-CPHD filter was successfully implemented in [28]. The PDAG and CAG approaches can be combined, thus allowing any PHD, CPHD, or multi-Bernoulli filter to operate under “general background-agnostic” (GBAG) conditions. Various versions of the GBAG-CPHD filter have been successfully implemented in [26], and likewise for a GBAG-CBMeMBer filter in [45].
A related development is the “multitarget intensity filter” (MIF) or “iFilter” [39]. It is actually a CAG-PHD filter, except that Eqs. (77) are violated: clutter generators are (problematically) allowed to become targets, and vice-versa.
In [39], the authors claimed that: (1) the PHD filter can be derived as a special case of the MIF using only “elementary” PPP (Poisson point process) theory; and (2) the MIF can simultaneously estimate the clutter process and target-appearance process. However, simple counterexamples demonstrate that, because Eqs. (77) are violated, these claims are false.
First, the MIF cannot always estimate the target-appearance process, because when there is no clutter (and thus no clutter generators), the target-birth rate is always estimated to be 0 [15]. Second, the MIF cannot always estimate the clutter process, since when the probability of target birth and probability of target death are “conjugate,” its estimate of the clutter rate is always a fixed multiple of the current number of measurements [15]. Third, the PHD filter is not a special case of the MIF because, when there is no clutter, the MIF predictor has no target-appearance term, unlike the PHD filter predictor [15]. Fourth, the derivation of the MIF (and along with it, the claimed elementary derivation of the PHD filter) has serious mathematical errors and problematic assumptions (see footnote 6).
6 The authors implicitly assume that detected targets are well-separated (which obviates their claim to have a “multitarget” filter). Also, because of an erroneous assumption, the MIF (and along with it the claimed elementary derivation of the PHD filter) is invalid for arbitrary sample paths. For details, see [16], Appendix A.
VI. FISST APPROXIMATION METHODOLOGY
The approximation methodology used for PHD, CPHD, and multi-Bernoulli filters consists of the following steps:
1) Construct RFS motion and measurement models for the targets and sensor.
2) Use multitarget calculus to convert these models into multitarget Markov densities and likelihood functions.
3) From these, construct the optimal approach for the application: a multitarget Bayes filter.
4) Convert the multitarget Bayes filter into probability generating functional (p.g.fl.) form.
5) Use simplifying approximations (Poisson, i.i.d.c., multi-Bernoulli) and multitarget calculus to derive PHD, CPHD, and/or multi-Bernoulli filters for the application.
The first, second, and third steps have already been summarized in Section IV. The purpose of this section is to summarize the fourth and fifth steps.
A. Functional Calculus
A functional is a real-valued function $F[h]$ whose argument $h$ is a conventional function: $h(\mathbf{x})$. The generalized Radon-Nikodým derivative of $F[h]$ with respect to $\mathbf{x}$ is
$$\frac{\delta F}{\delta \mathbf{x}}[h] = \left[\frac{\delta}{\delta \mathbf{x}} \beta_{F,h}(S)\right]_{S = \emptyset} \qquad (78)$$
where the derivative on the right was defined in Eq. (54) and the set function $\beta_{F,h}(S)$ is defined by
$$\beta_{F,h}(S) = \lim_{\varepsilon \to 0} \frac{F[h + \varepsilon \cdot \mathbf{1}_S] - F[h]}{\varepsilon} \qquad (79)$$
where $\mathbf{1}_S(\mathbf{x})$ is the indicator function of the set $S$ (see footnote 7). If $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ with $|X| = n$ then the set derivative of $F[h]$ with respect to $X$ is
$$\frac{\delta F}{\delta X}[h] = \frac{\delta^n F}{\delta \mathbf{x}_1 \cdots \delta \mathbf{x}_n}[h]. \qquad (80)$$
Intuitively speaking, Eq. (78) is a “functional derivative” [48], which in physics is defined as a Gâteaux derivative in the direction of the Dirac delta function $\delta_{\mathbf{x}}(\mathbf{y})$ [7]:
$$\frac{\delta F}{\delta \mathbf{x}}[h] = \lim_{\varepsilon \to 0} \frac{F[h + \varepsilon \cdot \delta_{\mathbf{x}}] - F[h]}{\varepsilon}. \qquad (81)$$
(Note: While Eq. (81) is conceptually equivalent to Eq. (78) and is a useful engineering heuristic, it is not, unlike Eq. (78), mathematically rigorous.)
Set derivatives of functionals can be derived using a toolbox of “turn-the-crank” differentiation rules ([20], pp. 383-395), including a powerful general chain rule due to Clark [3].
B. Probability Generating Functionals (p.g.fl.’s)
Let $f_{k|k}(X|Z^{(k)})$ be the multitarget probability distribution. Then its p.g.fl. is the functional defined by [4], [48]
$$G_{k|k}[h] \overset{\text{abbr.}}{=} G_{k|k}[h|Z^{(k)}] = \int h^X \cdot f_{k|k}(X|Z^{(k)})\,\delta X \qquad (82)$$
where the power functional of the function $0 \le h(\mathbf{x}) \le 1$ is defined by $h^X = 1$ if $X = \emptyset$ and, otherwise, by $h^X = \prod_{\mathbf{x} \in X} h(\mathbf{x})$. The advantage of the p.g.fl. representation is that mathematical formulas that are complicated at the m.p.d.f. level often greatly simplify at the p.g.fl. level, thus facilitating the derivation of approximate filters; see Section VI-D below.
The cardinality distribution of $f_{k|k}(X|Z^{(k)})$ was defined in Eq. (33). It can be directly derived from the probability generating function
$$G_{k|k}(x) = \left[G_{k|k}[h]\right]_{h = x} \qquad (83)$$
using the formula
$$p_{k|k}(n|Z^{(k)}) = \left[\frac{1}{n!} \frac{d^n G_{k|k}}{dx^n}(x)\right]_{x = 0}. \qquad (84)$$
The PHD of $f_{k|k}(X|Z^{(k)})$ was defined in Eq. (64). It also can be derived from the p.g.fl.:
$$D_{k|k}(\mathbf{x}|Z^{(k)}) = \left[\frac{\delta G_{k|k}}{\delta \mathbf{x}}[h]\right]_{h = 1}. \qquad (85)$$
The belief-mass function of Eq. (29) can also be directly derived from the p.g.fl.:
$$\beta_{k|k}(S) = G_{k|k}[\mathbf{1}_S]. \qquad (86)$$
7 Eq. (54) exists for all $\mathbf{x}$ if $\beta_{F,h}(S)$ is a countably additive measure that is absolutely continuous with respect to the base measure.
C. p.g.fl. Form of the Multitarget Bayes Filter
The multitarget Bayes filter can be equivalently expressed as a filter on p.g.fl.’s:
$$\cdots \longrightarrow G_{k|k}[h|Z^{(k)}] \xrightarrow{\text{predictor}} G_{k+1|k}[h|Z^{(k)}] \xrightarrow{\text{corrector}} G_{k+1|k+1}[h|Z^{(k+1)}] \longrightarrow \cdots$$
The predictor step is
$$G_{k+1|k}[h|Z^{(k)}] = \int G_{k+1|k}[h|X'] \cdot f_{k|k}(X'|Z^{(k)})\,\delta X' \qquad (87)$$
where
$$G_{k+1|k}[h|X'] = \int h^X \cdot f_{k+1|k}(X|X')\,\delta X \qquad (88)$$
is the p.g.fl. form of the multitarget Markov transition density $f_{k+1|k}(X|X')$. The corrector step (Bayes’ rule) is
$$G_{k+1|k+1}[h|Z^{(k+1)}] = \frac{\dfrac{\delta F}{\delta Z_{k+1}}[0, h]}{\dfrac{\delta F}{\delta Z_{k+1}}[0, 1]} \qquad (89)$$
where the bivariate functional $F[g, h]$ is defined by
$$F[g, h] = \int h^X \cdot G_{k+1}[g|X] \cdot f_{k+1|k}(X|Z^{(k)})\,\delta X \qquad (90)$$
and where
$$G_{k+1}[g|X] = \int g^Z \cdot f_{k+1}(Z|X)\,\delta Z \qquad (91)$$
is the p.g.fl. of the multitarget likelihood $f_{k+1}(Z|X)$.
D. Deriving Approximate Filters
PHD, CPHD, and multi-Bernoulli filters are derived from the p.g.fl. Bayes filter as follows. For illustrative purposes, consider the PHD filter. Given the standard multitarget models, Eqs. (40-44) and Eqs. (48-50), it can be shown that the p.g.fl. forms of the multitarget Markov density (Eq. (88)) and multitarget likelihood function (Eq. (91)) are
$$G_{k+1|k}[h|X'] = e^{\langle b_{k+1|k},\, h - 1 \rangle} \cdot (1 - p_S + p_S \cdot p_h)^{X'} \qquad (92)$$
$$G_{k+1}[g|X] = e^{\langle \kappa_{k+1},\, g - 1 \rangle} \cdot (1 - p_D + p_D \cdot p_g)^{X} \qquad (93)$$
where the power-functional notation $h^X$ was defined following Eq. (82); and where
$$p_h(\mathbf{x}') = \int h(\mathbf{x}) \cdot f_{k+1|k}(\mathbf{x}|\mathbf{x}')\,d\mathbf{x} \qquad (94)$$
$$p_g(\mathbf{x}) = \int g(\mathbf{z}) \cdot f_{k+1}(\mathbf{z}|\mathbf{x})\,d\mathbf{z} \qquad (95)$$
and
$$\langle b_{k+1|k},\, h - 1 \rangle = \int (h(\mathbf{x}) - 1) \cdot b_{k+1|k}(\mathbf{x})\,d\mathbf{x} \qquad (96)$$
$$\langle \kappa_{k+1},\, g - 1 \rangle = \int (g(\mathbf{z}) - 1) \cdot \lambda_{k+1} c_{k+1}(\mathbf{z})\,d\mathbf{z}. \qquad (97)$$
From Eqs. (92,93) it follows that Eqs. (87,90) reduce to the simplified form
$$G_{k+1|k}[h] = e^{\langle b_{k+1|k},\, h - 1 \rangle} \cdot G_{k|k}[1 - p_S + p_S\, p_h] \qquad (98)$$
$$F[g, h] = e^{\langle \kappa_{k+1},\, g - 1 \rangle} \cdot G_{k+1|k}[h\,(1 - p_D + p_D\, p_g)]. \qquad (99)$$
The classical PHD filter arises if we assume that $G_{k+1|k}[h]$ (but not $G_{k|k}[h]$) is Poisson. That is, set
$$G_{k+1|k}[h] = e^{\langle D_{k+1|k},\, h - 1 \rangle} \qquad (100)$$
and apply the “turn-the-crank” rules for the set derivative.
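As a short illustration of how the “turn-the-crank” rules are used (a sketch assembled from Eqs. (85), (94), (96), and (98), not a derivation reproduced from [20]), the PHD predictor of Eq. (70) follows by differentiating the predicted p.g.fl. of Eq. (98) and setting $h = 1$. The shorthand $T[h]$ is introduced only for this example.

```latex
\begin{align*}
G_{k+1|k}[h] &= e^{\langle b_{k+1|k},\, h-1 \rangle} \cdot G_{k|k}\big[T[h]\big],
  \qquad T[h](\mathbf{x}') = 1 - p_S(\mathbf{x}') + p_S(\mathbf{x}')\, p_h(\mathbf{x}') \\[4pt]
\frac{\delta G_{k+1|k}}{\delta \mathbf{x}}[h]
  &= b_{k+1|k}(\mathbf{x})\, e^{\langle b_{k+1|k},\, h-1 \rangle}\, G_{k|k}\big[T[h]\big]
   && \text{(product rule)} \\
  &\quad + e^{\langle b_{k+1|k},\, h-1 \rangle}
     \int \frac{\delta G_{k|k}}{\delta \mathbf{x}'}\big[T[h]\big]\,
          p_S(\mathbf{x}')\, f_{k+1|k}(\mathbf{x}|\mathbf{x}')\, d\mathbf{x}'
   && \text{(chain rule)} \\[4pt]
D_{k+1|k}(\mathbf{x}|Z^{(k)})
  &= \left[\frac{\delta G_{k+1|k}}{\delta \mathbf{x}}[h]\right]_{h=1}
   = b_{k+1|k}(\mathbf{x})
     + \int p_S(\mathbf{x}')\, f_{k+1|k}(\mathbf{x}|\mathbf{x}')\,
            D_{k|k}(\mathbf{x}'|Z^{(k)})\, d\mathbf{x}'
\end{align*}
```

The last line uses $p_1(\mathbf{x}') = 1$, hence $T[1] = 1$, together with $G_{k|k}[1] = 1$ and $\langle b_{k+1|k}, 0 \rangle = 0$. No Poisson assumption on the prior was needed, which is consistent with the remark in footnote 4 that the PHD time-update is exact.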
VII. CONCLUSIONS
Finite-set statistics is a conceptually parsimonious, “Statistics 101”-style foundation for multisensor-multitarget detection, tracking, and data fusion. This tutorial article has
summarized the basic tools necessary for reliably deriving
useful new multitarget tracking and data fusion algorithms,
without virtuoso-level expertise in point process theory.
Because of space limitations, many other significant RFS
tracking topics have been neglected, including:
• SLAM (simultaneous localization and mapping) for robotics applications [31], [32]. In this work, an RFS-based
SLAM filter has been shown to significantly outperform
standard methods such as MHT-FASTSLAM.
• Multitarget smoothing [29], [46].
• Bayes-optimal processing of nontraditional data such
as attributes, features, natural-language statements, and
inference rules (see [27] and [20], Chapters 3-6).
Additional developments include unified approaches for
sensor management [23] and track-to-track fusion [22].
REFERENCES
[1] D. Clark and J. Bell, “Bayesian multiple target tracking in forward
scan sonar images using the PHD filter,” IEE Proc. Radar, Sonar and
Navigation, Vol. 152, No. 5, pp. 327 - 334, 2005.
[2] D. Clark, I. Ruiz, Y. Petillot, and J. Bell, “Particle PHD filter multiple
target tracking in sonar image,” IEEE Trans. Aerospace and Electronic
Systems, Vol. 43, No. 1, pp. 409-416, 2007.
[3] D. Clark and R. Mahler, “Generalized PHD filters via a general chain
rule,” Proc. 15th Int’l Conf. on Information Fusion, Singapore, July
9-12, 2012.
[4] D. Daley and D. Vere-Jones, An Introduction to the Theory of Point
Processes, First Edition, Springer-Verlag, New York, 1988.
[5] D. Dunne and T. Kirubarajan, “Multiple model tracking for multitarget
multi-Bernoulli filters,” in I. Kadar (ed.), Signal Processing, Sensor
Fusion, and Target Recognition XXI, SPIE Proc. Vol. 8392, Baltimore,
MD, April 23-27, 2012.
[6] A. El-Fallah, A. Zatezalo, R. Mehra, R. Mahler, and K. Pham, “Joint
search and sensor management of space-based EO/IR sensors for LEO
threat estimation,” in J. Cox and P. Motaghedi, (eds.), Sensors and Sys.
for Space Applications III, SPIE Proc. Vol. 7330, 2009.
[7] E. Engel and R. Dreizler, Density Functional Theory, Springer, 2011.
[8] R. Georgescu and P. Willett, “The GM-CPHD applied to real and
realistic multistatic sonar data,” in O. Drummond (ed.), Sign. and Data
Proc. of Small Targets 2010, SPIE Proc. Vol. 7698, 2010.
[9] I. R. Goodman, R. P. S. Mahler, and H. T. Nguyen, Mathematics of
Data Fusion, Kluwer Academic Publishers, New York, 1997.
[10] J. Guern, “Method and System for Calculating Elementary Symmetric
Functions of Subsets of a Set,” U.S. Patent No. 20110040525, Feb. 2,
2011, http://www.faqs.org/patents/app/20110040525.
[11] R. Hoseinnezhad, B.-N. Vo, and B.-T. Vo, “Visual tracking in background subtracted image sequence via multi-Bernoulli filtering,” IEEE
Trans. Sign. Proc., 61(2): 392-397, 2012.
[12] R. Hoseinnezhad, B.-N. Vo, B.-T. Vo, and D. Suter, “Visual tracking of
numerous targets via multi-Bernoulli filtering of image data,” Pattern
Recognition, 45(10): 3625-3635, 2012.
[13] W.-K. Ma, B.-N. Vo, S. Singh, and A. Baddeley, “Tracking an unknown
time-varying number of speakers using TDOA measurements: A random
finite set approach,” IEEE Trans. Sign. Proc., Vol. 54, No. 9, pp. 3291-3304, 2006.
[14] R. Mahler, “Approximate multisensor CPHD and PHD filters,” Proc.
13th Int’l Conf. on Information Fusion, Edinburgh, Scotland, July 26-29, 2010.
[15] R. Mahler, “A comparison of ‘clutter-agnostic’ PHD/CPHD filters,” in I.
Kadar (ed.), Signal Processing, Sensor Fusion, and Target Recognition
XXI, SPIE Proc. Vol. 8392, Baltimore, MD, April 23-27, 2012.
[16] R. Mahler, “Linear-complexity CPHD filters,” Proc. 13th Int’l Conf. on
Information Fusion, Edinburgh, Scotland, July 26-29, 2010.
[17] R. Mahler, “The multisensor PHD filter, I: General solution via
multitarget calculus,” in I. Kadar (ed.), Sign. Proc., Sensor Fusion, and
Targ. Recogn. XVIII, SPIE Proc. Vol. 7336, 2009.
[18] R. Mahler, “Multitarget filtering via first-order multitarget moments,”
IEEE Trans. Aerospace and Electronic Systems, Vol. 39, No. 4, pp.
1152-1178, 2003.
[19] R. Mahler, “PHD filters of higher order in target number,” IEEE Trans.
Aerospace and Electronic Systems, Vol. 43, No. 4, pp. 1523-1543, 2007.
[20] R. Mahler, Statistical Multisource-Multitarget Information Fusion,
Artech House, Norwood, MA, 2007.
[21] R. Mahler, “‘Statistics 101’ for Multisensor, Multitarget Data Fusion,”
IEEE Aerospace & Electronics Sys. Mag., Part 2: Tutorials, Vol. 19, No.
1, pp. 53-64, 2004.
[22] R. Mahler, “Toward a theoretical foundation for distributed fusion,”
Chapter 8 in D. Hall, M. Liggins II, C.-Y. Chong, and J. Llinas (eds.),
Distributed Data Fusion for Network-Centric Operations, CRC Press,
Boca Raton, 2012.
[23] R. Mahler, “A unified approach to sensor and platform management,”
Proc. 2011 Nat’l Symp. on Sensor and Data Fusion, Washington D.C.,
October 24-26, 2011.
[24] R. Mahler and A. El-Fallah, “An approximate CPHD filter for superpositional sensors,” in I. Kadar (ed.), Signal Processing, Sensor Fusion, and
Target Recognition XXI, SPIE Proc. Vol. 8392, Baltimore, MD, April
23-27, 2012.
[25] R. Mahler and A. El-Fallah, “CPHD filtering with unknown probability
of detection,” in I. Kadar (ed.), Sign. Proc., Sensor Fusion, and Targ.
Recogn. XIX, SPIE Proc. Vol. 7697, 2010.
[26] R. Mahler and A. El-Fallah, “CPHD and PHD filters for unknown
backgrounds, III: Tractable multitarget filtering in dynamic clutter,” in
O. Drummond (ed.), Sign. and Data Proc. of Small Targets 2010, SPIE
Proc. Vol. 7698, 2010.
[27] R. Mahler and A. El-Fallah, “The random set approach to nontraditional
measurements is rigorously Bayesian,” in I. Kadar (ed.), Signal Processing, Sensor Fusion, and Target Recognition XXI, SPIE Proc. Vol. 8392,
Baltimore, MD, April 23-27, 2012.
[28] R. Mahler, B.-T. Vo, and B.-N. Vo, “CPHD filtering with unknown
clutter rate and detection profile,” IEEE Trans. Sign. Proc., Vol. 59,
No. 6, pp. 3497-3513, 2011.
[29] R. Mahler, B.-T. Vo, and B.-N. Vo, “Forward-backward probability
hypothesis density smoothing,” IEEE Trans. Aerospace and Electronic
Systems, Vol. 48, No. 1, pp. 707-728, 2012.
[30] D. Moratuwage, B.-N. Vo, and Danwei Wang, “A hierarchical approach
to the multi-vehicle SLAM problem,” Proc. 15th Int’l Conf. on Information Fusion, Singapore, July 9-12, 2012.
[31] J. Mullane, B.-N. Vo, M. Adams, and B.-T. Vo, “A random-finite-set
approach to Bayesian SLAM,” IEEE Trans. Robotics, 27(2): 268-282,
2011.
[32] J. Mullane, B.-N. Vo, M. Adams and B.-T. Vo, Random Finite Sets in
Robotic Map Building and SLAM, Springer, 2011.
[33] S. Nagappa, D. Clark, and R. Mahler, “Incorporating track uncertainty
into the OSPA metric,” Proc. 14th Int’l Conf. on Information Fusion,
Chicago, July 5-8, 2011.
[34] S. Nagappa and D. Clark, “On the ordering of the sensors in the iterated-corrector probability hypothesis density (PHD) filter,” in I. Kadar (ed.),
Signal Processing, Sensor Fusion, and Target Recognition XX, SPIE
Proc. Vol. 8050, Orlando, FL, April 26-28, 2011.
[35] S. Nannuru, M. Coates, and R. Mahler, “Computationally-tractable
approximate PHD and CPHD filters for superpositional sensors,” IEEE
J. on Special Topics in Sign. Proc., (???)???: ???-???, 2013.
[36] Shanhung Wong, B.-T. Vo, B.-N. Vo, and R. Hoseinnezhad, “Multi-Bernoulli based track-before-detect with road constraints,” Proc. 15th
Int’l Conf. on Information Fusion, Singapore, July 9-12, 2012.
[37] D. Stoyan, W. S. Kendall, and J. Mecke, Stochastic Geometry and Its
Applications, Second Edition, John Wiley & Sons, 1995.
[38] R. Streit, “Multisensor multitarget intensity filter,” Proc. 11th Int’l Conf.
on Information Fusion, pp. 1694-1701, Cologne, Germany, June 30-July
3, 2008.
[39] R. Streit and L. Stone, “Bayes derivation of multitarget intensity filters,”
Proc. 11th Int’l Conf. on Information Fusion, pp. 1686-1693, Cologne,
Germany, June 30-July 3, 2008.
[40] R. Tharmarasa, M. McDonald, and T. Kirubarajan, “Passive tracking with
sensors of opportunity using passive coherent location,” in O.E. Drummond (ed.), Signal and Data Processing of Small Targets 2008, SPIE
Proc. Vol. 6969, 2008.
[41] M. Ulmke, O. Erdinc, and P. Willett, “Gaussian mixture cardinalized
PHD filter for ground moving target tracking,” Proc. 10th Int’l Conf.
on Information Fusion, Quebec City, Canada, July 9-12, 2007.
[42] B.-T. Vo and B.-N. Vo, “Labeled random finite sets and multi-object
conjugate priors,” submitted to IEEE Trans. Sign. Proc.
[43] B.-T. Vo and B.-N. Vo, “A random finite set conjugate prior and
application to multi-target tracking,” Proc. 2011 Int’l Conf. on Intelligent
Sensors, Sensor Networks, and Information Processing (ISSNIP2011),
Adelaide, Australia, 2011.
[44] B.-T. Vo, B.-N. Vo, and A. Cantoni, “The cardinality balanced multitarget multi-Bernoulli filter and its implementations,” IEEE Trans.
Aerospace and Electronic Sys., Vol. 57, No. 2, pp. 409-423, 2009.
[45] B.-T. Vo, B.-N. Vo, R. Hoseinnezhad, and R. Mahler, “Multi-Bernoulli
filtering with unknown clutter intensity and sensor field-of-view,” Proc.
45th Conf. on Information Sciences and Systems (CISS2011), Baltimore,
MD, Mar. 23-25, 2011.
[46] B.-N. Vo, B.-T. Vo, and R. Mahler, “Closed-form solutions to forward-backward smoothing,” IEEE Trans. Sign. Proc., 60(1): 2-17, 2012.
[47] B.-N. Vo, B.-T. Vo, N.-T. Pham, and D. Suter, “Joint detection and
estimation of multiple objects from image observations,” IEEE Trans.
Signal Processing, Vol. 58, No. 10, pp. 5129-5241, 2010.
[48] V. Volterra, Theory of Functionals and of Integral and Integro-Differential Equations, (trans. M. Long), Blackie and Son, Ltd., London
and Glasgow, 1930.
Ronald Mahler was born in Great Falls, MT, in
1948. He received the B.A. degree in mathematics from the University of Chicago, Chicago, IL,
in 1970, the Ph.D. in mathematics from Brandeis
University, Waltham, MA, in 1974, and the B.E.E.
in electrical engineering from the University of Minnesota, Minneapolis, in 1980. He was an Assistant
Professor of Mathematics at the University of Minnesota from 1974 to 1979. Since 1980, he has been
employed at Lockheed Martin, Eagan, MN, where
currently he is a Senior Staff Research Scientist at
Lockheed Martin Advanced Technology Laboratories. His research interests
include information fusion, expert systems theory, multitarget tracking, and
sensor management. He is the author, coauthor, or coeditor of over 70
publications, including 12 articles in refereed journals, two books, and a
hardcover conference proceedings. He received the 2004 and 2008 Author
of the Year Awards from Lockheed Martin MS2, the 2007 Mignogna Data
Fusion Award, the 2005 IEEE AESS Harry Rowe Mimno Award, and the
2007 IEEE AESS Barry Carlton Award.