THEORY PROBAB. APPL.
Vol. 42, No. 1
WELL CALIBRATED, COHERENT FORECASTING SYSTEMS*
P. BERTI†, E. REGAZZINI‡, AND P. RIGO†
Abstract. This paper introduces a definition of predictive inference based on sequential observations in the light of de Finetti’s principle of coherence. It includes characterizations of coherent
predictive inferences and of strategic predictive inferences. It thoroughly analyzes the concepts of well
calibrated and finitarily well calibrated forecasting systems. Apropos of this subject, some new laws
of large numbers are established with reference to finitely additive probability distributions. These
results are then used to critically confute some objections raised against the coherence principle.
Key words. calibration, coherence, conditional probability, conglomerability, extension, finite
additivity, predictive inference, strategy
PII. S0040585X97975988
According to a prevailing view, the goal of statistical inference is a statement
about the probability law governing a given observable phenomenon. Consequently,
statistical methods are conceived just as tools to single out that law. This point
of view presupposes that the existence of probability laws, as well as the existence
of laws governing observable phenomena, is viewed as an objective fact. Hence, the
above mentioned idea of statistical inference might clash with the pragmatic position
according to which the essential role of any scientific theory lies in making previsions
about future possible facts. On the other hand, the process of inferring values for
unknown observable facts, based on current observations and other information, is
immune to the criticism from those who refuse to assume the existence of “true”
probability laws (subjectivistic interpretation of probability). This kind of process,
which is known as predictive statistical approach, was shared by almost all the pioneers
of statistical induction; see [17]. It can be carried out directly, without passing through
the mediation of a family of parametric probability laws (statistical model), under the
sole guidance of de Finetti’s principle of coherence. Obviously, predictive inferences
can also be obtained after assigning a statistical model and a prior distribution on
parameters, according to the usual Bayesian procedures.
The present paper deals with predictive inferences, in the case of sequential observations. Consequently, we will have to suggest conditions on a sequence of conditional
expectations in order that all terms of the sequence may be considered admissible
as a whole. The resulting scheme and conditions have natural applications to concrete problems of forecasting. Moreover, we think that they could be used to set up
a general theoretical basis for some modern nonconventional inferential approaches
and, in particular, for Dawid’s prequential approach; see [8], [9]. We will deal with
the subject within de Finetti’s theory of probability, because we think that this is the
only one which does not surreptitiously introduce extrastatistical technical restrictions
and that, consequently, it is the most suitable as a basis for any discussion about the
logical foundation of statistical methodology.
*Received by the editors April 27, 1994. This work was partially supported by MURST (60% 1992, Inferenza statistica predittiva), MURST (40% 1992, Modelli probabilistici e statistica matematica), and Università “L. Bocconi” (1991–1992).
http://www.siam.org/journals/tvp/42-1/97598.html
† Dipartimento di Statistica “G. Parenti,” viale Morgagni 59, 50134 Firenze, Italy.
‡ IMQ-Università “L. Bocconi,” via R. Sarfatti 25, 20136 Milano, Italy.
The present article consists of three sections. Section 1 contains some basic
statements of de Finetti’s concept of coherent prevision. In particular, it includes
criteria which enable one to decide on the coherence of real-valued functions defined
on suitable classes of bounded conditional random quantities. Section 2 deals with
the concept of predictive inference in the presence of observations made at instants
1, 2, . . . and when, at each instant n, one assesses inferences on future facts, on the
basis of the first n observations. In the same section, one defines and characterizes (via
the notion of conglomerability) strategic predictive inferences which, in a sense, are finitely additive versions of the Ionescu–Tulcea construction of probability measures on infinite-dimensional spaces; cf. [21]. Finally, section 3 analyzes calibration
as a check of empirical validity of a given forecasting system. It includes some new
theorems about well calibration and finitary well calibration of strategic predictive
inferences. Among other things, it is shown that coherent predictive inferences need
not be well calibrated, and this statement is used to prove that de Finetti’s coherence
is exempt from the imperfections pointed out by Dawid in [7].
1. Preliminaries.
1.1. Events and random quantities. Given the space Ω of elementary cases,
each relevant event is viewed as a subset of Ω. In particular, Ω corresponds to the sure
event and ∅, the empty set, to the impossible one. We will adopt the useful convention
that the same symbol that designates an event also designates the indicator of that
event. Likewise, the same symbol that designates a class of events also designates the
class of the corresponding indicators. Consequently, if H designates a class of events
and L a class of real-valued functions on Ω, H ⊂ L will mean that the indicators of
the elements of H belong to L. Moreover, if L is the class of the indicators of all the
elements of an algebra of events, then we will also say that L is an algebra of events.
Any real-valued function g on Ω will be said to be a random quantity. Given a random quantity (r.q.) g and an event H ≠ ∅, the restriction of g to H, denoted by g | H, is said to be a conditional r.q. In particular, if g = E, where E is the indicator of some event, then E | H is also said to be a conditional event. Note that if H = Ω, then g | H = g.
1.2. De Finetti’s coherence principle. A real-valued function P , defined on
a class C of bounded conditional r.q.’s, is considered as a candidate to represent an
expectation about the true realization of each element of C, if and only if P meets de
Finetti’s coherence principle. More precisely, according to de Finetti [12, Vol. 1], we
introduce the following.
Definition 1. Given a class C of bounded conditional r.q.'s, P: C → R is said to be a prevision on C if it meets the coherence principle. In particular, if C is a class of conditional events, then a prevision on C is also said to be a probability on C.
As far as the coherence principle is concerned, suppose that after assigning P
on C, one is committed to accepting any bet whatsoever on each element of C, with
arbitrary (positive or negative) stakes, on the understanding that any bet on g | H
is called off if H does not occur. In this framework, P (g | H) represents the price
of every bet on g | H. Precisely, what one gains from a combination of bets on
g1 | H1, …, gn | Hn with stakes s1, …, sn, respectively, is given by

G(gk | Hk, sk; k = 1, …, n) = Σ_{k=1}^n sk Hk (P(gk | Hk) − gk) | H0

with H0 = ∪_{k=1}^n Hk. Then, P is said to be coherent if and only if the inequalities

inf G ≤ 0 ≤ sup G

hold for every choice of n ∈ N, {g1 | H1, …, gn | Hn} ⊂ C, and (s1, …, sn) in R^n. In other words, one has to fix a coherent P if one wants nobody to make a Dutch book against him.
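On a finite space the betting criterion can be checked exhaustively. The following sketch (an illustration of ours, not from the paper) prices two conditional bets from a genuine probability, verifies that inf G ≤ 0 ≤ sup G over a grid of stakes, and exhibits a Dutch book against a bookmaker who prices both an event and its complement at 9/10.

```python
from fractions import Fraction
from itertools import product

OMEGA = (0, 1, 2)

def gain(bets, stakes, w):
    """G at outcome w: sum_k s_k H_k(w) (P(g_k | H_k) - g_k(w)); each bet
    contributes 0 when its conditioning event H_k fails, i.e. it is called off."""
    return sum(s * (w in H) * (p - (w in g))
               for (g, H, p), s in zip(bets, stakes))

def dutch_book(bets, stakes):
    """True if these stakes make G strictly positive (or strictly negative)
    everywhere on H0, the union of the conditioning events."""
    H0 = set().union(*(H for _, H, _ in bets))
    gains = [gain(bets, stakes, w) for w in H0]
    return min(gains) > 0 or max(gains) < 0

# Coherent prices, derived from the uniform probability on OMEGA:
# P({0,1} | Omega) = 2/3 and P({0} | {0,1}) = 1/2.
coherent = [({0, 1}, {0, 1, 2}, Fraction(2, 3)),
            ({0}, {0, 1}, Fraction(1, 2))]

# Incoherent prices: an event and its complement both priced at 9/10.
incoherent = [({0, 1}, {0, 1, 2}, Fraction(9, 10)),
              ({2}, {0, 1, 2}, Fraction(9, 10))]

stake_grid = list(product(range(-2, 3), repeat=2))
assert not any(dutch_book(coherent, s) for s in stake_grid)
assert dutch_book(incoherent, (1, 1))   # sure gain 4/5 at every outcome
```

Since the coherent prices come from an actual probability, every stake combination has expected gain zero, so G cannot be bounded away from zero on one side; the incoherent book loses 4/5 to the bettor at every outcome.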
Definition 1 makes sense because, given any C, there exists at least one prevision
on C. Such a statement is a direct consequence of the following extension theorem,
whose proof — together with a concise treatment of the main consequences of the
coherence principle — can be found in [24].
Theorem 1. Let C and C* be classes of bounded conditional r.q.'s such that C ⊂ C* and let P be a prevision on C. Then there exists a prevision (which need not be unique) P* on C* for which

P*(g | H) = P(g | H)

for every g | H in C.
Throughout the rest of this paper, we shall be basically concerned with the case
in which the domain C of P is given by
(1)    C = {g | H: g ∈ L, H ∈ H},

where H = ∪_{n=0}^∞ Πn, (Πn) being a sequence of partitions of Ω such that Πn+1 is a refinement of Πn for every n ≥ 0 and Π0 = {Ω}; L is any class of bounded r.q.'s such that L ⊃ H and gH ∈ L whenever g ∈ L and H ∈ H.
1.3. A few characterizations of probabilities and previsions. We begin
by mentioning a useful characterization of a prevision on C, whose proof can be found
in [1], whenever C coincides with (1).
Theorem 2. Let C be defined according to (1). Then P: C → R is a prevision if and only if P meets the following conditions:
(p1) P(· | H) is a prevision on L for every H in H;
(p2) inf g | H ≤ P(g | H) ≤ sup g | H for every H in H and g in L;
(p3) P(gH1 | H2) = P(g | H1 ∩ H2) P(H1 | H2) for every g, H1 in L and H2 in H such that H1 ⊂ Ω, gH1 ∈ L and H1 ∩ H2 ∈ H.
If L is a class of events, then (p1)–(p3) can be restated as follows:
(π1) P(· | H) is a probability on L for every H in H;
(π2) P(A | H) = 1 whenever H ∈ H, A ∈ L and H ⊂ A;
(π3) P(A ∩ H1 | H2) = P(A | H1 ∩ H2) P(H1 | H2) provided that A, H1 and A ∩ H1 belong to L and H1 ∩ H2, H2 are elements of H.
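On a finite space, conditions (π1)–(π3) can be verified mechanically. The sketch below (an illustration of ours, not from the paper) builds P(A | H) from a strictly positive probability Q on a four-point Ω, takes H to be all nonempty subsets, and checks the three conditions exhaustively in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import chain, combinations

# Illustrative finite example: Omega = {0,1,2,3} with a strictly positive Q,
# and the induced conditional P(A | H) = Q(A ∩ H) / Q(H).
OMEGA = (0, 1, 2, 3)
Q = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}

def q(A):
    return sum(Q[w] for w in A)

def P(A, H):
    return q(A & H) / q(H)

subsets = [frozenset(s) for s in
           chain.from_iterable(combinations(OMEGA, r) for r in range(5))]
hyps = [H for H in subsets if H]       # nonempty conditioning events

# (pi1): P(. | H) is additive (hence a probability) for every H
assert all(P(A | B, H) == P(A, H) + P(B, H)
           for H in hyps for A in subsets for B in subsets if not (A & B))
# (pi2): P(A | H) = 1 whenever H is contained in A
assert all(P(A, H) == 1 for H in hyps for A in subsets if H <= A)
# (pi3): the product rule, whenever H1 ∩ H2 is nonempty
assert all(P(A & H1, H2) == P(A, H1 & H2) * P(H1, H2)
           for A in subsets for H1 in hyps for H2 in hyps if H1 & H2)
print("(pi1)-(pi3) hold")
```

With Q strictly positive every conditioning event has positive probability, so the ratio definition is available everywhere; this is the simplest setting in which Kolmogorov-style and coherent conditioning agree.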
The following propositions provide characterizations of unconditional previsions
in case L has some suitable structure; for their proof, see [24].
Theorem 3. Let L be a linear space of bounded r.q.’s including the constants.
Then Q: L → R is a prevision if and only if it turns out to be a positive, linear
functional on L such that Q(Ω) = 1.
Theorem 4. Let L be an algebra of events. Then Q: L → R is a probability if and only if it turns out to be a non-negative-valued, additive function such that Q(Ω) = 1.
1.4. Basic differences between Kolmogorov’s and de Finetti’s theories. As already noted, in the present paper we essentially deal with classes of random
elements of the type of (1), but it should be remembered that, thanks to Theorem 1,
previsions can be assessed on arbitrary classes of conditional bounded r.q.’s. In particular, one can fix a conditional prevision without having preassigned any unconditional
probability law. In other words, the coherence principle suffices to state whether a
real-valued function, defined on any class of bounded conditional r.q.’s, can be considered as a prevision or not. Consequently, conditional probabilities, in de Finetti’s
theory, need not be evaluated as derivatives of probability measures with respect to
probability measures. Moreover, coherent probabilities (previsions, respectively) need
not be continuous with respect to monotone sequences of events (r.q.’s, respectively).
Apart from the previous remarks which, in any case, stress some peculiarities of de Finetti's approach compared with Kolmogorov's, the main difference is that de Finetti explicitly considers conditional r.q.'s and events given a single event and, in many cases, such an event represents an isolated given hypothesis whose probability equals zero. In fact, such cases appear frequently enough in probability and statistical
practice; cf. [12, p. 276 of the English translation] and [23, p. 66]. On the contrary,
Kolmogorov [20, p. 51 of the English translation] asserts “the concept of conditional
probability with regard to an isolated given hypothesis whose probability equals zero
is inadmissible.” Consequently, in Kolmogorov’s theory, conditioning is considered
with respect to specific classes of events.
2. Predictive inferences.
2.1. Definition of predictive inference. Roughly speaking, predictive inference is any family of consistent conditional expectations on a class of r.q.’s, given some
facts relating to the realizations of other random elements. As an example, we might
consider a predictor (meteorologist, economist, bookmaker, etc.) who makes regular
periodic forecasts on the basis of past data. In fact, forecasting is a significant case
of predictive inference, which will be of help to us to explain our point of view in a
concrete way.
We start by making precise the domain of a predictive inference, in the case of
sequential observations. This means that observations are made at instants 1, 2, . . .
and, at each instant n, one assesses inferences on future facts, on the basis of the first
n observations.
Let X denote the set of all possible outcomes of each element of a sequence of trials. According to Dubins and Savage [15], call the elements of Ω := X^∞ histories, and call the elements of X^n partial histories (n = 1, 2, …). Hence, the set Πn defined by

(2)    Πn = {{(x1, …, xn)} × X^∞: (x1, …, xn) ∈ X^n}

is the partition of Ω whose elements can be identified with partial histories of length n. Then, the domain of a predictive inference in the case of sequential observations will be defined to be the class C, defined by (1), with Ω = X^∞ and Πn as in (2).
In the framework just described, predictive inferences are assigned without any
reference to some statistical model involving unknown parameters. Anyhow, it is
a common statistical practice to derive previsions on C from joint assignments of a
parametric statistical model and of a prior distribution on its unknown parameters.
This situation will be investigated in a forthcoming paper. Here, we just take into
account predictive inferences, free from parametric superstructures.
Definition 2. Given C as in (1), with Ω = X^∞ and Πn as in (2), any prevision P on C is said to be a predictive inference (p.i.).
We now provide a characterization for p.i.’s based on Theorem 2.
After denoting the indicator of {(x1, …, xn)} × X^∞ by I(x1, …, xn), P: C → R is a p.i. if and only if

(i) P(·), P(· | x1, …, xn) are previsions on L,
(ii) P(I(x1, …, xn) | x1, …, xn) = 1,
(iii) P(g I(x1, …, xk, …, xn) | x1, …, xk) = P(g | x1, …, xk, …, xn) × P(I(x1, …, xk, …, xn) | x1, …, xk),
(iv) P(g I(x1, …, xn)) = P(g | x1, …, xn) P(I(x1, …, xn))

for every g in L, (x1, …, xk, …, xn) in X^n and for all naturals k, n with k ≤ n.
In the above proposition, as well as throughout the rest of this paper, P(·) stands for P(· | Ω) and P(· | x1, …, xn) for P(· | {(x1, …, xn)} × X^∞).
2.2. Example of predictive inference within a Bayesian framework. Let L = {A1 × ··· × An × X^∞: A1, …, An ∈ A, n ∈ N}, where A is some algebra of subsets of X including the singletons. In compliance with the Bayesian standard procedure, suppose that for each θ belonging to a specific parameter space Θ, a prevision Pθ(·) is assigned on L in such a way that

Pθ(A1 × ··· × An × X^∞) = Π_{i=1}^n ∫_{Ai} l(x, θ) λ(dx),

where λ denotes a measure on A (finitely additive, possibly) and l: X × Θ → [0, +∞) is such that ∫ l(x, θ) λ(dx) = 1 for every θ ∈ Θ. Throughout the present paper, integrals are to be meant in the sense of Dunford and Schwartz; cf. [16] and [4]. Moreover, let us assume that

Pθ(A1 × ··· × An × X^∞ | x1, …, xk) = Pθ(X^k × Ak+1 × ··· × An × X^∞)  if n > k and xi ∈ Ai for i = 1, …, k,
Pθ(A1 × ··· × An × X^∞ | x1, …, xk) = 1  if n ≤ k and xi ∈ Ai for i = 1, …, n, and
Pθ(A1 × ··· × An × X^∞ | x1, …, xk) = 0  otherwise.

These positions are consistent with the introduction of a sequence of X-valued random elements which, under Pθ(·), are independent and identically distributed. In view of (i)–(iv) of subsection 2.1, Pθ := {Pθ(·), Pθ(· | x1, …, xn): (x1, …, xn) ∈ X^n, n ∈ N} is a p.i. for every θ. Let us now introduce a countably additive prior probability q on a σ-algebra B of subsets of Θ, and assume that θ → l(x, θ) is a B-measurable function for every fixed x ∈ X. Consequently, if

∫ Π_{i=1}^n l(xi, θ) q(dθ) ∈ (0, +∞)

for each partial history (x1, …, xn), then

q(H | x1, …, xn) = ∫_H Π_{i=1}^n l(xi, θ) q(dθ) / ∫_Θ Π_{i=1}^n l(xi, θ) q(dθ),    H ∈ B,

represents a coherent posterior; cf. [2]. At this stage, a p.i. P can be assessed according to

P(g) = ∫ Pθ(g) q(dθ),
P(g | x1, …, xn) = ∫ Pθ(g | x1, …, xn) q(dθ | x1, …, xn)

for all g in L, (x1, …, xn) in X^n and n ∈ N.
Indeed, since Pθ is a p.i. for every θ, P satisfies (i) and (ii). Further, (iii) trivially holds whenever P(I(x1, …, xk, …, xn) | x1, …, xk) = 0, so that one can assume that

P(I(x1, …, xk, …, xn) | x1, …, xk) = Π_{i=k+1}^n λ({xi}) ∫ Π_{i=k+1}^n l(xi, θ) q(dθ | x1, …, xk)

is strictly positive. In that case, for any g ∈ L,

P(g I(x1, …, xk, …, xn) | x1, …, xk)
= ∫ Pθ(g I(x1, …, xk, …, xn) | x1, …, xk) q(dθ | x1, …, xk)
= Π_{i=k+1}^n λ({xi}) ∫ Pθ(g | x1, …, xk, …, xn) Π_{i=k+1}^n l(xi, θ) q(dθ | x1, …, xk)
= P(I(x1, …, xk, …, xn) | x1, …, xk) × ∫ Pθ(g | x1, …, xk, …, xn) Π_{i=k+1}^n l(xi, θ) q(dθ | x1, …, xk) / ∫ Π_{i=k+1}^n l(xi, θ) q(dθ | x1, …, xk)
= P(I(x1, …, xk, …, xn) | x1, …, xk) × ∫ Pθ(g | x1, …, xk, …, xn) q(dθ | x1, …, xk, …, xn).

Since the last integral represents P(g | x1, …, xk, …, xn), we have stated that (iii) comes true. Finally, since (iv) can be checked in the same way, P turns out to be a p.i.
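The construction above can be exercised numerically in a discrete special case. The sketch below (illustrative; all concrete choices are ours) takes X = {0, 1}, λ the counting measure, the Bernoulli likelihood l(x, θ) = θ^x (1 − θ)^{1−x}, a two-point prior on Θ = {3/10, 7/10}, and verifies property (iv) of subsection 2.1 for a two-dimensional cylinder in exact rational arithmetic.

```python
import math
from fractions import Fraction

THETAS = (Fraction(3, 10), Fraction(7, 10))   # illustrative two-point Theta
PRIOR = {th: Fraction(1, 2) for th in THETAS}

def lik(x, th):
    """Bernoulli likelihood l(x, theta) with counting measure lambda."""
    return th if x == 1 else 1 - th

def posterior(xs):
    """The coherent posterior q(. | x1,...,xn) displayed above."""
    w = {th: PRIOR[th] * math.prod((lik(x, th) for x in xs), start=Fraction(1))
         for th in THETAS}
    total = sum(w.values())
    return {th: w[th] / total for th in THETAS}

def P_theta_given(cyl, xs, th):
    """P_theta(A1 x ... x An x X^inf | x1,...,xk) per the displayed rules."""
    if any(x not in A for x, A in zip(xs, cyl)):
        return Fraction(0)                    # some observed xi falls outside Ai
    k = len(xs)
    return math.prod((sum(lik(x, th) for x in A) for A in cyl[k:]),
                     start=Fraction(1))       # empty tail gives 1 (case n <= k)

def P(cyl, xs=()):
    """The predictive inference: integrate P_theta against the posterior."""
    post = posterior(xs)
    return sum(post[th] * P_theta_given(cyl, xs, th) for th in THETAS)

# Property (iv): P(g I(x1)) = P(g | x1) P(I(x1)), with g the indicator of
# the cylinder B1 x B2 x X^inf and x1 = 0.
B1, B2, x1 = {0, 1}, {1}, 0
lhs = P((B1 & {x1}, B2))                 # g * I(x1) is itself a cylinder
rhs = P((B1, B2), xs=(x1,)) * P(({x1},))
assert lhs == rhs
```

Both sides reduce to ∫ 1_{x1∈B1} l(x1, θ) Pθ(B2) q(dθ), which is why the identity holds exactly here.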
2.3. Definition and existence of strategic predictive inferences. This
subsection deals with special kinds of p.i.’s, which satisfy the usual disintegrability
condition characterizing Kolmogorovian conditional expectations.
According to Dubins and Savage [15], a strategy is a sequence σ = (σ0, σ1, …) in which σ0 is a probability on P(X), the power set of X, and, for every n in N, σn is a function on X^n which associates a probability on P(X), denoted by σn(x1, …, xn), to every partial history (x1, …, xn). For any B ⊂ X, σn(x1, …, xn)(B) can be viewed as the conditional probability, under σ, of {xn+1 ∈ B}, given {(x1, …, xn)} × X^∞.
If P is a p.i. and L includes

B := {B × X^∞, X^n × B × X^∞: B ⊂ X, n ∈ N},

then the strategy σ given by

σ0(B) = P(B × X^∞),    σn(x1, …, xn)(B) = P(X^n × B × X^∞ | x1, …, xn)

is said to be the strategy induced by P.
Definition 3. If L ⊃ B, a p.i. P is said to be strategic if

(3)    P(g) = ∫ P(g | x) σ0(dx),
       P(g | x1, …, xn) = ∫ P(g | x1, …, xn, x) σn(x1, …, xn)(dx)

for every g in L, (x1, …, xn) in X^n and n in N, where σ is the strategy induced by P.
As already noted, equations (3) look like the usual conditions required of conditional expectations in Kolmogorov's theory. However, strategic p.i.'s need not be continuous with respect to monotone sequences of r.q.'s.
We also note that, in order to check whether an arbitrary function P : C → R
is a strategic p.i., it suffices to verify (3) together with (i) and (ii) of subsection 2.1;
indeed, under (i) and (ii), (iii) and (iv) follow from (3).
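For a finite X, the recursion implicit in (3) can be run directly: the prevision of a function of finitely many coordinates is obtained by integrating the conditionals backwards, in the spirit of the Ionescu–Tulcea argument recalled above. The sketch below (our illustration; the Pólya-urn strategy is an arbitrary concrete choice) computes such previsions exactly and checks the second equation of (3) at a partial history.

```python
from fractions import Fraction

X = (0, 1)

def sigma(xs):
    """sigma_n(x1,...,xn): law of the next coordinate; here a Polya urn
    started from one ball of each colour (an arbitrary illustrative strategy)."""
    p1 = Fraction(1 + sum(xs), 2 + len(xs))
    return {0: 1 - p1, 1: p1}

def prevision(g, depth, xs=()):
    """P(g | x1,...,xn) for a g depending on the first `depth` coordinates,
    by repeated application of the disintegration (3)."""
    if len(xs) >= depth:
        return Fraction(g(xs[:depth]))
    kernel = sigma(xs)
    return sum(kernel[x] * prevision(g, depth, xs + (x,)) for x in X)

# g is the indicator that coordinates 1 and 3 coincide.
g = lambda xs: xs[0] == xs[2]

# The second equation of (3) at the partial history (1,):
lhs = prevision(g, 3, (1,))
rhs = sum(sigma((1,))[x] * prevision(g, 3, (1, x)) for x in X)
assert lhs == rhs

# Unconditional prevision; exchangeability of the Polya urn gives 2/3.
assert prevision(g, 3) == Fraction(2, 3)
```

Since g depends on three coordinates only, the recursion terminates after three integrations; this is exactly the finite-coordinate case in which membership in L* noted below is automatic.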
As far as the existence of strategic p.i.'s is concerned, we have the following proposition involving the class L* of the inductively integrable functions on Ω; see [15].
Theorem 5. If B ⊂ L ⊂ L* and σ is a strategy, then there exists a unique strategic p.i. P such that σ is the strategy induced by P.
Proof. This proof heavily relies on pages 12–20 of [15]. In particular, we refer to [15] for the notation, the notion of structure, and the existence and properties of the function E(·, ·). Define

P(g) := E(σ, g),    P(g | x1, …, xn) := E(σ[x1, …, xn], gx1…xn)

for every g in L, (x1, …, xn) in X^n and n in N. Then, P is a strategic p.i. Indeed, according to Theorems 1 and 2, pages 17–18 of [15], P satisfies (3) and (i)–(ii) of subsection 2.1. Next, let P′ be any strategic p.i. inducing σ. We show that P′ = P. If g is constant, then P′(g) = P(g). Likewise, for every n ∈ N, (x1, …, xn) ∈ X^n and g ∈ L, if gx1…xn is constant, then P′(g | x1, …, xn) = P(g | x1, …, xn). Fix an ordinal γ > 0, and suppose that P′(g) = P(g) for every g ∈ L with structure less than γ, and P′(g | x1, …, xn) = P(g | x1, …, xn) for every n ∈ N, (x1, …, xn) ∈ X^n and g ∈ L such that gx1…xn is of structure less than γ. Let f ∈ L be of structure γ. Then, for every x ∈ X, fx is of structure less than γ. Hence, P′(f | x) = P(f | x), so that strategicity implies

P′(f) = ∫ P′(f | x) σ0(dx) = ∫ P(f | x) σ0(dx) = P(f).

Similarly, if f ∈ L and fx1…xn is of structure γ, strategicity implies P′(f | x1, …, xn) = P(f | x1, …, xn), and this concludes the proof.
One can notice that, with respect to significant statistical applications, L* is large enough. For instance, if g is bounded and depends on a fixed finite number of coordinates, then g belongs to L*. Moreover, L* includes bounded functions depending on a random number of coordinates, say τ: Ω → N, provided that τ is a stopping time.
Some significant characteristic aspects of strategic p.i.'s stand out in the following proposition, which involves a condition of conglomerability; see [10], [11], and [14].
Theorem 6. If L is a linear space such that L ⊃ B, then P: C → R is a strategic p.i. if and only if (i), (ii) in subsection 2.1, and

(4)    inf_{x∈X} P(g | x) ≤ P(g) ≤ sup_{x∈X} P(g | x),
       inf_{x∈X} P(g | x1, …, xn, x) ≤ P(g | x1, …, xn) ≤ sup_{x∈X} P(g | x1, …, xn, x)

hold for every g ∈ L, (x1, …, xn) in X^n and n in N.
Proof. The “only if” part is trivial, so that it suffices to prove that (i), (ii), and (4) imply (3). But, under (i) and (ii), the first of conditions (4) means that P(·) is conglomerable with respect to the partition {{x} × X^∞}, and the second means that P(· | x1, …, xn) is conglomerable with respect to the partition {X^n × {x} × X^∞}. Thus, since L is a linear space and L ⊃ B, Theorem 3.1 of [3] implies that (4) is equivalent to (3).
2.4. Example of strategic predictive inference. Let X = R. According to [18], [19], in order to describe a situation of extremely vague a priori knowledge, a set of reasonable assumptions is:
(a) P({ω: xi ≠ xj for i ≠ j and i, j ≤ n}) = 1;
(b) P(Bj1 × ··· × Bjn × R^∞) = P(B1 × ··· × Bn × R^∞) for every permutation (j1, …, jn) of (1, …, n), and every n-tuple of intervals of the form Bi = (−∞, xi];
(c) P(R^n × Ii × R^∞ | x1, …, xn) = 1/(n + 1) for every partial history (x1, …, xn) without ties, Ii denoting the open interval (x(i−1), x(i)), i = 1, …, n + 1, with x(0) = −∞, x(n+1) = +∞ and x(1), …, x(n) the order statistics of x1, …, xn.
The set of conditions (a), (b), and (c), assessed for a particular n, represents Hill's An-model.
We now single out a strategic p.i. satisfying An for every n. Following [6] and [19], one assigns any strategy σ such that

σ0((−∞, x)) = σ0((x, +∞)) = 1/2  for every x in R,

and

σn(x1, …, xn) = (1/(n + 1)) (σ0 + Σ_{i=1}^n dxi),

where dx is a probability for which dx((x − ε, x)) = dx((x, x + ε)) = 1/2 for every ε > 0. In view of Theorem 5, σ admits a unique extension P as a strategic p.i. and, moreover, it is easy to show that P meets An for every n.
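The displayed strategy can be checked symbolically against condition (c), at least on open intervals. In the sketch below (our illustration), σ0 and dx are represented by their values on intervals (a, b), encoding σ0((−∞, x)) = σ0((x, +∞)) = 1/2 and the two half-masses of dx clinging to x from the left and the right; the n + 1 open intervals between consecutive order statistics then each receive conditional probability 1/(n + 1).

```python
from fractions import Fraction

INF = float("inf")

def sigma0(a, b):
    """sigma0 on the open interval (a, b): half-mass on each unbounded tail."""
    if a == -INF and b == INF:
        return Fraction(1)
    if a == -INF or b == INF:
        return Fraction(1, 2)
    return Fraction(0)      # every bounded interval is null under sigma0

def d(x, a, b):
    """d_x on (a, b): mass 1/2 just below x plus mass 1/2 just above x."""
    left = Fraction(1, 2) if a < x <= b else Fraction(0)
    right = Fraction(1, 2) if a <= x < b else Fraction(0)
    return left + right

def sigma_n(xs, a, b):
    """sigma_n(x1,...,xn)((a, b)) = (sigma0 + sum_i d_{x_i})((a, b)) / (n+1)."""
    return (sigma0(a, b) + sum(d(x, a, b) for x in xs)) / (len(xs) + 1)

# Condition (c): each interval I_i between consecutive order statistics
# (with x_(0) = -inf and x_(n+1) = +inf) has conditional probability 1/(n+1).
xs = (2.0, -1.0, 5.0)
cuts = [-INF] + sorted(xs) + [INF]
probs = [sigma_n(xs, cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]
assert probs == [Fraction(1, 4)] * 4
```

Each interior interval collects one half-mass from each of its two endpoints, while each unbounded tail collects 1/2 from σ0 and 1/2 from its single finite endpoint; this is where the failure of countable additivity, noted next, becomes visible.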
Clearly, the previous P is not countably additive. In fact, in [18] one shows that An is incompatible with countable additivity in the framework of Kolmogorov's theory of conditional expectations. We now show that if strategicity is not required, then a countably additive p.i. agreeing with An exists. By “a countably additive p.i.,” we mean a p.i. P such that P(·) and every P(· | x1, …, xn) are countably additive on L, or at least on some relevant subalgebra of L.
Indeed, let L = {B × R^∞: B Borel set in R^n, n ∈ N}, and let {P′(·), P′(· | x1, …, xn)} be any family of countably additive probabilities on L satisfying (ii) of subsection 2.1 and

P′(I(x1, …, xk)) = P′(I(x1, …, xk, …, xn) | x1, …, xk) = 0

for every (x1, …, xk, …, xn) in X^n and every k < n. In view of (i)–(iv), P′ is a p.i. Furthermore, since the single probabilities composing P′ can be chosen independently of one another, it is plain that they can be chosen in such a way that An holds for every n.
3. Well calibrated predictive inferences.
3.1. Definition of well calibrated predictive inference. After stating the definition of predictive inference, we now deal with one of the checks of empirical validity which one usually applies to appreciate the “correctness” of a forecasting system. Indeed, the present section is concerned with the concept of well calibrated inference.
Throughout this section one assumes that L = {S × X^∞: S ⊂ X^n, n ∈ N}, the set of all cylinders with finite-dimensional base, and that a p.i. P has been assessed on the corresponding class C. Let us introduce sequences (En) and (hn) such that

E1 = B1 × X^∞,    E2 = X × B2 × X^∞,    E3 = X^2 × B3 × X^∞, …

with Bn ⊂ X for each n, and hn: Ω → {0, 1} depends only on the first (n − 1) coordinates of each history ω ∈ Ω whenever n > 1, and h1 is constant.
Then, for every ω = (x1, x2, …), let us put

νn(ω) = Σ_{i=1}^n hi(ω),    n = 1, 2, …,

pn(ω) = (1/νn(ω)) Σ_{i=1}^n hi(ω) Ei(ω)  if νn(ω) ≠ 0,  and  pn(ω) = 0  if νn(ω) = 0,

πn(ω) = (1/νn(ω)) Σ_{i=1}^n hi(ω) P(Ei | x1, …, xi−1)  if νn(ω) ≠ 0,  and  πn(ω) = 0  if νn(ω) = 0,

where P(Ei | x1, …, xi−1) := P(E1) if i = 1.
Here is a real situation in which hn, νn, pn, and πn acquire a concrete meaning. If a meteorologist, every evening, assigns a probability to the event of precipitation within the next 24 hours, then En can be regarded as the event of precipitation on day n. Under these circumstances, the following is a plausible choice for hn: after fixing (p, δ) in [0, 1] × (0, +∞), set

(5)    hn(ω) = 1  if |P(En | x1, …, xn−1) − p| < δ,    hn(ω) = 0  otherwise.

In other words, hn (n ≥ 2) selects day n if and only if the conditional probability of precipitation on day n, given the partial history (x1, …, xn−1), is suitably close to p. Consequently, νn represents the number of days in {1, …, n} in which the forecaster assigns an inference suitably close to p, πn is the mean of the inferences assessed on those very same days and pn the corresponding frequency of precipitation.
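A short simulation (our illustration, not the paper's) may help fix ideas. Suppose precipitation is i.i.d. with probability 0.3 and the meteorologist announces P(En | x1, …, xn−1) = 0.3 every evening; with p = 0.3 in (5), hn selects every day, πn equals 0.3, and the strong law suggests pn should settle near it.

```python
import random

random.seed(0)
N = 20_000
forecasts = [0.3] * N                                    # announced P(E_n | past)
outcomes = [random.random() < 0.3 for _ in range(N)]     # realized E_n(omega)

p, delta = 0.3, 0.05
h = [abs(f - p) < delta for f in forecasts]              # selection rule (5)

nu = sum(h)                                              # nu_N: days selected
p_n = sum(e for hi, e in zip(h, outcomes) if hi) / nu    # relative frequency
pi_n = sum(f for hi, f in zip(h, forecasts) if hi) / nu  # mean announced value

assert nu == N                       # every day is selected here
assert abs(p_n - pi_n) < 0.02        # calibrated, as far as N days can show
```

Note that this check inspects only the first N terms, which is precisely the finitary viewpoint adopted below; nothing in the simulation touches the limit events E and F.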
A possible requirement for a p.i. is that pn and πn have the same asymptotic behavior. Clearly, this requires attaching a probability to the events

E = {ω: νn(ω) → +∞ as n → +∞},
F = {ω ∈ E: pn(ω) − πn(ω) ↛ 0 as n → +∞},

which usually are not finite-dimensional cylinders. Consequently, the p.i. P, assessed on C, must be extended.
Formally, according to [7], if P′ is a p.i. on {g | H: g ∈ L ∪ {E, F}, H ∈ H}, agreeing with P on C, and if P′(E) > 0, then P′ is said to be well calibrated when

(6)    P′(F) = 0.

Note that (6) trivially holds whenever P′(E) = 0.
Even if the previous definition seems intuitively appealing, it cannot be considered an operational one, since no human being can, in general, decide whether an inference is well calibrated or not. This state of things depends on the fact that the sets E and F involve limit conditions which, in general, cannot be observed. Hence, as a property of statistical procedures, well calibration is concretely unimportant. At the very most, it expresses an unconfirmable belief which, as we will show shortly, is usually independent of the specification of P.
These remarks lead us to introduce a different notion of well calibration, based on a finitary, observable condition, which will be denominated finitary well calibration. More precisely: a p.i. P on C is said to be finitarily well calibrated if

(7)    inf_{ε,c>0} lim_n inf_k P({ω: max_{n≤j≤n+k} |pj(ω) − πj(ω)| ≤ ε, νn(ω) > c}) = inf_{c>0} lim_n P({ω: νn(ω) > c})

whenever inf_{c>0} lim_n P({ω: νn(ω) > c}) > 0.
Condition (7) is a bit cumbersome but, unlike (6), can be tested just by relying on the p.i. P actually assessed on C, without selecting any of its extensions. Of course, the need for (7) is less pressing whenever one is forced to adopt a particular extension, for instance, if one is committed to fixing a countably additive P and, then, to extending it in a countably additive way. In this connection, note that if P′ is countably additive, then (6) and (7) are equivalent. Note also how the requirement that inf_{c>0} lim_n P({ω: νn(ω) > c}) > 0, like its nonfinitary counterpart P′(E) > 0, is logically compelling in order to make (7) meaningful. Examples 2 and 3 will show that neither does well calibration imply finitary well calibration, nor does finitary well calibration imply well calibration.
3.2. Well calibration and Kolmogorov's theory. In this subsection, P is assumed to be strategic and σ denotes the strategy induced by P. Moreover, P′(·) and P′(· | x1, …, xn) stand for the so-called Lebesgue-like extensions of P(·) and P(· | x1, …, xn), respectively. We refer to [13] and [22] for the definition and properties of Lebesgue-like extensions. For our purposes it suffices to note that, setting C′ = {g | H: g ∈ L′, H ∈ H}, where L′ is the σ-algebra generated by L, P′ is a strategic p.i. on C′. A further remark is that, under the usual assumptions of countable additivity and measurability, P′ coincides with the countably additive extension of P. More precisely, let D be a σ-algebra of subsets of X and let us suppose that σ0 and every σn(x1, …, xn) are countably additive when restricted to D, and that (x1, …, xn) → σn(x1, …, xn)(B) is a D^n-measurable function on X^n for every n and B ∈ D.
Then, P(·) and every P(· | x1, …, xn) are countably additive on L** := {S × X^∞: S ∈ D^n, n ∈ N}, and also, denoting by D^∞ the σ-algebra generated by L**, P′(·) and P′(· | x1, …, xn) coincide on D^∞ with the countably additive extensions of P(·) and P(· | x1, …, xn) to D^∞.
To sum up, P′ is a strategic extension of P and, in case P is assessed in line with the Kolmogorov theory, P′ is just the Kolmogorovian extension of P. We now show that, in addition, well calibration holds with respect to P′.
(What follows is an attempt to make the notation less cumbersome. Whenever a function g on Ω depends only on the first n coordinates of each history ω = (x1, …, xn, …), g(x1, …, xn) is sometimes used instead of g(ω).)
Theorem 7. Let P be a strategic p.i. on C, and let P′ be its Lebesgue-like extension. Then, P′(F) = 0.
Proof. With reference to, and using the same notation as, Theorem 6.2 in [5, p. 353], let us define Yn := hn En and

Mn(x1, …, xn−1) := P(Yn | x1, …, xn−1) = hn(x1, …, xn−1) P(En | x1, …, xn−1),
Vn(x1, …, xn−1) := P((Yn − Mn)^2 | x1, …, xn−1).

Let A = {ω: Σ_{i=1}^∞ Vi(ω) = +∞}. Since (Yn − Mn)^2 ≤ hn, one obtains Σ_{i=1}^n Vi ≤ Σ_{i=1}^n hi = νn. Thus, setting

an,ε(ω) = (Σ_{i=1}^n Vi(ω))^{1/2} (log Σ_{i=1}^n Vi(ω))^{1/2+ε},    ε > 0,

then an,ε(ω) ≤ νn(ω) whenever ω ∈ A and n is sufficiently large. Now, Chen's theorem states that Σ_{i=1}^n (Yi − Mi) converges, P′(·)-a.s. on A^c, and (an,ε)^{−1} Σ_{i=1}^n (Yi − Mi) → 0, P′(·)-a.s. on A. Hence, noting that Σ_{i=1}^n (Yi − Mi) = νn (pn − πn), one obtains

P′(E) = P′({ω ∈ E ∩ A: an,ε(ω)^{−1} νn(ω) (pn(ω) − πn(ω)) → 0}) + P′({ω ∈ E ∩ A^c: νn(ω) (pn(ω) − πn(ω)) converges})
      ≤ P′({ω ∈ E ∩ A: pn(ω) − πn(ω) → 0}) + P′({ω ∈ E ∩ A^c: pn(ω) − πn(ω) → 0}) = P′(E\F).
In view of the previous remarks, Theorem 7 directly yields the following proposition, due to Dawid [7].

Corollary. P′(F) = 0 whenever P′ is assessed according to the Kolmogorov theory of conditional probability.
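The content of the above proposition can be illustrated by a small simulation. The setting below is ours, not the paper's formal framework: a forecaster who announces the true success probability of each Bernoulli trial, with hₙ ≡ 1 so that νₙ = n, pₙ the empirical success frequency, and πₙ the average announced forecast. The discrepancy pₙ − πₙ shrinks as n grows, which is the almost-sure calibration the proposition asserts.

```python
import random

random.seed(0)

# Illustrative sketch (assumptions ours): each trial has a true success
# probability q drawn in (0.2, 0.8), and the forecaster announces exactly q.
# With h_n = 1 for every n, nu_n = n, p_n is the empirical frequency of
# successes and pi_n the average announced forecast.
def calibration_gap(n_trials):
    hits, forecast_sum = 0, 0.0
    for _ in range(n_trials):
        q = random.uniform(0.2, 0.8)   # true (and announced) probability
        forecast_sum += q
        if random.random() < q:
            hits += 1
    p_n = hits / n_trials              # p_n
    pi_n = forecast_sum / n_trials     # pi_n
    return abs(p_n - pi_n)

gap = calibration_gap(100_000)
assert gap < 0.01                      # the gap is small for large n
```

The point is only numerical intuition for pₙ − πₙ → 0; the theorem itself concerns P′-almost sure convergence on E.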
In order to appreciate the contribution of de Finetti’s theory to the analysis of
the concept of well calibrated inference, let us quote some of the comments made by
Dawid on the above proposition.
Even if P′(F) = 0, "in practice …, it is rare for probability forecasts to be well calibrated (so far as can be judged from finite experience) and no realistic forecaster would believe too strongly in his own calibration performance. We have a paradox: an event can be distinguished (easily, and indeed in many ways) that is given subjective probability one and yet is not regarded as 'morally certain'" [7, p. 608].
Subsequently, Dawid concludes that the previous alleged paradox has “destructive
implications for the theory of coherence” [7, abstract].
In fact, this line of reasoning rests on a misunderstanding: Dawid thinks that
the only coherent p.i.’s are those assessed in conformity with Kolmogorov’s notion of
conditional probability. Hence, he considers the thesis of the corollary to Theorem 7
as the only one compatible with the coherence principle.
It is therefore worth examining carefully what really happens, as regards well calibration, within the theory of coherence.
3.3. Well calibration and the theory of coherence. We begin by showing
that, under very weak assumptions, a coherent forecaster can either expect to be well
calibrated or, instead, give strictly positive probability to the event F of miscalibration.
Theorem 8. Let P be a p.i. on C, and let P′ be any coherent extension of P to {g | H : g ∈ L ∪ {E}, H ∈ H}. If

(j) F ∩ A ≠ ∅ and (E\F) ∩ A ≠ ∅ for every A in L with A ∩ E ≠ ∅,

then, given α ∈ [0, P′(E)], there is a further coherent extension P″ of P′ such that P″(F) = α. Moreover, P′ can be taken such that P′(E) = 1 whenever

(jj) E ∩ A ≠ ∅ for every A in L\{∅}.
Proof. We need the following claim.

Claim. Let A be an algebra of subsets of Ω, B ∈ A and V ⊂ B. If D ∩ V ≠ ∅ whenever D ∈ A and D ∩ B ≠ ∅, then every probability μ on A can be extended to a probability μ′ on A ∪ {V} in such a way that μ′(V) = μ(B).

In fact, for C ⊂ Ω let μ*(C) = inf{μ(H) : C ⊂ H ∈ A}. Then, since no proper subset of B belonging to A can cover V, it must be that μ*(V) = μ(B). Hence, the claim follows from Theorem 3.3.3 of [4, p. 73].

Now, let A be the algebra generated by L ∪ {E}, let G be the algebra generated by A ∪ {F}, and let Q be any coherent extension of P′ to {g | H : g ∈ A, H ∈ H}. We show that there are two p.i.'s Q₁ and Q₂, extending Q to C* = {g | H : g ∈ G, H ∈ H}, such that Q₁(F) = 0 and Q₂(F) = P′(E). After proving this, it is an easy consequence of the principle of coherence that, for every α ∈ [0, P′(E)], there is a p.i. P″ on C*, extending Q and such that P″(F) = α.

Let us start with Q₁. By (j) and the Claim (applied with B = E and V = E\F), there is a probability on G, say S(·), extending Q(·) and such that S(F) = 0. Let Z be the set of partial histories (x₁, …, xₙ) such that Q(I(x₁, …, xₙ)) = 0. Being a "part" of a p.i., {Q(· | x₁, …, xₙ) : (x₁, …, xₙ) ∈ Z} is a coherent family of probabilities on A. Hence, by Theorem 1, it can be extended to a coherent family of probabilities on G, say {S(· | x₁, …, xₙ) : (x₁, …, xₙ) ∈ Z}. Let Q₁ be the function on C* defined by Q₁(·) = S(·), Q₁(· | x₁, …, xₙ) = S(· | x₁, …, xₙ) for (x₁, …, xₙ) ∈ Z, and Q₁(g | x₁, …, xₙ) = S(g I(x₁, …, xₙ))/Q(I(x₁, …, xₙ)) for g ∈ G and (x₁, …, xₙ) ∉ Z. By (i)–(iv) of subsection 2.1, Q₁ is easily seen to be a p.i. on C*. Moreover, the existence of Q₂ can be shown precisely as that of Q₁, after interchanging F with E\F. This proves the first part of the theorem. The remaining part can be argued along the same lines.
Condition (j) of Theorem 8 states that F and E\F are tail subsets of E, in
the sense that a point of E belongs to F or to E\F independently of its first n
coordinates, whatever n may be. Furthermore, Theorem 8 definitively dispels the
doubts that Dawid raised about the value of the theory of coherence. As a matter
of fact, coherence does not compel one to ignore the event of miscalibration. This situation, as the following example shows, can occur even in very common statistical contexts.
Example 1. Let X = R and let σ be any strategy such that, when restricted to {(−∞, x] : x ∈ R}, σ₀ coincides with a normal law with parameters (0, 2) and σₙ(x₁, …, xₙ) coincides with a normal law with parameters ((n/(n+1)) x̄ₙ, (n+2)/(n+1)), where x̄ₙ = Σ_{i=1}^n xᵢ/n. The above σ is consistent with the model of a sequence of independent N(θ, 1) random variables, θ being a N(0, 1) random variable.

Let P be the strategic p.i. induced by σ on C, let Bₙ = (−∞, 0], and let hₙ be defined as in (5) with p = 1/2 and δ any positive number. For ω = (x₁, …, xₙ, …), let x̄ₙ(ω) = Σ_{i=1}^n xᵢ/n. If ω is such that x̄ₙ(ω) → 0, then ω belongs to E. Moreover, if ω′ coincides with ω up to a finite number of coordinates, then ω′ ∈ E. Hence condition (jj) holds. Fix A ∈ L with A ∩ E ≠ ∅, and let (y₁, …, yₘ) be such that the history (y₁, …, yₘ, ω) := (y₁, …, yₘ, x₁, x₂, …) is in A for all ω ∈ X^∞. Taking ω such that xᵢ > 0 for all i and x̄ₙ(ω) → 0, one has (y₁, …, yₘ, ω) ∈ A ∩ F. Likewise, choosing ω such that x₂ᵢ > 0, x₂ᵢ₊₁ < 0 and x̄ₙ(ω) → 0 sufficiently fast, (y₁, …, yₘ, ω) ∈ A ∩ (E\F). Hence condition (j) holds, too. In view of Theorem 8, coherent extensions of P can be found, say P₁ and P₂, such that P₁(E) = P₂(E) = 1, P₁(F) > 0 and P₂(F) = 0. Plainly, P₂ is well calibrated and P₁ is not.
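Example 1's predictive parameters can be checked against the standard conjugate-normal update for θ ~ N(0, 1) and xᵢ | θ ~ N(θ, 1) i.i.d. The sketch below (the function name is ours) recovers σ₀ = N(0, 2) and the N((n/(n+1)) x̄ₙ, (n+2)/(n+1)) predictive law.

```python
# Sketch (assumptions as in Example 1): theta ~ N(0,1), x_i | theta ~ N(theta,1).
# Returns the mean and variance of the posterior predictive distribution
# of the next observation given the data xs.
def predictive_params(xs):
    n = len(xs)
    xbar = sum(xs) / n if n else 0.0
    # posterior of theta is N(n*xbar/(n+1), 1/(n+1)); the predictive adds variance 1
    mean = n * xbar / (n + 1)
    var = 1.0 / (n + 1) + 1.0
    return mean, var

xs = [0.3, -1.2, 0.5]
n, xbar = len(xs), sum(xs) / len(xs)
mean, var = predictive_params(xs)
assert abs(mean - n / (n + 1) * xbar) < 1e-12   # matches (n/(n+1)) * xbar_n
assert abs(var - (n + 2) / (n + 1)) < 1e-12     # matches (n+2)/(n+1)
assert predictive_params([]) == (0.0, 2.0)      # sigma_0 is indeed N(0, 2)
```

This confirms that the strategy σ of Example 1 is exactly the sequence of posterior predictive laws of the stated Bayesian model.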
The following proposition provides conditions for a p.i. to be finitarily well calibrated. In spite of the formal analogy with Theorem 7, such a proposition is definitely more consistent with the finitary nature of real statistical problems. In fact, it does not assume any hypothesis on possible infinite-dimensional extensions of P, whereas Theorem 7 is proved under the adoption of one particular infinite-dimensional extension of P. On the other hand, the proof of the new proposition involves some fine technical devices, which the proof of Theorem 7 does not actually require.
Theorem 9. Let P be a strategic p.i. on C such that

$$\lim_n P\bigl(\{\omega: \nu_n(\omega) > c\}\bigr) = 1$$

for every c > 0. Then P is finitarily well calibrated.

Proof. Since the proof is quite long, we first state three claims, whose proofs are postponed. Moreover, since P is strategic, according to Theorem 5 it can be uniquely extended as a strategic p.i. on {g | H : g is a bounded r.q. depending only on a fixed finite number of coordinates, H ∈ H}. Let us denote such an extension again by P.
For ω = (x₁, x₂, …) and i ∈ N, set ψᵢ(ω) = Eᵢ(ω) − P(Eᵢ | x₁, …, x_{i−1}), yᵢ(ω) = hᵢ(ω)/νᵢ(ω) if νᵢ(ω) > 0, and yᵢ(ω) = 0 otherwise. Further, let

$$M_{jn}(\omega) = \begin{cases} \displaystyle\sum_{i=j}^{n} y_i(\omega)\,\psi_i(\omega) & \text{if } j \le n,\\[1ex] 0 & \text{otherwise.} \end{cases}$$

Note that, for r ≤ q ≤ s, M_{rs} = M_{rq} + M_{q+1,s}.

Claim 1. For 1 ≤ r < s ≤ n, P(y_r ψ_r y_s ψ_s) = 0.

Claim 2. For every m, n ∈ N with m ≤ n and every a > 0,

$$P\Bigl(\bigcup_{j=1}^{m} \bigl\{|M_{jn}| > a\bigr\}\Bigr) \le \frac{m}{a^2} \sum_{i=1}^{\infty} \frac{1}{i^2}.$$

Claim 3. For every ε, δ > 0 there is n₀ such that, for each m, n, k ∈ N with n ≥ m ≥ n₀,

$$P\bigl(\bigl\{|M_{j,n+i}| \le \varepsilon,\ m \le j \le n+i,\ 0 \le i \le k\bigr\}\bigr) > 1 - \delta.$$

We now prove that Claims 2–3 imply Theorem 9. Let Uₙ(ω) = pₙ(ω) − πₙ(ω) for all n ∈ N and ω ∈ Ω. Since νₙ → +∞ in probability, it suffices to show that

$$\forall\, \varepsilon, \delta > 0\ \ \exists\, n_1 \in \mathbf{N}:\quad n \ge n_1 \implies P\bigl(|U_n| \le \varepsilon, \ldots, |U_{n+k}| \le \varepsilon\bigr) > 1 - \delta \quad \forall\, k \in \mathbf{N}.$$

Note that Uₙ = 0 whenever νₙ = 0, and otherwise

$$U_n = \frac{1}{\nu_n}\sum_{i=1}^{n} \nu_i\, y_i\, \psi_i = \frac{1}{\nu_n}\sum_{i=1}^{n} \sum_{j=1}^{i} h_j\, y_i\, \psi_i = \frac{1}{\nu_n}\sum_{j=1}^{n} h_j \sum_{i=j}^{n} y_i\, \psi_i = \frac{1}{\nu_n}\sum_{j=1}^{n} h_j\, M_{jn}.$$
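The summation-by-parts identity Uₙ = (1/νₙ) Σⱼ hⱼ M_{jn} is easy to verify numerically. The sketch below uses made-up hᵢ ∈ {0, 1} and bounded ψᵢ values (illustrative only); it tests the algebraic identity, not the probabilistic content.

```python
import random

random.seed(1)

# Sanity check of U_n = (1/nu_n) * sum_j h_j * M_{jn}, where
# nu_i = h_1 + ... + h_i, y_i = h_i/nu_i (0 when nu_i = 0), and
# M_{jn} = sum_{i=j}^{n} y_i * psi_i. Values of h and psi are arbitrary.
n = 50
h = [random.choice([0, 1]) for _ in range(n)]
h[0] = 1                                    # ensure nu_n > 0
psi = [random.uniform(-1, 1) for _ in range(n)]

nu, total = [], 0
for hi in h:
    total += hi
    nu.append(total)
y = [h[i] / nu[i] if nu[i] > 0 else 0.0 for i in range(n)]

lhs = sum(h[i] * psi[i] for i in range(n)) / nu[-1]           # U_n directly
M = [sum(y[i] * psi[i] for i in range(j, n)) for j in range(n)]
rhs = sum(h[j] * M[j] for j in range(n)) / nu[-1]             # via M_{jn}

assert abs(lhs - rhs) < 1e-9
```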
Hence, after fixing ε, c > 0 and n, m, k ∈ N with m ≤ n, one obtains

$$\begin{aligned}
&P\bigl(|U_n| \le \varepsilon, \ldots, |U_{n+k}| \le \varepsilon\bigr)\\
&\quad = P\biggl(\bigcap_{i=0}^{k}\biggl\{\biggl|\frac{1}{\nu_{n+i}}\biggl(\sum_{j=1}^{m} h_j M_{j,n+i} + \sum_{j=m+1}^{n+i} h_j M_{j,n+i}\biggr)\biggr| \le \varepsilon\biggr\}\biggr)\\
&\quad \ge P\biggl(\bigcap_{i=0}^{k}\biggl\{\frac{1}{\nu_{n+i}}\biggl|\sum_{j=1}^{m} h_j M_{j,n+i}\biggr| \le \frac{\varepsilon}{2},\ \frac{1}{\nu_{n+i}}\biggl|\sum_{j=m+1}^{n+i} h_j M_{j,n+i}\biggr| \le \frac{\varepsilon}{2}\biggr\}\biggr)\\
&\quad \ge P\biggl(\bigcap_{i=0}^{k}\biggl\{\frac{1}{\nu_{n+i}}\biggl|\sum_{j=1}^{m} h_j M_{j,n+i}\biggr| \le \frac{\varepsilon}{2}\biggr\}\biggr) + P\biggl(\bigcap_{i=0}^{k}\biggl\{\frac{1}{\nu_{n+i}}\biggl|\sum_{j=m+1}^{n+i} h_j M_{j,n+i}\biggr| \le \frac{\varepsilon}{2}\biggr\}\biggr) - 1\\
&\quad \ge P\biggl(\bigcap_{i=0}^{k}\biggl\{\frac{mc}{\nu_{n+i}} \le \frac{\varepsilon}{2},\ |M_{j,n+i}| \le c,\ 1 \le j \le m\biggr\}\biggr) + P\bigl(\bigl\{|M_{j,n+i}| \le \tfrac{\varepsilon}{2},\ m+1 \le j \le n+i,\ 0 \le i \le k\bigr\}\bigr) - 1\\
&\quad \ge P\biggl(\biggl\{\frac{mc}{\nu_n} \le \frac{\varepsilon}{2}\biggr\}\biggr) - P\biggl(\bigcup_{j=1}^{m}\Bigl\{|M_{jn}| > \frac{c}{2}\Bigr\}\biggr) - P\biggl(\bigcup_{i=1}^{k}\Bigl\{|M_{n+1,n+i}| > \frac{c}{2}\Bigr\}\biggr)\\
&\qquad + P\bigl(\bigl\{|M_{j,n+i}| \le \tfrac{\varepsilon}{2},\ m \le j \le n+i,\ 0 \le i \le k\bigr\}\bigr) - 1. \qquad (8)
\end{aligned}$$

Now, four probabilities are involved in (8). Let δ > 0. By Claim 3, there is n₀ such that, for every m ≥ n₀ and every k and n ≥ m, the third probability is less than δ/4 whenever c > ε, while the fourth is greater than 1 − δ/4. Fix m ≥ n₀. By Claim 2, there is c > ε such that the second probability is less than δ/4. Finally, since νₙ → +∞ in probability, one can take n₁ sufficiently large so that the first probability is greater than 1 − δ/4 for all n ≥ n₁.
n
In what follows, for n ∈ N, γn denotes the probability on the power set of X
∞
n
given by γn (S) = P (S × X ) for all S ⊂ X .
Proof of Claim 1 of Theorem 9. By strategicity, and since P (ψs |x1 , . . . , xs−1 ) = 0,
Z
P (yr ys ψr ψs ) =
Z
=
P (yr ys ψr ψs | x1 , . . . , xs−1 ) dγs−1
yr (x1 , . . . , xr−1 ) ys (x1 , . . . , xs−1 ) ψr (x1 , . . . , xr )
× P (ψs | x1 , . . . , xs−1 ) dγs−1 = 0.
Proof of Claim 2 of Theorem 9. By Claim 1 and Chebyshev's inequality,

$$P\Bigl(\bigcup_{j=1}^{m}\bigl\{|M_{jn}| > a\bigr\}\Bigr) \le \sum_{j=1}^{m} \frac{1}{a^2}\, P\bigl(M_{jn}^2\bigr) = \sum_{j=1}^{m} \frac{1}{a^2}\, P\Bigl(\sum_{i=j}^{n} y_i^2\, \psi_i^2\Bigr) \le \frac{m}{a^2} \sum_{i=1}^{\infty} \frac{1}{i^2}.$$
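The final inequality rests on the fact that Σ_{i=j}^n yᵢ² ≤ Σ_{i≥1} 1/i² = π²/6: yᵢ = hᵢ/νᵢ is nonzero only when hᵢ = 1, and at those times νᵢ runs through successive integers, so the squares are dominated by 1, 1/4, 1/9, …. A quick illustrative check on random selection sequences (not a proof):

```python
import math
import random

random.seed(2)

# Illustrative check: for any 0/1 selection sequence h, the tail sums of
# y_i^2 = (h_i/nu_i)^2 never exceed sum_{t>=1} 1/t^2 = pi^2/6.
def max_tail_sum(n=200):
    h = [random.choice([0, 1]) for _ in range(n)]
    nu, total = [], 0
    for hi in h:
        total += hi
        nu.append(total)
    y2 = [(h[i] / nu[i]) ** 2 if nu[i] > 0 else 0.0 for i in range(n)]
    return max(sum(y2[j:]) for j in range(n))

assert all(max_tail_sum() <= math.pi ** 2 / 6 + 1e-12 for _ in range(100))
```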
Proof of Claim 3 of Theorem 9. Fix ε > 0 and m, n, k ∈ N with n ≥ m. Set

$$C_1 = \bigl\{|M_{jn}| \le \varepsilon,\ m \le j \le n\bigr\}, \qquad C_2 = \bigl\{|M_{n+1,n+k}| \le \varepsilon\bigr\}, \qquad C_3 = \bigl\{|M_{j,n+k}| \le \varepsilon,\ n+2 \le j \le n+k\bigr\}.$$

For ω ∈ C₁ ∩ C₂ ∩ C₃, m ≤ j ≤ n+i and 0 ≤ i ≤ k, it must be that |M_{j,n+i}(ω)| ≤ 3ε, since

$$M_{m+p,\,n+i} = M_{m+p,\,n} + M_{n+1,\,n+i} = M_{m+p,\,n} + M_{n+1,\,n+k} - M_{n+i+1,\,n+k}$$

for p = 0, …, n−m and i = 0, …, k, and

$$M_{n+1+p,\,n+i} = M_{n+1+p,\,n+k} - M_{n+i+1,\,n+k}$$

for p = 0, …, i−1 and i = 1, …, k. Hence, it suffices to show that

$$\forall\, \varepsilon, \delta > 0\ \ \exists\, n_0 \text{ such that, } \forall\, r, s \in \mathbf{N} \text{ with } s \ge r \ge n_0: \quad P\bigl(\bigl\{|M_{js}| \le \varepsilon,\ r \le j \le s\bigr\}\bigr) > 1 - \delta. \qquad (9)$$
We now prove (9). To this end, some new notation is needed:

$$\begin{aligned}
&A_p = \{\omega: \nu_r(\omega) = p\}, \qquad p = 0, \ldots, r,\\
&\rho(i)(\omega) = \min\bigl\{q: \nu_q(\omega) = \nu_r(\omega) + i\bigr\},\\
&f(t)(\omega) = \rho\bigl(t - \nu_r(\omega)\bigr)(\omega) \quad [\text{roughly, } f(t)(\omega) \text{ is the first index } q \text{ such that } \nu_q(\omega) = t],\\
&t(j)(\omega) = \max\Bigl\{r,\ f\Bigl(\bigl\lfloor\sqrt{\nu_j(\omega)}\bigr\rfloor^2\Bigr)(\omega)\Bigr\},\\
&\alpha(p) = \bigl\lfloor\sqrt{p}\bigr\rfloor + 1 \ \text{ for } p = 0, \ldots, r, \qquad \beta = \bigl\lfloor\sqrt{s}\bigr\rfloor - 1,\\
&D_p = \bigl\{\alpha^2(p),\ (\alpha(p)+1)^2,\ \ldots,\ (\beta+1)^2\bigr\}.
\end{aligned}$$
P
s
[
|Mjs | > ε
j=r
!
5
P
ε
∃ j = r, . . . , s with Mt(j),s > ,
2
ε
or ∃ j = r, . . . , s with Mjs − Mt(j),s >
2
ε
(∗)
5P
Mrs >
2
r
X
ε
+
(∗∗)
P
∃ j = r + 1, . . . , s with Ap Mt(j),s >
2
p=0
r
X
ε
+
(∗ ∗ ∗)
.
P
∃ j = r + 1, . . . , s with Ap Mjs − Mt(j),s >
2
p=0
We treat (∗), (∗∗), and (∗ ∗ ∗), separately.
well calibrated, coherent forecasting systems
97
Let us start with (∗). By Claim 1, and for every fixed k ∈ N,

$$P\Bigl(|M_{rs}| > \frac{\varepsilon}{2}\Bigr) \le \frac{4}{\varepsilon^2}\, P\Bigl(\sum_{i=r}^{s} y_i^2\, \psi_i^2\Bigr) \le \frac{4}{\varepsilon^2}\biggl(\sum_{i=1}^{\infty} \frac{1}{i^2}\, P\bigl(\{\nu_r \le k\}\bigr) + \sum_{i=k+1}^{\infty} \frac{1}{i^2}\, P\bigl(\{\nu_r > k\}\bigr)\biggr).$$

Thus, since ν_r → +∞ in probability, it is easily verified that (∗) converges to 0 as r → +∞, uniformly in s.
Let us turn now to (∗∗).

$$(**) \le \sum_{p=0}^{r} \sum_{q^2 \in D_p} P\biggl(\biggl\{\omega \in A_p: \biggl|\sum_{i=\rho(q^2-p)}^{s} y_i\, \psi_i\biggr| > \frac{\varepsilon}{2}\biggr\}\biggr) \le \sum_{p=0}^{r} \sum_{q^2 \in D_p} \frac{4}{\varepsilon^2}\, P\biggl(A_p \biggl(\sum_{i=\rho(q^2-p)}^{s} y_i\, \psi_i\biggr)^{2}\biggr) = \sum_{p=0}^{r} \sum_{q^2 \in D_p} \frac{4}{\varepsilon^2}\, P\biggl(A_p \sum_{i=\rho(q^2-p)}^{s} y_i^2\, \psi_i^2\biggr), \qquad (10)$$

where the last equality will be proved later. Then,

$$(**) \le \sum_{p=0}^{r} \sum_{i=\alpha(p)}^{\infty} \frac{4}{\varepsilon^2}\, P(A_p) \sum_{j=0}^{\infty} \frac{1}{(i^2+j)^2} \le \sum_{p=0}^{r} \sum_{i=\alpha(p)}^{\infty} \frac{4}{\varepsilon^2}\Bigl(\frac{1}{i^4} + \frac{1}{i^2}\Bigr) P(A_p) = \sum_{p=0}^{r} a(p)\, P(A_p),$$

where a(p) = Σ_{i=α(p)}^∞ (4/ε²)(1/i⁴ + 1/i²). Moreover, for 0 ≤ M < r,

$$\sum_{p=0}^{r} a(p)\, P(A_p) = \sum_{p=0}^{M} a(p)\, P(A_p) + \sum_{p=M+1}^{r} a(p)\, P(A_p) \le \sum_{p=0}^{M} a(p)\, P(A_p) + a(M+1) \le \sup_p a(p)\, \bigl(1 - P(\{\nu_r > M\})\bigr) + a(M+1).$$

Now, since a(p) ↓ 0 and ν_r → +∞ in probability, there is M such that a(M+1) < ε and, such an M being fixed, there is r₀ such that P({ν_r > M}) > 1 − ε for all r ≥ r₀. To sum up, (∗∗) converges to 0 as r → +∞, uniformly in s.
2
s−1
We still have to prove (10). Let Bt = {ρ (q − p) = t} and B = ∪t=p+1 Bt . Then,
P Ap B
X
y i ψ i yj ψ j
ρ(q 2 −p)5i<j 5s
=
s−1
X
X
t=p+1 t5i<j 5s
=
s−1
X
X
t=p+1 t5i<j 5s
Z
=
s−1
X
X
P (Ap Bt yi ψi yj ψj )
t=p+1 t5i<j 5s
P (Ap Bt yi ψi yj ψj | x1 , . . . , xj−1 ) dγj−1
Z
Ap Bt yi ψi yj P (ψj | x1 , . . . , xj−1 ) dγj−1 = 0,
98
p. berti, e. regazzini, and p. rigo
and this yields (10).
Finally, let us consider (∗∗∗).

$$\begin{aligned}
(*{*}*) &= \sum_{p=0}^{r} P\Bigl(\Bigl\{\omega \in A_p: \exists\, j = r+1, \ldots, s \text{ with } |M_{js} - M_{t(j),s}| > \frac{\varepsilon}{2}\Bigr\}\Bigr)\\
&\le \sum_{p=0}^{r} P\Bigl(\Bigl\{\omega \in A_p: \exists\, j \text{ with } p \le \nu_j \le \alpha^2(p) \text{ and } |M_{t(j),j-1}| > \frac{\varepsilon}{2}\Bigr\}\Bigr)\\
&\quad + \sum_{p=0}^{r} P\Bigl(\Bigl\{\omega \in A_p: \exists\, j \text{ with } \alpha^2(p) < \nu_j \le (\alpha(p)+1)^2 \text{ or } (\alpha(p)+1)^2 < \nu_j \le (\alpha(p)+2)^2\\
&\qquad\qquad \text{ or } \cdots \text{ or } \beta^2 < \nu_j \le (\beta+1)^2, \text{ and } |M_{t(j),j-1}| > \frac{\varepsilon}{2}\Bigr\}\Bigr)\\
&\quad + \sum_{p=0}^{r} P\Bigl(\Bigl\{\omega \in A_p: \exists\, j \text{ with } (\beta+1)^2 < \nu_j \le s \text{ and } |M_{t(j),j-1}| > \frac{\varepsilon}{2}\Bigr\}\Bigr).
\end{aligned}$$
Let ϕᵢ = ψᵢ if i ≤ s and ϕᵢ = 0 otherwise. Then,

$$\begin{aligned}
(*{*}*) &\le P(A_0) + \sum_{p=1}^{r} P\Bigl(\Bigl\{\omega \in A_p: \Bigl|\frac{h_r \varphi_r}{p}\Bigr| > \frac{\varepsilon}{2} \text{ or } \Bigl|\frac{h_r \varphi_r}{p} + \frac{\varphi_{f(p+1)}}{p+1}\Bigr| > \frac{\varepsilon}{2} \text{ or } \cdots \text{ or } \Bigl|\frac{h_r \varphi_r}{p} + \frac{\varphi_{f(p+1)}}{p+1} + \cdots + \frac{\varphi_{f(\alpha^2(p))}}{\alpha^2(p)}\Bigr| > \frac{\varepsilon}{2}\Bigr\}\Bigr)\\
&\quad + \sum_{p=1}^{r} \sum_{q=\alpha(p)}^{\beta} P\Bigl(\Bigl\{\omega \in A_p: \Bigl|\frac{\varphi_{f(q^2)}}{q^2}\Bigr| > \frac{\varepsilon}{2} \text{ or } \Bigl|\frac{\varphi_{f(q^2)}}{q^2} + \frac{\varphi_{f(q^2+1)}}{q^2+1}\Bigr| > \frac{\varepsilon}{2} \text{ or } \cdots \text{ or } \Bigl|\frac{\varphi_{f(q^2)}}{q^2} + \cdots + \frac{\varphi_{f((q+1)^2)}}{(q+1)^2}\Bigr| > \frac{\varepsilon}{2}\Bigr\}\Bigr)\\
&\quad + \sum_{p=1}^{r} P\Bigl(\Bigl\{\omega \in A_p: \Bigl|\frac{\varphi_{f((\beta+1)^2)}}{(\beta+1)^2}\Bigr| > \frac{\varepsilon}{2} \text{ or } \cdots \text{ or } \Bigl|\frac{\varphi_{f((\beta+1)^2)}}{(\beta+1)^2} + \cdots + \frac{\varphi_{f(s)}}{s}\Bigr| > \frac{\varepsilon}{2}\Bigr\}\Bigr)\\
&\le P(A_0) + \frac{4}{\varepsilon^2} \sum_{p=1}^{r} \sum_{t=0}^{\alpha^2(p)-p} P\biggl(A_p\Bigl(\frac{h_r \varphi_r}{p} + \frac{\varphi_{f(p+1)}}{p+1} + \cdots + \frac{\varphi_{f(p+t)}}{p+t}\Bigr)^{2}\biggr)\\
&\quad + \frac{4}{\varepsilon^2} \sum_{p=1}^{r} \sum_{q=\alpha(p)}^{\beta} \sum_{t=0}^{(q+1)^2-q^2} P\biggl(A_p\Bigl(\frac{\varphi_{f(q^2)}}{q^2} + \cdots + \frac{\varphi_{f(q^2+t)}}{q^2+t}\Bigr)^{2}\biggr)\\
&\quad + \frac{4}{\varepsilon^2} \sum_{p=1}^{r} \sum_{t=0}^{s-(\beta+1)^2} P\biggl(A_p\Bigl(\frac{\varphi_{f((\beta+1)^2)}}{(\beta+1)^2} + \cdots + \frac{\varphi_{f((\beta+1)^2+t)}}{(\beta+1)^2+t}\Bigr)^{2}\biggr)\\
&\le P(A_0) + \frac{4}{\varepsilon^2} \sum_{p=1}^{r} \sum_{t=0}^{\alpha^2(p)-p} \Bigl(\frac{1}{p^2} + \cdots + \frac{1}{(p+t)^2}\Bigr) P(A_p)\\
&\quad + \frac{4}{\varepsilon^2} \sum_{p=1}^{r} \sum_{q=\alpha(p)}^{\beta} \sum_{t=0}^{(q+1)^2-q^2} \Bigl(\frac{1}{q^4} + \cdots + \frac{1}{(q^2+t)^2}\Bigr) P(A_p)\\
&\quad + \frac{4}{\varepsilon^2} \sum_{p=1}^{r} \sum_{t=0}^{s-(\beta+1)^2} \Bigl(\frac{1}{(\beta+1)^4} + \cdots + \frac{1}{((\beta+1)^2+t)^2}\Bigr) P(A_p) \qquad (11)
\end{aligned}$$
where the last inequality will be proved later. Next,

$$\begin{aligned}
(*{*}*) &\le P(A_0) + \frac{4}{\varepsilon^2} \sum_{p=1}^{r} \bigl(\alpha^2(p) - p + 1\bigr)\Bigl(\frac{1}{p^2} + \frac{1}{(p+1)^2} + \cdots + \frac{1}{\alpha^4(p)}\Bigr) P(A_p)\\
&\quad + \frac{4}{\varepsilon^2} \sum_{p=1}^{r} \sum_{q=\alpha(p)}^{\beta} \bigl((q+1)^2 - q^2 + 1\bigr)\Bigl(\frac{1}{q^4} + \frac{1}{(q^2+1)^2} + \cdots + \frac{1}{(q+1)^4}\Bigr) P(A_p)\\
&\quad + \frac{4}{\varepsilon^2} \sum_{p=1}^{r} \bigl((\beta+2)^2 - (\beta+1)^2 + 1\bigr)\Bigl(\frac{1}{(\beta+1)^4} + \frac{1}{((\beta+1)^2+1)^2} + \cdots + \frac{1}{(\beta+2)^4}\Bigr) P(A_p)\\
&\le P(A_0) + \frac{4}{\varepsilon^2} \sum_{p=1}^{r} \sum_{q=\alpha(p)-1}^{\beta+1} (2q+2)\Bigl(\frac{1}{q^4} + \frac{2q+1}{(q+1)^2\, q^2}\Bigr) P(A_p) \le P(A_0) + \sum_{p=1}^{r} a(p)\, P(A_p),
\end{aligned}$$

where

$$a(p) = \sum_{q=\alpha(p)-1}^{\infty} \frac{4}{\varepsilon^2}\, (2q+2)\Bigl(\frac{1}{q^4} + \frac{2q+1}{(q+1)^2\, q^2}\Bigr).$$

To sum up, (∗∗∗) ≤ P(A₀) + Σ_{p=1}^r a(p) P(A_p), with a(p) ↓ 0. Following the same argument used for (∗∗), one can show that (∗∗∗) converges to 0 as r → +∞, uniformly in s.
Let us now turn to (11). As is easily seen, it suffices to show that P(A_p ϕ_{f(i)} ϕ_{f(j)}) = 0 for p < i < j. To this purpose, let D_z = {f(j) = z}. Then,

$$\begin{aligned}
P\bigl(A_p\, \varphi_{f(i)}\, \varphi_{f(j)}\bigr) &= \sum_{z=r}^{s} P\bigl(D_z\, A_p\, \psi_{f(i)}\, \psi_{f(j)}\bigr) = \sum_{z=r}^{s} \int P\bigl(D_z\, A_p\, \psi_{f(i)}\, \psi_{f(j)} \mid x_1, \ldots, x_{z-1}\bigr)\, d\gamma_{z-1}\\
&= \sum_{z=r}^{s} \int D_z\, A_p\, \psi_{f(i)}\, P\bigl(\psi_z \mid x_1, \ldots, x_{z-1}\bigr)\, d\gamma_{z-1} = 0.
\end{aligned}$$

This completes the proof.
We conclude the present subsection with a couple of examples, in order to show that well calibration does not imply finitary well calibration, nor does finitary well calibration imply well calibration.

Example 2 (well calibration does not imply finitary well calibration). Let X = N, hₙ ≡ 1 and Bₙ = B for all n, where B is the set of even numbers. Let P be any p.i. such that P(B × X^∞) = P(Xⁿ × B × X^∞) = b for some b ∈ (0, 1) and all n. Setting c₀ = P(B × X^∞) and cₙ(x₁, …, xₙ, …) = P(Xⁿ × B × X^∞ | x₁, …, xₙ), one obtains

$$\int (p_n - \pi_n)\, dP = b - \frac{1}{n} \sum_{i=1}^{n} \int c_{i-1}\, dP.$$

Since ∫(pₙ − πₙ) dP → 0 is necessary for P to be finitarily well calibrated, it suffices to choose cₙ in such a way that (1/n) Σ_{i=1}^n ∫c_{i−1} dP does not converge to b and the resulting P satisfies condition (j) of Theorem 8 (so that there are well calibrated coherent extensions of P).

For instance, set

$$c_n(x_1, \ldots, x_n, \ldots) = \alpha_n\, \frac{1}{n} \sum_{i=1}^{n} B(x_i) + \beta_n\, \frac{1}{n} \sum_{i=1}^{n} D(x_i) + \gamma_n\, b, \qquad (12)$$

where αₙ, βₙ, γₙ ≥ 0, αₙ + βₙ + γₙ = 1 and D is some subset of X. Moreover, take αₙ, βₙ, γₙ and D in such a way that αₙ → α, βₙ → β, γₙ → 0, P(Xⁿ × D × X^∞) = d for each n, with β > 0 and d ≠ b. Clearly, (1/n) Σ_{i=1}^n ∫c_{i−1} dP does not converge to b. Fix A ∈ L\{∅}, and let (y₁, …, yₘ) be such that (y₁, …, yₘ, ω) ∈ A for all ω. Taking ω₁ such that all its coordinates are in B\D, and ω₂ with all its coordinates in D ∩ B, then (y₁, …, yₘ, ω₁) ∈ F ∩ A and (y₁, …, yₘ, ω₂) ∈ (E\F) ∩ A. Hence, condition (j) of Theorem 8 holds.

We note that (12) is a rather common assessment in the insurance field, in the presence of the so-called "collateral data." For instance, B could represent the event "occurrence of some accident," and D any other event different from, but strictly related to, B.
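A hedged simulation of assessment (12) makes the failure of finitary calibration plausible. The concrete choices below are ours, not the paper's: coordinates i.i.d. uniform on {1, …, 6}, B the even numbers (so b = 1/2), D the multiples of 3 (so d = 1/3), and constant weights αₙ = α, βₙ = β, γₙ = 0. By the law of large numbers, cₙ drifts to αb + βd, which differs from b since β > 0 and d ≠ b, so the Cesàro averages of the cᵢ cannot converge to b.

```python
import random

random.seed(3)

# Illustrative instance of (12): B = evens (b = 1/2), D = multiples of 3
# (d = 1/3), constant weights alpha, beta and gamma_n = 0.
b, d = 0.5, 1 / 3
alpha, beta = 0.3, 0.7

def c_n(xs):
    # c_n from (12): weighted empirical frequencies of B and of D
    n = len(xs)
    freq_B = sum(1 for x in xs if x % 2 == 0) / n
    freq_D = sum(1 for x in xs if x % 3 == 0) / n
    return alpha * freq_B + beta * freq_D

xs = [random.randrange(1, 7) for _ in range(200_000)]
limit = alpha * b + beta * d               # the LLN limit of c_n
assert abs(c_n(xs) - limit) < 0.01         # c_n is already close to its limit
assert abs(limit - b) > 0.05               # and the limit differs from b
```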
Example 3 (finitary well calibration does not imply well calibration). Consider Example 1 again but, for the sake of simplicity, take hₙ ≡ 1 instead of defining hₙ as in (5). According to Theorem 9, P is finitarily well calibrated. Hence, it suffices to find a coherent extension of P which is not well calibrated. This follows from Theorem 8, after noting that E = Ω and condition (j) can be checked precisely as in Example 1.
3.4. Further comments. Contrary to what Dawid writes in [7], there is no conflict between calibration and coherence, since there are coherent p.i.'s which do not ignore the event F of miscalibration; cf. Theorem 8. In any case, the probability attached to F rests on a subjective judgment, just as is the case with any other probability assessment. Clearly, there are specific coherent assignments which may be conducive to ignoring F. For instance, the assumption that P is strategic enables one to choose a particular strategic extension of P, the Lebesgue-like extension P′ of P, and Theorem 7 implies that P′(F) = 0. Incidentally, we note that Theorem 7 involves one particular, important though it might be, strategic extension of P. However, the single probability laws P′(·) and P′(· | x₁, …, xₙ) composing an arbitrary strategic extension P′ of P are mutually linked in such a manner as to suggest that strategicity implies well calibration. More precisely, it is plausible that Theorem 7, and perhaps some form of the martingale convergence theorem, holds for every strategic extension of P. On the other hand, Theorem 7 does not state that the adoption of a strategic P on C is conducive to ignoring F. Indeed, the p.i. in Example 1 is strategic on C, and F can be given strictly positive probability. Finally, according to Theorem 9, the assumption that P is strategic is enough for finitary well calibration to come true. In conclusion, it is the assumption that a p.i. is strategic, rather than its coherence, that is relevant to well calibration.
REFERENCES
[1] P. Berti, E. Regazzini, and P. Rigo, De Finetti's coherence and complete predictive inferences, Quaderno I.A.M.I. 90.5, Milano, 1990.
[2] P. Berti, E. Regazzini, and P. Rigo, Coherent statistical inference and Bayes theorem, Ann.
Statist., 19 (1991), pp. 366–381.
[3] P. Berti and P. Rigo, Weak disintegrability as a form of preservation of coherence, J. Italian
Statist. Soc., 1 (1992), pp. 161–181.
[4] K. P. S. Bhaskara Rao and M. Bhaskara Rao, Theory of Charges, Academic Press, London,
1983.
[5] R. Chen, On almost sure convergence in a finitely additive setting, Z. Wahrscheinlichkeitstheorie
verw. Gebiete, 37 (1977), pp. 341–356.
[6] M. D. Cifarelli and E. Regazzini, Sopra una versione finitamente additiva del processo di
Ferguson–Dirichlet, in Scritti in omaggio a L. Daboni, Ed. Lint, Trieste, 1990, pp. 67–81.
[7] P. Dawid, The well calibrated Bayesian (with discussion), J. Amer. Statist. Assoc., 77 (1982),
pp. 605–613.
[8] P. Dawid, Present position and potential developments: Some personal views, statistical theory,
the prequential approach (with discussion), J. Roy. Statist. Soc., Ser. A, 147 (1984), pp. 278–
292.
[9] P. Dawid, Fisherian inference in likelihood and prequential frames of reference (with discussion),
J. Roy. Statist. Soc., Ser. B, 53 (1991), pp. 79–109.
[10] B. De Finetti, Sulla proprietà conglomerativa delle probabilità subordinate, Rend. R. Istituto Lombardo di Scienze e Lett., 63 (1930), pp. 414–418.
[11] B. De Finetti, Sull'impostazione assiomatica del calcolo delle probabilità, Annali Triestini dell'Università di Trieste, 19 (1949), pp. 29–81 (English translation in Probability, Induction and Statistics, Wiley, New York, 1972).
[12] B. De Finetti, Teoria delle probabilità, Einaudi, Torino, 1970 (English translation: Theory of
Probability, Wiley Classics Library Edition, Chichester, 1990).
[13] L. E. Dubins, On Lebesgue-like extensions of finitely additive measures, Ann. Probab., 2 (1974),
pp. 456–463.
[14] L. E. Dubins, Finitely additive conditional probabilities, conglomerability and disintegrations,
Ann. Probab., 3 (1975), pp. 89–99.
[15] L. E. Dubins and L. J. Savage, How to Gamble if You Must (Inequalities for Stochastic Processes), McGraw-Hill, New York, 1965; reprinted as Inequalities for Stochastic Processes (How to Gamble if You Must), Dover, New York, 1976.
[16] N. Dunford and J. T. Schwartz, Linear Operators, Part I: General Theory, Interscience, New York, 1958.
[17] S. Geisser, Predictive analysis, in Encyclopedia of Statistical Sciences, Wiley-Interscience, New
York, 7 (1986), pp. 158–170.
[18] B. Hill, Posterior distribution of percentiles: Bayes theorem for sampling from a population,
J. Amer. Statist. Assoc., 63 (1968), pp. 677–691.
[19] B. Hill, Parametric models for An : Splitting processes and mixtures, J. Roy. Statist. Soc.,
Ser. B, 55 (1993), pp. 423–433.
[20] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer, Berlin, 1933
(English translation: Foundations of the Theory of Probability, Chelsea, New York, 1950).
[21] J. Neveu, Bases mathématiques du calcul des probabilités, 2ème ed., Masson, Paris, 1980.
[22] R. A. Purves and W. D. Sudderth, Some finitely additive probability, Ann. Probab., 4 (1976),
pp. 259–276.
[23] M. M. Rao, Conditional Measures and Applications, Marcel Dekker, New York, 1993.
[24] E. Regazzini, Finitely additive conditional probabilities, Rend. Sem. Mat. Fis. Milano, 55
(1985), pp. 69–89 (corrections in Rend. Sem. Mat. Fis. Milano, Vol. 57, p. 599).