THEORY PROBAB. APPL. Vol. 42, No. 1

WELL CALIBRATED, COHERENT FORECASTING SYSTEMS*

P. BERTI†, E. REGAZZINI‡, AND P. RIGO†

Abstract. This paper introduces a definition of predictive inference based on sequential observations in the light of de Finetti's principle of coherence. It includes characterizations of coherent predictive inferences and of strategic predictive inferences. It thoroughly analyzes the concepts of well calibrated and finitarily well calibrated forecasting systems. Apropos of this subject, some new laws of large numbers are established with reference to finitely additive probability distributions. These results are then used to confute some objections raised against the coherence principle.

Key words. calibration, coherence, conditional probability, conglomerability, extension, finite additivity, predictive inference, strategy

PII. S0040585X97975988

According to a prevailing view, the goal of statistical inference is a statement about the probability law governing a given observable phenomenon. Consequently, statistical methods are conceived merely as tools to single out that law. This point of view presupposes that the existence of probability laws, as well as the existence of laws governing observable phenomena, is viewed as an objective fact. Hence, the above-mentioned idea of statistical inference may clash with the pragmatic position according to which the essential role of any scientific theory lies in making previsions about possible future facts. On the other hand, the process of inferring values for unknown observable facts, based on current observations and other information, is immune to criticism from those who refuse to assume the existence of "true" probability laws (the subjectivistic interpretation of probability). This kind of process, known as the predictive statistical approach, was shared by almost all the pioneers of statistical induction; see [17].
It can be carried out directly, without passing through the mediation of a family of parametric probability laws (a statistical model), under the sole guidance of de Finetti's principle of coherence. Obviously, predictive inferences can also be obtained after assigning a statistical model and a prior distribution on its parameters, according to the usual Bayesian procedures. The present paper deals with predictive inferences in the case of sequential observations. Consequently, we have to suggest conditions on a sequence of conditional expectations in order that all terms of the sequence may be considered admissible as a whole. The resulting scheme and conditions have natural applications to concrete problems of forecasting. Moreover, we think that they could be used to set up a general theoretical basis for some modern nonconventional inferential approaches and, in particular, for Dawid's prequential approach; see [8], [9]. We will deal with the subject within de Finetti's theory of probability, because we think that this is the only theory which does not surreptitiously introduce extrastatistical technical restrictions and that, consequently, it is the most suitable basis for any discussion about the logical foundations of statistical methodology.

*Received by the editors April 27, 1994. This work was partially supported by MURST (60% 1992, Inferenza statistica predittiva), MURST (40% 1992, Modelli probabilistici e statistica matematica), and Universita' "L. Bocconi" (1991-1992). http://www.siam.org/journals/tvp/42-1/97598.html
†Dipartimento di Statistica "G. Parenti," viale Morgagni 59, 50134 Firenze, Italy.
‡IMQ-Universita' "L. Bocconi," via R. Sarfatti 25, 20136 Milano, Italy.

The present article consists of three sections. Section 1 contains some basic statements of de Finetti's concept of coherent prevision.
In particular, it includes criteria which enable one to decide on the coherence of real-valued functions defined on suitable classes of bounded conditional random quantities. Section 2 deals with the concept of predictive inference in the presence of observations made at instants 1, 2, . . . , when, at each instant n, one assesses inferences on future facts on the basis of the first n observations. In the same section, strategic predictive inferences are defined and characterized (via the notion of conglomerability); in a sense, these are finitely additive versions of the Ionescu-Tulcea argument for assessing probability measures on infinite-dimensional spaces; cf. [21]. Finally, section 3 analyzes calibration as a check of the empirical validity of a given forecasting system. It includes some new theorems about well calibration and finitary well calibration of strategic predictive inferences. Among other things, it is shown that coherent predictive inferences need not be well calibrated, and this statement is used to prove that de Finetti's coherence is exempt from the imperfections pointed out by Dawid in [7].

1. Preliminaries.

1.1. Events and random quantities. Given the space Ω of elementary cases, each relevant event is viewed as a subset of Ω. In particular, Ω corresponds to the sure event and ∅, the empty set, to the impossible one. We adopt the useful convention that the same symbol that designates an event also designates the indicator of that event. Likewise, the same symbol that designates a class of events also designates the class of the corresponding indicators. Consequently, if H designates a class of events and L a class of real-valued functions on Ω, then H ⊂ L means that the indicators of the elements of H belong to L. Moreover, if L is the class of the indicators of all the elements of an algebra of events, then we also say that L is an algebra of events. Any real-valued function g on Ω is said to be a random quantity.
Given a random quantity (r.q.) g and an event H ≠ ∅, the restriction of g to H, denoted by g | H, is said to be a conditional r.q. In particular, if g = E, where E is the indicator of some event, then E | H is also said to be a conditional event. Note that if H = Ω, then g | H = g.

1.2. De Finetti's coherence principle. A real-valued function P, defined on a class C of bounded conditional r.q.'s, is considered as a candidate to represent an expectation about the true realization of each element of C if and only if P meets de Finetti's coherence principle. More precisely, according to de Finetti [12, Vol. 1], we introduce the following.

Definition 1. Given a class C of bounded conditional r.q.'s, P: C → R is said to be a prevision on C if it meets the coherence principle. In particular, if C is a class of conditional events, then a prevision on C is also said to be a probability on C.

As far as the coherence principle is concerned, suppose that, after assigning P on C, one is committed to accepting any bet whatsoever on each element of C, with arbitrary (positive or negative) stakes, on the understanding that any bet on g | H is called off if H does not occur. In this framework, P(g | H) represents the price of every bet on g | H. Precisely, what one gains from a combination of bets on g_1 | H_1, . . . , g_n | H_n with stakes s_1, . . . , s_n, respectively, is given by

G(g_k | H_k, s_k; k = 1, . . . , n) = [ Σ_{k=1}^n s_k H_k ( P(g_k | H_k) − g_k ) ] | H_0

with H_0 = ∪_{k=1}^n H_k. Then P is said to be coherent if and only if the inequalities

inf G ≤ 0 ≤ sup G

hold for every choice of n ∈ N, {g_1 | H_1, . . . , g_n | H_n} ⊂ C, and (s_1, . . . , s_n) in R^n. In other words, one has to fix a coherent P if one wants to prevent anyone from making a Dutch book against oneself. Definition 1 makes sense because, given any C, there exists at least one prevision on C.
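As a minimal numerical illustration (ours, not from the paper), the coherence inequality can be checked for unconditional bets on an event A and its complement over a two-point Ω. A price assignment with P(A) + P(A^c) ≠ 1 admits a Dutch book, i.e., stakes making the gain strictly positive on every outcome; since stakes may be negative, searching for a strictly positive minimum gain suffices.

```python
import itertools

def gain(omega, bets):
    # bets: list of (g, H, price, stake); g and H are functions of omega,
    # with H also acting as the indicator of the conditioning event
    return sum(s * H(omega) * (p - g(omega)) for g, H, p, s in bets)

# two elementary outcomes: omega in {0, 1}; A is the event {1}
A = lambda w: 1 if w == 1 else 0
notA = lambda w: 1 - A(w)
Omega = lambda w: 1  # sure event: unconditional bets

def dutch_book_possible(pA, pnotA, stakes_grid):
    # a Dutch book exists if some combination of bets has strictly
    # positive gain on every outcome, i.e. inf G <= 0 fails
    for s1, s2 in itertools.product(stakes_grid, repeat=2):
        bets = [(A, Omega, pA, s1), (notA, Omega, pnotA, s2)]
        gains = [gain(w, bets) for w in (0, 1)]
        if min(gains) > 0:
            return True
    return False

grid = [x / 4 for x in range(-8, 9)]        # stakes in [-2, 2]
print(dutch_book_possible(0.6, 0.4, grid))  # coherent: prices sum to 1
print(dutch_book_possible(0.6, 0.6, grid))  # incoherent: prices sum to 1.2
```

The coherent assignment survives every stake combination in the grid, while the incoherent one is exploited already by equal positive stakes on both bets.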
Such a statement is a direct consequence of the following extension theorem, whose proof, together with a concise treatment of the main consequences of the coherence principle, can be found in [24].

Theorem 1. Let C and C* be classes of bounded conditional r.q.'s such that C ⊂ C*, and let P be a prevision on C. Then there exists a prevision (which need not be unique) P* on C* for which P*(g | H) = P(g | H) for every g | H in C.

Throughout the rest of this paper, we shall be basically concerned with the case in which the domain C of P is given by

(1) C = {g | H: g ∈ L, H ∈ H},

where H = ∪_{n=0}^∞ Π_n, (Π_n) being a sequence of partitions of Ω such that Π_{n+1} is a refinement of Π_n for every n ≥ 0 and Π_0 = {Ω}, and L is any class of bounded r.q.'s such that L ⊃ H and gH ∈ L whenever g ∈ L and H ∈ H.

1.3. A few characterizations of probabilities and previsions. We begin by mentioning a useful characterization of a prevision on C, whose proof can be found in [1], whenever C coincides with (1).

Theorem 2. Let C be defined according to (1). Then P: C → R is a prevision if and only if P meets the following conditions:
(p1) P(· | H) is a prevision on L for every H in H;
(p2) inf g | H ≤ P(g | H) ≤ sup g | H for every H in H and g in L;
(p3) P(gH_1 | H_2) = P(g | H_1 ∩ H_2) P(H_1 | H_2) for every g, H_1 in L and H_2 in H such that H_1 ⊂ Ω, gH_1 ∈ L and H_1 ∩ H_2 ∈ H.

If L is a class of events, then (p1)-(p3) can be restated as follows:
(π1) P(· | H) is a probability on L for every H in H;
(π2) P(A | H) = 1 whenever H ∈ H, A ∈ L and H ⊂ A;
(π3) P(A ∩ H_1 | H_2) = P(A | H_1 ∩ H_2) P(H_1 | H_2) provided that A, H_1 and A ∩ H_1 belong to L and H_1 ∩ H_2, H_2 are elements of H.

The following propositions provide characterizations of unconditional previsions in case L has some suitable structure; for their proofs, see [24].

Theorem 3. Let L be a linear space of bounded r.q.'s including the constants.
Then Q: L → R is a prevision if and only if it is a positive, linear functional on L such that Q(Ω) = 1.

Theorem 4. Let L be an algebra of events. Then Q: L → R is a probability if and only if it is a nonnegative, additive function such that Q(Ω) = 1.

1.4. Basic differences between Kolmogorov's and de Finetti's theories. As already noted, in the present paper we essentially deal with classes of random elements of the type (1), but it should be remembered that, thanks to Theorem 1, previsions can be assessed on arbitrary classes of bounded conditional r.q.'s. In particular, one can fix a conditional prevision without having preassigned any unconditional probability law. In other words, the coherence principle suffices to decide whether a real-valued function, defined on any class of bounded conditional r.q.'s, can be considered a prevision or not. Consequently, conditional probabilities, in de Finetti's theory, need not be evaluated as derivatives of probability measures with respect to probability measures. Moreover, coherent probabilities (respectively, previsions) need not be continuous with respect to monotone sequences of events (respectively, r.q.'s). Apart from these remarks, which in any case stress some peculiarities of de Finetti's approach compared with Kolmogorov's, the main difference is that de Finetti explicitly considers conditional r.q.'s and events given a single event; in many cases, such an event represents an isolated hypothesis whose probability equals zero. In fact, such cases appear frequently enough in probabilistic and statistical practice; cf. [12, p. 276 of the English translation] and [23, p. 66]. On the contrary, Kolmogorov [20, p. 51 of the English translation] asserts that "the concept of conditional probability with regard to an isolated given hypothesis whose probability equals zero is inadmissible." Consequently, in Kolmogorov's theory, conditioning is considered with respect to specific classes of events.

2. Predictive inferences.

2.1. Definition of predictive inference. Roughly speaking, a predictive inference is any family of consistent conditional expectations on a class of r.q.'s, given some facts relating to the realizations of other random elements. As an example, we might consider a predictor (meteorologist, economist, bookmaker, etc.) who makes regular periodic forecasts on the basis of past data. In fact, forecasting is a significant case of predictive inference, and it will help us explain our point of view in a concrete way.

We start by making precise the domain of a predictive inference in the case of sequential observations. This means that observations are made at instants 1, 2, . . . and, at each instant n, one assesses inferences on future facts on the basis of the first n observations. Let X denote the set of all possible outcomes of each element of a sequence of trials. According to Dubins and Savage [15], call the elements of Ω := X^∞ histories, and call the elements of X^n partial histories (n = 1, 2, . . .). Hence, the set Π_n defined by

(2) Π_n = { {(x_1, . . . , x_n)} × X^∞ : (x_1, . . . , x_n) ∈ X^n }

is the partition of Ω whose elements can be identified with partial histories of length n. Then, the domain of a predictive inference in the case of sequential observations is defined to be the class C given by (1), with Ω = X^∞ and Π_n as in (2). In the framework just described, predictive inferences are assigned without any reference to a statistical model involving unknown parameters.
Anyhow, it is common statistical practice to derive previsions on C from a joint assignment of a parametric statistical model and of a prior distribution on its unknown parameters. This situation will be investigated in a forthcoming paper. Here, we take into account only predictive inferences free from parametric superstructures.

Definition 2. Given C as in (1), with Ω = X^∞ and Π_n as in (2), any prevision P on C is said to be a predictive inference (p.i.).

We now provide a characterization of p.i.'s based on Theorem 2. Denoting by I(x_1, . . . , x_n) the indicator of {(x_1, . . . , x_n)} × X^∞, P: C → R is a p.i. if and only if

(i) P(·), P(· | x_1, . . . , x_n) are previsions on L;
(ii) P(I(x_1, . . . , x_n) | x_1, . . . , x_n) = 1;
(iii) P(g I(x_1, . . . , x_k, . . . , x_n) | x_1, . . . , x_k) = P(g | x_1, . . . , x_k, . . . , x_n) × P(I(x_1, . . . , x_k, . . . , x_n) | x_1, . . . , x_k);
(iv) P(g I(x_1, . . . , x_n)) = P(g | x_1, . . . , x_n) P(I(x_1, . . . , x_n))

for every g in L, (x_1, . . . , x_k, . . . , x_n) in X^n and for all naturals k, n with k ≤ n. In the above proposition, as well as throughout the rest of this paper, P(·) stands for P(· | Ω) and P(· | x_1, . . . , x_n) for P(· | {(x_1, . . . , x_n)} × X^∞).

2.2. Example of predictive inference within a Bayesian framework. Let L = {A_1 × · · · × A_n × X^∞ : A_1, . . . , A_n ∈ A, n ∈ N}, where A is some algebra of subsets of X including the singletons. In compliance with standard Bayesian procedure, suppose that for each θ belonging to a specific parameter space Θ, a prevision P_θ(·) is assigned on L in such a way that

P_θ(A_1 × · · · × A_n × X^∞) = ∏_{i=1}^n ∫_{A_i} l(x, θ) λ(dx),

where λ denotes a measure on A (possibly finitely additive) and l: X × Θ → [0, +∞) is such that ∫ l(x, θ) λ(dx) = 1 for every θ ∈ Θ. Throughout the present paper, integrals are meant in the sense of Dunford and Schwartz; cf. [16] and [4].
Moreover, let us assume that

P_θ(A_1 × · · · × A_n × X^∞ | x_1, . . . , x_k) = P_θ(X^k × A_{k+1} × · · · × A_n × X^∞) if n > k and x_i ∈ A_i for i = 1, . . . , k,
P_θ(A_1 × · · · × A_n × X^∞ | x_1, . . . , x_k) = 1 if n ≤ k and x_i ∈ A_i for i = 1, . . . , n, and
P_θ(A_1 × · · · × A_n × X^∞ | x_1, . . . , x_k) = 0 otherwise.

These positions are consistent with the introduction of a sequence of X-valued random elements which, under P_θ(·), are independent and identically distributed. In view of (i)-(iv) of subsection 2.1, P_θ := {P_θ(·), P_θ(· | x_1, . . . , x_n): (x_1, . . . , x_n) ∈ X^n, n ∈ N} is a p.i. for every θ.

Let us now introduce a countably additive prior probability q on a σ-algebra B of subsets of Θ, and assume that θ → l(x, θ) is a B-measurable function for every fixed x ∈ X. Consequently, if

∫_Θ ∏_{i=1}^n l(x_i, θ) q(dθ) ∈ (0, +∞)

for each partial history (x_1, . . . , x_n), then

q(H | x_1, . . . , x_n) = ∫_H ∏_{i=1}^n l(x_i, θ) q(dθ) / ∫_Θ ∏_{i=1}^n l(x_i, θ) q(dθ), H ∈ B,

represents a coherent posterior; cf. [2]. At this stage, a p.i. P can be assessed according to

P(g) = ∫ P_θ(g) q(dθ), P(g | x_1, . . . , x_n) = ∫ P_θ(g | x_1, . . . , x_n) q(dθ | x_1, . . . , x_n)

for all g in L, (x_1, . . . , x_n) in X^n and n ∈ N. Indeed, since P_θ is a p.i. for every θ, P satisfies (i) and (ii). Further, (iii) trivially holds whenever P(I(x_1, . . . , x_k, . . . , x_n) | x_1, . . . , x_k) = 0, so that one can assume that

P(I(x_1, . . . , x_k, . . . , x_n) | x_1, . . . , x_k) = ∏_{i=k+1}^n λ({x_i}) ∫ ∏_{i=k+1}^n l(x_i, θ) q(dθ | x_1, . . . , x_k)

is strictly positive. In that case, for any g ∈ L,

P(g I(x_1, . . . , x_k, . . . , x_n) | x_1, . . . , x_k)
= ∫ P_θ(g I(x_1, . . . , x_k, . . . , x_n) | x_1, . . . , x_k) q(dθ | x_1, . . . , x_k)
= ∏_{i=k+1}^n λ({x_i}) ∫ P_θ(g | x_1, . . . , x_k, . . . , x_n) ∏_{i=k+1}^n l(x_i, θ) q(dθ | x_1, . . . , x_k)
= P(I(x_1, . . . , x_k, . . . , x_n) | x_1, . . . , x_k) × ∫ P_θ(g | x_1, . . . , x_k, . . . , x_n) [ ∏_{i=k+1}^n l(x_i, θ) q(dθ | x_1, . . . , x_k) / ∫ ∏_{i=k+1}^n l(x_i, θ) q(dθ | x_1, . . . , x_k) ]
= P(I(x_1, . . . , x_k, . . . , x_n) | x_1, . . . , x_k) × ∫ P_θ(g | x_1, . . . , x_k, . . . , x_n) q(dθ | x_1, . . . , x_k, . . . , x_n).

Since the last integral represents P(g | x_1, . . . , x_k, . . . , x_n), we have shown that (iii) holds. Finally, since (iv) can be checked in the same way, P turns out to be a p.i.

2.3. Definition and existence of strategic predictive inferences. This subsection deals with special kinds of p.i.'s which satisfy the usual disintegrability condition characterizing Kolmogorovian conditional expectations. According to Dubins and Savage [15], a strategy is a sequence σ = (σ_0, σ_1, . . .) in which σ_0 is a probability on P(X), the power set of X, and, for every n in N, σ_n is a function on X^n which associates a probability on P(X), denoted by σ_n(x_1, . . . , x_n), to every partial history (x_1, . . . , x_n). For any B ⊂ X, σ_n(x_1, . . . , x_n)(B) can be viewed as the conditional probability, under σ, of {x_{n+1} ∈ B}, given {(x_1, . . . , x_n)} × X^∞. If P is a p.i. and L includes

B := {B × X^∞, X^n × B × X^∞ : B ⊂ X, n ∈ N},

then the strategy σ given by

σ_0(B) = P(B × X^∞), σ_n(x_1, . . . , x_n)(B) = P(X^n × B × X^∞ | x_1, . . . , x_n)

is said to be the strategy induced by P.

Definition 3. If L ⊃ B, a p.i. P is said to be strategic if

(3) P(g) = ∫ P(g | x) σ_0(dx), P(g | x_1, . . . , x_n) = ∫ P(g | x_1, . . . , x_n, x) σ_n(x_1, . . . , x_n)(dx)

for every g in L, (x_1, . . . , x_n) in X^n and n in N, where σ is the strategy induced by P. As already noted, equations (3) look like the usual conditions required of conditional expectations in Kolmogorov's theory. However, strategic p.i.'s need not be continuous with respect to monotone sequences of r.q.'s.
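To make the Bayesian construction of subsection 2.2 concrete, here is a minimal sketch assuming X = {0, 1}, i.i.d. Bernoulli(θ) trials, and a Beta(a, b) prior (choices of ours, not made in the paper). The induced one-step predictive probability is the posterior mean of θ.

```python
def beta_bernoulli_predictive(a, b, history):
    # P(x_{n+1} = 1 | x_1, ..., x_n) under a Beta(a, b) prior on theta
    # and i.i.d. Bernoulli(theta) trials: the posterior mean of theta
    n = len(history)
    s = sum(history)
    return (a + s) / (a + b + n)

# uniform prior Beta(1, 1); after observing 1, 1, 0 the predictive
# probability of a further success is (1 + 2) / (2 + 3) = 0.6
print(beta_bernoulli_predictive(1, 1, [1, 1, 0]))  # -> 0.6
```

The family of these predictive probabilities, indexed by partial histories, plays the role of the strategy induced by the mixture p.i. P.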
We also note that, in order to check whether an arbitrary function P: C → R is a strategic p.i., it suffices to verify (3) together with (i) and (ii) of subsection 2.1; indeed, under (i) and (ii), (iii) and (iv) follow from (3). As far as the existence of strategic p.i.'s is concerned, we have the following proposition involving the class L* of the inductively integrable functions on Ω; see [15].

Theorem 5. If B ⊂ L ⊂ L* and σ is a strategy, then there exists a unique strategic p.i. P such that σ is the strategy induced by P.

Proof. This proof relies heavily on pages 12-20 of [15]. In particular, we refer to [15] for the notation, the notion of structure, and the existence and properties of the function E(·, ·). Define

P(g) := E(σ, g), P(g | x_1, . . . , x_n) := E(σ[x_1, . . . , x_n], g x_1 . . . x_n)

for every g in L, (x_1, . . . , x_n) in X^n and n in N. Then P is a strategic p.i. Indeed, according to Theorems 1 and 2, pages 17-18 of [15], P satisfies (3) and (i)-(ii) of subsection 2.1. Next, let P′ be any strategic p.i. inducing σ. We show that P′ = P. If g is constant, then P′(g) = P(g). Likewise, for every n ∈ N, (x_1, . . . , x_n) ∈ X^n and g ∈ L, if g x_1 . . . x_n is constant, then P′(g | x_1, . . . , x_n) = P(g | x_1, . . . , x_n). Fix an ordinal γ > 0, and suppose that P′(g) = P(g) for every g ∈ L with structure less than γ, and P′(g | x_1, . . . , x_n) = P(g | x_1, . . . , x_n) for every n ∈ N, (x_1, . . . , x_n) ∈ X^n and g ∈ L such that g x_1 . . . x_n is of structure less than γ. Let f ∈ L be of structure γ. Then, for every x ∈ X, f x is of structure less than γ. Hence, P′(f | x) = P(f | x), so that strategicity implies

P′(f) = ∫ P′(f | x) σ_0(dx) = ∫ P(f | x) σ_0(dx) = P(f).

Similarly, if f ∈ L and f x_1 . . . x_n is of structure γ, strategicity implies P′(f | x_1, . . . , x_n) = P(f | x_1, . . . , x_n), and this concludes the proof.
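For finite X, the prevision determined by a strategy via Theorem 5 can be computed, for a g depending on the first n coordinates, by iterating (3) with sums in place of integrals. The sketch below is ours: a simplified dict-based strategy interface stands in for Dubins and Savage's E(σ, g).

```python
def prevision(sigma, g, n, history=()):
    """P(g | history) for g depending on the first n coordinates,
    computed by iterating the strategy; sigma(history) returns a dict
    mapping each possible next outcome to its probability (finite X)."""
    if len(history) >= n:
        return g(history[:n])
    return sum(p * prevision(sigma, g, n, history + (x,))
               for x, p in sigma(history).items())

# i.i.d. fair-coin strategy; g = indicator of "two heads in the first two tosses"
fair = lambda history: {0: 0.5, 1: 0.5}
g = lambda xs: 1.0 if xs == (1, 1) else 0.0
print(prevision(fair, g, 2))  # -> 0.25
```

The recursion mirrors the iterated integrals in (3): the prevision given a partial history of length n is the σ_n-average of the previsions given its one-step continuations.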
One can notice that, with respect to significant statistical applications, L* is large enough. For instance, if g is bounded and depends on a fixed finite number of coordinates, then g belongs to L*. Moreover, L* includes bounded functions depending on a random number of coordinates, say τ: Ω → N, provided that τ is a stopping time.

Some significant characteristic aspects of strategic p.i.'s stand out in the following proposition, which involves a condition of conglomerability; see [10], [11], and [14].

Theorem 6. If L is a linear space such that L ⊃ B, then P: C → R is a strategic p.i. if and only if (i), (ii) in subsection 2.1, and

(4) inf_{x∈X} P(g | x) ≤ P(g) ≤ sup_{x∈X} P(g | x),
inf_{x∈X} P(g | x_1, . . . , x_n, x) ≤ P(g | x_1, . . . , x_n) ≤ sup_{x∈X} P(g | x_1, . . . , x_n, x)

hold for every g ∈ L, (x_1, . . . , x_n) in X^n and n in N.

Proof. The "only if" part is trivial, so it suffices to prove that (i), (ii), and (4) imply (3). But, under (i) and (ii), the first of conditions (4) means that P(·) is conglomerable with respect to the partition {{x} × X^∞}, and the second means that P(· | x_1, . . . , x_n) is conglomerable with respect to the partition {X^n × {x} × X^∞}. Thus, since L is a linear space and L ⊃ B, Theorem 3.1 of [3] implies that (4) is equivalent to (3).

2.4. Example of strategic predictive inference. Let X = R. According to [18], [19], in order to describe a situation of extremely vague a priori knowledge, a set of reasonable assumptions is:

(a) P({ω: x_i ≠ x_j for i ≠ j and i, j ≤ n}) = 1;
(b) P(B_{j_1} × · · · × B_{j_n} × R^∞) = P(B_1 × · · · × B_n × R^∞) for every permutation (j_1, . . . , j_n) of (1, . . . , n) and every n-tuple of intervals of the form B_i = (−∞, x_i];
(c) P(R^n × I_i × R^∞ | x_1, . . . , x_n) = 1/(n+1) for every partial history (x_1, . . . , x_n) without ties, I_i denoting the open interval (x_(i−1), x_(i)), i = 1, . . . , n + 1, with x_(0) = −∞, x_(n+1) = +∞ and x_(1), . . . , x_(n) the order statistics of x_1, . . . , x_n.

The set of conditions (a), (b), and (c), assessed for a particular n, represents Hill's A_n-model. We now single out a strategic p.i. satisfying A_n for every n. Following [6] and [19], one assigns any strategy σ such that

σ_0((−∞, x)) = σ_0((x, +∞)) = 1/2 for every x in R, and
σ_n(x_1, . . . , x_n) = (1/(n+1)) ( σ_0 + Σ_{i=1}^n d_{x_i} ),

where d_x is a probability for which d_x((x − ε, x)) = d_x((x, x + ε)) = 1/2 for every ε > 0. In view of Theorem 5, σ admits a unique extension P as a strategic p.i. and, moreover, it is easy to show that P meets A_n for every n.

Clearly, the previous P is not countably additive. In fact, it is shown in [18] that A_n is incompatible with countable additivity in the framework of Kolmogorov's theory of conditional expectations. We now show that, if strategicity is not required, then a countably additive p.i. agreeing with A_n exists. By "a countably additive p.i." we mean a p.i. P such that P(·) and every P(· | x_1, . . . , x_n) are countably additive on L, or at least on some relevant subalgebra of L. Indeed, let L′ = {B × R^∞: B Borel set in R^n, n ∈ N}, and let {P′(·), P′(· | x_1, . . . , x_n)} be any family of countably additive probabilities on L′ satisfying (ii) of subsection 2.1 and

P′(I(x_1, . . . , x_k)) = P′(I(x_1, . . . , x_k, . . . , x_n) | x_1, . . . , x_k) = 0

for every (x_1, . . . , x_k, . . . , x_n) in X^n and every k < n. In view of (i)-(iv), P′ is a p.i. Furthermore, since the single probabilities composing P′ can be chosen independently of one another, it is plain that they can be chosen in such a way that A_n holds for every n.

3. Well calibrated predictive inferences.

3.1. Definition of well calibrated predictive inference. After stating the definition of predictive inference, we now deal with one of the checks of empirical validity which one usually applies to appreciate the "correctness" of the forecasting system. Indeed, the present section is concerned with the concept of well calibrated inference.

Throughout this section one assumes that L = {S × X^∞: S ⊂ X^n, n ∈ N}, the set of all cylinders with finite-dimensional base, and that a p.i. P has been assessed on the corresponding class C. Let us introduce sequences (E_n) and (h_n) such that

E_1 = B_1 × X^∞, E_2 = X × B_2 × X^∞, E_3 = X^2 × B_3 × X^∞, . . .

with B_n ⊂ X for each n, and h_n: Ω → {0, 1} depends only on the first (n − 1) coordinates of each history ω ∈ Ω whenever n > 1, while h_1 is constant. Then, for every ω = (x_1, x_2, . . .), let us put

ν_n(ω) = Σ_{i=1}^n h_i(ω), n = 1, 2, . . . ,
p_n(ω) = (1/ν_n(ω)) Σ_{i=1}^n h_i(ω) E_i(ω) if ν_n(ω) ≠ 0, and p_n(ω) = 0 if ν_n(ω) = 0,
π_n(ω) = (1/ν_n(ω)) Σ_{i=1}^n h_i(ω) P(E_i | x_1, . . . , x_{i−1}) if ν_n(ω) ≠ 0, and π_n(ω) = 0 if ν_n(ω) = 0,

where P(E_i | x_1, . . . , x_{i−1}) := P(E_1) if i = 1.

Here is a real situation in which h_n, ν_n, p_n, and π_n acquire a concrete meaning. If a meteorologist, every evening, assigns a probability to the event of precipitation within the next 24 hours, then E_n can be regarded as the event of precipitation on day n. Under these circumstances, the following is a plausible choice for h_n: after fixing (p, δ) in [0, 1] × (0, +∞), set

(5) h_n(ω) = 1 if |P(E_n | x_1, . . . , x_{n−1}) − p| < δ, and h_n(ω) = 0 otherwise.

In other words, h_n (n ≥ 2) selects day n if and only if the conditional probability of precipitation on day n, given the partial history (x_1, . . . , x_{n−1}), is suitably close to p. Consequently, ν_n represents the number of days in {1, . . . , n} on which the forecaster assigns an inference suitably close to p, π_n is the mean of the inferences assessed on those very same days, and p_n the corresponding frequency of precipitation. A possible requirement for a p.i. is that p_n and π_n have the same asymptotic behavior.
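The statistics ν_n, p_n, π_n under the selection rule (5) can be sketched as follows (the binary-outcome setting and all variable names are our own illustration):

```python
def calibration_stats(forecasts, outcomes, p, delta):
    """Compute (nu_n, p_n, pi_n) for the selection rule (5):
    day i is selected iff |forecast_i - p| < delta.
    forecasts[i] is the assessed probability of E_{i+1} given the
    first i outcomes; outcomes[i] is 0 or 1."""
    selected = [(f, e) for f, e in zip(forecasts, outcomes)
                if abs(f - p) < delta]
    nu = len(selected)
    if nu == 0:
        return 0, 0.0, 0.0
    p_n = sum(e for _, e in selected) / nu    # relative frequency on selected days
    pi_n = sum(f for f, _ in selected) / nu   # mean assessed forecast on those days
    return nu, p_n, pi_n

forecasts = [0.7, 0.3, 0.72, 0.68, 0.1]
outcomes  = [1,   0,   1,    0,    0]
print(calibration_stats(forecasts, outcomes, p=0.7, delta=0.05))
```

Here three days are selected (forecasts 0.7, 0.72, 0.68), giving ν = 3, a frequency p_n = 2/3, and a mean forecast π_n = 0.7.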
Clearly, this requires attaching a probability to the events

E = {ω: ν_n(ω) → +∞ as n → +∞},
F = {ω ∈ E: p_n(ω) − π_n(ω) does not tend to 0 as n → +∞},

which usually are not finite-dimensional cylinders. Consequently, the p.i. P, assessed on C, must be extended. Formally, according to [7], if P′ is a p.i. on {g | H: g ∈ L ∪ {E, F}, H ∈ H} agreeing with P on C, and if P′(E) > 0, then P′ is said to be well calibrated when

(6) P′(F) = 0.

Note that (6) trivially holds whenever P′(E) = 0.

Even if the previous definition seems intuitively appealing, it cannot be considered an operational one, since no human being can, in general, decide whether an inference is well calibrated or not. This state of things depends on the fact that the sets E and F involve limit conditions which, in general, cannot be observed. Hence, as a property of statistical procedures, well calibration is concretely unimportant. At the very most, it expresses an unconfirmable belief which, as we will show shortly, is usually independent of the specification of P. These remarks lead us to introduce a different notion of well calibration, based on a finitary, observable condition, which will be called finitary well calibration. More precisely: a p.i. P on C is said to be finitarily well calibrated if

(7) inf_{ε,c>0} lim_n inf_k P({ω: max_{n≤j≤n+k} |p_j(ω) − π_j(ω)| ≤ ε, ν_n(ω) > c}) = inf_{c>0} lim_n P({ω: ν_n(ω) > c})

whenever inf_{c>0} lim_n P({ω: ν_n(ω) > c}) > 0. Condition (7) is a bit cumbersome but, unlike (6), can be tested just by relying on the p.i. P actually assessed on C, without selecting any of its extensions. Of course, the need for (7) is less pressing whenever one is forced to adopt a particular extension, for instance, if one is committed to fixing a countably additive P and then extending it in a countably additive way. In this connection, note that if P′ is countably additive, then (6) and (7) are equivalent.
Note also that the requirement inf_{c>0} lim_n P({ω: ν_n(ω) > c}) > 0, like its nonfinitary counterpart P′(E) > 0, is logically necessary to make (7) meaningful. Examples 2 and 3 will show that well calibration does not imply finitary well calibration, nor does finitary well calibration imply well calibration.

3.2. Well calibration and Kolmogorov's theory. In this subsection, P is assumed to be strategic and σ denotes the strategy induced by P. Moreover, P′(·) and P′(· | x_1, . . . , x_n) stand for the so-called Lebesgue-like extensions of P(·) and P(· | x_1, . . . , x_n), respectively. We refer to [13] and [22] for the definition and properties of Lebesgue-like extensions. For our purposes it suffices to note that, setting C′ = {g | H: g ∈ L′, H ∈ H}, where L′ is the σ-algebra generated by L, P′ is a strategic p.i. on C′. A further remark is that, under the usual assumptions of countable additivity and measurability, P′ coincides with the countably additive extension of P. More precisely, let D be a σ-algebra of subsets of X, and suppose that σ_0 and every σ_n(x_1, . . . , x_n) are countably additive when restricted to D, and that (x_1, . . . , x_n) → σ_n(x_1, . . . , x_n)(B) is a D^n-measurable function on X^n for every n and B ∈ D. Then P(·) and every P(· | x_1, . . . , x_n) are countably additive on L** := {S × X^∞: S ∈ D^n, n ∈ N}, and also, denoting by D^∞ the σ-algebra generated by L**, P′(·) and P′(· | x_1, . . . , x_n) coincide on D^∞ with the countably additive extensions of P(·) and P(· | x_1, . . . , x_n) to D^∞.

To sum up, P′ is a strategic extension of P and, in case P is assessed in line with Kolmogorov's theory, P′ is just the Kolmogorovian extension of P. We now show that, in addition, well calibration holds with respect to P′. (What follows is an attempt to make the notation less cumbersome: whenever a function g on Ω depends only on the first n coordinates of each history ω = (x_1, . . . , x_n, . . .), g(x_1, . . . , x_n) is sometimes used instead of g(ω).)

Theorem 7. Let P be a strategic p.i. on C, and let P′ be its Lebesgue-like extension. Then P′(F) = 0.

Proof. With reference to, and using the same notation as, Theorem 6.2 in [5, p. 353], let us define Y_n := h_n E_n and

M_n(x_1, . . . , x_{n−1}) := P(Y_n | x_1, . . . , x_{n−1}) = h_n(x_1, . . . , x_{n−1}) P(E_n | x_1, . . . , x_{n−1}),
V_n(x_1, . . . , x_{n−1}) := P((Y_n − M_n)^2 | x_1, . . . , x_{n−1}).

Let A = {ω: Σ_{i=1}^∞ V_i(ω) = +∞}. Since (Y_n − M_n)^2 ≤ h_n, one obtains Σ_{i=1}^n V_i ≤ Σ_{i=1}^n h_i = ν_n. Thus, setting

a_{n,ε}(ω) = ( Σ_{i=1}^n V_i(ω) )^{1/2} ( log Σ_{i=1}^n V_i(ω) )^{1/2+ε}, ε > 0,

then a_{n,ε}(ω) ≤ ν_n(ω) whenever ω ∈ A and n is sufficiently large. Now, Chen's theorem states that Σ_{i=1}^n (Y_i − M_i) converges P′(·)-a.s. on A^c, and that (a_{n,ε})^{−1} Σ_{i=1}^n (Y_i − M_i) → 0, P′(·)-a.s. on A. Hence, noting that Σ_{i=1}^n (Y_i − M_i) = ν_n (p_n − π_n), one obtains

P′(E) = P′({ω ∈ E ∩ A: a_{n,ε}(ω)^{−1} ν_n(ω) (p_n(ω) − π_n(ω)) → 0}) + P′({ω ∈ E ∩ A^c: ν_n(ω) (p_n(ω) − π_n(ω)) converges})
≤ P′({ω ∈ E ∩ A: p_n(ω) − π_n(ω) → 0}) + P′({ω ∈ E ∩ A^c: p_n(ω) − π_n(ω) → 0}) = P′(E\F).

In view of the previous remarks, Theorem 7 directly yields the following corollary, due to Dawid [7].

Corollary. P′(F) = 0 whenever P′ is assessed according to the Kolmogorov theory of conditional probability.

In order to appreciate the contribution of de Finetti's theory to the analysis of the concept of well calibrated inference, let us quote some of the comments made by Dawid on the above proposition. Even if P′(F) = 0, "in practice . . . it is rare for probability forecasts to be well calibrated (so far as can be judged from finite experience) and no realistic forecaster would believe too strongly in his own calibration performance. We have a paradox: an event can be distinguished (easily, and indeed in many ways) that is given subjective probability one and yet is not regarded as 'morally certain'" [7, p. 608].
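As an illustration consistent with Theorem 7 (a simulation of ours, not part of the paper): when the forecasts coincide with the true probability of an i.i.d. Bernoulli source and every day is selected (h_n ≡ 1), π_n equals that probability while p_n, the running frequency, approaches it.

```python
import random

# Simulate an i.i.d. Bernoulli(0.3) sequence whose forecast each day
# equals the true probability 0.3; with h_n = 1 on every day, pi_n
# stays at 0.3 and p_n is the running relative frequency.
random.seed(0)
true_p = 0.3
n = 100_000
outcomes = [1 if random.random() < true_p else 0 for _ in range(n)]
p_n = sum(outcomes) / n   # empirical frequency over the selected days
pi_n = true_p             # mean of the assessed forecasts
print(abs(p_n - pi_n))    # small for large n, as Theorem 7 suggests
```

Of course, this only exhibits calibration under one particular countably additive extension; the point of the next subsection is precisely that coherence alone does not force this behavior.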
Subsequently, Dawid concludes that the previous alleged paradox has "destructive implications for the theory of coherence" [7, abstract]. In fact, this line of reasoning rests on a misunderstanding: Dawid thinks that the only coherent p.i.'s are those assessed in conformity with Kolmogorov's notion of conditional probability. Hence, he considers the thesis of the corollary to Theorem 7 as the only one compatible with the coherence principle. It is then about time to examine carefully what really happens, regarding well calibration, within the theory of coherence.

3.3. Well calibration and the theory of coherence. We begin by showing that, under very weak assumptions, a coherent forecaster can either expect to be well calibrated or, instead, give strictly positive probability to the event $F$ of miscalibration.

Theorem 8. Let $P$ be a p.i. on $\mathcal{C}$, and let $P'$ be any coherent extension of $P$ to $\{g \mid H\colon g \in \mathcal{L} \cup \{E\},\ H \in \mathcal{H}\}$. If
(j) $F \cap A \ne \varnothing$ and $(E \setminus F) \cap A \ne \varnothing$ for every $A$ in $\mathcal{L}$ with $A \cap E \ne \varnothing$,
then, given $\alpha \in [0, P'(E)]$, there is a further coherent extension $P''$ of $P'$ such that $P''(F) = \alpha$. Moreover, $P'$ can be taken such that $P'(E) = 1$ whenever
(jj) $E \cap A \ne \varnothing$ for every $A$ in $\mathcal{L} \setminus \{\varnothing\}$.

Proof. We need the following claim.

Claim. Let $\mathcal{A}$ be an algebra of subsets of $\Omega$, $B \in \mathcal{A}$ and $V \subset B$. If $D \cap V \ne \varnothing$ whenever $D \in \mathcal{A}$ and $D \cap B \ne \varnothing$, then every probability $\mu$ on $\mathcal{A}$ can be extended to a probability $\mu'$ on $\mathcal{A} \cup \{V\}$ in such a way that $\mu'(V) = \mu(B)$.

In fact, for $C \subset \Omega$ let $\mu^*(C) = \inf\{\mu(H)\colon C \subset H \in \mathcal{A}\}$. Then, since no proper subset of $B$ belonging to $\mathcal{A}$ can cover $V$, it must be that $\mu^*(V) = \mu(B)$. Hence, the claim follows from Theorem 3.3.3 of [4, p. 73].

Now, let $\mathcal{A}$ be the algebra generated by $\mathcal{L} \cup \{E\}$, $\mathcal{G}$ the algebra generated by $\mathcal{A} \cup \{F\}$, and $Q$ any coherent extension of $P'$ to $\{g \mid H\colon g \in \mathcal{A},\ H \in \mathcal{H}\}$.
We now show that there are two p.i.'s $Q_1$ and $Q_2$, extending $Q$ to $\mathcal{C}^* = \{g \mid H\colon g \in \mathcal{G},\ H \in \mathcal{H}\}$, such that $Q_1(F) = 0$ and $Q_2(F) = P'(E)$. After proving this, it is an easy consequence of the principle of coherence that, for every $\alpha \in [0, P'(E)]$, there is a p.i. $P''$ on $\mathcal{C}^*$, extending $Q$ and such that $P''(F) = \alpha$.

Let us start with $Q_1$. By (j) and the Claim (applied with $B = E$ and $V = E \setminus F$), there is a probability on $\mathcal{G}$, say $S(\cdot)$, extending $Q(\cdot)$ and such that $S(F) = 0$. Let $Z$ be the set of partial histories $(x_1, \dots, x_n)$ such that $Q(I(x_1, \dots, x_n)) = 0$. Being a "part" of a p.i., $\{Q(\cdot \mid x_1, \dots, x_n)\colon (x_1, \dots, x_n) \in Z\}$ is a coherent family of probabilities on $\mathcal{A}$. Hence, by Theorem 1, it can be extended to a coherent family of probabilities on $\mathcal{G}$, say $\{S(\cdot \mid x_1, \dots, x_n)\colon (x_1, \dots, x_n) \in Z\}$. Let $Q_1$ be the function on $\mathcal{C}^*$ defined by $Q_1(\cdot) = S(\cdot)$, $Q_1(\cdot \mid x_1, \dots, x_n) = S(\cdot \mid x_1, \dots, x_n)$ for $(x_1, \dots, x_n) \in Z$, and $Q_1(g \mid x_1, \dots, x_n) = S(g\, I(x_1, \dots, x_n)) / Q(I(x_1, \dots, x_n))$ for $g \in \mathcal{G}$ and $(x_1, \dots, x_n) \notin Z$. By (i)–(iv) of subsection 2.1, $Q_1$ is easily seen to be a p.i. on $\mathcal{C}^*$. Moreover, the existence of $Q_2$ can be shown precisely as that of $Q_1$, after interchanging $F$ with $E \setminus F$. This proves the first part of the theorem. As to the remaining part, it can be argued along the same lines as the first.

Condition (j) of Theorem 8 states that $F$ and $E \setminus F$ are tail subsets of $E$, in the sense that a point of $E$ belongs to $F$ or to $E \setminus F$ independently of its first $n$ coordinates, whatever $n$ may be. Furthermore, Theorem 8 definitively dispels the doubts that Dawid raised about the value of the theory of coherence. As a matter of fact, coherence does not compel one to ignore the event of miscalibration. This situation, as the following example witnesses, can occur even within very common statistical contexts.

Example 1.
Let $X = \mathbf{R}$ and let $\sigma$ be any strategy such that, when restricted to $\{(-\infty, x]\colon x \in \mathbf{R}\}$, $\sigma_0$ coincides with a normal law with parameters $(0, 2)$ and $\sigma_n(x_1, \dots, x_n)$ coincides with a normal law with parameters $\big((n/(n+1))\, \bar{x}_n,\ (n+2)/(n+1)\big)$, where $\bar{x}_n = \sum_{i=1}^n x_i / n$. The above $\sigma$ is consistent with the model of a sequence of independent $N(\theta, 1)$ random variables, $\theta$ being a $N(0, 1)$ random variable. Let $P$ be the strategic p.i. induced by $\sigma$ on $\mathcal{C}$, let $B_n = (-\infty, 0]$, and let $h_n$ be defined as in (5) with $p = \frac{1}{2}$ and $\delta$ any positive number. For $\omega = (x_1, \dots, x_n, \dots)$, let $\bar{x}_n(\omega) = \sum_{i=1}^n x_i / n$. If $\omega$ is such that $\bar{x}_n(\omega) \to 0$, then $\omega$ belongs to $E$. Moreover, if $\omega'$ coincides with $\omega$ up to a finite number of coordinates, then $\omega' \in E$. Hence condition (jj) holds. Fix $A \in \mathcal{L}$ with $A \cap E \ne \varnothing$, and let $(y_1, \dots, y_m)$ be such that the history $(y_1, \dots, y_m, \omega) := (y_1, \dots, y_m, x_1, x_2, \dots)$ is in $A$ for all $\omega \in X^\infty$. Taking $\omega$ such that $x_i > 0$ for all $i$ and $\bar{x}_n(\omega) \to 0$, one has $(y_1, \dots, y_m, \omega) \in A \cap F$. Likewise, choosing $\omega$ such that $x_{2i} > 0$, $x_{2i+1} < 0$ and $\bar{x}_n(\omega) \to 0$ sufficiently fast, $(y_1, \dots, y_m, \omega) \in A \cap (E \setminus F)$. Hence condition (j) holds, too. In view of Theorem 8, coherent extensions of $P$ can be found, say $P_1$ and $P_2$, such that $P_1(E) = P_2(E) = 1$, $P_1(F) > 0$ and $P_2(F) = 0$. Plainly, $P_2$ is well calibrated and $P_1$ is not.

The following proposition provides conditions for a p.i. to be finitarily well calibrated. In spite of the formal analogy with Theorem 7, such a proposition is definitely more consistent with the finitary nature of real statistical problems. In fact, it does not assume any hypothesis on possible infinite-dimensional extensions of $P$, whereas Theorem 7 is proved under the adoption of a particular infinite-dimensional extension of $P$. On the other hand, the proof of the new proposition involves some fine technical devices which the proof of Theorem 7 does not actually require.

Theorem 9. Let $P$ be a strategic p.i.
on $\mathcal{C}$, such that
$$\lim_n P\big(\big\{\omega\colon \nu_n(\omega) > c\big\}\big) = 1 \quad \text{for every } c > 0.$$
Then, $P$ is finitarily well calibrated.

Proof. Since the proof is quite long, let us first state three claims, whose proofs will be postponed. Moreover, since $P$ is strategic, according to Theorem 5 it can be uniquely extended to a strategic p.i. on $\{g \mid H\colon g$ is a bounded r.q. depending only on a fixed finite number of coordinates, $H \in \mathcal{H}\}$. Let us denote such an extension again by $P$. For $\omega = (x_1, x_2, \dots)$ and $i \in \mathbf{N}$, set
$$\psi_i(\omega) = E_i(\omega) - P(E_i \mid x_1, \dots, x_{i-1}), \qquad y_i(\omega) = \begin{cases} h_i(\omega)/\nu_i(\omega) & \text{if } \nu_i(\omega) > 0, \\ 0 & \text{otherwise.} \end{cases}$$
Further, let
$$M_{jn}(\omega) = \begin{cases} \sum_{i=j}^n y_i(\omega)\, \psi_i(\omega) & \text{if } j \le n, \\ 0 & \text{otherwise.} \end{cases}$$
Note that, for $r \le q \le s$, $M_{rs} = M_{rq} + M_{q+1,s}$.

Claim 1. For $1 \le r < s \le n$, $P(y_r \psi_r\, y_s \psi_s) = 0$.

Claim 2. For every $m, n \in \mathbf{N}$ with $m \le n$ and every $a > 0$,
$$P\Big(\bigcup_{j=1}^m \big\{|M_{jn}| > a\big\}\Big) \le \frac{m}{a^2} \sum_{i=1}^\infty \frac{1}{i^2}.$$

Claim 3. For every $\varepsilon, \delta > 0$ there is $n_0$ such that, for each $m, n, k \in \mathbf{N}$ with $n \ge m \ge n_0$,
$$P\big(\big\{|M_{j,n+i}| \le \varepsilon,\ m \le j \le n+i,\ 0 \le i \le k\big\}\big) > 1 - \delta.$$

We now prove that Claims 2–3 imply Theorem 9. Let $U_n(\omega) = p_n(\omega) - \pi_n(\omega)$ for all $n \in \mathbf{N}$ and $\omega \in \Omega$. Since $\nu_n \to +\infty$ in probability, it suffices to show that
$$\forall\, \varepsilon, \delta > 0\ \exists\, n_1 \in \mathbf{N}\colon\quad n \ge n_1 \Longrightarrow P\big(\big\{|U_n| \le \varepsilon, \dots, |U_{n+k}| \le \varepsilon\big\}\big) > 1 - \delta \quad \forall\, k \in \mathbf{N}.$$
Note that $U_n = 0$ whenever $\nu_n = 0$, and otherwise
$$U_n = \frac{1}{\nu_n} \sum_{i=1}^n \nu_i y_i \psi_i = \frac{1}{\nu_n} \sum_{i=1}^n \sum_{j=1}^i h_j y_i \psi_i = \frac{1}{\nu_n} \sum_{j=1}^n \sum_{i=j}^n h_j y_i \psi_i = \frac{1}{\nu_n} \sum_{j=1}^n h_j M_{jn}.$$
Hence, after fixing $\varepsilon, c > 0$ and $n, m, k \in \mathbf{N}$ with $m \le n$, one obtains
$$P\big(|U_n| \le \varepsilon, \dots, |U_{n+k}| \le \varepsilon\big) = P\Bigg(\bigcap_{i=0}^k \bigg\{\Big|\frac{1}{\nu_{n+i}} \sum_{j=1}^m h_j M_{j,n+i} + \frac{1}{\nu_{n+i}} \sum_{j=m+1}^{n+i} h_j M_{j,n+i}\Big| \le \varepsilon\bigg\}\Bigg)$$
$$\ge P\Bigg(\bigcap_{i=0}^k \bigg\{\Big|\frac{1}{\nu_{n+i}} \sum_{j=1}^m h_j M_{j,n+i}\Big| \le \frac{\varepsilon}{2},\ \Big|\frac{1}{\nu_{n+i}} \sum_{j=m+1}^{n+i} h_j M_{j,n+i}\Big| \le \frac{\varepsilon}{2}\bigg\}\Bigg)$$
$$\ge P\Bigg(\bigcap_{i=0}^k \bigg\{\Big|\frac{1}{\nu_{n+i}} \sum_{j=1}^m h_j M_{j,n+i}\Big| \le \frac{\varepsilon}{2}\bigg\}\Bigg) + P\big(\big\{|M_{j,n+i}| \le \tfrac{\varepsilon}{2},\ m+1 \le j \le n+i,\ 0 \le i \le k\big\}\big) - 1$$
$$\ge P\Bigg(\bigcap_{i=0}^k \Big\{\frac{mc}{\nu_{n+i}} \le \frac{\varepsilon}{2}\Big\} \cap \big\{|M_{j,n+i}| \le c,\ 1 \le j \le m,\ 0 \le i \le k\big\}\Bigg) + P\big(\big\{|M_{j,n+i}| \le \tfrac{\varepsilon}{2},\ m \le j \le n+i,\ 0 \le i \le k\big\}\big) - 1$$
$$\ge P\Big(\Big\{\frac{mc}{\nu_n} \le \frac{\varepsilon}{2}\Big\}\Big) - P\Big(\bigcup_{j=1}^m \Big\{|M_{jn}| > \frac{c}{2}\Big\}\Big) - P\Big(\bigcup_{i=1}^k \Big\{|M_{n+1,n+i}| > \frac{c}{2}\Big\}\Big) + P\big(\big\{|M_{j,n+i}| \le \tfrac{\varepsilon}{2},\ m \le j \le n+i,\ 0 \le i \le k\big\}\big) - 1. \eqno{(8)}$$
Now, four probabilities are involved in (8). Let $\delta > 0$. By Claim 3, there is $n_0$ such that, for every $m \ge n_0$ and every $k$ and $n \ge m$, the third probability is less than $\delta/4$ whenever $c > \varepsilon$, while the fourth is greater than $1 - \delta/4$. Fix $m \ge n_0$. By Claim 2, there is $c > \varepsilon$ such that the second probability is less than $\delta/4$. Finally, since $\nu_n \to +\infty$ in probability, one can take $n_1$ sufficiently large so that the first probability is greater than $1 - \delta/4$ for all $n \ge n_1$.

In what follows, for $n \in \mathbf{N}$, $\gamma_n$ denotes the probability on the power set of $X^n$ given by $\gamma_n(S) = P(S \times X^\infty)$ for all $S \subset X^n$.

Proof of Claim 1 of Theorem 9. By strategicity, and since $P(\psi_s \mid x_1, \dots, x_{s-1}) = 0$,
$$P(y_r y_s \psi_r \psi_s) = \int P(y_r y_s \psi_r \psi_s \mid x_1, \dots, x_{s-1})\, d\gamma_{s-1} = \int y_r(x_1, \dots, x_{r-1})\, y_s(x_1, \dots, x_{s-1})\, \psi_r(x_1, \dots, x_r)\, P(\psi_s \mid x_1, \dots, x_{s-1})\, d\gamma_{s-1} = 0.$$

Proof of Claim 2 of Theorem 9. By Claim 1 and Chebyshev's inequality,
$$P\Big(\bigcup_{j=1}^m \big\{|M_{jn}| > a\big\}\Big) \le \sum_{j=1}^m \frac{1}{a^2}\, P\big(M_{jn}^2\big) = \sum_{j=1}^m \frac{1}{a^2}\, P\Bigg(\Big(\sum_{i=j}^n y_i \psi_i\Big)^2\Bigg) = \sum_{j=1}^m \frac{1}{a^2}\, P\Big(\sum_{i=j}^n y_i^2 \psi_i^2\Big) \le \frac{m}{a^2} \sum_{i=1}^\infty \frac{1}{i^2}.$$

Proof of Claim 3 of Theorem 9. Fix $\varepsilon > 0$ and $m, n, k \in \mathbf{N}$ with $n \ge m$. Set
$$C_1 = \big\{|M_{jn}| \le \varepsilon,\ m \le j \le n\big\}, \qquad C_2 = \big\{|M_{n+1,n+k}| \le \varepsilon\big\}, \qquad C_3 = \big\{|M_{j,n+k}| \le \varepsilon,\ n+2 \le j \le n+k\big\}.$$
For $\omega \in C_1 \cap C_2 \cap C_3$, $m \le j \le n+i$ and $0 \le i \le k$, it must be that $|M_{j,n+i}(\omega)| \le 3\varepsilon$, since
$$M_{m+p,n+i} = M_{m+p,n} + M_{n+1,n+i} = M_{m+p,n} + M_{n+1,n+k} - M_{n+i+1,n+k} \quad \text{for } p = 0, \dots, n-m \text{ and } i = 0, \dots, k,$$
and
$$M_{n+1+p,n+i} = M_{n+1+p,n+k} - M_{n+i+1,n+k} \quad \text{for } p = 0, \dots, i-1 \text{ and } i = 1, \dots, k.$$
Hence, it suffices to show that
$$\forall\, \varepsilon, \delta > 0\ \exists\, n_0 \text{ such that } \forall\, r, s \in \mathbf{N} \text{ with } s \ge r \ge n_0\colon\quad P\big(\big\{|M_{js}| \le \varepsilon,\ r \le j \le s\big\}\big) > 1 - \delta. \eqno{(9)}$$
We now prove (9). To this end, some new notation is needed: $A_p = \{\omega\colon \nu_r(\omega) = p\}$ for $p = 0, \dots$
, r, and
$$\rho(i)(\omega) = \min\big\{q\colon \nu_q(\omega) = \nu_r(\omega) + i\big\}, \qquad f(t)(\omega) = \rho\big(t - \nu_r(\omega)\big)(\omega)$$
[roughly, $f(t)(\omega)$ is the first index $q$ such that $\nu_q(\omega) = t$],
$$t(j)(\omega) = \max\Big\{r,\ f\big(\lfloor\sqrt{\nu_j(\omega)}\rfloor^2\big)\Big\}, \qquad \alpha(p) = \lfloor\sqrt{p}\rfloor + 1 \quad \text{for } p = 0, \dots, r, \qquad \beta = \lfloor\sqrt{s}\rfloor - 1,$$
$$D_p = \big\{\alpha(p)^2,\ (\alpha(p)+1)^2,\ \dots,\ (\beta+1)^2\big\}.$$
Then,
$$P\Big(\bigcup_{j=r}^s \big\{|M_{js}| > \varepsilon\big\}\Big) \le P\Big(\exists\, j = r, \dots, s \text{ with } |M_{t(j),s}| > \frac{\varepsilon}{2},\ \text{or } \exists\, j = r, \dots, s \text{ with } |M_{js} - M_{t(j),s}| > \frac{\varepsilon}{2}\Big)$$
$$\le P\Big(|M_{rs}| > \frac{\varepsilon}{2}\Big) \eqno{(*)}$$
$$+ \sum_{p=0}^r P\Big(\exists\, j = r+1, \dots, s \text{ with } A_p\, |M_{t(j),s}| > \frac{\varepsilon}{2}\Big) \eqno{(**)}$$
$$+ \sum_{p=0}^r P\Big(\exists\, j = r+1, \dots, s \text{ with } A_p\, |M_{js} - M_{t(j),s}| > \frac{\varepsilon}{2}\Big). \eqno{(*\!*\!*)}$$
We treat $(*)$, $(**)$, and $(*\!*\!*)$ separately.

Let us start with $(*)$. By Claim 1, and for every fixed $k \in \mathbf{N}$,
$$P\Big(|M_{rs}| > \frac{\varepsilon}{2}\Big) \le \frac{4}{\varepsilon^2}\, P\Big(\sum_{i=r}^s y_i^2\Big) \le \frac{4}{\varepsilon^2} \Bigg(\sum_{i=1}^\infty \frac{1}{i^2}\, P\big(\{\nu_r \le k\}\big) + \sum_{i=k+1}^\infty \frac{1}{i^2}\, P\big(\{\nu_r > k\}\big)\Bigg).$$
Thus, since $\nu_r \to +\infty$ in probability, it is easily verified that $(*)$ converges to 0 as $r \to +\infty$, uniformly in $s$.

Let us now turn to $(**)$:
$$(**) \le \sum_{p=0}^r \sum_{q^2 \in D_p} P\Bigg(A_p\, \Big|\sum_{i=\rho(q^2-p)}^s y_i \psi_i\Big| > \frac{\varepsilon}{2}\Bigg) \le \sum_{p=0}^r \sum_{q^2 \in D_p} \frac{4}{\varepsilon^2}\, P\Bigg(A_p \Big(\sum_{i=\rho(q^2-p)}^s y_i \psi_i\Big)^2\Bigg) = \sum_{p=0}^r \sum_{q^2 \in D_p} \frac{4}{\varepsilon^2}\, P\Bigg(A_p \sum_{i=\rho(q^2-p)}^s y_i^2 \psi_i^2\Bigg), \eqno{(10)}$$
where the last equality will be proved later. Then,
$$(**) \le \sum_{p=0}^r \frac{4}{\varepsilon^2} \sum_{i=\alpha(p)}^\infty \sum_{j=0}^\infty \frac{1}{(i^2+j)^2}\, P(A_p) \le \sum_{p=0}^r \frac{4}{\varepsilon^2} \sum_{i=\alpha(p)}^\infty \Big(\frac{1}{i^4} + \frac{1}{i^2}\Big)\, P(A_p) = \sum_{p=0}^r a(p)\, P(A_p),$$
where $a(p) = \sum_{i=\alpha(p)}^\infty (4/\varepsilon^2)\, (1/i^4 + 1/i^2)$. Moreover, for $0 \le M < r$,
$$\sum_{p=0}^r a(p)\, P(A_p) = \sum_{p=0}^M a(p)\, P(A_p) + \sum_{p=M+1}^r a(p)\, P(A_p) \le \sum_{p=0}^M a(p)\, P(A_p) + a(M+1) \le \sup_p a(p)\, \big(1 - P(\{\nu_r > M\})\big) + a(M+1).$$
Now, since $a(p) \downarrow 0$ and $\nu_r \to +\infty$ in probability, there is $M$ such that $a(M+1) < \varepsilon$ and, having fixed such an $M$, there is $r_0$ such that $P(\{\nu_r > M\}) > 1 - \varepsilon$ for all $r \ge r_0$. To sum up, $(**)$ converges to 0 as $r \to +\infty$, uniformly in $s$.

We still have to prove (10). Let $B_t = \{\rho(q^2 - p) = t\}$ and $B = \bigcup_{t=p+1}^{s-1} B_t$.
Then,
$$P\Bigg(A_p B \sum_{\rho(q^2-p) \le i < j \le s} y_i \psi_i\, y_j \psi_j\Bigg) = \sum_{t=p+1}^{s-1} \sum_{t \le i < j \le s} P\big(A_p B_t\, y_i \psi_i\, y_j \psi_j\big) = \sum_{t=p+1}^{s-1} \sum_{t \le i < j \le s} \int P\big(A_p B_t\, y_i \psi_i\, y_j \psi_j \mid x_1, \dots, x_{j-1}\big)\, d\gamma_{j-1}$$
$$= \sum_{t=p+1}^{s-1} \sum_{t \le i < j \le s} \int A_p B_t\, y_i \psi_i\, y_j\, P\big(\psi_j \mid x_1, \dots, x_{j-1}\big)\, d\gamma_{j-1} = 0,$$
and this yields (10).

Finally, let us consider $(*\!*\!*)$:
$$(*\!*\!*) = \sum_{p=0}^r P\Big(\exists\, j = r+1, \dots, s \text{ with } A_p\, |M_{js} - M_{t(j),s}| > \frac{\varepsilon}{2}\Big)$$
$$\le \sum_{p=0}^r P\Big(\exists\, j \text{ with } p \le \nu_j \le \alpha^2(p) \text{ and } A_p\, |M_{t(j),j-1}| > \frac{\varepsilon}{2}\Big)$$
$$+ \sum_{p=0}^r P\Big(\exists\, j \text{ with } \alpha^2(p) < \nu_j \le (\alpha(p)+1)^2 \text{ or } (\alpha(p)+1)^2 < \nu_j \le (\alpha(p)+2)^2 \text{ or } \cdots \text{ or } \beta^2 < \nu_j \le (\beta+1)^2,\ \text{and } A_p\, |M_{t(j),j-1}| > \frac{\varepsilon}{2}\Big)$$
$$+ \sum_{p=0}^r P\Big(\exists\, j \text{ with } (\beta+1)^2 < \nu_j \le s \text{ and } A_p\, |M_{t(j),j-1}| > \frac{\varepsilon}{2}\Big).$$
Let $\varphi_i = \psi_i$ if $i \le s$ and $\varphi_i = 0$ otherwise. Then,
$$(*\!*\!*) \le P(A_0) + \sum_{p=1}^r P\Bigg(A_p\, \Big|\frac{h_r \varphi_r}{p}\Big| > \frac{\varepsilon}{2}\ \text{or}\ A_p\, \Big|\frac{h_r \varphi_r}{p} + \frac{\varphi_{f(p+1)}}{p+1}\Big| > \frac{\varepsilon}{2}\ \text{or} \cdots \text{or}\ A_p\, \Big|\frac{h_r \varphi_r}{p} + \frac{\varphi_{f(p+1)}}{p+1} + \cdots + \frac{\varphi_{f(\alpha^2(p))}}{\alpha^2(p)}\Big| > \frac{\varepsilon}{2}\Bigg)$$
$$+ \sum_{p=1}^r \sum_{q=\alpha(p)}^\beta P\Bigg(A_p\, \Big|\frac{\varphi_{f(q^2)}}{q^2}\Big| > \frac{\varepsilon}{2}\ \text{or}\ A_p\, \Big|\frac{\varphi_{f(q^2)}}{q^2} + \frac{\varphi_{f(q^2+1)}}{q^2+1}\Big| > \frac{\varepsilon}{2}\ \text{or} \cdots \text{or}\ A_p\, \Big|\frac{\varphi_{f(q^2)}}{q^2} + \cdots + \frac{\varphi_{f((q+1)^2)}}{(q+1)^2}\Big| > \frac{\varepsilon}{2}\Bigg)$$
$$+ \sum_{p=1}^r P\Bigg(A_p\, \Big|\frac{\varphi_{f((\beta+1)^2)}}{(\beta+1)^2}\Big| > \frac{\varepsilon}{2}\ \text{or} \cdots \text{or}\ A_p\, \Big|\frac{\varphi_{f((\beta+1)^2)}}{(\beta+1)^2} + \cdots + \frac{\varphi_{f(s)}}{s}\Big| > \frac{\varepsilon}{2}\Bigg)$$
$$\le P(A_0) + \frac{4}{\varepsilon^2} \sum_{p=1}^r \sum_{t=0}^{\alpha^2(p)-p} P\Bigg(A_p \Big(\frac{h_r \varphi_r}{p} + \frac{\varphi_{f(p+1)}}{p+1} + \cdots + \frac{\varphi_{f(p+t)}}{p+t}\Big)^2\Bigg)$$
$$+ \frac{4}{\varepsilon^2} \sum_{p=1}^r \sum_{q=\alpha(p)}^\beta \sum_{t=0}^{(q+1)^2-q^2} P\Bigg(A_p \Big(\frac{\varphi_{f(q^2)}}{q^2} + \cdots + \frac{\varphi_{f(q^2+t)}}{q^2+t}\Big)^2\Bigg) + \frac{4}{\varepsilon^2} \sum_{p=1}^r \sum_{t=0}^{s-(\beta+1)^2} P\Bigg(A_p \Big(\frac{\varphi_{f((\beta+1)^2)}}{(\beta+1)^2} + \cdots + \frac{\varphi_{f((\beta+1)^2+t)}}{(\beta+1)^2+t}\Big)^2\Bigg)$$
$$\le P(A_0) + \frac{4}{\varepsilon^2} \sum_{p=1}^r \sum_{t=0}^{\alpha^2(p)-p} \Big(\frac{1}{p^2} + \cdots + \frac{1}{(p+t)^2}\Big) P(A_p) + \frac{4}{\varepsilon^2} \sum_{p=1}^r \sum_{q=\alpha(p)}^\beta \sum_{t=0}^{(q+1)^2-q^2} \Big(\frac{1}{q^4} + \cdots + \frac{1}{(q^2+t)^2}\Big) P(A_p)$$
$$+ \frac{4}{\varepsilon^2} \sum_{p=1}^r \sum_{t=0}^{s-(\beta+1)^2} \Big(\frac{1}{(\beta+1)^4} + \cdots + \frac{1}{((\beta+1)^2+t)^2}\Big) P(A_p), \eqno{(11)}$$
where the last inequality will be proved later.
Next,
$$(*\!*\!*) \le P(A_0) + \frac{4}{\varepsilon^2} \sum_{p=1}^r \big(\alpha^2(p) - p + 1\big) \Big(\frac{1}{p^2} + \frac{1}{(p+1)^2} + \cdots + \frac{1}{\alpha^4(p)}\Big) P(A_p)$$
$$+ \frac{4}{\varepsilon^2} \sum_{p=1}^r \sum_{q=\alpha(p)}^\beta \big((q+1)^2 - q^2 + 1\big) \Big(\frac{1}{q^4} + \frac{1}{(q^2+1)^2} + \cdots + \frac{1}{(q+1)^4}\Big) P(A_p) + \frac{4}{\varepsilon^2} \sum_{p=1}^r \big(s - (\beta+1)^2 + 1\big) \Big(\frac{1}{(\beta+1)^4} + \cdots + \frac{1}{s^2}\Big) P(A_p)$$
$$\le P(A_0) + \frac{4}{\varepsilon^2} \sum_{p=1}^r \big(\alpha^2(p) - (\alpha(p)-1)^2 + 1\big) \Big(\frac{1}{(\alpha(p)-1)^4} + \frac{1}{((\alpha(p)-1)^2+1)^2} + \cdots + \frac{1}{\alpha^4(p)}\Big) P(A_p)$$
$$+ \frac{4}{\varepsilon^2} \sum_{p=1}^r \sum_{q=\alpha(p)}^\beta \big((q+1)^2 - q^2 + 1\big) \Big(\frac{1}{q^4} + \cdots + \frac{1}{(q+1)^4}\Big) P(A_p) + \frac{4}{\varepsilon^2} \sum_{p=1}^r \big((\beta+2)^2 - (\beta+1)^2 + 1\big) \Big(\frac{1}{(\beta+1)^4} + \cdots + \frac{1}{(\beta+2)^4}\Big) P(A_p)$$
$$\le P(A_0) + \frac{4}{\varepsilon^2} \sum_{p=1}^r \sum_{q=\alpha(p)-1}^{\beta+1} \big((q+1)^2 - q^2 + 1\big) \Big(\frac{1}{q^4} + \frac{1}{(q^2+1)^2} + \cdots + \frac{1}{(q+1)^4}\Big) P(A_p)$$
$$\le P(A_0) + \frac{4}{\varepsilon^2} \sum_{p=1}^r \sum_{q=\alpha(p)-1}^{\beta+1} (2q+2) \Big(\frac{1}{q^4} + \frac{2q+1}{q^2 (q+1)^2}\Big) P(A_p) = P(A_0) + \sum_{p=1}^r a(p)\, P(A_p),$$
where
$$a(p) = \sum_{q=\alpha(p)-1}^\infty \frac{4}{\varepsilon^2}\, (2q+2) \Big(\frac{1}{q^4} + \frac{2q+1}{q^2 (q+1)^2}\Big).$$
To sum up, $(*\!*\!*) \le P(A_0) + \sum_{p=1}^r a(p)\, P(A_p)$, with $a(p) \downarrow 0$. Following the same argument used for $(**)$, one can show that $(*\!*\!*)$ converges to 0 as $r \to +\infty$, uniformly in $s$.

Let us now turn to (11). As is easily seen, it suffices to show that $P(A_p\, \varphi_{f(i)}\, \varphi_{f(j)}) = 0$ for $p < i < j$. To this purpose, let $D_z = \{f(j) = z\}$. Then,
$$P\big(A_p\, \varphi_{f(i)}\, \varphi_{f(j)}\big) = \sum_{z=r}^s P\big(D_z A_p\, \psi_{f(i)}\, \psi_{f(j)}\big) = \sum_{z=r}^s \int P\big(D_z A_p\, \psi_{f(i)}\, \psi_{f(j)} \mid x_1, \dots, x_{z-1}\big)\, d\gamma_{z-1} = \sum_{z=r}^s \int D_z A_p\, \psi_{f(i)}\, P\big(\psi_z \mid x_1, \dots, x_{z-1}\big)\, d\gamma_{z-1} = 0.$$
This completes the proof.

We conclude the present subsection with a couple of examples showing that well calibration does not imply finitary well calibration, nor conversely.

Example 2 (well calibration does not imply finitary well calibration). Let $X = \mathbf{N}$, $h_n \equiv 1$ and $B_n = B$ for all $n$, where $B$ is the set of even numbers. Let $P$ be any p.i. such that $P(B \times X^\infty) = P(X^n \times B \times X^\infty) = b$, for some $b \in (0, 1)$ and all $n$. Setting $c_0 = P(B \times X^\infty)$ and $c_n(x_1, \dots, x_n, \dots) = P(X^n \times B \times X^\infty \mid x_1, \dots, x_n)$, one obtains
$$\int (p_n - \pi_n)\, dP = b - \frac{1}{n} \sum_{i=1}^n \int c_{i-1}\, dP.$$
Since $\int (p_n - \pi_n)\, dP \to 0$ is necessary for $P$ to be finitarily well calibrated, it suffices to choose $c_n$ in such a way that $(1/n) \sum_{i=1}^n \int c_{i-1}\, dP$ does not converge to $b$ and the resulting $P$ satisfies condition (j) of Theorem 8 (so that there are well calibrated coherent extensions of $P$). For instance, set
$$c_n(x_1, \dots, x_n, \dots) = \alpha_n\, \frac{1}{n} \sum_{i=1}^n B(x_i) + \beta_n\, \frac{1}{n} \sum_{i=1}^n D(x_i) + \gamma_n\, b, \eqno{(12)}$$
where $\alpha_n, \beta_n, \gamma_n \ge 0$, $\alpha_n + \beta_n + \gamma_n = 1$ and $D$ is some subset of $X$ (here $B(\cdot)$ and $D(\cdot)$ denote indicators). Moreover, take $\alpha_n$, $\beta_n$, $\gamma_n$ and $D$ in such a way that $\alpha_n \to \alpha$, $\beta_n \to \beta$, $\gamma_n \to 0$, $P(X^n \times D \times X^\infty) = d$ for each $n$, with $\beta > 0$ and $d \ne b$. Clearly, $(1/n) \sum_{i=1}^n \int c_{i-1}\, dP$ does not converge to $b$. Fix $A \in \mathcal{L} \setminus \{\varnothing\}$, and let $(y_1, \dots, y_m)$ be such that $(y_1, \dots, y_m, \omega) \in A$ for all $\omega$. Taking $\omega_1$ such that all its coordinates are in $B \setminus D$, and $\omega_2$ with all its coordinates in $D \cap B$, then $(y_1, \dots, y_m, \omega_1) \in F \cap A$ and $(y_1, \dots, y_m, \omega_2) \in (E \setminus F) \cap A$. Hence, condition (j) of Theorem 8 holds.

We note that (12) is a rather common assessment in the insurance field, in the presence of so-called "collateral data." For instance, $B$ could represent the event "occurrence of some accident," and $D$ any other event different from, but strictly related to, $B$.

Example 3 (finitary well calibration does not imply well calibration). Consider Example 1 again but, for the sake of simplicity, take $h_n \equiv 1$ instead of defining $h_n$ as in (5). According to Theorem 9, $P$ is finitarily well calibrated. Hence, it suffices to find a coherent extension of $P$ which is not well calibrated. This follows from Theorem 8, after noting that $E = \Omega$ and condition (j) can be checked precisely as in Example 1.

3.4. Further comments. Contrary to what Dawid writes in [7], there is no conflict between calibration and coherence, since there are coherent p.i.'s which do not ignore the event $F$ of miscalibration; cf. Theorem 8.
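The finitary calibration ensured by Theorem 9 can also be observed numerically in the setting of Example 1 with $h_n \equiv 1$, as in Example 3. The following sketch is ours, not from the paper (the sample size and seed are arbitrary choices): it generates data from $N(\theta, 1)$ with $\theta \sim N(0, 1)$, announces for each $B_n = (-\infty, 0]$ the predictive probability dictated by the strategy $\sigma$, and compares $p_n$ with $\pi_n$.

```python
import math
import random

random.seed(1)

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 100_000
theta = random.gauss(0.0, 1.0)   # theta ~ N(0, 1)
s = 0.0                           # running sum of observations
hits = 0                          # count of {x_i <= 0}, i.e. occurrences of E_i
forecast_sum = 0.0
for i in range(n):
    # predictive law of x_{i+1} given x_1..x_i: N((i/(i+1)) xbar_i, (i+2)/(i+1));
    # at i = 0 this is N(0, 2), matching sigma_0
    mean = s / (i + 1)            # equals (i/(i+1)) * xbar_i
    var = (i + 2) / (i + 1)
    forecast_sum += phi((0.0 - mean) / math.sqrt(var))
    x = random.gauss(theta, 1.0)  # observation from N(theta, 1)
    hits += (x <= 0.0)
    s += x

p_n = hits / n
pi_n = forecast_sum / n
print(abs(p_n - pi_n))  # small: consistent with finitary well calibration
```

Such a simulation only illustrates the typical finite-sample behavior, of course; it says nothing about the probability a coherent extension may attach to $F$.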
In any case, the probability attached to $F$ rests on a subjective judgment, just as is the case with any other probability assessment. Clearly, there are specific coherent assignments which may be conducive to ignoring $F$. For instance, the assumption that $P$ is strategic enables one to choose a particular strategic extension of $P$, the Lebesgue-like extension $P'$, and Theorem 7 implies that $P'(F) = 0$. Incidentally, we note that Theorem 7 involves one particular, important though it may be, strategic extension of $P$. However, the single probability laws $P'(\cdot)$ and $P'(\cdot \mid x_1, \dots, x_n)$ composing an arbitrary strategic extension $P'$ of $P$ are mutually linked in such a manner as to suggest that strategicity implies well calibration. More precisely, it is plausible that Theorem 7, and perhaps some form of the martingale convergence theorem, holds for every strategic extension of $P$. On the other hand, Theorem 7 does not state that the adoption of a strategic $P$ on $\mathcal{C}$ forces one to ignore $F$. Indeed, the p.i. in Example 1 is strategic on $\mathcal{C}$, and $F$ can be given strictly positive probability. Finally, according to Theorem 9, the assumption that $P$ is strategic is enough for finitary well calibration to hold. In conclusion, it is the assumption that a p.i. is strategic, rather than its coherence, that matters for well calibration.

REFERENCES

[1] P. Berti, E. Regazzini, and P. Rigo, De Finetti's coherence and complete predictive inferences, Quaderno I.A.M.I. 90.5, Milano, 1990.
[2] P. Berti, E. Regazzini, and P. Rigo, Coherent statistical inference and Bayes theorem, Ann. Statist., 19 (1991), pp. 366–381.
[3] P. Berti and P. Rigo, Weak disintegrability as a form of preservation of coherence, J. Italian Statist. Soc., 1 (1992), pp. 161–181.
[4] K. P. S. Bhaskara Rao and M. Bhaskara Rao, Theory of Charges, Academic Press, London, 1983.
[5] R. Chen, On almost sure convergence in a finitely additive setting, Z. Wahrscheinlichkeitstheorie verw. Gebiete, 37 (1977), pp. 341–356.
[6] M. D. Cifarelli and E. Regazzini, Sopra una versione finitamente additiva del processo di Ferguson–Dirichlet, in Scritti in omaggio a L. Daboni, Ed. Lint, Trieste, 1990, pp. 67–81.
[7] P. Dawid, The well calibrated Bayesian (with discussion), J. Amer. Statist. Assoc., 77 (1982), pp. 605–613.
[8] P. Dawid, Present position and potential developments: Some personal views, statistical theory, the prequential approach (with discussion), J. Roy. Statist. Soc., Ser. A, 147 (1984), pp. 278–292.
[9] P. Dawid, Fisherian inference in likelihood and prequential frames of reference (with discussion), J. Roy. Statist. Soc., Ser. B, 53 (1991), pp. 79–109.
[10] B. De Finetti, Sulla proprietà conglomerativa delle probabilità subordinate, Rend. R. Istituto Lombardo di Scienze e Lett., 63 (1930), pp. 414–418.
[11] B. De Finetti, Sull'impostazione assiomatica del calcolo delle probabilità, Annali Triestini dell'Università di Trieste, 19 (1949), pp. 29–81 (English translation in Probability, Induction and Statistics, Wiley, New York, 1972).
[12] B. De Finetti, Teoria delle probabilità, Einaudi, Torino, 1970 (English translation: Theory of Probability, Wiley Classics Library Edition, Chichester, 1990).
[13] L. E. Dubins, On Lebesgue-like extensions of finitely additive measures, Ann. Probab., 2 (1974), pp. 456–463.
[14] L. E. Dubins, Finitely additive conditional probabilities, conglomerability and disintegrations, Ann. Probab., 3 (1975), pp. 89–99.
[15] L. E. Dubins and L. J. Savage, How to Gamble if You Must (Inequalities for Stochastic Processes), McGraw-Hill, New York, 1965; Inequalities for Stochastic Processes (How to Gamble if You Must), Dover, New York, 1976.
[16] N. Dunford and J. T. Schwartz, Linear Operators, Part I: General Theory, Interscience, New York, 1958.
[17] S. Geisser, Predictive analysis, in Encyclopedia of Statistical Sciences, Vol. 7, Wiley-Interscience, New York, 1986, pp. 158–170.
[18] B. Hill, Posterior distribution of percentiles: Bayes theorem for sampling from a population, J. Amer. Statist. Assoc., 63 (1968), pp. 677–691.
[19] B. Hill, Parametric models for $A_n$: Splitting processes and mixtures, J. Roy. Statist. Soc., Ser. B, 55 (1993), pp. 423–433.
[20] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer, Berlin, 1933 (English translation: Foundations of the Theory of Probability, Chelsea, New York, 1950).
[21] J. Neveu, Bases mathématiques du calcul des probabilités, 2ème éd., Masson, Paris, 1980.
[22] R. A. Purves and W. D. Sudderth, Some finitely additive probability, Ann. Probab., 4 (1976), pp. 259–276.
[23] M. M. Rao, Conditional Measures and Applications, Marcel Dekker, New York, 1993.
[24] E. Regazzini, Finitely additive conditional probabilities, Rend. Sem. Mat. Fis. Milano, 55 (1985), pp. 69–89 (corrections in Rend. Sem. Mat. Fis. Milano, 57, p. 599).