A Note on Probability, Frequency and Countable Additivity

Zvonimir Šikić

Thousands of students are taught every year that the probability of an event is the proportion of times the event would occur in a long run of repeated experiments. This means that probability is long-run frequency. The benefit of this definition is that it makes the proof of the main properties of probability (the axioms of probability) very easy. Even Kolmogorov used it in his "Empirical deduction of the axioms", a chapter of his [K]. Namely, if n(A) is the number of times an event A occurs in n repeated experiments and fn(A) = n(A)/n is the corresponding frequency, then

1. 0 ≤ fn(A) ≤ 1,
2. fn(A) + fn(–A) = 1,
3. fn(A ∪ B) = fn(A) + fn(B), if A and B exclude each other,
4. fn(A ∩ B) = fn(A) fn(B|A), where fn(B|A) is the frequency of B given A.

Of course, (1)–(4) are the axioms of probability:

1. 0 ≤ pr(A) ≤ 1,
2. pr(A) + pr(–A) = 1,
3. pr(A ∪ B) = pr(A) + pr(B), if A and B exclude each other,
4. pr(A ∩ B) = pr(A) pr(B|A), where pr(B|A) is the probability of B given A.

But there is a big problem here. Which fn is the probability? Is the probability of heads given by 100 tosses, i.e. by f100? Is it given by 1000 tosses, i.e. by f1000? Or what? How long should the long run be?

The longest run possible could circumvent the problem, and the longest run possible is an infinite run. Hence:

pr(A) = lim_{n→∞} fn(A).

This solution to the problem creates new problems. In contrast to finite frequencies, limiting frequencies are unobservable (as Keynes said, "in the long run we are all dead"). The limiting frequency has no empirical content. And we should not forget that two infinite sequences which differ only at the beginning are treated as the same sequence if we are interested only in their long-run frequencies (hence, there is no connection between limiting frequencies and finite observable frequencies). Of course, we could be interested in the mathematical content of limiting frequencies. After all, it may be the mathematical foundation of probability we are interested in, and not its applications.
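Kolmogorov's "empirical deduction" can be checked numerically. Below is a minimal sketch (my own illustration, not from the paper) using simulated die rolls: the finite frequencies fn satisfy (1)–(4) exactly, by pure arithmetic, whatever n is.

```python
import random

# Check that finite frequencies fn satisfy properties (1)-(4).
# Events are subsets of a six-sided die's outcome space.
random.seed(0)
n = 100_000
space = {1, 2, 3, 4, 5, 6}
outcomes = [random.randint(1, 6) for _ in range(n)]

def freq(event):
    """fn(A) = n(A)/n, the proportion of the n trials that fall in A."""
    return sum(o in event for o in outcomes) / n

def freq_given(b, a):
    """fn(B|A) = n(A and B)/n(A), the frequency of B among trials in A."""
    n_a = sum(o in a for o in outcomes)
    n_ab = sum(o in a and o in b for o in outcomes)
    return n_ab / n_a

A, B, C = {1, 2}, {3, 4}, {2, 4, 6}     # B excludes A; C overlaps A

assert 0 <= freq(A) <= 1                                        # (1)
assert abs(freq(A) + freq(space - A) - 1) < 1e-12               # (2)
assert abs(freq(A | B) - (freq(A) + freq(B))) < 1e-12           # (3)
assert abs(freq(A & C) - freq(A) * freq_given(C, A)) < 1e-12    # (4)
print("fn satisfies (1)-(4), for any n")
```

The point of the check is that these identities hold by counting alone; no limit is involved yet.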
So, let us explore the mathematics of pr, which is defined as pr(A) = lim_{n→∞} fn(A). First of all, this pr satisfies the probability axioms (1)–(4) if it exists (and that is easy to see, because all fn satisfy them). But it is easy to construct examples of infinite runs with non-existent limiting frequencies. Here is a sequence of heads and tails with no limiting frequency of heads (or tails, for that matter):

HT HT HHTT HHHHTTTT HHHHHHHHTTTTTTTT . . .

It starts with HT HT, and after that we have blocks with 2^n H and 2^n T, for every n > 0. If we stop after the n-th block, the frequency of heads will be 1/2 (because every block has the same number of heads and tails). If we stop in the middle of the n-th block, i.e. just after its heads, the frequency of heads will be

(2^n + 2^n) / (2^{n+1} + 2^n) = 2^{n+1} / (3 · 2^n) = 2/3.

The frequency of heads oscillates between 1/2 and 2/3, i.e. the limiting frequency of heads does not exist.

Furthermore, even if an infinite sequence of heads and tails has a limiting frequency, there are infinitely many subsequences of this sequence with whatever limiting frequency you like (and, of course, there are infinitely many of them with no limiting frequency). It means that by appropriately neglecting some tosses you get whatever you choose (so you'd better be sure you've seen all the tosses).

Suppose, further, that the results of repeated "head–tail" experiments are space-time distributed in the following way. Heads are represented by the white points, whose coordinates are the partial sums of the series

(2,3) + (2,3) + (2,3) + (2,3) + (2,3) + …

Tails are represented by the black points, whose coordinates are the partial sums of the series

(1,1) + (2,1) + (2,2) + (2,1) + (2,2) + (2,1) + (2,2) + …

If you were tossing the coin, your time sequence of heads and tails is

TTH TTH TTH TTH …

The limiting frequency of heads, in this sequence, is 1/3. This is your probability of heads.
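The oscillation of the block sequence above is easy to verify by computation. Here is a sketch (assuming, as the construction says, that block n contributes 2^n heads followed by 2^n tails, after the initial HT HT):

```python
# Build the sequence HT HT HHTT HHHHTTTT ... and sample the running
# frequency of heads at the middle and at the end of each block.
def build(max_block):
    s = "HTHT"                               # the initial HT HT
    for n in range(1, max_block + 1):
        s += "H" * 2**n + "T" * 2**n         # block n: 2^n heads, 2^n tails
    return s

s = build(18)
for n in range(1, 19):
    mid = 3 * 2**n          # position just after the heads of block n
    end = 2**(n + 2)        # position at the end of block n
    f_mid = s[:mid].count("H") / mid
    f_end = s[:end].count("H") / end
    assert abs(f_mid - 2/3) < 1e-12          # exactly 2/3 at every mid-block
    assert f_end == 0.5                      # exactly 1/2 at every block end
print("frequency oscillates between 1/2 and 2/3 forever: no limit exists")
```

Since the oscillation recurs at every block, no prefix length stabilizes the frequency, which is the whole point of the example.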
If I am inspecting the field of coins you tossed, my space sequence of heads and tails is

TH TH TH TH TH TH …

The limiting frequency of heads, in this sequence, is 1/2. This is my probability of heads. Should one answer be right and the other wrong? If you prefer one of them, think of Einstein's special relativity.

The solution proposed by R. von Mises, in [M], is to rule the problematic sequences out. The sequences of experimental results should be random (von Mises' term was collective), which means that:

1. they should have limiting frequencies,
2. these limiting frequencies should remain the same in every recursive subsequence of the given sequence ("recursive" was Church's clarification in [C]).

Our "sequence of heads and tails with no limiting frequency of heads (or tails)" is ruled out by (1). The subsequences "with whatever limiting frequency you like" are ruled out by (2). Space-time sensitive limits are not ruled out. I suppose that the above example, which is folk knowledge today, was not folk knowledge in von Mises' days. If it had been, von Mises and Church would have added:

3. the limiting frequencies should remain the same in every recursive reordering of the given sequence.

But there is no explanation of why an infinite sequence of repeated experiments should produce collectives. Why should an infinite sequence of heads and tails satisfy (1)–(3)?

A further problem for frequentists is Kolmogorov's axiom of continuity (which is equivalent to his theorem of countable additivity). Kolmogorov said in [K] that "it is almost impossible to elucidate its empirical meaning, as has been done for [other] axioms". Kolmogorov was thinking of frequencies fn as something with empirical meaning. If we move to limiting frequencies, the common opinion is that they violate countable additivity. Van Fraassen in [F] and many others offer counterexamples of the following kind. Consider an infinite lottery with tokens 1, 2, 3, 4, … and let Dj = "token j is drawn".
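The two orderings can be reproduced from the stated coordinate series. The sketch below (my reconstruction of the example's geometry, not the paper's own figure) sorts the same finite set of tosses once by the time coordinate and once by the space coordinate, and compares the frequencies of heads on a common prefix:

```python
# Heads at partial sums of (2,3)+(2,3)+...; tails at partial sums of
# (1,1)+(2,1)+(2,2)+(2,1)+(2,2)+...  Coordinates are (space, time).
def coords(n_heads):
    heads, tails = [], []
    x = y = 0
    for _ in range(n_heads):                  # heads: (2,3), (4,6), (6,9), ...
        x, y = x + 2, y + 3
        heads.append((x, y))
    x, y = 1, 1                               # first tail at (1,1)
    tails.append((x, y))
    steps = [(2, 1), (2, 2)]                  # then steps (2,1),(2,2),(2,1),...
    for i in range(2 * n_heads - 1):
        dx, dy = steps[i % 2]
        x, y = x + dx, y + dy
        tails.append((x, y))
    return heads, tails

heads, tails = coords(3000)
points = [(x, y, "H") for x, y in heads] + [(x, y, "T") for x, y in tails]

by_time = "".join(p[2] for p in sorted(points, key=lambda p: p[1]))
by_space = "".join(p[2] for p in sorted(points, key=lambda p: p[0]))

k = 6000                                      # a prefix well inside both orderings
assert by_time.startswith("TTHTTH")           # the tosser's sequence: TTH TTH ...
assert by_space.startswith("THTHTH")          # the inspector's sequence: TH TH ...
assert abs(by_time[:k].count("H") / k - 1/3) < 1e-12
assert by_space[:k].count("H") / k == 0.5
print("same tosses, two orderings: frequencies tend to 1/3 and 1/2")
```

The same multiset of results, read off in two different orders, yields two different limiting frequencies, which is exactly the relativity point made in the text.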
Suppose that in an infinite sequence of draws (with replacement) none of the tokens is drawn infinitely many times. Then pr(Dj) = 0 for every j, and it follows that

pr(D1) + pr(D2) + pr(D3) + pr(D4) + … = 0.

On the other hand,

pr(D1 ∪ D2 ∪ D3 ∪ D4 ∪ …) = 1,

because D1 ∪ D2 ∪ D3 ∪ D4 ∪ … is a necessary event. Hence

pr(D1 ∪ D2 ∪ D3 ∪ D4 ∪ …) = 1 ≠ 0 = pr(D1) + pr(D2) + pr(D3) + pr(D4) + …,

and this violates countable additivity.

But why should pr(D1) + pr(D2) + pr(D3) + pr(D4) + … be 0? It is an indeterminate form ∞ · 0, which could be anything (if you still remember your first course in calculus). As a matter of fact, in this particular case it is easy to prove that it is 1 (as it should be, according to countable additivity). Suppose, for example, that our infinite sequence of draws is:

D4, D1, D9, D2, D4, D1, D7, D4, …

The corresponding probabilities are:

pr(D1) = lim (0/1, 1/2, 1/3, 1/4, 1/5, 2/6, 2/7, 2/8, …) = 0
pr(D2) = lim (0/1, 0/2, 0/3, 1/4, 1/5, 1/6, 1/7, 1/8, …) = 0
pr(D3) = lim (0/1, 0/2, 0/3, 0/4, 0/5, 0/6, 0/7, 0/8, …) = 0
pr(D4) = lim (1/1, 1/2, 1/3, 1/4, 2/5, 2/6, 2/7, 3/8, …) = 0
etc.

If we sum all the limits up we get

∑ pr(Dn) = lim (1, 1, 1, 1, 1, 1, 1, 1, …) = 1,

because for every n the n-th terms of these sequences sum to fn(D1) + fn(D2) + fn(D3) + … = n/n = 1. The calculation is completely the same for every other sequence of draws. Hence,

pr(D1 ∪ D2 ∪ D3 ∪ D4 ∪ …) = 1 = pr(D1) + pr(D2) + pr(D3) + pr(D4) + …,

i.e. limiting frequencies satisfy countable additivity.

My final conclusion is that limiting frequencies satisfy the probability axioms (1)–(4) (this was well known), but that they also satisfy countable additivity (this is a new result). Hence, limiting frequencies have no problems with the probability axioms. Their problem is that they may not exist. It is possible that an infinite sequence of experimental results has no limiting frequency.
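The arithmetic behind "the frequencies always sum to 1" can be illustrated for any finite n. Here is a sketch (mine; random draws from a huge token pool stand in for the paper's D4, D1, D9, … sequence):

```python
import random
from collections import Counter

# For every n, the finite frequencies satisfy
#   sum_j fn(Dj) = sum_j n(Dj)/n = n/n = 1,
# whatever the draws are, even though each individual fn(Dj) stays
# small when no single token dominates the sequence.
random.seed(1)
draws = [random.randint(1, 10**6) for _ in range(5000)]   # a huge token pool

for n in (1, 10, 100, 5000):
    counts = Counter(draws[:n])                 # n(Dj) for each token j drawn
    total = sum(c / n for c in counts.values()) # sum over j of fn(Dj)
    assert abs(total - 1) < 1e-9                # the frequencies sum to 1
print("sum_j fn(Dj) = 1 for every n, while each fn(Dj) stays near 0")
```

This is the counting identity the text appeals to: the sum of the frequency sequences is constantly 1, so its limit is 1, regardless of what each individual limit is.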
(An appeal to the law of large numbers, "the probability of sequences with no limiting frequencies is 0", is not available to the frequentist, because it presupposes that probability is defined independently of limiting frequencies. If it is not, the "law" becomes a complete triviality.)

Bibliography

[C] A. Church, "On the Concept of a Random Sequence", Bulletin of the American Mathematical Society 46 (1940), pp. 130–135.
[F] B. C. van Fraassen, "Relative Frequencies", in W. C. Salmon (ed.), Hans Reichenbach, Logical Empiricist, Reidel, 1979, pp. 133–166.
[K] A. N. Kolmogorov, Foundations of the Theory of Probability, Chelsea, 1956 (German original 1933).
[M] R. von Mises, Probability, Statistics and Truth, Macmillan, 1957 (German original 1936).