Download A weak first digit law for a class of sequences

International Mathematical Forum, Vol. 11, 2016, no. 15, 697 - 702 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2016.6562 A Weak First Digit Law for a Class of Sequences M. A. Nyblom School of Mathematics and Geospatial Sciences RMIT University, Melbourne, Victoria 3001, Australia c 2016 M. A. Nyblom. This article is distributed under the Creative ComCopyright mons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract A non-ergodic approach is employed to establish Benford’s law for the leading digit d = 1 for the sequence of powers of two. For the sequence of powers {αn }, 1 < α ≤ 10/9, this method is extended to obtain a weak first digit law by establishing Benford like lower and upper bounds to the asymptotic relative frequency of terms with a given leading digit. Mathematics Subject Classification: 11A99, 11K36 Keywords: Frequency distribution, Benford’s law 1 Introduction When examining numerical data which is derived from large naturally occurring data sets such as population, lengths of rivers or stock prices, often (but not always), a curious phenomena known as Bendford’s law is observed. This phenomenological law concerns the frequency distribution of the leading digits of numerical data, which typically spans many orders of magnitude. Such a set of numerical data will satisfy Benford’s law if the proportion of data values having a leading digit of d ∈ {1, 2, . . . , 9} is approximately log10 (1 + d1 ). Thus with numerical data sets which obey this law, the number 1 would appear 698 M. A. Nyblom as a leading digit with a frequency of approximately 30%, while larger digits such as 9 would appear as a leading digit with a frequency of approximately 4.5%. Benford’s law was first empirically observed in 1881 by the American astronomer Simon Newcomb [2], but remained largely forgotten until 1938, when Frank Benford [1], a physicist, re-tested the phenomenon against many sources of data as well as on some infinite integer sequences. He found in both case, that the frequency distribution of the leading digit d exhibited a close adherence to the logarithmic rule. For a sequence of positive reals {an }, if one defines Nd (n) = #{0 ≤ k ≤ n : bak c has leading digit d} , d (n) = log10 (1+ d1 ). To establish then {an } will satisfy Benford’s law if limn→∞ Nn+1 that an integer sequence such as {2n } satisfies Benford’s law, one can appeal to an Ergodic property of the irrational numbers proved by Weyl [3]. Namely, if x is a positive irrational number, n a positive integer and bnxc the integer part of nx, then the sequence of fractional part of nx, given by c(n) = nx − bnxc, is homogeneously distributed over the interval [0, 1]. That is, given any arbitrary reals 0 ≤ a < b ≤ 1 then #{0 ≤ k ≤ n : c(k) ∈ (a, b)} =b−a . n→∞ n+1 lim (1) Now if 2n has a leading digit of d, then d · 10k < 2n < (d + 1) · 10k for some integer k. Upon taking logarithms one finds log10 (d) + k < n log10 (2) < log10 (d + 1) + k, and as log10 (d + 1) ∈ [0, 1], one deduces k = bn log10 (2)c, and so log10 (d) < c(n) < log10 (d + 1), where c(n) = n log10 (2) − bn log10 (2)c. Thus when ak = 2k with a = log10 (d) and b = log10 (d + 1), we have Nd (n) = #{0 ≤ k ≤ n : c(k) ∈ (a, b)}, from which the required frequency distribution follows via (1), as log10 (2) is irrational. In this article an entirely elementary proof will first be presented, which establishes the logarithmic rule for the frequency distribution of terms in the sequence {2n } having leading digit d = 1. This is possible, as one can show for an = 2n that N1 (n) = bn log10 (2)c + 1, via an argument based on counting the number of integers in an interval. By applying a variation to this argument, one can then obtain a weak first digit law for the , where in particular it is shown that sequence an = αn , for a fixed 1 < α ≤ 10 9 for all n Nd (n) Ln ≤ ≤ Un , n+1 with Ln ∼ log10 (1 + d1 ) − log10 (α) and Un ∼ log10 (1 + d1 ) + log10 (α) as n → ∞. Although this result is only suggestive of the logarithmic rule of Benford’s law, the advantage of the approach taken here rests on the non-ergodic nature of the argument employed. A weak first digit law for a class of sequences 2 699 First digit law for powers of two We begin with a technical lemma concerning the counting of integers in an interval of the form [a, b). In what follows, define for a real number x, the floor and ceiling of x as bxc = max{n ∈ Z : x ≥ n} and dxe = min{n ∈ Z : n ≥ x} respectively. Furthermore, we shall make use of the fact that bx + nc = bxc + n for all n ∈ Z and x ∈ R. Lemma 2.1 Suppose 0 ≤ a < b with b − a = L, then the number of integers contained in the interval [a, b) is either bLc or bLc + 1. Proof: Let I(a, b) denote the number of integers contained in the interval [a, b). We note that for two integers 0 ≤ m ≤ n, the number of integers in the interval [m, n] is equal to n − m + 1. Case 1: b 6∈ N. In this instance I(a, b) = bbc − dae + 1 since [dae, bbc] ⊆ [a, b). As bbc ≤ b and dae ≥ a one finds I(a, b) ≤ b − a + 1, but as bb − a + 1c is the largest integer less than or equal to b − a + 1, we deduce that I(a, b) ≤ bb − a + 1c = bb − ac + 1 = bLc + 1. Now as dae < a + 1, then b − dae > b − a − 1 ≥ bb − a − 1c and as bb − daec = bbc − dae is the largest integer less than or equal to b − dae, we further deduce bbc − dae ≥ bb − a − 1c = bb − ac − 1, that is I(a, b) ≥ bLc. Hence I(a, b) is either equal to bLc or bLc + 1. Case 2: b ∈ N. If a ∈ N, then I(a, b) = (b − 1) − a + 1 = b − a = bLc, since [a, b − 1] ⊆ [a, b). Alternatively if a 6∈ N, then I(a, b) = (b − 1) − (bac + 1) + 1 = b − bac − 1, since [dae, b − 1) ⊆ [a, b) and dae = bac + 1. But as bac < a < bac + 1 we find b − bac > b − a > b − bac − 1, that is I(a, b) + 1 > b − a > I(a, b) and so I(a, b) = bb − ac = bLc. Theorem 2.1 The frequency distribution of the terms in the sequence {2n } having leading digit d = 1, is log10 (2). Proof: Recall N1 (n) = #{0 ≤ m ≤ n : 2m has leading digit 1}. If 2m has k digits and leading digit 1, then 10k−1 ≤ 2m < 2 · 10k−1 . Upon taking logarithm to base ten yields k − 1 ≤ m log10 (2) < log10 (2) + k − 1 . (2) Now consider the continuous linear function f (x) = log10 (2)x, and suppose the interval of values on which f (x) satisfies (2) is [a, b). As f (b) − f (a) = log10 (2), we deduce that the interval [a, b) must have unit length. Thus from Lemma 2.1, the interval [a, b) can contain either one or two positive integers. To 700 M. A. Nyblom show there is exactly one such integer in [a, b), suppose there are two integers m1 < m2 such that both 2m1 and 2m2 have k digits with leading digit 1. As m2 ≥ m1 + 1, observe 2m2 − 2m1 ≥ 2m1 (21 − 1) = 2m1 and so 2m2 − 2m1 will have at least k digits, but as both 2m2 and 2m1 have k digits with a leading digit 1, the difference 2m2 − 2m1 can have at most k − 1 digits, a contradiction. Hence there is exactly one power of two having k digits with leading digit 1. Consequently N1 (n) must be equal to the positive integer k such that 2n ∈ [10k−1 , 2 · 10k−1 ) ∪ [2 · 10k−1 , 10k ) or equivalently k − 1 ≤ n log10 (2) < k, that is k − 1 = bn log10 (2)c and so N1 (n) = bn log10 (2)c + 1. Finally as x − 1 < bxc ≤ x one finds that n log10 (2) N1 (n) n log10 (2) + 1 < ≤ , n+1 n+1 n+1 from which the desired limiting value of log10 (2) follows upon taking n → ∞. 3 A weak first digit law for αn In Section 2 we were able to determine the correct frequency distribution for the powers of two having a leading digit of d = 1, since an explicit formula for N1 (n) could be derived. By varying the argument used to prove Theorem 2.1, we can obtain upper and lower estimates of Nd (n), for the sequence {αn }, . These estimates can then be used to obtain upper and where 1 < α ≤ 10 9 lower bounds on the frequency distribution for the terms of the sequence {αn } having leading digit d, which are suggestive of the logarithmic rule predicted via Benford’s law. Theorem 3.1 Given the sequence {αn }, where 1 < α ≤ Ln ≤ 10 , 9 then for all n Nd (n) ≤ Un , n+1 with Ln ∼ log10 (1 + d1 ) − log10 (α) and Un ∼ log10 (1 + d1 ) + log10 (α) as n → ∞. Proof: Recall Nd (n) = #{0 ≤ m ≤ n : bαm c has leading digit d}, moreover from assumption observe log10 (α) ≤ log10 ( 10 ) ≤ log10 ( d+1 ). If bαm c has k 9 d k−1 m digits and leading digit d, then d · 10 ≤ α < (d + 1) · 10k−1 . Upon taking logarithm to base ten yields log10 (d) + (k − 1) ≤ m log10 (α) < log10 (d + 1) + (k − 1) . (3) Now consider the continuous linear function f (x) = log10 (α)x, and suppose the interval of values on which f (x) satisfies (3) is [a, b). As f (b) − f (a) = 701 A weak first digit law for a class of sequences ), we deduce that the interval [a, b) must have length log10 ( d+1 d )/ log10 (α) ≥ 1. Thus from Lemma 2.1, the interval [a, b) conL = log10 ( d+1 d tains either bLc or bLc + 1 positive integers. Consequently, the number of terms of the sequence {αm } for which bαn c has k digits and leading digit d, is either bLc or bLc + 1, moreover these upper and lower bounds are independent of k. If bαn c contains s digits, then 10s−1 ≤ αn < 10s , from which one deduces Sbn log (α)c+1 k−1 k [10 , 10 ). that s − 1 = bn log10 (α)c, and so {α0 , α1 , . . . , αn } ⊆ k=1 10 n bn log10 (α)c bn log10 (α)c bn log10 (α)c Now as α ∈ [10 , d·10 )∪[d·10 , (d+1)·10bn log10 (α)c )∪ [(d + 1) · 10bn log10 (α)c , 10bn log10 (α)c+1 ), the leading digit of bαn c may or may not be d, and so the number of sequence elements in [10bn log10 (α)c , 10bn log10 (α)c+1 ) having a leading digit d, can only be bounded between 0 and bLc + 1. While from above, the number of sequence elements having a leading digit of d in [10k−1 , 10k ), for k = 1, . . . , bn log10 (α)c, must be bounded between bLc and bLc + 1. Thus from definition, we have for all n ≥ 0 [L](bn log10 (α)c) Nd (n) ([L] + 1)(bn log10 (α)c + 1) ≤ ≤ . (4) n+1 n+1 n+1 Finally, using again the inequality x − 1 < bxc ≤ x, one finds that the upper and lower bounds of (4) are respectively bounded above and below by Un = log10 ( d+1 d ) +1 log10 (α) ! (n log10 (α) + 1) and Ln = n+1 log10 ( d+1 d ) −1 log10 (α) ! (n log10 (α) − 1) , n+1 with Ln ∼ log10 (1 + d1 ) − log10 (α) and Un ∼ log10 (1 + d1 ) + log10 (α) as n → ∞. To conclude, we note the result of Theorem 3.1 can be strengthened for the terms of the sequence {( 10 )n }, having a leading digit of 9 as follows. If we 9 set α = 10 and d = 9, then the length of the interval [a, b) on which the linear 9 function f (x) satisfies (3) is L = 1, and so there can only be one or two terms of the sequence {( 10 )n } having k digits with a leading digit of 9. To show there 9 is exactly one, assume there are two integers m1 < m2 such that both b( 10 )m1 c 9 and b( 10 )m2 c have k digits with a leading digit 9. As m2 ≥ m1 + 1, observe 9 ( 10 )m2 − ( 10 )m1 ≥ ( 10 )m1 +1 − ( 10 )m1 = ( 10 )m1 ( 19 ), but b( 10 )m1 ( 19 )c has k digits 9 9 9 9 9 9 10 m2 10 m1 with a leading digit of 1. However as both ( 9 ) and ( 9 ) have k digits with )m2 − ( 10 )m1 c can have at most k − 1 digits, a leading digit 9, the difference b( 10 9 9 Sbn log ( 10 )c+1 k−1 k 0 10 1 n , 9 , . . . , 10 } ⊆ k=1 10 9 a contradiction. Now { 10 [10 , 10 ), and as 9 9 the number of sequence elements having a leading digit of 9 in [10k−1 , 10k ), for k = 1, . . . , bn log10 ( 10 )c, is precisely one, we can conclude as ( 10 )n may or may 9 9 not have a leading digit of 9, that n log10 ( 10 )−1 bn log10 ( 10 )c bn log10 ( 10 )c + 1 n log10 ( 10 )+1 N9 (n) 9 9 9 9 ≤ ≤ ≤ ≤ , n+1 n+1 n+1 n+1 n+1 702 M. A. Nyblom ) follows upon taking n → ∞. from which the desired limiting value of log10 ( 10 9 References [1] F. Benford, The law of anomalous numbers, Proceedings of the American Philosophical Society, 78 (1938), 551-572. [2] S. Newcomb, Note on the frequency of use of the different digits in natural numbers, American Journal of Mathematics, 4 (1881), 39-40. http://dx.doi.org/10.2307/2369148 [3] H. Weyl, Uber die Gleichverteilung von Zahlen mod Eins, Mathematische Annalen, 77 (1916), 313-352. http://dx.doi.org/10.1007/bf01475864 Received: June 7, 2016; Published: July 25, 2016

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download A weak first digit law for a class of sequences