International Mathematical Forum, Vol. 11, 2016, no. 15, 697 - 702
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/imf.2016.6562
A Weak First Digit Law for
a Class of Sequences
M. A. Nyblom
School of Mathematics and Geospatial Sciences
RMIT University, Melbourne, Victoria 3001, Australia
Copyright © 2016 M. A. Nyblom. This article is distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
Abstract
A non-ergodic approach is employed to establish Benford’s law for
the leading digit d = 1 for the sequence of powers of two. For the
sequence of powers $\{\alpha^n\}$, $1 < \alpha \le 10/9$, this method is extended to
obtain a weak first digit law by establishing Benford-like lower and
upper bounds to the asymptotic relative frequency of terms with a given
leading digit.
Mathematics Subject Classification: 11A99, 11K36
Keywords: Frequency distribution, Benford’s law
1 Introduction
When examining numerical data derived from large naturally occurring data sets,
such as populations, lengths of rivers or stock prices, one often (but not always)
observes a curious phenomenon known as Benford’s law. This
phenomenological law concerns the frequency distribution of the leading digits
of numerical data, which typically spans many orders of magnitude. Such a
set of numerical data will satisfy Benford’s law if the proportion of data values
having a leading digit of $d \in \{1, 2, \ldots, 9\}$ is approximately $\log_{10}(1 + \frac{1}{d})$. Thus
with numerical data sets which obey this law, the number 1 would appear
as a leading digit with a frequency of approximately 30%, while larger digits
such as 9 would appear as a leading digit with a frequency of approximately
4.5%. Benford’s law was first empirically observed in 1881 by the American
astronomer Simon Newcomb [2], but remained largely forgotten until 1938,
when Frank Benford [1], a physicist, re-tested the phenomenon against many
sources of data as well as on some infinite integer sequences. He found in both
cases that the frequency distribution of the leading digit d exhibited a close
adherence to the logarithmic rule.
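As a quick illustration of the logarithmic rule stated above, the following minimal Python sketch tabulates $\log_{10}(1 + 1/d)$ for each leading digit. The names benford_probability and benford_table are illustrative choices and do not come from the original article.

```python
import math


def benford_probability(d: int) -> float:
    """Proportion of values with leading digit d predicted by Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Table of predicted proportions for d = 1, ..., 9.
benford_table = {d: benford_probability(d) for d in range(1, 10)}

if __name__ == "__main__":
    for d, p in benford_table.items():
        print(f"d = {d}: {p:.4f}")
```

The nine values sum to 1 because the sum of the logarithms telescopes to $\log_{10}(10)$.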
For a sequence of positive reals $\{a_n\}$, if one defines
\[ N_d(n) = \#\{0 \le k \le n : \lfloor a_k \rfloor \text{ has leading digit } d\}, \]
then $\{a_n\}$ will satisfy Benford’s law if $\lim_{n\to\infty} \frac{N_d(n)}{n+1} = \log_{10}(1 + \frac{1}{d})$. To establish
that an integer sequence such as $\{2^n\}$ satisfies Benford’s law, one can appeal to
an ergodic property of the irrational numbers proved by Weyl [3]. Namely, if
$x$ is a positive irrational number, $n$ a positive integer and $\lfloor nx \rfloor$ the integer part
of $nx$, then the sequence of fractional parts of $nx$, given by $c(n) = nx - \lfloor nx \rfloor$, is
homogeneously distributed over the interval $[0, 1]$. That is, given any
reals $0 \le a < b \le 1$,
\[ \lim_{n\to\infty} \frac{\#\{0 \le k \le n : c(k) \in (a, b)\}}{n+1} = b - a. \tag{1} \]
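The limit (1) can be explored numerically. The sketch below is only an empirical illustration, not a proof: it uses floating-point arithmetic, so the fractional parts carry small rounding errors, and the interval (0.2, 0.5) and the cutoff n = 100000 are arbitrary choices made for this example.

```python
import math


def fraction_in_interval(x: float, a: float, b: float, n: int) -> float:
    """Proportion of k in 0..n whose fractional part of k*x lies in (a, b)."""
    hits = 0
    for k in range(n + 1):
        c = (k * x) % 1.0  # fractional part c(k) = kx - floor(kx), up to rounding
        if a < c < b:
            hits += 1
    return hits / (n + 1)


# x = log10(2) is irrational, so (1) predicts a proportion close to b - a = 0.3.
x = math.log10(2)
estimate = fraction_in_interval(x, 0.2, 0.5, 100_000)

if __name__ == "__main__":
    print(f"observed proportion: {estimate:.5f}, predicted: 0.3")
```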
Now if $2^n$ has a leading digit of $d$, then $d \cdot 10^k \le 2^n < (d + 1) \cdot 10^k$ for
some integer $k$. Upon taking logarithms one finds $\log_{10}(d) + k \le n \log_{10}(2) <
\log_{10}(d + 1) + k$, and as $\log_{10}(d), \log_{10}(d + 1) \in [0, 1]$, one deduces $k = \lfloor n \log_{10}(2) \rfloor$, and
so $\log_{10}(d) \le c(n) < \log_{10}(d + 1)$, where $c(n) = n \log_{10}(2) - \lfloor n \log_{10}(2) \rfloor$. Thus
when $a_k = 2^k$ with $a = \log_{10}(d)$ and $b = \log_{10}(d + 1)$, we have $N_d(n) = \#\{0 \le
k \le n : c(k) \in [a, b)\}$. As $\log_{10}(2)$ is irrational, $c(k) = a$ for at most one $k$, and so
the required frequency distribution follows via (1). In this article an entirely elementary proof
will first be presented, which establishes the logarithmic rule for the frequency
distribution of terms in the sequence $\{2^n\}$ having leading digit $d = 1$. This is
possible, as one can show for $a_n = 2^n$ that $N_1(n) = \lfloor n \log_{10}(2) \rfloor + 1$, via an
argument based on counting the number of integers in an interval. By applying
a variation to this argument, one can then obtain a weak first digit law for the
sequence $a_n = \alpha^n$, for a fixed $1 < \alpha \le \frac{10}{9}$, where in particular it is shown that
for all $n$
\[ L_n \le \frac{N_d(n)}{n+1} \le U_n, \]
with $L_n \sim \log_{10}(1 + \frac{1}{d}) - \log_{10}(\alpha)$ and $U_n \sim \log_{10}(1 + \frac{1}{d}) + \log_{10}(\alpha)$ as $n \to \infty$.
Although this result is only suggestive of the logarithmic rule of Benford’s law,
the advantage of the approach taken here rests on the non-ergodic nature of
the argument employed.
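The claim made earlier in this section, that $\{2^n\}$ satisfies Benford’s law, can be examined empirically. The following sketch uses exact integer arithmetic to count the leading digits of $2^0, \ldots, 2^n$ and compares the observed proportions with $\log_{10}(1 + 1/d)$. The cutoff n = 3000 and the function name leading_digit_counts are assumptions made for illustration only.

```python
import math
from collections import Counter


def leading_digit_counts(n: int) -> Counter:
    """Count the leading decimal digits of 2**0, 2**1, ..., 2**n exactly."""
    counts = Counter()
    power = 1
    for _ in range(n + 1):
        counts[int(str(power)[0])] += 1
        power *= 2
    return counts


n = 3000
counts = leading_digit_counts(n)
observed = {d: counts[d] / (n + 1) for d in range(1, 10)}
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

if __name__ == "__main__":
    for d in range(1, 10):
        print(f"d = {d}: observed {observed[d]:.4f}, Benford {expected[d]:.4f}")
```

Agreement for a finite n is only suggestive; the limiting statement is what the arguments in this article address.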
2 First digit law for powers of two
We begin with a technical lemma concerning the counting of integers in an
interval of the form $[a, b)$. In what follows, define for a real number $x$, the floor
and ceiling of $x$ as $\lfloor x \rfloor = \max\{n \in \mathbb{Z} : x \ge n\}$ and $\lceil x \rceil = \min\{n \in \mathbb{Z} : n \ge x\}$
respectively. Furthermore, we shall make use of the fact that $\lfloor x + n \rfloor = \lfloor x \rfloor + n$
for all $n \in \mathbb{Z}$ and $x \in \mathbb{R}$.
Lemma 2.1 Suppose $0 \le a < b$ with $b - a = L$, then the number of integers
contained in the interval $[a, b)$ is either $\lfloor L \rfloor$ or $\lfloor L \rfloor + 1$.
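Proof: Let $I(a, b)$ denote the number of integers contained in the interval
$[a, b)$. We note that for two integers $0 \le m \le n$, the number of integers in the
interval $[m, n]$ is equal to $n - m + 1$.
Case 1: $b \notin \mathbb{N}$.
In this instance $I(a, b) = \lfloor b \rfloor - \lceil a \rceil + 1$ since $[\lceil a \rceil, \lfloor b \rfloor] \subseteq [a, b)$. As $\lfloor b \rfloor \le b$ and
$\lceil a \rceil \ge a$ one finds $I(a, b) \le b - a + 1$, but as $\lfloor b - a + 1 \rfloor$ is the largest integer less
than or equal to $b - a + 1$, we deduce that $I(a, b) \le \lfloor b - a + 1 \rfloor = \lfloor b - a \rfloor + 1 =
\lfloor L \rfloor + 1$. Now as $\lceil a \rceil < a + 1$, then $b - \lceil a \rceil > b - a - 1 \ge \lfloor b - a - 1 \rfloor$ and as
$\lfloor b - \lceil a \rceil \rfloor = \lfloor b \rfloor - \lceil a \rceil$ is the largest integer less than or equal to $b - \lceil a \rceil$, we
further deduce $\lfloor b \rfloor - \lceil a \rceil \ge \lfloor b - a - 1 \rfloor = \lfloor b - a \rfloor - 1$, that is $I(a, b) \ge \lfloor L \rfloor$.
Hence $I(a, b)$ is either equal to $\lfloor L \rfloor$ or $\lfloor L \rfloor + 1$.
Case 2: $b \in \mathbb{N}$.
If $a \in \mathbb{N}$, then $I(a, b) = (b - 1) - a + 1 = b - a = \lfloor L \rfloor$, since $[a, b - 1] \subseteq [a, b)$.
Alternatively if $a \notin \mathbb{N}$, then $I(a, b) = (b - 1) - (\lfloor a \rfloor + 1) + 1 = b - \lfloor a \rfloor - 1$,
since $[\lceil a \rceil, b - 1] \subseteq [a, b)$ and $\lceil a \rceil = \lfloor a \rfloor + 1$. But as $\lfloor a \rfloor < a < \lfloor a \rfloor + 1$ we
find $b - \lfloor a \rfloor > b - a > b - \lfloor a \rfloor - 1$, that is $I(a, b) + 1 > b - a > I(a, b)$ and so
$I(a, b) = \lfloor b - a \rfloor = \lfloor L \rfloor$.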
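Lemma 2.1 can be spot-checked with exact rational arithmetic. The sketch below is a finite sanity check rather than a substitute for the proof. It counts the integers in $[a, b)$ as $\lceil b \rceil - \lceil a \rceil$, which is a standard identity and not the case analysis used above, and it tests only a grid of rationals with denominator 7, an arbitrary choice for illustration.

```python
import math
from fractions import Fraction


def integers_in_half_open(a: Fraction, b: Fraction) -> int:
    """Number of integers m with a <= m < b, assuming a < b."""
    return math.ceil(b) - math.ceil(a)


def lemma_holds(a: Fraction, b: Fraction) -> bool:
    """Check that the count equals floor(L) or floor(L) + 1, where L = b - a."""
    floor_len = math.floor(b - a)
    return integers_in_half_open(a, b) in (floor_len, floor_len + 1)


# Exhaustive check over all pairs 0 <= a < b <= 10 on a grid of sevenths.
grid = [Fraction(i, 7) for i in range(0, 71)]
all_ok = all(lemma_holds(a, b) for a in grid for b in grid if a < b)

if __name__ == "__main__":
    print("Lemma 2.1 holds on the test grid:", all_ok)
```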
Theorem 2.1 The asymptotic relative frequency of the terms in the sequence $\{2^n\}$
having leading digit $d = 1$ is $\log_{10}(2)$.
Proof: Recall $N_1(n) = \#\{0 \le m \le n : 2^m \text{ has leading digit } 1\}$. If $2^m$ has $k$
digits and leading digit 1, then $10^{k-1} \le 2^m < 2 \cdot 10^{k-1}$. Taking logarithms
to base ten yields
\[ k - 1 \le m \log_{10}(2) < \log_{10}(2) + k - 1. \tag{2} \]
Now consider the continuous linear function $f(x) = \log_{10}(2)\,x$, and suppose the
interval of values on which $f(x)$ satisfies (2) is $[a, b)$. As $f(b) - f(a) = \log_{10}(2)$,
we deduce that the interval $[a, b)$ must have unit length. Thus from Lemma
2.1, the interval $[a, b)$ can contain either one or two non-negative integers. To
show there is exactly one such integer in $[a, b)$, suppose there are two integers
$m_1 < m_2$ such that both $2^{m_1}$ and $2^{m_2}$ have $k$ digits with leading digit 1. As
$m_2 \ge m_1 + 1$, observe $2^{m_2} - 2^{m_1} \ge 2^{m_1}(2^1 - 1) = 2^{m_1}$ and so $2^{m_2} - 2^{m_1}$ will
have at least $k$ digits, but as both $2^{m_2}$ and $2^{m_1}$ have $k$ digits with a leading
digit 1, the difference $2^{m_2} - 2^{m_1}$ can have at most $k - 1$ digits, a contradiction.
Hence there is exactly one power of two having $k$ digits with leading digit
1. Consequently $N_1(n)$ must be equal to the positive integer $k$ such that
$2^n \in [10^{k-1}, 2 \cdot 10^{k-1}) \cup [2 \cdot 10^{k-1}, 10^k)$ or equivalently $k - 1 \le n \log_{10}(2) < k$,
that is $k - 1 = \lfloor n \log_{10}(2) \rfloor$ and so $N_1(n) = \lfloor n \log_{10}(2) \rfloor + 1$. Finally as
$x - 1 < \lfloor x \rfloor \le x$ one finds that
\[ \frac{n \log_{10}(2)}{n+1} < \frac{N_1(n)}{n+1} \le \frac{n \log_{10}(2) + 1}{n+1}, \]
from which the desired limiting value of $\log_{10}(2)$ follows upon taking $n \to \infty$.
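The closed form $N_1(n) = \lfloor n \log_{10}(2) \rfloor + 1$ obtained in the proof says that $N_1(n)$ equals the number of decimal digits of $2^n$. The following sketch checks this for small n using exact integers, which avoids any floating-point evaluation of the logarithm. The range 0 to 300 and the function names are assumptions chosen for illustration.

```python
def count_leading_one(n: int) -> int:
    """N_1(n): number of m in 0..n for which 2**m has leading digit 1."""
    count = 0
    power = 1
    for _ in range(n + 1):
        if str(power)[0] == "1":
            count += 1
        power *= 2
    return count


def digits_of_power_of_two(n: int) -> int:
    """Number of decimal digits of 2**n, i.e. floor(n*log10(2)) + 1."""
    return len(str(2 ** n))


# Compare both quantities for every n in the (arbitrarily chosen) range 0..300.
checked = all(
    count_leading_one(n) == digits_of_power_of_two(n) for n in range(0, 301)
)

if __name__ == "__main__":
    print("N_1(n) equals the digit count of 2**n for n = 0..300:", checked)
```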
3 A weak first digit law for $\alpha^n$
In Section 2 we were able to determine the correct frequency distribution for
the powers of two having a leading digit of $d = 1$, since an explicit formula
for $N_1(n)$ could be derived. By varying the argument used to prove Theorem
2.1, we can obtain upper and lower estimates of $N_d(n)$, for the sequence $\{\alpha^n\}$,
where $1 < \alpha \le \frac{10}{9}$. These estimates can then be used to obtain upper and
lower bounds on the frequency distribution for the terms of the sequence $\{\alpha^n\}$
having leading digit $d$, which are suggestive of the logarithmic rule predicted
via Benford’s law.
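Theorem 3.1 Given the sequence $\{\alpha^n\}$, where $1 < \alpha \le \frac{10}{9}$, then for all $n$
\[ L_n \le \frac{N_d(n)}{n+1} \le U_n, \]
with $L_n \sim \log_{10}(1 + \frac{1}{d}) - \log_{10}(\alpha)$ and $U_n \sim \log_{10}(1 + \frac{1}{d}) + \log_{10}(\alpha)$ as $n \to \infty$.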
Proof: Recall $N_d(n) = \#\{0 \le m \le n : \lfloor \alpha^m \rfloor \text{ has leading digit } d\}$; moreover,
by assumption, $\log_{10}(\alpha) \le \log_{10}(\frac{10}{9}) \le \log_{10}(\frac{d+1}{d})$. If $\lfloor \alpha^m \rfloor$ has $k$
digits and leading digit $d$, then $d \cdot 10^{k-1} \le \alpha^m < (d + 1) \cdot 10^{k-1}$. Taking
logarithms to base ten yields
\[ \log_{10}(d) + (k - 1) \le m \log_{10}(\alpha) < \log_{10}(d + 1) + (k - 1). \tag{3} \]
Now consider the continuous linear function $f(x) = \log_{10}(\alpha)\,x$, and suppose
the interval of values on which $f(x)$ satisfies (3) is $[a, b)$. As $f(b) - f(a) =
\log_{10}(\frac{d+1}{d})$, we deduce that the interval $[a, b)$ must have length
$L = \log_{10}(\frac{d+1}{d}) / \log_{10}(\alpha) \ge 1$. Thus from Lemma 2.1, the interval $[a, b)$
contains either $\lfloor L \rfloor$ or $\lfloor L \rfloor + 1$ non-negative integers. Consequently, the number of
terms of the sequence $\{\alpha^m\}$ for which $\lfloor \alpha^m \rfloor$ has $k$ digits and leading digit $d$ is
either $\lfloor L \rfloor$ or $\lfloor L \rfloor + 1$; moreover these upper and lower bounds are independent
of $k$. If $\lfloor \alpha^n \rfloor$ contains $s$ digits, then $10^{s-1} \le \alpha^n < 10^s$, from which one deduces
that $s - 1 = \lfloor n \log_{10}(\alpha) \rfloor$, and so
\[ \{\alpha^0, \alpha^1, \ldots, \alpha^n\} \subseteq \bigcup_{k=1}^{\lfloor n \log_{10}(\alpha) \rfloor + 1} [10^{k-1}, 10^k). \]
Now as
\[ \alpha^n \in [10^{\lfloor n \log_{10}(\alpha) \rfloor}, d \cdot 10^{\lfloor n \log_{10}(\alpha) \rfloor}) \cup [d \cdot 10^{\lfloor n \log_{10}(\alpha) \rfloor}, (d+1) \cdot 10^{\lfloor n \log_{10}(\alpha) \rfloor}) \cup [(d+1) \cdot 10^{\lfloor n \log_{10}(\alpha) \rfloor}, 10^{\lfloor n \log_{10}(\alpha) \rfloor + 1}), \]
the leading digit of $\lfloor \alpha^n \rfloor$ may or may not
be $d$, and so the number of sequence elements in $[10^{\lfloor n \log_{10}(\alpha) \rfloor}, 10^{\lfloor n \log_{10}(\alpha) \rfloor + 1})$
having a leading digit $d$ can only be bounded between 0 and $\lfloor L \rfloor + 1$. While
from above, the number of sequence elements having a leading digit of $d$ in
$[10^{k-1}, 10^k)$, for $k = 1, \ldots, \lfloor n \log_{10}(\alpha) \rfloor$, must be bounded between $\lfloor L \rfloor$ and
$\lfloor L \rfloor + 1$. Thus from definition, we have for all $n \ge 0$
\[ \frac{\lfloor L \rfloor \, \lfloor n \log_{10}(\alpha) \rfloor}{n+1} \le \frac{N_d(n)}{n+1} \le \frac{(\lfloor L \rfloor + 1)(\lfloor n \log_{10}(\alpha) \rfloor + 1)}{n+1}. \tag{4} \]
Finally, using again the inequality $x - 1 < \lfloor x \rfloor \le x$, one finds that the upper
and lower bounds of (4) are respectively bounded above and below by
\[ U_n = \left( \frac{\log_{10}(\frac{d+1}{d})}{\log_{10}(\alpha)} + 1 \right) \frac{n \log_{10}(\alpha) + 1}{n+1} \quad \text{and} \quad L_n = \left( \frac{\log_{10}(\frac{d+1}{d})}{\log_{10}(\alpha)} - 1 \right) \frac{n \log_{10}(\alpha) - 1}{n+1}, \]
with $L_n \sim \log_{10}(1 + \frac{1}{d}) - \log_{10}(\alpha)$ and $U_n \sim \log_{10}(1 + \frac{1}{d}) + \log_{10}(\alpha)$ as $n \to \infty$.
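Inequality (4) can be checked numerically for a particular base. The sketch below assumes the example value $\alpha = 11/10$, which satisfies $1 < \alpha \le 10/9$ but does not appear in the article, and the cutoff n = 400. The powers of $\alpha$ and the quantity $\lfloor n \log_{10}(\alpha) \rfloor$ are computed exactly with rationals. Only $\lfloor L \rfloor$ uses floating point, which could be unreliable if $L$ lay extremely close to an integer (not the case for this example).

```python
import math
from fractions import Fraction


def leading_digit(value: Fraction) -> int:
    """Leading decimal digit of floor(value), assuming value >= 1."""
    return int(str(value.numerator // value.denominator)[0])


def check_bounds(alpha: Fraction, d: int, n: int):
    """Return (lower, N_d(n), upper): the numerators of inequality (4)."""
    count = 0
    power = Fraction(1)
    for m in range(n + 1):
        if leading_digit(power) == d:
            count += 1
        if m < n:
            power *= alpha
    # power now equals alpha**n; floor(n*log10(alpha)) is its digit count minus 1.
    decades = len(str(power.numerator // power.denominator)) - 1
    # floor(L) with L = log10((d+1)/d) / log10(alpha), evaluated in floating point.
    floor_L = math.floor(math.log10((d + 1) / d) / math.log10(float(alpha)))
    return floor_L * decades, count, (floor_L + 1) * (decades + 1)


alpha = Fraction(11, 10)  # assumed example value for illustration
n = 400
results = {d: check_bounds(alpha, d, n) for d in range(1, 10)}

if __name__ == "__main__":
    for d, (lower, count, upper) in results.items():
        print(f"d = {d}: {lower} <= N_d({n}) = {count} <= {upper}")
```

Dividing each of the three numbers by n + 1 gives the relative frequencies that appear in (4).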
To conclude, we note the result of Theorem 3.1 can be strengthened for
the terms of the sequence $\{(\frac{10}{9})^n\}$ having a leading digit of 9 as follows. If we
set $\alpha = \frac{10}{9}$ and $d = 9$, then the length of the interval $[a, b)$ on which the linear
function $f(x)$ satisfies (3) is $L = 1$, and so there can only be one or two terms
of the sequence $\{(\frac{10}{9})^n\}$ having $k$ digits with a leading digit of 9. To show there
is exactly one, assume there are two integers $m_1 < m_2$ such that both $\lfloor (\frac{10}{9})^{m_1} \rfloor$
and $\lfloor (\frac{10}{9})^{m_2} \rfloor$ have $k$ digits with a leading digit 9. As $m_2 \ge m_1 + 1$, observe
\[ \left(\tfrac{10}{9}\right)^{m_2} - \left(\tfrac{10}{9}\right)^{m_1} \ge \left(\tfrac{10}{9}\right)^{m_1 + 1} - \left(\tfrac{10}{9}\right)^{m_1} = \left(\tfrac{10}{9}\right)^{m_1} \left(\tfrac{1}{9}\right), \]
but $\lfloor (\frac{10}{9})^{m_1} (\frac{1}{9}) \rfloor$ has $k$ digits
with a leading digit of 1. However as both $(\frac{10}{9})^{m_2}$ and $(\frac{10}{9})^{m_1}$ have $k$ digits with
a leading digit 9, the difference $\lfloor (\frac{10}{9})^{m_2} - (\frac{10}{9})^{m_1} \rfloor$ can have at most $k - 1$ digits,
a contradiction. Now
\[ \left\{ \left(\tfrac{10}{9}\right)^0, \left(\tfrac{10}{9}\right)^1, \ldots, \left(\tfrac{10}{9}\right)^n \right\} \subseteq \bigcup_{k=1}^{\lfloor n \log_{10}(\frac{10}{9}) \rfloor + 1} [10^{k-1}, 10^k), \]
and as
the number of sequence elements having a leading digit of 9 in $[10^{k-1}, 10^k)$, for
$k = 1, \ldots, \lfloor n \log_{10}(\frac{10}{9}) \rfloor$, is precisely one, we can conclude, as $(\frac{10}{9})^n$ may or may
not have a leading digit of 9, that
\[ \frac{n \log_{10}(\frac{10}{9}) - 1}{n+1} \le \frac{\lfloor n \log_{10}(\frac{10}{9}) \rfloor}{n+1} \le \frac{N_9(n)}{n+1} \le \frac{\lfloor n \log_{10}(\frac{10}{9}) \rfloor + 1}{n+1} \le \frac{n \log_{10}(\frac{10}{9}) + 1}{n+1}, \]
from which the desired limiting value of $\log_{10}(\frac{10}{9})$ follows upon taking $n \to \infty$.
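The two-sided bound on $N_9(n)$ for the sequence $\{(\frac{10}{9})^n\}$ can also be checked with exact rational arithmetic. The following sketch is an illustration under an assumed cutoff n = 500. It computes $\lfloor n \log_{10}(\frac{10}{9}) \rfloor$ exactly as the digit count of $\lfloor (\frac{10}{9})^n \rfloor$ minus one, so no floating-point logarithm enters the comparison.

```python
import math
from fractions import Fraction


def count_leading_nine(n: int):
    """Return (N_9(n), floor(n*log10(10/9))) for the terms (10/9)**m, m = 0..n."""
    alpha = Fraction(10, 9)
    power = Fraction(1)
    count = 0
    for m in range(n + 1):
        integer_part = power.numerator // power.denominator
        if str(integer_part)[0] == "9":
            count += 1
        if m < n:
            power *= alpha
    # power now equals (10/9)**n; its integer part has floor(n*log10(10/9)) + 1 digits.
    decades = len(str(power.numerator // power.denominator)) - 1
    return count, decades


n = 500
count, decades = count_leading_nine(n)
frequency = count / (n + 1)
limit_value = math.log10(10 / 9)

if __name__ == "__main__":
    print(f"{decades} <= N_9({n}) = {count} <= {decades + 1}")
    print(f"relative frequency {frequency:.5f}, limiting value {limit_value:.5f}")
```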
References
[1] F. Benford, The law of anomalous numbers, Proceedings of the American
Philosophical Society, 78 (1938), 551-572.
[2] S. Newcomb, Note on the frequency of use of the different digits in natural
numbers, American Journal of Mathematics, 4 (1881), 39-40.
http://dx.doi.org/10.2307/2369148
[3] H. Weyl, Über die Gleichverteilung von Zahlen mod. Eins, Mathematische
Annalen, 77 (1916), 313-352. http://dx.doi.org/10.1007/bf01475864
Received: June 7, 2016; Published: July 25, 2016