PROBABILITY AND DIRICHLET’S THEOREM
LUKE MULTANEN
Abstract. In this expository paper, we examine some fundamental theorems
in probability theory, after which we transition and discuss Dirichlet’s theorem on the existence of infinitely many primes of the form p congruent to
a modulo b, where a and b are coprime. We discuss aspects of the proof and
apply the Strong Law of Large Numbers from probability theory to obtain a
heuristic understanding of the density version of Dirichlet’s theorem.
Contents
1. Introduction
2. Probability
3. Dirichlet’s Theorem
3.1. The Riemann Zeta Function
3.2. Dirichlet Characters
4. The Strong Law of Large Numbers and ‘Randomness’ of Primes
Acknowledgements
References
1. Introduction
The primary purpose of this paper is to understand Dirichlet’s theorem.
Theorem 1.1. [Dirichlet’s Theorem] Fix $a, b \in \mathbb{N}$ such that $(a, b) = 1$. There are
infinitely many primes of the form $p = a + bn$, where $n \in \mathbb{N}$.
Theorem 1.2. [Strong Dirichlet’s Theorem] Fix $a, b \in \mathbb{N}$ such that $(a, b) = 1$. Let
$P_a$ be the set of prime numbers such that $p \equiv a \bmod b$. The set $P_a$ has density $\frac{1}{\varphi(b)}$, i.e.,
$$\lim_{N \to \infty} \frac{\#\{\text{primes } p < N,\ p \equiv a \bmod b\}}{\#\{\text{primes } p < N\}} = \frac{1}{\varphi(b)}.$$
We will not fully prove these theorems. Instead, we prove the main components
and necessary mechanics of the weak version of the theorem before applying the
strong law of large numbers from probability theory in order to develop a heuristic
understanding of why the strong version of Dirichlet’s theorem holds.
Section 2 provides the necessary probabilistic background. It introduces the fundamental definitions and properties required to derive the weak and strong laws of large numbers, then gives
a couple of concrete examples that illustrate the differences between the two.
Section 3 deals with the bulk of instruments needed to prove Dirichlet’s theorem,
namely the Riemann Zeta function, L-functions and Dirichlet characters. This section uses the Riemann Zeta function in order to prove that there exist infinitely
many primes, then transitions into a discussion of Dirichlet characters before illustrating how they can be used in conjunction with L-functions in order to prove the
existence of infinitely many primes in any given congruence class.
Section 4 combines the work from sections 2 and 3 in order to provide a probabilistic heuristic for the strong Dirichlet’s theorem. This section, operating under
a heuristic assumption, motivates the strong version of Dirichlet’s Theorem by considering it as an instance of a random process and applying the strong law of large
numbers.
2. Probability
(Note: This section follows sections 1.1 and 1.2 of Billingsley [1])
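Definition 2.1. Let $\Omega$ be a set, and let $2^\Omega$ be the power set of $\Omega$, i.e. $2^\Omega$ is the set
containing all subsets of $\Omega$. A subset $\mathcal{F} \subseteq 2^\Omega$ is called a $\sigma$-algebra of $\Omega$ if it satisfies
the following properties:
(1) $\Omega \in \mathcal{F}$,
(2) If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$,
(3) If $A_i \in \mathcal{F}$ is some countable collection of sets, then $\bigcup_i A_i \in \mathcal{F}$.
Example 2.2. Let $\Omega = \{1, 2, 3\}$. Then $\mathcal{F} = 2^\Omega$ is the $\sigma$-algebra
$\mathcal{F} = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$.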
Definition 2.3. A sample space is a set ⌦ that contains all the possible outcomes
of an experiment.
Definition 2.4. A function $P : \mathcal{F} \to [0, 1]$ is called a probability measure for a
$\sigma$-algebra $\mathcal{F}$ if the following conditions are satisfied:
(1) $0 \le P(A) \le 1$ for all $A \in \mathcal{F}$,
(2) $P(\emptyset) = 0$ and $P(\Omega) = 1$,
(3) For any sequence $A_1, A_2, \ldots$ of disjoint sets in $\mathcal{F}$, $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.
Remark. If $\mathcal{F}$ were not a $\sigma$-algebra, then we would need to assume that $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$
for the third property to make sense.
Definition 2.5. A probability space is denoted as $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a sample
space, $\mathcal{F}$ is a $\sigma$-algebra such that $\mathcal{F} \subseteq 2^\Omega$, and $P$ is a probability measure on $\mathcal{F}$ where
$P(\Omega) = 1$. The elements of $\mathcal{F}$ are often referred to as events, so for any $A \in \mathcal{F}$,
$P(A)$ is the probability that we end up in event $A$.
Example 2.6. Suppose we have an experiment where we flip a coin twice. Then
the sample space is $\Omega = \{(H, H), (H, T), (T, H), (T, T)\}$, and we let the $\sigma$-algebra be $\mathcal{F} = 2^\Omega$.
Then, for any $A \subseteq \Omega$, $P(A) = \#A \cdot \frac{1}{4}$. For instance, if $A$ is the event that the first
coin flipped is heads, i.e. $A = \{(H, H), (H, T)\}$, then $P(A) = 2 \cdot \frac{1}{4} = \frac{1}{2}$.
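Definition 2.7. A discrete probability space is a probability space $(\Omega, \mathcal{F}, P)$ where
$\Omega$ is countable, $\mathcal{F} = 2^\Omega$, and the probability measure $P$ is of the form
$$P(A) = \sum_{\omega \in A} p(\omega), \quad \forall A \subseteq \Omega,$$
where $p : \Omega \to \mathbb{R}^+$ such that $\sum_{\omega \in \Omega} p(\omega) = 1$.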
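Example 2.8. Let $\Omega = \mathbb{Z}$, and suppose that $p(0) = \frac{1}{2}$ and $p(1) = \frac{1}{2}$. Then, for
any $A \subseteq \Omega$,
$$P(A) = \begin{cases} 1 & \text{if } 0 \in A \text{ and } 1 \in A, \\ \frac{1}{2} & \text{if exactly one of } 0, 1 \text{ is in } A, \text{ and} \\ 0 & \text{else.} \end{cases}$$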
Definition 2.9. A continuous probability space is a probability space $(\Omega, \mathcal{F}, P)$
where $\Omega \subseteq \mathbb{R}$ and is non-discrete, $\mathcal{F}$ is composed of Lebesgue measurable sets, and
the probability measure $P$ is of the form
$$P(A) = \int_A p,$$
where $p : \Omega \to \mathbb{R}^+$ is continuous and $\int_\Omega p = 1$.
Remark. The Lebesgue measure of a set $A$ is denoted by $\lambda(A)$. Any closed interval
$[a, b] \subseteq \mathbb{R}$ has measure $\lambda([a, b]) = b - a$. The open set $(a, b)$ has the same measure
because single points have a measure of 0. For a rigorous development of Lebesgue
measure, see section 2 of Billingsley [1].
Definition 2.10. A random variable is a function $X : \Omega \to \mathbb{R}$ such that for all
$t \in \mathbb{R}$, $X^{-1}((-\infty, t)) \in \mathcal{F}$ (i.e. the event $X < t$ is measurable, so $P(X < t)$ is defined).
Remark. In classical probability theory, there are two kinds of random variables:
discrete and continuous, which correspond to the aforementioned discrete and continuous probability spaces. The remaining definitions and proofs will use discrete
random variables.
Example 2.11. The value of a fair die may be thought of as a random variable.
For a single roll, the sample space is $\Omega = \{1, 2, 3, 4, 5, 6\}$, set $\mathcal{F} = 2^\Omega$, and $p(1) =
p(2) = \cdots = p(6) = \frac{1}{6}$. $X$, the value of the die, is a random variable that can take
on any value $x \in \{1, 2, 3, 4, 5, 6\}$.
Definition 2.12. The expected value is the weighted average of all possible values of a random variable. Suppose some random variable $X$ can take on values
$x_1, \ldots, x_n$ with probabilities $p_1, \ldots, p_n$, respectively. Then,
$$E[X] = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n.$$
Example 2.13. We can calculate the expected value of the die from Example 2.11,
which is
$$E[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5.$$
Definition 2.14. The variance of a random variable $X$ is defined as the expected
value of the squared deviation of $X$ from the mean $E[X] = \mu$, i.e.,
$$Var(X) = E[(X - \mu)^2].$$
Remark. The variance is often denoted by $\sigma^2$.
Example 2.15. We can calculate the variance of the die from Example 2.11, which
is
$$Var(X) = E[(X - 3.5)^2] = \sum_{i=1}^{6} \frac{1}{6}(i - 3.5)^2 = \frac{1}{6}\sum_{i=1}^{6}(i - 3.5)^2$$
$$= \frac{1}{6}\left((-2.5)^2 + (-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2 + (2.5)^2\right) = \frac{1}{6}(17.5) = \frac{35}{12}.$$
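The two die computations above can be checked with exact rational arithmetic. The following is a minimal sketch; the helper names `expected_value` and `variance` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def expected_value(dist):
    """E[X] for a finite distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E[(X - mu)^2] for the same kind of distribution."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(die))  # 7/2
print(variance(die))        # 35/12
```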
Theorem 2.16. [Chebyshev’s Inequality] If $X$ is a random variable with finite
expected value $\mu$ and finite non-zero variance $\sigma^2$, then for any real number $k > 0$,
$$P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}.$$
Proof. We begin with the definition of variance:
$$Var(X) = E[(X - E[X])^2] = \sum_{x \in \mathbb{R}} (x - E[X])^2 \cdot P(X = x)$$
$$\ge \sum_{|x - E[X]| \ge k} (x - E[X])^2 \cdot P(X = x)$$
$$\ge \sum_{|x - E[X]| \ge k} k^2 \cdot P(X = x)$$
$$= k^2 \cdot P(|X - E[X]| \ge k),$$
from which the result follows.
□
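As a sanity check, Chebyshev’s inequality can be compared against the exact tail probabilities of the fair die from Example 2.11. This is only an illustrative sketch; the function names and the chosen values of $k$ are assumptions made for the example.

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}
mu = sum(x * p for x, p in die.items())               # 7/2
var = sum((x - mu) ** 2 * p for x, p in die.items())  # 35/12

def tail_probability(k):
    """Exact P(|X - mu| >= k) for the die."""
    return sum(p for x, p in die.items() if abs(x - mu) >= k)

def chebyshev_bound(k):
    """The bound sigma^2 / k^2 from Theorem 2.16."""
    return var / Fraction(k) ** 2

# Each row: k, exact tail probability, Chebyshev bound.
for k in (Fraction(3, 2), 2, Fraction(5, 2)):
    print(k, tail_probability(k), chebyshev_bound(k))
```

The bound is far from tight here (for small $k$ it even exceeds 1), which is typical: the inequality uses nothing about the distribution beyond its variance.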
Remark. The above proof assumes that we are dealing with discrete variables rather
than continuously distributed ones. The proof of this inequality for continuously
distributed variables is very similar, using the integral representation of variance
instead of sums.
Definition 2.17. A stochastic process is a sequence $[X_i : i \in T]$ of random variables
on a probability space $(\Omega, \mathcal{F}, P)$, where each $i \in T$ represents a different point in
time. ([1], Chapter 7)
Definition 2.18. A sequence $\{X_n\}$ of independent random variables is said to
converge in probability towards a value $\mu$ if for all $\varepsilon > 0$,
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \varepsilon) = 0, \quad \text{where } \bar{X}_n = \frac{\sum_{i=1}^{n} X_i}{n}.$$
Definition 2.19. For every $i \in T$ of a stochastic process, we work within the
probability space $(\Omega_i, \mathcal{F}_i, P_i)$. The product space of a stochastic process is denoted
by
$$\Big(\prod_i \Omega_i,\ \text{“}\prod_i \mathcal{F}_i\text{”},\ \text{“}\prod_i P_i\text{”}\Big),$$
where $\prod_i \Omega_i$ is the sample space created by taking the cartesian product of each
individual sample space $\Omega_i$, “$\prod \mathcal{F}_i$” is the corresponding $\sigma$-algebra of the space $\prod_i \Omega_i$
generated by subsets of the form $A_1 \times A_2 \times \cdots$, where $A_1 \in \mathcal{F}_1, A_2 \in \mathcal{F}_2, \ldots$, and
“$\prod P_i$” is the resulting probability measure on “$\prod \mathcal{F}_i$”.
Remark. Observe that an infinite product of discrete probability spaces is not discrete.
Theorem 2.20. [Weak Law of Large Numbers] The mean of a sequence of discrete,
identically distributed, independent random variables converges in probability towards the expected value as the number of elements in the set approaches infinity,
i.e., $\bar{X}_n \to \mu$ as $n \to \infty$ in the sense that, for any $\varepsilon > 0$,
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0.$$
Proof. We assume finite variance $Var(X_i) = \sigma^2$ for all $i$. Since the variables $X_i$
are independent, through standard properties of the variance, we get:
$$Var(\bar{X}_n) = Var\left(\frac{1}{n}(X_1 + \cdots + X_n)\right) = \frac{1}{n^2} Var(X_1 + \cdots + X_n)$$
$$= \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}.$$
By applying Chebyshev’s inequality to $\bar{X}_n$, with $E[\bar{X}_n] = \mu$, we see that
$$P(|\bar{X}_n - \mu| \ge \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2}.$$
As $n \to \infty$, $\frac{\sigma^2}{n\varepsilon^2} \to 0$, so by Definition 2.18, we conclude that the Weak Law of
Large Numbers holds.
□
Remark. Notice that since at any step we only deal with finite n, the product space
is discrete and has finite measure.
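For fair 0/1 coin flips the probability in the weak law can be computed exactly from the binomial distribution and compared with the Chebyshev bound $\sigma^2/(n\varepsilon^2)$ used in the proof. The sketch below assumes $\sigma^2 = \frac{1}{4}$ (a fair coin) and an arbitrary tolerance $\varepsilon = \frac{1}{10}$; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def deviation_probability(n, eps):
    """Exact P(|mean of n fair 0/1 flips - 1/2| >= eps)."""
    half = Fraction(1, 2)
    favourable = sum(comb(n, k) for k in range(n + 1)
                     if abs(Fraction(k, n) - half) >= eps)
    return Fraction(favourable, 2 ** n)

def chebyshev_bound(n, eps):
    """sigma^2 / (n eps^2) with sigma^2 = 1/4 for one fair coin flip."""
    return Fraction(1, 4) / (n * eps ** 2)

eps = Fraction(1, 10)
# Each row: n, exact deviation probability, Chebyshev bound.
for n in (10, 100, 1000):
    print(n, float(deviation_probability(n, eps)), float(chebyshev_bound(n, eps)))
```

Both columns shrink as $n$ grows, and the exact probability always sits below the bound, which is all the proof of Theorem 2.20 needs.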
Theorem 2.21. [Strong Law of Large Numbers] The mean of a set of discrete,
identically distributed, independent random variables will almost surely converge to
the expected value, i.e.,
$\bar{X}_n \to \mu$ almost surely as $n \to \infty$, or,
$$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1.$$
Remark. The strong law of large numbers is more difficult to prove than the weak
law because we are now dealing with an infinite product space rather than a discrete
one. Under an infinite product space, the terms “$\prod \mathcal{F}_i$” and “$\prod P_i$” no longer
make sense as we have previously defined them because they are no longer discrete.
Suppose we flip a coin infinitely many times. If we were to look at a specific sequence
$(x_n) \in \prod_i^\infty \Omega_i$, the probability of this sequence happening is $\frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \cdots = 0$, so
it would appear that no sequence has a non-zero probability. However, there are
still many questions that can be asked about such an infinite space $\prod_i^\infty \Omega_i$, such
as what the probability is that the first coin flipped will be heads (which should
always be $\frac{1}{2}$ in every trial). The framework that we have already introduced is
not enough to prove the strong law of large numbers, particularly the statement
about almost sure convergence. In order to prove this theorem, one must first more
carefully develop methods from measure theory.
These two theorems are very similar in nature, but there is an important, subtle
distinction between what they say. The weak law states that for $n$ sufficiently
large, the mean $\bar{X}_n$ is likely to be near the expected value $\mu$. It is possible under
the weak law to have a nonzero probability that $|\bar{X}_n - \mu| > \varepsilon$ for infinitely many
$n$. The strong law states that this will almost surely not happen. For any $\varepsilon > 0$,
the strong law implies with probability 1 that $|\bar{X}_n - \mu| < \varepsilon$ holds for $n$ sufficiently
large.
Example 2.22. [Discrete Variables] Consider the case of flipping a coin. Assuming
the coin is fair, there is an equal chance that the coin will land heads or tails. Now
suppose that we place a number on each side of the coin, 1 for heads and 0 for tails,
and sum the results from each flip, starting at 0. If we flip the coin infinitely many
times, we should expect that the mean will be close to $\frac{1}{2}$. The weak law allows for
a positive probability that for infinitely many $n$ and some $\varepsilon > 0$, $|\bar{X}_n - \frac{1}{2}| > \varepsilon$, but
the strong law almost surely eliminates this possibility from occurring. The sample
space for such an experiment is $\prod_i^\infty \Omega_i = \{0, 1\} \times \{0, 1\} \times \{0, 1\} \times \cdots$.
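A single simulated sample path of the coin-flip experiment gives a feel for what almost sure convergence describes. The sketch below uses Python’s pseudo-random generator with a fixed seed (both are assumptions of the illustration, and a finite simulation proves nothing about the limit).

```python
import random

def running_means(num_flips, seed=0):
    """Running means of simulated fair 0/1 coin flips along one sample path."""
    rng = random.Random(seed)
    total = 0
    means = []
    for n in range(1, num_flips + 1):
        total += rng.randint(0, 1)   # 1 for heads, 0 for tails
        means.append(total / n)
    return means

means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, means[n - 1])
```

Along one path the running mean may wander early on, but the strong law says that, with probability 1, it eventually stays within any fixed $\varepsilon$ of $\frac{1}{2}$.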
Remark. The strong law implies the weak law.
Example 2.23. [Continuous Distribution] Now consider a person throwing darts
at a dart board. The person in question is not a skilled player and is equally likely
to land a dart at any position on the board. If we look at the board as a coordinate
system, we can measure each dart throw by its displacement from the bullseye,
which serves as the origin. In that sense, we could find the “mean,” so to speak,
of a set of dart throws. Assuming that where each individual dart lands is in fact
uniformly random, then the expected value, or expected landing point, would be
the bullseye. That said, as the thrower throws infinitely many darts, the weak law
of large numbers says that the mean displacement of the individual throws is likely
to be arbitrarily close to the origin, but the strong law nearly assures that outcome.
3. Dirichlet’s Theorem
(Note: This section loosely follows Xiao’s [4] notes from lectures 1 and 2 and
borrows from Chapter VI of Serre [3].)
3.1. The Riemann Zeta Function.
Theorem 3.1. [Euclid] There are infinitely many primes.
Proof. Assume by contradiction that there are a finite number of primes $\{p_1, \ldots, p_n\}$
and let
$$X = p_1 \cdot p_2 \cdots p_n.$$
Now, consider the number $X + 1$. $X + 1 \equiv 1 \bmod p_i$ for all $p_i$ in our set of primes,
so no $p_i$ divides $X + 1$. However, $X + 1 > 1$, so $\exists$ a prime $p$ that divides $X + 1$,
and $p \notin \{p_1, \ldots, p_n\}$, which is a contradiction. Therefore, there must be infinitely
many primes.
□
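Euclid’s argument is constructive: from any finite list of primes it produces a prime outside the list. A minimal sketch follows, using trial division (chosen only for simplicity; the function names are illustrative).

```python
from math import prod

def smallest_prime_factor(m):
    """Smallest prime dividing m >= 2, found by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m

def new_prime_from(primes):
    """A prime dividing (product of the given primes) + 1; it cannot be in the list."""
    return smallest_prime_factor(prod(primes) + 1)

# 2*3*5*7*11*13 + 1 = 30031 = 59 * 509, so X + 1 itself need not be prime.
print(new_prime_from([2, 3, 5, 7, 11, 13]))  # 59
```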
Definition 3.2. We will denote by (a, b) the greatest common divisor of a and b.
Theorem 3.3. [Dirichlet’s Theorem] Fix $a, b \in \mathbb{N}$ such that $(a, b) = 1$. There are
infinitely many primes of the form $p = a + bn$, where $n \in \mathbb{N}$.
This theorem may not seem as intuitively apparent as Euclid’s Theorem, and at
the moment, we are not fully equipped to prove it. First, we must introduce the
Riemann Zeta function and use it to give an alternate proof of the infinitude of
primes. We will prove that $\sum_{p \text{ prime}} \frac{1}{p^s} \to \infty$ as $s \to 1$, $\operatorname{Re} s > 1$, $s \in \mathbb{C}$. This is
a stronger statement than simply stating there are infinitely many primes and is a
proof that generalizes to Dirichlet’s Theorem.
Definition 3.4. The Riemann Zeta function is defined as
$$\zeta(s) = \sum_{n \in \mathbb{N}} \frac{1}{n^s}, \quad \text{for } \operatorname{Re} s > 1,\ s \in \mathbb{C},$$
which converges absolutely when the real component of $s$ is greater than 1.
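Claim. $\sum_{n \in \mathbb{N}} \frac{1}{n^s}$ converges absolutely for $\operatorname{Re} s > 1$ and diverges when $s = 1$.
Proof. Let $\operatorname{Re} s > 1$. We get
$$\left|\sum_{n \in \mathbb{N}} \frac{1}{n^s}\right| \le \sum_{n \in \mathbb{N}} \frac{1}{|n^s|}.$$
Since $s$ is complex and of the form $s = \sigma + it$, $\frac{1}{|n^s|} = \frac{1}{|n^\sigma||n^{it}|} = \frac{1}{|n^\sigma||e^{it \log n}|} = \frac{1}{|n^\sigma|}$,
so
$$\left|\sum_{n \in \mathbb{N}} \frac{1}{n^s}\right| \le \sum_{n \in \mathbb{N}} \frac{1}{n^\sigma}.$$
If we think of $\sum_{n=2}^{\infty} \frac{1}{n^\sigma}$ as a lower Riemann sum for $\int_1^\infty \frac{1}{x^\sigma}\,dx$, then we see that
$$\sum_{n \in \mathbb{N}} \frac{1}{n^\sigma} \le 1 + \int_1^\infty \frac{1}{x^\sigma}\,dx$$
$$\Rightarrow \sum_{n \in \mathbb{N}} \frac{1}{n^\sigma} \le 1 + \lim_{x \to \infty}\left(\frac{1}{\sigma - 1} - \frac{1}{(\sigma - 1)x^{\sigma - 1}}\right)$$
$$\Rightarrow \sum_{n \in \mathbb{N}} \frac{1}{n^\sigma} \le 1 + \frac{1}{\sigma - 1}, \text{ which is } < \infty \text{ for } \sigma = \operatorname{Re} s > 1.$$
For $s = 1$, it is clear that the sum diverges (it is just $\sum_{n \in \mathbb{N}} \frac{1}{n}$).
□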
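The integral comparison in the claim can be observed numerically for real exponents. The sketch below compares partial sums of $\sum 1/n^\sigma$ with the bound $1 + \frac{1}{\sigma - 1}$ and shows the harmonic sums ($\sigma = 1$) continuing to grow; the cut-offs are arbitrary choices for the illustration.

```python
def zeta_partial_sum(sigma, terms):
    """Partial sum of sum_{n>=1} 1/n^sigma for real sigma."""
    return sum(1.0 / n ** sigma for n in range(1, terms + 1))

def integral_bound(sigma):
    """The bound 1 + 1/(sigma - 1) from the integral comparison (needs sigma > 1)."""
    return 1.0 + 1.0 / (sigma - 1.0)

# Each row: sigma, partial sum over 100000 terms, integral bound.
for sigma in (2.0, 1.5, 1.1):
    print(sigma, zeta_partial_sum(sigma, 100_000), integral_bound(sigma))

# At sigma = 1 the partial sums keep growing, roughly like log(terms).
print(zeta_partial_sum(1.0, 1_000), zeta_partial_sum(1.0, 100_000))
```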
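Claim. $\lim_{s \to 1,\ \operatorname{Re} s > 1} |\zeta(s)| = +\infty$.
Proof. Consider what happens as $s \to 1$, with $\operatorname{Re} s > 1$ and $\operatorname{Im} s$ arbitrary. If we
can show that $\zeta(s) - \frac{1}{s - 1}$ is bounded for $\operatorname{Re} s > 1$ in some neighborhood of 1,
then $\lim_{s \to 1^+} |\zeta(s)| = \infty$. We have
$$\zeta(s) - \frac{1}{s - 1} = \sum_{n \in \mathbb{N}} \frac{1}{n^s} - \int_1^\infty \frac{1}{x^s}\,dx = \sum_{n \in \mathbb{N}} \left(\frac{1}{n^s} - \int_n^{n+1} \frac{1}{x^s}\,dx\right) = \sum_{n \in \mathbb{N}} \int_0^1 \left(\frac{1}{n^s} - \frac{1}{(n + t)^s}\right) dt.$$
Now, we want to bound the function $f_n(t) = \frac{1}{n^s} - \frac{1}{(n + t)^s}$. Note that $f_n(0) = 0$
and that $f_n'(t) = s \cdot \frac{1}{(n + t)^{s + 1}}$. Therefore, when $0 \le t \le 1$,
$$|f_n(t)| \le \sup_{0 \le t \le 1} |f_n'(t)| \cdot (1 - 0) \le \left|\frac{s}{n^{s + 1}}\right| \ \Rightarrow\ \left|\int_0^1 f_n(t)\,dt\right| \le \left|\frac{s}{n^{s + 1}}\right|.$$
So,
$$\left|\sum_{n \in \mathbb{N}} \int_0^1 \left(\frac{1}{n^s} - \frac{1}{(n + t)^s}\right) dt\right| \le \sum_{n \in \mathbb{N}} \left|\frac{s}{n^{s + 1}}\right|,$$
which is convergent for $\operatorname{Re} s > 0$.
□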
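Remark. The previous claim illustrates that the sum diverges along the real axis.
Using the observation that $\sum_{n \in \mathbb{N}} \frac{1}{n^s}$ expresses an upper Riemann sum of $\int_1^\infty \frac{1}{x^s}\,dx$,
we can see that when $\operatorname{Im} s = 0$,
$$\int_1^\infty \frac{1}{x^s}\,dx \le \sum_{n \in \mathbb{N}} \frac{1}{n^s}$$
$$\Rightarrow \frac{1}{s - 1} \le \sum_{n \in \mathbb{N}} \frac{1}{n^s}$$
$$\Rightarrow \sum_{n \in \mathbb{N}} \frac{1}{n^s} \to +\infty \text{ as } s \to 1^+,$$
giving us an alternate proof of the infinitude of primes.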
Theorem 3.5. $\lim_{s \to 1} |\zeta(s)| = \infty$ implies that there are infinitely many primes.
Proof. Let us, for now, assume that the real component of $s$ is greater than 1 so
that we can work within a realm of absolute convergence in order to rearrange sums.
Since every natural number greater than one is uniquely the product of powers of
prime numbers, we can rewrite the sum as
$$\sum_{n \in \mathbb{N}} \frac{1}{n^s} = \prod_{p \text{ prime}} \left(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots\right).$$
Taking the log of both sides, we get
$$\log(\zeta(s)) = \log\left(\prod_{p \text{ prime}} \left(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots\right)\right),$$
and by the additive property of logarithms, we see that
$$\log(\zeta(s)) = \sum_{p \text{ prime}} \log\left(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots\right).$$
Notice that the inside of the log function on the right hand side is a geometric
series, meaning the right hand side can be rewritten as
$$\sum_{p \text{ prime}} \log\left(\frac{1}{1 - \frac{1}{p^s}}\right),$$
and by properties of the logarithm, this is equivalent to
$$-\sum_{p \text{ prime}} \log\left(1 - \frac{1}{p^s}\right).$$
The Taylor expansion of $\log(1 - x)$ is $-\left(x + \frac{x^2}{2} + \frac{x^3}{3} + \cdots\right)$, therefore the above
expression becomes
$$\sum_{p \text{ prime}} \left(\frac{1}{p^s} + \frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots\right) = \sum_{p \text{ prime}} \frac{1}{p^s} + \sum_{p \text{ prime}} \left(\frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots\right).$$
If we can show that the second sum on the right converges, then there are infinitely
many primes: $\lim_{s \to 1} |\zeta(s)| = \infty \Rightarrow \lim_{s \to 1} |\log \zeta(s)| = \infty$, which means the
above expression will diverge as well. Thus, by showing that the second sum is
bounded as $s \to 1$, we get $\lim_{s \to 1} \sum_{p \text{ prime}} \frac{1}{p^s} = \infty$, which implies that there must
be infinitely many primes. In fact, the second sum on the right converges absolutely
for $\operatorname{Re} s > \frac{1}{2}$:
$$\sum_{p \text{ prime}} \left(\frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots\right) \le \sum_{p \text{ prime}} \frac{1}{p^{2s}}\left(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots\right). \quad (*)$$
The geometric sum $\left(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots\right) = \frac{1}{1 - \frac{1}{p^s}} \le \frac{1}{1 - \frac{1}{p^{1/2}}} \le \frac{1}{1 - \frac{1}{\sqrt{2}}} = C$, so
$$(*) \le C \sum_{p \text{ prime}} \frac{1}{p^{2s}} \le C \sum_{n \in \mathbb{N}} \frac{1}{n^{2s}},$$
which converges for $s > \frac{1}{2}$ real.
□
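The split of $\log \zeta(s)$ into $\sum_p \frac{1}{p^s}$ plus a bounded remainder can be observed numerically at $s = 1$. The sketch below is an illustration under stated assumptions: primes come from a simple sieve, the inner series is truncated at a hypothetical `max_power`, and the cut-offs are arbitrary.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit (limit >= 1)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def prime_sum(s, primes):
    """Partial sum of sum_p 1/p^s."""
    return sum(p ** -s for p in primes)

def higher_order_sum(s, primes, max_power=60):
    """Partial sum of sum_p (1/(2 p^{2s}) + 1/(3 p^{3s}) + ...), truncated at max_power."""
    return sum(p ** (-k * s) / k for p in primes for k in range(2, max_power + 1))

primes = primes_up_to(100_000)
# Each row: cut-off, sum of 1/p (slowly growing), remainder sum (settling down).
for limit in (100, 10_000, 100_000):
    ps = [p for p in primes if p <= limit]
    print(limit, prime_sum(1.0, ps), higher_order_sum(1.0, ps))
```

The first column of sums keeps creeping upward while the second stabilizes, which matches the roles the two sums play in the proof.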
3.2. Dirichlet Characters.
Dirichlet characters are useful because they enable one to use the methods seen in
the alternative proof of infinitude of primes using the Riemann Zeta function in
order to prove Dirichlet’s Theorem.
Definition 3.6. A function $\chi : \mathbb{Z} \to \mathbb{C}$ is called a Dirichlet character if it satisfies
the following properties:
(1) $\exists N \in \mathbb{N}$ s.t. $\chi(x + N) = \chi(x)$ (periodic with period $N$)
(2) If $(x, N) > 1$, then $\chi(x) = 0$, and if $(x, N) = 1$, then $\chi(x) \ne 0$
(3) $\chi(mn) = \chi(m) \cdot \chi(n)$ (multiplicative)
From the above properties, we can see that
• $\chi(1) = 1$
• if $a \equiv b \bmod N$, then $\chi(a) = \chi(b)$
• Characters are equivalent to maps of finite abelian groups $(\mathbb{Z}/N\mathbb{Z})^\times \to \mathbb{C}^\times$ [4]
Definition 3.7. Euler’s phi function is defined by letting $\varphi(n)$ be the number
of positive integers $k$ less than or equal to $n$ with $(n, k) = 1$.
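In order to construct Dirichlet characters, we find an isomorphism from a finite
abelian group $(\mathbb{Z}/n\mathbb{Z})^\times$ to a product of additive groups. By the theory of finite
abelian groups, $\exists\, m_1, \ldots, m_k$ such that
$$(\mathbb{Z}/n\mathbb{Z})^\times \cong \mathbb{Z}/m_1\mathbb{Z} \times \cdots \times \mathbb{Z}/m_k\mathbb{Z}.$$
In the case that $(\mathbb{Z}/n\mathbb{Z})^\times \cong \mathbb{Z}/\varphi(n)\mathbb{Z}$ (i.e. the group is cyclic), we can give the
characters by
$$\chi : (\mathbb{Z}/n\mathbb{Z})^\times \to \mathbb{C}^\times \iff \chi : \mathbb{Z}/m\mathbb{Z} \to \mathbb{C}^\times, \text{ where } (\mathbb{Z}/n\mathbb{Z})^\times \cong \mathbb{Z}/m\mathbb{Z}.$$
Suppose that $g$ is a generator of $(\mathbb{Z}/n\mathbb{Z})^\times$. For any $(x, n) = 1$, we define $b(x) = \alpha$,
where $x \equiv g^\alpha \bmod n$. Now, for $h = 0, \ldots, \varphi(n) - 1$, we define
$$\chi_h(x) = \begin{cases} e^{2\pi i h b(x)/\varphi(n)}, & \text{if } (x, n) = 1 \\ 0 & \text{else} \end{cases} \quad [4],$$
which give all of the characters of $(\mathbb{Z}/n\mathbb{Z})^\times$.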
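The construction for a cyclic unit group translates directly into code. The sketch below builds the characters from a generator and the discrete logarithm $b(x)$; the function name `cyclic_characters` is an illustrative choice, and the generator must be supplied by the caller.

```python
import cmath
from math import gcd

def cyclic_characters(n, g):
    """All Dirichlet characters mod n, assuming g generates (Z/nZ)^x."""
    units = [x for x in range(1, n) if gcd(x, n) == 1]
    phi = len(units)
    # Discrete logarithm table: b[x] = alpha where x = g^alpha mod n.
    b = {pow(g, alpha, n): alpha for alpha in range(phi)}
    if sorted(b) != units:
        raise ValueError("g does not generate the unit group mod n")

    def make_character(h):
        def chi(x):
            x %= n
            if gcd(x, n) != 1:
                return 0
            return cmath.exp(2j * cmath.pi * h * b[x] / phi)
        return chi

    return [make_character(h) for h in range(phi)]

# Characters mod 9 with generator 2; each row lists chi_h(x) for x = 1, ..., 9.
chars = cyclic_characters(9, 2)
for h, chi in enumerate(chars):
    print(h, [complex(round(chi(x).real, 3), round(chi(x).imag, 3)) if chi(x) else 0
              for x in range(1, 10)])
```

Multiplicativity follows from $b(xy) \equiv b(x) + b(y) \bmod \varphi(n)$, which is why the exponential of the discrete logarithm is the natural thing to write down.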
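Example 3.8. We find the Dirichlet characters for various $n$:
(1) $n = 9$: Notice that 2 is a generator of $(\mathbb{Z}/9\mathbb{Z})^\times$, so we map $(\mathbb{Z}/9\mathbb{Z})^\times \to
\mathbb{Z}/6\mathbb{Z}$ by $2^a \mapsto a$ (so that $1 \mapsto 6 = 0$). With $\chi_h(x) = e^{i\pi h b(x)/3}$ we get
$$\begin{array}{c|c|cccccc}
x & b(x) & \chi_0 & \chi_1 & \chi_2 & \chi_3 & \chi_4 & \chi_5 \\ \hline
1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 1 & 1 & e^{i\pi/3} & e^{2i\pi/3} & -1 & e^{4i\pi/3} & e^{5i\pi/3} \\
4 & 2 & 1 & e^{2i\pi/3} & e^{4i\pi/3} & 1 & e^{2i\pi/3} & e^{4i\pi/3} \\
8 & 3 & 1 & -1 & 1 & -1 & 1 & -1 \\
7 & 4 & 1 & e^{4i\pi/3} & e^{2i\pi/3} & 1 & e^{4i\pi/3} & e^{2i\pi/3} \\
5 & 5 & 1 & e^{5i\pi/3} & e^{4i\pi/3} & -1 & e^{2i\pi/3} & e^{i\pi/3}
\end{array}$$
(2) $n = 12$: $(\mathbb{Z}/12\mathbb{Z})^\times = \{1, 5, 7, 11\}$, but the squares of all these numbers are
congruent to 1 mod 12, so we map from $(\mathbb{Z}/12\mathbb{Z})^\times \to \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$ by
$5^a \cdot 7^b \mapsto (a, b)$. Writing $\chi_{u,v}(5^a \cdot 7^b) = u^a v^b$, we get
$$\begin{array}{c|c|cccc}
x & (a, b) & \chi_{1,1} & \chi_{1,-1} & \chi_{-1,1} & \chi_{-1,-1} \\ \hline
1 & (0, 0) & 1 & 1 & 1 & 1 \\
5 & (1, 0) & 1 & 1 & -1 & -1 \\
7 & (0, 1) & 1 & -1 & 1 & -1 \\
11 & (1, 1) & 1 & -1 & -1 & 1
\end{array}$$
(3) $n = 24$: The squares of all eight members of the group $(\mathbb{Z}/24\mathbb{Z})^\times$ are
congruent to 1 mod 24, so we map from $(\mathbb{Z}/24\mathbb{Z})^\times \to \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$
by $5^a \cdot 7^b \cdot 13^c \mapsto (a, b, c)$. Writing $\chi_{u,v,w}(5^a \cdot 7^b \cdot 13^c) = u^a v^b w^c$, we get
$$\begin{array}{c|c|cccccccc}
x & (a, b, c) & \chi_{1,1,1} & \chi_{1,1,-1} & \chi_{1,-1,1} & \chi_{-1,1,1} & \chi_{1,-1,-1} & \chi_{-1,1,-1} & \chi_{-1,-1,1} & \chi_{-1,-1,-1} \\ \hline
1 & (0, 0, 0) & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
5 & (1, 0, 0) & 1 & 1 & 1 & -1 & 1 & -1 & -1 & -1 \\
7 & (0, 1, 0) & 1 & 1 & -1 & 1 & -1 & 1 & -1 & -1 \\
11 & (1, 1, 0) & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
13 & (0, 0, 1) & 1 & -1 & 1 & 1 & -1 & -1 & 1 & -1 \\
17 & (1, 0, 1) & 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
19 & (0, 1, 1) & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
23 & (1, 1, 1) & 1 & -1 & -1 & -1 & 1 & 1 & 1 & -1
\end{array}$$
(4) $n = 15$: Four of the elements in the group (1, 4, 11 and 14) square to 1, while the
other four are of order four. We map from $(\mathbb{Z}/15\mathbb{Z})^\times \to \mathbb{Z}/4\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$ by
$2^a \cdot 11^b \mapsto (a, b)$. Writing $\chi_{u,v}(2^a \cdot 11^b) = u^a v^b$, we get
$$\begin{array}{c|c|cccccccc}
x & (a, b) & \chi_{1,1} & \chi_{1,-1} & \chi_{-1,1} & \chi_{-1,-1} & \chi_{i,1} & \chi_{i,-1} & \chi_{-i,1} & \chi_{-i,-1} \\ \hline
1 & (0, 0) & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & (1, 0) & 1 & 1 & -1 & -1 & i & i & -i & -i \\
4 & (2, 0) & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
7 & (1, 1) & 1 & -1 & -1 & 1 & i & -i & -i & i \\
8 & (3, 0) & 1 & 1 & -1 & -1 & -i & -i & i & i \\
11 & (0, 1) & 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
13 & (3, 1) & 1 & -1 & -1 & 1 & -i & i & i & -i \\
14 & (2, 1) & 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1
\end{array}$$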
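For a non-cyclic unit group the same idea works coordinate by coordinate. As a hedged illustration, the sketch below rebuilds the characters mod 15 from the coordinates $x = 2^a \cdot 11^b$ used in the example; the function name and the dictionary layout are choices made only for this sketch.

```python
from itertools import product

def characters_mod_15():
    """Characters of (Z/15Z)^x via the coordinates x = 2^a * 11^b mod 15."""
    coords = {pow(2, a, 15) * pow(11, b, 15) % 15: (a, b)
              for a in range(4) for b in range(2)}
    table = {}
    for u, v in product((1, -1, 1j, -1j), (1, -1)):
        # chi_{u,v}(2^a * 11^b) = u^a * v^b, with u a 4th root and v a 2nd root of unity.
        table[(u, v)] = {x: u ** a * v ** b for x, (a, b) in coords.items()}
    return coords, table

coords, table = characters_mod_15()
# Each row: x, its coordinates (a, b), and the values of the eight characters at x.
for x in sorted(coords):
    print(x, coords[x], [table[key][x] for key in table])
```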
Definition 3.9. Let $m$ be an integer greater than 1 and let $\chi$ be a Dirichlet
character of the multiplicative group mod $m$. The L-function is defined by
$$L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \quad \text{for } \operatorname{Re} s > 1.$$
Remark. We are primarily concerned with the behavior of L-functions as $s \to 1$.
Lemma 3.10. Suppose that $\sum_i U_i(s)$ exists $\forall s > 1$, where $U_i$ is continuous in
some neighborhood of 1 $\forall i$, and that the sum converges uniformly in $s$ in some
neighborhood $V$ of 1, i.e.,
$$\forall \varepsilon > 0,\ \exists N \text{ such that } \left|\sum_{i > N} U_i(s)\right| < \varepsilon, \text{ for } s \in V.$$
Now suppose that $\sum_i U_i(1)$ exists (though it may be infinite). Then,
$$\lim_{s \to 1,\ \operatorname{Re} s > 1} \sum_i U_i(s) = \sum_i U_i(1).$$
Remark. In our application, each $U_i$ is a grouping of successive terms of a given
sequence. This will become more clear in the following example.
Proof. Consider the expression $\left|\sum_{i=1}^{\infty} U_i(s) - \sum_{i=1}^{N} U_i(1)\right|$. By regrouping terms,
we get
$$\left|\sum_{i=1}^{N} (U_i(s) - U_i(1)) + \sum_{i=N+1}^{\infty} U_i(s)\right| \le \left|\sum_{i=1}^{N} (U_i(s) - U_i(1))\right| + \left|\sum_{i=N+1}^{\infty} U_i(s)\right|. \quad (*)$$
If we fix $N$ large enough, then
$$(*) < \left|\sum_{i=1}^{N} (U_i(s) - U_i(1))\right| + \varepsilon.$$
Further, since $U_i(s)$ is continuous in some neighborhood of 1 for all $i$, we can make
$s$ close enough to 1 such that
$$|U_i(s) - U_i(1)| < \frac{\varepsilon}{N},\ \forall\, 1 \le i \le N, \ \Rightarrow\ (*) < 2\varepsilon.$$
□
Remark. This result allows us to show that for non-trivial $\chi$, $\lim_{s \to 1,\ \operatorname{Re} s > 1} L(s, \chi) =$
“$L(1, \chi)$” $= \sum_{n \in \mathbb{N}} \frac{\chi(n)}{n}$. We use the notation “$L(1, \chi)$” here because we have not
introduced analytic continuation, which is a technique necessary to actually define $L(1, \chi)$ so that $\lim_{s \to 1,\ \operatorname{Re} s > 1} L(s, \chi) = L(1, \chi)$. For the purposes of this paper, though,
analytic continuation is not necessary.
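Example 3.11. We show that there are infinitely many primes of the form $p \equiv
1 \bmod 3$.
Just as in the proof of the infinitude of primes via the Riemann Zeta function,
we want to show that
$$\lim_{s \to 1,\ \operatorname{Re} s > 1} \sum_{p \text{ prime}} \frac{F(p)}{p^s} = \infty,$$
where $F(p) = 1$ if $p \equiv 1 \bmod 3$ and 0 otherwise. Notice that the sum
$$\sum_{p \text{ prime}} \frac{F(p)}{p^s} \quad (*)$$
converges absolutely for $\operatorname{Re} s > 1$. Now consider the following Dirichlet characters:
$$\chi_1(n) = \begin{cases} 1 & \text{if } n \equiv 1 \text{ or } 2 \bmod 3 \\ 0 & \text{else} \end{cases}$$
$$\chi_{-1}(n) = \begin{cases} 1 & \text{if } n \equiv 1 \bmod 3 \\ -1 & \text{if } n \equiv 2 \bmod 3 \\ 0 & \text{else} \end{cases}$$
Thus, $F(n) = \frac{\chi_1(n) + \chi_{-1}(n)}{2}$, so
$$(*) = \frac{1}{2}\left(\left(\sum_{p \text{ prime}} \frac{\chi_1(p)}{p^s}\right) + \left(\sum_{p \text{ prime}} \frac{\chi_{-1}(p)}{p^s}\right)\right),$$
since we are working within the realm of absolute convergence. In order to prove
there exist infinitely many primes congruent to 1 mod 3, it suffices to show that
the first function diverges while the second function is bounded as $s \to 1$.
$$L(s, \chi_{-1}) = \sum_n \frac{\chi_{-1}(n)}{n^s} = \prod_{p \text{ prime}} \left(1 + \frac{\chi_{-1}(p)}{p^s} + \frac{\chi_{-1}(p)^2}{p^{2s}} + \cdots\right) = \prod_{p \text{ prime}} \left(\frac{1}{1 - \frac{\chi_{-1}(p)}{p^s}}\right).$$
Taking the log of both sides, we get
$$\log(L(s, \chi_{-1})) = \sum_{p \text{ prime}} \left(\frac{\chi_{-1}(p)}{p^s} + \frac{\chi_{-1}(p)^2}{2p^{2s}} + \frac{\chi_{-1}(p)^3}{3p^{3s}} + \cdots\right)$$
$$= \sum_{p \text{ prime}} \frac{\chi_{-1}(p)}{p^s} + \sum_{p \text{ prime}} \left(\frac{\chi_{-1}(p)^2}{2p^{2s}} + \frac{\chi_{-1}(p)^3}{3p^{3s}} + \cdots\right).$$
Recall from the proof of Theorem 3.5 that the same sum with every numerator replaced by 1 is bounded for $\operatorname{Re} s > \frac{1}{2}$; since $|\chi_{-1}(p)| \le 1$, the rightmost sum is bounded as well. So now, if we want to show that $\sum_{p \text{ prime}} \frac{\chi_{-1}(p)}{p^s}$
is bounded as $s \to 1$, we must show that $L(s, \chi_{-1})$ converges positively at $s = 1$ and
is non-zero, for if $L(s, \chi_{-1})$ were to converge to zero for $s = 1$, then $\sum_{p \text{ prime}} \frac{\chi_{-1}(p)}{p^s}$
would diverge to negative infinity by properties of the logarithm. But first, we will
apply Lemma 3.10 in order to justify that $\lim_{s \to 1,\ \operatorname{Re} s > 1} L(s, \chi_{-1}) =$ “$L(1, \chi_{-1})$”.
We want to show that
$$\lim_{s \to 1,\ \operatorname{Re} s > 1} \sum_n \frac{\chi_{-1}(n)}{n^s} = \sum_n \frac{\chi_{-1}(n)}{n},$$
so let $U_n = \frac{\chi_{-1}(n)}{n^s} + \frac{\chi_{-1}(n+1)}{(n+1)^s}$, where $n \equiv 1 \bmod 3$. We must check that $\sum_{n \equiv 1 \bmod 3} U_n(s)$
converges uniformly. For some arbitrarily large $N$,
$$\left|\sum_{n \equiv 1 \bmod 3,\ n > N} U_n(s)\right| = \left|\sum_{n \equiv 1 \bmod 3,\ n > N} \left(\frac{1}{n^s} - \frac{1}{(n + 1)^s}\right)\right| \le \sum_{n > N} \left|\frac{(n + 1)^s - n^s}{(n(n + 1))^s}\right| \le \sum_{n > N} \left|\frac{C \cdot n^{s-1}}{n^{2s}}\right| = C\sum_{n > N} \frac{1}{|n^{s+1}|} \le C\sum_{n > N} \frac{1}{n^2}.$$
Thus,
$$\lim_{s \to 1,\ \operatorname{Re} s > 1} L(s, \chi_{-1}) = \text{“}L(1, \chi_{-1})\text{”} = 1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{5} + \frac{1}{7} - \frac{1}{8} + \cdots,$$
which is an alternating series whose terms decrease in absolute value and which starts with a positive term, and therefore converges to a positive real
number.
Now, let’s show that $\sum_{p \text{ prime}} \frac{\chi_1(p)}{p^s}$ diverges as $s \to 1$, so we will consider the
expression
$$\sum_n \frac{\chi_1(n)}{n^s} = \prod_{p \text{ prime}} \left(1 + \frac{\chi_1(p)}{p^s} + \frac{\chi_1(p)}{p^{2s}} + \cdots\right) = \prod_{p \text{ prime}} \left(\frac{1}{1 - \frac{\chi_1(p)}{p^s}}\right),$$
and taking the log of both sides, we get
$$\log(L(s, \chi_1)) = \sum_{p \text{ prime}} \left(\frac{\chi_1(p)}{p^s} + \frac{\chi_1(p)}{2p^{2s}} + \frac{\chi_1(p)}{3p^{3s}} + \cdots\right)$$
$$= \sum_{p \text{ prime}} \frac{\chi_1(p)}{p^s} + \sum_{p \text{ prime}} \left(\frac{\chi_1(p)}{2p^{2s}} + \frac{\chi_1(p)}{3p^{3s}} + \cdots\right).$$
Notice that $\sum_{p \text{ prime}} \left(\frac{\chi_1(p)}{2p^{2s}} + \frac{\chi_1(p)}{3p^{3s}} + \cdots\right) = \sum_{p \text{ prime},\ p \ne 3} \left(\frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots\right)$, which
is equal to $\sum_{p \text{ prime}} \left(\frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots\right) - \left(\frac{1}{2 \cdot 3^{2s}} + \frac{1}{3 \cdot 3^{3s}} + \cdots\right)$. Notice that this expression
is less than $\sum_{p \text{ prime}} \left(\frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots\right)$, which we proved converges in the
section on the Riemann Zeta function for $\operatorname{Re} s > \frac{1}{2}$. So, if we can show $\sum_n \frac{\chi_1(n)}{n^s}$
diverges as $s \to 1$, that implies that $\sum_{p \text{ prime}} \frac{\chi_1(p)}{p^s}$ must diverge as well. Notice
that
$$\sum_n \frac{\chi_1(n)}{n^s} = \sum_n \frac{1}{n^s} - \sum_{n \equiv 0 \bmod 3} \frac{1}{n^s},$$
which is essentially the Riemann Zeta function, excluding every $\frac{1}{n^s}$ where $3 \mid n$.
Now, recall that the Riemann Zeta function can be expressed using Euler products
as follows:
$$\zeta(s) = \prod_{p \text{ prime}} \left(\frac{1}{1 - \frac{1}{p^s}}\right).$$
Therefore, we can observe that
$$\sum_n \frac{\chi_1(n)}{n^s} = \frac{\zeta(s)}{\left(\frac{1}{1 - \frac{1}{3^s}}\right)} = \zeta(s) \cdot \left(1 - \frac{1}{3^s}\right) = \prod_{p \text{ prime},\ p \ne 3} \left(\frac{1}{1 - \frac{1}{p^s}}\right).$$
As $s \to 1$, we can see that $1 - \frac{1}{3^s}$ is finite, non-zero, and consequently the above
product still diverges. Therefore, we know that the sum $\sum_{p \text{ prime}} \frac{\chi_1(p)}{p^s}$ diverges, and
we conclude that there are infinitely many primes that are congruent to 1 mod 3.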
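The two ingredients of this example, the identity $F = \frac{\chi_1 + \chi_{-1}}{2}$ and the convergence of “$L(1, \chi_{-1})$”, are easy to check numerically. The sketch below is illustrative only; the function names are assumptions, and the comparison value $\frac{\pi}{3\sqrt{3}}$ is the classical closed form of the series $1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{5} + \cdots$, which the text does not use.

```python
import math

def chi_trivial(n):
    """chi_1 mod 3: 1 when 3 does not divide n, else 0."""
    return 0 if n % 3 == 0 else 1

def chi_nontrivial(n):
    """chi_{-1} mod 3: values 1, -1, 0 for n = 1, 2, 0 mod 3."""
    return (0, 1, -1)[n % 3]

def indicator_1_mod_3(n):
    """F(n) recovered as the average of the two characters."""
    return (chi_trivial(n) + chi_nontrivial(n)) // 2

def L_partial(chi, s, terms):
    """Partial sum of L(s, chi) = sum chi(n) / n^s."""
    return sum(chi(n) / n ** s for n in range(1, terms + 1))

print([indicator_1_mod_3(n) for n in range(1, 13)])
print(L_partial(chi_nontrivial, 1.0, 300_000))  # partial sum of "L(1, chi_{-1})"
print(math.pi / (3 * math.sqrt(3)))             # classical closed form, for comparison
print(L_partial(chi_trivial, 1.0, 300_000))     # grows without bound as terms increase
```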
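Example 3.12. We show that there are infinitely many primes of the form $p \equiv
1 \bmod 12$.
Just as in the last example, we want to show that
$$\lim_{s \to 1,\ \operatorname{Re} s > 1} \sum_{p \text{ prime}} \frac{F(p)}{p^s} = \infty,$$
where $F(p) = 1$ if $p \equiv 1 \bmod 12$ and equal to 0 otherwise. Notice that the sum
$$\sum_{p \text{ prime}} \frac{F(p)}{p^s} \quad (*)$$
converges absolutely for $\operatorname{Re} s > 1$. Now consider the following characters:
$$\chi_{1,1}(n) = \begin{cases} 1 & \text{if } n \equiv 1, 5, 7, \text{ or } 11 \bmod 12 \\ 0 & \text{else} \end{cases}$$
$$\chi_{1,-1}(n) = \begin{cases} 1 & \text{if } n \equiv 1 \text{ or } 5 \bmod 12 \\ -1 & \text{if } n \equiv 7 \text{ or } 11 \bmod 12 \\ 0 & \text{else} \end{cases}$$
$$\chi_{-1,1}(n) = \begin{cases} 1 & \text{if } n \equiv 1 \text{ or } 7 \bmod 12 \\ -1 & \text{if } n \equiv 5 \text{ or } 11 \bmod 12 \\ 0 & \text{else} \end{cases}$$
$$\chi_{-1,-1}(n) = \begin{cases} 1 & \text{if } n \equiv 1 \text{ or } 11 \bmod 12 \\ -1 & \text{if } n \equiv 5 \text{ or } 7 \bmod 12 \\ 0 & \text{else} \end{cases}$$
Thus,
$$F(n) = \frac{\chi_{1,1}(n) + \chi_{1,-1}(n) + \chi_{-1,1}(n) + \chi_{-1,-1}(n)}{4}, \text{ so}$$
$$(*) = \frac{1}{4}\left(\sum_{p \text{ prime}} \frac{\chi_{1,1}(p)}{p^s} + \sum_{p \text{ prime}} \frac{\chi_{1,-1}(p)}{p^s} + \sum_{p \text{ prime}} \frac{\chi_{-1,1}(p)}{p^s} + \sum_{p \text{ prime}} \frac{\chi_{-1,-1}(p)}{p^s}\right),$$
since we are working under conditions of absolute convergence. In order to show
that there are infinitely many primes congruent to 1 mod 12, we want to show that
the first sum diverges while the other three converge as $s \to 1$, as in the above
example. To show convergence of the last three terms, note that for any character
$\chi$,
$$\sum_n \frac{\chi(n)}{n^s} = \prod_{p \text{ prime}} \left(\frac{1}{1 - \frac{\chi(p)}{p^s}}\right).$$
Taking the log of both sides, the above equation simplifies to
$$\log(L(s, \chi)) = \sum_{p \text{ prime}} \left(\frac{\chi(p)}{p^s} + \frac{\chi(p)^2}{2p^{2s}} + \frac{\chi(p)^3}{3p^{3s}} + \cdots\right) = \sum_{p \text{ prime}} \frac{\chi(p)}{p^s} + \sum_{p \text{ prime}} \left(\frac{\chi(p)^2}{2p^{2s}} + \frac{\chi(p)^3}{3p^{3s}} + \cdots\right),$$
so we have
$$\text{(i)}\quad \log(L(s, \chi_{1,-1})) = \sum_{p \text{ prime}} \frac{\chi_{1,-1}(p)}{p^s} + \sum_{p \text{ prime}} \left(\frac{\chi_{1,-1}(p)^2}{2p^{2s}} + \frac{\chi_{1,-1}(p)^3}{3p^{3s}} + \cdots\right),$$
$$\text{(ii)}\quad \log(L(s, \chi_{-1,1})) = \sum_{p \text{ prime}} \frac{\chi_{-1,1}(p)}{p^s} + \sum_{p \text{ prime}} \left(\frac{\chi_{-1,1}(p)^2}{2p^{2s}} + \frac{\chi_{-1,1}(p)^3}{3p^{3s}} + \cdots\right),$$
$$\text{(iii)}\quad \log(L(s, \chi_{-1,-1})) = \sum_{p \text{ prime}} \frac{\chi_{-1,-1}(p)}{p^s} + \sum_{p \text{ prime}} \left(\frac{\chi_{-1,-1}(p)^2}{2p^{2s}} + \frac{\chi_{-1,-1}(p)^3}{3p^{3s}} + \cdots\right).$$
In the above equations, the second sums are all bounded for $\operatorname{Re} s > \frac{1}{2}$. To prove
the convergence of the other terms, we check that the L-functions of each character
are positively convergent and non-zero. As in the last example, $\lim_{s \to 1} L(s, \chi) =$
“$L(1, \chi)$”, thus
$$L(1, \chi_{1,-1}) = 1 + \frac{1}{5} - \frac{1}{7} - \frac{1}{11} + \frac{1}{13} + \frac{1}{17} - \frac{1}{19} - \frac{1}{23} + \cdots,$$
which is convergent and positive, as it alternates in pairs.
$$L(1, \chi_{-1,1}) = 1 - \frac{1}{5} + \frac{1}{7} - \frac{1}{11} + \frac{1}{13} - \frac{1}{17} + \frac{1}{19} - \frac{1}{23} + \cdots,$$
which is convergent and positive because it is an alternating series. Finally,
$$L(1, \chi_{-1,-1}) = 1 - \frac{1}{5} - \frac{1}{7} + \frac{1}{11} + \frac{1}{13} - \frac{1}{17} - \frac{1}{19} + \frac{1}{23} + \cdots.$$
This series is also alternating and convergent, but it is not immediately clear that it
is positive due to the nature of its alternations. So, take some $n \in \mathbb{N}$ and examine
the expression $\frac{1}{12n+1} - \frac{1}{12n+5} - \frac{1}{12n+7} + \frac{1}{12n+11}$. If this expression is positive, then
we know that the series converges to a positive number. This expression simplifies
to
$$\frac{12n \cdot 48 + 288}{(12n + 1)(12n + 5)(12n + 7)(12n + 11)},$$
which is positive for all $n \ge 0$, and the sum over $n$ converges by comparison (the above
expression is roughly less than $\frac{1}{n^3}$), thus $L(1, \chi_{-1,-1})$ is convergent and positive.
Now, let’s show that $\sum_{p \text{ prime}} \frac{\chi_{1,1}(p)}{p^s}$ diverges as $s \to 1$. Once again, note that
$$\sum_n \frac{\chi_{1,1}(n)}{n^s} = \prod_{p \text{ prime}} \left(\frac{1}{1 - \frac{\chi_{1,1}(p)}{p^s}}\right),$$
and taking the log of both sides, we get
$$\log(L(s, \chi_{1,1})) = \sum_{p \text{ prime}} \frac{\chi_{1,1}(p)}{p^s} + \sum_{p \text{ prime}} \left(\frac{\chi_{1,1}(p)}{2p^{2s}} + \frac{\chi_{1,1}(p)}{3p^{3s}} + \cdots\right).$$
The second sum is less than $\sum_{p \text{ prime}} \left(\frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots\right)$, which we proved converges
in the section on the Riemann Zeta function for $\operatorname{Re} s > \frac{1}{2}$. Thus, showing
that $\sum_n \frac{\chi_{1,1}(n)}{n^s}$ diverges as $s \to 1$ implies that $\sum_{p \text{ prime}} \frac{\chi_{1,1}(p)}{p^s}$ diverges as well.
$$\sum_n \frac{\chi_{1,1}(n)}{n^s} = \sum_n \frac{1}{n^s} - \sum_{n \equiv 0 \bmod 2 \text{ or } 3} \frac{1}{n^s},$$
which is a modification of the Riemann Zeta function, where we exclude every
$\frac{1}{n^s}$ where $n$ is divisible by 2 or 3. Therefore,
$$\sum_n \frac{\chi_{1,1}(n)}{n^s} = \frac{\zeta(s)}{\left(\frac{1}{1 - \frac{1}{2^s}}\right)\left(\frac{1}{1 - \frac{1}{3^s}}\right)} = \zeta(s) \cdot \left(1 - \frac{1}{2^s}\right)\left(1 - \frac{1}{3^s}\right) = \prod_{p \text{ prime},\ p \ne 2, 3} \left(\frac{1}{1 - \frac{1}{p^s}}\right).$$
As $s \to 1$, $\left(1 - \frac{1}{2^s}\right)$ and $\left(1 - \frac{1}{3^s}\right)$ are finite and non-zero, and the above product is still infinite.
Thus, $\sum_{p \text{ prime}} \frac{\chi_{1,1}(p)}{p^s}$ diverges, and we conclude that there are infinitely many
primes that are congruent to 1 mod 12.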
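Two computations in this example can be verified exactly: that $F$ is the average of the four characters mod 12, and that each block of four consecutive terms of $L(1, \chi_{-1,-1})$ equals the stated closed form and is positive. The sketch below uses exact fractions; the dictionary `CHARACTERS_MOD_12` and the function names are illustrative.

```python
from fractions import Fraction

# Values on the classes 1, 5, 7, 11 mod 12; every other class maps to 0.
CHARACTERS_MOD_12 = {
    (1, 1):   {1: 1, 5: 1,  7: 1,  11: 1},
    (1, -1):  {1: 1, 5: 1,  7: -1, 11: -1},
    (-1, 1):  {1: 1, 5: -1, 7: 1,  11: -1},
    (-1, -1): {1: 1, 5: -1, 7: -1, 11: 1},
}

def chi(key, n):
    """Value of the character chi_key at n."""
    return CHARACTERS_MOD_12[key].get(n % 12, 0)

def indicator_1_mod_12(n):
    """F(n) as the average of the four characters."""
    return Fraction(sum(chi(key, n) for key in CHARACTERS_MOD_12), 4)

def block(n):
    """1/(12n+1) - 1/(12n+5) - 1/(12n+7) + 1/(12n+11) as an exact fraction."""
    m = 12 * n
    return Fraction(1, m + 1) - Fraction(1, m + 5) - Fraction(1, m + 7) + Fraction(1, m + 11)

def closed_form(n):
    """(12n * 48 + 288) / ((12n+1)(12n+5)(12n+7)(12n+11))."""
    m = 12 * n
    return Fraction(48 * m + 288, (m + 1) * (m + 5) * (m + 7) * (m + 11))

print([int(indicator_1_mod_12(n)) for n in range(1, 26)])
print(block(0), closed_form(0))  # both 288/385
```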
Remark. The above examples hint at this result, but explicitly, $L(s, \chi)$ diverges as
$s \to 1$ for $\chi$ trivial and converges positively for $\chi$ non-trivial.
Remark. Although we have seen that there are infinitely many primes in a given
congruence class, we have not shown anything about the distribution of primes
across the $\varphi(b)$ classes of $(\mathbb{Z}/b\mathbb{Z})^\times$ for any given $b$. We will not prove this, but it
is true that as we examine primes out to infinity, the density of primes congruent
to $a \bmod b$ is $\frac{1}{\varphi(b)}$, for all $a$ in $(\mathbb{Z}/b\mathbb{Z})^\times$. We will give a probabilistic interpretation
of this result in the next section.
4. The Strong Law of Large Numbers and ‘Randomness’ of Primes
In this section, we will use the strong law of large numbers to provide a probabilistic intuition that, as we examine primes out to infinity, the density of primes
$p \equiv a \bmod b$, for $(a, b) = 1$, should be equal to $\frac{1}{\varphi(b)}$, which is essentially the strong
statement of Dirichlet’s Theorem.
Fix some $b \in \mathbb{N}$ and denote $S = (\mathbb{Z}/b\mathbb{Z})^\times$. Let $x_k$ be a variable with uniform
random distribution in $S$, and consider the process
$$Y_k = \begin{cases} 1 & \text{if } x_k = a \\ 0 & \text{otherwise.} \end{cases}$$
According to the strong law of large numbers, as $n \to \infty$, the expression $\frac{X_n}{n}$, where
$X_n = \sum_{k=1}^{n} Y_k$, should converge to the expected value of the process $Y_k$, which is
$\frac{1}{\varphi(b)}$.
We will assume the following heuristics:
Heuristic 1 : Primes are distributed among congruence classes as if they are uniform, random variables, i.e., the congruence class of the k th prime is given by the
random variable xk as above.
Heuristic 2 : Only almost-sure events (P (E) = 1) can be reasonably expected to
hold on a single trial of a stochastic process.
Theorem 4.1. [Strong Dirichlet’s Theorem] Fix $a, b \in \mathbb{N}$ such that $(a, b) = 1$. Let $P_a$ be the set of prime numbers such that $p \equiv a \bmod b$. The set $P_a$ has density $\frac{1}{\varphi(b)}$, i.e.,
$$\lim_{N \to \infty} \frac{\#\{\text{primes } p < N,\ p \equiv a \bmod b\}}{\#\{\text{primes } p < N\}} = \frac{1}{\varphi(b)}.$$
(See Chapter VI of Serre [3] for a proof of this theorem.)
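The ratio in the theorem can be evaluated for a finite cutoff $N$. The sketch below is an illustration only, assuming $b \ge 2$ and $N \ge 3$; the helper names `primes_below` and `class_ratios` are hypothetical, and a finite computation of course proves nothing about the limit.

```python
from math import gcd, isqrt


def primes_below(N):
    """Sieve of Eratosthenes: list of all primes p < N."""
    if N < 3:
        return []
    is_prime = [True] * N
    is_prime[0] = is_prime[1] = False
    for i in range(2, isqrt(N) + 1):
        if is_prime[i]:
            for j in range(i * i, N, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def class_ratios(b, N):
    """Map each a coprime to b to #{p < N, p = a mod b} / #{p < N}.

    Assumes b >= 2 and N >= 3, so that at least one prime is counted.
    """
    primes = primes_below(N)
    counts = {a: 0 for a in range(1, b) if gcd(a, b) == 1}
    for p in primes:
        if p % b in counts:  # primes dividing b fall in no unit class
            counts[p % b] += 1
    return {a: c / len(primes) for a, c in counts.items()}


print(class_ratios(4, 100))  # {1: 0.44, 3: 0.52}
```

The two ratios for $b = 4$, $N = 100$ do not sum to 1 because the prime 2 is counted in the denominator but lies in no class coprime to 4; this effect vanishes in the limit.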
Remark. Before we begin giving a “proof” of this theorem using the strong law of
large numbers, we must first address some key issues and assumptions. In particular, the strong law of large numbers deals with repeated stochastic processes;
primes are not random in that sense. Thus, what we will attempt to see is that
although primes are not random, they appear to behave as though they are randomly distributed, according to our heuristics. Essentially, the strong law of large
numbers cannot directly prove the strong Dirichlet’s theorem, but it provides us
intuition as to why it is reasonable.
“Proof:” Let us first try to model the drawing of primes as a “random” process. From Heuristic 1, we suppose that all $a \in S = (\mathbb{Z}/b\mathbb{Z})^\times$ have uniform probability $\frac{1}{\varphi(b)}$ of being drawn. Choose some $a \in S$, and as we repeat this process, let $X_n$ equal the number of times that $a$ has been drawn after $n$ iterations. Then,
$$X_n = \sum_{i=1}^{n} Y_i, \quad \text{where } Y_i = \begin{cases} 1 & \text{if } a \text{ was drawn on the } i\text{-th draw,} \\ 0 & \text{else.} \end{cases}$$
So, now we can look at the mean of the process above:
$$\lim_{n \to \infty} \frac{\sum_{i=1}^{n} Y_i}{n}.$$
We can think about this expression in the context of the strong law of large numbers. By the strong law of large numbers, this expression would almost surely converge to the expected value $\mu$ of $\frac{X_n}{n}$, which is $\frac{1}{\varphi(b)}$, i.e.,
$$P\left( \lim_{n \to \infty} \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{1}{\varphi(b)} \right) = 1. \qquad (\ast)$$
Notice that, under Heuristic 1, the limit above is an alternate representation of the limit in Theorem 4.1, i.e.,
$$(\ast) \iff P\left( \lim_{N \to \infty} \frac{\#\{\text{primes } p < N,\ p \equiv a \bmod b\}}{\#\{\text{primes } p < N\}} = \frac{1}{\varphi(b)} \right) = 1.$$
Although primes are only drawn once, Heuristic 2 suggests that the Strong Dirichlet Theorem should hold.
Remark. The sample space for a single drawing is $\{a_1, \dots, a_{\varphi(b)}\} = \Omega_i$. For the infinite sequence, the sample space is $\prod_{i=1}^{\infty} \Omega_i = \Omega_1 \times \Omega_2 \times \Omega_3 \times \cdots$.
Example 4.2. Let’s choose $b = 3$, and let $X_n$ be equal to the number of primes $p \equiv 1 \bmod 3$ among the first $n$ primes. In this example, we will examine, for relatively small $n$, the actual value of $\frac{X_n}{n}$. (Values computed with a computer program.)
n           X_n        X_n / n
10          4          0.4
20          9          0.45
50          24         0.48
100         48         0.48
1000        491        0.491
10000       4989       0.4989
100000      49962      ≈ 0.4996
1000000     499829     ≈ 0.4998
Even for these relatively small values of $n$, we can see that $\frac{X_n}{n}$ seems to be converging towards the expected value of 0.5. Granted, with such a sample size, one may still not be completely convinced of the result, but according to our heuristics the strong law of large numbers gives us reason to believe that as $n \to \infty$, $\frac{X_n}{n}$ converges to one-half.
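A computation of this kind might look like the following sketch. It is not the program used for the example. It assumes that primes dividing $b$ (here only the prime 3) are skipped when counting "the first $n$ primes"; checking by hand, this convention gives the values listed for $n = 10$, $20$ and $50$. The names `is_prime` and `count_in_class` are illustrative.

```python
from itertools import count


def is_prime(m):
    """Trial division; adequate for the small ranges used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def count_in_class(a, b, n):
    """X_n: how many of the first n primes not dividing b are congruent to a mod b."""
    if n <= 0:
        return 0
    seen = hits = 0
    for m in count(2):
        # Assumption: primes dividing b lie in no unit class and are skipped.
        if is_prime(m) and b % m != 0:
            seen += 1
            hits += (m % b == a % b)
            if seen == n:
                return hits


for n in (10, 20, 50):
    x = count_in_class(1, 3, n)
    print(n, x, x / n)
# prints:
# 10 4 0.4
# 20 9 0.45
# 50 24 0.48
```

Trial division is slow for the larger values of $n$ in the example; a sieve would be the natural replacement there.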
Remark. We have seen how the strong law of large numbers relates to primes and
congruence classes. We may also ask, would the weak law of large numbers have
been sufficient to suggest something useful about primes? Using the same variables
from the above proof and for some $\varepsilon > 0$, the weak law of large numbers tells us that
$$\lim_{n \to \infty} P\left( \left| \frac{\sum_{i=1}^{n} Y_i}{n} - \mu \right| > \varepsilon \right) = 0.$$
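However, if we take any $b > 2$, then $\mu = \frac{1}{\varphi(b)} \le \frac{1}{2}$, so let us fix $\varepsilon$ to be $\frac{1}{3}$ and consider the probability
$$(\ast) \qquad P\left( \left| \frac{\sum_{i=1}^{n} Y_i}{n} - \mu \right| > \frac{1}{3} \right).$$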
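We continue under our heuristic assumptions. Consider the particular case where $Y_1 = \cdots = Y_n = 1$, meaning the first $n$ primes all lie in the congruence class of $a$ mod $b$. In this case the sample mean equals 1, and $|1 - \mu| \ge \frac{1}{2} > \frac{1}{3}$, so this outcome belongs to the event in $(\ast)$. Hence we can observe that
$$1 \ge (\ast) \ge P(Y_1 = \cdots = Y_n = 1) = \left( \frac{1}{\varphi(b)} \right)^n > 0.$$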
The above expression shows us that for any $n$, the event in $(\ast)$ has positive probability. Thus, by Heuristic 2, although the strong law of large numbers provides intuition as to why Dirichlet’s theorem holds, the weak law of large numbers is not enough to suggest anything significant about the distribution of primes in congruence classes.
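To get a feel for the size of this lower bound, one can evaluate $\left(\frac{1}{\varphi(b)}\right)^n$ exactly. The sketch below assumes the uniform model of Heuristic 1; the function names `phi` and `all_ones_probability` are illustrative.

```python
from fractions import Fraction
from math import gcd


def phi(b):
    """Euler's totient: the number of 1 <= r <= b with gcd(r, b) = 1."""
    return sum(1 for r in range(1, b + 1) if gcd(r, b) == 1)


def all_ones_probability(b, n):
    """P(Y_1 = ... = Y_n = 1) = (1 / phi(b))^n under the uniform model."""
    return Fraction(1, phi(b)) ** n


for n in (1, 10, 50):
    print(n, all_ones_probability(3, n))
```

The value shrinks geometrically in $n$ but is never zero, which is exactly why the event cannot be excluded by Heuristic 2 for any fixed $n$.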
Acknowledgements. I would like to thank my mentor Sean Howe for his tremendous assistance and guidance throughout the duration of the program, and I would
also like to thank Peter May for organizing the REU program and allowing me to
participate in it.
References
[1] Patrick Billingsley. Probability and Measure, 3rd Edition. 1995.
[2] Elias M. Stein and Rami Shakarchi. Complex Analysis. 2003.
[3] J.-P. Serre. A Course in Arithmetic. Graduate Texts in Mathematics. Springer, 1973.
[4] Liang Xiao. Number Theory Lecture Notes. University of Chicago REU 2012. http://math.uchicago.edu/~may/REU2012/.