PROBABILITY AND DIRICHLET'S THEOREM

LUKE MULTANEN

Abstract. In this expository paper, we examine some fundamental theorems in probability theory, after which we transition to Dirichlet's theorem on the existence of infinitely many primes $p \equiv a \pmod{b}$, where $a$ and $b$ are coprime. We discuss aspects of the proof and apply the strong law of large numbers from probability theory to obtain a heuristic understanding of the density version of Dirichlet's theorem.

Contents
1. Introduction
2. Probability
3. Dirichlet's Theorem
3.1. The Riemann Zeta Function
3.2. Dirichlet Characters
4. The Strong Law of Large Numbers and 'Randomness' of Primes
Acknowledgements
References

1. Introduction

The primary purpose of this paper is to understand Dirichlet's theorem.

Theorem 1.1 (Dirichlet's Theorem). Fix $a, b \in \mathbb{N}$ such that $(a,b) = 1$. There are infinitely many primes of the form $p = a + bn$, where $n \in \mathbb{N}$.

Theorem 1.2 (Strong Dirichlet's Theorem). Fix $a, b \in \mathbb{N}$ such that $(a,b) = 1$. Let $P_a$ be the set of prime numbers $p$ such that $p \equiv a \bmod b$. The set $P_a$ has density $\frac{1}{\varphi(b)}$, i.e.,
\[ \lim_{N \to \infty} \frac{\#\{\text{primes } p < N,\ p \equiv a \bmod b\}}{\#\{\text{primes } p < N\}} = \frac{1}{\varphi(b)}. \]

We will not fully prove these theorems. Instead, we prove the main components and necessary mechanics of the weak version of the theorem before applying the strong law of large numbers from probability theory in order to develop a heuristic understanding of why the strong version of Dirichlet's theorem holds.

Section 2 provides the probabilistic background needed to accomplish our goal. This section introduces the fundamental definitions and properties required to derive the weak and strong laws of large numbers, then provides a couple of concrete examples to illustrate the differences between the two.
Section 3 deals with the bulk of the instruments needed to prove Dirichlet's theorem, namely the Riemann zeta function, L-functions, and Dirichlet characters. This section uses the Riemann zeta function to prove that there exist infinitely many primes, then transitions into a discussion of Dirichlet characters before illustrating how they can be used in conjunction with L-functions to prove the existence of infinitely many primes in any given congruence class.

Section 4 combines the work from Sections 2 and 3 to provide a probabilistic heuristic for the strong Dirichlet's theorem. This section, operating under a heuristic assumption, motivates the strong version of Dirichlet's theorem by considering it as an instance of a random process and applying the strong law of large numbers.

2. Probability

(Note: This section follows Sections 1.1 and 1.2 of Billingsley [1].)

Definition 2.1. Let $\Omega$ be a set, and let $2^\Omega$ be the power set of $\Omega$, i.e., the set containing all subsets of $\Omega$. A subset $\mathcal{F} \subseteq 2^\Omega$ is called a $\sigma$-algebra on $\Omega$ if it satisfies the following properties:
(1) $\Omega \in \mathcal{F}$,
(2) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$,
(3) if $\{A_i\} \subseteq \mathcal{F}$ is a countable collection of sets, then $\bigcup_i A_i \in \mathcal{F}$.

Example 2.2. Let $\Omega = \{1, 2, 3\}$. Then $\mathcal{F} = 2^\Omega$ is the $\sigma$-algebra
\[ \mathcal{F} = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}. \]

Definition 2.3. A sample space is a set $\Omega$ that contains all the possible outcomes of an experiment.

Definition 2.4. A function $P : \mathcal{F} \to [0, 1]$ is called a probability measure on a $\sigma$-algebra $\mathcal{F}$ if the following conditions are satisfied:
(1) $0 \le P(A) \le 1$ for all $A \in \mathcal{F}$,
(2) $P(\emptyset) = 0$ and $P(\Omega) = 1$,
(3) for any sequence $A_1, A_2, \ldots$ of disjoint sets in $\mathcal{F}$, $P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)$.

Remark. If $\mathcal{F}$ were not a $\sigma$-algebra, then we would need to assume that $\bigcup_{i=1}^\infty A_i \in \mathcal{F}$ for the third property to make sense.

Definition 2.5. A probability space is denoted $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a sample space, $\mathcal{F} \subseteq 2^\Omega$ is a $\sigma$-algebra, and $P$ is a probability measure on $\mathcal{F}$ with $P(\Omega) = 1$. The elements of $\mathcal{F}$ are often referred to as events, so for any $A \in \mathcal{F}$, $P(A)$ is the probability that we end up in event $A$.

Example 2.6. Suppose we have an experiment where we flip a coin twice. The sample space is $\Omega = \{(H,H), (H,T), (T,H), (T,T)\}$; let the $\sigma$-algebra be $\mathcal{F} = 2^\Omega$. Then, for any $A \subseteq \Omega$, $P(A) = \#A \cdot \frac14$. For instance, if $A$ is the event that the first coin flipped is heads, i.e., $A = \{(H,H), (H,T)\}$, then $P(A) = 2 \cdot \frac14 = \frac12$.

Definition 2.7. A discrete probability space is a probability space $(\Omega, \mathcal{F}, P)$ where $\Omega$ is countable, $\mathcal{F} = 2^\Omega$, and the probability measure $P$ is of the form
\[ P(A) = \sum_{\omega \in A} p(\omega), \quad \forall\, A \subseteq \Omega, \]
where $p : \Omega \to \mathbb{R}^+$ satisfies $\sum_{\omega \in \Omega} p(\omega) = 1$.

Example 2.8. Let $\Omega = \mathbb{Z}$, and suppose that $p(0) = \frac12$ and $p(1) = \frac12$. Then, for any $A \subseteq \Omega$,
\[ P(A) = \begin{cases} 1 & \text{if } 0, 1 \in A, \\ \frac12 & \text{if exactly one of } 0, 1 \text{ is in } A, \\ 0 & \text{else.} \end{cases} \]

Definition 2.9. A continuous probability space is a probability space $(\Omega, \mathcal{F}, P)$ where $\Omega \subseteq \mathbb{R}$ is non-discrete, $\mathcal{F}$ is composed of Lebesgue measurable sets, and the probability measure $P$ is of the form
\[ P(A) = \int_A p, \]
where $p : \Omega \to \mathbb{R}^+$ is continuous and $\int_\Omega p = 1$.

Remark. The Lebesgue measure of a set $A$ is denoted by $\lambda(A)$. Any closed interval $[a,b] \subseteq \mathbb{R}$ has measure $\lambda([a,b]) = b - a$. The open interval $(a,b)$ has the same measure because single points have measure $0$. For a rigorous development of Lebesgue measure, see Section 2 of Billingsley [1].

Definition 2.10. A random variable is a function $X : \Omega \to \mathbb{R}$ such that for all $t \in \mathbb{R}$, $X^{-1}((-\infty, t)) \in \mathcal{F}$ (i.e., $P(X < t)$ has some measurable value).

Remark. In classical probability theory, there are two kinds of random variables, discrete and continuous, corresponding to the aforementioned discrete and continuous probability spaces. The remaining definitions and proofs will use discrete random variables.
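The discrete setup above is easy to experiment with. The following Python sketch (our own illustration, not part of the text) encodes Example 2.6 as a discrete probability space and computes an event's probability by summing the mass function, as in Definition 2.7:

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin flips, as in Example 2.6.
omega = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

# Mass function p(w) = 1/4 for every outcome; Definition 2.7 then gives
# P(A) as the sum of p(w) over the outcomes w in A.
p = {w: Fraction(1, 4) for w in omega}

def prob(event):
    """P(A) = sum of p(w) for w in A."""
    return sum(p[w] for w in event)

# A = "the first coin flipped is heads" = {(H,H), (H,T)}.
A = [w for w in omega if w[0] == "H"]
print(prob(A))  # 1/2
```

Using exact rationals keeps $P(A) = \#A \cdot \frac14$ visibly exact rather than a floating-point approximation.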
Example 2.11. The value of a fair die may be thought of as a random variable. For a single roll, the sample space is $\Omega = \{1,2,3,4,5,6\}$; set $\mathcal{F} = 2^\Omega$ and $p(1) = p(2) = \cdots = p(6) = \frac16$. Then $X$, the value of the die, is a random variable that can take on any value $x \in \{1,2,3,4,5,6\}$.

Definition 2.12. The expected value is the weighted average of all possible values of a random variable. Suppose a random variable $X$ can take on values $x_1, \ldots, x_n$ with probabilities $p_1, \ldots, p_n$, respectively. Then
\[ E[X] = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n. \]

Example 2.13. We can calculate the expected value of the die from Example 2.11:
\[ E[X] = 1 \cdot \tfrac16 + 2 \cdot \tfrac16 + 3 \cdot \tfrac16 + 4 \cdot \tfrac16 + 5 \cdot \tfrac16 + 6 \cdot \tfrac16 = 3.5. \]

Definition 2.14. The variance of a random variable $X$ is defined as the expected value of the squared deviation of $X$ from the mean $E[X] = \mu$, i.e.,
\[ \mathrm{Var}(X) = E[(X - \mu)^2]. \]

Remark. The variance is often denoted by $\sigma^2$.

Example 2.15. We can calculate the variance of the die from Example 2.11:
\[ \mathrm{Var}(X) = E[(X - 3.5)^2] = \sum_{i=1}^{6} (i - 3.5)^2 \cdot \tfrac16 = \tfrac16 \sum_{i=1}^{6} (i - 3.5)^2 \]
\[ = \tfrac16\left((-2.5)^2 + (-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2 + (2.5)^2\right) = \tfrac16(17.5) = \tfrac{35}{12}. \]

Theorem 2.16 (Chebyshev's Inequality). If $X$ is a random variable with finite expected value $\mu$ and finite non-zero variance $\sigma^2$, then for any real number $k > 0$,
\[ P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}. \]

Proof. We begin with the definition of variance:
\begin{align*}
\mathrm{Var}(X) = E[(X - E[X])^2] &= \sum_{x \in \mathbb{R}} (x - E[X])^2 \cdot P(X = x) \\
&\ge \sum_{|x - E[X]| \ge k} (x - E[X])^2 \cdot P(X = x) \\
&\ge \sum_{|x - E[X]| \ge k} k^2 \cdot P(X = x) \\
&= k^2 \cdot P(|X - E[X]| \ge k),
\end{align*}
from which the result follows. $\square$

Remark. The above proof assumes that we are dealing with discrete random variables rather than continuously distributed ones. The proof of this inequality for continuously distributed variables is very similar, using the integral representation of the variance instead of sums.

Definition 2.17. A stochastic process is a sequence $[X_i : i \in T]$ of random variables on a probability space $(\Omega, \mathcal{F}, P)$, where each $i \in T$ represents a different point in time ([1], Chapter 7).

Definition 2.18. A sequence $\{X_n\}$ of independent random variables is said to converge in probability towards a value $\mu$ if for all $\varepsilon > 0$,
\[ \lim_{n \to \infty} P(|\overline{X}_n - \mu| \ge \varepsilon) = 0, \quad \text{where } \overline{X}_n = \frac{\sum_i X_i}{n}. \]

Definition 2.19. For every $i \in T$ of a stochastic process, we work within the probability space $(\Omega_i, \mathcal{F}_i, P_i)$. The product space of a stochastic process is denoted by
\[ \Big( \prod_i \Omega_i,\ \text{``}\textstyle\prod_i \mathcal{F}_i\text{''},\ \text{``}\textstyle\prod_i P_i\text{''} \Big), \]
where $\prod_i \Omega_i$ is the sample space created by taking the Cartesian product of each individual sample space $\Omega_i$; ``$\prod \mathcal{F}_i$'' is the corresponding $\sigma$-algebra of the space $\prod_i \Omega_i$ generated by subsets of the form $A_1 \times A_2 \times \cdots$, where $A_1 \in \mathcal{F}_1, A_2 \in \mathcal{F}_2, \ldots$; and ``$\prod P_i$'' is the resulting probability measure on ``$\prod \mathcal{F}_i$''.

Remark. Observe that an infinite product of discrete probability spaces is not discrete.

Theorem 2.20 (Weak Law of Large Numbers). The mean of a sequence of discrete, identically distributed, independent random variables converges in probability towards the expected value as the number of elements approaches infinity, i.e., $\overline{X}_n \to \mu$ in probability as $n \to \infty$: for any $\varepsilon > 0$,
\[ \lim_{n \to \infty} P(|\overline{X}_n - \mu| > \varepsilon) = 0. \]

Proof. We assume finite variance $\mathrm{Var}(X_i) = \sigma^2$ for all $i$. Since the variables $X_i$ are independent, standard properties of the variance give
\[ \mathrm{Var}(\overline{X}_n) = \mathrm{Var}\Big(\frac{1}{n}(X_1 + \cdots + X_n)\Big) = \frac{1}{n^2} \mathrm{Var}(X_1 + \cdots + X_n) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}. \]
By applying Chebyshev's inequality to $\overline{X}_n$, with $E[\overline{X}_n] = \mu$, we see that
\[ P(|\overline{X}_n - \mu| \ge \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2}. \]
As $n \to \infty$, $\frac{\sigma^2}{n\varepsilon^2} \to 0$, so by Definition 2.18 the Weak Law of Large Numbers holds. $\square$

Remark. Notice that since at any step we only deal with finite $n$, the product space is discrete and has finite measure.

Theorem 2.21 (Strong Law of Large Numbers). The mean of a sequence of discrete, identically distributed, independent random variables almost surely converges to the expected value, i.e., $\overline{X}_n \to \mu$ almost surely as $n \to \infty$, or
\[ P\Big( \lim_{n \to \infty} \overline{X}_n = \mu \Big) = 1. \]

Remark. The strong law of large numbers is more difficult to prove than the weak law because we are now dealing with an infinite product space rather than a discrete one. Over an infinite product space, the terms ``$\prod \mathcal{F}_i$'' and ``$\prod P_i$'' no longer make sense as we have previously defined them because they are no longer discrete. Suppose we flip a coin infinitely many times. If we were to look at a specific sequence of outcomes in $\prod_{i=1}^\infty \Omega_i$, the probability of this sequence occurring is $\frac12 \times \frac12 \times \frac12 \times \cdots = 0$, so it would appear that no sequence has non-zero probability. However, there are still many questions that can be asked about such an infinite space $\prod_{i=1}^\infty \Omega_i$, such as the probability that the first coin flipped will be heads (which should always be $\frac12$ in every trial). The framework that we have already introduced is not enough to prove the strong law of large numbers, particularly the statement about almost sure convergence. In order to prove this theorem, one must first develop methods from measure theory more carefully.

These two theorems are very similar in nature, but there is an important, subtle distinction between what they say. The weak law states that for $n$ sufficiently large, the mean $\overline{X}_n$ is likely to be near the expected value $\mu$. It is possible under the weak law to have a non-zero probability that $|\overline{X}_n - \mu| > \varepsilon$ for infinitely many $n$. The strong law states that this will almost surely not happen: for any $\varepsilon > 0$, the strong law implies that with probability $1$, $|\overline{X}_n - \mu| < \varepsilon$ holds for all $n$ sufficiently large.

Example 2.22 (Discrete Variables). Consider the case of flipping a coin. Assuming the coin is fair, there is an equal chance that the coin will land heads or tails.
Now suppose that we place a number on each side of the coin, $1$ for heads and $0$ for tails, and sum the results from each flip, starting at $0$. If we flip the coin infinitely many times, we should expect that the mean will be close to $\frac12$. The weak law allows a positive probability that $|\overline{X}_n - \frac12| > \varepsilon$ for infinitely many $n$ and some $\varepsilon > 0$, but the strong law almost surely eliminates this possibility from occurring. The sample space for such an experiment is $\prod_{i=1}^\infty \Omega_i = \{0,1\} \times \{0,1\} \times \{0,1\} \times \cdots$.

Remark. The strong law implies the weak law.

Example 2.23 (Continuous Distribution). Now consider a person throwing darts at a dartboard. The person in question is not a skilled player and is equally likely to land a dart at any position on the board. If we view the board as a coordinate system, we can measure each dart throw by its displacement from the bullseye, which serves as the origin. In that sense, we can speak of the "mean" of a set of dart throws. Assuming that where each individual dart lands is in fact uniformly random, the expected value, or expected landing point, is the bullseye. As the thrower throws infinitely many darts, the weak law of large numbers says that the mean displacement of the individual throws is likely to be arbitrarily close to the origin, while the strong law nearly assures that outcome.

3. Dirichlet's Theorem

(Note: This section loosely follows Xiao's notes [4] from Lectures 1 and 2 and borrows from Chapter VI of Serre [3].)

3.1. The Riemann Zeta Function.

Theorem 3.1 (Euclid). There are infinitely many primes.

Proof. Assume for contradiction that there are finitely many primes $\{p_1, \ldots, p_n\}$, and let $X = p_1 \cdot p_2 \cdots p_n$. Now consider the number $X + 1$. Since $X + 1 \equiv 1 \pmod{p_i}$ for every $p_i$ in our set of primes, no $p_i$ divides $X + 1$. However, $X + 1 > 1$, so there exists a prime $p$ that divides $X + 1$, and $p \notin \{p_1, \ldots, p_n\}$, which is a contradiction. Therefore, there must be infinitely many primes.
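Euclid's construction is explicit enough to run: from any finite list of primes it manufactures a prime outside the list. A small Python illustration of the argument (the helper names are ours, chosen for this sketch):

```python
def smallest_prime_factor(m):
    """Return the smallest prime factor of m (m > 1) by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # m itself is prime

def euclid_new_prime(primes):
    """Given a finite list of primes, produce a prime not in the list."""
    X = 1
    for p in primes:
        X *= p
    # X + 1 is congruent to 1 mod every p in the list, so its smallest
    # prime factor cannot belong to the list.
    return smallest_prime_factor(X + 1)

print(euclid_new_prime([2, 3, 5, 7]))          # 211 (2*3*5*7 + 1 is itself prime)
print(euclid_new_prime([2, 3, 5, 7, 11, 13]))  # 59  (30031 = 59 * 509)
```

As the second call shows, $X + 1$ need not itself be prime; the proof only needs that its prime factors avoid the original list.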
$\square$

Definition 3.2. We denote by $(a,b)$ the greatest common divisor of $a$ and $b$.

Theorem 3.3 (Dirichlet's Theorem). Fix $a, b \in \mathbb{N}$ such that $(a,b) = 1$. There are infinitely many primes of the form $p = a + bn$, where $n \in \mathbb{N}$.

This theorem may not seem as intuitively apparent as Euclid's theorem, and at the moment we are not fully equipped to prove it. First, we must introduce the Riemann zeta function and use it to give an alternate proof of the infinitude of primes. We will prove that $\sum_{p \text{ prime}} \frac{1}{p^s} \to \infty$ as $s \to 1$ with $\mathrm{Re}\, s > 1$, $s \in \mathbb{C}$. This is a stronger statement than simply saying there are infinitely many primes, and it is a proof that generalizes to Dirichlet's theorem.

Definition 3.4. The Riemann zeta function is defined as
\[ \zeta(s) = \sum_{n \in \mathbb{N}} \frac{1}{n^s}, \quad \text{for } \mathrm{Re}\, s > 1,\ s \in \mathbb{C}, \]
which converges absolutely when the real part of $s$ is greater than $1$.

Claim. $\sum_{n \in \mathbb{N}} \frac{1}{n^s}$ converges absolutely for $\mathrm{Re}\, s > 1$ and diverges when $s = 1$.

Proof. Let $\mathrm{Re}\, s > 1$. Writing $s = \sigma + it$,
\[ \frac{1}{|n^s|} = \frac{1}{|n^\sigma||n^{it}|} = \frac{1}{|n^\sigma||e^{it \log n}|} = \frac{1}{n^\sigma}, \quad \text{so} \quad \sum_{n \in \mathbb{N}} \left| \frac{1}{n^s} \right| = \sum_{n \in \mathbb{N}} \frac{1}{n^\sigma}. \]
If we think of $\sum_{n=2}^\infty \frac{1}{n^\sigma}$ as a lower Riemann sum for $\int_1^\infty \frac{1}{x^\sigma}\,dx$, then we see that
\[ \sum_{n \in \mathbb{N}} \frac{1}{n^\sigma} \le 1 + \int_1^\infty \frac{1}{x^\sigma}\,dx = 1 + \frac{1}{\sigma - 1}, \]
which is finite for $\sigma = \mathrm{Re}\, s > 1$. For $s = 1$, the sum is the harmonic series $\sum_{n \in \mathbb{N}} \frac{1}{n}$, which clearly diverges. $\square$

Claim. $\lim_{s \to 1,\ \mathrm{Re}\, s > 1} |\zeta(s)| = +\infty$.

Proof. Consider what happens as $s \to 1$ with $\mathrm{Re}\, s > 1$ and $\mathrm{Im}\, s$ arbitrary. If we can show that $\zeta(s) - \frac{1}{s-1}$ is bounded for $\mathrm{Re}\, s > 1$ in some neighborhood of $1$, then $\lim_{s \to 1} |\zeta(s)| = \infty$. Since $\frac{1}{s-1} = \int_1^\infty \frac{dx}{x^s}$,
\[ \zeta(s) - \frac{1}{s-1} = \sum_{n \in \mathbb{N}} \frac{1}{n^s} - \sum_{n \in \mathbb{N}} \int_n^{n+1} \frac{dx}{x^s} = \sum_{n \in \mathbb{N}} \int_0^1 \left( \frac{1}{n^s} - \frac{1}{(n+t)^s} \right) dt. \]
Now we want to bound the function $f_n(t) = \frac{1}{n^s} - \frac{1}{(n+t)^s}$. Note that $f_n(0) = 0$ and that $f_n'(t) = \frac{s}{(n+t)^{s+1}}$. Therefore, when $0 \le t \le 1$,
\[ |f_n(t)| \le \sup_{0 \le t \le 1} |f_n'(t)| \cdot (1 - 0) \le \frac{|s|}{n^{\sigma + 1}}, \quad \text{so} \quad \left| \int_0^1 f_n(t)\,dt \right| \le \frac{|s|}{n^{\sigma + 1}}. \]
So,
\[ \left| \sum_{n \in \mathbb{N}} \int_0^1 \left( \frac{1}{n^s} - \frac{1}{(n+t)^s} \right) dt \right| \le \sum_{n \in \mathbb{N}} \frac{|s|}{n^{\sigma + 1}}, \]
which is convergent for $\mathrm{Re}\, s > 0$. $\square$

Remark. The previous claim illustrates that the sum diverges along the real axis. Using the observation that $\sum_{n \in \mathbb{N}} \frac{1}{n^s}$ expresses an upper Riemann sum of $\int_1^\infty \frac{1}{x^s}\,dx$, we can see that when $\mathrm{Im}\, s = 0$,
\[ \sum_{n \in \mathbb{N}} \frac{1}{n^s} \ge \int_1^\infty \frac{1}{x^s}\,dx = \frac{1}{s - 1} \longrightarrow +\infty \quad \text{as } s \to 1^+. \]
Combined with the next theorem, this gives an alternate proof of the infinitude of primes.

Theorem 3.5. $\lim_{s \to 1} |\zeta(s)| = \infty$ implies that there are infinitely many primes.

Proof. Assume for now that the real part of $s$ is greater than $1$, so that we work within a realm of absolute convergence and may rearrange sums. Since every natural number greater than one is uniquely a product of powers of prime numbers, we can rewrite the sum as
\[ \sum_{n \in \mathbb{N}} \frac{1}{n^s} = \prod_{p \text{ prime}} \left( 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots \right). \]
Taking the log of both sides and using the additive property of logarithms, we see that
\[ \log \zeta(s) = \sum_{p \text{ prime}} \log\left( 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots \right). \]
The inside of the log function on the right-hand side is a geometric series, so the right-hand side can be rewritten as
\[ \sum_{p \text{ prime}} \log\left( \frac{1}{1 - \frac{1}{p^s}} \right) = - \sum_{p \text{ prime}} \log\left( 1 - \frac{1}{p^s} \right). \]
The Taylor expansion of $\log(1 - x)$ is $-(x + \frac{x^2}{2} + \frac{x^3}{3} + \cdots)$, therefore the above expression becomes
\[ \sum_{p \text{ prime}} \left( \frac{1}{p^s} + \frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots \right) = \sum_{p \text{ prime}} \frac{1}{p^s} + \sum_{p \text{ prime}} \left( \frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots \right). \]
If we can show that the second sum on the right converges, then there are infinitely many primes: $\lim_{s \to 1} |\zeta(s)| = \infty$ implies $\lim_{s \to 1} |\log \zeta(s)| = \infty$, which means the above expression diverges as well. Thus, by showing that the second sum is bounded as $s \to 1$, we get $\lim_{s \to 1} \sum_{p \text{ prime}} \frac{1}{p^s} = \infty$, which implies that there must be infinitely many primes. In fact, the second sum on the right converges absolutely for $\mathrm{Re}\, s > \frac12$:
\[ \sum_{p \text{ prime}} \left( \frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots \right) \le \sum_{p \text{ prime}} \frac{1}{p^{2s}} \left( 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots \right). \quad (*) \]
The geometric sum satisfies $1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots = \left(1 - \frac{1}{p^s}\right)^{-1}$, which for $\mathrm{Re}\, s > \frac12$ is bounded in absolute value by the constant $C = (1 - 2^{-1/2})^{-1}$, so
\[ (*) \le C \sum_{p \text{ prime}} \frac{1}{p^{2s}} \le C \sum_{n \in \mathbb{N}} \frac{1}{n^{2s}}, \]
which converges for $\mathrm{Re}\, s > \frac12$. $\square$

3.2. Dirichlet Characters. Dirichlet characters are useful because they enable one to adapt the method seen in the zeta-function proof of the infinitude of primes to prove Dirichlet's theorem.

Definition 3.6. A function $\chi : \mathbb{Z} \to \mathbb{C}$ is called a Dirichlet character if it satisfies the following properties:
(1) $\exists\, N \in \mathbb{N}$ such that $\chi(x + N) = \chi(x)$ (periodic with period $N$),
(2) if $(x, N) > 1$, then $\chi(x) = 0$, and if $(x, N) = 1$, then $\chi(x) \ne 0$,
(3) $\chi(mn) = \chi(m) \cdot \chi(n)$ (multiplicative).

From the above properties, we can see that
• $\chi(1) = 1$;
• if $a \equiv b \bmod N$, then $\chi(a) = \chi(b)$;
• characters are equivalent to homomorphisms of finite abelian groups $(\mathbb{Z}/N\mathbb{Z})^\times \to \mathbb{C}^\times$ [4].

Definition 3.7. Euler's phi function is defined by $\varphi(n) = k$, where $k$ is the number of positive integers less than or equal to $n$ that are coprime to $n$.

In order to construct Dirichlet characters, we find an isomorphism from the finite abelian group $(\mathbb{Z}/n\mathbb{Z})^\times$ to a product of additive groups. By the theory of finite abelian groups, there exist $m_1, \ldots, m_k$ such that
\[ (\mathbb{Z}/n\mathbb{Z})^\times \cong \mathbb{Z}/m_1\mathbb{Z} \times \cdots \times \mathbb{Z}/m_k\mathbb{Z}. \]
In the case that $(\mathbb{Z}/n\mathbb{Z})^\times \cong \mathbb{Z}/\varphi(n)\mathbb{Z}$ (i.e., the group is cyclic), characters $\chi : (\mathbb{Z}/n\mathbb{Z})^\times \to \mathbb{C}^\times$ correspond to maps $\mathbb{Z}/\varphi(n)\mathbb{Z} \to \mathbb{C}^\times$. Suppose that $g$ is a generator of $(\mathbb{Z}/n\mathbb{Z})^\times$. For any $x$ with $(x, n) = 1$, we define $b(x) = \alpha$, where $x \equiv g^\alpha \bmod n$. Now, for $h = 0, \ldots, \varphi(n) - 1$, we define
\[ \chi_h(x) = \begin{cases} e^{2\pi i h\, b(x)/\varphi(n)} & \text{if } (x, n) = 1, \\ 0 & \text{else,} \end{cases} \]
which gives all of the characters of $(\mathbb{Z}/n\mathbb{Z})^\times$ [4].

Example 3.8.
We find the Dirichlet characters for various $n$:

(1) $n = 9$: Notice that $2$ is a generator of $(\mathbb{Z}/9\mathbb{Z})^\times$, so we map $(\mathbb{Z}/9\mathbb{Z})^\times \to \mathbb{Z}/6\mathbb{Z}$ by $2^a \mapsto a$; thus $1 \mapsto 0$, $2 \mapsto 1$, $4 \mapsto 2$, $8 \mapsto 3$, $7 \mapsto 4$, $5 \mapsto 5$. Writing $\omega = e^{i\pi/3}$, the six characters are $\chi_h(x) = \omega^{h\, b(x)}$ for $h = 0, \ldots, 5$ (and $\chi_h(x) = 0$ when $3 \mid x$):
\[
\begin{array}{c|cccccc}
x & \chi_0 & \chi_1 & \chi_2 & \chi_3 & \chi_4 & \chi_5 \\
\hline
1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 1 & \omega & \omega^2 & -1 & \omega^4 & \omega^5 \\
4 & 1 & \omega^2 & \omega^4 & 1 & \omega^2 & \omega^4 \\
8 & 1 & -1 & 1 & -1 & 1 & -1 \\
7 & 1 & \omega^4 & \omega^2 & 1 & \omega^4 & \omega^2 \\
5 & 1 & \omega^5 & \omega^4 & -1 & \omega^2 & \omega
\end{array}
\]

(2) $n = 12$: The elements of $(\mathbb{Z}/12\mathbb{Z})^\times = \{1, 5, 7, 11\}$ all have square congruent to $1 \bmod 12$, so we map $(\mathbb{Z}/12\mathbb{Z})^\times \to \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$ by $5^a \cdot 7^b \mapsto (a,b)$; thus $1 \mapsto (0,0)$, $5 \mapsto (1,0)$, $7 \mapsto (0,1)$, $11 \mapsto (1,1)$. The four characters are $\chi_{\varepsilon_1, \varepsilon_2}(5^a 7^b) = \varepsilon_1^a \varepsilon_2^b$ with $\varepsilon_1, \varepsilon_2 \in \{\pm 1\}$:
\[
\begin{array}{c|cccc}
x & \chi_{1,1} & \chi_{-1,1} & \chi_{1,-1} & \chi_{-1,-1} \\
\hline
1 & 1 & 1 & 1 & 1 \\
5 & 1 & -1 & 1 & -1 \\
7 & 1 & 1 & -1 & -1 \\
11 & 1 & -1 & -1 & 1
\end{array}
\]

(3) $n = 24$: The squares of all eight members of the group $(\mathbb{Z}/24\mathbb{Z})^\times$ are congruent to $1 \bmod 24$, so we map $(\mathbb{Z}/24\mathbb{Z})^\times \to \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$ by $5^a \cdot 7^b \cdot 13^c \mapsto (a,b,c)$; thus $1 \mapsto (0,0,0)$, $5 \mapsto (1,0,0)$, $7 \mapsto (0,1,0)$, $11 \mapsto (1,1,0)$, $13 \mapsto (0,0,1)$, $17 \mapsto (1,0,1)$, $19 \mapsto (0,1,1)$, $23 \mapsto (1,1,1)$. The eight characters are $\chi_{\varepsilon_1, \varepsilon_2, \varepsilon_3}(5^a 7^b 13^c) = \varepsilon_1^a \varepsilon_2^b \varepsilon_3^c$ with each $\varepsilon_i \in \{\pm 1\}$, and the table of values follows directly from this formula as in the previous case.

(4) $n = 15$: Four of the elements in the group have order dividing two, while the other four are of order four. We map $(\mathbb{Z}/15\mathbb{Z})^\times \to \mathbb{Z}/4\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$ by $2^a \cdot 11^b \mapsto (a,b)$; thus $1 \mapsto (0,0)$, $2 \mapsto (1,0)$, $4 \mapsto (2,0)$, $7 \mapsto (1,1)$, $8 \mapsto (3,0)$, $11 \mapsto (0,1)$, $13 \mapsto (3,1)$, $14 \mapsto (2,1)$. The eight characters are $\chi_{\varepsilon_1, \varepsilon_2}(2^a 11^b) = \varepsilon_1^a \varepsilon_2^b$, where now $\varepsilon_1 \in \{1, i, -1, -i\}$ and $\varepsilon_2 \in \{\pm 1\}$.

Definition 3.9. Let $m$ be an integer greater than $1$ and let $\chi$ be a Dirichlet character of the multiplicative group mod $m$. The L-function is defined by
\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \quad \text{for } \mathrm{Re}\, s > 1. \]

Remark. We are primarily concerned with the behavior of L-functions as $s \to 1$.

Lemma 3.10.
Suppose that $\sum_i U_i(s)$ exists for all $s > 1$, where each $U_i$ is continuous in some neighborhood of $1$, and that the sum converges uniformly in $s$ in some neighborhood $V$ of $1$, i.e., for all $\varepsilon > 0$ there exists $N$ such that
\[ \Big| \sum_{i > N} U_i(s) \Big| < \varepsilon, \quad \text{for } s \in V. \]
Now suppose that $\sum_i U_i(1)$ exists (though it may be infinite). Then
\[ \lim_{s \to 1,\ \mathrm{Re}\, s > 1} \sum_i U_i(s) = \sum_i U_i(1). \]

Remark. In our application, each $U_i$ is a grouping of successive terms of a given sequence. This will become clearer in the following example.

Proof. Consider the expression $\big| \sum_{i=1}^\infty U_i(s) - \sum_{i=1}^N U_i(1) \big|$. By regrouping terms, we get
\[ \Big| \sum_{i=1}^{N} \big(U_i(s) - U_i(1)\big) + \sum_{i=N+1}^{\infty} U_i(s) \Big|. \quad (*) \]
If we fix $N$ large enough, then
\[ (*) < \Big| \sum_{i=1}^{N} \big(U_i(s) - U_i(1)\big) \Big| + \varepsilon. \]
Further, since each $U_i(s)$ is continuous in some neighborhood of $1$, we can make $s$ close enough to $1$ that
\[ |U_i(s) - U_i(1)| < \frac{\varepsilon}{N}, \quad \forall\, 1 \le i \le N, \quad \Longrightarrow \quad (*) < 2\varepsilon. \]
$\square$

Remark. This result allows us to show that for non-trivial $\chi$, $\lim_{s \to 1,\ \mathrm{Re}\, s > 1} L(s, \chi) = \text{``}L(1, \chi)\text{''} = \sum_{n \in \mathbb{N}} \frac{\chi(n)}{n}$. We use the notation ``$L(1, \chi)$'' here because we have not introduced analytic continuation, which is the technique necessary to actually define $\lim_{s \to 1,\ \mathrm{Re}\, s > 1} L(s, \chi) = L(1, \chi)$. For the purposes of this paper, though, analytic continuation is not necessary.

Example 3.11. We show that there are infinitely many primes of the form $p \equiv 1 \bmod 3$. Just as in the proof of the infinitude of primes via the Riemann zeta function, we want to show that
\[ \lim_{s \to 1,\ \mathrm{Re}\, s > 1} \sum_{p \text{ prime}} \frac{F(p)}{p^s} = \infty, \]
where $F(p) = 1$ if $p \equiv 1 \bmod 3$ and $0$ otherwise. Notice that the sum
\[ \sum_{p \text{ prime}} \frac{F(p)}{p^s} \quad (*) \]
converges absolutely for $\mathrm{Re}\, s > 1$. Now consider the following Dirichlet characters:
\[ \chi_1(n) = \begin{cases} 1 & \text{if } n \equiv 1 \text{ or } 2 \bmod 3, \\ 0 & \text{else,} \end{cases} \qquad \chi_{-1}(n) = \begin{cases} 1 & \text{if } n \equiv 1 \bmod 3, \\ -1 & \text{if } n \equiv 2 \bmod 3, \\ 0 & \text{else.} \end{cases} \]
Thus $F(n) = \frac{\chi_1(n) + \chi_{-1}(n)}{2}$, so
\[ (*) = \frac12 \left( \sum_{p \text{ prime}} \frac{\chi_1(p)}{p^s} + \sum_{p \text{ prime}} \frac{\chi_{-1}(p)}{p^s} \right), \]
since we are working within the realm of absolute convergence.
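The character decomposition of $F$, and the contrasting behavior of the two character sums it produces, can be checked numerically. A Python sketch (the function names and the cutoff $10^5$ are our own choices for this illustration):

```python
# The two Dirichlet characters mod 3 from the example above.
def chi_1(n):        # trivial character
    return 0 if n % 3 == 0 else 1

def chi_minus1(n):   # non-trivial character
    return {0: 0, 1: 1, 2: -1}[n % 3]

def F(n):            # indicator of n = 1 (mod 3)
    return 1 if n % 3 == 1 else 0

# Check the decomposition F(n) = (chi_1(n) + chi_minus1(n)) / 2.
assert all(2 * F(n) == chi_1(n) + chi_minus1(n) for n in range(1, 1000))

def primes_below(N):
    """Sieve of Eratosthenes."""
    sieve = [True] * N
    sieve[0:2] = [False, False]
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i in range(N) if sieve[i]]

# Partial sums at s = 1: the trivial-character sum over primes grows
# without bound (like log log N), while the non-trivial one stays bounded.
ps = primes_below(100_000)
s_trivial = sum(chi_1(p) / p for p in ps)
s_nontrivial = sum(chi_minus1(p) / p for p in ps)
print(s_trivial, s_nontrivial)
```

The growth of the first partial sum against the boundedness of the second is exactly the dichotomy the rest of the example exploits.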
In order to prove there exist infinitely many primes congruent to $1$ mod $3$, it suffices to show that the first sum diverges while the second remains bounded as $s \to 1$.
\[ L(s, \chi_{-1}) = \sum_n \frac{\chi_{-1}(n)}{n^s} = \prod_{p \text{ prime}} \left( 1 + \frac{\chi_{-1}(p)}{p^s} + \frac{\chi_{-1}(p)^2}{p^{2s}} + \cdots \right) = \prod_{p \text{ prime}} \frac{1}{1 - \frac{\chi_{-1}(p)}{p^s}}. \]
Taking the log of both sides, we get
\[ \log L(s, \chi_{-1}) = \sum_{p \text{ prime}} \left( \frac{\chi_{-1}(p)}{p^s} + \frac{\chi_{-1}(p)^2}{2p^{2s}} + \frac{\chi_{-1}(p)^3}{3p^{3s}} + \cdots \right) = \sum_{p \text{ prime}} \frac{\chi_{-1}(p)}{p^s} + \sum_{p \text{ prime}} \left( \frac{\chi_{-1}(p)^2}{2p^{2s}} + \frac{\chi_{-1}(p)^3}{3p^{3s}} + \cdots \right). \]
Recall from our proof involving the Riemann zeta function (Theorem 3.5) that the rightmost sum is bounded for $\mathrm{Re}\, s > \frac12$. So now, if we want to show that $\sum_{p} \frac{\chi_{-1}(p)}{p^s}$ is bounded as $s \to 1$, we must show that $L(s, \chi_{-1})$ converges at $s = 1$ to a positive, non-zero value, for if $L(s, \chi_{-1})$ were to converge to zero at $s = 1$, then $\sum_p \frac{\chi_{-1}(p)}{p^s}$ would diverge to negative infinity by properties of the logarithm.

But first, we apply Lemma 3.10 in order to justify that $\lim_{s \to 1,\ \mathrm{Re}\, s > 1} L(s, \chi_{-1}) = \text{``}L(1, \chi_{-1})\text{''}$. We want to show that
\[ \lim_{s \to 1,\ \mathrm{Re}\, s > 1} \sum_n \frac{\chi_{-1}(n)}{n^s} = \sum_n \frac{\chi_{-1}(n)}{n}, \]
so let $U_n(s) = \frac{1}{n^s} - \frac{1}{(n+1)^s}$ for $n \equiv 1 \bmod 3$; i.e., each $U_n$ groups a consecutive pair of terms of the series. We must check that the sum converges uniformly. For some arbitrarily large $N$,
\[ \Big| \sum_{n > N} U_n(s) \Big| = \Big| \sum_{\substack{n \equiv 1 (3) \\ n > N}} \left( \frac{1}{n^s} - \frac{1}{(n+1)^s} \right) \Big| \le \sum_{n > N} \left| \frac{(n+1)^s - n^s}{(n(n+1))^s} \right| \le \sum_{n > N} \frac{C \cdot n^{\sigma - 1}}{n^{2\sigma}} \le C \sum_{n > N} \frac{1}{n^2}, \]
which can be made arbitrarily small. Thus
\[ L(1, \chi_{-1}) = 1 - \frac12 + \frac14 - \frac15 + \frac17 - \frac18 + \cdots, \]
which is a positive alternating series and therefore converges to a positive real number.

Now let's show that $\sum_{p \text{ prime}} \frac{\chi_1(p)}{p^s}$ diverges as $s \to 1$. Consider
\[ \sum_n \frac{\chi_1(n)}{n^s} = L(s, \chi_1) = \prod_{p \text{ prime}} \left( 1 + \frac{\chi_1(p)}{p^s} + \frac{\chi_1(p)^2}{p^{2s}} + \cdots \right) = \prod_{p \text{ prime}} \frac{1}{1 - \frac{\chi_1(p)}{p^s}}, \]
and taking the log of both sides,
\[ \log L(s, \chi_1) = \sum_{p \text{ prime}} \frac{\chi_1(p)}{p^s} + \sum_{p \text{ prime}} \left( \frac{\chi_1(p)}{2p^{2s}} + \frac{\chi_1(p)}{3p^{3s}} + \cdots \right). \]
Notice that
\[ \sum_{p \text{ prime}} \left( \frac{\chi_1(p)}{2p^{2s}} + \frac{\chi_1(p)}{3p^{3s}} + \cdots \right) = \sum_{p \ne 3} \left( \frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots \right) = \sum_{p \text{ prime}} \left( \frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots \right) - \left( \frac{1}{2 \cdot 3^{2s}} + \frac{1}{3 \cdot 3^{3s}} + \cdots \right), \]
which is less than $\sum_{p \text{ prime}} \left( \frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots \right)$, proved convergent for $\mathrm{Re}\, s > \frac12$ in the section on the Riemann zeta function. So, if we can show that $\sum_n \frac{\chi_1(n)}{n^s}$ diverges as $s \to 1$, that implies that $\sum_p \frac{\chi_1(p)}{p^s}$ must diverge as well. Notice that
\[ \sum_n \frac{\chi_1(n)}{n^s} = \sum_n \frac{1}{n^s} - \sum_{n \equiv 0 \bmod 3} \frac{1}{n^s}, \]
which is essentially the Riemann zeta function, excluding every $\frac{1}{n^s}$ where $3 \mid n$. Now, recall that the Riemann zeta function can be expressed using Euler products as
\[ \zeta(s) = \prod_{p \text{ prime}} \frac{1}{1 - \frac{1}{p^s}}. \]
Therefore, we can observe that
\[ \sum_n \frac{\chi_1(n)}{n^s} = \frac{\zeta(s)}{\left(1 - \frac{1}{3^s}\right)^{-1}} = \zeta(s) \cdot \left( 1 - \frac{1}{3^s} \right) = \prod_{\substack{p \text{ prime} \\ p \ne 3}} \frac{1}{1 - \frac{1}{p^s}}. \]
As $s \to 1$, the factor $1 - \frac{1}{3^s}$ tends to the finite, non-zero value $\frac23$, and consequently the above product still diverges. Therefore, the sum $\sum_p \frac{\chi_1(p)}{p^s}$ diverges, and we conclude that there are infinitely many primes congruent to $1$ mod $3$.

Example 3.12. We show that there are infinitely many primes of the form $p \equiv 1 \bmod 12$. Just as in the last example, we want to show that
\[ \lim_{s \to 1,\ \mathrm{Re}\, s > 1} \sum_{p \text{ prime}} \frac{F(p)}{p^s} = \infty, \]
where now $F(p) = 1$ if $p \equiv 1 \bmod 12$ and $0$ otherwise. Notice that the sum
\[ \sum_{p \text{ prime}} \frac{F(p)}{p^s} \quad (*) \]
converges absolutely for $\mathrm{Re}\, s > 1$.
Now consider the following characters mod $12$:
\[ \chi_{1,1}(n) = \begin{cases} 1 & \text{if } n \equiv 1, 5, 7, \text{ or } 11 \bmod 12, \\ 0 & \text{else,} \end{cases} \qquad \chi_{1,-1}(n) = \begin{cases} 1 & \text{if } n \equiv 1 \text{ or } 5 \bmod 12, \\ -1 & \text{if } n \equiv 7 \text{ or } 11 \bmod 12, \\ 0 & \text{else,} \end{cases} \]
\[ \chi_{-1,1}(n) = \begin{cases} 1 & \text{if } n \equiv 1 \text{ or } 7 \bmod 12, \\ -1 & \text{if } n \equiv 5 \text{ or } 11 \bmod 12, \\ 0 & \text{else,} \end{cases} \qquad \chi_{-1,-1}(n) = \begin{cases} 1 & \text{if } n \equiv 1 \text{ or } 11 \bmod 12, \\ -1 & \text{if } n \equiv 5 \text{ or } 7 \bmod 12, \\ 0 & \text{else.} \end{cases} \]
Thus
\[ F(n) = \frac{\chi_{1,1}(n) + \chi_{1,-1}(n) + \chi_{-1,1}(n) + \chi_{-1,-1}(n)}{4}, \]
so
\[ (*) = \frac14 \left( \sum_{p} \frac{\chi_{1,1}(p)}{p^s} + \sum_{p} \frac{\chi_{1,-1}(p)}{p^s} + \sum_{p} \frac{\chi_{-1,1}(p)}{p^s} + \sum_{p} \frac{\chi_{-1,-1}(p)}{p^s} \right), \]
since we are working under conditions of absolute convergence. In order to show that there are infinitely many primes congruent to $1$ mod $12$, we want to show that the first sum diverges while the other three remain bounded as $s \to 1$, as in the above example. To handle the last three terms, note that for any character $\chi$,
\[ \sum_n \frac{\chi(n)}{n^s} = \prod_{p \text{ prime}} \frac{1}{1 - \frac{\chi(p)}{p^s}}. \]
Taking the log of both sides, the above equation gives
\[ \log L(s, \chi) = \sum_{p \text{ prime}} \left( \frac{\chi(p)}{p^s} + \frac{\chi(p)^2}{2p^{2s}} + \frac{\chi(p)^3}{3p^{3s}} + \cdots \right) = \sum_{p \text{ prime}} \frac{\chi(p)}{p^s} + \sum_{p \text{ prime}} \left( \frac{\chi(p)^2}{2p^{2s}} + \frac{\chi(p)^3}{3p^{3s}} + \cdots \right), \]
and in particular this holds for $\chi = \chi_{1,-1}$, $\chi_{-1,1}$, and $\chi_{-1,-1}$. In each of these equations, the second sums are all bounded for $\mathrm{Re}\, s > \frac12$. To prove the boundedness of the remaining terms, we check that the L-function of each non-trivial character converges to a positive, non-zero value at $s = 1$. As in the last example, $\lim_{s \to 1} L(s, \chi) = \text{``}L(1, \chi)\text{''}$; thus
\[ L(1, \chi_{1,-1}) = 1 + \frac15 - \frac17 - \frac1{11} + \frac1{13} + \frac1{17} - \frac1{19} - \frac1{23} + \cdots, \]
which is convergent and positive, as it alternates in pairs.
\[ L(1, \chi_{-1,1}) = 1 - \frac15 + \frac17 - \frac1{11} + \frac1{13} - \frac1{17} + \frac1{19} - \frac1{23} + \cdots, \]
which is convergent and positive because it is an alternating series. Finally,
\[ L(1, \chi_{-1,-1}) = 1 - \frac15 - \frac17 + \frac1{11} + \frac1{13} - \frac1{17} - \frac1{19} + \frac1{23} + \cdots. \]
This series is also alternating and convergent, but it is not immediately clear that it is positive due to the nature of its alternations. So, take some $n \ge 0$ and examine the expression $\frac{1}{12n+1} - \frac{1}{12n+5} - \frac{1}{12n+7} + \frac{1}{12n+11}$. If this expression is positive, then we know that the series converges to a positive number. It simplifies to
\[ \frac{12n \cdot 48 + 288}{(12n+1)(12n+5)(12n+7)(12n+11)}, \]
which is positive for all $n \ge 0$; the series of these grouped terms converges by comparison (the above expression is roughly less than $\frac{1}{n^3}$), thus $L(1, \chi_{-1,-1})$ is convergent and positive.

Now, let's show that $\sum_{p} \frac{\chi_{1,1}(p)}{p^s}$ diverges as $s \to 1$. Once again, note that
\[ \sum_n \frac{\chi_{1,1}(n)}{n^s} = \prod_{p \text{ prime}} \frac{1}{1 - \frac{\chi_{1,1}(p)}{p^s}}, \]
and taking the log of both sides, we get
\[ \log L(s, \chi_{1,1}) = \sum_{p \text{ prime}} \frac{\chi_{1,1}(p)}{p^s} + \sum_{p \text{ prime}} \left( \frac{\chi_{1,1}(p)}{2p^{2s}} + \frac{\chi_{1,1}(p)}{3p^{3s}} + \cdots \right). \]
The second sum is less than $\sum_{p} \left( \frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots \right)$, which we proved converges for $\mathrm{Re}\, s > \frac12$ in the section on the Riemann zeta function. Thus, showing that $\sum_n \frac{\chi_{1,1}(n)}{n^s}$ diverges as $s \to 1$ implies that $\sum_p \frac{\chi_{1,1}(p)}{p^s}$ diverges as well. Now,
\[ \sum_n \frac{\chi_{1,1}(n)}{n^s} = \sum_n \frac{1}{n^s} - \sum_{2 \mid n \text{ or } 3 \mid n} \frac{1}{n^s}, \]
which is a modification of the Riemann zeta function where we exclude every $\frac{1}{n^s}$ with $n$ divisible by $2$ or $3$. Therefore,
\[ \sum_n \frac{\chi_{1,1}(n)}{n^s} = \zeta(s) \cdot \left( 1 - \frac{1}{2^s} \right)\left( 1 - \frac{1}{3^s} \right) = \prod_{\substack{p \text{ prime} \\ p \ne 2, 3}} \frac{1}{1 - \frac{1}{p^s}}. \]
As $s \to 1$, the factors $1 - \frac{1}{2^s}$ and $1 - \frac{1}{3^s}$ remain finite and non-zero, and the above product still diverges. Thus $\sum_p \frac{\chi_{1,1}(p)}{p^s}$ diverges, and we conclude that there are infinitely many primes congruent to $1$ mod $12$.

Remark. The above examples hint at this result, but explicitly: $L(s, \chi)$ diverges as $s \to 1$ for trivial $\chi$ and converges to a positive value for non-trivial $\chi$.

Remark.
Although we have seen that there are infinitely many primes in a given congruence class, we have not shown anything about the distribution of primes across all $\varphi(b)$ classes for any given $b$. We will not prove this, but it is true that, as we examine primes out to infinity, the density of primes congruent to $a$ mod $b$ is $\frac{1}{\varphi(b)}$, for all $a \in (\mathbb{Z}/b\mathbb{Z})^\times$. We will give a probabilistic interpretation of this result in the next section.

4. The Strong Law of Large Numbers and 'Randomness' of Primes

In this section, we will use the strong law of large numbers to provide probabilistic intuition that, as we examine primes out to infinity, the density of primes $p \equiv a \bmod b$, for $(a,b) = 1$, should be equal to $\frac{1}{\varphi(b)}$, which is essentially the strong statement of Dirichlet's theorem.

Fix some $b \in \mathbb{N}$ and denote $S = (\mathbb{Z}/b\mathbb{Z})^\times$. Let $x_k$ be a uniformly distributed random variable in $S$, and consider the process
\[ Y_k = \begin{cases} 1 & \text{if } x_k = a, \\ 0 & \text{otherwise.} \end{cases} \]
According to the strong law of large numbers, as $n \to \infty$, the expression $\frac{X_n}{n}$, where $X_n = \sum_{k=1}^n Y_k$, should converge to the expected value of the process $Y_k$, which is $\frac{1}{\varphi(b)}$.

We will assume the following heuristics:

Heuristic 1: Primes are distributed among congruence classes as if they were uniform random variables, i.e., the congruence class of the $k$-th prime is given by the random variable $x_k$ as above.

Heuristic 2: Only almost-sure events ($P(E) = 1$) can reasonably be expected to hold on a single trial of a stochastic process.

Theorem 4.1 (Strong Dirichlet's Theorem). Fix $a, b \in \mathbb{N}$ such that $(a,b) = 1$. Let $P_a$ be the set of prime numbers such that $p \equiv a \bmod b$. The set $P_a$ has density $\frac{1}{\varphi(b)}$, i.e.,
\[ \lim_{N \to \infty} \frac{\#\{\text{primes } p < N,\ p \equiv a \bmod b\}}{\#\{\text{primes } p < N\}} = \frac{1}{\varphi(b)}. \]
(See Chapter VI of Serre [3] for a proof of this theorem.)

Remark. Before we begin giving a "proof" of this theorem using the strong law of large numbers, we must first address some key issues and assumptions.
In particular, the strong law of large numbers deals with repeated stochastic processes; the primes are not random in that sense. Thus, what we will attempt to see is that although the primes are not random, they appear to behave as though they were randomly distributed, according to our heuristics. Essentially, the strong law of large numbers cannot directly prove the strong Dirichlet’s theorem, but it provides intuition as to why the theorem is reasonable.

“Proof:” Let us first try to model the drawing of primes as a “random” process. From Heuristic 1, we suppose that every $a \in S = (\mathbb{Z}/b\mathbb{Z})^\times$ has uniform probability $\frac{1}{\varphi(b)}$ of being drawn. Choose some $a \in S$, and as we repeat this process, let $X_n$ equal the number of times that $a$ has been drawn after $n$ iterations. Then
$$X_n = \sum_{i=1}^n Y_i, \quad \text{where } Y_i = \begin{cases} 1 & \text{if } a \text{ was drawn on the } i\text{th iteration}, \\ 0 & \text{otherwise.} \end{cases}$$
So, now we can look at the mean of the process above:
$$\lim_{n \to \infty} \frac{\sum_{i=1}^n Y_i}{n}.$$
We can think about this expression in the context of the strong law of large numbers. By the strong law of large numbers, this expression almost surely converges to the expected value $\mu$ of $\frac{X_n}{n}$, which is $\frac{1}{\varphi(b)}$, i.e.,
$$P\left( \lim_{n \to \infty} \frac{\sum_{i=1}^n Y_i}{n} = \frac{1}{\varphi(b)} \right) = 1. \quad (*)$$
Notice that the limit above is an alternate representation of the limit in Theorem 4.1, i.e.,
$$(*) \iff P\left( \lim_{N \to \infty} \frac{\#\{\text{primes } p < N, \ p \equiv a \bmod b\}}{\#\{\text{primes } p < N\}} = \frac{1}{\varphi(b)} \right) = 1.$$
Although the primes are only “drawn” once, Heuristic 2 then suggests that the Strong Dirichlet Theorem should hold.

Remark. The sample space for a single drawing is $\{a_1, \ldots, a_{\varphi(b)}\} = \Omega_i$. For the infinite sequence, the sample space is $\prod_{i=1}^{\infty} \Omega_i = \Omega_1 \times \Omega_2 \times \Omega_3 \times \cdots$.

Example 4.2. Let us choose $b = 3$, and let $X_n$ equal the number of primes $p \equiv 1 \bmod 3$ among the first $n$ primes. In this example, we will examine, for relatively small $n$, the actual value of $\frac{X_n}{n}$. (Values computed with a computer program.)
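Values of this kind can be generated by a short program along the following lines (a sketch; the author’s actual program is not given, and small-$n$ counts can shift by one depending on indexing conventions):

```python
# Sketch of the computation behind Example 4.2: count how many of the
# first n primes are congruent to 1 mod 3. (Illustrative only; the
# author's program is not given, and small-n counts may differ slightly
# depending on conventions such as where the prime sequence starts.)
def first_primes(n):
    """Return the first n primes by simple trial division."""
    primes, cand = [], 2
    while len(primes) < n:
        is_prime = True
        for p in primes:
            if p * p > cand:
                break
            if cand % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(cand)
        cand += 1
    return primes

for n in (10, 100, 1000, 10000):
    x_n = sum(1 for p in first_primes(n) if p % 3 == 1)
    print(n, x_n, x_n / n)
```

As $n$ grows, the printed ratio $\frac{X_n}{n}$ drifts toward $\frac{1}{\varphi(3)} = \frac{1}{2}$.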
$$\begin{array}{rrl}
n & X_n & X_n/n \\
\hline
10 & 4 & 0.4 \\
20 & 9 & 0.45 \\
50 & 24 & 0.48 \\
100 & 48 & 0.48 \\
1000 & 491 & 0.491 \\
10000 & 4989 & 0.4989 \\
100000 & 49962 & \approx 0.4996 \\
1000000 & 499829 & \approx 0.4998
\end{array}$$

Even for these relatively small values of $n$, we can see that $\frac{X_n}{n}$ appears to converge toward the expected value of $0.5$. Granted, with such a sample size one may still not be completely convinced of the result, but according to our heuristics the strong law of large numbers gives us reason to believe that as $n \to \infty$, $\frac{X_n}{n}$ converges to one-half.

Remark. We have seen how the strong law of large numbers relates to primes and congruence classes. We may also ask: would the weak law of large numbers have been sufficient to suggest something useful about primes? Using the same variables as in the above proof, and for some $\varepsilon > 0$, the weak law of large numbers tells us that
$$\lim_{n \to \infty} P\left( \left| \frac{\sum_{i=1}^n Y_i}{n} - \mu \right| > \varepsilon \right) = 0.$$
However, if we take any $b > 2$, then $\mu = \frac{1}{\varphi(b)} \leq \frac{1}{2}$, so let us fix $\varepsilon = \frac{1}{3}$ and consider the probability
$$P\left( \left| \frac{\sum_{i=1}^n Y_i}{n} - \mu \right| > \frac{1}{3} \right). \quad (*)$$
Continuing our heuristic assumptions, consider the particular case where $Y_1 = \cdots = Y_n = 1$, meaning the first $n$ primes all lie in the same congruence class mod $b$. In this case $\frac{1}{n}\sum_{i=1}^n Y_i = 1$, so
$$(*) \geq P(Y_1 = \cdots = Y_n = 1) = \left( \frac{1}{\varphi(b)} \right)^n > 0.$$
The above expression shows that for every $n$ there is a positive probability that the event in $(*)$ occurs. Thus, by Heuristic 2, although the strong law of large numbers provides intuition as to why Dirichlet’s theorem holds, the weak law of large numbers is not enough to suggest anything significant about the distribution of primes in congruence classes.

Acknowledgements. I would like to thank my mentor Sean Howe for his tremendous assistance and guidance throughout the duration of the program, and I would also like to thank Peter May for organizing the REU program and allowing me to participate in it.

References
[1] Patrick Billingsley. Probability and Measure, 3rd Edition. Wiley, 1995.
[2] Elias M. Stein and Rami Shakarchi. Complex Analysis. Princeton University Press, 2003.
[3] J.-P. Serre. A Course in Arithmetic. Graduate Texts in Mathematics. Springer, 1973.
[4] Liang Xiao. Number Theory Lecture Notes. University of Chicago REU 2012. http://math.uchicago.edu/~may/REU2012/.