Szemerédi’s Theorem via Ergodic Theory
Jonathan Hickman
January 25, 2016
Abstract
This essay investigates Furstenberg’s proof of Szemerédi’s Theorem. The necessary concepts and results from ergodic theory are introduced, including the Poincaré and Mean Ergodic Theorems, which are proved in full. The Ergodic Decomposition Theorem is also discussed.
Furstenberg’s Multiple Recurrence Theorem is then stated and shown to imply Szemerédi’s
Theorem. The remainder of the essay concentrates on proving the Multiple Recurrence Theorem, following the method laid out in [10]. As part of this proof the notions of conditional
expectation and measure are introduced and discussed in some detail.
1 Introduction
This essay is concerned with exploring whether certain subsets of the integers contain arithmetic
progressions. Recall that an (integer) arithmetic progression is a set {a + jb}_{j=0}^{k−1} ⊂ Z, where k ∈ N≥1 is the length of the progression, a ∈ Z and b ∈ N≥1. Usually we will not be concerned
with the values of the starting point a and step-size b, we merely want to know if given a set
Λ ⊆ Z and k ∈ N there exists a progression of length k within Λ. In 1927 van der Waerden
proved the following fundamental theorem:
Theorem 1.1 (van der Waerden). Suppose A1 , . . . , An ⊆ Z are disjoint sets which partition
the integers,
Z = A1 ⊔ A2 ⊔ · · · ⊔ An.
Then there exists some l ∈ {1, . . . , n} such that Al contains arithmetic progressions of arbitrary
length.
We will not present a proof of Theorem 1.1 here¹, but rather remark that when studying questions
pertaining to the presence of arithmetic progressions, it transpires that the idea of partitioning
the integers into a finite number of sets (or ‘colouring’ the integers, as it is often termed in such
combinatorial contexts) is something of a red herring. A more general approach is to consider
the ‘density’ of a set of integers.
Definition 1. Let Λ ⊆ Z. We define the upper Banach density of Λ to be

d̄_B(Λ) := lim sup_{N−M→∞} |Λ ∩ [M, N)| / (N − M)
In 1936 Erdős and Turán [6] conjectured the stronger statement that any subset of Z with positive upper Banach density contains arithmetic progressions of arbitrary length. Proving this conjecture was no easy task: indeed, it was not until 1952 that the first non-trivial case, the existence of arithmetic progressions of length k = 3, was shown by Roth [20], who used techniques from harmonic analysis. Szemerédi [22] then proved the case k = 4 in 1969 by applying combinatorial methods, and finally the general case was proven, again by Szemerédi [23], in 1975. The result now bears his name.
¹ A short proof of van der Waerden’s Theorem can be found in [5, 11]. Alternatively, it follows directly from Szemerédi’s Theorem, that is Theorem 1.2 below.
Theorem 1.2 (Szemerédi’s Theorem). Let Λ ⊆ Z be a set of positive upper Banach density.
Then Λ contains arithmetic progressions of arbitrary length.
Szemerédi’s proof of this theorem is long and difficult and we will not concern ourselves with
it here. Instead, we investigate an alternative, more accessible proof given by Furstenberg. Soon
after Szemerédi’s proof was published, Furstenberg noticed that Theorem 1.2 would follow from a
substantial generalisation of the classical Poincaré Recurrence Theorem from ergodic theory.
He subsequently proved this generalisation, now known as Furstenberg’s Multiple Recurrence
Theorem, and provided a scheme, the Correspondence Principle, for relating problems in combinatorial number theory to problems in ergodic theory [7].
The importance of Furstenberg’s work is arguably two-fold: first of all the Multiple Recurrence Theorem is itself a substantial result in ergodic theory. Secondly, the correspondence
principle provides an effective tool for tackling problems in number theory. Some results, such
as the multidimensional analogue of Szemerédi’s Theorem [9], have only been proven through
the use of the ergodic theoretical methods laid out by Furstenberg.
This document presents the proof of Szemerédi’s Theorem as given in Furstenberg, Katznelson and Ornstein’s article [10], assuming the reader has no prior knowledge of ergodic theory (although some familiarity with elementary measure theory and functional analysis is assumed). In order
to do this we will introduce the necessary definitions and results from ergodic theory and get a
flavour of what the discipline entails. We then proceed to state Furstenberg’s Multiple Recurrence Theorem and show how this implies Szemerédi’s Theorem. The bulk of the essay then
attempts to explain the proof of the Multiple Recurrence Theorem.
2 Upper Banach and Asymptotic Density
We begin by briefly discussing notions of density of a subset of the integers. The results proved
here will not be required until Section 4 and they can thus be considered something of a prelude
to the mathematics described later in the essay.
It is easy to deduce from Definition 1 that if a set Λ ⊆ Z has upper Banach density d¯B (Λ)
then there exists a sequence of intervals [an , bn ) with bn − an → ∞ as n → ∞ such that
d̄_B(Λ) = lim_{n→∞} |Λ ∩ [an, bn)| / (bn − an)    (2.1)
If we enforce the stronger condition that (2.1) must hold for [an , bn ) = [1, n + 1) for all n ∈ N
then we arrive at the stronger notion of asymptotic density.
Definition 2. For Λ ⊆ N define the lower and upper asymptotic densities as

d(Λ) = lim inf_{N→∞} |Λ ∩ [1, N]| / N    and    d̄(Λ) = lim sup_{N→∞} |Λ ∩ [1, N]| / N

respectively. If d(Λ) = d̄(Λ) then we define the asymptotic density as

d(Λ) = lim_{N→∞} |Λ ∩ [1, N]| / N

so that d(Λ) = d(Λ) = d̄(Λ).
Obviously any subset Λ ⊂ N of positive upper asymptotic density is automatically of positive
upper Banach density.
Example 1.
1. Trivially, any finite set F ⊂ N has d(F ) = 0.
2. For any a, k ∈ N the arithmetic sequence Λ = a + kN has density d(Λ) = 1/k.
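Although the essay is purely theoretical, the density in Example 1.2 is easy to check numerically. The following Python sketch (with ad hoc names, not part of the original sources) computes the partial densities |Λ ∩ [1, N]| / N for Λ = a + kN and confirms they approach 1/k.

```python
# Numerical illustration of Example 1.2 (ad hoc names): the partial densities
# of the arithmetic sequence Λ = a + kN approach 1/k.

def partial_density(lam, N):
    """Compute |lam ∩ [1, N]| / N for a set lam of positive integers."""
    return sum(1 for n in range(1, N + 1) if n in lam) / N

a, k = 3, 5
N = 100_000
lam = {a + k * j for j in range(N)}  # enough terms to cover [1, N]

d = partial_density(lam, N)
print(abs(d - 1 / k))  # very small: the density is 1/k = 0.2
```

For this choice of N the count is exact (20000 multiples-plus-offset in [1, 100000]), so the partial density already equals 1/k.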
An example which is only slightly less trivial, though important, is that of a syndetic set [5].
Definition 3. We say that a subset K ⊆ N is syndetic if there exists some a ∈ N such that

⋃_{i=0}^{a−1} (K − i) = N    (2.2)

where K − i = {n ∈ N : n + i ∈ K}. This is equivalent to the property that for every k ∈ N we have {k, k + 1, . . . , k + a − 1} ∩ K ≠ ∅, so syndetic sets are precisely those with ‘bounded gaps’.
It is easy to see that a syndetic set K ⊆ N has positive lower density. For suppose a ∈ N satisfies (2.2). Then for each k ∈ N the interval ((k − 1)a, ka] contains at least one element of K. Let N ∈ N with N ≥ a. By writing N = ka + r where k ∈ N and 0 ≤ r < a observe

|K ∩ [1, N]| / N = Σ_{j=1}^{k} |K ∩ ((j − 1)a, ja]| / N + |K ∩ (ka, ka + r]| / N ≥ k / (ka + r) > 1 / (2a) > 0

and so d(K) > 0 as required.
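The lower bound 1/(2a) above can be observed empirically. The following sketch (ad hoc, not from the original essay) builds a random syndetic set by placing one point in each block ((j − 1)a, ja] and checks that the partial density stays above 1/(2a).

```python
# Sketch (ad hoc names): a syndetic set with gaps at most a has partial
# densities |K ∩ [1, N]| / N exceeding 1/(2a), as in the estimate above.
import random

random.seed(0)
a = 4
# Build a syndetic set: one point in each block ((j-1)a, ja].
K = set()
for j in range(1, 10_001):
    K.add((j - 1) * a + random.randint(1, a))

N = a * 10_000
density = len([n for n in K if 1 <= n <= N]) / N
print(density > 1 / (2 * a))  # True
```

Here exactly one point lands in each of the 10000 blocks, so the partial density is exactly 1/a = 0.25, comfortably above 1/(2a) = 0.125.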
A significant result concerning asymptotic density is a theorem of Koopman and von Neumann, adapted here from ([5], Chapter 2; [1]), which relates asymptotic density and Cesàro summation of sequences. As we shall see, Cesàro sums play a dominant rôle in ergodic theory. The following theorem effectively tells us that the behaviour of a sequence on terms belonging to a set of zero density is ‘averaged out’ when taking the Cesàro mean.
Theorem 2.1. Let (an)n∈N ∈ l∞(R≥0) be a bounded sequence of non-negative real numbers. Then the following are equivalent:

1. The Cesàro mean of (an)n∈N is 0:

lim_{N→∞} (1/N) Σ_{n=1}^{N} an = 0

2. There exists some set Z ⊂ N of zero density such that

lim_{n→∞, n∉Z} an = 0
Before presenting the proof of Theorem 2.1 we note the following corollary, which is an immediate consequence. This corollary will be repeatedly exploited in the subsequent discourse.
Corollary 2.2. Let (an)n∈N ∈ l∞(R≥0) be a bounded sequence of non-negative real numbers. Then

lim_{N→∞} (1/N) Σ_{n=1}^{N} an = 0    if and only if    lim_{N→∞} (1/N) Σ_{n=1}^{N} an² = 0
We now turn to the proof of Theorem 2.1.
Proof (of Theorem 2.1). We first assume 1. Define for each k ∈ N≥1 the set

Zk := {j ∈ N : aj ≥ 1/k}

Note that the Zk form an increasing chain Z1 ⊆ Z2 ⊆ . . . . Furthermore, fixing k ∈ N,

|Zk ∩ [1, N]| / N ≤ (k/N) Σ_{n∈Zk∩[1,N]} an ≤ (k/N) Σ_{n=1}^{N} an
Thus, by taking the limit as N → ∞, each Zk is a set of zero density. Moreover, there exists an increasing sequence 1 < N1 < N2 < . . . such that

|Zk ∩ [1, N]| / N < 1/k    for all N ≥ Nk and k ∈ N≥1

Now define

Z = ⋃_{k=1}^{∞} Zk ∩ [Nk−1, Nk)

Note that if N ∈ {Nk0−1, . . . , Nk0 − 1} for some k0 ∈ N then, as the Zk form an increasing chain, Z ∩ [1, N] ⊆ Zk0 ∩ [1, N]. Thus

|Z ∩ [1, N]| / N ≤ |Zk0 ∩ [1, N]| / N < 1/k0

By taking limits we conclude Z is a set of zero density. Furthermore, if n ∈ N \ Z and n ≥ Nk then by the definition of Z we must have n ∉ Zk and so an < 1/k. Hence it is easy to see that

lim_{n→∞, n∉Z} an = 0

as required.
Conversely, suppose 2 holds. Then for any ε > 0 there exists N1 ∈ N such that

an < ε/3    for all n ≥ N1 with n ∉ Z

Let M > 0 be an upper bound on the sequence (an)n∈N. As Z is a set of zero density there exists N2 ∈ N such that

|Z ∩ [1, N]| / N < ε/(3M)    for all N ≥ N2

Finally, let N0 > max{N1, N2} be such that M(N1 − 1)/N0 < ε/3. Then for all N ≥ N0 we have

(1/N) Σ_{n=1}^{N} an = (1/N) [ Σ_{n=1}^{N1−1} an + Σ_{n=N1, n∈Z}^{N} an + Σ_{n=N1, n∉Z}^{N} an ]
≤ M(N1 − 1)/N + M|Z ∩ [1, N]|/N + (1/N) Σ_{n=N1, n∉Z}^{N} an < ε

and we are done.
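Theorem 2.1 can be illustrated numerically. The sketch below (ad hoc, not part of the original essay) takes an = 1 on the perfect squares, which form a zero-density set, and an = 1/n elsewhere; condition 2 of the theorem holds, and the Cesàro means are correspondingly small.

```python
# Illustration of Theorem 2.1 (ad hoc names): a_n = 1 on the zero-density
# set of perfect squares, a_n = 1/n otherwise; the Cesàro mean tends to 0.
from math import isqrt

def a(n):
    return 1.0 if isqrt(n) ** 2 == n else 1.0 / n

N = 200_000
cesaro = sum(a(n) for n in range(1, N + 1)) / N
print(cesaro)  # small, tending to 0 as N grows
```

The value is roughly (√N + log N)/N, so the ‘spikes’ on the squares are indeed averaged out.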
Relating combinatorial concepts (such as asymptotic or upper Banach density) to notions of
averaging (such as Cesàro summation or measure) is a major theme of this study and we will
return to it in Section 4 when we discuss the connections between ergodic theory and Szemerédi’s
theorem. Before then, however, we take some time to introduce the fundamentals of ergodic
theory.
3 Elementary Ergodic Theory
This section acts as a brief introduction to ergodic theory. It is stressed that this survey
is far from comprehensive2 ; indeed it was the author’s intention to minimise the number of
prerequisites needed in proving Szemerédi’s Theorem and for the most part we focus only on
introducing these prerequisites. The reader is encouraged to consult other sources3 for a broader
introduction to this field.
² For instance, one conspicuous omission is Birkhoff’s Pointwise Ergodic Theorem.
³ For example [5, 13, 17, 19, 21].
3.1 Basic Definitions
Ergodic theory belongs to the broader discipline of the qualitative study of dynamical systems.
A dynamical system essentially consists of a ‘state space’ X and a transformation T : X → X
and one wishes to determine to some extent the behaviour of X under iterations of T . In the
case of ergodic theory the state space and transformation are endowed with measure theoretic
structure; this is natural in the context of analysing the long-term average behaviour of systems
[19]. The fundamental objects of study are measure preserving systems.
Definition 4. Let (X, B, µ) be a measure space. Suppose T : X → X is measurable and for
all A ∈ B we have µ(T −1 A) = µ(A). Then we say T is a measure preserving transformation
on (X, B, µ). Furthermore, the quadruple (X, B, µ, T ) consisting of the space and transformation is
called a measure preserving system.
We say (X, B, µ, T ) is invertible if the transformation T is invertible with measurable inverse.
Remark 1. In some instances it is convenient to change the perspective somewhat: given a
measurable space (X, B) and measurable function T : X → X we say a measure µ on (X, B)
is T -invariant if T is a measure preserving transformation on (X, B, µ).
The study of these objects arose from attempts to model certain physical systems in applied mathematics, particularly statistical mechanics (see, for instance [3]). Since then a rich
and varied theory of measure preserving systems has been developed with applications in pure
mathematics, not least combinatorial number theory and Furstenberg’s proof of Szemerédi’s
Theorem.
3.1.1 Examples of Measure Preserving Systems
This discussion will only consider a few simple examples of measure preserving systems. Although practical in the context of this study, our examples do not reflect the true diversity of
potential systems and are not representative of ergodic theory as a whole⁴.
Example 2 (Rotations of the Circle). Consider the circle T := R/Z with the usual Lebesgue measure. Then for any τ ∈ R the rotation map

Rτ : T → T,    Rτ : x ↦ x + τ

is clearly measurable with measurable inverse R−τ. We claim this map is measure preserving. Indeed, by a simple application of Dynkin’s Lemma, as Λ = {A ∈ B : µ(Rτ^{−1}(A)) = µ(A)} is a d-system⁵, we only need check µ(Rτ^{−1}[a, b)) = µ([a, b)) for all intervals [a, b) ⊆ T. But this is obvious.
In studying a measure preserving system (X, B, µ, T ) we wish to determine qualitatively
how the space (X, B, µ) behaves under repeated applications of the transformation T . If we
wish to investigate the behaviour of a single point x ∈ X we define its orbit {T n x : n ∈ N} and
study this set. Even in the simple case of circle rotation we can make non-trivial conclusions
relating to the qualitative properties of the dynamics.
The following result is adapted from ([21], Chapter 3).
Proposition 3.1.
1. If τ is rational then the orbit of any point x ∈ T under Rτ is a finite subset of T.
⁴ There is an abundance of examples of measure preserving systems to be found within the literature. For example, see [19], which gives a much better idea of the scope of ergodic theory.
⁵ Recall, Λ is a d-system on T if:
1. T ∈ Λ
2. For all A, B ∈ Λ such that A ⊆ B we have B \ A ∈ Λ
3. If (An)n∈N ⊆ Λ is an increasing sequence of sets, that is A1 ⊆ A2 ⊆ . . . , then ⋃_{n∈N} An ∈ Λ
All these properties are easily verified in this case.
2. If τ is irrational then the orbit of any point x ∈ T under Rτ is a dense subset of T.
Proof. Let x ∈ T be given and suppose τ = p/q is rational. Then Rτ^q is the identity map on T and it follows immediately that the orbit {Rτ^n x : n ∈ N} is finite.
Now consider the case τ is irrational. First we show that the points Rτ^n x and Rτ^m x are distinct for n ≠ m. For suppose m, n ∈ N are such that Rτ^m x = Rτ^n x. Then Rτ^{n−m} x = x and it follows that x + (n − m)τ = x mod 1, so (n − m)τ = k for some k ∈ Z. Hence, as τ is irrational we must have n = m.
Let d : T × T → [0, ∞) denote the metric

d(x, y) = min{|x − y|, 1 − |x − y|}

It is easy to verify that d is an Rτ-invariant metric and the quotient topology on T induced from R is precisely the metric topology on T given by d. Furthermore, if d(x, y) < 1/2 then d(x, y) = |x − y|.
Let 0 < ε < 1/2 be given. By the Bolzano–Weierstrass property the sequence (Rτ^n x)n∈N must have some convergent subsequence. In particular, there must be some l, k ∈ N such that |Rτ^l x − Rτ^k x| = d(Rτ^{l−k} x, x) < ε. Consider the set

{Rτ^{(l−k)n} x : n ∈ N}

We claim this set is ε-dense in T. Indeed, for consecutive points Rτ^{(l−k)n} x and Rτ^{(l−k)(n+1)} x we have

0 < d(Rτ^{(l−k)n} x, Rτ^{(l−k)(n+1)} x) = d(Rτ^{l−k} x, x) < ε

and the claim follows immediately. As 0 < ε < 1/2 was arbitrary it follows {Rτ^n x : n ∈ N} is a dense subset of T.
Example 3 (Bernoulli Shifts). The reader is assumed to have some familiarity with this archetypal probability space and thus we proceed rapidly. For further details see [5]. Consider the finite probability space on [n] = {1, . . . , n} where the elements are given probabilities p1, . . . , pn respectively, where each pi ≥ 0 with Σ_{i=1}^{n} pi = 1. Explicitly, we define a probability measure µ0 on the set [n] by

µ0({i}) = pi    for all i = 1, . . . , n

Consider the set X = [n]^Z of two-sided sequences in [n],

X := {(ωl)l∈Z : ωl ∈ [n] for all l ∈ Z}

Let B be the σ-algebra generated by the projections πi : ω ↦ ωi for each i ∈ Z and µ the product measure µ = ∏_Z µ0. Then (X, B, µ) is a probability space.
Define a cylinder set to be any subset of X containing precisely the sequences which agree on a finite number of fixed entries. Explicitly, A ∈ B is a cylinder set if there exists a finite co-ordinate set I ⊂ Z, and a : I → [n] such that

A = {(ωl)l∈Z ∈ X : ωi = a(i) for all i ∈ I}

Note that the measure of A is given by

µ(A) = ∏_{i∈I} p_{a(i)}

If we denote by G the collection of all cylinder sets, it is not difficult to deduce that G is an algebra⁶ and the σ-algebra B is generated by G.
⁶ Recall, a set system G ⊆ P(X) is an algebra if:
1. X ∈ G
2. If A ∈ G then Aᶜ ∈ G
3. If A, B ∈ G then A ∪ B ∈ G
There is a simple and natural measure preserving transformation on (X, B, µ). Define the left shift transformation T : X → X by

T : (ωl)l∈Z ↦ (ωl+1)l∈Z

We claim T is measurable and measure preserving. As in the previous example it suffices only to check that T^{−1}(A) ∈ B and µ(T^{−1}A) = µ(A) for all A ∈ G, and both these conditions are obvious.
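That the left shift preserves the measure of cylinder sets can be seen by simulation. The Python sketch below (ad hoc names, not part of the original essay) samples i.i.d. coordinates of a µ-random sequence and estimates the measures of the cylinder set A = {ω : ω_0 = 1} and of its preimage T^{−1}A = {ω : ω_1 = 1}; both estimates approximate p_1.

```python
# Sketch (ad hoc): Monte Carlo check that the left shift preserves the
# Bernoulli measure of a cylinder set. A = {ω : ω_0 = 1}, so
# T^{-1}A = {ω : ω_1 = 1}; both have measure p_1.
import random

random.seed(1)
p = [0.2, 0.5, 0.3]       # probabilities p_1, p_2, p_3 on [3] = {1, 2, 3}
symbols = [1, 2, 3]

trials = 50_000
count_A = 0
count_preA = 0
for _ in range(trials):
    w0, w1 = random.choices(symbols, weights=p, k=2)  # coordinates ω_0, ω_1
    count_A += (w0 == 1)     # ω ∈ A
    count_preA += (w1 == 1)  # ω ∈ T^{-1}A
print(count_A / trials, count_preA / trials)  # both near p_1 = 0.2
```

Since the coordinates are i.i.d. with law µ0, both frequencies converge to p_1, in agreement with µ(T^{−1}A) = µ(A).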
3.1.2 Measure Preserving Transformations and L2
Throughout this section, unless otherwise stated, we assume (X, B, µ, T ) is an arbitrary measure
preserving system. It is often fruitful to study (X, B, µ, T ) by considering the associated Hilbert
Space L2 (X, B, µ), exploiting its additional linear structure. We will briefly discuss some basic
properties of T in relation to L2 (X, B, µ).
Definition 5. The measure preserving transformation T induces a linear map

UT : L2(X, B, µ) → L2(X, B, µ),    UT : f ↦ f ◦ T

called the associated operator [5]. We will adopt the convention T f = UT f and hence, depending on the context, T denotes a map on either X or L2(X, B, µ).
Proposition 3.2. Let f ∈ L1(X, B, µ). Then

∫ T f dµ = ∫ f dµ
Proof. As T is measure preserving the proposition holds for all characteristic functions. Explicitly, if A ∈ B then

∫ T 1_A dµ = ∫ 1_{T^{−1}A} dµ = µ(T^{−1}A) = µ(A) = ∫ 1_A dµ

Thus, by the linearity of the integral the proposition holds for all simple functions φ ∈ L1(X, B, µ). For any non-negative g ∈ L1(X, B, µ) there exists an increasing sequence of simple functions (φn)n∈N tending pointwise to g. Note (T φn)n∈N is an increasing sequence of simple functions tending pointwise to T g. By monotone convergence

∫ T g dµ = lim_{n→∞} ∫ T φn dµ = lim_{n→∞} ∫ φn dµ = ∫ g dµ

The general case f ∈ L1(X, B, µ) is then obtained by writing f = f₊ − f₋ where f₊, f₋ ∈ L1(X, B, µ) are non-negative.
An immediate implication of Proposition 3.2 is that the associated operator T : L2(X, B, µ) → L2(X, B, µ) is an isometry. Furthermore, if (X, B, µ, T ) is an invertible system then T is unitary.
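Proposition 3.2 can be checked numerically for the circle rotation of Example 2. The sketch below (ad hoc, not part of the original essay) approximates ∫ f dµ and ∫ f∘Rτ dµ on T by Riemann sums over a uniform grid; the two values agree up to discretisation error.

```python
# Sketch: for the rotation R_τ of Example 2, ∫ f∘R_τ dµ = ∫ f dµ.
# Both integrals are approximated by midpoint Riemann sums (ad hoc names).
import math

tau = math.sqrt(3) % 1

def f(x):
    # an arbitrary bounded measurable test function on T = [0, 1)
    return math.sin(2 * math.pi * x) ** 2 + x

N = 100_000
grid = [(i + 0.5) / N for i in range(N)]
int_f = sum(f(x) for x in grid) / N
int_fT = sum(f((x + tau) % 1) for x in grid) / N
print(abs(int_f - int_fT))  # ≈ 0
```

The shifted grid {(x + τ) mod 1} is again equispaced with the same spacing, so the two Riemann sums differ only by O(1/N).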
3.2 Poincaré Recurrence
Much of this essay is concerned with notions of ‘recurrence’ in measure preserving systems and
here we introduce the simplest of these: that of Poincaré Recurrence. Recall, given a system
(X, B, µ, T ) and x ∈ X we define the orbit of x ∈ X as the set {T n x : n ∈ N}. Assuming the
reader has some familiarity with the basic premise of the study of dynamical systems, it is
natural (or at least convenient for discursive purposes) to imagine x as the ‘initial state’ of a
system and the T n x as describing the evolution of the system over discrete units of time. An
obvious question is whether an orbit will return to its initial state (or something close to it) at
some point in the future. That is, given some ‘neighbourhood’ A of x does there exist some
n ∈ N such that T n x ∈ A? In the measure-theoretic context it is natural that our neighbourhood
should be a measurable set with µ(A) > 0. The Poincaré Recurrence Theorem states that for
any finite⁷ measure preserving system (X, B, µ, T ), given a measurable set A ∈ B the orbits
⁷ We say the system (X, B, µ, T ) is finite if µ(X) < ∞.
of almost all the points x ∈ A return to A infinitely many times. The statement and proof of
this theorem is adapted from ([5] Chapter 2; [19] Chapter 2).
Theorem 3.3 (Poincaré Recurrence Theorem). Let (X, B, µ, T ) be a finite measure preserving
system. For any A ∈ B, for almost every x ∈ A there exists a strictly increasing sequence
(nk )k∈N ⊆ N such that T nk x ∈ A for all k ∈ N.
Proof. Consider the set E := {x ∈ A : T^n x ∉ A for all n ∈ N≥1}. Alternatively,

E = ⋂_{n=1}^{∞} (A \ T^{−n}A)
Clearly E is measurable. It is easy to see the sets E, T^{−1}E, T^{−2}E, . . . are pairwise disjoint and, as T is measure preserving, they all have measure µ(E). Note that

Σ_{n=1}^{∞} µ(E) = µ(⋃_{n=1}^{∞} T^{−n}E) ≤ µ(X) < ∞
Hence µ(E) = 0 and if we define F1 = A\E then µ(F1 ) = µ(A) and for all x ∈ F1 there exists an
n ∈ N≥1 such that T n x ∈ A. Note that T 2 , T 3 , . . . define measure preserving transformations on
(X, B, µ) and we may repeat the above construction to give F2 , F3 , · · · ⊆ A each with measure
µ(A) and every point in Fk returning to A under the transformation T k . Finally, define
F := ⋂_{k=1}^{∞} Fk
then µ(F ) = µ(A) and every x ∈ F returns to A under T infinitely many times.
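The conclusion of the Poincaré Recurrence Theorem is easy to observe for a concrete system. The sketch below (ad hoc, not part of the original essay) follows one orbit of the measure preserving rotation by √2 on the circle and records its many returns to the set A = [0, 0.1).

```python
# Sketch (ad hoc names): recurrence under the circle rotation by sqrt(2).
# A point of A = [0, 0.1) returns to A again and again along its orbit.
import math

tau = math.sqrt(2) % 1
x = 0.05  # a point of A = [0, 0.1)
returns = [n for n in range(1, 2000) if (x + n * tau) % 1 < 0.1]
print(len(returns) > 10)  # many returns: True
```

By equidistribution the orbit visits A with frequency roughly µ(A) = 0.1, so around 200 of the first 2000 iterates land in A.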
We now consider the orbit of an entire measurable set A with µ(A) > 0 rather than just
individual points.
Corollary 3.4. Let (X, B, µ, T ) be a finite measure preserving system and A ∈ B with µ(A) > 0. Then there exists some n ∈ N such that

µ(T^{−n}A ∩ A) > 0    (3.1)

Proof. For each j ∈ N≥1 consider the set T^{−j}A ∩ A = {x ∈ A : T^j x ∈ A}. Then by the Poincaré Recurrence Theorem

µ(⋃_{j=1}^{∞} T^{−j}A ∩ A) = µ(A) > 0

and so there must exist some n ∈ N such that µ(T^{−n}A ∩ A) > 0 as required.
The main body of this essay is dedicated to proving a remarkable generalisation of the version
of the Poincaré Recurrence Theorem stated in Corollary 3.4. Later we will investigate a much
stronger notion of recurrence than that described in (3.1) known as multiple recurrence. The
analogous result to Corollary 3.4 for multiple recurrence is precisely what is required in order
to prove Szemerédi’s Theorem. This connection will be explained in detail in Section 4.
Remark 2. It is clear from the proof of the Poincaré Recurrence Theorem that the measure
space in question must be finite⁸. As we will be considering a generalisation of this result, we
will henceforth assume (tacitly) all measure preserving systems are finite, and in particular are
probability systems.
⁸ It is not difficult to conceive of counter-examples in an infinite setting; for instance, consider the shift map x ↦ x + 1 on R with Lebesgue measure.
3.2.1 Ergodicity
A key notion in the study of measure preserving systems is that of ergodicity.
Definition 6. Let (X, B, µ, T ) be a measure preserving system. A set A ∈ B is T -invariant if
T −1 A = A. The collection of all T -invariant sets forms a σ-algebra, which we denote ET .
Definition 7. A measure preserving transformation T on (X, B, µ) is ergodic if the σ-algebra
ET contains only null and conull sets.
Remark 3. In some instances it is convenient to change the perspective: if (X, B) is a measurable space and T : X → X a measurable function we say that a T -invariant measure µ is an
ergodic measure for T if T is ergodic as a measure preserving transformation on (X, B, µ).
We characterise ergodicity in two (equivalent) ways:
• Immediately from the definition we deduce that ergodicity is an indecomposability condition. If A ∈ B is a T-invariant set with 0 < µ(A) < 1, then so too is B = Aᶜ. For any x ∈ A (resp. B) the orbit of x never leaves A (resp. B) and hence, for all intents and purposes, A and B may be considered⁹ separate dynamical systems. When T is ergodic every T-invariant set has either measure 0 or measure 1: it is only possible to decompose the system into a portion which is essentially the whole system and a portion which is negligible.
• Ergodicity is precisely the condition that for almost all x ∈ X the orbit {T^n x : n ∈ N} visits almost all of the space. This will be explained in detail below (Proposition 3.6). In
particular, we will show that T is an ergodic transformation if and only if for all A, B ∈ B
with µ(A), µ(B) > 0 there exists some n ∈ N such that
µ(T −n A ∩ B) > 0
Note that this condition is much stronger than the similar-looking statement of Poincaré
Recurrence, given in (3.1).
In order to explain these characterisations fully, first we note that the definition of ergodicity
is equivalent to a similar statement concerning ‘almost invariant’ rather than invariant sets. The
proof of this lemma is adapted from ([5], Chapter 2).
Lemma 3.5. A system (X, B, µ, T ) is ergodic if and only if for all A ∈ B such that µ(A △ T^{−1}A) = 0 either µ(A) = 0 or µ(A) = 1.
Proof. If the condition holds then it is obvious that the system is ergodic. For the converse, suppose (X, B, µ, T ) is ergodic and A ∈ B satisfies µ(A △ T^{−1}A) = 0. It suffices to show that there exists a T-invariant set B such that µ(B) = µ(A). Define B as the measurable set

B = ⋂_{N=0}^{∞} ⋃_{n=N}^{∞} T^{−n}A

First we show that B is T-invariant. Let B_N = ⋃_{n=N}^{∞} T^{−n}A. Then B₀ ⊇ B₁ ⊇ B₂ ⊇ . . . and so ⋂_{N=0}^{∞} B_N = ⋂_{N=k}^{∞} B_N for all k ∈ N. In particular,

T^{−1}B = ⋂_{N=0}^{∞} ⋃_{n=N}^{∞} T^{−(n+1)}A = ⋂_{N=1}^{∞} ⋃_{n=N}^{∞} T^{−n}A = B

Hence B is T-invariant. To see that µ(B) = µ(A) note that for all N ∈ N,

µ(A △ B_N) ≤ µ(⋃_{n=N}^{∞} A △ T^{−n}A) ≤ Σ_{n=N}^{∞} µ(A △ T^{−n}A) = 0

where we have used the fact that µ(A △ T^{−n}A) = 0 for all n ∈ N, which follows immediately from

A △ T^{−n}A ⊆ ⋃_{j=0}^{n−1} T^{−j}A △ T^{−(j+1)}A

As µ(A △ B_N) = 0 for all N ∈ N and B = ⋂_{N=0}^{∞} B_N, it follows µ(A △ B) = 0 and so µ(A) = µ(B) as required.

⁹ By restricting B, µ and T to these sets.
We are now in a position to elucidate precisely the second characterisation of ergodicity.
Proposition 3.6. Let (X, B, µ, T ) be a measure preserving system. The following are equivalent:
1. The system is ergodic.
2. For any A ∈ B with µ(A) > 0 we have

µ(⋃_{n=0}^{∞} T^{−n}A) = 1    (3.2)

3. For any A, B ∈ B with µ(A), µ(B) > 0 there exists some n₀ ∈ N such that

µ(T^{−n₀}A ∩ B) > 0    (3.3)
Remark 4. The behaviour of the orbits of an ergodic system can be understood thus: (3.2)
tells us that for any given set A of positive measure, for almost every x ∈ X the orbit of x will
at some point pass through A. From this it is easy to deduce (3.3): given another set B of
positive measure, at some point the orbits of a significant proportion of the elements of B will
pass through A. It is in this sense that ergodicity is the condition that ‘almost all of the orbits
visit almost all of the space’.
Proof (of Proposition 3.6). (1 ⇒ 2). Let A ∈ B with µ(A) > 0 and consider the set

B = ⋃_{j=0}^{∞} T^{−j}A

Then T^{−1}B ⊆ B and µ(T^{−1}B) = µ(B) so we must have µ(B △ T^{−1}B) = 0. By Lemma 3.5 and noting A ⊆ B it follows µ(B) = 1.

(2 ⇒ 3). Let A, B ∈ B with µ(A), µ(B) > 0 and note

0 < µ(B) = µ(B ∩ ⋃_{j=1}^{∞} T^{−j}A) ≤ Σ_{j=1}^{∞} µ(B ∩ T^{−j}A)

Thus µ(B ∩ T^{−n}A) > 0 for some n ∈ N.

(3 ⇒ 1). Suppose A ∈ B is T-invariant. Then for all j ∈ N we have µ(T^{−j}A ∩ Aᶜ) = 0 and thus either µ(A) = 0 or µ(Aᶜ) = 0 as required.
Finally, we note an easily deduced equivalent definition of ergodicity, described in terms of invariant functions rather than invariant sets.

Definition 8. Let (X, B, µ, T ) be a measure preserving space. We say f ∈ L1(X, B, µ) is T-invariant if f = T f. Explicitly, f is T-invariant if for almost all x ∈ X we have f(x) = f(T x).
Proposition 3.7. Let (X, B, µ, T ) be a measure preserving space. Then T is ergodic if and
only if every T -invariant function is constant.
Proof. Suppose all T -invariant functions are constant. Given an invariant set A ∈ B it is easy
to see that the characteristic function 1A is T -invariant. Hence either 1A (x) = 0 or 1A (x) = 1
almost everywhere and so T is ergodic.
Conversely, suppose T is ergodic and f ∈ L1 (X, B, µ) is T -invariant. Let a ∈ R and define
the set A(a) := {x ∈ X : f (x) < a}. Then A(a) is measurable and
T −1 A(a) = {x ∈ X : T f (x) < a} = {x ∈ X : f (x) < a} = A(a)
For each a ∈ R the set A(a) is T -invariant and so by ergodicity µ(A(a)) = 0 or µ(A(a)) = 1.
Taking
a0 = sup{a ∈ R : µ(A(a)) = 0}
it is easy to see f (x) = a0 for almost all x ∈ X.
3.2.2 Examples of Ergodicity
In order to prove the ergodicity of certain systems we will utilise the following approximation
argument.
Lemma 3.8. Let (X, B, µ) be a probability space where B = σ(G) is generated by an algebra G. Then for any ε > 0 and any A ∈ B there exists an A1 ∈ G such that

µ(A △ A1) < ε    (3.4)

In particular, we may take A1 ∈ G to be such that A ⊆ A1 and A1 = ⋃_{i=1}^{n} Bi for some B1, . . . , Bn ∈ G disjoint.
Proof. The proof relies on the notion of outer measure, as usually defined in the proof of Carathéodory’s Extension Theorem (see [14] Chapter 1; [4] Chapter 4). In particular, for A ∈ B define

U_A := {F ⊆ G : F is countable; A ⊆ ⋃_{F∈F} F}

Then the outer measure µ∗(A) of A is the value

µ∗(A) := inf_{F∈U_A} Σ_{F∈F} µ(F)

It is well known that the outer measure and measure of A ∈ B coincide. Hence given ε > 0 there exists a sequence (Bi)i∈N ⊆ G covering A such that

µ(A) ≥ Σ_{i=1}^{∞} µ(Bi) − ε/2    (3.5)

As the left hand side of this inequality is finite, the sum on the right hand side must converge. Let n ∈ N be such that

Σ_{i=n+1}^{∞} µ(Bi) < ε/2    (3.6)

and define

A1 = ⋃_{i=1}^{n} Bi ∈ G,    A2 = ⋃_{i=n+1}^{∞} Bi ∈ B

Note that without loss of generality we may assume the B1, . . . , Bn are disjoint. To see the choice of A1 satisfies (3.4), note that

A △ A1 ⊆ (A1 \ A) ∪ (A \ (A1 ∪ A2)) ∪ A2 ⊆ ((A1 ∪ A2) △ A) ∪ A2

Hence, using the bounds stated in (3.5) and (3.6) we have

µ(A △ A1) ≤ µ((A1 ∪ A2) △ A) + µ(A2) < ε

as required.
We now apply Lemma 3.8 to our two examples of measure preserving systems to see they
are both ergodic. The proof of the next proposition is adapted from ([21], Chapter 3).
Proposition 3.9. Irrational rotations of the circle (c.f. Example 2) are ergodic.
Proof. We use the characterisation (3) of ergodicity from Proposition 3.6. Let A, B ⊆ T be
measurable subsets with µ(A), µ(B) > 0.
Claim. There exist intervals I = [a, b), J = [c, d) ⊆ T such that

µ(A ∩ I) > (3/4) µ(I),    µ(B ∩ J) > (3/4) µ(J)    (3.7)
The collection of finite unions of (disjoint) intervals of the form [x, y) ⊆ T is an algebra which we denote G. Note that G generates the Borel σ-algebra on T. Thus, by Lemma 3.8 there exists a sequence of disjoint intervals I1, . . . , IN such that, denoting A1 = ⋃_{i=1}^{N} Ii, we have

A ⊆ A1    and    µ(A1 \ A) < (1/7) µ(A)

Now,

µ(A ∩ A1) = µ(A1) − µ(A1 \ A) > (6/7) µ(A)    (3.8)

On the other hand,

µ(A1) = µ(A1 \ A) + µ(A) < (8/7) µ(A)    (3.9)

We claim that for some i0 ∈ {1, . . . , N} we have µ(A ∩ I_{i0}) > (3/4) µ(I_{i0}). Once this is shown, by setting I = I_{i0} and repeating the argument for the set B to obtain J, the claim is proven. Suppose not; then for all i ∈ {1, . . . , N} we have

µ(A ∩ Ii) ≤ (3/4) µ(Ii)

Thus, by applying (3.9) we have µ(A ∩ A1) ≤ (3/4) µ(A1) < (6/7) µ(A). But this contradicts (3.8).
Now we have the claim, the result follows easily. Let I = [a, b) and J = [c, d) satisfy (3.7). Without loss of generality µ(I) ≥ µ(J). Note that we may also assume (1/2) µ(I) ≤ µ(J). Indeed, otherwise we can partition I into two intervals I1 = [a, (a + b)/2) and I2 = [(a + b)/2, b). At least one half Ii must be such that µ(Ii ∩ A) > (3/4) µ(Ii) and so we rename the interval Ii as I and repeat the halving process until (1/2) µ(I) ≤ µ(J). By Proposition 3.1, the orbit of d ∈ T is dense in T and so there exists some n ∈ N such that

(b + a)/2 < T^n d < b

It follows that µ(T^{−n}I ∩ J) ≥ (1/2) µ(I). Finally

µ(T^{−n}A ∩ B) ≥ µ(T^{−n}I ∩ J) − µ(I \ A) − µ(J \ B) > (1/2) µ(I) − (1/4) µ(I) − (1/4) µ(J) > 0

as required.
The proof of the next proposition is adapted from ([5], Chapter 2).
Proposition 3.10. Bernoulli Shifts (c.f. Example 3) are ergodic.
Proof. Let (X, B, µ, T ) denote the Bernoulli system where X = [n]^Z. As in the previous proposition, we show that the characterisation (3) of ergodicity from Proposition 3.6 holds for this system. Moreover, we show the stronger property that for all A, B ∈ B we have

lim_{n→∞} |µ(T^{−n}A ∩ B) − µ(A)µ(B)| = 0    (3.10)

Thus, considering the case µ(A), µ(B) > 0, ergodicity follows directly.

Let A, B ∈ B and ε > 0 be given. By Lemma 3.8 there exist cylinder sets A1, B1 such that A ⊆ A1, B ⊆ B1 and

µ(A1 △ A) < ε/4    and    µ(B1 △ B) < ε/4
Let IA1 , IB1 ⊂ Z be the finite co-ordinate sets for A1 and B1 respectively and a : IA1 → [n] and
b : IB1 → [n] be such that
A1 =
(ωl )l∈Z ∈ X : ωi = a(i) for all i ∈ IA1
B1 =
(ωl )l∈Z ∈ X : ωi = b(j) for all j ∈ IB1
Choose any n > max_{i∈I_{A1}} |i| + max_{j∈I_{B1}} |j| so that the co-ordinate sets I_{T^{-n}A1} = {i − n : i ∈ I_{A1}} and I_{B1} are disjoint. Then

    T^{-n} A1 = {(ω_l)_{l∈Z} ∈ X : ω_i = a(i + n) for all i ∈ I_{T^{-n}A1}}

is a cylinder set and it follows that the sets T^{-n} A1 and B1 are independent. Thus

    µ(T^{-n} A1 ∩ B1) = µ(A1)µ(B1)    (3.11)

Also note that

    µ(T^{-n} A ∩ B) ≤ µ(T^{-n} A1 ∩ B1) ≤ µ(T^{-n} A ∩ B) + µ(A1 \ A) + µ(B1 \ B)    (3.12)
Hence, by applying the triangle inequality and (3.11) and (3.12) we have

    |µ(T^{-n} A ∩ B) − µ(A)µ(B)| ≤ |µ(T^{-n} A1 ∩ B1) − µ(T^{-n} A ∩ B)| + µ(A1)|µ(B1) − µ(B)| + |µ(A1) − µ(A)|µ(B)
                                 ≤ 2µ(A1 \ A) + 2µ(B1 \ B) < ε

as required.
Remark 5. The property of the Bernoulli system described in (3.10) is known as mixing. This
is an important concept and later in this discourse we will spend some time investigating the
notion of mixing in its own right.
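The mixing property (3.10) is easy to probe numerically. The following Python sketch (not part of the original argument; the particular cylinder sets A and B are an arbitrary, illustrative choice) estimates µ(T^{-n}A ∩ B) for the Bernoulli(1/2, 1/2) shift by Monte Carlo sampling. Once n exceeds the diameter of the co-ordinate sets, the shifted cylinders are exactly independent.

```python
import random

random.seed(0)

def estimate(n, trials=100_000):
    # Monte Carlo estimate of mu(T^{-n}A ∩ B) for the Bernoulli(1/2, 1/2) shift,
    # with the illustrative cylinder sets
    #   A = {w : w_0 = 1}  and  B = {w : w_0 = 1, w_1 = 0},
    # so T^{-n}A = {w : w_n = 1} and only co-ordinates 0..n matter.
    hits = 0
    for _ in range(trials):
        w = [random.randint(0, 1) for _ in range(n + 1)]
        if w[n] == 1 and w[0] == 1 and w[1] == 0:
            hits += 1
    return hits / trials

# mu(A)mu(B) = (1/2)(1/4) = 0.125; for n >= 2 the co-ordinate sets are disjoint,
# so the cylinders are exactly independent (n = 1 forces w_1 = 1 and w_1 = 0, giving 0)
for n in [1, 2, 5, 10]:
    print(n, round(estimate(n), 3))
```

The n = 1 case illustrates that mixing is an asymptotic statement: for small n the shifted cylinders may be far from independent.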
3.3 The Mean Ergodic Theorem
For our purposes, the Mean Ergodic Theorem will provide an invaluable tool in the study of
measure preserving systems: in particular it gives a useful equivalent definition of ergodicity
(see Corollaries 3.12 and 3.13) and also serves as a partial motivation for the study of mixing
in Section 6. Having said that, the result is certainly of interest in its own right and it would
be negligent to omit some explanation of the intuition behind it.
Given a (finite) measure preserving system (X, B, µ, T ) and function f ∈ L2 (X, B, µ) there
are two canonical definitions of the average value of f . The first is the ‘space’ average f¯, whereby
we take the average value of f over all possible states:
    f̄ := (1/µ(X)) ∫ f dµ    (3.13)
Of course, if (X, B, µ) is a probability space this is merely the expected value of f . The second
notion of averaging is the ‘time’ average. The time average (provided it exists) is a function
fˆ ∈ L1 (X, B, µ) where fˆ(x) is defined almost everywhere as the average value of f over the
orbit {T^n x : n ∈ N}:

    f̂ := lim_{N→∞} (1/N) Σ_{n=0}^{N-1} T^n f    (3.14)
An important question originally motivated by statistical mechanics [21] which we will now
partially examine asks under what conditions the two averages agree. Perhaps it is reasonable
to hypothesise that if the orbits of the system penetrated almost every facet of the state space, as
in the case of T ergodic, then (3.13) and (3.14) are equal. The classical Ergodic Theorems answer
questions relating to the existence of (3.14) and detail precisely the relationship between the
values (3.13) and (3.14). As the name suggests, the Mean Ergodic Theorem shows convergence of (1/N) Σ_{n=0}^{N-1} T^n f in the L²(X, B, µ) sense. In particular, if the system is ergodic we have

    lim_{N→∞} ‖(1/N) Σ_{n=0}^{N-1} T^n f − f̄‖_{L²(µ)} = 0

providing an answer to our question.
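The agreement of the two averages for an ergodic system can be observed numerically. The sketch below (an illustration only; the arc [0, 0.3) and the rotation number √2 − 1 are arbitrary choices) computes the Birkhoff time average for an irrational circle rotation and compares it to the space average.

```python
import math

def birkhoff_average(f, x, tau, N):
    # time average (1/N) * sum_{n=0}^{N-1} f(T^n x) for the rotation T : x -> x + tau (mod 1)
    return sum(f((x + n * tau) % 1.0) for n in range(N)) / N

# f = indicator of the arc [0, 0.3); its space average (3.13) is 0.3
f = lambda t: 1.0 if t < 0.3 else 0.0
tau = math.sqrt(2) - 1   # irrational rotation number, so the system is ergodic

print(birkhoff_average(f, 0.1, tau, 100_000))   # ≈ 0.3, independent of the starting point
```

Changing the starting point x leaves the limit unchanged, as the theorem (in its ergodic form) predicts.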
The statement and proof of von Neumann’s Theorem are adapted from ([17], Chapter 2;
[21], Chapter 5).
Theorem 3.11 (von Neumann's Mean Ergodic Theorem). Suppose (X, B, µ, T) is a measure preserving system on a probability space and let π_I be the orthogonal projection of L²(X, B, µ) onto the closed linear subspace I of T-invariant L²(X, B, µ) functions

    I := {g ∈ L²(X, B, µ) : T g = g}

Then for any fixed f ∈ L²(X, B, µ)

    lim_{N→∞} ‖(1/N) Σ_{n=0}^{N-1} T^n f − π_I(f)‖_{L²(µ)} = 0
Proof. Consider the set J := {T g − g : g ∈ L²(X, B, µ)}. We claim that J^⊥ = I. Indeed, given f ∈ J^⊥ we have

    ⟨f, T g⟩ = ⟨f, g⟩  for all g ∈ L²(X, B, µ)

Hence, T*f = f where T* denotes the adjoint of T. Note that

    ‖f − T f‖²_{L²(µ)} = ‖f‖²_{L²(µ)} − ⟨f, T f⟩ − ⟨T f, f⟩ + ‖T f‖²_{L²(µ)} = 2‖f‖²_{L²(µ)} − ⟨T*f, f⟩ − ⟨f, T*f⟩ = 0

and so f = T f as required. Conversely, if f ∈ I then f = T f and thus

    ⟨f, T g − g⟩ = ⟨T f, T g⟩ − ⟨f, g⟩ = 0  for all g ∈ L²(X, B, µ)

Hence I = J^⊥ as claimed. Therefore, by orthogonal decomposition we have L²(µ) = I ⊕ clos(J) and so if we fix f ∈ L²(µ) we may write

    f − π_I(f) = h  where h ∈ clos(J)
To conclude the proof we show that

    lim_{N→∞} ‖(1/N) Σ_{n=0}^{N-1} T^n h‖_{L²(µ)} = 0
Indeed, given ε > 0, as h ∈ clos(J) there exists h0 ∈ J such that ‖h − h0‖_{L²(µ)} < ε/2. Furthermore, there exists g0 ∈ L²(X, B, µ) such that h0 = T g0 − g0 and we have

    ‖(1/N) Σ_{n=0}^{N-1} T^n (T g0 − g0)‖_{L²(µ)} = (1/N) ‖T^N g0 − g0‖_{L²(µ)} ≤ 2‖g0‖_{L²(µ)} / N
and thus there exists N0 ∈ N such that

    ‖(1/N) Σ_{n=0}^{N-1} T^n h0‖_{L²(µ)} ≤ ε/2  for all N ≥ N0
Finally, it follows that for any N ≥ N0 we have

    ‖(1/N) Σ_{n=0}^{N-1} T^n h‖_{L²(µ)} ≤ (1/N) Σ_{n=0}^{N-1} ‖T^n (h − h0)‖_{L²(µ)} + ‖(1/N) Σ_{n=0}^{N-1} T^n h0‖_{L²(µ)} < ε
Remark 6. Considering the case (X, B, µ, T) is an ergodic system, for any f ∈ L²(X, B, µ) the function π_I(f) as defined above is T-invariant and thus, by Proposition 3.7, constant. It is then easy to deduce π_I(f) = ∫ f dµ, as we will see in the proof of the following corollary.
A corollary to the Mean Ergodic Theorem is the following equivalent definition of ergodicity.
Corollary 3.12. A measure preserving system (X, B, µ, T) is ergodic if and only if for every pair f, g ∈ L²(X, B, µ) the following holds

    lim_{N→∞} |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩| = 0    (3.15)
Proof. Suppose (X, B, µ, T) is ergodic and fix f ∈ L²(X, B, µ). By the Mean Ergodic Theorem

    lim_{N→∞} ‖(1/N) Σ_{n=0}^{N-1} T^n f − π_I(f)‖_{L²(µ)} = 0

where π_I(f) is some T-invariant function. By ergodicity π_I(f) = c1 for some constant c. As norm convergence implies weak convergence, it follows that for all g ∈ L²(X, B, µ)

    lim_{N→∞} |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, g⟩ − c⟨1, g⟩| = 0

In particular, taking g = 1 and noting ⟨T^n f, 1⟩ = ⟨T^n f, T^n 1⟩ = ⟨f, 1⟩ gives c = ⟨f, 1⟩.
Conversely, suppose (3.15) holds and f ∈ L²(X, B, µ) is T-invariant. Then for all g ∈ L²(X, B, µ) we have

    lim_{N→∞} |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩| = 0

On the other hand, as T^n f = f for all n, for all N ∈ N,

    (1/N) Σ_{n=0}^{N-1} ⟨T^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩ = ⟨f − ⟨f, 1⟩1, g⟩

Thus ⟨f − ⟨f, 1⟩1, g⟩ = 0 for all g ∈ L²(X, B, µ) and so by elementary properties of the inner product we must have f ≡ ⟨f, 1⟩1 is constant.
We can weaken the statement of Corollary 3.12.

Corollary 3.13. A measure preserving system (X, B, µ, T) is ergodic if and only if there exists a dense subset H ⊆ L²(X, B, µ) such that for every pair φ, ψ ∈ H the following holds

    lim_{N→∞} |(1/N) Σ_{n=0}^{N-1} ⟨T^n φ, ψ⟩ − ⟨φ, 1⟩⟨1, ψ⟩| = 0    (3.16)
Proof. By Corollary 3.12 it suffices to show that if (3.16) holds for all φ, ψ ∈ H, then it holds for all f, g ∈ L²(X, B, µ).
Let ε > 0 be given and f, g ∈ L²(X, B, µ) with f, g ≠ 0. Then there exist φ, ψ ∈ H such that

    ‖g − ψ‖₂ < ε / (4‖f‖₂)  and  ‖f − φ‖₂ < ε / (8‖ψ‖₂)
Now, for each N ∈ N

    |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, ψ⟩ − ⟨f, 1⟩⟨1, ψ⟩| ≤ |(1/N) Σ_{n=0}^{N-1} ⟨T^n (f − φ), ψ⟩ − ⟨f − φ, 1⟩⟨1, ψ⟩|
                                                   + |(1/N) Σ_{n=0}^{N-1} ⟨T^n φ, ψ⟩ − ⟨φ, 1⟩⟨1, ψ⟩|    (3.17)
Note that as (X, B, µ) is a probability space, |⟨f − φ, 1⟩| ≤ ‖f − φ‖₂. Hence, by the triangle and Cauchy–Schwarz inequalities,

    |⟨T^n (f − φ), ψ⟩ − ⟨f − φ, 1⟩⟨1, ψ⟩| ≤ 2‖f − φ‖₂‖ψ‖₂ < ε/4    (3.18)
By assumption (3.16) holds for φ, ψ ∈ H and thus by combining (3.17) and (3.18) there exists some N0 ∈ N such that

    |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, ψ⟩ − ⟨f, 1⟩⟨1, ψ⟩| < ε/2  for all N > N0    (3.19)
By what amounts to the same procedure, this time estimating

    |⟨T^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩| ≤ |⟨T^n f, g − ψ⟩ − ⟨f, 1⟩⟨1, g − ψ⟩| + |⟨T^n f, ψ⟩ − ⟨f, 1⟩⟨1, ψ⟩|

and then applying (3.19), we conclude

    |(1/N) Σ_{n=0}^{N-1} ⟨T^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩| < ε  for all N > N1

for some N1 ∈ N, as required.
3.4 Borel Probability Spaces
We conclude this lengthy introductory section with a brief discussion of Borel probability spaces
and some of their interesting properties.
Definition 9. Suppose (X, B, µ) is a measure space where X is a compact metric space, B is
the Borel σ-algebra generated by the metric topology on X and µ a (Borel) probability measure.
Then we say (X, B, µ) is a Borel probability space.
Example 4. Note that all our previous examples of measure preserving systems are defined on Borel probability spaces. In particular, the state space X = [n]^Z in the Bernoulli system (X, B, µ, T) is a compact space by Tychonoff's Theorem. Define the δ metric on [n] by

    δ(a, b) = 0 if a = b, 1 otherwise,  for all a, b ∈ [n]

Then let d : X × X → [0, 1] be given by

    d(α, β) = (1/3) Σ_{k∈Z} δ(α_k, β_k) / 2^{|k|}  for all α = (α_l)_{l∈Z}, β = (β_l)_{l∈Z} ∈ X

Then d is a metric on X and it is easy to see that the product topology on X corresponds precisely to the metric topology induced by d.
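A truncated version of this metric is easy to implement. The sketch below (illustrative only; the tail of the sum is cut off at |k| ≤ K, and sequences are represented as functions Z → [n]) evaluates d on some simple points of X.

```python
def d(alpha, beta, K=50):
    # truncation of d(α, β) = (1/3) Σ_{k ∈ Z} δ(α_k, β_k) / 2^{|k|} to |k| <= K;
    # alpha, beta are functions Z -> {0, ..., n-1} standing in for points of [n]^Z
    total = sum(1.0 / 2 ** abs(k) for k in range(-K, K + 1) if alpha(k) != beta(k))
    return total / 3

zero = lambda k: 0                      # the constant-0 sequence
flip0 = lambda k: 1 if k == 0 else 0    # differs from it only at co-ordinate k = 0
ones = lambda k: 1                      # differs from it at every co-ordinate

print(d(zero, flip0))   # = 1/3: only the k = 0 term contributes
print(d(zero, ones))    # ≈ (1/3)(1 + 2·1) = 1, the maximum possible distance
```

Two points are close exactly when they agree on a large central block of co-ordinates, which is why this metric topology coincides with the product topology.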
There are numerous advantages in considering measure preserving systems on Borel probability spaces; they essentially constitute the class of 'well behaved' systems. Here we will highlight
some of their favourable properties. We begin with a basic result from topology which will be
used frequently throughout this study.
Proposition 3.14. Let (X, d) be a compact metric space. Then the normed space (C(X), k·k∞ )
of real-valued continuous functions on X with uniform norm is separable.
Proof. The rather elegant proof described here is adapted from ([5] Appendix B) and uses the Stone–Weierstrass Theorem (see [2]). Any compact metric space is separable so there exists
some dense countable set (xn )n∈N ⊂ X. Define the function fn : X → R by fn (x) = d(xn , x)
for each n ∈ N. Each of these functions is continuous and by the density of (xn )n∈N , the fn
separate points. If F 0 denotes the algebra generated by (fn )n∈N then it follows that F 0 is dense
in C(X) by the Stone-Weierstrass Theorem. Let F denote the algebra generated by (fn )n∈N
over Q (that is F is the smallest set containing (fn )n∈N which is closed under both (finite)
Q-linear combinations and multiplication). Then F is countable and dense in F 0 . Thus F is
a countable dense subset of C(X).
Definition 10. Let X be a compact metric space and B the Borel σ-algebra on X. Denote by
M (X) the set of Borel probability measures on (X, B).
Proposition 3.15. If (X, B, µ) is a Borel probability space then there exists a natural choice
of topology on the set M (X) such that M (X) is a compact, metrizable topological space.
Proof. The details of the proof are sketched; for a more thorough discussion of the properties
of M (X) see [18, 5].
By the Riesz Representation Theorem [18], the dual space C(X)∗ is identified with the space
M of finite signed measures on (X, B). Explicitly, for all θ ∈ C(X)∗ there exists a µθ ∈ M such
that
Z
θ(f ) = f dµθ
for all f ∈ C(X)
Each measure can be viewed as a linear functional and thus it makes sense to define the weak*-topology on M. This is the weakest topology such that for every f ∈ C(X), the linear operator λ ↦ ∫ f dλ evaluating each functional at f is continuous.
The set M (X) of Borel probability measures is then given the subspace topology induced
by the weak*-topology on M. We claim that the topological space M (X) is compact and
metrizable.
The proof that the space is metrizable follows by explicitly defining a metric on M(X). By the previous proposition (C(X), ‖ · ‖∞) is separable and so there exists some dense sequence (f_n)_{n∈N} ⊂ C(X). Define a metric d_M on M(X) by

    d_M(λ, κ) = Σ_{n=1}^{∞} (1/2^n) · |∫ f_n dλ − ∫ f_n dκ| / (1 + |∫ f_n dλ − ∫ f_n dκ|)
The topology induced by the metric d_M then corresponds precisely to the weak*-topology. Here we omit the details (for a full proof see [5], Appendix B).
To demonstrate compactness, recall the famous Alaoglu's Theorem [2], which states that the closed unit ball of the dual of a normed space is weak*-compact. Write

    M(X) = {λ ∈ M : ∫ 1_X dλ = 1, ∫ f dλ ≥ 0 for all f ∈ C(X), f ≥ 0}

It is then easy to see M(X) is a subset of the closed unit ball of M ≅ C(X)*. Furthermore, if (λ_n)_{n∈N} ⊆ M(X) is a sequence of measures which converge to λ ∈ M in the weak*-topology then

    ∫ f dλ_n → ∫ f dλ  for all f ∈ C(X)

Considering the case f ≥ 0 and f = 1_X it follows λ ∈ M(X). Therefore M(X) is a closed subspace of a compact space and so is itself compact.
The topological properties of C(X) and M(X) are of great benefit when studying measure preserving systems defined on Borel probability spaces. For this reason we will for the most part concern ourselves only with the theory of this specific class of systems. Exploiting their additional structure is crucial in our proof of Szemerédi's Theorem. As an example of this, the following subsection describes the powerful Ergodic Decomposition Theorem, which is a direct consequence of the geometric and topological structure of M(X). The Ergodic Decomposition Theorem is important to us as it allows us to relate our study of ergodic systems to more general classes of measure preserving systems. Further, in Section 9 we use the properties of C(X) and M(X) to introduce the notions of conditional expectation and measure. These have a very significant rôle in our study.
3.5 Ergodic Decomposition
So far many of our major results are applicable only to ergodic systems. We shall briefly describe
how the study of ergodic systems can be related to the study of systems in more generality.
As explained above, ergodicity is an indecomposability (or irreducibility) condition. An
obvious question is whether it is possible to decompose an arbitrary system into irreducible
components, akin to decomposing a positive integer into prime factors. A remarkable fact, and
one of the major strengths of ergodic theory, is that for the class of systems defined on Borel
probability space this is indeed possible. What we have described here is the content of the
Ergodic Decomposition Theorem, which we will state (but refrain from proving) in this section.
We present a statement of the Ergodic Decomposition Theorem, adapted from [13].
Theorem 3.16 (Ergodic Decomposition Theorem). Suppose (X, B, µ, T) is a measure preserving system on a Borel probability space. Then there exists an almost everywhere defined function x ↦ µ_x from X to the space M_T(X) ⊆ M(X) of T-invariant Borel probability measures on (X, B) such that

1. µ_x is ergodic for almost every x ∈ X.

2. For any f ∈ L¹(X, B, µ) we have f ∈ L¹(X, B, µ_x) for almost every x ∈ X and

    ∫ f dµ = ∫ (∫ f dµ_x) dµ(x)
Proof. The details of the proof are omitted. See ([5] Chapter 4, [13] Chapter 2) for a more precise
picture. The idea is to show that the space MT (X) is a compact, convex space and, moreover,
that the ergodic measures are precisely the extreme points of MT (X). The decomposition
theorem then follows by applying the Krein-Milman Theorem (see [2]) to show that any measure
µ ∈ MT (X) is a convex combination of ergodic measures.
Remark 7. Later in the discourse the idea of conditional measures is introduced. The construction of conditional measures generalises the Ergodic Decomposition Theorem, and the latter result can be obtained from this construction (although it does not follow directly).
Essentially, Ergodic Decomposition tells us that any property which is preserved under integration and holds for ergodic systems, holds for systems in general. As we shall see later in our
study, this is an important observation: it implies ergodicity is a far more reasonable condition
to assume than it may first appear.
4 Furstenberg's Correspondence Principle
In this section we will establish the connection between ergodic theory and Szemerédi’s Theorem.
In particular we will show that a ‘generalised Poincaré Recurrence Theorem’ implies Szemerédi’s
Theorem. In fact, the two results are equivalent but for the sake of brevity we will omit the
proof of the converse (although it is not too difficult to establish).
Theorem 4.1. Let (X, B, µ) be a probability space and T a measure preserving transformation on (X, B, µ). Then for any A ∈ B with µ(A) > 0 and any k ∈ N there exists an n ∈ N such that

    µ(∩_{j=0}^{k-1} T^{-jn} A) > 0    (4.1)
Remark 8. Note that the case k = 2 is given by the Poincaré Recurrence Theorem, specifically
Corollary 3.4.
Proposition 4.2. Theorem 4.1 implies Szemerédi’s Theorem.
Remark 9. Before presenting a proof of Proposition 4.2 it is remarked that the connection
described here is one instance of a more general scheme laid out by Furstenberg [7, 8] relating
problems in combinatorial number theory to recurrence in ergodic theory. The idea, as explained
in [7], is to exploit the similarities between asymptotic density in sets of integers and measure
in a probability space. In particular, the shift map S : Z → Z given by S : x 7→ x + 1 preserves
the upper Banach density of sets and thus is analogous to a measure preserving transformation.
In short, in order to prove Proposition 4.2, following the argument given in [5, 10], we
will take a set Λ ⊂ Z of positive upper Banach density and use the aforementioned analogies
to construct a measure preserving system (X, B, µ, T ), the (hypothesised) dynamics of which
reflect the structure of Λ.
Proof (of Proposition 4.2). Assuming Theorem 4.1, let Λ ⊆ Z be a subset of the integers with
positive upper Banach density and fix some k ∈ N. We will show Λ contains an arithmetic
progression of length k.
First consider the measure space (X0 , B0 ) of Bernoulli trials on two letters. Explicitly, let
X0 := {0, 1}Z be the space of all two-sided sequences in 0s and 1s given the product topology
induced from the discrete topology on {0, 1} and B0 the corresponding Borel σ-algebra. Equivalently, B0 is the σ-algebra generated by the collection of projections πi : ω 7→ ωi for all i ∈ Z.
Let T0 : X0 → X0 be the left shift transform on X0, that is

    T0 (ω_l)_{l∈Z} = (ω_{l+1})_{l∈Z}  for all (ω_l)_{l∈Z} ∈ X0

For each i ∈ Z let A_i denote the cylinder set

    A_i := {ω ∈ X0 : ω_i = 1}
Let ω_Λ ∈ X0 be the characteristic function of Λ. Now for any a ∈ Z and b ∈ N consider the length k arithmetic progression {a + jb}_{j=0}^{k-1} which starts at a and has step size b. Note that

    ∩_{j=0}^{k-1} T0^{-(a+jb)} A0 = ∩_{j=0}^{k-1} A_{a+jb}
and it follows that

    {a + jb}_{j=0}^{k-1} ⊆ Λ  ⟺  ω_Λ ∈ ∩_{j=0}^{k-1} T0^{-(a+jb)} A0
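This equivalence is purely combinatorial and can be checked mechanically. The following Python sketch (the set Λ below is an arbitrary illustration, not taken from the essay) tests membership of ω_Λ in the intersection of shifted cylinders.

```python
def omega(Lam):
    # the characteristic function of Lam ⊆ Z, i.e. the point ω_Λ of {0, 1}^Z
    return lambda i: 1 if i in Lam else 0

def in_intersection(w, a, b, k):
    # membership of w in the intersection of T0^{-(a+jb)} A0 over j = 0..k-1,
    # i.e. w_{a+jb} = 1 for every j
    return all(w(a + j * b) == 1 for j in range(k))

Lam = {1, 4, 7, 10, 12}
w = omega(Lam)

print(in_intersection(w, 1, 3, 4))   # True: the progression {1, 4, 7, 10} lies in Λ
print(in_intersection(w, 4, 4, 3))   # False: 8 ∉ Λ
```

The dynamical content of the correspondence principle lies entirely in producing an invariant measure; the translation between progressions and cylinder sets is, as above, a matter of bookkeeping.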
We are interested only in showing there exists an arithmetic progression in Λ of length k. In particular, the values of the starting point a and step size b are immaterial. Removing these conditions, it suffices to show

    ω_Λ ∈ ∪_{a=-∞}^{∞} ∪_{b=1}^{∞} ∩_{j=0}^{k-1} T0^{-(a+jb)} A0

or, equivalently, that there exists some a ∈ Z such that

    T0^a ω_Λ ∈ ∪_{b=1}^{∞} ∩_{j=0}^{k-1} T0^{-jb} A0    (4.2)
Prompted by (4.2) we restrict our attention to the (two-sided) orbit of ω_Λ under T0. Define X = clos{T0^a ω_Λ : a ∈ Z} with the subspace topology induced from X0 and let T be the restriction of T0 to X. Consider the open set A = X ∩ A0. To see Λ contains an arithmetic progression of length k it suffices to show

    ∪_{b=1}^{∞} ∩_{j=0}^{k-1} T^{-jb} A ≠ ∅

Indeed, if ω ∈ ∩_{j=0}^{k-1} T^{-jb} A for some b ≥ 1 then, as this set is open and {T^a ω_Λ : a ∈ Z} is dense in X, there must be some a ∈ Z for which (4.2) holds and we are done.
We claim, using the assumption Λ has positive upper Banach density, that we can construct a T-invariant measure µ on X with the property µ(A) > 0. Then by Theorem 4.1 there exists some b ∈ N such that

    µ(∩_{j=0}^{k-1} T^{-jb} A) > 0

and we are done.
We begin by constructing a sequence of 'almost T-invariant' measures on X. By assumption

    d̄(Λ) = lim sup_{N-M→∞} |Λ ∩ [M, N)| / (N − M) > 0

Moreover, there exists a sequence of intervals [a_n, b_n) with b_n − a_n → ∞ such that

    lim_{n→∞} |Λ ∩ [a_n, b_n)| / (b_n − a_n) = d̄(Λ) > 0    (4.3)

Define the measure

    µ_n = (1/(b_n − a_n)) Σ_{j=a_n}^{b_n−1} δ_{T^j ω_Λ}
where δ_x denotes the delta measure

    δ_x(ω) = 1 if ω = x, 0 otherwise

Then clearly each µ_n is a probability measure. Moreover, it is easy to see

    µ_n(A) = |Λ ∩ [a_n, b_n)| / (b_n − a_n)    (4.4)
Furthermore, the µ_n are 'almost T-invariant' in the sense that for all B ∈ B

    |µ_n(B) − µ_n(T^{-1} B)| = (1/(b_n − a_n)) |δ_{T^{a_n} ω_Λ}(B) − δ_{T^{b_n} ω_Λ}(B)| ≤ 2/(b_n − a_n)

so that lim_{n→∞} |µ_n(B) − µ_n(T^{-1} B)| = 0.
In Proposition 3.15 we established that M(X) with the weak*-topology is a compact, metrizable space. Hence there must exist a weak*-subsequential limit µ of (µ_n)_{n∈N}. Clearly µ must be a Borel probability measure on X. By comparing (4.3) and (4.4) we conclude µ(A) > 0. Finally, to see that µ is T-invariant, note by the definition of the weak*-limit for every f ∈ C(X) we have

    lim_{l→∞} ∫ (T f − f) dµ_{n_l} = ∫ (T f − f) dµ

for some subsequence (µ_{n_l})_{l∈N}. As the µ_{n_l} are almost T-invariant we conclude

    ∫ T f dµ = ∫ f dµ
Given any U ⊆ X open, there exists an increasing sequence of continuous functions fn → 1U
pointwise. Then (T fn )n∈N is an increasing sequence of continuous functions such that T fn →
1T −1 U pointwise. Hence by the Monotone Convergence Theorem µ(T −1 U ) = µ(U ). But the
open sets form an algebra which generates B, so µ is T -invariant.
5 Examples of Multiple Recurrence
In the previous section we deduced that in order to prove Szemerédi's Theorem it suffices to show Theorem 4.1. Thus the remainder of this essay will concentrate on proving Theorem 4.1.¹⁰
We follow the argument described in Furstenberg, Katznelson and Ornstein’s article [10] and
first prove the result separately for the familiar examples of Bernoulli shifts and circle rotations.
We will see that Theorem 4.1 holds in both cases, but for strikingly different reasons.
5.1 Multiple Recurrence for the Bernoulli and Circle Rotation Systems
Proposition 5.1. Let (X, B, µ, T) be a Bernoulli system and A0, . . . , A_{k−1} ∈ B. Then

    lim_{n→∞} |µ(∩_{j=0}^{k-1} T^{-jn} A_j) − ∏_{j=0}^{k-1} µ(A_j)| = 0    (5.1)

Moreover, it follows that if A ∈ B with µ(A) > 0 then

    lim_{n→∞} µ(∩_{j=0}^{k-1} T^{-jn} A) = µ(A)^k > 0

Hence Theorem 4.1 holds for Bernoulli systems.
Proof. In the proof of Proposition 3.10, where we showed Bernoulli systems are ergodic, we
demonstrated the case k = 2. It is easy to see that the general case can be obtained by a
completely analogous method. First suppose A0 , . . . , Ak−1 ∈ B are all cylinder sets. Take
¹⁰In fact we will prove the stronger Furstenberg Multiple Recurrence Theorem, which is introduced later in this section.
n ∈ N so large that the co-ordinate sets defining the T^{-jn} A_j are pairwise disjoint. Then the T^{-jn} A_j are independent sets and

    µ(∩_{j=0}^{k-1} T^{-jn} A_j) = ∏_{j=0}^{k-1} µ(A_j)
The general case where A0 , . . . , Ak−1 ∈ B are arbitrary measurable sets is now obtained by
using approximating sequences of cylinder sets, as in Proposition 3.10.
The property of Bernoulli systems described in (5.1) is called 'k-order mixing'. Here the asymptotic independence of the sets T^{-jn} A_j can be thought of as indicative of 'random-like' behaviour within the system. We will discuss this phenomenon in far more detail in the forthcoming section, but for now it is good to acknowledge that Theorem 4.1 holds here due to the random-like behaviour of the system.
Proposition 5.2. Let (X, B, µ, T) denote the system corresponding to rotations of the circle by τ. Then given any A ∈ B with µ(A) > 0 and any k ∈ N

    lim inf_{N→∞} (1/N) Σ_{n=1}^{N} µ(∩_{j=0}^{k-1} T^{-jn} A) > 0
Proof. Fix A ∈ B with µ(A) > 0 and k ∈ N.
If τ is rational then there exists some q ∈ Z such that T^{-q} A = A and the result follows immediately.
For the case τ irrational, by Lemma 3.8 there exists a finite sequence of (non-empty) disjoint intervals I1, . . . , I_N such that, writing H = ∪_{j=1}^{N} I_j, we have A ⊆ H and µ(H \ A) < µ(A)/4k. Now suppose y ∈ T with |y| < δ where

    δ = min({µ(I_j)/(k − 1) : j = 1, . . . , N} ∪ {µ(A)/(4(k − 1))})

Then

    µ(∩_{j=0}^{k-1} (H − jy)) ≥ µ(H ∩ (H − (k − 1)y)) > µ(H) − µ(A)/4
Now for each 0 ≤ j ≤ k − 1 we have A − jy ⊆ H − jy. Note that
µ(A ∩ (A − y)) ≥ µ(H ∩ (H − y)) − µ(H \ A) − µ((H − y) \ (A − y))
By a simple induction

    µ(∩_{j=0}^{k-1} (A − jy)) ≥ µ(∩_{j=0}^{k-1} (H − jy)) − Σ_{j=0}^{k-1} µ((H − jy) \ (A − jy))
Furthermore, µ((H − jy) \ (A − jy)) = µ(H \ A) < µ(A)/4k for each 0 ≤ j ≤ k − 1 and so

    µ(∩_{j=0}^{k-1} (A − jy)) > µ(∩_{j=0}^{k-1} (H − jy)) − µ(A)/4 ≥ µ(A)/2 > 0
To finish we claim that the set Λ = {n ∈ N : nτ ∈ (−δ, δ)} has positive lower asymptotic density, that is

    lim inf_{N→∞} |Λ ∩ [1, N]| / N > 0
It then follows that

    lim inf_{N→∞} (1/N) Σ_{n=1}^{N} µ(∩_{j=0}^{k-1} T^{-jn} A) ≥ lim inf_{N→∞} (µ(A)/2) · |Λ ∩ [1, N]| / N > 0
To prove the claim we show Λ is syndetic: that there exists some a ∈ N such that {k, k + 1, . . . , k + a − 1} ∩ Λ ≠ ∅ for all k ∈ N. As τ is irrational, by Proposition 3.1 the orbit of 0 is
dense in T. Thus there exists some a ∈ N such that the set {0, τ, . . . , (a − 1)τ } is 2δ-dense in T.
Moreover, for any k ∈ N the translated set
    {kτ, (k + 1)τ, . . . , (k + a − 1)τ}    (5.2)
is 2δ-dense in T. It follows that for every k ∈ N, the set (5.2) of a consecutive points of the orbit
of 0 contains at least one element in (−δ, δ) and Λ is syndetic.
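The syndeticity of the return-time set is easy to observe numerically. The sketch below (the golden-ratio rotation number and δ = 0.05 are arbitrary illustrative choices, not taken from the essay) lists the gap sizes between consecutive elements of Λ and confirms they are bounded.

```python
import math

def return_times(tau, delta, N):
    # Λ = {n : nτ mod 1 ∈ (-δ, δ)}, the interval taken mod 1
    return [n for n in range(1, N + 1)
            if (n * tau) % 1.0 < delta or (n * tau) % 1.0 > 1.0 - delta]

tau = (math.sqrt(5) - 1) / 2          # golden-ratio rotation number, irrational
hits = return_times(tau, 0.05, 2000)
gaps = sorted(set(b - a for a, b in zip(hits, hits[1:])))

print(gaps)   # only a handful of distinct gap sizes occur, all bounded: Λ is syndetic
```

That only a few distinct gap sizes appear is a reflection of the three-distance phenomenon for irrational rotations; for the syndeticity claim above, boundedness of the gaps is all that matters.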
The reason why Theorem 4.1 holds in the case of rotations is transparent: the rotations do not move nearby points apart. This is a markedly different situation from that of the Bernoulli shifts; here we have an example of structured, as opposed to random-like, behaviour.
5.2 Furstenberg's Multiple Recurrence Theorem
In both Propositions 5.1 and 5.2, in order to verify that Theorem 4.1 held in the respective cases, we in fact demonstrated a stronger result. This suggests that it is more natural to attempt to prove a stronger statement.
Theorem 5.3 (Furstenberg's Multiple Recurrence Theorem). Let (X, B, µ, T) be a measure preserving system and A ∈ B with µ(A) > 0. Then for any k ∈ N

    lim inf_{N→∞} (1/N) Σ_{n=1}^{N} µ(∩_{j=0}^{k-1} T^{-jn} A) > 0
The method by which we prove Theorem 5.3 is rather complicated; before we begin it is prudent to briefly sketch the first few steps of the argument. The following section is occupied with determining the property of the Bernoulli system which accounts for the Multiple Recurrence Theorem holding in this instance. It transpires that Bernoulli systems belong to a larger class of 'weak mixing' systems. In Section 6 we prove multiple recurrence holds for all weak mixing systems owing to their 'random-like' behaviour. On the other hand, the rotation of the circle exemplifies what is known as a 'compact system'. Compact systems, in contrast to weak mixing systems, exhibit 'structured' behaviour. In Section 7 we show multiple recurrence holds for all compact systems owing to their structured behaviour.
The notions of weak mixing and compactness are mutually exclusive and in terms of randomness and structure they represent two extremes in behaviour. To show that the Multiple
Recurrence Theorem holds for these two cases is a significant step toward the full proof: at the
very least, as noted in [5], the fact that the result should hold for these two opposite classes
of systems could be interpreted as an indication that we are dealing with a more general phenomenon.
It is important to bear in mind that there are many examples of systems which are neither
weak mixing nor compact. Importantly, however, we will see that there is a dichotomy between
weak mixing and compact systems. Specifically, if a system is not weak mixing then it must
contain some compact ‘subsystem’. The ‘subsystems’ in question are correctly known as factors.
Thus the combined result of studying weak mixing and compact systems and demonstrating this
dichotomy will be the following:
Given any measure preserving system, the Multiple Recurrence Theorem holds
for some (non-trivial) factor.
This is the first significant step toward the full proof.
5.3 Reducing the Problem
Throughout the remainder of this discourse we will make a number of reasonable assumptions concerning the measure preserving systems we consider. The reader is reminded that all our measure preserving systems are defined on probability spaces: it is necessary that the systems are finite for multiple recurrence to hold.
In addition we will (tacitly) assume that all measure preserving systems we consider are both
invertible and defined on Borel probability space. By careful consideration of the proof of the
correspondence result Proposition 4.2, to deduce Szemerédi’s Theorem it suffices only to show
the Multiple Recurrence Theorem for this class of systems.
Moreover, the general case follows fairly easily once Multiple Recurrence has been established
for such systems, but here we omit the proof of this fact (for the details, see [5, 8]).
We claim that without loss of generality we may only consider ergodic systems. This is an
application of the Ergodic Decomposition Theorem.
Proposition 5.4. Suppose multiple recurrence holds for all invertible, ergodic systems defined
on Borel probability space. Then it holds for any invertible system (X, B, µ, T ) defined on a
Borel probability space.
Proof. Consider (X, B, µ, T ) where (X, B, µ) is a Borel probability space. By Ergodic Decomposition there exists some X 0 ∈ B with µ(X 0 ) = 1 and a collection of probability measures
{µx : x ∈ X 0 } on (X, B) such that each µx is an ergodic measure for T . Moreover, for all
f ∈ L¹(X, B, µ) we have f ∈ L¹(X, B, µ_x) for almost all x and

    ∫ f dµ = ∫ (∫ f dµ_x) dµ(x)
In particular, given A ∈ B with µ(A) > 0 and k ∈ N we have

    ∫ µ_x(A) dµ(x) = µ(A) > 0
and so µ({x ∈ X : µ_x(A) > 0}) > 0. By Fatou's lemma,

    lim inf_{N→∞} (1/N) Σ_{n=1}^{N} µ(∩_{j=0}^{k-1} T^{-jn} A)
      = lim inf_{N→∞} ∫ (1/N) Σ_{n=1}^{N} µ_x(∩_{j=0}^{k-1} T^{-jn} A) dµ(x)
      ≥ ∫ lim inf_{N→∞} (1/N) Σ_{n=1}^{N} µ_x(∩_{j=0}^{k-1} T^{-jn} A) dµ(x) > 0

where we have applied the assumption that the Multiple Recurrence Theorem holds for ergodic systems.
6 Weak Mixing Systems
As discussed earlier, the multiple recurrence result holds for Bernoulli systems owing to the 'random-like' behaviour the system exhibits. In this section we discuss and define precisely what is meant by 'random-like' behaviour, and conclude that the multiple recurrence result holds for the general class of systems with this property.
Definition 11. A measure preserving system (X, B, µ, T) is mixing if for all A, B ∈ B the following holds:

    lim_{n→∞} µ(A ∩ T^{-n} B) = µ(A)µ(B)    (6.1)
It is natural to think of a mixing system as one which exhibits random-like behaviour. To see why, assume µ(B) > 0 and write (6.1) as

    lim_{n→∞} |µ(A ∩ T^{-n} B)/µ(T^{-n} B) − µ(A)| = 0

where we have used the fact T is measure preserving. Then µ(A ∩ T^{-n} B)/µ(T^{-n} B) is the conditional probability that x ∈ A given T^n(x) ∈ B; namely, the probability of the event A happening now, given that B will occur n intervals of time into the future. Hence a mixing system is characterised by the property that for any pair of events A and B, the events along the orbit of B asymptotically bear no relation to A.
Remark 10. Whilst we are primarily concerned with studying mixing with regard to multiple recurrence, it is worthwhile to note that the consideration of mixing systems naturally emanates from the notion of ergodicity and the Mean Ergodic Theorem. As discussed in ([5] Chapter 2; [16]), by the Mean Ergodic Theorem (Corollary 3.12) a measure preserving system (X, B, µ, T) is ergodic if and only if for all f, g ∈ L²(X, B, µ) we have

    lim_{N→∞} |(1/N) Σ_{n=1}^{N} ∫ f · T^n g dµ − ∫ f dµ ∫ g dµ| = 0

By considering characteristic functions this is equivalent to the property that given any A, B ∈ B

    lim_{N→∞} |(1/N) Σ_{n=0}^{N-1} µ(A ∩ T^{-n} B) − µ(A)µ(B)| = 0

This convergence may take place for a number of reasons, one of which being that the sets A and T^{-n} B are asymptotically independent, motivating the notion of mixing.
It transpires that the mixing condition is rather strong. A weaker and more malleable notion
is that of weak-mixing, and it is with weak-mixing systems we will concern ourselves.
Definition 12. A measure preserving system (X, B, µ, T) is weak mixing if for all A, B ∈ B the following holds:

    lim_{N→∞} (1/N) Σ_{n=1}^{N} |µ(A ∩ T^{-n} B) − µ(A)µ(B)|² = 0    (6.2)
Note that, trivially, weak-mixing systems are ergodic.
Lemma 6.1. A weak mixing system (X, B, µ, T ) is ergodic.
Proof. Suppose B ∈ B is T-invariant. Then by the weak mixing property

    lim_{N→∞} (1/N) Σ_{n=1}^{N} |µ(B ∩ T^{-n} B) − µ(B)²|² = 0

For all n ∈ N we have T^{-n} B = B and so it follows µ(B) = µ(B)². Hence either B or B^c is a null set.
We can define weak mixing in terms of functions.
Lemma 6.2. A system (X, B, µ, T) is weak mixing if and only if for all f, g ∈ L²(X, B, µ)

    lim_{N→∞} (1/N) Σ_{n=1}^{N} |∫ f · T^n g dµ − ∫ f dµ ∫ g dµ|² = 0    (6.3)
Proof. Clearly, taking f = 1_A and g = 1_B in (6.3) gives (6.2). Conversely, by assuming weak mixing we assume (6.3) holds for all characteristic functions. These functions span the subspace of simple functions, dense in L²(X, B, µ). In general if (6.3) holds for all f, g ∈ S where S is a set linearly spanning a dense subspace V, then (6.3) holds for all f, g ∈ L²(X, B, µ). This fact follows from a simple approximation argument, similar to that used in Corollary 3.13.
There are many equivalent formulations of the weak-mixing property,¹¹ which can be seen as an indication that the concept is a natural one. Presently we introduce some of the formulations which are necessary to our study.
Proposition 6.3. A system (X, B, µ, T ) is weak mixing if and only if the product system
(X × X, B ⊗ B, µ ⊗ µ, T ⊗ T ) is ergodic.
Proof. Suppose (X, B, µ, T ) is weak mixing. By Lemma 6.1 it suffices to show the product
system is weak mixing. Furthermore, by the proof of Lemma 6.2, it suffices to show that if
f1 , f2 , g1 , g2 ∈ L2 (X, B, µ) then the weak mixing property (6.3) holds for f1 ⊗ f2 and g1 ⊗ g2 ∈
L2 (X × X) as functions of this type span a dense subspace of L2 (X × X). Note that
\[
\int f_1 \otimes f_2 \cdot (T \times T)^n (g_1 \otimes g_2) \, d(\mu\otimes\mu) = \int f_1\, T^n g_1 \, d\mu \int f_2\, T^n g_2 \, d\mu
\]
\[
\int f_1 \otimes f_2 \, d(\mu\otimes\mu) = \int f_1 \, d\mu \int f_2 \, d\mu, \qquad \int g_1 \otimes g_2 \, d(\mu\otimes\mu) = \int g_1 \, d\mu \int g_2 \, d\mu
\]
Thus the result is easily deduced through an application of the triangle inequality.
Conversely, suppose the product system is ergodic; it is easy to see that (X, B, µ, T) must also be ergodic. For suppose B ∈ B is a T-invariant set. Then B × B ∈ B ⊗ B is T × T-invariant and hence by ergodicity µ(B)² = (µ ⊗ µ)(B × B) = 0 or 1. Thus B is null or has null complement.
Let f, g ∈ L2 (X, B, µ). By the Mean Ergodic Theorem,
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \int f\, T^n g \, d\mu - \int f \, d\mu \int g \, d\mu = 0 \tag{6.4}
\]
As the product system is ergodic we can carry out exactly the same procedure on (X × X, B ⊗ B, µ ⊗ µ, T ⊗ T) and the functions g ⊗ g and f ⊗ f to give
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \Big( \int f\, T^n g \, d\mu \Big)^2 - \Big( \int f \, d\mu \Big)^2 \Big( \int g \, d\mu \Big)^2 = 0 \tag{6.5}
\]
We claim (6.4) and (6.5) together imply the weak mixing property, namely
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int f\, T^n g \, d\mu - \int f \, d\mu \int g \, d\mu \right|^2 = 0 \tag{6.6}
\]
Let a_n = ∫ f T^n g dµ and b = ∫ f dµ ∫ g dµ. Then the sequence (a_n)_{n∈N} ⊆ R has the properties
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} a_n = b \qquad\text{and}\qquad \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} a_n^2 = b^2
\]
It follows that
\[
\frac{1}{N}\sum_{n=1}^{N} (a_n - b)^2 = \frac{1}{N}\sum_{n=1}^{N} a_n^2 - \frac{2b}{N}\sum_{n=1}^{N} a_n + b^2
\]
11 For lists and proofs of equivalences see [1, 5, 8].
Taking the limit as N → ∞ we have
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} (a_n - b)^2 = 0
\]
which is precisely (6.6), as required.
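The passage from the two Cesàro limits for (a_n) and (a_n²) to the conclusion can be checked numerically. The following sketch is our own illustration, not part of the essay: we take a sequence a_n which agrees with b off the zero-density set of perfect squares, so that both hypotheses hold and the combined average of (a_n − b)² tends to zero.

```python
import math

# Sketch (not from the essay): a_n = b off a zero-density set (the squares).
# Then (1/N) sum a_n -> b, (1/N) sum a_n^2 -> b^2, and combining the two as
# in the expansion above gives (1/N) sum (a_n - b)^2 -> 0.
b = 2.0
N = 10 ** 5
squares = {j * j for j in range(1, math.isqrt(N) + 1)}
a = [b + 1.0 if n in squares else b for n in range(1, N + 1)]

avg1 = sum(a) / N                        # close to b
avg2 = sum(x * x for x in a) / N         # close to b^2
avg3 = sum((x - b) ** 2 for x in a) / N  # close to 0
print(avg1, avg2, avg3)
```

The third average equals avg2 − 2b·avg1 + b² exactly, which is the algebraic identity used in the proof.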
Definition 13. A measure preserving system (X, B, µ, T ) is weak mixing of all orders if for
any k ∈ N, given a collection A0 , . . . , Ak ∈ B of measurable sets
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \mu\Big( \bigcap_{j=0}^{k} T^{-jn} A_j \Big) - \prod_{j=0}^{k} \mu(A_j) \right|^2 = 0 \tag{6.7}
\]
We can instantly deduce multiple recurrence holds for any system which is weak mixing of
all orders: for any A ∈ B of positive measure and any k ∈ N let A0 = A1 = · · · = Ak = A and
apply (6.7) to obtain
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mu\Big( \bigcap_{j=0}^{k} T^{-jn} A \Big) = \mu(A)^{k+1} > 0
\]
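This deduction can be illustrated by a Monte Carlo sketch (our own illustration, not part of the essay) for the Bernoulli shift, a standard example of a system that is weak mixing of all orders. On X = {0,1}^N with the (1/2, 1/2) product measure and the shift T, take the cylinder A = {x : x_0 = 0}; independence of coordinates gives µ(A ∩ T^{-n}A ∩ · · · ∩ T^{-kn}A) = (1/2)^{k+1} = µ(A)^{k+1} for every n ≥ 1.

```python
import random

# Monte Carlo sketch (not from the essay): for the Bernoulli shift and the
# cylinder A = {x : x_0 = 0}, a point lies in the k-fold intersection iff its
# coordinates at positions 0, n, 2n, ..., kn are all 0, an event of
# probability (1/2)^(k+1) by independence.
random.seed(0)
k, n, trials = 2, 7, 200_000
hits = 0
for _ in range(trials):
    x = [random.randint(0, 1) for _ in range(k * n + 1)]
    if all(x[j * n] == 0 for j in range(k + 1)):
        hits += 1
est = hits / trials
print(est)  # close to mu(A)^(k+1) = 1/8
```

The estimate concentrates around µ(A)^{k+1} = 1/8, in line with the limit displayed above.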
Presently we demonstrate that the ostensibly stronger condition of weak mixing of all orders
is in fact equivalent to the weak mixing condition. Thus we will have shown that for any weak
mixing system multiple recurrence holds.
Remark 11. Analogous to weak mixing of all orders, there are several possible definitions of mixing of all orders. One such, given in ([10], §3), is that a system (X, B, µ, T) is mixing of all orders if for any collection A0, . . . , Ak ∈ B we have
\[
\lim_{\substack{n_j \to \infty \\ |n_j - n_i| \to \infty}} \mu\Big( \bigcap_{j=0}^{k} T^{-n_j} A_j \Big) = \prod_{j=0}^{k} \mu(A_j)
\]
Ascertaining whether mixing implies mixing of all orders is a major unsolved problem in ergodic theory12. For example it is not even known [10] whether mixing implies
\[
\lim_{n\to\infty} \mu\big( A_0 \cap T^{-n} A_1 \cap T^{-2n} A_2 \big) = \mu(A_0)\,\mu(A_1)\,\mu(A_2)
\]
Theorem 6.4. Suppose (X, B, µ, T ) is a weak mixing system. Then it is weak mixing of all
orders.
Assuming (X, B, µ, T ) is weak mixing, we will in fact show that for any finite collection
f0 , . . . , fk ∈ L∞ (X, B, µ) the following holds:
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int \prod_{l=0}^{k} T^{ln} f_l \, d\mu - \prod_{l=0}^{k} \int f_l \, d\mu \right|^2 = 0 \tag{6.8}
\]
Theorem 6.4 then follows by applying this result to characteristic functions. To prove (6.8) we
use induction on k ∈ N. As the system is weak mixing, applying Lemma 6.2 gives the case
k = 1. In order to prove the inductive step we will need to introduce the van der Corput trick,
the statement and proof of which are adapted from ([5] Chapter 7; [1]).
Lemma 6.5 (Van der Corput Trick). Let (u_n)_{n∈N} be a sequence in some Hilbert space (H, ⟨·, ·⟩). Define a sequence (s_h)_{h∈N_0} ⊂ R by
\[
s_h = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_n \rangle
\]
12 For a brief discussion of this problem see [12].
If
\[
\lim_{H\to\infty} \frac{1}{H}\sum_{h=0}^{H-1} s_h = 0 \tag{6.9}
\]
then
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} u_n \right\| = 0
\]
Proof. First note that for fixed H, the sums $\frac{1}{N}\sum_{n=1}^{N} u_n$ and $\frac{1}{HN}\sum_{n=1}^{N}\sum_{h=0}^{H-1} u_{n+h}$ are arbitrarily close for large N. Indeed, the sums overlap: they agree except on the first and last H terms:
\[
\frac{1}{HN}\sum_{n=1}^{N}\sum_{h=0}^{H-1} u_{n+h} = \frac{1}{NH}\left( \sum_{n=1}^{N} u_n + \sum_{n=2}^{N+1} u_n + \cdots + \sum_{n=H}^{N+H-1} u_n \right) = \frac{1}{N}\sum_{n=H}^{N} u_n + \frac{1}{NH}\sum_{n=1}^{H-1} n\,\big(u_n + u_{N+H-n}\big)
\]
As H is fixed and the sequence (u_n)_{n∈N} is bounded, the difference between the two sums is averaged out as N → ∞. Explicitly, if M > 0 is such that ‖u_n‖ ≤ M for all n ∈ N it follows
\[
\left\| \frac{1}{HN}\sum_{n=1}^{N}\sum_{h=0}^{H-1} u_{n+h} - \frac{1}{N}\sum_{n=1}^{N} u_n \right\| \le \frac{M(2H+1)}{N} = o(1) \tag{6.10}
\]
Now it suffices to consider the asymptotic behaviour of the norm of the double average. This is preferable as one may estimate the limit using the terms of (s_h)_{h∈N_0}. By the triangle and Cauchy–Schwarz inequalities
\[
\limsup_{N\to\infty} \left\| \frac{1}{HN}\sum_{n=1}^{N}\sum_{h=0}^{H-1} u_{n+h} \right\|^2 \le \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left\| \frac{1}{H}\sum_{h=0}^{H-1} u_{n+h} \right\|^2 = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \frac{1}{H^2}\sum_{h,h'=0}^{H-1} \langle u_{n+h}, u_{n+h'} \rangle \le \frac{1}{H^2}\sum_{h,h'=0}^{H-1} \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_{n+h'} \rangle
\]
Notice that if h' ≥ h then
\[
\limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_{n+h'} \rangle = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1+h}^{N+h} \langle u_n, u_{n+h'-h} \rangle = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_n, u_{n+h'-h} \rangle = s_{h'-h}
\]
By a symmetric argument if h ≥ h' then the left hand side is equal to s_{h-h'}. In short
\[
\limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_{n+h'} \rangle = s_{|h-h'|}
\]
Thus
\[
\limsup_{N\to\infty} \left\| \frac{1}{HN}\sum_{n=1}^{N}\sum_{h=0}^{H-1} u_{n+h} \right\|^2 \le \frac{1}{H^2}\sum_{h,h'=0}^{H-1} s_{|h-h'|}
\]
Finally we split the sum $\frac{1}{H^2}\sum_{h,h'=0}^{H-1} s_{|h-h'|}$ into three parts and use the hypothesis (6.9) to estimate. Let ε > 0 be given. Choose H_0 ∈ N such that
\[
\frac{1}{H}\sum_{h=0}^{H-1} s_h < \frac{\varepsilon}{4} \quad\text{for all } H \ge H_0 \tag{6.11}
\]
For any H ≥ H_0 we may write
\begin{align*}
\frac{1}{H^2}\sum_{h,h'=0}^{H-1} s_{|h-h'|} &= \frac{1}{H}\sum_{h=0}^{H-H_0} \frac{1}{H}\sum_{h'=h}^{H-1} s_{h'-h} \tag{6.12} \\
&\quad+ \frac{1}{H}\sum_{h'=0}^{H-H_0} \frac{1}{H}\sum_{h=h'+1}^{H-1} s_{h-h'} \tag{6.13} \\
&\quad+ \frac{1}{H^2}\sum_{h,h'=H-H_0+1}^{H-1} s_{|h-h'|} \tag{6.14}
\end{align*}
Applying (6.11) we deduce that for all 0 ≤ h ≤ H − H_0
\[
\frac{1}{H}\sum_{h'=h}^{H-1} s_{h'-h} < \frac{\varepsilon}{4}
\]
so that the sum in (6.12) is bounded above by ε/4. Similarly the sum in (6.13) is also bounded above by ε/4. As the s_h are bounded we may fix H ≥ H_0 large so that the third sum (6.14) is bounded above by ε/4. Now if we choose N_0 ∈ N so large that M(2H+1)/N_0 < ε/4 then by (6.10) we conclude
\[
\left\| \frac{1}{N}\sum_{n=1}^{N} u_n \right\| < \varepsilon \quad\text{for all } N \ge N_0
\]
As the choice of ε > 0 was arbitrary, the result is established.
The Van der Corput trick is the central ingredient in the inductive step in the proof of
Theorem 6.4 which we now turn to. Here we follow the method outlined in ([10], §3).
Proof (of Theorem 6.4). We shall show that for all f0 , . . . , fk ∈ L∞ (X, B, µ) the following two
statements hold:
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int \prod_{l=0}^{k} T^{ln} f_l \, d\mu - \prod_{l=0}^{k} \int f_l \, d\mu \right|^2 = 0 \tag{6.15}
\]
and
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} \Big( \prod_{l=1}^{k} T^{ln} f_l - \prod_{l=1}^{k} \int f_l \, d\mu \Big) \right\|_{L^2(\mu)} = 0 \tag{6.16}
\]
Here we use an inductive argument. Recall that (6.15) case k = 1 holds by the hypothesis that
the system is weak mixing. We show that for k ≥ 2
• Claim (1): (6.15) case k − 1 ⇒ (6.16) case k
• Claim (2): (6.16) case k ⇒ (6.15) case k
and conclude that both results hold for all k ∈ N.
Claim (1). Suppose every weak mixing system (X, B, µ, T ) has the property that for all g0 , . . . , gk−1 ∈
L∞ (X, B, µ) the following holds:
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int \prod_{j=0}^{k-1} T^{jn} g_j \, d\mu - \prod_{j=0}^{k-1} \int g_j \, d\mu \right|^2 = 0 \tag{6.17}
\]
Then for a fixed weak mixing system (X, B, µ, T ), given f1 , . . . , fk ∈ L∞ (X, B, µ) we have
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} \Big( \prod_{l=1}^{k} T^{ln} f_l - \prod_{l=1}^{k} \int f_l \, d\mu \Big) \right\|_{L^2(\mu)} = 0
\]
Proof. First consider the case that for some j_0 ∈ {1, . . . , k} the function f_{j_0} has zero expectation: ∫ f_{j_0} dµ = 0. In this case we are required to prove
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} \prod_{l=1}^{k} T^{ln} f_l \right\|_{L^2(\mu)} = 0 \tag{6.18}
\]
To do this we use the van der Corput trick: let (un )n∈N ⊂ L∞ (X, B, µ) be the sequence defined
by
\[
u_n = \prod_{l=1}^{k} T^{ln} f_l
\]
Note that (6.18) can then be written as
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} u_n \right\|_{L^2(\mu)} = 0 \tag{6.19}
\]
and by van der Corput, to prove (6.19) it suffices to show
\[
\lim_{H\to\infty} \frac{1}{H}\sum_{h=0}^{H-1} s_h = 0 \tag{6.20}
\]
where
\[
s_h = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_n \rangle
\]
Now, using the fact T is measure preserving, we have
\[
\langle u_{n+h}, u_n \rangle = \int \prod_{l=1}^{k} T^{l(n+h)} f_l \cdot \prod_{l=1}^{k} T^{ln} f_l \, d\mu = \int \prod_{l=1}^{k} T^{ln}\big( f_l\, T^{lh} f_l \big) \, d\mu = \int T^n \prod_{l=0}^{k-1} T^{ln}\big( f_{l+1}\, T^{(l+1)h} f_{l+1} \big) \, d\mu = \int \prod_{l=0}^{k-1} T^{ln}\big( f_{l+1}\, T^{(l+1)h} f_{l+1} \big) \, d\mu
\]
For each h ∈ N and each l = 0, . . . , k − 1 define g_{h,l} = f_{l+1}·T^{(l+1)h} f_{l+1} ∈ L∞(X, B, µ). By applying the hypothesis (6.17) of the claim to the functions g_{h,l} for a fixed h,
\[
s_h = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \langle u_{n+h}, u_n \rangle = \limsup_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \int \prod_{l=0}^{k-1} T^{ln} g_{h,l} \, d\mu = \prod_{l=0}^{k-1} \int g_{h,l} \, d\mu
\]
Now we show that
\[
\lim_{\substack{h\to\infty \\ h\notin Z}} s_h = 0 \tag{6.21}
\]
where Z ⊂ N is a set of zero density and hence, recalling Theorem 2.1, establish (6.20). Indeed, by the weak mixing property applied to the function f_{j_0},
\[
\lim_{H\to\infty} \frac{1}{H}\sum_{h=1}^{H} \left| \int f_{j_0}\, T^{h} f_{j_0} \, d\mu - \Big( \int f_{j_0} \, d\mu \Big)^2 \right|^2 = 0
\]
Recall by hypothesis ∫ f_{j_0} dµ = 0. Summing over a subsequence gives
\[
\lim_{H\to\infty} \frac{1}{H}\sum_{h=1}^{H} \left| \int \underbrace{f_{j_0}\, T^{j_0 h} f_{j_0}}_{g_{h,j_0-1}} \, d\mu \right|^2 = 0
\]
Once again recalling Theorem 2.1, there exists some set Z ⊂ N of zero density such that
\[
\lim_{\substack{h\to\infty \\ h\notin Z}} \int g_{h,j_0-1} \, d\mu = 0
\]
Noting that ‖g_{h,l}‖_∞ ≤ max_{j=1,...,k} ‖f_j‖²_∞ is bounded uniformly in both h and l,
\[
\lim_{\substack{h\to\infty \\ h\notin Z}} \prod_{l=0}^{k-1} \int g_{h,l} \, d\mu = 0
\]
Thus we have (6.21) and consequently (6.18).
The general case reduces to the above case by noting the following lemma:
Lemma 6.6. Let (a_l)_{l=1}^{k}, (b_l)_{l=1}^{k} ⊂ R be finite sequences of real numbers of length k. Then
\[
\prod_{l=1}^{k} a_l - \prod_{l=1}^{k} b_l = \sum_{j=1}^{k} \Big( \prod_{l=1}^{j-1} a_l \Big)\,(a_j - b_j)\,\Big( \prod_{l=j+1}^{k} b_l \Big)
\]
Proof. We induct on k. The case k = 1 is trivial and for c_1, c_2, d_1, d_2 ∈ R we have
\[
c_1 c_2 - d_1 d_2 = (c_1 - d_1)\,d_2 + c_1\,(c_2 - d_2) \tag{6.22}
\]
Assume the result holds for the case k − 1 and let (a_l)_{l=1}^{k}, (b_l)_{l=1}^{k} ⊂ R be finite sequences of real numbers of length k. Then
\[
\sum_{j=1}^{k} \Big( \prod_{l=1}^{j-1} a_l \Big)(a_j - b_j)\Big( \prod_{l=j+1}^{k} b_l \Big) = \left[ \sum_{j=1}^{k-1} \Big( \prod_{l=1}^{j-1} a_l \Big)(a_j - b_j)\Big( \prod_{l=j+1}^{k-1} b_l \Big) \right] b_k + \Big( \prod_{l=1}^{k-1} a_l \Big)(a_k - b_k)
\]
By applying the induction hypothesis and setting c_1 = Π_{l=1}^{k-1} a_l, c_2 = a_k, d_1 = Π_{l=1}^{k-1} b_l and d_2 = b_k we reduce the problem to the case k = 2 and thus we are done by (6.22).
Given f_1, . . . , f_k ∈ L∞(X, B, µ) with arbitrary expectation, pointwise application of the above lemma gives:
\[
\prod_{l=1}^{k} T^{ln} f_l - \prod_{l=1}^{k} \int f_l \, d\mu = \sum_{j=1}^{k} \Big( \prod_{l=1}^{j-1} T^{ln} f_l \Big)\Big( T^{jn} f_j - \int f_j \, d\mu \Big)\Big( \prod_{l=j+1}^{k} \int f_l \, d\mu \Big) \tag{6.23}
\]
Now for each j = 1, . . . , k define
\[
g_{j,l} := \begin{cases} f_l & \text{for } l = 1, \dots, j-1 \\ f_j - \int f_j \, d\mu & \text{for } l = j \\ \int f_l \, d\mu & \text{for } l = j+1, \dots, k \end{cases}
\]
The right hand side of (6.23) then becomes
\[
\sum_{j=1}^{k} \prod_{l=1}^{k} T^{ln} g_{j,l}
\]
Clearly for each j = 1, . . . , k the function g_{j,j} has zero expectation and so we may apply the previous case to each set of functions g_{j,1}, . . . , g_{j,k} ∈ L∞(X, B, µ). Moreover,
\[
\left\| \frac{1}{N}\sum_{n=1}^{N} \Big( \prod_{l=1}^{k} T^{ln} f_l - \prod_{l=1}^{k} \int f_l \, d\mu \Big) \right\|_{L^2(\mu)} = \left\| \frac{1}{N}\sum_{n=1}^{N} \sum_{j=1}^{k} \prod_{l=1}^{k} T^{ln} g_{j,l} \right\|_{L^2(\mu)} \le \sum_{j=1}^{k} \left\| \frac{1}{N}\sum_{n=1}^{N} \prod_{l=1}^{k} T^{ln} g_{j,l} \right\|_{L^2(\mu)}
\]
And so the first claim is established.
The second claim is somewhat easier to deduce.
Claim (2). Suppose that every weak mixing system (X, B, µ, T) has the property that for all f_1, . . . , f_k ∈ L∞(X, B, µ) the following holds:
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} \Big( \prod_{l=1}^{k} T^{ln} f_l - \prod_{l=1}^{k} \int f_l \, d\mu \Big) \right\|_{L^2(\mu)} = 0 \tag{6.24}
\]
Then for a fixed weak mixing system (X, B, µ, T), given g_0, . . . , g_k ∈ L∞(X, B, µ) we have
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int \prod_{l=0}^{k} T^{ln} g_l \, d\mu - \prod_{l=0}^{k} \int g_l \, d\mu \right|^2 = 0
\]
Indeed, given g_0, . . . , g_k ∈ L∞(X, B, µ), by the hypothesis (6.24) of the claim
\[
\lim_{N\to\infty} \left\| \frac{1}{N}\sum_{n=1}^{N} \Big( \prod_{l=1}^{k} T^{ln} g_l - \prod_{l=1}^{k} \int g_l \, d\mu \Big) \right\|_{L^2(\mu)} = 0
\]
Now norm convergence implies weak convergence and so by considering the linear functional f ↦ ∫ f g_0 dµ on L²(X, B, µ) we have
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \int \prod_{l=0}^{k} T^{ln} g_l \, d\mu - \prod_{l=0}^{k} \int g_l \, d\mu = 0
\]
In the proof of Proposition 6.3 we demonstrated that the product system (X × X, B ⊗ B, µ ⊗ µ, T ⊗ T) is also weak mixing and hence we may apply the foregoing reasoning to this system and the functions g_l ⊗ g_l for l = 0, . . . , k. The result of this is that
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \Big( \int \prod_{l=0}^{k} T^{ln} g_l \, d\mu \Big)^2 - \prod_{l=0}^{k} \Big( \int g_l \, d\mu \Big)^2 = 0
\]
By precisely the same argument as used at the end of Proposition 6.3, taking this time a_n = ∫ Π_{l=0}^{k} T^{ln} g_l dµ and b = Π_{l=0}^{k} ∫ g_l dµ, we obtain the desired result.
Thus we have obtained our first general result toward the proof of the Multiple Recurrence
Theorem.
Theorem 6.7. The Multiple Recurrence Theorem holds for any weak mixing system. More
precisely, if (X, B, µ, T ) is a weak mixing system, A ∈ B a set of positive measure and k ∈ N
then
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mu\Big( \bigcap_{j=0}^{k} T^{-jn} A \Big) = \mu(A)^{k+1} > 0
\]
7 Compact Systems
We now turn our attention to the class of systems which exhibit ordered behaviour, exemplified
by rotations of the circle. As in the previous section, we will prove the Multiple Recurrence
Theorem holds for this particular class of systems. Thus we will have shown the result holds
in the two extreme cases: when the system behaves randomly and when the behaviour of the
system is structured. The work in this section is adapted from ([10], §4).
Definition 14. Let (X, B, µ, T ) be a measure preserving system and f ∈ L2 (X, B, µ). We say
f is almost periodic (AP) if the orbit of f under T is relatively compact. In other words
clos{T n f : n ∈ N}
is a compact set in the norm topology.
A subset of a complete metric space is relatively compact if and only if it is totally bounded. Hence a function f is AP if and only if for every ε > 0 there exists some finite set g_1, . . . , g_m ∈ L²(X, B, µ) such that
\[
\min_{1\le j\le m} \| T^n f - g_j \|_{L^2(\mu)} < \varepsilon \quad\text{for all } n \in \mathbb{N}
\]
Definition 15. A system (X, B, µ, T ) is compact if every f ∈ L2 (X, B, µ) is almost periodic.
Proposition 7.1. Let (X, B, µ, T) be a measure preserving system and f ∈ L²(X, B, µ) an AP function. Then for every ε > 0 there exists a syndetic set K ⊆ N such that
\[
\| T^n f - f \|_{L^2(\mu)} < \varepsilon \quad\text{for all } n \in K
\]
Proof. Let ε > 0 be given. Then as clos{T^n f : n ∈ N_0} is compact there exists a (finite) maximum ε-separated set, and by density we may assume this set has the form
\[
\{ T^{t_1} f, \dots, T^{t_r} f \}
\]
for some t_i ∈ N numbered so that t_1 < t_2 < · · · < t_r. For each n ∈ N consider the set
\[
\{ T^{n+t_1} f, \dots, T^{n+t_r} f \} \tag{7.1}
\]
Note that as T is measure preserving, for any pair 1 ≤ i < j ≤ r
\[
\| T^{n+t_i} f - T^{n+t_j} f \|_{L^2(\mu)} = \| T^n (T^{t_i} f - T^{t_j} f) \|_{L^2(\mu)} = \| T^{t_i} f - T^{t_j} f \|_{L^2(\mu)} \ge \varepsilon
\]
Thus, for each n ∈ N the set (7.1) is also a maximum ε-separated set. It follows that for each n ∈ N there exists 1 ≤ i(n) ≤ r such that
\[
\| T^{n+t_{i(n)}} f - f \|_{L^2(\mu)} < \varepsilon
\]
as otherwise one may add f = T^0 f to the set (7.1) to obtain a larger ε-separated set, contradicting maximality of (7.1). Finally take K := {n + t_{i(n)} : n ∈ N}. It is easy to see N = ∪_{a=1}^{1+t_r-t_1} (K − a) and so K is indeed syndetic.
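Proposition 7.1 can be seen at work in the model compact system, the irrational rotation. The following sketch is our own illustration (the choice of α, f and ε are illustrative assumptions): for T(x) = x + α mod 1 and f(x) = e^{2πix} one has ‖T^n f − f‖_{L²} = 2|sin(πnα)|, so the set K consists of the times n at which nα returns close to an integer, and numerically its gaps stay bounded.

```python
import math

# Sketch (not from the essay): near-return times of the rotation by the
# golden ratio conjugate form a syndetic set: consecutive gaps are bounded.
alpha = (math.sqrt(5) - 1) / 2
eps = 0.1
K = [n for n in range(1, 10_001)
     if 2 * abs(math.sin(math.pi * n * alpha)) < eps]
gaps = [m - n for n, m in zip(K, K[1:])]
print(len(K), max(gaps))
```

The elements of K cluster around the Fibonacci denominators of α, and the maximal gap observed is far smaller than the range examined, consistent with syndeticity.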
Theorem 7.2. Let (X, B, µ, T) be a compact measure preserving system and f ∈ L∞(X, B, µ) such that f ≥ 0 and f is not almost everywhere 0. Then for all k ∈ N
\[
\liminf_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \int \prod_{l=0}^{k} T^{ln} f \, d\mu > 0 \tag{7.2}
\]
Proof. By scaling we may assume without loss of generality 0 ≤ f ≤ 1. To prove the theorem it suffices to show there exists a set K ⊆ N of positive lower density and a constant a > 0 such that for all n ∈ K
\[
\int \prod_{l=0}^{k} T^{ln} f \, d\mu \ge a \tag{7.3}
\]
For once this is established it then follows that
\[
\frac{1}{N}\sum_{n=1}^{N} \int \prod_{l=0}^{k} T^{ln} f \, d\mu \ge \frac{1}{N}\sum_{n\in K\cap[1,N]} \int \prod_{l=0}^{k} T^{ln} f \, d\mu \ge a\,\frac{|K\cap[1,N]|}{N}
\]
and as d(K) > 0, taking the limit infimum gives (7.2).
The proof proceeds in two simple stages:
• We show Proposition 7.1 implies that given ε > 0 there exists a syndetic set K ⊆ N such that for all n ∈ K and for all l = 0, . . . , k one has ‖T^{ln} f − f‖_{L²(µ)} < ε.
• Given n ∈ K we demonstrate that as the T^{ln} f are ε-close in norm to f, the difference between ∫ Π_{l=0}^{k} T^{ln} f dµ and ∫ f^{k+1} dµ is small. As the latter integral is strictly positive, for an appropriate choice of ε so too is the former.
By the proposition, for any ε > 0 there exists a syndetic set K ⊆ N such that for all n ∈ K
\[
\| T^{n} f - f \|_{L^2(\mu)} < \frac{\varepsilon}{k}
\]
Now, for all n ∈ K and all l = 0, . . . , k, by the triangle inequality and the fact that T is measure preserving,
\[
\| T^{ln} f - f \|_{L^2(\mu)} \le \sum_{j=1}^{l} \| T^{(l-j+1)n} f - T^{(l-j)n} f \|_{L^2(\mu)} = \sum_{j=1}^{l} \| T^{n} f - f \|_{L^2(\mu)} < \varepsilon
\]
For the second stage, note that for each n ∈ K, by Lemma 6.6,
\[
\int \prod_{l=0}^{k} T^{ln} f \, d\mu - \int f^{k+1} \, d\mu = \sum_{j=0}^{k} \int \Big( \prod_{l=0}^{j-1} T^{ln} f \Big)\big( T^{jn} f - f \big)\, f^{k-j} \, d\mu
\]
so that
\[
\left| \int \prod_{l=0}^{k} T^{ln} f \, d\mu - \int f^{k+1} \, d\mu \right| \le \sum_{j=0}^{k} \int \Big( \prod_{l=0}^{j-1} T^{ln} f \Big) \big| T^{jn} f - f \big|\, f^{k-j} \, d\mu \le \sum_{j=0}^{k} \int \big| T^{jn} f - f \big| \, d\mu
\]
where we have used our assumption 0 ≤ f ≤ 1. By the Cauchy–Schwarz inequality ∫ |T^{jn} f − f| dµ ≤ ‖T^{jn} f − f‖_{L²(µ)} < ε. Hence
\[
\left| \int \prod_{l=0}^{k} T^{ln} f \, d\mu - \int f^{k+1} \, d\mu \right| < (k+1)\,\varepsilon \tag{7.4}
\]
So far the choice of ε has been arbitrary. Now if we take 0 < ε < \frac{1}{k+1} ∫ f^{k+1} dµ, then by (7.4)
\[
\int \prod_{l=0}^{k} T^{ln} f \, d\mu \ge \int f^{k+1} \, d\mu - (k+1)\,\varepsilon > 0
\]
Thus setting a = ∫ f^{k+1} dµ − (k+1)ε we have (7.3) as required.
Theorem 7.3. Multiple recurrence holds for compact systems. Let (X, B, µ, T ) be a compact
measure preserving system, A ∈ B a set of positive measure and k ∈ N. Then
\[
\liminf_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mu\Big( \bigcap_{l=0}^{k} T^{-ln} A \Big) > 0
\]
Proof. Simply apply Theorem 7.2 to the characteristic function 1A .
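Theorem 7.3 can be illustrated numerically on the model compact system. The sketch below is our own illustration, not part of the essay (the values of α, the set A and the grid size are illustrative assumptions): for the rotation T(x) = x + α mod 1 with α = √2 − 1, A = [0, 0.2) and k = 2, the averages of µ(A ∩ T^{-n}A ∩ T^{-2n}A) stay bounded away from zero.

```python
import math

# Grid-based sketch (not from the essay): approximate the averaged measure
# of the k-fold intersection for the circle rotation; the rotation is compact,
# so the average should be strictly positive.
alpha = math.sqrt(2) - 1
width, k, N, grid = 0.2, 2, 50, 20_000

def inter_measure(n):
    """Approximate mu(A ∩ T^{-n}A ∩ ... ∩ T^{-kn}A) on a uniform grid."""
    shifts = [(l * n * alpha) % 1.0 for l in range(k + 1)]
    hits = 0
    for i in range(grid):
        x = i / grid
        if all(((x + s) % 1.0) < width for s in shifts):
            hits += 1
    return hits / grid

avg = sum(inter_measure(n) for n in range(1, N + 1)) / N
print(avg)  # strictly positive
```

The positive average comes from the syndetic set of times n at which nα is close to an integer, exactly the mechanism exploited in the proof of Theorem 7.2.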
8 The Dichotomy Between Weak Mixing and Compact Systems
The purpose of the previous sections has been to prove the multiple recurrence result for certain
types of measure preserving systems, namely weak mixing and compact systems. Whilst these
classes are large enough to include important and interesting examples, by no means do they
classify all measure preserving systems.
In order to arrive at a comprehensive multiple recurrence result we shall have to delve deeper
and use a more subtle approach. Recalling the intuitive notion of weak mixing systems exhibiting
random and compact systems exhibiting structured behaviour, it is reasonable to suppose if a
measure preserving system fails to behave completely randomly then some significant ‘subsystem’ must behave in a structured manner.
Definition 16. Let (X, B, µ, T ) be a measure preserving system and suppose A ⊆ B is a
sub-σ-algebra which is T -invariant: i.e. T −1 A ⊆ A . Then (X, A , µ, T ) is a measure preserving
system and we say this system is a factor of (X, B, µ, T ). If A only contains null and conull
sets, we say the factor (X, A , µ, T ) is trivial.
Factors of measure preserving systems will henceforth dominate our study. Rather than attempt to demonstrate multiple recurrence for the system as a whole, we shall concentrate on demonstrating it for factors.
Definition 17. Suppose (X, A, µ, T) is a factor of a system (X, B, µ, T) and that for all A ∈ A with µ(A) > 0 and all k ∈ N we have
\[
\liminf_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mu\Big( \bigcap_{j=0}^{k} T^{-jn} A \Big) > 0
\]
then we say the action of T on (X, A, µ) is Szemerédi (SZ).
The bulk of this section will be occupied with proving the following Theorem:
Theorem 8.1. A system (X, B, µ, T ) is not weak mixing if and only if there exists some nontrivial compact factor.
An intuitive interpretation of the content of this theorem might be: ‘a system exhibits
completely random behaviour if and only if no non-trivial factor exhibits structure’.
An important consequence of this is the following:
Corollary 8.2. Let (X, B, µ, T ) be a measure preserving system. Then there exists a non-trivial
factor (X, A , µ, T ) for which the action of T on (X, A , µ) is SZ.
Proof. Assuming Theorem 8.1 the result follows directly from the multiple recurrence results for weak mixing and compact systems:
• If the system is weak mixing, then by Theorem 6.7 the action of T on (X, B, µ) is SZ.
• If the system is not weak mixing, then by Theorem 8.1 there exists a non-trivial compact factor (X, A, µ, T), and by Theorem 7.3 the action of T on (X, A, µ) is SZ.
In order to prove Theorem 8.1 we use the following characterisation of compact factors:
Proposition 8.3. Let (X, B, µ, T) be a measure preserving system. Then there exists a non-trivial compact factor if and only if there exists some non-constant AP function.
Proof. Assume that every AP function f ∈ L2 (X, B, µ) is constant and (X, A , µ, T ) is some
compact factor of (X, B, µ, T ). Then every g ∈ L2 (X, A , µ) is an AP function and thus also
AP in the sense of L2 (X, B, µ). Thus the space L2 (X, A , µ) is precisely the set of constant
functions on X and therefore A must only contain sets of measure 1 and their complements.
For the converse, assume there exists some non-constant F ∈ L2 (X, B, µ) which is AP. Let
A denote the σ-algebra generated by {T n F : n ∈ N}. It is easy to check A is a T -invariant
sub-σ-algebra. Indeed, if A ∈ A then A = (T n F )−1 (B) for some Borel set B ⊆ R and
\[
T^{-1} A = T^{-1}\{ x \in X : T^n F(x) \in B \} = \{ x \in X : T^{n+1} F(x) \in B \} = (T^{n+1} F)^{-1}(B) \in \mathcal{A}
\]
So T^{-1}A ⊆ A as required.
It remains to show the factor is compact: that every L2 (X, A , µ) function is AP. To begin
we show the subset V ⊆ L∞(X, A, µ) of bounded AP functions is a closed vector subspace. Suppose f, g ∈ V and a, b ∈ R \ {0} and let ε > 0 be given. Then there exist functions f_1, . . . , f_l and g_1, . . . , g_m ∈ L²(X, A, µ) such that
\[
\min_{1\le i\le l} \| T^n f - f_i \|_{L^2(\mu)} < \frac{\varepsilon}{2|a|} \quad\text{and}\quad \min_{1\le j\le m} \| T^n g - g_j \|_{L^2(\mu)} < \frac{\varepsilon}{2|b|} \quad\text{for all } n \in \mathbb{N}
\]
Applying the triangle inequality gives
\[
\min_{\substack{1\le i\le l \\ 1\le j\le m}} \| T^n(af + bg) - (a f_i + b g_j) \| < \varepsilon \quad\text{for all } n \in \mathbb{N}
\]
Hence the function af + bg ∈ V and V is a vector space. By a method similar to the proof that
V is a vector space, we can show that the limit in L2 (X, A , µ) of any convergent sequence of
AP functions is AP and hence the space V is topologically closed.
To complete the proof we claim that for every A ∈ A the function 1A is AP. It then follows
by linearity that all simple functions are AP and thus, by the density of the space of simple
functions, V = L2 (X, A , µ) as required.
Consider the case A = F^{-1}(a, ∞) for some a ∈ R. It is easy to show that for any pair f, g ∈ V the functions min(f, g), max(f, g) ∈ V. In particular the function G = max(F − a, 0) and the functions H_n = min(1, nG) defined for all n ∈ N are AP. Now
\[
\| 1_A - H_n \|_{L^2(\mu)}^2 = \int_{\{x\in A \,:\, nG(x) < 1\}} |1 - nG|^2 \, d\mu \le \mu(\{ x \in A : nG(x) < 1 \})
\]
and thus
\[
\lim_{n\to\infty} \| 1_A - H_n \|_{L^2(\mu)} = 0
\]
Hence 1_A is the L²(X, A, µ)-limit of AP functions and is therefore AP. As the collection of sets of the form F^{-1}(a, ∞) together with {∅, X} is a generating algebra G for A, the general case A ∈ A follows by approximating A by a sequence of sets (A_n)_{n∈N} ⊂ G.
Applying Proposition 8.3 we restate Theorem 8.1 as the following:
Proposition 8.4. A system (X, B, µ, T ) is not weak mixing if and only if there exists a nonconstant A.P. function f ∈ L2 (X, B, µ).
Proof. Recall by Proposition 3.7 a system (X, B, µ, T) is ergodic if and only if all T-invariant functions are constant. Hence if (X, B, µ, T) is not ergodic the result is trivial: simply take f ∈ L²(X, B, µ) a non-constant T-invariant function. Clearly f is AP as the closure of the orbit clos{T^n f : n ∈ N} is just the (compact) singleton {f}.
Otherwise we consider (X, B, µ, T) an ergodic system which is not weak mixing. In this case, by Proposition 6.3 the product system X × X is not ergodic. It follows that there must exist some non-constant T × T-invariant function H ∈ L²(X × X).
Consider the convolution operator H∗ : L²(X, B, µ) → L²(X, B, µ) given by
\[
(H * \phi)(x) = \int H(x, x')\,\phi(x') \, d\mu(x') \quad\text{for all } \phi \in L^2(X, \mathcal{B}, \mu) \tag{8.1}
\]
We will show that every function in the image of this operator is AP. This will provide a wide selection of AP functions from which it is straightforward to find one which is non-constant.
It is an elementary fact from linear analysis that (8.1) is a (Hilbert-Schmidt and therefore)
compact operator (see [2, 24]). Thus for any R ≥ 0, the set
clos{H ∗ θ : kθkL2 (µ) ≤ R}
is compact in L2 (X, B, µ). We claim the following:
\[
T^n (H * \phi) = H * T^n \phi \quad\text{for all } \phi \in L^2(X, \mathcal{B}, \mu) \text{ and all } n \in \mathbb{N}
\]
This is easily deduced using the fact T is measure preserving and H is T × T-invariant:
\[
T^n(H * \phi)(x) = \int H(T^n x, x')\,\phi(x') \, d\mu(x') = \int H(T^n x, T^n x')\,\phi(T^n x') \, d\mu(x') = \int H(x, x')\,\phi(T^n x') \, d\mu(x') = (H * T^n \phi)(x)
\]
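The intertwining relation just derived can be checked in a finite-dimensional model. The following sketch is our own illustration, not part of the essay: on X = Z/mZ with normalised counting measure and T(x) = x + 1 mod m, a kernel of the form H(x, x′) = ψ(x − x′) is (T × T)-invariant, and T^n(H ∗ φ) = H ∗ (T^n φ) holds exactly.

```python
import numpy as np

# Finite sketch (not from the essay): a shift-invariant kernel on Z/mZ
# commutes with composition by the shift, mirroring the computation above.
rng = np.random.default_rng(0)
m, n = 12, 5
psi = rng.normal(size=m)
# H(x, x') = psi(x - x' mod m), which satisfies H(Tx, Tx') = H(x, x')
H = np.array([[psi[(x - xp) % m] for xp in range(m)] for x in range(m)])
phi = rng.normal(size=m)

def T_pow(f, j):
    # (T^j f)(x) = f(x + j mod m)
    return np.roll(f, -j)

def conv(K, f):
    # (K * f)(x) = (1/m) * sum_{x'} K(x, x') f(x')
    return K @ f / m

lhs = T_pow(conv(H, phi), n)
rhs = conv(H, T_pow(phi, n))
print(np.max(np.abs(lhs - rhs)))
```

Here H is a circulant matrix, the finite analogue of a Hilbert–Schmidt convolution operator; the same change of variables used in the displayed computation proves the identity.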
Now fix φ ∈ L2 (X, B, µ), let R = kφkL2 (µ) and note that kT n φkL2 (µ) = R for all n ∈ N. It
follows that
clos{T n H ∗ φ : n ∈ N} = clos{H ∗ T n φ : n ∈ N} ⊂ clos{H ∗ θ : kθk ≤ R}
Thus clos{T^n H ∗ φ : n ∈ N} is a closed subset of a compact set, and hence compact; that is, for each φ ∈ L²(X, B, µ) the function H ∗ φ ∈ L²(X, B, µ) is AP.
It remains to find a function φ̃ ∈ L²(X, B, µ) such that H ∗ φ̃ is non-constant. Consider the function H_1 ∈ L¹(X, B, µ) given by
\[
H_1(x') = \int H(x, x') \, d\mu(x)
\]
It is easy to see H_1 is T-invariant:
\[
H_1(T x') = \int H(x, T x') \, d\mu(x) = \int H(T x, T x') \, d\mu(x) = \int H(x, x') \, d\mu(x) = H_1(x')
\]
Recalling our assumption (X, B, µ, T ) is ergodic, this function must therefore be constant,
H1 ≡ c, say.
Now H − c is a non-constant T × T -invariant function, so all our previous reasoning holds
if we replace H with H − c. Thus we may assume without loss of generality that H has the
property H1 ≡ 0.
By definition H itself is non-constant and in particular H is not almost everywhere 0. Thus there exists some φ̃ ∈ L²(X, B, µ) such that
\[
(H * \tilde\varphi)(x) = \int H(x, x')\,\tilde\varphi(x') \, d\mu(x') \not\equiv 0
\]
It follows from our assumption H_1 ≡ 0 and Fubini's Theorem that
\[
\int (H * \tilde\varphi)(x) \, d\mu(x) = \iint H(x, x')\,\tilde\varphi(x') \, d\mu(x')\,d\mu(x) = \int \Big( \int H(x, x') \, d\mu(x) \Big)\,\tilde\varphi(x') \, d\mu(x') = 0
\]
As the function H ∗ φ̃ ∈ L²(X, B, µ) is non-zero but has zero expectation, it must be non-constant as required.
For the converse, suppose (X, B, µ, T) is a weak mixing system. We will show that any AP function is constant.
Let ε > 0 be given and suppose f ∈ L²(X, B, µ) is AP. If f ≡ 0 we are done, so we may assume ‖f‖_{L²(µ)} ≠ 0. By Proposition 7.1 there exists a syndetic set G ⊆ N such that
\[
\| T^n f - f \|_{L^2(\mu)} < \frac{\varepsilon}{2\|f\|_{L^2(\mu)}} \quad\text{for all } n \in G \tag{8.2}
\]
On the other hand, if we recall the definition of weak mixing we have
\[
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \left| \int f\, T^n f \, d\mu - \Big( \int f \, d\mu \Big)^2 \right|^2 = 0
\]
Using the alternative characterisation of convergence in the mean, stated in Theorem 2.1, there exists a set Z ⊆ N of zero density such that
\[
\left| \int f\, T^n f \, d\mu - \Big( \int f \, d\mu \Big)^2 \right| < \frac{\varepsilon}{2} \quad\text{for all } n \in \mathbb{N}\setminus Z \tag{8.3}
\]
Obviously G ∩ (N \ Z) ≠ ∅, as otherwise G ⊆ Z and G would be a set of positive lower density contained in a set of zero density, a contradiction. Thus there exists some n_0 ∈ N such that both inequalities (8.2) and (8.3) hold for n = n_0.
Finally, by the triangle and Cauchy–Schwarz inequalities
\[
\left| \int f^2 \, d\mu - \Big( \int f \, d\mu \Big)^2 \right| \le \left| \int f\,(T^{n_0} f - f) \, d\mu \right| + \left| \int f\, T^{n_0} f \, d\mu - \Big( \int f \, d\mu \Big)^2 \right| < \|f\|_{L^2(\mu)} \| T^{n_0} f - f \|_{L^2(\mu)} + \frac{\varepsilon}{2} < \varepsilon
\]
As ε is arbitrary we must have ∫ f² dµ = (∫ f dµ)². The function x ↦ x² on R is strictly convex and we conclude that for this equality to hold f must be constant.
8.1 Proof of the Multiple Recurrence Theorem: The Strategy
We are now in a position to sketch the details of the strategy used to prove the Multiple
Recurrence Theorem. The language used here is somewhat vague and naı̈ve, but it is hoped
that this brief description will give the reader some sense of what we are aiming to achieve.
Essentially the proof is a standard Zorn's Lemma argument. Using our new terminology, we are required to prove that given any system (X, B, µ, T) the action of T on (X, B, µ) is SZ. Let F denote the collection of factors A ⊆ B for which the action of T on (X, A, µ) is SZ, ordered by inclusion. By our previous work F is non-empty and in particular contains some non-trivial factor. In Section 10 we use Zorn's Lemma to show F contains a maximal element. Sections 11 to 13 then concentrate on showing that no proper factor can be maximal and hence this maximal SZ factor must be the whole of B.
In order to prove the latter claim we embark on a programme of ‘relativising’ the notions
introduced in the previous sections. In particular, in Section 11 we concentrate on defining
‘weak mixing relative to (X, A , µ, T )’. The most significant result of this will be:
If the action of T on (X, A, µ) is SZ and (X, B, µ, T) is weak mixing relative to (X, A, µ, T) then the action of T on (X, B, µ) is also SZ.
Similarly, in Section 12 we define what is meant for a system to be 'relatively compact' and derive an entirely complementary result for the case (X, B, µ, T) compact relative to (X, A, µ, T).
If the action of T on (X, A, µ) is SZ and (X, B, µ, T) is compact relative to (X, A, µ, T) then the action of T on (X, B, µ) is also SZ.
The final step in proving the Multiple Recurrence Theorem will be to prove a dichotomy between
relative weak mixing and relative compact systems. This result is very similar in flavour to
Theorem 8.1. In particular we show
For any factor (X, A , µ, T ) of (X, B, µ, T ) either:
• (X, B, µ, T) is weak mixing relative to (X, A, µ, T), or
• There exists some intermediate factor A ⊆ B ∗ ⊆ B such that A is a proper
factor of B ∗ and (X, B ∗ , µ, T ) is compact relative to (X, A , µ, T ).
Combining these three results we conclude that no proper factor can be a maximal SZ factor
and the proof of the Multiple Recurrence Theorem is complete.
Although the terminology used here may be as yet unclear to the reader, it should be obvious
that the remainder of the proof mirrors to some extent the work we have done so far. In short,
we prove a result for two opposite cases and then demonstrate and exploit a relationship between
these extremes.
The definitions of ‘relatively compact’ and ‘relatively weak mixing’ will arise from considering
conditional expectation and measure. In the following section we develop important measure
theoretical techniques, generalising the familiar idea of conditional expectation of a discrete
random variable from elementary probability theory. This will allow us to define a system of
probability measures µy which are ‘conditioned on the factor A ’. The definitions of weak mixing
relative to (X, A , µ, T ) and compact relative to (X, A , µ, T ) are then simply analogies of the
familiar definitions of weak mixing and compact systems which use these conditional measures.
9 Factors of Measure Preserving Systems
In the previous section we made significant progress toward the proof of the Multiple Recurrence
Theorem. We demonstrated that for any measure preserving system (X, B, µ, T ) there exists
some non-trivial factor (X, A , µ, T ) for which the action of T is SZ. Now we develop the theory
necessary in order to make a serious study of factors, introducing certain tools which will allow
us to relate the properties of the factors to the properties of an entire system.
9.1 Conditional Expectation
9.1.1 Motivation
This example serves as a reminder of the simple notion of conditional expectation assumed familiar to the reader13.
Example 5. Suppose (Ω, F, P) is a probability space and f, g are random variables, each with a finite number of outcomes,
\[
f : \Omega \to \{ a_1, \dots, a_n \}, \qquad g : \Omega \to \{ b_1, \dots, b_m \}
\]
Then the conditional probability that f = a_i given that g = b_j is defined as
\[
P(f = a_i \mid g = b_j) = \frac{P(f = a_i,\, g = b_j)}{P(g = b_j)}
\]
This leads to the definition of the conditional expectation of f, given g = b_j:
\[
E(f \mid g = b_j) = \sum_{i=1}^{n} a_i\, P(f = a_i \mid g = b_j)
\]
The conditional expectation of f given g is then defined as a random variable E(f | g):
\[
E(f \mid g)(\omega) = E(f \mid g = b_j) \quad\text{for } \omega \in g^{-1}(b_j)
\]
Implicit in this definition is the fact that the sets D_j = g^{-1}({b_j}) for j = 1, . . . , m are disjoint and partition Ω and that the conditional expectation is constant on each of the D_j. An important idea to grasp is that by defining G := σ({D_j : j = 1, . . . , m}) = σ(g), we may think of E(f | g) as a G-measurable function. Furthermore, if we give the set Ω′ = {b_j}_{j=1}^{m} the push-forward measure P′ = g_*P induced by g, that is
\[
P'(\{ b_j \}) = P(g^{-1}\{ b_j \}) \quad\text{for } j = 1, \dots, m
\]
then
\[
g : (\Omega, \mathcal{F}, P) \to (\Omega', \mathcal{P}(\Omega'), P') \tag{9.1}
\]
is a measure preserving map and we may equally consider E(f | g) a function on Ω′.
Our aim will be to generalise this notion of conditional expectation so that f, g may be taken to be continuous random variables. This is rather a non-trivial task and in some cases no meaningful analogy is available (although in many reasonable cases, and in particular when we are dealing with Borel probability spaces, such an analogy does exist).
13 Although it is stressed in advance that we will not, in fact, be adopting any decidedly probabilistic viewpoint in any part of this study.
Our approach will be to consider a measure preserving map θ : (X, B, µ) → (Y, D, ν) (c.f. g in (9.1)) and define for all f ∈ L1(X, B, µ) the conditional expectation as a function E(f | Y) ∈ L1(Y, D, ν). Intuitively, we think of E(f | Y) as a direct analogue of the simple notion of conditional expectation from Example 5: that is, for every y ∈ Y we have
E(f | Y)(y) = E(f | θ = y)
This, however, is not a precise way of thinking; it is clear that much more subtlety is required14.
The second part of this section will relate the theory of conditional expectation to the study of factors by introducing the notion of an extension of measure preserving systems. The probabilistic context is not vital for this study: we shall introduce conditional expectation as an object of linear algebra and not probability theory15. However, understanding the major result of this section, Theorem 9.4, is greatly aided by some probabilistic intuition.
9.1.2 Definition and Basic Properties
We follow the construction of the conditional expectation as given in ([8] Chapter 5). Let θ : (X, B, µ) → (Y, D, ν) denote a measure preserving map between measure spaces. There is a naturally defined lift
θ̃ : L2(Y, D, ν) → L2(X, B, µ)
given by θ̃(f) = fθ, where fθ(x) = f(θ(x)) for all x ∈ X. Now, it is easy to see L2(Y, D, ν)θ ⊆ L2(X, B, µ) is a closed, linear subspace. We may use orthogonal projection to relate the whole space to this subspace, and then map back to L2(Y, D, ν) in order to obtain a map L2(X, B, µ) → L2(Y, D, ν).
Definition 18. Define the conditional expectation operator16
E(· | Y) : L2(X, B, µ) → L2(Y, D, ν)
by
E(f | Y)θ = P f    for all f ∈ L2(X, B, µ)
where P is the orthogonal projection of L2(X, B, µ) onto the closed subspace L2(Y, D, ν)θ.
It is clear that E(· | Y) is a linear operator and furthermore it is a contraction. Note also
that if f ∈ L2 (Y, D, ν) then E(f θ | Y) = f . Using elementary Hilbert space theory we establish
the following:
Proposition 9.1. Let f ∈ L2(X, B, µ) and h ∈ L2(Y, D, ν). Then
∫ f hθ dµ = ∫ E(f | Y) h dν    (9.2)
In particular, taking h = 1A for any A ∈ D we have
∫_{θ−1(A)} f dµ = ∫_A E(f | Y) dν    for all f ∈ L2(X, B, µ)    (9.3)
Finally, if g ∈ L∞(Y) then
E(gθ f | Y) = g E(f | Y)    (9.4)
14 For instance, are all the singletons {y} measurable; are the functions E(f | Y) necessarily everywhere defined?
15 Of course a probabilistic definition does exist; see for instance ([5] Chapter 5; [25] Chapter 9).
16 Provided there is no risk of ambiguity, throughout the remainder of this discourse X will denote the measure space (X, B, µ) and Y the measure space (Y, D, ν).
Proof. Note that L2(Y, D, ν)θ is a closed subspace of the Hilbert space L2(X, B, µ) and so the projection P satisfies ⟨P f, k⟩ = ⟨f, k⟩ for all k ∈ L2(Y, D, ν)θ. Thus, as θ is measure preserving,
∫ f hθ dµ = ∫ P f hθ dµ = ∫ E(f | Y)θ hθ dµ = ∫ E(f | Y) h dν
giving (9.2) and (9.3). To show (9.4), note that for any g ∈ L∞(Y) we have gh ∈ L2(Y) and f gθ ∈ L2(X). Applying (9.2) to gh gives
∫ f (gh)θ dµ = ∫ E(f | Y) gh dν
On the other hand,
∫ f (gh)θ dµ = ∫ (f gθ) hθ dµ = ∫ E(f gθ | Y) h dν
Thus ∫ E(f | Y) gh dν = ∫ E(f gθ | Y) h dν for all h ∈ L2(Y), so we must have (9.4) by the elementary properties of the inner product.
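On a finite probability space the conditional expectation E(f | Y) is just a fibre-wise average, and the identities (9.2) and (9.4) can be checked exactly. The sketch below is illustrative only: the spaces, the map θ and the test functions are assumptions of the example, not taken from the essay.

```python
# Exact check of (9.2) and (9.4) on finite probability spaces (illustrative).
from fractions import Fraction

X = range(6)
Y = range(3)
mu = {x: Fraction(1, 6) for x in X}
theta = lambda x: x % 3                  # measure preserving: nu = theta_* mu is uniform on Y
nu = {y: Fraction(1, 3) for y in Y}

def E(f):
    """Conditional expectation E(f | Y): the average of f over each fibre theta^{-1}(y)."""
    return {y: sum(f(x) * mu[x] for x in X if theta(x) == y) / nu[y] for y in Y}

f = lambda x: x * x
h = lambda y: y + 1
g = lambda y: 2 * y

# (9.2): the adjoint identity  int f (h o theta) dmu = int E(f|Y) h dnu
lhs = sum(f(x) * h(theta(x)) * mu[x] for x in X)
rhs = sum(E(f)[y] * h(y) * nu[y] for y in Y)
assert lhs == rhs

# (9.4): the pull-out property  E((g o theta) f | Y) = g E(f | Y)
Egf = E(lambda x: g(theta(x)) * f(x))
assert all(Egf[y] == g(y) * E(f)[y] for y in Y)
```

Because g ∘ θ is constant on each fibre, the pull-out property holds exactly, mirroring the Hilbert-space argument above.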
Proposition 9.2. For any f ∈ L2(X, B, µ) with f(x) ≥ 0 for almost every x ∈ X we have E(f | Y)(y) ≥ 0 for almost every y ∈ Y.
Proof. Suppose f ∈ L2(X, B, µ) with f ≥ 0, that is f(x) ≥ 0 for almost every x ∈ X. The subspace L2(Y, D, ν)θ ⊆ L2(X, B, µ) is closed under f1, f2 ↦ max(f1, f2), so that g = max(E(f | Y), 0)θ ∈ L2(Y, D, ν)θ. Now suppose E(f | Y) ≥ 0 fails on a set of positive ν-measure. Then g ≠ E(f | Y)θ and, as f ≥ 0 almost everywhere,
‖f − g‖L2(µ) < ‖f − E(f | Y)θ‖L2(µ)
But this contradicts the closest point property of orthogonal projections onto closed linear subspaces. Hence E(f | Y) ≥ 0 almost everywhere.
Proposition 9.3. The conditional expectation operator extends to an operator E( · | Y) : L1 (X, B, µ) →
L1 (Y, D, ν) satisfying all previously established properties mutatis mutandis.
Proof. The result follows by the density of L2 (X, B, µ) in L1 (X, B, µ) and applying standard
Banach space theory. The details are omitted; see [5] or [8] for the full proof.
9.1.3 Disintegration of Measure
We are now in a position to present the main result of this section which makes explicit the sense
by which the conditional expectation operator generalises the notion of conditional expectation
as detailed in Example 5. The statement and subsequent proof of the following theorem are
adapted from ([5] Chapter 5; [8] Chapter 5).
Theorem 9.4 (Disintegration Theorem). Let θ : (X, B, µ) → (Y, D, ν) be a measure preserving map between probability spaces, where we assume (X, B, µ) is a Borel probability space. Then there exists a D-measurable set Y′ ⊆ Y of measure 1 and a system of probability measures {µy : y ∈ Y′} on (X, B) such that
1. For all f ∈ L1(X, B, µ) and y ∈ Y′, f ∈ L1(X, B, µy) and
∫ f dµy = E(f | Y)(y)    (9.5)
2. For all f ∈ L1(X, B, µ) the ν-almost everywhere defined function
y ↦ ∫ f dµy
is ν-measurable. Furthermore,
∫ (∫ f dµy) dν(y) = ∫ f dµ    (9.6)
Example 6. Inspired by Example 5, let (X, B, µ) be any probability space satisfying the
hypotheses of Theorem 9.4 and θ ∈ L1 (X, B, µ) be any simple function. Then there exists some
(minimal) finite subset {b1 , . . . , bm } ⊂ R such that
θ(x) ∈ {b1 , . . . , bm }
for almost every x ∈ X
Let Bi = θ−1({bi}) for each i ∈ {1, . . . , m}. If y = bi then for any f ∈ L1(X, B, µ) observe
∫ f dµy = (1/µ(Bi)) ∫_{Bi} f dµ
Hence on each of the Bi the value of ∫ f dµy is the average value of f over the region Bi. In particular, if A ∈ B then
µy(A) = µ(Bi ∩ A) / µ(Bi)
Thus the conditional measures and expectations generalise the simple case of Example 5.
As the above example is somewhat degenerate we now turn to a more appropriate example.
Example 7. Denote (X, B, µ) the unit square [0, 1] × [0, 1] with the usual Lebesgue measure
and (Y, D, ν) the unit interval [0, 1] again with Lebesgue measure, so that µ = ν ⊗ ν. If we let
θ denote the projection onto the first co-ordinate,
θ : (X, B, µ) → (Y, D, ν) ; (x1, x2) ↦ x1
then θ is a measure preserving map. For any f ∈ L1(X, B, µ) the conditional expectation is given by
E(f | Y)(x1) = ∫ f(x1, x2) dν(x2)
For any a ∈ Y the set A = {a} × [0, 1] ∈ B is µ-null. However,
µa(A) = ∫ 1A(a, x2) dν(x2) = ∫ dν(x2) = 1
Hence the measure µa is supported on a µ-null set.
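Although the fibre measures in Example 7 live on µ-null sets, the disintegration identity (9.6) is easy to test numerically. The sketch below approximates E(f | Y)(x1) = ∫ f(x1, x2) dν(x2) by a midpoint rule and checks that the iterated integral recovers ∫ f dµ; the grid size and the test function f(x1, x2) = x1 x2 are illustrative choices, not from the essay.

```python
# Numerical sketch of Example 7: disintegrating Lebesgue measure on the unit
# square over the first co-ordinate (midpoint-rule quadrature; f illustrative).
n = 200
pts = [(i + 0.5) / n for i in range(n)]      # midpoint grid on [0, 1]

f = lambda x1, x2: x1 * x2

# E(f | Y)(x1) = integral of f(x1, x2) dnu(x2); for this f it equals x1 / 2
Ef = {x1: sum(f(x1, x2) for x2 in pts) / n for x1 in pts}
assert all(abs(Ef[x1] - x1 / 2) < 1e-9 for x1 in pts)

# (9.6): the iterated integral of the disintegration recovers the full integral
iterated = sum(Ef[x1] for x1 in pts) / n
full = sum(f(x1, x2) for x1 in pts for x2 in pts) / n**2
assert abs(iterated - full) < 1e-9
assert abs(full - 0.25) < 1e-6               # the exact double integral is 1/4
```

The point of the sketch is that the conditional expectation here is computed fibre by fibre, exactly as in the formula of the example.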
In proving Theorem 9.4 the idea is to fix y ∈ Y and use the Riesz Representation Theorem to associate a measure to the (restricted) functional E(· | Y)(y) : C(X) → R. This approach, however, is naive and flawed: each E(f | Y) ∈ L1(Y, D, ν) is only defined up to null-functions, and so for any given f ∈ C(X) the value E(f | Y)(y) is only defined for ν-almost every y. Happily, one can show the set on which the definition can fail is a countable union of null sets. The measures µy can then be defined for ν-almost every y ∈ Y, as we will presently demonstrate.
Proof (of Theorem 9.4). First note that once Part 1 of the theorem holds for some f ∈ L1 (X, B, µ),
Part 2 holds for f by virtue of the properties of the conditional expectation operator. Hence we
are only required to prove Part 1.
Recall that for the compact metric space X the space (C(X), ‖·‖∞) is separable. Therefore there exists a countable, dense set of functions {h0 = 1X, h1, h2, . . . } ⊂ C(X). The set of linear combinations of these functions over Q is a countable, dense vector space lying within C(X). We write this space as
F = {f0 = 1X, f1, f2, . . . } ⊂ C(X)
Recall, E(· | Y) : L1(X, B, µ) → L1(Y, D, ν) is a linear, bounded (with operator norm 1) and positive operator. Thus, fixing n ∈ N:
1. If fn = p fi + q fj for some p, q ∈ Q and i, j ∈ N then
E(fn | Y)(y) = p E(fi | Y)(y) + q E(fj | Y)(y)
ν-almost everywhere.
2. |E(fn | Y)(y)| ≤ ‖fn‖∞ ν-almost everywhere.
3. If fn ≥ 0 then E(fn | Y)(y) ≥ 0 ν-almost everywhere.
Thus, the set An ⊆ Y on which conditions 1, 2 and 3 hold for the function fn is of measure 1. The set Y′ on which the conditions hold for all f ∈ F is also of measure 1: it is given by the countable intersection ∩_{n∈N} An.
For each n ∈ N let gn ∈ L1(Y, D, ν) be an (everywhere defined) function representing the equivalence class E(fn | Y) ∈ L1(Y, D, ν). For each y ∈ Y′ we may then define a functional Ey : F → R by Ey(fn) = gn(y). By the construction of the set Y′ this functional is Q-linear, bounded and positive.
As the set F is dense in C(X), by the elementary theory of linear operators we may extend Ey to a continuous Q-linear functional
Ey : C(X) → R
It then follows by continuity that Ey is R-linear. By the Riesz Representation Theorem, for every y ∈ Y′, as Ey ∈ C(X)∗ it can be represented by a signed measure µy on (X, B) with
Ey(f) = ∫ f dµy    for all f ∈ C(X)
Moreover, Ey is a positive functional and by Proposition 9.1 we have Ey(f0) = E(1X | Y)(y) = 1. It follows that for all y ∈ Y′ the measure µy is a probability measure on (X, B).
We have obtained our system of measures {µy : y ∈ Y 0 }. It remains to show the properties
enumerated in the statement of the theorem.
By definition, for all fn ∈ F we have
∫ fn dµy = E(fn | Y)(y)    for ν-almost every y ∈ Y′    (9.7)
Thus the theorem holds for all fn ∈ F . We will use approximations to extend the result to
all f ∈ L1 (X, B, µ).
By the density of F, given any g ∈ C(X) there exists a sequence (fni)i∈N ⊆ F such that lim_{i→∞} ‖g − fni‖∞ = 0. Therefore, by (9.7) we immediately deduce the theorem holds for all continuous functions.
As the continuous functions are dense in L1(X, B, µ), for any f ∈ L1(X, B, µ) there exists a sequence (gn)n∈N ⊂ C(X) with
Σ_{n=1}^∞ ‖gn‖L1(µ) < ∞    such that    ‖Σ_{n=1}^∞ gn − f‖L1(µ) = 0
Note that by the preceding argument, for almost every y ∈ Y and each n ∈ N we have E(gn | Y)(y) = ∫ gn dµy.
The conditional expectation operator is a contraction. As ‖E(|gn| | Y)‖L1(ν) ≤ ‖gn‖L1(µ) it follows Σ_{n=1}^∞ ‖E(|gn| | Y)‖L1(ν) < ∞. In particular
Σ_{n=1}^∞ E(|gn| | Y)(y) < ∞    for almost every y ∈ Y
At a point y ∈ Y at which the above sum converges we have
Σ_{n=1}^∞ |gn| ∈ L1(X, B, µy)    (9.8)
For each N ∈ N and almost every y ∈ Y the function Σ_{n=1}^N gn ∈ L1(X, B, µy) is dominated by (9.8) and hence by dominated convergence
∫ Σ_{n=1}^∞ gn dµy = Σ_{n=1}^∞ ∫ gn dµy = Σ_{n=1}^∞ E(gn | Y)(y)    for almost every y ∈ Y
Equivalently, thinking of y ↦ ∫ Σ_{n=1}^N gn dµy as an L1(Y, D, ν) function,
lim_{N→∞} ‖Σ_{n=1}^N E(gn | Y) − ∫ Σ_{n=1}^N gn dµ(·)‖L1(ν) = 0    (9.9)
On the other hand,
‖Σ_{n=1}^N |E(gn | Y)|‖L1(ν) ≤ Σ_{n=1}^N ‖E(gn | Y)‖L1(ν) ≤ Σ_{n=1}^N ‖gn‖L1(µ) < ∞
and so ‖Σ_{n=1}^∞ |E(gn | Y)|‖L1(ν) < ∞. Hence, recalling (9.9), we conclude
∫ Σ_{n=1}^∞ gn dµy = E(Σ_{n=1}^∞ gn | Y)(y) = E(f | Y)(y)    for almost every y ∈ Y
It remains to show
∫ f dµy = ∫ Σ_{n=1}^∞ gn dµy    for almost every y ∈ Y    (9.10)
In particular, let A ∈ B be the set on which Σ_{n=1}^∞ gn(x) ≠ f(x). Then µ(A) = 0, so it is clear that (9.10) follows once we have shown A is also µy-null for almost all y ∈ Y.
The open subsets of X generate B and so there exists a sequence (An)n∈N of open sets such that A ⊆ An and µ(An) → 0 as n → ∞. For each n ∈ N and ε > 0 let hεn ∈ C(X) be supported on clos(An) with 0 ≤ hεn ≤ 1An and hεn ↑ 1An pointwise as ε → 0 (such functions exist by Urysohn's Lemma, as An is open). Then
∫ (∫ hεn dµy) dν(y) = ∫ hεn dµ ≤ µ(An)
Letting ε → 0, by monotone convergence we have
∫ µy(An) dν(y) ≤ µ(An)
and so, by letting n → ∞,
∫ µy(A) dν(y) = 0
Thus µy(A) = 0 for almost every y ∈ Y as required.
9.2 Extensions of Measure Preserving Systems
So far we have only studied maps θ between measure spaces, rather than measure preserving
systems. In order to apply the preceding theory to the ergodic theory setting we make the
following definition.
Definition 19. Let α : (X, B, µ, T ) → (Y, D, ν, S) be a measurable map between measure
preserving systems. If α preserves the structure of the systems in the sense α ◦ T = S ◦ α then
α is called an extension of measure preserving systems.
Proposition 9.5. Let α : (X, B, µ, T) → (Y, D, ν, S) be an extension of invertible measure preserving systems. The conditional expectation operator E(· | Y) : L1(X, B, µ) → L1(Y, D, ν) satisfies
S E(f | Y) = E(T f | Y)    for all f ∈ L1(X, B, µ)
Proof. Let f ∈ L1(X, B, µ). By Proposition 9.1 we have
∫ E(T f | Y) h dν = ∫ (T f) hα dµ    for all h ∈ L∞(Y, D, ν)
As T is measure preserving so is T−1 and hence
∫ (T f) hα dµ = ∫ f (T−1hα) dµ
Now, by the property of extensions, T−1hα = (S−1h)α. Putting everything together,
∫ E(T f | Y) h dν = ∫ f (S−1h)α dµ = ∫ E(f | Y) S−1h dν = ∫ S E(f | Y) h dν
As this holds for all h ∈ L∞(Y, D, ν), we conclude S E(f | Y) = E(T f | Y).
Corollary 9.6. With the same hypotheses as the previous proposition,
µy(T−1A) = µSy(A)    for all A ∈ B
for all y ∈ Y such that the conditional measures µy and µSy are defined.
Proof. Let A ∈ B and y ∈ Y be such that µy and µSy are defined in accordance with Theorem 9.4. Note that T1A = 1T−1A so by Proposition 9.5
S E(1A | Y)(y) = E(1T−1A | Y)(y) = ∫ 1T−1A dµy = µy(T−1A)
On the other hand,
S E(1A | Y)(y) = E(1A | Y)(Sy) = ∫ 1A dµSy = µSy(A)
Thus µy(T−1A) = µSy(A) as required.
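The equivariance µy(T−1A) = µSy(A) can be checked exhaustively on a finite extension. In the sketch below we take the rotation on Z6 factoring onto the rotation on Z3; the whole setup is an illustrative toy example, not drawn from the essay.

```python
# Finite sketch of Corollary 9.6: X = Z_6 with T x = x + 1, factor Y = Z_3 with
# S y = y + 1, and alpha(x) = x mod 3 (so alpha o T = S o alpha). The conditional
# measure mu_y is uniform on the fibre {y, y + 3}. All names are illustrative.
from fractions import Fraction
from itertools import combinations

X = range(6)
T = lambda x: (x + 1) % 6
S = lambda y: (y + 1) % 3
alpha = lambda x: x % 3

def mu_y(y, A):
    """Conditional measure of A given the fibre alpha^{-1}(y)."""
    fibre = [x for x in X if alpha(x) == y]
    return Fraction(sum(1 for x in fibre if x in A), len(fibre))

# Check mu_y(T^{-1} A) = mu_{S y}(A) for every subset A of X and every y
for r in range(7):
    for A in combinations(X, r):
        Tinv_A = {x for x in X if T(x) in A}
        for y in range(3):
            assert mu_y(y, Tinv_A) == mu_y(S(y), set(A))
```

The check works because T maps the fibre over y bijectively onto the fibre over Sy, which is precisely the mechanism behind the corollary.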
9.3 Factors and Extensions of Measure Preserving Systems
In this section we will demonstrate a correspondence between the factors of a system and extensions which will allow us to apply the notion of conditional measures to the study of factors. Our work here is adapted from ([5] Chapter 5). It is easy to see that any extension of measure preserving systems induces a factor.
Lemma 9.7. Let α : (X, B, µ, T) → (Y, D, ν, S) be an extension of measure preserving systems. Then A = α−1D is a T-invariant sub-σ-algebra of B and hence defines a factor of (X, B, µ, T).
Proof. It is clear that A is a σ-algebra on X and, as α is measurable, A ⊆ B. All that remains is to show A is T-invariant. Let B ∈ A; then B = α−1D for some set D ∈ D. Using the property of an extension,
T−1B = {x ∈ X : α ◦ T(x) ∈ D} = {x ∈ X : S ◦ α(x) ∈ D} = α−1(S−1D)
As S is D-measurable, S−1D ∈ D and so T−1B ∈ A as required.
Our aim will be to prove a converse. Before we do this we note an important fact about Borel probability spaces (X, B, µ): any sub-σ-algebra A ⊆ B is 'almost generated by a countable collection of sets'.
Definition 20. A σ-algebra A on a set X is countably generated if there exists a countable collection of sets {An : n ∈ N} ⊂ P(X) such that A = σ(A1, A2, . . . ).
Definition 21. Let B be a σ-algebra and A, A′ ⊆ B sub-σ-algebras. We write A ⊆µ A′ to denote the relation: for all A ∈ A there exists some A′ ∈ A′ such that µ(A△A′) = 0. Furthermore, we write A =µ A′ whenever A ⊆µ A′ and A′ ⊆µ A.
Proposition 9.8. If (X, B, µ) is a Borel probability space and A ⊆ B is a sub-σ-algebra then there exists a countably generated sub-σ-algebra A′ = σ(A1, A2, . . . ) such that A =µ A′.
Proof. As C(X) is a separable metric space and dense in L1(X, B, µ) it follows that L1(X, B, µ) is separable. In particular, the collection of characteristic functions
{1A : A ∈ A} ⊆ L1(X, A, µ) ⊆ L1(X, B, µ)
is separable. Let A1, A2, . . . be such that {1An : n ∈ N} is a dense subset of {1A : A ∈ A} and denote A′ = σ(A1, A2, . . . ). Then A′ ⊆ A. Moreover, by density, for any A ∈ A there exists a sequence (nk)k∈N of positive integers such that
µ(A△Ank) = ‖1A − 1Ank‖L1(µ) < 2−k
Denote A0 = ∩_{N∈N} ∪_{k≥N} Ank. Then A0 ∈ A′ and, by the Borel–Cantelli Lemma applied to the sets A△Ank, µ(A△A0) = 0 as required.
We now turn to the converse of Lemma 9.7.
Theorem 9.9. Let (X, B, µ, T) be a measure preserving system defined on a Borel probability space and A a T-invariant sub-σ-algebra of B. Then there exists a system (Y, D, ν, S) defined on a Borel probability space and an extension α : (X, B, µ, T) → (Y, D, ν, S) such that A =µ α−1D.
At first glance the theorem may appear somewhat trivial: why not take (Y, D, ν, S) = (X, A, µ, T) and α the identity map? Of course, the key to this result is that (Y, D, ν) is a Borel probability space: we cannot guarantee that this holds in general for (X, A, µ). Our entire theory of conditional measure is reliant on the spaces in question being Borel probability spaces and thus this theorem is necessary when dealing with important situations involving intermediate extensions.
We will take Y in the statement of Theorem 9.9 to be the (compact, metrizable) space M (X)
of Borel probability measures on (X, B). First we show that a canonical Borel probability system
may be constructed on this space; once this is accomplished, Theorem 9.9 follows with ease.
Proof (of Theorem 9.9). As we demonstrated in Section 3.4, the set Y = M(X) with the weak* topology is a compact, metrizable space. Let D be the Borel σ-algebra generated by the weak*-topology. In order to find a natural choice of measure on (Y, D) we use the Disintegration Theorem. Note that the identity on X induces a map
ι : (X, B, µ) → (X, A, µ) ; x ↦ x
from a Borel probability space which is (trivially) measure preserving. By disintegration we have an almost everywhere defined measurable map
α : (X, A, µ) → (Y, D) ; y ↦ µy    (9.11)
taking y ∈ X to the conditional measure µy. We give (Y, D) the pull-back measure ν = α∗µ induced by this map, so that
ν(D) = µ(α−1D)    for all D ∈ D
Finally we wish to define a natural measure preserving transformation on the Borel probability space (Y, D, ν). This is marginally trickier than the previous constructions.
Claim. The pull-back map S = T∗ : (Y, D, ν) → (Y, D, ν) defined by
Sλ(A) = λ(T−1A)    for all λ ∈ M(X) and all A ∈ B
is measurable. Moreover, it preserves the measure ν.
Proof (of Claim). We follow the argument given in ([5] Chapter 5). In order to prove S is measurable we must first make some observations concerning the topology that generates D. In particular, it is easy to see by the definition of the weak*-topology that a set U ⊆ M(X) is open precisely when for every κ ∈ U there exist functions f1, . . . , fr ∈ C(X) and constants ε1, . . . , εr > 0 such that
{λ ∈ M(X) : |∫ fi dλ − ∫ fi dκ| < εi for all i = 1, . . . , r} ⊆ U
Recall C(X) is separable. Let (gn)n∈N ⊂ C(X) be a dense subset of C(X) and ε > 0 be given. Then for any f ∈ C(X) there exists an n0 ∈ N such that ‖f − gn0‖∞ < ε/3. If λ, κ ∈ M(X) are such that
|∫ gn0 dλ − ∫ gn0 dκ| < ε/3
then by the triangle inequality
|∫ f dλ − ∫ f dκ| < ε
It follows that the collection of all finite intersections of sets of the form
{λ ∈ M(X) : |∫ gn dλ − ∫ gn dκ| < ε}
for some κ ∈ M(X), ε > 0 and n ∈ N is a basis for the topology on M(X). Moreover, by the density of the rationals, the collection of finite intersections of sets of the form
U(n,r,q) = {λ ∈ M(X) : |∫ gn dλ − r| < q}
for some n ∈ N and r, q ∈ Q is a countable basis for the topology on M(X). Now,
S−1U(n,r,q) = {λ ∈ M(X) : |∫ T gn dλ − r| < q}
is clearly measurable. Moreover, for any open U ⊆ M(X), S−1U is a countable union of finite intersections of measurable sets and therefore measurable. The open subsets of M(X) form a generating algebra for D and hence S is D-measurable.
It remains to prove S preserves the measure ν. By Corollary 9.6 applied to the trivial extension ι, for all A ∈ B we have µTy(A) = µy(T−1A) = Sµy(A), and so
S ◦ α = α ◦ T    (9.12)
It follows that for any D ∈ D we have
α−1S−1(D) = T−1α−1(D)
Finally, as T preserves µ,
ν(S−1D) = µ(α−1S−1(D)) = µ(T−1α−1(D)) = µ(α−1(D)) = ν(D)
and hence S preserves ν.
Now that the measure preserving system (Y, D, ν, S) has been constructed, most of the work
in proving Theorem 9.9 has been done. Through a slight abuse of notation consider
α : (X, B, µ, T ) → (Y, D, ν, S)
Then, as we have shown, (Y, D, ν, S) is a measure preserving system and by (9.12), α is an
extension of measure preserving systems.
We are required to prove A =µ α−1D. To see this, note that as (X, B, µ) is a Borel probability space, by Proposition 9.8 there exists some countably generated sub-σ-algebra A′ such that A =µ A′. As the =µ relation is transitive, we may therefore assume without loss of generality that A = σ(A1, A2, . . . ) is countably generated.
To see A ⊆µ α−1D it is enough to show that for each n ∈ N there exists some set A′n ∈ α−1D such that µ(An△A′n) = 0.
Consider the D-measurable sets Dn = {λ ∈ Y : λ(An) = 1} and note that
α−1Dn = {x ∈ X : µx(An) = 1}
Recalling that the conditional measures are defined through the trivial extension ι, it is easy to see
µx(An) = 1An(x)    for almost every x ∈ X
Define
X′ = ∩_{n∈N} {x ∈ X : µx(An) = 1An(x)}
Then µ(X′) = 1, and for each n the sets A′n = α−1Dn ∈ α−1D and An agree on X′, so that µ(A′n△An) = 0 for all n ∈ N.
Conversely, to prove α−1D ⊆µ A it suffices to show α−1(U) ∈ A for all the basic open sets U constructed in the proof of the Claim, as these sets generate D. But this is immediate, as for any f ∈ C(X) and any r, ε > 0 we have
α−1({λ ∈ M(X) : |∫ f dλ − r| < ε}) = {x ∈ X : |∫ f dµx − r| < ε}
which is A-measurable by the construction of the conditional measures.
We now have enough measure-theoretic tools to return to the proof of the Multiple Recurrence Theorem.
10 Existence of Maximal SZ Factors
As payoff for the work done in the previous section we turn to proving the following theorem,
an important step towards the proof of the Multiple Recurrence Theorem.
Theorem 10.1. Let (X, B, µ, T ) be a measure preserving system and F the set of factors
A ⊆ B of the system for which the action of T on (X, A , µ, T ) is SZ. Then F contains a
maximal element with respect to inclusion.
Following the argument given in ([10], §7) we use Zorn's Lemma. In particular, we consider a chain, or totally ordered subset, {Ai : i ∈ I} ⊆ F and define
sup_{i∈I} Ai := σ(∪_{i∈I} Ai)
To prove Theorem 10.1 it suffices to show sup_{i∈I} Ai is an upper bound for the chain lying in F. Clearly:
• sup Ai is a σ-algebra
• Ai ⊆ sup Ai for all i ∈ I
• sup Ai is T-invariant
Note that for the final bullet point, the set {A ∈ sup_{i∈I} Ai : T−1A ∈ sup_{i∈I} Ai} is a d-system and thus by Dynkin's Lemma it suffices to show T-invariance only for sets in the generating algebra ∪_{i∈I} Ai. For such sets the result is trivial.
The proof of Theorem 10.1 therefore boils down to showing that the action of T on (X, sup_{i∈I} Ai, µ, T) is SZ. Explicitly, for any A ∈ sup_{i∈I} Ai such that µ(A) > 0 we need to show that given any k ∈ N,
lim inf_{N→∞} (1/N) Σ_{n=1}^N µ(∩_{j=0}^k T^{−jn}A) > 0    (10.1)
Proof (of Theorem 10.1). Let A ∈ sup_{i∈I} Ai be such that µ(A) > 0 and k ∈ N. By the above discussion it suffices to show (10.1). By Lemma 3.8, if ε = (1/4)ηµ(A) where η = (1/2)(k + 1)−1 then there exists a set A1 ∈ Ai0 for some i0 ∈ I such that
A ⊆ A1    and    µ(A△A1) < ε    (10.2)
Note that µ(A1) > 0. In particular,
µ(A1) ≥ µ(A) − µ(A△A1) > µ(A) − (1/4)ηµ(A)    (10.3)
Now we apply the results of the previous section. By Theorem 9.9, there exists a Borel probability system (Y, D, ν, S) and an extension α : (X, B, µ, T) → (Y, D, ν, S) such that Ai0 =µ α−1D. In particular, A1 = α−1D1 for some set D1 ∈ D. Disintegrate the measure µ with respect to (Y, D, ν) to form a collection of probability measures µy defined for almost every y ∈ Y. Note that
µy(A1) = E(1A1 | Y)(y) = 1D1(y)
Now A1 may not give a good approximation to A in terms of the conditional measures µy. To this end we define
D0 := {y ∈ Y : µy(A) ≥ 1 − η}
and let A0 = α−1(D0). We claim µ(A0) > (1/2)µ(A) > 0. Indeed, using the properties of the conditional measure,
µ(A1 ∖ A) = ∫ µy(A1 ∖ A) dν(y) ≥ ∫_{D1∖D0} (µy(A1) − µy(A)) dν(y) = ∫_{D1∖D0} (1 − µy(A)) dν(y) ≥ η ν(D1 ∖ D0) ≥ η(ν(D1) − ν(D0))
Now α is measure preserving and so
µ(A0) ≥ µ(A1) − (1/η)µ(A1 ∖ A) = µ(A1) − (1/η)µ(A1△A)    (10.4)
Finally, by applying (10.2) and (10.3) to (10.4) we get
µ(A0) > ((3 − η)/4)µ(A) > (1/2)µ(A)
By hypothesis the action of T on (X, Ai0, µ, T) is SZ. In particular,
lim inf_{N→∞} (1/N) Σ_{n=1}^N µ(∩_{j=0}^k T^{−jn}A0) > 0    (10.5)
We claim that
µy(∩_{j=0}^k T^{−jn}A) ≥ 1/2    for every y ∈ ∩_{j=0}^k S^{−jn}D0 at which µy is defined
Once the claim is proven it is easy to see that
µ(∩_{j=0}^k T^{−jn}A) ≥ ∫_{∩_{j=0}^k S^{−jn}D0} µy(∩_{j=0}^k T^{−jn}A) dν ≥ (1/2) ν(∩_{j=0}^k S^{−jn}D0) = (1/2) µ(∩_{j=0}^k T^{−jn}A0)    (10.6)
and hence comparing (10.5) and (10.6) concludes the proof.
To prove the claim suppose y ∈ ∩_{j=0}^k S^{−jn}D0. Then for j = 0, . . . , k we have S^{jn}y ∈ D0 and thus, applying Corollary 9.6,
µy(T^{−jn}A) = µ_{S^{jn}y}(A) ≥ 1 − η    for all j ∈ {0, . . . , k}
In particular the complement of each T^{−jn}A has measure µy((T^{−jn}A)c) ≤ η. The measure of the intersection ∩_{j=0}^k T^{−jn}A is smallest when the complements (T^{−jn}A)c are pairwise disjoint. Hence
µy(∩_{j=0}^k T^{−jn}A) ≥ 1 − (k + 1)η = 1/2
and we are done.
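The final step is the elementary union bound: if each of k + 1 sets has measure at least 1 − η with η = (1/2)(k + 1)−1, their intersection has measure at least 1 − (k + 1)η = 1/2. A quick finite sanity check of this arithmetic (the 24-point space and the particular sets are illustrative, not from the essay):

```python
# Sanity check of the union bound used in the claim, on a uniform finite space.
# The space and the sets are illustrative choices.
from fractions import Fraction

n, k = 24, 2
eta = Fraction(1, 2 * (k + 1))                   # eta = (1/2)(k+1)^{-1}
X = set(range(n))
# k + 1 sets, each missing a block of eta * n = 4 points (worst case: disjoint blocks)
sets = [X - set(range(4 * j, 4 * j + 4)) for j in range(k + 1)]

assert all(Fraction(len(A), n) >= 1 - eta for A in sets)
inter = set.intersection(*sets)
# union bound: measure of the intersection >= 1 - (k + 1) * eta = 1/2
assert Fraction(len(inter), n) >= 1 - (k + 1) * eta
```

The disjoint-blocks choice realises the extremal case described in the proof, where the bound 1 − (k + 1)η is attained exactly.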
11 Weak Mixing Extensions
We will briefly recap what we have achieved thus far and remind the reader of the proof strategy laid out earlier in the essay. So far we have shown that any system (X, B, µ, T) contains some maximal SZ factor with respect to inclusion, (X, Â, µ, T) say. The proof of the Multiple Recurrence Theorem is concluded by demonstrating no proper factor can be maximal. The method used to achieve this rather ingeniously mirrors our previous work.
In this section we describe what is meant for a system (X, B, µ, T) to be weak mixing relative to a factor (X, A, µ, T). As factors correspond precisely to extensions, this is equivalent
to defining a particular kind of extension α : (X, B, µ, T ) → (Y, D, ν, S) known as a weak
mixing extension. We will then show that weak mixing extensions preserve the SZ property:
if the action of S on (Y, D, ν, S) is SZ and α : (X, B, µ, T ) → (Y, D, ν, S) is a weak mixing
extension then the action of T on (X, B, µ, T ) is also SZ. Thus we will have shown:
If (X, Â, µ, T) is a proper maximal SZ factor of (X, B, µ, T) then (X, B, µ, T) cannot be weak mixing relative to (X, Â, µ, T).
Similarly, in Section 12 we define what is meant for a system to be compact relative to a factor
via the notion of compact extensions. We will show that compact extensions also preserve the SZ
property. Hence if (X, Â, µ, T) is a proper maximal SZ factor of (X, B, µ, T) then (X, B, µ, T) cannot be compact relative to (X, Â, µ, T). Furthermore, we deduce a more general statement:
Suppose (X, Â, µ, T) is a proper maximal SZ factor of (X, B, µ, T) and B∗ is some intermediate factor, that is (X, B∗, µ, T) is a factor of (X, B, µ, T) and Â ⊆ B∗ ⊆ B. If (X, Â, µ, T) is a proper factor of (X, B∗, µ, T), then (X, B∗, µ, T) cannot be compact relative to (X, Â, µ, T).
In the final section we conclude the proof of the Multiple Recurrence Theorem by demonstrating
a dichotomy between weak mixing and compact extensions. We demonstrate that for any proper
factor (X, A , µ, T ) of (X, B, µ, T ) either:
• (X, B, µ, T ) is weak mixing relative to (X, A , µ, T ) or
• There exists some intermediate factor A ⊆ B ∗ ⊆ B such that A is a proper factor of B ∗
and (X, B ∗ , µ, T ) is compact relative to (X, A , µ, T ).
Bringing all this together we conclude no proper factor can be maximal.
Presently we will use the notions of conditional expectation and measure to define the concept
of a 'relative weak mixing' system. Recall that a system (X, B, µ, T) is weak mixing if and only if the product system X × X is ergodic. This inspires us to consider defining a 'conditional product'.
11.1 The Relative Square
This subsection is based on ([8] Chapter 5).
Definition 22. Suppose (X1, B1, µ1, T1) and (X2, B2, µ2, T2) are measure preserving systems defined on Borel probability spaces which both extend the system (Y, D, ν, S). Explicitly, suppose there exist extensions
α1 : (X1, B1, µ1, T1) → (Y, D, ν, S)
α2 : (X2, B2, µ2, T2) → (Y, D, ν, S)
Thus we may disintegrate the measures µ1 and µ2 with respect to (Y, D, ν) to form two systems of conditional measures µ1,y and µ2,y, defined for almost all y ∈ Y. It is not difficult to see y ↦ µ1,y ⊗ µ2,y is ν-measurable (for instance, see [8]). We define the relative independent joining X1 ×Y X2 of the systems X1 = (X1, B1, µ1, T1) and X2 = (X2, B2, µ2, T2) as the system
(X1 × X2, B1 ⊗ B2, µ1 ×Y µ2, T1 × T2)
where the measure µ1 ×Y µ2 = ∫ µ1,y ⊗ µ2,y dν(y).
Lemma 11.1. Let f1 ∈ L2(X1, B1, µ1) and f2 ∈ L2(X2, B2, µ2). Then f1 ⊗ f2 ∈ L2(µ1 ×Y µ2) and
∫ f1 ⊗ f2 d(µ1 ×Y µ2) = ∫ E(f1 | Y) E(f2 | Y) dν
Proof. It is easy to see
∫ f1 ⊗ f2 d(µ1 ×Y µ2) = ∫ (∫ f1 ⊗ f2 d(µ1,y ⊗ µ2,y)) dν(y) = ∫ (∫ f1 dµ1,y)(∫ f2 dµ2,y) dν(y) = ∫ E(f1 | Y)(y) E(f2 | Y)(y) dν(y)
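On finite spaces the relative independent joining is a finite sum, and the identity of Lemma 11.1 can be verified exactly. The sketch below (the spaces, the factor map and the functions are illustrative assumptions) builds µ ×Y µ from the conditional measures and checks the product formula:

```python
# Finite sketch of Lemma 11.1: build mu x_Y mu from conditional measures and
# check that the integral of f1 (tensor) f2 equals int E(f1|Y)E(f2|Y) dnu.
# All spaces and functions here are illustrative.
from fractions import Fraction

X = range(6)
Y = range(3)
mu = {x: Fraction(1, 6) for x in X}
theta = lambda x: x % 3
nu = {y: Fraction(1, 3) for y in Y}
fibre = {y: [x for x in X if theta(x) == y] for y in Y}
mu_y = {y: {x: mu[x] / nu[y] for x in fibre[y]} for y in Y}   # conditional measures

f1 = lambda x: x + 1
f2 = lambda x: x * x

# mu x_Y mu assigns (x1, x2) the mass  sum_y nu(y) mu_y(x1) mu_y(x2)
lhs = sum(nu[y] * mu_y[y][x1] * mu_y[y][x2] * f1(x1) * f2(x2)
          for y in Y for x1 in fibre[y] for x2 in fibre[y])

E = lambda f: {y: sum(f(x) * mu_y[y][x] for x in fibre[y]) for y in Y}
rhs = sum(E(f1)[y] * E(f2)[y] * nu[y] for y in Y)
assert lhs == rhs
```

The computation factors through each fibre exactly as in the displayed chain of equalities above.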
Definition 23. Suppose α : (X, B, µ, T ) → (Y, D, ν, S) is an extension of measure preserving
systems where (X, B, µ, T ) is defined on a Borel probability space. Then we can form the relative
independent joining X ×Y X. We say X ×Y X is the Relative Square of the system (X, B, µ, T )
with respect to the factor (Y, D, ν, S). Note by Lemma 11.1, for any f ∈ L2(X, B, µ) we have
∫ f ⊗ f d(µ ×Y µ) = ∫ E(f | Y)2 dν
If α : (X, B, µ, T) → (Y, D, ν, S) is an extension and π : X × X → X denotes the projection onto the first co-ordinate then both
π : X ×Y X → X    and    α ◦ π : X ×Y X → Y
are extensions. In particular, we may define the expectation of any function f ∈ L2(µ ×Y µ) conditioned on either X or Y.
Lemma 11.2. For any f ∈ L2(µ ×Y µ) we have
E(f | Y) = E(E(f | X) | Y)
Proof. Considering the orthogonal projections
P1 : L2(X ×Y X) → L2(X, B, µ)π
P2 : L2(X, B, µ) → L2(Y, D, ν)α
Q : L2(X ×Y X) → L2(Y, D, ν)α◦π
it is easy to see Q = P2π ◦ P1. The result then follows immediately from the definition of conditional expectation.
Proposition 11.3. For any f1 , f2 ∈ L2 (X, B, µ) we have
E(f1 ⊗ f2 | Y) = E(f1 | Y)E(f2 | Y)
Proof. First we show that E(f1 ⊗ f2 | X) = f1 E(f2 | Y)α. Indeed, for any h ∈ L2(X, B, µ), by applying Lemma 11.1 and the standard properties of conditional expectation we have
∫ f1 E(f2 | Y)α h dµ = ∫ E(f1 E(f2 | Y)α h | Y) dν = ∫ E(f1 h | Y) E(f2 | Y) dν = ∫ (f1 ⊗ f2) hπ d(µ ×Y µ)    (11.1)
Further, now considering the extension π : X ×Y X → X, observe
∫ (f1 ⊗ f2) hπ d(µ ×Y µ) = ∫ E(f1 ⊗ f2 | X) h dµ    (11.2)
Hence, by comparing the left hand side of (11.1) and the right hand side of (11.2), and noting that the equalities hold for all h ∈ L2(X, B, µ), we conclude E(f1 ⊗ f2 | X) = f1 E(f2 | Y)α by the elementary properties of the inner product. The proposition now follows by applying the preceding lemma. Explicitly,
E(f1 ⊗ f2 | Y) = E(E(f1 ⊗ f2 | X) | Y) = E(f1 E(f2 | Y)α | Y) = E(f1 | Y) E(f2 | Y)
11.2 Weak Mixing Extensions
The following discussion of weak mixing extensions is adapted from ([10], §8).
Definition 24. Let α : (X, B, µ, T ) → (Y, D, ν, S) be an extension of measure preserving
systems. We say the extension is weak mixing if the relative-square system X ×Y X is ergodic.
Alternatively, in this case we may also say that (X, B, µ, T ) is weak mixing relative to the
factor (X, A , µ, T ) where A = α−1 (D) or weak mixing relative to (Y, D, ν, S).
Following the argument given in ([10], §8) we first note the following lemma.
Lemma 11.4. Suppose α : (X, B, µ, T) → (Y, D, ν, S) is a weak mixing extension and f, g ∈ L∞(X, B, µ). Then the following holds:
lim_{N→∞} (1/N) Σ_{n=1}^N ∫ |E(f T^n g | Y) − E(f | Y) S^n E(g | Y)|2 dν = 0
Proof. Let f, g ∈ L∞(X, B, µ) and first consider the case E(f | Y) ≡ 0. By hypothesis the system X ×Y X is ergodic. Apply Lemma 11.1 and the Mean Ergodic Theorem to the functions f ⊗ f, g ⊗ g ∈ L∞(X ×Y X) to obtain
lim_{N→∞} (1/N) Σ_{n=1}^N ∫ |E(f T^n g | Y)|2 dν = lim_{N→∞} (1/N) Σ_{n=1}^N ∫ (f ⊗ f)(T × T)^n(g ⊗ g) d(µ ×Y µ)
= ∫ f ⊗ f d(µ ×Y µ) ∫ g ⊗ g d(µ ×Y µ)
= ∫ E(f | Y)2 dν ∫ E(g | Y)2 dν = 0
The general case where f ∈ L∞(X, B, µ) has arbitrary conditional expectation now follows by considering the function F = f − E(f | Y)α. Clearly E(F | Y) = E(f | Y) − E(E(f | Y)α | Y) = E(f | Y) − E(f | Y) = 0 and by the previous case
lim_{N→∞} (1/N) Σ_{n=1}^N ∫ |E(F T^n g | Y)|2 dν = 0
But E(F T^n g | Y) = E(f T^n g | Y) − E(f | Y) S^n E(g | Y), as required.
Lemma 11.5. Suppose (X, B, µ, T) is weak mixing relative to (Y, D, ν, S) and T is an ergodic transformation. Then the relative square system X ×Y X is also weak mixing relative to (Y, D, ν, S).
Proof. Write X̃ = X ×Y X. In particular, let µ̃ = µ ×Y µ and T̃ = T × T. We are required to prove that the system X̃ ×Y X̃ is ergodic. Now write X̂ = X̃ ×Y X̃ so that µ̂ = µ̃ ×Y µ̃ and T̂ = T̃ × T̃. By the Mean Ergodic Theorem, specifically Corollary 3.13, it suffices to show that there exists a dense subset H ⊆ L2(µ̂) such that for all F, G ∈ H
lim_{N→∞} |(1/N) Σ_{n=0}^{N−1} ∫ F T̂^n G dµ̂ − ∫ F dµ̂ ∫ G dµ̂| = 0    (11.3)
In particular, it is suffices to show (11.3) holds for functions F and G of the form
F (x1 , x2 , x3 , x4 )
=
4
M
fi (x1 , x2 , x3 , x4 ) =
i=1
G(x1 , x2 , x3 , x4 )
=
4
M
4
Y
fi (xi )
i=1
gi (x1 , x2 , x3 , x4 ) =
i=1
4
Y
gi (xi )
i=1
where each f_i, g_i ∈ L∞(X, B, µ). By applying Lemma 11.1 and Proposition 11.3 we obtain
\[
\int F\,\hat T^{n}G\,d\hat\mu
=\int E\big(f_1T^{n}g_1\otimes f_2T^{n}g_2\mid Y\big)\,E\big(f_3T^{n}g_3\otimes f_4T^{n}g_4\mid Y\big)\,d\nu
=\int\prod_{i=1}^{4}E(f_iT^{n}g_i\mid Y)\,d\nu
\]
Similarly,
\[
\int F\,d\hat\mu=\int\prod_{i=1}^{4}E(f_i\mid Y)\,d\nu
\qquad\text{and}\qquad
\int G\,d\hat\mu=\int\prod_{i=1}^{4}E(g_i\mid Y)\,d\nu
\]
By using these equalities to rewrite (11.3) and then applying the triangle inequality, it suffices to show the following two statements:
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\int\Big|\prod_{i=1}^{4}E(f_iT^{n}g_i\mid Y)-\prod_{i=1}^{4}E(f_i\mid Y)\,S^{n}E(g_i\mid Y)\Big|\,d\nu=0
\tag{11.4}
\]
and
\[
\lim_{N\to\infty}\Big|\frac{1}{N}\sum_{n=1}^{N}\int\prod_{i=1}^{4}E(f_i\mid Y)\,S^{n}E(g_i\mid Y)\,d\nu-\int\prod_{i=1}^{4}E(f_i\mid Y)\,d\nu\int\prod_{i=1}^{4}E(g_i\mid Y)\,d\nu\Big|=0
\tag{11.5}
\]
Statement (11.4) is easily deduced by rewriting the integrand as an appropriate sum and applying Lemma 11.1. To verify (11.5), first apply the triangle and Cauchy-Schwarz inequalities to see that for each N ∈ N the N-th term of the sequence is bounded above by
\[
\Big\|\prod_{i=1}^{4}E(f_i\mid Y)\Big\|_{L^{2}(\nu)}\,\Big\|\frac{1}{N}\sum_{n=1}^{N}S^{n}\prod_{i=1}^{4}E(g_i\mid Y)-\int\prod_{i=1}^{4}E(g_i\mid Y)\,d\nu\Big\|_{L^{2}(\nu)}
\]
Recall the transformation T is ergodic. This implies S is also ergodic and we are done by the Mean Ergodic Theorem. □
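The Mean Ergodic Theorem invoked at the end of this proof can also be observed numerically. The following sketch (an illustration only, not part of the argument, with the irrational circle rotation chosen as a convenient concrete ergodic system) checks that ergodic averages converge to the space average:

```python
import numpy as np

# Ergodic average (1/N) sum_{n=1}^{N} g(S^n y) for the circle rotation
# S y = y + alpha (mod 1), which is ergodic for irrational alpha.
# The Mean Ergodic Theorem predicts convergence to the space average
# of g; here g(y) = cos(2 pi y), whose integral over [0,1) is 0.
alpha = np.sqrt(2) - 1          # irrational rotation number
y0 = 0.3                        # arbitrary starting point
N = 100_000
orbit = (y0 + alpha * np.arange(1, N + 1)) % 1.0
ergodic_avg = np.cos(2 * np.pi * orbit).mean()
print(abs(ergodic_avg))         # small: the time average tends to the integral
```

For the rotation the average is in fact a geometric sum, so the convergence here is far faster than the theorem guarantees in general.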
Analogous to the result proved during our study of weak mixing systems, we now show
relative weak mixing in some sense implies relative weak mixing of all orders.
Theorem 11.6. Suppose (X, B, µ, T) is weak mixing relative to (Y, D, ν, S). Then for any k ∈ N≥2 and all f_0, …, f_k ∈ L∞(X, B, µ) the following hold:
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\int\Big|E\Big(\prod_{l=0}^{k-1}T^{ln}f_l\,\Big|\,Y\Big)-\prod_{l=0}^{k-1}S^{ln}E(f_l\mid Y)\Big|^{2}\,d\nu=0
\tag{11.6}
\]
and
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Big\|\prod_{l=1}^{k}T^{ln}f_l-\prod_{l=1}^{k}T^{ln}E(f_l\mid Y)^{\alpha}\Big\|_{L^{2}(\mu)}=0
\tag{11.7}
\]
Here E(f | Y)^α denotes E(f | Y) ∘ α, the lift of E(f | Y) to a function on X.
Proof. The work is for the most part analogous to that of the proof of Theorem 6.4. Once again we use an inductive method; Lemma 11.4 is precisely (11.6) in the case k = 2, whilst (11.7) in the case k = 2 holds for the system X ×_Y X by the Mean Ergodic Theorem. We shall show, for k ≥ 2:
• Claim (1): (11.6) case k ⇒ (11.7) case k
• Claim (2): (11.7) case k for the system X ×_Y X ⇒ (11.6) case k + 1
By Lemma 11.5, if the system (X, B, µ, T) is a relatively weak mixing extension of (Y, D, ν, S) then so is X ×_Y X. Therefore the theorem follows from these two claims by induction.
Claim (1). Suppose that for all g_0, …, g_{k−1} ∈ L∞(X, B, µ) the following holds:
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\int\Big|E\Big(\prod_{l=0}^{k-1}T^{ln}g_l\,\Big|\,Y\Big)-\prod_{l=0}^{k-1}S^{ln}E(g_l\mid Y)\Big|^2\,d\nu=0
\tag{11.8}
\]
Then given f_1, …, f_k ∈ L∞(X, B, µ) we have
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big\|\prod_{l=1}^kT^{ln}f_l-\prod_{l=1}^kT^{ln}E(f_l\mid Y)^{\alpha}\Big\|_{L^2(\mu)}=0
\]
As in the proof of the analogous claim in Theorem 6.4, this is an application of van der Corput's trick.^17 Given f_1, …, f_k ∈ L∞(X, B, µ), first consider the case where for some j_0 ∈ {1, …, k} the function f_{j_0} has zero conditional expectation: E(f_{j_0} | Y) = 0. Then by taking
\[
u_n=\prod_{l=1}^kT^{ln}f_l
\]
it suffices to show
\[
\lim_{H\to\infty}\frac1H\sum_{h=0}^{H-1}s_h=0
\tag{11.9}
\]
where
\[
s_h=\limsup_{N\to\infty}\frac1N\sum_{n=1}^N\langle u_{n+h},u_n\rangle
\]
As in the proof of Theorem 6.4 we have
\[
\langle u_{n+h},u_n\rangle=\int\prod_{l=0}^{k-1}T^{ln}\big(f_{l+1}\,T^{(l+1)h}f_{l+1}\big)\,d\mu
\tag{11.10}
\]
Define g_{h,l} = f_{l+1}·T^{(l+1)h}f_{l+1} ∈ L∞(X, B, µ) for each h ∈ N and each l ∈ {0, …, k − 1}. Then (11.10) becomes
\[
\langle u_{n+h},u_n\rangle=\int\prod_{l=0}^{k-1}T^{ln}g_{h,l}\,d\mu=\int E\Big(\prod_{l=0}^{k-1}T^{ln}g_{h,l}\,\Big|\,Y\Big)\,d\nu
\]
^17 The proof is very similar so we work rapidly.
By the hypothesis of the claim applied to the collection of functions g_{h,0}, …, g_{h,k−1} for each h ∈ N, observe
\[
s_h=\limsup_{N\to\infty}\frac1N\sum_{n=1}^N\langle u_{n+h},u_n\rangle
=\limsup_{N\to\infty}\frac1N\sum_{n=1}^N\int E\Big(\prod_{l=0}^{k-1}T^{ln}g_{h,l}\,\Big|\,Y\Big)\,d\nu
=\limsup_{N\to\infty}\frac1N\sum_{n=1}^N\int\prod_{l=0}^{k-1}S^{ln}E(g_{h,l}\mid Y)\,d\nu
\tag{11.11}
\]
We claim that for some set Z ⊂ N of zero density
\[
\lim_{h\to\infty,\,h\notin Z}s_h=0
\tag{11.12}
\]
and thus, by Theorem 2.1, establish (11.9). Indeed, as (X, B, µ, T) is weak mixing relative to (Y, D, ν, S), we may apply Lemma 11.4 to the function f_{j_0} to obtain
\[
\lim_{H\to\infty}\frac1H\sum_{h=1}^H\int\Big|E\big(f_{j_0}T^{j_0h}f_{j_0}\mid Y\big)-E(f_{j_0}\mid Y)\,S^{j_0h}E(f_{j_0}\mid Y)\Big|^2\,d\nu=0
\]
By hypothesis E(f_{j_0} | Y) = 0 and so
\[
\lim_{h\to\infty,\,h\notin Z}\int\big|E(g_{h,j_0-1}\mid Y)\big|^2\,d\nu=0
\tag{11.13}
\]
for some set Z ⊂ N of zero density. Finally, applying the Cauchy-Schwarz inequality to (11.11), and bounding the remaining factors in L∞, gives
\[
|s_h|\le\big\|E(g_{h,j_0-1}\mid Y)\big\|_{L^2(\nu)}\prod_{\substack{l=0\\l\ne j_0-1}}^{k-1}\big\|E(g_{h,l}\mid Y)\big\|_{\infty}
\le\big\|E(g_{h,j_0-1}\mid Y)\big\|_{L^2(\nu)}\prod_{\substack{l=0\\l\ne j_0-1}}^{k-1}\|f_{l+1}\|_\infty^2
\tag{11.14}
\]
Thus, by comparing (11.13) and (11.14) we obtain (11.12) as desired.
The general case reduces to the above case by writing
\[
\prod_{l=1}^kT^{ln}f_l-\prod_{l=1}^kT^{ln}E(f_l\mid Y)^{\alpha}
=\sum_{j=1}^k\Big(\prod_{l=1}^{j-1}T^{ln}f_l\Big)\,T^{jn}\big(f_j-E(f_j\mid Y)^{\alpha}\big)\Big(\prod_{l=j+1}^kT^{ln}E(f_l\mid Y)^{\alpha}\Big)
\tag{11.15}
\]
and noting E(f_j − E(f_j | Y)^α | Y) = E(f_j | Y) − E(f_j | Y) = 0.
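Van der Corput's trick, used above in its Hilbert-space form, can be illustrated numerically with the classical scalar example u_n = e^{2πi n²α}: the correlations ⟨u_{n+h}, u_n⟩ average to zero for each fixed h ≠ 0, and consequently so do the u_n themselves. This is a side illustration under the assumed choice α = √2; it is not the sequence appearing in the proof.

```python
import numpy as np

alpha = np.sqrt(2)
N = 200_000
n = np.arange(1, N + 1)
u = np.exp(2j * np.pi * alpha * n**2)   # u_n = e^{2 pi i n^2 alpha}

# |(1/N) sum u_n|: van der Corput predicts this is small because the
# shifted correlations below are small for every fixed h != 0.
avg = abs(u.mean())

def corr(h):
    # (1/(N-h)) |sum_n u_{n+h} conj(u_n)| = average of e^{2 pi i (2nh + h^2) alpha},
    # a geometric sum, hence O(1/N) for each fixed h
    return abs(np.mean(u[h:] * np.conj(u[:-h])))

print(avg, corr(1), corr(2))   # all small
```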
For the second claim we turn our attention to the system X ×Y X. We adopt the notation
µ̃ = µ ×Y µ and T̃ = T × T and let α̃ : X ×Y X → Y denote the weak mixing extension given
by composing α with the projection onto the first co-ordinate.
Claim (2). Suppose that for all f_1, …, f_k ∈ L∞(X ×_Y X) the following holds:
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big\|\prod_{l=1}^k\tilde T^{ln}f_l-\prod_{l=1}^k\tilde T^{ln}E(f_l\mid Y)^{\tilde\alpha}\Big\|_{L^2(\tilde\mu)}=0
\tag{11.16}
\]
Then given g_0, …, g_k ∈ L∞(X, B, µ) we have
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\int\Big|E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)-\prod_{l=0}^kS^{ln}E(g_l\mid Y)\Big|^2\,d\nu=0
\tag{11.17}
\]
Here we use a slightly different approach from that utilised in dealing with the analogous claim in the proof of Theorem 6.4. The following argument is adapted from ([8], Chapter 7). Let g_0, …, g_k ∈ L∞(X, B, µ). First we consider the case where E(g_{j_0} | Y) = 0 for some j_0 ∈ {0, …, k}. By Lemma 11.5 the relative square system X ×_Y X is also weak mixing relative to (Y, D, ν, S). Apply the hypothesis (11.16) of the claim to the functions g_1 ⊗ g_1, …, g_k ⊗ g_k, so that
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big\|\prod_{l=1}^k\tilde T^{ln}(g_l\otimes g_l)-\prod_{l=1}^k\tilde T^{ln}E(g_l\otimes g_l\mid Y)^{\tilde\alpha}\Big\|_{L^2(\tilde\mu)}=0
\]
Norm convergence implies weak convergence, so that
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big|\int(g_0\otimes g_0)\prod_{l=1}^k\tilde T^{ln}(g_l\otimes g_l)\,d\tilde\mu-\int(g_0\otimes g_0)\prod_{l=1}^k\tilde T^{ln}E(g_l\otimes g_l\mid Y)^{\tilde\alpha}\,d\tilde\mu\Big|=0
\]
We may rewrite the left hand integral as
\[
\int\prod_{l=0}^k\tilde T^{ln}(g_l\otimes g_l)\,d\tilde\mu
=\int\Big(\prod_{l=0}^kT^{ln}g_l\Big)\otimes\Big(\prod_{l=0}^kT^{ln}g_l\Big)\,d\tilde\mu
=\int\Big|E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)\Big|^2\,d\nu
\]
Do the same for the right hand integral and apply the assumption E(g_{j_0} | Y) = 0 to give
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\int\Big|E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)\Big|^2\,d\nu=0
\]
Thus we have established the result for this case. Note that by Theorem 2.1 we also have
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big\|E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)\Big\|_{L^2(\nu)}=0
\tag{11.18}
\]
For the general case observe
\[
E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)-\prod_{l=0}^kS^{ln}E(g_l\mid Y)
=E\Big(\prod_{l=0}^kT^{ln}g_l-\prod_{l=0}^kT^{ln}E(g_l\mid Y)^{\alpha}\,\Big|\,Y\Big)
\]
Using the identity stated in (11.15) (replacing the f_l with g_l), the above expression may be written as
\[
\sum_{j=0}^kE\Big(\prod_{l=0}^kT^{ln}G_{j,l}\,\Big|\,Y\Big)
\tag{11.19}
\]
where
\[
G_{j,l}=\begin{cases}g_l&\text{if }l=0,\dots,j-1\\ g_j-E(g_j\mid Y)^{\alpha}&\text{if }l=j\\ E(g_l\mid Y)^{\alpha}&\text{if }l=j+1,\dots,k\end{cases}
\]
For each j = 0, …, k we have E(G_{j,j} | Y) = 0 and so, by taking the L²-norm of (11.19) and applying the triangle inequality, we can reduce the general case to the previous case. Specifically, by applying (11.18) to each collection of functions G_{j,0}, …, G_{j,k} it follows that
\[
\lim_{N\to\infty}\frac1N\sum_{n=1}^N\Big\|E\Big(\prod_{l=0}^kT^{ln}g_l\,\Big|\,Y\Big)-\prod_{l=0}^kS^{ln}E(g_l\mid Y)\Big\|_{L^2(\nu)}=0
\]
and we obtain (11.17) by once again applying Theorem 2.1. □
We now use the previous theorem to deduce the main result of this section: that the SZ property is preserved under weak mixing extensions.
Theorem 11.7. Suppose α : (X, B, µ, T) → (Y, D, ν, S) is a weak mixing extension and that the action of S on (Y, D, ν) is SZ. Then so too is the action of T on (X, B, µ).
Proof. Let A ∈ B be a set with positive measure µ(A) > 0 and k ∈ N. Define
\[
D_n:=\Big\{y\in Y:\mu_y(A)\ge\frac1n\Big\}
\]
Then each D_n is ν-measurable and
\[
\lim_{n\to\infty}\Big|\int_{D_n}\mu_y(A)\,d\nu(y)-\mu(A)\Big|=0
\]
and hence there must exist n_0 ∈ N such that ν(D_{n_0}) > 0. In other words, there exists a constant a > 0 such that if we define D := {y ∈ Y : µ_y(A) ≥ a} then ν(D) > 0.
Let ε > 0. By Theorem 11.6 there exists an N_0 ∈ N such that
\[
\frac1N\sum_{n=1}^N\Big\|E\Big(\prod_{l=0}^kT^{ln}1_A\,\Big|\,Y\Big)-\prod_{l=0}^kS^{ln}E(1_A\mid Y)\Big\|_{L^2(\nu)}<\varepsilon\qquad\text{for all }N>N_0
\]
Thus, fixing N > N_0, by applying the triangle and Cauchy-Schwarz inequalities we have
\[
\Big|\frac1N\sum_{n=1}^N\Big(\int E\Big(\prod_{l=0}^kT^{ln}1_A\,\Big|\,Y\Big)\,d\nu-\int\prod_{l=0}^kS^{ln}E(1_A\mid Y)\,d\nu\Big)\Big|<\varepsilon
\]
Note that
\[
\mu\Big(\bigcap_{l=0}^kT^{-ln}A\Big)=\int E\Big(\prod_{l=0}^kT^{ln}1_A\,\Big|\,Y\Big)\,d\nu
\]
and, as E(1_A | Y) ≥ 0 everywhere and E(1_A | Y)(y) = µ_y(A) ≥ a for y ∈ D, it follows that E(1_A | Y) ≥ a·1_D. Thus
\[
\frac1N\sum_{n=1}^N\mu\Big(\bigcap_{l=0}^kT^{-ln}A\Big)
>\frac1N\sum_{n=1}^N\int\prod_{l=0}^kS^{ln}E(1_A\mid Y)\,d\nu-\varepsilon
\ge a^{k+1}\,\frac1N\sum_{n=1}^N\int\prod_{l=0}^kS^{ln}1_D\,d\nu-\varepsilon
=a^{k+1}\,\frac1N\sum_{n=1}^N\nu\Big(\bigcap_{l=0}^kS^{-ln}D\Big)-\varepsilon
\]
As ε was arbitrary and the SZ property holds for the system (Y, D, ν, S), it follows that
\[
\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\mu\Big(\bigcap_{l=0}^kT^{-ln}A\Big)\ge a^{k+1}\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\nu\Big(\bigcap_{l=0}^kS^{-ln}D\Big)>0
\]
as required. □
12 Compact Extensions
We now turn to defining compact extensions and showing that they too preserve the SZ property.
Definition 25. Let α : (X, B, µ, T) → (Y, D, ν, S) be an extension of measure preserving systems and let f ∈ L²(X, B, µ). We say f is almost periodic relative to Y (AP rel. Y) if for every ε > 0 there exist functions g_1, …, g_m ∈ L²(X, B, µ) such that
\[
\min_{1\le j\le m}\|T^nf-g_j\|_{L^2(\mu_y)}<\varepsilon\qquad\text{for all }n\in\mathbb N\text{ and almost every }y\in Y
\]
Moreover, the extension is compact if the set of AP rel. Y functions is dense in L²(X, B, µ).
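For orientation, the absolute (non-relative) notion of almost periodicity from the earlier study of compact systems can be checked numerically: for the circle rotation Tx = x + α and f(x) = cos(2πx), the orbit {T^n f} is covered by a finite ε-net of translates. The sketch below assumes this concrete system and these parameter choices; it is an illustration only.

```python
import numpy as np

# For the rotation Tx = x + alpha and f(x) = cos(2 pi x), every T^n f lies in
# the compact family {cos(2 pi (x + t)) : t in [0,1)}, so the finite net
# g_j(x) = cos(2 pi (x + j/m)) is epsilon-dense along the whole orbit in L^2.
alpha = np.sqrt(3) - 1
m = 50                                   # size of the finite net
x = (np.arange(2000) + 0.5) / 2000       # quadrature grid on [0, 1)
net = np.cos(2 * np.pi * (x[None, :] + np.arange(m)[:, None] / m))

worst = 0.0
for n in range(1, 500):
    Tnf = np.cos(2 * np.pi * (x + n * alpha))
    # discrete L^2(dx) distance from T^n f to the nearest net element
    d = np.sqrt(((net - Tnf) ** 2).mean(axis=1)).min()
    worst = max(worst, d)
print(worst)   # bounded by sqrt(2) * sin(pi / (2m)) ~ 0.044
```

The exact distance between two translates is √2·|sin(π(s − t))|, which is what makes the a priori bound in the final comment available.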
Complementing the result of the previous section, the purpose of this section will be to prove
the following theorem:
Theorem 12.1. Let α : (X, B, µ, T ) → (Y, D, ν, S) be a compact extension and suppose the
action of S on (Y, D, ν) is SZ. Then so is the action of T on (X, B, µ).
The proof of this result builds upon our work on compact systems in a far from trivial
manner. The proof is divided into a number of results: the first reduces the problem, and the
remainder to some extent provide an analogy of the method used to show the SZ property for
compact systems in Theorem 7.2.
Recall that in order to prove Theorem 12.1 we must show that for all A ∈ B with µ(A) > 0 and for all k ∈ N
\[
\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\mu\Big(\bigcap_{j=0}^kT^{-jn}A\Big)>0
\tag{12.1}
\]
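The quantity in (12.1) can be estimated numerically for a concrete system. The sketch below uses an irrational circle rotation with A an interval and k = 2; these choices are assumptions made for illustration, not part of the proof.

```python
import numpy as np

# Estimate (1/N) sum_n mu(A ∩ T^{-n}A ∩ T^{-2n}A) for the rotation
# Tx = x + alpha on [0,1) with A = [0, 0.1), sampling x on a fine grid.
alpha = np.sqrt(2) - 1
x = (np.arange(20_000) + 0.5) / 20_000

def in_A(t):
    return (t % 1.0) < 0.1

N = 2000
total = 0.0
for n in range(1, N + 1):
    hits = in_A(x) & in_A(x + n * alpha) & in_A(x + 2 * n * alpha)
    total += hits.mean()          # ~ mu(A ∩ T^{-n}A ∩ T^{-2n}A)
print(total / N)                  # strictly positive, as multiple recurrence predicts
```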
We begin by demonstrating that without loss of generality we may consider only A ∈ B with certain “nice” properties. The following lemma is adapted from ([5], Chapter 7).
Lemma 12.2. Let α : (X, B, µ, T) → (Y, D, ν, S) be a compact extension and A ∈ B a set of positive measure. Then there exists a measurable subset Ã ⊆ A of positive measure such that:
1. For almost every y ∈ Y either µ_y(Ã) = 0 or µ_y(Ã) > ½µ(Ã).
2. The characteristic function 1_Ã ∈ L²(X, B, µ) is AP rel. Y.
Remark 12. Clearly (12.1) holds for A automatically if it holds for any measurable subset Ã ⊆ A. Thus, in proving Theorem 12.1 we may restrict our attention without loss of generality to showing (12.1) holds for sets A ∈ B with µ(A) > 0 which satisfy properties 1 and 2.
Proof of Lemma 12.2. To begin we simply remove the parts of the set which contradict property 1. To do this, let
\[
B:=\Big\{y\in Y:\mu_y(A)\le\tfrac12\mu(A)\Big\}
\]
Then B is measurable and hence so is A′ = A \ α^{−1}(B). For almost every y ∈ Y \ B the set α^{−1}(B) is µ_y-null and so
\[
\mu_y(A')=\mu_y(A)>\tfrac12\mu(A)\ge\tfrac12\mu(A')
\]
On the other hand, for almost every y ∈ B the support of the measure µ_y is contained in α^{−1}(B) and so µ_y(A′) = 0. Hence for almost every y ∈ Y either µ_y(A′) > ½µ(A′) or µ_y(A′) = 0. It remains to show A′ is a set of positive measure. If y ∈ B then µ_y(A \ A′) = µ_y(A) ≤ ½µ(A), and if y ∈ Y \ B then µ_y(A \ A′) = 0. Integrating gives
\[
\mu(A\setminus A')=\int\mu_y(A\setminus A')\,d\nu(y)\le\tfrac12\mu(A)
\]
and so µ(A′) ≥ ½µ(A) > 0.
We claim there exists a set B′ ∈ D such that if we define Ã = A′ \ α^{−1}(B′) then µ(Ã) > 0 and the function 1_Ã is AP rel. Y. This set then also has property 1: for almost every y ∈ B′ we have µ_y(Ã) = 0, while for almost every y ∉ B′ it follows µ_y(Ã) = µ_y(A′) > ½µ(A′) ≥ ½µ(Ã).
Let (ε_l)_{l∈N} be the sequence ε_l = 2^{−(l+2)}µ(A′), so that
\[
\sum_{l=1}^\infty\varepsilon_l=\tfrac14\mu(A')<\tfrac12\mu(A')
\tag{12.2}
\]
By hypothesis the set of AP rel. Y functions is dense in L²(X, B, µ). For every l ∈ N there exists an f_l ∈ L²(X, B, µ) such that f_l is AP rel. Y and
\[
\|1_{A'}-f_l\|_{L^2(\mu)}<\varepsilon_l
\]
Now consider the sets
\[
B_l=\Big\{y\in Y:\|1_{A'}-f_l\|^2_{L^2(\mu_y)}\ge\varepsilon_l\Big\}
\]
defined for every l ∈ N. Then each B_l is measurable and
\[
\nu(B_l)=\int_{B_l}1\,d\nu(y)\le\frac1{\varepsilon_l}\int_{B_l}\|1_{A'}-f_l\|^2_{L^2(\mu_y)}\,d\nu(y)
\le\frac1{\varepsilon_l}\int\|1_{A'}-f_l\|^2_{L^2(\mu_y)}\,d\nu(y)
=\frac1{\varepsilon_l}\|1_{A'}-f_l\|^2_{L^2(\mu)}<\varepsilon_l
\tag{12.3}
\]
Define Ã = A′ \ α^{−1}(∪_{l∈N}B_l). By (12.2) and (12.3) one may easily deduce µ(Ã) > ½µ(A′) > 0. We claim 1_Ã is AP rel. Y. Indeed, given ε > 0 there exists l_0 ∈ N such that ε_{l_0}^{1/2} < ε/2, and functions g_1, …, g_m ∈ L²(X, B, µ) such that
\[
\min_{1\le j\le m}\|T^nf_{l_0}-g_j\|_{L^2(\mu_y)}<\frac\varepsilon2\qquad\text{for all }n\in\mathbb N\text{ and almost every }y\in Y
\]
Note, for almost every y ∈ Y, if S^n y ∉ ∪_{l∈N}B_l then
\[
\|T^n1_{\tilde A}-T^nf_{l_0}\|_{L^2(\mu_y)}=\|1_{\tilde A}-f_{l_0}\|_{L^2(\mu_{S^ny})}=\|1_{A'}-f_{l_0}\|_{L^2(\mu_{S^ny})}<\frac\varepsilon2
\]
On the other hand, if S^n y ∈ ∪_{l∈N}B_l then
\[
\|T^n1_{\tilde A}\|_{L^2(\mu_y)}=\|1_{\tilde A}\|_{L^2(\mu_{S^ny})}=0
\]
If we take g_0 ≡ 0 then by the triangle inequality
\[
\min_{0\le j\le m}\|T^n1_{\tilde A}-g_j\|_{L^2(\mu_y)}\le\min_{0\le j\le m}\Big(\|T^n1_{\tilde A}-T^nf_{l_0}\|_{L^2(\mu_y)}+\|T^nf_{l_0}-g_j\|_{L^2(\mu_y)}\Big)<\varepsilon
\]
as required. □
At the heart of the multiple recurrence result for compact systems was the idea expounded in Proposition 7.1, which we presently recall. If a system (X, B, µ, T) is compact, then for every AP function f ∈ L²(X, B, µ) and every ε > 0 there exists a set K ⊂ N of positive lower density (in particular a syndetic set) such that
\[
\|T^nf-f\|_{L^2(\mu)}<\varepsilon\qquad\text{for all }n\in K
\tag{12.4}
\]
Following the argument given in [5, 10], our recourse in proving Theorem 12.1 will be to formulate and prove a similar statement concerning compact extensions α : (X, B, µ, T) → (Y, D, ν, S) and AP rel. Y functions; in particular we consider f = 1_A.
Proposition 12.3. Let f = 1_A, where we assume A ∈ B is of positive measure and satisfies properties 1 and 2 of Lemma 12.2, and let ε > 0 be given. There exists a set D ∈ D of positive measure and a finite set F ⊂ Z such that given any n ∈ N, for any y ∈ ∩_{l=0}^k S^{−ln}D there exists an m ∈ F such that
\[
\|T^{l(n+m)}f-f\|_{L^2(\mu_y)}<\varepsilon\qquad\text{for all }l\in\{0,\dots,k\}
\tag{12.5}
\]
As in the proof of Proposition 7.1 we consider finite maximal ε-separated sets. The main problem we encounter here is that we now need to find finite sets
\[
\{T^{t_1}f,\dots,T^{t_r}f\}\subset L^2(X,B,\mu_y)\qquad\text{for all }y\in\bigcap_{l=0}^kS^{-ln}D
\]
which are maximal ε-separated in many different vector spaces. To this end we make the following definition:
Definition 26. For each y ∈ Y let ⊕_{l=0}^k L²(µ_y) denote the vector space of (k + 1)-tuples (f_0, …, f_k), where each f_i ∈ L²(µ_y), with the addition and scalar multiplication defined in the obvious way. Define a norm ‖·‖_⊕ on ⊕_{l=0}^k L²(µ_y) by
\[
\|(f_0,\dots,f_k)\|_\oplus=\max_{0\le j\le k}\|f_j\|_{L^2(\mu_y)}
\]
Proof of Proposition 12.3. We divide the proof into three claims, first focusing our attention on subsets of the following form:
\[
\mathscr L(y):=\Big\{(f,T^mf,\dots,T^{km}f)\in\bigoplus_{l=0}^kL^2(\mu_y):m\in\mathbb Z\Big\}\subseteq\bigoplus_{l=0}^kL^2(\mu_y)
\]
By the proof of Lemma 12.2 there exists a set D′ ∈ D of positive measure such that µ_y(A) > ½µ(A) for almost every y ∈ D′ and µ_y(A) = 0 for almost every y ∈ Y \ D′. Consider the subset L*(y) ⊆ L(y) given by
\[
\mathscr L^*(y):=\Big\{(f,T^mf,\dots,T^{km}f):m\in\mathbb Z,\ y\in\bigcap_{l=0}^kS^{-lm}D'\Big\}
\]
Explicitly, for any m ∈ Z, (f, T^m f, …, T^{km}f) ∈ L*(y) provided y ∈ ∩_{l=0}^k S^{−lm}D′. This is precisely the subset consisting of the tuples (f, T^m f, …, T^{km}f) for which each entry has non-zero L²(µ_y)-norm. Indeed, suppose (f, T^m f, …, T^{km}f) ∈ L*(y); then for any l ∈ {1, …, k} we have S^{lm}y ∈ D′ and so
\[
\|T^{lm}f\|_{L^2(\mu_y)}=\|f\|_{L^2(\mu_{S^{lm}y})}>\sqrt{\tfrac12\mu(A)}>0
\]
Claim (1). Let ε > 0 be given. Then there exists a (non-empty) finite set F_0 ⊂ Z such that
\[
\Big\{(f,T^mf,\dots,T^{km}f)\in\bigoplus_{l=0}^kL^2(\mu_y):m\in F_0\Big\}
\]
is maximal ε-separated in L*(y) for all y in a subset of D′ of positive ν-measure.
Proof. As f is AP rel. Y, the set {T^n f : n ∈ N} is totally bounded in L²(X, B, µ_y) for almost every y. It follows that L(y) and, moreover, L*(y) are totally bounded in ⊕_{l=0}^k L²(µ_y).
For any finite set F ⊂ Z let Sep_ε(F) be the set of all y ∈ D′ such that the set
\[
\Big\{(f,T^mf,\dots,T^{km}f)\in\bigoplus_{l=0}^kL^2(\mu_y):m\in F\Big\}
\tag{12.6}
\]
is maximal ε-separated in L*(y). Alternatively, define the function g_F on Y by
\[
g_F(y)=\begin{cases}\displaystyle\min_{m\ne m'\in F}\,\max_{j=0,\dots,k}\|T^{jm}f-T^{jm'}f\|_{L^2(\mu_y)}&\text{if }y\in\bigcap_{m\in F}\bigcap_{j=0}^kS^{-jm}D'\\[2mm]0&\text{otherwise}\end{cases}
\tag{12.7}
\]
If g_F(y) > 0, then g_F(y) gives the separation of (12.6) in L(y), and it follows that Sep_ε(F) ⊂ Y can be expressed in the following form:
\[
\mathrm{Sep}_\varepsilon(F)=\big\{y\in Y:g_F(y)>\varepsilon\text{ but }g_{\tilde F}(y)\le\varepsilon\text{ whenever }|\tilde F|>|F|\big\}
\]
We claim that there exists some finite set F_0 ⊂ Z such that ν(Sep_ε(F_0)) > 0. To prove the claim it suffices to show the set system {Sep_ε(F) : F ⊂ Z finite} is (almost) a covering of D′. As there are only countably many finite subsets of the integers, the Sep_ε(F) cannot all have zero measure, for otherwise
\[
0<\nu(D')\le\sum_{\substack{F\subset\mathbb Z\\F\text{ finite}}}\nu\big(\mathrm{Sep}_\varepsilon(F)\big)=0
\]
For any y ∈ D′, as (f, …, f) ∈ L*(y), the set L*(y) is non-empty and totally bounded, and thus there exists a maximal ε-separated subset of L*(y). It follows directly that there exists F_y ⊂ Z finite such that y ∈ Sep_ε(F_y), and we are done.
Henceforth we fix F = F_0 ⊂ Z as a set of integers such that ν(Sep_ε(F)) > 0. Note that for each l ∈ N
\[
\mathrm{Sep}^l_\varepsilon(F):=\big\{y\in Y:g_F(y)>\varepsilon+\tfrac1l\text{ but }g_{\tilde F}(y)\le\varepsilon\text{ whenever }|\tilde F|>|F|\big\}
\]
is a measurable subset of Sep_ε(F), and Sep_ε(F) = ∪_{l∈N} Sep^l_ε(F). There must be some l_0 ∈ N such that ν(Sep^{l_0}_ε(F)) > 0. Thus, defining η = 1/l_0, the set
\[
\tilde D:=\big\{y\in Y:g_F(y)>\varepsilon+\eta\text{ but }g_{\tilde F}(y)\le\varepsilon\text{ whenever }|\tilde F|>|F|\big\}
\tag{12.8}
\]
is of positive measure.
Claim (2). There exists a subset D ⊆ D̃ of positive measure such that for each pair m, m′ ∈ F, each l = 0, …, k and each pair y, y′ ∈ D
\[
\Big|\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_y)}-\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_{y'})}\Big|<\eta
\]
Proof. For each choice of m, m′ ∈ F and l ∈ {0, …, k} define the function θ(m, m′, l) : D̃ → [0, 1] by
\[
\theta(m,m',l):y\mapsto\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_y)}
\]
Then these functions are measurable. Let I_1, …, I_t be a partition of [0, 1] into disjoint intervals, each of length < η. Note that for each choice of m, m′ and l the sets
\[
\theta(m,m',l)^{-1}(I_1),\dots,\theta(m,m',l)^{-1}(I_t)
\]
partition D̃ into a finite number of measurable sets. Let D̃_1, …, D̃_M ⊆ D̃ be the common refinement of the partitions corresponding to every choice of m, m′ and l. Then each D̃_i is an intersection of measurable sets and ν(D̃) = Σ_{i=1}^M ν(D̃_i). There must be some set D = D̃_{i_0} with positive measure. It is easy to check D satisfies the required property. □
Finally, Proposition 12.3 proceeds directly from the following claim.
Claim (3). For any n ∈ N the set
\[
B_n:=\big\{(f,T^{n+m}f,\dots,T^{k(n+m)}f):m\in F\big\}
\]
is an ε-separated subset of L*(y) for all y ∈ ∩_{j=0}^k S^{−jn}D.
Once we have proven Claim 3, fixing n ∈ N and y ∈ ∩_{j=0}^k S^{−jn}D ⊆ D̃, it follows from the definition of D̃ given in (12.8) that the set B_n is maximal ε-separated in L*(y). In particular, if we define F̃ = (n + F) ∪ {0} then, as |F̃| > |F|, the set {(f, …, f)} ∪ B_n cannot be ε-separated in L*(y). Hence we must have some m ∈ F such that for l = 0, …, k
\[
\|T^{l(m+n)}f-f\|_{L^2(\mu_y)}\le\|(f,T^{m+n}f,\dots,T^{k(m+n)}f)-(f,\dots,f)\|_\oplus<\varepsilon
\]
Proof (of Claim 3). By hypothesis the action of S on (Y, D, ν) is SZ, and as ν(D) > 0 this ensures that the set ∩_{j=0}^k S^{−jn}D is non-empty for infinitely many values of n, so the claim is not vacuous. Explicitly,
\[
\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\nu\Big(\bigcap_{j=0}^kS^{-jn}D\Big)>0
\]
and so there exists a set of n of positive lower density such that
\[
\nu\Big(\bigcap_{j=0}^kS^{-jn}D\Big)>0
\tag{12.9}
\]
Fix n ∈ N such that ∩_{j=0}^k S^{−jn}D is non-empty. We will demonstrate that the set B_n has the desired properties.
First we show that B_n ⊆ L*(y). Recalling the definition of L*(y), this is true provided that for each y ∈ ∩_{l=0}^k S^{−ln}D we have y ∈ ∩_{l=0}^k S^{−l(m+n)}D′ for every m ∈ F.
Let y ∈ ∩_{l=0}^k S^{−ln}D; then S^{ln}y ∈ D for every l ∈ {0, …, k}, and so it follows g_F(S^{ln}y) > ε + η > 0. We must have S^{ln}y ∈ ∩_{m∈F}∩_{j=0}^k S^{−jm}D′ by the definition of g_F. In particular S^{ln}y ∈ S^{−lm}D′, and so S^{l(m+n)}y = S^{lm}S^{ln}y ∈ D′, as required.
Now we show the ε-separation. As y ∈ D ⊆ D̃, if m ≠ m′ ∈ F then there exists l ∈ {0, …, k} with
\[
\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_y)}>\varepsilon+\eta
\tag{12.10}
\]
On the other hand, by Claim 2 (applied to the pair y, S^{ln}y ∈ D),
\[
\Big|\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_y)}-\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_{S^{ln}y})}\Big|<\eta
\tag{12.11}
\]
and so combining (12.10) and (12.11) gives
\[
\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_{S^{ln}y})}>\varepsilon
\]
But
\[
\|T^{l(n+m)}f-T^{l(n+m')}f\|_{L^2(\mu_y)}=\|T^{ln}(T^{lm}f-T^{lm'}f)\|_{L^2(\mu_y)}=\|T^{lm}f-T^{lm'}f\|_{L^2(\mu_{S^{ln}y})}>\varepsilon
\]
as required. □
This concludes the proof of Proposition 12.3. □
We now turn to the proof of Theorem 12.1. Recall we are trying to show that if α : (X, B, µ, T) → (Y, D, ν, S) is a compact extension and the action of S on (Y, D, ν) is SZ, then the action of T on (X, B, µ) is SZ.
Proof (of Theorem 12.1). Take ε = µ(A)/4(k + 1) in Proposition 12.3 to obtain a set D ∈ D of positive measure and a finite set F ⊂ Z with the stated properties. Temporarily fixing n ∈ N and y ∈ ∩_{l=0}^k S^{−ln}D, there exists m ∈ F such that
\[
\|T^{l(n+m)}f-f\|_{L^2(\mu_y)}<\varepsilon\qquad\text{for all }l\in\{0,\dots,k\}
\]
Thus, telescoping and using 0 ≤ f ≤ 1, we have
\[
\Big|\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu_y-\int f^{k+1}\,d\mu_y\Big|
=\Big|\sum_{j=0}^k\int\Big(\prod_{l=0}^{j-1}T^{l(n+m)}f\Big)\big(T^{j(n+m)}f-f\big)f^{k-j}\,d\mu_y\Big|
\le\sum_{j=0}^k\int\big|T^{j(n+m)}f-f\big|\,d\mu_y
\]
By the Cauchy-Schwarz inequality
\[
\int\big|T^{j(n+m)}f-f\big|\,d\mu_y\le\|T^{j(n+m)}f-f\|_{L^2(\mu_y)}<\varepsilon
\]
Hence
\[
\Big|\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu_y-\int f^{k+1}\,d\mu_y\Big|<(k+1)\varepsilon
\]
By our choice of ε and property 1 of the set A as given in Lemma 12.2 (note ∫f^{k+1} dµ_y = µ_y(A) > ½µ(A) for y ∈ D) we have
\[
\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu_y\ge\int f^{k+1}\,d\mu_y-(k+1)\varepsilon=\mu_y(A)-\tfrac14\mu(A)\ge\tfrac14\mu(A)
\tag{12.12}
\]
Now the choice of m depends on y ∈ ∩_{l=0}^k S^{−ln}D, but we may exploit the fact that there are only finitely many possible choices of m. Consider the sum
\[
\sum_{m\in F}\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu_y
\]
For any y ∈ ∩_{l=0}^k S^{−ln}D, the inequality (12.12) must hold for some term in this summation, and all terms are non-negative. Hence
\[
\sum_{m\in F}\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu_y\ge\tfrac14\mu(A)
\tag{12.13}
\]
Integrating (12.13) over ∩_{l=0}^k S^{−ln}D therefore gives
\[
\sum_{m\in F}\int\prod_{l=0}^kT^{l(n+m)}f\,d\mu\ge\tfrac14\mu(A)\,\nu\Big(\bigcap_{l=0}^kS^{-ln}D\Big)
\]
Averaging both sides over 1 ≤ n ≤ N and taking the limit as N → ∞ gives
\[
|F|\,\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\mu\Big(\bigcap_{l=0}^kT^{-ln}A\Big)\ge\tfrac14\mu(A)\,\liminf_{N\to\infty}\frac1N\sum_{n=1}^N\nu\Big(\bigcap_{l=0}^kS^{-ln}D\Big)>0
\]
□
13 The Dichotomy Between Weak Mixing and Compact Extensions
We are now able to conclude the proof of Furstenberg’s Multiple Recurrence Theorem. The final piece of the argument is the following theorem:
Theorem 13.1. Let α : (X, B, µ, T) → (Y, D, ν, S) be an extension of measure preserving systems. Then one of the following holds:
• Either α : (X, B, µ, T) → (Y, D, ν, S) is a weak mixing extension,
• Or there exists a non-trivial intermediate compact extension. Explicitly, there exists a system (Z, E, θ, R) defined on a Borel probability space, an extension
β : (X, B, µ, T) → (Z, E, θ, R)
and a non-trivial compact extension
γ : (Z, E, θ, R) → (Y, D, ν, S)
with the property α = γ ∘ β.
Once Theorem 13.1 has been established, the Multiple Recurrence Theorem follows directly, as we shall now illustrate:
Proof (of Furstenberg’s Multiple Recurrence Theorem). Suppose (X, B, µ, T) is an invertible, ergodic measure preserving system defined on a Borel probability space. By Theorem 10.1 there exists some maximal SZ factor, (X, Â, µ, T) say. Suppose this is a proper factor. Recall that there exists some system (Y, D, ν, S) defined on a Borel probability space corresponding to this maximal factor; namely, there is an extension α : (X, B, µ, T) → (Y, D, ν, S) such that Â = α^{−1}(D). Now this extension cannot be weak mixing, for otherwise by Theorem 11.7 the action of T on (X, B, µ) would be SZ, contradicting the maximality of (X, Â, µ, T). Therefore, by Theorem 13.1 there exists some system (Z, E, θ, R) defined on a Borel probability space, an extension β : (X, B, µ, T) → (Z, E, θ, R) and a non-trivial compact extension γ : (Z, E, θ, R) → (Y, D, ν, S). By Theorem 12.1 the action of R on (Z, E, θ) is SZ. But this system corresponds to a factor (X, B*, µ, T) where B* = β^{−1}(E) and Â ⊂ B* is a proper inclusion. As the action of T on (X, B*, µ, T) is SZ, this too contradicts maximality. Hence the factor (X, Â, µ, T) cannot be proper, and thus the action of T on (X, B, µ) is SZ as required. □
The proof of Theorem 13.1 runs similarly to that of Theorem 8.1, albeit with substantial modifications. We will assume that the extension is not weak mixing and show there exists a non-trivial intermediate compact extension. To do this, using the correspondence between extensions and factors, it is enough to demonstrate the existence of a T-invariant σ-algebra B* with the following properties:
• It is an intermediate factor: α^{−1}(D) ⊂ B* ⊆ B
• The AP rel. Y functions in L²(X, B*, µ) are dense in L²(X, B*, µ)
Proof (of Theorem 13.1). The argument here is adapted from ([5], Chapter 7; [8], §10). We divide the proof into three claims. The first of these claims mirrors the approach used in proving Theorem 8.1, where we considered a convolution operator.
Assume that the extension α : (X, B, µ, T) → (Y, D, ν, S) is not weak mixing. Then X ×_Y X is not ergodic and so there exists a bounded, non-constant (T × T)-invariant function H ∈ L∞(X ×_Y X). Define the convolution operator H∗ : L²(X, B, µ) → L²(X, B, µ) by
\[
(H*\phi)(x)=\int H(x,x')\phi(x')\,d\mu_{\alpha(x)}(x')\qquad\text{for all }\phi\in L^2(X,B,\mu)
\tag{13.1}
\]
Claim (1). Let φ ∈ L²(X, B, µ) and ε > 0 be given. Then for almost every y ∈ Y there exist functions g_1^y, …, g_{m(y)}^y ∈ L²(X, B, µ_y) such that
\[
\min_{1\le j\le m(y)}\|T^n(H*\phi)-g_j^y\|_{L^2(\mu_y)}<\varepsilon\qquad\text{for all }n\in\mathbb N_0
\]
Remark 13. Note that Claim 1 does not show each H∗φ is AP rel. Y. For this we would need uniform boundedness: that given ε > 0 there exists a single set of functions g_1, …, g_m ∈ L²(X, B, µ) such that
\[
\min_{1\le j\le m}\|T^n(H*\phi)-g_j\|_{L^2(\mu_y)}<\varepsilon\qquad\text{for almost every }y\in Y\text{ and all }n\in\mathbb N_0
\tag{13.2}
\]
Proof (of Claim 1). For almost every y ∈ Y the operator H∗ : L²(X, B, µ_y) → L²(X, B, µ_y) is compact. Also, we have the identity
\[
T^n(H*\phi)=H*(T^n\phi)\qquad\text{for all }\phi\in L^2(X,B,\mu)\text{ and all }n\in\mathbb N
\tag{13.3}
\]
This is easily deduced using the fact that T is measure preserving and H is (T × T)-invariant:
\[
T^n(H*\phi)(x)=\int H(T^nx,x')\phi(x')\,d\mu_{\alpha(T^nx)}(x')
=\int H(T^nx,x')\phi(x')\,d\mu_{S^n\alpha(x)}(x')
=\int H(T^nx,T^nx')\phi(T^nx')\,d\mu_{\alpha(x)}(x')
=\int H(x,x')\phi(T^nx')\,d\mu_{\alpha(x)}(x')=H*(T^n\phi)(x)
\]
Hence clos{T^n(H∗φ) : n ∈ N} is a compact subset of L²(X, B, µ_y) for almost every y ∈ Y, and Claim 1 follows directly. □
Although we have not demonstrated that the H∗φ are AP rel. Y functions,^18 the next claim shows that each H∗φ is arbitrarily close to an AP rel. Y function in the L²(X, B, µ)-norm.
Claim (2). For any φ ∈ L∞(X, B, µ) and any ε > 0 there exists a function f ∈ L²(X, B, µ) such that f is AP rel. Y and ‖f − H∗φ‖_{L²(µ)} < ε.
Proof. Let φ ∈ L∞(X, B, µ) be some fixed function. The case φ ≡ 0 is trivial, so we assume ‖φ‖_∞ ≠ 0. We will construct a function f equal to H∗φ on all of X except for a set where H∗φ is ‘badly behaved’, in a sense presently described.
Fix some y ∈ Y for which the conditional measure µ_y is defined, and some δ > 0. The family of δ-balls {B_δ(n, y)}_{n∈Z}, where
\[
B_\delta(n,y)=\big\{h\in\mathrm{clos}\{T^m(H*\phi):m\in\mathbb Z\}:\|h-T^n(H*\phi)\|_{L^2(\mu_y)}<\delta\big\}
\]
covers the compact set clos{T^m(H∗φ) : m ∈ Z} ⊆ L²(X, B, µ_y). By taking a finite subcover we see that there exists some M ∈ N such that {T^n(H∗φ) : |n| ≤ M} is δ-dense in {T^m(H∗φ) : m ∈ Z}.
Now let m_δ : Y → N be the almost everywhere defined function which takes y ∈ Y to the least positive integer such that {T^m(H∗φ) : |m| ≤ m_δ(y)} is δ-dense in {T^m(H∗φ) : m ∈ Z} ⊆ L²(µ_y). This is a measurable function. To see this, note that it suffices to show that for each M ∈ N the set
\[
m_\delta^{-1}(\{0,\dots,M\})=\{y\in Y:m_\delta(y)\le M\}
\]
^18 It is possible to show that the H∗φ are indeed AP rel. Y (see [5], Theorem 7.21). However, here we follow a different method and show the functions H∗φ can be slightly modified to produce a function f easily seen to be AP rel. Y.
is measurable. This is precisely the set of y ∈ Y with the property that
\[
\min_{|j|\le M}\|T^m(H*\phi)-T^j(H*\phi)\|_{L^2(\mu_y)}<\delta\qquad\text{for all }m\in\mathbb Z
\tag{13.4}
\]
Recall that for any F ∈ L²(X, B, µ) the function y ↦ ∫F dµ_y is measurable by Theorem 9.4. It follows that for each m and j the function Φ : Y → [0, ∞) given by
\[
\Phi:y\mapsto\|T^m(H*\phi)-T^j(H*\phi)\|_{L^2(\mu_y)}
\]
is measurable. The set defined by the condition in (13.4) is a countable intersection (over m) of finite unions (over j) of sets of the form Φ^{−1}([0, δ)), and so m_δ^{−1}({0, …, M}) is measurable as required.
Moreover, for each k ∈ N the set A_k = m_δ^{−1}([k, ∞)) ⊆ Y is measurable. The sequence of sets (A_k)_{k∈N} is decreasing and
\[
\nu(A_k)\to0\qquad\text{as }k\to\infty
\]
Therefore, given η > 0 there exists M(δ, η) ∈ N such that ν(A_n) < η for all n ≥ M(δ, η). Let E(δ, η) = A_{M(δ,η)}. Then ν(E(δ, η)) < η and for almost every y ∈ Y \ E(δ, η) one has m_δ(y) < M(δ, η).
Let ε > 0 be given. Define
\[
\varepsilon_n=\frac{\varepsilon^2}{2^n\|H\|_\infty^2\|\phi\|_\infty^2}\qquad\text{for all }n\in\mathbb N
\]
so that ε² = ‖H‖²_∞‖φ‖²_∞ Σ_{n=1}^∞ ε_n. Also let δ_n = 1/n for all n ∈ N.
Repeat the above argument for each n ∈ N, taking δ = δ_n and η = ε_n, to obtain a sequence of sets E(n) ⊆ Y of measure ν(E(n)) < ε_n and integers M(n) ∈ N such that for y ∈ Y \ E(n) the set {T^m(H∗φ) : |m| ≤ M(n)} is δ_n-dense in clos{T^m(H∗φ) : m ∈ Z} ⊆ L²(µ_y).
Let
\[
f(x)=\begin{cases}0&\text{if }x\in\alpha^{-1}\big(\bigcup_{n\in\mathbb N}E(n)\big)\\ (H*\phi)(x)&\text{otherwise}\end{cases}
\tag{13.5}
\]
Then ‖f − H∗φ‖²_{L²(µ)} ≤ ‖H‖²_∞‖φ‖²_∞ ν(∪_{n∈N}E(n)) < ‖H‖²_∞‖φ‖²_∞ Σ_{n=1}^∞ ε_n = ε².
It remains to show f is AP rel. Y. But this is clear from the construction: given δ > 0 there exists n ∈ N such that δ_n < δ. Then for almost every y ∈ Y \ E(n), the set {T^m(H∗φ) : |m| ≤ M(n)} is δ-dense in {T^m(H∗φ) : m ∈ Z} in the L²(µ_y)-norm. Thus
\[
\{0\}\cup\{T^m(H*\phi):|m|\le M(n)\}
\]
is δ-dense in {T^m f : m ∈ Z} in the L²(µ_y)-norm for almost every y ∈ Y. □
Claim (3). Let F be the algebra generated by the set
\[
\mathscr H=\{H*\phi:H\in L^\infty(\mu\times_Y\mu),\ H\text{ is }(T\times T)\text{-invariant};\ \phi\in L^\infty(X)\}
\]
Denote B* = σ(F). Then the following hold:
1. B* is T-invariant and hence defines a factor (X, B*, µ, T).
2. B* is an intermediate factor: α^{−1}(D) ⊆ B* ⊆ B.
3. The AP rel. Y functions in L²(X, B*, µ) are dense in L²(X, B*, µ).
Proof of Claim 3.
1. The fact that B* is T-invariant follows immediately from the identity
\[
T(H*\phi)=H*(T\phi)
\]
from (13.3), which holds for every element H∗φ of the generating set.
2. Note that the constant function 1 := 1_{X×X} ∈ L∞(µ ×_Y µ) is (T × T)-invariant. For any D ∈ D the function 1^α_D = 1_{α^{−1}(D)} ∈ L∞(X, B, µ). Thus the convolution of 1 and 1^α_D is a B*-measurable function, and it is given by
\[
(1*1^\alpha_D)(x)=\int1^\alpha_D(x')\,d\mu_{\alpha(x)}(x')=E(1^\alpha_D\mid Y)(x)=1^\alpha_D(x)=1_{\alpha^{-1}(D)}(x)
\]
Therefore for every set D ∈ D the function 1_{α^{−1}(D)} is B*-measurable, and subsequently α^{−1}(D) ⊆ B*.
3. By Claim 2, given any H∗φ in the generating set and ε > 0 there exists an AP rel. Y function f such that ‖f − H∗φ‖_{L²(µ)} < ε. The sets E(δ, η) constructed in the proof of Claim 2 are D-measurable. Considering the form of the function f as given in (13.5), it follows that f ∈ L²(X, B*, µ). Thus
\[
\mathscr H\subseteq\mathrm{clos}\{f\in L^2(X,B^*,\mu):f\text{ is AP rel. }Y\}
\tag{13.6}
\]
The set on the right hand side of (13.6) is an algebra and it follows that
\[
F\subseteq\mathrm{clos}\{f\in L^2(X,B^*,\mu):f\text{ is AP rel. }Y\}
\tag{13.7}
\]
It suffices to show that the set F is dense in L²(X, B*, µ). This follows from the following lemma, the proof of which is adapted from [5]. □
Lemma 13.2. Let (X, B, µ) be a probability space. Suppose F ⊆ L∞ (X, B, µ) is an algebra
and 1X ∈ F . Then F is dense in L2 (X, B ∗ , µ) where B ∗ = σ(F ) is the σ-algebra generated
by F .
Proof. It is enough to prove that for all B ∈ B ∗ and any > 0 there exists some g ∈ F such
that
k1B − gkL2 (µ) < In other words, if we denote
C = B ∈ B ∗ : 1B ∈ clos(F )}
then it suffices to show B ∗ = C . Once this is established, by considering linear combinations
and using the fact F is an algebra containing the constants, the space of all simple functions in
L2 (X, B ∗ , µ) must also lie within the L2 -closure of F . By the density of the simple functions
we conclude F is dense in L2 (X, B ∗ , µ).
Sets of the form f⁻¹([a, b]) for f ∈ F and [a, b] ⊂ R generate B*. Using the Stone-Weierstrass
Theorem (see [2]) we show these generators are contained in C. Indeed, given any
function f ∈ F and any compact interval [a, b] ⊂ R, by the Stone-Weierstrass Theorem the
characteristic function 1_{[a,b]} can be approximated in L² by polynomials on the interval [−‖f‖∞, ‖f‖∞].
In particular, for every ε > 0 there exists some p ∈ R[x] such that

    ‖1_{[a,b]} − p‖L²(f∗µ) < ε        (13.8)

where f∗µ is the push-forward Borel measure on R induced by f, that is,

    f∗µ(B) = µ(f⁻¹(B))    for all Borel sets B ⊆ R

It is clear the expression in (13.8) is equivalent to

    ‖1_{f⁻¹([a,b])} − p(f)‖L²(µ) < ε

As F is an algebra containing the constants, p(f) ∈ F and hence f⁻¹([a, b]) ∈ C. It remains to show C is a σ-algebra.
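The equivalence invoked here is the change-of-variables formula for the push-forward measure, together with the identity 1_{[a,b]} ∘ f = 1_{f⁻¹([a,b])}; spelled out:

```latex
\[
  \int_{\mathbb{R}} \big| 1_{[a,b]} - p \big|^2 \, d(f_*\mu)
    \;=\; \int_X \big| 1_{[a,b]}(f(x)) - p(f(x)) \big|^2 \, d\mu(x)
    \;=\; \big\| 1_{f^{-1}([a,b])} - p(f) \big\|_{L^2(\mu)}^2 .
\]
```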
Clearly X ∈ C. For any f ∈ F we have 1X − f ∈ F and it follows C is closed under
complementation. Suppose A, B ∈ C and let ε > 0 be given. Then there exist f1, f2 ∈ F such
that

    ‖1A − f1‖L²(µ) < ε/2    and    ‖1B − f2‖L²(µ) < ε/(2‖f1‖∞)

Hence, writing 1A1B − f1f2 = (1A − f1)1B + f1(1B − f2),

    ‖1A1B − f1f2‖L²(µ) ≤ ‖1B‖∞‖1A − f1‖L²(µ) + ‖f1‖∞‖1B − f2‖L²(µ) < ε

so A ∩ B ∈ C. Moreover, by considering 1_{A∪B} = 1A + 1B − 1A1B we have A ∪ B ∈ C. Thus,
given any sequence (An)n∈N ⊂ C and ε > 0, it is easy to deduce ⋃_{n=1}^∞ An ∈ C by noting that the
function 1_{⋃_{n=1}^∞ An} can be approximated in L²(X, B, µ) by functions of the form 1_{⋃_{n=1}^N An}.
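The final approximation rests on nothing more than continuity of the measure from below:

```latex
\[
  \Big\| 1_{\bigcup_{n=1}^{\infty} A_n} - 1_{\bigcup_{n=1}^{N} A_n} \Big\|_{L^2(\mu)}^2
    \;=\; \mu\Big( \bigcup_{n=1}^{\infty} A_n \setminus \bigcup_{n=1}^{N} A_n \Big)
    \;\xrightarrow[\;N \to \infty\;]{}\; 0 ,
\]
```

since the partial unions increase to the full union and µ is a (finite) probability measure.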
This concludes the proof of the Multiple Recurrence Theorem and therefore of Szemerédi's Theorem.
References
[1] Ergodic Ramsey Theory: where Combinatorics and Number Theory meet Dynamics;
http://matheuscmss.wordpress.com/2010/02/15/ ; viewed 17-01-2011.
[2] B. Bollobás; Linear Analysis; Second Edition; Cambridge University Press, 1999.
[3] N. Chernov, R. Markarian; Chaotic Billiards (Mathematical Surveys and Monographs);
AMS, USA, 2006.
[4] J. Doob; Measure Theory; GTM 143; Springer-Verlag, New York 1994.
[5] M. Einsiedler, T. Ward; Ergodic Theory, with a view towards Number Theory; GTM 259;
Springer-Verlag, London, 2011.
[6] P. Erdős, P. Turán; ‘On some sequences of integers’; J. London Math. Soc. 11, 1936,
261-264.
[7] H. Furstenberg; Ergodic Behaviour of Diagonal Measures and a Theorem of Szemerédi on
Arithmetic Progressions; J. d’Analyse Math. Vol. 31, 1977.
[8] H. Furstenberg; Recurrence in Ergodic Theory and Combinatorial Number Theory; Princeton University Press, 1992.
[9] H. Furstenberg, Y. Katznelson; ‘An ergodic Szemerédi theorem for commuting transformations’; J. d’Analyse Math. 34, 1978, 275-291.
[10] H. Furstenberg, Y. Katznelson, D. Ornstein; The Ergodic Theoretical Proof of Szemerédi’s
Theorem; Bulletin (New Series) of the American Mathematical Society, Vol. 7, Num. 3,
November 1982.
[11] R. Graham, B. Rothschild; ‘A short proof of van der Waerden’s theorem on arithmetic
progressions’; Proc. Amer. Math. Soc. 42, 1974, 385-386.
[12] B. Hasselblatt, A. Katok; Principal Structures, from Handbook of Dynamical Systems 1A;
Elsevier Science, 2002.
[13] S. Kalikow, R. McCutcheon; An Outline of Ergodic Theory, Cambridge studies in advanced
mathematics 122; Cambridge University Press, Cambridge, 2010.
[14] A. Klenke; Probability Theory, A Comprehensive Course; Springer-Verlag, London, 2008.
[15] B. Koopman, J. von Neumann; ‘Dynamical systems of continuous spectra’; Proc. Nat. Acad.
Sci. USA 18, 1932, 255-263.
[16] S. Luzzatto; Stochastic-Like Behaviour in Nonuniformly Expanding Maps, from Handbook
of Dynamical Systems 1B; Elsevier Science, 2006.
[17] M. Nadkarni; Basic Ergodic Theory, TRIM 6; Hindustan Book Agency, India, 1995.
[18] K. R. Parthasarathy; Probability Measures on Metric Spaces; Academic Press, 1967.
[19] K. Petersen; Ergodic Theory; Cambridge University Press, 1983.
[20] K. Roth; ‘Sur quelques ensembles d’entiers’; C. R. Acad. Sci. Paris, 234, 1952, 388-390.
[21] C. Silva; Invitation to Ergodic Theory, Student Mathematical Library Vol. 42; AMS, USA,
2008.
[22] E. Szemerédi; ‘On sets of integers containing no four elements in arithmetic progression’;
Acta Math. Acad. Sci. Hungar. 20, 1969, 89-104.
[23] E. Szemerédi; ‘On sets of integers containing no k elements in arithmetic progression’; Acta
Math. Acad. Sci. Hungar. 27, 1975, 199-245.
[24] N. Young; An Introduction to Hilbert Space; Cambridge University Press, 1988.
[25] D. Williams; Probability with Martingales; Cambridge University Press, 1991.